The title seems to suggest it's a bug in the Linux kernel, but it's not. The layout of an internal struct is never part of the kernel's user-facing API, so it's bound to change. eBPF programs are expected to be compiled against the exact configuration of the kernel they will run on, just like any kernel module.
Author here, yep, I agree: not a bug, just what you get when you start accessing raw internals instead of stable interfaces and the underlying things change (but in this case it was necessary). I mentioned this in the summary at the end.
My understanding is that with BTF + CO-RE, you get the flexibility of building your program binary once and having it work on other (BTF/CO-RE-capable) kernel versions without needing a recompile. But since I had to use lower-level methods for stack & pt_regs access, I had to manually add logic for checking kernel structural differences at runtime.
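As an illustration of what CO-RE gives you for the easy cases (a minimal sketch, not xcapture's actual code), libbpf's CO-RE macros let a single binary cope with renamed struct fields at load time; the classic example is task_struct's state field, which became __state in kernel 5.14:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* "Flavor" struct describing pre-5.14 kernels, where the field was
 * `long state` instead of `unsigned int __state`. The ___old suffix
 * is stripped by libbpf when matching against kernel BTF. */
struct task_struct___old {
    long state;
} __attribute__((preserve_access_index));

static __always_inline long get_task_state(struct task_struct *t)
{
    /* bpf_core_field_exists() is resolved against the running
     * kernel's BTF at load time, so one binary handles both. */
    if (bpf_core_field_exists(t->__state))
        return BPF_CORE_READ(t, __state);
    return BPF_CORE_READ((struct task_struct___old *)t, state);
}
```

This only works for renames and offset shifts that BTF can describe, which is exactly why the lower-level stack/pt_regs access still needed manual runtime checks.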
That being said, I have not yet gotten to test if a compiled xcapture binary can just be copied between machines (of the same architecture) and whether it really works as advertised...
How would you recommend learning about eBPF / BTF / CO-RE? I have a basic understanding of what they are, but am not sure where to start for writing eBPF tracing programs, setting them up in production w/ Grafana, etc.
For getting started with the eBPF performance mindset, I normally recommend Brendan Gregg's book, just to see what's possible:
- https://www.brendangregg.com/bpf-performance-tools-book.html
And as a related activity, you could just install the bcc-tools package (on RHEL clones) and check out the /usr/share/bcc/tools directory to see what's already implemented. On the latest Ubuntu, these tools seem to be installed in /usr/sbin etc., but you could run "find /usr -name '*bpfcc'" to get a list of the eBPF tools already installed there (and test/man some of the more interesting ones).
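A quick way to check both layouts at once (a sketch; exact install paths vary by distro and bcc version):

```shell
# Look for pre-installed bcc/eBPF tools in the two common layouts:
# RHEL-family distros keep them under /usr/share/bcc/tools, while
# Ubuntu ships them as *-bpfcc binaries under /usr/sbin.
find /usr -name '*bpfcc' -o -path '*/bcc/tools/*' 2>/dev/null | sort
echo "scan complete"
```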
For the bigger picture and other eBPF uses like networking, I'd get Liz Rice's eBPF book (free download):
- https://isovalent.com/books/learning-ebpf/
But the most valuable resource for me when I took the leap from writing bpftrace one-liners to more sophisticated modern eBPF programs was (and still is) Andrii Nakryiko's blog with examples of modern BPF programming:
- https://nakryiko.com/
Even CO-RE won't help long term. Sure, it'll adjust structure offsets and such for you, but it can't deal with conceptual changes. We need an explicit contractual stable API, at the BPF helper level, not a quivering mound of hacks trying to make unstable APIs magically stable without the work needed to actually stabilize them.
This is interesting. Operating on the edge of user space
Author here, yeah, indeed. Earlier I was worried whether the eBPF verifier would even let me do plain integer arithmetic and then use the result as a struct pointer to an "arbitrary" kernel memory location, but it works.
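For illustration, a hedged sketch of the pattern (the helper name and the base/offset parameters here are hypothetical, not xcapture's actual code): the verifier accepts plain integer arithmetic producing a kernel address, as long as the actual dereference goes through bpf_probe_read_kernel(), which returns an error instead of faulting on a bad address:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* base and offset stand in for values derived at runtime,
 * e.g. from a task's kernel stack pointer. */
static __always_inline int read_regs_at(__u64 base, __u64 offset,
                                        struct pt_regs *dst)
{
    /* plain integer arithmetic yielding an "arbitrary" address */
    const void *addr = (const void *)(base + offset);

    /* the verifier permits this because bpf_probe_read_kernel()
     * safely fails (returns < 0) if the address is not readable */
    return bpf_probe_read_kernel(dst, sizeof(*dst), addr);
}
```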
Another research/testing area is whether I need to worry about memory ordering and possibly add some memory barriers, especially on ARM platforms, as the eBPF task-iterator passive sampler is an outside observer, periodically running on a different CPU, not injected into the context (and critical path!) of the monitored threads via tracepoints/probes.
ABI-stable APIs win in the end --- and if you don't make them with deliberation, you get them accidentally as you buckle under the social pressure of people complaining that you broke their software.
Piracy is famously a customer service problem. So is ABI instability. The Linux kernel should have ABI-stable APIs that let people do what they need to do. These APIs can live at the syscall level, the eBPF helper level, the ftrace level, or anywhere else, but they have to be defined, stable, and tested contracts. The current approach to eBPF creates incredible moral hazard because it lets people work around missing stable APIs by poking at internal kernel data structures. The result is an ecosystem full of code that will break when Linux changes ostensibly unstable things, forcing Linux to stabilize those things regardless of whether they are the right things --- thereby constraining the evolution of the kernel more than if they'd just designed deliberate APIs that let people do what they need.
I think I partly agree with you.
Because of Hyrum's Law, people will end up depending on everything. But in practice, if you stabilize everything, as C++ attempts to, that's a disaster too, just a different and more insidious one. The amount of lost performance is small (Titus Winters estimated perhaps 1% in 2019), but it grows slowly and forever. However, the missed opportunities to fix bugs are an incalculable loss. The reason something as basic as sorting isn't very good is that sorting code which was passable in 1998 is grandfathered in as a must-not-change, Hyrum's-law-stable ABI. The state of the art changed; the provided baseline stayed where it was.
You need to properly set expectations, and part of that is designing to avoid mirages of stability. If X isn't promised, don't leave it to chance whether your users think they're promised X; make it as obvious as possible that you can't have X. Don't just accept that "Life, uh, finds a way". We're active agents, we can do better.
When Clang's libc++ tried to fix sorting, they had to roll it back because it broke real software. When Rust changed its sort implementations, most people didn't even notice; maybe a few said "Ooh, my program is faster, I guess the new compiler optimised it better?". That's what correctly set expectations look like.