5 Essential Ways GitHub Uses eBPF to Prevent Deployment Disasters

GitHub hosts its own source code on github.com, which creates a circular dilemma: if the site goes down, the code and tooling needed to repair it may be unreachable too. To fix GitHub, you need GitHub. This post explores how GitHub turned to eBPF (extended Berkeley Packet Filter) to break this cycle and keep deployments safe and reliable. We’ll walk through five key points, from understanding the problem to implementing a kernel-level solution that monitors and blocks risky calls. Whether you’re a developer or a DevOps engineer, these insights can help you build more resilient systems.

1. The Circular Dependency Challenge

GitHub’s own infrastructure runs on the very platform it provides. This means every deployment script, if not carefully designed, can introduce a circular dependency. For example, if GitHub goes down, fixing it requires deploying new code—but that deployment might need to fetch assets from GitHub, which is unavailable. To mitigate this, GitHub maintains a mirror of the codebase and pre-built rollback assets. However, these measures alone don’t prevent scripts from accidentally calling internal services or downloading binaries from GitHub during an incident. That’s where eBPF comes in: it provides a way to see and control system calls without changing application code, enabling GitHub to set rules that block such risky operations during deployments.

Source: github.blog

2. Direct Dependencies: The Obvious Trap

In a hypothetical MySQL outage scenario, a deploy script might try to pull the latest release of an open source tool from GitHub. If GitHub is down, the script fails. This is a direct dependency: the most straightforward kind of circular dependency, yet one that is easy to overlook while everything is healthy. eBPF allows GitHub to attach hooks at the kernel level and monitor the network calls made by deployment scripts. If a script attempts to reach github.com (or any known internal service) during a critical restore, eBPF can log the attempt and even block it. This keeps deployment code self-contained, so it doesn’t rely on the very system it’s trying to fix.
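To make the log-or-block decision concrete, here is a minimal userspace sketch in Python of the policy check alone. It is not an eBPF program and not GitHub’s implementation. The blocked-domain list, the function names, and the choice to match on hostnames are all assumptions for illustration. A kernel hook on connect() sees IP addresses rather than names, so a name-based rule like this one would also need visibility into DNS lookups.

```python
from dataclasses import dataclass

# Hypothetical policy data: domains a deploy script must not depend on.
BLOCKED_DOMAINS = ("github.com",)


def is_blocked(hostname: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if hostname is a blocked domain or one of its subdomains."""
    name = hostname.rstrip(".").lower()
    # Match on label boundaries so "notgithub.com" is not caught by "github.com".
    return any(name == d or name.endswith("." + d) for d in blocked)


@dataclass
class Verdict:
    hostname: str
    allowed: bool  # may the request proceed?
    logged: bool   # was a policy hit recorded?


def decide(hostname: str, enforce: bool) -> Verdict:
    """Log every policy hit; deny it only when enforcement is switched on."""
    hit = is_blocked(hostname)
    return Verdict(hostname, allowed=not (hit and enforce), logged=hit)
```

Separating "logged" from "allowed" mirrors the two modes described above: a team could run in log-only mode first to find offending scripts, then turn on enforcement.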

3. Hidden Dependencies: The Silent Threat

Sometimes a script uses a tool already on disk, but that tool quietly checks for updates online. For instance, a servicing tool might ping GitHub to see if a new version exists. If GitHub is unreachable, the tool could hang or fail silently. This is a hidden dependency, and it is dangerous because nothing in the script itself reveals it. eBPF can detect such outbound calls by inspecting system calls like connect() or sendto(). GitHub uses eBPF to allowlist only the connections a deployment actually needs; anything else is logged and optionally blocked. This exposes hidden dependencies that teams might otherwise miss until an actual outage.
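The allowlist idea can be sketched in a few lines of Python. This is a userspace model of the decision only, under stated assumptions: the networks, ports, log format, and function name below are hypothetical, and a real eBPF program would make the equivalent check in the kernel against data held in a BPF map.

```python
import ipaddress

# Hypothetical allowlist: (network, port) pairs a deployment may contact.
ALLOWED = [
    (ipaddress.ip_network("10.0.0.0/8"), 3306),
    (ipaddress.ip_network("192.0.2.10/32"), 443),
]


def check_connect(dest_ip, dest_port, allowed=ALLOWED, enforce=False):
    """Return (permit, log_line) for one outbound connection attempt.

    Allowlisted destinations pass without logging. Anything else is always
    logged, and is refused only when enforce is True.
    """
    addr = ipaddress.ip_address(dest_ip)
    for network, port in allowed:
        if addr in network and dest_port == port:
            return True, None
    action = "BLOCK" if enforce else "AUDIT"
    return (not enforce), f"{action} connect to {addr}:{dest_port}"
```

For example, `check_connect("203.0.113.7", 443)` permits the call but returns an audit line, which is how a tool’s silent update check would surface in the logs before blocking is enabled.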

4. Transient Dependencies: The Escalating Risk

Transient dependencies occur when a script calls another internal service (e.g., a migration service), which in turn tries to fetch something from GitHub. The failure cascades back to the deploy script. These chains are especially tricky because they involve multiple services and can be hard to trace. eBPF helps by giving a unified view of each host: it can observe the system calls of every process on the machine, not just those of the deploy script itself. With eBPF programs that track the process hierarchy, GitHub can attribute each network request to the deploy script that ultimately spawned it. This makes it possible to enforce policies such as “no external network calls allowed for any process spawned by the deployment” during incident response.
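Attributing a request to the original deploy script comes down to walking parent links in the process tree. The Python sketch below shows that walk in userspace, assuming a hypothetical `parent_of` table (pid to parent pid) and hypothetical event records. How GitHub actually tracks the hierarchy is not described here, so treat this as one possible shape of the logic rather than their method.

```python
def descends_from(pid, root_pid, parent_of):
    """Walk parent links upward from pid.

    Returns True if root_pid is pid itself or one of its ancestors.
    parent_of maps a pid to its parent pid (hypothetical table).
    """
    seen = set()  # guards against cycles in a corrupted table
    while pid is not None and pid not in seen:
        if pid == root_pid:
            return True
        seen.add(pid)
        pid = parent_of.get(pid)
    return False


def attribute(events, root_pid, parent_of):
    """Keep only the network events caused by the deployment's process tree."""
    return [e for e in events if descends_from(e["pid"], root_pid, parent_of)]


# Usage: pid 100 is the deploy script; 200 and 300 are its descendants.
parent_of = {100: 1, 200: 100, 300: 200, 400: 1}
events = [
    {"pid": 300, "dest": "203.0.113.7:443"},
    {"pid": 400, "dest": "203.0.113.7:443"},
]
deploy_events = attribute(events, root_pid=100, parent_of=parent_of)
```

Here only the event from pid 300 is attributed to the deployment, because pid 400 was started by a different parent.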

5. Implementing eBPF for Deployment Safety

GitHub integrated eBPF into its new host-based deployment system by writing small programs that run in the kernel’s virtual machine. These programs attach to kernel hooks such as tracepoints or kprobes to monitor specific system events. For example, they can filter outbound network connections by checking destination IP addresses against a list of allowed services. If a forbidden connection is attempted, the eBPF program can block it and log a warning. (Tracepoints and kprobes are primarily observation hooks, so blocking generally relies on a hook type that can return a verdict on the operation.) The key advantages are low overhead, since the checks run inside the kernel, and the fact that the deployment scripts themselves need no modification. This approach scales across thousands of machines and helps GitHub maintain a high level of reliability. By sharing its findings, GitHub hopes to inspire other organizations to explore eBPF for their own deployment safety challenges.
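A hook on connect() receives the destination as a raw socket address structure, so the filter must first decode it. The Python sketch below models that step in userspace for IPv4, assuming the Linux sockaddr_in layout (a 2-byte family in host byte order, a 2-byte port in network byte order, then 4 address bytes). The allowed address and the function names are hypothetical, and this is an illustration of the check rather than GitHub’s code.

```python
import ipaddress
import socket
import struct

# Hypothetical allowlist of destination addresses.
ALLOWED_IPS = {ipaddress.IPv4Address("192.0.2.10")}


def parse_sockaddr_in(raw: bytes):
    """Decode the leading fields of an IPv4 sockaddr_in into (ip, port).

    Returns None if the buffer is too short or is not AF_INET.
    """
    if len(raw) < 8:
        return None
    (family,) = struct.unpack_from("=H", raw, 0)  # sa_family: host byte order
    if family != socket.AF_INET:
        return None
    (port,) = struct.unpack_from("!H", raw, 2)    # sin_port: network byte order
    ip = ipaddress.IPv4Address(raw[4:8])          # sin_addr: 4 raw bytes
    return ip, port


def verdict(raw: bytes) -> str:
    """Return 'allow', 'deny', or 'ignore' for one connect() destination."""
    parsed = parse_sockaddr_in(raw)
    if parsed is None:
        return "ignore"  # not IPv4; out of scope for this sketch
    ip, _port = parsed
    return "allow" if ip in ALLOWED_IPS else "deny"
```

The port is decoded but unused in `verdict`, matching the text above, which filters on destination IP addresses only; a stricter policy could check both.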

eBPF isn’t just a tool for network monitoring or observability—it’s a powerful ally in breaking circular dependencies. By proactively blocking risky calls during deployments, GitHub reduces the likelihood of exacerbating an outage. The result is a more resilient infrastructure that can recover faster, even when the primary platform is down. If you’re looking to improve your deployment safety, consider how eBPF might help you see and control the hidden paths that lead to failure.