CI/CD build cache savings in 2026: how much caching actually returns
Caching is the single largest cost-reduction lever in CI. The marketing copy promises 80-90% savings; the realistic steady-state on a working pipeline is 20-40% of total minute spend. Both numbers are technically defensible because they answer different questions. This page walks through what caching actually does, what it does not do, the per-vendor rules that constrain it, and the cache-hit-rate metric that tells you whether your caches are delivering.
Headline at a glance (2026)
Dependency caches (npm, pip, Maven, Cargo) routinely cut cold-install time 60-90% per build. Total monthly bill savings land around 20-40% because dependency install is rarely the entire build. Docker layer caching via BuildKit registry cache adds another 30-50% reduction on container-heavy workloads. Combined, well-configured caching roughly halves a typical CI bill.
What caching actually saves
The cache step replaces a slow operation (downloading and installing dependencies, building Docker layers, computing test indexes) with a fast operation (restoring the result of the previous run). A well-configured cache turns a 90-second npm ci into a 5-second actions/cache restore. Multiplied across thousands of CI runs per month, the cumulative time saved is substantial.
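As a concrete sketch, this is roughly what the pattern looks like on GitHub Actions with actions/cache; the paths and key shown here are illustrative, not prescriptive:

```yaml
# Restore npm's download cache, keyed on the lockfile hash.
- name: Restore npm cache
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

# npm ci still runs, but resolves packages from the warm local cache.
- name: Install dependencies
  run: npm ci
```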
The catch: cache-hit rate. Caching only saves time when the cache restores successfully. A cache miss falls back to the slow path and still pays the restore round-trip, a small net loss per missed run. If 90% of your runs hit the cache, you save roughly 90% of cold-install time; at 50%, half; at 10%, the cache can be a net loss because the restore overhead on the misses outweighs the savings on the hits. Concretely, with a 90-second cold install, a 5-second cached restore, and a 5-second miss overhead: a 90% hit rate saves about 76 seconds per run on average (0.9 × 85 − 0.1 × 5), while a 10% hit rate saves barely 4.
The cache-key strategy decides hit rate. The standard pattern (key based on a hash of the lockfile) gives near-100% hit rate on consecutive runs of the same branch where the lockfile has not changed, and 0% when it has. Including a fallback key (try the exact match, then try the most recent cache for the branch) lifts hit rate to 70-90% on actively-developed branches at the cost of occasionally restoring a slightly-stale cache.
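In actions/cache terms, the fallback is the restore-keys field, a prefix match that is tried when the exact key misses; a minimal sketch:

```yaml
# Exact match on the lockfile hash first; otherwise fall back to the most
# recent cache sharing the prefix (slightly stale, but far better than cold).
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
```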
Per-vendor cache mechanics
| Vendor | Cache primitive | Size limit | Eviction | Source |
|---|---|---|---|---|
| GitHub Actions | actions/cache | 10 GB / repo | Unused 7 days; LRU when over quota | vendor docs |
| GitLab CI | cache: stanza, S3-backed | BYO storage limit | Configurable | vendor docs |
| CircleCI | save_cache / restore_cache | 15 GB / project (Free) | 15 days unused | vendor docs |
| Buildkite | BYO (S3 / artifact mounts) | BYO storage limit | BYO retention | vendor docs |
| AWS CodeBuild | S3 cache, local cache | BYO storage | Configurable | vendor docs |
GitHub Actions and CircleCI provide cache-as-a-service: vendor-managed storage, simple API, no infrastructure to operate. The trade-off is the size limits. GitLab and Buildkite expect you to provide storage, typically S3, which removes the size limits but adds the egress cost we covered on the cross-region egress cost page if the storage is in a different region from your runners.
Per-stack realistic savings
The per-stack figures below come from public benchmarks and our own measurements across multiple production pipelines. They are what to plan for, not the marketing maximum.
| Stack | Cold install time | Cached install time | Total build saving (typical) |
|---|---|---|---|
| Node / npm | 60-180s | 3-10s | 15-30% |
| Node / pnpm | 30-90s | 2-6s | 10-20% |
| Python / pip | 45-150s | 5-15s | 15-25% |
| Python / Poetry | 90-300s | 10-25s | 25-40% |
| Java / Maven | 120-400s (.m2 cache) | 15-40s | 30-50% |
| Java / Gradle | 150-450s (Gradle wrapper + deps) | 20-60s | 30-50% |
| Go / go mod | 30-90s | 3-10s | 10-20% |
| Rust / Cargo | 180-1200s (target/ cache) | 30-120s | 40-70% |
| Docker / BuildKit registry | 60-600s | 10-90s | 30-50% |
Two surprises in the table. Rust wins biggest because the target/ directory cache is enormous (compiled artifacts) and the cold-build penalty is severe. Node gains least because dependency install is a small fraction of a typical Node test run; the test suite dominates and is not cached. Docker layer caching via BuildKit registry export is the highest-impact intervention for container-heavy CI but requires non-trivial setup compared to dependency caching.
The cache-hit-rate dashboard
Cache savings are easy to measure if you instrument them. Each cache step in your CI workflow can emit a metric: hit, miss, or fallback. Aggregate per workflow per week and you have a chart that tells you whether the cache is delivering. Common patterns visible in such a dashboard: a Friday-afternoon dependency upgrade tanking hit rate for two days, a flaky cache backend region causing a 24-hour 0% rate, a misconfigured cache key on a new workflow showing 0% hit rate because the key never matches anything.
GitHub Actions exposes cache hit/miss in workflow logs but does not aggregate per workflow. The simplest way to build the dashboard is a small step that pushes a metric to a CloudWatch / Datadog / Prometheus endpoint using the cache action's cache-hit output. The metric becomes the leading indicator of cache health; the savings are a lagging indicator on the bill.
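A minimal sketch of that instrumentation step, assuming a generic HTTP metrics endpoint; METRICS_URL is a placeholder for whatever ingestion you actually run:

```yaml
- id: npm-cache
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

# cache-hit is 'true' only on an exact primary-key match; a restore-keys
# fallback and a full miss both report false here.
- name: Emit cache metric
  if: always()
  run: |
    RESULT=$([ "${{ steps.npm-cache.outputs.cache-hit }}" = "true" ] && echo hit || echo miss)
    curl -s -X POST "$METRICS_URL" \
      -d "metric=ci.cache.result" \
      -d "value=$RESULT" \
      -d "workflow=${{ github.workflow }}"
```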
Distributed build caches: the next layer
For larger codebases, the per-step CI cache is not enough. Distributed build caches at the build-tool level (Bazel remote cache, Nx Cloud, Turborepo Remote Cache, Gradle Build Cache, sccache for Rust) cache individual compilation units across all CI runs and developers. The first build to compile a given unit pays the full 30-second compile; every build after it gets an instant cache hit on the same artifact.
Distributed caches are powerful but operationally heavier. They require a backend (S3 plus a small service, or a managed offering like Nx Cloud at $25-100/seat/month), and the setup is several days of platform work. They are typically worth it for codebases of 100k+ lines or for monorepos where many services share dependencies, because that is where the distributed hit rate gets high enough to deliver order-of-magnitude speedups on incremental builds of any given service. We cover the wider set of caching techniques on the CI/CD caching strategies page.
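As one example of the shape of the setup, here is a hedged sketch wiring sccache to an S3 backend for a Rust pipeline; the bucket name is hypothetical, and the install step can be swapped for a prebuilt binary:

```yaml
- name: Install sccache
  run: cargo install sccache --locked

# RUSTC_WRAPPER routes every rustc invocation through sccache, which
# checks the shared S3 cache before compiling.
- name: Build with shared compilation cache
  run: cargo build --release
  env:
    RUSTC_WRAPPER: sccache
    SCCACHE_BUCKET: acme-build-cache   # hypothetical bucket
    SCCACHE_REGION: eu-west-2
```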
When caching breaks: the failure modes
Caching has three common failure modes. First, stale cache hit: the cache restores an out-of-date dependency tree and the build passes when it should fail (or vice versa). Mitigation: include the lockfile hash in the cache key, never just the package name. Second, cache poisoning: a build writes corrupt data to the cache and every subsequent build restores corruption. Mitigation: only push to cache from main-branch builds, not from PRs. Third, runaway cache size: the cache grows unboundedly until it hits the vendor limit and starts evicting useful entries. Mitigation: explicit eviction logic in workflows that prune caches matching old branch names.
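The cache-poisoning mitigation maps directly onto the split restore/save actions; a sketch, with paths and keys illustrative:

```yaml
# PR builds may read the cache...
- uses: actions/cache/restore@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

# ...build and test steps here...

# ...but only main-branch builds may write it.
- uses: actions/cache/save@v4
  if: github.ref == 'refs/heads/main'
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
```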
When caches break, the symptoms can be confusing because they look like flaky tests (intermittent failures for "no reason"). Maintain a runbook entry for each cache failure mode and its diagnostic steps. Most teams discover a cache problem during a post-mortem after a misleading green-build incident, not proactively.
Frequently Asked Questions
How much does CI caching save?
On a per-build basis, dependency caching typically cuts 60-90% of cold-install time on Node and Python pipelines. On a per-month-bill basis, the saving is smaller because cold installs are not the dominant cost line: typical realistic savings are 20-40% of total minute spend after caching is set up. The 80-90% savings claims in marketing copy assume a cold baseline that nobody actually runs in steady state.
What is cache-hit rate and why does it matter?
Cache-hit rate is the percentage of CI runs where the cache restored successfully and the cached step ran in the fast path. A high cache-hit rate (80%+) is the leading indicator of cache-derived savings. A low rate (under 50%) means the cache key strategy is wrong and most runs fall back to cold. The metric is more important than the per-build saving because it tells you whether the cache is doing its job.
What is the GitHub Actions cache size limit?
The GitHub Actions cache limit is 10 GB of total cache per repository, across all branches. When the limit is hit, GitHub evicts the least recently used caches. For monorepos this is restrictive: a few large cache entries per branch can quickly approach 10 GB. The mitigation is per-key caching: split one large monolithic cache into smaller per-package or per-component caches that get reused across PRs.
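A sketch of that split for a hypothetical two-service monorepo, so eviction costs one component's cache at a time rather than everything at once:

```yaml
# One small cache per component, each keyed on its own lockfile.
- uses: actions/cache@v4
  with:
    path: services/api/node_modules
    key: api-${{ hashFiles('services/api/package-lock.json') }}

- uses: actions/cache@v4
  with:
    path: services/web/node_modules
    key: web-${{ hashFiles('services/web/package-lock.json') }}
```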
Should I cache Docker layers?
Yes, but the implementation matters. The naive approach (saving the entire Docker daemon state to cache) consumes the cache budget quickly. The better approach is BuildKit cache export (--cache-to=type=registry,ref=...) which pushes cache layers to a container registry where they can be pulled per-layer on cache restore. For monorepos with many services, BuildKit cache to ECR or GHCR consistently delivers 50-70% reduction in Docker build time after warmup.
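A hedged sketch of the registry-export approach with GHCR; the image and cache refs are placeholders, and it assumes a buildx builder that supports registry cache export (the docker-container driver does):

```yaml
- name: Build with registry-backed layer cache
  run: |
    docker buildx build \
      --cache-from type=registry,ref=ghcr.io/acme/app:buildcache \
      --cache-to type=registry,ref=ghcr.io/acme/app:buildcache,mode=max \
      -t ghcr.io/acme/app:${{ github.sha }} \
      --push .
```

mode=max exports intermediate layers as well as final ones, which is what makes per-layer restores pay off for multi-stage builds.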
Why does my cache restore take longer than the work it saves?
Two reasons. First, cache size is too large; restoring a 5 GB cache takes 60-90 seconds and only saves 30 seconds of work. Second, cache restore comes from cross-region storage; if your runner is in eu-west-2 and the cache lives in us-east-1, the network transfer dominates. Fix: keep caches small per key (under 500 MB), make sure the cache backend is regional to the runner, and measure both restore time and saved work to confirm the ratio is favourable.