CICDCost.com is an independent comparison resource. Not affiliated with GitHub, GitLab, CircleCI, Buildkite, or any CI/CD vendor.

Ephemeral runner cost in 2026: spin-up-spin-down economics

Self-hosted CI runners save compute money compared to hosted, but the savings depend on whether your runner pool is well-utilised. Always-on runners pay for 720 hours of EC2 per month per machine regardless of how busy they are. Ephemeral runners pay only for the time a job is actually running. This page covers the architecture, the cold-start latency trade-off, and the break-even calculation that decides whether the operational complexity of an ephemeral pool is worth the compute saving.

Headline at a glance (2026)

Ephemeral runners win on cost when utilisation of always-on capacity is below 40-50%. Above that, always-on is cheaper because the cold-start overhead exceeds the idle-time savings. For teams with bursty CI patterns (busy weekday mornings, quiet weekends), ephemeral typically wins. For teams with steady continuous load, always-on wins.

The three ephemeral architectures

Three patterns dominate. First, Actions Runner Controller (ARC) on Kubernetes for GitHub Actions. ARC watches the GitHub API for queued workflows and provisions runner pods to handle them. When the job finishes, the pod is destroyed. ARC works with any Kubernetes cluster: EKS, GKE, AKS, on-prem.

Second, GitLab CI's Kubernetes Executor. GitLab Runner installed in a Kubernetes cluster spawns a pod per job, runs the job in the pod, terminates the pod. Same conceptual model as ARC, native to GitLab.

Third, ad-hoc EC2 spin-up via Lambda. A queue (SQS or similar) receives job requests; a Lambda function provisions a new EC2 instance, runs the job, terminates the instance. Simpler than Kubernetes but lacks the orchestration features (job grouping, runner labels, pod isolation) that ARC and the Kubernetes Executor provide. Good fit for very small teams or for very specific workloads (long-running ML jobs that justify a dedicated EC2).
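The Lambda pattern can be sketched in a few lines. Everything below is illustrative, not a real contract: the queue payload fields, the `run-one-ci-job` helper, and the AMI placeholder are all assumptions, and the launch call is injected so the sketch stands in for boto3's `run_instances` without requiring AWS credentials.

```python
import json

def handle_job(event, launch):
    """Minimal SQS-to-EC2 dispatcher sketch.

    `event` is an SQS-style record batch; `launch` is injected so the
    EC2 call (normally boto3's ec2.run_instances) can be stubbed.
    All payload field names here are illustrative assumptions.
    """
    launched = []
    for record in event["Records"]:
        job = json.loads(record["body"])
        instance_id = launch(
            image_id=job.get("ami", "ami-EXAMPLE"),          # placeholder AMI
            instance_type=job.get("instance_type", "m5.large"),
            # User data tells the instance to register, run exactly one
            # job, then self-terminate: the core ephemeral property.
            user_data=f"#!/bin/sh\nrun-one-ci-job {job['job_id']} && shutdown -h now\n",
        )
        launched.append(instance_id)
    return launched
```

The self-terminating user data is what keeps this pattern cheap: no orchestrator has to remember to reap idle instances.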

The compute economics

Always-on EC2 m5.large costs $0.096/hour x 720 hours = $69/month. Ephemeral m5.large pods that run 80 hours of jobs/month pay $0.096 x 80 = $7.68/month plus a small Kubernetes scheduling overhead.
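The idle-versus-active billing difference is simple arithmetic; here it is as a checkable sketch, using the $0.096/hour m5.large on-demand rate this page uses throughout:

```python
HOURLY = 0.096  # m5.large on-demand $/hour (illustrative rate used on this page)

def always_on_cost(runners: int, hourly: float = HOURLY) -> float:
    """One month (720 h) of EC2 per always-on runner, busy or idle."""
    return runners * 720 * hourly

def ephemeral_cost(build_hours: float, hourly: float = HOURLY) -> float:
    """Pay only for the hours a job is actually running."""
    return build_hours * hourly

print(round(always_on_cost(1), 2))   # 69.12
print(round(ephemeral_cost(80), 2))  # 7.68
```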

Worked example: a 25-developer team running 30,000 build minutes/month (500 hours of compute). On always-on with one dedicated m5.large: $69/month, but 500 hours / 720 = 69% utilisation, with overflow queuing for capacity during busy periods. Add a second runner for headroom: $138/month at 35% utilisation per runner. On ephemeral: 500 hours x $0.096 = $48/month for compute, plus $30-50 in Kubernetes operational overhead, plus the cluster cost itself (the EKS control plane alone is $73/month). Total: $150-170/month.

At this utilisation (35-69%), ephemeral is comparable to always-on once cluster overhead is included. The ephemeral case improves dramatically at lower utilisation: a team running 100 hours/month would pay $9.60 on ephemeral vs $69 on one always-on runner. The case worsens at higher utilisation: 600 hours/month at ephemeral is $57.60 plus overhead vs one always-on at $69. The crossover sits around 40-50% utilisation.
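The worked example as a sketch. The $40 ops figure is an assumed midpoint of the $30-50 overhead range, and $73 is the EKS control-plane fee:

```python
HOURLY = 0.096  # m5.large on-demand $/hour

def always_on_total(runners: int) -> float:
    """Fixed 720 h/month per runner, busy or idle."""
    return runners * 720 * HOURLY

def ephemeral_total(build_hours: float, eks: float = 73.0, ops: float = 40.0) -> float:
    """Active build hours plus cluster overhead (EKS fee + assumed ops midpoint)."""
    return build_hours * HOURLY + eks + ops

print(round(always_on_total(2), 2))    # 138.24  (two runners for headroom)
print(round(ephemeral_total(500), 2))  # 161.0   (the 25-dev worked example)
```

At 500 build hours/month the two totals land within $25 of each other, which is why the crossover hinges on utilisation rather than on either option being categorically cheaper.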

Cold-start latency: the wait-time tax

Ephemeral runners have longer queue-to-start time than always-on. The slow path is when a job arrives and no runner pod has capacity: ARC must request a new node from Karpenter or the cluster autoscaler, the node must boot (60-120 seconds for EC2), the runner image must pull (5-30 seconds), the job can finally start. Total slow-path latency: 90-180 seconds.

The fast path is when an existing node has spare capacity for a new pod: ARC schedules the pod, the runner starts. Total fast-path latency: 10-30 seconds. Most jobs after the first burst of the morning land on the fast path; the first few jobs in the morning eat the slow-path cost.

Cost in developer time: 90 seconds extra wait per first-of-burst job. For our 25-dev team with 5 pushes per dev per day, 125 jobs/day x say 10% on slow path x 90 seconds = 19 minutes/day = 6.3 hours/month of extra wait. At $100/hour fully loaded times 40% productivity factor, $250/month in payroll value lost to cold-start latency. Compare this to the compute saving: if ephemeral saves $50/month vs always-on but costs $250/month in extra wait, always-on actually wins on total economics.
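The wait-time arithmetic above as a checkable sketch, assuming 20 working days per month:

```python
def cold_start_payroll_cost(jobs_per_day: int = 125, slow_path_frac: float = 0.10,
                            extra_wait_s: int = 90, workdays: int = 20,
                            rate_per_hour: float = 100.0,
                            productivity_factor: float = 0.40) -> tuple[float, float]:
    """Hours of extra developer wait per month, and its payroll value."""
    hours = jobs_per_day * slow_path_frac * extra_wait_s * workdays / 3600
    return hours, hours * rate_per_hour * productivity_factor

hours, cost = cold_start_payroll_cost()
print(round(hours, 2), round(cost, 2))  # 6.25 250.0
```

The slow-path fraction is the lever: double it and the payroll cost doubles, which is why the warm-pool mitigation below pays for itself quickly.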

Mitigation: keep a small warm pool of pre-warmed runners that absorb the morning burst, and let ephemeral handle the bulk of steady load. This hybrid model gets most of the cost saving without the cold-start tax.

The Karpenter case for AWS

On AWS, the Kubernetes node autoscaler decision is between Cluster Autoscaler and Karpenter. Karpenter is faster (newer architecture, no ASG round-trip), more flexible (can pick instance types per pod requirement), and handles spot instances better. For CI runner pools where pods come and go in seconds, Karpenter's faster scale-up materially improves the cold-start latency we just discussed.

Karpenter integrates well with EC2 spot instances, which is the largest single cost-reduction lever available to self-hosted CI. We cover spot in detail on the spot instance CI cost page; the short version is that combining ephemeral pods with spot nodes takes 70%+ off on-demand compute prices, with interruption tolerance built into the ephemeral model itself: a job interrupted by a spot reclaim simply retries on a new pod, which is exactly the behaviour ephemeral runners are designed around.
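The spot arithmetic on this page's numbers, assuming the 70% discount (actual spot discounts vary by instance type, region, and time):

```python
ON_DEMAND = 0.096  # m5.large on-demand $/hour

def spot_hourly(discount: float = 0.70) -> float:
    """Effective spot rate at an assumed discount off on-demand."""
    return ON_DEMAND * (1 - discount)

# 500 build hours/month (the worked example) on spot-backed ephemeral nodes:
print(round(500 * spot_hourly(), 2))  # 14.4, versus 48.0 on-demand
```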

When ephemeral is the wrong choice

Three cases. First, very small teams (under 10 devs): the operational overhead of ARC plus Kubernetes plus Karpenter costs more engineering time than the entire CI bill saves. Use hosted CI or a single always-on runner. Second, very steady continuous load: a team running 700+ hours/month consistently has high enough utilisation that always-on with horizontal scaling beats ephemeral on total cost. Third, jobs with long setup costs: any job that spends 5+ minutes setting up its environment or restoring caches pays that cost on every run under ephemeral, because nothing persists between jobs.

The pattern that consistently wins for the 25-200 developer band: a hybrid model. Small always-on pool for the morning burst and the on-call critical path. Ephemeral pool for the bulk PR-check workload during the day. Spot-backed Karpenter for the ephemeral pool. This combination gives 50-70% reduction in compute spend versus pure always-on without the wait-time tax of pure ephemeral.

Frequently Asked Questions

What is an ephemeral CI runner?

An ephemeral runner is a single-job runner created when a build queues and destroyed when the build finishes. Each job runs on a fresh container or VM with no state from prior jobs. Ephemeral patterns are common with Actions Runner Controller (ARC) for GitHub Actions, GitLab Kubernetes Executor for GitLab CI, and Karpenter or Cluster Autoscaler for Kubernetes-native scaling. They eliminate idle compute charges at the cost of cold-start latency per job.

What is Actions Runner Controller?

Actions Runner Controller (ARC) is GitHub's official Kubernetes operator for hosting GitHub Actions self-hosted runners. ARC watches for queued workflows and dynamically provisions runner pods to handle them, then terminates the pods when the job completes. It supports horizontal pod autoscaling, runner groups for permission boundaries, and runs on any Kubernetes cluster (EKS, GKE, AKS, on-prem).

What does Karpenter add to CI runner pools?

Karpenter is a Kubernetes node autoscaler that provisions EC2 instances on demand based on pod requirements. For CI runner pools using ARC, Karpenter adds the EC2 layer: when a runner pod is queued and no node has capacity, Karpenter spins up an EC2 instance. When pods finish and capacity is no longer needed, nodes are scaled down. The combination ARC plus Karpenter gives true ephemeral runner economics on AWS.

What is the cold-start cost of ephemeral runners?

Two parts. EC2 instance spin-up: 60-120 seconds depending on instance type and AMI. Pod startup on existing nodes: 5-30 seconds depending on container image size. Total cold-start latency for the slow path (new node): 90-180 seconds added to job queue time. For the warm path (pod on an existing node): 10-30 seconds. In the worst case where every build hits the slow path, 90 seconds of extra wait across 125 builds/day (25 devs at 5 pushes each) is roughly 62 hours/month, or about $2,500 in payroll value at $100/hour and a 40% productivity factor. In practice only a fraction of builds (the first of each burst, around 10%) hit the slow path, bringing the figure down to roughly $250/month, which can still offset the compute savings on smaller deployments.

When are ephemeral runners cheaper than always-on?

When utilisation of always-on capacity is below 40-50%. Always-on runners pay for 720 hours/month regardless of how busy they are. Ephemeral runners pay only for active build time. A team running 80 hours/month of builds (11% utilisation of one runner) wastes 89% of always-on capacity. The same workload on ephemeral pays for 80 hours of EC2: roughly $7.68 vs $69 for an always-on m5.large, before cluster overhead. The crossover is at ~40-50% utilisation; below that, ephemeral wins.

Updated 2026-05-11