GCP Cloud Run Pricing Guide (2026): Requests, CPU, Memory, Egress
Cloud Run-style platforms are predictable when you model the measurable drivers: requests, CPU/memory time (duration), and outbound transfer. The most common mistakes are ignoring response size and logging, and failing to model peak periods where retries multiply costs.
0) Pick the right unit of analysis
- Requests/month: the driver for HTTP services.
- Duration: the multiplier that turns requests into compute.
- Concurrency: affects instance count and can change per-request CPU time under contention.
- Egress: response bytes and external calls create transfer costs.
- Logs: per-request bytes logged x request volume (often overlooked).
1) Requests (monthly volume)
Convert request rate to monthly requests and keep baseline + peak separate. Peaks include bot spikes, marketing events, and incident retry storms.
Tool: RPS to monthly requests.
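A minimal sketch of that conversion, assuming a 365/12-day average month; the `peak_fraction` and `retry_rate` parameters are illustrative knobs, not measured values:

```python
SECONDS_PER_MONTH = 365 * 86_400 / 12  # ~2.628M seconds in an average month

def monthly_requests(baseline_rps: float, peak_rps: float = 0.0,
                     peak_fraction: float = 0.0, retry_rate: float = 0.0) -> float:
    """Monthly request count with a separate peak window and a retry multiplier.

    peak_fraction: share of the month spent at peak_rps (e.g. 0.05 = ~1.5 days).
    retry_rate: extra requests from retries, as a fraction of original traffic.
    """
    base = baseline_rps * SECONDS_PER_MONTH * (1 - peak_fraction)
    peak = peak_rps * SECONDS_PER_MONTH * peak_fraction
    return (base + peak) * (1 + retry_rate)
```

Keeping baseline and peak as separate arguments makes it easy to show how a single incident window changes the monthly total.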
2) Duration and concurrency (compute time)
Use percentiles (p50 and p95) instead of one average so you can model normal vs slow-path behavior. Slow paths often come from upstream latency (DB/APIs), cold starts, and throttling.
- If you increase concurrency, validate latency and CPU contention; concurrency is not free for CPU-bound handlers.
- Track retries/timeouts: they multiply both request count and total compute time.
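The p50/p95 split above can be sketched as a two-path compute model; `slow_share` (the fraction of traffic on the p95 path) is an assumption you should replace with a measured value:

```python
def compute_seconds(requests: float, p50_s: float, p95_s: float,
                    slow_share: float = 0.05, retry_rate: float = 0.0) -> float:
    """Total compute seconds, splitting traffic into a fast path (p50)
    and a slow path (p95). Retries add full extra executions, so they
    multiply both request count and compute time."""
    effective = requests * (1 + retry_rate)
    fast = effective * (1 - slow_share) * p50_s
    slow = effective * slow_share * p95_s
    return fast + slow
```

Even a small `slow_share` can dominate when p95 is an order of magnitude above p50, which is why a single average hides the cost.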
3) Response transfer and egress
If each request returns significant data, egress becomes a major driver. Estimate average response size and multiply by monthly requests, then split transfer by destination if needed.
Tools: Response transfer, Egress cost.
- Model heavy-tail endpoints separately (downloads, exports) so they do not disappear into a blended average.
- Include external dependency calls that return large payloads (they create egress too).
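A sketch of the egress estimate, with heavy-tail endpoints modeled separately as the bullets above suggest; the `heavy` pairs are hypothetical inputs you would fill from your own traffic data:

```python
def egress_gb(requests: float, avg_bytes: float, heavy: tuple = ()) -> float:
    """Monthly egress in GB.

    heavy: (requests, avg_bytes) pairs for heavy-tail endpoints
    (downloads, exports) kept out of the blended average.
    Uses decimal GB (1e9 bytes), as cloud billing typically does.
    """
    total_bytes = requests * avg_bytes + sum(r * b for r, b in heavy)
    return total_bytes / 1e9
```

For example, 1M requests at 50 KB average plus a 10K-request export endpoint at 5 MB each doubles the egress that the blended average alone would predict.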
4) Logs (often the second spike)
Logging can outweigh compute if you log too much per request. Estimate log bytes per request and multiply by request volume, then model retention and scan/search if you query heavily.
Tools: Log ingestion, Tiered log storage, Log scan.
Cloud Run quick cost model
- Compute driver: requests x average compute time per request.
- Transfer driver: requests x average response bytes.
- Observability driver: requests x log bytes per request x retention policy.
- Peak factor: retries and timeouts can increase all three drivers at the same time.
Worked estimate template (copy/paste)
- Requests/month = baseline + peak (include retries)
- Duration = p50 scenario + p95 scenario, in seconds (model both paths)
- Egress GB/month = requests/month x avg response size (GB) (split heavy endpoints)
- Log GB/month = requests/month x avg log bytes/request (baseline + peak)
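The template above can be filled in as a worked instance. All unit prices below are placeholders, not real Cloud Run rates; substitute your provider's current pricing, and treat the traffic numbers as hypothetical inputs:

```python
# Assumed unit prices -- placeholders, NOT real Cloud Run rates.
PRICE_PER_MILLION_REQ = 0.40       # $/1M requests
PRICE_PER_VCPU_SECOND = 0.000024   # $/vCPU-second
PRICE_PER_EGRESS_GB = 0.12         # $/GB transferred out
PRICE_PER_LOG_GB = 0.50            # $/GB ingested

# Hypothetical traffic profile: 30M requests/month plus 10% retries.
requests = 30_000_000 * 1.10

# Duration: 95% of traffic at p50 = 80 ms, 5% at p95 = 600 ms.
compute_s = requests * (0.95 * 0.08 + 0.05 * 0.60)

egress = requests * 20_000 / 1e9   # 20 KB average response, in GB
logs = requests * 1_500 / 1e9      # 1.5 KB logged per request, in GB

cost = (requests / 1e6 * PRICE_PER_MILLION_REQ
        + compute_s * PRICE_PER_VCPU_SECOND
        + egress * PRICE_PER_EGRESS_GB
        + logs * PRICE_PER_LOG_GB)
```

Running the numbers line by line makes the driver balance visible: compute and egress each land near $80, while requests and logs are smaller but not negligible.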
Common pitfalls
- Using only average duration and ignoring p95 (slow path drives cost and capacity).
- Ignoring retries/timeouts, which multiply requests, duration, and downstream calls.
- Not modeling response size (egress dominates for large payloads).
- Verbose logs per request (ingestion dominates at scale).
- Not separating baseline vs peak months (incidents change the cost shape).
How to validate
- Validate p50/p95 latency and cold start behavior.
- Validate retries/timeouts and incident traffic windows.
- Validate top endpoints by bytes (not just by request count).
- Validate log bytes per request and retention settings.
Concurrency tuning decision matrix
- CPU-bound handlers: keep lower concurrency and scale out earlier.
- I/O-bound handlers: higher concurrency can improve cost efficiency.
- Latency-sensitive APIs: protect p95 first, then optimize concurrency.
- Batch-like endpoints: isolate them from interactive traffic in separate services.
Failure patterns
- One average duration used for all endpoints and all traffic windows.
- Retry traffic counted in infra dashboards but excluded from cost models.
- Large response endpoints hidden inside blended average response size.
- Verbose request logging left unchanged after traffic growth.