GCP Cloud Run Pricing: Request-Based vs Instance-Based Billing, vCPU, Memory, and Egress

Reviewed by CloudCostKit Editorial Team. Last updated: 2026-06-19. Editorial policy and methodology.

Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.


This page covers Cloud Run service behavior and pricing decisions. Use it when you are budgeting Cloud Run from the way the service behaves in production: request volume, execution time, concurrency, outbound transfer, and logging.

Use this page when you need to decide whether Cloud Run spend is mostly request scale, slow handlers, low-concurrency compute time, large responses, or logging that rises with every retry and timeout.

Go back to the serverless parent guide if the broader architecture model is still unclear and you still need to map retries, downstream amplification, and observability before narrowing the estimate to Cloud Run.

Quick pricing read: what most teams need first

Cloud Run pricing starts with the billing setting, not with a single "price per request." For services, the first fork is request-based billing versus instance-based billing. After that, the bill is shaped by vCPU, memory, request charges, region, billed instance time, concurrency, and whether the workload is an HTTP service, a job, or a GPU-backed deployment.

  • Request-based billing: the default service model. You pay for request charges plus vCPU and memory during billable request processing, startup, and shutdown windows.
  • Instance-based billing: you pay for the full lifecycle of a running instance, with no separate per-request fee. It can fit steadier services, background work, or cases where always-allocated CPU is intentional.
  • vCPU, memory, and request charges: these are the core Cloud Run-native service surfaces to model before you add network, logs, or downstream dependencies.
  • Cloud Run jobs: jobs are billed around CPU, memory, active runtime, region, and any parallel execution shape rather than HTTP request volume.
  • Adjacent costs: Artifact Registry, Cloud Build, outbound data transfer, Cloud Logging, Cloud Monitoring, and downstream databases should sit beside the Cloud Run bill unless you are explicitly building a full application budget.
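
The fork between the two billing settings can be sketched as a small comparison. This is a minimal sketch under stated assumptions: the `Rates` class, its field names, and the function names are hypothetical, and no real Google Cloud prices are included. The two settings take separate rate objects because their unit prices may differ; fill them in from the current pricing page for your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; replace with real values for your region."""
    vcpu_second: float
    gib_second: float
    per_million_requests: float = 0.0  # leave at 0 for instance-based billing


def request_based_cost(rates, requests, billable_seconds, vcpu, memory_gib):
    """Request fee plus vCPU and memory for billable request time."""
    per_second = vcpu * rates.vcpu_second + memory_gib * rates.gib_second
    request_fee = requests / 1_000_000 * rates.per_million_requests
    return billable_seconds * per_second + request_fee


def instance_based_cost(rates, instance_seconds, vcpu, memory_gib):
    """vCPU and memory for the full instance lifetime, with no request fee."""
    per_second = vcpu * rates.vcpu_second + memory_gib * rates.gib_second
    return instance_seconds * per_second
```

Running both functions against the same traffic shape, with `billable_seconds` and `instance_seconds` taken from measurements, shows which setting the workload favors before any adjacent costs are added.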

This page was updated on 2026-06-19 against the current Google Cloud Run pricing page, Cloud Run billing settings documentation, and Cloud Run cost-optimization guidance.

What Cloud Run teams usually pay for in the real world

Most teams start with request count because it is easy to understand, but request count alone is not the bill. Cloud Run cost is usually shaped by a stack of related drivers that need to be read together. The important move is to separate Cloud Run-native charges from the surrounding delivery workflow.

  • Billing setting decides whether the service is modeled as request-based billing or instance-based billing before any traffic math begins.
  • Requests tell you how often the service executes, but they only become meaningful after you split baseline traffic from peak and retry traffic.
  • vCPU and memory time decide how expensive each request or running instance becomes, which is why a slow path or poor concurrency setting can change the monthly total quickly.
  • Minimum instances can create idle billable time that looks small per minute but becomes a real baseline if it is enabled across many services or regions.
  • Egress matters when responses are large, clients are global, or the service is exporting files rather than returning small JSON payloads.
  • Logs and metrics matter because request-by-request observability scales linearly with traffic and can become a second bill long before the application feels large.

The practical takeaway is simple: if you only model requests, you will usually undercount the expensive month. If you model billing setting, requests, billed instance time, transfer, and logs together, Cloud Run becomes much easier to budget.
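
The minimum-instance point can be made concrete with a rough calculation of idle instance time. A minimal sketch, assuming a 30-day month and that a warm instance counts as idle for the share of the month it is not serving traffic; the function name and parameters are illustrative, and the idle rate itself has to come from the pricing page.

```python
SECONDS_PER_MONTH = 30 * 86_400  # assumes a 30-day month


def idle_instance_seconds(min_instances, services, regions, active_fraction=0.0):
    """Idle seconds per month across every warm instance.

    active_fraction is the assumed share of time a warm instance spends
    serving requests, which is modeled as active time rather than idle time.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    warm_instances = min_instances * services * regions
    return warm_instances * SECONDS_PER_MONTH * (1.0 - active_fraction)


# One warm instance looks small; the same setting on 5 services in 3 regions does not.
single = idle_instance_seconds(1, 1, 1)
fleet = idle_instance_seconds(1, 5, 3)
print(f"single: {single:,.0f} s, fleet: {fleet:,.0f} s")
```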

Inside the Cloud Run bill vs beside the Cloud Run bill

  • Inside the Cloud Run bill: request charges when applicable, vCPU-seconds, memory GiB-seconds, GPU seconds when used, jobs runtime, worker-pool runtime, minimum-instance exposure, and region-specific pricing.
  • Beside the Cloud Run bill: Artifact Registry image storage, Cloud Build build minutes, Cloud Logging ingestion and retention, Cloud Monitoring, outbound data transfer, load balancing, Cloud SQL, storage, Pub/Sub, and any downstream API charges.
  • Why this matters: Artifact Registry and Cloud Build belong beside the Cloud Run bill, not hidden inside it, because image storage or build activity can grow even when the Cloud Run service itself is efficient.

Build the first estimate from service behavior, not from one average

A useful Cloud Run estimate starts with the inputs that map cleanly to how the service behaves under normal and stressed conditions. The goal is not to create a perfect finance model on day one. The goal is to capture the drivers that explain why this service is cheap in one month and surprisingly expensive in another.

  • Billing mode: decide whether the service is using request-based billing or instance-based billing, and keep that assumption visible in the estimate.
  • Requests per month: convert traffic into a monthly number, but keep baseline and peak separated instead of blending them.
  • Duration by percentile: use p50 and p95 so you can see what normal execution looks like and what the slow path costs.
  • CPU and memory shape: capture what each request actually consumes while it is running, especially for handlers that are not purely I/O-bound.
  • Concurrency behavior: note whether higher concurrency improves efficiency or causes latency and contention that erase the savings.
  • Minimum instances: model warm capacity separately from live request processing so idle exposure is not mistaken for traffic-driven spend.
  • Response size and egress: isolate heavy endpoints, downloads, exports, or media responses so they do not disappear inside one blended average.
  • Log bytes per request: estimate logging separately from application payload size because verbose logs often scale differently from response bodies.

Tools that help with these inputs: RPS to monthly requests, response transfer, egress cost, log cost.

The strongest habit on this page is to separate baseline service behavior from peak behavior. A launch week, retry storm, or upstream slowdown does not just increase one line item. It changes requests, duration, transfer, and log volume together.
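
Keeping baseline and peak apart is easy to do in code. The sketch below is one possible shape, not a prescribed method: it assumes a 30-day month, a fixed number of peak days, and binary GiB, and the function names and sample numbers are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GIB = 1024 ** 3


def monthly_requests(baseline_rps, peak_rps, peak_days, days_in_month=30):
    """Convert RPS to monthly requests without blending baseline and peak."""
    baseline_days = days_in_month - peak_days
    return {
        "baseline": baseline_rps * SECONDS_PER_DAY * baseline_days,
        "peak": peak_rps * SECONDS_PER_DAY * peak_days,
    }


def monthly_gib(requests, bytes_per_request):
    """Monthly volume in GiB; works for response bytes or log bytes."""
    return requests * bytes_per_request / BYTES_PER_GIB


traffic = monthly_requests(baseline_rps=10, peak_rps=50, peak_days=2)
for period, requests in traffic.items():
    egress = monthly_gib(requests, bytes_per_request=20_000)  # hypothetical size
    logs = monthly_gib(requests, bytes_per_request=800)       # hypothetical size
    print(f"{period}: {requests:,} requests, {egress:.1f} GiB out, {logs:.1f} GiB logs")
```

Response bytes and log bytes go through the same helper but stay as separate inputs, so a verbose logging change does not get hidden inside the transfer estimate.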

How billing mode, concurrency, transfer, and logs interact

This is the part many quick guides skip. Cloud Run does not become expensive because one input goes up in isolation. It gets expensive when several cost drivers reinforce each other.

  • Request-based billing with bursty traffic can be efficient when scale-to-zero and short request windows dominate the pattern.
  • Instance-based billing with steady traffic can be more predictable when instances stay useful between requests and the absence of a per-request fee matters.
  • High latency with low effective concurrency means compute time dominates faster because every request holds resources longer.
  • Large responses can push network transfer ahead of compute, especially for export, download, or media endpoints.
  • Retry storms and timeouts multiply requests, compute time, downstream calls, and log volume together.
  • Verbose request logging creates a second scale curve that keeps rising even if the application code itself is lightweight.

Concurrency deserves special attention because it changes the economics of the service. Higher concurrency can make an I/O-bound handler much more efficient, but the same setting can harm a CPU-bound endpoint, increase tail latency, and create a misleadingly optimistic estimate. When you model Cloud Run, treat concurrency as an operating choice that needs evidence, not as a constant you can pick once and forget.

  • CPU-bound handlers: lower concurrency is often safer because it protects p95 latency and keeps contention visible.
  • I/O-bound handlers: higher concurrency can improve cost efficiency, but only if latency stays controlled under load.
  • Mixed workloads: split batch-like or heavy endpoints into separate services so one concurrency decision does not distort the entire estimate.
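
The effect of concurrency and slow paths on compute time can be approximated with one line of arithmetic. This is a simplified sketch: it assumes instances stay evenly packed at the stated effective concurrency, which real autoscaling will not guarantee, and the durations and request count are hypothetical.

```python
def billable_instance_seconds(requests, duration_s, effective_concurrency):
    """Approximate instance time, assuming instances stay evenly packed."""
    if effective_concurrency < 1:
        raise ValueError("effective_concurrency must be at least 1")
    return requests * duration_s / effective_concurrency


requests = 1_000_000
for label, duration_s in (("p50", 0.25), ("p95", 1.5)):
    for concurrency in (1, 10):
        seconds = billable_instance_seconds(requests, duration_s, concurrency)
        print(f"{label} at concurrency {concurrency}: {seconds:,.0f} instance-seconds")
```

Use the concurrency you observe under load as `effective_concurrency`, not the configured maximum, because a CPU-bound handler may never reach the configured value without hurting latency.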

Services, jobs, and GPU workloads should not share one shortcut

A service that responds to HTTP requests, a job that runs to completion, and a GPU-backed service can all be Cloud Run workloads, but they do not deserve the same budget shortcut. Keeping them separate makes the estimate easier to explain and more useful in real architecture reviews.

  • Services: model billing mode, requests, duration, vCPU, memory, concurrency, min instances, transfer, and logs.
  • Jobs: model task count, parallelism, retry behavior, active runtime, vCPU, memory, region, and whether failed attempts create repeated runtime.
  • GPU workloads: model them as instance-based commitments with GPU seconds and idle exposure visible, not as ordinary request-priced services.
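
For jobs, a rough model separates billed task time from wall-clock time. The sketch below assumes that every failed attempt re-runs the full task duration and that billed time follows per-task runtime; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def job_task_seconds(tasks, seconds_per_task, retry_rate=0.0):
    """Total task runtime, counting repeated runtime from failed attempts.

    retry_rate is the assumed average number of extra attempts per task.
    """
    return tasks * seconds_per_task * (1 + retry_rate)


def job_wall_clock_seconds(tasks, seconds_per_task, parallelism):
    """Elapsed time when tasks run in waves of `parallelism` at once."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return math.ceil(tasks / parallelism) * seconds_per_task


print(job_task_seconds(100, 60, retry_rate=0.25))       # runtime to price
print(job_wall_clock_seconds(100, 60, parallelism=10))  # how long the run takes
```

In this model, parallelism changes how long the run takes but not how much runtime is priced, while retries increase the priced runtime directly.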

Scenario planning: the fastest way to avoid a weak budget

A Cloud Run estimate becomes useful when it explains more than one month shape. Instead of asking "what is my Cloud Run cost," ask what a normal month, a peak month, and a bad month look like. That framing usually catches the same risks that later show up in billing surprises.

| Scenario | What changes | Why it matters |
|---|---|---|
| Baseline month | Expected traffic, chosen billing mode, normal latency, standard log volume | Gives you the operating floor for a stable period |
| Peak month | Higher request volume, larger response mix, busier endpoints, more billed instance time | Shows whether network and logs scale faster than compute |
| Failure month | Retries, timeouts, slow upstreams, cold starts, noisy logging, failed job reruns | Reveals how incidents turn one service into several simultaneous cost spikes |

If your estimate only covers a blended average month, it will look cleaner than the real system. That is exactly why first-pass Cloud Run budgets tend to fail during launches or incidents.
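
One way to keep the three month shapes side by side is to derive each from the baseline with explicit multipliers. A minimal sketch, assuming made-up multipliers and input values; the `Inputs` class and helper names are hypothetical, and concurrency is left out to keep the example short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inputs:
    requests: float
    duration_s: float
    response_bytes: float
    log_bytes: float


def apply_scenario(base, *, requests_x=1.0, duration_x=1.0, response_x=1.0, log_x=1.0):
    """Scale baseline inputs by scenario multipliers."""
    return replace(
        base,
        requests=base.requests * requests_x,
        duration_s=base.duration_s * duration_x,
        response_bytes=base.response_bytes * response_x,
        log_bytes=base.log_bytes * log_x,
    )


def totals(inputs):
    """Monthly totals that the cost drivers are priced from."""
    return {
        "compute_seconds": inputs.requests * inputs.duration_s,
        "egress_bytes": inputs.requests * inputs.response_bytes,
        "log_bytes": inputs.requests * inputs.log_bytes,
    }


base = Inputs(requests=1_000_000, duration_s=0.5, response_bytes=2048, log_bytes=512)
failure = apply_scenario(base, requests_x=1.5, duration_x=2.0, log_x=4.0)
print(totals(base))
print(totals(failure))
```

Because the totals are products, a failure month with 1.5 times the requests and twice the duration needs three times the compute seconds, which is the compounding effect a blended average hides.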

What usually goes wrong, and how to validate before you trust the estimate

Most weak Cloud Run estimates fail for operational reasons, not because the spreadsheet math is hard. Teams often have the right variables but the wrong shape of data behind them.

  • The billing mode is not recorded, so request-based and instance-based assumptions get mixed in one spreadsheet.
  • One average duration is used for every endpoint, which hides the slow path that drives cost and tail latency.
  • Minimum instances are treated as a reliability setting but never modeled as idle billable time.
  • Retry traffic appears in dashboards but never makes it into the budget model.
  • Large-response endpoints are blended into a harmless-looking average response size.
  • Logging assumptions stay frozen even after traffic growth or more verbose instrumentation.
  • Baseline and peak periods are merged, which makes the model look stable when the system is not.

Before you sign off on the estimate, validate the service against real operating signals rather than intuition.

  • Check whether the service uses request-based billing or instance-based billing before comparing months.
  • Check p50 and p95 latency for the endpoints that dominate traffic or spend.
  • Check concurrency behavior under load so you know whether your efficiency assumptions survive real traffic.
  • Check minimum instances and idle exposure separately from active request processing.
  • Check the top endpoints by response bytes, not just by request count.
  • Check retries, timeout windows, and incident periods separately from normal traffic.
  • Check log bytes per request and retention settings so observability is not treated as a rounding error.

A practical sign-off rule works well here: every major number in the model should map back to something measurable in production or a billing export. If you cannot explain where a number comes from, the estimate is not ready for budget decisions yet.
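
The sign-off rule can be enforced mechanically if every input in the model records where it came from. The sketch below is illustrative only: the dictionary layout, the source labels, and the sample values are assumptions, not a required format.

```python
def untraced_inputs(model):
    """Return the names of inputs that have no recorded source."""
    return sorted(
        name
        for name, entry in model.items()
        if not (entry.get("source") or "").strip()
    )


# Hypothetical model: each input carries a value and the place it was measured.
model = {
    "requests_per_month": {"value": 120_000_000, "source": "request count metrics"},
    "p95_duration_s": {"value": 1.4, "source": "latency dashboard"},
    "log_bytes_per_request": {"value": 900, "source": ""},
}

missing = untraced_inputs(model)
print("ready for sign-off" if not missing else f"missing sources: {missing}")
```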

Next actions if you are budgeting Cloud Run now

If you are building a review packet for finance or for an internal architecture discussion, pair the calculators above with the cloud cost estimation checklist so that your estimate is tied to measurable inputs and a validation step.

Related GCP service guides

  • GCP Cloud SQL pricing: useful when Cloud Run traffic, connection patterns, or background jobs move database capacity and backup costs at the same time.
  • GCP Cloud Storage pricing: useful when response transfer, stored assets, or archive flows shape the broader service bill.


Related guides

Cloud Functions pricing (GCP): invocations, duration, egress, and log volume
A practical Cloud Functions cost model: invocations, execution time, outbound transfer, and logs. Includes a workflow to estimate baseline + peak and validate retries, cold starts, and log bytes per invocation.
Google Kubernetes Engine (GKE) pricing: nodes, networking, storage, and observability
GKE cost is not just nodes: include node pools, autoscaling, requests/limits (bin packing), load balancing/egress, storage, and logs/metrics. Includes a worked estimate template, pitfalls, and validation steps to keep clusters right-sized.
Cloud cost estimation checklist: build a model Google (and finance) will trust
A practical checklist to estimate cloud cost without missing major line items: requests, compute, storage, logs/metrics, and network transfer. Includes a worksheet template, validation steps, and the most common double-counting traps.
GCP Cloud Storage Pricing & Cost Guide
Understand GCP Cloud Storage cost drivers across storage class, operations, retrieval, replication, and egress, with estimation steps for request-heavy and archive-heavy workloads.
GCP Cloud CDN Pricing: Cache Egress, Requests, Cache Fill, and Origin Boundaries
Understand GCP Cloud CDN pricing through cache egress bandwidth, cache lookup requests, cache fill, hit-rate pressure, and the origin or transfer costs that belong beside the Cloud CDN bill.
GCP Cloud SQL Pricing: Instance Hours, HA, Storage, Backups, and Network
Understand GCP Cloud SQL pricing through edition choice, instance hours, HA and replicas, storage, backups, and network-sensitive access patterns, with adjacent application and analytics costs kept separate.


FAQ

What usually drives Cloud Run cost?
For services, the main Cloud Run-native drivers are billing setting, vCPU time, memory time, request charges when request-based billing is used, and the amount of billable instance time created by latency, concurrency, startup, and shutdown behavior.
Is Cloud Run request-based or instance-based?
Cloud Run services can use request-based billing or instance-based billing. Request-based billing adds request charges and bills CPU and memory around request processing. Instance-based billing bills the full instance lifecycle and removes the per-request fee.
How do I estimate quickly?
Start with the billing setting, monthly requests, p50/p95 duration, vCPU, memory, concurrency, min instances, response transfer, and log volume. Keep Artifact Registry, Cloud Build, and downstream services beside the Cloud Run line.
What is the most common mistake?
The most common mistake is blending services, jobs, images, build minutes, network transfer, and logs into one Cloud Run number. That hides whether the service itself is expensive or the surrounding workflow is expensive.
How do I validate?
Validate the billing setting, billed instance time, p50/p95 latency, concurrency behavior, min-instance idle time, retries/timeouts, top endpoints by bytes, and log bytes per request.
