Google ADK: 7 A2A agents deployed on Cloud Run
VIDEO · ~43m

ADK · A2A · Cloud Run · GCP

7 ADK agents in production with Cloud Run

Each agent is an independent service with its own URL, its own container, and scales to zero. They communicate over HTTP using the A2A protocol.

The core problem

Shared filesystem doesn't scale

Docker local

Agent 1 → writes /app/outputs/config.yaml

Agent 2 → reads /app/outputs/config.yaml

Agent 3 → reads /app/outputs/*.yaml

All share the same disk

Cloud Run

Container 1 → writes /app/outputs/config.yaml

Container 2 → FILE NOT FOUND

Container 3 → FILE NOT FOUND

Each container is isolated

On Cloud Run each container is isolated — no shared disk, no shared memory, no shared process. If agent 3 writes a file, agent 4 cannot see it. The solution: move the context from the filesystem into the HTTP messages.

The solution: context in messages

Instead of writing to disk, the orchestrator accumulates the output of each agent and sends it in the next request. Every agent receives the full history of what the previous ones decided.

HTTP orchestrator flow

# 1. Orchestrator calls Agent 1
POST https://architect-xxx.run.app/run
body: { "task": "Build IDP for Python" }
→ returns: platform-config decisions

# 2. Orchestrator calls Agent 2 with accumulated context
POST https://infra-xxx.run.app/run
body: { "task": "...", "context": "architect output" }
→ returns: Dockerfile + docker-compose

# 3. Orchestrator calls Agent 3 with ALL previous context
POST https://security-xxx.run.app/run
body: { "task": "...", "context": "architect + infra output" }
→ returns: security report

# ... same pattern for agents 4-7
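The flow above can be sketched as a plain Python loop. This is a sketch, not the repo's orchestrator: the agent names and the call_agent signature are placeholders.

```python
import json
from typing import Callable

# Hypothetical pipeline order; the real service names and URLs differ.
AGENTS = ["architect", "infra", "security", "cicd",
          "observability", "devex", "portal"]

def run_pipeline(task: str, call_agent: Callable[[str, dict], str]) -> dict:
    """POST to each agent in order, accumulating every output into the
    context that the next agent receives in its request body."""
    context: dict = {}
    for name in AGENTS:
        body = {"task": task, "context": json.dumps(context)}
        # call_agent would POST body to https://<name>-xxx.run.app/run
        context[name] = call_agent(name, body)
    return context
```

Because all state travels in the request body, any single agent can be re-run in isolation by replaying the same request.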

Key insight: context_json parameter

Each agent's tools accept a context_json parameter with the accumulated output from all previous agents. The agent doesn't need to read files from disk — it receives everything it needs in the HTTP request body. This is what makes A2A work over the network: the protocol carries the state.

The 7 specialized agents

Each one runs as an independent Cloud Run service with its own URL, its own container, and scales to zero.

1
🏗️

Platform Architect

Analyzes the task and decides the tech stack: runtime, framework, database, CI/CD, monitoring. Justifies each decision.

planning · stack-decision
2
🐳

Infrastructure

Generates Dockerfile and docker-compose with all services, healthchecks, networks and volumes.

docker · compose
3
🔐

Security

Policies, RBAC and TLS. Scans for exposed secrets, open ports, images with CVEs. Generates a structured report.

rbac · tls · policies
4
🔄

CI/CD

Build, test and deploy pipelines. Generates automation scripts adapted to the decided stack.

build · test · deploy
5
📊

Observability

Monitoring with Prometheus and Grafana dashboards: application metrics and system metrics.

prometheus · grafana
6
💻

DevEx

CLI tool with project commands: status, logs, deploy, rollback, scale and more.

cli · developer-tools
7
🌐

Web Portal

Complete web dashboard with FastAPI backend, HTML templates and real-time visualization of services.

fastapi · dashboard

Each agent needs 3 things

To deploy any ADK agent as an independent A2A server on Cloud Run, you need exactly these three files.

1

Container

Dockerfile

Each agent has its own Dockerfile. Install dependencies, copy the agent code, expose port 8080. Cloud Run sends requests to port 8080 by default (the listening port is configurable).

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "server.py"]
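The CMD above expects a server.py. The real repo presumably uses ADK's A2A server, but a minimal stdlib-only sketch of the /run contract looks like this (endpoint path and field names are assumptions from this article, not the actual code):

```python
# Minimal sketch of an agent's HTTP entrypoint, stdlib only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class RunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        context = json.loads(body.get("context", "{}"))  # accumulated state
        result = {"status": "success",
                  "task": body.get("task"),
                  "previous_agents": sorted(context)}
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the sketch's logs quiet
        pass

def serve(port: int = 8080) -> HTTPServer:
    """Bind the server; server.py would call serve().serve_forever()."""
    return HTTPServer(("0.0.0.0", port), RunHandler)
```

Note that all state arrives in the POST body: the handler never touches the filesystem, which is exactly what makes it safe on isolated Cloud Run containers.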
2

Discovery

agent.json

The A2A agent card. Declares the agent's name, description, capabilities and endpoint URL. The orchestrator reads this to know where to send requests.

{
  "name": "platform-architect",
  "description": "Analyzes task and decides tech stack",
  "url": "https://architect-xxx.run.app",
  "capabilities": ["planning", "stack-decision"]
}
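As a sketch of how an orchestrator could use these cards for discovery, here is a tiny capability-based router. The helper name is hypothetical; only the card fields come from the example above.

```python
def route(cards: list[dict], capability: str) -> str:
    """Return the endpoint of the first agent card that declares
    the requested capability."""
    for card in cards:
        if capability in card.get("capabilities", []):
            return card["url"]
    raise LookupError(f"no agent offers {capability!r}")
```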
3

Context

Tools accepting context_json

Every tool function accepts a context_json parameter with the accumulated output from previous agents. No filesystem reads — everything arrives in the HTTP payload.

import json

def generate_dockerfile(context_json: str) -> dict:
    """Reads architect decisions from context
    and generates the Dockerfile."""
    context = json.loads(context_json)
    # ... generate based on context
    return {"status": "success"}

Deploy to Cloud Run

One command per agent. Each one becomes an independent service with its own URL.

gcloud run deploy

gcloud run deploy architect-agent \
  --source . \
  --region us-central1 \
  --service-account agent-sa@PROJECT.iam.gserviceaccount.com \
  --set-secrets GOOGLE_API_KEY=gemini-key:latest \
  --allow-unauthenticated=false \
  --port 8080 \
  --memory 512Mi \
  --timeout 300
--source .

Cloud Build builds the container from the Dockerfile in the current directory

--service-account

GCP service account with Vertex AI and Secret Manager permissions

--set-secrets

Injects secrets from Secret Manager as environment variables

--allow-unauthenticated=false

Requires IAM authentication to invoke the service

--port 8080

Cloud Run default port

--timeout 300

Agents can take up to 5 minutes to respond

GCP setup

Four steps to prepare your Google Cloud project before deploying the agents.

1

Authentication

gcloud auth login and gcloud config set project PROJECT_ID. Everything else uses this authentication context.

2

Enable APIs

Cloud Run, Cloud Build, Artifact Registry, Secret Manager, and Vertex AI. All required for the deploy and runtime.

3

Service account permissions

Create a service account with roles: Cloud Run Invoker, Vertex AI User, Secret Manager Secret Accessor. Each agent runs under this identity.

4

Secret Manager

Store API keys and credentials in Secret Manager. Reference them in the deploy command with --set-secrets — they are injected as environment variables at runtime, never baked into the container image.
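On the agent side, a secret injected with --set-secrets is just an environment variable. A small fail-fast helper (hypothetical, not from the repo) can read it at startup:

```python
import os

def load_secret(name: str) -> str:
    """Read a secret that Cloud Run injected as an environment variable
    via --set-secrets; fail fast at startup if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing secret env var: {name}")
    return value
```

Failing at startup (rather than on first use) means a misconfigured deploy is caught by the Cloud Run health check instead of by a user request.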

Discovery during development

Gemini API key → 503 errors on Cloud Run

The problem

503 Service Unavailable

The Gemini API with an API key worked locally but returned 503 on Cloud Run. Rate limits + cold starts + multiple agents calling simultaneously = failure.

The fix

Migrate to Vertex AI

No API key needed. Authentication flows through the service account. Higher rate limits, better reliability, native GCP integration.

The migration was minimal: change the model provider configuration and remove the API key. Vertex AI uses IAM natively — the same service account that runs the container authenticates with the model.
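With the Google Gen AI SDK that ADK builds on, the switch is typically just environment configuration. Variable names below are from that SDK's conventions; the project and region values are placeholders:

```shell
# Route model calls through Vertex AI instead of the Gemini API key
GOOGLE_GENAI_USE_VERTEXAI=TRUE
GOOGLE_CLOUD_PROJECT=PROJECT_ID
GOOGLE_CLOUD_LOCATION=us-central1
# GOOGLE_API_KEY is no longer needed; auth flows through the service account
```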

Frequently asked questions

Common questions about deploying ADK agents on Cloud Run with A2A protocol.

What is the A2A protocol?


A2A (Agent-to-Agent) is an open communication protocol, originally introduced by Google and now hosted by the Linux Foundation, that standardizes how AI agents discover each other and exchange messages over HTTP. In the local version, agents share a filesystem. In the Cloud Run version, agents communicate over HTTP: the orchestrator sends accumulated context in each request, so every agent has the full history of what the previous ones decided.

Why not use a shared filesystem on Cloud Run?


In Docker locally, all 7 agents share /app/outputs/ — each one writes and reads files from the same directory. On Cloud Run each container is isolated: no shared disk, no shared memory, no shared process. If agent 3 writes a file, agent 4 cannot see it. The solution is to move the context into the HTTP messages themselves using the A2A protocol.

What's the difference between adk deploy cloud_run and a custom Dockerfile?


adk deploy cloud_run is the official ADK command that packages your agent into a container and deploys it to Cloud Run automatically. It works for simple agents. But when you need custom dependencies, multi-stage builds, or specific configurations per agent, a custom Dockerfile gives you full control. In this project each agent has its own Dockerfile because each one has different requirements.

Do services scale to zero?


Yes. Cloud Run scales to zero by default — if no one calls an agent, it stops consuming resources and you pay nothing. When a request arrives, Cloud Run cold-starts the container. For this project with 7 agents, that means you only pay for actual invocations. The tradeoff is cold start latency on the first request.

How does authentication between agents work?


Each agent runs as an independent Cloud Run service with its own URL. The orchestrator authenticates using a GCP service account with the Cloud Run Invoker role. When calling each agent, it sends an identity token in the Authorization header. No API keys, no shared secrets — it is GCP IAM native.
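As a stdlib-only sketch of building one authenticated orchestrator-to-agent request: in practice the identity token would be minted with google.oauth2.id_token.fetch_id_token, using the target service URL as the audience. The helper name and field names are assumptions.

```python
import json
import urllib.request

def a2a_request(url: str, task: str,
                context: dict, token: str) -> urllib.request.Request:
    """Build the authenticated POST for one orchestrator -> agent hop.
    `token` must be a Google-signed identity token whose audience is `url`."""
    payload = {"task": task, "context": json.dumps(context)}
    return urllib.request.Request(
        f"{url}/run",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```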

Can I use Vertex AI instead of an API Key?


Yes, and in fact this project migrated to Vertex AI during development. The Gemini API with API key returned 503 errors on Cloud Run. Vertex AI uses GCP service account authentication natively — no API key needed. The agent code changes minimally: you swap the model provider configuration and the authentication flows through IAM automatically.

Does it work with models other than Gemini?


Google ADK supports other models via LiteLLM — including Claude and GPT-4. The A2A protocol is model-agnostic: it is an open standard, not tied to any provider. This repository uses Gemini 2.5 Flash via Vertex AI, but you can adapt the model configuration without rewriting the tools or orchestration logic.

Resources

Repository

Code on GitHub

Full source code: 7 agents, orchestrator, Dockerfiles and deploy scripts.

Docs

Documentation

Architecture, deploy steps, GCP setup and troubleshooting explained in detail.

Previous video

Google ADK local

The local version: 7 agents with shared filesystem, SequentialAgent and A2A protocol.

YouTube Channel

@NicolasNeiraGarcia

ADK · A2A · Cloud Run · Claude Code · Automation
