Agent Harness — Reference Architecture

flowchart LR USR(["User ticket"]) REPLY(["Customer reply"]) subgraph CTX["① Context engineering"] POL["policies/*.md ⓘ
refund · escalation rules
step 01"] KB["knowledge-base/*.md ⓘ
10 articles
step 01"] RAG[("pgvector :5434 ⓘ
RAG retrieval
step 05")] end subgraph TOOLS["② Tool access"] MCP["MCP server ⓘ
steps/02/server.py · 7 tools
step 02"] SQLITE[("SQLite ⓘ
db/shopharness.db")] end subgraph ORCH["③ Orchestration"] SDK["Claude Agent SDK ⓘ
think · act · observe
step 03"] end subgraph GUARD["④ Guardrails"] NEMO["NeMo Guardrails ⓘ
Colang input + output rails
step 04"] end subgraph MEM["⑤ Memory"] MEM0["Mem0 ⓘ
conversation persistence
step 05"] end subgraph COST["⑥ Cost controls"] LITELLM["LiteLLM proxy :4000 ⓘ
virtual keys · budgets · fallback
step 06"] end subgraph OBS["⑦ Observability + evals"] LF["Langfuse :3000 ⓘ
session traces · LLM-as-judge
step 07"] end subgraph SAND["⑧ Sandbox"] E2B["E2B ⓘ
per-ticket VM
step 08"] end USR --> NEMO NEMO --> SDK SDK --> CTX SDK --> MEM SDK --> MCP MCP --> SQLITE MCP --> E2B SDK --> NEMO NEMO --> REPLY SDK -.routes.-> LITELLM SDK -.spans.-> LF click POL "javascript:void(0)" "Markdown policies loaded into the system prompt at step 01. Refund thresholds, escalation rules, brand voice." click KB "javascript:void(0)" "10 knowledge-base articles (warranty, shipping, troubleshooting). Loaded full at step 01; RAG-retrieved from step 05." click RAG "javascript:void(0)" "PostgreSQL with pgvector. Embeddings of knowledge-base articles. Retrieval scoped to the current ticket — drops context size from ~4k to ~400-800 tokens." click MCP "javascript:void(0)" "Custom MCP server (steps/02-tool-access/server.py). 7 tools: lookup_order, lookup_customer, lookup_customer_by_email, search_orders_by_customer, lookup_subscription, issue_refund, cancel_subscription." click SQLITE "javascript:void(0)" "SQLite — embedded DB at db/shopharness.db. 6 business tables (orders, customers, products, subscriptions, refunds, support_tickets). No daemon, no port." click SDK "javascript:void(0)" "Claude Agent SDK runs the think-act-observe loop. max_turns caps runaway loops. Tool-result reflection improves multi-step coherence." click NEMO "javascript:void(0)" "NeMo Guardrails (Apache 2.0). Colang rules enforce deterministic input + output guardrails. Social-engineering blocked even when the model fails probabilistic safety." click MEM0 "javascript:void(0)" "Mem0 (MIT) on pgvector. Stores conversation memory across sessions. Customer doesn't re-explain. Graph backend disabled in this build (proxy compat)." click LITELLM "javascript:void(0)" "LiteLLM proxy on :4000. Virtual keys per team, per-day budgets, tiered routing (Haiku for classification, Sonnet for response). Admin UI at :4000/ui." click LF "javascript:void(0)" "Langfuse self-hosted on :3000. Session-level traces with nested spans. LLM-as-judge evaluators scored against policy criteria. Login: admin@shopharness.dev." click E2B "javascript:void(0)" "E2B per-ticket cloud VM. Tool execution isolated from host. Step 08 — not yet built. Free $100 credit on signup." click USR "javascript:void(0)" "Inbound customer ticket — text only in this workshop. tickets/ticket-NN-*.md." click REPLY "javascript:void(0)" "Outbound reply after passing through output guardrails." classDef component fill:#fafaf8,stroke:#c45d2c,stroke-width:1.5px classDef tool fill:#ffffff,stroke:#e0ddd5,stroke-width:1px class USR,REPLY component class POL,KB,RAG,MCP,SQLITE,SDK,NEMO,MEM0,LITELLM,LF,E2B tool

Eight numbered components, each labeled with the workshop tool that implements it and the step that introduces it. Solid arrows = request path. Dashed = cross-cutting (cost routing, observability spans). Hover the ⓘ marker on any node for setup detail.

— Components

What each piece does

Component	Tool	What it solves	Container / Port
Context engineering	Markdown policies + pgvector RAG	Agent does not know your business rules	pgvector :5434
Tool access	MCP servers	Agent cannot read or write your data	local stdio
Orchestration	Claude Agent SDK	Multi-step reasoning across tickets	in-process
Guardrails	NeMo Guardrails	Probabilistic safety is not enough	in-process
Memory	Mem0 + pgvector	Customer re-explains every interaction	pgvector :5434
Cost controls	LiteLLM proxy	Ticket spike drains AI budget	:4000 (PG :5435)
Observability + evals	Langfuse self-hosted	You cannot debug what you cannot see	:3000 (PG :5436)
Sandbox	E2B	Tool execution touches production	per-ticket VM

— Data flow

How a ticket moves through

Ingest + input guardrails

Ticket text passes through NeMo input rails. Social engineering and PII leak attempts blocked deterministically before the model sees them.

Context assembly

Policies (markdown), retrieved knowledge base articles (RAG over pgvector), and conversation memory (Mem0) merged into the system prompt.

Agent loop

Claude Agent SDK runs the think-act-observe cycle. Each tool call goes through MCP into the per-ticket E2B sandbox.

Cost + observability cross-cuts

Every model call routes through LiteLLM (model selection, budget enforcement, fallback). Every span streams to Langfuse for trace + eval scoring.

Output guardrails + reply

Final response passes through output rails (regex + LLM-judged). Approved reply ships to the customer; rejected output triggers escalation.

— Workshop deployment

What runs on your laptop

Each workshop step is additive. Step 00–04 is in-process Python only. Step 05 introduces containers (pgvector for RAG + Mem0). Step 06 adds a model-routing proxy. Step 07 adds observability infra. Step 08 reaches into E2B's cloud for per-ticket sandboxes. Total local footprint at the end: ~10 containers, ~16 GB RAM recommended.

flowchart LR USR(["User ticket"]) NEMOIN["NeMo Guardrails ⓘ
input rails · step 04"] SDK["Claude Agent SDK ⓘ
think · act · observe · step 03"] CTX["Context ⓘ
markdown + pgvector RAG + Mem0"] TOOLS["Tools + Code ⓘ
MCP server · SQLite · E2B"] NEMOOUT["NeMo Guardrails ⓘ
output rails · step 04"] REPLY(["Customer reply"]) LITELLM["LiteLLM :4000 ⓘ
routing + budgets · step 06"] LF["Langfuse :3000 ⓘ
traces + evals · step 07"] USR -->|① input| NEMOIN NEMOIN -->|② run| SDK SDK -->|③ assemble| CTX CTX --> SDK SDK -->|④ act| TOOLS TOOLS --> SDK SDK -->|⑤ output| NEMOOUT NEMOOUT -->|⑥ reply| REPLY SDK -.model calls.-> LITELLM SDK -.spans.-> LF click USR "javascript:void(0)" "Customer support ticket — text only in this workshop. tickets/ticket-NN-*.md (6 scenarios: order status, refund, escalation, multi-issue, social engineering, RAG)" click NEMOIN "javascript:void(0)" "NeMo Guardrails Colang input rails — deterministic block on social engineering, PII leak attempts, off-topic requests. Runs before the model sees the ticket." click SDK "javascript:void(0)" "Claude Agent SDK — think-act-observe loop. max_turns caps runaway loops. Tool-result reflection improves multi-step coherence on multi-issue tickets." click CTX "javascript:void(0)" "Context assembly: markdown policies + 10 knowledge-base articles (step 01); pgvector RAG retrieval scoped to current ticket (step 05); Mem0 conversation memory across sessions (step 05). Drops context size from ~4k to ~400-800 tokens vs always-loading-everything." click TOOLS "javascript:void(0)" "Custom MCP server with 7 tools (step 02): lookup_order, lookup_customer, lookup_customer_by_email, search_orders_by_customer, lookup_subscription, issue_refund, cancel_subscription. SQLite at db/shopharness.db. Step 08 adds E2B per-ticket sandbox for tool isolation." click NEMOOUT "javascript:void(0)" "NeMo Guardrails output rails — Colang + regex. Rejects PII leaks, competitor mentions, off-policy responses. Final gate before customer sees the reply." click REPLY "javascript:void(0)" "Outbound reply after passing through output guardrails. In production this would ship to email/Zendesk/etc." click LITELLM "javascript:void(0)" "LiteLLM proxy on :4000 (step 06). Routes every model call. Virtual keys per team, per-day budgets, tiered routing (Haiku for classification, Sonnet for response), fallback chains. Admin UI at :4000/ui." click LF "javascript:void(0)" "Langfuse self-hosted on :3000 (step 07). Captures session-level traces with nested spans (RAG · memory · each turn · each tool). LLM-as-judge evaluators score against policy criteria. Login: admin@shopharness.dev / shopharness." classDef step fill:#fafaf8,stroke:#c45d2c,stroke-width:1.5px classDef cross fill:#f5f5f0,stroke:#6b6b6b,stroke-dasharray:4 3 class USR,NEMOIN,SDK,CTX,TOOLS,NEMOOUT,REPLY step class LITELLM,LF cross

Numbered edges = the 6-step flow through the workshop stack. Dashed = cross-cuts (every model call routes through LiteLLM; every span streams to Langfuse). Mirror of the AWS and Azure reference flows below — same shape, the workshop's open-source tools instead of managed cloud services. Hover the ⓘ marker on any node for setup detail.

Container reference

Container	Port	Step	Purpose
pgvector	5434	05	PostgreSQL + pgvector for RAG embeddings + Mem0 conversation memory
Neo4j	7474 / 7687	05	Graph store for Mem0 — currently idle (proxy `tool_choice` mismatch)
LiteLLM	4000	06	Model-routing proxy with virtual keys, budgets, fallbacks. Admin UI at `/ui`
LiteLLM PG	5435	06	PostgreSQL backing LiteLLM (keys, budgets, usage)
Langfuse web	3000	07	Self-hosted observability UI — session traces, datasets, LLM-as-judge evaluators
Langfuse PG	5436	07	PostgreSQL backing Langfuse (projects, users, prompts, scores)
ClickHouse	internal	07	High-cardinality columnar store for traces / spans
Redis	internal	07	Langfuse queue + cache
MinIO	9090	07	S3-compatible blob storage for trace payloads
Langfuse worker	internal	07	Async eval-job runner
E2B sandbox	cloud	08	Per-ticket VM for tool execution (not yet built)

Footprint

~10 containers running simultaneously by step 07. Recommend 16 GB RAM. If memory pressure shows up, stop Neo4j (idle): docker compose -f steps/05-memory/docker-compose.yml stop neo4j.

Step-by-step bring-up. You don't need everything at once. Each step's docker-compose.yml brings up only that step's containers; prior steps run from previously-started containers. Reset a step with make reset S=05.

— AWS reference

If you built this on AWS

Same eight components, mapped to canonical AWS services. Synthesized from production deployments (Robinhood, Epsilon, Rede Mater Dei) and AWS official guidance (Bedrock AgentCore, Strands SDK, Well-Architected GenAI Lens).

flowchart LR USR(["User / system"]) EDGE["CloudFront + WAF ⓘ"] APIGW["API Gateway ⓘ"] COST["GenAI Gateway ⓘ
Lambda Authz + DynamoDB"] BG["Bedrock Guardrails ⓘ"] RT["AgentCore Runtime ⓘ
Strands SDK"] CTX["Context ⓘ
KB · Gateway · Memory"] TOOLS["Tools + Code ⓘ
Gateway · Policy · Code Interpreter"] EXT[("Enterprise APIs")] ACOBS["Observability + Evaluations ⓘ"] ACID["AgentCore Identity ⓘ"] USR -->|① ingress| EDGE EDGE --> APIGW APIGW -->|② cost gate| COST COST --> BG BG -->|③ run| RT RT -->|④ assemble| CTX CTX --> RT RT -->|⑤ act| TOOLS TOOLS --> EXT TOOLS --> RT RT -->|⑥ output| BG BG -->|⑥ reply| USR RT -.spans.-> ACOBS ACID -.OAuth.-> RT ACID -.OAuth.-> TOOLS click USR "javascript:void(0)" "End user or upstream system initiating the request" click EDGE "javascript:void(0)" "CloudFront global CDN + AWS WAF managed rules at the edge — DDoS, rate-based filtering, geo controls" click APIGW "javascript:void(0)" "Amazon API Gateway — REST/HTTP gateway: auth (Cognito/IAM/Lambda), per-key usage plans, request-count throttling" click COST "javascript:void(0)" "GenAI Gateway pattern — Lambda Authorizer reads DynamoDB token budget; rejects over-budget pre-inference (AWS lacks a native APIM-equivalent token policy)" click BG "javascript:void(0)" "Amazon Bedrock Guardrails — content filters, denied topics, prompt-attack detection, PII redaction; applied to both prompt and response" click RT "javascript:void(0)" "AgentCore Runtime — serverless Firecracker microVM execution substrate; Strands/LangGraph/CrewAI compatible; A2A multi-agent protocol" click CTX "javascript:void(0)" "Bedrock Knowledge Bases (RAG over S3 Vectors) + AgentCore Gateway (semantic tool discovery, OpenAPI→MCP) + AgentCore Memory (short + long term)" click TOOLS "javascript:void(0)" "AgentCore Gateway converts OpenAPI to MCP; AgentCore Policy enforces Cedar behavioral rules; Code Interpreter runs in Firecracker microVM (use VPC mode)" click EXT "javascript:void(0)" "Enterprise APIs — CRM, ERP, data lakes the agent reads or writes through MCP-converted endpoints" click ACOBS "javascript:void(0)" "AgentCore Observability — OTEL traces in CloudWatch Transaction Search. AgentCore Evaluations — 13 built-in evaluators sampling live traffic" click ACID "javascript:void(0)" "AgentCore Identity — OAuth 2.0 user tokens + M2M tokens propagated end-to-end so the audit trail is coherent" classDef step fill:#fafaf8,stroke:#c45d2c,stroke-width:1.5px classDef cross fill:#f5f5f0,stroke:#6b6b6b,stroke-dasharray:4 3 class USR,EDGE,APIGW,COST,BG,RT,CTX,TOOLS,EXT step class ACOBS,ACID cross

Numbered edges = the 6-step flow. AWS has no APIM-equivalent — the GenAI Gateway is composed from API Gateway + Lambda Authorizer + DynamoDB. Hover the ⓘ marker on any node for a 1-line service description.

Component mapping

Harness piece	AWS service	Why this
Context engineering	Bedrock Knowledge Bases + AgentCore Gateway	KB handles full RAG (ingest → embed → retrieve). Gateway's semantic tool discovery prevents context bloat across hundreds of APIs.
Tool access	AgentCore Gateway (OpenAPI → MCP) + Lambda action groups	Gateway converts any OpenAPI-spec API to MCP without code. Lambda is the escape hatch for custom logic.
Orchestration	AgentCore Runtime + Strands Agents SDK	Runtime is the framework-agnostic substrate (Firecracker microVMs, A2A protocol). Strands is AWS's recommended SDK for new projects (14M+ downloads, GA 1.0).
Guardrails	Bedrock Guardrails + AgentCore Policy	Two enforcement planes: content safety (what the model says) + behavioral rules in Cedar (what the agent does with tools).
Memory	AgentCore Memory	Short-term session events (TTL up to 365d) + long-term async-extracted facts. KMS-encrypted. No DIY vector DB.
Cost controls	Step Functions + DynamoDB + CloudWatch + Budgets + prompt caching	No native budget knob. Step Functions gates each inference against DynamoDB token budgets. Claude prompt caching ≈ 90% savings on re-used context.
Observability + evals	AgentCore Observability + AgentCore Evaluations	OTEL traces in CloudWatch Transaction Search. 13 built-in evaluators (correctness, safety, goal success). Optional sinks: Datadog, Langfuse.
Sandbox / isolation	AgentCore Code Interpreter + VPC + PrivateLink	Firecracker microVM. Use VPC mode (not Sandbox mode — DNS exfil risk per Unit 42 research). Bedrock PrivateLink keeps inference off public internet.

How a request moves through AWS

Ingress: CloudFront + WAF + API Gateway

Request hits the edge: CloudFront for global delivery and DDoS shielding, AWS WAF for rate-based and content-rule filtering. Then Amazon API Gateway handles auth (Cognito / IAM / Lambda authorizer), routing, and per-API-key usage plans for request-count throttling. This is the AWS analogue of Azure APIM — except request-count throttling, not token-aware throttling, lives here.

Cost gate + input guardrails (the GenAI Gateway pattern)

Because API Gateway has no native token-aware quota policy, AWS production architectures compose a "GenAI Gateway": a Lambda Authorizer (or Step Functions step) that reads per-tenant token budgets from DynamoDB, rejects over-budget requests with 429 + Retry-After before any inference cost, and tracks consumption async. Approved requests then pass through Bedrock Guardrails for content filters, denied-topics, prompt-attack detection, and PII redaction. CloudWatch Alarms + AWS Budgets fire on threshold breach. Bedrock prompt caching cuts re-used context cost ≈ 90%.

Agent boots in AgentCore Runtime

Firecracker microVM, session-isolated. Strands SDK starts the think-act-observe loop. AgentCore Identity propagates the user's OAuth token (or M2M token) to every downstream call so the audit trail stays coherent end-to-end.

Context assembly: RAG + tool catalog + memory

Bedrock Knowledge Bases retrieves relevant chunks via S3 Vectors. AgentCore Gateway's semantic tool discovery surfaces only the tools relevant to this task — not the full catalog — so the context window stays lean. AgentCore Memory loads session events plus extracted long-term facts (preferences, prior summaries).

Tool calls + behavioral policy + code execution

Every tool call routes through AgentCore Gateway (OpenAPI → MCP). AgentCore Policy evaluates the call against Cedar rules in real time — allow, deny, or escalate. Code blocks execute inside AgentCore Code Interpreter (Firecracker microVM, VPC mode for full network isolation).

Output guardrails + observability

The agent's response passes back through Bedrock Guardrails for output-side filtering. Every span (LLM call, tool call, policy decision) streams via OpenTelemetry to AgentCore Observability → CloudWatch Transaction Search. AgentCore Evaluations asynchronously samples live traffic and scores it on 13 built-in evaluators (correctness, helpfulness, goal success, safety, context relevance).

Watch for

No APIM-equivalent on AWS. Azure APIM bundles auth, request throttling, and the AI-aware azure-openai-token-limit policy in one service. AWS has no single equivalent — production teams compose API Gateway (request count) + Lambda Authorizer + DynamoDB (token count) + Bedrock Guardrails. This is the "GenAI Gateway" pattern AWS publishes for multi-tenant deployments.

Sandbox mode ≠ network isolation. AgentCore Code Interpreter Sandbox Mode allows DNS resolution by design — confirmed exfil vector. Use VPC Mode for any regulated workload.

No native offline eval registry. AgentCore Evaluations samples live traffic only. For dataset-driven offline evals, teams reach for Langfuse, Braintrust, or S3 + Athena.

AgentCore · Strands 1.0 · Well-Architected GenAI Lens · Proactive cost management

— Azure reference

If you built this on Azure

Same eight components, mapped to canonical Azure services. Synthesized from Microsoft Foundry Agent Service production guidance, Azure Architecture Center reference architectures, and Microsoft product team blog posts (April 2026).

flowchart LR U(["User / app"]) FD["Front Door + WAF ⓘ"] APIM["APIM AI Gateway ⓘ
token-limit policy"] ACS["Foundry Guardrails ⓘ
Content Safety + Prompt Shields"] FAS["Foundry Agent Service ⓘ
+ Agent Framework"] CTX["Context ⓘ
AI Search · Toolbox · Cosmos DB"] TOOLS["Tools + Code ⓘ
Functions · ACA Dynamic Sessions"] FOBS["Foundry Observability ⓘ
+ App Insights"] AZMON["Cost Management ⓘ
Budgets + Action Groups"] U -->|① ingress| FD FD --> APIM APIM -->|② input check| ACS ACS -->|③ run| FAS FAS -->|④ assemble| CTX CTX --> FAS FAS -->|⑤ act| TOOLS TOOLS -->|guardrails 2 + 3| ACS ACS -->|allow| TOOLS TOOLS --> FAS FAS -->|⑥ output| ACS ACS -->|⑥ reply| U FAS -.spans.-> FOBS AZMON -.spend kill.-> APIM click U "javascript:void(0)" "End user or upstream app initiating the request" click FD "javascript:void(0)" "Azure Front Door — global edge, DDoS protection, WAF rules. Application Gateway used for regional alternative." click APIM "javascript:void(0)" "Azure API Management — auth + the native azure-openai-token-limit policy for AI-aware per-consumer TPM/quota enforcement (the AWS gap)" click ACS "javascript:void(0)" "Foundry Guardrails — Azure AI Content Safety + Prompt Shields applied at 4 intervention points: input, tool call, tool response, output" click FAS "javascript:void(0)" "Microsoft Foundry Agent Service — managed agent runtime + Microsoft Agent Framework (Semantic Kernel + AutoGen)" click CTX "javascript:void(0)" "Azure AI Search Agentic Retrieval (vector + hybrid + semantic) + Foundry Toolbox (versioned MCP) + Cosmos DB thread/entity stores + Foundry Managed Memory" click TOOLS "javascript:void(0)" "Foundry Toolbox + Azure Functions custom MCP webhooks + ACA Dynamic Sessions Hyper-V-isolated code sandboxes (millisecond cold start, MCP-native)" click FOBS "javascript:void(0)" "Foundry Observability — agent monitoring dashboard, OTEL traces, built-in evaluators (groundedness, relevance, tool-call accuracy), AI red-teaming agent (PyRIT)" click AZMON "javascript:void(0)" "Azure Cost Management Budgets — Action Groups trigger Automation runbooks to disable a deployment when spend breaches threshold" classDef step fill:#fafaf8,stroke:#2563eb,stroke-width:1.5px classDef cross fill:#f5f5f0,stroke:#6b6b6b,stroke-dasharray:4 3 class U,FD,APIM,ACS,FAS,CTX,TOOLS step class FOBS,AZMON cross

Numbered edges = the 6-step flow. APIM bundles auth + AI-aware token quotas natively — no compose-from-primitives required. Foundry Guardrails appear at all 4 intervention points (input · tool call · tool response · output). Hover the ⓘ marker on any node for a 1-line service description.

Component mapping

Harness piece	Azure service	Why this
Context engineering	Azure AI Search (Agentic Retrieval) + Foundry IQ	Native vector + hybrid + semantic ranking. Agentic Retrieval (GA 2025) breaks compound questions into sub-queries — built for agent consumption.
Tool access	Foundry Toolbox + Azure Functions (MCP webhook)	Toolbox centralizes versioned tool definitions on a single MCP endpoint. Functions exposes custom tools via `/runtime/webhooks/mcp`. Managed identity + OBO auth native.
Orchestration	Foundry Agent Service + Microsoft Agent Framework	Foundry is the managed runtime (hosting, scaling, threads). Agent Framework (preview Oct 2025) merges Semantic Kernel + AutoGen — sequential, concurrent, group-chat, magentic patterns.
Guardrails	Foundry Guardrails (Content Safety + Prompt Shields)	Four explicit intervention points: input, tool call, tool response, output. Prompt Shields handles direct + indirect prompt injection. PII detection still preview.
Memory	Cosmos DB (BYO) + Foundry Managed Memory	Cosmos containers: `thread-message-store`, `system-thread-message-store`, `agent-entity-store`. Foundry's managed layer (preview) is user-scoped, zero-infra.
Cost controls	APIM `azure-openai-token-limit` + Cost Management Budgets + Automation runbooks	Two-tier: APIM hard rate limits per consumer (TPM/quota → 429); Cost Management triggers Action Groups → runbook to disable a deployment when spend breaches budget.
Observability + evals	Foundry Observability (GA) + Application Insights	OTEL traces, agent monitoring dashboard, built-in evaluators (groundedness, relevance, tool-call accuracy), continuous prod evaluation, AI red-teaming agent (PyRIT). CI/CD quality gates.
Sandbox / isolation	ACA Dynamic Sessions + VNet + Private Endpoints	Hyper-V per session (stronger than container-only), millisecond cold start, MCP endpoint native (`isMCPServerEnabled: true`). Confidential Computing (TEE) in preview.

How a request moves through Azure

Ingress through APIM AI Gateway (optionally fronted by Front Door + WAF)

Production deployments commonly add Azure Front Door (global edge, DDoS) or Application Gateway (regional) with WAF in front of APIM for content-rule filtering. APIM is where the AI-aware concerns live: the azure-openai-token-limit policy enforces per-consumer TPM and token-count quotas natively (the unique AWS gap — see below), authentication via managed identity / OAuth2, OpenAPI-based routing. Over-budget consumers get 429 + Retry-After before any agent code runs.

Foundry Guardrails — input intervention (point 1 of 4)

Content Safety classifiers run on the user input. Prompt Shields catches direct jailbreak attempts and indirect prompt injection (XPIA) embedded in documents or tool responses. Harm categories (Hate, Sexual, Violence, Self-harm), task adherence, and protected-material checks fire first.

Agent runs in Foundry Agent Service

Managed runtime — hosting, scaling, identity, thread management. Microsoft Agent Framework (Semantic Kernel + AutoGen merged) drives the loop and supports sequential, concurrent, group-chat, handoff, and magentic patterns. Agent managed identity + OBO auth applied for downstream calls.

Context assembly: AI Search + Toolbox + memory

Foundry Toolbox surfaces versioned tools through a single MCP-compatible endpoint. Azure AI Search Agentic Retrieval breaks compound questions into sub-queries, runs vector + hybrid + semantic ranking, and merges results — built for agent consumption. Cosmos DB thread-message-store and agent-entity-store load session memory; Foundry Managed Memory adds user-scoped long-term context.

Tool calls re-intercepted + code in ACA Dynamic Sessions

Each tool call (intervention point 2) and tool response (intervention point 3) is screened by Foundry Guardrails — Azure's two extra interception points beyond AWS. Code execution lands in Azure Container Apps Dynamic Sessions: Hyper-V-isolated per-session sandbox, started in milliseconds from a pre-warmed pool, exposed as a remote MCP server.

Output intervention + observability

Final response passes through Foundry Guardrails one more time (intervention point 4). OpenTelemetry traces stream to Foundry Observability + Application Insights — agent monitoring dashboard, distributed traces, built-in evaluators (groundedness, relevance, tool-call accuracy, task completion), and continuous production evaluation. AI red-teaming agent (PyRIT) runs scheduled adversarial tests; Cost Management Budgets can trigger Action Groups → Automation runbooks to disable a deployment when spend breaches threshold.

Watch for

Hosted agents are North Central US only in preview as of April 2026. EMEA production deployments need prompt-agent type with private networking — plan around the residency gap.

Guardrails differ materially from AWS. Azure intercepts at four points (input · tool call · tool response · output) — more agent-native than Bedrock's two-plane model. But Azure's PII detection is preview and bias detection is absent.

Sandbox is purpose-built. ACA Dynamic Sessions is the closest thing to E2B in any major cloud — Hyper-V isolated, MCP-native, sub-second startup. AWS has no first-class equivalent.

Foundry Agent Service · Foundry Guardrails · ACA Dynamic Sessions · Foundry Observability

The agent harness, fully wired.

What each piece does

How a ticket moves through

Ingest + input guardrails

Context assembly

Agent loop

Cost + observability cross-cuts

Output guardrails + reply

What runs on your laptop

Container reference

If you built this on AWS

Component mapping

How a request moves through AWS

Ingress: CloudFront + WAF + API Gateway

Cost gate + input guardrails (the GenAI Gateway pattern)

Agent boots in AgentCore Runtime

Context assembly: RAG + tool catalog + memory

Tool calls + behavioral policy + code execution

Output guardrails + observability

If you built this on Azure

Component mapping

How a request moves through Azure

Ingress through APIM AI Gateway (optionally fronted by Front Door + WAF)

Foundry Guardrails — input intervention (point 1 of 4)

Agent runs in Foundry Agent Service

Context assembly: AI Search + Toolbox + memory

Tool calls re-intercepted + code in ACA Dynamic Sessions

Output intervention + observability