Eight components. Containerized. Production-shaped.
Eight numbered components, each labeled with the workshop tool that implements it and the step that introduces it. Solid arrows = request path. Dashed = cross-cutting (cost routing, observability spans).
| Component | Tool | What it solves | Container / Port |
|---|---|---|---|
| Context engineering | Markdown policies + pgvector RAG | Agent does not know your business rules | pgvector :5434 |
| Tool access | MCP servers | Agent cannot read or write your data | local stdio |
| Orchestration | Claude Agent SDK | Multi-step reasoning across tickets | in-process |
| Guardrails | NeMo Guardrails | Probabilistic safety is not enough | in-process |
| Memory | Mem0 + pgvector | Customer re-explains every interaction | pgvector :5434 |
| Cost controls | LiteLLM proxy | Ticket spike drains AI budget | :4000 (PG :5435) |
| Observability + evals | Langfuse self-hosted | You cannot debug what you cannot see | :3000 (PG :5436) |
| Sandbox | E2B | Tool execution touches production | per-ticket VM |
Ticket text passes through NeMo input rails. Social engineering and PII leak attempts are blocked deterministically before the model sees them.
Policies (markdown), retrieved knowledge base articles (RAG over pgvector), and conversation memory (Mem0) are merged into the system prompt.
Claude Agent SDK runs the think-act-observe cycle. Each tool call goes through MCP into the per-ticket E2B sandbox.
Every model call routes through LiteLLM (model selection, budget enforcement, fallback). Every span streams to Langfuse for trace + eval scoring.
Final response passes through output rails (regex + LLM-judged). Approved reply ships to the customer; rejected output triggers escalation.
Each workshop step is additive. Steps 00–04 are in-process Python only. Step 05 introduces containers (pgvector for RAG + Mem0). Step 06 adds a model-routing proxy. Step 07 adds observability infra. Step 08 reaches into E2B's cloud for per-ticket sandboxes. Total local footprint at the end: ~10 containers, ~16 GB RAM recommended.
Numbered edges = the 6-step flow through the workshop stack. Dashed = cross-cuts (every model call routes through LiteLLM; every span streams to Langfuse). Mirror of the AWS and Azure reference flows below — same shape, the workshop's open-source tools instead of managed cloud services.
| Container | Port | Step | Purpose |
|---|---|---|---|
| pgvector | 5434 | 05 | PostgreSQL + pgvector for RAG embeddings + Mem0 conversation memory |
| Neo4j | 7474 / 7687 | 05 | Graph store for Mem0 — currently idle (proxy tool_choice mismatch) |
| LiteLLM | 4000 | 06 | Model-routing proxy with virtual keys, budgets, fallbacks. Admin UI at /ui |
| LiteLLM PG | 5435 | 06 | PostgreSQL backing LiteLLM (keys, budgets, usage) |
| Langfuse web | 3000 | 07 | Self-hosted observability UI — session traces, datasets, LLM-as-judge evaluators |
| Langfuse PG | 5436 | 07 | PostgreSQL backing Langfuse (projects, users, prompts, scores) |
| ClickHouse | internal | 07 | High-cardinality columnar store for traces / spans |
| Redis | internal | 07 | Langfuse queue + cache |
| MinIO | 9090 | 07 | S3-compatible blob storage for trace payloads |
| Langfuse worker | internal | 07 | Async eval-job runner |
| E2B sandbox | cloud | 08 | Per-ticket VM for tool execution (not yet built) |
~10 containers running simultaneously by step 07. Recommend 16 GB RAM. If memory pressure shows up, stop Neo4j (idle): `docker compose -f steps/05-memory/docker-compose.yml stop neo4j`.
Step-by-step bring-up: you don't need everything at once. Each step's `docker-compose.yml` brings up only that step's containers; containers from earlier steps keep running. Reset a step with `make reset S=05`.
Same eight components, mapped to canonical AWS services. Synthesized from production deployments (Robinhood, Epsilon, Rede Mater Dei) and AWS official guidance (Bedrock AgentCore, Strands SDK, Well-Architected GenAI Lens).
Numbered edges = the 6-step flow. AWS has no APIM equivalent — the GenAI Gateway is composed from API Gateway + Lambda Authorizer + DynamoDB.
| Harness piece | AWS service | Why this |
|---|---|---|
| Context engineering | Bedrock Knowledge Bases + AgentCore Gateway | KB handles full RAG (ingest → embed → retrieve). Gateway's semantic tool discovery prevents context bloat across hundreds of APIs. |
| Tool access | AgentCore Gateway (OpenAPI → MCP) + Lambda action groups | Gateway converts any OpenAPI-spec API to MCP without code. Lambda is the escape hatch for custom logic. |
| Orchestration | AgentCore Runtime + Strands Agents SDK | Runtime is the framework-agnostic substrate (Firecracker microVMs, A2A protocol). Strands is AWS's recommended SDK for new projects (14M+ downloads, GA 1.0). |
| Guardrails | Bedrock Guardrails + AgentCore Policy | Two enforcement planes: content safety (what the model says) + behavioral rules in Cedar (what the agent does with tools). |
| Memory | AgentCore Memory | Short-term session events (TTL up to 365d) + long-term async-extracted facts. KMS-encrypted. No DIY vector DB. |
| Cost controls | Step Functions + DynamoDB + CloudWatch + Budgets + prompt caching | No native budget knob. Step Functions gates each inference against DynamoDB token budgets. Claude prompt caching ≈ 90% savings on re-used context. |
| Observability + evals | AgentCore Observability + AgentCore Evaluations | OTEL traces in CloudWatch Transaction Search. 13 built-in evaluators (correctness, safety, goal success). Optional sinks: Datadog, Langfuse. |
| Sandbox / isolation | AgentCore Code Interpreter + VPC + PrivateLink | Firecracker microVM. Use VPC mode (not Sandbox mode — DNS exfil risk per Unit 42 research). Bedrock PrivateLink keeps inference off public internet. |
Request hits the edge: CloudFront for global delivery and DDoS shielding, AWS WAF for rate-based and content-rule filtering. Then Amazon API Gateway handles auth (Cognito / IAM / Lambda authorizer), routing, and per-API-key usage plans for request-count throttling. This is the AWS analogue of Azure APIM — except that request-count throttling, not token-aware throttling, lives here.
Because API Gateway has no native token-aware quota policy, AWS production architectures compose a "GenAI Gateway": a Lambda Authorizer (or Step Functions step) that reads per-tenant token budgets from DynamoDB, rejects over-budget requests with 429 + Retry-After before any inference cost, and tracks consumption async. Approved requests then pass through Bedrock Guardrails for content filters, denied-topics, prompt-attack detection, and PII redaction. CloudWatch Alarms + AWS Budgets fire on threshold breach. Bedrock prompt caching cuts re-used context cost ≈ 90%.
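The budget pre-check at the heart of the pattern is small: read the tenant's remaining token budget, reject before inference if it is exhausted. A minimal pure-Python sketch — a dict stands in for the DynamoDB table, and the response shape (`status`, `retry_after`) is an illustrative assumption, not an AWS-published contract:

```python
# Stand-in for a DynamoDB table keyed by tenant_id.
token_budgets = {
    "tenant-a": {"budget": 1_000_000, "used": 999_950},
    "tenant-b": {"budget": 1_000_000, "used": 120_000},
}

def authorize(tenant_id: str, estimated_tokens: int) -> dict:
    """Reject over-budget requests before any inference cost is incurred."""
    row = token_budgets.get(tenant_id)
    if row is None:
        return {"status": 403, "reason": "unknown tenant"}
    if row["used"] + estimated_tokens > row["budget"]:
        # 429 + Retry-After, as in the GenAI Gateway pattern.
        return {"status": 429, "retry_after": 3600}
    # In production this would be an atomic (and often async) DynamoDB update.
    row["used"] += estimated_tokens
    return {"status": 200}
```

In the real pattern this logic lives in a Lambda Authorizer (or a Step Functions step), so the rejection happens before the request ever reaches Bedrock.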
Firecracker microVM, session-isolated. Strands SDK starts the think-act-observe loop. AgentCore Identity propagates the user's OAuth token (or M2M token) to every downstream call so the audit trail stays coherent end-to-end.
Bedrock Knowledge Bases retrieves relevant chunks via S3 Vectors. AgentCore Gateway's semantic tool discovery surfaces only the tools relevant to this task — not the full catalog — so the context window stays lean. AgentCore Memory loads session events plus extracted long-term facts (preferences, prior summaries).
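For reference, a retrieval call to Bedrock Knowledge Bases (via the `bedrock-agent-runtime` client's `retrieve` operation) takes roughly this shape. The helper below only builds the request parameters — the knowledge base ID is a placeholder, and the actual client call is shown commented out because it needs AWS credentials:

```python
def build_retrieve_request(kb_id: str, query: str, top_k: int = 5) -> dict:
    # Parameter shape for bedrock-agent-runtime's retrieve() operation.
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }

# Usage sketch (requires AWS credentials and a provisioned knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# chunks = client.retrieve(**build_retrieve_request("KB123EXAMPLE", "refund policy"))
```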
Every tool call routes through AgentCore Gateway (OpenAPI → MCP). AgentCore Policy evaluates the call against Cedar rules in real time — allow, deny, or escalate. Code blocks execute inside AgentCore Code Interpreter (Firecracker microVM, VPC mode for full network isolation).
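A Cedar rule of the kind AgentCore Policy evaluates might look like the following — the entity and attribute names here (`SupportAgent`, `InvokeTool`, `RefundTool`, `context.amount`) are invented for illustration, not AgentCore's actual schema:

```cedar
// Allow the support agent to call the refund tool,
// but only for small amounts; everything else is denied by default.
permit(
  principal == Agent::"SupportAgent",
  action == Action::"InvokeTool",
  resource == Tool::"RefundTool"
) when {
  context.amount <= 100
};
```

Because Cedar is deny-by-default, behavioral guardrails become an allow-list of tool calls rather than a block-list of bad outputs.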
The agent's response passes back through Bedrock Guardrails for output-side filtering. Every span (LLM call, tool call, policy decision) streams via OpenTelemetry to AgentCore Observability → CloudWatch Transaction Search. AgentCore Evaluations asynchronously samples live traffic and scores it on 13 built-in evaluators (correctness, helpfulness, goal success, safety, context relevance).
No APIM equivalent on AWS. Azure APIM bundles auth, request throttling, and the AI-aware azure-openai-token-limit policy in one service. AWS has no single equivalent — production teams compose API Gateway (request count) + Lambda Authorizer + DynamoDB (token count) + Bedrock Guardrails. This is the "GenAI Gateway" pattern AWS publishes for multi-tenant deployments.
Sandbox mode ≠ network isolation. AgentCore Code Interpreter Sandbox Mode allows DNS resolution by design — confirmed exfil vector. Use VPC Mode for any regulated workload.
No native offline eval registry. AgentCore Evaluations samples live traffic only. For dataset-driven offline evals, teams reach for Langfuse, Braintrust, or S3 + Athena.
AgentCore · Strands 1.0 · Well-Architected GenAI Lens · Proactive cost management
Same eight components, mapped to canonical Azure services. Synthesized from Microsoft Foundry Agent Service production guidance, Azure Architecture Center reference architectures, and Microsoft product team blog posts (April 2026).
Numbered edges = the 6-step flow. APIM bundles auth + AI-aware token quotas natively — no composing from primitives required. Foundry Guardrails appear at all 4 intervention points (input · tool call · tool response · output).
| Harness piece | Azure service | Why this |
|---|---|---|
| Context engineering | Azure AI Search (Agentic Retrieval) + Foundry IQ | Native vector + hybrid + semantic ranking. Agentic Retrieval (GA 2025) breaks compound questions into sub-queries — built for agent consumption. |
| Tool access | Foundry Toolbox + Azure Functions (MCP webhook) | Toolbox centralizes versioned tool definitions on a single MCP endpoint. Functions exposes custom tools via /runtime/webhooks/mcp. Managed identity + OBO auth native. |
| Orchestration | Foundry Agent Service + Microsoft Agent Framework | Foundry is the managed runtime (hosting, scaling, threads). Agent Framework (preview Oct 2025) merges Semantic Kernel + AutoGen — sequential, concurrent, group-chat, magentic patterns. |
| Guardrails | Foundry Guardrails (Content Safety + Prompt Shields) | Four explicit intervention points: input, tool call, tool response, output. Prompt Shields handles direct + indirect prompt injection. PII detection still in preview. |
| Memory | Cosmos DB (BYO) + Foundry Managed Memory | Cosmos containers: thread-message-store, system-thread-message-store, agent-entity-store. Foundry's managed layer (preview) is user-scoped, zero-infra. |
| Cost controls | APIM azure-openai-token-limit + Cost Management Budgets + Automation runbooks | Two-tier: APIM hard rate limits per consumer (TPM/quota → 429); Cost Management triggers Action Groups → runbook to disable a deployment when spend breaches budget. |
| Observability + evals | Foundry Observability (GA) + Application Insights | OTEL traces, agent monitoring dashboard, built-in evaluators (groundedness, relevance, tool-call accuracy), continuous prod evaluation, AI red-teaming agent (PyRIT). CI/CD quality gates. |
| Sandbox / isolation | ACA Dynamic Sessions + VNet + Private Endpoints | Hyper-V per session (stronger than container-only), millisecond cold start, MCP endpoint native (isMCPServerEnabled: true). Confidential Computing (TEE) in preview. |
Production deployments commonly add Azure Front Door (global edge, DDoS) or Application Gateway (regional) with WAF in front of APIM for content-rule filtering. APIM is where the AI-aware concerns live: the azure-openai-token-limit policy enforces per-consumer TPM and token-count quotas natively (the unique AWS gap — see below), authentication via managed identity / OAuth2, OpenAPI-based routing. Over-budget consumers get 429 + Retry-After before any agent code runs.
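The token-limit policy itself is a short fragment in APIM's inbound section. The attribute values below are illustrative, and the attribute set shown is a subset — check the current azure-openai-token-limit policy reference before relying on any of them:

```xml
<inbound>
    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="5000"
        estimate-prompt-tokens="true"
        retry-after-header-name="Retry-After" />
</inbound>
```

Keying the counter on the subscription ID gives each consumer an independent TPM budget; over-budget callers get the 429 + Retry-After described above without touching the backend.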
Content Safety classifiers run on the user input. Prompt Shields catches direct jailbreak attempts and indirect prompt injection (XPIA) embedded in documents or tool responses. Harm categories (Hate, Sexual, Violence, Self-harm), task adherence, and protected-material checks fire first.
Managed runtime — hosting, scaling, identity, thread management. Microsoft Agent Framework (Semantic Kernel + AutoGen merged) drives the loop and supports sequential, concurrent, group-chat, handoff, and magentic patterns. Agent managed identity + OBO auth applied for downstream calls.
Foundry Toolbox surfaces versioned tools through a single MCP-compatible endpoint. Azure AI Search Agentic Retrieval breaks compound questions into sub-queries, runs vector + hybrid + semantic ranking, and merges results — built for agent consumption. Cosmos DB thread-message-store and agent-entity-store load session memory; Foundry Managed Memory adds user-scoped long-term context.
Each tool call (intervention point 2) and tool response (intervention point 3) is screened by Foundry Guardrails — Azure's two extra interception points beyond AWS. Code execution lands in Azure Container Apps Dynamic Sessions: Hyper-V-isolated per-session sandbox, started in milliseconds from a pre-warmed pool, exposed as a remote MCP server.
Final response passes through Foundry Guardrails one more time (intervention point 4). OpenTelemetry traces stream to Foundry Observability + Application Insights — agent monitoring dashboard, distributed traces, built-in evaluators (groundedness, relevance, tool-call accuracy, task completion), and continuous production evaluation. AI red-teaming agent (PyRIT) runs scheduled adversarial tests; Cost Management Budgets can trigger Action Groups → Automation runbooks to disable a deployment when spend breaches threshold.
Hosted agents are North Central US only in preview as of April 2026. EMEA production deployments need the prompt-agent type with private networking — plan around the residency gap.
Guardrails differ materially from AWS. Azure intercepts at four points (input · tool call · tool response · output) — more agent-native than Bedrock's two-plane model. But Azure's PII detection is still in preview and bias detection is absent.
Sandbox is purpose-built. ACA Dynamic Sessions is the closest thing to E2B in any major cloud — Hyper-V isolated, MCP-native, sub-second startup. AWS has no first-class equivalent.
Foundry Agent Service · Foundry Guardrails · ACA Dynamic Sessions · Foundry Observability