TL;DR
- Learn the core design patterns that actually matter for AI systems.
- See how each pattern maps to real AI orchestration problems.
- Use concrete code-style examples to plug into your stack.
- Understand trade-offs, not just theory.
- Apply a simple roadmap to introduce patterns without over-engineering.
The Stateless Problem: Why AI Needs Architecture
LLMs are stateless.
Every time you send a prompt to GPT, Claude, or any model, it forgets everything outside the current request. It does not remember your previous message, the plan it created two steps ago, or the tools it just called. Any “memory” comes from the system around it, not from the model itself.
This means the orchestration layer must own:
- Conversation history
- Business rules
- Tool routing
- Execution state
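As a minimal sketch of what "owning" conversation history can look like, the class below replays the full history on every call. The names `Conversation` and `echo_llm` are hypothetical and not part of any library; `echo_llm` stands in for a real model call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = dict[str, str]


@dataclass
class Conversation:
    """Holds the state the model itself cannot keep between requests."""

    system_prompt: str
    history: list[Message] = field(default_factory=list)

    def ask(self, user_text: str, llm: Callable[[list[Message]], str]) -> str:
        # Record the user turn before building the request.
        self.history.append({"role": "user", "content": user_text})
        # Replay the whole history on every call, because the model is stateless.
        messages = [{"role": "system", "content": self.system_prompt}, *self.history]
        reply = llm(messages)
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_llm(messages: list[Message]) -> str:
    """Hypothetical stand-in for a model call: reports how many messages it received."""
    return f"saw {len(messages)} messages"


chat = Conversation(system_prompt="You are a helpful assistant.")
first = chat.ask("Hello", echo_llm)
second = chat.ask("What did I just say?", echo_llm)
```

The second call sees more messages than the first only because the surrounding code resends them.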
If you ignore this, you end up with a pile of scripts: ad-hoc calls to the model, scattered prompt templates, and duplicated logic around tools and memory. It works for a demo. It breaks in production.
Design patterns help turn that mess into a clear, composable architecture.
Below are 7 patterns that map directly to real AI problems, with examples you can adapt.
1. Factory Pattern: Standardizing Agent Creation
Problem
You need different agents: "Coder", "Researcher", "Critic", "Planner". Each needs:
- Specific system prompts
- Model configuration (temperature, model name)
- Tool access
- Vector store or memory wiring
Hard-coding this setup everywhere leads to duplicated logic and bugs whenever you change anything.
Bad approach:
# scattered setup
researcher = Agent(
    role="researcher",
    model="gpt-4o",
    tools=[web_search, retrieve_docs],
    memory=DocStore("research_index"),
)
coder = Agent(
    role="coder",
    model="gpt-4o",
    tools=[code_exec, unit_test],
    memory=None,
)
Every time you change how a researcher should work, you edit several places.
Pattern
Use a Factory to centralize agent creation.
class AgentFactory:
    def __init__(self, vector_store, tools, models):
        self.vector_store = vector_store
        self.tools = tools
        self.models = models

    def create(self, agent_type: str):
        if agent_type == "researcher":
            return Agent(
                role="researcher",
                model=self.models["deep_reasoning"],
                tools=[self.tools["web_search"], self.tools["retrieval"]],
                memory=self.vector_store["research"],
            )
        if agent_type == "coder":
            return Agent(
                role="coder",
                model=self.models["code_first"],
                tools=[self.tools["code_exec"], self.tools["tests"]],
                memory=None,
            )
        if agent_type == "critic":
            return Agent(
                role="critic",
                model=self.models["deep_reasoning"],
                tools=[self.tools["style_checker"]],
                memory=None,
            )
        raise ValueError(f"Unknown agent type: {agent_type}")
Usage:
factory = AgentFactory(vector_store, tools, models)
researcher = factory.create("researcher")
coder = factory.create("coder")
critic = factory.create("critic")
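If the chain of `if` branches grows, one possible variation is a registry-based factory where each agent type registers its own builder. This is a sketch under assumptions: the `Agent` dataclass is a hypothetical stand-in for the article's `Agent` class, and `RegistryAgentFactory` and `register` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Hypothetical stand-in for the article's Agent class."""

    role: str
    model: str
    tools: list[str] = field(default_factory=list)


class RegistryAgentFactory:
    """Maps agent type names to builder callables instead of if-branches."""

    def __init__(self) -> None:
        self._builders: dict[str, Callable[[], Agent]] = {}

    def register(self, agent_type: str, builder: Callable[[], Agent]) -> None:
        self._builders[agent_type] = builder

    def create(self, agent_type: str) -> Agent:
        try:
            builder = self._builders[agent_type]
        except KeyError:
            raise ValueError(f"Unknown agent type: {agent_type}") from None
        # Each call builds a fresh agent, so instances do not share state.
        return builder()


factory = RegistryAgentFactory()
factory.register(
    "researcher",
    lambda: Agent("researcher", "deep_reasoning", ["web_search", "retrieval"]),
)
factory.register(
    "coder",
    lambda: Agent("coder", "code_first", ["code_exec", "tests"]),
)

researcher = factory.create("researcher")
```

Adding a new agent type then means one `register` call rather than editing `create`.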
When to Use
- You have more than one agent type.
- You want to update behavior in one place.
- You are building reusable frameworks or internal platforms.
When Not to Use
- You have a single agent and do not expect variation.
- Your use case is experimental and changing daily.
2. Strategy Pattern: Hot-Swapping “Brains”
Problem
Not every task needs GPT‑4‑level reasoning. Some tasks are simple:
- Summarize a short paragraph
- Extract a few fields
- Reformat text
If all requests go to a heavy model, you burn budget and increase latency.
Hard-coded model usage:
def answer(query: str) -> str:
    # always uses the most expensive model
    return call_llm("gpt-4o", query)
Pattern
Use Strategy to define a family of “brain” options and pick the right one at runtime.
class ModelStrategy:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError

class CheapModelStrategy(ModelStrategy):
    def generate(self, prompt: str) -> str:
        return call_llm("gpt-4o-mini", prompt)

class ExpensiveModelStrategy(ModelStrategy):
    def generate(self, prompt: str) -> str:
        return call_llm("gpt-4o", prompt)

class Router:
    def __init__(self, cheap: ModelStrategy, expensive: ModelStrategy):
        self.cheap = cheap
        self.expensive = expensive

    def handle(self, task: dict) -> str:
        if self._is_simple(task):
            return self.cheap.generate(task["prompt"])
        return self.expensive.generate(task["prompt"])

    def _is_simple(self, task: dict) -> bool:
        return len(task["prompt"]) < 300 and not task.get("requires_tools")
Usage:
router = Router(
    cheap=CheapModelStrategy(),
    expensive=ExpensiveModelStrategy(),
)
response = router.handle({"prompt": user_input, "requires_tools": False})
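To check routing rules without spending tokens, one option is to inject both the strategies and the routing rule. The sketch below assumes a hypothetical `FakeStrategy` test double and a hypothetical `is_simple` constructor parameter. The rule itself mirrors the `_is_simple` check above.

```python
from typing import Callable, Protocol


class ModelStrategy(Protocol):
    def generate(self, prompt: str) -> str: ...


class FakeStrategy:
    """Hypothetical test double: labels the prompt instead of calling a model."""

    def __init__(self, label: str) -> None:
        self.label = label

    def generate(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


class Router:
    def __init__(
        self,
        cheap: ModelStrategy,
        expensive: ModelStrategy,
        is_simple: Callable[[dict], bool],
    ) -> None:
        self.cheap = cheap
        self.expensive = expensive
        self.is_simple = is_simple  # routing rule is injected, not hard-coded

    def handle(self, task: dict) -> str:
        strategy = self.cheap if self.is_simple(task) else self.expensive
        return strategy.generate(task["prompt"])


def short_and_tool_free(task: dict) -> bool:
    # Same rule as the article's _is_simple: short prompt and no tools needed.
    return len(task["prompt"]) < 300 and not task.get("requires_tools")


router = Router(FakeStrategy("cheap"), FakeStrategy("expensive"), short_and_tool_free)
simple = router.handle({"prompt": "Reformat this date", "requires_tools": False})
heavy = router.handle({"prompt": "Plan a migration", "requires_tools": True})
```

Changing the rule then means passing a different function, and the calling code stays the same.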
Benefits
- Reduces cost by routing trivial tasks to cheaper models.
- Keeps the calling code stable while you change routing rules.
When Not to Use
- You always use a single fixed model.
- You do not yet measure cost or latency.
3. Proxy Pattern: The Gatekeeper to Your LLM
Problem
Direct calls to LLM APIs can cause:
- Uncontrolled cost
- Rate-limit issues
- Lack of observability
- Compliance and PII risks
Code without a proxy:
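import openai

def ask_llm(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    # The SDK returns a response object; the text lives on the first choice.
    return response.choices[0].message.content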
Every part of your code calls the provider directly. You cannot enforce rules in one place.
Pattern
Use a Proxy between your app and the LLM provider.
class LLMProxy:
    def __init__(self, client, cache, limiter, pii_filter):
        self.client = client
        self.cache = cache
        self.limiter = limiter
        self.pii_filter = pii_filter

    def generate(self, model: str, messages: list[dict]) -> str:
        key = self._cache_key(model, messages)
        cached = self.cache.get(key)
        # Assumes cache.get returns None on a miss; an empty string is still a hit.
        if cached is not None:
            return cached
        self.limiter.check_quota()
        safe_messages = self.pii_filter.strip(messages)
        response = self.client.chat(model=model, messages=safe_messages)
        self.cache.set(key, response)
        return response

    def _cache_key(self, model: str, messages: list[dict]) -> str:
        # implement a stable hash
        ...
Usage:
proxy = LLMProxy(client, cache, limiter, pii_filter)
def ask_llm(prompt: str) -> str:
    return proxy.generate("gpt-4o", [{"role": "user", "content": prompt}])
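The `_cache_key` method above is left open. One possible way to get a stable key, assuming the messages are JSON-serializable, is to hash a canonical JSON dump. The standalone `cache_key` function below is a hypothetical sketch, not the article's implementation.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict]) -> str:
    """Return a deterministic key for a (model, messages) pair."""
    # sort_keys and fixed separators make the serialized text independent of
    # dict key order and whitespace, so equal requests hash to the same key.
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = cache_key("gpt-4o", [{"role": "user", "content": "hi"}])
key_b = cache_key("gpt-4o", [{"content": "hi", "role": "user"}])
```

Python's built-in `hash()` is a poor fit here because string hashes are randomized per process by default, so keys would not survive a restart or match across workers.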
Benefits
- Central control of cost, logs, and safety.
- Easy integration with tools like LiteLLM or Helicone.
- Keeps business logic clean.
When Not to Use
- Local-only experiments.
- One-off scripts where cost and safety are not concerns.
4. Decorator Pattern: Observability Without Clutter
Problem
You want to log:
- Inputs
- Outputs
- Token usage
- Execution time
Adding logging code inside every function makes it noisy.
Bad approach:
def search_web(query: str) -> str:
    start = time.time()
    print("search_web called with:", query)
    result = call_api(query)
    print("result:", result, "took", time.time() - start)
    return result
Pattern
Use Decorators to wrap behavior around functions.
import functools
import time

def trace(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        log_event(
            name=func.__name__,
            args=args,
            kwargs=kwargs,
            duration_ms=int(duration * 1000),
        )
        return result
    return wrapper

@trace
def search_web(query: str) -> str:
    return call_api(query)
All functions marked with @trace become observable without changing their core logic.
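The decorator above records inputs and timing on the success path only. A slightly fuller sketch, under assumptions, also captures outputs and failures. Here `EVENTS` is a hypothetical in-memory sink standing in for `log_event`, and `shout` and `fail` are toy functions for illustration.

```python
import functools
import time

EVENTS: list[dict] = []  # hypothetical in-memory sink standing in for log_event


def trace(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        event = {"name": func.__name__, "args": args, "kwargs": kwargs}
        try:
            result = func(*args, **kwargs)
            event["output"] = result
            return result
        except Exception as exc:
            # Record the failure, then let the caller see the original error.
            event["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call is logged once.
            event["duration_ms"] = int((time.perf_counter() - start) * 1000)
            EVENTS.append(event)

    return wrapper


@trace
def shout(text: str) -> str:
    return text.upper()


@trace
def fail() -> None:
    raise RuntimeError("boom")


shout("hello")
```

Token usage is left out because it depends on the shape of the provider's response object.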
Benefits
- Clean separation between logic and observability.
- Easy integration with tracing tools (e.g., LangSmith, OpenTelemetry).
When Not to Use
- Extremely performance-critical paths where decorator overhead is not acceptable.
- Cases where you must see all logic inline for audit reasons.
5. Composite Pattern: Handling Multi-Step Workflows as Single Units
Problem
Complex AI flows often have nested steps:
- Research → Plan → Code → Review
- Retrieve → Generate → Validate → Persist
If you treat every step as a separate top-level entity, you lose structure and reuse.
Pattern
Use Composite so both atomic steps and groups of steps share the same interface.
class Task:
    def run(self, context: dict) -> dict:
        raise NotImplementedError

class SimpleTask(Task):
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, context: dict) -> dict:
        return self.handler(context)

class CompositeTask(Task):
    def __init__(self, name, children: list[Task]):
        self.name = name
        self.children = children

    def run(self, context: dict) -> dict:
        state = context
        for child in self.children:
            state = child.run(state)
        return state
Usage:
research = SimpleTask("research", research_handler)
draft = SimpleTask("draft", draft_handler)
review = SimpleTask("review", review_handler)
write_article = CompositeTask("write_article", [research, draft, review])
result = write_article.run({"topic": "multi-agent systems"})
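Because a composite is itself a `Task`, composites can nest. The self-contained sketch below repeats the classes above and adds a hypothetical `step` helper, whose handlers append their name to a `trail` list so the execution order is visible.

```python
class Task:
    def run(self, context: dict) -> dict:
        raise NotImplementedError


class SimpleTask(Task):
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, context: dict) -> dict:
        return self.handler(context)


class CompositeTask(Task):
    def __init__(self, name, children: list[Task]):
        self.name = name
        self.children = children

    def run(self, context: dict) -> dict:
        state = context
        for child in self.children:
            state = child.run(state)
        return state


def step(name: str):
    """Hypothetical handler factory: records the step name in context['trail']."""

    def handler(context: dict) -> dict:
        # Return a new dict so the caller's context is not mutated.
        return {**context, "trail": context.get("trail", []) + [name]}

    return handler


write = CompositeTask(
    "write",
    [SimpleTask("research", step("research")), SimpleTask("draft", step("draft"))],
)
# A composite can be a child of another composite.
publish = CompositeTask("publish", [write, SimpleTask("review", step("review"))])

result = publish.run({"topic": "multi-agent systems"})
```

The caller runs `publish` exactly as it would run a single `SimpleTask`.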
Benefits
- Treat multi-step workflows like single tasks.
- Compose larger flows from smaller units.
When Not to Use
- Very simple linear flows with no reuse.
- Cases where orchestration is already handled by an external engine (e.g., n8n, Airflow) and you do not need an in-code representation.
6. Chain of Responsibility: Tool and Agent Routing
Problem
A single agent or function tries to handle every request:
- Some queries need a web search.
- Others need database access.
- Others need code execution.
Hard-coded logic becomes a large if/else block.
Pattern
Use Chain of Responsibility to pass a request through a pipeline of handlers until one takes responsibility.
class Handler:
    def __init__(self, next_handler=None):
        self.next = next_handler

    def handle(self, request: dict):
        if self.next:
            return self.next.handle(request)
        return None

class WebSearchHandler(Handler):
    def handle(self, request: dict):
        if "search" in request["intent"]:
            return search_web(request["query"])
        return super().handle(request)

class DbQueryHandler(Handler):
    def handle(self, request: dict):
        if "db" in request["intent"]:
            return query_db(request["query"])
        return super().handle(request)
Usage:
pipeline = WebSearchHandler(
    next_handler=DbQueryHandler(
        next_handler=Handler()  # default fallback
    )
)
response = pipeline.handle({"intent": "search", "query": "vector databases"})
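Nested constructors get hard to read as the chain grows. One possible alternative is to build the chain from an ordered list. In this sketch, `IntentHandler` and `build_chain` are hypothetical helpers, and the lambdas stand in for `search_web` and `query_db`.

```python
class Handler:
    def __init__(self, next_handler=None):
        self.next = next_handler

    def handle(self, request: dict):
        if self.next:
            return self.next.handle(request)
        return None


class IntentHandler(Handler):
    """Hypothetical generic handler: claims requests whose intent contains a keyword."""

    def __init__(self, keyword, action, next_handler=None):
        super().__init__(next_handler)
        self.keyword = keyword
        self.action = action

    def handle(self, request: dict):
        if self.keyword in request["intent"]:
            return self.action(request["query"])
        return super().handle(request)


def build_chain(specs):
    """Link handlers so the first (keyword, action) pair is tried first."""
    head = None
    # Walk the list backwards so each new handler points at the one after it.
    for keyword, action in reversed(specs):
        head = IntentHandler(keyword, action, next_handler=head)
    return head


pipeline = build_chain(
    [
        ("search", lambda query: f"web results for {query}"),
        ("db", lambda query: f"rows matching {query}"),
    ]
)

response = pipeline.handle({"intent": "search", "query": "vector databases"})
```

Reordering or removing a handler then becomes an edit to the list.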
Benefits
- Each handler has a single responsibility.
- Easy to add, remove, or reorder steps.
When Not to Use
- When routing rules are simple and stable.
- When you already use a dedicated router component.
7. Mediator Pattern: Coordination in Multi-Agent Systems
Problem
In a multi-agent setup, agents start calling each other directly:
- Researcher calls Coder
- Coder calls Critic
- Critic calls Researcher again
This creates hidden coupling and complex feedback loops.
Pattern
Use a Mediator that manages communication. Agents talk to the mediator, not to each other.
class Mediator:
    def __init__(self, factory: AgentFactory):
        self.factory = factory

    def handle(self, task: dict) -> str:
        researcher = self.factory.create("researcher")
        coder = self.factory.create("coder")
        critic = self.factory.create("critic")
        research = researcher.run(task)
        code = coder.run({"spec": research})
        review = critic.run({"code": code})
        if review["status"] == "approve":
            return code
        return self._iterate(task, review)

    def _iterate(self, task: dict, review: dict) -> str:
        # implement revision loop
        ...
Agents stay simple. The mediator owns the workflow logic.
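The revision loop is left open above. One way it might look is a bounded loop that feeds review comments back to the coder and gives up after a fixed number of rounds. Everything here is an assumption for illustration: `revise_until_approved`, `max_rounds`, the `feedback` and `comments` keys, and the fake coder and critic. Only the `status` field and the `approve` value come from the code above.

```python
from typing import Callable


def revise_until_approved(
    spec: str,
    write_code: Callable[[dict], str],
    review_code: Callable[[dict], dict],
    max_rounds: int = 3,
) -> str:
    """Alternate between coder and critic until approval or the round limit."""
    feedback = None
    for _ in range(max_rounds):
        code = write_code({"spec": spec, "feedback": feedback})
        review = review_code({"code": code})
        if review["status"] == "approve":
            return code
        # Carry the critic's comments into the next attempt.
        feedback = review.get("comments")
    # A hard limit keeps a disagreeing coder and critic from looping forever.
    raise RuntimeError(f"No approved result after {max_rounds} rounds")


def fake_coder(request: dict) -> str:
    """Hypothetical coder: produces a better version once it has feedback."""
    return "v2" if request["feedback"] else "v1"


def fake_critic(request: dict) -> dict:
    """Hypothetical critic: approves only the revised version."""
    if request["code"] == "v2":
        return {"status": "approve"}
    return {"status": "revise", "comments": "add error handling"}


final = revise_until_approved("parse a CSV file", fake_coder, fake_critic)
```

Keeping the limit in the mediator, rather than in the agents, makes the failure mode explicit in one place.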
Benefits
- Central place to change orchestration.
- Easier to reason about loops and failure modes.
When Not to Use
- Single-agent systems with no collaboration.
- Extremely simple flows that do not justify a coordinator.
A Practical Adoption Roadmap
Do not try to implement all patterns at once. A safe sequence:
- Proxy: Put a proxy in front of your LLM provider to gain observability, cost control, and safety.
- Factory: Standardize how agents are created to remove setup duplication.
- Strategy: Add model routing to control cost and latency.
- Decorator: Add tracing and logging without touching business logic.
- Composite / Chain of Responsibility / Mediator: Introduce structured orchestration as workflows grow.
If you already feel your codebase is becoming “just scripts,” you are at the point where these patterns pay off.