Design Patterns Every AI Team Should Know

Use Factory, Strategy, Proxy, Decorator, Composite, Chain of Responsibility, and Mediator to build scalable, maintainable AI agent architectures.

TL;DR

  • Learn the core design patterns that actually matter for AI systems.
  • See how each pattern maps to real AI orchestration problems.
  • Adapt concrete Python examples to your own stack.
  • Understand trade-offs, not just theory.
  • Apply a simple roadmap to introduce patterns without over-engineering.

The Stateless Problem: Why AI Needs Architecture

LLMs are stateless.

Every time you send a prompt to GPT, Claude, or any model, it forgets everything outside the current request. It does not remember your previous message, the plan it created two steps ago, or the tools it just called. Any “memory” comes from the system around it, not from the model itself.

This means the orchestration layer must own:

  • Conversation history
  • Business rules
  • Tool routing
  • Execution state
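Take conversation history, the first item on that list. Because the model keeps nothing between calls, the orchestration layer has to resend the relevant history on every request. The sketch below shows this under stated assumptions. `Conversation` and `echo_model` are hypothetical names, and `echo_model` stands in for a real LLM call by reporting how many messages it received.

```python
from dataclasses import dataclass, field
from typing import Callable

# A "model" is any function mapping a full message list to a reply.
# Like a real LLM API, it keeps no state between calls.
ModelFn = Callable[[list[dict]], str]


@dataclass
class Conversation:
    """Orchestration-side memory: the model never stores this; we do."""
    system_prompt: str
    history: list[dict] = field(default_factory=list)

    def ask(self, model: ModelFn, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        # Every call resends the system prompt plus the entire history.
        messages = [{"role": "system", "content": self.system_prompt}, *self.history]
        reply = model(messages)
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages: list[dict]) -> str:
    # Hypothetical stand-in for an LLM: reports how much context it was given.
    return f"saw {len(messages)} messages"
```

The second call to `ask` sends four messages: the system prompt, the first user turn, the first reply, and the new user turn. The growing payload is the "memory" the model appears to have.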

If you ignore this, you end up with a pile of scripts: ad-hoc calls to the model, scattered prompt templates, and duplicated logic around tools and memory. It works for a demo. It breaks in production.

Design patterns help turn that mess into a clear, composable architecture.

Below are 7 patterns that map directly to real AI problems, with examples you can adapt.


1. Factory Pattern: Standardizing Agent Creation

Problem

You need different agents: "Coder", "Researcher", "Critic", "Planner". Each needs:

  • Specific system prompts
  • Model configuration (temperature, model name)
  • Tool access
  • Vector store or memory wiring

Hard-coding this setup everywhere leads to duplicated logic and bugs whenever you change anything.

Bad approach:

# scattered setup
researcher = Agent(
    role="researcher",
    model="gpt-4o",
    tools=[web_search, retrieve_docs],
    memory=DocStore("research_index"),
)

coder = Agent(
    role="coder",
    model="gpt-4o",
    tools=[code_exec, unit_test],
    memory=None,
)

Every time you change how a researcher should work, you edit several places.

Pattern

Use a Factory to centralize agent creation.

class AgentFactory:
    def __init__(self, vector_store, tools, models):
        self.vector_store = vector_store
        self.tools = tools
        self.models = models

    def create(self, agent_type: str):
        if agent_type == "researcher":
            return Agent(
                role="researcher",
                model=self.models["deep_reasoning"],
                tools=[self.tools["web_search"], self.tools["retrieval"]],
                memory=self.vector_store["research"],
            )
        if agent_type == "coder":
            return Agent(
                role="coder",
                model=self.models["code_first"],
                tools=[self.tools["code_exec"], self.tools["tests"]],
                memory=None,
            )
        if agent_type == "critic":
            return Agent(
                role="critic",
                model=self.models["deep_reasoning"],
                tools=[self.tools["style_checker"]],
                memory=None,
            )
        raise ValueError(f"Unknown agent type: {agent_type}")

Usage:

factory = AgentFactory(vector_store, tools, models)

researcher = factory.create("researcher")
coder = factory.create("coder")
critic = factory.create("critic")
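The `if` chain in `create` still grows by one branch per agent type. One common variation describes each agent type as data and lets the factory look it up. The sketch below makes several assumptions. `Agent` is a minimal hypothetical dataclass standing in for your framework's class. Tools and memory are plain strings rather than real objects. `AGENT_SPECS` and `register` are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    # Hypothetical minimal agent record; a real Agent class will differ.
    role: str
    model: str
    tools: list = field(default_factory=list)
    memory: Optional[str] = None


# Each agent type is described by data instead of an if-branch.
AGENT_SPECS = {
    "researcher": {"model": "deep_reasoning", "tools": ["web_search", "retrieval"], "memory": "research"},
    "coder": {"model": "code_first", "tools": ["code_exec", "tests"], "memory": None},
    "critic": {"model": "deep_reasoning", "tools": ["style_checker"], "memory": None},
}


class AgentFactory:
    def __init__(self, specs: dict, models: dict):
        self.specs = specs
        self.models = models  # logical model name -> concrete model id

    def register(self, agent_type: str, spec: dict) -> None:
        # New agent types are added without editing create().
        self.specs[agent_type] = spec

    def create(self, agent_type: str) -> Agent:
        try:
            spec = self.specs[agent_type]
        except KeyError:
            raise ValueError(f"Unknown agent type: {agent_type}") from None
        return Agent(
            role=agent_type,
            model=self.models[spec["model"]],
            tools=list(spec["tools"]),  # copy so agents cannot mutate the shared spec
            memory=spec["memory"],
        )
```

With this layout, adding a "planner" is a single `register` call. Swapping the model behind "deep_reasoning" is one edit to the `models` mapping.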

When to Use

  • You have more than one agent type.
  • You want to update behavior in one place.
  • You are building reusable frameworks or internal platforms.

When Not to Use

  • You have a single agent and do not expect variation.
  • Your use case is experimental and changing daily.

2. Strategy Pattern: Hot-Swapping “Brains”

Problem

Not every task needs GPT-4-level reasoning. Some tasks are simple:

  • Summarize a short paragraph
  • Extract a few fields
  • Reformat text

If all requests go to a heavy model, you burn budget and increase latency.

Hard-coded model usage:

def answer(query: str) -> str:
    # always uses the most expensive model
    return call_llm("gpt-4o", query)

Pattern

Use Strategy to define a family of “brain” options and pick the right one at runtime.

class ModelStrategy:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError

class CheapModelStrategy(ModelStrategy):
    def generate(self, prompt: str) -> str:
        return call_llm("gpt-4o-mini", prompt)

class ExpensiveModelStrategy(ModelStrategy):
    def generate(self, prompt: str) -> str:
        return call_llm("gpt-4o", prompt)

class Router:
    def __init__(self, cheap: ModelStrategy, expensive: ModelStrategy):
        self.cheap = cheap
        self.expensive = expensive

    def handle(self, task: dict) -> str:
        if self._is_simple(task):
            return self.cheap.generate(task["prompt"])
        return self.expensive.generate(task["prompt"])

    def _is_simple(self, task: dict) -> bool:
        return len(task["prompt"]) < 300 and not task.get("requires_tools")

Usage:

router = Router(
    cheap=CheapModelStrategy(),
    expensive=ExpensiveModelStrategy(),
)

response = router.handle({"prompt": user_input, "requires_tools": False})
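To check which strategy the router picks without calling a real provider, you can substitute fake strategies. The sketch below is self-contained and makes a few assumptions. `FakeModel` is a hypothetical stand-in for `call_llm` that tags each reply with its model name. `typing.Protocol` replaces the base class, so any object with a `generate(prompt)` method qualifies as a strategy. `max_simple_chars` is an illustrative parameter exposing the 300-character threshold used above.

```python
from typing import Protocol


class ModelStrategy(Protocol):
    def generate(self, prompt: str) -> str: ...


class FakeModel:
    """Hypothetical stand-in for call_llm(model, prompt); tags output with the model name."""

    def __init__(self, name: str):
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:20]}"


class Router:
    def __init__(self, cheap: ModelStrategy, expensive: ModelStrategy, max_simple_chars: int = 300):
        self.cheap = cheap
        self.expensive = expensive
        self.max_simple_chars = max_simple_chars

    def pick(self, task: dict) -> ModelStrategy:
        # Same heuristic as above: short prompts with no tool use go to the cheap model.
        simple = len(task["prompt"]) < self.max_simple_chars and not task.get("requires_tools")
        return self.cheap if simple else self.expensive

    def handle(self, task: dict) -> str:
        return self.pick(task).generate(task["prompt"])
```

Splitting `pick` out of `handle` makes the routing rule testable on its own. This matters once you start tuning the threshold against measured cost and quality.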

Benefits

  • Reduces cost by routing trivial tasks to cheaper models.
  • Keeps the calling code stable while you change routing rules.

When Not to Use

  • You always use a single fixed model.
  • You do not yet measure cost or latency.

3. Proxy Pattern: The Gatekeeper to Your LLM

Problem

Direct calls to LLM APIs can cause:

  • Uncontrolled cost
  • Rate-limit issues
  • Lack of observability
  • Compliance and PII risks

Code without a proxy:

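import openai

def ask_llm(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    # The SDK returns a completion object; the text lives in the first choice
    return response.choices[0].message.content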

Every part of your code calls the provider directly. You cannot enforce rules in one place.

Pattern

Use a Proxy between your app and the LLM provider.

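import hashlib
import json

class LLMProxy:
    # client, cache, limiter, and pii_filter come from your stack;
    # only the methods called below are assumed to exist on them.
    def __init__(self, client, cache, limiter, pii_filter):
        self.client = client
        self.cache = cache
        self.limiter = limiter
        self.pii_filter = pii_filter

    def generate(self, model: str, messages: list[dict]) -> str:
        key = self._cache_key(model, messages)

        cached = self.cache.get(key)
        if cached is not None:  # an empty-string response is still a valid cache hit
            return cached

        self.limiter.check_quota()

        safe_messages = self.pii_filter.strip(messages)

        response = self.client.chat(model=model, messages=safe_messages)

        self.cache.set(key, response)
        return response

    def _cache_key(self, model: str, messages: list[dict]) -> str:
        # Stable hash: JSON with sorted keys, so identical requests map to the same key
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()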

Usage:

proxy = LLMProxy(client, cache, limiter, pii_filter)

def ask_llm(prompt: str) -> str:
    return proxy.generate("gpt-4o", [{"role": "user", "content": prompt}])
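To see caching, quota, and redaction interact end to end, here is an in-memory sketch that runs without a provider. Every collaborator is hypothetical: `FakeClient` counts calls, `SimpleLimiter` is a fixed call budget rather than a real time-windowed rate limiter, and `strip_emails` is a toy PII filter that only redacts email addresses. One deliberate difference from the class above: this version redacts before hashing, so raw PII never influences cache keys.

```python
import hashlib
import json
import re


class FakeClient:
    """Hypothetical provider client; counts calls so caching is observable."""

    def __init__(self):
        self.calls = 0
        self.last_messages = None

    def chat(self, model: str, messages: list) -> str:
        self.calls += 1
        self.last_messages = messages
        return f"{model} reply #{self.calls}"


class QuotaExceeded(Exception):
    pass


class SimpleLimiter:
    # Hypothetical fixed budget of provider calls; real limiters use time windows.
    def __init__(self, max_calls: int):
        self.remaining = max_calls

    def check_quota(self) -> None:
        if self.remaining <= 0:
            raise QuotaExceeded("LLM call budget exhausted")
        self.remaining -= 1


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def strip_emails(messages: list) -> list:
    # Toy PII filter: redacts email addresses only.
    return [{**m, "content": EMAIL.sub("[EMAIL]", m["content"])} for m in messages]


class LLMProxy:
    def __init__(self, client, limiter, pii_filter):
        self.client = client
        self.limiter = limiter
        self.pii_filter = pii_filter
        self.cache = {}  # in-memory dict; a shared store would replace this in production

    def generate(self, model: str, messages: list) -> str:
        safe = self.pii_filter(messages)  # redact before hashing, caching, or sending
        key = hashlib.sha256(
            json.dumps({"model": model, "messages": safe}, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hits do not consume quota
        self.limiter.check_quota()
        response = self.client.chat(model=model, messages=safe)
        self.cache[key] = response
        return response
```

Checking the cache before the limiter means repeated identical prompts stay free. The order of these checks is a policy decision worth making explicitly in your own proxy.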

Benefits

  • Central control of cost, logs, and safety.
  • Easy integration with tools like LiteLLM or Helicone.
  • Keeps business logic free of provider-specific plumbing.

When Not to Use

  • Local-only experiments.
  • One-off scripts where cost and safety are not concerns.

4. Decorator Pattern: Observability Without Clutter

Problem

You want to log:

  • Inputs
  • Outputs
  • Token usage
  • Execution time

Adding logging code inside every function makes it noisy.

Bad approach:

def search_web(query: str) -> str:
    start = time.time()
    print("search_web called with:", query)
    result = call_api(query)
    print("result:", result, "took", time.time() - start)
    return result

Pattern

Use Decorators to wrap behavior around functions.

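import functools
import time

def trace(func):
    @functools.wraps(func)  # keep func.__name__ and the docstring intact for logs and tooling
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start

        # log_event is your logging or tracing sink
        log_event(
            name=func.__name__,
            args=args,
            kwargs=kwargs,
            duration_ms=int(duration * 1000),
        )
        return result
    return wrapper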

@trace
def search_web(query: str) -> str:
    return call_api(query)

All functions marked with @trace become observable without changing their core logic.
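A `trace` decorator that logs only after a successful return records nothing when the wrapped tool raises. Those are often the calls you most need to see. The sketch below records every call, successful or not, using `try`/`finally`. It makes two assumptions. `EVENTS` is a hypothetical in-memory list standing in for a real tracing backend. `search_web` and `flaky_tool` are stubs.

```python
import functools
import time

EVENTS: list = []  # hypothetical in-memory sink; replace with your tracing backend


def trace(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            status = "error"
            raise  # re-raise so callers still see the failure
        finally:
            # Runs on success and on failure, so errors are traced too.
            EVENTS.append({
                "name": func.__name__,
                "status": status,
                "duration_ms": int((time.perf_counter() - start) * 1000),
            })
    return wrapper


@trace
def search_web(query: str) -> str:
    return f"results for {query}"  # stub in place of call_api(query)


@trace
def flaky_tool() -> str:
    raise RuntimeError("upstream timeout")  # stub that always fails
```

The exception still propagates to the caller. The decorator only observes; it does not swallow errors or change return values.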

Benefits

  • Clean separation between logic and observability.
  • Easy integration with tracing tools (e.g., LangSmith, OpenTelemetry).

When Not to Use

  • Extremely performance-critical paths where decorator overhead is not acceptable.
  • Cases where you must see all logic inline for audit reasons.

5. Composite Pattern: Handling Multi-Step Workflows as Single Units

Problem

Complex AI flows often have nested steps:

  • Research → Plan → Code → Review
  • Retrieve → Generate → Validate → Persist

If you treat every step as a separate top-level entity, you lose structure and reuse.

Pattern

Use Composite so both atomic steps and groups of steps share the same interface.

class Task:
    def run(self, context: dict) -> dict:
        raise NotImplementedError

class SimpleTask(Task):
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, context: dict) -> dict:
        return self.handler(context)

class CompositeTask(Task):
    def __init__(self, name, children: list[Task]):
        self.name = name
        self.children = children

    def run(self, context: dict) -> dict:
        state = context
        for child in self.children:
            state = child.run(state)
        return state

Usage:

research = SimpleTask("research", research_handler)
draft = SimpleTask("draft", draft_handler)
review = SimpleTask("review", review_handler)

write_article = CompositeTask("write_article", [research, draft, review])

result = write_article.run({"topic": "multi-agent systems"})
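The example above is flat. Composite pays off when one composite contains another, so a reusable sub-workflow can be dropped into a larger one. The self-contained sketch below repeats the classes from above and nests a retrieve-and-generate composite inside a larger pipeline. The handlers are hypothetical lambdas that return new dicts instead of mutating their input.

```python
class Task:
    def run(self, context: dict) -> dict:
        raise NotImplementedError


class SimpleTask(Task):
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, context: dict) -> dict:
        return self.handler(context)


class CompositeTask(Task):
    def __init__(self, name, children):
        self.name = name
        self.children = children

    def run(self, context: dict) -> dict:
        state = context
        for child in self.children:
            state = child.run(state)  # a child may itself be a CompositeTask
        return state


# Hypothetical handlers: each returns a new dict with one extra key.
retrieve = SimpleTask("retrieve", lambda ctx: {**ctx, "docs": ["doc1", "doc2"]})
generate = SimpleTask("generate", lambda ctx: {**ctx, "draft": f"draft from {len(ctx['docs'])} docs"})
validate = SimpleTask("validate", lambda ctx: {**ctx, "valid": ctx["draft"].startswith("draft")})

rag = CompositeTask("rag", [retrieve, generate])       # reusable sub-workflow
pipeline = CompositeTask("pipeline", [rag, validate])  # a composite inside a composite
```

Callers invoke `pipeline.run(...)` exactly as they would a single `SimpleTask`. `rag` can be reused in any other workflow without changes.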

Benefits

  • Treat multi-step workflows like single tasks.
  • Compose larger flows from smaller units.

When Not to Use

  • Very simple linear flows with no reuse.
  • Cases where orchestration is already handled by an external engine (e.g., n8n, Airflow) and you do not need an in-code representation.

6. Chain of Responsibility: Tool and Agent Routing

Problem

A single agent or function tries to handle every request:

  • Some queries need a web search.
  • Others need database access.
  • Others need code execution.

Hard-coded logic becomes a large if/else block.

Pattern

Use Chain of Responsibility to pass a request through a pipeline of handlers until one takes responsibility.

class Handler:
    def __init__(self, next_handler=None):
        self.next = next_handler

    def handle(self, request: dict):
        if self.next:
            return self.next.handle(request)
        return None

class WebSearchHandler(Handler):
    def handle(self, request: dict):
        # Exact match: a substring test such as "db" in intent would also match "feedback"
        if request.get("intent") == "search":
            return search_web(request["query"])
        return super().handle(request)

class DbQueryHandler(Handler):
    def handle(self, request: dict):
        if request.get("intent") == "db":
            return query_db(request["query"])
        return super().handle(request)

Usage:

pipeline = WebSearchHandler(
    next_handler=DbQueryHandler(
        next_handler=Handler()  # default fallback
    )
)

response = pipeline.handle({"intent": "search", "query": "vector databases"})
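Nesting constructors makes reordering the chain awkward. A small builder can link handlers from a plain list instead. The self-contained sketch below relies on several assumptions. `IntentHandler` is a hypothetical generic handler that claims a request when its `intent` matches exactly. `build_chain` is an illustrative helper. The lambdas stand in for real tools such as `search_web` and `query_db`.

```python
class Handler:
    def __init__(self, next_handler=None):
        self.next = next_handler

    def can_handle(self, request: dict) -> bool:
        return False

    def process(self, request: dict):
        raise NotImplementedError

    def handle(self, request: dict):
        if self.can_handle(request):
            return self.process(request)
        if self.next is not None:
            return self.next.handle(request)
        return None  # nobody in the chain claimed the request


class IntentHandler(Handler):
    # Hypothetical generic handler: claims requests whose intent matches exactly.
    def __init__(self, intent: str, tool, next_handler=None):
        super().__init__(next_handler)
        self.intent = intent
        self.tool = tool

    def can_handle(self, request: dict) -> bool:
        return request.get("intent") == self.intent

    def process(self, request: dict):
        return self.tool(request["query"])


def build_chain(handlers: list) -> Handler:
    # Link handlers in list order, so reordering is a one-line change.
    for current, nxt in zip(handlers, handlers[1:]):
        current.next = nxt
    return handlers[0]


chain = build_chain([
    IntentHandler("search", lambda q: f"web:{q}"),  # stub for search_web
    IntentHandler("db", lambda q: f"sql:{q}"),      # stub for query_db
])
```

Separating `can_handle` from `process` keeps each handler's routing rule explicit. An unmatched intent such as "feedback" falls through to `None` instead of being caught by a substring check.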

Benefits

  • Each handler has a single responsibility.
  • Easy to add, remove, or reorder steps.

When Not to Use

  • When routing rules are simple and stable.
  • When you already use a dedicated router component.

7. Mediator Pattern: Coordination in Multi-Agent Systems

Problem

In a multi-agent setup, agents start calling each other directly:

  • Researcher calls Coder
  • Coder calls Critic
  • Critic calls Researcher again

This creates hidden coupling and complex feedback loops.

Pattern

Use a Mediator that manages communication. Agents talk to the mediator, not to each other.

class Mediator:
    def __init__(self, factory: AgentFactory):
        self.factory = factory

    def handle(self, task: dict) -> str:
        researcher = self.factory.create("researcher")
        coder = self.factory.create("coder")
        critic = self.factory.create("critic")

        research = researcher.run(task)
        code = coder.run({"spec": research})
        review = critic.run({"code": code})

        if review["status"] == "approve":
            return code
        return self._iterate(task, review)

    def _iterate(self, task: dict, review: dict) -> str:
        # implement revision loop
        ...

Agents stay simple. The mediator owns the workflow logic.
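The revision loop left open in `_iterate` is where multi-agent systems most often run away. The self-contained sketch below is one way to bound it, under stated assumptions. `StubAgent` is a hypothetical wrapper exposing `.run(payload)`. Agents are passed in directly rather than built by the factory. The critic's reply shape (`status`, `notes`) follows the example above, with `notes` added as an assumption. `max_rounds` is an illustrative cap.

```python
class StubAgent:
    """Hypothetical agent: wraps a function so the mediator can call .run(payload)."""

    def __init__(self, fn):
        self.fn = fn

    def run(self, payload: dict):
        return self.fn(payload)


class Mediator:
    def __init__(self, researcher, coder, critic, max_rounds: int = 3):
        self.researcher = researcher
        self.coder = coder
        self.critic = critic
        self.max_rounds = max_rounds  # hard cap so coder/critic feedback cannot loop forever

    def handle(self, task: dict) -> dict:
        spec = self.researcher.run(task)
        feedback = None
        code = None
        for round_no in range(1, self.max_rounds + 1):
            code = self.coder.run({"spec": spec, "feedback": feedback})
            review = self.critic.run({"code": code})
            if review["status"] == "approve":
                return {"status": "approved", "code": code, "rounds": round_no}
            feedback = review.get("notes")  # only the mediator passes critique back
        return {"status": "gave_up", "code": code, "rounds": self.max_rounds}


researcher = StubAgent(lambda task: f"spec for {task['goal']}")
coder = StubAgent(lambda p: "v2" if p["feedback"] else "v1")
critic = StubAgent(
    lambda p: {"status": "approve"} if p["code"] == "v2" else {"status": "revise", "notes": "add tests"}
)
mediator = Mediator(researcher, coder, critic)
```

The agents never reference each other. The loop, its exit condition, and the "gave_up" failure mode all live in one place.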

Benefits

  • Central place to change orchestration.
  • Easier to reason about loops and failure modes.

When Not to Use

  • Single-agent systems with no collaboration.
  • Extremely simple flows that do not justify a coordinator.

A Practical Adoption Roadmap

Do not try to implement all patterns at once. A safe sequence:

  1. Proxy: Put a proxy in front of your LLM provider to gain observability, cost control, and safety.
  2. Factory: Standardize how agents are created to remove setup duplication.
  3. Strategy: Add model routing to control cost and latency.
  4. Decorator: Add tracing and logging without touching business logic.
  5. Composite / Chain of Responsibility / Mediator: Introduce structured orchestration as workflows grow.

If you already feel your codebase is becoming “just scripts,” you are at the point where these patterns pay off.

