How should I provide an agent to a LangGraph server?

I’m using LangGraph to serve an agent with langgraph dev --no-browser.

The code below provides a get_agent function that returns a CompiledStateGraph object. Notice the _agent variable is used as a global variable to prevent the function from creating a new agent (or CompiledStateGraph object) each time the function is called.

# joker_agent.py

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient


_agent: CompiledStateGraph | None = None


async def get_agent() -> CompiledStateGraph:
    """Build and cache the compiled agent graph."""
    global _agent

    if _agent is not None:
        return _agent

    mcp_client = MultiServerMCPClient(
        {
            "ihub": {
                "transport": "http",
                "url": "",
            },
        }
    )

    tools = await mcp_client.get_tools()

    # Setup LLM for the agent
    model = AzureChatOpenAI(
        azure_endpoint="",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    _agent = create_agent(model, tools, system_prompt="You are a hilarious assistant agent")

    return _agent

The langgraph.json configuration file is shown below. The path points to the get_agent function which returns the agent (graph) object.

{
  "$schema": "https://langgra.ph/schema.json",
  "dependencies": ["."],
  "graphs": {
    "Joker Remote Agent": {
      "path": "./src/joker_agent.py:get_agent",
      "description": "Joker-style conversational agent"
    }
  },
  "env": "./.env",
  "python_version": "3.13"
}

Is it necessary to use a global object such as the _agent variable in my example for serving the agent? I typically avoid creating expensive objects each time a function is called but I’m not sure if that is necessary here. I was thinking that since the agent is being “served” to clients (users) then you don’t want the agent (graph) object being created on every request. Any guidance on how to provide the agent to the LangGraph server would be very helpful.

hi @wigging

How to Provide an Agent to a LangGraph Server

Your concern is valid: with a plain async factory function the global cache does real work, because the server calls the factory on every request, including schema introspection (Studio refresh, get_graph, get_schema), state reads, and actual execution. Without caching you would pay the Azure OpenAI + MCP initialization cost on every introspection call.

However, there’s a better, officially documented pattern.

How the server loads your graph

From langgraph_cli/schemas.py, the graphs field supports three forms:

  1. Module-level compiled object - imported once at server startup, never called again
  2. Async context manager factory - called per-request, receives RunnableConfig (legacy) or ServerRuntime (modern)
  3. Async function factory - same as above, but returns instead of yielding

The server calls your factory in 4 contexts: threads.create_run (actual execution), threads.update, threads.read (state history, used by Studio’s useStream), and assistants.read (schema introspection). This is why the caching is needed with a plain factory.
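
To make the cost difference concrete, here is a minimal stdlib-only sketch. Nothing in it is LangGraph API: build_agent, plain_factory, cached_factory and simulate_server are hypothetical stand-ins, and the four context names are simply the ones listed above. It only counts how often the expensive setup runs when a factory is called once per context.

```python
import asyncio

build_count = 0  # number of times the expensive setup ran
_cached = None   # module-level cache, like _agent in the question


async def build_agent() -> str:
    """Stand-in for expensive setup (model client, MCP tool listing)."""
    global build_count
    build_count += 1
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"agent-{build_count}"


async def plain_factory() -> str:
    """Rebuilds the agent on every call."""
    return await build_agent()


async def cached_factory() -> str:
    """Builds once, then returns the cached object."""
    global _cached
    if _cached is None:
        _cached = await build_agent()
    return _cached


async def simulate_server(factory) -> list:
    """Call the factory once per context (hypothetical simulation)."""
    contexts = [
        "assistants.read",
        "threads.read",
        "threads.update",
        "threads.create_run",
    ]
    return [await factory() for _ in contexts]


def count_builds(factory) -> int:
    """Return how many builds one simulated round of calls triggers."""
    before = build_count
    asyncio.run(simulate_server(factory))
    return build_count - before
```

In this simulation the plain factory builds once per call, while the cached factory builds only on its first call and never again for the life of the process.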


The recommended modern pattern: ServerRuntime (server v0.7.30+)

From langgraph_sdk/runtime.py:

import contextlib
from langchain.agents import create_agent
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph_sdk.runtime import ServerRuntime

llm = AzureChatOpenAI(
    azure_deployment="your-deployment",
    azure_endpoint="https://...",
    api_version="2024-02-01",
)

# Lightweight agent for introspection - no MCP connection needed
_base_agent = create_agent(llm, tools=[])

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        # Only connect to MCP during actual runs
        async with MultiServerMCPClient({...}) as mcp:
            tools = await mcp.get_tools()
            yield create_agent(llm, tools=tools)
    else:
        # Schema reads, Studio refresh - skip expensive MCP setup
        yield _base_agent

langgraph.json:

{
  "graphs": {
    "joker_agent": {
      "path": "./src/joker_agent.py:get_agent",
      "description": "Joker agent with Azure OpenAI and MCP tools"
    }
  }
}

Why this is better than the global cache

| Concern | Global cache | ServerRuntime factory |
|---|---|---|
| Avoids re-init on every call | Yes (cached) | Yes (execution_runtime guard) |
| Handles MCP disconnects | No (connection held forever) | Yes (fresh per run, teardown after yield) |
| Skips MCP during introspection | No | Yes |
| Proper cleanup | No | Yes (code after yield) |

The global caching pattern is fragile for MCP specifically: connections can time out while the server keeps running, and the cached agent won’t reconnect. The ServerRuntime context manager solves this by connecting fresh per execution and tearing down cleanly.
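
The "teardown after yield" behaviour is plain contextlib and can be sketched without any LangGraph or MCP code. FakeConnection below is a hypothetical stand-in for a per-run resource, and get_agent here takes no runtime argument, so treat it as an illustration of the mechanism only.

```python
import asyncio
import contextlib

events = []  # records the order of setup, use, and teardown


class FakeConnection:
    """Hypothetical stand-in for a per-run resource such as an MCP session."""

    async def open(self) -> None:
        events.append("open")

    async def close(self) -> None:
        events.append("close")


@contextlib.asynccontextmanager
async def get_agent():
    conn = FakeConnection()
    await conn.open()
    try:
        # The caller uses the yielded object while the connection is open.
        yield "agent-using-connection"
    finally:
        # Runs when the caller leaves the `async with` block, even on error.
        await conn.close()


async def run_once() -> None:
    async with get_agent() as agent:
        events.append(f"run:{agent}")


asyncio.run(run_once())
```

Each pass through run_once opens a fresh connection and closes it afterwards, which is the property the global cache lacks.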


When to use each pattern

| Scenario | Recommended pattern |
|---|---|
| No async init (sync tools only) | Module-level object: graph = create_agent(...) |
| MCP tools, async resources | ServerRuntime async context manager (v0.7.30+) |
| Older server, async resources | RunnableConfig async context manager |
| Per-user graph customization | ServerRuntime factory using runtime.ensure_user() |

I get the following error when I try the ServerRuntime approach:

As of langchain-mcp-adapters 0.1.0, MultiServerMCPClient cannot be used as a context manager (e.g., async with MultiServerMCPClient(...)). Instead, you can do one of the following:
1. client = MultiServerMCPClient(...)
   tools = await client.get_tools()
2. client = MultiServerMCPClient(...)
   async with client.session(server_name) as session:
       tools = await load_mcp_tools(session)

hi @wigging

MultiServerMCPClient dropped context manager support in 0.1.0
The async with MultiServerMCPClient({...}) as mcp: syntax was removed. You now have two correct approaches:

Option A - client.get_tools() (simplest):

client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
tools = await client.get_tools()

Option B - client.session() + load_mcp_tools() (explicit per-server control, what deepagents CLI uses internally in mcp_tools.py):

from contextlib import AsyncExitStack
from langchain_mcp_adapters.tools import load_mcp_tools

client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
async with AsyncExitStack() as stack:
    session = await stack.enter_async_context(client.session("your-server"))
    tools = await load_mcp_tools(session, server_name="your-server")
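
If AsyncExitStack is unfamiliar, this stdlib-only sketch shows what it does with the sessions it collects. fake_session is a hypothetical stand-in for client.session(...), and the server names are made up.

```python
import asyncio
import contextlib
from contextlib import AsyncExitStack

log = []  # records open/close order


@contextlib.asynccontextmanager
async def fake_session(server_name: str):
    """Hypothetical stand-in for client.session(server_name)."""
    log.append(f"open {server_name}")
    try:
        yield f"session:{server_name}"
    finally:
        log.append(f"close {server_name}")


async def main() -> list:
    async with AsyncExitStack() as stack:
        sessions = []
        for name in ["alpha", "beta"]:
            # Each session stays open until the stack itself exits.
            sessions.append(await stack.enter_async_context(fake_session(name)))
        log.append("all sessions open")
    # Leaving the block closed the sessions in reverse order of entry.
    return sessions


sessions = asyncio.run(main())
```

The practical consequence is that anything relying on those sessions has to be used inside the async with block, because the sessions are closed as soon as it exits.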

Updated ServerRuntime pattern (post-0.1.0)

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
        async with AsyncExitStack() as stack:
            all_tools = []
            for server_name in MCP_CONNECTIONS:
                session = await stack.enter_async_context(client.session(server_name))
                tools = await load_mcp_tools(session, server_name=server_name)
                all_tools.extend(tools)
            yield create_agent(llm, tools=all_tools)
            # AsyncExitStack cleans up all sessions on exit
    else:
        yield _base_agent

Or the minimal form using get_tools(), which loads tools from every server in MCP_CONNECTIONS and needs no explicit session handling:

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        tools = await MultiServerMCPClient(connections=MCP_CONNECTIONS).get_tools()
        yield create_agent(llm, tools=tools)
    else:
        yield _base_agent

Alternative (load once at startup): If the MCP server is stable, compile the graph at module level using asyncio.run() - that’s exactly what the deepagents CLI does in server_graph.py. Trade-off: if the MCP session drops, you must restart the LangGraph server process to reconnect.
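
A minimal sketch of the load-once idea, using only the standard library. make_graph here is a hypothetical stand-in that returns a dict instead of a compiled graph, and the sketch says nothing about how any particular server imports the module.

```python
import asyncio

build_count = 0  # number of times the async setup ran


async def make_graph() -> dict:
    """Hypothetical stand-in for the async setup (tool listing, create_agent)."""
    global build_count
    build_count += 1
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"tools": ["tool_a", "tool_b"]}


# Runs once, at import time. asyncio.run() raises RuntimeError if an event
# loop is already running in this thread, so this only works when the
# module is imported from synchronous code.
graph = asyncio.run(make_graph())
```

The module-level name graph is then what the config path would point at, instead of a factory function.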

Is there any documentation that discusses using the LangGraph server for an agent with MCP tools? I found this on the langchain-mcp-adapters repo but it doesn’t talk about using the ServerRuntime approach that you are suggesting. It just shows an async function that returns the agent directly.

Hi @wigging

The README pattern is officially supported - and it’s simpler than ServerRuntime

You're right: the langchain-mcp-adapters README recommends a plain async function, with no context manager and no ServerRuntime:

async def make_graph():
    client = MultiServerMCPClient({...})
    tools = await client.get_tools()
    return create_agent("openai:gpt-4.1", tools)

langgraph.json:

{ "graphs": { "agent": "./graph.py:make_graph" } }

Why this works (verified against client.py)

Looking at the actual source in langchain-mcp-adapters/langchain_mcp_adapters/client.py:152, get_tools() calls load_mcp_tools(None, connection=...) - passing session=None. The returned tools are thin wrappers that open a fresh MCP session per tool call using the saved connection. There’s no persistent session to manage, which is why no context manager is needed.

The docstring confirms this: “A new session will be created for each tool call”.
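
The difference between a session per tool call and one shared session can be sketched with a counter. Everything below is a hypothetical stand-in (open_session, call_tool_fresh, call_tool_shared); it does not use or reproduce the langchain-mcp-adapters implementation.

```python
import asyncio
import contextlib

sessions_opened = 0  # total sessions opened so far


@contextlib.asynccontextmanager
async def open_session():
    """Hypothetical stand-in for opening one MCP session."""
    global sessions_opened
    sessions_opened += 1
    yield object()


async def call_tool_fresh(arg: str) -> str:
    """Tool wrapper that opens its own session for each call."""
    async with open_session():
        return f"result:{arg}"


async def call_tool_shared(session: object, arg: str) -> str:
    """Tool wrapper bound to a session that the caller keeps open."""
    return f"result:{arg}"


async def count_fresh(calls: int) -> int:
    """Sessions opened when every tool call opens its own session."""
    start = sessions_opened
    for i in range(calls):
        await call_tool_fresh(str(i))
    return sessions_opened - start


async def count_shared(calls: int) -> int:
    """Sessions opened when one session is reused for all tool calls."""
    start = sessions_opened
    async with open_session() as session:
        for i in range(calls):
            await call_tool_shared(session, str(i))
    return sessions_opened - start
```

With three tool calls, the per-call variant opens three sessions and the shared variant opens one. How much that matters depends on how expensive a session is for the transport in use.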

Trade-offs vs ServerRuntime

| Pattern | Per-request cost | Per-tool-call cost |
|---|---|---|
| README make_graph() + get_tools() | One list_tools() round-trip per server (also runs for introspection) | New MCP session per tool call |
| ServerRuntime + client.session() + AsyncExitStack | One session opened per execution, skipped for introspection | Reuses the same session for all tool calls in the run |
| Module-level graph = asyncio.run(make_graph()) | Nothing per request | New session per tool call |

Critical caveat - stdio is bad for servers

The README itself flags this inside the example:

ATTENTION: MCP’s stdio transport was designed primarily to support applications running on a user’s machine. Before using stdio in a web server context, evaluate whether there’s a more appropriate solution.

For a Studio refresh + one agent run with N tool calls, the README pattern with stdio spawns 1 + N subprocesses per server. If you’re stuck on stdio in a server context, use the client.session() + AsyncExitStack pattern from my previous reply to keep one subprocess alive per run.

About your original get_agent() cache

Compared to the README’s make_graph(), the only difference is the global cache. With get_tools(), the cache mostly saves the initial tool-listing round-trip - the cached tools still open a new session per invocation. If you’re on HTTP/SSE, drop the cache and use the README pattern. Keep it only if you have other heavy non-MCP startup work.

Recommendation

| Scenario | Pattern |
|---|---|
| HTTP/SSE, simplicity matters | README async def make_graph() |
| stdio (or many tool calls per run) | ServerRuntime + client.session() |
| Skip MCP during Studio/schema reads | ServerRuntime with runtime.execution_runtime guard |
| Per-user tool customization | ServerRuntime + runtime.ensure_user() |

The README’s make_graph() is the right baseline. Reach for ServerRuntime only when you have a concrete reason.

So based on all the replies, it looks like I just need to define an async function that returns the agent as shown below. There is no need for a global agent variable or any of the server runtime stuff since I’m dealing with HTTP/SSE tools. Am I understanding this correctly?

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient

async def get_agent() -> CompiledStateGraph:
    mcp_client = MultiServerMCPClient(
        {
            "mymcp": {
                "transport": "http",
                "url": "https://address.com/mcp",
            },
        }
    )

    tools = await mcp_client.get_tools()

    model = AzureChatOpenAI(
        azure_endpoint="",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    agent = create_agent(
        model,
        tools,
        system_prompt="You are a hilarious, yet serious, assistant with access to MCP tools",
    )

    return agent

Yes, it looks correct.

For HTTP/SSE MCP servers, the plain async def get_agent() returning a CompiledStateGraph is exactly the right pattern. It matches the official make_graph() example from the langchain-mcp-adapters README. The reasoning, restated:

  • The server calls get_agent() per request, but every step it runs is cheap (see below), so a global cache would save very little
  • MultiServerMCPClient(connections=...) is just a config object - free to construct
  • get_tools() does one list_tools HTTP round-trip per server - fast over HTTP
  • The returned tools each open a short-lived session per invocation (per langchain_mcp_adapters/client.py:152 docstring: “A new session will be created for each tool call”). Nothing to manage, nothing to clean up
  • create_agent(...) just compiles an in-memory StateGraph - cheap

Dropping both the _agent global and the ServerRuntime wrapper is the right call.

When to revisit

| Symptom | Switch to |
|---|---|
| Studio sluggish on every refresh | ServerRuntime with runtime.execution_runtime guard |
| Many tool calls per run, latency accumulates | ServerRuntime + client.session() + AsyncExitStack |
| Per-user MCP auth headers | ServerRuntime factory with runtime.ensure_user() |
| Switch to stdio transport | Don’t - but if you must, use client.session() to keep one subprocess alive |

Until one of those bites, the current code is the right answer.

Optional micro-tweak

Hoist the LLM to module scope so its internal httpx.AsyncClient connection pool gets reused across requests:

_llm = AzureChatOpenAI(azure_deployment="gpt-5.2", ...)

async def get_agent() -> CompiledStateGraph:
    client = MultiServerMCPClient(connections={...})
    tools = await client.get_tools()
    return create_agent(_llm, tools=tools, system_prompt="...")

Tiny win, free to do. That is the entire optimization story - ship the simple version.

How about using Python’s cache feature to cache the language model? I think this is a cleaner approach and avoids defining a global variable.

import os
from functools import cache

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient


@cache
def get_model() -> AzureChatOpenAI:
    """Get a cached language model for the agent."""
    llm = AzureChatOpenAI(
        azure_endpoint="blahblah",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    return llm


async def get_agent() -> CompiledStateGraph:
    """Build the compiled agent graph."""
    token = os.environ["blahblah"]

    # Setup the MCP server for the agent
    mcp_client = MultiServerMCPClient(
        {
            "my-mcp-server": {
                "transport": "http",
                "url": MCP_URL,
                "headers": {"Authorization": f"Token {token}"},
            },
        }
    )

    # Setup MCP tools for the agent
    tools = await mcp_client.get_tools()

    # Get the language model for the agent
    llm = get_model()

    # Define agent with system prompt for working with the MCP server
    agent = create_agent(
        llm,
        tools,
        system_prompt="You are a helpful assistant with access to some MCP tools"
    )

    return agent
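
For reference, a minimal sketch of how functools.cache behaves on a zero-argument function like get_model(). FakeModel is a hypothetical stand-in for AzureChatOpenAI, so the sketch runs without any credentials.

```python
from functools import cache


class FakeModel:
    """Hypothetical stand-in for AzureChatOpenAI."""

    instances = 0  # counts how many objects were constructed

    def __init__(self) -> None:
        FakeModel.instances += 1


@cache
def get_model() -> FakeModel:
    """Construct the model on first call and reuse it afterwards."""
    return FakeModel()


first = get_model()
second = get_model()  # served from the cache, no new object
```

Both calls return the same object, so the effect is the same as a module-level variable: the instance still lives for the whole process, only the bookkeeping moves into the decorator. Calling get_model.cache_clear() forces a rebuild on the next call.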