How should I provide an agent to a LangGraph server?

wigging · April 23, 2026, 1:54pm

I’m using LangGraph to serve an agent with langgraph dev --no-browser.

The code below provides a get_agent function that returns a CompiledStateGraph object. Notice the _agent variable is used as a global variable to prevent the function from creating a new agent (or CompiledStateGraph object) each time the function is called.

# joker_agent.py

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient


_agent: CompiledStateGraph | None = None


async def get_agent() -> CompiledStateGraph:
    """Build and cache the compiled agent graph."""
    global _agent

    if _agent is not None:
        return _agent

    mcp_client = MultiServerMCPClient(
        {
            "ihub": {
                "transport": "http",
                "url": "",
            },
        }
    )

    tools = await mcp_client.get_tools()

    # Setup LLM for the agent
    model = AzureChatOpenAI(
        azure_endpoint="",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    _agent = create_agent(model, tools, system_prompt="You are a hilarious assistant agent")

    return _agent

The langgraph.json configuration file is shown below. The path points to the get_agent function which returns the agent (graph) object.

{
  "$schema": "https://langgra.ph/schema.json",
  "dependencies": ["."],
  "graphs": {
    "Joker Remote Agent": {
      "path": "./src/joker_agent.py:get_agent",
      "description": "Joker-style conversational agent"
    }
  },
  "env": "./.env",
  "python_version": "3.13"
}

Is it necessary to use a global object such as the _agent variable in my example for serving the agent? I typically avoid creating expensive objects each time a function is called but I’m not sure if that is necessary here. I was thinking that since the agent is being “served” to clients (users) then you don’t want the agent (graph) object being created on every request. Any guidance on how to provide the agent to the LangGraph server would be very helpful.

pawel-twardziak · April 23, 2026, 2:17pm

hi @wigging

How to Provide an Agent to a LangGraph Server

Your concern is valid: with a plain async factory function, the global cache is necessary because the server calls the factory for every request, including schema introspection (Studio refresh, get_graph, get_schema), state reads, and actual execution. Without caching you’d pay the Azure OpenAI + MCP initialization cost on every introspection call.

However, there’s a better, officially documented pattern.

How the server loads your graph

From langgraph_cli/schemas.py, the graphs field supports three forms:

1. Module-level compiled object - imported once at server startup, never called again
2. Async context manager factory - called per-request, receives RunnableConfig (legacy) or ServerRuntime (modern)
3. Async function factory - same as the context manager factory, but returns instead of yielding

The server calls your factory in 4 contexts: threads.create_run (actual execution), threads.update, threads.read (state history, used by Studio’s useStream), and assistants.read (schema introspection). This is why the caching is needed with a plain factory.
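
To make that cost concrete, here is a minimal sketch in plain asyncio. It is not the LangGraph server: `build_agent`, `uncached_factory`, `cached_factory`, and the `CONTEXTS` tuple are hypothetical stand-ins, and the tuple only mirrors the four contexts named above. It counts how often the expensive build runs with and without a module-level cache.

```python
import asyncio

# Hypothetical stand-ins for the four server contexts named above.
CONTEXTS = ("threads.create_run", "threads.update", "threads.read", "assistants.read")

build_count = 0
_cached = None


async def build_agent() -> dict:
    """Stand-in for the expensive setup (model client, MCP tool listing)."""
    global build_count
    build_count += 1
    await asyncio.sleep(0)  # pretend to do network I/O
    return {"name": "agent", "build": build_count}


async def uncached_factory() -> dict:
    # Rebuilds on every call.
    return await build_agent()


async def cached_factory() -> dict:
    # Builds on the first call only, then reuses the module-level object.
    global _cached
    if _cached is None:
        _cached = await build_agent()
    return _cached


async def simulate(factory) -> int:
    """Call the factory once per context and return how many builds ran."""
    global build_count
    build_count = 0
    for _ in CONTEXTS:
        await factory()
    return build_count


uncached_builds = asyncio.run(simulate(uncached_factory))
cached_builds = asyncio.run(simulate(cached_factory))
print(uncached_builds, cached_builds)  # 4 1
```

Whether the saved work matters depends on how expensive the real build is, which this sketch does not measure.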

The recommended modern pattern: `ServerRuntime` (server v0.7.30+)

From langgraph_sdk/runtime.py:

import contextlib
from langchain.agents import create_agent
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph_sdk.runtime import ServerRuntime

llm = AzureChatOpenAI(
    azure_deployment="your-deployment",
    azure_endpoint="https://...",
    api_version="2024-02-01",
)

# Lightweight agent for introspection - no MCP connection needed
_base_agent = create_agent(llm, tools=[])

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        # Only connect to MCP during actual runs
        async with MultiServerMCPClient({...}) as mcp:
            tools = await mcp.get_tools()
            yield create_agent(llm, tools=tools)
    else:
        # Schema reads, Studio refresh - skip expensive MCP setup
        yield _base_agent

langgraph.json:

{
  "graphs": {
    "joker_agent": {
      "path": "./src/joker_agent.py:get_agent",
      "description": "Joker agent with Azure OpenAI and MCP tools"
    }
  }
}

Why this is better than the global cache

| Concern | Global cache | `ServerRuntime` factory |
| --- | --- | --- |
| Avoids re-init on every call | Yes (cached) | Yes (`execution_runtime` guard) |
| Handles MCP disconnects | No (connection held forever) | Yes (fresh per run, teardown after yield) |
| Skips MCP during introspection | No | Yes |
| Proper cleanup | No | Yes (code after `yield`) |

The global caching pattern is fragile for MCP specifically: connections can time out while the server keeps running, and the cached agent won’t reconnect. The ServerRuntime context manager solves this by connecting fresh per execution and tearing down cleanly.
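
The connect-per-run and teardown-after-yield behaviour can be sketched with the standard library alone. This is an illustration under assumptions, not the server's API: `FakeRuntime` is a hypothetical stand-in that only mirrors the `execution_runtime` flag, and the "connection" is just an entry in an event log.

```python
import asyncio
import contextlib
from dataclasses import dataclass

events: list[str] = []


@dataclass
class FakeRuntime:
    """Hypothetical stand-in; only mirrors the execution_runtime flag."""
    execution_runtime: bool


BASE_AGENT = {"tools": []}  # lightweight object reused for introspection


@contextlib.asynccontextmanager
async def get_agent(runtime: FakeRuntime):
    if runtime.execution_runtime:
        events.append("connect")         # stand-in for opening the MCP connection
        try:
            yield {"tools": ["joke_tool"]}
        finally:
            events.append("disconnect")  # teardown runs once the caller is done
    else:
        yield BASE_AGENT                 # no connection for schema reads


async def main() -> None:
    async with get_agent(FakeRuntime(execution_runtime=False)) as agent:
        events.append(f"introspect:{len(agent['tools'])}")
    async with get_agent(FakeRuntime(execution_runtime=True)) as agent:
        events.append(f"run:{len(agent['tools'])}")


asyncio.run(main())
print(events)  # ['introspect:0', 'connect', 'run:1', 'disconnect']
```

The `try`/`finally` around the `yield` is what guarantees the teardown even when the run raises.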

When to use each pattern

| Scenario | Recommended pattern |
| --- | --- |
| No async init (sync tools only) | Module-level object: `graph = create_agent(...)` |
| MCP tools, async resources | `ServerRuntime` async context manager (v0.7.30+) |
| Older server, async resources | `RunnableConfig` async context manager |
| Per-user graph customization | `ServerRuntime` factory using `runtime.ensure_user()` |

wigging · April 24, 2026, 1:05pm

I get the following error when I try the ServerRuntime approach:

As of langchain-mcp-adapters 0.1.0, MultiServerMCPClient cannot be used as a context manager (e.g., async with MultiServerMCPClient(...)). Instead, you can do one of the following:
1. client = MultiServerMCPClient(...)
   tools = await client.get_tools()
2. client = MultiServerMCPClient(...)
   async with client.session(server_name) as session:
       tools = await load_mcp_tools(session)

pawel-twardziak · April 24, 2026, 3:02pm

hi @wigging

MultiServerMCPClient dropped context manager support in 0.1.0
The async with MultiServerMCPClient({...}) as mcp: syntax was removed. You now have two correct approaches:

Option A - client.get_tools() (simplest):

client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
tools = await client.get_tools()

Option B - client.session() + load_mcp_tools() (explicit per-server control, what deepagents CLI uses internally in mcp_tools.py):

from contextlib import AsyncExitStack
from langchain_mcp_adapters.tools import load_mcp_tools

client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
async with AsyncExitStack() as stack:
    session = await stack.enter_async_context(client.session("your-server"))
    tools = await load_mcp_tools(session, server_name="your-server")

Updated ServerRuntime pattern (post-0.1.0)

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        client = MultiServerMCPClient(connections=MCP_CONNECTIONS)
        async with AsyncExitStack() as stack:
            all_tools = []
            for server_name in MCP_CONNECTIONS:
                session = await stack.enter_async_context(client.session(server_name))
                tools = await load_mcp_tools(session, server_name=server_name)
                all_tools.extend(tools)
            yield create_agent(llm, tools=all_tools)
            # AsyncExitStack cleans up all sessions on exit
    else:
        yield _base_agent

Or the minimal form using get_tools(), which loads tools from every server in MCP_CONNECTIONS:

@contextlib.asynccontextmanager
async def get_agent(runtime: ServerRuntime):
    if runtime.execution_runtime:
        tools = await MultiServerMCPClient(connections=MCP_CONNECTIONS).get_tools()
        yield create_agent(llm, tools=tools)
    else:
        yield _base_agent
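
For the multi-server variant, the cleanup that `AsyncExitStack` performs can be seen in a stdlib-only sketch. `fake_session` is a hypothetical stand-in for `client.session(server_name)`, and the server names are made up; the point is only that every session entered on the stack is closed, in reverse order, when the `async with` block exits.

```python
import asyncio
import contextlib
from contextlib import AsyncExitStack

log: list[str] = []


@contextlib.asynccontextmanager
async def fake_session(server_name: str):
    """Hypothetical stand-in for client.session(server_name)."""
    log.append(f"open:{server_name}")
    try:
        yield f"session-{server_name}"
    finally:
        log.append(f"close:{server_name}")


async def main() -> list[str]:
    sessions = []
    async with AsyncExitStack() as stack:
        for name in ("alpha", "beta"):
            # Each session stays open until the stack itself exits.
            sessions.append(await stack.enter_async_context(fake_session(name)))
        log.append("run")  # the agent run would happen here
    return sessions


sessions = asyncio.run(main())
print(log)  # ['open:alpha', 'open:beta', 'run', 'close:beta', 'close:alpha']
```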

Alternative (load once at startup): If the MCP server is stable, compile the graph at module level using asyncio.run() - that’s exactly what the deepagents CLI does in server_graph.py. Trade-off: if the MCP session drops, you must restart the LangGraph server process to reconnect.
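
A rough sketch of the load-once idea, using a stand-in `make_graph()` rather than the real agent setup (the dict it returns and the `handle_request` helper are hypothetical):

```python
import asyncio

load_count = 0


async def make_graph() -> dict:
    """Stand-in for the async factory; counts how often the setup runs."""
    global load_count
    load_count += 1
    await asyncio.sleep(0)  # pretend to list tools over the network
    return {"tools": ["joke_tool"]}


# Runs once, when the module is imported. asyncio.run() raises RuntimeError
# if an event loop is already running in the current thread.
graph = asyncio.run(make_graph())


def handle_request() -> dict:
    """Every request reuses the object built at import time."""
    return graph


results = [handle_request() for _ in range(3)]
print(load_count, all(r is graph for r in results))  # 1 True
```

The setup cost is paid once, which is also why a dropped connection is only picked up again after a restart.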

wigging · April 24, 2026, 8:53pm

Is there any documentation that discusses using the LangGraph server for an agent with MCP tools? I found this on the langchain-mcp-adapters repo but it doesn’t talk about using the ServerRuntime approach that you are suggesting. It just shows an async function that returns the agent directly.

pawel-twardziak · April 24, 2026, 9:58pm

Hi @wigging

The README pattern is officially supported - and it’s simpler than `ServerRuntime`

You’re right: the langchain-mcp-adapters README recommends a plain async function, no context manager, no ServerRuntime:

async def make_graph():
    client = MultiServerMCPClient({...})
    tools = await client.get_tools()
    return create_agent("openai:gpt-4.1", tools)

{ "graphs": { "agent": "./graph.py:make_graph" } }

Why this works (verified against `client.py`)

Looking at the actual source in langchain-mcp-adapters/langchain_mcp_adapters/client.py:152, get_tools() calls load_mcp_tools(None, connection=...) - passing session=None. The returned tools are thin wrappers that open a fresh MCP session per tool call using the saved connection. There’s no persistent session to manage, which is why no context manager is needed.

The docstring confirms this: “A new session will be created for each tool call”.
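
The session-per-call behaviour described above can be modelled without the library. In this sketch `open_session`, `make_tool`, and the tool names are hypothetical stand-ins; it only counts how many sessions are opened when the listing uses one session and every tool call opens its own.

```python
import asyncio
import contextlib

sessions_opened = 0


@contextlib.asynccontextmanager
async def open_session():
    """Hypothetical stand-in for opening one MCP session."""
    global sessions_opened
    sessions_opened += 1
    yield object()


def make_tool(name: str):
    """Return a tool that opens its own short-lived session on every call."""
    async def tool(arg: str) -> str:
        async with open_session():
            return f"{name}({arg})"
    return tool


async def get_tools() -> list:
    """List tools using one session, then hand back per-call wrappers."""
    async with open_session():
        names = ["tell_joke", "rate_joke"]
    return [make_tool(n) for n in names]


async def main() -> int:
    tools = await get_tools()   # 1 session for the listing
    for _ in range(3):          # 3 tool calls, 1 session each
        await tools[0]("knock knock")
    return sessions_opened


total = asyncio.run(main())
print(total)  # 4
```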

Trade-offs vs `ServerRuntime`

| Pattern | Per-request cost | Per-tool-call cost |
| --- | --- | --- |
| README `make_graph()` + `get_tools()` | One `list_tools()` round-trip per server (also runs for introspection) | New MCP session per tool call |
| `ServerRuntime` + `client.session()` + `AsyncExitStack` | One session opened per execution, skipped for introspection | Reuses the same session for all tool calls in the run |
| Module-level `graph = asyncio.run(make_graph())` | Nothing per request | New session per tool call |

Critical caveat - stdio is bad for servers

The README itself flags this inside the example:

ATTENTION: MCP’s stdio transport was designed primarily to support applications running on a user’s machine. Before using stdio in a web server context, evaluate whether there’s a more appropriate solution.

For a Studio refresh + one agent run with N tool calls, the README pattern with stdio spawns 1 + N subprocesses per server. If you’re stuck on stdio in a server context, use the client.session() + AsyncExitStack pattern from my earlier reply (the updated ServerRuntime pattern) to keep one subprocess alive per run.

About your original `get_agent()` cache

Compared to the README’s make_graph(), the only difference is the global cache. With get_tools(), the cache mostly saves the initial tool-listing round-trip - the cached tools still open a new session per invocation. If you’re on HTTP/SSE, drop the cache and use the README pattern. Keep it only if you have other heavy non-MCP startup work.

Recommendation

| Scenario | Pattern |
| --- | --- |
| HTTP/SSE, simplicity matters | README `async def make_graph()` |
| stdio (or many tool calls per run) | `ServerRuntime` + `client.session()` |
| Skip MCP during Studio/schema reads | `ServerRuntime` with `runtime.execution_runtime` guard |
| Per-user tool customization | `ServerRuntime` + `runtime.ensure_user()` |

The README’s make_graph() is the right baseline. Reach for ServerRuntime only when you have a concrete reason.

wigging · April 26, 2026, 12:47am

So based on all the replies, it looks like I just need to define an async function that returns the agent as shown below. There is no need for a global agent variable or any of the server runtime stuff since I’m dealing with HTTP/SSE tools. Am I understanding this correctly?

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient

async def get_agent() -> CompiledStateGraph:
    mcp_client = MultiServerMCPClient(
        {
            "mymcp": {
                "transport": "http",
                "url": "https://address.com/mcp",
            },
        }
    )

    tools = await mcp_client.get_tools()

    model = AzureChatOpenAI(
        azure_endpoint="",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    agent = create_agent(
        model,
        tools,
        system_prompt="You are a hilarious, yet serious, assistant with access to MCP tools",
    )

    return agent

pawel-twardziak · April 27, 2026, 4:37pm

Yes, it looks correct.

For HTTP/SSE MCP servers, the plain async def get_agent() returning a CompiledStateGraph is exactly the right pattern. It matches the official make_graph() example from the langchain-mcp-adapters README. The reasoning, restated:

The server calls get_agent() per request, but each step inside it is cheap over HTTP, so a global cache would save very little
MultiServerMCPClient(connections=...) is just a config object - free to construct
get_tools() does one list_tools HTTP round-trip per server - fast over HTTP
The returned tools each open a short-lived session per invocation (per langchain_mcp_adapters/client.py:152 docstring: “A new session will be created for each tool call”). Nothing to manage, nothing to clean up
create_agent(...) just compiles an in-memory StateGraph - cheap

Dropping both the _agent global and the ServerRuntime wrapper is the right call.

When to revisit

| Symptom | Switch to |
| --- | --- |
| Studio sluggish on every refresh | `ServerRuntime` with `runtime.execution_runtime` guard |
| Many tool calls per run, latency accumulates | `ServerRuntime` + `client.session()` + `AsyncExitStack` |
| Per-user MCP auth headers | `ServerRuntime` factory with `runtime.ensure_user()` |
| Switch to stdio transport | Don’t - but if you must, use `client.session()` to keep one subprocess alive |

Until one of those bites, the current code is the right answer.

Optional micro-tweak

Hoist the LLM to module scope so its internal httpx.AsyncClient connection pool gets reused across requests:

_llm = AzureChatOpenAI(azure_deployment="gpt-5.2", ...)

async def get_agent() -> CompiledStateGraph:
    client = MultiServerMCPClient(connections={...})
    tools = await client.get_tools()
    return create_agent(_llm, tools=tools, system_prompt="...")

Tiny win, free to do. That is the entire optimization story - ship the simple version.

wigging · April 27, 2026, 8:06pm

How about using Python’s cache feature to cache the language model? I think this is a cleaner approach and avoids defining a global variable.

import os
from functools import cache

from langchain.agents import create_agent
from langgraph.graph.state import CompiledStateGraph
from langchain_openai import AzureChatOpenAI
from langchain_mcp_adapters.client import MultiServerMCPClient


@cache
def get_model() -> AzureChatOpenAI:
    """Get a cached language model for the agent."""
    llm = AzureChatOpenAI(
        azure_endpoint="blahblah",
        deployment_name="gpt-5.2",
        openai_api_version="2024-12-01-preview",
        model_name="gpt-5.2",
    )

    return llm


async def get_agent() -> CompiledStateGraph:
    """Build the compiled agent graph."""
    token = os.environ["blahblah"]

    # Setup the MCP server for the agent
    mcp_client = MultiServerMCPClient(
        {
            "my-mcp-server": {
                "transport": "http",
                "url": MCP_URL,
                "headers": {"Authorization": f"Token {token}"},
            },
        }
    )

    # Setup MCP tools for the agent
    tools = await mcp_client.get_tools()

    # Get the language model for the agent
    llm = get_model()

    # Define agent with system prompt for working with the MCP server
    agent = create_agent(
        llm,
        tools,
        system_prompt="You are a helpful assistant with access to some MCP tools"
    )

    return agent
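
For reference, a small stdlib-only sketch of what `functools.cache` does for a zero-argument factory like `get_model()`. `FakeModel` is a hypothetical stand-in for the real client, so this shows only the caching behaviour, not anything about AzureChatOpenAI itself.

```python
from functools import cache

constructions = 0


class FakeModel:
    """Hypothetical stand-in for an expensive client object."""

    def __init__(self) -> None:
        global constructions
        constructions += 1


@cache
def get_model() -> FakeModel:
    return FakeModel()


first = get_model()
second = get_model()
print(first is second, constructions)  # True 1

get_model.cache_clear()  # handy in tests; forces a rebuild on the next call
third = get_model()
print(third is first, constructions)  # False 2
```

The cached object still lives for the lifetime of the process, much like a module-level variable; the decorator mainly moves construction to first use and drops the explicit `global`.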
