How can I initialize the LLM, embedding models, and vector store connection at startup? Currently my starting node performs RAG, so the embedding models are initialized on each invocation, adding an extra 3–4 seconds. Is there a way to initialize them once at the beginning, before any nodes run, to avoid the repeated initialization?
Hi @Najiya
what about this:
Preload at module import time (Python)
Define the LLM, embeddings, and store once at the top of your module and compile a single graph instance that the server imports and reuses:
```python
import os
from typing_extensions import TypedDict

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import StateGraph, START, END
from langgraph.store.postgres import PostgresStore

# Initialize once at import time
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

DB_URI = os.environ["POSTGRES_URL"]

# from_conn_string returns a context manager; enter it once and keep a
# reference so the connection stays open for the life of the process
_store_cm = PostgresStore.from_conn_string(DB_URI)
store = _store_cm.__enter__()
store.setup()  # run migrations once (no-op if already applied)

class State(TypedDict):  # illustrative state schema; use your own fields
    question: str
    answer: str

def rag(state: State) -> dict:
    # Use the preinitialized llm/embeddings/store
    ...

graph = StateGraph(State)
graph.add_node("rag", rag)
graph.add_edge(START, "rag")
graph.add_edge("rag", END)

# Compile once and inject the store (and/or checkpointer, cache) so nodes can access them
compiled = graph.compile(store=store)
```
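For reference, here is a minimal sketch of what the `rag` node body could look like against the preinitialized objects. It assumes the store was created with an embedding index and that your documents were put under a hypothetical `("docs",)` namespace with a `"text"` field:

```python
def rag(state: State) -> dict:
    # Semantic search against the long-lived store; nothing is re-initialized here
    hits = store.search(("docs",), query=state["question"], limit=4)
    context = "\n\n".join(item.value["text"] for item in hits)
    answer = llm.invoke(
        f"Answer using only this context:\n{context}\n\nQuestion: {state['question']}"
    )
    return {"answer": answer.content}
```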
When using the LangGraph API server, export the compiled instance at module scope and reference it in langgraph.json. The server uses the compiled graph defined at the top level, so nothing is rebuilt on each request.
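For example, if the module above lives at ./my_app/graph.py (the path and graph name here are illustrative), the entry would look like:

```json
{
  "dependencies": ["."],
  "graphs": {
    "rag": "./my_app/graph.py:compiled"
  },
  "env": ".env"
}
```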
Inject resources via compile
Both Python and JS allow passing long‑lived instances into the compiled graph:
```python
compiled = graph.compile(store=store, checkpointer=my_checkpointer, cache=my_cache)
```
This ensures a single store/checkpointer/cache is reused across node executions, eliminating repeated connection setup.
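Inside a node, the injected store can be received by declaring it in the node signature (recent versions also expose `langgraph.config.get_store()` as an alternative), so nothing reconnects per call. A small sketch:

```python
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore

def my_node(state: State, config: RunnableConfig, *, store: BaseStore):
    # `store` is the same long-lived instance passed to graph.compile(store=...)
    hits = store.search(("docs",), query=state["question"], limit=4)
    ...
```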
Use lifespan hooks (LangGraph Platform with FastAPI/Starlette)
If deploying with LangGraph Platform (Python), add a FastAPI lifespan handler to open connections and create clients once at startup, then dispose them on shutdown. Set them on app.state and have your graph code read/reuse them.
```python
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # e.g., initialize DB engine, vector store, clients once at startup;
    # the with-block closes the connection again on shutdown
    with PostgresStore.from_conn_string(DB_URI) as store:
        store.setup()
        app.state.store = store
        yield

app = FastAPI(lifespan=lifespan)
```
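Request handlers can then read the shared instance instead of reconnecting; the route below is a hypothetical example (you could equally stash a compiled graph on app.state and invoke that):

```python
from fastapi import Request

@app.post("/ask")  # hypothetical endpoint
async def ask(request: Request, body: dict):
    store = request.app.state.store  # reuse the store opened in lifespan
    hits = store.search(("docs",), query=body["question"], limit=4)
    return {"matches": [item.value for item in hits]}
```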
Cache embedding calls (optional)
If initialization is fast but recomputation of embeddings is costly, wrap your embedder with LangChain’s CacheBackedEmbeddings to avoid repeated vectorization of the same inputs.
```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings(model="text-embedding-3-small")
# separate name so this byte store isn't confused with the PostgresStore above
cache_store = LocalFileStore("./emb_cache")

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying, cache_store, namespace=underlying.model
)
```
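It is a drop-in replacement for the underlying embedder: the first call computes and persists vectors, and repeat calls for the same inputs are served from the cache (by default only embed_documents results are cached):

```python
docs = ["LangGraph stores support semantic search", "RAG nodes can reuse one store"]

vectors = cached_embeddings.embed_documents(docs)        # computed, then cached
vectors_again = cached_embeddings.embed_documents(docs)  # served from ./emb_cache
```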