How can I initialize the LLM, embedding models, and vector store connection at startup? Currently my starting node performs RAG, so the embedding models are initialized on each invocation, adding an extra 3–4 seconds. Is there a way to initialize them once at the beginning, before any nodes run, to avoid the repeated initialization?
Hi @Najiya
what about this:
Preload at module import time (Python)
Define the LLM, embeddings, and store once at the top of your module and compile a single graph instance that the server imports and reuses:
```python
import os
from typing_extensions import TypedDict

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import StateGraph, START, END
from langgraph.store.postgres import PostgresStore

# Initialize once at import time
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

DB_URI = os.environ["POSTGRES_URL"]

# from_conn_string returns a context manager; enter it once and keep a
# reference so the connection stays open for the life of the process
_store_cm = PostgresStore.from_conn_string(DB_URI)
store = _store_cm.__enter__()
store.setup()  # run migrations once (no-op if already applied)

class State(TypedDict):  # illustrative state schema; use your own fields
    question: str
    answer: str

def rag(state: State) -> dict:
    # Use the preinitialized llm/embeddings/store
    ...

graph = StateGraph(State)
graph.add_node("rag", rag)
graph.add_edge(START, "rag")
graph.add_edge("rag", END)

# Compile once and inject the store (and/or checkpointer, cache) so nodes can access them
compiled = graph.compile(store=store)
```
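For reference, here is a minimal sketch of what the `rag` node body could look like against the preinitialized objects. It assumes the store was created with an embedding index and that your documents were put under a hypothetical `("docs",)` namespace with a `"text"` field:

```python
def rag(state: State) -> dict:
    # Semantic search against the long-lived store; nothing is re-initialized here
    hits = store.search(("docs",), query=state["question"], limit=4)
    context = "\n\n".join(item.value["text"] for item in hits)
    answer = llm.invoke(
        f"Answer using only this context:\n{context}\n\nQuestion: {state['question']}"
    )
    return {"answer": answer.content}
```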
When using the LangGraph API server, export the compiled instance at module scope and reference it in langgraph.json. The server uses the compiled graph defined at the top level, so nothing is rebuilt on each request.
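For example, if the module above lives at ./my_app/graph.py (the path and graph name here are illustrative), the entry would look like:

```json
{
  "dependencies": ["."],
  "graphs": {
    "rag": "./my_app/graph.py:compiled"
  },
  "env": ".env"
}
```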
Inject resources via compile
Both Python and JS allow passing long‑lived instances into the compiled graph:
```python
compiled = graph.compile(store=store, checkpointer=my_checkpointer, cache=my_cache)
```
This ensures a single store/checkpointer/cache is reused across node executions, eliminating repeated connection setup.
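Inside a node, the injected store can be received by declaring it in the node signature (recent versions also expose `langgraph.config.get_store()` as an alternative), so nothing reconnects per call. A small sketch:

```python
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore

def my_node(state: State, config: RunnableConfig, *, store: BaseStore):
    # `store` is the same long-lived instance passed to graph.compile(store=...)
    hits = store.search(("docs",), query=state["question"], limit=4)
    ...
```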
Use lifespan hooks (LangGraph Platform with FastAPI/Starlette)
If deploying with LangGraph Platform (Python), add a FastAPI lifespan handler to open connections and create clients once at startup, then dispose them on shutdown. Set them on app.state and have your graph code read/reuse them.
```python
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # e.g., initialize DB engine, vector store, clients once at startup;
    # the with-block closes the connection again on shutdown
    with PostgresStore.from_conn_string(DB_URI) as store:
        store.setup()
        app.state.store = store
        yield

app = FastAPI(lifespan=lifespan)
```
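Request handlers can then read the shared instance instead of reconnecting; the route below is a hypothetical example (you could equally stash a compiled graph on app.state and invoke that):

```python
from fastapi import Request

@app.post("/ask")  # hypothetical endpoint
async def ask(request: Request, body: dict):
    store = request.app.state.store  # reuse the store opened in lifespan
    hits = store.search(("docs",), query=body["question"], limit=4)
    return {"matches": [item.value for item in hits]}
```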
Cache embedding calls (optional)
If initialization is fast but recomputation of embeddings is costly, wrap your embedder with LangChain’s CacheBackedEmbeddings to avoid repeated vectorization of the same inputs.
```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings(model="text-embedding-3-small")
# separate name so this byte store isn't confused with the PostgresStore above
cache_store = LocalFileStore("./emb_cache")

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying, cache_store, namespace=underlying.model
)
```
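It is a drop-in replacement for the underlying embedder: the first call computes and persists vectors, and repeat calls for the same inputs are served from the cache (by default only embed_documents results are cached):

```python
docs = ["LangGraph stores support semantic search", "RAG nodes can reuse one store"]

vectors = cached_embeddings.embed_documents(docs)        # computed, then cached
vectors_again = cached_embeddings.embed_documents(docs)  # served from ./emb_cache
```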