Hi all,
I’m experimenting with LangGraph for a simple prototype where the graph has just one node that calls an LLM. The wrinkle is that the LLM call is not a trivial text prompt:
- The input includes a PDF file that must be uploaded and then referenced in the model call.
Right now I’m handling this by calling the OpenAI Responses API directly inside the node. This works fine, but it makes my graph tightly coupled to OpenAI. I would like to make the workflow model-agnostic so that later I can swap in Anthropic, Gemini, or other providers without rewriting the graph logic.
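For context, here is roughly what the single node looks like today (a trimmed-down sketch; the model name, state fields, and prompt are just placeholders):

```python
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from openai import OpenAI

client = OpenAI()

class State(TypedDict):
    pdf_path: str
    question: str
    answer: str

def call_llm(state: State) -> dict:
    # Upload the PDF so the model can reference it, then call the Responses API
    # with a file part plus a text part -- this is the OpenAI-specific piece.
    uploaded = client.files.create(
        file=open(state["pdf_path"], "rb"), purpose="user_data"
    )
    response = client.responses.create(
        model="gpt-4o",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": uploaded.id},
                {"type": "input_text", "text": state["question"]},
            ],
        }],
    )
    return {"answer": response.output_text}

graph = StateGraph(State)
graph.add_node("call_llm", call_llm)
graph.add_edge(START, "call_llm")
graph.add_edge("call_llm", END)
app = graph.compile()
```

The graph itself is trivial; the issue is that everything inside call_llm is tied to OpenAI's content-part format and file-upload flow.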
My questions:
- Does LangGraph provide any built-in functions or abstractions to handle multimodal message objects (text + image/file), or is the expectation that developers wrap each provider's SDK/API in their own adapter nodes?
- If the latter, is the recommended pattern to:
  - define a neutral internal schema for messages (e.g. {type: "text" | "image" | "file", content: ...}), and
  - write per-provider adapters that map this neutral schema to each provider's required structure?
  (I sketch what I mean right after this list.)
- Are there any examples of LangGraph projects where multimodal input is handled in a provider-agnostic way, so that graph nodes remain portable across OpenAI, Anthropic, Gemini, etc.?
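To make the second question concrete, this is roughly the pattern I have in mind. None of these names come from LangGraph: the neutral Part schema and the adapter functions are my own sketch, and the provider-specific field names are based on my reading of the OpenAI Responses and Anthropic Messages docs, so they may need adjusting:

```python
from typing import Any, Literal
from typing_extensions import TypedDict

class Part(TypedDict):
    # Neutral, provider-independent message part kept in graph state.
    type: Literal["text", "image", "file"]
    content: Any  # str for text; file_id, URL, or base64 data depending on provider needs

def to_openai(parts: list[Part]) -> list[dict]:
    # Assumes files were already uploaded and content holds the file_id,
    # and images are referenced by URL.
    mapped = []
    for p in parts:
        if p["type"] == "text":
            mapped.append({"type": "input_text", "text": p["content"]})
        elif p["type"] == "file":
            mapped.append({"type": "input_file", "file_id": p["content"]})
        elif p["type"] == "image":
            mapped.append({"type": "input_image", "image_url": p["content"]})
    return mapped

def to_anthropic(parts: list[Part]) -> list[dict]:
    # Assumes content holds base64-encoded data for files/images.
    mapped = []
    for p in parts:
        if p["type"] == "text":
            mapped.append({"type": "text", "text": p["content"]})
        elif p["type"] == "file":
            mapped.append({"type": "document", "source": {
                "type": "base64", "media_type": "application/pdf", "data": p["content"]}})
        elif p["type"] == "image":
            mapped.append({"type": "image", "source": {
                "type": "base64", "media_type": "image/png", "data": p["content"]}})
    return mapped

ADAPTERS = {"openai": to_openai, "anthropic": to_anthropic}
```

The node would then keep only neutral parts in state and select an adapter (plus the matching client call) based on a provider field, so swapping providers never touches the graph wiring.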
Thanks a lot for any guidance!