Hi all — I’m building an agentic workflow (LangGraph + Gemini) where one of my tools returns a very large JSON output. The full data is required for downstream processing, but:
- I must not send the raw JSON to the LLM
- I must avoid huge payloads appearing in LangSmith traces
- I'm running local-only with Streamlit, no external DB
- The agent may generate multiple large artifacts per run
I tested three approaches:
1. response_format="content_and_artifact"
Store the large JSON in the artifact field and return only a summary in content.
With a small patch, LangSmith won’t upload the full artifact.
Pros: agent-friendly, clean, scalable
Cons: still need disk storage if I want persistence
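A minimal sketch of approach 1. The tool name and payload are made up; the LangChain decorator is shown as a comment so the sketch runs without the dependency. With the decorator applied, LangChain puts the first tuple element into the message content the LLM sees and the second into ToolMessage.artifact, which stays out of the prompt.

```python
# Sketch of approach 1 (illustrative tool name fetch_big_report).
# With LangChain installed, uncomment:
#
# from langchain_core.tools import tool
#
# @tool(response_format="content_and_artifact")
def fetch_big_report(query: str) -> tuple[str, dict]:
    """Return (summary_for_llm, full_artifact)."""
    big_json = {"query": query,
                "rows": [{"id": i, "value": i * i} for i in range(10_000)]}
    summary = (f"Report for {query!r}: {len(big_json['rows'])} rows; "
               "full data available as an artifact.")
    return summary, big_json

content, artifact = fetch_big_report("sales")
```

Downstream nodes read the full data from the artifact side of the tool message instead of the LLM-visible content.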
2. Write large JSON to local temp files
Tool writes to disk and returns only a file key/path.
Pros: safe, no risk of LLM exposure or LangSmith overload
Cons: manual lifecycle, less native to LangGraph
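Approach 2 can be sketched with the stdlib only; ARTIFACT_DIR and the helper names are illustrative, not part of any framework.

```python
# Sketch of approach 2: the tool writes the JSON to disk and returns only a key.
import json
import tempfile
import uuid
from pathlib import Path

ARTIFACT_DIR = Path(tempfile.gettempdir()) / "agent_artifacts"  # illustrative path
ARTIFACT_DIR.mkdir(exist_ok=True)

def save_artifact(data: dict) -> str:
    """Persist the payload; the tool returns only this opaque key to the agent."""
    key = uuid.uuid4().hex
    (ARTIFACT_DIR / f"{key}.json").write_text(json.dumps(data))
    return key

def load_artifact(key: str) -> dict:
    """Downstream steps rehydrate the full JSON by key, outside the LLM context."""
    return json.loads((ARTIFACT_DIR / f"{key}.json").read_text())
```

Since the LLM only ever sees the key, neither the prompt nor the trace carries the payload; the manual part is deciding when to delete the files.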
3. Global variables
Simple dict holding all results.
Pros: easy to prototype
Cons: resets on Streamlit reruns, not persistent, not agent-friendly
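For completeness, approach 3 is just a module-level dict keyed by an opaque ID (names here are illustrative):

```python
# Sketch of approach 3: in-process store keyed by UUID.
import uuid

_ARTIFACTS: dict[str, dict] = {}  # resets whenever the script reloads

def put_artifact(data: dict) -> str:
    """Store the payload and hand the agent only an opaque key."""
    key = uuid.uuid4().hex
    _ARTIFACTS[key] = data
    return key

def get_artifact(key: str) -> dict:
    return _ARTIFACTS[key]
```

Under Streamlit, swapping the plain dict for st.session_state keeps the store alive across reruns within one user session, though it still dies with the process.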
Question:
What is the recommended pattern for handling large tool outputs that must remain accessible across an agent’s workflow while keeping them out of the LLM context and LangSmith traces?
Should I primarily rely on content_and_artifact + disk-backed storage, or is there a more idiomatic approach?
Thanks!