Thanks for the detailed writeup here — this matches my understanding.
Today, Deep Agents does not have a dedicated image-context compression mechanism. The existing text-oriented summarization / compaction paths can remove older messages that contain image blocks once those messages fall into the summarized partition, but they do not intelligently resize, summarize, or preserve images as reusable visual context. Recent messages that are kept in context will still include the images as-is.
So the practical recommendations right now are:
- Prefer URLs or file/backend references over base64 image payloads in message history.
- Store generated screenshots / charts / images externally and pass references back to the agent.
- Use a custom token counter if your workload is image-heavy, since approximate token counting may not reflect actual provider-side image cost.
- Consider custom middleware that replaces older image blocks with text summaries or placeholders before model calls.
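- Tune summarization thresholds more conservatively for multimodal workloads, for example by triggering summarization at a lower token count, so that under-counted image tokens still leave headroom in the context window.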
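
As a rough illustration of the middleware and token-counter suggestions above, here is a minimal sketch in plain Python. It assumes messages are dicts with a `content` field that is either a string or a list of blocks carrying a `type` key (`"text"`, `"image"`, or `"image_url"`). The function names, the `keep_last` parameter, and the flat `tokens_per_image` estimate are all hypothetical and not part of Deep Agents or any provider API; you would still need to wire this into whatever hook runs before your model calls.

```python
from typing import Any

# Assumed block types that carry image data; adjust to your provider's schema.
IMAGE_BLOCK_TYPES = {"image", "image_url"}


def strip_old_images(
    messages: list[dict[str, Any]],
    keep_last: int = 2,
    placeholder: str = "[image omitted from older context]",
) -> list[dict[str, Any]]:
    """Return a new list where image blocks are replaced by a text placeholder
    in every message except the last `keep_last`. Inputs are not mutated."""
    cutoff = max(len(messages) - keep_last, 0)
    result = []
    for index, message in enumerate(messages):
        content = message.get("content")
        # Recent messages and plain-string messages pass through unchanged.
        if index >= cutoff or not isinstance(content, list):
            result.append(message)
            continue
        new_content = [
            {"type": "text", "text": placeholder}
            if isinstance(block, dict) and block.get("type") in IMAGE_BLOCK_TYPES
            else block
            for block in content
        ]
        result.append({**message, "content": new_content})
    return result


def estimate_tokens(
    messages: list[dict[str, Any]],
    tokens_per_image: int = 1000,  # placeholder guess, not a provider figure
    chars_per_token: int = 4,      # common rough heuristic for English text
) -> int:
    """Very rough token estimate that charges a flat cost per image block."""
    total = 0
    for message in messages:
        content = message.get("content")
        blocks = content if isinstance(content, list) else [content or ""]
        for block in blocks:
            if isinstance(block, dict):
                if block.get("type") in IMAGE_BLOCK_TYPES:
                    total += tokens_per_image
                else:
                    total += len(str(block.get("text", ""))) // chars_per_token
            else:
                total += len(str(block)) // chars_per_token
    return total
```

Calling `strip_old_images(history, keep_last=2)` right before a model call keeps the two most recent messages intact and swaps older images for placeholders. The flat per-image cost is only a stand-in: real image cost depends on the provider and the image dimensions, so replace it with figures from your provider's documentation if accuracy matters.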
We can probably improve the docs here, since this distinction is not very obvious today: existing context management is mostly text/message-history oriented, not true multimodal compression. I’ll look into adding clearer guidance around image inputs, token accounting, and recommended patterns for storing/referencing images rather than keeping large image payloads directly in context.