Best Practices for Streaming in Agentic Systems with Non-Streaming Proxies

I’m building an agentic system with LangGraph in an enterprise environment where all outbound traffic must pass through a corporate proxy. A common issue in such setups is that the proxy does not support streaming: it buffers the entire response from the LLM before forwarding it.
This effectively negates real-time, token-by-token streaming even when the underlying LLM API supports it. The application receives the full response in a single chunk only after the LLM has finished generating.
While LangGraph’s ability to stream state updates via `stream_mode="updates"` is incredibly valuable for showing node-to-node progress, losing the LLM’s “typing effect” noticeably degrades the user experience.
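For context, node-level progress survives a buffering proxy because each update is emitted after a node finishes, not token by token. A minimal sketch of the consumption pattern follows; `fake_updates` and the node names "retrieve" and "generate" are illustrative stand-ins for iterating `graph.stream(inputs, stream_mode="updates")` on a real compiled graph:

```python
# Stand-in for graph.stream(inputs, stream_mode="updates"):
# each yielded item is a dict keyed by the node that just finished.
def fake_updates():
    yield {"retrieve": {"docs": ["..."]}}
    yield {"generate": {"answer": "full response arrives here in one piece"}}

progress = []
for update in fake_updates():
    for node_name, state_delta in update.items():
        progress.append(node_name)          # drive a per-node progress UI
        print(f"[{node_name}] done")
```

Even with the proxy buffering each individual LLM call, the UI still gets one event per node, which is often enough for a meaningful progress indicator.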
My question to the community is:
How are people managing streaming for agentic systems when faced with a non-streaming proxy?
I’m particularly interested in any established best practices or architectural patterns. For instance:

  1. Graceful Fallback: Are there recommended ways to configure a LangGraph application to detect a non-streaming connection and fall back gracefully, perhaps by managing user expectations in the UI?
  2. Communicating with IT: For those who have successfully addressed this at the infrastructure level, what arguments or technical details were most effective in convincing IT/security teams to enable streaming for specific LLM endpoints on proxies (e.g., Zscaler, Palo Alto, etc.)?
  3. Alternative Patterns: Are there any alternative design patterns within LangGraph to simulate a more responsive experience in this scenario? For example, breaking down large LLM calls into smaller, sequential calls to provide more frequent updates, even if they aren’t token-level.
Any insights, shared experiences, or recommended workarounds would be greatly appreciated. This is a common real-world constraint, and hearing how others tackle it would help many developers in the community.
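To make point 1 concrete, here is one heuristic I’ve been considering (purely a sketch, not an established pattern): wrap the chunk iterator and count chunks, since a long response arriving as a single chunk is the signature of a buffering proxy. `StreamProbe` is a hypothetical name:

```python
class StreamProbe:
    """Wraps a chunk iterator and counts chunks so the UI can decide,
    after consumption, whether the proxy buffered the response."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.count = 0

    def __iter__(self):
        for chunk in self._chunks:
            self.count += 1
            yield chunk

    @property
    def buffered(self):
        # A single chunk for a long response almost always means buffering.
        return self.count <= 1

probe = StreamProbe(iter(["the entire response arrives in one piece"]))
text = "".join(probe)
if probe.buffered:
    print("falling back to non-streaming UI")
```

On the first buffered response, the app could remember the result and switch subsequent requests to a spinner/progress view instead of a token stream, which at least sets user expectations honestly.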
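On point 3, one workaround I’ve seen discussed (cosmetic only, and the names and knobs below are illustrative) is to re-emit the fully buffered response in small slices so the UI can still render a typing effect:

```python
import time

def simulated_typing(full_text, chunk_size=8, delay=0.02):
    """Re-chunk an already-complete response for a client-side typing
    effect. chunk_size and delay are purely cosmetic tuning knobs."""
    for i in range(0, len(full_text), chunk_size):
        yield full_text[i:i + chunk_size]
        time.sleep(delay)  # pace the UI; no effect on actual latency

pieces = list(simulated_typing("Hello from behind a buffering proxy.", delay=0))
```

This doesn’t reduce time-to-first-token at all; it only smooths the reveal once the response has landed, so it pairs best with a visible “thinking” indicator during the wait.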