Langgraph performance with ChatConverse

knightsrule · July 15, 2025, 3:10am

Hi,

I am building a voice virtual agent, which is a simple ReAct agent with tool calling. The Python app is currently hosted on AWS and uses Nova Pro via AWS Bedrock as the language model. I consistently see 2 seconds of execution time whenever ChatConverse is called. You can see two examples in the attachment. I have also tried OpenAI with 4o-mini, but performance seems very similar. This seems too slow for voice experiences.

The first call uses the language model to figure out which tool to call, and the second call converts the tool output into a user response.

My questions are:

Is this performance expected?
If not, what options would you explore to lower the latency?

Thanks!

eyurtsev · July 15, 2025, 5:57pm

LLMs typically take a few seconds to complete a response, especially if it’s long.

To lower latency, you can:

Swap to a smaller model (look for models with lower latency); this usually comes with a loss of quality.
Cache the client, i.e., try not to re-initialize the client with every request. This lets you skip the initial TCP handshake.
Reduce the size of the input prompt or the size of the output you ask the model to generate (this isn’t always realistic).
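
To illustrate the "cache the client" suggestion, here is a minimal sketch in Python. The `StandInClient` class, `get_client`, `handle_request`, and the `"example-model"` id are all hypothetical names used for illustration; in a real app the factory would construct your actual model client (for example, the Bedrock chat model class you already use) instead of the stand-in. The idea is only that the client is built once per process and reused across requests, rather than rebuilt on every call.

```python
import time
from functools import lru_cache


class StandInClient:
    """Hypothetical stand-in for a real chat model client (not a library class)."""

    instances_created = 0  # counts how many times a client was constructed

    def __init__(self, model_id: str) -> None:
        # Simulate the one-time setup cost (credentials, connection pool, etc.).
        time.sleep(0.05)
        self.model_id = model_id
        StandInClient.instances_created += 1

    def invoke(self, prompt: str) -> str:
        # A real client would send the prompt to the model here.
        return f"[{self.model_id}] echo: {prompt}"


@lru_cache(maxsize=None)
def get_client(model_id: str) -> StandInClient:
    """Build the client once per model id and reuse it afterwards."""
    # Assumption: replace StandInClient with your real client constructor.
    return StandInClient(model_id)


def handle_request(prompt: str, model_id: str = "example-model") -> str:
    """Per-request handler that reuses the cached client instead of rebuilding it."""
    return get_client(model_id).invoke(prompt)


if __name__ == "__main__":
    handle_request("first request")   # constructs the client
    handle_request("second request")  # reuses the cached client
    print(StandInClient.instances_created)
```

Whether this saves a meaningful amount of time depends on how much work your real client does at construction and whether the underlying HTTP connection is kept alive between calls, so it is worth timing both variants in your own environment.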
