LangGraph performance with ChatConverse

Hi,

I am building a voice virtual agent: a simple ReAct agent with tool calling. The Python app is currently hosted on AWS and uses Nova Pro via AWS Bedrock as the language model. I consistently see about 2 seconds of execution time whenever ChatConverse is called; you can see two examples in the attached. I have also tried OpenAI's gpt-4o-mini, but performance is very similar. This seems slow and insufficient for a voice experience.

The first call uses the language model to figure out which tool to call, and the second call converts the tool output into a user response.
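
For context, my setup is roughly the sketch below, using LangGraph's prebuilt `create_react_agent` with `ChatBedrockConverse` from `langchain-aws` (the `lookup_order` tool and the model ID are placeholders, not my real code):

```python
from langchain_aws import ChatBedrockConverse
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def lookup_order(order_id: str) -> str:
    """Look up the shipping status of an order."""
    return f"Order {order_id} has shipped."

llm = ChatBedrockConverse(
    model="us.amazon.nova-pro-v1:0",  # placeholder Nova Pro model ID
    region_name="us-east-1",
)
agent = create_react_agent(llm, tools=[lookup_order])

# Each turn hits the model twice: once to pick a tool, once to phrase the answer.
result = agent.invoke({"messages": [("user", "Where is order 42?")]})
print(result["messages"][-1].content)
```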

My questions are:

  1. Is this performance expected?
  2. If not, what options would you explore to lower the latency?

Thanks!

LLMs typically take a few seconds to complete a response, especially if the output is long.

To lower latency, you can:

  1. Swap to a smaller model (look for models with lower latency); this usually comes with a loss of quality.
  2. Cache the client, i.e., avoid re-initializing it on every request. Reusing the client lets you skip the initial TCP/TLS handshake (see the sketch after this list).
  3. Reduce the size of the input prompt or the size of the output you ask the model to generate (this isn't always realistic).
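
A minimal sketch of points 2 and 3, assuming the same `ChatBedrockConverse` client as in the question (the model ID, token cap, and `handle_request` function are illustrative):

```python
from langchain_aws import ChatBedrockConverse

# Point 2: create the client once at import time and reuse it across requests,
# so the underlying HTTPS connection (and its TCP/TLS handshake) is kept alive.
llm = ChatBedrockConverse(
    model="us.amazon.nova-pro-v1:0",  # placeholder; use your actual model ID
    region_name="us-east-1",
    max_tokens=256,  # point 3: cap output length to cut generation time
)

def handle_request(user_text: str) -> str:
    # Reuses the module-level client instead of constructing a new one here.
    return llm.invoke(user_text).content
```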