Langchiain deep_agent very slowly

The “langchian deep_agent” is calling the sub-agent very slowly. Could you please tell me how to solve this problem? I have only loaded one sub-agent.

Hello @xiaoyang , could you please share some code?

ok, here are some code

agent = create_deep_agent(
model=model,
tools=[
],
subagents=SUBAGENTS,
system_prompt=MAIN_AGENT_PROMPT,
backend=FilesystemBackend(root_dir=r"\Parse\src\api", virtual_mode=True),
interrupt_on={
}
)

Hi @xiaoyang what do you mean by “is calling the sub-agent very slowly”. What model are you using? Are you using thinking mode? Since you are using create_deep_agent this intentionally works slower compared to create_agent.

  1. Could you try using different/faster model and observe its behaviour
  2. Do you have any logging? You could add some output logging or middleware to observe where exactly it works slowly

感觉思考非常慢,底层应该多次在反复思考

i am use deepseek-chat model , and this model has only one tool.

@xiaoyang is this self-hosted?

use crate_deep_agent is model self-hosted

I think so that is the reason why your execution is too slow. Can you just do a simple request, and perhaps see what the inference speed of your model is.

yes, i think deep_agent send multiple requests to the API . so it was very slow

So that is the problem, I would recommend you host your model on better specs or use a mini version of the model.

@xiaoyang if you were able to verify this, can you mark the last answer as the solution?

Hi @xiaoyang ,

The slowness isn’t coming from LangChain itself — it’s because:

  1. You’re self-hosting the model.

  2. create_deep_agent (Deep Agent) makes multiple LLM calls internally.

  3. Your model (e.g., deepseek-chat) likely has slow inference speed on your current hardware.

Deep agents typically:

  • Plan

  • Reason

  • Possibly re-evaluate

  • Call tools

  • Reflect

That means multiple API/model calls per user request. If your model is self-hosted and slow per request, total latency increases quickly.


Why it feels like “thinking multiple times”

Because it actually is.

Deep agents often:

  • Generate plan

  • Evaluate tool choice

  • Possibly re-plan

  • Continue reasoning

Each step = another model call.


What you can do

  • Test raw inference speed with a single simple prompt

  • Check tokens/sec throughput

  • Upgrade hardware (GPU, VRAM, CPU)

  • Use a smaller model variant

  • Reduce reasoning depth if configurable


Bottom line

Slow execution =
(multiple agent calls) × (slow self-hosted model inference)

So yes — your observation is correct.