The “langchian deep_agent” is calling the sub-agent very slowly. Could you please tell me how to solve this problem? I have only loaded one sub-agent.
Hello @xiaoyang , could you please share some code?
ok, here are some code
agent = create_deep_agent(
model=model,
tools=[
],
subagents=SUBAGENTS,
system_prompt=MAIN_AGENT_PROMPT,
backend=FilesystemBackend(root_dir=r"\Parse\src\api", virtual_mode=True),
interrupt_on={
}
)
Hi @xiaoyang what do you mean by “is calling the sub-agent very slowly”. What model are you using? Are you using thinking mode? Since you are using create_deep_agent this intentionally works slower compared to create_agent.
- Could you try using different/faster model and observe its behaviour
- Do you have any logging? You could add some output logging or middleware to observe where exactly it works slowly
感觉思考非常慢,底层应该多次在反复思考
i am use deepseek-chat model , and this model has only one tool.
@xiaoyang is this self-hosted?
use crate_deep_agent is model self-hosted
I think so that is the reason why your execution is too slow. Can you just do a simple request, and perhaps see what the inference speed of your model is.
yes, i think deep_agent send multiple requests to the API . so it was very slow
So that is the problem, I would recommend you host your model on better specs or use a mini version of the model.
@xiaoyang if you were able to verify this, can you mark the last answer as the solution?
Hi @xiaoyang ,
The slowness isn’t coming from LangChain itself — it’s because:
-
You’re self-hosting the model.
-
create_deep_agent(Deep Agent) makes multiple LLM calls internally. -
Your model (e.g.,
deepseek-chat) likely has slow inference speed on your current hardware.
Deep agents typically:
-
Plan
-
Reason
-
Possibly re-evaluate
-
Call tools
-
Reflect
That means multiple API/model calls per user request. If your model is self-hosted and slow per request, total latency increases quickly.
Why it feels like “thinking multiple times”
Because it actually is.
Deep agents often:
-
Generate plan
-
Evaluate tool choice
-
Possibly re-plan
-
Continue reasoning
Each step = another model call.
What you can do
-
Test raw inference speed with a single simple prompt
-
Check tokens/sec throughput
-
Upgrade hardware (GPU, VRAM, CPU)
-
Use a smaller model variant
-
Reduce reasoning depth if configurable
Bottom line
Slow execution =
(multiple agent calls) × (slow self-hosted model inference)
So yes — your observation is correct.