Best Practices for Running LLM-Based Tests at Scale Without Hitting Rate Limits

Hi everyone,

I’m running a large number of LLM-based tests in parallel and consistently hitting OpenAI rate limits. I’d love to hear what patterns or best practices people use to handle this at scale.

In particular, I’m curious about:

  • How do you typically manage concurrency when running many LLM-powered tests?
  • Is using multiple OpenAI API keys ever considered a valid approach, or is that generally discouraged?

More context on my setup:

I have multiple datasets representing different workflows and expected outcomes. Each dataset contains traces that:

  1. Feed inputs into a deepagents-based agent
  2. Produce a trajectory (tool calls, arguments, messages, decisions, etc.)
  3. Are evaluated by one or more LLM judges that score the trajectory against expectations
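To make the shape of the pipeline concrete, here is a minimal sketch of the three steps above. All names (`Trace`, `run_agent`, `judge_trajectory`) are placeholders I made up for illustration; the real setup calls a deepagents-based agent and an LLM judge instead of these stubs.

```python
# Hypothetical sketch of the dataset -> agent -> judge flow.
# run_agent and judge_trajectory are stand-ins, not real APIs.
from dataclasses import dataclass, field

@dataclass
class Trace:
    inputs: dict     # inputs fed into the agent
    expected: dict   # expected outcomes used by the judge

@dataclass
class Trajectory:
    tool_calls: list = field(default_factory=list)  # (tool_name, args) pairs
    messages: list = field(default_factory=list)    # generated messages

def run_agent(trace: Trace) -> Trajectory:
    # Placeholder: the real version invokes the deepagents-based agent.
    traj = Trajectory()
    traj.tool_calls.append(("search", {"query": trace.inputs.get("query", "")}))
    traj.messages.append("answer")
    return traj

def judge_trajectory(traj: Trajectory, expected: dict) -> float:
    # Placeholder: the real version asks one or more LLM judges to score
    # the trajectory; here we just check expected tools were used.
    wanted = expected.get("tools", [])
    used = [name for name, _args in traj.tool_calls]
    hits = sum(1 for tool in wanted if tool in used)
    return hits / len(wanted) if wanted else 1.0

trace = Trace(inputs={"query": "status"}, expected={"tools": ["search"]})
score = judge_trajectory(run_agent(trace), trace.expected)
print(score)  # 1.0
```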

The judges validate things like:

  • Correct tool usage and ordering
  • Arguments passed to tools
  • Content of generated messages
  • Decisions taken given specific tool outputs
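Some of these checks can be done deterministically before spending an LLM call. As an example of the "correct tool usage and ordering" check, here is a small subsequence test (my own sketch, not part of any framework) that verifies expected tools appear in the trajectory in the given relative order:

```python
def tools_in_order(tool_calls, expected_order):
    # Deterministic pre-check before the LLM judge: returns True if the
    # tools in expected_order appear in tool_calls in that relative order
    # (other tools may be interleaved).
    it = iter(name for name, _args in tool_calls)
    return all(tool in it for tool in expected_order)

calls = [("lookup", {"id": 1}), ("fetch", {}), ("summarize", {})]
print(tools_in_order(calls, ["lookup", "summarize"]))  # True
print(tools_in_order(calls, ["summarize", "lookup"]))  # False
```

Running cheap structural checks like this first, and reserving the LLM judge for message content and decision quality, also cuts down the number of API calls that count against the rate limit.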

Right now:

  • Tests are run locally with pytest
  • Tools are mocked to isolate backend dependencies
  • Scenarios Framework is used to structure the cases
  • Tests are executed in parallel, which is where rate limits start to hurt

This works well functionally, but once the test count grows, rate limiting becomes the bottleneck.
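For reference, the parallel pattern that triggers the limits looks roughly like the following; `call_judge` is a stand-in for the real OpenAI-backed judge call, and `max_workers=16` is just an illustrative value, not my actual configuration:

```python
# Minimal sketch of the parallel execution that runs into rate limits.
from concurrent.futures import ThreadPoolExecutor

def call_judge(case_id: int) -> str:
    # Stand-in for an OpenAI API call made by an LLM judge.
    return f"case-{case_id}: scored"

# Many judge calls fired concurrently: this is where 429s start appearing
# once the test count grows.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(call_judge, range(100)))

print(len(results))  # 100
```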

Thanks in advance. I would really appreciate any insights or references.