Hi everyone,
I’m running a large number of LLM-based tests in parallel and consistently hitting OpenAI rate limits during execution. I’d love to learn what patterns or best practices people are using to handle this at scale.
In particular, I’m curious about:
- How do you typically manage concurrency when running many LLM-powered tests?
- Is using multiple OpenAI API keys ever considered a valid approach, or is that generally discouraged?
More context on my setup:
I have multiple datasets representing different workflows and expected outcomes. Each dataset contains traces that:
- Feed inputs into a deepagents-based agent
- Produce a trajectory (tool calls, arguments, messages, decisions, etc.)
- Are evaluated by one or more LLM judges that score the trajectory against expectations
The judges validate things like:
- Correct tool usage and ordering
- Arguments passed to tools
- Content of generated messages
- Decisions taken given specific tool outputs
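For concreteness, some of these checks (tool usage and ordering, arguments) can be asserted deterministically before any LLM judge is invoked, which also reduces the number of judge calls. Here's a minimal sketch; the trajectory shape and names like `assert_tool_order` are illustrative, not from any specific framework:

```python
# Minimal sketch: deterministic tool-ordering check on a trajectory,
# run before handing the case off to an LLM judge. All names and the
# trajectory schema here are illustrative assumptions.

def tool_call_names(trajectory: list[dict]) -> list[str]:
    """Extract tool-call names from trajectory steps, in order."""
    return [step["tool"] for step in trajectory if step.get("type") == "tool_call"]

def assert_tool_order(trajectory: list[dict], expected: list[str]) -> None:
    """Fail fast if the agent called tools in the wrong order."""
    actual = tool_call_names(trajectory)
    assert actual == expected, f"expected tool order {expected}, got {actual}"

# Example trajectory: two tool calls with an assistant message in between.
trajectory = [
    {"type": "tool_call", "tool": "search", "args": {"q": "invoice"}},
    {"type": "message", "content": "Found it."},
    {"type": "tool_call", "tool": "fetch", "args": {"id": 42}},
]
assert_tool_order(trajectory, ["search", "fetch"])
```

Reserving the LLM judges for the fuzzy checks (message content, decision quality) and doing ordering/arguments in plain assertions cuts API volume per test.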
Right now:
- Tests are run locally with pytest
- Tools are mocked to isolate backend dependencies
- Scenarios Framework is used to structure the cases
- Tests are executed in parallel, which is where rate limits start to hurt
This works well functionally, but once the test count grows, rate limiting becomes the bottleneck.
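For reference, the pattern I've been considering is a shared semaphore to cap in-flight judge calls plus exponential backoff with jitter on 429s. A minimal sketch below; `call_judge` and `RateLimitError` are placeholders for the real OpenAI client call and its rate-limit exception, and the concurrency cap is something you'd tune to your tier:

```python
# Minimal sketch: cap concurrent judge calls and back off on rate limits.
# `call_judge` and `RateLimitError` are stand-ins for the real client.
import asyncio
import random

MAX_CONCURRENT_CALLS = 8  # assumption: tune to your rate-limit tier
semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

class RateLimitError(Exception):
    """Stand-in for the 429 error the real client raises."""

async def call_judge(prompt: str) -> str:
    # Placeholder for the actual OpenAI judge call.
    return "score: 1.0"

async def judged_call(prompt: str, max_retries: int = 5) -> str:
    async with semaphore:  # cap in-flight requests globally
        for attempt in range(max_retries):
            try:
                return await call_judge(prompt)
            except RateLimitError:
                # Exponential backoff with jitter before retrying.
                delay = min(2 ** attempt, 30) + random.random()
                await asyncio.sleep(delay)
        raise RuntimeError(f"gave up after {max_retries} rate-limited attempts")

async def run_all(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(judged_call(p) for p in prompts))

results = asyncio.run(run_all([f"case {i}" for i in range(20)]))
```

The semaphore throttles at the process level, so this only fully works when the parallel tests share one event loop (or one worker); with pytest-xdist-style multi-process parallelism the cap would need to be divided across workers or enforced by an external limiter.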
Thanks in advance. I would really appreciate any insights or references.