Parallel Test Scenarios vs OpenAI Rate Limits

Hi everyone,

I’m running a large number of LangWatch scenarios in parallel (scenario.langwatch.ai) and I’m consistently hitting OpenAI rate limits during execution.

I’m trying to understand what the recommended or best-practice approaches are for handling this at scale. For example:

How do you typically manage concurrency when running many LLM tests?
Is using multiple OpenAI API keys ever considered a valid approach, or is it generally discouraged?

I’d love to hear how others have solved this in production or CI environments, and whether there are LangWatch-specific patterns that work well.

Thanks in advance!

Rate limiting by the LLM provider is not something in our control, and using multiple API keys to work around their limits feels fragile. I would recommend throttling the parallel execution of the tests you are running so you stay under the provider's limits. We have some guidance on rate limiting in case you are using evaluations in Python here: How to handle model rate limits - Docs by LangChain
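To illustrate the throttling suggestion, here is a minimal sketch using an `asyncio.Semaphore` to cap how many scenarios run concurrently. This is not LangWatch-specific; `run_scenario` is a hypothetical placeholder for whatever async call actually executes one of your scenarios.

```python
import asyncio

# Hypothetical stand-in for a single scenario run; replace the body
# with your actual scenario / LLM call.
async def run_scenario(name: str) -> str:
    await asyncio.sleep(0.01)  # simulates the LLM request latency
    return f"{name}: done"

async def run_all(scenario_names, max_concurrency: int = 5):
    # The semaphore caps how many scenarios hit the LLM API at once,
    # which keeps request throughput under the provider's rate limit.
    sem = asyncio.Semaphore(max_concurrency)

    async def throttled(name: str) -> str:
        async with sem:
            return await run_scenario(name)

    return await asyncio.gather(*(throttled(n) for n in scenario_names))

results = asyncio.run(
    run_all([f"scenario-{i}" for i in range(20)], max_concurrency=5)
)
print(len(results))  # all 20 scenarios complete, at most 5 in flight
```

Tuning `max_concurrency` down is usually enough to stop 429 errors; for stricter limits you can combine this with retry-with-backoff on the client.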