Hi team & community,
We are currently experiencing a critical issue with our production deployment. Our instance has been down for several hours, and we are seeking an urgent resolution.
Deployment Details:
- Deployment Name: mcp-playground-agent-scale
- Deployment ID: acb59f5b-e4b4-4c9c-be09-3e9cd8abbe1d
- Workspace ID: 5f6ecbbf-d894-480c-b184-9b677b3ed123
- Latest Revision ID: 34c2f392-e53d-471e-ba20-1927d9c65c0f
- Last Working Revision ID: 538ed1d8-3590-486b-814f-df722e4ca307
Based on the deployment server logs, the application fails to start because of internal Redis connection timeouts. Specifically, we are seeing redis.exceptions.TimeoutError: Timeout connecting to server and asyncio.exceptions.CancelledError during the migration and lifespan startup phases.
Relevant Log Snippet:
1/4/2026, 9:32:05 AM Application startup failed. Exiting.
…
redis.exceptions.TimeoutError: Timeout connecting to server
1/4/2026, 9:31:55 AM Redis ping timed out
Could you please investigate this internal connectivity issue and advise on a resolution as soon as possible?
This deployment has been in heavy use for the past few months without any errors of this kind. We also did not make any changes to the repo (no pushes to main) or to the deployment itself; these errors simply started appearing.
This is the only deployment in our workspace raising these errors. Our other production deployment and our development deployment are both working fine.
This deployment is crucial for us because it holds most of our threads, and we need to get it fixed and running again.
Thanks in advance!