Problem
The Production deployment's health check times out after 600 seconds, while a Development deployment with identical code and config succeeds.
- Deployment: prod-agent
- Workspace: Workspace 1
- LangGraph API version: 0.7.72
- Banner: “This deployment is using our new and improved architecture”
What works
- Development deployment deploys successfully with full config (auth, custom HTTP app, middleware)
- Server starts cleanly, warmup completes in ~8s, total startup ~28-42s
- /ok endpoint is served and responsive
- No crashes, no restarts
What fails
- Production deployment: “Timeout: New revision health check did not succeed after 600 seconds”
- Even with ALL custom code removed (no auth, no middleware, no custom HTTP app, single graph), Production still fails
- Redis connection errors seen in Production logs:
redis: connection pool: failed to dial after 5 attempts: dial tcp 192.168.28.161:6379: connect: connection refused
redis: connection pool: failed to dial after 1 attempts: dial tcp 192.168.28.161:6379: i/o timeout
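The two log lines above show different failure modes ("connection refused" vs. "i/o timeout"). For anyone who can run code from inside the deployment's network, a minimal sketch of a TCP probe that separates the two cases might look like this. The function names `classify_dial_error` and `probe` are hypothetical helpers, not part of LangGraph or any Redis client, and the probe only tests TCP reachability, not the Redis protocol.

```python
import socket


def classify_dial_error(exc):
    """Map a socket exception onto the two failure modes seen in the logs."""
    if isinstance(exc, ConnectionRefusedError):
        # The connect attempt was actively rejected; typically nothing is
        # listening on that port.
        return "refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # No answer arrived before the timeout expired.
        return "timeout"
    return "other"


def probe(host, port, timeout_s=3.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout' or 'other'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except OSError as exc:
        return classify_dial_error(exc)
```

Calling `probe("192.168.28.161", 6379)` with the address from the logs would show whether the Redis endpoint is currently refusing connections, timing out, or accepting them.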
What we tested
- Removed custom auth — still fails on Prod
- Removed custom HTTP app/middleware — still fails on Prod
- Removed all graphs except one — still fails on Prod
- Stripped to hello_world graph with zero deps — still fails on Prod
- Pinned base image to 0.7.69 — still fails on Prod
- Created brand new Production deployment — still fails
- Development deployment with full config — works
Server logs (Production)
The server starts normally and serves all health check requests, but the revision is never marked healthy:
Application started up in 25.530s
Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Then Redis fails:
redis: connection pool: failed to dial after 5 attempts: connect: connection refused
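To compare Development and Production side by side, a rough sketch of polling the /ok endpoint against the same 600-second window could look like the following. This is only an approximation: how the platform's own revision health check works is not known to us, and `ok_endpoint_healthy` and `wait_until_healthy` are hypothetical names introduced for illustration.

```python
import time
import urllib.error
import urllib.request


def ok_endpoint_healthy(base_url, timeout_s=5.0):
    """Return True if GET <base_url>/ok answers with HTTP 200."""
    url = base_url.rstrip("/") + "/ok"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts and non-2xx responses count as unhealthy.
        return False


def wait_until_healthy(check, deadline_s=600.0, interval_s=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or the deadline passes.

    Returns the elapsed seconds on success, or None if the deadline
    (600 s by default, matching the timeout in the error message) expires.
    """
    start = clock()
    while clock() - start < deadline_s:
        if check():
            return clock() - start
        sleep(interval_s)
    return None
```

For example, `wait_until_healthy(lambda: ok_endpoint_healthy("http://0.0.0.0:8000"))` polls the address from the Uvicorn log line; a real run would use the deployment's own URL instead.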
Request
The Production deployment infrastructure appears to have a Redis provisioning issue on the “new architecture.” Can you investigate the Redis connectivity for our Production deployment?