Production deployment health check fails (600s timeout) — Development works with identical code

aadedewe_epoch · March 16, 2026, 1:49pm

Problem

Production deployment health check times out after 600 seconds, but Development deployment succeeds with identical code and config.

Deployment: prod-agent
Workspace: Workspace 1
LangGraph API version: 0.7.72
Banner: “This deployment is using our new and improved architecture”

What works

Development deployment deploys successfully with full config (auth, custom HTTP app, middleware)
Server starts cleanly, warmup completes in ~8s, total startup ~28-42s
/ok endpoint is served and responsive
No crashes, no restarts
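
For anyone who wants to reproduce the check against their own server, here is a minimal sketch of a poll loop with a 600-second deadline like the one in the error message. It is not the platform's actual health checker: the names `http_ok` and `wait_until_healthy`, the 5-second interval, and the assumption that /ok answers with HTTP 200 are all mine.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts, and HTTP errors.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    deadline_s: float = 600.0,   # mirrors the 600 s timeout in the error
    interval_s: float = 5.0,     # hypothetical poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `deadline_s` elapses."""
    start = clock()
    while clock() - start < deadline_s:
        if probe():
            return True
        sleep(interval_s)
    return False
```

Usage would look like `wait_until_healthy(lambda: http_ok("http://localhost:8000/ok"))`, where the URL is a placeholder for wherever the server is reachable. Passing `clock` and `sleep` as parameters keeps the loop testable without waiting in real time.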

What fails

Production deployment: “Timeout: New revision health check did not succeed after 600 seconds”
Even with ALL custom code removed (no auth, no middleware, no custom HTTP app, single graph), Production still fails
Redis connection errors seen in Production logs:
redis: connection pool: failed to dial after 5 attempts: dial tcp 192.168.28.161:6379: connect: connection refused
redis: connection pool: failed to dial after 1 attempts: dial tcp 192.168.28.161:6379: i/o timeout
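
The two errors above point at different failure modes: "connection refused" means the host answered but nothing was listening on the port, while "i/o timeout" means no answer arrived at all. A rough sketch of a probe that separates the two cases is below. The function name `classify_dial` and the 3-second timeout are my own choices, and because 192.168.28.161 is a private address, this would only be meaningful when run from inside the deployment's network.

```python
import socket


def classify_dial(host: str, port: int, timeout_s: float = 3.0) -> str:
    """Attempt one TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"        # something accepted the connection
    except ConnectionRefusedError:
        return "refused"         # host reachable, port closed
    except socket.timeout:
        return "timeout"         # no response within timeout_s
    except OSError as exc:
        return f"error: {exc}"   # e.g. unreachable network, DNS failure
```

For example, `classify_dial("192.168.28.161", 6379)` would try the address from the logs. This only checks that the TCP port accepts connections; it does not speak the Redis protocol.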

What we tested

Removed custom auth — still fails on Prod
Removed custom HTTP app/middleware — still fails on Prod
Removed all graphs except one — still fails on Prod
Stripped to hello_world graph with zero deps — still fails on Prod
Pinned base image to 0.7.69 — still fails on Prod
Created brand new Production deployment — still fails
Development deployment with full config — works

Server logs (Production)

The server starts normally and serves every health check request, but the revision is never marked healthy:
Application started up in 25.530s
Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Then Redis fails:
redis: connection pool: failed to dial after 5 attempts: connect: connection refused
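
To see how often each failure mode shows up in a longer log export, something like the following sketch could tally the Redis dial errors by address and reason. The regular expression is written only against the log lines quoted in this post, so other message shapes will simply be skipped; `DIAL_ERROR` and `summarize` are names I made up.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   redis: connection pool: failed to dial after 5 attempts: dial tcp HOST:PORT: connect: connection refused
# The "dial tcp HOST:PORT" and "connect:" parts are optional.
DIAL_ERROR = re.compile(
    r"redis: connection pool: failed to dial after (?P<attempts>\d+) attempts?: "
    r"(?:dial tcp (?P<addr>\S+): )?"
    r"(?:connect: )?(?P<reason>.+)$"
)


def summarize(lines: Iterable[str]) -> Counter:
    """Count dial errors keyed by (address or None, reason)."""
    counts: Counter = Counter()
    for line in lines:
        match = DIAL_ERROR.search(line.rstrip())
        if match:
            counts[(match.group("addr"), match.group("reason"))] += 1
    return counts
```

Feeding it the log lines of a failed revision gives a quick split between refused connections and timeouts.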

Request

The Production deployment infrastructure appears to have a Redis provisioning issue on the “new architecture.” Can you investigate the Redis connectivity for our Production deployment?

lc-chad · March 16, 2026, 2:37pm

Hi @aadedewe_epoch

We had an incident, posted to our status page, that lines up with the timeout error you reported.

If the issue persists, please let us know.

Best,
Chad
