Add nginx load balancing support for multi-instance Streamlit deployments #336
t0mdavid-m merged 7 commits into main
Conversation
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Dockerfiles add nginx and a generated /app/entrypoint.sh, plus a public env var STREAMLIT_SERVER_COUNT (default 1). At startup the entrypoint either runs a single Streamlit on 8501 or launches multiple Streamlit processes (8510+) and nginx on 8501 as a load balancer.
Sequence Diagram(s)
sequenceDiagram
participant Container as Container Start
participant Entrypoint as Entrypoint Script
participant Nginx as Nginx (8501)
participant S0 as Streamlit (8510 / 8501)
participant Sx as Streamlit (8511+)
participant Client as Client
Container->>Entrypoint: run image CMD -> entrypoint
Entrypoint->>Entrypoint: read STREAMLIT_SERVER_COUNT
alt STREAMLIT_SERVER_COUNT > 1
Entrypoint->>Entrypoint: generate nginx upstream (8510, 8511...)
Entrypoint->>S0: start Streamlit on 8510
Entrypoint->>Sx: start additional Streamlit instances
Entrypoint->>Nginx: write nginx.conf and start nginx on 8501
Client->>Nginx: HTTP request to 8501
Nginx->>S0: proxy to chosen instance
S0-->>Nginx: response
Nginx-->>Client: response
else STREAMLIT_SERVER_COUNT == 1
Entrypoint->>S0: start single Streamlit on 8501
Client->>S0: direct request to 8501
S0-->>Client: response
end
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@entrypoint_simple.sh`:
- Around line 33-54: The current nginx upstream uses ip_hash inside the upstream
block for streamlit_backend which pins sessions by client IP and will misroute
multiple users sharing an IP (e.g., behind NAT or another proxy); replace
ip_hash-based affinity by either removing ip_hash entirely to allow normal
round-robin load balancing, or implement a cookie-based sticky session solution
(e.g., nginx-sticky-module or commercial sticky cookie) in the upstream
configuration for streamlit_backend, or instead move session affinity out of the
proxy and configure Streamlit to use an external session-state backend like
Redis; update the upstream streamlit_backend block and related proxy_pass
handling accordingly.
- Around line 9-11: Validate STREAMLIT_SERVER_COUNT before using it in the
numeric comparison: ensure SERVER_COUNT (the variable set from
STREAMLIT_SERVER_COUNT) contains only digits (e.g., with a regex check like [[
"$SERVER_COUNT" =~ ^[0-9]+$ ]] or a case statement) and if it is not numeric,
set SERVER_COUNT to a safe default (e.g., 1) or exit with a clear error; then
use the existing if [ "$SERVER_COUNT" -gt 1 ] test unchanged. Update the
assignment and add the numeric guard near where SERVER_COUNT is set so invalid
values (like "auto") won't cause the shell arithmetic comparison to fail.
- Around line 74-85: The current loop launches Streamlit instances in background
(variables SERVER_COUNT, BASE_PORT, PORT) and then uses a fixed sleep before
starting nginx, which provides no readiness or supervision; change this by
adding a per-instance readiness loop that polls each Streamlit health endpoint
(e.g., http://127.0.0.1:$PORT/_stcore/health) with curl -sf and a timeout/retry
backoff and only proceed to start nginx when all ports respond healthy, and add
a simple supervisor/trap: track PIDs of background streamlit processes, install
a trap to kill children on exit and run a wait loop that exits the container if
any Streamlit PID dies (or restarts it), so nginx (exec nginx -g 'daemon off;')
is only started after readiness and the container fails fast on instance crash.
In `@entrypoint.sh`:
- Around line 19-24: The RQ workers are started in the background without
supervision; update the worker startup loop (where WORKER_COUNT and the "rq
worker openms-workflows --url $REDIS_URL --name worker-$i &" command are used)
to record each child PID (e.g., append $! to a PID file or an array) and add a
trap/wait loop that monitors child exits (using trap '...' SIGCHLD or a wait -n
loop) to log failures and restart the specific worker process when it dies;
ensure restart logic uses the same invocation and preserves the worker name
(worker-$i) and logs PID changes so crashes are detected and workers are
relaunched.
- Around line 12-17: The Redis readiness loop using "until redis-cli ping" can
hang indefinitely; modify the entrypoint.sh loop (the until ... redis-cli ping
... done block) to enforce a retry limit by introducing a counter and a
max-retries variable (configurable via an env var like REDIS_WAIT_MAX_RETRIES
with a sensible default), incrementing the counter each iteration, and breaking
with a non-zero exit and a clear error message (e.g., "Redis did not become
ready after X attempts") when the limit is reached; keep the existing sleep
interval and successful "Redis is ready" echo on success.
🧹 Nitpick comments (2)
entrypoint_simple.sh (1)
1-89: Significant code duplication with entrypoint.sh.
The entire nginx configuration generation and multi-instance startup logic (lines 11–85) is duplicated verbatim between this file and entrypoint.sh. If the nginx config or startup logic needs to change, both files must be updated in lockstep. Consider extracting the shared nginx/load-balancer logic into a common helper script (e.g., start_load_balancer.sh) that both entrypoints source.
Dockerfile_simple (1)
28-28: nginx is installed unconditionally, even when load balancing is disabled (default).
Since the default STREAMLIT_SERVER_COUNT=1 means nginx is never used, this adds ~5–10 MB to every image. This is a minor trade-off for simplicity, but worth noting — if image size matters, nginx could be installed conditionally or in a separate build target.
entrypoint_simple.sh
Outdated
SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}

if [ "$SERVER_COUNT" -gt 1 ]; then
No input validation on STREAMLIT_SERVER_COUNT.
If a user sets STREAMLIT_SERVER_COUNT to a non-numeric value (e.g., "auto"), the [ "$SERVER_COUNT" -gt 1 ] comparison will produce a shell error and the container will exit due to set -e. Consider adding a guard:
Proposed fix
 SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}
+if ! [[ "$SERVER_COUNT" =~ ^[0-9]+$ ]]; then
+  echo "ERROR: STREAMLIT_SERVER_COUNT must be a positive integer, got '$SERVER_COUNT'" >&2
+  exit 1
+fi
entrypoint_simple.sh
Outdated
upstream streamlit_backend {
    # Sticky sessions: ensure a user always reaches the same Streamlit instance.
    # This is critical because Streamlit maintains per-session state in-process.
    ip_hash;

${UPSTREAM_SERVERS} }

map \$http_upgrade \$connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 8501;

    location / {
        proxy_pass http://streamlit_backend;
        proxy_http_version 1.1;

        # WebSocket support (required for Streamlit)
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection \$connection_upgrade;
ip_hash breaks when clients share an IP or are behind a downstream load balancer.
ip_hash uses the client's IP for sticky sessions. In environments where multiple users share the same public IP (corporate NAT, another reverse proxy in front), all those users will be pinned to the same Streamlit instance, negating the load-balancing benefit and potentially mixing sessions.
If the deployment will sit behind another proxy, consider using a cookie-based sticky session mechanism (e.g., nginx sticky cookie in the commercial version, or the nginx-sticky-module for OSS) or Streamlit's built-in external session-state backend (e.g., Redis) to decouple session affinity from the proxy layer.
entrypoint_simple.sh
Outdated
for i in $(seq 0 $((SERVER_COUNT - 1))); do
    PORT=$((BASE_PORT + i))
    echo "Starting Streamlit instance on port $PORT..."
    streamlit run app.py --server.port $PORT --server.address 127.0.0.1 &
done

# Give Streamlit instances a moment to start
sleep 2

# Start nginx in foreground as the main process
echo "Starting nginx load balancer on port 8501..."
exec nginx -g 'daemon off;'
Backgrounded Streamlit instances are unsupervised; no health-check before starting nginx.
- Streamlit instances are launched with & but nothing monitors them. If one crashes, nginx will route traffic to a dead upstream and return 502s indefinitely.
- sleep 2 is a fixed delay rather than an actual readiness check. On slow or resource-constrained hosts, 2 seconds may not be enough.
Consider:
- Adding a readiness loop that polls each Streamlit port (e.g., curl -sf http://127.0.0.1:$PORT/_stcore/health) before starting nginx (a sketch follows this list).
- Using a lightweight process supervisor (e.g., supervisord, or a trap + wait loop) so that if any Streamlit instance dies, the container can either restart it or exit cleanly.
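A minimal sketch of such a readiness gate, assuming curl is available in the image and reusing the entrypoint's SERVER_COUNT/BASE_PORT names (STREAMLIT_READY_TIMEOUT is a hypothetical knob, seconds per instance):

SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}
BASE_PORT=8510
MAX_WAIT=${STREAMLIT_READY_TIMEOUT:-60}

for i in $(seq 0 $((SERVER_COUNT - 1))); do
    PORT=$((BASE_PORT + i))
    waited=0
    # Poll Streamlit's built-in health endpoint until it answers or we time out
    until curl -sf "http://127.0.0.1:${PORT}/_stcore/health" > /dev/null; do
        waited=$((waited + 1))
        if [ "$waited" -ge "$MAX_WAIT" ]; then
            echo "ERROR: Streamlit on port $PORT not healthy after ${MAX_WAIT}s" >&2
            exit 1
        fi
        sleep 1
    done
    echo "Streamlit on port $PORT is ready"
done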
entrypoint.sh
Outdated
# Wait for Redis to be ready
until redis-cli ping > /dev/null 2>&1; do
    echo "Waiting for Redis..."
    sleep 1
done
echo "Redis is ready"
Redis readiness loop has no timeout — container can hang indefinitely.
If Redis fails to start (e.g., misconfiguration, port conflict), this loop will spin forever and the container will appear stuck. Add a retry limit:
Proposed fix
# Wait for Redis to be ready
-until redis-cli ping > /dev/null 2>&1; do
+RETRIES=0
+MAX_RETRIES=30
+until redis-cli ping > /dev/null 2>&1; do
+ RETRIES=$((RETRIES + 1))
+ if [ "$RETRIES" -ge "$MAX_RETRIES" ]; then
+ echo "ERROR: Redis failed to start after $MAX_RETRIES seconds" >&2
+ exit 1
+ fi
echo "Waiting for Redis..."
sleep 1
done
entrypoint.sh
Outdated
# Start RQ worker(s) in background
WORKER_COUNT=${RQ_WORKER_COUNT:-1}
echo "Starting $WORKER_COUNT RQ worker(s)..."
for i in $(seq 1 $WORKER_COUNT); do
    rq worker openms-workflows --url $REDIS_URL --name worker-$i &
done
RQ workers are fire-and-forget — no monitoring or restart on failure.
Similar to the Streamlit instances, RQ workers are backgrounded and never checked. If a worker crashes, the job queue will silently stop processing. Consider logging worker PIDs and/or using a trap to detect child exits.
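A sketch of that supervision pattern, assuming bash 4.3+ for wait -n; start_worker is an illustrative helper, and the rq invocation and names follow the entrypoint. Note that restarting under the same worker-$i name assumes the dead worker deregistered from Redis cleanly:

WORKER_COUNT=${RQ_WORKER_COUNT:-1}
declare -A WORKER_PIDS    # maps child pid -> worker index

start_worker() {
    local i=$1
    rq worker openms-workflows --url "$REDIS_URL" --name "worker-$i" &
    WORKER_PIDS[$!]=$i
    echo "worker-$i running as pid $!"
}

trap 'kill "${!WORKER_PIDS[@]}" 2>/dev/null' EXIT   # tear down children on exit

for i in $(seq 1 "$WORKER_COUNT"); do
    start_worker "$i"
done

while true; do
    wait -n || true                       # blocks until some child exits
    for pid in "${!WORKER_PIDS[@]}"; do
        if ! kill -0 "$pid" 2>/dev/null; then
            i=${WORKER_PIDS[$pid]}
            unset "WORKER_PIDS[$pid]"
            echo "worker-$i (pid $pid) exited; restarting" >&2
            start_worker "$i"
        fi
    done
done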
When STREAMLIT_SERVER_COUNT > 1, the entrypoint dynamically generates an nginx config and launches multiple Streamlit instances on internal ports (8510+), with nginx on port 8501 using ip_hash sticky sessions for WebSocket compatibility. Default (STREAMLIT_SERVER_COUNT=1) preserves existing behavior with no nginx overhead. https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
Force-pushed from b58b03c to be23790.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Dockerfile (1)
162-219: 🛠️ Refactor suggestion | 🟠 Major
Same inline-script and maintainability concerns as Dockerfile_simple — extract to an external entrypoint.sh file.
This is the same pattern flagged in Dockerfile_simple. The PR description states entrypoint scripts were "replaced with external entrypoint scripts" but both Dockerfiles still generate them inline. Extracting to a COPY'd file is even more important here due to the additional Redis/RQ logic increasing the script's complexity.
🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Around line 211-213: Replace the static "sleep 2" before the "exec nginx -g
'daemon off;'" startup with a health-check loop that polls Streamlit's
/_stcore/health endpoint on each internal port (the same ports your nginx
upstreams use) and only proceeds to start nginx when all endpoints respond
healthy; implement a loop that iterates the port list, uses a lightweight HTTP
check (curl -sf or wget --spider) with a short retry/backoff and overall timeout
per port, logs failures and retries, and exits with non-zero if any port never
becomes healthy, then call the existing "exec nginx -g 'daemon off;'" once all
checks pass.
- Around line 187-218: The Dockerfile spawns multiple background services
(Redis, RQ workers, multiple Streamlit instances created from the SERVER_COUNT
loop, and nginx started with exec nginx -g "daemon off;") without a proper
init/supervisor to reap zombies or restart failed services; replace the current
ad-hoc backgrounding with a real process supervisor (e.g., supervisord) or at
minimum use tini as PID 1: add tini as the ENTRYPOINT and install/configure
supervisord to run and manage redis, the RQ worker processes, each streamlit
instance (those started in the for loop using streamlit run app.py --server.port
$PORT --server.address 127.0.0.1), and nginx (instead of exec nginx -g "daemon
off;"), or create a supervisord config that defines programs for redis,
rqworker, streamlit instances (parametrized by SERVER_COUNT), and nginx so they
are supervised, can restart, and are correctly reaped.
In `@Dockerfile_simple`:
- Line 102: Validate the STREAMLIT_SERVER_COUNT/SERVER_COUNT env var before
using the numeric comparison in the entrypoint: check that "$SERVER_COUNT"
matches a positive-integer regex and is >=1, and if not print a clear error to
stderr and exit non‑zero; then you can safely use the existing conditional that
contains [ "$SERVER_COUNT" -gt 1 ]. Ensure you update the variable name
references (STREAMLIT_SERVER_COUNT -> SERVER_COUNT) consistently and place the
validation before the line with [ "$SERVER_COUNT" -gt 1 ].
- Around line 116-125: The startup loop that launches multiple Streamlit
instances by backgrounding "streamlit run app.py ... &" (using SERVER_COUNT and
BASE_PORT) leaves those children orphaned when the script execs nginx ("exec
nginx -g \"daemon off;\""), causing zombie processes and no restarts; replace
this pattern by running a proper init/reaper (add tini as PID 1 or run the
container with --init) and supervise/restart the Streamlit workers instead of
backgrounding them — e.g., use a lightweight supervisor (supervisord) or
implement a restart loop/trap in the entrypoint to launch managed workers and
have the supervisor start nginx (or run nginx under the supervisor) so Streamlit
processes are reaped and automatically restarted on crash.
- Around line 92-131: The Dockerfile currently builds a fragile inline
entrypoint by echoing a large heredoc into /app/entrypoint.sh (see the block
creating /app/entrypoint.sh which references STREAMLIT_SERVER_COUNT,
UPSTREAM_SERVERS, BASE_PORT and the nginx config); replace this by moving the
entire script to a new file named entrypoint_simple.sh in the repo, COPY that
file into the image in the Dockerfile, chmod +x it, and change the Dockerfile to
use that script as the container entrypoint (avoid the echo -e generation and
inline nginx config), keeping the same runtime logic (cron start, multi-instance
loop, nginx exec or single exec streamlit) so symbols like
STREAMLIT_SERVER_COUNT, BASE_PORT, and the nginx upstream/template remain
unchanged.
- Around line 123-125: The Dockerfile currently uses a hardcoded "sleep 2"
before starting nginx which can cause 502s; replace that with a readiness loop
that polls Streamlit's health endpoint (/_stcore/health) on the Streamlit
port(s) using curl and only proceeds to run the final exec nginx -g "daemon
off;" once a successful HTTP 200 response is received (with a timeout and
retry/backoff to avoid infinite loops); remove the static sleep and ensure the
loop logs retries and exits non-zero on overall timeout so nginx (the existing
exec command) only starts when Streamlit is ready.
🧹 Nitpick comments (1)
Dockerfile (1)
196-199: PORT variable in the loop shadows the build ARG PORT=8501 — no functional issue but potentially confusing.
The entrypoint script uses PORT as a loop variable for the internal Streamlit ports (8510+), while the Dockerfile declares ARG PORT=8501 at line 13. Since the ARG doesn't persist into the entrypoint's runtime environment, there's no conflict, but using a different name (e.g., INSTANCE_PORT) would improve clarity.
# Load balancer setup\n\
SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}\n\
\n\
if [ "$SERVER_COUNT" -gt 1 ]; then\n\
echo "Starting $SERVER_COUNT Streamlit instances with nginx load balancer..."\n\
\n\
# Generate nginx upstream block\n\
UPSTREAM_SERVERS=""\n\
BASE_PORT=8510\n\
for i in $(seq 0 $((SERVER_COUNT - 1))); do\n\
PORT=$((BASE_PORT + i))\n\
UPSTREAM_SERVERS="${UPSTREAM_SERVERS} server 127.0.0.1:${PORT};\\n"\n\
done\n\
\n\
# Write nginx config\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n ip_hash;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
\n\
# Start Streamlit instances on internal ports (localhost only)\n\
for i in $(seq 0 $((SERVER_COUNT - 1))); do\n\
PORT=$((BASE_PORT + i))\n\
echo "Starting Streamlit instance on port $PORT..."\n\
streamlit run app.py --server.port $PORT --server.address 127.0.0.1 &\n\
done\n\
\n\
sleep 2\n\
echo "Starting nginx load balancer on port 8501..."\n\
exec nginx -g "daemon off;"\n\
else\n\
# Single instance mode (default) - run Streamlit directly on port 8501\n\
echo "Starting Streamlit app..."\n\
exec streamlit run app.py\n\
fi\n\
Compounded unsupervised-process problem: Redis + RQ workers + Streamlit instances + nginx.
In multi-instance mode, this container runs at least 4+ background daemons (Redis, N RQ workers, M Streamlit instances) with nginx as PID 1. None of the background processes have supervision or restart capability, and nginx won't reap zombie children.
This is the same issue flagged in Dockerfile_simple but amplified here. A process supervisor like supervisord would be strongly recommended for this container, or at minimum tini as the init process.
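For the lighter-weight option, a fail-fast sketch under stated assumptions (bash 4.3+ for wait -n; nginx runs as a supervised child instead of via exec so the shell stays PID 1 and can reap its children; SERVER_COUNT/BASE_PORT as set earlier in the entrypoint):

pids=()

redis-server &                    # full image only; omit in the simple variant
pids+=($!)

for i in $(seq 0 $((SERVER_COUNT - 1))); do
    streamlit run app.py --server.port $((BASE_PORT + i)) --server.address 127.0.0.1 &
    pids+=($!)
done

nginx -g 'daemon off;' &          # supervised child, not `exec`
pids+=($!)

trap 'kill "${pids[@]}" 2>/dev/null' EXIT   # one dead child takes the rest down

wait -n || true                   # returns as soon as ANY child exits
echo "A supervised process exited unexpectedly; stopping container" >&2
exit 1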
Dockerfile
Outdated
sleep 2\n\
echo "Starting nginx load balancer on port 8501..."\n\
exec nginx -g "daemon off;"\n\
Same sleep 2 readiness concern as Dockerfile_simple.
Replace with a health-check loop polling Streamlit's /_stcore/health endpoint on each internal port before starting nginx. See the proposed fix in the Dockerfile_simple review.
RUN echo -e '#!/bin/bash\n\
set -e\n\
source /root/miniforge3/bin/activate streamlit-env\n\
\n\
# Start cron for workspace cleanup\n\
service cron start\n\
\n\
# Load balancer setup\n\
SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}\n\
\n\
if [ "$SERVER_COUNT" -gt 1 ]; then\n\
echo "Starting $SERVER_COUNT Streamlit instances with nginx load balancer..."\n\
\n\
# Generate nginx upstream block\n\
UPSTREAM_SERVERS=""\n\
BASE_PORT=8510\n\
for i in $(seq 0 $((SERVER_COUNT - 1))); do\n\
PORT=$((BASE_PORT + i))\n\
UPSTREAM_SERVERS="${UPSTREAM_SERVERS} server 127.0.0.1:${PORT};\\n"\n\
done\n\
\n\
# Write nginx config\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n ip_hash;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
\n\
# Start Streamlit instances on internal ports (localhost only)\n\
for i in $(seq 0 $((SERVER_COUNT - 1))); do\n\
PORT=$((BASE_PORT + i))\n\
echo "Starting Streamlit instance on port $PORT..."\n\
streamlit run app.py --server.port $PORT --server.address 127.0.0.1 &\n\
done\n\
\n\
sleep 2\n\
echo "Starting nginx load balancer on port 8501..."\n\
exec nginx -g "daemon off;"\n\
else\n\
# Single instance mode (default) - run Streamlit directly on port 8501\n\
echo "Starting Streamlit app..."\n\
exec streamlit run app.py\n\
fi\n\
' > /app/entrypoint.sh
🛠️ Refactor suggestion | 🟠 Major
Massive inline script is fragile and hard to maintain — consider an external file.
The entrypoint script is generated via a single echo -e with deeply nested escaping (especially the nginx config on line 114). This makes the script nearly impossible to read, debug, or modify safely. A single misplaced \n or \\ will silently produce a broken script or nginx config at runtime, with no build-time feedback.
Consider using a separate entrypoint_simple.sh file and COPYing it into the image (which is what the PR description says was done, but the actual code still uses inline generation).
🛠️ Proposed approach
Create a file entrypoint_simple.sh alongside the Dockerfile with the script contents, then in the Dockerfile:
-RUN echo -e '#!/bin/bash\n\
-set -e\n\
-...long inline script...
-' > /app/entrypoint.sh
+COPY entrypoint_simple.sh /app/entrypoint.sh
This gives you:
- Syntax highlighting and linting in your editor/CI
- Shell linting via shellcheck
- Readable nginx config generation
- Easier diffs in future PRs
# Load balancer setup\n\
SERVER_COUNT=${STREAMLIT_SERVER_COUNT:-1}\n\
\n\
if [ "$SERVER_COUNT" -gt 1 ]; then\n\
Non-numeric STREAMLIT_SERVER_COUNT will crash the entrypoint with an unhelpful error.
The [ "$SERVER_COUNT" -gt 1 ] comparison will fail with a confusing error if the env var is set to a non-integer value (e.g., "auto" or ""). Consider adding input validation early in the script.
🛠️ Proposed validation
# Validate SERVER_COUNT is a positive integer
if ! [[ "$SERVER_COUNT" =~ ^[0-9]+$ ]] || [ "$SERVER_COUNT" -lt 1 ]; then
echo "ERROR: STREAMLIT_SERVER_COUNT must be a positive integer, got: '$SERVER_COUNT'" >&2
exit 1
fi
Dockerfile_simple
Outdated
# Start Streamlit instances on internal ports (localhost only)\n\
for i in $(seq 0 $((SERVER_COUNT - 1))); do\n\
PORT=$((BASE_PORT + i))\n\
echo "Starting Streamlit instance on port $PORT..."\n\
streamlit run app.py --server.port $PORT --server.address 127.0.0.1 &\n\
done\n\
\n\
sleep 2\n\
echo "Starting nginx load balancer on port 8501..."\n\
exec nginx -g "daemon off;"\n\
Backgrounded Streamlit processes lack supervision and will become zombies under nginx PID 1.
When SERVER_COUNT > 1, Streamlit instances are forked into the background (&), then exec nginx replaces the shell as PID 1. This creates two problems:
- Zombie processes: nginx won't wait() on the orphaned Streamlit children. If any Streamlit process exits, it becomes a zombie.
- No restart on crash: If a Streamlit instance crashes, it stays down permanently with no supervision.
Consider using tini (or docker run --init) as PID 1 to handle zombie reaping, and a lightweight process supervisor (e.g., supervisord) or a trap-based loop to restart crashed instances.
Dockerfile_simple
Outdated
sleep 2\n\
echo "Starting nginx load balancer on port 8501..."\n\
exec nginx -g "daemon off;"\n\
Hardcoded sleep 2 is unreliable as a readiness gate.
Streamlit may take longer than 2 seconds to start (especially under load or on slower hardware). nginx will begin proxying immediately, resulting in 502 errors for early requests. Consider a readiness loop that polls the Streamlit ports before starting nginx:
🛠️ Proposed readiness check
- sleep 2
+ # Wait for all Streamlit instances to be ready
+ for i in $(seq 0 $((SERVER_COUNT - 1))); do
+ PORT=$((BASE_PORT + i))
+ echo "Waiting for Streamlit on port $PORT..."
+ until curl -s -o /dev/null http://127.0.0.1:$PORT/_stcore/health; do
+ sleep 1
+ done
+ done
Note: This requires curl to be installed (it's already in the apt-get line). The /_stcore/health endpoint is Streamlit's built-in health check.
The mamba environment activation shadows system binaries on the PATH. https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Line 203: The nginx upstream block currently uses ip_hash which pins by
$remote_addr and will break behind a cloud LB/NAT; in the upstream named
streamlit_backend replace ip_hash with a header-based consistent hash (e.g. use
hash $http_x_real_ip consistent) and ensure the proxy setup forwards/relies on
X-Real-IP (see proxy_set_header X-Real-IP) so backend affinity uses the real
client IP header rather than $remote_addr.
In `@Dockerfile_simple`:
- Line 115: The generated nginx upstream uses ip_hash which hashes $remote_addr
and will collapse clients behind a proxy/NAT; update the upstream block for
streamlit_backend to use a consistent hash on the forwarded client IP instead
(e.g., replace the ip_hash directive in the upstream streamlit_backend with a
hash directive that uses $http_x_forwarded_for and the consistent keyword such
as hash $http_x_forwarded_for consistent;) and ensure the existing
proxy_set_header X-Real-IP / X-Forwarded-For headers remain so the backend
selection uses the original client IP.
🧹 Nitpick comments (1)
Dockerfile (1)
187-219: Load balancer logic is copy-pasted from Dockerfile_simple — extract to a shared script.
Lines 187–219 are nearly identical to lines 99–131 in Dockerfile_simple. If a bug is fixed in one (e.g., the sleep 2 readiness issue, input validation, or the nginx config), it must be manually replicated in the other — a classic DRY violation. Since the previous review already recommended extracting entrypoint scripts to external files, consider also sharing the load-balancer setup logic between the two entrypoints (e.g., a setup-nginx.sh helper sourced by both entrypoint.sh and entrypoint_simple.sh). A minimal sketch follows.
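In sketch form, with the helper path and function name being illustrative:

# /app/setup-nginx.sh would define one function encapsulating the duplicated
# block (nginx config generation + Streamlit instance loop). Each entrypoint
# then shrinks to:
. /app/setup-nginx.sh
setup_nginx_load_balancer "${STREAMLIT_SERVER_COUNT:-1}" 8510   # count, base port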
Dockerfile
Outdated
\n\
# Write nginx config\n\
mkdir -p /etc/nginx\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n ip_hash;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
Same ip_hash concern as Dockerfile_simple — will break behind an upstream proxy.
ip_hash hashes $remote_addr. When this container runs behind a cloud LB or NAT, all requests share the same peer IP and get pinned to a single backend. See the Dockerfile_simple review comment for the recommended hash $http_x_real_ip consistent; alternative.
Dockerfile_simple
Outdated
\n\
# Write nginx config\n\
mkdir -p /etc/nginx\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n ip_hash;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
ip_hash will collapse all clients to one backend when behind an upstream proxy or NAT.
ip_hash hashes $remote_addr, which is the direct peer IP. If this container sits behind a cloud load balancer, CDN, or corporate NAT, all users share the same $remote_addr and will always hit the same Streamlit instance — defeating load balancing entirely.
Consider using the hash directive with the forwarded client IP instead:
Proposed nginx upstream change (inside the generated config)
upstream streamlit_backend {
- ip_hash;
+ hash $http_x_real_ip consistent;
server 127.0.0.1:8510;
...
}
Alternatively, hash $http_x_forwarded_for consistent; works if X-Real-IP is not set by the outer proxy. The consistent keyword enables consistent hashing so adding/removing backends doesn't remap all sessions.
ip_hash pins all users behind the same NAT/VPN/reverse-proxy to a single backend, defeating the load balancer. least_conn distributes new connections to the instance with fewest active connections, and once a WebSocket is established it stays on that backend for the session lifetime, so sticky sessions are not needed. https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Line 203: The nginx upstream for streamlit_backend currently uses
"least_conn", which breaks Streamlit's in-process sessions; update the upstream
definition in the Dockerfile to enable session affinity for "upstream
streamlit_backend" (e.g., replace "least_conn" with "ip_hash" or a sticky
session directive such as the upstream sticky module) so reconnecting users are
routed to the same backend instance; ensure the change is applied where
"upstream streamlit_backend" and "least_conn" appear in the Dockerfile.
In `@Dockerfile_simple`:
- Line 115: The upstream directive uses "least_conn" which breaks Streamlit
session affinity (upstream block named streamlit_backend); change it to a sticky
hashing strategy such as replacing least_conn with "ip_hash" (or use "hash
$cookie_streamlit_session consistent" / "hash $http_x_forwarded_for consistent"
for proxied setups) so repeated requests and reconnects are routed to the same
backend and users don't lose in-process Streamlit state; update the upstream
block for streamlit_backend accordingly and ensure any proxy headers used (e.g.,
X-Forwarded-For) match the chosen hash strategy.
🧹 Nitpick comments (1)
Dockerfile (1)
162-220: Near-complete duplication of entrypoint logic between Dockerfile and Dockerfile_simple.
The load-balancing block (nginx config generation, Streamlit instance loop, readiness wait, nginx exec) is copy-pasted across both Dockerfiles. Any bug fix or improvement must be applied in two places. Consider extracting the shared logic into a common script (e.g., start-load-balancer.sh) that both entrypoints source or call.
Dockerfile
Outdated
\n\
# Write nginx config\n\
mkdir -p /etc/nginx\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n least_conn;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
Same least_conn session-affinity issue as Dockerfile_simple.
Streamlit is stateful — least_conn will route reconnecting users to different backends, losing their in-process session state. See the detailed comment on Dockerfile_simple line 115.
Dockerfile_simple
Outdated
\n\
# Write nginx config\n\
mkdir -p /etc/nginx\n\
echo -e "worker_processes auto;\\npid /run/nginx.pid;\\n\\nevents {\\n worker_connections 1024;\\n}\\n\\nhttp {\\n upstream streamlit_backend {\\n least_conn;\\n${UPSTREAM_SERVERS} }\\n\\n map \\$http_upgrade \\$connection_upgrade {\\n default upgrade;\\n \\x27\\x27 close;\\n }\\n\\n server {\\n listen 8501;\\n\\n location / {\\n proxy_pass http://streamlit_backend;\\n proxy_http_version 1.1;\\n proxy_set_header Upgrade \\$http_upgrade;\\n proxy_set_header Connection \\$connection_upgrade;\\n proxy_set_header Host \\$host;\\n proxy_set_header X-Real-IP \\$remote_addr;\\n proxy_set_header X-Forwarded-For \\$proxy_add_x_forwarded_for;\\n proxy_set_header X-Forwarded-Proto \\$scheme;\\n proxy_read_timeout 86400;\\n proxy_send_timeout 86400;\\n proxy_buffering off;\\n }\\n }\\n}" > /etc/nginx/nginx.conf\n\
least_conn breaks Streamlit session affinity — users will lose state on reconnect.
Streamlit stores session state in-process. With least_conn, when a user refreshes the page or their WebSocket reconnects (network hiccup, browser tab sleep), the new connection is routed to whichever backend has the fewest connections — likely a different instance. This silently destroys the user's session state (uploaded files, workflow progress, widget values).
The commit message notes that "established WebSocket connections remain bound to their backend," which is true, but any new connection (page refresh, network interruption, new browser tab) will be load-balanced afresh, losing the session.
Neither least_conn nor ip_hash is ideal here. Consider ip_hash (acceptable for most deployments) or hash $cookie_streamlit_session consistent with a sticky cookie if you need to handle clients behind shared NATs:
Proposed approach using ip_hash (simplest fix)
Replace least_conn with ip_hash in the upstream block. For deployments behind a reverse proxy, use hash $http_x_forwarded_for consistent instead.
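In sketch form, the upstream fragment the entrypoint would emit (two instances shown; one server line is generated per instance from SERVER_COUNT):

upstream streamlit_backend {
    ip_hash;                  # pin each client IP to one backend
    server 127.0.0.1:8510;
    server 127.0.0.1:8511;
}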
nginx defaults to 1MB max body size, which blocks Streamlit file uploads with a 400 error. Set to 0 (unlimited) to let Streamlit enforce its own 200MB limit from config.toml. https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
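The relevant fragment of the generated config gains one directive inside the location block (a sketch; 0 disables nginx's own limit so only Streamlit's cap applies):

location / {
    proxy_pass http://streamlit_backend;
    client_max_body_size 0;    # nginx default is 1m, too small for uploads
    # ...existing proxy_set_header / timeout directives unchanged...
}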
least_conn routes each HTTP request independently, so the file upload POST (/_stcore/upload_file) can land on a different backend than the WebSocket session, causing a 400 error. Use hash $remote_addr$http_x_forwarded_for consistent instead:
- Provides session affinity so uploads hit the correct backend
- Behind a reverse proxy: the XFF header differentiates real client IPs
- Direct connections: falls back to remote_addr (like ip_hash)
- "consistent" minimizes redistribution when backends are added/removed
https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
Replace ip_hash/hash-on-IP with cookie-based session affinity using nginx's built-in map and $request_id:
- map $cookie_stroute $route_key: if the browser has a "stroute" cookie, reuse its value; otherwise fall back to $request_id (a unique random hex string nginx generates per request)
- hash $route_key consistent: route based on the cookie/random value
- add_header Set-Cookie on every response to persist the routing key
This ensures each browser gets its own sticky backend regardless of source IP, fixing both:
- File uploads (POST must hit the same backend as the WebSocket session)
- Load distribution when all users share the same IP (NAT/VPN/proxy)
No new packages required — uses only built-in nginx directives.
https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
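A sketch of the described config as the entrypoint might emit it (heredoc form; WebSocket proxy headers and timeouts omitted for brevity; directive names follow the commit message, and the server lines are generated from SERVER_COUNT):

cat > /etc/nginx/nginx.conf <<'EOF'
events { worker_connections 1024; }

http {
    # Reuse the browser's "stroute" cookie when present; otherwise fall back
    # to $request_id, a random per-request value, so first visits spread out.
    map $cookie_stroute $route_key {
        ""      $request_id;
        default $cookie_stroute;
    }

    upstream streamlit_backend {
        hash $route_key consistent;
        server 127.0.0.1:8510;
        server 127.0.0.1:8511;
    }

    server {
        listen 8501;
        location / {
            proxy_pass http://streamlit_backend;
            # Persist the routing key so later requests (including the upload
            # POST) hash to the same backend as the WebSocket session.
            add_header Set-Cookie "stroute=$route_key; Path=/; HttpOnly";
        }
    }
}
EOF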
Summary
This PR adds optional nginx load balancing to support running multiple Streamlit server instances in parallel, improving scalability and availability. The feature is disabled by default (single instance mode) and can be enabled by setting the STREAMLIT_SERVER_COUNT environment variable to a value greater than 1.
Key Changes
- Updated Dockerfile and Dockerfile_simple for load balancing capabilities
- Added external entrypoint scripts (entrypoint.sh and entrypoint_simple.sh) that replace inline shell script generation:
  - entrypoint.sh: Full-featured version with Redis, RQ workers, cron, and load balancing support
  - entrypoint_simple.sh: Lightweight version with cron and load balancing support
- nginx config is generated dynamically from STREAMLIT_SERVER_COUNT, using ip_hash for sticky sessions (critical for Streamlit's in-process state management)
- Added the STREAMLIT_SERVER_COUNT environment variable (default: 1) to control the number of Streamlit instances
- Updated docker-compose.yml to expose the new environment variable configuration
Implementation Details
- When STREAMLIT_SERVER_COUNT=1 (default), Streamlit runs directly on port 8501 with no load balancer overhead
- When STREAMLIT_SERVER_COUNT>1, multiple Streamlit instances start on internal ports (8510+) and nginx proxies requests on port 8501
- Sticky sessions via ip_hash ensure users maintain session state across requests
https://claude.ai/code/session_018VEL5xKZfe4LCcUa8iUHJ9
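Usage example (image name is illustrative):

# Default: single Streamlit instance on 8501, no nginx involved
docker run -p 8501:8501 openms-app:latest

# Load-balanced: four instances on internal ports 8510-8513 behind nginx on 8501
docker run -p 8501:8501 -e STREAMLIT_SERVER_COUNT=4 openms-app:latest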
Summary by CodeRabbit
New Features
Chores