Skip to content

Conversation

@rishikej-stripe
Copy link
Contributor

Summary

This PR does the following:

  1. Adds a CONNECT tunnel limiting mechanism to prevent smokescreen resource exhaustion.
  2. The tunnel limiter increments the acquired slot on every dialContext call and releases the slot when the connection is closed. This tunnel slot release is invoked upon InstrumentedConn closure, which happens when the tunnel is closed.

NOTE: This allows as many CONNECT requests to call smokescreen, perform ACL checks etc, but only the tunnel limit number of connections can actually call DialContext() and establish tunnel. If we want to limit pre-Dial access also, we can configure the rate-limiter and concurrency-limiter for that purpose.

Motivation

The max_concurrent_requests configuration in Smokescreen is ineffective for CONNECT tunnels. The rate limiter acquires a slot when ServeHTTP() starts and releases it via defer when the function returns. For CONNECT requests, goproxy hijacks the connection and ServeHTTP() returns immediately while the tunnel continues in background goroutines. This allows unlimited CONNECT tunnels regardless of the configured limit, enabling resource exhaustion (OOM crash in ~10 seconds with 64MB memory limit).

The current max_concurrent_requests is basically limiting the number of connections being served by smokescreen, but does not monitor if they still last. We need a dedicated tunnel limiter for that.

Test plan

We create 20 non-CONNECT requests and then 20 CONNECT requests to a slow backend via smokescreen proxy. The first 20 requests get handled by the existing concurrency limiter in smokescreen. The tunnel limiting restricts the number of CONNECT requests.

Before

We see the number of connections exceeding the max_concurrent_requests limit and blocking smokescreen resources for 5 seconds.

Screen.Recording.2026-02-12.at.2.30.17.PM.mov

After
We see the tunnel limit is applied and allows only 3 requests. 17 other CONNECT requests are rejected

Screen.Recording.2026-02-12.at.2.32.03.PM.mov

@coveralls
Copy link

Pull Request Test Coverage Report for Build 21940689647

Details

  • 58 of 87 (66.67%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 56.293%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/smokescreen/conntrack/instrumented_conn.go 2 4 50.0%
pkg/smokescreen/config_loader.go 0 3 0.0%
pkg/smokescreen/config.go 0 4 0.0%
cmd/smokescreen.go 0 9 0.0%
pkg/smokescreen/smokescreen.go 21 32 65.63%
Totals Coverage Status
Change from base Build 20851823156: 0.3%
Covered Lines: 1807
Relevant Lines: 3210

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants