feat: automate load and scalability testing in CI pipeline#671
feat: automate load and scalability testing in CI pipeline#671ARCoder181105 wants to merge 5 commits intoOneBusAway:mainfrom
Conversation
87576de to
c3fe471
Compare
aaronbrethorst
left a comment
There was a problem hiding this comment.
Aditya, the two-tier strategy here is well thought out — a fast smoke gate on PRs and a full stress test with pprof on main/nightly is exactly the right architecture for catching performance regressions without slowing down the PR cycle. The health-check polling loop instead of a fragile sleep is a nice detail, and the continue-on-error + deferred failure pattern for capturing results before failing the job is solid.
There's one bug that will prevent the PR comment feature from working, and a data mismatch that means the smoke test won't exercise real query paths in CI. Both should be fixed before merging.
Critical
-
handleSummary()insmoke.jsprevents--summary-exportfrom creating the JSON file (loadtest/k6/smoke.js:20-22,loadtest-smoke.yml:88)When k6 sees a
handleSummaryexport, it takes over all summary output — the--summary-exportCLI flag is ignored (k6 docs). Since yourhandleSummaryonly writes tostdout, the fileloadtest/k6/smoke-summary.jsonis never created. The PR comment step will always fall into thecatchblock and post "Could not parse smoke test results."Fix: either remove
handleSummaryentirely (let--summary-exportdo its job), or add the file output inside it:import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js'; export function handleSummary(data) { return { stdout: textSummary(data, { indent: ' ', enableColors: false }), 'loadtest/k6/smoke-summary.json': JSON.stringify(data), }; }
Important
-
Smoke test uses IDs and coordinates that don't exist in the RABA test data (
loadtest/k6/smoke.js)The CI config loads
testdata/raba.zip(Redding, CA — agency ID25, stops like1001–1369, lat ~40.57). But the smoke test hardcodes:routes-for-agency/40.json— agency40doesn't exist in RABA (should be25)arrivals-and-departures-for-stop/1_75403.json— stop75403doesn't exist in RABAstops-for-location.json?lat=47.6062&lon=-122.3321— Seattle coordinates, 700km from Redding
Since 4xx responses are excluded from failures, the test "passes" but only exercises error/empty-result paths, not actual data serving. The healthz check is the only endpoint actually returning real data.
Suggested fix for the smoke test:
// RABA agency const AGENCY_ID = '25'; // Known RABA stop const STOP_ID = '25_1001'; // Redding, CA center const LAT = '40.5865'; const LON = '-122.3917';
The pre-existing k6 data CSV files (
loadtest/k6/data/*.csv) also contain Seattle-area data, not RABA data. This means the stress test has the same issue. Consider adding a note toloadtest/README.mdabout regenerating these for the target dataset, or adding a script that generates them from the configured GTFS feed.
Fit and Finish
-
Missing newline at end of both YAML files (
.github/workflows/loadtest-smoke.yml:159,.github/workflows/loadtest-stress.yml:131)Both files are missing a trailing newline. Most editors and linters flag this. Add a blank line at the end of each.
-
Redundant check in
smoke.js(loadtest/k6/smoke.js:44-45)'arrivals-and-departures: no 5xx': (r) => r.status < 500, 'arrivals-and-departures: no panic': (r) => r.status !== 500,
These are logically identical —
status < 500already excludes500. Remove the second one or make it check for something different (e.g., response body not containing a stack trace).
c3fe471 to
d62a033
Compare
Integrate existing k6 load testing scripts into the CI pipeline using a two-tiered strategy to prevent performance regressions and latency spikes from silently merging into the main branch. Changes include: - Tier 1 (PR Smoke Test): Runs 5 VUs x 30s against critical endpoints on every PR. Posts a latency/error summary comment directly on the PR. - Tier 2 (Nightly Stress Test): Runs the full ramp-up scenario on merges to main and nightly crons. Captures CPU profile concurrently during peak load and uploads pprof artifacts for historical baselining. - k6 Scripts: Added smoke.js, updated scenarios.js to exclude 4xx responses from failure metrics, and fixed fallback stop IDs. - CI/CD: Server readiness now uses a health-check polling loop against /healthz instead of a static sleep. Threshold SLOs enforced: - Smoke (PR): p(95) latency < 300ms, Error rate (5xx) < 1% - Stress (nightly): p(99) latency < 200ms, Error rate < 1% Closes OneBusAway#672
d62a033 to
63fc4f0
Compare
This commit resolves the final issues with the load testing PR: - Fix k6 smoke test to correctly export summary JSON for PR comments - Update default coordinates and IDs in load tests to match the RABA test dataset used in CI (prevents false-positive 404s) - Switch k6 stress scenarios to use an exempt API key (org.onebusaway.iphone) to avoid rate limiting false failures during high-concurrency testing - Add documentation to loadtest README regarding test data generation - Fix minor linting issues (trailing newlines, redundant status checks)
|
Just pushed up the final fixes for the load testing harness! Everything is testing completely green locally. Updates included:
Local Verification:
|
|
The CI is failing due to the openAPI.yaml issue I think it might get addressed |
aaronbrethorst
left a comment
There was a problem hiding this comment.
Aditya, nice work addressing the previous feedback — the handleSummary fix is exactly right, the RABA fallback IDs in smoke.js are solid, and removing the redundant check was clean. There's one new issue introduced by the rate-limiter-bypass change that needs to be fixed before this can merge.
Critical
-
scenarios.jsuses an API key that won't authenticate in CI (loadtest/k6/scenarios.js:37,.github/workflows/loadtest-stress.yml:41)The stress test sets
API_KEY = 'org.onebusaway.iphone'to bypass rate limiting. But the CI config only has"api-keys": ["test"]. Theorg.onebusaway.iphonekey is the default exempt key for rate limiting (ExemptApiKeys), but it is not added to theApiKeyslist — these are separate config fields (seeinternal/appconf/json_config.go:38-40).Every request hits
RequestHasInvalidAPIKey()ininternal/app/api_keys.go:13, gets rejected with HTTP 401, and sincehttp.setResponseCallback(http.expectedStatuses({ min: 200, max: 499 }))treats 401 as acceptable, k6 reports 0% failure rate. The stress test passes while exercising zero application code.Fix: either change the API key back to
'test'(matchingsmoke.js), or addorg.onebusaway.iphoneto the CI config'sapi-keysarray. The first option is simpler and consistent withsmoke.js. To address the rate limiting concern at 500 VUs, the CI config already sets"rate-limit": 1000— you could increase this further if needed.
Important
-
CSV data files still contain Seattle/Sound Transit IDs (
loadtest/k6/data/*.csv)The CSV files loaded by
scenarios.js(stop IDs, route IDs, trip IDs, locations) contain Sound Transit data (Seattle-area coordinates, agency40IDs). In CI, wheretestdata/raba.zipis loaded, none of these IDs exist — every request from CSV data returns 404, which is excluded from failure metrics.The fallback IDs are correct now (
25_1001, agency25, Redding coordinates), butrandomItem()will return a CSV value (not the fallback) as long as the CSV files are non-empty. So the fallbacks are effectively dead code in CI.The README note you added acknowledges this, which is good. But consider either: (a) generating RABA-specific CSV files, or (b) adding a CI-specific env var to skip CSV loading and only use the hardcoded RABA fallbacks.
Fit and Finish
-
Missing trailing newline in
loadtest-smoke.yml(.github/workflows/loadtest-smoke.yml:159)Still missing from the prior review. The stress YAML was fixed but the smoke YAML wasn't. Add a newline after
exit 1. -
Dangling code fence in README (
loadtest/README.md:35)The "Note on Test Data" section ends with an orphan
```that isn't opening or closing anything. Remove it. -
--summary-exportflag is now redundant in smoke workflow (.github/workflows/loadtest-smoke.yml:87)Since
handleSummary()insmoke.jsnow writessmoke-summary.jsondirectly, the--summary-exportCLI flag is ignored by k6 (k6 disables it whenhandleSummaryis defined). Not a bug — the file gets created correctly — but the flag is misleading dead code. Consider removing it for clarity.
Strengths
- The two-tier architecture (fast PR gate + full nightly stress test with pprof) is well-designed
- Health-check polling loop instead of
sleepis a good pattern - The
continue-on-error+ deferred failure pattern correctly captures artifacts before failing smoke.jsis in great shape — correct API key, correct RABA IDs, correct coordinates- The
handleSummaryfix is clean and handles both stdout and file output correctly
Recommended Action
Request changes. The API key mismatch in scenarios.js means the nightly stress test provides zero signal. Fix that, and ideally address the CSV data issue so the stress test exercises real query paths.
Summary
Integrates k6 load testing into the CI pipeline using a two-tiered strategy — a fast smoke test on every PR, and a full stress test on merges to
main.Problem
The existing
loadtest/k6/scripts require manual execution. This means performance regressions, panics under load, and latency spikes can silently merge intomainwithout any automated gate.Changes
New workflows:
.github/workflows/loadtest-smoke.yml— triggers on every PR, runs 5 VUs x 30s, posts a latency/error summary comment directly on the PR.github/workflows/loadtest-stress.yml— triggers on merge tomainand nightly cron, runs the fullscenarios.jsramp-up and capturespprofprofiles (heap, goroutine, mutex) after load and CPU profile during peak load for accurate profilingUpdated k6 scripts:
loadtest/k6/smoke.js— new lightweight script targeting critical endpoints (/healthz, arrivals, stops, routes)loadtest/k6/scenarios.js— excludes 4xx responses from failure metric, fixes fallback stop IDsloadtest/k6/thresholds.js— adds separatesmokeThresholdsexport; existingthresholdsfor stress test unchangedThreshold SLOs
Local Test Results
Tested against both
testdata/raba.zipand live GTFS feeds:testdata/raba.zipImplementation Notes
make build— stays in sync with Makefile flags (CGO_ENABLED=1,-tags sqlite_fts5) automatically/healthzinstead of a fragilesleephttp_req_failed— only 5xx counts as a failurepprofartifacts on every run to allow for historical baseliningtestdata/raba.zipto avoid external network dependencies on PRsCloses #672