CI Orchestration - GitHub Actions Runner Autoscaler

Scale GitHub Actions self-hosted runners on Railway from idle to N replicas based on demand. Near-zero cost when no jobs are running.

The problem

Running self-hosted runners on Railway means paying for containers 24/7, even when no CI jobs are running. With 8 replicas, that's 8x the cost sitting idle most of the day.

The solution

This tiny service (< 5 MB RAM) sits between GitHub and Railway. It listens for GitHub webhooks and scales your runner service up and down automatically:

                     workflow_job webhook
  GitHub  ──────────────────────────────────>  Scaler (always on, tiny)
                                                  │
                                                  │ Railway API
                                                  v
                                               Runner service
                                               1 replica (idle)
                                               ──────────────────
                                               N replicas (CI running)

No jobs queued --> 1 idle replica (Railway minimum), containers dead, near-zero cost
Jobs queued --> scales up to match demand, restarts containers
Jobs done --> scales back to 1 after a grace period

Quick start

See INSTALL.md for the full step-by-step setup guide covering:

GitHub side: PAT creation, webhook configuration
Railway side: runner service, scaler service, environment variables
Verification and troubleshooting

Configuration

All configuration is done via environment variables on the scaler service.

Variable	Required	Default	Description
`RAILWAY_API_TOKEN`	Yes	-	Railway project token to control runner replicas
`TARGET_SERVICE_ID`	Yes	-	Railway service ID of the runner service
`TARGET_ENVIRONMENT_ID`	No	-	Railway environment ID (overrides auto-injected `RAILWAY_ENVIRONMENT_ID`)
`WEBHOOK_SECRET`	No	-	GitHub webhook secret for signature verification
`MAX_REPLICAS`	No	`8`	Maximum number of runner replicas
`SCALE_DOWN_DELAY_MS`	No	`30000`	Wait time (ms) before scaling down after last job
`RUNNER_LABEL`	No	`railway`	Label to filter which jobs trigger scaling
`PORT`	No	`3000`	HTTP port for the scaler service
`GITHUB_TOKEN`	No	-	GitHub PAT for startup sync and periodic reconciliation
`GITHUB_REPO`	No	-	GitHub repo (`owner/repo`) — required with `GITHUB_TOKEN`
`SYNC_INTERVAL_MS`	No	`900000`	Reconciliation interval in ms (default 15 min)

Note on environment ID: Railway auto-injects RAILWAY_ENVIRONMENT_ID into every service with the service's own environment ID. If the scaler and runner are in the same environment (typical), you don't need to set TARGET_ENVIRONMENT_ID -- the auto-injected value works. Set TARGET_ENVIRONMENT_ID only if the runner is in a different environment.
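
As a rough illustration of how the table and the environment ID note fit together, the sketch below reads the variables from an environment object and applies the documented defaults. The function name loadConfig and the shape of the returned object are assumptions made for this example; the actual code in index.js may be organized differently.

```javascript
// Hypothetical helper (not taken from index.js): builds a config object
// from environment variables using the defaults in the table above.
function loadConfig(env = process.env) {
  const required = ['RAILWAY_API_TOKEN', 'TARGET_SERVICE_ID'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`);
  }

  // Parse an integer variable, falling back to the documented default.
  const toInt = (name, fallback) => {
    if (env[name] === undefined || env[name] === '') return fallback;
    const value = Number.parseInt(env[name], 10);
    if (Number.isNaN(value)) throw new Error(`${name} must be an integer`);
    return value;
  };

  return {
    railwayApiToken: env.RAILWAY_API_TOKEN,
    targetServiceId: env.TARGET_SERVICE_ID,
    // TARGET_ENVIRONMENT_ID wins; otherwise use the Railway-injected value.
    environmentId: env.TARGET_ENVIRONMENT_ID || env.RAILWAY_ENVIRONMENT_ID,
    webhookSecret: env.WEBHOOK_SECRET || null,
    maxReplicas: toInt('MAX_REPLICAS', 8),
    scaleDownDelayMs: toInt('SCALE_DOWN_DELAY_MS', 30000),
    runnerLabel: env.RUNNER_LABEL || 'railway',
    port: toInt('PORT', 3000),
    // Sync needs both GITHUB_TOKEN and GITHUB_REPO.
    syncEnabled: Boolean(env.GITHUB_TOKEN && env.GITHUB_REPO),
    syncIntervalMs: toInt('SYNC_INTERVAL_MS', 900000),
  };
}
```

Passing the environment in as an argument keeps the helper easy to exercise without touching the real process environment.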

API endpoints

Method	Path	Description
`POST`	`/webhook`	GitHub webhook receiver
`GET`	`/health`	Current state: job counts, replica count, config
`GET`	`/logs`	Last 50 scaling events with timestamps
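
When WEBHOOK_SECRET is set, the /webhook endpoint can check each delivery against the signature GitHub sends in the X-Hub-Signature-256 header (an HMAC-SHA256 of the raw request body, prefixed with sha256=). The following is a minimal sketch of that check using Node's built-in crypto module; the function name verifySignature is hypothetical, and whether the scaler verifies signatures in exactly this way is an assumption.

```javascript
const crypto = require('node:crypto');

// Returns true when `signatureHeader` (the X-Hub-Signature-256 value)
// matches the HMAC-SHA256 of the raw request body under `secret`.
function verifySignature(secret, rawBody, signatureHeader) {
  if (typeof signatureHeader !== 'string') return false;
  const expected =
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The check has to run on the raw body bytes as received, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.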

Health check example

curl https://<your-scaler>.up.railway.app/health

{
  "status": "ok",
  "uptime": 3600,
  "queuedJobs": 0,
  "activeJobs": 0,
  "currentReplicas": 1,
  "maxReplicas": 8,
  "scaleDownPending": false,
  "syncEnabled": true
}

How it works under the hood

Scaling lifecycle

You configure a GitHub webhook that sends workflow_job events to this service
When a job targeting your self-hosted runners is queued, the scaler adds one replica (incremental, not a jump to the total)
The scaler then restarts the runner deployment so containers come alive (ephemeral runners exit after each job)
Each runner starts with EPHEMERAL=true -- it registers with GitHub, picks up one job, executes it, then exits cleanly
When a job is completed, the scaler decrements the count and schedules a gradual scale-down
Scale-down removes one replica every SCALE_DOWN_DELAY_MS until replicas match the remaining job count (minimum 1)
New jobs arriving during scale-down cancel the timer, keeping spare replicas warm
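
The replica arithmetic in the steps above can be sketched as two pure functions. The names nextReplicasOnQueued and nextReplicasOnScaleDownTick are hypothetical and only illustrate the stepping rules; the timers, the Railway API calls, and the deployment restart are left out.

```javascript
const MIN_REPLICAS = 1; // Railway minimum, as described in the overview

// A queued job adds one replica, capped at maxReplicas.
function nextReplicasOnQueued(currentReplicas, maxReplicas) {
  return Math.min(currentReplicas + 1, maxReplicas);
}

// One scale-down tick removes one replica, but never goes below the
// number of remaining jobs or below the minimum of 1.
function nextReplicasOnScaleDownTick(currentReplicas, remainingJobs) {
  const floor = Math.max(remainingJobs, MIN_REPLICAS);
  return currentReplicas > floor ? currentReplicas - 1 : currentReplicas;
}

// Example: three jobs are queued while one idle replica is running,
// then all jobs finish and the scale-down timer fires repeatedly.
let replicas = 1;
for (let i = 0; i < 3; i += 1) replicas = nextReplicasOnQueued(replicas, 8);
const afterScaleUp = replicas; // 4
const steps = [];
for (let i = 0; i < 4; i += 1) {
  replicas = nextReplicasOnScaleDownTick(replicas, 0);
  steps.push(replicas);
}
// steps is [3, 2, 1, 1]: one replica per tick, then it holds at the minimum.
```

Stepping down one replica at a time, rather than dropping straight to 1, is what leaves spare replicas warm if a new job arrives partway through.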

Concurrency handling

Multiple webhooks arriving simultaneously are handled safely:

A scaling mutex prevents concurrent Railway API calls -- extra requests are deferred and coalesced
During scale-down, new jobs correctly detect that replicas are being reduced and trigger a scale-up
Deployment restarts only fire after the final replica count is committed, preventing partial-scale restarts
State is in-memory only -- with GITHUB_TOKEN/GITHUB_REPO configured, startup reconciliation restores accurate state; without them, the next webhook self-corrects
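
One way to picture the "deferred and coalesced" behaviour is a wrapper that allows a single scaling call in flight and remembers only the most recent target requested in the meantime. This is a sketch under that assumption, not the code from index.js; createCoalescingScaler is a hypothetical name and applyReplicas stands in for the real Railway API call.

```javascript
// Hypothetical wrapper: `applyReplicas` is an async function standing in
// for the Railway API call that sets the replica count.
function createCoalescingScaler(applyReplicas) {
  let inFlight = false;
  let pendingTarget = null;

  return async function requestScale(target) {
    if (inFlight) {
      // A call is already running: keep only the latest requested target.
      pendingTarget = target;
      return;
    }
    inFlight = true;
    try {
      let next = target;
      while (next !== null) {
        pendingTarget = null;
        await applyReplicas(next);
        next = pendingTarget; // pick up anything deferred in the meantime
      }
    } finally {
      inFlight = false;
    }
  };
}
```

If requests for 2, 3, and 5 replicas arrive back to back, only 2 and then 5 reach the API in this sketch; the intermediate target is dropped because a newer one has replaced it by the time the first call finishes.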

State reconciliation (optional)

Set GITHUB_TOKEN and GITHUB_REPO to enable automatic state sync:

Startup sync: On boot, the scaler queries Railway for the current replica count and GitHub for queued/active jobs. This prevents premature scale-down if the scaler restarts mid-build.
Periodic reconciliation: Every SYNC_INTERVAL_MS (default 15 min), the scaler re-syncs with both APIs and adjusts replicas if they've drifted from reality (e.g., due to missed webhooks).
Graceful degradation: If the GitHub token is invalid or the API is down, sync failures are logged as warnings and the scaler continues with its current state.

The same GitHub PAT used for the runner's ACCESS_TOKEN works here -- it just needs read access to workflow runs.
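
The reconciliation pass described above can be sketched roughly as follows. The callbacks fetchJobCounts and fetchReplicas are hypothetical placeholders for the GitHub and Railway API calls, and the target formula (queued plus active jobs, clamped between 1 and MAX_REPLICAS) is an assumption based on the scaling rules in this README rather than a copy of the real logic.

```javascript
// Desired replica count for the given job counts, clamped to [1, maxReplicas].
function desiredReplicas(queuedJobs, activeJobs, maxReplicas) {
  return Math.min(Math.max(queuedJobs + activeJobs, 1), maxReplicas);
}

// Re-sync state from both APIs. On any failure, log a warning and keep
// the current state (graceful degradation).
async function reconcile(state, { fetchJobCounts, fetchReplicas, maxReplicas, log = console }) {
  try {
    const [jobs, replicas] = await Promise.all([fetchJobCounts(), fetchReplicas()]);
    return {
      queuedJobs: jobs.queued,
      activeJobs: jobs.active,
      currentReplicas: replicas,
      targetReplicas: desiredReplicas(jobs.queued, jobs.active, maxReplicas),
    };
  } catch (err) {
    log.warn(`sync failed, keeping current state: ${err.message}`);
    return state;
  }
}
```

A caller would compare targetReplicas with currentReplicas and scale only when the two have drifted apart.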

Railway healthcheck

This service ships with a railway.json that configures:

Healthcheck path: /health -- Railway pings this on every deploy to confirm the service is up before routing traffic
Healthcheck timeout: 30 seconds (the app starts in under 2 seconds, so this is generous)
Restart policy: ALWAYS with 3 max retries -- if the scaler crashes, Railway restarts it automatically

These settings are picked up automatically when you deploy from this directory. No manual configuration needed in the Railway dashboard.

Workflow configuration

Your GitHub Actions workflows must target the self-hosted label together with your runner label:

jobs:
  build:
    runs-on: [self-hosted, railway]
    steps:
      - uses: actions/checkout@v4
      - run: echo "Running on an autoscaled Railway runner"

The second label (railway) must match the LABELS env var on the runner service and the RUNNER_LABEL env var on the scaler.
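
To show why the labels have to match, here is a minimal sketch of how a workflow_job payload could be filtered by label. GitHub's payload carries the event type in action and the requested labels in workflow_job.labels; the function name classifyWorkflowJob and its return values are hypothetical, chosen only for this example.

```javascript
// Decide what a workflow_job webhook payload means for the scaler.
// Returns 'scale-up', 'job-started', 'job-done', or 'ignore'.
function classifyWorkflowJob(payload, runnerLabel = 'railway') {
  const job = payload && payload.workflow_job;
  if (!job || !Array.isArray(job.labels)) return 'ignore';
  // Jobs that do not request the runner label are not ours to handle.
  if (!job.labels.includes(runnerLabel)) return 'ignore';

  switch (payload.action) {
    case 'queued':
      return 'scale-up';
    case 'in_progress':
      return 'job-started';
    case 'completed':
      return 'job-done';
    default:
      return 'ignore';
  }
}
```

With the workflow above, a queued build job arrives with labels ["self-hosted", "railway"] and is classified as scale-up, while a job that runs on ubuntu-latest is ignored.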
