TommyDong1998/CodeReviewer-Worker

CodeReviewer Worker

This is the worker environment for the CodeReviewer application. It handles long-running security scan tasks via an SQS queue.

Architecture

The worker environment:

  • Runs an HTTP server that receives POST requests from Elastic Beanstalk's SQS daemon (sqsd)
  • Relies on sqsd to poll the SQS queue and forward each message as an HTTP POST request
  • Runs security scanning tools (Semgrep, OpenGrep, Gitleaks, Checkov, Trivy)
  • Updates scan results in the PostgreSQL database
  • Runs on AWS Elastic Beanstalk Worker tier with t4g.medium instances

How Elastic Beanstalk Worker Tier Works

SQS Queue → sqsd (Beanstalk daemon) → HTTP POST → Worker App
                                                        ↓
                                                   Process Job
                                                        ↓
                                         Return 200 OK / 500 Error
                                                        ↓
                                    sqsd deletes message (if 200)
                                    or retries (if 500)
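
The contract in the diagram can be sketched as a small handler that turns one sqsd delivery into a status code. This is a minimal illustration, not the worker's actual code: handleDelivery, createWorkerServer, and the ProcessJob type are hypothetical names, and treating malformed JSON as a 500 is an assumption.

```typescript
import * as http from "node:http";

// Hypothetical job processor: resolves on success, rejects on failure.
type ProcessJob = (job: unknown) => Promise<void>;

// Map one sqsd delivery to an HTTP status code.
// 200 lets sqsd delete the message; 500 makes sqsd retry it later.
async function handleDelivery(rawBody: string, processJob: ProcessJob): Promise<number> {
  try {
    const job: unknown = JSON.parse(rawBody); // assumption: malformed JSON is also reported as 500
    await processJob(job);
    return 200;
  } catch (err) {
    console.error("Job failed:", err);
    return 500;
  }
}

// Wire the handler into a plain Node HTTP server (created here, not started).
function createWorkerServer(processJob: ProcessJob): http.Server {
  return http.createServer((req, res) => {
    const chunks: Buffer[] = [];
    req.on("data", (chunk: Buffer) => chunks.push(chunk));
    req.on("end", async () => {
      const body = Buffer.concat(chunks).toString("utf8");
      const status = await handleDelivery(body, processJob);
      res.writeHead(status).end();
    });
  });
}
```

Keeping the status decision in a function separate from the server makes the delete-or-retry behavior easy to unit test without opening a socket.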

Architecture Notes

Module System: This worker uses ES Modules (ESM) ("type": "module" in package.json). This is required because web-tree-sitter is an ESM-only package. Do not convert back to CommonJS as it will break the tree-sitter parser.

Setup

  1. Install dependencies:
npm install
  2. Configure environment variables (.env):
POSTGRES_URL=postgresql://user:password@host:port/database
PORT=8080

# GitHub App Configuration (required for private repo access)
GITHUB_APP_ID=your_github_app_id
GITHUB_APP_PRIVATE_KEY_PATH=./config/github-app-private-key.pem

Note: SQS queue configuration is handled by Elastic Beanstalk's sqsd daemon, not by the application directly.
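
One way to read these variables at startup is to validate them once and fail fast. The sketch below is illustrative only: loadConfig and WorkerConfig are hypothetical names, and both the 8080 default for PORT and treating POSTGRES_URL as mandatory are assumptions.

```typescript
// Hypothetical config shape; fields mirror the variables listed above.
interface WorkerConfig {
  postgresUrl: string;
  port: number;
  githubAppId: string | undefined;
  githubAppPrivateKeyPath: string | undefined;
}

function loadConfig(env: Record<string, string | undefined>): WorkerConfig {
  const postgresUrl = env.POSTGRES_URL;
  if (!postgresUrl) {
    throw new Error("POSTGRES_URL is required"); // assumption: no database, no worker
  }

  // Assumption: fall back to 8080 when PORT is unset.
  const port = Number(env.PORT ?? "8080");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    postgresUrl,
    port,
    // Empty strings are treated the same as unset.
    githubAppId: env.GITHUB_APP_ID || undefined,
    githubAppPrivateKeyPath: env.GITHUB_APP_PRIVATE_KEY_PATH || undefined,
  };
}
```

In a Node process this would typically be called as loadConfig(process.env).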

GitHub App Authentication

The worker uses GitHub App authentication to access private repositories. This is the same GitHub App used by the main CodeReviewer application.

How it works:

  1. The main app stores GitHub App installations in the database, along with their access tokens
  2. When a security scan job is queued, the job includes the repo's installation ID
  3. The worker fetches or refreshes the installation token from the database
  4. The token is used to authenticate with GitHub when downloading the repository

Configuration:

  • GITHUB_APP_ID: Your GitHub App ID (found in GitHub App settings)
  • GITHUB_APP_PRIVATE_KEY_PATH: Path to the PEM file for your GitHub App private key (the example above uses a path relative to the working directory). By default the repo ships with config/github-app-private-key.pem, so you can mount or replace it as needed.
  • (Legacy) GITHUB_APP_PRIVATE_KEY: Only for backwards compatibility. Prefer storing the PEM on disk and pointing GITHUB_APP_PRIVATE_KEY_PATH at it.

The worker will automatically:

  • Use cached installation tokens if still valid (1-hour expiry)
  • Refresh tokens when expired or about to expire
  • Fall back to public repo access if no authentication is available
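
The caching behavior above can be sketched as follows. This is an illustration rather than the worker's implementation: CachedToken, isTokenUsable, and getInstallationToken are hypothetical names, the 5-minute refresh margin is an assumed value for "about to expire", and the refresh callback stands in for whatever call actually obtains a new token.

```typescript
// Hypothetical cached-token record, as it might be read from the database.
interface CachedToken {
  token: string;
  expiresAt: Date;
}

// Assumption: refresh once fewer than 5 minutes of validity remain.
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

function isTokenUsable(cached: CachedToken, now: Date): boolean {
  return cached.expiresAt.getTime() - now.getTime() > REFRESH_MARGIN_MS;
}

// Returns a token string, or null to signal "use public repo access".
async function getInstallationToken(
  cached: CachedToken | null,
  refresh: (() => Promise<CachedToken>) | null, // null when no GitHub App auth is configured
  now: Date = new Date(),
): Promise<string | null> {
  if (cached && isTokenUsable(cached, now)) {
    return cached.token; // still valid: reuse the cached token
  }
  if (refresh) {
    const fresh = await refresh(); // expired or about to expire: get a new one
    return fresh.token;
  }
  return null; // no authentication available
}
```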
  3. Build:
npm run build
  4. Run locally (for testing):
npm run dev

Deployment

The worker is deployed to AWS Elastic Beanstalk Worker tier via the infrastructure in CodeReviewerInfra/elastic_beanstalk_worker.tf.

Manual Deployment

  1. Build the application:
npm run build
  2. Create a deployment package:
zip -r worker-deploy.zip package.json dist/ src/ .ebextensions/ Dockerfile
  3. Deploy to Elastic Beanstalk:
eb deploy codereview-production-worker

Security Tools

The worker environment includes the following security scanning tools:

  • Semgrep: SAST using Trail of Bits rules
  • OpenGrep: Fast SAST fork of Semgrep
  • Gitleaks: Secret and credential detection
  • Checkov: Infrastructure as Code scanning
  • Trivy: Dependency vulnerability scanning

All tools are installed via .ebextensions/01_security_tools.config.

Job Processing

The worker receives HTTP POST requests from sqsd with the job in the request body:

{
  "scanId": "scan_123456_abc",
  "repoId": 1,
  "repoUrl": "https://github.com/owner/repo.git",
  "branch": "main",
  "installationId": "12345678"
}

Legacy format (still supported):

{
  "scanId": "scan_123456_abc",
  "repoId": 1,
  "repoUrl": "https://github.com/owner/repo.git",
  "branch": "main",
  "token": "ghp_..."
}
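
Both payload shapes can be modeled as one tagged type. The sketch below is a possible parser, not the worker's actual one: ScanJob and parseScanJob are hypothetical names, the auth tag is an invented field, and treating scanId, repoId, repoUrl, and branch as required is an assumption based on the two examples.

```typescript
interface ScanJobBase {
  scanId: string;
  repoId: number;
  repoUrl: string;
  branch: string;
}

// The "auth" tag is introduced here for illustration; it is not part of the payload.
type ScanJob =
  | (ScanJobBase & { auth: "installation"; installationId: string })
  | (ScanJobBase & { auth: "token"; token: string }) // legacy format
  | (ScanJobBase & { auth: "none" }); // neither field present: public repo access

function parseScanJob(body: string): ScanJob {
  const raw = JSON.parse(body) as Record<string, unknown>;
  const { scanId, repoId, repoUrl, branch } = raw;

  if (
    typeof scanId !== "string" ||
    typeof repoId !== "number" ||
    typeof repoUrl !== "string" ||
    typeof branch !== "string"
  ) {
    throw new Error("Invalid scan job payload");
  }
  const base: ScanJobBase = { scanId, repoId, repoUrl, branch };

  // Prefer the current format when both fields happen to be present.
  if (typeof raw.installationId === "string") {
    return { ...base, auth: "installation" as const, installationId: raw.installationId };
  }
  if (typeof raw.token === "string") {
    return { ...base, auth: "token" as const, token: raw.token };
  }
  return { ...base, auth: "none" as const };
}
```

Downstream code can then switch on job.auth instead of probing for optional fields.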

The worker:

  1. Receives HTTP POST request from sqsd (on port 8080, path /)
  2. Parses the job from the request body
  3. Downloads the repository
  4. Runs all security scanners in parallel
  5. Updates the database with results
  6. Returns HTTP 200 (success) or 500 (failure)
  7. sqsd automatically deletes the message from SQS if 200, or retries if 500
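
Step 4 can be sketched with Promise.all over per-scanner wrappers. Scanner, Finding, and runScanners are hypothetical names, not the interfaces in orchestrator.ts, and recording a failed scanner instead of aborting the whole job is an assumption about the desired behavior.

```typescript
// Hypothetical shapes for illustration only.
type Finding = { tool: string; message: string };
type Scanner = { name: string; run: (repoDir: string) => Promise<Finding[]> };

interface ScanSummary {
  findings: Finding[];
  failedTools: string[];
}

async function runScanners(scanners: Scanner[], repoDir: string): Promise<ScanSummary> {
  // Start every scanner at once; each wrapper catches its own failure
  // so one crashing tool does not discard the others' results.
  const results = await Promise.all(
    scanners.map(async (scanner) => {
      try {
        return { name: scanner.name, ok: true, findings: await scanner.run(repoDir) };
      } catch (err) {
        console.error(`${scanner.name} failed:`, err);
        return { name: scanner.name, ok: false, findings: [] as Finding[] };
      }
    }),
  );

  const summary: ScanSummary = { findings: [], failedTools: [] };
  for (const result of results) {
    if (result.ok) summary.findings.push(...result.findings);
    else summary.failedTools.push(result.name);
  }
  return summary;
}
```

Whether a non-empty failedTools list should translate into a 500 response (and therefore an sqsd retry) is a policy decision this sketch leaves open.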

Monitoring

  • CloudWatch Logs: /aws/elasticbeanstalk/codereview-production-worker/
  • CloudWatch Alarms:
    • codereview-production-sqs-dlq-messages: Alerts when messages appear in the dead letter queue
    • codereview-production-sqs-message-age: Alerts when messages are not being processed

Troubleshooting

Worker not processing messages

  1. Check CloudWatch Logs for errors
  2. Verify worker HTTP server is running (should see "Worker HTTP server listening on port 8080")
  3. Check sqsd is configured correctly in Beanstalk worker settings
  4. Verify security tools are installed correctly
  5. Test the worker endpoint manually: curl -X POST http://localhost:8080/ -d '{"scanId":"test","repoId":1,...}'

Security scans timing out

The worker enforces a 1-hour timeout on each scan job. If scans take longer than that:

  1. Increase the SQS visibility timeout
  2. Increase the worker instance size
  3. Optimize the scan configuration (skip certain tools)
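
For reference, a per-job timeout of this kind is often built with Promise.race. The sketch below is a generic pattern under that assumption, not the worker's actual mechanism; withTimeout and SCAN_TIMEOUT_MS are hypothetical names.

```typescript
// 1 hour, matching the limit described above.
const SCAN_TIMEOUT_MS = 60 * 60 * 1000;

// Reject if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A scan would then be wrapped as withTimeout(scanPromise, SCAN_TIMEOUT_MS, "scan"), where scanPromise is whatever promise represents the running job. Any child processes started by the scanners would still need to be terminated separately.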

Development

Adding a new security scanner

  1. Create a new scanner file in src/security/scanners/
  2. Add the scanner to orchestrator.ts
  3. Update .ebextensions/01_security_tools.config to install the tool
  4. Update the Dockerfile to include the tool installation
