
@samrose (Collaborator) commented Nov 21, 2025

Depends on #1941

Refactor AMI builds to use nightly base image

Summary

This PR refactors the AMI build pipeline to separate platform provisioning (stage 1) from application installation (stage 2). Stage 1 now runs nightly and produces a single version-agnostic base image, which every stage 2 build consumes.

Changes

New workflow:

  • .github/workflows/base-image-nightly.yml - Runs daily at 2 AM UTC, builds the version-agnostic stage 1 base AMI, and replicates it to us-east-1 and ap-southeast-1

Stage 1 changes:

  • amazon-arm64-nix.pkr.hcl - Add base-nightly mode with conditional AMI naming
  • ebssurrogate/scripts/surrogate-bootstrap-nix.sh - Remove postgres version variables
  • ansible/tasks/setup-postgrest.yml - Move to stage 2
  • ansible/playbook.yml - Run PostgREST installation in stage 2 only
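The stage 1 naming change can be illustrated with a small Python sketch. The PR does not show the Packer template, so the prefix `supabase-postgres-base-nightly`, the `-stage1` suffix for versioned builds, the timestamp format, and the function `stage1_ami_name` are all hypothetical; only the idea that base-nightly mode drops the postgres version from the name comes from the change list above.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical prefix; the real naming in amazon-arm64-nix.pkr.hcl may differ.
BASE_PREFIX = "supabase-postgres-base-nightly"


def stage1_ami_name(mode: str, built_at: datetime,
                    postgres_version: Optional[str] = None) -> str:
    """Return an illustrative stage 1 AMI name for the given build mode."""
    # Normalise to UTC so the name matches the 2 AM UTC schedule.
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d-%H%M")
    if mode == "base-nightly":
        # Version-agnostic: the name depends only on the build time.
        return f"{BASE_PREFIX}-{stamp}"
    if mode == "versioned":
        # Pre-PR behaviour (hypothetical format): one image per postgres version.
        if postgres_version is None:
            raise ValueError("versioned builds need a postgres_version")
        return f"supabase-postgres-{postgres_version}-stage1-{stamp}"
    raise ValueError(f"unknown build mode: {mode!r}")


if __name__ == "__main__":
    print(stage1_ami_name("base-nightly", datetime.now(timezone.utc)))
```

Keeping the version out of the base-nightly name is what lets a single image serve the 15, 17, and orioledb-17 stage 2 builds.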

Stage 2 changes:

  • stage2-nix-psql.pkr.hcl - Search for the base-nightly AMI instead of a versioned stage 1 AMI
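A minimal sketch of how stage 2 could choose its source image, assuming it wants the newest AMI whose name matches a base-nightly pattern (the actual template may instead rely on Packer's `source_ami_filter` with `most_recent = true`). The name pattern and `newest_base_ami` are hypothetical; the image records mimic the `ImageId`, `Name`, and `CreationDate` fields returned by EC2 `DescribeImages`.

```python
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical pattern; the real filter in stage2-nix-psql.pkr.hcl may differ.
BASE_PATTERN = "supabase-postgres-base-nightly-*"


def _created(image: Dict[str, str]) -> datetime:
    # EC2 reports CreationDate as ISO 8601 UTC with a trailing "Z".
    return datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))


def newest_base_ami(images: List[Dict[str, str]],
                    pattern: str = BASE_PATTERN) -> Optional[str]:
    """Return the ImageId of the newest image whose Name matches pattern."""
    matches = [img for img in images if fnmatchcase(img["Name"], pattern)]
    if not matches:
        # No nightly base found: let the caller fail instead of guessing.
        return None
    return max(matches, key=_created)["ImageId"]
```

Returning `None` rather than falling back to an older versioned stage 1 image keeps a missing nightly build visible as a stage 2 failure.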

Workflow updates:

  • .github/workflows/ami-release-nix.yml - Remove stage 1 build
  • .github/workflows/ami-release-nix-single.yml - Remove stage 1 build
  • .github/workflows/testinfra-ami-build.yml - Remove stage 1 build

Rationale

Current state: Every release workflow builds stage 1 from scratch for each postgres version (15, 17, orioledb-17), taking 30+ minutes and creating 3-4 redundant AMIs per release.

Problem: Stage 1 installs OS packages, system dependencies, and tooling that are identical across all postgres versions, so rebuilding it for every version in every release repeats the same work.

Solution: Build stage 1 once per night as a version-agnostic base image. All postgres versions share the same tested platform base, and stage 2 handles version-specific installation.

Benefits

  • Release workflows 50-75% faster (skip 30-minute stage 1 build)
  • About 67-75% less stage 1 AMI storage per release (1 base image vs 3-4)
  • Daily OS security updates automatically incorporated into base
  • Cleaner separation of concerns: platform vs application
  • Base image automatically refreshed in both regions (us-east-1, ap-southeast-1)
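The storage figure follows from simple arithmetic: replacing n per-release stage 1 images with one shared base removes 1 - 1/n of them. The sketch below assumes every stage 1 image has roughly the same size and counts images per release; because the nightly workflow creates a new base every day, total storage also depends on how many nightly images are retained. The function name is hypothetical.

```python
def stage1_image_reduction(images_before: int, images_after: int = 1) -> float:
    """Fraction of stage 1 AMIs eliminated (counts images, not bytes)."""
    if images_before <= 0:
        raise ValueError("images_before must be positive")
    return 1 - images_after / images_before


if __name__ == "__main__":
    for n in (3, 4):
        print(f"{n} stage 1 AMIs -> 1: {stage1_image_reduction(n):.1%} fewer")
```

Running it prints 66.7% for three images and 75.0% for four.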

Testing

  1. Manually trigger the base-image-nightly workflow
  2. Verify that the AMI is created in both regions
  3. Trigger ami-release-nix-single for one postgres version
  4. Confirm that stage 2 finds and uses the nightly base
  5. Run testinfra to validate the final AMI
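Steps 2 and 4 amount to checking that each region holds a recent base-nightly image. A sketch, assuming each region's image listing has already been fetched (for example via EC2 `DescribeImages`) into plain dicts; the name pattern, the 26-hour freshness window (one daily run plus slack for build and copy time), and `stale_regions` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Dict, List

# Regions from the PR description; the name pattern is hypothetical.
REGIONS = ("us-east-1", "ap-southeast-1")
BASE_PATTERN = "supabase-postgres-base-nightly-*"


def _created(image: Dict[str, str]) -> datetime:
    # EC2 reports CreationDate as ISO 8601 UTC with a trailing "Z".
    return datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))


def stale_regions(images_by_region: Dict[str, List[Dict[str, str]]],
                  now: datetime,
                  max_age: timedelta = timedelta(hours=26)) -> List[str]:
    """Return the regions that lack a base-nightly AMI newer than max_age."""
    stale = []
    for region in REGIONS:
        fresh = [
            img
            for img in images_by_region.get(region, [])
            if fnmatchcase(img["Name"], BASE_PATTERN)
            and now - _created(img) <= max_age
        ]
        if not fresh:
            stale.append(region)
    return stale


if __name__ == "__main__":
    # With no listings at all, both regions are reported as stale.
    print(stale_regions({}, datetime.now(timezone.utc)))
```

An empty result means both regions have a usable nightly base; anything else names the region whose replication or build needs attention.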

Migration Notes

  • No breaking changes to existing workflows
  • The new nightly workflow should run successfully for 3-4 days before this PR is merged
  • Old versioned stage 1 AMIs can be cleaned up after validation period
  • Rollback: Latest nightly base is always available; stage 2 builds test branch changes

@samrose force-pushed the base-image-nightly branch from 97b23c9 to ef228f0 on December 1, 2025 19:26
@samrose force-pushed the base-image-nightly branch from b1f78cd to 3cc0f96 on December 1, 2025 22:12
