Skip to content

Conversation

@Vishak-V
Copy link

@Vishak-V Vishak-V commented Sep 22, 2025

[Short description explaining the high-level reason for the pull request]

Additions

Removals

Changes

  • Migrated base OS from Amazon Linux 2023 to Ubuntu 22.04. Changed package manager from dnf to apt-get. Updated Python from 3.9 to 3.10. Added environment variables to prevent interactive prompts. Improved package cleanup for smaller images

  • Updated minimum Python requirement from >=3.9 to >=3.10 • Changed Python classifier in setup.cfg to 3.10

  • Implemented unique timestamp tagging (ubuntu-YYYYMMDD-HHMMSS-x86) • Replaced docker-compose with direct docker builds • Added registry-first strategy (build deps → push → build main) • Fixed TAG_NAME argument passing issues • Enhanced debugging and logging

  • Added dynamic Python version extraction from setup.cfg • Migrated all workflows to use uv package manager • Implemented virtual environment isolation with uv venv

  • Fixed Parquet test failures by replacing curl with Python requests. Added Parquet file validation using pyarrow. Implemented graceful test skipping when downloads fail. Added proper error handling and cleanup

  • Faster CI/CD with uv package manager. Resolved Python version conflicts. Better caching strategy with unique Docker tags. Enhanced error handling and debugging

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

- Replace amazonlinux:2023 with ubuntu:22.04 base image
- Update package manager from dnf to apt-get
- Update Python requirement from 3.9 to 3.10
- Convert package names to Ubuntu equivalents
- Update Python version from 3.9 to 3.10 in all test workflows
- Upgrade actions/setup-python from v3 to v5 for better Python 3.10 support
- Upgrade actions/checkout from v3 to v4 for improved compatibility
- NRDS workflow already reads Python version dynamically from setup.cfg
- Comment out cache_from in docker-compose to prevent using old Amazon Linux base image
- Add --no-cache flag to forcingprocessor build to ensure clean build
- This should resolve the 'Python 3.9.23 not in >=3.10' error
- Add --pull=false to prevent Docker from pulling old base images from registry
- Add depends_on to ensure proper build order in docker-compose
- This should ensure the locally built Ubuntu-based deps image is used
- Remove ARG TAG_NAME=latest default value in main Dockerfile
- This forces the use of the TAG_NAME passed from docker-compose
- Prevents falling back to old latest tag with Python 3.9
- Ensures latest-x86 tag with Ubuntu/Python 3.10 is used
@Vishak-V Vishak-V force-pushed the ubuntu-base-image branch 2 times, most recently from 0bc0d27 to 384f93e Compare September 24, 2025 19:39
vishakvik and others added 13 commits September 25, 2025 14:03
- Updated build_test_push_docker_fp_X86_arm_version_aware.yaml to use same approach as build_test_fp.yaml
- Implemented unique timestamp tagging to prevent Docker registry caching issues
- Simplified build process with direct docker build and explicit TAG_NAME argument
- Maintained Ubuntu 22.04 and Python 3.10 compatibility
- Removed unnecessary complex changes from earlier debugging session
- Updated 5 workflow files to extract Python version from setup.cfg
- Changed from hardcoded python-version: 3.10 to dynamic version extraction
- Ensures consistency across all workflows and automatic updates when requirements change
- Files updated:
  - forcingprocessor_weights.yaml
  - forcingprocessor_plotting.yaml
  - forcingprocessor_output_opts.yaml
  - forcingprocessor_gcs_sources.yaml
  - forcingprocessor_aws_sources.yaml
- Added uv installation step to all 6 forcing processor workflows
- Replaced traditional pip with faster uv for package installation
- Implemented virtual environment isolation with 'uv venv' and 'source .venv/bin/activate'
- Updated test execution to run within activated virtual environments
- Benefits: faster installs, better dependency resolution, isolated environments

Files updated:
- forcingprocessor_weights.yaml
- forcingprocessor_plotting.yaml
- forcingprocessor_output_opts.yaml
- forcingprocessor_gcs_sources.yaml
- forcingprocessor_aws_sources.yaml
- forcingprocessor_nrds.yaml
- Replace unreliable curl commands with Python requests for file downloads
- Add parquet file validation using pyarrow to ensure files are valid before testing
- Implement graceful test skipping when downloads fail instead of test failures
- Add proper error handling and cleanup for corrupted downloads
- Fixes: pyarrow.lib.ArrowInvalid: Parquet magic bytes not found errors

This resolves the issue where curl was downloading HTML error pages instead of actual parquet files, causing parquet format validation failures.
- name: Build docker containers
run: |
docker system prune -a
ARCH=x86 TAG=latest-x86 docker compose -f docker/docker-compose.yml build forcingprocessor-deps
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is direct docker build better here? This seems to work fine, but @harshavemula-ua recently implemented the docker compose build. We should pick a method and stick with it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker was prioritizing remote registry images over local builds. Even when building locally, Docker would pull cached layers from DockerHub. Registry images took precedence over freshly built local images. I can give this another try, but this was the solution I came up with

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should ideally use the docker-compose

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a change, the workflow now uses docker-compose instead of build and pulls the right image.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused on why this is necessary. Docker compose should build forcingprocessor-deps locally as is here. If we were to leave the forcingprocessor-deps build args off, then I do think compose would pull from the registry, but my understanding is compose will build forcingprocessor-deps locally with the current command.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a change to fix this. Moving from the previous change made me forget we could do this. Thanks for catching this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed still.

- Modified build_test_fp.yaml to use docker-compose instead of direct docker builds
- Added explicit environment variable setting (TAG, ARCH, TAG_NAME) before docker-compose calls
- Enhanced docker-compose.yml with improved argument fallback mechanisms
- Maintained unique timestamp tagging strategy to prevent registry conflicts
- Preserved registry-first approach: build deps  push  build main
- Added debugging output for environment variable tracking
- Fixed TAG_NAME propagation through docker-compose args to Dockerfile

This resolves the docker-compose tag argument passing issues while maintaining
the robust build strategy that prevents Docker registry caching problems.
- Replace 'docker-compose' with 'docker compose' (modern Docker CLI syntax)
- GitHub Actions runners use newer Docker version with integrated compose
- Maintains all tag passing functionality while using correct command syntax
Copy link
Collaborator

@JordanLaserGit JordanLaserGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to addressing the docker compose build, this PR needs to resolve merge conflicts.

- name: Build docker containers
run: |
docker system prune -a
ARCH=x86 TAG=latest-x86 docker compose -f docker/docker-compose.yml build forcingprocessor-deps
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused on why this is necessary. Docker compose should build forcingprocessor-deps locally as is here. If we were to leave the forcingprocessor-deps build args off, then I do think compose would pull from the registry, but my understanding is compose will build forcingprocessor-deps locally with the current command.

- Changed from separate service builds to single 'docker compose build --no-cache'
- Leverages docker-compose dependency relationship (forcingprocessor depends_on forcingprocessor-deps)
- Simplifies build process while maintaining proper tag passing
- Push both images after single build operation
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates the project from Amazon Linux 2023 to Ubuntu 22.04, updating Python from 3.9 to 3.10 and modernizing the CI/CD pipeline with the uv package manager.

  • Replaced Amazon Linux 2023 base image with Ubuntu 22.04 and migrated from dnf to apt-get package manager
  • Updated Python version requirement from >=3.9 to >=3.10 across configuration files
  • Enhanced test reliability by replacing curl commands with Python requests library and adding proper error handling
  • Modernized CI/CD workflows with dynamic Python version extraction, uv package manager, and unique Docker tagging strategy

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
versions.yml Updated version tags for forcingprocessor containers
tests/test_hf2ds.py Replaced curl with requests library and added robust error handling for file downloads
setup.cfg Updated Python version requirement and classifier from 3.9 to 3.10
docker/docker-compose.yml Enhanced argument handling and added dependency management
docker/Dockerfile.forcingprocessor-deps Migrated from Amazon Linux to Ubuntu 22.04 with updated package management
docker/Dockerfile.forcingprocessor Updated default tag naming convention
.github/workflows/*.yaml Modernized all workflows with dynamic Python version extraction and uv package manager

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

&& make -j$(nproc) \
&& make install \
&& echo "/usr/local/lib" > /etc/ld.so.conf.d/geos.conf \
&& echo "/usr/local/lib64" >> /etc/ld.so.conf.d/geos.conf \
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library path configuration is duplicating /usr/local/lib64 since it's already set in the ENV variable at line 104. Consider removing the redundant echo command or consolidating the library path management.

Suggested change
&& echo "/usr/local/lib64" >> /etc/ld.so.conf.d/geos.conf \

Copilot uses AI. Check for mistakes.
Comment on lines 47 to 48
import pytest
pytest.skip("Could not download valid parquet file")
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pytest import should be moved to the top of the file with other imports rather than importing it conditionally within functions. This follows Python import conventions and improves code readability.

Copilot uses AI. Check for mistakes.
ARCH=x86 TAG=latest-x86 docker compose -f docker/docker-compose.yml build forcingprocessor-deps
TAG=latest-x86 docker compose -f docker/docker-compose.yml build forcingprocessor
# Generate unique tag to ensure fresh builds
UNIQUE_TAG="ubuntu-$(date +%Y%m%d-%H%M%S)-x86"
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The hardcoded 'ubuntu-' prefix in the tag name creates tight coupling to the OS choice. Consider using a more generic prefix or making it configurable to improve maintainability if the base OS changes again.

Suggested change
UNIQUE_TAG="ubuntu-$(date +%Y%m%d-%H%M%S)-x86"
BASE_OS=ubuntu
UNIQUE_TAG="${BASE_OS}-$(date +%Y%m%d-%H%M%S)-x86"

Copilot uses AI. Check for mistakes.
- name: Build docker containers
run: |
docker system prune -a
ARCH=x86 TAG=latest-x86 docker compose -f docker/docker-compose.yml build forcingprocessor-deps
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed still.

- Removed unique timestamp tag generation for simpler workflow
- Updated build to use docker-compose with latest-x86 tags
- Simplified environment variable setup (TAG=latest-x86, ARCH=x86)
- Updated test section to use latest-x86 instead of unique tags
- Maintains Ubuntu 22.04 base image while aligning with main branch approach
- Cleaner, more maintainable workflow that leverages docker-compose dependencies
- Kept build_test_fp.yaml with Ubuntu 22.04 and docker-compose improvements
- Accepted main's workflow reorganization (new integration workflows)
- Preserved our Ubuntu base image migration and Python 3.10 updates
- Maintained docker-compose build approach with standard tagging
Copilot AI review requested due to automatic review settings November 3, 2025 19:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 47 to 48
import pytest
pytest.skip("Could not download valid parquet file")
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import pytest at the module level (top of file) rather than inside conditional blocks. The conditional import is repeated in 6 different test functions, creating code duplication. Moving it to the top with the other imports improves code clarity and follows Python best practices.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +20
def download_and_verify_parquet(url, filepath):
"""Download parquet file and verify it's valid"""
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download_and_verify_parquet and download_gpkg functions share significant duplicated code (90% identical). Consider refactoring into a single generic download_file function with an optional verification callback parameter to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
@@ -1,4 +1,4 @@
ARG TAG_NAME="latest"
ARG TAG_NAME=latest-x86
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value 'latest-x86' appears to be architecture-specific but doesn't align with the ARCH parameter pattern used in Dockerfile.forcingprocessor-deps. This creates inconsistency in how architecture variants are handled across the Docker build system. Consider using a value consistent with the architecture handling approach or documenting why this specific default is needed.

Suggested change
ARG TAG_NAME=latest-x86
ARG ARCH=x86
ARG TAG_NAME=latest-${ARCH}

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@JordanLaserGit JordanLaserGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch has merge conflicts and many tests are failing. I suggest isolating/separating the work here into several, more manageable, pull requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

5 participants