Add Pydantic-based postprocessor configuration framework #25

Open
tomdurrant wants to merge 3 commits into main from postprocessing_enhancement
@tomdurrant (Contributor)

Summary

Implements a comprehensive Pydantic-based postprocessor configuration framework that brings parity with the run backend configuration system. Enables configuration-driven postprocessing via YAML/JSON files with CLI support.

Motivation

The existing postprocessing system used string-based processor names, which lacked type safety, validation, and extensibility. This implementation provides:

  • Type-safe configuration with Pydantic validation
  • Entry point-based dynamic loading
  • Consistent API with backend configuration system
  • Full CLI integration
  • Enhanced developer experience for custom postprocessors

Changes

Core Implementation

New Files:

  • src/rompy/postprocess/config.py (226 lines) - Configuration classes and loading infrastructure
  • tests/test_postprocess_config.py (234 lines) - Comprehensive test suite (18 tests)
  • examples/backends/postprocessor_configs/ - Example YAML configurations with documentation

Modified Files:

  • src/rompy/cli.py - Updated CLI commands with --processor-config option
  • src/rompy/model.py - Updated postprocess() to require config objects
  • src/rompy/pipeline/__init__.py - Updated LocalPipelineBackend for config objects
  • pyproject.toml - Added entry point registration
  • tests/backends/test_enhanced_backends.py - Updated all pipeline tests

Documentation

Updated 5 documentation files with 1064+ lines of new content:

  • docs/cli.md - CLI options and usage
  • docs/backends.md - Postprocessor configuration guide
  • docs/plugin_architecture.md - Plugin development
  • docs/configuration_deep_dive.md - Configuration patterns
  • docs/developer/backend_reference.md - Developer reference

Breaking Changes

API Changes

Old (removed):

model_run.postprocess(processor="noop")

New (required):

from rompy.postprocess.config import NoopPostprocessorConfig
config = NoopPostprocessorConfig(validate_outputs=True)
model_run.postprocess(processor=config)

CLI Changes

Old (removed):

rompy pipeline config.yml --processor noop

New (required):

rompy pipeline config.yml --processor-config processor.yml

Features

Configuration Classes

  • BasePostprocessorConfig - Abstract base with common fields
  • NoopPostprocessorConfig - Concrete implementation for validation-only processing
  • ProcessorConfig - Type alias for Union of all processor configs
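
The class definitions themselves are not shown in this description. As a rough sketch, assuming the common fields are the ones that appear in the configuration file format (`validate_outputs`, `timeout`, `env_vars`), the hierarchy might look something like the following. The `...Sketch` class names, the defaults, the `gt=0` bound, and the `try_build` helper are all assumptions made for illustration, not the contents of `config.py`.

```python
from typing import Dict, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class BasePostprocessorConfigSketch(BaseModel):
    """Hypothetical stand-in for BasePostprocessorConfig (common fields only)."""

    validate_outputs: bool = True       # assumed default
    timeout: int = Field(3600, gt=0)    # seconds; the lower bound is an assumption
    env_vars: Dict[str, str] = Field(default_factory=dict)


class NoopPostprocessorConfigSketch(BasePostprocessorConfigSketch):
    """Hypothetical stand-in for NoopPostprocessorConfig."""

    # A Literal pins the "type" value; any other value fails validation.
    type: Literal["noop"] = "noop"


# With more processors this would read Union[NoopConfig, OtherConfig, ...].
ProcessorConfigSketch = Union[NoopPostprocessorConfigSketch]


def try_build(**kwargs):
    """Return a validated config, or None when Pydantic rejects the input."""
    try:
        return NoopPostprocessorConfigSketch(**kwargs)
    except ValidationError:
        return None
```

Pinning `type` with a `Literal` is what would let a union of config classes be resolved from the `type` key alone, which is presumably the role of the `ProcessorConfig` alias.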

Entry Point System

Configurations are dynamically loaded via entry points:

[project.entry-points."rompy.postprocess.config"]
noop = "rompy.postprocess.config:NoopPostprocessorConfig"
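
How `config.py` resolves these entry points is not shown here. One plausible approach, sketched below with the standard library's `importlib.metadata`, collects every class registered under the group into a name-to-class mapping. The helper name `discover_config_classes` is hypothetical.

```python
from importlib.metadata import entry_points


def discover_config_classes(group="rompy.postprocess.config"):
    """Map entry point names (e.g. "noop") to the objects they point at."""
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        selected = eps.select(group=group)
    else:                           # Python 3.8/3.9: dict of group -> entry points
        selected = eps.get(group, [])
    # ep.load() imports the target module and returns the referenced object.
    return {ep.name: ep.load() for ep in selected}
```

With this approach a third-party package only needs to declare an entry point in the same group for its configuration class to appear in the mapping; a group with no registered plugins simply yields an empty dictionary.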

CLI Integration

  • rompy postprocess - Process existing model outputs with config file
  • rompy pipeline - Updated to require --processor-config option
  • rompy backends validate - Added --processor-type option for validation

Configuration File Format

type: noop
validate_outputs: true
timeout: 3600
env_vars:
  DEBUG: "1"
  LOG_LEVEL: "INFO"
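
A minimal sketch of how such a file might be turned into a configuration object, assuming the loader reads the `type` key to pick a class and passes the remaining keys to it. JSON is used so that the sketch needs only the standard library, and a dataclass stands in for the Pydantic class. `load_processor_config`, `REGISTRY`, and `NoopConfigStandIn` are hypothetical names, not the framework's API.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NoopConfigStandIn:
    """Stand-in for a Pydantic config class (illustration only)."""

    validate_outputs: bool = True
    timeout: int = 3600
    env_vars: Dict[str, str] = field(default_factory=dict)


# Assumption: in the framework this mapping would be built from entry points.
REGISTRY = {"noop": NoopConfigStandIn}


def load_processor_config(text, registry=REGISTRY):
    """Parse JSON text and build the config class selected by its "type" key."""
    data = json.loads(text)
    kind = data.pop("type", None)
    if kind not in registry:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown processor type {kind!r}; known types: {known}")
    return registry[kind](**data)
```

Rejecting an unknown `type` before constructing anything keeps the error message specific to the selection step, separate from field-level validation errors raised by the chosen class.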

Testing

  • ✅ 18 new tests (all passing)
  • ✅ 254 total tests passing
  • ✅ 15 skipped
  • ✅ All existing pipeline tests updated

Migration Guide

For Users

  1. Replace string processor names with configuration files:

    # Create processor.yml
    cat > processor.yml <<EOF
    type: noop
    validate_outputs: true
    timeout: 3600
    EOF
    
    # Use in commands
    rompy postprocess model.yml --processor-config processor.yml

  2. Update Python code:

    from rompy.postprocess.config import NoopPostprocessorConfig
    
    config = NoopPostprocessorConfig(validate_outputs=True)
    results = model_run.postprocess(processor=config)

For Plugin Developers

  1. Create configuration class:

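    from typing import Literal

    from pydantic import Field
    from rompy.postprocess.config import BasePostprocessorConfig

    class MyProcessorConfig(BasePostprocessorConfig):
        # Literal pins the type value; Field(..., const=True) was removed in Pydantic v2
        type: Literal["myprocessor"] = "myprocessor"
        custom_field: str = Field(..., description="Custom parameter")

        def get_postprocessor_class(self):
            # Imported inside the method so the processor is only loaded when needed
            from mypackage import MyPostprocessor
            return MyPostprocessor
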
  2. Register in pyproject.toml:

    [project.entry-points."rompy.postprocess.config"]
    myprocessor = "mypackage:MyProcessorConfig"

Examples

See examples/backends/postprocessor_configs/ for:

  • noop_basic.yml - Minimal configuration
  • noop_advanced.yml - Advanced options
  • README.md - Usage documentation

Backward Compatibility

Backward compatibility is not maintained. The postprocessing framework is relatively new, and a clean, functional implementation takes priority over keeping the old string-based API.

Related Issues

Addresses the need for configuration-driven postprocessing to match the existing backend configuration framework.

Checklist

  • Implementation complete
  • Tests passing (254 passed, 15 skipped)
  • Documentation updated (5 files)
  • Example configurations created
  • Entry points registered
  • Breaking changes documented
  • Migration guide provided

Changelog Entry

Added to HISTORY.rst for v0.6.0:

Breaking Changes:

  • Postprocessor API now requires Pydantic configuration objects instead of strings
  • CLI --processor option replaced with --processor-config (required)

New Features:

  • Pydantic-based postprocessor configuration framework
  • Entry point-based dynamic configuration loading
  • CLI support for postprocessor configuration files
  • Configuration validation and schema generation

Note: This PR does not require backward compatibility as the postprocessing system is relatively new.

Implement configuration-driven postprocessing to bring parity with run backend framework.

BREAKING CHANGES:
- Replace string-based processor names with config objects
- ModelRun.postprocess() now requires BasePostprocessorConfig instances
- CLI --processor option replaced with --processor-config (required)
- LocalPipelineBackend.execute() requires processor_config parameter

New Features:
- BasePostprocessorConfig abstract class with common fields
- NoopPostprocessorConfig concrete implementation
- Entry point-based dynamic config loading (rompy.postprocess.config)
- ProcessorConfig type alias for Union of all processor configs
- CLI --processor-config option for postprocess/pipeline commands
- CLI --processor-type option for backends validate command
- Example configs in examples/backends/postprocessor_configs/

Files Added:
- src/rompy/postprocess/config.py - Config classes and loading
- tests/test_postprocess_config.py - Comprehensive test suite (18 tests)
- examples/backends/postprocessor_configs/ - Example YAML configs

Files Modified:
- src/rompy/cli.py - Updated CLI commands
- src/rompy/model.py - Updated postprocess() method
- src/rompy/pipeline/__init__.py - Updated LocalPipelineBackend
- pyproject.toml - Added entry point registration
- tests/backends/test_enhanced_backends.py - Updated pipeline tests
- README.md - Added postprocessor configuration section
- HISTORY.rst - Added v0.6.0 changelog entry

All tests passing: 254 passed, 15 skipped

Add comprehensive documentation for Pydantic-based postprocessor configuration system across all relevant docs:

- CLI documentation: Added --processor-config and --processor-type options for postprocess, pipeline, and backends validate commands
- Backends documentation: Added complete Postprocessor Configuration section with usage examples, validation, and CLI integration
- Plugin architecture: Updated postprocessor section to reflect configuration-based approach with detailed implementation examples
- Configuration deep dive: Added postprocessor configuration section with validation, serialization, and entry point loading
- Backend reference: Comprehensive developer documentation for custom postprocessor configurations and implementations

Key documentation additions:
- Configuration file formats (YAML/JSON)
- Entry point registration and discovery
- Type-safe validation with Pydantic
- Custom postprocessor configuration development
- CLI integration and usage examples
- Configuration serialization and schema generation
- Integration with pipeline backends

All documentation now reflects the new configuration-driven postprocessing API that replaces the old string-based processor selection.