Conversation

@Sohailm25

Introduces a unified inference policy interface that decouples model serving from training infrastructure. Key features include:

  • Abstract InferencePolicy base class with standardized async-first API for model generation
  • APIPolicy for OpenAI-compatible endpoints (OpenAI, Anthropic, vLLM servers)
  • VLLMPolicy for high-throughput production serving with optional weight synchronization
  • Factory method pattern via InferencePolicy.from_client() for seamless migration
  • Comprehensive test coverage with 12 test cases covering all policy implementations
  • Complete backwards compatibility: all 181 existing tests pass unchanged

This provides researchers flexibility to evaluate models without training infrastructure and enables production deployment with optimized serving backends while maintaining a consistent interface.
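As a rough illustration of the abstraction described above, the sketch below shows what an async-first `InferencePolicy` base class with an `APIPolicy` subclass and an `InferencePolicy.from_client()` factory might look like. Only the class and method names come from this PR; the signatures, the `generate(prompt)` shape, and the stubbed response are assumptions for illustration, not the actual implementation.

```python
import asyncio
from abc import ABC, abstractmethod


class InferencePolicy(ABC):
    """Hypothetical sketch of the abstract base class described in the PR."""

    @abstractmethod
    async def generate(self, prompt: str) -> str:
        """Produce a model completion for the given prompt."""

    @classmethod
    def from_client(cls, client: object) -> "InferencePolicy":
        # Hypothetical factory: the real method presumably inspects the
        # client type and returns a matching policy. Here we always wrap
        # the client in an APIPolicy stub for illustration.
        return APIPolicy(model=getattr(client, "model", "unknown"))


class APIPolicy(InferencePolicy):
    """Stub standing in for the OpenAI-compatible endpoint policy."""

    def __init__(self, model: str):
        self.model = model

    async def generate(self, prompt: str) -> str:
        # The real class would call an OpenAI-compatible HTTP endpoint;
        # stubbed here so the sketch is self-contained and runnable.
        return f"[{self.model}] response to: {prompt}"


if __name__ == "__main__":
    policy = APIPolicy(model="example-model")
    print(asyncio.run(policy.generate("Hello")))
```

The async-first API lets a caller fan out many generation requests with `asyncio.gather`, which is what makes a single interface workable for both lightweight API evaluation and high-throughput vLLM serving.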

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running `uv run pytest` locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@CLAassistant

CLAassistant commented Oct 10, 2025

CLA assistant check
All committers have signed the CLA.

@Sohailm25 Sohailm25 force-pushed the feature/inference-policy-abstraction branch from 3b02d31 to 91ca879 Compare October 10, 2025 18:10
