Conversation

@Sohailm25

Introduces a unified inference policy interface that decouples model serving from training infrastructure. Key features include:

  • Abstract InferencePolicy base class with standardized async-first API for model generation
  • APIPolicy for OpenAI-compatible endpoints (OpenAI, Anthropic, vLLM servers)
  • VLLMPolicy for high-throughput production serving with optional weight synchronization
  • Factory method pattern via InferencePolicy.from_client() for seamless migration
  • Comprehensive test coverage with 12 test cases covering all policy implementations
  • Complete backwards compatibility: all 181 existing tests pass unchanged

This provides researchers flexibility to evaluate models without training infrastructure and enables production deployment with optimized serving backends while maintaining a consistent interface.
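As a rough illustration of the abstraction described above, the sketch below shows what an async-first `InferencePolicy` base class with an `APIPolicy` subclass and an `InferencePolicy.from_client()` factory might look like. Only the class and method names come from this PR; the signatures, the `generate(prompt)` shape, and the stubbed response are assumptions for illustration, not the actual implementation.

```python
import asyncio
from abc import ABC, abstractmethod


class InferencePolicy(ABC):
    """Hypothetical sketch of the abstract base class described in the PR."""

    @abstractmethod
    async def generate(self, prompt: str) -> str:
        """Produce a model completion for the given prompt."""

    @classmethod
    def from_client(cls, client: object) -> "InferencePolicy":
        # Hypothetical factory: the real method presumably inspects the
        # client type and returns a matching policy. Here we always wrap
        # the client in an APIPolicy stub for illustration.
        return APIPolicy(model=getattr(client, "model", "unknown"))


class APIPolicy(InferencePolicy):
    """Stub standing in for the OpenAI-compatible endpoint policy."""

    def __init__(self, model: str):
        self.model = model

    async def generate(self, prompt: str) -> str:
        # The real class would call an OpenAI-compatible HTTP endpoint;
        # stubbed here so the sketch is self-contained and runnable.
        return f"[{self.model}] response to: {prompt}"


if __name__ == "__main__":
    policy = APIPolicy(model="example-model")
    print(asyncio.run(policy.generate("Hello")))
```

The async-first API lets a caller fan out many generation requests with `asyncio.gather`, which is what makes a single interface workable for both lightweight API evaluation and high-throughput vLLM serving.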

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running `uv run pytest` locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@CLAassistant

CLAassistant commented Oct 10, 2025

CLA assistant check
All committers have signed the CLA.

@Sohailm25 Sohailm25 force-pushed the feature/inference-policy-abstraction branch from 3b02d31 to 91ca879 Compare October 10, 2025 18:10
