
Optimizing prompt job Rest API #9

Open
WeichenXu123 wants to merge 20 commits into BenWilson2:eval-dataset-unity from WeichenXu123:optim-prompt-rest-api

Conversation


@WeichenXu123 WeichenXu123 commented Aug 26, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/9/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/9/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/9/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR adds the prompt optimization job REST APIs /api/3.0/mlflow/optimize-prompt and /api/3.0/mlflow/get-optimize-prompt-job.
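For quick reference, here is a minimal sketch of the request body that POST /api/3.0/mlflow/optimize-prompt accepts. Field names mirror the end-to-end test script below; the dataset IDs are placeholders, not real values:

```python
import json

# Illustrative request body for POST /api/3.0/mlflow/optimize-prompt.
# Field names are taken from the test script in this PR description;
# the dataset IDs here are placeholders.
payload = {
    "train_dataset_id": "d-<train-dataset-id>",
    "eval_dataset_id": "d-<eval-dataset-id>",
    "prompt_url": "prompts:/math/1",  # URI of a registered prompt version
    "scorers": [
        {"custom_scorer": {"name": "exact_match", "experiment_id": "0"}}
    ],
    "target_llm": "openai/gpt-4.1-mini",
    "algorithm": "DSPy/MIPROv2",
}
print(json.dumps(payload, indent=2))
```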

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

End-to-end tests:

  1. Generate the inputs (scorer, input prompt, train dataset, eval dataset):
import mlflow
from mlflow.genai.datasets import create_dataset
mlflow.set_tracking_uri("sqlite:///mlflow.db")

from mlflow.genai.scorers import scorer
from mlflow.tracking.fluent import _get_experiment_id


# Define a custom scorer function to evaluate prompt performance with the @scorer decorator.
# The scorer function for optimization can take inputs, outputs, and expectations.
@scorer
def exact_match(expectations, outputs) -> bool:
    return expectations["answer"] == outputs["answer"]


exp_id = _get_experiment_id()
exact_match.register(name="exact_match", experiment_id=exp_id)


# Register the initial prompt
initial_template = """
Answer to this math question: {{question}}.
Return the result in a JSON string in the format of {"answer": "xxx"}.
"""

prompt = mlflow.genai.register_prompt(
    name="math",
    template=initial_template,
)


# The data can be a list of dictionaries, a pandas DataFrame, or an mlflow.genai.EvaluationDataset.
# Each record must be a dictionary containing "inputs" and "expectations".
train_data = [
    {
        "inputs": {"question": "Given that $y=3$, evaluate $(1+y)^y$."},
        "expectations": {"answer": "64"},
    },
]

eval_data = [
    {
        "inputs": {
            "question": "The sum of 27 consecutive positive integers is $3^7$. What is their median?"
        },
        "expectations": {"answer": "81"},
    },
]

input_prompt = "prompts:/math/1"
train_dataset = create_dataset("train_ds2")
train_dataset.merge_records(train_data)
print(f"train dataset: {train_dataset.dataset_id}")

eval_dataset = create_dataset("eval_ds2")
eval_dataset.merge_records(eval_data)
print(f"eval dataset: {eval_dataset.dataset_id}")
  2. Start the MLflow tracking server:
export OPENAI_API_KEY=...
mlflow server --backend-store-uri=sqlite:///mlflow.db
  3. Run client code to send the job-creation and get-job-info requests:
#!/usr/bin/env python3
"""
Simple test script for MLflow Prompt Optimization Job API.
Tests create job and get job endpoints using requests.
"""
import time

import requests


# MLflow server URL
BASE_URL = "http://localhost:5000"

def create_job():
    """Create a prompt optimization job."""
    url = f"{BASE_URL}/api/3.0/mlflow/optimize-prompt"
    
    # Dataset IDs printed by the setup script in step 1
    train_dataset_id = "d-1e9e6cd236bb407b9af496195b4aaadb"
    test_dataset_id = "d-c7bd9b6fd37549bd839282985abe78da"
    input_prompt = "prompts:/math/1"
    
    payload = {
        "train_dataset_id": train_dataset_id,
        "eval_dataset_id": test_dataset_id,
        "prompt_url": input_prompt,
        "scorers": [{
            "custom_scorer": {
                "name": "exact_match",
                "experiment_id": "0",
            },
        }],
        "target_llm": "openai/gpt-4.1-mini",
        "algorithm": "DSPy/MIPROv2",
    }

    print("Creating prompt optimization job...")
    
    response = requests.post(url, json=payload)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Body: {response.text}")
    
    if response.status_code == 200:
        result = response.json()
        job_id = result.get("jobId")
        print(f"✓ Job created successfully! Job ID: {job_id}")
        return job_id
    else:
        print(f"✗ Failed to create job: {response.status_code}")
        return None

def get_job(job_id):
    """Get job status and details."""
    url = f"{BASE_URL}/api/3.0/mlflow/get-optimize-prompt-job/{job_id}"
    
    print(f"\nGetting job {job_id}...")
    print(f"URL: {url}")
    
    response = requests.get(url)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Body: {response.text}")
    
    if response.status_code == 200:
        result = response.json()
        status = result.get("status")
        print(f"✓ Job status: {status}")
        
        if status == "COMPLETED":
            result_data = result.get("result", {})
            print(f"Optimized prompt URL: {result_data.get('prompt_url')}")
            print(f"Evaluation score: {result_data.get('evaluation_score')}")
        
        return result
    else:
        print(f"✗ Failed to get job: {response.status_code}")
        return None

def main():
    """Main test function."""
    # Step 1: Create a job
    job_id = create_job()
    print(f"job id: {job_id}")
    if job_id is None:
        return

    # Step 2: Poll the job until it reaches a terminal state,
    # instead of a single fixed-length sleep.
    while True:
        result = get_job(job_id)
        if result is None or result.get("status") in ("COMPLETED", "FAILED"):
            break
        time.sleep(10)
    print(result)


if __name__ == "__main__":
    main()

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Added the prompt optimization job REST API.

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 WeichenXu123 changed the title [WIP] Optimizing prompt Rest API [WIP] Optimizing prompt job Rest API Aug 27, 2025
@TomeHirata

Left some comments, but overall looks great!

@WeichenXu123 WeichenXu123 changed the title [WIP] Optimizing prompt job Rest API Optimizing prompt job Rest API Aug 31, 2025
// Result of the optimization job (only present if status is COMPLETED).
optional PromptOptimizationResult result = 2;
}
}


Maybe it's worth implementing DELETE /mlflow/optimize-prompt/{job_id} for cancelation as a follow up

@WeichenXu123 (Author) replied:
Currently we launch the job in a Python thread, and there is no safe way to kill a Python thread. To support safe cancellation, the job would need to run as a subprocess. We can do that as a follow-up if needed.
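To illustrate the constraint above, here is a minimal sketch (my own, not part of this PR) of running a job as a subprocess so it can be cancelled safely; the long sleep stands in for the actual optimization work:

```python
import subprocess
import sys

# Hypothetical sketch: a job launched as a child process can be terminated,
# unlike a Python thread. The sleep below is a stand-in for the real work.
def start_job():
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])

def cancel_job(proc, timeout=5):
    proc.terminate()  # send SIGTERM; escalate to kill() if the child ignores it
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode

proc = start_job()
rc = cancel_job(proc)
print(f"job cancelled, return code: {rc}")
```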


@TomeHirata left a comment:


LGTM
