
Optimizing prompt job Rest API #9

Open
WeichenXu123 wants to merge 20 commits into BenWilson2:eval-dataset-unity from WeichenXu123:optim-prompt-rest-api

Conversation


@WeichenXu123 WeichenXu123 commented Aug 26, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/9/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/9/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/9/merge

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR adds the prompt optimization job REST APIs /api/3.0/mlflow/optimize-prompt and /api/3.0/mlflow/get-optimize-prompt-job.
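For quick reference, here is a minimal sketch of the request body that POST /api/3.0/mlflow/optimize-prompt accepts. Field names mirror the end-to-end test script below; the dataset IDs are placeholders, not real values:

```python
import json

# Illustrative request body for POST /api/3.0/mlflow/optimize-prompt.
# Field names are taken from the test script in this PR description;
# the dataset IDs here are placeholders.
payload = {
    "train_dataset_id": "d-<train-dataset-id>",
    "eval_dataset_id": "d-<eval-dataset-id>",
    "prompt_url": "prompts:/math/1",  # URI of a registered prompt version
    "scorers": [
        {"custom_scorer": {"name": "exact_match", "experiment_id": "0"}}
    ],
    "target_llm": "openai/gpt-4.1-mini",
    "algorithm": "DSPy/MIPROv2",
}
print(json.dumps(payload, indent=2))
```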

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

End-to-end tests:

  1. Generate the inputs (scorer, input prompt, train dataset, eval dataset):
import mlflow
from mlflow.genai.datasets import create_dataset
mlflow.set_tracking_uri("sqlite:///mlflow.db")

from mlflow.genai.scorers import scorer
from mlflow.tracking.fluent import _get_experiment_id


# Define a custom scorer function to evaluate prompt performance with the @scorer decorator.
# The scorer function for optimization can take inputs, outputs, and expectations.
@scorer
def exact_match(expectations, outputs) -> bool:
    return expectations["answer"] == outputs["answer"]


exp_id = _get_experiment_id()
exact_match.register(name="exact_match", experiment_id=exp_id)


# Register the initial prompt
initial_template = """
Answer to this math question: {{question}}.
Return the result in a JSON string in the format of {"answer": "xxx"}.
"""

prompt = mlflow.genai.register_prompt(
    name="math",
    template=initial_template,
)


# The data can be a list of dictionaries, a pandas DataFrame, or an mlflow.genai.EvaluationDataset.
# Each record must be a dictionary containing "inputs" and "expectations".
train_data = [
    {
        "inputs": {"question": "Given that $y=3$, evaluate $(1+y)^y$."},
        "expectations": {"answer": "64"},
    },
]

eval_data = [
    {
        "inputs": {
            "question": "The sum of 27 consecutive positive integers is $3^7$. What is their median?"
        },
        "expectations": {"answer": "81"},
    },
]

input_prompt = "prompts:/math/1"
train_dataset = create_dataset("train_ds2")
train_dataset.merge_records(train_data)
print(f"train dataset: {train_dataset.dataset_id}")

eval_dataset = create_dataset("eval_ds2")
eval_dataset.merge_records(eval_data)
print(f"eval dataset: {eval_dataset.dataset_id}")
  2. Start the MLflow tracking server:
export OPENAI_API_KEY=...
mlflow server --backend-store-uri=sqlite:///mlflow.db
  3. Run client code to send the job-creation and get-job-info requests:
#!/usr/bin/env python3
"""
Simple test script for MLflow Prompt Optimization Job API.
Tests create job and get job endpoints using requests.
"""
import time

import requests


# MLflow server URL
BASE_URL = "http://localhost:5000"

def create_job():
    """Create a prompt optimization job."""
    url = f"{BASE_URL}/api/3.0/mlflow/optimize-prompt"
    
    # Dataset IDs printed by the setup script in step 1
    train_dataset_id = "d-1e9e6cd236bb407b9af496195b4aaadb"
    test_dataset_id = "d-c7bd9b6fd37549bd839282985abe78da"
    input_prompt = "prompts:/math/1"
    
    payload = {
        "train_dataset_id": train_dataset_id,
        "eval_dataset_id": test_dataset_id,
        "prompt_url": input_prompt,
        "scorers": [{
            "custom_scorer": {
                "name": "exact_match",
                "experiment_id": "0",
            },
        }],
        "target_llm": "openai/gpt-4.1-mini",
        "algorithm": "DSPy/MIPROv2",
    }

    print("Creating prompt optimization job...")
    
    response = requests.post(url, json=payload)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Body: {response.text}")
    
    if response.status_code == 200:
        result = response.json()
        job_id = result.get("jobId")
        print(f"✓ Job created successfully! Job ID: {job_id}")
        return job_id
    else:
        print(f"✗ Failed to create job: {response.status_code}")
        return None

def get_job(job_id):
    """Get job status and details."""
    url = f"{BASE_URL}/api/3.0/mlflow/get-optimize-prompt-job/{job_id}"
    
    print(f"\nGetting job {job_id}...")
    print(f"URL: {url}")
    
    response = requests.get(url)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Body: {response.text}")
    
    if response.status_code == 200:
        result = response.json()
        status = result.get("status")
        print(f"✓ Job status: {status}")
        
        if status == "COMPLETED":
            result_data = result.get("result", {})
            print(f"Optimized prompt URL: {result_data.get('prompt_url')}")
            print(f"Evaluation score: {result_data.get('evaluation_score')}")
        
        return result
    else:
        print(f"✗ Failed to get job: {response.status_code}")
        return None

def main():
    """Main test function."""
    # Step 1: Create a job
    job_id = create_job()
    print(f"job id: {job_id}")
    if job_id is None:
        return

    # Step 2: Poll the job until it reaches a terminal state,
    # instead of a single fixed-length sleep.
    while True:
        result = get_job(job_id)
        if result is None or result.get("status") in ("COMPLETED", "FAILED"):
            break
        time.sleep(10)
    print(result)


if __name__ == "__main__":
    main()

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Added the prompt optimization job REST API.

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 WeichenXu123 changed the title [WIP] Optimizing prompt Rest API [WIP] Optimizing prompt job Rest API Aug 27, 2025
@TomeHirata

Left some comments, but overall looks great!

@WeichenXu123 WeichenXu123 changed the title [WIP] Optimizing prompt job Rest API Optimizing prompt job Rest API Aug 31, 2025
// Result of the optimization job (only present if status is COMPLETED).
optional PromptOptimizationResult result = 2;
}
}


Maybe it's worth implementing DELETE /mlflow/optimize-prompt/{job_id} for cancelation as a follow up

@WeichenXu123 (Author) replied:
Currently we launch the job in a Python thread, and there is no safe way to kill a Python thread. To support safe cancellation, the job would need to run as a subprocess. We can do that as a follow-up if needed.
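To illustrate the constraint above, here is a minimal sketch (my own, not part of this PR) of running a job as a subprocess so it can be cancelled safely; the long sleep stands in for the actual optimization work:

```python
import subprocess
import sys

# Hypothetical sketch: a job launched as a child process can be terminated,
# unlike a Python thread. The sleep below is a stand-in for the real work.
def start_job():
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])

def cancel_job(proc, timeout=5):
    proc.terminate()  # send SIGTERM; escalate to kill() if the child ignores it
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode

proc = start_job()
rc = cancel_job(proc)
print(f"job cancelled, return code: {rc}")
```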


@TomeHirata left a comment:


LGTM
