Merged
48 commits
c61cc37
improving error logging in optimizer
finitearth Nov 21, 2025
a09de91
fix error messaging inside judge
finitearth Nov 21, 2025
d4124d1
improve readablity
finitearth Nov 21, 2025
9a0ab28
fail save for task cache
finitearth Nov 21, 2025
6e89ff9
extend api llm interface
finitearth Nov 21, 2025
32c93f6
bring
finitearth Nov 22, 2025
cef2370
bring
finitearth Nov 22, 2025
d2d9774
merge
finitearth Nov 22, 2025
c7a202d
restrain dependency restrictions
finitearth Nov 22, 2025
b8134e2
explicitly add scipy
finitearth Nov 22, 2025
03a5ec7
replace only one occurence of 'input' for few shot examples
finitearth Nov 22, 2025
bb146f1
improve task description handling
finitearth Nov 22, 2025
5272703
incoperate extraction description
finitearth Nov 22, 2025
2cbaedb
renaming predictors AND prompt creation
finitearth Nov 25, 2025
350b54e
update prompt creation
finitearth Nov 25, 2025
1d45840
optimize capo survival
finitearth Nov 25, 2025
5ad89dc
improve fileoutput callback handling
finitearth Nov 26, 2025
6b668a3
Change vllm test-dependency version to exact match
finitearth Nov 26, 2025
2b087ca
incoperate comments
finitearth Nov 27, 2025
af53cd3
Merge branch 'Feature/API-Enhancements' of https://github.com/finitea…
finitearth Nov 27, 2025
fa1ddc4
incoperated comments
finitearth Nov 28, 2025
275e5e6
specify ruff arguments
finitearth Nov 28, 2025
a44bc48
make ruff happy
finitearth Nov 28, 2025
3fde9bd
first draft for read me
finitearth Nov 28, 2025
a77ca76
Adjust image heights in README
finitearth Nov 28, 2025
4eba7da
allow for no init prompts in helper functions
finitearth Nov 28, 2025
be16e15
Merge branch 'Feature/API-Enhancements' of https://github.com/finitea…
finitearth Nov 28, 2025
eeea270
fix prompt type in helper
finitearth Nov 28, 2025
e40bf4b
relax none variables
finitearth Nov 28, 2025
352e53d
fix meta prompt
finitearth Nov 28, 2025
31dd222
relax api timemout constraints
finitearth Nov 28, 2025
52fc799
relax token constraints
finitearth Nov 28, 2025
e73e9df
Revise README with updated images and details
mo374z Nov 29, 2025
1ba9a2c
Small changes
mo374z Nov 29, 2025
3ee0ffe
Update documentation page to match modules
mo374z Nov 30, 2025
50c358f
move LMU logo
mo374z Nov 30, 2025
d72b5fb
Update Logos
mo374z Nov 30, 2025
0e53e0d
add release notes
Nov 30, 2025
4f7a6c8
running pre-commit
Nov 30, 2025
50e2899
only import base llm if type checking
finitearth Nov 30, 2025
484844a
fix tests
finitearth Nov 30, 2025
1b6ca5b
Remove comment
finitearth Nov 30, 2025
4eff0db
add create_prompts_from_task_description to imports
finitearth Nov 30, 2025
5666134
Merge branch 'Feature/API-Enhancements' of https://github.com/finitea…
finitearth Nov 30, 2025
7c5f4ff
fix formatting
mo374z Nov 30, 2025
efa27f3
allow capo to overwrite eval strategy
finitearth Nov 30, 2025
7d80705
Merge branch 'Feature/API-Enhancements' of https://github.com/finitea…
finitearth Nov 30, 2025
6dcf75d
clarify prompt format
finitearth Nov 30, 2025
Binary file modified .coverage
Binary file not shown.
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -14,6 +14,11 @@ repos:
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.14.6
hooks:
- id: ruff-check
args: [ --fix ]
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
116 changes: 52 additions & 64 deletions README.md
@@ -1,4 +1,3 @@
![promptolution](https://github.com/user-attachments/assets/84c050bd-61a1-4f2e-bc4e-874d9b4a69af)

![Coverage](https://img.shields.io/badge/Coverage-91%25-brightgreen)
[![CI](https://github.com/finitearth/promptolution/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/finitearth/promptolution/actions/workflows/ci.yml)
@@ -7,104 +6,93 @@
![Python Versions](https://img.shields.io/badge/Python%20Versions-≥3.10-blue)
[![Getting Started](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb)

Promptolution is a library that provides a modular and extensible framework for implementing prompt tuning for single tasks and larger experiments. It offers a user-friendly interface to assemble the core components for various prompt optimization tasks.
![promptolution](https://github.com/user-attachments/assets/84c050bd-61a1-4f2e-bc4e-874d9b4a69af)

<p align="center">
<img height="60" alt="lmu_logo" src="https://github.com/user-attachments/assets/5aecd0d6-fc2d-48b2-b395-d1877578a3c5" />
<img height="60" alt="mcml" src="https://github.com/user-attachments/assets/d9f3b18e-a5ec-4c3f-b449-e57cb977f483" />
<img height="60" alt="ellis_logo" src="https://github.com/user-attachments/assets/60654a27-0f8f-4624-a1d5-5122f2632bec" />
<img height="60" alt="uni_freiburg_color" src="https://github.com/user-attachments/assets/f5eabbd2-ae6a-497b-857b-71958ed77335" />
<img height="60" alt="tum_logo" src="https://github.com/user-attachments/assets/982ec2f0-ec14-4dc2-8d75-bfae09d4fa73" />
</p>

## 🚀 What is Promptolution?

This project was developed by [Timo Heiß](https://www.linkedin.com/in/timo-heiss/), [Moritz Schlager](https://www.linkedin.com/in/moritz-schlager/) and [Tom Zehle](https://www.linkedin.com/in/tom-zehle/) as part of a study program at LMU Munich.
**Promptolution** is a unified, modular framework for prompt optimization built for researchers and advanced practitioners who want full control over their experimental setup. Unlike end-to-end application frameworks with high abstraction, promptolution focuses exclusively on the optimization stage, providing a clean, transparent, and extensible API. It scales from simple prompt optimization on a single task to large-scale, reproducible benchmark experiments.

<img width="808" height="356" alt="promptolution_framework" src="https://github.com/user-attachments/assets/e3d05493-30e3-4464-b0d6-1d3e3085f575" />

### Key Features

## Installation
* Implementations of many current prompt optimizers out of the box.
* Unified LLM backend supporting API-based models, local LLMs, and vLLM clusters.
* Built-in response caching to save costs and parallelized inference for speed.
* Detailed logging and token usage tracking for granular post-hoc analysis.

Use pip to install our library:
Have a look at our [Release Notes](https://finitearth.github.io/promptolution/release-notes/) for the latest updates to promptolution.

## 📦 Installation

```
pip install promptolution[api]
```

If you want to run your prompt optimization locally, either via transformers or vLLM, consider running:
Local inference via vLLM or transformers:

```
pip install promptolution[vllm,transformers]
```

Alternatively, clone the repository, run
From source:

```
git clone https://github.com/finitearth/promptolution.git
cd promptolution
poetry install
```

to install the necessary dependencies. You might need to install [pipx](https://pipx.pypa.io/stable/installation/) and [poetry](https://python-poetry.org/docs/) first.

## Usage

To get started right away, take a look at our [getting started notebook](https://github.com/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb) and our [other demos and tutorials](https://github.com/finitearth/promptolution/blob/main/tutorials).
For more details, comprehensive **documentation** with an API reference is available at https://finitearth.github.io/promptolution/.
## 🔧 Quickstart

### Featured Optimizers
Start with the **Getting Started tutorial**:
[https://github.com/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb](https://github.com/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb)

| **Name** | **Paper** | **init prompts** | **Exploration** | **Costs** | **Parallelizable** | **Utilizes Fewshot Examples** |
| :-----------: | :----------------------------------------------: | :--------------: | :-------------: | :-------: | :-------------------: | :---------------------------: |
| `CAPO` | [Zehle et al.](https://arxiv.org/abs/2504.16005) | _required_ | 👍 | 💲 | ✅ | ✅ |
| `EvoPromptDE` | [Guo et al.](https://arxiv.org/abs/2309.08532) | _required_ | 👍 | 💲💲 | ✅ | ❌ |
| `EvoPromptGA` | [Guo et al.](https://arxiv.org/abs/2309.08532) | _required_ | 👍 | 💲💲 | ✅ | ❌ |
| `OPRO` | [Yang et al.](https://arxiv.org/abs/2309.03409) | _optional_ | 👎 | 💲💲 | ❌ | ❌ |
Full docs:
[https://finitearth.github.io/promptolution/](https://finitearth.github.io/promptolution/)

### Core Components

- `Task`: Encapsulates initial prompts, dataset features, targets, and evaluation methods.
- `Predictor`: Implements the prediction logic, interfacing between the `Task` and `LLM` components.
- `LLM`: Unifies the process of obtaining responses from language models, whether locally hosted or accessed via API.
- `Optimizer`: Implements prompt optimization algorithms, utilizing the other components during the optimization process.

### Key Features

- Modular and object-oriented design
- Extensible architecture
- Easy-to-use interface for assembling experiments
- Parallelized LLM requests for improved efficiency
- Integration with langchain for standardized LLM API calls
- Detailed logging and callback system for optimization analysis
## 🧠 Featured Optimizers

## Changelog
| **Name** | **Paper** | **Init prompts** | **Exploration** | **Costs** | **Parallelizable** | **Few-shot** |
| ---- | ---- | ---- |---- |---- | ----|---- |
| `CAPO` | [Zehle et al., 2025](https://arxiv.org/abs/2504.16005) | required | 👍 | 💲 | ✅ | ✅ |
| `EvoPromptDE` | [Guo et al., 2023](https://arxiv.org/abs/2309.08532) | required | 👍 | 💲💲 | ✅ | ❌ |
| `EvoPromptGA` | [Guo et al., 2023](https://arxiv.org/abs/2309.08532) | required | 👍 | 💲💲 | ✅ | ❌ |
| `OPRO` | [Yang et al., 2023](https://arxiv.org/abs/2309.03409) | optional | 👎 | 💲💲 | ❌ | ❌ |

Release notes for each version of the library can be found [here](https://finitearth.github.io/promptolution/release-notes/)
## 🏗 Components

## Contributing
* **`Task`** – Manages the dataset, evaluation metrics, and subsampling.
* **`Predictor`** – Defines how to extract the answer from the model's response.
* **`LLM`** – A unified interface handling inference, token counting, and concurrency.
* **`Optimizer`** – The core component implementing the optimization algorithms that refine prompts.
* **`ExperimentConfig`** – A configuration abstraction to streamline and parametrize large-scale scientific experiments.
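To make the division of labor between these components concrete, here is an illustrative, self-contained sketch. The toy classes below only mirror the roles described in the list; the actual promptolution classes, method names, and signatures may differ, so treat every name here as an assumption, not the library's API.

```python
# Illustrative sketch only: toy stand-ins that mirror the component roles
# described above. The real promptolution classes and signatures may differ.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LLM:
    """Unified inference interface (here: a deterministic stub)."""
    generate: Callable[[str], str]


@dataclass
class Predictor:
    """Extracts the final answer from a raw model response."""
    llm: LLM

    def predict(self, prompt: str, x: str) -> str:
        response = self.llm.generate(f"{prompt}\n\nInput: {x}")
        return response.strip().split()[-1]  # toy extraction rule


@dataclass
class Task:
    """Holds the dataset and scores a prompt via the predictor."""
    xs: List[str]
    ys: List[str]

    def evaluate(self, prompt: str, predictor: Predictor) -> float:
        preds = [predictor.predict(prompt, x) for x in self.xs]
        return sum(p == y for p, y in zip(preds, self.ys)) / len(self.ys)


@dataclass
class Optimizer:
    """Keeps the best-scoring prompt from a candidate pool."""
    task: Task
    predictor: Predictor

    def optimize(self, prompts: List[str]) -> str:
        return max(prompts, key=lambda p: self.task.evaluate(p, self.predictor))
```

The point of the sketch is the dependency direction: the `Optimizer` only talks to the `Task` and `Predictor`, and only the `Predictor` touches the `LLM`, which is what lets backends and optimizers be swapped independently.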

The first step to contributing is to open an issue describing the bug, feature, or enhancement. Ensure the issue is clearly described, assigned, and properly tagged. All work should be linked to an open issue.
## 🤝 Contributing

### Code Style and Linting
Open an issue → create a branch → PR → CI → review → merge.
Branch naming: `feature/...`, `fix/...`, `chore/...`, `refactor/...`.

We use Black for code formatting, Flake8 for linting, pydocstyle for docstring conventions (Google format), and isort to sort imports. All these checks are enforced via pre-commit hooks, which automatically run on every commit. Install the pre-commit hooks to ensure that all checks run automatically:
Please use pre-commit, which helps keep code quality high:

```
pre-commit install
```

To run all checks manually:

```
pre-commit run --all-files
```

### Branch Protection and Merging Guidelines

- The main branch is protected. No direct commits are allowed for non-administrators.
- Rebase your branch on main before opening a pull request.
- All contributions must be made on dedicated branches linked to specific issues.
- Name the branch according to {prefix}/{description} with one of the prefixes fix, feature, chore, or refactor.
- A pull request must have at least one approval from a code owner before it can be merged into main.
- CI checks must pass before a pull request can be merged.
- New releases will only be created by code owners.

### Testing

We use pytest to run tests, and coverage to track code coverage. Tests automatically run on pull requests and pushes to the main branch, but please ensure they also pass locally before pushing!
To run the tests with coverage locally, use the following commands or your IDE's test runner:
We encourage every contributor to also write tests that automatically check whether the implementation works as expected:

```
poetry run python -m coverage run -m pytest
```

To see the coverage report run:
```
poetry run python -m coverage report
```

Developed by **Timo Heiß**, **Moritz Schlager**, and **Tom Zehle** (LMU Munich, MCML, ELLIS, TUM, Uni Freiburg).
5 changes: 3 additions & 2 deletions docs/index.md
@@ -29,5 +29,6 @@ Or clone our GitHub repository:
- [Optimizers](api/optimizers.md)
- [Predictors](api/predictors.md)
- [Tasks](api/tasks.md)
- [Callbacks](api/callbacks.md)
- [Config](api/config.md)
- [Helpers](api/helpers.md)
- [Utils](api/utils.md)
- [Exemplar Selectors](api/examplar_selectors.md)
13 changes: 13 additions & 0 deletions docs/release-notes/v2.2.0.md
@@ -0,0 +1,13 @@
## Release v2.2.0
### What's changed

#### Added features:
* Extended the `APILLM` interface, allowing kwargs to be passed through to the API
* Improved asynchronous parallelization of LLM calls, shortening inference times
* Introduced a `Prompt` class to encapsulate instructions and few-shot examples

#### Further changes:
* Improved error handling
* Improved task-description infusion mechanism for meta-prompts

**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/2.1.0...v2.2.0)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -47,6 +47,7 @@ nav:
- Home: index.md
- Release Notes:
- Overview: release-notes.md
- v2.2.0: release-notes/v2.2.0.md
- v2.1.0: release-notes/v2.1.0.md
- v2.0.1: release-notes/v2.0.1.md
- v2.0.0: release-notes/v2.0.0.md
5 changes: 5 additions & 0 deletions promptolution/exemplar_selectors/__init__.py
@@ -2,3 +2,8 @@

from promptolution.exemplar_selectors.random_search_selector import RandomSearchSelector
from promptolution.exemplar_selectors.random_selector import RandomSelector

__all__ = [
"RandomSelector",
"RandomSearchSelector",
]
6 changes: 4 additions & 2 deletions promptolution/exemplar_selectors/base_exemplar_selector.py
@@ -5,6 +5,8 @@

from typing import TYPE_CHECKING, Optional

from promptolution.utils.prompt import Prompt

if TYPE_CHECKING: # pragma: no cover
from promptolution.predictors.base_predictor import BasePredictor
from promptolution.tasks.base_task import BaseTask
@@ -33,11 +35,11 @@ def __init__(self, task: "BaseTask", predictor: "BasePredictor", config: Optiona
config.apply_to(self)

@abstractmethod
def select_exemplars(self, prompt: str, n_examples: int = 5) -> str:
def select_exemplars(self, prompt: Prompt, n_examples: int = 5) -> Prompt:
"""Select exemplars based on the given prompt.

Args:
prompt (str): The input prompt to base the exemplar selection on.
prompt (Prompt): The input prompt to base the exemplar selection on.
n_examples (int, optional): The number of exemplars to select. Defaults to 5.

Returns:
7 changes: 4 additions & 3 deletions promptolution/exemplar_selectors/random_search_selector.py
@@ -1,6 +1,7 @@
"""Random search exemplar selector."""

from promptolution.exemplar_selectors.base_exemplar_selector import BaseExemplarSelector
from promptolution.utils.prompt import Prompt


class RandomSearchSelector(BaseExemplarSelector):
@@ -10,7 +11,7 @@ class RandomSearchSelector(BaseExemplarSelector):
evaluates their performance, and selects the best performing set.
"""

def select_exemplars(self, prompt: str, n_trials: int = 5) -> str:
def select_exemplars(self, prompt: Prompt, n_trials: int = 5) -> Prompt:
"""Select exemplars using a random search strategy.

This method generates multiple sets of random examples, evaluates their performance
@@ -21,7 +22,7 @@ def select_exemplars(self, prompt: str, n_trials: int = 5) -> str:
n_trials (int, optional): The number of random trials to perform. Defaults to 5.

Returns:
str: The best performing prompt, which includes the original prompt and the selected exemplars.
Prompt: The best performing prompt, which includes the original prompt and the selected exemplars.
"""
best_score = 0.0
best_prompt = prompt
Expand All @@ -30,7 +31,7 @@ def select_exemplars(self, prompt: str, n_trials: int = 5) -> str:
_, seq = self.task.evaluate(
prompt, self.predictor, eval_strategy="subsample", return_seq=True, return_agg_scores=False
)
prompt_with_examples = "\n\n".join([prompt] + [seq[0][0]]) + "\n\n"
prompt_with_examples = Prompt(prompt.instruction, [seq[0][0]])
# evaluate prompts as few shot prompt
score = self.task.evaluate(prompt_with_examples, self.predictor, eval_strategy="subsample")[0]
if score > best_score:
9 changes: 5 additions & 4 deletions promptolution/exemplar_selectors/random_selector.py
@@ -5,6 +5,7 @@
from typing import TYPE_CHECKING, List, Optional

from promptolution.exemplar_selectors.base_exemplar_selector import BaseExemplarSelector
from promptolution.utils.prompt import Prompt

if TYPE_CHECKING: # pragma: no cover
from promptolution.predictors.base_predictor import BasePredictor
@@ -37,18 +38,18 @@ def __init__(
self.desired_score = desired_score
super().__init__(task, predictor, config)

def select_exemplars(self, prompt: str, n_examples: int = 5) -> str:
def select_exemplars(self, prompt: Prompt, n_examples: int = 5) -> Prompt:
"""Select exemplars using a random selection strategy.

This method generates random examples and selects those that are evaluated as correct
(score == self.desired_score) until the desired number of exemplars is reached.

Args:
prompt (str): The input prompt to base the exemplar selection on.
prompt (Prompt): The input prompt to base the exemplar selection on.
n_examples (int, optional): The number of exemplars to select. Defaults to 5.

Returns:
str: A new prompt that includes the original prompt and the selected exemplars.
Prompt: A new prompt that includes the original prompt and the selected exemplars.
"""
examples: List[str] = []
while len(examples) < n_examples:
@@ -59,4 +60,4 @@ def select_exemplars(self, prompt: str, n_examples: int = 5) -> str:
seq = seqs[0][0]
if score == self.desired_score:
examples.append(seq)
return "\n\n".join([prompt] + examples) + "\n\n"
return Prompt(prompt.instruction, examples)
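The selector diffs above replace string concatenation (`"\n\n".join([prompt] + examples) + "\n\n"`) with a structured `Prompt(instruction, examples)` object. The diff only shows that call site, not the class itself, so the following is a minimal stand-in sketch of what such a class might look like; the real `promptolution.utils.prompt.Prompt` may have a different shape.

```python
# Minimal stand-in for the Prompt class introduced in this PR, for
# illustration only. The real promptolution.utils.prompt.Prompt may differ;
# the diff only shows the Prompt(instruction, examples) constructor call.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prompt:
    instruction: str
    examples: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Reproduces the old '"\n\n".join([prompt] + examples) + "\n\n"'
        # behavior, now derived from structured fields instead of raw strings.
        return "\n\n".join([self.instruction] + self.examples) + "\n\n"


base = Prompt("Classify the sentiment of the input.")
with_shots = Prompt(base.instruction, ["Input: great! Output: positive"])
```

Keeping instructions and few-shot examples in separate fields is what lets the selectors swap example sets without re-parsing concatenated prompt strings, as `RandomSearchSelector` now does per trial.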