A lightweight research toolkit for multimodal classification using GPT (text/image/both) and CLIP (zero-shot or fine-tuned). Built for quick, reproducible experiments and paper appendices.
- Try the Colab Demo — Run the full tutorial in your browser
- Case Studies — Complete replication scripts in the `Case_analysis/` folder
- GitHub Repository
LabelGenius helps you:
- Label with GPT — Classify text, images, or multimodal data using zero-shot or few-shot prompting
- Classify with CLIP — Run zero-shot classification or fine-tune a lightweight model
- Evaluate results — Get accuracy, F1 scores, and confusion matrices automatically
- Estimate costs — Calculate API expenses before running large-scale experiments
From PyPI (recommended):

```bash
pip install labelgenius
```

Pinned version:

```bash
pip install labelgenius==0.1.6
```

From source:

```bash
git clone https://github.com/mediaccs/LabelGenius
cd labelgenius
pip install -e .
```

Requirements: Python 3.9+. Dependencies are installed automatically from pyproject.toml; see that file for the full list if you need to install them manually. For GPT features, get an API key from OpenAI.
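The GPT functions call the OpenAI API. Assuming the standard OpenAI SDK key lookup (an assumption; this README does not specify how the key is read), you can provide your key via the `OPENAI_API_KEY` environment variable before running any GPT example:

```python
import os

# Assumes the OpenAI SDK's default lookup of OPENAI_API_KEY;
# set this before calling any GPT-backed function.
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key
```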
GPT functions:

| Function | Purpose |
|---|---|
| `classification_GPT` | Classify text, images, or multimodal data (zero-shot or few-shot) |
| `generate_GPT_finetune_jsonl` | Prepare training data for fine-tuning |
| `finetune_GPT` | Fine-tune a GPT model |
CLIP functions:

| Function | Purpose |
|---|---|
| `classification_CLIP_0_shot` | Zero-shot classification with CLIP |
| `classification_CLIP_finetuned` | Use a fine-tuned CLIP model |
| `finetune_CLIP` | Fine-tune CLIP on your dataset |
Utility functions:

| Function | Purpose |
|---|---|
| `auto_verification` | Calculate classification metrics |
| `price_estimation` | Estimate OpenAI API costs |
```python
from labelgenius import (
    classification_GPT,
    classification_CLIP_0_shot,
    finetune_CLIP,
    classification_CLIP_finetuned,
    auto_verification,
    price_estimation,
)
```

Supported input formats:

- Text only: CSV/Excel/JSON with your text columns
- Images only: Directory of images (with optional ID mapping table)
- Multimodal: Table with an `image_id` column + images at `image_dir/<image_id>.jpg`
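For instance, a minimal multimodal table might look like this (a sketch; `image_id` is the required column, the other column names are up to you):

```python
import pandas as pd

# One row per item; image_id maps to data/product_images/<image_id>.jpg
df = pd.DataFrame({
    "image_id": ["p001", "p002"],
    "product_name": ["Wireless mouse", "Denim jacket"],
    "description": ["2.4 GHz optical mouse", "Classic blue denim"],
})
df.to_csv("data/products.csv", index=False)
```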
LabelGenius supports both GPT-4.1 and GPT-5 with different control mechanisms:
With GPT-4.1, control randomness via `temperature` (0 = deterministic, 2 = creative):
```python
df = classification_GPT(
    text_path="data/headlines.xlsx",
    category=["sports", "politics", "tech"],
    prompt="Classify this headline into one category.",
    column_4_labeling=["headline"],
    model="gpt-4.1",
    temperature=0.7,
    mode="text",
    output_column_name="predicted_category"
)
```

GPT-5 supports both reasoning and non-reasoning modes, controlled by `reasoning_effort`:
```python
df = classification_GPT(
    text_path="data/headlines.xlsx",
    category=["sports", "politics", "tech"],
    prompt="Classify this headline into one category.",
    column_4_labeling=["headline"],
    model="gpt-5",
    reasoning_effort="medium",  # minimal | low | medium | high
    mode="text",
    output_column_name="predicted_category"
)
```

Reasoning modes:

- `"minimal"` — Non-reasoning mode (fast, standard responses)
- `"low"` / `"medium"` / `"high"` — Reasoning modes (increasing reasoning depth)
`classification_GPT` classifies text, images, or multimodal data using GPT models.
Parameters:
- `text_path` — Path to your data file (CSV/Excel/JSON)
- `category` — List of possible category labels
- `prompt` — Instruction prompt for classification
- `column_4_labeling` — Column(s) to use as input
- `model` — Model name (e.g., "gpt-4.1", "gpt-5")
- `temperature` — Randomness control for GPT-4.1 (0-2)
- `reasoning_effort` — Reasoning depth for GPT-5 ("minimal", "low", "medium", "high")
- `mode` — Input type: "text", "image", or "both"
- `output_column_name` — Name for the prediction column
- `num_themes` — Number of themes for multi-label tasks (default: 1)
- `num_votes` — Number of runs for majority voting (default: 1)
Example (Single-Label Text):
```python
df = classification_GPT(
    text_path="data/articles.csv",
    category=["positive", "negative", "neutral"],
    prompt="Classify the sentiment of this article.",
    column_4_labeling=["title", "content"],
    model="gpt-4.1",
    temperature=0.5,
    mode="text",
    output_column_name="sentiment"
)
```

Example (Multi-Label with Majority Voting):
prompt = """Label these 5 topics with 0 or 1. Return up to two 1s.
Answer format: [0,0,0,0,0]"""
df = classification_GPT(
text_path="data/posts.xlsx",
category=["0", "1"],
prompt=prompt,
column_4_labeling=["post_text"],
model="gpt-5",
reasoning_effort="low",
mode="text",
output_column_name="topic_labels",
num_themes=5,
num_votes=3 # Run 3 times and use majority vote
)Note: You can control the number of labels by adjusting the prompt. For example:
- "Return exactly one 1" — Single label
- "Return up to two 1s" — Up to 2 labels
- "Return up to three 1s" — Up to 3 labels
- "Return any number of 1s" — Unrestricted multi-label
Example (Multimodal):
```python
df = classification_GPT(
    text_path="data/products.csv",
    category=["electronics", "clothing", "food"],
    prompt="Classify this product based on text and image.",
    column_4_labeling=["product_name", "description"],
    model="gpt-4.1",
    temperature=0.3,
    mode="both",  # Uses both text and images
    image_dir="data/product_images",
    output_column_name="category"
)
```

`generate_GPT_finetune_jsonl` prepares training data in JSONL format for GPT fine-tuning.
Parameters:
- `df` — DataFrame containing your training data
- `output_path` — Where to save the JSONL file
- `system_prompt` — Instruction prompt for the model
- `input_col` — Column(s) to use as input
- `label_col` — Column(s) containing true labels
Example:
```python
generate_GPT_finetune_jsonl(
    df=training_data,
    output_path="finetune_data.jsonl",
    system_prompt="Classify the sentiment of this review.",
    input_col=["review_text"],
    label_col=["sentiment"]
)
```

`finetune_GPT` launches a GPT fine-tuning job using OpenAI's API.
Parameters:
- `training_file_path` — Path to your JSONL training file
- `model` — Base model to fine-tune (must be a snapshot model, i.e., a model name ending with a specific date, e.g., "gpt-4o-mini-2024-07-18")
- `hyperparameters` — Dict with batch_size, learning_rate_multiplier, etc.
Returns: Fine-tuned model identifier
Example:
```python
model_id = finetune_GPT(
    training_file_path="finetune_data.jsonl",
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "batch_size": 8,
        "learning_rate_multiplier": 0.01,
        "n_epochs": 3
    }
)
print(f"Fine-tuned model: {model_id}")
```

Note: GPT-5 fine-tuning availability varies; use GPT-4 snapshots if needed.
Using the fine-tuned model:
After fine-tuning completes, you'll receive a model ID via email or in the Python output. Use this ID with `classification_GPT` by passing it as the `model` parameter:
```python
# Use your fine-tuned model for classification
df = classification_GPT(
    text_path="data/test.csv",
    category=["positive", "negative", "neutral"],
    prompt="Classify the sentiment of this review.",
    column_4_labeling=["review_text"],
    model="ft:gpt-4o-mini-2024-07-18:your-org:model-name:abc123",  # Your fine-tuned model ID
    temperature=0.5,
    mode="text",
    output_column_name="sentiment"
)
```

`classification_CLIP_0_shot` performs zero-shot classification using CLIP, without any training.
Parameters:
- `text_path` — Path to your data file
- `img_dir` — Directory containing images (optional)
- `mode` — Input type: "text", "image", or "both"
- `prompt` — List of category names/descriptions for zero-shot matching
- `text_column` — Column(s) to use for text input
- `predict_column` — Name for the prediction column
Example (Text Only):
```python
df = classification_CLIP_0_shot(
    text_path="data/articles.csv",
    mode="text",
    prompt=["sports news", "political news", "technology news"],
    text_column=["headline", "summary"],
    predict_column="clip_category"
)
```

Example (Image Only):
```python
df = classification_CLIP_0_shot(
    text_path="data/image_ids.csv",
    img_dir="data/images",
    mode="image",
    prompt=["a photo of a cat", "a photo of a dog", "a photo of a bird"],
    predict_column="animal_type"
)
```

Example (Multimodal):
```python
df = classification_CLIP_0_shot(
    text_path="data/posts.csv",
    img_dir="data/post_images",
    mode="both",
    prompt=["advertisement", "personal photo", "news image"],
    text_column=["caption"],
    predict_column="image_category"
)
```

`finetune_CLIP` fine-tunes CLIP on your labeled dataset with a small classification head.
Parameters:
- `mode` — Input type: "text", "image", or "both"
- `text_path` — Path to training data
- `text_column` — Column(s) for text input (if using text)
- `img_dir` — Image directory (if using images)
- `true_label` — Column containing true labels
- `model_name` — Filename to save the trained model
- `num_epochs` — Number of training epochs
- `batch_size` — Training batch size
- `learning_rate` — Learning rate for the optimizer
Returns: Best validation accuracy achieved
Example:
```python
best_acc = finetune_CLIP(
    mode="both",
    text_path="data/train.csv",
    text_column=["headline", "description"],
    img_dir="data/train_images",
    true_label="category_id",
    model_name="my_clip_model.pth",
    num_epochs=10,
    batch_size=16,
    learning_rate=1e-5
)
print(f"Best validation accuracy: {best_acc:.2%}")
```

`classification_CLIP_finetuned` uses your fine-tuned CLIP model to classify new data.
Parameters:
- `mode` — Input type: "text", "image", or "both"
- `text_path` — Path to test data
- `img_dir` — Image directory (if using images)
- `model_name` — Path to your trained model file
- `text_column` — Column(s) for text input
- `predict_column` — Name for the prediction column
- `num_classes` — Number of categories in your task
Example:
```python
df = classification_CLIP_finetuned(
    mode="both",
    text_path="data/test.csv",
    img_dir="data/test_images",
    model_name="my_clip_model.pth",
    text_column=["headline", "description"],
    predict_column="predicted_category",
    num_classes=24
)
```

`auto_verification` calculates classification metrics, including accuracy, F1 scores, and a confusion matrix.
Parameters:
- `df` — DataFrame with predictions and true labels
- `predicted_cols` — List of prediction column names
- `true_cols` — List of true label column names
- `category` — List of possible categories
Example:
```python
auto_verification(
    df=results,
    predicted_cols=["gpt_pred", "clip_pred"],
    true_cols=["true_label", "true_label"],
    category=["0", "1"]
)
```

Output: Prints accuracy and F1 scores, and displays a confusion matrix visualization.
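For reference, these are standard metrics; here is a minimal sketch of computing them yourself with scikit-learn (an illustration, not auto_verification's internals; the exact averaging it uses may differ):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Compare one prediction column against the true labels by hand;
# labels are strings here, matching the category list above.
y_true = results["true_label"]
y_pred = results["gpt_pred"]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred, labels=["0", "1"]))
```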
`price_estimation` estimates the cost of OpenAI API calls.
Parameters:
- `response` — OpenAI API response object
- `num_rows` — Number of data rows processed
- `input_cost_per_million` — Input token cost (in USD per 1M tokens)
- `output_cost_per_million` — Output token cost (in USD per 1M tokens)
- `num_votes` — Number of API calls per row (for majority voting)
Returns: Estimated total cost in USD
Example:
```python
cost = price_estimation(
    response=api_response,
    num_rows=1000,
    input_cost_per_million=5.0,
    output_cost_per_million=15.0,
    num_votes=3
)
print(f"Estimated total cost: ${cost:.2f}")
```

Demo Scale:
The Colab demo uses ~20 samples per task for speed. Expect high variance in small-sample results.
Training Data Requirements:
Fine-tuning typically requires hundreds of examples per class. Small datasets (~20 examples) may lead to overfitting or majority-class predictions.
Image Requirements:
For multimodal classification, ensure every value in your `image_id` column matches an image filename in `image_dir` (i.e., `image_dir/<image_id>.jpg`).
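A quick pre-flight check can catch mismatches before you spend API calls (a sketch; assumes the `.jpg` layout described above):

```python
from pathlib import Path
import pandas as pd

# Flag rows whose image file is missing from image_dir
df = pd.read_csv("data/products.csv")
image_dir = Path("data/product_images")
missing = [i for i in df["image_id"] if not (image_dir / f"{i}.jpg").exists()]
print(f"{len(missing)} rows have no matching image:", missing[:10])
```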
Model Availability:
GPT-5 fine-tuning may not be available for all accounts. Use GPT-4 snapshots as alternatives.
We welcome contributions! Feel free to open an issue for bugs or feature requests, or submit a pull request to improve the codebase and documentation.
LabelGenius is released under the MIT License.