Releases: starfishdata/starfish

v0.1.3

28 May 06:59
aeed17c

Release v0.1.3

🚀 Highlights

Data Generation Template

Full release of the Data Generation Template, with a core registry and two prebuilt data generation templates.

  • Generate By Topic: The generate_by_topic template creates diverse synthetic data across multiple topics based on user instructions. It can automatically generate relevant topics when none are provided, and it deduplicates content across everything it generates.
  • Function Calling: This template replicates the methodology from the APIGen paper to generate high-quality synthetic datasets for training function-calling AI models.
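These notes do not say how generate_by_topic performs deduplication. Purely as an illustration, here is a minimal sketch of cross-topic deduplication, assuming exact matching after normalizing case and whitespace. The helper names and the record shape (dicts with "topic" and "text" keys) are hypothetical, not the template's actual implementation.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial variants compare equal
    return re.sub(r"\s+", " ", text.strip().lower())


def dedupe_across_topics(records: list[dict]) -> list[dict]:
    """Hypothetical helper: keep the first record for each normalized 'text', whatever its topic."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = hashlib.sha256(normalize(record["text"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"topic": "cooking", "text": "How do I boil an egg?"},
    {"topic": "nutrition", "text": "how do I  boil an egg?"},
    {"topic": "cooking", "text": "What is a roux?"},
]
print(len(dedupe_across_topics(records)))  # 2
```

Exact-match hashing only catches near-verbatim repeats; catching paraphrases would need a fuzzier measure such as embedding similarity.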

Data Ingest (Preview)

Introducing the first preview release of Data Ingest, which converts raw content into structured text. Current support includes:

  • PDF
  • HTML
  • YouTube
  • And more formats coming soon
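To make "raw content into structured text" concrete for the HTML case, here is a minimal standard-library sketch that pulls the visible text out of an HTML page and skips script and style content. It is an illustration only; the class and function names are hypothetical and do not reflect Starfish's Data Ingest API.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical extractor: collects text that is not inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # One line per text node keeps the output simple to chunk downstream
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x=1;</script></body></html>"
)
print(html_to_text(sample))
```

A production extractor would also need to handle layout (paragraph boundaries, tables, boilerplate navigation), which this sketch ignores.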

📌 Note: Data Ingest is released as a preview feature. It will continue to evolve, with broader format support and Data Generation Templates that include data ingestion components.

Other Improvements

  • Shared State Stability: Made shared state thread-safe
  • Bug fixes
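The notes do not describe how shared state was made thread-safe. A common approach, shown here as a minimal sketch, is to guard every access (and especially every read-modify-write) with a lock so concurrent workers cannot lose updates. The SharedState class and its methods are hypothetical, not Starfish's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any


class SharedState:
    """Hypothetical thread-safe key-value store shared by pipeline workers."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value

    def increment(self, key: str, amount: int = 1) -> int:
        # Read and write under a single lock acquisition so no update is lost
        with self._lock:
            new_value = self._data.get(key, 0) + amount
            self._data[key] = new_value
            return new_value


state = SharedState()
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(1000):
        pool.submit(state.increment, "processed")
print(state.get("processed"))  # 1000
```

Separate get and set calls are each atomic, but a get followed by a set is not, which is why the sketch exposes increment as one locked operation.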

Full Changelog: v0.1.2...v0.1.3

v0.1.2

08 May 17:58

Release v0.1.2

🚀 Highlights

Data Factory Enhancements

  • Improved Resume Functionality: Refactored and separated resume logic from resume_from_checkpoint for greater clarity and resilience
  • Index Handling: Fixed indexing issues and introduced get_index to improve usability
  • DLQ Support: Integrated Dead Letter Queue (DLQ) handling for better error tracking and recovery
  • Stability & Lock Fixes: Resolved multiple database locking issues to improve concurrent processing and reliability
  • Internal Refactoring: Reorganized factory logic for better maintainability and system performance
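A dead letter queue (DLQ) sets aside items that keep failing, so the rest of a batch can finish and the failures can be inspected or replayed later. A minimal sketch of the idea, assuming a fixed retry budget per item; process_batch and max_retries are hypothetical names, not Starfish's API.

```python
from typing import Callable, Iterable


def process_batch(
    items: Iterable[str],
    worker: Callable[[str], str],
    max_retries: int = 2,
) -> tuple[list[str], list[dict]]:
    """Hypothetical runner: items that exhaust their retries are moved to the DLQ."""
    results: list[str] = []
    dead_letters: list[dict] = []
    for item in items:
        last_error = None
        for _ in range(max_retries + 1):  # first attempt plus max_retries retries
            try:
                results.append(worker(item))
                break
            except Exception as exc:
                last_error = exc
        else:
            # The loop ran out without a break: every attempt failed
            dead_letters.append(
                {"item": item, "error": repr(last_error), "attempts": max_retries + 1}
            )
    return results, dead_letters


def flaky_upper(text: str) -> str:
    # Stand-in worker that always fails on one particular input
    if text == "bad":
        raise ValueError("cannot process this item")
    return text.upper()


ok, dlq = process_batch(["a", "bad", "c"], flaky_upper)
print(ok)                        # ['A', 'C']
print([d["item"] for d in dlq])  # ['bad']
```

Recording the error and attempt count alongside each dead-lettered item is what makes later diagnosis and replay possible.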

Data Generation Template (Preview)

  • Initial release of the Data Generation Template Core
  • This version lays the groundwork for data simulation features, with more test coverage and enhancements planned in future releases

Other Improvements

  • Dependency Cleanup: Trimmed unnecessary package dependencies to streamline the environment and reduce bloat
  • Jupyter/Colab Support: Added a Jupyter Notebook test case to improve compatibility with Google Colab and similar environments

📌 Note: The Data Generation Template is an early-stage feature and will evolve significantly in upcoming versions.

Full Changelog: v0.1.1...v0.1.2

v0.1.1

25 Apr 17:39

Release v0.1.1

🚀 Highlights

  • Enhanced Stability: Significantly improved Data Factory stability for more reliable operations
  • Persistence Improvements: Enhanced resume capability with robust same-session and cross-session persistence for seamless workflow continuity
  • Better Diagnostics: Added telemetry system for improved issue tracking and faster resolution
  • Parser Enhancements: Upgraded Python parser with LaTeX support for more robust document processing
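Cross-session resumption generally means recording finished tasks in durable storage and skipping them on the next run. A minimal sketch using the standard-library sqlite3 module; the run_with_resume function, table name, and schema are hypothetical and do not describe Starfish's actual storage layout.

```python
import os
import sqlite3
import tempfile
from typing import Callable, Iterable


def run_with_resume(db_path: str, tasks: Iterable[str], worker: Callable[[str], str]) -> int:
    """Hypothetical runner: persists each result and skips tasks already stored.

    Returns the number of tasks actually executed in this session.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS completed (task_id TEXT PRIMARY KEY, output TEXT)"
        )
        done = {row[0] for row in conn.execute("SELECT task_id FROM completed")}
        executed = 0
        for task in tasks:
            if task in done:
                continue  # finished in an earlier session (or earlier in this one)
            output = worker(task)
            with conn:  # commit per task so a crash only loses the task in flight
                conn.execute("INSERT INTO completed VALUES (?, ?)", (task, output))
            done.add(task)
            executed += 1
        return executed
    finally:
        conn.close()


# Two separate "sessions" sharing one database file
db_path = os.path.join(tempfile.mkdtemp(), "progress.db")
first_run = run_with_resume(db_path, ["t1", "t2"], str.upper)
second_run = run_with_resume(db_path, ["t1", "t2", "t3"], str.upper)
print(first_run, second_run)  # 2 1
```

Committing after every task trades some write throughput for the guarantee that completed work survives an interrupted session.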

This release focuses on stability and reliability improvements.

v0.1.0

25 Apr 08:17

Starfish v0.1.0 - Initial Release

We're excited to announce the first release of Starfish, a Python library that makes synthetic data generation easy.

Overview

Starfish helps you build synthetic data your way by combining structured LLM outputs with efficient parallel processing. The library adapts to your workflow, not the other way around.

Key Features

  • Structured LLM: Type-safe outputs from any model using JSON schemas or Pydantic models
  • Model Flexibility: Compatible with any LLM provider via LiteLLM (OpenAI, Anthropic, local models, etc.)
  • Dynamic Prompts: Built-in support for Jinja2 templates
  • Data Factory: Scale any workflow with a single decorator for parallel processing
  • Resilient Pipeline: Automatic retries, error handling, and job resumption capabilities
  • Complete Control: Share state across your pipeline and extend functionality with custom hooks
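The Data Factory is described as a single decorator that adds parallel processing, retries, and error handling to a workflow. The sketch below shows that general pattern with asyncio: a semaphore bounds concurrency and failed calls are retried with a short backoff. The parallel decorator and its max_concurrency and max_retries parameters are hypothetical and do not reproduce Starfish's actual decorator or its signature.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def parallel(max_concurrency: int = 4, max_retries: int = 2):
    """Hypothetical decorator: turns an async per-item function into a batch runner."""

    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def run(inputs: list[dict]) -> list[Any]:
            semaphore = asyncio.Semaphore(max_concurrency)

            async def call_one(kwargs: dict) -> Any:
                async with semaphore:  # at most max_concurrency calls in flight
                    for attempt in range(max_retries + 1):
                        try:
                            return await fn(**kwargs)
                        except Exception:
                            if attempt == max_retries:
                                raise
                            await asyncio.sleep(0.01 * 2**attempt)  # simple exponential backoff

            # return_exceptions=True keeps one failed item from aborting the whole batch
            return await asyncio.gather(
                *(call_one(kw) for kw in inputs), return_exceptions=True
            )

        return run

    return decorator


@parallel(max_concurrency=2)
async def describe_city(city: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for an LLM call
    return {"city": city, "length": len(city)}


results = asyncio.run(describe_city([{"city": "Paris"}, {"city": "Tokyo"}, {"city": "Lima"}]))
print(results)
```

asyncio.gather returns results in input order, so each output lines up with the input that produced it even though the calls run concurrently.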

Installation

pip install starfish-core

Documentation

Visit our website for more information, or check the example notebooks in the repository.

Feedback and Support

If you encounter any issues or have questions, please open an issue on our repository.

We appreciate your interest in Starfish and look forward to seeing what you build with it!