Releases: starfishdata/starfish
v0.1.3
Release v0.1.3
🚀 Highlights
Data Generation Template
Full release of the Data Generation Template, with a core registry and two prebuilt data generation templates:
- Generate By Topic: The generate_by_topic template is designed to create diverse synthetic data across multiple topics based on user instructions. It can automatically generate relevant topics if not provided and handles deduplication across generated content.
- Function Calling: This template replicates the methodology from the APIGen paper to generate high-quality synthetic datasets for training function-calling AI models.
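To illustrate what deduplication across generated content can look like, here is a minimal sketch that drops records whose text matches after normalization. The function names, the `question` field, and the exact-match strategy are assumptions for illustration only; the release notes do not describe how the generate_by_topic template deduplicates internally.

```python
import re


def normalize(text):
    # Lowercase, strip punctuation, and collapse whitespace so that
    # trivially different phrasings map to the same fingerprint.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def deduplicate(records, key="question"):
    """Keep the first record for each normalized value of `key` (hypothetical helper)."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = normalize(record[key])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


records = [
    {"topic": "geography", "question": "What is the capital of France?"},
    {"topic": "europe", "question": "what is the capital of  france"},
    {"topic": "geography", "question": "What is the capital of Japan?"},
]
unique_records = deduplicate(records)
```

Exact matching after normalization only catches near-identical text; semantic duplicates would need a similarity measure, which this sketch does not attempt.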
Data Ingest (Preview)
Introducing the first preview release of Data Ingest, which enables ingestion of raw content into a structured text format. Current support includes:
- HTML
- YouTube
- And more formats coming soon
📌 Note: Data Ingest is released as a preview feature and will continue to evolve, with broader format support and Data Generation Templates that use data ingestion components.
Other Improvements
- Shared State Stability: Made shared state thread-safe
- Bug fixes
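For readers unfamiliar with what thread-safe shared state involves, the sketch below guards a dictionary with a lock so that concurrent read-modify-write updates are not lost. The `SharedState` class and its methods are hypothetical and written only for illustration; they are not Starfish's actual implementation or API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedState:
    """Hypothetical thread-safe key/value store (illustration only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def update(self, key, fn, default=None):
        # Read, modify, and write under one lock acquisition so that
        # concurrent updates cannot interleave and overwrite each other.
        with self._lock:
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value


def count_in_parallel(workers=8, increments=1000):
    """Increment one shared counter from several threads and return the total."""
    state = SharedState()

    def work():
        for _ in range(increments):
            state.update("count", lambda value: value + 1, default=0)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return state.get("count")
```

Calling `get` followed by `set` from several threads could still lose updates, which is why the sketch exposes a single atomic `update` method.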
Full Changelog: v0.1.2...v0.1.3
v0.1.2
Release v0.1.2
🚀 Highlights
Data Factory Enhancements
- Improved Resume Functionality: Refactored and separated resume logic from resume_from_checkpoint for greater clarity and resilience
- Index Handling: Fixed indexing issues and introduced get_index to improve usability
- DLQ Support: Integrated Dead Letter Queue (DLQ) handling for better error tracking and recovery
- Stability & Lock Fixes: Resolved multiple database locking issues to improve concurrent processing and reliability
- Internal Refactoring: Reorganized factory logic for better maintainability and system performance
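As a rough picture of the Dead Letter Queue pattern mentioned above, the sketch below retries failing items a limited number of times and then parks them, with their error, in a separate list for later inspection or replay. The function name, parameters, and record layout are assumptions made for this example, not the Data Factory's actual interface.

```python
from collections import deque


def process_with_dlq(items, handler, max_attempts=3):
    """Run handler on each item; move items that keep failing to a DLQ.

    Returns (results, dead_letters). Hypothetical helper for illustration.
    """
    queue = deque((index, item, 1) for index, item in enumerate(items))
    results = {}
    dead_letters = []

    while queue:
        index, item, attempt = queue.popleft()
        try:
            results[index] = handler(item)
        except Exception as exc:
            if attempt < max_attempts:
                # Requeue at the back so other items are not blocked.
                queue.append((index, item, attempt + 1))
            else:
                # Keep enough context to inspect or replay the failure later.
                dead_letters.append(
                    {"index": index, "item": item, "attempts": attempt, "error": repr(exc)}
                )
    return results, dead_letters


# "oops" cannot be parsed as an integer, so it ends up in the DLQ.
results, dead_letters = process_with_dlq(["1", "oops", "3"], int, max_attempts=2)
```

Keeping the original index with each dead letter is one way to let a later run retry only the failed inputs.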
Data Generation Template (Preview)
- Initial release of the Data Generation Template Core
- This version lays the groundwork for data simulation features, with more test coverage and enhancements planned in future releases
Other Improvements
- Dependency Cleanup: Trimmed unnecessary package dependencies to streamline the environment and reduce bloat
- Jupyter/Colab Support: Added test case for Jupyter Notebook to improve compatibility with Google Colab and similar environments
📌 Note: The Data Generation Template is an early-stage feature and will evolve significantly in upcoming versions.
Full Changelog: v0.1.1...v0.1.2
v0.1.1
Release v0.1.1
🚀 Highlights
- Enhanced Stability: Significantly improved Data Factory stability for more reliable operations
- Persistence Improvements: Enhanced resume capability with robust same-session and cross-session persistence for seamless workflow continuity
- Better Diagnostics: Added telemetry system for improved issue tracking and faster resolution
- Parser Enhancements: Upgraded Python parser with LaTeX support for more robust document processing
This release focuses on stability and reliability improvements.
v0.1.0
Starfish v0.1.0 - Initial Release
We're excited to announce the first release of Starfish, a Python library for synthetic data generation made easy.
Overview
Starfish helps you build synthetic data your way by combining structured LLM outputs with efficient parallel processing. The library adapts to your workflow—not the other way around.
Key Features
- Structured LLM: Type-safe outputs from any model using JSON schemas or Pydantic models
- Model Flexibility: Compatible with any LLM provider via LiteLLM (OpenAI, Anthropic, local models, etc.)
- Dynamic Prompts: Built-in Jinja2 template support
- Data Factory: Scale any workflow with a single decorator for parallel processing
- Resilient Pipeline: Automatic retries, error handling, and job resumption capabilities
- Complete Control: Share state across your pipeline and extend functionality with custom hooks
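The idea of scaling a workflow with a single decorator, combined with automatic retries, can be sketched with the standard library alone. The `parallel` decorator, its parameters, and the `.run` method below are hypothetical stand-ins written for this example; they do not reproduce Starfish's Data Factory API, and the decorated function is a placeholder for an LLM call.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def parallel(max_workers=4, max_retries=2):
    """Hypothetical decorator that adds a .run(inputs) method to a function.

    .run maps the function over a list of keyword-argument dicts in a
    thread pool, retrying each failing call up to max_retries times.
    """

    def decorator(fn):
        def call_with_retries(kwargs):
            last_error = None
            for _ in range(max_retries + 1):
                try:
                    return fn(**kwargs)
                except Exception as exc:
                    last_error = exc
            raise last_error

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Direct calls behave exactly like the undecorated function.
            return fn(*args, **kwargs)

        def run(inputs):
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                # pool.map returns results in the same order as the inputs.
                return list(pool.map(call_with_retries, inputs))

        wrapper.run = run
        return wrapper

    return decorator


@parallel(max_workers=3)
def describe_city(city, country):
    # Placeholder for a structured LLM call.
    return f"{city} is in {country}"


facts = describe_city.run(
    [
        {"city": "Paris", "country": "France"},
        {"city": "Tokyo", "country": "Japan"},
    ]
)
```

Threads suit this sketch because LLM calls are I/O-bound; an async implementation would be an equally reasonable design.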
Installation
pip install starfish-core
Documentation
Visit our website for more information, or check the example notebooks in the repository.
Feedback and Support
If you encounter any issues or have questions, please open an issue on our repository.
We appreciate your interest in Starfish and look forward to seeing what you build with it!