Skip to content

Releases: dais-polymtl/flock

v0.2.1 Enhancement Release

15 Dec 22:29
ad676de

Choose a tag to compare

This is a simple enhancement release based on feedback after we released v0.2.0. MODEL(s) and PROMPT(s) can be defined with CREATE commands as GLOBAL (persisting across multiple databases) or LOCAL to the database. By default and if not specified, the model or prompt defined is LOCAL.

What's Changed

  • Openai-Compatible Providers by @dorbanianas in #97
  • Add the global and local prompts/models support by @dorbanianas in #99
  • Added a Global and Local example to the product review notebook by @dorbanianas in #104
  • Upgrade to duckdb 1.1.3 by @dorbanianas in #106

New Contributors

Full Changelog: v0.2.0...v0.2.1

FlockMTL v0.2.0 "St-Viateur Bagel"

11 Dec 02:53
b92ae14

Choose a tag to compare

What's Changed

  • remove redundant code by @dorbanianas in #41
  • Add llm_max & llm_min Aggregate Functions for Selecting Most/Least Relevant Row by group by @dorbanianas in #43
  • create llm_rerank aggregate function by @dorbanianas in #44
  • change prompt_name for aggregate functions into search_query by @dorbanianas in #47
  • Refactor/change max min to first last by @dorbanianas in #48
  • Fix Combine for multithreading & windows build issues by @dorbanianas in #49
  • Feature/model provider setup by @SunnyYasser in #45
  • fix windows and linux builds by @dorbanianas in #50
  • drop model_name new lines removal from modal manager by @dorbanianas in #51
  • fix llm_rerank index accessing and gpt-4o-mini ranked list issue by @dorbanianas in #52
  • Added Jupyter Notebook for FlockMTL setup by @SunnyYasser in #57
  • Feature/getting started with flock mtl example by @SunnyYasser in #58
  • Added python script for FlockMTL examples by @SunnyYasser in #59
  • Add llm_reduce Aggregate Function by @dorbanianas in #53
  • Add the batching to the llm_embedding by @dorbanianas in #54
  • Ollama provider onboarding by @SunnyYasser in #55
  • Feat/batching on output max size by @dorbanianas in #56
  • Refactor llm_* Function Call Structure for Enhanced Flexibility in Prompt Specification by @dorbanianas in #61
  • Add Support for Prompt Versioning by @dorbanianas in #63
  • Add Support for Configuring Secrets for Multiple Model Providers by @dorbanianas in #64
  • Add fusion_relative Function for Score Normalization and Max Aggregation by @dorbanianas in #65
  • Add Support for Configuring Secrets for Multiple Model Providers by @dorbanianas in #66
  • Fix Windows Build by Enabling C++17 by @dorbanianas in #67
  • Prompt Manager Refactoring by @dorbanianas in #68
  • Integrate New Prompt Manager Implementation within Scalar and Aggregate Functions by @dorbanianas in #69
  • Refactor and Improve Model Manager Implementation by @dorbanianas in #70
  • move the custom parser from flockmtl/core/parser into flockmtl/custom… by @dorbanianas in #73
  • Enhanced String Manipulation with duckdb_fmt::format by @dorbanianas in #75
  • Fix the missing double quotation mark by @dorbanianas in #76
  • Refactor/functions registry by @dorbanianas in #77
  • Refactor the scalar and aggregate functions by @dorbanianas in #78
  • Add the full website with the GH action by @dorbanianas in #81
  • Fix deploy gh action by @dorbanianas in #82
  • Enable workflow dispatch by @dorbanianas in #83
  • Remove Env Variable Requirement and Add Support for New Secrets in Secret Manager by @dorbanianas in #80
  • Fix the redifinition of supported providers by @dorbanianas in #84
  • fix typo and add paths ignore to main distribution pipeline by @dorbanianas in #86
  • Fix secret queries by @dorbanianas in #87
  • Udpate the readme file by @dorbanianas in #85
  • udpate the azure type into azure_llm in secret manager by @dorbanianas in #88
  • Add Product Review Notebook Script & Edit the GH Action Paths by @dorbanianas in #89
  • Update README.md by @dorbanianas in #90

Full Changelog: v0.1.0...v0.2.0

FlockMTL v0.1.0 "Schwartz's Deli"

21 Oct 23:58
1bd8ac0

Choose a tag to compare

FlockMTL v0.1.0 is a DuckDB extension that integrates language model (LLM) capabilities directly into your queries and workflows. This experimental extension enables DuckDB users to add semantic analysis (classification, filtering, completion, all w/ structured output) and embeddings using GPT models—all from within SQL commands. Following the tradition of declarativity, we introduce an administrative view of MODEL(s) and PROMPT(s) akin to TABLE(s).

Key Features:

  • LLM Integration: Seamlessly call OpenAI models, such as GPT-4, directly from SQL with different function signatures to obtain structured output, VARCHAR completions, boolean filters.
  • Prompt Definition: Define and store custom prompts for repeated use across queries. Similarly, fix the query once with a prompt name and iterate on the prompt separately or even change it later.
  • Model Definition: Define and switch between different models by giving them names and updating the models name without a change to the query.

The full documentation can be found in our repo's README file.


Next, we demonstrate how to use the DuckDB LLM Extension and using product reviews analysis as an examples using a product_reviews table with attributes review_text, review_id, and customer_name. You can also manage LLM prompts and models but we do not show those examples here.

1. Text Generation with llm_complete

llm_complete generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.

  • Basic Text Generation – Product Description:
    Imagine you have a list of product names, and you want to automatically generate product descriptions based on the product name.
    SELECT product_name, llm_complete('generate_product_description', 'default', {'product_name': product_name}, {'max_tokens': 150}) AS product_description 
    FROM products;

2. JSON Output with llm_complete_json

llm_complete_json is useful when you want structured output, such as detailed classification or multi-part answers.

  • Example: Sentiment and Detailed Analysis:
    Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.
    SELECT review_id, llm_complete_json('detailed_sentiment_analysis', 'default', {'text': review_text}, {'max_tokens': 200}) AS analysis 
    FROM product_reviews;
    Result:
    review_id analysis
    112233 {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]}

3. Filtering with llm_filter

Use llm_filter to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.

  • Example: Filter Reviews Mentioning Specific Features:
    Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."
    SELECT review_id, customer_name, review_text 
    FROM product_reviews 
    WHERE llm_filter('mentions_feature', 'default', {'text': review_text, 'feature': 'battery life'});
    Result:
    review_id customer_name review_text
    56789 John Doe "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge."

4. Text Embedding with llm_embedding

llm_embedding generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.

  • Example: Find Reviews Similar to a Target Review:
    Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.
    WITH input_embedding AS (
        SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'},
                 'text-embedding-3-small') AS embedding
    ),
    review_embeddings AS (
        SELECT 
            review_id, 
            array_distance(input_embedding.embedding::DOUBLE[1536], 
                           llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS similarity
        FROM product_reviews, input_embedding
    )
    SELECT review_id AS similar_review_id
    FROM review_embeddings
    ORDER BY similarity
    LIMIT 1;
    Result:
    similar_review_id
    4