Legend Lite

A clean-room reimplementation of the FINOS Legend Engine in ~25K lines of modern Java 21. Legend Lite compiles Pure models and queries to SQL and executes them entirely inside the database — zero rows are ever fetched into the JVM.

What is Legend?

Legend is an open-source data management platform created by Goldman Sachs and donated to FINOS. Its core idea: define your data model once in a formal language (Pure), then map it to physical databases and query it with type-safe semantics.

Legend Lite reimplements this in a fraction of the code, with 100% SQL push-down execution.

The Pure Language

Pure is Legend's modeling and query language. You define what your data looks like (classes), where it lives (databases + mappings), and how to access it (runtimes + connections). Then you query it with a type-safe expression language that compiles to SQL.

1. Classes — Define Your Data Model

Classes describe the shape of your domain objects:

Class model::Person
{
    firstName: String[1];
    lastName:  String[1];
    age:       Integer[1];
}

Class model::Address
{
    street: String[1];
    city:   String[1];
}

String[1] — required, exactly one value
String[0..1] — optional
String[*] — zero or more (collection)
Types: String, Integer, Float, Decimal, Boolean, Date, DateTime, StrictDate

2. Associations — Link Classes Together

Associations define relationships between classes:

Association model::Person_Address
{
    person:    Person[1];
    addresses: Address[*];
}

This says: a Person has many Addresses, and each Address belongs to one Person. Both sides are navigable — you can traverse $person.addresses or $address.person in queries.

3. Inheritance — Extend Classes

Class model::GeographicEntity
{
}

Class model::Address extends model::GeographicEntity
{
    name: String[1];
}

Class model::Location extends model::GeographicEntity
{
    place: String[1];
}

4. Databases — Describe Physical Storage

Database definitions describe tables, columns, and joins:

Database store::PersonDatabase
(
    Table T_PERSON
    (
        ID         INTEGER PRIMARY KEY,
        FIRST_NAME VARCHAR(100) NOT NULL,
        LAST_NAME  VARCHAR(100) NOT NULL,
        AGE_VAL    INTEGER NOT NULL
    )
    Table T_ADDRESS
    (
        ID        INTEGER PRIMARY KEY,
        PERSON_ID INTEGER NOT NULL,
        STREET    VARCHAR(200) NOT NULL,
        CITY      VARCHAR(100) NOT NULL
    )
    Join Person_Address(T_PERSON.ID = T_ADDRESS.PERSON_ID)
)

5. Mappings — Connect Models to Storage

Mappings tell the engine how class properties map to table columns:

Mapping model::PersonMapping
(
    Person: Relational
    {
        ~mainTable [PersonDatabase] T_PERSON
        firstName: [PersonDatabase] T_PERSON.FIRST_NAME,
        lastName:  [PersonDatabase] T_PERSON.LAST_NAME,
        age:       [PersonDatabase] T_PERSON.AGE_VAL
    }

    Address: Relational
    {
        ~mainTable [PersonDatabase] T_ADDRESS
        street: [PersonDatabase] T_ADDRESS.STREET,
        city:   [PersonDatabase] T_ADDRESS.CITY
    }
)

A single class can be mapped to different databases — swap the mapping at runtime to query Postgres, DuckDB, or SQLite with the same model.

6. Model-to-Model Mappings — Transform Between Classes

M2M mappings transform one class into another using Pure expressions. The engine compiles all transforms to SQL — no in-memory processing:

// Source class (raw staging data)
Class model::RawPerson
{
    firstName: String[1];
    lastName:  String[1];
    age:       Integer[1];
    salary:    Float[1];
    isActive:  Boolean[1];
}

// Target class (clean business model)
Class model::Person
{
    fullName:      String[1];
    upperLastName: String[1];
}

// M2M mapping: RawPerson → Person
Mapping model::PersonM2MMapping
(
    Person: Pure
    {
        ~src RawPerson
        fullName:      $src.firstName + ' ' + $src.lastName,
        upperLastName: $src.lastName->toUpper()
    }
)

~src RawPerson — declares the source class
$src.property — references a source property
String concat (+), function chaining (->toUpper()), and conditionals all compile to SQL

Conditionals, filters, and nested if/else:

// Conditional: age grouping with nested if/else → CASE WHEN
Mapping model::ConditionalMapping
(
    PersonView: Pure
    {
        ~src RawPerson
        firstName: $src.firstName,
        ageGroup:  if($src.age < 18, |'Minor', |if($src.age < 65, |'Adult', |'Senior'))
    }
)

// Filter: only include active people → WHERE clause
Mapping model::FilteredMapping
(
    ActivePerson: Pure
    {
        ~src RawPerson
        ~filter $src.isActive == true
        firstName: $src.firstName,
        lastName:  $src.lastName
    }
)

Chained M2M — compose transforms through multiple layers:

// PersonSummary → Person → RawPerson (resolves through the full chain to SQL)
Mapping model::ChainedM2MMapping
(
    PersonSummary: Pure
    {
        ~src Person
        name:      $src.fullName,
        nameUpper: $src.upperLastName
    }
)

Nested associations in M2M — @JoinName for deep-fetch:

Mapping model::DeepFetchMapping
(
    PersonWithAddress: Pure
    {
        ~src RawPerson
        fullName: $src.firstName + ' ' + $src.lastName,
        address:  @PersonAddress
    }
    Address: Pure
    {
        ~src RawAddress
        city:   $src.city,
        street: $src.street
    }
)

7. Graph Fetch — Nested / Structured Results

Instead of flat tabular output, graphFetch returns nested JSON objects matching the class graph shape:

// Flat properties
Person.all()
  ->graphFetch(#{ Person { fullName, upperLastName } }#)
  ->serialize(#{ Person { fullName, upperLastName } }#)

[{"fullName": "John Smith", "upperLastName": "SMITH"},
 {"fullName": "Jane Doe", "upperLastName": "DOE"}]

Nested associations — objects within objects:

PersonWithAddress.all()
  ->graphFetch(#{ PersonWithAddress { fullName, address { city, street } } }#)
  ->serialize(#{ PersonWithAddress { fullName, address { city, street } } }#)

[{"fullName": "John Smith", "address": [{"city": "New York", "street": "123 Main St"}]},
 {"fullName": "Bob Jones",  "address": [{"city": "Chicago", "street": "789 Pine Rd"},
                                         {"city": "Seattle", "street": "321 Elm Blvd"}]}]

The entire object graph — including nested one-to-many associations — is assembled in a single SQL statement using DuckDB's ROW() and LIST() constructs with correlated subqueries.

8. Connections & Runtimes — Bind to a Live Database

RelationalDatabaseConnection store::TestConnection
{
    type: DuckDB;
    specification: InMemory { };
    auth: NoAuth { };
}

Runtime test::TestRuntime
{
    mappings:
    [
        model::PersonMapping
    ];
    connections:
    [
        store::PersonDatabase:
        [
            environment: store::TestConnection
        ]
    ];
}

The Runtime binds everything together: which mapping to use, and which database connection to target. Change the runtime to switch databases without touching your model or queries.

9. Queries — Type-Safe Expressions

Query the model with Pure expressions that compile to SQL:

// Filter + project (tabular)
Person.all()
  ->filter({p | $p.age > 30})
  ->project([p|$p.firstName, p|$p.lastName], ['first', 'last'])

// Navigate through associations
Person.all()
  ->filter({p | $p.addresses.city == 'New York'})
  ->project({p | $p.firstName}, {p | $p.addresses.street})

// GroupBy with aggregation
Person.all()
  ->project([p|$p.lastName, p|$p.age], ['name', 'age'])
  ->groupBy([{r|$r.name}], [{r|$r.age->average()}], ['name', 'avgAge'])

// Sort + limit
Person.all()
  ->project([p|$p.firstName, p|$p.age], ['name', 'age'])
  ->sort(descending('age'))
  ->limit(5)

// Graph fetch (nested/structured JSON)
Person.all()
  ->graphFetch(#{ Person { fullName, upperLastName } }#)
  ->serialize(#{ Person { fullName, upperLastName } }#)

// Conditional logic
Person.all()->project({p | if($p.age < 18, |'Minor', |'Adult')}, 'category')

Every one of these compiles to a single SQL statement. Associations become JOINs, to-many filters become EXISTS, graph fetches become nested STRUCTs, M2M transforms become inline SQL expressions — all pushed down to the database.

Why Legend Lite?

Dimension	Legend Engine (FINOS)	Legend Lite
Codebase	~2M LOC across 400+ Maven modules	~25K LOC in 3 modules
Execution	Mixed: some SQL, some in-memory Java	100% SQL push-down — always
Build time	15–30 minutes	19 seconds (clean build + 955 tests)
Dependencies	Hundreds of JARs	DuckDB, ANTLR, JUnit — that's it
Java version	Java 11	Java 21 (records, sealed interfaces, pattern matching)
Databases	Postgres, Databricks, Snowflake, etc.	DuckDB (primary), SQLite

Quick Start

Prerequisites

Java 21+ — required for records, sealed interfaces, pattern matching
Maven 3.9+
GEMINI_API_KEY — required only for NLQ features

Build & Test

mvn clean install -DskipTests     # ~8 seconds
mvn clean test -pl engine          # 955 tests in ~19 seconds

Run the Server

# Engine only (LSP, query execution, SQL, diagrams)
mvn exec:java -pl engine \
  -Dexec.mainClass="com.gs.legend.server.LegendHttpServer"

# With NLQ (adds natural language → Pure endpoint)
GEMINI_API_KEY=your-key \
mvn exec:java -pl nlq \
  -Dexec.mainClass="org.finos.legend.engine.nlq.NlqHttpServer"

Both start on port 8080. Connect Studio Lite (the React IDE) to http://localhost:8080.

Modules

legend-lite/
├── engine/     Core: parser, compiler, SQL generator, executor, HTTP server
├── nlq/        Natural Language Query — English → Pure via LLM pipeline
├── pct/        Pure Compatibility Tests against the Legend specification
└── docs/       Design documents (Model-to-Model transforms, etc.)

Engine Architecture

Every Pure query flows through the same pipeline:

Pure Source → PureParser → AST → PureModelBuilder → Model
                                                      ↓
                                                  TypeChecker → Typed AST
                                                                   ↓
                                                             PlanGenerator → SqlBuilder
                                                                               ↓
                                                                          SqlDialect → SQL String
                                                                                         ↓
                                                                                    PlanExecutor → Rows

The engine is organized into 12 packages:

Package	Purpose
`ast`	Value specification AST — all Pure expressions as Java records
`antlr`	ANTLR visitors that build the AST from parse trees
`parser`	Hand-written recursive-descent parser for Pure model definitions
`compiler`	`TypeChecker` — resolves types, validates semantics, annotates the AST
`model`	`PureModelBuilder` — constructs the in-memory model (classes, mappings, runtimes)
`plan`	`PlanGenerator` — compiles typed AST → SQL via `SqlBuilder` / `SqlExpr`
`sql`	`SqlBuilder` — structural SQL AST (SELECT, JOIN, WHERE, etc.)
`sqlgen`	Dialect layer — translates generic SQL nodes to database-specific SQL text
`exec`	`PlanExecutor` — sends SQL to DuckDB/SQLite, maps results to typed rows
`serial`	Result serializers (JSON, CSV)
`server`	`LegendHttpServer` — embedded HTTP server
`service`	`QueryService` — high-level orchestration

SQL Push-Down Strategy

Pure Operation	SQL Translation
`filter({p\| ...})`	`WHERE` clause
`project([...])`	`SELECT` columns
`groupBy([...], [...])`	`GROUP BY` + aggregate functions
`sort(ascending('col'))`	`ORDER BY col ASC`
`limit(n)` / `drop(n)`	`LIMIT` / `OFFSET`
`distinct()`	`SELECT DISTINCT`
Association in filter	`WHERE EXISTS (...)` — prevents row explosion
Association in project	`LEFT OUTER JOIN` — preserves nulls
`if(cond, \|then, \|else)`	`CASE WHEN ... THEN ... ELSE ... END`
`fold({e,a\| ...}, init)`	`LIST_REDUCE(...)`
Graph fetch	Nested `ROW(...)` / `STRUCT(...)` with correlated subqueries

HTTP API

All endpoints are served on port 8080.

Method	Endpoint	Description
`POST`	`/lsp`	LSP JSON-RPC — diagnostics, completions, hover
`POST`	`/engine/execute`	Compile + execute a Pure query against a database
`POST`	`/engine/sql`	Execute raw SQL against a Runtime's connection
`POST`	`/engine/diagram`	Extract class diagram data from a Pure model
`POST`	`/engine/nlq`	Translate natural language to Pure query (NLQ module)
`GET`	`/health`	Health check

Execute Endpoint

Request:

{
  "code": "Class model::Person { ... }\nDatabase store::DB ( ... )\nMapping model::M ( ... )\nRuntime model::R { ... }\n\nPerson.all()->filter({p|$p.age > 30})->project({p|$p.firstName})"
}

The code field contains the full Pure source. The query is the last expression (after the Runtime definition). The engine parses the model, compiles the query, generates SQL, executes it, and returns tabular results.

NLQ (Natural Language Query)

The NLQ module translates English questions into executable Pure queries using a 4-step LLM pipeline:

Question → Semantic Retrieval → LLM Router → Query Planner → Pure Generator → Parse Validation
              (TF-IDF)          (root class)   (JSON plan)    (Pure syntax)    (PureParser)

Example:

Input:  "show me total notional by desk"
Output: Trade.all()->project([t|$t.trader.desk.name, t|$t.notional], ['desk', 'notional'])
          ->groupBy([{r|$r.desk}], [{r|$r.notional->sum()}], ['desk', 'totalNotional'])

NLQ Annotations

Improve NLQ accuracy by annotating your Pure model:

Profile nlq {
  tags: [description, synonyms, businessDomain, importance,
         exampleQuestions, displayName, sampleValues, unit];
}

Class {nlq.description = 'Individual trade execution'} model::Trade {
  {nlq.synonyms = 'deal, transaction'} notional: Float[1];
  {nlq.unit = 'USD'} price: Float[1];
}

Configuration

Variable	Required	Default	Description
`GEMINI_API_KEY`	Yes (NLQ only)	—	Google Gemini API key
`GEMINI_MODEL`	No	`gemini-3-flash-preview`	Gemini model name

Testing

# Engine: full test suite (955 tests, ~19s clean / ~8s incremental)
mvn clean test -pl engine

# Engine: single test class
mvn test -pl engine -Dtest="DuckDBIntegrationTest"

# NLQ eval (requires GEMINI_API_KEY)
GEMINI_API_KEY=your-key mvn test -pl nlq -Dtest="NlqFullPipelineEvalTest"

# PCT compatibility tests
mvn test -pl pct

Project Stats

Metric	Value
Engine source files	110
Engine test files	41
Engine LOC	~25,000
Total project LOC	~88,000
Engine tests	955
Clean build + test	~19 seconds
Incremental test	~8 seconds
Java version	21
Dependencies	DuckDB 1.5, ANTLR 4.13.1, JUnit 5.11

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 519 Commits
.agents/workflows		.agents/workflows
docs		docs
engine		engine
nlq		nlq
pct		pct
.gitignore		.gitignore
AGENTS.md		AGENTS.md
FAQ.md		FAQ.md
README.md		README.md
fold_plan_backup.md		fold_plan_backup.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Legend Lite

What is Legend?

The Pure Language

1. Classes — Define Your Data Model

2. Associations — Link Classes Together

3. Inheritance — Extend Classes

4. Databases — Describe Physical Storage

5. Mappings — Connect Models to Storage

6. Model-to-Model Mappings — Transform Between Classes

7. Graph Fetch — Nested / Structured Results

8. Connections & Runtimes — Bind to a Live Database

9. Queries — Type-Safe Expressions

Why Legend Lite?

Quick Start

Prerequisites

Build & Test

Run the Server

Modules

Engine Architecture

SQL Push-Down Strategy

HTTP API

Execute Endpoint

NLQ (Natural Language Query)

NLQ Annotations

Configuration

Testing

Project Stats

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Legend Lite

What is Legend?

The Pure Language

1. Classes — Define Your Data Model

2. Associations — Link Classes Together

3. Inheritance — Extend Classes

4. Databases — Describe Physical Storage

5. Mappings — Connect Models to Storage

6. Model-to-Model Mappings — Transform Between Classes

7. Graph Fetch — Nested / Structured Results

8. Connections & Runtimes — Bind to a Live Database

9. Queries — Type-Safe Expressions

Why Legend Lite?

Quick Start

Prerequisites

Build & Test

Run the Server

Modules

Engine Architecture

SQL Push-Down Strategy

HTTP API

Execute Endpoint

NLQ (Natural Language Query)

NLQ Annotations

Configuration

Testing

Project Stats

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages