Pipelex · thomashebrard · Jan 26, 2026 · Jan 27, 2026
diff --git a/.blackboxrules b/.blackboxrules
diff --git a/.cursor/rules/run_pipelex.mdc b/.cursor/rules/run_pipelex.mdc
@@ -6,21 +6,6 @@ globs:
 ---
 # Guide to execute a pipeline and write example code
 
-## Prerequisites: Virtual Environment
-
-**CRITICAL**: Before running any `pipelex` commands or `pytest`, you MUST activate the appropriate Python virtual environment. Without proper venv activation, these commands will not work.
-
-For standard installations, the virtual environment is named `.venv`. Always check this first:
-
-```bash
-# Activate the virtual environment (standard installation)
-source .venv/bin/activate  # On macOS/Linux
-# or
-.venv\Scripts\activate  # On Windows
-```
-
-If your installation uses a different venv name or location, activate that one instead. All subsequent `pipelex` and `pytest` commands assume the venv is active.
-
 ## Example to execute a pipeline with text output
 
 ```python
@@ -114,13 +99,13 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
         },
     )
 
-# Here we have a single input and it's a PDF.
-# Because PDFContent is a native concept, we can use it directly as a value,
+# Here we have a single input and it's a document.
+# Because DocumentContent is a native concept, we can use it directly as a value,
 # the system knows what content it corresponds to:
     pipe_output = await execute_pipeline(
         pipe_code="power_extractor_dpe",
         inputs={
-            "document": PDFContent(url=pdf_url),
+            "document": DocumentContent(url=pdf_url),
         },
     )
 

diff --git a/.cursor/rules/write_pipelex.mdc b/.cursor/rules/write_pipelex.mdc
@@ -10,7 +10,7 @@ globs:
 - Always first write your "plan" in natural language, then transcribe it in pipelex.
 - You should ALWAYS RUN validation when you are writing or editing a `.plx` file. It will ensure the pipe is runnable. If not, iterate.
   - For a specific file: `pipelex validate path_to_file.plx`
-  - For all pipelines: `pipelex validate all`
+  - For all pipelines: `pipelex validate --all`
   - **IMPORTANT**: Ensure the Python virtual environment is activated before running `pipelex` commands. For standard installations, the venv is named `.venv` - always check that first. The commands will not work without proper venv activation.
 - Please use POSIX standard for files. (empty lines, no trailing whitespaces, etc.)
 
@@ -27,10 +27,10 @@ A pipeline file has three main sections:
 
 ### Domain Statement
 ```plx
-domain = "domain_name"
+domain = "domain_code"
 description = "Description of the domain" # Optional
 ```
-Note: The domain name usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.
+Note: The domain code usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.
 
 ### Concept Definitions
 
@@ -45,28 +45,36 @@ ConceptName = "Description of the concept"
 - Use PascalCase for concept names
 - Never use plurals (no "Stories", use "Story") - lists are handled implicitly by Pipelex
 - Avoid circumstantial adjectives (no "LargeText", use "Text") - focus on the essence of what the concept represents
-- Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page)
+- Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page, JSON)
 
 **Native Concepts:**
-Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`. Use these directly or refine them when appropriate.
+Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`, `JSON`. Use these directly or refine them when appropriate.
 
-**Refining Native Concepts:**
-To create a concept that specializes a native concept without adding fields:
+**Refining Concepts:**
+To create a concept that specializes another concept without adding fields, use `refines`:
 
 ```plx
+# Refining a native concept
 [concept.Landscape]
 description = "A scenic outdoor photograph"
 refines = "Image"
+
+# Refining a custom concept (must be in domain.ConceptCode format)
+[concept.PremiumCustomer]
+description = "A premium customer with special benefits"
+refines = "myapp.Customer"
 ```
 
+Note: When refining a custom (non-native) concept, you must use the fully qualified concept ref in `domain.ConceptCode` format. Pipelex automatically handles the dependency order to ensure referenced concepts are loaded first.
+
 For details on how to structure concepts with fields, see the "Structuring Models" section below.
 
 ### Pipe Definitions
 
 ## Pipe Base Definition
 
 ```plx
-[pipe.your_pipe_name]
+[pipe.your_pipe_code]
 type = "PipeLLM"
 description = "A description of what your pipe does"
 inputs = { input_1 = "ConceptName1", input_2 = "ConceptName2" }
@@ -76,7 +84,7 @@ output = "ConceptName"
 The pipes will all have at least this base definition. 
 - `inputs`: Dictionary of key being the variable used in the prompts, and the value being the ConceptName. It should ALSO LIST THE INPUTS OF THE INTERMEDIATE STEPS (if PipeSequence) or of the conditional pipes (if PipeCondition).
 So If you have this error:
-`StaticValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' • 
+`PipeValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' • 
 variable='['invoice']'``
 That means that the pipe validate_expense is missing the input `invoice` because one of the subpipe is needing it.
 
@@ -131,16 +139,16 @@ For concepts with structured fields, define them inline using TOML syntax:
 description = "A commercial document issued by a seller to a buyer"
 
 [concept.Invoice.structure]
-invoice_number = "The unique invoice identifier"
+invoice_number = "The unique invoice identifier" # This will be optional by default
 issue_date = { type = "date", description = "The date the invoice was issued", required = true }
 total_amount = { type = "number", description = "The total invoice amount", required = true }
-vendor_name = "The name of the vendor"
-line_items = { type = "list", item_type = "text", description = "List of items", required = false }
+vendor_name = "The name of the vendor" # This will be optional by default
+line_items = { type = "list", item_type = "text", description = "List of items" }
 ```
 
-**Supported inline field types:** `text`, `integer`, `boolean`, `number`, `date`, `list`, `dict`
+**Supported inline field types:** `text`, `integer`, `boolean`, `number`, `date`, `list`, `dict`, `concept`
 
-**Field properties:** `type`, `description`, `required` (default: true), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts)
+**Field properties:** `type`, `description`, `required` (default: false), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts), `concept_ref` (for concept references), `item_concept_ref` (for lists of concepts)
 
 **Simple syntax** (creates required text field):
 ```plx
@@ -149,9 +157,46 @@ field_name = "Field description"
 
 **Detailed syntax** (with explicit properties):
 ```plx
-field_name = { type = "text", description = "Field description", required = false, default_value = "default" }
+field_name = { type = "text", description = "Field description", default_value = "default" }
+```
+
+**Concept reference syntax** (referencing another concept):
+```plx
+# Single concept reference
+customer = { type = "concept", concept_ref = "myapp.Customer", description = "The customer" }
+
+# List of concepts
+line_items = { type = "list", item_type = "concept", item_concept_ref = "myapp.LineItem", description = "Line items" }
+```
+
+Example with concept references:
+```plx
+[concept.Customer]
+description = "A customer entity"
+
+[concept.Customer.structure]
+name = { type = "text", description = "Customer name" }
+email = { type = "text", description = "Customer email" }
+
+[concept.LineItem]
+description = "A line item in an invoice"
+
+[concept.LineItem.structure]
+product = { type = "text", description = "Product name" }
+quantity = { type = "integer", description = "Quantity ordered" }
+unit_price = { type = "number", description = "Price per unit" }
+
+[concept.Invoice]
+description = "An invoice document"
+
+[concept.Invoice.structure]
+customer = { type = "concept", concept_ref = "myapp.Customer", description = "The customer" }
+items = { type = "list", item_type = "concept", item_concept_ref = "myapp.LineItem", description = "Line items" }
+total = { type = "number", description = "Invoice total" }
 ```
 
+Note: Pipelex automatically determines the correct loading order for concepts based on their dependencies (topological sort), so concepts can reference each other across domains as long as there are no circular dependencies.
+
 **3. Python StructuredContent Class (For Advanced Features)**
 
 Create a Python class when you need:
@@ -204,12 +249,14 @@ class Invoice(StructuredContent):
 ### Inline Structure Limitations
 
 Inline structures:
-- ✅ Support all common field types (text, number, date, list, dict, etc.)
+- ✅ Support all common field types (text, number, date, list, dict, concept, etc.)
 - ✅ Support required/optional fields, defaults, choices
+- ✅ Support concept-to-concept references (type = "concept" with concept_ref)
+- ✅ Support lists of concepts (type = "list" with item_type = "concept")
+- ✅ Support refining both native and custom concepts
 - ✅ Generate full Pydantic models with validation
 - ❌ Cannot have custom validators or complex validation logic
 - ❌ Cannot have computed properties or custom methods
-- ❌ Cannot refine custom (non-native) concepts
 - ❌ Limited IDE autocomplete compared to explicit Python classes
 
 
@@ -475,7 +522,7 @@ The PipeExtract operator is used to extract text and images from an image or a P
 [pipe.extract_info]
 type = "PipeExtract"
 description = "extract the information"
-inputs = { document = "PDF" } # or { image = "Image" } if it's an image. This is the only input.
+inputs = { document = "Document" } # or { image = "Image" } if it's an image. This is the only input.
 output = "Page"
 ```
 
@@ -484,7 +531,7 @@ Using Extract Model Settings:
 [pipe.extract_with_model]
 type = "PipeExtract"
 description = "Extract with specific model"
-inputs = { document = "PDF" }
+inputs = { document = "Document" }
 output = "Page"
 model = "base_extract_mistral"  # Use predefined extract preset or model alias
 ```
@@ -592,25 +639,160 @@ $sales_rep.phone | $sales_rep.email
 """
 ```
 
-### Key Parameters
+### Key Parameters (Template Mode)
 
-- `template`: Inline template string (mutually exclusive with template_name)
+- `template`: Inline template string (mutually exclusive with template_name and construct)
 - `template_name`: Name of a predefined template (mutually exclusive with template)
 - `template_category`: Template type ("llm_prompt", "html", "markdown", "mermaid", etc.)
 - `templating_style`: Styling options for template rendering
 - `extra_context`: Additional context variables for template
 
 For more control, you can use a nested `template` section instead of the `template` field:
+
 - `template.template`: The template string
 - `template.category`: Template type
 - `template.templating_style`: Styling options
 
 ### Template Variables
 
 Use the same variable insertion rules as PipeLLM:
+
 - `@variable` for block insertion (multi-line content)
 - `$variable` for inline insertion (short text)
 
+### Construct Mode (for StructuredContent Output)
+
+PipeCompose can also generate `StructuredContent` objects using the `construct` section. This mode composes field values from fixed values, variable references, templates, or nested structures.
+
+**When to use construct mode:**
+
+- You need to output a structured object (not just Text)
+- You want to deterministically compose fields from existing data
+- No LLM is needed - just data composition and templating
+
+#### Basic Construct Usage
+
+```plx
+[concept.SalesSummary]
+description = "A structured sales summary"
+
+[concept.SalesSummary.structure]
+report_title = { type = "text", description = "Title of the report" }
+customer_name = { type = "text", description = "Customer name" }
+deal_value = { type = "number", description = "Deal value" }
+summary_text = { type = "text", description = "Generated summary text" }
+
+[pipe.compose_summary]
+type = "PipeCompose"
+description = "Compose a sales summary from deal data"
+inputs = { deal = "Deal" }
+output = "SalesSummary"
+
+[pipe.compose_summary.construct]
+report_title = "Monthly Sales Report"
+customer_name = { from = "deal.customer_name" }
+deal_value = { from = "deal.amount" }
+summary_text = { template = "Deal worth $deal.amount with $deal.customer_name" }
+```
+
+#### Field Composition Methods
+
+There are four ways to define field values in a construct:
+
+**1. Fixed Value (literal)**
+
+Use a literal value directly:
+
+```plx
+[pipe.compose_report.construct]
+report_title = "Annual Report"
+report_year = 2024
+is_draft = false
+```
+
+**2. Variable Reference (`from`)**
+
+Get a value from working memory using a dotted path:
+
+```plx
+[pipe.compose_report.construct]
+customer_name = { from = "deal.customer_name" }
+total_amount = { from = "order.total" }
+street_address = { from = "customer.address.street" }
+```
+
+**3. Template (`template`)**
+
+Render a Jinja2 template with variable substitution:
+
+```plx
+[pipe.compose_report.construct]
+invoice_number = { template = "INV-$order.id" }
+summary = { template = "Deal worth $deal.amount with $deal.customer_name on {{ current_date }}" }
+```
+
+**4. Nested Construct**
+
+For nested structures, use a TOML subsection:
+
+```plx
+[pipe.compose_invoice.construct]
+invoice_number = { template = "INV-$order.id" }
+total = { from = "order.total_amount" }
+
+[pipe.compose_invoice.construct.billing_address]
+street = { from = "customer.address.street" }
+city = { from = "customer.address.city" }
+country = "France"
+```
+
+#### Complete Construct Example
+
+```plx
+domain = "invoicing"
+
+[concept.Address]
+description = "A postal address"
+
+[concept.Address.structure]
+street = { type = "text", description = "Street address" }
+city = { type = "text", description = "City name" }
+country = { type = "text", description = "Country name" }
+
+[concept.Invoice]
+description = "An invoice document"
+
+[concept.Invoice.structure]
+invoice_number = { type = "text", description = "Invoice number" }
+total = { type = "number", description = "Total amount" }
+
+[pipe.compose_invoice]
+type = "PipeCompose"
+description = "Compose an invoice from order and customer data"
+inputs = { order = "Order", customer = "Customer" }
+output = "Invoice"
+
+[pipe.compose_invoice.construct]
+invoice_number = { template = "INV-$order.id" }
+total = { from = "order.total_amount" }
+
+[pipe.compose_invoice.construct.billing_address]
+street = { from = "customer.address.street" }
+city = { from = "customer.address.city" }
+country = "France"
+```
+
+#### Key Parameters (Construct Mode)
+
+- `construct`: Dictionary mapping field names to their composition rules
+- Each field can be:
+  - A literal value (string, number, boolean)
+  - A dict with `from` key for variable reference
+  - A dict with `template` key for template rendering
+  - A nested dict for nested structures
+
+**Note:** You must use either `template` or `construct`, not both. They are mutually exclusive.
+
 ## PipeImgGen operator
 
 The PipeImgGen operator is used to generate images using AI image generation models.
@@ -824,7 +1006,7 @@ Presets are meant to record the choice of an llm with its hyper parameters (temp
 
 Examples:
 ```toml
-llm_for_complex_reasoning = { model = "base-claude", temperature = 1 }
+llm_to_engineer = { model = "base-claude", temperature = 1 }
 llm_to_extract_invoice = { model = "claude-3-7-sonnet", temperature = 0.1, max_tokens = "auto" }
 ```
 
@@ -855,7 +1037,7 @@ You can override the predefined llm presets by setting them in `.pipelex/inferen
 
 ALWAYS RUN validation when you are finished writing pipelines: This checks for errors. If there are errors, iterate until it works.
 - For a specific bundle/file: `pipelex validate path_to_file.plx`
-- For all pipelines: `pipelex validate all`
+- For all pipelines: `pipelex validate --all`
 - Remember: Ensure your Python virtual environment is activated (typically `.venv` for standard installations) before running `pipelex` commands.
 
 Then, create an example file to run the pipeline in the `examples` folder.