329 changes: 209 additions & 120 deletions .blackboxrules

Large diffs are not rendered by default.

21 changes: 3 additions & 18 deletions .cursor/rules/run_pipelex.mdc
@@ -6,21 +6,6 @@ globs:
---
# Guide to execute a pipeline and write example code

## Prerequisites: Virtual Environment

**CRITICAL**: Before running any `pipelex` commands or `pytest`, you MUST activate the appropriate Python virtual environment. Without proper venv activation, these commands will not work.

For standard installations, the virtual environment is named `.venv`. Always check this first:

```bash
# Activate the virtual environment (standard installation)
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
```

If your installation uses a different venv name or location, activate that one instead. All subsequent `pipelex` and `pytest` commands assume the venv is active.

## Example to execute a pipeline with text output

```python
@@ -114,13 +99,13 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
},
)

# Here we have a single input and it's a PDF.
# Because PDFContent is a native concept, we can use it directly as a value,
# Here we have a single input and it's a document.
# Because DocumentContent is a native concept, we can use it directly as a value,
# the system knows what content it corresponds to:
pipe_output = await execute_pipeline(
pipe_code="power_extractor_dpe",
inputs={
"document": PDFContent(url=pdf_url),
"document": DocumentContent(url=pdf_url),
},
)

228 changes: 205 additions & 23 deletions .cursor/rules/write_pipelex.mdc
@@ -10,7 +10,7 @@ globs:
- Always first write your "plan" in natural language, then transcribe it in pipelex.
- You should ALWAYS RUN validation when you are writing or editing a `.plx` file. It will ensure the pipe is runnable. If not, iterate.
- For a specific file: `pipelex validate path_to_file.plx`
- For all pipelines: `pipelex validate all`
- For all pipelines: `pipelex validate --all`
- **IMPORTANT**: Ensure the Python virtual environment is activated before running `pipelex` commands. For standard installations, the venv is named `.venv` - always check that first. The commands will not work without proper venv activation.
- Please use POSIX standard for files. (empty lines, no trailing whitespaces, etc.)

@@ -27,10 +27,10 @@ A pipeline file has three main sections:

### Domain Statement
```plx
domain = "domain_name"
domain = "domain_code"
description = "Description of the domain" # Optional
```
Note: The domain name usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.
Note: The domain code usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.

### Concept Definitions

@@ -45,28 +45,36 @@ ConceptName = "Description of the concept"
- Use PascalCase for concept names
- Never use plurals (no "Stories", use "Story") - lists are handled implicitly by Pipelex
- Avoid circumstantial adjectives (no "LargeText", use "Text") - focus on the essence of what the concept represents
- Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page)
- Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page, JSON)

**Native Concepts:**
Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`. Use these directly or refine them when appropriate.
Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`, `JSON`. Use these directly or refine them when appropriate.

**Refining Native Concepts:**
To create a concept that specializes a native concept without adding fields:
**Refining Concepts:**
To create a concept that specializes another concept without adding fields, use `refines`:

```plx
# Refining a native concept
[concept.Landscape]
description = "A scenic outdoor photograph"
refines = "Image"

# Refining a custom concept (must be in domain.ConceptCode format)
[concept.PremiumCustomer]
description = "A premium customer with special benefits"
refines = "myapp.Customer"
```

Note: When refining a custom (non-native) concept, you must use the fully qualified concept ref in `domain.ConceptCode` format. Pipelex automatically handles the dependency order to ensure referenced concepts are loaded first.

For details on how to structure concepts with fields, see the "Structuring Models" section below.

### Pipe Definitions

## Pipe Base Definition

```plx
[pipe.your_pipe_name]
[pipe.your_pipe_code]
type = "PipeLLM"
description = "A description of what your pipe does"
inputs = { input_1 = "ConceptName1", input_2 = "ConceptName2" }
@@ -76,7 +84,7 @@ output = "ConceptName"
The pipes will all have at least this base definition.
- `inputs`: Dictionary whose keys are the variable names used in the prompts and whose values are the corresponding ConceptNames. It should ALSO LIST THE INPUTS OF THE INTERMEDIATE STEPS (if PipeSequence) or of the conditional pipes (if PipeCondition).
So if you get this error:
`StaticValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' •
`PipeValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' •
variable='['invoice']'``
That means the pipe `validate_expense` is missing the input `invoice`, which one of its sub-pipes requires.
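
To make the rule concrete, here is a minimal Python sketch of how the inputs a sequence must declare could be derived from its steps. It is not Pipelex's validation code: the `Step` class, its `result` field, and `required_inputs` are hypothetical names introduced only for illustration, and the sketch assumes each step stores its output under a name that later steps can consume.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for one step of a sequence."""
    inputs: set[str]  # variables the step reads
    result: str       # name under which the step's output is stored


def required_inputs(steps: list[Step]) -> set[str]:
    """Return the variables the enclosing pipe must list in its own inputs."""
    produced: set[str] = set()
    required: set[str] = set()
    for step in steps:
        # Anything a step needs that no earlier step produced must come from outside.
        required |= step.inputs - produced
        produced.add(step.result)
    return required


steps = [
    Step(inputs={"expense"}, result="category"),
    Step(inputs={"category", "invoice"}, result="verdict"),
]
declared = {"expense"}  # what the parent pipe currently declares
missing = required_inputs(steps) - declared
print(sorted(missing))  # ['invoice']
```

In this example the second step needs `invoice`, which no earlier step produces, so the parent pipe has to declare it.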

@@ -131,16 +139,16 @@ For concepts with structured fields, define them inline using TOML syntax:
description = "A commercial document issued by a seller to a buyer"

[concept.Invoice.structure]
invoice_number = "The unique invoice identifier"
invoice_number = "The unique invoice identifier" # This will be optional by default
issue_date = { type = "date", description = "The date the invoice was issued", required = true }
total_amount = { type = "number", description = "The total invoice amount", required = true }
vendor_name = "The name of the vendor"
line_items = { type = "list", item_type = "text", description = "List of items", required = false }
vendor_name = "The name of the vendor" # This will be optional by default
line_items = { type = "list", item_type = "text", description = "List of items" }
```

**Supported inline field types:** `text`, `integer`, `boolean`, `number`, `date`, `list`, `dict`
**Supported inline field types:** `text`, `integer`, `boolean`, `number`, `date`, `list`, `dict`, `concept`

**Field properties:** `type`, `description`, `required` (default: true), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts)
**Field properties:** `type`, `description`, `required` (default: false), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts), `concept_ref` (for concept references), `item_concept_ref` (for lists of concepts)

**Simple syntax** (creates an optional text field):
```plx
@@ -149,9 +157,46 @@ field_name = "Field description"

**Detailed syntax** (with explicit properties):
```plx
field_name = { type = "text", description = "Field description", required = false, default_value = "default" }
field_name = { type = "text", description = "Field description", default_value = "default" }
```
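
As an illustration of how the two syntaxes relate, the following Python sketch expands a field declaration into an explicit property dict. It is an assumption-laden sketch, not Pipelex's parser: `normalize_field` is a hypothetical helper, and it assumes the simple syntax yields a text field that follows the `required` default of false, as the inline comments in the Invoice example indicate.

```python
from typing import Any, Union

FieldSpec = Union[str, dict[str, Any]]


def normalize_field(spec: FieldSpec) -> dict[str, Any]:
    """Expand a field declaration into an explicit property dict (hypothetical helper)."""
    if isinstance(spec, str):
        # Simple syntax: the string is the description of a text field.
        spec = {"type": "text", "description": spec}
    # Assumed default: `required` is false unless stated explicitly.
    return {"required": False, **spec}


structure = {
    "invoice_number": "The unique invoice identifier",
    "issue_date": {
        "type": "date",
        "description": "The date the invoice was issued",
        "required": True,
    },
}
fields = {name: normalize_field(spec) for name, spec in structure.items()}
print(fields["invoice_number"])
print(fields["issue_date"]["required"])
```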

**Concept reference syntax** (referencing another concept):
```plx
# Single concept reference
customer = { type = "concept", concept_ref = "myapp.Customer", description = "The customer" }

# List of concepts
line_items = { type = "list", item_type = "concept", item_concept_ref = "myapp.LineItem", description = "Line items" }
```

Example with concept references:
```plx
[concept.Customer]
description = "A customer entity"

[concept.Customer.structure]
name = { type = "text", description = "Customer name" }
email = { type = "text", description = "Customer email" }

[concept.LineItem]
description = "A line item in an invoice"

[concept.LineItem.structure]
product = { type = "text", description = "Product name" }
quantity = { type = "integer", description = "Quantity ordered" }
unit_price = { type = "number", description = "Price per unit" }

[concept.Invoice]
description = "An invoice document"

[concept.Invoice.structure]
customer = { type = "concept", concept_ref = "myapp.Customer", description = "The customer" }
items = { type = "list", item_type = "concept", item_concept_ref = "myapp.LineItem", description = "Line items" }
total = { type = "number", description = "Invoice total" }
```

Note: Pipelex automatically determines the correct loading order for concepts based on their dependencies (topological sort), so concepts can reference each other across domains as long as there are no circular dependencies.
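
The idea of a dependency-based loading order can be sketched with Python's standard `graphlib` module. This is only an illustration of topological sorting, not Pipelex's loader: the `dependencies` map and the `loading_order` function are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: concept ref -> concept refs it depends on.
dependencies = {
    "myapp.Invoice": {"myapp.Customer", "myapp.LineItem"},
    "myapp.Customer": set(),
    "myapp.LineItem": set(),
}


def loading_order(deps: dict[str, set[str]]) -> list[str]:
    """Return concept refs ordered so that dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


order = loading_order(dependencies)
print(order[-1])  # myapp.Invoice

# A circular reference cannot be ordered: graphlib raises CycleError.
try:
    loading_order({"a.A": {"b.B"}, "b.B": {"a.A"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)  # True
```

`myapp.Invoice` always comes last here because both concepts it references must be loaded before it; the relative order of `myapp.Customer` and `myapp.LineItem` is not significant.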

**3. Python StructuredContent Class (For Advanced Features)**

Create a Python class when you need:
@@ -204,12 +249,14 @@ class Invoice(StructuredContent):
### Inline Structure Limitations

Inline structures:
- ✅ Support all common field types (text, number, date, list, dict, etc.)
- ✅ Support all common field types (text, number, date, list, dict, concept, etc.)
- ✅ Support required/optional fields, defaults, choices
- ✅ Support concept-to-concept references (type = "concept" with concept_ref)
- ✅ Support lists of concepts (type = "list" with item_type = "concept")
- ✅ Support refining both native and custom concepts
- ✅ Generate full Pydantic models with validation
- ❌ Cannot have custom validators or complex validation logic
- ❌ Cannot have computed properties or custom methods
- ❌ Cannot refine custom (non-native) concepts
- ❌ Limited IDE autocomplete compared to explicit Python classes


@@ -475,7 +522,7 @@ The PipeExtract operator is used to extract text and images from an image or a P
[pipe.extract_info]
type = "PipeExtract"
description = "extract the information"
inputs = { document = "PDF" } # or { image = "Image" } if it's an image. This is the only input.
inputs = { document = "Document" } # or { image = "Image" } if it's an image. This is the only input.
output = "Page"
```

@@ -484,7 +531,7 @@ Using Extract Model Settings:
[pipe.extract_with_model]
type = "PipeExtract"
description = "Extract with specific model"
inputs = { document = "PDF" }
inputs = { document = "Document" }
output = "Page"
model = "base_extract_mistral" # Use predefined extract preset or model alias
```
@@ -592,25 +639,160 @@ $sales_rep.phone | $sales_rep.email
"""
```

### Key Parameters
### Key Parameters (Template Mode)

- `template`: Inline template string (mutually exclusive with template_name)
- `template`: Inline template string (mutually exclusive with template_name and construct)
- `template_name`: Name of a predefined template (mutually exclusive with template)
- `template_category`: Template type ("llm_prompt", "html", "markdown", "mermaid", etc.)
- `templating_style`: Styling options for template rendering
- `extra_context`: Additional context variables for template

For more control, you can use a nested `template` section instead of the `template` field:

- `template.template`: The template string
- `template.category`: Template type
- `template.templating_style`: Styling options

### Template Variables

Use the same variable insertion rules as PipeLLM:

- `@variable` for block insertion (multi-line content)
- `$variable` for inline insertion (short text)

### Construct Mode (for StructuredContent Output)

PipeCompose can also generate `StructuredContent` objects using the `construct` section. This mode composes field values from fixed values, variable references, templates, or nested structures.

**When to use construct mode:**

- You need to output a structured object (not just Text)
- You want to deterministically compose fields from existing data
- No LLM is needed - just data composition and templating

#### Basic Construct Usage

```plx
[concept.SalesSummary]
description = "A structured sales summary"

[concept.SalesSummary.structure]
report_title = { type = "text", description = "Title of the report" }
customer_name = { type = "text", description = "Customer name" }
deal_value = { type = "number", description = "Deal value" }
summary_text = { type = "text", description = "Generated summary text" }

[pipe.compose_summary]
type = "PipeCompose"
description = "Compose a sales summary from deal data"
inputs = { deal = "Deal" }
output = "SalesSummary"

[pipe.compose_summary.construct]
report_title = "Monthly Sales Report"
customer_name = { from = "deal.customer_name" }
deal_value = { from = "deal.amount" }
summary_text = { template = "Deal worth $deal.amount with $deal.customer_name" }
```

#### Field Composition Methods

There are four ways to define field values in a construct:

**1. Fixed Value (literal)**

Use a literal value directly:

```plx
[pipe.compose_report.construct]
report_title = "Annual Report"
report_year = 2024
is_draft = false
```

**2. Variable Reference (`from`)**

Get a value from working memory using a dotted path:

```plx
[pipe.compose_report.construct]
customer_name = { from = "deal.customer_name" }
total_amount = { from = "order.total" }
street_address = { from = "customer.address.street" }
```

**3. Template (`template`)**

Render a Jinja2 template with variable substitution:

```plx
[pipe.compose_report.construct]
invoice_number = { template = "INV-$order.id" }
summary = { template = "Deal worth $deal.amount with $deal.customer_name on {{ current_date }}" }
```

**4. Nested Construct**

For nested structures, use a TOML subsection:

```plx
[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```

#### Complete Construct Example

```plx
domain = "invoicing"

[concept.Address]
description = "A postal address"

[concept.Address.structure]
street = { type = "text", description = "Street address" }
city = { type = "text", description = "City name" }
country = { type = "text", description = "Country name" }

[concept.Invoice]
description = "An invoice document"

[concept.Invoice.structure]
invoice_number = { type = "text", description = "Invoice number" }
total = { type = "number", description = "Total amount" }

[pipe.compose_invoice]
type = "PipeCompose"
description = "Compose an invoice from order and customer data"
inputs = { order = "Order", customer = "Customer" }
output = "Invoice"

[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```

#### Key Parameters (Construct Mode)

- `construct`: Dictionary mapping field names to their composition rules
- Each field can be:
- A literal value (string, number, boolean)
- A dict with `from` key for variable reference
- A dict with `template` key for template rendering
- A nested dict for nested structures

**Note:** You must use either `template` or `construct`, not both. They are mutually exclusive.
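
The four composition methods can be pictured with a small Python sketch that resolves a construct mapping against working memory. This is a simplified illustration under stated assumptions, not Pipelex's implementation: `lookup`, `render`, and `resolve` are hypothetical helpers, working memory is modeled as nested dicts, and the template step only handles `$dotted.path` substitution rather than full Jinja2.

```python
import re
from typing import Any


def lookup(memory: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'customer.address.street' through nested dicts."""
    value: Any = memory
    for key in path.split("."):
        value = value[key]
    return value


def render(template: str, memory: dict[str, Any]) -> str:
    """Replace each $dotted.path with its value (simplified, no Jinja2 features)."""
    pattern = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")
    return pattern.sub(lambda m: str(lookup(memory, m.group(1))), template)


def resolve(construct: dict[str, Any], memory: dict[str, Any]) -> dict[str, Any]:
    """Resolve a construct mapping into plain field values."""
    result: dict[str, Any] = {}
    for name, rule in construct.items():
        if not isinstance(rule, dict):
            result[name] = rule  # fixed literal value
        elif "from" in rule:
            result[name] = lookup(memory, rule["from"])  # variable reference
        elif "template" in rule:
            result[name] = render(rule["template"], memory)  # template rendering
        else:
            result[name] = resolve(rule, memory)  # nested construct
    return result


memory = {
    "order": {"id": 42, "total_amount": 99.5},
    "customer": {"address": {"street": "1 Rue de Rivoli", "city": "Paris"}},
}
construct = {
    "invoice_number": {"template": "INV-$order.id"},
    "total": {"from": "order.total_amount"},
    "billing_address": {
        "street": {"from": "customer.address.street"},
        "city": {"from": "customer.address.city"},
        "country": "France",
    },
}
invoice = resolve(construct, memory)
print(invoice["invoice_number"])  # INV-42
```

The sample data mirrors the `compose_invoice` example: the literal `"France"` passes through unchanged, `from` copies a value, `template` produces a string, and the `billing_address` subsection is resolved recursively.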

## PipeImgGen operator

The PipeImgGen operator is used to generate images using AI image generation models.
@@ -824,7 +1006,7 @@ Presets are meant to record the choice of an llm with its hyper parameters (temp

Examples:
```toml
llm_for_complex_reasoning = { model = "base-claude", temperature = 1 }
llm_to_engineer = { model = "base-claude", temperature = 1 }
llm_to_extract_invoice = { model = "claude-3-7-sonnet", temperature = 0.1, max_tokens = "auto" }
```

@@ -855,7 +1037,7 @@ You can override the predefined llm presets by setting them in `.pipelex/inferen

ALWAYS RUN validation when you are finished writing pipelines: This checks for errors. If there are errors, iterate until it works.
- For a specific bundle/file: `pipelex validate path_to_file.plx`
- For all pipelines: `pipelex validate all`
- For all pipelines: `pipelex validate --all`
- Remember: Ensure your Python virtual environment is activated (typically `.venv` for standard installations) before running `pipelex` commands.

Then, create an example file to run the pipeline in the `examples` folder.