Closed
30 commits
7ef18ce
Demo materials
lchoquel Nov 13, 2025
3a0c0bb
Merge branch 'dev' into demo
lchoquel Nov 13, 2025
776e27e
config and rules
lchoquel Nov 13, 2025
717efb6
Merge branch 'main' into demo
lchoquel Nov 13, 2025
5848ed2
Merge branch 'main' into demo
lchoquel Nov 18, 2025
d58129b
Merge branch 'main' into demo
lchoquel Nov 30, 2025
78fd33a
ENable only Pipelex inference
lchoquel Nov 30, 2025
2e86a60
Merge branch 'main' into demo
lchoquel Dec 2, 2025
333a959
cv_match_scw
lchoquel Dec 4, 2025
1060d0d
Multi-CV demo
lchoquel Dec 4, 2025
3490f5d
Cleanup
lchoquel Dec 4, 2025
78e998a
Update to support Gateway
lchoquel Dec 9, 2025
5b383d0
Use Gateway (local dep)
lchoquel Dec 14, 2025
cf7a637
update pipelex, proper CV inputs to demo
lchoquel Jan 4, 2026
7b4deab
Examples run generating graph
lchoquel Jan 4, 2026
d79a90a
Use feature/Chicago
lchoquel Jan 12, 2026
ce90326
Use feature/Chicago (WIP)
lchoquel Jan 12, 2026
500d41f
Update PDF -> Document
lchoquel Jan 14, 2026
92be946
Update agent rules. Cleanup config files and test pipeline.
lchoquel Jan 19, 2026
98d4220
Pipelex dep
lchoquel Jan 19, 2026
653bc0a
Support Pipelex Chicago
lchoquel Jan 19, 2026
6e09ecf
CI --disable-inference to run without Pipelex Service agreement
lchoquel Jan 19, 2026
9451bc7
Removed the separate "Boot test" step since gha-tests already runs al…
lchoquel Jan 19, 2026
4103dba
Use Chicago b2
lchoquel Jan 21, 2026
497c8ce
git ignores
lchoquel Jan 21, 2026
24499e3
Merge branch 'feature/Chicago' into demos/Chicago
lchoquel Jan 21, 2026
9125497
Pipelex version dep
lchoquel Jan 21, 2026
880d6df
Merge branch 'feature/Chicago' into demos/Chicago
lchoquel Jan 21, 2026
8f7bf83
Merge branch 'feature/Chicago' into demos/Chicago
lchoquel Jan 21, 2026
5ca2676
Use new base deck
lchoquel Jan 29, 2026
233 changes: 145 additions & 88 deletions .blackboxrules
@@ -23,10 +23,10 @@ A pipeline file has three main sections:

#### Domain Statement
```plx
-domain = "domain_name"
+domain = "domain_code"
description = "Description of the domain" # Optional
```
-Note: The domain name usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.
+Note: The domain code usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.

#### Concept Definitions

@@ -62,7 +62,7 @@ For details on how to structure concepts with fields, see the "Structuring Model
### Pipe Base Definition

```plx
-[pipe.your_pipe_name]
+[pipe.your_pipe_code]
type = "PipeLLM"
description = "A description of what your pipe does"
inputs = { input_1 = "ConceptName1", input_2 = "ConceptName2" }
@@ -471,7 +471,7 @@ The PipeExtract operator is used to extract text and images from an image or a P
[pipe.extract_info]
type = "PipeExtract"
description = "extract the information"
-inputs = { document = "PDF" } # or { image = "Image" } if it's an image. This is the only input.
+inputs = { document = "Document" } # or { image = "Image" } if it's an image. This is the only input.
output = "Page"
```

@@ -480,7 +480,7 @@ Using Extract Model Settings:
[pipe.extract_with_model]
type = "PipeExtract"
description = "Extract with specific model"
-inputs = { document = "PDF" }
+inputs = { document = "Document" }
output = "Page"
model = "base_extract_mistral" # Use predefined extract preset or model alias
```
@@ -588,25 +588,160 @@ $sales_rep.phone | $sales_rep.email
"""
```

-#### Key Parameters
+#### Key Parameters (Template Mode)

-- `template`: Inline template string (mutually exclusive with template_name)
+- `template`: Inline template string (mutually exclusive with template_name and construct)
- `template_name`: Name of a predefined template (mutually exclusive with template)
- `template_category`: Template type ("llm_prompt", "html", "markdown", "mermaid", etc.)
- `templating_style`: Styling options for template rendering
- `extra_context`: Additional context variables for template

For more control, you can use a nested `template` section instead of the `template` field:

- `template.template`: The template string
- `template.category`: Template type
- `template.templating_style`: Styling options

#### Template Variables

Use the same variable insertion rules as PipeLLM:

- `@variable` for block insertion (multi-line content)
- `$variable` for inline insertion (short text)

#### Construct Mode (for StructuredContent Output)

PipeCompose can also generate `StructuredContent` objects using the `construct` section. This mode composes field values from fixed values, variable references, templates, or nested structures.

**When to use construct mode:**

- You need to output a structured object (not just Text)
- You want to deterministically compose fields from existing data
- No LLM is needed - just data composition and templating

##### Basic Construct Usage

```plx
[concept.SalesSummary]
description = "A structured sales summary"

[concept.SalesSummary.structure]
report_title = { type = "text", description = "Title of the report" }
customer_name = { type = "text", description = "Customer name" }
deal_value = { type = "number", description = "Deal value" }
summary_text = { type = "text", description = "Generated summary text" }

[pipe.compose_summary]
type = "PipeCompose"
description = "Compose a sales summary from deal data"
inputs = { deal = "Deal" }
output = "SalesSummary"

[pipe.compose_summary.construct]
report_title = "Monthly Sales Report"
customer_name = { from = "deal.customer_name" }
deal_value = { from = "deal.amount" }
summary_text = { template = "Deal worth $deal.amount with $deal.customer_name" }
```

##### Field Composition Methods

There are four ways to define field values in a construct:

**1. Fixed Value (literal)**

Use a literal value directly:

```plx
[pipe.compose_report.construct]
report_title = "Annual Report"
report_year = 2024
is_draft = false
```

**2. Variable Reference (`from`)**

Get a value from working memory using a dotted path:

```plx
[pipe.compose_report.construct]
customer_name = { from = "deal.customer_name" }
total_amount = { from = "order.total" }
street_address = { from = "customer.address.street" }
```

**3. Template (`template`)**

Render a Jinja2 template with variable substitution:

```plx
[pipe.compose_report.construct]
invoice_number = { template = "INV-$order.id" }
summary = { template = "Deal worth $deal.amount with $deal.customer_name on {{ current_date }}" }
```

**4. Nested Construct**

For nested structures, use a TOML subsection:

```plx
[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```
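
The four composition methods above can be pictured as one recursive resolution step. The following is a minimal Python sketch under stated assumptions, not Pipelex's implementation: `lookup`, `resolve_field`, and the plain-dict working memory are hypothetical names introduced for illustration, and only `$dotted.path` substitution is handled (no Jinja2 expressions such as `{{ current_date }}`).

```python
import re
from typing import Any

# Hypothetical helpers for illustration only; not part of the Pipelex API.
_VAR = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def lookup(memory: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "customer.address.street" through nested dicts."""
    value: Any = memory
    for key in path.split("."):
        value = value[key]
    return value


def resolve_field(rule: Any, memory: dict[str, Any]) -> Any:
    """Resolve one construct rule: literal, `from`, `template`, or nested construct."""
    if not isinstance(rule, dict):
        return rule  # 1. fixed value, returned unchanged
    if set(rule) == {"from"}:
        return lookup(memory, rule["from"])  # 2. variable reference
    if set(rule) == {"template"}:
        # 3. template: replace each $dotted.path with its value from memory
        return _VAR.sub(lambda m: str(lookup(memory, m.group(1))), rule["template"])
    # 4. nested construct: resolve every sub-field recursively
    return {name: resolve_field(sub, memory) for name, sub in rule.items()}


# Assumed working memory, modelled as nested dicts.
memory = {
    "order": {"id": 42, "total_amount": 99.5},
    "customer": {"address": {"street": "1 rue de Rivoli", "city": "Paris"}},
}
construct = {
    "invoice_number": {"template": "INV-$order.id"},
    "total": {"from": "order.total_amount"},
    "billing_address": {
        "street": {"from": "customer.address.street"},
        "city": {"from": "customer.address.city"},
        "country": "France",
    },
}
invoice = resolve_field(construct, memory)
```

In this sketch a dict whose only key is `from` or `template` is treated as a leaf rule, and any other dict is treated as a nested construct; the real parser may distinguish these cases differently.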

##### Complete Construct Example

```plx
domain = "invoicing"

[concept.Address]
description = "A postal address"

[concept.Address.structure]
street = { type = "text", description = "Street address" }
city = { type = "text", description = "City name" }
country = { type = "text", description = "Country name" }

[concept.Invoice]
description = "An invoice document"

[concept.Invoice.structure]
invoice_number = { type = "text", description = "Invoice number" }
total = { type = "number", description = "Total amount" }

[pipe.compose_invoice]
type = "PipeCompose"
description = "Compose an invoice from order and customer data"
inputs = { order = "Order", customer = "Customer" }
output = "Invoice"

[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```

##### Key Parameters (Construct Mode)

- `construct`: Dictionary mapping field names to their composition rules
- Each field can be:
- A literal value (string, number, boolean)
- A dict with `from` key for variable reference
- A dict with `template` key for template rendering
- A nested dict for nested structures

**Note:** You must use either `template` or `construct`, not both. They are mutually exclusive.
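
The exclusivity rules stated above could be checked before a pipe runs. This is a hedged Python sketch, assuming the `[pipe.*]` table has already been parsed into a plain dict; `check_compose_mode` and `EXCLUSIVE_PAIRS` are hypothetical names, and the rules do not say how `template_name` combined with `construct` is handled, so the sketch leaves that case unchecked.

```python
from typing import Any

# Only the pairs the rules explicitly describe as mutually exclusive.
EXCLUSIVE_PAIRS = [("template", "template_name"), ("template", "construct")]


def check_compose_mode(pipe: dict[str, Any]) -> str:
    """Return the mode key a PipeCompose table uses, or raise ValueError."""
    for first, second in EXCLUSIVE_PAIRS:
        if first in pipe and second in pipe:
            raise ValueError(f"'{first}' and '{second}' are mutually exclusive")
    for key in ("template", "template_name", "construct"):
        if key in pipe:
            return key
    raise ValueError("expected one of 'template', 'template_name', or 'construct'")


mode = check_compose_mode({"type": "PipeCompose", "construct": {"country": "France"}})
```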

### PipeImgGen operator

The PipeImgGen operator is used to generate images using AI image generation models.
@@ -952,13 +1087,13 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
},
)

-## Here we have a single input and it's a PDF.
-## Because PDFContent is a native concept, we can use it directly as a value,
+## Here we have a single input and it's a document.
+## Because DocumentContent is a native concept, we can use it directly as a value,
## the system knows what content it corresponds to:
pipe_output = await execute_pipeline(
pipe_code="power_extractor_dpe",
inputs={
-"document": PDFContent(url=pdf_url),
+"document": DocumentContent(url=pdf_url),
},
)

@@ -1081,82 +1216,4 @@ result_list = pipe_output.main_stuff_as_items(item_type=GanttChart)
```

---

## Rules to choose LLM models used in PipeLLMs.

### LLM Configuration System

In order to use it in a pipe, an LLM is referenced by its llm_handle (alias) and possibly by an llm_preset.
LLM configurations are managed through the new inference backend system with files located in `.pipelex/inference/`:

- **Model Deck**: `.pipelex/inference/deck/base_deck.toml` and `.pipelex/inference/deck/overrides.toml`
- **Backends**: `.pipelex/inference/backends.toml` and `.pipelex/inference/backends/*.toml`
- **Routing**: `.pipelex/inference/routing_profiles.toml`

### LLM Handles

An llm_handle can be either:
1. **A direct model name** (like "gpt-4o-mini", "claude-3-sonnet") - automatically available for all models loaded by the inference backend system
2. **An alias** - user-defined shortcuts that map to model names, defined in the `[aliases]` section:

```toml
[aliases]
base-claude = "claude-4.5-sonnet"
base-gpt = "gpt-5"
base-gemini = "gemini-2.5-flash"
base-mistral = "mistral-medium"
```

The system first looks for direct model names, then checks aliases if no direct match is found. The system handles model routing through backends automatically.
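
The lookup order described above (direct model name first, alias second) can be sketched in Python as follows. This is an illustration under assumptions, not the inference backend's code: `resolve_handle` and the `models` set are hypothetical, and aliases are assumed to resolve in a single step rather than chaining.

```python
def resolve_handle(handle: str, models: set[str], aliases: dict[str, str]) -> str:
    """Return the model name for an llm_handle: direct names win over aliases."""
    if handle in models:
        return handle  # direct model name
    if handle in aliases:
        return aliases[handle]  # alias, resolved in one step (assumption)
    raise KeyError(f"unknown llm_handle: {handle!r}")


# Assumed set of models loaded by the backends, plus the aliases shown above.
models = {"gpt-5", "claude-4.5-sonnet", "gemini-2.5-flash", "mistral-medium"}
aliases = {
    "base-claude": "claude-4.5-sonnet",
    "base-gpt": "gpt-5",
    "base-gemini": "gemini-2.5-flash",
    "base-mistral": "mistral-medium",
}
resolved = resolve_handle("base-claude", models, aliases)
```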

### Using an LLM Handle in a PipeLLM

Here is an example of using an llm_handle to specify which LLM to use in a PipeLLM:

```plx
[pipe.hello_world]
type = "PipeLLM"
description = "Write text about Hello World."
output = "Text"
model = { model = "gpt-5", temperature = 0.9 }
prompt = """
Write a haiku about Hello World.
"""
```

As you can see, to use the LLM, you must also indicate the temperature (float between 0 and 1) and max_tokens (either an int or the string "auto").

### LLM Presets

Presets are meant to record the choice of an llm with its hyper parameters (temperature and max_tokens) if it's good for a particular task. LLM Presets are skill-oriented.

Examples:
```toml
llm_to_engineer = { model = "base-claude", temperature = 1 }
llm_to_extract_invoice = { model = "claude-4.5-sonnet", temperature = 0.1, max_tokens = "auto" }
```

The interest is that these presets can be used to set the LLM choice in a PipeLLM, like this:

```plx
[pipe.extract_invoice]
type = "PipeLLM"
description = "Extract invoice information from an invoice text transcript"
inputs = { invoice_text = "InvoiceText" }
output = "Invoice"
model = "llm_to_extract_invoice"
prompt = """
Extract invoice information from this invoice:

The category of this invoice is: $invoice_details.category.

@invoice_text
"""
```

The setting here `model = "llm_to_extract_invoice"` works because "llm_to_extract_invoice" has been declared as an llm_preset in the deck.
You must not use an LLM preset in a PipeLLM that does not exist in the deck. If needed, you can add llm presets.


You can override the predefined llm presets by setting them in `.pipelex/inference/deck/overrides.toml`.
<!-- END_PIPELEX_RULES -->
6 changes: 3 additions & 3 deletions .cursor/rules/run_pipelex.mdc
@@ -99,13 +99,13 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
},
)

-# Here we have a single input and it's a PDF.
-# Because PDFContent is a native concept, we can use it directly as a value,
+# Here we have a single input and it's a document.
+# Because DocumentContent is a native concept, we can use it directly as a value,
# the system knows what content it corresponds to:
pipe_output = await execute_pipeline(
pipe_code="power_extractor_dpe",
inputs={
-"document": PDFContent(url=pdf_url),
+"document": DocumentContent(url=pdf_url),
},
)
