Ordis is a local-first tool and library that turns messy, unstructured text into clean, structured data using a schema-driven extraction pipeline powered by LLMs. You give it a schema that describes the fields you expect, point it at some raw text, and choose any OpenAI-compatible model. Ordis builds the prompt, calls the model, validates the output, and returns either a correct structured record or a clear error.
Ordis does for LLM extraction what Prisma does for databases: strict schemas, predictable output, and no more glue code.
✅ CLI functional - Core extraction pipeline working with real LLMs. Ready for testing and feedback.
✅ Programmatic API - Can be used as an npm package in Node.js applications.
- Local-first extraction: Supports Ollama, LM Studio, or any OpenAI-compatible endpoint
- Schema-first workflow: Define your data structure upfront
- Deterministic output: Returns validated records or structured failures
- Token budget awareness: Automatic token counting with warnings and limits
- HTML preprocessing: Strip noise from web pages before extraction
- Dual-purpose: Use as a CLI or import as a library
- TypeScript support: Full type definitions included
```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base http://localhost:11434/v1 \
  --model llama3.1:8b \
  --debug
```

Sample schema (invoice.schema.json):

```json
{
  "fields": {
    "invoice_id": { "type": "string" },
    "amount": { "type": "number" },
    "currency": { "type": "string", "enum": ["USD", "SGD", "EUR"] },
    "date": { "type": "string", "format": "date-time", "optional": true }
  }
}
```

Works with any service exposing an OpenAI-compatible API:
- Ollama
- LM Studio
- OpenRouter
- Mistral
- Groq
- OpenAI
- vLLM servers
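All of these services speak the same `POST {base}/chat/completions` contract, so switching providers is just a matter of changing the base URL (and, for hosted providers, supplying an API key). A rough sketch of what an OpenAI-compatible request looks like — `buildChatRequest` is a hypothetical helper for illustration, not Ordis's internal client:

```typescript
// Hypothetical helper: builds an OpenAI-compatible chat completion request.
// Only the base URL, model, and (optional) key differ between providers;
// the payload shape is identical everywhere.
function buildChatRequest(baseURL: string, model: string, prompt: string, apiKey?: string) {
  return {
    url: `${baseURL.replace(/\/$/, '')}/chat/completions`,
    headers: {
      'Content-Type': 'application/json',
      // Hosted providers require a bearer token; local servers usually don't.
      ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
    }),
  };
}

const ollama = buildChatRequest('http://localhost:11434/v1', 'llama3.1:8b', 'hi');
const openai = buildChatRequest('https://api.openai.com/v1', 'gpt-4o-mini', 'hi', 'sk-...');
```

Swapping Ollama for OpenAI changes nothing but the URL and the presence of the `Authorization` header.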
Install globally to use the CLI anywhere:

```bash
npm install -g @ordis-dev/ordis
ordis --help
```

Or install locally in your project:

```bash
npm install @ordis-dev/ordis
```

Or build and run from source:

```bash
git clone https://github.com/ordis-dev/ordis
cd ordis
npm install
npm run build
node dist/cli.js --help
```

Extract data from text using a schema:
```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base http://localhost:11434/v1 \
  --model llama3.1:8b \
  --debug
```

With an API key (for providers like OpenAI, DeepSeek, etc.):

```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base https://api.deepseek.com/v1 \
  --model deepseek-chat \
  --api-key your-api-key-here
```

Enable JSON mode (for reliable JSON responses):

```bash
# OpenAI and compatible providers
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base https://api.openai.com/v1 \
  --model gpt-4o-mini \
  --api-key your-api-key \
  --json-mode

# Ollama (recommended: use /v1 endpoint for portability)
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base http://localhost:11434/v1 \
  --model qwen2.5:32b \
  --json-mode
```

💡 Note: For Ollama, use http://localhost:11434/v1 for maximum portability across providers. Both /v1 (OpenAI-compatible) and /api (native) endpoints work correctly with JSON mode.
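The endpoint-dependent behavior described in the note — OpenAI-compatible `/v1` endpoints take `response_format: { type: "json_object" }`, while Ollama's native `/api` endpoint takes `format: "json"` — can be sketched as a small selection function. This is a hypothetical illustration; Ordis's real detection lives inside its LLM client:

```typescript
// Hypothetical sketch: pick the JSON-mode request fields based on the
// endpoint style, mirroring the /v1 vs /api behavior described above.
function jsonModeFields(baseURL: string): Record<string, unknown> {
  // Ollama's native API is served under /api on port 11434.
  const isOllamaNative = baseURL.includes(':11434') && baseURL.includes('/api');
  return isOllamaNative
    ? { format: 'json' } // Ollama native endpoint
    : { response_format: { type: 'json_object' } }; // OpenAI-compatible endpoints
}

const v1Fields = jsonModeFields('http://localhost:11434/v1');
const nativeFields = jsonModeFields('http://localhost:11434/api');
```

This is why the `/v1` endpoint is the portable choice: the same `response_format` field then works unchanged against OpenAI, Groq, vLLM, and the rest.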
Use ordis as a library in your Node.js application:

```typescript
import { extract, loadSchema, loadSchemaFromObject } from '@ordis-dev/ordis';

// Load schema from file
const schema = await loadSchema('./invoice.schema.json');

// Or create the schema from a plain object:
// const schema = loadSchemaFromObject({
//   fields: {
//     invoice_id: { type: 'string' },
//     amount: { type: 'number' },
//     currency: { type: 'string', enum: ['USD', 'EUR', 'SGD'] }
//   }
// });

// Configure LLM
const llmConfig = {
  baseURL: 'http://localhost:11434/v1',
  model: 'llama3.2:3b'
};

// Extract data
const result = await extract({
  input: 'Invoice #INV-2024-0042 for $1,250.00 USD',
  schema,
  llmConfig
});

if (result.success) {
  console.log(result.data);
  // { invoice_id: 'INV-2024-0042', amount: 1250, currency: 'USD' }
  console.log('Confidence:', result.confidence);
} else {
  console.error('Extraction failed:', result.errors);
}
```

Using LLM Presets:
```typescript
import { extract, loadSchema, LLMPresets } from '@ordis-dev/ordis';

const schema = await loadSchema('./schema.json');

// Use preset configurations
const result = await extract({
  input: text,
  schema,
  llmConfig: LLMPresets.ollama('llama3.2:3b')
  // Or: LLMPresets.openai(apiKey, 'gpt-4o-mini')
  // Or: LLMPresets.lmStudio('local-model')
});

// Enable JSON mode (provider auto-detected from baseURL)
const resultWithJsonMode = await extract({
  input: text,
  schema,
  llmConfig: {
    baseURL: 'http://localhost:11434/v1',
    model: 'qwen2.5:32b',
    jsonMode: true // Auto-detects Ollama, uses format: "json"
  }
});

// Explicit provider override
const resultExplicit = await extract({
  input: text,
  schema,
  llmConfig: {
    baseURL: 'https://api.openai.com/v1',
    model: 'gpt-4o-mini',
    apiKey: process.env.OPENAI_API_KEY,
    jsonMode: true,
    provider: 'openai' // Uses response_format: { type: "json_object" }
  }
});
```

Extracting from HTML:
```typescript
import { extract, loadSchema } from '@ordis-dev/ordis';

const schema = await loadSchema('./schema.json');

// Strip HTML noise before extraction
const result = await extract({
  input: rawHtmlContent,
  schema,
  llmConfig: { baseURL: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
  preprocessing: {
    stripHtml: true // Removes scripts, styles, nav, ads, etc.
    // Or with options:
    // stripHtml: {
    //   preserveStructure: true, // Convert headings/lists to markdown
    //   removeSelectors: ['.sidebar', '#comments'],
    //   maxLength: 10000
    // }
  }
});
```

- ✅ Schema loader and validator
- ✅ Prompt builder with confidence scoring
- ✅ Universal LLM client (OpenAI-compatible APIs)
- ✅ Token budget awareness with warnings and errors
- ✅ Structured error system
- ✅ CLI extraction command
- ✅ Programmatic API for library usage
- ✅ Field-level confidence tracking
- ✅ TypeScript type definitions
- ✅ Performance benchmarks
- ✅ HTML preprocessing for noisy web content
Pipeline overhead is negligible (~1-2ms). LLM calls dominate execution time (1-10s depending on model). See benchmarks/README.md for detailed metrics.
Run benchmarks:

```bash
npm run benchmark
```

Completed in v0.6.1:
- ✅ Fixed JSON mode with Ollama /v1 endpoint (#81)
- Automatic endpoint detection (response_format for /v1, format for /api)
- Improved documentation with endpoint comparison and recommendations
Completed in v0.6.0:
- ✅ JSON mode support for OpenAI and Ollama providers (#78)
- Auto-detection based on base URL
- Eliminates parsing failures from non-JSON responses
- Works with both Ollama (format: "json") and OpenAI (response_format)
Completed in v0.5.1:
- ✅ Default context window increased to 32k (was 4096)
- ✅ Markdown-wrapped JSON parsing (#74)
- ✅ AMD GPU support in benchmarks (rocm-smi detection)
- ✅ GPU health monitoring in benchmarks (VRAM pressure, utilization)
Completed in v0.5.0:
- ✅ Type coercion for LLM output (#71)
  - Automatic string-to-number/boolean coercion
  - Null-like string handling ("null"/"none"/"n/a")
  - Enum case-insensitive matching ("Series B" → "series_b")
  - Date format normalization (US, EU, written formats)
- ✅ Array of objects support (#70)
  - Nested object schemas with recursive validation
  - Proper error paths (e.g., "items[1].price")
- ✅ Ollama runtime options (num_ctx, num_gpu)
Completed in v0.4.0:
- ✅ User-friendly error messages (#63)
  - Emoji indicators (❌, 💡, ℹ️) for quick scanning
  - Expected vs. actual values for validation errors
  - Actionable suggestions for common issues
  - Service-specific troubleshooting (Ollama, LM Studio, OpenAI)
- ✅ Debug mode enhancements
  - Full LLM request/response logging
  - Token usage breakdown
See CHANGELOG.md for complete version history.
Contributions are welcome!