Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
e7d633d
refactor: remove table and lexicon modules and update dependencies
ericpan64 Nov 25, 2025
f5ca33c
refactor: get -> grab
ericpan64 Nov 26, 2025
645955a
refactor: add DROP enum
ericpan64 Nov 26, 2025
fa10d00
feat: introduce KEEP wrapper for empty value preservation
ericpan64 Nov 26, 2025
cccc26a
feat: add process_output function and update documentation
ericpan64 Nov 26, 2025
6f65ce0
feat: add @mapper decorator for data transformations and update docs
ericpan64 Nov 26, 2025
c6d9bed
feat: add mapping_context for strict mode
ericpan64 Nov 26, 2025
c995696
chore: remove unused legacy code
ericpan64 Nov 26, 2025
0d271a2
test: rewrite tests for new API
ericpan64 Nov 26, 2025
12ef112
docs: finalize README for new API
ericpan64 Nov 26, 2025
9640feb
rename
ericpan64 Nov 26, 2025
187122f
feat(validation): add types.py with Result type and aliases
ericpan64 Nov 26, 2025
f7201bf
feat(validation): add core.py with V, DictV, ListV validators
ericpan64 Nov 26, 2025
d80c5c0
feat(validation): add validators.py with built-in validators
ericpan64 Nov 26, 2025
280b474
feat(validation): add core.py with V, DictV, ListV classes
ericpan64 Nov 26, 2025
8a33cea
feat(validation): add validators.py with built-in validators
ericpan64 Nov 26, 2025
2d44c20
feat(validation): add schema.py with validate() and to_pydantic()
ericpan64 Nov 26, 2025
b9233a8
feat(validation): add __init__.py with public API exports
ericpan64 Nov 26, 2025
23e75b8
test(validation): add comprehensive tests for validation module
ericpan64 Nov 26, 2025
c7ccb79
docs: add validation module documentation to README
ericpan64 Nov 26, 2025
66eeacd
cleanup
ericpan64 Nov 26, 2025
694acea
refactor: update process.py, core.py and drop.py modules
ericpan64 Nov 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
355 changes: 257 additions & 98 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,129 +1,288 @@
# chidian - <ins alt="chi">chi</ins>meric <ins alt="d̲">d</ins>ata <ins alt="i̲">i</ins>nterch<ins alt="a̲n̲">an</ins>ge
# chidian

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> Declarative, type-safe data mapping for savvy data engineers
> Dict-to-dict data mappings that look like dicts

**chidian** is a composable framework for building readable data transformations with **Pydantic v2**.
**chidian** lets you write data transformations as plain dictionaries. Your mapping *looks like* your output.

## Quick Start

```python
from pydantic import BaseModel
from chidian import Mapper
import chidian.partials as p

# Source data (nested)
source_data = {
"name": {"first": "Gandalf", "given": ["the", "Grey"], "suffix": None},
"address": {
"street": ["Bag End", "Hobbiton"],
"city": "The Shire",
"postal_code": "ME001",
"country": "Middle Earth"
from chidian import mapper, grab

@mapper
def patient_summary(d):
return {
"patient_id": grab(d, "data.patient.id"),
"is_active": grab(d, "data.patient.active"),
"latest_visit": grab(d, "data.visits[0].date"),
}
}

# Target data (flat)
target = {
"full_name": "Gandalf the Grey",
"address": "Bag End\nHobbiton\nThe Shire\nME001\nMiddle Earth"
source = {
"data": {
"patient": {"id": "p-123", "active": True},
"visits": [
{"date": "2024-01-15", "type": "checkup"},
{"date": "2024-02-20", "type": "followup"}
]
}
}

# Define schemas
class SourceSchema(BaseModel):
name: dict
address: dict

class TargetSchema(BaseModel):
full_name: str
address: str

# Create type-safe mapper
person_mapping = Mapper(
{
"full_name": p.get([
"name.first",
"name.given[*]",
"name.suffix"
]).join(" ", flatten=True),

"address": p.get([
"address.street[*]",
"address.city",
"address.postal_code",
"address.country"
]).join("\n", flatten=True),
},
min_input_schemas=[SourceSchema],
output_schema=TargetSchema,
)
result = patient_summary(source)
# {"patient_id": "p-123", "is_active": True, "latest_visit": "2024-01-15"}
```

## Core Idea

Write your mapping as the dict you want back:

```python
from chidian import mapper, grab, DROP, KEEP

@mapper
def normalize_user(d):
return {
# Static values — just write them
"version": "2.0",

# Pull from source
"name": grab(d, "user.name"),

# Nested output — nest your mapping
"address": {
"city": grab(d, "location.city"),
"zip": grab(d, "location.postal"),
},

# Conditionally drop
"risky_field": DROP.THIS_OBJECT if not grab(d, "verified") else grab(d, "data"),
}
```

Decorated functions are shareable, testable, and composable:

```python
# Import and use in your codebase
from myproject.mappings import normalize_user

result = normalize_user(raw_data)
```

## `grab(data, path)`

Extract values using dot notation and bracket indexing:

```python
grab(d, "user.name") # Nested access
grab(d, "items[0]") # List index
grab(d, "items[-1]") # Negative index
grab(d, "users[*].name") # Map over list
```

## `DROP` — Conditional Removal

Control what gets excluded from output. `DROP` propagates upward through the structure:

| Sentinel | Effect |
|----------|--------|
| `DROP.THIS_OBJECT` | Remove this value (or list item, or dict) |
| `DROP.PARENT` | Remove the parent container |
| `DROP.GRANDPARENT` | Remove two levels up |
| `DROP.GREATGRANDPARENT` | Remove three levels up (raises if out of bounds) |

```python
@mapper
def with_drops(d):
return {
"kept": {"id": grab(d, "data.patient.id")},
"dropped": {
"trigger": DROP.THIS_OBJECT, # This whole dict removed
"ignored": "never appears",
},
"items": [
{"bad": DROP.PARENT, "also_ignored": "x"}, # Removes entire list
{"good": "value"},
],
}

# Result: {"kept": {"id": "..."}}
```

**In lists**, `DROP.THIS_OBJECT` removes just that item:

```python
@mapper
def filter_list(d):
return {
"tags": [
"first_kept",
DROP.THIS_OBJECT, # Removed
"third_kept",
{"nested": DROP.THIS_OBJECT}, # Entire dict removed
],
}

# Result: {"tags": ["first_kept", "third_kept"]}
```

## `KEEP` — Preserve Empty Values

# Execute
result = person_mapping(SourceSchema(**source_data))
assert result == TargetSchema(**target)
By default, empty values (`{}`, `[]`, `""`, `None`) are removed. Wrap with `KEEP()` to preserve them:

```python
from chidian import KEEP

@mapper
def with_empties(d):
return {
"explicit_empty": KEEP({}), # Preserved as {}
"explicit_none": KEEP(None), # Preserved as None
"implicit_empty": {}, # Removed by default
"normal_value": "hello",
}

# Result: {"explicit_empty": {}, "explicit_none": None, "normal_value": "hello"}
```

## Decorator Options

```python
@mapper(remove_empty=False)
def keep_all_empties(d):
return {
"empty_dict": {}, # Kept
"empty_list": [], # Kept
"none_val": None, # Kept
}
```

## Strict Mode

Catch missing keys during development:

```python
from chidian import mapper, grab, mapping_context

@mapper
def risky_mapping(d):
return {
"id": grab(d, "data.patient.id"),
"missing": grab(d, "key.not.found"), # Doesn't exist
}

# Normal — missing keys become empty/removed
result = risky_mapping(source)

# Strict — raises KeyError on missing keys
with mapping_context(strict=True):
risky_mapping(source) # KeyError!
```

**Note**: Strict mode distinguishes between "key not found" and "key exists with `None` value":

```python
source = {"has_none": None}

@mapper
def check_none(d):
return {
"explicit_none": grab(d, "has_none"), # OK — key exists, value is None
"missing": grab(d, "does.not.exist"), # Raises in strict mode
}
```

## Core Features
## Validation

| Component | Purpose |
| ---------------- | ------------------------------------------------------------------------ |
| **Mapper** | Dict→dict transformations with optional schema validation |
| **DataMapping** | Pydantic-validated, type-safe transformations |
| **Partials API** | Composable operators for concise transformation chains |
| **Table** | Sparse tables with path queries, joins, pandas/polars interop |
| **Lexicon** | Bidirectional code lookups (e.g., LOINC ↔ SNOMED) with metadata |
chidian includes a dict-like validation DSL that mirrors your data structure:

## Table & DataFrames
```python
from chidian.validation import Required, Optional, validate, to_pydantic, Gte, InSet

schema = {
"name": Required(str),
"email": Optional(str),
"age": int & Gte(0),
"role": InSet({"admin", "user"}),
"tags": [str],
"profile": {
"bio": Optional(str),
"avatar_url": str,
},
}

Seamless conversion between chidian Tables and pandas/polars:
# Validate data
data = {"name": "Alice", "age": 30, "role": "admin", "tags": ["python"]}
result = validate(data, schema)

```bash
pip install 'chidian[pandas]' # pandas support
pip install 'chidian[polars]' # polars support
pip install 'chidian[df]' # both
if result.is_ok():
print("Valid!", result.value)
else:
for path, msg in result.error:
print(f" {'.'.join(map(str, path))}: {msg}")
```

### Composing Validators

Use `&` (and) and `|` (or) to combine validators:

```python
from chidian.table import Table
from chidian.validation import IsType, Gt, Matches

# Both must pass
positive_int = IsType(int) & Gt(0)

table = Table([
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25}
])
# Either can pass
str_or_int = str | int

df_pd = table.to_pandas(index=True)
df_pl = table.to_polars(add_index=True)
# With regex
email = str & Matches(r"^[\w.-]+@[\w.-]+\.\w+$")
```

### Flatten Nested Data
### Pydantic Integration

Convert nested structures into flat, column-based tables:
Compile schemas to Pydantic models for runtime validation:

```python
table = Table([
{"user": {"name": "John", "prefs": ["email", "sms"]}, "id": 123},
{"user": {"name": "Jane", "prefs": ["phone"]}, "id": 456}
])

# Flatten with intuitive path notation
flat = table.flatten()
print(flat.columns)
# {'id', 'user.name', 'user.prefs[0]', 'user.prefs[1]'}

# Export flattened data
table.to_pandas(flatten=True)
table.to_polars(flatten=True)
table.to_csv("flat.csv", flatten=True)

# Control flattening behavior
table.flatten(max_depth=2, array_index_limit=5)
User = to_pydantic("User", {
"name": Required(str),
"email": Optional(str),
"age": int,
})

user = User(name="Alice", age=30) # Full Pydantic validation
```

**Features:**
- Path notation: `user.name`, `items[0]`, `data.settings.theme`
- Handles sparse data (different nesting per row)
- Special key escaping for dots/brackets
- Depth and array size controls
### Built-in Validators

| Validator | Description |
|-----------|-------------|
| `Required(v)` | Field cannot be None |
| `Optional(v)` | Field can be None |
| `IsType(t)` | Value must be instance of type |
| `InRange(lo, hi)` | Length must be in range |
| `InSet(values)` | Value must be in set |
| `Matches(pattern)` | String must match regex |
| `Gt`, `Gte`, `Lt`, `Lte` | Numeric comparisons |
| `Between(lo, hi)` | Value between bounds |
| `Predicate(fn, msg)` | Custom validation function |

## API Reference

### `@mapper` / `@mapper(remove_empty=True)`

Decorator that transforms a mapping function into a callable mapper.

### `grab(data, path)`

Extract values using dot notation and bracket indexing:

```python
grab(d, "user.name") # Nested access
grab(d, "items[0]") # List index
grab(d, "items[-1]") # Negative index
grab(d, "users[*].name") # Map over list
```

## Design Philosophy

Expand All @@ -137,7 +296,7 @@ Built by data engineers, for data engineers. chidian solves common pain points:
**Solutions:**
- **Iterate over perfection**: Learn and adapt as you build
- **Functions as first-class objects**: Compose transformations cleanly
- **JSON-first**: Simple, universal data structures
- **Keep things dict-like**: Simple, universal structure that's quick to read

chidian applies functional programming principles to data mappings, drawing inspiration from [Pydantic](https://github.com/pydantic/pydantic), [JMESPath](https://github.com/jmespath), [funcy](https://github.com/Suor/funcy), and others.

Expand All @@ -147,4 +306,4 @@ Contributions welcome! Open an issue to discuss your idea before submitting a PR

---

See [tests](/chidian/tests) for more examples.
See [tests](/tests) for more examples.
Loading
Loading