IAB Content Taxonomy Mapper (Python)

View on PyPI • View on GitHub • Open Web Tool

Map IAB Content Taxonomy 2.x labels/codes to IAB 3.0 locally with a deterministic → fuzzy → (optional) semantic pipeline.

This is the Python implementation. For JavaScript/TypeScript, see @mixpeek/iab-mapper.

🔧 Install

From PyPI (recommended)

pip install iab-mapper

From source

cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
# Optional (enable local embeddings / KNN search)
pip install -e ".[emb]"

🚀 Quick Start

# simplest path: fuzzy only, CSV in → JSON out
iab-mapper sample_2x_codes.csv -o mapped.json

# enable local embeddings (improves recall on free‑text labels)
iab-mapper sample_2x_codes.csv -o mapped.json --use-embeddings

🐍 Python API

from pathlib import Path
from iab_mapper.pipeline import Mapper, MapConfig
import iab_mapper as pkg

# Use packaged stub catalogs or point data_dir to your own
data_dir = Path(pkg.__file__).parent / "data"

cfg = MapConfig(
    fuzzy_method="bm25",   # rapidfuzz|tfidf|bm25
    fuzzy_cut=0.92,
    use_embeddings=False,   # set True and choose emb_model to enable
    max_topics=3,
    drop_scd=False,
    cattax="2",            # OpenRTB content.cattax enum
    overrides_path=None     # path to JSON overrides if desired
)

mapper = Mapper(cfg, str(data_dir))

# Single record with optional vectors
rec = {
    "code": "2-12",
    "label": "Food & Drink",
    "channel": "editorial",
    "type": "article",
    "format": "video",
    "language": "en",
    "source": "professional",
    "environment": "ctv",
}

out = mapper.map_record(rec)
print(out["out_ids"])         # topic + vector IDs
print(out["openrtb"])         # {"content": {"cat": [...], "cattax": "2"}}
print(out["vast_contentcat"]) # "id1","id2",...

# Or just map topics
topics = mapper.map_topics("Cooking how-to")

# Batch over a list of dicts
rows = [rec, {"label": "Sports"}]
mapped = [mapper.map_record(r) for r in rows]

⚙️ Useful Flags

Flag	Default	What it does
`--fuzzy-cut`	`0.92`	Stricter = fewer, higher-confidence matches
`--use-embeddings`	off	Enable local embeddings for near-miss labels
`--emb-model`	`all-MiniLM-L6-v2`	Sentence-Transformers model or `tfidf`
`--emb-cut`	`0.80`	Cosine similarity threshold for embeddings
`--max-topics`	`3`	Cap topic IDs per row
`--drop-scd`	off	Exclude Sensitive Content nodes
`--cattax`	`2`	OpenRTB `content.cattax` enum
`--unmapped-out`	—	Write misses to file for audit
`--overrides`	—	Force mappings before match

🖥️ Web Demo

cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements-dev.txt
uvicorn scripts.web_server:app --port 8000 --reload

Open http://localhost:8000/

📜 License

BSD 2-Clause. See LICENSE.

For full documentation, see the main README.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IAB Content Taxonomy Mapper (Python)

🔧 Install

From PyPI (recommended)

From source

🚀 Quick Start

🐍 Python API

⚙️ Useful Flags

🖥️ Web Demo

📜 License

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

IAB Content Taxonomy Mapper (Python)

🔧 Install

From PyPI (recommended)

From source

🚀 Quick Start

🐍 Python API

⚙️ Useful Flags

🖥️ Web Demo

📜 License