HonestRoles

HonestRoles, developed by Hypertrial, is a Python package that transforms raw job posting data into structured, scored, and searchable datasets.

Features

  • 🧹 Clean: HTML stripping, location normalization (city/region/country), salary parsing, and record deduplication.
  • 🔍 Filter: High-performance FilterChain with predicates for location, salary, skills, and keyword matching.
  • 🏷️ Label: Automated seniority detection, role categorization, and tech stack extraction.
  • ⭐️ Rate: Comprehensive job description scoring for completeness and quality.
  • 🤖 LLM Integration: Seamless use of local Ollama models (e.g., Llama 3) for deep semantic analysis.

Installation

pip install honestroles

For development:

git clone https://github.com/hypertrial/honestroles.git
cd honestroles
pip install -e ".[dev]"

Quickstart

import honestroles as hr
from honestroles import schema

# Load raw job data (Parquet or DuckDB)
df = hr.read_parquet("jobs_current.parquet")

# 1. Clean and normalize data
df = hr.clean_jobs(df)

# 2. Apply complex filtering
chain = hr.FilterChain()
chain.add(hr.filter.by_location, regions=["California", "New York"])
chain.add(hr.filter.by_salary, min_salary=120_000, currency="USD")
chain.add(hr.filter.by_skills, required=["Python", "React"])
df = chain.apply(df)

# 3. Label roles (Heuristics + LLM)
df = hr.label_jobs(df, use_llm=True, model="llama3")

# 4. Rate job quality
df = hr.rate_jobs(df)

# Access data using schema constants
print(df[[schema.TITLE, schema.CITY, schema.COUNTRY]].head())

# Save structured results
hr.write_parquet(df, "jobs_scored.parquet")

Contract-First Flow

For raw source data, apply contract normalization and validation before any further processing:

import honestroles as hr

df = hr.read_parquet("jobs_current.parquet", validate=False)
df = hr.normalize_source_data_contract(df)
df = hr.validate_source_data_contract(df)

df = hr.clean_jobs(df)
df = hr.filter_jobs(df, remote_only=True)
df = hr.label_jobs(df, use_llm=False)
df = hr.rate_jobs(df, use_llm=False)
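
If the frame violates the contract, validation is expected to raise rather than silently pass bad rows downstream. A minimal sketch of failing fast, assuming validate_source_data_contract raises on violations (the concrete exception type is an assumption, so the sketch catches broadly):

try:
    df = hr.validate_source_data_contract(df)
except Exception as exc:  # the package's concrete error type is an assumption
    raise SystemExit(f"Source data failed contract validation: {exc}")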

See /docs/start/quickstart.md and /docs/reference/source_data_contract_v1.md.

Documentation index: /docs/index.md. Docs stack: /docs/maintainers/docs_stack.md.

Build docs locally:

pip install -e ".[docs]"
mkdocs serve

Deploy docs on GitHub Pages:

  1. Ensure repository Settings -> Pages -> Build and deployment -> Source is set to GitHub Actions.
  2. Push to main to trigger .github/workflows/docs-pages.yml.

Core Modules

Schema Constants

Always use honestroles.schema for consistent column referencing:

from honestroles import schema

# Available constants:
# schema.TITLE, schema.DESCRIPTION_TEXT, schema.COMPANY
# schema.CITY, schema.REGION, schema.COUNTRY
# schema.SALARY_MIN, schema.SALARY_MAX, etc.
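
Because the constants are plain column names, they drop straight into ordinary DataFrame indexing. A minimal sketch, assuming the cleaned DataFrame from the Quickstart (the country value is illustrative):

from honestroles import schema

# Select a readable subset of columns without hard-coding string literals.
us_jobs = df[df[schema.COUNTRY] == "United States"]
print(us_jobs[[schema.TITLE, schema.COMPANY, schema.CITY]].head())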

Filtering with FilterChain

The FilterChain allows you to compose multiple filtering rules efficiently:

import honestroles as hr
from honestroles import FilterChain, filter_jobs, schema

# Functional approach:
df = filter_jobs(df, remote_only=True, min_salary=100_000)

# Composable approach:
chain = FilterChain()
chain.add(hr.filter.by_keywords, include=["Engineer"], exclude=["Manager"])
chain.add(hr.filter.by_completeness, required_fields=[schema.DESCRIPTION_TEXT, schema.APPLY_URL])
filtered_df = chain.apply(df)
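
Custom rules can sit alongside the built-ins. A minimal sketch of a user-defined predicate, assuming each predicate receives the DataFrame plus its keyword arguments and returns the filtered DataFrame (the calling convention is inferred from the built-in usage above):

from honestroles import schema

def by_title_length(df, max_len=120):
    # Drop rows with overlong, keyword-stuffed titles.
    return df[df[schema.TITLE].str.len() <= max_len]

chain.add(by_title_length, max_len=80)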

Local LLM Usage (Ollama)

Ensure Ollama is running locally:

ollama serve
ollama pull llama3

Then enable LLM-based labeling or quality rating:

df = hr.label_jobs(df, use_llm=True, model="llama3")
df = hr.rate_jobs(df, use_llm=True, model="llama3")
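
For large batches it can help to confirm the server is reachable before opting in. A minimal sketch, assuming Ollama's default local port (11434); the fallback to heuristics-only labeling is illustrative, not part of the honestroles API:

import requests

def ollama_available(url="http://localhost:11434"):
    # A running Ollama server answers plain GET requests at its root.
    try:
        return requests.get(url, timeout=2).ok
    except requests.RequestException:
        return False

df = hr.label_jobs(df, use_llm=ollama_available(), model="llama3")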

Package Layout

src/honestroles/
├── clean/        # HTML stripping, normalization, and dedup
├── filter/       # Composed FilterChain and predicates
├── io/           # Parquet and DuckDB I/O with validation
├── label/        # Seniority, Category, and Tech Stack labeling
├── llm/          # Ollama client and prompt templates
├── rate/         # Completeness, Quality, and Composite ratings
└── schema.py     # Centralized column name constants
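
The directory names map onto the public namespace: predicates from filter/ are reachable as hr.filter.*, and schema.py provides the constants shown above. A minimal sketch (the Ontario region value is illustrative):

import honestroles as hr
from honestroles import schema  # constants from schema.py

chain = hr.FilterChain()
chain.add(hr.filter.by_location, regions=["Ontario"])  # predicate from filter/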

Testing

Run the test suite with pytest:

pytest

To run all CI-equivalent quality checks automatically before each local commit, set up pre-commit:

pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files

This installs a Git pre-commit hook that runs ruff, mypy, and pytest -m "not performance" -q.
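
The same checks can also be run by hand, e.g. after a commit made with --no-verify. A minimal sketch of the individual invocations, assuming the standard entry points for each tool (the exact mypy target may differ from the repository's configuration):

ruff check .
mypy src
pytest -m "not performance" -q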

Stability

  • Changelog: /CHANGELOG.md
  • Performance guardrails: /docs/maintainers/performance.md
  • Docs index: /docs/index.md
