Releases: ksanyok/TextHumanize
v0.11.0 — 3x Dictionary Expansion + Composer Fix
What's New
Massive Dictionary Expansion (3x total)
All 9 language dictionaries expanded from 2,281 to 6,881 entries (3.0x growth):
| Language | Before | After | Growth |
|---|---|---|---|
| English | 257 | 1,391 | 5.4x |
| Russian | 291 | 956 | 3.3x |
| Ukrainian | 252 | 780 | 3.1x |
| German | 235 | 724 | 3.1x |
| French | 263 | 599 | 2.3x |
| Spanish | 255 | 613 | 2.4x |
| Italian | 244 | 616 | 2.5x |
| Polish | 244 | 617 | 2.5x |
| Portuguese | 240 | 585 | 2.4x |
All 9 categories expanded: synonyms, bureaucratic words/phrases, AI connectors, sentence starters, colloquial markers, perplexity boosters, split conjunctions, abbreviations.
Bug Fixes
- Composer package name — root `composer.json` had the incorrect name `ksanyok/texthumanize` (no hyphen). Fixed to `ksanyok/text-humanize`. Also changed `type` from `project` to `library` with proper Packagist metadata.
- TOC dots preservation — table-of-contents leader dots (`...........`) no longer collapse into an ellipsis.
Install
```bash
# Python
pip install texthumanize

# PHP
composer require ksanyok/text-humanize
```

1,455 tests passing.
v0.10.0 — Grammar, Uniqueness, Health Score, Semantic & Sentence Readability
What's New in v0.10.0
5 New Analysis Modules (all offline, no ML/API)
| Module | Function | Description |
|---|---|---|
| Grammar Checker | `check_grammar()` / `fix_grammar()` | Rule-based grammar checking for 9 languages |
| Uniqueness Score | `uniqueness_score()` / `compare_texts()` | N-gram fingerprinting uniqueness analysis |
| Content Health | `content_health()` | Composite quality: readability + grammar + uniqueness + AI + coherence |
| Semantic Similarity | `semantic_similarity()` | Measures semantic preservation between original and processed text |
| Sentence Readability | `sentence_readability()` | Per-sentence difficulty scoring (easy/medium/hard/very_hard) |
Custom Dictionary API
```python
result = humanize(text, custom_dict={
    "implement": "build",
    "utilize": ["use", "apply", "employ"],  # random pick
})
```

Massively Expanded Dictionaries
All 9 language dictionaries balanced (367-439 entries each):
- FR: 281→397, ES: 275→388, IT: 272→379, PL: 257→368, PT: 256→367
- EN/RU/UK: added perplexity_boosters
Stats
- 28 files changed, +2333 lines
- 1455 tests passing (82 new)
- 17 new public exports
- Zero external dependencies
v0.9.0 — Kirchenbauer Watermark, HTML Diff, Quality Gate, Selective Humanization, Stylometric Anonymizer
What's New
Kirchenbauer Watermark Detector
Green-list z-test based on Kirchenbauer et al. 2023. Uses SHA-256 hash of previous token to partition vocabulary into green/red lists (γ=0.25), computes z-score and p-value. Flags AI watermark at z ≥ 4.0.
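Given T scored tokens of which n_G landed on the green list, the Kirchenbauer z-statistic is z = (n_G − γT) / sqrt(T·γ·(1−γ)). A minimal self-contained sketch of the hash-seeded partition and counting loop (illustrative only — helper names and hashing details are assumptions, not TextHumanize's internals):

```python
import hashlib
import math

GAMMA = 0.25  # green-list fraction

def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Seed a vocabulary partition with SHA-256 of the previous token;
    the current token is 'green' if its seeded hash falls in the
    bottom `gamma` slice of the hash range."""
    seed = hashlib.sha256(prev_token.encode()).hexdigest()
    h = hashlib.sha256((seed + token).encode()).digest()
    return h[0] / 256.0 < gamma

def watermark_z(tokens: list[str], gamma: float = GAMMA) -> float:
    """z-score of the observed green count vs. the gamma*T expectation."""
    pairs = list(zip(tokens, tokens[1:]))
    t = len(pairs)
    if t == 0:
        return 0.0
    n_green = sum(is_green(prev, tok) for prev, tok in pairs)
    return (n_green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

Unwatermarked text should hover near z ≈ 0; the detector flags AI watermarking at z ≥ 4.0, which corresponds to a one-sided p-value of roughly 3×10⁻⁵.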
```python
from texthumanize import detect_watermarks

report = detect_watermarks(text)
print(report.kirchenbauer_score, report.kirchenbauer_p_value)
```

HTML Diff Report
explain() now supports multiple output formats:
```python
html = explain(result, fmt='html')      # self-contained HTML page
json_str = explain(result, fmt='json')  # RFC 6902 JSON Patch
diff = explain(result, fmt='diff')      # unified diff
```

Quality Gate
CLI + GitHub Action + pre-commit hook to check text for AI artifacts:
```bash
python -m texthumanize.quality_gate README.md docs/ --ai-threshold 25
```

Selective Humanization
Process only AI-flagged sentences, leaving human text untouched:
```python
result = humanize(text, only_flagged=True)
```

Stylometric Anonymizer
Disguise authorship by transforming text toward a target style:
```python
from texthumanize import anonymize_style

result = anonymize_style(text, target='blogger')
```

Stats
- 1,373 Python tests passing
- 40 new tests for v0.9.0 features
- Ruff lint clean
- 22 files changed, 1,637 additions
v0.8.1 — Dual License, Commercial Pricing, Enterprise-Ready
What's New
Licensing:
- Dual license model — free for personal/academic use, commercial licenses from $99/year
- COMMERCIAL.md — dedicated page with pricing tiers, feature comparison, FAQ
- Clear LICENSE file with no ambiguity for legal/compliance teams
Benchmarks:
- `benchmarks/full_benchmark.py` — reproducible benchmark suite (speed, memory, predictability, AI detection, quality)
- Real measured data in README: 26K-38K chars/sec, ~2.5 MB peak memory, 100% determinism
Enterprise Documentation:
- "For Business & Enterprise" section with corporate requirements mapping
- Processing Modes: `normalize` / `style_soft` / `rewrite` with audit trail
- Change Report section with `explain()` examples
- Predictability guarantees with seed-based determinism proof
Commercial License Tiers
| Tier | Price | Includes |
|---|---|---|
| Indie | $99/yr | 1 project, 1 dev |
| Startup | $299/yr | 3 projects, 5 devs |
| Business | $799/yr | Unlimited, 20 devs |
| Enterprise | Contact us | On-prem, SLA |
Install / Update
```bash
pip install git+https://github.com/ksanyok/TextHumanize.git@v0.8.1
```

Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
v0.8.0 — Style Presets, Auto-Tuner, Semantic Guards, TS/JS Port
🎯 Highlights
The most feature-complete release yet — 27,000+ lines of code across 3 platforms.
Added
- Style Presets — 5 predefined targets: student, copywriter, scientist, journalist, blogger
- Auto-Tuner — feedback loop that learns optimal intensity from processing history
- Semantic preservation guards — expanded context guards with 20+ patterns across EN/RU/UK/DE
- Typography-only fast path — AI ≤ 5% skips semantic stages entirely
- TypeScript/JavaScript port — full pipeline with adaptive intensity (28 tests)
- Complete documentation rewrite — README (2500+ lines), API Reference (660+ lines), Cookbook (14 recipes)
Changed
- change_ratio calculation — switched to SequenceMatcher (fixes critical inflation bug)
- Graduated retry — retries at lower intensity instead of full rollback
- German dictionaries — bureaucratic 22→64, phrases 14→25, connectors 12→20, synonyms 26→45
Fixed
- DE zero-change bug (dictionary contained only infinitives)
- Natural text over-processing (AI ≤ 5%)
- Validator change_ratio consistency
📊 Stats
| Platform | Lines of Code | Tests |
|---|---|---|
| Python | 16,820 | 1,333 |
| PHP | 10,000 | 223 |
| TypeScript | 1,031 | 28 |
| Total | 27,851 | 1,584 |
Benchmark: 100% (45/45) · Coverage: 99% · Speed: 56K chars/sec
Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
Install / Update
```bash
pip install git+https://github.com/ksanyok/TextHumanize.git@v0.8.0
```

v0.7.0 — AI Detection 2.0, C2PA Watermarks, Streaming
Added
- 13 AI-detection metrics — new perplexity_score metric (character-level trigram model)
- Ensemble boosting — 3-classifier aggregation: base weighted sum (50%), strong-signal detector (30%), majority voting (20%). 90.9% accuracy
- Benchmark suite — 11 labeled samples, per-label accuracy breakdown
- CLI detect subcommand — `texthumanize detect [file]` with emoji verdicts
- Streaming progress callback — `humanize_batch(texts, on_progress=callback)`
- C2PA / IPTC watermark detection — content provenance pattern detection
- Tone replacements for UK/DE/FR/ES — informal ↔ formal replacement pairs
- PHP examples/ — basic_usage.php & advanced.php
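The ensemble boosting described above combines three classifier outputs with fixed 50% / 30% / 20% weights. As a plain illustration of that aggregation (the actual classifiers are internal to the library; this is only the weighting step):

```python
def ensemble_score(base: float, strong_signal: float, majority: float) -> float:
    """Aggregate three classifier outputs (each in 0..1) with the
    50% / 30% / 20% weighting from the release notes."""
    weights = (0.50, 0.30, 0.20)
    scores = (base, strong_signal, majority)
    return sum(w * s for w, s in zip(weights, scores))
```

Because the weights sum to 1, the combined score stays in the same 0..1 range as its inputs, so the same threshold logic can be applied before and after boosting.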
Changed
- Zipf metric rewritten — log-log linear regression with R² goodness-of-fit
- Confidence formula — 4-component with text length, metric agreement, extreme bonus
- Grammar detection expanded — 5 → 9 indicators
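The rewritten Zipf metric fits log-frequency against log-rank by least squares and reports the R² goodness-of-fit. A minimal version of such a fit — a sketch under those assumptions, not the package's code — looks like this:

```python
import math
from collections import Counter

def zipf_fit(text: str) -> tuple[float, float]:
    """Least-squares fit of log(freq) = a + b*log(rank); returns
    (slope, R^2). Natural text tends toward a slope near -1."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    if len(freqs) < 2:
        return 0.0, 0.0
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    # R^2 for simple linear regression is the squared Pearson correlation
    r2 = (sxy ** 2) / (sxx * syy) if syy else 1.0
    return slope, r2
```

A high R² means the rank-frequency curve is genuinely Zipf-like; a low R² flags vocabulary distributions that deviate from natural-language statistics.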
Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
Install
```bash
pip install git+https://github.com/ksanyok/TextHumanize.git@v0.7.0
```

v0.6.0 — Batch Processing, Quality Metrics, 99% Coverage
Added
- humanize_batch() / humanizeBatch() — batch processing (Python + PHP)
- HumanizeResult.similarity — Jaccard similarity metric (0..1)
- HumanizeResult.quality_score — overall quality score (0..1)
- 1255 Python tests — up from 500, with 99% code coverage
- 223 PHP tests (825 assertions)
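The new `HumanizeResult.similarity` is described as a Jaccard metric. For reference, Jaccard similarity over word sets is |A ∩ B| / |A ∪ B| — a quick self-contained sketch (the library's token granularity may differ):

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index of the two texts' word sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

A value near 1.0 means the humanized output kept most of the original vocabulary; values near 0.0 indicate heavy rewriting.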
Changed
- Python test coverage 85% → 99% (28 of 38 modules at 100%)
- mypy clean — 0 type errors across all 38 source files
- Dead code removed — 11 unreachable blocks cleaned up
Fixed
- ToneAnalyzer MARKETING direction
- PHP SentenceSplitter Cyrillic support
- 37 mypy type errors fixed
Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md
Install
```bash
pip install git+https://github.com/ksanyok/TextHumanize.git@v0.6.0
```

v0.5.0 — Code Quality and Coverage
What's New in v0.5.0
Quality Engineering
- 500 tests - up from 382, covering 85% of the codebase (was 80%)
- Zero lint errors - fixed all 67 ruff errors across the project
- PEP 561 compliance - py.typed marker for downstream type checkers
- Pre-commit hooks - ruff lint+format, trailing whitespace, YAML/TOML checks
- mypy integration - type checking configuration in pyproject.toml
- Enhanced CI/CD - ruff lint step + mypy type check + XML coverage output
Coverage Improvements
| Module | Before | After |
|---|---|---|
| morphology.py | 55% | 93% |
| coherence.py | 68% | 96% |
| paraphrase.py | 71% | 87% |
| watermark.py | 74% | 87% |
| Overall | 80% | 85% |
PHP Fixes
- SentenceSplitter - PREG_OFFSET_CAPTURE offset properly cast to int
- ToneAnalyzer - preg_match offset cast to int for mb_substr() compatibility
Full Changelog
See CHANGELOG.md for details.
Full backward compatibility with v0.4.0 - no breaking changes.