
Releases: ksanyok/TextHumanize

v0.11.0 — 3x Dictionary Expansion + Composer Fix

20 Feb 12:05


What's New

Massive Dictionary Expansion (3x total)

All 9 language dictionaries expanded from 2,281 to 6,881 entries (3.0x growth):

| Language | Before | After | Growth |
|---|---|---|---|
| English | 257 | 1,391 | 5.4x |
| Russian | 291 | 956 | 3.3x |
| Ukrainian | 252 | 780 | 3.1x |
| German | 235 | 724 | 3.1x |
| French | 263 | 599 | 2.3x |
| Spanish | 255 | 613 | 2.4x |
| Italian | 244 | 616 | 2.5x |
| Polish | 244 | 617 | 2.5x |
| Portuguese | 240 | 585 | 2.4x |

All 9 categories expanded: synonyms, bureaucratic words/phrases, AI connectors, sentence starters, colloquial markers, perplexity boosters, split conjunctions, abbreviations.

Bug Fixes

  • Composer package name — the root composer.json declared the package as ksanyok/texthumanize (missing hyphen); renamed to ksanyok/text-humanize, and type changed from project to library with proper Packagist metadata.
  • TOC dots preservation — table-of-contents leader dots (...........) no longer collapse into ellipsis.
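The distinction the fix draws can be illustrated with a small regex sketch (hypothetical, not the library's actual code): collapse a run of exactly three dots into an ellipsis, but leave longer leader-dot runs alone.

```python
import re

def collapse_ellipsis(text: str) -> str:
    """Collapse runs of exactly three dots into an ellipsis character,
    while leaving longer leader-dot runs (TOC fillers) untouched."""
    # Lookarounds ensure the matched run is exactly three dots long.
    return re.sub(r"(?<!\.)\.{3}(?!\.)", "\u2026", text)

print(collapse_ellipsis("Wait... what?"))           # Wait… what?
print(collapse_ellipsis("Chapter 1 .......... 5"))  # leader dots preserved
```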

Install

# Python
pip install texthumanize

# PHP
composer require ksanyok/text-humanize

1,455 tests passing.

v0.10.0 — Grammar, Uniqueness, Health Score, Semantic & Sentence Readability

20 Feb 09:36


What's New in v0.10.0

5 New Analysis Modules (all offline, no ML/API)

| Module | Function | Description |
|---|---|---|
| Grammar Checker | check_grammar() / fix_grammar() | Rule-based grammar checking for 9 languages |
| Uniqueness Score | uniqueness_score() / compare_texts() | N-gram fingerprinting uniqueness analysis |
| Content Health | content_health() | Composite quality: readability + grammar + uniqueness + AI + coherence |
| Semantic Similarity | semantic_similarity() | Measures semantic preservation between original and processed text |
| Sentence Readability | sentence_readability() | Per-sentence difficulty scoring (easy/medium/hard/very_hard) |
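The n-gram fingerprinting idea behind uniqueness_score() / compare_texts() can be sketched in a few self-contained lines (an illustration only; the library's internals may differ):

```python
def ngram_fingerprint(text: str, n: int = 3) -> set:
    """Set of word n-grams used as a lightweight text fingerprint."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def compare_fingerprints(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of two fingerprints: 0.0 = disjoint, 1.0 = identical."""
    fa, fb = ngram_fingerprint(a, n), ngram_fingerprint(b, n)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

High overlap between two texts suggests low uniqueness of one relative to the other.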

Custom Dictionary API

result = humanize(text, custom_dict={
    "implement": "build",
    "utilize": ["use", "apply", "employ"],  # random pick
})

Massively Expanded Dictionaries

All 9 language dictionaries balanced (367-439 entries each):

  • FR: 281→397, ES: 275→388, IT: 272→379, PL: 257→368, PT: 256→367
  • EN/RU/UK: added perplexity_boosters

Stats

  • 28 files changed, +2,333 lines
  • 1,455 tests passing (82 new)
  • 17 new public exports
  • Zero external dependencies

v0.9.0 — Kirchenbauer Watermark, HTML Diff, Quality Gate, Selective Humanization, Stylometric Anonymizer

20 Feb 09:05


What's New

Kirchenbauer Watermark Detector

Green-list z-test based on Kirchenbauer et al. (2023). Uses a SHA-256 hash of the previous token to partition the vocabulary into green/red lists (γ = 0.25), then computes a z-score and p-value, flagging an AI watermark at z ≥ 4.0.

from texthumanize import detect_watermarks
report = detect_watermarks(text)
print(report.kirchenbauer_score, report.kirchenbauer_p_value)
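The core of the green-list test can be sketched as follows (an illustrative simplification assuming whitespace tokenization, not the library's exact code):

```python
import hashlib
import math

GAMMA = 0.25  # expected fraction of "green" tokens in unwatermarked text

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically place `token` on the green list, keyed by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def green_list_z(tokens: list) -> float:
    """One-proportion z-test: does the green-token fraction exceed GAMMA?"""
    n = len(tokens) - 1  # number of (previous, current) pairs
    if n <= 0:
        return 0.0
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

A watermarking sampler biased toward green tokens pushes the z-score far above what chance allows; plain text stays near zero.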

HTML Diff Report

explain() now supports multiple output formats:

html = explain(result, fmt='html')      # self-contained HTML page
json_str = explain(result, fmt='json')  # RFC 6902 JSON Patch
diff = explain(result, fmt='diff')      # unified diff

Quality Gate

CLI + GitHub Action + pre-commit hook to check text for AI artifacts:

python -m texthumanize.quality_gate README.md docs/ --ai-threshold 25

Selective Humanization

Process only AI-flagged sentences, leaving human text untouched:

result = humanize(text, only_flagged=True)

Stylometric Anonymizer

Disguise authorship by transforming text toward a target style:

from texthumanize import anonymize_style
result = anonymize_style(text, target='blogger')

Stats

  • 1,373 Python tests passing
  • 40 new tests for v0.9.0 features
  • Ruff lint clean
  • 22 files changed, 1,637 additions

v0.8.1 — Dual License, Commercial Pricing, Enterprise-Ready

19 Feb 21:23


What's New

Licensing:

  • Dual license model — free for personal/academic use, commercial licenses from $99/year
  • COMMERCIAL.md — dedicated page with pricing tiers, feature comparison, FAQ
  • Clear LICENSE file with no ambiguity for legal/compliance teams

Benchmarks:

  • benchmarks/full_benchmark.py — reproducible benchmark suite (speed, memory, predictability, AI detection, quality)
  • Real measured data in README: 26K-38K chars/sec, ~2.5 MB peak memory, 100% determinism

Enterprise Documentation:

  • "For Business & Enterprise" section with corporate requirements mapping
  • Processing Modes: normalize / style_soft / rewrite with audit trail
  • Change Report section with explain() examples
  • Predictability guarantees with seed-based determinism proof

Commercial License Tiers

| Tier | Price | Includes |
|---|---|---|
| Indie | $99/yr | 1 project, 1 dev |
| Startup | $299/yr | 3 projects, 5 devs |
| Business | $799/yr | Unlimited, 20 devs |
| Enterprise | Contact us | On-prem, SLA |

Install / Update

pip install git+https://github.com/ksanyok/TextHumanize.git@v0.8.1

Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

v0.8.0 — Style Presets, Auto-Tuner, Semantic Guards, TS/JS Port

19 Feb 20:32


🎯 Highlights

The most feature-complete release yet — 27,000+ lines of code across 3 platforms.

Added

  • Style Presets — 5 predefined targets: student, copywriter, scientist, journalist, blogger
  • Auto-Tuner — feedback loop that learns optimal intensity from processing history
  • Semantic preservation guards — expanded context guards with 20+ patterns across EN/RU/UK/DE
  • Typography-only fast path — AI ≤ 5% skips semantic stages entirely
  • TypeScript/JavaScript port — full pipeline with adaptive intensity (28 tests)
  • Complete documentation rewrite — README (2500+ lines), API Reference (660+ lines), Cookbook (14 recipes)

Changed

  • change_ratio calculation — switched to SequenceMatcher (fixes critical inflation bug)
  • Graduated retry — retries at lower intensity instead of full rollback
  • German dictionaries — bureaucratic 22→64, phrases 14→25, connectors 12→20, synonyms 26→45
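The SequenceMatcher-based change ratio can be sketched as follows (a minimal illustration; the library's exact formula may differ):

```python
from difflib import SequenceMatcher

def change_ratio(original: str, processed: str) -> float:
    """Fraction of the text changed: 0.0 = identical, 1.0 = completely rewritten.
    difflib's ratio() measures similarity over matching blocks, so this avoids
    the inflation that naive per-word diffs can produce."""
    return 1.0 - SequenceMatcher(None, original, processed).ratio()

ratio = change_ratio("utilize the tool", "use the tool")
```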

Fixed

  • DE zero-change bug (dictionary contained only infinitives)
  • Natural text over-processing (AI ≤ 5%)
  • Validator change_ratio consistency

📊 Stats

| Platform | Lines of Code | Tests |
|---|---|---|
| Python | 16,820 | 1,333 |
| PHP | 10,000 | 223 |
| TypeScript | 1,031 | 28 |
| Total | 27,851 | 1,584 |

Benchmark: 100% (45/45) · Coverage: 99% · Speed: 56K chars/sec

Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

Install / Update

pip install git+https://github.com/ksanyok/TextHumanize.git@v0.8.0

v0.7.0 — AI Detection 2.0, C2PA Watermarks, Streaming

19 Feb 20:32


Added

  • 13 AI-detection metrics — new perplexity_score metric (character-level trigram model)
  • Ensemble boosting — 3-classifier aggregation: base weighted sum (50%), strong-signal detector (30%), majority voting (20%). 90.9% accuracy
  • Benchmark suite — 11 labeled samples, per-label accuracy breakdown
  • CLI detect subcommand — texthumanize detect [file] with emoji verdicts
  • Streaming progress callback — humanize_batch(texts, on_progress=callback)
  • C2PA / IPTC watermark detection — content provenance pattern detection
  • Tone replacements for UK/DE/FR/ES — informal ↔ formal replacement pairs
  • PHP examples/ — basic_usage.php & advanced.php
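The character-level trigram idea behind the perplexity_score metric can be sketched with add-one smoothing (an illustrative simplification, not the library's implementation):

```python
import math
from collections import Counter

def trigram_perplexity(train: str, text: str) -> float:
    """Perplexity of `text` under add-one-smoothed character trigrams from `train`.
    Lower values mean `text` looks more like the training material."""
    tri = Counter(train[i:i + 3] for i in range(len(train) - 2))
    bi = Counter(train[i:i + 2] for i in range(len(train) - 1))
    vocab = len(set(train)) or 1
    log_p = 0.0
    count = max(len(text) - 2, 1)
    for i in range(len(text) - 2):
        t = text[i:i + 3]
        # P(c3 | c1 c2) with add-one smoothing over the character vocabulary
        log_p += math.log((tri[t] + 1) / (bi[t[:2]] + vocab))
    return math.exp(-log_p / count)
```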

Changed

  • Zipf metric rewritten — log-log linear regression with R² goodness-of-fit
  • Confidence formula — 4-component with text length, metric agreement, extreme bonus
  • Grammar detection expanded — 5 → 9 indicators
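A log-log Zipf fit with an R² goodness-of-fit can be sketched with plain least squares (illustrative only; the library's metric may normalize differently):

```python
import math
from collections import Counter

def zipf_fit(words):
    """Least-squares fit of log(frequency) against log(rank); returns (slope, R²).
    Natural text tends toward slope near -1 with a high R²."""
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        return 0.0, 0.0
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, r2
```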

Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

Install

pip install git+https://github.com/ksanyok/TextHumanize.git@v0.7.0

v0.6.0 — Batch Processing, Quality Metrics, 99% Coverage

19 Feb 20:32


Added

  • humanize_batch() / humanizeBatch() — batch processing (Python + PHP)
  • HumanizeResult.similarity — Jaccard similarity metric (0..1)
  • HumanizeResult.quality_score — overall quality score (0..1)
  • 1255 Python tests — up from 500, with 99% code coverage
  • 223 PHP tests (825 assertions)

Changed

  • Python test coverage 85% → 99% (28 of 38 modules at 100%)
  • mypy clean — 0 type errors across all 38 source files
  • Dead code removed — 11 unreachable blocks cleaned up

Fixed

  • ToneAnalyzer MARKETING direction
  • PHP SentenceSplitter Cyrillic support
  • 37 mypy type errors fixed

Full Changelog: https://github.com/ksanyok/TextHumanize/blob/main/CHANGELOG.md

Install

pip install git+https://github.com/ksanyok/TextHumanize.git@v0.6.0

v0.5.0 - Code Quality and Coverage

18 Feb 23:13


What's New in v0.5.0

Quality Engineering

  • 500 tests - up from 382, covering 85% of the codebase (was 80%)
  • Zero lint errors - fixed all 67 ruff errors across the project
  • PEP 561 compliance - py.typed marker for downstream type checkers
  • Pre-commit hooks - ruff lint+format, trailing whitespace, YAML/TOML checks
  • mypy integration - type checking configuration in pyproject.toml
  • Enhanced CI/CD - ruff lint step + mypy type check + XML coverage output

Coverage Improvements

| Module | Before | After |
|---|---|---|
| morphology.py | 55% | 93% |
| coherence.py | 68% | 96% |
| paraphrase.py | 71% | 87% |
| watermark.py | 74% | 87% |
| Overall | 80% | 85% |

PHP Fixes

  • SentenceSplitter - PREG_OFFSET_CAPTURE offset properly cast to int
  • ToneAnalyzer - preg_match offset cast to int for mb_substr() compatibility

Full Changelog

See CHANGELOG.md for details.

Full backward compatibility with v0.4.0 - no breaking changes.