@dependabot dependabot bot commented on behalf of github Dec 15, 2025

Bumps html-to-markdown from 2.9.2 to 2.14.2.

Release notes

Sourced from html-to-markdown's releases.

v2.14.2

Full Changelog: kreuzberg-dev/html-to-markdown@v2.14.1...v2.14.2

v2.14.1

[2.14.1] - 2025-12-12

Fixed

  • Issue #147: Word wrap now works correctly in list items when using the -w/--wrap flag. List items with long text are properly wrapped while preserving list structure and indentation for both ordered and unordered lists.
  • Issue #146: strip_tags and preserve_tags options now correctly prevent <meta> and <title> tags from being extracted into YAML frontmatter when extract_metadata is enabled.
  • Issue #145: strip_newlines=true no longer causes excessive whitespace around block elements. Structural whitespace is now properly normalized while still removing newlines within paragraph content.
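The wrapping behavior described for Issue #147 can be illustrated with Python's standard textwrap module. This is only a sketch of the expected output shape (continuation lines indented under the item text), not the library's own implementation; the helper name wrap_list_item and the width of 40 are assumptions made for the example.

```python
import textwrap


def wrap_list_item(text: str, width: int = 40, marker: str = "- ") -> str:
    """Wrap one list item so continuation lines align under the item text.

    Hypothetical helper for illustration; html-to-markdown does this internally.
    """
    return textwrap.fill(
        text,
        width=width,
        initial_indent=marker,                # first line carries the list marker
        subsequent_indent=" " * len(marker),  # later lines keep the item's indentation
    )


item = (
    "List items with long text are wrapped while the list structure "
    "and indentation are preserved."
)
print(wrap_list_item(item))                # unordered list item
print(wrap_list_item(item, marker="1. "))  # ordered list item
```

The same call covers ordered and unordered lists because the continuation indent is derived from the marker's length.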

v2.14.0 - Complete Metadata Extraction Across All Bindings

Release v2.14.0

🎉 Complete Metadata Extraction Release

This release completes the metadata extraction feature across all language bindings with comprehensive documentation, critical bug fixes, and 100% language compliance.

✨ New Features

CLI Metadata Extraction

  • New --with-metadata flag with JSON output support
  • Extraction flags including --extract-document, --extract-headers, --extract-links, --extract-images, --extract-structured-data
  • JSON output format: {"markdown": "...", "metadata": {...}}
  • Feature enabled by default in CLI builds
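A consumer of the JSON output described above might split it into its two documented fields as follows. This is a minimal sketch that assumes only the {"markdown": "...", "metadata": {...}} envelope from the release notes; the function name split_cli_output and the sample payload (including the "links" key) are hypothetical, and the CLI itself is not invoked here.

```python
import json


def split_cli_output(raw: str) -> tuple[str, dict]:
    """Split the documented JSON envelope into (markdown, metadata)."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or "markdown" not in payload:
        raise ValueError("expected a JSON object with a 'markdown' field")
    # Treat a missing "metadata" field as empty rather than failing.
    return payload["markdown"], payload.get("metadata", {})


# Hand-written sample; the keys inside "metadata" are placeholders,
# not the tool's real schema.
sample = '{"markdown": "# Title\\n\\nBody text.", "metadata": {"links": []}}'
markdown, metadata = split_cli_output(sample)
print(markdown)
print(sorted(metadata))
```

Validating the envelope before use keeps a schema change in a future release from surfacing as a confusing KeyError further downstream.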

Go FFI Binding

  • Complete ConvertWithMetadata() function with typed structs
  • 12 Go struct types with JSON tags for type-safe metadata access
  • JSON unmarshaling from FFI layer
  • 18 comprehensive tests covering all metadata types

Java FFI Binding

  • Complete convertWithMetadata() method with Java records
  • 11 Java record types using Panama FFM for FFI integration
  • Proper enum types for link/image/text direction (no string-based parsing)
  • Jackson JSON deserialization with error handling
  • 33 comprehensive tests including negative test cases

C# FFI Binding

  • Complete ConvertWithMetadata() method with C# records
  • 11 C# record types using P/Invoke for FFI integration
  • System.Text.Json deserialization with proper error handling
  • 23 comprehensive tests covering all metadata types

FFI Core API

  • New html_to_markdown_convert_with_metadata() C function
  • JSON serialization for language-agnostic metadata transfer
  • Proper memory management and error handling
  • 17 comprehensive tests including memory safety tests

... (truncated)

Changelog

Sourced from html-to-markdown's changelog.

[2.14.2] - 2025-12-13

Changed

  • CI/release automation: extracted Maven installer logic into scripts/common/install-maven-latest.sh and applied repo-wide lint/format cleanups.

[2.14.1] - 2025-12-12

Fixed

  • Issue #147: Word wrap now works correctly in list items when using the -w/--wrap flag. List items with long text are properly wrapped while preserving list structure and indentation for both ordered and unordered lists.
  • Issue #146: strip_tags and preserve_tags options now correctly prevent <meta> and <title> tags from being extracted into YAML frontmatter when extract_metadata is enabled.
  • Issue #145: strip_newlines=true no longer causes excessive whitespace around block elements. Structural whitespace is now properly normalized while still removing newlines within paragraph content.

[2.14.0] - 2025-12-11

Added

  • CLI Metadata Extraction: New --with-metadata flag with JSON output support for extracting document metadata, headers, links, images, and structured data from HTML documents.
    • Extraction flags including --extract-document, --extract-headers, --extract-links, --extract-images, --extract-structured-data
    • JSON output format with markdown and metadata fields: {"markdown": "...", "metadata": {...}}
    • Feature enabled by default in CLI builds
  • Go FFI Binding: Complete ConvertWithMetadata() function with typed structs for metadata extraction.
    • 12 Go struct types with JSON tags for type-safe metadata access
    • JSON unmarshaling from FFI layer
    • 18 comprehensive tests covering all metadata types
  • Java FFI Binding: Complete convertWithMetadata() method with Java records for metadata extraction.
    • 11 Java record types using Panama FFM for FFI integration
    • Proper enum types for link/image/text direction (no string-based parsing)
    • Jackson JSON deserialization with error handling
    • 33 comprehensive tests including negative test cases
  • C# FFI Binding: Complete ConvertWithMetadata() method with C# records for metadata extraction.
    • 11 C# record types using P/Invoke for FFI integration
    • System.Text.Json deserialization with proper error handling
    • 23 comprehensive tests covering all metadata types
  • FFI Core API: New html_to_markdown_convert_with_metadata() C function for language-agnostic metadata extraction.
    • JSON serialization for cross-language compatibility
    • Proper memory management and error handling
    • 17 comprehensive tests including memory safety tests

Changed

  • Documentation Consolidation: Migrated all standalone METADATA.md files into binding READMEs for improved maintainability.
    • Deleted packages/typescript/METADATA.md (480 lines) and packages/ruby/METADATA.md (228 lines)
    • Enhanced Python, PHP, TypeScript, Ruby, Go, Java, and C# READMEs with comprehensive metadata sections
    • Root README now includes CLI metadata examples and links to all binding documentation
    • Each binding README is now self-contained with full metadata documentation
  • Type Definitions: Enhanced metadata type definitions across all language bindings.
    • Go: Complete struct types with JSON tags and godoc comments
    • Java: Proper enum types (LinkType, ImageType, TextDirection) instead of strings
    • C#: Complete record types with XML documentation
    • Python: Fixed max_structured_data_size default (100KB → 1MB)
    • TypeScript: Verified dimensions field type (Array for compatibility)
  • Docstrings: Enhanced documentation strings across all language bindings.

... (truncated)

Commits
  • cf46ae9 fix(ci): setup ruby for rubygems publish
  • 36f2530 fix(ci): ruby smoke ignore lockfile for local
  • 146f826 fix(ci): smoke ruby from workspace in publish
  • ba51524 fix(ci): smoke ruby gem install from artifact
  • 8bec77f fix(ci): smoke ruby gem path in temp dir
  • 8a4b7ba fix(ci): make ruby gem locator executable
  • 9fd9eab fix(ci): smoke Ruby gem from artifact
  • ffe9f49 chore(build): bump version to 2.14.2
  • 6ab5b09 refactor(ci): extract Maven install script
  • 8f71965 chore(build): apply prek auto-fixes
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [html-to-markdown](https://github.com/Goldziher/html-to-markdown) from 2.9.2 to 2.14.2.
- [Release notes](https://github.com/Goldziher/html-to-markdown/releases)
- [Changelog](https://github.com/Goldziher/html-to-markdown/blob/main/CHANGELOG.md)
- [Commits](kreuzberg-dev/html-to-markdown@v2.9.2...v2.14.2)

---
updated-dependencies:
- dependency-name: html-to-markdown
  dependency-version: 2.14.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
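
The update-type above follows from comparing the two version numbers component by component. The sketch below shows one way to derive it, assuming plain MAJOR.MINOR.PATCH version strings; the function names and the "not-an-upgrade" label are hypothetical, and this is not Dependabot's own code.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers so 14 compares greater than 9."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(old: str, new: str) -> str:
    """Return the update-type label for an upgrade from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "not-an-upgrade"  # hypothetical label for this sketch
    if new_v[0] != old_v[0]:
        return "version-update:semver-major"
    if new_v[1] != old_v[1]:
        return "version-update:semver-minor"
    return "version-update:semver-patch"


print(classify_update("2.9.2", "2.14.2"))  # version-update:semver-minor
```

Comparing integers rather than strings matters here: as text, "2.14.2" sorts before "2.9.2", which would make this bump look like a downgrade.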

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies and python:uv labels Dec 15, 2025
dependabot bot commented on behalf of github Dec 22, 2025

Superseded by #6.

@dependabot dependabot bot closed this Dec 22, 2025
@dependabot dependabot bot deleted the dependabot/uv/html-to-markdown-2.14.2 branch December 22, 2025 09:40