PureHTML

A pure Elixir HTML5 parser. No NIFs. No native dependencies. Just Elixir.

Why PureHTML?

Pure Elixir

PureHTML has zero dependencies. It's pure Elixir code all the way down.

  • Just install: No C extensions or system libraries required. Works anywhere Elixir runs.
  • Debuggable: Step through the parser with IEx to understand exactly how your HTML is being parsed.
  • Floki-compatible output: Returns {tag, attrs, children} tuples, with attributes as lists of {name, value} pairs, matching Floki's format.
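
Because the output is plain tuples and lists, it can be traversed with ordinary pattern matching. The following is a minimal sketch, not part of PureHTML: the TreeSketch module and its text/1 and attr/2 helpers are hypothetical names introduced for illustration, and the tree literal is copied from the Quick Example output so the snippet runs without any dependency.

```elixir
defmodule TreeSketch do
  # Hypothetical helpers for illustration only; not part of PureHTML's API.

  # Concatenate every text node in a list of nodes, depth first.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)
  # An element: descend into its children.
  def text({_tag, _attrs, children}), do: text(children)
  # A text node is a plain binary.
  def text(text) when is_binary(text), do: text
  # Any other node shape is skipped in this sketch.
  def text(_other), do: ""

  # Look up one attribute value in the {name, value} list, or nil if absent.
  def attr({_tag, attrs, _children}, name) do
    case List.keyfind(attrs, name, 0) do
      {^name, value} -> value
      nil -> nil
    end
  end
end

# Tree literal taken from the Quick Example output.
tree = [
  {"html", [],
   [
     {"head", [], []},
     {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}
   ]}
]

# Destructure down to the <p> element with a single pattern match.
[{"html", [], [_head, {"body", [], [p]}]}] = tree

IO.inspect(TreeSketch.text(tree))
# => "Hello!"
IO.inspect(TreeSketch.attr(p, "class"))
# => "intro"
```

For real use, the library's own PureHTML.text and PureHTML.attribute functions shown under Querying are the better choice; the sketch only illustrates the data shape.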

Correct

PureHTML implements the WHATWG HTML5 specification. It handles all the complex error-recovery rules that browsers use.

  • Spec compliant: Implements the full HTML5 tree construction algorithm including adoption agency, foster parenting, and foreign content (SVG/MathML).
  • 100% html5lib compliance: Passes all 8,634 tests from the official html5lib-tests suite used by browser vendors.

Fast Enough

If raw parsing speed is the priority, use a NIF-based parser. For most use cases, PureHTML is fast enough and keeps the benefits of pure Elixir: no native build step and a parser you can step through in IEx.

Installation

Add pure_html to your list of dependencies in mix.exs:

def deps do
  [
    {:pure_html, "~> 0.2.0"}
  ]
end

Quick Example

# Parse HTML into a document tree
PureHTML.parse("<p class='intro'>Hello!</p>")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}]}]

# Handles malformed HTML the way browsers do
PureHTML.parse("<p>One<p>Two")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [], ["One"]}, {"p", [], ["Two"]}]}]}]

# Convert back to HTML
PureHTML.parse("<p>Hello</p>") |> PureHTML.to_html()
# => "<html><head></head><body><p>Hello</p></body></html>"

Querying

Find elements using CSS selectors.

html = PureHTML.parse("<div><p class='intro'>Hello</p><p>World</p></div>")

# Find by tag
PureHTML.query(html, "p")
# => [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]

# Find by class
PureHTML.query(html, ".intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Compound selectors
PureHTML.query(html, "p.intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Combinators
PureHTML.query(html, "div > p")      # Direct children
PureHTML.query(html, "div p")        # All descendants

# Extract text content
PureHTML.text(html)
# => "HelloWorld"

# Extract attributes
PureHTML.attribute(html, "p", "class")
# => ["intro"]

Supported selectors: tag, *, .class, #id, [attr], [attr=val], [attr^=prefix], [attr$=suffix], [attr*=substring], selector lists (.a, .b), combinators (div p, div > p, h1 + p, h1 ~ p).
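
The difference between the child combinator (div > p) and the descendant combinator (div p) can be illustrated directly on the tuple tree. This is a minimal sketch of the selector semantics only, not PureHTML's implementation: the CombinatorSketch module, its function names, and the hand-written tree are all assumptions made for the example.

```elixir
defmodule CombinatorSketch do
  # Hypothetical module for illustration; PureHTML.query/2 is the real API.

  # "parent > tag": only elements that are direct children of the node.
  def children({_tag, _attrs, children}, tag) do
    Enum.filter(children, fn child -> match?({^tag, _, _}, child) end)
  end

  # "parent tag": matching elements at any depth, in document order.
  def descendants({_tag, _attrs, children}, tag) do
    Enum.flat_map(children, fn
      {^tag, _, _} = child -> [child | descendants(child, tag)]
      {_, _, _} = child -> descendants(child, tag)
      # Text nodes have no descendants.
      _text -> []
    end)
  end
end

# Hand-written tree: one <p> directly under <div>, one nested in <section>.
root =
  {"div", [],
   [
     {"p", [], ["A"]},
     {"section", [], [{"p", [], ["B"]}]}
   ]}

IO.inspect(CombinatorSketch.children(root, "p"))
# => [{"p", [], ["A"]}]
IO.inspect(CombinatorSketch.descendants(root, "p"))
# => [{"p", [], ["A"]}, {"p", [], ["B"]}]
```

The child form stops at the first level, while the descendant form also reaches the paragraph nested inside section.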

See the Querying Guide for complete documentation.

License

Copyright 2026 (c) Marcelo De Polli.

PureHTML source code is released under the MIT License.

See the LICENSE file for more information.
