A pure Elixir HTML5 parser. No NIFs. No native dependencies. Just Elixir.
PureHTML has zero dependencies. It's pure Elixir code all the way down.
- Just install: No C extensions or system libraries required. Works anywhere Elixir runs.
- Debuggable: Step through the parser with IEx to understand exactly how your HTML is being parsed.
- Floki-compatible output: Returns `{tag, attrs, children}` tuples with attributes as lists, matching Floki's format.
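Because the output is plain nested tuples and lists, you can walk it with ordinary pattern matching. The sketch below defines a hypothetical `TreeDemo.text/1` helper (not part of PureHTML) that concatenates text nodes depth-first, comparable to the `PureHTML.text/1` example further down. It assumes text nodes are plain binaries and skips any other node shape, such as Floki-style `{:comment, text}` tuples.

```elixir
defmodule TreeDemo do
  # Hypothetical helper for illustration, not part of PureHTML.
  # Concatenates the text nodes of a Floki-style tree, depth-first.
  # Assumes text nodes are binaries; other node shapes are skipped.

  # A list of sibling nodes: join their text in order.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, &text/1)
  # A text node.
  def text(str) when is_binary(str), do: str
  # An element: recurse into its children; tag and attrs are ignored.
  def text({_tag, _attrs, children}) when is_list(children), do: text(children)
  # Anything else (for example a Floki-style {:comment, text}) contributes nothing.
  def text(_other), do: ""
end

tree = [
  {"div", [],
   [
     {"p", [{"class", "intro"}], ["Hello"]},
     {:comment, " not text "},
     {"p", [], ["World"]}
   ]}
]

IO.puts(TreeDemo.text(tree))
# prints HelloWorld
```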
PureHTML implements the WHATWG HTML5 specification. It handles all the complex error-recovery rules that browsers use.
- Spec-compliant: Implements the full HTML5 tree construction algorithm, including the adoption agency algorithm, foster parenting, and foreign content (SVG/MathML).
- 100% html5lib compliance: Passes all 8,634 tests from the official html5lib-tests suite used by browser vendors.
If raw speed is your top priority, use a NIF-based parser. For most use cases, though, PureHTML is fast enough and gives you the benefits of pure Elixir.
Add `pure_html` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:pure_html, "~> 0.2.0"}
  ]
end
```

```elixir
# Parse HTML into a document tree
PureHTML.parse("<p class='intro'>Hello!</p>")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}]}]

# Works with malformed HTML just like browsers do
PureHTML.parse("<p>One<p>Two")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [], ["One"]}, {"p", [], ["Two"]}]}]}]

# Convert back to HTML
PureHTML.parse("<p>Hello</p>") |> PureHTML.to_html()
# => "<html><head></head><body><p>Hello</p></body></html>"
```

Find elements using CSS selectors.
```elixir
html = PureHTML.parse("<div><p class='intro'>Hello</p><p>World</p></div>")

# Find by tag
PureHTML.query(html, "p")
# => [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]

# Find by class
PureHTML.query(html, ".intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Compound selectors
PureHTML.query(html, "p.intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Combinators
PureHTML.query(html, "div > p") # Direct children
PureHTML.query(html, "div p")   # All descendants

# Extract text content
PureHTML.text(html)
# => "HelloWorld"

# Extract attributes
PureHTML.attribute(html, "p", "class")
# => ["intro"]
```

Supported selectors: `tag`, `*`, `.class`, `#id`, `[attr]`, `[attr=val]`, `[attr^=prefix]`, `[attr$=suffix]`, `[attr*=substring]`, selector lists (`.a, .b`), and combinators (`div p`, `div > p`, `h1 + p`, `h1 ~ p`).
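To make the attribute-selector forms in the list above concrete, here is a minimal sketch of how each one could be tested against a Floki-style attrs list. `AttrMatch.matches?/4` and its operator atoms (`:exists`, `:equals`, `:prefix`, `:suffix`, `:substring`) are hypothetical names chosen for illustration, not PureHTML's API. The sketch follows the CSS Selectors rule that `^=`, `$=` and `*=` with an empty value match nothing, compares values case-sensitively, and compares attribute names exactly, assuming the parser has already lowercased them.

```elixir
defmodule AttrMatch do
  # Hypothetical helper for illustration, not PureHTML's API.
  # `attrs` is a Floki-style list of {name, value} string pairs.
  # Names are compared exactly (assumes the parser already lowercased them).

  def matches?(attrs, name, op, expected \\ nil) do
    case List.keyfind(attrs, name, 0) do
      nil -> false
      {_name, actual} -> check(op, actual, expected)
    end
  end

  # [attr]: the attribute only has to be present.
  defp check(:exists, _actual, _expected), do: true
  # [attr=val]: this sketch compares values case-sensitively.
  defp check(:equals, actual, expected), do: actual == expected
  # CSS Selectors: ^=, $= and *= with an empty value never match.
  defp check(_op, _actual, ""), do: false
  # [attr^=prefix]
  defp check(:prefix, actual, expected), do: String.starts_with?(actual, expected)
  # [attr$=suffix]
  defp check(:suffix, actual, expected), do: String.ends_with?(actual, expected)
  # [attr*=substring]
  defp check(:substring, actual, expected), do: String.contains?(actual, expected)
end

attrs = [{"href", "https://example.com/docs"}, {"class", "nav"}]

IO.inspect(AttrMatch.matches?(attrs, "href", :prefix, "https://")) # true
IO.inspect(AttrMatch.matches?(attrs, "href", :suffix, "/docs"))    # true
IO.inspect(AttrMatch.matches?(attrs, "class", :substring, ""))     # false
IO.inspect(AttrMatch.matches?(attrs, "id", :exists))               # false
```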
See the Querying Guide for complete documentation.
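The difference between the descendant combinator (`div p`) and the child combinator (`div > p`) only shows up with deeper nesting. In `<div><p class='intro'>Hello</p><p>World</p></div>` both return the same two paragraphs, because each `p` is a direct child of the `div`. The hypothetical `Combinators` module below (illustration only, not PureHTML's API) matches by tag name alone and returns hits in document order, using a pre-order walk that carries either an "inside an ancestor" flag or the immediate parent's tag.

```elixir
defmodule Combinators do
  # Hypothetical helpers for illustration, not PureHTML's API.
  # Trees are Floki-style: {tag, attrs, children} tuples, text as binaries.
  # Only tag names are compared; matches come back in document order.

  # "outer inner": `inner` elements with an `outer` element anywhere above them.
  def descendant(nodes, outer, inner), do: walk_desc(nodes, outer, inner, false)

  # "parent > tag": `tag` elements whose immediate parent is a `parent` element.
  def child(nodes, parent, tag), do: walk_child(nodes, parent, tag, nil)

  defp walk_desc(nodes, outer, inner, inside?) when is_list(nodes),
    do: Enum.flat_map(nodes, &walk_desc(&1, outer, inner, inside?))

  defp walk_desc({tag, _attrs, children} = el, outer, inner, inside?)
       when is_list(children) do
    hit = if inside? and tag == inner, do: [el], else: []
    # Once we pass through an `outer` element, everything below it is "inside".
    hit ++ walk_desc(children, outer, inner, inside? or tag == outer)
  end

  defp walk_desc(_text_or_other, _outer, _inner, _inside?), do: []

  defp walk_child(nodes, parent, tag, parent_tag) when is_list(nodes),
    do: Enum.flat_map(nodes, &walk_child(&1, parent, tag, parent_tag))

  defp walk_child({el_tag, _attrs, children} = el, parent, tag, parent_tag)
       when is_list(children) do
    hit = if parent_tag == parent and el_tag == tag, do: [el], else: []
    # Children only ever see their immediate parent's tag.
    hit ++ walk_child(children, parent, tag, el_tag)
  end

  defp walk_child(_text_or_other, _parent, _tag, _parent_tag), do: []
end

tree = [
  {"div", [],
   [
     {"p", [], ["direct"]},
     {"section", [], [{"p", [], ["nested"]}]}
   ]}
]

IO.inspect(Combinators.descendant(tree, "div", "p"))
# prints [{"p", [], ["direct"]}, {"p", [], ["nested"]}]

IO.inspect(Combinators.child(tree, "div", "p"))
# prints [{"p", [], ["direct"]}]
```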
Copyright 2026 (c) Marcelo De Polli.
PureHTML source code is released under the MIT License.
See the LICENSE file for more information.