PureHTML

A pure Elixir HTML5 parser. No NIFs. No native dependencies. Just Elixir.

Why PureHTML?

Pure Elixir

PureHTML has zero dependencies. It's pure Elixir code all the way down.

Just install: No C extensions or system libraries required. Works anywhere Elixir runs.
Debuggable: Step through the parser with IEx to understand exactly how your HTML is being parsed.
Floki-compatible output: Returns {tag, attrs, children} tuples with attributes as lists of {name, value} tuples, matching Floki's format.
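The output is plain tuples and lists, so you can process it with ordinary Elixir pattern matching. The sketch below relies only on the node shape described above: elements are {tag, attrs, children} and text nodes are binaries. The TreeWalk module and its functions are hypothetical and are not part of PureHTML's API. The sample tree is copied from the Quick Example output.

```elixir
defmodule TreeWalk do
  # Hypothetical helper module, not part of PureHTML's API.
  # Assumes elements are {tag, attrs, children} and text nodes are binaries.

  # Collects every element with the given tag name, in document order.
  def find_by_tag(nodes, tag) when is_list(nodes) do
    Enum.flat_map(nodes, &find_by_tag(&1, tag))
  end

  def find_by_tag({name, _attrs, children} = element, tag) do
    matches = find_by_tag(children, tag)
    if name == tag, do: [element | matches], else: matches
  end

  # Text nodes (and any other node shape) contain no elements.
  def find_by_tag(_other, _tag), do: []

  # Looks up an attribute value in the element's list of {name, value} pairs.
  def attr({_tag, attrs, _children}, name) do
    case List.keyfind(attrs, name, 0) do
      {_, value} -> value
      nil -> nil
    end
  end
end

# Tree shape copied from the Quick Example output.
tree = [{"html", [], [{"head", [], []}, {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}]}]
[p] = TreeWalk.find_by_tag(tree, "p")
IO.inspect(TreeWalk.attr(p, "class"))
```

Because attributes are an ordered list and not a map, List.keyfind/3 is the natural lookup. This form also preserves the attribute order from the source markup.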

Correct

PureHTML implements the WHATWG HTML5 specification. It handles all the complex error-recovery rules that browsers use.

Spec compliant: Implements the full HTML5 tree construction algorithm including adoption agency, foster parenting, and foreign content (SVG/MathML).
100% html5lib compliance: Passes all 8,634 tests from the official html5lib-tests suite used by browser vendors.

Fast Enough

If raw speed is your top priority, use a NIF-based parser. For most use cases, though, PureHTML is fast enough and keeps the benefits of pure Elixir.

Installation

Add pure_html to your list of dependencies in mix.exs:

def deps do
  [
    {:pure_html, "~> 0.2.0"}
  ]
end

Quick Example

# Parse HTML into a document tree
PureHTML.parse("<p class='intro'>Hello!</p>")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}]}]

# Works with malformed HTML just like browsers do
PureHTML.parse("<p>One<p>Two")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [], ["One"]}, {"p", [], ["Two"]}]}]}]

# Convert back to HTML
PureHTML.parse("<p>Hello</p>") |> PureHTML.to_html()
# => "<html><head></head><body><p>Hello</p></body></html>"

Querying

Find elements using CSS selectors.

html = PureHTML.parse("<div><p class='intro'>Hello</p><p>World</p></div>")

# Find by tag
PureHTML.query(html, "p")
# => [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]

# Find by class
PureHTML.query(html, ".intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Compound selectors
PureHTML.query(html, "p.intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Combinators
PureHTML.query(html, "div > p")      # Direct children
PureHTML.query(html, "div p")        # All descendants

# Extract text content
PureHTML.text(html)
# => "HelloWorld"

# Extract attributes
PureHTML.attribute(html, "p", "class")
# => ["intro"]
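Conceptually, text extraction is a depth-first walk that concatenates text nodes in document order. The sketch below illustrates that idea over the same tuple format. It is not PureHTML's implementation. The TextSketch module is hypothetical, and skipping every node that is neither a binary nor a 3-tuple (for example a comment node) is an assumption about how such nodes are represented.

```elixir
defmodule TextSketch do
  # Hypothetical illustration, not PureHTML's actual implementation.
  # Assumes elements are {tag, attrs, children} and text nodes are binaries;
  # any other node shape contributes no text.

  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, &text/1)
  def text(content) when is_binary(content), do: content
  def text({_tag, _attrs, children}), do: text(children)
  def text(_other), do: ""
end

# Tree equivalent to parsing "<div><p class='intro'>Hello</p><p>World</p></div>".
doc = [{"html", [], [{"head", [], []},
  {"body", [], [{"div", [], [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]}]}]}]
IO.puts(TextSketch.text(doc))
```

No separator is inserted between adjacent text nodes. This matches the "HelloWorld" result shown above.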

Supported selectors: tag, *, .class, #id, [attr], [attr=val], [attr^=prefix], [attr$=suffix], [attr*=substring], selector lists (.a, .b), combinators (div p, div > p, h1 + p, h1 ~ p).
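A compound selector such as p.intro matches an element only when every part matches: the tag name, each listed class, and the id. The sketch below shows that rule for a single element. The CompoundMatch module and its pre-parsed selector map (%{tag: ..., classes: [...], id: ...}) are hypothetical simplifications, because PureHTML parses selector strings itself.

```elixir
defmodule CompoundMatch do
  # Hypothetical illustration of compound-selector matching, not PureHTML's code.
  # A selector is pre-parsed into a map like %{tag: "p", classes: ["intro"], id: nil}.
  # A missing/nil tag means "any tag" (like *); a missing/nil id means no id constraint.

  def matches?({tag, attrs, _children}, selector) do
    tag_ok?(tag, selector[:tag]) and
      classes_ok?(attrs, selector[:classes] || []) and
      id_ok?(attrs, selector[:id])
  end

  # Text nodes and other non-element nodes never match.
  def matches?(_other, _selector), do: false

  defp tag_ok?(_tag, nil), do: true
  defp tag_ok?(tag, wanted), do: tag == wanted

  # The class attribute is whitespace-separated; every wanted class must be present.
  defp classes_ok?(attrs, wanted) do
    present =
      case List.keyfind(attrs, "class", 0) do
        {_, value} -> String.split(value)
        nil -> []
      end

    Enum.all?(wanted, &(&1 in present))
  end

  defp id_ok?(_attrs, nil), do: true
  defp id_ok?(attrs, wanted), do: List.keyfind(attrs, "id", 0) == {"id", wanted}
end

p = {"p", [{"class", "intro lead"}], ["Hello"]}
IO.inspect(CompoundMatch.matches?(p, %{tag: "p", classes: ["intro"]}))
```

Splitting the class attribute on whitespace is why .intro matches class="intro lead" but would not match class="introduction".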

See the Querying Guide for complete documentation.

License

PureHTML source code is released under the MIT License.

See the LICENSE file for more information.
