Template-driven HTML to JSON extraction. Define what you want with jQuery selectors in a declarative JSON template, get structured data back.
SyphonX takes a JSON template of CSS/jQuery selectors and extracts structured data from any HTML, whether a live page or an offline file. No imperative code is required.
Here's a command that shows how to fetch a single element from a live page:
npx select --url=https://www.example.com --selector=h1
OUTPUT
<h1>Example Domain</h1>
This fetches the page at the given URL and returns the raw HTML of the first element matching the CSS selector — useful for quickly checking what's on a page before writing a template.
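To make "the raw HTML of the first element matching the selector" concrete, here is a minimal Python sketch of the same idea run against an HTML string instead of a live page. It is not how SyphonX does it: the names `FirstMatch` and `select_first` are hypothetical, it uses only Python's standard `html.parser`, and it assumes the selector is a bare tag name such as `h1` rather than a full CSS/jQuery selector.

```python
from html.parser import HTMLParser


class FirstMatch(HTMLParser):
    """Collect the raw markup of the first element with a given tag name.

    Hypothetical helper for illustration only; supports bare tag names,
    not general CSS selectors.
    """

    def __init__(self, target):
        # Keep character references as written so the output stays raw.
        super().__init__(convert_charrefs=False)
        self.target = target.lower()
        self.depth = 0      # nesting depth of target tags while capturing
        self.parts = []     # raw markup pieces of the matched element
        self.found = False  # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.found:
            return
        if tag == self.target:
            self.depth += 1
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self.found:
            return
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
        elif tag == self.target:
            self.parts.append(self.get_starttag_text())
            self.found = True

    def handle_endtag(self, tag):
        if self.found or self.depth == 0:
            return
        self.parts.append(f"</{tag}>")
        if tag == self.target:
            self.depth -= 1
            if self.depth == 0:
                self.found = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def select_first(html, tag):
    """Return the raw HTML of the first <tag> element, or None if absent."""
    parser = FirstMatch(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


if __name__ == "__main__":
    page = "<div>\n<h1>Example Domain</h1>\n<p>Some text.</p>\n</div>"
    print(select_first(page, "h1"))  # prints <h1>Example Domain</h1>
```

The sketch stops at the first match, which mirrors the "first element" behaviour described above. Anything beyond a tag name (classes, attributes, combinators) would need a real selector engine.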
Here's how it works in a little more detail...
SYPHONX TEMPLATE
{
  "url": "https://www.example.com",
  "actions": [
    {
      "select": [
        { "name": "title", "query": [["h1"]] },
        { "name": "link", "query": [["a", ["attr", "href"]]] }
      ]
    }
  ]
}
HTML INPUT
<div>
  <h1>Example Domain</h1>
  <p>This domain is for use in illustrative examples.</p>
  <a href="https://www.iana.org/domains/example">More information...</a>
</div>
SYPHONX OUTPUT
{
  "title": "Example Domain",
  "link": "https://www.iana.org/domains/example"
}
Run the following command to produce the output described above...
npx online example.json
This example just scratches the surface. Here's how to learn more...