A simple NodeJS library to convert HTML to Markdown.
The given HTML string is parsed into a virtual DOM using cheerio and then minified using html-minifier. The resulting DOM nodes are run through (extendable) rules and deconstructed into Markdown text.
Each DOM node replaces itself with its transformed output text, so the end result is a `<body>` containing only text nodes. The resulting inner HTML of the body is then sanitized (for example, by removing more than two line breaks in a row) and returned.
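
The rule-based conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain objects stand in for the cheerio DOM, and the names `rules`, `convert`, and `sanitize` are hypothetical and not taken from this library's source.

```javascript
// Hypothetical sketch: plain objects stand in for the cheerio DOM, and the
// rule names and shapes below are illustrative, not html2md's actual API.

// Each rule turns a node's already-converted inner text into Markdown.
const rules = {
  h1: (inner) => `# ${inner}\n\n`,
  p: (inner) => `${inner}\n\n`,
  strong: (inner) => `**${inner}**`,
  em: (inner) => `*${inner}*`,
};

// Depth-first conversion: children are converted first, so by the time a
// rule runs, its node contains only text.
function convert(node, activeRules = rules) {
  if (typeof node === 'string') return node; // text node
  const inner = (node.children || [])
    .map((child) => convert(child, activeRules))
    .join('');
  const rule = activeRules[node.tag];
  return rule ? rule(inner) : inner; // tags without a rule keep only their text
}

// Final clean-up: collapse runs of more than two line breaks and trim.
function sanitize(markdown) {
  return markdown.replace(/\n{3,}/g, '\n\n').trim();
}

const body = {
  tag: 'body',
  children: [
    { tag: 'h1', children: ['Hello World'] },
    { tag: 'p', children: ['Some ', { tag: 'strong', children: ['bold'] }, ' text.'] },
  ],
};

console.log(sanitize(convert(body)));
```

In this sketch, extending the converter amounts to passing a larger rule map, for example `convert(node, { ...rules, del: (inner) => '~~' + inner + '~~' })`.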
I tried several other libraries before, and none was a perfect fit. Often they didn't escape correctly, so that `<div># Hello World</div>` would result in `# Hello World` as Markdown, which is incorrect because it renders as a heading instead of literal text. Also, some libraries did not work in newer NodeJS versions.
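
To make the escaping problem concrete, here is a minimal sketch of how plain text could be escaped before it is emitted as Markdown. The helper `escapeMarkdownText` is a hypothetical name, and the set of escaped characters is an assumption chosen for illustration; it is not necessarily what this library does internally.

```javascript
// Hypothetical helper, not part of html2md's API: backslash-escape characters
// in plain text that Markdown would otherwise interpret as syntax.
function escapeMarkdownText(text) {
  return text
    // Escape backslashes first so later escapes are not doubled.
    .replace(/\\/g, '\\\\')
    // Inline syntax: emphasis, code spans, links, raw HTML.
    .replace(/[*_`\[\]<>]/g, '\\$&')
    // Block syntax at the start of a line: headings, bullets.
    .replace(/^(\s*)([#+-])/gm, '$1\\$2')
    // Ordered-list markers such as "1." at the start of a line.
    .replace(/^(\s*\d+)([.)])/gm, '$1\\$2');
}

console.log(escapeMarkdownText('# Hello World')); // \# Hello World
```

With escaping like this, the text content of `<div># Hello World</div>` comes out as `\# Hello World`, which renders as literal text rather than as a heading.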
Not everything is covered yet, but most things work quite well:
| Support | Feature | Notes |
|---|---|---|
| ✓ | Line Breaks | |
| ✓ | Images | |
| ✓ | Anchors | |
| ✓ | Lists (ordered, unordered) | |
| ✓ | Strong text | |
| ✓ | Italic text | |
| ✓ | Strikethrough text | |
| ✓ | Headings | |
| ✓ | Horizontal line | |
| ✓ | Paragraphs | |
| ✓ | Inline Code | |
| ✓ | Code Blocks | |
| ✓ | Blockquotes | |
| ✓ | Tables (Markdown Extra Feature) | `colspan` is buggy; `rowspan` is unsupported. |

Basic usage:

    const html2md = require('html2md');
    console.log(html2md('<h1>Hello World</h1>')); // # Hello World

Some features of the converter can be disabled. Currently, only `table` can be disabled:

    html2md(html, {
      disable: ['table']
    });

To try the demo:

- Clone this repository and move into the `demo` folder.
- Add your own sites to `sites.json`.
- Run `node demo.js` and watch the newly written files appear in the `demo/output` folder.

You might also want to have a look at the `*.sample` files in the `test/samples` folder of this repository.
