You have seen the problem. You want the article. The page gives you the article plus a sidebar, a comments section, a cookie notice, four related posts injected into the body, and whatever else the CMS decided to throw in.
Defuddle strips all of that. It extracts the main content from a web page and returns it as clean HTML or Markdown. One library, one job.
The fastest way to try it
No install needed:
```shell
npx defuddle parse https://example.com/article --markdown
```
That prints clean Markdown to stdout. Pipe it wherever you want.
```shell
npx defuddle parse https://example.com/article --markdown > output.md
```
Want the metadata too?
```shell
npx defuddle parse https://example.com/article --json
```
Returns the content alongside title, author, description, domain, publication date, and word count. Enough to generate frontmatter automatically.
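That JSON makes frontmatter generation a one-liner with jq. A sketch using a stand-in payload in place of the real command output (the field names `.title`, `.author`, and `.published` are assumed from the list above; check them against actual output before relying on them):

```shell
# Stand-in for the JSON that `npx defuddle parse <url> --json` emits.
# Field names are assumptions; verify against real output.
echo '{"title":"Example","author":"Jane Doe","published":"2024-05-01","wordCount":812}' \
  | jq -r '"---\ntitle: \(.title)\nauthor: \(.author)\ndate: \(.published)\n---"'
```

Pipe the real command into the same jq filter and prepend the result to the extracted Markdown.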
In Node.js
```javascript
import { JSDOM } from 'jsdom';
import { Defuddle } from 'defuddle/node';

const url = 'https://example.com/article';
const dom = await JSDOM.fromURL(url);
const result = await Defuddle(dom, url, { markdown: true });

console.log(result.title);
console.log(result.content);
```
The response object gives you everything: author, content, description, domain, published, wordCount, and more.
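Those fields are enough to assemble a complete note. A minimal sketch of a frontmatter helper (`toFrontmatter` is my own illustration, not part of Defuddle's API):

```javascript
// Hypothetical helper, not part of Defuddle: turn a metadata
// object into a YAML frontmatter block, skipping empty fields.
function toFrontmatter(meta) {
  const lines = Object.entries(meta)
    .filter(([, v]) => v !== undefined && v !== null)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return `---\n${lines.join('\n')}\n---\n`;
}

// With a Defuddle result, the full note might be:
//   toFrontmatter({ title: result.title, author: result.author,
//                   date: result.published }) + result.content
console.log(toFrontmatter({ title: 'Example', author: 'Jane Doe' }));
```

Write that string to a `.md` file and you have the same shape of note the Obsidian clipper produces.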
Why it is different from Readability
Mozilla Readability powers Firefox reader mode and has been the default choice for this kind of extraction. Defuddle is more forgiving — it removes less when uncertain — and standardises HTML before returning it. Footnotes, code blocks, and math elements come back in consistent formats. If you are feeding the output into a Markdown converter, that consistency matters.
Worth knowing
Defuddle is explicitly a work in progress. kepano (the creator of Obsidian) built it as the extraction layer for the Obsidian Web Clipper. If you use the Web Clipper, you have already been downstream of it.
Try the playground at defuddle.md before building anything on top of it.
