Building an AI‑Powered Web Scraper in n8n (HTTP, HTML, JS, OpenAI)

This workflow is a clean, end‑to‑end example of how to pull content from the web, process it with JavaScript, and then hand it off to an AI model – all from a single visual canvas.

What this workflow does

At a high level, the workflow:

  • Starts when you manually click Execute workflow.
  • Sends an HTTP Request to a target URL.
  • Parses the response using an HTML node to extract just the part of the page you care about.
  • Runs a Code in JavaScript step to clean or reshape that content.
  • Feeds the processed text into Message a model, which generates a final, human‑readable summary or analysis.

It is ideal for turning any web page into something more readable: a short summary, a bullet‑point brief for teammates, or even a draft blog post.

Step‑by‑step through each node

1. Manual trigger

The workflow begins with a When clicking ‘Execute workflow’ node.

  • This keeps things simple for a portfolio demo: no cron schedules, no webhooks, just a clear “Run” button.
  • It is great for exploratory analysis or for showcasing the pipeline in a live walkthrough, because you control exactly when the flow starts.

2. HTTP Request: fetching the page

Next, the flow moves into HTTP Request (GET), pointing at the target URL (for example, https://www.leftclick.ai).

  • The request node is responsible for downloading the raw HTML of the page, including all the markup, scripts, and styles.
  • In a real project, this is where you might add headers, authentication, or query parameters if the page is behind an API or needs filters.
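In code form, the node's configuration corresponds roughly to the sketch below. The User-Agent string and extra headers are illustrative assumptions, not part of the workflow itself:

```javascript
// Rough equivalent of the HTTP Request node's configuration.
// Header values here are illustrative examples only.
function buildRequest(url, extraHeaders = {}) {
  return {
    url,
    method: 'GET',
    headers: {
      // A realistic User-Agent helps avoid naive bot blocking.
      'User-Agent': 'Mozilla/5.0 (compatible; n8n-scraper-demo)',
      Accept: 'text/html',
      ...extraHeaders,
    },
  };
}

const req = buildRequest('https://www.leftclick.ai', { 'Accept-Language': 'en' });
// In Node 18+, the same request could be performed with:
//   const res = await fetch(req.url, { method: req.method, headers: req.headers });
//   const html = await res.text();
```

The n8n node handles all of this from its UI fields, but thinking of it as a plain GET with headers makes it clear where authentication or query parameters would slot in.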

3. HTML node: extracting meaningful content

Raw HTML is noisy, so the next step uses an HTML node (named extractHtmlContent) to pull out just the parts that matter.

  • Using CSS selectors or XPath, the node can target the main article container, headings, or specific sections instead of the entire page.
  • This dramatically reduces token usage for the model and improves response quality, because the AI sees focused text instead of layout code and navigation.
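As a simplified stand-in for what the HTML node does, here is a minimal tag-based extractor. The real node uses full CSS selectors (via a cheerio-style parser); this regex version only illustrates the idea of keeping focused text and discarding navigation:

```javascript
// Simplified stand-in for the HTML node: grab the text inside a target
// element by tag name, then strip any nested markup. Real selector
// support in n8n is far richer; this just shows the principle.
function extractByTag(html, tag) {
  const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, 'i');
  const match = html.match(re);
  if (!match) return null;
  return match[1]
    .replace(/<[^>]+>/g, ' ') // drop nested tags
    .replace(/\s+/g, ' ')     // collapse whitespace
    .trim();
}

const page =
  '<html><body><nav>Menu</nav><article><h1>Title</h1><p>Body text.</p></article></body></html>';
console.log(extractByTag(page, 'article')); // → "Title Body text."
```

Note how the `<nav>` content never reaches the model: that is exactly the token saving the bullet above describes.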

4. Code in JavaScript: cleaning and shaping the text

After extraction, the Code in JavaScript node gives you a place to fine‑tune the content before handing it to the model.

  • Typical tasks here include stripping leftover tags, normalizing whitespace, truncating extra‑long pages, or assembling a structured prompt for the AI.
  • This node is where you can add your own opinionated logic — for example, building a JSON object with title, summary, and keyPoints that will be passed forward.
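A minimal sketch of what the Code node body might look like, assuming the HTML node produced items with a `content` field (the field name and truncation limit are illustrative choices, not fixed by the workflow):

```javascript
// Sketch of an n8n Code node body. In n8n you would end with
// `return $input.all().map(...)`; here we inline sample data instead.
const MAX_CHARS = 6000; // keep the prompt comfortably inside the model's context

function cleanText(raw) {
  return raw
    .replace(/<[^>]+>/g, ' ') // strip leftover tags
    .replace(/\s+/g, ' ')     // normalize whitespace
    .trim()
    .slice(0, MAX_CHARS);     // truncate extra-long pages
}

const items = [{ json: { content: '<p>Hello   world</p>' } }];
const output = items.map((item) => ({
  json: {
    prompt: `Summarize the following page:\n\n${cleanText(item.json.content)}`,
  },
}));
```

The same pattern extends naturally to the structured-output idea mentioned above: instead of a single `prompt` string, the mapped object could carry `title`, `summary`, and `keyPoints` fields.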

5. Message a model: turning data into narrative

Finally, the processed text flows into Message a model, which calls an OpenAI‑compatible chat or responses endpoint.

  • The node can send a system message (for example: “You are a helpful web analyst.”) and a user message containing the cleaned page content.
  • The model then returns a well‑structured explanation, summary, or analysis that becomes the workflow’s final output — perfect for feeding into emails, dashboards, or documentation.
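The payload the node sends to the endpoint looks roughly like the sketch below. The model name and prompt wording are illustrative assumptions; the node assembles this from its UI fields:

```javascript
// Sketch of the request body sent to an OpenAI-compatible chat endpoint.
// The model name and system prompt are illustrative, not prescribed.
function buildChatPayload(pageText) {
  return {
    model: 'gpt-4o-mini', // whichever model the node is configured with
    messages: [
      { role: 'system', content: 'You are a helpful web analyst.' },
      { role: 'user', content: `Summarize this page for a teammate:\n\n${pageText}` },
    ],
  };
}

// The node then POSTs this to the chat completions endpoint, roughly:
//   await fetch('https://api.openai.com/v1/chat/completions', {
//     method: 'POST',
//     headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildChatPayload(cleanedText)),
//   });
```

Keeping the system message short and putting the cleaned page text in the user message is a simple but effective split: the system message sets behavior once, while the user message carries the per-run data.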

Why this workflow is worth showcasing

This workflow demonstrates several skills that are valuable for modern web and automation work.

  • It shows understanding of HTTP, HTML parsing, and JavaScript data shaping, all wired together in a visual automation tool.
  • It highlights practical use of AI APIs: not just calling a model, but preparing high‑quality input and integrating the response into a repeatable process.
