🕸️ Advanced Web Scraper (Node.js)

A modular web scraper for Node.js. It uses Puppeteer to render pages in a headless browser and Cheerio to parse the resulting DOM, then writes the extracted data to a CSV file.

Features

Headless Browser: Uses Puppeteer to handle dynamic content (AJAX, JavaScript rendering).
Efficient Parsing: Uses Cheerio to parse and query the rendered HTML once the page has loaded.
Modular Code: Built as a reusable class (AdvancedScraper).
CSV Export: Automatically saves results to scraped_data.csv.
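
To give a feel for what ends up in scraped_data.csv, here is a minimal sketch of CSV serialization. It is hand-rolled for illustration only: the project lists csv-writer as a dependency and presumably uses that instead. The function names (`escapeCsvField`, `toCsv`) and the `title`/`price` columns are assumptions, not taken from scraper.js.

```javascript
// Illustrative only: hand-rolled CSV output, NOT the csv-writer code path
// the project actually depends on. All names here are hypothetical.

// Quote a field if it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function escapeCsvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of record objects into CSV text with a header row.
function toCsv(records, columns) {
  const header = columns.map(escapeCsvField).join(',');
  const lines = records.map((record) =>
    columns.map((col) => escapeCsvField(record[col])).join(',')
  );
  return [header, ...lines].join('\n') + '\n';
}

// Assumed example records; real column names depend on your selectors.
const csv = toCsv(
  [
    { title: 'Widget, large', price: '9.99' },
    { title: 'Say "hi"', price: '' },
  ],
  ['title', 'price']
);
console.log(csv);
```

Fields containing commas or quotes are wrapped in double quotes, which is why scraped text with punctuation still opens cleanly in a spreadsheet.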

Getting Started

Prerequisites

Node.js (v14 or higher)

Installation

Clone the repository:

git clone https://github.com/ewhx-dev/Advanced-Web-Scraper.git
cd Advanced-Web-Scraper

Install dependencies (Puppeteer, Cheerio, csv-writer):
```
npm install
```

Configuration and Run

Customize scraper.js:
- Update the TARGET_URL constant with the URL of the website you wish to scrape.
- Crucially, update the CSS selectors (e.g., .product-item, .product-title) within the extractData() method to match the HTML structure of your target website.
Run the script:
```
npm start
# OR
node scraper.js
```
The extracted data will be saved to a file named scraped_data.csv in the project root.

License

This project is licensed under the ISC License. See package.json for details.

⚠️ Legal Notice: Always check the website's robots.txt file and their terms of service before scraping. Use this tool responsibly and ethically.
