https://lablnet.com/project/alibabascraper
This is a robust web scraper that extracts data from the Alibaba website. It is multi-threaded and uses Playwright to scrape the site efficiently. It can scrape the entire Alibaba site, although a full run takes approximately 4-6 months to complete.
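The project description does not show how the scraper spreads its work across parallel tasks. As a hedged illustration only, here is one common way to cap the number of pages being scraped at once in Node.js. `runWithConcurrency`, `fakeScrape`, and the example URLs are hypothetical; the project itself may use worker threads or several Playwright browser contexts instead.

```javascript
// Run `worker` over every item with at most `limit` calls in flight.
// Node.js executes this JavaScript on one thread; the parallelism comes
// from overlapping asynchronous I/O (for example, several pages loading).
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls the next unclaimed item until none are left.
  // Reading and incrementing `next` happens synchronously, so two lanes
  // can never claim the same index.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // At least one lane, so a limit of 0 cannot hang forever.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage with a stand-in worker; a real one might call `page.goto(url)`
// on a Playwright page. The URLs are placeholders.
const urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"];
const fakeScrape = async (url) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { url, ok: true };
};
runWithConcurrency(urls, 2, fakeScrape).then((results) => console.log(results.length)); // prints 3
```

Results keep the input order even though items finish at different times, because each lane writes to the slot of the index it claimed.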
- Clone the repository.
- Run `npm install` to install the dependencies.
- Copy `.env.example` to `.env` and update the values.
- Run `node ./alibaba/categories.js` to fetch the categories and store them in the database.
- Run `node ./alibaba/processProducts.js` to start the scraper.
- Because you usually cannot keep the terminal open for the whole run, you can use `nohup` to run the script in the background: `nohup node ./alibaba/processProducts.js &`
- The script creates a `categories_queue1` queue file in the root directory and keeps running until the queue is empty.
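The format of the `categories_queue1` file is not documented here. The sketch below assumes a JSON array of pending items and shows how a "run until the queue is empty" loop can persist its progress, so that a restarted `nohup` process resumes instead of starting over. `loadQueue`, `saveQueue`, `drainQueue`, and `example_queue.json` are hypothetical names, not the project's code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Assumed format: a JSON array of pending items. A missing file is an empty queue.
function loadQueue(file) {
  if (!fs.existsSync(file)) return [];
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Write to a temporary file and rename it, so a crash mid-write
// does not leave a half-written queue behind.
function saveQueue(file, items) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(items));
  fs.renameSync(tmp, file);
}

// Take items from the front of the queue until it is empty.
// The queue is saved only after an item is handled successfully, so if
// `handle` throws, that item stays in the file for the next run.
async function drainQueue(file, handle) {
  let items = loadQueue(file);
  while (items.length > 0) {
    const [current, ...rest] = items;
    await handle(current);
    items = rest;
    saveQueue(file, items);
  }
}

// Usage: seed a queue in the temp directory and drain it.
const queueFile = path.join(os.tmpdir(), "example_queue.json");
saveQueue(queueFile, ["category-a", "category-b"]);
drainQueue(queueFile, async (item) => console.log("processing", item))
  .then(() => console.log("queue empty:", loadQueue(queueFile).length === 0));
```

Rewriting the whole file after every item is simple but costs more as the queue grows; a very large queue would likely call for a database table or an append-only log instead.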
Features:

- Scrape data from the Alibaba website
- Multi-threaded
- Save data to Amazon DynamoDB
- Proxy support
- Proper error handling and logging
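To make the proxy-support and error-handling features concrete, here is a minimal sketch, assuming failed requests are retried with exponential backoff and each attempt uses the next proxy from a list. `withRetry`, its options, and the proxy addresses are hypothetical. With Playwright, the chosen proxy could be passed through the `proxy: { server }` option of `browserType.launch()` or `browser.newContext()`.

```javascript
// Hypothetical helper: retry an async task with exponential backoff,
// rotating through a proxy list and logging every failure.
async function withRetry(task, { proxies = [], retries = 3, baseDelayMs = 500, log = console } = {}) {
  const pool = proxies.length > 0 ? proxies : [null]; // null means "no proxy"
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const proxy = pool[attempt % pool.length];
    try {
      return await task(proxy, attempt);
    } catch (err) {
      lastError = err;
      log.warn(`attempt ${attempt + 1} failed (proxy: ${proxy ?? "none"}): ${err.message}`);
      if (attempt < retries) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  log.error(`giving up after ${retries + 1} attempts`);
  throw lastError;
}

// Usage: a stand-in task that fails twice, then succeeds.
let calls = 0;
const flakyFetch = async (proxy) => {
  calls++;
  if (calls < 3) throw new Error("timeout");
  return `fetched via ${proxy}`;
};
withRetry(flakyFetch, { proxies: ["http://proxy-a:8080", "http://proxy-b:8080"], baseDelayMs: 10 })
  .then((result) => console.log(result)); // prints "fetched via http://proxy-a:8080"
```

Rotating the proxy on each retry means a blocked or slow proxy costs only one attempt, and rethrowing the last error after the final attempt lets the caller decide whether to skip the item or stop.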
License: MIT