Alibaba Scraper

https://lablnet.com/project/alibabascraper

This is a web scraper that extracts data from the Alibaba website. It is multi-threaded and uses Playwright to drive the browser. The script can scrape the entire Alibaba site, although a full run would take approximately 4-6 months to complete.

Installation

Clone the repository.
Run npm install to install the dependencies.
Copy .env.example to .env and update the values.
Run node ./alibaba/categories.js to get the categories and store them in the database.
Run node ./alibaba/processProducts.js to start the scraper.
- A run this long usually outlives a terminal session, so you can use nohup to run the script in the background:
- nohup node ./alibaba/processProducts.js &
- The script creates a categories_queue1 queue file in the root directory and keeps running until the queue is empty.
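
The "run until the queue is empty" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the one-item-per-line file format, the helper names (readQueue, writeQueue, drainQueue), and the use of concurrent async workers in place of real threads are all assumptions.

```javascript
const fs = require('node:fs');

// Assumed queue format: one category ID per line. The real
// categories_queue1 file may be structured differently.
function readQueue(file) {
  if (!fs.existsSync(file)) return [];
  return fs.readFileSync(file, 'utf8').split('\n').filter(Boolean);
}

function writeQueue(file, items) {
  fs.writeFileSync(file, items.join('\n'));
}

// Drain the queue with `concurrency` async workers (hypothetical helper).
// After every item the unfinished work (in-flight plus pending) is written
// back to the file, so a restarted process resumes where it stopped.
async function drainQueue(file, handler, concurrency = 2) {
  const pending = readQueue(file);
  const inFlight = new Set();
  const done = [];
  const persist = () => writeQueue(file, [...inFlight, ...pending]);

  async function worker() {
    while (pending.length > 0) {
      const item = pending.shift();
      inFlight.add(item);
      try {
        await handler(item);
        done.push(item);
      } catch (err) {
        // In this sketch a failed item is logged and dropped, not retried.
        console.error(`failed to process ${item}: ${err.message}`);
      } finally {
        inFlight.delete(item);
        persist();
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return done;
}

// Usage (scrapeCategory is a hypothetical handler):
// drainQueue('./categories_queue1', (id) => scrapeCategory(id), 4);
```

Writing the remaining items back after each one costs some I/O, but it means an interrupted nohup run loses at most the items that were in flight.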

Features

Scrape data from Alibaba website
Multi-threaded
Save data to Amazon DynamoDB
Proxy support
Proper error handling and logging
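
As an illustration of the proxy and error-handling features listed above, here is a small sketch under stated assumptions. The environment variable names (PROXY_SERVER, PROXY_USERNAME, PROXY_PASSWORD) and both helper functions are hypothetical; check .env.example for the settings the project actually reads. The proxy object follows the shape that Playwright's launch() accepts (server, username, password).

```javascript
// Build Playwright launch options from environment-style settings.
// The PROXY_* names are assumptions made for this example.
function buildLaunchOptions(env) {
  const options = { headless: true };
  if (env.PROXY_SERVER) {
    options.proxy = { server: env.PROXY_SERVER };
    if (env.PROXY_USERNAME) {
      options.proxy.username = env.PROXY_USERNAME;
      options.proxy.password = env.PROXY_PASSWORD || '';
    }
  }
  return options;
}

// Retry an async task with exponential backoff, logging every failure.
// Rethrows the last error once `attempts` tries have been used up.
async function withRetry(task, { attempts = 3, baseDelayMs = 500, log = console.error } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      log(`attempt ${attempt}/${attempts} failed: ${err.message}`);
      if (attempt >= attempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage (requires the playwright package):
// const { chromium } = require('playwright');
// const browser = await chromium.launch(buildLaunchOptions(process.env));
// const page = await browser.newPage();
// await withRetry(() => page.goto('https://www.alibaba.com/'));
```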

License

MIT
