This project is a web scraping tool that extracts product data from https://dentalstall.com/. The site requires no login credentials; the tool uses BeautifulSoup to parse the HTML content and extract product details such as the product name, price, and image.
- FastAPI for building the API endpoints.
- BeautifulSoup for scraping and parsing HTML content.
- Pipenv for managing project dependencies.
- Docker and Docker Compose for containerization and orchestration.
- Scrapes product details (name, price, image) from https://dentalstall.com/.
- Stores the scraped data in a JSON file for easy access.
- FastAPI endpoints to retrieve scraped product data.
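The BeautifulSoup-based extraction described above can be sketched roughly as follows. The HTML structure and CSS class names here are illustrative assumptions (the actual markup on dentalstall.com may differ), but the parsing pattern matches what the scraper does:

```python
from bs4 import BeautifulSoup

# Minimal HTML resembling one product card; the real class names on
# dentalstall.com are assumptions and may differ.
html = """
<li class="product">
  <h2 class="product-title">Dental Probe</h2>
  <span class="price">&#8377;250.00</span>
  <img class="product-thumb" src="https://example.com/probe.jpg">
</li>
"""

def parse_product(card) -> dict:
    """Extract name, price, and image URL from a single product card."""
    name = card.select_one("h2").get_text(strip=True)
    price = card.select_one(".price").get_text(strip=True)
    image = card.select_one("img")["src"]
    return {"name": name, "price": price, "image": image}

soup = BeautifulSoup(html, "html.parser")
products = [parse_product(card) for card in soup.select("li.product")]
```

Each parsed card becomes one dictionary, which is what ends up serialized into the JSON output file.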
Before running the project, ensure that Docker and Docker Compose are installed on your machine. You can download them from the official Docker website: https://docs.docker.com/get-docker/.
- Clone the repository and navigate to the project folder.
- Build the services using Docker Compose:

docker-compose build

1. Scrape the Product Data

Run the collect_products.py script to scrape product data from https://dentalstall.com/ and save it to a JSON file. It accepts two optional arguments:

- page_count: The number of pages to scrape (default is to scrape all pages).
- proxy: A proxy URL to route requests through (optional).
Example usage:
- To scrape the first 5 pages without a proxy:
docker-compose run app pipenv run python collect_products.py 5
- To scrape the first 5 pages using a proxy:
docker-compose run app pipenv run python collect_products.py 5 http://myproxy.com
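The argument handling implied by the examples above can be sketched like this. This is a simplified reconstruction of how collect_products.py might read its positional arguments, not the script's actual code:

```python
import sys

def parse_cli(argv: list) -> tuple:
    """Interpret collect_products.py-style positional arguments.

    argv[0] (optional): page_count; None means scrape all pages.
    argv[1] (optional): proxy URL to route requests through.
    The real script's parsing may differ; this mirrors only the
    documented invocations.
    """
    page_count = int(argv[0]) if len(argv) >= 1 else None
    proxy = argv[1] if len(argv) >= 2 else None
    return page_count, proxy

if __name__ == "__main__":
    pages, proxy = parse_cli(sys.argv[1:])
    print(f"Scraping {pages or 'all'} pages via proxy {proxy or '(none)'}")
```

With no arguments the scraper defaults to all pages and no proxy, matching the defaults listed above.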
The scraped data will be saved to a JSON file named products.json.

2. Start the FastAPI Application

After collecting the products, run the FastAPI server using Uvicorn:

docker-compose up

This will start the FastAPI server, and you can access the API at http://127.0.0.1:8000/.
Available Endpoints:
- /products: Retrieve a list of all scraped products.
- /redis_test: Test the Redis connection using set and get.