This project is a web scraping tool that extracts product data from https://dentalstall.com/. The site requires no login credentials; the tool uses BeautifulSoup to parse the HTML content and extract product details such as the product name, price, and image.
- FastAPI for building the API endpoints.
- BeautifulSoup for scraping and parsing HTML content.
- Pipenv for managing project dependencies.
- Docker and Docker Compose for containerization and orchestration.
- Scrapes product details (name, price, image) from https://dentalstall.com/.
- Stores the scraped data in a JSON file for easy access.
- FastAPI endpoints to retrieve scraped product data.
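The BeautifulSoup-based extraction described above can be sketched roughly as follows. The HTML structure and CSS class names here are illustrative assumptions (the actual markup on dentalstall.com may differ), but the parsing pattern matches what the scraper does:

```python
from bs4 import BeautifulSoup

# Minimal HTML resembling one product card; the real class names on
# dentalstall.com are assumptions and may differ.
html = """
<li class="product">
  <h2 class="product-title">Dental Probe</h2>
  <span class="price">&#8377;250.00</span>
  <img class="product-thumb" src="https://example.com/probe.jpg">
</li>
"""

def parse_product(card) -> dict:
    """Extract name, price, and image URL from a single product card."""
    name = card.select_one("h2").get_text(strip=True)
    price = card.select_one(".price").get_text(strip=True)
    image = card.select_one("img")["src"]
    return {"name": name, "price": price, "image": image}

soup = BeautifulSoup(html, "html.parser")
products = [parse_product(card) for card in soup.select("li.product")]
```

Each parsed card becomes one dictionary, which is what ends up serialized into the JSON output file.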
Before running the project, ensure that Docker and Docker Compose are installed on your machine. You can download them from the official Docker website: https://docs.docker.com/get-docker/.
- Clone the repository and navigate to the project folder.
- Build the services using Docker Compose:

docker-compose build

1. Scrape the Product Data

Run the collect_products.py script to scrape product data from https://dentalstall.com/ and save it to a JSON file. It accepts two optional arguments:

- page_count: The number of pages to scrape (default is to scrape all pages).
- proxy: A proxy URL to route requests through (optional).
Example usage:
- To scrape the first 5 pages without a proxy:
docker-compose run app pipenv run python collect_products.py 5
- To scrape the first 5 pages using a proxy:
docker-compose run app pipenv run python collect_products.py 5 http://myproxy.com
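The argument handling implied by the examples above can be sketched like this. This is a simplified reconstruction of how collect_products.py might read its positional arguments, not the script's actual code:

```python
import sys

def parse_cli(argv: list) -> tuple:
    """Interpret collect_products.py-style positional arguments.

    argv[0] (optional): page_count; None means scrape all pages.
    argv[1] (optional): proxy URL to route requests through.
    The real script's parsing may differ; this mirrors only the
    documented invocations.
    """
    page_count = int(argv[0]) if len(argv) >= 1 else None
    proxy = argv[1] if len(argv) >= 2 else None
    return page_count, proxy

if __name__ == "__main__":
    pages, proxy = parse_cli(sys.argv[1:])
    print(f"Scraping {pages or 'all'} pages via proxy {proxy or '(none)'}")
```

With no arguments the scraper defaults to all pages and no proxy, matching the defaults listed above.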
The scraped data will be saved to a JSON file named products.json.

2. Start the FastAPI Application

After collecting the products, run the FastAPI server using Uvicorn:

docker-compose up

This will start the FastAPI server, and you can access the API at http://127.0.0.1:8000/.
Available Endpoints:
- /products: Retrieve a list of all scraped products.
- /redis_test: Test the Redis connection using set and get.