A robust web scraping solution built with Puppeteer and Node.js that extracts data from the Quotes to Scrape demo site. The application is containerized with Docker and deployed on Google Cloud Platform, providing easy access to the scraped data through a RESTful API.
## Table of Contents

- Introduction
- Features
- Tech Stack
- Prerequisites
- Installation and Local Development
- API Documentation
- Deployment
- License
## Introduction

This project showcases a production-ready web scraper built with Node.js and Puppeteer. The application is containerized with Docker and deployed on Google Cloud Platform, with automated deployment through GitHub Actions and container images hosted in Google Container Registry. This architecture ensures scalability, maintainability, and easy deployment across different cloud providers.
## Features

- Automated web scraping with Puppeteer
- RESTful API for accessing scraped data
- Docker containerization for consistent environments
- Automated CI/CD pipeline with GitHub Actions
- Cloud-native deployment on GCP
- Scalable architecture
## Tech Stack

- Express.js: Fast, unopinionated web framework for Node.js
- Puppeteer: Headless Chrome automation library
- Docker: Application containerization
- Google Cloud Platform: Container registry and cloud infrastructure
- GitHub Actions: CI/CD automation
- Node.js: JavaScript runtime environment
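At its core, the stack works together like this: Puppeteer drives headless Chrome to pull quotes off the page, and Express serves the results over the REST endpoint. A minimal sketch of that flow is below — the route path and port mirror this README, but the selectors, helper names, and file layout are illustrative assumptions, not the repository's actual code.

```javascript
// Hypothetical sketch of the scrape-and-serve flow; the repository's
// actual implementation may be organized differently.

// Normalize one raw quote string: trim whitespace and strip the curly
// quotation marks that Quotes to Scrape wraps each quote in.
function normalizeQuote(raw) {
  return raw.trim().replace(/^[“"]/, '').replace(/[”"]$/, '');
}

// Launch headless Chrome, visit the site, and collect text/author pairs.
async function scrapeQuotes(url = 'https://quotes.toscrape.com') {
  const puppeteer = require('puppeteer'); // lazy require keeps the pure helper above usable on its own
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const quotes = await page.$$eval('.quote', (nodes) =>
      nodes.map((node) => ({
        text: node.querySelector('.text').textContent,
        author: node.querySelector('.author').textContent,
      }))
    );
    return quotes.map((q) => ({ ...q, text: normalizeQuote(q.text) }));
  } finally {
    await browser.close();
  }
}

// Wire the scraper to the REST endpoint documented below.
function createApp() {
  const express = require('express');
  const app = express();
  app.get('/api/quotes', async (_req, res) => {
    try {
      res.json(await scrapeQuotes());
    } catch (err) {
      res.status(500).json({ error: err.message });
    }
  });
  return app;
}

// An entry point (e.g. index.js) would then call: createApp().listen(5000);

module.exports = { normalizeQuote, scrapeQuotes, createApp };
```

Keeping the scraping logic in its own function, separate from the Express wiring, makes it easy to reuse the scraper outside the HTTP server (e.g. from a scheduled job).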
## Prerequisites

- Node.js (v14 or higher)
- Docker
- Google Cloud SDK
- Git
## Installation and Local Development

```bash
# Clone the repository
git clone https://github.com/yourusername/puppeteer-node-scraper.git

# Navigate to the project directory
cd puppeteer-node-scraper

# Install dependencies
npm install

# Build the Docker image
docker build -t puppeteer-scraper .

# Run the container, mapping port 5000
docker run -p 5000:5000 puppeteer-scraper
```
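Running Puppeteer inside a container requires Chrome and its shared-library dependencies in the image. The repository's actual Dockerfile is not shown here; a typical setup looks roughly like the sketch below, where the base image, package choices, and the `index.js` entry point are all assumptions:

```dockerfile
# Hypothetical Dockerfile sketch; the repository's real Dockerfile may differ.
FROM node:18-slim

# Install system Chromium plus the shared libraries headless Chrome needs.
RUN apt-get update \
 && apt-get install -y --no-install-recommends chromium \
 && rm -rf /var/lib/apt/lists/*

# Point Puppeteer at the system Chromium instead of downloading its own copy.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# The API listens on port 5000, matching the docker run command above.
EXPOSE 5000
CMD ["node", "index.js"]
```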
## API Documentation

```
GET /api/quotes    # Fetch all quotes
```
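The response format is not specified in this README; given the markup of Quotes to Scrape, the endpoint plausibly returns a JSON array along these lines (field names and the inclusion of tags are illustrative assumptions):

```json
[
  {
    "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
    "author": "Albert Einstein",
    "tags": ["change", "deep-thoughts", "thinking", "world"]
  }
]
```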
## Deployment

The application is automatically deployed to Google Cloud Platform whenever changes are pushed to the main branch. The deployment process includes:
- Building the Docker image
- Pushing to Google Cloud Container Registry
- Deploying to Google Cloud Run via GitHub Actions
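The steps above could be expressed as a GitHub Actions workflow roughly like this sketch. The project ID, service name, region, and secret names are placeholders; the repository's actual workflow may differ:

```yaml
# Hypothetical CI/CD workflow sketch; names and secrets are placeholders.
name: Deploy to Cloud Run

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Authenticate to GCP using a service-account key stored as a repo secret.
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: google-github-actions/setup-gcloud@v2

      # Build the Docker image and push it to Google Container Registry.
      - run: |
          gcloud auth configure-docker
          docker build -t gcr.io/$PROJECT_ID/puppeteer-scraper .
          docker push gcr.io/$PROJECT_ID/puppeteer-scraper
        env:
          PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}

      # Deploy the pushed image to Cloud Run.
      - run: |
          gcloud run deploy puppeteer-scraper \
            --image gcr.io/$PROJECT_ID/puppeteer-scraper \
            --region us-central1 \
            --platform managed \
            --allow-unauthenticated
        env:
          PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
```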
## License

This project is licensed under the MIT License - see the LICENSE.md file for details.