- User registration and authentication
- Website scanning to extract links
- Background processing with Sidekiq
- Pagination with Kaminari
- Ruby 3.3.0
- Rails 8.0.0
- PostgreSQL
- Redis (for Sidekiq)
- Node.js and Yarn (for asset compilation)
- Clone the repository:

```bash
git clone https://github.com/ealbertos/koombea_web_scraper.git
cd koombea_web_scraper
```

- Install dependencies:

```bash
bundle install
yarn install
```

- Set up the database:

```bash
rails db:create
rails db:migrate
```

- Start the servers:

```bash
# Preferred method
bin/dev

# If you want to run them in separate terminals
yarn build:css # Just the first time
rails server
bundle exec sidekiq
```

- Visit http://localhost:3000 in your browser
- Sign up or log in to the application
- On the home page, enter the URL of the website you want to scrape
- Click "Scrape" to start the process
- The website will be processed in the background
- Click the name of the website to see all the links found on it
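The hand-off described above (click "Scrape", process in the background, view links later) can be sketched in plain Ruby. The names `Website`, `scrape_later`, and `drain_queue` are illustrative stand-ins, not the app's real API: in the actual app an ActiveRecord model is persisted and a Sidekiq job is pushed to Redis.

```ruby
# Illustrative sketch of the scrape flow: the controller saves the website
# and enqueues work; a background worker fills in the links later.
Website = Struct.new(:url, :links)

JOB_QUEUE = []

# What the "Scrape" button triggers: persist the record, enqueue the work.
def scrape_later(url)
  website = Website.new(url, [])
  JOB_QUEUE << website # Sidekiq would push the record's id to Redis instead
  website
end

# What the Sidekiq worker process does: pick up jobs and run the scraper.
def drain_queue
  until JOB_QUEUE.empty?
    site = JOB_QUEUE.shift
    # The real job delegates to the scraping service; a stub result here.
    site.links = ["#{site.url}/example-link"]
  end
end

site = scrape_later("https://example.com")
site.links # => [] until the background worker runs
drain_queue
site.links # => ["https://example.com/example-link"]
```

Because the work is queued rather than done inline, the browser request returns immediately even for slow sites, which is the point of using Sidekiq here.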
- User: Manages user authentication (using Clearance)
- Website: Stores information about scraped websites
- Link: Stores links found on scraped websites
- HomeController: The root path, which shows whether the user is logged in
- WebsitesController: Manages website scraping and viewing
WebsiteScraperService: Handles the web scraping logic
ScrapeWebsiteJob: Processes website scraping in the background
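The core of the scraping service (extracting every link from a fetched page) can be sketched as below. This is a simplified stand-in: the real service presumably fetches with HTTParty and parses with Nokogiri, but this version uses only the standard library so it runs anywhere, and the class and method names are assumptions.

```ruby
require "uri"

# Simplified sketch of link extraction. The real WebsiteScraperService likely
# parses HTML with Nokogiri; a regex is used here to stay dependency-free.
class LinkExtractor
  HREF_PATTERN = /<a\s[^>]*href=["']([^"']+)["']/i

  # Returns absolute, de-duplicated URLs for every <a href="..."> in the HTML.
  def extract_links(html, base_url)
    html.scan(HREF_PATTERN).flatten.map do |href|
      URI.join(base_url, href).to_s
    rescue URI::InvalidURIError
      nil # skip hrefs that cannot be parsed as URLs
    end.compact.uniq
  end
end

html = <<~HTML
  <a href="/about">About</a>
  <a href="https://other.example/page">Other</a>
  <a href="/about">Duplicate</a>
HTML

LinkExtractor.new.extract_links(html, "https://example.com")
# => ["https://example.com/about", "https://other.example/page"]
```

Resolving each `href` against the page's URL with `URI.join` is what turns relative links like `/about` into the absolute URLs stored on the Link model.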
To run the tests:

```bash
bundle exec rspec
```

The test suite includes:
- Model tests
- Controller tests
- Service tests
- Job tests
- Rails 8.0.0
- PostgreSQL
- Clearance: User authentication
- Sidekiq: Background job processing
- Nokogiri: HTML parsing
- HTTParty: HTTP requests
- Bootstrap 5: CSS framework
- Kaminari: Pagination
- RSpec: Testing framework
- FactoryBot: Test data factories
- Faker: Fake data generation
- WebMock: HTTP request stubbing