Web Scraper

Features

User registration and authentication
Website scanning to extract links
Background processing with Sidekiq
Pagination with Kaminari

Requirements

Ruby 3.3.0
Rails 8.0.0
PostgreSQL
Redis (for Sidekiq)
Node.js and Yarn (for asset compilation)

Installation

Clone the repository:

git clone https://github.com/ealbertos/koombea_web_scraper.git
cd koombea_web_scraper

Install dependencies:

bundle install
yarn install

Set up the database:

rails db:create
rails db:migrate

Start the servers:

# Preferred method
bin/dev

# Alternatively, run each process in its own terminal
yarn build:css # Only needed the first time
rails server
bundle exec sidekiq

Visit http://localhost:3000 in your browser

Usage

Sign up or log in to the application
On the home page, enter the URL of the website you want to scrape
Click "Scrape" to start the process
The website will be processed in the background
Click the name of the website to see all the links found on it

Architecture

Models

User: Manages user authentication (using Clearance)
Website: Stores information about scraped websites
Link: Stores links found on scraped websites

Controllers

HomeController: Serves the root path and shows whether the user is logged in
WebsitesController: Manages website scraping and viewing

Services

WebsiteScraperService: Handles the web scraping logic

Jobs

ScrapeWebsiteJob: Processes website scraping in the background

Testing

To run the tests:

bundle exec rspec

The test suite includes:

Model tests
Controller tests
Service tests
Job tests

Tools Used

Rails 8.0.0
PostgreSQL
Clearance: User authentication
Sidekiq: Background job processing
Nokogiri: HTML parsing
HTTParty: HTTP requests
Bootstrap 5: CSS framework
Kaminari: Pagination
RSpec: Testing framework
FactoryBot: Test data factories
Faker: Fake data generation
WebMock: HTTP request stubbing in tests
