In this project I will show how to scrape information from a website so that it can later be used to train a machine learning model for educational purposes.
I will collect company text reviews along with their ratings from indeed.com and save them to a local disk. This data can be used to train a natural language processing model to predict a rating from the review text.
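As a rough illustration of the "save to local disk" step, here is a minimal sketch that stores (rating, review text) pairs in a CSV file using only the Python standard library. The function names `save_reviews` and `load_reviews`, the column names, and the CSV format itself are assumptions made for this example, not a fixed part of the project.

```python
import csv


def save_reviews(reviews, path):
    """Write (rating, review_text) pairs to a CSV file with a header row.

    `reviews` is any iterable of 2-tuples; the column names are hypothetical.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rating", "review_text"])
        writer.writerows(reviews)


def load_reviews(path):
    """Read the CSV back into a list of (rating, review_text) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [(float(row["rating"]), row["review_text"]) for row in reader]
```

Keeping the rating and the text side by side in one row means the file can be loaded directly as labelled training data later on.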
The Selenium package for Python will be used to scrape the data.
Scraping data from a website puts additional load on the server. To keep this load small, I will introduce a wait time between requests to the server and capture only a limited volume of data, just enough to introduce the idea of web scraping.
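The two safeguards described above, a pause between requests and a cap on how much is collected, can be sketched as a small helper. This is only an illustration under stated assumptions: `scrape_politely`, `fetch_page`, `delay_seconds` and `max_pages` are hypothetical names, and the default values are arbitrary. In this project `fetch_page` would be a function that loads a page with Selenium and returns the reviews found on it.

```python
import time
from itertools import islice


def scrape_politely(urls, fetch_page, delay_seconds=5.0, max_pages=10,
                    sleep=time.sleep):
    """Fetch at most `max_pages` URLs, pausing between consecutive requests.

    `fetch_page` is a caller-supplied function (hypothetical here) that takes
    a URL and returns whatever was scraped from it. `sleep` can be swapped out
    so the helper can be tried without actually waiting.
    """
    results = []
    for index, url in enumerate(islice(urls, max_pages)):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch_page(url))
    return results
```

Passing the fetching function in as an argument keeps the throttling logic separate from the browser automation, so the wait time and the page limit can be changed in one place.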