rayhant2/Realtor.ca-Web-Scraper

Realtor.ca-Web-Scraper

When you're thinking about moving within Toronto, scrolling through countless pages of listings is tiring, especially with new properties constantly being added to the market. This Python scraper goes through the 600 most recent postings within Toronto, making local real-estate listings much easier to browse.


Overview

This Python script uses Selenium and Undetected_Chromedriver to scrape property data from Realtor.ca. It extracts key information such as address, price, bedrooms, and bathrooms, and exports all of the data to an Excel spreadsheet.
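
Before the address, price, bedroom and bathroom values can go into spreadsheet columns, the raw strings read off each listing card need some normalization. Below is a minimal sketch, assuming prices look like "$1,299,000" and bedroom counts like "3 + 1". The `Listing` class, the helper functions and the raw dictionary keys are hypothetical and are not taken from app.py.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """One property, as it would appear as a row in the spreadsheet."""
    address: str
    price: Optional[int]      # whole dollars, None if the text has no digits
    bedrooms: Optional[int]   # e.g. "3 + 1" is summed to 4
    bathrooms: Optional[int]


def parse_price(text: str) -> Optional[int]:
    """Turn a price string such as "$1,299,000" (or "$849,900.00") into an int."""
    whole_part = text.split(".")[0]          # drop any cents
    digits = re.sub(r"\D", "", whole_part)   # keep only the digits
    return int(digits) if digits else None


def parse_count(text: str) -> Optional[int]:
    """Sum every integer in a count string: "3 + 1" -> 4, "2" -> 2."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return sum(numbers) if numbers else None


def parse_listing(raw: dict) -> Listing:
    """Convert the raw text scraped from one listing card into typed fields.

    ASSUMPTION: `raw` uses the hypothetical keys "address", "price",
    "beds" and "baths"; adapt them to whatever the scraper actually collects.
    """
    return Listing(
        address=raw.get("address", "").strip(),
        price=parse_price(raw.get("price", "")),
        bedrooms=parse_count(raw.get("beds", "")),
        bathrooms=parse_count(raw.get("baths", "")),
    )


if __name__ == "__main__":
    card = {"address": " 123 Example St, Toronto, ON ", "price": "$1,299,000",
            "beds": "3 + 1", "baths": "2"}
    print(parse_listing(card))
```

Summing "3 + 1" into 4 is only one choice; storing the two numbers in separate columns would preserve more detail.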

Prerequisites

Before running the script, make sure the following is installed:

  • Python (ideally 3.9 or newer)
  • Openpyxl
  • Selenium / Selenium-Wire
  • Undetected-Chromedriver
  • Pandas

You can install the required packages using the following:

pip install openpyxl selenium selenium-wire undetected-chromedriver pandas

Usage

  1. Clone the repository:
git clone https://github.com/rayhant2/Realtor.ca-Web-Scraper.git
  2. Move into the project folder and run the script:
cd Realtor.ca-Web-Scraper
python app.py

The script will open a Chrome browser, scrape the property data, and export it to "properties.xlsx".

Issues & Contributions

  • The scraper only goes through the 600 most recent properties (50 pages x 12 listings); any listings on later pages are ignored. It still needs a way to detect how many pages are available and scrape all of them.
  • For now, the scraper only works for Toronto, ON; it still needs to be generalized to work for any specified city or region.
  • I couldn't find a way to use the page-navigation button to move between listing pages, possibly because it isn't a regular HTML element. As a workaround, the script opens each page in a new tab and scrapes the data from there. A better approach is needed, since opening 50 tabs before closing the driver is not an efficient way to scrape data.
  • Other planned updates include automation, filtering results by price/beds/baths, adding links, lot size, etc. to the spreadsheet, and solutions to the issues above.
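
The first three issues above (the fixed 50-page limit, the Toronto-only search, and one new tab per page) could be approached together: derive the page count from the total number of listings, build one URL per page for a given city, and load every page in the same tab with `driver.get`. This is a minimal sketch, not how app.py currently works. The URL template with `{city}` and `{page}` placeholders and the helper names (`page_count`, `page_urls`, `scrape_all_pages`) are hypothetical; the real Realtor.ca URL format, and where to read the total listing count, would still need to be worked out.

```python
import math
from typing import Callable, Iterable, List, Protocol
from urllib.parse import quote

LISTINGS_PER_PAGE = 12  # per the README: 50 pages x 12 listings = 600


class PageLoader(Protocol):
    """Anything with a Selenium-style get(url) method, e.g. a Chrome WebDriver."""
    def get(self, url: str) -> None: ...


def page_count(total_listings: int, per_page: int = LISTINGS_PER_PAGE) -> int:
    """Number of result pages needed to cover every listing."""
    if total_listings <= 0:
        return 0
    return math.ceil(total_listings / per_page)


def page_urls(template: str, pages: int, city: str) -> List[str]:
    """Build one URL per results page.

    ASSUMPTION: `template` is a hypothetical URL containing "{city}" and
    "{page}" placeholders; the site's real URL scheme is not verified here.
    """
    city_part = quote(city)  # e.g. "Toronto, ON" -> "Toronto%2C%20ON"
    return [template.format(city=city_part, page=n) for n in range(1, pages + 1)]


def scrape_all_pages(driver: PageLoader, urls: Iterable[str],
                     scrape_page: Callable[[PageLoader], list]) -> list:
    """Visit each page in the same tab and collect whatever scrape_page returns."""
    results: list = []
    for url in urls:
        driver.get(url)                       # reuse the current tab
        results.extend(scrape_page(driver))   # extract this page's listings
    return results
```

With undetected_chromedriver, `driver` would be the `uc.Chrome()` instance and `scrape_page` a wrapper around the existing per-page extraction code. If the site switches pages only through the URL fragment (the part after `#`), `driver.get` may not trigger a full reload, so an explicit wait for the new listings might still be needed.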

If you encounter any issues or have suggestions for improvements, feel free to open an issue or submit a pull request.



Sample Output

An example of how the property data is stored can be found here

License

MIT

