This project's objective is to collect and organize information about the public schools in New York City (NYC), particularly their website URLs, boroughs, and districts. This information will be used to create a database of NYC public schools and their websites.
When teachers register for the union, they are advised not to use their work email addresses due to privacy concerns. However, some teachers still use their work emails. Additionally, the public domains used by NYC public schools tend to change frequently and are often inconsistent.
The NYC Department of Education maintains a comprehensive list of all public schools in the city, along with their contact information and websites. This information is available on the NYC Schools website (https://schoolsearch.schools.nyc/).
Project Source: the publicly available NYC Schools website (https://schoolsearch.schools.nyc/)
To achieve our objective, we will follow these steps:
- Web Crawling: Use a Python web crawler to extract each school's name, website URL, domain, district, grade levels, borough, and latitude and longitude (to be plotted on a map) from the NYC Schools website. The output will be a JSON file.
- Database Creation: Create a MySQL database to store the collected information about NYC public schools. We will develop a Python script that reads the JSON file generated by the web crawling process and populates the database with the relevant data.
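The two steps above share one record format: the crawler writes it and the database script reads it. As a minimal sketch, assuming one JSON object per school with the fields named in the Web Crawling step, a record could be modeled as a dataclass and round-tripped through JSON. The class name `SchoolRecord`, its field names, and the sample values are hypothetical placeholders, not the project's actual output.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SchoolRecord:
    """One crawled school. Field names are hypothetical, not the project's schema."""
    name: str
    url: str
    domain: str
    district: int
    borough: str
    grades: List[str]
    latitude: Optional[float] = None   # coordinates may be missing for some schools
    longitude: Optional[float] = None


def to_json(records: List[SchoolRecord]) -> str:
    # Serialize the records as a JSON array of objects
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(text: str) -> List[SchoolRecord]:
    # Rebuild records from a JSON array produced by to_json
    return [SchoolRecord(**item) for item in json.loads(text)]


if __name__ == "__main__":
    # Placeholder values only, not real school data
    example = SchoolRecord(
        name="Example School",
        url="https://www.example.org/",
        domain="example.org",
        district=1,
        borough="Manhattan",
        grades=["K", "1", "2"],
        latitude=40.71,
        longitude=-74.0,
    )
    text = to_json([example])
    assert from_json(text) == [example]
    print(text)
```

Keeping latitude and longitude optional lets a record still be stored when a school's coordinates cannot be extracted.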
The web crawling step works as follows:
- The Python web crawler will navigate through the NYC Schools website (https://schoolsearch.schools.nyc/) and extract information about each public school.
- Information to be extracted includes each school's website URL, borough, and district.
- The extracted data will be formatted as a JSON file for further processing.
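The crawler steps above can be sketched at the parsing stage: given one page of HTML, collect the outgoing links and derive each link's domain. This minimal sketch uses only Python's standard `html.parser` and `urllib.parse`, and assumes the school's website appears as an ordinary `<a href>` link in the page's HTML. If the site renders its listings with JavaScript or serves them from an API, fetching and parsing would need a different approach. The names `LinkCollector`, `extract_external_links`, and `domain_of` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

SEARCH_HOST = "schoolsearch.schools.nyc"  # links back to the search site are skipped


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/school/123" into absolute URLs
                    self.links.append(urljoin(self.base_url, value))


def extract_external_links(html: str, base_url: str) -> List[str]:
    # Keep only http(s) links whose host is not the search site itself
    parser = LinkCollector(base_url)
    parser.feed(html)
    result = []
    for link in parser.links:
        parts = urlparse(link)
        if parts.scheme in ("http", "https") and parts.hostname != SEARCH_HOST:
            result.append(link)
    return result


def domain_of(url: str) -> str:
    # urlparse lower-cases the host; strip a leading "www." as well
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


if __name__ == "__main__":
    # Placeholder HTML, not taken from the real site
    page = ('<a href="/school/123">Profile</a> '
            '<a href="https://www.Example.org/home">Website</a>')
    for link in extract_external_links(page, "https://schoolsearch.schools.nyc/"):
        print(link, domain_of(link))
```

Treating every link whose host is not the search site as a candidate school website is a heuristic: a page may also link to social media or other resources, so the results would still need filtering before they are written to the JSON file.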
The database creation step works as follows:
- We will create a MySQL database schema to store the collected information.
- A Python script will read the JSON file generated by the web crawling process.
- The script will then connect to the MySQL database and populate it with the extracted school data.
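The database steps above can be sketched with Python's built-in `sqlite3` module standing in for MySQL, so the example runs without a database server. The table name `schools`, its columns, and the expected JSON keys are assumptions rather than the project's actual schema. Against MySQL, the same logic would take a connection from `mysql.connector.connect(host=..., user=..., password=..., database=...)` (mysql-connector-python), use `%s` instead of `?` as the placeholder, and would typically declare the `id` column with `AUTO_INCREMENT`.

```python
import json
import sqlite3
from typing import Iterable

# Hypothetical schema; sqlite3 stands in for MySQL here
SCHEMA = """
CREATE TABLE IF NOT EXISTS schools (
    id        INTEGER PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    url       VARCHAR(255),
    domain    VARCHAR(255),
    district  INTEGER,
    borough   VARCHAR(32),
    grades    VARCHAR(64),
    latitude  REAL,
    longitude REAL
)
"""

COLUMNS = ("name", "url", "domain", "district", "borough",
           "grades", "latitude", "longitude")


def load_schools(conn, records: Iterable[dict]) -> int:
    """Create the table if needed, insert one row per record, return rows inserted."""
    cur = conn.cursor()
    cur.execute(SCHEMA)
    rows = []
    for rec in records:
        row = dict(rec)
        # Store a list of grade levels as a comma-separated string
        grades = row.get("grades") or []
        row["grades"] = grades if isinstance(grades, str) else ",".join(grades)
        # Missing keys become NULL
        rows.append(tuple(row.get(col) for col in COLUMNS))
    placeholders = ", ".join("?" for _ in COLUMNS)  # MySQL connectors use %s instead
    cur.executemany(
        f"INSERT INTO schools ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


def load_json_file(path: str, conn) -> int:
    # Read the crawler's JSON array and load it into the database
    with open(path, encoding="utf-8") as fh:
        return load_schools(conn, json.load(fh))
```

For example, `load_json_file("schools.json", sqlite3.connect("schools.db"))` would create the table if needed and return the number of rows inserted (both file names are hypothetical). Running it twice inserts duplicate rows; a unique key on `url`, or clearing the table before loading, would avoid that.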
To set up and run the project:
- Clone this repo: git clone https://github.com/mhou9/url_database.git
- Change into the project directory: cd url_database
- Open MySQL and connect to localhost by entering your MySQL password and database name.
- In the VS Code terminal, run a script with: python (filename).py
- Both the output JSON file and the database table should then contain the correct extracted school data.
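To check that last step, the JSON file and the table can be compared programmatically rather than by eye. This minimal sketch assumes a JSON array of objects with a `name` key and a `schools` table with a `name` column (both hypothetical). It uses only the DB-API cursor interface, so it should work with a `sqlite3` or a MySQL connection alike; the demo uses an in-memory `sqlite3` database with placeholder names.

```python
import json
import sqlite3
from typing import Dict, List


def load_records(path: str) -> List[dict]:
    # Read the crawler's JSON array of school objects
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def compare_json_to_table(records: List[dict], conn) -> Dict[str, List[str]]:
    """Report school names that appear in only one of the JSON data and the table."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM schools")
    db_names = {row[0] for row in cur.fetchall()}
    json_names = {rec["name"] for rec in records}
    return {
        "missing_from_db": sorted(json_names - db_names),
        "missing_from_json": sorted(db_names - json_names),
    }


if __name__ == "__main__":
    # In-memory demo with placeholder data; use the real JSON file and DB connection instead
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT)")
    conn.executemany("INSERT INTO schools (name) VALUES (?)",
                     [("School A",), ("School B",)])
    records = [{"name": "School A"}, {"name": "School C"}]
    print(compare_json_to_table(records, conn))
```

Comparing by name is a simplification: if two schools share a name, a unique column such as the website URL would be a safer key.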
Contributors:
- Mingrong Hou
- Regina Rabkina