This repo houses GraduateNU's major requirements scraper. It scrapes the Northeastern Academic Catalog.
Clone the repo and run:
pnpm install
After installing dependencies, you can run any of the commands below.
To scrape major requirements:
pnpm scrape
To scrape plans of study (templates):
pnpm scrape:templates
To scrape both major requirements and templates:
pnpm scrape:all
The scraper uses the current catalog by default, but you can specify one or more years as command line arguments. For example, to scrape the catalogs for 2021, 2022, and the current year, run:
pnpm scrape 2021 2022 current
This will populate the results folder with parsed JSON files and the catalogCache folder with cached HTML.
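As a rough illustration of how year arguments such as `2021 2022 current` might be interpreted, here is a minimal TypeScript sketch. The `CatalogYear` type and the `parseCatalogYears` function are hypothetical names invented for this example and are not taken from the scraper's source. The sketch only assumes what is stated above: no arguments means the current catalog, and each argument is either a four-digit year or the word `current`.

```typescript
// Hypothetical helper for illustration only; not the scraper's actual code.
type CatalogYear = number | "current";

function parseCatalogYears(args: string[]): CatalogYear[] {
  // With no arguments, fall back to the current catalog.
  if (args.length === 0) {
    return ["current"];
  }
  return args.map((arg): CatalogYear => {
    if (arg === "current") {
      return "current";
    }
    // Assumption: years are written as four digits, e.g. "2021".
    if (!/^\d{4}$/.test(arg)) {
      throw new Error(`Invalid catalog year: ${arg}`);
    }
    return Number(arg);
  });
}

// Mirrors the example command: pnpm scrape 2021 2022 current
console.log(parseCatalogYears(["2021", "2022", "current"]));
```

Rejecting malformed arguments up front, rather than silently skipping them, makes a typo in a year visible before any scraping starts.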
There is a separate command that scrapes a single academic catalog page from a link. To use it, run:
pnpm scrape-link <link>
This will populate:
- The degrees folder with parsed major requirement JSON files
- The templates folder with plan of study templates in JSON format
- The catalogCache folder with cached HTML files

Output paths follow this structure:
- Major requirements: degrees/<degree-type>/<year>/<college>/<major-name>/
- Templates: templates/<year>/<college>/<major-name>/template.json
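
The path layout above can be sketched in TypeScript as follows. The `MajorLocation` interface and the `degreeDir` and `templatePath` functions are hypothetical names made up for this example, and the sample values are placeholders, not real catalog entries. Only the path shapes come from the structure described above.

```typescript
// Hypothetical types and helpers for illustration; not the scraper's actual code.
interface MajorLocation {
  degreeType: string;
  year: number;
  college: string;
  majorName: string;
}

// degrees/<degree-type>/<year>/<college>/<major-name>/
function degreeDir(loc: MajorLocation): string {
  const parts = ["degrees", loc.degreeType, String(loc.year), loc.college, loc.majorName];
  return parts.join("/") + "/";
}

// templates/<year>/<college>/<major-name>/template.json
function templatePath(loc: MajorLocation): string {
  const parts = ["templates", String(loc.year), loc.college, loc.majorName, "template.json"];
  return parts.join("/");
}

// Placeholder values, chosen only to show the resulting shape.
const example: MajorLocation = {
  degreeType: "example-degree-type",
  year: 2022,
  college: "example-college",
  majorName: "example-major",
};

console.log(degreeDir(example));
console.log(templatePath(example));
```

Note that the templates path has no degree-type segment, so the two trees are not mirror images of each other.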