Home
Welcome to the NAF wiki pages! This wiki contains all the information about our project and how each component functions.
The project is organized into three major components:
- Crawler
- Analyzer
- Enricher
Each component has its own dedicated wiki page with setup instructions, functionality breakdowns, and implementation details.
Crawler
Automatically searches for potential NAF alumni profiles using Google queries tailored to relevant attributes, which return the highest yield of NAF alumni (a short query-construction sketch follows the list below).
- Queries Google for LinkedIn pages using custom search strings.
- Iterates through the resulting links and collects the HTML of each LinkedIn profile.
- Parses the HTML to generate corresponding JSON objects.
- Stores profiles in the development database based on matching identifiers (such as schools, companies, or certifications).
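Below is a minimal sketch of the query-tailoring and JSON-conversion ideas, not the Crawler's actual implementation. The attribute lists, the `build_queries` helper, and the profile fields are illustrative assumptions.

```python
import json

# Illustrative attribute lists; the Crawler's real identifiers may differ.
NAF_ACADEMIES = ['"Academy of Finance"', '"Academy of Information Technology"']
PARTNER_COMPANIES = ['"Example Corp"', '"Sample Industries"']

def build_queries():
    """Assemble Google search strings restricted to LinkedIn profile pages."""
    queries = []
    for academy in NAF_ACADEMIES:
        for company in PARTNER_COMPANIES:
            # site: limits results to LinkedIn profiles; the quoted attributes
            # tailor the search toward profiles likely to belong to NAF alumni.
            queries.append(f"site:linkedin.com/in {academy} {company}")
    return queries

def profile_to_json(name, school, company, certifications):
    """Serialize scraped profile fields into a JSON object for storage."""
    return json.dumps({
        "name": name,
        "school": school,
        "company": company,
        "certifications": certifications,
    })

if __name__ == "__main__":
    for query in build_queries():
        print(query)
```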
Analyzer
Evaluates the likelihood that a profile collected by the Crawler belongs to a NAF alum (a small scoring sketch follows the list below).
- Assigns weights to different identifiers such as schools or job roles.
- Calculates a confidence percentage for each profile based on those weights.
- Labels profiles as either likely NAF alumni or not based on their probability score.
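Here is a minimal sketch of the weighted-confidence idea described above. The specific identifiers, weights, and the 50% threshold are assumptions chosen for illustration, not the Analyzer's real values.

```python
# Illustrative identifier weights; the Analyzer's actual weights may differ.
WEIGHTS = {
    "naf_academy_listed": 40,
    "partner_company": 25,
    "relevant_certification": 20,
    "matching_job_role": 15,
}

def confidence_score(profile_flags):
    """Sum the weights of matched identifiers and express them as a percentage."""
    total = sum(WEIGHTS.values())
    matched = sum(w for key, w in WEIGHTS.items() if profile_flags.get(key))
    return 100 * matched / total

def label_profile(profile_flags, threshold=50):
    """Label a profile as a likely NAF alum when its score clears the threshold."""
    score = confidence_score(profile_flags)
    return {"confidence": score, "likely_naf_alum": score >= threshold}

# Example: a profile that lists a NAF academy and a relevant certification.
print(label_profile({"naf_academy_listed": True, "relevant_certification": True}))
```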
Enricher
Updates and enriches information on individuals already present in the NAF database, particularly when an individual is searched for manually (a workflow sketch follows the list below).
- First searches for the individual in a dynamic PostgreSQL database.
- If the person is not found, it uses a headless browser to search for them online (primarily LinkedIn).
- Scrapes key information from the resulting profiles.
- Exports the updated information to a CSV file and stores structured data in the database.
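The sketch below illustrates the database-first lookup with a scraping fallback and CSV export. It assumes a psycopg2 connection, a hypothetical `people` table, and a stub in place of the headless-browser search; the real schema, scraping logic, and CSV columns will differ.

```python
import csv
import psycopg2  # assumed PostgreSQL driver; the project may use a different client

def find_person(conn, full_name):
    """Look the individual up in the PostgreSQL database first."""
    with conn.cursor() as cur:
        # 'people' and its columns are hypothetical; match them to the real schema.
        cur.execute(
            "SELECT full_name, school, company FROM people WHERE full_name = %s",
            (full_name,),
        )
        row = cur.fetchone()
    if row is None:
        return None
    return {"full_name": row[0], "school": row[1], "company": row[2]}

def scrape_person(full_name):
    """Placeholder for the headless-browser search (primarily LinkedIn)."""
    # The real Enricher drives a headless browser here; this stub keeps the sketch runnable.
    return {"full_name": full_name, "school": "unknown", "company": "unknown"}

def enrich(conn, full_name, csv_path="enriched.csv"):
    """Database-first lookup with a scraping fallback, exported to CSV."""
    person = find_person(conn, full_name) or scrape_person(full_name)
    with open(csv_path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["full_name", "school", "company"])
        if fh.tell() == 0:  # write the header only when starting a new file
            writer.writeheader()
        writer.writerow(person)
    return person

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=naf_dev")  # connection string is an assumption
    print(enrich(conn, "Jane Doe"))
```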
Each component has a dedicated wiki page. If you'd like to dive deeper into how the components work, learn how to set them up, or run them locally, check out the respective component's documentation.