Congratulations on making it to the second stage of the application! Thank you for applying to Generate and choosing the Data Branch!
In this step of the application process, you will complete a take-home challenge and then have an interview. In the interview, you will share your screen and walk us through how you completed the challenge. Be creative! Show us what you know!
DO NOT use generative AI (ChatGPT, Claude, etc.) in any part of your challenge. It is important to think outside the box and try different things! If you have any questions about any part of the challenge, don't hesitate to reach out to any one of us!
Please finish this challenge before the date of your interview. Please read through this document thoroughly to make sure you follow ALL instructions. Try your best, and we can't wait to meet you!
You've been brought in as a Data Scientist on a project with AeroConnect, an international airline focused on optimizing its routes and expanding profitable city pairs. The client wants to know:
a) Which routes have the highest and lowest passenger traffic over time?
b) Are there any trends or growth patterns across different cities or regions?
c) Can we predict traffic to help with resource allocation (aircraft, crew, etc.)?
- Understanding the Data
a) Identify the most and least trafficked routes
b) Analyze trends and/or geographical patterns
c) Create visualizations to demonstrate trends & patterns determined in part b
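As one possible starting point for part a, the sketch below totals passengers per city pair and sorts the result. It is only an illustration under stated assumptions: the column names (`origin`, `destination`, `month`, `passengers`) and the sample rows are made-up placeholders, not the schema or contents of the provided CSV, and the helper names are hypothetical. Names use camelCase to match the convention this brief asks for.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows. The column names are assumptions,
# not the schema of the CSV provided with the challenge.
SAMPLE_CSV = """origin,destination,month,passengers
BOS,LHR,2023-01,1200
BOS,LHR,2023-02,1350
JFK,CDG,2023-01,900
JFK,CDG,2023-02,950
SFO,NRT,2023-01,400
SFO,NRT,2023-02,380
"""


def totalPassengersByRoute(csvFile):
    """Sum passengers for each (origin, destination) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(csvFile):
        route = (row["origin"], row["destination"])
        totals[route] += int(row["passengers"])
    return dict(totals)


def rankRoutes(totals):
    """Return (route, total) pairs sorted from busiest to quietest."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


rankedRoutes = rankRoutes(totalPassengersByRoute(io.StringIO(SAMPLE_CSV)))
busiestRoute, quietestRoute = rankedRoutes[0], rankedRoutes[-1]
print("Busiest:", busiestRoute)
print("Quietest:", quietestRoute)
```

To run this against a real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to whatever the dataset actually uses.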
- Build a Model
a) Your model should predict passenger traffic for the next 6–12 months on at least 1 city pair
NOTE: Make sure to use proper coding practices (e.g. commenting, camelCase)!
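For orientation only, a seasonal-naive baseline is one simple way to produce a 6-12 month forecast for a single city pair: each future month is predicted as the value observed in the same month one year earlier. This is a swapped-in reference technique, not the model the challenge expects, and a stronger model should be able to beat it. The function name, its parameters, and the sample series below are hypothetical.

```python
def seasonalNaiveForecast(monthlyPassengers, horizon=6, seasonLength=12):
    """Forecast each future month as the value one season (12 months) earlier."""
    if len(monthlyPassengers) < seasonLength:
        raise ValueError("Need at least one full season of history")
    history = list(monthlyPassengers)
    forecast = []
    for _ in range(horizon):
        # Look back exactly one season from the month being predicted.
        nextValue = history[-seasonLength]
        forecast.append(nextValue)
        # Append the prediction so horizons longer than one season still work.
        history.append(nextValue)
    return forecast


# Made-up placeholder series: 24 monthly passenger counts for one city pair.
examplePassengers = [
    100, 95, 110, 120, 135, 160, 180, 175, 140, 125, 105, 130,
    108, 101, 118, 129, 144, 171, 193, 186, 150, 133, 112, 139,
]
print(seasonalNaiveForecast(examplePassengers, horizon=6))
# → [108, 101, 118, 129, 144, 171]
```

A baseline like this is mainly useful as a yardstick: if a more elaborate model cannot do better than repeating last year, that is worth discussing in the interview.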
- Evaluate Your Model
a) Explain your model choices: why did you choose the elements you did?
b) Evaluate the model's performance & report the accuracy of the model
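For a numeric forecast, "accuracy" is usually reported as an error metric computed on months the model did not see during training. The sketch below shows two common choices, mean absolute error (MAE) and mean absolute percentage error (MAPE). The function names and the sample numbers are hypothetical placeholders, and other metrics may suit your model better.

```python
def meanAbsoluteError(actual, predicted):
    """Average absolute difference, in passengers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("Inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def meanAbsolutePercentageError(actual, predicted):
    """MAPE in percent; months with zero actual traffic are skipped."""
    if len(actual) != len(predicted):
        raise ValueError("Inputs must be the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Made-up placeholder values for three held-out months.
actualPassengers = [100, 200, 400]
predictedPassengers = [110, 180, 400]

print("MAE:", meanAbsoluteError(actualPassengers, predictedPassengers))
print("MAPE (%):", round(meanAbsolutePercentageError(actualPassengers, predictedPassengers), 2))
```

MAE is expressed in passengers, which makes it easy to relate to aircraft capacity, while MAPE is scale-free and easier to compare across routes of different sizes.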
- Provide Recommendations
a) Which routes should AeroConnect invest more in or scale back from?
b) How can AeroConnect use this model going forward?
- Cleaned (if needed) data from the given CSV
- Code for the model
- Visualizations from Task 1c
- Answers to questions 1a, 1b, 2a, 2b, 3a, and 3b in PDF format
- README file describing your process AND including the link to your cleaned data
- Fork the repo -- This creates a copy of the repo under your account
a) At the top-right corner of the repo page, click "fork"
b) Choose your GitHub account as the destination
c) Clone the forked repo: "git clone https://github.com/YOUR-USERNAME/data-tech-challenge.git"
** This is the model you will be sharing during your interview **
- DO NOT PUSH YOUR UPDATED DATA DIRECTLY TO THE FORKED REPO
Instead, upload it to Google Drive and include the link in your README file.
Good luck, and have fun with it!
If you have ANY questions, please do not hesitate to reach out to any of the following:
- Haley Martin (Director of Data) : martin.hal@northeastern.edu
- Sonal Gupta (Chief of Data) : gupta.sonal@northeastern.edu
- Nandeenee Singh (Chief of Data) : singh.nand@northeastern.edu
- Kaydence Lin (Project Lead of Data) : lin.kay@northeastern.edu
- Tanisha Joshi (Project Lead of Data) : joshi.tani@northeastern.edu
- Ben Marler (Tech Lead of Data) : marler.b@northeastern.edu
- Jerome Rodrigo (Tech Lead of Data) : rodrigo.j@northeastern.edu