In this project, you will pick a topic of your choosing and perform an end-to-end analysis using what you have learned. You will apply the statistical or machine learning techniques we have learned over the last few weeks and present your results to all of us as well as to the jury for the HackShow.
- Ask interesting and thoughtful questions and find the data to answer them.
- Focus on improving in areas that are hard for you or learning more about something with which you feel comfortable.
- Apply the statistical and/or machine learning techniques we have learned.
- Create useful and clear graphs.
- Present your insights in a thoughtful, clear and accurate way.
- You must plan your project. That is why creating a Kanban or Trello Board is mandatory.
- You CANNOT CODE until your project is planned.
- Create a .gitignore file and include it in your repository.
- You may work in a team (3 people max) or on your own.
- You can include ML or statistics, but it is not required to do so, as long as you have a rigorous analysis.
- You may use the data from your last project or from past projects.
- A well-commented notebook with your analysis.
- A 5 minute presentation in the auditorium (+2 minutes of questions)
- A 5 minute presentation for the jury (+5 minutes of questions)
- Repository with your workflow, documentation, and code. Even if you are working alone, you need to keep good coding practices!
- The database where you have kept your data.
Wednesday
- Choose your topic and decide whether to work as a team or individually.
Thursday
- Create your repository and README overview (template provided).
- Choose the dataset(s) you would like to use.
- Teacher validation
NO CODE UNTIL HERE
Monday morning
- Database schema validation.
Monday
- Data in the database.
Tuesday evening
- Analysis validation.
Wednesday
- First rehearsal.
Thursday morning
- General rehearsal.
Thursday evening
- Presentation!
Friday morning
- Jury presentation: they will select their two preferred projects.
- We will vote for the third project to be presented at Friday's HackShow.
Friday evening. HACKSHOW!
- Auditorium presentation at 18:00.
- Vote for the winner of the HackShow.
- 5 minute presentation in the auditorium (+2 minutes of questions)
- 5 minute presentation for the jury (+5 minutes of questions)
- Organize yourself (don't get lost!).
- Respect deadlines.
- Ask for help but remember: Google is your friend!
- Define a simple approach first. You never know how the data can betray you 😉.
- Do your background reading. Learn about the problem and what research has been done before.
- Before making a graph, think about what you want it to represent.
- Don't force yourself to use techniques if they are not helpful for your objective.
- If using machine learning, remember:
  - This is an iterative process. Try your best to improve your model performance by:
    - Trying different models and selecting the simplest one that produces the best results. Make sure you know which metric you are using to define 'best', and why!
    - Trying different hyperparameters and checking whether they improve the result.
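
The model-selection advice above can be sketched in a few lines. This is only an illustrative, dependency-free example, not a prescribed workflow: the synthetic data, the helper names (`make_data`, `mean_baseline`, `make_knn`, `cv_mae`, `select_model`), the choice of mean absolute error as the metric and the 5% tolerance are all assumptions made for the sketch. It scores every candidate with the same cross-validated metric, then keeps the simplest candidate whose score is within the tolerance of the best one.

```python
import random
from statistics import mean


def make_data(n=120, seed=0):
    """Synthetic regression data, y = 2x + noise (purely illustrative)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def mean_baseline(train_x, train_y, x):
    """Simplest possible model: always predict the training mean."""
    return mean(train_y)


def make_knn(k):
    """Build a k-nearest-neighbours regressor; k is the hyperparameter."""
    def predict(train_x, train_y, x):
        nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def cv_mae(predict, xs, ys, folds=5):
    """Mean absolute error averaged over `folds` cross-validation folds."""
    fold_errors = []
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))  # every folds-th point
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        fold_errors.append(
            mean(abs(predict(train_x, train_y, xs[i]) - ys[i]) for i in test_idx)
        )
    return mean(fold_errors)


def select_model(candidates, xs, ys, tolerance=0.05):
    """Return the first (simplest) candidate within `tolerance` of the best MAE.

    `candidates` must be ordered from simplest to most complex.
    """
    scores = {name: cv_mae(predict, xs, ys) for name, predict in candidates}
    best = min(scores.values())
    for name, _ in candidates:
        if scores[name] <= best * (1 + tolerance):
            return name, scores


# Ordered simplest first. Treating a larger k as "simpler" (smoother
# predictions) is a convention assumed for this sketch.
CANDIDATES = [
    ("mean baseline", mean_baseline),
    ("k-NN, k=15", make_knn(15)),
    ("k-NN, k=5", make_knn(5)),
    ("k-NN, k=1", make_knn(1)),
]

xs, ys = make_data()
chosen, scores = select_model(CANDIDATES, xs, ys)
for name, score in scores.items():
    print(f"{name}: MAE = {score:.3f}")
print("Selected:", chosen)
```

In a real project you would more likely reach for a library such as scikit-learn (for example `cross_val_score` or `GridSearchCV`) rather than hand-rolled helpers. The point of the sketch is the habit: fix the metric first, compare every candidate against a trivial baseline, and only accept extra complexity when it buys a meaningful improvement.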