For your final assignment in this course, you will work on a data science project over several weeks. The goal of the project is to go through the complete data science process to answer questions about the dataset you choose. You will acquire the data, reshape and clean it, explore it, analyze it, visualize your results, and present them to a reader.
You will work closely with other classmates in a 2-3 person project team. You may form your own teams and use our discussion forum to find prospective team members. If you can’t find a partner, we will assign you to a team randomly. We recognize that individual schedules and other constraints might limit your ability to work in a team, and we will allow you to work alone for justifiable reasons. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member based on peer assessments (see below).
There are several steps you must complete for your final project. Note carefully that no extensions will be given for any of these deadlines, for any reason, and late days may not be used. See the schedule for due dates. Projects submitted after the final due date will not be graded. The steps are:
- Project proposals (treated as a homework)
- Milestone 1, a functional project prototype
- Project review with the staff
- Final project submission (including screen-cast) & peer evaluations
- Best project presentations and prizes
You start your project by forming your group and letting us know what topic you are interested in exploring by submitting a project data form. Please submit only one form per team! In addition to the form, you will create a proposal document as a Jupyter Notebook, addressing the following points. Use these points as headings in your document.
- Basic Info. The project title, your names, e-mail addresses, UIDs.
- Background and Motivation. Discuss your motivations and reasons for choosing this project, especially any background or research interests that may have influenced your decision.
- Project Objectives. Provide the primary questions you are trying to answer in your project. What would you like to learn and accomplish? List the benefits.
- Data. From where and how are you collecting your data? If appropriate, provide a link to your data sources.
- Ethical Considerations. Complete a stakeholder analysis for your project.
- Data Processing. Do you expect to do substantial data cleanup or data extraction? What quantities do you plan to derive from your data? How will data processing be implemented?
- Exploratory Analysis. Which methods and visualizations are you planning to use to look at your dataset?
- Analysis Methodology. How are you planning to analyze your data?
- Project Schedule. Make sure that you plan your work so that you can avoid a big rush right before the final project deadline, and delegate different modules and responsibilities among your team members. Write this in terms of weekly deadlines.
As a ballpark figure, your proposal should be about 2-3 pages of text and figures. You can also include some preliminary data acquisition and analysis steps.
Based on your proposal, we will assign a staff member to your team who will guide you through the rest of the project. You will schedule a project review meeting with that staff member. Make sure all of your team members are present at the meeting.
Realizing your Project
When you start working on your project, you refine your proposal by filling in its conceptual sections with code. For example, in the data section you add the code that acquires and loads your data; in the processing section, you add the code that does the cleanup; and so on. Your project milestone and your final submission should each be a well-narrated and documented Jupyter Notebook. You may add supplementary notebooks that, for example, document analysis paths that you chose to abandon, or you may add links to interactive web-based visualizations. Your main notebook, however, should contain everything we need to redo and retrace your analysis. As appropriate, transfer content from your proposal document to the notebook.
Make sure to also include your dataset in your submission or, if appropriate, a link to the dataset.
For your milestone, we expect you to have acquired, cleaned, and explored your dataset. You should also explain in more detail what will go into your final analysis, and explain any deviations from your initial project plan.
If you are uncertain about the scope, please contact the staff member responsible for your project.
Final Project Submission
For your final project you must complete the analysis in your notebook and present your results in a compelling way.
You must also include a three-minute screencast with narration that walks us through your project, showing a demo of your project and/or some slides. You can use any screencast tool of your choice. Please make sure that the sound quality of your video is good. Upload the video to an online video platform such as YouTube or Vimeo and link to it from your notebook.
We will strictly enforce the three-minute time limit for the video, so make sure your screencast does not run longer. Use principles of good storytelling and presentation to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.
As part of your final submission, each team member will also complete a peer evaluation. It is important to give positive feedback to people who truly worked hard for the good of the team, and also to offer suggestions to those you perceived as not working as effectively on team tasks. We ask you to provide an honest assessment of the contributions of the members of your team, including yourself. The feedback you provide should reflect your judgment of each team member’s:
- Preparation – were they prepared during team meetings?
- Contribution – did they contribute productively to the team discussion and work?
- Respect for others’ ideas – did they encourage others to contribute their ideas?
- Flexibility – were they flexible when disagreements occurred?
Your teammates’ assessments of your contributions and the accuracy of your self-assessment will be considered as part of your overall project score.
Submission will be handled through Canvas.
Each team will be given a brief slot (~5 minutes) to present their project in one of the last two lectures. Present your analysis questions and your main contributions, but also explain your methods and justify your choices. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away?
Your project will be evaluated on the following criteria:
- Project Scope - Did you choose the appropriate complexity and level of difficulty for your project?
- Process Book - Did you follow the data science process and is it well documented in your notebook?
- Solution - Is your analysis effective and correct in answering your intended questions?
- Implementation - What is the quality of your code? Is it appropriately polished, robust, and reliable?
- Presentation - Is your notebook well narrated? Do you use the appropriate visualizations to communicate your data? Is your screencast clear, engaging, and effective?
- Ethical Considerations - Are ethical considerations discussed in your project?
- Peer Evaluations - Your individual project score will also be influenced by your peer evaluations.
- Project Presentation - Did you present your project well in class?