Resources

Project Data Form

Introduction

For your final assignment in this course you will work on a data science project over several weeks. The goal of the project is to go through the complete data science process to answer questions about the dataset you choose. You will acquire the data, reshape and clean it, explore it, analyze it, visualize your results, and present them to a reader.

Project Team

You will work closely with other classmates in a three-person project team. You can form your own teams and use our discussion forum to find prospective team members. If you can’t find teammates, we will team you up randomly. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member based on peer assessments (see below).

Project Steps

There are a few steps you have to complete for your final project. Note that no extensions will be given for any of these dates; for due dates, see Canvas. Late days may not be used, and projects submitted after the final due date will not be graded. The steps are:

  • Project proposals
  • Project proposal peer review
  • Project Milestone: a functional project prototype
  • Project review with the staff
  • Final project submission (including screen-cast)
  • Peer evaluations

Proposal

You start your project by forming your group and letting us know what topic you are interested in exploring by submitting a project data form. Please submit only one form per team! In addition to the form, you will create a proposal document in the form of a Jupyter Notebook, addressing the following points. Use these points as headers in your document.

  • Basic Info. The project title, your names, e-mail addresses, UIDs.
  • Background and Motivation. Discuss your motivations and reasons for choosing this project, especially any background or research interests that may have influenced your decision.
  • Project Objectives. Provide the primary questions you are trying to answer in your project. What would you like to learn and accomplish? List the benefits.
    • This should include both questions about the data and any learning objectives you would like to fulfill. In other words, there are two kinds of benefits to address.
  • Data Description and Acquisition. What format is your data in? How many items are there? What attributes do those items have? Are there special structures in it (e.g., networks, geographical)? From where and how are you collecting your data? If appropriate, provide a link to your data sources.
    • This part should be specific enough that the instructional staff is assured you have or will be able to obtain data.
      • If it’s online through direct download, link to the specific page from which you will download it.
      • If you will scrape it from the web, link to the page from which you will scrape it and a statement regarding how you have confirmed you are permitted to scrape it.
      • If you will use an API to access it, link to the documentation of the API and explain how you have access to that API.
        • If it requires an account, state you have one.
        • If it doesn’t require an account, state that it does not require one.
      • If it is data you have access to through other means, describe in detail what the data is, how you have access to it, and why you have permission to use it.
  • Ethical Considerations. Complete a stakeholder analysis for your project.
    • Who may be affected by your project and its outcomes? How could your project be used for harm?
      • “There are no ethical considerations” must be strongly defended. No one has successfully done this before in this class.
  • Data Cleaning and Processing. Do you expect to do substantial data cleanup or data extraction? What quantities do you plan to derive from your data? How will data processing be implemented?
  • Exploratory Analysis. Which methods and visualizations are you planning to use to look at your dataset?
  • Analysis Methodology. How are you planning to analyze your data?
    • What specific questions do you hope to answer?
    • What methods (from class or otherwise) do you think you will use?
  • Project Schedule. Make sure that you plan your work so that you can avoid a big rush right before the final project deadline, and delegate different modules and responsibilities among your team members. Write this in terms of weekly deadlines.

As a ballpark number: your proposal should be about 2-3 pages of text and figures. You can also include some preliminary data acquisition / analysis steps.
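As an illustration of a preliminary data acquisition step, the following minimal sketch answers the basic Data Description questions (how many items, which attributes, how many missing values) for a CSV file. The sample data, column names, and the `summarize_csv` helper are hypothetical placeholders, and only the Python standard library is assumed; your own project may well use other tools.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real dataset; replace with your own data.
SAMPLE_CSV = """station,date,temp_c
A,2024-01-01,3.5
A,2024-01-02,
B,2024-01-01,1.0
"""


def summarize_csv(text):
    """Return the row count, column names, and missing-value count per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in columns:
            value = row.get(col)
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                missing[col] += 1
    return {
        "rows": n_rows,
        "columns": columns,
        "missing": {col: missing[col] for col in columns},
    }


summary = summarize_csv(SAMPLE_CSV)
print(summary)
```

A short summary like this, placed in the Data Description section, makes it easy for a reader to confirm that you have the data in hand.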

Based on your proposals we will assign a staff member to your team who will guide you through the rest of the project. You will schedule a project review meeting with a staff member. Make sure all of your team members are present at the meeting.

Realizing your Project

When you start working on your project, you refine your proposal by filling in the conceptual sections with code. For example, in the data section you add the code that acquires and loads your data, in the processing section you add the code that does the cleanup, and so on. Your project milestone and your final submission should be a well-narrated and documented Jupyter Notebook. You may add supplementary notebooks that, e.g., document an analysis path that you chose to abandon, or you may add links to interactive web-based visualizations. Your main notebook, however, should contain everything we need to redo and retrace your analysis. As appropriate, transfer content from your proposal document to the notebook.

Make sure to also include your dataset in your submission or, if appropriate, link to the dataset.

Project Milestone

For your milestone, we expect you to have acquired, cleaned, and explored your dataset. You should also explain in more detail what will go into your final analysis. Explain deviations from your initial project plan. In other words, we expect an elaborated data description, acquisition, cleaning, exploratory analysis, and an updated project schedule that discusses changes in plans from your project proposal. The acquisition, cleaning, and exploratory analysis should include the code to accomplish these tasks. Please revise any other sections as necessary.
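The kind of cleaning and exploration code expected in the notebook might look like the following minimal sketch. The records, field names, and the `clean` helper are hypothetical, and the cleaning rules shown (trim text, parse numbers, drop unparseable rows and exact duplicates) are only one possible choice; your dataset will dictate its own.

```python
import statistics

# Hypothetical raw records; the field names are illustrative only.
raw_records = [
    {"station": " Alpha ", "temp_c": "3.5"},
    {"station": "Beta", "temp_c": "n/a"},     # unparseable measurement
    {"station": "Beta", "temp_c": "1.5"},
    {"station": " Alpha ", "temp_c": "3.5"},  # exact duplicate of the first row
]


def clean(records):
    """Trim text, parse numbers, and drop unparseable rows and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        station = rec["station"].strip()
        try:
            temp = float(rec["temp_c"])
        except ValueError:
            continue  # skip rows whose temperature cannot be parsed
        key = (station, temp)
        if key in seen:
            continue  # skip rows already kept
        seen.add(key)
        cleaned.append({"station": station, "temp_c": temp})
    return cleaned


cleaned = clean(raw_records)
mean_temp = statistics.mean(r["temp_c"] for r in cleaned)
print(len(cleaned), mean_temp)
```

Narrating each rule in the surrounding notebook text (why rows are dropped, how many were affected) helps a reader retrace the analysis.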

If you are uncertain about the scope, please contact the staff member responsible for your project.

The milestone should be submitted as a zip file containing a Jupyter notebook and any supporting documents. The Jupyter notebook should contain all the narrative and code. Do not submit the write-up as a document separate from the Jupyter notebook. Note that this zip file should also include your in-class feedback as a separate file with the name feedback_exercise.

Like with the assignments, submit the Jupyter notebook with the output. Make a large note at the top if you are not able to include your data due to size.
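One way to assemble the zip file is sketched below with Python's standard `zipfile` module. The file names `project.ipynb`, `feedback_exercise.txt`, and `milestone.zip`, and the helpers `build_submission` and `list_submission`, are assumptions made for illustration (the required file extension for the feedback file is not specified here). The demo creates placeholder files in a temporary directory so that it runs anywhere.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission(zip_path, files):
    """Write the given files into a zip archive, each stored under its bare file name."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for f in files:
            f = Path(f)
            if not f.is_file():
                raise FileNotFoundError(f"Missing file: {f}")
            archive.write(f, arcname=f.name)
    return zip_path


def list_submission(zip_path):
    """Return the sorted names of the files stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Demo with placeholder files; replace these with your real notebook and documents.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    notebook = tmp / "project.ipynb"           # hypothetical notebook name
    feedback = tmp / "feedback_exercise.txt"   # extension is an assumption
    notebook.write_text("{}")
    feedback.write_text("placeholder")
    archive_path = build_submission(tmp / "milestone.zip", [notebook, feedback])
    contents = list_submission(archive_path)

print(contents)
```

Listing the archive contents before uploading is a quick check that the notebook and the feedback file are both included.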

Final Project Submission

For your final project you must complete the analysis in your notebook and present your results in a compelling way. We recommend you include revised versions of all the sections from your proposal (except for the Project Schedule) along with the results of your analysis, the limitations of your analysis, and your conclusions from your data analysis.

As with the milestone, submit a zip file containing a Jupyter notebook and any supporting documents. The Jupyter notebook should contain all the narrative and code. Do not submit the write-up as a document separate from the Jupyter notebook.

Like with the assignments, submit the Jupyter notebook with the output. Make a large note at the top if you are not able to include your data due to size.

Project Screen-Cast

You must include a three-minute video, with audio, that walks us through your project. Each team will create one screencast with narration showing a demo of the project and/or some slides. You can use any screencast tool of your choice. Please make sure that the sound quality of your video is good. Upload the video to an online video platform such as YouTube or Vimeo and link to it from your notebook.

Present your analysis questions and your main contributions, but also explain your methods and justify your choices.

We will strictly enforce the three-minute time limit for the video, so please make sure you are not running longer. Use principles of good storytelling and presentations to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.

Peer Assessment

It is important to provide positive feedback to people who truly worked hard for the good of the team and to also make suggestions to those you perceived not to be working as effectively on team tasks. We ask you to provide an honest assessment of the contributions of the members of your team, including yourself. The feedback you provide should reflect your judgment of each team member’s:

  • Preparation – were they prepared during team meetings?
  • Contribution – did they contribute productively to the team discussion and work?
  • Respect for others’ ideas – did they encourage others to contribute their ideas?
  • Flexibility – were they flexible when disagreements occurred?
  • Timeliness – could you count on them to get things done with enough time so as to not block the work of others? Did they respond to messages and report their status so teammates knew whether or not things would get done?

Your teammates’ assessments of your contributions and the accuracy of your self-assessment will be considered as part of your overall project score. While we expect most team members to earn similar grades, we reserve the right to not assign full team credit to team members who were not respectful of their teammates, including their teammates’ time.

Submission Instructions

Submission will be handled through Canvas.

Grading Criteria

  • Project Scope - Did you choose the appropriate complexity and level of difficulty of your project?
  • Process Book - Did you follow the data science process and is it well documented in your notebook?
  • Solution - Is your analysis effective and correct in answering your intended questions?
  • Implementation - What is the quality of your code? Is it appropriately polished, robust, and reliable?
  • Presentation - Is your notebook well narrated? Do you use the appropriate visualizations to communicate your data? Is your screencast clear, engaging, and effective?
  • Ethics - Are ethical considerations discussed?
  • Peer Evaluations - Your individual project score will also be influenced by your peer evaluations.
  • Project Presentation - Did you present your project well in your video?