Schedule
Homework
Homework 0 - Description | Submission
Homework 1 - Description | Submission
Homework 2 - Description | Submission
Homework 3 - Description | Submission
Homework 4 - Description | Submission
Homework 5 - Description | Submission
Homework 6 - Description | Submission
Project Proposal - Description | Submission
Homework 7 - Description | Submission
Project Milestone Report - Description | Submission
Homework 8 - Description | Submission
Final Project Report - Description | Submission
Other Project Phases
Announce your team and title
Peer feedback (in-class)
Written feedback from staff
Staff feedback by appointment
Project Awards (in-class)
Lectures
Lecture 1: Introduction
What is data science? Why is it important? Who are we? Course overview and syllabus.
Recommended reading
- Cathy O’Neil and Rachel Schutt, Doing Data Science. (2014) Chapter 1.
- David Donoho, 50 Years of Data Science. (2015).
Lecture 2: Introduction to Programming in Python, Version Control
Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.
Lecture 3: Introduction to Programming in Python II
Data types and operators, conditions, lists, loops.
Lecture 4: Introduction to Descriptive Statistics
Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.
Lecture 5: Advanced Data Structures
Sets, dictionaries, pandas series, working with modules.
Lecture 6: Pandas DataFrames
Reading and writing data from files, pandas data frames, basic plotting.
Recommended reading
- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.
Lecture 7: Hypothesis Testing and Statistical Inference
Introduction to Hypothesis Testing, Central Limit Theorem, A/B testing.
Mandatory reading
Lecture 8: Ethics
What are the social impacts of computing technology in areas such as personal privacy, intellectual property, interface usability, accessibility, and reliability? In which scenarios can, and has, the pervasive use of automated systems disproportionately and negatively affected some groups? What solutions can mitigate these effects?
Recommended reading
Lecture 9: Temporal Data Analysis and Applications to Stock Analysis
Downloading, cleaning, analyzing, and visualizing stock data.
Recommended reading
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015) Ch. 3
Lecture 10: Linear Regression 2
Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints
Recommended reading
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015) Ch. 4
Lecture 11: Linear Regression 3
Model generalizability, cross validation, and using categorical variables in regression
Recommended reading
- ISL, Ch. 3
Lecture 12: Data Visualization 1
Data visualization in Python with Matplotlib, Seaborn, and Altair.
Lecture 13: Data Visualization 2
Principles of Data Visualization
Lecture 14: Web Scraping and APIs
Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to APIs such as Twitter and Reddit. JSON, REST.
Recommended reading
Lecture 15: Project, Extra Lecture Material
Going over project expectations, catching up on material (if needed), and discussing debugging
Lecture 16: Classification I: K-Nearest Neighbors, Decision Trees
Introduction to classification, k-nearest neighbors, generalizability, bias-variance, cross validation, discussion of course projects
Recommended reading
Lecture 17: No Class
Spring Break
Lecture 18: No Class
Spring Break
Lecture 19: Classification II: Logistic Regression and SVMs
Logistic Regression, Support Vector Machines (SVM), generalizability and cross validation
Recommended reading
- ISL, Ch. 8 and 9
Lecture 20: Regular Expressions, NLP in Practice
NLP in Python with NLTK. Parsing strings with regular expressions.
Lecture 21: Clustering I
Introduction to Clustering, supervised vs. unsupervised learning, k-means method
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 22: Natural Language Processing
Guest Lecture by Professor Vivek Srikumar
Lecture 23: Project Peer Feedback
Give and receive feedback on your project proposal from a peer group.
Lecture 24: Clustering II
Hierarchical clustering, dendrogram plots, clustering in practice
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 25: Dimensionality Reduction
Principal Component Analysis (PCA), using PCA for visualization
Lecture 26: Neural Networks, Deep Learning, TensorFlow
Classification and regression with neural networks. Network architectures. Using TensorFlow.
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Lecture 27: Neural Networks, Deep Learning, TensorFlow
Classification and regression with neural networks. Network architectures. Using TensorFlow.
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Lecture 28: Databases
Working with relational databases in Python. Introduction to the Structured Query Language.
Lecture 29: No Class
No class, to give more time for project feedback meetings.
Lecture 30: Network Analysis
Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
Lecture 31: Best Project Presentations, Recap, Wrap-up, Outlook
What did we learn, what else is out there, what can you learn next?