Schedule
Week 1
Lecture 1: Introduction
Monday, August 22
What is data science? Why is it important? Who are we? Course overview.
Recommended reading
- Cathy O’Neil and Rachel Schutt, Doing Data Science (2014), Chapter 1.
- David Donoho, 50 Years of Data Science (2015).
Lab 1: Introduction to Programming in Python
Wednesday, August 24
Running a Python program, IPython, Jupyter Notebook, variables and data types, operations, functions, scope.
Lab 2: Introduction to Programming in Python II
Friday, August 26
Data types and operators, conditions, lists, loops.
Week 2
Lecture 2: Introduction to Descriptive Statistics
Monday, August 29
Variable types, basic summary statistics and plotting, covariance and correlation, and confounders.
Mandatory reading
- Grus, Ch. 5
Lab 3: Advanced Data Structures
Wednesday, August 31
Sets, dictionaries, pandas Series, working with modules.
Lab 4: Pandas DataFrames
Friday, September 2
Reading and writing data, pandas DataFrames, basic plotting.
Recommended reading
- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.
Week 3
Monday: Labor Day
Lecture 3: Hypothesis Testing and Statistical Inference
Wednesday, September 7
Bernoulli, binomial, and normal distributions, the Central Limit Theorem, and an introduction to hypothesis testing.
Mandatory reading
- Grus, Ch. 7
Lab 5: Practical Visualization
Friday, September 9
Visualization in Python using ggplot
Guest lecturer: Shirley Zhao
Recommended reading
Week 4
Lab 6: Temporal Data Analysis and Applications to Stock Analysis
Monday, September 12
Downloading, cleaning, analyzing, and visualizing stock data
Guest lecturer: Curtis Miller
Lecture 4: Hypothesis Testing and Statistical Inference, part 2
Wednesday, September 14
Hypothesis testing with applications to A/B testing
Mandatory reading
Lecture 5: Linear Regression 1
Friday, September 16
Introduction to ordinary linear regression
Recommended reading
- ISLR, Ch. 3
Week 5
Lecture 6: Linear Regression 2
Monday, September 19
Multiple linear regression, statistical inference
Recommended reading
- ISLR, Ch. 3
Lab 8: Web Scraping
Wednesday, September 21
Scraping HTML websites with Beautiful Soup. Data cleanup with pandas.
Recommended reading
Lab 9: Regression in Practice
Friday, September 23
Regression applied to a credit card dataset
Week 6
Lecture 7: Classification I, Logistic Regression
Monday, September 26
Introduction to classification and the logistic regression method
Recommended reading
- Grus, Ch. 16
- ISLR, Ch. 4
Lecture 8: Classification II, K-Nearest Neighbors
Wednesday, September 28
Overfitting, the bias-variance tradeoff
Recommended reading
Lab 9: Collecting Data from Web APIs
Friday, September 30
Connecting to APIs such as those of Twitter and Reddit. JSON, REST.
Week 7
Lecture 9: Decision Trees
Monday, October 3
Recommended reading
- A Visual Introduction to Machine Learning
- Grus, Ch. 17
Lecture 10: Support Vector Machines
Wednesday, October 5
Lab 10: Classification Methods in Practice
Friday, October 7
Week 8
Fall Break
Week 9
Lab 11: Version Control; Project Introduction
Monday, October 17
What version control is and how to use it. Introduction to git and GitHub. Introduction to the final project.
Lecture 11: Clustering
Wednesday, October 19
Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.
Mandatory reading
- Grus, Ch. 19
Lecture 12: Clustering
Friday, October 21
Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.
Mandatory reading
- Grus, Ch. 19
Week 10
Lecture 13: Dimensionality Reduction
Monday, October 24
Principal component analysis, multidimensional scaling
Lab 12: Clustering Example
Wednesday, October 26
Clustering based on the MNIST dataset
Recommended reading
Week 11
Lecture 14: Introduction to Networks and Network Visualization
Monday, October 31
Network basics. Visualization methods for general graphs and trees.
Lab 13: Network Analysis
Wednesday, November 2
Graph algorithms: path search, centrality, PageRank.
Mandatory reading
- Grus, Ch. 21
Lecture 15: Elections
Friday, November 4
History, election methods, Arrow’s impossibility theorem, elections vs. rankings.
Week 12
Lecture 16: Rankings
Monday, November 7
Lab 14: Project Peer Feedback
Wednesday, November 9
Pitch your project to another group and receive their feedback; give feedback on another group's project.
Lab 15: Ranking in Practice
Friday, November 11
Week 13
Lab 16: Regular Expressions
Monday, November 14
Lab 17: Practical NLP
Friday, November 18
Week 14
Lab 18: Databases
Wednesday, November 23
Working with relational databases in Python. Introduction to Structured Query Language (SQL).
Week 15
Lecture 19: Weapons of Math Destruction: Discrimination by Algorithms
Monday, November 28
Lab 19: Large Data Analysis
Wednesday, November 30
Parallel programming, MapReduce.
Lecture 20: Recap, Wrap-up, Outlook
Friday, December 2
What did we learn, what else is out there, what can you learn next?
Week 16
Lecture 21: Project Presentations
Monday, December 5
Students present their final projects.
Lecture 22: Project Presentations
Wednesday, December 7
Students present their final projects.