This schedule is subject to change.

Week 1

Lecture 1: Introduction

Monday, August 22

What is data science? Why is it important? Who are we? Course overview.

Recommended reading

Lab 1: Introduction to Programming in Python

Wednesday, August 24

Running a Python program, IPython, Jupyter notebook, variables and data types, operations, functions, scope.

Lab 2: Introduction to Programming in Python II

Friday, August 26

Data types and operators, conditions, lists, loops.

Homework 0 (Introduction) due. Friday, August 26, 11:59pm

Week 2

Lecture 2: Introduction to Descriptive Statistics

Monday, August 29

Variable types, basic summary statistics and plotting, covariance and correlation, and confounders.

Mandatory reading

  • Grus, Ch. 5

Lab 3: Advanced Data Structures

Wednesday, August 31

Sets, dictionaries, pandas series, working with modules.

Lab 4: Pandas DataFrames

Friday, September 2

Reading and writing data, pandas data frames, basic plotting.

Recommended reading

  • Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

Homework 1 due. Friday, September 2, 11:59pm

Week 3

Monday: Labor Day

Lecture 3: Hypothesis Testing and Statistical Inference

Wednesday, September 7

Bernoulli, binomial, and normal distributions, the Central Limit Theorem, and an introduction to hypothesis testing.

Mandatory reading

  • Grus, Ch. 7

Lab 5: Practical Visualization

Friday, September 9

Visualization in Python using ggplot

Guest lecturer: Shirley Zhao

Recommended reading

Homework 2 due. Friday, September 9, 11:59pm

Week 4

Lab 6: Temporal Data Analysis and Applications to Stock Analysis

Monday, September 12

Downloading, cleaning, analyzing, and visualizing stock data

Guest lecturer: Curtis Miller

Lecture 4: Hypothesis Testing and Statistical Inference, part 2

Wednesday, September 14

Hypothesis Testing with applications to A/B Testing

Mandatory reading

Lecture 5: Linear Regression 1

Friday, September 16

Introduction to ordinary linear regression

Recommended reading

  • ISLR, Ch. 3

Week 5

Lecture 6: Linear Regression 2

Monday, September 19

Multiple linear regression, statistical inference

Recommended reading

  • ISLR, Ch. 3

Lab 7: Web Scraping

Wednesday, September 21

Scrape HTML websites with Beautiful Soup. Data cleanup with pandas.

Recommended reading

Lab 8: Regression in Practice

Friday, September 23

Regression in practice, using a credit card dataset as an example

Homework 3 due. Friday, September 23, 11:59pm

Week 6

Lecture 7: Classification I, Logistic Regression

Monday, September 26

Introduction to classification and the logistic regression method

Recommended reading

  • Grus, Ch. 16
  • ISLR, Ch. 4

Lecture 8: Classification II, K-Nearest Neighbors

Wednesday, September 28

Overfitting and the bias-variance tradeoff

Lab 9: Collecting Data from Web APIs

Friday, September 30

Connecting to web APIs such as Twitter and Reddit. JSON and REST.

Week 7

Lecture 9: Decision Trees

Monday, October 3

Recommended reading

Lecture 10: Support Vector Machines

Wednesday, October 5

Lab 10: Classification Methods in Practice

Friday, October 7

Homework 4 due. Friday, October 7, 11:59pm

Week 8

Fall Break

Week 9

Lab 11: Version Control; Project Introduction

Monday, October 17

What version control is and how to use it. Introduction to git and GitHub. Introduction to the final project.

Lecture 11: Clustering

Wednesday, October 19

Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.

Mandatory reading

  • Grus, Ch. 19

Lecture 12: Clustering

Friday, October 21

Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.

Mandatory reading

  • Grus, Ch. 19

Announcement of Project Team and Topic due. Friday, October 21, 11:59pm

Week 10

Lecture 13: Dimensionality Reduction

Monday, October 24

Principal component analysis. Multidimensional scaling.

Lab 12: Clustering Example

Wednesday, October 26

Clustering based on the MNIST dataset

Recommended reading

Friday, October 28: Lecture cancelled; staff are traveling.

Homework 5 and Project Proposal due. Friday, October 28, 11:59pm

Week 11

Lecture 14: Introduction to Networks and Network Visualization

Monday, October 31

Network basics. Visualization methods for general graphs and trees.

Lab 13: Network Analysis

Wednesday, November 2

Graph algorithms: path search, centrality, PageRank.

Mandatory reading

  • Grus, Ch. 21

Lecture 15: Elections

Friday, November 4

History; election methods; Arrow’s impossibility theorem; elections vs. rankings.

Week 12

Lecture 16: Rankings

Monday, November 7

Lab 14: Project Peer Feedback

Wednesday, November 9

Pitch your project to another group and receive feedback, then give feedback on another group's pitch.

Lab 15: Ranking in Practice

Friday, November 11

Homework 6 due. Friday, November 11, 11:59pm

Week 13

Lab 16: Regular Expressions

Monday, November 14

Lecture 17: Natural Language Processing

Wednesday, November 16

Guest lecturer: Vivek Srikumar

Lab 17: Practical NLP

Friday, November 18

Project Milestone due. Friday, November 18, 11:59pm

Week 14

Lecture 18: Data Science in the Health Sciences

Monday, November 21

Guest lecturer: Brian E. Chapman

Lab 18: Databases

Wednesday, November 23

Working with relational databases in Python. Introduction to the Structured Query Language.

Friday: Thanksgiving Break

Week 15

Lecture 19: Weapons of Math Destruction: Discrimination by Algorithms

Monday, November 28

Guest lecturer: Suresh Venkatasubramanian

Lab 19: Large Data Analysis

Wednesday, November 30

Parallel programming. MapReduce.

Lecture 20: Recap, Wrap-up, Outlook

Friday, December 2

What did we learn? What else is out there? What can you learn next?

Final Project due. Sunday, December 4, 11:59pm

Week 16

Lecture 21: Project Presentations

Monday, December 5

Students present their final projects.

Lecture 22: Project Presentations

Wednesday, December 7

Students present their final projects.