# Schedule

## Week 1

### Lecture 1: Introduction

Monday, Aug. 22: What is data science? Why is it important? Who are we? Course overview.

#### Recommended reading

- Cathy O’Neil and Rachel Schutt, Doing Data Science. (2014) Chapter 1.
- David Donoho, 50 years of Data Science. (2015).

### Lab 1: Introduction to Programming in Python

Wednesday, Aug. 24: Running a Python program, IPython, Jupyter notebook, variables and data types, operations, functions, scope.

### Lab 2: Introduction to Programming in Python II

Friday, Aug. 26: Data types and operators, conditions, lists, loops.

## Week 2

### Lecture 2: Introduction to Descriptive Statistics

Monday, August 29: Variable types, basic summary statistics and plotting, covariance and correlation, and confounders.

#### Mandatory reading

- Grus, Ch. 5

### Lab 3: Advanced Data Structures

Wednesday, August 31: Sets, dictionaries, pandas series, working with modules.

### Lab 4: Pandas DataFrames

Friday, September 2: Reading and writing data, pandas data frames, basic plotting.

#### Recommended reading

- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.

## Week 3

Monday: Labor Day

### Lecture 3: Hypothesis Testing and Statistical Inference

Wednesday, September 7: Bernoulli, Binomial, and Normal distributions, Central Limit Theorem, and introduction to hypothesis testing.

#### Mandatory reading

- Grus, Ch. 7

### Lab 5: Practical Visualization

Friday, September 9: Visualization in Python using ggplot.

Guest lecturer: Shirley Zhao

#### Recommended reading

## Week 4

### Lab 6: Temporal Data Analysis and Applications to Stock Analysis

Monday, September 12: Downloading, cleaning, analyzing, and visualizing stock data.

Guest lecturer: Curtis Miller

### Lecture 4: Hypothesis Testing and Statistical Inference, part 2

Wednesday, September 14: Hypothesis testing with applications to A/B testing.

#### Mandatory reading

### Lecture 5: Linear Regression 1

Friday, September 16: Introduction to ordinary linear regression.

#### Recommended reading

- ISLR, Ch. 3

## Week 5

### Lecture 6: Linear Regression 2

Monday, September 19: Multiple linear regression, statistical inference.

#### Recommended reading

- ISLR, Ch. 3

### Lab 8: Web Scraping

Wednesday, September 21: Scraping HTML websites with Beautiful Soup. Data cleanup with pandas.

#### Recommended reading

### Lab 9: Regression in Practice

Friday, September 23: Regression in practice, illustrated with a credit card dataset.

## Week 6

### Lecture 7: Classification I, Logistic Regression

Monday, September 26: Introduction to classification and the logistic regression method.

#### Recommended reading

- Grus, Ch. 16
- ISLR, Ch. 4

### Lecture 8: Classification II, K-Nearest Neighbors

Monday, September 26: Overfitting and the bias-variance tradeoff.

#### Recommended reading

### Lab 9: Collecting Data from Web APIs

Friday, September 30: Connecting to APIs such as Twitter and Reddit. JSON and REST.

## Week 7

### Lecture 9: Decision Trees

Monday, October 3

#### Recommended reading

- Visual Intro to Machine Learning
- Grus, Ch. 17

### Lecture 10: Support Vector Machines

Wednesday, October 5

### Lab 10: Classification Methods in Practice

Friday, October 7

## Week 8

Fall Break

## Week 9

### Lab 11: Version Control; Project Introduction

Monday, October 17: What version control is and how to use it. Introduction to git and GitHub. Introduction to the final project.

### Lecture 11: Clustering

Wednesday, October 19: Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.

#### Mandatory reading

- Grus, Ch. 19

### Lecture 12: Clustering

Friday, October 21: Clustering basics. Partitional and hierarchical clustering approaches. Two-part lecture.

#### Mandatory reading

- Grus, Ch. 19

## Week 10

### Lecture 13: Dimensionality Reduction

Monday, October 24: Principal Component Analysis. Multidimensional Scaling.

### Lab 12: Clustering Example

Wednesday, October 26: Clustering on the MNIST dataset.

#### Recommended reading

## Week 11

### Lecture 14: Introduction to Networks and Network Visualization

Monday, October 31: Network basics. Visualization methods for general graphs and trees.

### Lab 13: Network Analysis

Wednesday, November 2: Graph algorithms: path search, centrality, PageRank.

#### Mandatory reading

- Grus, Ch. 21

### Lecture 15: Elections

Friday, November 4: History; election methods; Arrow’s impossibility theorem; elections vs. rankings.

## Week 12

### Lecture 16: Rankings

Monday, November 7

### Lab 14: Project Peer Feedback

Wednesday, November 9: Pitch your project to another group, receive their feedback, and give feedback on theirs.

### Lab 15: Ranking in Practice

Friday, November 11

## Week 13

### Lab 16: Regular Expressions

Monday, November 14

### Lab 17: Practical NLP

Friday, November 18

## Week 14

### Lab 18: Databases

Wednesday, November 23: Working with relational databases in Python. Introduction to the Structured Query Language (SQL).

## Week 15

### Lecture 19: Weapons of Math Destruction: Discrimination by Algorithms

Monday, November 28

### Lab 19: Large Data Analysis

Wednesday, November 30: Parallel programming. MapReduce.

### Lecture 20: Recap, Wrap-up, Outlook

Friday, December 2: What did we learn? What else is out there? What can you learn next?

## Week 16

### Lecture 21: Project Presentations

Monday, December 5: Students present their final projects.

### Lecture 22: Project Presentations

Wednesday, December 7: Students present their final projects.