# Schedule

## Week 1

### Lecture 1: Introduction

Tuesday, Jan. 8

What is data science? Why is it important? Who are we? Course overview and syllabus.

#### Recommended reading

- Cathy O’Neil and Rachel Schutt, Doing Data Science (2014), Ch. 1
- David Donoho, 50 Years of Data Science (2015)

### Lecture 2: Introduction to Programming in Python, Version Control

Thursday, Jan. 10

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

## Week 2

### Lecture 3: Introduction to Programming in Python II

Tuesday, Jan. 15

Data types and operators, conditions, lists, loops.

### Lecture 4: Introduction to Descriptive Statistics

Thursday, Jan. 17

Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.

## Week 3

### Lecture 5: Advanced Data Structures

Tuesday, Jan. 22

Sets, dictionaries, pandas series, working with modules.

### Lecture 6: Pandas DataFrames

Thursday, Jan. 24

Reading and writing data, pandas data frames, basic plotting.

#### Recommended reading

- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization

## Week 4

### Lecture 7: Hypothesis Testing and Statistical Inference

Tuesday, Jan. 29

Introduction to hypothesis testing, Central Limit Theorem, A/B testing.

#### Mandatory reading

### Lecture 8: Temporal Data Analysis and Applications to Stock Analysis

Thursday, Jan. 31

Downloading, cleaning, analyzing, and visualizing stock data.

Guest lecturer: Curtis Miller

## Week 5

### Lecture 9: Linear Regression 1

Tuesday, Feb. 5

Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints.

#### Recommended reading

- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015), Ch. 3

### Lecture 10: Data Visualization

Thursday, Feb. 7

Principles of data visualization; visualization in Python.

## Week 6

### Lecture 11: Linear Regression 2

Tuesday, Feb. 12

Model generalizability, cross validation, and using categorical variables in regression.

#### Recommended reading

- ISL, Ch. 3

### Lecture 12: Web Scraping; Collecting Data from Web APIs

Thursday, Feb. 14

Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to web APIs such as Twitter and Reddit. JSON, REST.

#### Recommended reading

## Week 7

### Lecture 13: Classification I: Logistic Regression, K-Nearest Neighbors

Tuesday, Feb. 19

Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation.

#### Recommended reading

- ISL, Ch. 4
- J. Grus, Data Science from Scratch, Ch. 12, 16

### Lecture 14: Classification II: Decision Trees

Thursday, Feb. 21

Decision trees, generalizability and cross validation, course projects.

#### Recommended reading

- ISL, Ch. 8
- Visual Intro to Machine Learning
- Grus, Ch. 17

## Week 8

### Lecture 15: Classification III: Support Vector Machines and Classification Competition

Tuesday, Feb. 26

Support Vector Machines (SVM), feature selection, competition of classification methods.

#### Recommended reading

- ISL, Ch. 9
- A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (2017), Ch. 5

## Week 9

### Lecture 17: Regular Expressions, NLP in practice

Tuesday, Mar. 5

### Lecture 18: Clustering I

Thursday, Mar. 7

Introduction to clustering, supervised vs. unsupervised learning, k-means method.

#### Recommended reading

- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering

## Week 10

Spring Break

## Week 11

### Lecture 19: Clustering II

Tuesday, Mar. 19

Hierarchical clustering, dendrogram plots, clustering in practice.

#### Recommended reading

- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering

### Lecture 20: Project Peer Feedback

Thursday, Mar. 21

Pitch your project to another group and receive feedback; give feedback to another group.

## Week 12

### Lecture 21: Dimensionality Reduction

Tuesday, Mar. 26

Principal Component Analysis (PCA), using PCA for visualization.

#### Recommended reading

- ISL, Ch. 10.2
- V. Powell, Principal Component Analysis: Explained Visually

### Lecture 22: Neural Networks, Deep Learning, TensorFlow

Thursday, Mar. 28

#### Recommended reading

- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow

## Week 13

### Lecture 23: Network Analysis

Tuesday, Apr. 2

Basics of networks. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.

#### Mandatory reading

- Grus, Ch. 21

### Lecture 24: Databases

Thursday, Apr. 4

Working with relational databases in Python. Introduction to the Structured Query Language (SQL).

## Week 14

### Lecture 25: Neural Networks, Deep Learning, TensorFlow

Tuesday, Apr. 9

#### Recommended reading

- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow

### Lecture 26: Large Data Analysis

Thursday, Apr. 11

Parallel programming. MapReduce.

## Week 15

### Lecture 27: Weapons of Math Destruction: Discrimination by Algorithms

Tuesday, Apr. 16

#### Recommended reading

### Lecture 28: Ratings, Rankings, and Elections

Thursday, Apr. 18

Rating and ranking in sports, election methods.

#### Recommended reading

- C. Borgers, Mathematics of Social Choice, SIAM (2010)

## Week 16

### Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook

Tuesday, Apr. 23

What did we learn? What else is out there? What can you learn next?