# Schedule

### Lecture 1: Introduction

What is data science? Why is it important? Who are we? Course overview and syllabus.

#### Recommended reading

#### Lecture Video

### Lecture 2: Introduction to Programming in Python, Version Control

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

#### Lecture Video

### Lecture 3: Introduction to Programming in Python II

Data types and operators, conditions, lists, loops.

#### Lecture Video

### Lecture 4: Introduction to Descriptive Statistics

Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.
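
As a rough illustration of the covariance and correlation topics, here is a minimal sketch that uses only the Python standard library. The function names and the sample data (hours studied and exam scores) are made up for this example and do not come from the course materials.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

print("covariance: ", covariance(hours, scores))
print("correlation:", correlation(hours, scores))
```

A correlation close to 1 or -1 indicates a strong linear relationship; it says nothing on its own about causation, which is where confounders come in.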

#### Lecture Video

### Lecture 5: Advanced Data Structures

Sets, dictionaries, pandas series, working with modules.

#### Lecture Video

### Lecture 6: Pandas DataFrames

Reading and writing data from files, pandas data frames, basic plotting.

#### Recommended reading

- Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

#### Lecture Video

### Lecture 7: Hypothesis Testing and Statistical Inference

Introduction to Hypothesis Testing, Central Limit Theorem, A/B testing.
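
One common way to analyze an A/B test is a two-proportion z-test, which relies on the normal approximation that the Central Limit Theorem justifies for large samples. The sketch below is an assumption about how such a test could be coded with the standard library; the function name and the conversion counts are hypothetical, and the lecture may use a different test.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Relies on the normal approximation, so it is
    only reasonable when both samples are large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, expressed with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if both variants truly converted at the same rate.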

#### Mandatory reading

#### Lecture Video

### Lecture 8: Temporal Data Analysis and Applications to Stock Analysis

Downloading, cleaning, analyzing, and visualizing stock data.

#### Lecture Video

### Lecture 9: Linear Regression 1

Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints

#### Recommended reading

- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015), Ch. 3

#### Lecture Video

### Lecture 10: Linear Regression 2

Model generalizability, cross validation, and using categorical variables in regression
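
To make the idea of cross validation concrete, here is a minimal sketch of how k-fold train/test index splits can be generated in plain Python. The helper name `k_fold_indices` is hypothetical; in practice a library routine would typically be used, and would usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Each sample lands in the test set exactly once across the folds.
for train, test in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "test:", test)
```

Fitting a model on each `train` split and scoring it on the matching `test` split gives k estimates of how well the model generalizes to unseen data.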

#### Recommended reading

- ISL, Ch. 3

#### Lecture Video

### Lecture 11: Practical Data Visualization

Data Visualization in Python with Matplotlib, Seaborn, Altair.

#### Lecture Video

### Lecture 12: Data Visualization

Principles of Data Visualization.

#### Lecture Video

### Lecture 13: Web Scraping and APIs

Scrape HTML websites with Beautiful Soup. Data Cleanup with Pandas. Connect to APIs such as Twitter, Reddit. JSON, REST.

#### Recommended reading

#### Lecture Video

### Lecture 14: Classification I: K-Nearest Neighbors

Introduction to classification, k-nearest neighbors, generalizability, bias-variance, cross validation, discussion of course projects

#### Recommended reading

#### Lecture Video

### Lecture 15: No Class

Spring Break Light

### Lecture 16: No Class

Spring Break Light

### Lecture 17: Classification II: Decision Trees and SVMs

Decision Trees and Support Vector Machines (SVM), generalizability and cross validation

#### Recommended reading

- ISL, Ch. 8 and 9

#### Lecture Video

### Lecture 18: Natural Language Processing

Guest Lecture by Vivek Srikumar. What are the challenges in understanding natural language? How can we build statistical models of language?

#### Lecture Video

### Lecture 19: Regular Expressions, NLP in Practice

NLP in Python with NLTK. Parsing strings with regular expressions.
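
As a small taste of parsing strings with regular expressions, the following sketch uses Python's built-in `re` module. The sample text and both patterns are invented for illustration and are deliberately simplistic; real email and date formats need more careful handling.

```python
import re

# Hypothetical input text, made up for this example.
text = "Contact alice@example.com or bob@example.org before 2019-03-15."

# Simplified patterns: word characters or dots, an @, then a dotted domain.
EMAIL = re.compile(r"[\w.]+@\w+\.\w+")
# Named groups make the parts of an ISO-style date easy to pull out.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

emails = EMAIL.findall(text)

match = DATE.search(text)
year = int(match.group("year"))
month = int(match.group("month"))
day = int(match.group("day"))

print(emails)
print(year, month, day)
```

`findall` returns every non-overlapping match as a list, while `search` returns a match object for the first hit, or `None` if nothing matches.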

#### Lecture Video

### Lecture 20: Clustering I

Introduction to Clustering, supervised vs. unsupervised learning, k-means method

#### Recommended reading

- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering

#### Lecture Video

### Lecture 21: Project Peer Feedback

Give and receive feedback on your project proposal from a peer group.

### Lecture 22: Clustering II

Hierarchical clustering, dendrogram plots, clustering in practice

#### Recommended reading

- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering

#### Lecture Video

### Lecture 23: Dimensionality Reduction

Principal Component Analysis (PCA), using PCA for visualization

#### Lecture Video

### Lecture 24: Ethics

What are the social impacts of computing technology in areas such as personal privacy, intellectual property, interface usability, accessibility, and reliability? In which scenarios can, and has, the pervasive use of automated systems harmed some groups disproportionately? What solutions can mitigate these effects?

#### Recommended reading

### Lecture 25: Neural Networks, Deep Learning, TensorFlow

Classification and regression with neural networks. Network architectures. Using TensorFlow.

#### Recommended reading

- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow

### Lecture 26: Neural Networks, Deep Learning, TensorFlow

Classification and regression with neural networks. Network architectures. Using TensorFlow.

#### Recommended reading

- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow

### Lecture 27: Databases

Working with relational databases in Python. Introduction to the Structured Query Language.
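
One way to work with a relational database from Python is the standard-library `sqlite3` module, sketched below with an in-memory database. The table names, columns, and rows are hypothetical, and the lecture may use a different database or library.

```python
import sqlite3

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE grades (student_id INTEGER, course TEXT, score REAL)")

# Parameter placeholders (?) keep values separate from the SQL text.
conn.executemany(
    "INSERT INTO students (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO grades (student_id, course, score) VALUES (?, ?, ?)",
    [(1, "stats", 90.0), (1, "python", 80.0), (2, "stats", 70.0)],
)

# Join the two tables and compute each student's average score.
rows = conn.execute(
    """
    SELECT s.name, AVG(g.score) AS avg_score
    FROM students AS s
    JOIN grades AS g ON g.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY avg_score DESC
    """
).fetchall()

conn.close()
print(rows)
```

The query combines three core SQL ideas: a `JOIN` across tables, aggregation with `GROUP BY`, and sorting with `ORDER BY`.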

### Lecture 28: Network Analysis

Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
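
As an example of path search, here is a minimal breadth-first search sketch that finds a shortest path in an unweighted graph stored as an adjacency dictionary. The function name and the small graph are hypothetical; a dedicated graph library would normally be used for real analyses.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal as a list, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected graph; "F" is isolated.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],
}

print(shortest_path(graph, "A", "E"))
print(shortest_path(graph, "A", "F"))
```

Because breadth-first search explores nodes in order of distance from the start, the first time it reaches the goal it has found a path with the fewest edges.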

#### Mandatory reading

- Grus, Ch. 21

### Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook

What did we learn, what else is out there, what can you learn next?