Subject to change.

Week 1

Lecture 1: Introduction

Tuesday, Jan. 9

What is data science? Why is it important? Who are we? Course overview.

Recommended reading

Lecture 2: Introduction to Programming in Python, Version Control

Thursday, Jan. 11

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

Week 2

Lecture 3: Introduction to Programming in Python II

Tuesday, Jan. 16

Data types and operators, conditions, lists, loops.

Lecture 4: Introduction to Descriptive Statistics

Thursday, Jan. 18

Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.

Week 3

Lecture 5: Advanced Data Structures

Tuesday, Jan. 23

Sets, dictionaries, pandas series, working with modules.

Lecture 6: Pandas DataFrames

Thursday, Jan. 25

Reading and writing data, pandas data frames, basic plotting.

Recommended reading

  • Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

Week 4

Lecture 7: Temporal Data Analysis and Applications to Stock Analysis

Tuesday, Jan. 30

Downloading, cleaning, analyzing, and visualizing stock data

Guest lecturer: Curtis Miller

Lecture 8: Hypothesis Testing and Statistical Inference

Thursday, Feb. 1

Introduction to Hypothesis Testing, Central Limit Theorem, A/B testing.

Mandatory reading

Week 5

Lecture 9: Data Visualization

Tuesday, Feb. 6

Principles of Data Visualization; Visualization in Python

Lecture 10: Linear Regression 1

Thursday, Feb. 8

Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints

Recommended reading

Week 6

Lecture 11: Linear Regression 2

Tuesday, Feb. 13

Model generalizability, cross validation, and using categorical variables in regression

Recommended reading

Lecture 12: Web Scraping; Collecting Data from Web APIs

Thursday, Feb. 15

Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to web APIs such as those of Twitter and Reddit. JSON, REST.

Recommended reading

Week 7

Lecture 13: Classification I: Logistic Regression, K-Nearest Neighbors

Tuesday, Feb. 20

Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation.

Recommended reading

  • ISL, Ch. 4
  • J. Grus, Data Science from Scratch, Ch. 12, 16

Lecture 14: Classification II: Decision Trees

Thursday, Feb. 22

Decision trees, generalizability and cross validation, course projects

Recommended reading

Week 8

Lecture 15: Classification III: Support Vector Machines and Classification Competition

Tuesday, Feb. 27

Support Vector Machines (SVM), feature selection, competition of classification methods

Recommended reading

  • ISL, Ch. 9
  • A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (2017), Ch. 5

Lecture 16: Natural Language Processing

Thursday, Mar. 1

Guest lecturer: Vivek Srikumar

Week 9

Lecture 17: Regular Expressions, NLP in practice

Tuesday, Mar. 6

Lecture 18: Clustering I

Thursday, Mar. 8

Introduction to Clustering, supervised vs. unsupervised learning, k-means method

Recommended reading

Week 10

Lecture 19: Clustering II

Tuesday, Mar. 13

Hierarchical clustering, dendrogram plots, clustering in practice

Recommended reading

Lecture 20: Project Peer Feedback

Thursday, Mar. 15

Pitch your project to another group and receive feedback, give feedback to another group.

Week 11

Spring Break

Week 12

Lecture 21: Dimensionality Reduction

Tuesday, Mar. 27

Principal Component Analysis (PCA), using PCA for visualization

Recommended reading

Lecture 22: Neural Networks, Deep Learning, TensorFlow

Thursday, Mar. 29

Recommended reading

Week 13

Lecture 23: Network Analysis

Tuesday, Apr. 3

Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.

Mandatory reading

  • J. Grus, Data Science from Scratch, Ch. 21

Lecture 24: Databases

Thursday, Apr. 5

Working with relational databases in Python. Introduction to the Structured Query Language.

Week 14

Lecture 25: Neural Networks, Deep Learning, TensorFlow

Tuesday, Apr. 10

Recommended reading

Lecture 26: Large Data Analysis

Thursday, Apr. 12

Parallel programming. MapReduce.

Week 15

Lecture 27: Weapons of Math Destruction: Discrimination by Algorithms

Tuesday, Apr. 17

Guest lecturer: Katie Shelef

Lecture 28: Ratings, Rankings, and Elections

Thursday, Apr. 19

Rating and ranking in sports, election methods

Recommended reading

  • C. Börgers, Mathematics of Social Choice, SIAM (2010)

Week 16

Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook

Tuesday, Apr. 24

What did we learn, what else is out there, what can you learn next?