Subject to change.

Week 1

Lecture 1: Introduction

Tuesday, Jan. 7

What is data science? Why is it important? Who are we? Course overview and syllabus.

Recommended reading

Lecture 2: Introduction to Programming in Python, Version Control

Thursday, Jan. 9

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

Week 2

Lecture 3: Introduction to Programming in Python II

Tuesday, Jan. 14

Data types and operators, conditions, lists, loops.

Lecture 4: Introduction to Descriptive Statistics

Thursday, Jan. 16

Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.

Week 3

Lecture 5: Advanced Data Structures

Tuesday, Jan. 21

Sets, dictionaries, pandas series, working with modules.

Lecture 6: Pandas DataFrames

Thursday, Jan. 23

Reading and writing data, pandas data frames, basic plotting.

Recommended reading

  • Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

Week 4

Lecture 7: Hypothesis Testing and Statistical Inference

Tuesday, Jan. 28

Introduction to Hypothesis Testing, Central Limit Theorem, A/B testing.

Mandatory reading

Lecture 8: Temporal Data Analysis and Applications to Stock Analysis

Thursday, Jan. 30

Downloading, cleaning, analyzing, and visualizing stock data

Guest lecturer: Curtis Miller

Week 5

Lecture 9: Linear Regression 1

Tuesday, Feb. 4

Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints

Recommended reading

Lecture 10: Data Visualization

Thursday, Feb. 6

Principles of Data Visualization; Visualization in Python

Week 6

Lecture 11: Linear Regression 2

Tuesday, Feb. 11

Model generalizability, cross validation, and using categorical variables in regression

Recommended reading

Lecture 12: Web Scraping; Collecting Data from Web APIs

Thursday, Feb. 13

Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to web APIs such as those of Twitter and Reddit. JSON, REST.

Recommended reading

Week 7

Lecture 13: Classification I: Logistic Regression, K-Nearest Neighbors

Tuesday, Feb. 18

Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation.

Recommended reading

  • ISL (An Introduction to Statistical Learning), Ch. 4
  • J. Grus, Data Science from Scratch, Ch. 12, 16

Lecture 14: Classification II: Decision Trees

Thursday, Feb. 20

Decision trees, generalizability and cross validation, course projects

Recommended reading

Week 8

Lecture 15: Classification III: Support Vector Machines and Classification Competition

Tuesday, Feb. 25

Support Vector Machines (SVM), feature selection, competition of classification methods

Recommended reading

  • ISL, Ch. 9
  • A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (2017), Ch. 5

Lecture 16: Natural Language Processing

Thursday, Feb. 27

Guest lecturer: Vivek Srikumar

Week 9

Lecture 17: Clustering I

Tuesday, Mar. 3

Introduction to Clustering, supervised vs. unsupervised learning, k-means method

Recommended reading

Lecture 18: Project Peer Feedback

Thursday, Mar. 5

Pitch your project to another group and receive feedback, give feedback to another group.

Week 10

Spring Break

Week 11

Lecture 19: Clustering II

Tuesday, Mar. 17

Hierarchical clustering, dendrogram plots, clustering in practice

Recommended reading

Lecture 20: Regular Expressions, NLP in practice

Thursday, Mar. 19

Week 12

Lecture 21: Dimensionality Reduction

Tuesday, Mar. 24

Principal Component Analysis (PCA), using PCA for visualization

Recommended reading

Lecture 22: Neural Networks, Deep Learning, TensorFlow

Thursday, Mar. 26

Recommended reading

Week 13

Lecture 23: Network Analysis

Tuesday, Mar. 31

Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.

Mandatory reading

  • J. Grus, Data Science from Scratch, Ch. 21

Lecture 24: Databases

Thursday, Apr. 2

Working with relational databases in Python. Introduction to the Structured Query Language.

Week 14

Lecture 25: Neural Networks, Deep Learning, TensorFlow

Tuesday, Apr. 7

Recommended reading

Lecture 26: Large Data Analysis

Thursday, Apr. 9

Parallel programming. MapReduce.

Week 15

Lecture 27: Weapons of Math Destruction: Discrimination by Algorithms

Tuesday, Apr. 14

Guest lecturer: Katie Shelef

Lecture 28: Ratings, Rankings, and Elections

Thursday, Apr. 16

Rating and ranking in sports, election methods

Recommended reading

  • C. Börgers, Mathematics of Social Choice, SIAM (2010)

Week 16

Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook

Tuesday, Apr. 21

What did we learn, what else is out there, what can you learn next?