Schedule
Week 1
Lecture 1: Introduction
Tuesday, Jan. 9
What is data science? Why is it important? Who are we? Course overview.
Recommended reading
- Cathy O’Neil and Rachel Schutt, Doing Data Science. (2014) Chapter 1.
- David Donoho, 50 years of Data Science. (2015).
Lecture 2: Introduction to Programming in Python, Version Control
Thursday, Jan. 11
Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.
Week 2
Lecture 3: Introduction to Programming in Python II
Tuesday, Jan. 16
Data types and operators, conditions, lists, loops.
Lecture 4: Introduction to Descriptive Statistics
Thursday, Jan. 18
Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.
Week 3
Lecture 5: Advanced Data Structures
Tuesday, Jan. 23
Sets, dictionaries, pandas Series, working with modules.
Lecture 6: Pandas DataFrames
Thursday, Jan. 25
Reading and writing data, pandas DataFrames, basic plotting.
Recommended reading
- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.
Week 4
Lecture 7: Temporal Data Analysis and Applications to Stock Analysis
Tuesday, Jan. 30
Downloading, cleaning, analyzing, and visualizing stock data.
Guest lecturer: Curtis Miller
Lecture 8: Hypothesis Testing and Statistical Inference
Thursday, Feb. 1
Introduction to hypothesis testing, Central Limit Theorem, A/B testing.
Mandatory reading
Week 5
Lecture 9: Data Visualization
Tuesday, Feb. 6
Principles of data visualization; visualization in Python.
Lecture 10: Linear Regression 1
Thursday, Feb. 8
Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints.
Recommended reading
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015), Ch. 3
Week 6
Lecture 11: Linear Regression 2
Tuesday, Feb. 13
Model generalizability, cross validation, and using categorical variables in regression.
Recommended reading
- ISL, Ch. 3
Lecture 12: Web Scraping; Collecting Data from Web APIs
Thursday, Feb. 15
Scrape HTML websites with Beautiful Soup. Data cleanup with pandas. Connect to APIs such as Twitter and Reddit. JSON, REST.
Recommended reading
Week 7
Lecture 13: Classification I: Logistic Regression, K-Nearest Neighbors
Tuesday, Feb. 20
Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation.
Recommended reading
- ISL, Ch. 4
- J. Grus, Data Science from Scratch, Ch. 12, 16
Lecture 14: Classification II: Decision Trees
Thursday, Feb. 22
Decision trees, generalizability and cross validation, course projects.
Recommended reading
- ISL, Ch. 8
- Visual Intro to Machine Learning
- Grus, Ch. 17
Week 8
Lecture 15: Classification III: Support Vector Machines and Classification Competition
Tuesday, Feb. 27
Support Vector Machines (SVM), feature selection, competition of classification methods.
Recommended reading
- ISL, Ch. 9
- A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (2017), Ch. 5
Week 9
Lecture 17: Regular Expressions, NLP in Practice
Tuesday, Mar. 6
Lecture 18: Clustering I
Thursday, Mar. 8
Introduction to clustering, supervised vs. unsupervised learning, k-means method.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Week 10
Lecture 19: Clustering II
Tuesday, Mar. 13
Hierarchical clustering, dendrogram plots, clustering in practice.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 20: Project Peer Feedback
Thursday, Mar. 15
Pitch your project to another group, receive their feedback, and give feedback on theirs.
Week 11
Spring Break
Week 12
Lecture 21: Dimensionality Reduction
Tuesday, Mar. 27
Principal Component Analysis (PCA), using PCA for visualization.
Recommended reading
- ISL, Ch. 10.2
- V. Powell, Principal Component Analysis: Explained Visually
Lecture 22: Neural Networks, Deep Learning, TensorFlow
Thursday, Mar. 29
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Week 13
Lecture 23: Network Analysis
Tuesday, Apr. 3
Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
Mandatory reading
- Grus, Ch. 21
Lecture 24: Databases
Thursday, Apr. 5
Working with relational databases in Python. Introduction to the Structured Query Language (SQL).
Week 14
Lecture 25: Neural Networks, Deep Learning, TensorFlow
Tuesday, Apr. 10
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Lecture 26: Large Data Analysis
Thursday, Apr. 12
Parallel programming. MapReduce.
Week 15
Lecture 27: Weapons of Math Destruction: Discrimination by Algorithms
Tuesday, Apr. 17
Recommended reading
Lecture 28: Ratings, Rankings, and Elections
Thursday, Apr. 19
Rating and ranking in sports, election methods.
Recommended reading
- C. Borgers, Mathematics of Social Choice. SIAM (2010).
Week 16
Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook
Tuesday, Apr. 24
What did we learn, what else is out there, what can you learn next?