Schedule
Week 1
Lecture 1: Introduction
Tuesday, Jan. 8
What is data science? Why is it important? Who are we? Course overview and syllabus.
Recommended reading
- Cathy O’Neil and Rachel Schutt, Doing Data Science. (2014) Chapter 1.
- David Donoho, 50 Years of Data Science. (2015).
Lecture 2: Introduction to Programming in Python, Version Control
Thursday, Jan. 10
Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.
Week 2
Lecture 3: Introduction to Programming in Python II
Tuesday, Jan. 15
Data types and operators, conditions, lists, loops.
Lecture 4: Introduction to Descriptive Statistics
Thursday, Jan. 17
Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.
Week 3
Lecture 5: Advanced Data Structures
Tuesday, Jan. 22
Sets, dictionaries, pandas series, working with modules.
Lecture 6: Pandas DataFrames
Thursday, Jan. 24
Reading and writing data, pandas data frames, basic plotting.
Recommended reading
- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.
Week 4
Lecture 7: Hypothesis Testing and Statistical Inference
Tuesday, Jan. 29
Introduction to hypothesis testing, Central Limit Theorem, A/B testing.
Mandatory reading
Lecture 8: Temporal Data Analysis and Applications to Stock Analysis
Thursday, Jan. 31
Downloading, cleaning, analyzing, and visualizing stock data.
Guest lecturer: Curtis Miller
Week 5
Lecture 9: Linear Regression 1
Tuesday, Feb. 5
Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints.
Recommended reading
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015) Ch. 3
Lecture 10: Data Visualization
Thursday, Feb. 7
Principles of data visualization; visualization in Python.
Week 6
Lecture 11: Linear Regression 2
Tuesday, Feb. 12
Model generalizability, cross validation, and using categorical variables in regression.
Recommended reading
- ISL, Ch. 3
Lecture 12: Web Scraping; Collecting Data from Web APIs
Thursday, Feb. 14
Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to web APIs such as Twitter and Reddit. JSON, REST.
Recommended reading
Week 7
Lecture 13: Classification I: Logistic Regression, K-Nearest Neighbors
Tuesday, Feb. 19
Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation.
Recommended reading
- ISL, Ch. 4
- J. Grus, Data Science from Scratch, Ch. 12, 16
Lecture 14: Classification II: Decision Trees
Thursday, Feb. 21
Decision trees, generalizability and cross validation, course projects.
Recommended reading
- ISL, Ch. 8
- Visual Intro to Machine Learning
- Grus, Ch. 17
Week 8
Lecture 15: Classification III: Support Vector Machines and Classification Competition
Tuesday, Feb. 26
Support Vector Machines (SVM), feature selection, competition of classification methods.
Recommended reading
- ISL, Ch. 9
- A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (2017), Ch. 5
Week 9
Lecture 17: Clustering I
Tuesday, Mar. 5
Introduction to clustering, supervised vs. unsupervised learning, k-means method.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 18: Project Peer Feedback
Thursday, Mar. 7
Pitch your project to another group and receive feedback; give feedback to another group.
Week 10
Spring Break
Week 11
Lecture 19: Clustering II
Tuesday, Mar. 19
Hierarchical clustering, dendrogram plots, clustering in practice.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 20: Regular Expressions, NLP in practice
Thursday, Mar. 21
Week 12
Lecture 21: Dimensionality Reduction
Tuesday, Mar. 26
Principal Component Analysis (PCA), using PCA for visualization.
Recommended reading
- ISL, Ch. 10.2
- V. Powell, Principal Component Analysis: Explained Visually
Lecture 22: Neural Networks, Deep Learning, TensorFlow
Thursday, Mar. 28
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Week 13
Lecture 23: Network Analysis
Tuesday, Apr. 2
Basics of networks. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
Mandatory reading
- Grus Ch. 21
Lecture 24: Databases
Thursday, Apr. 4
Working with relational databases in Python. Introduction to the Structured Query Language (SQL).
Week 14
Lecture 25: Weapons of Math Destruction: Discrimination by Algorithms
Tuesday, Apr. 9
Recommended reading
Lecture 26: Neural Networks, Deep Learning, TensorFlow
Thursday, Apr. 11
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Week 15
Lecture 27: Large Data Analysis
Tuesday, Apr. 16
Parallel programming. MapReduce.
Lecture 28: Ratings, Rankings, and Elections
Thursday, Apr. 18
Rating and ranking in sports, election methods.
Recommended reading
- C. Borgers, Mathematics of Social Choice. SIAM (2010).
Week 16
Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook
Tuesday, Apr. 23
What did we learn, what else is out there, what can you learn next?