Schedule
Week 1
Lecture 1: Introduction
Tuesday, Jan. 7
What is data science? Why is it important? Who are we? Course overview and syllabus.
Recommended reading
- Cathy O’Neil and Rachel Schutt, Doing Data Science (2014), Chapter 1.
- David Donoho, 50 Years of Data Science (2015).
Lecture 2: Introduction to Programming in Python, Version Control
Thursday, Jan. 9
Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.
Week 2
Lecture 3: Introduction to Programming in Python II
Tuesday, Jan. 14
Data types and operators, conditions, lists, loops.
Lecture 4: Introduction to Descriptive Statistics
Thursday, Jan. 16
Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.
Week 3
Lecture 5: Advanced Data Structures
Tuesday, Jan. 21
Sets, dictionaries, pandas series, working with modules.
Lecture 6: Pandas DataFrames
Thursday, Jan. 23
Reading and writing data, pandas data frames, basic plotting.
Recommended reading
- Matt Harrison, Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization.
Week 4
Lecture 7: Hypothesis Testing and Statistical Inference
Tuesday, Jan. 28
Introduction to hypothesis testing, Central Limit Theorem, A/B testing.
Mandatory reading
Lecture 8: Temporal Data Analysis and Applications to Stock Analysis
Thursday, Jan. 30
Downloading, cleaning, analyzing, and visualizing stock data.
Week 5
Lecture 9: Linear Regression 1
Tuesday, Feb. 4
Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints.
Recommended reading
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (ISL) (2015), Ch. 3
Lecture 10: Data Visualization
Thursday, Feb. 6
Principles of data visualization; visualization in Python.
Week 6
Lecture 11: Web Scraping; Collecting Data from Web APIs
Tuesday, Feb. 11
Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to web APIs such as Twitter and Reddit. JSON, REST.
Recommended reading
Lecture 12: The Struggle Over Data’s Vulnerabilities and Legitimacy
Thursday, Feb. 13
Goldman Sachs Lecture by danah boyd. 3pm, 3780 WEB
Guest lecturer: danah boyd
Mandatory reading
- https://www.cs.utah.edu/calendar/goldman-sachs-lecture-danah-boyd/
Week 7
Lecture 13: Linear Regression 2
Tuesday, Feb. 18
Model generalizability, cross validation, and using categorical variables in regression.
Recommended reading
- ISL, Ch. 3
Lecture 14: Classification I: Logistic Regression and K-Nearest Neighbors
Thursday, Feb. 20
Introduction to classification, logistic regression, k-nearest neighbors, generalizability, bias-variance, cross validation, discussion of course projects.
Recommended reading
- ISL, Ch. 4
Week 8
Lecture 15: Classification II: Decision Trees and Support Vector Machines
Tuesday, Feb. 25
Decision trees, Support Vector Machines (SVM), generalizability and cross validation.
Recommended reading
- ISL, Ch. 8 and 9
- Visual Intro to Machine Learning
Week 9
Lecture 17: Clustering I
Tuesday, Mar. 3
Introduction to clustering, supervised vs. unsupervised learning, k-means method.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 18: Project Peer Feedback
Thursday, Mar. 5
Pitch your project to another group and receive feedback; give feedback to another group in return.
Week 10
Spring Break
Week 11
Classes cancelled university-wide.
Lecture 20: Regular Expressions, NLP in Practice
Thursday, Mar. 19
Week 12
Lecture 21: Dimensionality Reduction
Tuesday, Mar. 24
Principal Component Analysis (PCA), using PCA for visualization.
Recommended reading
- ISL, Ch. 10.2
- V. Powell, Principal Component Analysis: Explained Visually
Lecture 22: Neural Networks, Deep Learning, TensorFlow
Thursday, Mar. 26
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Week 13
Lecture 23: Clustering II
Tuesday, Mar. 31
Hierarchical clustering, dendrogram plots, clustering in practice.
Recommended reading
- ISL, Ch. 10.1 and 10.3
- Grus, Ch. 19
- scikit-learn documentation on clustering
Lecture 24: Databases
Thursday, Apr. 2
Working with relational databases in Python. Introduction to the Structured Query Language (SQL).
Week 14
Lecture 25: Network Analysis
Tuesday, Apr. 7
Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
Mandatory reading
- Grus Ch. 21
Lecture 26: Neural Networks, Deep Learning, TensorFlow
Thursday, Apr. 9
Recommended reading
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
Week 15
Lecture 28: Ratings, Rankings, and Elections
Thursday, Apr. 16
Rating and ranking in sports, election methods.
Recommended reading
- C. Borgers, Mathematics of Social Choice. SIAM (2010).
Week 16
Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook
Tuesday, Apr. 21
What did we learn, what else is out there, what can you learn next?