Subject to change.

Week 1

Lecture 1: Introduction

Tuesday, Jan. 7

What is data science? Why is it important? Who are we? Course overview and syllabus.

Recommended reading

Lecture 2: Introduction to Programming in Python, Version Control

Thursday, Jan. 9

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

Week 2

Lecture 3: Introduction to Programming in Python II

Tuesday, Jan. 14

Data types and operators, conditions, lists, loops.

Lecture 4: Introduction to Descriptive Statistics

Thursday, Jan. 16

Variable types, basic summary statistics and plotting, covariance and correlation, confounders; probability: the Bernoulli, binomial, and normal distributions.
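A minimal illustration of sample covariance and Pearson correlation, assuming two equal-length lists of numbers. The data and helper names below are hypothetical and are not taken from the course materials.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_correlation(xs, ys):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical toy data (hours studied vs. exam score), for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(sample_covariance(hours, score))
print(pearson_correlation(hours, score))
```

Covariance depends on the units of both variables. Correlation rescales it to the interval [-1, 1], which makes it easier to compare across datasets.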

Week 3

Lecture 5: Advanced Data Structures

Tuesday, Jan. 21

Sets, dictionaries, pandas series, working with modules.

Lecture 6: Pandas DataFrames

Thursday, Jan. 23

Reading and writing data, pandas data frames, basic plotting.

Recommended reading

  • Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

Week 4

Lecture 7: Hypothesis Testing and Statistical Inference

Tuesday, Jan. 28

Introduction to hypothesis testing, the Central Limit Theorem, A/B testing.
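One common way to analyze an A/B test is a two-sided, two-proportion z-test. It relies on the Central Limit Theorem: with large samples, each conversion rate is approximately normally distributed. This is a sketch under that assumption. The counts are invented, and the function names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in conversion rates, using the pooled variance."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # rate under H0: p_a == p_b
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# Hypothetical A/B counts: 200 of 2000 visitors converted with A, 250 of 2000 with B.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p = {p:.4f}")
```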

Mandatory reading

Lecture 8: Temporal Data Analysis and Applications to Stock Analysis

Thursday, Jan. 30

Downloading, cleaning, analyzing, and visualizing stock data.

Week 5

Lecture 9: Linear Regression 1

Tuesday, Feb. 4

Introduction to simple and multiple linear regression; exploratory vs. inferential viewpoints.
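For simple linear regression, the ordinary least squares fit has a closed form: the slope is S_xy / S_xx, and the intercept is mean(y) - slope * mean(x). The sketch below uses hypothetical toy data that lies exactly on a line, so the fitted coefficients are easy to check by hand.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ~ b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-deviations
    sxx = sum((x - mx) ** 2 for x in xs)                    # sum of squared x-deviations
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def predict(b0, b1, x):
    """Value of the fitted line at x."""
    return b0 + b1 * x

# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # → 1.0 2.0
```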

Recommended reading

Lecture 10: Data Visualization

Thursday, Feb. 6

Principles of data visualization; visualization in Python.

Week 6

Lecture 11: Web Scraping; Collecting Data from Web APIs

Tuesday, Feb. 11

Scrape HTML websites with Beautiful Soup. Clean up data with pandas. Connect to APIs such as Twitter and Reddit. JSON, REST.

Recommended reading

Lecture 12: The Struggle Over Data’s Vulnerabilities and Legitimacy

Thursday, Feb. 13

Goldman Sachs Lecture by danah boyd. 3pm, 3780 WEB

Guest lecturer: danah boyd

Mandatory reading

  • https://www.cs.utah.edu/calendar/goldman-sachs-lecture-danah-boyd/

Week 7

Lecture 13: Linear Regression 2

Tuesday, Feb. 18

Model generalizability, cross-validation, and using categorical variables in regression.

Recommended reading

Lecture 14: Classification I: Logistic Regression and K-Nearest Neighbors

Thursday, Feb. 20

Introduction to classification, logistic regression, k-nearest neighbors, generalizability, the bias-variance tradeoff, cross-validation; discussion of course projects.
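The k-nearest neighbors classifier can be sketched in a few lines. It finds the k training points closest to the query under Euclidean distance and returns their majority label. The points, labels, and function name are hypothetical. Ties in the vote go to the label that appears first among the sorted (nearest-first) neighbors.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(zip(train_points, train_labels), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.8, 1.8)))
print(knn_predict(points, labels, (6.0, 6.2)))
```

A small k follows the training data closely (low bias, high variance). A larger k smooths the decision boundary. This is the bias-variance tradeoff mentioned above, and cross-validation is a common way to choose k.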

Recommended reading

Week 8

Lecture 15: Classification II: Decision Trees and Support Vector Machines

Tuesday, Feb. 25

Decision trees, support vector machines (SVMs), generalizability and cross-validation.

Recommended reading

Lecture 16: Natural Language Processing

Thursday, Feb. 27

Guest lecturer: Vivek Srikumar

Week 9

Lecture 17: Clustering I

Tuesday, Mar. 3

Introduction to clustering, supervised vs. unsupervised learning, the k-means method.
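The standard way to compute k-means is Lloyd's algorithm. It alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping once no centroid moves. This is a minimal sketch assuming points are tuples of numbers. The function names, the initialization from k random data points, and the toy data are illustrative choices.

```python
import random
from math import dist

def _assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]

def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means clustering on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        labels = _assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: coordinate-wise mean of the cluster's points.
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, _assign(points, centroids)

# Hypothetical data: two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(data, k=2)
print(centers, assignment)
```

The result can depend on the random initialization, so in practice k-means is often run several times with different seeds and the lowest-cost solution is kept.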

Recommended reading

Lecture 18: Project Peer Feedback

Thursday, Mar. 5

Pitch your project to another group and receive feedback, give feedback to another group.

Week 10

Spring Break

Week 11

Classes cancelled university-wide.

Lecture 20: Regular Expressions, NLP in practice

Thursday, Mar. 19

Week 12

Lecture 21: Dimensionality Reduction

Tuesday, Mar. 24

Principal Component Analysis (PCA), using PCA for visualization.

Recommended reading

Lecture 22: Neural Networks, Deep Learning, TensorFlow

Thursday, Mar. 26

Recommended reading

Week 13

Lecture 23: Clustering II

Tuesday, Mar. 31

Hierarchical clustering, dendrogram plots, clustering in practice.

Recommended reading

Lecture 24: Databases

Thursday, Apr. 2

Working with relational databases in Python. Introduction to the Structured Query Language.
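Python ships with the `sqlite3` module, so a relational database can be tried out without installing a server. This sketch creates a hypothetical `students` table in an in-memory database, inserts rows with parameterized queries, and runs an aggregate SQL query. The table and its data are invented for illustration.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (name TEXT, major TEXT, gpa REAL)")
cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",  # ? placeholders avoid SQL injection
    [("Ada", "CS", 3.9), ("Ben", "Math", 3.4), ("Cy", "CS", 3.1)],
)
conn.commit()

# Aggregate query: average GPA per major, highest first.
rows = cur.execute(
    "SELECT major, AVG(gpa) AS avg_gpa FROM students GROUP BY major ORDER BY avg_gpa DESC"
).fetchall()
for major, avg_gpa in rows:
    print(major, round(avg_gpa, 2))
conn.close()
```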

Week 14

Lecture 25: Network Analysis

Tuesday, Apr. 7

Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.
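PageRank can be computed by power iteration. Each step, every node keeps a baseline share (1 - d) / n of the rank and passes the damped remainder of its current rank evenly along its outgoing links. This is a sketch assuming the graph is a dict mapping each node to the nodes it links to, with every link target also present as a key. Dangling nodes (no outgoing links) are treated as linking to every node. The toy graph is hypothetical.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a dict {node: [nodes it links to]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new_rank[w] += damping * rank[v] / len(out)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break  # ranks have stopped changing
    return rank

# Hypothetical link graph: C is linked to by A, B, and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```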

Mandatory reading

  • Grus Ch. 21

Lecture 26: Neural Networks, Deep Learning, TensorFlow

Thursday, Apr. 9

Recommended reading

Week 15

Lecture 27: Analyzing Weather Data

Tuesday, Apr. 14

Guest lecturer: Jim Steenburgh

Lecture 28: Ratings, Rankings, and Elections

Thursday, Apr. 16

Rating and ranking in sports; election methods.
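Different election methods can pick different winners from the same ranked ballots. This sketch compares plurality (most first-place votes) with the Borda count. Under Borda, with m candidates, each ballot gives m - 1 points to its top choice, m - 2 to the next, and so on down to 0. The ballots are hypothetical, and each lists candidates from most to least preferred.

```python
from collections import Counter

def plurality_winner(ballots):
    """Candidate with the most first-place votes."""
    return Counter(ballot[0] for ballot in ballots).most_common(1)[0][0]

def borda_scores(ballots):
    """Borda count: position i on a ballot of m candidates earns m - 1 - i points."""
    scores = Counter()
    for ballot in ballots:
        m = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - position
    return scores

# Hypothetical electorate: 4 voters rank A > B > C, 3 rank B > C > A, 2 rank C > B > A.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
print(plurality_winner(ballots))    # A wins on first-place votes alone
print(dict(borda_scores(ballots)))  # B wins once full rankings are counted
```

In this example A has the most first-place votes, but a majority of voters (5 of 9) rank A last. Borda rewards B for being broadly acceptable.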

Recommended reading

  • C. Börgers. Mathematics of Social Choice. SIAM, 2010.

Week 16

Lecture 29: Best Project Presentations, Recap, Wrap-up, Outlook

Tuesday, Apr. 21

What did we learn, what else is out there, what can you learn next?