Homework

due by 01/10/2025,11:59pm

Homework 0 - Description | Submission

due by 01/17/2025,11:59pm

Homework 1 - Description | Submission

due by 01/24/2025,11:59pm

Homework 2 - Description | Submission

due by 01/31/2025,11:59pm

Homework 3 - Description | Submission

due by 02/07/2025,11:59pm

Homework 4 - Description | Submission

due by 02/21/2025,11:59pm

Homework 5 - Description | Submission

due by 02/28/2025,11:59pm

Homework 6 - Description | Submission

due by 03/07/2025,11:59pm

Project Proposal - Description | Submission

due by 03/21/2025,11:59pm

Homework 7 - Description | Submission

due by 03/28/2025,11:59pm

Project Milestone Report - Description | Submission

due by 04/04/2025,11:59pm

Homework 8 - Description | Submission

due by 04/18/2025,11:59pm

Final Project Report - Description | Submission

due by 04/22/2025,11:59pm

Homework 9 - Description | Submission

Other Project Phases

02/14/2025,11:59pm

Announce your team and title

03/20/2025,3:40pm

Peer feedback (in-class)

03/31/2025 - 04/04/2025

Staff feedback by appointments

04/22/2025,3:40pm

Project Awards (in-class)

Lectures

1/7/25

Lecture 1: Introduction + HW 0 Intro

What is data science? Why is it important? Who are we? Course overview and syllabus.

Recommended reading

1/9/25

Lecture 2: Introduction to Programming in Python, Version Control, ChatGPT

Running a Python program, IPython, Jupyter notebooks, variables and data types, operations, functions, scope. Version control with Git.

1/14/25

Lecture 3: Introduction to Programming in Python II

Data types and operators, conditions, lists, loops.

1/16/25

Lecture 4: Introduction to Descriptive Statistics

Variable types, basic summary statistics and plotting, covariance and correlation, confounders, probability: Bernoulli, Binomial, and Normal distributions.

1/21/25

Lecture 5: Advanced Data Structures

Sets, dictionaries, pandas series, working with modules.

1/23/25

Lecture 6: Pandas DataFrames

Reading and writing data from files, pandas data frames, basic plotting.

Recommended reading

  • Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization. Matt Harrison

1/28/25

Lecture 7: Hypothesis Testing and Statistical Inference

Introduction to Hypothesis Testing, Central Limit Theorem, A/B testing.

Mandatory reading

WIRED article on A/B testing

1/30/25

Lecture 8: Linear Regression 1

Introduction to simple linear regression, multiple linear regression, exploratory vs. inferential viewpoints

Recommended reading

2/4/25

Lecture 9: Linear Regression 2

Model generalizability, cross validation, and using categorical variables in regression

Recommended reading

2/6/25

Lecture 10: Data Visualization 1

Data visualization in Python with Matplotlib, Seaborn, and Altair.

2/11/25

Lecture 11: Data Visualization 2

Principles of Visualization

2/13/25

Lecture 12: Web Scraping and APIs

Scraping HTML websites with Beautiful Soup. Data cleanup with pandas. Connecting to APIs such as Twitter and Reddit. JSON, REST.

Recommended reading

2/18/25

Lecture 13: Databases

Working with relational databases in Python. Introduction to the Structured Query Language.

2/20/25

Lecture 14: Ethics + Kate Catch Up

What are the social impacts of computing technology in areas such as personal privacy, intellectual property, interface usability, accessibility, and reliability? In which scenarios can the pervasive use of automated systems disproportionately harm some groups more than others, and where has it already done so? What solutions can mitigate these effects?

2/25/25

Lecture 15: Classification I: K-Nearest Neighbors, Decision Trees

Introduction to classification, k-nearest neighbors, generalizability, bias-variance, cross validation, discussion of course projects

Recommended reading

2/27/25

Lecture 16: Classification II: Logistic Regression and SVMs

Logistic Regression, Support Vector Machines (SVM), generalizability and cross validation

Recommended reading

  • ISL, Ch. 8 and 9

3/4/25

Lecture 19: Clustering I

Introduction to Clustering, supervised vs. unsupervised learning, k-means method

Recommended reading

3/6/25

Lecture 20: Clustering II

Hierarchical clustering, dendrogram plots, clustering in practice.

Recommended reading

3/11/25

Lecture 17: No Class

Spring Break

3/13/25

Lecture 18: No Class

Spring Break

3/18/25

Lecture 21: Dimensionality Reduction

Principal Component Analysis (PCA), using PCA for visualization

3/20/25

Lecture 22: Project Peer Feedback

Give and receive feedback on your project proposal from a peer group.

3/25/25

Lecture 23: Regular Expressions, NLP in Practice

NLP in Python with NLTK. Parsing strings with regular expressions.

3/27/25

Lecture 24: Network Analysis

Network basics. Visualization methods for general graphs and trees. Graph algorithms: path search, centrality, PageRank.

Recommended reading

4/1/25

Lecture 25: Neural Networks, Deep Learning, TensorFlow

Classification and regression with neural networks. Network architectures. Using TensorFlow.

Recommended reading

4/3/25

Lecture 26: This time reserved for project feedback meetings with TAs

4/8/25

Lecture 27: Neural Networks, Deep Learning, TensorFlow

Classification and regression with neural networks. Network architectures. Using TensorFlow.

Recommended reading

4/10/25

Lecture 28: Temporal Data Analysis and Applications to Stock Analysis

Downloading, cleaning, analyzing, and visualizing stock data.

4/15/25

Lecture 29: Special Topics - Advanced Methods

4/17/25

Lecture 30: Special Topics - Advanced Methods

4/22/25

Lecture 31: Best Project Presentations, Recap, Wrap-up, Outlook

What did we learn, what else is out there, what can you learn next?