Yale School of Public Health · Department of Biostatistics

Introduction to Health Data Science

Translating real-world public health problems into precise questions, applying appropriate analytical tools, and communicating findings effectively.

Instructor: Harsh Parikh, Ph.D.
Schedule: 80-minute sessions, twice a week
Format: Lectures + Cases + Labs
Language: Python
About This Course
Description

A comprehensive introduction to data science with a focus on public health and healthcare applications. Covers data structures and algorithms, machine learning, and causal inference — equipping students to decompose complex problems, select appropriate methods, and communicate results to diverse audiences.

Who Should Enroll
MS / PhD students in Biostatistics (required course)
MPH students (quantitative focus)
Advanced undergraduates (with permission)
Graduate students in Statistics & Data Science or Computer Science
Harsh Parikh, Ph.D.
Assistant Professor, Biostatistics
harsh.parikh@yale.edu
Office Hours: TBD, 1 hour / week, 60 College St, Rm 200
Teaching Fellows: 3–4 TFs (TBD), each holding 1 office hour per week
Course Website: harsh-parikh.github.io/intro_ds_2026
Canvas: canvas.yale.edu
Prerequisites

Foundational knowledge in the following areas, or concurrent enrollment in courses covering them.

Linear Algebra: vectors, matrices, basic operations
Set Theory: notation, union, intersection, complement
Probability: random variables, distributions, Bayes' rule
Programming: any language; Python helpful but not required
Learning Objectives

By the end of this course, students will be able to:

1. Formulate precise mathematical or statistical questions from natural-language problems, identifying meaningful estimands
2. Decompose complex problems into tractable sub-problems and recognize appropriate tools for each component
3. Implement data science methods using Python (NumPy, Pandas, Scikit-learn) to analyze real-world datasets
4. Interpret results within the appropriate problem context, understanding capabilities and limitations
5. Communicate findings effectively to academic and non-technical audiences through writing and visualization
6. Evaluate the appropriateness of analytical approaches, making informed trade-offs between methods
Course Schedule
M1: Data Structures & Algorithms
M2: Machine Learning & AI
M3: Causal Inference
Week 1 · Course Introduction
Overview of data science; problem formulation; translating questions to estimands
Readings: Syllabus; Turing (1950)
Milestones: Self-assessment quiz
Objectives: 1, 2

Weeks 1–2 · Lists, Arrays & Sorting
Fundamental data structures; sorting algorithms; computational complexity
Readings: CLRS Ch. 2, 7, 8
Milestones: HW 1 released (Week 2)
Objectives: 2, 3

Week 3 · Trees & Graph Traversal
Tree structures; BFS; DFS; applications in healthcare networks
Readings: CLRS Ch. 10, 22
Milestones: Team formation due
Objectives: 2, 3

Week 4 · Dynamic Programming
Optimization principles; memoization; tabulation; sequence alignment
Readings: CLRS Ch. 15
Milestones: HW 1 due · Proposal due
Objectives: 2, 3

Week 5 · Statistical Learning
Maximum likelihood estimation; uncertainty quantification
Readings: ISLR Ch. 1–2; IAML Ch. 1.1
Milestones: HW 2 released
Objectives: 1, 4

Week 6 · Linear Methods
Linear & logistic regression; regularization (Ridge, LASSO); evaluation metrics
Readings: ISLR Ch. 3, 4, 6; IAML Ch. 4
Objectives: 1, 3, 4

Week 7 · Nonparametric Methods
Decision trees; random forests; boosting & bagging
Readings: ISLR Ch. 8; IAML Ch. 2–3
Milestones: HW 2 due
Objectives: 3, 4, 6

Week 8 · Neural Networks
Perceptrons; multi-layer networks; backpropagation; introduction to deep learning
Readings: ISLR Ch. 10; IAML Ch. 12.1
Milestones: Progress report due · HW 3 released
Objectives: 3, 4

Week 9 · Clustering
K-means; EM algorithm; hierarchical clustering
Readings: ISLR Ch. 12.4; IAML Ch. 10
Objectives: 3, 4, 6

Week 10 · Potential Outcomes & DAGs
Rubin Causal Model; causal graphs; confounding; identification strategies
Readings: Mixtape Ch. 3–4; FCCI Ch. 1–2
Milestones: HW 3 due · HW 4 released
Objectives: 1, 2, 4

Week 11 · Matching & Weighting
Matching and weighting estimators; difference-in-differences
Readings: Mixtape Ch. 5; FCCI Ch. 10–11
Objectives: 3, 4, 6

Week 12 · Project Presentations
Student poster session; data equity discussion
Milestones: HW 4 due · Final project (due Finals Week)
Objectives: 5, 6
Assessments & Grading
Grade Components
36% · Homework Assignments (4): conceptual, programming, and applied problems roughly every 3 weeks
30% · Course Project: team-based (3–4 students) applied project with a proposal, report, and poster
18% · Scribe Notes (2): detailed LaTeX lecture notes that extend beyond class content
11% · In-Class Quizzes: short 5–10 minute quizzes on readings and lecture material
5% · Participation: class engagement, discussions, and forum contributions
Grading Scale (YSPH)
≥ 90%: Honors (H)
80–89%: High Pass (HP)
65–79%: Pass (P)
< 65%: Fail (F)
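To track where you stand, the grade components and YSPH scale can be combined in a few lines of Python. This is a minimal sketch, assuming the course grade is a simple weighted average of component scores already expressed as percentages (0–100) and that no rounding is applied before the cutoffs are checked; the syllabus does not say how rounding or scores between bands (e.g., 89.5%) are handled. The names `WEIGHTS`, `SCALE`, `FAIL`, `weighted_grade`, and `letter_grade` are hypothetical, and the sample scores are invented for illustration.

```python
# Hypothetical helper: assumes the course grade is a simple weighted average
# of component scores (each 0-100). Rounding policy is NOT specified in the
# syllabus, so no rounding is applied before comparing against cutoffs.

WEIGHTS = {
    "homework": 0.36,       # Homework Assignments (4)
    "project": 0.30,        # Course Project
    "scribe_notes": 0.18,   # Scribe Notes (2)
    "quizzes": 0.11,        # In-Class Quizzes
    "participation": 0.05,  # Participation
}

# YSPH bands, checked from highest cutoff to lowest: (minimum percent, grade).
SCALE = [
    (90.0, "Honors (H)"),
    (80.0, "High Pass (HP)"),
    (65.0, "Pass (P)"),
]
FAIL = "Fail (F)"


def weighted_grade(scores: dict[str, float]) -> float:
    """Combine per-component percentages into one weighted course percentage."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(percent: float) -> str:
    """Return the first band whose cutoff the percentage meets, else Fail."""
    for cutoff, grade in SCALE:
        if percent >= cutoff:
            return grade
    return FAIL


# Hypothetical scores for illustration only.
example = {"homework": 92, "project": 88, "scribe_notes": 95,
           "quizzes": 80, "participation": 100}
pct = weighted_grade(example)
print(round(pct, 2), letter_grade(pct))  # 90.42 Honors (H)
```

Checking bands from the top down places each boundary value (90, 80, 65) in the higher band, which matches the "≥" and "<" wording of the scale.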
Late Policy

Late submissions are not accepted. Contact the instructor before the deadline for documented emergencies.

Grading Turnaround
Homework & Scribe Notes: 1 week
Participation: midterm and end of semester

Participation grades are released twice — at midterm and at the end of the semester — so you can align your expectations with ours early on.

Key Dates (released → due)
Homework 1: Week 2 → Week 4
Homework 2: Week 5 → Week 7
Homework 3: Week 8 → Week 10
Homework 4: Week 10 → Week 12
Project Proposal: Week 2 → Week 4
Progress Report: Week 6 → Week 8
Final Report: Week 10 → Finals Week
Poster Session: Week 12 (in class)
Materials

No single required textbook. All texts are freely available online.

CLRS: Introduction to Algorithms (Cormen, Leiserson, Rivest, Stein)
ISLR: An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani)
PyDS: Python Data Science Handbook (Jake VanderPlas)
Software Stack
Python 3.x, NumPy, Pandas, PyTorch, Scikit-learn, Matplotlib, Seaborn, Jupyter, Overleaf (LaTeX)
Course Policies
AI Policy

AI assistants (ChatGPT, Claude, Copilot) are permitted as learning aids for homework and scribe notes:

You must understand everything you submit
Disclose all AI use at the top of submissions
No acknowledgment → submission not accepted
You are responsible for the accuracy of everything you submit; AI hallucinations are on you
Quizzes: no AI allowed
Collaboration

Discussion with classmates is encouraged, but:

All written solutions must be your own
All code must be written individually
Acknowledge all collaborators and resources
Do not share answers, code, or LaTeX files
Electronic Devices

Mobile phones, iPads, and laptops are not permitted during lectures; handwritten notes promote deeper processing. Exceptions are made for documented accommodations. No devices are allowed during quizzes under any circumstances.

Academic Integrity

All students must adhere to the YSPH Code of Academic and Professional Integrity (CAPI). Violations — plagiarism, unauthorized collaboration, undisclosed AI use, fabrication — will be referred to the CAPI Committee. Penalties can include expulsion.

Resources
Accessibility

Yale Student Accessibility Services (SAS). Email sas@yale.edu or call 203-432-2324.

Mental Health

YSPH Wellness Counselor: Diane Frankel-Gramelis. 988 Lifeline: dial 988 (24/7).

Writing Support

Graduate Writing Lab — free consultations. Book at yale.mywconline.net.

Inclusivity

Office of Community & Practice. Contact Mayur Desai or Randi McCray.

Title IX

Deputy coordinator: Kelly Shay, Office of Institutional Equity and Accessibility (OIEA).

Classroom Safety

Emergency preparedness information: emergency.yale.edu