Statistical Inference
Likelihood, simulation, and decisions — learning from data under uncertainty

Inference is not a collection of formulas for confidence intervals and hypothesis tests. It is a disciplined way to connect data, probability models, assumptions, uncertainty, evidence, and decisions — to say something responsible about a larger process from a limited sample. This course teaches you to do that carefully, across four complementary ways of reasoning.
What this course is
This is a course in statistical inference built around one idea: inference is the disciplined process of learning from data under uncertainty. We begin with the inferential problem itself — we observe a sample but want to speak about a parameter, a population, a process, or a claim. We then build the machinery: sampling distributions and simulation, estimators and standard errors, bias and variance, likelihood and maximum likelihood estimation, confidence intervals, hypothesis tests and p-values, error rates and power, the bootstrap, randomization and permutation tests, Bayesian updating, and the link from inference to decisions.
The course is deliberately pluralistic. You will learn classical frequentist tools, likelihood-based reasoning, simulation-based inference, and introductory Bayesian inference — not to crown one framework and dismiss the rest, but to understand what each one conditions on, what each method claims, what assumptions it requires, and how its conclusions should be communicated. The emphasis throughout is reasoning, computation, and interpretation.
We use R and Quarto to simulate sampling distributions, resample, draw likelihood and posterior curves, and run randomization tests. But this is an inference course, not a programming course: the software exists to make inferential reasoning visible, and every line of code is in service of a statistical idea.
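As a minimal sketch of what "simulating a sampling distribution" can look like in R, the chunk below draws many samples from a normal population and compares the spread of the sample means with the theoretical standard error. The population values, sample size, number of repetitions, and seed are illustrative assumptions, not values taken from the course materials.

```r
set.seed(2024)   # hypothetical seed, chosen only for reproducibility

pop_mean <- 50   # assumed population mean (synthetic)
pop_sd   <- 10   # assumed population standard deviation (synthetic)
n        <- 25   # size of each simulated sample
reps     <- 5000 # number of simulated samples

# Draw `reps` samples of size n and record the mean of each one
sample_means <- replicate(reps, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# The standard deviation of the simulated means approximates the standard error
simulated_se <- sd(sample_means)

# Theoretical standard error of the mean: sigma / sqrt(n)
theoretical_se <- pop_sd / sqrt(n)
```

Running `hist(sample_means)` in your own session draws the simulated sampling distribution. `simulated_se` and `theoretical_se` should agree closely, though not exactly, because the simulation is itself a finite sample.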
What you will be able to do
By the end of the term, you should be able to:
- Explain the inferential problem: how sample data are used to learn about unknown parameters, populations, processes, or claims.
- Distinguish parameters, statistics, estimators, estimates, standard errors, and sampling distributions.
- Use simulation to approximate a sampling distribution and reason about sampling variability.
- Evaluate estimators with bias, variance, mean squared error, and practical interpretability.
- Use likelihood to compare parameter values and compute maximum likelihood estimates in common settings.
- Construct and interpret confidence intervals as statements about the long-run coverage of the procedure, not as probability statements about a fixed parameter.
- Conduct and interpret hypothesis tests — test statistics, p-values, significance, Type I and Type II error, and power — and explain their common misinterpretations.
- Use the bootstrap, and randomization/permutation tests, to estimate uncertainty and weigh evidence.
- Explain the structure of Bayesian inference — prior, likelihood, posterior, posterior prediction — and compare a credible interval with a confidence interval.
- Use simple loss functions or decision rules to connect inference to action, and communicate conclusions carefully — assumptions, uncertainty, limitations, and consequences included.
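To make one of the outcomes above concrete, here is a hedged sketch of using likelihood to compare parameter values and find a maximum likelihood estimate on a grid. The data (7 successes in 20 trials) and the grid spacing are hypothetical choices made for illustration only.

```r
# Hypothetical synthetic data: 7 successes in 20 independent trials
successes <- 7
trials    <- 20

# Candidate values of the success probability p
p_grid <- seq(0.01, 0.99, by = 0.01)

# Binomial log-likelihood of the observed data at each candidate p
log_lik <- dbinom(successes, size = trials, prob = p_grid, log = TRUE)

# Grid approximation to the maximum likelihood estimate
mle_grid <- p_grid[which.max(log_lik)]

# Closed-form MLE for a binomial proportion: successes / trials
mle_exact <- successes / trials

# Likelihood ratio comparing p = 0.35 with p = 0.50 for these data
lik_ratio <- exp(dbinom(successes, trials, 0.35, log = TRUE) -
                 dbinom(successes, trials, 0.50, log = TRUE))
```

In this sketch the grid estimate matches the closed-form value because 7/20 falls on a grid point. A likelihood ratio above 1 means the observed data are more probable under p = 0.35 than under p = 0.50.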
How the site is organized
This public site has three working areas, reachable from the sidebar:
- Notes — the weekly instructional spine. Each week poses an inferential question, develops the concept, works it on a recurring teaching study, names a common mistake, and offers ungraded self-checks. Start here.
- Labs — the hands-on simulation strand. Four short labs in R and Quarto let you simulate sampling distributions, draw likelihood curves, build bootstrap intervals, and update a posterior. Code is shown for study; you run it in your own session.
- Resources — a notation glossary, a one-page inference reference that lays the four frameworks side by side, and setup instructions for R and Quarto. Keep these open while you read.
Software
We use R (via RStudio or Posit Cloud) together with Quarto to simulate, resample, and visualize inference. No prior coding experience is assumed — the simulation work is scaffolded, and the code is explained as it goes. On this site, R chunks are shown as static teaching code and are not executed in place; you run them in your own session.
Source and attribution
These notes are the course’s own synthesis, grounded in, but not copied from, three open sources:
- Primary spine: MIT OpenCourseWare 18.05, Introduction to Probability and Statistics (Spring 2022) — free at ocw.mit.edu. License: CC BY-NC-SA 4.0. It grounds the theory spine — sampling distributions, likelihood, maximum likelihood, confidence intervals, hypothesis testing, and Bayesian inference.
- Simulation supplement: Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, 2nd ed. (Ismay, Kim & Valdivia) — free at moderndive.com/v2. License: CC BY-NC-SA 4.0. It grounds simulation-based inference, the bootstrap, randomization tests, and reproducible R/Quarto workflows.
- Lighter review source: Introduction to Modern Statistics, 2nd ed. (Çetinkaya-Rundel & Hardin), free at openintro-ims.netlify.app. License: CC BY-SA 3.0. Used selectively as a calibration source for introductory inference concepts.
MIT OCW 18.05 and ModernDive are CC BY-NC-SA 4.0 (Attribution · NonCommercial · ShareAlike); IMS is CC BY-SA 3.0 (Attribution · ShareAlike). The course notes are instructor-original: they are grounded in these texts but do not copy from them. Any public reuse or adaptation is treated conservatively and transparently, with attribution. All example data are synthetic, generated with fixed random seeds.
A note on what is public here
Everything on this site is public and ungraded — study material only. You will not find graded prompts, answer keys, rubrics, point values, or due dates here. The operational side of the course — graded inference checkpoints, quizzes, homework, inference labs, the midterm, the project, and the final, along with all dates and submissions — lives in Blackboard (the LMS), which is authoritative. If this site and Blackboard ever disagree, follow Blackboard.
This is a draft course site, not a finished release. Some pages are still drafts, every formula and numeric value is synthetic and provisional pending human review, and no accessibility-compliance claim is made. Treat it as a work in progress rather than the final word.