Week 15 — Final review & synthesis

The whole inferential arc as one practice

The week question

We have spent fourteen weeks building four ways to learn from data under uncertainty. The final question is the one that ties them together: faced with a real inferential problem, how do you choose an approach, carry it out honestly, and say what your data do and do not support? And how do you do that across estimation, likelihood, testing, resampling, and Bayes as a single practice rather than a set of disconnected procedures?

This is a one-day synthesis meeting (the last class is Monday, December 7), and it introduces nothing new. Its job is to make the arc visible: to show that one thread — the reading-competency pass rate \(\theta\) — ran through the entire course, and that every tool we built was a different way of speaking about that one unknown. If you can retell that thread in your own words, you are ready for the cumulative final.

Why this matters

It matters because inference is judged as a whole, not as a checklist. On the final, and in any real analysis, the hard part is rarely a single formula; it is choosing a method that fits the question, stating what it conditions on, and writing a conclusion that survives a skeptical reading. A synthesis week is where the isolated skills become judgment.

It also matters because the course’s thesis only lands at the end. We never argued that one framework is correct and the others wrong. We argued that each conditions on something different, claims something different, and assumes something different — and that an educated person can hold all four, choose among them, and translate between them. This week is where that thesis is supposed to feel obvious.

Learning goals

By the end of this review you should be able to:

Retell the inferential arc — from the inferential problem to a decision — as one connected story.
Move fluently between parameter and statistic, estimator and estimate, and name what is fixed versus random in any method.
State, for each framework, what it conditions on and what its conclusion claims, without consulting notes.
Diagnose the standard misreadings (CI as a probability about \(\theta\); p-value as \(P(H_0)\); likelihood as a distribution; credible interval as a CI) on sight.
Choose a defensible framework for a described problem and justify the choice.
Communicate an inferential conclusion with its assumptions, uncertainty, limitations, and consequences.

Core vocabulary

This week reuses the term’s vocabulary rather than adding to it. The anchors worth being able to define cold: parameter / statistic / estimator / estimate; standard error and sampling distribution; bias, variance, MSE; likelihood and the MLE; confidence interval and its coverage; hypothesis test, p-value, Type I/II error, power; bootstrap and permutation null; prior, posterior, credible interval, posterior predictive; and loss / decision. The notation glossary and the inference reference collect all of these in one place — keep both open while you review.

Concept development

1. The arc, in one breath

The course is one story told in four dialects. We began with the inferential problem: a sample in hand, a parameter out of reach. We learned that an estimator is a random variable with a sampling distribution, and that its spread is the standard error — and that simulation can show us that distribution when algebra cannot. We judged estimators by bias, variance, and MSE, learning that “good” depends on the purpose. We let the data rank parameter values through the likelihood, and took the best-supported value as the MLE. We wrapped estimates in confidence intervals and learned to state coverage, not probability. We asked whether data were surprising under a null with hypothesis tests and p-values, and counted the cost of being wrong with error rates, power, and decisions. We estimated uncertainty by bootstrapping and tested by randomization, trading formulas for simulation. And we turned the question around with Bayesian inference, letting a prior become a posterior and finally making the probability statement about \(\theta\) that a confidence interval refused. Week 13 put all four on one table; Week 14 made you do the comparison yourself.

2. The one thread

What held the arc together was a single synthetic study, the reading-fluency study, and a single unknown, the pass rate \(\theta\). Every framework spoke about that one number. The estimate was \(\hat p = 0.65\), with a standard error of about \(0.075\). The likelihood peaked at \(0.65\), which is therefore the MLE. The 95% confidence interval was \((0.502, 0.798)\), and the one-sided test of \(\theta = 0.5\) gave \(p \approx 0.029\). The \(\text{Beta}(2,2)\) prior updated to a \(\text{Beta}(28,16)\) posterior with a 95% credible interval of \((0.493, 0.766)\) and \(P(\theta > 0.5 \mid x) \approx 0.975\). The numbers nearly agreed; the claims did not. Holding that in mind (same unknown, four sentences) is the synthesis.
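The recalled figures can be recomputed in a few lines of base R. This is a minimal sketch, assuming the interval is the Wald interval, the test is a one-sided z-test without continuity correction, and the credible interval is equal-tailed; the notes do not say which variants were used, so small differences from the recalled values are possible. Variable names are illustrative.

```r
# Recompute the thread's numbers from x = 26, n = 40 (methods are ASSUMED, see above)
x <- 26; n <- 40
phat <- x / n                                   # estimate, also the binomial MLE
se   <- sqrt(phat * (1 - phat) / n)             # plug-in standard error

# Frequentist: Wald 95% interval and one-sided z-test of theta = 0.5
ci_wald <- phat + c(-1, 1) * qnorm(0.975) * se
se0     <- sqrt(0.5 * 0.5 / n)                  # standard error under the null
p_one   <- pnorm((phat - 0.5) / se0, lower.tail = FALSE)

# Bayesian: conjugate update Beta(a, b) -> Beta(a + x, b + n - x)
a_post <- 2 + x; b_post <- 2 + (n - x)          # Beta(28, 16)
ci_cred   <- qbeta(c(0.025, 0.975), a_post, b_post)
p_gt_half <- pbeta(0.5, a_post, b_post, lower.tail = FALSE)

round(c(phat = phat, se = se, ci = ci_wald, p = p_one,
        cred = ci_cred, post = p_gt_half), 3)
```

Under these assumptions the frequentist values match the recalled figures to rounding. The posterior tail probability comes out nearer 0.967 than 0.975, a gap worth raising at the pending verification.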

3. What never changes across the lenses

Three habits ran underneath every method, and they are what the final really tests. Name what is conditioned on: what is fixed, what is random, what is assumed. Refuse the standard misreadings: a confidence interval is coverage, not a probability about \(\theta\); a p-value is a tail probability under \(H_0\), not the probability \(H_0\) is true; a likelihood ranks values, it is not a distribution over them; a credible interval is a probability about \(\theta\), not a confidence interval. Communicate honestly: report the estimate, the uncertainty, the assumptions, the limitations, and — because inference informs action — the consequences of deciding on it. A correct number inside a careless sentence is still a failed inference.
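The coverage reading of a confidence interval can be made concrete by simulation. The sketch below assumes a true pass rate of 0.65 purely for illustration (in practice \(\theta\) is unknown) and uses the Wald interval; the names `theta_true`, `reps`, and `covers` are hypothetical.

```r
# Coverage is a property of the procedure over repeated samples,
# not a probability statement about theta for any one interval.
set.seed(35103)
theta_true <- 0.65      # ASSUMED for the simulation only; unknown in practice
n <- 40; reps <- 10000

x_sim <- rbinom(reps, size = n, prob = theta_true)   # one count per repeated study
phat  <- x_sim / n
se    <- sqrt(phat * (1 - phat) / n)
lower <- phat - qnorm(0.975) * se
upper <- phat + qnorm(0.975) * se

covers   <- lower <= theta_true & theta_true <= upper
coverage <- mean(covers)    # fraction of the simulated intervals containing theta_true
coverage
```

Each simulated interval either contains `theta_true` or it does not; the 95% describes how often the procedure succeeds. With a sample this small the simulated fraction may sit somewhat below the nominal level.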

Worked examples

Worked example — reading the one thread off the study

Take the recurring reading-fluency study (synthetic, with the seed fixed by set.seed(35103)), in which \(x = 26\) of \(n = 40\) passed, and narrate \(\theta\) through the lenses without recomputing anything:

set.seed(35103)
x <- 26; n <- 40; phat <- x / n            # estimate: 0.65
# frequentist: CI (0.502, 0.798); one-sided p ~ 0.029 vs theta = 0.5
# likelihood:  L(theta) ∝ theta^26 (1-theta)^14 peaks at MLE 0.65
# simulation:  bootstrap/randomization reproduce the interval/test
# bayesian:    Beta(2,2) -> Beta(28,16); credible (0.493, 0.766); P(theta>0.5) ~ 0.975

For each lens, say it in a sentence that names the claim: “95% of intervals built this way cover \(\theta\)”; “of all values of \(\theta\), \(0.65\) makes the observed data most probable”; “if \(\theta = 0.5\), a result at least this high arises about 2.9% of the time”; “given the data and prior, there’s about a 95% probability \(\theta\) is between \(0.49\) and \(0.77\).” The simulation line reproduces the frequentist interval and test, so it makes the same claims by a different route. Four sentences, one unknown: if you can produce them from the numbers, you have the course.
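The simulation lens can be sketched in base R as well. The block below is one plausible reading of "bootstrap/randomization", not necessarily the course's exact procedure: a percentile bootstrap for the interval, and a simulated null distribution under \(\theta = 0.5\) for the one-sided test. The vector `y` and the replication count `B` are illustrative assumptions.

```r
set.seed(35103)
x <- 26; n <- 40
y <- c(rep(1, x), rep(0, n - x))        # 0/1 outcomes reconstructed from the counts
B <- 10000

# Bootstrap: resample the 40 outcomes with replacement, recompute the proportion
boot_phat <- replicate(B, mean(sample(y, size = n, replace = TRUE)))
ci_boot   <- quantile(boot_phat, c(0.025, 0.975), names = FALSE)  # percentile interval

# Simulated null: generate studies with theta = 0.5, count results >= the observed 26
null_x <- rbinom(B, size = n, prob = 0.5)
p_sim  <- mean(null_x >= x)

ci_boot
p_sim
```

Expect rough rather than exact agreement with the formula-based numbers. The simulated tail probability approximates the exact binomial tail (about 0.04), so it tends to land a little above the normal-approximation value of 0.029.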

Worked example — choosing a lens under pressure

A transfer drill for the final: you are told only “a clinic wants to know whether a new triage rule beats the old one, with a controlled false-alarm rate, using a small randomized trial.” Walk the choice. A controlled error rate points to a frequentist test (or, with minimal assumptions, a permutation test justified by the randomization). A small trial warns you that large-sample intervals may not hold their nominal coverage. Prior clinical experience, if strong and defensible, could justify a Bayesian analysis with a stated prior. The exercise is not to compute; it is to choose, and to justify the choice by what each framework conditions on. That judgment, applied to an unfamiliar prompt, is what synthesis means.
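If the permutation route were chosen, it might look like the sketch below. The data are entirely hypothetical (20 patients per arm, outcome coded 1 when a case is triaged correctly), and the variable names are illustrative; nothing here comes from a real trial.

```r
set.seed(35103)
# HYPOTHETICAL data: 1 = case triaged correctly, 0 = not; 20 patients per arm
old_rule <- c(rep(1, 12), rep(0, 8))
new_rule <- c(rep(1, 17), rep(0, 3))

outcome  <- c(old_rule, new_rule)
arm      <- rep(c("old", "new"), each = 20)
obs_diff <- mean(outcome[arm == "new"]) - mean(outcome[arm == "old"])

# Under the null of no rule effect, arm labels are exchangeable: reshuffle them
perm_diff <- replicate(10000, {
  shuffled <- sample(arm)
  mean(outcome[shuffled == "new"]) - mean(outcome[shuffled == "old"])
})

# One-sided p-value; the +1 counts the observed labelling among the permutations,
# and the small tolerance guards against floating-point ties
p_perm <- (sum(perm_diff >= obs_diff - 1e-12) + 1) / (length(perm_diff) + 1)
c(obs_diff = obs_diff, p_perm = p_perm)
```

The test conditions on the observed outcomes and on the randomization itself, which is why it needs no distributional formula. The false-alarm rate is controlled by fixing the rejection threshold before looking at `p_perm`.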

A common mistake

The synthesis-week mistake is studying the frameworks as four separate procedures to memorize, rather than as four answers to one question. Students who do this can run each method but freeze when a problem does not announce which method to use, or when two methods are asked to talk to each other. The fix is to rehearse the thread, not the procedures: take one unknown and narrate it through all four lenses, out loud, naming what each conditions on and claims. If you can do that for the pass rate \(\theta\), you can do it for any parameter, and the final’s unfamiliar contexts become familiar.

The second mistake is letting the term’s numerical coincidences — the near-equal intervals, the near-equal tail probabilities — convince you the frameworks are interchangeable. They agreed on an easy problem for specific reasons (a weak prior, a symmetric-ish likelihood, a moderate sample). The educated conclusion is not “they’re the same”; it is “they agreed here, and I can say why, and I know what would pull them apart.”
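One way to see what would pull the intervals apart is to change the prior and hold the data fixed. The sketch below compares the Wald interval with equal-tailed credible intervals under the course's \(\text{Beta}(2,2)\) prior and under a deliberately strong \(\text{Beta}(20,20)\) prior; the strong prior and the helper `cred_int` are hypothetical, chosen only to illustrate the effect.

```r
x <- 26; n <- 40
phat    <- x / n
ci_wald <- phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)

# Equal-tailed 95% credible interval under a Beta(a, b) prior (conjugate update)
cred_int <- function(a, b) qbeta(c(0.025, 0.975), a + x, b + n - x)

weak   <- cred_int(2, 2)      # the course's weak prior
strong <- cred_int(20, 20)    # HYPOTHETICAL prior concentrated near 0.5

round(rbind(wald = ci_wald, weak_prior = weak, strong_prior = strong), 3)
```

With the weak prior the credible interval sits close to the Wald interval. The strong prior pulls the posterior toward 0.5 and the two intervals separate, even though the data have not changed.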

Low-stakes self-checks (ungraded)

These are ungraded self-checks — no points, no submission.

In four sentences, one per framework, state what each says about the pass rate \(\theta\) given the estimate \(\hat p = 0.65\), naming the claim each makes.
For each of the four standard misreadings, write the wrong sentence and its correction.
Given “a regulator needs a controlled false-positive rate,” which framework leads, and why?
The confidence interval \((0.502, 0.798)\) and the credible interval \((0.493, 0.766)\) are close. Explain the coincidence and name one change that would separate them.
In one paragraph, narrate the course arc from “a sample in hand, a parameter out of reach” to “a decision,” touching every major tool.

Reading and source pointer

For review, revisit the MIT OCW 18.05 readings on estimation, likelihood, confidence intervals, testing, and Bayesian inference, and the ModernDive chapters on sampling, bootstrapping, and hypothesis testing — no new material, just the term’s arc in one pass. These notes are the course’s own synthesis, grounded in but not copied from the sources.

Formula-verification status

verified: false. Every number recalled here — \(\hat p = 0.65\), \(\operatorname{SE} \approx 0.075\), the CI \((0.502, 0.798)\), one-sided \(p \approx 0.029\), the posterior \(\text{Beta}(28,16)\) with credible interval \((0.493, 0.766)\) and \(P(\theta > 0.5 \mid x) \approx 0.975\) — is drafted, synthetic, and carried unchanged from earlier weeks; none is independently checked. The course math/statistics gate is BLOCKED: every value here is provisional, pending the human/source sign-off in _state/notation_ledger.md §5.

Public vs. graded

These notes, the examples, and the practice here are public and ungraded — study material only. No graded prompts, answer keys, rubrics, point values, or due dates appear on this site. The cumulative final — its date within the December 9–15 window, its format, and its rules — lives in Blackboard (the LMS), which is authoritative for due dates, submissions, and grades. If this page and Blackboard ever disagree, follow Blackboard.

Looking ahead

The cumulative final falls in the December 9–15 window (the exact block is posted on Blackboard). Beyond the final, the habits of this course outlast the exam: whenever you meet a claim from data — a poll, a trial, a dashboard, a headline — you now have four ways to ask what it conditions on, what it assumes, and what it honestly supports. That is the lasting point of learning inference.