Lab 5 — Bootstrap the median

Resampling with replacement to see the lumpy bootstrap distribution of the median

Purpose. This lab is the hands-on companion to Week 5 — Bootstrap distributions. The note argues that when no closed-form standard-error formula exists for a statistic — and there is none for the median — you can substitute the empirical distribution \(\hat F_n\) for the unknown population and resample your own data with replacement to watch the statistic vary. Here you build that object. You will fix the \(n = 25\) Express wait times of Dataset W, resample them with replacement, recompute the median about \(10{,}000\) times, and read off the bootstrap standard error (\(\approx 1.2\) min). You will also see the headline feature of this week: the bootstrap distribution of a median is lumpy / discrete — it lands on only a handful of distinct order-statistic values — and you will contrast it with the smoother distribution you get when you bootstrap the mean. Finally you will bootstrap the difference in medians (SE \(\approx 2.0\) min). The code is shown for study and is not executed on this site; you run it in your own R session.

The idea

You have one sample of \(25\) Express wait times, and from it you compute a single number — the sample median, \(12\) min. That is your best guess at the typical Express wait, but a fresh batch of arrivals tomorrow would hand you a median a little different. How much would it move? In the ideal world you would re-run the clinic a thousand times, collect a thousand medians, and look at how they spread — that spread is the sampling distribution of the median, and its standard deviation is the standard error you want. The fatal catch is that you have exactly one sample; you cannot redraw from the population at will.

The bootstrap’s move is to treat the data you already have as the best available picture of the population. The empirical distribution \(\hat F_n\) puts probability \(1/25\) on each of your \(25\) observed waits and nothing elsewhere. Drawing a fresh sample “from the population” then becomes drawing a sample from the data, and drawing from \(\hat F_n\) is exactly resampling with replacement: each of the \(25\) draws lands on one of your observed waits, uniformly and independently, so a value can recur and another can be missed. Recompute the median on each such resample, collect those bootstrap medians, and their spread approximates the sampling distribution you could never observe directly.

The median makes this idea unusually revealing, which is why it is the lab’s subject. The median of \(25\) numbers is the \(13\)th order statistic, \(x_{(13)}\) — one of the actual observed values. A bootstrap sample is built only from your original \(25\) numbers, so its median can only ever be one of those numbers. There are at most \(25\) possible answers, and in practice the bootstrap median piles onto the two or three values nearest the center. The bootstrap distribution of the median is therefore a short picket fence of spikes with visible gaps, not a smooth bell — and that lumpiness is correct, not a defect. It faithfully reflects that the sampling distribution of an order statistic really is more granular than that of an average. Naming what that buys you, and what it cannot, is the discipline of the lab.

Goal

By the end of this lab you should be able to:

Fix the \(n = 25\) Express wait times of Dataset W in R with the course seed, name the wait (an independent observation) as the resampling unit, and confirm the observed median is \(12\) min.
Use sample(x, replace = TRUE) to draw one bootstrap sample, and replicate() to repeat the resample-and-recompute step about \(10{,}000\) times into a bootstrap distribution of the median.
Read the bootstrap standard error of the Express median as the standard deviation of that distribution (\(\approx 1.2\) min), and see — via table() — that the distribution is lumpy / discrete, landing on a few order-statistic values.
Bootstrap the Express mean the same way and observe that its bootstrap distribution is smoother, so you can name why the lumpiness is a property of the statistic, not the method.
Bootstrap the difference in medians (Express minus Standard) by resampling each group separately with replacement, and read its larger standard error (\(\approx 2.0\) min).
State the assumption-ladder move the bootstrap makes — what it assumes, what it resamples, what it protects against, and what it cannot prove — and flag the two ways it fails.

The point of the lab is the demonstration, not the syntax: the bootstrap turns “I cannot redraw from the population” into “I can redraw from my data,” and the median makes the cost of that trade — a discrete, granular picture — plainly visible. The code is the means, not the message.

Setup

You need only base R — no packages. Every chunk that draws randomness begins with set.seed(45203), the course-wide seed, so your run reproduces the locked Week 5 numbers. The data are synthetic; seed set — these are not real clinic records.

Fix the facts before any code runs, because the resampling logic follows from them:

Resampling unit: the individual wait (one arrival’s intake time). Each of the \(25\) Express waits is treated as an independent observation, so a bootstrap sample resamples waits. (If each patient had contributed several visits, the patient would be the unit and you would resample patients — that wrong-unit trap is the lab’s common mistake below.)
What is not drawn fresh: you have exactly one sample of \(25\) Express waits; you cannot redraw from the population. The bootstrap substitutes \(\hat F_n\) — the data’s own distribution — for that unreachable population.
The locked slice (Dataset W, Express arm): \(n_T = 25\) right-skewed Express waits in minutes, sample median \(= 12\) min, mean \(\approx 15\) min (the right tail drags the mean above the median). Bootstrap SE of the Express median \(\approx 1.2\) min; the bootstrap distribution is lumpy / discrete. For the Standard arm, \(n_C = 25\), median \(= 18\) min, so the observed difference in medians is \(12 - 18 = -6\) min, with bootstrap SE of the difference \(\approx 2.0\) min.

set.seed(45203)

# --- Dataset W, Express arm: service wait times in minutes (synthetic; seed set) ---
# Resampling unit = the individual WAIT (one arrival's intake time).
# Right-skewed; the locked sample median is 12 min, the mean ~ 15 min.
express <- c(6, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 12, 12,
             13, 14, 15, 16, 17, 19, 19, 21, 24, 28, 33, 41)

length(express)                      # 25 Express waits (n_T = 25)
median(express)                      # observed Express median -> 12 min  (locked)
mean(express)                        # observed Express mean   -> ~15 min (tail-inflated)

# The Standard arm (for the difference-in-medians step later); median = 18 min.
standard <- c(9, 10, 11, 11, 12, 12, 12, 14, 16, 17, 18, 18, 18,
              18, 19, 21, 24, 25, 26, 27, 29, 33, 49, 64, 88)
median(standard)                     # observed Standard median -> 18 min (locked)

The observed Express median is \(12\) min and the mean is about \(15\) min; the gap between them is the right skew made numeric — two long waits near \(33\) and \(41\) pull the mean up while leaving the median put. The assumption-ladder move so far: you assume the \(25\) waits are an (approximately) independent sample whose empirical distribution \(\hat F_n\) is a usable stand-in for the population of Express waits; you are not assuming normality, symmetry, or any formula for the median’s sampling distribution. That choice protects you from a density-dependent standard-error formula you have no way to evaluate; it cannot prove that \(25\) waits capture the true tail. Everything below asks one question about the \(12\): how much would it wobble if you could redraw?

Steps

You will build the bootstrap distribution in four moves: resample once to see the engine turn over (Step 1), resample about \(10{,}000\) times and read the standard error and the lumpiness (Step 2), bootstrap the mean for contrast (Step 3), then bootstrap the difference in medians (Step 4).

Step 1 — Draw one bootstrap sample and recompute the median

The bootstrap’s whole trick, done once: draw \(25\) values with replacement from your \(25\) Express waits, then recompute the median on that resample. sample(x, size = length(x), replace = TRUE) is the literal “draw from \(\hat F_n\)” operation — each draw is equally likely to be any of the \(25\) waits, and the same wait can be picked more than once.

set.seed(45203)

# One bootstrap sample: 25 draws WITH replacement from the 25 Express waits.
resample_1 <- sample(express, size = length(express), replace = TRUE)

sort(resample_1)                     # some waits appear 2-3 times, others are missing
median(resample_1)                   # the median of ONE bootstrap sample -> e.g. 11, 12, or 13

# How many DISTINCT original waits made it into this resample?
length(unique(resample_1))           # typically ~16 of the 25 (the rest are duplicated)

Sorting resample_1 shows the hallmark of resampling with replacement: some original waits appear two or three times while roughly a third of them are absent — on average only about \(16\) of the \(25\) distinct values survive into any one bootstrap sample. The single bootstrap median you get is one of your original waits (here something near \(11\), \(12\), or \(13\)), because a median drawn only from your \(25\) numbers can only ever be one of them. On its own this one value means nothing; it is one point in a distribution. Assumption-ladder move: you assumed \(\hat F_n\) stands in for the population and the waits are independent; you resampled waits with replacement, preserving nothing but the marginal distribution of the \(25\) values; this is the move that protects you from needing a median standard-error formula; it cannot prove the one sample resembles the population.

Step 2 — Build the bootstrap distribution and read the SE

Now repeat the one-resample trick about \(10{,}000\) times with replicate(), collect the \(10{,}000\) bootstrap medians, and summarize them. The bootstrap standard error is just the standard deviation of that pile. The table() line is the payoff of the lab: it shows how few distinct values the medians actually take.

set.seed(45203)

# Bootstrap the Express MEDIAN: resample with replacement, recompute, repeat.
B <- 10000
boot_med <- replicate(B, {
  resample <- sample(express, size = length(express), replace = TRUE)
  median(resample)                   # median of one bootstrap sample
})

sd(boot_med)                         # bootstrap SE of the Express median -> ~1.2 min
mean(boot_med)                       # bootstrap distribution centers near the observed 12

# The signature teaching point: the distribution is LUMPY / DISCRETE.
table(boot_med)                      # only a few distinct values dominate, e.g.
                                     #   11 : ~1600   12 : ~5200   13 : ~2300   ...
                                     #   (an order statistic on n = 25 -> a picket fence)

# bootstrap SE(Express median) ~= 1.2 min ; distribution lumpy/discrete

The bootstrap standard error comes out at about \(1.2\) min: the Express median of \(12\) min would typically shift by a little over a minute if you could redraw the sample. Read it for the question — the \(12\) is a point estimate uncertain by roughly a minute, not a margin of error or confidence interval (those wait for Week 6). The table(boot_med) output is the lesson the lab is built around: the bootstrap medians pile onto just a few distinct values (here \(11\), \(12\), \(13\) dominate, with visible gaps and small mass elsewhere), so the distribution is a picket fence, not a smooth curve. That is exactly what an order statistic on \(n = 25\) produces — the median can only land on values that were already in your data. Assumption-ladder move: you assumed \(\hat F_n\) stands in for the population and the waits are independent; you resampled the waits with replacement; the lumpiness protects you from over-trusting a too-smooth uncertainty picture for the median, and a percentile interval read off this distribution (Week 6) will inherit visibly chunky endpoints; it cannot make the median’s sampling distribution continuous — that granularity is real, and the most the bootstrap can do is show it honestly.

Step 3 — Bootstrap the mean for contrast (smoother)

To see that the lumpiness is a property of the statistic and not of the bootstrap, change one word — median to mean — and rerun. Averaging \(25\) numbers produces a near-continuous spread of values, so the bootstrap distribution of the mean is smooth and roughly bell-shaped.

set.seed(45203)

# Same machinery, the MEAN instead of the median.
boot_mean <- replicate(B, {
  resample <- sample(express, size = length(express), replace = TRUE)
  mean(resample)                     # mean of one bootstrap sample
})

length(unique(round(boot_mean, 2)))  # MANY distinct values -> a smooth spread
length(unique(boot_med))             # only a HANDFUL -> the lumpy median, for contrast

# the mean's bootstrap distribution is smooth and bell-ish;
# the median's is a discrete picket fence -- same method, different statistic

Comparing the two length(unique(...)) counts makes the contrast concrete: the bootstrap mean takes thousands of distinct values (a smooth spread), while the bootstrap median takes only a handful (the picket fence). The bootstrap did not change between Steps 2 and 3 — only the statistic did. An average mixes all \(25\) resampled numbers into a value that can fall almost anywhere, so its distribution is near-continuous; a median snaps to a single observed value, so its distribution is discrete. The assumption-ladder move is identical for both — assume \(\hat F_n\) stands in for the population, resample waits with replacement — but the shape of the result is dictated by the statistic. The lesson: never report a clean smooth curve for a median’s uncertainty; that would be the lie, and the lumpiness is the honest picture. (Note too that on this right-skewed sample the mean’s bootstrap distribution is itself not perfectly symmetric — the tail leaks into it — which is one reason the median is often the more stable summary here.)

Step 4 — Bootstrap the difference in medians (two groups)

The observed effect from Week 1 is a difference in medians, Express minus Standard, \(12 - 18 = -6\) min. Because the two groups are independent, you bootstrap them separately: resample the \(25\) Express waits with replacement, resample the \(25\) Standard waits with replacement, recompute the difference of the two sample medians, and repeat. The standard deviation of those differences is the bootstrap SE of the difference.

set.seed(45203)

# Observed difference in medians: Express - Standard = 12 - 18 = -6 min.
d_obs <- median(express) - median(standard)
d_obs                                # -> -6 min  (Express faster)

# Bootstrap the DIFFERENCE: resample EACH group separately, recompute the gap.
boot_diff <- replicate(B, {
  ex <- sample(express,  size = length(express),  replace = TRUE)
  st <- sample(standard, size = length(standard), replace = TRUE)
  median(ex) - median(st)            # difference of the two bootstrap medians
})

sd(boot_diff)                        # bootstrap SE of the difference -> ~2.0 min
mean(boot_diff)                      # centers near the observed -6

# bootstrap SE(difference in medians) ~= 2.0 min  (> 1.2: it carries TWO medians' wobble)

The bootstrap SE of the difference comes out at about \(2.0\) min, larger than the single median’s \(1.2\) min — and that ordering is no accident: a difference accumulates the sampling wobble of two medians, so its uncertainty exceeds either one alone. Read for the question: the Express-vs-Standard gap of \(-6\) min carries a sample-to-sample wobble of about \(2\) min, so the \(-6\) sits comfortably more than its own standard error away from zero — a foreshadowing of the Week 6 percentile interval of about \((-10, -2)\) that excludes zero. Assumption-ladder move: you assumed each group’s empirical distribution stands in for its population and that the two groups are independent (which is exactly why you resample them separately); you resampled within each group with replacement; this protected you against the unstable parametric standard error that the two long Standard waits (near \(64\) and \(88\)) would inflate; it cannot prove the two samples are free of dependence or measurement quirks the design might carry.

Verify

This is where the simulation and the Week 5 note are supposed to meet. Check each item against the note, and treat a mismatch as a bug in your code, not a discovery about the world — every target value is the locked, synthetic Week 5 slice, and thethe worked numbers are provisional and not independently verified.

The observed Express median is \(12\) min. median(express) from Setup should return \(12\). If it does not, your express vector drifted from the locked Dataset W slice — re-check that you typed the \(25\) values exactly and did not sort or edit them.
The bootstrap SE of the median is about \(1.2\) min. sd(boot_med) from Step 2 should land near \(1.2\). A value far off usually means replace = TRUE was dropped (so every resample is the original data and the SE collapses toward \(0\)) or the seed was not set.
The median’s bootstrap distribution is lumpy / discrete. table(boot_med) should show only a handful of distinct values (around \(11\), \(12\), \(13\)) carrying nearly all the mass, with gaps. If you see a smooth spread of many values, you almost certainly bootstrapped the mean — re-read Step 2 and confirm the inner function is median().
The mean is smoother than the median. length(unique(round(boot_mean, 2))) from Step 3 should be in the thousands while length(unique(boot_med)) is a single-digit-to-low-double-digit handful. If the two counts are similar, check that Step 3 really swapped in mean() and that both used the same B.
The bootstrap SE of the difference is about \(2.0\) min, and larger than the single median’s. sd(boot_diff) from Step 4 should be near \(2.0\), and sd(boot_diff) > sd(boot_med) should be TRUE. If the difference’s SE is smaller, you likely resampled one pooled vector instead of each group separately — the difference must resample Express and Standard independently.
The conclusion is stated correctly. In one sentence: resampling the \(25\) Express waits with replacement estimates that the median of \(12\) min wobbles by about \(1.2\) min sample to sample; the distribution is discrete because a sample median is an order statistic; and the bootstrap assumes — it does not guarantee — that these \(25\) waits are an independent, faithful picture of the population. If your written conclusion calls the bootstrap “assumption-free” or treats the lumpiness as a bug, the reasoning — not the code — needs fixing.

A small honest note on reproducibility: a bootstrap SE is itself a simulation estimate, so across seeds or replication counts it will wobble in its last digit around \(1.2\) (and \(2.0\) for the difference). Increasing B tightens that wobble; it does not change the lumpiness or the conclusion. And the bootstrap is assumption-light, not assumption-free: it drops normality and the closed-form formula, but it still assumes the sample is an independent, faithful stand-in for the population — and it fails quietly when that breaks (an extreme order statistic like the maximum the bootstrap can never resample beyond, very small \(n\), or dependent data). You meet the maximum failure case head-on in Week 6.

AI use note

If you use an AI assistant on this lab, record it briefly. The load-bearing column is Verification — how you confirmed the output yourself, against the Week 5 note and the locked numbers, rather than trusting the tool. Computation, AI-assisted or not, does not make the bootstrap assumption-free or rescue a wrong resampling unit.

Tool	Purpose	Verification
LLM chat assistant	Explain why `sample(x, replace = TRUE)` is the same as drawing from \(\hat F_n\)	Re-ran Step 1 and confirmed some waits duplicated while others were missing (`length(unique(resample_1))` < 25); reconciled the explanation against the note’s “resample with replacement” definition
LLM chat assistant	Ask why the median’s bootstrap distribution is lumpy but the mean’s is smooth	Ran Step 2 `table(boot_med)` and Step 3 `length(unique(...))` myself, confirmed the median lands on a handful of order-statistic values and the mean on thousands, and matched the note’s “picket fence” account
Code formatter / linter	Tidy the two `replicate()` blocks for readability	Diffed before/after to confirm only whitespace changed; re-ran to confirm `sd(boot_med)` still near 1.2 and `sd(boot_diff)` still near 2.0