Causal-diagram guide
How to draw and read a causal diagram — confounders, colliders, backdoor paths, and bad controls
A causal diagram (a directed acyclic graph, or DAG) is a picture of your assumptions about what causes what. It is not a result and not a conclusion — it is the reasoning tool that tells you what to adjust for and, just as importantly, what not to adjust for. This guide is a first-course, applied treatment: the goal is to reason clearly, not to prove identification theorems. (For a deeper, optional treatment, see Causal Inference: What If by Hernán & Robins, freely readable online — linked here, not reproduced.)
The pieces
| Piece | Picture | Meaning |
|---|---|---|
| node | a labelled point | a variable (treatment, outcome, or another factor) |
| directed edge | an arrow \(A \to B\) | “\(A\) is a direct cause of \(B\)” (your assumption) |
| path | a chain of edges | any route between two nodes along edges, regardless of arrow direction |
| causal path | arrows aligned \(Z \to \dots \to Y\) | how the treatment actually affects the outcome |
| backdoor path | a non-causal route from \(Z\) to \(Y\) that starts with an arrow into \(Z\) | a source of confounding you must block |
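One way to make these pieces concrete is to write a diagram down as data. The sketch below is illustrative only: the edge list is a hypothetical diagram (the labels `C` and `M` are placeholders for a third variable on either side of the treatment), and the helper names `parents` and `causal_order` are made up for this example. It stores each arrow as a pair and uses Python's standard `graphlib` module to check that the arrows really are acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical diagram: each pair (A, B) is the assumed edge A -> B.
EDGES = [("C", "Z"), ("C", "Y"), ("Z", "M"), ("M", "Y"), ("Z", "Y")]

def parents(edges):
    """Map every node to the set of its direct causes (its parents)."""
    table = {}
    for cause, effect in edges:
        table.setdefault(cause, set())
        table.setdefault(effect, set()).add(cause)
    return table

def causal_order(edges):
    """Return the nodes with causes listed before effects, or None if the arrows form a cycle."""
    try:
        return list(TopologicalSorter(parents(edges)).static_order())
    except CycleError:
        return None

order = causal_order(EDGES)
print(order)                                # one valid ordering, causes first
print(causal_order(EDGES + [("Y", "C")]))   # adding Y -> C closes a loop, so this prints None
```

If `causal_order` returns `None`, the picture is not a DAG: some variable ends up causing itself through a loop, and the arrows need rethinking before any adjustment set is chosen.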
The three roles a third variable can play
This is the heart of the guide. The same variable can require opposite handling depending on its role.
- Confounder: a common cause of both treatment and outcome, \(Z \leftarrow C \to Y\). It opens a backdoor path. Adjust for it (stratify on it or include it as a covariate) to close the path.
- Mediator: a variable on the causal path, \(Z \to M \to Y\). It carries part of the effect. Do not adjust for it if you want the total effect, because adjusting removes the part of the effect that flows through it.
- Collider: a common effect of two variables, \(A \to K \leftarrow B\). A collider blocks its path by default; adjusting for it (or for one of its descendants) opens the path and creates a spurious association between \(A\) and \(B\).
A bad control is a variable you adjust for when you should not: a mediator (when you want the total effect) or a collider. It is one of the most common and least visible ways to get a wrong causal answer, because the software runs fine and the number changes in a plausible-looking direction.
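The collider case is the least intuitive of the three, so a small simulation may help. This is a sketch on made-up data, not anything from the course: \(A\) and \(B\) are drawn independently, and a hypothetical collider \(K\) switches on when \(A + B\) exceeds an arbitrary threshold of 1.0. The function names are illustrative.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def collider_demo(n=20000, seed=0):
    rng = random.Random(seed)
    # A and B are generated independently: neither causes the other.
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # K is a common effect of both (assumed rule: on when A + B > 1.0).
    k = [x + y > 1.0 for x, y in zip(a, b)]
    overall = pearson(a, b)
    # "Adjusting for K" here means looking only inside the K = True stratum.
    a_sel = [x for x, keep in zip(a, k) if keep]
    b_sel = [y for y, keep in zip(b, k) if keep]
    return overall, pearson(a_sel, b_sel)

overall, within_k = collider_demo()
print(f"correlation of A and B, everyone:      {overall:+.2f}")
print(f"correlation of A and B, K = True only: {within_k:+.2f}")
```

Across everyone the correlation should sit close to zero, while inside the selected stratum it turns clearly negative: among cases where \(A + B\) is large, a low \(A\) implies a high \(B\). Nothing causal changed; only the conditioning did.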
The backdoor rule (informally)
To estimate the causal effect of \(Z\) on \(Y\), block every backdoor path by adjusting for a set of pre-treatment variables (an adjustment set) — and add nothing post-treatment. Concretely:
- List the variables you believe cause the treatment, the outcome, or both.
- Mark which are pre-treatment (measured before treatment) and which are post-treatment.
- Find the backdoor paths (routes that begin with an arrow into \(Z\) and end at \(Y\)).
- Choose a set of pre-treatment variables that blocks all of them — that is your adjustment set.
- Never put a mediator or a collider in the adjustment set.
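The steps above can be sketched as a small checker. This is a minimal illustration, not a replacement for a dedicated tool: the edge list is a hypothetical diagram with a confounder `C`, a mediator `M`, and a collider `K`, and all function names are made up for this example. It enumerates backdoor paths by brute force, which is fine for classroom-sized diagrams.

```python
def descendants(edges, node):
    """Every node reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def backdoor_paths(edges, treatment, outcome):
    """Simple paths from treatment to outcome whose first edge points INTO treatment."""
    links, neighbours, paths = set(edges), {}, []
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in path:
                walk(path + [nxt])

    walk([treatment])
    return [p for p in paths if (p[1], p[0]) in links]

def is_blocked(path, edges, adjust):
    """True if adjusting for `adjust` blocks this path."""
    links = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in links and (nxt, node) in links:
            # Collider: blocks unless it, or one of its descendants, is adjusted for.
            if node not in adjust and not (descendants(edges, node) & adjust):
                return True
        elif node in adjust:
            # Non-collider (confounder or mediator): blocks once adjusted for.
            return True
    return False

def satisfies_backdoor(edges, treatment, outcome, adjust):
    adjust = set(adjust)
    if adjust & descendants(edges, treatment):
        return False  # the set contains a post-treatment variable
    return all(is_blocked(p, edges, adjust)
               for p in backdoor_paths(edges, treatment, outcome))

# Hypothetical diagram: C confounds, M mediates, K is a collider of Z and Y.
EDGES = [("C", "Z"), ("C", "Y"), ("Z", "M"), ("M", "Y"),
         ("Z", "Y"), ("Z", "K"), ("Y", "K")]

print(backdoor_paths(EDGES, "Z", "Y"))
for candidate in [set(), {"C"}, {"C", "M"}, {"C", "K"}]:
    print(sorted(candidate), satisfies_backdoor(EDGES, "Z", "Y", candidate))
```

In this diagram only `{"C"}` passes: the empty set leaves the backdoor through `C` open, and any set that includes `M` or `K` fails because both are downstream of the treatment.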
Worked example — the tutoring-center study
The course’s observational study asks whether using the tutoring center (\(Z\)) improves an end-of-term outcome (\(Y\)). Students self-select, so prior ability is in play. The assumed diagram (described in words, since this draft site renders diagrams as text):
- PriorAbility \(\to\) Use — more-prepared students are likelier to use the center.
- PriorAbility \(\to\) Outcome — more-prepared students do better anyway.
- Use \(\to\) Outcome — the effect we want.
- Use \(\to\) HoursStudied \(\to\) Outcome — using the center leads to more studying, which helps.
Reading it:
- PriorAbility is a confounder: it sits on the backdoor path Use \(\leftarrow\) PriorAbility \(\to\) Outcome. Adjusting for it closes the backdoor — which is why the naive difference \(+8.0\) falls to the adjusted \(+3.0\).
- HoursStudied is a mediator (post-treatment): it is on the causal path Use \(\to\) HoursStudied \(\to\) Outcome. Adjusting for it would be a bad control — it would remove part of the very effect you want and bias the estimate.
So the adjustment set is \(\{\text{PriorAbility}\}\), and not \(\{\text{PriorAbility}, \text{HoursStudied}\}\). The diagram, not the regression output, is what tells you that.
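To see the three estimates side by side, here is a simulation sketch of the assumed diagram. The data are simulated, not the course's: every probability and coefficient below is a hypothetical choice, picked so that the true total effect of Use is 3.0 and the confounding from PriorAbility adds about 5.0, which echoes the \(+8.0\) and \(+3.0\) figures above. All three variables are simplified to yes/no so that adjustment can be done by plain stratification.

```python
import random

def simulate(n=40000, seed=1):
    """Rows of (ability, use, hours, outcome) drawn from the assumed diagram."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.random() < 0.5                     # PriorAbility (high or low)
        use = rng.random() < (0.75 if ability else 0.25)  # PriorAbility -> Use
        hours = rng.random() < (0.75 if use else 0.25)    # Use -> HoursStudied
        # Outcome: direct effect of Use is 1, HoursStudied adds 4, PriorAbility adds 10.
        # Total effect of Use = 1 + 4 * (0.75 - 0.25) = 3.
        outcome = 60 + 10 * ability + 1 * use + 4 * hours + rng.gauss(0, 5)
        rows.append((ability, use, hours, outcome))
    return rows

def mean(values):
    return sum(values) / len(values)

def stratified_effect(rows, key):
    """Use-vs-no-Use difference within each stratum, averaged by stratum size.

    Assumes every stratum contains both users and non-users.
    """
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    total = 0.0
    for members in strata.values():
        used = [r[3] for r in members if r[1]]
        not_used = [r[3] for r in members if not r[1]]
        total += (mean(used) - mean(not_used)) * len(members) / len(rows)
    return total

rows = simulate()
naive = stratified_effect(rows, key=lambda r: None)                # no adjustment
adjusted = stratified_effect(rows, key=lambda r: r[0])             # {PriorAbility}
bad_control = stratified_effect(rows, key=lambda r: (r[0], r[2]))  # adds HoursStudied

print(f"naive:       {naive:+.2f}")
print(f"adjusted:    {adjusted:+.2f}")
print(f"bad control: {bad_control:+.2f}")
```

Under these assumptions the naive difference should land near 8, the PriorAbility-adjusted estimate near 3, and the estimate that also stratifies on HoursStudied near 1. The last number is the part of the effect that does not run through studying, so it understates the total effect even though the code ran without complaint.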
What a diagram does and does not buy you
- It does make your causal assumptions explicit and tell you what to adjust for.
- It does not prove those assumptions. If a real confounder is unmeasured, no adjustment can close its backdoor — which is the standing limit of observational evidence, and why random assignment is so valuable.
- A diagram with an arrow you cannot justify is worse than no diagram: every edge is a claim you are making.
This page is a study reference. For graded specifics — deadlines, submissions, and policies — Blackboard (the LMS) is authoritative.