Good post!
I feel there is another important distinction that has less to do with statistics but also needs clarification. That is the difference between "small easy wins" and "high-cost transformative intervention".
For example, suppose we figured out we could improve learning a little bit simply by having the teacher start the semester by saying "Kids, always remember you CAN learn!" Maybe it makes a small difference, but the cost of implementation is also really, really low. It won't transform education forever; maybe the effect only shows up in the most powerful studies.
On the other hand, imagine something like full-time private tutors for every kid, or something that tries to get close to that. I bet it would work very well! But the cost is huge. Maybe some version of it passes the statistical test *and* still comes out with a positive return.
So, how do we compare those? Cost-benefit analysis sure sounds good, and some interventions even _save_ money on net by preventing other costs. You probably won't transform education *just* by doing those. But it would also be stupid not to!
And what should researchers focus on? It's a tough question!
Hi Joao!
Thank you for your comment :) I think it perfectly refines the lever metaphor.
If we think of a lever in terms of mechanical advantage (output/input), then your teacher example is a massive lever. Even if the effect size is small (say 0.02 SD), if the cost is literally zero (just a sentence spoken), the ROI becomes infinite. I think it's a lever worth pulling!
Also, the private-tutor (or one-on-one instruction/high-dosage tutoring) example is documented in the literature and it does have an effect (~0.5 SD, depending on the study). And as you said, it's a "costly" intervention, although it can be implemented at a larger scale by having kids work in small groups (3-5, with similar levels of ability - another thing to consider!), so you naturally dilute the cost.
Both your examples are complementary. The former is what I'd call a "high-leverage, low-risk" intervention (do it everywhere, it costs nothing), whilst the latter is "high-leverage, high-investment" (deploy strategically where the ROI justifies it). I'm getting this idea from portfolio selection theory hehe
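To make the ROI framing a bit more concrete, here's a minimal back-of-the-envelope sketch (the effect sizes echo what we've been discussing, but the cost figures are purely illustrative placeholders, not estimates from any study):

```python
# Toy cost-effectiveness comparison: SD of learning gained per $1,000 spent per student.
# Effect sizes roughly follow the discussion above; the costs are made-up placeholders.

interventions = {
    "encouragement sentence": {"effect_sd": 0.02, "cost_per_student": 0.01},
    "1-on-1 tutoring":        {"effect_sd": 0.50, "cost_per_student": 2500.0},
    "small-group tutoring":   {"effect_sd": 0.40, "cost_per_student": 700.0},
}

for name, x in interventions.items():
    # "Mechanical advantage": output (SD gained) per unit of input ($1,000 per student)
    sd_per_1000 = x["effect_sd"] / (x["cost_per_student"] / 1000)
    print(f"{name:>23}: {sd_per_1000:8.2f} SD per $1,000 per student")
```

The near-free intervention dominates on this ratio even with a tiny effect (that's the "infinite ROI" point), while the tutoring options only win once you start caring about the absolute size of the gain rather than the ratio.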
We need researchers working on both. The danger (and what I hinted at in the post) is the "inefficient lever": the study that finds a tiny effect for a high cost, but gets published just because it's "identified."
Some interventions are levers on MULTIPLE outcomes (e.g., early childhood education might have small test score effects but large effects on incarceration, health, etc.), which changes the cost-benefit calculation entirely.
To answer your question on what we should focus on: avoid the inefficient lever. I don't really like checklists, that's why I refrained from providing one, but if we *were* to create one, some of the questions we should ask could include: a) am I identifying a local effect (what happened here) or a structural parameter (how the world works)?; b) if this works, will it explain a meaningful percentage of the variation in the outcome, or just 0.X%?; c) is the magnitude of the effect large enough to be practically distinct from zero?; and d) is the "mechanical advantage" real? (i.e., is the cost of the intervention low enough to justify a small effect, or are we using a "crane to lift a feather"?)
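On question b), here's a quick back-of-the-envelope way to see how little variance a small effect can account for (assuming a binary treatment, a standardized outcome, and nothing else in the model, which is of course a big simplification):

```python
# Rough share of outcome variance attributable to a binary treatment with effect d (in SD),
# when a fraction p of units is treated and the outcome is standardized.
# Ignores covariates and effect heterogeneity; it's only an order-of-magnitude check.

def variance_share(d_sd: float, p_treated: float = 0.5) -> float:
    return p_treated * (1 - p_treated) * d_sd ** 2

for d in (0.02, 0.1, 0.5):
    print(f"effect of {d:.2f} SD -> ~{100 * variance_share(d):.4f}% of outcome variance")
```

Even a "large" 0.5 SD effect in a 50/50 experiment accounts for only about 6% of the variance in the outcome, which is a useful reality check when answering b).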
If a researcher finds that encouragement works, the next job is to find the mechanism (the channel). That would tell us why it works and if we can scale it.
Hope I addressed your concerns, and again thanks for the comment :)
Great article! I wish I could write like this.
I know that Tyler Cowen has been calling for more studies with “oomph” about questions we really care about (versus studies that are well identified but don’t have that impact). Your examples and background make that easier to understand.
It would be interesting to go into the approaches that economists/statisticians use to find the levers. Obviously R^2 is a good start, but definitely not sufficient.
Thank you so much, Chris! I’ll think about it and come back to your comment 🫡
The distinction between mechanisms and levers is really underappreciated in applied work. Too often we get hung up on whether something is a cause without asking if we can actually pull that lever in practice. Your point about the credibility revolution focusing on local effects rather than structural drivers resonates. I think this is especially relevant when policymakers want actionable insights but get handed highly specific results from one SUTVA world. The question of what actually drives variation matters more than we acknowledge. Thanks for spelling this out so clearly!
thank you :)
Nice post!
One thing I always think about is that a non-significant coefficient doesn’t necessarily mean “there is no effect”; often it simply means that the estimate is too imprecise to detect one. Especially when sample sizes are limited or the variation is small, standard errors can be large even if the true effect is meaningful.
Sometimes the literature and the context we're analyzing tell us that there should be an effect somewhere, but then it turns out that it's not significant (or not highly significant). In other words, sometimes the question is not “is there an effect?” but “do we have enough information to measure it reliably?”
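A tiny simulation makes this concrete (the numbers here are just illustrative: a true effect of 0.2 SD with 50 students per group):

```python
# Illustration: a real effect can easily come out "non-significant" in a small sample.
# Assumed numbers: true effect = 0.2 SD, n = 50 per group, 5,000 simulated studies.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, n_sims = 0.2, 50, 5_000

significant = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    _, p_value = stats.ttest_ind(treated, control)
    significant += p_value < 0.05

print(f"share of studies detecting the true 0.2 SD effect: {significant / n_sims:.0%}")
# With these numbers the power is only around 17%, so most such studies would report
# a "non-significant" coefficient even though the effect is real.
```

So before reading a null result as “no effect”, it's worth asking what effect sizes the design could realistically have detected in the first place.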