Today
- Interpretation wrap (15 min max)
- Regressions
  - vocab
  - statistically vs. economically significant
  - a (brief) discussion of causality
Finishing up the interpretation worksheet
< 15 minutes on this
Vocab
Over the next few slides, we cover:
- "null hypothesis"
- std errors
- t-stats
- p-values
- economic significance
Vocab via example
Last class, we estimated a model:
$$ intrate = a + b \cdot CredScore + u $$
- The null hypothesis we are testing is: "Credit score is NOT related to the interest rate" ($b = 0$)
- Testable: Can we reject the null? (Establish $b \neq 0$)
- Not testable: Accepting the null (Establish $b = 0$)
Example: A COVID test
- Null hypothesis: I don't have COVID
- Reject: I do have COVID :(
- A negative test means "I can't reject the null that I don't have COVID"
- A negative test does not mean "I don't have COVID"
Our model gave us this output:
var | coef | std err | t | P>\|t\| |
---|---|---|---|---|
Intercept | 11.5819 | 0.046 | 253.270 | 0.000 |
CredScore | -0.0086 | 6.14e-05 | -139.198 | 0.000 |
- The coefficient column contains the estimates $\hat{a}$ and $\hat{b}$ (hats denote estimates)
- There is uncertainty in those estimates:
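- Std err: an estimate of the standard deviation of the coefficient estimate (how much it would vary across samples)
- t-stat: coef / std err
- p-value (P>|t|): if the true coefficient were zero (the null), what is the probability of getting a t-stat at least this far from zero by random chance?
  - The lower it is, the more "certain" we can be that the relationship isn't zero
- In regression output tables, a common convention is 1 star for p<10%, 2 stars for p<5%, and 3 stars for p<1%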
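To connect each column to its definition, here is a minimal numpy sketch that recomputes coef, std err, t, and P>|t| from raw data. It assumes homoskedastic ("classical") standard errors and uses a normal approximation for the two-sided p-value; regression software typically uses the t distribution, which is nearly identical at large sample sizes. The function names `ols_table` and `stars` are hypothetical, and the data are simulated, not the class dataset.

```python
import math

import numpy as np


def ols_table(X, y):
    """Return (coef, std_err, t, p) for an OLS regression of y on X.

    X must already contain a column of ones for the intercept.
    Std errors assume homoskedastic errors; p-values use a normal approximation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # coef: least-squares solution to y ≈ X @ coef
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # estimated error variance, with a degrees-of-freedom correction
    sigma2 = resid @ resid / (n - k)
    # std err: square roots of the diagonal of sigma2 * (X'X)^-1
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # t-stat: coef / std err
    t = coef / std_err
    # two-sided p-value if the true coef were 0: P(|Z| >= |t|) = erfc(|t| / sqrt(2))
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return coef, std_err, t, p


def stars(p):
    """Common table convention: * p<0.10, ** p<0.05, *** p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Hypothetical simulated data, loosely mimicking the table (not the class dataset)
rng = np.random.default_rng(0)
n = 1_000
cred_score = rng.uniform(550, 850, size=n)
intrate = 11.58 - 0.0086 * cred_score + rng.normal(0, 1, size=n)
X = np.column_stack([np.ones(n), cred_score])

for name, c, se, t, p in zip(["Intercept", "CredScore"], *ols_table(X, intrate)):
    print(f"{name:<10} {c:9.4f} {se:9.2e} {t:8.2f} {p:6.3f} {stars(p)}")
```

Recomputing a t-stat by hand from the rounded table entries will not match exactly: -0.0086 / 6.14e-05 ≈ -140.1, versus the reported -139.198, because the software divides the unrounded coefficient by the unrounded standard error.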
A "statistically significant relationship between X and Y" DOES NOT MEAN SIGNIFICANT
"Economic significance" matters: Stat sig but economically trivial = yawn
Loose definition: Is a "reasonable" change in X associated with a "large" change in y?
- "Reasonable $\Delta$X": $\beta$ is the change in y per one-unit change in X, and one unit of X might be tiny or huge
  - Good trick: Divide continuous variables by their standard deviation, so that a one-unit change in the scaled X is a one-standard-deviation change
- "Large $\Delta$y": Compare the implied change in y to the mean and standard deviation of y
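A minimal sketch of the "scale by the STD" trick and the comparison to y, assuming a single regressor so the OLS slope is cov(x, y) / var(x). The function `economic_significance` and the simulated credit-score data are hypothetical illustrations, not the class's code or data.

```python
import numpy as np


def economic_significance(x, y):
    """Restate an OLS slope as the effect of a one-SD change in x, relative to y's mean and SD."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # OLS slope (with intercept) of y on x: cov(x, y) / var(x)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    # same number you would get by regressing y on x / std(x)
    beta_per_sd_x = beta * np.std(x, ddof=1)
    return {
        "beta": beta,
        "beta_per_sd_x": beta_per_sd_x,
        "vs_mean_y": beta_per_sd_x / np.mean(y),
        "vs_sd_y": beta_per_sd_x / np.std(y, ddof=1),
    }


# Hypothetical simulated data (not the class dataset)
rng = np.random.default_rng(1)
cred_score = rng.uniform(550, 850, size=1_000)
intrate = 11.58 - 0.0086 * cred_score + rng.normal(0, 1, size=1_000)
print(economic_significance(cred_score, intrate))
```

Reading the output: `beta_per_sd_x` answers the "reasonable ΔX" question, and `vs_sd_y` (or `vs_mean_y`) answers the "large Δy" question. A tiny `vs_sd_y` is the "stat sig but economically trivial" case.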
So you run a regression
... and a variable has a p-value below 5%.
... and the relationship is large enough to be meaningful.
Party?!
Not yet!
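One reason a p-value below 5% is not yet a reason to party, even before causality comes up: when the null is true, about 5% of tests still produce p < 0.05 by chance. This sketch regresses pure noise on pure noise many times and counts the "significant" results. It assumes a one-regressor OLS with homoskedastic errors (so the slope's t-stat equals r * sqrt((n - 2) / (1 - r^2)), where r is the correlation) and a normal approximation for p; the function name and parameters are hypothetical.

```python
import math

import numpy as np


def share_significant_by_chance(n_tries=2_000, n_obs=500, alpha=0.05, seed=0):
    """Regress pure noise on pure noise n_tries times; return the share with p < alpha."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tries, n_obs))
    y = rng.normal(size=(n_tries, n_obs))  # y is unrelated to x by construction
    # correlation for each try (row)
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    r = (xd * yd).sum(axis=1) / np.sqrt((xd**2).sum(axis=1) * (yd**2).sum(axis=1))
    # slope t-stat in a one-regressor OLS with an intercept
    t = r * np.sqrt((n_obs - 2) / (1 - r**2))
    # two-sided p-values, normal approximation
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return (p < alpha).mean()


print(share_significant_by_chance())
```

The share should land near `alpha`. The same logic is why trying 20 unrelated X variables and reporting the one that "works" is p-hacking: you should expect about one of them to clear 5% by chance.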
Everyone who confuses correlation with causation eventually ends up dead
More commonly: "Correlation is not causation"
Reasons your (significant) correlation ain't causation:
- It's spurious
- You p-hacked
  - Your focus should be on testing and evaluating a hypothesis, not on "finding a result"
- Omitted variables ("CEO ability", "firm quality")
  - One version: Simpson's paradox, which shows up in many places and can sometimes be fixed with fixed effects
- Mismeasurement of an X variable (IQ for "CEO ability", MTB (market-to-book) for marginal Q)
  - Look out for "proxies"
  - Mismeasurement in y just raises the standard errors
  - (Classical) mismeasurement in X causes attenuation bias (the coefficient is biased toward zero)
- Simultaneity (think: "equilibrium outcomes")
- Reverse causality
- Sample selection (e.g., the effect of a firm's diversification on its profitability)
  - Selection on X is bad
  - Selection on Y is superbad
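A minimal simulation of Simpson's paradox and the fixed-effects fix, assuming three hypothetical groups (think firms or industries) whose average levels differ. "Fixed effects" is implemented here by demeaning x and y within each group (the within estimator), which yields the same slope as adding a dummy variable for each group. Function names and data are hypothetical.

```python
import numpy as np


def pooled_slope(x, y):
    """OLS slope of y on x (with an intercept), ignoring groups."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()
    return xd @ (y - y.mean()) / (xd @ xd)


def within_slope(x, y, groups):
    """Fixed-effects (within) slope: demean x and y inside each group, then run OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    xd, yd = x.copy(), y.copy()
    for g in np.unique(groups):
        in_g = groups == g
        xd[in_g] -= x[in_g].mean()
        yd[in_g] -= y[in_g].mean()
    return xd @ yd / (xd @ xd)


# Hypothetical data: within each group, y rises one-for-one with x,
# but groups with higher x have much lower y on average.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 100)
x = 10 * groups + rng.uniform(0, 3, size=groups.size)
y = x - 30 * groups + rng.normal(0, 0.5, size=groups.size)
print("pooled slope:", pooled_slope(x, y))  # pulled negative by the group differences
print("within slope:", within_slope(x, y, groups))  # close to the true within-group slope of 1
```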
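A sketch of the two measurement-error bullets, assuming classical error: the noise is independent of the true variables and of the regression error. With var(X) = 1, mismeasuring X should shrink the slope toward β·var(X)/(var(X) + var(noise)), while mismeasuring y leaves the slope unbiased and only makes it noisier. The function name and parameter values are hypothetical.

```python
import numpy as np


def measurement_error_demo(beta=2.0, noise_var=1.0, n=200_000, seed=0):
    """Compare OLS slopes when x or y is measured with classical (independent) noise."""
    rng = np.random.default_rng(seed)
    var_x = 1.0
    x = rng.normal(0.0, np.sqrt(var_x), n)  # true X
    y = beta * x + rng.normal(0.0, 1.0, n)
    noise_sd = np.sqrt(noise_var)

    def slope(a, b):
        # OLS slope of b on a (with intercept)
        return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

    x_noisy = x + rng.normal(0.0, noise_sd, n)  # mismeasured X
    y_noisy = y + rng.normal(0.0, noise_sd, n)  # mismeasured y
    return {
        "true_beta": beta,
        "noisy_x": slope(x_noisy, y),  # biased toward zero (attenuation)
        "attenuation_prediction": beta * var_x / (var_x + noise_var),
        "noisy_y": slope(x, y_noisy),  # unbiased, just noisier
    }


print(measurement_error_demo())
```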
On safely using this one-slide knowledge about causality
It's not clear to me if saying "correlation is not causation" causes people to become idiots, but it's clearly highly correlated with it.
Correlation not always being causal tells us nothing about a particular correlation being causal or not. It's just a dumb thinking substitute.
Paul Graham hates the phrase, but here is a more productive suggestion: propose a specific reason why this particular correlation might not be causal (e.g., one from the list above).
The goal here is to understand which variables matter ($\hat{\beta}$ focused):
- For which X is the relationship $\hat{\beta}$ non-zero?
  - Detect rejection of the null via the p-value
- Is the relationship economically important?
  - Size of the coefficient
- Usually: Is the relationship causal?
  - Requires (pseudo-)random variation in X
  - If causal, why? (What is the "mechanism" of causality?)
  - If not causal: We say it is "descriptive"