L18_RegSEVocabIssues slides

Today¶

Interpretation wrap (15 min max)
Regressions
- vocab
- statistically vs. economically significant
- a (brief) discussion of causality

Finishing up the interpretation worksheet¶

< 15 minutes on this

Vocab¶

Over the next few slides, we cover:

"null hypothesis"
std errors
t-stats
p-values
economic significance

Vocab via example¶

Last class, we estimated a model:

$$ intrate = a + b * CredScore + u$$

The null hypothesis we are testing is: "Credit score is NOT related to the interest rate" ($b=0$)
Testable: Can we reject the null? (Establish $b \neq 0$)
Not testable: Accepting the null (Establish $b = 0$)

Example: A covid test

Null hypo: I don't have Covid
Reject: I do have Covid :(
Negative test means "I can't reject the null that I don't have covid"
Negative test does not mean: "I don't have covid"

Our model gave us this output:

var	coef	std err	t	P>|t|
Intercept	11.5819	0.046	253.270	0.000
CredScore	-0.0086	6.14e-05	-139.198	0.000

The coef column holds $\hat{a}$ and $\hat{b}$, the estimates of $a$ and $b$ (hats denote estimates)

There is uncertainty in those estimates:

Link

Std err: an estimate of the standard deviation of the coefficient estimate (how much it would vary from sample to sample)

t-stat: coef/se

p-value (P>|t|): if the true coefficient were zero, what is the probability of getting an estimate at least this far from zero by random chance?
- the lower it is, the more "certain" we can be that the relationship isn't zero
- In regression output tables, 1 star means p<10%, 2 means p<5%, 3 means p<1%
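
A minimal sketch of how the t-stat, p-value, and stars fit together, using the CredScore row from the output above. The helper names are made up for illustration, and the p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable in large samples. Because the printed coef and std err are rounded, the ratio lands near -140 rather than exactly on the table's -139.198.

```python
from statistics import NormalDist

def t_stat(coef, std_err):
    """t-stat is the coefficient divided by its standard error."""
    return coef / std_err

def two_sided_p_value(t):
    """P(|T| >= |t|) if the null is true (normal approximation, large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

def stars(p):
    """Star convention from the slide: * p<10%, ** p<5%, *** p<1%."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# CredScore row from the regression output (rounded values as printed)
t = t_stat(-0.0086, 6.14e-05)
p = two_sided_p_value(t)
print(f"t = {t:.1f}, p = {p:.3f} {stars(p)}")
```

A t-stat near 1.96 in absolute value corresponds to a p-value of about 5%, which is why "|t| bigger than 2" is a common rule of thumb.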

So you run a regression¶

... and a variable has a p-value below 5%.

Party?!

Not yet!

A "statistically significant relationship between X and Y" DOES NOT MEAN SIGNIFICANT¶

"Economic significance" matters: Stat sig but economically trivial = yawn

Loose definition: Is a "reasonable" change in X assoc with a "large" change in y?

"reasonable $\Delta$X": $\beta$ captures a one unit change, which might be tiny or huge
Good trick: Scale continuous variables by their STD so that a one unit change in X is a STD.
"large $\Delta$y": Compare coefficient to avg and std of y
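
A rough sketch of the scaling trick on simulated data (not the class dataset). The simulated credit scores, the noise level, and the helper `ols_slope` are all assumptions chosen for illustration; only the slope of -0.0086 is borrowed from the earlier output. Dividing X by its standard deviation makes the coefficient read as "the change in y for a one-STD change in X", which can then be compared to the STD of y.

```python
import random
from statistics import mean, pstdev

def ols_slope(x, y):
    """Slope from a one-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
# Simulated data: hypothetical credit scores and interest rates
cred = [random.gauss(700, 50) for _ in range(5000)]
rate = [11.5 - 0.0086 * c + random.gauss(0, 0.5) for c in cred]

b_raw = ols_slope(cred, rate)            # effect of ONE credit score point
sd_x = pstdev(cred)
cred_scaled = [c / sd_x for c in cred]   # now one unit = one STD of X
b_scaled = ols_slope(cred_scaled, rate)  # effect of a one-STD change in X

# Compare the one-STD effect to the spread of y
effect_in_y_sds = b_scaled / pstdev(rate)
print(f"raw: {b_raw:.4f}, scaled: {b_scaled:.3f}, in STDs of y: {effect_in_y_sds:.2f}")
```

The raw coefficient looks tiny only because a one-point change in credit score is tiny; the scaled coefficient is the same relationship expressed in units that are easier to judge.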

So you run a regression¶

... and a variable has a p-value below 5%.

... and the relationship is large enough to be meaningful.

Party?!

Not yet!

Everyone who confuses correlation with causation eventually ends up dead¶

More commonly: "Correlation is not causation"

Reasons your (significant) correlation ain't causation:

It's spurious

You p-hacked
- Your focus should be on testing and evaluating a hypothesis, not "finding a result"

Omitted variables ("CEO ability","firm quality")
- One version: Simpson's paradox. It shows up in many places and can sometimes be fixed with fixed effects
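
A small simulation sketch of Simpson's paradox and the fixed-effects fix. Everything here is made up for illustration: five hypothetical groups, a true within-group slope of -1, and group-level differences that push the pooled slope positive. Demeaning X and y within each group is one way to implement group fixed effects for the slope.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope from a one-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def demean_by_group(groups, values):
    """Subtract each group's mean (the 'within' transformation)."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_mean[g] for g, v in zip(groups, values)]

random.seed(2)
groups, x, y = [], [], []
for g in range(5):
    for _ in range(400):
        xi = random.gauss(10 * g, 2)                 # groups differ in typical X
        yi = 20 * g - 1.0 * xi + random.gauss(0, 1)  # within a group, slope is -1
        groups.append(g)
        x.append(xi)
        y.append(yi)

pooled_slope = ols_slope(x, y)  # ignores groups: the omitted variable
within_slope = ols_slope(demean_by_group(groups, x), demean_by_group(groups, y))
print(f"pooled: {pooled_slope:.2f}, with group fixed effects: {within_slope:.2f}")
```

The pooled regression gets the sign wrong because the group is an omitted variable related to both X and y; once group means are removed, the within-group relationship shows up.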

Mismeasurement of an X variable (IQ for "CEO ability", MTB for marginal Q)
- Look out for "proxies"
- (Classical) mismeasurement in y just raises SEs
- (Classical) mismeasurement in X causes attenuation (the coefficient is biased toward zero)
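
A simulation sketch of the two measurement-error bullets, under assumed numbers: a true slope of 2, and classical (independent, mean-zero) noise added to either X or y. With noise in X that has the same variance as X itself, the expected slope shrinks by the factor var(X) / (var(X) + var(noise)) = 1/2.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope from a one-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
n = 20000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x_true]  # true slope is 2

x_noisy = [xi + random.gauss(0, 1) for xi in x_true]  # a noisy "proxy" for X
y_noisy = [yi + random.gauss(0, 3) for yi in y]       # a noisy measure of y

b_clean = ols_slope(x_true, y)        # close to 2
b_x_err = ols_slope(x_noisy, y)       # attenuated toward zero, close to 1
b_y_err = ols_slope(x_true, y_noisy)  # still centered on 2, just less precise
print(f"clean: {b_clean:.2f}, noisy X: {b_x_err:.2f}, noisy y: {b_y_err:.2f}")
```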

Simultaneity (think: "equilibrium outcomes")

Reverse causality

Sample selection (effect of diversification of a firm on profitability)
- Selection on X is bad
- Selection on Y is superbad

On safely using this one-slide knowledge about causality

It's not clear to me if saying "correlation is not causation" causes people to become idiots, but it's clearly highly correlated with it.

Correlation not always being causal tells us nothing about a particular correlation being causal or not. It's just a dumb thinking substitute

Paul Graham hates the phrase but here is a more productive suggestion: propose a reason

Summary¶

The goal of this kind of analysis¶

Is NOT predicting the interest rate ($\hat{y}$ focused)

Model fit is NOT the focus

The goal here is to understand what variables matter ($\hat{\beta}$ focused):

For which X is the relationship $\hat{\beta}$ non-zero?
- Detect rejection of the null via p-value
Is the relationship economically important?
- Size of coefficient
Usually: Is the relationship causal?
- Requires (pseudo) random variation in X
- If causal, why? (What is the "mechanism" of causality?)
If not causal: We say it is "descriptive"

Discussion time¶

Where does your post on the discussion board fit into the issues outlined above?
Tell others about your post
Dig into one link

Next week¶

All my aspiring mad scientists rise up... ML is here!