Today¶

  1. Any Qs before we start?
  2. Regressions
    • vocab
    • statistically vs. economically significant
    • a (brief) discussion of causality

Vocab¶

Let's cover:

  • "null hypothesis"
  • std errors
  • t-stats
  • p-values
  • economic significance

Vocab via example¶

Last class, we estimated a model:

$$ intrate = a + b * CredScore + u$$
  • The null hypothesis we are testing is: "Credit score is NOT related to the interest rate" ($b=0$)
  • Testable: Can we reject the null? (Establish $b \neq 0$?)
  • Not testable: Accepting the null (Establish $b = 0$?)

Example: A covid test

  • Null hypo: I don't have Covid
  • Reject: I do have Covid :(
  • Negative test means "I can't reject the null that I don't have covid"
  • Negative test does not mean: "I don't have covid"

Our model gave us this output:

| var | coef | std err | t | P>\|t\| |
|---|---|---|---|---|
| Intercept | 11.5819 | 0.046 | 253.270 | 0.000 |
| CredScore | -0.0086 | 6.14e-05 | -139.198 | 0.000 |

  • The coef column holds the estimates $\hat{a}$ and $\hat{b}$ (hats denote estimates)
  • There is uncertainty in those estimates. Again:

[Figure: image.png]

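  • Std err estimates the standard deviation of the coef estimate (how much it would vary from sample to sample)
  • t-stat: coef / std err (how many standard errors the estimate is from zero)
  • p-value (P>|t|): if the true beta were zero, what is the probability of getting an estimate at least this far from zero by random chance?
    • the lower it is, the more "certain" we can be that the relationship isn't zero
    • In tables, 1 star usually means p<10%, 2 means p<5%, 3 means p<1%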
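
The definitions above can be checked by hand against the CredScore row. This is a minimal sketch, not how the regression output was produced: the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, an assumption that is only reasonable in large samples.

```python
import math

def t_stat(coef, std_err):
    """t-stat: how many standard errors the estimate is from zero."""
    return coef / std_err

def two_sided_p_normal(t):
    """Two-sided p-value, ASSUMING a normal approximation to the t distribution."""
    # P(|Z| >= |t|) for a standard normal Z equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))

def stars(p):
    """Star convention from the bullet above."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# Values copied from the CredScore row of the output table
t_cred = t_stat(-0.0086, 6.14e-05)
p_cred = two_sided_p_normal(t_cred)
print(round(t_cred, 1), p_cred, stars(p_cred))
```

The hand-computed t-stat comes out near -140 rather than the table's -139.198. The gap is presumably rounding in the displayed coef and std err.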

So you run a regression¶

... and a variable has a p-value below 5%.

Party?!

Not yet!

A "statistically significant relationship between X and Y" DOES NOT MEAN SIGNIFICANT¶

"Economic significance" matters: Stat sig but economically trivial = yawn

Loose definition: Is a "reasonable" change in X assoc with a "large" change in y?

  • Former: $\beta$ captures a one unit change, which might be tiny or huge
  • Good trick: Scale continuous variables by their STD so that a one unit change in X is a one-STD change.
  • Latter: Compare to avg and std of y
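
The scaling trick above can be sketched in a few lines. The slope is the CredScore coefficient from last class's model. The standard deviation of CredScore and the mean and standard deviation of the interest rate are made-up placeholders (they are not in these slides), and `economic_significance` is a hypothetical helper, so swap in the values from your own data.

```python
def economic_significance(beta, sd_x, mean_y, sd_y):
    """Express a one-STD change in X in units of y, and relative to y's mean and STD."""
    effect = beta * sd_x  # change in y associated with a one-STD change in X
    return {
        "effect": effect,
        "share_of_mean_y": effect / mean_y,
        "share_of_sd_y": effect / sd_y,
    }

beta_cred = -0.0086  # from the regression output

# HYPOTHETICAL summary stats, for illustration only
sd_credscore = 50.0
mean_intrate = 5.0
sd_intrate = 0.75

result = economic_significance(beta_cred, sd_credscore, mean_intrate, sd_intrate)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Whether the resulting change counts as "large" is still a judgment call. The ratios just put it on a scale that is easier to argue about than the raw coefficient.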

So you run a regression¶

... and a variable has a p-value below 5%.

... and the relationship is large enough to be meaningful.

Party?!

Not yet!

Everyone who confuses correlation with causation eventually ends up dead¶

More commonly: "Correlation is not causation"

Reasons your (significant) correlation ain't causation:

  • It's spurious
  • You p-hacked
    • Your focus should be on testing and evaluating a hypothesis, not "finding a result"
  • Omitted variables ("CEO ability","firm quality")
    • One version: Simpson's paradox, which shows up in many places and can sometimes be fixed with fixed effects
  • Mismeasurement of an X variable (IQ for "CEO ability", MTB for marginal Q)
    • Look out for "proxies"
    • Mismeasurement in y just raises SE
    • (Classical) Mismeasurement in X causes attenuation
  • Simultaneity (think: "equilibrium outcomes")

  • Reverse causality
  • Sample selection (effect of diversification of a firm on profitability)
    • Selection on X is bad
    • Selection on Y is superbad
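
The two measurement-error bullets above (noise in y just raises SE, classical noise in X causes attenuation) can be illustrated with simulated data. This is a sketch under assumptions that are not from the slides: the true slope is set to 2, all variables and errors are standard normal draws, and `ols_slope` is a hypothetical helper for a one-variable regression.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 20_000
true_beta = 2.0  # ASSUMED for the simulation

x = [random.gauss(0, 1) for _ in range(n)]
y = [true_beta * xi + random.gauss(0, 1) for xi in x]

# Classical measurement error: noise unrelated to everything else
x_noisy = [xi + random.gauss(0, 1) for xi in x]
y_noisy = [yi + random.gauss(0, 1) for yi in y]

slope_clean = ols_slope(x, y)          # close to true_beta
slope_noisy_y = ols_slope(x, y_noisy)  # still close to true_beta, just less precise
slope_noisy_x = ols_slope(x_noisy, y)  # pulled toward zero (attenuation)

print(slope_clean, slope_noisy_y, slope_noisy_x)
```

Because X and its measurement error have the same variance here, the expected attenuation factor is var(X) / (var(X) + var(error)) = 1/2, so the noisy-X slope should land near 1 instead of 2.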

Summary¶

The goal of this kind of analysis¶

Is NOT predicting the interest rate ($\hat{y}$ focused)

  • Model fit is NOT the focus

The goal here is to understand what variables matter ($\hat{\beta}$ focused):

  • For which X is the relationship $\hat{\beta}$ non-zero?
    • Detect rejection of the null via P-value
  • Is the relationship economically important?
    • Size of coefficient
  • Usually: Is the relationship causal?
    • Requires (pseudo) random variation in X
    • If causal, why? (What is the "mechanism" of causality?)
  • If not causal: We say it is "descriptive"

Discussion time¶

  • Where does your post on the discussion board fit into the issues outlined above?
  • Tell others about your post
  • Dig into one link

Next week¶

  • I need team names to make project repos