Hello! Today...¶

  • (QUICKLY) Midterm Q&A - general, no code talk
  • Start talking about modeling

Midterm Q&A¶

Hit me (with your questions)

Powering up¶

You are nearly done with your first farm-to-table analysis. One big question for your findings: Are the correlations real? Will they hold up to more rigorous scrutiny?

Over the next month, we are going to look at

  • putting possible relationships under more scrutiny
  • building prediction models
  • working as a team

Methods:¶

Next two weeks: Regression

  • how-to,
  • why-to,
  • and what it "means"

After that: Some ML algos

  • most ML algos follow a very similar sequence of steps
  • potential huge problems
  • the steps to avoid those problems

This will be fun!¶

The promise of ML¶

ML worth 140 BILLION in US finance firms by 2025 via cost reduction alone¶

Solutions to problems at scale. Fin-ML wave 1:

  • Robo-advising
  • Manage risk (loans and insurance) to reduce write-offs and lower costs for consumers
  • Prevent and detect fraud (external and internal)
  • Investment choices - stocks, real estate (where to put factories)
  • Improve advertising offers to credit customers

Each of these combines a question/problem with data and a model¶

Let's start by talking about "modeling"¶

(Recommended reading: Chapter 4-4.2 of Data 100)

I'm going to use the word "model" a lot. So let's talk about that...

A model is an idealized representation of a system¶

Examples:

  • A weather forecast
  • $E=mc^2$
  • Financing policies: $investment = MarginalQ$
  • Equity value = $\sum_t DIV_t / (1+r_e)^t$
  • Asset prices: $r = \beta \cdot MKT$
  • Your FCF projections (the whole excel file is a model)
  • Your Final Grade as GPA = 3.1 (class average)
  • Your Final Grade as GPA = 3.1 + 0.3 * effort
  • Your Final Grade as GPA = your current grade
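The three grade models in the list can be written as tiny functions, which makes the point that a "model" is just a rule mapping inputs to a guess. This is a minimal sketch: the 3.1 and 0.3 come from the bullets, but the `effort` scale (0 = average effort) and the example student are made-up assumptions for illustration.

```python
# Three toy "models" of a final grade, mirroring the last three bullets.
# ASSUMPTION: `effort` is a hypothetical index where 0 means average effort.

CLASS_AVERAGE = 3.1  # from the slide
EFFORT_SLOPE = 0.3   # from the slide


def model_constant() -> float:
    """Predict the class average for everyone (uses no student data)."""
    return CLASS_AVERAGE


def model_effort(effort: float) -> float:
    """Predict the class average plus an adjustment for effort."""
    return CLASS_AVERAGE + EFFORT_SLOPE * effort


def model_persistence(current_grade: float) -> float:
    """Predict that the final grade equals the current grade."""
    return current_grade


# A made-up student, purely for illustration.
student = {"effort": 1.0, "current_grade": 3.6}

predictions = {
    "constant": model_constant(),
    "effort": model_effort(student["effort"]),
    "persistence": model_persistence(student["current_grade"]),
}

for name, gpa in predictions.items():
    print(f"{name:12s} -> {gpa:.2f}")
```

All three are idealized and all three are "wrong" for most students. Which one is useful depends on the question and on the data you have.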

A model is an idealized representation of a system¶

Famous take: "All models are wrong, but some are useful"

Like that weather forecast...

A model is an idealized representation of a system¶

All of these are ways to estimate models:

  • average
  • median
  • regression
  • nearest-neighbor
  • boosted regression trees
  • support vector machines
  • on and on... but something of the form $y=f(X1,X2,...)$
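To see that these estimators all share the same shape, here is a minimal sketch that fits four of them (average, median, one-variable regression, nearest-neighbor) to the same data and returns a function $f$ each time. The data are made up, the helper names (`fit_mean`, `fit_ols`, and so on) are hypothetical, and the regression is a hand-rolled single-regressor OLS rather than the library routine we will use in class.

```python
from statistics import mean, median

# Made-up data: one input X1 and one outcome y. Purely illustrative.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.1, 2.4, 3.2, 3.4, 3.9]


def fit_mean(y_train):
    """Model: predict the sample average, ignoring the inputs."""
    m = mean(y_train)
    return lambda x_new: m


def fit_median(y_train):
    """Model: predict the sample median, ignoring the inputs."""
    m = median(y_train)
    return lambda x_new: m


def fit_ols(x_train, y_train):
    """Model: y = intercept + slope * x, estimated by least squares."""
    x_bar, y_bar = mean(x_train), mean(y_train)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_train, y_train))
    s_xx = sum((xi - x_bar) ** 2 for xi in x_train)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return lambda x_new: intercept + slope * x_new


def fit_nearest_neighbor(x_train, y_train):
    """Model: predict the y of the closest training x (1-nearest-neighbor)."""
    def predict(x_new):
        i = min(range(len(x_train)), key=lambda j: abs(x_train[j] - x_new))
        return y_train[i]
    return predict


models = {
    "average": fit_mean(y),
    "median": fit_median(y),
    "regression": fit_ols(x, y),
    "nearest-neighbor": fit_nearest_neighbor(x, y),
}

x_new = 2.6  # a new observation we want a prediction for
for name, f in models.items():
    print(f"{name:17s} f({x_new}) = {f(x_new):.3f}")
```

Each `fit_*` call takes data in and hands back an $f$. The fancier methods differ in how flexible $f$ is allowed to be, not in this basic pattern.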

A model is an idealized representation of a system¶

The purpose of a model ($y=f(X1,X2,...)$) is typically either prediction or understanding relationships (e.g. $\partial y / \partial X1$)

Predictions: Given some specific data X as inputs, predict $\hat{y}$

  • Which loans will default?
  • Focus is on accuracy when applied to new real world data ("out of sample")
  • Black box is "fine"

Relationships: You care about estimating the parameters of $f$

  • Example: Do airline closures affect how VCs monitor portfolio companies?
  • Focus is on direction and magnitude, and on whether the relationship is causal
  • Black box bad, interpretation essential
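The same fitted model can be read through either lens. The sketch below, on made-up numbers, fits a one-variable regression on a "training" sample and then (a) scores it on held-out observations, which is the prediction lens, and (b) reports the slope, which is the relationship lens. The data, the split, and the helper name `fit_ols` are assumptions for illustration, and a slope by itself says nothing about causality.

```python
from statistics import mean

# Made-up data, split into an estimation sample and a held-out sample.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
test_x = [6.0, 7.0]
test_y = [13.2, 14.8]


def fit_ols(x, y):
    """Least-squares fit of y = intercept + slope * x (single regressor)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_ols(train_x, train_y)

# Prediction lens: how close are we on data the model never saw?
y_hat = [intercept + slope * xi for xi in test_x]
out_of_sample_mse = mean((yi - yh) ** 2 for yi, yh in zip(test_y, y_hat))
print(f"out-of-sample MSE: {out_of_sample_mse:.4f}")

# Relationship lens: what is the estimated change in y per unit of x?
print(f"estimated dy/dx:   {slope:.3f}")
```

For a prediction problem you would judge the model on the first number and could swap in any black box that lowers it. For a relationship question the second number is the deliverable, so you need a model whose parameters you can interpret.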

So when I discuss a "model"

  • in the next two weeks, I'm probably just referring to a regression
  • after that, it might be a "fancier" estimation

But generically, it's just a way of thinking about the data.

... like FCF projections in your corporate finance classes

The pitfalls of ML¶

I'm curious about your associations with ML, your experiences with it (job apps, robo help, etc.), and your perspectives on it

  • We discussed Zillow on Day 1
  • More examples in 5.1.1

I'd like to introduce a framework for attacking problems with the techniques we will cover in class...

The big picture / framework of modeling¶

  1. Start with an interesting question or problem
    • Zillow: Think about the economics of it before proceeding!
  2. What type of question are you asking? Relationship or prediction?
  3. Think about data:
    • What is the ideal dataset that would most easily answer your question?
    • What data is available (sources, how easy/costly it is to get)?
    • Explore the data.
    • If you have sub-optimal data, you’ll have to adjust your subsequent steps to use that data. Adjustments are a natural part of the modeling cycle.
  4. Pick your model(s)
  5. Estimate your model(s) and evaluate the output

The midterm fits exactly into this framework¶

Caveat: Steps 4 and 5 look a little different in prediction problems (see chapter 5.3)¶

Summary of today: a framework¶

  1. Start with an interesting question or problem
    • Think about the economics
    • Relationship or prediction?
  2. Think about the data you'd need
  3. Pick your model(s)
    • An idealized representation of a system
    • Workhorse: Regression
  4. Estimate your model(s) and evaluate the output
    • Does it solve your question/problem?

Starting the final project¶

  1. Let's talk about it.
    • Q/prob: Students pick. Wide latitude.
    • Deliverables: Proposal, revision, status report, analysis repo, website, presentation
    • Timeline (Proposals can be sent earlier, and I will give feedback earlier)
  2. Team formation. Both are fine:
    • Idea/interest based
    • Teammate based
    • Goal alignment
  3. Remaining time: Brainstorming + team formation
    • Declare here (I will put link in teammates forum)

Next class¶

  • The mechanics of running regressions

Student demos¶

  • Task: I'll send data and a regression to do tomorrow morning
  • Class 1: Yang and Harry
  • Class 2: Austen, Colin, and Eric