Hello! Today...
- (QUICKLY) Midterm Q&A: general questions, no code talk
- Start talking about modeling
Midterm Q&A
Hit me (with your questions)
Powering up
You are nearly done with your first farm-to-table analysis. One big question about your findings: Are the correlations real? Will they hold up to more rigorous scrutiny?
Over the next month, we are going to look at:
- putting possible relationships under more scrutiny
- building prediction models
- working as a team
Methods:
Next two weeks: Regression
- how-to,
- why-to,
- and what it "means"
After that: Some ML algos
- most ML algos follow a very similar sequence of steps
- potentially huge problems
- the steps to avoid those problems
This will be fun!
Solutions to problems at scale. Fin-ML wave 1:
- Robo-advising
- Manage risk (loans and insurance) to reduce write-offs and lower costs for consumers
- Prevent and detect fraud (external and internal)
- Investment choices: stocks, real estate (e.g., where to put factories)
- Improve advertising offers to credit customers
Each of these combines a question/problem with data and a model
Let's start by talking about "modeling"
(Recommended reading: Chapter 4-4.2 of Data 100)
I'm going to use the word "model" a lot. So let's talk about that...
A model is an idealized representation of a system
Examples:
- A weather forecast
- $E=mc^2$
- Investment policy: $\text{Investment} = \text{marginal } q$
- Equity value $= \sum_{t=1}^{\infty} \frac{DIV_t}{(1+r_e)^t}$
- Asset prices: $r = \beta \times MKT$
- Your FCF projections (the whole Excel file is a model)
- Your Final Grade as GPA = 3.1 (class average)
- Your Final Grade as GPA = 3.1 + 0.3 * effort
- Your Final Grade as GPA = your current grade
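To make "a model is a function of its inputs" concrete, here is a minimal Python sketch that writes three of the examples above as functions. The function names, the `effort` input (treated as a hypothetical, unitless number), and the dividend and discount-rate values are illustrative assumptions, not course code; the equity-value function also truncates the infinite sum at a finite horizon.

```python
# Each model below maps inputs X to a prediction y, i.e., y = f(X).

def grade_class_average() -> float:
    """GPA = 3.1: predicts the class average for everyone, ignoring all inputs."""
    return 3.1


def grade_with_effort(effort: float) -> float:
    """GPA = 3.1 + 0.3 * effort (effort is a hypothetical, unitless input)."""
    return 3.1 + 0.3 * effort


def equity_value(dividends: list[float], r_e: float) -> float:
    """Sum of DIV_t / (1 + r_e)^t for t = 1..T, truncating the infinite sum at T = len(dividends)."""
    return sum(div / (1 + r_e) ** t for t, div in enumerate(dividends, start=1))


print(grade_class_average())
print(round(grade_with_effort(effort=1.0), 2))
# Constant dividend of 5 at r_e = 10%: a long horizon approaches 5 / 0.10 = 50
print(round(equity_value([5.0] * 500, r_e=0.10), 4))  # -> 50.0
```

All three are "wrong" in the sense of the next slide, but each is a precise, checkable statement about how inputs map to an output.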
A model is an idealized representation of a system
Famous take: "All models are wrong, but some are useful"
Like that weather forecast...
A model is an idealized representation of a system
All of these are ways to estimate models:
- average
- median
- regression
- nearest-neighbor
- boosted regression trees
- support vector machines
- on and on... but all of them estimate something of the form $y = f(X_1, X_2, \ldots)$
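As a small illustration that the first three items on this list really are competing estimates of $y = f(X_1, X_2, \ldots)$, the sketch below fits each to the same toy dataset using only Python's standard library. The data (weekly hours of effort vs. final GPA) are made up for the example, and the one-regressor OLS formula is written out by hand rather than taken from a regression library.

```python
from statistics import mean, median

# Hypothetical toy data: x = weekly hours of effort, y = final GPA
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.8, 3.0, 3.5, 3.6, 4.0]

# Estimator 1: the average -> f(X) is a single constant
f_mean = mean(y)

# Estimator 2: the median -> also a constant, but less sensitive to outliers
f_median = median(y)


# Estimator 3: simple OLS regression -> f(X) = a + b * x
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


a, b = ols_fit(x, y)

print(f"mean model:   y_hat = {f_mean:.2f}")           # 3.38
print(f"median model: y_hat = {f_median:.2f}")         # 3.50
print(f"OLS model:    y_hat = {a:.2f} + {b:.2f} * x")  # 2.48 + 0.30 * x
```

The first two ignore $X$ entirely; regression is the first one on the list that lets the prediction depend on the inputs.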
A model is an idealized representation of a system
The purpose of a model ($y = f(X_1, X_2, \ldots)$) is typically either prediction or understanding relationships (e.g., $\partial y / \partial X_1$).
Predictions: Given some specific data $X$ as inputs, predict $\hat{y}$
- Which loans will default?
- Focus is on accuracy when applied to new real world data ("out of sample")
- Black box is "fine"
Relationships: You care about estimating the parameters of $f$
- Example: Do airline closures affect how VCs monitor portfolio companies?
- Focus is on direction and magnitude, and on whether the relationship is causal
- Black box bad, interpretation essential
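The same fitted regression can serve either purpose; what changes is which output you look at. In this minimal sketch (hypothetical data, hand-written one-regressor OLS), the relationship view reads the slope $\partial y / \partial x$, while the prediction view holds out the last three observations as stand-in "new real world data" and scores the model by its out-of-sample mean squared error.

```python
from statistics import mean


def ols_fit(x, y):
    """Simple OLS of y on one regressor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data: x = some firm characteristic, y = some outcome
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]

# Hold out the last 3 observations to mimic data the model has never seen
x_train, y_train = x[:5], y[:5]
x_test, y_test = x[5:], y[5:]

a, b = ols_fit(x_train, y_train)

# Relationship view: direction and magnitude of the estimated parameter
print(f"slope = {b:.2f}")  # 1.00

# Prediction view: accuracy on the held-out observations
y_hat = [a + b * xi for xi in x_test]
mse_out = mean((yi - yh) ** 2 for yi, yh in zip(y_test, y_hat))
print(f"out-of-sample MSE = {mse_out:.3f}")  # 0.029
```

A positive slope here says nothing yet about causality; that question needs the extra scrutiny (and assumptions) discussed in the coming weeks.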
So when I discuss a "model"
- in the next two weeks, I'm probably just referring to a regression
- after that, it might be a "fancier" estimation
But generically, it's just a way of thinking about the data.
... like FCF projections in your corporate finance classes
The pitfalls of ML
I'm curious about your associations with ML, your experiences with it (job applications, robo-help, etc.), and your perspectives on it
- We discussed Zillow on Day 1
- More examples in 5.3
I'd like to introduce a framework for attacking problems with the techniques we will cover in class...
This framework will be even more valuable in a post-GPT world. (More on this later in the semester.)
The big picture / framework of modeling
1. Start with an interesting question or problem
   - Zillow: Think about the economics of it before proceeding!
   - What type of question are you asking? Relationship or prediction?
2. Think about data:
   - What is the ideal dataset that would most easily answer your question?
   - What data is available? (sources, how easy/costly it is to get)
3. Explore the data.
   - If you have sub-optimal data, you'll have to adjust your subsequent steps to use that data. Adjustments are a natural part of the modeling cycle.
4. Pick your model(s)
5. Estimate your model(s) and evaluate the output
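The "Explore the data" step is where sub-optimal data usually shows up first. Below is a minimal standard-library sketch of a first pass: for each column, count missing values and summarize the non-missing ones. The rows and the `explore` helper are hypothetical; in pandas the same first look is typically `df.describe()` plus `df.isna().sum()`.

```python
from statistics import mean, median, stdev

# Hypothetical raw data; None marks a missing value
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": None},
    {"x": 3.0, "y": 4.1},
    {"x": 4.0, "y": 5.9},
    {"x": None, "y": 7.0},
]


def explore(rows, cols):
    """Per column: missing count plus mean, median, and sd of the non-missing values."""
    summary = {}
    for col in cols:
        values = [r[col] for r in rows if r[col] is not None]
        summary[col] = {
            "missing": len(rows) - len(values),
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values),
        }
    return summary


print(f"rows: {len(rows)}")
for col, stats in explore(rows, ["x", "y"]).items():
    print(col, {k: round(v, 2) for k, v in stats.items()})
```

Silently skipping missing values, as this sketch does, is itself one of the "adjustments" the framework mentions; whether it is acceptable depends on why the values are missing.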
The midterm fits exactly into this framework
Caveat: Steps 4 and 5 look a little different in prediction problems (see Chapter 7)
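One common way "pick" and "estimate and evaluate" differ for prediction: instead of committing to one model up front, you fit several candidates on training data and keep whichever predicts best on data held out from fitting. This is a minimal sketch of that idea with hypothetical data and three hand-written candidates (mean, median, one-regressor OLS); Chapter 7's exact procedure may differ.

```python
from statistics import mean, median


def fit_mean(x, y):
    """Candidate 1: always predict the training mean."""
    c = mean(y)
    return lambda xi: c


def fit_median(x, y):
    """Candidate 2: always predict the training median."""
    c = median(y)
    return lambda xi: c


def fit_ols(x, y):
    """Candidate 3: one-regressor OLS line."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return lambda xi: a + b * xi


def mse(model, x, y):
    """Mean squared prediction error of a fitted model on (x, y)."""
    return mean((yi - model(xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, split into a training set and a validation set
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2]
x_train, y_train, x_val, y_val = x[:5], y[:5], x[5:], y[5:]

# Fit every candidate on training data, score each on validation data
candidates = {"mean": fit_mean, "median": fit_median, "ols": fit_ols}
scores = {name: mse(fit(x_train, y_train), x_val, y_val)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)

print({name: round(s, 3) for name, s in scores.items()})
print("best:", best)
```

The selection criterion is out-of-sample accuracy, not how interpretable the winning model is, which matches the "black box is fine" framing for prediction.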
Summary: Today was about a framework
- Start with an interesting question or problem
  - Think about the economics
  - Relationship or prediction?
- Think about the data you'd need
- Pick your model(s)
  - An idealized representation of a system
  - Workhorse: Regression
- Estimate your model(s) and evaluate the output
  - Does it solve your question/problem?
Starting the final project
- Let's talk about it. Textbook description here.
- Q/prob: Students pick. Wide latitude. Prior examples.
- Deliverables: Proposal, revision, status report, analysis repo, website and/or dashboard, presentation
- Timeline (Proposals can be sent earlier, and I will give feedback earlier)
- Team formation. Both approaches are fine:
  - Idea/interest based
  - Teammate based
- Goal alignment
- Remaining time: Brainstorming + team formation