Teaching notes:

  • Print L11.txt to have for myself in class.

Hello! Today¶

  • Quick assignment 4 review
  • Introducing assignment 5 as a roadmap

Quick assignment 4 review¶

Gentle reminders:

  • ledatascifi.github.io, google.com, and stackoverflow.com are your friends!
  • Pseudocode more, including writing on paper. I usually write down what I want/need and work backwards.

Any questions? Anyone want a chance to show their work? (Perhaps if you don't usually get a chance to talk...)

  1. Part 2: create the variable before the merge (follow the website!)
  2. Part 3: two things you had to do:
    • https://github.com/LeDataSciFi/ledatascifi-2021/blob/main/data/two_pat_vars.csv
    • merge on firm AND year (in the "tips & best practices"... verbatim)
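
A minimal sketch of both reminders, assuming pandas and two made-up firm-year tables. The column names `sales`, `sales_growth`, and `n_patents` are hypothetical and are not taken from the assignment files:

```python
import pandas as pd

# Hypothetical firm-year panels; names and numbers are illustrative only.
firms = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "sales": [10.0, 12.0, 5.0, 4.0],
})
patents = pd.DataFrame({
    "firm": ["A", "A", "B"],
    "year": [2019, 2020, 2019],
    "n_patents": [3, 5, 1],
})

# Create the new variable BEFORE the merge, on the dataset it belongs to.
firms = firms.sort_values(["firm", "year"])
firms["sales_growth"] = firms.groupby("firm")["sales"].pct_change()

# Merge on firm AND year. `validate` raises if the keys are not unique,
# and `indicator` adds a `_merge` column showing which rows matched.
merged = firms.merge(
    patents,
    on=["firm", "year"],
    how="left",
    validate="one_to_one",
    indicator=True,
)
print(merged)
```

Checking `merged["_merge"].value_counts()` right after the merge is a quick way to see how many firm-years found a match.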

The assignment (5 aka midterm)¶

I call it an "assignment" for continuity, but it's the 10% "Midterm Project", meaning it carries 2x the weight of an assignment.

Really fun news: This is an end-to-end data science project! You will be downloading a lot of files, parsing/exploring/cleaning those files, and then exploring the data.

Implications:

  • 2x the weight: Will take more time
  • Real project: Fewer parts of the grading are black & white, "objectively" correct
  • The deliverable will be graded like an essay; the grade also depends on your economic/business arguments

The assignment (5 aka midterm)¶

  • Basic question: What "types" of firms were hurt more or less by covid?
  • Specific question: What risk factors were associated with better/worse stock returns around the onset of covid?
    • This is called a "cross-sectional event study"
    • Expected minimum output: Scatterplot (x = some "risk factors", y = returns around March 2020) with regression lines; formatted well
    • Discussion of the economics linking your risk factors to the returns is expected
    • Pro output: Regression tables, heatmaps, better scatterplots
  • New data science technique: Textual analysis. We will estimate "risk factors" from the text of S&P 500 firms' 10-K filings.
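
To make the idea concrete, here is a rough sketch of one possible text-based risk measure (the share of words that hit a keyword list) and the slope of returns on that measure across firms. The snippets, tickers, returns, and the `risk_score` and `ols_slope` helpers are all made up for illustration. This is not the assignment's required method, and no real filings or returns are used:

```python
import re

def risk_score(text, keywords):
    """Share of words in `text` that appear in `keywords` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(word in keywords for word in words)
    return hits / len(words)

def ols_slope(x, y):
    """Slope from a simple one-variable regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var

# Made-up snippets and returns, NOT real filings or real data.
filings = {
    "AAA": "We rely on a global supply chain and supply disruptions may hurt us.",
    "BBB": "Our software is delivered online to customers.",
    "CCC": "Supply shortages, supply delays and supply costs are key risks.",
}
returns = {"AAA": -0.20, "BBB": 0.05, "CCC": -0.35}
keywords = {"supply"}

# One score per firm, then one regression across firms (the cross section).
scores = {firm: risk_score(text, keywords) for firm, text in filings.items()}
firms = sorted(filings)
slope = ols_slope([scores[f] for f in firms], [returns[f] for f in firms])
print(scores, slope)
```

In this toy data the slope is negative: firms whose text mentions the keyword more often had worse returns. With real data the same structure applies, just with one row per S&P 500 firm.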

How do we solve this? As usual: Break the problem down into parts, working backwards

Work with your classmate to outline an approach (pseudocode even) and take notes - this will be useful!

I'll call on TBD to share their approach in 10 minutes.

Ok, now: Intro to wizardry (scraping data from the web)¶

  • We want to download 500ish 10-Ks

https://ledatascifi.github.io/ledatascifi-2022/content/04/01_Intro_to_scraping.html
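
As a preview, downloading many files usually comes down to a loop like the sketch below. It uses only the standard library. The `fetch_url` and `download_all` helpers, the `.html` file naming, and the User-Agent string are assumptions for illustration, not the spider we will build in class:

```python
import time
import urllib.request
from pathlib import Path

def fetch_url(url, user_agent="Your Name your.email@example.com"):
    """Fetch one URL and return its raw bytes (the User-Agent is a placeholder)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()

def download_all(urls, out_dir, fetch=fetch_url, pause=0.5):
    """Save each {name: url} pair to out_dir/name.html, skipping existing files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in urls.items():
        path = out_dir / f"{name}.html"
        if path.exists():      # re-running the script won't re-download
            continue
        path.write_bytes(fetch(url))
        saved.append(path)
        time.sleep(pause)      # be polite to the server
    return saved
```

Two design choices matter once you have 500ish files: skipping files that already exist means a crash halfway through costs nothing on the rerun, and pausing between requests keeps you from hammering the server.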

Next class¶

  • After class: A5 will be live.
  • Building a spider to download 10-Ks
  • Next week: "read those pages for various risk indicators"

Student demos¶

  • Task: Will be a continuation of something from today
  • 9:20am: Dahee (@daheeseminara) and Matt (@mattmorana7)
  • 10:45am: Priya (@priyagb99) and Sebastian (@SebastianStoneham)

Summary¶

We have planned and started some serious data analysis. Let's do this!