Hello! Today:¶
- Before class: Copy today's exercise file (pandas exercises.ipynb) and the Module 2 notes from the textbook into your class notes repo, and open both
- Discussing ASGN1 and peer reviews
- A brief discussion of numpy
- Getting dirty with pandas
Please fill out a short check-in survey soon: Link here, or on the dashboard's tasks
ASGN 1 Thoughts...¶
And some awards¶
The award for most commits goes to...
- Class 1: justinreed23 and chrisztoh (close: Saintwy6)
- Class 2: nicoschuster01 (close: Josh Simon aka jls224; Nick Scheri)
Come on down!
And also, an award for favorite README meme (TOUGH competition)
Any interest in a free lunch?¶
Caveat: My company is a condition of the lunch. Lehigh pays so that students and faculty can break bread and chat, open topic. Up to 5 students per lunch.
If interested, email me and include
- Classmates that have agreed to join you.
- My availability (with preference ranking):
- Monday at 1:30pm (strongest)
- Monday or Wednesday brunch at 9:45am (I'll pick up and bring it) (strong)
- Friday noon, select Fridays this semester (limited)
- Else: which day of week/times can work for you
ASGN 1 Peer reviews¶
- Peer review: A chance to learn and teach!
- You were added to two classmates' repos today for reviewing
- Go to github.com. On the left side of the page, under repositories, you'll see that you have access to assignments for two peers.
- For each student you're reviewing, open the answer key I put in the repo and click on the link to the survey
- Most of the review can be done looking at the repo online, but...
⭐⭐⭐ MOST IMPORTANT: You must clone the repo to your computer and run the code ON YOUR COMPUTER (The essential ingredient of collaborative coding, and a fundamental takeaway from class) ⭐⭐⭐
If you have questions while doing reviews
- if it is a general question ("Is XYZ correct?"): @classmates, but don't identify the reviewee
- if it is sensitive: email TA (and CC me)
Review demo + working through Q7¶
Volunteer? I'll submit a full feedback form for your assignment as well.
Numpy summary¶
To use numpy functions, add this to the beginning of your notebook: import numpy as np
Why is there a chapter in the book on it?
Numpy is great for:
- simulations and derivatives
- doing math operations pandas can't
  - Ex: np.median(), np.percentile(), np.floor()
- has features pandas doesn't
  - Ex: np.nan (missing value)
- does all of that fast
np 🤝 pandas
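A minimal sketch of the functions named above, run on a small made-up array (the values are illustrative assumptions, not course data):

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = np.array([1.5, 2.5, 3.5, 10.0])

median = np.median(values)         # middle of the sorted values
p75 = np.percentile(values, 75)    # 75th percentile (linear interpolation by default)
floored = np.floor(values)         # round each element down to a whole number

# np.nan marks a missing value: plain aggregations propagate it,
# while the nan-aware variants skip it
with_missing = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(with_missing)     # result is nan
safe_mean = np.nanmean(with_missing)   # mean of the non-missing values

print(median, p75, floored, plain_mean, safe_mean)
```

The nan-aware variants (np.nanmean, np.nanmedian, and so on) are worth knowing because real datasets almost always have gaps.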
NP reference¶
numpy.org/doc has pages with
- the "absolute basics" start here (very good)
- quickstart guide (next)
- how-to section
In-class NP examples¶
- Indexing looks the same (at least for 1D arrays):
import numpy as np
myray = np.arange(15) # create array
print("myray:", myray)
print("slice:", myray[6:11]) # pick the elements at index 6 through 10
# Q1: pick the odd elements
myray: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
slice: [ 6 7 8 9 10]
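One possible approach to Q1, sketched under the assumption that "odd" could mean either odd positions or odd values (for this particular array the two coincide):

```python
import numpy as np

myray = np.arange(15)

# start:stop:step slicing, the same syntax as a Python list
odd_positions = myray[1::2]           # every second element, starting at index 1

# a boolean condition selects by value rather than by position
odd_values = myray[myray % 2 == 1]    # keep elements whose remainder mod 2 is 1

print("by position:", odd_positions)
print("by value:   ", odd_values)
```

The step-slice version depends on where elements sit in the array; the value-based version keeps working if the array is shuffled or starts at a different number.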
- Booleans and masking
# create a random vector (every run of this --> diff #s)
from numpy.random import default_rng
rg = default_rng()
myray = rg.standard_normal(5)
print("myray:", myray)
# Q2: how can you always select the positive elements from this?
# prof demo: booleans, a single condition-->bool,
# using booleans on an array/list,
# indexing/filtering via booleans as "masks"
# then answer
myray: [ 1.33145787 0.54058088 -0.06731178 -0.88356257 -0.09785319]
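A minimal sketch of one way to answer Q2 with a boolean mask. The seed passed to default_rng is an assumption added here so that reruns give the same numbers; the in-class version is unseeded.

```python
import numpy as np
from numpy.random import default_rng

rg = default_rng(0)             # seed is an assumption, added for reproducibility
myray = rg.standard_normal(5)

is_positive = myray > 0         # boolean array: one True/False per element
positives = myray[is_positive]  # keep only the elements where the mask is True

print("myray:    ", myray)
print("mask:     ", is_positive)
print("positives:", positives)
```

Because the mask is recomputed from the data, the selection stays correct no matter which random numbers are drawn.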
What we just learned about boolean masking works directly in Pandas to filter data based on criteria!¶
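As a rough illustration of the same idea in pandas, here is a sketch on a toy DataFrame. The column names and values are hypothetical and are not taken from the exercise file.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are made up for illustration
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "ret": [0.02, -0.01, 0.05, -0.03],
})

mask = df["ret"] > 0     # boolean Series, one True/False per row
winners = df[mask]       # rows where the mask is True

# combine conditions with & (and) or | (or), wrapping each in parentheses
big_winners = df[(df["ret"] > 0) & (df["ret"] > 0.03)]

print(winners)
print(big_winners)
```

The parentheses around each condition matter: & binds more tightly than the comparison operators, so leaving them out typically raises an error.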
PANDAS¶
- 3.2.0, 3.2.1
- How do I do X? 3.2.3 (!!!) and 3.2.7
Let's learn to use pandas by working with real data!
Exploring the incredible FRED dataset¶
FRED is https://fred.stlouisfed.org/
- Unreal repository of data: Download, graph, and track 786,000 US and international time series from 103 sources.
- Check out "at a glance" and "popular series"
- Usable in other classes: easy to download, clean, modify, analyze, and plot, all in seconds of coding!
Panda basics¶
Now, let's go back to the exercise file and stop right before EDA.
A quick quiz¶
Student demos¶
- Everyone will do a couple throughout the semester
- Describe your approach (pseudocode) and then show and explain code step by step
- Low stakes - typically small problems
- Varieties: single or team show-and-tell, compare/contrast/discuss
Next class¶
We will resume the pandas exercises next class, ⭐⭐ during which we will have our first student-led demos. ⭐⭐
- Class 1:
- @bmnguyen6403 should solve Q0-Q3
- @MariaMaragkelli should solve Q4 and try Q5
- Class 2:
- @leosc326 should solve Q0-Q3
- @ZiggyFloydLee and @acg425 should solve Q4 and try Q5
The demo schedule is here: https://github.com/LeDataSciFi/ledatascifi-2024/discussions/3
Do these in the pandas exercise file and sync it - I'll open your class notes repo when we discuss your solutions.