Any questions about anything on this? (Last week we mostly talked about the mechanics of making plots show up. We can discuss anything on the broad topic of viz here.)
We also have time to discuss the AWESOME array of graphs submitted for A3.
Crowded scatterplots: try `.sample()` or `kind='hex'`. Another option: some students plotted various 2D density graphs. I provided examples of both in "plotting exercises" (see the Q5 & Q6 answers).
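A minimal sketch of the first two options, assuming pandas and matplotlib. The data are randomly generated, and the 2% sampling fraction and `gridsize=40` are arbitrary choices for illustration. In seaborn, `sns.jointplot(data=df, x='x', y='y', kind='hex')` draws a similar hexbin view.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical data: 100,000 correlated points, enough to overplot a scatter
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
df = pd.DataFrame({"x": x, "y": x + rng.normal(size=100_000)})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: scatter a random 2% subset so individual points stay visible
small = df.sample(frac=0.02, random_state=0)
ax1.scatter(small["x"], small["y"], s=2, alpha=0.5)
ax1.set_title("Random 2% sample")

# Option 2: bin all points into hexagons and color each cell by its count
hb = ax2.hexbin(df["x"], df["y"], gridsize=40, mincnt=1)
fig.colorbar(hb, ax=ax2, label="count")
ax2.set_title("Hexbin of all points")
fig.tight_layout()
```

Sampling keeps the familiar scatter look but discards data; the hexbin uses every point and shows where observations pile up.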
Check out the visualization references on the resources page for pointers to more material on effective visualization.
Check out section 3.3.5 in the book for a quickstart on customizing your figures the next time you want to fine-tune plots (and then google a lot, since no sane human memorizes matplotlib...).
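For a taste of what fine-tuning looks like, here is a sketch of a few common matplotlib customizations (size, reference line, labels, ticks, percent formatting). It is not necessarily what section 3.3.5 covers, and the returns are made-up numbers.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; not needed in Jupyter
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# Hypothetical data, for illustration only
years = [2019, 2020, 2021, 2022]
returns = [0.05, -0.02, 0.11, 0.07]

fig, ax = plt.subplots(figsize=(6, 4))          # control the figure size
ax.plot(years, returns, marker="o")             # line with point markers
ax.axhline(0, color="gray", linewidth=0.8)      # reference line at zero
ax.set_title("Annual return (hypothetical data)")
ax.set_xlabel("Year")
ax.set_ylabel("Return")
ax.set_xticks(years)                            # one tick per year, no 2019.5
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))  # 0.05 shown as 5%
fig.tight_layout()                              # reduce clipped labels
```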
Context - starting point: Remember, the class's first two objectives are to:
- obtain, explore, groom, visualize, and analyze data
- make all of that reproducible, reusable, and shareable
Context - right now: At this point, we've covered/added skills
We need to talk about a few more issues before we get properly ambitious.
Context - going forward: We need to introduce a few more skills before we start really running analytical models.
In the "merging exercises" notebook, we have:

```python
import pandas as pd

left_df = pd.DataFrame({
    'firm': ['Accenture', 'Citi', 'GS'],
    'varA': ['A1', 'A2', 'A3']})

right_df = pd.DataFrame({
    'firm': ['GS', 'Chase', 'WF'],
    'varB': ['B1', 'B2', 'B3'],
    'varc': ['C1', 'C2', 'C3']})
```
Let's use Shift+Tab to talk about the parameters of `pd.merge()`.
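Outside Jupyter, where Shift+Tab isn't available, `help(pd.merge)` or the standard library's `inspect.signature` shows the same parameter list. A minimal sketch (the exact parameters printed depend on your pandas version):

```python
import inspect
import pandas as pd

# List each pd.merge parameter with its default (what Shift+Tab pops up in Jupyter)
sig = inspect.signature(pd.merge)
for name, param in sig.parameters.items():
    default = "(required)" if param.default is inspect.Parameter.empty else repr(param.default)
    print(f"{name}: {default}")
```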
`how`: left v. right v. inner v. outer

| option | observations in resulting dataset |
|---|---|
| `how = "inner"` | Observations whose key (the `on` variable) appears in both datasets |
| `how = "left"` | "inner" + all unmatched obs in left |
| `how = "right"` | "inner" + all unmatched obs in right |
| `how = "outer"` | "inner" + all unmatched obs in left and right |
More in the book!
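To see the table in action, here is a small sketch that runs all four `how` options on the toy `left_df`/`right_df` from the "merging exercises" example (redefined so the block runs on its own). The dictionary name `results` is just for illustration.

```python
import pandas as pd

# Toy data from the "merging exercises" notebook
left_df = pd.DataFrame({'firm': ['Accenture', 'Citi', 'GS'],
                        'varA': ['A1', 'A2', 'A3']})
right_df = pd.DataFrame({'firm': ['GS', 'Chase', 'WF'],
                         'varB': ['B1', 'B2', 'B3'],
                         'varc': ['C1', 'C2', 'C3']})

# Merge on 'firm' with each option and print which firms survive
results = {}
for how in ['inner', 'left', 'right', 'outer']:
    results[how] = pd.merge(left_df, right_df, how=how, on='firm')
    print(how, '->', sorted(results[how]['firm']))
```

Only GS survives the inner merge; the left merge keeps Accenture, Citi, and GS; the right merge keeps Chase, GS, and WF; the outer merge keeps all five firms. Variables with no match on the other side are filled with NaN.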
Always specify `how`, `on`, `indicator`, and `validate` inside `pd.merge()`.
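A sketch of a merge that spells out all four, using the same toy data. `indicator=True` adds a `_merge` column recording where each row came from, and `validate='one_to_one'` makes pandas raise `pd.errors.MergeError` if the key repeats on either side. The names `firms_all`, `dup_right_df`, and `validate_caught` are hypothetical.

```python
import pandas as pd

# Toy data from the "merging exercises" notebook
left_df = pd.DataFrame({'firm': ['Accenture', 'Citi', 'GS'],
                        'varA': ['A1', 'A2', 'A3']})
right_df = pd.DataFrame({'firm': ['GS', 'Chase', 'WF'],
                         'varB': ['B1', 'B2', 'B3'],
                         'varc': ['C1', 'C2', 'C3']})

firms_all = pd.merge(left_df, right_df,
                     how='outer',            # keep firms from both sides
                     on='firm',              # the key variable
                     indicator=True,         # add '_merge': left_only / right_only / both
                     validate='one_to_one')  # raise MergeError if a key repeats
print(firms_all['_merge'].value_counts())

# What validate buys you: duplicate a key on the right and the merge refuses to run
dup_right_df = pd.concat([right_df, right_df.iloc[[0]]])  # 'GS' now appears twice
try:
    pd.merge(left_df, dup_right_df, how='outer', on='firm',
             indicator=True, validate='one_to_one')
    validate_caught = False
except pd.errors.MergeError as err:
    validate_caught = True
    print('MergeError:', err)
```

Without `validate`, the duplicated GS row would silently produce an extra row instead of an error.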
After the merge, check that it did what you expected, and give the resulting dataframe a good name. Don't name it "merged"!!!
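One way to do those checks, sketched on the toy data with a left merge. The specific assertions are examples of what "what you expected" might mean here, and `firms_with_varB` is a hypothetical descriptive name.

```python
import pandas as pd

# Toy data from the "merging exercises" notebook
left_df = pd.DataFrame({'firm': ['Accenture', 'Citi', 'GS'],
                        'varA': ['A1', 'A2', 'A3']})
right_df = pd.DataFrame({'firm': ['GS', 'Chase', 'WF'],
                         'varB': ['B1', 'B2', 'B3'],
                         'varc': ['C1', 'C2', 'C3']})

# A descriptive (hypothetical) name instead of "merged"
firms_with_varB = pd.merge(left_df, right_df, how='left', on='firm',
                           indicator=True, validate='one_to_one')

# Check 1: a one_to_one left merge should keep exactly the rows of left_df
assert len(firms_with_varB) == len(left_df)
# Check 2: every left firm survived
assert set(firms_with_varB['firm']) == set(left_df['firm'])
# Check 3: how many rows matched?
print(firms_with_varB['_merge'].value_counts())
# Check 4: which firms found no match on the right?
unmatched = firms_with_varB.loc[firms_with_varB['_merge'] == 'left_only', 'firm'].tolist()
print(unmatched)
```

Here only GS matches, so Accenture and Citi show up as unmatched; whether that is a problem depends on what you expected going in.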