Hello!¶
Today, we are still working on pandas exercises.ipynb
Please fill out the short check-in survey soon: Link here, or find it in the dashboard's tasks.
Data Analysis, AI, and ML are Mostly Data Wrangling.¶
Can we make it fun?
No.
OK, but can we eliminate frustration?
Also no.
However, we can make it WORK. (Also, it's weirdly satisfying once you get into it.)
Data wrangling starts with EDA - Exploratory Data Analysis¶
Student demos¶
- Everyone will do a couple throughout the semester
- Describe your approach (pseudocode) and then show and explain code step by step
- Low stakes - typically small problems
- Varieties: single or team show-and-tell, compare/contrast/discuss
Let's see what our classmates came up with for Q0-Q3.
The cookbook can help you slightly automate your EDA
ydata-profiling (formerly pandas-profiling) can supercharge your EDA but is not a full replacement for ABCD.
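To give a feel for what "slightly automating" EDA can look like, here is a minimal sketch in plain pandas. It is not the course cookbook; the toy dataframe and the helper name `quick_eda` are made up for illustration, and the checks are just a few common first-look items.

```python
import pandas as pd

# Hypothetical toy data standing in for whatever dataset you are exploring
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C"],
    "year": [2020, 2021, 2020, 2021, 2020],
    "sales": [10.0, 12.0, None, 7.5, 3.0],
})

def quick_eda(df):
    """Collect a few first-look facts about a dataframe in one dict.

    Hypothetical helper, not part of pandas or the course materials.
    """
    return {
        "shape": df.shape,                            # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),    # type of each column
        "missing_share": df.isna().mean().to_dict(),  # fraction missing per column
        "n_unique": df.nunique().to_dict(),           # distinct non-missing values
        "n_duplicate_rows": int(df.duplicated().sum()),
    }

report = quick_eda(df)
print(report)
print(df.describe())  # summary statistics for the numeric columns
```

A helper like this only surfaces things to look at. You still have to read the output and decide what it means for your data.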
Also, the link to ASGN2 is on the discussion board.
Q0-Q3¶
Demos
Q4 and Q5¶
Demos - walk us through your attempts!
Afterward, I'll build on these demos to show:
- pseudo code / planning
- iteratively developing the code
- ⭐⭐ "for loop" in pandas = groupby ⭐⭐
- chaining (on one line, or over multiple lines)
  ( # anything between these parens # is "one" line of code )
- assign + lambda
- temporary vs permanent changes to a dataframe
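The items in this list can be sketched together in one small example. The data and column names below are invented for illustration and are not from the exercises. The sketch starts from a pseudocode plan, writes it as an explicit loop, and then rewrites it as a chain that uses groupby and assign + lambda.

```python
import pandas as pd

# Hypothetical toy data: one row per firm-year
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10.0, 12.0, 8.0, 6.0],
})

# Plan (pseudocode):
#   1. sort by firm and year
#   2. within each firm, compute sales growth from the prior year
#   3. average that growth by firm

# Explicit loop: split the data by firm, work on each piece, collect results
loop_result = {}
for firm in df["firm"].unique():
    piece = df[df["firm"] == firm].sort_values("year")
    loop_result[firm] = piece["sales"].pct_change().mean()

# Same plan as a chain: the outer parens make this "one" line of code,
# and groupby does the split / apply / combine that the loop did by hand
chain_result = (
    df
    .sort_values(["firm", "year"])
    .assign(growth=lambda x: x.groupby("firm")["sales"].pct_change())
    .groupby("firm")["growth"]
    .mean()
)
print(chain_result)

# Temporary: 'growth' existed only inside the chain, so df is unchanged
print("growth" in df.columns)  # False

# Permanent: assign the result back to the name to keep the new column
df = df.assign(high_sales=lambda x: x["sales"] > 9)
print("high_sales" in df.columns)  # True
```

The lambda inside assign receives the dataframe as it exists at that point in the chain (here, already sorted), which is why it is handy for columns that depend on earlier steps.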
Prof demo: Q6 and Q7¶
The point here isn't that this all makes perfect sense.
Follow my process.
You can look into the specific code bits later.
This was a huge week¶
Pandas = $ : No ML without EDA, no EDA without pandas
- Working with dataframes: reshaping, creating vars, summarizing data, and more
- Planning/pseudo code > Developing > Chains
- EDA: some of the things to look for + a recipe
- Remember: In pandas, "for-loops" = groupby (usually)
Next week!¶
Bosses: