Pandas
Unit 05
Working with tabular data
Pandas is one of the most famous Python libraries for science, data science, and machine learning. It excels at manipulating tabular data, i.e. the kind of data you would find in a spreadsheet, a CSV file, or the records of a weather station.
Pandas is a fantastic library: it takes minutes to grasp the fundamentals and years to master, which is what I like most about it ;-).
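To make "tabular data" a bit more concrete, here is a minimal sketch of the two core pandas objects and of reading a table from CSV text. The station names and values are made up for illustration, and the CSV is read from an in-memory string so that the example runs without any file on disk; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# A Series is a one-dimensional array with labels (the index).
temperature = pd.Series([2.5, 4.0, 3.0], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame is a table: each column is a Series, all sharing one index.
# The values below are invented for this example.
stations = pd.DataFrame(
    {
        "station": ["A", "B", "C"],
        "elevation_m": [578, 1920, 3105],
        "temp_c": [8.1, 1.4, -6.3],
    }
)

# The same table written as CSV text. io.StringIO makes the string behave
# like an open file, so no file on disk is needed here.
csv_text = """station,elevation_m,temp_c
A,578,8.1
B,1920,1.4
C,3105,-6.3
"""
from_csv = pd.read_csv(io.StringIO(csv_text))

print(temperature)
print(stations)
print(from_csv.dtypes)
```

Printing the dtypes is a quick way to check that pandas read each column as the type you expect (numbers as numbers, not as text).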
Getting started with pandas
For today, I strongly recommend following the various tutorials on the getting started page of the pandas documentation. Have a look at assignment #5 first, so you know where you need to pay close attention and where you can have a lighter read. Another idea is to solve the tasks while you work your way through the tutorials.
If you’re short on time, focus on the following:
- What kind of data does pandas handle?
- How do I select a subset of a DataFrame?
- How to calculate summary statistics
- How do I create plots in pandas?
I strongly recommend working through the following ones as well, but they go into more complex terrain, so make sure you have the previous ones under your belt first.
Learning checklist
- I know what Pandas Series and DataFrames are.
- I can create Series and DataFrames or read tabular data from csv files.
- I can subset DataFrames using the loc/iloc operators, or subset individual columns.
- I can compute summary statistics using pandas.
- I can plot data either by using pandas methods or matplotlib methods.
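If you want to test yourself against the checklist, the sketch below touches the subsetting, summary statistics, and plotting items in a few lines. The weather readings and column names are invented for illustration; this is one possible way to do each step, not the only one, and the plotting line assumes matplotlib is installed.

```python
import pandas as pd

# Made-up daily weather readings, for illustration only.
df = pd.DataFrame(
    {
        "temp_c": [2.5, 4.0, 3.0, 6.5],
        "precip_mm": [0.0, 1.5, 12.0, 0.0],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Subset a single column: the result is a Series.
temp = df["temp_c"]

# loc selects by label: rows by index label, columns by name.
# Label slices include both end points.
first_two_days = df.loc["2024-01-01":"2024-01-02", ["temp_c"]]

# iloc selects by integer position: first row, second column.
first_precip = df.iloc[0, 1]

# A boolean mask keeps only the rows where the condition is True.
wet_days = df.loc[df["precip_mm"] > 0]

# Summary statistics: a single number, or an overview table.
mean_temp = df["temp_c"].mean()
summary = df.describe()

# Plot with the pandas method; it returns a matplotlib Axes object,
# which can then be customised with regular matplotlib methods.
ax = df["temp_c"].plot(marker="o", title="Daily temperature")
ax.set_ylabel("Temperature (°C)")

print(first_two_days)
print(wet_days)
print(summary)
```

In a script, call matplotlib.pyplot.show() at the end to display the figure; in a notebook the plot usually appears on its own.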