Pandas

Unit 05

Working with tabular data

Pandas is one of the most widely used Python libraries for science, data science, and machine learning. It excels at manipulating tabular data, i.e. the kind of data you would find in a spreadsheet, a CSV file, or a table of weather station measurements.
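To make "tabular data" a bit more concrete, here is a minimal sketch of a small table of daily weather station readings held in a pandas DataFrame. The column names and values are invented for illustration; the point is only to show that a DataFrame is a 2D table and that each of its columns is a Series.

```python
import pandas as pd

# Hypothetical daily readings from a weather station (values invented for illustration)
data = {
    "date": pd.date_range("2024-01-01", periods=4, freq="D"),
    "temperature_c": [-1.5, -2.0, -1.0, 0.5],
    "wind_speed_ms": [3.2, 2.8, 4.1, 5.0],
}

# A DataFrame is a 2D table: each dictionary key becomes a column name
df = pd.DataFrame(data)

# A single column on its own is a Series (a 1D labeled array)
temps = df["temperature_c"]

print(df)
print(type(temps))
```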

Pandas is a fantastic library: it takes minutes to grasp the fundamentals and years to master, and this is what I like most about it ;-).

Getting started with pandas

For today, I strongly recommend following the tutorials on the "Getting started" page of the pandas documentation. Make sure you have read assignment #5 first, so you know where you need to pay close attention and where a lighter read is enough. You can also solve the assignment tasks as you work your way through the tutorials.

If you’re short on time, focus on the following:

I strongly recommend working through the following ones as well, but they go into more complex terrain, so make sure you have the previous ones under your belt first.

Learning checklist

  • I know what Pandas Series and DataFrames are.
  • I can create Series and DataFrames or read tabular data from CSV files.
  • I can subset DataFrames using the .loc/.iloc indexers, or select individual columns.
  • I can compute summary statistics using pandas.
  • I can plot data either by using pandas methods or matplotlib methods.
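As a quick self-check for the checklist above, the sketch below walks through reading a CSV file, selecting columns, subsetting with .loc and .iloc, and computing summary statistics. It is a minimal example under stated assumptions: the CSV content (stations "A" and "B", temperatures, precipitation) is made up, and it is read from an in-memory string with io.StringIO instead of a real file path.

```python
import io

import pandas as pd

# A small, made-up CSV file; with a real file you would pass its path to pd.read_csv
csv_text = """date,station,temperature_c,precip_mm
2024-01-01,A,-1.5,0.0
2024-01-02,A,0.5,2.3
2024-01-03,B,2.0,0.0
2024-01-04,B,3.5,5.1
"""

# Read the table and use the date column (kept as plain strings here) as the index
df = pd.read_csv(io.StringIO(csv_text), index_col="date")

# Selecting a single column returns a Series
temps = df["temperature_c"]

# Selecting a list of columns returns a DataFrame
subset = df[["temperature_c", "precip_mm"]]

# .loc selects by label: row by index label, column by name
jan2_temp = df.loc["2024-01-02", "temperature_c"]

# .iloc selects by integer position: first two rows, second column (temperature_c)
first_rows = df.iloc[:2, 1]

# A boolean mask inside .loc keeps only the rows from station B
station_b = df.loc[df["station"] == "B"]

# Summary statistics: a single mean, a full description, and a mean per station
print(temps.mean())
print(subset.describe())
print(df.groupby("station")["temperature_c"].mean())
```

The last checklist item, plotting, follows the same pattern: for example, `df["temperature_c"].plot()` draws a line plot through matplotlib, which needs to be installed. It is left out of the sketch so that it runs without a display.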