Matplotlib

Unit 05

Crafting beautiful figures

In the course of this lecture, you have already come across a handful of figures. Now that you have learned a good deal of Python basics, you will want to apply all your new power and creativity to solve tomorrow’s energy challenges! Well, to my ears that sounds like complex data sets and intricate relationships between a variety of variables. To make the most of your data, you will want to look at it. A lot. And that’s what this page is all about: sketching out quick ’n dirty graphs that allow you to check whether your data analysis is on the right track, and crafting beautiful figures that support your findings and convince your stakeholders.

matplotlib

matplotlib is the library that implements all the visualization functionality we will work with. Actually, we will only work with one submodule of the package, called pyplot. By convention, it is imported as

import matplotlib.pyplot as plt

No matter whether you will work with NumPy arrays or pandas DataFrames, you can use matplotlib.pyplot to visualize your results. However, there are still many roads that lead to Rome, or in other words, many different ways of ending up with the same (or similar) graph.

In the following, you will see one example of a generic approach that I recommend you follow for figures that you want to style so that they look good. We will briefly look into the quick ’n dirty way in our next topic on pandas.

# Imports
import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

# Plotting
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("An example figure")
plt.show()

You can start almost any figure that you plan to make pretty with the first plotting line, fig, ax = plt.subplots(). This creates one figure with one set of axes. In this function call, you can also set properties of the figure itself, e.g., its size via the figsize argument, or the number of panels, e.g., fig, (ax1, ax2) = plt.subplots(ncols=2).
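As a minimal sketch of these two options, the snippet below asks for two panels side by side and sets the figure size at the same time. The data and the panel titles are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One figure with two panels side by side.
# figsize is given as (width, height) in inches.
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(8, 3))

# Each panel has its own axes handle
ax1.plot(x, np.sin(x))
ax1.set_title("sin(x)")
ax2.plot(x, np.cos(x))
ax2.set_title("cos(x)")
```

In a script, you would add plt.show() at the end to display the figure.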

In the next step, you fill the axes with the plots of your choice. You do this by calling methods on the axes handle, e.g., ax.plot(...). Briefly skim this overview of plot types so that you understand the vast pool of options you have.
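To give a feeling for this, here is a small sketch with three common plot types, each drawn by a different axes method. The random numbers and the bar categories are invented sample data, not part of any real data set.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data (seeded so the result is reproducible)
rng = np.random.default_rng(seed=0)
values = rng.normal(size=200)

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(10, 3))

# Scatter plot: each value against its successor
ax1.scatter(values[:-1], values[1:])

# Bar chart: one bar per category
ax2.bar(["a", "b", "c"], [3, 5, 2])

# Histogram: distribution of the values in 20 bins
ax3.hist(values, bins=20)
```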

Once the data is on the chart, you can spend hours styling, labeling, and prettifying the plot to your liking (e.g., which colors to choose?). In the example here, I am happy with just adding a title. This is done with an axes method, ax.set_title(...). Again, skim this quick start guide to get an understanding of your options and, more importantly, to know where to find inspiration and help for your future plotting endeavors.
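If you want to go a little further than a title, a sketch like the following shows a few styling calls that come up often. The colors, labels, and line styles are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()

# Colors, line width, and line style are set per plot call;
# the label is what shows up in the legend
ax.plot(x, np.sin(x), color="tab:blue", linewidth=2, label="sin(x)")
ax.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

# Labels, title, legend, and grid are set with axes methods
ax.set_xlabel("x (rad)")
ax.set_ylabel("amplitude")
ax.set_title("A styled example figure")
ax.legend()
ax.grid(True)
```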

At the end, you will either want to display the figure right away with plt.show(), or save it to a file with plt.savefig(...), which needs at least a file name.
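Saving could look like the sketch below. The file name example_figure.png is a hypothetical choice, and dpi and bbox_inches are optional arguments that I find handy. The sketch calls savefig on the figure handle instead of on plt, so it does not depend on which figure is currently active.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("An example figure")

# Hypothetical output file; the extension determines the file format
out_path = Path("example_figure.png")

# dpi sets the resolution, bbox_inches="tight" trims surrounding whitespace
fig.savefig(out_path, dpi=300, bbox_inches="tight")
```

If you want to both save and display a figure in a script, saving first and calling plt.show() afterwards is the safer order.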

Quick ’n dirty viz

While analyzing data, you often need to look at it to make more sense of it. I encourage you to look at your data as much as possible! The plots that you create to keep your analysis momentum going are working plots that don’t need to be pretty. Often you will have a look at a plot, see/appreciate/understand/infer something from it, and never look at it again. For these plots, you don’t need to spend time making them look good. On the contrary, you want to spend as little time as possible creating these working plots.

Please go ahead with the next section on analyzing tabular data using pandas and when you have the pandas basics dialled, come back here for the following note.

I encourage you to make use of the pandas DataFrame and pandas Series methods that directly link to matplotlib’s plotting functionality for your quick ’n dirty plots. Most of these working plots are one-liners that look like

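# df is a pandas DataFrame with a column named 'variable'
df['variable'].plot()       # line plot of the values against the index
df['variable'].plot.hist()  # histogram of the values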
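For a version you can run as is, here is a small sketch with an invented DataFrame. The name df, the column name, and the random data are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 100 random numbers in a single column
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({"variable": rng.normal(size=100)})

# Line plot of the values against the index; returns a matplotlib axes
ax_line = df["variable"].plot()

# Start a new figure so the histogram does not land on top of the line plot
plt.figure()

# Histogram of the same column
ax_hist = df["variable"].plot.hist(bins=15)
```

In a notebook, you would typically put each one-liner in its own cell and skip the plt.figure() call.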

If your original data structure is more complex and you need to filter and reshape your DataFrame, your working plots become a bit more complex. Often you will have to write a sequence of method calls (still one-liners!) that bring the data into the correct format for the quick ’n dirty plotting call. In these cases, the methods .pivot() and .groupby() can be your friends. Go back to the following tutorials if you need to catch up:
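As a sketch of such method chains, the example below reshapes a small long-format table with .pivot() and aggregates it with .groupby() before plotting. The column names and numbers are invented. The axes are passed explicitly via the ax argument so that each plot lands in its own panel.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per year and source
df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "source": ["wind", "solar", "wind", "solar", "wind", "solar"],
    "generation": [10.0, 4.0, 12.0, 6.0, 15.0, 9.0],
})

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9, 3))

# Pivot to wide format (one column per source), then one line per column
df.pivot(index="year", columns="source", values="generation").plot(ax=ax1)

# Mean generation per source as a bar chart
df.groupby("source")["generation"].mean().plot.bar(ax=ax2)
```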

External resources

In addition to the tutorial just above, the following external links are referenced in the text above:

  • Overview of matplotlib’s plot types
  • Matplotlib’s names for colors
  • Official matplotlib quick start guide: skim it for an overview and inspiration, and make it your first stop for ‘how-to’ questions

Learning checklist

  • I know that I can create visuals with the matplotlib library.
  • I distinguish between working plots that are meant to allow me insight into my data and styled plots that I will share in reports, theses, etc.
  • I am aware that crafting beautiful figures is like an art, and that it will take time to familiarize myself with the different plot types, commands, etc.
  • I know where to find help when I want to change specific aspects of my plot, or when I want to browse through plotting options that matplotlib offers.
  • I know that matplotlib integrates with pandas, and that several DataFrame plotting methods exist whose syntax is very similar to pure matplotlib.