Plotting figures

Unit 06

Crafting beautiful figures

In the course of this lecture, you have already come across a handful of figures. Now that you have learned a good deal of Python basics, you will want to apply all your new power and creativity to solve tomorrow’s energy challenges! Well, to my ears that sounds like complex data sets and intricate relationships between a variety of variables. To make the most of your data, you will want to look at it. A lot. And that’s what this page is all about: sketching out quick ’n dirty graphs that allow you to check whether your data analysis is on the right track, and crafting beautiful figures that support your findings and convince your audience.

In the following two sections, you will see brief examples of two approaches to creating figures. I recommend the first approach for all figures that you want to style so that they look good; the second approach is great for creating quick ’n dirty working plots.

Matplotlib

matplotlib is the library that implements all the visualization functionality we will work with. Actually, we will only work with one submodule of the package, called pyplot. By convention, the module is imported as

import matplotlib.pyplot as plt

No matter whether you work with NumPy arrays or Pandas DataFrames, you can use matplotlib.pyplot to visualize your results. The following shows a very generic approach to crafting figures. I recommend that you follow this pattern for all figures that you will style and make beautiful.

# Imports
import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

# Plotting
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("An example figure")
plt.show()

You can start basically any figure that you plan to make pretty with the first plotting line, fig, ax = plt.subplots(). This creates one figure with one set of axes. In this function call, you can set specific properties of the figure itself, e.g., the figsize argument, or the number of panels, e.g., fig, (ax1, ax2) = plt.subplots(figsize=(12, 4), ncols=2). This creates a figure that’s 12 inches wide and 4 inches tall. Moreover, it consists of two panels side by side (think of two columns; two rows could be created instead with the argument nrows=2). Whenever you initialize multiple panels, the function returns a NumPy array of axes handles, which I unpacked into individual handles (ax1, ax2) right away.
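The two-panel call described above can be sketched as follows. This is a minimal illustration; the sine and cosine curves are placeholder data chosen only to put something into each panel.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data for illustration
x = np.linspace(0, 2 * np.pi, 200)

# One figure, two panels side by side; figsize is (width, height) in inches
fig, (ax1, ax2) = plt.subplots(figsize=(12, 4), ncols=2)

# Each panel has its own axes handle with its own methods
ax1.plot(x, np.sin(x))
ax1.set_title("sin(x)")

ax2.plot(x, np.cos(x))
ax2.set_title("cos(x)")

# Call plt.show() here to display the figure in an interactive session
```

Swapping ncols=2 for nrows=2 should stack the two panels vertically instead.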

In the next step, you fill the axes with the plots of your choice. You do this by calling methods on the axes handle, e.g., ax.plot(...). Briefly skim this overview of plot types so that you understand the vast pool of options you have.
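To give a feel for how different plot types are called on an axes handle, here is a small sketch with three common ones. The data is made up and seeded, and the selection of scatter, bar, and hist is just my pick from the many methods matplotlib offers.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data, seeded so the figure is reproducible
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=0.0, scale=1.0, size=500)
categories = ["A", "B", "C"]
counts = [3, 7, 5]

fig, (ax1, ax2, ax3) = plt.subplots(figsize=(12, 4), ncols=3)

# Scatter plot: one marker per (x, y) pair
ax1.scatter(values[:100], values[100:200])
ax1.set_title("scatter")

# Bar chart: one bar per category
ax2.bar(categories, counts)
ax2.set_title("bar")

# Histogram: distribution of all values in 30 bins
ax3.hist(values, bins=30)
ax3.set_title("hist")

# Call plt.show() here to display the figure in an interactive session
```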

Once the data is on the chart, you can spend hours styling, labeling, and prettifying the plot to your liking (e.g., which colors to choose?). In the example here, I am happy with just adding a title. This is done with axes methods such as ax.set_title(...). Again, skim this quick start guide to get an understanding of your options and, more importantly, to know where to find inspiration and help for your future plotting endeavors.
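As a small taste of what styling can look like, the sketch below adds colors, line styles, axis labels, a grid, and a legend to the sine example. The particular colors and label texts are my own illustrative choices, not a recommendation.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))

# Color, line style, and legend label are set in the plot call itself
ax.plot(x, np.sin(x), color="tab:blue", linestyle="-", label="sin(x)")
ax.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

# Title, axis labels, limits, grid, and legend are set with axes methods
ax.set_title("An example figure")
ax.set_xlabel("x (rad)")
ax.set_ylabel("amplitude")
ax.set_xlim(0, 2 * np.pi)
ax.grid(True)
ax.legend()

# Call plt.show() here to display the figure in an interactive session
```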

At the end, you will either want to display the figure right away with plt.show(), or save it to a file with plt.savefig(...), which takes the output file name as its first argument.
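Saving to a file could look like the following sketch. I call savefig on the figure handle here, which does the same job as plt.savefig for the current figure. The output path, the dpi value, and the bbox_inches setting are assumptions for illustration; adjust them to your project.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("An example figure")

# Hypothetical output location; replace with a path inside your project
out_path = os.path.join(tempfile.gettempdir(), "example_figure.png")

# dpi sets the resolution; bbox_inches="tight" trims surplus whitespace
fig.savefig(out_path, dpi=300, bbox_inches="tight")

# Close the figure to free memory once it has been written to disk
plt.close(fig)
```

The file format is inferred from the extension, so a name ending in .pdf or .svg should produce a vector graphic instead of a PNG.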

Quick ’n dirty viz

While analyzing data, you often need to look at it to make more sense of it. I encourage you to look at your data as much as possible! The plots that you create to keep your analysis momentum going are working plots that don’t need to be pretty. Often you will have a look at a plot, learn what you need from it, and never look at it again. For these plots, you don’t need to spend time making them look good. On the contrary, you want to spend as little time as possible creating these working plots.

Luckily, when we do data analysis with Pandas, we can choose to create quick ’n dirty working plots even faster than what we saw above. Pandas implements methods for DataFrames and Series that make use of matplotlib’s plotting capabilities. So you can create quick ’n dirty working plots as easily as, e.g.,

# df is a pandas DataFrame that has a column named 'variable'
df['variable'].plot()
df['variable'].plot.hist()
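To try these two calls yourself, you need an actual DataFrame. The sketch below builds one from made-up random data; the column name 'variable' simply mirrors the placeholder above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: a random walk stored in a column named 'variable'
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({"variable": rng.normal(size=200).cumsum()})

# Line plot of the column against the DataFrame index
ax_line = df["variable"].plot()

# Start a new figure so the histogram does not land on top of the line plot
plt.figure()
ax_hist = df["variable"].plot.hist(bins=20)

# Call plt.show() here to display both figures in an interactive session
```

Both calls return a matplotlib axes handle, which is why they are kept in ax_line and ax_hist here.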

The downside of this approach is that you do not have all the flexibility and customizability of the first approach shown earlier. Therefore, I recommend that you use this approach only for quick working plots. Once you know what kind of plot you need for sharing and publication, rewrite your plot using the first approach and style it.

If your original data structure is more complex and you need to filter and reshape your DataFrame, your working plots become a bit more complex. Often you will have to write a sequence of method calls (still one-liners!) that bring the data into the correct format for the quick ’n dirty plotting call. We will get to this in a few units.
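As a preview, such a chain of method calls might look like the sketch below. The table, its column names (year, region, demand), and the numbers are entirely hypothetical; the point is only the pattern of filtering, reshaping, and plotting in one expression.

```python
import pandas as pd

# Hypothetical long-format table: one row per region and year
df = pd.DataFrame({
    "year":   [2020, 2020, 2021, 2021, 2022, 2022],
    "region": ["north", "south", "north", "south", "north", "south"],
    "demand": [10.0, 14.0, 11.0, 15.0, 13.0, 16.0],
})

# Filter rows, reshape to one column per region, then plot in one chain
ax = (
    df[df["year"] >= 2021]
    .pivot(index="year", columns="region", values="demand")
    .plot(marker="o")
)

# Call matplotlib.pyplot.show() to display the figure in an interactive session
```

After the pivot, each region is its own column, so the plotting call draws one line per region.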

Please work through the first of the following Pandas tutorials and quickly peek into the second one:

External resources

In addition to the tutorial just above, the following external links are referenced in the text of this page:

  • Overview of matplotlib’s plot types
  • Matplotlib’s names for colors
  • Official matplotlib quick start guide: skim it for an overview and inspiration, and make it your first address for learning ‘how-to’

Learning checklist

  • I know that I can create visuals with the matplotlib library.
  • I distinguish between working plots that are meant to allow me insight into my data and styled plots that I will share in reports, theses, etc.
  • I am aware that crafting beautiful figures is like an art, and that it will take time to familiarize myself with the different plot types, commands, etc.
  • I know where to find help when I want to change specific aspects of my plot, or when I want to browse through plotting options that matplotlib offers.
  • I know that matplotlib integrates with Pandas, and that several DataFrame plotting methods exist whose syntax is very similar to pure matplotlib syntax.
  • I can plot data either by using pandas methods or matplotlib methods.