Packages

Unit 07

Last week you learned how to re-use code by organizing it in a module and importing the module into your notebook or script. However, our solution of placing the module in your current working directory is neither convenient nor sustainable in the long run. Suppose you continue to add code to our module solar and at some point you need the module for different projects. You could always copy and paste the latest version of your module into the different working directories of your projects, but I assure you that it won’t be long until you lose track of which version is used where. (Hold on a second, did I apply the bug fix to the version in that other project?, etc.)

The solution to this problem is creating and installing your own package.

Basically, a package is a collection of modules. All the different libraries we have been using throughout this lecture, like numpy or matplotlib, are organized as packages. Creating your own package will allow you to install it into your conda environment and import it from any project directory. This is super powerful, because all of a sudden you have one directory that contains your package. Whenever you add to a specific module in your package, fix a bug, implement a more accurate equation, etc., you can immediately use the new functionality in all projects that use your package, no matter where these project directories are located on your computer. Suppose you develop your own analysis approaches during your thesis. Organizing all your functions in a package and importing it in your analysis notebooks will allow you and your supervisors to understand your computations much better, make it easier to experiment with different analysis approaches, and let you share your code after you’re finished so that other students or companies can benefit from it.

Your own package

The remainder of this section is adapted from Fabien Maussion’s lecture material.

Fabien has written a demo package called “scispack” – a simple scientific python package template. I modified the template for our purposes and reduced it to the bare minimum that I want you to take home from this topic. The modified package template we will work with is called “mytoolbox”, and you can download it from ILIAS. Place it in your $SCIPRO directory.

Package structure

Directory root (./)

  • LICENSE.txt: always license your code
  • README.md: contains parts of this webpage as a markdown file (*.md). This file type contains only markdown text (as opposed to the *.ipynb files that we use to mix python code with markdown text)
  • setup.py: this setup file is what makes your package installable by a package manager. It contains a set of simple instructions, e.g. the name of the package, its version number, etc.

The actual package (./mytoolbox)

  • __init__.py: tells python that the directory is a package and enables the “dotted module names” import syntax (e.g., import matplotlib.pyplot as plt). Like in our case, this file is often empty.
  • solar.py: the first module in our package
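
To make the layout above more tangible, here is a minimal sketch that recreates it in a temporary folder. The contents of setup.py (version number, description) and the placeholder text in the other files are assumptions for illustration; the files in the actual mytoolbox template may look different.

```python
import tempfile
from pathlib import Path

# Assumed, minimal content of setup.py (the real template may differ).
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="mytoolbox",            # name shown by pip and conda list
    version="0.1.0",             # assumed version number
    description="Personal collection of analysis functions",  # assumed text
    packages=find_packages(),    # picks up folders containing __init__.py
)
'''

# Directory root (./) and the actual package (./mytoolbox)
root = Path(tempfile.mkdtemp()) / "mytoolbox"
package_dir = root / "mytoolbox"
package_dir.mkdir(parents=True)

(root / "LICENSE.txt").write_text("(license text goes here)\n")
(root / "README.md").write_text("# mytoolbox\n")
(root / "setup.py").write_text(SETUP_PY)
(package_dir / "__init__.py").write_text("")  # empty, as in our case
(package_dir / "solar.py").write_text('"""Placeholder for the solar module."""\n')

# Print the resulting layout relative to the root directory
layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
for entry in layout:
    print(entry)
```

The script only writes files; nothing is installed. It is meant to show which file lives where.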

Installing your package

Background: sys.path

Generally speaking, how does python know where to look for modules when you type import mymodule? It relies on a list of directory paths called sys.path, which includes the directories your package manager uses to store packages.

import sys
sys.path
['/home/flo/documents/fhv/LV_programming_NES/WS2023_24/webpage',
 '/home/flo/miniconda3/envs/scipro2023/lib/python311.zip',
 '/home/flo/miniconda3/envs/scipro2023/lib/python3.11',
 '/home/flo/miniconda3/envs/scipro2023/lib/python3.11/lib-dynload',
 '',
 '/home/flo/miniconda3/envs/scipro2023/lib/python3.11/site-packages',
 '/home/flo/documents/fhv/LV_programming_NES/WS2023_24/mytoolbox']

Python will look into each of these directories in order. The first entry usually refers to the current working directory (sometimes as an empty string); the rest of the list may vary depending on your environment. When a file called mymodule.py is found, it is imported once (and only once) and added to the sys.modules dictionary, effectively telling python that the module has already been imported. This means that if you change the file and import it again, nothing will change. Fortunately, there are ways to avoid this behavior (e.g., see reload in Unit 5).
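
The "imported once and only once" behavior can be tried out with a small sketch. The module name demo_cached_module and its content are made up for this illustration, and the file is written to a temporary folder so that nothing in your project is touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, created in a temporary folder for this demo
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_cached_module.py"
module_file.write_text("VALUE = 1\n")

sys.path.insert(0, str(tmp_dir))  # make the folder importable (this session only)
importlib.invalidate_caches()     # let the import system notice the new file

import demo_cached_module
first_value = demo_cached_module.VALUE

# Change the file on disk. The new text has a different length on purpose,
# so that python cannot mistake its cached bytecode for the current version.
module_file.write_text("VALUE = 200\n")

import demo_cached_module         # no effect: the module is already in sys.modules
value_after_reimport = demo_cached_module.VALUE

importlib.reload(demo_cached_module)  # re-executes the file
value_after_reload = demo_cached_module.VALUE

print("demo_cached_module" in sys.modules)
print(first_value, value_after_reimport, value_after_reload)
```

The second import statement leaves the value unchanged; only the explicit reload picks up the edited file.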

As you can see, there are many folders related to miniconda, the tool we used to install python. This makes sense, because we want python to look for modules related to our installation of python (not the one used by our operating system). In particular, the site-packages folder is of interest to us. If you look into this folder using the terminal you’ll find the many packages we already installed together in Workshop 1.
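
If you prefer to stay in python instead of switching to the terminal, the following sketch shows one possible way to locate the site-packages folder of the active environment and peek into it. It uses the standard library module sysconfig; the exact path and the entries you see depend on your installation.

```python
import sysconfig
from pathlib import Path

# "purelib" is the folder where pure-python packages get installed
site_packages = Path(sysconfig.get_paths()["purelib"])
print(site_packages)

# List what is stored there (empty list if the folder does not exist)
entries = sorted(p.name for p in site_packages.iterdir()) if site_packages.is_dir() else []
print(len(entries), "entries, for example:", entries[:5])
```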

You can edit sys.path as you wish and add new folders to it. This can come in handy if you have a module in a different directory than your current working directory and you want to import it. In practice, however, it is recommended to use standard folders to install your packages (see below) instead of messing around with sys.path. The following command will add the folder to the current session or notebook only:

import sys
sys.path.append('/path/to/a/folder')
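
A slightly longer sketch of the same idea, showing that the import fails before the folder is added and works afterwards. The folder is a temporary one and the module name demo_extra_module is hypothetical; both exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical folder and module, created only for this demonstration
extra_dir = Path(tempfile.mkdtemp())
(extra_dir / "demo_extra_module.py").write_text("GREETING = 'hello'\n")

try:
    import demo_extra_module
    found_before = True
except ModuleNotFoundError:
    found_before = False  # the folder is not listed in sys.path yet

sys.path.append(str(extra_dir))  # affects the current session only
importlib.invalidate_caches()

import demo_extra_module
print(found_before, demo_extra_module.GREETING)
```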

Installing a local package

To install our local package, we need to make it available from a directory that is listed in sys.path. This is exactly what package managers do as well. So far, we have used the package manager conda. For this task, we will install the local package with a different package manager, called pip, but into our existing conda environment scipro2023.

Follow the steps below one by one. Each one is important and cannot be left out.

  1. Open a terminal and navigate to the package’s root directory.
  2. Make sure the conda environment you want to install the package into (in this case scipro2023) is activated.
  3. Run pip install -e .
  4. (optional) Verify that the package is installed by checking whether it’s listed in the output of conda list.

The pip install -e . command will look for a setup.py file in the current folder (this is why the dot . is used) and, if found, use it to determine the package’s name and other installation options. The -e argument installs the package in “editable” or “development” mode. In simple terms, this option will create a link to the package directory instead of copying the files (this is why the mytoolbox directory shows up as the last entry of the sys.path output above). Therefore, any changes to the code will always be available the next time you import or reload the respective module.
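
Besides conda list, you can also ask python itself where a package would be imported from. The helper locate_package below is a hypothetical name introduced for this sketch; it relies on importlib.util.find_spec from the standard library. After an editable install, the reported location for mytoolbox should point into your own package directory rather than into site-packages.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # python cannot find the package in sys.path
    return spec.origin


# "json" ships with python and serves as a comparison;
# "mytoolbox" is only found once it has been installed.
for name in ("mytoolbox", "json"):
    location = locate_package(name)
    print(name, "->", location if location else "not found in this environment")
```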

pip install -e . is the recommended way to install any local package, in pip or in conda environments. On computers where you don’t have super-user permissions, use pip install --user -e .

Outlook for the motivated ones among you

If you’re interested in implementing slightly more sophisticated packages, feel free to check out scispack and the associated lecture material. You will get a glimpse of how to

  • implement the package as a git repository to keep track of your development versions
  • implement command line tools for the terminal from your python package
  • write more thorough package documentation in case you plan to share your package publicly
  • implement package tests to ensure your functions yield the expected results
  • execute modules as scripts (this is located at a different link, here)

Learning checklist

  • I know the difference between a module and a package.
  • I am aware of the advantages of packaging my own code.
  • Following the step-by-step instructions above, I can install my own package into a conda environment in editable mode.