Outlook

Unit 10

Parts of this section are inspired by Fabien Maussion’s lecture material.

Hey you, High Five! This is the last section in your Introduction to Scientific Programming course. Over the last weeks, you have learned many new concepts and skills and dove into a whole new world that will enable you to apply your engineering knowledge in a very effective and powerful way. You can be very proud of yourself!

This last section consists of two main parts. The first part, Big bang for your buck, introduces two new tools that are fairly easy to adopt but will help make your coding experience more fun and your results more robust. The second part, A glimpse outside the box, provides some information that lies beyond the scope of this lecture but can be relevant for your future programming endeavors. If you enjoy programming and want to continue developing your skills, you will find some interesting avenues to pursue.

Big bang for your buck

Testing a function’s output with doctest

You may have noticed the Examples section in the docstring of some functions we coded in our lecture.

The Examples section uses a specific syntax (the >>>) to signify that we are documenting Python code. This can actually be run and checked by a tool called doctest.

Using it can help discover bugs in our documentation or in the function itself. For example:

import doctest

def hours2mins(hours):
    """Convert hours to minutes.

    Examples
    --------
    >>> hours2mins(2)
    120.0
    """
    mins = hours * 60.2
    return mins

doctest.testmod()

Running this produces the following report:

**********************************************************************
File "__main__", line 8, in __main__.hours2mins
Failed example:
    hours2mins(2)
Expected:
    120.0
Got:
    120.4
**********************************************************************
1 items had failures:
   1 of   1 in __main__.hours2mins
***Test Failed*** 1 failures.
TestResults(failed=1, attempted=1)

The doctest is pointing me to a problem in my code. We expect 2 hours to convert to 120 minutes. The code, however, contains a typo and uses a conversion constant of 60.2 instead of 60.0. Hence, doctest tells me that either my documentation is wrong, or there is a mistake in the code.

Doctests are very useful to:

  • document a function’s behavior with examples, which is often the best way to explain
  • test if the function is working as expected and discover future bugs if the internal code changes
  • test if the documentation is still correct after internal code changes

If you want to integrate doctests into your own modules/packages, check out scispack to learn how this can be done.

The python debugger

At some point in our lecture, the code we wrote started to get nested and more complicated. There were loops, functions, etc. Writing such code inevitably comes with bugs that we need to debug before we can actually use the program. It is often helpful to assemble your complicated program step-by-step and test each unit before trying to run all the code at once. However, sometimes this is exactly what you did, but there is still a bug somewhere. You can go back to the strategies you learned in Workshop 1 to find the bug. Here is another suggestion for a tool that will help you find your bug interactively: the Python Debugger (pdb), a powerful interactive debugging tool that comes built-in with Python. It allows you to set breakpoints, which will stop code execution at the scope of the breakpoint so that you can interactively inspect variables and step through code to find your bug. Here’s a quick tutorial to get you started:

  1. Starting the debugger

There are several ways to start pdb:

  • import pdb and then pdb.set_trace() in your script where you want to start debugging. When the Python interpreter hits this line during execution, it will pause and enter debugging mode.
  • or, if your script crashes, you can load pdb after the fact to inspect the state at the time of the crash: pdb.pm() (for post-mortem)

  2. Basic commands while debugging

Once in pdb, you can use various commands to navigate and inspect your code:

  • l (list): Shows the current location in the file with some context.
  • n (next): Execute the next line of code.
  • s (step): Step into functions called at the current line.
  • c (continue): Continue normal execution until the next breakpoint.
  • b (break): Set a breakpoint. For example, b 15 sets a breakpoint at line 15.
  • p (print): Print the value of an expression. For example, p my_var prints the value of my_var.
  • q (quit): Exit from the debugger and end the program.
Your turn

Take one of the Python scripts we wrote in the early days of the lecture and insert a pdb.set_trace() at some location. Run the script and explore what the debugger enables you to do. After you’ve done that, try the same with a notebook.

Afterwards, check out the examples from the section on namespaces and scopes one more time. They offer an interesting playground for exploring the capabilities of pdb, see below:

import pdb

a = 'global value'

def outer():
    a = 'enclosed value'

    def inner():
        a = 'local value'
        print(a)

    pdb.set_trace()
    inner()

outer()

A glimpse outside the box

Useful Python packages and libraries

Useful standard-library packages include:

  • os: interface with the underlying operating system (create directories, check whether files exist, etc.).
  • multiprocessing: create parallel processes that run on multiple cores of your laptop to run simulations faster.
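As a small illustration of the os module, the sketch below creates a directory and checks that it exists (the folder name results is made up for demonstration; tempfile keeps everything inside a throwaway directory):

```python
import os
import tempfile

# Work inside a temporary directory so we do not clutter the file system
base = tempfile.mkdtemp()
results_dir = os.path.join(base, "results")  # hypothetical folder name

# Create the directory only if it does not exist yet
if not os.path.exists(results_dir):
    os.makedirs(results_dir)

print(os.path.exists(results_dir))  # True
print(os.listdir(base))             # ['results']
```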

Scientific Python stack:

  • scipy is the scientist’s toolbox. It is a very large package that covers many aspects of the scientific workflow, organized in submodules, each dedicated to a specific aspect of data processing. For example: scipy.integrate, scipy.optimize, or scipy.linalg.
  • scikit-learn: machine learning tools.
  • seaborn: statistical data visualization in conjunction with pandas.
  • xarray: similar to pandas, but for N-dimensional arrays and more complex file formats like NetCDF, HDF5, etc.
  • pandasql and pyodbc: database interfaces
  • geopandas: extension of pandas useful for spatial data

Packages from the renewable energy sector:

  • pvlib python: simulating the performance of PV systems
  • windrose: polar wind roses useful for wind analysis
  • PySAM: a wrapper for the National Renewable Energy Laboratory’s System Advisor Model (SAM), a performance and financial model that estimates the cost of energy for grid-connected power projects from installation costs, operating costs, and system design, supporting decision making in the renewable energy industry.

Programming tutorials, knowledge base, etc

You have learned about and from many different resources throughout this lecture. This is fantastic, because it trains you to be agile in learning programming skills. Another website I can highly recommend, which I have not included in our material, is

Check it out if you want to dig deeper, or recap the basics with different exercises, descriptions, etc.

Object oriented programming

Adapted from Fabien Maussion’s lecture material.

Object-oriented programming (OOP) is a programming paradigm that uses “objects” to design applications and computer programs. Although Python is an OOP language at its core, it does not enforce its usage. In fact, you can design powerful code without having to write any OOP specific code. However, everything is an object in Python, so we all make heavy use of OOP code all the time. Python’s approach to OOP centers around the creation and manipulation of objects, which are instances of classes. Classes serve as blueprints for objects, defining their properties (known as attributes) and behaviors (represented by methods).

In this lecture, we have focused on functional programming (i.e., separating code that we want to re-use into functions). OOP represents a different style of programming that allows you to bundle data and behaviors into individual objects, possibly helping you to organize your code in a way that feels more natural and clear. In short, OOP is simply another way to structure your programs. And remember, it’s more a paradigm than a true necessity.
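To give you a first impression of what bundling data and behavior looks like, here is a minimal sketch of a class (the names and numbers are invented for illustration): the attributes store each object’s data, and the method operates on that data.

```python
class WindTurbine:
    """A tiny example class: a blueprint for wind-turbine objects."""

    def __init__(self, name, rated_power_kw):
        # attributes: the data bundled into each object
        self.name = name
        self.rated_power_kw = rated_power_kw

    def energy_kwh(self, hours, capacity_factor=0.4):
        # method: behavior that operates on the object's own data
        return self.rated_power_kw * hours * capacity_factor

# Instantiate the class: each instance is an independent object
turbine = WindTurbine("T1", rated_power_kw=3000)
print(turbine.name)            # T1
print(turbine.energy_kwh(24))  # 28800.0
```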

Learning to write OOP code can easily fill an entire semester. However, if you are interested, I encourage you to read up on it a little bit to find out whether you want to pursue this avenue or not. Like always, Fabien Maussion has curated a very illustrative series of three lectures on this topic that you can use as a starting point.

Overview of programming languages

Compiled vs. interpreted languages

The main difference between compiled and interpreted languages lies in their execution. Compiled languages like Fortran, C, and C++ are converted into machine code before execution, which makes them faster and more efficient—a critical aspect in large-scale simulations and computations.

On the other hand, interpreted languages like Python and R prioritize ease of development, debugging, and flexibility. They are generally slower but significantly improve development time and offer a vast ecosystem of libraries and tools, making them ideal for data analysis, prototyping, and when the performance is not the primary concern.

1. Fortran

Fortran, a compiled language and one of the oldest programming languages, still holds a prestigious place in scientific computing. Fortran’s longevity and ongoing development have resulted in a vast ecosystem of libraries and tools, particularly for high-performance computing in engineering, physics, and meteorology.

2. C and C++

C and C++ are also compiled languages and are known for their high performance and control over system resources, making them suitable for computationally intensive scientific applications. C++, with its support for object-oriented programming, offers more flexibility and complexity than C. These languages are widely used in simulations, game development, and real-time systems where performance is crucial.

3. Java

Java, a compiled (to bytecode) and interpreted language (by the JVM), strikes a balance between performance and portability. Its “write once, run anywhere” philosophy ensures high cross-platform compatibility. While not as fast as C/C++ or Fortran for numerical tasks, Java’s robustness and ease of use make it suitable for large-scale, distributed scientific applications.

4. R

R, primarily an interpreted language like Python, is a staple in statistical analysis and graphical representation of data. It’s not traditionally known for speed but excels in data analysis tasks due to its comprehensive collection of packages for statistical methods and data visualization.

5. Julia

Julia is a high-level, high-performance, dynamic programming language primarily used for technical and numerical computing. It combines the ease of use of Python and R with the speed of C++, making it particularly well-suited for data science, machine learning, and large-scale numerical analysis. Julia’s syntax is user-friendly, and it handles mathematical notation naturally, which is a big plus for scientists and engineers. A key feature of Julia is its performance, achieved through just-in-time compilation, which makes it comparable in speed to traditionally compiled languages. Julia also integrates easily with other languages, offering a great balance between performance and productivity, and it is becoming increasingly popular in both academia and industry.

6. Python

Python stands out for its simplicity and readability, making it highly accessible for scientists who may not be professional programmers. As an interpreted language, Python can be slower than compiled languages. However, its extensive range of libraries (like NumPy, SciPy, and Pandas) for scientific computing, data analysis, and machine learning has made it a popular choice. Python acts as a ‘glue’ language, allowing for the integration of components written in other languages like C and Fortran, therefore mitigating its performance limitations.

7. SQL

SQL, or Structured Query Language, is a specialized language used for interacting with relational databases, a type of database that organizes data into tables that can be linked by relationships. Unlike general-purpose programming languages like Python, which are designed for a wide range of tasks, SQL is specifically tailored for database management tasks such as querying, updating, and managing data.

Relational databases structure data into tables, each resembling a spreadsheet with rows and columns, very much like our pandas DataFrames. The power of a relational database lies in its ability to efficiently handle much larger volumes of data than Python/pandas can comfortably hold in memory. Relationships between tables are established through common columns, known as keys. This structure is particularly useful for complex queries and any sort of standardized data manipulation.
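You can try this out directly from Python with the built-in sqlite3 module. The sketch below (table and column names invented for illustration) builds two tables linked by a key and runs a query that joins them:

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables linked by a common key column (site_id)
cur.execute("CREATE TABLE sites (site_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE readings (site_id INTEGER, power_kw REAL)")
cur.executemany("INSERT INTO sites VALUES (?, ?)",
                [(1, "north"), (2, "south")])
cur.executemany("INSERT INTO readings VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 50.0)])

# A query joining the two tables on their key and aggregating per site
cur.execute("""
    SELECT sites.name, SUM(readings.power_kw)
    FROM sites JOIN readings ON sites.site_id = readings.site_id
    GROUP BY sites.name
    ORDER BY sites.name
""")
rows = cur.fetchall()
print(rows)  # [('north', 200.0), ('south', 50.0)]
conn.close()
```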

In the context of scientific computing, SQL and relational databases serve a different purpose compared to previously mentioned languages like Python. While Python is used for numerical analysis, algorithm development, data processing and visualization, SQL is used for managing and querying structured data. It excels in scenarios where data is stored in a relational format, and there’s a need for robust data management.