Correction of 2nd exam 2024/25

NES—Programmiertechniken (Introduction to scientific programming)

21.03.2025

Please work through the following tasks. You have 75 minutes to complete the exam.
Make sure you have executed all relevant code cells and saved the notebook before the end of the exam.

This is a correction of the exam questions. As always in programming, there are many ways to achieve similar results. My suggestions here follow the coding exercises we discussed in class.

If you want to solve this exam as an exercise, download the uncorrected notebook.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Question 1: Multiple-choice (5 points)

To tick correct answers, replace - [ ] with - [x]

Careful: Wrongly ticked answers will give negative points!

(a) Loops and vectorization

Which of the following statements are true? (2.5 points)

When iterating through a sequence (like a list), you can use the enumerate() function to access both the index and value of each element in the loop.
When you want to iterate over multiple variables at the same time, the range() function allows you to process corresponding elements from different sequences simultaneously.
You can nest for-loops inside one another to iterate over multidimensional data structures like matrices.
The concept of applying one task to an array of numbers without using loops is called vectorization.
Vectorized code usually runs faster and is always much easier and faster to write.
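The statements above can be checked with a short sketch. The lists `names` and `values` are made-up illustration data, not part of the exam. The sketch shows what enumerate() and zip() yield, and compares a for-loop with its vectorized NumPy counterpart.

```python
import numpy as np

# Hypothetical example data, not taken from the exam
names = ["a", "b", "c"]
values = [1.0, 2.0, 3.0]

# enumerate() yields (index, element) pairs
indexed = [(i, name) for i, name in enumerate(names)]

# zip() pairs up corresponding elements of several sequences;
# range() only generates integers and cannot do this on its own
paired = [(name, value) for name, value in zip(names, values)]

# Loop version: square each value one at a time
squares_loop = []
for value in values:
    squares_loop.append(value ** 2)

# Vectorized version: one operation applied to the whole array
squares_vec = np.array(values) ** 2

print(indexed)
print(paired)
print(squares_loop, squares_vec)
```

Vectorized code is usually faster for large arrays, but whether it is easier to write depends on the problem.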

(b) Indexing

Which of the statements about the following code block are true? (2.5 points)

a = np.arange(12).reshape((3, 4))
b = a * 2
nz = np.nonzero(a > 8)
b[nz] = a[nz]

This code produces an array b which doubles the values in a, except where the values are strictly larger than 8. In this case, the values are not doubled.
The same result can be obtained using logical indexing.
The code uses logical indexing.
The code uses positional indexing.
The arrays a and b have four rows and three columns.
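To examine these statements, the code block can be run next to a variant that indexes with the boolean mask directly. The names `b_mask` and `mask` are introduced here for illustration only.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))  # 3 rows, 4 columns

# Version from the question: np.nonzero converts the boolean mask
# into a tuple of row and column index arrays (positions)
b = a * 2
nz = np.nonzero(a > 8)
b[nz] = a[nz]

# Variant using the boolean mask itself as the index
b_mask = a * 2
mask = a > 8
b_mask[mask] = a[mask]

print(a.shape)
print(nz)
print(np.array_equal(b, b_mask))
```

Both variants leave the values 9, 10 and 11 undoubled and double everything else.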

Question 2: Spreadsheet data

Read the dataset global_energy.csv into a DataFrame called df. (1 point)

df = pd.read_csv('global_energy.csv')
df

	Country	Year	Renewable Energy Share (%)	Fossil Fuel Dependency (%)	Industrial Energy Use (%)
0	Canada	2018	50.38	49.540	43.39
1	Germany	2018	53.14	42.270	43.11
2	Russia	2018	48.71	43.800	33.10
3	Brazil	2018	52.16	45.240	36.70
4	UK	2018	45.28	41.320	38.89
...	...	...	...	...	...
245	India	2000	42.31	48.380	39.11
246	Australia	2000	56.51	45.100	35.06
247	China	2000	45.87	44.365	36.89
248	USA	2000	47.52	48.900	38.89
249	Japan	2000	36.53	48.360	42.95

250 rows × 5 columns

How many unique countries does the data set contain? (1 point)

countries = df['Country'].unique().shape[0]

What are the oldest and most recent years recorded in the DataFrame? (1 point)

df['Year'].min(), df['Year'].max()

(np.int64(2000), np.int64(2024))

How many rows does the DataFrame contain with ‘Renewable Energy Share (%)’ more than 50 % and at the same time ‘Fossil Fuel Dependency (%)’ lower than 50 %? (2 points)

np.sum((df['Renewable Energy Share (%)'] > 50) & (df['Fossil Fuel Dependency (%)'] < 50))

np.int64(65)
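A minimal sketch of the same counting logic on a small made-up DataFrame (`df_demo` is hypothetical; the real data comes from global_energy.csv). Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import pandas as pd

# Hypothetical miniature data set with the same column names as the exam data
df_demo = pd.DataFrame({
    "Country": ["Canada", "Germany", "Canada", "Japan"],
    "Renewable Energy Share (%)": [50.38, 53.14, 48.00, 36.53],
    "Fossil Fuel Dependency (%)": [49.54, 42.27, 51.00, 48.36],
})

# Combine both conditions element-wise into one boolean mask
mask = (df_demo["Renewable Energy Share (%)"] > 50) & (
    df_demo["Fossil Fuel Dependency (%)"] < 50
)

# True counts as 1, so the sum is the number of matching rows
n_rows = int(mask.sum())

# nunique() is an alternative to unique().shape[0]
n_countries = df_demo["Country"].nunique()

print(n_rows, n_countries)
```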

Create a subset df_sub of the DataFrame df that contains only records from Canada and USA, and only the following columns:
‘Year’, ‘Renewable Energy Share (%)’, ‘Fossil Fuel Dependency (%)’

Make sure you do this in a way that avoids any potential side effects.

(2 points)

df_sub = df.loc[df['Country'].isin(['Canada', 'USA']), ['Year', 'Renewable Energy Share (%)', 'Fossil Fuel Dependency (%)']].copy()
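The effect of the trailing .copy() can be illustrated on a small made-up DataFrame (`df_demo` and `df_sub_demo` are hypothetical names). With the explicit copy, later changes to the subset cannot alter the original data, and pandas has no reason to emit a SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical miniature data set
df_demo = pd.DataFrame({
    "Country": ["Canada", "USA", "Japan"],
    "Year": [2000, 2000, 2000],
    "Renewable Energy Share (%)": [50.0, 47.5, 36.5],
})

# Select rows and columns in one .loc call, then make an independent copy
df_sub_demo = df_demo.loc[
    df_demo["Country"].isin(["Canada", "USA"]),
    ["Year", "Renewable Energy Share (%)"],
].copy()

# Modify the subset only
df_sub_demo["Year"] = df_sub_demo["Year"] + 1

print(df_demo["Year"].tolist())      # original stays as it was
print(df_sub_demo["Year"].tolist())  # subset has changed
```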

Using df_sub and the groupby method, create a quick working plot like the following: (2 points)

df_annual = df_sub.groupby('Year').median()
df_annual.plot(title='Median percentages in USA and Canada')
plt.show()

The following code cell computes a pandas Series range_renewables. Compute a NumPy array range_renewables_loop that contains the same values as range_renewables, but use a for-loop for the computation instead of the groupby method. (2 points)

df_min = df.groupby('Country').min()
df_max = df.groupby('Country').max()
range_renewables = df_max['Renewable Energy Share (%)'] - df_min['Renewable Energy Share (%)']

range_renewables_loop = []
for country in df['Country'].unique():
    renewables = df.loc[df['Country'].isin([country]), 'Renewable Energy Share (%)']
    range_renewables_loop.append(renewables.max() - renewables.min())
range_renewables_loop = np.array(range_renewables_loop)
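One detail worth checking when comparing the two results: groupby sorts the countries alphabetically, whereas unique() returns them in order of first appearance. The values therefore match, but not necessarily in the same order. A sketch on made-up data (`df_demo` and all values are hypothetical):

```python
import numpy as np
import pandas as pd

col = "Renewable Energy Share (%)"

# Hypothetical miniature data set
df_demo = pd.DataFrame({
    "Country": ["USA", "Canada", "USA", "Canada", "Japan", "Japan"],
    col: [40.0, 50.0, 47.0, 52.0, 30.0, 45.0],
})

# groupby version: index is sorted (Canada, Japan, USA)
grouped = df_demo.groupby("Country")[col]
range_groupby = grouped.max() - grouped.min()

# Loop version: order of first appearance (USA, Canada, Japan)
countries = df_demo["Country"].unique()
range_loop = []
for country in countries:
    renewables = df_demo.loc[df_demo["Country"] == country, col]
    range_loop.append(renewables.max() - renewables.min())
range_loop = np.array(range_loop)

# Reorder the groupby result to the loop order before comparing
aligned = range_groupby.reindex(countries).to_numpy()

print(np.allclose(range_loop, aligned))
print(countries[range_loop.argmax()], range_groupby.idxmax())
```

Because argmax returns a position in the loop order, it has to be looked up in the same `countries` array that drove the loop.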

Reproduce the result of the following code cell using df and range_renewables_loop to display the country. (1 point)
Tip: If you cannot remember the equivalent NumPy method, briefly skim the documentation of idxmax for help.

range_renewables.idxmax()

'Germany'

df['Country'].unique()[range_renewables_loop.argmax()]

'Germany'

Question 3: datetime calculations

Create a DataFrame df with a DatetimeIndex ranging from 2025-01-01 00:00 to 2025-12-31 23:00 in hourly sampling. (1 point)

df = pd.DataFrame(index=pd.date_range('2025-01-01 00:00', '2025-12-31 23:00', freq='h'))

Create a column named hour_of_year that stores the hour count from 1 to the total number of hours in 2025. (1 point)

df["hour_of_year"] = np.arange(1, df.shape[0] + 1)

Create another column sin_wave that computes the sine of hour_of_year, using the formula: (1 point)

\[ \sin\left(\frac{2\pi \cdot \text{hour\_of\_year}}{8760}\right) \]

df["sin_wave"] = np.sin((2 * np.pi * df["hour_of_year"]) / 8760)

Find the first datetime where sin_wave is close to 0 (using the function np.isclose). (1 point)

df.index[np.isclose(df['sin_wave'], 0)][0]

Timestamp('2025-07-02 11:00:00')
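The reason for using np.isclose instead of `== 0` is floating-point rounding: the sine of a floating-point approximation of π is tiny but not exactly zero. The following sketch recomputes the result with plain NumPy; the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# sin(pi) is not exactly zero in floating-point arithmetic
print(np.sin(np.pi) == 0)           # False
print(np.isclose(np.sin(np.pi), 0)) # True (default absolute tolerance 1e-8)

# Same computation as in the exam, without the DataFrame
hours = np.arange(1, 8761)
sin_wave = np.sin(2 * np.pi * hours / 8760)

# First hour count at which the sine is (numerically) zero: half the period
first_hour = int(hours[np.isclose(sin_wave, 0)][0])

# Hour 1 corresponds to 2025-01-01 00:00, so shift by first_hour - 1
first_time = pd.Timestamp("2025-01-01 00:00") + pd.Timedelta(hours=first_hour - 1)

print(first_hour, first_time)
```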

Resample sin_wave to daily minimum, mean, and maximum values and plot the results for the month of September. Reproduce the following figure as closely as possible.
Tip: Use the Axes method fill_between for the shaded area artist.

(7 points)

dfr = pd.DataFrame()
dfr['max'] = df['sin_wave'].resample('D').max()
dfr['min'] = df['sin_wave'].resample('D').min()
dfr['mean'] = df['sin_wave'].resample('D').mean()
dfr = dfr.loc[dfr.index.month==9]

fig, ax = plt.subplots()
ax.fill_between(dfr.index, dfr['min'], dfr['max'], alpha=0.2, label="Range")
dfr['mean'].plot(ax=ax, label="Average")
ax.set_title("Sine wave")
ax.set_ylabel("Unitless")
ax.legend()
ax.grid(linestyle=':', which='both')
plt.show()
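As an alternative sketch for the resampling step only (plotting omitted), the three daily statistics can be computed in a single resample call with agg, and September can be selected by partial string indexing. The names `daily` and `september` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the hourly sine wave for 2025
index = pd.date_range("2025-01-01 00:00", "2025-12-31 23:00", freq="h")
hour_of_year = np.arange(1, len(index) + 1)
sin_wave = pd.Series(np.sin(2 * np.pi * hour_of_year / 8760), index=index)

# One resample call, three aggregations: columns 'min', 'mean', 'max'
daily = sin_wave.resample("D").agg(["min", "mean", "max"])

# Partial string indexing selects all rows of September 2025
september = daily.loc["2025-09"]

print(september.shape)
print(september.head())
```

The resulting columns can be passed to fill_between and plot in the same way as the columns of dfr above.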

Question 4: Functions and Classification

Define a function determine_phase_of_water that determines whether water is frozen or liquid at the given temperature. The function should take one positional argument temp and one keyword argument unit, which defaults to "Celsius". (6 points)
- If unit is not one of "Celsius" or "Farenheit", raise a ValueError with a meaningful error message.
  (If you don’t know how to raise a ValueError, print a meaningful error message and return None.)
- If temp is below 0°C or 32°F, return "solid".
- Otherwise, return "liquid".
Write the docstring for the function, also including the Parameters and Returns sections. (2 points)

def determine_phase_of_water(temp, unit="Celsius"):
    """Determine the phase state of water at a given temperature

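    Parameters
    ----------
    temp : float
        Temperature of the water.
    unit : str, optional
        Unit of the given temperature, either "Celsius" (default) or "Farenheit".

    Returns
    -------
    str
        Phase state: "solid" below the freezing point, otherwise "liquid".

    Raises
    ------
    ValueError
        If unit is neither "Celsius" nor "Farenheit".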
    """

    if unit not in ["Celsius", "Farenheit"]:
        raise ValueError("Unknown unit.")
    
    if unit == "Celsius" and temp < 0:
        return "solid"
    elif unit == "Farenheit" and temp < 32:
        return "solid"
    else:
        return "liquid"

## You can use this code block to check the results of your function.
#  No need to change anything here.

print(determine_phase_of_water(-5))
print(determine_phase_of_water(70))
print(determine_phase_of_water(20, "Farenheit"))
try:
    print(determine_phase_of_water(273, "Kelvin"))
except ValueError as e:
    print(e)

solid
liquid
solid
Unknown unit.
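As one of the many possible alternatives, the unit check and the threshold comparison can be driven by a single dictionary. This is only a sketch: the function name `determine_phase_of_water_lookup` and the dictionary `freezing_points` are hypothetical, and the unit spelling follows the exam task.

```python
def determine_phase_of_water_lookup(temp, unit="Celsius"):
    """Sketch of a lookup-based variant; same behaviour as the solution above."""
    # Freezing point of water per supported unit (spelling as in the exam task)
    freezing_points = {"Celsius": 0, "Farenheit": 32}

    if unit not in freezing_points:
        raise ValueError(f"Unknown unit {unit!r}.")

    # Strictly below the freezing point means solid
    return "solid" if temp < freezing_points[unit] else "liquid"


print(determine_phase_of_water_lookup(-5))
print(determine_phase_of_water_lookup(70))
print(determine_phase_of_water_lookup(20, "Farenheit"))
try:
    determine_phase_of_water_lookup(273, "Kelvin")
except ValueError as e:
    print(e)
```

Supporting a further unit would then only require a new dictionary entry.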