Assignment #06

Unit 06

Until next week, work through the material provided in Unit 6 and solve the following exercises in a notebook.

#06-01: Use datetime provided in WX
  • Load WX_GNP.csv, the same data set we used in last unit’s assignment.
  • How many records are available from 2019?
  • Create a new data frame WX19 that contains all records from the 2018/19 winter season. Let’s take all records between September and Mai. The only meteorological variables we need in addition to the meta variables station_id and datetime are ta and iswr.
  • Save WX19 as a new csv file.
Tip

Make sure you use an elegant pythonic approach to solve these datetime tasks, like demonstrated in Pandas: Index & datetime.

#06-02: Quick ’n dirty time series
  1. From WX19, create a new data frame WXzoom1 that contains the records from the first two weeks of January and any station_id of your choice.
  2. Using the WXzoom1 data frame, plot a line graph of the temperature that also shows the measurements with circles and, in a different figure, plot an area curve of the incoming shortwave radiation.
Tip

If your x-axis shows integers instead of dates and times, compare WXzoom1 before and after running the following line of code: WXzoom1.set_index('datetime', inplace=True). What changed? Run your plot command again. Do you now see a proper time series representation with time on the x-axis?

#06-03: Summary stats
  1. From WX19, create a new data frame WXzoom2 that contains the records from the first two weeks of January and the first two station_id’s in WX19['station_id'].unique().
  2. Using the WXzoom2 data frame, what is the average temperature of each of the stations?
  3. Using the WXzoom2 data frame, plot the temperature curves of both stations into one figure.
  1. Remember the pandas method .isin() to filter for the station_ids.
  2. Go back to the Pandas tutorials we worked through in Unit #5, if you’re struggling.
  3. Create a figure and axes using plt.subplots() and then step-by-step fill the axes with the two lines one after another by subsetting WXzoom2 to the individual station_id’s.
#06-04: Summary stats
  • Using the WX19 data frame, compute which station has the maximum total irradiance (i.e., total incoming shortwave radiation).
  • Create a plot as similar as possible to the following plot.

Tip
  1. You should be able to compute the total irradiance for each station based on all the tips you got up until here. The step from the total irradiance at each station to getting the station_id with the maximum total irradiance is a bit confusing and tricky. Don’t get hung up here.
  2. If you made it to here, you can create the figure no problem! Look into options to plt.subplots(), the axes method .fill_between(), and control the transparency of the shaded areas with the alpha argument to your plot methods.

Solution

#06-01: Use datetime provided in WX

  • Load WX_GNP.csv, the same data set we used in last unit’s assignment.
  • How many records are available from 2019?
  • Create a new data frame WX19 that contains all records from the 2018/19 winter season. Let’s take all records between September and Mai. The only meteorological variables we need in addition to the meta variables station_id and datetime are ta and iswr.
  • Save WX19 as a new csv file.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
WX = pd.read_csv("../05_files/WX_GNP.csv", sep=",", parse_dates=["datetime"])
WX
datetime station_id hs hn24 hn72 rain iswr ilwr ta rh vw dw elev
0 2019-09-04 06:00:00 VIR075905 0.000 0.000000 0.00000 0.000000 0.000 256.326 6.281980 94.7963 2.32314 241.468 2121
1 2019-09-04 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000196 555.803 288.803 12.524600 72.0814 4.38687 247.371 2121
2 2019-09-05 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000045 534.011 287.089 14.265400 55.1823 1.93691 239.254 2121
3 2019-09-06 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000026 546.008 292.024 14.136600 72.5560 3.67782 239.715 2121
4 2019-09-07 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000150 528.582 289.508 14.623800 68.9262 1.90232 227.356 2121
... ... ... ... ... ... ... ... ... ... ... ... ... ...
178733 2021-05-25 17:00:00 VIR088016 279.822 2.197360 7.07140 0.108814 346.135 314.824 2.471430 94.4473 1.28551 170.356 2121
178734 2021-05-26 17:00:00 VIR088016 272.909 0.000000 1.73176 0.001291 747.383 256.193 5.066340 78.2142 4.11593 208.909 2121
178735 2021-05-27 17:00:00 VIR088016 267.290 2.412910 2.80912 0.167091 185.431 316.157 1.090880 97.3988 4.65204 218.963 2121
178736 2021-05-28 17:00:00 VIR088016 275.573 11.509200 12.80780 0.000000 269.329 308.515 0.247583 88.5444 5.33803 263.788 2121
178737 2021-05-29 17:00:00 VIR088016 267.562 0.259779 6.15200 0.000766 766.358 247.267 4.666620 69.3598 1.92538 256.041 2121

178738 rows × 13 columns

sum(WX['datetime'].dt.year == 2019)
57120

or less elegant, but conceptually simpler

WX[(WX['datetime'] >= pd.to_datetime('2019-01-01')) & (WX['datetime'] < pd.to_datetime('2020-01-01'))].shape[0]
57120
WX19 = WX.loc[
    (WX['datetime'] >= pd.to_datetime('2018-09-01')) & (WX['datetime'] < pd.to_datetime('2019-06-01')),
    ['datetime', 'station_id', 'ta', 'iswr']
].copy()
print(f"There are {len(WX19['datetime'].unique())} unique time stamps between '{WX19['datetime'].min()}' and '{WX19['datetime'].max()}'.")
There are 239 unique time stamps between '2018-09-05 06:00:00' and '2019-04-30 17:00:00'.

or alternatively using datetime components:

WX19 = WX.loc[
    ((WX['datetime'].dt.year == 2018) & (WX['datetime'].dt.month >= 9)) | 
    ((WX['datetime'].dt.year == 2019) & (WX['datetime'].dt.month <= 5)),
    ['datetime', 'station_id', 'ta', 'iswr']
].copy()
print(f"There are {len(WX19['datetime'].unique())} unique time stamps between '{WX19['datetime'].min()}' and '{WX19['datetime'].max()}'.")
There are 239 unique time stamps between '2018-09-05 06:00:00' and '2019-04-30 17:00:00'.
WX19.to_csv("WX19.csv", index=False)

#06-02: Quick ’n dirty time series

  • From WX19, create a new data frame WXzoom1 that contains the records from the first two weeks of January and any station_id of your choice.
  • Using the WXzoom1 data frame, plot a line graph of the temperature that also shows the measurements with circles and, in a different figure, plot an area curve of the incoming shortwave radiation.
WXzoom1 = WX19.loc[
    (WX19['datetime'].dt.dayofyear <= 14) & (WX19['station_id'].isin(["VIR075905"])),
    ['datetime', 'station_id', 'ta', 'iswr']
].copy()
WXzoom1
datetime station_id ta iswr
57477 2019-01-01 17:00:00 VIR075905 -7.78452 111.1250
57478 2019-01-02 17:00:00 VIR075905 -7.09891 72.2500
57479 2019-01-03 17:00:00 VIR075905 -3.90653 27.7500
57480 2019-01-04 17:00:00 VIR075905 -4.25220 33.3125
57481 2019-01-05 17:00:00 VIR075905 -4.86277 100.0000
57482 2019-01-06 17:00:00 VIR075905 -8.70954 83.3750
57483 2019-01-07 17:00:00 VIR075905 -9.06894 44.4375
57484 2019-01-08 17:00:00 VIR075905 -11.57240 61.1250
57485 2019-01-09 17:00:00 VIR075905 -6.33497 61.1250
57486 2019-01-10 17:00:00 VIR075905 -3.93680 122.2500
57487 2019-01-11 17:00:00 VIR075905 -5.83427 133.3750
57488 2019-01-12 17:00:00 VIR075905 -5.70246 144.4380
57489 2019-01-13 17:00:00 VIR075905 -5.11887 144.4440
57490 2019-01-14 17:00:00 VIR075905 -6.42997 155.6250
WXzoom1.set_index('datetime', inplace=True)
WXzoom1['ta'].plot(marker="o")
<Axes: xlabel='datetime'>

WXzoom1['iswr'].plot.area()
<Axes: xlabel='datetime'>

#06-03: Summary stats

  • From WX19, create a new data frame WXzoom2 that contains the records from the first two weeks of January and the first two station_id’s in WX19['station_id'].unique().
  • Using the WXzoom2 data frame, what is the average temperature of each of the stations?
  • Using the WXzoom2 data frame, plot the temperature curves of both stations into one figure.
WXzoom2 = WX19.loc[
    (WX19['datetime'].dt.dayofyear <= 14) & (WX19['station_id'].isin(WX19["station_id"].unique()[:2])),
    ['datetime', 'station_id', 'ta', 'iswr']
].copy()
WXzoom2.set_index('datetime', inplace=True)
WXzoom2.groupby("station_id")["ta"].mean()
station_id
VIR075905   -6.472368
VIR075906   -5.492010
Name: ta, dtype: float64

or alternatively, you can manually extract the values of all stations or use a loop:

for station in WXzoom2["station_id"].unique():
    print(f"{station}: {WXzoom2.loc[WXzoom2['station_id'] == station, 'ta'].mean():.4f}")
VIR075905: -6.4724
VIR075906: -5.4920
fig, axs = plt.subplots()
WXzoom2["ta"][WXzoom2["station_id"] == WXzoom2["station_id"].unique()[0]].plot(ax=axs)  # mind the `ax` argument to tell Pandas in where to plot this to
WXzoom2["ta"][WXzoom2["station_id"] == WXzoom2["station_id"].unique()[1]].plot(ax=axs)
<Axes: xlabel='datetime'>

or alternatively, using a clever way of re-shaping the data frame from a long to a wide format, and making use of the fact that every column will become its own artist on the plot.

WXzoom2.pivot(columns="station_id", values="ta").plot()
<Axes: xlabel='datetime'>

#06-04: Summary stats

  • Using the WX19 data frame, compute which station has the maximum total incoming shortwave radiation.
  • Create a plot as similar as possible to the following plot.
iswr_tot = WX19[["iswr", "station_id"]].groupby("station_id").sum()
iswr_tot.index[iswr_tot.values.argmax()]
'VIR078109'
WX19_grouped_ta = WX19[["datetime", "ta"]].groupby("datetime")
ta_lower = WX19_grouped_ta.quantile(0.1)
ta_upper = WX19_grouped_ta.quantile(0.9)
ta_median = WX19_grouped_ta.median()

WX19_grouped_iswr = WX19[["datetime", "iswr"]].groupby("datetime")
iswr_lower = WX19_grouped_iswr.quantile(0.1)
iswr_upper = WX19_grouped_iswr.quantile(0.9)
iswr_median = WX19_grouped_iswr.median()

alternatively, you can achieve the same result with a loop. This will yield a much more inefficient computation, though, and also the plot will have to be adjusted (e.g., using ta_dt as x axis instead of ta_lower.index):

# # Loop example:
# ta_lower = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_upper = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_median = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_dt = WX19['datetime'].unique()

# for i, dt in enumerate(ta_dt):
#     ta_lower[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].quantile(0.1)
#     ta_upper[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].quantile(0.9)
#     ta_median[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].median()
fig, (ax1, ax2) = plt.subplots(figsize=(12, 5), nrows=2, sharex=True)

ax1.plot(ta_median, color="red")
ax1.fill_between(ta_lower.index, ta_lower["ta"], ta_upper["ta"], color="red", alpha=0.2)
ax1.legend(["Median air temperature", "Temperature envelope"])
ax1.set_ylabel("Temperature (deg C)")
ax1.grid(linestyle=":")

ax2.fill_between(iswr_lower.index, 0, iswr_upper["iswr"], color="green", alpha=0.15)
ax2.fill_between(iswr_lower.index, 0, iswr_median["iswr"], color="green", alpha=0.25)
ax2.fill_between(iswr_lower.index, 0, iswr_lower["iswr"], color="green", alpha=0.5)
ax2.legend(["90th quantile", "Median incoming shortwave", "10th quantile"], loc="upper center")
ax2.set_ylabel("Radiation (W/m2)")
ax2.grid(linestyle=":")
# fig.savefig('twopanel_pretty.png')

Source: 06_tasks_solved.ipynb