import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Assignment #06
Unit 06
Until next week, work through the material provided in Unit 6 and solve the following exercises in a notebook.
- Load WX_GNP.csv, the same data set we used in last unit’s assignment.
- How many records are available from 2019?
- Create a new data frame
WX19
that contains all records from the 2018/19 winter season. Let’s take all records between September and Mai. The only meteorological variables we need in addition to the meta variablesstation_id
anddatetime
areta
andiswr
. - Save
WX19
as a new csv file.
Make sure you use an elegant pythonic approach to solve these datetime tasks, like demonstrated in Pandas: Index & datetime.
- From
WX19
, create a new data frameWXzoom1
that contains the records from the first two weeks of January and anystation_id
of your choice. - Using the
WXzoom1
data frame, plot a line graph of the temperature that also shows the measurements with circles and, in a different figure, plot an area curve of the incoming shortwave radiation.
- Use the Pandas syntax to create quick ’n dirty working plots.
If your x-axis shows integers instead of dates and times, compare WXzoom1
before and after running the following line of code: WXzoom1.set_index('datetime', inplace=True)
. What changed? Run your plot command again. Do you now see a proper time series representation with time on the x-axis?
- From
WX19
, create a new data frameWXzoom2
that contains the records from the first two weeks of January and the first twostation_id
’s inWX19['station_id'].unique()
. - Using the
WXzoom2
data frame, what is the average temperature of each of the stations? - Using the
WXzoom2
data frame, plot the temperature curves of both stations into one figure.
- Remember the pandas method
.isin()
to filter for the station_ids. - Go back to the Pandas tutorials we worked through in Unit #5, if you’re struggling.
- Create a figure and axes using
plt.subplots()
and then step-by-step fill the axes with the two lines one after another by subsettingWXzoom2
to the individualstation_id
’s.
- Using the
WX19
data frame, compute which station has the maximum total irradiance (i.e., total incoming shortwave radiation). - Create a plot as similar as possible to the following plot.
- You should be able to compute the total irradiance for each station based on all the tips you got up until here. The step from the total irradiance at each station to getting the
station_id
with the maximum total irradiance is a bit confusing and tricky. Don’t get hung up here. - If you made it to here, you can create the figure no problem! Look into options to
plt.subplots()
, the axes method.fill_between()
, and control the transparency of the shaded areas with thealpha
argument to your plot methods.
Solution
#06-01: Use datetime provided in WX
- Load
WX_GNP.csv
, the same data set we used in last unit’s assignment. - How many records are available from 2019?
- Create a new data frame
WX19
that contains all records from the 2018/19 winter season. Let’s take all records between September and Mai. The only meteorological variables we need in addition to the meta variablesstation_id
anddatetime
areta
andiswr
. - Save
WX19
as a new csv file.
datetime | station_id | hs | hn24 | hn72 | rain | iswr | ilwr | ta | rh | vw | dw | elev | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-09-04 06:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000000 | 0.000 | 256.326 | 6.281980 | 94.7963 | 2.32314 | 241.468 | 2121 |
1 | 2019-09-04 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000196 | 555.803 | 288.803 | 12.524600 | 72.0814 | 4.38687 | 247.371 | 2121 |
2 | 2019-09-05 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000045 | 534.011 | 287.089 | 14.265400 | 55.1823 | 1.93691 | 239.254 | 2121 |
3 | 2019-09-06 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000026 | 546.008 | 292.024 | 14.136600 | 72.5560 | 3.67782 | 239.715 | 2121 |
4 | 2019-09-07 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000150 | 528.582 | 289.508 | 14.623800 | 68.9262 | 1.90232 | 227.356 | 2121 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
178733 | 2021-05-25 17:00:00 | VIR088016 | 279.822 | 2.197360 | 7.07140 | 0.108814 | 346.135 | 314.824 | 2.471430 | 94.4473 | 1.28551 | 170.356 | 2121 |
178734 | 2021-05-26 17:00:00 | VIR088016 | 272.909 | 0.000000 | 1.73176 | 0.001291 | 747.383 | 256.193 | 5.066340 | 78.2142 | 4.11593 | 208.909 | 2121 |
178735 | 2021-05-27 17:00:00 | VIR088016 | 267.290 | 2.412910 | 2.80912 | 0.167091 | 185.431 | 316.157 | 1.090880 | 97.3988 | 4.65204 | 218.963 | 2121 |
178736 | 2021-05-28 17:00:00 | VIR088016 | 275.573 | 11.509200 | 12.80780 | 0.000000 | 269.329 | 308.515 | 0.247583 | 88.5444 | 5.33803 | 263.788 | 2121 |
178737 | 2021-05-29 17:00:00 | VIR088016 | 267.562 | 0.259779 | 6.15200 | 0.000766 | 766.358 | 247.267 | 4.666620 | 69.3598 | 1.92538 | 256.041 | 2121 |
178738 rows × 13 columns
or less elegant, but conceptually simpler
WX[(WX['datetime'] >= pd.to_datetime('2019-01-01')) & (WX['datetime'] < pd.to_datetime('2020-01-01'))].shape[0]
57120
WX19 = WX.loc[
(WX['datetime'] >= pd.to_datetime('2018-09-01')) & (WX['datetime'] < pd.to_datetime('2019-06-01')),
['datetime', 'station_id', 'ta', 'iswr']
].copy()
print(f"There are {len(WX19['datetime'].unique())} unique time stamps between '{WX19['datetime'].min()}' and '{WX19['datetime'].max()}'.")
There are 239 unique time stamps between '2018-09-05 06:00:00' and '2019-04-30 17:00:00'.
or alternatively using datetime components:
WX19 = WX.loc[
((WX['datetime'].dt.year == 2018) & (WX['datetime'].dt.month >= 9)) |
((WX['datetime'].dt.year == 2019) & (WX['datetime'].dt.month <= 5)),
['datetime', 'station_id', 'ta', 'iswr']
].copy()
print(f"There are {len(WX19['datetime'].unique())} unique time stamps between '{WX19['datetime'].min()}' and '{WX19['datetime'].max()}'.")
There are 239 unique time stamps between '2018-09-05 06:00:00' and '2019-04-30 17:00:00'.
#06-02: Quick ’n dirty time series
- From
WX19
, create a new data frameWXzoom1
that contains the records from the first two weeks of January and anystation_id
of your choice. - Using the
WXzoom1
data frame, plot a line graph of the temperature that also shows the measurements with circles and, in a different figure, plot an area curve of the incoming shortwave radiation.
WXzoom1 = WX19.loc[
(WX19['datetime'].dt.dayofyear <= 14) & (WX19['station_id'].isin(["VIR075905"])),
['datetime', 'station_id', 'ta', 'iswr']
].copy()
WXzoom1
datetime | station_id | ta | iswr | |
---|---|---|---|---|
57477 | 2019-01-01 17:00:00 | VIR075905 | -7.78452 | 111.1250 |
57478 | 2019-01-02 17:00:00 | VIR075905 | -7.09891 | 72.2500 |
57479 | 2019-01-03 17:00:00 | VIR075905 | -3.90653 | 27.7500 |
57480 | 2019-01-04 17:00:00 | VIR075905 | -4.25220 | 33.3125 |
57481 | 2019-01-05 17:00:00 | VIR075905 | -4.86277 | 100.0000 |
57482 | 2019-01-06 17:00:00 | VIR075905 | -8.70954 | 83.3750 |
57483 | 2019-01-07 17:00:00 | VIR075905 | -9.06894 | 44.4375 |
57484 | 2019-01-08 17:00:00 | VIR075905 | -11.57240 | 61.1250 |
57485 | 2019-01-09 17:00:00 | VIR075905 | -6.33497 | 61.1250 |
57486 | 2019-01-10 17:00:00 | VIR075905 | -3.93680 | 122.2500 |
57487 | 2019-01-11 17:00:00 | VIR075905 | -5.83427 | 133.3750 |
57488 | 2019-01-12 17:00:00 | VIR075905 | -5.70246 | 144.4380 |
57489 | 2019-01-13 17:00:00 | VIR075905 | -5.11887 | 144.4440 |
57490 | 2019-01-14 17:00:00 | VIR075905 | -6.42997 | 155.6250 |
#06-03: Summary stats
- From
WX19
, create a new data frameWXzoom2
that contains the records from the first two weeks of January and the first twostation_id
’s inWX19['station_id'].unique()
. - Using the
WXzoom2
data frame, what is the average temperature of each of the stations? - Using the
WXzoom2
data frame, plot the temperature curves of both stations into one figure.
station_id
VIR075905 -6.472368
VIR075906 -5.492010
Name: ta, dtype: float64
or alternatively, you can manually extract the values of all stations or use a loop:
for station in WXzoom2["station_id"].unique():
print(f"{station}: {WXzoom2.loc[WXzoom2['station_id'] == station, 'ta'].mean():.4f}")
VIR075905: -6.4724
VIR075906: -5.4920
fig, axs = plt.subplots()
WXzoom2["ta"][WXzoom2["station_id"] == WXzoom2["station_id"].unique()[0]].plot(ax=axs) # mind the `ax` argument to tell Pandas in where to plot this to
WXzoom2["ta"][WXzoom2["station_id"] == WXzoom2["station_id"].unique()[1]].plot(ax=axs)
<Axes: xlabel='datetime'>
or alternatively, using a clever way of re-shaping the data frame from a long to a wide format, and making use of the fact that every column will become its own artist on the plot.
#06-04: Summary stats
- Using the
WX19
data frame, compute which station has the maximum total incoming shortwave radiation. - Create a plot as similar as possible to the following plot.
iswr_tot = WX19[["iswr", "station_id"]].groupby("station_id").sum()
iswr_tot.index[iswr_tot.values.argmax()]
'VIR078109'
WX19_grouped_ta = WX19[["datetime", "ta"]].groupby("datetime")
ta_lower = WX19_grouped_ta.quantile(0.1)
ta_upper = WX19_grouped_ta.quantile(0.9)
ta_median = WX19_grouped_ta.median()
WX19_grouped_iswr = WX19[["datetime", "iswr"]].groupby("datetime")
iswr_lower = WX19_grouped_iswr.quantile(0.1)
iswr_upper = WX19_grouped_iswr.quantile(0.9)
iswr_median = WX19_grouped_iswr.median()
alternatively, you can achieve the same result with a loop. This will yield a much more inefficient computation, though, and also the plot will have to be adjusted (e.g., using ta_dt as x axis instead of ta_lower.index):
# # Loop example:
# ta_lower = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_upper = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_median = np.full(WX19['datetime'].unique().shape, np.nan)
# ta_dt = WX19['datetime'].unique()
# for i, dt in enumerate(ta_dt):
# ta_lower[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].quantile(0.1)
# ta_upper[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].quantile(0.9)
# ta_median[i] = WX19.loc[WX19['datetime'] == dt, 'ta'].median()
fig, (ax1, ax2) = plt.subplots(figsize=(12, 5), nrows=2, sharex=True)
ax1.plot(ta_median, color="red")
ax1.fill_between(ta_lower.index, ta_lower["ta"], ta_upper["ta"], color="red", alpha=0.2)
ax1.legend(["Median air temperature", "Temperature envelope"])
ax1.set_ylabel("Temperature (deg C)")
ax1.grid(linestyle=":")
ax2.fill_between(iswr_lower.index, 0, iswr_upper["iswr"], color="green", alpha=0.15)
ax2.fill_between(iswr_lower.index, 0, iswr_median["iswr"], color="green", alpha=0.25)
ax2.fill_between(iswr_lower.index, 0, iswr_lower["iswr"], color="green", alpha=0.5)
ax2.legend(["90th quantile", "Median incoming shortwave", "10th quantile"], loc="upper center")
ax2.set_ylabel("Radiation (W/m2)")
ax2.grid(linestyle=":")
# fig.savefig('twopanel_pretty.png')