05_tasks_solved.ipynb

Assignment #05

#05-04: Explore WX

  • Read the comma separated spreadsheet WX_GNP.csv
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
WX = pd.read_csv("WX_GNP.csv", sep=",", parse_dates=["datetime"])
WX
datetime station_id hs hn24 hn72 rain iswr ilwr ta rh vw dw elev
0 2019-09-04 06:00:00 VIR075905 0.000 0.000000 0.00000 0.000000 0.000 256.326 6.281980 94.7963 2.32314 241.468 2121
1 2019-09-04 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000196 555.803 288.803 12.524600 72.0814 4.38687 247.371 2121
2 2019-09-05 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000045 534.011 287.089 14.265400 55.1823 1.93691 239.254 2121
3 2019-09-06 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000026 546.008 292.024 14.136600 72.5560 3.67782 239.715 2121
4 2019-09-07 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000150 528.582 289.508 14.623800 68.9262 1.90232 227.356 2121
... ... ... ... ... ... ... ... ... ... ... ... ... ...
178733 2021-05-25 17:00:00 VIR088016 279.822 2.197360 7.07140 0.108814 346.135 314.824 2.471430 94.4473 1.28551 170.356 2121
178734 2021-05-26 17:00:00 VIR088016 272.909 0.000000 1.73176 0.001291 747.383 256.193 5.066340 78.2142 4.11593 208.909 2121
178735 2021-05-27 17:00:00 VIR088016 267.290 2.412910 2.80912 0.167091 185.431 316.157 1.090880 97.3988 4.65204 218.963 2121
178736 2021-05-28 17:00:00 VIR088016 275.573 11.509200 12.80780 0.000000 269.329 308.515 0.247583 88.5444 5.33803 263.788 2121
178737 2021-05-29 17:00:00 VIR088016 267.562 0.259779 6.15200 0.000766 766.358 247.267 4.666620 69.3598 1.92538 256.041 2121

178738 rows × 13 columns

In [3]:
tabledict = {
    'datetime': 'datetime in the form YYYY-MM-DD HH:MM:SS',
    'station_id': 'ID of virtual weather station (i.e., weather model grid point)',
    'hs': 'Snow height (cm)',
    'hn24': 'Height of new snow within last 24 hours (cm)',
    'hn72': 'Height of new snow within last 72 hours (cm)',
    'rain': 'Liquid water accumulation within last 24 hours (mm)',
    'iswr': 'Incoming shortwave radiation (W/m2)',
    'ilwr': 'Incoming longwave radiation (W/m2)',
    'ta': 'Air temperature (degrees Celsius)',
    'rh': 'Relative humidity (%)',
    'vw': 'Wind speed (m/s)',
    'dw': 'Wind direction (degrees)',
    'elev': 'Station elevation (m asl)'
}
In [4]:
for key in tabledict:
    print(f'{key:>10}: {tabledict[key]}')
  datetime: datetime in the form YYYY-MM-DD HH:MM:SS
station_id: ID of virtual weather station (i.e., weather model grid point)
        hs: Snow height (cm)
      hn24: Height of new snow within last 24 hours (cm)
      hn72: Height of new snow within last 72 hours (cm)
      rain: Liquid water accumulation within last 24 hours (mm)
      iswr: Incoming shortwave radiation (W/m2)
      ilwr: Incoming longwave radiation (W/m2)
        ta: Air temperature (degrees Celsius)
        rh: Relative humidity (%)
        vw: Wind speed (m/s)
        dw: Wind direction (degrees)
      elev: Station elevation (m asl)
  • How many different virtual stations are included in the data frame?
  • How many unique time stamps does the data frame contain?
  • What are the earliest and latest time records?
  • How many stations are located above 2000m but below 2200m?
  • Characterize the elevation distribution of the stations with a number of percentiles!
In [5]:
station_ids = WX['station_id'].unique()
n_stations = len(station_ids)
print(f'There are {n_stations} unique stations in the data frame and their labels are as follows:\n {station_ids}')
There are 238 unique stations in the data frame and their labels are as follows:
 ['VIR075905' 'VIR075906' 'VIR075907' 'VIR076452' 'VIR076456' 'VIR076457'
 'VIR076458' 'VIR076459' 'VIR077002' 'VIR077003' 'VIR077004' 'VIR077007'
 'VIR077008' 'VIR077009' 'VIR077010' 'VIR077548' 'VIR077549' 'VIR077550'
 'VIR077551' 'VIR077552' 'VIR077553' 'VIR077554' 'VIR077555' 'VIR077556'
 'VIR077557' 'VIR077558' 'VIR077559' 'VIR077560' 'VIR077561' 'VIR078100'
 'VIR078101' 'VIR078102' 'VIR078103' 'VIR078104' 'VIR078105' 'VIR078106'
 'VIR078107' 'VIR078108' 'VIR078109' 'VIR078110' 'VIR078111' 'VIR078112'
 'VIR078652' 'VIR078653' 'VIR078654' 'VIR078655' 'VIR078656' 'VIR078657'
 'VIR078658' 'VIR078659' 'VIR078660' 'VIR078661' 'VIR078662' 'VIR079203'
 'VIR079204' 'VIR079205' 'VIR079206' 'VIR079207' 'VIR079208' 'VIR079209'
 'VIR079210' 'VIR079211' 'VIR079212' 'VIR079213' 'VIR079754' 'VIR079755'
 'VIR079756' 'VIR079757' 'VIR079758' 'VIR079759' 'VIR079760' 'VIR079761'
 'VIR079762' 'VIR079763' 'VIR079764' 'VIR080304' 'VIR080305' 'VIR080306'
 'VIR080307' 'VIR080308' 'VIR080309' 'VIR080310' 'VIR080311' 'VIR080312'
 'VIR080313' 'VIR080314' 'VIR080315' 'VIR080855' 'VIR080856' 'VIR080857'
 'VIR080858' 'VIR080859' 'VIR080860' 'VIR080861' 'VIR080862' 'VIR080863'
 'VIR080864' 'VIR080865' 'VIR080866' 'VIR081406' 'VIR081407' 'VIR081408'
 'VIR081409' 'VIR081410' 'VIR081411' 'VIR081412' 'VIR081413' 'VIR081414'
 'VIR081415' 'VIR081416' 'VIR081417' 'VIR081420' 'VIR081956' 'VIR081957'
 'VIR081958' 'VIR081959' 'VIR081960' 'VIR081961' 'VIR081962' 'VIR081963'
 'VIR081964' 'VIR081965' 'VIR081966' 'VIR081967' 'VIR081968' 'VIR081969'
 'VIR081970' 'VIR081971' 'VIR082508' 'VIR082509' 'VIR082510' 'VIR082511'
 'VIR082512' 'VIR082513' 'VIR082514' 'VIR082515' 'VIR082516' 'VIR082517'
 'VIR082518' 'VIR082519' 'VIR082520' 'VIR082521' 'VIR082522' 'VIR082523'
 'VIR083059' 'VIR083060' 'VIR083061' 'VIR083062' 'VIR083063' 'VIR083064'
 'VIR083065' 'VIR083066' 'VIR083067' 'VIR083068' 'VIR083069' 'VIR083070'
 'VIR083071' 'VIR083072' 'VIR083073' 'VIR083610' 'VIR083611' 'VIR083612'
 'VIR083613' 'VIR083614' 'VIR083615' 'VIR083616' 'VIR083617' 'VIR083618'
 'VIR083619' 'VIR083620' 'VIR083621' 'VIR083622' 'VIR083623' 'VIR084161'
 'VIR084162' 'VIR084163' 'VIR084164' 'VIR084165' 'VIR084166' 'VIR084167'
 'VIR084168' 'VIR084169' 'VIR084170' 'VIR084171' 'VIR084711' 'VIR084712'
 'VIR084713' 'VIR084714' 'VIR084715' 'VIR084716' 'VIR084717' 'VIR084718'
 'VIR084719' 'VIR084720' 'VIR084721' 'VIR084722' 'VIR084723' 'VIR085261'
 'VIR085262' 'VIR085263' 'VIR085264' 'VIR085265' 'VIR085266' 'VIR085267'
 'VIR085268' 'VIR085269' 'VIR085270' 'VIR085271' 'VIR085272' 'VIR085812'
 'VIR085813' 'VIR085814' 'VIR085815' 'VIR085816' 'VIR085817' 'VIR085818'
 'VIR085819' 'VIR085820' 'VIR085821' 'VIR085822' 'VIR085823' 'VIR086362'
 'VIR086363' 'VIR086364' 'VIR086365' 'VIR086371' 'VIR086372' 'VIR086373'
 'VIR086374' 'VIR086912' 'VIR086913' 'VIR086914' 'VIR086915' 'VIR086916'
 'VIR087463' 'VIR087464' 'VIR087465' 'VIR088016']
In [6]:
print(f"There are {len(WX['datetime'].unique())} unique time stamps between '{WX['datetime'].min()}' and '{WX['datetime'].max()}'.")
There are 751 unique time stamps between '2018-09-05 06:00:00' and '2021-05-29 17:00:00'.
In [8]:
WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), 'elev'].unique().shape[0]
## Caution: this counts distinct elevation values, which undercounts whenever two
## stations share an elevation; selecting 'station_id' counts stations directly:
# WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), 'station_id'].unique().shape[0]
## or alternatively (less recommended):
# WX['elev'][(WX['elev'] > 2000) & (WX['elev'] < 2200)].unique().shape[0]
## this is powerful as well:
# WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), ('elev', 'station_id')]
30
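As a side note, the same mask can also be written with `Series.between`; here is a sketch on a toy Series with made-up elevation values (not the actual `WX['elev']` data), where `inclusive='neither'` excludes both bounds:

```python
import pandas as pd

# Toy elevation values standing in for WX['elev'] (made up for illustration)
elev = pd.Series([1279, 1950, 2050, 2121, 2121, 2200, 2497])

# between(..., inclusive="neither") is equivalent to (elev > 2000) & (elev < 2200)
mask = elev.between(2000, 2200, inclusive="neither")
print(elev[mask].tolist())  # [2050, 2121, 2121]
```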
In [9]:
WX['elev'].describe()
count    178738.000000
mean       1836.436975
std         236.796505
min        1279.000000
25%        1671.000000
50%        1828.000000
75%        1989.000000
max        2497.000000
Name: elev, dtype: float64
In [10]:
WX['elev'].quantile(0.90)
np.float64(2139.0)
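`quantile` also accepts a list of probabilities, so several percentiles come back in one call. A sketch on a toy Series (the actual elevations live in `WX['elev']`):

```python
import pandas as pd

# Toy stand-in for WX['elev']: evenly spaced values 1300, 1400, ..., 2400
elev = pd.Series(range(1300, 2500, 100))

# A list of probabilities yields a Series indexed by those probabilities
q = elev.quantile([0.1, 0.25, 0.5, 0.75, 0.9])
print(q.loc[0.5])  # 1850.0
```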

#05-05: Subset WX and compute more summary stats

  • Create another data frame as subset of WX, which contains all the data from one station_id of your choice.
  • What is the average air temperature and its standard deviation?
  • What is the median relative humidity rh when either hn24 is greater than 10 cm or rain is greater than 2 mm? What about the median rh during the opposite conditions?
  • Compute a new column hn72_check that should conceptually be identical to hn72. Use only hn24 to derive hn72_check.
  • Test whether hn72_check is indeed equal to hn72. Why not?
  • Store the new data frame in a csv file.
In [11]:
wx = WX[WX['station_id'].isin(['VIR075905'])].copy()
## or more generically:
# wx = WX.loc[WX['station_id'].isin([WX['station_id'].unique()[0]]), :].copy()
wx
datetime station_id hs hn24 hn72 rain iswr ilwr ta rh vw dw elev
0 2019-09-04 06:00:00 VIR075905 0.000 0.000000 0.00000 0.000000 0.000 256.326 6.28198 94.7963 2.32314 241.468 2121
1 2019-09-04 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000196 555.803 288.803 12.52460 72.0814 4.38687 247.371 2121
2 2019-09-05 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000045 534.011 287.089 14.26540 55.1823 1.93691 239.254 2121
3 2019-09-06 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000026 546.008 292.024 14.13660 72.5560 3.67782 239.715 2121
4 2019-09-07 17:00:00 VIR075905 0.000 0.000000 0.00000 0.000150 528.582 289.508 14.62380 68.9262 1.90232 227.356 2121
... ... ... ... ... ... ... ... ... ... ... ... ... ...
114506 2021-05-25 17:00:00 VIR075905 281.450 0.000000 0.00000 0.401526 346.135 314.824 3.71658 88.5444 1.56797 156.521 2121
114507 2021-05-26 17:00:00 VIR075905 276.290 0.000000 0.00000 0.025453 774.383 242.693 6.56515 72.3112 5.24579 208.540 2121
114508 2021-05-27 17:00:00 VIR075905 265.546 0.000000 0.00000 0.313308 239.181 316.157 2.42825 97.3988 7.03859 218.594 2121
114509 2021-05-28 17:00:00 VIR075905 268.436 5.094300 5.09430 0.108912 376.829 301.828 1.07766 84.1171 6.69847 266.648 2121
114510 2021-05-29 17:00:00 VIR075905 265.560 0.797445 3.09568 0.010523 793.358 240.517 5.01248 70.8355 2.92842 265.633 2121

751 rows × 13 columns

Mind the copy when assigning the filtered data frame WX to a new one wx! What happens if you don’t copy? (Tip: Try it out and see whether you get a warning several cells below!)
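A minimal sketch of the difference on a toy frame (not WX itself): with an explicit .copy() the subset is an independent frame, so adding a column leaves the original untouched; without the copy, pandas may hand you a view of the original and warn when you assign into it.

```python
import pandas as pd

df = pd.DataFrame({"station": ["a", "a", "b"], "ta": [1.0, 2.0, 3.0]})

# Explicit copy: safe to add columns without touching df
sub = df[df["station"] == "a"].copy()
sub["ta_f"] = sub["ta"] * 9 / 5 + 32

print("ta_f" in df.columns)  # False: df is unaffected
```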

In [12]:
wx['ta'].mean()
np.float64(-3.7703614580559255)
In [13]:
wx['ta'].std()
np.float64(6.582577628629118)
In [14]:
wx.loc[(wx['hn24'] > 10) | (wx['rain'] > 2), 'rh'].median()
np.float64(89.947)
In [15]:
wx.loc[~((wx['hn24'] > 10) | (wx['rain'] > 2)), 'rh'].median()
## or equivalently:
# wx.loc[~(wx['hn24'] > 10) & ~(wx['rain'] > 2), 'rh'].median()
# wx.loc[(wx['hn24'] <= 10) & (wx['rain'] <= 2), 'rh'].median()
np.float64(82.03875)
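That these masks agree is just De Morgan’s law; a quick check on a toy frame with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({"hn24": [0.0, 12.0, 3.0], "rain": [0.1, 0.0, 5.0]})

# ~(A | B) is the same boolean Series as (~A) & (~B)
a = ~((toy["hn24"] > 10) | (toy["rain"] > 2))
b = (toy["hn24"] <= 10) & (toy["rain"] <= 2)
print(a.equals(b))  # True
```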
In [16]:
import numpy as np

The following solution matches the Pandas knowledge from this unit,

In [17]:
wx['hn72_check'] = np.nan
for i, val in enumerate(wx['hn24']):
    if i > 3:
        # column 13 is the freshly added 'hn72_check', column 3 is 'hn24'
        wx.iloc[i, 13] = wx.iloc[i-2:i+1, 3].sum()

whereas this next solution would be my go-to choice. There are a few extra tricks in here: What it comes down to is that we would need the iloc operator to select rows by the integer counter i, but at the same time we want to access the column by its name rather than its position, which requires the loc operator. I ultimately go with the loc operator to maximize readability and convenience. To still access the correct rows, however, I need to select wx.index[i] instead of i alone. Why? Because wx.index[i] returns the row label at location i (see Unit 6).

In [18]:
wx['hn72_check_II'] = np.nan
for i, val in enumerate(wx['hn24']):
    if i > 3:
        wx.loc[wx.index[i], 'hn72_check_II'] = wx.loc[wx.index[i-2]:wx.index[i], 'hn24'].sum()
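To see the difference between positions and labels, here is a sketch on a toy frame with non-consecutive row labels (as wx has, because it keeps the labels it inherited from WX):

```python
import pandas as pd

# Row labels 10, 20, 30 stand in for the labels wx inherits from WX
toy = pd.DataFrame({"hn24": [1.0, 2.0, 4.0]}, index=[10, 20, 30])

print(toy.index[2])  # 30: the label at integer position 2
# Label-based loc slicing includes both endpoints:
print(toy.loc[toy.index[0]:toy.index[2], "hn24"].sum())  # 7.0
```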
In [19]:
wx[['hn24', 'hn72', 'hn72_check', 'hn72_check_II']]
hn24 hn72 hn72_check hn72_check_II
0 0.000000 0.00000 NaN NaN
1 0.000000 0.00000 NaN NaN
2 0.000000 0.00000 NaN NaN
3 0.000000 0.00000 NaN NaN
4 0.000000 0.00000 0.000000 0.000000
... ... ... ... ...
114506 0.000000 0.00000 0.000000 0.000000
114507 0.000000 0.00000 0.000000 0.000000
114508 0.000000 0.00000 0.000000 0.000000
114509 5.094300 5.09430 5.094300 5.094300
114510 0.797445 3.09568 5.891745 5.891745

751 rows × 4 columns

hn72 was computed on the hourly time series. Since we compute hn72_check on the daily time series, some values are identical while others are not. In any case, the first 4 entries cannot be identical, because we are missing the data points further in the past that would be needed to compute the 3-day height of new snow.
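For completeness, a vectorized alternative to the loop would be a rolling window sum; a sketch on a toy series (note that rolling leaves only the first 2 entries NaN, whereas the loops above deliberately mask the first 4):

```python
import pandas as pd

# Toy daily hn24 values; rolling(3).sum() adds today plus the two previous days
hn24 = pd.Series([0.0, 0.0, 5.0, 1.0, 0.0])
hn72 = hn24.rolling(window=3).sum()
print(hn72.tolist())  # [nan, nan, 5.0, 6.0, 6.0]
```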

In [76]:
wx.to_csv("WX_subset.csv", index=False)