= "This is a sentence"
sentence " ") sentence.split(
['This', 'is', 'a', 'sentence']
Unit 05
Until the next unit, work through the material provided in Unit 5 and solve the following exercise blocks in separate notebooks.
Code a function that accepts two arguments: first an arbitrary word or sentence (type str, e.g., "energy efficiency"), then a character (also str, e.g., "e"). The function should return an integer indicating how many times the character appears in the sentence. Write a comprehensive docstring for the function and test whether it performs as expected with a few test cases.
Code a function that accepts an arbitrary sentence as input and returns a list of strings containing the individual words, just like the example above. However, let’s imagine the .split() method does not exist and you have to find another way of achieving this. Does your function return the same result as the split method?
Create the arrays v1a and v1b in two different ways, once with vectorized NumPy indexing and once with a loop, such that v1a and v1b are equal again; do the same for v2a and v2b so that they are equal again.

For this exercise block, you will work with a real data set: nowcast data from a numerical weather prediction model in Glacier National Park, Canada, WX_GNP.csv.
Here is a dictionary that explains the meaning of the spreadsheet variables:
WX_dict = {
    'datetime': 'datetime in the form YYYY-MM-DD HH:MM:SS',
    'station_id': 'ID of virtual weather station (i.e., weather model grid point)',
    'hs': 'Snow height (cm)',
    'hn24': 'Height of new snow within last 24 hours (cm)',
    'hn72': 'Height of new snow within last 72 hours (cm)',
    'rain': 'Liquid water accumulation within last 24 hours (mm)',
    'iswr': 'Incoming shortwave radiation (also referred to as irradiance) (W/m2)',
    'ilwr': 'Incoming longwave radiation (W/m2)',
    'ta': 'Air temperature (degrees Celsius)',
    'rh': 'Relative humidity (%)',
    'vw': 'Wind speed (m/s)',
    'dw': 'Wind direction (degrees)',
    'elev': 'Station elevation (m asl)'
}
for key, explanation in WX_dict.items():
    print(f'{key:>10}: {explanation}')
  datetime: datetime in the form YYYY-MM-DD HH:MM:SS
station_id: ID of virtual weather station (i.e., weather model grid point)
        hs: Snow height (cm)
      hn24: Height of new snow within last 24 hours (cm)
      hn72: Height of new snow within last 72 hours (cm)
      rain: Liquid water accumulation within last 24 hours (mm)
      iswr: Incoming shortwave radiation (also referred to as irradiance) (W/m2)
      ilwr: Incoming longwave radiation (W/m2)
        ta: Air temperature (degrees Celsius)
        rh: Relative humidity (%)
        vw: Wind speed (m/s)
        dw: Wind direction (degrees)
      elev: Station elevation (m asl)
- Read the data set WX_GNP.csv into a data frame and explore it: How many unique stations, time stamps, and station elevations does it contain? The methods .unique(), .min(), .max(), .describe(), and .quantile() will be handy for these tasks.
- Filter the data frame to a single station_id of your choice.
- What is the median rh when either hn24 is greater than 10 cm or rain is greater than 2 mm? What about the median rh during the opposite conditions?
- Compute a new variable hn72_check that should conceptually be identical to hn72. Use only hn24 to derive hn72_check.
- Check whether hn72_check is indeed equal to hn72. If not, why not?

def count_character(statement, character):
"""
Count the occurrences of a specific character in a given statement.
Parameters:
-----------
statement : str
The word or sentence in which to search for the character.
character : str
The character to count. Must be a single character.
Returns:
--------
int
The number of times the character appears in the statement.
Raises:
-------
ValueError
If `character` is not a single character.
Examples:
---------
>>> count_character("energy efficiency", "e")
4
>>> count_character("hello world", "o")
2
>>> count_character("Python programming", "m")
2
"""
if len(character) != 1:
raise ValueError("The `character` argument must be a single character.")
count = 0
for char in statement:
if char == character:
count += 1
return count
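A few informal test cases could look like this (the specific examples are illustrative, not prescribed by the exercise), with the built-in str.count as a cross-check:

print(count_character("energy efficiency", "e"))   # expected: 4
print(count_character("energy efficiency", "z"))   # expected: 0
print("energy efficiency".count("e"))              # cross-check with str.count: 4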
def split_sentence(sentence):
"""
Splits a given sentence into a list of words based on spaces.
Parameters:
-----------
sentence : str
The sentence to split. Must be a string.
Returns:
--------
list
A list of words extracted from the sentence.
Raises:
-------
ValueError
If `sentence` is not a string.
Examples:
---------
>>> split_sentence("Hello world!")
['Hello', 'world!']
>>> split_sentence("SingleWord")
['SingleWord']
>>> split_sentence(" ")
[]
"""
if not type(sentence) is str:
raise ValueError("`sentence` needs to be a string!")
word_list = []
word = ""
for index, char in enumerate(sentence):
if char == " ":
if word:
word_list.append(word)
word = ""
else:
word = f"{word}{char}"
if index == len(sentence)-1 and word:
word_list.append(word)
return word_list
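To answer whether the function returns the same result as the split method, one way (an illustrative sketch) is to compare both on a few inputs:

for test in ["This is a sentence", "SingleWord", "  double  spaces  "]:
    print(split_sentence(test) == test.split())

Note that split_sentence behaves like str.split() without arguments (runs of spaces are collapsed and leading or trailing spaces are ignored), whereas str.split(" ") would keep empty strings for consecutive spaces.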
import numpy as np

count = np.arange(10, 20)
v1a = np.full(count.shape, False)
v1b = v1a.copy()
v2a = v1a.copy()
v2b = v1a.copy()

v1a[count > 16] = True
for i, ct in enumerate(count):
    if ct > 16:
        v1b[i] = True
print(f"v1a is equal to v1b: {np.array_equal(v1a, v1b)}")

for i, ct in enumerate(count):
    if ct > 13 and ct < 17:
        v2a[i] = True
v2b[(count > 13) & (count < 17)] = True
print(f"v2a is equal to v2b: {np.array_equal(v2a, v2b)}")
v1a is equal to v1b: True
v2a is equal to v2b: True
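The core idea is that a comparison on a NumPy array yields a boolean mask, and masks can be combined element-wise and used directly for indexing. A small illustration (the values follow from the count array above):

mask = (count > 13) & (count < 17)   # element-wise AND of two boolean masks
print(mask)                          # [False False False False  True  True  True False False False]
print(count[mask])                   # [14 15 16]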
First, read the data set WX_GNP.csv into a data frame WX.
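A minimal sketch of this step (the exact read_csv arguments used originally are an assumption):

import pandas as pd

WX = pd.read_csv("WX_GNP.csv")
WX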
 | datetime | station_id | hs | hn24 | hn72 | rain | iswr | ilwr | ta | rh | vw | dw | elev |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-09-04 06:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000000 | 0.000 | 256.326 | 6.281980 | 94.7963 | 2.32314 | 241.468 | 2121 |
1 | 2019-09-04 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000196 | 555.803 | 288.803 | 12.524600 | 72.0814 | 4.38687 | 247.371 | 2121 |
2 | 2019-09-05 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000045 | 534.011 | 287.089 | 14.265400 | 55.1823 | 1.93691 | 239.254 | 2121 |
3 | 2019-09-06 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000026 | 546.008 | 292.024 | 14.136600 | 72.5560 | 3.67782 | 239.715 | 2121 |
4 | 2019-09-07 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000150 | 528.582 | 289.508 | 14.623800 | 68.9262 | 1.90232 | 227.356 | 2121 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
178733 | 2021-05-25 17:00:00 | VIR088016 | 279.822 | 2.197360 | 7.07140 | 0.108814 | 346.135 | 314.824 | 2.471430 | 94.4473 | 1.28551 | 170.356 | 2121 |
178734 | 2021-05-26 17:00:00 | VIR088016 | 272.909 | 0.000000 | 1.73176 | 0.001291 | 747.383 | 256.193 | 5.066340 | 78.2142 | 4.11593 | 208.909 | 2121 |
178735 | 2021-05-27 17:00:00 | VIR088016 | 267.290 | 2.412910 | 2.80912 | 0.167091 | 185.431 | 316.157 | 1.090880 | 97.3988 | 4.65204 | 218.963 | 2121 |
178736 | 2021-05-28 17:00:00 | VIR088016 | 275.573 | 11.509200 | 12.80780 | 0.000000 | 269.329 | 308.515 | 0.247583 | 88.5444 | 5.33803 | 263.788 | 2121 |
178737 | 2021-05-29 17:00:00 | VIR088016 | 267.562 | 0.259779 | 6.15200 | 0.000766 | 766.358 | 247.267 | 4.666620 | 69.3598 | 1.92538 | 256.041 | 2121 |
178738 rows × 13 columns
station_ids = WX['station_id'].unique()
n_stations = len(station_ids)
print(f'There are {n_stations} unique stations in the data frame and their labels are as follows:\n {station_ids}')
There are 238 unique stations in the data frame and their labels are as follows:
['VIR075905' 'VIR075906' 'VIR075907' 'VIR076452' 'VIR076456' 'VIR076457'
'VIR076458' 'VIR076459' 'VIR077002' 'VIR077003' 'VIR077004' 'VIR077007'
'VIR077008' 'VIR077009' 'VIR077010' 'VIR077548' 'VIR077549' 'VIR077550'
'VIR077551' 'VIR077552' 'VIR077553' 'VIR077554' 'VIR077555' 'VIR077556'
'VIR077557' 'VIR077558' 'VIR077559' 'VIR077560' 'VIR077561' 'VIR078100'
'VIR078101' 'VIR078102' 'VIR078103' 'VIR078104' 'VIR078105' 'VIR078106'
'VIR078107' 'VIR078108' 'VIR078109' 'VIR078110' 'VIR078111' 'VIR078112'
'VIR078652' 'VIR078653' 'VIR078654' 'VIR078655' 'VIR078656' 'VIR078657'
'VIR078658' 'VIR078659' 'VIR078660' 'VIR078661' 'VIR078662' 'VIR079203'
'VIR079204' 'VIR079205' 'VIR079206' 'VIR079207' 'VIR079208' 'VIR079209'
'VIR079210' 'VIR079211' 'VIR079212' 'VIR079213' 'VIR079754' 'VIR079755'
'VIR079756' 'VIR079757' 'VIR079758' 'VIR079759' 'VIR079760' 'VIR079761'
'VIR079762' 'VIR079763' 'VIR079764' 'VIR080304' 'VIR080305' 'VIR080306'
'VIR080307' 'VIR080308' 'VIR080309' 'VIR080310' 'VIR080311' 'VIR080312'
'VIR080313' 'VIR080314' 'VIR080315' 'VIR080855' 'VIR080856' 'VIR080857'
'VIR080858' 'VIR080859' 'VIR080860' 'VIR080861' 'VIR080862' 'VIR080863'
'VIR080864' 'VIR080865' 'VIR080866' 'VIR081406' 'VIR081407' 'VIR081408'
'VIR081409' 'VIR081410' 'VIR081411' 'VIR081412' 'VIR081413' 'VIR081414'
'VIR081415' 'VIR081416' 'VIR081417' 'VIR081420' 'VIR081956' 'VIR081957'
'VIR081958' 'VIR081959' 'VIR081960' 'VIR081961' 'VIR081962' 'VIR081963'
'VIR081964' 'VIR081965' 'VIR081966' 'VIR081967' 'VIR081968' 'VIR081969'
'VIR081970' 'VIR081971' 'VIR082508' 'VIR082509' 'VIR082510' 'VIR082511'
'VIR082512' 'VIR082513' 'VIR082514' 'VIR082515' 'VIR082516' 'VIR082517'
'VIR082518' 'VIR082519' 'VIR082520' 'VIR082521' 'VIR082522' 'VIR082523'
'VIR083059' 'VIR083060' 'VIR083061' 'VIR083062' 'VIR083063' 'VIR083064'
'VIR083065' 'VIR083066' 'VIR083067' 'VIR083068' 'VIR083069' 'VIR083070'
'VIR083071' 'VIR083072' 'VIR083073' 'VIR083610' 'VIR083611' 'VIR083612'
'VIR083613' 'VIR083614' 'VIR083615' 'VIR083616' 'VIR083617' 'VIR083618'
'VIR083619' 'VIR083620' 'VIR083621' 'VIR083622' 'VIR083623' 'VIR084161'
'VIR084162' 'VIR084163' 'VIR084164' 'VIR084165' 'VIR084166' 'VIR084167'
'VIR084168' 'VIR084169' 'VIR084170' 'VIR084171' 'VIR084711' 'VIR084712'
'VIR084713' 'VIR084714' 'VIR084715' 'VIR084716' 'VIR084717' 'VIR084718'
'VIR084719' 'VIR084720' 'VIR084721' 'VIR084722' 'VIR084723' 'VIR085261'
'VIR085262' 'VIR085263' 'VIR085264' 'VIR085265' 'VIR085266' 'VIR085267'
'VIR085268' 'VIR085269' 'VIR085270' 'VIR085271' 'VIR085272' 'VIR085812'
'VIR085813' 'VIR085814' 'VIR085815' 'VIR085816' 'VIR085817' 'VIR085818'
'VIR085819' 'VIR085820' 'VIR085821' 'VIR085822' 'VIR085823' 'VIR086362'
'VIR086363' 'VIR086364' 'VIR086365' 'VIR086371' 'VIR086372' 'VIR086373'
'VIR086374' 'VIR086912' 'VIR086913' 'VIR086914' 'VIR086915' 'VIR086916'
'VIR087463' 'VIR087464' 'VIR087465' 'VIR088016']
print(f"There are {len(WX['datetime'].unique())} unique time stamps between '{WX['datetime'].min()}' and '{WX['datetime'].max()}'.")
There are 751 unique time stamps between '2018-09-05 06:00:00' and '2021-05-29 17:00:00'.
WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), 'elev'].unique().shape[0]
## or alternatively (less recommended):
# WX['elev'][(WX['elev'] > 2000) & (WX['elev'] < 2200)].unique().shape[0]
## this is powerful as well:
# WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), ('elev', 'station_id')]
30
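The summary statistics of the station elevations shown below can be obtained with describe (a sketch of the presumed call):

WX['elev'].describe()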
count 178738.000000
mean 1836.436975
std 236.796505
min 1279.000000
25% 1671.000000
50% 1828.000000
75% 1989.000000
max 2497.000000
Name: elev, dtype: float64
The remaining tasks deal with a single station: filter the data frame to one station_id of your choice, compute the median rh under the different hn24 and rain conditions, and derive the new variable hn72_check that should conceptually be identical to hn72.

wx = WX[WX['station_id'].isin(['VIR075905'])].copy()
## or more generically:
# wx = WX.loc[WX['station_id'].isin([WX['station_id'].unique()[0]]), ]
wx
 | datetime | station_id | hs | hn24 | hn72 | rain | iswr | ilwr | ta | rh | vw | dw | elev |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-09-04 06:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000000 | 0.000 | 256.326 | 6.28198 | 94.7963 | 2.32314 | 241.468 | 2121 |
1 | 2019-09-04 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000196 | 555.803 | 288.803 | 12.52460 | 72.0814 | 4.38687 | 247.371 | 2121 |
2 | 2019-09-05 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000045 | 534.011 | 287.089 | 14.26540 | 55.1823 | 1.93691 | 239.254 | 2121 |
3 | 2019-09-06 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000026 | 546.008 | 292.024 | 14.13660 | 72.5560 | 3.67782 | 239.715 | 2121 |
4 | 2019-09-07 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000150 | 528.582 | 289.508 | 14.62380 | 68.9262 | 1.90232 | 227.356 | 2121 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
114506 | 2021-05-25 17:00:00 | VIR075905 | 281.450 | 0.000000 | 0.00000 | 0.401526 | 346.135 | 314.824 | 3.71658 | 88.5444 | 1.56797 | 156.521 | 2121 |
114507 | 2021-05-26 17:00:00 | VIR075905 | 276.290 | 0.000000 | 0.00000 | 0.025453 | 774.383 | 242.693 | 6.56515 | 72.3112 | 5.24579 | 208.540 | 2121 |
114508 | 2021-05-27 17:00:00 | VIR075905 | 265.546 | 0.000000 | 0.00000 | 0.313308 | 239.181 | 316.157 | 2.42825 | 97.3988 | 7.03859 | 218.594 | 2121 |
114509 | 2021-05-28 17:00:00 | VIR075905 | 268.436 | 5.094300 | 5.09430 | 0.108912 | 376.829 | 301.828 | 1.07766 | 84.1171 | 6.69847 | 266.648 | 2121 |
114510 | 2021-05-29 17:00:00 | VIR075905 | 265.560 | 0.797445 | 3.09568 | 0.010523 | 793.358 | 240.517 | 5.01248 | 70.8355 | 2.92842 | 265.633 | 2121 |
751 rows × 13 columns
Mind the copy when assigning the filtered data frame WX to a new one, wx! What happens if you don’t copy? (Tip: Try it out and see whether you get a warning several cells below!)
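For illustration only (this snippet is not part of the original solution): without the explicit .copy(), wx would merely be a slice of WX, and assigning new columns to it later can trigger pandas' SettingWithCopyWarning:

wx_nocopy = WX[WX['station_id'].isin(['VIR075905'])]   # no .copy()
wx_nocopy['hn72_check'] = 0.0                          # may raise a SettingWithCopyWarning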
wx.loc[~((wx['hn24'] > 10) | (wx['rain'] > 2)), 'rh'].median()
## or equivalently:
# wx.loc[~(wx['hn24'] > 10) & ~(wx['rain'] > 2), 'rh'].median()
# wx.loc[(wx['hn24'] <= 10) & (wx['rain'] <= 2), 'rh'].median()
np.float64(82.03875)
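The median rh for the condition itself (hn24 greater than 10 cm or rain greater than 2 mm) follows the same pattern, simply without the negation:

wx.loc[(wx['hn24'] > 10) | (wx['rain'] > 2), 'rh'].median()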
One way of solving this matches the pandas knowledge from this unit, whereas a second way would be my go-to choice. There are a few extra tricks involved: what it comes down to is that we would need the iloc operator to select rows based on the integer counter i, but at the same time we want to access the column by its name and not by its location, which requires the loc operator. I ultimately go for the loc operator to maximize readability and convenience. To still access the correct rows, however, I need to select wx.index[i] instead of i alone. Why? Because wx.index[i] returns the row label at integer location i (see Unit 6).
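A minimal sketch of both variants, assuming hn72 corresponds to the sum of the current and the two previous daily hn24 values (the handling of the first rows without sufficient history may differ from the original solution):

wx['hn72_check'] = np.nan
for i in range(2, len(wx)):
    # wx.index[i] maps the integer position i to the row label, so .loc can
    # write to the column addressed by name
    wx.loc[wx.index[i], 'hn72_check'] = wx['hn24'].iloc[i-2:i+1].sum()

# a more concise alternative using a rolling window over three daily values
wx['hn72_check_II'] = wx['hn24'].rolling(window=3).sum()

wx[['hn24', 'hn72', 'hn72_check', 'hn72_check_II']]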
 | hn24 | hn72 | hn72_check | hn72_check_II |
---|---|---|---|---|
0 | 0.000000 | 0.00000 | NaN | NaN |
1 | 0.000000 | 0.00000 | NaN | NaN |
2 | 0.000000 | 0.00000 | NaN | NaN |
3 | 0.000000 | 0.00000 | NaN | NaN |
4 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... |
114506 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
114507 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
114508 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
114509 | 5.094300 | 5.09430 | 5.094300 | 5.094300 |
114510 | 0.797445 | 3.09568 | 5.891745 | 5.891745 |
751 rows × 4 columns
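To check how well hn72_check matches hn72, one can, for instance, count the agreeing entries (a sketch; the element-wise comparison via np.isclose is an assumption):

matches = np.isclose(wx['hn72'], wx['hn72_check'])
print(f"{matches.sum()} of {len(wx)} entries agree")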
hn72 was computed on the hourly time series. Since we compute hn72_check on the daily time series, some values are identical while others are not. In any case, the first four entries cannot be identical, because we are missing data points further in the past that would be needed to compute the 3-day height of new snow.