sentence = "This is a sentence"
sentence.split(" ")['This', 'is', 'a', 'sentence']
Unit 05
Until next unit, work through the material provided in Unit 5 and solve the following exercise blocks in separate notebooks.
Code a function that accepts two arguments, first an arbitrary word or sentence (type str, e.g., "energy efficiency") then a character (also str, e.g., "e"). The function should return an integer of how many times the character appeared in the sentence. Write a comprehensive docstring for the function and test whether it performs as expected with a few test cases.
Consider this:
Code a function that accepts an arbitrary sentence as input and returns a list of strings containing the individual words, just like the example above. However, let’s imagine the .split() method does not exist and you have to find another way of achieving this. Does your function return the same result as the split method?
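Once you have written your own splitter (called split_sentence in the solutions below), one way to answer the last question is to compare its output directly against the built-in method; the test sentence here is just the example from above.

```python
sentence = "This is a sentence"
print(split_sentence(sentence))                          # ['This', 'is', 'a', 'sentence']
print(split_sentence(sentence) == sentence.split(" "))   # True for this input
```

Note that the two approaches are not equivalent for every input: for consecutive spaces, "a  b".split(" ") keeps an empty string in the list, whereas a splitter that skips empty words does not.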
Consider this:
Check that v1a and v1b are equal, and that v2a and v2b are equal again.

For this exercise block, you will work with a real data set: nowcast data from a numerical weather prediction model in Glacier National Park, Canada (WX_GNP.csv).
Here is a dictionary that explains the meaning of the spreadsheet variables:
```python
WX_dict = {
    'datetime': 'datetime in the form YYYY-MM-DD HH:MM:SS',
    'station_id': 'ID of virtual weather station (i.e., weather model grid point)',
    'hs': 'Snow height (cm)',
    'hn24': 'Height of new snow within last 24 hours (cm)',
    'hn72': 'Height of new snow within last 72 hours (cm)',
    'rain': 'Liquid water accumulation within last 24 hours (mm)',
    'iswr': 'Incoming shortwave radiation (also referred to as irradiance) (W/m2)',
    'ilwr': 'Incoming longwave radiation (W/m2)',
    'ta': 'Air temperature (degrees Celsius)',
    'rh': 'Relative humidity (%)',
    'vw': 'Wind speed (m/s)',
    'dw': 'Wind direction (degrees)',
    'elev': 'Station elevation (m asl)'
}

for key, explanation in WX_dict.items():
    print(f'{key:>10}: {explanation}')
```
```
  datetime: datetime in the form YYYY-MM-DD HH:MM:SS
station_id: ID of virtual weather station (i.e., weather model grid point)
        hs: Snow height (cm)
      hn24: Height of new snow within last 24 hours (cm)
      hn72: Height of new snow within last 72 hours (cm)
      rain: Liquid water accumulation within last 24 hours (mm)
      iswr: Incoming shortwave radiation (also referred to as irradiance) (W/m2)
      ilwr: Incoming longwave radiation (W/m2)
        ta: Air temperature (degrees Celsius)
        rh: Relative humidity (%)
        vw: Wind speed (m/s)
        dw: Wind direction (degrees)
      elev: Station elevation (m asl)
```
The data are provided in the file WX_GNP.csv. The methods .unique(), .min(), .max(), .describe(), and .quantile() will be handy for these tasks.

- Filter the data frame for a station_id of your choice.
- What is the median rh when either hn24 is greater than 10 cm or rain is greater than 2 mm? What about the median rh during the opposite conditions?
- Compute a new column hn72_check that should conceptually be identical to hn72. Use only hn24 to derive hn72_check.
- Check whether hn72_check is indeed equal to hn72. Why not?

Possible solutions to the exercises follow.

```python
def count_character(statement, character):
    """
    Count the occurrences of a specific character in a given statement.

    Parameters:
    -----------
    statement : str
        The word or sentence in which to search for the character.
    character : str
        The character to count. Must be a single character.

    Returns:
    --------
    int
        The number of times the character appears in the statement.

    Raises:
    -------
    ValueError
        If `character` is not a single character.

    Examples:
    ---------
    >>> count_character("energy efficiency", "e")
    4
    >>> count_character("hello world", "o")
    2
    >>> count_character("Python programming", "m")
    2
    """
    if len(character) != 1:
        raise ValueError("The `character` argument must be a single character.")
    count = 0
    for char in statement:
        if char == character:
            count += 1
    return count
```

```python
def split_sentence(sentence):
    """
    Splits a given sentence into a list of words based on spaces.

    Parameters:
    -----------
    sentence : str
        The sentence to split. Must be a string.

    Returns:
    --------
    list
        A list of words extracted from the sentence.

    Raises:
    -------
    ValueError
        If `sentence` is not a string.

    Examples:
    ---------
    >>> split_sentence("Hello world!")
    ['Hello', 'world!']
    >>> split_sentence("SingleWord")
    ['SingleWord']
    >>> split_sentence(" ")
    []
    """
    if not isinstance(sentence, str):
        raise ValueError("`sentence` needs to be a string!")
    word_list = []
    word = ""
    for index, char in enumerate(sentence):
        if char == " ":
            if word:
                word_list.append(word)
                word = ""
        else:
            word = f"{word}{char}"
        if index == len(sentence) - 1 and word:
            word_list.append(word)
    return word_list
```

```python
import numpy as np

count = np.arange(10, 20)
v1a = np.full(count.shape, False)
v1b = v1a.copy()
v2a = v1a.copy()
v2b = v1a.copy()

v1a[count > 16] = True
for i, ct in enumerate(count):
    if ct > 16:
        v1b[i] = True
print(f"v1a is equal to v1b: {np.array_equal(v1a, v1b)}")

for i, ct in enumerate(count):
    if ct > 13 and ct < 17:
        v2a[i] = True
v2b[(count > 13) & (count < 17)] = True
print(f"v2a is equal to v2b: {np.array_equal(v2a, v2b)}")
```
```
v1a is equal to v1b: True
v2a is equal to v2b: True
```
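The following cells work with a data frame WX holding the contents of WX_GNP.csv. A minimal loading step, assuming the file sits in the working directory, might look like this; displaying WX then yields the table below.

```python
import pandas as pd

# Read the nowcast data into a data frame (the file path is an assumption).
WX = pd.read_csv("WX_GNP.csv")
WX
```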
| | datetime | station_id | hs | hn24 | hn72 | rain | iswr | ilwr | ta | rh | vw | dw | elev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-09-04 06:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000000 | 0.000 | 256.326 | 6.281980 | 94.7963 | 2.32314 | 241.468 | 2121 |
| 1 | 2019-09-04 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000196 | 555.803 | 288.803 | 12.524600 | 72.0814 | 4.38687 | 247.371 | 2121 |
| 2 | 2019-09-05 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000045 | 534.011 | 287.089 | 14.265400 | 55.1823 | 1.93691 | 239.254 | 2121 |
| 3 | 2019-09-06 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000026 | 546.008 | 292.024 | 14.136600 | 72.5560 | 3.67782 | 239.715 | 2121 |
| 4 | 2019-09-07 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000150 | 528.582 | 289.508 | 14.623800 | 68.9262 | 1.90232 | 227.356 | 2121 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 178733 | 2021-05-25 17:00:00 | VIR088016 | 279.822 | 2.197360 | 7.07140 | 0.108814 | 346.135 | 314.824 | 2.471430 | 94.4473 | 1.28551 | 170.356 | 2121 |
| 178734 | 2021-05-26 17:00:00 | VIR088016 | 272.909 | 0.000000 | 1.73176 | 0.001291 | 747.383 | 256.193 | 5.066340 | 78.2142 | 4.11593 | 208.909 | 2121 |
| 178735 | 2021-05-27 17:00:00 | VIR088016 | 267.290 | 2.412910 | 2.80912 | 0.167091 | 185.431 | 316.157 | 1.090880 | 97.3988 | 4.65204 | 218.963 | 2121 |
| 178736 | 2021-05-28 17:00:00 | VIR088016 | 275.573 | 11.509200 | 12.80780 | 0.000000 | 269.329 | 308.515 | 0.247583 | 88.5444 | 5.33803 | 263.788 | 2121 |
| 178737 | 2021-05-29 17:00:00 | VIR088016 | 267.562 | 0.259779 | 6.15200 | 0.000766 | 766.358 | 247.267 | 4.666620 | 69.3598 | 1.92538 | 256.041 | 2121 |
178738 rows × 13 columns
```python
tabledict = {
    'datetime': 'datetime in the form YYYY-MM-DD HH:MM:SS',
    'station_id': 'ID of virtual weather station (i.e., weather model grid point)',
    'hs': 'Snow height (cm)',
    'hn24': 'Height of new snow within last 24 hours (cm)',
    'hn72': 'Height of new snow within last 72 hours (cm)',
    'rain': 'Liquid water accumulation within last 24 hours (mm)',
    'iswr': 'Incoming shortwave radiation (W/m2)',
    'ilwr': 'Incoming longwave radiation (W/m2)',
    'ta': 'Air temperature (degrees Celsius)',
    'rh': 'Relative humidity (%)',
    'vw': 'Wind speed (m/s)',
    'dw': 'Wind direction (degrees)',
    'elev': 'Station elevation (m asl)'
}

for key, explanation in tabledict.items():
    print(f'{key:>10}: {explanation}')
```
```
  datetime: datetime in the form YYYY-MM-DD HH:MM:SS
station_id: ID of virtual weather station (i.e., weather model grid point)
        hs: Snow height (cm)
      hn24: Height of new snow within last 24 hours (cm)
      hn72: Height of new snow within last 72 hours (cm)
      rain: Liquid water accumulation within last 24 hours (mm)
      iswr: Incoming shortwave radiation (W/m2)
      ilwr: Incoming longwave radiation (W/m2)
        ta: Air temperature (degrees Celsius)
        rh: Relative humidity (%)
        vw: Wind speed (m/s)
        dw: Wind direction (degrees)
      elev: Station elevation (m asl)
```
```python
station_ids = WX['station_id'].unique()
n_stations = len(station_ids)
print(f'There are {n_stations} unique stations in the data frame and their labels are as follows:\n {station_ids}')
```
```
There are 238 unique stations in the data frame and their labels are as follows:
['VIR075905' 'VIR075906' 'VIR075907' 'VIR076452' 'VIR076456' 'VIR076457'
'VIR076458' 'VIR076459' 'VIR077002' 'VIR077003' 'VIR077004' 'VIR077007'
'VIR077008' 'VIR077009' 'VIR077010' 'VIR077548' 'VIR077549' 'VIR077550'
'VIR077551' 'VIR077552' 'VIR077553' 'VIR077554' 'VIR077555' 'VIR077556'
'VIR077557' 'VIR077558' 'VIR077559' 'VIR077560' 'VIR077561' 'VIR078100'
'VIR078101' 'VIR078102' 'VIR078103' 'VIR078104' 'VIR078105' 'VIR078106'
'VIR078107' 'VIR078108' 'VIR078109' 'VIR078110' 'VIR078111' 'VIR078112'
'VIR078652' 'VIR078653' 'VIR078654' 'VIR078655' 'VIR078656' 'VIR078657'
'VIR078658' 'VIR078659' 'VIR078660' 'VIR078661' 'VIR078662' 'VIR079203'
'VIR079204' 'VIR079205' 'VIR079206' 'VIR079207' 'VIR079208' 'VIR079209'
'VIR079210' 'VIR079211' 'VIR079212' 'VIR079213' 'VIR079754' 'VIR079755'
'VIR079756' 'VIR079757' 'VIR079758' 'VIR079759' 'VIR079760' 'VIR079761'
'VIR079762' 'VIR079763' 'VIR079764' 'VIR080304' 'VIR080305' 'VIR080306'
'VIR080307' 'VIR080308' 'VIR080309' 'VIR080310' 'VIR080311' 'VIR080312'
'VIR080313' 'VIR080314' 'VIR080315' 'VIR080855' 'VIR080856' 'VIR080857'
'VIR080858' 'VIR080859' 'VIR080860' 'VIR080861' 'VIR080862' 'VIR080863'
'VIR080864' 'VIR080865' 'VIR080866' 'VIR081406' 'VIR081407' 'VIR081408'
'VIR081409' 'VIR081410' 'VIR081411' 'VIR081412' 'VIR081413' 'VIR081414'
'VIR081415' 'VIR081416' 'VIR081417' 'VIR081420' 'VIR081956' 'VIR081957'
'VIR081958' 'VIR081959' 'VIR081960' 'VIR081961' 'VIR081962' 'VIR081963'
'VIR081964' 'VIR081965' 'VIR081966' 'VIR081967' 'VIR081968' 'VIR081969'
'VIR081970' 'VIR081971' 'VIR082508' 'VIR082509' 'VIR082510' 'VIR082511'
'VIR082512' 'VIR082513' 'VIR082514' 'VIR082515' 'VIR082516' 'VIR082517'
'VIR082518' 'VIR082519' 'VIR082520' 'VIR082521' 'VIR082522' 'VIR082523'
'VIR083059' 'VIR083060' 'VIR083061' 'VIR083062' 'VIR083063' 'VIR083064'
'VIR083065' 'VIR083066' 'VIR083067' 'VIR083068' 'VIR083069' 'VIR083070'
'VIR083071' 'VIR083072' 'VIR083073' 'VIR083610' 'VIR083611' 'VIR083612'
'VIR083613' 'VIR083614' 'VIR083615' 'VIR083616' 'VIR083617' 'VIR083618'
'VIR083619' 'VIR083620' 'VIR083621' 'VIR083622' 'VIR083623' 'VIR084161'
'VIR084162' 'VIR084163' 'VIR084164' 'VIR084165' 'VIR084166' 'VIR084167'
'VIR084168' 'VIR084169' 'VIR084170' 'VIR084171' 'VIR084711' 'VIR084712'
'VIR084713' 'VIR084714' 'VIR084715' 'VIR084716' 'VIR084717' 'VIR084718'
'VIR084719' 'VIR084720' 'VIR084721' 'VIR084722' 'VIR084723' 'VIR085261'
'VIR085262' 'VIR085263' 'VIR085264' 'VIR085265' 'VIR085266' 'VIR085267'
'VIR085268' 'VIR085269' 'VIR085270' 'VIR085271' 'VIR085272' 'VIR085812'
'VIR085813' 'VIR085814' 'VIR085815' 'VIR085816' 'VIR085817' 'VIR085818'
'VIR085819' 'VIR085820' 'VIR085821' 'VIR085822' 'VIR085823' 'VIR086362'
'VIR086363' 'VIR086364' 'VIR086365' 'VIR086371' 'VIR086372' 'VIR086373'
'VIR086374' 'VIR086912' 'VIR086913' 'VIR086914' 'VIR086915' 'VIR086916'
'VIR087463' 'VIR087464' 'VIR087465' 'VIR088016']
```
print(f"There are {len(WX['datetime'].unique())} unique time stamps between '{WX['datetime'].min()}' and '{WX['datetime'].max()}'.")There are 751 unique time stamps between '2018-09-05 06:00:00' and '2021-05-29 17:00:00'.
```python
WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), 'elev'].unique().shape[0]
## or alternatively (less recommended):
# WX['elev'][(WX['elev'] > 2000) & (WX['elev'] < 2200)].unique().shape[0]
## this is powerful as well:
# WX.loc[(WX['elev'] > 2000) & (WX['elev'] < 2200), ('elev', 'station_id')]
```
```
30
```
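The elevation summary below can be obtained with .describe(); the exact call is an assumption, but it was presumably along these lines.

```python
# Summary statistics of the station elevations (count, mean, std, quartiles, ...).
WX['elev'].describe()
```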
```
count    178738.000000
mean       1836.436975
std         236.796505
min        1279.000000
25%        1671.000000
50%        1828.000000
75%        1989.000000
max        2497.000000
Name: elev, dtype: float64
```
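The .quantile() method mentioned in the hints offers another route to the same quartiles, for instance (a sketch; no output reproduced here):

```python
# 25th, 50th, and 75th percentile of the station elevations.
WX['elev'].quantile([0.25, 0.5, 0.75])
```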
```python
wx = WX[WX['station_id'].isin(['VIR075905'])].copy()
## or more generically:
# wx = WX.loc[WX['station_id'].isin([WX['station_id'].unique()[0]]), ]
wx
```
| | datetime | station_id | hs | hn24 | hn72 | rain | iswr | ilwr | ta | rh | vw | dw | elev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-09-04 06:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000000 | 0.000 | 256.326 | 6.28198 | 94.7963 | 2.32314 | 241.468 | 2121 |
| 1 | 2019-09-04 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000196 | 555.803 | 288.803 | 12.52460 | 72.0814 | 4.38687 | 247.371 | 2121 |
| 2 | 2019-09-05 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000045 | 534.011 | 287.089 | 14.26540 | 55.1823 | 1.93691 | 239.254 | 2121 |
| 3 | 2019-09-06 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000026 | 546.008 | 292.024 | 14.13660 | 72.5560 | 3.67782 | 239.715 | 2121 |
| 4 | 2019-09-07 17:00:00 | VIR075905 | 0.000 | 0.000000 | 0.00000 | 0.000150 | 528.582 | 289.508 | 14.62380 | 68.9262 | 1.90232 | 227.356 | 2121 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 114506 | 2021-05-25 17:00:00 | VIR075905 | 281.450 | 0.000000 | 0.00000 | 0.401526 | 346.135 | 314.824 | 3.71658 | 88.5444 | 1.56797 | 156.521 | 2121 |
| 114507 | 2021-05-26 17:00:00 | VIR075905 | 276.290 | 0.000000 | 0.00000 | 0.025453 | 774.383 | 242.693 | 6.56515 | 72.3112 | 5.24579 | 208.540 | 2121 |
| 114508 | 2021-05-27 17:00:00 | VIR075905 | 265.546 | 0.000000 | 0.00000 | 0.313308 | 239.181 | 316.157 | 2.42825 | 97.3988 | 7.03859 | 218.594 | 2121 |
| 114509 | 2021-05-28 17:00:00 | VIR075905 | 268.436 | 5.094300 | 5.09430 | 0.108912 | 376.829 | 301.828 | 1.07766 | 84.1171 | 6.69847 | 266.648 | 2121 |
| 114510 | 2021-05-29 17:00:00 | VIR075905 | 265.560 | 0.797445 | 3.09568 | 0.010523 | 793.358 | 240.517 | 5.01248 | 70.8355 | 2.92842 | 265.633 | 2121 |
751 rows × 13 columns
Mind the .copy() when assigning the filtered data frame WX to a new one, wx! What happens if you don’t copy? (Tip: Try it out and see whether you get a warning a few cells below!)
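As an illustration of why the .copy() matters (a sketch; the column name flag is made up): without it, the filtered frame may just be a view of WX, and assigning to it later can trigger pandas' SettingWithCopyWarning.

```python
wx_view = WX[WX['station_id'].isin(['VIR075905'])]          # possibly only a view of WX
wx_copy = WX[WX['station_id'].isin(['VIR075905'])].copy()   # an independent copy

wx_copy['flag'] = True   # fine: modifies the independent copy
wx_view['flag'] = True   # may raise a SettingWithCopyWarning
```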
```python
wx.loc[~((wx['hn24'] > 10) | (wx['rain'] > 2)), 'rh'].median()
## or equivalently:
# wx.loc[~(wx['hn24'] > 10) & ~(wx['rain'] > 2), 'rh'].median()
# wx.loc[(wx['hn24'] <= 10) & (wx['rain'] <= 2), 'rh'].median()
```
```
np.float64(82.03875)
```
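The cell above answers the second half of the question (the median rh during the opposite conditions). For the first half, the same selection without the negation would do; the numerical result is not reproduced here.

```python
# Median relative humidity when either more than 10 cm of new snow fell in the
# last 24 hours or more than 2 mm of rain accumulated.
wx.loc[(wx['hn24'] > 10) | (wx['rain'] > 2), 'rh'].median()
```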
The following solution matches the Pandas knowledge from this unit, whereas the next one would be my go-to choice. There are a few extra tricks in here: what it comes down to is that we would need the iloc operator to select rows based on the integer counter i, but at the same time we want to access the column by its name rather than by its location, which requires the loc operator. I ultimately go for the loc operator to maximize readability and convenience. To still access the correct rows, however, I need to select wx.index[i] instead of i alone. Why? Because wx.index[i] returns the row labels at the locations i (see Unit 6).
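A sketch of the loop-based approach described above (the 72-hour window handling is an assumption, chosen to match the NaN entries at the start of the table below): hn72_check sums hn24 over all rows of the station's time series whose timestamps fall within the preceding 72 hours, and is left as NaN where the series does not reach back a full 72 hours.

```python
import numpy as np
import pandas as pd

wx = wx.copy()
wx['datetime'] = pd.to_datetime(wx['datetime'])
wx['hn72_check'] = np.nan

for i in range(len(wx)):
    # Row label at integer position i, so that .loc can be used with column names.
    row = wx.index[i]
    t = wx.loc[row, 'datetime']
    window_start = t - pd.Timedelta(hours=72)
    # Only compute a value once the series reaches a full 72 hours back;
    # otherwise the 3-day sum would be based on incomplete data.
    if window_start >= wx['datetime'].min():
        in_window = (wx['datetime'] > window_start) & (wx['datetime'] <= t)
        wx.loc[row, 'hn72_check'] = wx.loc[in_window, 'hn24'].sum()
```

A vectorized variant (hn72_check_II in the table below) might use a time-based rolling sum instead, e.g. wx.rolling('72h', on='datetime')['hn24'].sum(), with the same 72-hour-coverage check applied to mask the incomplete window at the start of the series.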
| | hn24 | hn72 | hn72_check | hn72_check_II |
|---|---|---|---|---|
| 0 | 0.000000 | 0.00000 | NaN | NaN |
| 1 | 0.000000 | 0.00000 | NaN | NaN |
| 2 | 0.000000 | 0.00000 | NaN | NaN |
| 3 | 0.000000 | 0.00000 | NaN | NaN |
| 4 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... |
| 114506 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
| 114507 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
| 114508 | 0.000000 | 0.00000 | 0.000000 | 0.000000 |
| 114509 | 5.094300 | 5.09430 | 5.094300 | 5.094300 |
| 114510 | 0.797445 | 3.09568 | 5.891745 | 5.891745 |
751 rows × 4 columns
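The actual comparison between hn72_check and hn72 could then be done element-wise, for example like this (a sketch; rows where hn72_check is NaN simply count as disagreeing):

```python
import numpy as np

# Element-wise comparison with a numerical tolerance.
matches = np.isclose(wx['hn72'], wx['hn72_check'])
print(f"{matches.sum()} of {len(wx)} entries agree")
```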
hn72 was computed on the hourly time series. Since we compute hn72_check on the daily time series, some values are identical while others are not. In any case, the first 4 entries cannot be identical, because the data points further in the past that would be needed to compute the 3-day height of new snow are missing.