Week 4 Wednesday#

Announcements#

  • The midterm is Monday. A sample midterm is posted on the Week 5 page on Canvas. Solutions posted by Friday.

  • Jinghao gave out notecards on Tuesday, can put handwritten notes on both sides for the midterm. Let me know if you still need a notecard.

  • The best way to study is to go over the worksheets, quizzes, and the sample midterm. Next priority would be the lecture notes.

  • I briefly mention lots of topics. If you’re not sure if something could appear on the midterm, ask on Ed Discussion and I’ll answer there.

  • Worksheet 8 is posted. Both Worksheets 7-8 are due Monday night (the usual schedule). Working on them will help you prepare for the midterm.

  • I have office hours before class on Friday, 12-1pm, in the same classroom ALP 3610.

  • Friday will have a slightly different structure from usual. There will be time to work at the beginning (about 15 minutes) and then lecture for the last part of class.

Plan for today#

Today I would like to introduce the pandas DataFrame method apply (which is necessary for this week’s worksheets and is also good practice with axis) and continue discussing the Python topics from Monday (list comprehension, f-strings, and lambda functions).

Practice with the DataFrame method apply#

  • Load the “taxis” dataset from Seaborn and drop rows with missing values.

import seaborn as sns
import pandas as pd
df = sns.load_dataset("taxis")
df = df.dropna(axis=0)
  • As a warm-up with apply, try applying the len function using first axis=0 and then axis=1. We’ll see more serious uses below and in the homework.

Notice how the row labels have disappeared but the column labels are still here. We have changed the row labels, which corresponds to axis=0. The numbers 6341 correspond to the length of each of these columns.

df.apply(len, axis=0)
pickup             6341
dropoff            6341
passengers         6341
distance           6341
fare               6341
tip                6341
tolls              6341
total              6341
color              6341
payment            6341
pickup_zone        6341
dropoff_zone       6341
pickup_borough     6341
dropoff_borough    6341
dtype: int64

Here is an example of where the 6341 entry for "pickup" came from.

len(df["pickup"])
6341

We were applying the len function to each of these columns.

df
pickup dropoff passengers distance fare tip tolls total color payment pickup_zone dropoff_zone pickup_borough dropoff_borough
0 2019-03-23 20:21:09 2019-03-23 20:27:24 1 1.60 7.0 2.15 0.0 12.95 yellow credit card Lenox Hill West UN/Turtle Bay South Manhattan Manhattan
1 2019-03-04 16:11:55 2019-03-04 16:19:00 1 0.79 5.0 0.00 0.0 9.30 yellow cash Upper West Side South Upper West Side South Manhattan Manhattan
2 2019-03-27 17:53:01 2019-03-27 18:00:25 1 1.37 7.5 2.36 0.0 14.16 yellow credit card Alphabet City West Village Manhattan Manhattan
3 2019-03-10 01:23:59 2019-03-10 01:49:51 1 7.70 27.0 6.15 0.0 36.95 yellow credit card Hudson Sq Yorkville West Manhattan Manhattan
4 2019-03-30 13:27:42 2019-03-30 13:37:14 3 2.16 9.0 1.10 0.0 13.40 yellow credit card Midtown East Yorkville West Manhattan Manhattan
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6428 2019-03-31 09:51:53 2019-03-31 09:55:27 1 0.75 4.5 1.06 0.0 6.36 green credit card East Harlem North Central Harlem North Manhattan Manhattan
6429 2019-03-31 17:38:00 2019-03-31 18:34:23 1 18.74 58.0 0.00 0.0 58.80 green credit card Jamaica East Concourse/Concourse Village Queens Bronx
6430 2019-03-23 22:55:18 2019-03-23 23:14:25 1 4.14 16.0 0.00 0.0 17.30 green cash Crown Heights North Bushwick North Brooklyn Brooklyn
6431 2019-03-04 10:09:25 2019-03-04 10:14:29 1 1.12 6.0 0.00 0.0 6.80 green credit card East New York East Flatbush/Remsen Village Brooklyn Brooklyn
6432 2019-03-13 19:31:22 2019-03-13 19:48:02 1 3.85 15.0 3.36 0.0 20.16 green credit card Boerum Hill Windsor Terrace Brooklyn Brooklyn

6341 rows Ă— 14 columns

If we use axis=1 instead, then we are plugging in one row at a time. Those rows all have length 14, which is where the following values come from.

df.apply(len, axis=1)
0       14
1       14
2       14
3       14
4       14
        ..
6428    14
6429    14
6430    14
6431    14
6432    14
Length: 6341, dtype: int64

Here is an example where we see different values for different columns. Here the input is a column, and the output is its dtype.

df.apply(lambda x: x.dtype, axis=0)
pickup             datetime64[ns]
dropoff            datetime64[ns]
passengers                  int64
distance                  float64
fare                      float64
tip                       float64
tolls                     float64
total                     float64
color                      object
payment                    object
pickup_zone                object
dropoff_zone               object
pickup_borough             object
dropoff_borough            object
dtype: object

Compare the above use of apply to map, which is used on a single pandas Series, rather than on a whole pandas DataFrame.

ser = df["payment"]
ser
0       credit card
1              cash
2       credit card
3       credit card
4       credit card
           ...     
6428    credit card
6429    credit card
6430           cash
6431    credit card
6432    credit card
Name: payment, Length: 6341, dtype: object

Here we are trying to add 5 to each entry, but we get an error because the entries are strings in ser.

ser.map(lambda x: x+5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [10], line 1
----> 1 ser.map(lambda x: x+5)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/series.py:3909, in Series.map(self, arg, na_action)
   3834 def map(self, arg, na_action=None) -> "Series":
   3835     """
   3836     Map values of Series according to input correspondence.
   3837 
   (...)
   3907     dtype: object
   3908     """
-> 3909     new_values = super()._map_values(arg, na_action=na_action)
   3910     return self._constructor(new_values, index=self.index).__finalize__(
   3911         self, method="map"
   3912     )

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/base.py:937, in IndexOpsMixin._map_values(self, mapper, na_action)
    934         raise ValueError(msg)
    936 # mapper is a function
--> 937 new_values = map_f(values, mapper)
    939 return new_values

File pandas/_libs/lib.pyx:2467, in pandas._libs.lib.map_infer()

Cell In [10], line 1, in <lambda>(x)
----> 1 ser.map(lambda x: x+5)

TypeError: can only concatenate str (not "int") to str

If we convert 5 to a string (we could also put it in quotation marks), then we don’t get an error. Notice how the function here is a little simpler than what we used above, in the sense that the input to this lambda function is a string, whereas the inputs above were pandas Series.

ser.map(lambda x: x+str(5))
0       credit card5
1              cash5
2       credit card5
3       credit card5
4       credit card5
            ...     
6428    credit card5
6429    credit card5
6430           cash5
6431    credit card5
6432    credit card5
Name: payment, Length: 6341, dtype: object
  • Write a function has_airport which takes as input a row from df (not a row label but an entire row) and as output returns True if the “pickup_zone” or “dropoff_zone” entry contains the substring “Airport”.

Here is a more realistic application of apply. We start with a function that takes as input a pandas Series (a row of the DataFrame). In this case, the output is the Boolean value True or False.

def has_airport(row):
    return ("Airport" in row["pickup_zone"]) or ("Airport" in row["dropoff_zone"])

Here is a typical input. We get the row at integer location 30 from df.

z = df.iloc[30]

Here is what that particular row looks like. It’s still a pandas Series, just like a column.

z
pickup                   2019-03-01 02:55:55
dropoff                  2019-03-01 02:57:59
passengers                                 3
distance                                0.74
fare                                     4.0
tip                                      0.0
tolls                                    0.0
total                                    7.8
color                                 yellow
payment                                 cash
pickup_zone                     Clinton East
dropoff_zone       West Chelsea/Hudson Yards
pickup_borough                     Manhattan
dropoff_borough                    Manhattan
Name: 31, dtype: object

Notice that neither the “pickup_zone” value nor the “dropoff_zone” value contains "Airport", so that is why our function returns False.

has_airport(z)
False

Here is that “pickup_zone” value.

z["pickup_zone"]
'Clinton East'

We could also use loc, but that is not as common with pandas Series. (We can just use the square brackets directly. Unlike with a pandas DataFrame, where using square brackets directly means something else.)

z.loc["pickup_zone"]
'Clinton East'

Here is a row where there is an "Airport" substring.

has_airport(df.loc[10])
True
  • Make a new column "Airport0" which contains True if the “pickup_zone” or “dropoff_zone” contains the substring “Airport”, and otherwise contains False. Use the has_airport function and df.apply(???, axis=???).

Because we are evaluating this to a row at a time, we are using axis=1.

df["Airport0"] = df.apply(has_airport, axis=1)

Scroll to the right and notice how an “Airport0” column is now there.

df
pickup dropoff passengers distance fare tip tolls total color payment pickup_zone dropoff_zone pickup_borough dropoff_borough Airport0
0 2019-03-23 20:21:09 2019-03-23 20:27:24 1 1.60 7.0 2.15 0.0 12.95 yellow credit card Lenox Hill West UN/Turtle Bay South Manhattan Manhattan False
1 2019-03-04 16:11:55 2019-03-04 16:19:00 1 0.79 5.0 0.00 0.0 9.30 yellow cash Upper West Side South Upper West Side South Manhattan Manhattan False
2 2019-03-27 17:53:01 2019-03-27 18:00:25 1 1.37 7.5 2.36 0.0 14.16 yellow credit card Alphabet City West Village Manhattan Manhattan False
3 2019-03-10 01:23:59 2019-03-10 01:49:51 1 7.70 27.0 6.15 0.0 36.95 yellow credit card Hudson Sq Yorkville West Manhattan Manhattan False
4 2019-03-30 13:27:42 2019-03-30 13:37:14 3 2.16 9.0 1.10 0.0 13.40 yellow credit card Midtown East Yorkville West Manhattan Manhattan False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6428 2019-03-31 09:51:53 2019-03-31 09:55:27 1 0.75 4.5 1.06 0.0 6.36 green credit card East Harlem North Central Harlem North Manhattan Manhattan False
6429 2019-03-31 17:38:00 2019-03-31 18:34:23 1 18.74 58.0 0.00 0.0 58.80 green credit card Jamaica East Concourse/Concourse Village Queens Bronx False
6430 2019-03-23 22:55:18 2019-03-23 23:14:25 1 4.14 16.0 0.00 0.0 17.30 green cash Crown Heights North Bushwick North Brooklyn Brooklyn False
6431 2019-03-04 10:09:25 2019-03-04 10:14:29 1 1.12 6.0 0.00 0.0 6.80 green credit card East New York East Flatbush/Remsen Village Brooklyn Brooklyn False
6432 2019-03-13 19:31:22 2019-03-13 19:48:02 1 3.85 15.0 3.36 0.0 20.16 green credit card Boerum Hill Windsor Terrace Brooklyn Brooklyn False

6341 rows Ă— 15 columns

If we try to apply using axis=0 in this case, we get an error, because the code included row["pickup_zone"], which doesn’t make sense for a column in this DataFrame. (There is no row with label “pickup_zone”.)

df.apply(has_airport, axis=0)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/indexes/base.py:3081, in Index.get_loc(self, key, method, tolerance)
   3080 try:
-> 3081     return self._engine.get_loc(casted_key)
   3082 except KeyError as err:

File pandas/_libs/index.pyx:70, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index.pyx:98, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/index_class_helper.pxi:89, in pandas._libs.index.Int64Engine._check_type()

KeyError: 'pickup_zone'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In [24], line 1
----> 1 df.apply(has_airport, axis=0)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/frame.py:7768, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwds)
   7757 from pandas.core.apply import frame_apply
   7759 op = frame_apply(
   7760     self,
   7761     func=func,
   (...)
   7766     kwds=kwds,
   7767 )
-> 7768 return op.get_result()

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/apply.py:185, in FrameApply.get_result(self)
    182 elif self.raw:
    183     return self.apply_raw()
--> 185 return self.apply_standard()

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/apply.py:276, in FrameApply.apply_standard(self)
    275 def apply_standard(self):
--> 276     results, res_index = self.apply_series_generator()
    278     # wrap results
    279     return self.wrap_results(results, res_index)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/apply.py:290, in FrameApply.apply_series_generator(self)
    287 with option_context("mode.chained_assignment", None):
    288     for i, v in enumerate(series_gen):
    289         # ignore SettingWithCopy here in case the user mutates
--> 290         results[i] = self.f(v)
    291         if isinstance(results[i], ABCSeries):
    292             # If we have a view on v, we need to make a copy because
    293             #  series_generator will swap out the underlying data
    294             results[i] = results[i].copy(deep=False)

Cell In [12], line 2, in has_airport(row)
      1 def has_airport(row):
----> 2     return ("Airport" in row["pickup_zone"]) or ("Airport" in row["dropoff_zone"])

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/series.py:853, in Series.__getitem__(self, key)
    850     return self._values[key]
    852 elif key_is_scalar:
--> 853     return self._get_value(key)
    855 if is_hashable(key):
    856     # Otherwise index.get_value will raise InvalidIndexError
    857     try:
    858         # For labels that don't resolve as scalars like tuples and frozensets

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/series.py:961, in Series._get_value(self, label, takeable)
    958     return self._values[label]
    960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
    962 return self.index._get_values_for_loc(self, loc, label)

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/indexes/base.py:3083, in Index.get_loc(self, key, method, tolerance)
   3081         return self._engine.get_loc(casted_key)
   3082     except KeyError as err:
-> 3083         raise KeyError(key) from err
   3085 if tolerance is not None:
   3086     tolerance = self._convert_tolerance(tolerance, np.asarray(key))

KeyError: 'pickup_zone'
  • Make a column "Airport1" with the same values, again using apply, but this time using a lambda function.

The following is the exact same as above, just using an anonymous lambda function instead of our named has_airport function. I’ve broken it up over multiple lines just to make it a little easier to read.

df["Airport1"] = df.apply(
    lambda row: ("Airport" in row["pickup_zone"]) or ("Airport" in row["dropoff_zone"]),
    axis=1
)

Notice how we now have a new column on the far right.

df
pickup dropoff passengers distance fare tip tolls total color payment pickup_zone dropoff_zone pickup_borough dropoff_borough Airport0 Airport1
0 2019-03-23 20:21:09 2019-03-23 20:27:24 1 1.60 7.0 2.15 0.0 12.95 yellow credit card Lenox Hill West UN/Turtle Bay South Manhattan Manhattan False False
1 2019-03-04 16:11:55 2019-03-04 16:19:00 1 0.79 5.0 0.00 0.0 9.30 yellow cash Upper West Side South Upper West Side South Manhattan Manhattan False False
2 2019-03-27 17:53:01 2019-03-27 18:00:25 1 1.37 7.5 2.36 0.0 14.16 yellow credit card Alphabet City West Village Manhattan Manhattan False False
3 2019-03-10 01:23:59 2019-03-10 01:49:51 1 7.70 27.0 6.15 0.0 36.95 yellow credit card Hudson Sq Yorkville West Manhattan Manhattan False False
4 2019-03-30 13:27:42 2019-03-30 13:37:14 3 2.16 9.0 1.10 0.0 13.40 yellow credit card Midtown East Yorkville West Manhattan Manhattan False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6428 2019-03-31 09:51:53 2019-03-31 09:55:27 1 0.75 4.5 1.06 0.0 6.36 green credit card East Harlem North Central Harlem North Manhattan Manhattan False False
6429 2019-03-31 17:38:00 2019-03-31 18:34:23 1 18.74 58.0 0.00 0.0 58.80 green credit card Jamaica East Concourse/Concourse Village Queens Bronx False False
6430 2019-03-23 22:55:18 2019-03-23 23:14:25 1 4.14 16.0 0.00 0.0 17.30 green cash Crown Heights North Bushwick North Brooklyn Brooklyn False False
6431 2019-03-04 10:09:25 2019-03-04 10:14:29 1 1.12 6.0 0.00 0.0 6.80 green credit card East New York East Flatbush/Remsen Village Brooklyn Brooklyn False False
6432 2019-03-13 19:31:22 2019-03-13 19:48:02 1 3.85 15.0 3.36 0.0 20.16 green credit card Boerum Hill Windsor Terrace Brooklyn Brooklyn False False

6341 rows Ă— 16 columns

Here is a fancier but more confusing approach.

  • Notice that df[["pickup_zone", "dropoff_zone"]] is a DataFrame containing just those two columns we’re interested in. Use apply this time to apply a lambda function to the columns rather than the rows, then use any. Name the result "Airport2".

We start off getting a two-column DataFrame, that only contains the two columns of interest.

df_sub = df[["pickup_zone", "dropoff_zone"]]
df_sub
pickup_zone dropoff_zone
0 Lenox Hill West UN/Turtle Bay South
1 Upper West Side South Upper West Side South
2 Alphabet City West Village
3 Hudson Sq Yorkville West
4 Midtown East Yorkville West
... ... ...
6428 East Harlem North Central Harlem North
6429 Jamaica East Concourse/Concourse Village
6430 Crown Heights North Bushwick North
6431 East New York East Flatbush/Remsen Village
6432 Boerum Hill Windsor Terrace

6341 rows Ă— 2 columns

We now use apply again, but this time to the columns, not to the rows. This approach is nicer in some ways, because there is less repetition. Unlike in the code ("Airport" in row["pickup_zone"]) or ("Airport" in row["dropoff_zone"]), where we had the same thing repeated almost entirely twice.

df_sub.apply(lambda col: col.str.contains("Airport"), axis=0)
pickup_zone dropoff_zone
0 False False
1 False False
2 False False
3 False False
4 False False
... ... ...
6428 False False
6429 False False
6430 False False
6431 False False
6432 False False

6341 rows Ă— 2 columns

As an example of how axis works, if we apply any(axis=0), we are checking if there are any True values in the columns. This Series has length two, because there are two columns.

df_sub.apply(lambda col: col.str.contains("Airport"), axis=0).any(axis=0)
pickup_zone     True
dropoff_zone    True
dtype: bool

But what we really want to do is apply any(axis=1), to find out if any of the rows have a True in them. This produces a much longer Series, and we store it in the “Airport2” column.

df["Airport2"] = df_sub.apply(lambda col: col.str.contains("Airport"), axis=0).any(axis=1)

Notice how we now have yet another column on the far right of the DataFrame.

df
pickup dropoff passengers distance fare tip tolls total color payment pickup_zone dropoff_zone pickup_borough dropoff_borough Airport0 Airport1 Airport2
0 2019-03-23 20:21:09 2019-03-23 20:27:24 1 1.60 7.0 2.15 0.0 12.95 yellow credit card Lenox Hill West UN/Turtle Bay South Manhattan Manhattan False False False
1 2019-03-04 16:11:55 2019-03-04 16:19:00 1 0.79 5.0 0.00 0.0 9.30 yellow cash Upper West Side South Upper West Side South Manhattan Manhattan False False False
2 2019-03-27 17:53:01 2019-03-27 18:00:25 1 1.37 7.5 2.36 0.0 14.16 yellow credit card Alphabet City West Village Manhattan Manhattan False False False
3 2019-03-10 01:23:59 2019-03-10 01:49:51 1 7.70 27.0 6.15 0.0 36.95 yellow credit card Hudson Sq Yorkville West Manhattan Manhattan False False False
4 2019-03-30 13:27:42 2019-03-30 13:37:14 3 2.16 9.0 1.10 0.0 13.40 yellow credit card Midtown East Yorkville West Manhattan Manhattan False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6428 2019-03-31 09:51:53 2019-03-31 09:55:27 1 0.75 4.5 1.06 0.0 6.36 green credit card East Harlem North Central Harlem North Manhattan Manhattan False False False
6429 2019-03-31 17:38:00 2019-03-31 18:34:23 1 18.74 58.0 0.00 0.0 58.80 green credit card Jamaica East Concourse/Concourse Village Queens Bronx False False False
6430 2019-03-23 22:55:18 2019-03-23 23:14:25 1 4.14 16.0 0.00 0.0 17.30 green cash Crown Heights North Bushwick North Brooklyn Brooklyn False False False
6431 2019-03-04 10:09:25 2019-03-04 10:14:29 1 1.12 6.0 0.00 0.0 6.80 green credit card East New York East Flatbush/Remsen Village Brooklyn Brooklyn False False False
6432 2019-03-13 19:31:22 2019-03-13 19:48:02 1 3.85 15.0 3.36 0.0 20.16 green credit card Boerum Hill Windsor Terrace Brooklyn Brooklyn False False False

6341 rows Ă— 17 columns

  • Make a list air_cols containing the names of those three columns we just made. Use list comprehension and the string startswith method.

Before we do that, here is an example of including an if statement at the end of a list comprehension. It indicates which values we will use in our list. For example in the following, we are iterating over x = 0, 1, 2, 3, 4 (remember that the right endpoint is typically not included in Python), but we are only using the values which are strictly greater than 2. So we are putting 3**2 and 4**2 into a list. (Side comment: In Python, we do not use the caret symbol ^ for powers, instead we use **.)

[x**2 for x in range(5) if x > 2]
[9, 16]

Here is an example without the if statement. This is just a complicated way to turn df.columns into a list.

[c for c in df.columns]
['pickup',
 'dropoff',
 'passengers',
 'distance',
 'fare',
 'tip',
 'tolls',
 'total',
 'color',
 'payment',
 'pickup_zone',
 'dropoff_zone',
 'pickup_borough',
 'dropoff_borough',
 'Airport0',
 'Airport1',
 'Airport2']

Here is a warm-up. This produces a length 1 list, because only one column name satisfies the condition.

[c for c in df.columns if c == "Airport1"]
['Airport1']

Finally here is the list we want. The startswith method itself is not important for us. The important thing for us is the logic that’s being used here. We are making a list containing all the column names which start with the string "Airport".

[c for c in df.columns if c.startswith("Airport")]
['Airport0', 'Airport1', 'Airport2']

Here is almost another way to make the list. Notice how we have not done anything with i yet.

[f"Airport" for i in range(3)]
['Airport', 'Airport', 'Airport']

Here we include i, thanks to f-strings.

[f"Airport{i}" for i in range(3)]
['Airport0', 'Airport1', 'Airport2']

That is where we stopped on Wednesday.

  • Check that the values in these three columns are the same. One approach: compute row.unique() for every row in the DataFrame, and check that the length is always 1.

Practice with groupby and f-strings#

  • Make a new column named “Day” in the DataFrame, containing the day of the pickup (as a string, like "Monday").

  • Why didn’t we have to use pd.to_datetime? Evaluate df.dtypes and df.dtypes["pickup"] to see the reason.

  • For each day of the week, what was the median tip for that day? Put the results into a pandas Series using groupby.

  • For each day of the week, print out a string stating the median tip amount for that day. Use :.2f in the string formatting part so that only two decimal places get printed.

day_list = [pd.to_datetime(f"4-{i}-2023").day_name() for i in range(24,31)]
day_list
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

Practice with list comprehension#

(I doubt we’ll get this far.)

  • Using list comprehension, make a list of all the row labels for which the “pickup_zone” or “dropoff_zone” column contains the substring “Airport”.

  • Compute the same thing using pandas techniques.

  • What if we wanted the integer locations instead, rather than the labels?

Created in deepnote.com Created in Deepnote