Week 1 Friday#
Announcements#
Videos and video quizzes due now. Submit the quizzes sometime this weekend if you haven’t finished them yet.
Our first in-class quiz is Tuesday during discussion section. It is based on the material up to and including Worksheet 2 (especially Boolean indexing and the str and dt accessor attributes).
Warm-up#
One more class with the vending machines dataset. We’ll use a different dataset on Monday.
import pandas as pd
df = pd.read_csv("vend.csv")
date_series = df["TransDate"]
date_series
0 Saturday, January 1, 2022
1 Saturday, January 1, 2022
2 Saturday, January 1, 2022
3 Saturday, January 1, 2022
4 Saturday, January 1, 2022
...
6440 Wednesday, August 31, 2022
6441 Wednesday, August 31, 2022
6442 Wednesday, August 31, 2022
6443 Wednesday, August 31, 2022
6444 Wednesday, August 31, 2022
Name: TransDate, Length: 6445, dtype: object
List each transaction date that corresponds to “Monday” in the DataFrame. Don’t list the same date twice. Use the following strategies.
Use find, as on Worksheet 2. (This approach was given on the Worksheet for practice, but I think you’ll agree that the other two approaches given here feel more natural. The find approach is most useful if we care about where the substring occurs.)
It’s good to be able to recognize the following error. What do we need to add, so that there is a find attribute?
date_series.find("Monday")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [6], line 1
----> 1 date_series.find("Monday")
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/generic.py:5465, in NDFrame.__getattr__(self, name)
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5464 return self[name]
-> 5465 return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'find'
We need to add the str string accessor attribute between the pandas Series and the find method.
date_series.str.find("Monday")
0 -1
1 -1
2 -1
3 -1
4 -1
..
6440 -1
6441 -1
6442 -1
6443 -1
6444 -1
Name: TransDate, Length: 6445, dtype: int64
Recall that -1 in this case means “not found”. We are interested in the positions where “Monday” is found, so we want to check that the value is not equal to -1. This is a little indirect, and the methods below will feel more natural.
date_series.str.find("Monday") != -1
0 False
1 False
2 False
3 False
4 False
...
6440 False
6441 False
6442 False
6443 False
6444 False
Name: TransDate, Length: 6445, dtype: bool
The above is an example of a “Boolean Series”. We now perform “Boolean indexing” using that Series. This keeps all the rows in df where True occurs in the Series.
df[date_series.str.find("Monday") != -1]
Status | Device ID | Location | Machine | Product | Category | Transaction | TransDate | Type | RCoil | RPrice | RQty | MCoil | MPrice | MQty | LineTotal | TransTotal | Prcd Date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | Miss Vickie's Potato Chip - Sea Salt & Vinega | Food | 14518731524 | Monday, January 3, 2022 | Cash | 114 | 1.5 | 1 | 114 | 1.5 | 1 | 1.5 | 1.5 | 1/2/2022 |
7 | Processed | VJ300320686 | Earle Asphalt | Earle Asphalt x1371 | Miss Vickie's Potato Chip - Lime & Cracked Pe | Food | 14519162059 | Monday, January 3, 2022 | Credit | 110 | 1.5 | 1 | 110 | 1.5 | 1 | 1.5 | 1.5 | 1/3/2022 |
8 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Monster Energy Original | Carbonated | 14519670154 | Monday, January 3, 2022 | Credit | 144 | 3.0 | 1 | 144 | 3.0 | 1 | 3.0 | 3.0 | 1/3/2022 |
9 | Processed | VJ300320686 | Earle Asphalt | Earle Asphalt x1371 | Seapoint Farms Dry Roasted Edamame - Wasabi | Food | 14520315330 | Monday, January 3, 2022 | Credit | 134 | 2.5 | 1 | 134 | 2.5 | 1 | 2.5 | 2.5 | 1/3/2022 |
10 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Snapple Diet Tea - Lemon | Non Carbonated | 14520522827 | Monday, January 3, 2022 | Cash | 143 | 2.5 | 1 | 143 | 2.5 | 1 | 2.5 | 2.5 | 1/3/2022 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6389 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | Snapple Tea - Lemon | Non Carbonated | 15595167352 | Monday, August 29, 2022 | Credit | 147 | 2.5 | 1 | 147 | 2.5 | 1 | 2.5 | 6.0 | 8/29/2022 |
6390 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | Coca Cola - Zero Sugar | Carbonated | 15595167352 | Monday, August 29, 2022 | Credit | 140 | 1.5 | 1 | 140 | 1.5 | 1 | 1.5 | 6.0 | 8/29/2022 |
6391 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | KitKat - Crisp Wafers | Food | 15595167352 | Monday, August 29, 2022 | Credit | 130 | 2.0 | 1 | 130 | 2.0 | 1 | 2.0 | 6.0 | 8/29/2022 |
6392 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | KitKat - Crisp Wafers | Food | 15595370548 | Monday, August 29, 2022 | Credit | 130 | 2.0 | 1 | 130 | 2.0 | 1 | 2.0 | 2.0 | 8/29/2022 |
6393 | Processed | VJ300205292 | Brunswick Sq Mall | BSQ Mall x1364 - Zales | KitKat - Crisp Wafers | Food | 15595531294 | Monday, August 29, 2022 | Credit | 130 | 2.0 | 2 | 130 | 2.0 | 2 | 4.0 | 4.0 | 8/29/2022 |
1023 rows × 18 columns
We can also use Boolean indexing with a Series. Here we are keeping only those values in date_series where the corresponding value is True in the Boolean index. Phrased another way, the Boolean index tells us which values to keep in date_series.
date_series[date_series.str.find("Monday") != -1]
6 Monday, January 3, 2022
7 Monday, January 3, 2022
8 Monday, January 3, 2022
9 Monday, January 3, 2022
10 Monday, January 3, 2022
...
6389 Monday, August 29, 2022
6390 Monday, August 29, 2022
6391 Monday, August 29, 2022
6392 Monday, August 29, 2022
6393 Monday, August 29, 2022
Name: TransDate, Length: 1023, dtype: object
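Here is the same Boolean-indexing idea on a tiny made-up Series (the fruit strings and the names s and mask are just for illustration):

```python
import pandas as pd

s = pd.Series(["apple", "banana", "avocado"])

# Boolean Series: which strings start with "a"?
mask = s.str.startswith("a")

# Boolean indexing keeps only the positions where mask is True
print(list(s[mask]))  # ['apple', 'avocado']
```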
We want to remove repetitions. The most natural way to do this is with the unique method.
date_series[date_series.str.find("Monday") != -1].unique()
array(['Monday, January 3, 2022', 'Monday, January 10, 2022',
'Monday, January 17, 2022', 'Monday, January 24, 2022',
'Monday, January 31, 2022', 'Monday, February 7, 2022',
'Monday, February 14, 2022', 'Monday, February 21, 2022',
'Monday, February 28, 2022', 'Monday, March 7, 2022',
'Monday, March 14, 2022', 'Monday, March 21, 2022',
'Monday, March 28, 2022', 'Monday, April 4, 2022',
'Monday, April 11, 2022', 'Monday, April 18, 2022',
'Monday, April 25, 2022', 'Monday, May 2, 2022',
'Monday, May 9, 2022', 'Monday, May 16, 2022',
'Monday, May 23, 2022', 'Monday, May 30, 2022',
'Monday, June 6, 2022', 'Monday, June 13, 2022',
'Monday, June 20, 2022', 'Monday, June 27, 2022',
'Monday, July 4, 2022', 'Monday, July 11, 2022',
'Monday, July 18, 2022', 'Monday, July 25, 2022',
'Monday, August 1, 2022', 'Monday, August 8, 2022',
'Monday, August 15, 2022', 'Monday, August 22, 2022',
'Monday, August 29, 2022'], dtype=object)
Just as an example, here we convert the above array into a Python list.
list(date_series[date_series.str.find("Monday") != -1].unique())
['Monday, January 3, 2022',
'Monday, January 10, 2022',
'Monday, January 17, 2022',
'Monday, January 24, 2022',
'Monday, January 31, 2022',
'Monday, February 7, 2022',
'Monday, February 14, 2022',
'Monday, February 21, 2022',
'Monday, February 28, 2022',
'Monday, March 7, 2022',
'Monday, March 14, 2022',
'Monday, March 21, 2022',
'Monday, March 28, 2022',
'Monday, April 4, 2022',
'Monday, April 11, 2022',
'Monday, April 18, 2022',
'Monday, April 25, 2022',
'Monday, May 2, 2022',
'Monday, May 9, 2022',
'Monday, May 16, 2022',
'Monday, May 23, 2022',
'Monday, May 30, 2022',
'Monday, June 6, 2022',
'Monday, June 13, 2022',
'Monday, June 20, 2022',
'Monday, June 27, 2022',
'Monday, July 4, 2022',
'Monday, July 11, 2022',
'Monday, July 18, 2022',
'Monday, July 25, 2022',
'Monday, August 1, 2022',
'Monday, August 8, 2022',
'Monday, August 15, 2022',
'Monday, August 22, 2022',
'Monday, August 29, 2022']
An alternative approach would be to convert the original Series (with the repetitions) into a set, because sets do not have repetitions. (Two important things to memorize about sets are that they do not have repetitions and that they are not ordered.)
# sets do not have repetitions, so don't need unique in this part
set(date_series[date_series.str.find("Monday") != -1])
{'Monday, April 11, 2022',
'Monday, April 18, 2022',
'Monday, April 25, 2022',
'Monday, April 4, 2022',
'Monday, August 1, 2022',
'Monday, August 15, 2022',
'Monday, August 22, 2022',
'Monday, August 29, 2022',
'Monday, August 8, 2022',
'Monday, February 14, 2022',
'Monday, February 21, 2022',
'Monday, February 28, 2022',
'Monday, February 7, 2022',
'Monday, January 10, 2022',
'Monday, January 17, 2022',
'Monday, January 24, 2022',
'Monday, January 3, 2022',
'Monday, January 31, 2022',
'Monday, July 11, 2022',
'Monday, July 18, 2022',
'Monday, July 25, 2022',
'Monday, July 4, 2022',
'Monday, June 13, 2022',
'Monday, June 20, 2022',
'Monday, June 27, 2022',
'Monday, June 6, 2022',
'Monday, March 14, 2022',
'Monday, March 21, 2022',
'Monday, March 28, 2022',
'Monday, March 7, 2022',
'Monday, May 16, 2022',
'Monday, May 2, 2022',
'Monday, May 23, 2022',
'Monday, May 30, 2022',
'Monday, May 9, 2022'}
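A third option, not used above: Series have a drop_duplicates method, which removes repetitions while keeping the result as a pandas Series (unlike unique, which returns a NumPy array). A sketch on a made-up Series:

```python
import pandas as pd

# Hypothetical miniature stand-in for date_series
s = pd.Series(["Mon", "Tue", "Mon", "Wed", "Tue"])

# drop_duplicates keeps the first occurrence of each value,
# preserving the original order
deduped = s.drop_duplicates()
print(list(deduped))  # ['Mon', 'Tue', 'Wed']
```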
Use contains.
Here we will see what I think is a more natural approach. This contains method is special, because it is not a method defined on strings themselves.
dir('Monday, May 16, 2022')
['__add__',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'capitalize',
'casefold',
'center',
'count',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'format_map',
'index',
'isalnum',
'isalpha',
'isascii',
'isdecimal',
'isdigit',
'isidentifier',
'islower',
'isnumeric',
'isprintable',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'maketrans',
'partition',
'removeprefix',
'removesuffix',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
Here we verify that "contains" is not in the above list of string attributes and methods.
"contains" in dir('Monday, May 16, 2022')
False
On the other hand, "contains" is a method available on date_series.str.
"contains" in dir(date_series.str)
True
Aside: The main reason contains is not defined on strings is that you don’t need that method; you can use the in operator instead. The following says that "stop" appears as a substring in "Christopher".
"stop" in "Christopher"
True
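On a made-up miniature Series, we can check that str.contains agrees with applying the in operator to each string one at a time (a sketch; the names by_in and by_contains are just for illustration):

```python
import pandas as pd

s = pd.Series(["Monday, May 2, 2022", "Tuesday, May 3, 2022"])

# Elementwise version using the `in` operator on each string
by_in = [("Monday" in x) for x in s]

# Vectorized pandas version using the str accessor
by_contains = s.str.contains("Monday")

print(by_in)              # [True, False]
print(list(by_contains))  # [True, False]
```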
Here is a reminder of what date_series looks like.
date_series
0 Saturday, January 1, 2022
1 Saturday, January 1, 2022
2 Saturday, January 1, 2022
3 Saturday, January 1, 2022
4 Saturday, January 1, 2022
...
6440 Wednesday, August 31, 2022
6441 Wednesday, August 31, 2022
6442 Wednesday, August 31, 2022
6443 Wednesday, August 31, 2022
6444 Wednesday, August 31, 2022
Name: TransDate, Length: 6445, dtype: object
We can get a Boolean Series corresponding to which of these strings contain the word "January". (We’ll switch to "Monday" below.)
date_series.str.contains("January")
0 True
1 True
2 True
3 True
4 True
...
6440 False
6441 False
6442 False
6443 False
6444 False
Name: TransDate, Length: 6445, dtype: bool
We can count the number of strings containing “January” by adding up all these True and False terms. This sum will be the number of Trues that occur, which is the same as the number of strings containing the word “January”.
date_series.str.contains("January").sum()
482
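The reason summing works is that Python treats True as 1 and False as 0 in arithmetic:

```python
# True counts as 1 and False counts as 0,
# which is why summing a Boolean Series counts the Trues
print(True + True + False)       # 2
print(sum([True, False, True]))  # 2
```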
Here we make two changes. First, we switch from “January” to “Monday”. Second, we use Boolean indexing to get only the days in date_series which correspond to Monday.
date_series[date_series.str.contains("Monday")]
6 Monday, January 3, 2022
7 Monday, January 3, 2022
8 Monday, January 3, 2022
9 Monday, January 3, 2022
10 Monday, January 3, 2022
...
6389 Monday, August 29, 2022
6390 Monday, August 29, 2022
6391 Monday, August 29, 2022
6392 Monday, August 29, 2022
6393 Monday, August 29, 2022
Name: TransDate, Length: 1023, dtype: object
We can again get the unique values by calling the unique method. I recommend comparing this code to the above code using find. They are pretty similar in terms of length, but this code is easier to read, because the contains method immediately tells us True or False, unlike the find method, which tells us an index.
date_series[date_series.str.contains("Monday")].unique()
array(['Monday, January 3, 2022', 'Monday, January 10, 2022',
'Monday, January 17, 2022', 'Monday, January 24, 2022',
'Monday, January 31, 2022', 'Monday, February 7, 2022',
'Monday, February 14, 2022', 'Monday, February 21, 2022',
'Monday, February 28, 2022', 'Monday, March 7, 2022',
'Monday, March 14, 2022', 'Monday, March 21, 2022',
'Monday, March 28, 2022', 'Monday, April 4, 2022',
'Monday, April 11, 2022', 'Monday, April 18, 2022',
'Monday, April 25, 2022', 'Monday, May 2, 2022',
'Monday, May 9, 2022', 'Monday, May 16, 2022',
'Monday, May 23, 2022', 'Monday, May 30, 2022',
'Monday, June 6, 2022', 'Monday, June 13, 2022',
'Monday, June 20, 2022', 'Monday, June 27, 2022',
'Monday, July 4, 2022', 'Monday, July 11, 2022',
'Monday, July 18, 2022', 'Monday, July 25, 2022',
'Monday, August 1, 2022', 'Monday, August 8, 2022',
'Monday, August 15, 2022', 'Monday, August 22, 2022',
'Monday, August 29, 2022'], dtype=object)
Use day_name().
Here is another approach to the same problem, this time using datetime values.
Here is a common mistake. The dt attribute cannot be called on date_series, which is a pandas Series of strings.
date_series.dt.day_name()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [25], line 1
----> 1 date_series.dt.day_name()
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/generic.py:5461, in NDFrame.__getattr__(self, name)
5454 # Note: obj.x will always call obj.__getattribute__('x') prior to
5455 # calling obj.__getattr__('x').
5456 if (
5457 name in self._internal_names_set
5458 or name in self._metadata
5459 or name in self._accessors
5460 ):
-> 5461 return object.__getattribute__(self, name)
5462 else:
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/accessor.py:180, in CachedAccessor.__get__(self, obj, cls)
177 if obj is None:
178 # we're accessing the attribute of the class, i.e., Dataset.geo
179 return self._accessor
--> 180 accessor_obj = self._accessor(obj)
181 # Replace the property with the accessor object. Inspired by:
182 # https://www.pydanny.com/cached-property.html
183 # We need to use object.__setattr__ because we overwrite __setattr__ on
184 # NDFrame
185 object.__setattr__(obj, self._name, accessor_obj)
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/indexes/accessors.py:494, in CombinedDatetimelikeProperties.__new__(cls, data)
491 elif is_period_dtype(data.dtype):
492 return PeriodProperties(data, orig)
--> 494 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
Instead, we need to first convert these strings to Timestamps, using pd.to_datetime.
pd.to_datetime(date_series).dt.day_name()
0 Saturday
1 Saturday
2 Saturday
3 Saturday
4 Saturday
...
6440 Wednesday
6441 Wednesday
6442 Wednesday
6443 Wednesday
6444 Wednesday
Name: TransDate, Length: 6445, dtype: object
We now produce a Boolean Series indicating which days are Monday.
# Boolean Series
pd.to_datetime(date_series).dt.day_name() == "Monday"
0 False
1 False
2 False
3 False
4 False
...
6440 False
6441 False
6442 False
6443 False
6444 False
Name: TransDate, Length: 6445, dtype: bool
We can now use Boolean indexing to get only the Mondays out of date_series.
date_series[pd.to_datetime(date_series).dt.day_name() == "Monday"]
6 Monday, January 3, 2022
7 Monday, January 3, 2022
8 Monday, January 3, 2022
9 Monday, January 3, 2022
10 Monday, January 3, 2022
...
6389 Monday, August 29, 2022
6390 Monday, August 29, 2022
6391 Monday, August 29, 2022
6392 Monday, August 29, 2022
6393 Monday, August 29, 2022
Name: TransDate, Length: 1023, dtype: object
We can again use the unique method to get rid of the repetitions.
date_series[pd.to_datetime(date_series).dt.day_name() == "Monday"].unique()
array(['Monday, January 3, 2022', 'Monday, January 10, 2022',
'Monday, January 17, 2022', 'Monday, January 24, 2022',
'Monday, January 31, 2022', 'Monday, February 7, 2022',
'Monday, February 14, 2022', 'Monday, February 21, 2022',
'Monday, February 28, 2022', 'Monday, March 7, 2022',
'Monday, March 14, 2022', 'Monday, March 21, 2022',
'Monday, March 28, 2022', 'Monday, April 4, 2022',
'Monday, April 11, 2022', 'Monday, April 18, 2022',
'Monday, April 25, 2022', 'Monday, May 2, 2022',
'Monday, May 9, 2022', 'Monday, May 16, 2022',
'Monday, May 23, 2022', 'Monday, May 30, 2022',
'Monday, June 6, 2022', 'Monday, June 13, 2022',
'Monday, June 20, 2022', 'Monday, June 27, 2022',
'Monday, July 4, 2022', 'Monday, July 11, 2022',
'Monday, July 18, 2022', 'Monday, July 25, 2022',
'Monday, August 1, 2022', 'Monday, August 8, 2022',
'Monday, August 15, 2022', 'Monday, August 22, 2022',
'Monday, August 29, 2022'], dtype=object)
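Aside: pd.to_datetime can also be given an explicit format string, which avoids having pandas guess the format and is typically faster on large Series. The format string below is an assumption, written to match strings like "Saturday, January 1, 2022" using strptime codes:

```python
import pandas as pd

s = pd.Series(["Saturday, January 1, 2022", "Monday, January 3, 2022"])

# %A full weekday name, %B full month name, %d day, %Y four-digit year
ts = pd.to_datetime(s, format="%A, %B %d, %Y")
print(list(ts.dt.day_name()))  # ['Saturday', 'Monday']
```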
Which location had the lowest average sale price?#
It’s surprising how rarely in Math 10 we will write functions using def and how rarely we will use for loops, but these are essential parts of any programming language, and it is important to practice with them.
Write a function ave_sale which takes as input a location (like the string “GuttenPlans”) and returns as output the average sale price (from the “RPrice” column) for transactions at that location in the vending machines dataset.
We start out making this computation for a particular location.
s = "GuttenPlans"
Like above, here we are making a Boolean Series. You could imagine that df["Location"] == s would just return False, since the Series on the left is not equal to the string on the right, but instead pandas uses what is called broadcasting to compare each individual value to s. That is how we end up with the Boolean Series displayed below.
df["Location"] == s
0 False
1 False
2 False
3 False
4 False
...
6440 False
6441 False
6442 False
6443 True
6444 False
Name: Location, Length: 6445, dtype: bool
Again, we can use Boolean indexing with the above Boolean Series, keeping the rows corresponding to True. Notice how every location listed in this DataFrame is "GuttenPlans".
df[df["Location"] == s]
Status | Device ID | Location | Machine | Product | Category | Transaction | TransDate | Type | RCoil | RPrice | RQty | MCoil | MPrice | MQty | LineTotal | TransTotal | Prcd Date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Monster Energy Original | Carbonated | 14519670154 | Monday, January 3, 2022 | Credit | 144 | 3.0 | 1 | 144 | 3.0 | 1 | 3.0 | 3.0 | 1/3/2022 |
10 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Snapple Diet Tea - Lemon | Non Carbonated | 14520522827 | Monday, January 3, 2022 | Cash | 143 | 2.5 | 1 | 143 | 2.5 | 1 | 2.5 | 2.5 | 1/3/2022 |
11 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Skinny Pop Popcorn | Food | 14520523909 | Monday, January 3, 2022 | Cash | 111 | 1.5 | 1 | 111 | 1.5 | 1 | 1.5 | 1.5 | 1/3/2022 |
12 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Stretch Island Fruit Leathers Snacks - Variet | Food | 14520526471 | Monday, January 3, 2022 | Cash | 131 | 1.0 | 1 | 131 | 1.0 | 1 | 1.0 | 1.0 | 1/3/2022 |
14 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Skinny Pop Popcorn | Food | 14520565434 | Monday, January 3, 2022 | Cash | 111 | 1.5 | 1 | 111 | 1.5 | 1 | 1.5 | 1.5 | 1/3/2022 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6418 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Fritos - Original | Food | 15598584893 | Tuesday, August 30, 2022 | Cash | 125 | 1.5 | 1 | 125 | 1.5 | 1 | 1.5 | 1.5 | 8/30/2022 |
6419 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Doritos Famin Hot Nacho | Food | 15598991940 | Tuesday, August 30, 2022 | Cash | 112 | 1.5 | 1 | 112 | 1.5 | 1 | 1.5 | 1.5 | 8/30/2022 |
6432 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Cheetos - Fleming Hot Crunchy | Food | 15600267402 | Tuesday, August 30, 2022 | Cash | 122 | 1.5 | 1 | 122 | 1.5 | 1 | 1.5 | 1.5 | 8/30/2022 |
6434 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Fritos - Original | Food | 15600761938 | Tuesday, August 30, 2022 | Cash | 125 | 1.5 | 1 | 125 | 1.5 | 1 | 1.5 | 1.5 | 8/30/2022 |
6443 | Processed | VJ300320609 | GuttenPlans | GuttenPlans x1367 | Snapple Tea - Lemon | Non Carbonated | 15603853105 | Wednesday, August 31, 2022 | Credit | 145 | 2.5 | 1 | 145 | 2.5 | 1 | 2.5 | 2.5 | 8/31/2022 |
2568 rows × 18 columns
From the DataFrame displayed above, we can now get the "RPrice" column.
df[df["Location"] == s]["RPrice"]
8 3.0
10 2.5
11 1.5
12 1.0
14 1.5
...
6418 1.5
6419 1.5
6432 1.5
6434 1.5
6443 2.5
Name: RPrice, Length: 2568, dtype: float64
An equally good alternative is to start with the "RPrice" column and then apply Boolean indexing to that. This is the approach taken in the next cell. It should create the exact same Series.
df["RPrice"][df["Location"] == s]
8 3.0
10 2.5
11 1.5
12 1.0
14 1.5
...
6418 1.5
6419 1.5
6432 1.5
6434 1.5
6443 2.5
Name: RPrice, Length: 2568, dtype: float64
Once we have this Series, we can compute its average using the mean method.
df["RPrice"][df["Location"] == s].mean()
1.8850272585669783
If you didn’t know that the mean method existed, you could instead use the sum method and then divide by the length. Make sure you are dividing by the length of this “GuttenPlans” Series, and not the length of the original DataFrame or the original full column df["RPrice"].
sub_series = df["RPrice"][df["Location"] == s]
sub_series.sum()/len(sub_series)
1.8850272585669783
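On a tiny made-up Series, we can see that mean() and sum()/len() agree:

```python
import pandas as pd

# Hypothetical prices standing in for the RPrice sub-Series
prices = pd.Series([3.0, 2.5, 1.5, 1.0])

# The two computations give the same average
print(prices.mean())               # 2.0
print(prices.sum() / len(prices))  # 2.0
```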
Now that we have seen how to get the desired answer for one particular location, it is easy to turn this into a general function. (Be sure to practice writing this function syntax on your own. It will be very difficult to remember unless you try writing it on your own, without looking at a sample.) In this case, we are naming the input string loc, so we replace s in the above formula by loc. The variable could just as well have been called something like x instead of loc.
def ave_sale(loc):
return df["RPrice"][df["Location"] == loc].mean()
We get the same answer as above, which is good, but we should be careful to also test the function on other values.
ave_sale("GuttenPlans")
1.8850272585669783
The fact that the function also works on another location, and gives us a distinct answer, is a good sign.
ave_sale("Brunswick Sq Mall")
1.9276778733385458
Define the same function, this time using a lambda function.
Notice how our answer above did not involve any intermediate computations; the whole formula fit on a single line.
# Full definition syntax
def ave_sale(loc):
return df["RPrice"][df["Location"] == loc].mean()
For that kind of short function definition, it is often more elegant to use what is called a lambda function. In this case, the term lambda tells Python that a function is being defined. (This is like the @ syntax in Matlab, for defining anonymous functions.) The part that comes before the colon lists the zero or more input variables, and the part that comes after the colon is the returned value. Notice how in the def notation we need to use the return keyword, but in the lambda function syntax we do not explicitly write return.
# lambda function definition
ave_sale2 = lambda loc: df["RPrice"][df["Location"] == loc].mean()
Let’s check that this ave_sale2 gives the same answer as above.
ave_sale2("Brunswick Sq Mall")
1.9276778733385458
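As an aside, pandas can also answer this section’s question directly, without writing our own function: groupby computes the average per location, and idxmin picks out the location with the smallest average. Here is a sketch on a made-up miniature DataFrame (the names mini and averages are just for illustration):

```python
import pandas as pd

# Hypothetical miniature version of the vending-machine data
mini = pd.DataFrame({
    "Location": ["A", "A", "B", "B", "B"],
    "RPrice":   [3.0, 2.0, 1.5, 1.5, 1.5],
})

# Average price per location, as a Series indexed by location
averages = mini.groupby("Location")["RPrice"].mean()

# idxmin returns the index label (location) of the smallest value
print(averages.idxmin())  # B
```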
For each location, display the name of that location together with the average price. Use a for loop.
Here we are going to iterate through each unique location.
df["Location"].unique()
array(['Brunswick Sq Mall', 'Earle Asphalt', 'GuttenPlans',
'EB Public Library'], dtype=object)
Aside: I wasn’t sure if this was a NumPy array (more common) or a pandas array (less common). It turns out that this output of the unique method is a NumPy array.
type(df["Location"].unique())
numpy.ndarray
The fact that it is a NumPy array is not that important. The important thing is that we can iterate through these values. Here we just print them out.
for loc in df["Location"].unique():
print(loc)
Brunswick Sq Mall
Earle Asphalt
GuttenPlans
EB Public Library
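Putting the pieces together, here is a self-contained sketch of the loop the exercise asks for, using a made-up miniature DataFrame in place of the real data from vend.csv:

```python
import pandas as pd

# Hypothetical stand-in for the vending-machine DataFrame
df = pd.DataFrame({
    "Location": ["Mall", "Mall", "Library"],
    "RPrice":   [2.0, 3.0, 1.5],
})

def ave_sale(loc):
    # Average RPrice among the rows for this location
    return df["RPrice"][df["Location"] == loc].mean()

# Print each unique location together with its average price
for loc in df["Location"].unique():
    print(loc, ave_sale(loc))
```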
If we tried the same thing without calling unique, we would get over 6000 locations displayed, because repetitions are not being removed.
for loc in df["Location"]:
print(loc)
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
... (output truncated; thousands of additional location names are printed)
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
GuttenPlans
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
GuttenPlans
GuttenPlans
EB Public Library
GuttenPlans
Brunswick Sq Mall
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
Earle Asphalt
Earle Asphalt
EB Public Library
EB Public Library
Earle Asphalt
EB Public Library
EB Public Library
Brunswick Sq Mall
Earle Asphalt
Earle Asphalt
Brunswick Sq Mall
EB Public Library
Earle Asphalt
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
GuttenPlans
GuttenPlans
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
GuttenPlans
GuttenPlans
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
Brunswick Sq Mall
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
Earle Asphalt
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
EB Public Library
GuttenPlans
Earle Asphalt
Earle Asphalt
Earle Asphalt
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
GuttenPlans
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
Earle Asphalt
Earle Asphalt
Earle Asphalt
EB Public Library
GuttenPlans
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
Earle Asphalt
Earle Asphalt
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
EB Public Library
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
Earle Asphalt
Earle Asphalt
GuttenPlans
Earle Asphalt
EB Public Library
EB Public Library
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
Earle Asphalt
EB Public Library
Brunswick Sq Mall
EB Public Library
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
GuttenPlans
EB Public Library
GuttenPlans
Earle Asphalt
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Brunswick Sq Mall
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
EB Public Library
EB Public Library
Brunswick Sq Mall
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
Earle Asphalt
Earle Asphalt
EB Public Library
Brunswick Sq Mall
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
GuttenPlans
GuttenPlans
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
GuttenPlans
GuttenPlans
Earle Asphalt
EB Public Library
Earle Asphalt
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
Brunswick Sq Mall
Brunswick Sq Mall
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
GuttenPlans
GuttenPlans
GuttenPlans
GuttenPlans
Earle Asphalt
EB Public Library
EB Public Library
Brunswick Sq Mall
EB Public Library
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
EB Public Library
EB Public Library
Brunswick Sq Mall
GuttenPlans
Brunswick Sq Mall
GuttenPlans
Earle Asphalt
Earle Asphalt
Earle Asphalt
Earle Asphalt
Earle Asphalt
EB Public Library
EB Public Library
EB Public Library
GuttenPlans
EB Public Library
There are more elegant ways to do this, but for now, we will print the location and then the corresponding average sale.
for loc in df["Location"].unique():
print(loc)
print(ave_sale(loc))
Brunswick Sq Mall
1.9276778733385458
Earle Asphalt
1.7566568047337279
GuttenPlans
1.8850272585669783
EB Public Library
2.0418834547346516
Do the same thing, this time also using f-strings to display the information in a more readable format.
Postponed. We’ll see later, maybe in Week 2, a more elegant way to display the same information.
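For reference, the f-string version might look like the sketch below. To keep the snippet self-contained, it uses a small hard-coded dictionary standing in for the `ave_sale(loc)` results; in the notebook you would loop over `df["Location"].unique()` and call `ave_sale` instead.

```python
# Hypothetical averages standing in for ave_sale(loc) results
averages = {
    "GuttenPlans": 1.8850272585669783,
    "EB Public Library": 2.0418834547346516,
}

for loc, avg in averages.items():
    # The :.2f format specifier rounds to two decimal places
    print(f"The average sale at {loc} was ${avg:.2f}.")
```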
Put the same information into a dictionary, where the keys are the locations and where the values are the average sale prices.
We start by making an empty dictionary. (It’s not obvious that {}
should make an empty dictionary rather than an empty set, but it does.)
d = {}
Here we verify that d
really is a dictionary.
type(d)
dict
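As a quick check of this ambiguity: `{}` always produces a dictionary, and the only way to write an empty set is with the `set()` constructor.

```python
# The empty braces literal is a dict, not a set
print(type({}))     # <class 'dict'>

# An empty set has no literal form; use the constructor
print(type(set()))  # <class 'set'>
```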
Even though we have d
defined above, it’s a good idea to put that definition into the same cell as our for loop, so that if we make a mistake, we can reset d
just by evaluating this single cell. Recall the syntax for setting a value in a Python dictionary: d[key] = value
.
d = {}
for loc in df["Location"].unique():
d[loc] = ave_sale(loc)
Here are the contents of d
.
d
{'Brunswick Sq Mall': 1.9276778733385458,
'Earle Asphalt': 1.7566568047337279,
'GuttenPlans': 1.8850272585669783,
'EB Public Library': 2.0418834547346516}
A dictionary is a very convenient way to store this data, because we can access the values by indexing with a key. For example, d["GuttenPlans"]
is equal to the average sale price at the GuttenPlans location.
d["GuttenPlans"]
1.8850272585669783
Which location had the lowest average sale price? Answer this by converting the dictionary to a pandas Series, then sorting the values and taking the zeroth element of the index.
Here is a reminder of what d
looks like.
d
{'Brunswick Sq Mall': 1.9276778733385458,
'Earle Asphalt': 1.7566568047337279,
'GuttenPlans': 1.8850272585669783,
'EB Public Library': 2.0418834547346516}
It really is a dictionary.
type(d)
dict
Here is a first attempt to convert it to a pandas Series. Python does not know where to look for the definition of Series
, because it is not a type defined in base Python; we need to access it through the pandas module.
Series(d)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In [57], line 1
----> 1 Series(d)
NameError: name 'Series' is not defined
Here again an error is raised, because attribute names in Python are case-sensitive; the correct name is pd.Series
, not pd.series.
pd.series(d)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [58], line 1
----> 1 pd.series(d)
File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/__init__.py:244, in __getattr__(name)
240 from pandas.core.arrays.sparse import SparseArray as _SparseArray
242 return _SparseArray
--> 244 raise AttributeError(f"module 'pandas' has no attribute '{name}'")
AttributeError: module 'pandas' has no attribute 'series'
This works. Notice how the dictionary d
has been converted into a pandas Series.
pd.Series(d)
Brunswick Sq Mall 1.927678
Earle Asphalt 1.756657
GuttenPlans 1.885027
EB Public Library 2.041883
dtype: float64
If we want to find the location with the lowest average sale price, we can first sort the Series according to the values. (By default, sorting is done in increasing order.)
pd.Series(d).sort_values()
Earle Asphalt 1.756657
GuttenPlans 1.885027
Brunswick Sq Mall 1.927678
EB Public Library 2.041883
dtype: float64
We want the zeroth element in the corresponding index. Here is the index.
pd.Series(d).sort_values().index
Index(['Earle Asphalt', 'GuttenPlans', 'Brunswick Sq Mall',
'EB Public Library'],
dtype='object')
Here is the initial element in that index. (Without the sort_values
call, this would just return the first key in the original dictionary order, not the location with the lowest average sale price.)
# lowest average sale price
pd.Series(d).sort_values().index[0]
'Earle Asphalt'
If instead you wanted the initial value, rather than the initial key, you should use iloc
.
pd.Series(d).sort_values().iloc[0]
1.7566568047337279
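As an aside, pandas also offers a more direct route to the same answers: Series.idxmin returns the index label of the smallest value, and Series.min returns the value itself, with no sorting required. A minimal sketch using the same dictionary:

```python
import pandas as pd

# The averages computed above, hard-coded so the snippet is self-contained
d = {
    "Brunswick Sq Mall": 1.9276778733385458,
    "Earle Asphalt": 1.7566568047337279,
    "GuttenPlans": 1.8850272585669783,
    "EB Public Library": 2.0418834547346516,
}

s = pd.Series(d)
print(s.idxmin())  # label of the smallest value: 'Earle Asphalt'
print(s.min())     # the smallest value itself
```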