Worksheet 3
This worksheet is due Monday night of Week 3. You are encouraged to work in groups of up to 3 total students, but each student should submit their own file. (It’s fine for everyone in the group to upload the same file.)
These questions refer to the attached vending machines CSV file, vend.csv.
Load the file as a pandas DataFrame using pd.read_csv and store it as the variable df. (You will need to import pandas first.)
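As a rough illustration of how pd.read_csv behaves, here is a minimal sketch that reads CSV text from memory instead of from a file. The contents of csv_text and the name toy are made up for this example and are not taken from vend.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real file such as vend.csv.
csv_text = """Location,Product,RPrice
Site A,Chips,1.5
Site B,Soda,2.0
Site A,Candy,1.0
"""

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy)
```

With a real file, you would pass the file name as a string in place of the io.StringIO object.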
How many rows are there in this DataFrame? How many columns? Use the shape attribute. (When we refer to something as an attribute, it usually means we will not be using parentheses with it. Methods are like functions and attributes are like variables. Both methods and attributes are attached to an object and are accessed using a period.)
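The difference between an attribute and a method can be seen on a small made-up DataFrame. This is only a sketch: the name toy and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical data, not from vend.csv.
toy = pd.DataFrame({"Product": ["Chips", "Soda", "Candy"], "RQty": [1, 2, 1]})

n_rows, n_cols = toy.shape   # attribute: no parentheses
first_two = toy.head(2)      # method: called with parentheses
print(n_rows, n_cols)
```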
Using the dtypes attribute of this DataFrame, check how the data type of the “Location” column is represented. Among all the columns, what different data types are listed?
Access the row at integer location 2420 using iloc and square brackets. Store this in the variable x.
Using the Python built-in function type, what is the data type of x?
What is the value of x.loc["Location"]? Is there any difference if you use x["Location"]? What about x("Location")?
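If you want to compare the three kinds of access side by side, one possible approach is sketched below on made-up data. The names toy and row are hypothetical. The try/except is only there so that the cell keeps running if one of the three forms raises an error.

```python
import pandas as pd

# Hypothetical data, not from vend.csv.
toy = pd.DataFrame({"Location": ["Site A", "Site B", "Site A"], "RQty": [1, 2, 1]})

row = toy.iloc[1]                # one row, selected by integer position
via_loc = row.loc["Location"]    # label-based access with loc
via_brackets = row["Location"]   # square brackets directly on the row

# Parentheses mean "call this object like a function".
try:
    row("Location")
    call_failed = False
except TypeError:
    call_failed = True

print(via_loc, via_brackets, call_failed)
```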
What is the type of x.loc["Location"]? (Notice how this type was not directly reported to us by pandas when we used the dtypes attribute. When something is reported as having “object” as its dtype, I usually assume it is a string, but it could also be something else, like a list.)
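To see how an “object” dtype can hide something other than a string, here is a small sketch with a made-up column whose entries are Python lists. The column name "Tags" is an assumption for this example and does not appear in vend.csv.

```python
import pandas as pd

# Hypothetical data: "Tags" is an invented column holding lists.
toy = pd.DataFrame({
    "Tags": [["salty"], ["sweet", "cold"]],
    "RQty": [1, 2],
})

print(toy.dtypes)                 # "Tags" is reported as object
print(type(toy.loc[0, "Tags"]))   # the element itself is a list
```

The dtype describes how pandas stores the column, while type applied to a single entry tells you what that entry actually is.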
Using Boolean indexing, define df_sub to be the sub-DataFrame containing all the transactions from this same location.
How many rows in the original DataFrame correspond to this location? Set the variable a to be equal to this integer. (Check. It should be between 600 and 700.)
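As a reminder of how Boolean indexing works, here is a minimal sketch on made-up data. The names toy, mask, and toy_sub and the location values are hypothetical.

```python
import pandas as pd

# Hypothetical data, not from vend.csv.
toy = pd.DataFrame({
    "Location": ["Site A", "Site B", "Site A", "Site C"],
    "RQty": [1, 2, 1, 3],
})

mask = toy["Location"] == "Site A"   # Boolean Series, one entry per row
toy_sub = toy[mask]                  # keeps only the rows where mask is True

print(len(toy_sub))   # number of matching rows
print(mask.sum())     # same count, since True counts as 1
```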
What values of b and c are such that df_sub.loc[13, "Transaction"] is equal to df_sub.iloc[b,c]? (Remember that counting in Python starts at 0. I don’t intend you to have a computer code way of finding these values. Just look at df_sub and check.) Store these values.
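The point of this question is that loc uses labels while iloc uses integer positions, and the two stop agreeing after rows are filtered out. The sketch below shows this on made-up data. The get_loc calls at the end are an optional way to check an answer found by eye, not a required part of the exercise.

```python
import pandas as pd

# Hypothetical data, not from vend.csv.
toy = pd.DataFrame({
    "Location": ["Site A", "Site B", "Site A", "Site C"],
    "Product": ["Chips", "Soda", "Candy", "Gum"],
})
toy_sub = toy[toy["Location"] == "Site A"]   # keeps index labels 0 and 2

by_label = toy_sub.loc[2, "Product"]   # row labeled 2, column named "Product"
by_position = toy_sub.iloc[1, 1]       # second row, second column

# Optional check: translate labels into integer positions.
row_pos = toy_sub.index.get_loc(2)
col_pos = toy_sub.columns.get_loc("Product")
print(by_label, by_position, row_pos, col_pos)
```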
There was exactly one transaction in df_sub where the “RPrice” was 1.5 and where “RQty” was 2 (meaning two items were sold in the same transaction). What was the name of that product (i.e., the value in the “Product” column)? Store that string with the variable d. (Be sure your answer is exact, including spacing and capitalization.)
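One way to require two conditions at once is to combine two Boolean Series with &. The following is a sketch on made-up data, with hypothetical names toy, both, match, and name.

```python
import pandas as pd

# Hypothetical data, not from vend.csv.
toy = pd.DataFrame({
    "Product": ["Chips", "Soda", "Candy", "Gum"],
    "RPrice": [1.5, 2.0, 1.5, 1.0],
    "RQty": [1, 2, 2, 2],
})

# Each comparison needs its own parentheses because & binds more tightly than ==.
both = (toy["RPrice"] == 1.5) & (toy["RQty"] == 2)
match = toy[both]
name = match["Product"].iloc[0]   # first (here, only) matching product
print(name)
```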
There is exactly one row in df where the "RPrice" is not equal to the "MPrice". What is the index of that row? Set e to be equal to that index. (The index is the number that’s displayed all the way on the left. You can access the index by using the index attribute. To check whether two elements are not equal, you can use !=. Another option is to check for equality and then to negate it using tilde ~.)
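Both options mentioned here, != and a negated equality check, give the same Boolean Series. A minimal sketch on made-up data follows. The index labels 10, 11, 12 are invented to make the labels visibly different from positions.

```python
import pandas as pd

# Hypothetical data and index labels, not from vend.csv.
toy = pd.DataFrame(
    {"RPrice": [1.5, 2.0, 1.0], "MPrice": [1.5, 2.5, 1.0]},
    index=[10, 11, 12],
)

differs = toy["RPrice"] != toy["MPrice"]
also_differs = ~(toy["RPrice"] == toy["MPrice"])   # negated equality

labels = toy[differs].index   # an Index object holding the matching labels
first_label = labels[0]       # pull out the single label
print(first_label)
```

Note that the index attribute returns an Index object, so you need one more step, such as [0], to get the label itself.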
Put these five values (four integers and one string) into a tuple, my_tuple = (a,b,c,d,e).
Save my_tuple in a pickle file named "wkst3-ans.pickle" using the following code. Submit that file on Canvas as your submission for Worksheet 3.
import pickle
with open("wkst3-ans.pickle", 'wb') as f:
    pickle.dump(my_tuple, f)
If you want to double-check that this "wkst3-ans.pickle" pickle file really contains your answer, you can run the following code. If you then evaluate or print x, you should see your original my_tuple values. (If you’re in a new notebook, you also need to import the pickle module again.)
with open("wkst3-ans.pickle", 'rb') as f:
    x = pickle.load(f)