Worksheet 4#
This worksheet is due Monday night of Week 3. You are encouraged to work in groups of up to 3 total students, but each student should submit their own file. (It’s fine for everyone in the group to upload the same file.)
These questions refer to the attached vending machines csv file, vend.csv
.
Goal#
The goal of this worksheet is to make a 8-by-3 pandas DataFrame. The columns for this DataFrame will be named “Month”, “transactions” and “total price”. The entries will correspond to the number of transactions in that month, as well as the combined price for all those transactions.
Load the attached
vend.csv
dataset usingpd.read_csv
, and store it with the variable namedf
.
Look at the first few rows of
df
using thehead
method.
Which of the columns contains date-like values? Convert this column to actual date values (i.e., values that are recognized as dates by pandas) using the pandas function
to_datetime
. (You shouldn’t need to use a for loop or anything like that, just use the entire column as an input to the pandasto_datetime
function.) Save the resulting pandas Series as a new column indf
, named"Date"
.
Get the corresponding month name for each date, by using the
dt
accessor and the methodmonth_name
on the new “Date” column. Save this as a new column indf
, named"Month"
.
Write a function
month_count
which takes as input a month name like “March” and as output returns how many transactions occurred in this month (use Boolean indexing… there are other ways, but we’ll use a similar approach below to get the total pr).
For example, the output for “March” should be 633
.
Write a function
total_price
which takes as input a month name like “March” and as output returns the sum of the “RPrice” value for all transactions occuring in that month.
For example, the output for “March” should be 1117.75
.
Make an empty DataFrame, that will eventually be the submission for this worksheet. Use
pd.DataFrame()
, nothing inside the parentheses. Name this empty DataFramedf_out
.
Put a “Month” column in
df_out
containing the months fromdf
, each listed one time. (Usedf["Month"].unique()
to get each value one time.)
Apply the
month_count
function to each value indf_out["Month"]
by callingdf_out["Month"].map(month_count)
. Store the result as a new column indf_out
named"transactions"
.
Using the same strategy, make a column named
"total price"
indf_out
, obtained by applying thetotal_price
function to each entry in the"Month"
column ofdf_out
.
The resulting DataFrame
df_out
should have 8 rows and 3 columns. Save it to a file nameddf_out.csv
using the pandas DataFrame methodto_csv
. (You need to give the desired file name as the first argument to theto_csv
method.) Specify the keyword argumentindex=False
so that the numbers0
through7
from the index are not stored.
Submit this csv file on Canvas.