Worksheet 4#
This worksheet is due Monday night of Week 3. You are encouraged to work in groups of up to 3 total students, but each student should submit their own file. (It’s fine for everyone in the group to upload the same file.)
These questions refer to the attached vending machines csv file, vend.csv.
Goal#
The goal of this worksheet is to make a 8-by-3 pandas DataFrame. The columns for this DataFrame will be named “Month”, “transactions” and “total price”. The entries will correspond to the number of transactions in that month, as well as the combined price for all those transactions.
Load the attached
vend.csvdataset usingpd.read_csv, and store it with the variable namedf.
Look at the first few rows of
dfusing theheadmethod.
Which of the columns contains date-like values? Convert this column to actual date values (i.e., values that are recognized as dates by pandas) using the pandas function
to_datetime. (You shouldn’t need to use a for loop or anything like that, just use the entire column as an input to the pandasto_datetimefunction.) Save the resulting pandas Series as a new column indf, named"Date".
Get the corresponding month name for each date, by using the
dtaccessor and the methodmonth_nameon the new “Date” column. Save this as a new column indf, named"Month".
Write a function
month_countwhich takes as input a month name like “March” and as output returns how many transactions occurred in this month (use Boolean indexing… there are other ways, but we’ll use a similar approach below to get the total pr).
For example, the output for “March” should be 633.
Write a function
total_pricewhich takes as input a month name like “March” and as output returns the sum of the “RPrice” value for all transactions occuring in that month.
For example, the output for “March” should be 1117.75.
Make an empty DataFrame, that will eventually be the submission for this worksheet. Use
pd.DataFrame(), nothing inside the parentheses. Name this empty DataFramedf_out.
Put a “Month” column in
df_outcontaining the months fromdf, each listed one time. (Usedf["Month"].unique()to get each value one time.)
Apply the
month_countfunction to each value indf_out["Month"]by callingdf_out["Month"].map(month_count). Store the result as a new column indf_outnamed"transactions".
Using the same strategy, make a column named
"total price"indf_out, obtained by applying thetotal_pricefunction to each entry in the"Month"column ofdf_out.
The resulting DataFrame
df_outshould have 8 rows and 3 columns. Save it to a file nameddf_out.csvusing the pandas DataFrame methodto_csv. (You need to give the desired file name as the first argument to theto_csvmethod.) Specify the keyword argumentindex=Falseso that the numbers0through7from the index are not stored.
Submit this csv file on Canvas.