Worksheet 5#
This worksheet is due Monday night of the following week. You are encouraged to work in groups of up to 3 total students, but each student should submit their own file. (It’s fine for everyone in the group to upload the same file.)
These questions refer to the attached vending machines CSV file, data_groups.csv. This is a very nice (artificial) dataset for demonstrating the usefulness of data visualization.
Put the full names of everyone in your group (even if you’re working alone) here. (This makes grading easier.)
Names:
Part 1: Without data visualization#
Load the data using pandas as a DataFrame stored as df and look at the first few rows.
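As a rough sketch of the loading step, the example below reads a tiny made-up CSV from a string, so that it runs without the attached file. The three rows and the name df_toy are illustrative assumptions; only the column names (grouping, x, y) come from the code later in this worksheet.

```python
import io

import pandas as pd

# Hypothetical stand-in for data_groups.csv: three made-up rows using the
# column names that appear later in this worksheet (grouping, x, y).
csv_text = """grouping,x,y
a,1.0,2.0
a,3.0,4.0
b,5.0,6.0
"""

# With the real file in the working directory, this would instead be
# df = pd.read_csv("data_groups.csv")
df_toy = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default (here, all three rows).
print(df_toy.head())
```
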
Look over the following code (and its output). (You don’t need to write anything here, but for a quiz or midterm, you should be able to understand this kind of code.)
groups = sorted(df["grouping"].unique())
for gp in groups:
    df_sub = df[df["grouping"] == gp]
    print("The group is", gp) # if you know f-strings, those could be used instead
    print("The mean of x is", df_sub["x"].mean(),
          "The standard deviation of x is", df_sub["x"].std())
    print("The mean of y is", df_sub["y"].mean(),
          "The standard deviation of y is", df_sub["y"].std())
    print()
Here is another way to get the same mean information. Look over this code and its output and the output above, and see how they’re related. (Again, you don’t need to write anything for this part.)
df.groupby("grouping").mean()
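To see how the for loop and groupby are related, here is a small sketch on a made-up four-row DataFrame (df_toy and its values are assumptions for illustration, not the worksheet data). It computes the per-group means both ways and checks that they agree.

```python
import pandas as pd

# Made-up data with the same column names as the worksheet's DataFrame.
df_toy = pd.DataFrame({
    "grouping": ["a", "a", "b", "b"],
    "x": [1.0, 3.0, 10.0, 20.0],
    "y": [2.0, 4.0, 5.0, 7.0],
})

# Loop version: build one sub-DataFrame per group and take its means.
loop_means = {}
for gp in sorted(df_toy["grouping"].unique()):
    df_sub = df_toy[df_toy["grouping"] == gp]
    loop_means[gp] = (df_sub["x"].mean(), df_sub["y"].mean())

# groupby version: one row per group, one column per numeric column.
gb_means = df_toy.groupby("grouping").mean()
print(gb_means)

# Each row of the groupby result matches the numbers from the loop.
for gp, (mean_x, mean_y) in loop_means.items():
    assert gb_means.loc[gp, "x"] == mean_x
    assert gb_means.loc[gp, "y"] == mean_y
```
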
How do you think you get the same standard deviation information using groupby? Try it, and check that the information does match what we got using the for loop.
In a markdown cell, describe the main takeaway from the above outputs: what do they tell you about the dataset? Feel free to make more computations if they are helpful.
Part 2: Visualizing the data all at once#
Using Altair, plot the data in df using a scatter plot (mark_circle). Encode the “x” column in the x-axis, the “y” column in the y-axis, and the “grouping” in the color. Use :N after specifying “grouping” to tell Altair that these groupings are categories, not quantitative values.
(The resulting plot should look like a mess.)
Part 3: Visualizing the 13 groupings individually#
We will see that these 13 groups of data are fundamentally different by plotting them separately. We don’t have to do this by hand. Altair, Seaborn, and Plotly Express all have their own way to do this separate plotting automatically.
Using Altair#
Make a facet chart in Altair by taking the same Chart definition as above, and including row="grouping:N" within the encoding. This tells Altair to put each grouping in its own row. Reference 1 (but don’t look at the first for loop example, scroll below that). Reference 2
Using Seaborn#
Make the same facet chart, but using Seaborn. Reference 1 (Scroll down to the example… tips is the name of their DataFrame, so you should use df instead of tips. It seems like the hue keyword argument should be used with sns.FacetGrid, while the x and y arguments should be used with map). Reference 2
Using Plotly Express#
Make the same facet chart using Plotly Express. Reference (To get it to look similar to the Altair and Seaborn versions, I added the keyword argument height=4000 to the px.scatter function.)
Submission#
Using the Share button at the top right, enable public sharing, and enable Comment privileges. Then submit the created link on Canvas.