Worksheet 5

Authors (3 maximum): BLANK

This worksheet is due Tuesday of Week 4, before discussion section. You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same upload.)

Introduction

The goal of this worksheet is to practice making charts using Altair, Seaborn, and Plotly.

Comment: Altair, Seaborn, and Plotly all have slightly different syntax. For us in Math 10, Altair will be the most important, so if you are deciding what to remember, focus mostly on the Altair syntax.

  • Import the penguins dataset from the Seaborn library, using sns.load_dataset. (You can see all the available datasets by using sns.get_dataset_names().) Save the resulting pandas DataFrame with the variable name df.

Removing rows with missing values

  • Using a list comprehension, make a list of the index labels of the rows which do not have any missing values. If mylist is the name of your list, keep only those rows in df by evaluating df = df.loc[mylist].copy(). (Comment. I think it’s a little more natural to do this using Boolean indexing or especially the pandas method dropna. We are taking this approach for practice with list comprehension.)

  • Check your answer: df should now have 333 rows and 7 columns.

  • Aside: Why doesn’t df[mylist].copy() work? Answer in a markdown cell.
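The list-comprehension step above can be sketched on a small made-up DataFrame. The name toy and its values are assumptions for illustration only, standing in for the penguins data; this is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the penguins data (not the real dataset).
# Rows 1 and 3 each contain one missing value.
toy = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", None],
        "bill_length_mm": [39.1, np.nan, 46.1, 50.0],
        "body_mass_g": [3750.0, 3800.0, 5000.0, 3700.0],
    }
)

# Keep the index label i whenever row i has no missing values.
mylist = [i for i in toy.index if not toy.loc[i].isna().any()]

# .loc selects rows by label; .copy() returns an independent DataFrame.
toy = toy.loc[mylist].copy()

print(mylist)     # [0, 2]
print(toy.shape)  # (2, 3)
```

The same pattern should carry over to df, where the shape check is the 333 rows and 7 columns mentioned above.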

Description of the plot

Choose two of the numeric (quantitative) columns and one of the non-numeric (categorical) columns from this DataFrame. We will draw a scatter plot of this data, encoding one of the numeric columns in the x-coordinate, the other numeric column in the y-coordinate, and encoding the categorical column as the color.

You don’t have to write anything here, but use the same three columns for your Altair plot, your Seaborn plot, and your Plotly plot.

The plot using Altair

  • Make this scatter plot using Altair. I personally prefer using mark_circle instead of mark_point.

In the documentation, you can ignore the following lines:

from vega_datasets import data
cars = data.cars()

The point of those lines is to get a pandas DataFrame, but we already have that DataFrame from Seaborn.

  • By default, Altair will include zero in the axis limits. Remove that feature by adapting the second example here.

The plot using Seaborn

  • Make this scatter plot using Seaborn. (I recommend scrolling down to where the examples start.)
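For comparison with the Altair syntax, here is a minimal Seaborn sketch on a made-up DataFrame. The name toy and its values are assumptions for illustration; in Seaborn the color encoding is passed through the hue argument.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for df, with penguins-style column names.
toy = pd.DataFrame(
    {
        "bill_length_mm": [39.1, 40.3, 46.1, 50.0, 49.3, 46.5],
        "flipper_length_mm": [181, 195, 211, 230, 198, 192],
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
    }
)

# hue plays the role of Altair's color encoding.
# sns.scatterplot returns the Matplotlib Axes it drew on.
ax = sns.scatterplot(
    data=toy,
    x="bill_length_mm",
    y="flipper_length_mm",
    hue="species",
)
```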

The plot using Plotly

  • Make this scatter plot using Plotly.

Another type of Chart

  • Find one more chart, not a scatter plot, you think is interesting, still using the penguins data (but you can use new columns), and plot it below. You can take inspiration from the Altair examples, the Seaborn examples, or the Plotly Express examples. (It’s possible some of the fanciest plots won’t work on Deepnote, so just move on if something seems incompatible.)

Reminder

Every group member needs to submit this on Canvas (even if you all submit the same link).

Be sure you’ve included your name and your group members’ names at the top after “Authors”. You might consider restarting this notebook (the Deepnote refresh icon, not your browser refresh icon) and running it again to make sure it’s working.

Submission

Using the Share & publish menu at the top right, enable public sharing in the “Share project” section, and enable Comment privileges. Then submit that link on Canvas.
