Worksheet 8#

This worksheet is due Tuesday of Week 5, before discussion section. You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same upload.)

Creating the DataFrame#

  • Import the Math2B_grades_clean.csv file.

  • Choose 30 random rows using the pandas DataFrame sample method. Use the keyword argument random_state with your student id number, so that the results are always the same. (If you are in a group, use the student id number of any one of the group members.) Name this 30-row DataFrame df30.

  • The following image shows the average score for each assignment. Why would it be difficult to produce the following chart using Altair and df30? (Hint. What would you put for the x-axis encoding?)

assignment along x-axis

To fix that, we will create a new longer DataFrame that contains the same data.

  • Use the pandas DataFrame method melt (documentation) to create a new variable column named “Assignment” containing the assignment names and a new value column named “Score” containing the scores. (Do not put the student ids or the final course grades into these new columns.) Name the resulting DataFrame df.

  • Check your answer. The new DataFrame df should have 270 rows and 4 columns. The columns should be named “Student_id”, “Total”, “Assignment”, “Score”.

  • Check your answer. If you evaluate the following code, you should see something very similar to the above image (but the particular scores will be different because of the randomness). (If you want to make the chart look even more similar, you can remove the axis titles by using the keyword argument axis=alt.Axis(title=None) for both the x-axis and y-axis specifications.)

alt.Chart(df).mark_bar().encode(
    x="Assignment",
    y="mean(Score)"
)

Creating the base chart#

The Altair chart we make will be based on the following. (Notice it doesn’t have a mark defined yet, so it will raise an error if you try to plot it.)

base = alt.Chart(df).encode(
    x="Assignment",
    y="Score",
    tooltip=["Student_id", "Assignment", "Score"],
)
  • Evaluate base.mark_circle() to get a sense for the contents.

  • It would be better if the assignments were in the chronological order, rather than alphabetical order. The following dictionary says what date each assignment occurred on.

assignment_dates = {'Final exam': '6/4/22',
 'Midterm 1': '4/20/22',
 'Midterm 2': '5/13/22',
 'Quiz 1': '4/7/22',
 'Quiz 2': '4/14/22',
 'Quiz 3': '4/28/22',
 'Quiz 4': '5/5/22',
 'Quiz 5': '5/26/22',
 'Webwork': '4/8/22'}
  • Using that dictionary, make a list assignment_list = ['Quiz 1', 'Webwork', ...] which lists these assignments in chronological order. (Here is one approach: convert this dictionary to a pandas Series, then convert the dates using pd.to_datetime, then use sort_values. In general in Math 10, it’s never correct to just write out all the entries one at a time.)

  • Update the x-channel in the base definition to specify sort=assignment_list (you will also need to add alt.X to input this keyword argument). Be sure assignment_list is really a list, not something like a pandas Index.

  • If you evaluate base.mark_circle(), it should look similar to above, but with the x-axis in chronological order.

Creating an interactive chart from the base chart#

Fill in appropriate values to the following template.

  • ???id should be Student id that you want to highlight. (This occurs in two places. Any of the 30 student ids is fine to choose, but choose one for which you think the scores are “interesting” in some way.)

  • ???param should be the variable name for the selection object. (This occurs in three places.)

  • ???size should be the size you want for the highlighted student.

  • ???encode should be the encoding abbreviation that makes the most sense for student id numbers.

  • Replace ???facet so that we see a different chart for each course grade.

  • ???description should be a brief description of something you find interesting about the student you highlighted.

  • ???rs should be the random state you used at the beginning (in sample).

highlight = alt.selection(type='single', on='mouseover',
                          fields=['Student_id'], nearest=True,
                          init={'Student_id': ???id})

lines = base.mark_line().encode(
    size=alt.condition(???param, alt.value(???size), alt.value(1)),
    color=alt.condition(???param, "Student_id:???encode", alt.value("lightgray"))
)

points = base.mark_circle(opacity=0).add_selection(
    ???param
)

chart = alt.layer(lines, points).facet(
    ???facet
).resolve_scale(
    color="independent"
).properties(
    title={
      "text": "Student ???id ???description", 
      "subtitle": "The random_state we used was ???rs"
    }
)

chart

Reminder#

Every group member needs to submit this on Canvas (even if you all submit the same file).

Submission#

Save chart as a json file named “wkst8.json” using the following code, and upload that json file on Canvas.

with open("wkst8.json", "w") as f:
    f.write(chart.to_json())
Created in deepnote.com Created in Deepnote