Worksheet 6

This worksheet is due Monday of next week. You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same submission.)

Put the full names of everyone in your group (even if you’re working alone) here. (This makes grading easier.)

  • Names:

  • Import the attached “Math2B_grades_clean.csv” file, and name the DataFrame df.

  • Using Boolean indexing, find the sub-DataFrame where the course grade (“Total”) is “F” and where the Midterm 2 score is strictly greater than 72. Name this sub-DataFrame df_sub.

  • Using Altair, make a scatter plot using the data from df_sub for which the x-coordinate is “Midterm 2” and the y-coordinate is “Final exam”.

  • Based on the chart, how many rows do you expect are in df_sub? Explain your answer in a markdown cell, and check your answer using pandas.

(This approach isn’t guaranteed to be correct, because hypothetically two points might be on top of each other, or a row could contain missing data.)
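As a rough sketch of the Boolean indexing and row-count check described above, the following uses a small made-up DataFrame. Only the column names match the worksheet. The name `toy` and all of the values are hypothetical and are not taken from the course file. Two of the rows deliberately share the same coordinates and one has a missing final exam score, to illustrate why counting points on a chart can undercount rows.

```python
import pandas as pd

# Made-up stand-in for the grades data (hypothetical values).
toy = pd.DataFrame({
    "Midterm 2": [80, 75, 60, 75, 90],
    "Final exam": [50, 40, 45, 40, None],
    "Total": ["F", "F", "F", "F", "F"],
})

# Each condition gets its own parentheses; & is the elementwise "and".
toy_sub = toy[(toy["Total"] == "F") & (toy["Midterm 2"] > 72)]

# Number of rows in the sub-DataFrame.
n_rows = len(toy_sub)

# Number of distinct, fully specified (x, y) positions among those rows.
coords = ["Midterm 2", "Final exam"]
n_points = len(toy_sub.dropna(subset=coords).drop_duplicates(subset=coords))

print(n_rows, n_points)  # 4 2
```

Here four rows satisfy the conditions, but only two distinct positions have both coordinates present.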

Add one or more additional visual channels to the chart (don’t use tooltip here), without changing the x or y definitions, so that you can tell which of these students had the lowest score on Quiz 4.

Some options:

  • color (you might want to use the encoding data type :O or :N to make the colors more distinct; see the Altair reference on encoding data types).

  • size

  • If you change to mark_point, you can use the shape visual channel. I don’t think shape works with quantitative data, so you need to use an encoding data type like :N in this case. I personally prefer using mark_point(filled=True) over mark_point().

  • Explain in a markdown cell how you can tell from the chart which point has the lowest score on Quiz 4.

  • Add a tooltip with “Student_id” so that you can find the student id of the corresponding student.

Your code for this question will also be used at the end of this worksheet.

  • Here is a way to find that same student id using pandas. Can you figure out how the following code works by breaking it up into pieces? (There might be a question based on this code on the next quiz or on the midterm.)

df_sub.set_index("Student_id")["Quiz 4"].idxmin()
  • Why does the following code give a different answer? (Hint: display df_sub.)

df_sub["Quiz 4"].idxmin()
  • What changes if we use argmin instead of idxmin? What is the difference between these two methods? How does this correspond to what you see in df_sub? Answer in a markdown cell.

  • If you were to encode the “Student_id” value in one of these channels, why would "Student_id:N" make much more sense than "Student_id:Q" or "Student_id:O"? Explain in a markdown cell.
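For the idxmin and argmin questions above, it can help to experiment on a tiny example first. In this sketch the DataFrame `toy_sub` and its values are made up. The index labels 3, 8, 15 are chosen to mimic the kind of leftover row labels a sub-DataFrame can have after Boolean indexing.

```python
import pandas as pd

# Made-up rows (hypothetical values) with a non-consecutive index.
toy_sub = pd.DataFrame(
    {"Student_id": [5012, 5040, 5077], "Quiz 4": [7, 3, 9]},
    index=[3, 8, 15],
)

# Breaking the chained expression into pieces:
by_id = toy_sub.set_index("Student_id")  # Student_id becomes the index
quiz = by_id["Quiz 4"]                   # a Series indexed by student id
student = quiz.idxmin()                  # index label of the smallest value

label = toy_sub["Quiz 4"].idxmin()       # label in the original index
position = toy_sub["Quiz 4"].argmin()    # integer position, counting from 0

print(student, label, position)  # 5040 8 1
```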

Take the same Altair chart code from above (the one where you found the student id using the tooltip) and make the following changes to it.

  • Change from df_sub to the full DataFrame df.

  • Add column="Total" to the encoding.

  • For the student whose student id you found above, where is the corresponding point located in this facet chart? Explain in a markdown cell where this point is and how you can tell. (You should be able to check that you are correct using the tooltip.)

Submission

  • Reminder: everyone needs to make a submission on Canvas.

  • Reminder: include everyone’s full name at the top, after Names.

  • Using the Share button at the top right, enable public sharing, and enable Comment privileges. Then submit the created link on Canvas.