Plots from the Spotify dataset

A chart with all the data

  • Import the attached Spotify dataset as df. In this CSV file, missing values are denoted by a blank space. Use the na_values keyword argument of pd.read_csv so that those blank spaces are converted to np.nan.

import pandas as pd
import altair as alt
df = pd.read_csv("../data/spotify_dataset.csv", na_values=" ")
  • Check your work by evaluating value_counts on df.dtypes. If everything worked correctly, there should be 11 float columns, 3 integer columns, and 9 object columns.

df.dtypes.value_counts()
float64    11
object      9
int64       3
dtype: int64
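
To see why the na_values argument matters for the dtype check above, here is a minimal sketch on a small made-up CSV string. The rows and the io.StringIO stand-in for the file are illustrative assumptions, not part of the Spotify dataset. Without na_values, a single blank-space entry should keep an otherwise numeric column from being read as float.

```python
import io
import pandas as pd

# Hypothetical three-row CSV; the Energy value for song "b" is a single blank space.
toy_csv = "Song,Energy\na,0.8\nb, \nc,0.5\n"

raw = pd.read_csv(io.StringIO(toy_csv))
clean = pd.read_csv(io.StringIO(toy_csv), na_values=" ")

# In raw, the blank space survives as a string, so Energy is not a float column.
print(raw["Energy"].dtype)

# With na_values=" ", the blank space becomes np.nan and Energy is read as float.
print(clean["Energy"].dtype)
print(clean["Energy"].isna().sum())
```
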
  • Plot the data from df using Altair. Encode the “Acousticness” data as the x-coordinate, the “Energy” data as the y-coordinate, and encode the “Valence” data as the color.

  • Use this method from the Altair documentation to set the color scheme. Start with the dark2 color scheme.

alt.Chart(df).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='dark2'))
)
  • Change the color scheme from dark2 to a different one of these options (scroll down to find the options).

  • Add a tooltip to the chart, indicating the Artist name and the song name.

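alt.Chart(df).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo')),
    # Tooltip fields; "Song Name" is assumed to be the song-title column in this csv.
    tooltip=["Artist", "Song Name"]
)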

A chart with the 50 most frequently occurring artists

  • Define a new variable s containing the pandas Series corresponding to the “Artist” column in df.

s = df["Artist"]
  • Call the value_counts method on s.

s.value_counts()
Taylor Swift                     52
Lil Uzi Vert                     32
Justin Bieber                    32
Juice WRLD                       30
Pop Smoke                        29
                                 ..
Chris Brown, Young Thug           1
Rauw Alejandro, J Balvin          1
347aidan                          1
Migrantes, Alico                  1
Dadá Boladão, Tati Zaqui, OIK     1
Name: Artist, Length: 716, dtype: int64
  • Using the previous result, find the 50 most frequently occurring artists in this dataset. (Note: the value_counts method automatically sorts the results from most frequent to least frequent.)

  • Define a variable top_artists which contains these top 50 artists. (Hint: you might want to use the index attribute.)

top_artists = s.value_counts().index[:50]
  • (More difficult.) Use the isin method (documentation) and Boolean indexing to define a new pandas DataFrame df2 which is the sub-DataFrame of df containing only the 50 most frequently occurring artists.

df2 = df[df["Artist"].isin(top_artists)]
  • Check your answer: the shape of df2 should be 678 by 23.

df2.shape
(678, 23)
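
The value_counts, index, and isin steps above can be tried on a small made-up DataFrame first. The artist names, the Energy values, and the cutoff of 2 are illustrative assumptions; this is only a sketch of the same pattern at a smaller scale.

```python
import pandas as pd

# Hypothetical data: artist "A" appears 3 times, "B" twice, "C" once.
toy = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "B", "A"],
    "Energy": [0.9, 0.4, 0.7, 0.2, 0.5, 0.8],
})

# value_counts sorts from most to least frequent, so the first 2 index
# entries are the 2 most frequently occurring artists.
toy_top = toy["Artist"].value_counts().index[:2]

# isin returns a Boolean Series; indexing with it keeps only the matching rows.
toy_sub = toy[toy["Artist"].isin(toy_top)]

print(list(toy_top))
print(toy_sub.shape)
```

As with df2, checking the shape of the result is a quick way to confirm that only rows for the selected artists remain.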

Interactive Altair chart

In this portion, we will make an interactive bar chart to accompany the chart we made above. This is based on the Interactive bar chart YouTube video.

  • Make the same chart as you made above, the only difference being that you now use df2 instead of df for the data.

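alt.Chart(df2).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo')),
    # Tooltip fields; "Song Name" is assumed to be the song-title column in this csv.
    tooltip=["Artist", "Song Name"]
)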
  • Add a selection_interval object named brush to the chart.

  • Assign this chart to the variable name c1 using the code c1 = alt.Chart....

  • Display this chart by evaluating c1.

  • Check your work: if you click and drag on the chart, there should be a grey rectangle that appears. (Once you’ve displayed the grey rectangle, you can move it around.)

brush = alt.selection_interval()

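# Note: in Altair 5 and later, add_selection is deprecated in favor of add_params.
c1 = alt.Chart(df2).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo')),
    # Tooltip fields; "Song Name" is assumed to be the song-title column in this csv.
    tooltip=["Artist", "Song Name"]
).add_selection(brush)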
c1
  • Make a second chart c2 showing a bar chart for the selected data, as in the most recent video. The x-axis should correspond to Artist names (only the top 50, since we’re using df2) and the y-axis should correspond to the number of times those artists appear in the selection. (Use transform_filter with brush, as in the above video.)

  • Display c1 and c2, one after the other, using c1&c2. (If you instead want them to appear side-by-side, you can use c1|c2.)

c2 = alt.Chart(df2).mark_bar().encode(
    x="Artist",
    y="count()"
).transform_filter(brush)
c1&c2
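
As a rough way to sanity-check the bar heights in c2, the same counts can be computed directly in pandas. The sketch below uses made-up rows and hypothetical rectangle bounds (x_lo, x_hi, y_lo, y_hi) standing in for the brushed region. It only mimics the effect of filtering by the brush and then counting per artist; it does not read the selection from the chart.

```python
import pandas as pd

# Hypothetical stand-in for df2 with just the columns the charts use.
toy = pd.DataFrame({
    "Artist": ["A", "A", "B", "B", "C"],
    "Acousticness": [0.10, 0.30, 0.20, 0.90, 0.25],
    "Energy": [0.80, 0.70, 0.75, 0.20, 0.10],
})

# Hypothetical brush rectangle: Acousticness in [0.0, 0.5], Energy in [0.5, 1.0].
x_lo, x_hi, y_lo, y_hi = 0.0, 0.5, 0.5, 1.0

# Keep only the rows inside the rectangle, as the brush filter does for c2.
inside = toy[
    toy["Acousticness"].between(x_lo, x_hi) & toy["Energy"].between(y_lo, y_hi)
]

# Count rows per artist; these correspond to the bar heights.
counts = inside["Artist"].value_counts()
print(counts.to_dict())
```
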
  • Find an image you like (including the selection) and save it using the … “Save as PNG” from the top right of the Deepnote cell with the two charts.

  • Upload that file to this Deepnote project, and embed that png file in a markdown cell. The syntax is ![alt text](path).

[Image: saved bar chart]
