Plots from the Spotify dataset
A chart with all the data
Import the attached Spotify dataset as df. In this csv file, missing values are denoted by a blank space. Use the na_values keyword argument with pd.read_csv so that those blank spaces get converted to np.nan.
import pandas as pd
import altair as alt
df = pd.read_csv("../data/spotify_dataset.csv", na_values=" ")
Check your work by evaluating value_counts on df.dtypes. If everything worked correctly, there should be 11 float columns, 3 integer columns, and 9 object columns.
df.dtypes.value_counts()
float64 11
object 9
int64 3
dtype: int64
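To see what na_values changes, here is a minimal sketch on a tiny in-memory CSV rather than the real file. The column names and values are made up for illustration; only pd.read_csv, its na_values argument, and the dtypes check come from the exercise.

```python
import io

import pandas as pd

# Hypothetical three-row CSV; the second row uses a single blank space
# where a value is missing.
csv_text = "Song,Energy\nA,0.5\nB, \nC,0.9\n"

# Without na_values, the blank space is kept as a string,
# so the Energy column is not numeric.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values=" ", the blank space is read as NaN
# and the Energy column becomes float64.
clean = pd.read_csv(io.StringIO(csv_text), na_values=" ")

print(raw["Energy"].dtype)
print(clean["Energy"].dtype)
print(clean.dtypes.value_counts())
```

Comparing the two dtypes is a quick way to confirm that the missing-value marker was recognized before moving on to plotting.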
Plot the data from df using Altair. Encode the “Acousticness” data as the x-coordinate, the “Energy” data as the y-coordinate, and the “Valence” data as the color.
Use this method from the Altair documentation to adjust the color scheme. Use the dark2 color scheme.
alt.Chart(df).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='dark2'))
)
Change the color scheme from dark2 to a different one of these options (scroll down to find the options).
Add a tooltip to the chart, indicating the Artist name and the song name.
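alt.Chart(df).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo')),
    # Assumption: the song-title column is named "Song Name"; check df.columns.
    tooltip=["Artist", "Song Name"]
)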
A chart with the 50 most frequently occurring artists
Define a new variable s containing the pandas Series corresponding to the “Artist” column in df.
s = df["Artist"]
Call the value_counts method on s.
s.value_counts()
Taylor Swift 52
Lil Uzi Vert 32
Justin Bieber 32
Juice WRLD 30
Pop Smoke 29
..
Chris Brown, Young Thug 1
Rauw Alejandro, J Balvin 1
347aidan 1
Migrantes, Alico 1
Dadá Boladão, Tati Zaqui, OIK 1
Name: Artist, Length: 716, dtype: int64
Using the previous result, find the 50 most frequently occurring artists in this dataset. (Note: the value_counts method automatically sorts the results from most frequent to least frequent.)
Define a variable top_artists which contains these top 50 artists. (Hint: you might want to use the index attribute.)
top_artists = s.value_counts().index[:50]
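The same pattern can be sketched on a small made-up Series, which makes it easier to see why slicing the index works. The artist labels below are placeholders, and the cutoff of 2 stands in for the 50 used in the exercise.

```python
import pandas as pd

# Hypothetical data: "A" appears 3 times, "B" twice, "C" and "D" once each.
artists = pd.Series(["A", "B", "A", "C", "A", "B", "D"], name="Artist")

# value_counts sorts from most frequent to least frequent,
# and the labels themselves live in the index of the result.
counts = artists.value_counts()

# Slicing the index keeps the labels of the two most frequent artists.
top_two = counts.index[:2]
print(list(top_two))
```

When several artists share the same count at the cutoff, which of them make it into the slice depends on the order value_counts happens to return them in, so it may be best not to rely on that order.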
(More difficult.) Use the isin method (documentation) and Boolean indexing to define a new pandas DataFrame df2 which is the sub-DataFrame of df containing only the 50 most frequently occurring artists.
df2 = df[df["Artist"].isin(top_artists)]
Check your answer: the shape of df2 should be 678 by 23.
df2.shape
(678, 23)
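As a rough illustration of how isin and Boolean indexing fit together, the sketch below repeats the filtering step on a made-up DataFrame. The rows and the Energy values are invented; only the isin call and the indexing pattern mirror the exercise.

```python
import pandas as pd

# Hypothetical data with four artists of differing frequency.
toy = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "A", "B", "D"],
    "Energy": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
})

# Labels of the two most frequent artists.
top_two = toy["Artist"].value_counts().index[:2]

# isin returns a Boolean Series: True where the artist is one of top_two.
mask = toy["Artist"].isin(top_two)

# Boolean indexing keeps only the rows where the mask is True.
sub = toy[mask]
print(sub.shape)
```

The number of rows kept equals the sum of the counts of the selected artists, which is the same reasoning behind the expected 678 rows in df2.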
Interactive Altair chart
In this portion, we will make an interactive bar chart to accompany the chart we made above. This is based on the Interactive bar chart YouTube video.
Make the same chart as you made above, with the only difference being that you now use df2 instead of df for the data.
alt.Chart(df2).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo'))
)
Add a selection_interval object named brush to the chart.
Assign this chart to the variable name c1 using the code c1 = alt.Chart....
Display this chart by evaluating c1.
Check your work: if you click and drag on the chart, there should be a grey rectangle that appears. (Once you’ve displayed the grey rectangle, you can move it around.)
brush = alt.selection_interval()
c1 = alt.Chart(df2).mark_circle().encode(
    x="Acousticness",
    y="Energy",
    color=alt.Color('Valence', scale=alt.Scale(scheme='turbo'))
).add_selection(brush)
c1
Make a second chart c2 showing a bar chart for the selected data, as in the most recent video. The x-axis should correspond to Artist names (only the top 50, since we’re using df2) and the y-axis should correspond to the number of times those artists appear in the selection. (Use transform_filter with brush, as in the above video.)
Display c1 and c2, one after the other, using c1&c2. (If you instead want them to appear side-by-side, you can use c1|c2.)
c2 = alt.Chart(df2).mark_bar().encode(
    x="Artist",
    y="count()"
).transform_filter(brush)
c1 & c2
Find an image you like (including the selection) and save it using the … “Save as PNG” option at the top right of the Deepnote cell with the two charts.
Upload that file to this Deepnote project, and embed that png file in a markdown cell. The syntax is ![alt text](path).