Week 4 Monday¶

Today’s class is mostly meant as review, but there is some new material I want to introduce, including:

np.count_nonzero
sort_values

Announcements¶

Videos and video quizzes are due Tuesday before discussion (different day than most weeks). No in-class quiz this week.
Midterm is Thursday during discussion. You’re allowed to use a notecard with handwritten notes on both sides; ask Chris or Yasmeen if you need a new notecard.
In discussion section on Tuesday, Yasmeen will go over some of the sample midterm. (It’s on Canvas, on the Week 4 page.)
In general, the material from this week could appear on the midterm. (This week’s material is mostly meant as review.) If you’re curious about whether you need to memorize/study something specific, please ask on Ed Discussion.

Timing operations¶

import numpy as np
import pandas as pd

df = pd.read_csv("../data/spotify_dataset.csv", na_values=" ")

Define s to be the pandas Series corresponding to the “Energy” column in df.

s = df["Energy"]

type(s)

pandas.core.series.Series

Count how many values in s are strictly greater than 0.7. Try each of the following strategies.

Use a pandas Boolean Series and the sum method.

s[:5]

0    0.800
1    0.764
2    0.664
3    0.897
4    0.704
Name: Energy, dtype: float64

s>0.7

0        True
1        True
2       False
3        True
4        True
        ...  
1551    False
1552     True
1553    False
1554    False
1555    False
Name: Energy, Length: 1556, dtype: bool

(s>0.7).sum()

Use a pandas Boolean Series and Python’s built-in sum function.

sum(s>0.7)

NumPy’s np.count_nonzero function is the most efficient way I know to count elements in Python.

Use a pandas Boolean Series and NumPy’s np.count_nonzero function. (This function accepts many different types of inputs. If the input is a Boolean Series, it counts how often True occurs.)

np.count_nonzero(s>0.7)

np.count_nonzero(range(5))

np.count_nonzero([0,1,0,2,2])
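As a small check that the three Boolean-Series strategies above agree, here is a minimal sketch on a made-up Series. The name energy and its values are hypothetical and are not taken from the Spotify dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only for illustration (not from the Spotify dataset).
energy = pd.Series([0.80, 0.764, 0.664, 0.897, 0.70])

# Boolean Series; the last entry is False because 0.70 is not strictly greater than 0.7.
mask = energy > 0.7

count_method = mask.sum()             # pandas sum method: True counts as 1, False as 0
count_builtin = sum(mask)             # Python's built-in sum iterates over the entries
count_numpy = np.count_nonzero(mask)  # counts the True entries

print(count_method, count_builtin, count_numpy)

# For a list of numbers, count_nonzero counts the entries that are not 0.
print(np.count_nonzero([0, 1, 0, 2, 2]))
```

All three counts come from the same Boolean Series, so they should always match; only the speed differs.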

Use a list comprehension together with len. (In other words, make a list containing all the values greater than 0.7, then compute the length of that list.)

len([x for x in s if x>0.7])

Aside: recall that an if by itself (a filter) goes at the end of the list comprehension, after the for clause, whereas if together with else (a conditional expression) goes at the beginning, before the for clause.

[x if x > 0.7 else "christopher" for x in s ]

[0.8,
 0.764,
 'christopher',
 0.897,
 0.704,
 'christopher',
 0.701,
 0.718,
 'christopher',
 'christopher',
 0.825,
 0.819,
 0.741,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.825,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.73,
 'christopher',
 0.766,
 'christopher',
 0.862,
 'christopher',
 'christopher',
 0.796,
 'christopher',
 'christopher',
 0.816,
 'christopher',
 'christopher',
 'christopher',
 0.71,
 'christopher',
 'christopher',
 'christopher',
 0.809,
 0.765,
 'christopher',
 0.784,
 0.716,
 'christopher',
 0.711,
 'christopher',
 0.849,
 'christopher',
 'christopher',
 0.839,
 0.738,
 0.941,
 0.887,
 'christopher',
 'christopher',
 0.72,
 'christopher',
 0.784,
 'christopher',
 0.825,
 'christopher',
 0.793,
 0.782,
 0.807,
 0.77,
 'christopher',
 'christopher',
 'christopher',
 0.706,
 0.899,
 'christopher',
 0.939,
 'christopher',
 'christopher',
 0.701,
 'christopher',
 'christopher',
 'christopher',
 0.948,
 'christopher',
 'christopher',
 0.874,
 0.782,
 0.753,
 'christopher',
 'christopher',
 'christopher',
 0.802,
 0.762,
 0.835,
 0.813,
 0.72,
 0.764,
 0.83,
 'christopher',
 'christopher',
 0.825,
 'christopher',
 0.812,
 0.78,
 'christopher',
 'christopher',
 0.719,
 0.899,
 0.837,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.96,
 0.731,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.812,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.835,
 0.731,
 0.912,
 'christopher',
 'christopher',
 0.873,
 0.859,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.706,
 0.732,
 0.795,
 0.788,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.855,
 0.815,
 'christopher',
 0.717,
 0.861,
 0.705,
 'christopher',
 'christopher',
 0.899,
 0.783,
 0.792,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.924,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.759,
 'christopher',
 'christopher',
 0.771,
 'christopher',
 0.76,
 'christopher',
 'christopher',
 'christopher',
 0.805,
 0.766,
 'christopher',
 'christopher',
 0.878,
 'christopher',
 'christopher',
 0.767,
 'christopher',
 0.72,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.724,
 'christopher',
 0.844,
 0.751,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.707,
 0.732,
 'christopher',
 0.716,
 0.71,
 'christopher',
 'christopher',
 'christopher',
 0.922,
 0.909,
 'christopher',
 0.889,
 'christopher',
 0.918,
 0.712,
 0.739,
 'christopher',
 0.893,
 0.717,
 'christopher',
 0.708,
 'christopher',
 'christopher',
 0.727,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.908,
 'christopher',
 0.795,
 'christopher',
 0.736,
 'christopher',
 'christopher',
 0.762,
 'christopher',
 0.75,
 0.845,
 'christopher',
 'christopher',
 0.855,
 'christopher',
 0.866,
 'christopher',
 0.712,
 0.814,
 0.862,
 'christopher',
 0.707,
 'christopher',
 0.764,
 'christopher',
 'christopher',
 0.809,
 0.756,
 'christopher',
 0.802,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.893,
 0.922,
 'christopher',
 'christopher',
 0.843,
 'christopher',
 0.729,
 'christopher',
 0.733,
 'christopher',
 0.864,
 'christopher',
 0.73,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.706,
 'christopher',
 0.934,
 0.719,
 'christopher',
 0.712,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.828,
 'christopher',
 'christopher',
 0.855,
 'christopher',
 0.831,
 'christopher',
 0.836,
 0.713,
 'christopher',
 'christopher',
 'christopher',
 0.838,
 'christopher',
 'christopher',
 'christopher',
 0.73,
 0.789,
 'christopher',
 'christopher',
 0.862,
 0.725,
 0.745,
 'christopher',
 'christopher',
 0.704,
 'christopher',
 'christopher',
 'christopher',
 0.709,
 'christopher',
 0.856,
 0.89,
 0.793,
 0.867,
 0.955,
 0.737,
 0.814,
 'christopher',
 'christopher',
 'christopher',
 0.929,
 'christopher',
 'christopher',
 0.749,
 'christopher',
 'christopher',
 0.759,
 'christopher',
 0.74,
 'christopher',
 0.703,
 'christopher',
 'christopher',
 0.748,
 0.788,
 0.856,
 'christopher',
 0.83,
 0.837,
 0.8,
 'christopher',
 0.744,
 'christopher',
 0.721,
 'christopher',
 'christopher',
 'christopher',
 0.819,
 'christopher',
 'christopher',
 'christopher',
 0.739,
 'christopher',
 'christopher',
 0.796,
 0.789,
 0.814,
 'christopher',
 0.789,
 0.734,
 0.701,
 0.848,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.817,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.821,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.719,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.758,
 'christopher',
 'christopher',
 0.703,
 0.792,
 0.763,
 0.932,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.773,
 0.741,
 0.899,
 'christopher',
 0.732,
 'christopher',
 'christopher',
 'christopher',
 0.821,
 'christopher',
 'christopher',
 'christopher',
 0.909,
 'christopher',
 0.891,
 0.771,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.711,
 'christopher',
 'christopher',
 0.922,
 0.823,
 'christopher',
 0.756,
 0.741,
 'christopher',
 0.723,
 0.902,
 0.72,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.715,
 0.761,
 0.724,
 'christopher',
 'christopher',
 0.823,
 0.85,
 'christopher',
 'christopher',
 0.701,
 0.794,
 'christopher',
 0.833,
 'christopher',
 'christopher',
 'christopher',
 0.739,
 'christopher',
 'christopher',
 'christopher',
 0.713,
 'christopher',
 0.741,
 'christopher',
 'christopher',
 0.922,
 'christopher',
 0.76,
 'christopher',
 'christopher',
 0.74,
 0.819,
 'christopher',
 0.903,
 0.727,
 0.761,
 0.912,
 0.834,
 0.744,
 'christopher',
 'christopher',
 0.737,
 'christopher',
 0.705,
 'christopher',
 0.828,
 'christopher',
 0.797,
 'christopher',
 0.709,
 0.708,
 0.723,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.756,
 'christopher',
 'christopher',
 'christopher',
 0.775,
 0.742,
 'christopher',
 0.706,
 'christopher',
 0.788,
 'christopher',
 0.816,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.774,
 0.893,
 'christopher',
 'christopher',
 'christopher',
 0.844,
 0.769,
 'christopher',
 'christopher',
 0.814,
 'christopher',
 0.859,
 'christopher',
 'christopher',
 0.746,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.715,
 0.737,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.784,
 0.726,
 'christopher',
 'christopher',
 0.707,
 0.806,
 'christopher',
 'christopher',
 'christopher',
 0.786,
 'christopher',
 'christopher',
 'christopher',
 0.869,
 'christopher',
 0.937,
 0.701,
 'christopher',
 'christopher',
 0.79,
 'christopher',
 'christopher',
 0.707,
 0.792,
 0.865,
 0.836,
 0.914,
 'christopher',
 'christopher',
 0.88,
 0.937,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.812,
 0.831,
 'christopher',
 'christopher',
 'christopher',
 0.772,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.904,
 'christopher',
 0.876,
 0.776,
 'christopher',
 'christopher',
 0.759,
 0.925,
 'christopher',
 0.953,
 'christopher',
 0.841,
 'christopher',
 'christopher',
 'christopher',
 0.814,
 0.843,
 'christopher',
 'christopher',
 'christopher',
 0.87,
 0.869,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.712,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.813,
 'christopher',
 0.771,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.938,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.939,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.73,
 'christopher',
 'christopher',
 0.704,
 0.724,
 0.758,
 0.727,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.881,
 'christopher',
 'christopher',
 'christopher',
 0.703,
 0.885,
 'christopher',
 0.723,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.829,
 0.727,
 0.74,
 0.798,
 0.745,
 0.725,
 0.861,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.845,
 'christopher',
 0.908,
 'christopher',
 0.756,
 'christopher',
 'christopher',
 0.882,
 0.705,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.708,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.782,
 'christopher',
 'christopher',
 'christopher',
 0.787,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.824,
 'christopher',
 'christopher',
 0.97,
 'christopher',
 'christopher',
 'christopher',
 0.765,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.819,
 'christopher',
 0.808,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.716,
 0.741,
 0.796,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.851,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.887,
 'christopher',
 'christopher',
 0.726,
 0.913,
 'christopher',
 'christopher',
 0.771,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.837,
 'christopher',
 'christopher',
 'christopher',
 0.729,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.865,
 0.939,
 'christopher',
 0.867,
 'christopher',
 'christopher',
 0.758,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.733,
 'christopher',
 0.875,
 'christopher',
 0.737,
 'christopher',
 0.733,
 0.793,
 0.706,
 'christopher',
 'christopher',
 'christopher',
 0.709,
 'christopher',
 0.818,
 0.703,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.749,
 'christopher',
 0.858,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.776,
 'christopher',
 'christopher',
 0.884,
 'christopher',
 0.77,
 'christopher',
 0.723,
 'christopher',
 'christopher',
 0.832,
 0.768,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.844,
 0.773,
 'christopher',
 0.706,
 'christopher',
 'christopher',
 'christopher',
 0.727,
 'christopher',
 'christopher',
 'christopher',
 0.814,
 'christopher',
 0.955,
 'christopher',
 'christopher',
 0.743,
 0.82,
 'christopher',
 0.911,
 'christopher',
 'christopher',
 0.799,
 0.857,
 0.71,
 'christopher',
 'christopher',
 0.716,
 'christopher',
 'christopher',
 0.794,
 0.854,
 'christopher',
 'christopher',
 'christopher',
 0.796,
 0.87,
 'christopher',
 'christopher',
 0.863,
 'christopher',
 'christopher',
 0.817,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.854,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.826,
 'christopher',
 0.781,
 'christopher',
 'christopher',
 0.766,
 'christopher',
 'christopher',
 'christopher',
 0.744,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.716,
 'christopher',
 0.704,
 0.848,
 0.741,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.742,
 'christopher',
 'christopher',
 'christopher',
 0.857,
 0.909,
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 'christopher',
 0.764,
 0.723,
 'christopher',
 0.787,
 'christopher',
 0.76,
 'christopher',
 ...]
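The two placements of if can be compared side by side on a short list. This is only an illustrative sketch; the list values and the replacement string "low" are made up.

```python
values = [0.8, 0.664, 0.897, 0.5]  # hypothetical values

# Filtering: an `if` at the end drops the entries that fail the test,
# so the result can be shorter than the input.
kept = [x for x in values if x > 0.7]

# Conditional expression: `if ... else` at the beginning keeps every entry
# but replaces the ones that fail the test, so the length is unchanged.
replaced = [x if x > 0.7 else "low" for x in values]

print(kept)
print(replaced)
```

The filtering form is the one to use with len when counting, because the length of the second list is always the length of the input.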

Time each of these strategies using %%timeit. Which is fastest?

In this case, all four strategies run in well under a millisecond, with np.count_nonzero being the fastest (about 42 µs, compared with about 140 µs for the two slowest strategies). If the original DataFrame were bigger, say with ten million rows instead of roughly 1,500 rows, I think the differences would be more pronounced.

%%timeit
(s > 0.7).sum()

62 µs ± 582 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
sum(s > 0.7)

142 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
np.count_nonzero(s > 0.7)

42.2 µs ± 150 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
len([x for x in s if x>0.7])

137 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
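The %%timeit cell magic only works in Jupyter or IPython. Outside a notebook, a rough equivalent can be sketched with Python's standard timeit module. The Series big below is random stand-in data (an assumption, not the Spotify dataset), and the printed timings will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random values in [0, 1), standing in for a longer "Energy" column.
rng = np.random.default_rng(seed=0)
big = pd.Series(rng.random(10**5))

# The four counting strategies from this section, wrapped as zero-argument functions.
strategies = {
    "Series.sum": lambda: (big > 0.7).sum(),
    "built-in sum": lambda: sum(big > 0.7),
    "np.count_nonzero": lambda: np.count_nonzero(big > 0.7),
    "list comprehension": lambda: len([x for x in big if x > 0.7]),
}

# Sanity check: compute the count once with each strategy.
counts = {name: int(func()) for name, func in strategies.items()}

# timeit.timeit returns the total number of seconds taken by `number` calls.
repeats = 3
for name, func in strategies.items():
    seconds = timeit.timeit(func, number=repeats)
    print(f"{name}: {seconds / repeats:.5f} s per call")
```

Checking that all the counts agree before comparing speeds guards against timing two computations that are not actually equivalent.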

Sorting pandas Series and DataFrames¶

One brief topic I want to cover before the midterm is using the sort_values method.

What songs have the 10 highest “Valence” levels in the Spotify dataset? Solve this two ways.

By sorting a pandas Series, getting its index, and then getting the sub-DataFrame.

df["Valence"].sort_values()

   0.0320
  0.0360
   0.0363
   0.0376
  0.0391
         ...  
      NaN
      NaN
      NaN
     NaN
     NaN
Name: Valence, Length: 1556, dtype: float64

df["Valence"].sort_values(ascending=False)

884     0.979
1408    0.977
677     0.971
1096    0.968
1230    0.966
        ...  
750       NaN
784       NaN
876       NaN
1140      NaN
1538      NaN
Name: Valence, Length: 1556, dtype: float64
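Notice that the missing values appear at the end in both the ascending and the descending sort. The following sketch reproduces that behavior on a made-up Series and shows the na_position keyword argument of sort_values, which controls where the missing values go. The Series v and its values are hypothetical.

```python
import numpy as np
import pandas as pd

v = pd.Series([0.5, np.nan, 0.9, 0.1])  # hypothetical values with one missing entry

ascending = v.sort_values()                     # NaN is placed last by default
descending = v.sort_values(ascending=False)     # NaN is still placed last
nan_first = v.sort_values(na_position="first")  # ask for NaN to be placed first instead

print(list(ascending.index))
print(list(descending.index))
print(list(nan_first.index))
```

Because the missing values stay at the end, taking the first 10 entries of the descending sort never picks up a NaN as long as at least 10 values are present.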

df["Valence"].sort_values(ascending=False).index

Int64Index([ 884, 1408,  677, 1096, 1230,  463,  130, 1390,  512,  627,
            ...
             163,  464,  530,  636,  654,  750,  784,  876, 1140, 1538],
           dtype='int64', length=1556)

df["Valence"].sort_values(ascending=False).index[:10]

Int64Index([884, 1408, 677, 1096, 1230, 463, 130, 1390, 512, 627], dtype='int64')

df["Valence"].sort_values(ascending=False)[:10].index

Int64Index([884, 1408, 677, 1096, 1230, 463, 130, 1390, 512, 627], dtype='int64')

The following code does not give the desired result, because it keeps only the first 10 rows before sorting, so none of the other rows are considered during the sorting.

df["Valence"][:10].sort_values(ascending=False).index

Int64Index([9, 4, 5, 6, 2, 3, 0, 1, 8, 7], dtype='int64')
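The effect of the order of operations is easy to see on a very small example. This sketch uses a made-up Series named valence, and it writes the positional slice explicitly as .iloc[:2], which selects the same rows as the [:2] style used in the lecture code.

```python
import pandas as pd

# Hypothetical values; the index labels are the default 0, 1, 2, 3, 4.
valence = pd.Series([0.2, 0.9, 0.5, 0.7, 0.95])

# Sort all the values first, then keep the top two.
top_two = valence.sort_values(ascending=False).iloc[:2].index

# Keep the first two rows first, then sort only those two rows.
first_two_sorted = valence.iloc[:2].sort_values(ascending=False).index

print(list(top_two))           # labels of the two largest values overall
print(list(first_two_sorted))  # labels 0 and 1 only, reordered
```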

The expression df["Valence"].sort_values(ascending=False)[:10].index is an Index of row labels, so it is natural to put it inside of df.loc.

df.loc[df["Valence"].sort_values(ascending=False)[:10].index]

	Index	Highest Charting Position	Number of Times Charted	Week of Highest Charting	Song Name	Streams	Artist	Artist Followers	Song ID	Genre	...	Danceability	Energy	Loudness	Speechiness	Acousticness	Liveness	Tempo	Duration (ms)	Valence	Chord
884	885	148	1	2020-09-18--2020-09-25	September	5,329,256	Earth, Wind & Fire	3008916.0	2grjqo0Frpf2okIBiifQKs	['disco', 'funk', 'jazz funk', 'motown', 'quie...	...	0.697	0.832	-7.264	0.0298	0.1680	0.2690	125.926	215093.0	0.979	A
1408	1409	85	1	2020-02-14--2020-02-21	Running Over (feat. Lil Dicky)	7,493,188	Justin Bieber	48544923.0	75nKBP8jQu681pTNCtrEnn	['canadian pop', 'pop', 'post-teen pop']	...	0.774	0.603	-7.319	0.0591	0.4380	0.0869	149.982	179627.0	0.977	B
677	678	149	1	2020-12-18--2020-12-25	Little Saint Nick - 1991 Remix	7,301,381	The Beach Boys	1251372.0	63Lk6VuXdj7S58R3wLdv9r	[]	...	0.602	0.553	-9.336	0.0328	0.1080	0.0512	130.594	118840.0	0.971	B
1096	1097	129	4	2020-06-05--2020-06-12	Na Raba Toma Tapão	4,396,629	Niack	352402.0	0AGS6ZRgzobrazmCi6pYMe	['funk carioca']	...	0.962	0.787	1.509	0.0554	0.6660	0.1760	130.003	165231.0	0.968	D#/Eb
1230	1231	102	1	2020-04-17--2020-04-24	JUMP (feat. YoungBoy Never Broke Again)	6,033,348	DaBaby	7601122.0	0oT9ElXYSxvnOOagP9efDq	['north carolina hip hop', 'rap']	...	0.896	0.720	-6.262	0.3550	0.1690	0.2520	140.100	212093.0	0.966	C
463	464	45	18	2021-01-01--2021-01-08	BEBÉ	4,967,348	Camilo, El Alfa	10580764.0	7D7EH7MGyNHWSkqrszerI1	['colombian pop', 'reggaeton colombiano']	...	0.862	0.720	-4.048	0.0379	0.4870	0.0604	129.972	198707.0	0.965	E
130	131	131	3	2021-07-23--2021-07-30	Aquelas Coisas	6,012,839	João Gomes	409173.0	0FqVtQxRD3HsPltldG5v5M	[]	...	0.682	0.873	-4.163	0.0449	0.4020	0.0946	150.006	147072.0	0.964	F#/Gb
1390	1391	154	4	2020-02-21--2020-02-28	Tudo Ok	5,510,844	Thiaguinho MT, Mila, JS o Mão de Ouro	28017.0	4HUZBG98TYbxSR9V1V2DWS	['brega funk', 'funk carioca']	...	0.814	0.755	-6.164	0.0942	0.2390	0.3060	79.976	178500.0	0.963	B
512	513	24	25	2020-11-06--2020-11-13	Se Te Nota (with Guaynaa)	5,168,240	Lele Pons	786461.0	11EnQRgRMJwMAesfkB5pnu	['latin pop', 'viral pop']	...	0.905	0.686	-3.152	0.0664	0.0907	0.2660	103.013	155825.0	0.963	C
627	628	14	6	2020-12-18--2020-12-25	Feliz Navidad	11,664,490	José Feliciano	239129.0	0oPdaY4dXtc3ZsaG17V972	['latin pop', 'puerto rican pop']	...	0.513	0.831	-9.004	0.0383	0.5500	0.3360	148.837	182067.0	0.963	D

10 rows × 23 columns

An easier way to accomplish the same thing is to sort the whole DataFrame at once. So here we are using a DataFrame’s sort_values method, rather than a Series sort_values method.

By sorting the whole DataFrame.

df.sort_values("Valence", ascending=False)[:10]

	Index	Highest Charting Position	Number of Times Charted	Week of Highest Charting	Song Name	Streams	Artist	Artist Followers	Song ID	Genre	...	Danceability	Energy	Loudness	Speechiness	Acousticness	Liveness	Tempo	Duration (ms)	Valence	Chord
884	885	148	1	2020-09-18--2020-09-25	September	5,329,256	Earth, Wind & Fire	3008916.0	2grjqo0Frpf2okIBiifQKs	['disco', 'funk', 'jazz funk', 'motown', 'quie...	...	0.697	0.832	-7.264	0.0298	0.1680	0.2690	125.926	215093.0	0.979	A
1408	1409	85	1	2020-02-14--2020-02-21	Running Over (feat. Lil Dicky)	7,493,188	Justin Bieber	48544923.0	75nKBP8jQu681pTNCtrEnn	['canadian pop', 'pop', 'post-teen pop']	...	0.774	0.603	-7.319	0.0591	0.4380	0.0869	149.982	179627.0	0.977	B
677	678	149	1	2020-12-18--2020-12-25	Little Saint Nick - 1991 Remix	7,301,381	The Beach Boys	1251372.0	63Lk6VuXdj7S58R3wLdv9r	[]	...	0.602	0.553	-9.336	0.0328	0.1080	0.0512	130.594	118840.0	0.971	B
1096	1097	129	4	2020-06-05--2020-06-12	Na Raba Toma Tapão	4,396,629	Niack	352402.0	0AGS6ZRgzobrazmCi6pYMe	['funk carioca']	...	0.962	0.787	1.509	0.0554	0.6660	0.1760	130.003	165231.0	0.968	D#/Eb
1230	1231	102	1	2020-04-17--2020-04-24	JUMP (feat. YoungBoy Never Broke Again)	6,033,348	DaBaby	7601122.0	0oT9ElXYSxvnOOagP9efDq	['north carolina hip hop', 'rap']	...	0.896	0.720	-6.262	0.3550	0.1690	0.2520	140.100	212093.0	0.966	C
463	464	45	18	2021-01-01--2021-01-08	BEBÉ	4,967,348	Camilo, El Alfa	10580764.0	7D7EH7MGyNHWSkqrszerI1	['colombian pop', 'reggaeton colombiano']	...	0.862	0.720	-4.048	0.0379	0.4870	0.0604	129.972	198707.0	0.965	E
130	131	131	3	2021-07-23--2021-07-30	Aquelas Coisas	6,012,839	João Gomes	409173.0	0FqVtQxRD3HsPltldG5v5M	[]	...	0.682	0.873	-4.163	0.0449	0.4020	0.0946	150.006	147072.0	0.964	F#/Gb
1390	1391	154	4	2020-02-21--2020-02-28	Tudo Ok	5,510,844	Thiaguinho MT, Mila, JS o Mão de Ouro	28017.0	4HUZBG98TYbxSR9V1V2DWS	['brega funk', 'funk carioca']	...	0.814	0.755	-6.164	0.0942	0.2390	0.3060	79.976	178500.0	0.963	B
512	513	24	25	2020-11-06--2020-11-13	Se Te Nota (with Guaynaa)	5,168,240	Lele Pons	786461.0	11EnQRgRMJwMAesfkB5pnu	['latin pop', 'viral pop']	...	0.905	0.686	-3.152	0.0664	0.0907	0.2660	103.013	155825.0	0.963	C
627	628	14	6	2020-12-18--2020-12-25	Feliz Navidad	11,664,490	José Feliciano	239129.0	0oPdaY4dXtc3ZsaG17V972	['latin pop', 'puerto rican pop']	...	0.513	0.831	-9.004	0.0383	0.5500	0.3360	148.837	182067.0	0.963	D

10 rows × 23 columns
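To confirm that the two approaches select the same rows in the same order, here is a minimal sketch on a made-up DataFrame. The name songs and its contents are hypothetical; only the column names imitate the Spotify dataset. The positional slices are written with .iloc[:2], which plays the role of [:10] in the lecture code.

```python
import pandas as pd

# Hypothetical miniature DataFrame.
songs = pd.DataFrame({
    "Song Name": ["a", "b", "c", "d"],
    "Valence": [0.3, 0.9, 0.6, 0.8],
})

# Approach 1: sort the Series, take the labels of the top 2, then use loc.
labels = songs["Valence"].sort_values(ascending=False).iloc[:2].index
via_series = songs.loc[labels]

# Approach 2: sort the whole DataFrame by the column, then slice.
via_frame = songs.sort_values("Valence", ascending=False).iloc[:2]

print(via_series.equals(via_frame))
```

In this toy example there are no ties in the "Valence" column; with tied values the order among the tied rows could depend on the sorting algorithm.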

df.columns

Index(['Index', 'Highest Charting Position', 'Number of Times Charted',
       'Week of Highest Charting', 'Song Name', 'Streams', 'Artist',
       'Artist Followers', 'Song ID', 'Genre', 'Release Date', 'Weeks Charted',
       'Popularity', 'Danceability', 'Energy', 'Loudness', 'Speechiness',
       'Acousticness', 'Liveness', 'Tempo', 'Duration (ms)', 'Valence',
       'Chord'],
      dtype='object')

Including the month¶

Add a column to df with the name “Month” which contains an integer from 1 to 12 indicating the month of the “Week of Highest Charting”. Use the following strategy.

Drop all rows containing missing values.
Using map and a lambda function, get the first 10 characters in each string from the “Week of Highest Charting” column.
Use .dt.month and pd.to_datetime to find the numeric month.

(There are probably easier ways, since we already have the numeric month in the original string, but this method generalizes nicely if we want something like the month name instead of the month number.)

# Alternate version: df = df.dropna()
df.dropna(inplace=True)

df.shape

(1545, 23)
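The difference between the two versions of dropna mentioned in the comment can be sketched on a made-up DataFrame. The names toy, cleaned and toy_copy are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value, in row 1.
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", "c"]})

# By default, dropna returns a new DataFrame and leaves `toy` unchanged.
cleaned = toy.dropna()

# With inplace=True, the DataFrame itself is modified and the method returns None.
toy_copy = toy.copy()
result = toy_copy.dropna(inplace=True)

print(toy.shape, cleaned.shape, toy_copy.shape, result)
print(list(cleaned.index))  # the remaining rows keep their original labels
```

The row labels are not renumbered after dropping rows, which is why the Series outputs later in this section have length 1545 while the largest label is still 1555.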

df["Week of Highest Charting"]

     2021-07-23--2021-07-30
     2021-07-23--2021-07-30
     2021-06-25--2021-07-02
     2021-07-02--2021-07-09
     2021-07-23--2021-07-30
                 ...          
  2019-12-27--2020-01-03
  2019-12-27--2020-01-03
  2019-12-27--2020-01-03
  2019-12-27--2020-01-03
  2019-12-27--2020-01-03
Name: Week of Highest Charting, Length: 1545, dtype: object

Often, if we want to do something to each entry in a pandas Series, we can just perform that operation on the whole Series, and it will automatically get applied elementwise. For example, if s is a pandas Series, then s+2 will add 2 to each of the entries in s.

That does not work for slicing though. For example, in the following, we are keeping only the first 10 rows, rather than keeping the first 10 characters in each entry.

df["Week of Highest Charting"][:10]

  2021-07-23--2021-07-30
  2021-07-23--2021-07-30
  2021-06-25--2021-07-02
  2021-07-02--2021-07-09
  2021-07-23--2021-07-30
  2021-05-07--2021-05-14
  2021-05-14--2021-05-21
  2021-06-18--2021-06-25
  2021-06-18--2021-06-25
  2021-07-02--2021-07-09
Name: Week of Highest Charting, dtype: object

One natural way to get the first 10 characters of each entry is to use map together with a lambda function.

df["Week of Highest Charting"].map(lambda x: x[:10])

     2021-07-23
     2021-07-23
     2021-06-25
     2021-07-02
     2021-07-23
           ...    
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
Name: Week of Highest Charting, Length: 1545, dtype: object
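The contrast between slicing the rows and slicing each string can be sketched on a two-entry Series. The Series weeks is made up, using the same string format as the column. The last line uses the pandas .str accessor, which is an alternative not used in this lecture and is shown here only for comparison.

```python
import pandas as pd

# Hypothetical Series in the same format as the "Week of Highest Charting" column.
weeks = pd.Series(["2021-07-23--2021-07-30", "2019-12-27--2020-01-03"])

rows = weeks[:1]                       # slices the Series: keeps the first row, whole string
via_map = weeks.map(lambda x: x[:10])  # slices each string: first 10 characters per entry
via_str = weeks.str[:10]               # same result using the .str accessor

print(list(rows))
print(list(via_map))
print(list(via_str))
```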

Reminder on lambda functions. lambda provides a quick way to define a function. For example, here we define the squaring function.

f = lambda x: x**2

f(3)

Here is an equivalent way to get the first 10 characters in each entry, but it’s definitely less elegant.

# less elegant
def first_ten(x):
    return x[:10]

df["Week of Highest Charting"].map(first_ten)

     2021-07-23
     2021-07-23
     2021-06-25
     2021-07-02
     2021-07-23
           ...    
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
Name: Week of Highest Charting, Length: 1545, dtype: object

temp_series = df["Week of Highest Charting"].map(lambda x: x[:10])

temp_series

     2021-07-23
     2021-07-23
     2021-06-25
     2021-07-02
     2021-07-23
           ...    
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
  2019-12-27
Name: Week of Highest Charting, Length: 1545, dtype: object

This temp_series is the sort of pandas Series which can be converted into the datetime dtype. Notice how the dtype of the previous Series was object (where you should think “string”), whereas in the following it is datetime64[ns].

pd.to_datetime(temp_series)

    2021-07-23
    2021-07-23
    2021-06-25
    2021-07-02
    2021-07-23
          ...    
 2019-12-27
 2019-12-27
 2019-12-27
 2019-12-27
 2019-12-27
Name: Week of Highest Charting, Length: 1545, dtype: datetime64[ns]

Once the Series is in the correct format, we can apply all sorts of useful methods, here using the dt accessor.

pd.to_datetime(temp_series).dt.month_name()

         July
         July
         June
         July
         July
          ...   
  December
  December
  December
  December
  December
Name: Week of Highest Charting, Length: 1545, dtype: object

Here we add a new column to the DataFrame containing the numerical month.

df["Month"] = pd.to_datetime(temp_series).dt.month

df

	Index	Highest Charting Position	Number of Times Charted	Week of Highest Charting	Song Name	Streams	Artist	Artist Followers	Song ID	Genre	...	Energy	Loudness	Speechiness	Acousticness	Liveness	Tempo	Duration (ms)	Valence	Chord	Month
0	1	1	8	2021-07-23--2021-07-30	Beggin'	48,633,449	Måneskin	3377762.0	3Wrjm47oTz2sjIgck11l5e	['indie rock italiano', 'italian pop']	...	0.800	-4.808	0.0504	0.12700	0.3590	134.002	211560.0	0.589	B	7
1	2	2	3	2021-07-23--2021-07-30	STAY (with Justin Bieber)	47,248,719	The Kid LAROI	2230022.0	5HCyWlXZPP0y6Gqq8TgA20	['australian hip hop']	...	0.764	-5.484	0.0483	0.03830	0.1030	169.928	141806.0	0.478	C#/Db	7
2	3	1	11	2021-06-25--2021-07-02	good 4 u	40,162,559	Olivia Rodrigo	6266514.0	4ZtFanR9U6ndgddUvNcjcG	['pop']	...	0.664	-5.044	0.1540	0.33500	0.0849	166.928	178147.0	0.688	A	6
3	4	3	5	2021-07-02--2021-07-09	Bad Habits	37,799,456	Ed Sheeran	83293380.0	6PQ88X9TkUIAUIZJHW2upE	['pop', 'uk pop']	...	0.897	-3.712	0.0348	0.04690	0.3640	126.026	231041.0	0.591	B	7
4	5	5	1	2021-07-23--2021-07-30	INDUSTRY BABY (feat. Jack Harlow)	33,948,454	Lil Nas X	5473565.0	27NovPIUIRrOZoCHxABJwK	['lgbtq+ hip hop', 'pop rap']	...	0.704	-7.409	0.0615	0.02030	0.0501	149.995	212000.0	0.894	D#/Eb	7
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1551	1552	195	1	2019-12-27--2020-01-03	New Rules	4,630,675	Dua Lipa	27167675.0	2ekn2ttSfGqwhhate0LSR0	['dance pop', 'pop', 'uk pop']	...	0.700	-6.021	0.0694	0.00261	0.1530	116.073	209320.0	0.608	A	12
1552	1553	196	1	2019-12-27--2020-01-03	Cheirosa - Ao Vivo	4,623,030	Jorge & Mateus	15019109.0	2PWjKmjyTZeDpmOUa3a5da	['sertanejo', 'sertanejo universitario']	...	0.870	-3.123	0.0851	0.24000	0.3330	152.370	181930.0	0.714	B	12
1553	1554	197	1	2019-12-27--2020-01-03	Havana (feat. Young Thug)	4,620,876	Camila Cabello	22698747.0	1rfofaqEpACxVEHIZBJe6W	['dance pop', 'electropop', 'pop', 'post-teen ...	...	0.523	-4.333	0.0300	0.18400	0.1320	104.988	217307.0	0.394	D	12
1554	1555	198	1	2019-12-27--2020-01-03	Surtada - Remix Brega Funk	4,607,385	Dadá Boladão, Tati Zaqui, OIK	208630.0	5F8ffc8KWKNawllr5WsW0r	['brega funk', 'funk carioca']	...	0.550	-7.026	0.0587	0.24900	0.1820	154.064	152784.0	0.881	F	12
1555	1556	199	1	2019-12-27--2020-01-03	Lover (Remix) [feat. Shawn Mendes]	4,595,450	Taylor Swift	42227614.0	3i9UVldZOE0aD0JnyfAZZ0	['pop', 'post-teen pop']	...	0.603	-7.176	0.0640	0.43300	0.0862	205.272	221307.0	0.422	G	12

1545 rows × 24 columns
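The three steps of the strategy can be collected into one short sketch. The DataFrame toy below is a made-up miniature version of the dataset, and the "Month Name" column is an extra added only for illustration. Dropping the missing values first matters because a missing entry is a float NaN, and slicing it with x[:10] would raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the dataset, with one missing value.
toy = pd.DataFrame({
    "Week of Highest Charting": [
        "2021-07-23--2021-07-30",
        np.nan,
        "2019-12-27--2020-01-03",
    ],
})

# Step 1: drop the rows containing missing values.
toy = toy.dropna()

# Step 2: keep the first 10 characters of each string (the starting date).
start_dates = toy["Week of Highest Charting"].map(lambda x: x[:10])

# Step 3: convert to the datetime dtype, then use the dt accessor.
dates = pd.to_datetime(start_dates)
toy["Month"] = dates.dt.month
toy["Month Name"] = dates.dt.month_name()

print(toy)
```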

UC Irvine Math 10 S22
