Week 4 Tuesday#

import pandas as pd
df = pd.read_csv("spotify_dataset.csv")
df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 MĂĄneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb

5 rows Ă— 23 columns

lambda function examples#

  • In the “Week of Highest Charting” column values, get the first date, for example, if the value is "2021-07-23--2021-07-30", you want to produce "2021-07-23". Put this in a new column called “Date”. Use map and a lambda function.

s = "2021-07-23--2021-07-30"
s.split("-")
['2021', '07', '23', '', '2021', '07', '30']
s.split("--")
['2021-07-23', '2021-07-30']
s.split("--")[0]
'2021-07-23'
df["Date"] = df["Week of Highest Charting"].map(lambda s: s.split("--")[0])
df["Date"]
0       2021-07-23
1       2021-07-23
2       2021-06-25
3       2021-07-02
4       2021-07-23
           ...    
1551    2019-12-27
1552    2019-12-27
1553    2019-12-27
1554    2019-12-27
1555    2019-12-27
Name: Date, Length: 1556, dtype: object
  • Convert the “Date” column to a datetime data type using the pandas function to_datetime.

df["Date"] = pd.to_datetime(df["Date"])
  • Which song has the most genres listed?

Notice: the “Genre” entries” are currently strings and not lists. Use the literal_eval function from the ast module. Some values don’t work, so only apply this to strings starting with "[", and use not-a-number from NumPy for the other values.

df.loc[0, "Genre"]
"['indie rock italiano', 'italian pop']"
df.loc[0, "Genre"][0]
'['
type(df.loc[0, "Genre"])
str
from ast import literal_eval # safer than `eval`
literal_eval(df.loc[0, "Genre"])
['indie rock italiano', 'italian pop']
type(literal_eval(df.loc[0, "Genre"]))
list
help(literal_eval)
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
    sets, booleans, and None.
df["Genre"].map(literal_eval)
Traceback (most recent call last):

  File /shared-libs/python3.9/py-core/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3378 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In [16], line 1
    df["Genre"].map(literal_eval)

  File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/series.py:3909 in map
    new_values = super()._map_values(arg, na_action=na_action)

  File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/base.py:937 in _map_values
    new_values = map_f(values, mapper)

  File pandas/_libs/lib.pyx:2467 in pandas._libs.lib.map_infer

  File /usr/local/lib/python3.9/ast.py:62 in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File /usr/local/lib/python3.9/ast.py:50 in parse
    return compile(source, filename, mode, flags,

  File <unknown>:1
    
    ^
SyntaxError: unexpected EOF while parsing
import numpy as np
df["Genre"].map(lambda x: literal_eval(x) if x.startswith("[") else np.nan)
0                [indie rock italiano, italian pop]
1                              [australian hip hop]
2                                             [pop]
3                                     [pop, uk pop]
4                         [lgbtq+ hip hop, pop rap]
                           ...                     
1551                       [dance pop, pop, uk pop]
1552           [sertanejo, sertanejo universitario]
1553    [dance pop, electropop, pop, post-teen pop]
1554                     [brega funk, funk carioca]
1555                           [pop, post-teen pop]
Name: Genre, Length: 1556, dtype: object
df["Genre Length"] = df["Genre"].map(lambda x: len(literal_eval(x)) if x.startswith("[") else np.nan)
df["Genre"]
0                  ['indie rock italiano', 'italian pop']
1                                  ['australian hip hop']
2                                                 ['pop']
3                                       ['pop', 'uk pop']
4                           ['lgbtq+ hip hop', 'pop rap']
                              ...                        
1551                       ['dance pop', 'pop', 'uk pop']
1552             ['sertanejo', 'sertanejo universitario']
1553    ['dance pop', 'electropop', 'pop', 'post-teen ...
1554                       ['brega funk', 'funk carioca']
1555                             ['pop', 'post-teen pop']
Name: Genre, Length: 1556, dtype: object
df["Genre Length"]
0       2.0
1       1.0
2       1.0
3       2.0
4       2.0
       ... 
1551    3.0
1552    2.0
1553    4.0
1554    2.0
1555    2.0
Name: Genre Length, Length: 1556, dtype: float64
df["Genre Length"].argmax()
44
df.iloc[44]
Index                                                                       45
Highest Charting Position                                                    9
Number of Times Charted                                                     39
Week of Highest Charting                                2021-02-26--2021-03-05
Song Name                                                         The Business
Streams                                                             10,739,770
Artist                                                                  Tiësto
Artist Followers                                                       5785065
Song ID                                                 6f3Slt0GbA2bPZlz0aIFXN
Genre                        ['big room', 'brostep', 'dance pop', 'dutch ed...
Release Date                                                        2020-09-16
Weeks Charted                2021-07-23--2021-07-30\n2021-07-16--2021-07-23...
Popularity                                                                  90
Danceability                                                             0.798
Energy                                                                    0.62
Loudness                                                                -7.079
Speechiness                                                              0.232
Acousticness                                                             0.414
Liveness                                                                 0.112
Tempo                                                                  120.031
Duration (ms)                                                           164000
Valence                                                                  0.235
Chord                                                                    G#/Ab
Date                                                       2021-02-26 00:00:00
Genre Lengths                                                             11.0
Genre Length                                                              11.0
Name: 44, dtype: object
df.iloc[44]["Genre"]
"['big room', 'brostep', 'dance pop', 'dutch edm', 'edm', 'electro house', 'house', 'pop dance', 'slap house', 'trance', 'tropical house']"
df.iloc[44]["Genre Lengths"]
11.0

Practice with list comprehension#

  • Using list comprehension, make a list of all the row labels for which the “Date” value is in April.

# Check: no repeated labels
df.index.is_unique
True
# approach 1
date_list = [ind for ind in df.index if df.loc[ind, "Date"].month == 4]
date_list[:5]
[13, 17, 29, 30, 36]
# approach 2
date_list2 = [ind for ind, mnth in df["Date"].dt.month.items() if mnth == 4]
date_list2[:5]
[13, 17, 29, 30, 36]
# non-list comprehension approach
date_index = df[df["Date"].dt.month == 4].index
date_index[:5]
Int64Index([13, 17, 29, 30, 36], dtype='int64')
  • What are all the genres occuring in the “Genre” column? Remove duplicates by using set comprehension instead of list comprehension (just replace the square brackets with curly brackets). You will have to remove missing values.

df["Genre Lists"] = df["Genre"].map(lambda x: literal_eval(x) if x.startswith("[") else np.nan)
genre_series = df["Genre Lists"][~df["Genre Lists"].isna()]
{genre for genre_list in genre_series for genre in genre_list}
{'a cappella',
 'acoustic pop',
 'adult standards',
 'afrofuturism',
 'afroswing',
 'albanian hip hop',
 'album rock',
 'alt z',
 'alternative metal',
 'alternative pop rock',
 'alternative r&b',
 'american folk revival',
 'argentine hip hop',
 'art pop',
 'art rock',
 'atl hip hop',
 'atl trap',
 'aussietronica',
 'australian dance',
 'australian hip hop',
 'australian pop',
 'australian psych',
 'australian rock',
 'banda',
 'basshall',
 'beatlesque',
 'bedroom pop',
 'bedroom soul',
 'belgian hip hop',
 'big room',
 'boston hip hop',
 'boy band',
 'brazilian hip hop',
 'brega funk',
 'brill building pop',
 'british soul',
 'britpop',
 'brooklyn drill',
 'brostep',
 'bubblegrunge',
 'cali rap',
 'canadian contemporary r&b',
 'canadian hip hop',
 'canadian latin',
 'canadian pop',
 'canadian trap',
 'celtic',
 'celtic punk',
 'celtic rock',
 'champeta',
 'chicago drill',
 'chicago indie',
 'chicago rap',
 'chicago soul',
 'chill r&b',
 'christlicher rap',
 'classic rock',
 'classic soul',
 'classic uk pop',
 'colombian pop',
 'comic',
 'conscious hip hop',
 'contemporary country',
 'corrido',
 'country',
 'country road',
 'country rock',
 'cubaton',
 'cumbia pop',
 'dance pop',
 'dance rock',
 'deep euro house',
 'deep german hip hop',
 'deep house',
 'deep underground hip hop',
 'dembow',
 'detroit hip hop',
 'dfw rap',
 'disco',
 'dmv rap',
 'dominican pop',
 'dream smp',
 'dreamo',
 'drill',
 'dutch edm',
 'dutch pop',
 'east coast hip hop',
 'easy listening',
 'eau claire indie',
 'edm',
 'electro house',
 'electro latino',
 'electronic trap',
 'electropop',
 'emo rap',
 'escape room',
 'etherpop',
 'eurodance',
 'europop',
 'eurovision',
 'florida rap',
 'folk punk',
 'folk rock',
 'folk-pop',
 'folktronica',
 'forro',
 'francoton',
 'frauenrap',
 'french hip hop',
 'funk',
 'funk 150 bpm',
 'funk bh',
 'funk carioca',
 'funk ostentacao',
 'funk paulista',
 'funk pop',
 'gangster rap',
 'garage rock',
 'gauze pop',
 'german alternative rap',
 'german cloud rap',
 'german dance',
 'german drill',
 'german hip hop',
 'german pop',
 'german techno',
 'german trance',
 'german trap',
 'german underground rap',
 'girl group',
 'glam rock',
 'grime',
 'grunge',
 'hamburg hip hop',
 'hard rock',
 'hardcore hip hop',
 'heartland rock',
 'hip hop',
 'hip pop',
 'hollywood',
 'house',
 'houston rap',
 'icelandic pop',
 'indie cafe pop',
 'indie pop',
 'indie pop rap',
 'indie poptimism',
 'indie r&b',
 'indie rock italiano',
 'indie rockism',
 'indie surf',
 'indietronica',
 'indonesian pop',
 'irish folk',
 'irish singer-songwriter',
 'italian adult pop',
 'italian hip hop',
 'italian indie pop',
 'italian pop',
 'italian pop rock',
 'jawaiian',
 'jazz funk',
 'jazz pop',
 'k-pop',
 'k-pop boy group',
 'k-pop girl group',
 'k-rap',
 'kentucky hip hop',
 'latin',
 'latin hip hop',
 'latin pop',
 'latin viral pop',
 'lgbtq+ hip hop',
 'london rap',
 'lounge',
 'madchester',
 'mariachi',
 'melanesian pop',
 'mellow gold',
 'melodic metalcore',
 'melodic rap',
 'meme rap',
 'memphis hip hop',
 'metalcore',
 'metropopolis',
 'mexican hip hop',
 'miami hip hop',
 'minnesota hip hop',
 'modern alternative rock',
 'modern country rock',
 'modern indie pop',
 'modern rock',
 'moombahton',
 'motown',
 'musical advocacy',
 'neo mellow',
 'neo soul',
 'neo-psychedelic',
 'new french touch',
 'new orleans rap',
 'new romantic',
 'new wave',
 'new wave pop',
 'norteno',
 'north carolina hip hop',
 'norwegian pop',
 'nouvelle chanson francaise',
 'nu metal',
 'nuevo regional mexicano',
 'nyc pop',
 'nyc rap',
 'nz pop',
 'oakland hip hop',
 'ohio hip hop',
 'old school rap francais',
 'oulu metal',
 'pacific islands pop',
 'pagode baiano',
 'panamanian pop',
 'permanent wave',
 'perreo',
 'philly rap',
 'piano rock',
 'piseiro',
 'pittsburgh rap',
 'plugg',
 'png pop',
 'pop',
 'pop argentino',
 'pop dance',
 'pop edm',
 'pop house',
 'pop nacional',
 'pop r&b',
 'pop rap',
 'pop reggaeton',
 'pop rock',
 'pop soul',
 'pop urbaine',
 'pop venezolano',
 'post-grunge',
 'post-teen pop',
 'progressive electro house',
 'progressive house',
 'puerto rican pop',
 'punk',
 'queens hip hop',
 'quiet storm',
 'r&b',
 'r&b brasileiro',
 'r&b en espanol',
 'ranchera',
 'rap',
 'rap belge',
 'rap cearense',
 'rap conciencia',
 'rap conscient',
 'rap dominicano',
 'rap francais',
 'rap latina',
 'rap marseille',
 'rap metal',
 'rave funk',
 'reggaeton',
 'reggaeton colombiano',
 'reggaeton flow',
 'regional mexican',
 'rhode island rap',
 'rock',
 'rock-and-roll',
 'rockabilly',
 'sad rap',
 'scandipop',
 'seattle hip hop',
 'sertanejo',
 'sertanejo pop',
 'sertanejo universitario',
 'sheffield indie',
 'shiver pop',
 'show tunes',
 'singer-songwriter',
 'slap house',
 'social media pop',
 'soft rock',
 'soul',
 'soundtrack',
 'south african house',
 'southern hip hop',
 'southern soul',
 'sudanese pop',
 'sunnlensk tonlist',
 'surf punk',
 'swedish pop',
 'swing',
 'synthpop',
 'talent show',
 'tekk',
 'tennessee hip hop',
 'torch song',
 'toronto rap',
 'trance',
 'trancecore',
 'trap',
 'trap argentino',
 'trap boricua',
 'trap brasileiro',
 'trap chileno',
 'trap italiana',
 'trap latino',
 'trap queen',
 'trap soul',
 'trap triste',
 'tropical house',
 'turkish trap',
 'uk alternative hip hop',
 'uk alternative pop',
 'uk dance',
 'uk funky',
 'uk hip hop',
 'uk metalcore',
 'uk pop',
 'underground hip hop',
 'urban contemporary',
 'urbano espanol',
 'vancouver indie',
 'vapor trap',
 'vegas indie',
 'venezuelan hip hop',
 'viral pop',
 'viral rap',
 'vocal jazz',
 'weirdcore',
 'west coast rap',
 'yacht rock'}

Practice with axis#

Reminder:

The axis keyword argument indicates which axis labels are (potentially) changing. If the row labels could change, then we use axis=0. If the column labels could change, then we use axis=1.

df1 = pd.DataFrame({"A": [2, 3, 5], "B": [7, 13, -11]})
df2 = pd.DataFrame({"A": [-1, 0, 5], "B": [2, 2, 2]})
df1
A B
0 2 7
1 3 13
2 5 -11

Can you predict what the following will do?

  • df1.sum(axis=0)

df1.sum(axis=0)
A    10
B     9
dtype: int64
  • df1.min(axis=1)

df1.min(axis=1)
0     2
1     3
2   -11
dtype: int64
  • pd.concat([df1, df2], axis=0)

pd.concat([df1, df2], axis=0)
A B
0 2 7
1 3 13
2 5 -11
0 -1 2
1 0 2
2 5 2
  • pd.concat([df1, df2], axis=1)

pd.concat([df1, df2], axis=1)
A B A B
0 2 7 -1 2
1 3 13 0 2
2 5 -11 5 2

Define df3 = pd.concat([df1, df2], axis=0). What are the following?

  • df3.apply(lambda ser: ser.idxmin(), axis=0)

  • df3.apply(lambda ser: ser.argmin(), axis=0) (I was surprised that df3.argmin(axis=0) didn’t work as an abbreviation for this last one.)

df3 = pd.concat([df1, df2], axis=0)
df3
A B
0 2 7
1 3 13
2 5 -11
0 -1 2
1 0 2
2 5 2
df3.apply(lambda ser: ser.idxmin(), axis=0)
A    0
B    2
dtype: int64
df3.apply(lambda ser: ser.argmin(), axis=0)
A    3
B    2
dtype: int64
pd.concat([df1, df2], axis=0).idxmin(axis=0)
A    0
B    2
dtype: int64
pd.concat([df1, df2], axis=1)
A B A B
0 2 7 -1 2
1 3 13 0 2
2 5 -11 5 2