More practice with the Spotify dataset

Recording of lecture from 1/26/2022

The best way to import this dataset is to use

pd.read_csv("spotify_dataset.csv", na_values=" ")

That is what we did last time. But it’s also good practice to try making the conversions ourselves. This will give us a chance to try using two important pandas DataFrame methods:

  • apply

  • applymap

These two methods fit into the same family as the pandas Series method

  • map
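Before using these on the real data, here is a small sketch on a made-up two-column DataFrame showing what each function receives. Series.map and applymap call the function once per entry, while apply calls it once per column and hands it a whole Series. One version detail is an assumption worth flagging: pandas 2.1 renamed DataFrame.applymap to DataFrame.map (applymap is deprecated there), so the sketch uses whichever name exists.

```python
import pandas as pd

# Made-up stand-in for two object-dtype columns of the Spotify data.
df = pd.DataFrame({"Energy": ["0.8", " ", "0.664"],
                   "Chord": ["B", "C#/Db", "A"]}, dtype=object)

# Series.map: the function is called on each entry of ONE column.
chord_lengths = df["Chord"].map(len)

# DataFrame.applymap: the function is called on each entry of the WHOLE DataFrame.
# (pandas >= 2.1 calls this DataFrame.map, so pick whichever exists.)
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
all_lengths = elementwise(len)

# DataFrame.apply: the function is called once per column (axis=0) and receives a Series.
what_apply_sees = df.apply(lambda col: type(col).__name__)

print(chord_lengths.tolist())     # [1, 5, 1]
print(all_lengths)
print(what_apply_sees.tolist())   # ['Series', 'Series']
```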

import numpy as np
import pandas as pd
import altair as alt
# Leaving out the useful na_values keyword argument.
# We will have to do some cleaning of this dataset by hand.
df = pd.read_csv("../data/spotify_dataset.csv")
df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb

5 rows × 23 columns

df.dtypes
Index                         int64
Highest Charting Position     int64
Number of Times Charted       int64
Week of Highest Charting     object
Song Name                    object
Streams                      object
Artist                       object
Artist Followers             object
Song ID                      object
Genre                        object
Release Date                 object
Weeks Charted                object
Popularity                   object
Danceability                 object
Energy                       object
Loudness                     object
Speechiness                  object
Acousticness                 object
Liveness                     object
Tempo                        object
Duration (ms)                object
Valence                      object
Chord                        object
dtype: object
pd.to_numeric(df["Energy"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " "

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_5565/287173585.py in <module>
----> 1 pd.to_numeric(df["Energy"])

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    181         coerce_numeric = errors not in ("ignore", "raise")
    182         try:
--> 183             values, _ = lib.maybe_convert_numeric(
    184                 values, set(), coerce_numeric=coerce_numeric
    185             )

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " " at position 35
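The traceback pins the problem on a single blank space at position 35. A small reproduction, using a made-up three-row CSV read from a string with io.StringIO, shows why one blank entry keeps the whole column from being numeric, and how the na_values=" " argument from the recommended import fixes it.

```python
import io
import pandas as pd

# Made-up CSV: the second Energy value is a single blank space,
# mimicking the entry at position 35 of the real file.
csv_text = "Song,Energy\nA,0.8\nB, \nC,0.664\n"

raw = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text), na_values=" ")

# Without na_values the " " entry survives as a string, so Energy is not numeric.
print(raw["Energy"].dtype, repr(raw["Energy"][1]))
# With na_values=" " that entry becomes NaN and the column parses as floats.
print(clean["Energy"].dtype, clean["Energy"].isna().sum())
```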
df.replace(" ",np.nan)
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1551 1552 195 1 2019-12-27--2020-01-03 New Rules 4,630,675 Dua Lipa 27167675 2ekn2ttSfGqwhhate0LSR0 ['dance pop', 'pop', 'uk pop'] ... 0.762 0.7 -6.021 0.0694 0.00261 0.153 116.073 209320 0.608 A
1552 1553 196 1 2019-12-27--2020-01-03 Cheirosa - Ao Vivo 4,623,030 Jorge & Mateus 15019109 2PWjKmjyTZeDpmOUa3a5da ['sertanejo', 'sertanejo universitario'] ... 0.528 0.87 -3.123 0.0851 0.24 0.333 152.37 181930 0.714 B
1553 1554 197 1 2019-12-27--2020-01-03 Havana (feat. Young Thug) 4,620,876 Camila Cabello 22698747 1rfofaqEpACxVEHIZBJe6W ['dance pop', 'electropop', 'pop', 'post-teen ... ... 0.765 0.523 -4.333 0.03 0.184 0.132 104.988 217307 0.394 D
1554 1555 198 1 2019-12-27--2020-01-03 Surtada - Remix Brega Funk 4,607,385 Dadá Boladão, Tati Zaqui, OIK 208630 5F8ffc8KWKNawllr5WsW0r ['brega funk', 'funk carioca'] ... 0.832 0.55 -7.026 0.0587 0.249 0.182 154.064 152784 0.881 F
1555 1556 199 1 2019-12-27--2020-01-03 Lover (Remix) [feat. Shawn Mendes] 4,595,450 Taylor Swift 42227614 3i9UVldZOE0aD0JnyfAZZ0 ['pop', 'post-teen pop'] ... 0.448 0.603 -7.176 0.064 0.433 0.0862 205.272 221307 0.422 G

1556 rows × 23 columns

df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb

5 rows × 23 columns

df.replace(1,"Hello")
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 Hello Hello 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 Hello 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 Hello 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1551 1552 195 Hello 2019-12-27--2020-01-03 New Rules 4,630,675 Dua Lipa 27167675 2ekn2ttSfGqwhhate0LSR0 ['dance pop', 'pop', 'uk pop'] ... 0.762 0.7 -6.021 0.0694 0.00261 0.153 116.073 209320 0.608 A
1552 1553 196 Hello 2019-12-27--2020-01-03 Cheirosa - Ao Vivo 4,623,030 Jorge & Mateus 15019109 2PWjKmjyTZeDpmOUa3a5da ['sertanejo', 'sertanejo universitario'] ... 0.528 0.87 -3.123 0.0851 0.24 0.333 152.37 181930 0.714 B
1553 1554 197 Hello 2019-12-27--2020-01-03 Havana (feat. Young Thug) 4,620,876 Camila Cabello 22698747 1rfofaqEpACxVEHIZBJe6W ['dance pop', 'electropop', 'pop', 'post-teen ... ... 0.765 0.523 -4.333 0.03 0.184 0.132 104.988 217307 0.394 D
1554 1555 198 Hello 2019-12-27--2020-01-03 Surtada - Remix Brega Funk 4,607,385 Dadá Boladão, Tati Zaqui, OIK 208630 5F8ffc8KWKNawllr5WsW0r ['brega funk', 'funk carioca'] ... 0.832 0.55 -7.026 0.0587 0.249 0.182 154.064 152784 0.881 F
1555 1556 199 Hello 2019-12-27--2020-01-03 Lover (Remix) [feat. Shawn Mendes] 4,595,450 Taylor Swift 42227614 3i9UVldZOE0aD0JnyfAZZ0 ['pop', 'post-teen pop'] ... 0.448 0.603 -7.176 0.064 0.433 0.0862 205.272 221307 0.422 G

1556 rows × 23 columns

df.replace("100","Hello")
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1551 1552 195 1 2019-12-27--2020-01-03 New Rules 4,630,675 Dua Lipa 27167675 2ekn2ttSfGqwhhate0LSR0 ['dance pop', 'pop', 'uk pop'] ... 0.762 0.7 -6.021 0.0694 0.00261 0.153 116.073 209320 0.608 A
1552 1553 196 1 2019-12-27--2020-01-03 Cheirosa - Ao Vivo 4,623,030 Jorge & Mateus 15019109 2PWjKmjyTZeDpmOUa3a5da ['sertanejo', 'sertanejo universitario'] ... 0.528 0.87 -3.123 0.0851 0.24 0.333 152.37 181930 0.714 B
1553 1554 197 1 2019-12-27--2020-01-03 Havana (feat. Young Thug) 4,620,876 Camila Cabello 22698747 1rfofaqEpACxVEHIZBJe6W ['dance pop', 'electropop', 'pop', 'post-teen ... ... 0.765 0.523 -4.333 0.03 0.184 0.132 104.988 217307 0.394 D
1554 1555 198 1 2019-12-27--2020-01-03 Surtada - Remix Brega Funk 4,607,385 Dadá Boladão, Tati Zaqui, OIK 208630 5F8ffc8KWKNawllr5WsW0r ['brega funk', 'funk carioca'] ... 0.832 0.55 -7.026 0.0587 0.249 0.182 154.064 152784 0.881 F
1555 1556 199 1 2019-12-27--2020-01-03 Lover (Remix) [feat. Shawn Mendes] 4,595,450 Taylor Swift 42227614 3i9UVldZOE0aD0JnyfAZZ0 ['pop', 'post-teen pop'] ... 0.448 0.603 -7.176 0.064 0.433 0.0862 205.272 221307 0.422 G

1556 rows × 23 columns
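These cells show that replace compares each entry with the whole value passed in: replacing 1 changed only the integer entries equal to 1, and it does not search inside strings. It also returns a new DataFrame rather than changing df. A made-up three-row frame makes both points easy to check.

```python
import pandas as pd

# Made-up frame: an integer column and a column of strings.
df = pd.DataFrame({"Position": [1, 2, 11],
                   "Streams": ["1", "48,633,449", "100"]})

result = df.replace(1, "Hello")

# Only entries equal to the integer 1 are replaced: 11 is untouched, and so is
# the string "1", because "1" == 1 is False in Python.
print(result)
# replace returned a new DataFrame; df itself still holds the original values.
print(df)
```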

df = df.replace(" ",np.nan)

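We could just as well have used df.replace(" ",np.nan, inplace=True), which changes df directly and returns None. That is not the same as df = df.replace(" ",np.nan, inplace=True): since the method returns None, this would set df to None.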
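A quick check of the inplace behavior, on a made-up one-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Energy": ["0.8", " ", "0.664"]}, dtype=object)

returned = df.replace(" ", np.nan, inplace=True)

print(returned)                    # None
print(df["Energy"].isna().sum())   # 1: df itself was changed
```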

pd.to_numeric(df.Energy)
0       0.800
1       0.764
2       0.664
3       0.897
4       0.704
        ...  
1551    0.700
1552    0.870
1553    0.523
1554    0.550
1555    0.603
Name: Energy, Length: 1556, dtype: float64
# if s is a blank space, replace it with Not a Number
# otherwise leave s the same
def rep_blank(s):
    if s == " ":
        return np.nan
    else:
        return s
rep_blank(7)
7
rep_blank(" ")
nan
help(df.applymap)
Help on method applymap in module pandas.core.frame:

applymap(func: 'PythonFuncType', na_action: 'str | None' = None, **kwargs) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Apply a function to a Dataframe elementwise.
    
    This method applies a function that accepts and returns a scalar
    to every element of a DataFrame.
    
    Parameters
    ----------
    func : callable
        Python function, returns a single value from a single value.
    na_action : {None, 'ignore'}, default None
        If ‘ignore’, propagate NaN values, without passing them to func.
    
        .. versionadded:: 1.2
    
    **kwargs
        Additional keyword arguments to pass as keywords arguments to
        `func`.
    
        .. versionadded:: 1.3.0
    
    Returns
    -------
    DataFrame
        Transformed DataFrame.
    
    See Also
    --------
    DataFrame.apply : Apply a function along input axis of DataFrame.
    
    Examples
    --------
    >>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
    >>> df
           0      1
    0  1.000  2.120
    1  3.356  4.567
    
    >>> df.applymap(lambda x: len(str(x)))
       0  1
    0  3  4
    1  5  5
    
    Like Series.map, NA values can be ignored:
    
    >>> df_copy = df.copy()
    >>> df_copy.iloc[0, 0] = pd.NA
    >>> df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
          0  1
    0  <NA>  4
    1     5  5
    
    Note that a vectorized version of `func` often exists, which will
    be much faster. You could square each number elementwise.
    
    >>> df.applymap(lambda x: x**2)
               0          1
    0   1.000000   4.494400
    1  11.262736  20.857489
    
    But it's better to avoid applymap in that case.
    
    >>> df ** 2
               0          1
    0   1.000000   4.494400
    1  11.262736  20.857489
# Apply this function to every entry in the DataFrame
df = df.applymap(rep_blank)
pd.to_numeric(df["Energy"])
0       0.800
1       0.764
2       0.664
3       0.897
4       0.704
        ...  
1551    0.700
1552    0.870
1553    0.523
1554    0.550
1555    0.603
Name: Energy, Length: 1556, dtype: float64
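As a sanity check, the rep_blank route and the replace route should produce identical numeric columns. The sketch below runs both on made-up data. The only assumption beyond the lecture is the version detail that pandas 2.1 renamed DataFrame.applymap to DataFrame.map, so it uses whichever name exists.

```python
import numpy as np
import pandas as pd

def rep_blank(s):
    # if s is a blank space, replace it with Not a Number; otherwise leave s the same
    if s == " ":
        return np.nan
    else:
        return s

# Made-up values with blank-space entries, like the raw Spotify columns.
df = pd.DataFrame({"Energy": ["0.8", " ", "0.664"],
                   "Tempo": ["134.002", "169.928", " "]}, dtype=object)

# Elementwise: applymap in older pandas, DataFrame.map from pandas 2.1 on.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
via_function = elementwise(rep_blank).apply(pd.to_numeric)

# The same cleaning with replace, for comparison.
via_replace = df.replace(" ", np.nan).apply(pd.to_numeric)

print(via_function.equals(via_replace))
print(via_function.dtypes)
```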
# Our rep_blank function is not very Pythonic;
# there is a more concise way to write the same thing.
# First, reload the raw data so the blank spaces are back.
df = pd.read_csv("../data/spotify_dataset.csv")
pd.to_numeric(df["Energy"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " "

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_5565/287173585.py in <module>
----> 1 pd.to_numeric(df["Energy"])

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    181         coerce_numeric = errors not in ("ignore", "raise")
    182         try:
--> 183             values, _ = lib.maybe_convert_numeric(
    184                 values, set(), coerce_numeric=coerce_numeric
    185             )

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string " " at position 35
# if s is a blank space, replace it with Not a Number
# otherwise leave s the same
def rep_blank(s):
    if s == " ":
        return np.nan
    else:
        return s
df.applymap(lambda s: np.nan if s == " ")
  File "/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_5565/1990719709.py", line 1
    df.applymap(lambda s: np.nan if s == " ")
                                            ^
SyntaxError: invalid syntax

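The error happens because a conditional expression in Python must have an else branch, in the form x if condition else y.

map is a method for pandas Series: it applies a function to every entry of a Series.

applymap is a method for pandas DataFrames: it applies a function to every entry of a DataFrame.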
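Because Series.map and applymap take the same kind of one-entry function, the lambda with an else branch also works on a single column. The sketch below cleans a made-up Energy column with Series.map and compares the result with pd.to_numeric(..., errors="coerce"), an alternative that the lecture does not use, which turns any unparsable string into NaN.

```python
import numpy as np
import pandas as pd

energy = pd.Series(["0.8", " ", "0.664"], name="Energy", dtype=object)

# Series.map takes the same kind of one-entry function as applymap.
via_map = pd.to_numeric(energy.map(lambda s: np.nan if s == " " else s))

# Alternative, not used in the lecture: errors="coerce" turns ANY string that
# cannot be parsed into NaN. That is convenient, but it would also silently hide
# a genuinely malformed value such as "0.8.1", so look at what gets coerced first.
via_coerce = pd.to_numeric(energy, errors="coerce")

print(via_map.tolist())
print(via_map.equals(via_coerce))
```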

# more Pythonic
df.applymap(lambda s: np.nan if s == " " else s)
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1551 1552 195 1 2019-12-27--2020-01-03 New Rules 4,630,675 Dua Lipa 27167675 2ekn2ttSfGqwhhate0LSR0 ['dance pop', 'pop', 'uk pop'] ... 0.762 0.7 -6.021 0.0694 0.00261 0.153 116.073 209320 0.608 A
1552 1553 196 1 2019-12-27--2020-01-03 Cheirosa - Ao Vivo 4,623,030 Jorge & Mateus 15019109 2PWjKmjyTZeDpmOUa3a5da ['sertanejo', 'sertanejo universitario'] ... 0.528 0.87 -3.123 0.0851 0.24 0.333 152.37 181930 0.714 B
1553 1554 197 1 2019-12-27--2020-01-03 Havana (feat. Young Thug) 4,620,876 Camila Cabello 22698747 1rfofaqEpACxVEHIZBJe6W ['dance pop', 'electropop', 'pop', 'post-teen ... ... 0.765 0.523 -4.333 0.03 0.184 0.132 104.988 217307 0.394 D
1554 1555 198 1 2019-12-27--2020-01-03 Surtada - Remix Brega Funk 4,607,385 Dadá Boladão, Tati Zaqui, OIK 208630 5F8ffc8KWKNawllr5WsW0r ['brega funk', 'funk carioca'] ... 0.832 0.55 -7.026 0.0587 0.249 0.182 154.064 152784 0.881 F
1555 1556 199 1 2019-12-27--2020-01-03 Lover (Remix) [feat. Shawn Mendes] 4,595,450 Taylor Swift 42227614 3i9UVldZOE0aD0JnyfAZZ0 ['pop', 'post-teen pop'] ... 0.448 0.603 -7.176 0.064 0.433 0.0862 205.272 221307 0.422 G

1556 rows × 23 columns

df.applymap(lambda s: s if s != " " else np.nan)
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.8 -4.808 0.0504 0.127 0.359 134.002 211560 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.103 169.928 141806 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.154 0.335 0.0849 166.928 178147 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.364 126.026 231041 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000 0.894 D#/Eb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1551 1552 195 1 2019-12-27--2020-01-03 New Rules 4,630,675 Dua Lipa 27167675 2ekn2ttSfGqwhhate0LSR0 ['dance pop', 'pop', 'uk pop'] ... 0.762 0.7 -6.021 0.0694 0.00261 0.153 116.073 209320 0.608 A
1552 1553 196 1 2019-12-27--2020-01-03 Cheirosa - Ao Vivo 4,623,030 Jorge & Mateus 15019109 2PWjKmjyTZeDpmOUa3a5da ['sertanejo', 'sertanejo universitario'] ... 0.528 0.87 -3.123 0.0851 0.24 0.333 152.37 181930 0.714 B
1553 1554 197 1 2019-12-27--2020-01-03 Havana (feat. Young Thug) 4,620,876 Camila Cabello 22698747 1rfofaqEpACxVEHIZBJe6W ['dance pop', 'electropop', 'pop', 'post-teen ... ... 0.765 0.523 -4.333 0.03 0.184 0.132 104.988 217307 0.394 D
1554 1555 198 1 2019-12-27--2020-01-03 Surtada - Remix Brega Funk 4,607,385 Dadá Boladão, Tati Zaqui, OIK 208630 5F8ffc8KWKNawllr5WsW0r ['brega funk', 'funk carioca'] ... 0.832 0.55 -7.026 0.0587 0.249 0.182 154.064 152784 0.881 F
1555 1556 199 1 2019-12-27--2020-01-03 Lover (Remix) [feat. Shawn Mendes] 4,595,450 Taylor Swift 42227614 3i9UVldZOE0aD0JnyfAZZ0 ['pop', 'post-teen pop'] ... 0.448 0.603 -7.176 0.064 0.433 0.0862 205.272 221307 0.422 G

1556 rows × 23 columns

# more Pythonic
df = df.applymap(lambda s: np.nan if s == " " else s)
pd.to_numeric(df.Energy)
0       0.800
1       0.764
2       0.664
3       0.897
4       0.704
        ...  
1551    0.700
1552    0.870
1553    0.523
1554    0.550
1555    0.603
Name: Energy, Length: 1556, dtype: float64
df.dtypes
Index                         int64
Highest Charting Position     int64
Number of Times Charted       int64
Week of Highest Charting     object
Song Name                    object
Streams                      object
Artist                       object
Artist Followers             object
Song ID                      object
Genre                        object
Release Date                 object
Weeks Charted                object
Popularity                   object
Danceability                 object
Energy                       object
Loudness                     object
Speechiness                  object
Acousticness                 object
Liveness                     object
Tempo                        object
Duration (ms)                object
Valence                      object
Chord                        object
dtype: object
# How can we get the columns from Popularity to Valence (inclusive)?
df_sub = df.loc[:,"Popularity":"Valence"]
# applymap: the function's input is a single entry
# apply: the function's input is an entire column (axis=0) or an entire row (axis=1)
df_sub = df_sub.apply(pd.to_numeric,axis=0)
df_sub.dtypes
Popularity       float64
Danceability     float64
Energy           float64
Loudness         float64
Speechiness      float64
Acousticness     float64
Liveness         float64
Tempo            float64
Duration (ms)    float64
Valence          float64
dtype: object
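# Assigning with df[...] replaces these columns outright, so the float64 dtypes stick.
# (With newer pandas, df.loc[:,"Popularity":"Valence"] = df_sub may instead write the
# numbers into the existing object columns and leave their dtype as object.)
df[df_sub.columns] = df_sub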
df.dtypes
Index                          int64
Highest Charting Position      int64
Number of Times Charted        int64
Week of Highest Charting      object
Song Name                     object
Streams                       object
Artist                        object
Artist Followers              object
Song ID                       object
Genre                         object
Release Date                  object
Weeks Charted                 object
Popularity                   float64
Danceability                 float64
Energy                       float64
Loudness                     float64
Speechiness                  float64
Acousticness                 float64
Liveness                     float64
Tempo                        float64
Duration (ms)                float64
Valence                      float64
Chord                         object
dtype: object
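Two details from these cells are easy to miss: .loc label slicing includes both endpoints (so Valence is kept), and apply with axis=0 hands pd.to_numeric one whole column at a time. The sketch below checks both on a made-up five-column frame whose values are strings, as in the raw data; the Popularity values are invented. It writes the converted columns back with df[df_sub.columns] = df_sub, which replaces the columns so the numeric dtypes stick.

```python
import pandas as pd

# Made-up string columns standing in for Song Name, Popularity ... Valence, Chord.
df = pd.DataFrame({
    "Song Name": ["Beggin'", "STAY (with Justin Bieber)"],
    "Popularity": ["100", "99"],
    "Energy": ["0.8", "0.764"],
    "Valence": ["0.589", "0.478"],
    "Chord": ["B", "C#/Db"],
}, dtype=object)

# Label slicing with .loc includes BOTH endpoints, unlike iloc or list slicing.
df_sub = df.loc[:, "Popularity":"Valence"]
print(list(df_sub.columns))   # ['Popularity', 'Energy', 'Valence']

# apply with axis=0 calls pd.to_numeric on one whole column (a Series) at a time.
df_sub = df_sub.apply(pd.to_numeric, axis=0)

# Replace the original columns with the converted ones.
df[df_sub.columns] = df_sub
print(df.dtypes)
```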
pd.to_numeric(df.Streams)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "48,633,449"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_5565/2208871995.py in <module>
----> 1 pd.to_numeric(df.Streams)

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    181         coerce_numeric = errors not in ("ignore", "raise")
    182         try:
--> 183             values, _ = lib.maybe_convert_numeric(
    184                 values, set(), coerce_numeric=coerce_numeric
    185             )

~/miniconda3/envs/math11/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "48,633,449" at position 0
pd.to_numeric(df.Streams.map(lambda s: s.replace(",","")))
0       48633449
1       47248719
2       40162559
3       37799456
4       33948454
          ...   
1551     4630675
1552     4623030
1553     4620876
1554     4607385
1555     4595450
Name: Streams, Length: 1556, dtype: int64
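The lambda above is one of several ways to strip the commas. The sketch below compares it with two alternatives the lecture does not use: the vectorized Series.str.replace, and the thousands parameter of pd.read_csv, which handles the commas at import time much as na_values handles the blanks. The short Series and CSV are built from values shown above. One practical difference: the lambda needs every entry to be a string (a NaN would raise AttributeError), whereas .str.replace leaves NaN as NaN.

```python
import io
import pandas as pd

streams = pd.Series(["48,633,449", "47,248,719", "4,595,450"], name="Streams", dtype=object)

# 1. Series.map with the Python string method str.replace, as in the cell above.
via_map = pd.to_numeric(streams.map(lambda s: s.replace(",", "")))

# 2. The vectorized .str accessor; regex=False treats "," as a literal character.
via_str = pd.to_numeric(streams.str.replace(",", "", regex=False))

# 3. Let read_csv drop the separators while parsing (small made-up CSV).
csv_text = 'Streams\n"48,633,449"\n"47,248,719"\n"4,595,450"\n'
via_csv = pd.read_csv(io.StringIO(csv_text), thousands=",")["Streams"]

print(via_map.tolist())   # [48633449, 47248719, 4595450]
```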
df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.800 -4.808 0.0504 0.1270 0.3590 134.002 211560.0 0.589 B
1 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.1030 169.928 141806.0 0.478 C#/Db
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.1540 0.3350 0.0849 166.928 178147.0 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.3640 126.026 231041.0 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000.0 0.894 D#/Eb

5 rows × 23 columns

# Swap the rows at positions 1 and 3 (the 2nd and 4th rows)
temp = df.iloc[1]
type(temp)
pandas.core.series.Series
df.iloc[1] = df.iloc[3]
df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.800 -4.808 0.0504 0.1270 0.3590 134.002 211560.0 0.589 B
1 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.3640 126.026 231041.0 0.591 B
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.1540 0.3350 0.0849 166.928 178147.0 0.688 A
3 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.3640 126.026 231041.0 0.591 B
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000.0 0.894 D#/Eb

5 rows × 23 columns

df.iloc[3] = temp
df.head()
Index Highest Charting Position Number of Times Charted Week of Highest Charting Song Name Streams Artist Artist Followers Song ID Genre ... Danceability Energy Loudness Speechiness Acousticness Liveness Tempo Duration (ms) Valence Chord
0 1 1 8 2021-07-23--2021-07-30 Beggin' 48,633,449 Måneskin 3377762 3Wrjm47oTz2sjIgck11l5e ['indie rock italiano', 'italian pop'] ... 0.714 0.800 -4.808 0.0504 0.1270 0.3590 134.002 211560.0 0.589 B
1 4 3 5 2021-07-02--2021-07-09 Bad Habits 37,799,456 Ed Sheeran 83293380 6PQ88X9TkUIAUIZJHW2upE ['pop', 'uk pop'] ... 0.808 0.897 -3.712 0.0348 0.0469 0.3640 126.026 231041.0 0.591 B
2 3 1 11 2021-06-25--2021-07-02 good 4 u 40,162,559 Olivia Rodrigo 6266514 4ZtFanR9U6ndgddUvNcjcG ['pop'] ... 0.563 0.664 -5.044 0.1540 0.3350 0.0849 166.928 178147.0 0.688 A
3 2 2 3 2021-07-23--2021-07-30 STAY (with Justin Bieber) 47,248,719 The Kid LAROI 2230022 5HCyWlXZPP0y6Gqq8TgA20 ['australian hip hop'] ... 0.591 0.764 -5.484 0.0483 0.0383 0.1030 169.928 141806.0 0.478 C#/Db
4 5 5 1 2021-07-23--2021-07-30 INDUSTRY BABY (feat. Jack Harlow) 33,948,454 Lil Nas X 5473565 27NovPIUIRrOZoCHxABJwK ['lgbtq+ hip hop', 'pop rap'] ... 0.736 0.704 -7.409 0.0615 0.0203 0.0501 149.995 212000.0 0.894 D#/Eb

5 rows × 23 columns

my_list = [1,10,3,5]
temp = my_list[2]
my_list[2] = my_list[1]
my_list[1] = temp
my_list
[1, 3, 10, 5]
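Looking at the output above, the DataFrame swap did work: row 1 now holds Bad Habits and row 3 holds STAY. It worked because df.iloc[1] on a DataFrame that mixes several column types builds a separate Series, so temp kept the old row. For a DataFrame with a single dtype, older pandas versions may instead return a view, and then temp would change along with row 1. The sketch below, on a made-up five-row frame, adds an explicit .copy() to remove that doubt, and also shows a second approach: swap two positions in a list (like my_list, but with tuple unpacking instead of temp) and reorder the rows with iloc.

```python
import pandas as pd

# Five rows standing in for the top of the Spotify DataFrame.
df = pd.DataFrame({
    "Song Name": ["Beggin'", "STAY", "good 4 u", "Bad Habits", "INDUSTRY BABY"],
    "Streams": [48633449, 47248719, 40162559, 37799456, 33948454],
})

# Approach 1: the temp-variable swap, with an explicit .copy()
# so temp cannot change when row 1 is overwritten.
swapped = df.copy()
temp = swapped.iloc[1].copy()
swapped.iloc[1] = swapped.iloc[3]
swapped.iloc[3] = temp

# Approach 2: swap two positions in a plain list of row numbers,
# then reorder the rows and renumber the index 0, 1, 2, ...
order = list(range(len(df)))
order[1], order[3] = order[3], order[1]
reordered = df.iloc[order].reset_index(drop=True)

print(swapped["Song Name"].tolist())
print(reordered["Song Name"].tolist())
```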