Week 4, Tuesday Discussion
Today:
Go through practice midterm solutions
Reminders:
Homework #3 due tonight
Midterm Thursday during discussion
I have extra notecards, if you lost yours or did not get one
Question 1
(a) What is an advantage of a Python tuple in comparison to a Python list?
A: A tuple can go in a set, while a list cannot go in a set.
Also, a list can be changed after we create it, while a tuple cannot be changed after it is made (e.g. a list has .append() while a tuple does not).
(b) What is an advantage of np.arange in comparison to range?
A: Entries of np.arange do not have to be integers, while range can only have integers.
Also, np.arange returns a NumPy array, which has many more methods/functions available (e.g. taking reciprocals), and the operations are “vectorized” (makes computation much faster).
(c) What will be the result of the following code?
a = range(1,20,4)
b = [2,3,2]
[f"Train{a[i]}{i}" for i in b]
['Train92', 'Train133', 'Train92']
#Notice how this range still only goes up to 17
a2 = range(1,21,4)
list(a2)
[1, 5, 9, 13, 17]
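The two points from part (b) can be checked with a short sketch. The particular start, stop, and step values are chosen only for illustration.

```python
import numpy as np

# range only accepts integer arguments, so a step of 0.25 raises a TypeError
try:
    range(0, 1, 0.25)
    range_accepts_floats = True
except TypeError:
    range_accepts_floats = False

# np.arange allows a non-integer step and returns a NumPy array
arr = np.arange(1, 3, 0.5)

# "Vectorized": the reciprocal of every entry is computed in one operation, no loop needed
recip = 1 / arr
```

Doing the same thing with range would require first converting to a list and then looping (or using a list comprehension) over the entries.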
r = range(10)
my_set = {r}
my_list = [r]
type(my_set)
type(my_list[0])
range
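A range object can go in a set, as shown above, and the same is true of a tuple but not of a list. The following is a minimal sketch of the answer to part (a); the variable names are made up for this example.

```python
my_tuple = (1, 2, 3)
my_list = [1, 2, 3]

# A tuple can be an element of a set
tuple_set = {my_tuple}

# A list cannot: trying to put it in a set raises a TypeError
try:
    list_set = {my_list}
    list_in_set = True
except TypeError:
    list_in_set = False

# A list can be changed after it is made; a tuple has no append method
my_list.append(4)
tuple_has_append = hasattr(my_tuple, "append")
```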
(d) What is an example of a similarity between a pandas Series and a Python dictionary?
A: Elements of both are accessed using square brackets [] together with a label/key (some examples below, after part (e)).
(e) How can the following error be corrected?
import pandas as pd
df = pd.read_csv("../data/spotify_dataset.csv")
#Remove the rows where the Energy entry is the blank string " "
df = df[~df.Energy.isin([" "])]
type(df.loc[0,"Energy"])
str
df.Energy = pd.to_numeric(df.Energy) #with the correction
type(df.loc[0,"Energy"])
numpy.float64
df.Energy.mean()
0.633495145631068
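Here is a minimal sketch of the same correction on made-up data (the values are hypothetical, not taken from the Spotify file). A column whose numbers are stored as strings becomes numeric after pd.to_numeric, and then .mean() behaves as expected.

```python
import pandas as pd

# Hypothetical stand-in for the Energy column: numbers stored as strings
energy = pd.Series(["0.8", "0.5", "0.2"])
first_type = type(energy[0])   # each entry is a Python str

# pd.to_numeric converts the strings to floats
energy_num = pd.to_numeric(energy)
avg = energy_num.mean()
```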
type(df["Energy"])
pandas.core.series.Series
df["Energy"][0]
0.8
test = {"Energy":5}
test["Energy"]
5
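The similarity is even easier to see when a Series is built directly from a dictionary. This is a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical values, chosen only for illustration
d = {"Energy": 0.8, "Tempo": 120.0}
ser = pd.Series(d)   # the dictionary keys become the Series index

# Both are accessed with square brackets and a key/label
from_dict = d["Energy"]
from_series = ser["Energy"]
same_labels = list(ser.index) == list(d.keys())
```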
Question 2
Assume df is a pandas DataFrame, and that its “day” column has strings representing dates. Write code to extract the sub-DataFrame which contains only those rows corresponding to Tuesday or Wednesday.
#Our first goal is to convert the date to a day of the week. I will store this information in a new column
#(The Spotify DataFrame has no "day" column, so "Release Date" plays that role here)
df["weekday"] = pd.to_datetime(df["Release Date"]).dt.day_name()
#The second step is to create a boolean series that returns True if df["weekday"] is Tuesday or Wednesday
bool_ser = df["weekday"].isin(["Tuesday", "Wednesday"])
#Get the sub-DataFrame
df[bool_ser]
(row label) | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord | weekday |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 1 | 18 | 2021-05-07--2021-05-14 | MONTERO (Call Me By Your Name) | 30,071,134 | Lil Nas X | 5473565 | 67BtfxlNbhBmCDR2L2l8qd | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.508 | -6.682 | 0.152 | 0.297 | 0.384 | 178.818 | 137876 | 0.758 | G#/Ab | Wednesday |
29 | 30 | 3 | 28 | 2021-04-02--2021-04-09 | Astronaut In The Ocean | 14,174,752 | Masked Wolf | 365975 | 3Ofmpyhv5UAQ70mENzB277 | ['australian hip hop'] | ... | 0.695 | -6.865 | 0.0913 | 0.175 | 0.15 | 149.996 | 132780 | 0.472 | E | Wednesday |
44 | 45 | 9 | 39 | 2021-02-26--2021-03-05 | The Business | 10,739,770 | Tiësto | 5785065 | 6f3Slt0GbA2bPZlz0aIFXN | ['big room', 'brostep', 'dance pop', 'dutch ed... | ... | 0.620 | -7.079 | 0.232 | 0.414 | 0.112 | 120.031 | 164000 | 0.235 | G#/Ab | Wednesday |
48 | 49 | 29 | 6 | 2021-06-25--2021-07-02 | Fiel - Remix | 10,032,746 | Wisin, Jhay Cortez, Anuel AA, Los Legendarios,... | 6929075 | 43qcs9NpJhDxtG91zxFkj7 | ['latin', 'latin hip hop', 'reggaeton', 'trap ... | ... | 0.711 | -4.733 | 0.0473 | 0.398 | 0.118 | 97.99 | 349547 | 0.573 | F#/Gb | Tuesday |
50 | 51 | 19 | 4 | 2021-07-02--2021-07-09 | Nicky Jam: Bzrp Music Sessions, Vol. 41 | 9,799,701 | Bizarrap, Nicky Jam | 3126961 | 03LfOYi0icz4souspZVVhq | ['argentine hip hop', 'pop venezolano', 'trap ... | ... | 0.849 | -3.167 | 0.153 | 0.0913 | 0.145 | 89.907 | 158087 | 0.818 | C#/Db | Wednesday |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1457 | 1458 | 136 | 1 | 2020-01-31--2020-02-07 | Mon Ami | 5,774,770 | Samra | 1045091 | 1R4xkZXQUQ8QJtAdwHkSgC | ['german hip hop'] | ... | 0.731 | -3.723 | 0.383 | 0.188 | 0.111 | 105.666 | 138409 | 0.537 | G#/Ab | Wednesday |
1464 | 1465 | 106 | 6 | 2020-01-03--2020-01-10 | Liar | 4,896,939 | Camila Cabello | 22698747 | 7LzouaWGFCy4tkXDOOnEyM | ['dance pop', 'electropop', 'pop', 'post-teen ... | ... | 0.498 | -6.684 | 0.0456 | 0.0169 | 0.319 | 98.016 | 207039 | 0.652 | B | Wednesday |
1547 | 1548 | 156 | 1 | 2019-12-27--2020-01-03 | Combatchy (feat. MC Rebecca) | 5,149,797 | Anitta, Lexa, Luísa Sonza | 10741972 | 2bPtwnrpFNEe8N7Q85kLHw | ['funk carioca', 'funk pop', 'pagode baiano', ... | ... | 0.730 | -3.032 | 0.0809 | 0.383 | 0.0197 | 150.134 | 157600 | 0.605 | C#/Db | Wednesday |
1554 | 1555 | 198 | 1 | 2019-12-27--2020-01-03 | Surtada - Remix Brega Funk | 4,607,385 | Dadá Boladão, Tati Zaqui, OIK | 208630 | 5F8ffc8KWKNawllr5WsW0r | ['brega funk', 'funk carioca'] | ... | 0.550 | -7.026 | 0.0587 | 0.249 | 0.182 | 154.064 | 152784 | 0.881 | F | Wednesday |
1555 | 1556 | 199 | 1 | 2019-12-27--2020-01-03 | Lover (Remix) [feat. Shawn Mendes] | 4,595,450 | Taylor Swift | 42227614 | 3i9UVldZOE0aD0JnyfAZZ0 | ['pop', 'post-teen pop'] | ... | 0.603 | -7.176 | 0.064 | 0.433 | 0.0862 | 205.272 | 221307 | 0.422 | G | Wednesday |
121 rows × 24 columns
#Another option, if you insist on list comprehension :)
good_rows = [c for c in df.index if df.loc[c,"weekday"] in ["Tuesday", "Wednesday"]]
df.loc[good_rows]
(row label) | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord | weekday |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 1 | 18 | 2021-05-07--2021-05-14 | MONTERO (Call Me By Your Name) | 30,071,134 | Lil Nas X | 5473565 | 67BtfxlNbhBmCDR2L2l8qd | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.508 | -6.682 | 0.152 | 0.297 | 0.384 | 178.818 | 137876 | 0.758 | G#/Ab | Wednesday |
29 | 30 | 3 | 28 | 2021-04-02--2021-04-09 | Astronaut In The Ocean | 14,174,752 | Masked Wolf | 365975 | 3Ofmpyhv5UAQ70mENzB277 | ['australian hip hop'] | ... | 0.695 | -6.865 | 0.0913 | 0.175 | 0.15 | 149.996 | 132780 | 0.472 | E | Wednesday |
44 | 45 | 9 | 39 | 2021-02-26--2021-03-05 | The Business | 10,739,770 | Tiësto | 5785065 | 6f3Slt0GbA2bPZlz0aIFXN | ['big room', 'brostep', 'dance pop', 'dutch ed... | ... | 0.620 | -7.079 | 0.232 | 0.414 | 0.112 | 120.031 | 164000 | 0.235 | G#/Ab | Wednesday |
48 | 49 | 29 | 6 | 2021-06-25--2021-07-02 | Fiel - Remix | 10,032,746 | Wisin, Jhay Cortez, Anuel AA, Los Legendarios,... | 6929075 | 43qcs9NpJhDxtG91zxFkj7 | ['latin', 'latin hip hop', 'reggaeton', 'trap ... | ... | 0.711 | -4.733 | 0.0473 | 0.398 | 0.118 | 97.99 | 349547 | 0.573 | F#/Gb | Tuesday |
50 | 51 | 19 | 4 | 2021-07-02--2021-07-09 | Nicky Jam: Bzrp Music Sessions, Vol. 41 | 9,799,701 | Bizarrap, Nicky Jam | 3126961 | 03LfOYi0icz4souspZVVhq | ['argentine hip hop', 'pop venezolano', 'trap ... | ... | 0.849 | -3.167 | 0.153 | 0.0913 | 0.145 | 89.907 | 158087 | 0.818 | C#/Db | Wednesday |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1457 | 1458 | 136 | 1 | 2020-01-31--2020-02-07 | Mon Ami | 5,774,770 | Samra | 1045091 | 1R4xkZXQUQ8QJtAdwHkSgC | ['german hip hop'] | ... | 0.731 | -3.723 | 0.383 | 0.188 | 0.111 | 105.666 | 138409 | 0.537 | G#/Ab | Wednesday |
1464 | 1465 | 106 | 6 | 2020-01-03--2020-01-10 | Liar | 4,896,939 | Camila Cabello | 22698747 | 7LzouaWGFCy4tkXDOOnEyM | ['dance pop', 'electropop', 'pop', 'post-teen ... | ... | 0.498 | -6.684 | 0.0456 | 0.0169 | 0.319 | 98.016 | 207039 | 0.652 | B | Wednesday |
1547 | 1548 | 156 | 1 | 2019-12-27--2020-01-03 | Combatchy (feat. MC Rebecca) | 5,149,797 | Anitta, Lexa, Luísa Sonza | 10741972 | 2bPtwnrpFNEe8N7Q85kLHw | ['funk carioca', 'funk pop', 'pagode baiano', ... | ... | 0.730 | -3.032 | 0.0809 | 0.383 | 0.0197 | 150.134 | 157600 | 0.605 | C#/Db | Wednesday |
1554 | 1555 | 198 | 1 | 2019-12-27--2020-01-03 | Surtada - Remix Brega Funk | 4,607,385 | Dadá Boladão, Tati Zaqui, OIK | 208630 | 5F8ffc8KWKNawllr5WsW0r | ['brega funk', 'funk carioca'] | ... | 0.550 | -7.026 | 0.0587 | 0.249 | 0.182 | 154.064 | 152784 | 0.881 | F | Wednesday |
1555 | 1556 | 199 | 1 | 2019-12-27--2020-01-03 | Lover (Remix) [feat. Shawn Mendes] | 4,595,450 | Taylor Swift | 42227614 | 3i9UVldZOE0aD0JnyfAZZ0 | ['pop', 'post-teen pop'] | ... | 0.603 | -7.176 | 0.064 | 0.433 | 0.0862 | 205.272 | 221307 | 0.422 | G | Wednesday |
121 rows × 24 columns
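The same two steps can be tried without the Spotify file. This sketch uses a tiny made-up DataFrame whose “day” column matches the wording of the question; the dates are hypothetical.

```python
import pandas as pd

# Hypothetical data: four date strings in a "day" column
toy = pd.DataFrame({"day": ["2021-05-04", "2021-05-05", "2021-05-06", "2021-05-11"]})

# Step 1: convert the strings to datetimes and get the day of the week
weekday = pd.to_datetime(toy["day"]).dt.day_name()

# Step 2: Boolean Series that is True for Tuesday or Wednesday, then Boolean indexing
sub = toy[weekday.isin(["Tuesday", "Wednesday"])]
```

Note that the Boolean Series does not need to be stored as a column first; it only needs to have the same index as the DataFrame.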
Question 3
import numpy as np
rng = np.random.default_rng()
A = rng.integers(0,6,size = (10,10))
#Do not worry about random numbers! You will not be tested on this.
df = pd.DataFrame(A)
#df.loc is label-based!!
df2 = df.loc[5:,3:].copy()
df.head(3)
(row label) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | 5 | 2 | 4 | 4 | 3 | 3 | 5 | 1 | 3 |
1 | 1 | 0 | 4 | 3 | 2 | 2 | 2 | 4 | 5 | 1 |
2 | 3 | 0 | 4 | 4 | 4 | 2 | 4 | 4 | 2 | 0 |
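Will there be a difference between df2.iloc[:,4:].shape and df2.loc[:,4:].shape?
df2.iloc[:,4:] is saying take all of the rows of df2, and take the column in integer position 4 and everything after. The columns of df2 are labeled 3 through 9, so integer position 4 is the column labeled 7, and 3 columns remain.
df2.loc[:,4:] is saying take all of the rows of df2, and take the column labeled 4 and everything after. That keeps the columns labeled 4 through 9, so 6 columns remain.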
df2.iloc[:,4:].shape
(5, 3)
df2.loc[:,4:].shape
(5, 6)
Also keep in mind! df.index returns labels, not integer positions.
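Because the DataFrame above is random, here is a sketch with a fixed array (an assumption made only so the example is reproducible) that lists exactly which column labels each slice keeps.

```python
import numpy as np
import pandas as pd

# Fixed 10x10 array instead of random integers, so the result is reproducible
big = pd.DataFrame(np.arange(100).reshape(10, 10))

# Label-based slicing: rows labeled 5 to 9, columns labeled 3 to 9
small = big.loc[5:, 3:].copy()

# iloc counts positions within small: positions 4, 5, 6 are the columns labeled 7, 8, 9
iloc_cols = list(small.iloc[:, 4:].columns)

# loc uses the labels: columns labeled 4 through 9
loc_cols = list(small.loc[:, 4:].columns)

# The index of small still carries the original labels
row_labels = list(small.index)
```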
Question 4
df = pd.read_csv("../data/spotify_dataset.csv")
#Find top 10 artists
top_art = df["Artist"].value_counts().index[:10]
df2 = df[df["Artist"].isin(top_art)]
df2
(row label) | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Danceability | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | 9 | 3 | 8 | 2021-06-18--2021-06-25 | Yonaguni | 25,030,128 | Bad Bunny | 36142273 | 2JPLbjOn0wPCngEot2STUS | ['latin', 'reggaeton', 'trap latino'] | ... | 0.644 | 0.648 | -4.601 | 0.118 | 0.276 | 0.135 | 179.951 | 206710 | 0.44 | C#/Db |
12 | 13 | 5 | 3 | 2021-07-09--2021-07-16 | Permission to Dance | 22,062,812 | BTS | 37106176 | 0LThjFY2iTtNdd4wviwVV2 | ['k-pop', 'k-pop boy group'] | ... | 0.702 | 0.741 | -5.33 | 0.0427 | 0.00544 | 0.337 | 124.925 | 187585 | 0.646 | A |
13 | 14 | 1 | 19 | 2021-04-02--2021-04-09 | Peaches (feat. Daniel Caesar & Giveon) | 20,294,457 | Justin Bieber | 48504126 | 4iJyoBOLtHqaGxP12qzhQI | ['canadian pop', 'pop', 'post-teen pop'] | ... | 0.677 | 0.696 | -6.181 | 0.119 | 0.321 | 0.42 | 90.03 | 198082 | 0.464 | C |
14 | 15 | 2 | 10 | 2021-05-21--2021-05-28 | Butter | 19,985,713 | BTS | 37106176 | 2bgTY4UwhfBYhGT4HUYStN | ['k-pop', 'k-pop boy group'] | ... | 0.759 | 0.459 | -5.187 | 0.0948 | 0.00323 | 0.0906 | 109.997 | 164442 | 0.695 | G#/Ab |
17 | 18 | 5 | 14 | 2021-04-23--2021-04-30 | Save Your Tears (with Ariana Grande) (Remix) | 18,053,141 | The Weeknd | 35305637 | 37BZB0z9T8Xu7U3e65qxFy | ['canadian contemporary r&b', 'canadian pop', ... | ... | 0.65 | 0.825 | -4.645 | 0.0325 | 0.0215 | 0.0936 | 118.091 | 191014 | 0.593 | C |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1499 | 1500 | 100 | 1 | 2020-01-17--2020-01-24 | Alfred - Interlude | 8,030,151 | Eminem | 46814751 | 4EmunTy7kNBYQivOa8F6b8 | ['detroit hip hop', 'hip hop', 'rap'] | ... | 0.429 | 0.231 | -20.43 | 0.402 | 0.878 | 0.279 | 74.545 | 30133 | 0.914 | F |
1500 | 1501 | 102 | 1 | 2020-01-17--2020-01-24 | Little Engine | 7,913,461 | Eminem | 46814751 | 4qNWEOMyexn7b8Icyk29t9 | ['detroit hip hop', 'hip hop', 'rap'] | ... | 0.769 | 0.811 | -4.162 | 0.228 | 0.0234 | 0.0451 | 155.081 | 177293 | 0.76 | A#/Bb |
1501 | 1502 | 113 | 1 | 2020-01-17--2020-01-24 | I Will (feat. KXNG Crooked, Royce Da 5'9" & Jo... | 7,115,414 | Eminem | 46814751 | 3CJbxqRQ0JNCqboWDNUUeX | ['detroit hip hop', 'hip hop', 'rap'] | ... | 0.635 | 0.543 | -5.941 | 0.067 | 0.0454 | 0.272 | 98.743 | 303000 | 0.036 | G#/Ab |
1549 | 1550 | 187 | 1 | 2019-12-27--2020-01-03 | Let Me Know (I Wonder Why Freestyle) | 4,701,532 | Juice WRLD | 19102888 | 3wwo0bJvDSorOpNfzEkfXx | ['chicago rap', 'melodic rap'] | ... | 0.635 | 0.537 | -7.895 | 0.0832 | 0.172 | 0.418 | 125.028 | 215381 | 0.383 | G |
1555 | 1556 | 199 | 1 | 2019-12-27--2020-01-03 | Lover (Remix) [feat. Shawn Mendes] | 4,595,450 | Taylor Swift | 42227614 | 3i9UVldZOE0aD0JnyfAZZ0 | ['pop', 'post-teen pop'] | ... | 0.448 | 0.603 | -7.176 | 0.064 | 0.433 | 0.0862 | 205.272 | 221307 | 0.422 | G |
295 rows × 23 columns
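The same pattern (value_counts, then isin) can be checked by hand on a small made-up DataFrame. The artist names below are placeholders, and the top 2 are kept instead of the top 10.

```python
import pandas as pd

# Hypothetical data: "A" appears 3 times, "B" twice, "C" once
toy = pd.DataFrame({"Artist": ["A", "B", "A", "C", "B", "A"]})

# value_counts sorts from most frequent to least frequent,
# so the first entries of its index are the most common artists
top_art = toy["Artist"].value_counts().index[:2]

# Keep only the rows whose artist is one of the top artists
sub = toy[toy["Artist"].isin(top_art)]
```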
Question 5
import altair as alt
df2 = pd.read_csv("../data/cars.csv")
brush = alt.selection_interval()
c1 = alt.Chart(df2).mark_circle().encode(
    x = "Miles_per_Gallon",
    y = "Horsepower",
    color = "Origin"
).add_selection(  #in Altair 5 and later, add_selection is deprecated in favor of add_params
    brush
)
c2 = alt.Chart(df2).mark_bar().encode(
    x = "Origin",
    y = alt.Y("count()", scale=alt.Scale(domain = [0,500]))
).transform_filter(
    brush
)
c1|c2