Week 1 Tuesday Discussion

Meeting Times TuTh: 14:00-14:50 in ALP 3600

Office Hours: TBA but will be over Zoom

TA: Yasmeen Baki (ybaki@uci.edu)

Plan for Today:

  • Introductions and time to find people to work with ~10 minutes

  • Overview of discussion policies ~5 minutes

  • Getting comfortable with Deepnote (e.g. markdown versus code cells)

  • Practice with uploading data, pandas, and getting started on Homework 1


Overview of Our Discussion Sections

Purpose: Discussion sections are a time for you to reinforce the material that you have been learning in lecture throughout the week. My general plan is for us to try some exercises and go through some homework all together as a class, but also to leave time for individual work where Chupeng, Yufei, and I can go around answering specific questions you may have.

Quizzes: Quizzes will typically be during the last 20 minutes of our Tuesday discussions. I will give a review of the quiz material during the first 30 minutes of our discussion on these days.

Office Hours: Office hours (and Ed Discussion!) are some of the best places to get fast help. Please do not ask detailed questions about your code right before or after our discussion times; this can create serious delays for our class and for the classes right afterwards.

Email Policy: Email should be reserved for personal/private concerns (e.g. illness, family emergency, etc.), and not for homework or lecture related questions (this is what Ed Discussion, office hours, and discussion are for). Further, please be patient and allow me about 24 hours to get back to your email; in particular, do not send me the same email multiple times.

General Advice and Style Guidelines:

  • Be as organized as possible when saving files on your computer; it helps to have a folder dedicated to this class. Don’t save everything in your Downloads folder!

  • Use descriptive names for variables and files.

  • Use comments to make your code more readable to yourself and others.

  • Start early, start often

  • Ask for help!

Getting Comfortable with Deepnote

All of your work in Deepnote will be done in cells. This is an example of a markdown cell. Markdown cells are used for displaying text, and in our class are an important part of answering homework questions each week.

To create a markdown cell below this cell, we can first use the shortcut ⌘ + j on Mac, or ctrl + j on PC, to create a new code cell. Then, we can convert this new cell to a markdown cell by using the command ⌘ + shift + m, or ctrl + shift + m.

Exercise 1: Using only keyboard shortcuts, create a new markdown cell below this one. Write a short self-introduction. Using the code from this exercise, change the font color of your self-introduction to blue.


Remember: Markdown is subtly different on different sites, so what might work in Jupyter or GitHub, for example, might not work in Deepnote.

It is worth taking a look at this list of keyboard shortcuts for working in Deepnote. Spending the time to learn at least a few of these shortcuts now will make your life much easier going forward.

Exercise 2: Use the link above to learn the keyboard shortcut for deleting a cell. Using only keyboard shortcuts, create a new cell and then delete it.

#This is an example of a comment inside of a code cell
#Comments can be used to help people reading your code understand it better...
#they can also be used to remove portions of code from being evaluated (think debugging!)
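
As a small illustration of the debugging use mentioned in the comments above, the sketch below turns one line into a comment so that it is skipped when the cell runs. The variable names are made up for this example.

```python
values = [3, 1, 2]
total = sum(values)

# The next line is "commented out", so Python ignores it entirely.
# total = total * 100

print(total)  # prints 6
```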

2**3
8

Exercise 3: Create a new code cell and evaluate 2^3. Is the result different from what you would expect?
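
For comparison, here is a minimal sketch of the two operators side by side. In Python, `**` is exponentiation, while `^` is bitwise XOR (exclusive or), so the two expressions give different results.

```python
power = 2 ** 3   # exponentiation: 2 * 2 * 2
xor = 2 ^ 3      # bitwise XOR: 0b10 ^ 0b11 == 0b01

print(power)  # 8
print(xor)    # 1
```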


Uploading files, pandas, and getting started on Homework 1

Exercise 4: Import pandas. Practice uploading a dataset by downloading the csv file found at this link. This is a good time to practice giving your csv file a descriptive name. Load it into this notebook using df = pd.read_csv(...). Explore what df.head(), df.columns, and df.shape return.

import pandas as pd

df = pd.read_csv("../data/spotify_dataset.csv", na_values=" ")
df.head()
   Index  Highest Charting Position  Number of Times Charted  Week of Highest Charting
0      1                          1                        8  2021-07-23--2021-07-30
1      2                          2                        3  2021-07-23--2021-07-30
2      3                          1                       11  2021-06-25--2021-07-02
3      4                          3                        5  2021-07-02--2021-07-09
4      5                          5                        1  2021-07-23--2021-07-30

   Song Name                          Streams     Artist
0  Beggin'                            48,633,449  Måneskin
1  STAY (with Justin Bieber)          47,248,719  The Kid LAROI
2  good 4 u                           40,162,559  Olivia Rodrigo
3  Bad Habits                         37,799,456  Ed Sheeran
4  INDUSTRY BABY (feat. Jack Harlow)  33,948,454  Lil Nas X

   Artist Followers  Song ID                 Genre                                   ...
0         3377762.0  3Wrjm47oTz2sjIgck11l5e  ['indie rock italiano', 'italian pop']  ...
1         2230022.0  5HCyWlXZPP0y6Gqq8TgA20  ['australian hip hop']                  ...
2         6266514.0  4ZtFanR9U6ndgddUvNcjcG  ['pop']                                 ...
3        83293380.0  6PQ88X9TkUIAUIZJHW2upE  ['pop', 'uk pop']                       ...
4         5473565.0  27NovPIUIRrOZoCHxABJwK  ['lgbtq+ hip hop', 'pop rap']           ...

   Danceability  Energy  Loudness  Speechiness  Acousticness
0         0.714   0.800    -4.808       0.0504        0.1270
1         0.591   0.764    -5.484       0.0483        0.0383
2         0.563   0.664    -5.044       0.1540        0.3350
3         0.808   0.897    -3.712       0.0348        0.0469
4         0.736   0.704    -7.409       0.0615        0.0203

   Liveness    Tempo  Duration (ms)  Valence  Chord
0    0.3590  134.002       211560.0    0.589  B
1    0.1030  169.928       141806.0    0.478  C#/Db
2    0.0849  166.928       178147.0    0.688  A
3    0.3640  126.026       231041.0    0.591  B
4    0.0501  149.995       212000.0    0.894  D#/Eb

5 rows × 23 columns

df.columns
Index(['Index', 'Highest Charting Position', 'Number of Times Charted',
       'Week of Highest Charting', 'Song Name', 'Streams', 'Artist',
       'Artist Followers', 'Song ID', 'Genre', 'Release Date', 'Weeks Charted',
       'Popularity', 'Danceability', 'Energy', 'Loudness', 'Speechiness',
       'Acousticness', 'Liveness', 'Tempo', 'Duration (ms)', 'Valence',
       'Chord'],
      dtype='object')
df.shape
(1556, 23)
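
To experiment with these calls without uploading anything, here is a minimal sketch that reads a tiny CSV from a string instead of a file. The CSV text and the names `csv_text` and `df_demo` are made up for illustration; the `na_values=" "` argument is the same one used above, and it tells pandas to treat a cell containing a single space as missing.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for an uploaded file.
# The second data row has a single space where the score should be.
csv_text = "name,score\nAda,90\nBen, \nCy,75\n"

df_demo = pd.read_csv(io.StringIO(csv_text), na_values=" ")

print(df_demo.head())                  # first rows (at most 5)
print(df_demo.columns)                 # column labels
print(df_demo.shape)                   # (number of rows, number of columns)
print(df_demo["score"].isna().sum())   # number of missing scores
```

Without `na_values=" "`, the lone space would be kept as text, and the whole `score` column would be stored as strings rather than numbers.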

Exercise 5: Use info() to see which columns are stored numerically; then use describe() to find the average number of times a song in the dataset has charted. Write your answers to these questions in a markdown cell.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1556 entries, 0 to 1555
Data columns (total 23 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Index                      1556 non-null   int64  
 1   Highest Charting Position  1556 non-null   int64  
 2   Number of Times Charted    1556 non-null   int64  
 3   Week of Highest Charting   1556 non-null   object 
 4   Song Name                  1556 non-null   object 
 5   Streams                    1556 non-null   object 
 6   Artist                     1556 non-null   object 
 7   Artist Followers           1545 non-null   float64
 8   Song ID                    1545 non-null   object 
 9   Genre                      1545 non-null   object 
 10  Release Date               1545 non-null   object 
 11  Weeks Charted              1556 non-null   object 
 12  Popularity                 1545 non-null   float64
 13  Danceability               1545 non-null   float64
 14  Energy                     1545 non-null   float64
 15  Loudness                   1545 non-null   float64
 16  Speechiness                1545 non-null   float64
 17  Acousticness               1545 non-null   float64
 18  Liveness                   1545 non-null   float64
 19  Tempo                      1545 non-null   float64
 20  Duration (ms)              1545 non-null   float64
 21  Valence                    1545 non-null   float64
 22  Chord                      1545 non-null   object 
dtypes: float64(11), int64(3), object(9)
memory usage: 279.7+ KB
df.describe()
             Index  Highest Charting Position  Number of Times Charted  Artist Followers
count  1556.000000                1556.000000              1556.000000      1.545000e+03
mean    778.500000                  87.744216                10.668380      1.471690e+07
std     449.322824                  58.147225                16.360546      1.667579e+07
min       1.000000                   1.000000                 1.000000      4.883000e+03
25%     389.750000                  37.000000                 1.000000      2.123734e+06
50%     778.500000                  80.000000                 4.000000      6.852509e+06
75%    1167.250000                 137.000000                12.000000      2.269875e+07
max    1556.000000                 200.000000               142.000000      8.333778e+07

        Popularity  Danceability       Energy     Loudness  Speechiness
count  1545.000000   1545.000000  1545.000000  1545.000000  1545.000000
mean     70.089320      0.689997     0.633495    -6.348474     0.123656
std      15.824034      0.142444     0.161577     2.509281     0.110383
min       0.000000      0.150000     0.054000   -25.166000     0.023200
25%      65.000000      0.599000     0.532000    -7.491000     0.045600
50%      73.000000      0.707000     0.642000    -5.990000     0.076500
75%      80.000000      0.796000     0.752000    -4.711000     0.165000
max     100.000000      0.980000     0.970000     1.509000     0.884000

       Acousticness     Liveness        Tempo  Duration (ms)      Valence
count   1545.000000  1545.000000  1545.000000    1545.000000  1545.000000
mean       0.248695     0.181202   122.811023  197940.816828     0.514704
std        0.250326     0.144071    29.591088   47148.930420     0.227326
min        0.000025     0.019700    46.718000   30133.000000     0.032000
25%        0.048500     0.096600    97.960000  169266.000000     0.343000
50%        0.161000     0.124000   122.012000  193591.000000     0.512000
75%        0.388000     0.217000   143.860000  218902.000000     0.691000
max        0.994000     0.962000   205.272000  588139.000000     0.979000
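
Rather than reading the answers off the printed output by eye, the same information can be looked up in code. The sketch below uses a made-up four-row DataFrame (`df_demo` and its column names are hypothetical): `select_dtypes` lists the numeric columns, and because `describe()` itself returns a DataFrame, one statistic can be picked out with `loc`.

```python
import pandas as pd

# Hypothetical toy data; "plays" is stored as text because of the commas.
df_demo = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "times_charted": [1, 4, 12, 3],
    "plays": ["1,000", "2,500", "900", "4,200"],
})

# Columns that pandas stores as numbers (e.g. int64 or float64)
numeric_cols = df_demo.select_dtypes(include="number").columns
print(list(numeric_cols))

# describe() returns a DataFrame, so a single value can be selected by label
summary = df_demo.describe()
print(summary.loc["mean", "times_charted"])
```

Note that `describe()` skips the text columns by default, which is why columns such as Streams (stored as `object`) do not appear in the summary above.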

Exercise 6: Using slicing techniques from Monday’s lecture, create a new dataframe which has just the “Song Name” column from the original dataframe.

# Passing a list of labels keeps the result a DataFrame (a single label gives a Series)
df2 = df.loc[:, ["Song Name"]]
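
One detail worth checking here, sketched on a made-up two-row DataFrame (`df_demo` is a hypothetical name): with `loc`, a single column label returns a Series, while a list of labels returns a DataFrame, even when the list has only one entry.

```python
import pandas as pd

df_demo = pd.DataFrame({"Song Name": ["x", "y"], "Artist": ["A", "B"]})

one_label = df_demo.loc[:, "Song Name"]      # single label -> Series
label_list = df_demo.loc[:, ["Song Name"]]   # list of labels -> DataFrame

print(type(one_label).__name__)    # Series
print(type(label_list).__name__)   # DataFrame
print(label_list.shape)            # (2, 1)
```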

Exercise 7: Using value_counts(), determine how many times each artist appears in the dataset. Then pick an artist and use boolean indexing to find all songs by that artist in the original dataframe.

df["Artist"].value_counts()
Taylor Swift                     52
Lil Uzi Vert                     32
Justin Bieber                    32
Juice WRLD                       30
Pop Smoke                        29
                                 ..
Chris Brown, Young Thug           1
Rauw Alejandro, J Balvin          1
347aidan                          1
Migrantes, Alico                  1
Dadá Boladão, Tati Zaqui, OIK     1
Name: Artist, Length: 716, dtype: int64
df3 = df[df["Artist"] == "Taylor Swift"]["Song Name"]
df3
398     Mr. Perfectly Fine (Taylor’s Version) (From Th...
421                         Love Story (Taylor’s Version)
424                                                willow
428                 You Belong With Me (Taylor’s Version)
429                           Fearless (Taylor’s Version)
431                            Fifteen (Taylor’s Version)
432                The Way I Loved You (Taylor’s Version)
433     You All Over Me (feat. Maren Morris) (Taylor’s...
434                        Hey Stephen (Taylor’s Version)
435                        White Horse (Taylor’s Version)
436                   Forever & Always (Taylor’s Version)
437     Breathe (feat. Colbie Caillat) (Taylor’s Version)
439     That’s When (feat. Keith Urban) (Taylor’s Vers...
440                        Tell Me Why (Taylor’s Version)
441                   You’re Not Sorry (Taylor’s Version)
444         Don’t You (Taylor’s Version) (From The Vault)
445     We Were Happy (Taylor’s Version) (From The Vault)
585                                    champagne problems
608                        no body, no crime (feat. HAIM)
667                                  ‘tis the damn season
671                                             gold rush
688                                   Christmas Tree Farm
691                                           tolerate it
694                                             happiness
695                                                   ivy
696                                              dorothea
697                     coney island (feat. The National)
698                             evermore (feat. Bon Iver)
699                                      long story short
700                                        cowboy like me
701                                              marjorie
702                                               closure
713                                              cardigan
889                                exile (feat. Bon Iver)
921                                                 the 1
942                                                august
948                       the last great american dynasty
950                                     my tears ricochet
960                                      invisible string
965                                            mirrorball
966                                                 seven
967                                     this is me trying
968                                                 betty
970                                       illicit affairs
976                                             mad woman
977                                              epiphany
978                                                 peace
983                                                  hoax
1374                                You Need To Calm Down
1425          Only The Young - Featured in Miss Americana
1466      ME! (feat. Brendon Urie of Panic! At The Disco)
1555                   Lover (Remix) [feat. Shawn Mendes]
Name: Song Name, dtype: object
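
The same two steps can be tried on a small made-up DataFrame to see what each piece does on its own. In this sketch (`df_demo`, `mask`, and `songs_by_a` are hypothetical names), the comparison produces a Boolean Series with one True/False entry per row, and `loc` then keeps only the rows where it is True.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "A"],
    "Song Name": ["s1", "s2", "s3", "s4", "s5"],
})

counts = df_demo["Artist"].value_counts()     # sorted from most to least frequent
print(counts)

mask = df_demo["Artist"] == "A"               # Boolean Series, one entry per row
songs_by_a = df_demo.loc[mask, "Song Name"]   # rows where mask is True, one column
print(songs_by_a.tolist())                    # ['s1', 's3', 's5']
```

A quick sanity check is that the number of rows selected matches the count reported by `value_counts()` for that artist, just as the 52 Taylor Swift rows above match her count of 52.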

Getting Started on Homework 1

  • Remember that you can work in groups of 2-3 students on the homework, and you all can submit the same work. Just remember to include the names of your collaborators. Let’s quickly see how to add collaborators to a project.

  • Thursday we will work on Homework 1 together. It helps if you come prepared to discussion having already found a dataset you would like to use from Kaggle (you will need to create an account). When picking a dataset, here are a few things to keep in mind:

    • Find a dataset that interests you, but spend the majority of your time working on the homework questions. It can be easy to waste time trying to find the perfect dataset.

    • The data you use for this homework should be relatively “clean” already (I will show you an example of a dataset that would be a bad choice to use for this homework). We will have opportunities later in the quarter to work on data cleaning.