Week 1 Thursday Discussion

import pandas as pd
  • Make the list [0,-1,-4,-9,...,-81,-100] using list comprehension.

[-x**2 for x in range(11)]
[0, -1, -4, -9, -16, -25, -36, -49, -64, -81, -100]
  • How can you print out the following lines using f-strings?

the square of 1 is 1
the square of 2 is 4
...
the square of 10 is 100
for x in range(1, 11):
    print(f"the square of {x} is {x**2}")
the square of 1 is 1
the square of 2 is 4
the square of 3 is 9
the square of 4 is 16
the square of 5 is 25
the square of 6 is 36
the square of 7 is 49
the square of 8 is 64
the square of 9 is 81
the square of 10 is 100
indexList = ['NYA', 'IXIC', 'HSI', '000001.SS', 'N225', 'N100', '399001.SZ',
    'GSPTSE', 'NSEI', 'KS11', 'SSMI', 'TWII', 'J203.JO']
  • Using list comprehension, find the sublist of indexList containing all the indexes which have the number “0” in their abbreviation. (Hint. You can use in, just make sure “0” is a str and not an int.)

[x for x in indexList if "0" in x]
['000001.SS', 'N100', '399001.SZ', 'J203.JO']
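The hint about str versus int can be checked directly. The sketch below uses a shortened, illustrative copy of the list (the name index_list is an assumption, not part of the worksheet) and shows that the substring test needs the string "0", while the integer 0 raises a TypeError.

```python
# A shortened, illustrative copy of indexList (hypothetical name: index_list).
index_list = ["NYA", "N100", "J203.JO"]

# Substring test: the left operand must be the string "0".
with_zero = [x for x in index_list if "0" in x]

# With the integer 0, Python raises a TypeError rather than returning False.
try:
    0 in "N100"
    raised = False
except TypeError:
    raised = True

print(with_zero)
print(raised)
```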
  • Define df to be the pandas DataFrame from the attached indexData.csv file. For how many values of i in df.index is df.loc[i,"Name"] equal to “HSI”? Compute your answer using list comprehension of the form temp_list = [??? for i in ??? if ???] and then len(temp_list). (Warning. This is not nearly as efficient as the next method.)

  • Compute the same value instead using Boolean indexing from Monday’s lecture and then taking the length of the resulting DataFrame using len.

  • Compute the same value again using Boolean indexing, but this time using ???.shape[0].

  • Compute the same value by evaluating df.Name.value_counts().

df = pd.read_csv("../data/indexData.csv")
temp_list = [i for i in df.index if df.loc[i,"Name"] == "HSI"]
len(temp_list)
8750
df2 = df[df["Name"] == "HSI"]
len(df2)
8750
df2.shape[0]
8750
df.Name.value_counts()
N225         14500
NYA          13948
IXIC         12690
GSPTSE       10776
HSI           8750
GDAXI         8606
SSMI          7830
KS11          6181
TWII          6010
000001.SS     5963
399001.SZ     5928
N100          5507
NSEI          3381
J203.JO       2387
Name: Name, dtype: int64
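The four approaches above can be compared side by side on a small example. This is only a sketch: the DataFrame toy is a made-up stand-in for indexData.csv, and mask.sum() is an extra variation that the worksheet does not ask for. The list comprehension calls loc once per row, while the Boolean mask is built in a single vectorized comparison, which is the reason for the efficiency warning.

```python
import pandas as pd

# Hypothetical stand-in for indexData.csv; only the "Name" column matters here.
toy = pd.DataFrame({"Name": ["HSI", "NYA", "HSI", "N225", "HSI"]})

# Boolean Series with one True/False entry per row.
mask = toy["Name"] == "HSI"

count_loop = len([i for i in toy.index if toy.loc[i, "Name"] == "HSI"])
count_len = len(toy[mask])
count_shape = toy[mask].shape[0]
count_vc = int(toy["Name"].value_counts()["HSI"])
count_sum = int(mask.sum())  # extra variation: each True counts as 1

print(count_loop, count_len, count_shape, count_vc, count_sum)
```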
  • For each index listed in the “Name” column from the indexData.csv file, print out the index’s abbreviation together with how many times it occurs in the file. Use f-strings, Boolean indexing, and the unique method. For example, one line might be “The index HSI occurs 8750 times”. Check your answer using df["Name"].value_counts().

df = pd.read_csv("../data/indexData.csv")
for name in df.Name.unique():
    print(f"The index {name} occurs {len(df[df.Name==name])} times")
The index NYA occurs 13948 times
The index IXIC occurs 12690 times
The index HSI occurs 8750 times
The index 000001.SS occurs 5963 times
The index GSPTSE occurs 10776 times
The index 399001.SZ occurs 5928 times
The index NSEI occurs 3381 times
The index GDAXI occurs 8606 times
The index KS11 occurs 6181 times
The index SSMI occurs 7830 times
The index TWII occurs 6010 times
The index J203.JO occurs 2387 times
The index N225 occurs 14500 times
The index N100 occurs 5507 times
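One possible way to do the suggested check in code rather than by eye: collect the Boolean-indexing counts in a dictionary and compare them with value_counts(). The DataFrame toy and the two dictionary names are assumptions made for this sketch, not part of the worksheet.

```python
import pandas as pd

# Hypothetical stand-in for indexData.csv.
toy = pd.DataFrame({"Name": ["HSI", "NYA", "HSI", "N225", "HSI", "NYA"]})

# Counts from Boolean indexing, one entry per unique name.
counts_by_mask = {
    name: len(toy[toy["Name"] == name]) for name in toy["Name"].unique()
}

# Counts from value_counts, converted to a plain dictionary.
counts_by_vc = toy["Name"].value_counts().to_dict()

for name, count in counts_by_mask.items():
    print(f"The index {name} occurs {count} times")

# Dictionary equality ignores ordering, so the two orderings need not match.
print(counts_by_mask == counts_by_vc)
```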
listA = range(30)
listB = range(0,100,3)
  • Using a for loop and append, make a list containing all the elements in listA which are not in listB. (Hint. You can use not in.)

  • Make the same list using list comprehension.

list_diff = []
for x in listA:
    if x not in listB:
        list_diff.append(x)
list_diff
[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29]
[x for x in listA if x not in listB]
[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29]
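As a side note, listA and listB are range objects rather than lists, and both in and not in work on them directly. A set difference is another way to get the same elements; it is a swapped-in technique, not the one the exercise asks for, and since sets are unordered the result is sorted here to recover the original order.

```python
listA = range(30)
listB = range(0, 100, 3)

# The list comprehension from the exercise.
by_comprehension = [x for x in listA if x not in listB]

# Alternative sketch: set difference, then sort to restore increasing order.
by_set_difference = sorted(set(listA) - set(listB))

print(by_comprehension == by_set_difference)
```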
  • The list indexList defined earlier contains the abbreviation of every index in the indexData.csv file except one. Which index is missing? (Use the technique from the previous example.)

[x for x in df.Name.unique() if x not in indexList]
['GDAXI']