Week 1 Wednesday Lecture¶
Reminders.¶
Quiz during discussion section Thursday. Three questions, each pass/no pass (no partial credit). Will be a chance to redo later. Here are the learning objectives for this quiz:
Choose from among list, tuple, range, set, NumPy array, depending on which is an appropriate data type for a particular task.
Extract basic information from a Python error message.
Extract basic information from Python documentation.
Access elements from lists and list-like structures using indexing and slicing.
Create lists and related structures (such as by using range or np.arange, for loops and if statements, or indexing).
Create matrix-like structures (such as by using for loops, nested for loops, or NumPy commands like np.zeros, np.ones, np.full, or reshape).
Determine the result of executing code by reading through the code step-by-step.
Homework 1 due Friday at 11:30am as an online submission. The instructions for the Streamlit question have changed slightly. (See my Tuesday email or the assignment page or the Ed Discussion thread.)
The best way to search in a list is to use the in
operator. How could we instead search in a list using a for loop?¶
Define a function search_list
which takes as input a list and an element, and as output returns True
if the element is in the list, and returns False
otherwise. You must use a for loop (do not use the in
command).
def search_list(my_list,elt):
b = False
for e in my_list:
if e == elt:
b = True
return b
search_list([3,1,4],2)
False
search_list([3,1,4],1)
True
Make the list [0,4,8,12,...,10**6 - 4]
and call the result my_list
Using %%timeit
, time how long it takes Python to check if 0 is in the list, and next time how long it takes to check if 1 is in the list. Why do you think these numbers are so different? How do these numbers compare between the in
operator and our search_list
function?
my_list = range(0,10**6,4)
type(my_list)
range
my_list = list(range(0,10**6,4))
type(my_list)
list
%%timeit
0 in my_list
29.1 ns ± 0.0157 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%%timeit
1 in my_list
2.77 ms ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Why is searching for 1 thousands of times slower? It has to check every element. Can’t terminate early. With 0, it terminates early
%%timeit
search_list(my_list,1)
6.19 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Our “naively” written search algorithm is almost as good for this element 1 as the built in “in” function in Python.
set vs list¶
Name three advantages of a list over a set:
Lists can have repeated elements.
Lists have an order.
Lists can have a wider variety of elements.
One of the main goals of this notebook is to see a big advantage of a set over a list.
[2,10,[3,1,4]]
[2, 10, [3, 1, 4]]
{2,10,{3,1,4}}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83235/1870974526.py in <module>
----> 1 {2,10,{3,1,4}}
TypeError: unhashable type: 'set'
{2,10,[3,1,4]}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83235/3643581426.py in <module>
----> 1 {2,10,[3,1,4]}
TypeError: unhashable type: 'list'
Search speed¶
Convert the list to a set. In a different cell (not the same one with the conversion), time how long it takes to check if 1 is in the set. Do the same thing for a NumPy array. Do the same thing for a tuple. Do the same thing for a range. (I don’t think you can convert from a list to a range, so you will need to make the correct range “by hand”.)
my_set = set(my_list)
type(my_set)
set
len(my_set)
250000
%%timeit
1 in my_set
31.2 ns ± 0.367 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
my_tuple = tuple(my_list)
%%timeit
1 in my_tuple
2.55 ms ± 6.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
import numpy as np
my_array = np.array(my_list)
%%timeit
1 in my_array
77.5 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
my_range = range(0,10**6,4)
%%timeit
1 in my_range
79.8 ns ± 0.0276 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Efficiency for vectorized operations in NumPy¶
Make a list and a NumPy array corresponding to range(10**7)
. Go through and replace all the odd-indexed entries by their reciprocals, so for example, the resulting list would be [0,1/1,2,1/3,4,1/5,...]
, with the fractions displayed as decimal approximations. How long does this process take in standard Python vs in NumPy? Use the %%time
command, rather than the %%timeit
command. (I originally made a mistake and used the %time
command, but that only times one single line, not the whole cell.)
For the NumPy array, make the datatype a float. (If NumPy thinks the entries are of data type int, and you take their reciprocal, you will get many 0s.)
my_list = list(range(10**7))
my_array = np.array(range(10**7),dtype=float)
my_list[:10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_array[-10:]
array([9999990., 9999991., 9999992., 9999993., 9999994., 9999995.,
9999996., 9999997., 9999998., 9999999.])
%%timeit
w = []
for i in range(len(my_list)):
if i%2 == 0:
w.append(my_list[i])
else:
w.append(1/my_list[i])
1.16 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
w[:10]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83235/1752169815.py in <module>
----> 1 w[:10]
NameError: name 'w' is not defined
my_array[1:10:2]
array([1., 3., 5., 7., 9.])
%%timeit
my_array[1::2] = 1/my_array[1::2]
6.44 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
my_array[:10]
array([0. , 1. , 2. , 0.33333333, 4. ,
0.2 , 6. , 0.14285714, 8. , 0.11111111])