Python data types

We will be working with many different Python data types in Math 9, both built-in data types and data types defined in external libraries such as NumPy. We’ll start this portion of the class by introducing some of the many different data types.

lists and strings

We start with two data types that seem very different but which share a lot of common functionality.

We can indicate that an object is a string using quotation marks (either single or double quotation marks). Here we define the variable s to be the string "Hello, world". You can check the type of an object by using the type function.

s = "Hello, world"
type(s)
str
print(s)
Hello, world

Just like we made a string by using quotation marks, here we make a list by using square brackets.

mylist = [3,1,4,1]
type(mylist)
list

Here is one of the many similarities between strings and lists (as well as many other data types in Python): you can use the following indexing notation to extract a part of the object. Notice that numbering in Python starts at 0. I will usually try to refer to this as the “zeroth” element or the “initial” element. For example, the initial element in the string "Hello, world" is the letter "H".

s[0]
'H'
mylist[0]
3

Another similarity between strings and lists is that both have a notion of length.

len(s)
12
len(mylist)
4

Because indexing starts at 0 and the string s has length 12, using s[12] will raise an error. Unfortunately, many Python error messages are difficult to read, but this one is pretty clear: IndexError: string index out of range. (I usually start reading at the bottom of the error message.)

s[len(s)]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_36568/387520648.py in <module>
----> 1 s[len(s)]

IndexError: string index out of range

If you want to get the last element in a string (or list) s, you can use s[len(s)-1], but much more common is to use the negative indexing shorthand s[-1].

s[len(s)-1]
'd'
s[-1]
'd'
mylist[-1]
1

Another use of indexing is to get a sequence of elements (in a string, or a list, or …). Calling s[1:4] will include s[1], s[2], s[3], but it will not include s[4]. That is a common convention in Python: the right endpoint is usually not included.

This type of indexing is called slicing.

s[1:4]
'ell'
s
'Hello, world'

The string s[:5] is the same as s[0:5]. One benefit of the “right endpoint is not included” convention is that s[:5] will be length 5.

s[:5]
'Hello'
s[0:5]
'Hello'
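Here is a minimal sketch of a second benefit of excluding the right endpoint. For any cut point k, the slices s[:k] and s[k:] share no characters, and together they rebuild s. The loop variable k is just an illustrative name.

```python
s = "Hello, world"

# For every cut point k, the two slices do not overlap and together rebuild s
for k in range(len(s) + 1):
    assert s[:k] + s[k:] == s
    assert len(s[:k]) == k

# More generally, s[a:b] has length b - a (for 0 <= a <= b <= len(s))
print(len(s[1:4]))  # 3
```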

Similarly, the notation s[3:] will go from the element s[3] to the end.

s[3:]
'lo, world'
mylist[1:]
[1, 4, 1]
mylist
[3, 1, 4, 1]

Another type of slicing will specify the step size (this is similar to the colon notation from Matlab, although with Python slicing, the step size goes at the end, whereas in Matlab the step size goes in the middle). For example, s[1:7:2] will include s[1], s[3], s[5], but not s[7] because right endpoints are not included.

s[1:7:2]
'el,'
s
'Hello, world'

The following gets all the even-indexed elements in s.

s[::2]
'Hlo ol'

Similarly, s[1::2] gets all the odd-indexed elements in s (we start at index 1 and go up in steps of 2).

s[1::2]
'el,wrd'
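The step size can also be negative, which walks through the object backwards. This is a short sketch using the same s and mylist as above. Leaving out the start and stop with a step of -1 is a common way to reverse a string or list.

```python
s = "Hello, world"
mylist = [3, 1, 4, 1]

# A negative step walks backwards; with start and stop omitted it covers everything
print(s[::-1])       # dlrow ,olleH
print(mylist[::-1])  # [1, 4, 1, 3]

# Every other character, starting from the last one (indices 11, 9, 7, 5, 3, 1)
print(s[::-2])       # drw,le
```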

lists, tuples, and sets

In the previous section, we considered two data types (lists and strings) which seem quite different. Here we will look at three data types (lists, tuples, and sets) that on the surface seem very similar. The difference between lists and tuples is more subtle, but sets are very different from the other two.

As we already saw, you can construct a list using square brackets. To construct a tuple, you can use round parentheses, and to construct a set you can use curly brackets.

mylist = [3,1,4,1]
mytuple = (3,1,4,1)
myset = {3,1,4,1}
type(mytuple)
tuple
type(myset)
set

Here is a first difference with sets: like the mathematical notion of a set, sets in Python do not allow repeated elements. Notice how there is only a single 1 shown in myset, even though we defined myset using myset = {3,1,4,1}.

myset
{1, 3, 4}
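Because duplicates are dropped, a quick way to count the distinct values in a list is to convert it to a set and take the length. This is a small sketch. The longer list below is just an illustrative example.

```python
mylist = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Converting to a set keeps exactly one copy of each value
distinct = set(mylist)
print(len(mylist))    # 11
print(len(distinct))  # 7
```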

Here is a major difference between sets and the other two: sets do not support indexing. This is because, like the mathematical notion of a set, sets in Python do not have a notion of order… there is no “zeroth” element in a set.

myset[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_37067/435093538.py in <module>
----> 1 myset[0]

TypeError: 'set' object is not subscriptable
mytuple[0]
3
mylist[0]
3

Sets, tuples, and lists all have a notion of length.

len(mytuple)
4
len(myset)
3
mylist[2]
4

Here is a first difference between lists and tuples: you can change elements in a list but not in a tuple. This is usually described as saying that tuples are “immutable”.

mylist[2] = 17
mylist
[3, 1, 17, 1]
mytuple[2]
4
mytuple[2] = 17
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_37067/2796111138.py in <module>
----> 1 mytuple[2] = 17

TypeError: 'tuple' object does not support item assignment

Another consequence of lists being mutable and tuples being immutable is that lists have an append method and tuples do not. (A method is very similar to a function. It is like a function which is attached to a specific data type.)

mylist.append(8)
mylist
[3, 1, 17, 1, 8]
mytuple.append(8)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_37067/4200240113.py in <module>
----> 1 mytuple.append(8)

AttributeError: 'tuple' object has no attribute 'append'
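A tuple cannot grow in place, but you can build a new, longer tuple with +. This is a minimal sketch. Note the trailing comma in (8,): without it, (8) is just the integer 8 in parentheses.

```python
mytuple = (3, 1, 4, 1)

# "+" creates a brand-new tuple; the original tuple is not modified
newtuple = mytuple + (8,)
print(newtuple)  # (3, 1, 4, 1, 8)
print(mytuple)   # (3, 1, 4, 1)
```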

It might seem like lists are better than tuples in every way. One key advantage of a tuple over a list is that you can use tuples in certain situations where lists are not allowed. As an example, an element in a set is allowed to be a tuple, but you cannot put a list into a set.

newset = {3,1,(4,1)}
type(newset)
set
len(newset)
3
newset2 = {3,1,[4,1]}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_37067/4210761174.py in <module>
----> 1 newset2 = {3,1,[4,1]}

TypeError: unhashable type: 'list'
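The word "unhashable" in the error message is the key. Set elements must be hashable, which you can test directly with the built-in hash function. This is a sketch, and the variable names are illustrative. A tuple is only hashable if everything inside it is hashable too.

```python
# Equal tuples produce equal hash values, so tuples can go in a set
tuple_ok = hash((4, 1)) == hash((4, 1))

# A list is mutable, so Python refuses to hash it
try:
    hash([4, 1])
    list_ok = True
except TypeError as err:
    list_ok = False
    print(err)  # e.g. unhashable type: 'list'

# A tuple that contains a list is not hashable either
try:
    hash((4, [1]))
    nested_ok = True
except TypeError:
    nested_ok = False
```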

ranges and for loops

A Python range object is maybe a little more unusual (or at least more specialized) than the other Python data types we have seen before. On the other hand, range objects usually show up very early when learning Python, because many of the most basic types of for loops are made using range objects. Before we see those for loop examples using range objects, let’s see some for loop examples using the data types we have already met.

mystring = "Hello, world"
mylist = [3,1,4,1]
mytuple = (3,1,4,1)
myset = {3,1,4,1}
for x in mylist:
    print(x)
3
1
4
1
for x in mystring:
    print(x)
H
e
l
l
o
,
 
w
o
r
l
d

It may come as a surprise that white space in Python is very meaningful. For example, notice how all three of the following print statements are indented the same amount. Python executes all three of these print statements on each iteration of the for loop.

for x in mystring:
    print(x)
    print(type(x))
    print(len(x))
H
<class 'str'>
1
e
<class 'str'>
1
l
<class 'str'>
1
l
<class 'str'>
1
o
<class 'str'>
1
,
<class 'str'>
1
 
<class 'str'>
1
w
<class 'str'>
1
o
<class 'str'>
1
r
<class 'str'>
1
l
<class 'str'>
1
d
<class 'str'>
1

On the other hand, in the following, only the print(x) command is indented, so only that command is repeated by the for loop. The other two print commands are executed after the for loop has completed.

for x in mystring:
    print(x)
print(type(x))
print(len(x))
H
e
l
l
o
,
 
w
o
r
l
d
<class 'str'>
1
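The same indentation rule is what makes a running total work. In this sketch (total is an illustrative name), the indented line runs once per element. The unindented print runs only once, after the loop has finished.

```python
mylist = [3, 1, 4, 1]

total = 0
for x in mylist:
    total = total + x  # indented: runs once for each element of mylist
print(total)           # not indented: runs once, after the loop (prints 9)
```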
for x in mytuple:
    print(x)
3
1
4
1
for x in myset:
    print(x)
1
3
4
myset
{1, 3, 4}

Here is our first example of a range object. In this case, it is used as a quick way to repeat the print statement 5 times.

for i in range(5):
    print("Hello, world")
Hello, world
Hello, world
Hello, world
Hello, world
Hello, world

We never used the i variable in the previous example. Here is an example using the i variable. The syntax is very similar to the slicing syntax from above: in particular, the step size comes at the end.

for i in range(2,10,3):
    print(i)
2
5
8
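To see the connection with slicing concretely, the values produced by range(2,10,3) are exactly the indices that the slice [2:10:3] visits. This is a quick sketch. The list digits is just an illustrative helper.

```python
# range(start, stop, step) lists the same positions that slicing with start:stop:step visits
print(list(range(2, 10, 3)))  # [2, 5, 8]

digits = list(range(10))      # [0, 1, 2, ..., 9]
print(digits[2:10:3])         # [2, 5, 8]
```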

You can just as easily move backwards (both in slicing and in range objects) by specifying a negative step size. Notice that, as usual in Python, the stopping value (3 in this case) is not included.

for i in range(10,3,-1):
    print(i)
10
9
8
7
6
5
4
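One thing to watch for: counting down needs the negative step. If you leave the step at its default of 1 while the start is larger than the stop, you get an empty range, and a for loop over it silently runs zero times. This is a short sketch.

```python
# Counting down requires an explicit negative step
print(list(range(10, 3, -1)))  # [10, 9, 8, 7, 6, 5, 4]

# With the default step of +1 and start > stop, the range is empty
print(list(range(10, 3)))      # []
print(len(range(10, 3)))       # 0
```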

The expression range(10,3,-1) above is its own type of object, a range object.

myrange = range(10,3,-1)
type(myrange)
range
myrange
range(10, 3, -1)
myrange2 = range(3,1000)
myrange2[0]
3
myrange2[-1]
999

Range objects support slicing. That’s not too important; maybe the most interesting thing about slicing in the context of range objects is that it again produces a range object.

myrange2[10:100:4]
range(13, 103, 4)

You can also compute the length of a range object, the same way as you can compute the length of a string, list, tuple, or set.

len(myrange2)
997
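For a positive step, the length can be computed without listing the elements: it is the number of steps of size step that fit between start and stop, rounded up. The helper range_length below is a hypothetical name used only for this sketch. It uses integer arithmetic to avoid floating point issues and assumes step > 0.

```python
def range_length(start, stop, step):
    """Number of values in range(start, stop, step); assumes step > 0 (illustrative helper)."""
    # (n + step - 1) // step is integer ceiling division of n by step
    return max(0, (stop - start + step - 1) // step)

print(range_length(3, 1000, 1), len(range(3, 1000)))       # 997 997
print(range_length(0, 10**7, 3), len(range(0, 10**7, 3)))  # 3333334 3333334
```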

ints, floats, and bools

We’re almost done with what I consider the “essential” Python data types. We will introduce three more data types in this section, and we will introduce dictionaries in the next section.

z = 10/2

The result of the following expression is the Boolean value True. Be sure to notice that the word True is capitalized (and be sure not to put it in quotation marks; it is not a string).

z == 5
True
True
True
true
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_45994/592217714.py in <module>
----> 1 true

NameError: name 'true' is not defined
False
False
type(False)
bool

Because z == 5 is True, you might expect range(z) to work the same as range(5), but there is a subtle difference. In Python, the division operator / always produces a floating point number (like a decimal, not an integer), even when the result is a whole number, so z is the float 5.0. Range objects can only be created using integers.

range(5)
range(0, 5)
range(z)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_45994/374006378.py in <module>
----> 1 range(z)

TypeError: 'float' object cannot be interpreted as an integer
type(5)
int
type(z)
float
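Two common ways around this are sketched below: floor division //, which keeps the result an integer when both inputs are integers, and an explicit int() conversion. Note that int() truncates toward zero, so it should only be used this way when the float really is a whole number. The name w is illustrative.

```python
z = 10 / 2
print(z, type(z))           # 5.0 <class 'float'>

# Floor division of two ints gives an int
w = 10 // 2
print(w, type(w))           # 5 <class 'int'>

# Explicit conversion, after checking that z is a whole number
print(z.is_integer())       # True
print(list(range(int(z))))  # [0, 1, 2, 3, 4]
```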

Because computers can only store decimals (floats, real numbers) to a finite degree of precision, some subtleties are inevitable. One consequence is that it is almost never correct to ask whether two floating point numbers are exactly equal. Here is an example where two numbers are obviously equal mathematically, but Python reports them as unequal.

(0.1 + 0.1 + 0.1) == 0.3
False
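Printing the sum shows the tiny rounding error. The standard library function math.isclose compares two floats up to a small relative tolerance (by default 1e-09), which is usually what we mean by "equal" for floats. This is a minimal sketch.

```python
import math

a = 0.1 + 0.1 + 0.1
print(a)                     # 0.30000000000000004
print(a == 0.3)              # False
print(math.isclose(a, 0.3))  # True: equal up to a small relative tolerance
```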

dictionaries

The last built-in Python data type I want to introduce in this portion of the class is the dictionary data type. Here is an example of constructing a dictionary. The portions before the colons are called “keys” and the portions after the colons are called “values”.

d = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}
type(d)
dict

The next two errors are saying that 20 and 0 (the integers, not the strings) do not occur as keys in d.

d[20]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_50443/1661466599.py in <module>
----> 1 d[20]

KeyError: 20
d[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_50443/1089268471.py in <module>
----> 1 d[0]

KeyError: 0

Here is an example of adding a new key to the dictionary. Notice how it gets displayed without quotation marks, because 0 is not the same as "0" (the first one is an integer, while the second one is a string).

d[0] = [3,1,4,1]
d
{'A': 90, 'B': 80, 'C': 70, 'D': 60, 'F': 0, 0: [3, 1, 4, 1]}

We can also change the value associated with a key using the same kind of syntax. Notice how the old value is replaced; there cannot be repeated keys in a dictionary.

d["B"] = 85
d
{'A': 90, 'B': 85, 'C': 70, 'D': 60, 'F': 0, 0: [3, 1, 4, 1]}

I probably should have done this earlier, but here is how you access the value associated with a key, for example the key "C".

d["C"]
70
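Square-bracket access raises a KeyError for a missing key, as we saw with d[20]. The dictionary method get returns None instead, or a default value you supply. This is a short sketch using the original grades dictionary. The default "n/a" is an arbitrary choice.

```python
d = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}

print(d["C"])            # 70
# .get returns None (or a chosen default) instead of raising KeyError
print(d.get(20))         # None
print(d.get(20, "n/a"))  # n/a
```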

Here we add a new key, and the value is itself a dictionary. The two things to notice in this next example are that both 0 and "0" show up as keys, and that a dictionary can be a value in a dictionary.

d["0"] = {"hello": "first word", "world": "second word"}
d
{'A': 90,
 'B': 85,
 'C': 70,
 'D': 60,
 'F': 0,
 0: [3, 1, 4, 1],
 '0': {'hello': 'first word', 'world': 'second word'}}

Remember that lists were not allowed to go in sets, but tuples were allowed. It is the same with keys in dictionaries. (Values in dictionaries can be anything, but there are restrictions on keys; the same restrictions as on elements in sets.)

myset = {3,1,[4,1]}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_50443/1236779418.py in <module>
----> 1 myset = {3,1,[4,1]}

TypeError: unhashable type: 'list'
{3,1,(4,1)}
{(4, 1), 1, 3}

We get the exact same error if we try to make the list [4,1] a key in our dictionary.

d[[4,1]] = 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_50443/746167574.py in <module>
----> 1 d[[4,1]] = 1

TypeError: unhashable type: 'list'

But making the tuple (4,1) a key in our dictionary works fine.

d[(4,1)] = 1
d
{'A': 90,
 'B': 85,
 'C': 70,
 'D': 60,
 'F': 0,
 0: [3, 1, 4, 1],
 '0': {'hello': 'first word', 'world': 'second word'},
 (4, 1): 1}

Converting from one type to another

mylist = [3,1,4,1]

Here is the syntax for creating a tuple out of mylist. Notice that it does not actually change the value of mylist itself. (If something gets displayed as a result of the operation, as (3,1,4,1) does in this case, that is often a hint that the original object did not change.)

tuple(mylist)
(3, 1, 4, 1)
type(mylist)
list

If you actually want to be able to use that tuple you’ve created, you need to save it.

x = tuple(mylist)
type(x)
tuple
x
(3, 1, 4, 1)

Many of the objects we have seen so far can be converted into other data types. For example, this is what happens when you convert from a string to a list.

s = "Hello world"
list(s)
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

Maybe you would have expected to get a list of words instead of a list of characters. The fact that we get a list of characters is very similar to what happens if we iterate over the elements in the string using a for loop.

for x in s:
    print(x)
H
e
l
l
o
 
w
o
r
l
d
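If you do want a list of words, the string method split is the usual tool. Called with no argument, it splits on runs of white space. This is a minimal sketch.

```python
s = "Hello world"

print(list(s))    # a list of 11 single characters
print(s.split())  # ['Hello', 'world']
```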

Here is an example where conversion does not work. This particular list mylist cannot be converted to a dictionary.

mylist
[3, 1, 4, 1]
dict(mylist)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_50904/677643877.py in <module>
----> 1 dict(mylist)

TypeError: cannot convert dictionary update sequence element #0 to a sequence

Some lists can be converted to a dictionary. For example, this list of length-2 tuples can be converted. Python interprets the 0th element in each tuple as the key and the 1st element in each tuple as the value. (Remember that keys can’t be repeated in a dictionary, which is why we only see the key 1 one time: the later pair (1,"c") overwrites the earlier pair (1,"b").)

mylist2 = [(3,"a"),(1,"b"),(1,"c")]
dict(mylist2)
{3: 'a', 1: 'c'}
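A related pattern, sketched below under the assumption that you start with separate lists of keys and values (keys and values are illustrative names), is to pair them up with the built-in zip. This produces exactly the kind of sequence of pairs that dict accepts.

```python
mylist2 = [(3, "a"), (1, "b"), (1, "c")]
print(dict(mylist2))  # {3: 'a', 1: 'c'}: the later value for key 1 wins

# zip pairs the two lists element by element: ("A", 90), ("B", 80), ("C", 70)
keys = ["A", "B", "C"]
values = [90, 80, 70]
print(dict(zip(keys, values)))  # {'A': 90, 'B': 80, 'C': 70}
```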

A surprisingly useful conversion (that is often done automatically) is going from the Boolean values True and False to the integer values 1 and 0 (respectively). One reason this is useful is that if you add together the elements in a list or array of Trues and Falses, the result will be exactly the number of Trues.

int(True)
1
int(False)
0
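Here is a small sketch of the counting trick just described. Each comparison produces True or False, and sum adds them up as 1s and 0s. The result is the number of elements of mylist that are greater than 2 (the name flags is illustrative).

```python
mylist = [3, 1, 4, 1]

# One Boolean per element: is that element greater than 2?
flags = [3 > 2, 1 > 2, 4 > 2, 1 > 2]
print(flags)       # [True, False, True, False]
print(sum(flags))  # 2
```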

You can also add an integer to a Boolean value, and Python will automatically convert the Boolean value to an integer before doing the addition. This functionality itself is not very important, but it’s very important to remember in general that True corresponds to 1 and that False corresponds to 0.

4+True
5

Timing comparisons

A fundamental advantage of sets over lists and tuples is that searching in a set (checking whether an element belongs to it) is much faster.

10000000
10000000

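That integer is pretty hard to read, so we would rather write it as 10 to the 7th power. Notice that we cannot make exponents in Python using the caret symbol ^ (it means something else, shown below by the surprising result 13); instead, you have to use **.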

10^7
13
10**7
10000000
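The 13 comes from the fact that ^ is the bitwise "exclusive or" (XOR) operator. It compares the binary digits of 10 (1010) and 7 (0111) and keeps the bits set in exactly one of them, giving 1101, which is 13. The sketch also shows underscores as digit separators, which Python has allowed in number literals since version 3.6 and which make large integers easier to read.

```python
# ^ is bitwise exclusive or (XOR), not exponentiation
print(bin(10), bin(7))  # 0b1010 0b111
print(10 ^ 7)           # 13, which is 0b1101
print(10 ** 7)          # 10000000

# Underscores can separate digits in integer literals
print(10_000_000 == 10**7)  # True
```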

Let’s make a range object and convert it into a list, into a tuple, and into a set.

r = range(0,10**7,3)
mylist = list(r)
mytuple = tuple(r)
myset = set(r)

Here we use the operator in to check if 0 is in the corresponding object.

0 in mylist
True
1 in mylist
False

Here is a special feature of Jupyter notebooks: the %%timeit cell magic, which times how long the code in a cell takes to run. I usually only pay attention to the left-most number, which in this case says 36.2 nanoseconds, where one nanosecond is \(10^{-9}\) seconds.

%%timeit
0 in mylist
36.2 ns ± 0.455 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Checking in mytuple takes a very similar amount of time.

%%timeit
0 in mytuple
38.4 ns ± 1.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I said that checking in a set was faster, but it doesn’t seem to be; in this case it’s actually slightly slower.

%%timeit
0 in myset
42.9 ns ± 1.54 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

But checking for 0 was fast in the list and tuple examples only because 0 is their initial element, so Python found it immediately. If we instead check for 1, which is not in the list (or the tuple, or the set), then Python has to look at every element of the list before it can report False, and we will see that the set version of searching is much faster. Notice that the unit for the list version is ms rather than ns, where one millisecond is \(10^{-3}\) seconds. So in this case, the set version is about one million (!) times faster.

%%timeit
1 in mylist
55.2 ms ± 4.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
1 in myset
43.1 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
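Outside of Jupyter, the standard library module timeit can make a similar comparison. The sketch below is an approximation of the notebook experiment, not a reproduction of it. It uses a smaller range to keep memory use modest, and the exact times will depend on your machine. timeit.timeit calls the given function number times and returns the total time in seconds.

```python
import timeit

# Smaller than the notebook's range(0, 10**7, 3), to keep memory use modest
r = range(0, 10**6, 3)
mylist = list(r)
myset = set(r)

n = 20
list_time = timeit.timeit(lambda: 1 in mylist, number=n)  # scans the whole list each time
set_time = timeit.timeit(lambda: 1 in myset, number=n)    # hash-based lookup

print(f"list: {list_time / n:.2e} seconds per search")
print(f"set:  {set_time / n:.2e} seconds per search")
```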

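The fact that we can search in a set very fast is related to the restrictions on the elements that can go in a set. (For example, remember that we are not allowed to put a list into a set, and that the error message called a list "unhashable".) Python uses the hash value of an element to jump almost directly to the place where that element would be stored, instead of checking the elements one by one. For the same underlying reason, searching for a key in a dictionary is also very fast.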
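One detail worth knowing, shown in this short sketch: the operator in applied to a dictionary looks only at the keys. To search the values you have to use d.values(). That search works like a search through a list, checking the values one at a time, so it does not get the fast hash-based lookup.

```python
d = {"A": 90, "B": 85, "C": 70, "D": 60, "F": 0}

print("B" in d)          # True: "B" is a key
print(85 in d)           # False: 85 is a value, not a key
print(85 in d.values())  # True, found by checking the values one at a time
```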