Data types in Python

Python has a huge number of data types, both built-in types (like list) and types that come from external libraries (like NumPy arrays). Different data types can often look quite similar. Here are five data types that share some common properties:

  • list

  • set

  • tuple

  • range

  • NumPy array

It is important to recognize how these types are similar and how they differ. Understanding the pros and cons of different data types is essential to selecting the data type best suited to a given task.

list

One of the most fundamental data types in Python is the list type.

my_list = [3,1,4,1,5]
type(my_list)
list
len(my_list)
5

Notice that the following probably doesn’t produce what you expect: my_list[2] is 4, which is the third element of the list, not the second. The reason is that indexing in Python starts at 0, not at 1.

# indexing
my_list[2]
4
# indexing
my_list[0]
3
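Python also allows negative indices, which count backwards from the end of a list. This is the convention behind new_list[-1] later in this section. Here is a small sketch using the same values as my_list. The variable names first and last are only for illustration.

```python
my_list = [3, 1, 4, 1, 5]

# Index 0 is the first element; index len(my_list) - 1 is the last.
first = my_list[0]                   # 3
last = my_list[len(my_list) - 1]     # 5

# Negative indices count backwards from the end: -1 is the last element.
print(my_list[-1] == last)           # prints: True
print(my_list[-1], my_list[-2])      # prints: 5 1
```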

If you remember the colon operator from Matlab, then the following, which is known as slicing, should look familiar.

# slicing
my_list[2:]
[4, 1, 5]

One difference from Matlab is that in Python the right endpoint of a slice is not included. For example, my_list[2:4] produces a list that contains my_list[2] and my_list[3] but not my_list[4].

my_list[2:4]
[4, 1]
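Because the right endpoint is excluded, a slice my_list[a:b] has exactly b - a elements when 0 <= a <= b <= len(my_list). Likewise, splitting a list at any index k and gluing the two pieces back together recovers the original list. The following quick check illustrates both facts. The names piece and k are illustrative choices.

```python
my_list = [3, 1, 4, 1, 5]

# my_list[2:4] contains the elements at indices 2 and 3, but not 4.
piece = my_list[2:4]
print(piece)                 # prints: [4, 1]
print(len(piece) == 4 - 2)   # prints: True

# Splitting at index k and concatenating gives back the original list.
k = 3
print(my_list[:k] + my_list[k:] == my_list)   # prints: True
```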
my_list
[3, 1, 4, 1, 5]
my_list[2] = "hello Math 10"
my_list
[3, 1, 'hello Math 10', 1, 5]
# this changes my_list
my_list.append(4)
my_list
[3, 1, 'hello Math 10', 1, 5, 4]
new_list = [3,1,4,1]

A common source of mistakes is assuming that a command changes an object when it actually leaves the original unchanged. In the following code, sorted(new_list) doesn’t change new_list; it returns a new, sorted list.

# has no effect on new_list
sorted(new_list)
[1, 1, 3, 4]
new_list
[3, 1, 4, 1]

To actually replace new_list with a sorted version of itself, assign the result back to the variable:

# to change the original
new_list = sorted(new_list)
new_list
[1, 1, 3, 4]
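Lists also have a sort method, which changes the list in place. Unlike sorted, it does not return the sorted list; it returns None. Writing new_list = new_list.sort() would therefore replace the list with None. Here is a minimal comparison of the two.

```python
new_list = [3, 1, 4, 1]

# sorted() returns a new list and leaves the original alone.
copy_sorted = sorted(new_list)
print(new_list)        # prints: [3, 1, 4, 1]

# .sort() rearranges new_list itself and returns None.
result = new_list.sort()
print(new_list)        # prints: [1, 1, 3, 4]
print(result)          # prints: None
```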
new_list.extend([3,1,4])
new_list
[1, 1, 3, 4, 3, 1, 4]
new_list.append([3,1,4])
new_list
[1, 1, 3, 4, 3, 1, 4, [3, 1, 4]]
new_list[-1]
[3, 1, 4]
type(new_list[-1])
list
type(new_list[-2])
int
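The cells above show the difference between extend and append. extend adds each element of its argument to the end of the list. append adds its argument as one single element, even when that argument is itself a list. Here the two methods are shown side by side on two identical starting lists, a and b, which are illustrative names.

```python
a = [1, 1, 3, 4]
b = [1, 1, 3, 4]

a.extend([3, 1, 4])   # adds three new elements
b.append([3, 1, 4])   # adds one new element, which is a list

print(len(a), a)      # prints: 7 [1, 1, 3, 4, 3, 1, 4]
print(len(b), b)      # prints: 5 [1, 1, 3, 4, [3, 1, 4]]
```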

set

A set in Python is similar to a list in many ways; for example, it contains a collection of objects. One difference is that a set has no order and cannot contain repeated elements. Sets are also more restrictive about what they can contain: their elements must be hashable, which rules out mutable objects such as lists (numbers, strings, and tuples of these are fine).

We start by viewing the list my_list.

my_list
[3, 1, 'hello Math 10', 1, 5, 4]

Here is the standard way to convert data types in Python. In this case, we are converting from a list to a set.

my_set = set(my_list)
my_set
{1, 3, 4, 5, 'hello Math 10'}
type(my_set)
set

Notice how the square brackets [...] were replaced by curly braces {...}, the order changed, and the repeated 1 disappeared.

set([3,1,4,1,-10,5])
{-10, 1, 3, 4, 5}
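Because repetitions disappear, converting to a set is a quick way to count or list the distinct values in a list. Membership tests with in work on sets just as they do on lists. The following short sketch uses the values from the cell above.

```python
values = [3, 1, 4, 1, -10, 5]

distinct = set(values)
print(len(values), len(distinct))   # prints: 6 5

# sorted() accepts a set and returns an ordinary (ordered) list.
print(sorted(distinct))             # prints: [-10, 1, 3, 4, 5]

# Checking membership works the same way as for a list.
print(4 in distinct, 2 in distinct) # prints: True False
```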

Sets don’t have a notion of order, so asking for the element at index 2 doesn’t make sense.

my_set[2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/1306209179.py in <module>
----> 1 my_set[2]

TypeError: 'set' object is not subscriptable
len(my_set)
5
my_set
{1, 3, 4, 5, 'hello Math 10'}

Here is how to write a set directly, using curly braces (as opposed to converting from a list).

{2,10,4}
{2, 4, 10}
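There is one caveat about curly braces. An empty pair, {}, creates an empty dictionary (another built-in type), not an empty set. To make an empty set, call set() with no arguments, as in the following sketch. The variable names are illustrative.

```python
empty_braces = {}
s = set()

print(type(empty_braces))   # prints: <class 'dict'>
print(type(s))              # prints: <class 'set'>

# Elements can be added to a set one at a time; duplicates are ignored.
s.add(2)
s.add(2)
print(s)                    # prints: {2}
```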

tuple

In simple cases, a tuple is almost indistinguishable from a list. In the first few examples below, the only visible difference is that a tuple uses round parentheses instead of square brackets.

my_tuple = (3,1,4,1)
my_tuple
(3, 1, 4, 1)
my_tuple[2]
4
my_tuple[1:3]
(1, 4)
len(my_tuple)
4

To convert a tuple into a list, we can pass it to list, as in this example. The first line makes the conversion, and the second line displays the result.

my_list = list(my_tuple)
my_list
[3, 1, 4, 1]

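How are tuples different from lists? The main difference is that a tuple cannot be changed after it has been created (tuples are immutable). In the following cell, "hi" is printed, but the error on the second line stops execution before "hello" is printed.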

print("hi")
my_tuple[2] = 10
print("hello")
hi
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/3086213120.py in <module>
      1 print("hi")
----> 2 my_tuple[2] = 10
      3 print("hello")

TypeError: 'tuple' object does not support item assignment

Similarly, tuples have no append method:

my_tuple.append(4)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/3420063398.py in <module>
----> 1 my_tuple.append(4)

AttributeError: 'tuple' object has no attribute 'append'
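Because a tuple cannot be modified, the way to get a tuple with an extra element is to build a new one, for example by concatenating it with another tuple. Note the trailing comma in (4,). The comma is what makes it a one-element tuple rather than just the number 4 in parentheses. The name longer is only illustrative.

```python
my_tuple = (3, 1, 4, 1)

# Concatenation creates a brand-new tuple; my_tuple itself is unchanged.
longer = my_tuple + (4,)
print(longer)      # prints: (3, 1, 4, 1, 4)
print(my_tuple)    # prints: (3, 1, 4, 1)

# Without the comma, (4) is just the integer 4.
print(type((4,)), type((4)))   # prints: <class 'tuple'> <class 'int'>
```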

Why would we ever use tuples? One pragmatic advantage of a tuple over a list is that a tuple can go in a set (as long as each of its elements could also go in a set), while a list can’t.

# Why would we ever use tuples?
# tuples can go in a set, and lists can't
{5,10,my_tuple}
{(3, 1, 4, 1), 10, 5}
{5,10,my_list}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/3098496682.py in <module>
----> 1 {5,10,my_list}

TypeError: unhashable type: 'list'
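For example, a tuple that contains a list cannot go in a set, because the list inside it is unhashable. In the following sketch, ok_tuple and bad_tuple are illustrative names. The exact wording of the error message can vary between Python versions, so the code does not assume it.

```python
ok_tuple = (3, 1, 4, 1)
bad_tuple = (3, [1, 4])   # a tuple whose second element is a list

print({5, 10, ok_tuple})  # works: every element of ok_tuple is a number

caught = False
try:
    s = {5, 10, bad_tuple}
except TypeError as err:
    caught = True
    print("TypeError:", err)   # the message mentions the unhashable list
```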

Another practical reason tuples are important is that Python code written by other people often uses them. Here is a small example, which also previews the NumPy arrays discussed below.

import numpy as np
A = np.array(my_list)
A
array([3, 1, 4, 1])

The variable A represents a NumPy array.

type(A)
numpy.ndarray

NumPy arrays have a shape attribute which tells us the dimensions of the array. (That is especially useful if A represents a matrix.)

A.shape
(4,)

This shape attribute is represented as a tuple.

type(A.shape)
tuple
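A two-dimensional array, such as one built from a list of lists, has a shape tuple with one entry per dimension: (number of rows, number of columns). Here is a small example using an illustrative 2x3 matrix M.

```python
import numpy as np

# A 2x3 matrix built from a list of two rows.
M = np.array([[1, 2, 3],
              [4, 5, 6]])
print(M.shape)           # prints: (2, 3)

# The shape is an ordinary tuple, so it can be unpacked or indexed.
rows, cols = M.shape
print(rows, cols)        # prints: 2 3
print(M.shape[1])        # prints: 3
```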

Stack Overflow has a discussion of the differences between lists and tuples, although, like many discussions on Stack Overflow, it is a little advanced for us as we’re just beginning with Python.

range

We won’t use range by itself very often, but we will use it all the time to produce repetitions, such as in a for loop. The range function is similar to the colon operator in Matlab, except that with range the step size goes last rather than in the middle, and (as with slicing) the stopping value is not included.

my_range = range(0,100,3)
my_range
range(0, 100, 3)

The object my_range has yet another type in Python, range.

type(my_range)
range
list(my_range)
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]
list(range(0,10,1))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
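As mentioned above, range most often shows up in a for loop. The loop below uses the start, stop, step order, and the stop value itself is never reached. The list visited exists only to record what the loop sees.

```python
# Collect the values produced by range(0, 10, 3): start 0, stop 10, step 3.
visited = []
for i in range(0, 10, 3):
    visited.append(i)
print(visited)                 # prints: [0, 3, 6, 9]

# A negative step counts down; again the stop value (0) is excluded.
print(list(range(5, 0, -1)))   # prints: [5, 4, 3, 2, 1]
```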

range accepts only integer arguments, so a step of 0.5 raises an error.

range(0,10,0.5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/1541136516.py in <module>
----> 1 range(0,10,0.5)

TypeError: 'float' object cannot be interpreted as an integer

One reason range is useful is that it can represent an enormous sequence of integers without storing them in memory. In the following, we want to write \(10^{40}\), which in Python is written 10**40 (the caret symbol ^ means bitwise XOR in Python, not exponentiation).

my_range = range(0,10**40,3)

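It would be a very bad idea to try to convert that into a list, because no computer can store the roughly \(3.3 \times 10^{39}\) integers it represents. In fact, Python raises an OverflowError immediately, because the number of elements is too large to fit in the C integer type that Python uses for lengths.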

list(range(0,10**40,3))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/3285818751.py in <module>
----> 1 list(range(0,10**40,3))

OverflowError: Python int too large to convert to C ssize_t
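To see the memory savings concretely, we can use sys.getsizeof from the standard library, which reports the size in bytes of an object itself. A range object stores only a few numbers (such as its start, stop, and step), so its size does not grow with its length, while a list’s size does. The exact byte counts depend on the Python version and platform, so this sketch compares sizes rather than printing specific numbers.

```python
import sys

small_range = range(0, 10, 3)
huge_range = range(0, 10**40, 3)

# Both range objects have the same size: neither stores its elements.
print(sys.getsizeof(small_range) == sys.getsizeof(huge_range))   # prints: True

# A list stores a reference to every element, so it grows with its length.
short_list = list(range(10))
long_list = list(range(100_000))
print(sys.getsizeof(short_list) < sys.getsizeof(long_list))       # prints: True
```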

Even though my_range doesn’t literally contain all of those numbers, Python is still smart enough to answer some basic questions about my_range. For example:

  • Is 252523525252421 in my_range?

252523525252421 in my_range
False

What about 252523525252422?

252523525252422 in my_range
True

  • What are the last 4 elements in my_range?

my_range[-4:]
range(9999999999999999999999999999999999999990, 10000000000000000000000000000000000000002, 3)

Slicing a range produces another range object, so those 4 elements are also represented as a range. We can explicitly convert it to a list.

list(my_range[-4:])
[9999999999999999999999999999999999999990,
 9999999999999999999999999999999999999993,
 9999999999999999999999999999999999999996,
 9999999999999999999999999999999999999999]
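How can Python answer these questions without listing the elements? For integers and a positive step, membership can be decided with arithmetic alone. An integer x is in range(start, stop, step) exactly when start <= x < stop and x - start is a multiple of step. The function in_positive_step_range below is a hypothetical helper written for illustration, not part of Python. It checks that condition and should agree with in on the examples above.

```python
def in_positive_step_range(x, start, stop, step):
    """Hypothetical helper: is integer x in range(start, stop, step)?

    Assumes step > 0. Uses arithmetic only, so it never builds the sequence.
    """
    return start <= x < stop and (x - start) % step == 0


my_range = range(0, 10**40, 3)
for x in [252523525252421, 252523525252422, 10**40 - 1, 10**40 + 2]:
    # Python's own `in` test and the helper should print matching values.
    print(x in my_range, in_positive_step_range(x, 0, 10**40, 3))
```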

NumPy arrays

All of the above data types are built into Python. In Math 10, we will often work instead with data types defined in separate Python libraries. NumPy is probably the library we will use second-most often (behind only pandas).

Here is the syntax for importing NumPy. There is a standard abbreviation, np, which should always be used when referring to this library.

import numpy as np

We saw above that range cannot be used with non-integer values. (These decimal numbers are called floats in Python.)

step = 0.5
type(step)
float
range(0,10,step)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/1917164971.py in <module>
----> 1 range(0,10,step)

TypeError: 'float' object cannot be interpreted as an integer

NumPy defines a similar function called arange, which does work with floats. The following, however, does not work.

arange(0,10,step)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_83230/2951658910.py in <module>
----> 1 arange(0,10,step)

NameError: name 'arange' is not defined

The reason arange(0,10,step) does not work is that Python does not know that arange is defined by NumPy. To tell Python where to look for the definition, we use the syntax np.arange.

A = np.arange(0,10,step)
A
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])
type(A)
numpy.ndarray
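With a non-integer step, floating-point rounding can make it hard to predict exactly where np.arange stops. The NumPy documentation suggests np.linspace for such cases. np.linspace(start, stop, num) takes the number of points instead of the step, and it includes the stop value by default. The following sketch reproduces the array A above. B is an illustrative name.

```python
import numpy as np

A = np.arange(0, 10, 0.5)

# 20 evenly spaced points from 0 to 9.5, with both endpoints included.
B = np.linspace(0, 9.5, 20)

print(len(A), len(B))        # prints: 20 20
print(np.allclose(A, B))     # prints: True
```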

NumPy arrays are the last data type we will consider in this notebook. Recall the following syntax for turning my_tuple into a list:

list(my_tuple)
[3, 1, 4, 1]

The same kind of syntax turns my_tuple into a NumPy array:

A = np.array(my_tuple)
A
array([3, 1, 4, 1])
type(A)
numpy.ndarray

Slicing works with NumPy arrays:

A[1:]
array([1, 4, 1])
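There is one difference from lists worth knowing early. Slicing a list makes a new list, but slicing a NumPy array gives a view that shares data with the original array, so changing the slice changes the original. The .copy() method makes an independent copy. Here is a small comparison. The slice variable names are illustrative.

```python
import numpy as np

my_list = [3, 1, 4, 1]
A = np.array(my_list)

list_slice = my_list[1:]
list_slice[0] = 100      # the original list is not affected
print(my_list)           # prints: [3, 1, 4, 1]

array_slice = A[1:]
array_slice[0] = 100     # this also changes A, because the slice is a view
print(A.tolist())        # prints: [3, 100, 4, 1]

independent = A[1:].copy()
independent[0] = -1      # A is unaffected this time
print(A.tolist())        # prints: [3, 100, 4, 1]
```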

There are many NumPy functions which can be applied very efficiently to every element in a NumPy array. Here is an example of applying cosine to every entry in A:

np.cos(A)
array([-0.9899925 ,  0.54030231, -0.65364362,  0.54030231])
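The same elementwise behavior applies to ordinary arithmetic, and here arrays and lists differ sharply. Multiplying a list by 2 repeats the list, while multiplying an array by 2 doubles each entry. Applying cosine to a list requires an explicit loop, written here as a list comprehension with math.cos from the standard library.

```python
import math
import numpy as np

my_list = [3, 1, 4, 1]
A = np.array(my_list)

print(my_list * 2)      # prints: [3, 1, 4, 1, 3, 1, 4, 1]
print((A * 2).tolist()) # prints: [6, 2, 8, 2]

# For a list, apply math.cos to each element one at a time.
cos_list = [math.cos(x) for x in my_list]
print(np.allclose(cos_list, np.cos(A)))   # prints: True
```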

Later we will see how to time operations in Python, and then we will see concretely that many mathematical operations are much faster on a NumPy array than on a list or tuple.