Worksheet
You may submit this worksheet in a group of 1-3 total students.
These worksheets are graded for effort, not for correctness.
Due date. Due at 5:00pm on Tuesday of Week 1.
Question 0:
Name(s):
UCI ID(s):
Motivating question:
If we roll five distinct six-sided dice, what is the probability that the number of 1s rolled is equal to the number of 2s rolled?
This combinatorics question can be solved directly without a computer. (If you’re interested, try to figure out where these numbers come from, or try to come up with an alternative method for getting the same answer.)
\[ \text{prob} = \left(\frac{4}{6}\right)^5 + 20 \, \left(\frac{1}{6}\right)^2 \left(\frac{4}{6}\right)^3 + 30 \, \left(\frac{1}{6}\right)^4 \left(\frac{4}{6}\right) = \frac{101}{324} \approx 0.311728395 \]
In this notebook, we’ll solve the question in several different ways in Python.
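As a quick arithmetic check of the formula above (it does not explain where the coefficients 20 and 30 come from), the three terms can be added as exact fractions. This is a minimal sketch; the variable names `term_a`, `term_b`, `term_c`, and `exact_prob` are illustrative and not part of the worksheet.

```python
from fractions import Fraction

# The three terms of the formula, kept as exact fractions
# (variable names are hypothetical, chosen for this sketch).
term_a = Fraction(4, 6) ** 5
term_b = 20 * Fraction(1, 6) ** 2 * Fraction(4, 6) ** 3
term_c = 30 * Fraction(1, 6) ** 4 * Fraction(4, 6)

exact_prob = term_a + term_b + term_c
print(exact_prob)         # 101/324
print(float(exact_prob))  # decimal approximation of the same value
```

Using `Fraction` instead of floats avoids rounding, so the sum can be compared directly with 101/324.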
import numpy as np
import pandas as pd
from itertools import product
A probability estimate using NumPy
Strategy:
Set a value of \(n\) for “number of experiments”
Make an \(n \times 5\) NumPy array of random integers, where each row represents rolling five distinct six-sided dice. (Be sure to construct this array without any loops. Use a random number generator in Python, constructed using np.random.default_rng().)
Count how many of the experiments were successful.
Estimate the probability using the formula “number of successes” divided by “number of experiments”.
If your answer does not equal the exact probability to at least three decimal places, try increasing \(n\) (or looking for mistakes!).
Here is an example of counting 6s in a NumPy array.
A = np.array([[1, 4, 2, 6, 6],
[5, 6, 4, 3, 5],
[4, 4, 3, 4, 3],
[1, 5, 5, 3, 3],
[3, 1, 6, 2, 5]])
# B has the same shape as A
# B contains True where A contains 6.
B = (A == 6)
B
array([[False, False, False, True, True],
[False, True, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, False, False]])
B.dtype
dtype('bool')
# To count Trues, you can add them.
True + True + False
2
# axis = 0 means: add along the columns, one column at a time (one total per column).
# axis = 1 means: add along the rows, one row at a time (one total per row).
# (Make sure you understand these examples.)
B.sum(axis = 0)
array([0, 1, 1, 1, 1])
B.sum(axis = 1)
array([2, 1, 0, 0, 1])
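The strategy and the counting example above can be combined into one loop-free estimate. To leave Question 1 as an exercise, this sketch estimates a different event: the probability that five dice show at least one 6. The seed, the value of `n`, and the names `rolls`, `sixes_per_row`, and `success` are assumptions made for illustration.

```python
import numpy as np

# Assumptions for this sketch: the seed, n, and the event "at least one 6"
# are illustrative choices, not part of the worksheet question.
rng = np.random.default_rng(seed=0)
n = 10**6

# Each row is one experiment: five dice with values 1 through 6.
# (The upper bound 7 is exclusive.)
rolls = rng.integers(1, 7, size=(n, 5))

# Count the 6s in each row, then mark the rows with at least one 6.
sixes_per_row = (rolls == 6).sum(axis=1)
success = sixes_per_row >= 1

# "number of successes" divided by "number of experiments"
estimate = success.sum() / n

# For this simpler event the exact value is easy to write down.
exact = 1 - (5 / 6) ** 5
print(estimate, exact)
```

For the motivating question, the success condition would instead compare two per-row counts.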
Question 1:
Put your code for the NumPy estimate here. The answer should be correct to three decimal places.
A probability estimate using pandas
We make a similar computation here to practice with pandas.
A = np.array([[1, 4, 2, 6, 6],
[5, 6, 4, 3, 5],
[4, 4, 3, 4, 3],
[1, 5, 5, 3, 3],
[3, 1, 6, 2, 5]])
# Convert the NumPy array to a pandas DataFrame
df = pd.DataFrame(A)
df
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | 4 | 2 | 6 | 6 |
| 1 | 5 | 6 | 4 | 3 | 5 |
| 2 | 4 | 4 | 3 | 4 | 3 |
| 3 | 1 | 5 | 5 | 3 | 3 |
| 4 | 3 | 1 | 6 | 2 | 5 |
df2 = (df == 6)
df2
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | False | False | False | True | True |
| 1 | False | True | False | False | False |
| 2 | False | False | False | False | False |
| 3 | False | False | False | False | False |
| 4 | False | False | True | False | False |
# Note how similar this code is to the NumPy code.
df2.sum(axis = 0)
0 0
1 1
2 1
3 1
4 1
dtype: int64
df2.sum(axis = 1)
0 2
1 1
2 0
3 0
4 1
dtype: int64
# Example of making a new column
df["sixes"] = df2.sum(axis = 1)
df
| | 0 | 1 | 2 | 3 | 4 | sixes |
|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 2 | 6 | 6 | 2 |
| 1 | 5 | 6 | 4 | 3 | 5 | 1 |
| 2 | 4 | 4 | 3 | 4 | 3 | 0 |
| 3 | 1 | 5 | 5 | 3 | 3 | 0 |
| 4 | 3 | 1 | 6 | 2 | 5 | 1 |
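The same steps scale from this 5-row example to a large simulated DataFrame. As a sketch that leaves Question 2 open, the code below estimates a different event: the probability that five dice show exactly two 6s. The seed, `n`, and the names `df_rolls` and `success` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumptions for this sketch: the seed, n, and the event "exactly two 6s"
# are illustrative choices, not part of the worksheet question.
rng = np.random.default_rng(seed=1)
n = 10**6

# One row per experiment, one column per die.
df_rolls = pd.DataFrame(rng.integers(1, 7, size=(n, 5)))

# The right-hand side is evaluated before the new column exists,
# so only the five dice columns are compared with 6.
df_rolls["sixes"] = (df_rolls == 6).sum(axis=1)

# The mean of a Boolean Series is the fraction of True values.
success = df_rolls["sixes"] == 2
estimate = success.mean()

# Exact value for this simpler event: C(5, 2) * (1/6)^2 * (5/6)^3
exact = 10 * (1 / 6) ** 2 * (5 / 6) ** 3
print(estimate, exact)
```

Taking `.mean()` of the Boolean column is equivalent to dividing the number of successes by `n`.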
Question 2:
Put your code for the pandas estimate here. Try to use a big enough dataframe (and efficient enough code) that the answer is correct to three decimal places.
The exact probability using itertools.product
(Optional)
This is good practice but we may not cover itertools in Math 10, so you can skip this if you prefer.
product(["a","b","c"],[0,1,10])
<itertools.product at 0x7fa282a7e700>
list(product(["a","b","c"],[0,1,10]))
[('a', 0),
('a', 1),
('a', 10),
('b', 0),
('b', 1),
('b', 10),
('c', 0),
('c', 1),
('c', 10)]
list(product(range(2),repeat=4))
[(0, 0, 0, 0),
(0, 0, 0, 1),
(0, 0, 1, 0),
(0, 0, 1, 1),
(0, 1, 0, 0),
(0, 1, 0, 1),
(0, 1, 1, 0),
(0, 1, 1, 1),
(1, 0, 0, 0),
(1, 0, 0, 1),
(1, 0, 1, 0),
(1, 0, 1, 1),
(1, 1, 0, 0),
(1, 1, 0, 1),
(1, 1, 1, 0),
(1, 1, 1, 1)]
# We can't use this kind of syntax to count 1s,
# even though it works in both NumPy and pandas:
my_tuple = (1,0,1,1)
my_tuple == 1
False
# Here is a method to count:
my_tuple.count(1)
3
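The two pieces shown above, `product(..., repeat=...)` and the tuple method `count`, can be combined to get an exact probability by listing every outcome. To leave Question 3 open, this sketch counts a different event: five dice showing exactly two 6s. The names `outcomes`, `favorable`, and `exact` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every possible roll of five distinct dice, as tuples of length 5.
outcomes = list(product(range(1, 7), repeat=5))

# Illustrative event (not the worksheet question): exactly two 6s.
favorable = sum(1 for roll in outcomes if roll.count(6) == 2)

# All outcomes are equally likely, so the probability is a ratio of counts.
exact = Fraction(favorable, len(outcomes))
print(len(outcomes), favorable, exact)  # 7776 1250 625/3888
```

Because every outcome is enumerated, the result is exact rather than an estimate.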
Question 3:
Compute the exact probability using product from the itertools module. (You might think this way is better than the NumPy and pandas methods, but the NumPy and pandas methods generalize better to bigger problems, where we cannot explicitly list every combination.)
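To see why explicit listing stops being practical, it may help to look at how quickly the number of outcomes grows with the number of dice. This is a small arithmetic sketch; the dice counts 5, 10, and 20 and the name `outcome_counts` are arbitrary choices for illustration.

```python
# Number of equally likely outcomes when rolling num_dice six-sided dice.
# The dice counts below are arbitrary illustrative values.
outcome_counts = {num_dice: 6 ** num_dice for num_dice in (5, 10, 20)}

for num_dice, count in outcome_counts.items():
    print(num_dice, count)
```

With five dice there are only 7776 tuples to list, while with twenty dice the count exceeds \(10^{15}\), so a random sample is the more realistic approach.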