Lab 1 - Math 173A

Lab 1 - Math 173A#

Note. This is not the first lab, be sure you do Lab 0 first. (Lab 0 is also due one week earlier.)

You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same upload.)

Put the full names of everyone in your group (even if you’re working alone) here. (This makes grading easier.)

Names:

Suggestion#

Follow the instructions from the Lab 0 worksheet to make an Education Workspace for this lab.

Preliminaries#

Our goal in this lab is to implement encryption and (more difficult) automatic decryption of Vigenère ciphertext, as described in Section 5.2 of Hoffstein, Pipher, and Silverman.

In this Preliminaries section, we introduce some concepts and tools that will be useful below.

Import some libraries and functions we will need by executing the following code. Some of these functions are new. Some you might have to review from lecture or Lab 0. (The Lab1_Helper.py file is already attached to this Deepnote project, so there is no need to download it.)

import string
import numpy as np
from Lab1_Helper import (
                            only_letters,
                            weave, 
                            shift_string, 
                            kasiski_diffs,
                            ind_co,
                            mut_ind_co,
                            get_freq,
                            english_freq,
                            add_spaces
                        )

When we encrypt using the Vigenère cipher with a key length of 5 (for example), we shift all the characters at integer positions 0 modulo 5 by the same amount, all characters at integer positions 1 modulo 5 by the same amount, and so on. Assume X is the string we want to encrypt using the Vigenère cipher with a key length of 5. Notice you can get all the characters in X at integer positions 2 modulo 5 (for example) using X[2::5]. (You can think of this as shorthand for X[2:len(X):5], which means, start at position 2, go until reaching len(X), go up by 5.) Test this out by setting X = string.ascii_uppercase and then printing X and X[2::5].

We need a way of “re-assembling” these strings. We can do that using the imported weave function. Evaluate the following code to get a sense for how it works. (Here X should be the same as above. If you know list comprehension, try to recreate this list using list comprehension.)

weave([X[0::5], X[1::5], X[2::5], X[3::5], X[4::5]])

As described on page 219 of Hoffstein, Pipher, and Silverman, the Kasiski method first finds the gaps between repeated trigrams in Vigenère ciphertext. (The idea is that these repeated trigrams will usually correspond to the same English plaintext, and so the gaps will usually be divisible by the key length.) The kasiski_diffs function will find these gaps for us. Try it out by evaluating the following code, and notice that the numbers match those in Table 5.3 in Hoffstein, Pipher, Silverman, and that many of them are divisible by 7.

Y = '''zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh
zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa
jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc
dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd
ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds
fluqm oeavl jgqea lrkti wvext vkrrg xani'''

kasiski_diffs(Y)

In lecture on Monday of Week 1, we used (an estimate for) the Index of Coincidence to distinguish random characters from shift ciphertext. One strength of the Vigenère cipher is that it looks very much like random text from this perspective. Compute ind_co(Y) for Y as above, and compare it with 1/26, which is the value expected for randomly chosen English characters. (Comment. We’ve changed ind_co since Monday. It used to take a dictionary of letter frequencies as input, like mut_ind_co. Now it takes a string as input.)

On the other hand, the above was encrypted using a Vigenère cipher with a key length of 7. Compute the following and compare the values with the expected Index of Coincidence value for true English of approximately 0.068. (All of the values will be close to, but slightly less than, this 0.068 value. I’m not sure if there is a theoretical reason for why they are less than 0.068. The important thing is they’re all significantly closer to the true English value than to the random characters value.)

Z = only_letters(Y) # we remove spaces from `Y`
for i in range(7):
    print(ind_co(Z[i::7]))

Vigenère encryption#

Writing the `vigenere` function#

Write a function to perform Vigenère encryption using the following template. (If you want to make the code more elegant, replace for i in range(key_length): in our template with for i, k in enumerate(key):.)

Input: a string X and a list of shift amounts key Output: a string Y

Example: If X is "The rain in Spain stays mainly in the plain" and key is [5, 11, 0, 12, 8, 13, 6, 14], then the output should be the ciphertext written on page 217 of our textbook, beginning "YSEDIVTWSDPM...".

def vigenere(X, key, case="upper"):
    X = only_letters(X, case=case)
    key_length = len(key)
    string_list = []
    for i in range(key_length):
        s = # Get the characters in `X` at integer positions ??? modulo ???
        shifted_s = # Shift the characters in s.  Use the imported `shift_string` function.
        string_list.append(shifted_s)
    output = # Use the imported `weave` function
    return output

Test your function by evaluating the following and checking that the result matches what is on page 217 of the textbook (aside from ours being in upper-case letters).

X = "The rain in Spain stays mainly in the plain"
key = [5, 11, 0, 12, 8, 13, 6, 14]
vigenere(X, key)

Posting Vigenère ciphertext on Ed Discussion#

Encrypt a piece of English text using the above vigenere function with a random key of length 7, 8, 9, 10, or 11.
You can generate a random key using the following, where key_length is your chosen length. (The resulting key is a NumPy array rather than a list, but it should still work.)

rng = np.random.default_rng()
key = rng.integers(0, 26, size=key_length)
key

Your chosen text should be quite long, maybe 300 characters. Just copy and paste from some source, and use triple quotation marks. (Using triple quotation marks means that Python won’t complain about line breaks or apostrophes or double quotation marks in your pasted text.)

plaintext = ''' Your text here '''

Encrypt your chosen plaintext using the above key and the vigenere function. Store the resulting ciphertext using the variable name Y.

Make the ciphertext less overwhelming to look at by evaluating the following.

print(add_spaces(Y))

Go to the “Lab 1 Ciphertexts” thread on Ed Discussion. Post the displayed ciphertext as a new response. Type triple backticks (on my keyboard, backtick is the key above tab) before pasting the ciphertext, or alternatively, change from “Paragraph” to “Code” in the dropdown menu. This will let Ed Discussion know that all characters should be displayed with the same width.

Vigenère decryption#

There are two main steps to decrypting Vigenère ciphertext: Find the key length and find the individual shift amounts. Finding the individual shift amounts is essentially the same as what we did in Lab 0, so the main work is finding the key length.

We will implement two separate methods for finding the key length, both of which are described in Section 5.2 of Hoffstein, Pipher, and Silverman.

Finding the key-length using the Kasiski method#

Write a function prop_div using the following template.

Input: a list of integers X and a modulus m Output: the proportion of integers in X which are divisible by m

For example, prop_div([3,5,8,9,10], 3) should be equal to 0.4, because 2 out of 5 of these integers are divisible by 3.

def prop_div(X, m):
    n = len(X)
    t = 0 # Eventually t will be the number of integers divisible by m in X
    for x in X:
        if x%??? == ???:
            t = t+1
    return ??? # Return the proportion, not t.

Write a function key_length_kasiski to predict the key length using the following template. (Recall that we introduced the kasiski_diffs function above.) The idea is to find for which key length value, from key_start (inclusive) to key_end (exclusive), is the highest proportion of gaps between repeated trigrams divisible by that key length. The parameters top_value and top_key represent the current highest proportion and the current best key length, respectively.

Feel free to rewrite this if you would like to use a more elegant/Pythonic approach to find the best key length, as long as it is still using the Kasiski method. (If I were doing this in Math 10, I think I would make a pandas Series prop_ser with values equal to the proportions and keys equal to the key lengths, and then I would find the best key length by using prop_ser.idxmax().)

def key_length_kasiski(Y, key_start=7, key_end=12):
    diffs = kasiski_diffs(Y)
    
    top_value = 0
    top_key = 0
    
    for k in range(key_start, key_end):
        prop = prop_div(diffs, k)
        if ??? > ???:
            top_value = prop
            top_key = k

    return ??? # Return the key length which yields the highest value

Test your function by evaluating the following. The output should be 7.

Y = '''zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh
zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa
jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc
dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd
ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds
fluqm oeavl jgqea lrkti wvext vkrrg xani'''

key_length_kasiski(Y)

Finding the key-length using the Index of Coincidence#

Write a function coincidence_mean using the following template. (Note. Base Python does not have a built-in mean or average function, but it does have a built in sum function.)

Inputs: a string Y and an integer m. Output: a real number

def coincidence_mean(Y, m):
    coinc_list = []
    for i in range(m):
        s = # Get the characters in `Y` at integer positions ??? modulo ???
        c = # Compute the Index of Coincidence for `s` using the `ind_co` function
        coinc_list.append(c)

    return ??? # Return the mean value of `coinc_list`

Write a function key_length_coincidence to predict the key length using the following template.

Again, feel free to rewrite this if you would like to use a more elegant approach, as long as it is still using the coincidence_mean function from above.

def key_length_coincidence(Y, key_start=7, key_end=12):
    
    top_value = 0
    top_key = 0
    
    for k in range(key_start, key_end):
        mean = coincidence_mean(???, ???)
        if ??? > ???:
            top_value = ???
            top_key = ???

    return ??? # Return the key length which yields the highest value

Test your function by evaluating the following. The output should again be 7.

Y = '''zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh
zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa
jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc
dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd
ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds
fluqm oeavl jgqea lrkti wvext vkrrg xani'''

key_length_coincidence(Y)

Finding the shift amounts#

This portion should be easier, because you essentially already did this in Lab 0.

Paste in and execute the code for your shift_decrypt function from Lab 0.

Write a function find_shifts using the following template.

Inputs: a Vigenère ciphertext string Y and a key-length k Output: a list of shift amounts shifts

def find_shifts(Y, k):
    shifts = []
    for i in range(k):
        s = # Get the characters in `Y` at integer positions ??? modulo ???
        X, m = shift_decrypt(s) # We only care about the shift amount `m`
        shifts.append(m)
        
    return shifts

Putting it all together#

Write a function vigenere_decrypt using the following template. The method keyword argument allows the user to specify whether they wish to use the Kasiski method or the Index of Coincidence method to find the key length. If no method is specified, the Kasiski method is used by default.

def vigenere_decrypt(Y, key_start=7, key_end=12, method="kasiski"):
    Y = only_letters(Y, case="upper") # Remove spaces

    if method == "kasiski":
        key_length_fn = key_length_kasiski
    elif method == "coincidence":
        key_length_fn = key_length_coincidence
    else:
        raise ValueError('''key_length should be either "kasiski" or "coincidence"''')
        
    k = key_length_fn(Y, key_start=key_start, key_end=key_end) # `k` is the predicted key length
    shifts = ??? # The predicted shift amounts
    X = ??? # The predicted plaintext.  Use your `vigenere` function from high above
    
    return X.lower()

Try to decrypt the above ciphertext from Section 5.2 of Hoffstein, Pipher, and Silverman.

Y = '''zpgdl rjlaj kpylx zpyyg lrjgd lrzhz qyjzq repvm swrzy rigzh
zvreg kwivs saolt nliuw oldie aqewf iiykh bjowr hdogc qhkwa
jyagg emisr zqoqh oavlk bjofr ylvps rtgiu avmsw lzgms evwpc
dmjsv jqbrn klpcf iowhv kxjbj pmfkr qthtk ozrgq ihbmq sbivd
ardym qmpbu nivxm tzwqv gefjh ucbor vwpcd xuwft qmoow jipds
fluqm oeavl jgqea lrkti wvext vkrrg xani'''

vigenere_decrypt(Y)

Decrypting Vigenère ciphertext from Ed Discussion#

Select one of your classmate’s Vigenère ciphertexts that were posted on Ed Discussion and attempt to decrypt it using your vigenere_decrypt function. Assign the ciphertext to the variable name Y (use triple quotation marks to allow line breaks). Try using both key length methods.

vigenere_decrypt(Y, method="kasiski")

and

vigenere_decrypt(Y, method="coincidence")

Do they both wok?

Submission#

Using the Share button at the top right, enable public sharing, and enable Comment privileges. Then submit the created link on Canvas.

(Don’t just copy the browser URL. Copy the link that is provided after you click the “Share” button. It won’t be available until you enable public sharing.)

Reminder: Everyone in the group needs to submit this link on Canvas.

Created in Deepnote