Optional: Difference of Squares Implementation#
For Math 173A, you’re expected to know the Difference of Squares factorization algorithm, but you’re not expected to know how to implement it in Python, so only look at this if you’re interested or if you think it will help you understand the algorithm better.
The galois package is used for working over finite fields. We want to work over \(\mathbb{Z}/2\mathbb{Z}\).
!pip install galois pycryptodome
Collecting galois
Downloading galois-0.3.5-py3-none-any.whl (4.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 78.5 MB/s eta 0:00:00
?25hCollecting pycryptodome
Downloading pycryptodome-3.18.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 101.1 MB/s eta 0:00:00
?25hCollecting numba<0.58,>=0.55
Downloading numba-0.57.1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 85.7 MB/s eta 0:00:00
?25hRequirement already satisfied: numpy<1.25,>=1.21.0 in /shared-libs/python3.9/py/lib/python3.9/site-packages (from galois) (1.23.4)
Requirement already satisfied: typing-extensions>=4.0.0 in /shared-libs/python3.9/py/lib/python3.9/site-packages (from galois) (4.4.0)
Collecting llvmlite<0.41,>=0.40.0dev0
Downloading llvmlite-0.40.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 51.7 MB/s eta 0:00:00
?25hInstalling collected packages: pycryptodome, llvmlite, numba, galois
Successfully installed galois-0.3.5 llvmlite-0.40.1 numba-0.57.1 pycryptodome-3.18.0
WARNING: You are using pip version 22.0.4; however, version 23.2.1 is available.
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.
from math import gcd, sqrt, floor
from sympy import factorint, isprime
from Crypto.Util.number import getPrime
import galois
import pandas as pd
import numpy as np
Here is a sample RSA modulus N we will try to factor. The same one from lecture on Week 5 Friday. The point is to demonstrate the algorithm from Hoffstein, Pipher, Silverman, Section 3.6. This N is small enough we could easily factor it using brute force.
N = 914387
A number is \(L\)-smooth if its biggest prime divisor \(p\) satisfies \(p \leq L\). For example, \(4400 = 2^6 \cdot 3 \cdot 5^2\), so this is \(5\)-smooth but not \(4\)-smooth.
factorint(4800)
{2: 6, 3: 1, 5: 2}
# Biggest prime divisor
max(factorint(4800))
5
L = 12
primes = [x for x in range(L+1) if isprime(x)]
primes
[2, 3, 5, 7, 11]
We’ll store the results in a matrix, like in the Week 5 Friday lecture.
df = pd.DataFrame(index = primes)
We look for smooth numbers among the reductions \(x^2 \bmod N\). We’ll never get anything interesting unless \(x^2 > N\), so that is why we start at \(x = \lfloor \sqrt{N} \rfloor + 1\).
start = floor(sqrt(N)) + 1
for x in range(start, 5000):
factor_dct = factorint(pow(x,2,N))
# go to next x if x^2 mod N is not L-smooth
if max(factor_dct) > L:
continue
# Default exponent of 0
df[x] = 0
df[x].update(factor_dct)
df
| 1869 | 1909 | 3387 | 4586 | |
|---|---|---|---|---|
| 2 | 4 | 14 | 0 | 0 |
| 3 | 1 | 0 | 1 | 2 |
| 5 | 6 | 1 | 3 | 1 |
| 7 | 0 | 0 | 0 | 0 |
| 11 | 0 | 1 | 3 | 1 |
We need to make two changes before we can use the “linear algebra over finite fields” capabilities of the galois library.
We need to switch from a pandas DataFrame to a NumPy array.
We need to reduce the entries.
# This is how we indicate we want to work mod 2.
GF = galois.GF(2)
# Doesn't accept a pandas DataFrame
GF(df)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [36], line 2
1 # Doesn't accept a pandas DataFrame
----> 2 GF(df)
File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:70, in FieldArray.__new__(cls, x, dtype, copy, order, ndmin)
64 if cls is FieldArray:
65 raise NotImplementedError(
66 "FieldArray is an abstract base class that cannot be directly instantiated. "
67 "Instead, create a FieldArray subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
68 "and instantiate an array using `x = GF(array_like)`."
69 )
---> 70 return super().__new__(cls, x, dtype, copy, order, ndmin)
File ~/venv/lib/python3.9/site-packages/galois/_domains/_array.py:53, in Array.__new__(cls, x, dtype, copy, order, ndmin)
45 raise NotImplementedError(
46 "Array is an abstract base class that cannot be directly instantiated. "
47 "Instead, create an Array subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
48 "and instantiate an array using `x = GF(array_like)`."
49 )
51 dtype = cls._get_dtype(dtype)
---> 53 x = cls._verify_array_like_types_and_values(x)
54 array = np.array(x, dtype=dtype, copy=copy, order=order, ndmin=ndmin)
56 # Perform view without verification since the elements were verified in _verify_array_like_types_and_values()
File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:151, in FieldArray._verify_array_like_types_and_values(cls, x)
149 cls._verify_array_values(x)
150 else:
--> 151 raise TypeError(
152 f"{cls.name} arrays can be created with scalars of type int/str, lists/tuples, "
153 f"or ndarrays, not {type(x)}."
154 )
156 return x
TypeError: GF(2) arrays can be created with scalars of type int/str, lists/tuples, or ndarrays, not <class 'pandas.core.frame.DataFrame'>.
# Needs entries 0 or 1
GF(df.values)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [38], line 2
1 # Needs entries 0 or 1
----> 2 GF(df.values)
File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:70, in FieldArray.__new__(cls, x, dtype, copy, order, ndmin)
64 if cls is FieldArray:
65 raise NotImplementedError(
66 "FieldArray is an abstract base class that cannot be directly instantiated. "
67 "Instead, create a FieldArray subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
68 "and instantiate an array using `x = GF(array_like)`."
69 )
---> 70 return super().__new__(cls, x, dtype, copy, order, ndmin)
File ~/venv/lib/python3.9/site-packages/galois/_domains/_array.py:53, in Array.__new__(cls, x, dtype, copy, order, ndmin)
45 raise NotImplementedError(
46 "Array is an abstract base class that cannot be directly instantiated. "
47 "Instead, create an Array subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
48 "and instantiate an array using `x = GF(array_like)`."
49 )
51 dtype = cls._get_dtype(dtype)
---> 53 x = cls._verify_array_like_types_and_values(x)
54 array = np.array(x, dtype=dtype, copy=copy, order=order, ndmin=ndmin)
56 # Perform view without verification since the elements were verified in _verify_array_like_types_and_values()
File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:149, in FieldArray._verify_array_like_types_and_values(cls, x)
147 elif not np.issubdtype(x.dtype, np.integer):
148 raise TypeError(f"{cls.name} arrays must have integer dtypes, not {x.dtype}.")
--> 149 cls._verify_array_values(x)
150 else:
151 raise TypeError(
152 f"{cls.name} arrays can be created with scalars of type int/str, lists/tuples, "
153 f"or ndarrays, not {type(x)}."
154 )
File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:176, in FieldArray._verify_array_values(cls, array)
174 idxs = np.logical_or(array < 0, array >= cls.order)
175 values = array if array.ndim == 0 else array[idxs]
--> 176 raise ValueError(f"{cls.name} arrays must have elements in `0 <= x < {cls.order}`, not {values}.")
ValueError: GF(2) arrays must have elements in `0 <= x < 2`, not [ 4 14 2 6 3 3].
mat = GF((df.values)%2)
mat
GF([[0, 0, 0, 0],
[1, 0, 1, 0],
[0, 1, 1, 1],
[0, 0, 0, 0],
[0, 1, 1, 1]], order=2)
ns = mat.null_space()
# Basis for the null-space.
ns
GF([[1, 0, 1, 1],
[0, 1, 0, 1]], order=2)
# Focus on the second basis vector
vec = ns[1]
vec
GF([0, 1, 0, 1], order=2)
# x values corresponding to vec
cols = df.columns[vec == 1]
cols
Int64Index([1909, 4586], dtype='int64')
# prime exponents corresponding to vec
prime_series = df[cols].sum(axis=1)
prime_series
2 14
3 2
5 2
7 0
11 2
dtype: int64
a = np.prod(cols)
b = np.prod(prime_series.index ** (prime_series.values//2))
# Success
gcd(a-b, N)
1103
Let’s try with a bigger value of N, but we’re still in the range where the values can be factored naively.
p = getPrime(13)
q = getPrime(13)
N = p*q
print(N)
print(p)
print(q)
44762551
6343
7057
Here are the values we used:
N = 44762551
p = 6343
q = 7057
L = 80
primes = [x for x in range(L+1) if isprime(x)]
df = pd.DataFrame(index = primes)
start = floor(sqrt(N)) + 1
for x in range(start, 10000):
factor_dct = factorint(pow(x,2,N))
# go to next x if x^2 mod N is not L-smooth
if max(factor_dct) > L:
continue
# Default exponent of 0
df[x] = 0
df[x].update(factor_dct)
df.shape
(22, 50)
mat = GF((df.values)%2)
ns = mat.null_space()
for vec in ns:
cols = df.columns[vec == 1]
prime_series = df[cols].sum(axis=1)
a = np.prod(cols)
b = np.prod(prime_series.index ** (prime_series.values//2))
g = gcd(a-b, N)
print(g)
1
1
1
1
6343
44762551
6343
1
1
1
1
44762551
1
6343
1
6343
1
1
6343
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
44762551
/tmp/ipykernel_35/329814783.py:6: RuntimeWarning: overflow encountered in long_scalars
g = gcd(a-b, N)