Optional: Difference of Squares Implementation

Optional: Difference of Squares Implementation#

For Math 173A, you’re expected to know the Difference of Squares factorization algorithm, but you’re not expected to know how to implement it in Python, so only look at this if you’re interested or if you think it will help you understand the algorithm better.

The galois package is used for working over finite fields. We want to work over \(\mathbb{Z}/2\mathbb{Z}\).

!pip install galois pycryptodome

Collecting galois
  Downloading galois-0.3.5-py3-none-any.whl (4.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 78.5 MB/s eta 0:00:00
?25hCollecting pycryptodome
  Downloading pycryptodome-3.18.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 101.1 MB/s eta 0:00:00
?25hCollecting numba<0.58,>=0.55
  Downloading numba-0.57.1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 85.7 MB/s eta 0:00:00
?25hRequirement already satisfied: numpy<1.25,>=1.21.0 in /shared-libs/python3.9/py/lib/python3.9/site-packages (from galois) (1.23.4)
Requirement already satisfied: typing-extensions>=4.0.0 in /shared-libs/python3.9/py/lib/python3.9/site-packages (from galois) (4.4.0)
Collecting llvmlite<0.41,>=0.40.0dev0
  Downloading llvmlite-0.40.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 51.7 MB/s eta 0:00:00
?25hInstalling collected packages: pycryptodome, llvmlite, numba, galois
Successfully installed galois-0.3.5 llvmlite-0.40.1 numba-0.57.1 pycryptodome-3.18.0
WARNING: You are using pip version 22.0.4; however, version 23.2.1 is available.
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.

from math import gcd, sqrt, floor
from sympy import factorint, isprime
from Crypto.Util.number import getPrime
import galois
import pandas as pd
import numpy as np

Here is a sample RSA modulus N we will try to factor. The same one from lecture on Week 5 Friday. The point is to demonstrate the algorithm from Hoffstein, Pipher, Silverman, Section 3.6. This N is small enough we could easily factor it using brute force.

N = 914387

A number is \(L\)-smooth if its biggest prime divisor \(p\) satisfies \(p \leq L\). For example, \(4400 = 2^6 \cdot 3 \cdot 5^2\), so this is \(5\)-smooth but not \(4\)-smooth.

factorint(4800)

{2: 6, 3: 1, 5: 2}

# Biggest prime divisor
max(factorint(4800))

L = 12
primes = [x for x in range(L+1) if isprime(x)]
primes

[2, 3, 5, 7, 11]

We’ll store the results in a matrix, like in the Week 5 Friday lecture.

df = pd.DataFrame(index = primes)

We look for smooth numbers among the reductions \(x^2 \bmod N\). We’ll never get anything interesting unless \(x^2 > N\), so that is why we start at \(x = \lfloor \sqrt{N} \rfloor + 1\).

start = floor(sqrt(N)) + 1
for x in range(start, 5000):
    factor_dct = factorint(pow(x,2,N))
    # go to next x if x^2 mod N is not L-smooth
    if max(factor_dct) > L:
        continue
    # Default exponent of 0
    df[x] = 0
    df[x].update(factor_dct)

df

	1869	1909	3387	4586
2	4	14	0	0
3	1	0	1	2
5	6	1	3	1
7	0	0	0	0
11	0	1	3	1

We need to make two changes before we can use the “linear algebra over finite fields” capabilities of the galois library.

We need to switch from a pandas DataFrame to a NumPy array.
We need to reduce the entries.

# This is how we indicate we want to work mod 2.
GF = galois.GF(2)

# Doesn't accept a pandas DataFrame
GF(df)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [36], line 2
      1 # Doesn't accept a pandas DataFrame
----> 2 GF(df)

File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:70, in FieldArray.__new__(cls, x, dtype, copy, order, ndmin)
     64 if cls is FieldArray:
     65     raise NotImplementedError(
     66         "FieldArray is an abstract base class that cannot be directly instantiated. "
     67         "Instead, create a FieldArray subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
     68         "and instantiate an array using `x = GF(array_like)`."
     69     )
---> 70 return super().__new__(cls, x, dtype, copy, order, ndmin)

File ~/venv/lib/python3.9/site-packages/galois/_domains/_array.py:53, in Array.__new__(cls, x, dtype, copy, order, ndmin)
     45     raise NotImplementedError(
     46         "Array is an abstract base class that cannot be directly instantiated. "
     47         "Instead, create an Array subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
     48         "and instantiate an array using `x = GF(array_like)`."
     49     )
     51 dtype = cls._get_dtype(dtype)
---> 53 x = cls._verify_array_like_types_and_values(x)
     54 array = np.array(x, dtype=dtype, copy=copy, order=order, ndmin=ndmin)
     56 # Perform view without verification since the elements were verified in _verify_array_like_types_and_values()

File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:151, in FieldArray._verify_array_like_types_and_values(cls, x)
    149     cls._verify_array_values(x)
    150 else:
--> 151     raise TypeError(
    152         f"{cls.name} arrays can be created with scalars of type int/str, lists/tuples, "
    153         f"or ndarrays, not {type(x)}."
    154     )
    156 return x

TypeError: GF(2) arrays can be created with scalars of type int/str, lists/tuples, or ndarrays, not <class 'pandas.core.frame.DataFrame'>.

# Needs entries 0 or 1
GF(df.values)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [38], line 2
      1 # Needs entries 0 or 1
----> 2 GF(df.values)

File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:70, in FieldArray.__new__(cls, x, dtype, copy, order, ndmin)
     64 if cls is FieldArray:
     65     raise NotImplementedError(
     66         "FieldArray is an abstract base class that cannot be directly instantiated. "
     67         "Instead, create a FieldArray subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
     68         "and instantiate an array using `x = GF(array_like)`."
     69     )
---> 70 return super().__new__(cls, x, dtype, copy, order, ndmin)

File ~/venv/lib/python3.9/site-packages/galois/_domains/_array.py:53, in Array.__new__(cls, x, dtype, copy, order, ndmin)
     45     raise NotImplementedError(
     46         "Array is an abstract base class that cannot be directly instantiated. "
     47         "Instead, create an Array subclass for GF(p^m) arithmetic using `GF = galois.GF(p**m)` "
     48         "and instantiate an array using `x = GF(array_like)`."
     49     )
     51 dtype = cls._get_dtype(dtype)
---> 53 x = cls._verify_array_like_types_and_values(x)
     54 array = np.array(x, dtype=dtype, copy=copy, order=order, ndmin=ndmin)
     56 # Perform view without verification since the elements were verified in _verify_array_like_types_and_values()

File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:149, in FieldArray._verify_array_like_types_and_values(cls, x)
    147     elif not np.issubdtype(x.dtype, np.integer):
    148         raise TypeError(f"{cls.name} arrays must have integer dtypes, not {x.dtype}.")
--> 149     cls._verify_array_values(x)
    150 else:
    151     raise TypeError(
    152         f"{cls.name} arrays can be created with scalars of type int/str, lists/tuples, "
    153         f"or ndarrays, not {type(x)}."
    154     )

File ~/venv/lib/python3.9/site-packages/galois/_fields/_array.py:176, in FieldArray._verify_array_values(cls, array)
    174 idxs = np.logical_or(array < 0, array >= cls.order)
    175 values = array if array.ndim == 0 else array[idxs]
--> 176 raise ValueError(f"{cls.name} arrays must have elements in `0 <= x < {cls.order}`, not {values}.")

ValueError: GF(2) arrays must have elements in `0 <= x < 2`, not [ 4 14  2  6  3  3].

mat = GF((df.values)%2)

mat

GF([[0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1]], order=2)

ns = mat.null_space()

# Basis for the null-space.
ns

GF([[1, 0, 1, 1],
    [0, 1, 0, 1]], order=2)

# Focus on the second basis vector
vec = ns[1]
vec

GF([0, 1, 0, 1], order=2)

# x values corresponding to vec
cols = df.columns[vec == 1]
cols

Int64Index([1909, 4586], dtype='int64')

# prime exponents corresponding to vec
prime_series = df[cols].sum(axis=1)
prime_series

   14
    2
    2
    0
   2
dtype: int64

a = np.prod(cols)
b = np.prod(prime_series.index ** (prime_series.values//2))

# Success
gcd(a-b, N)

Let’s try with a bigger value of N, but we’re still in the range where the values can be factored naively.

p = getPrime(13)
q = getPrime(13)
N = p*q
print(N)
print(p)
print(q)

44762551
6343
7057

Here are the values we used: N = 44762551 p = 6343 q = 7057

L = 80
primes = [x for x in range(L+1) if isprime(x)]

df = pd.DataFrame(index = primes)

start = floor(sqrt(N)) + 1
for x in range(start, 10000):
    factor_dct = factorint(pow(x,2,N))
    # go to next x if x^2 mod N is not L-smooth
    if max(factor_dct) > L:
        continue
    # Default exponent of 0
    df[x] = 0
    df[x].update(factor_dct)

df.shape

(22, 50)

mat = GF((df.values)%2)

ns = mat.null_space()

for vec in ns:
    cols = df.columns[vec == 1]
    prime_series = df[cols].sum(axis=1)
    a = np.prod(cols)
    b = np.prod(prime_series.index ** (prime_series.values//2))
    g = gcd(a-b, N)
    print(g)

1
1
1
1
6343
44762551
6343
1
1
1
1
44762551
1
6343
1
6343
1
1
6343
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
44762551
/tmp/ipykernel_35/329814783.py:6: RuntimeWarning: overflow encountered in long_scalars
  g = gcd(a-b, N)

Created in Deepnote