Affine ciphers

From eaab4a47bd8a28435b175e89e3fb4ae1a4dca0d3 Mon Sep 17 00:00:00 2001 From: Neil Smith Date: Fri, 14 Mar 2014 11:44:42 +0000 Subject: [PATCH] Tinkering with slides --- slides/affine-encipher.html | 138 ++++++++++++++++++++++++++++++++++++ slides/caesar-break.html | 40 +++++++++-- 2 files changed, 172 insertions(+), 6 deletions(-) create mode 100644 slides/affine-encipher.html diff --git a/slides/affine-encipher.html b/slides/affine-encipher.html new file mode 100644 index 0000000..9c54d8a --- /dev/null +++ b/slides/affine-encipher.html @@ -0,0 +1,138 @@ + + + + Affine ciphers + + + + +

+
+# Affine ciphers
+
+## Explanation of extended Euclid's algorithm from [Programming with finite fields](http://jeremykun.com/2014/03/13/programming-with-finite-fields/)
+
+**Definition:** An element _d_ is called a greatest common divisor (gcd) of _a, b_ if it divides both _a_ and _b_, and for every other _z_ dividing both _a_ and _b_, _z_ divides _d_. 
+
+**Theorem:** For any two integers _a, b_ there exist integers _x, y_ (not unique) such that _ax_ + _by_ = gcd(_a, b_).
+
+We could beat around the bush and try to prove these things in various ways, but when it comes down to it there's one algorithm of central importance that both computes the gcd and produces the needed linear combination _x, y_. The algorithm is called the Euclidean algorithm. Here is a simple version that just gives the gcd.
+
+```python
+def gcd(a, b):
+   if abs(a) < abs(b):
+      return gcd(b, a)
+ 
+   while abs(b) > 0:
+      q,r = divmod(a,b)
+      a,b = b,r
+ 
+   return a
+```
+
+This works by the simple observation that gcd(_a_, _aq_ + _r_) = gcd(_a_, _r_) (this is an easy exercise to prove directly). So the Euclidean algorithm just keeps applying this rule over and over again: take the remainder when dividing the bigger argument by the smaller argument until the remainder becomes zero. Then gcd(_x_, 0) = _x_ because everything divides zero.
+
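To make the repeated-remainder rule concrete, here is a minimal sketch that records every (a, b) pair the Euclidean algorithm visits. The helper name `gcd_steps` is a hypothetical one chosen for illustration, and the sketch assumes non-negative inputs; it is not part of the slides' own code.

```python
def gcd_steps(a, b):
    """Return the list of (a, b) pairs visited by the Euclidean algorithm.

    Assumes a and b are non-negative integers.
    """
    steps = [(a, b)]
    while b != 0:
        # gcd(a, b) == gcd(b, a mod b), so replace the pair and repeat.
        a, b = b, a % b
        steps.append((a, b))
    return steps


for a, b in gcd_steps(4864, 3458):
    print(a, b)
# The last pair printed is "38 0", so the gcd is 38.
```

Each step keeps the smaller number and the remainder, so the second entry shrinks until it reaches zero, and the first entry of the final pair is the gcd.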
+Now the so-called "extended" Euclidean algorithm just keeps track of some additional data as it goes (the partial quotients and remainders). Here's the algorithm.
+
+```python
+def extendedEuclideanAlgorithm(a, b):
+   if abs(b) > abs(a):
+      (x,y,d) = extendedEuclideanAlgorithm(b, a)
+      return (y,x,d)
+ 
+   if abs(b) == 0:
+      return (1, 0, a)
+ 
+   x1, x2, y1, y2 = 0, 1, 1, 0
+   while abs(b) > 0:
+      q, r = divmod(a,b)
+      x = x2 - q*x1
+      y = y2 - q*y1
+      a, b, x2, x1, y2, y1 = b, r, x1, x, y1, y
+ 
+   return (x2, y2, a)
+```
+
+The reader who hasn't seen this stuff before is encouraged to trace out a run for the numbers 4864, 3458. Their gcd is 38 and the two integers are 32 and -45, respectively: 4864 × 32 + 3458 × (-45) = 38.
+
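For readers who would rather check the trace by machine, the following is a compact sketch of the same extended Euclidean technique. The name `bezout` is hypothetical, and the function assumes non-negative inputs; it is a rewrite for illustration rather than the code from the slides.

```python
def bezout(a, b):
    """Return (x, y, d) with a*x + b*y == d == gcd(a, b).

    Assumes a and b are non-negative integers.
    """
    # Invariant: the current a equals x_prev*a0 + y_prev*b0 and the
    # current b equals x_cur*a0 + y_cur*b0, where a0, b0 are the inputs.
    x_prev, x_cur = 1, 0
    y_prev, y_cur = 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x_prev, x_cur = x_cur, x_prev - q * x_cur
        y_prev, y_cur = y_cur, y_prev - q * y_cur
    return x_prev, y_prev, a


x, y, d = bezout(4864, 3458)
print(x, y, d)  # prints: 32 -45 38
assert 4864 * x + 3458 * y == d
```

The final assertion is the identity from the theorem, so it doubles as a quick sanity check on any other pair of inputs.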
+How does this help us compute inverses? Well, if we want to find the inverse of _a_ modulo a prime _p_ (with _a_ not a multiple of _p_), we know that their gcd is 1. So compute the _x, y_ such that _ax_ + _py_ = 1, and then reduce both sides mod _p_. You get _ax_ + 0 = 1 (mod _p_), which means that _x_ mod _p_ is the inverse of _a_. So once we have the extended Euclidean algorithm our inverse function is trivial to write!
+
+```python
+def inverse(self):
+   x,y,d = extendedEuclideanAlgorithm(self.n, self.p)
+   return IntegerModP(x)
+```
+
+And indeed it works as expected:
+
+```python
+>>> mod23 = IntegersModP(23)
+>>> mod23(7).inverse()
+10 (mod 23)
+>>> mod23(7).inverse() * mod23(7)
+1 (mod 23)
+```
+
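The same idea can be sketched on plain Python integers, without the `IntegersModP` class. The function name `mod_inverse` is an assumption made for this example; it tracks only the coefficient of _a_, since the coefficient of the modulus vanishes after reduction. The modulus 26 in the usage lines is also an assumption, chosen because it is the size of the English alphabet, where an affine cipher needs an invertible multiplier.

```python
def mod_inverse(a, m):
    """Return x in range(m) with (a * x) % m == 1.

    Raises ValueError when gcd(a, m) != 1, because no inverse exists then.
    """
    old_r, r = a % m, m   # remainders, as in the Euclidean algorithm
    old_x, x = 1, 0       # coefficients of a for each remainder
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return old_x % m


print(mod_inverse(7, 23))  # prints: 10
print(mod_inverse(7, 26))  # prints: 15
```

Calling `mod_inverse(2, 26)` raises `ValueError`, because 2 and 26 share the factor 2; only multipliers coprime to the modulus can be undone.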
+
+

+ + + + + + + diff --git a/slides/caesar-break.html b/slides/caesar-break.html index 187719d..f296e44 100644 --- a/slides/caesar-break.html +++ b/slides/caesar-break.html @@ -126,13 +126,15 @@ But before then how do we count the letters? * Read a file into a string ```python open() -read() +.read() ``` * Count them ```python import collections ``` +Create the `language_models.py` file for this. + --- # Canonical forms @@ -150,16 +152,18 @@ Counting letters in _War and Peace_ gives all manner of junk. # Accents ```python ->>> caesar_encipher_letter('é', 1) +>>> 'é' in string.ascii_letters +>>> 'e' in string.ascii_letters ``` -What does it produce? - -What should it produce? ## Unicode, combining codepoints, and normal forms Text encodings will bite you when you least expect it. +- **é** : LATIN SMALL LETTER E WITH ACUTE (U+00E9) + +- **e** + ** ́** : LATIN SMALL LETTER E (U+0065) + COMBINING ACUTE ACCENT (U+0301) + * urlencoding is the other pain point. --- @@ -190,6 +194,15 @@ def unaccent(text): --- +# Find the frequencies of letters in English + +1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`. +2. Find the frequencies +3. Sort by count (`sorted(, key=)` ; `.items()`, `.keys()`, `.values()`, `.get()`) +4. Write counts to `count_1l.txt` + +--- + # Vector distances .float-right[![right-aligned Vector subtraction](vector-subtraction.svg)] @@ -217,7 +230,7 @@ The higher the power used, the more weight is given to the largest differences i * L_∞ norm: `\(\|\mathbf{a} - \mathbf{b}\| = \max_i{(\mathbf{a}_i - \mathbf{b}_i)} \)` -neither of which will be that useful.) +neither of which will be that useful here, but they keep cropping up.) --- # Normalisation of vectors @@ -285,6 +298,21 @@ And the probability measure! ## Computing is an empircal science +Let's do some experiments to find the best solution! + +--- + +## Step 1: get **some** codebreaking working + +Let's start with the letter probability norm, because it's easy. 
+ +## Step 2: build some other scoring functions + +We also need a way of passing the different functions to the keyfinding function. + +## Step 3: find the best scoring function + +Try them all on random ciphertexts, see which one works best. -- 2.34.1