4 <title>Breaking caesar ciphers
</title>
5 <meta http-equiv=
"Content-Type" content=
"text/html; charset=UTF-8"/>
6 <style type=
"text/css">
15 h1 { font-size:
3em; }
16 h2 { font-size:
2em; }
17 h3 { font-size:
1.6em; }
19 text-decoration: none;
22 -moz-border-radius:
5px;
23 -web-border-radius:
5px;
31 text-shadow:
0 0 20px #
333;
37 text-shadow:
0 0 20px #
333;
43 <textarea id=
"source">
45 # Breaking caesar ciphers
47 ![centre-aligned Caesar wheel](caesarwheel1.gif)
53 Slow but clever vs Dumb but fast
57 Ciphertext | Plaintext
59 ![left-aligned Ciphertext frequencies](c1a_frequency_histogram.png) | ![left-aligned English frequencies](english_frequency_histogram.png)
71 * How many keys to try?
77 decipher with this key
78 how close is it to English?
82 What steps do we know how to do?
85 # How close is it to English?
87 What does English look like?
89 * We need a model of English.
91 How do we define
"closeness"?
95 # What does English look like?
97 ## Abstraction: frequency of letter counts
111 One way of thinking about this is a
26-dimensional vector.
113 Create a vector of our text, and one of idealised English.
115 The distance between the vectors is how far from English the text is.
122 ## FIXME! Diagram of vector subtraction here
124 Several different distance measures (__metrics__, also called __norms__):
126 * L
<sub>2</sub> norm (Euclidean distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^
2} \)`
128 * L
<sub>1</sub> norm (Manhattan distance, taxicab distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`
130 * L
<sub>3</sub> norm: `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt[
3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^
3} \)`
132 The higher the power used, the more weight is given to the largest differences in components.
136 * L
<sub>0</sub> norm (Hamming distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i \left\{\begin{matrix}
1 &\mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i , \\
0 &\mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix} \right| \)`
138 * L
<sub>∞</sub> norm: `\(\|\mathbf{x} - \mathbf{y}\| = \max_i{(\mathbf{x}_i - \mathbf{y}_i)} \)`
140 neither of which will be that useful.)
143 # Normalisation of vectors
145 Frequency distributions drawn from different sources will have different lengths. For a fair comparison we need to scale them.
147 * Eucliean scaling (vector with unit length): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \sqrt{\mathbf{x}_1^
2 + \mathbf{x}_2^
2 + \mathbf{x}_3^
2 + \dots } }\)`
149 * Normalisation (components of vector sum to
1): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }\)`
153 # Angle, not distance
155 Rather than looking at the distance between the vectors, look at the angle between them.
157 Vector dot product shows how much of one vector lies in the direction of another:
158 `\( \mathbf{A} \bullet \mathbf{B} =
159 \| \mathbf{A} \| \cdot \| \mathbf{B} \| \cos{\theta} \)`
162 `\( \mathbf{A} \bullet \mathbf{B} = \sum_i \mathbf{A}_i \cdot \mathbf{B}_i \)`
163 and `\( \| \mathbf{A} \| = \sum_i \mathbf{A}_i^
2 \)`
165 A bit of rearranging give the cosine simiarity:
166 `\( \cos{\theta} = \frac{ \mathbf{A} \bullet \mathbf{B} }{ \| \mathbf{A} \| \cdot \| \mathbf{B} \| } =
167 \frac{\sum_i \mathbf{A}_i \cdot \mathbf{B}_i}{\sum_i \mathbf{A}_i^
2 \times \sum_i \mathbf{B}_i^
2} \)`
169 This is independent of vector lengths!
171 Cosine similarity is
1 if in same direction,
0 if at
90⁰, -
1 if antiparallel.
173 ## FIXME! Cosine distance bug: frequencies2 length not squared.
176 ## FIXME! Diagram of vector dot product
179 <script src=
"http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type=
"text/javascript">
182 <script type=
"text/javascript"
183 src=
"http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&delayStartupUntil=configured"></script>
185 <script type=
"text/javascript">
186 var slideshow = remark.create({ ratio:
"16:9" });
191 skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
194 MathJax.Hub.Queue(function() {
195 $(MathJax.Hub.getAllJax()).map(function(index, elem) {
196 return(elem.SourceElement());
197 }).parent().addClass('has-jax');
199 MathJax.Hub.Configured();