# Breaking Caesar ciphers

![centre-aligned Caesar wheel](caesarwheel1.gif)

---

# Brute force

How many keys to try?

## Basic idea

```
for each key:
    decipher with this key
    how close is it to English?
    remember the best key
```

What steps do we know how to do?
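In Python, the loop might look something like this minimal sketch. The closeness measure, `english_score`, is exactly what the rest of this section builds up to, so here it is just passed in as a parameter:

```
import string

def caesar_decipher(text, key):
    # Undo a Caesar shift of `key` places (letters only, lowercased).
    shifted = string.ascii_lowercase[key:] + string.ascii_lowercase[:key]
    return text.lower().translate(str.maketrans(shifted, string.ascii_lowercase))

def caesar_break(ciphertext, english_score):
    # Try all 26 keys, score each decipherment, remember the best key.
    best_key, best_score = 0, float('-inf')
    for key in range(26):
        score = english_score(caesar_decipher(ciphertext, key))
        if score > best_score:
            best_key, best_score = key, score
    return best_key
```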
---

# How close is it to English?

What does English look like?

* We need a model of English.

How do we define "closeness"?

---

# What does English look like?

## Abstraction: frequency of letter counts

Letter | Count
-------|------
a | 489107
b | 92647
c | 140497
d | 267381
e | 756288
. | .
. | .
. | .
z | 3575

One way of thinking about this is a 26-dimensional vector. Create a vector of our text, and one of idealised English. The distance between the vectors is how far from English the text is.
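Building such a count vector is straightforward; a sketch (the idealised English counts, like the table above, would come from a large reference corpus):

```
from collections import Counter
import string

def letter_vector(text):
    # 26-dimensional vector of letter counts, ignoring non-letters.
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    return [counts[letter] for letter in string.ascii_lowercase]
```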
---

# Vector distances

## FIXME! Diagram of vector subtraction here

Several different distance measures (__metrics__, also called __norms__):

* L<sub>2</sub> norm (Euclidean distance): `\(|\mathbf{x} - \mathbf{y}| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^2} \)`

* L<sub>1</sub> norm (Manhattan distance, taxicab distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`

* L<sub>3</sub> norm: `\(|\mathbf{x} - \mathbf{y}| = \sqrt[3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^3} \)`

The higher the power used, the more weight is given to the largest differences in components.

(Extends out to:

* L<sub>0</sub> norm (Hamming distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i \left\{\begin{matrix} 1 & \mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i \\ 0 & \mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix}\right. \)`

* L<sub>∞</sub> norm: `\(|\mathbf{x} - \mathbf{y}| = \max_i{|\mathbf{x}_i - \mathbf{y}_i|} \)`

neither of which will be that useful.)
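The useful ones translate directly into code; a sketch, where `x` and `y` are equal-length lists of numbers such as the letter-count vectors above:

```
import math

def l1_distance(x, y):
    # Manhattan distance: sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l3_distance(x, y):
    # Cube root of the summed cubed absolute differences.
    return sum(abs(a - b) ** 3 for a, b in zip(x, y)) ** (1 / 3)
```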
---

# Normalisation of vectors

Frequency distributions drawn from different sources will have different lengths. For a fair comparison we need to scale them.

* Euclidean scaling (vector with unit length): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \sqrt{\mathbf{x}_1^2 + \mathbf{x}_2^2 + \mathbf{x}_3^2 + \dots } }\)`

* Normalisation (components of vector sum to 1): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }\)`
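Both scalings as a sketch in Python:

```
import math

def euclidean_scale(x):
    # Scale the vector to unit Euclidean length.
    length = math.sqrt(sum(v ** 2 for v in x))
    return [v / length for v in x]

def normalise(x):
    # Scale the vector so its components sum to 1.
    total = sum(x)
    return [v / total for v in x]
```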
---

# Angle, not distance

Rather than looking at the distance between the vectors, look at the angle between them.

The vector dot product shows how much of one vector lies in the direction of another:
`\( \mathbf{A} \bullet \mathbf{B} = \| \mathbf{A} \| \cdot \| \mathbf{B} \| \cos{\theta} \)`

But,
`\( \mathbf{A} \bullet \mathbf{B} = \sum_i \mathbf{A}_i \cdot \mathbf{B}_i \)`
and
`\( \| \mathbf{A} \| = \sqrt{\sum_i \mathbf{A}_i^2} \)`

A bit of rearranging gives the cosine similarity:
`\( \cos{\theta} = \frac{ \mathbf{A} \bullet \mathbf{B} }{ \| \mathbf{A} \| \cdot \| \mathbf{B} \| } = \frac{\sum_i \mathbf{A}_i \cdot \mathbf{B}_i}{\sqrt{\sum_i \mathbf{A}_i^2} \times \sqrt{\sum_i \mathbf{B}_i^2}} \)`

This is independent of vector lengths!

Cosine similarity is 1 if the vectors point in the same direction, 0 if they are at 90°, −1 if they are antiparallel.

## FIXME! Cosine distance bug: frequencies2 length not squared.

## FIXME! Diagram of vector dot product
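A correct version squares the components of *both* vectors before taking the square roots (presumably the bug the FIXME above refers to); a sketch:

```
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x ** 2 for x in a))
    mag_b = math.sqrt(sum(y ** 2 for y in b))  # second length also needs squaring
    return dot / (mag_a * mag_b)
```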