# Breaking Caesar ciphers

![centre-aligned Caesar wheel](caesarwheel1.gif)

---

# Brute force

How many keys to try?

## Basic idea

```
for each key:
    decipher with this key
    how close is it to English?
    remember the best key
```

What steps do we know how to do?
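In Python, the loop might look something like this minimal sketch. The closeness measure, `english_score`, is exactly what the rest of this section builds up to, so here it is just passed in as a parameter:

```
import string

def caesar_decipher(text, key):
    # Undo a Caesar shift of `key` places (letters only, lowercased).
    shifted = string.ascii_lowercase[key:] + string.ascii_lowercase[:key]
    return text.lower().translate(str.maketrans(shifted, string.ascii_lowercase))

def caesar_break(ciphertext, english_score):
    # Try all 26 keys, score each decipherment, remember the best key.
    best_key, best_score = 0, float('-inf')
    for key in range(26):
        score = english_score(caesar_decipher(ciphertext, key))
        if score > best_score:
            best_key, best_score = key, score
    return best_key
```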
---

# How close is it to English?

What does English look like?

* We need a model of English.

How do we define "closeness"?

---

# What does English look like?

## Abstraction: frequency of letter counts

Letter | Count
-------|------
a | 489107
b | 92647
c | 140497
d | 267381
e | 756288
. | .
. | .
. | .
z | 3575

One way of thinking about this is a 26-dimensional vector. Create a vector of our text, and one of idealised English. The distance between the vectors is how far from English the text is.
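Building such a count vector is straightforward; a sketch (the idealised English counts, like the table above, would come from a large reference corpus):

```
from collections import Counter
import string

def letter_vector(text):
    # 26-dimensional vector of letter counts, ignoring non-letters.
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    return [counts[letter] for letter in string.ascii_lowercase]
```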
---

# Vector distances

## FIXME! Diagram of vector subtraction here

Several different distance measures (__metrics__, also called __norms__):

* L<sub>2</sub> norm (Euclidean distance): `\(|\mathbf{x} - \mathbf{y}| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^2} \)`

* L<sub>1</sub> norm (Manhattan distance, taxicab distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`

* L<sub>3</sub> norm: `\(|\mathbf{x} - \mathbf{y}| = \sqrt[3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^3} \)`

The higher the power used, the more weight is given to the largest differences in components.

(Extends out to:

* L<sub>0</sub> norm (Hamming distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i \left\{\begin{matrix} 1 & \mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i \\ 0 & \mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix}\right. \)`

* L<sub>∞</sub> norm: `\(|\mathbf{x} - \mathbf{y}| = \max_i{|\mathbf{x}_i - \mathbf{y}_i|} \)`

neither of which will be that useful.)
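The useful ones translate directly into code; a sketch, where `x` and `y` are equal-length lists of numbers such as the letter-count vectors above:

```
import math

def l1_distance(x, y):
    # Manhattan distance: sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l3_distance(x, y):
    # Cube root of the summed cubed absolute differences.
    return sum(abs(a - b) ** 3 for a, b in zip(x, y)) ** (1 / 3)
```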
---

# Normalisation of vectors

Frequency distributions drawn from different sources will have different lengths. For a fair comparison we need to scale them.

* Euclidean scaling (vector with unit length): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \sqrt{\mathbf{x}_1^2 + \mathbf{x}_2^2 + \mathbf{x}_3^2 + \dots } }\)`

* Normalisation (components of vector sum to 1): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }\)`
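Both scalings as a sketch in Python:

```
import math

def euclidean_scale(x):
    # Scale the vector to unit Euclidean length.
    length = math.sqrt(sum(v ** 2 for v in x))
    return [v / length for v in x]

def normalise(x):
    # Scale the vector so its components sum to 1.
    total = sum(x)
    return [v / total for v in x]
```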
---

# Angle, not distance

Rather than looking at the distance between the vectors, look at the angle between them.

The vector dot product shows how much of one vector lies in the direction of another:
`\( \mathbf{A} \bullet \mathbf{B} = \| \mathbf{A} \| \cdot \| \mathbf{B} \| \cos{\theta} \)`

But,
`\( \mathbf{A} \bullet \mathbf{B} = \sum_i \mathbf{A}_i \cdot \mathbf{B}_i \)`
and
`\( \| \mathbf{A} \| = \sqrt{\sum_i \mathbf{A}_i^2} \)`

A bit of rearranging gives the cosine similarity:
`\( \cos{\theta} = \frac{ \mathbf{A} \bullet \mathbf{B} }{ \| \mathbf{A} \| \cdot \| \mathbf{B} \| } = \frac{\sum_i \mathbf{A}_i \cdot \mathbf{B}_i}{\sqrt{\sum_i \mathbf{A}_i^2} \times \sqrt{\sum_i \mathbf{B}_i^2}} \)`

This is independent of vector lengths!

Cosine similarity is 1 if the vectors point in the same direction, 0 if they are at 90°, −1 if they are antiparallel.

## FIXME! Cosine distance bug: frequencies2 length not squared.

## FIXME! Diagram of vector dot product
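A correct version squares the components of *both* vectors before taking the square roots (presumably the bug the FIXME above refers to); a sketch:

```
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x ** 2 for x in a))
    mag_b = math.sqrt(sum(y ** 2 for y in b))  # second length also needs squaring
    return dot / (mag_a * mag_b)
```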