From: Neil Smith
Date: Wed, 12 Mar 2014 12:32:08 +0000 (+0000)
Subject: Included frequency histogram diagrams
X-Git-Url: https://git.njae.me.uk/?p=cipher-training.git;a=commitdiff_plain;h=05677f98ecb9a499853a19cb299b1880c0359c9a

Included frequency histogram diagrams
---

diff --git a/slides/c1a_frequency_histogram.png b/slides/c1a_frequency_histogram.png
new file mode 100644
index 0000000..7ccbd42
Binary files /dev/null and b/slides/c1a_frequency_histogram.png differ
diff --git a/slides/caesar-break.html b/slides/caesar-break.html
index d552209..6df8365 100644
--- a/slides/caesar-break.html
+++ b/slides/caesar-break.html
@@ -48,9 +48,27 @@
 
 ---
 
-# Brute force
+# Human vs Machine
 
-How many keys to try?
+Slow but clever vs Dumb but fast
+
+## Human approach
+
+Ciphertext | Plaintext
+---|---
+![left-aligned Ciphertext frequencies](c1a_frequency_histogram.png) | ![left-aligned English frequencies](english_frequency_histogram.png)
+
+---
+
+# Human vs Machine
+
+## Machine approach
+
+Brute force.
+
+Try all keys.
+
+* How many keys to try?
 
 ## Basic idea
 
@@ -67,6 +85,7 @@ What steps do we know how to do?
 # How close is it to English?
 
 What does English look like?
+
 * We need a model of English.
 
 How do we define "closeness"?
@@ -104,19 +123,19 @@ The distance between the vectors is how far from English the text is.
 
 Several different distance measures (__metrics__, also called __norms__):
 
-* L2 norm (Euclidean distance): `\(|\mathbf{x} - \mathbf{y}| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^2} \)`
+* L2 norm (Euclidean distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^2} \)`
 
-* L1 norm (Manhattan distance, taxicab distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`
+* L1 norm (Manhattan distance, taxicab distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`
 
-* L3 norm: `\(|\mathbf{x} - \mathbf{y}| = \sqrt[3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^3} \)`
+* L3 norm: `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt[3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^3} \)`
 
 The higher the power used, the more weight is given to the largest differences in components.
 
 (Extends out to:
 
-* L0 norm (Hamming distance): `\(|\mathbf{x} - \mathbf{y}| = \sum_i \left\{\begin{matrix} 1 &\mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i , \\ 0 &\mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix} \right. \)`
+* L0 norm (Hamming distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i \left\{\begin{matrix} 1 &\mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i , \\ 0 &\mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix} \right. \)`
 
-* L∞ norm: `\(|\mathbf{x} - \mathbf{y}| = \max_i{|\mathbf{x}_i - \mathbf{y}_i|} \)`
+* L∞ norm: `\(\|\mathbf{x} - \mathbf{y}\| = \max_i{|\mathbf{x}_i - \mathbf{y}_i|} \)`
 
 neither of which will be that useful.)
 ---
diff --git a/slides/english_frequency_histogram.png b/slides/english_frequency_histogram.png
new file mode 100644
index 0000000..3ab182e
Binary files /dev/null and b/slides/english_frequency_histogram.png differ
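
The "machine approach" slide above compresses the whole method into three lines: brute force, try all keys, score each candidate by its closeness to English. A minimal Python sketch of that pipeline follows, using only the standard library; the names (`caesar_decipher`, `caesar_break`, `english_freqs`) and the frequency values are illustrative stand-ins, not the course library's own API or data.

```python
# A self-contained sketch of the slide's "machine approach":
# try every shift and keep the one whose letter frequencies
# are closest (in L2 distance) to an English model.
from collections import Counter
import string

# Rough relative frequencies of letters in English text (well-known
# approximate values; a real model would be built from a large corpus).
english_freqs = {
    'e': 0.127, 't': 0.091, 'a': 0.082, 'o': 0.075, 'i': 0.070,
    'n': 0.067, 's': 0.063, 'h': 0.061, 'r': 0.060, 'd': 0.043,
    'l': 0.040, 'c': 0.028, 'u': 0.028, 'm': 0.024, 'w': 0.024,
    'f': 0.022, 'g': 0.020, 'y': 0.020, 'p': 0.019, 'b': 0.015,
    'v': 0.010, 'k': 0.008, 'j': 0.002, 'x': 0.002, 'q': 0.001,
    'z': 0.001}

def caesar_decipher(text, shift):
    """Shift each letter back by `shift` places, leaving non-letters alone."""
    result = []
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord('a') - shift) % 26 + ord('a')))
        else:
            result.append(ch)
    return ''.join(result)

def frequencies(text):
    """Normalised letter frequencies of `text`, one entry per letter."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero on empty input
    return {l: counts[l] / total for l in string.ascii_lowercase}

def l2_distance(freqs):
    """Euclidean (L2) distance between `freqs` and the English model."""
    return sum((freqs[l] - english_freqs[l]) ** 2
               for l in string.ascii_lowercase) ** 0.5

def caesar_break(ciphertext):
    """Dumb but fast: score every key, return the best (key, plaintext)."""
    best = min(range(26),
               key=lambda shift: l2_distance(
                   frequencies(caesar_decipher(ciphertext, shift))))
    return best, caesar_decipher(ciphertext, best)

# Example: 'TLLA HA AOL JHZASL' is 'meet at the castle' shifted by 7.
# A text this short may fool a frequency scorer; longer texts rarely do.
print(caesar_break('TLLA HA AOL JHZASL'))
```

Note the answer to the slide's question: only 25 non-trivial keys exist (26 including the identity shift), which is why exhaustive search is trivial here.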
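
The norms slide gives the distance measures only as formulas. Here are the same definitions as plain Python functions over two equal-length vectors, with a tiny worked example illustrating the slide's claim that higher powers give more weight to the largest differences in components; this is a sketch to accompany the formulas, and all names are illustrative.

```python
def l1(xs, ys):
    """Manhattan / taxicab distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(xs, ys))

def l2(xs, ys):
    """Euclidean distance: square root of the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) ** 0.5

def l3(xs, ys):
    """L3 norm: cube root of the sum of cubed absolute differences."""
    return sum(abs(x - y) ** 3 for x, y in zip(xs, ys)) ** (1 / 3)

def linf(xs, ys):
    """L-infinity norm: the single largest absolute difference."""
    return max(abs(x - y) for x, y in zip(xs, ys))

# One component differs by 0.9, three differ by 0.1. As the power
# rises, the distance converges on the largest single difference:
xs, ys = [1.0, 1.0, 1.0, 1.0], [0.1, 0.9, 0.9, 0.9]
print(l1(xs, ys), l2(xs, ys), l3(xs, ys), linf(xs, ys))
# l1 = 1.2, l2 ≈ 0.917, l3 ≈ 0.901, linf = 0.9
```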