<!DOCTYPE html>
<html>
<head>
<title>Breaking Caesar ciphers</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style type="text/css">
  /* Slideshow styles */
  body {
    font-size: 20px;
  }
  h1, h2, h3 {
    font-weight: 400;
    margin-bottom: 0;
  }
  h1 { font-size: 3em; }
  h2 { font-size: 2em; }
  h3 { font-size: 1.6em; }
  a, a > code {
    text-decoration: none;
  }
  code {
    -moz-border-radius: 5px;
    -webkit-border-radius: 5px;
    background: #e7e8e2;
    border-radius: 5px;
    font-size: 16px;
  }
  .plaintext {
    background: #272822;
    color: #80ff80;
    text-shadow: 0 0 20px #333;
    padding: 2px 5px;
  }
  .ciphertext {
    background: #272822;
    color: #ff6666;
    text-shadow: 0 0 20px #333;
    padding: 2px 5px;
  }
</style>
</head>
<body>
<textarea id="source">

# Breaking Caesar ciphers

![centre-aligned Caesar wheel](caesarwheel1.gif)

---

# Human vs Machine

Slow but clever vs Dumb but fast

## Human approach

Ciphertext | Plaintext
---|---
![left-aligned Ciphertext frequencies](c1a_frequency_histogram.png) | ![left-aligned English frequencies](english_frequency_histogram.png)

---

# Human vs Machine

## Machine approach

Brute force.

Try all keys.

* How many keys to try?

## Basic idea

```
for each key:
    decipher with this key
    how close is it to English?
    remember the best key
```

What steps do we know how to do?
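
Putting those steps together, a minimal Python sketch of the loop might look like this. It assumes two helpers that are not defined on these slides: a hypothetical `caesar_decipher(text, key)` and an English-closeness score `english_score(text)`.

```
def break_caesar(ciphertext):
    # Sketch only: caesar_decipher and english_score are assumed helpers.
    best_key, best_score = 0, float('-inf')
    for key in range(26):                    # try every possible key
        plaintext = caesar_decipher(ciphertext, key)
        score = english_score(plaintext)     # how close is it to English?
        if score > best_score:               # remember the best key
            best_key, best_score = key, score
    return best_key, best_score
```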

---

# How close is it to English?

What does English look like?

* We need a model of English.

How do we define "closeness"?

---

# What does English look like?

## Abstraction: letter frequency counts

Letter | Count
-------|------
a | 489107
b | 92647
c | 140497
d | 267381
e | 756288
. | .
. | .
. | .
z | 3575

One way of thinking about this is as a 26-dimensional vector.

Create a vector for our text, and one for idealised English.

The distance between the vectors measures how far the text is from English.
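
For example, the letter-count vector of a piece of text can be built in a few lines of Python (a sketch using only the standard library):

```
import collections
import string

def letter_vector(text):
    # Count the letters a-z, ignoring case and non-letters,
    # and return the counts as a 26-element list.
    counts = collections.Counter(c for c in text.lower()
                                 if c in string.ascii_lowercase)
    return [counts[letter] for letter in string.ascii_lowercase]
```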

---

# Vector distances


## FIXME! Diagram of vector subtraction here

Several different distance measures (__metrics__, also called __norms__):

* L<sub>2</sub> norm (Euclidean distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_i (\mathbf{x}_i - \mathbf{y}_i)^2} \)`

* L<sub>1</sub> norm (Manhattan distance, taxicab distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i |\mathbf{x}_i - \mathbf{y}_i| \)`

* L<sub>3</sub> norm: `\(\|\mathbf{x} - \mathbf{y}\| = \sqrt[3]{\sum_i |\mathbf{x}_i - \mathbf{y}_i|^3} \)`

The higher the power used, the more weight is given to the largest differences in components.

(This family extends to:

* L<sub>0</sub> norm (Hamming distance): `\(\|\mathbf{x} - \mathbf{y}\| = \sum_i \left\{\begin{matrix} 1 &amp;\mbox{if}\ \mathbf{x}_i \neq \mathbf{y}_i , \\ 0 &amp;\mbox{if}\ \mathbf{x}_i = \mathbf{y}_i \end{matrix} \right. \)`

* L<sub>&infin;</sub> norm: `\(\|\mathbf{x} - \mathbf{y}\| = \max_i{|\mathbf{x}_i - \mathbf{y}_i|} \)`

neither of which will be that useful.)
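
The three useful distances above translate directly into Python (a sketch; `x` and `y` are plain lists of numbers of equal length):

```
def l1_distance(x, y):
    # Manhattan distance: sum of absolute differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def l2_distance(x, y):
    # Euclidean distance: square root of the sum of squared differences
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def l3_distance(x, y):
    # Cube root of the sum of cubed absolute differences
    return sum(abs(xi - yi) ** 3 for xi, yi in zip(x, y)) ** (1 / 3)
```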

---

# Normalisation of vectors

Frequency distributions drawn from different sources will have different lengths. For a fair comparison we need to scale them.

* Euclidean scaling (vector with unit length): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \sqrt{\mathbf{x}_1^2 + \mathbf{x}_2^2 + \mathbf{x}_3^2 + \dots } }\)`

* Normalisation (components of vector sum to 1): `\( \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }\)`
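
Both scalings are a couple of lines of Python each (a sketch; assumes the vector has a non-zero length or sum):

```
def euclidean_scale(x):
    # Divide by the Euclidean (L2) length so the result has unit length.
    length = sum(xi ** 2 for xi in x) ** 0.5
    return [xi / length for xi in x]

def normalise(x):
    # Divide by the total so the components sum to 1.
    total = sum(x)
    return [xi / total for xi in x]
```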

---

# Angle, not distance

Rather than looking at the distance between the vectors, look at the angle between them.

The vector dot product shows how much of one vector lies in the direction of another:
`\( \mathbf{A} \bullet \mathbf{B} =
\| \mathbf{A} \| \cdot \| \mathbf{B} \| \cos{\theta} \)`

But,
`\( \mathbf{A} \bullet \mathbf{B} = \sum_i \mathbf{A}_i \cdot \mathbf{B}_i \)`
and `\( \| \mathbf{A} \| = \sqrt{\sum_i \mathbf{A}_i^2} \)`

A bit of rearranging gives the cosine similarity:
`\( \cos{\theta} = \frac{ \mathbf{A} \bullet \mathbf{B} }{ \| \mathbf{A} \| \cdot \| \mathbf{B} \| } =
\frac{\sum_i \mathbf{A}_i \cdot \mathbf{B}_i}{\sqrt{\sum_i \mathbf{A}_i^2} \times \sqrt{\sum_i \mathbf{B}_i^2}} \)`

This is independent of vector lengths!

Cosine similarity is 1 if the vectors point in the same direction, 0 if they are at 90°, and -1 if they are antiparallel.
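
As a sketch, cosine similarity is only a few lines of Python (standard library only; `a` and `b` are equal-length lists):

```
import math

def cosine_similarity(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    length_a = math.sqrt(sum(ai ** 2 for ai in a))   # vector lengths, not squared lengths
    length_b = math.sqrt(sum(bi ** 2 for bi in b))
    return dot / (length_a * length_b)
```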

## FIXME! Cosine distance bug: frequencies2 length not squared.


## FIXME! Diagram of vector dot product

</textarea>
<script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">
</script>

<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&delayStartupUntil=configured"></script>

<script type="text/javascript">
var slideshow = remark.create({ ratio: "16:9" });

// Set up MathJax
MathJax.Hub.Config({
  tex2jax: {
    skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
  }
});
// Mark the parent of each rendered maths element with a 'has-jax' class
// (plain DOM, as jQuery is not loaded on this page).
MathJax.Hub.Queue(function() {
  MathJax.Hub.getAllJax().forEach(function(jax) {
    jax.SourceElement().parentNode.className += ' has-jax';
  });
});
MathJax.Hub.Configured();
</script>
</body>
</html>