slides/keyword-break.html

   1 <!DOCTYPE html>
   2 <html>
   3   <head>
   4     <title>Breaking keyword ciphers</title>
   5     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
   6     <style type="text/css">
   7       /* Slideshow styles */
   8       body {
   9         font-size: 20px;
  10       }
  11       h1, h2, h3 {
  12         font-weight: 400;
  13         margin-bottom: 0;
  14       }
  15       h1 { font-size: 3em; }
  16       h2 { font-size: 2em; }
  17       h3 { font-size: 1.6em; }
  18       a, a > code {
  19         text-decoration: none;
  20       }
  21       code {
  22         -moz-border-radius: 5px;
  23         -web-border-radius: 5px;
  24         background: #e7e8e2;
  25         border-radius: 5px;
  26         font-size: 16px;
  27       }
  28       .plaintext {
  29         background: #272822;
  30         color: #80ff80;
  31         text-shadow: 0 0 20px #333;
  32         padding: 2px 5px;
  33       }
  34       .ciphertext {
  35         background: #272822;
  36         color: #ff6666;
  37         text-shadow: 0 0 20px #333;
  38         padding: 2px 5px;
  39       }
  40       .indexlink {
  41         position: absolute;
  42         bottom: 1em;
  43         left: 1em;
  44       }
  45        .float-right {
  46         float: right;
  47       }
  48     </style>
  49   </head>
  50   <body>
  51     <textarea id="source">
  52
  53 # Breaking keyword ciphers
  54
  55 a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
  56 --|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|--
  57 k | e | y | w | o | r | d | a | b | c | f | g | h | i | j | l | m | n | p | q | s | t | u | v | x | z
  58
  59 ---
  60
  61 layout: true
  62
  63 .indexlink[[Index](index.html)]
  64
  65 ---
  66
  67 # Duplicate and extend your `affine_break()` function
  68
  69 * How to cycle through all the keys? What _are_ all the keys?
  70
  71 * Look at `words.txt`
  72
  73 ---
  74
  75 # Test it.
  76
  77 * `2013/4a.ciphertext`
  78 * `2013/4b.ciphertext`
  79
  80 This will take a while. Fire up a system monitor. What's wrong?
  81
  82 ---
  83
  84 # Python, threads, and the GIL
  85
  86 Thread-safe shared-memory code is hard.
  87
  88 Python's Global Interpreter Lock prevents shooting yourself in the foot.
  89
  90 Where you want true parallelism, need different threads (Python processes).
  91
  92 * Thread-safe shared-memory code is hard.
  93
  94 The `multiprocessing` library makes this easier.
  95
  96 But before we get there, a couple of diversions...
  97
  98 ---
  99
 100 # DRYing code
 101
 102 Three cipher breaking tasks so far.
 103
 104 All working on the same principle:
 105
 106 ```
 107 find a way to enumerate all the possible keys
 108 initialise 'best so far'
 109 for each key:
 110     decipher message with this key
 111     score it
 112     if it's better than the best so far:
 113         update best so far
 114 ```
 115
 116 Repetition of code is a bad smell.
 117
 118 Separate the 'try all keys, keep the best' logic from the 'score this one key' logic.
 119
 120 ---
 121
 122 # map()
 123
 124 A common task is to apply a function to each item in a sequence, returning a sequence of the results.
 125
 126 ```python
 127 def double(x):
 128     return x * 2
 129
 130 >>> map(double, [1,2,3])
 131 [2,4,6]
 132 ```
 133
 134 * `map()` is a higher-order function: its first argument is the function that's applied.
 135
 136 How can we use this for keyword cipher breaking?
 137
 138 ---
 139
 140 # Mapping keyword decipherings
 141
 142 Define a function that takes a possible key (keyword and cipher type) and returns the key and its fitness.
 143
 144 * (Also pass in the message and the fitness function)
 145
 146 Use `map()` and `max()` to find the best key
 147
 148 ---
 149
 150 # Arity of print()
 151
 152 How many arguments does this take?
 153
 154 How do you write a function that takes this many arguments?
 155
 156 ---
 157
 158 # Function arguments
 159
 160 ## Positional, keyword
 161
 162 * Common or garden parameters, as you're used to.
 163 * `def keyword_encipher(message, keyword, Keyword_wrap_alphabet.from_a):`
 164
 165 ## Excess positional
 166 * `def mean(x, *xs):`
 167
 168 First number goes in `x`, remaining go in the tuple `xs`
 169
 170 ## Excess keyword
 171
 172 * `def myfunc(arg1=0, **kwargs):`
 173
 174 `kwargs` will be a Dict of the remaining keywords (not `arg1`)
 175
 176 ---
 177
 178 # Back to `multiprocessing`
 179
 180 What does `Pool.starmap()` do?
 181
 182 ---
 183
 184 ```python
 185 from multiprocessing import Pool
 186
 187 def keyword_break_mp(message, wordlist=keywords, fitness=Pletters, chunksize=500):
 188     helper_args = [??? for word in wordlist] # One tuple for each possible key
 189     with Pool() as pool:
 190         breaks = pool.starmap(keyword_break_worker, helper_args, chunksize)
 191         return max(breaks, key=lambda k: k[1])
 192
 193 def keyword_break_worker(???):
 194     ???
 195     return (key, fitness)
 196 ```
 197
 198 * Gotcha: the function in `Pool.starmap()` must be defined at the top level
 199     * This is definitely a "feature"
 200
 201 ---
 202
 203 # Performance and chunksize
 204
 205 Try the multiprocessing keyword break. Is it using all the resources?
 206
 207 Setting `chunksize` is an art.
 208
 209 ## Map-reduce as a general pattern for multiprocessing
 210
 211     </textarea>
 212     <script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">
 213     </script>
 214
 215     <script type="text/javascript"
 216       src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&delayStartupUntil=configured"></script>
 217
 218     <script type="text/javascript">
 219       var slideshow = remark.create({ ratio: "16:9" });
 220
 221       // Setup MathJax
 222       MathJax.Hub.Config({
 223         tex2jax: {
 224         skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
 225         }
 226       });
 227       MathJax.Hub.Queue(function() {
 228         $(MathJax.Hub.getAllJax()).map(function(index, elem) {
 229             return(elem.SourceElement());
 230         }).parent().addClass('has-jax');
 231       });
 232       MathJax.Hub.Configured();
 233     </script>
 234   </body>
 235 </html>