Tweaked slide layout
[cipher-training.git] / slides / keyword-break.html
1 <!DOCTYPE html>
2 <html>
3 <head>
4 <title>Breaking keyword ciphers</title>
5 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
6 <style type="text/css">
7 /* Slideshow styles */
8 body {
9 font-size: 20px;
10 }
11 h1, h2, h3 {
12 font-weight: 400;
13 margin-bottom: 0;
14 }
15 h1 { font-size: 3em; }
16 h2 { font-size: 2em; }
17 h3 { font-size: 1.6em; }
18 a, a > code {
19 text-decoration: none;
20 }
21 code {
22 -moz-border-radius: 5px;
23 -web-border-radius: 5px;
24 background: #e7e8e2;
25 border-radius: 5px;
26 font-size: 16px;
27 }
28 .plaintext {
29 background: #272822;
30 color: #80ff80;
31 text-shadow: 0 0 20px #333;
32 padding: 2px 5px;
33 }
34 .ciphertext {
35 background: #272822;
36 color: #ff6666;
37 text-shadow: 0 0 20px #333;
38 padding: 2px 5px;
39 }
40 .indexlink {
41 position: absolute;
42 bottom: 1em;
43 left: 1em;
44 }
45 .float-right {
46 float: right;
47 }
48 </style>
49 </head>
50 <body>
51 <textarea id="source">
52
53 # Breaking keyword ciphers
54
55 a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
56 --|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|--
57 k | e | y | w | o | r | d | a | b | c | f | g | h | i | j | l | m | n | p | q | s | t | u | v | x | z
58
59 ---
60
61 layout: true
62
63 .indexlink[[Index](index.html)]
64
65 ---
66
67 # Duplicate and extend your `affine_break()` function
68
69 * How to cycle through all the keys? What _are_ all the keys?
70
71 * Look at `words.txt`
72
73 ---
74
75 # Test it.
76
77 * `2013/4a.ciphertext`
78 * `2013/4b.ciphertext`
79
80 This will take a while. Fire up a system monitor. What's wrong?
81
82 ---
83
84 # Python, threads, and the GIL
85
86 Thread-safe shared-memory code is hard.
87
88 Python's Global Interpreter Lock prevents shooting yourself in the foot.
89
90 Where you want true parallelism, need different threads (Python processes).
91
92 * Thread-safe shared-memory code is hard.
93
94 The `multiprocessing` library makes this easier.
95
96 But before we get there, a couple of diversions...
97
98 ---
99
100 # DRYing code
101
102 Three cipher breaking tasks so far.
103
104 All working on the same principle:
105
106 ```
107 find a way to enumerate all the possible keys
108 initialise 'best so far'
109 for each key:
110 decipher message with this key
111 score it
112 if it's better than the best so far:
113 update best so far
114 ```
115
116 Repetition of code is a bad smell.
117
118 Separate the 'try all keys, keep the best' logic from the 'score this one key' logic.
119
120 ---
121
122 # map()
123
124 A common task is to apply a function to each item in a sequence, returning a sequence of the results.
125
126 ```python
127 def double(x):
128 return x * 2
129
130 >>> map(double, [1,2,3])
131 [2,4,6]
132 ```
133
134 * `map()` is a higher-order function: its first argument is the function that's applied.
135
136 How can we use this for keyword cipher breaking?
137
138 ---
139
140 # Mapping keyword decipherings
141
142 Define a function that takes a possible key (keyword and cipher type) and returns the key and its fitness.
143
144 * (Also pass in the message and the fitness function)
145
146 Use `map()` and `max()` to find the best key
147
148 ---
149
150 # Arity of print()
151
152 How many arguments does this take?
153
154 How do you write a function that takes this many arguments?
155
156 ---
157
158 # Function arguments
159
160 ## Positional, keyword
161
162 * Common or garden parameters, as you're used to.
163 * `def keyword_encipher(message, keyword, Keyword_wrap_alphabet.from_a):`
164
165 ## Excess positional
166 * `def mean(x, *xs):`
167
168 First number goes in `x`, remaining go in the tuple `xs`
169
170 ## Excess keyword
171
172 * `def myfunc(arg1=0, **kwargs):`
173
174 `kwargs` will be a Dict of the remaining keywords (not `arg1`)
175
176 ---
177
178 # Back to `multiprocessing`
179
180 What does `Pool.starmap()` do?
181
182 ---
183
184 ```python
185 from multiprocessing import Pool
186
187 def keyword_break_mp(message, wordlist=keywords, fitness=Pletters, chunksize=500):
188 helper_args = [??? for word in wordlist] # One tuple for each possible key
189 with Pool() as pool:
190 breaks = pool.starmap(keyword_break_worker, helper_args, chunksize)
191 return max(breaks, key=lambda k: k[1])
192
193 def keyword_break_worker(???):
194 ???
195 return (key, fitness)
196 ```
197
198 * Gotcha: the function in `Pool.starmap()` must be defined at the top level
199 * This is definitely a "feature"
200
201 ---
202
203 # Performance and chunksize
204
205 Try the multiprocessing keyword break. Is it using all the resources?
206
207 Setting `chunksize` is an art.
208
209 ## Map-reduce as a general pattern for multiprocessing
210
211 </textarea>
212 <script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">
213 </script>
214
215 <script type="text/javascript"
216 src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&delayStartupUntil=configured"></script>
217
218 <script type="text/javascript">
219 var slideshow = remark.create({ ratio: "16:9" });
220
221 // Setup MathJax
222 MathJax.Hub.Config({
223 tex2jax: {
224 skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
225 }
226 });
227 MathJax.Hub.Queue(function() {
228 $(MathJax.Hub.getAllJax()).map(function(index, elem) {
229 return(elem.SourceElement());
230 }).parent().addClass('has-jax');
231 });
232 MathJax.Hub.Configured();
233 </script>
234 </body>
235 </html>