# Breaking keyword ciphers a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z --|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|-- k | e | y | w | o | r | d | a | b | c | f | g | h | i | j | l | m | n | p | q | s | t | u | v | x | z --- layout: true .indexlink[[Index](index.html)] --- # Duplicate and extend your `affine_break()` function * How to cycle through all the keys? What _are_ all the keys? * Look at `words.txt` --- # Test it. * `2013/4a.ciphertext` * `2013/4b.ciphertext` This will take a while. Fire up a system monitor. What's wrong? --- # Python, threads, and the GIL Thread-safe shared-memory code is hard. Python's Global Interpreter Lock prevents shooting yourself in the foot. Where you want true parallelism, need different threads (Python processes). * Thread-safe shared-memory code is hard. The `multiprocessing` library makes this easier. But before we get there, a couple of diversions... --- # DRYing code Three cipher breaking tasks so far. All working on the same principle: ``` find a way to enumerate all the possible keys initialise 'best so far' for each key: decipher message with this key score it if it's better than the best so far: update best so far ``` Repetition of code is a bad smell. Separate the 'try all keys, keep the best' logic from the 'score this one key' logic. --- # map() A common task is to apply a function to each item in a sequence, returning a sequence of the results. ```python def double(x): return x * 2 >>> map(double, [1,2,3]) [2,4,6] ``` * `map()` is a higher-order function: its first argument is the function that's applied. How can we use this for keyword cipher breaking? --- # Mapping keyword decipherings Define a function that takes a possible key (keyword and cipher type) and returns the key and its fitness. * (Also pass in the message and the fitness function) Use `map()` and `max()` to find the best key --- # Arity of print() How many arguments does this take? How do you write a function that takes this many arguments? --- # Function arguments ## Positional, keyword * Common or garden parameters, as you're used to. * `def keyword_encipher(message, keyword, Keyword_wrap_alphabet.from_a):` ## Excess positional * `def mean(x, *xs):` First number goes in `x`, remaining go in the tuple `xs` ## Excess keyword * `def myfunc(arg1=0, **kwargs):` `kwargs` will be a Dict of the remaining keywords (not `arg1`) --- # Back to `multiprocessing` What does `Pool.starmap()` do? --- ```python from multiprocessing import Pool def keyword_break_mp(message, wordlist=keywords, fitness=Pletters, chunksize=500): helper_args = [??? for word in wordlist] # One tuple for each possible key with Pool() as pool: breaks = pool.starmap(keyword_break_worker, helper_args, chunksize) return max(breaks, key=lambda k: k[1]) def keyword_break_worker(???): ??? return (key, fitness) ``` * Gotcha: the function in `Pool.starmap()` must be defined at the top level * This is definitely a "feature" --- # Performance and chunksize Try the multiprocessing keyword break. Is it using all the resources? Setting `chunksize` is an art. ## Map-reduce as a general pattern for multiprocessing