From: Neil Smith
Date: Wed, 26 Mar 2014 16:49:16 +0000 (+0000)
Subject: Merge branch 'presentation-slides' of github.com:NeilNjae/cipher-training into presen...
X-Git-Url: https://git.njae.me.uk/?a=commitdiff_plain;ds=sidebyside;h=dd6476685c5b9a3ca11a47951dab769915c4f86c;hp=-c;p=cipher-training.git

Merge branch 'presentation-slides' of github.com:NeilNjae/cipher-training into presentation-slides
---

dd6476685c5b9a3ca11a47951dab769915c4f86c
diff --combined slides/caesar-break.html
index 9aea681,c2556cf..15607e7
--- a/slides/caesar-break.html
+++ b/slides/caesar-break.html
@@@ -197,9 -197,9 +197,9 @@@ def unaccent(text)
  # Find the frequencies of letters in English
  
  1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
 -2. Find the frequencies
 -3. Sort by count (`sorted(, key=)` ; `.items()`, `.keys()`, `.values()`, `.get()`)
 -4. Write counts to `count_1l.txt`
 +2. Find the frequencies (`.update()`)
 +3. Sort by count
 +4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
  
  ---
  
@@@ -314,6 -314,39 +314,39 @@@ We also need a way of passing the diffe
  
  Try them all on random ciphertexts, see which one works best.
  
+ ---
+ 
+ # Reading letter probabilities
+ 
+ 1. Load the file `count_1l.txt` into a dict, with letters as keys.
+ 
+ 2. Normalise the counts (components of vector sum to 1): `$$ \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }$$`
+     * Return a new dict
+     * Remember the doctest!
+ 
+ 3. Create a dict `Pl` that gives the log probability of a letter
+ 
+ 4. Create a function `Pletters` that gives the probability of an iterable of letters
+     * What preconditions should this function have?
+     * Remember the doctest!
+ 
+ ---
+ 
+ # Breaking caesar ciphers (at last!)
+ 
+ ## Remember the basic idea
+ 
+ ```
+ for each key:
+     decipher with this key
+     how close is it to English?
+     remember the best key
+ ```
+ 
+ Try it on the text in `2013/1a.ciphertext`. Does it work?
+ 
+ ---
+ 
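
The slides added in this merge outline four steps: load letter counts, normalise them, build the log-probability dict `Pl`, and try every Caesar key, keeping the best-scoring one. Below is a minimal Python sketch of those steps, not the repository's actual code. Several things here are assumptions made for illustration: the tab-separated `letter<TAB>count` layout for `count_1l.txt`, the helper names `count_letters`, `format_counts`, `load_counts`, `caesar_shift` and `caesar_break`, the tiny in-memory `SAMPLE_TEXT` standing in for the three corpus files, and the choice to start every letter's count at 1 so that no letter has zero probability. In this sketch `Pletters` returns a log probability (the sum of per-letter log probabilities) and skips characters that are not in `Pl`.

```python
import collections
import math
import string

# Tiny stand-in corpus (an assumption for this sketch); the slides build the
# counts from shakespeare.txt, sherlock-holmes.txt and war-and-peace.txt.
SAMPLE_TEXT = ("the quick brown fox jumps over the lazy dog while the evening "
               "breeze settles over the green trees near the seven sleepy sheep")


def count_letters(text):
    """Count lowercase letters, starting each letter at 1 so none is missing."""
    counts = collections.Counter({c: 1 for c in string.ascii_lowercase})
    counts.update(c for c in text.lower() if c in string.ascii_lowercase)
    return counts


def format_counts(counts):
    """Render 'letter<TAB>count' lines, most common first (assumed file layout)."""
    return ['{}\t{}\n'.format(letter, n) for letter, n in counts.most_common()]


def load_counts(lines):
    """Parse 'letter<TAB>count' lines into a dict with letters as keys."""
    counts = {}
    for line in lines:
        letter, n = line.split('\t')
        counts[letter] = int(n)
    return counts


def normalise(counts):
    """Return a new dict whose values sum to 1.

    >>> normalise({'a': 1, 'b': 3}) == {'a': 0.25, 'b': 0.75}
    True
    """
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def caesar_shift(text, key):
    """Shift each letter forward by key places; other characters are unchanged."""
    shifted = []
    for c in text.lower():
        if c in string.ascii_lowercase:
            shifted.append(chr((ord(c) - ord('a') + key) % 26 + ord('a')))
        else:
            shifted.append(c)
    return ''.join(shifted)


def Pletters(letters, Pl):
    """Log probability of an iterable of letters; characters not in Pl are skipped."""
    return sum(Pl[c] for c in letters if c in Pl)


def caesar_break(ciphertext, Pl):
    """Try all 26 keys and return (best_key, best_score)."""
    best_key, best_score = 0, -math.inf
    for key in range(26):
        plaintext = caesar_shift(ciphertext, -key)  # decipher with this key
        score = Pletters(plaintext, Pl)             # how close is it to English?
        if score > best_score:                      # remember the best key
            best_key, best_score = key, score
    return best_key, best_score


# Round-trip the counts through the assumed line format instead of a real file.
counts = load_counts(format_counts(count_letters(SAMPLE_TEXT)))
Pl = {letter: math.log(p) for letter, p in normalise(counts).items()}

ciphertext = caesar_shift(SAMPLE_TEXT, 3)
key, score = caesar_break(ciphertext, Pl)
print(key, caesar_shift(ciphertext, -key))
```

With real counts, `load_counts` could be passed an open file object for `count_1l.txt` instead of the in-memory lines, provided the file follows the assumed layout. Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long texts, which is one reason to store `Pl` as logs.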