# Breaking transposition ciphers attack the fort at dawn a t t a c k t h e f o r t a t d a w n akod ttra aean cft Generally quite familiar... ## Try all the keys, pick the one that looks most like Englilsh --- # ...Pick one that looks most like English But the naïve Bayes score will always be the same! * Same letters, just a different order. Score by probability of substrings of letters * Bigrams, trigrams, _n_-grams --- # Finding _n_-grams Given `count_2l.txt` and `count_3l.txt`, counts of bigrams and trigrams in English # Write a function that returns all the _n_-grams for a text, given _n_ * Assume the text is already sanitised # Build `P2l`, `P3l` (after `Pl`), `Pbigrams`, `Ptrigrams` (after `Pletters`) --- # Breaking scytale What are the possible keys? --- # Try all the keys... *All* the keys? What's the transposition of 'cat'? * 'bat'? * 'car'? * 'wry'? * 'babe'? * 'powwow'? --- # Equivalence classes and canonical forms Lots of words yield the same transposition * They're all in the same equivalence class * Only need to test one from the class General idea: if there are different ways to represent something, pick one to make comparisons easier * Canonical form, canonical representation --- # Finding the transpositions to try ``` For each word: if it's a new transposition: add it to the list ``` What data structure to use to store the transpositions?