Breaking transposition ciphers

# Breaking transposition ciphers

attack the fort at dawn

a t t a c
    k t h e f
    o r t a t
    d a w n

akod ttra aean cft

Generally quite familiar...

## Try all the keys, pick the one that looks most like Englilsh

---

# ...Pick one that looks most like English

But the naïve Bayes score will always be the same!

* Same letters, just a different order.

Score by probability of substrings of letters

* Bigrams, trigrams, _n_-grams

---

# Finding _n_-grams

Given `count_2l.txt` and `count_3l.txt`, counts of bigrams and trigrams in English

# Write a function that returns all the _n_-grams for a text, given _n_
    * Assume the text is already sanitised

# Build `P2l`, `P3l` (after `Pl`), `Pbigrams`, `Ptrigrams` (after `Pletters`)

---

# Breaking scytale

What are the possible keys?

---

# Try all the keys...

*All* the keys?

What's the transposition of 'cat'?

* 'bat'?
* 'car'?
* 'wry'?
* 'babe'?
* 'powwow'?

---

# Equivalence classes and canonical forms

Lots of words yield the same transposition

* They're all in the same equivalence class
* Only need to test one from the class

General idea: if there are different ways to represent something, pick one to make comparisons easier

* Canonical form, canonical representation

---

# Finding the transpositions to try

```
For each word:
    if it's a new transposition:
        add it to the list
```

What data structure to use to store the transpositions?