## Abstraction: frequency of letter counts
+.float-right[![right-aligned Letter frequencies](letter-frequency-treemap.png)]
+
Letter | Count
-------|------
a | 489107
---
-# An infinite number of monkeys
+.float-right[![right-aligned Typing monkey](typingmonkeylarge.jpg)]
-What is the probability that this string of letters is a sample of English?
+# Naive Bayes, or the bag of letters
-## Naive Bayes, or the bag of letters
+What is the probability that this string of letters is a sample of English?
Ignore letter order, just treat each letter individually.
------------|---------|---------|---------|---------|---------|-------
Probability | 0.06723 | 0.02159 | 0.02748 | 0.02748 | 0.01607 | 1.76244520 × 10<sup>-8</sup>
-(Implmentation issue: this can often underflow, so get in the habit of rephrasing it as `\( \sum_i \log p_i \)`)
+(Implementation issue: this can often underflow, so we rephrase it as `\( \sum_i \log p_i \)`)
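
A minimal sketch of why the log rephrasing helps, using the five rounded letter probabilities from the `hello` table. The names `product` and `log_score` are made up for this illustration, and the 500-letter "text" is simply `hello` repeated, an assumption chosen to force underflow.

```python
import math

# Rounded per-letter probabilities for h, e, l, l, o (from the table).
hello_probs = [0.06723, 0.02159, 0.02748, 0.02748, 0.01607]


def product(probs):
    """Multiply the probabilities together directly."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_score(probs):
    """Sum the log probabilities instead of multiplying."""
    return sum(math.log(p) for p in probs)


# For a short string, both forms carry the same information.
short_product = product(hello_probs)
short_log = log_score(hello_probs)

# For a longer text (here, 'hello' repeated 100 times) the direct
# product underflows to 0.0, while the log score stays finite.
long_probs = hello_probs * 100
long_product = product(long_probs)
long_log = log_score(long_probs)
```

Because `log` is monotonic, comparing log scores ranks candidate texts in the same order as comparing the products would, so nothing is lost by working with the sum.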
Letter | h | e | l | l | o | hello
------------|---------|---------|---------|---------|---------|-------
# Five minutes on StackOverflow later...
```python
+import unicodedata
+
def unaccent(text):
"""Remove all accents from letters.
It does this by converting the unicode string to decomposed compatibility
1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
2. Find the frequencies (`.update()`)
-3. Sort by count
-4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
+3. Sort by count (read the docs...)
+4. Write counts to `count_1l.txt`
+```python
+with open('count_1l.txt', 'w') as f:
+ for each letter...:
+ f.write('text\t{}\n'.format(count))
+```
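
One possible way to put the four steps together is sketched below. It assumes that only the letters a to z are counted after lowercasing, and that each output line is `letter<TAB>count`; the function names `count_letters` and `write_counts` are hypothetical, not taken from the course code.

```python
import collections
import string


def count_letters(texts):
    """Count lowercase ASCII letters across an iterable of strings."""
    counts = collections.Counter()
    for text in texts:
        # .update() adds one to the count of each item it is given.
        counts.update(c for c in text.lower() if c in string.ascii_lowercase)
    return counts


def write_counts(counts, filename):
    """Write one 'letter<TAB>count' line per letter, most frequent first."""
    with open(filename, 'w') as f:
        # most_common() returns (item, count) pairs sorted by count, descending.
        for letter, count in counts.most_common():
            f.write('{}\t{}\n'.format(letter, count))
```

To use it on the three corpora, read each file into a string (for example with `open(name).read()`), pass the list of strings to `count_letters`, then call `write_counts(counts, 'count_1l.txt')`.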
---
# Breaking caesar ciphers
+New file: `cipherbreak.py`
+
## Remember the basic idea
```