---
-# An infinite number of monkeys
+.float-right[![right-aligned Typing monkey](typingmonkeylarge.jpg)]
-What is the probability that this string of letters is a sample of English?
+# Naive Bayes, or the bag of letters
-## Naive Bayes, or the bag of letters
+What is the probability that this string of letters is a sample of English?
Ignore letter order, just treat each letter individually.
1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
2. Find the frequencies (`.update()`)
-3. Sort by count
-4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
+3. Sort by count (read the docs...)
+4. Write counts to `count_1l.txt`
+```python
+with open('count_1l.txt', 'w') as f:
+    for each letter...:
+        f.write('text\t{}\n'.format(count))
+```
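
A minimal sketch of steps 2 to 4, assuming `collections.Counter` is the counter that `.update()` refers to. The helper names `count_letters` and `write_counts` are hypothetical, and two short literal strings stand in for the three corpus files.

```python
import collections
import string


def count_letters(texts):
    """Count letters across several texts, ignoring case and non-letters."""
    counts = collections.Counter()
    for text in texts:
        # .update() adds the counts from this text to the running totals
        counts.update(c for c in text.lower() if c in string.ascii_lowercase)
    return counts


def write_counts(counts, filename):
    """Write one 'letter<TAB>count' line per letter, most frequent first."""
    with open(filename, 'w') as f:
        # most_common() yields (letter, count) pairs sorted by count
        for letter, count in counts.most_common():
            f.write('{}\t{}\n'.format(letter, count))


# Stand-in for the contents of the real text files
sample_counts = count_letters(['To be, or not to be', 'that is the question'])
```

Calling `write_counts(sample_counts, 'count_1l.txt')` would then produce the file used on the next slide.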
---
# Reading letter probabilities
+New file: `language_models.py`
+
1. Load the file `count_1l.txt` into a dict, with letters as keys.
2. Normalise the counts so that the components of the vector sum to 1: `$$ \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|_1} = \frac{\mathbf{x}}{ x_1 + x_2 + x_3 + \dots }$$`
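
As a rough sketch of both steps, assuming `count_1l.txt` holds one tab-separated letter and count per line. The function names `load_counts` and `normalise` are hypothetical.

```python
def load_counts(lines):
    """Build a {letter: count} dict from 'letter<TAB>count' lines."""
    counts = {}
    for line in lines:
        letter, count = line.rstrip('\n').split('\t')
        counts[letter] = int(count)
    return counts


def normalise(counts):
    """Scale the values so that they sum to 1."""
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}


# Any iterable of lines works here, including an open file:
#     with open('count_1l.txt') as f:
#         probabilities = normalise(load_counts(f))
probabilities = normalise(load_counts(['e\t6\n', 't\t3\n', 'a\t1\n']))
```

Dividing by the plain sum works because counts are never negative, so the sum of the components is the same as the sum of their absolute values.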
# Breaking caesar ciphers
+New file: `cipherbreak.py`
+
## Remember the basic idea
```
---
+# Using the tools
+
+Before doing anything, create a new branch in Git
+
+* This will keep your changes isolated
+
+Experiment in IPython (ephemeral, for us)
+
+Once you've got something working, copy the code into a `.py` file (permanent and reusable)
+
+```python
+from importlib import reload
+
+import test
+reload(test)
+from test import *
+```
+
+Re-evaluate the second cell to reload the file into the IPython notebook
+
+When you've made progress, make a Git commit
+
+* Commit early and often!
+
+When you've finished, change back to the `master` branch and `merge` the development branch
+
+---
+
# The [string module](http://docs.python.org/3.3/library/string.html) is your friend
```python
```
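
For example, the constants in `string` save typing out the alphabet by hand. The sketch below uses one of them to keep only the letters of a message; `letters_only` is a hypothetical helper name.

```python
import string


def letters_only(text):
    """Lowercase the text and drop everything that is not a-z."""
    return ''.join(c for c in text.lower() if c in string.ascii_lowercase)


print(string.ascii_lowercase)
print(letters_only('Hello, World!'))
```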
---
+
# DRY and YAGNI
Is your code DRY?
---
-# Doing all the letters
+# Doing the whole message
## Test-first development
---
-# Doing all the letters
+# Doing the whole message
## Abysmal
ciphertext += caesar_encipher_letter(plaintext[i], key)
```
+Try it in IPython
+
---
-# Doing all the letters
+# Doing the whole message
## Bad
---
-# Doing all the letters
+# Doing the whole message
## Good (but unPythonic)
---
-# Doing all the letters
+# Doing the whole message
## Best
Repetition of code is a bad smell.
-Separate the 'try all keys, keep the best' logic from the 'score this one key' logic.
+Separate out
+
+* enumerate the keys
+* score a key
+* find the key with the best score
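
One way the three pieces might be kept apart, as a sketch only. All function names are hypothetical, and the letter probabilities are a deliberately tiny made-up table, not real English frequencies.

```python
import math
import string


def caesar_decipher(ciphertext, key):
    """Shift each lowercase letter back by `key`; leave other characters alone."""
    result = []
    for c in ciphertext:
        if c in string.ascii_lowercase:
            result.append(chr((ord(c) - ord('a') - key) % 26 + ord('a')))
        else:
            result.append(c)
    return ''.join(result)


def score(text, probabilities, floor=1e-6):
    """Sum of log probabilities of the letters (higher means more plausible)."""
    return sum(math.log(probabilities.get(c, floor))
               for c in text if c in string.ascii_lowercase)


def best_key(keys, score_key):
    """Return the key with the highest score, whatever the scoring function is."""
    return max(keys, key=score_key)


# Made-up toy model: 'e' is very likely, every other letter is rare.
toy_probabilities = {letter: 0.004 for letter in string.ascii_lowercase}
toy_probabilities['e'] = 0.9

# Enumerate the keys with range(26), score each trial decipherment, keep the best.
key = best_key(range(26),
               lambda k: score(caesar_decipher('hhhh', k), toy_probabilities))
```

Because `best_key` knows nothing about Caesar ciphers, the same function could be reused with a different set of keys or a different scoring function.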
---
```python
class Pdist(dict):
def __init__(self, data=[]):
- for key, count in data2:
+ for key, count in data:
...
self.total = ...
def __missing__(self, key):
return the split with highest score
```
-Indexing pulls out letters. `'sometext'[0]` = 's' ; `'keyword'[3]` = 'e' ; `'keyword'[-1]` = 't'
+Indexing pulls out letters. `'sometext'[0]` = 's' ; `'sometext'[3]` = 'e' ; `'sometext'[-1]` = 't'
-Slices pulls out substrings. `'keyword'[1:4]` = 'ome' ; `'keyword'[:3]` = 'som' ; `'keyword'[5:]` = 'ext'
+Slices pull out substrings. `'sometext'[1:4]` = 'ome' ; `'sometext'[:3]` = 'som' ; `'sometext'[5:]` = 'ext'
`range()` will sweep across the string
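
For instance, sweeping an index across the text with `range()` and slicing at each position lists every way of cutting it into a first part and a remainder. This is a sketch; `splits` is a hypothetical name.

```python
def splits(text):
    """Return every (head, tail) pair with a non-empty head."""
    return [(text[:i], text[i:]) for i in range(1, len(text) + 1)]


print(splits('some'))
```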