color: #ff6666;
text-shadow: 0 0 20px #333;
padding: 2px 5px;
+ }
+ .indexlink {
+ position: absolute;
+ bottom: 1em;
+ left: 1em;
}
.float-right {
float: right;
---
+layout: true
+
+.indexlink[[Index](index.html)]
+
+---
+
# Human vs Machine
Slow but clever vs Dumb but fast
How do we define "closeness"?
+## Here beginneth the yak shaving
+
---
# What does English look like?
---
-# An infinite number of monkeys
+.float-right[![right-aligned Typing monkey](typingmonkeylarge.jpg)]
-What is the probability that this string of letters is a sample of English?
+# Naive Bayes, or the bag of letters
-## Naive Bayes, or the bag of letters
+What is the probability that this string of letters is a sample of English?
Ignore letter order, just treat each letter individually.
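
A minimal sketch of the bag-of-letters score, assuming letter probabilities have already been estimated. The name `log_probability` and the toy `letter_probs` table are hypothetical and the numbers are invented for illustration. Logs are summed instead of multiplying probabilities, which avoids underflow on long strings.

```python
import math

# Toy letter probabilities, invented for illustration only (not real English data).
letter_probs = {'e': 0.5, 't': 0.3, 'z': 0.2}

def log_probability(text, probs):
    """Sum the log probability of each known letter, ignoring letter order."""
    return sum(math.log10(probs[letter])
               for letter in text.lower()
               if letter in probs)

# Strings made of likely letters score higher (closer to zero) than
# strings made of unlikely ones.
likely = log_probability('tee', letter_probs)
unlikely = log_probability('zzz', letter_probs)
```

Because each letter is scored on its own, any rearrangement of the same letters gets the same score.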
* Count them
```python
import collections
+collections.Counter()
```
Create the `language_models.py` file for this.
1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
2. Find the frequencies (`.update()`)
-3. Sort by count
-4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
+3. Sort by count (read the docs...)
+4. Write counts to `count_1l.txt`
+```python
+with open('count_1l.txt', 'w') as f:
+    for each letter...:
+        f.write('text\t{}\n'.format(count))
+```
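
One possible shape for the steps above, as a sketch and not the official solution. The helper names `count_letters` and `format_counts` are hypothetical, and keeping only the letters a to z (case-folded) is an assumption about what should be counted.

```python
import collections
import string

def count_letters(lines):
    """Count each letter a-z (case-folded) in an iterable of strings.

    An open file works as `lines`, since iterating a file yields its lines.
    """
    counts = collections.Counter()
    for line in lines:
        counts.update(c for c in line.lower() if c in string.ascii_lowercase)
    return counts

def format_counts(counts):
    """Return 'letter<TAB>count' lines, most frequent letter first."""
    return ['{}\t{}\n'.format(letter, count)
            for letter, count in counts.most_common()]
```

To use it on the corpus, open each of the three text files in turn, merge the resulting counters with `.update()`, and pass the formatted lines to `f.writelines()` on `count_1l.txt`.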
---
# Breaking Caesar ciphers
+New file: `cipherbreak.py`
+
## Remember the basic idea
```
---
-# How much ciphertext do we need?
+# Homework: how much ciphertext do we need?
## Let's do an experiment to find out
4. Score 1 point if `caesar_cipher_break()` recovers the correct key
5. Repeat many times and with many plaintext lengths
+```python
+import csv
+
+def show_results():
+    with open('caesar_break_parameter_trials.csv', 'w') as f:
+        writer = csv.DictWriter(f, ['name'] + message_lengths,
+                                quoting=csv.QUOTE_NONNUMERIC)
+        writer.writeheader()
+        for scoring in sorted(scores.keys()):
+            scores[scoring]['name'] = scoring
+            writer.writerow(scores[scoring])
+```
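
The experiment loop itself might look something like the sketch below. Everything in it is a stand-in. `caesar_break_sketch` is a hypothetical substitute for `caesar_cipher_break()`, and the letter model is built from a short sample passage, not from the real `count_1l.txt`. `run_trials` and its parameters are likewise assumed names.

```python
import collections
import math
import random
import string

# Short sample passage used as both the plaintext source and the letter model.
SAMPLE = ("it was the best of times it was the worst of times it was the age of "
          "wisdom it was the age of foolishness it was the epoch of belief it was "
          "the epoch of incredulity it was the season of light it was the season "
          "of darkness")
LETTERS = [c for c in SAMPLE if c in string.ascii_lowercase]

# Stand-in letter model: log probabilities with add-one smoothing.
_counts = collections.Counter(LETTERS)
_total = sum(_counts.values()) + 26
LOG_PROBS = {c: math.log10((_counts[c] + 1) / _total)
             for c in string.ascii_lowercase}

def caesar_encipher(text, key):
    """Shift each lowercase letter forward by `key` places (negative deciphers)."""
    return ''.join(chr((ord(c) - ord('a') + key) % 26 + ord('a')) for c in text)

def caesar_break_sketch(ciphertext):
    """Try all 26 keys; return the key whose decipherment scores highest."""
    def score(key):
        return sum(LOG_PROBS[c] for c in caesar_encipher(ciphertext, -key))
    return max(range(26), key=score)

def run_trials(message_lengths, trials, rng):
    """Return {length: number of trials in which the correct key was recovered}."""
    scores = {}
    for length in message_lengths:
        scores[length] = 0
        for _ in range(trials):
            # Pick a random slice of the sample and a random key.
            start = rng.randrange(len(LETTERS) - length)
            plaintext = ''.join(LETTERS[start:start + length])
            key = rng.randrange(26)
            if caesar_break_sketch(caesar_encipher(plaintext, key)) == key:
                scores[length] += 1
    return scores
```

Calling `run_trials([5, 20, 100], 100, random.Random())` gives one success count per message length, which is the kind of row `show_results()` writes out.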
</textarea>
<script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">