Copied updated slides across
[cipher-training.git] / slides / caesar-break.html
diff --git a/slides/caesar-break.html b/slides/caesar-break.html
index 4d2ebfa0d01d556dbdd37599f3e4320f92fc4031..7a2fbf6d550cbd8e8dc90bcea3b694c0e4dcd293 100644
--- a/slides/caesar-break.html
+++ b/slides/caesar-break.html
@@ -128,11 +128,11 @@ Use this to predict the probability of each letter, and hence the probability of
 ---
-# An infinite number of monkeys
+.float-right[![right-aligned Typing monkey](typingmonkeylarge.jpg)]
-What is the probability that this string of letters is a sample of English?
+# Naive Bayes, or the bag of letters
-## Naive Bayes, or the bag of letters
+What is the probability that this string of letters is a sample of English?
 
 Ignore letter order, just treat each letter individually.
@@ -234,13 +234,20 @@ def unaccent(text):
 1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
 2. Find the frequencies (`.update()`)
-3. Sort by count
-4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
+3. Sort by count (read the docs...)
+4. Write counts to `count_1l.txt`
+```python
+with open('count_1l.txt', 'w') as f:
+    for each letter...:
+        f.write('text\t{}\n'.format(count))
+```
 ---
 # Reading letter probabilities
+New file: `language_models.py`
+
 1. Load the file `count_1l.txt` into a dict, with letters as keys.
 2. Normalise the counts (components of vector sum to 1): `$$ \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }$$`
@@ -257,6 +264,8 @@ def unaccent(text):
 # Breaking caesar ciphers
+New file: `cipherbreak.py`
+
 ## Remember the basic idea
 ```
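The count-and-write exercise in the second hunk (read the texts, find frequencies with `.update()`, sort by count, write `count_1l.txt`) could be completed along these lines. The function name and structure here are illustrative assumptions, not the repository's actual code; the file names follow the slides:

```python
from collections import Counter

def write_letter_counts(filenames, out_name='count_1l.txt'):
    # Accumulate letter frequencies across all the source texts.
    counts = Counter()
    for name in filenames:
        with open(name) as f:
            counts.update(c.lower() for c in f.read() if c.isalpha())
    # most_common() gives (letter, count) pairs sorted by descending count,
    # which is the "sort by count" step from the slide.
    with open(out_name, 'w') as f:
        for letter, count in counts.most_common():
            f.write('{}\t{}\n'.format(letter, count))
```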
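The "Reading letter probabilities" steps (load `count_1l.txt` into a dict keyed by letter, then normalise so the components sum to 1) might look like this sketch; the function name is an assumption, not necessarily what `language_models.py` defines:

```python
def read_letter_probabilities(filename='count_1l.txt'):
    # Load tab-separated "letter<TAB>count" lines into a dict.
    counts = {}
    with open(filename) as f:
        for line in f:
            if not line.strip():
                continue
            letter, count = line.split('\t')
            counts[letter] = int(count)
    # Normalise: divide each count by the sum of all counts, i.e. the
    # slide's x / (x_1 + x_2 + x_3 + ...), so the values sum to 1.
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}
```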
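The "bag of letters" idea from the first hunk combined with the Caesar-breaking section suggests the basic approach: score a string as the product of individual letter probabilities (in practice, the sum of their logs), then try all 26 shifts and keep the best-scoring one. The function names below are illustrative assumptions, not the repository's `cipherbreak.py`:

```python
import math

def letter_log_prob(text, probs):
    # Bag-of-letters score: ignore order, sum log-probabilities of each letter.
    return sum(math.log(probs[c]) for c in text.lower() if c in probs)

def caesar_decipher(text, shift):
    # Undo a Caesar shift on a-z, leaving other characters alone.
    result = []
    for c in text.lower():
        if c.isalpha():
            result.append(chr((ord(c) - ord('a') - shift) % 26 + ord('a')))
        else:
            result.append(c)
    return ''.join(result)

def caesar_break(ciphertext, probs):
    # Try every shift; the true plaintext should score most like English.
    return max(range(26),
               key=lambda s: letter_log_prob(caesar_decipher(ciphertext, s), probs))
```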