Bits of tinkering

[cipher-training.git] / slides / caesar-break.html
diff --git a/slides/caesar-break.html b/slides/caesar-break.html

index 5ea77b9ffb208baa62eb1c4437cad26a6ef19b55..090c43f9147340b37e2f6e11ef213c93a1320a39 100644
--- a/slides/caesar-break.html
+++ b/slides/caesar-break.html
@@ -36,6 +36,11 @@
          color: #ff6666;
          text-shadow: 0 0 20px #333;
          padding: 2px 5px;
+      }
+      .indexlink {
+        position: absolute;
+        bottom: 1em;
+        left: 1em;
        }
         .float-right {
          float: right;
@@ -51,6 +56,12 @@
  
  ---
  
+layout: true
+
+.indexlink[[Index](index.html)]
+
+---
+
  # Human vs Machine
  
  Slow but clever vs Dumb but fast
@@ -93,6 +104,8 @@ What does English look like?
  
  How do we define "closeness"?
  
+## Here beginneth the yak shaving
+
  ---
  
  # What does English look like?
@@ -115,11 +128,11 @@ Use this to predict the probability of each letter, and hence the probability of
  
  ---
  
-# An infinite number of monkeys
+.float-right[![right-aligned Typing monkey](typingmonkeylarge.jpg)]
  
-What is the probability that this string of letters is a sample of English?
+# Naive Bayes, or the bag of letters
  
-## Naive Bayes, or the bag of letters
+What is the probability that this string of letters is a sample of English?
  
  Ignore letter order, just treat each letter individually.
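
The bag-of-letters idea can be sketched as follows. This is a minimal illustration, not the deck's own code: the names `letter_probabilities` and `log_probability`, the add-one smoothing, and the use of log probabilities (to avoid multiplying many tiny numbers) are all assumptions made for the example.

```python
import collections
import math
import string

def letter_probabilities(sample_text):
    """Estimate P(letter) from a sample of text (hypothetical helper).
    Add-one smoothing keeps every probability above zero."""
    counts = collections.Counter(
        c for c in sample_text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values()) + len(string.ascii_lowercase)
    return {letter: (counts[letter] + 1) / total
            for letter in string.ascii_lowercase}

def log_probability(text, probabilities):
    """Sum log P(letter) over the letters of text.
    Letter order plays no part, which is the 'bag of letters' assumption."""
    return sum(math.log10(probabilities[c])
               for c in text.lower() if c in probabilities)

# A tiny stand-in corpus; a real model would use far more text.
model = letter_probabilities(
    "the quick brown fox jumps over the lazy dog and then the other dog")
print(log_probability("the dog", model) > log_probability("zqx jvk", model))
```

A string made of common letters scores higher (less negative) than one of the same length made of rare letters, which is the property a cipher breaker can exploit.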
  
@@ -154,6 +167,7 @@ open()
  * Count them
  ```python
  import collections
+collections.Counter()
  ```
  
  Create the `language_models.py` file for this.
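
As a rough sketch of the counting step, `collections.Counter` can be fed letters one line at a time with `.update()`. The two sample lines here are made up for illustration.

```python
import collections

counts = collections.Counter()
for line in ["Hello world", "hello again"]:
    # Keep letters only, folded to lower case, and add them to the tally.
    counts.update(c for c in line.lower() if c.isalpha())

print(counts['l'])            # 5
print(counts.most_common(2))  # the two most frequent letters with their counts
```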
@@ -193,6 +207,8 @@ Text encodings will bite you when you least expect it.
  # Five minutes on StackOverflow later...
  
  ```python
+import unicodedata
+
  def unaccent(text):
      """Remove all accents from letters. 
      It does this by converting the unicode string to decomposed compatibility
@@ -220,8 +236,13 @@ def unaccent(text):
  
  1. Read from `shakespeare.txt`, `sherlock-holmes.txt`, and `war-and-peace.txt`.
  2. Find the frequencies (`.update()`)
-3. Sort by count 
-4. Write counts to `count_1l.txt` (`'text{}\n'.format()`)
+3. Sort by count (read the docs...)
+4. Write counts to `count_1l.txt` 
+```python
+with open('count_1l.txt', 'w') as f:
+    for each letter...:
+        f.write('text\t{}\n'.format(count))
+```
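
One possible way to fill in the four steps above, offered as a sketch rather than the intended solution. The function names `count_letters` and `write_counts` are hypothetical, the files are assumed to be UTF-8, and accented letters are simply skipped here instead of being passed through `unaccent()`.

```python
import collections
import string

def count_letters(filenames):
    """Tally the letters a-z across several text files (hypothetical helper)."""
    counts = collections.Counter()
    for name in filenames:
        with open(name, encoding='utf-8') as f:  # encoding is an assumption
            for line in f:
                counts.update(c for c in line.lower()
                              if c in string.ascii_lowercase)
    return counts

def write_counts(counts, target):
    """Write one 'letter<TAB>count' line per letter, most frequent first."""
    with open(target, 'w') as f:
        # most_common() with no argument returns every item sorted by count.
        for letter, count in counts.most_common():
            f.write('{}\t{}\n'.format(letter, count))
```

With the three corpus files in place, this would be called as `write_counts(count_letters(['shakespeare.txt', 'sherlock-holmes.txt', 'war-and-peace.txt']), 'count_1l.txt')`.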
  
  ---
  
@@ -243,6 +264,8 @@ def unaccent(text):
  
  # Breaking caesar ciphers
  
+New file: `cipherbreak.py`
+
  ## Remember the basic idea
  
  ```
@@ -279,7 +302,7 @@ Use `logger.debug()`, `logger.info()`, etc. to log a message.
  
  ---
  
-# How much ciphertext do we need?
+# Homework: how much ciphertext do we need?
  
  ## Let's do an experiment to find out
  
@@ -289,6 +312,18 @@ Use `logger.debug()`, `logger.info()`, etc. to log a message.
  4. Score 1 point if `caesar_cipher_break()` recovers the correct key
  5. Repeat many times and with many plaintext lengths
  
+```python
+import csv
+
+def show_results():
+    with open('caesar_break_parameter_trials.csv', 'w', newline='') as f:
+        writer = csv.DictWriter(f, ['name'] + message_lengths, 
+            quoting=csv.QUOTE_NONNUMERIC)
+        writer.writeheader()
+        for scoring in sorted(scores.keys()):
+            scores[scoring]['name'] = scoring
+            writer.writerow(scores[scoring])
+```
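
The trial loop itself might look something like the sketch below. It is an illustration under stated assumptions: `caesar_encipher` is a stand-in written for this example, `run_trials` is a hypothetical name, and `break_function` is assumed to take a ciphertext and return only the guessed key, which may not match the real signature of `caesar_cipher_break()`.

```python
import random
import string

def caesar_encipher(text, key):
    """Stand-in Caesar cipher: shift each lower-case letter by key places."""
    shifted = string.ascii_lowercase[key:] + string.ascii_lowercase[:key]
    table = str.maketrans(string.ascii_lowercase, shifted)
    return text.lower().translate(table)

def run_trials(plaintext, break_function, message_lengths, trials=100):
    """For each message length, count how often break_function finds the key."""
    scores = {}
    for length in message_lengths:
        successes = 0
        for _ in range(trials):
            # Take a random chunk of plaintext and encipher it with a random key.
            start = random.randrange(len(plaintext) - length + 1)
            sample = plaintext[start:start + length]
            key = random.randrange(26)
            # Score 1 point if the breaker recovers the key that was used.
            if break_function(caesar_encipher(sample, key)) == key:
                successes += 1
        scores[length] = successes
    return scores
```

Each call returns one dictionary keyed by message length, which has the same shape as a single row passed to `writer.writerow()` in `show_results()` once a `'name'` entry is added.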
  
      </textarea>
      <script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">