Merge conflict resolved

author Neil Smith <neil.git@njae.me.uk>
Sun, 27 Mar 2016 12:35:08 +0000 (13:35 +0100)
committer Neil Smith <neil.git@njae.me.uk>
Sun, 27 Mar 2016 12:35:08 +0000 (13:35 +0100)
diff --git a/count_1l.txt b/count_1l.txt
new file mode 100644
index 0000000..e9ac0c6
--- /dev/null
+++ b/count_1l.txt
@@ -0,0 +1,26 @@
+e      758103
+t      560576
+o      504520
+a      490129
+i      421240
+n      419374
+h      416369
+s      404473
+r      373599
+d      267917
+l      259023
+u      190269
+m      172199
+w      154157
+y      143040
+c      141094
+f      135318
+g      117888
+p      100690
+b      92919
+v      65297
+k      54248
+x      7414
+j      6679
+q      5499
+z      3577
diff --git a/slides/caesar-break.html b/slides/caesar-break.html
index 7a2fbf6d550cbd8e8dc90bcea3b694c0e4dcd293..81a8396f6d7d8d67b841d5925e17c2a545e4405f 100644
--- a/slides/caesar-break.html
+++ b/slides/caesar-break.html
@@ -112,6 +112,8 @@ How do we define "closeness"?
  
  ## Abstraction: frequency of letter counts
  
+.float-right[![right-aligned Letter frequencies](letter-frequency-treemap.png)]
+
  Letter | Count
  -------|------
  a | 489107
@@ -146,7 +148,7 @@ Letter      | i       | f       | m       | m       | p       | ifmmp
  ------------|---------|---------|---------|---------|---------|-------
  Probability | 0.06723 | 0.02159 | 0.02748 | 0.02748 | 0.01607 | 1.76244520 × 10<sup>-8</sup>
  
-(Implementation issue: this can often underflow, so get in the habit of rephrasing it as `\( \sum_i \log p_i \)`)
+(Implementation issue: this can often underflow, so we rephrase it as `\( \sum_i \log p_i \)`)
  
  Letter      | h       | e       | l       | l       | o       | hello
  ------------|---------|---------|---------|---------|---------|-------
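The slide's note on underflow is easy to see directly in Python. This is a minimal sketch, not code from the slides: the probabilities for i, f, m and p are the ones in the `ifmmp` table, and the names `product_prob` and `log_prob` are made up for illustration. Repeating `ifmmp` 100 times pushes the plain product below the smallest representable float, while the sum of logs stays finite.

```python
import math

# Letter probabilities taken from the slide's "ifmmp" table.
probs = {'i': 0.06723, 'f': 0.02159, 'm': 0.02748, 'p': 0.01607}

def product_prob(text):
    """Naive score: multiply the letter probabilities together."""
    result = 1.0
    for letter in text:
        result *= probs[letter]
    return result

def log_prob(text):
    """Sum of log10 probabilities: the rephrasing suggested on the slide."""
    return sum(math.log10(probs[letter]) for letter in text)

short_text = 'ifmmp'
long_text = short_text * 100   # 500 letters

print(product_prob(short_text))  # about 1.76e-08
print(product_prob(long_text))   # 0.0: the product has underflowed
print(log_prob(long_text))       # about -775.4, still finite
```

Comparing candidate decipherments by `log_prob` gives the same ranking as comparing them by the product, because the logarithm is monotonic.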
@@ -207,6 +209,8 @@ Text encodings will bite you when you least expect it.
  # Five minutes on StackOverflow later...
  
  ```python
+import unicodedata
+
  def unaccent(text):
      """Remove all accents from letters. 
      It does this by converting the unicode string to decomposed compatibility
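The body of `unaccent` is cut off in this hunk. Its docstring points to Unicode's decomposed compatibility form (NFKD). A minimal sketch of one way to complete it, which may differ from the slides' own version: decompose the text so each accented letter becomes a base letter followed by combining marks, then drop the combining marks.

```python
import unicodedata

def unaccent(text):
    """Remove all accents from letters.
    Sketch: NFKD splits an accented letter into its base letter plus
    combining marks; the combining marks are then filtered out.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

print(unaccent('naïve café, Ångström'))  # naive cafe, Angstrom
```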
@@ -246,8 +250,6 @@ with open('count_1l.txt', 'w') as f:
  
  # Reading letter probabilities
  
-New file: `language_models.py`
-
  1. Load the file `count_1l.txt` into a dict, with letters as keys.
  
  2. Normalise the counts (so the components of the vector sum to 1): `$$ \hat{\mathbf{x}} = \frac{\mathbf{x}}{\| \mathbf{x} \|_1} = \frac{\mathbf{x}}{ \mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3 + \dots }$$`
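The two steps for reading letter probabilities can be sketched as follows. This is a minimal sketch. It assumes each line of `count_1l.txt` holds a letter and a count separated by whitespace, as in the file added in this commit. The function names `load_counts` and `normalise` are illustrative and not taken from the slides. Because every count is non-negative, the denominator in step 2 is simply the total of all the counts.

```python
def load_counts(lines):
    """Build a dict of letter -> count from 'letter count' lines."""
    counts = {}
    for line in lines:
        if line.strip():                 # skip any blank lines
            letter, count = line.split()
            counts[letter] = int(count)
    return counts

def normalise(counts):
    """Divide every count by the total, so the values sum to 1."""
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}

# With the real file (assumed to be in the working directory):
#     with open('count_1l.txt') as f:
#         english_probs = normalise(load_counts(f))
# Stand-in data: the first three rows of count_1l.txt.
sample_lines = ['e      758103', 't      560576', 'o      504520']
sample_probs = normalise(load_counts(sample_lines))
print(sample_probs)
```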
diff --git a/slides/caesar-encipher.html b/slides/caesar-encipher.html
index 4afd78dad4f3367c5107a99d0f40ff8f1e0aa037..279c2bdf8aa96de2b978f9e012381fcb57705a71 100644
--- a/slides/caesar-encipher.html
+++ b/slides/caesar-encipher.html
@@ -90,7 +90,7 @@ Before doing anything, create a new branch in Git
  
  Experiment in IPython (ephemeral, for us)
  
-Once you've got something working, copy the code into a `.py` file (permanent and reusable)
+Once you've got something working, export the code into a `.py` file (permanent and reusable)
  
  ```python
  from importlib import reload
@@ -224,6 +224,15 @@ ciphertext = [caesar_encipher_letter(p, key) for p in plaintext]
  ''.join(ciphertext)
  ```
  
+You'll be doing this a lot, so define a couple of utility functions:
+
+```python
+cat = ''.join
+wcat = ' '.join
+```
+
+`cat` is named after the Unix command (which _concatenates_ files); `wcat` stands for _word concatenate_.
+
      </textarea>
      <script src="http://gnab.github.io/remark/downloads/remark-0.6.0.min.js" type="text/javascript">
      </script>
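Here is a quick check of how `cat` and `wcat` behave. `caesar_encipher_letter` is defined earlier in the slides and isn't part of this diff, so the version below is a hypothetical stand-in that shifts only lowercase letters and leaves everything else unchanged.

```python
import string

# Hypothetical stand-in for the slides' caesar_encipher_letter:
# shift lowercase letters by `key` places, wrapping z round to a.
def caesar_encipher_letter(letter, key):
    if letter in string.ascii_lowercase:
        return chr((ord(letter) - ord('a') + key) % 26 + ord('a'))
    return letter

cat = ''.join
wcat = ' '.join

plaintext = 'hello'
ciphertext = cat(caesar_encipher_letter(p, 1) for p in plaintext)
print(ciphertext)                # ifmmp
print(wcat(['hello', 'there']))  # hello there
```

Because `cat` and `wcat` are just bound `str.join` methods, they accept any iterable of strings, including a generator expression like the one above.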
diff --git a/slides/letter-frequency-treemap.png b/slides/letter-frequency-treemap.png
new file mode 100644
index 0000000..256230e
Binary files /dev/null and b/slides/letter-frequency-treemap.png differ
count_1l.txt	[new file with mode: 0644]
slides/caesar-break.html
slides/caesar-encipher.html
slides/letter-frequency-treemap.png	[new file with mode: 0644]