Removing files from data analysis directory

[ou-summer-of-code-2017.git] / 03-door-codes / door-codes-solution.ipynb
diff --git a/03-door-codes/door-codes-solution.ipynb b/03-door-codes/door-codes-solution.ipynb

index 3f39c4964958930a43b30d6ab6100c33a24e77b6..f0ea3499828b6e4106fbb1d54ac20a90184adaac 100644 (file)
--- a/03-door-codes/door-codes-solution.ipynb
+++ b/03-door-codes/door-codes-solution.ipynb
@@ -21,18 +21,18 @@
      "2. \"Add\" the current letter of the pass phrase to the second letter of the code, replacing the second letter of the code\n",
      "3. Move on to the next letter of the pass phrase\n",
      "\n",
-    "\"Adding\" letters is done by converting the letters to their position in the alphabet (starting at one), adding, then converting the number back to a letter. Numbers greater than 26 are \"wrapped around\". For instance, to add `u` + `k`, convert them to numbers (`u`=`21`, `k`=`11`), add them (`21` + `11` = `32`), then convert back to a letter (`32` is larger than 26, so it becomes `32` - 26 = `6`, which is `f`).\n",
+    "\"Adding\" letters is done by converting the letters to their position in the alphabet (starting at one), adding, then converting the number back to a letter. Numbers greater than 26 are \"wrapped around\". For instance, to add `t` + `h`, convert them to numbers (`t`=`20`, `h`=`8`), add them (`20` + `8` = `28`), then convert back to a letter (`28` is larger than 26, so it becomes `28` - `26` = `2`, which is `b`).\n",
      "\n",
-    "Anything that isn't a lower-case letter is ignored.\n",
+    "All letters are converted to lower-case, and anything that isn't a letter is ignored.\n",
      "\n",
      "For example, to find the code from the pass phrase `the cat`, the code starts as being the first two letters `th`, then the subsequent letters are used to update the code as follows:\n",
      "\n",
      "| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |new second<br/>part of code | new code |\n",
      "|:---|:---|:---|:---|:---|:---|:---|\n",
-    "| th | (20, 8) | e | 5 | 28 | 13 | bm |\n",
-    "| bm | (2, 13) | c | 3 | 15 | 16 | op |\n",
-    "| op | (15, 16) | a | 1 | 31 | 17 | eq |\n",
-    "| eq | (5, 17) | t | 20 | 22 | 37 | vk |\n",
+    "| `th` | (20, 8) | `e` | 5 | 28 | 13 | `bm` |\n",
+    "| `bm` | (2, 13) | `c` | 3 | 15 | 16 | `op` |\n",
+    "| `op` | (15, 16) | `a` | 1 | 31 | 17 | `eq` |\n",
+    "| `eq` | (5, 17) | `t` | 20 | 22 | 37 | `vk` |\n",
      "\n",
      "So the final code is `vk`\n",
      "\n",
@@ -43,9 +43,29 @@
      "**What is your door code?**"
     ]
    },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Worked example of solution: Part 1\n",
+    "\n",
+    "While the overall shape of this is the same as previous days (walk along a list, updating the code as you reach each letter), there are a couple of wrinkles:\n",
+    "\n",
+    "1. Not every character in the input should be processed (and the others should be converted to lower-case letters).\n",
+    "2. The 'update the code' part is complex.\n",
+    "\n",
+    "\"Sanitising\" the input is, again, walking over the input, convering letters and discarding the rest. These are examples of standard approaches: `filter` is applying a predicate to every item in a sequence, returning just hose that pass; `map` is applying a function to every item in a sequence, returning the sequence of results. In this case, sanitising the input is `filter`ing to keep just the letters then `map`ping over the \"convert to lowercase\" function. Python's comprehensions do this: the general form is `f(x) for x in sequence if predicate(x)`\n",
+    "\n",
+    "Updating the code involves lots of faffing around, converting between characters and numbers. Rather than retyping lots of arithmetic, I define a couple of functions to do the conversions how I want. I've deliberately given them short names, as I want the functions to almost disappear in the program, becoming little more than punctuation. That will keep the focus on the important part, the updating.\n",
+    "\n",
+    "The `ord(letter) - ord('a')` and `chr(number + ord('a')` are standard idioms for converting from letters to positions in the alphabet. There's also moving the result by 1 to give one-based numbering, and the modulus operation `%` to keep the numbers in the range 0-25 before converting back to letters.\n",
+    "\n",
+    "Finally, the `string` library defines some convenient constants, which helps prevent annoying and hard-to-find typos if I wrote out the alphabet verbatim here."
+   ]
+  },
    {
     "cell_type": "code",
-   "execution_count": 98,
+   "execution_count": 2,
     "metadata": {
      "collapsed": true
     },
@@ -56,7 +76,39 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 99,
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def sanitise(phrase):\n",
+    "    return ''.join(l.lower() for l in phrase if l in string.ascii_letters)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'helloworld'"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sanitise('Hello World')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
     "metadata": {
      "collapsed": true
     },
@@ -71,22 +123,49 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 100,
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "('a', 'z', 'z', 'a')"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "c(1), c(0), c(26), c(27)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
     "metadata": {
      "collapsed": true
     },
     "outputs": [],
     "source": [
-    "def sanitise(phrase):\n",
-    "    return ''.join(l for l in phrase if l in string.ascii_lowercase)"
+    "def whash1(word):\n",
+    "    h = list(word[:2])\n",
+    "    for l in word[2:]:\n",
+    "        h[0] = c(o(h[0]) + o(h[1]))\n",
+    "        h[1] = c(o(h[1]) + o(l))\n",
+    "    return ''.join(h)"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 101,
-   "metadata": {},
+   "execution_count": 20,
+   "metadata": {
+    "collapsed": true
+   },
     "outputs": [],
     "source": [
+    "# Extended version that generates the tables used in the question text.\n",
      "def whash1(word, show_steps=False):\n",
      "    if show_steps:\n",
      "        print('| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |'\n",
@@ -99,7 +178,7 @@
      "        h[0] = c(o(h[0]) + o(h[1]))\n",
      "        h[1] = c(o(h[1]) + o(l))\n",
      "        if show_steps:\n",
-    "            print('| {} | {} | {} | {} | {} | {} | {} |'.format(\n",
+    "            print('| `{}` | {} | `{}` | {} | {} | {} | `{}` |'.format(\n",
      "                  ''.join(h_before), \n",
      "                  (o(h_before[0]), o(h_before[1])),\n",
      "                  l, \n",
@@ -112,7 +191,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 102,
+   "execution_count": 9,
     "metadata": {},
     "outputs": [
      {
@@ -121,10 +200,10 @@
       "text": [
        "| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |new second<br/>part of code | new code |\n",
        "|:---|:---|:---|:---|:---|:---|:---|\n",
-      "| th | (20, 8) | e | 5 | 28 | 13 | bm |\n",
-      "| bm | (2, 13) | c | 3 | 15 | 16 | op |\n",
-      "| op | (15, 16) | a | 1 | 31 | 17 | eq |\n",
-      "| eq | (5, 17) | t | 20 | 22 | 37 | vk |\n"
+      "| `th` | (20, 8) | `e` | 5 | 28 | 13 | `bm` |\n",
+      "| `bm` | (2, 13) | `c` | 3 | 15 | 16 | `op` |\n",
+      "| `op` | (15, 16) | `a` | 1 | 31 | 17 | `eq` |\n",
+      "| `eq` | (5, 17) | `t` | 20 | 22 | 37 | `vk` |\n"
       ]
      },
      {
@@ -133,7 +212,7 @@
         "'vk'"
        ]
       },
-     "execution_count": 102,
+     "execution_count": 9,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -144,7 +223,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 103,
+   "execution_count": 10,
     "metadata": {
      "collapsed": true
     },
@@ -156,7 +235,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 104,
+   "execution_count": 24,
     "metadata": {},
     "outputs": [
      {
@@ -165,7 +244,7 @@
         "'mc'"
        ]
       },
-     "execution_count": 104,
+     "execution_count": 24,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -184,29 +263,38 @@
      "\n",
      "Indeed, the algorithm has been changed, \"to improve security,\" they say.\n",
      "\n",
-    "The new algorithm starts with the same initial value of the code for every passphrase. In in this case, the initial code is `ri`. The algorithm also uses two contants, `m1`= 5 and `m2` = 11.\n",
+    "The new algorithm starts with the same initial value of the code for every passphrase. In in this case, the initial code is `ri`. The algorithm also uses two contants, `alpha`= 5 and `beta` = 11.\n",
      "\n",
      "Then, for each letter in the pass phrase, the code is updated as follows:\n",
      "\n",
-    "1. \"Add\" the second letter of the code to the first letter of the code \"multiplied by\" `m1`, replacing the first letter of the code\n",
-    "2. \"Add\" the current letter of the pass phrase to the second letter of the code \"multiplied by\" `m2`, replacing the second letter of the code\n",
+    "1. \"Multiply\" the second letter of the code by `alpha` then add the first letter of the code, replacing the first code letter\n",
+    "2. \"Multiply\" the current letter of the pass phrase by `beta` and add the second letter of the code, replacing the second letter of the code\n",
      "\n",
-    "\"Multiplying\" letters is done by converting the letters to their position in the alphabet (starting at one) and multiplying. For instance, to multiply `u` by 11, convert `u` to `21`, multiply by 11 (`21` × `11` = `231`), then convert back to a letter (`231` is larger than 26, so it becomes `22`, which is `w`).\n",
+    "\"Multiplying\" letters is done by converting the letters to their position in the alphabet (starting at one) and multiplying. For instance, to multiply `u` by 11, convert `u` to `21`, multiply by 11 (`21` × `11` = `231`), then convert back to a letter (`231` is larger than 26, so it becomes `23`, which is `w`).\n",
      "\n",
-    "Again, anything that isn't a lower-case letter is ignored.\n",
+    "Again, all letters are converted to lower-case and anything that isn't a letter is ignored.\n",
      "\n",
-    "For example, to find the code from the pass phrase `the cat`, the code starts as being the first two letters `th`, then the subsequent letters are used to update the code as follows:\n",
+    "For example, to find the code from the pass phrase `the cat`, the code starts as being the first two letters `ri`. When the first letter is encrypted, the first letter of the code becomes:\n",
+    "\n",
+    "`r` + `i` × `alpha` = `18` + `9` × `5` = `63` = `k`\n",
+    "\n",
+    "and the second letter of the code becomes:\n",
+    "\n",
+    "`i` + `t` × `beta` = `9` + `20` × `11` = `229` = `u`\n",
+    "\n",
+    "The passphrase letters are used to update the code as follows:\n",
      "\n",
      "| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |new second<br/>part of code | new code |\n",
      "|:---|:---|:---|:---|:---|:---|:---|\n",
-    "| ri | (18, 9) | t | 20 | 63 | 229 | ku |\n",
-    "| ku | (11, 21) | h | 8 | 116 | 109 | le |\n",
-    "| le | (12, 5) | e | 5 | 37 | 60 | kh |\n",
-    "| kh | (11, 8) | c | 3 | 51 | 41 | yo |\n",
-    "| yo | (25, 15) | a | 1 | 100 | 26 | vz |\n",
-    "| vz | (22, 26) | t | 20 | 152 | 246 | vl |\n",
+    "| `ri` | (18, 9) | `t` | 20 | 63 | 229 | `ku` |\n",
+    "| `ku` | (11, 21) | `h` | 8 | 116 | 109 | `le` |\n",
+    "| `le` | (12, 5) | `e` | 5 | 37 | 60 | `kh` |\n",
+    "| `kh` | (11, 8) | `c` | 3 | 51 | 41 | `yo` |\n",
+    "| `yo` | (25, 15) | `a` | 1 | 100 | 26 | `vz` |\n",
+    "| `vz` | (22, 26) | `t` | 20 | 152 | 246 | `vl` |\n",
      "\n",
-    "So the final code is `pl`\n",
+    "\n",
+    "So the final code is `vl`\n",
      "\n",
      "## Part 2\n",
      "Your passphrase remains \"the traveller in the grey riding-coat, who called himself mr. melville, was contemplating the malice of which the gods are capable.\"\n",
@@ -214,33 +302,104 @@
      "Using this new algorithm, **what is your door code?**"
     ]
    },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Worked example of solution: Part 2\n",
+    "\n",
+    "This is almost identical to part 1, but the arithmetic is slightly different. Note the use of keyword arguments with default values, to allow the code to use different starting values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(21, 231, 23, 'w')"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "o('u'), o('u') * 11, (o('u') * 11 - 1) % 26 + 1, c(o('u') * 11)"
+   ]
+  },
    {
     "cell_type": "code",
-   "execution_count": 105,
+   "execution_count": 13,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "(21, 231, 22, 'w')"
+       "(18, 9, 45, 63, 'k')"
        ]
       },
-     "execution_count": 105,
+     "execution_count": 13,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "o('u'), o('u') * 11, (o('u') * 11 - 1) % 26, c(o('u') * 11)"
+    "o('r'), o('i'), o('i') * 5, o('r') + o('i') * 5, c(o('r') + o('i') * 5)"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 113,
+   "execution_count": 14,
     "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(9, 20, 220, 229, 'u')"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "o('i'), o('t'), o('t') * 11, o('i') + o('t') * 11, c(o('i') + o('t') * 11)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def whash2(word, h0=None, alpha=5, beta=11):\n",
+    "    if h0 is None:\n",
+    "        h = list('ri')\n",
+    "    else:\n",
+    "        h = list(h0)\n",
+    "    for l in word:\n",
+    "        h[0] = c(o(h[0]) + o(h[1]) * alpha)\n",
+    "        h[1] = c(o(h[1]) + o(l) * beta)\n",
+    "    return ''.join(h)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "collapsed": true
+   },
     "outputs": [],
     "source": [
-    "def whash2(word, h0=None, m1=5, m2=11, show_steps=False):\n",
+    "# Extended version that generates the tables used in the question text.\n",
+    "def whash2(word, h0=None, alpha=5, beta=11, show_steps=False):\n",
      "    if show_steps:\n",
      "        print('| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |'\n",
      "              'new second<br/>part of code | new code |')\n",
@@ -252,23 +411,23 @@
      "    for l in word:\n",
      "        if show_steps:\n",
      "            h_before = h.copy()\n",
-    "        h[0] = c(o(h[0]) + o(h[1]) * m1)\n",
-    "        h[1] = c(o(h[1]) + o(l) * m2)\n",
+    "        h[0] = c(o(h[0]) + o(h[1]) * alpha)\n",
+    "        h[1] = c(o(h[1]) + o(l) * beta)\n",
      "        if show_steps:\n",
-    "            print('| {} | {} | {} | {} | {} | {} | {} |'.format(\n",
+    "            print('| `{}` | {} | `{}` | {} | {} | {} | `{}` |'.format(\n",
      "                  ''.join(h_before), \n",
      "                  (o(h_before[0]), o(h_before[1])),\n",
      "                  l, \n",
      "                  o(l), \n",
-    "                  o(h_before[0]) + o(h_before[1]) * m1, \n",
-    "                  o(h_before[1]) + o(l) * m2, \n",
+    "                  o(h_before[0]) + o(h_before[1]) * alpha, \n",
+    "                  o(h_before[1]) + o(l) * beta, \n",
      "                  ''.join(h)))\n",
      "    return ''.join(h)"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 116,
+   "execution_count": 16,
     "metadata": {},
     "outputs": [
      {
@@ -277,12 +436,12 @@
       "text": [
        "| old code | code as<br>numbers | passphrase<br/>letter | number of<br/>letter | new first<br/>part of code |new second<br/>part of code | new code |\n",
        "|:---|:---|:---|:---|:---|:---|:---|\n",
-      "| ri | (18, 9) | t | 20 | 63 | 229 | ku |\n",
-      "| ku | (11, 21) | h | 8 | 116 | 109 | le |\n",
-      "| le | (12, 5) | e | 5 | 37 | 60 | kh |\n",
-      "| kh | (11, 8) | c | 3 | 51 | 41 | yo |\n",
-      "| yo | (25, 15) | a | 1 | 100 | 26 | vz |\n",
-      "| vz | (22, 26) | t | 20 | 152 | 246 | vl |\n"
+      "| `ri` | (18, 9) | `t` | 20 | 63 | 229 | `ku` |\n",
+      "| `ku` | (11, 21) | `h` | 8 | 116 | 109 | `le` |\n",
+      "| `le` | (12, 5) | `e` | 5 | 37 | 60 | `kh` |\n",
+      "| `kh` | (11, 8) | `c` | 3 | 51 | 41 | `yo` |\n",
+      "| `yo` | (25, 15) | `a` | 1 | 100 | 26 | `vz` |\n",
+      "| `vz` | (22, 26) | `t` | 20 | 152 | 246 | `vl` |\n"
       ]
      },
      {
@@ -291,7 +450,7 @@
         "'vl'"
        ]
       },
-     "execution_count": 116,
+     "execution_count": 16,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -302,7 +461,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 117,
+   "execution_count": 22,
     "metadata": {},
     "outputs": [
      {
@@ -311,7 +470,7 @@
         "'qb'"
        ]
       },
-     "execution_count": 117,
+     "execution_count": 22,
       "metadata": {},
       "output_type": "execute_result"
      }