Added notes for day 8

[ou-summer-of-code-2017.git] / 08-word-chains / visa-woes-solution.ipynb
diff --git a/08-word-chains/visa-woes-solution.ipynb b/08-word-chains/visa-woes-solution.ipynb

index 009e18721feeeeac73cb2b2e53544ed627c34d63..a030d8ae2707cdcf418de1f25a4ce46d3653f5d6 100644 (file)
--- a/08-word-chains/visa-woes-solution.ipynb
+++ b/08-word-chains/visa-woes-solution.ipynb
@@ -50,9 +50,136 @@
      "How many different offices could you visit in no more than 10 steps from `coax`?"
     ]
    },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Worked example solution: part 1\n",
+    "This is the first of two tasks intended to exercise skills developed in M269. In this case, this is about search. I won't go over the general idea of search itself here, as there are loads of tutorials online about it. Instead, I'll talk about the specific way I've implmeneted it in this solution.\n",
+    "\n",
+    "(See below for the <a href=\"#part2\">discussion of part 2</a>.)\n",
+    "\n",
+    "## Search\n",
+    "The offices/words can be thought of as a graph of nodes and edges. Each word is a node, and there's an edge between two words if those words differ by one letter. The diagrams below show the connections between words that are one step and two steps away from 'rush'.\n",
+    "\n",
+    "| Words one step from 'rush' | Words two steps from 'rush' |\n",
+    "| ------------- | ------------- |\n",
+    "| <a href=\"rush1.dot.png\"><img src=\"rush1.dot.png\" alt=\"Words one step from 'rush'\" style=\"width: 200px;\"/></a>     | <a href=\"rush2.dot.png\"><img src=\"rush2.dot.png\" alt=\"Words two steps from 'rush'\" style=\"width: 200px;\"/></a> |\n",
+    "\n",
+    "The task is to find a path, a sequence of words, from the starting word to the destination. This is a classic search problem.\n",
+    "\n",
+    "The key data structure is the _agenda_, a set of partial routes. We take a partial route from the agenda and _extend_ it with the words/rooms reachable from the end of the partial route, giving a new set of partial routes. Those are added back into the agenda.\n",
+    "\n",
+    "Search finishes either when the partial route we're processing is a solution (i.e. goes form the origin to destination room) or the agenda becomes empty.\n",
+    "\n",
+    "For instance, say we're going from `ache` to `ashy`. The initial agenda consists of just one partial route, and that partial route contains just `ache`.\n",
+    "\n",
+    "```\n",
+    "ache\n",
+    "```\n",
+    "\n",
+    "We take that item off the agenda, extend it (with `acne`, `acme`, `acre`, and `achy` as neighbours of `ache`). When we add those four items to the agenda, we get \n",
+    "\n",
+    "```\n",
+    "ache -> acne\n",
+    "ache -> acme\n",
+    "ache -> acre\n",
+    "ache -> achy\n",
+    "```\n",
+    "\n",
+    "We then proces the `ache -> acne` partial path, extending it with `acme` and `acre`, giving the agenda:\n",
+    "\n",
+    "```\n",
+    "ache -> acme\n",
+    "ache -> acre\n",
+    "ache -> achy\n",
+    "ache -> acne -> acme\n",
+    "ache -> acne -> acre\n",
+    "```\n",
+    "\n",
+    "(Note that while `ache` is adjacent to `acne`, we don't want to create a new partial path to `ache` as we've already visited it in this path.)\n",
+    "\n",
+    "We then proces the `ache -> acme` partial path, extending it with `acne` and `acre`, giving the agenda:\n",
+    "\n",
+    "```\n",
+    "ache -> acre\n",
+    "ache -> achy\n",
+    "ache -> acne -> acme\n",
+    "ache -> acne -> acre\n",
+    "ache -> acme -> acne\n",
+    "ache -> acme -> acre\n",
+    "```\n",
+    "\n",
+    "We then do `ache -> acre` and `ache -> achy` to give:\n",
+    "\n",
+    "```\n",
+    "ache -> acne -> acme\n",
+    "ache -> acne -> acre\n",
+    "ache -> acme -> acne\n",
+    "ache -> acme -> acre\n",
+    "ache -> acre -> acne\n",
+    "ache -> acre -> acme\n",
+    "ache -> achy -> ashy\n",
+    "```\n",
+    "\n",
+    "`ache -> acne -> acme` has only one valid extension, so we get:\n",
+    "\n",
+    "```\n",
+    "ache -> acne -> acre\n",
+    "ache -> acme -> acne\n",
+    "ache -> acme -> acre\n",
+    "ache -> acre -> acne\n",
+    "ache -> acre -> acme\n",
+    "ache -> achy -> ashy\n",
+    "ache -> acne -> acme -> acre\n",
+    "```\n",
+    "\n",
+    "We process all the other partial paths in turn until we get to the agenda looking like:\n",
+    "```\n",
+    "ache -> achy -> ashy\n",
+    "ache -> acne -> acme -> acre\n",
+    "ache -> acne -> acre -> acme\n",
+    "ache -> acme -> acne -> acre\n",
+    "ache -> acme -> acre -> acne\n",
+    "ache -> acre -> acne -> acme\n",
+    "ache -> acre -> acme -> acne\n",
+    "```\n",
+    "At this point, the first partial path in the agenda is a solution, so we report success and return that path.\n",
+    "\n",
+    "With breadth-first search, we add the newly-discovered partial paths to the end of the agenda, meaning we go through the graph one layer at a time. If we add the new partial paths to the front of the agenda, we have depth-first search, where we explore one trail fully before backtracking and trying another.\n",
+    "\n",
+    "There are other clever things we can do, such as sorting the agenda by predicted distance to go, which can speed things up.\n",
+    "\n",
+    "## Agenda and paths\n",
+    "As each partial path is a sequence of words in order, it makes sense to store it as a Python `list` of words. For breadth-first and depth-first searches, the agenda is a sequence of partial paths. Again, it makes sense to store this a a Python `list` of partial paths.\n",
+    "\n",
+    "For A* search, the agenda is sorted by the sum of path length and predicted cost to complete the path. In this case, it makes sense to represent the agenda by a priority queue, also known as a heap. I decided to use the standard Python `heapq` library, rather than the one used in M269. Partial paths are put on the heap with `heappush` and the partial path with the lowest cost is removed with `heappop`. \n",
+    "\n",
+    "## The graph of words\n",
+    "How to represent the graph of words? In the input, the connections between words are implicit, so we need to write a little procedure that returns all the neighbouring words for the word passed in. \n",
+    "\n",
+    "There's also a choice here: to we calculate and store the explicit graph of words, or do we rediscover the neighbours every time we want to process a word? The first approach consumes space to reduce time; the second uses time to reduce space. \n",
+    "\n",
+    "In this case, as there are only 2300 words and 20,000 connections, it's not a lot of space. The search will be examining lots of nodes, so I took a decision to explicity cache all the word neighbour relations first, then just look them up as needed. \n",
+    "\n",
+    "(It turns out this wasn't a good idea in terms of raw perfromance. Generating the graph takes ~130ms, but seems to give just about no change in performance for any search algorithm. Ho hum, but an example of where a [premature optimistion](https://en.wikiquote.org/wiki/Donald_Knuth) turns out to be costly!)\n",
+    "\n",
+    "## Important optimisation: closed sets\n",
+    "If you look at the diagrams and traces above, you'll see that the graph of words is very heavily connected, with several different routes from one word to another. But we often don't need to try out all these alternatives. If we're considering a partial route that ends at a particular word _w_, and we've already found another partial route to _w_ that's no longer than this one, there's no need to continue analysing this one. For instance, in the trace above, there's a route `ache -> acne -> acme`, but we've already found the route `ache -> acme`, so we can discard the `ache -> acne -> acme` alternative without worrying about missing possible solutions.\n",
+    "\n",
+    "If we use something like breadth-first search, we know that the first time we encounter a new word is the shortest path to that word. That means that all subsequent times we arrive that that word, we know we can discard that partial path. \n",
+    "\n",
+    "We maintain a `closed set` of the words we've already processed. We can use that set in two places. Either we check the partial path when we take it off the agenda, or we use the closed set to reduce the number of partial paths generated at each step. In the implementation below, I do the latter.\n",
+    "\n",
+    "## Another optimisation: `list` or `set` of words?\n",
+    "The obvious way to represent the known words is as a list. But, we don't care about the order of items in the collection of words. All we do with them is check for existence of a word, and iterate through the whole collection. In these cases, the Python `set` is a much more efficient data structure than the Pyhton `list`, as it uses something like a `dict` underneath. This means membership checking is much faster while iteration takes about the same amount of time.\n",
+    "\n",
+    "Therefore, rather than using `list`s, we'll use `set`s where possible. In fact, as the set of words doesn't change, we can use the `frozenset` type to indicate that the set is immutable."
+   ]
+  },
    {
     "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 116,
     "metadata": {
      "collapsed": true
     },
@@ -64,7 +191,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 117,
     "metadata": {},
     "outputs": [
      {
@@ -73,46 +200,73 @@
         "2336"
        ]
       },
-     "execution_count": 3,
+     "execution_count": 117,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "words = [w.strip() for w in open('08-rooms.txt').readlines()]\n",
+    "words = frozenset(w.strip() for w in open('08-rooms.txt').readlines())\n",
      "len(words)"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 191,
     "metadata": {
      "collapsed": true
     },
     "outputs": [],
     "source": [
      "def adjacents(word):\n",
-    "    return [word[0:i] + l + word[i+1:]\n",
+    "    return frozenset(word[0:i] + l + word[i+1:]\n",
      "           for i in range(len(word))\n",
      "           for l in string.ascii_lowercase\n",
-    "           if l != word[i]]"
+    "           if l != word[i])"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
+   "execution_count": 192,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10 loops, best of 3: 130 ms per loop\n"
+     ]
+    }
+   ],
     "source": [
-    "neighbours = {w: [n for n in adjacents(w) if n in words]\n",
+    "%%timeit\n",
+    "neighbours = {w: frozenset(n for n in adjacents(w) if n in words)\n",
      "             for w in words}"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 193,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "20092"
+      ]
+     },
+     "execution_count": 193,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sum(len(neighbours[w]) for w in neighbours)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 233,
     "metadata": {
      "collapsed": true
     },
@@ -125,7 +279,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 195,
     "metadata": {
      "collapsed": true
     },
@@ -138,7 +292,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 196,
     "metadata": {
      "collapsed": true
     },
@@ -155,21 +309,24 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 197,
     "metadata": {
      "collapsed": true
     },
     "outputs": [],
     "source": [
-    "def extend_raw(chain):\n",
-    "    nbrs = [w for w in adjacents(chain[-1]) if w in words]\n",
+    "def extend_raw(chain, closed=None):\n",
+    "    if closed:\n",
+    "        nbrs = set(neighbours[chain[-1]]) - closed\n",
+    "    else:\n",
+    "        nbrs = neighbours[chain[-1]]\n",
      "    return [chain + [s] for s in nbrs\n",
      "           if s not in chain]"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 198,
     "metadata": {
      "collapsed": true
     },
@@ -195,7 +352,33 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 172,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def bfs_search_raw(start, goal, debug=False):\n",
+    "    agenda = [[start]]\n",
+    "    finished = False\n",
+    "    while not finished and agenda:\n",
+    "        current = agenda[0]\n",
+    "        if debug:\n",
+    "            print(current)\n",
+    "        if current[-1] == goal:\n",
+    "            finished = True\n",
+    "        else:\n",
+    "            successors = extend_raw(current)\n",
+    "            agenda = agenda[1:] + successors\n",
+    "    if finished:\n",
+    "        return current\n",
+    "    else:\n",
+    "        return None        "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 173,
     "metadata": {
      "collapsed": true
     },
@@ -223,7 +406,35 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 181,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def bfs_search_closed_raw(start, goal, debug=False):\n",
+    "    agenda = [[start]]\n",
+    "    closed = set()\n",
+    "    finished = False\n",
+    "    while not finished and agenda:\n",
+    "        current = agenda[0]\n",
+    "        if debug:\n",
+    "            print(current)\n",
+    "        if current[-1] == goal:\n",
+    "            finished = True\n",
+    "        else:\n",
+    "            closed.add(current[-1])\n",
+    "            successors = extend_raw(current, closed)\n",
+    "            agenda = agenda[1:] + successors\n",
+    "    if finished:\n",
+    "        return current\n",
+    "    else:\n",
+    "        return None   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 127,
     "metadata": {
      "collapsed": true
     },
@@ -249,7 +460,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 128,
     "metadata": {
      "collapsed": true
     },
@@ -277,7 +488,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 129,
     "metadata": {
      "collapsed": true
     },
@@ -306,7 +517,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 130,
     "metadata": {
      "collapsed": true
     },
@@ -336,27 +547,27 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 234,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "['vice', 'dice', 'dire', 'dare', 'ware', 'wars']"
+       "['coax', 'coat', 'boat', 'boar', 'soar', 'star', 'stay']"
        ]
       },
-     "execution_count": 16,
+     "execution_count": 234,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "astar_search('vice', 'wars')"
+    "astar_search('coax', 'stay')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 235,
     "metadata": {},
     "outputs": [
      {
@@ -365,7 +576,7 @@
         "['vice', 'dice', 'dire', 'dare', 'ware', 'wars']"
        ]
       },
-     "execution_count": 17,
+     "execution_count": 235,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -376,7 +587,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 201,
     "metadata": {},
     "outputs": [
      {
@@ -385,7 +596,7 @@
         "['boon', 'boot', 'bolt', 'belt', 'bell', 'sell']"
        ]
       },
-     "execution_count": 18,
+     "execution_count": 201,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -396,27 +607,27 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 236,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "6"
+       "7"
        ]
       },
-     "execution_count": 19,
+     "execution_count": 236,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "len(astar_search('vice', 'wars'))"
+    "len(astar_search('coax', 'stay'))"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 203,
     "metadata": {
      "collapsed": true
     },
@@ -427,146 +638,251 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 237,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "6"
+       "7"
        ]
       },
-     "execution_count": 21,
+     "execution_count": 237,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "len(bfs_search_closed('vice', 'wars'))"
+    "len(bfs_search_closed('coax', 'stay'))"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 238,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "793"
+       "458"
        ]
       },
-     "execution_count": 22,
+     "execution_count": 238,
       "metadata": {},
       "output_type": "execute_result"
      }
     ],
     "source": [
-    "len(dfs_search('vice', 'wars'))"
+    "len(dfs_search('coax', 'stay'))"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 239,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "10000 loops, best of 3: 157 µs per loop\n"
+      "1000 loops, best of 3: 605 µs per loop\n"
       ]
      }
     ],
     "source": [
      "%%timeit\n",
-    "astar_search('vice', 'wars')"
+    "astar_search('coax', 'stay')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 240,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "100 loops, best of 3: 15.5 ms per loop\n"
+      "1000 loops, best of 3: 607 µs per loop\n"
       ]
      }
     ],
     "source": [
      "%%timeit\n",
-    "astar_search_raw('vice', 'wars')"
+    "astar_search_raw('coax', 'stay')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 241,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "10000 loops, best of 3: 166 µs per loop\n"
+      "1000 loops, best of 3: 552 µs per loop\n"
       ]
      }
     ],
     "source": [
      "%%timeit\n",
-    "astar_search_closed('vice', 'wars')"
+    "astar_search_closed('coax', 'stay')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 26,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
+   "execution_count": 243,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 loop, best of 3: 4min 25s per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "bfs_search('coax', 'stay')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 244,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 loop, best of 3: 4min 25s per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "bfs_search_raw('coax', 'stay')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 245,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 loop, best of 3: 810 ms per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "bfs_search_closed('coax', 'stay')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 246,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1 loop, best of 3: 814 ms per loop\n"
+     ]
+    }
+   ],
     "source": [
-    "# %%timeit\n",
-    "# bfs_search('vice', 'wars')"
+    "%%timeit\n",
+    "bfs_search_closed_raw('coax', 'stay')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 247,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10 loops, best of 3: 26.8 ms per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "dfs_search('coax', 'stay')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 248,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "1 loop, best of 3: 554 ms per loop\n"
+      "1 loop, best of 3: 4.39 s per loop\n"
       ]
      }
     ],
     "source": [
      "%%timeit\n",
-    "bfs_search_closed('vice', 'wars')"
+    "astar_search('amen', 'doff')"
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 249,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "10 loops, best of 3: 90.6 ms per loop\n"
+      "1 loop, best of 3: 4.4 s per loop\n"
       ]
      }
     ],
     "source": [
      "%%timeit\n",
-    "dfs_search('vice', 'wars')"
+    "astar_search_raw('amen', 'doff')"
     ]
    },
+  {
+   "cell_type": "code",
+   "execution_count": 250,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "10 loops, best of 3: 87.4 ms per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "astar_search_closed('amen', 'doff')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  },
    {
     "cell_type": "markdown",
     "metadata": {},
@@ -580,12 +896,33 @@
      "\n",
      "There are 180 words reachable in up to three steps from `rash`.\n",
      "\n",
-    "How many words are reachable in up to ten steps from `vice`?"
+    "How many words are reachable in up to ten steps from `coax`?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "# <a name=\"part2\"></a>Worked example solution: part 2\n",
+    "After all the mucking around with search algorithms, this is actually much easier. It's all to do with building up sets.\n",
+    "\n",
+    "The basic idea is to maintain a `set` of reachable words which is extended in passes. In each pass, we find all the neighbours of all the words in the reachable word set, and add those neighbours to the set of reachable words. We just do the number of passes specified in in the task.\n",
+    "\n",
+    "In the code below, `reachable` is set of reachable words and `extras` is the set of new words found. As an optimisation, I use the set `boundary` to hold the words added in the previous pass, as the new words to add to `reachable` must be neighbours of a word in the `boundary`: all words that are neighbours of the 'interior' of the set of reachable words have already been added. so there's no need to process them again.\n",
+    "\n",
+    "The `trim_extras` flag is for another optimisation: eacy time we add some more words to `extras`, remove words which are already in `reachable`. That will make the later update of `reachable` faster.\n",
+    "\n",
+    "This approach is quick, as its based on sets and set membership checking, which is a fast process.\n",
+    "\n",
+    "This also suggests another way of solving part 1. Start at depth 1, and find all the words reachable at that depth. If the target word is in the reachable set, report success. If not, increment the depth and find the reachable words again.\n",
+    "\n",
+    "This approach does give the right answer of the minimal distance from source to goal words, but it doesn't give any information about the route to take to go from one word to another; the route is something returned by the search algorithms above."
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 217,
     "metadata": {
      "collapsed": true
     },
@@ -607,7 +944,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 218,
     "metadata": {},
     "outputs": [
      {
@@ -626,7 +963,7 @@
         " 'rasp']"
        ]
       },
-     "execution_count": 30,
+     "execution_count": 218,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -637,7 +974,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 219,
     "metadata": {
      "scrolled": true
     },
@@ -649,7 +986,7 @@
         " '`bash`, `cash`, `dash`, `gash`, `hash`, `lash`, `mash`, `rasp`, `rush`, `sash`, `wash`')"
        ]
       },
-     "execution_count": 31,
+     "execution_count": 219,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -660,7 +997,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 220,
     "metadata": {
      "scrolled": true
     },
@@ -672,7 +1009,7 @@
         " '<code>bash</code>, <code>cash</code>, <code>dash</code>, <code>gash</code>, <code>hash</code>, <code>lash</code>, <code>mash</code>, <code>rasp</code>, <code>rush</code>, <code>sash</code>, <code>wash</code>')"
        ]
       },
-     "execution_count": 32,
+     "execution_count": 220,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -683,7 +1020,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 221,
     "metadata": {
      "scrolled": true
     },
@@ -695,7 +1032,7 @@
         " '`base`, `bash`, `bask`, `bass`, `bast`, `bath`, `bosh`, `bush`, `case`, `cash`, `cask`, `cast`, `dash`, `dish`, `gash`, `gasp`, `gosh`, `gush`, `hash`, `hasp`, `hath`, `hush`, `lash`, `lass`, `last`, `lath`, `lush`, `mash`, `mask`, `mass`, `mast`, `math`, `mesh`, `mush`, `push`, `ramp`, `rasp`, `ruse`, `rush`, `rusk`, `rust`, `sash`, `sass`, `tush`, `wash`, `wasp`, `wish`')"
        ]
       },
-     "execution_count": 33,
+     "execution_count": 221,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -706,7 +1043,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 222,
     "metadata": {
      "scrolled": true
     },
@@ -718,7 +1055,7 @@
         " '<code>base</code>, <code>bash</code>, <code>bask</code>, <code>bass</code>, <code>bast</code>, <code>bath</code>, <code>bosh</code>, <code>bush</code>, <code>case</code>, <code>cash</code>, <code>cask</code>, <code>cast</code>, <code>dash</code>, <code>dish</code>, <code>gash</code>, <code>gasp</code>, <code>gosh</code>, <code>gush</code>, <code>hash</code>, <code>hasp</code>, <code>hath</code>, <code>hush</code>, <code>lash</code>, <code>lass</code>, <code>last</code>, <code>lath</code>, <code>lush</code>, <code>mash</code>, <code>mask</code>, <code>mass</code>, <code>mast</code>, <code>math</code>, <code>mesh</code>, <code>mush</code>, <code>push</code>, <code>ramp</code>, <code>rasp</code>, <code>ruse</code>, <code>rush</code>, <code>rusk</code>, <code>rust</code>, <code>sash</code>, <code>sass</code>, <code>tush</code>, <code>wash</code>, <code>wasp</code>, <code>wish</code>')"
        ]
       },
-     "execution_count": 34,
+     "execution_count": 222,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -729,7 +1066,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": 223,
     "metadata": {
      "scrolled": true
     },
@@ -740,7 +1077,7 @@
         "180"
        ]
       },
-     "execution_count": 35,
+     "execution_count": 223,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -751,7 +1088,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 36,
+   "execution_count": 224,
     "metadata": {
      "scrolled": true
     },
@@ -762,7 +1099,7 @@
         "2195"
        ]
       },
-     "execution_count": 36,
+     "execution_count": 224,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -773,7 +1110,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 37,
+   "execution_count": 225,
     "metadata": {
      "scrolled": true
     },
@@ -784,7 +1121,7 @@
         "2192"
        ]
       },
-     "execution_count": 37,
+     "execution_count": 225,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -795,7 +1132,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 38,
+   "execution_count": 226,
     "metadata": {
      "scrolled": true
     },
@@ -804,7 +1141,7 @@
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "100 loops, best of 3: 5.81 ms per loop\n"
+      "100 loops, best of 3: 6.2 ms per loop\n"
       ]
      }
     ],
@@ -815,7 +1152,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 39,
+   "execution_count": 227,
     "metadata": {
      "scrolled": true
     },
@@ -824,7 +1161,7 @@
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "100 loops, best of 3: 2.71 ms per loop\n"
+      "100 loops, best of 3: 2.92 ms per loop\n"
       ]
      }
     ],
@@ -835,7 +1172,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 40,
+   "execution_count": 155,
     "metadata": {},
     "outputs": [
      {
@@ -844,7 +1181,7 @@
         "2188"
        ]
       },
-     "execution_count": 40,
+     "execution_count": 155,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -855,7 +1192,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 156,
     "metadata": {},
     "outputs": [
      {
@@ -864,7 +1201,7 @@
         "2193"
        ]
       },
-     "execution_count": 41,
+     "execution_count": 156,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -875,7 +1212,63 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 232,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(True, 1)"
+      ]
+     },
+     "execution_count": 232,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dist = 0\n",
+    "found = False\n",
+    "\n",
+    "while not found and distance < 50:\n",
+    "    dist += 1\n",
+    "    others = reachable_in('coax', distance)\n",
+    "    if 'stay' in others:\n",
+    "        found = True\n",
+    "\n",
+    "found, dist"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 229,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1000 loops, best of 3: 1.26 ms per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%timeit\n",
+    "distance = 0\n",
+    "found = False\n",
+    "\n",
+    "while not found and distance < 50:\n",
+    "    distance += 1\n",
+    "    others = reachable_in('coax', distance)\n",
+    "    if 'stay' in others:\n",
+    "        found = True\n",
+    "\n",
+    "found, distance"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 157,
     "metadata": {},
     "outputs": [
      {
@@ -884,7 +1277,7 @@
         "280"
        ]
       },
-     "execution_count": 42,
+     "execution_count": 157,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -896,7 +1289,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 158,
     "metadata": {},
     "outputs": [
      {
@@ -905,7 +1298,7 @@
         "296"
        ]
       },
-     "execution_count": 43,
+     "execution_count": 158,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -920,16 +1313,16 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 159,
     "metadata": {},
     "outputs": [
      {
       "data": {
        "text/plain": [
-       "'know step yeps test pout book beck poor boos veep teed gent saws dyer hemp soon deem legs bets twit whim belt rock well bled whet prop spec weep peed loan week clef feed lock yens vent doer less teem toot yews blur clue pert pegs brew flea trig vets sued felt perk bobs brat tens beef leap bloc shoo moan boob jeez suet sits rood drag goat stay chip fled fist than loot pets suck peek pled heel hair sill prep char knot lens meek trap jeep beet crew draw tell nets peep meet sows glue bees bent pelt whit need foot bran chop gram chew teen veil lent gees fret soft leis pent gets weed keen reek news prim drab wets sics deed bows pier akin pock geed grey bogs goad skis peck prod loam keep pens hoot best reed yous sewn been tram pest whom bout unit deep boot dual lees omit peel sues berm dock yock shes lets mock grow jell whip lean thin sots geek stem floe tees sort fees gout mead dram said twin prom bras bier heft foul root fell melt bops coup coot atop brad chin moot shoe sacs chit hell door lead feet boon bolt leaf bell bias eery goop reel text boom herd rent leek rend club self send sick term airs frog yell next czar plus mean brow foam crow troy nest heed dell load glib peps cent chow pees bend jock held tent soil poop moor sped emit boss fest crop mews knit tout crag knob grab thaw vial sack help jets very boys flue whir newt wiry beep room pews crab baas cock coop hock blue flee moat skid quit dial brag cell aloe grad hews went leak toad wees mewl coat dent'"
+       "'brow chit bees hoot weep cock sots dock boob book boys brag bled eery sows cent crow mead aloe coup wiry less goat well vent deed rood moor belt atop bows draw prep chip sewn whom plus bets sued wets mean news czar newt fees chow reek flue troy pled geek boot akin lock bout leek fell feed shoe club heed bolt root peel thin yock vets crew load wees clue brad been meet pert boos beet legs mews self hock moot tell toad lead shoo jell teed sill skis veil glue went step test crop text peck bran teen crab pier bogs boss leis grow lent quit foam bras prod knot trap cell dram very goad felt chop sack omit poop pelt drag gram peps yens jeep brat prim jets goop tees deep dual bobs beef sits baas dyer lets leaf gees feet chew rend sues gent whet chin whip berm whim yell trig blue peek prop leak lean bent yeps bier drab bell foot heel boon fret moan send tens jock brew crag thaw sick beck gets bend pest loan geed herd skid toot grab pees hair poor rock best bloc pens coat bias fest heft mock lees loot gout sped sics tout frog nest flee meek stay weed doer stem peep hews grad peed knit keep pegs next twin blur coop week saws perk lens suet pews foul char vial soil term flea leap pent coot sort nets keen loam beep soft boom twit jeez glib teem than prom yews need veep shes bops dent moat hemp held reed whir whit deem know clef reel soon pock room tent floe pout help fist knob dell melt yous sacs unit door spec said hell mewl emit tram suck grey pets rent fled airs dial'"
        ]
       },
-     "execution_count": 44,
+     "execution_count": 159,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -940,7 +1333,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 160,
     "metadata": {},
     "outputs": [
      {
@@ -949,7 +1342,7 @@
         "['coax', 'coat', 'boat', 'boar', 'soar', 'star', 'stay']"
        ]
       },
-     "execution_count": 45,
+     "execution_count": 160,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -960,24 +1353,24 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 46,
+   "execution_count": 161,
     "metadata": {},
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
-      "CPU times: user 620 ms, sys: 4 ms, total: 624 ms\n",
-      "Wall time: 624 ms\n"
+      "CPU times: user 848 ms, sys: 0 ns, total: 848 ms\n",
+      "Wall time: 849 ms\n"
       ]
      },
      {
       "data": {
        "text/plain": [
-       "['coax', 'coat', 'chat', 'shat', 'spat', 'spay', 'stay']"
+       "['coax', 'coat', 'boat', 'boar', 'soar', 'star', 'stay']"
        ]
       },
-     "execution_count": 46,
+     "execution_count": 161,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -988,7 +1381,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 47,
+   "execution_count": 162,
     "metadata": {
      "collapsed": true
     },
@@ -999,7 +1392,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 163,
     "metadata": {},
     "outputs": [
      {
@@ -1008,7 +1401,7 @@
         "['czar', 'tzar', 'tear', 'sear', 'star', 'stay']"
        ]
       },
-     "execution_count": 48,
+     "execution_count": 163,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -1019,7 +1412,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 49,
+   "execution_count": 164,
     "metadata": {
      "collapsed": true
     },
@@ -1030,7 +1423,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 50,
+   "execution_count": 165,
     "metadata": {
      "collapsed": true
     },
@@ -1041,7 +1434,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 79,
+   "execution_count": 166,
     "metadata": {},
     "outputs": [
      {
@@ -1050,7 +1443,7 @@
         "185"
        ]
       },
-     "execution_count": 79,
+     "execution_count": 166,
       "metadata": {},
       "output_type": "execute_result"
      }
@@ -1063,7 +1456,7 @@
    },
    {
     "cell_type": "code",
-   "execution_count": 80,
+   "execution_count": 167,
     "metadata": {
      "collapsed": true,
      "scrolled": true
@@ -1110,7 +1503,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-   "version": "3.5.2+"
+   "version": "3.5.3"
    }
   },
   "nbformat": 4,