Added another recursive solution to the duplicates challenge

[cas-master-teacher-training.git] / duplicate-letters.ipynb
diff --git a/duplicate-letters.ipynb b/duplicate-letters.ipynb

index 26c182e582466f6518b5a25d5544ae5c74a873b4..5dddeceb0b39acc447b79043d8a5eaf89b838da9 100644 (file)
--- a/duplicate-letters.ipynb
+++ b/duplicate-letters.ipynb
@@ -1,7 +1,7 @@
  {
   "metadata": {
    "name": "",
-  "signature": "sha256:9a1b8b45577dfcc1daaffba595f74655068367c9070816fe6eeb9345bb55729d"
+  "signature": "sha256:09f57b8dd5238a1d205d7560ca860f967686132ecc69f4def5f741d8f6d3fe5d"
   },
   "nbformat": 3,
   "nbformat_minor": 0,
@@ -55,7 +55,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 4
+     "prompt_number": 10
      },
      {
       "cell_type": "code",
@@ -71,7 +71,7 @@
         "output_type": "pyout",
         "prompt_number": 11,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
@@ -104,7 +104,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 34
+     "prompt_number": 12
      },
      {
       "cell_type": "code",
@@ -118,13 +118,13 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 35,
+       "prompt_number": 13,
         "text": [
-        "['c', 'e', 'r']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 35
+     "prompt_number": 13
      },
      {
       "cell_type": "markdown",
@@ -161,7 +161,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 8
+     "prompt_number": 14
      },
      {
       "cell_type": "code",
@@ -175,13 +175,13 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 12,
+       "prompt_number": 15,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 12
+     "prompt_number": 15
      },
      {
       "cell_type": "markdown",
@@ -202,7 +202,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 64
+     "prompt_number": 16
      },
      {
       "cell_type": "code",
@@ -216,22 +216,28 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 65,
+       "prompt_number": 17,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 65
+     "prompt_number": 17
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "##Solution 5\n",
-      "A recursive approach. The idea is that a letter is a duplicate if it's either the first letter of the string and it occurs in the rest of the string, or it's a duplicate if it's duplicated in the rest of the string.\n",
+      "A recursive approach.\n",
+      "\n",
+      "The empty word contains no duplicates.\n",
+      "\n",
+      "Non-empty words might contain duplicates. \n",
        "\n",
-      "We keep track of the duplicates found so far to avoid double-counting letters that occur more than twice in the string."
+      "Split the word into the first letter and the rest. Find all the duplicates in the rest.\n",
+      "\n",
+      "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)"
       ]
      },
      {
@@ -239,28 +245,87 @@
       "collapsed": false,
       "input": [
        "def duplicates_via_recursion(word):\n",
+      "    if word:\n",
+      "        first = word[0]\n",
+      "        rest = word[1:]\n",
+      "        duplicates_in_rest = duplicates_via_recursion(rest)\n",
+      "        if first in rest and first not in duplicates_in_rest:\n",
+      "            duplicates_in_rest += [first]\n",
+      "        return duplicates_in_rest\n",
+      "    else:\n",
+      "        return []"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 18
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "duplicates_via_recursion('occurrences')"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 19,
+       "text": [
+        "['e', 'r', 'c']"
+       ]
+      }
+     ],
+     "prompt_number": 19
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "##Solution 6\n",
+      "A recursive approach, using an accumulator.\n",
+      "\n",
+      "The previous solution built up the set of duplicates as the recursive calls unwound. This approach uses an accumulator to capture the duplicates as the recursive calls are made. This is the trick for prompting semi-smart compilers to use tail-call optimisation. The algorithm is a simple reordering of the subtasks above.\n",
+      "\n",
+      "Split the word into the first letter and the rest. \n",
+      "\n",
+      "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)\n",
+      "\n",
+      "Find all the duplicates in the rest, keeping track of the duplicates found so far.\n",
+      "\n",
+      "When we run out of word, the duplicates found so far are all the duplicates, so return them."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def duplicates_via_recursion_with_accumulator(word):\n",
        "    return duplicates_via_recursion_helper(word, [])\n",
        "\n",
        "def duplicates_via_recursion_helper(word, duplicates_so_far):\n",
        "    if word:\n",
-      "        letter = word[0]\n",
-      "        if letter in word[1:] and letter not in duplicates_so_far:\n",
-      "            return duplicates_via_recursion_helper(word[1:], [letter] + duplicates_so_far)\n",
+      "        first = word[0]\n",
+      "        rest = word[1:]\n",
+      "        if first in rest and first not in duplicates_so_far:\n",
+      "            return duplicates_via_recursion_helper(rest, [first] + duplicates_so_far)\n",
        "        else:\n",
-      "            return duplicates_via_recursion_helper(word[1:], duplicates_so_far)\n",
+      "            return duplicates_via_recursion_helper(rest, duplicates_so_far)\n",
        "    else:\n",
        "        return duplicates_so_far"
       ],
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 25
+     "prompt_number": 20
      },
      {
       "cell_type": "code",
       "collapsed": false,
       "input": [
-      "duplicates_via_recursion('occurrences')"
+      "duplicates_via_recursion_with_accumulator('occurrences')"
       ],
       "language": "python",
       "metadata": {},
@@ -268,19 +333,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 26,
+       "prompt_number": 21,
         "text": [
          "['e', 'r', 'c']"
         ]
        }
       ],
-     "prompt_number": 26
+     "prompt_number": 21
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 6\n",
+      "##Solution 7\n",
        "Sort the word, then walk along it. A letter is duplicated if you find two consecutive letters the same. Store the results in a set, so that we don't have to worry about double-counting more than two letters."
       ]
      },
@@ -299,7 +364,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 29
+     "prompt_number": 22
      },
      {
       "cell_type": "code",
@@ -313,19 +378,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 30,
+       "prompt_number": 23,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 30
+     "prompt_number": 23
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 7\n",
+      "##Solution 8\n",
        "As above, but this time we store the duplicates in a list and use an additional variable `found_duplicates` to keep the state of whether we've found duplicates of this letter.\n",
        "\n",
        "Initially `found_duplicates` if `False`. As soon as we find a second occurrence of the letter, set `found_duplicates` to `True` and record the letter as a duplicate. Then continue to walk along the list until we find a different letter, when we reset `found_duplicates` to `False` again."
@@ -352,7 +417,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 55
+     "prompt_number": 24
      },
      {
       "cell_type": "code",
@@ -366,19 +431,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 56,
+       "prompt_number": 25,
         "text": [
          "['c', 'e', 'r']"
         ]
        }
       ],
-     "prompt_number": 56
+     "prompt_number": 25
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 8\n",
+      "##Solution 9\n",
        "Sort the string then split it into a list of lists, where each sublist is the group of identical letters that occur together. Then find the segments that have more than one letter."
       ]
      },
@@ -399,7 +464,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 49
+     "prompt_number": 26
      },
      {
       "cell_type": "code",
@@ -413,19 +478,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 50,
+       "prompt_number": 27,
         "text": [
          "['c', 'e', 'r']"
         ]
        }
       ],
-     "prompt_number": 50
+     "prompt_number": 27
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 9\n",
+      "##Solution 10\n",
        "The same as above, but without having to sort the word first. This requires an additional loop, where we look for the bucket to place this letter. If we can't find the bucket, create a new one."
       ]
      },
@@ -452,7 +517,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 43
+     "prompt_number": 28
      },
      {
       "cell_type": "code",
@@ -466,19 +531,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 44,
+       "prompt_number": 29,
         "text": [
          "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 44
+     "prompt_number": 29
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 10\n",
+      "##Solution 11\n",
        "Keep two lists: a list of pending duplicates, and a list of actual duplicates. The first time we see a letter, add it to the pending list. If we see the letter again, it's already in the pending list, so that prompts us to add it to the duplicates list."
       ]
      },
@@ -500,7 +565,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 27
+     "prompt_number": 30
      },
      {
       "cell_type": "code",
@@ -514,19 +579,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 28,
+       "prompt_number": 31,
         "text": [
          "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 28
+     "prompt_number": 31
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 11\n",
+      "##Solution 12\n",
        "Exactly the same as above, but using sets instead of lists. This means we don't have to worry about adding more than one copy of the letter to either `pending` or `duplicates`."
       ]
      },
@@ -546,7 +611,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 31
+     "prompt_number": 32
      },
      {
       "cell_type": "code",
@@ -560,19 +625,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 32,
+       "prompt_number": 33,
         "text": [
          "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 32
+     "prompt_number": 33
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 12\n",
+      "##Solution 13\n",
        "Creating a set removes all the duplicates. Make a set from the word, then remove each letter in the set from the word. Whatever letters are left are the duplicates. (Convert the duplicates to a set then to a list to eliminate letters that occur more than three times.)"
       ]
      },
@@ -590,7 +655,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 31
+     "prompt_number": 34
      },
      {
       "cell_type": "code",
@@ -604,19 +669,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 32,
+       "prompt_number": 35,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 32
+     "prompt_number": 35
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 13\n",
+      "##Solution 14\n",
        "Use the `Counter` in the standard `collections` library. This will count the occurrences of each letter, allowing us to simply find those with a count of more than one."
       ]
      },
@@ -634,7 +699,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 33
+     "prompt_number": 36
      },
      {
       "cell_type": "code",
@@ -648,20 +713,20 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 34,
+       "prompt_number": 37,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 34
+     "prompt_number": 37
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 14\n",
-      "Similar to solution 12 above, but this time using Counters as multisets (or bags), where we can just do multiset subtraction to find the duplicates. `letters` will have a count of 1 for each letter."
+      "##Solution 15\n",
+      "Similar to solution 13 above, but this time using Counters as multisets (or bags), where we can just do multiset subtraction to find the duplicates. `letters` will have a count of 1 for each letter."
       ]
      },
      {
@@ -679,7 +744,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 24
+     "prompt_number": 38
      },
      {
       "cell_type": "code",
@@ -693,19 +758,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 25,
+       "prompt_number": 39,
         "text": [
          "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 25
+     "prompt_number": 39
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 15\n",
+      "##Solution 16\n",
        "Using the `itertools.groupby` standard library function to find all the groups in the sorted word. We can then pull out the longer groups. \n",
        "\n",
        "The same approach as solution 8 above, but using library functions to simplify the implementation."
@@ -724,7 +789,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 41
+     "prompt_number": 40
      },
      {
       "cell_type": "code",
@@ -738,19 +803,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 42,
+       "prompt_number": 41,
         "text": [
          "['c', 'e', 'r']"
         ]
        }
       ],
-     "prompt_number": 42
+     "prompt_number": 41
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 16\n",
+      "##Solution 17\n",
        "Using `functools.reduce` to merge the counts of letters in sublists of the word. Start by counting all the letters in the length one sublists, then combine them. Finally, find the letters with counts more than one."
       ]
      },
@@ -776,7 +841,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 62
+     "prompt_number": 42
      },
      {
       "cell_type": "code",
@@ -790,19 +855,19 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 63,
+       "prompt_number": 43,
         "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 63
+     "prompt_number": 43
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-      "##Solution 17\n",
+      "##Solution 18\n",
        "Similar to above, but refactored to more clearly show the separate `map` and `reduce` stages of the map-reduce strategy.\n",
        "\n",
        "Also uses the `update` method of `Counter`s to merge intermediate results."
@@ -830,7 +895,7 @@
       "language": "python",
       "metadata": {},
       "outputs": [],
-     "prompt_number": 36
+     "prompt_number": 44
      },
      {
       "cell_type": "code",
@@ -844,13 +909,13 @@
        {
         "metadata": {},
         "output_type": "pyout",
-       "prompt_number": 37,
+       "prompt_number": 45,
         "text": [
-        "['c', 'e', 'r']"
+        "['c', 'r', 'e']"
         ]
        }
       ],
-     "prompt_number": 37
+     "prompt_number": 45
      },
      {
       "cell_type": "code",
@@ -858,7 +923,8 @@
       "input": [],
       "language": "python",
       "metadata": {},
-     "outputs": []
+     "outputs": [],
+     "prompt_number": 45
      }
     ],
     "metadata": {}