Added another recursive solution to the duplicates challenge
authorNeil Smith <neil.git@njae.me.uk>
Wed, 11 Feb 2015 11:26:46 +0000 (11:26 +0000)
committerNeil Smith <neil.git@njae.me.uk>
Wed, 11 Feb 2015 11:26:46 +0000 (11:26 +0000)
SIGNED.md
duplicate-letters.ipynb

index 5e8e1781dd5b786b3244527ae8188d01178192e6..db57c764cc97996286fa9a2119745613920e76ed 100644 (file)
--- a/SIGNED.md
+++ b/SIGNED.md
@@ -3,19 +3,19 @@
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
-iQIcBAABAgAGBQJU17JdAAoJEJPB2e07PgbqqdAP/A0wHqJCvYJ1FhrfqBZlMm0w
-d6rIBnNCIxCOLhZqdHlKNYJc3BQd+p82L+OXouV6c0pXdxd57jNCPZxPUWRE5g8t
-+jyO93zJ7MNoQJKGrVwHtlqbmzrMWfEzuc6yFsbmX3Vj64NFtc1XykTCnn+z+cq8
-ARAkW31nf0a2BVylX+RdEdHf0o2eSukfNolCAKNSj5EJXoEKmPiHnl++2O+E3esS
-Lg/NIy1RREWT18GNWJyM6mp/54RmLi4UDEczp7M4QVPAy0KtbhtalVktgcE0OYDd
-+92FsmUcr5Eh49FC1CADmviIzcAoGB3rm178zOnK7UiFZOCF8stGNObD+FrzOudi
-ZUQmt+/88iKP0Y+KUqbktC8LukNFKAOwKP+acByt9J7EJ82d0WOg0r+muFxidld+
-Y308Ka6avk4FcjG7Z6wnO8gqHBV8dkAaMl4fA9r9lQmOJrWCGiRv4KtOY0BeVYXT
-17Izc5j87R75N3QpjsG16Z1IqUWKXCd5XRtfWfKHRi7tWPw7O8B6OPQYYeZAz51T
-jqcXAENAy+1X7eIm9i9dbvFQ60zy19kt18eeNQMDpvLWMzvG0Xm6c5R8zPGvRq7H
-G579la7H/pNGXqYWughjnzldgKDbrNIhlBEg8uELL2WPBOakRxISzf8yhSuHyA4J
-EozFRjYaEH4P3q3RU5WE
-=fjvz
+iQIcBAABAgAGBQJU2zx6AAoJEJPB2e07Pgbq9yMP/Rd1Hi+/ssIKfd2QjrhCOMf4
+HE4bF7uJqm+HiyLYzrdZAmCuoSBCAYqKM1IktIPSLS8KK4Vc3c/8S2H+gQ9mCQ10
+FC6aQFbgmGBdgEVvo/kF5UGaGZ1J7wowWK980Wd0+tfg5qWOUPn/AsPsVgeOoPVz
+ChjEfewRF5DHY81wszSRY4CWadQkOn85qbTYLZwuCjIXtE7zYpi4VIVL1+g/u/dl
+qNzqC5+mOX0XF2GE5ZNT3FwLRGvYb5aO6O/zgkkIHp5PAcCYIV2NAorpI5OaSE/S
+zsv7ETwxUdn+QKvEtjZxVPk2sWPJL42Kqh1locvUJlQJLO/TrcW0yXKKCsoiXhHX
+lNtUTBNhiHjqNO7HkVxIwD9qwiDuZY+uW2sf2NBRaSPApRvGYb594/bwgb5b5iPt
+hnVqPHXftjoYFxTk0zrd2Uo9REtRPfwfR/B8VpyeIZ4PeTDlI0+JdoLXPP2JmLBL
+TbHpEaTKhpUEihgXMZSMVAUSFPclz4EcjOktMA3y7TgHdLQkffwwuZyjMjTg6h5a
+QWGYC5UAGUIe1qMAdqQtCJ6WIBm3FxU3j2DiVDQrnACeOPxWuYpM5Dp7gDgwMhtM
+46tF2jm+knQs2C3hNCYDnpQ2vMtNi3f3DdYpL4CyQNipm9RdN+PFYvyqACYR9xAs
+c+JhrEO8NkEw1axJHFDC
+=m4rZ
 -----END PGP SIGNATURE-----
 
 ```
@@ -38,7 +38,7 @@ size    exec  file                                 contents
 27510           big-o-chart-2.png                  a9f40d4499bd43d07bad2b9674b3055a9a3f319349a1256d88a9419f206287d7|56acfe8154f080f9d3186170980a44efd640d4c1748c37c7031f1f20d2f3342e
 11222           big-o-chart-smaller.png            5b453ac13c187ea797dbdcf03c2f1941c730a952e285d2afbfde83cf231f0bc3|ba81f54c5daae7c199d9eb66f58db952e52e6b15cf9c69ed34245c4400ffc318
 31971           big-o-chart.png                    82cea33b61f5af16f928d7037877b31526e532ea62a7cf99ca3761bc4313e4f1|0c43ead93dc010d2fee9724cfda23f2d922cf58931d6258f1da9977fd5bb8cd1
-23974           duplicate-letters.ipynb            dae47262661abc6bca7526b10cd9928905b94eb6b4c04577a8491be10bad3a4d                                                                 
+26138           duplicate-letters.ipynb            cf873a1d288a94e9ed365a903ca0039a4664e1306fda83d56fab43d93084d13d                                                                 
 21947           euler-11.ipynb                     81780dca17496751894f582e8691935ee6f94949017fb3135087056489f8ad88                                                                 
 24903           euler-14.ipynb                     3652a2be86ec270420431814efa453f8679ad054ae28c015d34232c48dac16bd                                                                 
                 hangman/                                                                                                                                                            
index 26c182e582466f6518b5a25d5544ae5c74a873b4..5dddeceb0b39acc447b79043d8a5eaf89b838da9 100644 (file)
@@ -1,7 +1,7 @@
 {
  "metadata": {
   "name": "",
-  "signature": "sha256:9a1b8b45577dfcc1daaffba595f74655068367c9070816fe6eeb9345bb55729d"
+  "signature": "sha256:09f57b8dd5238a1d205d7560ca860f967686132ecc69f4def5f741d8f6d3fe5d"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -55,7 +55,7 @@
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 4
+     "prompt_number": 10
     },
     {
      "cell_type": "code",
@@ -71,7 +71,7 @@
        "output_type": "pyout",
        "prompt_number": 11,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 34
+     "prompt_number": 12
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 35,
+       "prompt_number": 13,
        "text": [
-        "['c', 'e', 'r']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 35
+     "prompt_number": 13
     },
     {
      "cell_type": "markdown",
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 8
+     "prompt_number": 14
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 12,
+       "prompt_number": 15,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 12
+     "prompt_number": 15
     },
     {
      "cell_type": "markdown",
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 64
+     "prompt_number": 16
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 65,
+       "prompt_number": 17,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 65
+     "prompt_number": 17
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "##Solution 5\n",
-      "A recursive approach. The idea is that a letter is a duplicate if it's either the first letter of the string and it occurs in the rest of the string, or it's a duplicate if it's duplicated in the rest of the string.\n",
+      "A recursive approach.\n",
+      "\n",
+      "The empty word contains no duplicates.\n",
+      "\n",
+      "Non-empty words might contain duplicates. \n",
       "\n",
-      "We keep track of the duplicates found so far to avoid double-counting letters that occur more than twice in the string."
+      "Split the word into the first letter and the rest. Find all the duplicates in the rest.\n",
+      "\n",
+      "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)"
      ]
     },
     {
      "collapsed": false,
      "input": [
       "def duplicates_via_recursion(word):\n",
+      "    if word:\n",
+      "        first = word[0]\n",
+      "        rest = word[1:]\n",
+      "        duplicates_in_rest = duplicates_via_recursion(rest)\n",
+      "        if first in rest and first not in duplicates_in_rest:\n",
+      "            duplicates_in_rest += [first]\n",
+      "        return duplicates_in_rest\n",
+      "    else:\n",
+      "        return []"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 18
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "duplicates_via_recursion('occurrences')"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 19,
+       "text": [
+        "['e', 'r', 'c']"
+       ]
+      }
+     ],
+     "prompt_number": 19
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "##Solution 6\n",
+      "A recursive approach, using an accumulator.\n",
+      "\n",
+      "The previous solution built up the set of duplicates as the recursive calls unwound. This approach uses an accumulator to capture the duplicates as the recursive calls are made. This is the trick for prompting semi-smart compilers to use tail-call optimisation. The algorithm is a simple reordering of the subtasks above.\n",
+      "\n",
+      "Split the word into the first letter and the rest. \n",
+      "\n",
+      "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)\n",
+      "\n",
+      "Find all the duplicates in the rest, keeping track of the duplicates found so far.\n",
+      "\n",
+      "When we run out of word, the duplicates found so far are all the duplicates, so return them."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def duplicates_via_recursion_with_accumulator(word):\n",
       "    return duplicates_via_recursion_helper(word, [])\n",
       "\n",
       "def duplicates_via_recursion_helper(word, duplicates_so_far):\n",
       "    if word:\n",
-      "        letter = word[0]\n",
-      "        if letter in word[1:] and letter not in duplicates_so_far:\n",
-      "            return duplicates_via_recursion_helper(word[1:], [letter] + duplicates_so_far)\n",
+      "        first = word[0]\n",
+      "        rest = word[1:]\n",
+      "        if first in rest and first not in duplicates_so_far:\n",
+      "            return duplicates_via_recursion_helper(rest, [first] + duplicates_so_far)\n",
       "        else:\n",
-      "            return duplicates_via_recursion_helper(word[1:], duplicates_so_far)\n",
+      "            return duplicates_via_recursion_helper(rest, duplicates_so_far)\n",
       "    else:\n",
       "        return duplicates_so_far"
      ],
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 25
+     "prompt_number": 20
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
-      "duplicates_via_recursion('occurrences')"
+      "duplicates_via_recursion_with_accumulator('occurrences')"
      ],
      "language": "python",
      "metadata": {},
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 26,
+       "prompt_number": 21,
        "text": [
         "['e', 'r', 'c']"
        ]
       }
      ],
-     "prompt_number": 26
+     "prompt_number": 21
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 6\n",
+      "##Solution 7\n",
       "Sort the word, then walk along it. A letter is duplicated if you find two consecutive letters the same. Store the results in a set, so that we don't have to worry about double-counting more than two letters."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 29
+     "prompt_number": 22
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 30,
+       "prompt_number": 23,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 30
+     "prompt_number": 23
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 7\n",
+      "##Solution 8\n",
       "As above, but this time we store the duplicates in a list and use an additional variable `found_duplicates` to keep the state of whether we've found duplicates of this letter.\n",
       "\n",
       "Initially `found_duplicates` if `False`. As soon as we find a second occurrence of the letter, set `found_duplicates` to `True` and record the letter as a duplicate. Then continue to walk along the list until we find a different letter, when we reset `found_duplicates` to `False` again."
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 55
+     "prompt_number": 24
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 56,
+       "prompt_number": 25,
        "text": [
         "['c', 'e', 'r']"
        ]
       }
      ],
-     "prompt_number": 56
+     "prompt_number": 25
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 8\n",
+      "##Solution 9\n",
       "Sort the string then split it into a list of lists, where each sublist is the group of identical letters that occur together. Then find the segments that have more than one letter."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 49
+     "prompt_number": 26
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 50,
+       "prompt_number": 27,
        "text": [
         "['c', 'e', 'r']"
        ]
       }
      ],
-     "prompt_number": 50
+     "prompt_number": 27
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 9\n",
+      "##Solution 10\n",
       "The same as above, but without having to sort the word first. This requires an additional loop, where we look for the bucket to place this letter. If we can't find the bucket, create a new one."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 43
+     "prompt_number": 28
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 44,
+       "prompt_number": 29,
        "text": [
         "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 44
+     "prompt_number": 29
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 10\n",
+      "##Solution 11\n",
       "Keep two lists: a list of pending duplicates, and a list of actual duplicates. The first time we see a letter, add it to the pending list. If we see the letter again, it's already in the pending list, so that prompts us to add it to the duplicates list."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 27
+     "prompt_number": 30
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 28,
+       "prompt_number": 31,
        "text": [
         "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 28
+     "prompt_number": 31
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 11\n",
+      "##Solution 12\n",
       "Exactly the same as above, but using sets instead of lists. This means we don't have to worry about adding more than one copy of the letter to either `pending` or `duplicates`."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 31
+     "prompt_number": 32
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 32,
+       "prompt_number": 33,
        "text": [
         "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 32
+     "prompt_number": 33
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 12\n",
+      "##Solution 13\n",
       "Creating a set removes all the duplicates. Make a set from the word, then remove each letter in the set from the word. Whatever letters are left are the duplicates. (Convert the duplicates to a set then to a list to eliminate letters that occur more than three times.)"
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 31
+     "prompt_number": 34
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 32,
+       "prompt_number": 35,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 32
+     "prompt_number": 35
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 13\n",
+      "##Solution 14\n",
       "Use the `Counter` in the standard `collections` library. This will count the occurrences of each letter, allowing us to simply find those with a count of more than one."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 33
+     "prompt_number": 36
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 34,
+       "prompt_number": 37,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 34
+     "prompt_number": 37
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 14\n",
-      "Similar to solution 12 above, but this time using Counters as multisets (or bags), where we can just do multiset subtraction to find the duplicates. `letters` will have a count of 1 for each letter."
+      "##Solution 15\n",
+      "Similar to solution 13 above, but this time using Counters as multisets (or bags), where we can just do multiset subtraction to find the duplicates. `letters` will have a count of 1 for each letter."
      ]
     },
     {
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 24
+     "prompt_number": 38
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 25,
+       "prompt_number": 39,
        "text": [
         "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 25
+     "prompt_number": 39
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 15\n",
+      "##Solution 16\n",
       "Using the `itertools.groupby` standard library function to find all the groups in the sorted word. We can then pull out the longer groups. \n",
       "\n",
       "The same approach as solution 8 above, but using library functions to simplify the implementation."
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 41
+     "prompt_number": 40
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 42,
+       "prompt_number": 41,
        "text": [
         "['c', 'e', 'r']"
        ]
       }
      ],
-     "prompt_number": 42
+     "prompt_number": 41
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 16\n",
+      "##Solution 17\n",
       "Using `functools.reduce` to merge the counts of letters in sublists of the word. Start by counting all the letters in the length one sublists, then combine them. Finally, find the letters with counts more than one."
      ]
     },
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 62
+     "prompt_number": 42
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 63,
+       "prompt_number": 43,
        "text": [
-        "['r', 'e', 'c']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 63
+     "prompt_number": 43
     },
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "##Solution 17\n",
+      "##Solution 18\n",
       "Similar to above, but refactored to more clearly show the separate `map` and `reduce` stages of the map-reduce strategy.\n",
       "\n",
       "Also uses the `update` method of `Counter`s to merge intermediate results."
      "language": "python",
      "metadata": {},
      "outputs": [],
-     "prompt_number": 36
+     "prompt_number": 44
     },
     {
      "cell_type": "code",
       {
        "metadata": {},
        "output_type": "pyout",
-       "prompt_number": 37,
+       "prompt_number": 45,
        "text": [
-        "['c', 'e', 'r']"
+        "['c', 'r', 'e']"
        ]
       }
      ],
-     "prompt_number": 37
+     "prompt_number": 45
     },
     {
      "cell_type": "code",
      "input": [],
      "language": "python",
      "metadata": {},
-     "outputs": []
+     "outputs": [],
+     "prompt_number": 45
     }
    ],
    "metadata": {}