First bit of the A-level miscellany
[cas-master-teacher-training.git] / duplicate-letters.ipynb
1 {
2 "metadata": {
3 "name": "",
4 "signature": "sha256:09f57b8dd5238a1d205d7560ca860f967686132ecc69f4def5f741d8f6d3fe5d"
5 },
6 "nbformat": 3,
7 "nbformat_minor": 0,
8 "worksheets": [
9 {
10 "cells": [
11 {
12 "cell_type": "markdown",
13 "metadata": {},
14 "source": [
15 "#The specification\n",
16 "Write a procedure that returns all duplicate characters from a given String.\n",
17 "\n",
18 "For example if String is \"Java\" then program should return [\"a\"].\n",
19 "\n",
20 "* \"Java\" -> ['a']\n",
21 "* \"String\" -> []\n",
22 "* \"Computer science\" -> [\"c\", \"e\"]\n",
23 "* \"happy\" -> ['p']\n",
24 "\n",
25 "Bonus points if your program is robust and handle different kinds of input e.g. String without duplicate, null or empty String etc. \n",
26 "\n",
27 "Bonus points if you also write unit tests for normal and edge cases."
28 ]
29 },
30 {
31 "cell_type": "markdown",
32 "metadata": {},
33 "source": [
34 "##Solution 1\n",
35 "Walk over the word, incrementing a count for each letter as you go (stored in a `dict`). Then, walk over the set of counts and record those with a count greater than one."
36 ]
37 },
38 {
39 "cell_type": "code",
40 "collapsed": false,
41 "input": [
42 "def duplicates_via_dict(word):\n",
43 " counts = {}\n",
44 " for letter in word:\n",
45 " if letter in counts:\n",
46 " counts[letter] += 1\n",
47 " else:\n",
48 " counts[letter] = 1\n",
49 " duplicates = []\n",
50 " for letter in counts:\n",
51 " if counts[letter] > 1:\n",
52 " duplicates += [letter]\n",
53 " return duplicates"
54 ],
55 "language": "python",
56 "metadata": {},
57 "outputs": [],
58 "prompt_number": 10
59 },
60 {
61 "cell_type": "code",
62 "collapsed": false,
63 "input": [
64 "duplicates_via_dict('occurrences')"
65 ],
66 "language": "python",
67 "metadata": {},
68 "outputs": [
69 {
70 "metadata": {},
71 "output_type": "pyout",
72 "prompt_number": 11,
73 "text": [
74 "['c', 'r', 'e']"
75 ]
76 }
77 ],
78 "prompt_number": 11
79 },
80 {
81 "cell_type": "markdown",
82 "metadata": {},
83 "source": [
84 "##Solution 2\n",
85 "As solution 1, but using a `defaultdict` to record the counts, but pretending that each count is zero if it hasn't been seen already. Note that this makes the logic simpler in the first loop."
86 ]
87 },
88 {
89 "cell_type": "code",
90 "collapsed": false,
91 "input": [
92 "import collections\n",
93 "\n",
94 "def duplicates_via_defaultdict(word):\n",
95 " counts = collections.defaultdict(int)\n",
96 " for letter in word:\n",
97 " counts[letter] += 1\n",
98 " duplicates = []\n",
99 " for letter in counts:\n",
100 " if counts[letter] > 1:\n",
101 " duplicates += [letter]\n",
102 " return duplicates"
103 ],
104 "language": "python",
105 "metadata": {},
106 "outputs": [],
107 "prompt_number": 12
108 },
109 {
110 "cell_type": "code",
111 "collapsed": false,
112 "input": [
113 "duplicates_via_defaultdict('occurrences')"
114 ],
115 "language": "python",
116 "metadata": {},
117 "outputs": [
118 {
119 "metadata": {},
120 "output_type": "pyout",
121 "prompt_number": 13,
122 "text": [
123 "['c', 'r', 'e']"
124 ]
125 }
126 ],
127 "prompt_number": 13
128 },
129 {
130 "cell_type": "markdown",
131 "metadata": {},
132 "source": [
133 "##Solution 3\n",
134 "Define a helper function to count all the occurrences of a particular letter in the word. \n",
135 "\n",
136 "Walk over the word and find how many times that letter occurs in the the word (stored in a `dict`). Then, walk over the set of counts and record those with a count greater than one.\n",
137 "\n",
138 "Note that the first loop doesn't check if a letter already exists in the dict. You could introduce a test to avoid duplicating calls to `count_letter`."
139 ]
140 },
141 {
142 "cell_type": "code",
143 "collapsed": false,
144 "input": [
145 "def duplicates_via_counts(word):\n",
146 " def count_letter(word, letter):\n",
147 " count = 0\n",
148 " for l in word:\n",
149 " if l == letter:\n",
150 " count += 1\n",
151 " return count\n",
152 " counts = {}\n",
153 " for letter in word:\n",
154 " counts[letter] = count_letter(word, letter)\n",
155 " duplicates = []\n",
156 " for letter in counts:\n",
157 " if counts[letter] > 1:\n",
158 " duplicates += [letter]\n",
159 " return duplicates"
160 ],
161 "language": "python",
162 "metadata": {},
163 "outputs": [],
164 "prompt_number": 14
165 },
166 {
167 "cell_type": "code",
168 "collapsed": false,
169 "input": [
170 "duplicates_via_counts('occurrences')"
171 ],
172 "language": "python",
173 "metadata": {},
174 "outputs": [
175 {
176 "metadata": {},
177 "output_type": "pyout",
178 "prompt_number": 15,
179 "text": [
180 "['c', 'r', 'e']"
181 ]
182 }
183 ],
184 "prompt_number": 15
185 },
186 {
187 "cell_type": "markdown",
188 "metadata": {},
189 "source": [
190 "##Solution 4\n",
191 "A direct approach using comprehensions. The first one is a nested comprehension that counts how many times each letter appears in the word. The second comprehension just returns the letters that are duplicated."
192 ]
193 },
194 {
195 "cell_type": "code",
196 "collapsed": false,
197 "input": [
198 "def duplicates_via_comprehension(word):\n",
199 " counts = {letter: len([l for l in word if l == letter]) for letter in word}\n",
200 " return [letter for letter in counts if counts[letter] > 1]"
201 ],
202 "language": "python",
203 "metadata": {},
204 "outputs": [],
205 "prompt_number": 16
206 },
207 {
208 "cell_type": "code",
209 "collapsed": false,
210 "input": [
211 "duplicates_via_comprehension('occurrences')"
212 ],
213 "language": "python",
214 "metadata": {},
215 "outputs": [
216 {
217 "metadata": {},
218 "output_type": "pyout",
219 "prompt_number": 17,
220 "text": [
221 "['c', 'r', 'e']"
222 ]
223 }
224 ],
225 "prompt_number": 17
226 },
227 {
228 "cell_type": "markdown",
229 "metadata": {},
230 "source": [
231 "##Solution 5\n",
232 "A recursive approach.\n",
233 "\n",
234 "The empty word contains no duplicates.\n",
235 "\n",
236 "Non-empty words might contain duplicates. \n",
237 "\n",
238 "Split the word into the first letter and the rest. Find all the duplicates in the rest.\n",
239 "\n",
240 "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)"
241 ]
242 },
243 {
244 "cell_type": "code",
245 "collapsed": false,
246 "input": [
247 "def duplicates_via_recursion(word):\n",
248 " if word:\n",
249 " first = word[0]\n",
250 " rest = word[1:]\n",
251 " duplicates_in_rest = duplicates_via_recursion(rest)\n",
252 " if first in rest and first not in duplicates_in_rest:\n",
253 " duplicates_in_rest += [first]\n",
254 " return duplicates_in_rest\n",
255 " else:\n",
256 " return []"
257 ],
258 "language": "python",
259 "metadata": {},
260 "outputs": [],
261 "prompt_number": 18
262 },
263 {
264 "cell_type": "code",
265 "collapsed": false,
266 "input": [
267 "duplicates_via_recursion('occurrences')"
268 ],
269 "language": "python",
270 "metadata": {},
271 "outputs": [
272 {
273 "metadata": {},
274 "output_type": "pyout",
275 "prompt_number": 19,
276 "text": [
277 "['e', 'r', 'c']"
278 ]
279 }
280 ],
281 "prompt_number": 19
282 },
283 {
284 "cell_type": "markdown",
285 "metadata": {},
286 "source": [
287 "##Solution 6\n",
288 "A recursive approach, using an accumulator.\n",
289 "\n",
290 "The previous solution built up the set of duplicates as the recursive calls unwound. This approach uses an accumulator to capture the duplicates as the recursive calls are made. This is the trick for prompting semi-smart compilers to use tail-call optimisation. The algorithm is a simple reordering of the subtasks above.\n",
291 "\n",
292 "Split the word into the first letter and the rest. \n",
293 "\n",
294 "If the first letter appears in the rest of the word, it's a duplicate, so add it to list of duplicates. (But don't add it if we've already decided it's a duplicate.)\n",
295 "\n",
296 "Find all the duplicates in the rest, keeping track of the duplicates found so far.\n",
297 "\n",
298 "When we run out of word, the duplicates found so far are all the duplicates, so return them."
299 ]
300 },
301 {
302 "cell_type": "code",
303 "collapsed": false,
304 "input": [
305 "def duplicates_via_recursion_with_accumulator(word):\n",
306 " return duplicates_via_recursion_helper(word, [])\n",
307 "\n",
308 "def duplicates_via_recursion_helper(word, duplicates_so_far):\n",
309 " if word:\n",
310 " first = word[0]\n",
311 " rest = word[1:]\n",
312 " if first in rest and first not in duplicates_so_far:\n",
313 " return duplicates_via_recursion_helper(rest, [first] + duplicates_so_far)\n",
314 " else:\n",
315 " return duplicates_via_recursion_helper(rest, duplicates_so_far)\n",
316 " else:\n",
317 " return duplicates_so_far"
318 ],
319 "language": "python",
320 "metadata": {},
321 "outputs": [],
322 "prompt_number": 20
323 },
324 {
325 "cell_type": "code",
326 "collapsed": false,
327 "input": [
328 "duplicates_via_recursion_with_accumulator('occurrences')"
329 ],
330 "language": "python",
331 "metadata": {},
332 "outputs": [
333 {
334 "metadata": {},
335 "output_type": "pyout",
336 "prompt_number": 21,
337 "text": [
338 "['e', 'r', 'c']"
339 ]
340 }
341 ],
342 "prompt_number": 21
343 },
344 {
345 "cell_type": "markdown",
346 "metadata": {},
347 "source": [
348 "##Solution 7\n",
349 "Sort the word, then walk along it. A letter is duplicated if you find two consecutive letters the same. Store the results in a set, so that we don't have to worry about double-counting more than two letters."
350 ]
351 },
352 {
353 "cell_type": "code",
354 "collapsed": false,
355 "input": [
356 "def duplicates_via_sorting(word):\n",
357 " duplicates = set()\n",
358 " sorted_word = sorted(word)\n",
359 " for i in range(len(sorted_word) - 1):\n",
360 " if sorted_word[i] == sorted_word[i+1]:\n",
361 " duplicates.add(sorted_word[i])\n",
362 " return list(duplicates)"
363 ],
364 "language": "python",
365 "metadata": {},
366 "outputs": [],
367 "prompt_number": 22
368 },
369 {
370 "cell_type": "code",
371 "collapsed": false,
372 "input": [
373 "duplicates_via_sorting('occurrences')"
374 ],
375 "language": "python",
376 "metadata": {},
377 "outputs": [
378 {
379 "metadata": {},
380 "output_type": "pyout",
381 "prompt_number": 23,
382 "text": [
383 "['c', 'r', 'e']"
384 ]
385 }
386 ],
387 "prompt_number": 23
388 },
389 {
390 "cell_type": "markdown",
391 "metadata": {},
392 "source": [
393 "##Solution 8\n",
394 "As above, but this time we store the duplicates in a list and use an additional variable `found_duplicates` to keep the state of whether we've found duplicates of this letter.\n",
395 "\n",
396 "Initially `found_duplicates` if `False`. As soon as we find a second occurrence of the letter, set `found_duplicates` to `True` and record the letter as a duplicate. Then continue to walk along the list until we find a different letter, when we reset `found_duplicates` to `False` again."
397 ]
398 },
399 {
400 "cell_type": "code",
401 "collapsed": false,
402 "input": [
403 "def duplicates_via_runs(word):\n",
404 " duplicates = []\n",
405 " sorted_word = sorted(word)\n",
406 " i = 0\n",
407 " found_duplicates = False\n",
408 " for i in range(len(sorted_word) - 1):\n",
409 " if sorted_word[i] == sorted_word[i+1]:\n",
410 " if not found_duplicates:\n",
411 " duplicates += [sorted_word[i]]\n",
412 " found_duplicates = True\n",
413 " else:\n",
414 " found_duplicates = False\n",
415 " return duplicates"
416 ],
417 "language": "python",
418 "metadata": {},
419 "outputs": [],
420 "prompt_number": 24
421 },
422 {
423 "cell_type": "code",
424 "collapsed": false,
425 "input": [
426 "duplicates_via_runs('occurrences')"
427 ],
428 "language": "python",
429 "metadata": {},
430 "outputs": [
431 {
432 "metadata": {},
433 "output_type": "pyout",
434 "prompt_number": 25,
435 "text": [
436 "['c', 'e', 'r']"
437 ]
438 }
439 ],
440 "prompt_number": 25
441 },
442 {
443 "cell_type": "markdown",
444 "metadata": {},
445 "source": [
446 "##Solution 9\n",
447 "Sort the string then split it into a list of lists, where each sublist is the group of identical letters that occur together. Then find the segments that have more than one letter."
448 ]
449 },
450 {
451 "cell_type": "code",
452 "collapsed": false,
453 "input": [
454 "def duplicates_via_segments(word):\n",
455 " segments = []\n",
456 " sorted_word = sorted(word)\n",
457 " segment_start = 0\n",
458 " for i in range(len(sorted_word) - 1):\n",
459 " if sorted_word[i] != sorted_word[i+1]:\n",
460 " segments += [sorted_word[segment_start:i+1]]\n",
461 " segment_start = i + 1\n",
462 " return [segment[0] for segment in segments if len(segment) > 1]"
463 ],
464 "language": "python",
465 "metadata": {},
466 "outputs": [],
467 "prompt_number": 26
468 },
469 {
470 "cell_type": "code",
471 "collapsed": false,
472 "input": [
473 "duplicates_via_segments('occurrences')"
474 ],
475 "language": "python",
476 "metadata": {},
477 "outputs": [
478 {
479 "metadata": {},
480 "output_type": "pyout",
481 "prompt_number": 27,
482 "text": [
483 "['c', 'e', 'r']"
484 ]
485 }
486 ],
487 "prompt_number": 27
488 },
489 {
490 "cell_type": "markdown",
491 "metadata": {},
492 "source": [
493 "##Solution 10\n",
494 "The same as above, but without having to sort the word first. This requires an additional loop, where we look for the bucket to place this letter. If we can't find the bucket, create a new one."
495 ]
496 },
497 {
498 "cell_type": "code",
499 "collapsed": false,
500 "input": [
501 "def duplicates_via_buckets(word):\n",
502 " buckets = []\n",
503 " for letter in word:\n",
504 " found_bucket = False\n",
505 " for bucket in buckets:\n",
506 " if bucket[0] == letter:\n",
507 " bucket += [letter]\n",
508 " found_bucket = True\n",
509 " if not found_bucket:\n",
510 " buckets += [[letter]]\n",
511 " duplicates = []\n",
512 " for bucket in buckets:\n",
513 " if len(bucket) > 1:\n",
514 " duplicates += bucket[0]\n",
515 " return duplicates"
516 ],
517 "language": "python",
518 "metadata": {},
519 "outputs": [],
520 "prompt_number": 28
521 },
522 {
523 "cell_type": "code",
524 "collapsed": false,
525 "input": [
526 "duplicates_via_buckets('occurrences')"
527 ],
528 "language": "python",
529 "metadata": {},
530 "outputs": [
531 {
532 "metadata": {},
533 "output_type": "pyout",
534 "prompt_number": 29,
535 "text": [
536 "['c', 'r', 'e']"
537 ]
538 }
539 ],
540 "prompt_number": 29
541 },
542 {
543 "cell_type": "markdown",
544 "metadata": {},
545 "source": [
546 "##Solution 11\n",
547 "Keep two lists: a list of pending duplicates, and a list of actual duplicates. The first time we see a letter, add it to the pending list. If we see the letter again, it's already in the pending list, so that prompts us to add it to the duplicates list."
548 ]
549 },
550 {
551 "cell_type": "code",
552 "collapsed": false,
553 "input": [
554 "def duplicates_via_pending_list(word):\n",
555 " pending = []\n",
556 " duplicates = []\n",
557 " for letter in word:\n",
558 " if letter in pending:\n",
559 " if letter not in duplicates:\n",
560 " duplicates.append(letter)\n",
561 " else:\n",
562 " pending.append(letter)\n",
563 " return duplicates"
564 ],
565 "language": "python",
566 "metadata": {},
567 "outputs": [],
568 "prompt_number": 30
569 },
570 {
571 "cell_type": "code",
572 "collapsed": false,
573 "input": [
574 "duplicates_via_pending_list('occurrences')"
575 ],
576 "language": "python",
577 "metadata": {},
578 "outputs": [
579 {
580 "metadata": {},
581 "output_type": "pyout",
582 "prompt_number": 31,
583 "text": [
584 "['c', 'r', 'e']"
585 ]
586 }
587 ],
588 "prompt_number": 31
589 },
590 {
591 "cell_type": "markdown",
592 "metadata": {},
593 "source": [
594 "##Solution 12\n",
595 "Exactly the same as above, but using sets instead of lists. This means we don't have to worry about adding more than one copy of the letter to either `pending` or `duplicates`."
596 ]
597 },
598 {
599 "cell_type": "code",
600 "collapsed": false,
601 "input": [
602 "def duplicates_via_pending_set(word):\n",
603 " pending = set()\n",
604 " duplicates = set()\n",
605 " for letter in word:\n",
606 " if letter in pending:\n",
607 " duplicates.add(letter)\n",
608 " pending.add(letter)\n",
609 " return list(duplicates)"
610 ],
611 "language": "python",
612 "metadata": {},
613 "outputs": [],
614 "prompt_number": 32
615 },
616 {
617 "cell_type": "code",
618 "collapsed": false,
619 "input": [
620 "duplicates_via_pending_set('occurrences')"
621 ],
622 "language": "python",
623 "metadata": {},
624 "outputs": [
625 {
626 "metadata": {},
627 "output_type": "pyout",
628 "prompt_number": 33,
629 "text": [
630 "['c', 'r', 'e']"
631 ]
632 }
633 ],
634 "prompt_number": 33
635 },
636 {
637 "cell_type": "markdown",
638 "metadata": {},
639 "source": [
640 "##Solution 13\n",
641 "Creating a set removes all the duplicates. Make a set from the word, then remove each letter in the set from the word. Whatever letters are left are the duplicates. (Convert the duplicates to a set then to a list to eliminate letters that occur more than three times.)"
642 ]
643 },
644 {
645 "cell_type": "code",
646 "collapsed": false,
647 "input": [
648 "def duplicates_via_set(word):\n",
649 " letters = set(word)\n",
650 " duplicates = list(word)\n",
651 " for letter in letters:\n",
652 " duplicates.remove(letter)\n",
653 " return list(set(duplicates))"
654 ],
655 "language": "python",
656 "metadata": {},
657 "outputs": [],
658 "prompt_number": 34
659 },
660 {
661 "cell_type": "code",
662 "collapsed": false,
663 "input": [
664 "duplicates_via_set('occurrences')"
665 ],
666 "language": "python",
667 "metadata": {},
668 "outputs": [
669 {
670 "metadata": {},
671 "output_type": "pyout",
672 "prompt_number": 35,
673 "text": [
674 "['c', 'r', 'e']"
675 ]
676 }
677 ],
678 "prompt_number": 35
679 },
680 {
681 "cell_type": "markdown",
682 "metadata": {},
683 "source": [
684 "##Solution 14\n",
685 "Use the `Counter` in the standard `collections` library. This will count the occurrences of each letter, allowing us to simply find those with a count of more than one."
686 ]
687 },
688 {
689 "cell_type": "code",
690 "collapsed": false,
691 "input": [
692 "import collections\n",
693 "\n",
694 "def duplicates_via_counter(word):\n",
695 " counts = collections.Counter(word)\n",
696 " return [letter for letter, count in counts.items() \n",
697 " if count > 1]"
698 ],
699 "language": "python",
700 "metadata": {},
701 "outputs": [],
702 "prompt_number": 36
703 },
704 {
705 "cell_type": "code",
706 "collapsed": false,
707 "input": [
708 "duplicates_via_counter('occurrences')"
709 ],
710 "language": "python",
711 "metadata": {},
712 "outputs": [
713 {
714 "metadata": {},
715 "output_type": "pyout",
716 "prompt_number": 37,
717 "text": [
718 "['c', 'r', 'e']"
719 ]
720 }
721 ],
722 "prompt_number": 37
723 },
724 {
725 "cell_type": "markdown",
726 "metadata": {},
727 "source": [
728 "##Solution 15\n",
729 "Similar to solution 13 above, but this time using Counters as multisets (or bags), where we can just do multiset subtraction to find the duplicates. `letters` will have a count of 1 for each letter."
730 ]
731 },
732 {
733 "cell_type": "code",
734 "collapsed": false,
735 "input": [
736 "import collections\n",
737 "\n",
738 "def duplicates_via_counter_and_set(word):\n",
739 " letters = collections.Counter(set(word))\n",
740 " counts = collections.Counter(word)\n",
741 " duplicates = counts - letters\n",
742 " return [letter for letter in duplicates]"
743 ],
744 "language": "python",
745 "metadata": {},
746 "outputs": [],
747 "prompt_number": 38
748 },
749 {
750 "cell_type": "code",
751 "collapsed": false,
752 "input": [
753 "duplicates_via_counter_and_set('occurrences')"
754 ],
755 "language": "python",
756 "metadata": {},
757 "outputs": [
758 {
759 "metadata": {},
760 "output_type": "pyout",
761 "prompt_number": 39,
762 "text": [
763 "['c', 'r', 'e']"
764 ]
765 }
766 ],
767 "prompt_number": 39
768 },
769 {
770 "cell_type": "markdown",
771 "metadata": {},
772 "source": [
773 "##Solution 16\n",
774 "Using the `itertools.groupby` standard library function to find all the groups in the sorted word. We can then pull out the longer groups. \n",
775 "\n",
776 "The same approach as solution 8 above, but using library functions to simplify the implementation."
777 ]
778 },
779 {
780 "cell_type": "code",
781 "collapsed": false,
782 "input": [
783 "import itertools\n",
784 "\n",
785 "def duplicates_via_groupby(word):\n",
786 " return [k for k, g in itertools.groupby(sorted(word)) \n",
787 " if len(list(g)) > 1]"
788 ],
789 "language": "python",
790 "metadata": {},
791 "outputs": [],
792 "prompt_number": 40
793 },
794 {
795 "cell_type": "code",
796 "collapsed": false,
797 "input": [
798 "duplicates_via_groupby('occurrences')"
799 ],
800 "language": "python",
801 "metadata": {},
802 "outputs": [
803 {
804 "metadata": {},
805 "output_type": "pyout",
806 "prompt_number": 41,
807 "text": [
808 "['c', 'e', 'r']"
809 ]
810 }
811 ],
812 "prompt_number": 41
813 },
814 {
815 "cell_type": "markdown",
816 "metadata": {},
817 "source": [
818 "##Solution 17\n",
819 "Using `functools.reduce` to merge the counts of letters in sublists of the word. Start by counting all the letters in the length one sublists, then combine them. Finally, find the letters with counts more than one."
820 ]
821 },
822 {
823 "cell_type": "code",
824 "collapsed": false,
825 "input": [
826 "import functools\n",
827 "\n",
828 "def duplicates_via_reduce(word):\n",
829 " def merge_counts(counts1, counts2):\n",
830 " counts = {l: c for l, c in counts1.items()}\n",
831 " for l, c in counts2.items():\n",
832 " if l in counts:\n",
833 " counts[l] += c\n",
834 " else:\n",
835 " counts[l] = c\n",
836 " return counts\n",
837 " \n",
838 " counts = [{letter: 1} for letter in word]\n",
839 " return [letter for letter, count in functools.reduce(merge_counts, counts).items() if count > 1]"
840 ],
841 "language": "python",
842 "metadata": {},
843 "outputs": [],
844 "prompt_number": 42
845 },
846 {
847 "cell_type": "code",
848 "collapsed": false,
849 "input": [
850 "duplicates_via_reduce('occurrences')"
851 ],
852 "language": "python",
853 "metadata": {},
854 "outputs": [
855 {
856 "metadata": {},
857 "output_type": "pyout",
858 "prompt_number": 43,
859 "text": [
860 "['c', 'r', 'e']"
861 ]
862 }
863 ],
864 "prompt_number": 43
865 },
866 {
867 "cell_type": "markdown",
868 "metadata": {},
869 "source": [
870 "##Solution 18\n",
871 "Similar to above, but refactored to more clearly show the separate `map` and `reduce` stages of the map-reduce strategy.\n",
872 "\n",
873 "Also uses the `update` method of `Counter`s to merge intermediate results."
874 ]
875 },
876 {
877 "cell_type": "code",
878 "collapsed": false,
879 "input": [
880 "import functools\n",
881 "import collections\n",
882 "\n",
883 "def duplicates_via_map_reduce_update(word):\n",
884 " def mapper(letter):\n",
885 " return collections.Counter(letter)\n",
886 " \n",
887 " def reducer(left, right):\n",
888 " left.update(right.elements())\n",
889 " return left\n",
890 " \n",
891 " letters = map(mapper, word)\n",
892 " counts = functools.reduce(reducer, letters)\n",
893 " return [letter for letter, count in counts.items() if count > 1]"
894 ],
895 "language": "python",
896 "metadata": {},
897 "outputs": [],
898 "prompt_number": 44
899 },
900 {
901 "cell_type": "code",
902 "collapsed": false,
903 "input": [
904 "duplicates_via_map_reduce_update('occurrences')"
905 ],
906 "language": "python",
907 "metadata": {},
908 "outputs": [
909 {
910 "metadata": {},
911 "output_type": "pyout",
912 "prompt_number": 45,
913 "text": [
914 "['c', 'r', 'e']"
915 ]
916 }
917 ],
918 "prompt_number": 45
919 },
920 {
921 "cell_type": "code",
922 "collapsed": false,
923 "input": [],
924 "language": "python",
925 "metadata": {},
926 "outputs": [],
927 "prompt_number": 45
928 }
929 ],
930 "metadata": {}
931 }
932 ]
933 }