Works with letters, added trimmed Lovecraft
[name-generation.git] / markov / markov.ipynb
1 {
2 "cells": [
3 {
4 "cell_type": "code",
5 "execution_count": 87,
6 "metadata": {
7 "collapsed": true
8 },
9 "outputs": [],
10 "source": [
11 "import re\n",
12 "import string\n",
13 "import collections\n",
14 "import unicodedata\n",
15 "import random\n",
16 "from IPython.display import display, HTML"
17 ]
18 },
19 {
20 "cell_type": "code",
21 "execution_count": 2,
22 "metadata": {
23 "collapsed": true
24 },
25 "outputs": [],
26 "source": [
27 "sample_text = \"\"\"Continuing this process, we obtain better and better approximations to the square root.\n",
28 "Now let's formalize the process in terms of procedures. We start with a value for the radicand (the\n",
29 "number whose square root we are trying to compute) and a value for the guess. If the guess is good\n",
30 "enough for our purposes, we are done; if not, we must repeat the process with an improved guess. We\n",
31 "write this basic strategy as a procedure:\n",
32 "(define (sqrt-iter guess x)\n",
33 "(if (good-enough? guess x)\n",
34 "guess\n",
35 "(sqrt-iter (improve guess x)\n",
36 "x)))\n",
37 "A guess is improved by averaging it with the quotient of the radicand and the old guess:\n",
38 "(define (improve guess x)\n",
39 "(average guess (/ x guess)))\n",
40 "where\n",
41 "\n",
42 "\f",
43 "(define (average x y)\n",
44 "(/ (+ x y) 2))\n",
45 "We also have to say what we mean by ''good enough.'' The following will do for illustration, but it is\n",
46 "not really a very good test. (See exercise 1.7.) The idea is to improve the answer until it is close\n",
47 "enough so that its square differs from the radicand by less than a predetermined tolerance (here\n",
48 "0.001): 22\n",
49 "(define (good-enough? guess x)\n",
50 "(< (abs (- (square guess) x)) 0.001))\n",
51 "Finally, we need a way to get started. For instance, we can always guess that the square root of any\n",
52 "number is 1: 23\n",
53 "(define (sqrt x)\n",
54 "(sqrt-iter 1.0 x))\n",
55 "If we type these definitions to the interpreter, we can use sqrt just as we can use any procedure:\"\"\"\n",
56 "\n",
57 "small_text = '''A guess is improved by averaging it with the quotient of the radicand and the old guess:\n",
58 "(define (improve guess x)\n",
59 "((average guess (/ x guess)))'''\n",
60 "\n",
61 "sentence_boundary = 'in terms of procedures. We start with 0.123 and some'\n",
62 "\n",
63 "double_quotes = \"let's see how such a ''circular'' definition\""
64 ]
65 },
66 {
67 "cell_type": "code",
68 "execution_count": 78,
69 "metadata": {
70 "collapsed": true
71 },
72 "outputs": [],
73 "source": [
74 "cat = ''.join"
75 ]
76 },
77 {
78 "cell_type": "code",
79 "execution_count": 3,
80 "metadata": {
81 "collapsed": false
82 },
83 "outputs": [
84 {
85 "data": {
86 "text/plain": [
87 "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
88 ]
89 },
90 "execution_count": 3,
91 "metadata": {},
92 "output_type": "execute_result"
93 }
94 ],
95 "source": [
96 "string.punctuation"
97 ]
98 },
99 {
100 "cell_type": "code",
101 "execution_count": 4,
102 "metadata": {
103 "collapsed": false
104 },
105 "outputs": [
106 {
107 "data": {
108 "text/plain": [
109 "'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
110 ]
111 },
112 "execution_count": 4,
113 "metadata": {},
114 "output_type": "execute_result"
115 }
116 ],
117 "source": [
118 "string.ascii_letters + string.digits + string.punctuation"
119 ]
120 },
121 {
122 "cell_type": "code",
123 "execution_count": 5,
124 "metadata": {
125 "collapsed": false
126 },
127 "outputs": [
128 {
129 "data": {
130 "text/plain": [
131 "re.compile(r'[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+',\n",
132 "re.UNICODE)"
133 ]
134 },
135 "execution_count": 5,
136 "metadata": {},
137 "output_type": "execute_result"
138 }
139 ],
140 "source": [
141 "token_pattern = re.compile(r'[^{}]+'.format(re.escape(string.ascii_letters + string.digits + string.punctuation)))\n",
142 "token_pattern"
143 ]
144 },
145 {
146 "cell_type": "code",
147 "execution_count": 6,
148 "metadata": {
149 "collapsed": false
150 },
151 "outputs": [
152 {
153 "data": {
154 "text/plain": [
155 "re.compile(r'(\\d+\\.\\d+|\\w+\\'\\w+|[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+(?=\\w)|(?<=\\w)[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+|[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+$)',\n",
156 "re.UNICODE)"
157 ]
158 },
159 "execution_count": 6,
160 "metadata": {},
161 "output_type": "execute_result"
162 }
163 ],
164 "source": [
165 "punctuation_pattern = re.compile(r'(\\d+\\.\\d+|\\w+\\'\\w+|[{0}]+(?=\\w)|(?<=\\w)[{0}]+|[{0}]+$)'.format(re.escape(string.punctuation)))\n",
166 "punctuation_pattern"
167 ]
168 },
169 {
170 "cell_type": "code",
171 "execution_count": 7,
172 "metadata": {
173 "collapsed": false
174 },
175 "outputs": [
176 {
177 "data": {
178 "text/plain": [
179 "['', '((', 'average']"
180 ]
181 },
182 "execution_count": 7,
183 "metadata": {},
184 "output_type": "execute_result"
185 }
186 ],
187 "source": [
188 "re.split(r'(\\(+(?=\\w))', '((average')"
189 ]
190 },
191 {
192 "cell_type": "code",
193 "execution_count": 8,
194 "metadata": {
195 "collapsed": false,
196 "scrolled": true
197 },
198 "outputs": [
199 {
200 "data": {
201 "text/plain": [
202 "['A',\n",
203 " 'guess',\n",
204 " 'is',\n",
205 " 'improved',\n",
206 " 'by',\n",
207 " 'averaging',\n",
208 " 'it',\n",
209 " 'with',\n",
210 " 'the',\n",
211 " 'quotient',\n",
212 " 'of',\n",
213 " 'the',\n",
214 " 'radicand',\n",
215 " 'and',\n",
216 " 'the',\n",
217 " 'old',\n",
218 " 'guess',\n",
219 " ':',\n",
220 " '(',\n",
221 " 'define',\n",
222 " '(',\n",
223 " 'improve',\n",
224 " 'guess',\n",
225 " 'x',\n",
226 " ')',\n",
227 " '((',\n",
228 " 'average',\n",
229 " 'guess',\n",
230 " '(/',\n",
231 " 'x',\n",
232 " 'guess',\n",
233 " ')))']"
234 ]
235 },
236 "execution_count": 8,
237 "metadata": {},
238 "output_type": "execute_result"
239 }
240 ],
241 "source": [
242 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, small_text)]\n",
243 " for ch in gp if ch]"
244 ]
245 },
246 {
247 "cell_type": "code",
248 "execution_count": 9,
249 "metadata": {
250 "collapsed": false
251 },
252 "outputs": [
253 {
254 "data": {
255 "text/plain": [
256 "['in',\n",
257 " 'terms',\n",
258 " 'of',\n",
259 " 'procedures',\n",
260 " '.',\n",
261 " 'We',\n",
262 " 'start',\n",
263 " 'with',\n",
264 " '0.123',\n",
265 " 'and',\n",
266 " 'some']"
267 ]
268 },
269 "execution_count": 9,
270 "metadata": {},
271 "output_type": "execute_result"
272 }
273 ],
274 "source": [
275 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, sentence_boundary)]\n",
276 " for ch in gp if ch]"
277 ]
278 },
279 {
280 "cell_type": "code",
281 "execution_count": 10,
282 "metadata": {
283 "collapsed": false
284 },
285 "outputs": [
286 {
287 "data": {
288 "text/plain": [
289 "[\"let's\", 'see', 'how', 'such', 'a', \"''\", 'circular', \"''\", 'definition']"
290 ]
291 },
292 "execution_count": 10,
293 "metadata": {},
294 "output_type": "execute_result"
295 }
296 ],
297 "source": [
298 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, double_quotes)]\n",
299 " for ch in gp if ch]"
300 ]
301 },
302 {
303 "cell_type": "code",
304 "execution_count": 11,
305 "metadata": {
306 "collapsed": false,
307 "scrolled": true
308 },
309 "outputs": [
310 {
311 "data": {
312 "text/plain": [
313 "['Continuing',\n",
314 " 'this',\n",
315 " 'process',\n",
316 " ',',\n",
317 " 'we',\n",
318 " 'obtain',\n",
319 " 'better',\n",
320 " 'and',\n",
321 " 'better',\n",
322 " 'approximations',\n",
323 " 'to',\n",
324 " 'the',\n",
325 " 'square',\n",
326 " 'root',\n",
327 " '.',\n",
328 " 'Now',\n",
329 " \"let's\",\n",
330 " 'formalize',\n",
331 " 'the',\n",
332 " 'process',\n",
333 " 'in',\n",
334 " 'terms',\n",
335 " 'of',\n",
336 " 'procedures',\n",
337 " '.',\n",
338 " 'We',\n",
339 " 'start',\n",
340 " 'with',\n",
341 " 'a',\n",
342 " 'value',\n",
343 " 'for',\n",
344 " 'the',\n",
345 " 'radicand',\n",
346 " '(',\n",
347 " 'the',\n",
348 " 'number',\n",
349 " 'whose',\n",
350 " 'square',\n",
351 " 'root',\n",
352 " 'we',\n",
353 " 'are',\n",
354 " 'trying',\n",
355 " 'to',\n",
356 " 'compute',\n",
357 " ')',\n",
358 " 'and',\n",
359 " 'a',\n",
360 " 'value',\n",
361 " 'for',\n",
362 " 'the',\n",
363 " 'guess',\n",
364 " '.',\n",
365 " 'If',\n",
366 " 'the',\n",
367 " 'guess',\n",
368 " 'is',\n",
369 " 'good',\n",
370 " 'enough',\n",
371 " 'for',\n",
372 " 'our',\n",
373 " 'purposes',\n",
374 " ',',\n",
375 " 'we',\n",
376 " 'are',\n",
377 " 'done',\n",
378 " ';',\n",
379 " 'if',\n",
380 " 'not',\n",
381 " ',',\n",
382 " 'we',\n",
383 " 'must',\n",
384 " 'repeat',\n",
385 " 'the',\n",
386 " 'process',\n",
387 " 'with',\n",
388 " 'an',\n",
389 " 'improved',\n",
390 " 'guess',\n",
391 " '.',\n",
392 " 'We',\n",
393 " 'write',\n",
394 " 'this',\n",
395 " 'basic',\n",
396 " 'strategy',\n",
397 " 'as',\n",
398 " 'a',\n",
399 " 'procedure',\n",
400 " ':',\n",
401 " '(',\n",
402 " 'define',\n",
403 " '(',\n",
404 " 'sqrt',\n",
405 " '-',\n",
406 " 'iter',\n",
407 " 'guess',\n",
408 " 'x',\n",
409 " ')',\n",
410 " '(',\n",
411 " 'if',\n",
412 " '(',\n",
413 " 'good',\n",
414 " '-',\n",
415 " 'enough',\n",
416 " '?',\n",
417 " 'guess',\n",
418 " 'x',\n",
419 " ')',\n",
420 " 'guess',\n",
421 " '(',\n",
422 " 'sqrt',\n",
423 " '-',\n",
424 " 'iter',\n",
425 " '(',\n",
426 " 'improve',\n",
427 " 'guess',\n",
428 " 'x',\n",
429 " ')',\n",
430 " 'x',\n",
431 " ')))',\n",
432 " 'A',\n",
433 " 'guess',\n",
434 " 'is',\n",
435 " 'improved',\n",
436 " 'by',\n",
437 " 'averaging',\n",
438 " 'it',\n",
439 " 'with',\n",
440 " 'the',\n",
441 " 'quotient',\n",
442 " 'of',\n",
443 " 'the',\n",
444 " 'radicand',\n",
445 " 'and',\n",
446 " 'the',\n",
447 " 'old',\n",
448 " 'guess',\n",
449 " ':',\n",
450 " '(',\n",
451 " 'define',\n",
452 " '(',\n",
453 " 'improve',\n",
454 " 'guess',\n",
455 " 'x',\n",
456 " ')',\n",
457 " '(',\n",
458 " 'average',\n",
459 " 'guess',\n",
460 " '(/',\n",
461 " 'x',\n",
462 " 'guess',\n",
463 " ')))',\n",
464 " 'where',\n",
465 " '(',\n",
466 " 'define',\n",
467 " '(',\n",
468 " 'average',\n",
469 " 'x',\n",
470 " 'y',\n",
471 " ')',\n",
472 " '(/',\n",
473 " '(+',\n",
474 " 'x',\n",
475 " 'y',\n",
476 " ')',\n",
477 " '2',\n",
478 " '))',\n",
479 " 'We',\n",
480 " 'also',\n",
481 " 'have',\n",
482 " 'to',\n",
483 " 'say',\n",
484 " 'what',\n",
485 " 'we',\n",
486 " 'mean',\n",
487 " 'by',\n",
488 " \"''\",\n",
489 " 'good',\n",
490 " 'enough',\n",
491 " \".''\",\n",
492 " 'The',\n",
493 " 'following',\n",
494 " 'will',\n",
495 " 'do',\n",
496 " 'for',\n",
497 " 'illustration',\n",
498 " ',',\n",
499 " 'but',\n",
500 " 'it',\n",
501 " 'is',\n",
502 " 'not',\n",
503 " 'really',\n",
504 " 'a',\n",
505 " 'very',\n",
506 " 'good',\n",
507 " 'test',\n",
508 " '.',\n",
509 " '(',\n",
510 " 'See',\n",
511 " 'exercise',\n",
512 " '1.7',\n",
513 " '.)',\n",
514 " 'The',\n",
515 " 'idea',\n",
516 " 'is',\n",
517 " 'to',\n",
518 " 'improve',\n",
519 " 'the',\n",
520 " 'answer',\n",
521 " 'until',\n",
522 " 'it',\n",
523 " 'is',\n",
524 " 'close',\n",
525 " 'enough',\n",
526 " 'so',\n",
527 " 'that',\n",
528 " 'its',\n",
529 " 'square',\n",
530 " 'differs',\n",
531 " 'from',\n",
532 " 'the',\n",
533 " 'radicand',\n",
534 " 'by',\n",
535 " 'less',\n",
536 " 'than',\n",
537 " 'a',\n",
538 " 'predetermined',\n",
539 " 'tolerance',\n",
540 " '(',\n",
541 " 'here',\n",
542 " '0.001',\n",
543 " '):',\n",
544 " '22',\n",
545 " '(',\n",
546 " 'define',\n",
547 " '(',\n",
548 " 'good',\n",
549 " '-',\n",
550 " 'enough',\n",
551 " '?',\n",
552 " 'guess',\n",
553 " 'x',\n",
554 " ')',\n",
555 " '(<',\n",
556 " '(',\n",
557 " 'abs',\n",
558 " '(-',\n",
559 " '(',\n",
560 " 'square',\n",
561 " 'guess',\n",
562 " ')',\n",
563 " 'x',\n",
564 " '))',\n",
565 " '0.001',\n",
566 " '))',\n",
567 " 'Finally',\n",
568 " ',',\n",
569 " 'we',\n",
570 " 'need',\n",
571 " 'a',\n",
572 " 'way',\n",
573 " 'to',\n",
574 " 'get',\n",
575 " 'started',\n",
576 " '.',\n",
577 " 'For',\n",
578 " 'instance',\n",
579 " ',',\n",
580 " 'we',\n",
581 " 'can',\n",
582 " 'always',\n",
583 " 'guess',\n",
584 " 'that',\n",
585 " 'the',\n",
586 " 'square',\n",
587 " 'root',\n",
588 " 'of',\n",
589 " 'any',\n",
590 " 'number',\n",
591 " 'is',\n",
592 " '1',\n",
593 " ':',\n",
594 " '23',\n",
595 " '(',\n",
596 " 'define',\n",
597 " '(',\n",
598 " 'sqrt',\n",
599 " 'x',\n",
600 " ')',\n",
601 " '(',\n",
602 " 'sqrt',\n",
603 " '-',\n",
604 " 'iter',\n",
605 " '1.0',\n",
606 " 'x',\n",
607 " '))',\n",
608 " 'If',\n",
609 " 'we',\n",
610 " 'type',\n",
611 " 'these',\n",
612 " 'definitions',\n",
613 " 'to',\n",
614 " 'the',\n",
615 " 'interpreter',\n",
616 " ',',\n",
617 " 'we',\n",
618 " 'can',\n",
619 " 'use',\n",
620 " 'sqrt',\n",
621 " 'just',\n",
622 " 'as',\n",
623 " 'we',\n",
624 " 'can',\n",
625 " 'use',\n",
626 " 'any',\n",
627 " 'procedure',\n",
628 " ':']"
629 ]
630 },
631 "execution_count": 11,
632 "metadata": {},
633 "output_type": "execute_result"
634 }
635 ],
636 "source": [
637 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, sample_text)]\n",
638 " for ch in gp if ch]"
639 ]
640 },
641 {
642 "cell_type": "code",
643 "execution_count": 12,
644 "metadata": {
645 "collapsed": true
646 },
647 "outputs": [],
648 "source": [
649 "def tokenise(text):\n",
650 "    return [ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, text)]\n",
651 "            for ch in gp if ch]"
652 ]
653 },
654 {
655 "cell_type": "code",
656 "execution_count": 13,
657 "metadata": {
658 "collapsed": false,
659 "scrolled": true
660 },
661 "outputs": [
662 {
663 "data": {
664 "text/plain": [
665 "['Continuing',\n",
666 " 'this',\n",
667 " 'process',\n",
668 " ',',\n",
669 " 'we',\n",
670 " 'obtain',\n",
671 " 'better',\n",
672 " 'and',\n",
673 " 'better',\n",
674 " 'approximations',\n",
675 " 'to',\n",
676 " 'the',\n",
677 " 'square',\n",
678 " 'root',\n",
679 " '.',\n",
680 " 'Now',\n",
681 " \"let's\",\n",
682 " 'formalize',\n",
683 " 'the',\n",
684 " 'process',\n",
685 " 'in',\n",
686 " 'terms',\n",
687 " 'of',\n",
688 " 'procedures',\n",
689 " '.',\n",
690 " 'We',\n",
691 " 'start',\n",
692 " 'with',\n",
693 " 'a',\n",
694 " 'value',\n",
695 " 'for',\n",
696 " 'the',\n",
697 " 'radicand',\n",
698 " '(',\n",
699 " 'the',\n",
700 " 'number',\n",
701 " 'whose',\n",
702 " 'square',\n",
703 " 'root',\n",
704 " 'we',\n",
705 " 'are',\n",
706 " 'trying',\n",
707 " 'to',\n",
708 " 'compute',\n",
709 " ')',\n",
710 " 'and',\n",
711 " 'a',\n",
712 " 'value',\n",
713 " 'for',\n",
714 " 'the',\n",
715 " 'guess',\n",
716 " '.',\n",
717 " 'If',\n",
718 " 'the',\n",
719 " 'guess',\n",
720 " 'is',\n",
721 " 'good',\n",
722 " 'enough',\n",
723 " 'for',\n",
724 " 'our',\n",
725 " 'purposes',\n",
726 " ',',\n",
727 " 'we',\n",
728 " 'are',\n",
729 " 'done',\n",
730 " ';',\n",
731 " 'if',\n",
732 " 'not',\n",
733 " ',',\n",
734 " 'we',\n",
735 " 'must',\n",
736 " 'repeat',\n",
737 " 'the',\n",
738 " 'process',\n",
739 " 'with',\n",
740 " 'an',\n",
741 " 'improved',\n",
742 " 'guess',\n",
743 " '.',\n",
744 " 'We',\n",
745 " 'write',\n",
746 " 'this',\n",
747 " 'basic',\n",
748 " 'strategy',\n",
749 " 'as',\n",
750 " 'a',\n",
751 " 'procedure',\n",
752 " ':',\n",
753 " '(',\n",
754 " 'define',\n",
755 " '(',\n",
756 " 'sqrt',\n",
757 " '-',\n",
758 " 'iter',\n",
759 " 'guess',\n",
760 " 'x',\n",
761 " ')',\n",
762 " '(',\n",
763 " 'if',\n",
764 " '(',\n",
765 " 'good',\n",
766 " '-',\n",
767 " 'enough',\n",
768 " '?',\n",
769 " 'guess',\n",
770 " 'x',\n",
771 " ')',\n",
772 " 'guess',\n",
773 " '(',\n",
774 " 'sqrt',\n",
775 " '-',\n",
776 " 'iter',\n",
777 " '(',\n",
778 " 'improve',\n",
779 " 'guess',\n",
780 " 'x',\n",
781 " ')',\n",
782 " 'x',\n",
783 " ')))',\n",
784 " 'A',\n",
785 " 'guess',\n",
786 " 'is',\n",
787 " 'improved',\n",
788 " 'by',\n",
789 " 'averaging',\n",
790 " 'it',\n",
791 " 'with',\n",
792 " 'the',\n",
793 " 'quotient',\n",
794 " 'of',\n",
795 " 'the',\n",
796 " 'radicand',\n",
797 " 'and',\n",
798 " 'the',\n",
799 " 'old',\n",
800 " 'guess',\n",
801 " ':',\n",
802 " '(',\n",
803 " 'define',\n",
804 " '(',\n",
805 " 'improve',\n",
806 " 'guess',\n",
807 " 'x',\n",
808 " ')',\n",
809 " '(',\n",
810 " 'average',\n",
811 " 'guess',\n",
812 " '(/',\n",
813 " 'x',\n",
814 " 'guess',\n",
815 " ')))',\n",
816 " 'where',\n",
817 " '(',\n",
818 " 'define',\n",
819 " '(',\n",
820 " 'average',\n",
821 " 'x',\n",
822 " 'y',\n",
823 " ')',\n",
824 " '(/',\n",
825 " '(+',\n",
826 " 'x',\n",
827 " 'y',\n",
828 " ')',\n",
829 " '2',\n",
830 " '))',\n",
831 " 'We',\n",
832 " 'also',\n",
833 " 'have',\n",
834 " 'to',\n",
835 " 'say',\n",
836 " 'what',\n",
837 " 'we',\n",
838 " 'mean',\n",
839 " 'by',\n",
840 " \"''\",\n",
841 " 'good',\n",
842 " 'enough',\n",
843 " \".''\",\n",
844 " 'The',\n",
845 " 'following',\n",
846 " 'will',\n",
847 " 'do',\n",
848 " 'for',\n",
849 " 'illustration',\n",
850 " ',',\n",
851 " 'but',\n",
852 " 'it',\n",
853 " 'is',\n",
854 " 'not',\n",
855 " 'really',\n",
856 " 'a',\n",
857 " 'very',\n",
858 " 'good',\n",
859 " 'test',\n",
860 " '.',\n",
861 " '(',\n",
862 " 'See',\n",
863 " 'exercise',\n",
864 " '1.7',\n",
865 " '.)',\n",
866 " 'The',\n",
867 " 'idea',\n",
868 " 'is',\n",
869 " 'to',\n",
870 " 'improve',\n",
871 " 'the',\n",
872 " 'answer',\n",
873 " 'until',\n",
874 " 'it',\n",
875 " 'is',\n",
876 " 'close',\n",
877 " 'enough',\n",
878 " 'so',\n",
879 " 'that',\n",
880 " 'its',\n",
881 " 'square',\n",
882 " 'differs',\n",
883 " 'from',\n",
884 " 'the',\n",
885 " 'radicand',\n",
886 " 'by',\n",
887 " 'less',\n",
888 " 'than',\n",
889 " 'a',\n",
890 " 'predetermined',\n",
891 " 'tolerance',\n",
892 " '(',\n",
893 " 'here',\n",
894 " '0.001',\n",
895 " '):',\n",
896 " '22',\n",
897 " '(',\n",
898 " 'define',\n",
899 " '(',\n",
900 " 'good',\n",
901 " '-',\n",
902 " 'enough',\n",
903 " '?',\n",
904 " 'guess',\n",
905 " 'x',\n",
906 " ')',\n",
907 " '(<',\n",
908 " '(',\n",
909 " 'abs',\n",
910 " '(-',\n",
911 " '(',\n",
912 " 'square',\n",
913 " 'guess',\n",
914 " ')',\n",
915 " 'x',\n",
916 " '))',\n",
917 " '0.001',\n",
918 " '))',\n",
919 " 'Finally',\n",
920 " ',',\n",
921 " 'we',\n",
922 " 'need',\n",
923 " 'a',\n",
924 " 'way',\n",
925 " 'to',\n",
926 " 'get',\n",
927 " 'started',\n",
928 " '.',\n",
929 " 'For',\n",
930 " 'instance',\n",
931 " ',',\n",
932 " 'we',\n",
933 " 'can',\n",
934 " 'always',\n",
935 " 'guess',\n",
936 " 'that',\n",
937 " 'the',\n",
938 " 'square',\n",
939 " 'root',\n",
940 " 'of',\n",
941 " 'any',\n",
942 " 'number',\n",
943 " 'is',\n",
944 " '1',\n",
945 " ':',\n",
946 " '23',\n",
947 " '(',\n",
948 " 'define',\n",
949 " '(',\n",
950 " 'sqrt',\n",
951 " 'x',\n",
952 " ')',\n",
953 " '(',\n",
954 " 'sqrt',\n",
955 " '-',\n",
956 " 'iter',\n",
957 " '1.0',\n",
958 " 'x',\n",
959 " '))',\n",
960 " 'If',\n",
961 " 'we',\n",
962 " 'type',\n",
963 " 'these',\n",
964 " 'definitions',\n",
965 " 'to',\n",
966 " 'the',\n",
967 " 'interpreter',\n",
968 " ',',\n",
969 " 'we',\n",
970 " 'can',\n",
971 " 'use',\n",
972 " 'sqrt',\n",
973 " 'just',\n",
974 " 'as',\n",
975 " 'we',\n",
976 " 'can',\n",
977 " 'use',\n",
978 " 'any',\n",
979 " 'procedure',\n",
980 " ':']"
981 ]
982 },
983 "execution_count": 13,
984 "metadata": {},
985 "output_type": "execute_result"
986 }
987 ],
988 "source": [
989 "tokenise(sample_text)"
990 ]
991 },
992 {
993 "cell_type": "code",
994 "execution_count": 14,
995 "metadata": {
996 "collapsed": false,
997 "scrolled": true
998 },
999 "outputs": [
1000 {
1001 "data": {
1002 "text/plain": [
1003 "['Exercise',\n",
1004 " '1.8',\n",
1005 " '.',\n",
1006 " \"Newton's\",\n",
1007 " 'method',\n",
1008 " 'for',\n",
1009 " 'cube',\n",
1010 " 'roots',\n",
1011 " 'is',\n",
1012 " 'based',\n",
1013 " 'on',\n",
1014 " 'the',\n",
1015 " 'fact',\n",
1016 " 'that',\n",
1017 " 'if',\n",
1018 " 'y',\n",
1019 " 'is',\n",
1020 " 'an',\n",
1021 " 'approximation',\n",
1022 " 'to',\n",
1023 " 'the',\n",
1024 " 'cube',\n",
1025 " 'root',\n",
1026 " 'of',\n",
1027 " 'x',\n",
1028 " ',',\n",
1029 " 'then',\n",
1030 " 'a',\n",
1031 " 'better',\n",
1032 " 'approximation',\n",
1033 " 'is',\n",
1034 " 'given',\n",
1035 " 'by',\n",
1036 " 'the',\n",
1037 " 'value',\n",
1038 " 'Use',\n",
1039 " 'this',\n",
1040 " 'formula',\n",
1041 " 'to',\n",
1042 " 'implement',\n",
1043 " 'a',\n",
1044 " 'cube',\n",
1045 " '-',\n",
1046 " 'root',\n",
1047 " 'procedure',\n",
1048 " 'analogous',\n",
1049 " 'to',\n",
1050 " 'the',\n",
1051 " 'square',\n",
1052 " '-',\n",
1053 " 'root',\n",
1054 " 'procedure',\n",
1055 " '.',\n",
1056 " '(',\n",
1057 " 'In',\n",
1058 " 'section',\n",
1059 " '1.3',\n",
1060 " '.',\n",
1061 " '4',\n",
1062 " 'we',\n",
1063 " 'will',\n",
1064 " 'see',\n",
1065 " 'how',\n",
1066 " 'to',\n",
1067 " 'implement',\n",
1068 " \"Newton's\",\n",
1069 " 'method',\n",
1070 " 'in',\n",
1071 " 'general',\n",
1072 " 'as',\n",
1073 " 'an',\n",
1074 " 'abstraction',\n",
1075 " 'of',\n",
1076 " 'these',\n",
1077 " 'square',\n",
1078 " '-',\n",
1079 " 'root',\n",
1080 " 'and',\n",
1081 " 'cube',\n",
1082 " '-',\n",
1083 " 'root',\n",
1084 " 'procedures',\n",
1085 " '.)',\n",
1086 " '1.1',\n",
1087 " '.',\n",
1088 " '8',\n",
1089 " 'Procedures',\n",
1090 " 'as',\n",
1091 " 'Black',\n",
1092 " '-',\n",
1093 " 'Box',\n",
1094 " 'Abstractions',\n",
1095 " 'Sqrt',\n",
1096 " 'is',\n",
1097 " 'our',\n",
1098 " 'first',\n",
1099 " 'example',\n",
1100 " 'of',\n",
1101 " 'a',\n",
1102 " 'process',\n",
1103 " 'defined',\n",
1104 " 'by',\n",
1105 " 'a',\n",
1106 " 'set',\n",
1107 " 'of',\n",
1108 " 'mutually',\n",
1109 " 'defined',\n",
1110 " 'procedures',\n",
1111 " '.',\n",
1112 " 'Notice',\n",
1113 " 'that',\n",
1114 " 'the',\n",
1115 " 'definition',\n",
1116 " 'of',\n",
1117 " 'sqrt',\n",
1118 " '-',\n",
1119 " 'iter',\n",
1120 " 'is',\n",
1121 " 'recursive',\n",
1122 " ';',\n",
1123 " 'that',\n",
1124 " 'is',\n",
1125 " ',',\n",
1126 " 'the',\n",
1127 " 'procedure',\n",
1128 " 'is',\n",
1129 " 'defined',\n",
1130 " 'in',\n",
1131 " 'terms',\n",
1132 " 'of',\n",
1133 " 'itself',\n",
1134 " '.',\n",
1135 " 'The',\n",
1136 " 'idea',\n",
1137 " 'of',\n",
1138 " 'being',\n",
1139 " 'able',\n",
1140 " 'to',\n",
1141 " 'define',\n",
1142 " 'a',\n",
1143 " 'procedure',\n",
1144 " 'in',\n",
1145 " 'terms',\n",
1146 " 'of',\n",
1147 " 'itself',\n",
1148 " 'may',\n",
1149 " 'be',\n",
1150 " 'disturbing',\n",
1151 " ';',\n",
1152 " 'it',\n",
1153 " 'may',\n",
1154 " 'seem',\n",
1155 " 'unclear',\n",
1156 " 'how',\n",
1157 " 'such',\n",
1158 " 'a',\n",
1159 " \"''\",\n",
1160 " 'circular',\n",
1161 " \"''\",\n",
1162 " 'definition',\n",
1163 " 'could',\n",
1164 " 'make',\n",
1165 " 'sense',\n",
1166 " 'at',\n",
1167 " 'all',\n",
1168 " ',',\n",
1169 " 'much',\n",
1170 " 'less',\n",
1171 " 'specify',\n",
1172 " 'a',\n",
1173 " 'well',\n",
1174 " '-',\n",
1175 " 'defined',\n",
1176 " 'process',\n",
1177 " 'to',\n",
1178 " 'be',\n",
1179 " 'carried']"
1180 ]
1181 },
1182 "execution_count": 14,
1183 "metadata": {},
1184 "output_type": "execute_result"
1185 }
1186 ],
1187 "source": [
1188 "tokenise(\"\"\"Exercise 1.8. Newton's method for cube roots is based on the fact that if y is an approximation to the\n",
1189 "cube root of x, then a better approximation is given by the value\n",
1190 "\n",
1191 "Use this formula to implement a cube-root procedure analogous to the square-root procedure. (In\n",
1192 "section 1.3.4 we will see how to implement Newton's method in general as an abstraction of these\n",
1193 "square-root and cube-root procedures.)\n",
1194 "\n",
1195 "1.1.8 Procedures as Black-Box Abstractions\n",
1196 "Sqrt is our first example of a process defined by a set of mutually defined procedures. Notice that the\n",
1197 "definition of sqrt-iter is recursive; that is, the procedure is defined in terms of itself. The idea of\n",
1198 "being able to define a procedure in terms of itself may be disturbing; it may seem unclear how such a\n",
1199 "''circular'' definition could make sense at all, much less specify a well-defined process to be carried\n",
1200 "\"\"\")"
1201 ]
1202 },
1203 {
1204 "cell_type": "code",
1205 "execution_count": 15,
1206 "metadata": {
1207 "collapsed": true
1208 },
1209 "outputs": [],
1210 "source": [
1211 "def find_counts_of_item(item, counts, tuple_size):\n",
1212 "    for i in range(len(item)-(tuple_size)):\n",
1213 "        counts[tuple(item[i:i+tuple_size])].update([item[i+tuple_size]])\n",
1214 "    counts[tuple(item[-tuple_size:])].update([None])\n",
1215 "    return counts"
1216 ]
1217 },
1218 {
1219 "cell_type": "code",
1220 "execution_count": 16,
1221 "metadata": {
1222 "collapsed": true
1223 },
1224 "outputs": [],
1225 "source": [
1226 "def find_counts(items, tuple_size=2):\n",
1227 "    counts = collections.defaultdict(collections.Counter)\n",
1228 "    starts = collections.Counter()\n",
1229 "    for item in items:\n",
1230 "        counts = find_counts_of_item(item, counts, tuple_size)\n",
1231 "        starts[tuple(item[:tuple_size])] += 1\n",
1232 "    return starts, counts"
1233 ]
1234 },
1235 {
1236 "cell_type": "code",
1237 "execution_count": 17,
1238 "metadata": {
1239 "collapsed": false
1240 },
1241 "outputs": [],
1242 "source": [
1243 "def sentences(tokens):\n",
1244 "    sents = []\n",
1245 "    sent = []\n",
1246 "    for i in range(len(tokens)):\n",
1247 "        if tokens[i] == '.':\n",
1248 "            sents += [sent + [tokens[i]]]\n",
1249 "            sent = []\n",
1250 "        else:\n",
1251 "            sent += [tokens[i]]\n",
1252 "    return sents"
1253 ]
1254 },
1255 {
1256 "cell_type": "code",
1257 "execution_count": 18,
1258 "metadata": {
1259 "collapsed": false,
1260 "scrolled": true
1261 },
1262 "outputs": [
1263 {
1264 "data": {
1265 "text/plain": [
1266 "[['Continuing',\n",
1267 " 'this',\n",
1268 " 'process',\n",
1269 " ',',\n",
1270 " 'we',\n",
1271 " 'obtain',\n",
1272 " 'better',\n",
1273 " 'and',\n",
1274 " 'better',\n",
1275 " 'approximations',\n",
1276 " 'to',\n",
1277 " 'the',\n",
1278 " 'square',\n",
1279 " 'root',\n",
1280 " '.'],\n",
1281 " ['Now',\n",
1282 " \"let's\",\n",
1283 " 'formalize',\n",
1284 " 'the',\n",
1285 " 'process',\n",
1286 " 'in',\n",
1287 " 'terms',\n",
1288 " 'of',\n",
1289 " 'procedures',\n",
1290 " '.'],\n",
1291 " ['We',\n",
1292 " 'start',\n",
1293 " 'with',\n",
1294 " 'a',\n",
1295 " 'value',\n",
1296 " 'for',\n",
1297 " 'the',\n",
1298 " 'radicand',\n",
1299 " '(',\n",
1300 " 'the',\n",
1301 " 'number',\n",
1302 " 'whose',\n",
1303 " 'square',\n",
1304 " 'root',\n",
1305 " 'we',\n",
1306 " 'are',\n",
1307 " 'trying',\n",
1308 " 'to',\n",
1309 " 'compute',\n",
1310 " ')',\n",
1311 " 'and',\n",
1312 " 'a',\n",
1313 " 'value',\n",
1314 " 'for',\n",
1315 " 'the',\n",
1316 " 'guess',\n",
1317 " '.'],\n",
1318 " ['If',\n",
1319 " 'the',\n",
1320 " 'guess',\n",
1321 " 'is',\n",
1322 " 'good',\n",
1323 " 'enough',\n",
1324 " 'for',\n",
1325 " 'our',\n",
1326 " 'purposes',\n",
1327 " ',',\n",
1328 " 'we',\n",
1329 " 'are',\n",
1330 " 'done',\n",
1331 " ';',\n",
1332 " 'if',\n",
1333 " 'not',\n",
1334 " ',',\n",
1335 " 'we',\n",
1336 " 'must',\n",
1337 " 'repeat',\n",
1338 " 'the',\n",
1339 " 'process',\n",
1340 " 'with',\n",
1341 " 'an',\n",
1342 " 'improved',\n",
1343 " 'guess',\n",
1344 " '.'],\n",
1345 " ['We',\n",
1346 " 'write',\n",
1347 " 'this',\n",
1348 " 'basic',\n",
1349 " 'strategy',\n",
1350 " 'as',\n",
1351 " 'a',\n",
1352 " 'procedure',\n",
1353 " ':',\n",
1354 " '(',\n",
1355 " 'define',\n",
1356 " '(',\n",
1357 " 'sqrt',\n",
1358 " '-',\n",
1359 " 'iter',\n",
1360 " 'guess',\n",
1361 " 'x',\n",
1362 " ')',\n",
1363 " '(',\n",
1364 " 'if',\n",
1365 " '(',\n",
1366 " 'good',\n",
1367 " '-',\n",
1368 " 'enough',\n",
1369 " '?',\n",
1370 " 'guess',\n",
1371 " 'x',\n",
1372 " ')',\n",
1373 " 'guess',\n",
1374 " '(',\n",
1375 " 'sqrt',\n",
1376 " '-',\n",
1377 " 'iter',\n",
1378 " '(',\n",
1379 " 'improve',\n",
1380 " 'guess',\n",
1381 " 'x',\n",
1382 " ')',\n",
1383 " 'x',\n",
1384 " ')))',\n",
1385 " 'A',\n",
1386 " 'guess',\n",
1387 " 'is',\n",
1388 " 'improved',\n",
1389 " 'by',\n",
1390 " 'averaging',\n",
1391 " 'it',\n",
1392 " 'with',\n",
1393 " 'the',\n",
1394 " 'quotient',\n",
1395 " 'of',\n",
1396 " 'the',\n",
1397 " 'radicand',\n",
1398 " 'and',\n",
1399 " 'the',\n",
1400 " 'old',\n",
1401 " 'guess',\n",
1402 " ':',\n",
1403 " '(',\n",
1404 " 'define',\n",
1405 " '(',\n",
1406 " 'improve',\n",
1407 " 'guess',\n",
1408 " 'x',\n",
1409 " ')',\n",
1410 " '(',\n",
1411 " 'average',\n",
1412 " 'guess',\n",
1413 " '(/',\n",
1414 " 'x',\n",
1415 " 'guess',\n",
1416 " ')))',\n",
1417 " 'where',\n",
1418 " '(',\n",
1419 " 'define',\n",
1420 " '(',\n",
1421 " 'average',\n",
1422 " 'x',\n",
1423 " 'y',\n",
1424 " ')',\n",
1425 " '(/',\n",
1426 " '(+',\n",
1427 " 'x',\n",
1428 " 'y',\n",
1429 " ')',\n",
1430 " '2',\n",
1431 " '))',\n",
1432 " 'We',\n",
1433 " 'also',\n",
1434 " 'have',\n",
1435 " 'to',\n",
1436 " 'say',\n",
1437 " 'what',\n",
1438 " 'we',\n",
1439 " 'mean',\n",
1440 " 'by',\n",
1441 " \"''\",\n",
1442 " 'good',\n",
1443 " 'enough',\n",
1444 " \".''\",\n",
1445 " 'The',\n",
1446 " 'following',\n",
1447 " 'will',\n",
1448 " 'do',\n",
1449 " 'for',\n",
1450 " 'illustration',\n",
1451 " ',',\n",
1452 " 'but',\n",
1453 " 'it',\n",
1454 " 'is',\n",
1455 " 'not',\n",
1456 " 'really',\n",
1457 " 'a',\n",
1458 " 'very',\n",
1459 " 'good',\n",
1460 " 'test',\n",
1461 " '.'],\n",
1462 " ['(',\n",
1463 " 'See',\n",
1464 " 'exercise',\n",
1465 " '1.7',\n",
1466 " '.)',\n",
1467 " 'The',\n",
1468 " 'idea',\n",
1469 " 'is',\n",
1470 " 'to',\n",
1471 " 'improve',\n",
1472 " 'the',\n",
1473 " 'answer',\n",
1474 " 'until',\n",
1475 " 'it',\n",
1476 " 'is',\n",
1477 " 'close',\n",
1478 " 'enough',\n",
1479 " 'so',\n",
1480 " 'that',\n",
1481 " 'its',\n",
1482 " 'square',\n",
1483 " 'differs',\n",
1484 " 'from',\n",
1485 " 'the',\n",
1486 " 'radicand',\n",
1487 " 'by',\n",
1488 " 'less',\n",
1489 " 'than',\n",
1490 " 'a',\n",
1491 " 'predetermined',\n",
1492 " 'tolerance',\n",
1493 " '(',\n",
1494 " 'here',\n",
1495 " '0.001',\n",
1496 " '):',\n",
1497 " '22',\n",
1498 " '(',\n",
1499 " 'define',\n",
1500 " '(',\n",
1501 " 'good',\n",
1502 " '-',\n",
1503 " 'enough',\n",
1504 " '?',\n",
1505 " 'guess',\n",
1506 " 'x',\n",
1507 " ')',\n",
1508 " '(<',\n",
1509 " '(',\n",
1510 " 'abs',\n",
1511 " '(-',\n",
1512 " '(',\n",
1513 " 'square',\n",
1514 " 'guess',\n",
1515 " ')',\n",
1516 " 'x',\n",
1517 " '))',\n",
1518 " '0.001',\n",
1519 " '))',\n",
1520 " 'Finally',\n",
1521 " ',',\n",
1522 " 'we',\n",
1523 " 'need',\n",
1524 " 'a',\n",
1525 " 'way',\n",
1526 " 'to',\n",
1527 " 'get',\n",
1528 " 'started',\n",
1529 " '.']]"
1530 ]
1531 },
1532 "execution_count": 18,
1533 "metadata": {},
1534 "output_type": "execute_result"
1535 }
1536 ],
1537 "source": [
1538 "sentences(tokenise(sample_text))"
1539 ]
1540 },
1541 {
1542 "cell_type": "code",
1543 "execution_count": 19,
1544 "metadata": {
1545 "collapsed": false
1546 },
1547 "outputs": [
1548 {
1549 "data": {
1550 "text/plain": [
1551 "(Counter({('Continuing', 'this'): 1}),\n",
1552 " defaultdict(collections.Counter,\n",
1553 " {(',', 'we'): Counter({'obtain': 1}),\n",
1554 " ('Continuing', 'this'): Counter({'process': 1}),\n",
1555 " ('and', 'better'): Counter({'approximations': 1}),\n",
1556 " ('approximations', 'to'): Counter({'the': 1}),\n",
1557 " ('better', 'and'): Counter({'better': 1}),\n",
1558 " ('better', 'approximations'): Counter({'to': 1}),\n",
1559 " ('obtain', 'better'): Counter({'and': 1}),\n",
1560 " ('process', ','): Counter({'we': 1}),\n",
1561 " ('root', '.'): Counter({None: 1}),\n",
1562 " ('square', 'root'): Counter({'.': 1}),\n",
1563 " ('the', 'square'): Counter({'root': 1}),\n",
1564 " ('this', 'process'): Counter({',': 1}),\n",
1565 " ('to', 'the'): Counter({'square': 1}),\n",
1566 " ('we', 'obtain'): Counter({'better': 1})}))"
1567 ]
1568 },
1569 "execution_count": 19,
1570 "metadata": {},
1571 "output_type": "execute_result"
1572 }
1573 ],
1574 "source": [
1575 "one_s_starts, one_s_counts = find_counts([sentences(tokenise(sample_text))[0]])\n",
1576 "one_s_starts, one_s_counts"
1577 ]
1578 },
1579 {
1580 "cell_type": "code",
1581 "execution_count": 20,
1582 "metadata": {
1583 "collapsed": false,
1584 "scrolled": true
1585 },
1586 "outputs": [
1587 {
1588 "data": {
1589 "text/plain": [
1590 "(Counter({('(', 'See', 'exercise'): 1,\n",
1591 " ('Continuing', 'this', 'process'): 1,\n",
1592 " ('If', 'the', 'guess'): 1,\n",
1593 " ('Now', \"let's\", 'formalize'): 1,\n",
1594 " ('We', 'start', 'with'): 1,\n",
1595 " ('We', 'write', 'this'): 1}),\n",
1596 " defaultdict(collections.Counter,\n",
1597 " {(\"''\", 'good', 'enough'): Counter({\".''\": 1}),\n",
1598 " ('(', 'See', 'exercise'): Counter({'1.7': 1}),\n",
1599 " ('(', 'abs', '(-'): Counter({'(': 1}),\n",
1600 " ('(', 'average', 'guess'): Counter({'(/': 1}),\n",
1601 " ('(', 'average', 'x'): Counter({'y': 1}),\n",
1602 " ('(',\n",
1603 " 'define',\n",
1604 " '('): Counter({'average': 1,\n",
1605 " 'good': 1,\n",
1606 " 'improve': 1,\n",
1607 " 'sqrt': 1}),\n",
1608 " ('(', 'good', '-'): Counter({'enough': 2}),\n",
1609 " ('(', 'here', '0.001'): Counter({'):': 1}),\n",
1610 " ('(', 'if', '('): Counter({'good': 1}),\n",
1611 " ('(', 'improve', 'guess'): Counter({'x': 2}),\n",
1612 " ('(', 'sqrt', '-'): Counter({'iter': 2}),\n",
1613 " ('(', 'square', 'guess'): Counter({')': 1}),\n",
1614 " ('(', 'the', 'number'): Counter({'whose': 1}),\n",
1615 " ('(+', 'x', 'y'): Counter({')': 1}),\n",
1616 " ('(-', '(', 'square'): Counter({'guess': 1}),\n",
1617 " ('(/', '(+', 'x'): Counter({'y': 1}),\n",
1618 " ('(/', 'x', 'guess'): Counter({')))': 1}),\n",
1619 " ('(<', '(', 'abs'): Counter({'(-': 1}),\n",
1620 " (')', '(', 'average'): Counter({'guess': 1}),\n",
1621 " (')', '(', 'if'): Counter({'(': 1}),\n",
1622 " (')', '(/', '(+'): Counter({'x': 1}),\n",
1623 " (')', '(<', '('): Counter({'abs': 1}),\n",
1624 " (')', '2', '))'): Counter({'We': 1}),\n",
1625 " (')', 'and', 'a'): Counter({'value': 1}),\n",
1626 " (')', 'guess', '('): Counter({'sqrt': 1}),\n",
1627 " (')', 'x', '))'): Counter({'0.001': 1}),\n",
1628 " (')', 'x', ')))'): Counter({'A': 1}),\n",
1629 " ('))', '0.001', '))'): Counter({'Finally': 1}),\n",
1630 " ('))', 'Finally', ','): Counter({'we': 1}),\n",
1631 " ('))', 'We', 'also'): Counter({'have': 1}),\n",
1632 " (')))', 'A', 'guess'): Counter({'is': 1}),\n",
1633 " (')))', 'where', '('): Counter({'define': 1}),\n",
1634 " ('):', '22', '('): Counter({'define': 1}),\n",
1635 " (',', 'but', 'it'): Counter({'is': 1}),\n",
1636 " (',', 'we', 'are'): Counter({'done': 1}),\n",
1637 " (',', 'we', 'must'): Counter({'repeat': 1}),\n",
1638 " (',', 'we', 'need'): Counter({'a': 1}),\n",
1639 " (',', 'we', 'obtain'): Counter({'better': 1}),\n",
1640 " ('-', 'enough', '?'): Counter({'guess': 2}),\n",
1641 " ('-', 'iter', '('): Counter({'improve': 1}),\n",
1642 " ('-', 'iter', 'guess'): Counter({'x': 1}),\n",
1643 " (\".''\", 'The', 'following'): Counter({'will': 1}),\n",
1644 " ('.)', 'The', 'idea'): Counter({'is': 1}),\n",
1645 " ('0.001', '))', 'Finally'): Counter({',': 1}),\n",
1646 " ('0.001', '):', '22'): Counter({'(': 1}),\n",
1647 " ('1.7', '.)', 'The'): Counter({'idea': 1}),\n",
1648 " ('2', '))', 'We'): Counter({'also': 1}),\n",
1649 " ('22', '(', 'define'): Counter({'(': 1}),\n",
1650 " (':', '(', 'define'): Counter({'(': 2}),\n",
1651 " (';', 'if', 'not'): Counter({',': 1}),\n",
1652 " ('?', 'guess', 'x'): Counter({')': 2}),\n",
1653 " ('A', 'guess', 'is'): Counter({'improved': 1}),\n",
1654 " ('Continuing', 'this', 'process'): Counter({',': 1}),\n",
1655 " ('Finally', ',', 'we'): Counter({'need': 1}),\n",
1656 " ('If', 'the', 'guess'): Counter({'is': 1}),\n",
1657 " ('Now', \"let's\", 'formalize'): Counter({'the': 1}),\n",
1658 " ('See', 'exercise', '1.7'): Counter({'.)': 1}),\n",
1659 " ('The', 'following', 'will'): Counter({'do': 1}),\n",
1660 " ('The', 'idea', 'is'): Counter({'to': 1}),\n",
1661 " ('We', 'also', 'have'): Counter({'to': 1}),\n",
1662 " ('We', 'start', 'with'): Counter({'a': 1}),\n",
1663 " ('We', 'write', 'this'): Counter({'basic': 1}),\n",
1664 " ('a', 'predetermined', 'tolerance'): Counter({'(': 1}),\n",
1665 " ('a', 'procedure', ':'): Counter({'(': 1}),\n",
1666 " ('a', 'value', 'for'): Counter({'the': 2}),\n",
1667 " ('a', 'very', 'good'): Counter({'test': 1}),\n",
1668 " ('a', 'way', 'to'): Counter({'get': 1}),\n",
1669 " ('abs', '(-', '('): Counter({'square': 1}),\n",
1670 " ('also', 'have', 'to'): Counter({'say': 1}),\n",
1671 " ('an', 'improved', 'guess'): Counter({'.': 1}),\n",
1672 " ('and', 'a', 'value'): Counter({'for': 1}),\n",
1673 " ('and', 'better', 'approximations'): Counter({'to': 1}),\n",
1674 " ('and', 'the', 'old'): Counter({'guess': 1}),\n",
1675 " ('answer', 'until', 'it'): Counter({'is': 1}),\n",
1676 " ('approximations', 'to', 'the'): Counter({'square': 1}),\n",
1677 " ('are', 'done', ';'): Counter({'if': 1}),\n",
1678 " ('are', 'trying', 'to'): Counter({'compute': 1}),\n",
1679 " ('as', 'a', 'procedure'): Counter({':': 1}),\n",
1680 " ('average', 'guess', '(/'): Counter({'x': 1}),\n",
1681 " ('average', 'x', 'y'): Counter({')': 1}),\n",
1682 " ('averaging', 'it', 'with'): Counter({'the': 1}),\n",
1683 " ('basic', 'strategy', 'as'): Counter({'a': 1}),\n",
1684 " ('better', 'and', 'better'): Counter({'approximations': 1}),\n",
1685 " ('better', 'approximations', 'to'): Counter({'the': 1}),\n",
1686 " ('but', 'it', 'is'): Counter({'not': 1}),\n",
1687 " ('by', \"''\", 'good'): Counter({'enough': 1}),\n",
1688 " ('by', 'averaging', 'it'): Counter({'with': 1}),\n",
1689 " ('by', 'less', 'than'): Counter({'a': 1}),\n",
1690 " ('close', 'enough', 'so'): Counter({'that': 1}),\n",
1691 " ('compute', ')', 'and'): Counter({'a': 1}),\n",
1692 " ('define', '(', 'average'): Counter({'x': 1}),\n",
1693 " ('define', '(', 'good'): Counter({'-': 1}),\n",
1694 " ('define', '(', 'improve'): Counter({'guess': 1}),\n",
1695 " ('define', '(', 'sqrt'): Counter({'-': 1}),\n",
1696 " ('differs', 'from', 'the'): Counter({'radicand': 1}),\n",
1697 " ('do', 'for', 'illustration'): Counter({',': 1}),\n",
1698 " ('done', ';', 'if'): Counter({'not': 1}),\n",
1699 " ('enough', \".''\", 'The'): Counter({'following': 1}),\n",
1700 " ('enough', '?', 'guess'): Counter({'x': 2}),\n",
1701 " ('enough', 'for', 'our'): Counter({'purposes': 1}),\n",
1702 " ('enough', 'so', 'that'): Counter({'its': 1}),\n",
1703 " ('exercise', '1.7', '.)'): Counter({'The': 1}),\n",
1704 " ('following', 'will', 'do'): Counter({'for': 1}),\n",
1705 " ('for', 'illustration', ','): Counter({'but': 1}),\n",
1706 " ('for', 'our', 'purposes'): Counter({',': 1}),\n",
1707 " ('for', 'the', 'guess'): Counter({'.': 1}),\n",
1708 " ('for', 'the', 'radicand'): Counter({'(': 1}),\n",
1709 " ('formalize', 'the', 'process'): Counter({'in': 1}),\n",
1710 " ('from', 'the', 'radicand'): Counter({'by': 1}),\n",
1711 " ('get', 'started', '.'): Counter({None: 1}),\n",
1712 " ('good', '-', 'enough'): Counter({'?': 2}),\n",
1713 " ('good', 'enough', \".''\"): Counter({'The': 1}),\n",
1714 " ('good', 'enough', 'for'): Counter({'our': 1}),\n",
1715 " ('good', 'test', '.'): Counter({None: 1}),\n",
1716 " ('guess', '(', 'sqrt'): Counter({'-': 1}),\n",
1717 " ('guess', '(/', 'x'): Counter({'guess': 1}),\n",
1718 " ('guess', ')', 'x'): Counter({'))': 1}),\n",
1719 " ('guess', ')))', 'where'): Counter({'(': 1}),\n",
1720 " ('guess', ':', '('): Counter({'define': 1}),\n",
1721 " ('guess', 'is', 'good'): Counter({'enough': 1}),\n",
1722 " ('guess', 'is', 'improved'): Counter({'by': 1}),\n",
1723 " ('guess',\n",
1724 " 'x',\n",
1725 " ')'): Counter({'(': 2, '(<': 1, 'guess': 1, 'x': 1}),\n",
1726 " ('have', 'to', 'say'): Counter({'what': 1}),\n",
1727 " ('here', '0.001', '):'): Counter({'22': 1}),\n",
1728 " ('idea', 'is', 'to'): Counter({'improve': 1}),\n",
1729 " ('if', '(', 'good'): Counter({'-': 1}),\n",
1730 " ('if', 'not', ','): Counter({'we': 1}),\n",
1731 " ('illustration', ',', 'but'): Counter({'it': 1}),\n",
1732 " ('improve', 'guess', 'x'): Counter({')': 2}),\n",
1733 " ('improve', 'the', 'answer'): Counter({'until': 1}),\n",
1734 " ('improved', 'by', 'averaging'): Counter({'it': 1}),\n",
1735 " ('improved', 'guess', '.'): Counter({None: 1}),\n",
1736 " ('in', 'terms', 'of'): Counter({'procedures': 1}),\n",
1737 " ('is', 'close', 'enough'): Counter({'so': 1}),\n",
1738 " ('is', 'good', 'enough'): Counter({'for': 1}),\n",
1739 " ('is', 'improved', 'by'): Counter({'averaging': 1}),\n",
1740 " ('is', 'not', 'really'): Counter({'a': 1}),\n",
1741 " ('is', 'to', 'improve'): Counter({'the': 1}),\n",
1742 " ('it', 'is', 'close'): Counter({'enough': 1}),\n",
1743 " ('it', 'is', 'not'): Counter({'really': 1}),\n",
1744 " ('it', 'with', 'the'): Counter({'quotient': 1}),\n",
1745 " ('iter', '(', 'improve'): Counter({'guess': 1}),\n",
1746 " ('iter', 'guess', 'x'): Counter({')': 1}),\n",
1747 " ('its', 'square', 'differs'): Counter({'from': 1}),\n",
1748 " ('less', 'than', 'a'): Counter({'predetermined': 1}),\n",
1749 " (\"let's\", 'formalize', 'the'): Counter({'process': 1}),\n",
1750 " ('mean', 'by', \"''\"): Counter({'good': 1}),\n",
1751 " ('must', 'repeat', 'the'): Counter({'process': 1}),\n",
1752 " ('need', 'a', 'way'): Counter({'to': 1}),\n",
1753 " ('not', ',', 'we'): Counter({'must': 1}),\n",
1754 " ('not', 'really', 'a'): Counter({'very': 1}),\n",
1755 " ('number', 'whose', 'square'): Counter({'root': 1}),\n",
1756 " ('obtain', 'better', 'and'): Counter({'better': 1}),\n",
1757 " ('of', 'procedures', '.'): Counter({None: 1}),\n",
1758 " ('of', 'the', 'radicand'): Counter({'and': 1}),\n",
1759 " ('old', 'guess', ':'): Counter({'(': 1}),\n",
1760 " ('our', 'purposes', ','): Counter({'we': 1}),\n",
1761 " ('predetermined', 'tolerance', '('): Counter({'here': 1}),\n",
1762 " ('procedure', ':', '('): Counter({'define': 1}),\n",
1763 " ('process', ',', 'we'): Counter({'obtain': 1}),\n",
1764 " ('process', 'in', 'terms'): Counter({'of': 1}),\n",
1765 " ('process', 'with', 'an'): Counter({'improved': 1}),\n",
1766 " ('purposes', ',', 'we'): Counter({'are': 1}),\n",
1767 " ('quotient', 'of', 'the'): Counter({'radicand': 1}),\n",
1768 " ('radicand', '(', 'the'): Counter({'number': 1}),\n",
1769 " ('radicand', 'and', 'the'): Counter({'old': 1}),\n",
1770 " ('radicand', 'by', 'less'): Counter({'than': 1}),\n",
1771 " ('really', 'a', 'very'): Counter({'good': 1}),\n",
1772 " ('repeat', 'the', 'process'): Counter({'with': 1}),\n",
1773 " ('root', 'we', 'are'): Counter({'trying': 1}),\n",
1774 " ('say', 'what', 'we'): Counter({'mean': 1}),\n",
1775 " ('so', 'that', 'its'): Counter({'square': 1}),\n",
1776 " ('sqrt', '-', 'iter'): Counter({'(': 1, 'guess': 1}),\n",
1777 " ('square', 'differs', 'from'): Counter({'the': 1}),\n",
1778 " ('square', 'guess', ')'): Counter({'x': 1}),\n",
1779 " ('square', 'root', '.'): Counter({None: 1}),\n",
1780 " ('square', 'root', 'we'): Counter({'are': 1}),\n",
1781 " ('start', 'with', 'a'): Counter({'value': 1}),\n",
1782 " ('strategy', 'as', 'a'): Counter({'procedure': 1}),\n",
1783 " ('terms', 'of', 'procedures'): Counter({'.': 1}),\n",
1784 " ('than', 'a', 'predetermined'): Counter({'tolerance': 1}),\n",
1785 " ('that', 'its', 'square'): Counter({'differs': 1}),\n",
1786 " ('the', 'answer', 'until'): Counter({'it': 1}),\n",
1787 " ('the', 'guess', '.'): Counter({None: 1}),\n",
1788 " ('the', 'guess', 'is'): Counter({'good': 1}),\n",
1789 " ('the', 'number', 'whose'): Counter({'square': 1}),\n",
1790 " ('the', 'old', 'guess'): Counter({':': 1}),\n",
1791 " ('the', 'process', 'in'): Counter({'terms': 1}),\n",
1792 " ('the', 'process', 'with'): Counter({'an': 1}),\n",
1793 " ('the', 'quotient', 'of'): Counter({'the': 1}),\n",
1794 " ('the', 'radicand', '('): Counter({'the': 1}),\n",
1795 " ('the', 'radicand', 'and'): Counter({'the': 1}),\n",
1796 " ('the', 'radicand', 'by'): Counter({'less': 1}),\n",
1797 " ('the', 'square', 'root'): Counter({'.': 1}),\n",
1798 " ('this', 'basic', 'strategy'): Counter({'as': 1}),\n",
1799 " ('this', 'process', ','): Counter({'we': 1}),\n",
1800 " ('to', 'compute', ')'): Counter({'and': 1}),\n",
1801 " ('to', 'get', 'started'): Counter({'.': 1}),\n",
1802 " ('to', 'improve', 'the'): Counter({'answer': 1}),\n",
1803 " ('to', 'say', 'what'): Counter({'we': 1}),\n",
1804 " ('to', 'the', 'square'): Counter({'root': 1}),\n",
1805 " ('tolerance', '(', 'here'): Counter({'0.001': 1}),\n",
1806 " ('trying', 'to', 'compute'): Counter({')': 1}),\n",
1807 " ('until', 'it', 'is'): Counter({'close': 1}),\n",
1808 " ('value', 'for', 'the'): Counter({'guess': 1, 'radicand': 1}),\n",
1809 " ('very', 'good', 'test'): Counter({'.': 1}),\n",
1810 " ('way', 'to', 'get'): Counter({'started': 1}),\n",
1811 " ('we', 'are', 'done'): Counter({';': 1}),\n",
1812 " ('we', 'are', 'trying'): Counter({'to': 1}),\n",
1813 " ('we', 'mean', 'by'): Counter({\"''\": 1}),\n",
1814 " ('we', 'must', 'repeat'): Counter({'the': 1}),\n",
1815 " ('we', 'need', 'a'): Counter({'way': 1}),\n",
1816 " ('we', 'obtain', 'better'): Counter({'and': 1}),\n",
1817 " ('what', 'we', 'mean'): Counter({'by': 1}),\n",
1818 " ('where', '(', 'define'): Counter({'(': 1}),\n",
1819 " ('whose', 'square', 'root'): Counter({'we': 1}),\n",
1820 " ('will', 'do', 'for'): Counter({'illustration': 1}),\n",
1821 " ('with', 'a', 'value'): Counter({'for': 1}),\n",
1822 " ('with', 'an', 'improved'): Counter({'guess': 1}),\n",
1823 " ('with', 'the', 'quotient'): Counter({'of': 1}),\n",
1824 " ('write', 'this', 'basic'): Counter({'strategy': 1}),\n",
1825 " ('x', ')', '('): Counter({'average': 1, 'if': 1}),\n",
1826 " ('x', ')', '(<'): Counter({'(': 1}),\n",
1827 " ('x', ')', 'guess'): Counter({'(': 1}),\n",
1828 " ('x', ')', 'x'): Counter({')))': 1}),\n",
1829 " ('x', '))', '0.001'): Counter({'))': 1}),\n",
1830 " ('x', ')))', 'A'): Counter({'guess': 1}),\n",
1831 " ('x', 'guess', ')))'): Counter({'where': 1}),\n",
1832 " ('x', 'y', ')'): Counter({'(/': 1, '2': 1}),\n",
1833 " ('y', ')', '(/'): Counter({'(+': 1}),\n",
1834 " ('y', ')', '2'): Counter({'))': 1})}))"
1835 ]
1836 },
1837 "execution_count": 20,
1838 "metadata": {},
1839 "output_type": "execute_result"
1840 }
1841 ],
1842 "source": [
1843 "find_counts(sentences(tokenise(sample_text)), tuple_size=3)"
1844 ]
1845 },
1846 {
1847 "cell_type": "code",
1848 "execution_count": 21,
1849 "metadata": {
1850 "collapsed": false
1851 },
1852 "outputs": [
1853 {
1854 "data": {
1855 "text/plain": [
1856 "('Continuing', 'this', 'process')"
1857 ]
1858 },
1859 "execution_count": 21,
1860 "metadata": {},
1861 "output_type": "execute_result"
1862 }
1863 ],
1864 "source": [
1865 "s = sentences(tokenise(sample_text))[0]\n",
1866 "tuple(s[:3])"
1867 ]
1868 },
1869 {
1870 "cell_type": "code",
1871 "execution_count": 22,
1872 "metadata": {
1873 "collapsed": true
1874 },
1875 "outputs": [],
1876 "source": [
1877 "unaccent_specials = ''.maketrans({\"‘\": \"'\", \"’\": \"'\"})\n",
1878 "def unaccent(text):\n",
1879 "    \"\"\"Remove all accents from letters.\n",
1880 "    It does this by converting the unicode string to decomposed compatibility\n",
1881 "    form, dropping all the combining accents, then decoding the bytes back to text.\n",
1882 "\n",
1883 "    >>> unaccent('hello')\n",
1884 "    'hello'\n",
1885 "    >>> unaccent('HELLO')\n",
1886 "    'HELLO'\n",
1887 "    >>> unaccent('héllo')\n",
1888 "    'hello'\n",
1889 "    >>> unaccent('héllö')\n",
1890 "    'hello'\n",
1891 "    >>> unaccent('HÉLLÖ')\n",
1892 "    'HELLO'\n",
1893 "    \"\"\"\n",
1894 "    translated_text = text.translate(unaccent_specials)\n",
1895 "    return unicodedata.normalize('NFKD', translated_text).\\\n",
1896 "        encode('ascii', 'ignore').\\\n",
1897 "        decode('utf-8')"
1898 ]
1899 },
1900 {
1901 "cell_type": "code",
1902 "execution_count": 23,
1903 "metadata": {
1904 "collapsed": false
1905 },
1906 "outputs": [],
1907 "source": [
1908 "sicp = unaccent(open('sicp.txt').read())\n",
1909 "sicp_starts, sicp_counts = find_counts(sentences(tokenise(sicp)), tuple_size=3)"
1910 ]
1911 },
1912 {
1913 "cell_type": "code",
1914 "execution_count": 24,
1915 "metadata": {
1916 "collapsed": false
1917 },
1918 "outputs": [
1919 {
1920 "data": {
1921 "text/plain": [
1922 "[(('the', 'controller', 'completely'), Counter({'describeof': 1})),\n",
1923 " (('equations', 'imply', 'that'), Counter({'(': 1})),\n",
1924 " (('important', 'point', 'to'), Counter({'note': 1, 'observe': 1})),\n",
1925 " (('separately', 'by', 'each'), Counter({'query': 1})),\n",
1926 " (('to', 'explore', 'variations'), Counter({'of': 1})),\n",
1927 " (('rock', 'songs', '.'), Counter({None: 1})),\n",
1928 " (('software', 'engineers', 'have'), Counter({'the': 1})),\n",
1929 " (('great', 'confusion', ','), Counter({'as': 1})),\n",
1930 " (('increasingly', 'elaborate', 'models'), Counter({'of': 1})),\n",
1931 " (('-', 'lambda1', '))'), Counter({'entry2': 2}))]"
1932 ]
1933 },
1934 "execution_count": 24,
1935 "metadata": {},
1936 "output_type": "execute_result"
1937 }
1938 ],
1939 "source": [
1940 "list(sicp_counts.items())[:10]"
1941 ]
1942 },
1943 {
1944 "cell_type": "code",
1945 "execution_count": 25,
1946 "metadata": {
1947 "collapsed": false
1948 },
1949 "outputs": [
1950 {
1951 "data": {
1952 "text/plain": [
1953 "[(('29', 'There', 'is'), 1),\n",
1954 " (('In', 'such', 'an'), 1),\n",
1955 " (('To', 'answer', 'a'), 1),\n",
1956 " (('Y', '.'), 2),\n",
1957 " (('If', 'f', '('), 1),\n",
1958 " (('This', 'can', 'greatly'), 1),\n",
1959 " (('Design', 'a', 'machine'), 1),\n",
1960 " (('Exercise', '2.8', '.'), 1),\n",
1961 " (('Control', '(', 'how'), 1),\n",
1962 " (('C', '.'), 4)]"
1963 ]
1964 },
1965 "execution_count": 25,
1966 "metadata": {},
1967 "output_type": "execute_result"
1968 }
1969 ],
1970 "source": [
1971 "list(sicp_starts.items())[:10]"
1972 ]
1973 },
1974 {
1975 "cell_type": "code",
1976 "execution_count": 26,
1977 "metadata": {
1978 "collapsed": false
1979 },
1980 "outputs": [
1981 {
1982 "data": {
1983 "text/plain": [
1984 "[None]"
1985 ]
1986 },
1987 "execution_count": 26,
1988 "metadata": {},
1989 "output_type": "execute_result"
1990 }
1991 ],
1992 "source": [
1993 "list(sicp_counts[('Gas', 'Meters', '.')].elements())"
1994 ]
1995 },
1996 {
1997 "cell_type": "code",
1998 "execution_count": 27,
1999 "metadata": {
2000 "collapsed": false
2001 },
2002 "outputs": [
2003 {
2004 "data": {
2005 "text/plain": [
2006 "('The', 'constructor', 'for')"
2007 ]
2008 },
2009 "execution_count": 27,
2010 "metadata": {},
2011 "output_type": "execute_result"
2012 }
2013 ],
2014 "source": [
2015 "random.choice(list(sicp_starts.elements()))"
2016 ]
2017 },
2018 {
2019 "cell_type": "code",
2020 "execution_count": 28,
2021 "metadata": {
2022 "collapsed": false
2023 },
2024 "outputs": [
2025 {
2026 "data": {
2027 "text/plain": [
2028 "('+', 'or')"
2029 ]
2030 },
2031 "execution_count": 28,
2032 "metadata": {},
2033 "output_type": "execute_result"
2034 }
2035 ],
2036 "source": [
2037 "t = ('as', '+')\n",
2038 "t[1:] + ('or', )"
2039 ]
2040 },
2041 {
2042 "cell_type": "code",
2043 "execution_count": 29,
2044 "metadata": {
2045 "collapsed": true
2046 },
2047 "outputs": [],
2048 "source": [
2049 "def markov_item(starts, counts, max_len=None):\n",
2050 "    valid_found = False\n",
2051 "    while not valid_found:\n",
2052 "        i = 0\n",
2053 "        current = random.choice(list(starts.elements()))\n",
2054 "        chain = list(current)\n",
2055 "        next_item = random.choice(list(counts[current].elements()))\n",
2056 "        while next_item and ((max_len and i < max_len) or not max_len):\n",
2057 "            chain += [next_item]\n",
2058 "            current = current[1:] + (next_item, )\n",
2059 "            i += 1\n",
2060 "            next_item = random.choice(list(counts[current].elements()))\n",
2061 "            # print(chain, ':', current, ':', list(counts[current].elements()), ':', next_item)\n",
2062 "        if max_len and i < max_len:\n",
2063 "            valid_found = True\n",
2064 "        if not max_len:\n",
2065 "            valid_found = True\n",
2066 "    return chain"
2067 ]
2068 },
2069 {
2070 "cell_type": "code",
2071 "execution_count": 30,
2072 "metadata": {
2073 "collapsed": false
2074 },
2075 "outputs": [
2076 {
2077 "data": {
2078 "text/plain": [
2079 "'46 Alternatively , multiprocessing computers provide instructions that support atomic operations directly in hardware .'"
2080 ]
2081 },
2082 "execution_count": 30,
2083 "metadata": {},
2084 "output_type": "execute_result"
2085 }
2086 ],
2087 "source": [
2088 "' '.join(markov_item(sicp_starts, sicp_counts, 500))"
2089 ]
2090 },
2091 {
2092 "cell_type": "code",
2093 "execution_count": 31,
2094 "metadata": {
2095 "collapsed": false
2096 },
2097 "outputs": [
2098 {
2099 "data": {
2100 "text/plain": [
2101 "Counter({'.': 1})"
2102 ]
2103 },
2104 "execution_count": 31,
2105 "metadata": {},
2106 "output_type": "execute_result"
2107 }
2108 ],
2109 "source": [
2110 "sicp_counts['the', 'dispatch', 'procedure']"
2111 ]
2112 },
2113 {
2114 "cell_type": "code",
2115 "execution_count": 32,
2116 "metadata": {
2117 "collapsed": false
2118 },
2119 "outputs": [
2120 {
2121 "data": {
2122 "text/plain": [
2123 "'Continuing this process , we obtain better and better approximations to the square root .'"
2124 ]
2125 },
2126 "execution_count": 32,
2127 "metadata": {},
2128 "output_type": "execute_result"
2129 }
2130 ],
2131 "source": [
2132 "' '.join(markov_item(one_s_starts, one_s_counts, 500))"
2133 ]
2134 },
2135 {
2136 "cell_type": "code",
2137 "execution_count": 33,
2138 "metadata": {
2139 "collapsed": false
2140 },
2141 "outputs": [
2142 {
2143 "data": {
2144 "text/plain": [
2145 "(defaultdict(collections.Counter,\n",
2146 " {(',', 'we'): Counter({'obtain': 1}),\n",
2147 " ('Continuing', 'this'): Counter({'process': 1}),\n",
2148 " ('and', 'better'): Counter({'approximations': 1}),\n",
2149 " ('approximations', 'to'): Counter({'the': 1}),\n",
2150 " ('better', 'and'): Counter({'better': 1}),\n",
2151 " ('better', 'approximations'): Counter({'to': 1}),\n",
2152 " ('obtain', 'better'): Counter({'and': 1}),\n",
2153 " ('process', ','): Counter({'we': 1}),\n",
2154 " ('root', '.'): Counter({None: 1}),\n",
2155 " ('square', 'root'): Counter({'.': 1}),\n",
2156 " ('the', 'square'): Counter({'root': 1}),\n",
2157 " ('this', 'process'): Counter({',': 1}),\n",
2158 " ('to', 'the'): Counter({'square': 1}),\n",
2159 " ('we', 'obtain'): Counter({'better': 1})}),\n",
2160 " Counter({('Continuing', 'this'): 1}))"
2161 ]
2162 },
2163 "execution_count": 33,
2164 "metadata": {},
2165 "output_type": "execute_result"
2166 }
2167 ],
2168 "source": [
2169 "one_s_counts, one_s_starts"
2170 ]
2171 },
2172 {
2173 "cell_type": "code",
2174 "execution_count": 34,
2175 "metadata": {
2176 "collapsed": false
2177 },
2178 "outputs": [],
2179 "source": [
2180 "def sentence_join(tokens):\n",
2181 "    sentence = ''\n",
2182 "    for t in tokens:\n",
2183 "        if t[-1] not in \".,:;')-\":\n",
2184 "            sentence += ' '\n",
2185 "        sentence += t\n",
2186 "    return sentence.strip()"
2187 ]
2188 },
2189 {
2190 "cell_type": "code",
2191 "execution_count": 35,
2192 "metadata": {
2193 "collapsed": false
2194 },
2195 "outputs": [
2196 {
2197 "data": {
2198 "text/plain": [
2199 "'4, we can usually do better by taking advantage of additional structure that may be represented in two almost equivalent ways: and He has written the following two rules, we can find integers not divisible by 7 simply by accessing elements of this stream: ( define input- 1 input- 2 to 1 and allow the values to which they are listed.'"
2200 ]
2201 },
2202 "execution_count": 35,
2203 "metadata": {},
2204 "output_type": "execute_result"
2205 }
2206 ],
2207 "source": [
2208 "sentence_join(markov_item(sicp_starts, sicp_counts, 500))"
2209 ]
2210 },
2211 {
2212 "cell_type": "code",
2213 "execution_count": 36,
2214 "metadata": {
2215 "collapsed": false
2216 },
2217 "outputs": [
2218 {
2219 "data": {
2220 "text/plain": [
2221 "\"3. We model state with local state variables describing the actual object's state. Since ? x is bound in the frame. 61 Interest in logic programming peaked during the early 80s when the Japanese government began an ambitious project aimed at building superfast computers optimized to run logic programming languages. 4.1. We will compile the definition of f and start the machine, and so on, his modified eval will usually check fewer clauses than the original eval before identifying the type of the expression. This makes no difference in the values returned by the call to make- operation- exp- label dest)))) ( lambda ( pair) ( prime ? (+ ( car pair) ( cadr s))) ( set- signal ! input- 1 ( right- branch set)) ( element- of list2))) ( require ( null ? rest) result ( iter ( stream- cdr, and so will work with a system that performs arithmetic operations on complex numbers and ordinary numbers should be the sequence ( enumerate- interval stream- filter examines the stream- car s) ( if ( not ( job ? x ( computer ? type) could be constructed by evaluating the expression ( f ? y) a), and the remaining pairs: 67 Observe that the expression is a definition, so it calls compile- definition to compile code to compute the gcd when the rational numbers are constructed. 1. In order to keep the procedure general, we can print rational numbers by printing the sequence of items in a list and generate the list of words for the required part, thus preserving the illusion that all the possible values of an inexact quantity). The result of adding, subtracting, multiplying, or dividing two intervals is a function only of the widths of the intervals, it is irrelevant what a, b, x, and use this to define gcd- terms a b)))) Using the substitution model cannot do this. In general, however, we can install in our simulation program a meter'' that measures the size of the exponent we can compute n ! using the recursive factorial procedure: 27 (( lambda ( a b c) '( d e f));;; M- Eval input: ( list- union ( list first- reg) ( registers- needed relations, computing in terms of non- strict.\""
2222 ]
2223 },
2224 "execution_count": 36,
2225 "metadata": {},
2226 "output_type": "execute_result"
2227 }
2228 ],
2229 "source": [
2230 "' '.join(sentence_join(markov_item(sicp_starts, sicp_counts, 500)) for _ in range(10))"
2231 ]
2232 },
2233 {
2234 "cell_type": "code",
2235 "execution_count": 37,
2236 "metadata": {
2237 "collapsed": false
2238 },
2239 "outputs": [],
2240 "source": [
2241 "kjb = unaccent(open('king-james-bible.txt').read())\n",
2242 "kjb_starts, kjb_counts = find_counts(sentences(tokenise(kjb)), tuple_size=3)"
2243 ]
2244 },
2245 {
2246 "cell_type": "code",
2247 "execution_count": 38,
2248 "metadata": {
2249 "collapsed": false
2250 },
2251 "outputs": [
2252 {
2253 "data": {
2254 "text/plain": [
2255 "'12: 1 Help, LORD; for thy God helpeth thee. 66: 8 O bless our God, and that ye might fear the LORD from this time forth and for evermore. 9: 30 If there be laid on him a scarlet robe. 16: 38 The son of Amzi, the son of Micah, as he hath said, and be dandled upon her knees. 4: 12 And David commanded to gather together the strangers that came out of the land, saying, Because I drew him out of the temple which was in Bethlehem. 3: 24 Then he is gracious, and will give ten tribes to thee: We have such an high priest became us, who knew no sin; that we should die in the pit, and his oath unto Isaac; 105: 15 Saying, Touch not mine anointed, and be astonied one with another what they might do to Jesus. 10: 8 Neither let us commit fornication, as some of them committed, and of the south shall come into the land which I sware unto Abraham, As for Sarai thy wife, and upon the great toe of his right foot, upon the four corners of one base: and the truth shall make you free. 8: 15 We who are Jews by nature, and wert graffed contrary to nature into a good olive tree: how much more do his friends go far from him ? he that formed the earth and the heaven, they go to be with Sarah after the manner of men were they whom ye slew at Tabor ? And they said, Seven. 13: 26 Therefore their inhabitants were of small power, they were thy merchants in all sorts of wine: 10: 2 Thus saith the Lord GOD, I will get me honour upon Pharaoh, and let it be granted to the Jews: for the letter killeth, but the Holy Ghost. 13: 6 Then David put garrisons in Syria of Damascus: and when he came to David to Hebron, so he would also finish in you the same things to you in truth, in judgment, and equity cannot enter.'"
2256 ]
2257 },
2258 "execution_count": 38,
2259 "metadata": {},
2260 "output_type": "execute_result"
2261 }
2262 ],
2263 "source": [
2264 "' '.join(sentence_join(markov_item(kjb_starts, kjb_counts, 500)) for _ in range(10))"
2265 ]
2266 },
2267 {
2268 "cell_type": "code",
2269 "execution_count": 39,
2270 "metadata": {
2271 "collapsed": true
2272 },
2273 "outputs": [],
2274 "source": [
2275 "all_starts = sicp_starts + kjb_starts"
2276 ]
2277 },
2278 {
2279 "cell_type": "code",
2280 "execution_count": 40,
2281 "metadata": {
2282 "collapsed": false,
2283 "scrolled": false
2284 },
2285 "outputs": [
2286 {
2287 "data": {
2288 "text/plain": [
2289 "[(('119', ':', '112'), 1),\n",
2290 " (('29', 'There', 'is'), 1),\n",
2291 " (('To', 'answer', 'a'), 1),\n",
2292 " (('Y', '.'), 2),\n",
2293 " (('36', ':', '36'), 3),\n",
2294 " (('Ye', 'shall', 'seek'), 1),\n",
2295 " (('And', 'in', 'like'), 1),\n",
2296 " (('If', 'f', '('), 1),\n",
2297 " (('3', ':', '47'), 1),\n",
2298 " (('27', ':', '24'), 7),\n",
2299 " (('26', ':', '28'), 5),\n",
2300 " (('Control', '(', 'how'), 1),\n",
2301 " (('16', ':', '43'), 3),\n",
2302 " (('They', 'should', 'be'), 1),\n",
2303 " (('139', ':', '9'), 1),\n",
2304 " (('So', 'he', 'drew'), 1),\n",
2305 " (('If', 'the', 'symbol'), 1),\n",
2306 " (('RC', 'should', 'take'), 1),\n",
2307 " (('?', 'type', ')'), 3),\n",
2308 " (('How', 'does', 'the'), 1)]"
2309 ]
2310 },
2311 "execution_count": 40,
2312 "metadata": {},
2313 "output_type": "execute_result"
2314 }
2315 ],
2316 "source": [
2317 "list(all_starts.items())[:20]"
2318 ]
2319 },
2320 {
2321 "cell_type": "code",
2322 "execution_count": 41,
2323 "metadata": {
2324 "collapsed": true
2325 },
2326 "outputs": [],
2327 "source": [
2328 "all_counts = collections.defaultdict(collections.Counter)\n",
2329 "for k in sicp_counts:\n",
2330 "    all_counts[k] = sicp_counts[k].copy()\n",
2331 "for k in kjb_counts:\n",
2332 "    all_counts[k] += kjb_counts[k].copy()"
2333 ]
2334 },
2335 {
2336 "cell_type": "code",
2337 "execution_count": 42,
2338 "metadata": {
2339 "collapsed": false
2340 },
2341 "outputs": [
2342 {
2343 "data": {
2344 "text/plain": [
2345 "\"In becoming an expert programmer, just as our embedded Lisp evaluator uses primitives and control structure from the underlying Scheme system to perform arithmetic with rational numbers. What are these constants ? Similarly, find the ratios of the stack required to compute n ! by specifying that we first multiply 1 by 2, 3, 6, 10, 15,.... Exercise 3.56. Describe what kind of information ( patterns and frames) is included in this history, and how abstraction preserves for us the flexibility to consider alternate implementations. When evaluation is complete, x will be 1, y will be 2, 3 } could be represented as a pair of numbers: the x coordinate and the y coordinate. 43- 112. The timing diagram in figure 3.29, where Peter changes the account balance between the times when Paul accesses the account only very rarely. The painter that draws a line on the screen between two specified points. ( save continue) ( save n); save factorial procedure Figure 5.17: Compilation of the definition of the primitive procedure objects, so long as apply can identify and apply them by using the order of events where balance starts at 100, Peter withdraws 10, Paul withdraws 25, and yet the final value of balance. The entries at ev- application or ev- begin ev- definition- 1)) after- gcd- 2 Figure 5.10: Assigning labels to the continue register, since each level'' of the polynomials can be integers, rational numbers, complex numbers are implemented in terms of n for the total number of leaves of a tree branch- dest breakpoint broken heart bug capturing a free variable car ( primitive procedure) apply- dispatch ( test ( op definition?) ( reg ( reg proc)) ( goto ( label read- eval- print- loop)) ( else ( actual- value rather than eval: ( define ( add- streams ( scale- list integrand dt) int))) The interpreter's ability to deal with both of these can be used as a preparation for work in artificial intelligence. When we consider processes that operate on this representation.\""
2346 ]
2347 },
2348 "execution_count": 42,
2349 "metadata": {},
2350 "output_type": "execute_result"
2351 }
2352 ],
2353 "source": [
2354 "' '.join(sentence_join(markov_item(sicp_starts, sicp_counts, 500)) for _ in range(10))"
2355 ]
2356 },
2357 {
2358 "cell_type": "code",
2359 "execution_count": 43,
2360 "metadata": {
2361 "collapsed": false
2362 },
2363 "outputs": [
2364 {
2365 "data": {
2366 "text/plain": [
2367 "'8: 16 ( For six months did Joab remain there with all Israel, and he shall serve: I have not shewed them. 25: 34 But when the blade was sprung up, it withered away, because they were accursed: neither will I tempt the LORD. 16: 44 Behold, he breaketh down, and the land of thy kindred that is called a brother be a fornicator, or covetous, or an Hebrew woman, be sold unto your enemies for bondmen and bondwomen unto you: ye shall pass before your brethren the children of Israel took Amaziah king of Judah went up to eat and drink before him; and he gave him to wife Asenath the daughter of Saul, that Saul put the people in whose heart are the ways of my people: and he took her, and yet couldest not be satisfied; and thy raiment was of fine gold, amounting to six hundred talents. 32: 6 And Israel said unto him, I know thee by name, hath lifted up his eyes, and when he saw that, behold, I have put off my sackcloth, and girded himself. 28: 9 But the wise took oil in their vessels with their lamps, and took counsel how they might destroy him. And Tobiah sent letters to put me into the dust of your city, which were come again out of the camp, all Israel have transgressed thy law, nor hearkened unto me, even he lifted up his eyes, and look not back: bring my sons from far, their silver and their gold shall be able to redeem it, then it shall be given unto you. 37: 3 Trust in the LORD, that obeyeth the voice of singing men and singing women. 19: 10 Delight is not seemly for a fool: much less do lying lips a prince. 36: 18 Because of the multitude of thy strangers shall be like small dust, and made him look up: and it came to pass the selfsame day the hand of the LORD, that hate thee. 13: 58 And they called for Samson out of the midst of us, and shall rain it upon him while he is near: 55: 11 So that the face of his brother whom he slew at one time.'"
2368 ]
2369 },
2370 "execution_count": 43,
2371 "metadata": {},
2372 "output_type": "execute_result"
2373 }
2374 ],
2375 "source": [
2376 "' '.join(sentence_join(markov_item(kjb_starts, kjb_counts, 500)) for _ in range(10))"
2377 ]
2378 },
2379 {
2380 "cell_type": "code",
2381 "execution_count": 44,
2382 "metadata": {
2383 "collapsed": false
2384 },
2385 "outputs": [
2386 {
2387 "data": {
2388 "text/plain": [
2389 "\"7: 3 And when I rose in the morning as he returned into the host, and went forth before them all, saying, Behold, I have brought him forth abroad, and said, Thy servant Uriah the Hittite: thirty and seven thousand. 135: 8 Who also declared unto us your love in the truth; It is expedient for you, that ye should be guilty. 68: 10 Thy cheeks are comely with rows of jewels, thy neck with chains of gold. 2: 25 And he set up the horn. 19: 14 My kinsfolk have failed, and my people love to have it return 0. 5: 14 And when Eli heard the noise of the taking of Babylon the earth is earthly, and speaketh uprightly; he that toucheth the land, concerning the vessels that remain in this city, and had beaten the graven images into powder, and strawed it upon the altar: these are the kings of Israel ? 16: 10 And I took the little book. 17: 8 Upright men shall be made smooth; 3: 2 And the Lord make you to increase and abound in love one toward another; men with men working that which is sold shall remain in the day when I drink it new with you in weakness, and in all thy coast seven days; and also after that, when they spake unto them, Hearken unto the voice of mirth, and the seats of them that boil, where the body of Christ; that we may put them to shame that hated us. 2: 2 Then said the LORD, and of brass, and carried the people away into Babylon unto Christ are fourteen generations; and thou shalt live. 27: 13 Why dost thou strive against him ? or if thy transgressions be multiplied, and all the curses that are written in the king's dale: for he hath been worth a double hired servant to thee, and thy silver and thy gold is mine; and she shall turn to you again. 18: 43 And the LORD spake unto Moses, saying, Have thou nothing to do with thee, saith the Lord GOD.\""
2390 ]
2391 },
2392 "execution_count": 44,
2393 "metadata": {},
2394 "output_type": "execute_result"
2395 }
2396 ],
2397 "source": [
2398 "' '.join(sentence_join(markov_item(all_starts, all_counts, 500)) for _ in range(10))"
2399 ]
2400 },
2401 {
2402 "cell_type": "code",
2403 "execution_count": 45,
2404 "metadata": {
2405 "collapsed": false
2406 },
2407 "outputs": [],
2408 "source": [
2409 "all2 = unaccent(open('sicp-trimmed.txt').read() + open('king-james-bible.txt').read())\n",
2410 "all2_starts, all2_counts = find_counts(sentences(tokenise(all2)), tuple_size=2)"
2411 ]
2412 },
2413 {
2414 "cell_type": "code",
2415 "execution_count": 46,
2416 "metadata": {
2417 "collapsed": false,
2418 "scrolled": true
2419 },
2420 "outputs": [
2421 {
2422 "data": {
2423 "text/plain": [
2424 "['Typical memory systems provide a driver loop.',\n",
2425 " '2: 10 Which doeth great wonders, ye were sealed twelve thousand.',\n",
2426 " \"59: 13 ( 3 4)) ( fib n) ( newline)) We must evaluate (* x y) (/ (+ x 4))) x) ( let (( avpt (/ (+ ( sum term a) ( registers- needed seq1)) ( branch ( label ev- sequence '( proc argl continue)); linkage code machine- model > < value>. Thus, she came trembling, he that ruled throughout the seven seals thereof: it shall be forty and five hundred and ten thousand in breadth.\",\n",
2427 " '9: 22 And the chief captain that he shall do my prophets no harm: 35 And the firstborn, Jehush the second time the LORD: that I visit them, If thou cast down, even since the assembler to store the current value of the sword, and to which denominators.',\n",
2428 " 'And he blessed Joseph, saying, They hated me, Son of man: preserve my life, or a product of the mountain which is escaped ? And the children of Israel went thither a whoring after their tongues, in the dispatch as in a 1954 paper that essentially founded the earth are burned without inhabitant, and it was not, to find the value of a combination operation cross- type information.',\n",
2429 " '35: 14 And that first covenant had also seen the Father hath loved us, and mourn; 61: 8 But there is neither bond nor free: for that the brook Kidron.',\n",
2430 " '22: 18 But he that confirmeth not all of them in the procedure object followed by code to check whether the car pointer of the winepress shall not do any thing as of the congregation of the king had heard that it is symmetrical-- variables are transformed in this case, we will shew who are under the shadow of thy body, yet shall know that the LORD.',\n",
2431 " '1: 16 Wash you, an he lamb without blemish: 29 Then fled Moses at this: primitive expressions such as C.',\n",
2432 " '22: 4: 6 Then shall he appear the sign of a man to his sons that shall vex thee, O Judah, in which my people, that Pharaoh should have sorrow from them that did cleave unto the LORD shall renew their strength, O Media; all that pertained to Saul ? choose you, having never learned ? 7: 1 And the remnant of the fire shall ever be removed, and exhort.',\n",
2433 " '12: 42 For I was an angel come down from the lions ? The most common way to do unto the king said unto the king of Shimron, and slew them at all times, to identify all the army of the house, and cause the horn of oil, saith the LORD said unto him, Nay; but the LORD followed them.',\n",
2434 " '6: 2 And he said unto them; and it was told is true, then that we have students use the second seal, I have decked my bed I sought him, Is Saul also went his way, and the transgression of an hundred talents of gold, and in the way he sees it.',\n",
2435 " 'C.',\n",
2436 " '11: 11 If ye have gathered the wind ceased: and David his father: and it was hid three months in Jerusalem.',\n",
2437 " 'Next, the same with a great forsaking in the world with time.',\n",
2438 " '13: 10 Yet was she slain.',\n",
2439 " '18: 30 And Mephibosheth had a long blast with the trumpets, and cast them down into Egypt.',\n",
2440 " 'Thus make- queue, a third type.',\n",
2441 " '( Compare the two patterns (? x c ? x should be apparent that the primitive elements of its arguments, the selectors they call their lands; every one for his mouth, and the candlestick, and filled them with the voice of Israel, to him, and how he had delight in lies: they shall light the hidden wisdom, nor did him honour at his words: and the nail, and on the eighth day; and there be laid in the six hundred men; leave us not love in the face of the Lord, behold, men of Israel; and he shall eat the fruit of the fat of rams, of the air; the family of the prophets, that we have not profited them that they might not lift up thine eyes be open, and teachest him out of thy counsels of the blood upon him.',\n",
2442 " '3: 33 They give drink to every good work.',\n",
2443 " '14: 8 For he that is an everlasting covenant, whom God hath remembered her iniquities.',\n",
2444 " '46: 2 When I brake the bands of their sacrifices, which brought thee out of the nations be blessed.',\n",
2445 " '4: 10 Arise ye, when thou thyself art a God of Abraham, I was an hundred forty and five.',\n",
2446 " '2: 17 And David said, How many hired servants of Saul the son of Israel eat not any that were virgins apparelled.',\n",
2447 " '8: 15 Every raven after his servant with the edge of the altar round about Gibeah.',\n",
2448 " '32: 31 Behold, thou hast made void the counsel of peace, without changing the thunk once the longsuffering of our God.',\n",
2449 " '2: 21 Have pity upon the earth; and he had commanded him, saying, So is he which baptizeth with the notion of a variable, we will change make- instruction- sequences is thus the empty frame, [ 2 ] message- passing techniques developed in conjunction with higher- order evaluation applicative order as a branch, and begat sons and their inheritance be given to thy fathers hath given me knowledge of him.',\n",
2450 " '8: 10 And Judah saw his sons.',\n",
2451 " 'Proof of correctness of the priests praised the LORD.',\n",
2452 " '71: 19 I know thy works.',\n",
2453 " '12: 44 And Shema begat Raham, the son of Josedech, the dart, nor make any manner of sickness and all the people that they should believe on him, Let her alone: if so be that they were come from the face of the nations of the going up to the Huffman tree of figure 5.12 and examine the state of that value leads to the LORD for evermore, and in the midst of Israel numbered, and ointments, and twelve oxen, that they should be the Son, go forth as a tree non- strict nondeterminism, we can tell whether the bindings in the evaluator machine, and followed the LORD sent an angel standing in the house of feasting and joy; he hath taken increase: but I obtained mercy; that they without us: for thou art good in the midst of the expression ( and ( not (= ( remainder ( square x) Figure 3.12: Lists x: (( a b) ( first- class elements in language design who were fifty and six.']"
2454 ]
2455 },
2456 "execution_count": 46,
2457 "metadata": {},
2458 "output_type": "execute_result"
2459 }
2460 ],
2461 "source": [
2462 "[sentence_join(markov_item(all2_starts, all2_counts, 500)) for _ in range(30)]"
2463 ]
2464 },
2465 {
2466 "cell_type": "code",
2467 "execution_count": 47,
2468 "metadata": {
2469 "collapsed": false
2470 },
2471 "outputs": [],
2472 "source": [
2473 "sicp_lovecraft = unaccent(open('sicp-trimmed.txt').read() + open('lovecraft.txt').read())\n",
2474 "sl2_starts, sl2_counts = find_counts(sentences(tokenise(sicp_lovecraft)), tuple_size=2)"
2475 ]
2476 },
2477 {
2478 "cell_type": "code",
2479 "execution_count": 48,
2480 "metadata": {
2481 "collapsed": false
2482 },
2483 "outputs": [
2484 {
2485 "data": {
2486 "text/plain": [
2487 "'But on the moon shone down cold through the unseen tumbler meant a danger not to be increasing in vividness and darkened with dread of opening it or descend the wide appearance of this treatment of not and lisp- value rather than practical examples in this decrepit edifice. There were gods and the receiver, and shuddered. Bothersome forms, and the hellish image; but it is worth correcting. Three coffin- shaped man with Oriental eyes has said that Aspinwall had died of it. With the extra lens I could not have written the following procedures. The telepathic messages had not yet sufficiently trained to do with queer oils as Marceline had always done. The inlaid doors and the pc: ( define ( analyze- quoted exp) ( let (( x 3- 10 I have ever shared. Example: Arithmetic Operations The task seemed known to me in this farther void of fear and triumph seemed to evoke. Only after such a loathsome cost, and the awful concept of a branch instruction at the ends of the equator had been there and engulfed all the efforts of sturdy Colonial tenants and dewy rose- wreathed world of somnolent cerebration. Sir Wade would speak of our physical creation.'"
2488 ]
2489 },
2490 "execution_count": 48,
2491 "metadata": {},
2492 "output_type": "execute_result"
2493 }
2494 ],
2495 "source": [
2496 "\" \".join(sentence_join(markov_item(sl2_starts, sl2_counts, 500)) for _ in range(10))"
2497 ]
2498 },
2499 {
2500 "cell_type": "code",
2501 "execution_count": 49,
2502 "metadata": {
2503 "collapsed": false
2504 },
2505 "outputs": [
2506 {
2507 "data": {
2508 "text/html": [
2509 "<p>\" I buried myself in a fearful daze out into the grounds, and especially down cellar, and from what I went with them gold- green stone idol found. We had all been rather jovial, and a general alarm. We will also modify the environment as context for evaluation of the creatures to invade a lethal stupor which wards off madness by dulling the memory of those books wish a word or a 1) (* x (+ x 2))) produces the average citizen. The names of Akeley and play, and produces a painter; therefore, is called. There are, which one might spy through the remaining stretch of road, and as she had said in a scientific purpose, so that its cdr is the empty stream or a labyrinth of inexplicably fashioned metal under a certain transcribed witch- woman- her name in life imagined, the sleeping plain. To form even a sign of man's world and perhaps sinister history, philosophy, and was placed on exhibition early in the water change- detector sense- data and primitive mammals, and the like. His walks were late ( as that of the forbidden ? The rifling of Ezra Weeden, who had come to recognize the will and buried to the open sea some were incised and wholly at the hospital. Trying the match is basically the same way as for the tomb of man and his hints occasionally became concrete. Curwen's sailors would then pose as a free hand and radiating out in the neighbourhood noted. He had known Al always, were freely spoken of as two separate problems: what'' to a tremendous effect on me, and even the myriad towering stories had fallen through, revealing a wide range.</h1>"
2510 ],
2511 "text/plain": [
2512 "<IPython.core.display.HTML object>"
2513 ]
2514 },
2515 "metadata": {},
2516 "output_type": "display_data"
2517 }
2518 ],
2519 "source": [
2520 "display(HTML('<p>' +\n",
2521 "             \" \".join(sentence_join(markov_item(sl2_starts, sl2_counts, 500)) for _ in range(10)) +\n",
2522 "             '</p>'))"
2523 ]
2524 },
2525 {
2526 "cell_type": "code",
2527 "execution_count": 57,
2528 "metadata": {
2529 "collapsed": false,
2530 "scrolled": true
2531 },
2532 "outputs": [
2533 {
2534 "data": {
2535 "text/plain": [
2536 "(8177, 249224)"
2537 ]
2538 },
2539 "execution_count": 57,
2540 "metadata": {},
2541 "output_type": "execute_result"
2542 }
2543 ],
2544 "source": [
2545 "sum(sicp_starts.values()), sum(sum(c.values()) for c in sicp_counts.values())"
2546 ]
2547 },
2548 {
2549 "cell_type": "code",
2550 "execution_count": 58,
2551 "metadata": {
2552 "collapsed": false,
2553 "scrolled": true
2554 },
2555 "outputs": [
2556 {
2557 "data": {
2558 "text/plain": [
2559 "(26374, 958228)"
2560 ]
2561 },
2562 "execution_count": 58,
2563 "metadata": {},
2564 "output_type": "execute_result"
2565 }
2566 ],
2567 "source": [
2568 "sum(kjb_starts.values()), sum(sum(c.values()) for c in kjb_counts.values())"
2569 ]
2570 },
2571 {
2572 "cell_type": "code",
2573 "execution_count": 91,
2574 "metadata": {
2575 "collapsed": false,
2576 "scrolled": true
2577 },
2578 "outputs": [
2579 {
2580 "data": {
2581 "text/plain": [
2582 "(24560, 683961)"
2583 ]
2584 },
2585 "execution_count": 91,
2586 "metadata": {},
2587 "output_type": "execute_result"
2588 }
2589 ],
2590 "source": [
2591 "lovecraft = unaccent(open('lovecraft-trimmed.txt').read())\n",
2592 "lovecraft_starts, lovecraft_counts = find_counts(sentences(tokenise(lovecraft)), tuple_size=2)\n",
2593 "sum(lovecraft_starts.values()), sum(sum(c.values()) for c in lovecraft_counts.values())"
2594 ]
2595 },
2596 {
2597 "cell_type": "code",
2598 "execution_count": 92,
2599 "metadata": {
2600 "collapsed": false
2601 },
2602 "outputs": [
2603 {
2604 "data": {
2605 "text/plain": [
2606 "3.225388284211814"
2607 ]
2608 },
2609 "execution_count": 92,
2610 "metadata": {},
2611 "output_type": "execute_result"
2612 }
2613 ],
2614 "source": [
2615 "sum(kjb_starts.values()) / sum(sicp_starts.values())"
2616 ]
2617 },
2618 {
2619 "cell_type": "code",
2620 "execution_count": 93,
2621 "metadata": {
2622 "collapsed": false
2623 },
2624 "outputs": [
2625 {
2626 "data": {
2627 "text/plain": [
2628 "3.8448464032356435"
2629 ]
2630 },
2631 "execution_count": 93,
2632 "metadata": {},
2633 "output_type": "execute_result"
2634 }
2635 ],
2636 "source": [
2637 "sum(sum(c.values()) for c in kjb_counts.values()) / sum(sum(c.values()) for c in sicp_counts.values())"
2638 ]
2639 },
2640 {
2641 "cell_type": "code",
2642 "execution_count": 94,
2643 "metadata": {
2644 "collapsed": false
2645 },
2646 "outputs": [
2647 {
2648 "data": {
2649 "text/plain": [
2650 "(3.0035465329582975, 2.7443625012037365)"
2651 ]
2652 },
2653 "execution_count": 94,
2654 "metadata": {},
2655 "output_type": "execute_result"
2656 }
2657 ],
2658 "source": [
2659 "sum(lovecraft_starts.values()) / sum(sicp_starts.values()), \\\n",
2660 "sum(sum(c.values()) for c in lovecraft_counts.values()) / sum(sum(c.values()) for c in sicp_counts.values())"
2661 ]
2662 },
2663 {
2664 "cell_type": "code",
2665 "execution_count": 95,
2666 "metadata": {
2667 "collapsed": true
2668 },
2669 "outputs": [],
2670 "source": [
2671 "def scale_merge(left_starts, left_start_scale, left_counts, left_count_scale,\n",
2672 "                right_starts, right_start_scale, right_counts, right_count_scale):\n",
2673 "    starts = collections.Counter()\n",
2674 "    counts = collections.defaultdict(collections.Counter)\n",
2675 "\n",
2676 "    for k, n in left_starts.items():\n",
2677 "        starts[k] = n * left_start_scale\n",
2678 "    for k, n in right_starts.items():\n",
2679 "        starts[k] += n * right_start_scale\n",
2680 "\n",
2681 "    for k in left_counts:\n",
2682 "        for j in left_counts[k]:\n",
2683 "            counts[k][j] = left_counts[k][j] * left_count_scale\n",
2684 "\n",
2685 "    for k in right_counts:\n",
2686 "        for j in right_counts[k]:\n",
2687 "            counts[k][j] += right_counts[k][j] * right_count_scale\n",
2688 "\n",
2689 "    return starts, counts"
2690 ]
2691 },
2692 {
2693 "cell_type": "code",
2694 "execution_count": 96,
2695 "metadata": {
2696 "collapsed": true
2697 },
2698 "outputs": [],
2699 "source": [
2700 "sk_starts, sk_counts = scale_merge(sicp_starts, 3, sicp_counts, 4, kjb_starts, 1, kjb_counts, 1)"
2701 ]
2702 },
2703 {
2704 "cell_type": "code",
2705 "execution_count": 97,
2706 "metadata": {
2707 "collapsed": false
2708 },
2709 "outputs": [
2710 {
2711 "data": {
2712 "text/html": [
2713 "<p>The voltage response v of the circuit ( summarized by v C, the Lisp evaluator would be to process the query ( and ( supervisor ? x ? middle- manager in the wheel rule of section 4.4.</p><p>3: 11 Therefore thy gates shall be burned therein.</p><p>36: 3 And say to the forest of the south shall be strong, and eat bread.</p><p>The servants said unto him, My LORD, if thou go with us; and he gave him, and saith unto them, ready to depart on the morrow, which was set over the affairs of this life, and out of the hand that wrote.</p><p>5: 6 And the sword shall abide on his cities, and with an extreme burning, and the vintage shall reach unto Azal: yea, though many false witnesses came, yet found they none.</p><p>He adds all the weekly meetings of the firm to the Microshaft data base by expressions of the query system repeatedly reads input expressions.</p><p>Specifically: The evaluator enables us to increase the modularity of our systems by encapsulating, or hiding,'' parts of the earth, that the LORD cast out before the LORD, as the LORD commanded to give unto her the cup of the Lord Jesus Christ, was in number three hundred thousand choice men, able to deliver you from his iniquities.</p><p>Cy agrees that Ben is right about the behavior of a quantity x as a function of n, to be applied in the same way as in the night visions, and your goodliest young men, and they slew all the males from a month old and upward, from the east side unto the west side, a portion for Asher.</p><p>13: 1 And the LORD said, Bring the portion which I gave thee, of the children of Israel before their idols, and all that is in use, existing programs that define procedures with these names bound as local variables.</p><p>For example, the complex number z = x + iy ( where i 2 =- 1) can be expressed as sequence operations.</p>"
2714 ],
2715 "text/plain": [
2716 "<IPython.core.display.HTML object>"
2717 ]
2718 },
2719 "metadata": {},
2720 "output_type": "display_data"
2721 }
2722 ],
2723 "source": [
2724 "display(HTML(cat('<p>' + sentence_join(markov_item(sk_starts, sk_counts, 500)) + '</p>' for _ in range(10))))"
2725 ]
2726 },
2727 {
2728 "cell_type": "code",
2729 "execution_count": 98,
2730 "metadata": {
2731 "collapsed": true
2732 },
2733 "outputs": [],
2734 "source": [
2735 "sl_starts, sl_counts = scale_merge(sicp_starts, 3, sicp_counts, 2, lovecraft_starts, 1, lovecraft_counts, 1)"
2736 ]
2737 },
2738 {
2739 "cell_type": "code",
2740 "execution_count": 99,
2741 "metadata": {
2742 "collapsed": false
2743 },
2744 "outputs": [
2745 {
2746 "data": {
2747 "text/html": [
2748 "<p>Akeley had been a nightmare of buzzing voices, and a partial relief from the second signal ordering a general plan was to see him.</p><p>From the handwriting ? Do you wonder how many farmhouses burnt to ashes.</p><p>1 that the interpreter runs, it follows a process that must be made to a system in which complex numbers are naturally represented as ordered pairs.</p><p>Then the GCD is the other argument for * ( assign proc ( op ( opcompiled- procedure- entry) ( let (( segments ( segments agenda) ( cdr z)) ( append (( save, first- reg) ( modifies- register ? modularity, [ 2 ] unify- match query- pattern frame) ( let (( t1 -> t2 ( apply- generic op ( t1 -> t2 ( get- register- contents < machine- model > < register- name > ( op vector- set!) ( reg new- cdrs) ( reg env));; begin actual procedure body ( save continue) ( goto ( reg val)) Now we can try our rational- number operations in terms of abstract selectors and constructors for this notation such that our derivative program still works ? 2.3.</p><p>And yet it was, or that those horrible cylinders and machines- and the desolate salt marshes, desolate and unpeopled, chaos was complete long before I die I should be as it shines on certain others- even a perfect solution, because of a long and familiar doom.</p><p>He defines the following two rules: For any list y, the empty list, then the smaller number in the sequence.</p><p>Of all its curious influence call up the sixty- three at the cryptic parchment; but upon examining the bank as if to accentuate by their fellow- men.</p><p>When he read this, but kept it in the hideous culmination of events that depopulated the whole scene of such data as my torch could not doubt the power of its unfamiliar temporary form, was of little quaint fishing towns that climbed from the Rowley road drew so close to the fourth floor, and likewise any metal case- resulted in a vast Italian quarter, and a feeling that some of our camel drivers older than Memphis and mankind.</p><p>\" Ye see all too close to the sight.</p><p>In the roseal dawn the burghers of Milwaukee rose to panic.</p>"
2749 ],
2750 "text/plain": [
2751 "<IPython.core.display.HTML object>"
2752 ]
2753 },
2754 "metadata": {},
2755 "output_type": "display_data"
2756 }
2757 ],
2758 "source": [
2759 "display(HTML(cat('<p>' + sentence_join(markov_item(sl_starts, sl_counts, 500)) + '</p>' for _ in range(10))))"
2760 ]
2761 },
2762 {
2763 "cell_type": "code",
2764 "execution_count": 100,
2765 "metadata": {
2766 "collapsed": true
2767 },
2768 "outputs": [],
2769 "source": [
2770 "lk_starts, lk_counts = scale_merge(lovecraft_starts, 1, lovecraft_counts, 1, kjb_starts, 1, kjb_counts, 1)"
2771 ]
2772 },
2773 {
2774 "cell_type": "code",
2775 "execution_count": 101,
2776 "metadata": {
2777 "collapsed": false
2778 },
2779 "outputs": [
2780 {
2781 "data": {
2782 "text/html": [
2783 "<p>I say harass, because they have when they tore open the massive pre- Revolutionary homes with their nails, so I think almost hope that no man has yet dared not seem to have the ether- resisting wings characteristic of him, and many things.</p><p>2: 31 And he said unto him, Caesar's.</p><p>33: 9 Nevertheless the priests of the high places of Baal, which are called might receive the fruits of your ground; neither shall thy tears run down like a river, and every tree therein: for then should ye go after vain things, which thou hast redeemed.</p><p>He could scarcely decipher what they portrayed, and almost ghastly results, since it could be done.</p><p>27: 12 These shall stand upon the mount of Olives, then sent Jesus two disciples, 21: 6 And Ezra blessed the LORD God, even the porch of the house of the LORD was with Jehoshaphat, because he had heard therefore that he was Rebekah's son: and I will remove far off from my presence: 23: 11 And they found written in the law.</p><p>Was he just come up.</p><p>Lanterns that shudder and death are penned.</p><p>But this was a cross of new developments in those fields where the west window, and with haste To get this impetus, though to my flagging quest; for thousands and tens of thousands of light which would warn the waiting horse gave a thrill that a small man probably having a kind of a chaotic pronunciation.</p><p>Beneath him dozens of queer things I had not traversed before; and though progress was very close indeed.</p><p>52: 23 And they were exceeding many: neither was the word of the LORD came again unto Jerusalem and Judah, Hanani, Eliathah, Giddalti, and Romamtiezer, Joshbekashah, Mallothi, Hothir, and Mahazioth: 25: 27 Over against the border were at the north bay of the salt sea: this shall not be ashamed of Chemosh, as the manner of all the tribes of the children of Ammon said to Hanun, Thinkest thou that David doth honour thy father, that he may die: because she hath not sinned.</p>"
2784 ],
2785 "text/plain": [
2786 "<IPython.core.display.HTML object>"
2787 ]
2788 },
2789 "metadata": {},
2790 "output_type": "display_data"
2791 }
2792 ],
2793 "source": [
2794 "display(HTML(cat('<p>' + sentence_join(markov_item(lk_starts, lk_counts, 500)) + '</p>' for _ in range(10))))"
2795 ]
2796 },
2797 {
2798 "cell_type": "code",
2799 "execution_count": null,
2800 "metadata": {
2801 "collapsed": true
2802 },
2803 "outputs": [],
2804 "source": []
2805 }
2806 ],
2807 "metadata": {
2808 "kernelspec": {
2809 "display_name": "Python 3",
2810 "language": "python",
2811 "name": "python3"
2812 },
2813 "language_info": {
2814 "codemirror_mode": {
2815 "name": "ipython",
2816 "version": 3
2817 },
2818 "file_extension": ".py",
2819 "mimetype": "text/x-python",
2820 "name": "python",
2821 "nbconvert_exporter": "python",
2822 "pygments_lexer": "ipython3",
2823 "version": "3.4.3+"
2824 }
2825 },
2826 "nbformat": 4,
2827 "nbformat_minor": 0
2828 }
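
The `scale_merge` cell in the notebook combines two Markov models after weighting the smaller corpus so that it is not drowned out by the larger one. Below is a minimal, self-contained sketch of that idea on toy data. It makes assumptions that are not in the notebook: `scale_merge_sketch` and `total` are hypothetical helper names, a single scale per side is used for both the start and transition counts (the notebook passes separate scales), and the scale is taken by rounding the ratio of transition totals. The important detail is that counts from both sides are added, so a context seen in both corpora keeps the followers from each.

```python
import collections


def scale_merge_sketch(left_starts, left_counts, right_starts, right_counts,
                       left_scale=1, right_scale=1):
    """Merge two Markov models, multiplying each side's counts by a scale.

    Hypothetical simplification: one scale per side, applied to both the
    sentence-start counts and the transition counts.
    """
    starts = collections.Counter()
    counts = collections.defaultdict(collections.Counter)
    sides = ((left_starts, left_counts, left_scale),
             (right_starts, right_counts, right_scale))
    for side_starts, side_counts, scale in sides:
        for k, n in side_starts.items():
            starts[k] += n * scale           # accumulate, never overwrite
        for k, followers in side_counts.items():
            for j, n in followers.items():
                counts[k][j] += n * scale    # accumulate, never overwrite
    return starts, counts


def total(counts):
    """Total number of observed transitions in a model."""
    return sum(sum(c.values()) for c in counts.values())


# Toy models: the "small" corpus has 2 transitions, the "large" one has 6.
small_starts = collections.Counter({('the',): 2})
small_counts = {('the',): collections.Counter({'cat': 2})}
large_starts = collections.Counter({('the',): 3, ('a',): 1})
large_counts = {('the',): collections.Counter({'cat': 1, 'dog': 5})}

# Weight the small corpus by the rounded ratio of transition totals.
scale = round(total(large_counts) / total(small_counts))

starts, counts = scale_merge_sketch(small_starts, small_counts,
                                    large_starts, large_counts,
                                    left_scale=scale)
```

With these toy inputs the shared context `('the',)` ends up with followers from both corpora, which is the behaviour the merged SICP/Bible and SICP/Lovecraft models rely on.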