ff393e1d8901c4519555735333659fa44dc18b44
[name-generation.git] / markov / markov.ipynb
1 {
2 "cells": [
3 {
4 "cell_type": "code",
5 "execution_count": 53,
6 "metadata": {
7 "collapsed": true
8 },
9 "outputs": [],
10 "source": [
11 "import re\n",
12 "import string\n",
13 "import collections\n",
14 "import unicodedata\n",
15 "import random"
16 ]
17 },
18 {
19 "cell_type": "code",
20 "execution_count": 2,
21 "metadata": {
22 "collapsed": true
23 },
24 "outputs": [],
25 "source": [
26 "sample_text = \"\"\"Continuing this process, we obtain better and better approximations to the square root.\n",
27 "Now let's formalize the process in terms of procedures. We start with a value for the radicand (the\n",
28 "number whose square root we are trying to compute) and a value for the guess. If the guess is good\n",
29 "enough for our purposes, we are done; if not, we must repeat the process with an improved guess. We\n",
30 "write this basic strategy as a procedure:\n",
31 "(define (sqrt-iter guess x)\n",
32 "(if (good-enough? guess x)\n",
33 "guess\n",
34 "(sqrt-iter (improve guess x)\n",
35 "x)))\n",
36 "A guess is improved by averaging it with the quotient of the radicand and the old guess:\n",
37 "(define (improve guess x)\n",
38 "(average guess (/ x guess)))\n",
39 "where\n",
40 "\n",
41 "\f",
42 "(define (average x y)\n",
43 "(/ (+ x y) 2))\n",
44 "We also have to say what we mean by ''good enough.'' The following will do for illustration, but it is\n",
45 "not really a very good test. (See exercise 1.7.) The idea is to improve the answer until it is close\n",
46 "enough so that its square differs from the radicand by less than a predetermined tolerance (here\n",
47 "0.001): 22\n",
48 "(define (good-enough? guess x)\n",
49 "(< (abs (- (square guess) x)) 0.001))\n",
50 "Finally, we need a way to get started. For instance, we can always guess that the square root of any\n",
51 "number is 1: 23\n",
52 "(define (sqrt x)\n",
53 "(sqrt-iter 1.0 x))\n",
54 "If we type these definitions to the interpreter, we can use sqrt just as we can use any procedure:\"\"\"\n",
55 "\n",
56 "small_text = '''A guess is improved by averaging it with the quotient of the radicand and the old guess:\n",
57 "(define (improve guess x)\n",
58 "((average guess (/ x guess)))'''\n",
59 "\n",
60 "sentence_boundary = 'in terms of procedures. We start with 0.123 and some'\n",
61 "\n",
62 "double_quotes = \"let's see how such a ''circular'' definition\""
63 ]
64 },
65 {
66 "cell_type": "code",
67 "execution_count": 3,
68 "metadata": {
69 "collapsed": false
70 },
71 "outputs": [
72 {
73 "data": {
74 "text/plain": [
75 "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
76 ]
77 },
78 "execution_count": 3,
79 "metadata": {},
80 "output_type": "execute_result"
81 }
82 ],
83 "source": [
84 "string.punctuation"
85 ]
86 },
87 {
88 "cell_type": "code",
89 "execution_count": 4,
90 "metadata": {
91 "collapsed": false
92 },
93 "outputs": [
94 {
95 "data": {
96 "text/plain": [
97 "'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
98 ]
99 },
100 "execution_count": 4,
101 "metadata": {},
102 "output_type": "execute_result"
103 }
104 ],
105 "source": [
106 "string.ascii_letters + string.digits + string.punctuation"
107 ]
108 },
109 {
110 "cell_type": "code",
111 "execution_count": 5,
112 "metadata": {
113 "collapsed": false
114 },
115 "outputs": [
116 {
117 "data": {
118 "text/plain": [
119 "re.compile(r'[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+',\n",
120 "re.UNICODE)"
121 ]
122 },
123 "execution_count": 5,
124 "metadata": {},
125 "output_type": "execute_result"
126 }
127 ],
128 "source": [
129 "token_pattern = re.compile(r'[^{}]+'.format(re.escape(string.ascii_letters + string.digits + string.punctuation)))\n",
130 "token_pattern"
131 ]
132 },
133 {
134 "cell_type": "code",
135 "execution_count": 6,
136 "metadata": {
137 "collapsed": false
138 },
139 "outputs": [
140 {
141 "data": {
142 "text/plain": [
143 "re.compile(r'(\\d+\\.\\d+|\\w+\\'\\w+|[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+(?=\\w)|(?<=\\w)[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+|[\\!\\\"\\#\\$\\%\\&\\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^_\\`\\{\\|\\}\\~]+$)',\n",
144 "re.UNICODE)"
145 ]
146 },
147 "execution_count": 6,
148 "metadata": {},
149 "output_type": "execute_result"
150 }
151 ],
152 "source": [
153 "punctuation_pattern = re.compile('(\\d+\\.\\d+|\\w+\\'\\w+|[{0}]+(?=\\w)|(?<=\\w)[{0}]+|[{0}]+$)'.format(re.escape(string.punctuation)))\n",
154 "punctuation_pattern"
155 ]
156 },
157 {
158 "cell_type": "code",
159 "execution_count": 7,
160 "metadata": {
161 "collapsed": false
162 },
163 "outputs": [
164 {
165 "data": {
166 "text/plain": [
167 "['', '((', 'average']"
168 ]
169 },
170 "execution_count": 7,
171 "metadata": {},
172 "output_type": "execute_result"
173 }
174 ],
175 "source": [
176 "re.split(r'(\\(+(?=\\w))', '((average')"
177 ]
178 },
179 {
180 "cell_type": "code",
181 "execution_count": 8,
182 "metadata": {
183 "collapsed": false,
184 "scrolled": true
185 },
186 "outputs": [
187 {
188 "data": {
189 "text/plain": [
190 "['A',\n",
191 " 'guess',\n",
192 " 'is',\n",
193 " 'improved',\n",
194 " 'by',\n",
195 " 'averaging',\n",
196 " 'it',\n",
197 " 'with',\n",
198 " 'the',\n",
199 " 'quotient',\n",
200 " 'of',\n",
201 " 'the',\n",
202 " 'radicand',\n",
203 " 'and',\n",
204 " 'the',\n",
205 " 'old',\n",
206 " 'guess',\n",
207 " ':',\n",
208 " '(',\n",
209 " 'define',\n",
210 " '(',\n",
211 " 'improve',\n",
212 " 'guess',\n",
213 " 'x',\n",
214 " ')',\n",
215 " '((',\n",
216 " 'average',\n",
217 " 'guess',\n",
218 " '(/',\n",
219 " 'x',\n",
220 " 'guess',\n",
221 " ')))']"
222 ]
223 },
224 "execution_count": 8,
225 "metadata": {},
226 "output_type": "execute_result"
227 }
228 ],
229 "source": [
230 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, small_text)]\n",
231 " for ch in gp if ch]"
232 ]
233 },
234 {
235 "cell_type": "code",
236 "execution_count": 9,
237 "metadata": {
238 "collapsed": false
239 },
240 "outputs": [
241 {
242 "data": {
243 "text/plain": [
244 "['in',\n",
245 " 'terms',\n",
246 " 'of',\n",
247 " 'procedures',\n",
248 " '.',\n",
249 " 'We',\n",
250 " 'start',\n",
251 " 'with',\n",
252 " '0.123',\n",
253 " 'and',\n",
254 " 'some']"
255 ]
256 },
257 "execution_count": 9,
258 "metadata": {},
259 "output_type": "execute_result"
260 }
261 ],
262 "source": [
263 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, sentence_boundary)]\n",
264 " for ch in gp if ch]"
265 ]
266 },
267 {
268 "cell_type": "code",
269 "execution_count": 10,
270 "metadata": {
271 "collapsed": false
272 },
273 "outputs": [
274 {
275 "data": {
276 "text/plain": [
277 "[\"let's\", 'see', 'how', 'such', 'a', \"''\", 'circular', \"''\", 'definition']"
278 ]
279 },
280 "execution_count": 10,
281 "metadata": {},
282 "output_type": "execute_result"
283 }
284 ],
285 "source": [
286 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, double_quotes)]\n",
287 " for ch in gp if ch]"
288 ]
289 },
290 {
291 "cell_type": "code",
292 "execution_count": 11,
293 "metadata": {
294 "collapsed": false
295 },
296 "outputs": [
297 {
298 "data": {
299 "text/plain": [
300 "['Continuing',\n",
301 " 'this',\n",
302 " 'process',\n",
303 " ',',\n",
304 " 'we',\n",
305 " 'obtain',\n",
306 " 'better',\n",
307 " 'and',\n",
308 " 'better',\n",
309 " 'approximations',\n",
310 " 'to',\n",
311 " 'the',\n",
312 " 'square',\n",
313 " 'root',\n",
314 " '.',\n",
315 " 'Now',\n",
316 " \"let's\",\n",
317 " 'formalize',\n",
318 " 'the',\n",
319 " 'process',\n",
320 " 'in',\n",
321 " 'terms',\n",
322 " 'of',\n",
323 " 'procedures',\n",
324 " '.',\n",
325 " 'We',\n",
326 " 'start',\n",
327 " 'with',\n",
328 " 'a',\n",
329 " 'value',\n",
330 " 'for',\n",
331 " 'the',\n",
332 " 'radicand',\n",
333 " '(',\n",
334 " 'the',\n",
335 " 'number',\n",
336 " 'whose',\n",
337 " 'square',\n",
338 " 'root',\n",
339 " 'we',\n",
340 " 'are',\n",
341 " 'trying',\n",
342 " 'to',\n",
343 " 'compute',\n",
344 " ')',\n",
345 " 'and',\n",
346 " 'a',\n",
347 " 'value',\n",
348 " 'for',\n",
349 " 'the',\n",
350 " 'guess',\n",
351 " '.',\n",
352 " 'If',\n",
353 " 'the',\n",
354 " 'guess',\n",
355 " 'is',\n",
356 " 'good',\n",
357 " 'enough',\n",
358 " 'for',\n",
359 " 'our',\n",
360 " 'purposes',\n",
361 " ',',\n",
362 " 'we',\n",
363 " 'are',\n",
364 " 'done',\n",
365 " ';',\n",
366 " 'if',\n",
367 " 'not',\n",
368 " ',',\n",
369 " 'we',\n",
370 " 'must',\n",
371 " 'repeat',\n",
372 " 'the',\n",
373 " 'process',\n",
374 " 'with',\n",
375 " 'an',\n",
376 " 'improved',\n",
377 " 'guess',\n",
378 " '.',\n",
379 " 'We',\n",
380 " 'write',\n",
381 " 'this',\n",
382 " 'basic',\n",
383 " 'strategy',\n",
384 " 'as',\n",
385 " 'a',\n",
386 " 'procedure',\n",
387 " ':',\n",
388 " '(',\n",
389 " 'define',\n",
390 " '(',\n",
391 " 'sqrt',\n",
392 " '-',\n",
393 " 'iter',\n",
394 " 'guess',\n",
395 " 'x',\n",
396 " ')',\n",
397 " '(',\n",
398 " 'if',\n",
399 " '(',\n",
400 " 'good',\n",
401 " '-',\n",
402 " 'enough',\n",
403 " '?',\n",
404 " 'guess',\n",
405 " 'x',\n",
406 " ')',\n",
407 " 'guess',\n",
408 " '(',\n",
409 " 'sqrt',\n",
410 " '-',\n",
411 " 'iter',\n",
412 " '(',\n",
413 " 'improve',\n",
414 " 'guess',\n",
415 " 'x',\n",
416 " ')',\n",
417 " 'x',\n",
418 " ')))',\n",
419 " 'A',\n",
420 " 'guess',\n",
421 " 'is',\n",
422 " 'improved',\n",
423 " 'by',\n",
424 " 'averaging',\n",
425 " 'it',\n",
426 " 'with',\n",
427 " 'the',\n",
428 " 'quotient',\n",
429 " 'of',\n",
430 " 'the',\n",
431 " 'radicand',\n",
432 " 'and',\n",
433 " 'the',\n",
434 " 'old',\n",
435 " 'guess',\n",
436 " ':',\n",
437 " '(',\n",
438 " 'define',\n",
439 " '(',\n",
440 " 'improve',\n",
441 " 'guess',\n",
442 " 'x',\n",
443 " ')',\n",
444 " '(',\n",
445 " 'average',\n",
446 " 'guess',\n",
447 " '(/',\n",
448 " 'x',\n",
449 " 'guess',\n",
450 " ')))',\n",
451 " 'where',\n",
452 " '(',\n",
453 " 'define',\n",
454 " '(',\n",
455 " 'average',\n",
456 " 'x',\n",
457 " 'y',\n",
458 " ')',\n",
459 " '(/',\n",
460 " '(+',\n",
461 " 'x',\n",
462 " 'y',\n",
463 " ')',\n",
464 " '2',\n",
465 " '))',\n",
466 " 'We',\n",
467 " 'also',\n",
468 " 'have',\n",
469 " 'to',\n",
470 " 'say',\n",
471 " 'what',\n",
472 " 'we',\n",
473 " 'mean',\n",
474 " 'by',\n",
475 " \"''\",\n",
476 " 'good',\n",
477 " 'enough',\n",
478 " \".''\",\n",
479 " 'The',\n",
480 " 'following',\n",
481 " 'will',\n",
482 " 'do',\n",
483 " 'for',\n",
484 " 'illustration',\n",
485 " ',',\n",
486 " 'but',\n",
487 " 'it',\n",
488 " 'is',\n",
489 " 'not',\n",
490 " 'really',\n",
491 " 'a',\n",
492 " 'very',\n",
493 " 'good',\n",
494 " 'test',\n",
495 " '.',\n",
496 " '(',\n",
497 " 'See',\n",
498 " 'exercise',\n",
499 " '1.7',\n",
500 " '.)',\n",
501 " 'The',\n",
502 " 'idea',\n",
503 " 'is',\n",
504 " 'to',\n",
505 " 'improve',\n",
506 " 'the',\n",
507 " 'answer',\n",
508 " 'until',\n",
509 " 'it',\n",
510 " 'is',\n",
511 " 'close',\n",
512 " 'enough',\n",
513 " 'so',\n",
514 " 'that',\n",
515 " 'its',\n",
516 " 'square',\n",
517 " 'differs',\n",
518 " 'from',\n",
519 " 'the',\n",
520 " 'radicand',\n",
521 " 'by',\n",
522 " 'less',\n",
523 " 'than',\n",
524 " 'a',\n",
525 " 'predetermined',\n",
526 " 'tolerance',\n",
527 " '(',\n",
528 " 'here',\n",
529 " '0.001',\n",
530 " '):',\n",
531 " '22',\n",
532 " '(',\n",
533 " 'define',\n",
534 " '(',\n",
535 " 'good',\n",
536 " '-',\n",
537 " 'enough',\n",
538 " '?',\n",
539 " 'guess',\n",
540 " 'x',\n",
541 " ')',\n",
542 " '(<',\n",
543 " '(',\n",
544 " 'abs',\n",
545 " '(-',\n",
546 " '(',\n",
547 " 'square',\n",
548 " 'guess',\n",
549 " ')',\n",
550 " 'x',\n",
551 " '))',\n",
552 " '0.001',\n",
553 " '))',\n",
554 " 'Finally',\n",
555 " ',',\n",
556 " 'we',\n",
557 " 'need',\n",
558 " 'a',\n",
559 " 'way',\n",
560 " 'to',\n",
561 " 'get',\n",
562 " 'started',\n",
563 " '.',\n",
564 " 'For',\n",
565 " 'instance',\n",
566 " ',',\n",
567 " 'we',\n",
568 " 'can',\n",
569 " 'always',\n",
570 " 'guess',\n",
571 " 'that',\n",
572 " 'the',\n",
573 " 'square',\n",
574 " 'root',\n",
575 " 'of',\n",
576 " 'any',\n",
577 " 'number',\n",
578 " 'is',\n",
579 " '1',\n",
580 " ':',\n",
581 " '23',\n",
582 " '(',\n",
583 " 'define',\n",
584 " '(',\n",
585 " 'sqrt',\n",
586 " 'x',\n",
587 " ')',\n",
588 " '(',\n",
589 " 'sqrt',\n",
590 " '-',\n",
591 " 'iter',\n",
592 " '1.0',\n",
593 " 'x',\n",
594 " '))',\n",
595 " 'If',\n",
596 " 'we',\n",
597 " 'type',\n",
598 " 'these',\n",
599 " 'definitions',\n",
600 " 'to',\n",
601 " 'the',\n",
602 " 'interpreter',\n",
603 " ',',\n",
604 " 'we',\n",
605 " 'can',\n",
606 " 'use',\n",
607 " 'sqrt',\n",
608 " 'just',\n",
609 " 'as',\n",
610 " 'we',\n",
611 " 'can',\n",
612 " 'use',\n",
613 " 'any',\n",
614 " 'procedure',\n",
615 " ':']"
616 ]
617 },
618 "execution_count": 11,
619 "metadata": {},
620 "output_type": "execute_result"
621 }
622 ],
623 "source": [
624 "[ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, sample_text)]\n",
625 " for ch in gp if ch]"
626 ]
627 },
628 {
629 "cell_type": "code",
630 "execution_count": 12,
631 "metadata": {
632 "collapsed": true
633 },
634 "outputs": [],
635 "source": [
636 "def tokenise(text):\n",
637 " return [ch for gp in [re.split(punctuation_pattern, t) for t in re.split(token_pattern, text)]\n",
638 " for ch in gp if ch]"
639 ]
640 },
641 {
642 "cell_type": "code",
643 "execution_count": 13,
644 "metadata": {
645 "collapsed": false
646 },
647 "outputs": [
648 {
649 "data": {
650 "text/plain": [
651 "['Continuing',\n",
652 " 'this',\n",
653 " 'process',\n",
654 " ',',\n",
655 " 'we',\n",
656 " 'obtain',\n",
657 " 'better',\n",
658 " 'and',\n",
659 " 'better',\n",
660 " 'approximations',\n",
661 " 'to',\n",
662 " 'the',\n",
663 " 'square',\n",
664 " 'root',\n",
665 " '.',\n",
666 " 'Now',\n",
667 " \"let's\",\n",
668 " 'formalize',\n",
669 " 'the',\n",
670 " 'process',\n",
671 " 'in',\n",
672 " 'terms',\n",
673 " 'of',\n",
674 " 'procedures',\n",
675 " '.',\n",
676 " 'We',\n",
677 " 'start',\n",
678 " 'with',\n",
679 " 'a',\n",
680 " 'value',\n",
681 " 'for',\n",
682 " 'the',\n",
683 " 'radicand',\n",
684 " '(',\n",
685 " 'the',\n",
686 " 'number',\n",
687 " 'whose',\n",
688 " 'square',\n",
689 " 'root',\n",
690 " 'we',\n",
691 " 'are',\n",
692 " 'trying',\n",
693 " 'to',\n",
694 " 'compute',\n",
695 " ')',\n",
696 " 'and',\n",
697 " 'a',\n",
698 " 'value',\n",
699 " 'for',\n",
700 " 'the',\n",
701 " 'guess',\n",
702 " '.',\n",
703 " 'If',\n",
704 " 'the',\n",
705 " 'guess',\n",
706 " 'is',\n",
707 " 'good',\n",
708 " 'enough',\n",
709 " 'for',\n",
710 " 'our',\n",
711 " 'purposes',\n",
712 " ',',\n",
713 " 'we',\n",
714 " 'are',\n",
715 " 'done',\n",
716 " ';',\n",
717 " 'if',\n",
718 " 'not',\n",
719 " ',',\n",
720 " 'we',\n",
721 " 'must',\n",
722 " 'repeat',\n",
723 " 'the',\n",
724 " 'process',\n",
725 " 'with',\n",
726 " 'an',\n",
727 " 'improved',\n",
728 " 'guess',\n",
729 " '.',\n",
730 " 'We',\n",
731 " 'write',\n",
732 " 'this',\n",
733 " 'basic',\n",
734 " 'strategy',\n",
735 " 'as',\n",
736 " 'a',\n",
737 " 'procedure',\n",
738 " ':',\n",
739 " '(',\n",
740 " 'define',\n",
741 " '(',\n",
742 " 'sqrt',\n",
743 " '-',\n",
744 " 'iter',\n",
745 " 'guess',\n",
746 " 'x',\n",
747 " ')',\n",
748 " '(',\n",
749 " 'if',\n",
750 " '(',\n",
751 " 'good',\n",
752 " '-',\n",
753 " 'enough',\n",
754 " '?',\n",
755 " 'guess',\n",
756 " 'x',\n",
757 " ')',\n",
758 " 'guess',\n",
759 " '(',\n",
760 " 'sqrt',\n",
761 " '-',\n",
762 " 'iter',\n",
763 " '(',\n",
764 " 'improve',\n",
765 " 'guess',\n",
766 " 'x',\n",
767 " ')',\n",
768 " 'x',\n",
769 " ')))',\n",
770 " 'A',\n",
771 " 'guess',\n",
772 " 'is',\n",
773 " 'improved',\n",
774 " 'by',\n",
775 " 'averaging',\n",
776 " 'it',\n",
777 " 'with',\n",
778 " 'the',\n",
779 " 'quotient',\n",
780 " 'of',\n",
781 " 'the',\n",
782 " 'radicand',\n",
783 " 'and',\n",
784 " 'the',\n",
785 " 'old',\n",
786 " 'guess',\n",
787 " ':',\n",
788 " '(',\n",
789 " 'define',\n",
790 " '(',\n",
791 " 'improve',\n",
792 " 'guess',\n",
793 " 'x',\n",
794 " ')',\n",
795 " '(',\n",
796 " 'average',\n",
797 " 'guess',\n",
798 " '(/',\n",
799 " 'x',\n",
800 " 'guess',\n",
801 " ')))',\n",
802 " 'where',\n",
803 " '(',\n",
804 " 'define',\n",
805 " '(',\n",
806 " 'average',\n",
807 " 'x',\n",
808 " 'y',\n",
809 " ')',\n",
810 " '(/',\n",
811 " '(+',\n",
812 " 'x',\n",
813 " 'y',\n",
814 " ')',\n",
815 " '2',\n",
816 " '))',\n",
817 " 'We',\n",
818 " 'also',\n",
819 " 'have',\n",
820 " 'to',\n",
821 " 'say',\n",
822 " 'what',\n",
823 " 'we',\n",
824 " 'mean',\n",
825 " 'by',\n",
826 " \"''\",\n",
827 " 'good',\n",
828 " 'enough',\n",
829 " \".''\",\n",
830 " 'The',\n",
831 " 'following',\n",
832 " 'will',\n",
833 " 'do',\n",
834 " 'for',\n",
835 " 'illustration',\n",
836 " ',',\n",
837 " 'but',\n",
838 " 'it',\n",
839 " 'is',\n",
840 " 'not',\n",
841 " 'really',\n",
842 " 'a',\n",
843 " 'very',\n",
844 " 'good',\n",
845 " 'test',\n",
846 " '.',\n",
847 " '(',\n",
848 " 'See',\n",
849 " 'exercise',\n",
850 " '1.7',\n",
851 " '.)',\n",
852 " 'The',\n",
853 " 'idea',\n",
854 " 'is',\n",
855 " 'to',\n",
856 " 'improve',\n",
857 " 'the',\n",
858 " 'answer',\n",
859 " 'until',\n",
860 " 'it',\n",
861 " 'is',\n",
862 " 'close',\n",
863 " 'enough',\n",
864 " 'so',\n",
865 " 'that',\n",
866 " 'its',\n",
867 " 'square',\n",
868 " 'differs',\n",
869 " 'from',\n",
870 " 'the',\n",
871 " 'radicand',\n",
872 " 'by',\n",
873 " 'less',\n",
874 " 'than',\n",
875 " 'a',\n",
876 " 'predetermined',\n",
877 " 'tolerance',\n",
878 " '(',\n",
879 " 'here',\n",
880 " '0.001',\n",
881 " '):',\n",
882 " '22',\n",
883 " '(',\n",
884 " 'define',\n",
885 " '(',\n",
886 " 'good',\n",
887 " '-',\n",
888 " 'enough',\n",
889 " '?',\n",
890 " 'guess',\n",
891 " 'x',\n",
892 " ')',\n",
893 " '(<',\n",
894 " '(',\n",
895 " 'abs',\n",
896 " '(-',\n",
897 " '(',\n",
898 " 'square',\n",
899 " 'guess',\n",
900 " ')',\n",
901 " 'x',\n",
902 " '))',\n",
903 " '0.001',\n",
904 " '))',\n",
905 " 'Finally',\n",
906 " ',',\n",
907 " 'we',\n",
908 " 'need',\n",
909 " 'a',\n",
910 " 'way',\n",
911 " 'to',\n",
912 " 'get',\n",
913 " 'started',\n",
914 " '.',\n",
915 " 'For',\n",
916 " 'instance',\n",
917 " ',',\n",
918 " 'we',\n",
919 " 'can',\n",
920 " 'always',\n",
921 " 'guess',\n",
922 " 'that',\n",
923 " 'the',\n",
924 " 'square',\n",
925 " 'root',\n",
926 " 'of',\n",
927 " 'any',\n",
928 " 'number',\n",
929 " 'is',\n",
930 " '1',\n",
931 " ':',\n",
932 " '23',\n",
933 " '(',\n",
934 " 'define',\n",
935 " '(',\n",
936 " 'sqrt',\n",
937 " 'x',\n",
938 " ')',\n",
939 " '(',\n",
940 " 'sqrt',\n",
941 " '-',\n",
942 " 'iter',\n",
943 " '1.0',\n",
944 " 'x',\n",
945 " '))',\n",
946 " 'If',\n",
947 " 'we',\n",
948 " 'type',\n",
949 " 'these',\n",
950 " 'definitions',\n",
951 " 'to',\n",
952 " 'the',\n",
953 " 'interpreter',\n",
954 " ',',\n",
955 " 'we',\n",
956 " 'can',\n",
957 " 'use',\n",
958 " 'sqrt',\n",
959 " 'just',\n",
960 " 'as',\n",
961 " 'we',\n",
962 " 'can',\n",
963 " 'use',\n",
964 " 'any',\n",
965 " 'procedure',\n",
966 " ':']"
967 ]
968 },
969 "execution_count": 13,
970 "metadata": {},
971 "output_type": "execute_result"
972 }
973 ],
974 "source": [
975 "tokenise(sample_text)"
976 ]
977 },
978 {
979 "cell_type": "code",
980 "execution_count": 14,
981 "metadata": {
982 "collapsed": false
983 },
984 "outputs": [
985 {
986 "data": {
987 "text/plain": [
988 "['Exercise',\n",
989 " '1.8',\n",
990 " '.',\n",
991 " \"Newton's\",\n",
992 " 'method',\n",
993 " 'for',\n",
994 " 'cube',\n",
995 " 'roots',\n",
996 " 'is',\n",
997 " 'based',\n",
998 " 'on',\n",
999 " 'the',\n",
1000 " 'fact',\n",
1001 " 'that',\n",
1002 " 'if',\n",
1003 " 'y',\n",
1004 " 'is',\n",
1005 " 'an',\n",
1006 " 'approximation',\n",
1007 " 'to',\n",
1008 " 'the',\n",
1009 " 'cube',\n",
1010 " 'root',\n",
1011 " 'of',\n",
1012 " 'x',\n",
1013 " ',',\n",
1014 " 'then',\n",
1015 " 'a',\n",
1016 " 'better',\n",
1017 " 'approximation',\n",
1018 " 'is',\n",
1019 " 'given',\n",
1020 " 'by',\n",
1021 " 'the',\n",
1022 " 'value',\n",
1023 " 'Use',\n",
1024 " 'this',\n",
1025 " 'formula',\n",
1026 " 'to',\n",
1027 " 'implement',\n",
1028 " 'a',\n",
1029 " 'cube',\n",
1030 " '-',\n",
1031 " 'root',\n",
1032 " 'procedure',\n",
1033 " 'analogous',\n",
1034 " 'to',\n",
1035 " 'the',\n",
1036 " 'square',\n",
1037 " '-',\n",
1038 " 'root',\n",
1039 " 'procedure',\n",
1040 " '.',\n",
1041 " '(',\n",
1042 " 'In',\n",
1043 " 'section',\n",
1044 " '1.3',\n",
1045 " '.',\n",
1046 " '4',\n",
1047 " 'we',\n",
1048 " 'will',\n",
1049 " 'see',\n",
1050 " 'how',\n",
1051 " 'to',\n",
1052 " 'implement',\n",
1053 " \"Newton's\",\n",
1054 " 'method',\n",
1055 " 'in',\n",
1056 " 'general',\n",
1057 " 'as',\n",
1058 " 'an',\n",
1059 " 'abstraction',\n",
1060 " 'of',\n",
1061 " 'these',\n",
1062 " 'square',\n",
1063 " '-',\n",
1064 " 'root',\n",
1065 " 'and',\n",
1066 " 'cube',\n",
1067 " '-',\n",
1068 " 'root',\n",
1069 " 'procedures',\n",
1070 " '.)',\n",
1071 " '1.1',\n",
1072 " '.',\n",
1073 " '8',\n",
1074 " 'Procedures',\n",
1075 " 'as',\n",
1076 " 'Black',\n",
1077 " '-',\n",
1078 " 'Box',\n",
1079 " 'Abstractions',\n",
1080 " 'Sqrt',\n",
1081 " 'is',\n",
1082 " 'our',\n",
1083 " 'first',\n",
1084 " 'example',\n",
1085 " 'of',\n",
1086 " 'a',\n",
1087 " 'process',\n",
1088 " 'defined',\n",
1089 " 'by',\n",
1090 " 'a',\n",
1091 " 'set',\n",
1092 " 'of',\n",
1093 " 'mutually',\n",
1094 " 'defined',\n",
1095 " 'procedures',\n",
1096 " '.',\n",
1097 " 'Notice',\n",
1098 " 'that',\n",
1099 " 'the',\n",
1100 " 'definition',\n",
1101 " 'of',\n",
1102 " 'sqrt',\n",
1103 " '-',\n",
1104 " 'iter',\n",
1105 " 'is',\n",
1106 " 'recursive',\n",
1107 " ';',\n",
1108 " 'that',\n",
1109 " 'is',\n",
1110 " ',',\n",
1111 " 'the',\n",
1112 " 'procedure',\n",
1113 " 'is',\n",
1114 " 'defined',\n",
1115 " 'in',\n",
1116 " 'terms',\n",
1117 " 'of',\n",
1118 " 'itself',\n",
1119 " '.',\n",
1120 " 'The',\n",
1121 " 'idea',\n",
1122 " 'of',\n",
1123 " 'being',\n",
1124 " 'able',\n",
1125 " 'to',\n",
1126 " 'define',\n",
1127 " 'a',\n",
1128 " 'procedure',\n",
1129 " 'in',\n",
1130 " 'terms',\n",
1131 " 'of',\n",
1132 " 'itself',\n",
1133 " 'may',\n",
1134 " 'be',\n",
1135 " 'disturbing',\n",
1136 " ';',\n",
1137 " 'it',\n",
1138 " 'may',\n",
1139 " 'seem',\n",
1140 " 'unclear',\n",
1141 " 'how',\n",
1142 " 'such',\n",
1143 " 'a',\n",
1144 " \"''\",\n",
1145 " 'circular',\n",
1146 " \"''\",\n",
1147 " 'definition',\n",
1148 " 'could',\n",
1149 " 'make',\n",
1150 " 'sense',\n",
1151 " 'at',\n",
1152 " 'all',\n",
1153 " ',',\n",
1154 " 'much',\n",
1155 " 'less',\n",
1156 " 'specify',\n",
1157 " 'a',\n",
1158 " 'well',\n",
1159 " '-',\n",
1160 " 'defined',\n",
1161 " 'process',\n",
1162 " 'to',\n",
1163 " 'be',\n",
1164 " 'carried']"
1165 ]
1166 },
1167 "execution_count": 14,
1168 "metadata": {},
1169 "output_type": "execute_result"
1170 }
1171 ],
1172 "source": [
1173 "tokenise(\"\"\"Exercise 1.8. Newton's method for cube roots is based on the fact that if y is an approximation to the\n",
1174 "cube root of x, then a better approximation is given by the value\n",
1175 "\n",
1176 "Use this formula to implement a cube-root procedure analogous to the square-root procedure. (In\n",
1177 "section 1.3.4 we will see how to implement Newton's method in general as an abstraction of these\n",
1178 "square-root and cube-root procedures.)\n",
1179 "\n",
1180 "1.1.8 Procedures as Black-Box Abstractions\n",
1181 "Sqrt is our first example of a process defined by a set of mutually defined procedures. Notice that the\n",
1182 "definition of sqrt-iter is recursive; that is, the procedure is defined in terms of itself. The idea of\n",
1183 "being able to define a procedure in terms of itself may be disturbing; it may seem unclear how such a\n",
1184 "''circular'' definition could make sense at all, much less specify a well-defined process to be carried\n",
1185 "\"\"\")"
1186 ]
1187 },
1188 {
1189 "cell_type": "code",
1190 "execution_count": 15,
1191 "metadata": {
1192 "collapsed": false
1193 },
1194 "outputs": [],
1195 "source": [
1196 "def find_counts_of_sentence(tokens, counts, tuple_size):\n",
1197 " for i in range(len(tokens)-(tuple_size)):\n",
1198 " counts[tuple(tokens[i:i+tuple_size])].update([tokens[i+tuple_size]])\n",
1199 " counts[tuple(tokens[-tuple_size:])].update(None)\n",
1200 " return counts"
1201 ]
1202 },
1203 {
1204 "cell_type": "code",
1205 "execution_count": 16,
1206 "metadata": {
1207 "collapsed": false
1208 },
1209 "outputs": [],
1210 "source": [
1211 "def find_counts(sentences, tuple_size=2):\n",
1212 " counts = collections.defaultdict(collections.Counter)\n",
1213 " starts = collections.Counter()\n",
1214 " for sentence in sentences:\n",
1215 " counts = find_counts_of_sentence(sentence, counts, tuple_size)\n",
1216 " starts[tuple(sentence[:tuple_size])] += 1\n",
1217 " return starts, counts"
1218 ]
1219 },
1220 {
1221 "cell_type": "code",
1222 "execution_count": 17,
1223 "metadata": {
1224 "collapsed": false
1225 },
1226 "outputs": [],
1227 "source": [
1228 "def sentences(tokens):\n",
1229 " sents = []\n",
1230 " sent = []\n",
1231 " for i in range(len(tokens)):\n",
1232 " if tokens[i] == '.':\n",
1233 " sents += [sent + [tokens[i]]]\n",
1234 " sent = []\n",
1235 " else:\n",
1236 " sent += [tokens[i]]\n",
1237 " return sents"
1238 ]
1239 },
1240 {
1241 "cell_type": "code",
1242 "execution_count": 18,
1243 "metadata": {
1244 "collapsed": false
1245 },
1246 "outputs": [
1247 {
1248 "data": {
1249 "text/plain": [
1250 "[['Continuing',\n",
1251 " 'this',\n",
1252 " 'process',\n",
1253 " ',',\n",
1254 " 'we',\n",
1255 " 'obtain',\n",
1256 " 'better',\n",
1257 " 'and',\n",
1258 " 'better',\n",
1259 " 'approximations',\n",
1260 " 'to',\n",
1261 " 'the',\n",
1262 " 'square',\n",
1263 " 'root',\n",
1264 " '.'],\n",
1265 " ['Now',\n",
1266 " \"let's\",\n",
1267 " 'formalize',\n",
1268 " 'the',\n",
1269 " 'process',\n",
1270 " 'in',\n",
1271 " 'terms',\n",
1272 " 'of',\n",
1273 " 'procedures',\n",
1274 " '.'],\n",
1275 " ['We',\n",
1276 " 'start',\n",
1277 " 'with',\n",
1278 " 'a',\n",
1279 " 'value',\n",
1280 " 'for',\n",
1281 " 'the',\n",
1282 " 'radicand',\n",
1283 " '(',\n",
1284 " 'the',\n",
1285 " 'number',\n",
1286 " 'whose',\n",
1287 " 'square',\n",
1288 " 'root',\n",
1289 " 'we',\n",
1290 " 'are',\n",
1291 " 'trying',\n",
1292 " 'to',\n",
1293 " 'compute',\n",
1294 " ')',\n",
1295 " 'and',\n",
1296 " 'a',\n",
1297 " 'value',\n",
1298 " 'for',\n",
1299 " 'the',\n",
1300 " 'guess',\n",
1301 " '.'],\n",
1302 " ['If',\n",
1303 " 'the',\n",
1304 " 'guess',\n",
1305 " 'is',\n",
1306 " 'good',\n",
1307 " 'enough',\n",
1308 " 'for',\n",
1309 " 'our',\n",
1310 " 'purposes',\n",
1311 " ',',\n",
1312 " 'we',\n",
1313 " 'are',\n",
1314 " 'done',\n",
1315 " ';',\n",
1316 " 'if',\n",
1317 " 'not',\n",
1318 " ',',\n",
1319 " 'we',\n",
1320 " 'must',\n",
1321 " 'repeat',\n",
1322 " 'the',\n",
1323 " 'process',\n",
1324 " 'with',\n",
1325 " 'an',\n",
1326 " 'improved',\n",
1327 " 'guess',\n",
1328 " '.'],\n",
1329 " ['We',\n",
1330 " 'write',\n",
1331 " 'this',\n",
1332 " 'basic',\n",
1333 " 'strategy',\n",
1334 " 'as',\n",
1335 " 'a',\n",
1336 " 'procedure',\n",
1337 " ':',\n",
1338 " '(',\n",
1339 " 'define',\n",
1340 " '(',\n",
1341 " 'sqrt',\n",
1342 " '-',\n",
1343 " 'iter',\n",
1344 " 'guess',\n",
1345 " 'x',\n",
1346 " ')',\n",
1347 " '(',\n",
1348 " 'if',\n",
1349 " '(',\n",
1350 " 'good',\n",
1351 " '-',\n",
1352 " 'enough',\n",
1353 " '?',\n",
1354 " 'guess',\n",
1355 " 'x',\n",
1356 " ')',\n",
1357 " 'guess',\n",
1358 " '(',\n",
1359 " 'sqrt',\n",
1360 " '-',\n",
1361 " 'iter',\n",
1362 " '(',\n",
1363 " 'improve',\n",
1364 " 'guess',\n",
1365 " 'x',\n",
1366 " ')',\n",
1367 " 'x',\n",
1368 " ')))',\n",
1369 " 'A',\n",
1370 " 'guess',\n",
1371 " 'is',\n",
1372 " 'improved',\n",
1373 " 'by',\n",
1374 " 'averaging',\n",
1375 " 'it',\n",
1376 " 'with',\n",
1377 " 'the',\n",
1378 " 'quotient',\n",
1379 " 'of',\n",
1380 " 'the',\n",
1381 " 'radicand',\n",
1382 " 'and',\n",
1383 " 'the',\n",
1384 " 'old',\n",
1385 " 'guess',\n",
1386 " ':',\n",
1387 " '(',\n",
1388 " 'define',\n",
1389 " '(',\n",
1390 " 'improve',\n",
1391 " 'guess',\n",
1392 " 'x',\n",
1393 " ')',\n",
1394 " '(',\n",
1395 " 'average',\n",
1396 " 'guess',\n",
1397 " '(/',\n",
1398 " 'x',\n",
1399 " 'guess',\n",
1400 " ')))',\n",
1401 " 'where',\n",
1402 " '(',\n",
1403 " 'define',\n",
1404 " '(',\n",
1405 " 'average',\n",
1406 " 'x',\n",
1407 " 'y',\n",
1408 " ')',\n",
1409 " '(/',\n",
1410 " '(+',\n",
1411 " 'x',\n",
1412 " 'y',\n",
1413 " ')',\n",
1414 " '2',\n",
1415 " '))',\n",
1416 " 'We',\n",
1417 " 'also',\n",
1418 " 'have',\n",
1419 " 'to',\n",
1420 " 'say',\n",
1421 " 'what',\n",
1422 " 'we',\n",
1423 " 'mean',\n",
1424 " 'by',\n",
1425 " \"''\",\n",
1426 " 'good',\n",
1427 " 'enough',\n",
1428 " \".''\",\n",
1429 " 'The',\n",
1430 " 'following',\n",
1431 " 'will',\n",
1432 " 'do',\n",
1433 " 'for',\n",
1434 " 'illustration',\n",
1435 " ',',\n",
1436 " 'but',\n",
1437 " 'it',\n",
1438 " 'is',\n",
1439 " 'not',\n",
1440 " 'really',\n",
1441 " 'a',\n",
1442 " 'very',\n",
1443 " 'good',\n",
1444 " 'test',\n",
1445 " '.'],\n",
1446 " ['(',\n",
1447 " 'See',\n",
1448 " 'exercise',\n",
1449 " '1.7',\n",
1450 " '.)',\n",
1451 " 'The',\n",
1452 " 'idea',\n",
1453 " 'is',\n",
1454 " 'to',\n",
1455 " 'improve',\n",
1456 " 'the',\n",
1457 " 'answer',\n",
1458 " 'until',\n",
1459 " 'it',\n",
1460 " 'is',\n",
1461 " 'close',\n",
1462 " 'enough',\n",
1463 " 'so',\n",
1464 " 'that',\n",
1465 " 'its',\n",
1466 " 'square',\n",
1467 " 'differs',\n",
1468 " 'from',\n",
1469 " 'the',\n",
1470 " 'radicand',\n",
1471 " 'by',\n",
1472 " 'less',\n",
1473 " 'than',\n",
1474 " 'a',\n",
1475 " 'predetermined',\n",
1476 " 'tolerance',\n",
1477 " '(',\n",
1478 " 'here',\n",
1479 " '0.001',\n",
1480 " '):',\n",
1481 " '22',\n",
1482 " '(',\n",
1483 " 'define',\n",
1484 " '(',\n",
1485 " 'good',\n",
1486 " '-',\n",
1487 " 'enough',\n",
1488 " '?',\n",
1489 " 'guess',\n",
1490 " 'x',\n",
1491 " ')',\n",
1492 " '(<',\n",
1493 " '(',\n",
1494 " 'abs',\n",
1495 " '(-',\n",
1496 " '(',\n",
1497 " 'square',\n",
1498 " 'guess',\n",
1499 " ')',\n",
1500 " 'x',\n",
1501 " '))',\n",
1502 " '0.001',\n",
1503 " '))',\n",
1504 " 'Finally',\n",
1505 " ',',\n",
1506 " 'we',\n",
1507 " 'need',\n",
1508 " 'a',\n",
1509 " 'way',\n",
1510 " 'to',\n",
1511 " 'get',\n",
1512 " 'started',\n",
1513 " '.']]"
1514 ]
1515 },
1516 "execution_count": 18,
1517 "metadata": {},
1518 "output_type": "execute_result"
1519 }
1520 ],
1521 "source": [
1522 "sentences(tokenise(sample_text))"
1523 ]
1524 },
1525 {
1526 "cell_type": "code",
1527 "execution_count": 19,
1528 "metadata": {
1529 "collapsed": false
1530 },
1531 "outputs": [
1532 {
1533 "data": {
1534 "text/plain": [
1535 "(Counter({('Continuing', 'this'): 1}),\n",
1536 " defaultdict(collections.Counter,\n",
1537 " {(',', 'we'): Counter({'obtain': 1}),\n",
1538 " ('Continuing', 'this'): Counter({'process': 1}),\n",
1539 " ('and', 'better'): Counter({'approximations': 1}),\n",
1540 " ('approximations', 'to'): Counter({'the': 1}),\n",
1541 " ('better', 'and'): Counter({'better': 1}),\n",
1542 " ('better', 'approximations'): Counter({'to': 1}),\n",
1543 " ('obtain', 'better'): Counter({'and': 1}),\n",
1544 " ('process', ','): Counter({'we': 1}),\n",
1545 " ('root', '.'): Counter(),\n",
1546 " ('square', 'root'): Counter({'.': 1}),\n",
1547 " ('the', 'square'): Counter({'root': 1}),\n",
1548 " ('this', 'process'): Counter({',': 1}),\n",
1549 " ('to', 'the'): Counter({'square': 1}),\n",
1550 " ('we', 'obtain'): Counter({'better': 1})}))"
1551 ]
1552 },
1553 "execution_count": 19,
1554 "metadata": {},
1555 "output_type": "execute_result"
1556 }
1557 ],
1558 "source": [
1559 "one_s_starts, one_s_counts = find_counts([sentences(tokenise(sample_text))[0]])\n",
1560 "one_s_starts, one_s_counts"
1561 ]
1562 },
1563 {
1564 "cell_type": "code",
1565 "execution_count": 20,
1566 "metadata": {
1567 "collapsed": false
1568 },
1569 "outputs": [
1570 {
1571 "data": {
1572 "text/plain": [
1573 "(Counter({('(', 'See', 'exercise'): 1,\n",
1574 " ('Continuing', 'this', 'process'): 1,\n",
1575 " ('If', 'the', 'guess'): 1,\n",
1576 " ('Now', \"let's\", 'formalize'): 1,\n",
1577 " ('We', 'start', 'with'): 1,\n",
1578 " ('We', 'write', 'this'): 1}),\n",
1579 " defaultdict(collections.Counter,\n",
1580 " {(\"''\", 'good', 'enough'): Counter({\".''\": 1}),\n",
1581 " ('(', 'See', 'exercise'): Counter({'1.7': 1}),\n",
1582 " ('(', 'abs', '(-'): Counter({'(': 1}),\n",
1583 " ('(', 'average', 'guess'): Counter({'(/': 1}),\n",
1584 " ('(', 'average', 'x'): Counter({'y': 1}),\n",
1585 " ('(',\n",
1586 " 'define',\n",
1587 " '('): Counter({'average': 1,\n",
1588 " 'good': 1,\n",
1589 " 'improve': 1,\n",
1590 " 'sqrt': 1}),\n",
1591 " ('(', 'good', '-'): Counter({'enough': 2}),\n",
1592 " ('(', 'here', '0.001'): Counter({'):': 1}),\n",
1593 " ('(', 'if', '('): Counter({'good': 1}),\n",
1594 " ('(', 'improve', 'guess'): Counter({'x': 2}),\n",
1595 " ('(', 'sqrt', '-'): Counter({'iter': 2}),\n",
1596 " ('(', 'square', 'guess'): Counter({')': 1}),\n",
1597 " ('(', 'the', 'number'): Counter({'whose': 1}),\n",
1598 " ('(+', 'x', 'y'): Counter({')': 1}),\n",
1599 " ('(-', '(', 'square'): Counter({'guess': 1}),\n",
1600 " ('(/', '(+', 'x'): Counter({'y': 1}),\n",
1601 " ('(/', 'x', 'guess'): Counter({')))': 1}),\n",
1602 " ('(<', '(', 'abs'): Counter({'(-': 1}),\n",
1603 " (')', '(', 'average'): Counter({'guess': 1}),\n",
1604 " (')', '(', 'if'): Counter({'(': 1}),\n",
1605 " (')', '(/', '(+'): Counter({'x': 1}),\n",
1606 " (')', '(<', '('): Counter({'abs': 1}),\n",
1607 " (')', '2', '))'): Counter({'We': 1}),\n",
1608 " (')', 'and', 'a'): Counter({'value': 1}),\n",
1609 " (')', 'guess', '('): Counter({'sqrt': 1}),\n",
1610 " (')', 'x', '))'): Counter({'0.001': 1}),\n",
1611 " (')', 'x', ')))'): Counter({'A': 1}),\n",
1612 " ('))', '0.001', '))'): Counter({'Finally': 1}),\n",
1613 " ('))', 'Finally', ','): Counter({'we': 1}),\n",
1614 " ('))', 'We', 'also'): Counter({'have': 1}),\n",
1615 " (')))', 'A', 'guess'): Counter({'is': 1}),\n",
1616 " (')))', 'where', '('): Counter({'define': 1}),\n",
1617 " ('):', '22', '('): Counter({'define': 1}),\n",
1618 " (',', 'but', 'it'): Counter({'is': 1}),\n",
1619 " (',', 'we', 'are'): Counter({'done': 1}),\n",
1620 " (',', 'we', 'must'): Counter({'repeat': 1}),\n",
1621 " (',', 'we', 'need'): Counter({'a': 1}),\n",
1622 " (',', 'we', 'obtain'): Counter({'better': 1}),\n",
1623 " ('-', 'enough', '?'): Counter({'guess': 2}),\n",
1624 " ('-', 'iter', '('): Counter({'improve': 1}),\n",
1625 " ('-', 'iter', 'guess'): Counter({'x': 1}),\n",
1626 " (\".''\", 'The', 'following'): Counter({'will': 1}),\n",
1627 " ('.)', 'The', 'idea'): Counter({'is': 1}),\n",
1628 " ('0.001', '))', 'Finally'): Counter({',': 1}),\n",
1629 " ('0.001', '):', '22'): Counter({'(': 1}),\n",
1630 " ('1.7', '.)', 'The'): Counter({'idea': 1}),\n",
1631 " ('2', '))', 'We'): Counter({'also': 1}),\n",
1632 " ('22', '(', 'define'): Counter({'(': 1}),\n",
1633 " (':', '(', 'define'): Counter({'(': 2}),\n",
1634 " (';', 'if', 'not'): Counter({',': 1}),\n",
1635 " ('?', 'guess', 'x'): Counter({')': 2}),\n",
1636 " ('A', 'guess', 'is'): Counter({'improved': 1}),\n",
1637 " ('Continuing', 'this', 'process'): Counter({',': 1}),\n",
1638 " ('Finally', ',', 'we'): Counter({'need': 1}),\n",
1639 " ('If', 'the', 'guess'): Counter({'is': 1}),\n",
1640 " ('Now', \"let's\", 'formalize'): Counter({'the': 1}),\n",
1641 " ('See', 'exercise', '1.7'): Counter({'.)': 1}),\n",
1642 " ('The', 'following', 'will'): Counter({'do': 1}),\n",
1643 " ('The', 'idea', 'is'): Counter({'to': 1}),\n",
1644 " ('We', 'also', 'have'): Counter({'to': 1}),\n",
1645 " ('We', 'start', 'with'): Counter({'a': 1}),\n",
1646 " ('We', 'write', 'this'): Counter({'basic': 1}),\n",
1647 " ('a', 'predetermined', 'tolerance'): Counter({'(': 1}),\n",
1648 " ('a', 'procedure', ':'): Counter({'(': 1}),\n",
1649 " ('a', 'value', 'for'): Counter({'the': 2}),\n",
1650 " ('a', 'very', 'good'): Counter({'test': 1}),\n",
1651 " ('a', 'way', 'to'): Counter({'get': 1}),\n",
1652 " ('abs', '(-', '('): Counter({'square': 1}),\n",
1653 " ('also', 'have', 'to'): Counter({'say': 1}),\n",
1654 " ('an', 'improved', 'guess'): Counter({'.': 1}),\n",
1655 " ('and', 'a', 'value'): Counter({'for': 1}),\n",
1656 " ('and', 'better', 'approximations'): Counter({'to': 1}),\n",
1657 " ('and', 'the', 'old'): Counter({'guess': 1}),\n",
1658 " ('answer', 'until', 'it'): Counter({'is': 1}),\n",
1659 " ('approximations', 'to', 'the'): Counter({'square': 1}),\n",
1660 " ('are', 'done', ';'): Counter({'if': 1}),\n",
1661 " ('are', 'trying', 'to'): Counter({'compute': 1}),\n",
1662 " ('as', 'a', 'procedure'): Counter({':': 1}),\n",
1663 " ('average', 'guess', '(/'): Counter({'x': 1}),\n",
1664 " ('average', 'x', 'y'): Counter({')': 1}),\n",
1665 " ('averaging', 'it', 'with'): Counter({'the': 1}),\n",
1666 " ('basic', 'strategy', 'as'): Counter({'a': 1}),\n",
1667 " ('better', 'and', 'better'): Counter({'approximations': 1}),\n",
1668 " ('better', 'approximations', 'to'): Counter({'the': 1}),\n",
1669 " ('but', 'it', 'is'): Counter({'not': 1}),\n",
1670 " ('by', \"''\", 'good'): Counter({'enough': 1}),\n",
1671 " ('by', 'averaging', 'it'): Counter({'with': 1}),\n",
1672 " ('by', 'less', 'than'): Counter({'a': 1}),\n",
1673 " ('close', 'enough', 'so'): Counter({'that': 1}),\n",
1674 " ('compute', ')', 'and'): Counter({'a': 1}),\n",
1675 " ('define', '(', 'average'): Counter({'x': 1}),\n",
1676 " ('define', '(', 'good'): Counter({'-': 1}),\n",
1677 " ('define', '(', 'improve'): Counter({'guess': 1}),\n",
1678 " ('define', '(', 'sqrt'): Counter({'-': 1}),\n",
1679 " ('differs', 'from', 'the'): Counter({'radicand': 1}),\n",
1680 " ('do', 'for', 'illustration'): Counter({',': 1}),\n",
1681 " ('done', ';', 'if'): Counter({'not': 1}),\n",
1682 " ('enough', \".''\", 'The'): Counter({'following': 1}),\n",
1683 " ('enough', '?', 'guess'): Counter({'x': 2}),\n",
1684 " ('enough', 'for', 'our'): Counter({'purposes': 1}),\n",
1685 " ('enough', 'so', 'that'): Counter({'its': 1}),\n",
1686 " ('exercise', '1.7', '.)'): Counter({'The': 1}),\n",
1687 " ('following', 'will', 'do'): Counter({'for': 1}),\n",
1688 " ('for', 'illustration', ','): Counter({'but': 1}),\n",
1689 " ('for', 'our', 'purposes'): Counter({',': 1}),\n",
1690 " ('for', 'the', 'guess'): Counter({'.': 1}),\n",
1691 " ('for', 'the', 'radicand'): Counter({'(': 1}),\n",
1692 " ('formalize', 'the', 'process'): Counter({'in': 1}),\n",
1693 " ('from', 'the', 'radicand'): Counter({'by': 1}),\n",
1694 " ('get', 'started', '.'): Counter(),\n",
1695 " ('good', '-', 'enough'): Counter({'?': 2}),\n",
1696 " ('good', 'enough', \".''\"): Counter({'The': 1}),\n",
1697 " ('good', 'enough', 'for'): Counter({'our': 1}),\n",
1698 " ('good', 'test', '.'): Counter(),\n",
1699 " ('guess', '(', 'sqrt'): Counter({'-': 1}),\n",
1700 " ('guess', '(/', 'x'): Counter({'guess': 1}),\n",
1701 " ('guess', ')', 'x'): Counter({'))': 1}),\n",
1702 " ('guess', ')))', 'where'): Counter({'(': 1}),\n",
1703 " ('guess', ':', '('): Counter({'define': 1}),\n",
1704 " ('guess', 'is', 'good'): Counter({'enough': 1}),\n",
1705 " ('guess', 'is', 'improved'): Counter({'by': 1}),\n",
1706 " ('guess',\n",
1707 " 'x',\n",
1708 " ')'): Counter({'(': 2, '(<': 1, 'guess': 1, 'x': 1}),\n",
1709 " ('have', 'to', 'say'): Counter({'what': 1}),\n",
1710 " ('here', '0.001', '):'): Counter({'22': 1}),\n",
1711 " ('idea', 'is', 'to'): Counter({'improve': 1}),\n",
1712 " ('if', '(', 'good'): Counter({'-': 1}),\n",
1713 " ('if', 'not', ','): Counter({'we': 1}),\n",
1714 " ('illustration', ',', 'but'): Counter({'it': 1}),\n",
1715 " ('improve', 'guess', 'x'): Counter({')': 2}),\n",
1716 " ('improve', 'the', 'answer'): Counter({'until': 1}),\n",
1717 " ('improved', 'by', 'averaging'): Counter({'it': 1}),\n",
1718 " ('improved', 'guess', '.'): Counter(),\n",
1719 " ('in', 'terms', 'of'): Counter({'procedures': 1}),\n",
1720 " ('is', 'close', 'enough'): Counter({'so': 1}),\n",
1721 " ('is', 'good', 'enough'): Counter({'for': 1}),\n",
1722 " ('is', 'improved', 'by'): Counter({'averaging': 1}),\n",
1723 " ('is', 'not', 'really'): Counter({'a': 1}),\n",
1724 " ('is', 'to', 'improve'): Counter({'the': 1}),\n",
1725 " ('it', 'is', 'close'): Counter({'enough': 1}),\n",
1726 " ('it', 'is', 'not'): Counter({'really': 1}),\n",
1727 " ('it', 'with', 'the'): Counter({'quotient': 1}),\n",
1728 " ('iter', '(', 'improve'): Counter({'guess': 1}),\n",
1729 " ('iter', 'guess', 'x'): Counter({')': 1}),\n",
1730 " ('its', 'square', 'differs'): Counter({'from': 1}),\n",
1731 " ('less', 'than', 'a'): Counter({'predetermined': 1}),\n",
1732 " (\"let's\", 'formalize', 'the'): Counter({'process': 1}),\n",
1733 " ('mean', 'by', \"''\"): Counter({'good': 1}),\n",
1734 " ('must', 'repeat', 'the'): Counter({'process': 1}),\n",
1735 " ('need', 'a', 'way'): Counter({'to': 1}),\n",
1736 " ('not', ',', 'we'): Counter({'must': 1}),\n",
1737 " ('not', 'really', 'a'): Counter({'very': 1}),\n",
1738 " ('number', 'whose', 'square'): Counter({'root': 1}),\n",
1739 " ('obtain', 'better', 'and'): Counter({'better': 1}),\n",
1740 " ('of', 'procedures', '.'): Counter(),\n",
1741 " ('of', 'the', 'radicand'): Counter({'and': 1}),\n",
1742 " ('old', 'guess', ':'): Counter({'(': 1}),\n",
1743 " ('our', 'purposes', ','): Counter({'we': 1}),\n",
1744 " ('predetermined', 'tolerance', '('): Counter({'here': 1}),\n",
1745 " ('procedure', ':', '('): Counter({'define': 1}),\n",
1746 " ('process', ',', 'we'): Counter({'obtain': 1}),\n",
1747 " ('process', 'in', 'terms'): Counter({'of': 1}),\n",
1748 " ('process', 'with', 'an'): Counter({'improved': 1}),\n",
1749 " ('purposes', ',', 'we'): Counter({'are': 1}),\n",
1750 " ('quotient', 'of', 'the'): Counter({'radicand': 1}),\n",
1751 " ('radicand', '(', 'the'): Counter({'number': 1}),\n",
1752 " ('radicand', 'and', 'the'): Counter({'old': 1}),\n",
1753 " ('radicand', 'by', 'less'): Counter({'than': 1}),\n",
1754 " ('really', 'a', 'very'): Counter({'good': 1}),\n",
1755 " ('repeat', 'the', 'process'): Counter({'with': 1}),\n",
1756 " ('root', 'we', 'are'): Counter({'trying': 1}),\n",
1757 " ('say', 'what', 'we'): Counter({'mean': 1}),\n",
1758 " ('so', 'that', 'its'): Counter({'square': 1}),\n",
1759 " ('sqrt', '-', 'iter'): Counter({'(': 1, 'guess': 1}),\n",
1760 " ('square', 'differs', 'from'): Counter({'the': 1}),\n",
1761 " ('square', 'guess', ')'): Counter({'x': 1}),\n",
1762 " ('square', 'root', '.'): Counter(),\n",
1763 " ('square', 'root', 'we'): Counter({'are': 1}),\n",
1764 " ('start', 'with', 'a'): Counter({'value': 1}),\n",
1765 " ('strategy', 'as', 'a'): Counter({'procedure': 1}),\n",
1766 " ('terms', 'of', 'procedures'): Counter({'.': 1}),\n",
1767 " ('than', 'a', 'predetermined'): Counter({'tolerance': 1}),\n",
1768 " ('that', 'its', 'square'): Counter({'differs': 1}),\n",
1769 " ('the', 'answer', 'until'): Counter({'it': 1}),\n",
1770 " ('the', 'guess', '.'): Counter(),\n",
1771 " ('the', 'guess', 'is'): Counter({'good': 1}),\n",
1772 " ('the', 'number', 'whose'): Counter({'square': 1}),\n",
1773 " ('the', 'old', 'guess'): Counter({':': 1}),\n",
1774 " ('the', 'process', 'in'): Counter({'terms': 1}),\n",
1775 " ('the', 'process', 'with'): Counter({'an': 1}),\n",
1776 " ('the', 'quotient', 'of'): Counter({'the': 1}),\n",
1777 " ('the', 'radicand', '('): Counter({'the': 1}),\n",
1778 " ('the', 'radicand', 'and'): Counter({'the': 1}),\n",
1779 " ('the', 'radicand', 'by'): Counter({'less': 1}),\n",
1780 " ('the', 'square', 'root'): Counter({'.': 1}),\n",
1781 " ('this', 'basic', 'strategy'): Counter({'as': 1}),\n",
1782 " ('this', 'process', ','): Counter({'we': 1}),\n",
1783 " ('to', 'compute', ')'): Counter({'and': 1}),\n",
1784 " ('to', 'get', 'started'): Counter({'.': 1}),\n",
1785 " ('to', 'improve', 'the'): Counter({'answer': 1}),\n",
1786 " ('to', 'say', 'what'): Counter({'we': 1}),\n",
1787 " ('to', 'the', 'square'): Counter({'root': 1}),\n",
1788 " ('tolerance', '(', 'here'): Counter({'0.001': 1}),\n",
1789 " ('trying', 'to', 'compute'): Counter({')': 1}),\n",
1790 " ('until', 'it', 'is'): Counter({'close': 1}),\n",
1791 " ('value', 'for', 'the'): Counter({'guess': 1, 'radicand': 1}),\n",
1792 " ('very', 'good', 'test'): Counter({'.': 1}),\n",
1793 " ('way', 'to', 'get'): Counter({'started': 1}),\n",
1794 " ('we', 'are', 'done'): Counter({';': 1}),\n",
1795 " ('we', 'are', 'trying'): Counter({'to': 1}),\n",
1796 " ('we', 'mean', 'by'): Counter({\"''\": 1}),\n",
1797 " ('we', 'must', 'repeat'): Counter({'the': 1}),\n",
1798 " ('we', 'need', 'a'): Counter({'way': 1}),\n",
1799 " ('we', 'obtain', 'better'): Counter({'and': 1}),\n",
1800 " ('what', 'we', 'mean'): Counter({'by': 1}),\n",
1801 " ('where', '(', 'define'): Counter({'(': 1}),\n",
1802 " ('whose', 'square', 'root'): Counter({'we': 1}),\n",
1803 " ('will', 'do', 'for'): Counter({'illustration': 1}),\n",
1804 " ('with', 'a', 'value'): Counter({'for': 1}),\n",
1805 " ('with', 'an', 'improved'): Counter({'guess': 1}),\n",
1806 " ('with', 'the', 'quotient'): Counter({'of': 1}),\n",
1807 " ('write', 'this', 'basic'): Counter({'strategy': 1}),\n",
1808 " ('x', ')', '('): Counter({'average': 1, 'if': 1}),\n",
1809 " ('x', ')', '(<'): Counter({'(': 1}),\n",
1810 " ('x', ')', 'guess'): Counter({'(': 1}),\n",
1811 " ('x', ')', 'x'): Counter({')))': 1}),\n",
1812 " ('x', '))', '0.001'): Counter({'))': 1}),\n",
1813 " ('x', ')))', 'A'): Counter({'guess': 1}),\n",
1814 " ('x', 'guess', ')))'): Counter({'where': 1}),\n",
1815 " ('x', 'y', ')'): Counter({'(/': 1, '2': 1}),\n",
1816 " ('y', ')', '(/'): Counter({'(+': 1}),\n",
1817 " ('y', ')', '2'): Counter({'))': 1})}))"
1818 ]
1819 },
1820 "execution_count": 20,
1821 "metadata": {},
1822 "output_type": "execute_result"
1823 }
1824 ],
1825 "source": [
1826 "find_counts(sentences(tokenise(sample_text)), tuple_size=3)"
1827 ]
1828 },
1829 {
1830 "cell_type": "code",
1831 "execution_count": 21,
1832 "metadata": {
1833 "collapsed": false
1834 },
1835 "outputs": [
1836 {
1837 "data": {
1838 "text/plain": [
1839 "('Continuing', 'this', 'process')"
1840 ]
1841 },
1842 "execution_count": 21,
1843 "metadata": {},
1844 "output_type": "execute_result"
1845 }
1846 ],
1847 "source": [
1848 "s = sentences(tokenise(sample_text))[0]\n",
1849 "tuple(s[:3])"
1850 ]
1851 },
1852 {
1853 "cell_type": "code",
1854 "execution_count": 22,
1855 "metadata": {
1856 "collapsed": true
1857 },
1858 "outputs": [],
1859 "source": [
1860 "unaccent_specials = ''.maketrans({\"’\": \"'\", \"’\": \"'\"})\n",
1861 "def unaccent(text):\n",
1862 " \"\"\"Remove all accents from letters.\n",
1863 " It does this by converting the unicode string to decomposed compatability\n",
1864 " form, dropping all the combining accents, then re-encoding the bytes.\n",
1865 "\n",
1866 " >>> unaccent('hello')\n",
1867 " 'hello'\n",
1868 " >>> unaccent('HELLO')\n",
1869 " 'HELLO'\n",
1870 " >>> unaccent('héllo')\n",
1871 " 'hello'\n",
1872 " >>> unaccent('héllö')\n",
1873 " 'hello'\n",
1874 " >>> unaccent('HÉLLÖ')\n",
1875 " 'HELLO'\n",
1876 " \"\"\"\n",
1877 " translated_text = text.translate(unaccent_specials)\n",
1878 " return unicodedata.normalize('NFKD', translated_text).\\\n",
1879 " encode('ascii', 'ignore').\\\n",
1880 " decode('utf-8')"
1881 ]
1882 },
1883 {
1884 "cell_type": "code",
1885 "execution_count": 23,
1886 "metadata": {
1887 "collapsed": false
1888 },
1889 "outputs": [],
1890 "source": [
1891 "sicp = unaccent(open('sicp.txt').read())\n",
1892 "sicp_starts, sicp_counts = find_counts(sentences(tokenise(sicp)), tuple_size=3)"
1893 ]
1894 },
1895 {
1896 "cell_type": "code",
1897 "execution_count": 24,
1898 "metadata": {
1899 "collapsed": false
1900 },
1901 "outputs": [
1902 {
1903 "data": {
1904 "text/plain": [
1905 "[(('subset', 'of', 'mathematical'), Counter({'logic': 1})),\n",
1906 " (('at', 'MIT', 'who'), Counter({'major': 1})),\n",
1907 " (('enforces', 'the', 'restriction'), Counter({'that': 1})),\n",
1908 " (('makes', 'it', 'very'), Counter({'difficult': 1})),\n",
1909 " (('understand', 'the', \"compiler's\"), Counter({'preserving': 1})),\n",
1910 " (('with', 'instructions', 'to'), Counter({'initialize': 1, 'test': 1})),\n",
1911 " (('to', 'problems', 'in'), Counter({'adding': 1})),\n",
1912 " (('problem', 'is', 'and'), Counter({'how': 1})),\n",
1913 " (('stream', '(', 'conjoin'), Counter({'(': 1})),\n",
1914 " (('organization', ',', 'then'), Counter({'to': 1}))]"
1915 ]
1916 },
1917 "execution_count": 24,
1918 "metadata": {},
1919 "output_type": "execute_result"
1920 }
1921 ],
1922 "source": [
1923 "list(sicp_counts.items())[:10]"
1924 ]
1925 },
1926 {
1927 "cell_type": "code",
1928 "execution_count": 25,
1929 "metadata": {
1930 "collapsed": false
1931 },
1932 "outputs": [
1933 {
1934 "data": {
1935 "text/plain": [
1936 "[(('With', 'permanent', '-'), 1),\n",
1937 " (('The', 'recursive', 'compilation'), 1),\n",
1938 " (('Define', 'abstractions', 'that'), 1),\n",
1939 " (('Hall', 'the', 'Rosalind'), 1),\n",
1940 " (('The', 'effect', 'executingnew'), 1),\n",
1941 " (('1', '-', '4.1'), 1),\n",
1942 " (('So', 'the', 'procedure'), 1),\n",
1943 " (('However', ',', 'with'), 1),\n",
1944 " (('Figure', '3.2', 'shows'), 1),\n",
1945 " (('In', 'software', '-'), 1)]"
1946 ]
1947 },
1948 "execution_count": 25,
1949 "metadata": {},
1950 "output_type": "execute_result"
1951 }
1952 ],
1953 "source": [
1954 "list(sicp_starts.items())[:10]"
1955 ]
1956 },
1957 {
1958 "cell_type": "code",
1959 "execution_count": 26,
1960 "metadata": {
1961 "collapsed": false
1962 },
1963 "outputs": [
1964 {
1965 "data": {
1966 "text/plain": [
1967 "[]"
1968 ]
1969 },
1970 "execution_count": 26,
1971 "metadata": {},
1972 "output_type": "execute_result"
1973 }
1974 ],
1975 "source": [
1976 "list(sicp_counts[('Gas', 'Meters', '.')].elements())"
1977 ]
1978 },
1979 {
1980 "cell_type": "code",
1981 "execution_count": 27,
1982 "metadata": {
1983 "collapsed": false
1984 },
1985 "outputs": [
1986 {
1987 "data": {
1988 "text/plain": [
1989 "('The', 'left', \"''\")"
1990 ]
1991 },
1992 "execution_count": 27,
1993 "metadata": {},
1994 "output_type": "execute_result"
1995 }
1996 ],
1997 "source": [
1998 "random.choice(list(sicp_starts.elements()))"
1999 ]
2000 },
2001 {
2002 "cell_type": "code",
2003 "execution_count": 28,
2004 "metadata": {
2005 "collapsed": false
2006 },
2007 "outputs": [
2008 {
2009 "data": {
2010 "text/plain": [
2011 "('+', 'or')"
2012 ]
2013 },
2014 "execution_count": 28,
2015 "metadata": {},
2016 "output_type": "execute_result"
2017 }
2018 ],
2019 "source": [
2020 "t = ('as', '+')\n",
2021 "t[1:] + ('or', )"
2022 ]
2023 },
2024 {
2025 "cell_type": "code",
2026 "execution_count": 29,
2027 "metadata": {
2028 "collapsed": true
2029 },
2030 "outputs": [],
2031 "source": [
2032 "def markov_sentence(counts, starts, items=None):\n",
2033 " i = 0\n",
2034 " current = random.choice(list(starts.elements()))\n",
2035 " chain = list(current)\n",
2036 " nexts = list(counts[current].elements())\n",
2037 " while nexts and items and i < items:\n",
2038 " next_item = random.choice(nexts)\n",
2039 " chain += [next_item]\n",
2040 " current = current[1:] + (next_item, )\n",
2041 " i += 1\n",
2042 " nexts = list(counts[current].elements())\n",
2043 " # print(chain, ':', current, ':', nexts)\n",
2044 " return chain"
2045 ]
2046 },
2047 {
2048 "cell_type": "code",
2049 "execution_count": 30,
2050 "metadata": {
2051 "collapsed": false
2052 },
2053 "outputs": [
2054 {
2055 "data": {
2056 "text/plain": [
2057 "'Rewrite the sqrt procedure of section 2.2 .'"
2058 ]
2059 },
2060 "execution_count": 30,
2061 "metadata": {},
2062 "output_type": "execute_result"
2063 }
2064 ],
2065 "source": [
2066 "' '.join(markov_sentence(sicp_counts, sicp_starts, 500))"
2067 ]
2068 },
2069 {
2070 "cell_type": "code",
2071 "execution_count": 31,
2072 "metadata": {
2073 "collapsed": false
2074 },
2075 "outputs": [
2076 {
2077 "data": {
2078 "text/plain": [
2079 "Counter({'.': 1})"
2080 ]
2081 },
2082 "execution_count": 31,
2083 "metadata": {},
2084 "output_type": "execute_result"
2085 }
2086 ],
2087 "source": [
2088 "sicp_counts['the', 'dispatch', 'procedure']"
2089 ]
2090 },
2091 {
2092 "cell_type": "code",
2093 "execution_count": 32,
2094 "metadata": {
2095 "collapsed": false
2096 },
2097 "outputs": [
2098 {
2099 "data": {
2100 "text/plain": [
2101 "'Continuing this process , we obtain better and better approximations to the square root .'"
2102 ]
2103 },
2104 "execution_count": 32,
2105 "metadata": {},
2106 "output_type": "execute_result"
2107 }
2108 ],
2109 "source": [
2110 "' '.join(markov_sentence(one_s_counts, one_s_starts, 500))"
2111 ]
2112 },
2113 {
2114 "cell_type": "code",
2115 "execution_count": 33,
2116 "metadata": {
2117 "collapsed": false
2118 },
2119 "outputs": [
2120 {
2121 "data": {
2122 "text/plain": [
2123 "(defaultdict(collections.Counter,\n",
2124 " {(',', 'we'): Counter({'obtain': 1}),\n",
2125 " ('Continuing', 'this'): Counter({'process': 1}),\n",
2126 " ('and', 'better'): Counter({'approximations': 1}),\n",
2127 " ('approximations', 'to'): Counter({'the': 1}),\n",
2128 " ('better', 'and'): Counter({'better': 1}),\n",
2129 " ('better', 'approximations'): Counter({'to': 1}),\n",
2130 " ('obtain', 'better'): Counter({'and': 1}),\n",
2131 " ('process', ','): Counter({'we': 1}),\n",
2132 " ('root', '.'): Counter(),\n",
2133 " ('square', 'root'): Counter({'.': 1}),\n",
2134 " ('the', 'square'): Counter({'root': 1}),\n",
2135 " ('this', 'process'): Counter({',': 1}),\n",
2136 " ('to', 'the'): Counter({'square': 1}),\n",
2137 " ('we', 'obtain'): Counter({'better': 1})}),\n",
2138 " Counter({('Continuing', 'this'): 1}))"
2139 ]
2140 },
2141 "execution_count": 33,
2142 "metadata": {},
2143 "output_type": "execute_result"
2144 }
2145 ],
2146 "source": [
2147 "one_s_counts, one_s_starts"
2148 ]
2149 },
2150 {
2151 "cell_type": "code",
2152 "execution_count": 35,
2153 "metadata": {
2154 "collapsed": false
2155 },
2156 "outputs": [],
2157 "source": [
2158 "def sentence_join(tokens):\n",
2159 " sentence = ''\n",
2160 " for t in tokens:\n",
2161 " if t[-1] not in \".,:;')-\":\n",
2162 " sentence += ' '\n",
2163 " sentence += t\n",
2164 " return sentence.strip()"
2165 ]
2166 },
2167 {
2168 "cell_type": "code",
2169 "execution_count": 36,
2170 "metadata": {
2171 "collapsed": false
2172 },
2173 "outputs": [
2174 {
2175 "data": {
2176 "text/plain": [
2177 "'Griss, Martin L.'"
2178 ]
2179 },
2180 "execution_count": 36,
2181 "metadata": {},
2182 "output_type": "execute_result"
2183 }
2184 ],
2185 "source": [
2186 "sentence_join(markov_sentence(sicp_counts, sicp_starts, 500))"
2187 ]
2188 },
2189 {
2190 "cell_type": "code",
2191 "execution_count": 38,
2192 "metadata": {
2193 "collapsed": false
2194 },
2195 "outputs": [
2196 {
2197 "data": {
2198 "text/plain": [
2199 "'2 Compiling Expressions In this section we will examine several Lisp procedures and design a specific register machine to execute programs written in Scheme, we used a succession of programs, most of the pairs generated in a typical computation are used only to speed up the evaluator by separating analysis from runtime execution. Unfortunately, as we have implemented adjoin- term to work for polynomials with coefficients that are themselves sequences are subtrees. In the original definition of scale- list, the recursive structure of the procedure object W1. A. Turner, David, and Daniel Weinreb. 4 Compound Procedures 1.1. Although their method was complex, it produced reasonably efficient programs because it did little redundant search. Exercise 3.13 constructed such lists. Modify the handling of cond so that it expects the integrand as a delayed argument and hence can be used in this way: ( define ( scheme- number -> scheme- number scheme- number -> scheme- number- package instantiate instantiate a pattern instruction counting instruction execution procedure instruction sequence, a list of n elements ? Exercise 2.65. Exercise 3.22.'"
2200 ]
2201 },
2202 "execution_count": 38,
2203 "metadata": {},
2204 "output_type": "execute_result"
2205 }
2206 ],
2207 "source": [
2208 "' '.join(sentence_join(markov_sentence(sicp_counts, sicp_starts, 500)) for _ in range(10))"
2209 ]
2210 },
2211 {
2212 "cell_type": "code",
2213 "execution_count": 42,
2214 "metadata": {
2215 "collapsed": false
2216 },
2217 "outputs": [],
2218 "source": [
2219 "kjb = unaccent(open('king-james-bible.txt').read())\n",
2220 "kjb_starts, kjb_counts = find_counts(sentences(tokenise(kjb)), tuple_size=3)"
2221 ]
2222 },
2223 {
2224 "cell_type": "code",
2225 "execution_count": 43,
2226 "metadata": {
2227 "collapsed": false
2228 },
2229 "outputs": [
2230 {
2231 "data": {
2232 "text/plain": [
2233 "'4: 5 And Simon answering said unto him, Thou wicked and slothful servant, thou knewest that I reap where I sowed not, and he that is of a perverse heart shall be there: and make no mention of the name of it called Babel; because the former troubles are forgotten, and because of the Ethiopian woman whom he had chosen. 20: 6 But Abram said unto Lot, Hast thou found honey ? eat so much as heard whether there be prophecies, they shall fall one upon another, that shall not be cut off from among his people: neither shalt thou suffer the salt of the covenant stood: and they rest not day and night come to an end, 21: 26 They mount up to heaven, and on camels, and asses. 15: 4 Yea, though he be not redeemed within the space of a full year may he redeem it. 89: 43 Thou hast also committed fornication with her, and disallowed her not: then all her vows, or of the strangers for a prey, and to the soldiers, when they saw that the Syrians were fled, they likewise fled before Abishai his brother, is in darkness even until now. 8: 8 Also he sent forth a raven, which went out from the land of Canaan, the lot of our inheritance on this side Jordan westward. 2: 6 His body also was like the fiery flame, and the Jebusites, which were remaining of the families of the Gershonites according to their families was thus: even the border of Simeon, Shemuel the son of Kareah spake to Gedaliah in Mizpah secretly saying, Let neither man nor woman alive, to bring up the tithe of the corn, and to the king, and of oil, ye shall keep this service. 8: 8 So Rabshakeh returned, and corrupted all their doings. 18: 17 And the flood was forty days upon the land, and their wives, and all they that were vexed with unclean spirits: and they possessed it, and set his heart on Daniel to deliver him: and he said unto them thus, Who commanded you to perform, even ten thousand captives, and brought him to the face, both he that soweth and he that formed the eye, I am Christ; and shall set the woman before the LORD, the fire causeth the waters to boil, to make an atonement for your sin. 2: 22 Thou shalt eat it within thy gates: 20: 4 And whither I go. 8: 29 And Jesus answered and said, I am in a great strait: let us go by night, being one of them.'"
2234 ]
2235 },
2236 "execution_count": 43,
2237 "metadata": {},
2238 "output_type": "execute_result"
2239 }
2240 ],
2241 "source": [
2242 "' '.join(sentence_join(markov_sentence(kjb_counts, kjb_starts, 500)) for _ in range(10))"
2243 ]
2244 },
2245 {
2246 "cell_type": "code",
2247 "execution_count": 44,
2248 "metadata": {
2249 "collapsed": true
2250 },
2251 "outputs": [],
2252 "source": [
2253 "all_starts = sicp_starts + kjb_starts"
2254 ]
2255 },
2256 {
2257 "cell_type": "code",
2258 "execution_count": 45,
2259 "metadata": {
2260 "collapsed": false
2261 },
2262 "outputs": [
2263 {
2264 "data": {
2265 "text/plain": [
2266 "[(('The', 'programmer', 'must'), 1),\n",
2267 " (('And', 'Jehoram', 'reigned'), 1),\n",
2268 " (('The', 'recursive', 'compilation'), 1),\n",
2269 " (('17', ':', '22'), 10),\n",
2270 " (('Hall', 'the', 'Rosalind'), 1),\n",
2271 " (('107', ':', '10'), 1),\n",
2272 " (('The', 'stored', 'value'), 1),\n",
2273 " (('148', ':', '1'), 1),\n",
2274 " (('So', 'the', 'procedure'), 1),\n",
2275 " (('Woe', 'to', 'the'), 1),\n",
2276 " (('Design', 'and', 'implement'), 1),\n",
2277 " (('Figure', '3.2', 'shows'), 1),\n",
2278 " (('23', ':', '30'), 7),\n",
2279 " (('51', ':', '6'), 3),\n",
2280 " (('The', 'key', 'modularity'), 1),\n",
2281 " (('6', ':', '17'), 25),\n",
2282 " (('47', ':', '1'), 5),\n",
2283 " (('mutable', 'Indeed', ','), 1),\n",
2284 " (('And', 'he', 'saith'), 4),\n",
2285 " (('31', 'Exercise', '1.9'), 1)]"
2286 ]
2287 },
2288 "execution_count": 45,
2289 "metadata": {},
2290 "output_type": "execute_result"
2291 }
2292 ],
2293 "source": [
2294 "list(all_starts.items())[:20]"
2295 ]
2296 },
2297 {
2298 "cell_type": "code",
2299 "execution_count": 46,
2300 "metadata": {
2301 "collapsed": true
2302 },
2303 "outputs": [],
2304 "source": [
2305 "all_counts = collections.defaultdict(collections.Counter)\n",
2306 "for k in sicp_counts:\n",
2307 " all_counts[k] = sicp_counts[k].copy()\n",
2308 "for k in kjb_counts:\n",
2309 " all_counts[k] += kjb_counts[k].copy()"
2310 ]
2311 },
2312 {
2313 "cell_type": "code",
2314 "execution_count": 47,
2315 "metadata": {
2316 "collapsed": false
2317 },
2318 "outputs": [
2319 {
2320 "data": {
2321 "text/plain": [
2322 "\"The New Hacker's Dictionary. 14 Any memory cell that is not one- directional computations, which perform operations on prespecified arguments to produce desired outputs. 6 Internal Definitions Our environment model of procedure application, which is defined to be the special case where the set elements need not be a one- element list, which is the root of the tree to find the next free location. Fill in the missing expressions in the following definition of factorial: ( define ( user- print modified for compiled code apply- generic requests an operation to initialize the stack and return it, and students pick it up in a few seconds with the Fermat method), and test each of the 12 primes found in exercise 1.22. High- level languages, erected on a machine- language program machine model modified for compiled code monitoring performance ( stack use) of compiled code, and each time we move down the right branch and which will be executed). We have not shown the part of the implementation. ( Hint: Use substitution to evaluate ( square x) ( if (= n 1) ( make- new- machine make- operation- exp- reg dest)))) ( lambda() < exp>), without using the optimization provided by memo- proc do something like ( set ! * unparsed * ( cdr * unparsed*)) sent)) We can generate such a parse with a simple program that has separate procedures for each of the items in the set is ( 1) = y 1. Then we can find the remainder of the division is called the mutual exclusion problem. 4, where we found that formulating stream models of systems with loops may require uses of delay beyond the hidden'' delay supplied by cons- stream initial- value ( thunk- value obj)) ( else ( make- rat is called. Assume that the coefficients of the terms and add them. If none of the < test>, and the result of the operation is valid.\""
2323 ]
2324 },
2325 "execution_count": 47,
2326 "metadata": {},
2327 "output_type": "execute_result"
2328 }
2329 ],
2330 "source": [
2331 "' '.join(sentence_join(markov_sentence(sicp_counts, sicp_starts, 500)) for _ in range(10))"
2332 ]
2333 },
2334 {
2335 "cell_type": "code",
2336 "execution_count": 48,
2337 "metadata": {
2338 "collapsed": false
2339 },
2340 "outputs": [
2341 {
2342 "data": {
2343 "text/plain": [
2344 "\"19: 25 But thou, O God: for they commit lewdness. 16: 15 And forgetteth that the foot may crush them, or that is in thee by the walls and in the feast of unleavened bread, until the selfsame day. 28: 23 But I will take away out of all his troubles. 4: 5 And they shall answer and say unto him, Is not this the fast that I have been with you at all seasons: the hard causes they brought unto him all the elders of the land of Egypt saying, 12: 11 Now when Job's three friends heard of all that therein is, in my realm, which are come to the place which the man of God, the God of Israel, which they could not so much as to eat. 5: 28 The LORD shall preserve thy going out; and it was shut. 43: 4 And the king of the nations: for I will stretch out my hand, and be rejected of the elders: from whom also I speak freely: for I have loved strangers, and after his going forth is prepared as the morning, that they come up out of Egypt, even they also turned to be their governor in the land of thy acts and of thy wisdom was not told me: thy hair is as a troubled fountain, and she conceived, and bare Jacob the fifth son. 22: 38 And Elisha came again to him, and upon the great toe of their right feet: and Deborah went up with ten thousand men of Judah; We have a little sister, and her days shall not be forgiven. 13: 2 And it shall come to pass, when the people removed from their tents, and said unto him that is near of kin unto us, he would raise up Christ to sit on the throne; yea, they have trodden my portion under foot, whose land the rivers have spoiled, to the right hand of the king. 40: 34 Then said they unto him an accusation against him. 14: 1 The priests the Levites, unto Ezra the priest brought the law before the congregation for judgment, but there was no bread there but the shewbread, which was the son of Meshullam, the son of Hinnom, and burnt it with fire, and destroyed down to the pit.\""
2345 ]
2346 },
2347 "execution_count": 48,
2348 "metadata": {},
2349 "output_type": "execute_result"
2350 }
2351 ],
2352 "source": [
2353 "' '.join(sentence_join(markov_sentence(kjb_counts, kjb_starts, 500)) for _ in range(10))"
2354 ]
2355 },
2356 {
2357 "cell_type": "code",
2358 "execution_count": 50,
2359 "metadata": {
2360 "collapsed": false
2361 },
2362 "outputs": [
2363 {
2364 "data": {
2365 "text/plain": [
2366 "'6: 30 And the children of Israel, and the garlick: 11: 11 And Balak said unto Balaam, Go with the men, whom I have not spoken of me the thing that I hate, saith the LORD. 14: 27 And Joshua made them that day, saith the LORD; If ye will not fall upon me yourselves. 7: 2 Thou hast given a banner to them that believe: 2: 37 And at the end of the mountain that lieth before the valley of vision, breaking down the walls of the houses withal: 29: 37 Their meat offering and their drink offerings, and offer an offering made by fire unto the LORD. Then the master of his mercy; 1: 36 And Bethnimrah, and Succoth, and encamped at Hazeroth. If the queue was initially empty, we set the front and rear pointers of the queue in the first month in the second chariot that he had built, 10: 32 Benjamin, Malluch, and Adaiah, 10: 6 In burnt offerings and sacrifices, and with him an hundred and thirty. An extreme case of inefficiency occurs when the system falls into infinite loops in making deductions. 18: 12 And when the days of Cainan were nine hundred sixty and two years, and begat Methuselah: 5: 11 And Manoah arose, and rebuked them, because of the words which the horn spake: I sought him, but feared the people, He is my shepherd, and the Levites shall keep the watch of the LORD, and we will give ourselves continually to prayer, a certain blind man sat by the way. 34: 20 In a moment, and be still. One thing that makes the query language provides means of combination is crucial to the ability to represent compound data. 4: 15 That which is far off shall die of the pestilence.'"
2367 ]
2368 },
2369 "execution_count": 50,
2370 "metadata": {},
2371 "output_type": "execute_result"
2372 }
2373 ],
2374 "source": [
2375 "' '.join(sentence_join(markov_sentence(all_counts, all_starts, 500)) for _ in range(10))"
2376 ]
2377 },
2378 {
2379 "cell_type": "code",
2380 "execution_count": 63,
2381 "metadata": {
2382 "collapsed": false
2383 },
2384 "outputs": [],
2385 "source": [
2386 "all2 = unaccent(open('sicp-trimmed.txt').read() + open('king-james-bible.txt').read())\n",
2387 "all2_starts, all2_counts = find_counts(sentences(tokenise(all2)), tuple_size=2)"
2388 ]
2389 },
2390 {
2391 "cell_type": "code",
2392 "execution_count": 65,
2393 "metadata": {
2394 "collapsed": false
2395 },
2396 "outputs": [
2397 {
2398 "data": {
2399 "text/plain": [
2400 "['32 The stranger that is, that hath taken a new clause to the unit square ( imag- part, and our Saviour.',\n",
2401 " '112: 2 And, behold, immediately he arose, and cast them to pass at that level, which loveth thee and thy word is gone a little sister, to bring them unto the sea, they reach even to him, with the whole land shall be the Son of man are before the LORD pondereth the hearts of the first and the latter days.',\n",
2402 " '10: 38 Joel the chief man of God, as much as any of the wood offering, with his fathers: also they were ashamed: let not the light of the Greeks seek after my name, and said unto him that doeth righteousness at all redeem ought of the LORD appointed other seventy also, as at other times, and Mearah that is spent in our streets: and in Benjamin fenced cities, and to all that he will preserve him, and couple the curtains of goats that were numbered of them that sleep sleep in the house of Jeroboam the son of Mattaniah: for ye shall be shortened.',\n",
2403 " '( This argument is the spirit of his nostrils, and turn again by the heel in the place where concurrency control, and he that reapeth may rejoice.',\n",
2404 " '6: 7 That ye may be of the truth.',\n",
2405 " '6: 10 And thou shalt see it.',\n",
2406 " \"9: 25 And it shall come and sing praise to thee with ? x set) false))))) Without- interrupts disables time- shared operating systems, especially they who are ye in me is true,'' for example, if thou warn the righteous smite me; 26: 10 Declaring the end.\",\n",
2407 " '7: 22 But he rebelled against thee, yet make thee to anger, and Jeremoth, and the men of Babylon.',\n",
2408 " '( define ( cube x)))) ( parallel- execute is a great multitude from Galilee, into the ark of the dust of the desire of our attitudes about programming and, behold, they brought out the mote out of thy land shall fall therein: and the compiler.',\n",
2409 " 'And Peter remembered the days when God overthrew Sodom and her towns and country round about upon the horns, and openeth their ears are open, and to the user, provide a rigorous mathematical definition of the boards of fir.',\n",
2410 " '5.4.',\n",
2411 " 'For example, we set the battle increased that trouble me ! for my comfort in these cases are recognized by the space is ( define ( append- instruction- sequence, we will use a distinct bignum data type enables the system should be bound in chains among all the days that have not walked in the faith, I will surely shew myself unto prayer.',\n",
2412 " '1: 15 And thou shalt require of the testimony in Jacob his people made no lamentation.',\n",
2413 " '12: 37 Then he questioned with him.',\n",
2414 " 'The procedure that takes as input a pattern such as had not obtained that which is at hand, that David had a servant to wife.',\n",
2415 " '2: 11 Now this I say unto thee; and they that were diseased.',\n",
2416 " 'There are many ways in the matter, they withered away from him, sing praises unto God from the tower, and birds of the terms of more and more abundantly unto you, he hath given unto thee: and they shall offer the gift, with the beasts of the same car.',\n",
2417 " 'You must implement the generic procedure so that they name, and go up, and take up his pillars.',\n",
2418 " \"We will have to change ( or ( recursively) during analysis is a faithful man who was yet shut up him that shaketh it ? 30: 15 The way of the LORD's passover.\",\n",
2419 " 'Amen, and combine the two horns were high; that thou hast heard, Satan: thou hast redeemed.',\n",
2420 " \"The community's broader work goes beyond issues of objects that can skill to hew in the gate of Issachar according to all that they may shave their heads, neither have seen strange things to write coercion procedures that manipulate the shared account.\",\n",
2421 " \"25: 1: 36 My desire is, and Lot his brother's name was Tamar: she is thy love to have done evil in his disease he sought how he loved the people and the Amalekites had carried it into the sea side: and they shall be as frontlets between thine eyes behold, the joy wherewith we joy for your goodness is as the shadow of death stain it; because he had said unto her mistress, and they of Tyre and Sidon, which uses the last; I will restore it to a given application problem; the family of the whole congregation of the armies of Israel, and shod them, having faithful children not accused of riot, speaking of the lad where he entereth in by the young men now arise, if we call make- new- machine procedure, we hurt them, and his dwelling place of rest, but shall surely pay ox for a fool: much more diligent, upon these tables the words which I also was accounted to him, and much people, and upon the persons were four leprous men at his birth.\",\n",
2422 " '32: 3 She crieth in the gate from the power of the sea shall be unclean.',\n",
2423 " '2: 13 To keep the commandments of the wicked he turneth upside down shall be set in my sanctuary ? but thou shalt make two rings in the wine, which is called the Italian band, when ye go to Dothan.',\n",
2424 " '21: 8 Therefore thus saith the LORD our God in heaven.',\n",
2425 " 'The empty set is being paid more than lovers of God; every one over against him.',\n",
2426 " 'And the sons of Bilhah.',\n",
2427 " '34: 22 Saying, Give us this great fire any more at all times: unless she had seen.',\n",
2428 " '1: 7 Ain, Remmon, and Lot the son of Pashur, a flame: there shall be strong.',\n",
2429 " '32: 10 Yea, all the leaves of a combination, and Meshullam.']"
2430 ]
2431 },
2432 "execution_count": 65,
2433 "metadata": {},
2434 "output_type": "execute_result"
2435 }
2436 ],
2437 "source": [
2438 "[sentence_join(markov_sentence(all2_counts, all2_starts, 500)) for _ in range(30)]"
2439 ]
2440 },
2441 {
2442 "cell_type": "code",
2443 "execution_count": 66,
2444 "metadata": {
2445 "collapsed": false
2446 },
2447 "outputs": [],
2448 "source": [
2449 "sicp_lovecraft = unaccent(open('sicp-trimmed.txt').read() + open('lovecraft.txt').read())\n",
2450 "sl2_starts, sl2_counts = find_counts(sentences(tokenise(sicp_lovecraft)), tuple_size=2)"
2451 ]
2452 },
2453 {
2454 "cell_type": "code",
2455 "execution_count": 70,
2456 "metadata": {
2457 "collapsed": false
2458 },
2459 "outputs": [
2460 {
2461 "data": {
2462 "text/plain": [
2463 "\"To make a new procedure object W1. Now, if you fail to imagine an alien evolution had led him. We also have to steer better with four interface procedures to enforce the condition is called forcing. In Proceedings of the window I did not, they wished; so that the great sweep, half- bent on its own opposite. There he worked like beavers over Lake's two best planes, fitting them again it was very blurred, and very handsome save for his family. By incorporating a limited number of arrests, the brightness and affability which promised respite from our emotions, and aerial, aeroplane parts, an' crinkly albino hair, pale blue eyes, impotently trying to tell is more ancient than the ordered union- set pairs))) Make- withdraw. Out of the same hallucination. I had wander'd in rapture beneath them; and there was only a lone coyote on the side turned away from both men sat still and attempt to divide both the Esquimaux wizards and the weirdness of the sort used in Mesopotamia, being wakeful, and could not sleep well that sucks your life and sensation- the hellish whine of the vault I kept deathly quiet, and when I told them the back of the Eighteenth Century, with full daemoniac fury upon the dark stairs. ( You should find that much of the same constraint that the whole way to the westward precipice beside him was a baronet before his arrival. 1 Representations for Complex Numbers We will augment the factorial machine shown in figure 5.14) of section 5.2 we use apply- dispatch, and seemed to notice this ripple until reminded by later events; but I can scarcely describe the strangeness was not sorry for him did not believe he was so rapidly overtaking it in an infinite continued fraction expansion for e x is unbound.\""
2464 ]
2465 },
2466 "execution_count": 70,
2467 "metadata": {},
2468 "output_type": "execute_result"
2469 }
2470 ],
2471 "source": [
2472 "\" \".join(sentence_join(markov_sentence(sl2_counts, sl2_starts, 500)) for _ in range(10))"
2473 ]
2474 },
2475 {
2476 "cell_type": "code",
2477 "execution_count": 75,
2478 "metadata": {
2479 "collapsed": false
2480 },
2481 "outputs": [
2482 {
2483 "data": {
2484 "text/html": [
2485 "<p>Mrs. Now she heard voices in other universes and other myths, they raised and the senior Ward, however, were soon unbuttoning our heavy flying furs. 4 Exponentiation Consider the following August his labours. The face was a dream- world discovery in press. Each time we called a program for simulating diffusion ( say N / k calls to protected are done; though once in a begin.) The evaluator can check to see a pale lemon- yellow while his singular results. I had carried off. Reflecting upon these pictures can revive. There was a hideous uncertainty came over and over must you land amongst them with hot brooding gaze and evil. Dee was never regarded by the known habits of childish rebelliousness being exchanged for other needs of the steel- rimmed pince- nez and pointed stiffly to a cloud of probable East- Indian or Indochinese provenance, though I could tell; yet it is too tremendous for any to picture or encompass. Aroused at last to see the old Asian castles clinging to him on his study, a rational number can automatically be applied to frames in parallel and merging with the unbeautiful things of earth's convulsions, this is embedded.</h1>"
2486 ],
2487 "text/plain": [
2488 "<IPython.core.display.HTML object>"
2489 ]
2490 },
2491 "metadata": {},
2492 "output_type": "display_data"
2493 }
2494 ],
2495 "source": [
2496 "from IPython.core.display import display, HTML\n",
2497 "display(HTML('<p>' + \n",
2498 " \" \".join(sentence_join(markov_sentence(sl2_counts, sl2_starts, 500)) for _ in range(10)) + \n",
2499 " '</h1>'))"
2500 ]
2501 },
2502 {
2503 "cell_type": "code",
2504 "execution_count": null,
2505 "metadata": {
2506 "collapsed": true
2507 },
2508 "outputs": [],
2509 "source": []
2510 }
2511 ],
2512 "metadata": {
2513 "kernelspec": {
2514 "display_name": "Python 3",
2515 "language": "python",
2516 "name": "python3"
2517 },
2518 "language_info": {
2519 "codemirror_mode": {
2520 "name": "ipython",
2521 "version": 3
2522 },
2523 "file_extension": ".py",
2524 "mimetype": "text/x-python",
2525 "name": "python",
2526 "nbconvert_exporter": "python",
2527 "pygments_lexer": "ipython3",
2528 "version": "3.4.3+"
2529 }
2530 },
2531 "nbformat": 4,
2532 "nbformat_minor": 0
2533 }