1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "# Punctuation in novels\n",
8 "Inspired by [Punctuation in novels](https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4#.qwj8e1n8m).\n",
9 "\n",
10 "Texts used here are [The complete works of Sherlock Holmes](sherlock-holmes.txt), [War and Peace](war-and-peace.txt), [The complete works of Shakespeare](shakespeare.txt), [Ulysses](ulysses.txt), and [Pride and Prejudice](pride-and-prejudice.txt)."
11 ]
12 },
13 {
14 "cell_type": "code",
15 "execution_count": 7,
16 "metadata": {
17 "collapsed": false
18 },
19 "outputs": [],
20 "source": [
21 "import string\n",
22 "import collections\n",
23 "from PIL import Image, ImageDraw\n",
24 "from math import ceil\n",
25 "import matplotlib as mpl\n",
26 "import matplotlib.pyplot as plt\n",
27 "%matplotlib inline\n",
28 "import pandas as pd"
29 ]
30 },
31 {
32 "cell_type": "markdown",
33 "metadata": {},
34 "source": [
35 "The `string` module has some nice subsets of characters. Does it know about punctuation?"
36 ]
37 },
38 {
39 "cell_type": "code",
40 "execution_count": 8,
41 "metadata": {
42 "collapsed": false
43 },
44 "outputs": [
45 {
46 "data": {
47 "text/plain": [
48 "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
49 ]
50 },
51 "execution_count": 8,
52 "metadata": {},
53 "output_type": "execute_result"
54 }
55 ],
56 "source": [
57 "string.punctuation"
58 ]
59 },
60 {
61 "cell_type": "markdown",
62 "metadata": {},
63 "source": [
64 "## Getting the punctuation\n",
65 "First, let's just open a text file and read the punctuation. We can also count the number of different punctuation characters in it."
66 ]
67 },
68 {
69 "cell_type": "code",
70 "execution_count": 9,
71 "metadata": {
72 "collapsed": true
73 },
74 "outputs": [],
75 "source": [
76 "sherlock = open('sherlock-holmes.txt').read()"
77 ]
78 },
79 {
80 "cell_type": "code",
81 "execution_count": 10,
82 "metadata": {
83 "collapsed": false
84 },
85 "outputs": [
86 {
87 "name": "stdout",
88 "output_type": "stream",
89 "text": [
90 "..-.......'.........,,,.,,,.,.-'..,-,.,,...,-,,,,,,,,.,,,,.:,,.,,,.-,-(,.-,,,,.,,,,.,,.,,..-...;,,.,,,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,.,'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\"\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",--.\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".'''''''''''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''..,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\".\n"
91 ]
92 }
93 ],
94 "source": [
95 "sherlock_punct = [c for c in sherlock if c in string.punctuation]\n",
96 "print(''.join(sherlock_punct)[:500])"
97 ]
98 },
99 {
100 "cell_type": "code",
101 "execution_count": 11,
102 "metadata": {
103 "collapsed": false
104 },
105 "outputs": [
106 {
107 "data": {
108 "text/plain": [
109 "Counter({'!': 171,\n",
110 " '\"': 4834,\n",
111 " '&': 5,\n",
112 " \"'\": 1490,\n",
113 " '(': 5,\n",
114 " ',': 7053,\n",
115 " '-': 965,\n",
116 " '.': 4843,\n",
117 " '/': 1,\n",
118 " ':': 56,\n",
119 " ';': 202,\n",
120 " '?': 138})"
121 ]
122 },
123 "execution_count": 11,
124 "metadata": {},
125 "output_type": "execute_result"
126 }
127 ],
128 "source": [
129 "sherlock_counts = collections.Counter(sherlock_punct)\n",
130 "sherlock_counts"
131 ]
132 },
133 {
134 "cell_type": "code",
135 "execution_count": 12,
136 "metadata": {
137 "collapsed": false
138 },
139 "outputs": [
140 {
141 "data": {
142 "text/plain": [
143 "! 171\n",
144 "\" 4834\n",
145 "& 5\n",
146 "' 1490\n",
147 "( 5\n",
148 ", 7053\n",
149 "- 965\n",
150 ". 4843\n",
151 "/ 1\n",
152 ": 56\n",
153 "; 202\n",
154 "? 138\n",
155 "dtype: int64"
156 ]
157 },
158 "execution_count": 12,
159 "metadata": {},
160 "output_type": "execute_result"
161 }
162 ],
163 "source": [
164 "sherlock_ps = pd.Series(sherlock_counts)\n",
165 "sherlock_ps.sort_index(inplace=True)\n",
166 "sherlock_ps"
167 ]
168 },
169 {
170 "cell_type": "code",
171 "execution_count": 13,
172 "metadata": {
173 "collapsed": false
174 },
175 "outputs": [
176 {
177 "data": {
178 "text/plain": [
179 "<matplotlib.axes._subplots.AxesSubplot at 0x7f6556d5c748>"
180 ]
181 },
182 "execution_count": 13,
183 "metadata": {},
184 "output_type": "execute_result"
185 },
186 {
187 "data": {
188 "image/png": "(base64-encoded PNG data omitted: bar chart of punctuation counts in the Sherlock Holmes text)",
189 "text/plain": [
190 "<matplotlib.figure.Figure at 0x7f6556db5e48>"
191 ]
192 },
193 "metadata": {},
194 "output_type": "display_data"
195 }
196 ],
197 "source": [
198 "sherlock_ps.plot(kind=\"bar\")"
199 ]
200 },
201 {
202 "cell_type": "markdown",
203 "metadata": {},
204 "source": [
205 "Now that we can read and process a novel, let's wrap that into a function and read some other novels."
206 ]
207 },
208 {
209 "cell_type": "code",
210 "execution_count": 14,
211 "metadata": {
212 "collapsed": true
213 },
214 "outputs": [],
215 "source": [
216 "def punct_summarise(fname):\n",
217 " content = open(fname).read()\n",
218 " punct = ''.join(c for c in content if c in string.punctuation)\n",
219 " counts = collections.Counter(punct)\n",
220 " return {'punctuation': punct, 'counts': counts}"
221 ]
222 },
223 {
224 "cell_type": "code",
225 "execution_count": 15,
226 "metadata": {
227 "collapsed": false
228 },
229 "outputs": [
230 {
231 "data": {
232 "text/plain": [
233 "Counter({'!': 171,\n",
234 " '\"': 4834,\n",
235 " '&': 5,\n",
236 " \"'\": 1490,\n",
237 " '(': 5,\n",
238 " ',': 7053,\n",
239 " '-': 965,\n",
240 " '.': 4843,\n",
241 " '/': 1,\n",
242 " ':': 56,\n",
243 " ';': 202,\n",
244 " '?': 138})"
245 ]
246 },
247 "execution_count": 15,
248 "metadata": {},
249 "output_type": "execute_result"
250 }
251 ],
252 "source": [
253 "# Complete Sherlock Holmes\n",
254 "sherlock = punct_summarise('sherlock-holmes.txt')\n",
255 "sherlock['counts']"
256 ]
257 },
258 {
259 "cell_type": "code",
260 "execution_count": 16,
261 "metadata": {
262 "collapsed": false
263 },
264 "outputs": [
265 {
266 "data": {
267 "text/plain": [
268 "Counter({'!': 3923,\n",
269 " '\"': 17970,\n",
270 " '#': 1,\n",
271 " '$': 2,\n",
272 " '%': 1,\n",
273 " \"'\": 7529,\n",
274 " '(': 670,\n",
275 " ')': 670,\n",
276 " '*': 300,\n",
277 " ',': 39891,\n",
278 " '-': 6308,\n",
279 " '.': 30805,\n",
280 " '/': 29,\n",
281 " ':': 1014,\n",
282 " ';': 1145,\n",
283 " '=': 2,\n",
284 " '?': 3137,\n",
285 " '@': 2,\n",
286 " '[': 1,\n",
287 " ']': 1})"
288 ]
289 },
290 "execution_count": 16,
291 "metadata": {},
292 "output_type": "execute_result"
293 }
294 ],
295 "source": [
296 "wap = punct_summarise('war-and-peace.txt')\n",
297 "wap['counts']"
298 ]
299 },
300 {
301 "cell_type": "code",
302 "execution_count": 17,
303 "metadata": {
304 "collapsed": false
305 },
306 "outputs": [
307 {
308 "data": {
309 "text/plain": [
310 "Counter({'!': 10815,\n",
311 " '\"': 6,\n",
312 " '&': 10,\n",
313 " \"'\": 27942,\n",
314 " ',': 82750,\n",
315 " '-': 4590,\n",
316 " '.': 36881,\n",
317 " ':': 10649,\n",
318 " ';': 17400,\n",
319 " '?': 10327,\n",
320 " '[': 19,\n",
321 " ']': 18})"
322 ]
323 },
324 "execution_count": 17,
325 "metadata": {},
326 "output_type": "execute_result"
327 }
328 ],
329 "source": [
330 "# Complete works of Shakespeare\n",
331 "shakespeare = punct_summarise('shakespeare.txt')\n",
332 "shakespeare['counts']"
333 ]
334 },
335 {
336 "cell_type": "code",
337 "execution_count": 18,
338 "metadata": {
339 "collapsed": false
340 },
341 "outputs": [
342 {
343 "data": {
344 "text/plain": [
345 "Counter({'!': 1576,\n",
346 " '\"': 8,\n",
347 " '%': 3,\n",
348 " '&': 3,\n",
349 " \"'\": 4485,\n",
350 " '(': 1777,\n",
351 " ')': 1788,\n",
352 " '*': 90,\n",
353 " '+': 2,\n",
354 " ',': 16349,\n",
355 " '-': 5037,\n",
356 " '.': 21361,\n",
357 " '/': 58,\n",
358 " ':': 2564,\n",
359 " ';': 34,\n",
360 " '?': 2235,\n",
361 " '_': 4566})"
362 ]
363 },
364 "execution_count": 18,
365 "metadata": {},
366 "output_type": "execute_result"
367 }
368 ],
369 "source": [
370 "ulysses = punct_summarise('ulysses.txt')\n",
371 "ulysses['counts']"
372 ]
373 },
374 {
375 "cell_type": "code",
376 "execution_count": 19,
377 "metadata": {
378 "collapsed": false
379 },
380 "outputs": [
381 {
382 "data": {
383 "text/plain": [
384 "Counter({'!': 500,\n",
385 " '\"': 3553,\n",
386 " '#': 1,\n",
387 " '$': 2,\n",
388 " '%': 1,\n",
389 " \"'\": 748,\n",
390 " '(': 38,\n",
391 " ')': 38,\n",
392 " '*': 58,\n",
393 " ',': 9280,\n",
394 " '-': 1193,\n",
395 " '.': 6396,\n",
396 " '/': 26,\n",
397 " ':': 155,\n",
398 " ';': 1538,\n",
399 " '?': 462,\n",
400 " '@': 2,\n",
401 " '[': 1,\n",
402 " ']': 2,\n",
403 " '_': 808})"
404 ]
405 },
406 "execution_count": 19,
407 "metadata": {},
408 "output_type": "execute_result"
409 }
410 ],
411 "source": [
412 "pap = punct_summarise('pride-and-prejudice.txt')\n",
413 "pap['counts']"
414 ]
415 },
416 {
417 "cell_type": "markdown",
418 "metadata": {},
419 "source": [
420 "Place all the counts into a pandas DataFrame, normalise them, and then plot them."
421 ]
422 },
423 {
424 "cell_type": "code",
425 "execution_count": 20,
426 "metadata": {
427 "collapsed": false
428 },
429 "outputs": [
430 {
431 "data": {
432 "text/html": [
433 "<div>\n",
434 "<table border=\"1\" class=\"dataframe\">\n",
435 " <thead>\n",
436 " <tr style=\"text-align: right;\">\n",
437 " <th></th>\n",
438 " <th>pap</th>\n",
439 " <th>shakespeare</th>\n",
440 " <th>sherlock</th>\n",
441 " <th>ulysses</th>\n",
442 " <th>wap</th>\n",
443 " </tr>\n",
444 " </thead>\n",
445 " <tbody>\n",
446 " <tr>\n",
447 " <th>!</th>\n",
448 " <td>500</td>\n",
449 " <td>10815</td>\n",
450 " <td>171</td>\n",
451 " <td>1576</td>\n",
452 " <td>3923</td>\n",
453 " </tr>\n",
454 " <tr>\n",
455 " <th>\"</th>\n",
456 " <td>3553</td>\n",
457 " <td>6</td>\n",
458 " <td>4834</td>\n",
459 " <td>8</td>\n",
460 " <td>17970</td>\n",
461 " </tr>\n",
462 " <tr>\n",
463 " <th>#</th>\n",
464 " <td>1</td>\n",
465 " <td>0</td>\n",
466 " <td>0</td>\n",
467 " <td>0</td>\n",
468 " <td>1</td>\n",
469 " </tr>\n",
470 " <tr>\n",
471 " <th>$</th>\n",
472 " <td>2</td>\n",
473 " <td>0</td>\n",
474 " <td>0</td>\n",
475 " <td>0</td>\n",
476 " <td>2</td>\n",
477 " </tr>\n",
478 " <tr>\n",
479 " <th>%</th>\n",
480 " <td>1</td>\n",
481 " <td>0</td>\n",
482 " <td>0</td>\n",
483 " <td>3</td>\n",
484 " <td>1</td>\n",
485 " </tr>\n",
486 " <tr>\n",
487 " <th>&amp;</th>\n",
488 " <td>0</td>\n",
489 " <td>10</td>\n",
490 " <td>5</td>\n",
491 " <td>3</td>\n",
492 " <td>0</td>\n",
493 " </tr>\n",
494 " <tr>\n",
495 " <th>'</th>\n",
496 " <td>748</td>\n",
497 " <td>27942</td>\n",
498 " <td>1490</td>\n",
499 " <td>4485</td>\n",
500 " <td>7529</td>\n",
501 " </tr>\n",
502 " <tr>\n",
503 " <th>(</th>\n",
504 " <td>38</td>\n",
505 " <td>0</td>\n",
506 " <td>5</td>\n",
507 " <td>1777</td>\n",
508 " <td>670</td>\n",
509 " </tr>\n",
510 " <tr>\n",
511 " <th>)</th>\n",
512 " <td>38</td>\n",
513 " <td>0</td>\n",
514 " <td>0</td>\n",
515 " <td>1788</td>\n",
516 " <td>670</td>\n",
517 " </tr>\n",
518 " <tr>\n",
519 " <th>*</th>\n",
520 " <td>58</td>\n",
521 " <td>0</td>\n",
522 " <td>0</td>\n",
523 " <td>90</td>\n",
524 " <td>300</td>\n",
525 " </tr>\n",
526 " <tr>\n",
527 " <th>+</th>\n",
528 " <td>0</td>\n",
529 " <td>0</td>\n",
530 " <td>0</td>\n",
531 " <td>2</td>\n",
532 " <td>0</td>\n",
533 " </tr>\n",
534 " <tr>\n",
535 " <th>,</th>\n",
536 " <td>9280</td>\n",
537 " <td>82750</td>\n",
538 " <td>7053</td>\n",
539 " <td>16349</td>\n",
540 " <td>39891</td>\n",
541 " </tr>\n",
542 " <tr>\n",
543 " <th>-</th>\n",
544 " <td>1193</td>\n",
545 " <td>4590</td>\n",
546 " <td>965</td>\n",
547 " <td>5037</td>\n",
548 " <td>6308</td>\n",
549 " </tr>\n",
550 " <tr>\n",
551 " <th>.</th>\n",
552 " <td>6396</td>\n",
553 " <td>36881</td>\n",
554 " <td>4843</td>\n",
555 " <td>21361</td>\n",
556 " <td>30805</td>\n",
557 " </tr>\n",
558 " <tr>\n",
559 " <th>/</th>\n",
560 " <td>26</td>\n",
561 " <td>0</td>\n",
562 " <td>1</td>\n",
563 " <td>58</td>\n",
564 " <td>29</td>\n",
565 " </tr>\n",
566 " <tr>\n",
567 " <th>:</th>\n",
568 " <td>155</td>\n",
569 " <td>10649</td>\n",
570 " <td>56</td>\n",
571 " <td>2564</td>\n",
572 " <td>1014</td>\n",
573 " </tr>\n",
574 " <tr>\n",
575 " <th>;</th>\n",
576 " <td>1538</td>\n",
577 " <td>17400</td>\n",
578 " <td>202</td>\n",
579 " <td>34</td>\n",
580 " <td>1145</td>\n",
581 " </tr>\n",
582 " <tr>\n",
583 " <th>=</th>\n",
584 " <td>0</td>\n",
585 " <td>0</td>\n",
586 " <td>0</td>\n",
587 " <td>0</td>\n",
588 " <td>2</td>\n",
589 " </tr>\n",
590 " <tr>\n",
591 " <th>?</th>\n",
592 " <td>462</td>\n",
593 " <td>10327</td>\n",
594 " <td>138</td>\n",
595 " <td>2235</td>\n",
596 " <td>3137</td>\n",
597 " </tr>\n",
598 " <tr>\n",
599 " <th>@</th>\n",
600 " <td>2</td>\n",
601 " <td>0</td>\n",
602 " <td>0</td>\n",
603 " <td>0</td>\n",
604 " <td>2</td>\n",
605 " </tr>\n",
606 " <tr>\n",
607 " <th>[</th>\n",
608 " <td>1</td>\n",
609 " <td>19</td>\n",
610 " <td>0</td>\n",
611 " <td>0</td>\n",
612 " <td>1</td>\n",
613 " </tr>\n",
614 " <tr>\n",
615 " <th>]</th>\n",
616 " <td>2</td>\n",
617 " <td>18</td>\n",
618 " <td>0</td>\n",
619 " <td>0</td>\n",
620 " <td>1</td>\n",
621 " </tr>\n",
622 " <tr>\n",
623 " <th>_</th>\n",
624 " <td>808</td>\n",
625 " <td>0</td>\n",
626 " <td>0</td>\n",
627 " <td>4566</td>\n",
628 " <td>0</td>\n",
629 " </tr>\n",
630 " </tbody>\n",
631 "</table>\n",
632 "</div>"
633 ],
634 "text/plain": [
635 " pap shakespeare sherlock ulysses wap\n",
636 "! 500 10815 171 1576 3923\n",
637 "\" 3553 6 4834 8 17970\n",
638 "# 1 0 0 0 1\n",
639 "$ 2 0 0 0 2\n",
640 "% 1 0 0 3 1\n",
641 "& 0 10 5 3 0\n",
642 "' 748 27942 1490 4485 7529\n",
643 "( 38 0 5 1777 670\n",
644 ") 38 0 0 1788 670\n",
645 "* 58 0 0 90 300\n",
646 "+ 0 0 0 2 0\n",
647 ", 9280 82750 7053 16349 39891\n",
648 "- 1193 4590 965 5037 6308\n",
649 ". 6396 36881 4843 21361 30805\n",
650 "/ 26 0 1 58 29\n",
651 ": 155 10649 56 2564 1014\n",
652 "; 1538 17400 202 34 1145\n",
653 "= 0 0 0 0 2\n",
654 "? 462 10327 138 2235 3137\n",
655 "@ 2 0 0 0 2\n",
656 "[ 1 19 0 0 1\n",
657 "] 2 18 0 0 1\n",
658 "_ 808 0 0 4566 0"
659 ]
660 },
661 "execution_count": 20,
662 "metadata": {},
663 "output_type": "execute_result"
664 }
665 ],
666 "source": [
667 "punctuation = pd.DataFrame({'sherlock': sherlock['counts'],\n",
668 " 'wap': wap['counts'],\n",
669 " 'shakespeare': shakespeare['counts'],\n",
670 " 'ulysses': ulysses['counts'],\n",
671 " 'pap': pap['counts']})\n",
672 "punctuation.fillna(value=0, inplace=True)\n",
673 "punctuation"
674 ]
675 },
676 {
677 "cell_type": "code",
678 "execution_count": 21,
679 "metadata": {
680 "collapsed": false
681 },
682 "outputs": [
683 {
684 "data": {
685 "text/plain": [
686 "<matplotlib.legend.Legend at 0x7f6556dc50f0>"
687 ]
688 },
689 "execution_count": 21,
690 "metadata": {},
691 "output_type": "execute_result"
692 },
693 {
694 "data": {
695 "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfoAAAEECAYAAADAjfYgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt8VNW5//HPQ7grCQjILWBA8QJ4iopUEWSwYtEiWBTl\nImK1VrQq1NZaPKLBeqSKClU5UlpFq9YLtSh4wdRCEIqioCLoEX6gAgKCaLgUkATy/P7YkzgZMsmQ\nC5kZvu/Xa15k9l5r7bUnQ569Lnttc3dEREQkNdWq6QqIiIhI9VGgFxERSWEK9CIiIilMgV5ERCSF\nKdCLiIikMAV6ERGRFKZALyIiksLiCvRmdr2ZfW5me8xsiZn1jDNfRzPbaWY7o7aHzKywlNfxFTkJ\nERERKV25gd7MLgMmA3cDXYFFwOtm1racfHWB54D5QKxVeToBLSNeq+OuuYiIiJQrnhb9zcB0d3/M\n3Ve6+03AJuC6cvLdC3wIzAAsRpqv3X1LxKsw7pqLiIhIucoM9OFW+alATtSuHKBHGfl+AvwEuJHY\nQR5giZltNLM3zSwUV41FREQkbuW16JsBacDmqO1bCLraD2BmrYFpwHB33x2j3I3AKGBQ+LUS+Fe8\nY/8iIiISn9rVUOZTwKPu/l6sBO6+ClgVsekdM8sCbgEWRqY1Mz11R0SkAty9rB5VOUyU16LfCuwH\nWkRtb0EwTl+aPsCdZlZgZgXAX4Ajwu9/Xsax3gU6lrbD3WO+7rzzzjL3V3W+ZDtmstVXx0zMvDpm\n8h1TpEiZLXp3zzezpcB5wIsRu/oSTLIrTZeo9xcB/w2cTtBlH0vXcvaLiIjIQYqn6/5B4Ckze5fg\n1rpRBOPzUwHMbAJwurufC+Dun0RmNrPuQGHkdjMbA3wOfALUBS4HBhKM14uIiEgVKTfQu/sLZtYU\nuB1oBSwHLnD39eEkLYEO5RUT9b4OMBHIBPYAK8JlzjmIugMQCoUONkul8iXbMSuTV8dMrWNWJq+O\nmVrHlMOLJfpYjpl5otdRRCTRmBmuyXiC1roXERFJaQr0IiIiKUyBXkREJIUp0IuIiKQwBXoREZEU\npkAvIiKSwhToRUREUpgCvYiISApToBcREUlhCvQiIiIpTIFeREQkhSnQi4iIpDAFehERkRSmQC8i\nIpLC4gr0Zna9mX1uZnvMbImZ9YwzX0cz22lmO0vZ19vMlobLXGNm1x5s5UVqiplhpieAikjiKzfQ\nm9llwGTgbqArsAh43czalpOvLvAcMB/wqH3tgdeAheEyJwAPm9mgCpyDiIiIxGDuXnYCs8XAh+5+\nbcS2VcDf3f22MvJNAtKBt4BH3L1RxL57gYvc/YSIbX8GOrt7j6hyvLw6ihxqRa15fTclUZkZ7q5u\nJym7RR9ulZ8K5ETtygF6HJijON9PgJ8ANwKlfdHOjFFmNzNLK6fOIiIiEqfyuu6bAWnA5qjtW4CW\npWUws9bANGC4u++OUW6LUsrcDNQOH1NERESqQO1qKPMp4FF3f6+qCszOzi7+ORQKEQqFqqpoEZGU\nkJubS25ubk1XQxJQmWP04a77XcAQd38xYvsUoJO79yklTyGwP3ITQc/BfuA6d/+Lmc0Hlrv7DRH5\nBgPPAA3cfX/Edo3RS8LRGL0kOo3RS5Eyu+7dPR9YCpwXtasvwez70nQBfhDxugPYE/757+E0b4fL\niC7zvcggLyIiIpUTT9f9g8BTZvYuQXAfRTA+PxXAzCYAp7v7uQDu/klkZjPrDhRGbZ8K3BCemT8N\nOAsYCQyp3OmIiIhIpHIDvbu/YGZNgduBVsBy4AJ3Xx9O0hLoUF4xUWV+YWYXAJOA64ANwI3uPvMg\n6y8iIiJlKPc++pqmMXpJRBqjl0SnMXoporXuRUREUpgCvYiISApToBcREUlhCvQiIiIpTIFeREQk\nhSnQi4iIpDAFehERkRRWHQ+1EUlZRffP
i4gkC7XoRQ6aFskRkeShQC8iIpLCFOhFRERSmAK9iIhI\nClOgFxERSWEK9CIiIiksrkBvZteb2edmtsfMlphZzzLSdjKzeWb2VTj9GjP7HzOrE5EmZGaFpbyO\nr4qTEhERkUC599Gb2WXAZOA6YCHwS+B1M+vk7utLybIXmA58AGwDugJ/BuoCt0Sl7QR8G/F+68Ge\ngIiIiMRm7mXfE2xmi4EP3f3aiG2rgL+7+21xHcTsQeAMd+8Rfh8C5gLN3f2bcvJ6eXUUOVSCBXMc\nKLlwjr6jkmjMDHfXCk9Sdte9mdUFTgVyonblAD3iOYCZHQf8uJQyAJaY2UYzezMc/EWSisK7iCS6\n8sbomwFpwOao7VuAlmVlNLNFZrYHWAUsdvfsiN0bgVHAoPBrJfCvssb+RRKZmWl5XBFJSNW51v2l\nwJEEY/QTzew+d/8tgLuvIrgAKPKOmWURjOEvjC4oOzu7+OdQKEQoFKquOouIJKXc3Fxyc3NruhqS\ngMocow933e8Chrj7ixHbpwCd3L1PXAcxGw48DjR09/0x0twJXObunaK2a4xeEkb0GH30aL2+q5Io\nNEYvRcrsunf3fGApcF7Urr7AooM4Tlr4WGUdrytBl76IiIhUkXi67h8EnjKzdwmC+yiC8fmpAGY2\nATjd3c8Nvx8B7AFWAPlAN+Ae4Hl3LwinGQN8DnxCcNvd5cBAgvF6ESlH0XwA9SCISHnKDfTu/oKZ\nNQVuB1oBy4ELIu6hbwl0iMhSAIwFOhL0aq4FHgEmRaSpA0wEMvn+ouACd59TqbMRERGREsq9j76m\naYxeEkmijNGrRS/l0Ri9FNFa9yIiIilMgV5ERCSFKdCLiIikMAV6ERGRFKZALyIiksIU6EVERFKY\nAr2IiEgKq86H2oiISIIxMy2+kMJKWztBgV5E5DCjhZZSU6xHZavrXkREJIUp0IuIiKQwBXoREZEU\npkAvIiKSwjQZT0TkMBdrEldV0gTAmhNXi97Mrjezz81sj5ktMbOeZaTtZGbzzOyrcPo1ZvY/ZlYn\nKl1vM1sakebayp6MiIhUlFfjS2pSuYHezC4DJgN3A12BRcDrZtY2Rpa9wHSgL3A8MAa4Grgnosz2\nwGvAwnCZE4CHzWxQhc9ERESSWlZWFn/4wx/o3LkzRx11FFdddRV79+4lLy+P/v37c/TRR3PUUUdx\n4YUXsmHDhuJ8oVCIsWPH8sMf/pCMjAwuuugi8vLyavBMEouV151iZouBD9392ohtq4C/u/ttcR3E\n7EHgDHfvEX5/L3CRu58QkebPQOeiNBHbXV0+kiiCLk4Hgq7O738KHKrvalFXq/5vSCxmVuriKaX9\nTf3+e11ttYnru5qVlUV6ejqvv/46DRs25MILL6RPnz786le/Yv78+Zx//vns27ePq666ioKCAmbO\nnAkEgX716tXk5OSQlZXFFVdcQYMGDXjqqaeq8ZwST6zfeZktejOrC5wK5ETtygF6HJij1DKOA34c\nVcaZMcrsZmZp8ZQrIiKpxcy44YYbaNOmDU2aNOG///u/efbZZznqqKP46U9/Sv369TnyyCO57bbb\nmD9/fol8V1xxBZ06daJhw4b8/ve/54UXXtCFcFh5XffNgDRgc9T2LUDLsjKa2SIz2wOsAha7e3bE\n7hallLmZYHJgs3LqJCIiKapt2+9Hhdu1a8fGjRvZs2cP1157LVlZWWRkZNC7d2+2b99eIpBH5yso\nKGDr1q2HtO6Jqjpn3V8KHEkwBj/RzO5z999WpKDs7Ozin0OhEKFQqCrqJyKSMnJzc8nNza3palTa\nunXrSvzcunVrHnjgAVatWsW7777L0UcfzYcffsipp56KuxcPY0Xnq1OnDs2aqd0I5YzRh7vudwFD\n3P3FiO1TgE7u3ieug5gNBx4HGrr7fjObDyx39xsi0gwGngEauPv+iO0ao5eEoTF6SRbJOkafkZHB\na6+9
RoMGDRgwYAChUIiCggKWL1/OzJkz2bVrF1dffTUvv/wy+/bto1atWoRCIdasWUNOTg7HHHMM\nI0eOpF69ejz99NPVeE6Jp0Jj9O6eDywFzova1Zdg9n280sLHKjre2+Eyost8LzLIi4jI4cPMGDZs\nGOeddx7HHnssHTt25Pbbb2fMmDHs2bOHZs2a0aNHD84///wS9/6bGSNGjODKK6+kVatW5Ofn89BD\nD9XgmSSWeGbdXwo8BVxPENxHAT8jmCG/3swmAKe7+7nh9COAPcAKIB/oBjwI5Lr75eE0WeH9fwam\nAWcBUwh6DmZGHV8tekkYatFLsjj4Fn31iue72r59ex577DHOOeecgyq7T58+jBgxgquuuqqi1UsJ\nsX7n5Y7Ru/sLZtYUuB1oBSwHLnD39eEkLYEOEVkKgLFAR4K/gWuBR4BJEWV+YWYXhLddB2wAbowO\n8iIiUv1S4YIxFc6husQ1Gc/dHwUejbHvZ1HvnwOei6PMt4DT4jm+iIhIWQ5Fr0SyKrfrvqap614S\nibruJVkcTNe9pIYKTcYTERGR5KZALyIiksIU6EVERFKYAr2IiEgKU6AXERFJYQr0IiKS0LKzsxkx\nYkSF8j7xxBP06tWrimuUXKrzoTYiIpIEEmVlvFh0j3zlKNCLJDHdTy9VJjtxyz5cv9/79+8nLS2t\n0uWo614kic1jXk1XQaRK3XvvvWRmZpKens6JJ57I3LlzMTPy8/MZOXIk6enpdOnShaVLlxbn+cMf\n/sBxxx1Heno6nTt35qWXXopZ/i233EKvXr3YuXMn27dv5+qrr6Z169ZkZmYybtw4CgsLAVi9ejW9\ne/emcePGNG/enCFDhhSXUatWLR5++GGOPfZYmjdvzm9/+9sSFyOPP/44nTp14qijjqJfv34lHqE7\nevRo2rVrR0ZGBt26dWPhwoXF+7Kzs7nkkksYMWIEGRkZPPnkk2XWMV4K9CIikhBWrlzJlClTWLJk\nCTt27CAnJ4esrCzcnVmzZjF06FC2b9/OgAEDuOGG4qecc9xxx7Fw4UJ27NjBnXfeyeWXX87mzZtL\nlO3uXHPNNaxYsYJ//vOfNGrUiCuvvJK6deuyZs0aPvjgA3JycvjLX/4CwLhx4+jXrx/btm1jw4YN\n3HTTTSXKe+mll1i6dCnvv/8+L7/8Mo8//jgAL7/8MhMmTGDmzJls3bqVXr16MXTo0OJ83bt3Z9my\nZeTl5TFs2DAGDx5Mfn5+8f5Zs2YxePBgtm/fzrBhw8qsY7wU6EVEJCGkpaWxd+9ePv74YwoKCmjX\nrh0dOgTPTOvVqxf9+vXDzLj88stZtmxZcb5LLrmEli1bAnDppZfSsWNHFi9eXLy/oKCAIUOGsG3b\nNmbPnk39+vXZvHkzr7/+OpMmTaJBgwY0b96cMWPG8NxzwaNa6tatyxdffMGGDRuoW7cuPXr0KFHX\nW2+9lcaNG9O2bVvGjBnDs88+C8DUqVMZO3YsJ5xwArVq1WLs2LF8+OGHrF8fPAdu+PDhNGnShFq1\nanHzzTezd+9eVq5cWVxujx49GDBgAADbt28vs47xUqAXEZGEcNxxxzF58mSys7Np0aIFQ4cOZdOm\nTQC0aNGiOF3Dhg357rvviruw//rXv3LKKafQpEkTmjRpwooVK/jmm2+K069evZrZs2dzxx13ULt2\nMDVt7dq1FBQU0KpVq+J8o0aN4uuvvwbgvvvuw93p3r07Xbp0Yfr06SXq2rZt2+Kf27Vrx8aNG4vL\nHT16dHGZTZs2BWDDhg0A3H///XTq1InGjRvTpEkTtm/fztatW4vLyszMLP65vDrGK65Ab2bXm9nn\nZrbHzJaYWc8y0obM7GUz22hmu8xsmZn9rJQ0haW8jj+o2ouISEoZOnQoCxYsYO3atZgZt956a5mz\n7teuXcsvfvELpkyZwrfffkteXh5dunQpMWZ+0kkn8fjjj3P++eezat
UqIAjU9erV45tvviEvL4+8\nvDy2b9/O8uXLgeDCYtq0aWzYsIE//elPXH/99Xz22WfFZUaOu69bt442bdoAQdCfNm1acZl5eXns\n2rWLM844gwULFjBx4kRmzJjBtm3byMvLIyMjo0RdI8+1vDrGq9xAb2aXAZOBu4GuwCLgdTNrGyPL\nmcAy4GKgM8HjbaeZ2dBS0nYieJ590Wv1QdVeRERSxqpVq5g7dy579+6lXr161K9fv9xZ57t27cLM\naNasGYWFhUyfPp0VK1YckG7IkCHcc889nHvuuXz22We0atWK8847j5tvvpmdO3dSWFjImjVreOut\ntwCYMWMGX375JQCNGzfGzKhV6/uQef/997Nt2zbWr1/PQw89xGWXXQbAqFGjuOeee/jkk0+AoPt9\nxowZAOzcuZPatWvTrFkz8vPzueuuu9ixY0fMcyuvjvGKp0V/MzDd3R9z95XufhOwCbiutMTuPsHd\n73D3t939C3efCvyDIPBH+9rdt0S8Dm4qochhwMyKXyKpbO/evYwdO5bmzZvTqlUrtm7dyoQJE4AD\n76Uvet+pUyd+/etfc+aZZ9KyZUtWrFhBz549S6QrSnvFFVdwxx13cM4557Bu3Tr++te/kp+fXzxD\nfvDgwXz11VcALFmyhDPOOINGjRoxcOBAHnroIbKysorLHThwIKeddhqnnHIK/fv356qrrgLgoosu\n4tZbb2XIkCFkZGRw8skn88YbbwDQr18/+vXrx/HHH09WVhYNGjSgXbt2pda1SFl1jFeZz6M3s7rA\nLmCIu78Ysf0RoIu7h+I6iNkcYJ27/yL8PgTMBdYC9YBPgLvdPbeUvHp2siSMmngefeQxi8ov+mMw\nj3n0oc9he5+xxHYwz6NP9AVzEk2tWrVYvXp18UTBRBHrd17egjnNgDRgc9T2LQRd7fEcuD9wDhA5\nZXEjMAp4jyDQjwD+ZWa93X3hgaWIiEh1SaUgLAeq1pXxzOws4BngRndfUrTd3VcBqyKSvmNmWcAt\ngAK9iIgkrGQbRisv0G8F9gMtora3IBinjyk8M/9VYJy7/ymOurwLXFbajuzs7OKfQ6EQoVAojuJE\nRA4fubm55Obm1nQ1Dgv79++v6SoclDLH6AHM7B1gmbtfG7FtFTDD3f87Rp6zgVeAO9x9clwVMZsJ\nNHL3c6O2a4xeEobG6CVZHMwYvaSGio7RAzwIPGVm7xLcWjeKYHx+arjgCcDpRQE6PNHuVeAR4Fkz\nKxrL3+/uX4fTjAE+J5iEVxe4HBgIDKroCR4sPQxEklGydRmKSM0rN9C7+wtm1hS4HWgFLAcucPf1\n4SQtgciphyOB+gTj7bdEbP8iIl0dYCKQCewBVoTLnFPhMxE5TET3IoiIlKXcrvuaVl3dTGrRS0XU\ndNd99DHVdS+xqOv+8BPrd6617kVERFKYAr2IiCS07OxsRowYkVDlPfHEE/Tq1auKalS9qvU+ehER\nSXyJvjJeVdfvcJvUqkAvIiJU56h9ZcNqVc4p2LdvX5WVlSzUdS8iIgnj3nvvJTMzk/T0dE488UTm\nzp2LmZGfn8/IkSNJT0+nS5cuLF26tDjPxo0bufjiizn66KPp0KEDDz/8cPG+7OxsLrnkEkaMGEFG\nRgZPPvnkAcecNWsWnTt3pkmTJvTp04dPP/20eN/69esZNGgQRx99NM2aNePGG28std633HILvXr1\nKvNpdDVFgV5ERBLCypUrmTJlCkuWLGHHjh3k5OSQlZWFuzNr1iyGDh3K9u3bGTBgADfccAMAhYWF\nXHjhhZxyyils3LiRf/3rX0yePJmcnJzicmfNmsXgwYPZvn07w4cPL3HMVatWMWzYMB566CG2bt3K\nBRdcwIUXXsi+ffvYv38//fv3p3379qxdu5YNGzYwdGjJJ667O9dccw0rVqzgn//8J+np6dX/QR0k\nBXoREUkIaWlp7N27l48//piCgg
LatWtX/IS4Xr160a9fP8yMyy+/nGXLlgHw3nvvsXXrVm6//XZq\n165N+/bt+fnPf85zzz1XXG6PHj0YMGAAAPXr1y8xFPD888/Tv39/fvSjH5GWlsZvfvMb9uzZw7//\n/W/effddNm3axMSJE2nQoAH16tWjR4/vn89WUFDAkCFD2LZtG7Nnz6Z+/fqH4mM6aBqjFxGRhHDc\ncccxefJksrOz+fjjj/nxj3/Mgw8+CECLFt8/cqVhw4Z89913FBYWsnbtWjZu3EiTJk2K9+/fv5+z\nzz67+H1mZmbMY27cuPGAZ8K3bduWDRs2UKdOHY455hhq1Sq9Tbx69Wo++ugjFi9eTO3aiRtO1aIX\nEZGEMXToUBYsWMDatWsxM2699dYyZ8m3bduW9u3bk5eXV/zasWMHr7zyChAE7rLyt2nThrVr1xa/\nd3fWr19PZmYmbdu2Zd26dTEfYnPSSSfx+OOPc/7557Nq1apS0yQCBXoREUkIq1atYu7cuezdu5d6\n9epRv3590tLSyszTvXt3GjVqxH333ceePXvYv38/K1asYMmS4Mno5c3YHzx4MK+++ipz586loKCA\nBx54gPr169OjRw9OP/10WrVqxe9+9zt2797Nd999x6JFi0rkHzJkCPfccw/nnnsun332WeU+gGqi\nQC8iIglh7969jB07lubNm9OqVSu2bt3KhAkTgAPvfS96n5aWxiuvvMKHH35Ihw4daN68Ob/4xS+K\nZ7+X1qKP3HbCCSfw9NNPc+ONN9K8eXNeffVVZs+eTe3atUlLS2P27NmsXr2adu3a0bZtW1544YUD\nyrjiiiu44447OOecc1i3bl31fUAVpLXuE/z8JbForXtJFgez1n2iL5gj8anMY2pFRCSFKQinNnXd\ni4iIpLC4Ar2ZXW9mn5vZHjNbYmY9y0gbMrOXzWyjme0ys2Vm9rNS0vU2s6XhMteY2bWVORERERE5\nULmB3swuAyYDdwNdgUXA62bWNkaWM4FlwMVAZ+BRYJqZFS8nZGbtgdeAheEyJwAPm9mgip+KiIiI\nRCt3Mp6ZLQY+dPdrI7atAv7u7rfFdRCz54E0d78k/P5e4CJ3PyEizZ+Bzu7eIyqvJuNJwtBkPEkW\nBzMZT1JDrN95mS16M6sLnArkRO3KAXocmCOmDODbiPdnxiizm5mVfdOkiIiIxK28WffNgDRgc9T2\nLUDLeA5gZv2Bcyh5YdCilDI3h+vTrJR9IiIiUgHVenudmZ0FPAPc6O5LKlpOdnZ28c+hUIhQKFTp\nuomIpJLc3Fxyc3NruhqSgMocow933e8Chrj7ixHbpwCd3L1PGXl7Aq8C49z9oah984Hl7n5DxLbB\nBBcFDdx9f8R2jdFLwtAYvSSLVBujz83NZcSIEaxfv76mq5KwKjRG7+75wFLgvKhdfQlm38c62NkE\ns+rvjA7yYW+Hy4gu873IIC8iItWvaDnX6nxJzYmn6/5B4Ckze5cguI8iGJ+fCmBmE4DT3f3c8PsQ\nQUv+EeBZMysay9/v7l+Hf54K3GBmk4BpwFnASGBIVZyUiIgcpHnzqq/sPjE7f+UQKPc+end/ARgD\n3A58QDCp7gJ3L+o/aQl0iMgyEqgP3AJsAjaGX4sjyvwCuAA4O1zmWIJx/JmVOx0REUlWtWrVKvEE\nuCuvvJJx48YdkG7ixIlccsklJbbddNNNjBkzBoAnnniCY489lvT0dDp06MDf/vY3IHh+fO/evWnc\nuDHNmzdnyJDv25affvopffv2pWnTppx44onMmDGjeN9rr71G586dSU9PJzMzkwceeKBKz7u6xTUZ\nz90fJVj4prR9Pyvl/QEr4ZWS7y3gtHiOLyIih59Y3f6XX34548ePZ/v27WRkZLBv3z6ef/555syZ\nw65duxg9ejRLliyhY8eObN68mW+++QaAcePG0a9fP+bPn09+fn7xo2x37dpF3759ufvuu3njjTf4
\n6KOP6Nu3LyeffDInnngiV199NX//+98566yz2L59e8I+jjYWrXUvIiIJq7SJg61ataJXr17Fre45\nc+bQrFkzTjnlFCDoGVi+fDl79uyhRYsWdOrUCYC6devyxRdfsGHDBurWrUuPHsFd36+88grt27dn\n5MiR1KpVi65duzJo0KDiR9LWrVuXjz/+mB07dpCRkVF8nGShQC8iIkln5MiRPP300wA8/fTTXHHF\nFQAcccQRPP/880ydOpXWrVvTv39/Vq5cCcB9992Hu9O9e3e6dOnC9OnTAVi7di2LFy+mSZMmxa+/\n/e1vbN4cLOny4osv8tprr5GVlUUoFOKdd96pgTOuOAV6ERFJCA0bNmT37t3F7zdt2hRzxv7AgQP5\n6KOPWLFiBa+++irDhw8v3nfeeeeRk5PDV199xYknnsg111wDQIsWLZg2bRobNmzgT3/6E9dffz1r\n1qyhXbt29O7dm7y8vOLXzp07mTJlCgDdunXjpZde4uuvv+aiiy7i0ksvrcZPoeop0IuISELo2rUr\nzzzzDPv372fOnDm89dZbMdM2aNCAiy++mGHDhvHDH/6QzMxMALZs2cLLL7/Mrl27qFOnDkcccQRp\nacHK6jNmzODLL78EoHHjxpgZaWlp9O/fn1WrVvH0009TUFBAQUEB7733Hp9++ikFBQU888wzbN++\nnbS0NBo1alRcXrJQoBcRkYTwxz/+kdmzZxd3nf/0pz8tsT+6dT9y5EhWrFjBiBEjircVFhYyadIk\n2rRpQ9OmTVmwYAGPPhrMJV+yZAlnnHEGjRo1YuDAgTz00ENkZWVx5JFHkpOTw3PPPUebNm1o1aoV\nY8eOJT8/HwiGBtq3b09GRgbTpk3jmWeeqeZPomqV+/S6mqaV8SSRaGU8SRYHszLeoVjQpjq+o+vX\nr+fEE09k8+bNHHnkkVVefrKJ9Tuv1rXuRUQk8SXjhWJhYSEPPPAAQ4cOVZAvhwK9iIgklV27dtGi\nRQvat2/PnDlzaro6CU+BXkREksoRRxzBf/7zn5quRtLQZDwREZEUpkAvIiKSwhToRUREUpgCvYiI\nSApToBcREUlhcQV6M7vezD43sz1mtsTMepaRtp6ZPWFmy8ws38zmlZImZGaFpbyOr8zJiIiISEnl\n3l5nZpcBk4HrgIXAL4HXzayTu68vJUsasAd4GPgJkFFG8Z2AbyPeb42z3lUmckWoZFw0QkSkspJ1\nZTyJTzz30d8MTHf3x8LvbzKzfgSB/7boxO6+O7wPM+sKNC6j7K/d/ZuDq3LVK1pGVETkcDWPAzpf\nq4z+vtbUBSYFAAAUMElEQVSsMrvuzawucCqQE7UrB+hRBcdfYmYbzexNMwtVQXkiIpKkpk+fzoAB\nA4rfd+zYscQjYdu2bcuyZcsYPXo07dq1IyMjg27durFw4cLiNNnZ2VxyySUMGTKE9PR0TjvtND76\n6KNDeh6Jprwx+mYEXfGbo7ZvAVpW4rgbgVHAoPBrJfCvssb+RUQktYVCIRYsWADAxo0bKSgo4J13\n3gHgs88+Y9euXfzgBz+ge/fuLFu2jLy8PIYNG8bgwYOLnzQHMGvWLC699NLi/RdddBH79u2rkXNK\nBDWyBK67rwJWRWx6x8yygFsI5gGUkJ2dXfxzKBQiFApVa/1ERJJNbm4uubm5NV2NSmnfvj2NGjXi\ngw8+YOXKlfz4xz9m2bJlrFy5kkWLFnH22WcDMHz48OI8N998M3fffTcrV67k5JNPBqBbt24MGjSo\neP8DDzzAO++8Q8+eh2dbsrxAvxXYD7SI2t4C2FTFdXkXuKy0HZGBXkREDhTdCBo/fnzNVaYSevfu\nTW5uLqtXr6Z37940btyY+fPn8/bbb9O7d28A7r//fh5//HE2btyImbFjxw62bv1+LndmZmbxz2ZG\nZmYmmzZVdchKHmV23bt7PrAUOC9qV19gURXXpStBl76IiBym
evfuzbx581iwYAGhUKg48M+fP5/e\nvXuzYMECJk6cyIwZM9i2bRt5eXlkZGSUmNW/fv33N4QVFhby5Zdf0rp165o4nYQQz330DwJXmtnV\nZnaSmf2RYHx+KoCZTTCzNyMzmFmn8Iz7ZsCRZvaD8Pui/WPMbKCZdTSzzmY2ARgIPFJVJyYiIsmn\nKNB/9913tG7dmp49ezJnzhy+/fZbTjnlFHbu3Ent2rVp1qwZ+fn53HXXXezYsaNEGUuXLmXmzJns\n27ePyZMnU79+fc4444waOqOaV+4Yvbu/YGZNgduBVsBy4IKIe+hbAh2isr0KHFNUBPBB+N+08LY6\nwEQgk+Ce+xXhMqv1wcJF94rqfk4RkcTUsWNHGjVqRK9evQBIT0/n2GOP5eijj8bM6NevH/369eP4\n44/niCOO4Fe/+hXt2rUrzm9mDBw4kOeff56RI0fSsWNH/vGPf5CWlhbrkCnPEj3omZlXVR0jA33k\nAhFF99En+mchNS/43jgQ/i4V/xSoju9QWcfUd1diMTPc/YCVcEr7m5pKC+aMHz+e1atX89RTTx2S\n4yWSWL/zGpl1L4lNPR8ih5dU+r+eSudSVfRQGxERSRlmdkh6KJKJWvQiIpIy7rzzzpquQsJRi15E\nRCSFKdCLiIikMAV6ERGRFKYxehGRw4wmqx1eFOhFRA4jpd1nLalNXfciIiIpTIFeREQkhSnQi4iI\npDAFehERkRSmQC8iIpLCFOhFRERSWFyB3syuN7PPzWyPmS0xs55lpK1nZk+Y2TIzyzezeTHS9Taz\npeEy15jZtRU9CRERESlduYHezC4DJgN3A12BRcDrZtY2RpY0YA/wMPAqweOzo8tsD7wGLAyXOQF4\n2MwGVeAcREREJIZ4Fsy5GZju7o+F399kZv2A64DbohO7++7wPsysK9C4lDJHAV+6++jw+5Vm9kPg\nN8A/Du4UREREJJYyW/RmVhc4FciJ2pUD9KjEcc+MUWY3M0urRLkiIiISobyu+2YEXfGbo7ZvAVpW\n4rgtSilzM0EPQ7NKlCsiIiIRkmKt++zs7OKfQ6EQoVCoxuoiIpKIcnNzyc3NrelqSAIqL9BvBfYT\ntMAjtQA2VeK4X3Fgj0ALYF/4mCVEBnoRETlQdCNo/PjxNVcZSShldt27ez6wFDgvaldfgtn3FfV2\nuIzoMt9z9/2VKFdEREQixHMf/YPAlWZ2tZmdZGZ/JGiNTwUwswlm9mZkBjPrFJ5x3ww40sx+EH5f\nZCrQxswmhcv8OTASuL8qTkpEREQC5Y7Ru/sLZtYUuB1oBSwHLnD39eEkLYEOUdleBY4pKgL4IPxv\nWrjML8zsAmASwa14G4Ab3X1m5U5HREREIsU1Gc/dHwUejbHvZ6Vsax9HmW8Bp8VzfBEREakYrXUv\nIiKSwpLi9joRSS5mVvyz+wGrYIvIIaQWvYhUj+yaroCIgFr0IjVGrV4RORTUohepSfNKfYqziEiV\nUaAXERFJYQr0EpOZleheFhGR5KNALzFp1Fgk8emCXMqjQC/l0h8REZHkpUAvIiKSwhToRUREUpgC\nvcRF44AiIslJgV7iMg/d7y0ikowU6EWkSqnnRySxxBXozex6M/vczPaY2RIz61lO+pPNbL6Z7Taz\nL81sXNT+kJkVlvI6vjInIyIiIiWVu9a9mV0GTAauAxYCvwReN7NO7r6+lPTpwD+BXKAbcBIw3cx2\nufuDUck7Ad9GvN9akZMQERGR0sXTor8ZmO7uj7n7Sne/CdhEEPhLMxyoD4x090/c/UXg3nA50b52\n9y0Rr8KKnIRIslN3t4hUlzIDvZnVBU4FcqJ25QA9YmQ7E1jg7nuj0rc2s2Oi0i4xs41m9qaZheKv\ntmaBixwK+n8mkvzKa9E3A9KAzVHbtwAtY+RpWUr6zRH7ADYCo4BB4ddK4F/ljf2LiJRFFyYiB6qO\n59GXu0S6u68CVkVsesfM
soBbCOYBlJCdnV38cygUIhQKVbKKIjVLwUiqWm5uLrm5uTVdDUlA5QX6\nrcB+oEXU9hYE4/Sl+YoDW/stIvbF8i5wWWk7IgO9SLIqCu7uelyQVL3oRtD48eNrrjKSUMrsunf3\nfGApcF7Urr7AohjZ3gZ6mVm9qPQb3H1tGYfrStClLyIiIlUknln3DwJXmtnVZnaSmf2RoMU+FcDM\nJpjZmxHp/wbsBp4ws85mNgi4NVwO4TxjzGygmXUMp5kADAQeqaLzEhEREeIYo3f3F8ysKXA70ApY\nDlwQcQ99S6BDRPodZtYXmAIsIbhP/n53nxRRbB1gIpAJ7AFWhMucU/lTEpHqoKEHkeQU12Q8d38U\neDTGvp+Vsm0F0LuM8iYSBHoRERGpRlrrXkREJIVVx+11IiI1KvL2RQ01yOEuZVr0WihDREqYp0cr\ni4Ba9CIiSUM9FVIRCvRSrCp7RCrzB0mzu0XK4oB6LyV+KdN1L1WlCoOruk4lAWhITw53Sd+i139i\nERGR2FKjRa+WoyQRXZyKyKGUGoFeJJlk13QFRORwknRd92oNiUg0/V0QiS1JW/SajS0i0fR3QaQ0\nSRroRUREJB4K9CIiIiksrkBvZteb2edmtsfMlphZz3LSn2xm881st5l9aWbjSknT28yWhstcY2bX\nVvQkREREpHTlBnozuwyYDNwNdAUWAa+bWdsY6dOBfwKbgG7AaOAWM7s5Ik174DVgYbjMCcDDZjao\nUmcjItXuYJ8rUZRWz6MQqRnxtOhvBqa7+2PuvtLdbyII4tfFSD8cqA+MdPdP3P1F4N5wOUVGAV+6\n++hwmX8BngR+U+EzOYRyc3MPab6aylvxI1ZcjZxnZY5Z4ZwVV5ljVsVnNI9Ds25FZX4vNXHMZPu7\nIIePMgO9mdUFTgVyonblAD1iZDsTWODue6PStzazYyLSlFZmNzNLi6fiNSnZ/kPXdKA/mFZcZF0P\ntgWoQB9H3jjPNfqzT7agWxPHTLa/C3L4KK9F3wxIAzZHbd8CtIyRp2Up6TdH7ANoESNN7fAxRSrc\nclQXcVX5/na18ePH12A9RKQyqmPWvW5mPQxVJrhWNG9RvqK848ePV4CPEv3ZFn1GB/s53VkF9RCR\nmmFlPQo03HW/CxgSHmsv2j4F6OTufUrJ8yTQ1N37R2w7HVgMtHf3tWY2H1ju7jdEpBkMPAM0cPf9\nEdt14SAiUgHurissKXsJXHfPN7OlwHnAixG7+gIzYmR7G7jXzOpFjNP3BTa4+9qIND+NytcXeC8y\nyIfroC+qiIhIBcXTdf8gcKWZXW1mJ5nZHwnG2qcCmNkEM3szIv3fgN3AE2bWOXzL3K3hcopMBdqY\n2aRwmT8HRgL3V8E5iYiISFi5D7Vx9xfMrClwO9AKWA5c4O7rw0laAh0i0u8ws77AFGAJ8C1wv7tP\nikjzhZldAEwiuE1vA3Cju8+smtMSERERKGeMPtmZWbvI9+6+rqbqUlFm1gUIEfS+LHT39ytRVqa7\nf1lVdUt2ZnYWsNTdv6uCstoRrA1ReKiOWZ3C51Pg7psitrUGapf3/6i0vBHbY35GkZ9Non5OZtYd\neN/d98WZ/jSCxtGFHPxE5Tnuvvsg82BmmcCm6GFQOXwlRaA3szuJ8Z/E3e8ys18STAC8KypfYcmk\nXqF79M3sEeBOd/8mzvRHEvxB3BZ+X4tgMaCewPvAPe6eH0c51wK/J7h1uh5wDnC3u98bR94NwHzg\ncXd/08xOAWa7e2ZUuqLP1gg+o7sOLK3U8ju4+2fxpC0rffizaQNkANuAjXEEy57AF+7+pZm1IZjk\nuTDeukSUsxP4wcGcRxllFQIfA79097eq+5hm9n/A8bG+02Y2CZhJcHFY5ucZI38h8Km7d4rY9inQ\nsbz/R6Xljdge8zOK/Gzi/ZzM7ELgWOA5d/8qztMrytsSGEcwX+go4P8BE939r+WcW0t33x
LnMXYC\nPwBWH0zdCP5PdqzI96Qqv9eSGpLlefSDOTDQW3jbXcAgguGDEkHK3St8+2BU63c4cB/wjZlFD12U\n5q/AR0B2+P3NwFjgZeDnBGsF3BCdycyau/vXEZtGA/9V9AfMzHoB/yBYabA8dwAnA8+F69yNYNnh\naO2p2C2Ri8JzM6YDc72UK0YL7qn6EfCz8L8tw9sbApcCQ4GzgIYR2Xab2b+BZ4EXYrRo6hLM+bgU\neAD4UwXqX9WuIvgs7we6H4LjTQGalrG/AcFnWM/MXgFeAt5w9z1xln8VwYVXpLEEF2QVyVu0vco+\nIzP7HcGF8BZgrJn1dfeP4sx7BsGF0J+BXgSrfZ4K/K+Z1Q2v1hnLPWYWT0vbCL6rRVq5e/T6IbHq\ntzOedCLxSIoWfUVF9wTE21oN590FbCVY2/8ioK+7L4znatnMPgNGuPu/w+8/JmjFP1P0B8bdW5WS\n73PgLnefHn6/BLjF3eeF318bfn9cKXkbEPw+d0dtHwY8TXCbZHt33xrvZ1AWM8sAbgGuIQgq7wPr\ngP8AjYB2wCnAd8A0gpbSdjMbDdwGfE1w4fNeON8OIB04hiAIXAgcTdCD8VApx59CEEwau/svK3gO\nh7zlcyiPGb7QOp3g+zuQIMi+SRD0Z0ddVNa4g23Rm9k6YJy7P2lmtxFcGF8B/B9B4G5OKUMN4e/u\np8Av3H121L7OwGvufoyZPQ/cEPk5mVku8V8YFzVGhgP3EMxDiiuAm9nU8Lkd9O9ILXqJluqB/glK\ndkv/7CDy1iG4wu8J/A+QT7B6XxbBH5QXo6/OzaxoKbeiLvqioNubIKDtDtflbIJudSLXIgh3Q08h\nCJS/IFhBcAZQh6D3ZR/BBcQbpdT3HwRjetMitvUE5gC/I2g5fx1+VkGVCa+1cG74nDqG676doKty\nAfBm5DCFmb0I/E88cw3C45u3ufvFEduKPuNGBL+f94GdUPKzjFFe5IWfEVxwPArkhbfFPXRRUWY2\nHHjZ3f9TnceJcezj+D7o/xB4lyDoP+vuGw51faJVIND/B+ji7l+E399O0KvnBBeLz1DKUEM43X+5\n+6VmtoKgRynyNt5jCHqfxgGF7j66qs7xUFCgl2gpHegrw8waFHVzmlkecBrBXQdvAiuAzsB6dz+h\nlLxrCQLyW2bWH3jQ3Y8P78sA1rl7zC5QM7uYYKjgL8BDBGOQtYCVsbpezWwL0Nvd/y/8vku4rjd5\ncOfEWcDz0WP0ySrcot8BpMfboo+48IPgD/swYBbBhcJBXwwmMzNrTtBrMgD4t7tPrOEqVSTQv0/Q\n6n01Yltrgv+nnxAMXTV099yofIuBCe7+kpldSXAh/HuCXq8xBE/VvBM4jmCOQ/OqO8vqp0AvB3B3\nvUp5AXsJVvObRNAS7xzevpMg8NYDesXI+ySwkqBFsJqg+7lo39nAkjiOn0Ew9vwB0D2O9P8BTgz/\n3J6g+/KciP0dgd01/blW0e+mDzAj/PPzQKiC5ewEjq3p89GrxO+jQ/TPZaS/AfhHBY6zOeL/81Lg\nRxH7mhJcQNYjuPjLJ5h8V+OfT0U+R730cvdqWes+VWQSdNnvJXiwz/tmtpDgD8CpBK2/BTHy/pqg\nW/RSglb1PRH7fkowZl4qM/uJmf2aILhfC9wIPGZmk83siDLq+wHwRzO7DniLYDxwbsT+nxDMKk4F\nBXz/2ONfA5W5jUhdWknK3R9x90EVyJrP95MKWxG05It8R9CVn07Qi1aLyn2/aoJWE5US1HUfh3DX\n/dnASQQz6r8iGD9/1917V+FxHgAuB+YRTKJ60oPbB+sRjCcPA8Z4RFdlRN7TCMbz9xN0R/8UGA+s\nIWgB30YQ/MuaTXxYURdnYqnI7XUVPM4bBPMk/jc8nHMCwYXjboJu/JPcvauZdSWYY5JUT9S0YL2C\nDa776CVMgT4OZraNYPLOuvAfoK4ErYCQuz9Xhcf5Fj
jP3ZeY2VHAYnfvGLG/E/And+8VR1lXENyG\n14LgD9gkdx9XVXVNBeGZ2o+6e165iaXaRf4+qvN3Y2ZXEVww/5eZpRPcojmA4Fa4t4DRHqze+ShQ\nK9yzJpK0FOjjEHmFHL5Vrp+XfR99RY+zHrjZ3WeEWxN/dff/ikpjHucvzYLFaI4G8vz7BwyJHNbC\nd9S8B7zi7rfHSDMAeBzo6lpNUpKcAn0CCd969WeC29MaAiPd/aWarZVI6jGzLOANgtszf+/un4S3\ntyCYFzMKuMSjZuyLJCMF+gRjZs0IVvn7f+pSFqk+ZtYI+C3BIjsZBBNv6xAs5DTew/fniyQ7BXoR\nOeyZWROCIL/VK/BsAJFEpkAvIiKSwnQfvYiISApToBcREUlhCvQiIiIpTIFeREQkhf1/BmYtWNLZ\nUuUAAAAASUVORK5CYII=\n",
696 "text/plain": [
697 "<matplotlib.figure.Figure at 0x7f6556dc55c0>"
698 ]
699 },
700 "metadata": {},
701 "output_type": "display_data"
702 }
703 ],
704 "source": [
705 "punctuation_normalised = punctuation.div(punctuation.sum())\n",
706 "ax = punctuation_normalised.plot(kind='bar', fontsize=14)\n",
707 "ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))"
708 ]
709 },
710 {
711 "cell_type": "markdown",
712 "metadata": {},
713 "source": [
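714 "Too many types of punctuation have very low counts, which clutters the chart. Let's look at just the most common punctuation: the marks whose normalised frequencies, summed across the texts, exceed 0.1."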
715 ]
716 },
717 {
718 "cell_type": "code",
719 "execution_count": 22,
720 "metadata": {
721 "collapsed": false
722 },
723 "outputs": [
724 {
725 "data": {
726 "text/plain": [
727 "<matplotlib.legend.Legend at 0x7f6540d34c88>"
728 ]
729 },
730 "execution_count": 22,
731 "metadata": {},
732 "output_type": "execute_result"
733 },
734 {
735 "data": {
736 "image/png": "(base64-encoded PNG omitted: horizontal bar chart of the normalised frequencies of the most common punctuation marks, one bar per text)",
737 "text/plain": [
738 "<matplotlib.figure.Figure at 0x7f6540d40be0>"
739 ]
740 },
741 "metadata": {},
742 "output_type": "display_data"
743 }
744 ],
745 "source": [
746 "ax = punctuation_normalised[punctuation_normalised.sum(axis=1) > 0.1].plot(kind='barh', fontsize=14)\n",
747 "ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))"
748 ]
749 },
750 {
751 "cell_type": "markdown",
752 "metadata": {},
753 "source": [
754 "## Visualising the punctuation\n",
755 "Let's print the punctuation sequences of two texts side by side to compare them."
756 ]
757 },
758 {
759 "cell_type": "code",
760 "execution_count": 23,
761 "metadata": {
762 "collapsed": false
763 },
764 "outputs": [
765 {
766 "name": "stdout",
767 "output_type": "stream",
768 "text": [
769 "?'\"\"!...,,.--,,,\",--.\"\".\"\",.\"-,'\",\"...,\" .',,,.!..!,.,!....,,?...'..,,.?.-,.?!!,.\n",
770 "\"\".\"\"\"..,..\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\" ...',...!\",.\",\",\"'..?\"\".,\",,\",,,.,..?.?\"\n",
771 ".\"\".\",;,\"\".'''''''''''',''''\".\",-,.--,.' \",\".\",\",,\"?.\",..\",\",.,',.',..,,(),:\".?.\"\n",
772 ",--',,,\",.\"\".\"..-''..,,.,,\"',..\",\".\"\",.\" .\",\",--\".?',.',\".\".,'.\".\"',\".\".\"\"';.?\"()\n",
773 "..',,\"\",\"\",....\"\"-\"\".,..,,\",,..\"\".,.,,.- ,\"'....\",,..\"?\".\",.,..\",.\"?,\",.\"...'!\",.\n",
774 ",-.,,.-,,,,,.,,,,.\"\".\"\",.\"\".\",.,.\"\",.,,, .\"?\".\",',.,.\",,,.\",\",,\"?,\",\",?\":\"'....?\"\n",
775 ".,\",.\",\".\"\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\",.,,\"\" \"..-,'.',..;,.--'.\"\",,\",'.\"-,.'.\",',,,.\"\n",
776 "\",.,..\"\",\"\".,,.\"\";\".\"\",\".\",-,\"\"\",,\"..\"\", ,\",,\"',',,.''.\"'.:.',,';.,,*.,,.',,,..*.\n",
777 "\",.\":,,-,.\"\",\".,.--.\"\".\"!.-!,!-!-!-!,,,, ,\",\"\"?\",,;,'.,,;.,,,\",,.\",,.-.,,,,.,----\n",
778 "\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\", -.,,,.,,,,.,,.,,,,,.\",\",.\",,,\",.\",.\"-,-,\n",
779 "!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\"....\"\"\"\"\".\"\".\"\" ,.\",,,\".\",\",,\"?.?\",,,.\"!\".,-,,-,,.,-'.,,\n",
780 ".\",.\"\"\"\"\"\"\"\"\"\"-,...\"\"\"\"...,.,.,-\"\"\"\"\"\"\"\" ..-,,,.,,,.\",,,\",.,.,.','.:\"?.\"\",,.\"\"?\".\n",
781 ".\"\",\".\",.,,\"\".\"\"\"\".\"\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\" .,.,,'.\",\".,,.,,,,,,-,,,..,.,'.,,,.-..,.\n",
782 ".\",,.'\".\"\".\"\"\"\"\",-,,.-,\",.\"-'\".',.'.,,,. '..,,,.,,.,,,',,,..-,..',,,.'.','.\",!,,\"\n",
783 ",,,,.,,,,,,,..,-,--,,.',.,-,.,.\",\",,,.\"\" ,:\",.\".,.\",\".\",\".\",\";,-..\",,,\",..--.,,,,\n",
784 "\"'.,\"\"'.,,\"\";.,.'..,..,,,..-,,,.,-.,.\",, ,,,,----.,.,,.\"!\";.\",,\",...,,,,.,,',..\",\n",
785 ".,,,,,,\"\"\".\",'..-,.,,,.,.,.,,,,..,..-,., '....,?\",.\".\",,.\",\",,.,,.,.',,-,,,,-,.,,\n",
786 ",.\"...,?,,?,.,.,'.,.,,\"\"\".\",.,,,-.,,.\",- ,,.\"'?\",,.\",,\",.\",\".-.-,,,,..,,';,',,'.,\n",
787 ",,,..,.,,'','&',..'\",,-,,.'.,,.'.,',''\", .,;.\"!\".\"!\",.,,,,.,,'-,.,.\"...,\".\"----,!\n",
788 ".,.,.'.',''-,.\".',...,...,,,.''.''.!'''. \"\"?\".,,.',,.\",,,,\".,,,.:,'.,,,.,,,,,.,.,\n"
789 ]
790 }
791 ],
792 "source": [
793 "line_len = 40\n",
794 "for i in range(5, 25):\n",
795 "    print(sherlock['punctuation'][line_len*i:line_len*(i+1)], wap['punctuation'][line_len*i:line_len*(i+1)])"
796 ]
797 },
798 {
799 "cell_type": "markdown",
800 "metadata": {},
801 "source": [
802 "Again, now that I know it works, let's wrap it in a function."
803 ]
804 },
805 {
806 "cell_type": "code",
807 "execution_count": 40,
808 "metadata": {
809 "collapsed": true
810 },
811 "outputs": [],
812 "source": [
813 "def compare(text1, text2, offset=0, line_len=40, max_lines=30):\n",
814 "    for i in range(offset, min(max(len(text1), len(text2)), line_len * max_lines), line_len):\n",
815 "        t1 = text1[i:i+line_len]\n",
816 "        t1 += ' ' * (line_len - len(t1))  # pad the left column to full width\n",
817 "        print(t1, text2[i:i+line_len])"
818 ]
819 },
820 {
821 "cell_type": "code",
822 "execution_count": 41,
823 "metadata": {
824 "collapsed": false,
825 "scrolled": false
826 },
827 "outputs": [
828 {
829 "name": "stdout",
830 "output_type": "stream",
831 "text": [
832 ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., !!\",.,,,,.,,.,,,,,.\",,.',\",.\"??\".\",?\"\"'?\n",
833 "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" .,\".\".\"\"'..\"\",,\",,-,.\"'!,'?.\"\"?\",.\"?,.\",\n",
834 "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- .,,.,,.,,,,,,,,.:\",'.',,,.!..!,.,!....,,\n",
835 ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ?...'..,,.?.-,.?!!,....',...!\",.\",\",\"'..\n",
836 ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' ?\"\".,\",,\",,,.,..?.?\"\",\".\",\",,\"?.\",..\",\",\n",
837 "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. .,',.',..,,(),:\".?.\".\",\",--\".?',.',\".\".,\n",
838 ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" '.\".\"',\".\".\"\"';.?\"(),\"'....\",,..\"?\".\",.,\n",
839 ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" ..\",.\"?,\",.\"...'!\",..\"?\".\",',.,.\",,,.\",\"\n",
840 ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" ,,\"?,\",\",?\":\"'....?\"\"..-,'.',..;,.--'.\"\"\n",
841 "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". ,,\",'.\"-,.'.\",',,,.\",\",,\"',',,.''.\"'.:.'\n",
842 "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. ,,';.,,*.,,.',,,..*.,\",\"\"?\",,;,'.,,;.,,,\n",
843 "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" \",,.\",,.-.,,,,.,-----.,,,.,,,,.,,.,,,,,.\n",
844 ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" \",\",.\",,,\",.\",.\"-,-,,.\",,,\".\",\",,\"?.?\",,\n",
845 "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ,.\"!\".,-,,-,,.,-'.,,..-,,,.,,,.\",,,\",.,.\n",
846 "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ,.','.:\"?.\"\",,.\"\"?\"..,.,,'.\",\".,,.,,,,,,\n",
847 "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, -,,,..,.,'.,,,.-..,.'..,,,.,,.,,,',,,..-\n",
848 ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, ,..',,,.'.','.\",!,,\",:\",.\".,.\",\".\",\".\",\"\n",
849 ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., ;,-..\",,,\",..--.,,,,,,,,----.,.,,.\"!\";.\"\n",
850 "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,,\",...,,,,.,,',..\",'....,?\",.\".\",,.\",\",\n",
851 ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. ,.,,.,.',,-,,,,-,.,,,,.\"'?\",,.\",,\",.\",\".\n",
852 ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" -.-,,,,..,,';,',,'.,.,;.\"!\".\"!\",.,,,,.,,\n",
853 ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. '-,.,.\"...,\".\"----,!\"\"?\".,,.',,.\",,,,\".,\n",
854 ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., ,,.:,'.,,,.,,,,,.,.,.,',.\",?\".\",\",',\"--.\n",
855 ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ...\"\",?\"\".\"\"?\"\",\",,\"!\".,,,.,,'.\"!...,,?\"\n",
856 "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". .\",\".\".?\".\",!\",'.,.\",,\",.\"',.,\",.,,,,.,,\n",
857 "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- .\",\".\",\".':\"!..\".'..-.\",?\",.\"'..\",,,.\",?\n",
858 ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. \".\",,,\",\"...\",,,..,,''.',,;'.\",,\".\",';'-\n",
859 ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. -,\".\",',!.,\",.\",,\",.,,.,,,.',,.;.,----,,\n"
860 ]
861 }
862 ],
863 "source": [
864 "compare(sherlock['punctuation'], wap['punctuation'], offset=100)"
865 ]
866 },
867 {
868 "cell_type": "code",
869 "execution_count": 42,
870 "metadata": {
871 "collapsed": false
872 },
873 "outputs": [
874 {
875 "name": "stdout",
876 "output_type": "stream",
877 "text": [
878 "..;,,',.'...,;,,.,.,,;,,',',,',,.,;',,,, ,,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,.,\n",
879 ".,,,,,.;,,-,',,;'.,;,,.',;':.!,,;,.,,',' '.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\"\n",
880 ";;',';,,'.?,',',,;,,,,,.,;,--,.,,;,;,.,, \",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",--\n",
881 "',,,,.,:,?,:,..,!??,.!,,;,,!'.,!'.,!'.,, .\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\"\n",
882 ",,,,,,,,,,,'!':.',:,,,,'.:,,.,,:;.,,,.', .\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".'''''''''\n",
883 "'-,,,,,.!',,',,',,,,,-.,.,.!??.:!-!'',,. ''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''.\n",
884 ":!,,,;,,'.,,'.!,'.,.!.,.!.,.,.,,.,:!:;., .,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\"\n",
885 "':!,,'.,.-,',,',''.,-,,;,.,:;!,:'.,.,:,! .,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\"\n",
886 "'!;?;;,',,.,,.,,'.';:,'.,';'',';,',.':-; .\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\"\n",
887 ",:,.?,,.',,,-.,,;,.,,,.,,.,,.,..,..,.,,. \".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\".\n",
888 "?,?,.:,;,.:.,,.:'.!.',';.,-.,..??.,,;.': \"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--.\n",
889 ",.,.',',!'',,;,!',;;,.,.,.,.,'.,.,.,';,' \"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\"\n",
890 ";,,':,,.'?,,,.,..,';,,',.',,;.,'.,,,;;'. ,!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\n",
891 ";-;,';,-;,.,.?,.-,-,--,-,.,-.,;,,,-,,,:; \"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\"\n",
892 ",,.,.,.;.;;.'.;,-.,!?,,,,,,,,,';,:;;,,:- \"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\"\n",
893 ",'.,:';.-.;,,';;,;,,,,.,,,,;,,-.,':;,,;; \",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,,\n",
894 "-,?,,:?,';.,-,:',;,,'.,,-;,,'',;;,,.,,!. .-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--,\n",
895 ".!,.!.,:.,!?,;',,,.,?,,,',,.,,,?,?,,?:,' ,.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'..,\n",
896 ",,,,,,,,,'.,,,';,,:',,':,;'',:::,,,,::-, ..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,.,\n",
897 "',,.,,,,,,,.,:.;.?,.;.:,,,',',;'-;,,,,,, ,,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,.\n",
898 ",,.,,;,.?,'.,,;,,.,..,!,.,:.,.',',,-'.., ,,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\"\n",
899 ",,':,'-,;''',,,-.':,-,',,--.;':.;.'.',:, ,,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',..\n",
900 ",,,,,,:,,'.?,.,.?',.'';,,.!,.,-:,:,.??,, .,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,.,\n",
901 "?.;,,,:,,,,;,,.,,?,..,,;.:,;,:,?',..,';, .,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,-\n",
902 ";;:,.:;,,.,,,,.,!.,;'.',.,:,,.?,.,.,.,-, \"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\".\n",
903 "-,:,';',:',.,::;..,..,,.,;,,;-,-,,,,.;,. ..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'-\n",
904 "..,-,;,-,;.,,:,,;,,:,,,;,,..;,-',!,;,.,, ,-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,.\n",
905 "&.,!..,-;:,,,,,',..,;,:',,,.,:,.;,,,.,;, ,.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'..\n",
906 ",,.!,,,'.,;;.-,,,,.:,.,,;,,;,,.'!,,,;!:! .,,\"\",\"\"...,.?,.,..\"\"\"\"!\"\"\"\"\"\"\"\"\"\"\"\"...\"\n",
907 "''!,,'.!?:,,;,,.!-,-.,.',:;.,,.,,.!?.,:. -..,,,.,,,-,.,,,,.;,.,-,,.,,;,.\"\".\"\".\",,\n",
908 "!.,.,';.?:,'.,,;;,,.'??!!??,.,,,..!,.?!. '\".\"'\"\"'\".\"''.,,.,'\"\"'.,\"\".-..,\",.,,.,,.\n",
909 ",;.?!?:.!:.,:?',.;,,;,,;'''.??',',,,,,', .,-.,,..,..\"\",-,,--\"\".,.,',..\",\".\".\"\"\"\"\"\n",
910 "?,,,,.:.!','..,;.,;,:,,,!,,,.,,!.,!!,:,. \"\"\",\"\"\"\"\",.\".,,.\"\"\"\",,.,,,.\"\"\"\"..?-,.,.,\n",
911 "!!'?!!!??,?!?,;,!.!:'.?,;'.,--;.,,?.,,.? ,,\"\"\"\"-.,.,.,.;-.-.....-.,-.,,,,.,,,.;,.\n",
912 "',.,.:.;,,;,,,,,:.,,.,:.?,.,:,!,;-,.,.,, -\"\"\".\".-,,.-,.\"\"\"\".,.,,.\".:\"-,\",.\"'\",.\",\n",
913 "';,,,',',','',',',',',,:.,:,:;',.,.,,;,, \".,.\"\",.\"\"\"\"\"\"\",.\"\"\"\",\"\"\".\"\".\"!\"\"\"\"\"\"\"\"\"\n",
914 ".?,!;,.,.,-,,;.;,,,.,:;,,..,?;,,-,;,.,., \"\"\"\"\".,.,'\"\".-!!\",.,..\".,\".\".\",.\"!.:\"\"\",\n",
915 ",',.,:,;.-,?!;';,.,.,.,,,.:,.,!,.''!?,,; .\"\"\"\"\"\".\"\"\"\"-,.,,.-,,,,.,\",.\".:\".--..,.,\n",
916 ",,.,-,,,,,',,'.'','.,;:,.,:;'.'!,.,,.!!. ,....,,.,,.,,...,,,,-,,.\",,..,,-,.\",;-.,\n",
917 ",!,!!',',,,,:',,,,;,,,,,,,,,,.?.,!??-,?, ...,.;,.,\",\",\"\"-,\",.\"??\"\"\".\"'\"\",\";\"..\"\"\"\n",
918 "!!.:;,.,:,,.-,,-,,.?,,,-,,,;,?,''?,,:';; \"..-\".\"\".\"\"\"\".\"'\".\",\"\"..-\",,,.,.'.,.,,..\n",
919 "',,,,,,.,,:,,--.,.,..,;,.:,'.;;:,;',,,:. -,.,,-,.,.\",\".\"\"\".\"\"\"\".,.,,\",-.\"\",,.\",,.\n",
920 "-!!!!....?;,;,,,.--,-,-',,;:,,.,!!!!,:'. ,,,\"\"\".\",,,\"\"\"\",,,.,.,.,,,.,.,.,...,,.,,\n",
921 ".,:,.,?-.,,,,.-,.,,?-.-,:,--.,.,-.,;.,;, \".,,,,.,,.,,,.',--,,,.-.,,,.',.\",,,,,\".,\n",
922 ",.',.';,,..,!-?.,,,,,'.-,,',;':,.,,-,,', ,.\",-,,.\".\",,.','\"\",..,\"\",,,\"\"',,,--\"\",,\n",
923 ",;,,,,,'';,.,,;;,,.,;,,''..''-,?,',;,',' .\"\",\"\",\"\"..'.,,-,\"..\",\".\",,\"\",\",\".'',,,.\n",
924 ".:.;.!?.;,,,.,',,..'?'',,'.';,,.',,';,,, ,.\"\",\".\"..,\":\"-:,,,..,.--,.,',,,',\"\"\".,.\n",
925 ",.'??!,?.,!,!''.?'!!,,';','?!,,?;,,.':', \",'\".\",.,,,.,,\"\",.\"\".,.\"\",,.\",;\"',.',.,;\n"
926 ]
927 }
928 ],
929 "source": [
930 "compare(shakespeare['punctuation'], sherlock['punctuation'], offset=100, max_lines=50)"
931 ]
932 },
933 {
934 "cell_type": "code",
935 "execution_count": 43,
936 "metadata": {
937 "collapsed": false
938 },
939 "outputs": [
940 {
941 "name": "stdout",
942 "output_type": "stream",
943 "text": [
944 ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., ?,,:--?!,.--,,.--,?--?.--,'?..'.,!..,,.'\n",
945 "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" .,:,-..--,.?--!.?--,.'..',....--!.,',:--\n",
946 "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- ...,,:--'!:.,'?,.--!.':?..__.,,!.._!_!..\n",
947 ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ..--!.'.--,.''.--,.--,,,,.'........--!.,\n",
948 ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' !,,.,,-.,,.,,,,,,,.....--,!..?--,..--,..\n",
949 "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. .,.'.',.'.--,.'.--',..'...--,,...'.!...-\n",
950 ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" -,,!,...?..--',.....,'.--,.!,:--.-.',.--\n",
951 ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" ',,?.....--!.''..,,..'..--.'.'???''.'.:,\n",
952 ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" .,!,!!,,'.'.'!'!.,,',........--,.'.--?..\n",
953 "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". '.?,..--?.--,?.'.'.,.,,:--'?:--??'..??--\n",
954 "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. ,,...--?.?.--,,_,'._'.--?.??.--,,'?..'.'\n",
955 "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" .'.?,'.'...'.''.!.'..,,:--.--?.--,..--,!\n",
956 ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" ..,..,,.:--,?--',.:--.?,,..,:--',.'..:_'\n",
957 "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ._.,..,.,..,,.,.':,.:...,:'.?:,,,..:_._,\n",
958 "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" :._._...,,.'.,,,,,,.,,....,.._:._!!,!.--\n",
959 "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, !'.,.,',.--,,...'.--',,.--,',...--.'.,?,\n",
960 ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, .--,.--?.??.--,.--,.'..,:_,',,!,!,'!_.,,\n",
961 ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., .?,?,,,.....',.:,.--',.,,?.,.--?.--,.,'!\n",
962 "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,:--!--',,.,,.,...,.--',,...,!!,!,,.,..,\n",
963 ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. ,.'?,,'...--?..--,.'.--,!..:--.--!,....,\n",
964 ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" '.,:--_._.--',.,,,,'?,,':--,..--,,.:--_,\n",
965 ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. ,_._,',_,_'._,.--',,,...,:--,,'?--,.--?.\n",
966 ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., ,?--,,.,,.'.--!,.?!,,:_--'.,..._..--,!--\n",
967 ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ,',.,.'.--',,..--?,.,!.--,,.--,?.--,.,..\n",
968 "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". .,.,.,,.,.,.,,,.,:.--,',,.--,,..--,,'.,,\n",
969 "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- '.--,?.--,',.--,..,,:.','','..--?.--,?.,\n",
970 ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. .--,.?--,,.,?--,.--',,.--,,''.''.--,..,.\n",
971 ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. ,'?--,,,,.:--?,,'?.--,?,.,'.',.,,.--,,.,\n"
972 ]
973 }
974 ],
975 "source": [
976 "compare(sherlock['punctuation'], ulysses['punctuation'], offset=100)"
977 ]
978 },
979 {
980 "cell_type": "code",
981 "execution_count": 44,
982 "metadata": {
983 "collapsed": false
984 },
985 "outputs": [
986 {
987 "name": "stdout",
988 "output_type": "stream",
989 "text": [
990 ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., .,\",\"!.\"\"?\"\"!,!__,.\"\".,,,,..\"\",.__,.-,.\"\n",
991 "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" \",.\"\",,..\"\",.\"\"..,,,,.,__.\"\"-,..;;.\"\".;,\n",
992 "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- -.__.\"\",\";\";.\"\".,__?..\"\",....\"\",.\"\",.\"\",\n",
993 ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ,.\"\",,,.\".,,,,--.__.,,.,.;....,;..,:\".,.\n",
994 ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' \"\"__.,\",\".\"\",,\",\",..\"\"...,,.\"\",\".;\".\".,,\n",
995 "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. ,.\"',,'!..\"\",\";\".\"\",\".\",?\"\"-.\"\",,\",\".;,.\n",
996 ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" \"\",,,.__.\"\",.,,;?\"\".'..__;,.;,,,,.\"..,\",\n",
997 ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" !\"\"?\".\",,?__.,?,,.\",.\",\",\"..\"\".,\".\"__;?.\n",
998 ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" ;,.\";.;,,.\",.!..,!,,.\"\",,,\".;,,,.\",!\",.\"\n",
999 "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". ;,,.,,;,.,,__,..\"\"!\",\";__,'.\".',..,,,,..\n",
1000 "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. --,,;,-,...,,,,,.!;.'.\",\".,\",.\"..',.,;.,\n",
1001 "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" ,.;.,..,,,,...;,.;..,,--.--.,,,..-;,,.,.\n",
1002 ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" --,.,;.,,,,,.,.,,;;,;,,..;,,,..!..,,,..,\n",
1003 "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ,..,.,,;,..,,.\",,\",\"...\"\".,..,.\"\",\".,\"!,\n",
1004 "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ;.\"\"__,\".,.\"!!,,..\"\"?\",,:\",__;.,.\"...;.,\n",
1005 "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, ,;,,.....,.,.'.;,.,,,,...;.';.\"!.,\",\",..\n",
1006 ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, ,.;.,!__,;!.,.!,,;,,;.,,.,,,,__--\"\"__,\",\n",
1007 ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., \"!',.!\"\"!,.!...'--\"...,,,..\",\",\"__;,,.!,\n",
1008 "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,!!,,-..\",,.,.\",\",\",-,;!--,!\"\",\",\",..\"\".\n",
1009 ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. .\"\"?..__,__.?..,,..\"\"!\"\"!,,....\"\";.\"\";__\n",
1010 ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" .__,!--.--',--.',,?.\"\"--..,;.\",;;,,.;,,.\n",
1011 ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. ,,,,,,.;'..,,..,;,,,.;,,--.,,..,.,----,,\n",
1012 ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., .,.,,,,.',,.,.,.,,,,-,..,..;;,;;,,.,,,,.\n",
1013 ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ,..--,,.,..,,..,;,,,,,,,.,,;,.,,,.'.,...\n",
1014 "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". ,,,-,'.;.\"__,,\".-.\"__.'.\"\";.\"\"!,,.__--__\n",
1015 "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- ------..\"\".;?.',,__?:'!,;.'\"\"!,----,,,.\"\n",
1016 ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. \"____,,\".\".,?--!--__.\"\"'-,,..--.\"\",'?--?\n",
1017 ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. \".\"..\"\"--,;.\"\",\",\",.__.\"\",.,..;,.,.\"\".,\"\n"
1018 ]
1019 }
1020 ],
1021 "source": [
1022 "compare(sherlock['punctuation'], pap['punctuation'], offset=100)"
1023 ]
1024 },
1025 {
1026 "cell_type": "markdown",
1027 "metadata": {},
1028 "source": [
1029 "### Compare more than two texts at a time"
1030 ]
1031 },
1032 {
1033 "cell_type": "code",
1034 "execution_count": 29,
1035 "metadata": {
1036 "collapsed": true
1037 },
1038 "outputs": [],
1039 "source": [
1040 "def compare_many(*texts, offset=0, line_len=100, gap=' ', max_lines=30):\n",
1041 "    def padded_segment(text, start, length):\n",
1042 "        segment = text[start:start+length]\n",
1043 "        segment += ' ' * (length - len(segment))  # pad so the columns stay aligned\n",
1044 "        return segment\n",
1045 "    segment_len = line_len // len(texts) - len(gap)  # width of each text's column\n",
1046 "    max_len = min(max(len(text) for text in texts), segment_len * max_lines)\n",
1047 "    for i in range(offset, max_len, segment_len):\n",
1048 "        segments = [padded_segment(text, i, segment_len) for text in texts]\n",
1049 "        print(gap.join(segments))"
1050 ]
1051 },
1052 {
1053 "cell_type": "code",
1054 "execution_count": 45,
1055 "metadata": {
1056 "collapsed": false
1057 },
1058 "outputs": [
1059 {
1060 "name": "stdout",
1061 "output_type": "stream",
1062 "text": [
1063 "..-.......'.........,,,., ,.,-..:::,[#]:,:,]::***** ,.,-..::::,[#]:,:******,/\n",
1064 ",,.,.-'..,-,.,,...,-,,,,, *,,.,,.\".,\",\"?\"..\",\";\".,. :::::-:-:-:-:::::::-:-:\",\n",
1065 ",,,.,,,,.:,,.,,,.-,-(,.-, \"..\"?\".\"__,.\".\",,,.;,,.;, ,.,',----,','!?--.\",,-,.,\n",
1066 ",,,.,,,,.,,.,,..-...;,,., .\"\"?\"\".\"\"?\"\"!,,!;.!\"\"??\"\" ,..,,;.,.,,-,:\",(),,--.\"\"\n",
1067 ",,,..\"\".\",,\"\"\".\",.,,.,.\"\" .,\",\"!.\"\"?\"\"!,!__,.\"\".,,, !!\",.,,,,.,,.,,,,,.\",,.',\n",
1068 ",\"\",.,\"\"\",\".,.,'.,,,,,\",. ,..\"\",.__,.-,.\"\",.\"\",,..\" \",.\"??\".\",?\"\"'?.,\".\".\"\"'.\n",
1069 "\"\";\",,..,,,-.,,,-,,,\".\"\", \",.\"\"..,,,,.,__.\"\"-,..;;. .\"\",,\",,-,.\"'!,'?.\"\"?\",.\"\n",
1070 "\",.\"\"\",,.\",..,\"\"\"\"\"\",\"\"\"\" \"\".;,-.__.\"\",\";\";.\"\".,__? ?,.\",.,,.,,.,,,,,,,,.:\",'\n",
1071 "?'\"\"!...,,.--,,,\",--.\"\".\" ..\"\",....\"\",.\"\",.\"\",,.\"\", .',,,.!..!,.,!....,,?...'\n",
1072 "\",.\"-,'\",\"...,\"\"\".\"\"\"..,. ,,.\".,,,,--.__.,,.,.;.... ..,,.?.-,.?!!,....',...!\"\n",
1073 ".\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\" ,;..,:\".,.\"\"__.,\",\".\"\",,\" ,.\",\",\"'..?\"\".,\",,\",,,.,.\n",
1074 "\"\"\"\"\".\"\".\",;,\"\".''''''''' ,\",..\"\"...,,.\"\",\".;\".\".,, .?.?\"\",\".\",\",,\"?.\",..\",\",\n",
1075 "''',''''\".\",-,.--,.',--', ,.\"',,'!..\"\",\";\".\"\",\".\",? .,',.',..,,(),:\".?.\".\",\",\n",
1076 ",,\",.\"\".\"..-''..,,.,,\"',. \"\"-.\"\",,\",\".;,.\"\",,,.__.\" --\".?',.',\".\".,'.\".\"',\".\"\n",
1077 ".\",\".\"\",.\"..',,\"\",\"\",.... \",.,,;?\"\".'..__;,.;,,,,.\" .\"\"';.?\"(),\"'....\",,..\"?\"\n",
1078 "\"\"-\"\".,..,,\",,..\"\".,.,,.- ..,\",!\"\"?\".\",,?__.,?,,.\", .\",.,..\",.\"?,\",.\"...'!\",.\n",
1079 ",-.,,.-,,,,,.,,,,.\"\".\"\",. .\",\",\"..\"\".,\".\"__;?.;,.\"; .\"?\".\",',.,.\",,,.\",\",,\"?,\n",
1080 "\"\".\",.,.\"\",.,,,.,\",.\",\".\" .;,,.\",.!..,!,,.\"\",,,\".;, \",\",?\":\"'....?\"\"..-,'.',.\n",
1081 "\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\", ,,.\",!\",.\";,,.,,;,.,,__,. .;,.--'.\"\",,\",'.\"-,.'.\",'\n",
1082 ".,,\"\"\",.,..\"\",\"\".,,.\"\";\". .\"\"!\",\";__,'.\".',..,,,,.. ,,,.\",\",,\"',',,.''.\"'.:.'\n",
1083 "\"\",\".\",-,\"\"\",,\"..\"\",\",.\": --,,;,-,...,,,,,.!;.'.\",\" ,,';.,,*.,,.',,,..*.,\",\"\"\n",
1084 ",,-,.\"\",\".,.--.\"\".\"!.-!,! .,\",.\"..',.,;.,,.;.,..,,, ?\",,;,'.,,;.,,,\",,.\",,.-.\n",
1085 "-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\". ,...;,.;..,,--.--.,,,..-; ,,,,.,-----.,,,.,,,,.,,.,\n",
1086 ",\"\"\"\",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\", ,,.,.--,.,;.,,,,,.,.,,;;, ,,,,.\",\",.\",,,\",.\",.\"-,-,\n",
1087 "!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\" ;,,..;,,,..!..,,,..,,..,. ,.\",,,\".\",\",,\"?.?\",,,.\"!\"\n",
1088 "....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\" ,,;,..,,.\",,\",\"...\"\".,.., .,-,,-,,.,-'.,,..-,,,.,,,\n",
1089 "\"\"\"\"-,...\"\"\"\"...,.,.,-\"\"\" .\"\",\".,\"!,;.\"\"__,\".,.\"!!, .\",,,\",.,.,.','.:\"?.\"\",,.\n",
1090 "\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ,..\"\"?\",,:\",__;.,.\"...;., \"\"?\"..,.,,'.\",\".,,.,,,,,,\n",
1091 "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,. ,;,,.....,.,.'.;,.,,,,... -,,,..,.,'.,,,.-..,.'..,,\n",
1092 "'\".\"\".\"\"\"\"\",-,,.-,\",.\"-'\" ;.';.\"!.,\",\",..,.;.,!__,; ,.,,.,,,',,,..-,..',,,.'.\n"
1093 ]
1094 }
1095 ],
1096 "source": [
1097 "compare_many(sherlock['punctuation'], pap['punctuation'], wap['punctuation'], line_len=80)"
1098 ]
1099 },
1100 {
1101 "cell_type": "code",
1102 "execution_count": 46,
1103 "metadata": {
1104 "collapsed": false
1105 },
1106 "outputs": [
1107 {
1108 "name": "stdout",
1109 "output_type": "stream",
1110 "text": [
1111 ",,,..\"\".\",,\"\"\".\",.,,., <> .,\",\"!.\"\"?\"\"!,!__,.\"\". <> !!\",.,,,,.,,.,,,,,.\",,\n",
1112 ".\"\",\"\",.,\"\"\",\".,.,'.,, <> ,,,,..\"\",.__,.-,.\"\",.\" <> .',\",.\"??\".\",?\"\"'?.,\".\n",
1113 ",,,\",.\"\";\",,..,,,-.,,, <> \",,..\"\",.\"\"..,,,,.,__. <> \".\"\"'..\"\",,\",,-,.\"'!,'\n",
1114 "-,,,\".\"\",\",.\"\"\",,.\",.. <> \"\"-,..;;.\"\".;,-.__.\"\", <> ?.\"\"?\",.\"?,.\",.,,.,,.,\n",
1115 ",\"\"\"\"\"\",\"\"\"\"?'\"\"!...,, <> \";\";.\"\".,__?..\"\",....\" <> ,,,,,,,.:\",'.',,,.!..!\n",
1116 ".--,,,\",--.\"\".\"\",.\"-,' <> \",.\"\",.\"\",,.\"\",,,.\".,, <> ,.,!....,,?...'..,,.?.\n",
1117 "\",\"...,\"\"\".\"\"\"..,..\",. <> ,,--.__.,,.,.;....,;.. <> -,.?!!,....',...!\",.\",\n",
1118 "\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\" <> ,:\".,.\"\"__.,\",\".\"\",,\", <> \",\"'..?\"\".,\",,\",,,.,..\n",
1119 "\"\"\"\".\"\".\",;,\"\".''''''' <> \",..\"\"...,,.\"\",\".;\".\". <> ?.?\"\",\".\",\",,\"?.\",..\",\n",
1120 "''''',''''\".\",-,.--,.' <> ,,,.\"',,'!..\"\",\";\".\"\", <> \",.,',.',..,,(),:\".?.\"\n",
1121 ",--',,,\",.\"\".\"..-''.., <> \".\",?\"\"-.\"\",,\",\".;,.\"\" <> .\",\",--\".?',.',\".\".,'.\n",
1122 ",.,,\"',..\",\".\"\",.\"..', <> ,,,.__.\"\",.,,;?\"\".'.._ <> \".\"',\".\".\"\"';.?\"(),\"'.\n",
1123 ",\"\",\"\",....\"\"-\"\".,..,, <> _;,.;,,,,.\"..,\",!\"\"?\". <> ...\",,..\"?\".\",.,..\",.\"\n",
1124 "\",,..\"\".,.,,.-,-.,,.-, <> \",,?__.,?,,.\",.\",\",\".. <> ?,\",.\"...'!\",..\"?\".\",'\n",
1125 ",,,,.,,,,.\"\".\"\",.\"\".\", <> \"\".,\".\"__;?.;,.\";.;,,. <> ,.,.\",,,.\",\",,\"?,\",\",?\n",
1126 ".,.\"\",.,,,.,\",.\",\".\"\". <> \",.!..,!,,.\"\",,,\".;,,, <> \":\"'....?\"\"..-,'.',..;\n",
1127 "\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\" <> .\",!\",.\";,,.,,;,.,,__, <> ,.--'.\"\",,\",'.\"-,.'.\",\n",
1128 ",.,,\"\"\",.,..\"\",\"\".,,.\" <> ..\"\"!\",\";__,'.\".',..,, <> ',,,.\",\",,\"',',,.''.\"'\n",
1129 "\";\".\"\",\".\",-,\"\"\",,\"..\" <> ,,..--,,;,-,...,,,,,.! <> .:.',,';.,,*.,,.',,,..\n",
1130 "\",\",.\":,,-,.\"\",\".,.--. <> ;.'.\",\".,\",.\"..',.,;., <> *.,\",\"\"?\",,;,'.,,;.,,,\n",
1131 "\"\".\"!.-!,!-!-!-!,,,,\"\" <> ,.;.,..,,,,...;,.;..,, <> \",,.\",,.-.,,,,.,-----.\n",
1132 ".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\",!\"\" <> --.--.,,,..-;,,.,.--,. <> ,,,.,,,,.,,.,,,,,.\",\",\n",
1133 "-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\" <> ,;.,,,,,.,.,,;;,;,,..; <> .\",,,\",.\",.\"-,-,,.\",,,\n",
1134 "\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\"... <> ,,,..!..,,,..,,..,.,,; <> \".\",\",,\"?.?\",,,.\"!\".,-\n",
1135 ".\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\" <> ,..,,.\",,\",\"...\"\".,.., <> ,,-,,.,-'.,,..-,,,.,,,\n",
1136 "\"\"\"\"-,...\"\"\"\"...,.,.,- <> .\"\",\".,\"!,;.\"\"__,\".,.\" <> .\",,,\",.,.,.','.:\"?.\"\"\n"
1137 ]
1138 }
1139 ],
1140 "source": [
1141 "compare_many(sherlock['punctuation'], pap['punctuation'], wap['punctuation'], \n",
1142 "             gap=' <> ', offset=100, line_len=80)"
1143 ]
1144 },
1145 {
1146 "cell_type": "markdown",
1147 "metadata": {
1148 "collapsed": true
1149 },
1150 "source": [
1151 "## Making images\n",
1152 "The text versions are fine, but let's turn the punctuation into images, with a coloured square for each punctuation character."
1153 ]
1154 },
1155 {
1156 "cell_type": "markdown",
1157 "metadata": {},
1158 "source": [
1159 "Start by just trying to get some output."
1160 ]
1161 },
1162 {
1163 "cell_type": "code",
1164 "execution_count": 32,
1165 "metadata": {
1166 "collapsed": false
1167 },
1168 "outputs": [],
1169 "source": [
1170 "# Periods and question marks and exclamation marks are red. \n",
1171 "# Commas and quotation marks are green. \n",
1172 "# Semicolons and colons are blue. \n",
1173 "colours = {'.': (255, 0, 0), '?': (255, 0, 0), '!': (255, 0, 0),\n",
1174 "           ',': (0, 255, 0), '\"': (0, 255, 0), \"'\": (0, 255, 0),\n",
1175 "           ':': (0, 0, 255), ';': (0, 0, 255),\n",
1176 "           'unknown': (128, 128, 128)}\n",
1177 "max_x = 1000\n",
1178 "max_y = 400\n",
1179 "block_size = 4\n",
1180 "text = sherlock['punctuation']\n",
1181 "img = Image.new('RGBA', (max_x, max_y))\n",
1182 "draw = ImageDraw.Draw(img)\n",
1183 "x = 0\n",
1184 "y = 0\n",
1185 "i = 0\n",
1186 "# for i in range(100):\n",
1187 "#     if text[i] in colours:\n",
1188 "#         this_colour = colours[text[i]]\n",
1189 "for p in text:\n",
1190 "    if p in colours:\n",
1191 "        this_colour = colours[p]\n",
1192 "    else:\n",
1193 "        this_colour = colours['unknown']\n",
1194 "    draw.rectangle((x, y, x+block_size, y+block_size), fill=this_colour)\n",
1195 "    x += block_size\n",
1196 "    if x >= max_x:\n",
1197 "        x = 0\n",
1198 "        y += block_size\n",
1199 "img.save('test.png')"
1200 ]
1201 },
1202 {
1203 "cell_type": "markdown",
1204 "metadata": {},
1205 "source": [
1206 "The image: \n",
1207 "![Punctuation of the Sherlock Holmes stories as coloured blocks](test.png)"
1208 ]
1209 },
1210 {
1211 "cell_type": "markdown",
1212 "metadata": {},
1213 "source": [
1214 "Rearrange the colours to match the \"heatmaps\" in [the original](https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4#.qwj8e1n8m), and wrap the whole thing in a function."
1215 ]
1216 },
1217 {
1218 "cell_type": "code",
1219 "execution_count": 33,
1220 "metadata": {
1221 "collapsed": false
1222 },
1223 "outputs": [],
1224 "source": [
1225 "# Periods and question marks and exclamation marks are red. \n",
1226 "# Commas and quotation marks are blue. \n",
1227 "# Semicolons and colons are green. \n",
1228 "def make_image(text, block_size=4, width=1000, colours=None):\n",
1229 "    default_colours = {'.': (255, 0, 0), '?': (255, 0, 0), '!': (255, 0, 0),\n",
1230 "                       ',': (0, 0, 255), '\"': (0, 0, 255), \"'\": (0, 0, 255),\n",
1231 "                       ':': (0, 255, 0), ';': (0, 255, 0),\n",
1232 "                       'unknown': (128, 128, 128)}\n",
1233 "    if not colours:\n",
1234 "        colours = {}\n",
1235 "    use_colours = default_colours.copy()\n",
1236 "    use_colours.update(colours)\n",
1237 "    height = ceil((len(text) * block_size) / width) * block_size  # rows of blocks -> pixels\n",
1238 "    img = Image.new('RGBA', (width, height))\n",
1239 "    draw = ImageDraw.Draw(img)\n",
1240 "    x = 0\n",
1241 "    y = 0\n",
1242 "    for p in text:\n",
1243 "        if p in use_colours:\n",
1244 "            this_colour = use_colours[p]\n",
1245 "        else:\n",
1246 "            this_colour = use_colours['unknown']\n",
1247 "        draw.rectangle((x, y, x+block_size, y+block_size), fill=this_colour)\n",
1248 "        x += block_size\n",
1249 "        if x >= width:\n",
1250 "            x = 0\n",
1251 "            y += block_size\n",
1252 "    return img"
1253 ]
1254 },
1255 {
1256 "cell_type": "code",
1257 "execution_count": 34,
1258 "metadata": {
1259 "collapsed": false
1260 },
1261 "outputs": [],
1262 "source": [
1263 "i = make_image(sherlock['punctuation'])\n",
1264 "i.save('sherlock.png')"
1265 ]
1266 },
1267 {
1268 "cell_type": "code",
1269 "execution_count": 35,
1270 "metadata": {
1271 "collapsed": true
1272 },
1273 "outputs": [],
1274 "source": [
1275 "i = make_image(wap['punctuation'], block_size=6, colours={'-': (255,255,255)})\n",
1276 "i.save('wap.png')"
1277 ]
1278 },
1279 {
1280 "cell_type": "code",
1281 "execution_count": 36,
1282 "metadata": {
1283 "collapsed": false
1284 },
1285 "outputs": [],
1286 "source": [
1287 "i = make_image(wap['punctuation'], colours={'-': (255,255,255), '(': (255, 165, 0), ')': (255, 165, 0)})\n",
1288 "i.save('wap.png')"
1289 ]
1290 },
1291 {
1292 "cell_type": "code",
1293 "execution_count": 37,
1294 "metadata": {
1295 "collapsed": true
1296 },
1297 "outputs": [],
1298 "source": [
1299 "i = make_image(shakespeare['punctuation'])\n",
1300 "i.save('shakespeare.png')"
1301 ]
1302 },
1303 {
1304 "cell_type": "code",
1305 "execution_count": 38,
1306 "metadata": {
1307 "collapsed": true
1308 },
1309 "outputs": [],
1310 "source": [
1311 "i = make_image(ulysses['punctuation'], colours={'-': (255,255,255), '(': (255, 165, 0), ')': (255, 165, 0)})\n",
1312 "i.save('ulysses.png')"
1313 ]
1314 },
1315 {
1316 "cell_type": "code",
1317 "execution_count": 39,
1318 "metadata": {
1319 "collapsed": true
1320 },
1321 "outputs": [],
1322 "source": [
1323 "i = make_image(pap['punctuation'])\n",
1324 "i.save('pap.png')"
1325 ]
1326 },
1327 {
1328 "cell_type": "markdown",
1329 "metadata": {
1330 "collapsed": true
1331 },
1332 "source": [
1333 "Sherlock: \n",
1334 "![Punctuation in the complete Sherlock Holmes](sherlock.png)\n",
1335 "\n",
1336 "War and Peace:\n",
1337 "![Punctuation in War and Peace](wap.png)\n",
1338 "\n",
1339 "Shakespeare:\n",
1340 "![Punctuation in the complete works of Shakespeare](shakespeare.png)\n",
1341 "\n",
1342 "Ulysses:\n",
1343 "![Punctuation in Ulysses](ulysses.png)\n",
1344 "\n",
1345 "Pride and Prejudice:\n",
1346 "![Punctuation in Pride and Prejudice](pap.png)"
1347 ]
1348 },
1349 {
1350 "cell_type": "code",
1351 "execution_count": null,
1352 "metadata": {
1353 "collapsed": true
1354 },
1355 "outputs": [],
1356 "source": []
1357 }
1358 ],
1359 "metadata": {
1360 "kernelspec": {
1361 "display_name": "Python 3",
1362 "language": "python",
1363 "name": "python3"
1364 },
1365 "language_info": {
1366 "codemirror_mode": {
1367 "name": "ipython",
1368 "version": 3
1369 },
1370 "file_extension": ".py",
1371 "mimetype": "text/x-python",
1372 "name": "python",
1373 "nbconvert_exporter": "python",
1374 "pygments_lexer": "ipython3",
1375 "version": "3.4.3+"
1376 }
1377 },
1378 "nbformat": 4,
1379 "nbformat_minor": 0
1380 }