{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Punctuation in novels\n", "Inspired by [Punctuation in novels](https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4#.qwj8e1n8m).\n", "\n", "Texts used here are [The complete works of Sherlock Holmes](sherlock-holmes.txt), [War and Peace](war-and-peace.txt), [The complete works of Shakespeare](shakespeare.txt), [Ulysses](ulysses.txt), and [Pride and Prejudice](pride-and-prejudice.txt)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import string\n", "import collections\n", "from PIL import Image, ImageDraw\n", "from math import ceil\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `string` module has some nice subsets of characters. Does it know about punctuation?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "string.punctuation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting the punctuation\n", "First, let's open a text file and pull out just the punctuation. We can also count how many times each punctuation character appears in it."
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sherlock = open('sherlock-holmes.txt').read()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "..-.......'.........,,,.,,,.,.-'..,-,.,,...,-,,,,,,,,.,,,,.:,,.,,,.-,-(,.-,,,,.,,,,.,,.,,..-...;,,.,,,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,.,'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\"\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",--.\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".'''''''''''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''..,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\".\n" ] } ], "source": [ "sherlock_punct = [c for c in sherlock if c in string.punctuation]\n", "print(''.join(sherlock_punct)[:500])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 171,\n", " '\"': 4834,\n", " '&': 5,\n", " \"'\": 1490,\n", " '(': 5,\n", " ',': 7053,\n", " '-': 965,\n", " '.': 4843,\n", " '/': 1,\n", " ':': 56,\n", " ';': 202,\n", " '?': 138})" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sherlock_counts = collections.Counter(sherlock_punct)\n", "sherlock_counts" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "! 171\n", "\" 4834\n", "& 5\n", "' 1490\n", "( 5\n", ", 7053\n", "- 965\n", ". 4843\n", "/ 1\n", ": 56\n", "; 202\n", "? 
138\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sherlock_ps = pd.Series(sherlock_counts)\n", "sherlock_ps.sort_index(inplace=True)\n", "sherlock_ps" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD9CAYAAABZVQdHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFPhJREFUeJzt3W2MXNdh3vH/I7FKKJkNwbpdkZRSC/Ey0rZ2I7MVnSaF\nxk3KbvoiEgVCUkgFoqaNCkwjt2iDLo22XH9xJfeVRkEVrZ1omVoM2DgR5JqmuWI4TeEg2tqRbcY0\nTbLIqt51dpXaceQ0aUNGTz/MoTjaDndnlzOj3cPnBwxw7rnn3nMuZ/js2XN3ZmSbiIio021v9gAi\nIqJ/EvIRERVLyEdEVCwhHxFRsYR8RETFEvIRERVbMuQlHZL0FUnnJD0r6bskbZI0KemipNOSNi5o\nf0nSBUk72+q3l3NcknSkXxcUERHXLRrykt4GvB94l+13ALcD+4AxYNL2NuBM2UbSCLAXGAFGgaOS\nVE73NHDA9jAwLGm051cTERFvsNRM/lXgCnCnpHXAncA3gEeAidJmAthdyruA47av2J4GLgM7JG0G\nNtieKu2OtR0TERF9smjI2/4W8K+A/0kr3L9texIYsj1fms0DQ6W8BZhpO8UMsLVD/Wypj4iIPlpq\nueb7gL8PvI1WUL9F0t9ub+PW5yLksxEiIlahdUvs//PAr9r+JoCkXwR+EJiTdLftubIU80ppPwvc\n23b8PbRm8LOl3F4/26lDSfmBERGxTLbVqX6pNfkLwLslrS83UH8UOA98Cthf2uwHnivl54F9ku6Q\ndB8wDEzZngNelbSjnOextmM6DXZZj8OHDy/7mJt5pL/0l/5ujf7WyrUtZtGZvO0vSToGfB54Dfh1\n4D8AG4ATkg4A08Ce0v68pBPlB8FV4KCvj+Ag8AywHjhp+9SiI4uIiJu21HINtj8CfGRB9bdozeo7\ntf8w8OEO9V8A3rGCMUZExApV8Y7XRqOR/tJf+kt/a7qvfvWnpdZzBk2SV9uYIiJWM0l4hTdeIyJi\nDUvIR0RULCEfEVGxhHxERMUS8hERFUvIR0RULCEfEVGxhHxERMUS8hERFUvIR0RULCEfEVGxhHxE\nRMUS8hERFUvIR0RULCEfEVGxhHxERMUS8hERFVsy5CV9v6SX2h6/K+kJSZskTUq6KOm0pI1txxyS\ndEnSBUk72+q3SzpX9h3p10VFRETLkiFv+2u2H7T9ILAd+H3gl4AxYNL2NuBM2UbSCLAXGAFGgaOS\nrn0t1dPAAdvDwLCk0V5fUEREXLfc5ZofBS7b/jrwCDBR6ieA3aW8Czhu+4rtaeAysEPSZmCD7anS\n7ljbMRER0Qfrltl+H3C8lIdsz5fyPDBUyluAX2s7ZgbYClwp5WtmS31Ez13/5XH58kXyUZOuZ/KS\n7gD+JvCfF+5z639F/mfEKuMVPCLqspyZ/I8BX7D922V7XtLdtufKUswrpX4WuLftuHtozeBnS7m9\nfrZTR+Pj46+XG40GjUZjGcOMiKhbs9mk2Wx21Vbd/moq6eeBz9ieKNsfAb5p+ylJ
Y8BG22Plxuuz\nwEO0lmNeAN5u25JeBJ4ApoBPAx+1fWpBP86vy3GzWss1K3kdKcs1seZIwnbHNcquQl7SXcDLwH22\nv1PqNgEngO8FpoE9tr9d9n0QeC9wFfiA7c+W+u3AM8B64KTtJzr0lZCPm5aQj1vJTYf8ICXkoxcS\n8nErWSzk847XiIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKJeQj\nIiqWkI+IqFhCPiKiYgn5iIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKJeQjIiqWkI+IqFhXIS9po6Rf\nkPRVSecl7ZC0SdKkpIuSTkva2Nb+kKRLki5I2tlWv13SubLvSD8uKCIirut2Jn8EOGn7AeCdwAVg\nDJi0vQ04U7aRNALsBUaAUeCoWt+qDPA0cMD2MDAsabRnVxIREf+fJUNe0vcAf8n2zwDYvmr7d4FH\ngInSbALYXcq7gOO2r9ieBi4DOyRtBjbYnirtjrUdExERfdDNTP4+4Lcl/aykX5f0HyXdBQzZni9t\n5oGhUt4CzLQdPwNs7VA/W+ojIqJPugn5dcC7gKO23wX8b8rSzDW2Dbj3w4uIiJuxros2M8CM7f9e\ntn8BOATMSbrb9lxZinml7J8F7m07/p5yjtlSbq+f7dTh+Pj46+VGo0Gj0ehimBERt4Zms0mz2eyq\nrVqT8CUaSb8CvM/2RUnjwJ1l1zdtPyVpDNhoe6zceH0WeIjWcswLwNttW9KLwBPAFPBp4KO2Ty3o\ny92MKWIxrXv9K3kdibz+Yq2RhG112tfNTB7gp4BPSLoD+B/A3wFuB05IOgBMA3sAbJ+XdAI4D1wF\nDral9kHgGWA9rb/WeUPAR0REb3U1kx+kzOSjFzKTj1vJYjP5vOM1IqJiCfmIiIol5CMiKpaQj4io\nWEI+IqJiCfmIiIol5CMiKpaQj4ioWEI+IqJiCfmIiIol5CMiKpaQj4ioWEI+IqJiCfmIiIol5CMi\nKpaQj4ioWEI+IqJiCfmIiIol5CMiKtZVyEualvRlSS9Jmip1myRNSroo6bSkjW3tD0m6JOmCpJ1t\n9dslnSv7jvT+ciIiol23M3kDDdsP2n6o1I0Bk7a3AWfKNpJGgL3ACDAKHFXrW5UBngYO2B4GhiWN\n9ug6IiKig+Us1yz8JvBHgIlSngB2l/Iu4LjtK7angcvADkmbgQ22p0q7Y23HREREHyxnJv+CpM9L\nen+pG7I9X8rzwFApbwFm2o6dAbZ2qJ8t9RER0Sfrumz3Q7Z/S9KfBCYlXWjfaduS3KtBjY+Pv15u\nNBo0Go1enToiYs1rNps0m82u2speXjZLOgz8HvB+Wuv0c2Up5qzt+yWNAdh+srQ/BRwGXi5tHij1\njwIP2358wfm93DFFLNS6DbSS15HI6y/WGknYXrikDnSxXCPpTkkbSvkuYCdwDnge2F+a7QeeK+Xn\ngX2S7pB0HzAMTNmeA16VtKPciH2s7ZiIiOiDbpZrhoBfKn8gsw74hO3Tkj4PnJB0AJgG9gDYPi/p\nBHAeuAocbJuaHwSeAdYDJ22f6uG1RETEAsterum3LNdEL2S5Jm4liy3XdHvj9ZZz/U/7ly8hEf2W\n12d0KyG/qJXNBCMGI6/PWFo+uyYiomIJ+YiIiiXkIyIqlpCPiKhYQj4iomIJ+YiIiiXkIyIqlpCP\niKhYQj4iomIJ+YiIiiXkIyIqlpCPiKhYQj4iomIJ+YiIiiXkIyIqlpCPiKhYQj4iomJdhbyk2yW9\nJOlTZXuTpElJFyWdlrSxre0hSZckXZC0s61+u6RzZd+R3l9KREQs1O1M/gPAea5/39gYMGl7G3Cm\nbCNpBNgLjACjwFFd/zLKp4EDtoeBYUmjvbmEiIi4kSVDXtI9wF8DPsb1L4h8BJgo5QlgdynvAo7b\nvmJ7GrgM7JC0Gdhge6q0O9Z2TERE9Ek3M/l/
A/w08Fpb3ZDt+VKeB4ZKeQsw09ZuBtjaoX621EdE\nRB+tW2ynpL8BvGL7JUmNTm1sW9JKvjb+hsbHx18vNxoNGo2OXUdE3JKazSbNZrOrtrJvnM+SPgw8\nBlwFvhv448AvAn8BaNieK0sxZ23fL2kMwPaT5fhTwGHg5dLmgVL/KPCw7cc79OnFxjQorVsJKxmH\nWA3jv9XV/vzVfn2xPJKwrU77Fl2usf1B2/favg/YB/yy7ceA54H9pdl+4LlSfh7YJ+kOSfcBw8CU\n7TngVUk7yo3Yx9qOiYiIPll0uaaDa1OAJ4ETkg4A08AeANvnJZ2g9Zc4V4GDbdPyg8AzwHrgpO1T\nNzf0iIhYyqLLNW+GLNdEL9T+/NV+fbE8K16uiYiItS0hHxFRsYR8RETFEvIRERVLyEdEVCwhHxFR\nsYR8RETFEvIRERVLyEdEVCwhHxFRsYR8RETFEvIRERVLyEdEVCwhHxFRsYR8RETFEvIRERVLyEdE\nVCwhHxFRsYR8RETFFg15Sd8t6UVJX5R0XtI/L/WbJE1KuijptKSNbcccknRJ0gVJO9vqt0s6V/Yd\n6d8lRUTENYuGvO3/A7zH9g8A7wTeI+mHgTFg0vY24EzZRtIIsBcYAUaBo2p94zDA08AB28PAsKTR\nflxQRERct+Ryje3fL8U7gNuB3wEeASZK/QSwu5R3AcdtX7E9DVwGdkjaDGywPVXaHWs7JiIi+mTJ\nkJd0m6QvAvPAWdtfAYZsz5cm88BQKW8BZtoOnwG2dqifLfUREdFH65ZqYPs14AckfQ/wWUnvWbDf\nktzLQY2Pj79ebjQaNBqNXp4+ImJNazabNJvNrtrK7j6fJf1T4A+A9wEN23NlKeas7fsljQHYfrK0\nPwUcBl4ubR4o9Y8CD9t+vEMfXs6Y+qV1K2El4xCrYfy3utqfv9qvL5ZHErbVad9Sf13z1mt/OSNp\nPfBXgJeA54H9pdl+4LlSfh7YJ+kOSfcBw8CU7TngVUk7yo3Yx9qOiYiIPllquWYzMCHpNlo/EH7O\n9hlJLwEnJB0ApoE9ALbPSzoBnAeuAgfbpuUHgWeA9cBJ26d6fTEREfFGy1quGYQs10Qv1P781X59\nsTwrXq6JiIi1LSEfEVGxhHxERMUS8hERFUvIR0RULCEfEVGxhHxERMUS8hERFUvIR0RULCEfEVGx\nhHxERMUS8hERFUvIR0RULCEfEVGxhHxERMUS8hERFUvIR0RULCEfEVGxhHxERMWWDHlJ90o6K+kr\nkn5D0hOlfpOkSUkXJZ2WtLHtmEOSLkm6IGlnW/12SefKviP9uaSIiLimm5n8FeAf2P4zwLuBn5T0\nADAGTNreBpwp20gaAfYCI8AocFStbx0GeBo4YHsYGJY02tOriYiIN1gy5G3P2f5iKf8e8FVgK/AI\nMFGaTQC7S3kXcNz2FdvTwGVgh6TNwAbbU6XdsbZjIiKiD5a1Ji/pbcCDwIvAkO35smseGCrlLcBM\n22EztH4oLKyfLfUREdEn67ptKOktwCeBD9j+zvUVGLBtSe7VoMbHx18vNxoNGo1Gr04dEbHmNZtN\nms1mV21lL53Nkv4Y8F+Az9j+t6XuAtCwPVeWYs7avl/SGIDtJ0u7U8Bh4OXS5oFS/yjwsO3HF/Tl\nbsbUb60fYisZh1gN47/V1f781X59sTySsK1O+7r56xoBHwfOXwv44nlgfynvB55rq98n6Q5J9wHD\nwJTtOeBVSTvKOR9rOyYiIvpgyZm8pB8GfgX4MtenDoeAKeAE8L3ANLDH9rfLMR8E3gtcpbW889lS\nvx14BlgPnLT9RIf+MpOPm1b781f79cXyLDaT72q5ZpAS8tELtT9/tV9fLM9NLddERMTalZCPiKhY\nQj4iomIJ+YiIiiXkIyIqlpCPiKhYQj4iomIJ+YiIiiXkIyIq1vWnUEZd2j9FdLnyjsmItSMhf0tb\n2dviI2Lt
yHJNRETFEvIRERVLyEdEVCwhHxFRsYR8RETFEvIRERVLyEdEVKybL/L+GUnzks611W2S\nNCnpoqTTkja27Tsk6ZKkC5J2ttVvl3Su7DvS+0uJiIiFupnJ/ywwuqBuDJi0vQ04U7aRNALsBUbK\nMUd1/a2VTwMHbA8Dw5IWnjMiInpsyZC3/d+A31lQ/QgwUcoTwO5S3gUct33F9jRwGdghaTOwwfZU\naXes7ZiIiOiTla7JD9meL+V5YKiUtwAzbe1mgK0d6mdLfURE9NFN33h169Oq8olVERGr0Eo/oGxe\n0t2258pSzCulfha4t63dPbRm8LOl3F4/e6OTj4+Pv15uNBo0Go0VDjMioj7NZpNms9lVW3XzsbGS\n3gZ8yvY7yvZHgG/afkrSGLDR9li58fos8BCt5ZgXgLfbtqQXgSeAKeDTwEdtn+rQl1fDR9m27hev\n7FMaV8P4l5Lru+GRK7q+QX90c+3PXyyPJGx3fBEuOZOXdBx4GHirpK8D/wx4Ejgh6QAwDewBsH1e\n0gngPHAVONiW2AeBZ4D1wMlOAR+xtuWjm2P16WomP0iZyQ9Gru+GR66JmXXtz18sz2Iz+bzjNSKi\nYgn5iIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKJeQjIiqWkI+I\nqFhCPiKiYgn5iIiKJeQjIiqWkI+IqFhCPiKiYgn5iIiKLfkdrxERNRv0l7AP2sBn8pJGJV2QdEnS\nP17GcSt+REQszit4rA0DDXlJtwP/DhgFRoBHJT3Q/Rlu9I99dpF9/dDs03lv0FtzsP3l+tLfsnob\n8PM32P5639egJ6yDnsk/BFy2PW37CvDzwK6bP23z5k+xivurPQRrv77a+0vIr8SNJqWHF9m3MoNe\nk98KfL1tewbYMeAxRMQqt9Ss9UMf+tAN962FdfJBGvRMPv/6EdGlwc12a6ZB/tST9G5g3PZo2T4E\nvGb7qbY2eaYiIpbJdsdffwYd8uuArwE/AnwDmAIetf3VgQ0iIuIWMtA1edtXJf094LPA7cDHE/AR\nEf0z0Jl8REQM1pp6x6ukf7igyrb/ddn3mO2fexOG1TeSNtj+Tim/3fblN3tMN0OSvMSsops2PRjH\nZuBbtv9vP/spfd1te67f/ayWfvtl4fX06/o69NOX14qkPwE8DvwB8DHbr/by/O3W2mfXbADe0vbY\n0Lbvzn50KOlseXyyH+dfwuckPSdpL3C6Hx1I+s3yeLEf51+gKemnJW3rMI7vL++A/q8DGMd/Ar4m\n6V8OoK+TA+ijk4/3+oSS/qLevLeQL7yenl/fDc7br9fKJ4G7gHuAX5P0fT0+/+uyXLMESX+6FP/I\n9kyf+7oL+MPyRrFrdQdpvUt4n+0T/ey/3yR9F/ATwKPAnwW+A4jWD+zfAD4BPGv7DwcwltuAB2x/\npc/9vGT7wX72MSiS/j2t97VcBD4DnKrpt4Ub6cdrRdKXbb+zlP8q8DHg28A/At5n+8d71ldCfnGS\nfrMUX7Hd1zduldn0btu/Vbb/FvAvgJ8CftL2X+9n/4NUPuLirWXzf9n+ozdzPP0i6aDto2/2OHqp\nfBTJjwE7gY3ALwOngM/V+jz2mqTPAT9he7ps3wZsAb4FbLT9jZ71lZBfPSR9yfafK+W/C/wT4Eds\nX5T0Bdvb39wRRryRpDuB99AK/R/Ma7Q7ku6ndU/xa33vKyG/ekg6S+vDMu4F3gv8ZdtNSX8KeOHa\nr3cREd1aazdea/fjwGu01jz3AB+XNAH8KvDUYgdGRHSSmfwqJmkr8EPAlwbxa11E1CchHxFRsSzX\nRERULCEfEVGxhHxERMUS8hERFUvIR0RU7P8BnaAgGCuaansAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": 
"display_data" } ], "source": [ "sherlock_ps.plot(kind=\"bar\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we can read and process a novel, let's wrap that into a function and read some other novels." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def punct_summarise(fname):\n", "    # Read the whole file, closing it once we're done\n", "    with open(fname) as f:\n", "        content = f.read()\n", "    # Keep only the punctuation characters, in order of appearance\n", "    punct = ''.join(c for c in content if c in string.punctuation)\n", "    counts = collections.Counter(punct)\n", "    return {'punctuation': punct, 'counts': counts}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 171,\n", " '\"': 4834,\n", " '&': 5,\n", " \"'\": 1490,\n", " '(': 5,\n", " ',': 7053,\n", " '-': 965,\n", " '.': 4843,\n", " '/': 1,\n", " ':': 56,\n", " ';': 202,\n", " '?': 138})" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Complete Sherlock Holmes\n", "sherlock = punct_summarise('sherlock-holmes.txt')\n", "sherlock['counts']" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 3923,\n", " '\"': 17970,\n", " '#': 1,\n", " '$': 2,\n", " '%': 1,\n", " \"'\": 7529,\n", " '(': 670,\n", " ')': 670,\n", " '*': 300,\n", " ',': 39891,\n", " '-': 6308,\n", " '.': 30805,\n", " '/': 29,\n", " ':': 1014,\n", " ';': 1145,\n", " '=': 2,\n", " '?': 3137,\n", " '@': 2,\n", " '[': 1,\n", " ']': 1})" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# War and Peace\n", "wap = punct_summarise('war-and-peace.txt')\n", "wap['counts']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 10815,\n", " '\"': 6,\n", " '&': 10,\n", " \"'\": 27942,\n", " ',': 82750,\n", " '-': 4590,\n", " '.': 36881,\n", " ':': 10649,\n", " ';': 17400,\n", " '?': 
10327,\n", " '[': 19,\n", " ']': 18})" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Complete works of Shakespeare\n", "shakespeare = punct_summarise('shakespeare.txt')\n", "shakespeare['counts']" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 1576,\n", " '\"': 8,\n", " '%': 3,\n", " '&': 3,\n", " \"'\": 4485,\n", " '(': 1777,\n", " ')': 1788,\n", " '*': 90,\n", " '+': 2,\n", " ',': 16349,\n", " '-': 5037,\n", " '.': 21361,\n", " '/': 58,\n", " ':': 2564,\n", " ';': 34,\n", " '?': 2235,\n", " '_': 4566})" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ulysses\n", "ulysses = punct_summarise('ulysses.txt')\n", "ulysses['counts']" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Counter({'!': 500,\n", " '\"': 3553,\n", " '#': 1,\n", " '$': 2,\n", " '%': 1,\n", " \"'\": 748,\n", " '(': 38,\n", " ')': 38,\n", " '*': 58,\n", " ',': 9280,\n", " '-': 1193,\n", " '.': 6396,\n", " '/': 26,\n", " ':': 155,\n", " ';': 1538,\n", " '?': 462,\n", " '@': 2,\n", " '[': 1,\n", " ']': 2,\n", " '_': 808})" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pride and Prejudice\n", "pap = punct_summarise('pride-and-prejudice.txt')\n", "pap['counts']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Place all the counts into a pandas DataFrame, normalise them, and then plot them." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>pap</th><th>shakespeare</th><th>sherlock</th><th>ulysses</th><th>wap</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>!</th><td>500</td><td>10815</td><td>171</td><td>1576</td><td>3923</td></tr>\n",
 "    <tr><th>\"</th><td>3553</td><td>6</td><td>4834</td><td>8</td><td>17970</td></tr>\n",
 "    <tr><th>#</th><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
 "    <tr><th>$</th><td>2</td><td>0</td><td>0</td><td>0</td><td>2</td></tr>\n",
 "    <tr><th>%</th><td>1</td><td>0</td><td>0</td><td>3</td><td>1</td></tr>\n",
 "    <tr><th>&amp;</th><td>0</td><td>10</td><td>5</td><td>3</td><td>0</td></tr>\n",
 "    <tr><th>'</th><td>748</td><td>27942</td><td>1490</td><td>4485</td><td>7529</td></tr>\n",
 "    <tr><th>(</th><td>38</td><td>0</td><td>5</td><td>1777</td><td>670</td></tr>\n",
 "    <tr><th>)</th><td>38</td><td>0</td><td>0</td><td>1788</td><td>670</td></tr>\n",
 "    <tr><th>*</th><td>58</td><td>0</td><td>0</td><td>90</td><td>300</td></tr>\n",
 "    <tr><th>+</th><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td></tr>\n",
 "    <tr><th>,</th><td>9280</td><td>82750</td><td>7053</td><td>16349</td><td>39891</td></tr>\n",
 "    <tr><th>-</th><td>1193</td><td>4590</td><td>965</td><td>5037</td><td>6308</td></tr>\n",
 "    <tr><th>.</th><td>6396</td><td>36881</td><td>4843</td><td>21361</td><td>30805</td></tr>\n",
 "    <tr><th>/</th><td>26</td><td>0</td><td>1</td><td>58</td><td>29</td></tr>\n",
 "    <tr><th>:</th><td>155</td><td>10649</td><td>56</td><td>2564</td><td>1014</td></tr>\n",
 "    <tr><th>;</th><td>1538</td><td>17400</td><td>202</td><td>34</td><td>1145</td></tr>\n",
 "    <tr><th>=</th><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td></tr>\n",
 "    <tr><th>?</th><td>462</td><td>10327</td><td>138</td><td>2235</td><td>3137</td></tr>\n",
 "    <tr><th>@</th><td>2</td><td>0</td><td>0</td><td>0</td><td>2</td></tr>\n",
 "    <tr><th>[</th><td>1</td><td>19</td><td>0</td><td>0</td><td>1</td></tr>\n",
 "    <tr><th>]</th><td>2</td><td>18</td><td>0</td><td>0</td><td>1</td></tr>\n",
 "    <tr><th>_</th><td>808</td><td>0</td><td>0</td><td>4566</td><td>0</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " pap shakespeare sherlock ulysses wap\n", "! 500 10815 171 1576 3923\n", "\" 3553 6 4834 8 17970\n", "# 1 0 0 0 1\n", "$ 2 0 0 0 2\n", "% 1 0 0 3 1\n", "& 0 10 5 3 0\n", "' 748 27942 1490 4485 7529\n", "( 38 0 5 1777 670\n", ") 38 0 0 1788 670\n", "* 58 0 0 90 300\n", "+ 0 0 0 2 0\n", ", 9280 82750 7053 16349 39891\n", "- 1193 4590 965 5037 6308\n", ". 6396 36881 4843 21361 30805\n", "/ 26 0 1 58 29\n", ": 155 10649 56 2564 1014\n", "; 1538 17400 202 34 1145\n", "= 0 0 0 0 2\n", "? 462 10327 138 2235 3137\n", "@ 2 0 0 0 2\n", "[ 1 19 0 0 1\n", "] 2 18 0 0 1\n", "_ 808 0 0 4566 0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "punctuation = pd.DataFrame({'sherlock': sherlock['counts'],\n", " 'wap': wap['counts'],\n", " 'shakespeare': shakespeare['counts'],\n", " 'ulysses': ulysses['counts'],\n", " 'pap': pap['counts']})\n", "punctuation.fillna(value=0, inplace=True)\n", "punctuation" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAfoAAAEECAYAAADAjfYgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt8VNW5//HPQ7grCQjILWBA8QJ4iopUEWSwYtEiWBTl\nImK1VrQq1NZaPKLBeqSKClU5UlpFq9YLtSh4wdRCEIqioCLoEX6gAgKCaLgUkATy/P7YkzgZMsmQ\nC5kZvu/Xa15k9l5r7bUnQ569Lnttc3dEREQkNdWq6QqIiIhI9VGgFxERSWEK9CIiIilMgV5ERCSF\nKdCLiIikMAV6ERGRFKZALyIiksLiCvRmdr2ZfW5me8xsiZn1jDNfRzPbaWY7o7aHzKywlNfxFTkJ\nERERKV25gd7MLgMmA3cDXYFFwOtm1racfHWB54D5QKxVeToBLSNeq+OuuYiIiJQrnhb9zcB0d3/M\n3Ve6+03AJuC6cvLdC3wIzAAsRpqv3X1LxKsw7pqLiIhIucoM9OFW+alATtSuHKBHGfl+AvwEuJHY\nQR5giZltNLM3zSwUV41FREQkbuW16JsBacDmqO1bCLraD2BmrYFpwHB33x2j3I3AKGBQ+LUS+Fe8\nY/8iIiISn9rVUOZTwKPu/l6sBO6+ClgVsekdM8sCbgEWRqY1Mz11R0SkAty9rB5VOUyU16LfCuwH\nWkRtb0EwTl+aPsCdZlZgZgXAX4Ajwu9/Xsax3gU6lrbD3WO+7rzzzjL3V3W+ZDtmstVXx0zMvDpm\n8h1TpEiZLXp3zzezpcB5wIsRu/oSTLIrTZeo9xcB/w2cTtBlH0vXcvaLiIjIQYqn6/5B4Ckze5fg\n1rpRBOPzUwHMbAJwurufC+Dun0RmNrPuQGHkdjMbA3wOfALUBS4HBhKM14uIiEgVKTfQu/sLZtYU\nuB1oBSwHLnD39eEkLYEO5RUT9b4OMBHIBPYAK8JlzjmIugMQCoUONkul8iXbMSuTV8dMrWNWJq+O\nmVrHlMOLJfpYjpl5otdRRCTRmBmuyXiC1roXERFJaQr0IiIiKUyBXkREJIUp0IuIiKQwBXoREZEU\npkAvIiKSwhToRUREUpgCvYiISApToBcREUlhCvQiIiIpTIFeREQkhSnQi4iIpDAFehERkRSmQC8i\nIpLC4gr0Zna9mX1uZnvMbImZ9YwzX0cz22lmO0vZ19vMlobLXGNm1x5s5UVqiplhpieAikjiKzfQ\nm9llwGTgbqArsAh43czalpOvLvAcMB/wqH3tgdeAheEyJwAPm9mgCpyDiIiIxGDuXnYCs8XAh+5+\nbcS2VcDf3f22MvJNAtKBt4BH3L1RxL57gYvc/YSIbX8GOrt7j6hyvLw6ihxqRa15fTclUZkZ7q5u\nJym7RR9ulZ8K5ETtygF6HJijON9PgJ8ANwKlfdHOjFFmNzNLK6fOIiIiEqfyuu6bAWnA5qjtW4CW\npWUws9bANGC4u++OUW6LUsrcDNQOH1NERESqQO1qKPMp4FF3f6+qCszOzi7+ORQKEQqFqqpoEZGU\nkJubS25ubk1XQxJQmWP04a77XcAQd38xYvsUoJO79yklTyGwP3ITQc/BfuA6d/+Lmc0Hlrv7DRH5\nBgPPAA3cfX/Edo3RS8LRGL0kOo3RS5Eyu+7dPR9YCpwXtasvwez70nQBfhDxugPYE/757+E0b4fL\niC7zvcggLyIiIpUTT9f9g8BTZvYuQXAfRTA+PxXAzCYAp7v7uQDu/klkZjPrDhRGbZ8K3BCemT8N\nOAsYCQyp3OmIiIhIpHIDvbu/YGZNgduBVsBy4AJ3Xx9O0hLoUF4xUWV+YWYXAJOA64ANwI3uPvMg\n6y8iIiJlKPc++pqmMXpJRBqjl0SnMXoporXuRUREUpgCvYiISApToBcREUlhCvQiIiIpTIFeREQk\nhSnQi4iIpDAFehERkRRWHQ+1EUlZRffPi4gkC7XoRQ6aFskRk
eShQC8iIpLCFOhFRERSmAK9iIhI\nClOgFxERSWEK9CIiIiksrkBvZteb2edmtsfMlphZzzLSdjKzeWb2VTj9GjP7HzOrE5EmZGaFpbyO\nr4qTEhERkUC599Gb2WXAZOA6YCHwS+B1M+vk7utLybIXmA58AGwDugJ/BuoCt0Sl7QR8G/F+68Ge\ngIiIiMRm7mXfE2xmi4EP3f3aiG2rgL+7+21xHcTsQeAMd+8Rfh8C5gLN3f2bcvJ6eXUUOVSCBXMc\nKLlwjr6jkmjMDHfXCk9Sdte9mdUFTgVyonblAD3iOYCZHQf8uJQyAJaY2UYzezMc/EWSisK7iCS6\n8sbomwFpwOao7VuAlmVlNLNFZrYHWAUsdvfsiN0bgVHAoPBrJfCvssb+RRKZmWl5XBFJSNW51v2l\nwJEEY/QTzew+d/8tgLuvIrgAKPKOmWURjOEvjC4oOzu7+OdQKEQoFKquOouIJKXc3Fxyc3NruhqS\ngMocow933e8Chrj7ixHbpwCd3L1PXAcxGw48DjR09/0x0twJXObunaK2a4xeEkb0GH30aL2+q5Io\nNEYvRcrsunf3fGApcF7Urr7AooM4Tlr4WGUdrytBl76IiIhUkXi67h8EnjKzdwmC+yiC8fmpAGY2\nATjd3c8Nvx8B7AFWAPlAN+Ae4Hl3LwinGQN8DnxCcNvd5cBAgvF6ESlH0XwA9SCISHnKDfTu/oKZ\nNQVuB1oBy4ELIu6hbwl0iMhSAIwFOhL0aq4FHgEmRaSpA0wEMvn+ouACd59TqbMRERGREsq9j76m\naYxeEkmijNGrRS/l0Ri9FNFa9yIiIilMgV5ERCSFKdCLiIikMAV6ERGRFKZALyIiksIU6EVERFKY\nAr2IiEgKq86H2oiISIIxMy2+kMJKWztBgV5E5DCjhZZSU6xHZavrXkREJIUp0IuIiKQwBXoREZEU\npkAvIiKSwjQZT0TkMBdrEldV0gTAmhNXi97Mrjezz81sj5ktMbOeZaTtZGbzzOyrcPo1ZvY/ZlYn\nKl1vM1sakebayp6MiIhUlFfjS2pSuYHezC4DJgN3A12BRcDrZtY2Rpa9wHSgL3A8MAa4Grgnosz2\nwGvAwnCZE4CHzWxQhc9ERESSWlZWFn/4wx/o3LkzRx11FFdddRV79+4lLy+P/v37c/TRR3PUUUdx\n4YUXsmHDhuJ8oVCIsWPH8sMf/pCMjAwuuugi8vLyavBMEouV151iZouBD9392ohtq4C/u/ttcR3E\n7EHgDHfvEX5/L3CRu58QkebPQOeiNBHbXV0+kiiCLk4Hgq7O738KHKrvalFXq/5vSCxmVuriKaX9\nTf3+e11ttYnru5qVlUV6ejqvv/46DRs25MILL6RPnz786le/Yv78+Zx//vns27ePq666ioKCAmbO\nnAkEgX716tXk5OSQlZXFFVdcQYMGDXjqqaeq8ZwST6zfeZktejOrC5wK5ETtygF6HJij1DKOA34c\nVcaZMcrsZmZp8ZQrIiKpxcy44YYbaNOmDU2aNOG///u/efbZZznqqKP46U9/Sv369TnyyCO57bbb\nmD9/fol8V1xxBZ06daJhw4b8/ve/54UXXtCFcFh5XffNgDRgc9T2LUDLsjKa2SIz2wOsAha7e3bE\n7hallLmZYHJgs3LqJCIiKapt2+9Hhdu1a8fGjRvZs2cP1157LVlZWWRkZNC7d2+2b99eIpBH5yso\nKGDr1q2HtO6Jqjpn3V8KHEkwBj/RzO5z999WpKDs7Ozin0OhEKFQqCrqJyKSMnJzc8nNza3palTa\nunXrSvzcunVrHnjgAVatWsW7777L0UcfzYcffsipp56KuxcPY0Xnq1OnDs2aqd0I5YzRh7vudwFD\n3P3FiO1TgE7u3ieug5gNBx4HGrr7fjObDyx39xsi0gwGngEauPv+iO0ao5eEoTF6SRbJOkafkZHB\na6+9RoMGDRgwYAChUIiCg
gKWL1/OzJkz2bVrF1dffTUvv/wy+/bto1atWoRCIdasWUNOTg7HHHMM\nI0eOpF69ejz99NPVeE6Jp0Jj9O6eDywFzova1Zdg9n280sLHKjre2+Eyost8LzLIi4jI4cPMGDZs\nGOeddx7HHnssHTt25Pbbb2fMmDHs2bOHZs2a0aNHD84///wS9/6bGSNGjODKK6+kVatW5Ofn89BD\nD9XgmSSWeGbdXwo8BVxPENxHAT8jmCG/3swmAKe7+7nh9COAPcAKIB/oBjwI5Lr75eE0WeH9fwam\nAWcBUwh6DmZGHV8tekkYatFLsjj4Fn31iue72r59ex577DHOOeecgyq7T58+jBgxgquuuqqi1UsJ\nsX7n5Y7Ru/sLZtYUuB1oBSwHLnD39eEkLYEOEVkKgLFAR4K/gWuBR4BJEWV+YWYXhLddB2wAbowO\n8iIiUv1S4YIxFc6husQ1Gc/dHwUejbHvZ1HvnwOei6PMt4DT4jm+iIhIWQ5Fr0SyKrfrvqap614S\nibruJVkcTNe9pIYKTcYTERGR5KZALyIiksIU6EVERFKYAr2IiEgKU6AXERFJYQr0IiKS0LKzsxkx\nYkSF8j7xxBP06tWrimuUXKrzoTYiIpIEEmVlvFh0j3zlKNCLJDHdTy9VJjtxyz5cv9/79+8nLS2t\n0uWo614kic1jXk1XQaRK3XvvvWRmZpKens6JJ57I3LlzMTPy8/MZOXIk6enpdOnShaVLlxbn+cMf\n/sBxxx1Heno6nTt35qWXXopZ/i233EKvXr3YuXMn27dv5+qrr6Z169ZkZmYybtw4CgsLAVi9ejW9\ne/emcePGNG/enCFDhhSXUatWLR5++GGOPfZYmjdvzm9/+9sSFyOPP/44nTp14qijjqJfv34lHqE7\nevRo2rVrR0ZGBt26dWPhwoXF+7Kzs7nkkksYMWIEGRkZPPnkk2XWMV4K9CIikhBWrlzJlClTWLJk\nCTt27CAnJ4esrCzcnVmzZjF06FC2b9/OgAEDuOGG4qecc9xxx7Fw4UJ27NjBnXfeyeWXX87mzZtL\nlO3uXHPNNaxYsYJ//vOfNGrUiCuvvJK6deuyZs0aPvjgA3JycvjLX/4CwLhx4+jXrx/btm1jw4YN\n3HTTTSXKe+mll1i6dCnvv/8+L7/8Mo8//jgAL7/8MhMmTGDmzJls3bqVXr16MXTo0OJ83bt3Z9my\nZeTl5TFs2DAGDx5Mfn5+8f5Zs2YxePBgtm/fzrBhw8qsY7wU6EVEJCGkpaWxd+9ePv74YwoKCmjX\nrh0dOgTPTOvVqxf9+vXDzLj88stZtmxZcb5LLrmEli1bAnDppZfSsWNHFi9eXLy/oKCAIUOGsG3b\nNmbPnk39+vXZvHkzr7/+OpMmTaJBgwY0b96cMWPG8NxzwaNa6tatyxdffMGGDRuoW7cuPXr0KFHX\nW2+9lcaNG9O2bVvGjBnDs88+C8DUqVMZO3YsJ5xwArVq1WLs2LF8+OGHrF8fPAdu+PDhNGnShFq1\nanHzzTezd+9eVq5cWVxujx49GDBgAADbt28vs47xUqAXEZGEcNxxxzF58mSys7Np0aIFQ4cOZdOm\nTQC0aNGiOF3Dhg357rvviruw//rXv3LKKafQpEkTmjRpwooVK/jmm2+K069evZrZs2dzxx13ULt2\nMDVt7dq1FBQU0KpVq+J8o0aN4uuvvwbgvvvuw93p3r07Xbp0Yfr06SXq2rZt2+Kf27Vrx8aNG4vL\nHT16dHGZTZs2BWDDhg0A3H///XTq1InGjRvTpEkTtm/fztatW4vLyszMLP65vDrGK65Ab2bXm9nn\nZrbHzJaYWc8y0obM7GUz22hmu8xsmZn9rJQ0haW8jj+o2ouISEoZOnQoCxYsYO3atZgZt956a5mz\n7teuXcsvfvELpkyZwrfffkteXh5dunQpMWZ+0kkn8fjjj3P++eezatUqIAjU9erV45tvviE
vL4+8\nvDy2b9/O8uXLgeDCYtq0aWzYsIE//elPXH/99Xz22WfFZUaOu69bt442bdoAQdCfNm1acZl5eXns\n2rWLM844gwULFjBx4kRmzJjBtm3byMvLIyMjo0RdI8+1vDrGq9xAb2aXAZOBu4GuwCLgdTNrGyPL\nmcAy4GKgM8HjbaeZ2dBS0nYieJ590Wv1QdVeRERSxqpVq5g7dy579+6lXr161K9fv9xZ57t27cLM\naNasGYWFhUyfPp0VK1YckG7IkCHcc889nHvuuXz22We0atWK8847j5tvvpmdO3dSWFjImjVreOut\ntwCYMWMGX375JQCNGzfGzKhV6/uQef/997Nt2zbWr1/PQw89xGWXXQbAqFGjuOeee/jkk0+AoPt9\nxowZAOzcuZPatWvTrFkz8vPzueuuu9ixY0fMcyuvjvGKp0V/MzDd3R9z95XufhOwCbiutMTuPsHd\n73D3t939C3efCvyDIPBH+9rdt0S8Dm4qochhwMyKXyKpbO/evYwdO5bmzZvTqlUrtm7dyoQJE4AD\n76Uvet+pUyd+/etfc+aZZ9KyZUtWrFhBz549S6QrSnvFFVdwxx13cM4557Bu3Tr++te/kp+fXzxD\nfvDgwXz11VcALFmyhDPOOINGjRoxcOBAHnroIbKysorLHThwIKeddhqnnHIK/fv356qrrgLgoosu\n4tZbb2XIkCFkZGRw8skn88YbbwDQr18/+vXrx/HHH09WVhYNGjSgXbt2pda1SFl1jFeZz6M3s7rA\nLmCIu78Ysf0RoIu7h+I6iNkcYJ27/yL8PgTMBdYC9YBPgLvdPbeUvHp2siSMmngefeQxi8ov+mMw\nj3n0oc9he5+xxHYwz6NP9AVzEk2tWrVYvXp18UTBRBHrd17egjnNgDRgc9T2LQRd7fEcuD9wDhA5\nZXEjMAp4jyDQjwD+ZWa93X3hgaWIiEh1SaUgLAeq1pXxzOws4BngRndfUrTd3VcBqyKSvmNmWcAt\ngAK9iIgkrGQbRisv0G8F9gMtora3IBinjyk8M/9VYJy7/ymOurwLXFbajuzs7OKfQ6EQoVAojuJE\nRA4fubm55Obm1nQ1Dgv79++v6SoclDLH6AHM7B1gmbtfG7FtFTDD3f87Rp6zgVeAO9x9clwVMZsJ\nNHL3c6O2a4xeEobG6CVZHMwYvaSGio7RAzwIPGVm7xLcWjeKYHx+arjgCcDpRQE6PNHuVeAR4Fkz\nKxrL3+/uX4fTjAE+J5iEVxe4HBgIDKroCR4sPQxEklGydRmKSM0rN9C7+wtm1hS4HWgFLAcucPf1\n4SQtgciphyOB+gTj7bdEbP8iIl0dYCKQCewBVoTLnFPhMxE5TET3IoiIlKXcrvuaVl3dTGrRS0XU\ndNd99DHVdS+xqOv+8BPrd6617kVERFKYAr2IiCS07OxsRowYkVDlPfHEE/Tq1auKalS9qvU+ehER\nSXyJvjJeVdfvcJvUqkAvIiJU56h9ZcNqVc4p2LdvX5WVlSzUdS8iIgnj3nvvJTMzk/T0dE488UTm\nzp2LmZGfn8/IkSNJT0+nS5cuLF26tDjPxo0bufjiizn66KPp0KEDDz/8cPG+7OxsLrnkEkaMGEFG\nRgZPPvnkAcecNWsWnTt3pkmTJvTp04dPP/20eN/69esZNGgQRx99NM2aNePGG28std633HILvXr1\nKvNpdDVFgV5ERBLCypUrmTJlCkuWLGHHjh3k5OSQlZWFuzNr1iyGDh3K9u3bGTBgADfccAMAhYWF\nXHjhhZxyyils3LiRf/3rX0yePJmcnJzicmfNmsXgwYPZvn07w4cPL3HMVatWMWzYMB566CG2bt3K\nBRdcwIUXXsi+ffvYv38//fv3p3379qxdu5YNGzYwdGjJJ667O9dccw0rVqzgn//8J+np6dX/QR0k\nBXoREUkIaWlp7N27l48//piCggLatWtX/IS4Xr160a9
fP8yMyy+/nGXLlgHw3nvvsXXrVm6//XZq\n165N+/bt+fnPf85zzz1XXG6PHj0YMGAAAPXr1y8xFPD888/Tv39/fvSjH5GWlsZvfvMb9uzZw7//\n/W/effddNm3axMSJE2nQoAH16tWjR4/vn89WUFDAkCFD2LZtG7Nnz6Z+/fqH4mM6aBqjFxGRhHDc\ncccxefJksrOz+fjjj/nxj3/Mgw8+CECLFt8/cqVhw4Z89913FBYWsnbtWjZu3EiTJk2K9+/fv5+z\nzz67+H1mZmbMY27cuPGAZ8K3bduWDRs2UKdOHY455hhq1Sq9Tbx69Wo++ugjFi9eTO3aiRtO1aIX\nEZGEMXToUBYsWMDatWsxM2699dYyZ8m3bduW9u3bk5eXV/zasWMHr7zyChAE7rLyt2nThrVr1xa/\nd3fWr19PZmYmbdu2Zd26dTEfYnPSSSfx+OOPc/7557Nq1apS0yQCBXoREUkIq1atYu7cuezdu5d6\n9epRv3590tLSyszTvXt3GjVqxH333ceePXvYv38/K1asYMmS4Mno5c3YHzx4MK+++ipz586loKCA\nBx54gPr169OjRw9OP/10WrVqxe9+9zt2797Nd999x6JFi0rkHzJkCPfccw/nnnsun332WeU+gGqi\nQC8iIglh7969jB07lubNm9OqVSu2bt3KhAkTgAPvfS96n5aWxiuvvMKHH35Ihw4daN68Ob/4xS+K\nZ7+X1qKP3HbCCSfw9NNPc+ONN9K8eXNeffVVZs+eTe3atUlLS2P27NmsXr2adu3a0bZtW1544YUD\nyrjiiiu44447OOecc1i3bl31fUAVpLXuE/z8JbForXtJFgez1n2iL5gj8anMY2pFRCSFKQinNnXd\ni4iIpLC4Ar2ZXW9mn5vZHjNbYmY9y0gbMrOXzWyjme0ys2Vm9rNS0vU2s6XhMteY2bWVORERERE5\nULmB3swuAyYDdwNdgUXA62bWNkaWM4FlwMVAZ+BRYJqZFS8nZGbtgdeAheEyJwAPm9mgip+KiIiI\nRCt3Mp6ZLQY+dPdrI7atAv7u7rfFdRCz54E0d78k/P5e4CJ3PyEizZ+Bzu7eIyqvJuNJwtBkPEkW\nBzMZT1JDrN95mS16M6sLnArkRO3KAXocmCOmDODbiPdnxiizm5mVfdOkiIiIxK28WffNgDRgc9T2\nLUDLeA5gZv2Bcyh5YdCilDI3h+vTrJR9IiIiUgHVenudmZ0FPAPc6O5LKlpOdnZ28c+hUIhQKFTp\nuomIpJLc3Fxyc3NruhqSgMocow933e8Chrj7ixHbpwCd3L1PGXl7Aq8C49z9oah984Hl7n5DxLbB\nBBcFDdx9f8R2jdFLwtAYvSSLVBujz83NZcSIEaxfv76mq5KwKjRG7+75wFLgvKhdfQlm38c62NkE\ns+rvjA7yYW+Hy4gu873IIC8iItWvaDnX6nxJzYmn6/5B4Ckze5cguI8iGJ+fCmBmE4DT3f3c8PsQ\nQUv+EeBZMysay9/v7l+Hf54K3GBmk4BpwFnASGBIVZyUiIgcpHnzqq/sPjE7f+UQKPc+end/ARgD\n3A58QDCp7gJ3L+o/aQl0iMgyEqgP3AJsAjaGX4sjyvwCuAA4O1zmWIJx/JmVOx0REUlWtWrVKvEE\nuCuvvJJx48YdkG7ixIlccsklJbbddNNNjBkzBoAnnniCY489lvT0dDp06MDf/vY3IHh+fO/evWnc\nuDHNmzdnyJDv25affvopffv2pWnTppx44onMmDGjeN9rr71G586dSU9PJzMzkwceeKBKz7u6xTUZ\nz90fJVj4prR9Pyvl/QEr4ZWS7y3gtHiOLyIih59Y3f6XX34548ePZ/v27WRkZLBv3z6ef/555syZ\nw65duxg9ejRLliyhY8eObN68mW+++QaAcePG0a9fP+bPn09+fn7xo2x37dpF3759ufvuu3njjTf4\n6KOP6Nu3LyeffDI
nnngiV199NX//+98566yz2L59e8I+jjYWrXUvIiIJq7SJg61ataJXr17Fre45\nc+bQrFkzTjnlFCDoGVi+fDl79uyhRYsWdOrUCYC6devyxRdfsGHDBurWrUuPHsFd36+88grt27dn\n5MiR1KpVi65duzJo0KDiR9LWrVuXjz/+mB07dpCRkVF8nGShQC8iIkln5MiRPP300wA8/fTTXHHF\nFQAcccQRPP/880ydOpXWrVvTv39/Vq5cCcB9992Hu9O9e3e6dOnC9OnTAVi7di2LFy+mSZMmxa+/\n/e1vbN4cLOny4osv8tprr5GVlUUoFOKdd96pgTOuOAV6ERFJCA0bNmT37t3F7zdt2hRzxv7AgQP5\n6KOPWLFiBa+++irDhw8v3nfeeeeRk5PDV199xYknnsg111wDQIsWLZg2bRobNmzgT3/6E9dffz1r\n1qyhXbt29O7dm7y8vOLXzp07mTJlCgDdunXjpZde4uuvv+aiiy7i0ksvrcZPoeop0IuISELo2rUr\nzzzzDPv372fOnDm89dZbMdM2aNCAiy++mGHDhvHDH/6QzMxMALZs2cLLL7/Mrl27qFOnDkcccQRp\nacHK6jNmzODLL78EoHHjxpgZaWlp9O/fn1WrVvH0009TUFBAQUEB7733Hp9++ikFBQU888wzbN++\nnbS0NBo1alRcXrJQoBcRkYTwxz/+kdmzZxd3nf/0pz8tsT+6dT9y5EhWrFjBiBEjircVFhYyadIk\n2rRpQ9OmTVmwYAGPPhrMJV+yZAlnnHEGjRo1YuDAgTz00ENkZWVx5JFHkpOTw3PPPUebNm1o1aoV\nY8eOJT8/HwiGBtq3b09GRgbTpk3jmWeeqeZPomqV+/S6mqaV8SSRaGU8SRYHszLeoVjQpjq+o+vX\nr+fEE09k8+bNHHnkkVVefrKJ9Tuv1rXuRUQk8SXjhWJhYSEPPPAAQ4cOVZAvhwK9iIgklV27dtGi\nRQvat2/PnDlzaro6CU+BXkREksoRRxzBf/7zn5quRtLQZDwREZEUpkAvIiKSwhToRUREUpgCvYiI\nSApToBcREUlhcQV6M7vezD43sz1mtsTMepaRtp6ZPWFmy8ws38zmlZImZGaFpbyOr8zJiIiISEnl\n3l5nZpcBk4HrgIXAL4HXzayTu68vJUsasAd4GPgJkFFG8Z2AbyPeb42z3lUmckWoZFw0QkSkspJ1\nZTyJTzz30d8MTHf3x8LvbzKzfgSB/7boxO6+O7wPM+sKNC6j7K/d/ZuDq3LVK1pGVETkcDWPAzpf\nq4z+vtbUBSYFAAAUMElEQVSsMrvuzawucCqQE7UrB+hRBcdfYmYbzexNMwtVQXkiIpKkpk+fzoAB\nA4rfd+zYscQjYdu2bcuyZcsYPXo07dq1IyMjg27durFw4cLiNNnZ2VxyySUMGTKE9PR0TjvtND76\n6KNDeh6Jprwx+mYEXfGbo7ZvAVpW4rgbgVHAoPBrJfCvssb+RUQktYVCIRYsWADAxo0bKSgo4J13\n3gHgs88+Y9euXfzgBz+ge/fuLFu2jLy8PIYNG8bgwYOLnzQHMGvWLC699NLi/RdddBH79u2rkXNK\nBDWyBK67rwJWRWx6x8yygFsI5gGUkJ2dXfxzKBQiFApVa/1ERJJNbm4uubm5NV2NSmnfvj2NGjXi\ngw8+YOXKlfz4xz9m2bJlrFy5kkWLFnH22WcDMHz48OI8N998M3fffTcrV67k5JNPBqBbt24MGjSo\neP8DDzzAO++8Q8+eh2dbsrxAvxXYD7SI2t4C2FTFdXkXuKy0HZGBXkREDhTdCBo/fnzNVaYSevfu\nTW5uLqtXr6Z37940btyY+fPn8/bbb9O7d28A7r//fh5//HE2btyImbFjxw62bv1+LndmZmbxz2ZG\nZmYmmzZVdchKHmV23bt7PrAUOC9qV19gURXXpStBl76IiBymevfuzbx581iwYAGhU
Kg48M+fP5/e\nvXuzYMECJk6cyIwZM9i2bRt5eXlkZGSUmNW/fv33N4QVFhby5Zdf0rp165o4nYQQz330DwJXmtnV\nZnaSmf2RYHx+KoCZTTCzNyMzmFmn8Iz7ZsCRZvaD8Pui/WPMbKCZdTSzzmY2ARgIPFJVJyYiIsmn\nKNB/9913tG7dmp49ezJnzhy+/fZbTjnlFHbu3Ent2rVp1qwZ+fn53HXXXezYsaNEGUuXLmXmzJns\n27ePyZMnU79+fc4444waOqOaV+4Yvbu/YGZNgduBVsBy4IKIe+hbAh2isr0KHFNUBPBB+N+08LY6\nwEQgk+Ce+xXhMqv1wcJF94rqfk4RkcTUsWNHGjVqRK9evQBIT0/n2GOP5eijj8bM6NevH/369eP4\n44/niCOO4Fe/+hXt2rUrzm9mDBw4kOeff56RI0fSsWNH/vGPf5CWlhbrkCnPEj3omZlXVR0jA33k\nAhFF99En+mchNS/43jgQ/i4V/xSoju9QWcfUd1diMTPc/YCVcEr7m5pKC+aMHz+e1atX89RTTx2S\n4yWSWL/zGpl1L4lNPR8ih5dU+r+eSudSVfRQGxERSRlmdkh6KJKJWvQiIpIy7rzzzpquQsJRi15E\nRCSFKdCLiIikMAV6ERGRFKYxehGRw4wmqx1eFOhFRA4jpd1nLalNXfciIiIpTIFeREQkhSnQi4iI\npDAFehERkRSmQC8iIpLCFOhFRERSWFyB3syuN7PPzWyPmS0xs55lpK1nZk+Y2TIzyzezeTHS9Taz\npeEy15jZtRU9CRERESlduYHezC4DJgN3A12BRcDrZtY2RpY0YA/wMPAqweOzo8tsD7wGLAyXOQF4\n2MwGVeAcREREJIZ4Fsy5GZju7o+F399kZv2A64DbohO7++7wPsysK9C4lDJHAV+6++jw+5Vm9kPg\nN8A/Du4UREREJJYyW/RmVhc4FciJ2pUD9KjEcc+MUWY3M0urRLkiIiISobyu+2YEXfGbo7ZvAVpW\n4rgtSilzM0EPQ7NKlCsiIiIRkmKt++zs7OKfQ6EQoVCoxuoiIpKIcnNzyc3NrelqSAIqL9BvBfYT\ntMAjtQA2VeK4X3Fgj0ALYF/4mCVEBnoRETlQdCNo/PjxNVcZSShldt27ez6wFDgvaldfgtn3FfV2\nuIzoMt9z9/2VKFdEREQixHMf/YPAlWZ2tZmdZGZ/JGiNTwUwswlm9mZkBjPrFJ5x3ww40sx+EH5f\nZCrQxswmhcv8OTASuL8qTkpEREQC5Y7Ru/sLZtYUuB1oBSwHLnD39eEkLYEOUdleBY4pKgL4IPxv\nWrjML8zsAmASwa14G4Ab3X1m5U5HREREIsU1Gc/dHwUejbHvZ6Vsax9HmW8Bp8VzfBEREakYrXUv\nIiKSwpLi9joRSS5mVvyz+wGrYIvIIaQWvYhUj+yaroCIgFr0IjVGrV4RORTUohepSfNKfYqziEiV\nUaAXERFJYQr0EpOZleheFhGR5KNALzFp1Fgk8emCXMqjQC/l0h8REZHkpUAvIiKSwhToRUREUpgC\nvcRF44AiIslJgV7iMg/d7y0ikowU6EWkSqnnRySxxBXozex6M/vczPaY2RIz61lO+pPNbL6Z7Taz\nL81sXNT+kJkVlvI6vjInIyIiIiWVu9a9mV0GTAauAxYCvwReN7NO7r6+lPTpwD+BXKAbcBIw3cx2\nufuDUck7Ad9GvN9akZMQERGR0sXTor8ZmO7uj7n7Sne/CdhEEPhLMxyoD4x090/c/UXg3nA50b52\n9y0Rr8KKnIRIslN3t4hUlzIDvZnVBU4FcqJ25QA9YmQ7E1jg7nuj0rc2s2Oi0i4xs41m9qaZheKv\ntmaBixwK+n8mkvzKa9E3A9KAzVHbtwAtY+RpWUr6zRH7ADYCo4BB4ddK4F/ljf2LiJRFFyYiB6qO\n59GXu0S6u68CVkVsesfMsoBbCOYBlJCdnV38c
ygUIhQKVbKKIjVLwUiqWm5uLrm5uTVdDUlA5QX6\nrcB+oEXU9hYE4/Sl+YoDW/stIvbF8i5wWWk7IgO9SLIqCu7uelyQVL3oRtD48eNrrjKSUMrsunf3\nfGApcF7Urr7AohjZ3gZ6mVm9qPQb3H1tGYfrStClLyIiIlUknln3DwJXmtnVZnaSmf2RoMU+FcDM\nJpjZmxHp/wbsBp4ws85mNgi4NVwO4TxjzGygmXUMp5kADAQeqaLzEhEREeIYo3f3F8ysKXA70ApY\nDlwQcQ99S6BDRPodZtYXmAIsIbhP/n53nxRRbB1gIpAJ7AFWhMucU/lTEpHqoKEHkeQU12Q8d38U\neDTGvp+Vsm0F0LuM8iYSBHoRERGpRlrrXkREJIVVx+11IiI1KvL2RQ01yOEuZVr0WihDREqYp0cr\ni4Ba9CIiSUM9FVIRCvRSrCp7RCrzB0mzu0XK4oB6LyV+KdN1L1WlCoOruk4lAWhITw53Sd+i139i\nERGR2FKjRa+WoyQRXZyKyKGUGoFeJJlk13QFRORwknRd92oNiUg0/V0QiS1JW/SajS0i0fR3QaQ0\nSRroRUREJB4K9CIiIiksrkBvZteb2edmtsfMlphZz3LSn2xm881st5l9aWbjSknT28yWhstcY2bX\nVvQkREREpHTlBnozuwyYDNwNdAUWAa+bWdsY6dOBfwKbgG7AaOAWM7s5Ik174DVgYbjMCcDDZjao\nUmcjItXuYJ8rUZRWz6MQqRnxtOhvBqa7+2PuvtLdbyII4tfFSD8cqA+MdPdP3P1F4N5wOUVGAV+6\n++hwmX8BngR+U+EzOYRyc3MPab6aylvxI1ZcjZxnZY5Z4ZwVV5ljVsVnNI9Ds25FZX4vNXHMZPu7\nIIePMgO9mdUFTgVyonblAD1iZDsTWODue6PStzazYyLSlFZmNzNLi6fiNSnZ/kPXdKA/mFZcZF0P\ntgWoQB9H3jjPNfqzT7agWxPHTLa/C3L4KK9F3wxIAzZHbd8CtIyRp2Up6TdH7ANoESNN7fAxRSrc\nclQXcVX5/na18ePH12A9RKQyqmPWvW5mPQxVJrhWNG9RvqK848ePV4CPEv3ZFn1GB/s53VkF9RCR\nmmFlPQo03HW/CxgSHmsv2j4F6OTufUrJ8yTQ1N37R2w7HVgMtHf3tWY2H1ju7jdEpBkMPAM0cPf9\nEdt14SAiUgHurissKXsJXHfPN7OlwHnAixG7+gIzYmR7G7jXzOpFjNP3BTa4+9qIND+NytcXeC8y\nyIfroC+qiIhIBcXTdf8gcKWZXW1mJ5nZHwnG2qcCmNkEM3szIv3fgN3AE2bWOXzL3K3hcopMBdqY\n2aRwmT8HRgL3V8E5iYiISFi5D7Vx9xfMrClwO9AKWA5c4O7rw0laAh0i0u8ws77AFGAJ8C1wv7tP\nikjzhZldAEwiuE1vA3Cju8+smtMSERERKGeMPtmZWbvI9+6+rqbqUlFm1gUIEfS+LHT39ytRVqa7\nf1lVdUt2ZnYWsNTdv6uCstoRrA1ReKiOWZ3C51Pg7psitrUGapf3/6i0vBHbY35GkZ9Non5OZtYd\neN/d98WZ/jSCxtGFHPxE5Tnuvvsg82BmmcCm6GFQOXwlRaA3szuJ8Z/E3e8ys18STAC8KypfYcmk\nXqF79M3sEeBOd/8mzvRHEvxB3BZ+X4tgMaCewPvAPe6eH0c51wK/J7h1uh5wDnC3u98bR94NwHzg\ncXd/08xOAWa7e2ZUuqLP1gg+o7sOLK3U8ju4+2fxpC0rffizaQNkANuAjXEEy57AF+7+pZm1IZjk\nuTDeukSUsxP4wcGcRxllFQIfA79097eq+5hm9n/A8bG+02Y2CZhJcHFY5ucZI38h8Km7d4rY9inQ\nsbz/R6Xljdge8zOK/Gzi/ZzM7ELgWOA5d/8qztMrytsSGEcwX+go4P8BE939r+WcW0t33xLnMXYC\nPwBWH0zdC
P5PdqzI96Qqv9eSGpLlefSDOTDQW3jbXcAgguGDEkHK3St8+2BU63c4cB/wjZlFD12U\n5q/AR0B2+P3NwFjgZeDnBGsF3BCdycyau/vXEZtGA/9V9AfMzHoB/yBYabA8dwAnA8+F69yNYNnh\naO2p2C2Ri8JzM6YDc72UK0YL7qn6EfCz8L8tw9sbApcCQ4GzgIYR2Xab2b+BZ4EXYrRo6hLM+bgU\neAD4UwXqX9WuIvgs7we6H4LjTQGalrG/AcFnWM/MXgFeAt5w9z1xln8VwYVXpLEEF2QVyVu0vco+\nIzP7HcGF8BZgrJn1dfeP4sx7BsGF0J+BXgSrfZ4K/K+Z1Q2v1hnLPWYWT0vbCL6rRVq5e/T6IbHq\ntzOedCLxSIoWfUVF9wTE21oN590FbCVY2/8ioK+7L4znatnMPgNGuPu/w+8/JmjFP1P0B8bdW5WS\n73PgLnefHn6/BLjF3eeF318bfn9cKXkbEPw+d0dtHwY8TXCbZHt33xrvZ1AWM8sAbgGuIQgq7wPr\ngP8AjYB2wCnAd8A0gpbSdjMbDdwGfE1w4fNeON8OIB04hiAIXAgcTdCD8VApx59CEEwau/svK3gO\nh7zlcyiPGb7QOp3g+zuQIMi+SRD0Z0ddVNa4g23Rm9k6YJy7P2lmtxFcGF8B/B9B4G5OKUMN4e/u\np8Av3H121L7OwGvufoyZPQ/cEPk5mVku8V8YFzVGhgP3EMxDiiuAm9nU8Lkd9O9ILXqJluqB/glK\ndkv/7CDy1iG4wu8J/A+QT7B6XxbBH5QXo6/OzaxoKbeiLvqioNubIKDtDtflbIJudSLXIgh3Q08h\nCJS/IFhBcAZQh6D3ZR/BBcQbpdT3HwRjetMitvUE5gC/I2g5fx1+VkGVCa+1cG74nDqG676doKty\nAfBm5DCFmb0I/E88cw3C45u3ufvFEduKPuNGBL+f94GdUPKzjFFe5IWfEVxwPArkhbfFPXRRUWY2\nHHjZ3f9TnceJcezj+D7o/xB4lyDoP+vuGw51faJVIND/B+ji7l+E399O0KvnBBeLz1DKUEM43X+5\n+6VmtoKgRynyNt5jCHqfxgGF7j66qs7xUFCgl2gpHegrw8waFHVzmlkecBrBXQdvAiuAzsB6dz+h\nlLxrCQLyW2bWH3jQ3Y8P78sA1rl7zC5QM7uYYKjgL8BDBGOQtYCVsbpezWwL0Nvd/y/8vku4rjd5\ncOfEWcDz0WP0ySrcot8BpMfboo+48IPgD/swYBbBhcJBXwwmMzNrTtBrMgD4t7tPrOEqVSTQv0/Q\n6n01Yltrgv+nnxAMXTV099yofIuBCe7+kpldSXAh/HuCXq8xBE/VvBM4jmCOQ/OqO8vqp0AvB3B3\nvUp5AXsJVvObRNAS7xzevpMg8NYDesXI+ySwkqBFsJqg+7lo39nAkjiOn0Ew9vwB0D2O9P8BTgz/\n3J6g+/KciP0dgd01/blW0e+mDzAj/PPzQKiC5ewEjq3p89GrxO+jQ/TPZaS/AfhHBY6zOeL/81Lg\nRxH7mhJcQNYjuPjLJ5h8V+OfT0U+R730cvdqWes+VWQSdNnvJXiwz/tmtpDgD8CpBK2/BTHy/pqg\nW/RSglb1PRH7fkowZl4qM/uJmf2aILhfC9wIPGZmk83siDLq+wHwRzO7DniLYDxwbsT+nxDMKk4F\nBXz/2ONfA5W5jUhdWknK3R9x90EVyJrP95MKWxG05It8R9CVn07Qi1aLyn2/aoJWE5US1HUfh3DX\n/dnASQQz6r8iGD9/1917V+FxHgAuB+YRTKJ60oPbB+sRjCcPA8Z4RFdlRN7TCMbz9xN0R/8UGA+s\nIWgB30YQ/MuaTXxYURdnYqnI7XUVPM4bBPMk/jc8nHMCwYXjboJu/JPcvauZdSWYY5JUT9S0YL2C\nDa776CVMgT4OZraNYPLOuvAfoK4ErYCQuz9Xhcf5FjjP3ZeY2VHAYnfvGLG
/E/And+8VR1lXENyG\n14LgD9gkdx9XVXVNBeGZ2o+6e165iaXaRf4+qvN3Y2ZXEVww/5eZpRPcojmA4Fa4t4DRHqze+ShQ\nK9yzJpK0FOjjEHmFHL5Vrp+XfR99RY+zHrjZ3WeEWxN/dff/ikpjHucvzYLFaI4G8vz7BwyJHNbC\nd9S8B7zi7rfHSDMAeBzo6lpNUpKcAn0CCd969WeC29MaAiPd/aWarZVI6jGzLOANgtszf+/un4S3\ntyCYFzMKuMSjZuyLJCMF+gRjZs0IVvn7f+pSFqk+ZtYI+C3BIjsZBBNv6xAs5DTew/fniyQ7BXoR\nOeyZWROCIL/VK/BsAJFEpkAvIiKSwnQfvYiISApToBcREUlhCvQiIiIpTIFeREQkhf1/BmYtWNLZ\nUuUAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "punctuation_normalised = punctuation.div(punctuation.sum())\n", "ax = punctuation_normalised.plot(kind='bar', fontsize=14)\n", "ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Too many types of punctuation with very low counts. Let's just look at the most common punctuation." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAegAAAD/CAYAAAA69EWbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X10VNX97/H3TsKTShIeQgwkMQGiCNhCa1HRNAkVGvrj\nqQqWIAEq1eu1omhrlVsxscsfPiCV4mVVuS2gggVpCwREjL+SUSz1AVrQUCErUAMkSEVDghGSQM79\nI8k0hDxMZuZkziSf11pnrcmcc/Z8Z2dmvmfvs8/ZxrIsRERExFlCAh2AiIiIXEwJWkRExIGUoEVE\nRBxICVpERMSBlKBFREQcSAlaRETEgcLsKtgYo+u3RES8YFmWCXQMEni2tqAty3LUkpWVFfAYFFPH\niksxKSZ/LyL11MUtIiLiQErQIiIiDtSpEnRqamqgQ7iIYvKcE+NSTJ5RTCJtZ+w652GMsXQ+RUSk\nbYwxWBokJnSyFrSIiEiwsO0yK6g9Euyo1DsgIiJ2sjVBk5dna/EBk5YW6AhERKSDUxe3iIiIA9k6\nSMyWgh1CXdwiYgcNEpN6tnZxZ2VluR+npqbqsgYRkUZcLhculyvQYYgD6TIrEREHUQta6ukctIiI\niAMF9WVWaqGLiEhHZe9lVtiZQNUDJCIiHZfNCVpJVERExBttStDGmHjgnzTfNL7asqxj9X+oC1pE\nRMQ7bRrFbYwJBa5oYZMiy7LO122rUdwiIm2kUdxST5dZiYg4iBK01NNlViIiIg6kBC0iIuJAStAi\nIiIOpAQtIiLiQErQIiIiDqQELSIi4kBK0CIiIg6kBC0iIuJAbU7Qxph7jTF/N8Z8ZYw5Yox5pIVt\n/bqIiIh0Ft5MljEGWAjsB1KA3xlj9luWtaXxhnnk+Rjef6SR5reyREREnM7nW30aY4qBxZZlLW30\nfLve51O3FRWRjkC3+pR6Pk03aYz5P3VlrGtygzz/taBblKbWtYiIdCxet6CNMY8C84CxlmV91MR6\ntaBFRNpILWip51UL2hjTH/gVML6p5Fwvq8HjVCANJVIRkYZcLhculyvQYYgDedWCNsZ8G/gASLQs\n60gz2zg2E+sgQUScSi1oqeftOeh/AqOA4y1ule1l6XbKDnQAIiIirfO2BT0KeBkYY1lWSTPbOLaZ\nqha0iDiVWtBSz9sW9CVAUuv7e5MIjRKoiIh0ej5fB91swcZYStAiIm2jFrTU8+k66NbpMyYiIuIN\nWxO0WsIiIiLe0WxWIiIiDqQELSIi4kBK0CIiIg6kBC0iIuJAStAiIiIOpAQtIiLiQErQIiIiDqQE\nLSIi4kBK0CIiIg5k653EjGn+Vp+6y5iIiEjzbE3QeeQ1+XwaaXa+rIiISNCzeTar5qkFLSJyMc1m\nJfV8PgdtjJljjKkxxsQ3XmdZVrOLiIiINM8fXdyngANAdeMVOgctIiLiHZ8TtGVZm4BNTa3LavA4\ntW4BzRItIlLP5XLhcrkCHYY4kK3noJsr2aAWtIhIU3QOWurZe5lVS+ta6P7uKHQQIiIi3rI1QZNt\na+nOlh3oAEREJJj53MVtjJkDrAQSLMs60uD5Tt98VAtaRNpKXdxSzx8t6ERgP3Ds4lXBmKCMEquI\niAScPxL0eOCnlmXVXLxKB4EiIiLesHcUt1qiIiJt0lwXt04bdlzNndKwd5CYiIj4jRo9HU9LVzRp\nukkREREHUoIWERFxICVoERERB1KCFhERcSANEhMRCULtcbtkDUoLLCVoEZGgZWcC1X0sAk1d3CIi\n4rWEhASeeuophg0bRu/evbnjjjuorKyktLSUCRMm0K9fP3r37s3EiRMpLi5275eamsqCBQu47rrr\niIiIYMqUKZSWlgbwnTiPErSIiPjk1VdfJTc3l0OHDlFQUMATTzyBZVnMnTuXI0eOcOTIEXr06MG9\n9957wX6vvPIKq1at4vjx44SFhXHfffcF6B04k613Emtunc5ri
Ig0raU7iTX87aw9B21vF7cnv9WJ\niYksWLCAu+66C4A33niDefPmUVhYeMF2e/fuZcyYMXz55ZcApKWlccMNN7Bo0SIAPvnkE0aMGMHZ\ns2c7xXTE9VqaHMXWc9B55F30XBppdr6kiIi0s7i4OPfj+Ph4SkpKOHPmDPPnz+fNN990d11/9dVX\nWJblTsCN96uurubkyZNERUW17xtwKFsTdHPJWKMPRUQ6jiNHjlzwuH///ixZsoSCggI++OAD+vXr\nx969e/nWt751QYJuvF+XLl3o27dvu8fvVPbOB513cQu6XaSlKUGLSFAKti7uhIQEIiIi2LZtGz16\n9GDSpEmkpqZSXV3Nxx9/zMaNG6moqGDu3Lls3ryZc+fOERISQmpqKocOHSI3N5crrriC2bNn061b\nN9asWWPje3Kelrq4/TFI7BRwAKj2Q1kiIhJEjDHMmDGDcePGMWjQIJKSknj00UeZP38+Z86coW/f\nvowePZrx48df0HtqjCEzM5M5c+YQExNDVVUVy5YtC+A7cZ6ADBJrD2pBi0gwalsL2l6eDhL7/e9/\nz5gxY9pUdlpaGpmZmdxxxx3ehtchBGyQWBbwOJCVlUVqaiqpqal2vpyISNBxuVy4XK4279cRGiEd\n4T3YqcO2oFujD4aIOJGnLWinUAvaNy21oP0xSOyHwJPAGMuySho8b5HtU9H2yVaCFhFnCrYELb6x\nu4s7AkhqsqxsP5QuIiLSCdncxd2wbM+G7IuIdGZqQXcuARskptlQREREvGNrgtbRnoiIiHc0m5WI\niIgDKUGLiIjfZWdnk5mZ6dW+q1evJjk52c8RBR+bz0GLiIgdnHInseZ0pikj7aIELSISrLKdW3Zn\nHYN0/vx5QkND/VKWurhFRMQnTz/9NLGxsYSHhzNkyBB27NiBMYaqqipmz55NeHg4w4cPZ8+ePe59\nnnrqKQYPHkx4eDjDhg1j06ZNzZb/0EMPkZyczOnTpykrK2Pu3Ln079+f2NhYFi5cSE1NDQCFhYWk\npKQQGRlJVFQU06dPd5cREhLC888/z6BBg4iKiuIXv/jFBQcRK1euZOjQofTu3Zv09PQLpsK8//77\niY+PJyIigmuvvZZ3333XvS47O5upU6eSmZlJREQEL730UosxtoUStIiIeO3gwYMsX76c3bt3U15e\nTm5uLgkJCViWRU5ODhkZGZSVlTFp0iTuvfde936DBw/m3Xffpby8nKysLGbOnMmJEycuKNuyLO68\n807y8/N566236NmzJ3PmzKFr164cOnSIf/zjH+Tm5vK73/0OgIULF5Kens6pU6coLi7mvvvuu6C8\nTZs2sWfPHv7+97+zefNmVq5cCcDmzZt58skn2bhxIydPniQ5OZmMjAz3fqNGjWLfvn2UlpYyY8YM\npk2bRlVVlXt9Tk4O06ZNo6ysjBkzZrQYY1soQYuIiNdCQ0OprKxk//79VFdXEx8fz8CBAwFITk4m\nPT0dYwwzZ85k37597v2mTp3K5ZdfDsBtt91GUlIS77//vnt9dXU106dP59SpU2zZsoXu3btz4sQJ\n3njjDZ577jl69OhBVFQU8+fPZ926dQB07dqVTz/9lOLiYrp27cro0aMviPXhhx8mMjKSuLg45s+f\nzx/+8AcAXnjhBRYsWMBVV11FSEgICxYsYO/evRw9ehSA22+/nV69ehESEsKDDz5IZWUlBw8edJc7\nevRoJk2aBEBZWVmLMbaFErSIiHht8ODBLF26lOzsbKKjo8nIyOD48eMAREdHu7e75JJLOHv2rLur\n9+WXX2bkyJH06tWLXr16kZ+fzxdffOHevrCwkC1btvDYY48RFlY7XKqoqIjq6mpiYmLc+9199918\n/vnnADzzzDNYlsWoUaMYPnw4q1atuiDWuLg49+P4+HhKSkrc5d5///3uMvv06QNAcXExAM8++yxD\nhw4lMjKSXr16UVZWxsmTJ
91lxcbGuh+3FmNbKEGLiIhPMjIy2LlzJ0VFRRhjePjhh1scxV1UVMRd\nd93F8uXL+fLLLyktLWX48OEXnBO++uqrWblyJePHj6egoACoTbDdunXjiy++oLS0lNLSUsrKyvj4\n44+B2gOCFStWUFxczIsvvsg999zD4cOH3WU2PK985MgRBgwYANQm6xUrVrjLLC0tpaKiguuvv56d\nO3eyePFiNmzYwKlTpygtLSUiIoLm5uZuLca2sDVBG2O0aNHSyiISzAoKCtixYweVlZV069aN7t27\ntzqKuaKiAmMMffv2paamhlWrVpGfn3/RdtOnT2fRokXcfPPNHD58mJiYGMaNG8eDDz7I6dOnqamp\n4dChQ7zzzjsAbNiwgWPHjgEQGRmJMYaQkP+kuWeffZZTp05x9OhRli1bxo9+9CMA7r77bhYtWsQ/\n//lPoLabesOGDQCcPn2asLAw+vbtS1VVFb/61a8oLy9v9r21FmNb2HqZVR55dhYvEvTSSAt0CCI+\nqaysZMGCBXzyySd06dKFG2+8kRUrVvDiiy9edABa//fQoUP52c9+xg033EBISAizZs3ipptuumC7\n+m1nzZpFVVUVY8aM4Z133uHll1/mkUceYejQoZw+fZqBAwfyyCOPALB7924eeOABysrKiI6OZtmy\nZSQkJLjLnTx5Mt/+9rcpKyvjxz/+sXsu6ilTpvDVV18xffp0ioqKiIiIYNy4cUybNo309HTS09O5\n8sorufTSS3nggQeIj49vMtZ6LcXYFrbOZqUELdKyNNI67fWi0jRjPJvNqj16XzrSZzMkJITCwkL3\nADanaO7/DTa3oNU6EGldR+3m7kg/7k6k+u347L2TWJ5a0CKdUpoOzsVZgvFA2NYublsKFpGgoBae\ndzzt4paOIWBd3FkNHqfWLSLBzKDEI/7lcrlwuVyBDkMcSC1okTZSghY7qQXduQSsBW3rTCsS3LKV\n6EREWuJzC9oYcy/wU8uyrm70vH59pVPTAYh4Qy3ozsXuFnQf4MqmV+nDJJ1V8I0YFRFn8flWn5Zl\nPW5Zln9mpxYRkQ4hOzubzMxMR5W3evVqkpOT/RSR/ew9B61WhIiILZx+JzF/xxeM1zH7ytYErfMl\nIiL2sfMX1td06M/f/3PnzvmtrGCi6SZFRMQnTz/9NLGxsYSHhzNkyBB27NiBMYaqqipmz55NeHg4\nw4cPZ8+ePe59SkpKuPXWW+nXrx8DBw7k+eefd6/Lzs5m6tSpZGZmEhERwUsvvXTRa+bk5DBs2DB6\n9epFWloaBw4ccK87evQot9xyC/369aNv377MmzevybgfeughkpOTW5ydKpCUoEVExGsHDx5k+fLl\n7N69m/LycnJzc0lISMCyLHJycsjIyKCsrIxJkyZx7733AlBTU8PEiRMZOXIkJSUl/OUvf2Hp0qXk\n5ua6y83JyWHatGmUlZVx++23X/CaBQUFzJgxg2XLlnHy5El+8IMfMHHiRM6dO8f58+eZMGECiYmJ\nFBUVUVxcTEZGxgX7W5bFnXfeSX5+Pm+99Rbh4eH2V5QXlKBFRMRroaGhVFZWsn//fqqrq4mPj3fP\nGJWcnEx6ejrGGGbOnMm+ffsA+PDDDzl58iSPPvooYWFhJCYm8pOf/IR169a5yx09ejSTJk0CoHv3\n7hd0ma9fv54JEybwve99j9DQUH7+859z5swZ/vrXv/LBBx9w/PhxFi9eTI8ePejWrRujR49271td\nXc306dM5deoUW7ZsoXv37u1RTV6xeZCYiIh0ZIMHD2bp0qVkZ2ezf/9+vv/97/PrX/8agOjoaPd2\nl1xyCWfPnqWmpoaioiJKSkro1auXe/358+f57ne/6/47Nja22dcsKSm5aE7muLg4iouL6dKlC1dc\ncQUhIU23PwsLC/noo494//33CQtzdgpUC1pERHySkZHBzp07KSoqwhjDww8/3OKo67i4OBI
TEykt\nLXUv5eXlbN26FahNuC3tP2DAAIqKitx/W5bF0aNHiY2NJS4ujiNHjnD+/Pkm97366qtZuXIl48eP\np6CgwMt33D6UoEVExGsFBQXs2LGDyspKunXrRvfu3QkNbfnWGKNGjaJnz54888wznDlzhvPnz5Of\nn8/u3buB1keAT5s2jddff50dO3ZQXV3NkiVL6N69O6NHj+Y73/kOMTExPPLII3z99decPXuWXbt2\nXbD/9OnTWbRoETfffDOHDx/2rQJspAQtIiJeq6ysZMGCBURFRRETE8PJkyd58skngYuvXa7/OzQ0\nlK1bt7J3714GDhxIVFQUd911l3s0dVMt6IbPXXXVVaxZs4Z58+YRFRXF66+/zpYtWwgLCyM0NJQt\nW7ZQWFhIfHw8cXFxvPbaaxeVMWvWLB577DHGjBnDkSNH7KsgH9g6m5WugxYRaRtP78Xt9BuViGcC\nN5uViIjYQsmz47M1QQfDrdn0IRcRESeyNUHnkWdn8T5LIy3QIYiIiDTJ1nPQthTsYGqNi4ivNB90\n5xK4c9B5zm5B+1WaWuMiIuI/akH7kY5uRcRXakF3Lra0oI0xtwMvNHgq3bKsvzbcJqvB49S6JVAM\nSqAi4jwulwuXyxXoMMSBvG5BG2MuA/o1eKrEsqyzDdY7LhsqQYuI06kF3bnY0oK2LOsr4KsWN8r2\ntvT/7K8PpIhIcHK5XGRmZnL06NFAhxKUdKMSEZEgpDuJdXz2Juhs34vw54dQHzYR6VDsvFJGV6YE\nnM2TZVgOWkRExN9CQkIumBFqzpw5LFy48KLtFi9ezNSpUy947r777mP+/PkArF69mkGDBhEeHs7A\ngQN59dVXgdr5m1NSUoiMjCQqKorp06e79z9w4ABjx46lT58+DBkyhA0bNrjXbdu2jWHDhhEeHk5s\nbCxLlizx6/tuDzZ3cTv/Vp8iIuI/zc3lPHPmTB5//HHKysqIiIjg3LlzrF+/nu3bt1NRUcH999/P\n7t27SUpK4sSJE3zxxRcALFy4kPT0dN5++22qqqrcU1JWVFQwduxYnnjiCd58800++ugjxo4dyzXX\nXMOQIUOYO3cuf/zjH7nxxhspKytz9LSSzbG1BW1ZlqMWERGxX1O/tzExMSQnJ7tbudu3b6dv376M\nHDkSqG2Jf/zxx5w5c4bo6GiGDh0KQNeuXfn0008pLi6ma9eujB49GoCtW7eSmJjI7NmzCQkJYcSI\nEdxyyy3uqSW7du3K/v37KS8vJyIiwv06wUTzQYuISLuYPXs2a9asAWDNmjXMmjULgEsvvZT169fz\nwgsv0L9/fyZMmMDBgwcBeOaZZ7Asi1GjRjF8+HBWrVoFQFFREe+//z69evVyL6+++ionTpwA4E9/\n+hPbtm0jISGB1NRU3nvvvQC8Y98oQYuIiNcuueQSvv76a/ffx48fb3Zw7+TJk/noo4/Iz8/n9ddf\n5/bbb3evGzduHLm5uXz22WcMGTKEO++8E4Do6GhWrFhBcXExL774Ivfccw+HDh0iPj6elJQUSktL\n3cvp06dZvnw5ANdeey2bNm3i888/Z8qUKdx222021oI9lKBFRMRrI0aMYO3atZw/f57t27fzzjvv\nNLttjx49uPXWW5kxYwbXXXcdsbGxAPz73/9m8+bNVFRU0KVLFy699FJCQ0MB2LBhA8eOHQMgMjIS\nYwyhoaFMmDCBgoIC1qxZQ3V1NdXV1Xz44YccOHCA6upq1q5dS1lZGaGhofTs2dNdXjBRghYREa/9\n5je/YcuWLe4u5h/+8IcXrG/cmp49ezb5+flkZma6n6upqeG5555jwIAB9OnTh507d/Lb3/4WgN27\nd3P99dfTs2dPJk+ezLJly0hISOCyyy4jNzeXdevWMWDAAGJiYliwYAFVVVVAbRd6YmIiERERrFix\ngrVr19pcE/5n62QZGpglItI2nt7qM1hvVHL06FGGDBn
CiRMnuOyyy/xefrAJ3HSTIiJii2BsANXU\n1LBkyRIyMjKUnD2gBC0iIrarqKggOjqaxMREtm/fHuhwgoK6uEVEHESzWXUuLXVxa5CYiIiIA9na\nxd0egxhEJLioFSjiGVsTdB42zrQiIkEnDc2QJOIpW89B21KwiAQ1taBbpnPQnUvgLrOyc65SkWCS\nlqbEJCJtokFiIiIiDqQubhEJGPUqXKyj30lMLhSwLu6sBo9T6xYREQBd41HL5XLhcrm82tfOgbga\n0Bd4akGLiLSgvVuRbWlB252gPXnvq1atYuPGjeTk5ACQlJTEyJEjee211wCIi4tj69atrFy5ko0b\nN1JWVkZSUhJLly7lpptuAiA7O5v8/HzCwsLYtm0bSUlJrFq1im984xu2vT+nCNwgsWxbSxcRsVd2\noANwvtTUVB588EEASkpKqK6u5r333gPg8OHDVFRU8M1vfpNRo0aRnZ1NREQES5cuZdq0aRQVFdG1\na1cAcnJyWLduHWvXrmXp0qVMmTKFgoICwsI67x2pfW5BG2PmACuBBMuyjjR4Xi1oEQl6akG3Lj4+\nns2bN3Pw4EHy8vLYt28fL730Ert27WLz5s1s2rTpon169+7N22+/zTXXXEN2dja5ubns2rULqK3z\nAQMG8Nprr7lb2R2V3S3oRGA/cOziVcrRIuJPRgOXHCglJQWXy0VhYSEpKSlERkby9ttv87e//Y2U\nlBQAnn32WVauXElJSQnGGMrLyzl58qS7jNjYWPdjYwyxsbEcP3683d+Lk/gjQY8HfmpZVs3FqzQM\nRESko0tJSSEnJ4dPP/2UX/7yl0RGRrJmzRree+895s2bx86dO1m8eDE7duxg2LBhQG0LuuHB1tGj\nR92Pa2pqOHbsGP3792/39+IkPidoy7JGtbDO1+JFRMThUlJSeOCBB4iJiaF///5cdtllzJw5k5qa\nGkaOHMkbb7xBWFgYffv2paqqiqeeeory8vILytizZw8bN25k4sSJLFu2jO7du3P99dcH6B05g25U\nIiIiPklKSqJnz54kJycDEB4ezqBBg7jxxhsxxpCenk56ejpXXnklCQkJ9OjRg/j4ePf+xhgmT57M\n+vXr6d27N2vXruXPf/4zoaGhgXpLjqD5oEVEHKQz3qjk8ccfp7CwkFdeeaVdXs9JAneZlYiI2KIj\nNYA60nvxJ3Vxi4hIQBlj2qVHINioi1tExEE03WTn0lIXt1rQIiIiDqQELSIi4kBK0CIiIg6kBC0i\nIuJAtl5m1RFG5WlQhog4RUf4TRXP+SVBG2NWA/+yLOvxhs/bOdNKe9CE5SLiFM2N9JWOy19d3Baa\nukpERMRv/NnFfdHRXUdogTbVpaRubxERsZs/E/TFWSsvuLu4m5QW/AcdIiLifLbeScyWgh1ELWkR\n8beW7iwlnYuto7izGjxOrVs6Cn17RMQfXC4XLpcr0GGIA6kF3Ump9S/iTGpBSz17p5vMtrV08VZ2\noAMQEZHWqAXdSakFLeJMakFLPXtb0LZcGm2UXEREpMPTvbhFREQcyOYWtHppREREvGFrglZXtIiI\niHfUxS0iIuJAStAiIiIOpAQtIiLiQErQIiIiDqQELSIi4kBK0CIiIg6kBC0iIuJAStAiIiIOpAQt\nIiLiQF7fScwYsxr4l2VZjxtjaoAEy7KONNrGx/DEiXSHOBER+/lyq0+LVqaryiPPh+LFidJIC3QI\nIiKdgi8JutXmsX7MRUREvONrC7qpx/95Ul2hIiIiXjF2JVFjjLKzdEg68BQ7GWOwLEsDeMTe6Saz\nGjxOrVtEgpl+NcXfXC4XLpcr0GGIA6kFLdJGakGLndSClnq2tqD1QyYiIuId3ahERETEgWxtQetG\nJcFBPR0iIs5ja4Ju5T4m4gg6iBIRcSKbE7R+/EVERLyhQWIiIiIOpEFiIiIiDqQELSIi4kBK0CIi\nIg6kBC0iIuJAStA
iIiIOpAQtIiLiQErQIiIiDqQELSIi4kBK0CIiIg7kc4I2xuQbY7KaWdcui4iI\nSEfjj1t9WjQzK0YeeX4ovmVppNn+GiIiIu3N1ntxK3mKiIh4x97ZrPL81IJOS9PEGyIi0qlokJiI\niIgD+aMF3fworTR1cYuIiHjDX4PEmtRwaHdq3WLQPNEiIvVcLhculyvQYYgDGbuSpTHG8VlYBwoi\n4jTGGCzL0vWj4nsL2hjzF+DPlmUtv2hltq+l2yg70AGIiIg0zx9d3AOBPk2uyfZD6SIiIp2QzV3c\n3pZt1P0sIp2Surilnr3XQbcwwFtERESaZ2uCVitYRETEO53qRiVOvJRBMXnOiXEpJs8oJpG2U4IO\nMMXkOSfGpZg8o5hE2q5TJWgREZFgoQQtIiLiQJ36TmIiIk6ky6wEbEzQIiIi4j11cYuIiDiQErSI\niIgDKUGLiIg4kEcJ2hhzjzHmX8aYM8aY3caYm1rZ/hpjzNvGmK+NMceMMQub2CbFGLOnrsxDxpj/\n1dbg/R2XMSbVGFPTxHKlHTEZY7oZY1YbY/YZY6qMMXnNbOdTXfk7pgDUU6oxZrMxpsQYU1EX24+b\n2K4966nVmAJQT0ONMXnGmM8a1MF/G2O6NNquXb97nsTV3nXVaL8kY8xpY8zpJta122fKk5j8UU8S\nRCzLanEBfgRUAXOBq4BlwGkgrpntw4HPgHXAUOBWoBx4sME2iUAF8Ju6Mn9S9xq3tBaPzXGlAjXA\nEKBfgyXEppguAX5b9/43Ajua2ManurIppvaupwXAr4AbgATgbqAayAhgPXkSU3vX0yBgFnANEAdM\nrPvMLw7wd8+TuNq1rhrs1xXYA2wFygP53fMwJp/qSUtwLa1vAO8DLzZ6rgBY1Mz2/xs4BXRr8Nwv\ngWMN/n4aONhov/8H7PI4cHviqv/w9/GqMtsYU6Pt/i+Q18TzPtWVTTEFrJ4abL8e+KMT6qmFmJxQ\nT79uWAeB+O55GFdA6gp4Dvg9MBs43WhdQD5TrcTkUz1pCa6lxS5uY0xX4FtAbqNVucDoZna7Adhp\nWVZlo+37G2OuaLBNU2Vea4wJbSkmm+Oqt7uu6/J/jDGprcXjQ0ye8LqubIypXiDrKQL4ssHfTqin\nxjHVC0g9GWMGA99vVEYgvnuexFWv3erKGPNfwH8B82h66r12/0x5EFO9NteTBJ/WzkH3BUKBE42e\n/zdweTP7XN7E9icarAOIbmabsLrXbI1dcZVQ21V5S91yEPiLh+eNvInJE77UlV0xBbSejDETgDHA\nigZPB7Sdg+MPAAADGUlEQVSemokpIPVkjNlljDlDbWvtfcuyshusDsR3z5O42rWujDH9qf1f3W5Z\n1tfNlNuunykPY/KlniTI2DHdpFPvfNJqXJZlFVD741HvPWNMAvAQ8K49YQWfQNaTMeZGYC0wz7Ks\n3Xa+lqeaiymA9XQbcBkwAlhsjHnGsqxf2Ph6nmo2rgDU1SvAby3L+tCGsr3Vakz6jepcWmtBnwTO\nU3sk2VA0cLyZfT7j4iPE6AbrWtrmXN1rtsauuJryAZBkU0ye8KWu7IqpKbbXU10rYRuw0LKsFxut\nDkg9tRJTU2yvJ8uyjlmWdcCyrHXAI8D9DbpkA/Hd8ySupthZV2lAljGm2hhTDfwOuLTu75/UbdPe\nnylPYmqKp/UkQabFBG1ZVhW1ownHNVo1FtjVzG5/A5KNMd0abV9sWVZRg23GNlHmh5ZlnW8taBvj\nasoIaruV7IjJE17XlY0xNcXWejLGfJfaRJhlWdayJjZp93ryIKamtPfnKZTa73n9dz0Q3z1P4mqK\nnXU1HPhmg+Ux4Ezd4z/WbdPenylPYmqKR/UkQai1UWTUdktVUnupwNXUXnJQTt2lAsCTwP802D6c\n2iPEPwDDqD1PUgY80GCbBOArakcrXk3t5QuVwA89Hd1mU1zzgcnUHo0OqyujBphiR
0x1zw2l9gu2\nDviQ2i/jCH/VlU0xtWs9UTtytYLaUbXR1LZqLgeiAlVPHsbU3vWUCUyl9hKcgXX7HwPWBPi750lc\n7f7da7T/HC4eMd2unykPY/KpnrQE1+LZRrWXKP0LOEvtD/ZNDdatAg432n448Da1R3/F1Hb/NS7z\nu9QeYZ4FDgF3tTl4P8dF7XmcAuBr4Iu6bdNtjulfdV+wGmq7xGqA8/6sK3/H1N71VPf3+QYx1S+N\n4263evIkpgDU0/S6919O7fW2+dR2JXdrVGa7fvc8iau966qJfefQ6JrjQHz3WovJH/WkJXgWzWYl\nIiLiQLoXt4iIiAMpQYuIiDiQErSIiIgDKUGLiIg4kBK0iIiIAylBi4iIOJAStIiIiAMpQYuIiDiQ\nErSIiIgD/X9zx531pAILaAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = punctuation_normalised[punctuation_normalised.sum(axis=1) > 0.1].plot(kind='barh',fontsize=14)\n", "ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualising the punctuation\n", "Let's print the punctuation sets side-by-side to compare them." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "?'\"\"!...,,.--,,,\",--.\"\".\"\",.\"-,'\",\"...,\" .',,,.!..!,.,!....,,?...'..,,.?.-,.?!!,.\n", "\"\".\"\"\"..,..\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\" ...',...!\",.\",\",\"'..?\"\".,\",,\",,,.,..?.?\"\n", ".\"\".\",;,\"\".'''''''''''',''''\".\",-,.--,.' 
\",\".\",\",,\"?.\",..\",\",.,',.',..,,(),:\".?.\"\n", ",--',,,\",.\"\".\"..-''..,,.,,\"',..\",\".\"\",.\" .\",\",--\".?',.',\".\".,'.\".\"',\".\".\"\"';.?\"()\n", "..',,\"\",\"\",....\"\"-\"\".,..,,\",,..\"\".,.,,.- ,\"'....\",,..\"?\".\",.,..\",.\"?,\",.\"...'!\",.\n", ",-.,,.-,,,,,.,,,,.\"\".\"\",.\"\".\",.,.\"\",.,,, .\"?\".\",',.,.\",,,.\",\",,\"?,\",\",?\":\"'....?\"\n", ".,\",.\",\".\"\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\",.,,\"\" \"..-,'.',..;,.--'.\"\",,\",'.\"-,.'.\",',,,.\"\n", "\",.,..\"\",\"\".,,.\"\";\".\"\",\".\",-,\"\"\",,\"..\"\", ,\",,\"',',,.''.\"'.:.',,';.,,*.,,.',,,..*.\n", "\",.\":,,-,.\"\",\".,.--.\"\".\"!.-!,!-!-!-!,,,, ,\",\"\"?\",,;,'.,,;.,,,\",,.\",,.-.,,,,.,----\n", "\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\", -.,,,.,,,,.,,.,,,,,.\",\",.\",,,\",.\",.\"-,-,\n", "!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\"....\"\"\"\"\".\"\".\"\" ,.\",,,\".\",\",,\"?.?\",,,.\"!\".,-,,-,,.,-'.,,\n", ".\",.\"\"\"\"\"\"\"\"\"\"-,...\"\"\"\"...,.,.,-\"\"\"\"\"\"\"\" ..-,,,.,,,.\",,,\",.,.,.','.:\"?.\"\",,.\"\"?\".\n", ".\"\",\".\",.,,\"\".\"\"\"\".\"\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\" .,.,,'.\",\".,,.,,,,,,-,,,..,.,'.,,,.-..,.\n", ".\",,.'\".\"\".\"\"\"\"\",-,,.-,\",.\"-'\".',.'.,,,. '..,,,.,,.,,,',,,..-,..',,,.'.','.\",!,,\"\n", ",,,,.,,,,,,,..,-,--,,.',.,-,.,.\",\",,,.\"\" ,:\",.\".,.\",\".\",\".\",\";,-..\",,,\",..--.,,,,\n", "\"'.,\"\"'.,,\"\";.,.'..,..,,,..-,,,.,-.,.\",, ,,,,----.,.,,.\"!\";.\",,\",...,,,,.,,',..\",\n", ".,,,,,,\"\"\".\",'..-,.,,,.,.,.,,,,..,..-,., '....,?\",.\".\",,.\",\",,.,,.,.',,-,,,,-,.,,\n", ",.\"...,?,,?,.,.,'.,.,,\"\"\".\",.,,,-.,,.\",- ,,.\"'?\",,.\",,\",.\",\".-.-,,,,..,,';,',,'.,\n", ",,,..,.,,'','&',..'\",,-,,.'.,,.'.,',''\", .,;.\"!\".\"!\",.,,,,.,,'-,.,.\"...,\".\"----,!\n", ".,.,.'.',''-,.\".',...,...,,,.''.''.!'''. 
\"\"?\".,,.',,.\",,,,\".,,,.:,'.,,,.,,,,,.,.,\n" ] } ], "source": [ "line_len = 40\n", "for i in range(5, 25):\n", "    print(sherlock['punctuation'][line_len*i:line_len*(i+1)], wap['punctuation'][line_len*i:line_len*(i+1)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, now that I know it works, I'll wrap it in a function." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compare(text1, text2, offset=0, line_len=40, max_lines=30):\n", "    # The stop bound is an absolute index, so a non-zero offset reduces the number of lines printed.\n", "    for i in range(offset, min(max(len(text1), len(text2)), line_len * max_lines), line_len):\n", "        t1 = text1[i:i+line_len]\n", "        # Pad the left column so the right column stays aligned.\n", "        t1 += (' ' * (line_len - len(t1)))\n", "        print(t1, text2[i:i+line_len])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., !!\",.,,,,.,,.,,,,,.\",,.',\",.\"??\".\",?\"\"'?\n", "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" .,\".\".\"\"'..\"\",,\",,-,.\"'!,'?.\"\"?\",.\"?,.\",\n", "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- .,,.,,.,,,,,,,,.:\",'.',,,.!..!,.,!....,,\n", ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ?...'..,,.?.-,.?!!,....',...!\",.\",\",\"'..\n", ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' ?\"\".,\",,\",,,.,..?.?\"\",\".\",\",,\"?.\",..\",\",\n", "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. .,',.',..,,(),:\".?.\".\",\",--\".?',.',\".\".,\n", ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" '.\".\"',\".\".\"\"';.?\"(),\"'....\",,..\"?\".\",.,\n", ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" ..\",.\"?,\",.\"...'!\",..\"?\".\",',.,.\",,,.\",\"\n", ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" ,,\"?,\",\",?\":\"'....?\"\"..-,'.',..;,.--'.\"\"\n", "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". 
,,\",'.\"-,.'.\",',,,.\",\",,\"',',,.''.\"'.:.'\n", "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. ,,';.,,*.,,.',,,..*.,\",\"\"?\",,;,'.,,;.,,,\n", "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" \",,.\",,.-.,,,,.,-----.,,,.,,,,.,,.,,,,,.\n", ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" \",\",.\",,,\",.\",.\"-,-,,.\",,,\".\",\",,\"?.?\",,\n", "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ,.\"!\".,-,,-,,.,-'.,,..-,,,.,,,.\",,,\",.,.\n", "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ,.','.:\"?.\"\",,.\"\"?\"..,.,,'.\",\".,,.,,,,,,\n", "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, -,,,..,.,'.,,,.-..,.'..,,,.,,.,,,',,,..-\n", ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, ,..',,,.'.','.\",!,,\",:\",.\".,.\",\".\",\".\",\"\n", ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., ;,-..\",,,\",..--.,,,,,,,,----.,.,,.\"!\";.\"\n", "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,,\",...,,,,.,,',..\",'....,?\",.\".\",,.\",\",\n", ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. ,.,,.,.',,-,,,,-,.,,,,.\"'?\",,.\",,\",.\",\".\n", ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" -.-,,,,..,,';,',,'.,.,;.\"!\".\"!\",.,,,,.,,\n", ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. '-,.,.\"...,\".\"----,!\"\"?\".,,.',,.\",,,,\".,\n", ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., ,,.:,'.,,,.,,,,,.,.,.,',.\",?\".\",\",',\"--.\n", ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ...\"\",?\"\".\"\"?\"\",\",,\"!\".,,,.,,'.\"!...,,?\"\n", "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". .\",\".\".?\".\",!\",'.,.\",,\",.\"',.,\",.,,,,.,,\n", "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- .\",\".\",\".':\"!..\".'..-.\",?\",.\"'..\",,,.\",?\n", ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. \".\",,,\",\"...\",,,..,,''.',,;'.\",,\".\",';'-\n", ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. 
-,\".\",',!.,\",.\",,\",.,,.,,,.',,.;.,----,,\n" ] } ], "source": [ "compare(sherlock['punctuation'], wap['punctuation'], offset=100)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "..;,,',.'...,;,,.,.,,;,,',',,',,.,;',,,, ,,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,.,\n", ".,,,,,.;,,-,',,;'.,;,,.',;':.!,,;,.,,',' '.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\"\n", ";;',';,,'.?,',',,;,,,,,.,;,--,.,,;,;,.,, \",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",--\n", "',,,,.,:,?,:,..,!??,.!,,;,,!'.,!'.,!'.,, .\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\"\n", ",,,,,,,,,,,'!':.',:,,,,'.:,,.,,:;.,,,.', .\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".'''''''''\n", "'-,,,,,.!',,',,',,,,,-.,.,.!??.:!-!'',,. ''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''.\n", ":!,,,;,,'.,,'.!,'.,.!.,.!.,.,.,,.,:!:;., .,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\"\n", "':!,,'.,.-,',,',''.,-,,;,.,:;!,:'.,.,:,! .,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\"\n", "'!;?;;,',,.,,.,,'.';:,'.,';'',';,',.':-; .\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\"\n", ",:,.?,,.',,,-.,,;,.,,,.,,.,,.,..,..,.,,. \".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\".\n", "?,?,.:,;,.:.,,.:'.!.',';.,-.,..??.,,;.': \"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--.\n", ",.,.',',!'',,;,!',;;,.,.,.,.,'.,.,.,';,' \"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\"\n", ";,,':,,.'?,,,.,..,';,,',.',,;.,'.,,,;;'. ,!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\n", ";-;,';,-;,.,.?,.-,-,--,-,.,-.,;,,,-,,,:; \"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\"\n", ",,.,.,.;.;;.'.;,-.,!?,,,,,,,,,';,:;;,,:- \"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\"\n", ",'.,:';.-.;,,';;,;,,,,.,,,,;,,-.,':;,,;; \",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,,\n", "-,?,,:?,';.,-,:',;,,'.,,-;,,'',;;,,.,,!. 
.-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--,\n", ".!,.!.,:.,!?,;',,,.,?,,,',,.,,,?,?,,?:,' ,.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'..,\n", ",,,,,,,,,'.,,,';,,:',,':,;'',:::,,,,::-, ..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,.,\n", "',,.,,,,,,,.,:.;.?,.;.:,,,',',;'-;,,,,,, ,,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,.\n", ",,.,,;,.?,'.,,;,,.,..,!,.,:.,.',',,-'.., ,,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\"\n", ",,':,'-,;''',,,-.':,-,',,--.;':.;.'.',:, ,,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',..\n", ",,,,,,:,,'.?,.,.?',.'';,,.!,.,-:,:,.??,, .,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,.,\n", "?.;,,,:,,,,;,,.,,?,..,,;.:,;,:,?',..,';, .,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,-\n", ";;:,.:;,,.,,,,.,!.,;'.',.,:,,.?,.,.,.,-, \"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\".\n", "-,:,';',:',.,::;..,..,,.,;,,;-,-,,,,.;,. ..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'-\n", "..,-,;,-,;.,,:,,;,,:,,,;,,..;,-',!,;,.,, ,-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,.\n", "&.,!..,-;:,,,,,',..,;,:',,,.,:,.;,,,.,;, ,.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'..\n", ",,.!,,,'.,;;.-,,,,.:,.,,;,,;,,.'!,,,;!:! .,,\"\",\"\"...,.?,.,..\"\"\"\"!\"\"\"\"\"\"\"\"\"\"\"\"...\"\n", "''!,,'.!?:,,;,,.!-,-.,.',:;.,,.,,.!?.,:. -..,,,.,,,-,.,,,,.;,.,-,,.,,;,.\"\".\"\".\",,\n", "!.,.,';.?:,'.,,;;,,.'??!!??,.,,,..!,.?!. '\".\"'\"\"'\".\"''.,,.,'\"\"'.,\"\".-..,\",.,,.,,.\n", ",;.?!?:.!:.,:?',.;,,;,,;'''.??',',,,,,', .,-.,,..,..\"\",-,,--\"\".,.,',..\",\".\".\"\"\"\"\"\n", "?,,,,.:.!','..,;.,;,:,,,!,,,.,,!.,!!,:,. \"\"\",\"\"\"\"\",.\".,,.\"\"\"\",,.,,,.\"\"\"\"..?-,.,.,\n", "!!'?!!!??,?!?,;,!.!:'.?,;'.,--;.,,?.,,.? 
,,\"\"\"\"-.,.,.,.;-.-.....-.,-.,,,,.,,,.;,.\n", "',.,.:.;,,;,,,,,:.,,.,:.?,.,:,!,;-,.,.,, -\"\"\".\".-,,.-,.\"\"\"\".,.,,.\".:\"-,\",.\"'\",.\",\n", "';,,,',',','',',',',',,:.,:,:;',.,.,,;,, \".,.\"\",.\"\"\"\"\"\"\",.\"\"\"\",\"\"\".\"\".\"!\"\"\"\"\"\"\"\"\"\n", ".?,!;,.,.,-,,;.;,,,.,:;,,..,?;,,-,;,.,., \"\"\"\"\".,.,'\"\".-!!\",.,..\".,\".\".\",.\"!.:\"\"\",\n", ",',.,:,;.-,?!;';,.,.,.,,,.:,.,!,.''!?,,; .\"\"\"\"\"\".\"\"\"\"-,.,,.-,,,,.,\",.\".:\".--..,.,\n", ",,.,-,,,,,',,'.'','.,;:,.,:;'.'!,.,,.!!. ,....,,.,,.,,...,,,,-,,.\",,..,,-,.\",;-.,\n", ",!,!!',',,,,:',,,,;,,,,,,,,,,.?.,!??-,?, ...,.;,.,\",\",\"\"-,\",.\"??\"\"\".\"'\"\",\";\"..\"\"\"\n", "!!.:;,.,:,,.-,,-,,.?,,,-,,,;,?,''?,,:';; \"..-\".\"\".\"\"\"\".\"'\".\",\"\"..-\",,,.,.'.,.,,..\n", "',,,,,,.,,:,,--.,.,..,;,.:,'.;;:,;',,,:. -,.,,-,.,.\",\".\"\"\".\"\"\"\".,.,,\",-.\"\",,.\",,.\n", "-!!!!....?;,;,,,.--,-,-',,;:,,.,!!!!,:'. ,,,\"\"\".\",,,\"\"\"\",,,.,.,.,,,.,.,.,...,,.,,\n", ".,:,.,?-.,,,,.-,.,,?-.-,:,--.,.,-.,;.,;, \".,,,,.,,.,,,.',--,,,.-.,,,.',.\",,,,,\".,\n", ",.',.';,,..,!-?.,,,,,'.-,,',;':,.,,-,,', ,.\",-,,.\".\",,.','\"\",..,\"\",,,\"\"',,,--\"\",,\n", ",;,,,,,'';,.,,;;,,.,;,,''..''-,?,',;,',' .\"\",\"\",\"\"..'.,,-,\"..\",\".\",,\"\",\",\".'',,,.\n", ".:.;.!?.;,,,.,',,..'?'',,'.';,,.',,';,,, ,.\"\",\".\"..,\":\"-:,,,..,.--,.,',,,',\"\"\".,.\n", ",.'??!,?.,!,!''.?'!!,,';','?!,,?;,,.':', \",'\".\",.,,,.,,\"\",.\"\".,.\"\",,.\",;\"',.',.,;\n" ] } ], "source": [ "compare(shakespeare['punctuation'], sherlock['punctuation'], offset=100, max_lines=50)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., ?,,:--?!,.--,,.--,?--?.--,'?..'.,!..,,.'\n", "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" .,:,-..--,.?--!.?--,.'..',....--!.,',:--\n", "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- 
...,,:--'!:.,'?,.--!.':?..__.,,!.._!_!..\n", ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ..--!.'.--,.''.--,.--,,,,.'........--!.,\n", ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' !,,.,,-.,,.,,,,,,,.....--,!..?--,..--,..\n", "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. .,.'.',.'.--,.'.--',..'...--,,...'.!...-\n", ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" -,,!,...?..--',.....,'.--,.!,:--.-.',.--\n", ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" ',,?.....--!.''..,,..'..--.'.'???''.'.:,\n", ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" .,!,!!,,'.'.'!'!.,,',........--,.'.--?..\n", "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". '.?,..--?.--,?.'.'.,.,,:--'?:--??'..??--\n", "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. ,,...--?.?.--,,_,'._'.--?.??.--,,'?..'.'\n", "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" .'.?,'.'...'.''.!.'..,,:--.--?.--,..--,!\n", ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" ..,..,,.:--,?--',.:--.?,,..,:--',.'..:_'\n", "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ._.,..,.,..,,.,.':,.:...,:'.?:,,,..:_._,\n", "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" :._._...,,.'.,,,,,,.,,....,.._:._!!,!.--\n", "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, !'.,.,',.--,,...'.--',,.--,',...--.'.,?,\n", ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, .--,.--?.??.--,.--,.'..,:_,',,!,!,'!_.,,\n", ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., .?,?,,,.....',.:,.--',.,,?.,.--?.--,.,'!\n", "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,:--!--',,.,,.,...,.--',,...,!!,!,,.,..,\n", ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. ,.'?,,'...--?..--,.'.--,!..:--.--!,....,\n", ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" '.,:--_._.--',.,,,,'?,,':--,..--,,.:--_,\n", ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. 
,_._,',_,_'._,.--',,,...,:--,,'?--,.--?.\n", ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., ,?--,,.,,.'.--!,.?!,,:_--'.,..._..--,!--\n", ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ,',.,.'.--',,..--?,.,!.--,,.--,?.--,.,..\n", "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". .,.,.,,.,.,.,,,.,:.--,',,.--,,..--,,'.,,\n", "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- '.--,?.--,',.--,..,,:.','','..--?.--,?.,\n", ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. .--,.?--,,.,?--,.--',,.--,,''.''.--,..,.\n", ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. ,'?--,,,,.:--?,,'?.--,?,.,'.',.,,.--,,.,\n" ] } ], "source": [ "compare(sherlock['punctuation'], ulysses['punctuation'], offset=100)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",,,..\"\".\",,\"\"\".\",.,,.,.\"\",\"\",.,\"\"\",\".,., .,\",\"!.\"\"?\"\"!,!__,.\"\".,,,,..\"\",.__,.-,.\"\n", "'.,,,,,\",.\"\";\",,..,,,-.,,,-,,,\".\"\",\",.\"\" \",.\"\",,..\"\",.\"\"..,,,,.,__.\"\"-,..;;.\"\".;,\n", "\",,.\",..,\"\"\"\"\"\",\"\"\"\"?'\"\"!...,,.--,,,\",-- -.__.\"\",\";\";.\"\".,__?..\"\",....\"\",.\"\",.\"\",\n", ".\"\".\"\",.\"-,'\",\"...,\"\"\".\"\"\"..,..\",.\"\",'.\" ,.\"\",,,.\".,,,,--.__.,,.,.;....,;..,:\".,.\n", ".\"\"-\".\".\",\"\"\"\"\"\"\"\"\"\".\"\".\",;,\"\".''''''''' \"\"__.,\",\".\"\",,\",\",..\"\"...,,.\"\",\".;\".\".,,\n", "''',''''\".\",-,.--,.',--',,,\",.\"\".\"..-''. ,.\"',,'!..\"\",\";\".\"\",\".\",?\"\"-.\"\",,\",\".;,.\n", ".,,.,,\"',..\",\".\"\",.\"..',,\"\",\"\",....\"\"-\"\" \"\",,,.__.\"\",.,,;?\"\".'..__;,.;,,,,.\"..,\",\n", ".,..,,\",,..\"\".,.,,.-,-.,,.-,,,,,.,,,,.\"\" !\"\"?\".\",,?__.,?,,.\",.\",\",\"..\"\".,\".\"__;?.\n", ".\"\",.\"\".\",.,.\"\",.,,,.,\",.\",\".\"\".\"\",\";.\"\" ;,.\";.;,,.\",.!..,!,,.\"\",,,\".;,,,.\",!\",.\"\n", "\".\"\"\"\".\",\"\"\".\",.,,\"\"\",.,..\"\",\"\".,,.\"\";\". 
;,,.,,;,.,,__,..\"\"!\",\";__,'.\".',..,,,,..\n", "\"\",\".\",-,\"\"\",,\"..\"\",\",.\":,,-,.\"\",\".,.--. --,,;,-,...,,,,,.!;.'.\",\".,\",.\"..',.,;.,\n", "\"\".\"!.-!,!-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\" ,.;.,..,,,,...;,.;..,,--.--.,,,..-;,,.,.\n", ",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\" --,.,;.,,,,,.,.,,;;,;,,..;,,,..!..,,,..,\n", "\"\",\"\"....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\"\"\"\"\"-,...\" ,..,.,,;,..,,.\",,\",\"...\"\".,..,.\"\",\".,\"!,\n", "\"\"\"...,.,.,-\"\"\"\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ;.\"\"__,\".,.\"!!,,..\"\"?\",,:\",__;.,.\"...;.,\n", "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,.'\".\"\".\"\"\"\"\",-,, ,;,,.....,.,.'.;,.,,,,...;.';.\"!.,\",\",..\n", ".-,\",.\"-'\".',.'.,,,.,,,,.,,,,,,,..,-,--, ,.;.,!__,;!.,.!,,;,,;.,,.,,,,__--\"\"__,\",\n", ",.',.,-,.,.\",\",,,.\"\"\"'.,\"\"'.,,\"\";.,.'.., \"!',.!\"\"!,.!...'--\"...,,,..\",\",\"__;,,.!,\n", "..,,,..-,,,.,-.,.\",,.,,,,,,\"\"\".\",'..-,., ,!!,,-..\",,.,.\",\",\",-,;!--,!\"\",\",\",..\"\".\n", ",,.,.,.,,,,..,..-,.,,.\"...,?,,?,.,.,'.,. .\"\"?..__,__.?..,,..\"\"!\"\"!,,....\"\";.\"\";__\n", ",,\"\"\".\",.,,,-.,,.\",-,,,..,.,,'','&',..'\" .__,!--.--',--.',,?.\"\"--..,;.\",;;,,.;,,.\n", ",,-,,.'.,,.'.,',''\",.,.,.'.',''-,.\".',.. ,,,,,,.;'..,,..,;,,,.;,,--.,,..,.,----,,\n", ".,...,,,.''.''.!'''.',,,,''\"-,,,,,,.,,., .,.,,,,.',,.,.,.,,,,-,..,..;;,;;,,.,,,,.\n", ".,,.,-\"\"\";\"\"\",.,.,,,,.''..,\"\"\"\"\",.\",.,,- ,..--,,.,..,,..,;,,,,,,,.,,;,.,,,.'.,...\n", "\"\"\"\"'\"\"\"\"\"\"\"\",\"\"\"\"\"\"\"\"..\",\",...,,,.\"\"\"\". ,,,-,'.;.\"__,,\".-.\"__.'.\"\";.\"\"!,,.__--__\n", "..,.\"\"\"\"....-.\"\"\"\",\"\"\"\"--,,,.\"\"\"\"\",-.\"'- ------..\"\".;?.',,__?:'!,;.'\"\"!,----,,,.\"\n", ",-..,.,.\"\",,,,,\"\"\"\"\"\".,,\"-.,,,,...,,.,,. \"____,,\".\".,?--!--__.\"\"'-,,..--.\"\",'?--?\n", ",.,,.',.,,.,-,-,-.\"\",,\".-..,.,\"\",\"\"..'.. 
\".\"..\"\"--,;.\"\",\",\",.__.\"\",.,..;,.,.\"\".,\"\n" ] } ], "source": [ "compare(sherlock['punctuation'], pap['punctuation'], offset=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare more than two texts at a time" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compare_many(*texts, offset=0, line_len=100, gap=' ', max_lines=30):\n", " def padded_segment(text, start, length):\n", " segment = text[start:start+segment_len]\n", " segment += (' ' * (segment_len - len(segment)))\n", " return segment\n", " segment_len = line_len // len(texts) - len(gap)\n", " max_len = min(max(len(text) for text in texts), segment_len * max_lines)\n", " for i in range(offset, max_len, segment_len):\n", " segments = [padded_segment(text, i, segment_len) for text in texts]\n", " print(gap.join(segments))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "..-.......'.........,,,., ,.,-..:::,[#]:,:,]::***** ,.,-..::::,[#]:,:******,/\n", ",,.,.-'..,-,.,,...,-,,,,, *,,.,,.\".,\",\"?\"..\",\";\".,. :::::-:-:-:-:::::::-:-:\",\n", ",,,.,,,,.:,,.,,,.-,-(,.-, \"..\"?\".\"__,.\".\",,,.;,,.;, ,.,',----,','!?--.\",,-,.,\n", ",,,.,,,,.,,.,,..-...;,,., .\"\"?\"\".\"\"?\"\"!,,!;.!\"\"??\"\" ,..,,;.,.,,-,:\",(),,--.\"\"\n", ",,,..\"\".\",,\"\"\".\",.,,.,.\"\" .,\",\"!.\"\"?\"\"!,!__,.\"\".,,, !!\",.,,,,.,,.,,,,,.\",,.',\n", ",\"\",.,\"\"\",\".,.,'.,,,,,\",. ,..\"\",.__,.-,.\"\",.\"\",,..\" \",.\"??\".\",?\"\"'?.,\".\".\"\"'.\n", "\"\";\",,..,,,-.,,,-,,,\".\"\", \",.\"\"..,,,,.,__.\"\"-,..;;. .\"\",,\",,-,.\"'!,'?.\"\"?\",.\"\n", "\",.\"\"\",,.\",..,\"\"\"\"\"\",\"\"\"\" \"\".;,-.__.\"\",\";\";.\"\".,__? ?,.\",.,,.,,.,,,,,,,,.:\",'\n", "?'\"\"!...,,.--,,,\",--.\"\".\" ..\"\",....\"\",.\"\",.\"\",,.\"\", .',,,.!..!,.,!....,,?...'\n", "\",.\"-,'\",\"...,\"\"\".\"\"\"..,. ,,.\".,,,,--.__.,,.,.;.... 
..,,.?.-,.?!!,....',...!\"\n", ".\",.\"\",'.\".\"\"-\".\".\",\"\"\"\"\" ,;..,:\".,.\"\"__.,\",\".\"\",,\" ,.\",\",\"'..?\"\".,\",,\",,,.,.\n", "\"\"\"\"\".\"\".\",;,\"\".''''''''' ,\",..\"\"...,,.\"\",\".;\".\".,, .?.?\"\",\".\",\",,\"?.\",..\",\",\n", "''',''''\".\",-,.--,.',--', ,.\"',,'!..\"\",\";\".\"\",\".\",? .,',.',..,,(),:\".?.\".\",\",\n", ",,\",.\"\".\"..-''..,,.,,\"',. \"\"-.\"\",,\",\".;,.\"\",,,.__.\" --\".?',.',\".\".,'.\".\"',\".\"\n", ".\",\".\"\",.\"..',,\"\",\"\",.... \",.,,;?\"\".'..__;,.;,,,,.\" .\"\"';.?\"(),\"'....\",,..\"?\"\n", "\"\"-\"\".,..,,\",,..\"\".,.,,.- ..,\",!\"\"?\".\",,?__.,?,,.\", .\",.,..\",.\"?,\",.\"...'!\",.\n", ",-.,,.-,,,,,.,,,,.\"\".\"\",. .\",\",\"..\"\".,\".\"__;?.;,.\"; .\"?\".\",',.,.\",,,.\",\",,\"?,\n", "\"\".\",.,.\"\",.,,,.,\",.\",\".\" .;,,.\",.!..,!,,.\"\",,,\".;, \",\",?\":\"'....?\"\"..-,'.',.\n", "\".\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\", ,,.\",!\",.\";,,.,,;,.,,__,. .;,.--'.\"\",,\",'.\"-,.'.\",'\n", ".,,\"\"\",.,..\"\",\"\".,,.\"\";\". .\"\"!\",\";__,'.\".',..,,,,.. ,,,.\",\",,\"',',,.''.\"'.:.'\n", "\"\",\".\",-,\"\"\",,\"..\"\",\",.\": --,,;,-,...,,,,,.!;.'.\",\" ,,';.,,*.,,.',,,..*.,\",\"\"\n", ",,-,.\"\",\".,.--.\"\".\"!.-!,! .,\",.\"..',.,;.,,.;.,..,,, ?\",,;,'.,,;.,,,\",,.\",,.-.\n", "-!-!-!,,,,\"\".-\"\"\"\"\"\"\"\"\"\". ,...;,.;..,,--.--.,,,..-; ,,,,.,-----.,,,.,,,,.,,.,\n", ",\"\"\"\",!\"\"-\"\"\"\"\"\"\"\"\"\"\"\"\"\", ,,.,.--,.,;.,,,,,.,.,,;;, ,,,,.\",\",.\",,,\",.\",.\"-,-,\n", "!!\"\"-\"\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\" ;,,..;,,,..!..,,,..,,..,. ,.\",,,\".\",\",,\"?.?\",,,.\"!\"\n", "....\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\" ,,;,..,,.\",,\",\"...\"\".,.., .,-,,-,,.,-'.,,..-,,,.,,,\n", "\"\"\"\"-,...\"\"\"\"...,.,.,-\"\"\" .\"\",\".,\"!,;.\"\"__,\".,.\"!!, .\",,,\",.,.,.','.:\"?.\"\",,.\n", "\"\"\"\"\".\"\",\".\",.,,\"\".\"\"\"\".\" ,..\"\"?\",,:\",__;.,.\"...;., \"\"?\"..,.,,'.\",\".,,.,,,,,,\n", "\",\"\"\"\"\"\"\"\"\".\"\".-.\"'\".\",,. ,;,,.....,.,.'.;,.,,,,... 
-,,,..,.,'.,,,.-..,.'..,,\n", "'\".\"\".\"\"\"\"\",-,,.-,\",.\"-'\" ;.';.\"!.,\",\",..,.;.,!__,; ,.,,.,,,',,,..-,..',,,.'.\n" ] } ], "source": [ "compare_many(sherlock['punctuation'], pap['punctuation'], wap['punctuation'], line_len=80)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",,,..\"\".\",,\"\"\".\",.,,., <> .,\",\"!.\"\"?\"\"!,!__,.\"\". <> !!\",.,,,,.,,.,,,,,.\",,\n", ".\"\",\"\",.,\"\"\",\".,.,'.,, <> ,,,,..\"\",.__,.-,.\"\",.\" <> .',\",.\"??\".\",?\"\"'?.,\".\n", ",,,\",.\"\";\",,..,,,-.,,, <> \",,..\"\",.\"\"..,,,,.,__. <> \".\"\"'..\"\",,\",,-,.\"'!,'\n", "-,,,\".\"\",\",.\"\"\",,.\",.. <> \"\"-,..;;.\"\".;,-.__.\"\", <> ?.\"\"?\",.\"?,.\",.,,.,,.,\n", ",\"\"\"\"\"\",\"\"\"\"?'\"\"!...,, <> \";\";.\"\".,__?..\"\",....\" <> ,,,,,,,.:\",'.',,,.!..!\n", ".--,,,\",--.\"\".\"\",.\"-,' <> \",.\"\",.\"\",,.\"\",,,.\".,, <> ,.,!....,,?...'..,,.?.\n", "\",\"...,\"\"\".\"\"\"..,..\",. <> ,,--.__.,,.,.;....,;.. <> -,.?!!,....',...!\",.\",\n", "\"\",'.\".\"\"-\".\".\",\"\"\"\"\"\" <> ,:\".,.\"\"__.,\",\".\"\",,\", <> \",\"'..?\"\".,\",,\",,,.,..\n", "\"\"\"\".\"\".\",;,\"\".''''''' <> \",..\"\"...,,.\"\",\".;\".\". <> ?.?\"\",\".\",\",,\"?.\",..\",\n", "''''',''''\".\",-,.--,.' <> ,,,.\"',,'!..\"\",\";\".\"\", <> \",.,',.',..,,(),:\".?.\"\n", ",--',,,\",.\"\".\"..-''.., <> \".\",?\"\"-.\"\",,\",\".;,.\"\" <> .\",\",--\".?',.',\".\".,'.\n", ",.,,\"',..\",\".\"\",.\"..', <> ,,,.__.\"\",.,,;?\"\".'.._ <> \".\"',\".\".\"\"';.?\"(),\"'.\n", ",\"\",\"\",....\"\"-\"\".,..,, <> _;,.;,,,,.\"..,\",!\"\"?\". <> ...\",,..\"?\".\",.,..\",.\"\n", "\",,..\"\".,.,,.-,-.,,.-, <> \",,?__.,?,,.\",.\",\",\".. <> ?,\",.\"...'!\",..\"?\".\",'\n", ",,,,.,,,,.\"\".\"\",.\"\".\", <> \"\".,\".\"__;?.;,.\";.;,,. <> ,.,.\",,,.\",\",,\"?,\",\",?\n", ".,.\"\",.,,,.,\",.\",\".\"\". 
<> \",.!..,!,,.\"\",,,\".;,,, <> \":\"'....?\"\"..-,'.',..;\n", "\"\",\";.\"\"\".\"\"\"\".\",\"\"\".\" <> .\",!\",.\";,,.,,;,.,,__, <> ,.--'.\"\",,\",'.\"-,.'.\",\n", ",.,,\"\"\",.,..\"\",\"\".,,.\" <> ..\"\"!\",\";__,'.\".',..,, <> ',,,.\",\",,\"',',,.''.\"'\n", "\";\".\"\",\".\",-,\"\"\",,\"..\" <> ,,..--,,;,-,...,,,,,.! <> .:.',,';.,,*.,,.',,,..\n", "\",\",.\":,,-,.\"\",\".,.--. <> ;.'.\",\".,\",.\"..',.,;., <> *.,\",\"\"?\",,;,'.,,;.,,,\n", "\"\".\"!.-!,!-!-!-!,,,,\"\" <> ,.;.,..,,,,...;,.;..,, <> \",,.\",,.-.,,,,.,-----.\n", ".-\"\"\"\"\"\"\"\"\"\".,\"\"\"\",!\"\" <> --.--.,,,..-;,,.,.--,. <> ,,,.,,,,.,,.,,,,,.\",\",\n", "-\"\"\"\"\"\"\"\"\"\"\"\"\"\",!!\"\"-\" <> ,;.,,,,,.,.,,;;,;,,..; <> .\",,,\",.\",.\"-,-,,.\",,,\n", "\"\"\"..\"\"\"\"\"\".\"\"\"\",\"\"... <> ,,,..!..,,,..,,..,.,,; <> \".\",\",,\"?.?\",,,.\"!\".,-\n", ".\"\"\"\"\".\"\".\"\".\",.\"\"\"\"\"\" <> ,..,,.\",,\",\"...\"\".,.., <> ,,-,,.,-'.,,..-,,,.,,,\n", "\"\"\"\"-,...\"\"\"\"...,.,.,- <> .\"\",\".,\"!,;.\"\"__,\".,.\" <> .\",,,\",.,.,.','.:\"?.\"\"\n" ] } ], "source": [ "compare_many(sherlock['punctuation'], pap['punctuation'], wap['punctuation'], \n", " gap=' <> ', offset=100, line_len=80)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Making images\n", "The text versions are fine, but let's turn the punctuation into images, with a coloured square for each punctuation character." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start with just trying to get something out" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Periods and question marks and exclamation marks are red. \n", "# Commas and quotation marks are green. \n", "# Semicolons and colons are blue. 
\n", "colours = {'.': (255, 0, 0), '?': (255, 0, 0), '!': (255, 0, 0),\n", "           ',': (0, 255, 0), '\"': (0, 255, 0), \"'\": (0, 255, 0),\n", "           ':': (0, 0, 255), ';': (0, 0, 255),\n", "           'unknown': (128, 128, 128)}\n", "max_x = 1000\n", "max_y = 400\n", "block_size = 4\n", "text = sherlock['punctuation']\n", "img = Image.new('RGBA', (max_x, max_y))\n", "draw = ImageDraw.Draw(img)\n", "x = 0\n", "y = 0\n", "# Draw one block per punctuation mark, left to right, wrapping at the right edge.\n", "for p in text:\n", "    if p in colours:\n", "        this_colour = colours[p]\n", "    else:\n", "        this_colour = colours['unknown']\n", "    draw.rectangle((x, y, x+block_size, y+block_size), fill=this_colour)\n", "    x += block_size\n", "    if x >= max_x:\n", "        x = 0\n", "        y += block_size\n", "img.save('test.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The image: \n", "![Punctuation of the Sherlock Holmes text drawn as coloured blocks](test.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rearrange the colours to match the \"heatmaps\" in [the original](https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4#.qwj8e1n8m), and wrap the whole thing in a function." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Periods, question marks, and exclamation marks are red.\n", "# Commas, quotation marks, and apostrophes are blue.\n", "# Semicolons and colons are green.
\n", "def make_image(text, block_size=4, width=1000, colours=None):\n", "    default_colours = {'.': (255, 0, 0), '?': (255, 0, 0), '!': (255, 0, 0),\n", "                       ',': (0, 0, 255), '\"': (0, 0, 255), \"'\": (0, 0, 255),\n", "                       ':': (0, 255, 0), ';': (0, 255, 0),\n", "                       'unknown': (128, 128, 128)}\n", "    if not colours:\n", "        colours = {}\n", "    # Caller-supplied colours override or extend the defaults.\n", "    use_colours = default_colours.copy()\n", "    use_colours.update(colours)\n", "    # Each row holds ceil(width / block_size) blocks, and every row of blocks\n", "    # needs block_size pixels of height.\n", "    blocks_per_row = ceil(width / block_size)\n", "    height = ceil(len(text) / blocks_per_row) * block_size\n", "    img = Image.new('RGBA', (width, height))\n", "    draw = ImageDraw.Draw(img)\n", "    x = 0\n", "    y = 0\n", "    for p in text:\n", "        if p in use_colours:\n", "            this_colour = use_colours[p]\n", "        else:\n", "            this_colour = use_colours['unknown']\n", "        draw.rectangle((x, y, x+block_size, y+block_size), fill=this_colour)\n", "        x += block_size\n", "        if x >= width:\n", "            x = 0\n", "            y += block_size\n", "    return img" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [], "source": [ "i = make_image(sherlock['punctuation'])\n", "i.save('sherlock.png')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "i = make_image(wap['punctuation'], block_size=6, colours={'-': (255,255,255)})\n", "i.save('wap.png')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [], "source": [ "i = make_image(wap['punctuation'], colours={'-': (255,255,255), '(': (255, 165, 0), ')': (255, 165, 0)})\n", "i.save('wap.png')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "i = make_image(shakespeare['punctuation'])\n", "i.save('shakespeare.png')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": true }, "outputs": [], "source": [ "i = make_image(ulysses['punctuation'], colours={'-': (255,255,255), '(': (255, 165, 0), ')': (255, 165, 0)})\n", "i.save('ulysses.png')" ] }, { "cell_type": "code",
"execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "i = make_image(pap['punctuation'])\n", "i.save('pap.png')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Sherlock: \n", "![alt text](sherlock.png)\n", "\n", "War and Peace:\n", "![alt text](wap.png)\n", "\n", "Shakespeare:\n", "![alt text](shakespeare.png)\n", "\n", "Ulysses:\n", "![alt text](ulysses.png)\n", "\n", "Pride and Prejudice:\n", "![alt text](pap.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3+" } }, "nbformat": 4, "nbformat_minor": 0 }