1 {
2 "metadata": {
3 "name": "",
4 "signature": "sha256:a77c882790ae13478cd1764723d80cf70ed56674f967c3165f22ed1397ae0ad6"
5 },
6 "nbformat": 3,
7 "nbformat_minor": 0,
8 "worksheets": [
9 {
10 "cells": [
11 {
12 "cell_type": "heading",
13 "level": 1,
14 "metadata": {},
15 "source": [
16 "Section 22.d. Data mining II: Silhouette Coefficients"
17 ]
18 },
19 {
20 "cell_type": "markdown",
21 "metadata": {},
22 "source": [
23 "In this notebook, we will investigate how silhouette coefficients are used to select an appropriate value $k$ for $k$-means clustering."
24 ]
25 },
26 {
27 "cell_type": "code",
28 "collapsed": false,
29 "input": [
30 "import numpy as np\n",
31 "import matplotlib.pyplot as plt\n",
32 "\n",
33 "import pandas as pd\n",
34 "\n",
35 "%pylab inline\n",
36 "\n",
37 "np.set_printoptions(precision=2, linewidth=100)"
38 ],
39 "language": "python",
40 "metadata": {},
41 "outputs": [
42 {
43 "output_type": "stream",
44 "stream": "stdout",
45 "text": [
46 "Populating the interactive namespace from numpy and matplotlib\n"
47 ]
48 },
49 {
50 "output_type": "stream",
51 "stream": "stderr",
52 "text": [
53 "/Users/agw96/virtualenv_environments/tm351/lib/python3.3/site-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.\n",
54 " .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))\n"
55 ]
56 }
57 ],
58 "prompt_number": 1
59 },
60 {
61 "cell_type": "heading",
62 "level": 2,
63 "metadata": {},
64 "source": [
65 "Silhouette Coefficient"
66 ]
67 },
68 {
69 "cell_type": "markdown",
70 "metadata": {},
71 "source": [
72 "The following function definition implements the silhouette coefficient as defined in section 22.6 of the module notes. That is:\n",
73 " \n",
74 "$\\hspace{5em}s(i)=\\frac{b(i)-a(i)}{\\max\\{a(i), b(i)\\}}$\n",
75 "\n",
76 "where $a(i)$ is the average distance from data instance $i$ to the other data instances in its own cluster, and $b(i)$ is the smallest average distance from data instance $i$ to the data instances of any other cluster (that is, the average distance to the nearest neighbouring cluster).\n",
77 "\n",
78 "The function is called with two arguments: `silhouette(X, cIDX)`. The input data instances, `X`, are a numpy array, and `cIDX` holds the cluster label assigned to each data instance.\n"
79 ]
80 },
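To make the definition concrete, the following minimal sketch works through $a(i)$, $b(i)$ and $s(i)$ for a single data instance. It is not part of the notebook: the toy `points` and `labels` are hypothetical, and it assumes that $b(i)$ is the smallest per-cluster average distance, which is how the `silhouette` function in this notebook computes it.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical toy data: two clusters of two points each, all on a line
points = [(0, 0), (1, 0), (4, 0), (5, 0)]
labels = [0, 0, 1, 1]
i = 0  # the data instance we want to score

# a(i): average distance to the other members of i's own cluster
own = [dist(points[i], q) for j, q in enumerate(points)
       if labels[j] == labels[i] and j != i]
a_i = sum(own) / len(own)

# b(i): average distance to each other cluster, then take the smallest
per_cluster = {}
for c in set(labels) - {labels[i]}:
    d = [dist(points[i], q) for j, q in enumerate(points) if labels[j] == c]
    per_cluster[c] = sum(d) / len(d)
b_i = min(per_cluster.values())

# s(i) = (b(i) - a(i)) / max(a(i), b(i))
s_i = (b_i - a_i) / max(a_i, b_i)
print(a_i, b_i, round(s_i, 3))
```

Here the instance lies 1 unit from its cluster mate and on average 4.5 units from the other cluster, so its coefficient is close to 1.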
81 {
82 "cell_type": "code",
83 "collapsed": false,
84 "input": [
85 "from scipy.spatial.distance import pdist, squareform\n",
86 "from matplotlib import cm\n",
87 " \n",
88 "def silhouette(X, cIDX):\n",
89 "    \"\"\"\n",
90 "    Computes the silhouette score for each instance of a clustered dataset,\n",
91 "    which is defined as:\n",
92 "        s(i) = (b(i)-a(i)) / max{a(i),b(i)}\n",
93 "    with:\n",
94 "        -1 <= s(i) <= 1\n",
95 "\n",
96 "    Args:\n",
97 "        X    : An M-by-N array of M observations in N dimensions\n",
98 "        cIDX : array of len M containing cluster indices (starting from zero)\n",
99 "\n",
100 "    Returns:\n",
101 "        s : silhouette value of each observation\n",
102 "    \"\"\"\n",
103 "\n",
104 "    N = X.shape[0]  # number of instances\n",
105 "    K = len(np.unique(cIDX))  # number of clusters\n",
106 "\n",
107 "    # compute pairwise distance matrix\n",
108 "    D = squareform(pdist(X))\n",
109 "\n",
110 "    # indices belonging to each cluster\n",
111 "    kIndices = [np.flatnonzero(cIDX==k) for k in range(K)]\n",
112 "\n",
113 "    # compute a,b,s for each instance\n",
114 "    a = np.zeros(N)\n",
115 "    b = np.zeros(N)\n",
116 "    for i in range(N):\n",
117 "        # instances in same cluster other than instance itself\n",
118 "        a[i] = np.mean( [D[i][ind] for ind in kIndices[cIDX[i]] if ind!=i] )\n",
119 "        # instances in other clusters, one cluster at a time\n",
120 "        b[i] = np.min( [np.mean(D[i][ind]) for k,ind in enumerate(kIndices) if cIDX[i]!=k] )\n",
121 "    s = (b-a)/np.maximum(a,b)\n",
122 "\n",
123 "    # plot\n",
124 "    order = np.lexsort((-s,cIDX))\n",
125 "    indices = [np.flatnonzero(cIDX[order]==k) for k in range(K)]\n",
126 "    ytick = [(np.max(ind)+np.min(ind))/2 for ind in indices]\n",
127 "    ytickLabels = [\"%d\" % x for x in range(K)]\n",
128 "    cmap = cm.jet( np.linspace(0,1,K) ).tolist()\n",
129 "    clr = [cmap[i] for i in cIDX[order]]\n",
130 "\n",
131 "    fig = plt.figure()\n",
132 "    ax = fig.add_subplot(111)\n",
133 "    ax.barh(range(X.shape[0]), s[order], height=1.0, edgecolor='none', color=clr)\n",
134 "    ax.set_ylim(ax.get_ylim()[::-1])\n",
135 "    plt.yticks(ytick, ytickLabels)\n",
136 "    plt.xlabel('Silhouette Coefficient')\n",
137 "    plt.ylabel('Cluster')\n",
138 "\n",
139 "    return s"
140 ],
141 "language": "python",
142 "metadata": {},
143 "outputs": [],
144 "prompt_number": 2
145 },
146 {
147 "cell_type": "markdown",
148 "metadata": {},
149 "source": [
150 "For example, we calculate the silhouette coefficient of the k-means clustering result with $k = 2$ for the example data given in Figure 22.3, using the `silhouette` function. The function plots the silhouette coefficient of every data instance, grouped by cluster and sorted in descending order within each cluster, and returns the silhouette coefficient of each data instance. A higher silhouette coefficient means that a data instance is better matched to its own cluster.\n"
151 ]
152 },
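As an illustration of why a higher coefficient indicates a better-clustered instance, the sketch below scores the same data instance under two labelings: one sensible, and one where the instance is deliberately placed in the wrong cluster. It is a pure-Python sketch, not the notebook's `silhouette` function. The helper name `silhouette_value`, the toy data, and the choice to return 0 for a singleton cluster are all assumptions made for this example.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def silhouette_value(i, points, labels):
    """Return s(i) for one instance (hypothetical helper for illustration)."""
    p = points[i]
    own = [dist(p, q) for j, q in enumerate(points)
           if labels[j] == labels[i] and j != i]
    if not own:
        # Assumption: an instance alone in its cluster is scored 0
        return 0.0
    a = sum(own) / len(own)
    other_means = []
    for c in set(labels) - {labels[i]}:
        members = [q for j, q in enumerate(points) if labels[j] == c]
        other_means.append(sum(dist(p, q) for q in members) / len(members))
    b = min(other_means)  # nearest other cluster, by average distance
    return (b - a) / max(a, b)

# Hypothetical toy data: the third point sits next to the fourth
points = [(0, 0), (1, 0), (4, 0), (5, 0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 0, 0, 1]  # third point assigned to the far-away cluster

s_good = silhouette_value(2, points, good_labels)
s_bad = silhouette_value(2, points, bad_labels)
print(round(s_good, 3), round(s_bad, 3))
```

Under the sensible labeling the third point gets a positive coefficient. Under the deliberately poor labeling its coefficient is negative, because it is on average closer to the other cluster than to its own.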
153 {
154 "cell_type": "markdown",
155 "metadata": {},
156 "source": [
157 "First, recreate the data from Figure 22.3 of the course notes:"
158 ]
159 },
160 {
161 "cell_type": "code",
162 "collapsed": false,
163 "input": [
164 "\n",
165 "# Data\n",
166 "data = pd.DataFrame({'Attribute 1':[4, 2, 1, 1, 4, 5, 8, 9, 6, 8],\n",
167 " 'Attribute 2':[8, 7, 5, 6, 5, 2, 3, 2, 1, 4]},\n",
168 " index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])\n",
169 "data"
170 ],
171 "language": "python",
172 "metadata": {},
173 "outputs": [
174 {
175 "html": [
176 "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
177 "<table border=\"1\" class=\"dataframe\">\n",
178 " <thead>\n",
179 " <tr style=\"text-align: right;\">\n",
180 " <th></th>\n",
181 " <th>Attribute 1</th>\n",
182 " <th>Attribute 2</th>\n",
183 " </tr>\n",
184 " </thead>\n",
185 " <tbody>\n",
186 " <tr>\n",
187 " <th>A</th>\n",
188 " <td> 4</td>\n",
189 " <td> 8</td>\n",
190 " </tr>\n",
191 " <tr>\n",
192 " <th>B</th>\n",
193 " <td> 2</td>\n",
194 " <td> 7</td>\n",
195 " </tr>\n",
196 " <tr>\n",
197 " <th>C</th>\n",
198 " <td> 1</td>\n",
199 " <td> 5</td>\n",
200 " </tr>\n",
201 " <tr>\n",
202 " <th>D</th>\n",
203 " <td> 1</td>\n",
204 " <td> 6</td>\n",
205 " </tr>\n",
206 " <tr>\n",
207 " <th>E</th>\n",
208 " <td> 4</td>\n",
209 " <td> 5</td>\n",
210 " </tr>\n",
211 " <tr>\n",
212 " <th>F</th>\n",
213 " <td> 5</td>\n",
214 " <td> 2</td>\n",
215 " </tr>\n",
216 " <tr>\n",
217 " <th>G</th>\n",
218 " <td> 8</td>\n",
219 " <td> 3</td>\n",
220 " </tr>\n",
221 " <tr>\n",
222 " <th>H</th>\n",
223 " <td> 9</td>\n",
224 " <td> 2</td>\n",
225 " </tr>\n",
226 " <tr>\n",
227 " <th>I</th>\n",
228 " <td> 6</td>\n",
229 " <td> 1</td>\n",
230 " </tr>\n",
231 " <tr>\n",
232 " <th>J</th>\n",
233 " <td> 8</td>\n",
234 " <td> 4</td>\n",
235 " </tr>\n",
236 " </tbody>\n",
237 "</table>\n",
238 "</div>"
239 ],
240 "metadata": {},
241 "output_type": "pyout",
242 "prompt_number": 3,
243 "text": [
244 " Attribute 1 Attribute 2\n",
245 "A 4 8\n",
246 "B 2 7\n",
247 "C 1 5\n",
248 "D 1 6\n",
249 "E 4 5\n",
250 "F 5 2\n",
251 "G 8 3\n",
252 "H 9 2\n",
253 "I 6 1\n",
254 "J 8 4"
255 ]
256 }
257 ],
258 "prompt_number": 3
259 },
260 {
261 "cell_type": "markdown",
262 "metadata": {},
263 "source": [
264 "Next, run the k-means algorithm on the data with $k=2$. This will show how well the clustering works for two clusters:"
265 ]
266 },
267 {
268 "cell_type": "code",
269 "collapsed": false,
270 "input": [
271 "import sklearn.cluster as sc\n",
272 "\n",
273 "# Run k-means algorithm on the data (k=2)\n",
274 "kmeans = sc.KMeans(n_clusters=2)\n",
275 "assigned_clusters = kmeans.fit(data).labels_\n",
276 "\n",
277 "silhouette(data, assigned_clusters)"
278 ],
279 "language": "python",
280 "metadata": {},
281 "outputs": [
282 {
283 "metadata": {},
284 "output_type": "pyout",
285 "prompt_number": 4,
286 "text": [
287 "array([ 0.51, 0.69, 0.62, 0.69, 0.32, 0.41, 0.68, 0.66, 0.58, 0.58])"
288 ]
289 },
290 {
291 "metadata": {},
292 "output_type": "display_data",
293 "png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEKCAYAAAARnO4WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFZBJREFUeJzt3XtwVPX9//HXwoaKgBEr7a9DkOUSbkl2E4KlUCgXpwSw\nODi0XDpDwSLTkUFKq06n41RoazsyhWlVpko7FKkFbAVqBLnNlCyiaEEEIgYEpomNoaCE0BTCPe/f\nHw77JZDNnlxONuHzfMw4s2c5ez6vPS4vj2fPfk7AzEwAgFtem2QHAAA0DwofABxB4QOAIyh8AHAE\nhQ8AjqDwAcARwWQOHgj8P0knkxkBAFqdSCSi/fv31/t1gWRehx8IBCQtTNbwTaBA0qhkh2gE8idX\na87fmrNLLTm/2YKE6wQCATWkujmlAwCOoPABwBEUfqOEkh2gkULJDtBIoWQHaKRQsgM0QijZARop\nlOwASUHhN0qPZAdoJPInV2vO35qzS60/f8NQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4A\nOILCBwBHUPgA4AgKHwAcQeEDgCMofABwBIUPAI6g8AHAERQ+ADiCwgcARwSTHQAAXODl5uR+4wgf\nABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgKHwAcQeEDgCMofABwBIUPAI6g8AHA\nERQ+ADiCwgcAR1D4AOAICh8AHOFr4W/ZskX9+vVTenq6Fi1a5OdQAIAEfCv8q1evau7cudqyZYuK\nioq0Zs0aHTp0yK/hAAAJ+Fb4u3fvVu/evRUKhZSSkqKpU6cqPz/fr+EAAAn4VvhlZWXq1q1bbDkt\nLU1lZWV+DQcASCDo14YDgYDHNQuuexyS1KPpwwC45ZktSHYE30SjUUWj0UZvx7fC79q1q0pLS2PL\npaWlSktLq2XNUX5FAIBbwsiRIzVy5MjY8s9//vMGbce3UzqDBg3S0aNHVVJSokuXLumvf/2rHnjg\nAb+GAwAk4NsRfjAY1NKlS5WXl6erV69q1qxZ6t+/v1/DAQASCJiZJW3wQEDSwmQND+AWciufw79R\nIBBQQ6qbX9oCgCMofABwBIUPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcASFDwCO\noPABwBEUPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgK\nHwAcQeEDgCMofABwBIUPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcASFDwCOoPAB\nwBEUPgA4gsIHAEdQ+ADgCAofABwRTHYAAJLZgmRHgAM4wgcAR1D4AOAICh8AHEHhA4AjKHwAcESd\nhX/16lX99re/ba4sAAAf1Vn4bdu21erVq5srCwDARwmvwx82bJjmzp2rKVOmqEOHDrHnBw4c6Gsw\nAEDTCpiZ1bXCyJEjFQgEbnq+oKCg8YMHApIWNno7QGvHD69QH4FAQAmqu1YJj/Cj0WhD8gAAWpiE\nV+mcOHFCs2bN0tixYyVJRUVFWr58ue/BAABNK2Hhz5w5U2PGjNHx48clSenp6Vy5AwCtUMLCP3Xq\nlKZMmaK2bdtKklJSUhQMMucaALQ2CQu/Y8eOKi8vjy2/++67Sk1N9TUUAKDpJTxUX7JkiSZMmKB/\n/etfGjp0qD777DOtXbu2ObIBAJpQwsLPyMjQjh079NFHH8nM1LdvX1VXVzdHNgBAE0p4Smfo0KFK\nSUlRZmamsrKy1K5dOw0dOrQ5sgEAmlDcI/z//Oc/On78uKqqqvT+++/LzBQIBFRZWamqqqrmzAgA\naAJxC3/btm166aWXVFZWpsceeyz2fKdOnfTrX/+6WcIBAJpOwqkV1q1bp0mTJvkzOFMrAJKYWgH1\n09CpFRKewy8tLVVlZaXMTLNmzdLAgQO1devWBo
UEACRPwiP8cDiswsJCbd26VS+++KJ++ctfavr0\n6dq3b1/jBw8EOL4HAA8WXFfVvh3hX9voG2+8oenTpyszM7PegwAAki9h4efm5mrMmDHatGmT8vLy\nVFlZqTZtuDMiALQ2CU/pVFdXa9++ferVq5fuvPNOlZeXq6ysTOFwuPGDc0oHADxpilM6CX9pu3Pn\nTgUCARUWFtZ74wCAliNh4f/mN7+J3fHqwoUL2r17t3Jzc7V9+3bfwwEAmk7Cwt+4cWON5dLSUv3w\nhz/0LRAAwB/1/vY1LS1Nhw4d8iMLAMBHCY/wH3300djj6upq7d+/X7m5ub6GAgA0vYSFf325B4NB\nTZs2TcOGDfM1FACg6SW8LNPXwbksEwA88fWyzKysrLgv4jJNAGh94hb++vXrdfLkSaWlpdV4vrS0\nVF/5yld8DwYAaFpxr9KZP3++UlNTFQqFavyTmpqqH/3oR82ZEQDQBOIW/smTJ2s9rRMOh1VcXOxr\nKABA04tb+GfOnIn7ogsXLvgSBgDgn7iFP2jQIP3hD3+46fk//vGPXIcPAK1Q3MsyT5w4oQcffFDt\n2rWLFfzevXt18eJF/f3vf2+SL265LBMAvGmKyzLrvA7fzFRQUKCDBw8qEAgoIyNDo0ePblja2gan\n8AHAE98L328UPgB40yy3OAQA3BoofABwBIUPAI5IOFsmAKD5LfDh61WO8AHAERQ+ADiCwgcAR1D4\nAOAICh8AHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8A\njvC18L///e/ry1/+srKysvwcBgDgga+F/9BDD2nLli1+DgEA8MjXwh8+fLg6d+7s5xAAAI84hw8A\njqDwAcARwWQHKLjucUhSjyTlAIDrLTBLdoSYaDSqaDTa6O0EzPx9VyUlJZowYYI++OCDmwcPBLTQ\nz8EBoIFaUuHfKBAIqCHV7espnWnTpmno0KE6cuSIunXrphUrVvg5HACgDr6e0lmzZo2fmwcA1ANf\n2gKAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgKHwAcQeEDgCMofABwBIUPAI6g8AHAERQ+\nADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcISvtzgE4IaWfMNv/B+O8AHAERQ+ADiCwgcAR1D4AOAI\nCh8AHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDw\nAcARFD4AOILCBwBHUPgA4AgKHwAcEUx2AMBvC8ySHQFoETjCBwBHUPgA4AgKHwAcQeEDgCMofABw\nBIUPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4gsIHAEdQ\n+ADgCAq/EYqTHaCRyJ9c0Wg02REarDVnl1p//oai8BuhJNkBGqkk2QEaqSTZARqpNZdOa84utf78\nDUXhA4AjKHwAcETAzCxZg2dnZ+vAgQPJGh4AWqVIJKL9+/fX+3VJLXwAQPPhlA4AOILCBwBHNEvh\nb9myRf369VN6eroWLVpU6zrz5s1Tenq6IpGI9u3b1xyxPEuU//DhwxoyZIhuu+02LVmyJAkJ65Yo\n/6pVqxSJRBQOh/X1r39dhYWFSUhZu0TZ8/PzFYlElJOTo9zcXG3fvj0JKePz8tmXpD179igYDGr9\n+vXNmC6xRPmj0ahSU1OVk5OjnJwcPf3000lIGZ+X/R+NRpWTk6PMzEyNHDmyeQMmkCj/4sWLY/s+\nKytLwWBQZ86cib9B89mVK1esV69eVlxcbJcuXbJIJGJFRUU11nnjjTds3LhxZmb27rvv2uDBg/2O\n5ZmX/J9++qnt2bPHnnzySVu8eHGSktbOS/5du3bZmTNnzMxs8+bNLWb/e8l+9uzZ2OPCwkLr1atX\nc8eMy0v+a+
uNGjXK7r//flu7dm0SktbOS/6CggKbMGFCkhLWzUv+iooKGzBggJWWlpqZ2WeffZaM\nqLXy+vm5ZsOGDXbffffVuU3fj/B3796t3r17KxQKKSUlRVOnTlV+fn6NdV5//XXNmDFDkjR48GCd\nOXNGJ0+e9DuaJ17yd+nSRYMGDVJKSkqSUsbnJf+QIUOUmpoq6fP9/8knnyQj6k28ZO/QoUPs8dmz\nZ3X33Xc3d8y4vOSXpOeff17f/va31aVLlySkjM9rfmuh1314yb969WpNmjRJaWlpktQqPz/XrF69\nWtOmTatzm74XfllZmbp16xZbTktLU1lZWcJ1WkrpeMnfktU3//LlyzV+/PjmiJaQ1+yvvfaa+vfv\nr3Hjxum5555rzoh18vrZz8/P1yOPPCJJCgQCzZqxLl7yBwIB7dq1S5FIROPHj1dRUVFzx4zLS/6j\nR4/q9OnTGjVqlAYNGqSXX365uWPGVZ+/u1VVVdq6dasmTZpU5zaDTZqwFl4/wDceJbSUD35LydFQ\n9clfUFCgP/3pT3r77bd9TOSd1+wTJ07UxIkTtXPnTk2fPl0fffSRz8m88ZJ//vz5euaZZxQIBGRm\nLepo2Uv+gQMHqrS0VLfffrs2b96siRMn6siRI82QLjEv+S9fvqz3339f//jHP1RVVaUhQ4boa1/7\nmtLT05shYd3q83d3w4YNGjZsmO6888461/O98Lt27arS0tLYcmlpaex/n+Kt88knn6hr165+R/PE\nS/6WzGv+wsJCzZ49W1u2bFHnzp2bM2Jc9d33w4cP15UrV1ReXq4vfvGLzRGxTl7y7927V1OnTpUk\nnTp1Sps3b1ZKSooeeOCBZs1aGy/5O3XqFHs8btw4zZkzR6dPn9Zdd93VbDnj8ZK/W7duuvvuu9W+\nfXu1b99e3/jGN3TgwIEWUfj1+fy/8sorCU/nSPL/S9vLly9bz549rbi42C5evJjwS9t33nmnxXxp\naOYt/zULFixocV/aesn/8ccfW69eveydd95JUsraecl+7Ngxq66uNjOzvXv3Ws+ePZMRtVb1+eyY\nmc2cOdPWrVvXjAnr5iX/iRMnYvv/n//8p3Xv3j0JSWvnJf+hQ4fsvvvusytXrti5c+csMzPTPvzw\nwyQlrsnr5+fMmTN21113WVVVVcJt+n6EHwwGtXTpUuXl5enq1auaNWuW+vfvr2XLlkmSfvCDH2j8\n+PHatGmTevfurQ4dOmjFihV+x/LMS/4TJ07o3nvvVWVlpdq0aaNnn31WRUVF6tixY5LTe8v/i1/8\nQhUVFbHzyCkpKdq9e3cyY0vyln3dunX685//rJSUFHXs2FGvvPJKklP/Hy/5WzIv+deuXasXXnhB\nwWBQt99+e6vb//369dPYsWMVDofVpk0bzZ49WwMGDEhy8s95/fy89tprysvLU/v27RNuk6kVAMAR\n/NIWABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCR5P41a9+pczMzNhUxXv27JEkzZ49W4cPH5Yk\nhUIhnT59WiUlJcrKyvI1z8cff6w1a9bElg8cOKDNmzfXeztHjhzR+PHj1adPH+Xm5mrKlCn69NNP\nG5TpiSeeUGZmpn7yk5/o1KlTGjx4sHJzc/XWW2/p/vvvV2VlZdzXLlu2rMHzvNy4L+Cwpv99GFyz\na9cuGzJkiF26dMnMzMrLy+348eM3rRcKhay8vNyKi4stMzPT10wFBQX2rW99K7a8YsUKmzt3br22\ncf78eUtPT7eNGzfGnotGo3bw4MEGZUpNTY39KnXNmjX28MMPN2g79XXjvoC7KHw02vr16+POiT5i\nxAjbu3evmdUs/P79+9vs2bMtIyPDxowZY+fPnzczs3379tngwYMtHA7bgw8+aBUVFbHtvPfee2b2\n+ZzloVDIzD6fM/zxxx+3e++918LhsC1btszMzAYPHmypqamWnZ1tixYtsnvu
uce6dOli2dnZ9re/\n/c3Onj1rDz30kH31q1+1nJwcy8/Pvyn78uXLbcaMGbW+r/Pnz9vMmTMtKyvLcnJyrKCgoM48EyZM\nsLZt296UJycnx86fP2/du3e38vJyMzNbuXKlhcNhi0Qi9r3vfc/Mak7bcezYMRs7dqzl5uba8OHD\n7fDhw2ZmNmPGDJs3b54NHTrUevbsGZtb//p98bvf/c7Lv1Lcoih8NNrZs2ctOzvb+vTpY3PmzLEd\nO3bE/mzkyJG1Fn4wGLQDBw6YmdnkyZPtL3/5i5mZZWVl2ZtvvmlmZk899ZTNnz//pu1cX/jLli2z\np59+2szMLly4YIMGDbLi4mKLRqM1jmpfeukle/TRR2PLP/3pT2NjVlRUWJ8+fezcuXM13tePf/xj\ne+6552p9z4sXL7ZZs2aZmdnhw4ftnnvusQsXLtSap6SkxMzMOnbsGDfPtX1z8OBB69OnT6z8r/0H\nb+HChbZkyRIzMxs9erQdPXrUzD6/YdDo0aPN7PPCnzx5spmZFRUVWe/evc3MbtoXcJfvc+ng1teh\nQwft3btXO3fuVEFBgaZMmaJnnnkmdlOb2vTo0UPhcFiSlJubq5KSElVWVuq///2vhg8fLkmaMWOG\nvvOd79Q59rZt2/TBBx9o7dq1kqTKykodO3ZMwWDNj7bdMPXwtm3btGHDBi1evFiSdPHiRZWWlqpv\n3743va42b7/9tubNmydJ6tu3r7p3764jR47Umufo0aPq3r17nXmuPbd9+3ZNnjw5NtvkjdPdnjt3\nTrt27aqxXy5duiTp8+l0J06cKEnq379/7CZC8d4D3EPho0m0adNGI0aM0IgRI5SVlaWVK1fWWfhf\n+MIXYo/btm2rCxcu3LTO9UUVDAZVXV0tSTetu3TpUn3zm9+s8Vw0Gq2xXNvc4uvXr69zGtyMjAzt\n2LEj7p/HK9La8two3lzn1+bFj6e6ulqdO3eOe9/ndu3aJcwHd3GVDhrtyJEjOnr0aGx53759CoVC\n9dqGmemOO+5Q586d9dZbb0mSXn755dhNpUOhkN577z1Jih09S1JeXp5+//vf68qVK7EsVVVVuuOO\nO/S///0vtl6nTp1qLOfl5dW4O1ZtBfrd735Xu3bt0qZNm2LPvfnmm/rwww81fPhwrVq1Kjbmv//9\nb/Xr1y9untre740CgYBGjx6tV199VadPn5YkVVRU1HhNp06d1KNHj9g+MLOEN52/8b3DXRQ+Gu3s\n2bOaOXOmMjIyFIlEdPjwYS1cuLDO19x4hHtteeXKlXriiScUiURUWFiop556SpL0+OOP64UXXtDA\ngQNVXl4eW//hhx/WgAEDNHDgQGVlZemRRx7R1atXFQ6H1bZtW2VnZ+vZZ5/VqFGjVFRUpJycHL36\n6qv62c9+psuXLyscDiszM1MLFiy4KeNtt92mjRs36vnnn1efPn2UkZGhF198UV/60pc0Z84cVVdX\nKxwOa+rUqVq5cqVSUlLi5rnxPQcCgZuWJWnAgAF68sknNWLECGVnZ+uxxx67aZ1Vq1Zp+fLlys7O\nVmZmpl5//fVa9+u1x5FIpMa+gLuYHhkAHMERPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8A\njqDwAcAR/x9bFW6trbXoGQAAAABJRU5ErkJggg==\n",
294 "text": [
295 "<matplotlib.figure.Figure at 0x105d2cb90>"
296 ]
297 }
298 ],
299 "prompt_number": 4
300 },
301 {
302 "cell_type": "heading",
303 "level": 3,
304 "metadata": {},
305 "source": [
306 "Exercise 22.5"
307 ]
308 },
309 {
310 "cell_type": "markdown",
311 "metadata": {},
312 "source": [
313 "Measure the cluster validity using the Silhouette coefficient for the k-means algorithm using $k = 2$ and $k=3$ for the English Premier League data. \n",
314 "\n",
315 "The data is reproduced here:"
316 ]
317 },
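One common way to compare candidate values of $k$ is to average the silhouette coefficients over all data instances and prefer the $k$ with the higher mean. The sketch below shows that comparison only. The helper name `best_k_by_mean_silhouette` is an assumption for illustration, and the score lists are hypothetical placeholders rather than results for the Premier League data. In practice you would pass in the arrays returned by `silhouette` for each $k$.

```python
from statistics import mean

def best_k_by_mean_silhouette(scores_by_k):
    """Return (k, mean score) for the k with the highest mean s(i)."""
    means = {k: mean(scores) for k, scores in scores_by_k.items()}
    k_best = max(means, key=means.get)
    return k_best, means[k_best]

# Hypothetical per-instance silhouette values, NOT taken from the notebook's data
example_scores = {
    2: [0.6, 0.7, 0.5, 0.6],
    3: [0.4, 0.8, 0.1, 0.3],
}

chosen_k, chosen_mean = best_k_by_mean_silhouette(example_scores)
print(chosen_k, round(chosen_mean, 2))
```

The mean hides how individual instances behave, so it is worth reading it alongside the per-cluster bar plot. Because k-means starts from random centres, the scores can also differ between runs.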
318 {
319 "cell_type": "code",
320 "collapsed": false,
321 "input": [
322 "premierLeague=pd.DataFrame([(68, 41), (39, 61), (32, 74), (71, 27),\n",
323 " (33, 48), (61, 39), (40, 85), (38, 53), \n",
324 " (101, 50), (102, 37), (64, 43), (43, 59), \n",
325 " (28, 62), (54, 46), (45, 52), (41, 60), \n",
326 " (54, 54), (55, 51), (43, 59), (40, 51)],\n",
327 " index=['Arsenal', 'Aston Villa', 'Cardiff City', 'Chelsea', \n",
328 " 'Crystal Palace', 'Everton', 'Fulham', 'Hull City', \n",
329 " 'Liverpool', 'Manchester City', 'Manchester United',\n",
330 " 'Newcastle United', 'Norwich City', 'Southampton',\n",
331 " 'Stoke City', 'Sunderland', 'Swansea City',\n",
332 " 'Tottenham Hotspur', 'West Bromwich Albion',\n",
333 " 'West Ham United'],\n",
334 " columns=['Goals for', 'Goals against'])\n",
335 "\n",
336 "premierLeague"
337 ],
338 "language": "python",
339 "metadata": {},
340 "outputs": [
341 {
342 "html": [
343 "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
344 "<table border=\"1\" class=\"dataframe\">\n",
345 " <thead>\n",
346 " <tr style=\"text-align: right;\">\n",
347 " <th></th>\n",
348 " <th>Goals for</th>\n",
349 " <th>Goals against</th>\n",
350 " </tr>\n",
351 " </thead>\n",
352 " <tbody>\n",
353 " <tr>\n",
354 " <th>Arsenal</th>\n",
355 " <td> 68</td>\n",
356 " <td> 41</td>\n",
357 " </tr>\n",
358 " <tr>\n",
359 " <th>Aston Villa</th>\n",
360 " <td> 39</td>\n",
361 " <td> 61</td>\n",
362 " </tr>\n",
363 " <tr>\n",
364 " <th>Cardiff City</th>\n",
365 " <td> 32</td>\n",
366 " <td> 74</td>\n",
367 " </tr>\n",
368 " <tr>\n",
369 " <th>Chelsea</th>\n",
370 " <td> 71</td>\n",
371 " <td> 27</td>\n",
372 " </tr>\n",
373 " <tr>\n",
374 " <th>Crystal Palace</th>\n",
375 " <td> 33</td>\n",
376 " <td> 48</td>\n",
377 " </tr>\n",
378 " <tr>\n",
379 " <th>Everton</th>\n",
380 " <td> 61</td>\n",
381 " <td> 39</td>\n",
382 " </tr>\n",
383 " <tr>\n",
384 " <th>Fulham</th>\n",
385 " <td> 40</td>\n",
386 " <td> 85</td>\n",
387 " </tr>\n",
388 " <tr>\n",
389 " <th>Hull City</th>\n",
390 " <td> 38</td>\n",
391 " <td> 53</td>\n",
392 " </tr>\n",
393 " <tr>\n",
394 " <th>Liverpool</th>\n",
395 " <td> 101</td>\n",
396 " <td> 50</td>\n",
397 " </tr>\n",
398 " <tr>\n",
399 " <th>Manchester City</th>\n",
400 " <td> 102</td>\n",
401 " <td> 37</td>\n",
402 " </tr>\n",
403 " <tr>\n",
404 " <th>Manchester United</th>\n",
405 " <td> 64</td>\n",
406 " <td> 43</td>\n",
407 " </tr>\n",
408 " <tr>\n",
409 " <th>Newcastle United</th>\n",
410 " <td> 43</td>\n",
411 " <td> 59</td>\n",
412 " </tr>\n",
413 " <tr>\n",
414 " <th>Norwich City</th>\n",
415 " <td> 28</td>\n",
416 " <td> 62</td>\n",
417 " </tr>\n",
418 " <tr>\n",
419 " <th>Southampton</th>\n",
420 " <td> 54</td>\n",
421 " <td> 46</td>\n",
422 " </tr>\n",
423 " <tr>\n",
424 " <th>Stoke City</th>\n",
425 " <td> 45</td>\n",
426 " <td> 52</td>\n",
427 " </tr>\n",
428 " <tr>\n",
429 " <th>Sunderland</th>\n",
430 " <td> 41</td>\n",
431 " <td> 60</td>\n",
432 " </tr>\n",
433 " <tr>\n",
434 " <th>Swansea City</th>\n",
435 " <td> 54</td>\n",
436 " <td> 54</td>\n",
437 " </tr>\n",
438 " <tr>\n",
439 " <th>Tottenham Hotspur</th>\n",
440 " <td> 55</td>\n",
441 " <td> 51</td>\n",
442 " </tr>\n",
443 " <tr>\n",
444 " <th>West Bromwich Albion</th>\n",
445 " <td> 43</td>\n",
446 " <td> 59</td>\n",
447 " </tr>\n",
448 " <tr>\n",
449 " <th>West Ham United</th>\n",
450 " <td> 40</td>\n",
451 " <td> 51</td>\n",
452 " </tr>\n",
453 " </tbody>\n",
454 "</table>\n",
455 "</div>"
456 ],
457 "metadata": {},
458 "output_type": "pyout",
459 "prompt_number": 5,
460 "text": [
461 " Goals for Goals against\n",
462 "Arsenal 68 41\n",
463 "Aston Villa 39 61\n",
464 "Cardiff City 32 74\n",
465 "Chelsea 71 27\n",
466 "Crystal Palace 33 48\n",
467 "Everton 61 39\n",
468 "Fulham 40 85\n",
469 "Hull City 38 53\n",
470 "Liverpool 101 50\n",
471 "Manchester City 102 37\n",
472 "Manchester United 64 43\n",
473 "Newcastle United 43 59\n",
474 "Norwich City 28 62\n",
475 "Southampton 54 46\n",
476 "Stoke City 45 52\n",
477 "Sunderland 41 60\n",
478 "Swansea City 54 54\n",
479 "Tottenham Hotspur 55 51\n",
480 "West Bromwich Albion 43 59\n",
481 "West Ham United 40 51"
482 ]
483 }
484 ],
485 "prompt_number": 5
486 },
487 {
488 "cell_type": "heading",
489 "level": 4,
490 "metadata": {},
491 "source": [
492 "First, calculate the coefficients for $k = 2$:"
493 ]
494 },
495 {
496 "cell_type": "code",
497 "collapsed": false,
498 "input": [
499 "\n",
500 "# Run k-means algorithm on the data (k=2)\n",
501 "kmeans = sc.KMeans(n_clusters=2)\n",
502 "assigned_clusters = kmeans.fit(premierLeague).labels_\n",
503 "\n",
504 "silhouette(premierLeague, assigned_clusters)"
505 ],
506 "language": "python",
507 "metadata": {},
508 "outputs": [
509 {
510 "metadata": {},
511 "output_type": "pyout",
512 "prompt_number": 6,
513 "text": [
514 "array([ 0.41, 0.73, 0.62, 0.46, 0.62, 0.21, 0.51, 0.7 , 0.46, 0.5 , 0.26, 0.72, 0.66,\n",
515 " 0.24, 0.64, 0.73, 0.44, 0.34, 0.72, 0.68])"
516 ]
517 },
518 {
519 "metadata": {},
520 "output_type": "display_data",
521 "png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEKCAYAAAARnO4WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFjhJREFUeJzt3XtwVPX9//HXSkKtCAEs7c8SZCEkXLLZ3RAshUK5OHJT\nOji0XDpDwSJ/SJFi1Wk7ToXOWEZGmBZlWmgHhVqgrUDLRW4zJYsoWhC5CAGBltgYBCFAI4R73r8/\n+LIl5LKbkLOb8Hk+ZpjJWU7O55UD++LD7tnP8ZmZCQBwx7sr2QEAAIlB4QOAIyh8AHAEhQ8AjqDw\nAcARFD4AOCIlmYP7fP9P0olkRgCARicUCmn37t21/j5fMq/D9/l8kmYka/hayJc0INkh4kDO+tUY\ncjaGjNKdntNsev1HqYHP51NdqpuXdADAERQ+ADiCwo+LP9kB4uRPdoA4+ZMdIE7+ZAeIgz/ZAeLk\nT3aAOPmTHcBTFH5cOiQ7QJzIWb8aQ87GkFEiZ8NA4QOAIyh8AHAEhQ8AjqDwAcARFD4AOCKpSysA\nQGOW6E/Y3i5m+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgKHwAcwdIK\nAFBLjW1JhRuY4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgKHwAcQeEDgCMofABwBIUP\nAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4IiXZAQCgITGb\nnuwInmGGDwCOoPABwBEUPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcARLK0A4I53\nJy+XUBvM8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4gsIHAEew\nlg6AOwZr5tTM0xn+hg0b1KVLF2VmZmrWrFleDgUAiMGzwr927ZqmTJmiDRs2qKCgQMuWLdOBAwe8\nGg4AEINnhb99+3Z16tRJfr9fqampGjNmjFatWuXVcACAGDwr/OLiYrVr1y66nZ6eruLiYq+GAwDE\n4Nmbtj6fL84982/62i+pQ/2HAYBGLBKJKBKJ3PZxPCv8tm3bqqioKLpdVFSk9PT0KvYc4FUEALgj\n9O/fX/37949u//KXv6zTcTx7SadHjx46fPiwCgsLdfnyZf3lL3/Rd77zHa+GAwDE4NkMPyUlRfPm\nzdPgwYN17do1TZw4UV27dvVqOABADD4zs6QN7vNJmpGs4QHcYVz54JXP51NdqpulFQDAESytAKBR\ncWUW7wVm+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcARFD4AOILCBwBHUPgA4AgKHwAcwdIKAJKO\n5RISgxk+ADiCwgcAR1D4AOAICh8AHEHhA4Ajaiz8a9eu6de//nWisgAAPFRj4Tdp0kRLly5NVBYA\ngIdiXoffp08fTZkyRaNHj1azZs2ij3fv3t3TYACA+uWzGLc+79+/v3w+X6XH8/Pzb39wn0/SjNs+\nDoDGjQ9e1Y7P51OM6q5SzBl+JBKpSx4AQAMT8yqd48ePa+LEiRoyZIgkqaCgQAsXLvQ8GACgfsUs\n/AkTJmjQoEE6duyYJCkzM5MrdwCgEYpZ+KdOndLo0aPVpEkTSVJqaqpSUlhzDQAam5iFf++996qk\npCS6/f777ystLc3TUACA+hdzqj5nzhwNHz5c//73v9W7d2+dPHlSy5cvT0Q2AEA9iln42dnZ2rJl\niz7++GOZmTp37qzy8vJEZAMA1KOYL+n07t1bqampCgQCysnJUdOmTdW7d+9EZAMA1KNqZ/ifffaZ\njh07prKyMn344YcyM/l8PpWWlqqsrCyRGQEA9aDawt+0aZMWLVqk4uJiPfPMM9HHmzdvrpkzZyYk\nHACg/sRcWmHFihUaOXKkN4OztAIAsbRCbdV1aY
WYr+EXFRWptLRUZqaJEyeqe/fu2rhxY51CAgCS\nJ+ZVOq+99pqmTZumjRs36vTp0/rjH/+ocePGafDgwYnIByBOzJIRS8wZ/o3/Nrz11lsaN26cAoGA\n56EAAPUvZuHn5eVp0KBBWrdunQYPHqzS0lLddRd3RgSAxibmm7bl5eXatWuXMjIy1LJlS5WUlKi4\nuFjBYPD2B+dNW6De8JKOOzxbD3/r1q3y+Xzau3dvnYIBABqGmIX/8ssvR+94dfHiRW3fvl15eXna\nvHmz5+EAAPUnZuGvXbu2wnZRUZF+/OMfexYIAOCNWr/7mp6ergMHDniRBQDgoZgz/Keeeir6dXl5\nuXbv3q28vDxPQwEA6l/Mwr+53FNSUjR27Fj16dPH01AAgPoX87JMTwfnskyg3nBZpjvq/bLMnJyc\nGgfjMk0gNkoYDUm1hb9y5UqdOHFC6enpFR4vKirS/fff73kwAED9qvYqnWnTpiktLU1+v7/Cr7S0\nND399NOJzAgAqAfVFv6JEyeqfFknGAzq6NGjnoYCANS/agv/7Nmz1X7TxYsXPQkDAPBOtYXfo0cP\n/f73v6/0+B/+8AeuwweARqjayzKPHz+uxx57TE2bNo0W/M6dO3Xp0iX97W9/q5c3brksE3c6rtKB\nF+p6WWaN1+GbmfLz87Vv3z75fD5lZ2dr4MCBtxW0wuAUPu5wFD684Enhe43Cx52OwocXPLuJOQDg\nzkDhA4AjKHwAcETM1TKBhorXx4HaYYYPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjKHwA\ncASFDwCOoPABwBEsrQBJLFMAuIAZPgA4gsIHAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwAcAR\nFD4AOILCBwBH+MzMkja4z6cZyRocuMNMT95TGQnm8/lUl+pmhg8AjqDwAcARFD4AOILCBwBHUPgA\n4AgKHwAcQeEDgCMofABwBIUPAI6g8AHAESnJDgC4iqUQkGjM8AHAERQ+ADiCwgcAR1D4AOAICh8A\nHEHhA4AjKHwAcASFDwCOoPABwBEUPgA4gsIHkoBlFZAMnhb+D3/4Q33ta19TTk6Ol8MAAOLgaeE/\n/vjj2rBhg5dDAADi5Gnh9+3bV61atfJyCABAnHgNHwAcQeEDgCOSfgOU/Ju+9kvqkKQcANBQRSIR\nRSKR2z6Oz8zb68MKCws1fPhwffTRR5UH9/k0w8vBgQaKyzJxO3w+n+pS3Z6+pDN27Fj17t1bhw4d\nUrt27fT66697ORwAoAaevqSzbNkyLw8PAKgF3rQFAEck/U1boLHgdXc0dszwAcARFD4AOILCBwBH\nUPgA4AgKHwAcQeEDgCMofABwBIUPAI6g8AHAERQ+ADiCpRUQN5YWABo3ZvgA4AgKHwAcQeEDgCMo\nfABwBIUPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHhA4AjWEvnDsbaNwBuxgwfABxB4QOAIyh8\nAHAEhQ8AjqDwAcARFD4AOILCj8PRZAeIU2PJGYlEkh0hLo0hZ2PIKJGzoaDw41CY7ABxKkx2gDg1\nlidVY8jZGDJK5GwoKHwAcASFDwCO8Jkl7/P34XBYe/bsSdbwANAohUIh7d69u9bfl9TCBwAkDi/p\nAIAjKHwAcERCCn/Dhg3q0qWLMjMzNWvWrCr3mTp1qjIzMxUKhbRr165ExKokVs6DBw+qV69euvvu\nuzVnzpwkJLwuVs4lS5YoFAopGAzqW9/6lvbu3ZuElLFzrlq1SqFQSLm5ucrLy9PmzZsbXMYbduzY\noZSUFK1cuTKB6f4nVs5IJKK0tDTl5uYqNzdXL774YhJSxnc+I5GIcnNzFQgE1L9//8QG/D+xcs6e\nPTt6LnNycpSSkqKzZ882uJynTp3SkCFDFA6HFQgEtGjRopoPaB67evWqZWRk2NGjR+3y5csWCoWs\noKCgwj5vvf
WWDR061MzM3n//fevZs6fXseqU8/PPP7cdO3bY888/b7Nnz054xnhzbtu2zc6ePWtm\nZuvXr2+w5/PcuXPRr/fu3WsZGRkNLuON/QYMGGCPPPKILV++PKEZ482Zn59vw4cPT3i2m8WT88yZ\nM9atWzcrKioyM7OTJ082yJw3W7NmjT300EMJTHhdPDmnT59uP/vZz8zs+rls3bq1Xblypdpjej7D\n3759uzp16iS/36/U1FSNGTNGq1atqrDP6tWrNX78eElSz549dfbsWZ04ccLraLXO2aZNG/Xo0UOp\nqakJzXazeHL26tVLaWlpkq6fz08//bRB5mzWrFn063PnzukrX/lKg8soSa+++qq++93vqk2bNgnN\nd0O8OS3J11/Ek3Pp0qUaOXKk0tPTJSnhf+bx5rzZ0qVLNXbs2AQmvC6enPfff79KS0slSaWlpbrv\nvvuUklL9fa08L/zi4mK1a9cuup2enq7i4uKY+yS6pOLJ2RDUNufChQs1bNiwRESrIN6cf//739W1\na1cNHTpUr7zySiIjxv13c9WqVXryySclST6fL6EZb2SIldPn82nbtm0KhUIaNmyYCgoKEh0zrpyH\nDx/W6dOnNWDAAPXo0UNvvPFGomPW6jlUVlamjRs3auTIkYmKFxVPzkmTJmn//v36+te/rlAopLlz\n59Z4TM9vcRjvE+TW2Umin1jJeCLXRW1y5ufn67XXXtO7777rYaKqxZtzxIgRGjFihLZu3apx48bp\n448/9jjZ/8STcdq0aXrppZfk8/lkZkmZRceTs3v37ioqKtI999yj9evXa8SIETp06FAC0v1PPDmv\nXLmiDz/8UP/4xz9UVlamXr166Zvf/KYyMzMTkPC62jyH1qxZoz59+qhly5YeJqpaPDlnzpypcDis\nSCSif/3rX3r44Ye1Z88eNW/evMr9PZ/ht23bVkVFRdHtoqKi6H/nqtvn008/Vdu2bb2OVmOGqnI2\nBPHm3Lt3ryZNmqTVq1erVatWiYwoqfbns2/fvrp69apKSkoSEU9SfBl37typMWPGqEOHDlqxYoUm\nT56s1atXJyxjvDmbN2+ue+65R5I0dOhQXblyRadPn25wOdu1a6dBgwbpy1/+su677z59+9vfTviH\nL2vzd/PPf/5zUl7OkeLLuW3bNn3ve9+TJGVkZKhDhw41T5o8e8fh/1y5csU6duxoR48etUuXLsV8\n0/a9995LypuM8eS8Yfr06Ul70zaenJ988ollZGTYe++9l5SMZvHlPHLkiJWXl5uZ2c6dO61jx44N\nLuPNJkyYYCtWrEhgwuviyXn8+PHoufznP/9p7du3b5A5Dxw4YA899JBdvXrVzp8/b4FAwPbv39/g\ncpqZnT171lq3bm1lZWUJzXdDPDmffvppmzFjhpld/zvQtm1bKykpqfaYnhe+mdm6dessKyvLMjIy\nbObMmWZmNn/+fJs/f350nx/96EeWkZFhwWDQdu7cmYhYtc752WefWXp6urVo0cJatmxp7dq1sy++\n+KLB5Zw4caK1bt3awuGwhcNhe/DBBxOeMZ6cs2bNsuzsbAuHw9anTx/bvn17g8t4s2QVvlnsnPPm\nzbPs7GwLhULWq1evpP1jH8/5fPnll61bt24WCARs7ty5DTbnokWLbOzYsUnJd0OsnCdPnrRHH33U\ngsGgBQIBW7JkSY3HY2kFAHAEn7QFAEdQ+ADgCAofABxB4QOAIyh8AHAEhQ8AjqDwUS9+9atfKRAI\nRJc73rFjh6Tra30cPHhQkuT3+3X69GkVFhYqJyfH0zyffPKJli1bFt3es2eP1q9fX+vjHDp0SMOG\nDVNWVpby8vI0evRoff7553XK9NxzzykQCOinP/2pTp06pZ49eyovL0/vvPOOHnnkkegiWFVZsGBB\nndedufVcwGEef24ADti2bZv16tXLLl++bGZmJSUlduzYsUr7+f1+KykpsaNH
j1ogEPA0U35+vj36\n6KPR7ddff92mTJlSq2NcuHDBMjMzbe3atdHHIpGI7du3r06Z0tLSop+GXbZsmT3xxBN1Ok5t3Xou\n4C4KH7dt5cqV1a7F3q9fv+gnp28u/K5du9qkSZMsOzvbBg0aZBcuXDAzs127dlnPnj0tGAzaY489\nZmfOnIke54MPPjCz658u9Pv9ZnZ9zfBnn33WHnzwQQsGg7ZgwQIzM+vZs6elpaVZOBy2WbNm2QMP\nPGBt2rSxcDhsf/3rX+3cuXP2+OOP2ze+8Q3Lzc21VatWVcq+cOFCGz9+fJU/14ULF2zChAmWk5Nj\nubm5lp+fX2Oe4cOHW5MmTSrlyc3NtQsXLlj79u2jH4lfvHixBYNBC4VC9oMf/MDMKi7nceTIERsy\nZIjl5eVZ37597eDBg2ZmNn78eJs6dar17t3bOnbsGF27/+Zz8Zvf/CaeP1LcoSh83LZz585ZOBy2\nrKwsmzx5sm3ZsiX6e/3796+y8FNSUmzPnj1mZjZq1Cj705/+ZGZmOTk59vbbb5uZ2QsvvGDTpk2r\ndJybC3/BggX24osvmpnZxYsXrUePHnb06FGLRCIVZrWLFi2yp556Krr985//PDrmmTNnLCsry86f\nP1/h5/rJT35ir7zySpU/8+zZs23ixIlmZnbw4EF74IEH7OLFi1XmKSwsNDOze++9t9o8N87Nvn37\nLCsrK1r+N/7BmzFjhs2ZM8fMzAYOHGiHDx82s+s3DBo4cKCZXS/8UaNGmZlZQUGBderUycys0rmA\nuzxfHhl3vmbNmmnnzp3aunWr8vPzNXr0aL300kvRm9pUpUOHDgoGg5KkvLw8FRYWqrS0VP/973/V\nt29fSdL48eOjKwFWZ9OmTfroo4+0fPlySddvAnHkyJFKN4GwW5Y23rRpk9asWaPZs2dLki5duqSi\noiJ17ty50vdV5d1339XUqVMlSZ07d1b79u116NChKvMcPnxY7du3rzHPjcc2b96sUaNGqXXr1pJU\naVne8+fPV1ghUZIuX74s6fpyuiNGjJAkde3aNXoToep+BriHwke9uOuuu9SvXz/169dPOTk5Wrx4\ncY2F/6UvfSn6dZMmTXTx4sVK+9xcVCkpKSovL5ekSvvOmzdPDz/8cIXHIpFIhe2q1hZfuXJljeuw\nZ2dna8uWLdX+fnVFWlWeW1W31vmNdferU15erlatWlV73+emTZvGzAd3cZUObtuhQ4d0+PDh6Pau\nXbvk9/trdQwzU4sWLdSqVSu98847kqQ33ngjepNrv9+vDz74QJKis2dJGjx4sH7729/q6tWr0Sxl\nZWVq0aKFvvjii+h+zZs3r7A9ePDgCnfYqqpAv//972vbtm1at25d9LG3335b+/fvV9++fbVkyZLo\nmP/5z3/UpUuXavNU9fPeyufzaeDAgXrzzTeja9mfOXOmwvc0b95cHTp0iJ4DM4t5k/pbf3a4i8LH\nbTt37pwmTJig7OxshUIhHTx4UDNmzKjxe26d4d7YXrx4sZ577jmFQiHt3btXL7zwgiTp2Wef1e9+\n9zt1795dJSUl0f2feOIJdevWTd27d1dOTo6efPJJXbt2TcFgUE2aNFE4HNbcuXM1YMAAFRQUKDc3\nV2+++aZ+8Ytf6MqVKwoGgwoEApo+fXqljHfffbfWrl2rV199VVlZWcrOztb8+fP11a9+VZMnT1Z5\nebmCwaDGjBmjxYsXKzU1tdo8t/7MPp+v0rYkdevWTc8//7z69euncDisZ555ptI+S5Ys0cKFCxUO\nhxUIBCrckKWqY4ZCoQrnAu5ieWQAcAQzfABwBIUPAI6g8AHAERQ+ADiCwgcAR1D4AOAICh8AHEHh\nA4Aj/j9524DP6slcNQAAAABJRU5ErkJggg==\n",
522 "text": [
523 "<matplotlib.figure.Figure at 0x106fc5910>"
524 ]
525 }
526 ],
527 "prompt_number": 6
528 },
529 {
530 "cell_type": "heading",
531 "level": 4,
532 "metadata": {},
533 "source": [
534 "Similarly, calculate the coefficients for $k = 3$:"
535 ]
536 },
537 {
538 "cell_type": "code",
539 "collapsed": false,
540 "input": [
541 "from sklearn.cluster import KMeans\n",
542 "\n",
543 "# Run k-means algorithm on the data (k=3)\n",
544 "kmeans = KMeans(n_clusters=3)\n",
545 "assigned_clusters = kmeans.fit(premierLeague).labels_\n",
546 "\n",
547 "silhouette(premierLeague, assigned_clusters)"
548 ],
549 "language": "python",
550 "metadata": {},
551 "outputs": [
552 {
553 "metadata": {},
554 "output_type": "pyout",
555 "prompt_number": 7,
556 "text": [
557 "array([ 0.63, 0.64, 0.55, 0.37, 0.44, 0.65, 0.41, 0.55, 0.69, 0.69, 0.66, 0.57, 0.59,\n",
558 " 0.45, 0.28, 0.62, 0.17, 0.36, 0.57, 0.46])"
559 ]
560 },
561 {
562 "metadata": {},
563 "output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0x10b665890>"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
}
],
"metadata": {}
}
]
}