Done task 6 in Python
authorNeil Smith <neil.git@njae.me.uk>
Tue, 2 Oct 2018 10:22:18 +0000 (11:22 +0100)
committerNeil Smith <neil.git@njae.me.uk>
Tue, 2 Oct 2018 10:22:18 +0000 (11:22 +0100)
src/task6/task6.ipynb [new file with mode: 0644]

diff --git a/src/task6/task6.ipynb b/src/task6/task6.ipynb
new file mode 100644 (file)
index 0000000..e00885e
--- /dev/null
@@ -0,0 +1,247 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "friendships = [l.strip().split() for l in open('../../data/06-friendships.txt')]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "friendships = [l.strip().split() for l in open('../../data/06-small.txt')]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "friendships = [['Jacqueline', 'Edgar'],\n",
+    " ['Ella-Louise', 'Raj'],\n",
+    " ['Abby', 'Edgar'],\n",
+    " ['Anita', 'Harlow'],\n",
+    " ['Raj', 'Edgar'],\n",
+    " ['Bronwyn', 'Sanjay'],\n",
+    " ['Caiden', 'Anita'],\n",
+    " ['Raj', 'Finlay'],\n",
+    " ['Raj', 'Jacqueline'],\n",
+    " ['Ella-Louise', 'Abby'],\n",
+    " ['Samson', 'Sanjay'],\n",
+    " ['Samson', 'Alessandra'],\n",
+    " ['Edgar', 'Finlay'],\n",
+    " ['Finlay', 'Jacqueline'],\n",
+    " ['Bronwyn', 'Samson']]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This is a **union-find** problem. \n",
+    "\n",
+    "It's all about asking if two people are in the same connected set. Socially, looking at the example in the question, there are three \"gangs\": Edgar's gang, Harlow's gang, and Sanjay's gang. We label each gang with some representative person from that gang: the _exemplar_. For each person, we can ask which gang they're in by asking for their gang's exemplar. So long as each member of a gang uses the same exemplar, checking if people are in the same gang is easy: just check if they have the same exemplar.\n",
+    "\n",
+    "That's the **find** part. \n",
+    "\n",
+    "The **union** part is for merging gangs. We can merge two gangs by setting the exemplar of one gang to be the exemplar of the other.\n",
+    "\n",
+    "How to implement this?\n",
+    "\n",
+    "The simplest approach is to use a lookup table (a `dict`) that goes from each person to their exemplar. If someone _is_ an exemplar, their entry points to themself. For instance, we could add friendships like this, abritrarily calling one of each pair the exemplar of the new group:\n",
+    "\n",
+    "```\n",
+    "Jacqueline ----> Edgar\n",
+    "Ella-Louise ---> Raj\n",
+    "\n",
+    "```\n",
+    "\n",
+    "When we add the Abby-Edgar link, we see Edgar is already an exemplar, so we make Abby have Edgar as her exemplar. Anita-Harlow starts a new gang.\n",
+    "\n",
+    "```\n",
+    "Jacqueline --+-> Edgar\n",
+    "Abby --------+\n",
+    "\n",
+    "Ella-Louise ---> Raj\n",
+    "\n",
+    "Anita -------+-> Harlow\n",
+    "```\n",
+    "\n",
+    "That does _find_. How about _union_?\n",
+    "\n",
+    "For instance, what do we do in the above diagram when we find Edgar and Raj are friends?\n",
+    "\n",
+    "To join two groups, we could change all the exemplars in the absorbed group to point to the absorber's exemplar. But that's a lot of effort. Instead, let's just change the absorbed exemplar to point to the absorbing exemplar (i.e. Raj's exemplar changes from Raj to Edgar). When we're looking up exemplars, we change the algorithm from being a straight lookup to being a \"chain\" lookup. So to find Ella-Louise's exemplar, we look her up in the table and find Raj. We then look up Raj and find Edgar. Finally, we look up Edgar and find he's his own exemplar. \n",
+    "\n",
+    "Effectively, we have the structure like this, and to find the exemplar we keep following the links up and right.\n",
+    "\n",
+    "```\n",
+    "Ella-Louise ---> Raj -+-> Edgar\n",
+    "Jacqueline -----------+\n",
+    "Abby -----------------+\n",
+    "\n",
+    "Anita ---------> Harlow\n",
+    "```\n",
+    "\n",
+    "The entire friendship group in the example will look like this:\n",
+    "\n",
+    "```\n",
+    "Ella-Louise ---> Raj -+-> Edgar\n",
+    "Jacqueline -----------+\n",
+    "Abby -----------------+\n",
+    "Finlay ---------------+\n",
+    "\n",
+    "Anita -------+-> Harlow\n",
+    "Caiden ------+\n",
+    "\n",
+    "Bronwyn -----+-> Sanjay\n",
+    "Samson  -----+\n",
+    "Alessandra --+\n",
+    "```\n",
+    "\n",
+    "To find the number of groups, we just look in the table for the number of exemplar (people who point to themselves).\n",
+    "\n",
+    "For part 2, the sizes of groups, we extend the value in the lookup table to include the group size. An exemplar's group size is the number of people in that group. When a group is absorbed, we increase the absorbing exemplar's size by the absorbed group's size."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def exemplar_of(person, groups):\n",
+    "    if person in groups:\n",
+    "        exemplar = person\n",
+    "        while groups[exemplar]['parent'] != exemplar:\n",
+    "            exemplar = groups[exemplar]['parent']\n",
+    "        return exemplar\n",
+    "    else:\n",
+    "        return None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def new_group(name, debug=False):\n",
+    "    if debug: print('adding new', name)\n",
+    "    return {'parent': name, 'size': 1}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "debug = False\n",
+    "\n",
+    "groups = {}\n",
+    "for this, that in friendships:\n",
+    "    # if need be, create a new group of size 1 for each person mentioned.\n",
+    "    if this not in groups:\n",
+    "        groups[this] = new_group(this, debug=debug)\n",
+    "    if that not in groups:\n",
+    "        groups[that] = new_group(that, debug=debug)\n",
+    "    # now we know we have two groups, merge them if necessary.\n",
+    "    # first find the two exemplars\n",
+    "    this_exemplar = exemplar_of(this, groups)\n",
+    "    that_exemplar = exemplar_of(that, groups)\n",
+    "    if debug: print('{} -> {} ; {} -> {}'.format(this, this_exemplar, that, that_exemplar))\n",
+    "    if this_exemplar != that_exemplar:\n",
+    "        # different groups, so need to merge\n",
+    "        # absorb the smaller into the larger, so find the sizes\n",
+    "        this_size = groups[this_exemplar]['size']\n",
+    "        that_size = groups[that_exemplar]['size']\n",
+    "        if this_size > that_size:\n",
+    "            # set the absorbed exemplar to be the absorbing exemplar\n",
+    "            groups[that_exemplar]['parent'] = this_exemplar\n",
+    "            # update the absorbing group's size\n",
+    "            groups[this_exemplar]['size'] = this_size + that_size\n",
+    "            if debug: print('merging {} <- {}'.format(this_exemplar, that_exemplar))\n",
+    "        else:\n",
+    "            groups[this_exemplar]['parent'] = that_exemplar\n",
+    "            groups[that_exemplar]['size'] = this_size + that_size\n",
+    "            if debug: print('merging {} -> {}'.format(this_exemplar, that_exemplar))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "21"
+      ]
+     },
+     "execution_count": 45,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sum(1 for k, v in groups.items() if v['parent'] == k)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "147"
+      ]
+     },
+     "execution_count": 46,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "max(g['size'] for g in groups.values())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}