From: Neil Smith Date: Fri, 4 Aug 2017 14:34:31 +0000 (+0100) Subject: Added walkthroughs on instructions X-Git-Url: https://git.njae.me.uk/?a=commitdiff_plain;h=1c646cdbee7c09ee25ffbd9c5179588e0dab44e3;p=ou-summer-of-code-2017.git Added walkthroughs on instructions --- diff --git a/03-door-codes/door-codes-solution.ipynb b/03-door-codes/door-codes-solution.ipynb index b1fd105..f0ea349 100644 --- a/03-door-codes/door-codes-solution.ipynb +++ b/03-door-codes/door-codes-solution.ipynb @@ -43,9 +43,29 @@ "**What is your door code?**" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Worked example of solution: Part 1\n", + "\n", + "While the overall shape of this is the same as previous days (walk along a list, updating the code as you reach each letter), there are a couple of wrinkles:\n", + "\n", + "1. Not every character in the input should be processed (and the others should be converted to lower-case letters).\n", + "2. The 'update the code' part is complex.\n", + "\n", + "\"Sanitising\" the input is, again, walking over the input, converting letters and discarding the rest. These are examples of standard approaches: `filter` is applying a predicate to every item in a sequence, returning just those that pass; `map` is applying a function to every item in a sequence, returning the sequence of results. In this case, sanitising the input is `filter`ing to keep just the letters then `map`ping over the \"convert to lowercase\" function. Python's comprehensions do this: the general form is `f(x) for x in sequence if predicate(x)`\n", + "\n", + "Updating the code involves lots of faffing around, converting between characters and numbers. Rather than retyping lots of arithmetic, I define a couple of functions to do the conversions how I want. I've deliberately given them short names, as I want the functions to almost disappear in the program, becoming little more than punctuation. 
That will keep the focus on the important part, the updating.\n", + "\n", + "The `ord(letter) - ord('a')` and `chr(number + ord('a'))` are standard idioms for converting from letters to positions in the alphabet. There's also moving the result by 1 to give one-based numbering, and the modulus operation `%` to keep the numbers in the range 0-25 before converting back to letters.\n", + "\n", + "Finally, the `string` library defines some convenient constants, which helps prevent annoying and hard-to-find typos if I wrote out the alphabet verbatim here." + ] + }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": { "collapsed": true }, @@ -56,79 +76,96 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ - "def o(letter):\n", - " return ord(letter) - ord('a') + 1\n", - "\n", - "def c(number):\n", - " return chr((number - 1) % 26 + ord('a'))" + "def sanitise(phrase):\n", + " return ''.join(l.lower() for l in phrase if l in string.ascii_letters)" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "('a', 'z', 'z', 'a')" + "'helloworld'" ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "c(1), c(0), c(26), c(27)" + "sanitise('Hello World')" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ - "def sanitise(phrase):\n", - " return ''.join(l.lower() for l in phrase if l in string.ascii_letters)" + "def o(letter):\n", + " return ord(letter) - ord('a') + 1\n", + "\n", + "def c(number):\n", + " return chr((number - 1) % 26 + ord('a'))" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "'helloworld'" + "('a', 'z', 'z', 'a')" ] }, - 
"execution_count": 7, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "sanitise('Hello World')" + "c(1), c(0), c(26), c(27)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def whash1(word):\n", + " h = list(word[:2])\n", + " for l in word[2:]:\n", + " h[0] = c(o(h[0]) + o(h[1]))\n", + " h[1] = c(o(h[1]) + o(l))\n", + " return ''.join(h)" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ + "# Extended version that generates the tables used in the question text.\n", "def whash1(word, show_steps=False):\n", " if show_steps:\n", " print('| old code | code as
numbers | passphrase
letter | number of
letter | new first
part of code |'\n", @@ -154,7 +191,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -175,7 +212,7 @@ "'vk'" ] }, - "execution_count": 8, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -186,7 +223,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "metadata": { "collapsed": true }, @@ -198,7 +235,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 24, "metadata": {}, "outputs": [ { @@ -207,7 +244,7 @@ "'mc'" ] }, - "execution_count": 10, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } @@ -265,9 +302,18 @@ "Using this new algorithm, **what is your door code?**" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Worked example of solution: Part 2\n", + "\n", + "This is almost identical to part 1, but the arithmetic is slightly different. Note the use of keyword arguments with default values, to allow the code to use different starting values." 
+ ] + }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -276,7 +322,7 @@ "(21, 231, 23, 'w')" ] }, - "execution_count": 11, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -287,7 +333,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -296,7 +342,7 @@ "(18, 9, 45, 63, 'k')" ] }, - "execution_count": 12, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -307,7 +353,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -316,7 +362,7 @@ "(9, 20, 220, 229, 'u')" ] }, - "execution_count": 13, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -327,12 +373,32 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ + "def whash2(word, h0=None, alpha=5, beta=11):\n", + " if h0 is None:\n", + " h = list('ri')\n", + " else:\n", + " h = list(h0)\n", + " for l in word:\n", + " h[0] = c(o(h[0]) + o(h[1]) * alpha)\n", + " h[1] = c(o(h[1]) + o(l) * beta)\n", + " return ''.join(h)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# Extended version that generates the tables used in the question text.\n", "def whash2(word, h0=None, alpha=5, beta=11, show_steps=False):\n", " if show_steps:\n", " print('| old code | code as
numbers | passphrase
letter | number of
letter | new first
part of code |'\n", @@ -361,7 +427,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -384,7 +450,7 @@ "'vl'" ] }, - "execution_count": 15, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -395,7 +461,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 22, "metadata": {}, "outputs": [ { @@ -404,7 +470,7 @@ "'qb'" ] }, - "execution_count": 16, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } diff --git a/04-amidakuji/amidakuji-solution-1.ipynb b/04-amidakuji/amidakuji-solution-1.ipynb index 0a9b424..9ef2187 100644 --- a/04-amidakuji/amidakuji-solution-1.ipynb +++ b/04-amidakuji/amidakuji-solution-1.ipynb @@ -45,9 +45,62 @@ "(Your answer should be one string of 26 letters, without spaces or punctuation, like `acfbed` .)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Worked solution\n", + "\n", + "> Note: the code in this notebook doesn't really follow the structure of the problem. That's because I was working on this task for a while, looking at different approaches and different challenges to pose based on the idea. I'll pick out the important parts from the code that's below.\n", + "\n", + "> Also note that terminology changes throughout the code here. The question talks about lanes and distances, but the code refers to lines and heights.\n", + "\n", + "\n", + "## Data structure\n", + "This task requires a bit of thought behind the data structures before you can start tackling the problem. The first thing to do is think about the data structure to use to represent the network, and the operations we need to perform on it.\n", + "\n", + "The operations are:\n", + "1. Create the network from a collection of links.\n", + "2. Follow a person through the labyrinth.\n", + "3. Follow lots of people together through the labyrinth (if that's different and easier).\n", + "4. 
(Looking ahead) Shuffle the heights of links for packing.\n", + "\n", + "We could consider the labyrinth as a collection of links that affect lanes, or as a collection of lanes that know about links. Given that we're presented with a set of links, and need to move the links around, let's go with representing the network as a bunch of links. \n", + "\n", + "How to represent each link? As we're doing things with links later, it's easier if I just store the labyrinth as a bunch of links, and have each link know everything about itself that I would want. Given that we could, later, have more than one link at each height, each link will need to know its own left end, right end, and height. \n", + "\n", + "Python doesn't have anything like records from other languages. I could use a `dict` to store each link, but in this case I'll use a `namedtuple` from the `collections` library. It will allow me to say things like `this_link.height` and `this_link.left`, which is easier to read (and write!) than `this_link['height']` and `this_link['left']`.\n", + "\n", + "## Reading the input\n", + "Each line of the file consists of two sequences of digits with sequences of non-digits surrounding them. I use a regular expression and the `re.split()` function to split the line, treating non-digits (the `\\D+` term) as the separators. The `read_net` procedure reads the file, splits each line into the substrings, converts the relevant parts into numbers, then builds the links. Note the use of the `enumerate` built-in function, which iterates through a sequence, returning both the item and the count each time. That gives me the heights of the links. \n", + "\n", + "## Following people through\n", + "`follow()` follows one person through the labyrinth. The `line` variable holds the line/lane this person is currently on as they move through the labyrinth.\n", + "\n", + "The procedure is structured to allow for there being several links at the same height. 
It finds the distinct heights, puts them in order, then iterates through the heights. It then finds all the links at that height and, if one of them has an end on `line`, it uses that link to do the swap. \n", + "\n", + "This is fine for one person, but it's slow to execute the whole process 26 times for 26 people, when most of the work is the same for each person going through. But I've implemented that process to work on packed networks (the part 2 problem), so let's look at that first.\n", + "\n", + "## Packing\n", + "The idea of packing is to keep track of the position of the furthest link that's on each lane. When we add a link to the packed network, we look up the lanes it joins, find the furthest link on either of them, then add the new link at one level beyond that. We then update the positions recorded for the two lanes. \n", + "\n", + "The current lane distances are held in the `line_heights` dictionary. It's a `defaultdict`, defined to return the value of `-1` for any line that hasn't been processed yet. The height for a new link is the maximum existing height for either of its ends, +1. This means that the first link is placed at height zero.\n", + "\n", + "## Following again\n", + "Once we have the idea of a packed network, I clarified the idea of a `height_group`, the set of links that are all at a particular height. The `height_groups()` function uses some library magic to split a network into a list of lists of links, with each inner list being all the links at that height, i.e. it returns a list like this:\n", + "\n", + "```\n", + "[, , … ]\n", + "```\n", + "\n", + "Once you have the height groups, you can use them to follow many items through the network at the same time. `follow_many()` takes a sequence of things in their starting order, and follows them all through the network. There must be at least as many items in the input as there are lanes in the network, and I don't check for that being true. 
The input sequence is converted to a `list`, as that can be updated in place, while Python won't allow changes inside `string`s.\n", + "\n", + "As the packing and height-group-finding ensures that there is at most one link on each lane in any particular height group, I don't need to go through the height group in any particular order. I just take each link swap the items at the ends, and update the `seq` accordingly. (Note the simultaneous assignment for swapping without a temporary variable.)" + ] + }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 1, "metadata": { "collapsed": true }, @@ -61,7 +114,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 2, "metadata": { "collapsed": true }, @@ -72,7 +125,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 3, "metadata": { "collapsed": true }, @@ -84,7 +137,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 4, "metadata": { "collapsed": true }, @@ -96,7 +149,27 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['', '2', '4', '']" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "re.split('\\D+', '(2, 4)')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, "metadata": { "collapsed": true }, @@ -104,7 +177,7 @@ "source": [ "def read_net(filename, rev=False):\n", " with open(filename) as f:\n", - " pairs = [re.split('\\D+', p.strip()) for p in f.readlines()]\n", + " pairs = [re.split('\\D+', p.strip()) for p in f]\n", " if rev:\n", " lrs = [(int(lr[1]), int(lr[2])) for lr in reversed(pairs)]\n", " else:\n", @@ -115,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 21, "metadata": {}, "outputs": [ { @@ -138,7 +211,7 @@ " Link(height=14, left=1, right=4)]" ] }, - "execution_count": 15, + "execution_count": 21, "metadata": 
{}, "output_type": "execute_result" } @@ -150,7 +223,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 22, "metadata": {}, "outputs": [ { @@ -159,7 +232,7 @@ "10135" ] }, - "execution_count": 16, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } diff --git a/05-display-board/display-board-solution.ipynb b/05-display-board/display-board-solution.ipynb index 9813099..a202fc5 100644 --- a/05-display-board/display-board-solution.ipynb +++ b/05-display-board/display-board-solution.ipynb @@ -84,6 +84,65 @@ "You're standing in front of gate 9¾. You have [the instructions](05-pixels.txt). How many pixels would be lit on the board, if it were working?" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Worked example solution: Parts 1 and 2\n", + "This is an example of building an interpreter for a virtual machine, and then executing a program on it. Thinking about the problem in this way allows me to split the problem into two parts straight away:\n", + "\n", + "1. Build a virtual machine\n", + "2. Build a parser that will take the string representation of instructions and convert it into a set of commands the machine will understand. \n", + "\n", + "The second task is difficult until we've decided what the result of that parsing should look like, so let's look at the virtual machine first.\n", + "\n", + "Neither part is particularly hard. What makes this problem more of a challenge is doing the two parts together, and making sure they mesh in the end." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Virtual machine\n", + "\n", + "### Representation\n", + "The machine is little more than the grid and the operations that act on it. 
There are only four operations, so let's implement each one as a function that takes as input a machine/grid and returns as output the updated grid. (But because of the way Python handles lists, changes are generally done in-place, so we don't _need_ to use the return value.)\n", + "\n", + "We have some choices about how to implement the grid itself. One approach would be a 2d array of booleans, or a 2d array of characters, or of numbers. Python doesn't do 2d arrays (or fixed-sized arrays at all), so arrays would be lists. \n", + "\n", + "Using a boolean for each cell has the advantage that the `toggle` operations are simple `not` operations. But, if we represent the cells as characters, we can store the grid as a list of strings. To some extent, it doesn't matter with Python, as lists and strings both offer the same interface of picking out individual parts and sections with the slice notation. \n", + "\n", + "We could also do something like a 'sparse array' representation, using a `dict` to store the grid. The keys would be a `tuple` of `(row, column)` and the value would be the boolean/character/number which is the cell at that position. However, as we're taking slices of grid and acting on them, this is likely to get cumbersome. \n", + "\n", + "For no particular reason, I chose to represent the grid as a list of strings, with a cell being '\\*' if that pixel is on and '.' if it's off. This has the slight advantage that printing the grid is easier.\n", + "\n", + "### Commands\n", + "You'll notice that the first thing I did was build procedures that create a new grid of a particular size, and print the grid. These are really useful for testing and debugging, as I can easily see what's happening when the other commands go wrong! 
(The `print_grid` then got a bit more complex to allow for different output formatting.)\n", + "\n", + "The `top` and `left` commands are fairly straightforward, apart from having to be careful with boundary arithmetic for moving between 1-based and 0-based counting, and inclusive-at-both-ends sections. \n", + "\n", + "The `rotate_column` and `rotate_row` functions use the modulus `%` operation to keep the rotation amount down to a sensible level. The `rotate_row` is the simplest: it forms a new row by joining the last few elements of the row to the first few elements of the row, wrapping the last elements to the front. `rotate_column` does the same thing, but with the added complication of snipping out the column into the `col` variable then rebuilding the rows with the `new_col`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Parsing\n", + "This is much simpler than the first part. \n", + "\n", + "Each instruction is of the format ` `. That means we can apply the same parsing function to each line without having to worry about which instruction it is, and we can always return the same thing, namely the text and the two numbers. \n", + "\n", + "Python's `rsplit()` method splits a string into chunks, by default splitting on whitespace. The `maxsplit` parameter limits how many splits to make, so we don't end up splitting the multi-word command names.\n", + "\n", + "## Applying instructions\n", + "One way to do this is with a multi-way `if` statement (or a `case` statement), but that takes space to write and is fragile when it comes to perhaps adding extra commands. Instead, I'm using a _dispatch table_. \n", + "\n", + "The general idea is that the table contains the functions and procedures we could call, and we pick which one at run time. This simplifies the code in the `interpret` procedure, as all we do is look up the instruction name in the table and apply the function that comes out of it. 
This is helped by Python's easy syntax for this sort of thing: the name of a function, used without brackets, is the function itself (the brackets mean \"apply this function\"). \n", + "\n", + "The `interpret` function just goes along the list of instructions, applying them one at a time. There's some extra bits in there for generating different outputs, and the `clear_output` function is specific to Jupyter notebooks, allowing the next output to be printed in the same place as the last, effectively animating the creation of the message on the display." + ] + }, { "cell_type": "code", "execution_count": 1, @@ -236,25 +295,25 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def interpret(commands, grid=None, w=WIDTH, h=HEIGHT, \n", - " show_each_step=False, md=False, overprint=False):\n", + " show_each_step=False, md=False, overprint=False, overprint_delay=0.25):\n", " if grid is None:\n", " grid = new_grid(w, h)\n", " for c in commands:\n", " cmd, a, b = parse(c)\n", " if cmd in command_dispatch:\n", - " command_dispatch[cmd](grid, a, b)\n", + " grid = command_dispatch[cmd](grid, a, b)\n", " else:\n", " raise ValueError('Unknown command')\n", " if show_each_step:\n", " if overprint:\n", - " time.sleep(0.25)\n", + " time.sleep(overprint_delay)\n", " if md: \n", " print('`{}`'.format(c))\n", " else:\n", @@ -311,6 +370,31 @@ "sum(1 for c in ''.join(g) if c == '*')" ] }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Final\n", + "...****..............*...................*.....*..............*.................\n", + "......*..............*...................***..**..............*.................\n", + "......*.*****.*****.****.*****..****.....*.*.***.*****.*****.****.*****..****...\n", + "......*.....*.*...*..*.......*..*........*..**.*.....*.*...*..*.......*..*......\n", + 
"......*.*****.*...*..*...*****..*........*..*..*.*****.*...*..*...*****..*......\n", + "......*.*...*.*...*..*...*...*..*........*.....*.*...*.*...*..*...*...*..*......\n", + "...*..*.*..**.*...*..**..*..**..*........*.....*.*..**.*...*..**..*..**..*......\n", + "....**...**.*.*...*...**..**.*..*........*.....*..**.*.*...*...**..**.*..*......\n" + ] + } + ], + "source": [ + "g = interpret(cmds, show_each_step=True, overprint=True, overprint_delay=0.1)" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/problem-ideas.ipynb b/problem-ideas.ipynb index 137892d..6c5268b 100644 --- a/problem-ideas.ipynb +++ b/problem-ideas.ipynb @@ -265,7 +265,9 @@ "\n", "* \"How tweet it is\" from [2014 APL programming language competition](http://www.dyalog.com/uploads/files/student_competition/2014_problems_phase1.pdf) (remove interior vowels from words)\n", "\n", - "* More ghost leg: simplify a network by finding whole permuation, then splitting it down into transpositions. Look at theory of permutations for details.\n" + "* More ghost leg: simplify a network by finding whole permuation, then splitting it down into transpositions. Look at theory of permutations for details.\n", + "\n", + "* [Strata](https://en.wikipedia.org/wiki/Strata_(video_game)) game: how many puzzles have a valid solution? How many valid solutions are there to a puzzle?\n" ] }, { @@ -304,7 +306,13 @@ "- Lua\n", "- JavaScript\n", "- Java\n", - "- Dart" + "- Dart\n", + "- Kotlin\n", + "- Elixir / Erlang\n", + "- Oz / Mozart\n", + "- APL / J\n", + "- Rust\n", + "- Go" ] }, {