+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Worked example: part 1\n",
+ "Ah, the problem that didn't pan out. \n",
+ "\n",
+ "This was _meant_ to be an exercise in dynamic programming, another technique taught in M269. However, I mucked up both the problem specification in part 1 and the test data in part 2, so that other, simpler, approaches gave the correct answers. \n",
+ "\n",
+ "Part 1 was meant to be a variant on the longest common subsequence problem, but making it _whole_ subsequence checking meant there was no need for dynamic programming. Part 2 did require something like dynamic programming for the general case, but the test data didn't force examination of all the cases, so a simpler algorithm that could give false positives didn't return any with this data set. \n",
+ "\n",
+ "So, part 1. We want to see if $s_1$ is a subsequence of $s_2$. The simple way is to walk along $s_2$, character by character, keeping track of how much of $s_1$ is a subsequence up to this point. I use the pointer _i_ as the position of the next character to check in $s_1$. If, when we've finished, _i_ points beyond the end of $s_1$, $s_1$ is a subsequence of $s_2$.\n",
+ "\n",
+ "For instance, if we want to see if `abc` is a subsequence of `babaca`, we can see that:\n",
+ "* ø is a subsequence of `b` (_i_ == 0)\n",
+ "* `a` is a subsequence of `ba` (_i_ == 1)\n",
+ "* `ab` is a subsequence of `bab` (_i_ == 2)\n",
+ "* `ab` is a subsequence of `baba` (_i_ == 2)\n",
+ "* `abc` is a subsequence of `babac` (_i_ == 3)\n",
+ "* `abc` is a subsequence of `babaca` (_i_ == 3)\n",
+ "\n",
+ "That's implemented as `is_subseq_simple`. The `is_subseq_simple_shortcut` does the same, but bails out of the loop as soon as it's determined that $s_1$ is a subsequence of $s_2$; working from the end of $s_2$ means the checks are for _i_ < 0 rather than using the length of $s_1$.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The idea of the problem was dynamic programming. This comes in useful when considering the _longest common subsequence_ problem, where we have to identify how much of $s_1$ can be found as a subsequence in $s_2$.\n",
+ "\n",
+ "A recursive solution to the subsequence problem looks at the last character of each of $s_1$ and $s_2$. \n",
+ "\n",
+ "> If they're different, $s_1$ is only a subsequence of $s_2$ if $s_1$ is also a subsequence of all but the last character of $s_2$ (i.e. `abc` is a subsequence of `babaca` iff `abc` is a subsequence of `babac`). \n",
+ "\n",
+ "> If the last characters of $s_1$ and $s_2$ are the same, $s_1$ is a subsequence of $s_2$ if $s_1$ is a subsequence of all but the last character of $s_2$, or all but the last character of $s_1$ is a subsequence of all but the last character of $s_2$ (i.e. `abc` is a subsequence of `babac` if `abc` is a subsequence of `baba` or `ab` is a subsequence of `baba`).\n",
+ "\n",
+ "> There are two base cases. If $s_1$ is empty, return True. If length($s_1$) > length($s_2$), return False.\n",
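+ "\n",
+ "As a rough sketch, the rules above translate almost directly into Python. This isn't taken from the solution code; the name `is_subseq_recursive` is made up for illustration.\n",
+ "\n",
+ "```python\n",
+ "def is_subseq_recursive(s1, s2):\n",
+ "    # Return True if s1 is a subsequence of s2 (plain recursion, no caching)\n",
+ "    if not s1:\n",
+ "        return True    # base case: the empty string is always a subsequence\n",
+ "    if len(s1) > len(s2):\n",
+ "        return False   # base case: s1 is too long to fit inside s2\n",
+ "    if s1[-1] != s2[-1]:\n",
+ "        # last characters differ: drop the last character of s2\n",
+ "        return is_subseq_recursive(s1, s2[:-1])\n",
+ "    # last characters match: either ignore the match or use it\n",
+ "    return (is_subseq_recursive(s1, s2[:-1])\n",
+ "            or is_subseq_recursive(s1[:-1], s2[:-1]))\n",
+ "```\n",
+ "\n",
+ "With this sketch, `is_subseq_recursive('abc', 'babaca')` returns True.\n",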
+ "\n",
+ "The problem with this definition is that it can do a lot of repeated work (see the image below). The complexity is $O(2^{\\text{length of } s_2})$. The dynamic programming approach comes at the problem from the other angle. \n",
+ "\n",
+ "<a href=\"gt.dot.png\"><img src=\"gt.dot.png\" alt=\"Finding a subsequence\" style=\"width: 200px;\"/></a>\n",
+ "\n",
+ "The way I think about it is that the recursive solution would be very efficient if there was some magic lookup table we could consult, which would give the answers to the subproblems. We can build that lookup table, starting with very short fragments of $s_1$ and $s_2$, building up the table, and using previous results in the table to fill in each cell.\n",
+ "\n",
+ "In this problem, we build a table such that the cell at row _i_ and column _j_ contains True if the first _i_ characters of $s_1$ are a subsequence of the first _j_ characters of $s_2$. (Note a complication due to Python's zero-based indexing of strings and lists. The third character of $s_1$ is referred to in Python as `s1[2]`.)\n",
+ "\n",
+ "Going back to the recursive description, we can see that:\n",
+ "\n",
+ "> All cells in the top row (_i_ == 0) contain True.\n",
+ "\n",
+ "> All cells in the left column (_j_ == 0) contain False (apart from _i_ == _j_ == 0, which is True).\n",
+ "\n",
+ "> If `s1[i - 1]` is different from `s2[j - 1]` (i.e. the last characters of the two prefixes differ), this cell (at position (_i_, _j_) ) contains the same value as the cell at (_i_, _j_ - 1) i.e. the cell to the left.\n",
+ "\n",
+ "> If `s1[i - 1]` is the same as `s2[j - 1]`, this cell (at position (_i_, _j_) ) contains True if either the cell at (_i_, _j_ - 1) (i.e. the cell to the left) contains True, or the cell at (_i_ - 1, _j_ - 1) (i.e. the cell diagonally above and to the left) contains True.\n",
+ "\n",
+ "As each cell in the table only references the cells above and to the left, we can fill out the table row by row, going from left to right, and know we will always have the information needed to complete each cell when we get to it.\n",
+ "\n",
+ "And that's dynamic programming. As we're filling out a table, the complexity is $O({\\text{length of } s_1} \\times {\\text{length of } s_2})$, or roughly $O\\left((\\text{length of } s_2)^2 \\right)$ when the two strings are of similar length.\n",
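+ "\n",
+ "The table-filling rules above could be sketched like this. It's a minimal version under my own naming (`subseq_table` and `is_subseq_dp` are hypothetical names), not necessarily how the solution code does it.\n",
+ "\n",
+ "```python\n",
+ "def subseq_table(s1, s2):\n",
+ "    # table[i][j] is True if the first i characters of s1 are a\n",
+ "    # subsequence of the first j characters of s2\n",
+ "    table = [[False] * (len(s2) + 1) for _ in range(len(s1) + 1)]\n",
+ "    for j in range(len(s2) + 1):\n",
+ "        table[0][j] = True   # top row: the empty string always fits\n",
+ "    for i in range(1, len(s1) + 1):\n",
+ "        for j in range(1, len(s2) + 1):\n",
+ "            if s1[i - 1] == s2[j - 1]:\n",
+ "                # same character: look left, or diagonally up and left\n",
+ "                table[i][j] = table[i][j - 1] or table[i - 1][j - 1]\n",
+ "            else:\n",
+ "                # different character: copy the cell to the left\n",
+ "                table[i][j] = table[i][j - 1]\n",
+ "    return table\n",
+ "\n",
+ "\n",
+ "def is_subseq_dp(s1, s2):\n",
+ "    # the answer is in the bottom-right cell\n",
+ "    return subseq_table(s1, s2)[-1][-1]\n",
+ "```\n",
+ "\n",
+ "Row _i_ of the returned table corresponds to row _i_ of the worked tables below, with True for T and False for the dots.\n",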
+ "\n",
+ "The tables below show worked examples for seeing if `acba` is a subsequence of `aaccabab` (it is) and `cdabcaca` (it isn't).\n",
+ "\n",
+ "For the first example, we fill out the first row of the table with True (by definition).\n",
+ "\n",
+ "For the second row (with _i_ = 1), we want to see if `a` is a subsequence of different prefixes of `aaccabab`. The cell with _j_ = 0 is False, by definition. For the cell at (_i_ = 1, _j_ = 1), the characters at $s_1$[0] and $s_2$[0] are the same, so this cell is True if the cell to the left is True (it isn't) or the cell above and to the left is True (it is). So cell (1, 1) is True, and the rest of that row is filled out to True.\n",
+ "\n",
+ "For the third row (with _i_ = 2), we want to see if `ac` is a subsequence of different prefixes of `aaccabab`. The cells with _j_ = 0 and _j_ = 1 are False, by definition (a two-character string can't be a subsequence of anything shorter). For the cell at (_i_ = 2, _j_ = 2), the characters at $s_1$[1] and $s_2$[1] are different, so this cell is True if the cell to the left is True; it isn't, so this cell contains False. For the cell at (_i_ = 2, _j_ = 3), the characters at $s_1$[1] and $s_2$[2] are the same, so this cell is True if the cell to the left is True (it isn't) or the cell above and to the left is True (it is). So cell (2, 3) is True, and the rest of that row is filled out to True.\n",
+ "\n",
+ "You can continue filling out the table in the same way.\n",
+ "\n",
+ "When the table is complete, the bottom right cell contains True, which means that `acba` is a subsequence of `aaccabab`.\n",
+ "\n",
+ "| |<br />0|a<br />1|a<br />a<br />2|a<br />a<br />c<br />3|a<br />a<br />c<br />c<br />4|a<br />a<br />c<br />c<br />a<br />5|a<br />a<br />c<br />c<br />a<br />b<br />6|a<br />a<br />c<br />c<br />a<br />b<br />a<br />7|a<br />a<br />c<br />c<br />a<br />b<br />a<br />b<br />8|\n",
+ "|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n",
+ "|0<br />|T|T|T|T|T|T|T|T|T|\n",
+ "|1<br />a|.|T|T|T|T|T|T|T|T|\n",
+ "|2<br />ac|.|.|.|T|T|T|T|T|T|\n",
+ "|3<br />acb|.|.|.|.|.|.|T|T|T|\n",
+ "|4<br />acba|.|.|.|.|.|.|.|T|T|\n",
+ "\n",
+ "\n",
+ "| |<br />0|c<br />1|c<br />d<br />2|c<br />d<br />a<br />3|c<br />d<br />a<br />b<br />4|c<br />d<br />a<br />b<br />c<br />5|c<br />d<br />a<br />b<br />c<br />a<br />6|c<br />d<br />a<br />b<br />c<br />a<br />c<br />7|c<br />d<br />a<br />b<br />c<br />a<br />c<br />a<br />8|\n",
+ "|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n",
+ "|0<br />|T|T|T|T|T|T|T|T|T|\n",
+ "|1<br />a|.|.|.|T|T|T|T|T|T|\n",
+ "|2<br />ac|.|.|.|.|.|T|T|T|T|\n",
+ "|3<br />acb|.|.|.|.|.|.|.|.|.|\n",
+ "|4<br />acba|.|.|.|.|.|.|.|.|.|\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Worked example: part 2\n",
+ "This was a harder task, but the test data I provided didn't require the general case to be solved. \n",
+ "\n",
+ "The task was to determine whether $s_1$ and $s_2$ could be interleaved to form $s_3$. That's a stronger condition than just saying that both $s_1$ and $s_2$ are subsequences of $s_3$. For instance, `aba` and `aca` are both subsequences of `abbcca`, but there's no way of interleaving `aba` and `aca` to form `abbcca` (the interleaved sequence should have four `a`s, one `b`, and one `c`).\n",
+ "\n",
+ "For the test data provided, there was only one string which had both $s_1$ and $s_2$ as subsequences. I should have given other distractors in the test data, where $s_1$ and $s_2$ were both subsequences but the distractor wasn't formed from the interleaving.\n",
+ "\n",
+ "Anyway, the solution I was hoping for was another dynamic programming one. \n",
+ "\n",
+ "A recursive solution to the problem (can $s_1$ and $s_2$ be interleaved to form $s_3$?) looks like:\n",
+ "\n",
+ "> If the last characters of $s_1$ and $s_3$ are the same, $s_1$ and $s_2$ can be interleaved to form $s_3$ if `butlast`($s_1$) and $s_2$ can be interleaved to form `butlast`($s_3$).\n",
+ "\n",
+ "> If the last characters of $s_2$ and $s_3$ are the same, $s_1$ and $s_2$ can be interleaved to form $s_3$ if $s_1$ and `butlast`($s_2$) can be interleaved to form `butlast`($s_3$).\n",
+ "\n",
+ "> If the last characters of $s_1$ and $s_2$ and $s_3$ are all the same, check both of the conditions above, returning True if either is True.\n",
+ "\n",
+ "> If the last character of $s_3$ matches neither the last character of $s_1$ nor the last character of $s_2$, return False.\n",
+ "\n",
+ "> There are three base cases. If $s_1$ is empty, return $s_2$ == $s_3$. If $s_2$ is empty, return $s_1$ == $s_3$. If $s_1$ + $s_2$ is longer than $s_3$, return False.\n",
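+ "\n",
+ "Here's a rough sketch of those rules in Python. The name `is_interleaved` is made up for illustration, and the `lru_cache` decorator is my addition here, so that repeated subproblems are only solved once. The length check is also slightly stricter than the base case above: it rejects any mismatch in total length, not just the case where $s_1$ + $s_2$ is too long.\n",
+ "\n",
+ "```python\n",
+ "from functools import lru_cache\n",
+ "\n",
+ "\n",
+ "@lru_cache(maxsize=None)\n",
+ "def is_interleaved(s1, s2, s3):\n",
+ "    # Return True if s1 and s2 can be interleaved to form s3\n",
+ "    if len(s1) + len(s2) != len(s3):\n",
+ "        return False       # lengths have to add up\n",
+ "    if not s1:\n",
+ "        return s2 == s3    # base case: only s2 is left\n",
+ "    if not s2:\n",
+ "        return s1 == s3    # base case: only s1 is left\n",
+ "    # try taking the last character of s3 from s1 ...\n",
+ "    if s1[-1] == s3[-1] and is_interleaved(s1[:-1], s2, s3[:-1]):\n",
+ "        return True\n",
+ "    # ... and then from s2\n",
+ "    if s2[-1] == s3[-1] and is_interleaved(s1, s2[:-1], s3[:-1]):\n",
+ "        return True\n",
+ "    return False\n",
+ "```\n",
+ "\n",
+ "With this sketch, `is_interleaved('aba', 'aca', 'abbcca')` returns False, matching the example above, while `is_interleaved('aba', 'aca', 'aabcaa')` returns True.\n",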
+ "\n",
+ "This gives us the ammunition to build the dynamic programming table. The cell at (_i_, _j_) will contain True if the first _i_ characters of $s_1$ can be interleaved with the first _j_ characters of $s_2$ to form the first _i_ + _j_ characters of $s_3$.\n",
+ "\n",
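+ "All cells in the table initially contain False, apart from the cell at (0, 0), which is True (two empty strings interleave to form the empty string).\n",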
+ "\n",
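+ "When filling out the cell at (_i_, _j_), you look at the cell above, at (_i_ - 1, _j_), if the last characters of the current prefixes of $s_1$ and $s_3$ are the same, or at the cell to the left, at (_i_, _j_ - 1), if the last characters of the current prefixes of $s_2$ and $s_3$ are the same. The cell contains True if either lookup finds True. "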
+ ]
+ },