Linted
[menace.git] / menace-nim-project.ipynb
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "# Menace-like Nim player\n",
8 "\n",
9 "This is an implementation of a Menace-like player for one-stack Nim."
10 ]
11 },
12 {
13 "cell_type": "markdown",
14 "metadata": {},
15 "source": [
16 "## Nim\n",
17 "\n",
18 "There are many version of Nim. This version uses a single stack of tokens (initially nine). Players take turns removing one, two, or three tokens from the stack. The player who takes the last token loses."
19 ]
20 },
21 {
22 "cell_type": "markdown",
23 "metadata": {},
24 "source": [
25 "### Menace\n",
26 "[Menace](http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/) was an early program that learnt how to play a perfect game of noughts and crosses. It was developed by [Donald Michie](https://en.wikipedia.org/wiki/Donald_Michie). As computers were rare in the 1950s, Michie implemented the machine as a set of matchboxes.\n",
27 "\n",
28 "There is one matchbox for each state of the game (330 for noughts and crosses, ignoring rotations and reflections; 9 for this version of Nim). Each box contains some beads, with each type of bead representing a different move. \n",
29 "\n",
30 "When playing the game, Menace makes its move by picking a random bead from the appropriate box. It plays randomly, but biased by the number of beads in each box.\n",
31 "\n",
32 "Learning takes place by [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning) after the game. If Menace won, all the beads are replaced in their original boxes, and another bead of the same type is added to the box. In other words, one bead came out of the box, two beads of that type go back in. If Menace lost, the played beads are removed (with a minimum of one bead of each type remaining in each box)."
33 ]
34 },
35 {
36 "cell_type": "markdown",
37 "metadata": {},
38 "source": [
39 "## Implementation\n",
40 "\n",
41 "Create a new Python3 trinket. In **`main.py`**, add the lines:"
42 ]
43 },
44 {
45 "cell_type": "code",
46 "execution_count": 6,
47 "metadata": {},
48 "outputs": [],
49 "source": [
50 "from menace import *\n",
51 "import random\n",
52 "import pprint\n",
53 "\n",
54 "pp = pprint.PrettyPrinter()"
55 ]
56 },
57 {
58 "cell_type": "markdown",
59 "metadata": {},
60 "source": [
61 "Create a new file called **`menace.py`**. All the procedures we write will be in this file.\n",
62 "\n",
63 "We start with some imports and constant definitions."
64 ]
65 },
66 {
67 "cell_type": "code",
68 "execution_count": 7,
69 "metadata": {
70 "collapsed": true
71 },
72 "outputs": [],
73 "source": [
74 "import collections\n",
75 "import random"
76 ]
77 },
78 {
79 "cell_type": "code",
80 "execution_count": 8,
81 "metadata": {
82 "collapsed": true
83 },
84 "outputs": [],
85 "source": [
86 "INITIAL_GAME_SIZE = 9\n",
87 "MAX_TAKE = 3\n",
88 "INITIAL_BEAD_COUNT = 3"
89 ]
90 },
91 {
92 "cell_type": "markdown",
93 "metadata": {},
94 "source": [
95 "A **Menace** instance is just a collection of boxes, each box containing some beads. The bead types are just the number of tokens to remove on this move. \n",
96 "\n",
97 "We'll use a `collections.Counter` object to store counts of the number of beads. `collections.Counter()` creates a new empty Counter. We can assign counts to it the same as a `dict`, by just saying\n",
98 "\n",
99 "```\n",
100 "box = collections.Counter()\n",
101 "box[1] = INITIAL_BEAD_COUNT\n",
102 "```\n",
103 "\n",
104 "We'll use a `dict` of boxes, just to avoid off-by-one errors when comparing the number of tokens to the index of the box collection. \n",
105 "\n",
106 "The boxes `dict` also has a key to show that this is not a human player.\n",
107 "\n",
108 "Note that we ensure that only legal moves are represented in the Menace player.\n",
109 "\n",
110 "#### Task\n",
111 "Write a procedure that will create a new Menace player."
112 ]
113 },
114 {
115 "cell_type": "code",
116 "execution_count": 9,
117 "metadata": {
118 "collapsed": true
119 },
120 "outputs": [],
121 "source": [
122 "def new_menace():\n",
123 " boxes = {'human?': False}\n",
124 " for b in range(1, INITIAL_GAME_SIZE+1):\n",
125 " box = collections.Counter()\n",
126 " for i in range(1, MAX_TAKE+1):\n",
127 " if b >= i:\n",
128 " box[i] = INITIAL_BEAD_COUNT\n",
129 " boxes[b] = box\n",
130 " return boxes"
131 ]
132 },
133 {
134 "cell_type": "markdown",
135 "metadata": {},
136 "source": [
137 "This procedure will pick a random move by listing all the beads and picking one at random."
138 ]
139 },
140 {
141 "cell_type": "code",
142 "execution_count": 10,
143 "metadata": {
144 "collapsed": true
145 },
146 "outputs": [],
147 "source": [
148 "def menace_move(box):\n",
149 " return random.choice(list(box.elements()))"
150 ]
151 },
152 {
153 "cell_type": "markdown",
154 "metadata": {},
155 "source": [
156 "### Test\n",
157 "We test this by adding a line to the end of **`main.py`** and running the project.\n",
158 "\n",
159 "Note that there are four 'p's on this line!"
160 ]
161 },
162 {
163 "cell_type": "code",
164 "execution_count": 11,
165 "metadata": {},
166 "outputs": [
167 {
168 "name": "stdout",
169 "output_type": "stream",
170 "text": [
171 "{1: Counter({1: 3}),\n",
172 " 2: Counter({1: 3, 2: 3}),\n",
173 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
174 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
175 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
176 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
177 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
178 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
179 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
180 " 'human?': False}\n"
181 ]
182 }
183 ],
184 "source": [
185 "pp.pprint(new_menace())"
186 ]
187 },
188 {
189 "cell_type": "code",
190 "execution_count": 41,
191 "metadata": {},
192 "outputs": [
193 {
194 "name": "stdout",
195 "output_type": "stream",
196 "text": [
197 "1\n"
198 ]
199 }
200 ],
201 "source": [
202 "p1 = new_menace()\n",
203 "print(menace_move(p1[9]))"
204 ]
205 },
206 {
207 "cell_type": "markdown",
208 "metadata": {},
209 "source": [
210 "A **human** player has no state, apart from saying it's human. There's some complex logic to ensure that we only get valid moves from the human.\n",
211 "\n",
212 "Add these procedures to **`menace.py`**"
213 ]
214 },
215 {
216 "cell_type": "code",
217 "execution_count": 13,
218 "metadata": {
219 "collapsed": true
220 },
221 "outputs": [],
222 "source": [
223 "def new_human():\n",
224 " return {'human?': True}"
225 ]
226 },
227 {
228 "cell_type": "code",
229 "execution_count": 14,
230 "metadata": {
231 "collapsed": true
232 },
233 "outputs": [],
234 "source": [
235 "def human_move(game):\n",
236 " if game['history']:\n",
237 " print('Opponent took', game['history'][-1]['move'], 'pieces.')\n",
238 " else:\n",
239 " print('You play first.')\n",
240 " print('There are', game['tokens'], 'pieces left.')\n",
241 " \n",
242 " max_move = min(MAX_TAKE, game['tokens'])\n",
243 " valid_input = False\n",
244 " \n",
245 " while not valid_input:\n",
246 " user_input = input('Your move (1-{})? '.format(max_move))\n",
247 " if user_input.isnumeric():\n",
248 " move = int(user_input)\n",
249 " if move in range(1, max_move+1):\n",
250 " valid_input = True\n",
251 " else:\n",
252 " print('Number not a valid move.')\n",
253 " else:\n",
254 " print('Please enter a number.')\n",
255 " return move"
256 ]
257 },
258 {
259 "cell_type": "markdown",
260 "metadata": {},
261 "source": [
262 "A **game** is the current state of the game (number of tokens, who's the current player), the two players, and a history of moves. We use the history for the reinforcement learning phase."
263 ]
264 },
265 {
266 "cell_type": "code",
267 "execution_count": 15,
268 "metadata": {
269 "collapsed": true
270 },
271 "outputs": [],
272 "source": [
273 "def new_game(player1, player2):\n",
274 " return {'tokens': INITIAL_GAME_SIZE,\n",
275 " 'player1': player1,\n",
276 " 'player2': player2,\n",
277 " 'player1_active': True,\n",
278 " 'history': []}"
279 ]
280 },
281 {
282 "cell_type": "markdown",
283 "metadata": {},
284 "source": [
285 "The **active player** makes a move, depending on whether it's a human or a menace. After the move is taken, update the game state and history. We store plenty in the history to make the learning phase easier."
286 ]
287 },
288 {
289 "cell_type": "code",
290 "execution_count": 16,
291 "metadata": {
292 "collapsed": true
293 },
294 "outputs": [],
295 "source": [
296 "def make_move(game):\n",
297 " if game['player1_active']:\n",
298 " active = game['player1']\n",
299 " else:\n",
300 " active = game['player2']\n",
301 " if active['human?']:\n",
302 " move = human_move(game)\n",
303 " else:\n",
304 " move = menace_move(active[game['tokens']])\n",
305 " game['history'] += [{'player1?': game['player1_active'], 'move': move, 'tokens': game['tokens']}]\n",
306 " game['tokens'] -= move\n",
307 " game['player1_active'] = not game['player1_active'] "
308 ]
309 },
310 {
311 "cell_type": "markdown",
312 "metadata": {},
313 "source": [
314 "### Test what we've got\n",
315 "\n",
316 "Let's test the move-making code.\n",
317 "\n",
318 "Remove the `pp.pprint` and `print` lines from **`main.py`** . Add these to **`main.py`** then run the project. Check you get output like this."
319 ]
320 },
321 {
322 "cell_type": "code",
323 "execution_count": 17,
324 "metadata": {},
325 "outputs": [
326 {
327 "name": "stdout",
328 "output_type": "stream",
329 "text": [
330 "{'history': [],\n",
331 " 'player1': {1: Counter({1: 3}),\n",
332 " 2: Counter({1: 3, 2: 3}),\n",
333 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
334 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
335 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
336 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
337 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
338 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
339 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
340 " 'human?': False},\n",
341 " 'player1_active': True,\n",
342 " 'player2': {1: Counter({1: 3}),\n",
343 " 2: Counter({1: 3, 2: 3}),\n",
344 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
345 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
346 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
347 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
348 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
349 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
350 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
351 " 'human?': False},\n",
352 " 'tokens': 9}\n"
353 ]
354 }
355 ],
356 "source": [
357 "p1 = new_menace()\n",
358 "p2 = new_menace()\n",
359 "g = new_game(p1, p2)\n",
360 "\n",
361 "pp.pprint(g)"
362 ]
363 },
364 {
365 "cell_type": "markdown",
366 "metadata": {},
367 "source": [
368 "Note that we've got two players, an empty history, and that player 1 is active. \n",
369 "\n",
370 "Add these lines to make a move, run the project, and check you get output like this."
371 ]
372 },
373 {
374 "cell_type": "code",
375 "execution_count": 18,
376 "metadata": {},
377 "outputs": [
378 {
379 "name": "stdout",
380 "output_type": "stream",
381 "text": [
382 "{'history': [{'move': 2, 'player1?': True, 'tokens': 9}],\n",
383 " 'player1': {1: Counter({1: 3}),\n",
384 " 2: Counter({1: 3, 2: 3}),\n",
385 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
386 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
387 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
388 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
389 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
390 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
391 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
392 " 'human?': False},\n",
393 " 'player1_active': False,\n",
394 " 'player2': {1: Counter({1: 3}),\n",
395 " 2: Counter({1: 3, 2: 3}),\n",
396 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
397 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
398 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
399 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
400 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
401 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
402 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
403 " 'human?': False},\n",
404 " 'tokens': 7}\n"
405 ]
406 }
407 ],
408 "source": [
409 "make_move(g)\n",
410 "pp.pprint(g)"
411 ]
412 },
413 {
414 "cell_type": "markdown",
415 "metadata": {},
416 "source": [
417 "After the move, note that the move is now recorded in the history and player 1 is no longer active."
418 ]
419 },
420 {
421 "cell_type": "markdown",
422 "metadata": {},
423 "source": [
424 "### Playing the game\n",
425 "\n",
426 "#### Task\n",
427 "Back in **`menace.py`**, write a function that returns `True` if the game has finished (i.e. has zero tokens left) and `False` otherwise."
428 ]
429 },
430 {
431 "cell_type": "code",
432 "execution_count": 19,
433 "metadata": {
434 "collapsed": true
435 },
436 "outputs": [],
437 "source": [
438 "def game_finished(game):\n",
439 " # Your code here"
440 ]
441 },
442 {
443 "cell_type": "markdown",
444 "metadata": {},
445 "source": [
446 "A **game** is just move after move until the game is finished."
447 ]
448 },
449 {
450 "cell_type": "code",
451 "execution_count": 20,
452 "metadata": {
453 "collapsed": true
454 },
455 "outputs": [],
456 "source": [
457 "def play_game(game):\n",
458 " while not game_finished(game):\n",
459 " make_move(game)"
460 ]
461 },
462 {
463 "cell_type": "markdown",
464 "metadata": {},
465 "source": [
466 "The last player to move in a game is the loser. \n",
467 "\n",
468 "Note that \n",
469 "\n",
470 "`game['history']` is the list of game states, \n",
471 "\n",
472 "`game['history'][-1]` gives us the last game state, and \n",
473 "\n",
474 "`game['history'][-1]['player1?]` tells us if player 1 made the last move."
475 ]
476 },
477 {
478 "cell_type": "code",
479 "execution_count": 21,
480 "metadata": {
481 "collapsed": true
482 },
483 "outputs": [],
484 "source": [
485 "def winner(game):\n",
486 " if game['history'][-1]['player1?']:\n",
487 " return game['player2']\n",
488 " else:\n",
489 " return game['player1']"
490 ]
491 },
492 {
493 "cell_type": "markdown",
494 "metadata": {},
495 "source": [
496 "#### Task\n",
497 "Write a function, similar to `winner()`, that returns the losing player."
498 ]
499 },
500 {
501 "cell_type": "code",
502 "execution_count": 22,
503 "metadata": {},
504 "outputs": [],
505 "source": [
506 "def loser(game):\n",
507 " # Your code here"
508 ]
509 },
510 {
511 "cell_type": "markdown",
512 "metadata": {},
513 "source": [
514 "### Test what we've got\n",
515 "Let's test what we've got so far, by creating a game and making some moves with it. \n",
516 "\n",
517 "Change the end of **`main.py`** so that you play a whole game, rather than make just one move."
518 ]
519 },
520 {
521 "cell_type": "code",
522 "execution_count": 23,
523 "metadata": {},
524 "outputs": [
525 {
526 "name": "stdout",
527 "output_type": "stream",
528 "text": [
529 "{'history': [{'move': 3, 'player1?': True, 'tokens': 9},\n",
530 " {'move': 1, 'player1?': False, 'tokens': 6},\n",
531 " {'move': 3, 'player1?': True, 'tokens': 5},\n",
532 " {'move': 1, 'player1?': False, 'tokens': 2},\n",
533 " {'move': 1, 'player1?': True, 'tokens': 1}],\n",
534 " 'player1': {1: Counter({1: 3}),\n",
535 " 2: Counter({1: 3, 2: 3}),\n",
536 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
537 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
538 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
539 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
540 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
541 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
542 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
543 " 'human?': False},\n",
544 " 'player1_active': False,\n",
545 " 'player2': {1: Counter({1: 3}),\n",
546 " 2: Counter({1: 3, 2: 3}),\n",
547 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
548 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
549 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
550 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
551 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
552 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
553 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
554 " 'human?': False},\n",
555 " 'tokens': 0}\n"
556 ]
557 }
558 ],
559 "source": [
560 "p1 = new_menace()\n",
561 "p2 = new_menace()\n",
562 "g = new_game(p1, p2)\n",
563 "\n",
564 "play_game(g) # <-- change this line\n",
565 "pp.pprint(g)"
566 ]
567 },
568 {
569 "cell_type": "markdown",
570 "metadata": {},
571 "source": [
572 "### Play a game between two players\n",
573 "\n",
574 "Let's not make a procedure that will take two players, pick one to be the first player, then play a game between them. \n",
575 "\n",
576 "Add this procedure to **`menace.py`**"
577 ]
578 },
579 {
580 "cell_type": "code",
581 "execution_count": 24,
582 "metadata": {
583 "collapsed": true
584 },
585 "outputs": [],
586 "source": [
587 "def game_with_players(p1, p2, report_result_for=None):\n",
588 " if random.choice([True, False]):\n",
589 " g = new_game(p1, p2)\n",
590 " else:\n",
591 " g = new_game(p2, p1)\n",
592 " play_game(g)\n",
593 " if report_result_for:\n",
594 " if winner(g) == report_result_for:\n",
595 " print('You won')\n",
596 " else:\n",
597 " print('You lost')\n",
598 " return g"
599 ]
600 },
601 {
602 "cell_type": "markdown",
603 "metadata": {},
604 "source": [
605 "Now change the lines at the end of **`main.py`** to play a game against the computer!"
606 ]
607 },
608 {
609 "cell_type": "code",
610 "execution_count": 25,
611 "metadata": {},
612 "outputs": [
613 {
614 "name": "stdout",
615 "output_type": "stream",
616 "text": [
617 "Opponent took 1 pieces.\n",
618 "There are 8 pieces left.\n",
619 "Your move (1-3)? 3\n",
620 "Opponent took 2 pieces.\n",
621 "There are 3 pieces left.\n",
622 "Your move (1-3)? 2\n",
623 "You won\n"
624 ]
625 }
626 ],
627 "source": [
628 "p1 = new_menace()\n",
629 "ph = new_human()\n",
630 "\n",
631 "g = game_with_players(p1, ph, report_result_for=ph)"
632 ]
633 },
634 {
635 "cell_type": "markdown",
636 "metadata": {},
637 "source": [
638 "## Learning\n",
639 "Now we can play a game, it's time to start the learning. We need to extract the winning and losing moves from a game, and use them to update the counts of beads in the player.\n",
640 "\n",
641 "Add this function to **`menace.py`**"
642 ]
643 },
644 {
645 "cell_type": "code",
646 "execution_count": 26,
647 "metadata": {
648 "collapsed": true
649 },
650 "outputs": [],
651 "source": [
652 "def winning_moves(game):\n",
653 " moves = []\n",
654 " player1_won = game['history'][-1]['player1?']\n",
655 " for m in game['history']:\n",
656 " if m['player1?'] != player1_won:\n",
657 " moves += [m]\n",
658 " return moves"
659 ]
660 },
661 {
662 "cell_type": "markdown",
663 "metadata": {},
664 "source": [
665 "#### Task\n",
666 "Write a similar function to find the losing moves. The function you need to write is almost identical to the one above, but you need to change the condition in the `if` statement.\n",
667 "\n",
668 "Note that in Python, `!=` means \"is not equal to\" and `==` means \"is equal to\". "
669 ]
670 },
671 {
672 "cell_type": "code",
673 "execution_count": 27,
674 "metadata": {
675 "collapsed": true
676 },
677 "outputs": [],
678 "source": [
679 "def losing_moves(game):\n",
680 " # Your code here"
681 ]
682 },
683 {
684 "cell_type": "markdown",
685 "metadata": {},
686 "source": [
687 "### Test\n",
688 "Test this works. Add the following lines to the end of **`main.py`**\n"
689 ]
690 },
691 {
692 "cell_type": "code",
693 "execution_count": 42,
694 "metadata": {},
695 "outputs": [
696 {
697 "name": "stdout",
698 "output_type": "stream",
699 "text": [
700 "Opponent took 3 pieces.\n",
701 "There are 6 pieces left.\n",
702 "Your move (1-3)? 1\n",
703 "Opponent took 1 pieces.\n",
704 "There are 4 pieces left.\n",
705 "Your move (1-3)? 2\n",
706 "You won\n",
707 "\n",
708 "Winning moves\n",
709 "[{'move': 1, 'player1?': False, 'tokens': 6},\n",
710 " {'move': 2, 'player1?': False, 'tokens': 4}]\n",
711 "\n",
712 "Losing moves\n",
713 "[{'move': 3, 'player1?': True, 'tokens': 9},\n",
714 " {'move': 1, 'player1?': True, 'tokens': 5},\n",
715 " {'move': 2, 'player1?': True, 'tokens': 2}]\n"
716 ]
717 }
718 ],
719 "source": [
720 "p1 = new_menace()\n",
721 "ph = new_human()\n",
722 "\n",
723 "g = game_with_players(p1, ph, report_result_for=ph)\n",
724 "\n",
725 "print() # <-- Add this line and the ones below.\n",
726 "print('Winning moves')\n",
727 "pp.pprint(winning_moves(g))\n",
728 "print()\n",
729 "print('Losing moves')\n",
730 "pp.pprint(losing_moves(g))"
731 ]
732 },
733 {
734 "cell_type": "markdown",
735 "metadata": {},
736 "source": [
737 "**Updating the winner** is straighforward: just find the correct box for each move, and update the number of winning beads in that box. \n",
738 "\n",
739 "In **`menace.py`**, add and complete this procedure. Remember that `m['tokens']` will tell you how many tokens there were before the move, and `m['move']` will tell you how many tokens the winner took in that move. `player[n]` will contain box _n_, which you want to update."
740 ]
741 },
742 {
743 "cell_type": "code",
744 "execution_count": 30,
745 "metadata": {},
746 "outputs": [],
747 "source": [
748 "def update_winner(game):\n",
749 " player = # how do you find the winner?\n",
750 " moves = # how do you find the winning moves?\n",
751 " for m in moves:\n",
752 " # how do you update the correct boxes?"
753 ]
754 },
755 {
756 "cell_type": "markdown",
757 "metadata": {},
758 "source": [
759 "**Updating the loser** is a bit tricker. The idea is the same as updating the winner, but we don't want to take the last bead of any particular type. You'll need to write a procedure very similar to `update_winner()`, but with an additional `if` statement to check if there's more than one bead of that type."
760 ]
761 },
762 {
763 "cell_type": "code",
764 "execution_count": 31,
765 "metadata": {
766 "collapsed": true
767 },
768 "outputs": [],
769 "source": [
770 "def update_loser(game):\n",
771 " # Your code here"
772 ]
773 },
774 {
775 "cell_type": "markdown",
776 "metadata": {},
777 "source": [
778 "### Testing\n",
779 "We test the learning by playing a game and updating the players. Change the lines at the end of **`main.py`**."
780 ]
781 },
782 {
783 "cell_type": "code",
784 "execution_count": 32,
785 "metadata": {},
786 "outputs": [
787 {
788 "name": "stdout",
789 "output_type": "stream",
790 "text": [
791 "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n",
792 " {'move': 2, 'player1?': False, 'tokens': 8},\n",
793 " {'move': 1, 'player1?': True, 'tokens': 6},\n",
794 " {'move': 1, 'player1?': False, 'tokens': 5},\n",
795 " {'move': 1, 'player1?': True, 'tokens': 4},\n",
796 " {'move': 1, 'player1?': False, 'tokens': 3},\n",
797 " {'move': 2, 'player1?': True, 'tokens': 2}],\n",
798 " 'player1': {1: Counter({1: 3}),\n",
799 " 2: Counter({1: 3, 2: 3}),\n",
800 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
801 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
802 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
803 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
804 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
805 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
806 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
807 " 'human?': False},\n",
808 " 'player1_active': False,\n",
809 " 'player2': {1: Counter({1: 3}),\n",
810 " 2: Counter({1: 3, 2: 2}),\n",
811 " 3: Counter({1: 4, 2: 3, 3: 3}),\n",
812 " 4: Counter({2: 3, 3: 3, 1: 2}),\n",
813 " 5: Counter({1: 4, 2: 3, 3: 3}),\n",
814 " 6: Counter({2: 3, 3: 3, 1: 2}),\n",
815 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
816 " 8: Counter({2: 4, 1: 3, 3: 3}),\n",
817 " 9: Counter({2: 3, 3: 3, 1: 2}),\n",
818 " 'human?': False},\n",
819 " 'tokens': 0}\n"
820 ]
821 }
822 ],
823 "source": [
824 "p1 = new_menace()\n",
825 "p2 = new_menace() # Change this line\n",
826 "g = new_game(p1, p2) # Change this line to be p2 not ph\n",
827 "play_game(g)\n",
828 "\n",
829 "update_winner(g) # Add these lines\n",
830 "update_loser(g) # Add these lines \n",
831 "\n",
832 "pp.pprint(g)"
833 ]
834 },
835 {
836 "cell_type": "markdown",
837 "metadata": {},
838 "source": [
839 "This shows the winner's counts being increased and the losers decreased.\n",
840 "\n",
841 "What happens when the push the updates? Does the number of beads stay at the minum value of 1?\n",
842 "\n",
843 "Wrap the `update...` lines in a `for` loop to do many updates at the same time."
844 ]
845 },
846 {
847 "cell_type": "code",
848 "execution_count": 33,
849 "metadata": {},
850 "outputs": [
851 {
852 "name": "stdout",
853 "output_type": "stream",
854 "text": [
855 "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n",
856 " {'move': 2, 'player1?': False, 'tokens': 8},\n",
857 " {'move': 1, 'player1?': True, 'tokens': 6},\n",
858 " {'move': 1, 'player1?': False, 'tokens': 5},\n",
859 " {'move': 1, 'player1?': True, 'tokens': 4},\n",
860 " {'move': 1, 'player1?': False, 'tokens': 3},\n",
861 " {'move': 2, 'player1?': True, 'tokens': 2}],\n",
862 " 'player1': {1: Counter({1: 3}),\n",
863 " 2: Counter({1: 3, 2: 3}),\n",
864 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
865 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
866 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
867 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
868 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
869 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
870 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
871 " 'human?': False},\n",
872 " 'player1_active': False,\n",
873 " 'player2': {1: Counter({1: 3}),\n",
874 " 2: Counter({1: 3, 2: 1}),\n",
875 " 3: Counter({1: 8, 2: 3, 3: 3}),\n",
876 " 4: Counter({2: 3, 3: 3, 1: 1}),\n",
877 " 5: Counter({1: 8, 2: 3, 3: 3}),\n",
878 " 6: Counter({2: 3, 3: 3, 1: 1}),\n",
879 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
880 " 8: Counter({2: 8, 1: 3, 3: 3}),\n",
881 " 9: Counter({2: 3, 3: 3, 1: 1}),\n",
882 " 'human?': False},\n",
883 " 'tokens': 0}\n"
884 ]
885 }
886 ],
887 "source": [
888 "for _ in range(4):\n",
889 " update_winner(g)\n",
890 " update_loser(g)\n",
891 " \n",
892 "pp.pprint(g)"
893 ]
894 },
895 {
896 "cell_type": "markdown",
897 "metadata": {},
898 "source": [
899 "## Making clever players\n",
900 "Now we can play games and have players learn, let's train some players. \n",
901 "\n",
902 "We'll create two players and get them to play a large number of games against each other. To avoid any first-player advantage, we'll randomly swap which player goes first.\n",
903 "\n",
904 "Create this procedure in **`menace.py`**"
905 ]
906 },
907 {
908 "cell_type": "code",
909 "execution_count": 34,
910 "metadata": {
911 "collapsed": true
912 },
913 "outputs": [],
914 "source": [
915 "def game_with_players(p1, p2, report_result_for=None):\n",
916 " if random.choice([True, False]):\n",
917 " g = new_game(p1, p2)\n",
918 " else:\n",
919 " g = new_game(p2, p1)\n",
920 " play_game(g)\n",
921 " if report_result_for:\n",
922 " if winner(g) == report_result_for:\n",
923 " print('You won')\n",
924 " else:\n",
925 " print('You lost')\n",
926 " return g"
927 ]
928 },
929 {
930 "cell_type": "markdown",
931 "metadata": {},
932 "source": [
933 "Change the end of **`main.py`** to be this"
934 ]
935 },
936 {
937 "cell_type": "code",
938 "execution_count": 35,
939 "metadata": {},
940 "outputs": [
941 {
942 "name": "stdout",
943 "output_type": "stream",
944 "text": [
945 "You play first.\n",
946 "There are 9 pieces left.\n",
947 "Your move (1-3)? 1\n",
948 "Opponent took 3 pieces.\n",
949 "There are 5 pieces left.\n",
950 "Your move (1-3)? 1\n",
951 "Opponent took 2 pieces.\n",
952 "There are 2 pieces left.\n",
953 "Your move (1-2)? 1\n",
954 "You won\n"
955 ]
956 },
957 {
958 "data": {
959 "text/plain": [
960 "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n",
961 " {'move': 3, 'player1?': False, 'tokens': 8},\n",
962 " {'move': 1, 'player1?': True, 'tokens': 5},\n",
963 " {'move': 2, 'player1?': False, 'tokens': 4},\n",
964 " {'move': 1, 'player1?': True, 'tokens': 2},\n",
965 " {'move': 1, 'player1?': False, 'tokens': 1}],\n",
966 " 'player1': {'human?': True},\n",
967 " 'player1_active': True,\n",
968 " 'player2': {1: Counter({1: 3}),\n",
969 " 2: Counter({1: 3, 2: 3}),\n",
970 " 3: Counter({1: 3, 2: 3, 3: 3}),\n",
971 " 4: Counter({1: 3, 2: 3, 3: 3}),\n",
972 " 5: Counter({1: 3, 2: 3, 3: 3}),\n",
973 " 6: Counter({1: 3, 2: 3, 3: 3}),\n",
974 " 7: Counter({1: 3, 2: 3, 3: 3}),\n",
975 " 8: Counter({1: 3, 2: 3, 3: 3}),\n",
976 " 9: Counter({1: 3, 2: 3, 3: 3}),\n",
977 " 'human?': False},\n",
978 " 'tokens': 0}"
979 ]
980 },
981 "execution_count": 35,
982 "metadata": {},
983 "output_type": "execute_result"
984 }
985 ],
986 "source": [
987 "p1 = new_menace()\n",
988 "ph = new_human()\n",
989 "\n",
990 "game_with_players(p1, ph, report_result_for=ph)"
991 ]
992 },
993 {
994 "cell_type": "markdown",
995 "metadata": {},
996 "source": [
997 "It should play a game with you and say if you won or lost."
998 ]
999 },
1000 {
1001 "cell_type": "markdown",
1002 "metadata": {},
1003 "source": [
1004 "Now we want to train some players. We create two `menace` players and have them play a bunch of games against themselves. About 10,000 games is enough for training.\n",
1005 "\n",
1006 "In **`menace.py`**, create this procedure:"
1007 ]
1008 },
1009 {
1010 "cell_type": "code",
1011 "execution_count": 36,
1012 "metadata": {
1013 "collapsed": true
1014 },
1015 "outputs": [],
1016 "source": [
1017 "def train_players(p1, p2, rounds=10000):\n",
1018 " for _ in range(rounds):\n",
1019 " # your code here to...\n",
1020 " # play a game with the two players\n",
1021 " # update the winner\n",
1022 " # update the loser\n",
1023 " return p1, p2"
1024 ]
1025 },
1026 {
1027 "cell_type": "markdown",
1028 "metadata": {},
1029 "source": [
1030 "Change the end of **`main.py`** to do the training. "
1031 ]
1032 },
1033 {
1034 "cell_type": "code",
1035 "execution_count": 37,
1036 "metadata": {},
1037 "outputs": [
1038 {
1039 "name": "stdout",
1040 "output_type": "stream",
1041 "text": [
1042 "{1: Counter({1: 1}),\n",
1043 " 2: Counter({1: 1741, 2: 1}),\n",
1044 " 3: Counter({2: 1790, 1: 1, 3: 1}),\n",
1045 " 4: Counter({3: 1620, 1: 1, 2: 1}),\n",
1046 " 5: Counter({1: 1, 2: 1, 3: 1}),\n",
1047 " 6: Counter({1: 1248, 2: 1, 3: 1}),\n",
1048 " 7: Counter({2: 2651, 1: 1, 3: 1}),\n",
1049 " 8: Counter({3: 1120, 1: 2, 2: 1}),\n",
1050 " 9: Counter({1: 1, 2: 1, 3: 1}),\n",
1051 " 'human?': False}\n"
1052 ]
1053 }
1054 ],
1055 "source": [
1056 "p1 = new_menace()\n",
1057 "p2 = new_menace()\n",
1058 "\n",
1059 "train_players(p1, p2) # Change this line\n",
1060 "\n",
1061 "pp.pprint(p1)"
1062 ]
1063 },
1064 {
1065 "cell_type": "markdown",
1066 "metadata": {},
1067 "source": [
1068 "## Playing Menace\n",
1069 "Let's now play our trained Menace AI!"
1070 ]
1071 },
1072 {
1073 "cell_type": "code",
1074 "execution_count": 40,
1075 "metadata": {},
1076 "outputs": [
1077 {
1078 "name": "stdout",
1079 "output_type": "stream",
1080 "text": [
1081 "Opponent took 1 pieces.\n",
1082 "There are 8 pieces left.\n",
1083 "Your move (1-3)? 1\n",
1084 "Opponent took 2 pieces.\n",
1085 "There are 5 pieces left.\n",
1086 "Your move (1-3)? 3\n",
1087 "Opponent took 1 pieces.\n",
1088 "There are 1 pieces left.\n",
1089 "Your move (1-1)? 1\n",
1090 "You lost\n"
1091 ]
1092 },
1093 {
1094 "data": {
1095 "text/plain": [
1096 "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n",
1097 " {'move': 1, 'player1?': False, 'tokens': 8},\n",
1098 " {'move': 2, 'player1?': True, 'tokens': 7},\n",
1099 " {'move': 3, 'player1?': False, 'tokens': 5},\n",
1100 " {'move': 1, 'player1?': True, 'tokens': 2},\n",
1101 " {'move': 1, 'player1?': False, 'tokens': 1}],\n",
1102 " 'player1': {1: Counter({1: 1}),\n",
1103 " 2: Counter({1: 1713, 2: 1}),\n",
1104 " 3: Counter({1: 1, 2: 1678, 3: 1}),\n",
1105 " 4: Counter({1: 1, 2: 1, 3: 1674}),\n",
1106 " 5: Counter({1: 1, 2: 1, 3: 1}),\n",
1107 " 6: Counter({1: 203, 2: 2, 3: 1}),\n",
1108 " 7: Counter({1: 12, 2: 4775, 3: 2}),\n",
1109 " 8: Counter({1: 1, 2: 1, 3: 7}),\n",
1110 " 9: Counter({1: 193, 2: 1, 3: 1}),\n",
1111 " 'human?': False},\n",
1112 " 'player1_active': True,\n",
1113 " 'player2': {'human?': True},\n",
1114 " 'tokens': 0}"
1115 ]
1116 },
1117 "execution_count": 40,
1118 "metadata": {},
1119 "output_type": "execute_result"
1120 }
1121 ],
1122 "source": [
1123 "p1 = new_menace()\n",
1124 "p2 = new_menace()\n",
1125 "\n",
1126 "train_players(p1, p2)\n",
1127 "\n",
1128 "ph = new_human()\n",
1129 "\n",
1130 "game_with_players(p1, ph, report_result_for=ph)"
1131 ]
1132 },
1133 {
1134 "cell_type": "markdown",
1135 "metadata": {},
1136 "source": [
1137 "## Extension\n",
1138 "\n",
1139 "The `update_loser()` procedure doesn't take the last bead of a particular type from any box. But taking this last bead could make Menace completely avoid bad moves. \n",
1140 "\n",
1141 "Change the `update_loser()` procedure so that we can say whether we want to take the last bead of a type from the box. \n",
1142 "\n",
1143 "Change the procedure definition to\n",
1144 "\n",
1145 "```\n",
1146 "def update_loser(game, allow_drop_move=False):\n",
1147 "```\n",
1148 "\n",
1149 "You'll need add an `if` statement when you update the number of beads, something like:\n",
1150 "\n",
1151 "```\n",
1152 "for m in moves:\n",
1153 " if allow_drop_move:\n",
1154 " # your new code\n",
1155 " else:\n",
1156 " # the existing update code\n",
1157 "```\n",
1158 "\n",
1159 "Be careful that you don't take the last bead from a box, leaving that box with no beads at all!\n",
1160 "\n",
1161 "You'll also need to update `train_players()` to accept an `allow_drop_moves` parameter and pass it to the `update_loser()` procedure inside it.\n",
1162 "\n",
1163 "Can you beat players trained this way?"
1164 ]
1165 },
1166 {
1167 "cell_type": "code",
1168 "execution_count": null,
1169 "metadata": {
1170 "collapsed": true
1171 },
1172 "outputs": [],
1173 "source": []
1174 }
1175 ],
1176 "metadata": {
1177 "kernelspec": {
1178 "display_name": "Python 3",
1179 "language": "python",
1180 "name": "python3"
1181 },
1182 "language_info": {
1183 "codemirror_mode": {
1184 "name": "ipython",
1185 "version": 3
1186 },
1187 "file_extension": ".py",
1188 "mimetype": "text/x-python",
1189 "name": "python",
1190 "nbconvert_exporter": "python",
1191 "pygments_lexer": "ipython3",
1192 "version": "3.5.2+"
1193 }
1194 },
1195 "nbformat": 4,
1196 "nbformat_minor": 2
1197 }