{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Menace-like Nim player\n", "\n", "This is an implementation of a Menace-like player for one-stack Nim." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nim\n", "\n", "There are many version of Nim. This version uses a single stack of tokens (initially nine). Players take turns removing one, two, or three tokens from the stack. The player who takes the last token loses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Menace\n", "[Menace](http://chalkdustmagazine.com/features/menace-machine-educable-noughts-crosses-engine/) was an early program that learnt how to play a perfect game of noughts and crosses. It was developed by [Donald Michie](https://en.wikipedia.org/wiki/Donald_Michie). As computers were rare in the 1950s, Michie implemented the machine as a set of matchboxes.\n", "\n", "There is one matchbox for each state of the game (330 for noughts and crosses, ignoring rotations and reflections; 9 for this version of Nim). Each box contains some beads, with each type of bead representing a different move. \n", "\n", "When playing the game, Menace makes its move by picking a random bead from the appropriate box. It plays randomly, but biased by the number of beads in each box.\n", "\n", "Learning takes place by [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning) after the game. If Menace won, all the beads are replaced in their original boxes, and another bead of the same type is added to the box. In other words, one bead came out of the box, two beads of that type go back in. If Menace lost, the played beads are removed (with a minimum of one bead of each type remaining in each box)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "We start with some imports and constant definitions." 
] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import collections\n", "import random\n", "from IPython.display import HTML, display" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "INITIAL_GAME_SIZE = 9\n", "MAX_TAKE = 3\n", "INITIAL_BEAD_COUNT = 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **Menace** instance is just a collection of boxes, each box containing some beads. The bead types are just the number of tokens to remove on this move. \n", "\n", "I use a `collections.Counter` object to store counts of the number of beads. \n", "\n", "I use a `dict` of boxes, just to avoid off-by-one errors when comparing the number of tokens to the index of the box collection. \n", "\n", "Note that we ensure that only legal moves are represented in the Menace player." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def new_menace():\n", " boxes = {'human?': False}\n", " for b in range(1, INITIAL_GAME_SIZE+1):\n", " box = collections.Counter()\n", " for i in range(1, MAX_TAKE+1):\n", " if b >= i:\n", " box[i] = INITIAL_BEAD_COUNT\n", " boxes[b] = box\n", " return boxes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_menace()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pick a random move by listing all the beads and picking one at random." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def menace_move(box):\n", " return random.choice(list(box.elements()))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "menace_move(new_menace()[9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **human** player has no state, apart from saying it's human. There's some complex logic to ensure that we only get valid moves from the human." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def new_human():\n", " return {'human?': True}" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def human_move(game):\n", " if game['history']:\n", " print('Opponent took', game['history'][-1]['move'], 'pieces.')\n", " else:\n", " print('You play first.')\n", " print('There are', game['tokens'], 'pieces left.')\n", " \n", " max_move = min(MAX_TAKE, game['tokens'])\n", " valid_input = False\n", " \n", " while not valid_input:\n", " user_input = input('Your move (1-{})? '.format(max_move))\n", " if user_input.isnumeric():\n", " move = int(user_input)\n", " if move in range(1, max_move+1):\n", " valid_input = True\n", " else:\n", " print('Number not a valid move.')\n", " else:\n", " print('Please enter a number.')\n", " return move" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **game** is the current state of the game (number of tokens, who's the current player), the two players, and a history of moves. We use the history for the reinforcement learning phase." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def new_game(player1, player2):\n", " return {'tokens': INITIAL_GAME_SIZE,\n", " 'player1': player1,\n", " 'player2': player2,\n", " 'player1_active': True,\n", " 'history': []}" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def game_finished(game):\n", " return game['tokens'] == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last player to move in a game is the loser." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def winner(game):\n", " if game['history'][-1]['player1?']:\n", " return game['player2']\n", " else:\n", " return game['player1']\n", "\n", "def loser(game):\n", " if game['history'][-1]['player1?']:\n", " return game['player1']\n", " else:\n", " return game['player2'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **active player** makes a move, depending on whether it's a human or a menace. After the move is taken, update the game state and history. We store plenty in the history to make the learning phase easier." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def make_move(game):\n", " if game['player1_active']:\n", " active = game['player1']\n", " else:\n", " active = game['player2']\n", " if active['human?']:\n", " move = human_move(game)\n", " else:\n", " move = menace_move(active[game['tokens']])\n", " game['history'] += [{'player1?': game['player1_active'], 'move': move, 'tokens': game['tokens']}]\n", " game['tokens'] -= move\n", " game['player1_active'] = not game['player1_active'] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **game** is just move after move until the game is finished." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def play_game(game):\n", " while not game_finished(game):\n", " make_move(game)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test what we've got\n", "Let's test what we've got so far, by creating a game and making some moves with it." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': True,\n", " 'player2': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 9}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "g = new_game(p1, p2)\n", "g" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 3, 'player1?': True, 'tokens': 9}],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': False,\n", " 
'player2': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 6}" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "make_move(g)\n", "g" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n", " {'move': 2, 'player1?': False, 'tokens': 8},\n", " {'move': 3, 'player1?': True, 'tokens': 6},\n", " {'move': 2, 'player1?': False, 'tokens': 3},\n", " {'move': 1, 'player1?': True, 'tokens': 1}],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': False,\n", " 'player2': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 0}" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "g = new_game(p1, p2)\n", "play_game(g)\n", "g" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Opponent took 1 pieces.\n", "There are 8 pieces 
left.\n", "Your move (1-3)? 2\n", "Opponent took 2 pieces.\n", "There are 4 pieces left.\n", "Your move (1-3)? 3\n", "You won\n" ] } ], "source": [ "p1 = new_menace()\n", "ph = new_human()\n", "if random.choice([True, False]):\n", " g = new_game(p1, ph)\n", "else:\n", " g = new_game(ph, p1)\n", "play_game(g)\n", "if winner(g) == ph:\n", " print('You won')\n", "else:\n", " print('You lost')" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'human?': True}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "winner(g)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False}" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loser(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning\n", "Now we can play a game, it's time to start the learning. We need to extract the winning and losing moves from a game, and use them to update the counts of beads in the player." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def winning_moves(game):\n", " return [h for h in game['history'] \n", " if h['player1?'] != game['history'][-1]['player1?']]\n", "\n", "def losing_moves(game):\n", " return [h for h in game['history'] \n", " if h['player1?'] == game['history'][-1]['player1?']] " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'move': 2, 'player1?': True, 'tokens': 9},\n", " {'move': 1, 'player1?': True, 'tokens': 4},\n", " {'move': 1, 'player1?': True, 'tokens': 2}]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "winning_moves(g)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'move': 3, 'player1?': False, 'tokens': 7},\n", " {'move': 1, 'player1?': False, 'tokens': 3},\n", " {'move': 1, 'player1?': False, 'tokens': 1}]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "losing_moves(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Updating the winner** is easy: just find the correct box for each move, and update the number of winning beads in that box." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def update_winner(game):\n", " player = winner(game)\n", " moves = winning_moves(game)\n", " for m in moves:\n", " player[m['tokens']][m['move']] += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Updating the loser** is a bit tricker. The idea is the same as updating the winner, but we have to deal with excepttions. We don't update the bead count in three cases:\n", "1. There are no beads of that type in the box to start with (though this shouldn't happen in practice).\n", "2. 
There's only one instance of this type of bead in the box (unless we override that behaviour with the `allow_drop_move` flag).\n", "3. In any case, we never take the last bead from a box." ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# First version: never removes the last bead of a type (no override).\n", "def update_loser_0(game):\n", "    player = loser(game)\n", "    moves = losing_moves(game)\n", "    for m in moves:\n", "        if player[m['tokens']][m['move']] > 1:\n", "            player[m['tokens']][m['move']] -= 1" ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def update_loser(game, allow_drop_move=False):\n", "    player = loser(game)\n", "    moves = losing_moves(game)\n", "    for m in moves:\n", "        if allow_drop_move:\n", "            # may remove the last bead of a type, but never empty the whole box\n", "            if len(list(player[m['tokens']].elements())) > 1:\n", "                player[m['tokens']][m['move']] -= 1\n", "        else:\n", "            # keep at least one bead of each type in the box\n", "            if player[m['tokens']][m['move']] > 1:\n", "                player[m['tokens']][m['move']] -= 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing\n", "We test the learning by playing a game and updating the players."
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n", " {'move': 1, 'player1?': False, 'tokens': 8},\n", " {'move': 2, 'player1?': True, 'tokens': 7},\n", " {'move': 1, 'player1?': False, 'tokens': 5},\n", " {'move': 3, 'player1?': True, 'tokens': 4},\n", " {'move': 1, 'player1?': False, 'tokens': 1}],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': True,\n", " 'player2': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 0}" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "g = new_game(p1, p2)\n", "play_game(g)\n", "g" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n", " {'move': 1, 'player1?': False, 'tokens': 8},\n", " {'move': 2, 'player1?': True, 'tokens': 7},\n", " {'move': 1, 'player1?': False, 'tokens': 5},\n", " {'move': 3, 'player1?': True, 'tokens': 4},\n", " {'move': 1, 'player1?': False, 'tokens': 1}],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 4}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: 
Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 4, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 4, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': True,\n", " 'player2': {1: Counter({1: 2}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 2, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 2, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 0}" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "update_winner(g)\n", "update_loser(g)\n", "g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows the winner's counts being increased and the losers decreased.\n", "\n", "What happens when the push the updates? Does the number of beads stay at the minum value of 1?" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 1, 'player1?': True, 'tokens': 9},\n", " {'move': 1, 'player1?': False, 'tokens': 8},\n", " {'move': 2, 'player1?': True, 'tokens': 7},\n", " {'move': 1, 'player1?': False, 'tokens': 5},\n", " {'move': 3, 'player1?': True, 'tokens': 4},\n", " {'move': 1, 'player1?': False, 'tokens': 1}],\n", " 'player1': {1: Counter({1: 3}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 8}),\n", " 5: Counter({1: 3, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 8, 3: 3}),\n", " 8: Counter({1: 3, 2: 3, 3: 3}),\n", " 9: Counter({1: 8, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'player1_active': True,\n", " 'player2': {1: Counter({1: 1}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 1, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: 
Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 1, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False},\n", " 'tokens': 0}" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "for _ in range(4):\n", " update_winner(g)\n", " update_loser(g)\n", "g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now try it again, but allow moves to be dropped. Does the bead count go to zero?" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: Counter({1: 1}),\n", " 2: Counter({1: 3, 2: 3}),\n", " 3: Counter({1: 3, 2: 3, 3: 3}),\n", " 4: Counter({1: 3, 2: 3, 3: 3}),\n", " 5: Counter({1: 0, 2: 3, 3: 3}),\n", " 6: Counter({1: 3, 2: 3, 3: 3}),\n", " 7: Counter({1: 3, 2: 3, 3: 3}),\n", " 8: Counter({1: 0, 2: 3, 3: 3}),\n", " 9: Counter({1: 3, 2: 3, 3: 3}),\n", " 'human?': False}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "update_loser(g, allow_drop_move=True)\n", "loser(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, that move is now forgotten." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 2, 2, 3, 3, 3]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(g['player2'][8].elements())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making clever players\n", "Now we can play games and have players learn, let's train some players. \n", "\n", "We'll create two players and get them to play a large number of games against each other. To avoid any first-player advantage, we'll randomly swap which player goes first." 
] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "({1: Counter({1: 1}),\n", " 2: Counter({1: 1683, 2: 1}),\n", " 3: Counter({1: 1, 2: 1678, 3: 1}),\n", " 4: Counter({1: 1, 2: 1, 3: 1647}),\n", " 5: Counter({1: 1, 2: 1, 3: 1}),\n", " 6: Counter({1: 1634, 2: 1, 3: 1}),\n", " 7: Counter({1: 1, 2: 1714, 3: 1}),\n", " 8: Counter({1: 1, 2: 1, 3: 1573}),\n", " 9: Counter({1: 1, 2: 1, 3: 1}),\n", " 'human?': False},\n", " {1: Counter({1: 1}),\n", " 2: Counter({1: 1642, 2: 1}),\n", " 3: Counter({1: 1, 2: 1681, 3: 1}),\n", " 4: Counter({1: 1, 2: 1, 3: 1656}),\n", " 5: Counter({1: 1, 2: 1, 3: 1}),\n", " 6: Counter({1: 1694, 2: 1, 3: 1}),\n", " 7: Counter({1: 1, 2: 1603, 3: 1}),\n", " 8: Counter({1: 1, 2: 1, 3: 1599}),\n", " 9: Counter({1: 1, 2: 1, 3: 1}),\n", " 'human?': False})" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for i in range(10000):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g)\n", "p1, p2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows there's clearly one correct move in each situation, and the first player loses.\n", "\n", "What if we allow the elimination of moves?" 
] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "({1: Counter({1: 1}),\n", " 2: Counter({1: 14, 2: 1}),\n", " 3: Counter({1: 0, 2: 16, 3: 1}),\n", " 4: Counter({1: 0, 2: 0, 3: 26}),\n", " 5: Counter({1: 0, 2: 1, 3: 0}),\n", " 6: Counter({1: 14, 2: 2, 3: 0}),\n", " 7: Counter({1: 0, 2: 0, 3: 1}),\n", " 8: Counter({1: 0, 2: 0, 3: 12}),\n", " 9: Counter({1: 0, 2: 1, 3: 0}),\n", " 'human?': False},\n", " {1: Counter({1: 1}),\n", " 2: Counter({1: 14, 2: 1}),\n", " 3: Counter({1: 0, 2: 4974, 3: 0}),\n", " 4: Counter({1: 0, 2: 0, 3: 4965}),\n", " 5: Counter({1: 1, 2: 0, 3: 0}),\n", " 6: Counter({1: 17, 2: 0, 3: 1}),\n", " 7: Counter({1: 0, 2: 4948, 3: 0}),\n", " 8: Counter({1: 1, 2: 2, 3: 9}),\n", " 9: Counter({1: 0, 2: 4963, 3: 0}),\n", " 'human?': False})" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for _ in range(10000):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser2(g, allow_drop_move=True)\n", "p1, p2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Much lower counts for some moves. These are the obviously losing moves, which seem to be taken rarely in the game. This isn't so good for playing against a human, as the **Menace** player needs to play well after non-optimal play by the human." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Playing Menace\n", "Let's retrain, then try playing it!" 
] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for i in range(10000):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "You play first.\n", "There are 9 pieces left.\n", "Your move (1-3)? 2\n", "Opponent took 2 pieces.\n", "There are 5 pieces left.\n", "Your move (1-3)? 1\n", "Opponent took 3 pieces.\n", "There are 1 pieces left.\n", "Your move (1-1)? 1\n", "You lost\n" ] } ], "source": [ "ph = new_human()\n", "if random.choice([True, False]):\n", " g = new_game(p1, ph)\n", "else:\n", " g = new_game(ph, p1)\n", "play_game(g)\n", "if winner(g) == ph:\n", " print('You won')\n", "else:\n", " print('You lost')" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'history': [{'move': 2, 'player1?': True, 'tokens': 9},\n", " {'move': 2, 'player1?': False, 'tokens': 7},\n", " {'move': 1, 'player1?': True, 'tokens': 5},\n", " {'move': 3, 'player1?': False, 'tokens': 4},\n", " {'move': 1, 'player1?': True, 'tokens': 1}],\n", " 'player1': {'human?': True},\n", " 'player1_active': False,\n", " 'player2': {1: Counter({1: 1}),\n", " 2: Counter({1: 1676, 2: 1}),\n", " 3: Counter({1: 1, 2: 1695, 3: 1}),\n", " 4: Counter({1: 1, 2: 1, 3: 1653}),\n", " 5: Counter({1: 1, 2: 1, 3: 1}),\n", " 6: Counter({1: 1648, 2: 1, 3: 1}),\n", " 7: Counter({1: 2, 2: 1646, 3: 1}),\n", " 8: Counter({1: 1, 2: 1, 3: 1639}),\n", " 9: Counter({1: 1, 2: 1, 3: 1}),\n", " 'human?': False},\n", " 'tokens': 0}" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How strong are these 
players?\n", "Let's generate some players by different methods, and get them playing each other. We'll also introduce a partially-trained player, a \"newbie\" untrained player (essentially random play), and a hand-crafted ideal player." ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wins 23 games out of 10000 , or 0.23 %\n" ] } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for _ in range(10000):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g)\n", "\n", "wins = 0\n", "plays = 10000\n", "for _ in range(plays):\n", " g = new_game(p1, p2)\n", " play_game(g)\n", " if winner(g) == p1: \n", " wins += 1\n", "print(\"Wins\", wins, \"games out of\", plays, \", or \", (100.0 * wins) / plays, \"%\")\n", "\n", "p_floor1 = p1" ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for _ in range(100):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g)\n", "\n", "p_parttrained = p1" ] }, { "cell_type": "code", "execution_count": 170, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wins 0 games out of 10000 , or 0.0 %\n" ] } ], "source": [ "p1 = new_menace()\n", "p2 = new_menace()\n", "for _ in range(10000):\n", " if random.choice([True, False]):\n", " g = new_game(p1, p2)\n", " else:\n", " g = new_game(p2, p1)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g, allow_drop_move=True)\n", "\n", "wins = 0\n", "plays = 10000\n", "for _ in range(plays):\n", " g = new_game(p1, p2)\n", " play_game(g)\n", " if winner(g) == p1: \n", " wins += 1\n", "print(\"Wins\", wins, \"games out of\", plays, \", or \", (100.0 * 
wins) / plays, \"%\")\n", " \n", "p_floor0 = p2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try building a stable of players with the `allow_drop_move` and see if that's better." ] }, { "cell_type": "code", "execution_count": 171, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wins 1068 games out of 10000 , or 10.68 %\n" ] } ], "source": [ "players = [new_menace() for _ in range(10)]\n", "\n", "for _ in range(100000):\n", " p1, p2 = random.sample(players, 2)\n", " g = new_game(p1, p2)\n", " play_game(g)\n", " update_winner(g)\n", " update_loser(g, allow_drop_move=True)\n", "\n", "wins = 0\n", "plays = 10000\n", "p1 = players[0]\n", "for _ in range(plays):\n", " p2 = random.choice(players[1:])\n", " g = new_game(p1, p2)\n", " play_game(g)\n", " if winner(g) == p1: \n", " wins += 1\n", "print(\"Wins\", wins, \"games out of\", plays, \", or \", (100.0 * wins) / plays, \"%\")\n", " \n", "p_floor0_stable = p1" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: Counter({1: 1}),\n", " 2: Counter({1: 3323, 2: 0}),\n", " 3: Counter({1: 2, 2: 24, 3: 2}),\n", " 4: Counter({1: 0, 2: 0, 3: 7682}),\n", " 5: Counter({1: 0, 2: 1, 3: 0}),\n", " 6: Counter({1: 3288, 2: 0, 3: 0}),\n", " 7: Counter({1: 0, 2: 3302, 3: 0}),\n", " 8: Counter({1: 0, 2: 0, 3: 3290}),\n", " 9: Counter({1: 0, 2: 0, 3: 1}),\n", " 'human?': False}" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p_floor0_stable" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: Counter({1: 1}),\n", " 2: Counter({1: 1}),\n", " 3: Counter({2: 1}),\n", " 4: Counter({3: 1}),\n", " 5: Counter({1: 1}),\n", " 6: Counter({1: 1}),\n", " 7: Counter({2: 1}),\n", " 8: Counter({3: 1}),\n", " 9: Counter({1: 1}),\n", " 'human?': False}" ] }, "execution_count": 152, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "newbie = new_menace()\n", "ideal = {'human?': False,\n", " 1: collections.Counter([1]),\n", " 2: collections.Counter([1]),\n", " 3: collections.Counter([2]),\n", " 4: collections.Counter([3]),\n", " 5: collections.Counter([1]),\n", " 6: collections.Counter([1]),\n", " 7: collections.Counter([2]),\n", " 8: collections.Counter([3]),\n", " 9: collections.Counter([1])}\n", "ideal" ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [], "source": [ "def count_wins(p1, p2, plays=10000):\n", " wins = 0\n", " p2d = p2.copy()\n", " for _ in range(plays):\n", " g = new_game(p1, p2d)\n", " play_game(g)\n", " if not g['history'][-1]['player1?']:\n", " wins += 1\n", " return wins" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "collapsed": true }, "outputs": [], "source": [ "players = [p_floor1, p_floor0, p_floor0_stable, newbie, p_parttrained, ideal]\n", "player_names = ['Floor 1', 'Floor 0', 'Floor 0 stable', 'Newbie', 'Part trained', 'Ideal']" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{(0, 0): 22,\n", " (0, 1): 1265,\n", " (0, 2): 511,\n", " (0, 3): 8646,\n", " (0, 4): 4449,\n", " (0, 5): 0,\n", " (1, 0): 12,\n", " (1, 1): 1575,\n", " (1, 2): 0,\n", " (1, 3): 7684,\n", " (1, 4): 6222,\n", " (1, 5): 0,\n", " (2, 0): 24,\n", " (2, 1): 2281,\n", " (2, 2): 1444,\n", " (2, 3): 8532,\n", " (2, 4): 3239,\n", " (2, 5): 0,\n", " (3, 0): 7,\n", " (3, 1): 783,\n", " (3, 2): 375,\n", " (3, 3): 5153,\n", " (3, 4): 1286,\n", " (3, 5): 0,\n", " (4, 0): 21,\n", " (4, 1): 781,\n", " (4, 2): 462,\n", " (4, 3): 7901,\n", " (4, 4): 3720,\n", " (4, 5): 0,\n", " (5, 0): 26,\n", " (5, 1): 0,\n", " (5, 2): 0,\n", " (5, 3): 8837,\n", " (5, 4): 7529,\n", " (5, 5): 0}" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = {}\n", "\n", "for i, p1 in enumerate(players):\n", " for j, p2 in 
enumerate(players):\n", " results[i, j] = count_wins(p1, p2)\n", "results" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
  Player 1
  Floor 1Floor 0Floor 0 stableNewbiePart trainedIdeal
Player 2Floor 1 22122472126
Floor 0 1265157522817837810
Floor 0 stable 511014443754620
Newbie 864676848532515379018837
Part trained 444962223239128637207529
Ideal 000000
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "result_table = '\\n'\n", "result_table += '\\n'.format(len(players))\n", "result_table += ''\n", "for i in range(len(players)):\n", " result_table += ''.format(player_names[i])\n", "result_table += '\\n'\n", "\n", "for i in range(len(players)):\n", " result_table += '\\n'\n", " if i == 0:\n", " result_table += '\\n'.format(len(players))\n", " result_table += ''.format(player_names[i])\n", " for j in range(len(players)):\n", " result_table += ' \\n'.format(results[j, i])\n", " result_table += '\\n'\n", "result_table += \"
  Player 1
  {}
Player 2{}{}
\"\n", "# print(result_table)\n", "display(HTML(result_table))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do these numbers tell us? First, against a trained opponent, the first player is in a very bad position. Indeed, with optimal play (the \"ideal\" player against itself), the first player never wins. However, the disadvantage of first player is almost eliminated with random play.\n", "\n", "We can also see that the non-optimal play can throw off the \"floor 0\" player, while the \"floor 1\" player is more able to cope with non-ideal opponents. \"Floor 0\" ends up only knowing what to do in a few of the game positions. If it ends up outside on of those, it loses badly. This suggest that this player is overtrained, reliant on its opponent doing the right thing. \n", "\n", "All this goes to show that the non-perfect play from \"floor 1\" is an advantage, as it avoids overtraining, keeping the final player more flexible." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2+" } }, "nbformat": 4, "nbformat_minor": 2 }