If you’ve ever played Tic Tac Toe, you know that the game gets boring after only a few rounds. That’s because every possible board state has an optimal play, and those optimal plays are simple enough to calculate that two optimal players will always draw. The only way to win is if your opponent makes a mistake. Contrast this with a game like Chess, where there is rarely a single optimal move; instead, moves must be weighed against a series of potential outcomes. Going further, we can consider a game like Go, where almost any play can be advantageous depending on the player’s skill. Randall Munroe of xkcd has a nice visual comparison of this concept. (He also has a nice visual comparison of the optimal Tic Tac Toe plays!)
A few months ago, I came across this great article by James Bridle. In it, he describes a mechanical system designed by Donald Michie, the Matchbox Educable Noughts And Crosses Engine (MENACE), that “plays” Tic Tac Toe and learns without being given any information about how to win.
But what’s really clever is that MENACE learns. Every time it wins a game, an additional bead is added to each matchbox played, corresponding to each winning move. Likewise, every time it loses, a bead corresponding to each losing move is removed. As a result, over time, MENACE becomes more likely to play moves that have previously resulted in wins and less likely to play moves that have resulted in losses.
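In code, each matchbox boils down to an array of bead counts, one per square, that gets nudged up or down at the end of each game. Here is a minimal sketch of that idea (the class and method names are my own illustration, not from MENACE or this project):

// Minimal sketch of one "matchbox": bead counts for each of the nine
// squares, reinforced after wins and depleted after losses.
// (Illustrative only; these names are not from the actual project.)
class Matchbox {
    private final int[] beads = {1, 1, 1, 1, 1, 1, 1, 1, 1};

    // delta is positive after a win, negative after a loss.
    void reinforce(int square, int delta) {
        // Clamp at zero: a move with no beads left is never chosen again.
        beads[square] = Math.max(0, beads[square] + delta);
    }

    int beadsFor(int square) {
        return beads[square];
    }
}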
The concept is simple enough that I decided to make a computer implementation of MENACE.
The Code
Unlike a traditional Tic Tac Toe “AI”, this computer player deliberately does not know the rules of the game; that is the whole point of the implementation. The only “rules” it knows about are the following (sketched in code after the list):
- Certain moves are illegal (but no explanation is given as to why they are illegal)
- The game ends
- The outcome is either win, lose, or draw
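Translated into code, that minimal rule set fits in a small interface. This is my own illustration of the idea, not an interface taken from the project’s source:

// Illustrative sketch of the only information the learning player receives.
enum Outcome { WIN, LOSS, DRAW }

interface GameRules {
    boolean isLegal(int square); // no explanation is given when this is false
    boolean isOver();            // the game ends
    Outcome outcome();           // only meaningful once isOver() is true
}

Everything else, including why a move is illegal or what arrangement of marks constitutes a win, stays hidden from the player.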
When it is the computer’s turn to play, it checks to see if it has a record of the current board state. If it does, it plays the move that has the highest score (number of beads). Otherwise, it plays randomly.
private int retrieveOptimalPlayLocation(BoardState bs) {
    //Find matching board state
    BoardState matchingState = retrieveKnownBoardState(bs);

    //Find the highest value in the array
    int[] bsCounter = null;
    if(rank == playerRank.ALPHA) {
        bsCounter = Arrays.copyOf(matchingState.getPlayerAlphaLogicCounter(), 9);
    } else if(rank == playerRank.BETA) {
        bsCounter = Arrays.copyOf(matchingState.getPlayerBetaLogicCounter(), 9);
    }
    Arrays.sort(bsCounter);
    int optimalPlayValue = bsCounter[8];

    //Find a play with the highest value
    if(rank == playerRank.ALPHA) {
        bsCounter = matchingState.getPlayerAlphaLogicCounter();
    } else if(rank == playerRank.BETA) {
        bsCounter = matchingState.getPlayerBetaLogicCounter();
    }
    int optimalPlayLocation = -1;
    boolean bestPlayFound = false;

    //It's possible that multiple plays are equal.
    //This picks a random best play.
    while(!bestPlayFound) {
        int randomEntry = r.nextInt(9);
        int playValueToCheck = bsCounter[randomEntry];

        //Select a random play
        if(playValueToCheck == optimalPlayValue && checkPlayAvailable(bs, randomEntry)) {
            optimalPlayLocation = randomEntry;
            bestPlayFound = true;
        }
    }
    return optimalPlayLocation;
}
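One subtlety worth noting: the rejection-sampling loop above keeps drawing random squares until it finds one that both holds the maximum bead count and is still available. Depending on how the counters treat occupied squares, that loop could in principle spin forever if every maximum-count square is already taken. A defensive variant (my sketch, meant to live in the same class so it can use checkPlayAvailable, not code from the project) gathers the available candidates first:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Gather every available square tied for the highest bead count,
// then pick one of them at random. Assumes at least one legal
// square remains on the board.
private int pickBestAvailable(int[] beadCounts, BoardState bs, Random r) {
    int best = Integer.MIN_VALUE;
    List<Integer> candidates = new ArrayList<>();
    for (int square = 0; square < 9; square++) {
        if (!checkPlayAvailable(bs, square)) continue;
        if (beadCounts[square] > best) {
            best = beadCounts[square];
            candidates.clear();
        }
        if (beadCounts[square] == best) {
            candidates.add(square);
        }
    }
    return candidates.get(r.nextInt(candidates.size()));
}

This always terminates and only ever considers legal moves, at the cost of a full pass over the board each turn.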
After each round, the computer updates its “matchboxes” based on the outcome of the round.
public void processEndOfRound() {
    Player lastToPlay;
    if(turns % 2 == 0)
        lastToPlay = player2;
    else
        lastToPlay = player1;

    BoardState boardToUpdate;
    int logicAmountChange = 0;

    if(victory && lastToPlay instanceof HumanPlayer) //Human wins
    {
        System.out.println("You won in " + turns + " moves!");
        logicAmountChange = -1;
    }
    else if(victory && lastToPlay instanceof ComputerPlayer) //Computer wins
    {
        System.out.println("The computer won in " + turns + " moves!");
        logicAmountChange = 2;
    }
    else //Draw
    {
        System.out.println("The game ended in a draw.");
        logicAmountChange = 1;
    }

    for(int[] turn : moveHistory) {
        if(turn[0] != 0) {
            boardToUpdate = BoardStateChecker.getKnownBoardStates().get(turn[0]);
            boardToUpdate.changeLikelihood(turn[1], logicAmountChange,
                    ((Game_HvC)theGame).getComputerPlayer().rank);
        }
    }
}
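The changeLikelihood call is where the bead counts actually move. I haven’t reproduced the project’s version here, but a minimal implementation consistent with the calls above might look like this, assuming the counters returned by the getters are stored in fields of BoardState (the field names below are my guess, not confirmed from the source):

// Hypothetical sketch of changeLikelihood: adjust the bead count for one
// square in one player's counter, never letting it go negative.
public void changeLikelihood(int square, int delta, playerRank rank) {
    int[] counter = (rank == playerRank.ALPHA)
            ? playerAlphaLogicCounter   // assumed field behind getPlayerAlphaLogicCounter()
            : playerBetaLogicCounter;   // assumed field behind getPlayerBetaLogicCounter()
    counter[square] = Math.max(0, counter[square] + delta);
}

Clamping at zero matters: once a move’s count hits zero it can never be the highest-valued play, which is the software equivalent of a matchbox running out of beads.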
You can view the full source code over at GitHub.
When I first put this together, I wanted to test it by playing against the computer myself. It worked, but only if I played predictably: any deviation produced a new board pattern that the computer didn’t know about. To really see a computer player “learn”, I implemented two additional game modes: learning computer vs. random computer and learning computer vs. learning computer. These allow the two players to play a few thousand games in about a second and gather statistics about the outcomes. In my next post, I’ll share my findings and stats from these automated game play modes.
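For reference, those automated modes boil down to a loop like the following. This is a schematic sketch only; Game, LearningPlayer, RandomPlayer, playOneRound, and Outcome are stand-ins for the project’s actual classes, not its real API:

// Schematic self-play driver (illustrative names, not the project's code).
public static void main(String[] args) {
    Game game = new Game(new LearningPlayer(), new RandomPlayer());
    int wins = 0, losses = 0, draws = 0;
    for (int round = 0; round < 5000; round++) {
        Outcome result = game.playOneRound(); // also updates the matchboxes
        switch (result) {
            case WIN:  wins++;   break;
            case LOSS: losses++; break;
            case DRAW: draws++;  break;
        }
    }
    System.out.printf("wins=%d losses=%d draws=%d%n", wins, losses, draws);
}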