Project Report


Motivation:

There are several reasons why I use Monte Carlo Tree Search (MCTS) in this final project. First, the lectures on game playing covered both Minimax and Monte Carlo Tree Search, and we were told that Minimax has some issues: even with alpha-beta pruning it can be expensive. The Monte Carlo tree approach is another way to go. Secondly, I did some research online and found that AlphaGo, the famous AI player, uses Monte Carlo Tree Search to choose its moves. MCTS is also widely used in a large number of board games, and our Pentago-Swap is a board game as well, which shows that this approach is trusted and widely adopted by game developers. Last but not least, this project has a competition constraint on turn timeouts, and MCTS can always produce a move by controlling the number of simulations it runs. Hence, I chose to write my project using the Monte Carlo approach with the help of the upper confidence bound (UCB) algorithm.

 


Explanation:

To explain my program: I added two .java files, MCTree.java and Node.java, to the student player folder. MCTree.java contains a public class MCTree and a class UCB. MyTools.java is never used as part of the program. StudentPlayer.java is modified to create an instance of the Monte Carlo tree and get the move from the tree.

In order to implement the Monte Carlo tree, I created a Node class first. A node object holds the Pentago move that reaches it, the Pentago state, the node's parent, and its children in the form of an ArrayList of nodes. In addition, there are two double fields, win and visit, which represent the accumulated winning value and the number of simulations that have passed through the node.
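A minimal sketch of this class is shown below, assuming the PentagoMove and PentagoBoardState types provided by the project framework; the exact field and constructor signatures are illustrative, not the definitive implementation.

    import java.util.ArrayList;

    // One node of the Monte Carlo tree, as described above.
    public class Node {
        PentagoMove move;            // the move that reaches this node
        PentagoBoardState state;     // the board state after that move
        Node parent;                 // null for the root
        ArrayList<Node> children = new ArrayList<>();
        double win = 0;              // accumulated winning value from rollouts
        double visit = 0;            // number of simulations through this node

        Node(PentagoBoardState state, PentagoMove move, Node parent) {
            this.state = state;
            this.move = move;
            this.parent = parent;
        }
    }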

The MCTree class has two data members: the root node and the playerID. When StudentPlayer.java creates an instance of MCTree, the root node is created as a new node that only takes the current state of the Pentago board and the student player's ID as its inputs.
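A corresponding sketch of the MCTree skeleton, again with assumed signatures (the root node has no move and no parent):

    // The search tree: remembers the root position and which player it searches for.
    public class MCTree {
        private Node root;
        private int playerID;

        public MCTree(PentagoBoardState boardState, int playerID) {
            this.root = new Node(boardState, null, null);
            this.playerID = playerID;
        }
    }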

There are four steps in MCTS: selection, expansion, random simulation (rollout), and backpropagation. My algorithm repeats these four steps a given number of times. When the (driver) method getNextMove is called, it performs MCTS and returns the desired move at the end.

For the purpose of optimization, I apply a first-play strategy that tries to place the piece at one of the four middle positions of the quadrants: {(1,1), (1,4), (4,1), (4,4)}. When getNextMove is called on my turn, it checks whether the board is empty, which means I am the first player and will play the first move. In that case I set the first move to position (1,1) and swap the top-right and bottom-right quadrants instead of playing randomly. If there is exactly one piece on the board and I am the second player, I search the available moves for one of those four positions and place my piece there, as in the sketch below.
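A sketch of the second-player part of this check; getAllLegalMoves and the getMoveCoord()/getX()/getY() accessors are assumptions about the project's move API:

    // Look for a legal move that places a piece on one of the four
    // quadrant-centre squares; return null if none exists.
    private PentagoMove centerOpening(PentagoBoardState state) {
        int[][] centers = {{1, 1}, {1, 4}, {4, 1}, {4, 4}};
        for (PentagoMove m : state.getAllLegalMoves()) {
            for (int[] c : centers) {
                if (m.getMoveCoord().getX() == c[0]
                        && m.getMoveCoord().getY() == c[1]) {
                    return m;
                }
            }
        }
        return null; // fall back to the normal MCTS search
    }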

After that, I start repeating the actual MCTS. Because this approach depends on random simulations, I use real time as the condition of repetition: the MCTS loop executes for at most 1500 ms, so that I can collect as many useful statistics as possible.
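The whole loop can be sketched as follows, using the phase methods described below; this is an outline under those naming assumptions, not the exact implementation:

    // Run MCTS cycles until the 1500 ms budget is spent, then return the
    // move stored in the root child with the best UCB value.
    public PentagoMove getNextMove() {
        long start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start < 1500) {
            Node best = selectBestNode(root);      // 1. selection
            int choice = expandNode(best);         // 2. expansion
            Node toExplore = best;
            if (choice == 1 && !best.children.isEmpty()) {
                toExplore = best.children.get(0);  // roll out from a fresh child
            }
            int win = rollout(toExplore);          // 3. random simulation
            backPropagation(toExplore, win);       // 4. backpropagation
        }
        return findBestNodeWithUCB(root).move;
    }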

The first step is selection. Since we want to select the most promising leaf node under the root, the program calls selectBestNode(root), which returns the desired node. In the selectBestNode(Node root) method, I use a while loop to traverse the tree in order to find the best node. If the current node is not a leaf node, it calls findBestNodeWithUCB(node), which calculates the UCB value according to the UCB formula:

    UCB(i) = win(i) / visit(i) + c * sqrt(ln(visit(parent)) / visit(i))

where win(i) and visit(i) are the win and visit counts of child i, visit(parent) is the visit count of its parent, and c is a scaling constant (set to 2 in my implementation, as discussed under Future Improvements).

The formula is composed of two parts: one exploits the known statistics, and the other explores the nodes that have not been tried much by giving them bonus points.

The selection phase terminates when we reach the bottom of the tree, returning the most promising node according to UCB.
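A sketch of the selection phase and the UCB computation (c = 2, as explained under Future Improvements; unvisited nodes get an effectively infinite score so they are always tried first):

    // UCB value of a child node: exploitation term plus exploration bonus.
    private double ucbValue(Node node) {
        if (node.visit == 0) {
            return Double.MAX_VALUE; // always try unvisited children first
        }
        double exploit = node.win / node.visit;
        double explore = 2 * Math.sqrt(Math.log(node.parent.visit) / node.visit);
        return exploit + explore;
    }

    // Pick the child of this node with the highest UCB value.
    private Node findBestNodeWithUCB(Node node) {
        Node best = node.children.get(0);
        for (Node child : node.children) {
            if (ucbValue(child) > ucbValue(best)) best = child;
        }
        return best;
    }

    // Walk down from the root, always following the best UCB child,
    // until a leaf node is reached.
    private Node selectBestNode(Node root) {
        Node node = root;
        while (!node.children.isEmpty()) {
            node = findBestNodeWithUCB(node);
        }
        return node;
    }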

The second step is expansion. The expandNode(Node) method returns an integer indicating whether the best leaf node has been visited or not. If the node has never been visited before, the program gives it a try directly, so 0 is returned. Otherwise, it adds children to the best node: first I create a clone of the current board state and get all the legal moves; for each move, it creates the action and a node for that action, then adds the newly created node as a child of the best node. When all the available actions (nodes) have been added as children, 1 is returned. The return values 0 and 1 represent the choice made for the rollout.
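A sketch of this step; cloning the state and processMove are assumptions about the PentagoBoardState API:

    // Return 0 if the leaf was never visited (roll out from it directly);
    // otherwise attach one child per legal move and return 1.
    private int expandNode(Node node) {
        if (node.visit == 0) {
            return 0;
        }
        for (PentagoMove move : node.state.getAllLegalMoves()) {
            PentagoBoardState clone = (PentagoBoardState) node.state.clone();
            clone.processMove(move);
            node.children.add(new Node(clone, move, node));
        }
        return 1;
    }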

The third step is random simulation (rollout). It runs a random simulation from the best leaf node and terminates when a result is reached. There are two cases for starting the rollout. When the choice of rollout is 0, nothing extra is done: the node for the next stage is the best leaf node itself. When it is 1, the program first checks that the children were correctly created and, if so, chooses the first child as the node for the next stage. After this case check, the actual rollout is performed by rollout(Node). It first creates a clone of the current state and keeps performing random plays until the game is over. If the student is the winner, the positive number 10 is returned as the winning value; if the opponent wins the game, -10 is returned; when there is a tie, 0 is returned. This process is very fast because it only performs random moves.
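A sketch of the rollout itself; getRandomMove, gameOver, and getWinner are assumptions about the project API, and player IDs are assumed to be 0 and 1:

    // Play uniformly random moves on a cloned state until the game ends,
    // then score the result from the student player's point of view.
    private int rollout(Node node) {
        PentagoBoardState clone = (PentagoBoardState) node.state.clone();
        while (!clone.gameOver()) {
            clone.processMove((PentagoMove) clone.getRandomMove());
        }
        if (clone.getWinner() == playerID) return 10;      // student wins
        if (clone.getWinner() == 1 - playerID) return -10; // opponent wins
        return 0;                                          // tie
    }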

The last step is backpropagation. It traverses back up from the leaf to the root to update the win and visit values. The method backPropagation(Node, win) takes the node that was explored and the winning value given by the rollout. First, it increments the node's own visit by 1 and its win by the input winning value. A temporary node reference is then used for the backward traversal: it moves up to the parent node and performs the same updates until the root is reached. With that, one cycle of MCTS is complete.
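A sketch of this traversal:

    // Walk from the explored node back up to the root, updating the
    // visit count and the accumulated winning value at every level.
    private void backPropagation(Node node, int win) {
        Node temp = node;
        while (temp != null) {
            temp.visit += 1;
            temp.win += win;
            temp = temp.parent; // stops once the root has been updated
        }
    }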

Once all the cycles of MCTS are completed, the program selects the child of the root with the best UCB value. The move stored in that node object is returned as the best move.

To reach better performance, I apply an additional strategy before returning the best move, in case the best move is not the truly optimal one. The idea is to check all the legal moves and perform them one by one on a new, cloned state: if there exists a move that wins immediately, I just return it. As a following step, the program performs the candidate best move and examines the opponent's possible replies. If the opponent can win the game right after my best move, I lower that node's UCB value and give up the move. This repeats until a best move is found for which this situation no longer occurs.
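A sketch of both checks; returning null stands for "reject the candidate and fall back to the next-best node", and the API calls are the same assumptions as above:

    // 1. If any legal move wins on the spot, play it.
    // 2. If the candidate best move lets the opponent win immediately, reject it.
    private PentagoMove safetyCheck(PentagoBoardState state, PentagoMove candidate) {
        for (PentagoMove move : state.getAllLegalMoves()) {
            PentagoBoardState clone = (PentagoBoardState) state.clone();
            clone.processMove(move);
            if (clone.gameOver() && clone.getWinner() == playerID) return move;
        }
        PentagoBoardState afterBest = (PentagoBoardState) state.clone();
        afterBest.processMove(candidate);
        for (PentagoMove reply : afterBest.getAllLegalMoves()) {
            PentagoBoardState clone = (PentagoBoardState) afterBest.clone();
            clone.processMove(reply);
            if (clone.gameOver() && clone.getWinner() == 1 - playerID) return null;
        }
        return candidate;
    }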

Finally, if none of the previous strategies has already produced a move, the program simply returns the MCTS best move to the game server.

 


Theoretical Basis:

Monte Carlo Tree Search is a heuristic search algorithm for making optimal decisions and is used in many games. It is a combination of random simulation and a tree policy. Unlike the Minimax algorithm, which needs a complete game tree, MCTS does not require tactical knowledge and employs random simulations with no knowledge of the whole game. "MCTS eschews the typical brute force tree searching methods, and utilizes statistical sampling instead."

MCTS combines the random sampling of the traditional Monte Carlo method with tree search strategies. The Monte Carlo method computes a result by repeated random sampling. "In MCTS, the random sampling is in the form of random simulations which are used to expand the game tree." The next move is determined by the game tree. "MCTS grows the game tree iteratively. With each iteration, the game tree is traversed and expanded. Over time, the game tree will converge. This means that the same path in the tree is traversed in each iteration." This indicates that MCTS has found a promising move that leads to the largest win value from the current state. To conclude, MCTS is a probabilistic method that makes use of randomness. It does not always find the best move, but it reliably succeeds at choosing moves that lead to greater chances of winning.

[Monte Carlo Tree Search and Its Applications] Retrieved from https://digitalcommons.morris.umn.edu/cgi/viewcontent.cgi?article=1028&context=horizons

 


Advantages:


Disadvantages:


Future Improvements:

Actually, I tried to use a neural network to improve my UCB formula, but I failed to finish the actual implementation; it requires too much advanced knowledge for me. Initially, we set the number of MCTS cycles to small numbers like 3 or 10. After several experiments, I changed my idea and adjusted the number of cycles based on real time, for the purpose of running as many simulations as possible within the time limit.

One applicable improvement is to combine MCTS and Minimax together, with a rule for deciding which to perform at each step. First of all, I would try to add an evaluation system to MCTS. By doing this, I would consider many board states and assign them different values to strengthen my performance. For example, when I see three pieces in a row, I can assign the value 10; when I see four pieces in a row, I can assign the value 10000, since it is only one move away from a win. A hypothetical sketch of this idea follows.
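This sketch only illustrates the scoring; computing the longest run of pieces from the board is omitted, and the method and values are illustrative, not an implemented feature:

    // Score a position by the longest run of my pieces, so that
    // near-wins dominate the evaluation.
    private int evaluate(int longestRun) {
        if (longestRun >= 4) return 10000; // one move away from five in a row
        if (longestRun == 3) return 10;
        return 0;
    }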

Instead of purely random simulations, I may try other tree policies for this stage in order to collect better data. RAVE (Rapid Action Value Estimation) is recommended online, and I want to learn it in the future.

One factor that is also important is the scaling constant. In my UCB implementation, I set it to 2 rather than sqrt(2) after a number of experiments. Nonetheless, this constant is still not a good one; it would take a huge number of experiments, perhaps using machine learning techniques, to finally decide what the best value is.


References:

[Tic Tac Toe at the Monte Carlo] Retrieved from https://medium.com/swlh/tic-tac-toe-at-the-monte-carlo-a5e0394c7bc2

[Monte Carlo Tree Search for Game AI] Retrieved from https://spin.atomicobject.com/2015/12/12/monte-carlo-tree-search-algorithm-game-ai/

[Monte Carlo Tree Search - Beginner Guide] Retrieved from https://int8.io/monte-carlo-tree-search-beginners-guide/#Backpropagation_8211_propagating_back_the_simulation_result

[Monte Carlo Tree Search (MCTS) research hub] Retrieved from http://mcts.ai/about/index.html/

[Introduction to Monte Carlo Tree Search: The Game-Changing Algorithm behind DeepMind's AlphaGo] Retrieved from https://www.analyticsvidhya.com/blog/2019/01/monte-carlo-tree-search-introduction-algorithm-deepmind-alphago/