Adversarial Search and Games

Date: 23.04.03 ~ 23.04.09

Writer: 9tailwolf

Introduction

Chapter 5 deals with game problem, in other words, adversarial search. In this chapter, we will learn about pruning algorithm, evaluation function method, bridge game.

Optimal Decisions in Games

Optimal Decisions is kinds of strategy of game to make maximum utility.

Minimax Algorithm

Minimax Algorithm is algorithm that calculate maxinum-minimum utility by recursion. It executes complete and optimal DFS about game tree. Since it is one of the DFS, time complexity is \(O(b^{m})\) and space complexity is \(O(bm)\).

Algorithm

def Minimax(state):
    return MinValue(result(state))

def MaxValue(state):
    if terminal_test(state):
        return Utility(state)
    
    v = -inf
    for a in actions(state):
        v = max(v,MinValue(result(state,a)))
    
    return v

def MinValue(state):
    if terminal_test(state):
        return Utility(state)
    
    v = inf
    for a in actions(state):
        v = min(v,MaxValue(result(state,a)))
    
    return v

α-β Pruning

α-β Pruning is algorithm of optimal minimax. Difference is more efficient because cutting nodes that can’t affect in final decisions. With perfect ordering, time complexity is \(O(b^{\frac{m}{2}})\)

Algorithm

def alpha_beta_search(state):
    return MinValue(result(state),-inf,inf)

def MaxValue(state,a,b):
    if terminal_test(state):
        return Utility(state)
    
    v = -inf
    for act in actions(state):
        v = max(v,MinValue(result(state,act)))
        if v>=b:
            return v
        a = max(a,v)
    
    return v

def MinValue(state,a,b):
    if terminal_test(state):
        return Utility(state)
    
    v = inf
    for act in actions(state):
        v = min(v,MaxValue(result(state,act)))
        if v<=a:
            return v
        b = min(b,v)
    
    return v

Improval Algorithm

EVAL and CUTOFF function To make use of limited computation time, We can use EVAL and CUTOFF. EVAL function is method that estimate desirability of position. CUFOFF function determine when EVAL function apply.

Algorithm

def H_minimax(s,d):
    if CUTOFF(s,d):
        return EVAL(s)
    elif Player(s)==MAX:
        return max([H_minimax(Result(s,a),d+1) for a in actions(s)])
    elif Player(s)==MIN:
        return min([H_minimax(Result(s,a),d+1) for a in actions(s)])

To make more efficient EVAL function, It can be evaluated similar to real utility function. And to make more efficient CUTOFF function, It should work in quiescent state by using quiescence search. But it is hard to eliminate Horizon Effect(Effect that hiding bad decision by strategy)

Forward Pruning Foward Pruning is a algorithm that eliminate some nodes. But it is a little bit dangerous because of not optimal.

There is some way to implement Foward Pruning.

Beam Search
PROBCUT function : Cutting by probability.
late move reduction : Remove overlapped way that appear later.

Monte Carlo Methods

Monte Carlo Methods is algorithm that estimate probability by simulation. It is consist of four steps.

Selection : In selection state, we use UCB, Upper Confidence Bound function. It is a function to measure probability of win. Fomular is as follows.

\[UCB(b) = \frac{U(n)}{N(n)} + C \times \sqrt{\frac{\log{N(P(n))}}{N(n)}}\]

\(U(n)\) is number of win with n nodes, \(N(n)\) is number of simulations, \(P(n)\) is Parent node of n in tree, \(C\) is constant that is determined by simulations. By UCB function, we can select node of the greatest UCB value.

Expansion
Simulation
Back-Propagation

By repeating the above process, the probability can be calculated.

Algorithm

def Monte_Carlo_Tree_Search(state):
    tree = Node(state)
    while is_time_remaining():
        leaf = select(tree)
        child = expand(leaf)
        result = simulate(child)
        back_propagate(result,child)
    return max(actions(state),key = value)

Stochastic Games

In stochastic games, chance introduced by luck. In this game, we choose the highest expected value.

Algorithm

def Expect_minimax(s):
    if IS_TERMINAL(s,d):
        return UTILITY(s)
    elif To_Move(s)==MAX:
        return max([Expect_minimax(Result(s,a)) for a in actions(s)])
    elif To_Move(s)==MIN:
        return min([Expect_minimax(Result(s,a)) for a in actions(s)])
    elif To_Move(s)==CHANCE:
        return sum([P(r)*Expect_minimax(Result(s,a)) for a in actions(s)])