# RESEARCH ON FUSION ALGORITHM OF MULTI-ATTRIBUTE DECISION MAKING AND REINFORCEMENT LEARNING BASED ON INTUITIONISTIC FUZZY NUMBER IN WARGAME ENVIRONMENT

**Anonymous authors**

Paper under double-blind review

ABSTRACT

Intelligent games have attracted growing interest within the artificial intelligence research community. This paper proposes an algorithm that combines multi-attribute decision making with reinforcement learning and applies it to wargaming. It addresses two problems: the agent's low win rate against specific rules and its inability to converge quickly during intelligent wargame training. We study the combined algorithm in a wargame simulation environment, which yields data on the confrontation between the red and blue sides. We calculate the weight of each attribute using intuitionistic fuzzy number weight calculations, and then determine the threat posed by each of the opponent's game agents. The resulting threat degrees are used in the red side's reinforcement learning reward function, on which the AC framework is trained, yielding an algorithm that combines multi-attribute decision making with reinforcement learning. A simulation experiment confirms that the combined algorithm is significantly more intelligent than the pure reinforcement learning algorithm. By compensating for the shortcomings of the agent's neural network under the sparse rewards of large-map combat games, the algorithm effectively reduces the difficulty of convergence. To our knowledge, this is the first algorithm design for intelligent wargaming in this field that combines multi-attribute decision making with reinforcement learning. A further novelty of this research is its interdisciplinary nature, spanning the design of intelligent wargames and the improvement of reinforcement learning algorithms.

1 INTRODUCTION

Artificial intelligence (AI) and machine learning (ML) are increasingly used in real-world applications. For example, AlphaGo attracted wide attention in the research community and in society by showing that AI can defeat professional human players in the board game Go. AlphaStar, another strong AI program, has achieved great success in the human-machine confrontation game StarCraft (Pang et al., 2019; Silver et al., 2016). In real-time strategy (RTS) games, AI-driven methods are widely studied and integrated into game AI design to make computer opponents more intelligent and the confrontation experience more realistic. In the game King of Glory, Ye et al. (2020) used an improved PPO algorithm to train the game AI, with positive results. Using reinforcement learning techniques, Silver et al. (2017) developed a training framework that requires no human knowledge other than the rules of the game, allowing AlphaGo to train itself and reach a high level of intelligence in the process. Using deep reinforcement learning and supervised strategy learning, Barriga et al. (2019) improved AI performance in RTS games and defeated the built-in game AI. AI has become a hot research topic in recent years, with a wide variety of applications such as deduction and analysis (Schrittwieser et al., 2020; Barriga et al., 2017; O'Hanlon, 2021). However, there is still limited research addressing the problem of slow convergence during AI training under a variety of conditions, especially in human-AI confrontation games.

An index measures the value of an object or a parameter of an evaluation system; it is the scale of how useful the object is to the subject. As an attribute value, it expresses subjective judgment or objective fact in numbers or words. It is therefore important to select a scientifically valid target threat assessment (TA) index and to evaluate that index rigorously. Target threat assessment supports decision making in current intelligent wargames. Mainstream intelligent game decision making is based mainly on rules, decision trees, reinforcement learning, and related technologies, and rarely incorporates multi-attribute decision-making theory and methods. This paper takes the actual wargame data obtained from a wargame environment and transforms the multi-attribute threat assessment indicators into a unified expression. Using three forms of expression (real numbers, interval numbers, and intuitionistic fuzzy numbers), multi-attribute decision-making theory and methods are applied to analyse the target threat degree. An enhanced reward function based on the resulting threat degree is then established to train a more effective intelligent decision-making model. To the best of our knowledge, this is the first work that combines multi-attribute decision making with reinforcement learning to produce a high-performance game AI in a wargame experiment.

2 WARGAMING MULTIPLE ATTRIBUTE INDEX THREAT QUANTIFICATION

Obtaining scientific evaluation results requires a reasonable quantification of indicators. Target threat assessment is an important aspect of decision support in wargames, and its result directly affects the effectiveness of the wargame AI. This section introduces threat quantification methods for the different types of indicators. Taking the target type into account, it divides the target threat into six indicators: target distance threat, target attack threat, target speed threat, terrain visibility threat, environmental indicator threat, and target defense value. The acquired confrontation data are assigned to these indicator types, and the corresponding comprehensive threat value is then calculated. Table 1 lists the attributes and meanings of the indicators.

Table 1: A list of indicator attributes and their meanings

| Indicator | Attribute | Meaning |
| --- | --- | --- |
| Target distance threat | Cost type | The distance between the two parties influences the kill probability. |
| Target attack threat | Benefit type | The threat degree is determined by the type, range, and lethality of the opponent's weapon. |
| Target speed threat | Benefit type | The threat posed by the opponent's speed. |
| Terrain visibility threat | Intervisibility > no intervisibility | Whether or not the terrain permits intervisibility directly affects the threat. |
| Environmental indicator threat | Benefit type | The more the opponent's environment favors concealment and mobility, the more dangerous the opponent is. |
| Target defense value | Cost type | The stronger the opponent's armor, the harder the opponent is to destroy. |

3 ESTABLISHMENT OF A MULTI-ATTRIBUTE QUANTITATIVE THREAT MODEL BASED ON INTUITIONISTIC FUZZY NUMBERS

Our framework uses interval numbers to express whether intervisibility exists, which yields different threat values. The quantified values of the other threat indicators are real numbers. To unify the solution method, our algorithm converts all interval numbers and real numbers to intuitionistic fuzzy numbers and computes the threat degree from these intuitionistic fuzzy numbers.

(1) Intuitionistic fuzzy entropy describes the degree of fuzziness of the judgment information provided by an intuitionistic fuzzy set. The larger the intuitionistic fuzzy entropy of an evaluation criterion, the smaller its weight; conversely, the smaller the entropy, the larger the weight. We calculate the entropy weight of each attribute based on the formulas of Vlachos & Sergiadis (2007). In the steps that follow, the ideal solution $A^+$ is a conceived optimal solution (scheme) whose attribute values take the best values among the alternatives, and the negative ideal solution $A^-$ is a conceived worst solution (scheme) whose attribute values take the worst values among the alternatives. The relative closeness $p_i$ is obtained by comparing each alternative scheme with the ideal solution and the negative ideal solution. An alternative that is closest to the ideal solution and at the same time farthest from the negative ideal solution is the best among the alternatives.

$$
H_j = -\frac{1}{n \ln 2} \sum_{i=1}^{n} \left[ \mu_{ij} \ln \mu_{ij} + \nu_{ij} \ln \nu_{ij} - (\mu_{ij} + \nu_{ij}) \ln (\mu_{ij} + \nu_{ij}) - (1 - \mu_{ij} - \nu_{ij}) \ln 2 \right] \tag{1}
$$

If $\mu_{ij} = 0$, $\nu_{ij} = 0$, then $\mu_{ij} \ln \mu_{ij} = 0$, $\nu_{ij} \ln \nu_{ij} = 0$, and $(\mu_{ij} + \nu_{ij}) \ln (\mu_{ij} + \nu_{ij}) = 0$.

The entropy weight of the $j$-th attribute is defined as:

$$
w_j = \frac{1 - H_j}{n - \sum_{j=1}^{n} H_j} \tag{2}
$$

where $w_j \geq 0$, $j = 1, 2, \cdots, n$, and $\sum_{j=1}^{n} w_j = 1$.

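
The entropy-weight computation of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the layout `matrix[i][j]` (the pair $(\mu, \nu)$ of alternative `i` on attribute `j`), and the sample values are assumptions, and the entropy is normalized by the number of summed elements.

```python
import math

def xlogx(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def ifs_entropy(column):
    """Eq. (1): entropy of one attribute, given its list of (mu, nu) pairs."""
    total = 0.0
    for mu, nu in column:
        pi = 1.0 - mu - nu  # hesitation degree
        total += xlogx(mu) + xlogx(nu) - xlogx(mu + nu) - pi * math.log(2)
    # Assumption: normalize by the number of elements that were summed.
    return -total / (len(column) * math.log(2))

def entropy_weights(matrix):
    """Eq. (2): one weight per attribute (column) of the decision matrix."""
    n_attr = len(matrix[0])
    entropies = [ifs_entropy([row[j] for row in matrix]) for j in range(n_attr)]
    denom = n_attr - sum(entropies)  # zero only if every entropy equals 1
    return [(1.0 - h) / denom for h in entropies]

# Hypothetical decision matrix: 3 alternatives x 2 attributes.
matrix = [
    [(0.7, 0.2), (0.4, 0.5)],
    [(0.5, 0.3), (0.6, 0.1)],
    [(0.9, 0.05), (0.3, 0.3)],
]
weights = entropy_weights(matrix)
print(weights)
```

In this sketch an attribute whose entries are all crisp, such as $(1, 0)$, has zero entropy and receives the largest weight, while an attribute whose entries are all $(0.5, 0.5)$ has entropy 1 and receives zero weight.
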
(2) Determine the optimal solution $A^+$ and the worst solution $A^-$ using the following formulas:

$$
\begin{aligned}
A^+ &= \left( \langle \mu_1^+, \nu_1^+ \rangle, \langle \mu_2^+, \nu_2^+ \rangle, \cdots, \langle \mu_n^+, \nu_n^+ \rangle \right) \\
A^- &= \left( \langle \mu_1^-, \nu_1^- \rangle, \langle \mu_2^-, \nu_2^- \rangle, \cdots, \langle \mu_n^-, \nu_n^- \rangle \right)
\end{aligned} \tag{3}
$$

where

$$
\mu_i^+ = \max_{j=1,2,\cdots,m} \{\mu_{ij}\}, \quad \nu_i^+ = \min_{j=1,2,\cdots,m} \{\nu_{ij}\} \tag{4}
$$

$$
\mu_i^- = \min_{j=1,2,\cdots,m} \{\mu_{ij}\}, \quad \nu_i^- = \max_{j=1,2,\cdots,m} \{\nu_{ij}\} \tag{5}
$$

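
As a minimal sketch of Eqs. (3) to (5), the snippet below builds the optimal and worst solutions attribute by attribute. It assumes that the maximum and minimum are taken over the alternatives, and that `matrix[row][col]` stores the pair $(\mu, \nu)$ of alternative `row` on attribute `col`; the function name and the sample values are hypothetical.

```python
def ideal_solutions(matrix):
    """Return (a_pos, a_neg): one (mu, nu) pair per attribute."""
    n_attr = len(matrix[0])
    a_pos, a_neg = [], []
    for col in range(n_attr):
        mus = [row[col][0] for row in matrix]
        nus = [row[col][1] for row in matrix]
        a_pos.append((max(mus), min(nus)))  # Eq. (4): best membership, least non-membership
        a_neg.append((min(mus), max(nus)))  # Eq. (5): worst membership, most non-membership
    return a_pos, a_neg

# Hypothetical decision matrix: 2 alternatives x 2 attributes.
matrix = [
    [(0.7, 0.2), (0.4, 0.5)],
    [(0.5, 0.3), (0.6, 0.1)],
]
a_pos, a_neg = ideal_solutions(matrix)
print(a_pos)
print(a_neg)
```
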
(3) Calculate the similarity between two intuitionistic fuzzy numbers $A = \langle \mu_1, \nu_1 \rangle$ and $B = \langle \mu_2, \nu_2 \rangle$ as follows:

$$
\begin{aligned}
s\left( \langle \mu_1, \nu_1 \rangle, \langle \mu_2, \nu_2 \rangle \right) = 1 &- \frac{\left| 2(\mu_1 - \mu_2) - (\nu_1 - \nu_2) \right|}{3} \left( 1 - \frac{\pi_1 + \pi_2}{2} \right) \\
&- \frac{\left| 2(\nu_1 - \nu_2) - (\mu_1 - \mu_2) \right|}{3} \cdot \frac{\pi_1 + \pi_2}{2}
\end{aligned} \tag{6}
$$

where $\pi_1 = 1 - \mu_1 - \nu_1$ and $\pi_2 = 1 - \mu_2 - \nu_2$.

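
The similarity measure of Eq. (6) translates directly into code. The sketch below is illustrative only; the function name and the tuple representation `(mu, nu)` are assumptions.

```python
def similarity(a, b):
    """Eq. (6): similarity between two intuitionistic fuzzy numbers (mu, nu)."""
    mu1, nu1 = a
    mu2, nu2 = b
    pi1 = 1 - mu1 - nu1  # hesitation degree of a
    pi2 = 1 - mu2 - nu2  # hesitation degree of b
    avg_pi = (pi1 + pi2) / 2
    term1 = abs(2 * (mu1 - mu2) - (nu1 - nu2)) / 3
    term2 = abs(2 * (nu1 - nu2) - (mu1 - mu2)) / 3
    return 1 - term1 * (1 - avg_pi) - term2 * avg_pi

print(similarity((0.7, 0.2), (0.7, 0.2)))  # identical numbers
print(similarity((1, 0), (0, 1)))          # fully opposed crisp numbers
```

Identical numbers give a similarity of 1, and the two opposed crisp numbers $\langle 1, 0 \rangle$ and $\langle 0, 1 \rangle$ give 0, so the measure spans the unit interval in these two extreme cases.
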
(4) Calculate the similarities $S_i^+$ and $S_i^-$ between each solution and the optimal solution and the worst solution, respectively, based on the following formulas:

$$
\begin{aligned}
S_i^+ &= \sum_{k=1}^{n} w_k \cdot s\left( \langle \mu_k^+, \nu_k^+ \rangle, \langle \mu_{ik}, \nu_{ik} \rangle \right) \\
S_i^- &= \sum_{k=1}^{n} w_k \cdot s\left( \langle \mu_k^-, \nu_k^- \rangle, \langle \mu_{ik}, \nu_{ik} \rangle \right)
\end{aligned} \tag{7}
$$

(5) Then calculate the relative closeness:

$$
p_i = \frac{S_i^-}{S_i^+ + S_i^-} \tag{8}
$$

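
Steps (4) and (5) can be combined into one short routine. The following is a hedged sketch rather than the authors' code: the function names, the equal weights, and the sample matrix are hypothetical, `matrix[i][k]` is assumed to hold the pair $(\mu_{ik}, \nu_{ik})$ of alternative `i` on attribute `k`, and the closeness follows Eq. (8) exactly as written.

```python
def similarity(a, b):
    """Eq. (6): similarity between two intuitionistic fuzzy numbers (mu, nu)."""
    mu1, nu1 = a
    mu2, nu2 = b
    avg_pi = ((1 - mu1 - nu1) + (1 - mu2 - nu2)) / 2
    term1 = abs(2 * (mu1 - mu2) - (nu1 - nu2)) / 3
    term2 = abs(2 * (nu1 - nu2) - (mu1 - mu2)) / 3
    return 1 - term1 * (1 - avg_pi) - term2 * avg_pi

def relative_closeness(matrix, weights, a_pos, a_neg):
    """Return (s_pos, s_neg, p) with one entry per alternative."""
    s_pos, s_neg, p = [], [], []
    for row in matrix:
        # Eq. (7): weighted similarity to the optimal and worst solutions.
        sp = sum(w * similarity(ref, x) for w, ref, x in zip(weights, a_pos, row))
        sn = sum(w * similarity(ref, x) for w, ref, x in zip(weights, a_neg, row))
        s_pos.append(sp)
        s_neg.append(sn)
        p.append(sn / (sp + sn))  # Eq. (8) as stated in the text
    return s_pos, s_neg, p

# Hypothetical inputs: 2 alternatives x 2 attributes with equal weights.
matrix = [
    [(0.7, 0.2), (0.6, 0.1)],
    [(0.5, 0.3), (0.4, 0.5)],
]
weights = [0.5, 0.5]
a_pos = [(0.7, 0.2), (0.6, 0.1)]  # coincides with alternative 0
a_neg = [(0.5, 0.3), (0.4, 0.5)]  # coincides with alternative 1
s_pos, s_neg, p = relative_closeness(matrix, weights, a_pos, a_neg)
print(s_pos, s_neg, p)
```

In this toy input the first alternative coincides with $A^+$ and the second with $A^-$, so $S_1^+ = 1$ and $S_2^- = 1$, which makes the behavior of Eq. (8) easy to check by hand.
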