Search in Imperfect Information Games

Posted: 2023-03-14 22:36:25


Original title: Search in Imperfect Information Games

Original abstract: From the very dawn of the field, search with value functions was a fundamental concept of computer games research. Turing's chess algorithm from 1950 was able to think two moves ahead, and Shannon's work on chess from 1950 includes an extensive section on evaluation functions to be used within a search. Samuel's checkers program from 1959 already combines search and value functions that are learned through self-play and bootstrapping. TD-Gammon improves upon those ideas and uses neural networks to learn those complex value functions, only for them to be used again within search. The combination of decision-time search and value functions has been present in the remarkable milestones where computers bested their human counterparts in long-standing challenging games: DeepBlue for Chess and AlphaGo for Go. Until recently, this powerful framework of search aided by (learned) value functions has been limited to perfect information games. As many interesting problems do not provide the agent with perfect information about the environment, this was an unfortunate limitation. This thesis introduces the reader to sound search for imperfect information games.
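To make the perfect-information framework described above concrete (decision-time search that looks a fixed number of moves ahead and falls back on a value function at the search frontier), here is a minimal sketch of depth-limited negamax. It is not the sound imperfect-information search the thesis introduces. The toy game (single-pile Nim where each move removes 1 to 3 stones and taking the last stone wins), the names `value_fn`, `negamax` and `best_move`, and the hand-written value function standing in for a learned one are all hypothetical choices made for illustration.

```python
MAX_TAKE = 3  # hypothetical rule: a move removes 1..MAX_TAKE stones


def legal_moves(stones: int) -> list[int]:
    """All legal numbers of stones to remove from the pile."""
    return [k for k in range(1, MAX_TAKE + 1) if k <= stones]


def value_fn(stones: int) -> float:
    """Hand-written stand-in for a learned value function.

    Returns the value for the player to move: in this Nim variant a pile
    that is a multiple of MAX_TAKE + 1 is lost for the mover under best play.
    """
    return -1.0 if stones % (MAX_TAKE + 1) == 0 else 1.0


def negamax(stones: int, depth: int) -> float:
    """Value of the position for the player to move, searching `depth` plies."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so the mover lost
    if depth == 0:
        return value_fn(stones)  # frontier reached: ask the value function
    # A position is worth the best of its children, negated because the
    # opponent moves next (zero-sum, alternating turns).
    return max(-negamax(stones - k, depth - 1) for k in legal_moves(stones))


def best_move(stones: int, depth: int = 2) -> int:
    """Choose a move by looking `depth` plies ahead (two, as in Turing's program)."""
    return max(legal_moves(stones), key=lambda k: -negamax(stones - k, depth - 1))


print(best_move(5))  # → 1 (leaves 4 stones, a lost position for the opponent)
print(best_move(7))  # → 3
```

The search itself is exact up to the chosen depth; everything beyond it is delegated to `value_fn`, which is the division of labour the abstract attributes to programs from Shannon's and Samuel's onward.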