Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper introduces a board game, Tetris Link, that is as yet unexplored and appears to be highly challenging. Tetris Link has a large branching factor and deceptive lines of play that search has a hard time uncovering. Our experiments show that finding good moves is very difficult for a computer player. We explore heuristic planning and two other approaches: reinforcement learning and Monte Carlo tree search. Curiously, a naive heuristic approach fueled by expert knowledge is still stronger than the planning and learning approaches. We therefore presume that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon.
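To make the search difficulty concrete: with Tetris Link's large branching factor, the exploration bonus of a UCT-style Monte Carlo tree search is spread over very many children, so deceptive lines remain under-sampled. Below is a minimal sketch of the UCT selection step; the Node class and the exploration constant are illustrative assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical node statistics; a real MCTS node also stores
    # the move, the game state, and child pointers.
    wins: float = 0.0
    visits: int = 0

def uct_select(children, exploration=1.4):
    """Pick the child maximizing the UCT score.

    With hundreds of legal placements per turn, most children keep
    low visit counts, the exploration term dominates, and the search
    spreads thin -- one intuition for why deceptive lines stay hidden.
    """
    total_visits = sum(c.visits for c in children)
    log_n = math.log(max(total_visits, 1))

    def score(c):
        if c.visits == 0:
            return float("inf")  # try every unvisited move first
        exploit = c.wins / c.visits
        explore = exploration * math.sqrt(log_n / c.visits)
        return exploit + explore

    return max(children, key=score)
```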
Müller-Brockhausen, M.F.T.; Preuss, M.; Plaat, A. 2021
The idea of transfer in reinforcement learning (TRL) is intriguing: being able to transfer knowledge from one problem to another without learning everything from scratch. This promises quicker learning and the ability to learn more complex methods. To gain insight into the field and to detect emerging trends, we performed a database search. We note a surprisingly late adoption of deep learning, starting in 2018. The introduction of deep learning has not yet solved the greatest challenge of TRL: generalization. Transfer between different domains works well when domains have strong similarities (e.g. MountainCar to Cartpole), and most TRL publications focus on different tasks within the same domain that have few differences. Most TRL applications we encountered compare their improvements against self-defined baselines, and the field is still missing unified benchmarks. We consider this to be a disappointing situation. For the future, we note that: (1) A clear measure of task similarity is needed. (2) Generalization needs to improve. Promising approaches merge deep learning with planning via MCTS or introduce memory through LSTMs. (3) The lack of benchmarking tools needs to be remedied to enable meaningful comparison and to measure progress. Alchemy and Meta-World are already emerging as interesting benchmark suites. We note that another development, the increase in procedural content generation (PCG), can improve both benchmarking and generalization in TRL.
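The core idea, reusing knowledge instead of starting from scratch, can be made concrete with a small sketch. Warm-starting a target task's value table from a source task requires a mapping between "similar" states, and writing that mapping down is exactly the open task-similarity problem noted above. The tasks, mapping, and defaults here are hypothetical; this is one common TRL pattern, not a method from the survey itself.

```python
# A minimal sketch of tabular transfer: initialize the target task's
# Q-table from values learned on a source task. All names here are
# illustrative assumptions.

def transfer_q_table(source_q, state_mapping, actions, default=0.0):
    """Warm-start a target Q-table.

    source_q:      dict (source_state, action) -> value, learned on the source task
    state_mapping: dict target_state -> most similar source_state;
                   defining this mapping well is the open task-similarity problem
    actions:       action set shared by both tasks
    """
    target_q = {}
    for tgt_state, src_state in state_mapping.items():
        for a in actions:
            target_q[(tgt_state, a)] = source_q.get((src_state, a), default)
    return target_q
```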
Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a deep neural network that is trained using self-play. The unified deep neural network has a policy head and a value head, and during training, the optimizer minimizes the sum of policy loss and value loss. However, it is not clear if and under which circumstances other formulations of the loss function are better. Therefore, we perform experiments with different combinations of these two minimization targets. In contrast to many recent papers that adopt single-run experiments and use whole-history Elo ratings from self-play, we propose to use repeated runs. The results show that whole-history Elo describes the training performance quite well within each training run, but suffers from a high self-play bias, making it incomparable among different training runs. Therefore, inspired by the AlphaGo series of papers, we adopt a performance assessment that avoids self-play bias: the final best player Elo rating, obtained from direct competition between the evolved players. For relatively small games, this evaluation method surprisingly shows that minimizing only the value loss achieves the strongest playing strength in the final best players' round-robin tournament. These results indicate that more research is needed into the relative importance of the value function and the policy function in small games.
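For reference, AlphaZero's standard training target sums a squared value error and a policy cross-entropy (plus L2 regularization). One way to express "different combinations" of the two targets is with an interpolation weight; the λ notation below is ours, not necessarily the paper's.

```latex
% AlphaZero-style loss for network parameters \theta: game outcome z,
% value-head output v, MCTS visit distribution \pi, policy-head output p.
% \lambda = 1 gives value-only training, \lambda = 0 policy-only, and
% \lambda = 1/2 (rescaled) the standard summed loss; \lambda is
% illustrative notation for the combinations under study.
l(\theta) = \lambda \, (z - v)^2 \;-\; (1 - \lambda) \, \pi^{\top} \log \mathbf{p} \;+\; c \, \lVert \theta \rVert^2
```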
The LOFAR radio telescope is a low-frequency aperture synthesis radio telescope with headquarters in the Netherlands and stations across Europe. As a general-purpose telescope, LOFAR produces petabytes of data each year serving a wide range of science cases. The data volumes produced are difficult or impossible to process on a single machine or even a small cluster at a scientific institute. We provide a layout for serving LOFAR processing to the astronomical community by providing access to LOFAR pipelines accelerated on a high-throughput platform. We build this on our previous success with parallelizing the LOFAR Surveys pipeline and with creating automated LOFAR workflows on a distributed architecture. The LOFAR As A Service platform will serve the LOFAR Key Science Projects (KSPs), specifically the LOFAR Surveys KSP, which aims to provide science-ready products to the scientific community. Additionally, this system will provide a robust method to re-process LOFAR data with a single click.
After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning research into AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex), to allow comparison to Banerjee et al. We find that Q-learning converges to a high win rate in GGP. For the ϵ-greedy strategy, we propose a first enhancement, the dynamic ϵ algorithm. In addition, inspired by (Gelly & Silver, ICML 2007), we combine online search (Monte Carlo Search) with offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show that classical table-based Q-learning, if augmented by appropriate enhancements, can perform well in small games.
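To ground the terminology: table-based Q-learning keeps one value per (state, action) pair, and the dynamic ϵ idea shifts the ϵ-greedy strategy from exploration toward exploitation as training progresses. The sketch below shows both; the decay schedule and hyperparameters are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def dynamic_epsilon(episode, total_episodes, start=0.5, end=0.01):
    # Decaying ("dynamic") epsilon: explore early, exploit late.
    # The linear schedule and endpoints are assumptions.
    frac = min(episode / total_episodes, 1.0)
    return start + frac * (end - start)

def epsilon_greedy(q, state, actions, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    """Standard one-step tabular Q-learning update."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```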
Fuchs, C.; Murillo Mejias, N.M.; Plaat, A.; Kouwe, E. van der; Stefanov, T.P. 2019