
DI-Star is an artificial intelligence developed by OpenDILabs to play the video game Starcraft II, a popular real-time strategy game developed by Blizzard Entertainment. Specifically, DI-Star plays the Zerg vs Zerg matchup, in which both players control the Zerg species in a 1v1 battle. The project is open source and was first published in February 2022. The program has defeated professional Starcraft II players in the Zerg vs Zerg matchup, a task that previous Starcraft II-playing AIs had struggled with.

Overview
Playing games designed for humans has long served as a measuring stick for progress in artificial intelligence. Major past achievements in the field include Deep Blue defeating the world chess champion in 1997 and AlphaGo defeating a world champion Go player in 2016. In 2019, another Starcraft II-playing artificial intelligence, AlphaStar, defeated several prominent players. Starcraft II is seen as a logical goal for artificial intelligence because it combines real-time decision-making, a large action space, diverse strategies, and imperfect information.

The final agents produced by OpenDILabs are created by first completing a period of supervised learning, followed by a period of reinforcement learning. In the supervised learning phase, the AI learns from replays of high-level Zerg vs Zerg games. Towards the end of this training, the AI is given a smaller set of very high-level games to fine-tune its strategy once a baseline proficiency has been developed. In the reinforcement learning phase, the AI plays against itself millions of times, and winning strategies are propagated to future iterations. During this self-play, the AI also competes against "exploiter" versions of itself that perform exaggerated versions of certain strategies; for example, one such exploiter might attack its opponent immediately or only produce certain types of units.
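The two-phase pipeline can be summarized in pseudocode. The following Python-style sketch is illustrative only: the function and class names (supervised_phase, reinforcement_phase, agent.update_supervised, play_match, and so on) are hypothetical placeholders and are not part of DI-Star's actual codebase.

import random

def supervised_phase(agent, broad_replays, elite_replays):
    # Stage 1: imitate actions from a large pool of high-level human replays.
    for replay in broad_replays:
        for observation, human_action in replay:
            agent.update_supervised(observation, human_action)
    # Stage 2: fine-tune on a much smaller pool of very high-level replays.
    for replay in elite_replays:
        for observation, human_action in replay:
            agent.update_supervised(observation, human_action)

def reinforcement_phase(agent, historical_players, exploiters, n_games):
    # Self-play: the learner plays copies of itself, frozen historical
    # snapshots, and hand-designed exploiter opponents, reinforcing
    # whatever behaviour led to wins.
    opponents = [agent.copy()] + historical_players + exploiters
    for _ in range(n_games):
        trajectory, won = play_match(agent, random.choice(opponents))
        agent.update_reinforcement(trajectory, won)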

DI-Star was able to perform at the Grandmaster level, the highest ranking in the Starcraft II matchmaking system. The program has also defeated professional players, although it does not beat them consistently; top players who have played multiple games against it have been able to beat it fairly reliably. This is still a large improvement over the results of AlphaStar in the Zerg vs Zerg matchup, which struggled to secure victories against professional players. Unlike previous Starcraft II AIs such as AlphaStar, DI-Star performs a similar number of actions per minute to a strong human player. AlphaStar's success was attributed in part to its superhuman speed, as it performed actions far faster and with more precision than any human. The community reaction has been positive, with players especially pleased that anyone can play against the open-sourced project by downloading the source code.

Supervised Learning
DI-Star began learning Starcraft II through a supervised learning program. In this phase, the AI learned a mapping from game situations to the actions human players took in them by watching replays of matches played between humans. A total of 165,000 high-level replays were used for the first round of this training, where high level is defined by OpenDILabs as above 4000 MMR. The purpose of this training was for the agents to develop a baseline ability in Starcraft II and to avoid stalling at local minima as they improved. Next, the agents were trained on 7,000 very high-level replays (above 5800 MMR). The goal of this second stage was to fine-tune the bots toward the strategies that appeared most often at the highest level of play. After the first round of training, DI-Star beat the built-in Starcraft II Elite bot 87% of the time; after the second stage, this rose to 97%.
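As an illustration of how the two replay pools described above might be assembled, the following sketch filters a replay collection by the 4000 and 5800 MMR thresholds given by OpenDILabs. The Replay data structure and its field names are hypothetical and do not reflect DI-Star's actual replay format.

from dataclasses import dataclass

@dataclass
class Replay:
    mmr: int       # matchmaking rating of the game
    matchup: str   # e.g. "ZvZ" for Zerg vs Zerg
    data: object   # observations and actions extracted from the replay file

def build_training_pools(replays):
    # Stage 1 pool: roughly 165,000 high-level games (above 4000 MMR),
    # used to learn a baseline level of play.
    stage_one = [r for r in replays if r.matchup == "ZvZ" and r.mmr > 4000]
    # Stage 2 pool: roughly 7,000 very high-level games (above 5800 MMR),
    # used to fine-tune toward top-level strategies.
    stage_two = [r for r in replays if r.matchup == "ZvZ" and r.mmr > 5800]
    return stage_one, stage_two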

Reinforcement Learning
After completing its supervised learning, DI-Star entered a phase of reinforcement learning in which different versions of DI-Star compete against each other. The versions compete in a league with two types of players. The first type, called learners, change their strategies based on their wins and losses; these are the versions that improve over time and ultimately become the final DI-Star. The other players in the league are historical players, which do not change their strategies as a result of wins or losses. They exist to ensure that the learning players do not lose the ability to beat older strategies or tactics. As the learning players improve, new historical players are added as copies of successful learning players. In addition to these players, the league contains exploiters, bots that use hand-picked strategies to ensure the learners are resilient to particular tactics. For example, one of the exploiters used a strategy based on building a large number of fast flying units called mutalisks. As the learning players faced this bot repeatedly, they developed strategies that worked not only against the other learning and historical players but also against the exploiters in the league.
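The league structure described above can be sketched as a simple loop in which learners update from game outcomes, historical players are frozen snapshots of earlier learners, and exploiters keep fixed strategies. This is a minimal illustration under those assumptions; the names (run_league, Learner.snapshot, play_match, and so on) are hypothetical and not DI-Star's actual API.

import random

def run_league(learners, exploiters, n_games, snapshot_every):
    # Historical players start as frozen copies of the initial learners.
    historical = [learner.snapshot() for learner in learners]
    for game in range(1, n_games + 1):
        learner = random.choice(learners)
        # Opponents are drawn from other learners, frozen historical
        # players, and fixed-strategy exploiters (e.g. a mutalisk rush bot).
        opponent = random.choice(learners + historical + exploiters)
        trajectory, won = play_match(learner, opponent)  # hypothetical helper
        learner.update(trajectory, won)  # only learners change their strategy
        # Periodically freeze successful learners as new historical players.
        if won and game % snapshot_every == 0:
            historical.append(learner.snapshot())
    return learners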

Results and Public Reactions
After around five weeks of training, DI-Star reached a level of play competitive with professional Starcraft II players in the Zerg vs Zerg matchup. This level of play was not reached by the Starcraft II AI AlphaStar even after twice the training period and more processing power. However, DI-Star plays only one of Starcraft II's matchups, whereas AlphaStar played all of them. Additionally, the training of DI-Star involved more human intervention than AlphaStar's, in the form of a more comprehensive batch of exploiter bots in the reinforcement learning phase.

AlphaStar faced criticism from many in the Starcraft II community over its extremely high actions per minute. Detractors noted the difficulty of rating the skill of a program that benefited from a superhuman ability to control hundreds of units individually at the same time; AlphaStar consistently made thousands of actions per minute, an order of magnitude above even the fastest human players. DI-Star, on the other hand, consistently remained within the speed of a fast professional human player despite having no hard-coded limit on its actions, which likely stems from the large number of human games it studied in its supervised learning period.

The overall community reaction was very positive. Players were excited to see more artificial intelligence work dedicated to Starcraft II, as well as the open-source nature of the project, which allows anyone to download and play against DI-Star or even attempt to create their own versions, aided by guidance provided by the developers in their GitHub repository.