DeepMind AI achieves Grandmaster status at Starcraft 2
Artificial intelligence firm says its AI agents have achieved Grandmaster status at Starcraft 2. …
DeepMind says it has created the first artificial intelligence to reach the top league of one of the most popular esport video games.
It says Starcraft 2 had posed a tougher AI challenge than chess and other board games, in part because opponents’ pieces were often hidden from view.
Publication in the peer-reviewed journal Nature allows the London-based lab to claim a new milestone.
But some pro-gamers have mixed feelings about its claim to Grandmaster status.
DeepMind – which is owned by Google’s parent company Alphabet – said the development of AlphaStar would help it develop other AI tools which should ultimately benefit humanity.
“One of the key things we’re really excited about is that Starcraft raises a lot of challenges that you actually see in real-world problems,” said David Silver, who leads the lab’s reinforcement learning research group.
“We see Starcraft as a benchmark domain to understand the science of AI, and advance in our quest to build better AI systems.”
DeepMind says that examples of technologies that might one day benefit from its new insights include robots, self-driving cars and virtual assistants, which all need to make decisions based on “imperfectly observed information”.
How do you play Starcraft 2?
In one-on-one games, two players compete against each other after choosing which of the game’s races to play. Each of the three options – Zerg, Protoss and Terran – has different abilities.
Players start with only a few pieces and must gather resources – minerals and gas – which can be used to construct new buildings and research technologies. They can also invest time in increasing their number of worker units.
Gamers can only see a small section of the map at a time, and an area is only revealed to them if some of their units are based there or have travelled to it.
When ready, players can send out scouting parties to reveal their enemy’s preparations, or alternatively go straight ahead and launch attacks.
All of this happens in real-time, and players do not take turns to make moves.
As the action picks up pace, gamers typically have to juggle hundreds of units and structures, and make choices that might only pay off minutes later.
Part of the challenge is the huge amount of choice on offer.
At any given moment, there are up to 100 trillion trillion (10^26) possible moves, and thousands of such choices must be made before it becomes apparent who has overwhelmed the other’s buildings and won.
How did DeepMind approach the problem?
DeepMind trained three separate neural networks – one for each of the races it played as.
To start with, it tapped into a vast database of past games provided by Starcraft’s developer Blizzard. This was used to train its agents to imitate the moves of the strongest players.
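In code terms, that imitation step amounts to supervised learning: predict the human player’s next move from what was visible at the time. Below is a minimal, illustrative sketch using a single linear layer and randomly generated stand-in data – the dimensions, the toy “human policy” and the training loop are all assumptions for illustration, not AlphaStar’s actual architecture or inputs.

```python
# A minimal sketch of the imitation-learning step: treat replays as
# (observation, action) pairs and train a classifier to predict the human
# move. The tiny model and stand-in data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS, N_EXAMPLES = 32, 10, 5000

# Stand-in for (observation, action) pairs extracted from human replays.
observations = rng.normal(size=(N_EXAMPLES, OBS_DIM))
W_true = rng.normal(size=(OBS_DIM, N_ACTIONS))       # hidden "human policy"
actions = (observations @ W_true).argmax(axis=1)     # the moves to imitate

# One linear layer trained with softmax cross-entropy: predict the human move.
W = np.zeros((OBS_DIM, N_ACTIONS))
for _ in range(300):
    logits = observations @ W
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs.copy()
    grad[np.arange(N_EXAMPLES), actions] -= 1        # d(cross-entropy)/d(logits)
    W -= 0.5 * observations.T @ grad / N_EXAMPLES    # gradient descent step

accuracy = ((observations @ W).argmax(axis=1) == actions).mean()
print(f"imitation accuracy on replay data: {accuracy:.2f}")
```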
Copies of these agents were then pitted against each other to hone their skills via a technique known as reinforcement learning.
It also created “exploiter agents”, whose job was to expose weaknesses in the main agents’ strategies, so that the main agents could learn to correct them.
Prof Silver likened these subsidiary agents to “sparring partners” and said they forced the main agents to adopt more robust strategies than would otherwise have been the case.
This all took place across 44 days. But because the process was carried out at high speed, it represented about 200 years of human gameplay.
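The league structure is easier to see in miniature. The sketch below uses rock-paper-scissors as a stand-in game: an exploiter chases weaknesses in the current main agent only, while the main agent trains against a mixture of the exploiter and frozen past versions of itself. Everything here – the game, the “training” rule, the snapshot schedule – is an illustrative assumption, loosely modelled on DeepMind’s published description rather than its code.

```python
# A toy league with "exploiter" agents. Rock-paper-scissors stands in for
# Starcraft; a "training step" nudges a mixed strategy toward the best
# response to its opponent. All specifics are assumptions for illustration.
import random
import numpy as np

N_STRATEGIES = 3
# Payoff matrix for player 1 (rows) vs player 2 (cols): rock, paper, scissors.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def best_response_step(policy, opponent, lr=0.1):
    """Nudge `policy` toward the best response to `opponent` (toy 'training')."""
    payoffs = PAYOFF @ opponent               # expected payoff of each pure strategy
    target = np.eye(N_STRATEGIES)[np.argmax(payoffs)]
    policy = (1 - lr) * policy + lr * target
    return policy / policy.sum()

uniform = np.ones(N_STRATEGIES) / N_STRATEGIES
main_agent = uniform.copy()      # the agent we actually care about
exploiter = uniform.copy()       # trained only to beat the current main agent
league = [uniform.copy()]        # frozen snapshots of past agents

for step in range(500):
    # Exploiter: chase weaknesses in the *current* main agent only.
    exploiter = best_response_step(exploiter, main_agent)

    # Main agent: train against a mix of past snapshots and the exploiter,
    # which pushes it toward robust strategies rather than fashionable ones.
    opponent = random.choice(league + [exploiter])
    main_agent = best_response_step(main_agent, opponent)

    if step % 50 == 0:           # periodically freeze a snapshot into the league
        league.append(main_agent.copy())

print("final main agent policy:", np.round(main_agent, 3))
```

Without the exploiter and the league of frozen snapshots, the main agent in a sketch like this tends to chase its own most recent strategy in circles; the mixture of opponents is what the “sparring partner” analogy is getting at.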
The resulting three neural networks were then pitted against human players on Blizzard’s Battle.net platform, with their identities concealed until after each game, to see how they would fare.
What was the result?
The lab said its neural networks attained Grandmaster status for each of the three races – the ranking given to the top players in each region of the world.
But it acknowledged that about 50 to 100 players on Battle.net still outperform AlphaStar.
Is this really about developing AI to fight wars?
DeepMind has pledged never to develop technologies for lethal autonomous weapons. Prof Silver said the work on Starcraft 2 did not change that.
“To say that this has any kind of military use is saying no more than to say an AI for chess could be used to lead to military applications,” he added.
“Our goal is to try and build general purpose intelligences [but] there are deeper ethical questions which have to be answered by the community.”
It is noteworthy that after DeepMind’s AlphaGo beat South Korea’s top Go player in 2016, the Chinese military published a document saying the achievement highlighted “the enormous potential of artificial intelligence in combat command”.
Beijing subsequently announced its intention to overtake the US and become the world’s leader in AI by 2030.
What do gamers think?
Raza “RazerBlader” Sekha is one of the UK’s top three Starcraft 2 pros. He played as a Terran against AlphaStar and also watched its matches against others.
He said the neural networks were “impressive”, but suggested the system still had quirks.
“There was one game where someone went for a very weird [army] composition, made up of purely air units – and AlphaStar didn’t really know how to respond,” he recalled.
“It didn’t adapt its play and ended up losing.
“That’s interesting because good players tend to play more standard styles, while it’s the weaker players who often play weirdly.”
Joshua “RiSky” Hayward is the UK’s top player.
He did not get to play AlphaStar but has studied games it played as a Zerg. He believes its behaviour was atypical for a Grandmaster.
“It often didn’t make the most efficient, strategic decisions,” he remarked, “but it was very good at executing its strategy and doing lots of things all at once, so it still got to a decent level.
“When AI got better than people at chess, it did so by making abnormal moves that ended up being stronger than those played by humans. I feel that DeepMind needed more time to create its own innovations and it will be a bit disappointing if the project doesn’t continue.”
Didn’t DeepMind previously show AI doesn’t need to learn from humans?
The “zero” versions of the lab’s chess, Go and shogi-playing agents did indeed perform better when they relied on reinforcement learning alone.
But DeepMind said Starcraft 2 was too complex for this to be practical, at least at this point.
Discovering new strategies without any guide would be a “needle in a haystack problem,” Prof Silver said, with the agent required to stumble upon a long series of steps before seeing any beneficial outcome.
“You’d have to do so many unlikely things, each of which in turn looks really bad from where you are,” he explained.
“We call this the exploration problem.
“There’s still an open research question as to how to do something like AlphaStar Zero, which could fully learn for itself without human data.”
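A back-of-the-envelope simulation shows why. In the toy task below – an assumption for illustration, not anything from the AlphaStar paper – a reward only arrives if a purely random agent makes the one correct choice at every one of 20 consecutive steps, and it essentially never does.

```python
# A toy illustration of the "exploration problem": if a reward only arrives
# after a long, specific series of steps, blind trial and error almost never
# stumbles on it. The chain task here is an assumption for illustration.
import random

CHAIN_LENGTH = 20   # steps that must all be "correct" before any reward
N_ACTIONS = 4       # choices available at each step
N_EPISODES = 100_000

successes = 0
for _ in range(N_EPISODES):
    # A purely random agent only gets the reward if every one of its
    # CHAIN_LENGTH choices happens to be the right one.
    if all(random.randrange(N_ACTIONS) == 0 for _ in range(CHAIN_LENGTH)):
        successes += 1

print(f"random-exploration success rate: {successes / N_EPISODES:.6f}")
print(f"theoretical chance per episode:  {(1 / N_ACTIONS) ** CHAIN_LENGTH:.2e}")
# With 4 actions and 20 steps the chance is about 9e-13 per attempt - and
# Starcraft 2 offers far more actions over far longer horizons, which is why
# DeepMind bootstrapped its agents from human replays instead.
```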
What next?
DeepMind says it hopes the techniques used to develop AlphaStar will ultimately help it “advance our research in real-world domains”.
But Prof Silver said the lab “may rest at this point”, rather than try to get AlphaStar to the level of the very elite players.