A paper describing MineDojo, Nvidia’s framework for training generalist AI agents that can perform actions from written prompts in Minecraft, won an Outstanding Datasets and Benchmarks Paper Award at the 2022 NeurIPS (Neural Information Processing Systems) conference, Nvidia revealed on Monday.
To train the MineDojo framework to play Minecraft, researchers fed it 730,000 Minecraft YouTube videos (with more than 2.2 billion words transcribed), 7,000 webpages scraped from the Minecraft wiki, and 340,000 Reddit posts with 6.6 million comments describing Minecraft gameplay.
From this data, the researchers created a custom transformer model called MineCLIP that associates video clips with specific in-game Minecraft activities. As a result, someone can tell a MineDojo agent what to do in the game using high-level natural language, such as “find a desert pyramid” or “build a nether portal and enter it,” and MineDojo will execute the series of steps necessary to make it happen in the game.
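The core idea behind MineCLIP is CLIP-style contrastive matching: video clips and text descriptions are embedded into a shared vector space, and cosine similarity scores how well a prompt describes a clip. The toy sketch below illustrates only that scoring step; the random "embeddings," the `unit_vector` encoder stand-in, and the example prompts are all hypothetical placeholders, not MineCLIP's actual encoders or API.

```python
import math
import random

DIM = 8  # toy embedding size; real CLIP-style models use hundreds of dimensions

def unit_vector(seed: int) -> list[float]:
    """Stand-in encoder: a deterministic unit-norm vector for a given seed."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are unit-norm, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

prompts = ["find a desert pyramid", "build a nether portal", "shear a sheep"]
text_emb = {p: unit_vector(i) for i, p in enumerate(prompts)}

# Fake a video clip whose embedding sits close to the first prompt's embedding.
target = text_emb["find a desert pyramid"]
noise = unit_vector(99)
mixed = [0.9 * t + 0.1 * n for t, n in zip(target, noise)]
norm = math.sqrt(sum(x * x for x in mixed))
video_emb = [x / norm for x in mixed]

# Score every prompt against the clip; the best match "describes" the video.
scores = {p: cosine(video_emb, e) for p, e in text_emb.items()}
best_prompt = max(scores, key=scores.get)
print(best_prompt)
```

In the paper, a similarity score like this serves as a learned reward signal: an agent acting in the game is rewarded when its recent frames score highly against the current language goal.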
MineDojo aims to create a flexible agent that can generalize learned actions and apply them to different behaviors in the game. As Nvidia writes, “While researchers have long trained autonomous AI agents in video-game environments such as StarCraft, Dota, and Go, these agents are usually specialists in only a few tasks. So Nvidia researchers turned to Minecraft, the world’s most popular game, to develop a scalable training framework for a generalist agent—one that can successfully execute a wide variety of open-ended tasks.”
The award-winning paper, “MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge,” debuted in June. Its authors include Linxi Fan of Nvidia and Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, and Anima Anandkumar of various academic institutions.
You can see examples of MineDojo in action on its official website, and the code for MineDojo and MineCLIP is available on GitHub.