Adi Singh, a 12th-grade student, and his team built the Minecraft Benchmark (MC-Bench) website to evaluate the creative abilities of different AI models through Minecraft builds. Users vote for the model that performed better, and only after voting can they see which AI company's model produced each build. Singh said Minecraft was chosen as the test platform because its wide popularity makes the evaluation more intuitive for users.
MC-Bench currently has 8 volunteers and is supported by major AI companies such as Anthropic, Google, OpenAI and Alibaba. Looking ahead, Singh plans to expand the benchmark to longer-term planning and goal-oriented tasks. Beyond Minecraft, games such as Pokémon Red, Street Fighter and Pictionary have also been used as benchmarks in AI experiments.
MC-Bench is a programming benchmark that requires each model to write code that creates a specified build. For most users, though, judging how a build looks (a snowman, for example) is more intuitive than digging into the code. This gives the project broader appeal and, the team hopes, will let it collect more data on how the models perform. Singh firmly believes these results are a strong signal that can help companies understand whether they are heading in the right direction.