Factorio, a complex construction and resource-management game, has recently been used to evaluate the capabilities of artificial intelligence. Researchers developed the Factorio Learning Environment (FLE), which offers two evaluation modes: an experimental mode and an open mode. The experimental mode contains 24 structured challenges, while the open mode lets AI agents explore procedurally generated maps with the goal of building the largest factory possible.
Through a Python API, AI agents can interact with Factorio, carry out in-game actions, and monitor the game state. The researchers evaluated six leading language models in FLE, including Claude 3.5 Sonnet and GPT-4o. The results show that these models struggle significantly with spatial reasoning, long-term planning, and error correction.
Claude 3.5 Sonnet performed best, completing 15 of the 24 tasks in experimental mode and achieving a production score of 2,456 in open mode. The researchers argue that FLE's open-ended, scalable design makes it useful for evaluating future, more capable language models, and they suggest extending the environment to include multi-agent scenarios and human performance baselines.
Factorio Learning Environment: https://top.aibase.com/tool/factorio-learning-environment