Google launches Gemini Robotics: letting robots think and act like humans

Author: LoRA Time: 13 Mar 2025

Google DeepMind has unveiled its secret weapon: Gemini Robotics. This is no squabble between robot vacuums. It is a serious attempt to put AI's intelligence into a body of steel, so that robots can show their skills in the physical world the way we do (perhaps even more cleverly).

A versatile "all-rounder"

Gemini Robotics is built on the advanced Gemini 2.0 model. Gemini itself can already process text, images, audio and video.

Gemini Robotics goes a step further, giving robots the "superpower" of understanding physical space and taking action. Whether it is handling a text command, recognizing the image in front of it, understanding a spoken instruction, or analyzing a video of a task, Gemini Robotics can interpret the input and turn it into physical actions.

Imagine that in the future you only need to say a word or show the robot a picture, and it will take care of the housework for you. Are you a little excited?

The most eye-catching feature of Gemini Robotics is its ability to generalize. It is not a rigid machine that can only run preset programs. It draws on Gemini's broad world knowledge, so it can quickly understand a situation and find a solution even when faced with new objects, varied instructions, or environments it has never seen before.

Google proudly says that on a comprehensive generalization benchmark, Gemini Robotics more than doubles the performance of other state-of-the-art vision-language-action models. It is like a top student who not only breezes through exams but can also apply one lesson to many practical problems. In the future, when something unexpected happens, you may no longer have to worry about the robot "freezing up"!

A caring assistant that "gets you" in seconds

Gemini Robotics is also impressively interactive when working with people. It understands everyday conversational instructions and responds quickly when the instructions or the surrounding environment suddenly change.

Even better, once it has received an initial instruction it can complete the task on its own without much intervention. Imagine sipping your coffee and saying "Help me clean up the table": Gemini Robotics can quickly understand the request and cope with the small mishaps that may follow. If a water cup gets knocked over, for example, it can adjust its movements in time.

Gemini Robotics has a high "IQ", and its "emotional intelligence", that is, its dexterity, is just as good. Many of the fine movements that we humans take for granted are a huge challenge for traditional robots.

Gemini Robotics handles them with ease. Whether it is folding origami, packing a lunch, or preparing a salad, it shows delicate movements and precise coordination. If you want a lovingly packed bento in the future, you may only need to give Gemini Robotics a simple recipe.

Highly adaptable across robot forms

More surprising still, Gemini Robotics adapts to multiple robot forms. It is not tied to one specific type of robot: whether it is the two-arm robot platform ALOHA 2 or Apptronik's humanoid robot Apollo, Gemini Robotics can control it. This means that in the future we may see many kinds of intelligent robots running Gemini Robotics, each playing its own role in a different field.

Alongside the "all-rounder" Gemini Robotics, Google has also launched Gemini Robotics-ER. "ER" here stands for "Embodied Reasoning".

This model focuses on improving a robot's spatial understanding of the physical world, and it can be connected to a robot's existing low-level controllers. It substantially improves on Gemini 2.0's existing abilities such as pointing at objects and 3D detection.

By combining spatial reasoning with Gemini's coding abilities, Gemini Robotics-ER can even create entirely new robot capabilities "on the fly". For example, when it sees a coffee cup, it can work out on its own a suitable way to grasp it and a safe trajectory for approaching it.

Of course, as AI enters the real world, safety is the top priority. Google emphasized that it has taken a comprehensive approach to safety, considering everything from low-level motor control to high-level semantic understanding.

Gemini Robotics-ER can interface with a robot's existing safety controller, judge whether a potential action is safe, and generate an appropriate response. In addition, Google has released a new dataset, ASIMOV, for evaluating and improving the semantic safety of embodied AI and robots. The team also works closely with internal and external experts, policy makers, and its Responsibility and Safety Council to ensure that Gemini Robotics is developed in line with ethical and safety standards.
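
To make the idea of checking actions against a safety controller more concrete, here is a minimal Python sketch of a "safety gate" placed between a high-level planner and a robot. It is not Google's implementation or API: the names `ProposedAction`, `within_limits` and `gate_actions`, as well as the speed limits, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ProposedAction:
    """A hypothetical action suggested by a high-level reasoning model."""
    name: str
    speed_m_s: float   # commanded end-effector speed (illustrative)
    near_human: bool   # whether a person is inside the workspace


def within_limits(action: ProposedAction,
                  max_speed_m_s: float = 0.5,
                  max_speed_near_human_m_s: float = 0.1) -> bool:
    """Illustrative rule: cap the speed, and cap it harder near people."""
    limit = max_speed_near_human_m_s if action.near_human else max_speed_m_s
    return action.speed_m_s <= limit


def gate_actions(
    proposed: List[ProposedAction],
    is_safe: Callable[[ProposedAction], bool] = within_limits,
) -> Tuple[List[ProposedAction], List[ProposedAction]]:
    """Split proposed actions into (approved, rejected) using the safety rule."""
    approved: List[ProposedAction] = []
    rejected: List[ProposedAction] = []
    for action in proposed:
        (approved if is_safe(action) else rejected).append(action)
    return approved, rejected


# A made-up plan, as a planner might propose it.
plan = [
    ProposedAction("reach_for_cup", 0.3, False),
    ProposedAction("hand_cup_to_person", 0.3, True),
    ProposedAction("hand_cup_slowly", 0.05, True),
]
approved, rejected = gate_actions(plan)
print("approved:", [a.name for a in approved])
print("rejected:", [a.name for a in rejected])
```

In this toy setup only approved actions would be passed on to the low-level controller, while rejected ones could be sent back to the planner for a safer alternative. A real system would rely on far richer checks than a single speed limit.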

To speed up the real-world deployment of Gemini Robotics, Google is working with several robotics companies, including Apptronik, Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools. Thanks to these collaborations, we can look forward to seeing more intelligent robots powered by Gemini Robotics in our lives and workplaces in the near future.

Google's Gemini Robotics has undoubtedly injected new vitality into artificial intelligence and robotics. Its strong multimodal understanding, excellent generalization, natural interaction with people and impressive dexterity all point to a coming era of intelligent robots. Whether this turns out to be "good news for workers" or brings a few "small" career challenges, let's wait and see! After all, who doesn't want a smart and hardworking robot assistant?

Official blog: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/