The Universal Manipulation Interface (UMI) is a data collection and policy learning framework that transfers skills directly from in-the-wild human demonstrations to deployable robot policies. UMI uses hand-held grippers combined with careful interface design to enable portable, low-cost, and information-rich data collection for challenging bimanual and dynamic manipulation demonstrations. To facilitate deployable policy learning, UMI incorporates a carefully designed policy interface with inference-time latency matching and a relative-trajectory action representation. The resulting learned policies are hardware-agnostic and can be deployed on multiple robot platforms. Equipped with these features, the UMI framework unlocks new robot manipulation capabilities, allowing zero-shot generalizable dynamic, bimanual, precise, and long-horizon behaviors by changing only the training data for each task. We demonstrate UMI's versatility and efficacy through comprehensive real-world experiments, in which policies trained on diverse human demonstrations generalize zero-shot to novel environments and objects.
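The inference-time latency matching mentioned above can be pictured with a small scheduling sketch. This is an illustration under stated assumptions, not UMI's actual implementation: the function name `schedule_actions`, its parameters, and the convention that the first predicted action is aligned with the last observation timestamp are all hypothetical. The idea shown is that each command is sent early by the hardware's execution latency, and any action whose send time has already passed by the time inference finishes is discarded.

```python
def schedule_actions(actions, t_obs, dt, t_now, exec_latency):
    """Return (send_time, action) pairs for actions that can still arrive on time.

    actions      -- predicted action sequence (hypothetical placeholder values)
    t_obs        -- timestamp of the last observation (assumed start of the sequence)
    dt           -- spacing between consecutive predicted actions, in seconds
    t_now        -- current time, after observation and inference delays
    exec_latency -- assumed delay between sending a command and the hardware reaching it
    """
    scheduled = []
    for i, action in enumerate(actions):
        target_time = t_obs + i * dt            # when the pose should be reached
        send_time = target_time - exec_latency  # send early to offset hardware lag
        if send_time >= t_now:                  # drop actions that are already outdated
            scheduled.append((send_time, action))
    return scheduled


# Example: inference finished 0.15 s after the observation, hardware lags by 0.1 s.
plan = schedule_actions(["a0", "a1", "a2", "a3", "a4"],
                        t_obs=0.0, dt=0.1, t_now=0.15, exec_latency=0.1)
kept = [action for _, action in plan]
```

In this example only the last two actions survive, because the earlier ones would have needed to be sent before inference had finished.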
Target audience:
["Robot skill learning", "Handheld devices with external sensors", "Human-computer interaction interface design"]
Example usage scenarios:
Use UMI to collect demonstrations of everyday actions, such as throwing balls, folding clothes, and washing dishes
Deploy trained policies directly on different robot platforms, with no calibration required
Use a CLIP-pretrained ViT as the visual encoder to make the policy more responsive to changes
Product features:
Portable data collection, ready to start in 2 minutes
Camera-centric action representation, no calibration required, highly robust
Fast data collection, 30 seconds per demonstration
Zero-shot generalization, deployable in new environments without adjustment
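
One way to see why a camera-centric, relative action representation needs no calibration is the sketch below. It is a simplified illustration, not UMI's code: the helper names (`pose_z`, `to_relative`, `to_absolute`) are hypothetical, and poses are assumed to be 4x4 homogeneous transforms with rotation about the z axis only. Future poses are expressed relative to the current gripper pose, so a change of world frame cancels out.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose_z(x, y, z, yaw):
    """Build a pose with translation (x, y, z) and rotation yaw about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def to_relative(current, future_poses):
    """Express each future pose in the frame of the current pose."""
    inv = invert_rigid(current)
    return [mat_mul(inv, p) for p in future_poses]


def to_absolute(current, relative_poses):
    """Recover poses in the frame in which `current` is expressed."""
    return [mat_mul(current, p) for p in relative_poses]


# The gripper faces +y and should move 1 m along +y in the world frame.
cur = pose_z(1.0, 2.0, 0.5, math.pi / 2)
fut = [pose_z(1.0, 3.0, 0.5, math.pi / 2)]
rel = to_relative(cur, fut)  # in the gripper's own frame this is 1 m "forward"

# Re-expressing everything in a different world frame leaves `rel` unchanged.
world = pose_z(5.0, -3.0, 1.0, 0.7)
rel_shifted = to_relative(mat_mul(world, cur), [mat_mul(world, p) for p in fut])
```

Because `rel` and `rel_shifted` agree, a policy trained on such relative targets does not depend on where the world or robot base frame is placed, which is the intuition behind the "no calibration required" claim.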