mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity

Abstract

We present a diffusion-based model recipe for real-world control of a highly dexterous humanoid robotic hand, designed for sample-efficient learning and smooth fine-motor action inference. Our system features a newly designed 16-DoF tendon-driven hand, equipped with wide-angle wrist cameras and mounted on a Franka Emika Panda arm. We develop a versatile teleoperation pipeline and data collection protocol using both glove-based and VR interfaces, enabling high-quality data collection across diverse tasks such as pick and place, item sorting, and assembly insertion. Leveraging high-frequency generative control, we train end-to-end policies from raw sensory inputs, enabling smooth, self-correcting motions in complex manipulation scenarios. Real-world evaluations demonstrate out-of-distribution success rates of up to 93.3%, with a performance boost of up to +33.3% due to emergent self-correcting behaviors, while also revealing scaling trends in policy performance. Our results advance the state of the art in dexterous robotic manipulation through a fully integrated, practical approach to hardware, learning, and real-world deployment.

Autonomous Experiments

We train autonomous policies for a variety of tasks, including pick and place, item sorting, and assembly insertion. Our model architecture is based on Diffusion Policy. The paper describes the key design choices for state and action representation, which significantly boost the performance and generalization of dexterous policies.
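As a rough illustration of what diffusion-based action inference involves, the sketch below runs a generic DDPM-style reverse process that turns Gaussian noise into a chunk of actions. It is not the architecture or sampler from the paper: the linear noise schedule, the step count, the 8 x 16 chunk shape, and the function names `sample_action_chunk` and `oracle_noise` are all assumptions made for this example, and `oracle_noise` is a toy stand-in for a trained, observation-conditioned network.

```python
import numpy as np


def sample_action_chunk(predict_noise, observation, horizon, action_dim,
                        num_steps=50, seed=0):
    """Generic DDPM ancestral sampling over an action chunk (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumed linear noise schedule; the paper may use a different one.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise with shape (horizon, action_dim).
    actions = rng.standard_normal((horizon, action_dim))
    for t in reversed(range(num_steps)):
        eps = predict_noise(actions, t, observation, alpha_bars)
        # Posterior mean of the previous, less noisy action chunk.
        mean = (actions - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final denoising step.
        noise = rng.standard_normal(actions.shape) if t > 0 else 0.0
        actions = mean + np.sqrt(betas[t]) * noise
    return actions


def oracle_noise(actions, t, observation, alpha_bars):
    """Hypothetical stand-in for a trained network.

    It treats the observation itself as the clean action chunk and returns
    the noise implied by the forward process, so sampling recovers it.
    """
    return (actions - np.sqrt(alpha_bars[t]) * observation) / np.sqrt(1.0 - alpha_bars[t])


# Toy target: 8 time steps of 16 joint commands, all set to 0.5.
target = np.full((8, 16), 0.5)
chunk = sample_action_chunk(oracle_noise, target, horizon=8, action_dim=16)
```

With a real policy, `predict_noise` would be a learned network conditioned on camera images and proprioception, and only the first few actions of each sampled chunk would typically be executed before re-planning.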

Bottle Sorting

We train a policy to grasp plastic bottles lying sideways on a conveyor belt and insert them precisely into a bottle rack. A trial succeeds when the bottle ends up in an empty slot. Recovery from initial misplacements, such as re-orienting a poorly slotted bottle, is permitted.

Bread Roll Pick and Place

We train a policy to pick up a bread roll and place it into a container. We evaluate it on unseen table and background settings with arbitrary starting positions for the roll. The policy reacts to disturbances.

Battery Insertion

We train a policy for a high-precision task that involves picking a battery from one rack, transporting it, and inserting it fully into a specific slot in a second rack, including a final "punch" motion to ensure it is connected. The policy adapts to the grasp it actually achieves on the battery, reorienting the hand for precise insertion.

Teleoperation and Data Collection Protocol

We collect demonstration data via teleoperation using the Apple Vision Pro's hand and wrist tracking capabilities.

To boost success rates, we follow an iterative data collection protocol: we collect data, label and filter it, train policies, and then collect additional trajectories that demonstrate self-correction from errors, targeting common pitfalls. For each task, we collect data over a wide variety of randomized initial conditions, including different object placements, lighting conditions, and backgrounds.
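The loop described above can be sketched schematically as follows. This is a toy model rather than the actual pipeline: the `Episode` fields, the 90% operator success rate, the round and episode counts, and the rule that filtering keeps only episodes labeled successful are all assumptions made for illustration, and policy training and evaluation are left as a comment.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    placement: tuple     # randomized object position (illustrative units)
    lighting: str        # randomized lighting condition
    background: str      # randomized background
    success: bool        # label assigned during review
    is_correction: bool  # True if recorded to demonstrate error recovery


def collect(rng, count, is_correction):
    """Stand-in for teleoperated collection under randomized initial conditions."""
    return [
        Episode(
            placement=(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)),
            lighting=rng.choice(["dim", "bright"]),
            background=rng.choice(["plain", "cluttered"]),
            success=rng.random() < 0.9,  # assumed operator success rate
            is_correction=is_correction,
        )
        for _ in range(count)
    ]


def run_protocol(rounds=3, demos_per_round=20, corrections_per_round=5, seed=0):
    rng = random.Random(seed)
    dataset = []
    for _ in range(rounds):
        # 1. Collect, label, and filter (assumed: keep successful episodes only).
        dataset += [e for e in collect(rng, demos_per_round, False) if e.success]
        # 2. Train a policy on `dataset` and evaluate it (omitted in this sketch).
        # 3. Collect self-correction trajectories for the observed failure modes.
        dataset += [e for e in collect(rng, corrections_per_round, True) if e.success]
    return dataset


dataset = run_protocol()
```

In a real setup, step 3 would be driven by the failure modes seen when evaluating the policy from step 2, so each round concentrates new data where the current policy is weakest.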

BibTeX

@article{nava2025mimicone,
  author    = {Nava, Elvis and Montesinos, Victoriano and Bauer, Erik and Forrai, Benedek and Pai, Jonas and Weirich, Stefan and Gravert, Stephan-Daniel and Wand, Philipp and Polinski, Stephan and Grewe, Benjamin F. and Katzschmann, Robert K.},
  title     = {mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity},
  journal   = {arXiv preprint arXiv:2506.11916},
  year      = {2025},
}