Learning Dexterous Manipulation with Three Independent Fingers from Human Demonstrations

Abstract

Humans have proven to be powerful teachers for robot manipulation skills via imitation learning. How can we leverage this potential for robots with a morphology unlike our own? In this work, we demonstrate that teleoperation of a three-fingered robot morphology is both feasible and effective for dexterous manipulation tasks. To address the challenges posed by the embodiment gap between human demonstrators and non-humanoid robots, we investigate three teleoperation strategies: fingertip matching using hand tracking from a commercial AR headset, control via motion controllers, and kinesthetic teaching with a leader robot. We collect demonstrations on a suite of dexterous manipulation tasks, including assembling a 3D-printed object and folding a napkin. We then train manipulation policies with ACT and Diffusion Policy and evaluate their success on the respective tasks. The policies trained on data collected via motion controllers and kinesthetic teaching generally outperform those trained on hand-tracking data. We additionally fine-tune vision-language-action models on pick-and-place data collected with the TriFinger robot. The resulting policies achieve high success rates for in-distribution tasks and can generalize to objects not seen during fine-tuning, demonstrating that large-scale pretraining can be leveraged for this non-standard embodiment. We release open-source datasets and policy checkpoints to support further research in non-anthropomorphic dexterous manipulation.
Pipeline Overview

Why three independent fingers?

Franka

TriFinger

Teleoperation Methods

Hand tracking

Motion controllers

Kinesthetic teaching

Autonomous Imitation Learning Rollouts

Successful rollouts

ACT

Pass Through (Kinesthetic data)

Assemble (Kinesthetic data)

Disassemble (Kinesthetic data)

Fold (Kinesthetic data)

Unfold (Controller data)

Insert Battery (Controller data)

Remove Battery (Controller data)

Diffusion Policy

Pass Through (Controller data)

Assemble (Controller data)

Disassemble (Controller data)

Fold (Controller data)

Unfold (Controller data)

Insert Battery (Controller data)

Remove Battery (Controller data)

Autonomous VLA Rollouts

Successful and unsuccessful rollouts

In-distribution

\(\pi_0\)

Prompt: Put apple into cup (2/2)

Prompt: Stack lemon on top of green cylinder (0/2)

Prompt: Insert blue hexagonal prism into red cup (1/2)

Prompt: Put white cone onto green cylinder (2/2)

Prompt: Remove pear from red plate (0/2)

Prompt: Put pear in cup closest to apple (2/2)

\(\pi_{0.5}\)

Prompt: Put apple into cup (2/2)

Prompt: Stack lemon on top of green cylinder (2/2)

Prompt: Insert blue hexagonal prism into red cup (2/2)

Prompt: Put white cone onto green cylinder (2/2)

Prompt: Remove pear from red plate (2/2)

Prompt: Put pear in cup closest to apple (2/2)

Out-of-distribution

Objects not seen during fine-tuning are marked in orange in the prompt.

\(\pi_0\)

Prompt: Pick up green apple (0/2)

Prompt: Stack blue cone on white cylinder (0/2)

Prompt: Put baseball into red bowl (1/2)

Prompt: Remove green apple from red plate (0/2)

Prompt: Put white sphere into green cup (0/2)

Prompt: Put plush toy into blue bowl (2/2)

\(\pi_{0.5}\)

Prompt: Pick up green apple (2/2)

Prompt: Stack blue cone on white cylinder (2/2)

Prompt: Put baseball into red bowl (2/2)

Prompt: Remove green apple from red plate (2/2)

Prompt: Put white sphere into green cup (0/2)

Prompt: Put plush toy into blue bowl (2/2)
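The per-prompt counts listed above can be tallied into aggregate success rates per model and split. The following is a minimal sketch, assuming each "(k/2)" caption means k successful rollouts out of 2 trials; the names `RESULTS`, `aggregate`, and `summarize` are illustrative and not part of the released code. With only two trials per prompt, the resulting percentages are a rough summary rather than a statistically robust comparison.

```python
# (successes, trials) per prompt, copied in order from the captions above.
# Assumption: "(k/2)" denotes k successful rollouts out of 2 trials.
RESULTS = {
    ("pi_0", "in-distribution"): [(2, 2), (0, 2), (1, 2), (2, 2), (0, 2), (2, 2)],
    ("pi_0.5", "in-distribution"): [(2, 2), (2, 2), (2, 2), (2, 2), (2, 2), (2, 2)],
    ("pi_0", "out-of-distribution"): [(0, 2), (0, 2), (1, 2), (0, 2), (0, 2), (2, 2)],
    ("pi_0.5", "out-of-distribution"): [(2, 2), (2, 2), (2, 2), (2, 2), (0, 2), (2, 2)],
}


def aggregate(counts):
    """Sum per-prompt (successes, trials) pairs into one overall pair."""
    successes = sum(s for s, _ in counts)
    trials = sum(t for _, t in counts)
    return successes, trials


def summarize(results):
    """Map each (model, split) key to its aggregated (successes, trials)."""
    return {key: aggregate(counts) for key, counts in results.items()}


if __name__ == "__main__":
    for (model, split), (s, t) in summarize(RESULTS).items():
        print(f"{model:7s} {split:20s} {s}/{t} = {s / t:.0%}")
```

Under that reading of the captions, \(\pi_0\) succeeds in 7 of 12 in-distribution and 3 of 12 out-of-distribution rollouts, while \(\pi_{0.5}\) succeeds in 12 of 12 and 10 of 12, respectively.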

\(\pi_{0.5}\) Trained on Combined Data

The combined dataset includes all datasets except those collected with hand tracking, which are too noisy.

Failed Rollouts

Examples of failed rollouts. Most failures are caused by the policy drifting out of distribution.