Scaling imitation learning for mobile manipulation is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile Manipulation (EMMA), an end-to-end framework that trains mobile manipulation policies from human mobile manipulation data and static robot data, sidestepping mobile teleoperation. To accomplish this, we co-train on human full-body motion data together with static robot data. In experiments across three real-world tasks, EMMA performs comparably to baselines trained on teleoperated mobile robot data (Mobile ALOHA), achieving equal or higher full-task success with significantly less data. We find that EMMA generalizes to new spatial configurations and scenes, and we observe that performance improves as we increase the hours of human data, opening new avenues for scalable robotic learning in real-world environments.
EMMA: The model co-trains on human full-body motion data and static robot data, and outputs a unified policy that enables mobile manipulation deployment without mobile robot teleoperation.
@misc{zhu2025emmascalingmobilemanipulation,
title={EMMA: Scaling Mobile Manipulation via Egocentric Human Data},
author={Lawrence Y. Zhu and Pranav Kuppili and Ryan Punamiya and Patcharapong Aphiwetsa and Dhruv Patel and Simar Kareer and Sehoon Ha and Danfei Xu},
year={2025},
eprint={2509.04443},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2509.04443},
}