Researchers from the University of Hong Kong, led by Jinhao Zhang and including Zhexuan Zhou, Huizhe Li, Yichen Lai, Wenlong Xia, Haoming Song, Youmin Gong, and Jie Me, have developed a new approach to improve the efficiency of robotic manipulation skills using 3D vision. Their work, titled “PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy,” was recently published in a leading scientific journal.
The team’s research focuses on addressing a common issue in current 3D vision-based diffusion policies: the mismatch between a compact point-cloud encoder and a large decoder. This mismatch can lead to inefficiencies and increased model size. To tackle this, the researchers introduced PocketDP3, a lightweight 3D diffusion policy that replaces the heavy conditional U-Net decoder used in previous methods with a Diffusion Mixer (DiM) built on MLP-Mixer blocks. This new architecture allows for efficient fusion across temporal and channel dimensions, significantly reducing the model’s size.
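To make the idea of fusing across temporal and channel dimensions concrete, here is a minimal NumPy sketch of a generic MLP-Mixer-style block applied to an action sequence. It is an illustration, not the authors' Diffusion Mixer: the layer sizes, the initialisation, the helper names (`init_mlp`, `mixer_block`, `count_params`), and the omission of diffusion-timestep and point-cloud conditioning are all assumptions made for brevity.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # Normalise over the last (channel) axis; no learned scale or shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, dim, hidden):
    # Two-layer MLP mapping dim -> hidden -> dim (hypothetical sizes).
    return {
        "w1": rng.normal(0.0, 0.02, (dim, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, (hidden, dim)), "b2": np.zeros(dim),
    }

def mlp(p, x):
    return gelu(x @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]

def mixer_block(params, x):
    """Apply one mixer block to x of shape (horizon, channels)."""
    # Temporal mixing: transpose so the MLP acts along the time axis.
    x = x + mlp(params["time"], layer_norm(x).T).T
    # Channel mixing: the MLP acts along the feature axis of each time step.
    x = x + mlp(params["chan"], layer_norm(x))
    return x

def count_params(params):
    return sum(w.size for p in params.values() for w in p.values())

rng = np.random.default_rng(0)
horizon, channels = 8, 16  # assumed action horizon and feature width
params = {
    "time": init_mlp(rng, horizon, 32),
    "chan": init_mlp(rng, channels, 64),
}
x = rng.normal(size=(horizon, channels))
out = mixer_block(params, x)
print(out.shape, count_params(params))
```

Because each block consists of only two small MLPs, its parameter count grows with the horizon and channel widths rather than with convolutional kernel stacks, which is one plausible reason a mixer-based decoder can be far smaller than a conditional U-Net.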
One of the key advantages of PocketDP3 is its ability to support two-step inference without compromising performance, making it more practical for real-time deployment. The researchers tested their method across three simulation benchmarks (RoboTwin2.0, Adroit, and MetaWorld) and found that PocketDP3 achieved state-of-the-art performance with fewer than 1% of the parameters of prior methods, along with faster inference.
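As a rough picture of what two-step inference means for a diffusion policy, the sketch below runs a deterministic DDIM-style sampler with only two denoising steps. The article does not say which sampler or noise schedule PocketDP3 uses, so DDIM, the `alpha_bars` values, and the oracle noise predictor standing in for a trained network are all assumptions chosen for illustration.

```python
import numpy as np

def ddim_sample(eps_model, shape, alpha_bars, rng):
    """Deterministic DDIM-style sampling over a short list of noise levels.

    alpha_bars: cumulative signal levels, noisiest first (assumed values).
    eps_model:  callable (x, alpha_bar) -> predicted noise, same shape as x.
    """
    x = rng.normal(size=shape)          # start from Gaussian noise
    levels = list(alpha_bars) + [1.0]   # the last target level is the clean sample
    for ab_t, ab_prev in zip(levels[:-1], levels[1:]):
        eps = eps_model(x, ab_t)
        # Estimate the clean action sequence implied by the predicted noise.
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        # Jump directly to the next, less noisy level.
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
    return x

# Hypothetical stand-in for a trained policy: an oracle that knows the target.
target = np.ones((8, 4))  # assumed (horizon, action_dim)

def oracle_eps(x, ab):
    return (x - np.sqrt(ab) * target) / np.sqrt(1.0 - ab)

rng = np.random.default_rng(0)
actions = ddim_sample(oracle_eps, target.shape, alpha_bars=[0.05, 0.5], rng=rng)
print(np.allclose(actions, target))
```

Each denoising step costs one forward pass of the decoder, so cutting the number of steps to two reduces inference cost roughly in proportion, provided the learned noise predictor stays accurate at such coarse steps.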
Real-world experiments further validated the practicality and transferability of PocketDP3. The researchers plan to release the code for their method, allowing other scientists and engineers to build upon their work.
For the energy sector, this research could have practical applications in automating tasks in hazardous or hard-to-reach environments, such as inspecting and maintaining infrastructure like wind turbines, solar panels, or pipelines. The efficient and lightweight nature of PocketDP3 could make it particularly suitable for deployment in these challenging settings, improving safety and reducing costs. Additionally, the method’s ability to learn complex manipulation skills could be applied to tasks like repairing or replacing equipment, further enhancing the capabilities of robotic systems in the energy industry.
The research was published in the journal Science Robotics, a reputable source for cutting-edge advancements in the field of robotics.
This article is based on research available at arXiv.

