In autonomous driving research, a team from Baidu's Autonomous Driving Research Institute led by Jianhua Han has developed a model designed to improve the perception and decision-making capabilities of self-driving vehicles. Their work, titled “Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving,” is available as a preprint on arXiv.
Autonomous driving systems rely heavily on accurate, robust spatial perception to navigate complex environments. However, current models often struggle with long-tail scenarios and complex interactions between road users, which can lead to failures. To address these challenges, the researchers introduced Percept-WAM, a model that integrates 2D and 3D scene understanding within a single vision-language model (VLM). The authors describe this as the first such integration, which sets Percept-WAM apart from existing systems.
Percept-WAM unifies 2D and 3D perception tasks through World-PV (perspective-view) and World-BEV (bird's-eye-view) tokens, which encode both spatial coordinates and confidence scores. For dense object perception, the model uses a grid-conditioned prediction mechanism that incorporates IoU-aware confidence scoring and parallel autoregressive decoding. According to the authors, these features improve stability in challenging cases such as long-tail, far-range, and small-object detection.
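The grid-conditioned prediction and IoU-aware scoring described above can be illustrated with a small, self-contained Python sketch. It is not Percept-WAM's implementation: the ±50 m BEV extent, the 4×4 grid, the helper names (`Box`, `grid_cell`, `assign_to_grid`, `iou_aware_score`), and the geometric blend conf^(1-α) · IoU^α (a formulation used by some IoU-aware detectors) are all assumptions chosen for illustration. The sketch shows two ideas: assigning each object to the grid cell that contains its centre turns one long detection sequence into many short, independent per-cell queries that could be decoded in parallel, and ranking detections by a score that also reflects localisation quality favours well-placed boxes over merely confident ones.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned BEV box in metres: (x1, y1) lower corner, (x2, y2) upper corner."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def grid_cell(point: tuple[float, float], extent: tuple[float, float], cells: int) -> tuple[int, int]:
    """Return the (row, col) of the square BEV grid cell containing `point` (hypothetical helper)."""
    lo, hi = extent
    size = (hi - lo) / cells
    x, y = point
    # Clamp so points on or beyond the outer edge fall into the border cells.
    col = min(cells - 1, max(0, int((x - lo) // size)))
    row = min(cells - 1, max(0, int((y - lo) // size)))
    return row, col


def assign_to_grid(boxes: list[Box], extent=(-50.0, 50.0), cells=4) -> dict[tuple[int, int], list[Box]]:
    """Group boxes by the cell that contains their centre.

    Each non-empty cell acts as an independent query, so per-cell decoding
    could run as one batch instead of one long autoregressive sequence.
    """
    groups: dict[tuple[int, int], list[Box]] = {}
    for box in boxes:
        groups.setdefault(grid_cell(box.center(), extent, cells), []).append(box)
    return groups


def iou_aware_score(cls_conf: float, pred_iou: float, alpha: float = 0.5) -> float:
    """Blend class confidence with predicted box quality (assumed geometric weighting)."""
    return cls_conf ** (1 - alpha) * pred_iou ** alpha


if __name__ == "__main__":
    boxes = [Box(1, 1, 3, 3), Box(2, 2, 4, 4), Box(-45, 40, -43, 42)]
    for cell, members in sorted(assign_to_grid(boxes).items()):
        print(cell, len(members))
    print(round(iou(boxes[0], boxes[1]), 3))
    print(round(iou_aware_score(0.9, 0.4), 3))
```

Running the script places the two overlapping boxes in the same cell and the far-left box in a corner cell. In a real detector, `pred_iou` would come from a learned quality head rather than from ground truth, because ground-truth boxes are not available at inference time.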
A key advantage of Percept-WAM is that it builds on pretrained VLM parameters, so it retains general capabilities such as logical reasoning. The model can directly output both perception results and trajectory control signals, making it an end-to-end solution for autonomous driving.
In the researchers' experiments, Percept-WAM matched or surpassed classical detectors and segmenters on downstream perception benchmarks, achieving 51.7 mAP on COCO 2D detection and 58.9 mAP on nuScenes BEV 3D detection. When integrated with trajectory decoders, Percept-WAM further improved planning performance on nuScenes and NAVSIM, outperforming DiffusionDrive by 2.1 points in PDM Score (PDMS) on NAVSIM.
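NAVSIM's planning metric is the PDM Score (PDMS). To give a rough sense of what a 2.1-point difference means, the sketch below computes a PDM Score according to our understanding of the NAVSIM v1 definition: the no-collision (NC) and drivable-area-compliance (DAC) sub-scores multiply the result, while ego progress (EP), time-to-collision (TTC), and comfort are averaged with weights 5, 5, and 2. The function names and the dictionary input format are hypothetical, and the official NAVSIM evaluation code should be treated as authoritative. This is an illustration, not the benchmark's implementation.

```python
from statistics import mean

# Weights for the averaged sub-scores, following our reading of the NAVSIM v1
# PDM Score definition: ego progress (EP) and time-to-collision (TTC) weigh 5,
# comfort weighs 2. Verify against the NAVSIM documentation before relying on it.
SUBSCORE_WEIGHTS = {"ep": 5.0, "ttc": 5.0, "comfort": 2.0}


def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """PDM Score for a single scenario; every input is expected in [0, 1].

    No-collision (NC) and drivable-area compliance (DAC) act as multiplicative
    penalties, so a zero in either one zeroes the scenario score.
    """
    values = {"nc": nc, "dac": dac, "ep": ep, "ttc": ttc, "comfort": comfort}
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    weighted = (
        SUBSCORE_WEIGHTS["ep"] * ep
        + SUBSCORE_WEIGHTS["ttc"] * ttc
        + SUBSCORE_WEIGHTS["comfort"] * comfort
    ) / sum(SUBSCORE_WEIGHTS.values())
    return nc * dac * weighted


def benchmark_pdms(scenarios: list[dict[str, float]]) -> float:
    """Average per-scenario scores and report them on the 0-100 scale used in papers."""
    return 100.0 * mean(pdm_score(**s) for s in scenarios)


if __name__ == "__main__":
    runs = [
        {"nc": 1.0, "dac": 1.0, "ep": 0.8, "ttc": 1.0, "comfort": 1.0},
        {"nc": 1.0, "dac": 0.0, "ep": 0.9, "ttc": 1.0, "comfort": 1.0},
    ]
    print(round(benchmark_pdms(runs), 2))
```

Because NC and DAC multiply the weighted average, a single off-road or at-fault-collision scenario contributes nothing to the benchmark mean, regardless of how much progress the planner made. This is one reason robust perception can translate directly into PDMS gains.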
Percept-WAM may also be relevant to the energy industry. As the energy sector increasingly adopts autonomous vehicles for tasks such as inspection, maintenance, and transportation, more robust perception models could improve the efficiency, safety, and reliability of these operations. By helping autonomous vehicles navigate complex environments, models like Percept-WAM could support broader adoption of autonomous technologies in the sector.
This article is based on research available at arXiv.

