Spatial Retrieval Augmented
Autonomous Driving

Xiaosong Jia1*, Chenhe Zhang1*, Yule Jiang2*, Songbur Wong2*,
Zhiyuan Zhang2, Chen Chen3, Shaofeng Zhang4, Xuanhe Zhou2, Xue Yang2†,
Junchi Yan2†, Yu-Gang Jiang1
*Equal contributions   †Corresponding authors
1Institute of Trustworthy Embodied AI, Fudan University
2Shanghai Jiao Tong University
3Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
4University of Science and Technology of China


Abstract

  Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under a limited view scope, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, which introduces offline retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the paradigm a plug-and-play extension for existing AD tasks.
  For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can enhance performance on several of these tasks. We will open-source the dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
[Figure: teaser]

nuScenes-Geography Dataset

  We extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories.
  The dataset and development toolkit are publicly available:
[Figure: dataset distribution]
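The exact curation pipeline is released with the dataset; as a rough illustration, a retrieval step against the Google Static Maps API (a real Google Maps Platform service) might look like the sketch below. The zoom level, image size, and map type here are illustrative assumptions of ours, not the settings used to curate nuScenes-Geography.

```python
from urllib.parse import urlencode

# Endpoint of the Google Static Maps API.
GOOGLE_STATIC_MAPS = "https://maps.googleapis.com/maps/api/staticmap"

def build_geo_image_url(lat, lon, zoom=19, size=(640, 640),
                        maptype="satellite", api_key="YOUR_API_KEY"):
    """Build a request URL for one geographic image centered on an ego pose.

    zoom / size / maptype are illustrative defaults, not the values
    used for the released dataset.
    """
    params = {
        "center": f"{lat:.6f},{lon:.6f}",  # WGS84 latitude,longitude
        "zoom": zoom,                      # higher zoom = finer detail
        "size": f"{size[0]}x{size[1]}",    # pixel dimensions of the image
        "maptype": maptype,
        "key": api_key,
    }
    return f"{GOOGLE_STATIC_MAPS}?{urlencode(params)}"
```

Retrieval then reduces to one HTTP GET per ego pose, and the responses can be cached offline so that no extra sensor or connectivity is needed at drive time.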

Task Baselines

  To systematically investigate the spatial retrieval paradigm, we establish baselines across five key AD tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. We design a plug-and-play adapter to seamlessly integrate geographic images into existing models. Extensive experiments demonstrate that this modality improves performance on several tasks. All implementation repositories are hosted under the SpatialRetrievalAD organization:
Task                        Repository
Generative World Modeling   Generative-World-Model
End-to-End Planning         End2End-Planning
Online Mapping              Online-Mapping
Occupancy Prediction        Occupancy-Prediction
Object Detection            3D-Detection
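The adapter's exact architecture lives in the task repositories above; its core idea of reliability-gated fusion can be sketched in a few lines of NumPy. The gated-residual form and the scalar reliability logit below are simplifying assumptions of ours, not the released module's interface.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_geo_into_bev(bev_feat, geo_feat, reliability_logit):
    """Reliability-gated residual fusion of a geographic feature map.

    bev_feat, geo_feat : (C, H, W) arrays in a shared BEV frame.
    reliability_logit  : scalar score of how trustworthy the retrieved
                         geographic image is (e.g. stale imagery scores low).
    The gate in (0, 1) scales the geographic contribution, so an
    unreliable retrieval degrades gracefully to the sensor-only features.
    """
    gate = sigmoid(reliability_logit)
    return bev_feat + gate * geo_feat
```

Because the fusion is a residual on top of the sensor-derived BEV features, a host model keeps its original behavior when the gate is driven to zero, which is what makes the adapter plug-and-play.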

Dataset Visualization

The following figures show the correspondence between retrieved geographic images and nuScenes camera images:
[Figures: data_example_0, data_example_1, data_example_2]
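Establishing this correspondence requires projecting WGS84 ego poses into the pixel frame of a retrieved tile. The standard Web Mercator projection (the one Google Maps uses) covers that step; the helper below is our illustration, not the dataset toolkit's API.

```python
import math

TILE_SIZE = 256  # pixels spanned by the whole world at zoom level 0

def latlon_to_pixel(lat, lon, zoom):
    """Project a WGS84 (lat, lon) to global Web Mercator pixel coordinates.

    Subtracting the pixel coordinate of a tile's corner then places an
    ego pose inside that retrieved geographic image.
    """
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    # Clamp to avoid the projection's singularity at the poles.
    siny = min(max(siny, -0.9999), 0.9999)
    y = (0.5 - math.log((1.0 + siny) / (1.0 - siny)) / (4.0 * math.pi)) * scale
    return x, y
```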

Comparison

Generative World Model Results. Conditioning UniMLVG and MagicDriveDiT on geographic images leads to lower FVD and FID, effectively preventing scene drift and preserving geometric consistency during rollouts. This demonstrates that spatial retrieval provides a structural scaffold for coherent world modeling.

[Figures: generative results, visualization_generative]

Online Mapping Results. Integrating geographic priors into MapTR and MapTRv2 substantially improves online mapping. The extra background information enables the recovery of lanes occluded from onboard sensors.

[Figures: mapping results, visualization_mapping]

Occupancy Results. Extending FB-OCC and FlashOCC yields consistent mIoU improvements, particularly on static categories such as terrain, where geographic priors supply background information that onboard sensors may miss.

[Figures: occupancy results, visualization_occupancy]

End-to-end Planning Results. We evaluate how spatial retrieval improves safe planning with VAD. Geographic priors provide stable road-layout information, compensating for sensing failures under occlusion or low light. With similar trajectory accuracy, our method achieves better safety margins, reducing the collision rate from 0.55% to 0.48% (a relative reduction of about 13%) in challenging night scenes.

[Figures: planning results, visualization_planning]

Conclusion

  In this work, we present the spatial retrieval paradigm for AD, introducing geographic data as an additional input. We extend nuScenes with geographic data retrieved via Google Maps APIs and evaluate five key AD tasks on the resulting nuScenes-Geography dataset. As an intuitive baseline, we propose a general plug-and-play Spatial Retrieval Adapter module for incorporating geographic data, together with a Reliability Estimation mechanism that adaptively fuses geographic information according to the reliability of the retrieved data. Extensive experiments show that the proposed paradigm enhances the performance of multiple AD tasks, demonstrating its substantial potential.

Citation

If you use SpatialRetrievalAD in your research, please cite our paper:


Acknowledgments

We thank the following projects for their contributions to the development of this work: BEVDet, BEVFormer, FB-OCC, FlashOCC, MagicDriveDiT, MapTR, MapTRv2, nuScenes, PETR, UniMLVG, and VAD.