Spatial Retrieval Augmented
Autonomous Driving

Xiaosong Jia1*, Chenhe Zhang1*, Yule Jiang2*, Songbur Wong2*,
Zhiyuan Zhang2, Chen Chen3, Shaofeng Zhang4, Xuanhe Zhou2, Xue Yang2†,
Junchi Yan2†, Yu-Gang Jiang1
*Equal contributions   †Corresponding authors
1Institute of Trustworthy Embodied AI, Fudan University
2Shanghai Jiao Tong University
3Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences
4University of Science and Technology of China


Abstract

  Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under a limited view scope, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, which introduces offline retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the paradigm a plug-and-play extension for existing AD tasks.
  For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can enhance performance on several of these tasks. We will open-source the dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
[Figure: teaser]

nuScenes-Geography Dataset

  We extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories.
  The dataset and development toolkit are publicly available:
[Figure: dataset distribution]
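The exact curation pipeline is released with the dataset; as a rough illustration, a retrieval step against the Google Static Maps API (a real Google Maps Platform service) might look like the sketch below. The zoom level, image size, and map type here are illustrative assumptions of ours, not the settings used to curate nuScenes-Geography.

```python
from urllib.parse import urlencode

# Endpoint of the Google Static Maps API.
GOOGLE_STATIC_MAPS = "https://maps.googleapis.com/maps/api/staticmap"

def build_geo_image_url(lat, lon, zoom=19, size=(640, 640),
                        maptype="satellite", api_key="YOUR_API_KEY"):
    """Build a request URL for one geographic image centered on an ego pose.

    zoom / size / maptype are illustrative defaults, not the values
    used for the released dataset.
    """
    params = {
        "center": f"{lat:.6f},{lon:.6f}",  # WGS84 latitude,longitude
        "zoom": zoom,                      # higher zoom = finer detail
        "size": f"{size[0]}x{size[1]}",    # pixel dimensions of the image
        "maptype": maptype,
        "key": api_key,
    }
    return f"{GOOGLE_STATIC_MAPS}?{urlencode(params)}"
```

Retrieval then reduces to one HTTP GET per ego pose, and the responses can be cached offline so that no extra sensor or connectivity is needed at drive time.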

Task Baselines

  To systematically investigate the spatial retrieval paradigm, we establish baselines across five key AD tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. We design a plug-and-play adapter to seamlessly integrate geographic images into existing models. Extensive experiments demonstrate that this modality improves performance on several tasks. All implementation repositories are hosted under the SpatialRetrievalAD organization:
Task                        Repository
Generative World Modeling   Generative-World-Model
End-to-End Planning         End2End-Planning
Online Mapping              Online-Mapping
Occupancy Prediction        Occupancy-Prediction
Object Detection            3D-Detection
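The adapter's exact architecture lives in the task repositories above; its core idea of reliability-gated fusion can be sketched in a few lines of NumPy. The gated-residual form and the scalar reliability logit below are simplifying assumptions of ours, not the released module's interface.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_geo_into_bev(bev_feat, geo_feat, reliability_logit):
    """Reliability-gated residual fusion of a geographic feature map.

    bev_feat, geo_feat : (C, H, W) arrays in a shared BEV frame.
    reliability_logit  : scalar score of how trustworthy the retrieved
                         geographic image is (e.g. stale imagery scores low).
    The gate in (0, 1) scales the geographic contribution, so an
    unreliable retrieval degrades gracefully to the sensor-only features.
    """
    gate = sigmoid(reliability_logit)
    return bev_feat + gate * geo_feat
```

Because the fusion is a residual on top of the sensor-derived BEV features, a host model keeps its original behavior when the gate is driven to zero, which is what makes the adapter plug-and-play.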

Dataset Visualization

The following figures show the correspondence between retrieved geographic images and nuScenes camera images:
[Figures: data_example_0, data_example_1, data_example_2]
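Establishing this correspondence requires projecting WGS84 ego poses into the pixel frame of a retrieved tile. The standard Web Mercator projection (the one Google Maps uses) covers that step; the helper below is our illustration, not the dataset toolkit's API.

```python
import math

TILE_SIZE = 256  # pixels spanned by the whole world at zoom level 0

def latlon_to_pixel(lat, lon, zoom):
    """Project a WGS84 (lat, lon) to global Web Mercator pixel coordinates.

    Subtracting the pixel coordinate of a tile's corner then places an
    ego pose inside that retrieved geographic image.
    """
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    # Clamp to avoid the projection's singularity at the poles.
    siny = min(max(siny, -0.9999), 0.9999)
    y = (0.5 - math.log((1.0 + siny) / (1.0 - siny)) / (4.0 * math.pi)) * scale
    return x, y
```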

Comparison

Generative World Model Results. Conditioning UniMLVG and MagicDriveDiT on geographic images leads to lower FVD and FID, effectively preventing scene drift and preserving geometric consistency during rollouts. This demonstrates that spatial retrieval provides a structural scaffold for coherent world modeling.

[Figures: generative results, visualization_generative]

Online Mapping Results. Integrating geographic priors into MapTR and MapTRv2 substantially improves online mapping. The extra background information enables the recovery of lanes occluded from onboard sensors.

[Figures: mapping results, visualization_mapping]

Occupancy Results. Extending FB-OCC and FlashOCC yields consistent mIoU improvements, particularly on static categories such as terrain, where geographic priors supply background information that onboard sensors may miss.

[Figures: occupancy results, visualization_occupancy]

End-to-end Planning Results. We evaluate how spatial retrieval improves safe planning with VAD. Geographic priors provide stable road-layout information, compensating for sensing failures under occlusion or low light. With similar trajectory accuracy, our method achieves better safety margins, reducing the collision rate from 0.55% to 0.48% (a relative reduction of about 13%) in challenging night scenes.

[Figures: planning results, visualization_planning]

Conclusion

  In this work, we present the spatial retrieval paradigm for AD, introducing geographic data as an additional input. We extend nuScenes with geographic data retrieved via Google Maps APIs and evaluate five key AD tasks on the resulting nuScenes-Geography dataset. As an intuitive baseline, we propose a general plug-and-play Spatial Retrieval Adapter module for incorporating geographic data, together with a Reliability Estimation mechanism that adaptively fuses geographic information according to the reliability of the retrieved data. Extensive experiments show that the proposed paradigm enhances the performance of multiple AD tasks, demonstrating its substantial potential.

Citation

If you use SpatialRetrievalAD in your research, please cite our paper:


Acknowledgments

We thank the following projects for their contributions to the development of this work: BEVDet, BEVFormer, FB-OCC, FlashOCC, MagicDriveDiT, MapTR, MapTRv2, nuScenes, PETR, UniMLVG, and VAD.