GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing¹,²*, Xingyu Zhang²*, Yang Hu², Bo Jiang⁴,², Tong He⁵, Qian Zhang²,
Xiaoxiao Long³, Wei Yin²✝
(* indicates equal contribution; ✝ indicates corresponding author, project leader)
Contact: xzebin@bupt.edu.cn, yvanwy@outlook.com

¹ University of Chinese Academy of Sciences, ² Horizon Robotics, ³ Nanjing University, ⁴ Huazhong University of Science & Technology, ⁵ Shanghai AI Laboratory

Abstract

In autonomous driving scenarios, there is rarely a single suitable trajectory. Recent methods have increasingly focused on modeling multimodal trajectory distributions. However, they suffer from complex trajectory selection and reduced trajectory quality, caused by high trajectory divergence and by inconsistencies between guidance and scene information. To address these issues, we introduce GoalFlow, a novel method that effectively constrains the generative process to produce high-quality, multimodal trajectories. Specifically: 1) To resolve the trajectory divergence problem inherent in diffusion-based methods, GoalFlow constrains the generated trajectories by introducing a goal point. 2) GoalFlow establishes a novel scoring mechanism that selects the most appropriate goal point from the candidate points based on scene information. 3) GoalFlow employs an efficient generative method, Flow Matching, to generate multimodal trajectories, and incorporates a refined scoring mechanism to select the optimal trajectory from the candidates. Our experimental results, validated on Navsim, demonstrate that GoalFlow achieves state-of-the-art performance, delivering robust multimodal trajectories for autonomous driving. GoalFlow achieves a PDMS of 90.3, significantly surpassing other methods. Unlike other diffusion-policy-based methods, our approach requires only a single denoising step to obtain excellent performance.
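The single-step claim is easier to read against a concrete Flow Matching formulation. The block below is a sketch under an assumption: it uses the common straight-line (rectified-flow style) probability path, which this page does not spell out. The symbols \( x_0 \) (Gaussian noise), \( x_1 \) (a ground-truth trajectory), \( g \) (the selected goal point), and \( v_\theta \) (the learned velocity network) are introduced here for illustration only.

```latex
% Straight-line path between noise x_0 and data x_1 (assumed form)
x_t = (1 - t)\, x_0 + t\, x_1, \qquad t \in [0, 1], \quad x_0 \sim \mathcal{N}(0, I)

% Training target: the constant velocity along that path
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_1}
  \left\| v_\theta\!\left(x_t, t \mid g, F_{\text{bev}}\right) - (x_1 - x_0) \right\|^2

% Inference with a single Euler step from t = 0 to t = 1
\hat{x}_1 = x_0 + v_\theta\!\left(x_0, 0 \mid g, F_{\text{bev}}\right)
```

If the learned velocity field is close to constant along each path, one Euler step lands near the same endpoint as many small steps, which is consistent with the single-step result reported above.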

Demos

Fig 1: Driving videos generated by GoalFlow, where the \( \times \) marks the predicted goal point.

Pipeline of GoalFlow

Fig 2: GoalFlow consists of three modules. The Perception Module integrates scene information into a BEV feature \( F_{\text{bev}} \), the Goal Point Construction Module selects the optimal goal point from the Goal Point Vocabulary \( \mathbb{V} \) as guidance information, and the Trajectory Planning Module generates trajectories by denoising from a Gaussian distribution to the target distribution. Finally, the Trajectory Scorer selects the optimal trajectory from the candidates.
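The denoising step in the Trajectory Planning Module can be illustrated with a minimal sketch of goal-conditioned sampling by Euler integration. Everything here is hypothetical: `sample_trajectories` and `toy_velocity` are illustrative names, and `toy_velocity` is a hand-written stand-in for the learned network that simply pulls every sample toward a straight line ending at the goal. It ignores \( F_{\text{bev}} \) and, unlike a trained model, produces no multimodality.

```python
import numpy as np

def sample_trajectories(velocity_fn, goal, num_samples=4, horizon=8, steps=1, seed=0):
    """Euler-integrate dx/dt = v(x, t, goal) from t=0 (noise) to t=1 (trajectory)."""
    rng = np.random.default_rng(seed)
    # Start from Gaussian noise: one (horizon, 2) waypoint array per sample.
    x = rng.standard_normal((num_samples, horizon, 2))
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, goal)
    return x

def toy_velocity(x, t, goal):
    """Hypothetical stand-in for the learned velocity network.

    The target is a straight line from near the origin to the goal point.
    Dividing by (1 - t) makes the velocity constant along a straight path,
    so one Euler step reaches the same endpoint as many small steps.
    """
    horizon = x.shape[1]
    alphas = np.linspace(1.0 / horizon, 1.0, horizon)[None, :, None]
    target = alphas * goal[None, None, :]
    return (target - x) / (1.0 - t)

goal = np.array([20.0, 3.0])  # assumed goal point in ego coordinates (metres)
one_step = sample_trajectories(toy_velocity, goal, steps=1)
ten_steps = sample_trajectories(toy_velocity, goal, steps=10)
print(np.allclose(one_step, ten_steps))  # single step matches multi-step here
```

With a real network the velocity field is only approximately straight, so the number of steps becomes a trade-off between speed and accuracy rather than a free choice.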

Goal Point Construction

Fig 3: (a) shows the detailed structure of the Goal Point Construction Module, and (b) presents the score distributions of \( \{ \hat{\delta}^{dis}_i \}^N \), \( \{ \hat{\delta}^{dac}_i \}^N \), and \( \{ \hat{\delta}^{final}_i \}^N \), where points with higher scores are shown in warmer colors.
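A rough sketch of how the three score sets in Fig 3 might relate is given below. It assumes that \( \hat{\delta}^{dis}_i \) is a softmax-normalized distance score, that \( \hat{\delta}^{dac}_i \) is a drivable-area-compliance probability, and that the final score is a weighted sum of their logarithms. The page does not state the combination rule, so that rule, the weights `w_dis` and `w_dac`, and the function name `select_goal` are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def select_goal(vocab, dis_logits, dac_prob, w_dis=1.0, w_dac=1.0, eps=1e-6):
    """Pick the vocabulary point with the highest combined score.

    vocab:      (N, 2) candidate goal points
    dis_logits: (N,) unnormalized distance scores (higher = closer to target)
    dac_prob:   (N,) probability that each point lies in the drivable area
    The log-weighted combination below is an assumed rule, not a confirmed one.
    """
    dis = softmax(dis_logits)
    final = w_dis * np.log(dis + eps) + w_dac * np.log(dac_prob + eps)
    best = int(np.argmax(final))
    return vocab[best], final

vocab = np.array([[18.0, 6.0], [17.0, 2.5], [5.0, 0.0]])
dis_logits = np.array([2.0, 1.5, 0.1])   # candidate 0 is nearest to the target
dac_prob = np.array([0.05, 0.90, 0.95])  # but candidate 0 is likely off-road
goal, final = select_goal(vocab, dis_logits, dac_prob)
print(goal)  # candidate 1: slightly farther, but inside the drivable area
```

The example shows why a combined score is useful: the candidate with the best distance score alone is rejected because its drivable-area score is low.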

Comparison with others

Fig 4: \( \times \) indicates that the trajectory results in a collision or leaves the drivable area, while ✔ represents a safe trajectory. The orange points are generated by the Goal Constructor, while the blue and yellow points correspond to samples from the vocabulary. The results show that GoalFlow generates higher-quality trajectories than the other two methods.