Thesis Work: 30 credits - Language-Action Alignment for VLM-Guided Autonomous Driving
Introduction
A thesis project at Traton is an excellent way of making contacts for your future working life. Many of our current employees began their careers with a thesis project.
Background
End-to-end autonomous driving integrates perception, reasoning, and decision-making into a unified pipeline. Recent advances in vision-language models (VLMs) have enhanced end-to-end driving by enabling richer scene understanding, commonsense reasoning, and language-guided control. Prior work [1, 2] demonstrates that integrating language into driving pipelines improves interpretability and generalization. However, a critical challenge remains: the misalignment between what a model understands in language and what it does in action. For example, a model may recognize a red light yet still fail to stop, revealing that its behavior does not fully reflect its linguistic understanding. The language-action alignment framework, introduced in works such as SimLingo (2025) [3], aims to bridge this gap by jointly training models to associate language instructions with the corresponding driving behaviors. This alignment is essential for building interpretable, trustworthy, and instruction-following autonomous agents.
Objective
This thesis aims to investigate how to ensure that an end-to-end autonomous driving model’s behavior truly reflects its linguistic understanding. The study will explore new strategies to strengthen language–action alignment, guided by the following research questions:
- What training strategies or architectures can better enforce consistent alignment between linguistic instructions and driving actions?
- How can novel strategies, such as counterfactual statements (“what if” reasoning) or self-dialogue, help a driving model make more causally consistent and interpretable decisions?
Job description
The thesis can roughly be divided into the following sub-tasks:
- Survey and summarize related literature on VLM/VLA-guided end-to-end planning in autonomous driving.
- Reproduce the SimLingo baseline [3], or a comparable method.
- Propose and design new training strategies or network architectures.
- Evaluate the performance of the proposed strategies, network architectures, and generated training data.
References
[1] Tian, Xiaoyu, et al. "DriveVLM: The convergence of autonomous driving and large vision-language models." Conference on Robot Learning (CoRL), 2024.
[2] Fu, Haoyu, et al. "ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation." International Conference on Computer Vision (ICCV), 2025.
[3] Renz, Katrin, et al. "SimLingo: Vision-only closed-loop autonomous driving with language-action alignment." Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025.
Education
Master's programme (civilingenjör) in machine learning, robotics, computer science, engineering physics, electrical engineering, or applied mathematics, preferably with a specialization in AI.
Number of students: 1-2
Start date: January 2026
Estimated time needed: 20 weeks
Contact persons and supervisors
Truls Nyberg, Industrial PostDoc in Autonomous Research, Autonomous Motion,
truls.nyberg@scania.com, 08 – 553 535 27
Carol Yi Yang, Industrial PhD student in Autonomous Research, Perception,
carol-yi.yang@scania.com, +46 70 081 17
Application
Enclose CV, cover letter and transcript of records.
A background check might be conducted for this position. We are conducting interviews continuously and may close the recruitment earlier than the day specified.
Södertälje, SE, 151 38