30 hp: Learning realistic driving behaviours for autonomous heavy vehicles using reinforcement learning

Introduction:
Traton Group R&D is a joint R&D organisation shared between several brands, including Scania, MAN, International, and Volkswagen Truck & Bus. A thesis project at Traton is an excellent way to get closer to the company and build relationships for the future. Many of today's employees began their Traton/Scania career with their degree project.

Background:
Autonomous vehicles (AVs) are expected to transform the transportation industry, but several major challenges remain to be solved. Due to distributional shift, models trained with imitation learning (IL) typically struggle to handle long-tail scenarios, for which only a limited amount of data has been collected. This limits the models' ability to generalize to the wide range of traffic situations that an AV will face in the real world.

Especially for long-tail scenarios, reinforcement learning (RL) has been identified as a promising tool for increasing reliability and robustness [1]. However, obtaining policies that produce realistic agent behaviour is still a challenge due to the complexity of reward function design [2]. To mitigate this issue, several methods, such as regularization and adversarial inverse RL, have been investigated to bring the learnt policy closer to a human-like reference. Further, recent advances in high-performance simulators for autonomous driving have enabled large-scale training of behaviour models using self-play [3, 4], offering an alternative to traditional methods that depend on collected data for training. So far, this work has focused on training models for passenger cars, showing promising results for learning realistic behaviours from simulated data alone. Traton is interested in investigating how these methods can be extended to use cases involving heavy-duty vehicles, including truck and trailer configurations, considering both self-play solutions and reward design.

Objective:
The task is to extend an existing simulation framework, such as GPUDrive or similar, to support truck and trailer combinations. This simulator should then be used either to train a behaviour policy using reinforcement learning and self-play, or to explore different reward-shaping methods. The results will be compared against real-world driving data to assess the performance and realism of the learnt policy.

Job description:
The assignment is divided into sub-tasks:

- Implement a dynamic model supporting heavy-duty vehicles in an existing high-performance simulator
- (a) Implement and train a behaviour model using RL and self-play
- (b) Investigate different reward-shaping methods using RL
- Evaluate the performance and realism of the learnt policy

[1] Lu et al. “Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios”. In: Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS). 2023.

[2] Knox et al. “Reward (Mis)design for Autonomous Driving”. In: Artif. Intell. 316 (2023).

[3] Kazemkhani et al. “GPUDrive: Data-driven, Multi-agent Driving Simulation at 1 Million FPS”. In: Proc. Int. Conf. Learn. Represent. (ICLR). 2025.

[4] Zhang et al. “Learning to Drive via Asymmetric Self-Play”. In: Computer Vision – ECCV. 2024.

Education:
Master of Science in Engineering (civilingenjör) or other Master's studies in computer science, robotics, engineering physics, electrical engineering, applied mathematics, or a similar field.

Number of students: 1-2
Start date for the thesis work: January 2026
Estimated time required: 20 weeks

Contact persons and supervisors:

Caroline Skoglund, Software Engineer – Data-driven motion planning
caroline.skoglund@se.traton.com

Oscar Palfelt, Software Engineer – Data-driven motion planning
oscar.palfelt@se.traton.com

Application:
Your application must include a CV, a cover letter, and a transcript of grades.

A background check might be conducted for this position. We are conducting interviews continuously and may close the recruitment earlier than the date specified.


Requisition ID: 21986

Number of Openings: 1

Part-time / Full-time: Full-time

Permanent / Temporary: Temporary

Country/Region: SE

Location(s):

Södertälje, SE, 151 38

Required Travel: 0%

Workplace: On-site

This position is within one of TRATON’s companies.