Thesis Work: 30 credits – Self-supervised pre-training of multi-modal networks for efficient dynamic object detection
Introduction
Are you passionate about cutting-edge technology? Would you like to contribute to future autonomous driving? This thesis project is an excellent opportunity to learn about autonomous heavy vehicles and contribute to a highly competitive, fast-moving industry.
Background
AI-based perception is a key enabler for autonomous driving. Replacing the human driver with a safe robotic system poses many challenges: a wide field of view with no blind spots, detecting objects at long range, detecting and predicting movement in various weather conditions, etc. This is typically addressed with a variety of perception sensor modalities and with increased sensor resolution. To handle this large amount of data, one needs not only efficient multi-modal ML architectures but also efficient ways of leveraging large amounts of training data. Since annotation of multi-modal data is resource demanding, supervised learning is typically limited to a small fraction of the available data. Self-supervised learning has successfully been used to leverage large amounts of unannotated data.
Objective
Investigate self-supervised learning techniques for pre-training ML models using temporal, multi-sensor, and multi-modality data. The focus is on architectures that can fuse data from multiple lidars and cameras; on data from heavy vehicles, collected in highly dynamic environments; and on detecting moving objects. Of particular importance is the ability to estimate the dynamics of objects with very few detections (because they are very small, very far away, or partially occluded).
Job description
The thesis can roughly be divided into the following sub-tasks:
- Literature survey on multi-sensor, multi-view, multi-modality sensor fusion; on scene flow; and on self-supervised learning and pretext tasks
- Selecting a suitable architecture for fusing data from multiple lidars and cameras, and for extracting dynamic-aware features
- Prototyping a self-supervised pre-training mechanism
- Evaluating the performance of the proposed pre-training mechanism
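To give a flavour of the prototyping sub-task: one common family of pretext tasks is masked feature reconstruction, where part of the input is hidden and the model learns to recover it from context, with no annotations required. The sketch below is a deliberately minimal, hypothetical illustration in PyTorch; random low-rank vectors stand in for fused lidar/camera features, and a small MLP stands in for the fusion backbone. It is not the project's method, only one possible starting point.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for fused multi-modal features: low-rank 64-dim
# vectors, so masked entries are recoverable from the unmasked ones.
# In the real project these would come from a lidar/camera fusion backbone.
W = torch.randn(8, 64)
x = torch.randn(256, 8) @ W  # unannotated "data" -- no labels anywhere

class Encoder(nn.Module):
    """Tiny MLP standing in for a multi-modal fusion encoder."""
    def __init__(self, dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return self.net(x)

def masked_step(model, x, mask_ratio=0.5):
    """One self-supervised step: hide features, reconstruct,
    and score the reconstruction only on the hidden entries."""
    mask = torch.rand_like(x) < mask_ratio      # True where input is hidden
    recon = model(x.masked_fill(mask, 0.0))     # model sees masked input
    return ((recon - x) ** 2)[mask].mean()      # loss on masked entries only

model = Encoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

initial = masked_step(model, x).item()
for _ in range(300):
    opt.zero_grad()
    loss = masked_step(model, x)
    loss.backward()
    opt.step()
final = masked_step(model, x).item()
print(final < initial)  # pretext loss should decrease with training
```

After pre-training on unannotated data in this way, the encoder weights would be used to initialise a detection model, which is then fine-tuned on the small annotated fraction of the data.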
Qualifications
- Currently enrolled in a Master’s program in a relevant field (e.g. AI, Computer Vision)
- Strong background in transformer-based architectures. Knowledge of LiDAR-based perception algorithms, multi-temporal perception, and/or experience with self-supervised learning is valued.
- Strong programming skills for designing and training ML models (e.g. PyTorch)
- Excellent analytical and problem-solving skills, and the ability to work independently
- Able to work in a diverse environment and communicate effectively in English
Number of students: 1
Start date: January 2026
Estimated time needed: 20 weeks
Contact person and supervisor
Bogdan Timus, Perception Research and Advanced Engineering
bogdan.timus@se.traton.com
Application
Enclose a transcript of records, as well as a CV and a cover letter in which you highlight how you fulfill the mentioned qualifications.
A background check might be conducted for this position. We are conducting interviews continuously and may close the recruitment before the specified date.
Södertälje, SE, 151 38