Thesis work 30hp - Explainable agents for information retrieval from multimodal data

Smart Factory Lab

 

Background

Scania continuously generates massive amounts of unstructured data, such as documents, images, audio, video, and tabular files. Finding relevant, usable information in these sources remains a major challenge. Recent advancements in multimodal agents offer new possibilities: these agents dynamically orchestrate specialized tools for each data modality (e.g., text extraction, image processing, audio transcription), combine the intermediate results, and reason over them to produce coherent, explainable responses.

This thesis will focus on the design of a scalable, explainable information retrieval system based on multimodal agents. The system will extract information from multiple data types, represent it, and make it accessible and explainable. Students will have access to cloud platforms such as AWS and Snowflake to build scalable, reproducible solutions.

 

Assignment

The main goal of this thesis is to design and evaluate an information retrieval system for multimodal data. The system should be developed, deployed, and tested in a cloud environment, with a focus on scalability, reproducibility, and explainability.

 

The challenges include:

Extraction: Implement methods for extracting information from documents, images, audio, video, and tabular data.
Representation: Build structured knowledge representations (e.g., knowledge graphs, relational or vector databases) that support efficient retrieval.
Accessibility: Modularize and expose the represented knowledge via APIs or MCP servers to enable seamless integration with other systems.
Explainability: Ensure responses are transparent and traceable, clearly referencing their sources and reasoning steps.
Evaluation: Evaluate the system across multiple layers (extraction accuracy, representation quality, accessibility, and explainability).
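
As a purely illustrative sketch, not a prescribed design, the extraction and explainability challenges above could start from an outline like the following. All names (Evidence, EXTRACTORS, extract, answer) and the placeholder extractors are hypothetical; a real system would plug in actual OCR, transcription, and parsing tools, and an agent that reasons over the collected evidence.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Evidence:
    source: str    # file the content was extracted from
    modality: str  # e.g. "document", "audio", "tabular"
    content: str   # extracted content (placeholder text in this sketch)


# Hypothetical stand-ins for real extraction tools.
def extract_document(path: str) -> str:
    return f"<text extracted from {path}>"


def extract_audio(path: str) -> str:
    return f"<transcript of {path}>"


def extract_table(path: str) -> str:
    return f"<rows parsed from {path}>"


# Map each file suffix to a modality label and the tool that handles it.
EXTRACTORS: dict[str, tuple[str, Callable[[str], str]]] = {
    ".pdf": ("document", extract_document),
    ".wav": ("audio", extract_audio),
    ".csv": ("tabular", extract_table),
}


def extract(path: str) -> Evidence:
    """Dispatch a file to the extractor registered for its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {suffix!r}")
    modality, tool = EXTRACTORS[suffix]
    return Evidence(source=path, modality=modality, content=tool(path))


def answer(question: str, paths: list[str]) -> dict:
    """Return a response that carries its sources and the steps taken."""
    evidence = [extract(p) for p in paths]
    return {
        "question": question,
        # Placeholder: a real agent would reason over the evidence here.
        "answer": "<generated from the evidence>",
        "sources": [e.source for e in evidence],
        "steps": [f"ran {e.modality} extractor on {e.source}" for e in evidence],
    }
```

Calling answer("What is the torque spec?", ["manual.pdf", "call.wav"]) in this sketch returns a dictionary whose "sources" and "steps" entries list every file consulted and every tool run, which is the kind of traceability the explainability challenge asks for.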
 

Even if you don’t have experience with everything mentioned above, we still encourage applications from students of all backgrounds and perspectives. Participants will gain hands-on experience and receive regular mentorship and collaboration opportunities throughout the project.

 

Education and time plan

Education: Master’s program in Computer Science, Data Science, Artificial Intelligence, Machine Learning, or Industrial Analytics

Number of students: 1 - 2

Start date: January 2026

Estimated time needed: 20 weeks

Topics: Artificial Intelligence, Agents, Explainability, Cloud

 

Contact persons and supervisors:
Swathi Rao and Joris Rombouts will supervise the thesis and can answer questions about the project.

Email: swathi.rao@scania.com and joris.rombouts@scania.com

 

Application:
Your application must include a CV, a personal letter, and a transcript of grades.

 

A background check might be conducted for this position. We are conducting interviews continuously and may close the recruitment earlier than the date specified.   

 

Requisition ID:  21549
Number of Openings:  1.0
Part-time / Full-time:  Full-time
Permanent / Temporary:  Permanent
Country/Region:  SE
Location(s):  Södertälje, SE, 151 38

Required Travel:  0%
Workplace:  Hybrid