Current Bachelor Thesis Topics
Bachelor Topics SS 2026
1. Text Mining and Machine Learning
Supervisors: Daniil Dobriy, Axel Polleres
LLMs have made impressive progress not only in answering questions directly, but also in formulating corresponding queries to access structured databases or knowledge graphs via structured query languages like SQL and SPARQL.
In this thesis you shall explore the boundaries: which questions are TOO HARD for LLMs to answer and to formulate as a query, even if the answer can be found in a structured database or knowledge graph?
Sources:
As an entry point, with a summary of older benchmarks on this problem, called KGQA (knowledge-graph-based question answering), I recommend the following MSc thesis polleres.net/supervised_theses/Gerhard_Klager_MSc_2024.pdf and the corresponding paper: Gerhard Georg Klager and Axel Polleres. Is GPT fit for KGQA? -- Preliminary results. In Proceedings of the International Workshop on Knowledge Graph Generation from Text (Text2KG2023), co-located with the Extended Semantic Web Conference 2023 (ESWC 2023), May 2023. [ .pdf ]
The recent progress on this topic, also leveraging LLMs with agentic capabilities, is illustrated in our recent technical report: https://research.wu.ac.at/de/publications/agentic-sparql-evaluating-sparql-mcp-poweredintelligent-agents-o/
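Since such a thesis will have to score LLM answers against gold answers from KGQA benchmarks, some evaluation harness is needed. Below is a minimal sketch of one; the lenient normalization and the toy data are illustrative assumptions, not part of any existing benchmark tooling.

```python
# Hypothetical sketch: scoring an LLM's KGQA answers against gold answers.

def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace for lenient exact matching."""
    return " ".join(answer.lower().strip().split())

def exact_match_accuracy(predictions: list[str], gold: list[set[str]]) -> float:
    """Fraction of questions where the prediction matches any gold answer."""
    hits = 0
    for pred, answers in zip(predictions, gold):
        if normalize(pred) in {normalize(a) for a in answers}:
            hits += 1
    return hits / len(predictions) if predictions else 0.0

# Toy benchmark: two questions, the second answered wrongly.
preds = ["Vienna", "1989"]
gold = [{"Vienna", "Wien"}, {"1991"}]
print(exact_match_accuracy(preds, gold))  # → 0.5
```

Real benchmarks use more elaborate metrics (e.g., set-based F1 over multi-valued answers), but the comparison loop has this shape.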
2. Automating topic trend analyses for a particular research field
Supervisors: Daniil Dobriy, Axel Polleres
This topic is about automating a reproducibility study:
In 2020 we published a study about Semantic Web research progress in the past decade:
Sabrina Kirrane, Marta Sabou, Javier D. Fernández, Francesco Osborne, Cécile Robin, Paul Buitelaar, Enrico Motta, and Axel Polleres. A decade of semantic web research through the lenses of a mixed methods approach. Semantic Web -- Interoperability, Usability, Applicability (SWJ), 11(6):979--1005, October 2020. [ DOI | http ]
One of the challenges in this study was to obtain fulltexts from Semantic Web conferences and venues and to analyse them at scale with (back then) state-of-the-art methods for topic analysis, clustering, and other mixed methods and tools.
The goal of this thesis is to redo this study 5 years later and to devise a pipeline for such analyses across different fields, conferences, and time frames. AI use is *allowed* in this exercise (vibe coding, text summarization via prompting, etc.), but the main goal is to compare the results with the original paper and approach, document your tool usage, and come up with a reusable pipeline.
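One building block of such a pipeline can be sketched as follows: counting keyword frequencies per publication year to trace topic trends over time. The corpus and keyword list below are invented for illustration; a real reimplementation of the study would use proper topic modeling and clustering.

```python
# Illustrative sketch: per-year keyword frequencies as a crude topic-trend signal.
from collections import Counter, defaultdict

def topic_trends(papers, keywords):
    """papers: iterable of (year, text); returns {keyword: {year: count}}."""
    trends = defaultdict(Counter)
    for year, text in papers:
        tokens = text.lower().split()
        for kw in keywords:
            trends[kw][year] += tokens.count(kw)
    return trends

# Invented two-paper corpus, one per time frame.
papers = [
    (2020, "ontology matching and ontology alignment"),
    (2025, "llm agents query the ontology via sparql"),
]
t = topic_trends(papers, ["ontology", "llm"])
print(t["ontology"][2020], t["llm"][2025])  # → 2 1
```

A reusable pipeline would wrap steps like this (harvesting, preprocessing, trend extraction, visualization) so they can be re-run for other venues and time frames.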
3. Do LLMs produce factual statements that violate knowledge-graph integrity constraints?
Supervisors: Miguel Vazquez, Axel Polleres
Background
Large Language Models (LLMs) often generate plausible factual statements, but these statements may be inconsistent with curated, structured knowledge. Knowledge graphs provide an explicit representation of facts together with integrity constraints (e.g., type restrictions, single-value requirements) that help maintain data quality and consistency. This thesis investigates the mismatch between LLM-generated factual claims and explicit knowledge-graph constraints, using Wikidata as the reference knowledge base.
The work is well-aligned with research on trustworthy AI and data quality, and fits the broader Semantic Web / Knowledge Graph ecosystem at WU Wien.
Research question: When an LLM produces factual statements about entities, to what extent do these statements violate explicit integrity constraints defined in Wikidata?
Three initial references
Aidan Hogan et al. Knowledge Graphs. ACM Computing Surveys, 54(4), 2021. (Preprint: arXiv:2003.02320)
Denny Vrandečić and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 2014. (DOI: 10.1145/2629489)
Nicolas Ferranti, Jairo Francisco de Souza, Shqiponja Ahmetaj, and Axel Polleres. Formalizing and validating Wikidata’s property constraints using SHACL and SPARQL. Semantic Web Journal, 2024 (in press).
Keywords
LLMs, factuality, hallucination, knowledge graphs, integrity constraints, Wikidata, SHACL, SPARQL, evaluation pipeline, benchmarking
Expected prior knowledge (courses / skills)
Python programming and data handling
Basic understanding of knowledge graphs
Introductory familiarity with NLP/LLMs (prompting, using an API or local inference)
Basic experimental methodology (metrics, ablations, reproducibility)
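To make the research question concrete, the sketch below checks LLM-extracted triples against a single-value constraint (e.g., at most one date of birth per person). The property names and constraint set are hypothetical placeholders; actual Wikidata property constraints would be expressed and validated in SHACL/SPARQL, as in the Ferranti et al. reference above.

```python
# Illustrative constraint check, not Wikidata's actual tooling.
from collections import defaultdict

SINGLE_VALUE_PROPS = {"date_of_birth"}  # assumed single-value constraint set

def single_value_violations(triples):
    """triples: (subject, property, value); returns (subject, property)
    pairs that carry more than one distinct value for a single-value property."""
    seen = defaultdict(set)
    for s, p, v in triples:
        if p in SINGLE_VALUE_PROPS:
            seen[(s, p)].add(v)
    return {sp for sp, vals in seen.items() if len(vals) > 1}

llm_triples = [
    ("Ada_Lovelace", "date_of_birth", "1815-12-10"),
    ("Ada_Lovelace", "date_of_birth", "1816-12-10"),  # hallucinated duplicate
    ("Ada_Lovelace", "occupation", "mathematician"),
]
print(single_value_violations(llm_triples))
# → {('Ada_Lovelace', 'date_of_birth')}
```

An evaluation pipeline would extract such triples from LLM outputs, then count violations per constraint type as the main metric.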
4. Benchmarks and Knowledge Graph Embeddings
Supervisor: Diego Rincon-Yanez
Background
Knowledge graph embeddings (KGEs) aim to represent entities and relations of a knowledge graph as low-dimensional continuous vectors that preserve the graph’s relational semantics and topology. Formally, given triples (h,r,t), embedding models learn parameterized scoring functions that assign higher plausibility to observed triples than to corrupted ones, thereby enabling tasks such as link prediction and knowledge base completion. Classical approaches rely on geometric or factorization-based formulations that model relations as transformations in latent space, while more recent methods leverage graph neural networks to aggregate multi-hop neighborhood information and capture higher-order dependencies. Despite substantial progress in expressiveness and scalability, open research challenges remain in modeling complex relation patterns, handling temporal and multimodal knowledge, and ensuring robustness and interpretability of learned representations across large, evolving knowledge graphs.
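The scoring-function idea described above can be sketched in the TransE style, where a triple (h, r, t) is plausible if h + r lies close to t in the embedding space. The vectors below are toy values, not trained embeddings.

```python
# Minimal TransE-style scoring function on toy vectors.
import math

def transe_score(h, r, t):
    """Negative L2 distance ||h + r - t||; higher means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy embeddings in which the observed triple fits better than a corrupted one.
vienna, capital_of, austria = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
germany = [3.0, 2.0]

observed = transe_score(vienna, capital_of, austria)
corrupted = transe_score(vienna, capital_of, germany)
print(observed > corrupted)  # → True
```

Training would adjust the vectors so that observed triples outscore corrupted ones, which is exactly the ranking behavior that link-prediction benchmarks then measure.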
Research Problem
Standard benchmarks and datasets used for evaluation are largely derived from cleaned, static subsets of large knowledge bases, which poorly reflect the structural heterogeneity, schema richness, and noise characteristics of real-world RDF knowledge graphs. These benchmarks typically emphasize link prediction on simplified triple distributions and fixed train–test splits, encouraging models to exploit dataset-specific regularities rather than learn representations that generalize to the actual knowledge, its evolution, or schema-constrained data. Consequently, high performance on current benchmarks may not translate directly into performance on downstream tasks over practical RDF knowledge graphs. This highlights the need for improved evaluation datasets and protocols that capture ontology semantics, realistic incompleteness patterns, or the streaming updates characteristic of operational RDF knowledge graphs.
Initial References
Hogan, A., Blomqvist, E., Cochez, M., D’Amato, C., Melo, G. De, Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., Ngomo, A. C. N., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., & Zimmermann, A. (2022). Knowledge graphs. ACM Computing Surveys, 54(4), 1–37. doi.org/10.1145/3447772
Rossi, A., Barbosa, D., Firmani, D., Matinata, A., & Merialdo, P. (2021). Knowledge graph embedding for link prediction: A comparative analysis. ACM Transactions on Knowledge Discovery from Data, 15(2). doi.org/10.1145/3424672
Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P. S. (2021). A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Transactions on Neural Networks and Learning Systems, 1, 1–21. https://doi.org/10.1109/TNNLS.2021.3070843
Keywords
Knowledge Graph, Vector Representation, Embeddings, Knowledge Representation
Prior Knowledge
Python Language Skills
Graph Analytics
Concepts of Model Training
5. Identifying Heat Days in Austria: A Comparison of Percentile and Threshold Approaches and Their Impact on Mortality
Supervisors: Hannah Schuster, Elmar Kiesling
Keywords: Heat, Mortality, Climate Change
Background and Motivation
Periods of extreme heat are becoming increasingly relevant in Austria and are associated with negative impacts on human health. However, identifying heat days is not straightforward, particularly in a country with diverse topography such as Austria, where alpine regions and lowland areas differ substantially in their climatic conditions.
Different methodological approaches are commonly used to define heat days. Some studies rely on fixed temperature thresholds (e.g., days exceeding 30°C), while others use percentile-based methods that define heat relative to local climate conditions. The choice of definition may influence how many heat days are identified and how strongly heat exposure appears to be associated with mortality.
This thesis aims to compare these two approaches and examine how methodological choices affect the analysis of heat-related mortality in Austria.
Research Question
How do percentile-based and fixed-threshold definitions of heat days differ in Austria, and how does the choice of definition influence the estimated relationship between heat exposure and weekly mortality rates?
Objectives
Method Implementation & Data Preparation
Preprocess meteorological data for the selected study region.
Implement a fixed temperature threshold method to identify heat days.
Implement a percentile-based method to identify heat days.
Compare the number and temporal distribution of heat days identified by each method.
Heat–Mortality Analysis
Preprocess weekly mortality data.
Analyze the relationship between identified heat days and weekly mortality rates.
Compare how the two definitions influence statistical estimates of mortality.
Explore whether combining both methods provides additional explanatory insights.
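The two heat-day definitions to be compared can be sketched as follows. The 30 °C threshold comes from the example above; the 90th percentile and the nearest-rank percentile computation are assumptions for illustration.

```python
# Sketch of the two heat-day definitions on toy daily maximum temperatures (°C).

def fixed_threshold_heat_days(tmax, threshold=30.0):
    """Indices of days whose maximum temperature exceeds a fixed threshold."""
    return [i for i, t in enumerate(tmax) if t > threshold]

def percentile_heat_days(tmax, q=0.9):
    """Indices of days above the q-th empirical (nearest-rank) percentile."""
    cutoff = sorted(tmax)[int(q * (len(tmax) - 1))]
    return [i for i, t in enumerate(tmax) if t > cutoff]

# An alpine series never crosses 30 °C, yet still has relatively hot days.
alpine = [18.0, 21.0, 19.0, 25.0, 22.0, 20.0, 24.0, 26.0, 23.0, 19.0]
print(fixed_threshold_heat_days(alpine))   # → []
print(percentile_heat_days(alpine))        # → [7]
```

This toy case already shows the core methodological tension: the fixed threshold finds no heat days in alpine regions, while the percentile method flags locally extreme days.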
References:
Moshammer, H., Jury, M., Hutter, H.-P., & Wallner, P. (2026). Determinants of Spatial Variation in Vulnerability to Extreme Temperatures in Austria from 1970 to 2020. Climate, 14(1), 16. https://doi.org/10.3390/cli14010016
Ebi, K. L., Capon, A., Berry, P., Broderick, C., de Dear, R., Havenith, G., et al. (2021). Hot weather and heat extremes: Health risks. The Lancet, 398(10301), 698–708. https://doi.org/10.1016/S0140-6736(21)01208-3
Schuster, H., Polleres, A., Anjomshoaa, A. et al. Heat, health, and habitats: analyzing the intersecting risks of climate and demographic shifts in Austrian districts. Sci Rep 15, 22812 (2025). https://doi.org/10.1038/s41598-025-05676-9
Robinson, P. J., 2001: On the Definition of a Heat Wave. J. Appl. Meteor. Climatol., 40, 762–775, https://doi.org/10.1175/1520-0450(2001)040<0762:OTDOAH>2.0.CO;2.
Kent, S. T., McClure, L. A., Zaitchik, B. F., Smith, T. T., & Gohlke, J. M. (2014). Heat waves and health outcomes in Alabama (USA): the importance of heat wave definition. Environmental health perspectives, 122(2), 151–158. https://pmc.ncbi.nlm.nih.gov/articles/PMC3914868/
6. Empirical Evidence on Policy Recommendations with an exemplary Use-Case of the Swiss city Geneva
Supervisors: Jennifer-Marieclaire Sturlese, Marta Sabou
Abstract:
European cities face similar issues when it comes to globalization: space is limited and jobs are centralized in city hubs. One outcome is strained city transportation and congestion, with peak-load overcrowding and network inefficiencies. E-government initiatives provide open data on several aspects of city development; one of these covers public transport in the city of Geneva (tpg.ch). With this, data-based analyses can be conducted and the resulting empirical evidence can be used to propose policy recommendations.
In this bachelor thesis, you will adopt an analytical perspective similar to a policy consultant: By conducting exploratory data analysis (EDA) and applying machine learning techniques to the provided dataset, your thesis aims to generate evidence-based findings that may serve as a basis for data-driven policy recommendations to the city.
Your thesis is structured into two parts: a literature review on open data initiatives of e-government in Europe (= theoretical part); and an explanation of the research design using appropriate tools (EDA with Python), together with the application of this methodology involving data mining, preprocessing, cluster normalization, visualization, and result analysis, including two policy recommendations (= empirical part).
In order to receive this topic, you are required to demonstrate knowledge in exploratory data analysis with Python and ML packages (please state your experience in the motivation letter) and to have completed K-5 of your specialization (Knowledge Management and Data Science are both welcome; the latter is preferred).
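As a small illustration of the preprocessing expected in the empirical part, the snippet below min-max normalizes stop-level boarding counts before clustering. The stop names and numbers are invented stand-ins for the tpg.ch open data.

```python
# Illustrative preprocessing step: min-max normalization before clustering.

def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Invented daily boarding counts per stop.
boardings = {"Bel-Air": 1200, "Cornavin": 3000, "Plainpalais": 1800}
norm = dict(zip(boardings, min_max_normalize(list(boardings.values()))))
print(norm["Cornavin"], norm["Bel-Air"])  # → 1.0 0.0
```

Normalization like this keeps high-volume hubs from dominating distance-based clustering of stops or lines.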
Keywords:
open data, machine learning, city public transport
Initial References:
tpg.ch Open Data on Public Transport of the City of Geneva.
Keng Siau and Yuan Long. Factors impacting e-government development. Journal of Computer Information Systems, 50(1): 98–107, 2009.
Rachel Silcock. What is e-government? Parliamentary Affairs, 54(1):88–101, 2001.
Christian Von Haldenwang. Electronic government (e-government) and development. The European Journal of Development Research, 16:417–432, 2004.
7. Machine Learning Applications on Police Research Data
Supervisors: Jennifer-Marieclaire Sturlese, Marta Sabou
Abstract:
Most police data is confidential due to the sensitive nature of the information it contains, such as personal identities, criminal investigations, and national security matters. As a result, researchers face significant challenges in accessing police data for analysis, making it difficult to fully explore machine learning applications in this field. This creates a gap between the potential benefits of data-driven approaches and the restrictions imposed by privacy (Koops, Hoepman & Leenes, 2013).
This bachelor thesis addresses this problem by exploring the diverse data sources used in police applications of machine learning, offering insights into the emerging trends driving this field, which becomes ever more relevant amid the current upsurge of cybersecurity matters. To achieve this, a bibliometric analysis up to the year 2025 is conducted to explain patterns in data sources, investigating the different means to access viable datasets related to cybersecurity threats and corresponding measures. The data for this study stems from open-access databases; no confidential police sources are used.
The paper is structured into two parts: a review of prior research on data science applications in the policing domain (= theoretical part); an explanation of the research design using appropriate tools; the application of this methodology involving data mining, preprocessing, cluster normalization, visualization, and result analysis to explore machine learning within the policing context (= empirical part).
In order to receive this topic, you are required to demonstrate knowledge in exploratory data analysis with Python and ML packages (please state your experience in the motivation letter) and to have completed K-5 of your specialization (Knowledge Management and Data Science are both welcome; the latter is preferred).
Keywords:
police data, machine learning, bibliometric analysis
Initial References:
Palumbo, R., Fakhar Manesh, M., & Petrolo, D. (2024). What makes work smart in the public sector? Insights from a bibliometric analysis and interpretive literature review. Public Management Review, 26(6), 1449-1474.
Wu, J., Liu, T., Mu, K. et al. Identification and causal analysis of predatory open access journals based on interpretable machine learning. Scientometrics 129, 2131–2158 (2024).
Pastor-Galindo, J., Nespoli, P., Mármol, F. G., & Pérez, G. M. (2020). The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends. IEEE access, 8, 10282-10304.
Koops, B. J., Hoepman, J. H., & Leenes, R. (2013). Open-source intelligence and privacy by design. Computer Law & Security Review, 29(6), 676-688.
8. Agentic AI in Cybersecurity Analytics – Opportunities and Risks
Supervisor: Elmar Kiesling
Background
Large Language Models are seeing widespread adoption in cybersecurity applications for both offensive [3] and defensive [5] purposes [8]. Whereas their potential for malicious use has raised widespread concern, their vast potential to improve security is also widely recognized. This has sparked research interest into how AI can automate, support and/or scale tasks such as log analysis [1], vulnerability assessment and mitigation [4, 7], threat detection [2], or attack graph construction [9].
More recently, cybersecurity has also emerged as a particularly promising application domain for agentic AI, where teams of autonomous AI agents are envisioned to solve complex tasks in distributed workflows. Although research is at a very early stage, initial results suggest that multi-agent workflows where agents use tools, perform multi-step reasoning, and ultimately support rapid decision making under pressure have strong potential to improve cybersecurity [6].
Research Problem
In this Bachelor thesis, you will critically assess the risks and potential benefits of automating and/or supporting cybersecurity risk analysis by means of agentic workflows. To this end, you will define an assessment framework (set of criteria, risk and benefit categories etc.), collect cybersecurity analysis workflows (e.g., vulnerability assessment, risk assessment, penetration testing, intrusion detection, threat hunting, impact assessment, root cause analysis, incident response etc.) and assess the potential benefits and risks in each of these scenarios.
Your thesis may address research questions such as:
Which cybersecurity analysis tasks can be automated and/or supported by means of agentic workflows?
What specific architectures, coordination mechanisms etc. have been proposed in this domain?
Which patterns of agentic workflows are emerging in the cybersecurity domain?
What risks does the use of agentic AI in cybersecurity entail?
What characteristics of agentic AI (autonomy, adaptability, complex problem solving, specialization, distribution etc.) are relevant in different cybersecurity applications? What are their implications?
This topic will benefit from strong interest and/or experience in the cybersecurity domain. It does not require in-depth technical expertise in implementing agentic AI approaches, but is aimed at investigating their potential and implications for cybersecurity analysis applications on a conceptual level.
Initial References
Matteo Boffa, Idilio Drago, Marco Mellia, Luca Vassio, Danilo Giordano, Rodolfo Valentim, and Zied Ben Houidi. LogPrécis: Unleashing language models for automated malicious log analysis. Précis: a concise summary of essential points, statements, or facts. Computers & Security, 141:103805, 2024.
Yiren Chen, Mengjiao Cui, Ding Wang, Yiyang Cao, Peian Yang, Bo Jiang, Zhigang Lu, and Baoxu Liu. A survey of large language models for cyber threat detection. Computers & Security, 145:104016, 2024.
Eider Iturbe, Oscar Llorente-Vazquez, Angel Rego, Erkuden Rios, and Nerea Toledo. Unleashing offensive artificial intelligence: Automated attack technique code generation. Computers & Security, 147:104077, 2024.
Abdechakour Mechri, Mohamed Amine Ferrag, and Merouane Debbah. SecureQwen: Leveraging LLMs for vulnerability detection in Python codebases. Computers & Security, 148:104151, 2025.
Shuang Tian, Tao Zhang, Jiqiang Liu, Jiacheng Wang, Xuangou Wu, Xiaoqiang Zhu, Ruichen Zhang, Weiting Zhang, Zhenhui Yuan, Shiwen Mao, et al. Exploring the role of large language models in cybersecurity: A systematic survey. arXiv preprint arXiv:2504.15622, 2025.
Vaishali Vinay. The evolution of agentic AI in cybersecurity: From single LLM reasoners to multi-agent systems and autonomous pipelines. arXiv preprint arXiv:2512.06659, 2025.
Xiaoqing Wang, Yuanjing Tian, Keman Huang, and Bin Liang. Practically implementing an LLM-supported collaborative vulnerability remediation process: A team-based approach. Computers & Security, 148:104113, 2025.
Jie Zhang, Haoyu Bu, Hui Wen, Yongji Liu, Haiqiang Fei, Rongrong Xi, Lun Li, Yun Yang, Hongsong Zhu, and Dan Meng. When LLMs meet cybersecurity: A systematic literature review. Cybersecurity, 8(1):55, 2025.
Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, and Ee-Chien Chang. AttacKG+: Boosting attack graph construction with large language models. Computers & Security, 150:104220, 2025.
9. Implementing an Agentic Workflow for Cybersecurity Analytics
Supervisor: Elmar Kiesling
Background
Large Language Models are seeing widespread adoption in cybersecurity applications for both offensive [3] and defensive [5] purposes [8]. Whereas their potential for malicious use has raised widespread concern, their vast potential to improve security is also widely recognized. This has sparked research interest into how AI can automate, support and/or scale tasks such as log analysis [1], vulnerability assessment and mitigation [4, 7], threat detection [2], or attack graph construction [9].
More recently, cybersecurity has also emerged as a particularly promising application domain for agentic AI, where teams of autonomous AI agents are envisioned to solve complex tasks in distributed workflows. Although research is at a very early stage, initial results suggest that multi-agent workflows where agents use tools, perform multi-step reasoning, and ultimately support rapid decision making under pressure have strong potential to improve cybersecurity [6].
Research Problem
In this Bachelor thesis, you will conduct an initial survey of promising opportunities for agentic AI to support analytic tasks in cybersecurity (e.g., vulnerability assessment, risk assessment, penetration testing, intrusion detection, threat hunting, impact assessment, root cause analysis, incident response, etc.). Based on this initial survey, the thesis will focus on a selected task (or a small subset of tasks) and a scenario for experimentation. It will document the implementation of an agentic workflow and evaluate and compare how agentic AI can support the selected analytic task.
Your thesis may address research questions such as:
What are key criteria when selecting an agentic framework for cybersecurity analysis?
Along which dimensions can available frameworks for cybersecurity analysis be compared?
What are key challenges in the implementation of agentic workflows for cybersecurity analysis?
How does the implemented agentic workflow compare to a manual process? What are specific benefits, risks, and challenges?
This topic is aimed at experimenting with agentic frameworks and implementing agentic workflow(s) in a selected cybersecurity scenario, which requires an interest in developing the necessary skills to work with such frameworks. Interest and experience in the cybersecurity domain is beneficial.
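To give a flavour of what implementing an agentic workflow means at the smallest possible scale, the toy sketch below replaces the LLM planner with a rule-based stub that dispatches tools over log lines and collects findings. Every name here is an invented placeholder; a real implementation would use an agentic framework and actual model calls.

```python
# Deliberately minimal, non-LLM stand-in for an agentic analysis workflow:
# a "planner" picks tools per observation, the loop aggregates findings.
import re

def tool_extract_ips(log_line):
    """Tool: pull IPv4-like strings out of a log line."""
    return re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log_line)

def tool_flag_failed_login(log_line):
    """Tool: flag SSH authentication failures."""
    return ["failed-login"] if "Failed password" in log_line else []

TOOLS = {"ips": tool_extract_ips, "auth": tool_flag_failed_login}

def planner(log_line):
    """Stub for the agent's reasoning step: choose tools for this observation."""
    return ["ips", "auth"] if "sshd" in log_line else ["ips"]

def run_workflow(log_lines):
    findings = []
    for line in log_lines:
        for tool_name in planner(line):
            findings.extend(TOOLS[tool_name](line))
    return findings

logs = ["sshd[231]: Failed password for root from 203.0.113.7"]
print(run_workflow(logs))  # → ['203.0.113.7', 'failed-login']
```

In an actual agentic framework, the planner is an LLM choosing among tool descriptions, and the loop includes memory, multi-step reasoning, and agent-to-agent handoffs; the thesis would compare such a workflow against this kind of manual/rule-based baseline.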
Initial References
Matteo Boffa, Idilio Drago, Marco Mellia, Luca Vassio, Danilo Giordano, Rodolfo Valentim, and Zied Ben Houidi. LogPrécis: Unleashing language models for automated malicious log analysis. Précis: a concise summary of essential points, statements, or facts. Computers & Security, 141:103805, 2024.
Yiren Chen, Mengjiao Cui, Ding Wang, Yiyang Cao, Peian Yang, Bo Jiang, Zhigang Lu, and Baoxu Liu. A survey of large language models for cyber threat detection. Computers & Security, 145:104016, 2024.
Eider Iturbe, Oscar Llorente-Vazquez, Angel Rego, Erkuden Rios, and Nerea Toledo. Unleashing offensive artificial intelligence: Automated attack technique code generation. Computers & Security, 147:104077, 2024.
Abdechakour Mechri, Mohamed Amine Ferrag, and Merouane Debbah. SecureQwen: Leveraging LLMs for vulnerability detection in Python codebases. Computers & Security, 148:104151, 2025.
Shuang Tian, Tao Zhang, Jiqiang Liu, Jiacheng Wang, Xuangou Wu, Xiaoqiang Zhu, Ruichen Zhang, Weiting Zhang, Zhenhui Yuan, Shiwen Mao, et al. Exploring the role of large language models in cybersecurity: A systematic survey. arXiv preprint arXiv:2504.15622, 2025.
Vaishali Vinay. The evolution of agentic AI in cybersecurity: From single LLM reasoners to multi-agent systems and autonomous pipelines. arXiv preprint arXiv:2512.06659, 2025.
Xiaoqing Wang, Yuanjing Tian, Keman Huang, and Bin Liang. Practically implementing an LLM-supported collaborative vulnerability remediation process: A team-based approach. Computers & Security, 148:104113, 2025.
Jie Zhang, Haoyu Bu, Hui Wen, Yongji Liu, Haiqiang Fei, Rongrong Xi, Lun Li, Yun Yang, Hongsong Zhu, and Dan Meng. When LLMs meet cybersecurity: A systematic literature review. Cybersecurity, 8(1):55, 2025.
Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, and Ee-Chien Chang. AttacKG+: Boosting attack graph construction with large language models. Computers & Security, 150:104220, 2025.
10. Text Mining and Machine Learning
Supervisor: Johann Mitlöhner
Text mining aims to turn written natural language into structured data that allow for various types of analysis which are hard or impossible on the text itself; machine learning aims to automate the process using a variety of adaptive methods, such as artificial neural nets which learn from training data. Typical goals of text mining are classification, sentiment detection, and other types of information extraction, e.g. named entity recognition (identifying people, places, organizations) and relation extraction (e.g. locations of organizations).
Connectionist methods, and deep learning in particular, have attracted much attention and success recently; these methods tend to work well on large training datasets, which in turn require ample computing power. Our institute can provide access to high-performance GPU units for student use in thesis projects. It is recommended to use a framework such as PyTorch or TensorFlow/Keras for developing your deep learning application; the changes required to go from CPU to GPU computing will be minimal. This means that you can start developing on your PC with a small subset of the training data; when you later transition to the GPU server, the added performance will make larger datasets feasible.
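To illustrate the learn-from-training-data idea at the smallest possible scale, here is a toy perceptron over bag-of-words features. An actual thesis would use PyTorch or TensorFlow/Keras as recommended above; the training data here is invented.

```python
# Toy text classifier: a perceptron over bag-of-words features.
from collections import defaultdict

def featurize(text):
    """Bag-of-words features: the set of lowercased tokens."""
    return set(text.lower().split())

def train_perceptron(examples, epochs=10):
    """examples: (text, label) pairs with label +1/-1; returns word weights."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            score = sum(w[tok] for tok in featurize(text))
            if score * label <= 0:                # misclassified: update weights
                for tok in featurize(text):
                    w[tok] += label
    return w

train = [("great product love it", 1), ("terrible waste of money", -1)]
w = train_perceptron(train)
pred = sum(w[t] for t in featurize("love this great phone"))
print(pred > 0)  # → True
```

Deep learning replaces these sparse word weights with learned dense representations and nonlinear layers, but the train/predict loop has the same shape.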
On text mining e.g.: Minqing Hu, Bing Liu: Mining and summarizing customer reviews. KDD '04: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168-177, ACM, 2004
For a more recent work and overview e.g.: Percha B. Modern Clinical Text Mining: A Guide and Review. Annu Rev Biomed Data Sci. 2021 Jul 20;4:165-187. doi: 10.1146/annurev-biodatasci-030421-030931. Epub 2021 May 26. PMID: 34465177.
Datasets can be found e.g. at huggingface and kaggle.
keywords: artificial neural networks, machine learning, text mining
11. Visualizing Data in Virtual and Augmented Reality
Supervisor: Johann Mitlöhner
How can AR and VR be used to improve the exploration of data? Developing new methods for exploring and analyzing data in virtual and augmented reality presents many opportunities and challenges, both in terms of software development and design inspiration. There are various hardware options, from Google Cardboard to more sophisticated and expensive devices such as the Quest, among many others. Taking part in this challenge demands programming skills as well as creativity. The student will develop a basic VR or AR application for exploring a specific type of (open) data. The use of a platform-independent kit such as A-Frame is essential, as the application will be compared in a small user study to a non-VR version in order to identify advantages and disadvantages of the implemented visualization method. Details will be discussed with the supervisor.
Some References:
Butcher, Peter WS, and Panagiotis D. Ritsos. "Building Immersive Data Visualizations for the Web." Proceedings of International Conference on Cyberworlds (CW'17), Chester, UK. 2017.
Teo, Theophilus, et al. "Data fragment: Virtual reality for viewing and querying large image sets." Virtual Reality (VR), 2017 IEEE. IEEE, 2017.
Millais, Patrick, Simon L. Jones, and Ryan Kelly. "Exploring Data in Virtual Reality: Comparisons with 2D Data Visualizations." Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
Yu Shu, Yen-Zhang Huang, Shu-Hsuan Chang, and Mu-Yen Chen (2019). Do virtual reality head-mounted displays make a difference? a comparison of presence and self-efficacy between head-mounted displays and desktop computer-facilitated virtual environments. Virtual Reality, 23(4):437-446.
Korkut, E. H., and Surer, E. (2023). Visualization in virtual reality: a systematic review. Virtual Reality, 27(2), 1447-1480.
keywords: virtual reality, augmented reality, data visualization, data exploration
12. Testing Algorithms for Digital Democracy in Simulations
Supervisors: Hanna Kern, Jan Maly
Keywords: Computational Social Choice, Fairness, Democracy, Voting, Simulations in Python
Context:
A growing number of novel, digital participation processes allow citizens to express their opinions and directly influence policy decisions on a wide range of topics, from the design of a new park in Vienna [1] to the shape of the new constitution in Chile [2] and Iceland [3]. One major challenge in such digital democracy processes is the fair representation of minority opinions. Computer scientists have, in recent years, developed novel tools and algorithms that can be used to make the different forms of digital participation fairer and more representative.
In this thesis, we will focus on a setting that has not received much attention so far, namely elections where voters can express which candidates or opinions they approve of and which they disapprove of - a model that captures, in particular, many large-scale deliberation processes hosted on platforms like Pol.is. Kraiczy et al. (2025) [4] recently explored two settings with disapprovals and suggested fairness notions and voting rules. In this thesis, we will reflect more on the settings in which voters express their disapprovals and further investigate, using simulations on real-world and synthetic data, our intuition regarding fairness in practice.
Problem:
Research into voting with approvals and disapprovals is very new and so far purely theoretical; there is therefore a lack of practical studies and simulations.
Goal/expected results of the thesis:
This thesis will experimentally investigate how fair the outcomes of the different voting rules in this setting are and how much they are affected by small changes in the voting instance.
Research Questions:
How fair are the outcomes produced by the voting rules proposed in [4]?
How much does our intuition regarding fairness change when there are small changes to a voting instance?
How does the inclusion of disapprovals change a voter’s satisfaction with the outcome, and how does this affect our intuition regarding fairness?
Methodology:
Get familiar with the setting and the intuition behind the proposed voting rules in [4]
Reflect on where you agree and disagree with the modelling done in [4]
Develop Python scripts to run the simulations on different real-world and synthetic data-sets.
Evaluate the results of the simulations.
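One possible evaluation step in these simulations could look like the following sketch, where a voter gains one point per approved winner and loses one per disapproved winner. This satisfaction measure is an assumption for illustration, not the fairness notion from [4].

```python
# Illustrative satisfaction measure for approval/disapproval voting.

def satisfaction(approved, disapproved, winners):
    """+1 per approved winner, -1 per disapproved winner."""
    return len(approved & winners) - len(disapproved & winners)

def mean_satisfaction(voters, winners):
    """voters: list of (approved_set, disapproved_set) pairs."""
    scores = [satisfaction(a, d, winners) for a, d in voters]
    return sum(scores) / len(scores)

# Invented instance: three voters, outcome {a, b}.
voters = [
    ({"a", "b"}, {"c"}),
    ({"a"}, {"b"}),
    ({"c"}, {"a"}),
]
print(mean_satisfaction(voters, winners={"a", "b"}))
```

A simulation script would compute such measures for the outcomes of each voting rule, over many real-world and synthetic instances, and then compare distributions across rules and across small perturbations of the instance.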
Required Skills:
Good understanding of data analysis, ideally with Python.
A willingness to learn about mathematical measures of fairness.
References:
[1] https://mitgestalten.wien.gv.at/de-DE/projects/miep-gies-park
[2] https://europeandemocracyhub.epd.eu/wp-content/uploads/2023/12/Case-Study-Chile-FINAL-v2.pdf
[3] Hélène Landemore, When public participation matters: The 2010–2013 Icelandic constitutional process, International Journal of Constitutional Law, Volume 18, Issue 1, January 2020, Pages 179–205, https://doi.org/10.1093/icon/moaa004
[4] Sections 1,2,3 of:
Kraiczy, Sonja, Georgios Papasotiropoulos, and Piotr Skowron. "Proportionality in Thumbs Up and Down Voting." arXiv preprint arXiv:2503.01985 (2025).
Boehmer, Niclas, et al. "Guide to numerical experiments on elections in computational social choice." arXiv preprint arXiv:2402.11765 (2024).
13. Using LLMs to Unveil the Hidden Structure of Online Discussions
Supervisor: Jan Maly
Keywords: LLM, Online Discussions, NeuroSymbolic AI
Context:
Today, we have unlimited possibilities to talk to anyone about any topic through the magic of the internet. At the same time, as a society, we are more polarized and divided than ever before. Many solutions have been proposed to overcome this divide and to make our online discourse more consensus-driven and de-polarizing, from AI moderation to automated intervention and consensus generation. These approaches often require a deep understanding of the underlying structure of the discussion, modeled through frameworks like abstract argumentation. Recently, research has shown that LLMs generally outperform purpose-built software in modeling discussions in these frameworks. However, a systematic comparison of LLMs' performance across different frameworks is still lacking, making it hard to choose the best framework for modeling online discussions.
Problem:
LLMs can model online discussions in different semantic frameworks reasonably well, but more expressive and complex frameworks often lead to worse modelling performance. Without a systematic understanding of this trade-off, it is hard to choose the right framework for a given application.
Goal/expected results of the thesis:
This thesis will use different LLMs to model real-world online discussions in different semantic frameworks and compare the accuracy of the models across different LLMs and frameworks. As a result, we will gain an understanding of the best trade-offs between expressiveness and ease of modelling of these frameworks.
Research Questions:
How well can different LLMs model discussions in different frameworks?
Does increased complexity/expressiveness always lead to lower accuracy?
Are there significant differences in modelling performance between different LLMs?
Methodology:
Get familiar with the different modelling frameworks
Use human labeling to generate a baseline
Set up the experimental framework to measure modelling accuracy
Statistically evaluate the results of the experiments
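The accuracy-measurement step above could, for instance, compare LLM-produced relation labels (e.g., whether one comment attacks or supports another) against the human-labeled baseline, using raw accuracy and a chance-corrected agreement score. The labels below are invented toy data, not actual experimental results:

```python
def accuracy(gold, predicted):
    """Fraction of items where the LLM label matches the human label."""
    assert len(gold) == len(predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def cohens_kappa(gold, predicted):
    """Chance-corrected agreement between two label sequences."""
    labels = set(gold) | set(predicted)
    n = len(gold)
    p_o = accuracy(gold, predicted)
    # Expected agreement if both annotators labeled at random
    # with their observed label frequencies.
    p_e = sum((gold.count(l) / n) * (predicted.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical labels: does comment j attack (A) or support (S) comment i?
human = ["A", "S", "A", "A", "S", "S"]
llm   = ["A", "S", "S", "A", "S", "A"]
```

Running the same comparison per framework and per LLM yields the accuracy matrix that the statistical evaluation would then analyse.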
Required Skills:
Good understanding of data analysis, ideally with Python
Ability to statistically evaluate experimental results
A willingness to learn about formal models of argumentation
References:
Can Large Language Models perform Relation-based Argument Mining?, Deniz Gorur and Antonio Rago and Francesca Toni, ACL 2025, https://aclanthology.org/2025.coling-main.569.pdf
Exploring the Potential of Large Language Models in Computational Argumentation, Guizhen Chen, Liying Cheng, Anh Tuan Luu, and Lidong Bing, ACL 2024, https://aclanthology.org/2024.acl-long.126/
Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey, Katerina Korre, Dimitris Tsirmpas, Nikos Gkoumas, Emma Cabalé, Danai Myrtzani, Theodoros Evgeniou, Ion Androutsopoulos, John Pavlopoulos, ACL 2025
14. LLM-Based Extraction of AI Risk Patterns from Incident Reports
Supervisors: Muhammad Ikhsan, Elmar Kiesling
Background
As Artificial Intelligence (AI) systems are increasingly deployed in safety-critical and business-critical contexts, systematically assessing their risks becomes essential. Failures in AI systems may result not only in technical malfunction, but also in legal, ethical, societal, and financial consequences. Regulatory initiatives such as the EU AI Act, together with standards such as ISO/IEC 23894 and the NIST AI Risk Management Framework, also emphasize the importance of systematic AI risk management.
While existing frameworks provide high-level guidance for identifying and documenting risks, they often operate at a governance and organizational level. In practice, organizations face the challenge of translating abstract risk categories into structured, system-level representations that support analysis, documentation, and mitigation.
At the same time, large collections of publicly available AI incident reports document real-world failures across domains. These reports contain valuable empirical insights into recurring risk situations, but they are typically written in unstructured narrative form.
Recent advances in Large Language Models (LLMs) raise the question whether such models can support the transformation of unstructured textual incident descriptions into structured representations of AI-related risks.
Research Problem
In this Bachelor thesis, you will investigate whether and how Large Language Models (LLMs) can support the structured extraction and representation of AI-related risks from unstructured incident reports.
The thesis explores the potential of LLMs to identify recurring risk elements such as underlying vulnerabilities, technical failures, broader impacts, and mitigation measures, and to transform narrative descriptions into more structured formats suitable for analysis.
Specifically, your thesis will address research questions such as:
To what extent can LLMs extract structured risk information from AI incident reports?
How does prompt design influence the quality and consistency of the extracted information?
Can iterative or reflective prompting strategies improve completeness and/or reliability?
What are the limitations, uncertainties, and potential biases of LLM-supported extraction?
How suitable are automatically extracted representations for supporting systematic AI risk analysis?
The goal is to evaluate the feasibility and limitations of LLM-supported risk pattern extraction in the context of AI governance and risk management.
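One way the extraction and evaluation pipeline could be set up is sketched below, under the assumption of a hypothetical four-field schema (the actual schema is part of the thesis work) and with the LLM call stubbed out by a fixed response string:

```python
import json

# Hypothetical target schema for one extracted risk pattern;
# the real schema would be derived from frameworks such as AIRO.
REQUIRED_FIELDS = {"vulnerability", "failure", "impact", "mitigation"}

def parse_extraction(llm_output: str) -> dict:
    """Parse and validate the JSON an LLM returns for an incident report.
    Missing fields are recorded explicitly, so completeness can be
    compared across prompt designs and prompting strategies."""
    record = json.loads(llm_output)
    missing = REQUIRED_FIELDS - record.keys()
    record["_complete"] = not missing
    record["_missing"] = sorted(missing)
    return record

# Stubbed model response; a real pipeline would call an LLM API here.
response = ('{"vulnerability": "unvalidated training data", '
            '"failure": "misclassification", '
            '"impact": "wrongful denial of service"}')
record = parse_extraction(response)
```

Aggregating the `_complete` and `_missing` fields over many incident reports gives a simple quantitative handle on the completeness and reliability questions above.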
Requirements
Familiarity with Large Language Models
Some understanding of Natural Language Processing (NLP)
Interest in AI risk, AI governance, or socio-technical system analysis
Basic programming skills to interact with LLM APIs or NLP toolkits are sufficient
Initial References
ISO/IEC 23894:2023. Artificial Intelligence — Guidance on Risk Management. International Organization for Standardization, 2023.
National Institute of Standards and Technology (NIST). AI Risk Management Framework (AI RMF 1.0). U.S. Department of Commerce, 2023.
Golpayegani, D., Pandit, H. J., Lewis, D. (2022). “AIRO: An Ontology for Representing AI Risks Based on the Proposed EU AI Act and ISO Risk Management Standards.” In Towards a Knowledge-Aware AI. IOS Press, pp. 51–65.
Slattery, P., Saeri, A. K., Grundy, E. A. C., et al. (2024). “The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks from Artificial Intelligence.” arXiv preprint arXiv:2408.12622.
Ikhsan, M., Kiesling, E., Mahmoud, S., Prock, A., Revenko, A., Ekaputra, F. J. (2025). “Pattern-based AI Risk Assessment: A Taxonomy Expansion Use Case.” Workshop Proceedings.
ISO/IEC TR 24028:2020. Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence.
ISO/IEC TR 24027:2021. Bias in AI Systems and AI Aided Decision Making.
Wei, J., Wang, X., Schuurmans, D., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems (NeurIPS).
Madaan, A., et al. (2023). “Self-Refine: Iterative Refinement with Self-Feedback.” Advances in Neural Information Processing Systems (NeurIPS).
Paeth, K., Atherton, D., Pittaras, N., Frase, H., McGregor, S. (2025). “Lessons for editors of AI incidents from the AI Incident Database.” Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 28946–28953.
15. Abstraction-Level Comparison of AI Risk Frameworks: A Mechanism-Oriented Analysis
Supervisors: Muhammad Ikhsan, Elmar Kiesling
Background
As organizations increasingly deploy AI systems, they must navigate a growing "jungle" of AI risk frameworks, standards, and governance guidelines. Examples include the MIT AI Risk Repository, the OECD AI system classification framework, ISO/IEC 23894, the NIST AI Risk Management Framework, and security-oriented initiatives such as the OWASP AI Security Top 10.
While these frameworks aim to structure and manage AI-related risks, they differ significantly in terminology, abstraction level, and conceptual focus. Some emphasize societal harms (e.g., discrimination or misinformation), others focus on technical vulnerabilities or security threats, while governance-oriented standards address compliance and oversight requirements.
For organizations implementing AI governance processes, these differences create practical challenges. Risk concepts across frameworks are defined at different conceptual levels: some describe underlying system vulnerabilities, others observable technical failures, societal impacts, or institutional consequences. When these distinctions are not made explicit, structured comparison and integration of multiple frameworks becomes difficult.
Research Problem
This thesis investigates how selected AI risk frameworks conceptualize and structure AI-related risk, with a particular focus on abstraction level.
The central objective is to examine whether distinguishing between different types of risk descriptions, such as:
• Structural system characteristics
• Mechanism-level vulnerabilities
• Technical consequences
• Social and ethical impacts
• Governance or regulatory implications
can support clearer comparison and interpretation across heterogeneous AI risk frameworks.
The thesis will address research questions such as:
• How do selected AI risk frameworks define and structure “risk”?
• At which abstraction levels do they define risk concepts?
• Do frameworks clearly distinguish between underlying vulnerabilities and downstream impacts?
• Where do conceptual overlaps or abstraction-level conflations occur?
• Does analyzing risks from a mechanism-oriented perspective clarify similarities and differences between frameworks?
The goal is not to develop a new taxonomy or framework. Instead, the thesis provides a structured abstraction-level analysis and evaluates whether a mechanism-oriented analytical perspective supports clearer cross-framework understanding in the context of AI governance.
Required / Recommended Skills
• Strong analytical and conceptual reasoning skills
• Interest in AI governance, IT risk management, or digital compliance
• Basic familiarity with AI systems and digital risk concepts
Initial References
• OECD (2022), “OECD Framework for the Classification of AI systems”, OECD Digital Economy Papers, No. 323, OECD Publishing, Paris
• Golpayegani, D., Pandit, H. J., & Lewis, D. (2022). AIRO: An ontology for representing AI risks based on the proposed EU AI Act and ISO risk management standards. IOS Press.
• Golpayegani, D., Pandit, H. J., & Lewis, D. (2023). To Be High-Risk, or Not To Be—Semantic Specifications and Implications of the AI Act’s High-Risk AI Applications and Harmonised Standards. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency.
• Slattery, P., Saeri, A. K., Grundy, E. A. C., et al. (2024). The AI Risk Repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligence. arXiv preprint.
• OWASP. (2023). OWASP AI Security and Risk Initiatives. owasp.org
• ISO/IEC 23894:2023. Artificial Intelligence — Guidance on Risk Management. International Organization for Standardization.
• National Institute of Standards and Technology (2023). AI Risk Management Framework (AI RMF 1.0).
• Ikhsan, M., Kiesling, E., Mahmoud, S., Prock, A., Revenko, A., & Ekaputra, F. J. (2025). Pattern-based AI Risk Assessment: A Taxonomy Expansion Use Case. Workshop Proceedings.
16. Case Study of Neurosymbolic Approaches for Knowledge Engineering (Suggested Use Case: Football Common Data Format)
Supervisors: Alexander Prock, Fajar J. Ekaputra
Keywords: knowledge engineering, knowledge graphs, neurosymbolic artificial intelligence, case study
Context: Knowledge Graphs explicitly represent knowledge in a machine-readable format to enable the integration, management and utilization of knowledge at scale [1]. However, their manual construction is tedious and expensive.
Neurosymbolic AI [2] combines the predictive powers of artificial neural networks (e.g. classical machine learning, deep neural networks or large language models) with interpretable knowledge representation and reasoning (e.g. semantic web resources: ontologies and knowledge graphs, or formal logic).
Neurosymbolic approaches for knowledge engineering can aid various knowledge engineering tasks, e.g. ontology construction, knowledge extraction or knowledge graph construction (cf. [3] for an overview of knowledge engineering tasks).
One example of a neurosymbolic approach for knowledge engineering is Text2AMR2FRED [4], which extracts knowledge from natural language text to construct a knowledge graph. First, a neural network model is used to parse textual input into an intermediate abstract representation, which is then mapped to knowledge graph triples using predefined rules. The resulting knowledge graph is then further refined and enriched, e.g. by aligning the extracted knowledge to established publicly available knowledge bases.
Problem: The use of AI applications in the sports domain has grown significantly in recent years. For example, in the football domain, AI systems are used for player recruitment, performance monitoring, and selection. We are currently developing the Football Common Data Format (FCDF) ontology [5] to provide a shared conceptualisation and formal basis for using semantic resources in AI applications in the football domain. One possible use case is the population of the FCDF ontology from various data sources, such as existing external knowledge bases, e.g. Wikidata, or natural language text, e.g. reports and newspaper articles.
The thesis topic can be shifted according to your interests; other use cases can be proposed, and other knowledge engineering tasks can be the focus.
Goal/expected results of the thesis:
Identification of suitable neurosymbolic approaches for a chosen knowledge engineering task (e.g. knowledge graph construction/population) in a chosen use case (e.g. football ontology)
Application of selected approaches for the chosen task and use case
Collection of practical insights and assessment of performance
Research Questions:
What are suitable neurosymbolic approaches for knowledge graph construction in the football domain? (to be adapted depending on chosen knowledge engineering task and use case)
How do these approaches perform when applied to the use case in practice?
What are the benefits and challenges of choosing these approaches?
Methodology:
Familiarize yourself with knowledge engineering tasks, neurosymbolic AI and the chosen use case
Literature study to identify suitable neurosymbolic approaches
Case study of selected approaches for chosen use case (including practical evaluation)
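As a rough illustration of the knowledge graph population task, tabular match records could be mapped to RDF-style (subject, predicate, object) triples as in the following sketch. The namespace and the class/property names are hypothetical placeholders, not the actual FCDF vocabulary [5], and a real implementation would use an RDF library such as rdflib:

```python
# Hypothetical namespace; the actual FCDF ontology defines its own IRIs.
FCDF = "https://example.org/fcdf#"

def populate(records):
    """Map tabular player records to triples of a (hypothetical)
    football ontology, deduplicating via a set."""
    triples = set()
    for r in records:
        player = FCDF + "player/" + r["id"]
        triples.add((player, "rdf:type", FCDF + "Player"))
        triples.add((player, FCDF + "name", r["name"]))
        triples.add((player, FCDF + "playsFor", FCDF + "team/" + r["team"]))
    return triples

records = [{"id": "123", "name": "Jane Doe", "team": "42"}]
kg = populate(records)
```

In a neurosymbolic pipeline such as Text2AMR2FRED [4], the `records` would instead come from a neural extraction step over text, with rule-based mapping and alignment applied afterwards.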
Prior Knowledge & Skills:
Interest in knowledge engineering and/or the technical aspects of knowledge management
Understanding of semantic web technologies (e.g., RDF/OWL) or the willingness to learn about them
Ideally, you have completed the SBWL Knowledge Management, have taken the technical courses (“technical KM”; K2 & K3) already, or have acquired similar skills elsewhere
Hands-on mentality, i.e. the willingness to try out novel approaches in practice
References
[1] Hogan, Aidan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard De Melo, Claudio Gutierrez, Sabrina Kirrane et al. "Knowledge graphs." ACM Computing Surveys (CSUR) 54, no. 4 (2021): 1-37. https://doi.org/10.1145/3447772
[2] Hitzler, Pascal, and Md Kamruzzaman Sarker, eds. "Neuro-symbolic artificial intelligence: The state of the art." (2022). https://doi.org/10.48550/arXiv.2105.05330
[3] Tamašauskaitė, G., & Groth, P. (2023). Defining a knowledge graph development process through a systematic review. ACM Transactions on Software Engineering and Methodology, 32(1), 1-40. https://doi.org/10.1145/3522586
[4] Gangemi, A., Graciotti, A., Meloni, A., Nuzzolese, A. G., Presutti, V., Reforgiato Recupero, D., & Russo, A. (2026). Text2AMR2FRED, converting text into RDF/OWL knowledge graphs via abstract meaning representation. Knowledge and Information Systems, 68(1), 47. https://doi.org/10.1007/s10115-025-02631-y
[5] Ekaputra, F. J., Käfer, G., & Kempe, M. (2025). An Ontology for the Common Data Format on Football Match Data. In Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025): SWC 2025 Companion Volume, November 2–6, 2025, Nara, Japan CEUR Workshop Proceedings. https://ceur-ws.org/Vol-4085/paper56.pdf
17. Automated evaluation of AI-generated explanations
Supervisors: Stefani Tsaneva, Marta Sabou
Keywords: ontology engineering, human-centric explanations, large language models
Context: Knowledge Engineering (KE) encompasses a variety of activities, including the acquisition of knowledge and its representation through semantic models such as ontologies. Traditionally, KE requires substantial manual effort to define, implement, and validate domain-specific requirements. Moreover, tool support for many KE tasks remains limited, increasing the likelihood of modeling errors, especially when ontology engineers lack advanced KE training or are working with complex logical constraints. Recently, to support ontology engineers, the potential of Large Language Models (LLMs) has been explored in the context of ontology verification, specifically for defect detection, classification, explanation, and correction. While initial studies demonstrate that LLMs can assist with these tasks, further experimentation is necessary to generalize and extend these findings.
Problem: Currently, there is a lack of tools and methods for the automated evaluation of AI-generated explanations within the context of ontology validation.
Goal/expected results of the thesis: This thesis will investigate how LLMs can be utilised to annotate AI-generated explanations according to value-based requirements.
Research Questions: To what extent can LLMs annotate AI-generated explanations according to value-based requirements?
How accurately do LLMs evaluate ontology defect explanations?
Do different LLMs vary in their evaluation performance?
How consistent are LLM-generated annotations across repeated evaluations of the same input?
Methodology:
Get familiar with prior experiments on LLMs for ontology defect explanation and the produced explanations dataset.
Design and implement scripts (e.g., in Python or other suitable languages) to prompt various LLMs for explanation evaluation tasks.
Perform experiments across different LLMs and analyse the results.
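The consistency question (repeated evaluations of the same input) could, for example, be quantified by the average majority agreement across repeated annotation runs. The labels below are invented toy data, not actual LLM output:

```python
from collections import Counter

def consistency(runs):
    """Average fraction of repeated annotation runs that agree with the
    per-item majority label; 1.0 means perfectly stable annotations."""
    n_items = len(runs[0])
    total = 0.0
    for i in range(n_items):
        labels = [run[i] for run in runs]
        majority_count = Counter(labels).most_common(1)[0][1]
        total += majority_count / len(runs)
    return total / n_items

# Hypothetical labels from three repeated runs over four explanations
# ("good" / "bad" with respect to some value-based requirement).
runs = [
    ["good", "bad", "good", "good"],
    ["good", "bad", "bad",  "good"],
    ["good", "bad", "good", "good"],
]
```

Comparing this score across different LLMs and prompt variants would feed directly into the research questions on evaluation performance and annotation stability.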
Required Skills:
Understanding of ontologies, ontology constraints and reasoning (SBWL K2 completed).
Experience with Python (or other languages that support API access to LLMs).
Data analysis skills for processing generated outputs and evaluating performance.
References
Tsaneva, S., Herwanto, G. B., Llugiqi, M., & Sabou, M. Knowledge Engineering with Large Language Models: A Capability Assessment in Ontology Evaluation. https://www.semantic-web-journal.net/system/files/swj3852.pdf
C.-H. Chiang, H.-y. Lee, Can large language models be an alternative to human evaluations?, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023. https://doi.org/10.18653/v1/2023.acl-long.870
18. The Hidden Cost of Unattractive Jobs: Knowledge Loss through Staff Turnover in Healthcare
Supervisor: Florian Kragulj
Unattractive working conditions in the healthcare sector lead to above-average staff turnover and resignation rates, and these destroy knowledge. Nursing work is highly knowledge-intensive: experiential know-how, practical wisdom, team routines, and relationships with clients are difficult to codify and often irretrievably lost when employees leave. This makes workplace and job attractiveness a strategic concern. Failing to improve workplace quality carries real and lasting costs.
This thesis shall explore the causal chain of ‘unattractive jobs → negative implications that cause knowledge loss → costs’, drawing on management, knowledge management, and health economics literature. Key questions include:
What types of knowledge are most at risk in care settings?
What are the measurable costs when knowledge is lost through turnover, intention to leave, etc.?
How does the literature link job attractiveness, turnover, and knowledge loss in these contexts?
Your task is to conduct a systematic literature review on management, knowledge management, and health economics literature. Findings are to be synthesized into a structured argumentation framework that is accessible to both academic and non-academic audiences. Theoretical anchors for the review could include the resource-based view of the firm, knowledge-based view of the firm, organizational learning, and knowledge management frameworks.
Keywords: Knowledge loss, Job attractiveness, Healthcare, Personal purpose, Structured literature review
Initial References:
Galan N (2023), "Knowledge loss induced by organizational member turnover: a review of empirical literature, synthesis and future research directions (Part II)". The Learning Organization: An International Journal, Vol. 30 No. 2 pp. 137–161
Parise, S., Cross, R., & Davenport, T. H. (2006). Strategies for preventing a knowledge-loss crisis. MIT Sloan Management Review, 47(4), 31.
Droege, S. B., & Hoobler, J. M. (2003). Employee turnover and tacit knowledge diffusion: A network perspective. Journal of Managerial Issues, 50-64.
19. Phronesis in Healthcare - A Knowledge Perspective
Supervisors: Florian Kragulj, Susanne Ahmad
The concept of phronesis was originally coined by Aristotle and is often translated as practical wisdom. In Greek philosophy, phronesis refers to the practical judgment and prudence that enables us to make ethically sound decisions in concrete situations, going beyond theoretical knowledge (episteme) and technical skill (techne) by integrating experience, context, and moral reflection. These capabilities are of particular importance in healthcare, a domain characterized by complexity, time pressure, and ethical ambiguity. Physicians, nurses, and other healthcare professionals regularly face situations in which rule-based action alone is insufficient. Effective practice requires a reflective form of judgment that combines medical expertise, experiential knowledge, and ethical sensitivity.
In this bachelor thesis, you will explore the concept of phronesis in the context of healthcare from a knowledge perspective, focusing on nurses and other healthcare professionals. You will investigate how practical wisdom can be defined and operationalized within health organizations and identify relevant fields of application, such as clinical decision-making, professional nursing practice, and ethical judgment. After conducting a structured literature review and researching relevant best practices, you will conclude with recommendations for how healthcare organizations can systematically foster and transfer practical wisdom to enhance both care quality and organizational learning.
For further questions, please contact Susanne Ahmad (susanne.ahmad@wu.ac.at).
Keywords: Phronesis, Practical Wisdom, Healthcare, Nursing, structured literature review
Initial References:
Conroy, M., Malik, A.Y., Hale, C. et al. Using practical wisdom to facilitate ethical decision-making: a major empirical study of phronesis in the decision narratives of doctors. BMC Med Ethics 22, 16 (2021). doi.org/10.1186/s12910-021-00581-y
Cosgrove L, Shaughnessy AF. Becoming a Phronimos: Evidence-Based Medicine, Clinical Decision Making, and the Role of Practical Wisdom in Primary Care. J Am Board Fam Med. 2023;36(4):531-536. doi:10.3122/jabfm.2023.230034R1
Fugelli P. Clinical practice: between Aristotle and Cochrane. Schweiz Med Wochenschr. 1998;128(6):184-188.
Kinsella, Elizabeth Anne, and Allan Pitman, eds. Phronesis as professional knowledge: Practical wisdom in the professions. Vol. 1. Springer Science & Business Media, 2012.
20. AI-based data completion for fair representation in online discussions
Supervisors: Jan Maly, Felicia Schmidt
Keywords: computational social choice, recommender systems, digital democracy
Context: Online discussions are a crucial part of modern democratic deliberation. To reap the benefits of such debates, it needs to be possible to summarize them with statements that best represent the different discussion points. So far, most algorithms simply show majority opinions, which tend to neglect the full spectrum of beliefs. The research field of computational social choice offers algorithms for summaries with better representation guarantees. However, the sheer multitude of comments in online discussions makes it impossible for any single user to express their opinion on all of them. In this thesis project, we will explore whether we can use modern AI technologies to bridge this gap in information to further fair representation also in a digital democracy setting.
Problem: Currently, existing methods for selecting representative statements in online discussions struggle with the highly incomplete information on user opinions.
Goal/expected results of the thesis: This thesis will investigate how machine learning techniques can best be used to predict user opinions on statements in online discussions.
Research Questions: To what extent can matrix completion methods accurately predict users’ approval of online discussion statements? Can these methods be combined with state-of-the-art algorithms to accurately represent users’ opinions?
Methodology:
Get familiar with the field of computational social choice and machine learning-based matrix completion methods.
Design and implement scripts (in Python) to use these completion methods on data from real-world online discussions.
Perform experiments across different completion methods and analyse the results.
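A minimal sketch of what such a matrix-completion step might look like, assuming an approval matrix with missing entries encoded as NaN. This simple iterative SVD imputation is an illustrative baseline only, not a recommendation for the final method:

```python
import numpy as np

def complete(matrix, rank=1, iters=50):
    """Simple iterative SVD imputation: fill missing entries (NaN) with
    the column mean, then repeatedly replace them with the values of a
    low-rank approximation, keeping observed entries fixed."""
    M = matrix.copy()
    mask = np.isnan(M)
    col_mean = np.nanmean(matrix, axis=0)
    M[mask] = np.take(col_mean, np.where(mask)[1])
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        M[mask] = low_rank[mask]  # only update the missing entries
    return M

# Hypothetical approval matrix: rows = users, columns = statements,
# 1 = approve, 0 = disapprove, NaN = statement not seen by the user.
votes = np.array([
    [1.0, 1.0, 0.0],
    [1.0, np.nan, 0.0],
    [0.0, 0.0, 1.0],
])
filled = complete(votes)
```

Here user 1 votes like user 0 on the observed statements, so the imputation pushes the missing entry towards approval; the completed matrix can then be handed to a representation-aware selection rule.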
Required Skills:
Good understanding of data analysis, ideally with Python.
Willingness to learn about mathematical measures of fairness.
References
Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Multiwinner Voting: A New Challenge for Social Choice Theory. In Ulle Endriss (editor), Trends in Computational Social Choice, chapter 2, pages 27–47. AI Access, 2017. https://archive.illc.uva.nl/COST-IC1205/BookDocs/Chapters/TrendsCOMSOC-02.pdf
Zhaoliang Chen, Shiping Wang. A review on matrix completion for recommender systems. https://doi.org/10.1007/s10115-021-01629-6
21. Measuring the impact of Science in different fields using Scientific Knowledge Graphs
Supervisors: Axel Polleres, Diego Rincon-Yanez
Scientific Knowledge Graphs such as OpenAlex [1], Google Scholar, or also Wikidata [2] contain a lot of detailed information about researchers, their relationships, e.g. supervision relationships [3], affiliations, and outputs such as publications and their citations.
Enabled also by standardized identifiers for researchers [4] and publications [5], scientometrics is concerned with measuring scientific impact and success, e.g. with metrics like the h-index [6] that are typically computed on a personal level. While in a previous thesis project a student has already investigated how such metrics could be computed on an organizational level, in this thesis we want to go one step further in order to define, quantify, and analyse scientometric indexes on a per-topic level.
To this end, the project shall investigate and prototypically implement
how to associate scientific papers with a hierarchy of research topics,
how to associate authors and organisations with the degree of impact they have in a particular research field, and
how to assess and discuss existing knowledge graphs and identifier systems, e.g. [1-5], in terms of coverage, based on a practical case study.
The result should be a prototype/tool that, given a group of researchers, assesses their individual and aggregated research impact and topical connections. More details will be provided by the supervisors.
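As a starting point, per-topic impact metrics could be computed as in the following sketch. The paper records are invented placeholders for what might be retrieved from OpenAlex [1], and the assignment of papers to topics is assumed to be given:

```python
from collections import defaultdict

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cits = sorted(citations, reverse=True)
    h = 0
    while h < len(cits) and cits[h] >= h + 1:
        h += 1
    return h

def topic_h_indexes(papers):
    """Group per-paper citation counts by topic, then compute one
    h-index per topic (a paper may count toward several topics)."""
    by_topic = defaultdict(list)
    for p in papers:
        for t in p["topics"]:
            by_topic[t].append(p["citations"])
    return {t: h_index(c) for t, c in by_topic.items()}

# Hypothetical records, as they might be retrieved from OpenAlex.
papers = [
    {"citations": 10, "topics": ["Knowledge Graphs"]},
    {"citations": 3,  "topics": ["Knowledge Graphs", "Machine Learning"]},
    {"citations": 1,  "topics": ["Machine Learning"]},
]
```

Restricting `papers` to one author or organisation gives the corresponding per-topic impact profile; aggregating over a group of researchers yields the combined view the prototype should support.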
22. Train Travel Made Easy: What’s The European From/To Price
Supervisors: Shahrom Sohi, Axel Polleres
Train Travel Made Easy: Mapping Cross-Border Rail Prices in Europe
European rail is central to the EU’s decarbonisation strategy. Cross-border train travel produces substantially lower CO₂ emissions per passenger kilometre than air or car travel and is therefore a key pillar of the European Green Deal and Fit for 55 agenda. Yet despite strong policy support, international rail continues to suffer from low modal share. One of the main barriers is not infrastructure, but information: cross-border ticket prices remain fragmented, difficult to compare, and often opaque for passengers.
Railway undertakings such as ÖBB, Deutsche Bahn, SNCF, Trenitalia, Renfe, and Eurostar operate separate booking platforms, pricing logics, and technical interfaces. For multi-country journeys, travellers frequently need to consult several websites, manually combine tickets, or rely on third-party aggregators such as Trainline or Omio. This creates price asymmetries, reduces transparency, and limits informed decision-making.
This thesis investigates the question: What is the actual European “from–to” rail price?
The project will analyse how cross-border fares are structured, displayed, and potentially distorted across booking systems. The objective is to systematically collect, compare, and evaluate pricing data in order to assess transparency gaps and structural fragmentation.
Methodologically, the student will work with web scraping approaches and structured data extraction. Depending on technical interest and feasibility, the project may explore LLM-based agents to automate itinerary reconstruction and price comparison across booking platforms.
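As a minimal illustration of the structured-extraction step, prices could be pulled from fetched result pages as sketched below. The markup pattern is hypothetical, and real booking platforms typically require session handling, JavaScript rendering, or official APIs rather than plain HTML parsing:

```python
import re

def extract_prices(html):
    """Pull offer prices (assumed to be EUR) out of a result page.
    The data-price attribute is a hypothetical markup pattern;
    each platform needs its own extraction rules in practice."""
    return [float(m) for m in re.findall(r'data-price="(\d+(?:\.\d+)?)"', html)]

# Static sample standing in for a fetched booking-platform page.
sample = ('<div class="offer" data-price="39.90"></div>'
          '<div class="offer" data-price="59.90"></div>')
prices = extract_prices(sample)
cheapest = min(prices)
```

Collecting such extracted fares for the same origin-destination pair across several platforms is what makes the price asymmetries and transparency gaps measurable.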
The thesis is supervised by Shahrom Sohi (ÖBB / WU Vienna) and Axel Polleres and offers the opportunity to collaborate with mobility stakeholders, contributing to research on seamless European rail travel and digital interoperability.
References:
Sohi, Shahrom, et al. "Inter-railways data sharing for seamless travelling applications in Europe." (2025). https://semantic-transportation.github.io/sem4tra-kg-website/papers/Sem4Tra25_paper_4.pdf
23. Bridging Linguistic Barriers in Cross-Border Rail Operations: Speech-to-speech translation preserving standardized terminology
Supervisors: Shahrom Sohi, Axel Polleres
The European railway domain increasingly depends on cross-border operations, where train drivers and infrastructure controllers often speak different national languages. The Translate4Rail (https://translate4rail.eu/) project has shown that providing a standardized set of predefined messages and a translation tool can mitigate miscommunication in normal and emergency situations.
This is a relatively open topic for people who like to play around with technology and AI models and push the boundaries: LLMs have enabled incredible advances over the past years, not only as generative question-answering systems but also by replacing most traditional techniques for multi-lingual translation. On the other hand, speech-to-text and text-to-speech technologies have likewise evolved, not only understanding spoken text and reading out text, but also capturing vocal features, up to the level of creating (synchronous) voice clones.
In the course of this thesis, we will explore technical boundaries and different approaches for solving speech-to-speech translation in a safety-critical context, where messages should be translated preserving standardized terminology in a particular domain, namely railway/mobility services.
This thesis will:
Analyze the limitations of the current Translate4Rail prototype and associated language-tool approaches, focusing on weak points such as message coverage, ambiguity, error handling, latency, and safety assurance.
Propose practical, hands-on improvements or extensions of Translate4Rail using RAG, model fine-tuning, and agent frameworks.
Implement a pilot to validate the enhancements, measuring metrics such as misunderstanding rate, response delay, coverage of scenario space, and safety margins.
Evaluate interoperability and safety aspects, possibly in collaboration with rail actors or infrastructure managers, to assess feasibility in real corridor settings.
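One technical building block such a pilot could prototype is terminology preservation around an otherwise generic translation step. The following Python sketch illustrates one possible approach; the glossary entries are invented examples and the `translate` callable is a stand-in for a real MT/LLM call (identity by default), not part of Translate4Rail.

```python
# Sketch: preserve standardized terminology around a generic translation
# step. GLOSSARY_DE_EN and the default identity `translate` are invented
# stand-ins, not Translate4Rail internals.

# Bilingual glossary of standardized terms (source term -> mandated target term).
GLOSSARY_DE_EN = {
    "Fahrtverbot": "movement prohibition",
    "Notruf": "emergency call",
}

def translate_with_glossary(text: str, glossary: dict[str, str],
                            translate=lambda s: s) -> str:
    """Shield glossary terms with placeholders, translate the remaining
    text, then restore the mandated standardized target terms.
    (Naive: assumes glossary terms are not substrings of one another.)"""
    placeholders = {}
    for i, (src, tgt) in enumerate(glossary.items()):
        token = f"__TERM{i}__"
        if src in text:
            text = text.replace(src, token)
            placeholders[token] = tgt
    translated = translate(text)  # stand-in for a real MT/LLM call
    for token, tgt in placeholders.items():
        translated = translated.replace(token, tgt)
    return translated
```

With the identity stand-in, `translate_with_glossary("Achtung: Notruf", GLOSSARY_DE_EN)` yields "Achtung: emergency call": the standardized term survives whatever the translation step does to the surrounding text, which is one way to approach the safety-assurance concern above.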
References:
Atanasov, I., Pencheva, E., & Vatakov, V. (2023). An Approach to Designing Critical Railway Voice Communication. Electronics, 12(6), 1406.
https://doi.org/10.3390/electronics12061406
Rosberg, T., Thorslund, B. Radio communication-based method for analysis of train driving in an ERTMS signaling environment. Eur. Transp. Res. Rev. 14, 18 (2022). https://doi.org/10.1186/s12544-022-00542-5
24. Linking Railway Accident Reports with Infrastructure Knowledge Graphs: Towards a European Railway Safety Knowledge Space
Supervisors: Shahrom Sohi, Axel Polleres
The European Union Agency for Railways (ERA) is building Knowledge Graphs (KGs) such as the Railway Infrastructure Register (RINF KG), which describe the European rail network’s assets, topology, and compliance attributes. In parallel, structured accident and incident data (e.g., ERAIL reports, NIB reports) are increasingly being digitized. Yet, these two domains, infrastructure and safety events, remain largely disconnected.
This thesis will:
Map accident data models to infrastructure entities (e.g., linking derailments to track sections, collisions to operational points, or accidents at level crossings to specific infrastructure features).
Build a semantic layer that connects unstructured accident reports (using NLP/NER extraction pipelines) to the ERA infrastructure KG, using ontologies and vocabularies from ERA and the Semantic Web community.
Develop a prototype Knowledge Graph integration, aligning accident entities (time, location, type, cause) with RINF infrastructure elements, enabling cross-querying (e.g., “Which track segments have had repeated accidents of type X in the past 10 years?”).
Evaluate potential applications in safety monitoring, predictive risk analysis, and regulatory reporting, in collaboration with ERA datasets.
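As a minimal illustration of the cross-querying goal above, the following Python sketch links toy accident records to track-section identifiers and answers a query of the form "which track segments have had repeated accidents of type X since a given date". All identifiers and records are invented; a real implementation would query the RINF KG and ERAIL data via SPARQL.

```python
from datetime import date

# Hypothetical, simplified records; real data would come from ERAIL
# reports and the RINF KG (all identifiers here are invented).
TRACK_SECTIONS = {"TS-101": "Wien Hbf - Meidling", "TS-202": "Linz - Wels"}

ACCIDENTS = [
    {"type": "derailment", "date": date(2019, 5, 1), "section": "TS-101"},
    {"type": "derailment", "date": date(2023, 8, 12), "section": "TS-101"},
    {"type": "collision",  "date": date(2021, 3, 3), "section": "TS-202"},
]

def repeated_accidents(accidents, acc_type, since, min_count=2):
    """Track sections with at least `min_count` accidents of `acc_type`
    on or after `since` -- the cross-query sketched in the text."""
    counts = {}
    for a in accidents:
        if a["type"] == acc_type and a["date"] >= since:
            counts[a["section"]] = counts.get(a["section"], 0) + 1
    return [s for s, c in counts.items() if c >= min_count]
```

Here `repeated_accidents(ACCIDENTS, "derailment", date(2015, 1, 1))` returns `["TS-101"]`; the point of the thesis is that such questions become answerable only once accident entities and infrastructure elements share identifiers.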
References:
Rail Accident Investigations
https://www.era.europa.eu/domains/accident-incident/rail-accident-investigation_en
RINF register of infrastructure
25. Thesis project: AI agents to generate B2B sales leads from media observation
Supervisors: Shahrom Sohi (Rail Cargo Group – ÖBB), Axel Polleres
Keywords:
Large Language Models (LLMs)
Web scraping
Pattern recognition
B2B Sales
Lead Generation
Abstract:
This thesis topic is a highly relevant and practically applicable exercise in LLM tuning: training an LLM to generate sales leads from media news.
Through media monitoring, it is possible to generate sales leads. This works very well manually but is time-consuming and does not scale. We can provide a set of media articles and, from this, the manually identified subset of sales leads. An LLM should be trained to recognize a pattern from this data so that it can then independently filter the sales leads relevant to RCG from the media articles.
The challenge is to teach the LLM the specific context through pattern recognition, so that it can distinguish news articles that represent a sales lead from those that do not.
Example: Company A is a prospective target customer. On a given day, X media articles are published with Company A’s name in them. One of them states that Company A has won a new contract and will therefore soon start exports to a new country. These transport streams provide a business opportunity for Rail Cargo Group; hence we classify the article as a sales lead. The trained LLM should be equipped with such a contextual understanding that it can (pre-)select this one article from the entire set of X articles.
A functioning model would be an additional source to continuously feed the B2B sales funnel with new sales leads.
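Before any LLM fine-tuning, a simple keyword baseline can make the task concrete. The Python sketch below pre-selects candidate sales-lead articles by trigger phrases; the phrases are invented placeholders for the keyword list Rail Cargo Group would supply, and an LLM classifier would then confirm or re-rank the pre-selected articles.

```python
# Baseline sketch: keyword pre-filter for candidate sales-lead articles.
# The trigger phrases below are invented examples; the real list would
# come from Rail Cargo Group's subject matter experts. A tuned LLM would
# then classify or re-rank the pre-selected articles.

LEAD_TRIGGERS = ["won a new contract", "expands exports", "new production plant"]

def preselect_leads(articles: list[dict]) -> list[dict]:
    """Keep articles whose text contains at least one trigger phrase."""
    return [a for a in articles
            if any(t in a["text"].lower() for t in LEAD_TRIGGERS)]
```

Such a baseline also gives the thesis an evaluation floor: the trained LLM should beat its precision/recall on the historical, manually labeled lead set.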
Rail Cargo Group will provide you with historical data on news articles, the sales leads selected from them, and, for a limited subset, the conversion rates of these sales leads. In addition, an elaborated list of keywords is available, and you will receive close coaching and input from Rail Cargo Group subject matter experts.
References:
V. Kumar et al. (2024). “AI-powered marketing: What, where, and how?”
Journal: Industrial Marketing Management (Elsevier)
https://www.sciencedirect.com/science/article/pii/S0268401224000318
D. Herhausen et al. (2025). “From words to insights: Text analysis in business research”
Journal: Journal of Business Research
https://www.sciencedirect.com/science/article/pii/S0148296325003145
26. Facilitating Interdisciplinary Collaboration through Knowledge Graph-based Expertise Discovery and Recommendation: A Use Case of an Austrian University
Supervisors: Daniil Dobriy, Axel Polleres
Advisor: Daniil Dobriy
The thesis/topic can be written by more than one student
Other important information:
If your thesis involves software development, Git version control will be used. Prior Git experience is not mandatory - a training course will be provided if needed. Artifacts produced during your thesis should meet the following requirements whenever permissible:
| | Documentation | Publication | License |
| Software | Standard README[1] | Institute’s GitLab[2] | MIT license[3] |
| Datasets | DCAT[4] | WU data portal[5] | CC BY 4.0[6] |
| Other artifacts | Standard README | Institute’s GitLab | CC BY 4.0 |
For computationally intensive applications, students will be provided SSH access to the institute’s server infrastructure. To facilitate deployment, it is recommended that complex applications be containerized using Docker (training available if needed).
Thesis description:
Despite being transformative, integrative and an accelerator of innovation [3], interdisciplinary research is associated with a number of institutional, professional and organizational challenges [1]. Inspired by the theory of weak ties [4], this research topic aims at developing a comprehensive framework for researcher discovery and expert recommendation [2] that leverages semantic web technologies [5] to enable interdisciplinary collaboration at academic institutions. Positioned at WU Vienna and using the research information management systems available at the university (e.g., Pure) as a practical case study, the thesis combines the challenges of interdisciplinary collaboration, approached as a literature and survey study, with technical aspects such as ETL over WU research platforms, knowledge graph construction, ontology re-use and recommendation frameworks, in order to solve a genuine institutional challenge.
This thesis topic envisions collaboration with the WU Research Service Center. The final scope and focus of the topic will be decided with the students based on the relevant expertise and research interests. The topic is multi-faceted and will accommodate multiple students - i.e., we encourage multiple students to apply and work in parallel on the evaluation of different aspects of the topic.
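To make the weak-ties idea concrete, the following Python sketch recommends researchers who are two hops away in a toy co-authorship graph but are not yet direct collaborators. The names and the graph are invented; a real system would derive the graph from WU's research information systems and use richer recommendation models [2].

```python
from itertools import chain

# Toy co-authorship graph as adjacency sets; all names are invented.
# A real graph would be extracted from Pure / WU research platforms.
COAUTHORS = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}

def weak_tie_candidates(graph: dict, researcher: str) -> set:
    """Researchers exactly two hops away and not directly connected --
    a deliberately naive reading of Granovetter's weak-tie idea [4]
    as a recommendation heuristic."""
    direct = graph.get(researcher, set())
    two_hop = set(chain.from_iterable(graph.get(n, set()) for n in direct))
    return two_hop - direct - {researcher}
```

For "alice" this recommends "carol": a collaborator of a collaborator, i.e. exactly the kind of bridging tie the weak-ties theory suggests is most valuable for interdisciplinary contact.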
Prerequisites:
Familiarity with ETL (Extract, Transform, Load) pipelines and knowledge representation/knowledge graphs is advantageous.
Interest in developing thesis findings into an academic publication.
Strong time management skills and ability to meet project milestones. A structured workflow with clear milestones will guide you through the thesis development process.
References:
[1] Daniel, K. L., McConnell, M., Schuchardt, A., & Peffer, M. E. (2022). Challenges facing interdisciplinary researchers: Findings from a professional development workshop. Plos one, 17(4), e0267234.
[2] Nikzad–Khasmakhi, Narjes, M. A. Balafar, and M. Reza Feizi–Derakhshi. "The state-of-the-art in expert recommendation systems." Engineering Applications of Artificial Intelligence 82 (2019): 126-147.
[3] Fagan, Jesse, et al. "Assessing research collaboration through co-authorship network analysis." The journal of research administration 49.1 (2018): 76.
[4] Granovetter, Mark. "The strength of weak ties: A network theory revisited." Sociological theory (1983): 201-233.
[5] Hogan, Aidan, et al. "Knowledge graphs." ACM Computing Surveys (Csur) 54.4 (2021): 1-37.
Keywords: Expert Recommendation Systems, Interdisciplinary Research, Scientific Linked Data, Knowledge Graphs, Ontologies, Research Information Management Systems, Scientific Collaboration Networks
[1] See github.com/RichardLitt/standard-readme
[2] I.e., git.ai.wu.ac.at
[3] See opensource.org/license/mit
[4] See www.w3.org/TR/vocab-dcat-3/
[5] I.e., data.wu.ac.at/portal
[6] See creativecommons.org/licenses/by/4.0/deed.en
27. From Discovery Aid to Bias Amplifier: Analysing the Role of Language Model-based Retrievers in Scientific Research: Risks, Metrics, and Mitigation Strategies
Supervisors: Daniil Dobriy, Axel Polleres
Advisor: Daniil Dobriy
Other important information:
If your thesis involves software development, Git version control will be used. Prior Git experience is not mandatory - a training course will be provided if needed. Artifacts produced during your thesis should meet the following requirements whenever permissible:
| | Documentation | Publication | License |
| Software | Standard README[1] | Institute’s GitLab[2] | MIT license[3] |
| Datasets | DCAT[4] | WU data portal[5] | CC BY 4.0[6] |
| Other artifacts | Standard README | Institute’s GitLab | CC BY 4.0 |
For computationally intensive applications, students will be provided SSH access to the institute’s server infrastructure. To facilitate deployment, it is recommended that complex applications be containerized using Docker (training available if needed).
Thesis description:
Platforms like Perplexity,[7] ChatGPT’s Deep Research feature[8] as well as Google Scholar’s newly introduced Scholar Labs[9] aim to facilitate research discovery via Retrieval-Augmented Generation-enhanced chat relying on extensive paper repositories, including non-peer-reviewed pre-publication resources like arXiv,[10] bioRxiv[11] and Zenodo.[12] However, extensive use of such language model (LM)-based retrievers could lead to source bias [1], whenever retrievers prefer LM-generated or unrepresentative content, and to self-fulfilling prophecies [2], whenever researchers prompt the retrievers in a way that retrieves sources supporting their prior assumptions. In this context, attempts have been made to optimise LM training on multi-turn conversations to better elicit user intent [3], to design RAG systems focused on counterfactual retrieval [4], and to propose multi-agent debate systems to improve the variety of results [5]. The aim of this topic is to (a) evaluate risks associated with the use of LM-based retrievers in research, (b) propose metrics and approaches for analysing the extent and effects of LM-based retriever use in research, and (c) propose workflows and architectures addressing the risks identified in (a).
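As one concrete candidate for goal (b), the Python sketch below computes a simple source-bias ratio: the share of LM-generated documents among retrieved results divided by their share in the whole corpus. The corpus and retrieval flags are invented illustrations; a real experiment would need provenance labels or a detector for LM-generated content.

```python
# Sketch of one possible source-bias metric: how strongly a retriever
# over-represents LM-generated documents relative to their share in the
# corpus. Values > 1 indicate bias toward such content. All inputs here
# are invented illustrations; assumes non-empty lists and a non-zero
# corpus share.

def source_bias_ratio(retrieved_flags, corpus_flags):
    """Ratio of the LM-generated share among retrieved results to the
    LM-generated share in the whole corpus (flags are booleans)."""
    retrieved_share = sum(retrieved_flags) / len(retrieved_flags)
    corpus_share = sum(corpus_flags) / len(corpus_flags)
    return retrieved_share / corpus_share
```

For instance, if 20% of a corpus is LM-generated but 40% of a retriever's top-10 results are, the ratio is 2.0, quantifying the over-representation that [1] describes qualitatively.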
Prerequisites:
Interest in developing thesis findings into an academic publication.
Strong time management skills and ability to meet project milestones.
References:
[1] Wang, Haoyu, et al. "Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents." arXiv preprint arXiv:2503.08684 (2025).
[2] Bauer, Kevin, and Andrej Gill. "Mirror, mirror on the wall: Algorithmic assessments, transparency, and self-fulfilling prophecies." Information Systems Research 35.1 (2024): 226-248.
[3] Wu, Shirley, et al. "Collabllm: From passive responders to active collaborators." arXiv preprint arXiv:2502.00640 (2025).
[4] Yue, Zhenrui, et al. "Retrieval augmented fact verification by synthesizing contrastive arguments." arXiv preprint arXiv:2406.09815 (2024).
[5] Learning to break: Knowledge-enhanced reasoning in multi-agent debate system
Keywords: AI in Research, Algorithmic bias, Retrieval-Augmented Generation, LM-based Retrievers, Self-fulfilling Prophecies, Science of Science
[1] See github.com/RichardLitt/standard-readme
[2] I.e., git.ai.wu.ac.at
[3] See opensource.org/license/mit
[4] See www.w3.org/TR/vocab-dcat-3/
[5] I.e., data.wu.ac.at/portal
[6] See creativecommons.org/licenses/by/4.0/deed.en
[7] See https://www.perplexity.ai
[8] See https://openai.com/index/introducing-deep-research/
[9] See https://scholar.google.com/scholar_labs
[10] See https://arxiv.org
[11] See https://www.biorxiv.org
[12] See https://zenodo.org
28. Developing and Evaluating Components of the Search Engine for the Web of Data (search.ai.wu.ac.at)
Supervisors: Daniil Dobriy, Axel Polleres
Advisor: Daniil Dobriy
The thesis/topic can be written by more than one student
Other important information:
If your thesis involves software development, Git version control will be used. Prior Git experience is not mandatory - a training course will be provided if needed. Artifacts produced during your thesis must meet the following requirements whenever permissible:
| | Documentation | Publication | License |
| Software | Standard README[1] | Institute’s GitLab[2] | MIT license[3] |
| Datasets | DCAT[4] | WU’s data portal[5] | CC BY 4.0[6] |
| Other artifacts | Standard README | Institute’s GitLab | CC BY 4.0 |
For computationally intensive applications, students will be provided SSH access to the institute’s server infrastructure. To facilitate deployment, it is recommended that complex applications be containerized using Docker (training available if needed).
Thesis description:
Standard protocols such as the Model Context Protocol (MCP) [2], which allow LLMs to connect to tools, have recently boosted the development of "agentic" AI applications, which, powered by LLMs' planning capabilities, promise to solve complex tasks with access to external tools and databases. Use cases explored so far in the literature have mostly focused on interactions with single (e.g., relational) databases, with tasks such as schema exploration, query formulation from natural language (text-to-SQL) and translation of results into desired output formats being successfully delegated to such agents. In contrast, SPARQL as a standard query language offers even more flexibility to combine various data sources through (a) endpoints readily implementing a standard protocol, (b) standardized metadata formats for endpoints to self-describe their schema and capabilities, potentially enabling dynamic discovery, and (c) SPARQL's native capability to federate queries across multiple such endpoints.
Previous pipelines have been proposed for text-to-SPARQL generation and federated querying [3]. One of the proposed agentic SPARQL federation architectures (see Figure 1) relies on a centralized “Catalogue” that facilitates endpoint discovery and schema exploration. SPARQLES [1] has previously been proposed as such an endpoint monitoring service and is a precursor to the Catalogue. In this thesis, your goal will be to explore the types of metadata and strategies that could support (a) endpoint discovery (i.e., “searching for an applicable endpoint/knowledge graph that could deliver an answer to a specific natural language question or SPARQL query”) and (b) schema exploration (i.e., “retrieving the part of the schema that is relevant to a particular natural language question or SPARQL query”). A more detailed goal definition will be formulated with the student in the early stages of the thesis planning process.
The topic will accommodate multiple students – i.e., we encourage multiple students to apply and work in parallel on the evaluation of different strategies with the evaluation testbed we provide.
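As a deliberately naive sketch of one possible discovery strategy (a): rank catalogue entries by keyword overlap between the question and each endpoint's self-description. The endpoints and descriptions below are invented; a real Catalogue would build on standardized endpoint metadata (e.g., VoID or SPARQL Service Description) and more robust retrieval than word overlap.

```python
# Naive sketch of catalogue-based endpoint discovery: rank SPARQL
# endpoints by keyword overlap between a natural language question and
# the catalogue's self-description metadata. Endpoints and descriptions
# are invented placeholders.

CATALOGUE = {
    "https://example.org/rail/sparql": "railway infrastructure stations tracks",
    "https://example.org/bio/sparql": "genes proteins pathways",
}

def discover_endpoints(question: str, catalogue: dict) -> list:
    """Endpoints ranked by word overlap with their metadata description;
    endpoints with no overlap are dropped."""
    q_words = set(question.lower().split())
    scored = []
    for endpoint, description in catalogue.items():
        overlap = len(q_words & set(description.split()))
        if overlap:
            scored.append((overlap, endpoint))
    return [e for _, e in sorted(scored, reverse=True)]
```

Even this toy baseline makes the evaluation question concrete: the thesis would compare such strategies on the provided testbed, measuring whether the endpoint that can actually answer the question is ranked first.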
Prerequisites:
Familiarity with SPARQL and MCP is advantageous.
Interest in developing thesis findings into an academic publication.
Strong time management skills and ability to meet project milestones. A structured workflow with clear milestones will guide you through the thesis development process.
References:
[1] Pierre-Yves Vandenbussche, Jürgen Umbrich, Luca Matteis, Aidan Hogan, and Carlos Buil-Aranda. 2017. SPARQLES: Monitoring public SPARQL endpoints. Semantic Web 8, 6 (2017), 1049–1065.
[2] Anthropic. 2025. Model Context Protocol Specification, Version 2025-06-18. MCP Specification. modelcontextprotocol.io/specification/2025-06-18 Accessed: 2025-09-14.
[3] Emonet, V., Bolleman, J., Duvaud, S., de Farias, T. M., & Sima, A. C. (2024). LLM-based SPARQL query generation from natural language over federated knowledge graphs. arXiv preprint arXiv:2410.06062. https://arxiv.org/html/2410.06062v2
Keywords: Query Federation, Retrieval-Augmented Generation, Large Language Models, Named Entity Recognition, Knowledge Graphs, Knowledge Graph Question Answering (KGQA)
[1] See github.com/RichardLitt/standard-readme
[2] I.e., git.ai.wu.ac.at
[3] See opensource.org/license/mit
[4] See www.w3.org/TR/vocab-dcat-3/
[5] I.e., data.wu.ac.at/portal
[6] See creativecommons.org/licenses/by/4.0/deed.en