Current Bachelor Thesis Topics
Bachelor Topics SS 2024
1. Visualizing Data in Virtual and Augmented Reality
Supervisor: Johann Mitlöhner
How can AR and VR be used to improve the exploration of data? Developing new methods for exploring and analyzing data in virtual and augmented reality presents many opportunities and challenges, both in terms of software development and design inspiration. There are various hardware options, from cheap but workable, such as Google Cardboard, to more sophisticated and expensive, such as Oculus Rift. Taking on this challenge demands programming skills as well as creativity. The student will develop a basic VR or AR application for exploring a specific type of (open) data. The use of a platform-independent kit such as A-Frame is essential, as the application will be compared to its non-VR version in a small user study in order to identify advantages and disadvantages of the implemented visualization method. Details will be discussed with the supervisor.
Some references:
Butcher, Peter W. S., and Panagiotis D. Ritsos. "Building Immersive Data Visualizations for the Web." Proceedings of the International Conference on Cyberworlds (CW'17), Chester, UK, 2017.
Teo, Theophilus, et al. "Data Fragment: Virtual Reality for Viewing and Querying Large Image Sets." Virtual Reality (VR), 2017 IEEE. IEEE, 2017.
Millais, Patrick, Simon L. Jones, and Ryan Kelly. "Exploring Data in Virtual Reality: Comparisons with 2D Data Visualizations." Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
Yu Shu, Yen-Zhang Huang, Shu-Hsuan Chang, and Mu-Yen Chen. "Do Virtual Reality Head-Mounted Displays Make a Difference? A Comparison of Presence and Self-Efficacy between Head-Mounted Displays and Desktop Computer-Facilitated Virtual Environments." Virtual Reality 23(4): 437-446, 2019.
2. Text Mining and Machine Learning
Supervisor: Johann Mitlöhner
Text mining aims to turn written natural language into structured data that allow types of analysis which are hard or impossible on the raw text; machine learning aims to automate this process using a variety of adaptive methods, such as artificial neural networks which learn from training data. Typical goals of text mining are classification, sentiment detection, and other types of information extraction, e.g. named entity recognition (identifying people, places, and organizations) and relation extraction (e.g. the locations of organizations).
Connectionist methods, and deep learning in particular, have attracted much attention and success recently; these methods tend to work well on large training datasets, which in turn require ample computing power. Our institute has recently acquired high-performance GPU units which are available for student use in thesis projects. It is highly recommended to use a framework such as PyTorch or TensorFlow/Keras for developing your deep learning application; the changes required to go from CPU to GPU computing will then be minimal. This means you can start developing on your PC or notebook, or on the department's Jupyter notebook server, with a small subset of the training data; when you later transition to the GPU server, the added performance will make larger datasets feasible.
Some references:
On text mining: Minqing Hu, Bing Liu. "Mining and Summarizing Customer Reviews." KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168-177, ACM, 2004.
For a more recent overview: Percha B. "Modern Clinical Text Mining: A Guide and Review." Annu Rev Biomed Data Sci. 2021 Jul 20;4:165-187. doi: 10.1146/annurev-biodatasci-030421-030931. PMID: 34465177.
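As noted in the text-mining description above, frameworks such as PyTorch make the CPU-to-GPU transition minimal. A device-agnostic sketch (the model and data are purely illustrative):

```python
import torch

# Device-agnostic setup: the same code runs on a CPU notebook and on a
# GPU server; only the selected device changes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny classifier head for document vectors (sizes are illustrative).
model = torch.nn.Linear(300, 2).to(device)
features = torch.randn(4, 300, device=device)  # batch of 4 documents
logits = model(features)
print(logits.shape)
```

Code written this way needs no changes when moving to the GPU server; larger batch sizes and datasets simply become practical.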
More info on topics and process at mitloehner.com/lehre/thesis-en.html
3. Exploring the Integration of Digital Signatures in European E-Government Systems
Supervisors: Jennifer-Marieclaire Sturlese, Marta Sabou
The Regulation on Electronic Identification and Trust Services for Electronic Transactions in the Internal Market (eIDAS Regulation) is a milestone towards a predictable regulatory environment. The eIDAS Regulation supports businesses, citizens, and authorities in conducting secure electronic interactions (EU Commission, [3]). A digital signature is a cryptographic technique used to verify the authenticity of a digital document, providing a secure method of identification [4]. The eIDAS Regulation aims, among other things, to enable the use of national electronic identities (eIDs) for online applications in other EU member states [2, 3]. It thus regulates the cross-border acceptance of certain high-quality eIDs [2]. In December 2023, ID-Austria was introduced in Austria. ID-Austria allows people to securely authenticate online and thus use digital services and conduct transactions [1]. Besides Austria, 15 more EU member states have implemented eIDAS; thus far, not all member states are part of this joint e-government mission [2]. E-government refers to the use of digital technologies by government institutions to provide public services, engage with citizens, and enhance administrative efficiency [5].
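To make the digital-signature idea concrete: signing typically means hashing the document and then encrypting the digest with the signer's private key. The asymmetric step requires a cryptography library, but the integrity-by-hashing part can be sketched with the standard library alone (the document text is made up):

```python
import hashlib

# The hashing step of a digital signature: the signer computes a digest of
# the document; the signature is then created over this digest (the
# private-key step is omitted here, as it needs a crypto library).
document = b"Application form, submitted 2024-01-15"
digest = hashlib.sha256(document).hexdigest()

# Any change to the document yields a different digest, so a valid
# signature over the digest proves the document was not altered.
tampered = hashlib.sha256(document + b" (edited)").hexdigest()
print(digest != tampered)
```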
The aim of this bachelor thesis is to provide a detailed treatment of e-government and digital signatures, with a focus on cryptography, eIDAS, and ID-Austria (= theoretical part). In the practical section, you will choose a country not currently implementing eIDAS and develop a model for the successful introduction of digital signatures similar to ID-Austria. The objective is to create a process map of the necessary steps through modeling (UML and BPMN). This will then be summarized in a short policy brief, in which you provide a recommendation on how the transition to the EU eIDAS Regulation should be carried out.
In your application, please state your experience with LaTeX text processing, UML and BPMN modeling, whether you have personal experience with the use of ID-AUSTRIA, and which country you would like to analyze (and why).
Keywords: cryptography, digital signature, e-government
Sources:
[1] www.oesterreich.gv.at/id-austria.html
[2] www.oesterreich.gv.at/themen/egovernment_moderne_verwaltung/elektronische-identität-(eiD)-anderer-eu-mitgliedstaaten-(SDG).html
[3] digital-strategy.ec.europa.eu/de/policies/eidasregulation; ec.europa.eu/digital-building-blocks/sites/display/DIGITAL/Country+overview
[4] Laudon, K. C. & Laudon, J. P. (2022). Management Information Systems: Managing the Digital Firm (17th ed.). Pearson. permalink.obvsg.at/wuw/AC16811890
[5] www.usp.gv.at/it-geistiges-eigentum/e-government.html
For UML: Rupp, C., Queins, S. & die SOPHISTen (2012). UML 2 glasklar: Praxiswissen für die UML-Modellierung (4th ed.). Carl Hanser Verlag, Munich. www.hanserelibrary.com/doi/book/10.3139/9783446431973
4. Information Security Risks of IT-Outsourcing and the Role of In-House Development
Supervisors: Jennifer-Marieclaire Sturlese, Marta Sabou
The widespread adoption of cloud-based services has led many organizations to consider migrating from local platforms to those offered by major tech companies [1], often referred to as the "Big Five" (Alphabet/Google, Amazon, Apple, Meta, and Microsoft). This IT outsourcing presents potential benefits, such as scalability, cost-effectiveness, and access to advanced features [2]. However, it also introduces significant information security risks that must be addressed.
This bachelor thesis aims to assess the information security risks associated with IT outsourcing in general and specifically to the Big Five companies. Aspects may include data privacy, various kinds of cyber-attacks, and the strength of diverse security measures such as encryption protocols and access controls.
The aim of this bachelor thesis is to provide a detailed treatment of in-house development and IT outsourcing, focusing on the information security risks of both options (= theoretical part). In the practical section, you will create an action plan of what would be necessary to implement a local infrastructure when switching from an outsourced cloud platform to a locally developed in-house one. The objective is to create a detailed report on the procedures during requirements engineering, software design, development, and testing. You will argue for one development process framework and demonstrate your model with UML diagrams.
In your application, please state your experience with LaTeX text processing, UML modeling, whether you have personal experience with IT-outsourcing, and which organization you would like to analyze.
Keywords: Software Engineering, Information Security, In-House Development, IT-Outsourcing
Preliminary References:
[1] Lacity, M. C., Khan, S., Yan, A., & Willcocks, L. P. (2010). A review of the IT outsourcing empirical literature and future research directions. Journal of Information technology, 25, 395-433.
[2] Wright, C. (2004). Top three potential risks with outsourcing information systems. Information Systems Control Journal, 5, 40-42.
Sommerville, I. (2018). Software Engineering (10th ed.). Pearson Higher Education, Munich. https://permalink.obvsg.at/wuw/AC09030731
Rupp, C., Queins, S. & die SOPHISTen (2012). UML 2 glasklar: Praxiswissen für die UML-Modellierung (4th ed.). Carl Hanser Verlag, Munich. www.hanserelibrary.com/doi/book/10.3139/9783446431973
5. AI Demonstrators for public outreach
Supervisor: Marta Sabou
Main idea: Review the research lines of the SemSys research group and propose ideas for demonstrators suitable for public outreach, preferably among young people.
Background: The SemSys research group [1] performs foundational and applied research on information systems enabled by semantic (web) and Artificial Intelligence (AI) technologies, at the confluence of the Semantic Web, Machine Learning, and Human Computation research areas. SemSys works on several exciting research topics and projects and has so far focused primarily on scientific dissemination in conference papers and journal articles. As the general public's interest in AI is increasing, the group sees an opportunity to intensify its outreach to the broader public through interesting demonstrators based on its work.
Research Problem and Questions:
The research problem focuses on transforming scientific work and results into appealing demonstrators for the general public. This requires answering the following questions:
Which lines of work of the group are most intuitive for presentation to the general public?
What are the key segments of the general public, what are their interests in AI, and what are the modalities to reach out to them?
What are concrete ideas of 1-2 demonstrators that would showcase the work of the group in an intuitive and appealing manner?
Expected Tasks:
Acquire an in-depth understanding of the research performed in SemSys (e.g., read selected research papers, interview group members)
Identify target groups for outreach activities
For one selected target group, clarify their interests in AI (e.g., conduct focus groups)
For the selected target group, propose 1-2 concrete activities/demonstrators that could be built. For this, also take into account examples of other AI demonstrators.
Prior-Knowledge and Skills:
Understanding of semantic web technologies (skills from course K2 and K3 in the SBWL KM);
Ability to understand potential target groups and to identify their needs;
Creativity in proposing novel technology demonstrators for target groups.
References:
[1] SemSys Research Group web-site: https://semantic-systems.org/
Keywords: Semantic Web Technologies, AI, Public Outreach, Demonstrators
6. Semi-automatic knowledge graph evaluation
Supervisors: Stefani Tsaneva, Marta Sabou
Main idea: The thesis will consist of a literature review of available semi-automatic approaches for the evaluation of semantic resources such as ontologies and knowledge graphs.
Background: Knowledge graphs (KGs) conceptualise real-world knowledge and act as a foundational component in many advanced intelligent applications (e.g., search, decision support) harnessing human knowledge. As such, ensuring the quality of KGs is important.
While several methods exist for automatic KG verification, some defects can only be identified by human inspection [1]. Although expert (human-in-the-loop) evaluations achieve high accuracy for the identification of such defects, they are cost-intensive and time-consuming.
As an alternative, semi-automatic approaches relying on “Hybrid Intelligence” have been developed harnessing human intelligence while minimising the evaluation time and costs (e.g., [2]).
Research Question: What are the characteristics of current semi-automatic KG evaluation approaches?
Methodology: Literature review
Expected Tasks:
Start by reading a number of provided papers (published 2010-2020) and extract relevant information about the described semi-automatic approaches (e.g., the human-in-the-loop pattern, the role of the human/automatic component, the evaluated KG, etc.)
Find similar relevant literature published after 2020 and extract relevant information
Perform data analysis on the extracted results (e.g., find trends, limitations)
Skills:
Understanding of ontologies and semantic web technologies (course K2 in the SBWL KM);
Critical thinking
Good academic writing skills (e.g., course K5 in the SBWL KM)
References:
[1] Mortensen, J.M. (2013). Crowdsourcing Ontology Verification. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41338-4_30
[2] Yifan Qi, Weiguo Zheng, Liang Hong, and Lei Zou (2022). Evaluating Knowledge Graph Accuracy Powered by Optimized Human-machine Collaboration. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22). Association for Computing Machinery, New York, NY, USA, 1368–1378. https://doi.org/10.1145/3534678.3539233
7. LLM usage when learning Ontology Engineering
Supervisors: Stefani Tsaneva, Marta Sabou
Main idea: The thesis aims to collect qualitative insights into how beginners in ontology engineering make use of applications supported by large language models.
Background: Ontologies conceptualise real-world knowledge and act as a foundational component in many advanced intelligent applications (e.g., search, decision support) harnessing human knowledge. Ontology engineering, the process of developing ontologies, is a time-intensive task comprising several activities which can potentially be computationally supported [2].
Meanwhile, large language models (LLMs) have shown performance similar to humans on a number of natural language tasks, typically requiring commonsense or domain knowledge. With recent advances of LLMs and their application in a broad range of tasks, an interest into the synergy between LLMs and ontology engineering has emerged [1].
To better understand the extent to which LLMs can currently support the ontology engineering process, the thesis will focus on collecting information about how students learning to build ontologies make use of LLM-based tools when developing their semantic artefacts.
Research Question: How do novice ontology engineers make use of LLM-supported tools when performing ontology engineering tasks?
Methodology: Literature review + Semi-structured interviews/Focus group
Expected Tasks:
Read literature on collaborative ontology engineering tools supported by LLMs
Conduct interviews/a focus group with students who completed K2 in the Knowledge Management SBWL on their experience in using ChatGPT and other LLM-based tools
Summarize the findings and identify trends (e.g., which tools are used for which tasks)
Skills:
Understanding of ontologies and semantic web technologies (course K2 in the SBWL KM);
Active listening skills and conversational skills
References:
[1] Fabian Neuhaus: Ontologies in the era of large language models – a perspective. Applied Ontology, vol 18, no. 4, 2023, pp. 399–407. DOI 10.3233/AO-230072.
[2] Zhang, B., Carriero, V.A., Schreiberhuber, K., Tsaneva, S., González, L.S., Kim, J., & de Berardinis, J. (2024). OntoChat: a Framework for Conversational Ontology Engineering using Language Models. DOI 10.48550/arXiv.2403.05921
8. SceneGraphs - Knowledge graph construction for visual understanding of images?
Supervisor: Axel Polleres
A colleague from robotics recently pointed me to the notion of scene graphs in robotics [1,2]. The term is also used in gaming [3], yet the notion seems closely related to, and could potentially benefit from incorporating, Knowledge Graphs [4]: the shared idea is that abstract graph structures which describe the world in a structured manner help AI applications answer questions and steer decision making.
In this thesis, you should survey and explore approaches to scene graph construction for images and compare them to knowledge graph generation from text. We may also attempt to leverage language models and multi-modal models to support this task, or other approaches from the literature found, to come up with at least a prototypical implementation - this is a thesis topic for data science or knowledge management students who are looking for a challenge! :-)
Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann (2021) Knowledge Graphs, Synthesis Lectures on Data, Semantics, and Knowledge, No. 22, 1–237, DOI: 10.2200/S01125ED1V01Y202109DSK022, Springer.
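The shared graph abstraction mentioned above can be made concrete: both scene graphs and knowledge graphs boil down to subject-predicate-object triples. A minimal sketch (all entities and relations are made up):

```python
# A scene graph for an image, expressed as subject-predicate-object triples,
# the same abstraction knowledge graphs use (all triples are illustrative).
scene = [
    ("person", "holding", "cup"),
    ("cup", "on", "table"),
    ("person", "next_to", "table"),
]

def query(triples, predicate):
    """Return all (subject, object) pairs linked by the given relation."""
    return [(s, o) for s, p, o in triples if p == predicate]

print(query(scene, "on"))  # [('cup', 'table')]
```

Scene graph construction amounts to predicting such triples from pixels, while KG generation predicts them from text; the downstream representation is the same, which is what makes the comparison interesting.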
9. Mapping and Implementing PropertyGraph Schemas using PG-Schema and PG-Keys in Postgres...
Supervisor: Axel Polleres
Graph Database Management Systems [1] are picking up rapidly with the advent of Property Graphs as a data model [2] and the need to store and query Knowledge Graphs [3] efficiently. Graph database models are actually not new at all [4]: they have developed their own query languages [5,6] and, interestingly, recently also new schema and constraint languages, such as PG-Schema [7] and PG-Keys [8].
In this thesis, a prototype should be developed that stores graph data in a standard relational database such as PostgreSQL and maps PG-Schema to standard SQL constraints.
This should be a rewarding exercise to learn more about the trending Graph database paradigm and emerging standards and should be combined with a literature review in this busy research area. As such the thesis *could* be worked upon by a team of two, however, at least one team member has to focus on a (prototypical) implementation.
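The core of the mapping can be sketched with the standard library's SQLite driver (the schema, key, and data are illustrative; a real prototype would target PostgreSQL):

```python
import sqlite3

# Relational encoding of a property graph: one table for nodes, one for
# edges. A PG-Keys-style key ("name identifies a node within its label")
# becomes a plain SQL UNIQUE constraint.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    name  TEXT NOT NULL,
    UNIQUE (label, name)
);
CREATE TABLE edge (
    src   INTEGER NOT NULL REFERENCES node(id),
    dst   INTEGER NOT NULL REFERENCES node(id),
    label TEXT NOT NULL
);
""")
db.execute("INSERT INTO node (label, name) VALUES ('Person', 'Ada')")
db.execute("INSERT INTO node (label, name) VALUES ('City', 'Vienna')")
db.execute("INSERT INTO edge VALUES (1, 2, 'livesIn')")
rows = db.execute("""
    SELECT s.name, e.label, t.name
    FROM edge e
    JOIN node s ON s.id = e.src
    JOIN node t ON t.id = e.dst
""").fetchall()
print(rows)  # [('Ada', 'livesIn', 'Vienna')]
```

Richer PG-Schema features (typed properties, cardinalities on edge types) would map to further CHECK and FOREIGN KEY constraints or triggers.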
Renzo Angles: The Property Graph Database Model. AMW 2018
Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann (2021) Knowledge Graphs, Synthesis Lectures on Data, Semantics, and Knowledge, No. 22, 1–237, DOI: 10.2200/S01125ED1V01Y202109DSK022, Springer
Renzo Angles, Claudio Gutierrez: Survey of graph database models. ACM Comput. Surv. 40(1): 1:1-1:39 (2008)
Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter, Domagoj Vrgoc: Foundations of Modern Query Languages for Graph Databases. ACM Comput. Surv. 50(5): 68:1-68:40 (2017)
Francis, Nadime, et al. "Cypher: An evolving query language for property graphs." Proceedings of the 2018 international conference on management of data. 2018.
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savkovic, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, Dusan Zivkovic: PG-Schema: Schemas for Property Graphs. Proc. ACM Manag. Data 1(2): 198:1-198:25 (2023)
Angles, Renzo, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee et al. "Pg-keys: Keys for property graphs." In Proceedings of the 2021 International Conference on Management of Data, pp. 2423-2436. 2021
10. Cloud Database Systems
Supervisor: Axel Polleres
Apart from traditional standalone-server SQL Relational Database Management Systems, scalable cloud databases such as Snowflake [1] and Google BigQuery have entered the market. In this thesis we would like to investigate and document: architectural differences, indexing methods and other novel approaches in these systems, restrictions and extensions of standard SQL which they support, as well as common use cases.
It is expected that you don't shy away from:
1) practically testing and working with these systems, systematically evaluating and comparing them, and documenting your findings;
2) backing up your findings by diving into the existing academic literature and documentation of these new systems.
Long story short, the challenge: convince me why and for what I should use a cloud database instead of a more traditional relational DB system such as PostgreSQL [3] :-)
11. Seamless Booking Systems in Transportation: A Comparative Analysis
Supervisor: Shahrom Sohi Hosseini
We are looking for motivated candidates with a passion for transportation and urban mobility.
Abstract
This thesis explores the key components and data architecture requirements of seamless booking systems across various modes of transportation: airlines, railways, urban transit, ferries, and buses. The focus is on understanding how these systems can be designed for efficiency, user-friendliness, and integration, enhancing the travel experience in diverse transit environments.
Introduction
Seamless booking systems are pivotal in modern transportation, offering convenience and efficiency to travellers. These systems vary considerably across different modes of transportation due to unique operational, logistical, and customer service requirements. This thesis investigates these variations, aiming to draw insights that could inform the development of more integrated, user-centric booking experiences.
Examples of Key Components of Seamless Booking Systems by Transport Mode
1. Airlines:
Advanced reservation and ticketing systems with real-time updates.
Integration with global distribution systems for wide access.
Dynamic pricing models based on demand and seasonality.
2. Railways:
Robust scheduling systems to handle frequent, high-capacity services.
Integrated ticketing with other modes of transportation for multimodal journeys.
Real-time information systems for delays and platform changes.
3. Urban Transit:
Contactless and mobile payment options for convenience.
Real-time updates on service changes, especially during peak hours.
Integration with city-wide transport services for a unified experience.
4. Ferries:
Online booking platforms that accommodate vehicle and passenger transport.
Seasonal scheduling adjustments to match tourist demands.
Integration with local transportation for seamless end-to-end travel.
5. Buses:
Online booking systems with dynamic route planning.
Real-time tracking for operational efficiency.
Integration with local transport systems for extended connectivity.
Data Architecture Requirements
Data Integration and Interoperability: Crucial for cross-platform functionality, especially in urban areas where multiple modes of transport interconnect. Ensuring data systems can communicate seamlessly is fundamental for a unified booking experience.
Scalability: Systems must be able to handle varying loads, particularly in peak travel seasons. This is vital for airlines and urban transit where the volume of transactions can be extraordinarily high.
Real-Time Data Processing: Essential for up-to-date scheduling and pricing, especially in railways and urban transport, where delays are common, and schedules are tight.
Security and Privacy: Robust security protocols to protect user data, especially considering the sensitivity of travel documents and payment information.
User-Centric Design: Systems should be built with the end-user in mind, ensuring ease of use, accessibility, and straightforward navigation.
Conclusion
Seamless booking systems in different modes of transportation have unique requirements and challenges, reflecting the operational intricacies of each mode. However, commonalities like the need for data integration, scalability, real-time processing, security, and user-centric design are evident. Understanding these aspects can guide the development of more integrated and efficient transportation networks, ultimately enhancing the traveller's experience.
Future Research Directions
Further research could explore the potential of emerging technologies, such as Data Spaces, in enhancing the efficiency and security of these systems. Additionally, case studies on successful integration efforts in the EU could provide practical insights for future improvements.
References
Smith, A. (2023). Digital Transformation in Airline Booking Systems.
Johnson, K., & Liu, M. (2024). Integrating Urban Transit Networks: A Data-Driven Approach.
O'Connell, J. F. (2022). Railway Reservation Systems: Design and Implementation Challenges.
Mobility Data Space - https://mobility-dataspace.eu
12. Augmented Process Discovery
Supervisor: Maxim Vidgof
Background:
Process mining makes it possible to create process models from event logs. However, event logs are not always fully representative of the behavior of the system that produced them.
Research problem:
In this thesis, the student is expected to measure the representativeness of an event log (using existing metrics, but also proposing an own view) and to augment the input data for process discovery algorithms by means of simulation and process model prediction.
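One existing line of metrics (see the Kabierski et al. reference below) borrows species-richness estimators from biology, treating distinct trace variants as species and estimating how many variants remain unobserved. A minimal sketch of the Chao1 estimator applied to an event log (the log is made up):

```python
from collections import Counter

def chao1(log):
    """Chao1 species-richness estimate of the total number of trace
    variants: observed variants plus a correction based on variants
    seen exactly once (f1) or exactly twice (f2)."""
    counts = Counter(tuple(trace) for trace in log)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form
    return s_obs + f1 * f1 / (2 * f2)

# Illustrative log: each trace is a sequence of activities.
log = [["a", "b", "c"]] * 5 + [["a", "c", "b"]] * 2 + [["a", "b"]] + [["a", "c"]]
print(chao1(log))  # 4 observed variants, f1=2, f2=1 -> 6.0
```

An estimate well above the number of observed variants signals that the log is likely not representative of the full behavior.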
Prerequisites: Basic knowledge of process mining; Python is a strict requirement, Java is a plus.
References:
Kabierski, M., Richter, M., & Weidlich, M. (2023, October). Addressing the Log Representativeness Problem using Species Discovery. In 2023 5th International Conference on Process Mining (ICPM) (pp. 65-72). IEEE.
De Smedt, J., Yeshchenko, A., Polyvyanyy, A., De Weerdt, J., & Mendling, J. (2021, October). Process model forecasting using time series analysis of event sequence data. In International Conference on Conceptual Modeling (pp. 47-61). Cham: Springer International Publishing.
De Weerdt, J. (2023, October). Cracking the Nut: Unraveling Challenges in Predictive Process Monitoring [Keynote]. ml4pm2023.di.unimi.it/preproceedings/Keynote-ML4PM2023-JochenDeWeerdt.pdf
13. Simulating business processes with BPMS
Supervisor: Maxim Vidgof
Background:
Business Process Management (BPM) benefits significantly from simulations, which forecast how processes operate under various conditions. However, current simulation methods tend to oversimplify, especially regarding resource management, which skews outcomes. In contrast, Business Process Management Systems (BPMS) effectively support intricate process logistics, including detailed resource allocation. This thesis suggests enhancing simulation techniques by incorporating BPMS functionalities, focusing on their resource management features. The goal is to develop simulations that mirror real-world complexities more accurately, providing better insights for process optimization.
Research problem:
In this thesis, the student is expected to answer the question: “How can business process simulation benefit from using a BPMS?” by building and evaluating a prototype of a business process simulation tool.
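To illustrate why resource modelling matters, here is a deliberately simple queueing sketch: the same process, simulated with one vs. two instances of a shared resource, yields very different average cycle times (all parameters are made up):

```python
import heapq

def simulate(arrivals, service_time, n_resources):
    """Earliest-free-resource simulation: each case waits until one of
    n_resources becomes available, then is served for service_time.
    Returns the average cycle time (completion minus arrival)."""
    free_at = [0.0] * n_resources      # when each resource is next free
    heapq.heapify(free_at)
    total_cycle = 0.0
    for t in arrivals:
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service_time)
        total_cycle += start + service_time - t
    return total_cycle / len(arrivals)

arrivals = [0, 1, 2, 3, 4]             # one case per time unit
print(simulate(arrivals, 2.0, 1))      # single resource: waiting dominates
print(simulate(arrivals, 2.0, 2))      # two resources: no queueing here
```

A simulator that ignores resource contention would report the two-resource cycle time in both scenarios; BPMS-style resource allocation data is what makes the difference visible.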
Prerequisites:
Programming skills (Python or Java) are a strict requirement, basic knowledge of process mining is a nice-to-have.
References:
van der Aalst, W.M.P. (2015). Business Process Simulation Survival Guide. In: vom Brocke, J., Rosemann, M. (eds) Handbook on Business Process Management 1. International Handbooks on Information Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45100-3_15
Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A. (2018). Quantitative Process Analysis. In: Fundamentals of Business Process Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-56509-4_7
Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A. (2018). Process-Aware Information Systems. In: Fundamentals of Business Process Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-56509-4_9
BIMP and QBP Simulator - Ultimate Business Process Simulator for BPMN. https://bimp.cs.ut.ee/
Get started with Camunda docs.camunda.org/get-started/quick-start/
14. State of Streaming Process Mining
Supervisor: Maxim Vidgof
Background:
Process mining traditionally analyzes historical data from static event logs to improve business processes. However, this method cannot keep up with the need for immediate insights from ongoing events. Streaming process mining emerges as a solution, focusing on real-time events and offering instant analysis. This approach is essential for quickly reacting to new information, but it introduces challenges such as managing continuous data flow. Understanding the developments in streaming process mining is crucial for harnessing its full potential in dynamic environments.
Research problem:
In this thesis, the student is expected to conduct a Systematic Literature Review of the recent developments in Streaming Process Mining.
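The core shift from static to streaming process mining can be illustrated in a few lines: instead of loading a full log, a directly-follows graph (DFG) is updated event by event, keeping only the last activity seen per case (the stream data is made up):

```python
from collections import defaultdict

# Streaming sketch: update a directly-follows graph incrementally from an
# event stream, without ever storing the full log.
dfg = defaultdict(int)
last_activity = {}  # most recent activity observed per case

def observe(case_id, activity):
    """Consume one event; record the new directly-follows pair, if any."""
    if case_id in last_activity:
        dfg[(last_activity[case_id], activity)] += 1
    last_activity[case_id] = activity

stream = [(1, "a"), (2, "a"), (1, "b"), (2, "c"), (1, "c")]
for case, act in stream:
    observe(case, act)
print(dict(dfg))  # {('a', 'b'): 1, ('a', 'c'): 1, ('b', 'c'): 1}
```

Real streaming approaches add the parts this sketch omits, such as bounding memory (aging out stale cases) and handling concept drift, which is exactly what the literature review would map out.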
References:
Burattin, A. (2022). Streaming Process Mining. In: van der Aalst, W.M.P., Carmona, J. (eds) Process Mining Handbook. Lecture Notes in Business Information Processing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-031-08848-3_11
van der Aalst, W. (2016). Process Mining in the Large. In: Process Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49851-4_12
Okoli, C., & Schabram, K. (2015). A guide to conducting a systematic literature review of information systems research.
15. Enhancing Neural Networks with Ontologies and Knowledge Graphs: A Comprehensive Survey
Supervisors: Majlinda Llugiqi, Marta Sabou
Main idea: Review and analyze existing literature and methods that utilize ontologies and knowledge graphs as tools to improve the architecture, performance, and interpretability of neural networks.
Motivation: While neural networks have achieved significant success in numerous applications, their black-box nature and purely data-driven training can limit transparency and domain-specificity. Ontologies and knowledge graphs, encapsulating structured domain knowledge, can potentially address these gaps. An in-depth survey of existing methods will offer clarity on the advancements and challenges in this interdisciplinary domain.
Research Questions:
How have ontologies and knowledge graphs been historically employed to enhance neural networks?
What are the primary benefits reported in using structured knowledge to enhance neural network models?
Which specific neural network architectures or domains (e.g., NLP, Computer Vision) have most extensively adopted these methods?
What challenges and limitations have researchers faced when integrating ontologies and knowledge graphs with neural networks?
Expected Tasks:
Conduct a systematic literature review to identify key papers and works in the domain.
Categorize the methods based on the specific application (e.g., weight initialization, network regularization, interpretability).
Analyze the reported advantages and challenges for each method or approach.
Summarize the domains and neural network architectures that have seen significant ontology and knowledge graph integration.
Discuss the potential future directions
Prior-Knowledge and Skills:
Comprehensive understanding of neural networks and their architectures.
Familiarity with ontology structures, knowledge graph representations, and their applications.
Analytical and critical reading skills to discern the quality and relevance of research works.
References:
Sheth, Amit, et al. "Shades of knowledge-infused learning for enhancing deep learning." IEEE Internet Computing 23.6 (2019): 54-63.
Tiddi, Ilaria, and Stefan Schlobach. "Knowledge graphs as tools for explainable machine learning: A survey." Artificial Intelligence 302 (2022): 103627.
Gaur, Manas, Keyur Faldu, and Amit Sheth. "Semantics of the black-box: Can knowledge graphs help make deep learning systems more interpretable and explainable?." IEEE Internet Computing 25.1 (2021): 51-59.
16. Leveraging AI-Tools for Translating Video in Higher Education
Supervisor: Michael Feurstein
Background: With the surge of publicly available interfaces to components of artificial intelligence (AI), questions arise as to how these tools can be integrated into higher education. One aspect of this integration is the process of translating lecture content, specifically video-based lecture content. With more and more international students attending classes, the demand to offer content in both German and English increases.
This is the case with the newly designed course “Grundlagen der Wirtschaftsinformatik” for 1st-year bachelor students. The lecture content has been designed in German. Soon, however, it is planned to offer this course in English as well. Hence, the question arises how AI-based tools can help translate German-language content into English.
One challenge lies in the tremendous workload of re-recording all video-based lecture content with spoken English. In total, there are over 9 hours of video material to be translated or re-recorded. Experience shows that producing a 45-minute video requires 4-5 hours of work. If AI-based tools can help reduce this workload, it could be of enormous help to lecturers.
Research Problem: The bachelor thesis should analyze the feasibility of using AI-tools for translating video in higher education. This should include and be guided by the following points:
Overview of available tools and their affordances.
What can be done with current tools on the market and what are the limitations?
How does the process of translating video with AI-based tools look?
Do translations work as expected, with real test material?
What legal and ethical aspects need to be considered?
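One concrete sub-step of such a translation workflow is handling subtitle files, where timing must survive translation. The sketch below, a minimal illustration rather than a recommendation of any particular tool, applies a translation function to SubRip (SRT) text while preserving index numbers and timestamps; the lookup-table "translation" is a placeholder for a real machine translation service.

```python
import re

def translate_srt(srt_text, translate):
    """Apply a translation function to SRT subtitle text while
    preserving index numbers and timestamps."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    out = []
    for block in blocks:
        lines = block.splitlines()
        # An SRT block is: index line, timing line, then text lines.
        index, timing, text = lines[0], lines[1], lines[2:]
        translated = [translate(line) for line in text]
        out.append("\n".join([index, timing] + translated))
    return "\n\n".join(out)

# Placeholder "translation": a real pipeline would call a machine
# translation service here.
demo = translate_srt(
    "1\n00:00:01,000 --> 00:00:03,000\nGuten Morgen",
    lambda s: {"Guten Morgen": "Good morning"}.get(s, s),
)
print(demo)
```

A design like this keeps the AI component swappable, which matters for the tool comparison the thesis asks for.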
References:
Anantrasirichai, N., Bull, D. Artificial intelligence in the creative industries: a review. Artif Intell Rev 55, 589–656 (2022). https://doi.org/10.1007/s10462-021-10039-7
Carlos Amaral and Peggy van der Kreeft. 2022. plain X - AI Supported Multilingual Video Workflow Platform. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 319–320, Ghent, Belgium. European Association for Machine Translation.
Adhikary, P. K., Sugandhi, B., Ghimire, S., Pal, S., & Pakray, P. (2023). Travid: An end-to-end video translation framework. arXiv preprint https://doi.org/10.48550/arXiv.2309.11338
17. Explanation Performance – Evaluating the appropriateness and effectiveness of user-centered explanations
Supervisors: Katrin Schreiberhuber, Marta Sabou
Main idea: The thesis will consist of a literature review of available approaches for the evaluation of user-centered explanations, together with an example evaluation of a set of given explanations for each method.
Background: A Cyber-Physical System (CPS) represents the integration of computational and physical components to tackle complex problems. However, the complexity of these systems makes it increasingly difficult for engineers and operators to understand system behaviour, creating a need for explainable systems. Explanation is a broadly used term, as it can have different aims, such as transparency, increased trust, effectiveness, persuasiveness, efficiency, or satisfaction. The definition of a good explanation remains an open topic to this day. Since explanations are generated with different goals in mind (as mentioned above), their evaluation tends to be application-specific as well. Recently, a few approaches have been developed that aim to evaluate explanations in a more structured manner. While these approaches are scarce, they are valuable resources for developing explanation frameworks for CPS.
Research Questions: What are characteristics of good explanations? How can they be extracted and evaluated? Is there a (semi-)automated way to evaluate explanations?
Methodology: Literature review
Expected Tasks:
Read and analyse papers on explanations, characteristics of user-centered explanations and explanation evaluation to understand the domain.
Summarize current approaches to evaluate user-centered explanations.
For a list of given explanations, apply the approaches identified above to these explanations.
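As a loose illustration of the last task, many structured evaluation approaches (e.g. fact-sheet-style assessments) boil down to checking an explanation against a list of criteria. The sketch below assumes criteria expressed as simple predicates over the explanation text; the two example criteria are illustrative assumptions, not a validated instrument.

```python
def checklist_score(explanation: str, criteria: dict) -> float:
    """Score an explanation against a checklist of evaluation criteria.

    `criteria` maps a criterion name to a predicate over the
    explanation text. Returns the fraction of criteria met.
    """
    if not criteria:
        return 0.0
    met = sum(1 for check in criteria.values() if check(explanation))
    return met / len(criteria)

# Illustrative criteria only; real ones would come from the surveyed
# evaluation frameworks.
example_criteria = {
    "mentions a cause": lambda e: "because" in e.lower(),
    "is concise": lambda e: len(e.split()) <= 30,
}
score = checklist_score(
    "The heating was turned off because the room was empty.",
    example_criteria,
)
print(score)  # 1.0 -- both criteria met
```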
Skills:
A basic understanding of automation systems and recommendation systems
Critical thinking
Good academic writing skills (e.g., course K5 in the SBWL KM)
References:
Sadeghi, Mersedeh, et al. "SmartEx: A Framework for Generating User-Centric Explanations in Smart Environments." arXiv preprint arXiv:2402.13024 (2024).
Confalonieri, R., & Guizzardi, G. (2023). On the Multiple Roles of Ontologies in Explainable AI. arXiv preprint arXiv:2311.04778.
Sokol, K., & Flach, P. (2020, January). Explainability fact sheets: A framework for systematic assessment of explainable approaches. In Proceedings of the 2020 conference on fairness, accountability, and transparency (pp. 56-67).
Tintarev, N., & Masthoff, J. (2007, October). Effective explanations of recommendations: user-centered design. In Proceedings of the 2007 ACM conference on Recommender systems (pp. 153-156).
18. Causal Model Extraction - LLMs as Explainability Support in Cyber Physical Systems
Supervisors: Katrin Schreiberhuber, Marta Sabou
Main idea: The thesis will explore a workflow to suggest potential causal relations between states in a cyber-physical system (CPS), such as a smart grid or a smart building. Given a system setup and potential faults, a list of root causes should be extracted using LLMs to support the development of an automated root cause analysis. The workflow should follow a multi-level approach: first, potential system states and where they might occur are identified with LLMs; then, it is analysed how these states are causally and/or physically connected to each other. To improve the accuracy of the system, an expert should be able to interact with it and give feedback on suggested results.
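The hybrid workflow described above can be pictured as a simple data structure: LLM-suggested causal edges stay candidates until an expert accepts or rejects them. The sketch below is an illustrative structure only, not the thesis design; the smart-grid edge in the usage example is invented.

```python
class CausalModelDraft:
    """Minimal sketch of a hybrid causal-model workflow: LLM-suggested
    causal edges are kept as candidates until a domain expert accepts
    or rejects them."""

    def __init__(self):
        self.candidates = {}   # (cause, effect) -> rationale from the LLM
        self.accepted = set()  # expert-confirmed causal edges

    def suggest(self, cause, effect, rationale):
        """Record an LLM-proposed causal relation awaiting review."""
        self.candidates[(cause, effect)] = rationale

    def review(self, cause, effect, accept):
        """Expert feedback step: confirm or discard a candidate edge."""
        rationale = self.candidates.pop((cause, effect))
        if accept:
            self.accepted.add((cause, effect))
        return rationale

model = CausalModelDraft()
model.suggest("transformer overload", "voltage drop",
              "LLM rationale: overload reduces output voltage")
model.review("transformer overload", "voltage drop", accept=True)
print(sorted(model.accepted))
```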
Background: Cyber-Physical Systems (CPS) integrate computing and physical components to manage complex tasks in various industries, such as power grids, buildings or factories. Due to their complex nature, their functioning can be challenging to understand for humans. Causal Knowledge is considered to be a key factor to enable causal inference. However, extracting this knowledge can be a time-consuming and expensive task, as it requires highly skilled workers to create a well-crafted causal model. Leveraging LLMs for their general knowledge to create a causal model about a system could potentially create an initial model if experts are unavailable or act as a starting point for experts to create a more sophisticated causal model.
Research Questions: What could a hybrid workflow that leverages LLM and expert knowledge to create a causal model of a CPS look like? Does the inclusion of LLMs increase the quality of the resulting causal model compared to expert-crafted models?
Methodology: Prototype Implementation
Expected Tasks:
Start by reading literature on causal models and cyber-physical systems to get familiar with the topic.
Create a concept for a hybrid causal model generation workflow, enabling interaction between LLMs and expert users.
Implement a prototype and evaluate its performance.
Skills:
Prompt engineering
Critical thinking
Basic Programming Skills (Python)
Optional: familiarity with Cyber Physical Systems and Causality
References:
Blumreiter, M., Greenyer, J., Chiyah Garcia, F.J., Klos, V., Schwammberger, M., Sommer, C., Vogelsang, A., Wortmann, A., 2019. Towards Self-Explainable Cyber-Physical Systems, in: 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). https://doi.org/10.1109/MODELS-C.2019.00084
Jha, S.S., Mayer, S., García, K., 2022. Poster: Towards Explaining the Effects of Contextual Influences on Cyber-Physical Systems, in: Proceedings of the 11th International Conference on the Internet of Things, IoT ’21. Association for Computing Machinery, New York, NY, USA, pp. 203–206. https://doi.org/10.1145/3494322.3494359
Ibrahim, A., Kacianka, S., Pretschner, A., Hartsell, C., Karsai, G., 2019. Practical Causal Models for Cyber-Physical Systems, in: Badger, J.M., Rozier, K.Y. (Eds.), NASA Formal Methods, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 211–227. https://doi.org/10.1007/978-3-030-20652-9_14
Pearl, J., 2019. The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62, 54–60. https://doi.org/10.1145/3241036
19. Visual Analytics of Public Transport
Supervisor: Amin Anjomshoaa
Description:
In urban areas, surveillance technology has become ubiquitous, with cities extensively monitored through the deployment of numerous webcams. These cameras serve a multifaceted purpose, not only facilitating surveillance but also actively contributing to the management and regulation of various urban functions. Among their diverse applications, webcams play an important role in generating real-time data streams of mobility patterns and traffic conditions within cities.
The primary objective of this research is to leverage the existing webcam infrastructure across Austria to extract relevant information regarding public transportation systems using deep learning techniques. In Austria, a webcam service is provided by ASFINAG (Autobahnen und Schnellstraßen Finanzierungs AG), the company responsible for planning, financing, building, and maintaining Austrian highways. By harnessing the live feeds from these webcams, the aim is to capture information concerning the movement and operational dynamics of public transport vehicles such as buses, tramways, and trains.
By combining live webcam footage with existing timetable data, valuable insights into the operations of public transportation can be achieved, and various indicators such as delays and service disruptions can be identified in real-time. This integration allows for the analysis of factors influencing service reliability, punctuality, and overall efficiency.
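Once a vehicle has been detected in a webcam frame, the delay indicator mentioned above reduces to comparing the observation time against the timetable slot. A minimal sketch, assuming both times are given as same-day `HH:MM:SS` strings (time formats and matching of detections to timetable entries are assumptions, not part of the topic description):

```python
from datetime import datetime

def delay_minutes(scheduled: str, observed: str) -> float:
    """Delay of an observed vehicle passage (e.g. detected on a webcam
    frame) relative to its timetable slot, in minutes. Negative values
    mean the vehicle passed early."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(observed, fmt) - datetime.strptime(scheduled, fmt)
    return delta.total_seconds() / 60

print(delay_minutes("08:15:00", "08:19:30"))  # 4.5
```

Aggregating such per-passage delays over time is what would surface the reliability and punctuality indicators the topic targets.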
Keywords:
Visual Analytics, Public Transport, Open Data, Deep Learning
References:
ASFINAG Webcams, https://www.asfinag.at/verkehr-sicherheit/webcams/
Public Transport Timetables of Vienna, https://www.wien.gv.at/english/transportation-urbanplanning/timetables/
VTRACS - Visual Traffic Counting System, https://projekte.ffg.at/anhang/61a3b5a4e8507_VTRACS_Ergebnisbericht.pdf
20. A review of Knowledge Graph Refinement techniques with an emphasis on Wikidata
Supervisors: Nicolas Ferranti, Axel Polleres
Background
Knowledge graphs (KGs) are nowadays the main structured data representation model on the web, representing interconnected knowledge of different domains. There are several methods to model a KG. For instance, they can be extracted from semi-structured web data, like DBpedia, or edited collaboratively by a community, like Wikidata. Since there is no perfect method and knowledge about the world is constantly changing, regular updates in the KGs are required.
Knowledge graph refinement is the process of improving the quality and accuracy of a knowledge graph by adding, modifying or deleting entities, relationships or attributes based on new information or corrections. This process is crucial for ensuring that a knowledge graph reflects the current state of knowledge in a particular domain and that it can be used effectively for applications such as search, recommendation, and decision-making.
One of the main challenges in knowledge graph refinement is dealing with the large volume of data that is available, as well as the diverse sources and formats of that data. To address these challenges, researchers have developed a range of techniques for knowledge graph refinement, including entity resolution, relation extraction, entity linking, data fusion, and ontology alignment.
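For illustration, one of the simplest refinement heuristics in this family is a type-consistency check: flag triples whose object's type conflicts with the property's expected range. The sketch below uses invented property and type names rather than Wikidata identifiers, and is only meant to show the shape of such a check.

```python
def range_violations(triples, ranges, types):
    """Flag (subject, property, object) triples whose object's type
    conflicts with the property's expected range, e.g. a 'birthPlace'
    value that is not typed as a place."""
    violations = []
    for s, p, o in triples:
        expected = ranges.get(p)
        # A triple violates the range if the property declares an
        # expected type and the object does not carry it.
        if expected and expected not in types.get(o, set()):
            violations.append((s, p, o))
    return violations

triples = [("AlanTuring", "birthPlace", "London"),
           ("AlanTuring", "birthPlace", "Mathematics")]
ranges = {"birthPlace": "Place"}
types = {"London": {"Place", "City"}, "Mathematics": {"Discipline"}}
print(range_violations(triples, ranges, types))
```

Real refinement systems, including those surveyed for Wikidata, replace the hand-written tables with ontology axioms or property constraints, but the evaluation question stays the same: how many flagged triples are genuine errors?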
Overall, knowledge graph refinement is an important and ongoing process that is essential for ensuring that knowledge graphs remain up-to-date, accurate, and useful for a range of applications. As new information becomes available and our understanding of the world evolves, it will be necessary to continue refining and improving knowledge graphs to ensure that they reflect the current state of knowledge in a particular domain.
The goal of the thesis
The goal of this thesis is to provide a comprehensive review of knowledge graph refinement techniques with a specific focus on their application in Wikidata. The thesis will build upon the existing literature in the field, including the paper "Knowledge graph refinement: A survey of approaches and evaluation methods" published in 2017, and will seek to extend this work by providing a detailed analysis of the techniques that have been used to refine knowledge graphs and their possible application in Wikidata. This will include a critical evaluation of the strengths and weaknesses of different approaches, as well as an assessment of the effectiveness of various evaluation methods. Ultimately, the thesis aims to provide insights into how knowledge graph refinement techniques can be applied in the context of Wikidata, and how these techniques can be used to improve the accuracy, completeness, and usefulness of the knowledge graph. The thesis can focus either on a reproducibility study, (re-)implementing one or several refinement approaches for a given refinement task on an existing knowledge graph such as Wikidata, or on a comprehensive literature survey.
Requirements
Pro-activity and self-organization. Programming skills.
Initial references
Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489-508.
Shenoy, K., Ilievski, F., Garijo, D., Schwabe, D., & Szekely, P. (2021). A Study of the Quality of Wikidata. arXiv preprint arXiv:2107.00156.
Hogan, A., et al. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
21. Readiness Assessment of Health Data Space in Austria
Supervisor: Amin Anjomshoaa
Description:
The European Health Data Space (EHDS) [1] represents the initial proposal of the European Data Strategy [2] to establish domain-specific European data spaces as the foundation for a European Health Union. This initiative is designed to tackle health-specific challenges related to electronic health data access and sharing. It aims to enable individuals to control their electronic health data while providing researchers, innovators, and policymakers with the means to utilize this data in a trusted and secure manner that preserves privacy.
The Austrian healthcare sector has significant potential for enhancing healthcare through digitalization and optimizing data utilization via health data spaces. Austrian health authorities have already taken the initial steps toward achieving this overarching objective [3, 4]. Nevertheless, the conceptual ambiguity and synonymous usage of the term in both research and industry pose significant challenges to achieving a precise conceptualization and meaningful utilization of data spaces [5, 6].
The primary goal of this research is to investigate the current status of data space implementations in the healthcare sector across Europe. The study aims to deliver a comprehensive analysis, offering insights into the adoption and utilization of data spaces in the context of the EHDS proposal. Additionally, the research seeks to establish a benchmark for assessing the readiness of health data spaces in Austria, considering various perspectives such as technical aspects and policy frameworks.
Keywords:
Data Space, Health Industry, European Data Strategy
References:
European Health Data Space, https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space_en
European Commission. European Data Strategy (2020). https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en
Health Data Space in Österreich, https://www.sozialversicherung.at/cdscontent/load?contentid=10008.748742&version=1623841403
Wiener eHealth Strategie, https://www.wien.gv.at/spezial/ehealth-strategie/ziele-und-handlungsfelder-der-ehealth-strategie-der-stadtwien/europaischer-raum-fur-gesundheitsdaten-ehds/
Hutterer, A., Krumay, B., & Mühlburger, M. (2023). What Constitutes a Dataspace? Conceptual Clarity beyond Technical Aspects.
Hussein, R., Scherdel, L., Nicolet, F., & Martin-Sanchez, F. (2023). Towards the European Health Data Space (EHDS) ecosystem: A survey research on future health data scenarios. International Journal of Medical Informatics, 170, 104949.
22. Visualizing the Evolution of Wikidata Entities
Supervisor: Amin Anjomshoaa
Description:
Wikidata is a free and collaborative knowledge base maintained by volunteers and operated by the Wikimedia Foundation. In practice, it is used as a central repository for structured data and serves as a valuable resource for gathering, organizing, and sharing structured data in a collaborative manner, helping to support a wide range of knowledge-based projects and applications. The evolution of Wikidata entities over time showcases the dynamic nature of knowledge representation within the platform. As contributors continually update and refine information, entities undergo significant changes, reflecting the evolving understanding and expanding scope of various topics. Consequently, tracking the evolution of Wikidata entities provides valuable insights into the development of knowledge domains, the shifting focus of community efforts, and the evolving interconnections between different topics.
The goal of this research is to develop state-of-the-art visualization techniques to track the evolution of entities over time, based on the Wikidata endpoints hosted at our institute, which provide entity data at various points in the past. Furthermore, the rationale behind the applied changes should be investigated to determine whether they align with particular events associated with the target entities. To this end, correlations between the timing of changes and significant occurrences relevant to the entities should be highlighted during the visualization process.
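The basic unit such an evolution visualization would render is the difference between two snapshots of an entity's statements. A minimal sketch, treating each snapshot as a set of (property, value) pairs; the identifiers loosely follow Wikidata conventions but are used here only as placeholders.

```python
def statement_diff(old, new):
    """Compare two snapshots of an entity's statements, given as sets
    of (property, value) pairs, and report what was added and removed
    between the two points in time."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Placeholder snapshots of one entity at two points in time.
snap_2020 = {("P31", "Q5"), ("P106", "Q82594")}
snap_2024 = {("P31", "Q5"), ("P106", "Q82594"), ("P166", "Q35637")}
print(statement_diff(snap_2020, snap_2024))
```

Plotting the sizes of these diffs over consecutive snapshots already yields a simple timeline of editing activity that can then be correlated with external events.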
Keywords:
Wikidata, Visualization
References:
Vrandečić, D., & Krötzsch, M. (2014). Wikidata: a free collaborative knowledge base. Communications of the ACM, 57(10), 78-85.
DPKM Wikidata Endpoints, https://wikidata.ai.wu.ac.at/
Wikidata data model, https://iu.pressbooks.pub/wikidatascholcomm/chapter/datamodel/