Current Thesis Topics
SS 2022
Bachelor thesis topics will be assigned primarily to students of our SBWLs and, within this group, to those who have passed most SBWL courses, i.e., who bring the necessary prior subject knowledge for successfully writing a bachelor thesis (ideally, they have already passed the SBWL Research Seminar).
1. Event Correlation Based On Constraint Satisfaction Problem (Master Thesis)
Supervisor: Dina Bayomie
Recent years have seen an increasing availability of process execution data from several data sources. Process mining offers different analysis techniques to extract business insights from these data, known as event logs. An event log is a set of executed process instances, i.e., cases. Each event in the event log represents the execution of a process activity. The event's primary attributes carry the control-flow information: the execution timestamp, the executed activity, and the case identifier. Additionally, an event has different data attributes that describe the business objects used for the process execution and its environment, e.g., resources, project name, location, or cost.
In some circumstances, event logs do not include a case identifier. Such logs are called uncorrelated event logs. This problem occurs mainly when event logs are extracted from non-process-aware information systems, which do not keep track of case identifiers. In this case, the event log has to be pre-processed by grouping events into cases -- an operation known as event correlation.
A few approaches have investigated the issue of reasoning about uncorrelated events. However, these approaches either assume that the process does not contain cyclic behavior or that a complete process model is available. Moreover, they suffer from poor efficiency.
The thesis objective is to build an event correlation engine that provides flexibility concerning the a priori knowledge about the process. We cannot assume full knowledge about the process or its runtime characteristics. Yet the more information and constraints the user provides, the fewer candidate combinations remain and the closer to reality the result should be. The minimum we know about the process is its designated start and end activities. All other information is optional. Examples of extra information are causal dependencies between pairs of activities, mutual exclusiveness, task durations, or resource dependencies.
Using this information, we can encode event correlation as a constraint satisfaction problem (CSP). Extra constraints are added based on the information given, and the solver finds one or more solutions. A solution is a possible correlation of events to cases that satisfies all the given constraints.
Programming skills are required to integrate existing tools and create a software prototype.
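To make the CSP encoding concrete, below is a minimal sketch using the python-constraint package; the toy log, the activity names, and the assumed number of cases are illustrative assumptions, not part of the thesis specification.

```python
# Minimal sketch: assign each event of an uncorrelated toy log to a candidate
# case such that every case starts and ends with the designated activities.
from constraint import Problem  # pip install python-constraint (assumed)

# toy uncorrelated log: (event id, activity, timestamp)
events = [
    (0, "register", 1), (1, "register", 2),
    (2, "check",    3), (3, "check",    4),
    (4, "archive",  5), (5, "archive",  6),
]
START, END, N_CASES = "register", "archive", 2  # minimal a priori knowledge

problem = Problem()
for eid, _, _ in events:
    problem.addVariable(eid, list(range(N_CASES)))  # candidate case per event

def valid_grouping(*case_of_event):
    # within every candidate case, the earliest event must be START
    # and the latest event must be END
    for case in range(N_CASES):
        grouped = sorted(
            (events[i] for i, c in enumerate(case_of_event) if c == case),
            key=lambda e: e[2],  # order the events of the case by timestamp
        )
        if not grouped or grouped[0][1] != START or grouped[-1][1] != END:
            return False
    return True

problem.addConstraint(valid_grouping, [eid for eid, _, _ in events])

# each solution is one possible event-to-case correlation satisfying all constraints
for solution in problem.getSolutions():
    print(solution)
```

Additional optional knowledge (causal dependencies, task durations, resource constraints) would be expressed as further addConstraint calls, pruning the solution space.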
References:
[1] Diogo R. Ferreira, Daniel Gillblad: Discovering Process Models from Unlabelled Event Logs. BPM 2009: 143-158
[2] Shaya Pourmirza, Remco M. Dijkman, Paul Grefen: Correlation Miner: Mining Business Process Models and Event Correlations Without Case Identifiers. Int. J. Cooperative Inf. Syst. 26(2): 1-32 (2017)
[3] Dina Bayomie, Ahmed Awad, Ehab Ezat: Correlating Unlabeled Events from Cyclic Business Processes Execution. CAiSE 2016: 274-289.
[4] Bayomie, D., Di Ciccio, C., La Rosa, M., Mendling, J. (2019). A Probabilistic Approach to Event-Case Correlation for Process Mining. In: Laender, A., Pernici, B., Lim, E.P., de Oliveira, J. (eds) Conceptual Modeling. ER 2019. Lecture Notes in Computer Science, vol. 11788. Springer, Cham.
[5] Rossi, Francesca, Peter Van Beek, and Toby Walsh, eds. Handbook of constraint programming. Elsevier, 2006
2. Explore Event Attributes' Association Relation (Bachelor or Master Thesis)
Supervisor: Dina Bayomie
Recent years have seen an increasing availability of process execution data from several data sources. Process mining offers different analysis techniques to extract business insights from these data, known as event logs. An event log is a set of executed process instances, i.e., cases. Each event in the event log represents the execution of a process activity. The primary attributes associated with an event carry the control-flow information: the execution timestamp, the executed activity, and the case identifier. Additionally, an event has different data attributes that describe the business objects used for the process execution and the process environment, e.g., resources, project name, location, or cost.
Analysts use process mining techniques to discover the control-flow model, identify deviations using conformance checking techniques, and provide recommendations and suggestions for process improvement.
Unfortunately, these techniques focus on the events' process attributes and ignore the data-context attributes. Analyzing the data-context attributes improves the understanding of the process execution environment and the quality of the performed analysis.
The data-context attributes provide information about the business objects affected by executing the process instances. These attributes may be constant within a case or change over time within a case. Detecting the changing patterns of the data-context attributes helps to understand what triggers these changes and how they affect process variations. To detect the changing patterns, we need to identify association and correlation rules between the data-context attributes.
The thesis objective is to build a tool that detects association and correlation rules between the event data-context attributes of a log and provides a suitable visualization for them.
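As a starting point, the rule mining step could look like the minimal sketch below, which assumes pandas and the mlxtend package; the toy attribute values are made up.

```python
# Minimal sketch: mine association rules between event data-context attributes.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# toy event log: each row is one event with two data-context attributes
log = pd.DataFrame({
    "resource": ["clerk_A", "clerk_A", "clerk_B", "clerk_A"],
    "location": ["Vienna",  "Vienna",  "Graz",    "Vienna"],
})

# one-hot encode attribute=value pairs as boolean "items"
items = pd.get_dummies(log).astype(bool)

frequent = apriori(items, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

The thesis would additionally need to handle attributes that change over time within a case and to visualize the resulting rules.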
References:
[1] Pentland, B., et al. "Bringing context inside process research with digital trace data." Journal of the Association for Information Systems 21 (2020).
[2] Pegoraro, Marco, Merih Seran Uysal, and Wil MP van der Aalst. "Discovering Process Models from Uncertain Event Data." International Conference on Business Process Management. Springer, Cham, 2019.
[3] Motahari-Nezhad, Hamid Reza, et al. "Event correlation for process discovery from web service interaction logs." The VLDB Journal 20.3 (2011): 417-444.
[4] Hornik, Kurt, Bettina Grün, and Michael Hahsler. "arules-A computational environment for mining association rules and frequent item sets." Journal of Statistical Software 14.15 (2005): 1-25.
3. Performance measures based on the process data (Bachelor Thesis)
Supervisor: Dina Bayomie
Background:
Business Process Management, Process Mining
Research problem:
Business process management (BPM) is an approach for controlling and operating the execution of business processes [1]. BPM covers the monitoring and evaluation of the business process. Monitoring process performance is challenging without extracting performance measures from the event logs. Four dimensions are used for evaluating process performance [2][3]. The time dimension focuses on the execution time behavior of activities and process instances. The cost dimension evaluates the different cost types for each process instance, e.g., fixed and direct costs. The quality dimension focuses on the quality of the process execution, e.g., the fitness between the process instances and the expected behavior. The last dimension is flexibility, which evaluates the execution behavior over time and the extent to which the process execution can handle different workload levels.
This thesis aims to conduct a literature review of performance measures along these four dimensions and, moreover, to build a framework that expresses these measures as formal definitions over concepts accessible in the event log.
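As an illustration of how a time-dimension measure can be expressed over event-log concepts, here is a minimal pandas sketch; the column names (case_id, activity, timestamp) and the toy data are assumptions.

```python
# Minimal sketch: cycle time per case computed directly from event-log attributes.
import pandas as pd

log = pd.DataFrame({
    "case_id":   [1, 1, 2, 2],
    "activity":  ["register", "archive", "register", "archive"],
    "timestamp": pd.to_datetime([
        "2022-01-03 09:00", "2022-01-03 17:00",
        "2022-01-04 09:00", "2022-01-05 12:00",
    ]),
})

# time dimension: cycle time of a case = last event timestamp - first event timestamp
cycle_time = log.groupby("case_id")["timestamp"].agg(lambda ts: ts.max() - ts.min())
print(cycle_time)
print("mean cycle time:", cycle_time.mean())  # an aggregate performance measure
```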
Initial references:
[1] M. Dumas, M. La Rosa, J. Mendling, and H. A. Reijers, Fundamentals of Business Process Management. Springer, 2013.
[2] M. Jansen-Vullers, M. Loosschilder, P. Kleingeld, and H. Reijers, “Performance measures to evaluate the impact of best practices,” in BPMDS workshop, vol. 1. Tapir Academic Press Trondheim, 2007, pp. 359–368.
[3] A. Van Looy and A. Shafagatova, “Business process performance measurement: a structured literature review of indicators, measures and metrics,” SpringerPlus, vol. 5, no. 1, p. 1797, 2016.
4. Review of Process Models before the end of World War I (Bachelor or Master Thesis)
Supervisor: Djordje Djurica
Process modeling has its roots in the scientific management of the late 19th and early 20th century. So far, research publications from that time have never been analyzed systematically for process models and related visualizations. The goal of this thesis is to conduct a systematic review of the literature, identify different categories of models and visualizations, and compare them with contemporary concepts.
References:
Mendling, J.: Business Process Modeling in the 1920s and 1930s as reflected in Fritz Nordsieck’s PhD Thesis. Enterprise Modelling and Information Systems Architectures, 2021.
5. Systematic Literature Review of Process Mining User Evaluation Studies (Master Thesis)
Supervisor: Djordje Djurica
The evaluation of findings is a crucial part of a successful process mining project. So far, process mining papers fall short in performing such evaluations. We aim to highlight the current state of affairs when it comes to user evaluations, showing common practices, tasks, and evaluation study designs.
References:
Koorn, J. J., Beerepoot, I., Dani, V. S., Lu, X., van de Weerd, I., Leopold, H., & Reijers, H. A. (2021). Bringing Rigor to the Qualitative Evaluation of Process Mining Findings: An Analysis and a Proposal. In 2021 3rd International Conference on Process Mining (ICPM) (pp. 120-127). IEEE.
Mendling, J., Djurica, D., & Malinova, M. (2021, September). Cognitive Effectiveness of Representations for Process Mining. In International Conference on Business Process Management (pp. 17-22). Springer, Cham.
6. Analyzing semantic inconsistency patterns of community-based KG (Bachelor Thesis)
Supervisor: Nicolas Ferranti
Background
Knowledge graphs (KGs) are nowadays the main structured data representation model on the web, representing interconnected knowledge from different domains. There are several methods to model a KG. For instance, KGs can be extracted from semi-structured web data, like DBpedia, or edited collaboratively by a community, like Wikidata. Since there is no perfect method and knowledge about the world is constantly changing, regular updates to the KG are required. Community-based KGs rely on the input of their user community, which can lead to the insertion of statements that violate the KG's constraint rules.
Goal of the thesis
The goal of this thesis is to perform an empirical study of the mistakes made in community-based KGs.
Requirements
Ideally, the student has some previous knowledge of and interest in databases and knowledge graphs. Further desirable qualities are pro-activity and self-organization.
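For the empirical study, violating statements can be retrieved directly from the public Wikidata SPARQL endpoint. The sketch below (assuming the SPARQLWrapper package; the chosen classes merely illustrate one kind of violation) looks for items that are modelled both as humans and as fictional characters.

```python
# Minimal sketch: query Wikidata for items violating a disjointness-style constraint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="thesis-example/0.1 (student project)")
endpoint.setQuery("""
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 ,      # instance of: human
                wd:Q95074 .  # instance of: fictional character (IDs to verify)
} LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["item"]["value"])
```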
Initial references
● Shenoy, K., Ilievski, F., Garijo, D., Schwabe, D., & Szekely, P. (2021). A Study of the Quality of Wikidata. arXiv preprint arXiv:2107.00156.
● Amaral, G., Piscopo, A., Kaffee, L. A., Rodrigues, O., & Simperl, E. (2021). Assessing the quality of sources in Wikidata across languages: a hybrid approach. arXiv preprint arXiv:2109.09405.
● Vrandečić, D. (2012, April). Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st international conference on world wide web (pp. 1063-1064).
7. Analyzing class clusterization as an alternative to refinement of community-based Knowledge Graph (Bachelor Thesis)
Supervisors: Nicolas Ferranti and Stefan Bachhofner
Background
Knowledge graphs (KGs) are nowadays the main structured data representation model on the web, representing interconnected knowledge from different domains. There are several methods to model a KG. For instance, KGs can be extracted from semi-structured web data, like DBpedia, or edited collaboratively by a community, like Wikidata. Since there is no perfect method and knowledge about the world is constantly changing, regular updates to the KGs are required. Classes defined in the Wikidata KG can have millions of instances; however, for abstractions to be helpful, the number of subclasses of a given class should be relatively limited.
Goal of the thesis
The aim of this thesis is to investigate the viability of cluster-based techniques for creating groups of subclasses that can be used to reduce the number of direct subclasses in the Wikidata knowledge graph, resulting in a more concise and organized class hierarchy.
Requirements
Ideally, the student has some previous knowledge of and interest in databases and knowledge graphs. Further desirable qualities are pro-activity and self-organization.
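A possible starting point is sketched below: direct subclasses of one Wikidata class are fetched via SPARQL and clustered on their label text as a simple baseline; the class ID, the cluster count, and the use of TF-IDF instead of proper class embeddings (e.g., OWL2Vec*) are assumptions for illustration.

```python
# Minimal sketch: cluster direct subclasses of a Wikidata class by their labels.
import requests
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

query = """
SELECT ?label WHERE {
  ?sub wdt:P279 wd:Q41176 .                       # direct subclass of "building"
  ?sub rdfs:label ?label . FILTER(LANG(?label) = "en")
} LIMIT 200
"""
resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": query, "format": "json"},
                    headers={"User-Agent": "thesis-example/0.1"})
labels = [b["label"]["value"] for b in resp.json()["results"]["bindings"]]

vectors = TfidfVectorizer().fit_transform(labels)        # baseline text features
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(vectors)
for label, cluster in sorted(zip(labels, clusters), key=lambda x: x[1])[:20]:
    print(cluster, label)
```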
Initial references
● Ritchie, A., Chen, J., Castro, L. J., Rebholz-Schuhmann, D., & Jimenez-Ruiz, E. (2021, April). Ontology Clustering with OWL2Vec. In CEUR Workshop Proceedings. CEUR Workshop Proceedings.
● Shenoy, K., Ilievski, F., Garijo, D., Schwabe, D., & Szekely, P. (2021). A Study of the Quality of Wikidata. arXiv preprint arXiv:2107.00156.
● Vrandečić, D. (2012, April). Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st international conference on world wide web (pp. 1063-1064).
8. Literature Review: Methods of Inquiry into Organizational Aesthetics (Bachelor Thesis)
Supervisor: Clemens Kerschbaum
Different people like different pieces of art. They are attracted by the particular aesthetic that, for example, a painting provides. The same holds true for many other things like buildings, cars, and other products, but also for intangible things like music or social environments. People prefer what they find aesthetically appealing, thus there likely is an aesthetic to almost everything in the world. Accordingly, this type of non-rational knowledge also exists in the field of business and management. A recently published review in the Academy of Management Annals takes up the topic of aesthetics in organizations and presents an overview of the current body of literature on the topic (Baldessarelli et al., 2021). In that article you can also find references to other reviews on the topic of aesthetics in organizations. Your task in this thesis will be to analyze those papers in the AOM article that are classified as “aesthetics as a directed stimulus”. To this end, you will conduct a Rapid Structured Literature Review according to Armitage and Keeble-Allen (2008), with a particular focus on the research methods used to set up inquiry into aesthetics understood as a directed stimulus for actions. This means that you will have to pay close attention to the method sections of the articles that you review in order to collect and summarize the methods that have been used to explore aesthetics as a directed stimulus.
Armitage, A., Keeble-Allen, D., 2008. Undertaking a Structured Literature Review or Structuring a Literature Review: Tales from the Field 6, 12.
Baldessarelli, G., Stigliani, I., Elsbach, K., 2021. The Aesthetic Dimension of Organizing: A Review and Research Agenda. Acad. Manag. Ann.
9. Collaborative Knowledge Graph Curation: Use Cases and Tooling (Bachelor Thesis)
Supervisor: Elmar Kiesling
Background:
Knowledge graphs describe real-world entities and their relationships in a graph structure. The concept has attracted considerable attention since Google introduced its Knowledge Graph project in 2012 and since then, major web technology companies (Facebook, Bing, LinkedIn, Amazon etc.) have constructed and integrated knowledge graphs into their business.
Due to their use as an underlying infrastructure technology, however, tooling to directly interact with knowledge graphs and edit them collaboratively has not been a key focus in industry. Community-centric open projects such as Wikidata have demonstrated that encyclopedic knowledge graphs and their schema can be curated collaboratively at scale. In industry, however, such collaborative knowledge graph construction approaches have seen more limited adoption, which may partly be attributed to a lack of adequate tooling.
Research problem:
In this Bachelor thesis, you will (i) conduct a systematic survey of reported use cases and instances of collaboratively constructed knowledge graphs, (ii) develop requirements and a framework to compare tooling for supporting collaborative KG construction, and (iii) review, compare, and contrast existing tools.
Prerequisites:
Interest in collaborative knowledge engineering
Interest in comparing and evaluating tools
Familiarity with knowledge graph concepts is beneficial, but not necessary.
Initial references:
Hogan, A., Blomqvist, E., Cochez, M., D’Amato, C., Melo, G.D., Gutiérrez, C., Gayo, J.E., Kirrane, S., Neumaier, S., Polleres, A., Navigli, R., Ngomo, A.N., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., & Zimmermann, A. (2020). Knowledge Graphs. ArXiv, abs/2003.02320. https://arxiv.org/abs/2003.02320
Gomez-Perez, Jose Manuel, et al. "Enterprise knowledge graph: An introduction." Exploiting linked data and knowledge graphs in large organizations. Springer, Cham, 2017. 1-14. https://pdfs.semanticscholar.org/70d2/d131861a49e5875fcaaeaf9478ac61c05734.pdf
Diefenbach, D., Wilde, M. D., & Alipio, S. (2021, October). Wikibase as an infrastructure for knowledge graphs: The EU knowledge graph. In International Semantic Web Conference (pp. 631-647). Springer, Cham. https://hal.archives-ouvertes.fr/hal-03353225/document
Tudorache, Tania, et al. "WebProtégé: A collaborative ontology editor and knowledge acquisition tool for the web." Semantic Web 4.1 (2013): 89-99. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3691821/
Stellato, Armando, et al. "VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons." Semantic Web 11.5 (2020): 855-881. http://art.uniroma2.it/publications/docs/2020_SWJ_VocBench3.pdf
Example tools: Wikibase, Semantic MediaWiki, VocBench, WebProtégé, gra.fo, PoolParty
10. Employing the Organizational Phronesis Scale (Bachelor Thesis)
Supervisor: Florian Kragulj
The concept of phronesis, dating back to Aristotle, has recently been “rediscovered” and has entered the stage of knowledge management (Nonaka & Takeuchi, 2019, 2021). In essence, it is about doing the right thing in a particular context to promote the common good. However, the concept remains theoretically elusive and empirically difficult to test. In a recent attempt to address this shortcoming, Rocha et al. (2021a, 2021b, in prep.) propose the Organizational Phronesis Scale (OPS). In this bachelor thesis, you will theoretically relate organizational phronesis to one or more other concepts (e.g., organizational purpose) and be among the first to use the OPS in combination with another scale (or other scales) you identify in the literature. You will perform basic statistical analysis (correlation) of empirical data, which you will need to obtain, and discuss your findings.
Nonaka, I., & Takeuchi, H. (2021). Humanizing strategy. Long Range Planning, 102070.
Nonaka, I., & Takeuchi, H. (2019). The wise company: How companies create continuous innovation. Oxford University Press.
Rocha G. R., Pinheiro, P., D‘Angelo, M., & Kragulj, F. (2021) Organizational Phronesis Scale Development. 22nd European Conference on Knowledge Management - ECKM 2021
Rocha G. R., Pinheiro, P., Kragulj, F., & Nunes C. (2021) There remains much to learn about organizational phronesis. Theory and Applications in the Knowledge Economy - TAKE 2021
Rocha G. R., Pinheiro, P., Kragulj, F., & Nunes C. (in prep.) One Step Towards Recognizing the Practically Wise Company: Measurement and Validity
11. Digital Transformation of Small and Medium-Sized Enterprises: A Structured Literature Review on the State of the Field (Bachelor Thesis)
Supervisor: Florian Kragulj
“Digitalization offers unprecedented opportunities for entrepreneurial small and medium-sized enterprises (SMEs)” (Cenamor et al., 2019, p. 196). However, due to their nature, SMEs require special consideration in research on digitization and digital transformation (D&DT). SMEs have particular characteristics that distinguish them significantly from large and multinational companies in terms of the challenges and approaches to D&DT (González-Varona et al., 2021). In this bachelor thesis, you will conduct a structured literature review on the state of research on D&DT of small and medium-sized enterprises, paying particular attention to empirical insights (i.e., projects/approaches to D&DT) as well as conceptual work on the digital readiness of SMEs (North et al., 2020). You will provide an outlook on the knowledge required to successfully exploit the opportunities of D&DT.
Cenamor, J., Parida, V. and Wincent, J. (2019) ‘How entrepreneurial SMEs compete through digital platforms: The roles of digital platform capability, network capability and ambidexterity’, Journal of Business Research. Elsevier, 100(December 2018), pp. 196–206.
González-Varona, J. M. et al. (2021) ‘Building and development of an organizational competence for digital transformation in SMEs’, Journal of Industrial Engineering and Management, 14(1), pp. 15–24.
North, K., Aramburu, N. and Lorenzo, O. J. (2020) ‘Promoting digitally enabled growth in SMEs: a framework proposal’, Journal of Enterprise Information Management, 33(1), pp. 238–262.
12. Visualizing Open Data in Virtual and Augmented Reality (Bachelor or Master Thesis)
Supervisor: Johann Mitlöhner
Background:
Developing new methods for exploring and analyzing data in virtual and augmented reality presents many opportunities and challenges, both in terms of software development and design inspiration. There are various hardware options, from Google Cardboard to Oculus Rift. Taking part in this challenge demands programming skills as well as creativity. A basic VR or AR application for exploring a specific type of open data will be developed by the student. The use of a platform-independent kit such as A-Frame is essential, as the application will be compared in a small user study to its non-VR version in order to identify advantages and disadvantages of the implemented visualization method. Details will be discussed with the supervisor.
Research problem:
How can AR and VR be used to improve exploration of data?
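To give an impression of the A-Frame route, the sketch below generates a minimal WebVR scene (a 3D bar chart) from a small open-data table using plain Python; the data values, the output file name, and the A-Frame release URL are illustrative assumptions.

```python
# Minimal sketch: write an A-Frame HTML scene that renders open data as 3D bars.
rows = [("Vienna", 1.9), ("Graz", 0.3), ("Linz", 0.2)]  # city, population (millions)

boxes = "\n".join(
    f'  <a-box position="{i * 1.5} {value / 2} -4" height="{value}"'
    f' width="1" depth="1" color="#4CC3D9"></a-box>'
    for i, (_, value) in enumerate(rows)
)

html = f"""<html><head>
  <script src="https://aframe.io/releases/1.4.0/aframe.min.js"></script>
</head><body>
<a-scene>
{boxes}
  <a-sky color="#ECECEC"></a-sky>
</a-scene>
</body></html>"""

with open("open_data_vr.html", "w") as f:
    f.write(html)  # open the file in a WebXR-capable browser to explore the scene
```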
Some References:
Butcher, Peter WS, and Panagiotis D. Ritsos. "Building Immersive Data Visualizations for the Web." Proceedings of International Conference on Cyberworlds (CW’17), Chester, UK. 2017.
Teo, Theophilus, et al. "Data fragment: Virtual reality for viewing and querying large image sets." Virtual Reality (VR), 2017 IEEE. IEEE, 2017.
Millais, Patrick, Simon L. Jones, and Ryan Kelly. "Exploring Data in Virtual Reality: Comparisons with 2D Data Visualizations." Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
More info at mitloehner.com/lehre/thesis-en.html
13. Text Mining using Tensorflow & GPU (Bachelor Thesis)
Supervisor: Johann Mitlöhner
Explore the capabilities of state-of-the-art GPUs, which provide large RAM and processing power for neural networks and deep learning in Python and Keras; new hardware is now available at our institute and can be accessed conveniently via a remote interface. Details to be discussed with the supervisor. Some programming experience in Python is required.
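A minimal starting point could look like the sketch below, which trains a tiny Keras text classifier and reports whether a GPU is visible; the toy review texts are made up and the architecture is only a placeholder.

```python
# Minimal sketch: tiny Keras text classifier (runs on GPU if TensorFlow sees one).
import tensorflow as tf
from tensorflow.keras import layers

texts = ["great product, works well", "terrible, broke after a day",
         "really happy with it", "waste of money"]
labels = [1.0, 0.0, 1.0, 0.0]  # 1 = positive, 0 = negative

vectorize = layers.TextVectorization(max_tokens=1000, output_sequence_length=10)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    layers.Embedding(input_dim=1000, output_dim=16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

print("GPUs visible:", tf.config.list_physical_devices("GPU"))
model.fit(tf.constant(texts), tf.constant(labels), epochs=5, verbose=0)
```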
Minqing Hu, Bing Liu, "Mining and summarizing customer reviews", KDD '04, pp. 168-177
machinelearningmastery.com/prepare-text-data-deep-learning-keras/
keras.io
14. Data-Evolution and Data quality analysis of Wikidata (wikidata.org) (Master Thesis)
Supervisor: Axel Polleres
Wikidata [1], a project of the Wikimedia Foundation, is a publicly available and collaboratively edited knowledge graph that can be queried using SPARQL and accessed via other APIs. It contains a wide range of factual knowledge that is used, for instance, to enrich Wikipedia. In this project, you should think about how to use techniques such as network analysis or temporal queries to analyse the structure and evolution of this knowledge graph. To this end, we provide several historical snapshots of Wikidata as compressed HDT [2] dumps [3].
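A starting point for working with the snapshots could be the following sketch, which assumes the pyHDT package (`hdt`) and uses placeholder snapshot file names; it merely compares how often one property occurs in two snapshots.

```python
# Minimal sketch: compare the frequency of a property across two HDT snapshots.
from hdt import HDTDocument  # pyHDT, assumed to be installed

P31 = "http://www.wikidata.org/prop/direct/P31"  # "instance of"

for path in ["wikidata-2019.hdt", "wikidata-2021.hdt"]:  # placeholder file names
    doc = HDTDocument(path)
    _, cardinality = doc.search_triples("", P31, "")  # empty string = wildcard
    print(path, "P31 statements:", cardinality)
```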
1. Denny Vrandecic, Markus Krötzsch: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10): 78-85 (2014)
2. Javier D. Fernández, Miguel A. Martinez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. Binary RDF Representation for Publication and Exchange (HDT). Journal of Web Semantics (JWS), 19(2), 2013.
15. Linking Wikidata to OpenData (Bachelor or Master Thesis)
Supervisor: Axel Polleres
Over the past years, both Open Government Data published by public administration institutions and public knowledge graphs, such as Wikidata [1] or DBpedia [2], have evolved in parallel as sources for freely accessing information of public interest. However, interlinking knowledge graphs such as Wikidata with other open data sources, such as Open Data from Open Government Data portals, remains a challenge. We have published two papers outlining challenges in this context, which could be prototypically addressed using different Data Science and AI methods [1,2]. The goal of this thesis is to further investigate how these challenges can be tackled and to prototypically interlink information from specific portals (e.g., data.gv.at or data.europa.eu) to knowledge graphs by modelling and interlinking organisations, geographic entities, etc.
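One simple building block for such interlinking is entity search against Wikidata's public API, as in the sketch below; the organisation names are illustrative and a real pipeline would add disambiguation on top.

```python
# Minimal sketch: look up organisation names from an open data portal in Wikidata.
import requests

def search_wikidata(name):
    params = {
        "action": "wbsearchentities", "search": name,
        "language": "en", "format": "json", "limit": 3,
    }
    resp = requests.get("https://www.wikidata.org/w/api.php", params=params)
    return [(hit["id"], hit.get("description", "")) for hit in resp.json()["search"]]

for org in ["Statistik Austria", "European Environment Agency"]:
    print(org, "->", search_wikidata(org))
```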
1. Jan Portisch, Omaima Fallatah, Sebastian Neumaier, and Axel Polleres. Challenges of linking organizational information in open government data to knowledge graphs. In 22nd International Conference on Knowledge Engineering and Knowledge Management (EKAW 2020), volume 12387 of Lecture Notes in Computer Science (LNCS), pages 271--286, Bozen-Bolzano, Italy, September 2020. Springer. dx.doi.org/10.1007/978-3-030-61244-3_19
2. Bernhard Krabina, Axel Polleres: Seeding Wikidata with Municipal Finance Data. 2021 http://ceur-ws.org/Vol-2982/paper-9.pdf
16. Aggregating a Knowledge Graph of Semantic Wikis on the Web: Integrate and Crawl Data (Bachelor Thesis)
Supervisor: Axel Polleres
Semantic MediaWiki (SMW) [1,2] is a kind of predecessor of Wikidata/Wikibase [3,4] as a Wiki platform for aggregating and collecting structured information that can be accessed as a knowledge graph using Semantic Web technologies such as RDF and SPARQL. Since SMW has been around for a while, there are many small and medium-sized instances of it available on the Web.
The idea of this thesis is
- to investigate where and how SMW and Wikibase are used in practice and whether or how they make Semantic Data available, and which schemas they are using (analyse the classes, properties and amount of instance data)
- to automatically collect, aggregate and analyse RDF data from these installations (see the sketch below)
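The collection step could start from SMW's RDF export feature, roughly as in the sketch below; the wiki URL and page name are examples, and the export format is assumed to be RDF/XML.

```python
# Minimal sketch: fetch RDF exported by a Semantic MediaWiki instance and
# inspect which properties (schema elements) it uses.
import rdflib

url = ("https://www.semantic-mediawiki.org/wiki/"
       "Special:ExportRDF/Semantic_MediaWiki")  # example SMW page export
g = rdflib.Graph()
g.parse(url, format="xml")  # SMW exports RDF/XML

properties = sorted({str(p) for _, p, _ in g})
print(len(g), "triples,", len(properties), "distinct properties")
```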
1. Markus Krötzsch, Denny Vrandecic: Semantic MediaWiki. Foundations for the Web of Information and Services 2011: 311-32
2. https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki
3. Denny Vrandecic, Markus Krötzsch: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10): 78-85 (2014)
17. Comparing and ranking transparency and Open Data around Covid (Master Thesis)
Supervisor: Axel Polleres
Based on and extending the results of a prior thesis that analysed COVID-19 dashboards and Open Data availability at a national level
aic.ai.wu.ac.at/~polleres/supervised_theses/Felix_Helmreich_BSc2021.pdf
the goal of this thesis is to compare on a timeline which countries published which indicators about COVID-19 and to analyse transparency with respect to these indicators, as well as to set the timeline of publishing and adapting dashboards in the context of the development of the pandemic. As a possible extension of the prior thesis: variant sequencing data, for instance, was published and made available only very late and by only a few countries; cf. https://twitter.com/EllingUlrich/status/1485959300614918148
18. Potential for Open Access and Open Data in WU's research publications (Bachelor or Master Thesis)
Supervisor: Axel Polleres
There is increasing pressure for more transparency, easier access, and openness in scientific research, as shown by initiatives such as the Go FAIR [1] initiative by the research data alliance (RDA), www.go-fair.org, or the increasing demand for more open access to research publications:
While research publications have traditionally often been published by commercially acting publishers, the EU and many other funding agencies now typically demand Open Access to all research artefacts and publications they fund, as far as this is possible and not precluded, for instance, by sensitivity or legitimate business interests (e.g., regarding patents, personal data, etc.).
In order to fulfil the requirements for Open Access, there are different models [2]: publishing with non-commercial publishers and journals that make their publications freely available automatically (Diamond or Platinum Open Access), or paying a fee to publishers as an author (Gold Open Access). Alternatively, many publishers, while charging for their publications, at least permit authors to make their own articles available on the author's personal web page or in an institutional electronic publications repository (Green Open Access). Still, many authors do not fully make use of the latter option.
The goal of this thesis is a data science project to
a) assess the potential of Green Open Access at WU, by analysing different sources such as FIDES [3] and ePub [4] in order to assess to what extent WU's authors use Green Open Access opportunities, and
b) develop a tool that flags such Open Access options to authors.
The main difficulties/challenges in this task will be data integration (ambiguous or misspelt author names, publication titles, or journal names) as well as building up a knowledge base of Green Open Access options. The latter, at least, is already available as a tool for WU researchers through the SHERPA ROMEO database [5].
The topic could be split up into several topics or be worked on collaboratively by more than one student, focusing on complementary aspects, such as:
* data integration/entity resolution (see the sketch after this list)
* making consolidated metadata available as Linked Open Data [6] according to the FAIR principles [1]
* developing a tool usable for WU's authors or WU's library to flag Green Open Access opportunities.
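For the data integration/entity resolution aspect, a first baseline could be simple string-similarity matching, as in the sketch below; the journal names and the threshold are illustrative assumptions.

```python
# Minimal sketch: match noisy journal names against a reference list.
from difflib import SequenceMatcher

reference_journals = ["Journal of Web Semantics", "Long Range Planning"]
records = ["J. of Web Semantics", "Long Range Plannig"]  # noisy input names

def best_match(name, candidates, threshold=0.8):
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

for record in records:
    print(record, "->", best_match(record, reference_journals))
```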
1. www.nature.com/articles/sdata201618
19. Recommendation systems using Knowledge Graphs (Bachelor Thesis)
Supervisor: Dawa Chang
Background:
This is the era of machine learning (ML) and AI. Its limitation, however, lies in explainability. When a machine learning system recommends something to you, it is natural to want to know why it was recommended. This is because humans understand and accept situations and decisions in context. That is why explainable ML/AI has become an active field of research. The latest machine learning solutions often provide highly accurate, but hardly scrutable and interpretable decisions [1]. Knowledge Graphs (KGs) are attracting attention in this regard as a means of making ML/AI decisions explainable.
Research problem:
This thesis aims to (1) try out two or three recommendation systems that use KGs, (2) see what and how much the systems recommend, (3) see if the results can be explained, and (4) compare the differences between the recommendation systems. When choosing recommendation systems to try out, the most important criterion is explainability; the student is free to choose the systems in discussion with the supervisor of this topic.
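The explainability criterion can be illustrated with a toy path-based recommender over a hand-crafted mini knowledge graph, as sketched below; the items and entities are made up, and real systems from the literature would replace this logic.

```python
# Minimal sketch: recommend items connected to liked items via shared KG entities;
# the shared entity doubles as the human-readable explanation.
kg = {  # item -> related KG entities
    "Inception":    {"Christopher Nolan", "science fiction"},
    "Interstellar": {"Christopher Nolan", "science fiction"},
    "The Prestige": {"Christopher Nolan", "drama"},
    "Notting Hill": {"romantic comedy"},
}
liked = {"Inception"}
liked_entities = set().union(*(kg[item] for item in liked))

for candidate, entities in kg.items():
    shared = entities & liked_entities
    if candidate not in liked and shared:
        print(f"Recommend {candidate!r} because it shares {sorted(shared)} "
              f"with items you liked")
```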
Initial references:
Tiddi, I., & Schlobach, S. (2022). Knowledge graphs as tools for explainable machine learning: A survey. Artificial Intelligence, 302, 103627.
https://www.sciencedirect.com/science/article/pii/S0004370221001788
Guo, Q., Zhuang, F., Qin, C., Zhu, H., Xie, X., Xiong, H., & He, Q. (2020). A survey on knowledge graph-based recommender systems. IEEE Transactions on Knowledge and Data Engineering.
https://arxiv.org/pdf/2003.00911.pdf
Goyal, P., & Ferrara, E. (2018). Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 78-94.
https://www.sciencedirect.com/science/article/pii/S0950705118301540
Zou, X. (2020, March). A survey on application of knowledge graph. In Journal of Physics: Conference Series (Vol. 1487, No. 1, p. 012016). IOP Publishing.
https://iopscience.iop.org/article/10.1088/1742-6596/1487/1/012016/pdf
20. Knowledge Graph Embedding Applications in Manufacturing (Bachelor or Master Thesis)
Supervisor: Stefan Bachhofner
Bachelor Thesis with the option to make it a Master Thesis
Knowledge graphs are a means to model a domain of interest via relationships between objects, where the objects are the nodes and the relationships are the edges of a graph (Hogan, 2020). See (Rotmensch, 2017) for an example from medicine. These nodes and edges can further be mapped into a continuous vector space, which is referred to as embedding a knowledge graph (Wang, 2017). Embeddings are, however, not limited to nodes and edges but are also computed for substructures (subsets of nodes and/or edges) and even whole graphs (Cai, 2018). Manufacturing companies might hence find the properties of knowledge graphs, i.e., the explicit modelling of relationships and the ability to embed the graph, helpful in achieving business goals, e.g., when multiple sensors capture data on an object, which implies dependent measurements (Garofalo, 2018).
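To illustrate the embedding idea in isolation, the following sketch scores triples in the spirit of TransE with random (untrained) vectors; the manufacturing-flavoured entities, relations, and dimensionality are made up, and a real application would learn the vectors from data.

```python
# Minimal sketch of the TransE intuition: a triple (h, r, t) is plausible
# when the head vector plus the relation vector is close to the tail vector.
import numpy as np

rng = np.random.default_rng(0)
entities  = {e: rng.normal(size=8) for e in ["sensor_1", "machine_A", "line_3"]}
relations = {r: rng.normal(size=8) for r in ["mounted_on", "part_of"]}

def score(h, r, t):
    # smaller distance = more plausible triple; training would adjust the
    # vectors so that true triples obtain low scores
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

print(score("sensor_1", "mounted_on", "machine_A"))
print(score("machine_A", "part_of", "line_3"))
```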
The objective of this thesis is to conduct a comprehensive literature review on applications of knowledge graph embeddings in the context of manufacturing and Industry 4.0. The review shall start with identifying companies that offer semantic web technologies and gathering information about who their customers are. Based on this, the student should investigate the grey literature (in other words, non-scientific literature). Examples of grey literature include company websites and marketing artefacts from the said semantic web companies. In the thesis, the student will systematically identify and characterize use cases along multiple dimensions (for example, industry type and process type), which might reveal interesting patterns and even new use cases. The student might also be asked to give her/his opinion on the current state of the art.
Resources
CS 520, Knowledge Graphs, Stanford University, https://web.stanford.edu/class/cs520/, web.stanford.edu/~vinayc/kg/notes/Table_Of_Contents.html
References
Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., de Melo, G., Gutierrez, C., ... & Zimmermann, A. (2020). Knowledge graphs. arXiv preprint arXiv:2003.02320.
Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S., & Sontag, D. (2017). Learning a health knowledge graph from electronic medical records. Scientific reports, 7(1), 1-11.
Wang, Q., Mao, Z., Wang, B., & Guo, L. (2017). Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12), 2724-2743.
Cai, H., Zheng, V. W., & Chang, K. C. C. (2018). A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering, 30(9), 1616-1637.
Garofalo, M., Pellegrino, M. A., Altabba, A., & Cochez, M. (2018). Leveraging knowledge graph embedding techniques for industry 4.0 use cases. arXiv preprint arXiv:1808.00434.