Funding for research with Azure services
Microsoft Azure is a platform providing various cloud services, such as virtual servers or services with artificial intelligence. Employees of the University of Vienna may use these services, which are provided by the ZID, for research purposes for a fee at special conditions. More information about Microsoft Azure
In order to support research activities in Azure, the ZID offers financial support for the calendar year 2025. A total of 20,000.00 euros is available. Up to 5,000.00 euros will be awarded per project.
Projects with one of the following characteristics will be prioritised when the funding is awarded:
- They work with hybrid approaches (combined use of Azure services with local infrastructure)
- They use Azure services with artificial intelligence
- They use Azure services for which the ZID does not offer alternative IT services
Funded research projects
- Susanne Blumesberger, Herbert Van Uffelen | iConTxt – AI for the DLBT
Organisational unit: Vienna University Library and Archive Services, Library - Research and Publication Services
Abstract:
A digital library and a digital bibliography on literature in translation have been built up over several years at the Vienna University Library (Department of Repository Management PHAIDRA Services) under the name DLBT. The Phaidra add-on DLBT uses the YARM package as technical infrastructure and Phaidra as a long-term repository.
The DLBT not only collects data on translated literature, but also texts (and other assets) on their reception. Currently, over 65,000 translations, adaptations and reception documents are listed in the DLBT and more than 24,000 digital copies are available for research.
As part of the project ‘iConTxt - AI for the DLBT’ (7 European partners, funded by the Taalunie), led by Herbert Van Uffelen, the possibilities and limits of using artificial intelligence for the DLBT are being explored and a new YARM package (iConTxt) is being developed for the DLBT. The new package uses AI to support users in entering data and to improve the quality and accessibility of the information available in the DLBT. During the quality improvement process, the metadata are checked and ‘improved OCRs’ of the scanned texts are created.
iConTxt creates further English translations and English summaries for every reception text in the DLBT and creates relationships to the translations listed in the DLBT. In combination with the results of specific web queries, the DLBT-specific ‘knowledge pool’ generated by iConTxt will be used for Retrieval Augmented Generation of new information on authors, translators, publishers and translated titles mentioned in the DLBT.
Category: Funded research projects 2025
- Can Çelebi | ExpBotEval: Advancing Robust Instruction Bots for Experimental Economics
Organisational unit: Faculty of Business, Economics and Statistics - Vienna Center for Experimental Economics
Abstract: This project aims to develop and validate a robust framework for stress testing chatbots powered by large language models (LLMs) designed to serve as instruction bots in experimental economics. These bots are intended to provide participants with accurate and unbiased instructions, ensuring experimental integrity and minimizing noise in collected data. While general-purpose tools exist for stress testing commercial chatbots, the unique demands of experimental economics require the development of a specialized stress testing protocol.
The framework involves the creation of two LLM-based agents: the Stress Tester Bot (STB) and the Evaluator Bot (EB). STB acts as a synthetic subject, attempting to “break” the Instruction Bot (IB) by eliciting biased, false, or irrelevant responses. EB evaluates these interactions, identifying problematic outputs from IB to ensure its reliability and robustness.
Two potential approaches for developing STB and EB are considered. The first approach leverages multi-agent pipeline structures using larger, off-the-shelf LLMs (e.g., GPT-4o or o1), prioritizing accessibility, flexibility, and scalability. This method eliminates the need for fine-tuning and data collection, making it an appealing and computationally efficient option for the broader social science community. However, whether this approach can achieve desirable performance for stress testing without fine-tuning and data collection remains an open empirical question.
The second approach involves fine-tuning smaller LLMs (e.g., GPT-4o mini) on data collected from incentivized experiments where subjects are instructed to “break” the IB. While this approach is likely to yield more reliable and tailored outcomes, it requires extensive data collection, computational resources, and fine-tuning for each specific experiment, potentially limiting its scalability and accessibility, particularly for resource-constrained researchers.
The project will initially pursue the multi-agent pipeline approach and only consider fine-tuning if the first approach fails to meet reliability standards. Funding is sought to access computational resources, via Azure infrastructure and API credits, to support development and testing. This research addresses a critical gap in the field, providing a scalable and objective standard for stress testing LLM-based tools in experimental economics, ultimately enhancing the reliability of experimental outcomes and promoting the broader adoption of instructional chatbots in social science research.
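The interaction between the three bots described above can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not the project's implementation: the names `Turn`, `run_stress_test` and `failure_rate` are hypothetical, and the three bots are passed in as plain Python callables that, in a real setup, would each wrap a call to an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One exchange in a stress test (hypothetical structure)."""
    probe: str     # message sent by the Stress Tester Bot (STB)
    reply: str     # answer given by the Instruction Bot (IB)
    flagged: bool  # True if the Evaluator Bot (EB) judged the reply problematic


def run_stress_test(
    stb: Callable[[List[Turn]], str],  # assumed: STB sees the history, returns the next probe
    ib: Callable[[str], str],          # assumed: IB maps a probe to a reply
    eb: Callable[[str, str], bool],    # assumed: EB returns True for a problematic reply
    n_turns: int,
) -> List[Turn]:
    """Let the STB probe the IB for n_turns and have the EB judge every reply."""
    history: List[Turn] = []
    for _ in range(n_turns):
        probe = stb(history)
        reply = ib(probe)
        history.append(Turn(probe, reply, eb(probe, reply)))
    return history


def failure_rate(history: List[Turn]) -> float:
    """Share of turns that the EB flagged; 0.0 for an empty history."""
    if not history:
        return 0.0
    return sum(turn.flagged for turn in history) / len(history)
```

Keeping the bots as interchangeable callables means the same loop could be run with off-the-shelf models first and with fine-tuned models later, which mirrors the order of the two approaches in the abstract.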
Category: Funded research projects 2025
- Laura Gandlgruber | u:know
Organisational unit: Teaching Affairs and Student Services
Abstract:
This project proposes the development of a pilot knowledge database to provide University of Vienna students with quick and accurate answers to their study-related questions. The platform will leverage information from the existing university website, “studieren.univie.ac.at”, and will be built using Azure Cognitive Search and the Microsoft Power Platform. This approach aims to improve information accessibility, reduce time spent searching for answers, and enhance the overall student experience.
Category: Funded research projects 2025
- Martin Gasteiner | Impacts with Abstracts & Talking Abstracts: Echoes of Knowledge
Organisational units:
- Vienna University Library and Archive Services – Department for Bibliometrics and Publication Strategies,
- Vienna University Library and Archive Services – Library-Communications,
- Zentraler Informatikdienst – IT-Support for Research
Abstract:
The “Impacts with Abstracts” project delves into the transformative potential of AI in analyzing and reshaping scientific abstracts to create innovative services for the University of Vienna. This initiative is driven by a cross-departmental team: Janos Bekesi (IT Support), Martin Gasteiner (PHAIDRA, UB Communications), and Christian Gumpenberger and Lothar Hölbling (Bibliometrics and Publication Strategies). Their goal is to enhance the accessibility and usability of scientific content for both internal and external stakeholders.
A key feature of this project is the podcast “Talking Abstracts: Echoes of Knowledge”, which distills complex research findings into engaging content, making them accessible to various audiences. Internally, it helps Communications identify relevant content efficiently, thereby enhancing the university’s outreach efforts. It also promotes interdisciplinary networking and supports the rectorate in strategic planning and decision-making.
“Impacts with Abstracts” integrates AI technology with science communication, enhancing the visualization and dissemination of university research outputs.
Category: Funded research projects 2025
- Fares Kayali | Technical Feasibility for Lab Chatbot CE Lab
Organisational unit: Centre for Teacher Education – Department for Teacher Education
Abstract:
The Computational Empowerment Lab (CE-Lab) is part of the Digitalisation in Education unit and is located at the Centre for Teacher Education at the University of Vienna. The Lab sees itself as an intellectual and practical space for realising ideas, research and projects that promote computational empowerment, using a wide range of technologies.
In the first stage of the project, a self-trained chatbot (a self-trained text-based AI) will be developed and implemented that is specifically designed to answer students' questions about the CE-Lab and its equipment. The chatbot is intended to give students fast and reliable help with frequently asked questions and problems in the CE-Lab environment. Integrating it into the CE-Lab will expand the existing support for using the various technologies and ensure direct and efficient support for students, resulting in an improved learning experience and a reduced workload for teaching staff.
In the second stage of the project, which is considered a “nice to have”, didactic recommendations around the CE-Lab will be developed on the basis of (the Lab's own) scientific and didactic literature, specialised articles and brochures (e.g. the Computational Empowerment in Practice brochure). These recommendations are intended to help students gain a deeper understanding of the use and handling of lab technologies.
The project is motivated by the bias, lack of transparency and lack of comprehensibility of existing models. It aims to create a pilot for a specialised educational LLM (Large Language Model) that addresses these challenges and provides transparent and comprehensible support in the education sector.
Category: Funded research projects 2025
- Thomas Kohlwein | Unlocking library heritage with data from historic catalogues
Organisational unit: Faculty of Philological and Cultural Studies – Department of German Studies
Abstract:
Libraries hold a treasure trove of historic catalogues describing their collections. But since many of them are written in the handwriting style of their time, they are no longer easily accessible to general readers. While many libraries created new catalogue data based on automation to build the online systems we know today, libraries with an especially large amount of historic material still suffer from low public visibility of their collections. This project will use the Computer Vision API to achieve both a higher rate of identification of individual books and a systematic collection of historic metadata: how books were historically arranged by topic, location, etc., and what specific information libraries have on items such as donations or book plates. By making large library collections visible, we unlock the stories hidden in the stacks and help appreciate our common heritage.
Project website: germ.univie.ac.at/projekt/kwk/
Category: Funded research projects 2025
- Ema Kusen | Hybrid Azure-Powered Detection of High-Risk Texts
Organisational unit: Faculty of Informatics – Research Group Security and Privacy
Abstract:
The detection of grievances in social media texts is critical for understanding their role in shaping public discourse and their potential to escalate into collective unrest or harmful actions. Grievances are psychological responses to perceived injustices and have been shown to be precursors to radicalization or mobilization for contentious actions (Scrivens, 2022). This project develops a hybrid AI framework that combines Microsoft Azure’s Cognitive Services, including Text Analytics and the Azure OpenAI Service, with on-premises infrastructure to detect grievances in high-risk social media texts. This approach ensures scalability, flexibility, and compliance with data protection regulations. By integrating Azure Machine Learning for model training and deployment, and utilizing Azure’s Explainable AI tools, the project prioritizes interpretability and transparency in detecting grievance-related patterns. The outcomes will provide actionable insights into grievance propagation and its impact on digital discourse, contributing to computational social science and the ethical design of AI systems. This framework will also serve as a replicable model for combining cloud-based AI and local infrastructure in high-stakes text analysis applications, providing valuable insights for policymakers, platform moderators, and researchers in managing online discourse.
Category: Funded research projects 2025
- Mauricio Martins | Building and Evaluating Historical Large Language Models for Understanding Psychological Dimensions in Fiction
Organisational unit: Faculty of Psychology – Department of Cognition, Emotion, and Methods in Psychology
Abstract:
Large-scale surveys are crucial for understanding the cultural dynamics of populations over time. However, applying these tools to study long-term historical trends remains a recent development. Historical psychology has emerged as a field that uses past fiction as “cognitive fossils” to uncover insights into societal values, emotions, and moral frameworks of bygone eras [1].
Traditional computational methods, such as bag-of-words and topic modeling, have revealed intriguing trends, such as rising sentiments towards cooperation preceding the French Revolution and English Civil War, followed by a decline [2]. However, these methods fail to capture the contextual and nuanced use of language in historical texts [3]. Advances in large language models (LLMs) now offer tools to overcome these limitations. While foundational LLMs excel in modern language analysis, they are not optimized for historical texts, where linguistic conventions differ. Recent theoretical work proposes that these challenges can be addressed by developing Historical Large Language Models (HLLMs) [4].
This project aims to develop and compare two approaches to analyzing historical texts:
- Fine-Tuning LLMs with Historical Corpora: This involves building HLLMs tailored to English, French, and German fiction from the 16th to 19th centuries [4]. The model’s parameters are modified using parameter-efficient fine-tuning (PEFT) techniques such as LoRA (Low-Rank Adaptation) to adapt the foundation model efficiently to task-specific needs. Using Azure’s infrastructure, we will fine-tune the open-source Llama 3.3-70B on a pre-processed corpus of ~15 million tokens. The fine-tuning process will leverage an NVIDIA A100 GPU (e.g., NC24ads A100 v4). After fine-tuning, the HLLM will classify texts across psychological dimensions, replicating Martins & Baumard (2020) [2], which used a bag-of-words approach.
- Optimizing Prompt Engineering: This involves leveraging GPT-4, the gold standard tool for text annotation in psychological sciences [5], with refined instructions to analyze historical corpora without retraining [6], for example by requesting that the annotation consider the year the text was produced. Prompt optimization will use Azure OpenAI Service to refine the instructions for the LLM iteratively. This approach is constrained by the context window and relies on precise, iterative prompt refinement.
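For readers unfamiliar with LoRA, the idea behind the first approach can be written down compactly. This is the standard formulation of low-rank adaptation, not a description of the project's specific configuration; the rank $r$ and scaling factor $\alpha$ are hyperparameters whose values the abstract does not state.

```latex
% A frozen pre-trained weight matrix W_0 receives a trainable low-rank update.
\[
  W \;=\; W_0 + \Delta W \;=\; W_0 + \frac{\alpha}{r}\, B A,
  \qquad
  W_0 \in \mathbb{R}^{d \times k},\;
  B \in \mathbb{R}^{d \times r},\;
  A \in \mathbb{R}^{r \times k},\;
  r \ll \min(d, k).
\]
% Only A and B are trained, so the number of trainable parameters per matrix
% drops from d k to r (d + k).
```

Because only the small matrices $A$ and $B$ are updated, a large model can be adapted with far less GPU memory than full fine-tuning would require.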
This project will deliver a paper that describes both a methodological advancement and a replication of Martins & Baumard (2020). It will comprehensively evaluate fine-tuning versus prompt engineering across several dimensions:
a) Comparison Against Student Classifications: The outputs of both approaches will be validated against human-annotated classifications of 3,000 sentences. Metrics such as precision, recall, and F1 scores will quantify alignment with human evaluations, providing a precise measure of accuracy.
b) Comparison Against Bag-of-Words Benchmark: The results from both approaches will be compared with the findings from the Martins & Baumard (2020) [2] study on democratic sentiments, which utilized traditional Bag-of-Words methods.
c) Comparison of Cost and Time Efficiency
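The metrics named under a) can be illustrated with a small, self-contained calculation. This is a minimal sketch assuming a binary labelling task (1 = the sentence expresses the dimension of interest, 0 = it does not); the function name `precision_recall_f1` and the toy labels are illustrative only and are not taken from the project.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    human: Sequence[int], model: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compare model labels with human labels for one positive class."""
    if len(human) != len(model):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(human, model))
    tp = sum(1 for h, m in pairs if m == positive and h == positive)  # true positives
    fp = sum(1 for h, m in pairs if m == positive and h != positive)  # false positives
    fn = sum(1 for h, m in pairs if m != positive and h == positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: eight sentences labelled by a human annotator and by a model.
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(human_labels, model_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints "precision=0.75 recall=0.75 f1=0.75"
```

In the toy data the model agrees with the human on three positive sentences, adds one false positive and misses one positive sentence, which gives 3/4 for both precision and recall.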
References
[1] Baumard, N., Safra, L., Martins, M. & Chevallier, C. Cognitive fossils: Using cultural artifacts to reconstruct psychological changes throughout history. Trends in Cognitive Sciences (2024).
[2] Martins, M. D. & Baumard, N. The rise of prosociality in fiction preceded democratic revolutions in Early Modern Europe. Proceedings of the National Academy of Sciences 117, 28684 (2020).
[3] Martins, M. D. & Baumard, N. How to Develop Reliable Instruments to Measure the Cultural Evolution of Preferences and Feelings in History? Frontiers in Psychology 13 (2022).
[4] Varnum, M. E. W., Baumard, N., Atari, M. & Gray, K. Large Language Models based on historical text could offer informative tools for behavioral science. Proceedings of the National Academy of Sciences 121, e2407639121 (2024).
[5] Rathje, S., Mirea, D.-M., Sucholutsky, I., Marjieh, R., Robertson, C. E. & Van Bavel, J. J. GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences 121, e2308950121 (2024).
[6] Dubourg, E., Thouzeau, V. & Baumard, N. A step-by-step method for cultural annotation by LLMs. Frontiers in Artificial Intelligence 7 (2024).
Category: Funded research projects 2025
- Stefan Wagner | EduBot: Evaluating an AI Teaching Assistant for Enhanced Learning Outcomes
Organisational unit: Faculty of Business, Economics and Statistics – Department of Accounting, Innovation and Strategy
Abstract:
We intend to develop a prototype of a teaching assistant chatbot using Microsoft Azure infrastructure. The proposed project aims to integrate cutting-edge AI technologies into the teaching process to enhance student engagement and learning outcomes. By leveraging Azure OpenAI Service, Azure Bot Service, and Azure App Service, the chatbot will be designed to fulfill the core functions of a teaching assistant: providing timely responses to student queries, offering tailored academic guidance, and supporting administrative tasks. It will be trained on teaching materials available to students and provide library access via ResearchGate and JSTOR.
The primary goal for 2025 is to develop and evaluate a functional prototype that reliably performs the intended tasks. This initial phase will involve designing, deploying, and refining the chatbot using Microsoft Azure’s suite of AI and cloud services.
Building upon the insights gained during the prototyping stage, the project’s subsequent phase will focus on investigating the educational impacts of deploying such a chatbot in a real-world classroom environment. By the end of 2025, we aim to initiate a systematic study to observe student interactions with the chatbot and evaluate the resulting learning outcomes. Using carefully crafted experimental variations, the study will identify causal effects of chatbot use on student performance and engagement. This research phase, planned for 2026, will provide empirical evidence regarding the efficacy of AI-powered teaching assistants in higher education.
Category: Funded research projects 2025
- Anna Weichselbraun | Analyzing the language of AI regulation
Organisational unit: Faculty of Historical and Cultural Studies – Department of European Ethnology
Abstract:
The Regulatory Language Analyzer Pilot Project presents a focused investigation into how different jurisdictions conceptualize AI governance through their regulatory frameworks, concentrating specifically on comparative analysis between EU and US approaches. It is intended as a preliminary methodological exploration for an ERC Consolidator Grant proposal to be submitted by the PI in January 2025. This 7-month study leverages Azure's ML infrastructure to develop and validate a specialized language model for analyzing regulatory AI policy documents, serving as a proof-of-concept for larger-scale cross-cultural policy analysis.
The project employs a streamlined fine-tuning process using Azure OpenAI GPT-3.5 as the base model, focusing on English-language regulatory documents from the EU AI Act and US federal and state-level AI frameworks. Through a single-stage fine-tuning approach, the model will be optimized to identify and analyze key regulatory patterns, linguistic features, and policy concepts specific to AI governance. This targeted approach allows for rapid development and validation of the core analytical capabilities while maintaining sufficient depth for meaningful comparative analysis.
The research methodology emphasizes efficiency and foundational insights, with data collection and preprocessing focused on creating a high-quality parallel corpus of EU and US regulatory documents. The model's performance will be evaluated through essential metrics including regulatory framework classification accuracy, policy intention recognition, and basic cross-jurisdictional concept alignment. Technical implementation uses Azure ML compute with NC6s_v3 infrastructure, providing the necessary computational power for model development and optimization within the condensed timeframe.
This pilot study will deliver key insights into how two major regulatory approaches to AI governance differ in their linguistic and conceptual frameworks. The analysis will focus on identifying significant patterns in how these jurisdictions express core concepts such as accountability, transparency, and risk management in AI systems. The findings will provide an empirical foundation for understanding how different legal traditions approach AI governance while establishing a methodological framework that can be expanded to include additional jurisdictions and languages in future research.
The project's deliverables will include a validated model for regulatory language analysis, documentation of key linguistic and conceptual patterns identified in EU and US frameworks, and recommendations for scaling the analysis to additional jurisdictions. This pilot serves as a crucial first step in developing more comprehensive tools for understanding how cultural and linguistic differences shape AI governance approaches, while providing immediate practical insights for policymakers working on AI regulation in these key jurisdictions.
The condensed scope enables rapid development and validation of the core analytical framework while establishing a solid foundation for future expansion to additional jurisdictions and more nuanced cross-cultural analysis. This approach balances the need for meaningful results within a limited timeframe with the potential for broader application in future research phases.
Category: Funded research projects 2025
Timetable
- 28.10.–31.12.2024: Application for funding
- 01.–12.01.2025: Internal review of applications and possible queries
- From 13.01.2025: Announcement of funded projects by e-mail; setup of Azure environments by the ZID, onboarding of users
- From beginning of February 2025: Implementation of the projects
- September 2025: Submission of interim report
- December 2025: Submission of final report
Application requirements
The applicant must:
- be currently employed by the University of Vienna and have an active u:account
- be entitled to order Microsoft 365 via the self-service portal
- accept the Privacy Policy and Terms of Use for Microsoft Azure, see Servicedesk form Microsoft Azure bestellen (ordering Microsoft Azure, in German)
Conditions of funding
- The ZID's Coordination Digital Transformation team decides on funding. If necessary, it consults with peer reviewers.
- The funding granted per project will be deducted monthly, on a pro rata basis, from the Azure costs incurred over the period of use until 31.12.2025.
- Costs that exceed the granted funding amount or are incurred after the end of the funding must be covered by a cost centre available for the project.
- The ZID is responsible for setting up the project environment in Azure, for onboarding and assigning user authorisations. Support for the technical implementation of the project is not offered.
- Personnel resources are explicitly not funded.
- Projects that were already funded in 2024 are excluded from funding.
- After the end of the funding period, the Azure environment provided and the resources contained therein remain available to users. Subsequent use of the services is possible and desired.
Applying for funding
The application deadline for funding has expired.
Funding 2024
Funded research projects
- Marina Dütsch | FLEXWEB
Organisational unit: Department of Meteorology and Geophysics
Abstract (translated from German):
Flexpart (FLEXible PARTicle dispersion model) is a numerical model that simulates the dispersion of gases and aerosols in the atmosphere. The aim of this project was to develop a web service (FLEXWEB) that allows Flexpart to be run via a website. Flexpart was to compute trajectories using a Kubernetes cluster and make the results easily accessible to users. An important step was containerising the individual work steps and coordinating them with the relatively large input data. We successfully got a first version of FLEXWEB running; however, we did not manage to make the service scalable so that several users can use it at the same time.
Flexpart development at the University of Vienna
Category: Funded research projects 2024
- Wolfgang Klas | FactCheck
Organisational unit: Multimedia Information Systems research group, Faculty of Computer Science
Abstract:
FactCheck is an ongoing research project conducted by the Research Group on Multimedia Information Systems within the Faculty of Computer Science. The project aims to identify conflicts in data available on the web. These conflicts may exist in various formats, such as textual (e.g., paragraphs in HTML documents) or multimedia (e.g., news segments in videos). To extract this conflict information, we will employ a combination of Semantic Web approaches (e.g., structured data) and state-of-the-art artificial intelligence technologies (e.g., named entity recognition and entity linking). The comparison processes for this information will involve both human intelligence and feedback, which is why we will also investigate methods for user identity and user management, such as Microsoft Entra ID. We are considering a hybrid approach to deploying the FactCheck prototypes. This will involve using both scalable Azure services (e.g., cognitive services like AI Video Indexer and user management solutions) and the existing on-premises infrastructure (e.g., virtual machines or databases) at the University of Vienna. This approach aims to achieve an appropriate balance between security, privacy, and cost. Some components may be containerized to keep the deployment flexible and modular, making it easier to deploy on both Azure and local infrastructure.
Category: Funded research projects 2024
- Oliver Wieder | Revolutionizing Olfactory Perception Mapping: A Contrastive Learning Graph Neural Network Approach
Organisational unit: Department of Pharmaceutical Sciences
Abstract:
Abstract will be added soon
Category: Funded research projects 2024
- Claas Abert | Very Largescale Distributed Micromagnetic Research Tools
Organisational unit: Physics of Functional Materials
Abstract (translated from German):
The project investigated how well the Microsoft Azure cloud is suited to scientific computing in the field of micromagnetics. To this end, various virtual machines were tested, both with CPUs and with GPUs. A particular focus was on the use of low-cost spot instances, which can, however, occasionally be interrupted. To simplify working with the simulations, helpful tools were developed that automate workflows and thus save time. During the project, micromagnetic simulations for optimising magnonic devices were also carried out successfully and yielded valuable insights. Instead of the originally planned studies on the use of multiple GPUs, the focus was on comprehensively evaluating the cloud solutions. In addition, a bachelor's thesis was written that contributed to the development of the automation tools. The project budget of EUR 2,000 was used almost in full, and the results showed that the Microsoft Azure infrastructure offers great potential for scientific applications.
Category: Funded research projects 2024
- Xin Huang | selscape: Automated and Distributed Pipelines for Investigating the Landscape of Natural Selection from Large-scale Genomic Datasets
Organisational unit: Department of Evolutionary Anthropology
Abstract:
This project developed three Snakemake pipelines for detecting balancing selection, positive selection, and inferring the distribution of fitness effects. Azure Batch was tested for cloud deployment, and the first pipeline was successfully implemented in the cloud. The remaining pipelines are ready for deployment using insights gained from the first pipeline’s testing. Key results include contributions to three studies, showcasing the pipelines' effectiveness in analyzing genomes and exploring genetic diversity. Despite challenges with inadequate documentation for integrating Snakemake with Azure Batch, the project goals were partially achieved, with development carried out conservatively on local servers due to the novelty of cloud integration. Future work will focus on fully deploying all pipelines in the cloud and expanding their applications for large-scale genomic analyses.
Category: Funded research projects 2024
- Dylan Paltra | MULTIREP – Multidimensional Representation: Enabling An Alternative Research Agenda on the Citizen-Politician Relationship
Organisational unit: Department of Government
Abstract:
The “MULTIREP” project aims to enable an alternative approach to studying the citizen-politician relationship. It focuses primarily on how citizens conceptualize representation. A mixed-methods approach combines qualitative methods (focus groups and one-to-one interviews with citizens) with quantitative methods in five countries (approx. 2,000 respondents in each), focusing on natural language processing approaches. In a multinational and multilingual mass survey in five countries with 10,000 participants, we want to improve current survey methodology by analyzing respondents’ answers to open-ended questions using different machine-learning approaches. During the funding period, the project team was able to conduct the survey, collecting rich text data from representative samples of the public. The team used the Azure infrastructure for a preliminary analysis of the open-ended text answers by prompting large language models. These results complement a theoretically induced coding scheme, which will be used later in the analysis. Besides the already established dimensions of representation, the team found that citizens conceptualize representation very much in formalistic terms. The team plans to continue using Microsoft Azure to analyse the open-ended text answers thoroughly, making use not only of large language models but also of more established natural language processing approaches.
Category: Funded research projects 2024
- Miguel Angel Rios Gaona | Controlled Machine Translation with Large Language Models for the Technical Domain
Organisational unit: Centre for Translation Studies
Abstract:
Large Language Models (LLMs) have shown promising results on machine translation for high-resource language pairs and domains. However, in specialised domains (e.g. medical), LLMs have shown lower performance than standard neural machine translation models. Consistency in the machine translation of terminology is crucial for users, researchers, and translators in specialised domains. In this study, we compare the performance of baseline LLMs and instruction-tuned LLMs in the medical domain. In addition, we introduce terminology from specialised medical dictionaries into the instruction-formatted datasets for fine-tuning LLMs. The instruction-tuned LLMs significantly outperform the baseline models on automatic metrics and quality estimation. Moreover, the instruction-tuned LLMs produce fewer errors than the baseline according to automatic error annotation.
Category: Funded research projects 2024
Contact
If you have any questions about funding, please use the Servicedesk form Anfrage zu Microsoft Azure (enquiry about Azure, in German).