Funding for research with Azure services

Microsoft Azure is a platform that provides a range of cloud services, such as virtual servers or services with artificial intelligence. Employees of the University of Vienna can use these services for research via the ZID, for a fee and on special terms. More information about Microsoft Azure

To support research activities in Azure, the ZID offers financial funding for the calendar year 2025. A total of EUR 20,000.00 is available. Up to EUR 5,000.00 is awarded per project.

Projects with one of the following characteristics are prioritised when funding is awarded:

  • They work with hybrid approaches (combined use of Azure services and local infrastructure)
  • They use Azure services with artificial intelligence
  • They use Azure services for which the ZID does not offer alternative IT services

Application requirements

The applicant must:

  • have an active employment relationship with the University of Vienna and an active u:account
  • be entitled to order Microsoft 365 via the self-service portal
  • accept the data protection provisions and terms of use for Microsoft Azure; see the Servicedesk form Microsoft Azure bestellen (order Microsoft Azure)

Funding conditions

  • The Coordination Digital Transformation team of the ZID decides on the funding. If necessary, it consults peer reviewers.
  • The funding granted per project is deducted pro rata each month from the Azure costs incurred over the period of use until 31 December 2025.
  • Costs that exceed the granted funding or that are incurred after the funding ends must be covered by a cost centre available to the project.
  • The ZID is responsible for setting up the project environment in Azure, for onboarding and for assigning user permissions. Support with the technical implementation of the project is not offered.
  • Personnel resources are explicitly not funded.
  • Projects that already received funding in 2024 are excluded.
  • After the funding ends, the Azure environment provided and the resources it contains remain available to users. Continued use of the services is possible and encouraged.
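
The monthly pro rata deduction can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the ZID's actual billing logic: the function names, the number of months and the cost figures are hypothetical, and the sketch assumes that an unused monthly share does not carry over to later months, which the conditions above do not specify.

```python
from decimal import Decimal, ROUND_DOWN


def monthly_share(grant: Decimal, months: int) -> Decimal:
    """Equal monthly share of the grant, rounded down to whole cents."""
    return (grant / months).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def split_costs(grant: Decimal, months: int, monthly_costs: list[Decimal]):
    """Return (covered by grant, charged to cost centre) for each month.

    Assumption: each month is capped at the monthly share and an unused
    share is NOT carried over to later months.
    """
    share = monthly_share(grant, months)
    result = []
    for cost in monthly_costs:
        covered = min(cost, share)
        result.append((covered, cost - covered))
    return result


# Hypothetical example: EUR 5,000 spread over 10 months of use.
costs = [Decimal("300"), Decimal("650")]
for covered, remainder in split_costs(Decimal("5000"), 10, costs):
    print(f"covered by grant: {covered}, charged to cost centre: {remainder}")
```

In this example the monthly share is EUR 500.00, so a month with EUR 650 of Azure costs would leave EUR 150 for the project's cost centre.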

Schedule

  • 28 October to 31 December 2024: Application for funding
  • 1 to 12 January 2025: Internal review of applications and any follow-up questions
  • From 13 January 2025:
    Announcement of the funded projects by e-mail
    Set-up of the Azure environments by the ZID, onboarding of users
  • From early February 2025: Project execution
  • September 2025: Submission of interim report
  • December 2025: Submission of final report

Apply for funding

The application deadline for the funding has passed.

Funded research projects

  • Susanne Blumesberger, Herbert Van Uffelen | iConTxt – AI for the DLBT

    Organisational unit: Vienna University Library and Archive Services, Library - Research and Publication Services

    Abstract:

    For several years, a digital library and a digital bibliography on literature in translation have been established at the Vienna University Library (Department of Repository Management PHAIDRA Services) under the name DLBT. The Phaidra-add-on DLBT uses the YARM package as technical infrastructure and Phaidra as a long-term repository.

    The DLBT not only collects data on translated literature, but also texts (and other assets) on their reception. Currently, over 65,000 translations, adaptations and reception documents are listed in the DLBT and more than 24,000 digital copies are available for research.

    As part of the project ‘iConTxt - AI for the DLBT’ (7 European partners, funded by the Taalunie), led by Herbert Van Uffelen, the possibilities and limits of using artificial intelligence for the DLBT are being explored and a new YARM package (iConTxt) is being developed for the DLBT. The new package uses AI to support users in entering data and to improve the quality and accessibility of the information available in the DLBT. During the quality improvement process, the metadata are checked and ‘improved OCRs’ of the scanned texts are created.

    iConTxt creates further English translations and English summaries for every reception text in the DLBT and creates relationships to the translations listed in the DLBT. In combination with the results of specific web queries, the DLBT-specific ‘knowledge pool’ generated by iConTxt will be used for Retrieval Augmented Generation of new information on authors, translators, publishers and translated titles mentioned in the DLBT.

  • Can Çelebi | ExpBotEval: Advancing Robust Instruction Bots for Experimental Economics

    Organisational unit: Faculty of Business, Economics and Statistics - Vienna Center for Experimental Economics

    Abstract: This project aims to develop and validate a robust framework for stress testing chatbots powered by large language models (LLMs) designed to serve as instruction bots in experimental economics. These bots are intended to provide participants with accurate and unbiased instructions, ensuring experimental integrity and minimizing noise in collected data. While general-purpose tools exist for stress testing commercial chatbots, the unique demands of experimental economics require the development of a specialized stress testing protocol.

    The framework involves the creation of two LLM-based agents: the Stress Tester Bot (STB) and the Evaluator Bot (EB). STB acts as a synthetic subject, attempting to “break” the Instruction Bot (IB) by eliciting biased, false, or irrelevant responses. EB evaluates these interactions, identifying problematic outputs from IB to ensure its reliability and robustness.

    Two potential approaches for developing STB and EB are considered. The first approach leverages multi-agent pipeline structures using larger, off-the-shelf LLMs (e.g., 4o or o1), prioritizing accessibility, flexibility, and scalability. This method eliminates the need for fine-tuning and data collection, making it an appealing and computationally efficient option for the broader social science community. However, whether this approach can achieve desirable performance for stress testing without the need for fine-tuning and data collection remains an open empirical question.

    The second approach involves fine-tuning smaller LLMs (e.g., 4o-mini) on data collected from incentivized experiments where subjects are instructed to “break” the IB. While this approach is likely to yield more reliable and tailored outcomes, it requires extensive data collection, computational resources, and fine-tuning for each specific experiment, potentially limiting its scalability and accessibility, particularly for resource-constrained researchers.

    The project will initially pursue the multi-agent pipeline approach and only consider fine-tuning if the first approach fails to meet reliability standards. Funding is sought to access computational resources, via Azure infrastructure and API credits, to support development and testing. This research addresses a critical gap in the field, providing a scalable and objective standard for stress testing LLM-based tools in experimental economics, ultimately enhancing the reliability of experimental outcomes and promoting the broader adoption of instructional chatbots in social science research.

  • Laura Gandlgruber | u:know

    Organisational unit: Teaching Affairs and Student Services

    Abstract:

    This project proposes the development of a pilot knowledge database to provide University of Vienna students with quick and accurate answers to their study-related questions. The platform will leverage information from the existing university website, “studieren.univie.ac.at”, and will be built using Azure Cognitive Search and the Microsoft Power Platform. This approach aims to improve information accessibility, reduce time spent searching for answers, and enhance the overall student experience.

  • Martin Gasteiner | Impacts with Abstracts & Talking Abstracts: Echoes of Knowledge

    Organisational units:

    • Vienna University Library and Archive Services – Department for Bibliometrics and Publication Strategies,
    • Vienna University Library and Archive Services – Library-Communications,
    • Zentraler Informatikdienst – IT-Support for Research

    Abstract:

    The “Impacts with Abstracts” project delves into the transformative potential of AI in analyzing and reshaping scientific abstracts to create innovative services for the University of Vienna. This initiative is driven by a cross-departmental team: Janos Bekesi (IT Support), Martin Gasteiner (PHAIDRA, UB Communications), and Christian Gumpenberger and Lothar Hölbling (Bibliometrics and Publication Strategies). Their goal is to enhance the accessibility and usability of scientific content for both internal and external stakeholders.

    A key feature of this project is the podcast “Talking Abstracts: Echoes of Knowledge”, which distills complex research findings into engaging content, making it accessible to various audiences. Internally, it helps Communications identify relevant content efficiently, thereby enhancing the university’s outreach efforts. It promotes interdisciplinary networking and supports the rectorate in strategic planning and decision-making.

    “Impacts with Abstracts” integrates AI technology with science communication, enhancing the visualization and dissemination of university research outputs.

  • Fares Kayali | Technical Feasibility for Lab Chatbot CE Lab

    Organisational unit: Centre for Teacher Education – Department for Teacher Education

    Abstract:

    The Computational Empowerment Lab is part of the Digitalisation in Education unit and is located at the Centre for Teacher Education at the University of Vienna. The Lab sees itself as an intellectual and practical space for the realisation of ideas, research and projects to promote computational empowerment. This is done using a wide range of technologies.

    In the first stage of the project, a self-trained chatbot (self-trained text-based AI) will be developed and implemented, which is specially designed for questions about CE-Lab equipment and CE-Lab questions from students. This chatbot is intended to provide students with fast and reliable help by responding to frequently asked questions and problems in the CE-Lab environment. By integrating this chatbot into the CE-Lab, the existing support offer for the use of the various technologies is to be expanded and direct and efficient support for students is to be ensured, resulting in an improved learning experience and a reduction in the workload of teaching staff.

    In the second stage of the project, which is considered a “nice to have”, didactic recommendations around the CE-Lab will be developed on the basis of (own) scientific and didactic literature, specialised articles and brochures (e.g. the Computational Empowerment in Practice brochure). These recommendations are intended to help students gain a deeper understanding of the use and handling of lab technologies.

    The reason for the realisation of this project lies in the existing bias, lack of transparency and lack of comprehensibility of existing models. This project aims to create a pilot for a specialised educational LLM (Large Language Model) that addresses these challenges and provides transparent and comprehensible support in the education sector.

  • Thomas Kohlwein | Unlocking library heritage with data from historic catalogues

    Organisational unit: Faculty of Philological and Cultural Studies – Department of German Studies

    Abstract:

    Libraries hold a treasure trove of historic catalogues describing their collections. But since many of them are in the handwriting style of their times, they are no longer easily accessible to general readers. While many libraries created new catalogue data based on automation to build the online systems we know today, many libraries with an especially large amount of historic material still suffer from low visibility of their collections to the public. This project will use the Computer Vision API to achieve both a higher rate of identification of individual books and a systematic collection of historic metadata: how books were historically arranged by topic, location, etc. and what specific information libraries have on items such as donations or book plates. By making large library collections visible, we unlock the stories hidden in the stacks to appreciate our common heritage.

    Project website: germ.univie.ac.at/projekt/kwk/

  • Ema Kusen | Hybrid Azure-Powered Detection of High-Risk Texts

    Organisational unit: Faculty of Informatics – Research Group Security and Privacy

    Abstract:

    The detection of grievances in social media texts is critical for understanding their role in shaping public discourse and their potential to escalate into collective unrest or harmful actions. Grievances are psychological responses to perceived injustices and have been shown to be precursors to radicalization or mobilization for contentious actions (Scrivens, 2022). This project develops a hybrid AI framework that combines Microsoft Azure’s Cognitive Services, including Text Analytics and the Azure OpenAI Service, with on-premises infrastructure to detect grievances in high-risk social media texts. This approach ensures scalability, flexibility, and compliance with data protection regulations. By integrating Azure Machine Learning for model training and deployment, and utilizing Azure’s Explainable AI tools, the project prioritizes interpretability and transparency in detecting grievance-related patterns. The outcomes will provide actionable insights into grievance propagation and its impact on digital discourse, contributing to computational social science and the ethical design of AI systems. This framework will also serve as a replicable model for combining cloud-based AI and local infrastructure in high-stakes text analysis applications, providing valuable insights for policymakers, platform moderators, and researchers in managing online discourse.

  • Mauricio Martins | Building and Evaluating Historical Large Language Models for Understanding Psychological Dimensions in Fiction

    Organisational unit: Faculty of Psychology – Department of Cognition, Emotion, and Methods in Psychology

    Abstract:

    Large-scale surveys are crucial for understanding the cultural dynamics of populations over time. However, applying these tools to study long-term historical trends remains a recent development. Historical psychology has emerged as a field that uses past fiction as “cognitive fossils” to uncover insights into societal values, emotions, and moral frameworks of bygone eras [1].

    Traditional computational methods, such as bag-of-words and topic modeling, have revealed intriguing trends, such as rising sentiments towards cooperation preceding the French Revolution and English Civil War, followed by a decline [2]. However, these methods fail to capture the contextual and nuanced use of language in historical texts [3]. Advances in large language models (LLMs) now offer tools to overcome these limitations. While foundational LLMs excel in modern language analysis, they are not optimized for historical texts, where linguistic conventions differ. Recent theoretical work proposes that these challenges can be addressed by developing Historical Large Language Models (HLLMs) [4].

    This project aims to develop and compare two approaches to analyzing historical texts:

    1. Fine-Tuning LLMs with Historical Corpora: This involves building HLLMs tailored to English, French, and German fiction from the 16th to 19th centuries [4]. The model’s parameters are modified using techniques like LoRA (Low-Rank Adaptation) and PEFT (Parameter-Efficient Fine-Tuning) to adapt the foundation model efficiently to task-specific needs. Using Azure’s infrastructure, we will fine-tune the open-source Llama 3.3-70B on a pre-processed corpus of ~15 million tokens. The fine-tuning process will leverage an NVIDIA A100 GPU (e.g., NC24ads A100 v4). After fine-tuning, the HLLM will classify texts across psychological dimensions, replicating Martins & Baumard (2020) [2], which used bags of words.
    2. Optimizing Prompt Engineering: This involves leveraging GPT-4, the gold standard tool for text annotation in psychological sciences [5], with refined instructions to analyze historical corpora without retraining [6], such as requesting that the annotation consider the year the text was produced. Prompt optimization will utilize Azure OpenAI Services to refine instructions for the LLMs iteratively. This approach is constrained by the context window and relies on precise and iterative prompt refinement.

    This project will deliver a paper that describes both a methodological advancement and a replication of Martins & Baumard (2020). It will comprehensively evaluate fine-tuning versus prompt engineering across several dimensions:

    a) Comparison Against Student Classifications: The outputs of both approaches will be validated against human-annotated classifications of 3000 sentences. Metrics such as precision, recall, and F1 scores will quantify alignment with human evaluations, providing a precise measure of accuracy.
    b) Comparison Against Bag-of-Words Benchmark: The results from both approaches will be compared with the findings from the Martins & Baumard (2020) [2] study on democratic sentiments, which utilized traditional Bag-of-Words methods.
    c) Comparison of Cost and Time Efficiency
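
    As a rough illustration of the validation metrics named above, the following Python sketch computes precision, recall and F1 for one class by comparing model output with human annotations. It is a minimal sketch, not the project's evaluation code: the function name, the label names and the five example sentences are hypothetical.

```python
def precision_recall_f1(human, model, positive):
    """Precision, recall and F1 for one label, with human labels as ground truth."""
    pairs = list(zip(human, model))
    tp = sum(h == positive and m == positive for h, m in pairs)  # true positives
    fp = sum(h != positive and m == positive for h, m in pairs)  # false positives
    fn = sum(h == positive and m != positive for h, m in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical annotations for five sentences.
human = ["coop", "coop", "other", "coop", "other"]
model = ["coop", "other", "other", "coop", "coop"]

p, r, f1 = precision_recall_f1(human, model, positive="coop")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

    With these made-up labels there are two true positives, one false positive and one false negative, so precision, recall and F1 all come out at two thirds.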

    References

    [1] Baumard, N., Safra, L., Martins, M. & Chevallier, C. Cognitive fossils: Using cultural artifacts to reconstruct psychological changes throughout history. Trends in Cognitive Sciences (2024).

    [2] Martins, M. D. & Baumard, N. The rise of prosociality in fiction preceded democratic revolutions in Early Modern Europe. Proc Natl Acad Sci USA 117, 28684 (2020).

    [3] Martins, M. D. & Baumard, N. How to Develop Reliable Instruments to Measure the Cultural Evolution of Preferences and Feelings in History? Frontiers in Psychology 13, (2022).

    [4] Varnum, M. E. W., Baumard, N., Atari, M. & Gray, K. Large Language Models based on historical text could offer informative tools for behavioral science. Proceedings of the National Academy of Sciences 121, e2407639121 (2024).

    [5] Rathje, S., Mirea, D.-M., Sucholutsky, I., Marjieh, R., Robertson, C. E. & Van Bavel, J. J. GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences 121, e2308950121 (2024).

    [6] Dubourg, E., Thouzeau, V. & Baumard, N. A step-by-step method for cultural annotation by LLMs. Front. Artif. Intell. 7, (2024).

  • Stefan Wagner | EduBot: Evaluating an AI Teaching Assistant for Enhanced Learning Outcomes

    Organisational unit: Faculty of Business, Economics and Statistics – Department of Accounting, Innovation and Strategy

    Abstract:

    We intend to develop a prototype of a teaching assistant chatbot using Microsoft Azure infrastructure. The proposed project aims to integrate cutting-edge AI technologies into the teaching process to enhance student engagement and learning outcomes. By leveraging Azure OpenAI Service, Azure Bot Service, and Azure App Service, the chatbot will be designed to fulfill the core functions of a teaching assistant, providing timely responses to student queries, offering tailored academic guidance, and supporting administrative tasks. It will be trained on teaching materials available to students and provide library access via ResearchGate and JSTOR. The primary goal for 2025 is to develop and evaluate a functional prototype that reliably performs the intended tasks. This initial phase will involve designing, deploying, and refining the chatbot using Microsoft Azure’s robust suite of AI and cloud services. Building upon the insights gained during the prototyping stage, the project’s subsequent phase will focus on investigating the educational impacts of deploying such a chatbot in a real-world classroom environment. By the end of 2025, we aim to initiate a systematic study to observe student interactions with the chatbot and evaluate the resulting learning outcomes. Using carefully crafted experimental variations, the study will identify causal effects of chatbot use on student performance and engagement. This research phase, planned for 2026, will provide empirical evidence regarding the efficacy of AI-powered teaching assistants in higher education.

  • Anna Weichselbraun | Analyzing the language of AI regulation

    Organisational unit: Faculty of Historical and Cultural Studies – Department of European Ethnology

    Abstract:

    The Regulatory Language Analyzer Pilot Project presents a focused investigation into how different jurisdictions conceptualize AI governance through their regulatory frameworks, concentrating specifically on comparative analysis between EU and US approaches. It is intended as a preliminary methodological exploration for an ERC Consolidator Grant proposal to be submitted by the PI in January 2025. This 7-month study leverages Azure's ML infrastructure to develop and validate a specialized language model for analyzing regulatory AI policy documents, serving as a proof-of-concept for larger-scale cross-cultural policy analysis.

    The project employs a streamlined fine-tuning process using Azure OpenAI GPT-3.5 as the base model, focusing on English-language regulatory documents from the EU AI Act and US federal and state-level AI frameworks. Through a single-stage fine-tuning approach, the model will be optimized to identify and analyze key regulatory patterns, linguistic features, and policy concepts specific to AI governance. This targeted approach allows for rapid development and validation of the core analytical capabilities while maintaining sufficient depth for meaningful comparative analysis.

    The research methodology emphasizes efficiency and foundational insights, with data collection and preprocessing focused on creating a high-quality parallel corpus of EU and US regulatory documents. The model's performance will be evaluated through essential metrics including regulatory framework classification accuracy, policy intention recognition, and basic cross-jurisdictional concept alignment. Technical implementation uses Azure ML compute with NC6s_v3 infrastructure, providing the necessary computational power for model development and optimization within the condensed timeframe.

    This pilot study will deliver key insights into how two major regulatory approaches to AI governance differ in their linguistic and conceptual frameworks. The analysis will focus on identifying significant patterns in how these jurisdictions express core concepts such as accountability, transparency, and risk management in AI systems. The findings will provide an empirical foundation for understanding how different legal traditions approach AI governance while establishing a methodological framework that can be expanded to include additional jurisdictions and languages in future research.

    The project's deliverables will include a validated model for regulatory language analysis, documentation of key linguistic and conceptual patterns identified in EU and US frameworks, and recommendations for scaling the analysis to additional jurisdictions. This pilot serves as a crucial first step in developing more comprehensive tools for understanding how cultural and linguistic differences shape AI governance approaches, while providing immediate practical insights for policymakers working on AI regulation in these key jurisdictions.

    The condensed scope enables rapid development and validation of the core analytical framework while establishing a solid foundation for future expansion to additional jurisdictions and more nuanced cross-cultural analysis. This approach balances the need for meaningful results within a limited timeframe with the potential for broader application in future research phases.

Funding 2024

Funded research projects

  • Marina Dütsch | FLEXWEB

    Organisational unit: Department of Meteorology and Geophysics

    Abstract:

    Flexpart (FLEXible PARTicle dispersion model) is a numerical model that simulates the dispersion of gases and aerosols in the atmosphere. The aim of this project was to develop a web service (FLEXWEB) that lets users run Flexpart via a website. Flexpart was to compute trajectories using a Kubernetes cluster and make the results easily accessible to users. An important step was containerising the individual processing steps and coordinating them with the relatively large input data. We successfully got a first version of FLEXWEB running, but we did not manage to make the service scalable so that several users can use it at the same time.

    Flexpart development at the University of Vienna

  • Wolfgang Klas | FactCheck

    Organisational unit: Research Group Multimedia Information Systems, Faculty of Computer Science

    Abstract:

    FactCheck is an ongoing research project of the Research Group Multimedia Information Systems at the Faculty of Computer Science that aims to identify conflicts within web data. Information about these conflicts, which may be present in textual form (e.g. paragraphs in an HTML document) or in multimedia form (e.g. news items in a video), is to be extracted using a combination of approaches from the Semantic Web (e.g. structured data) and modern AI technologies and concepts (e.g. named entity recognition or entity linking). The comparison processes for this information are partly supported by human intelligence and human feedback, which is why approaches to user identities and user management (e.g. Azure Entra ID) are also being examined. For the deployment of the FactCheck prototype, a hybrid approach is being considered that allows the use of both scalable Azure services (e.g. cognitive services such as AI Video Indexer, or user management) and existing local infrastructure (e.g. VMs or databases) at the University of Vienna, in order to reach suitable trade-offs in terms of security, data protection and cost. To keep the deployment highly flexible and modular, parts of it can be containerised, which simplifies the process both on Azure and on local infrastructure.

  • Oliver Wieder | Revolutionizing Olfactory Perception Mapping: A Contrastive Learning Graph Neural Network Approach

    Organisational unit: Department of Pharmaceutical Sciences

    Abstract:

    Abstract to follow

  • Claas Abert | Very Largescale Distributed Micromagnetic Research Tools

    Organisational unit: Physics of Functional Materials

    Abstract:

    The project investigated how well the Microsoft Azure cloud is suited to scientific computing in the field of micromagnetics. To this end, various virtual machines were tested, both with CPUs and with GPUs. A particular focus was on the use of low-cost spot instances, which can, however, occasionally be interrupted. To simplify work with the simulations, helpful tools were developed that automate workflows and thus save time. During the project, micromagnetic simulations for optimising magnonic devices were also carried out successfully and yielded valuable insights. Instead of the originally planned studies on the use of multiple GPUs, the focus was on comprehensively evaluating the cloud solutions. In addition, a bachelor's thesis was written that contributed to the development of the automation tools. The project budget of EUR 2,000 was used almost in full, and the results showed that the Microsoft Azure infrastructure offers great potential for scientific applications.

  • Xin Huang | selscape: Automated and Distributed Pipelines for Investigating the Landscape of Natural Selection from Large-scale Genomic Datasets

    Organisational unit: Department of Evolutionary Anthropology

    Abstract:

    This project developed three Snakemake pipelines for detecting balancing selection, positive selection, and inferring the distribution of fitness effects. Azure Batch was tested for cloud deployment, and the first pipeline was successfully implemented in the cloud. The remaining pipelines are ready for deployment using insights gained from the first pipeline’s testing. Key results include contributions to three studies, showcasing the pipelines' effectiveness in analyzing genomes and exploring genetic diversity. Despite challenges with inadequate documentation for integrating Snakemake with Azure Batch, the project goals were partially achieved, with development carried out conservatively on local servers due to the novelty of cloud integration. Future work will focus on fully deploying all pipelines in the cloud and expanding their applications for large-scale genomic analyses.

  • Dylan Paltra | MULTIREP – Multidimensional Representation: Enabling An Alternative Research Agenda on the Citizen-Politician Relationship

    Organisational unit: Department of Government

    Abstract:

    The “MULTIREP” project aims to enable an alternative approach to studying the citizen-politician relationship. It focuses primarily on how citizens conceptualize representation. A mixed-methods approach combines qualitative methods (focus groups and one-to-one interviews with citizens) and quantitative methods in five countries (approx. 2,000 respondents in each), focusing on natural language processing approaches. In a multinational and multilingual mass survey in five countries, including 10,000 participants, we want to improve the current survey methodology by analyzing respondents’ answers to open-ended questions using different machine-learning approaches. During the funding period, the project team was able to conduct the survey, collecting rich text data from representative samples of the public. The team used the Azure infrastructure to carry out a preliminary analysis of the open-ended text answers by prompting large language models. These results complement a theoretically induced coding scheme, which will be used later in the analysis. Besides the already established dimensions of representation, the team found that citizens conceptualize representation very much in formalistic terms. The team plans to continue using Microsoft Azure to analyze the open-ended text answers thoroughly, making use of not only large language models but also more established natural language processing approaches.

  • Miguel Angel Rios Gaona | Controlled Machine Translation with Large Language Models for the Technical Domain

    Organisational unit: Centre for Translation Studies

    Abstract:

    Large Language Models (LLMs) have shown promising results on machine translation for high-resource language pairs and domains. However, in specialised domains (e.g. medical), LLMs have shown lower performance compared to standard neural machine translation models. The consistency in the machine translation of terminology is crucial for users, researchers, and translators in specialised domains. In this study, we compare the performance of baseline LLMs and instruction-tuned LLMs in the medical domain. In addition, we introduce terminology from specialised medical dictionaries into the instruction-formatted datasets for fine-tuning LLMs. The instruction-tuned LLMs significantly outperform the baseline models on automatic metrics and quality estimation. Moreover, the instruction-tuned LLMs produce fewer errors than the baseline according to automatic error annotation.

Schedule for funding 2024

  • 1 November to 31 December 2023: Application for funding
  • 1 to 14 January 2024: Internal review of applications and any follow-up questions
  • From 16 January 2024: Announcement of the funded projects by e-mail
  • 17 to 31 January 2024: Set-up of the Azure environments by the ZID, onboarding of users
  • 1 February to 31 July 2024: Project execution
  • 1 August to 30 September 2024: Submission of project reports

Contact

If you have questions about the funding, please use the Servicedesk form Anfrage zu Microsoft Azure (enquiry about Microsoft Azure).