Funding for research with Azure services

Microsoft Azure is a platform that provides various cloud services, such as virtual servers and artificial intelligence services. Employees of the University of Vienna may use these services, which are provided by the ZID, for research purposes for a fee under special conditions. More information about Microsoft Azure

In order to support research activities in Azure, the ZID offers financial support for the summer semester 2024. A total of 22,000.00 euros is available. Up to 4,000.00 euros will be awarded per project.

Projects with one of the following characteristics will be prioritised when the funding is awarded:

  • they use Azure services for which the ZID does not offer alternative IT services
  • they work with hybrid approaches (combined use of Azure services with local infrastructure)

Funded research projects

Funding amount 4,000.00 Euro

  • Marina Dütsch | FLEXWEB

    Organisational unit: Department of Meteorology and Geophysics

    Abstract:

    Flexpart (FLEXible PARTicle dispersion model) is a numerical model that simulates the dispersion of gases and aerosols in the atmosphere. The model is being further developed at the Department of Meteorology and Geophysics and is used in various international and national research projects. Use cases include determining greenhouse gas emissions or the transport of microplastics, as well as dispersion calculations for nuclear incidents (e.g. CTBTO).

    To use Flexpart, it has to be installed and run on a (super)computer. This involves hurdles, however: on the one hand, not all scientists have access to a supercomputer, and on the other hand, technical problems often arise during installation or execution. In this project, we therefore want to develop a Flexpart web service (FLEXWEB) that allows Flexpart to be run via a website.

    The project is intended as a test project for a later operational service. Flexpart is to compute trajectories using a Kubernetes cluster in the cloud and make the results easily accessible to users. As soon as a simulation has finished, the output files are to be made available for download and displayed graphically. In this way, we hope to simplify access to Flexpart for scientists worldwide.

    Flexpart development at the University of Vienna

  • Wolfgang Klas | FactCheck

    Organisational unit: Multimedia Information Systems research group, Faculty of Computer Science

    Abstract:

    FactCheck is an internal research project of the Research Group Multimedia Information Systems, Faculty of Computer Science, that aims to compare and signal conflicts in information available on the Web. This information, which may be available in textual form (e.g., paragraphs in an HTML document) or multimedia form (e.g., news segments in video form), shall be extracted using a combination of approaches from the Semantic Web (e.g., structured data) and state-of-the-art AI technologies and concepts (e.g., named entity recognition or entity linking). The comparison processes for this information will be partially driven by human intelligence and human feedback, which is why approaches for user identities and user management (e.g., Azure Entra ID) will also be investigated. For the deployment of the FactCheck prototype(s), a hybrid approach is considered, which allows for the use of both scalable Azure services (e.g., cognitive services like AI Video Indexer and user management) as well as available on-premises infrastructure (e.g., VMs or databases) at the University of Vienna to achieve suitable tradeoffs in terms of security, privacy, and costs. To keep the deployment highly flexible and modular, parts of this deployment may be containerized, thus simplifying deployment on both Azure and local infrastructure. 

  • Oliver Wieder | Revolutionizing Olfactory Perception Mapping: A Contrastive Learning Graph Neural Network Approach

    Organisational unit: Department of Pharmaceutical Sciences

    Abstract:

    This project proposes a groundbreaking approach to understanding olfactory perceptions by developing a novel computational model that maps chemical structures to olfactory characteristics. Leveraging the advanced techniques of contrastive learning and graph neural networks (GNNs), the project aims to overcome the limitations of current olfactory perception studies, which predominantly rely on subjective human olfactory tests. The core objective is to create a GNN model that accurately represents the complex geometries and properties of small molecules in an embedding space. This space will then be used to fine-tune an odor classifier, significantly enhancing its predictive accuracy. A key innovation of this project is the integration of attention mechanisms to elucidate the role of functional groups in odor perception, a facet largely unexplored in existing research. A significant outcome of this project will be the development of an interactive online dashboard. This platform will enable industry professionals and researchers to visualize and interact with the olfactory map, inputting their compounds and receiving insights into their olfactory characteristics. This tool is expected to have substantial applications in various industries, particularly in the development of products like mosquito repellants. Backed by promising literature in the fields of contrastive learning of small molecules and deep-learning approaches to odor mapping, this project stands on the cusp of a significant breakthrough in olfactory science. It promises not only to advance our fundamental understanding of how chemical structures translate into olfactory experiences but also to transform industries that rely on these insights.

Funding amount 2,000.00 Euro

  • Claas Abert | Very Largescale Distributed Micromagnetic Research Tools

    Organisational unit: Physics of Functional Materials

    Abstract:

    In the context of the FWF standalone project "Very Largescale Distributed Micromagnetic Research Tools" (P 34671) we are developing algorithms for the distributed solution of micromagnetic problems on multi-GPU systems. First tests on our group-owned workstation with 4xA100 Nvidia GPUs as well as the VSC5 nodes with 2xA100 GPUs show promising results. However, in order to perform a comprehensive scaling study, we ask for GPU computing hours on the Azure cluster, which features fat GPU nodes with 8xA100 Nvidia GPUs. Our planned study requires 100 hours of the largest GPU VM instance "ND96amsr A100 v4" and will allow us to investigate the scaling of our algorithm both on single instances and on distributed multi-GPU instances. Hence, we request funding of 100 x 32.70 $ = 3,270.00 $ in order to carry out our numerical study.

  • Xin Huang | selscape: Automated and Distributed Pipelines for Investigating the Landscape of Natural Selection from Large-scale Genomic Datasets

    Organisational unit: Department of Evolutionary Anthropology

    Abstract:

    Natural selection plays a pivotal role in evolutionary processes. With the increasing availability of genomic datasets across various species and populations, studying the genomic imprints of natural selection is crucial for understanding evolutionary histories and conserving biodiversity. However, the burgeoning size of these datasets, coupled with the plethora of computational tools available, can overwhelm researchers, especially given the limited computing resources often available for exploring the numerous modes of natural selection. Here, we aim to implement a curated suite of established software tools for detecting and quantifying signals and intensities of natural selection within large-scale genomic datasets. Our proposed pipelines offer a comprehensive, automated analysis workflow, from data preparation to result visualization. Designed for implementation using Snakemake, a versatile workflow management system, these pipelines ensure scalable and reproducible analysis across diverse computing environments, including high-performance computing clusters and cloud infrastructures. Initially developed on our local Life Science Compute Cluster (LiSC), we plan to extend and test these pipelines for cloud deployment via Azure Batch, which provides native support for Snakemake. Our intermediate goal is to apply these pipelines to the UK Biobank dataset, the largest whole-genome dataset in the world, comprising 500,000 genomes. We aim to benchmark our pipelines and investigate the landscape of natural selection within British populations. Finally, the implementation of this workflow on cloud infrastructures can be utilized for analyzing massive genomic datasets from various species, offering new insights into how natural selection shapes the biodiversity of our world.

  • Dylan Paltra | MULTIREP – Multidimensional Representation: Enabling An Alternative Research Agenda on the Citizen-Politician Relationship

    Organisational unit: Department of Government

    Abstract:

    The “MULTIREP” project aims to enable an alternative approach to studying the citizen-politician relationship. It focuses primarily on how citizens conceptualize representation. A mixed-methods approach combines qualitative methods (focus groups and one-to-one interviews with citizens) and quantitative methods in five countries (approx. 2,000 respondents in each), focusing on natural language processing approaches. In a multinational and multilingual mass survey in five countries, including 10,000 participants, we want to improve on current survey methodology by analyzing respondents’ answers in real-time to provide tailored probing questions. We will use several cloud computing instances during the data collection, accessed from the survey platform via web services. To evaluate respondents’ answers, we will implement several NLP algorithms such as language detection, mBERT, and Flesch’s reading ease score, among others. After the data collection, we want to examine the survey answers through different language models like mBERT and our implementation of a large language model (Llama) to classify citizens’ text answers. These models must be additionally trained and fine-tuned based on existing models for our use case. For this, cloud computing instances are necessary, especially with GPU; otherwise, the computation costs would be very high. Llama especially requires a GPU instance. Additionally, we might access the Microsoft Translator Services depending on the developments in our research process. We aim to classify citizens’ answers to our open-ended questions. Here, we want to categorize how citizens conceptualize different dimensions of representation. Additionally, we would like to access Azure’s speaker recognition service to transcribe our focus group and one-to-one interviews. This is standard practice when applying qualitative methods.
    To the best of our knowledge, the real-time evaluation of survey answers by machine learning algorithms has yet to be adopted in current social science research. Therefore, the implications and contributions of this work could be far-reaching, as a successful implementation of our study through functions offered by Azure would open up new avenues in survey implementation for both respondents and researchers. The delivery of the survey through these means would mimic a human-assisted interaction in the questioning and prompting phases of the survey, which would be far more expensive to achieve through traditional channels of computer-assisted web or telephone interviewing. Finally, it would enhance our analytical capabilities on mass-collected open-ended data to a new standard for social science research.

  • Miguel Angel Rios Gaona | Controlled Machine Translation with Large Language Models for the Technical Domain

    Organisational unit: Centre for Translation Studies

    Abstract:

    Current state-of-the-art Neural Machine Translation (NMT) models and Large Language Models (LLMs) have shown promising results on machine translation of high-resource language pairs [5, 3]. However, in a high-risk and low-resource domain, like the technical domain (e.g. clinical notes, or engineering manuals), the accurate translation of terminology and correct document structure is crucial for exchanging information across international healthcare providers or researchers [6]. Moreover, the introduction of terminology and document structure constraints into neural models via instructions is currently an open problem [4, 11]. One example is controlled generation, in which MT outputs translations with the correct medical terms, length, or grammar compared to human translations. Our goal is to incorporate terminology and document structure constraints into an LLM. We plan to add a dictionary of technical terms and in-domain technical data as instructions for fine-tuning a pre-trained model based on FLAN-T5 [11] or LLaMA [10]. We will study different strategies for adding dictionaries and constraints into LLMs, e.g. source constraints and instruction fine-tuning [4, 11]. We will test the proposed model on the English-German and German-English language pairs with medical and scientific paper abstracts [6, 1]. We will evaluate with automatic metrics [7, 8], and in-house human experts [9]. We plan to use one A100 40 GB GPU or V100 32 GB GPU for tuning our proposed model and compare it with related work. We require GPUs to develop our model, NMT baselines, and instruction fine-tune related work (e.g. FLAN-T5).


    Project timeline:

    • NMT and LLM baselines, 01.02.24 to 01.03.24
    • LLM instruction fine-tuning, 02.03.24 to 01.07.24
    • Manual error annotation, 15.06.24 to 31.07.24
    • Draft paper, 01.06.24 to 15.08.24
    • Project report, 01.08.24 to 30.09.24


    Project outcomes:

    • Paper submitted to a peer-reviewed publication;
    • Project report;
    • Open source code and models.


    References:

    1. Alam, M., Kvapilíková, I., Anastasopoulos, A., Besacier, L., Dinu, G., Federico, M., Gallé, M., Jung, K.W., Koehn, P., & Nikoulina, V. (2021). Findings of the WMT Shared Task on Machine Translation Using Terminologies. Conference on Machine Translation.

    2. Alves, D.M., Guerreiro, N.M., Alves, J., Pombal, J.P., Rei, R., Souza, J.G., Colombo, P., & Martins, A. (2023). Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning. ArXiv, abs/2310.13448.

    3. Bawden, R., & Yvon, F. (2023). Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM. European Association for Machine Translation Conferences/Workshops.

    4. Exel, M., Buschbeck-Wolf, B., Brandt, L., & Doneva, S. (2020). Terminology-Constrained Neural Machine Translation at SAP. EAMT.

    5. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F.B., Wattenberg, M., Corrado, G.S., Hughes, M., & Dean, J. (2017). Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Transactions of the Association for Computational Linguistics, 5, 339-351.

    6. Neves, M.L., Jimeno-Yepes, A., Névéol, A., Grozea, C., Siu, A., Kittner, M., & Verspoor, K.M. (2018). Findings of the WMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets. WMT.

    7. Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). Bleu: a Method for Automatic Evaluation of Machine Translation. ACL.

    8. Rei, R., Stewart, C.A., Farinha, A.C., & Lavie, A. (2020). COMET: A Neural Framework for MT Evaluation. EMNLP.

    9. Rios, M., Chereji, R., Secară, A., & Ciobanu, D. (2023). Quality Analysis of Multilingual Neural Machine Translation Systems and Reference Test Translations for the English-Romanian language pair in the Medical Domain. European Association for Machine Translation Conferences.

    10. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. ArXiv, abs/2302.13971.

    11. Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., & Le, Q.V. (2021). Finetuned Language Models Are Zero-Shot Learners. ArXiv, abs/2109.01652.

  • Petro Tolochko | Determining Scientific Uncertainty in Academic Publications

    Organisational unit: Department of Communication

    Abstract:

    Misleading scientific information is increasingly discussed as one of the most pressing challenges to science (Druckman, 2022; Swire-Thompson and Lazer, 2022; West and Bergstrom, 2021). Its threat to planetary and human health “has reached crisis proportions” (West and Bergstrom, 2021, p. 1), and its impact on societies’ and individuals’ reactions to the COVID-19 pandemic has led the WHO to declare an “infodemic” (John, 2020). Research on misleading scientific information is heavily focused on social media (e.g., Renstrom, 2022). However, given that most people still only come in contact with science through its media portrayals (Schäfer et al., 2019), the misrepresentation of scientific information in news coverage might be even more problematic. One central aspect in which scientific findings are misrepresented is the failure to convey uncertainty (Druckman, 2022; Dumas-Mallet et al., 2018; Swire-Thompson and Lazer, 2022). Uncertainty is inherent to the self-correcting nature of science, and scientific findings are always limited by scientists’ decisions regarding sampling and statistical analyses (Gustafson and Rice, 2020). However, the uncertainty of scientific information is often misrepresented in news coverage (Dumas-Mallet et al., 2018; Sumner et al., 2016), and findings are frequently simplified and presented as certain, suggesting causal relationships where researchers describe correlation (Haber et al., 2018). While media logic plays a crucial role in this misrepresentation of scientific information, scholars urge us to acknowledge that the roots might also lie within science (West and Bergstrom, 2021). There are indications that misrepresentation of uncertainty already occurs in scientific articles or related press releases (West and Bergstrom, 2021; Haber et al., 2018). The failure to convey uncertainty has detrimental consequences for science communication. It can leave people misinformed about scientific issues. 
    For example, they might overestimate the effectiveness of new medical discoveries (Dumas-Mallet et al., 2018). Alternatively, it might distort public perceptions of the scientific process. While most scientists understand that uncertainty is an inherent part of the scientific process and there are no “hard facts,” only degrees of plausibility (e.g., Russell, 2013), an average person might not. This misunderstanding might further be exacerbated by overly “deterministic” coverage of scientific evidence in the media. Furthermore, when findings initially presented as certain are not replicated later on (Dumas-Mallet et al., 2018), it might have detrimental effects on people’s trust in science. Thus far, there is little empirical evidence on the prevalence of uncertainty in science and science communication. Specifically, there is no systematic analysis of how the communication of scientific (un)certainty differs across a) different scientific disciplines and b) platforms of science communication (i.e., academic studies, press releases, news coverage). A large amount of data needs to be analyzed to fill these gaps. Thus, in this study, we will develop an automated method of measuring the concept of “uncertainty” in texts. We will then use this method to analyze the prevalence of (un)certainty in a large sample of scientific studies, their related press releases, and news coverage. We select studies from all major research disciplines. The contribution of our study is thus three-fold: first, it would be the first to provide a large-scale, comprehensive analysis of the role of (un)certainty in science communication, adding a comparative perspective across disciplines and platforms. Second, by linking scientific studies and their related press releases and news coverage, we will create a unique dataset that will be used to explain at what stages of science communication (study, press release, news coverage) the degree of (un)certainty changes.
    Lastly, the measurement of (un)certainty will be a valuable tool in future research as the concept is of high relevance in science communication and other fields such as crisis communication (Sellnow and Seeger, 2021; O’Malley, 2012) and political science (e.g., Manski, 2013).
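
The funding amounts listed above can be cross-checked against the budget stated at the top of this page. The short Python sketch below uses only figures from this page (three projects at 4,000.00 euros, five projects at 2,000.00 euros, a total of 22,000.00 euros, and a maximum of 4,000.00 euros per project); the variable names are purely illustrative.

```python
# Amount awarded in euros -> number of funded projects (counted from the list above).
awards = {4000: 3, 2000: 5}

TOTAL_BUDGET = 22000      # euros available for the summer semester 2024
MAX_PER_PROJECT = 4000    # upper limit per project

# Sum of all awards across both funding tiers.
total_awarded = sum(amount * count for amount, count in awards.items())

# No single award may exceed the per-project limit.
within_limit = all(amount <= MAX_PER_PROJECT for amount in awards)

print(total_awarded, total_awarded == TOTAL_BUDGET, within_limit)
```

Under these figures, the awards add up exactly to the available budget.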

Application requirements

The applicant must:

  • be currently employed by the University of Vienna and have an active u:account
  • be entitled to order Microsoft 365 via the self-service portal
  • accept the Privacy Policy and Terms of Use for Microsoft Azure; see the Servicedesk form Microsoft Azure bestellen (ordering Microsoft Azure, in German)

Conditions of funding

  • The ZID's Coordination Digital Transformation team decides on funding. If necessary, it consults with peer reviewers.
  • The funding granted per project will be deducted monthly, on a pro rata basis, from the Azure costs incurred during the period of use until 31.07.2024.
  • Costs that exceed the granted funding amount or are incurred after the end of the funding must be covered by a cost centre available for the project.
  • The ZID is responsible for setting up the project environment in Azure, for onboarding and assigning user authorisations. Support for the technical implementation of the project is not offered.
  • Personnel resources are explicitly not funded.
  • After the end of the funding period, the Azure environment provided and the resources contained therein remain available to users. Subsequent use of the services is possible and desired.
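
To illustrate how the granted funding and a project's own cost centre might share the Azure costs, the following Python sketch works through a hypothetical example. It assumes the simplest reading of the conditions above: the grant is drawn down month by month until it is used up, and any remainder is charged to the cost centre. The exact pro rata rule applied by the ZID is not spelled out on this page, and the function name `split_costs` and the monthly figures are invented for illustration.

```python
from decimal import Decimal


def split_costs(monthly_costs, grant):
    """Split each month's Azure cost into a funded part and a cost-centre part.

    Assumption (not confirmed by the funding conditions): the grant is
    consumed in chronological order until nothing is left.
    """
    remaining = Decimal(grant)
    result = []
    for cost in monthly_costs:
        cost = Decimal(cost)
        funded = min(cost, remaining)       # part covered by the grant
        remaining -= funded
        result.append((funded, cost - funded))  # rest goes to the cost centre
    return result


# Hypothetical project: 1,500 euros of Azure costs in each of three months,
# with a grant of 4,000 euros.
for funded, own in split_costs(["1500", "1500", "1500"], "4000"):
    print(f"funded: {funded} EUR, cost centre: {own} EUR")
```

In this hypothetical case, the first two months are fully covered, while 500 euros of the third month would have to be paid from the project's cost centre.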

Timetable

  • 01.11.–31.12.2023: Application for funding
  • 01.–14.01.2024: Internal review of applications and possible queries
  • From 16.01.2024: Announcement of funded projects by e-mail
  • 17.01.–31.01.2024: Setup of Azure environments by the ZID, onboarding of users
  • 01.02.–31.07.2024: Implementation of the projects
  • 01.08.–30.09.2024: Submission of project reports

Applying for funding

The application deadline for funding has passed.

Contact

If you have any questions about funding, please use the Servicedesk form Anfrage zu Microsoft Azure (enquiry about Azure, in German).