Funding for research with Azure services
Microsoft Azure is a platform that provides a range of cloud services, such as virtual servers or artificial intelligence services. Employees of the University of Vienna can use these services for research at special conditions, subject to charge, via the ZID.
More information about Microsoft Azure
To support research activities in Azure, the ZID is offering financial funding for the 2025 calendar year. A total of EUR 20,000.00 is available; up to EUR 5,000.00 is awarded per project.
Projects with one of the following characteristics are prioritised when the funding is awarded:
- They use hybrid approaches (Azure services combined with local infrastructure)
- They use Azure services with artificial intelligence
- They use Azure services for which the ZID offers no alternative IT services
Application requirements
Applicants must:
- have a current employment relationship with the University of Vienna and an active u:account
- be authorised to order Microsoft 365 via the self-service portal
- accept the data protection provisions and terms of use for Microsoft Azure; see the Servicedesk form Microsoft Azure bestellen (order Microsoft Azure)
Funding conditions
- The ZID's Team Coordination Digital Transformation decides on the funding, consulting peer reviewers where necessary.
- The funding amount granted per project is deducted monthly, on a pro-rata basis, from the Azure costs incurred over the usage period until 31.12.2025 (a worked sketch follows this list).
- Costs that exceed the granted funding amount, or that arise after the funding ends, must be covered by a cost centre available to the project.
- The ZID is responsible for setting up the project environment in Azure, for onboarding and for assigning user permissions. Support with the technical implementation of the project itself is not offered.
- Staff costs are explicitly not funded.
- Projects that already received funding in 2024 are excluded.
- After the funding ends, the provided Azure environment and the resources it contains remain available to users. Continued use of the services is possible and encouraged.
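For illustration, the sketch below shows one plausible reading of the monthly pro-rata deduction. It is a minimal example, assuming the grant is spread evenly over the project's remaining months of 2025 and that a month's deduction never exceeds that month's actual Azure costs; neither assumption is confirmed by the ZID, and all figures are invented.

```
# Minimal sketch of one plausible pro-rata reading (assumptions, not ZID policy):
# the grant is split evenly over the remaining months, and each month's
# deduction is capped at that month's actual Azure costs.

def monthly_deductions(grant_eur: float, monthly_costs_eur: list[float]) -> list[float]:
    """Return the amount deducted from the grant in each month."""
    share = grant_eur / len(monthly_costs_eur)   # even pro-rata share, e.g. 11 months from February
    remaining = grant_eur
    deductions = []
    for cost in monthly_costs_eur:
        d = min(share, cost, remaining)          # never more than the share, the costs, or what is left
        deductions.append(round(d, 2))
        remaining -= d
    return deductions

# Invented example: a EUR 5,000.00 grant for a project running February-December 2025
print(monthly_deductions(5000.00, [600, 450, 500, 480, 470, 520, 400, 430, 410, 460, 440]))
```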
Timeline
- 28.10.–31.12.2024: Application for funding
- 01.–12.01.2025: Internal review of applications and possible queries
- From 13.01.2025: Announcement of the funded projects by e-mail; set-up of the Azure environments by the ZID and onboarding of users
- From 20.01.2025: Project implementation
- September 2025: Submission of the interim report
- December 2025: Submission of the final report
Apply for funding
Use the form Förderung beantragen (apply for funding; login required). Applications can be submitted until 31.12.2024.
Note
If you would like to change your application, please fill in the form again by 31.12.2024.
Funding 2024
Funded research projects
Funding amount: EUR 4,000.00
-
Claas Abert | Very Largescale Distributed Micromagnetic Research Tools
Organisational unit: Institut Physik Funktioneller Materialien
Abstract:
In the context of the FWF stand-alone project "Very Largescale Distributed Micromagnetic Research Tools" (P 34671), we are developing algorithms for the distributed solution of micromagnetic problems on multi-GPU systems. First tests on our group-owned workstation with 4x NVIDIA A100 GPUs as well as on the VSC5 nodes with 2x A100 GPUs show promising results. However, in order to perform a comprehensive scaling study, we ask for GPU computing hours on the Azure cluster, which features fat GPU nodes with 8x NVIDIA A100 GPUs. Our planned study requires 100 hours of the largest GPU VM instance "ND96amsr A100 v4" and will allow us to investigate the scaling of our algorithm both on single instances and across distributed multi-GPU instances. Hence, we ask for funding of 100 x $32.70 = $3,270.00 in order to carry out our numerical study.
Funding amount: EUR 2,000.00
-
Xin Huang | selscape: Automated and Distributed Pipelines for Investigating the Landscape of Natural Selection from Large-scale Genomic Datasets
Organisational unit: Department für Evolutionäre Anthropologie
Abstract:
Natural selection plays a pivotal role in evolutionary processes. With the increasing availability of genomic datasets across various species and populations, studying the genomic imprints of natural selection is crucial for understanding evolutionary histories and conserving biodiversity. However, the burgeoning size of these datasets, coupled with the plethora of computational tools available, can overwhelm researchers, especially given the limited computing resources often available for exploring the numerous modes of natural selection. Here, we aim to implement a curated suite of established software tools for detecting and quantifying signals and intensities of natural selection within large-scale genomic datasets. Our proposed pipelines offer a comprehensive, automated analysis workflow, from data preparation to result visualization. Designed for implementation using Snakemake, a versatile workflow management system, these pipelines ensure scalable and reproducible analysis across diverse computing environments, including high-performance computing clusters and cloud infrastructures. Initially developed on our local Life Science Compute Cluster (LiSC), we plan to extend and test these pipelines for cloud deployment via Azure Batch, for which Snakemake provides native support. Our intermediate goal is to apply these pipelines to the UK Biobank dataset, the largest whole-genome dataset in the world, comprising 500,000 genomes. We aim to benchmark our pipelines and investigate the landscape of natural selection within British populations. Finally, the implementation of this workflow on cloud infrastructures can be utilized for analyzing massive genomic datasets from various species, offering new insights into how natural selection shapes the biodiversity of our world.
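To make the pipeline structure concrete, here is a hypothetical, heavily simplified Snakemake sketch in the scatter-gather style the abstract describes. The population labels, file paths and the selection_scan command are invented placeholders, not part of the selscape project:

```
# Hypothetical Snakefile sketch, not the selscape pipeline: one selection scan
# per population, gathered by a final target rule. All names are placeholders.
POPS = ["GBR", "CEU"]

rule all:
    input:
        expand("results/{pop}.selection.tsv", pop=POPS)

rule scan_selection:
    input:
        vcf="data/{pop}.vcf.gz"
    output:
        "results/{pop}.selection.tsv"
    shell:
        "selection_scan --vcf {input.vcf} --out {output}"
```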
-
Dylan Paltra | MULTIREP – Multidimensional Representation: Enabling An Alternative Research Agenda on the Citizen-Politician Relationship
Organisational unit: Institut für Staatswissenschaft
Abstract:
The “MULTIREP” project aims to enable an alternative approach to studying the citizen-politician relationship. It focuses primarily on how citizens conceptualize representation. A mixed-methods approach combines qualitative methods (focus groups and one-to-one interviews with citizens) and quantitative methods in five countries (ca. 2,000 respondents in each), focusing on natural language processing approaches. In a multinational and multilingual mass survey in five countries, including 10,000 participants, we want to improve on current survey methodology by analyzing respondents’ answers in real time to provide tailored probing questions. We will use several cloud computing instances during the data collection, accessed from the survey platform via web services. To evaluate respondents’ answers, we will implement several NLP algorithms such as language detection, mBERT, and Flesch’s reading ease score, among others. After the data collection, we want to examine the survey answers through different language models like mBERT and our implementation of a large language model (Llama) to classify citizens’ text answers. These models must be additionally trained and fine-tuned based on existing models for our use case. For this, cloud computing instances are necessary, especially with GPU; otherwise, the computation costs would be very high. Llama especially requires a GPU instance. Additionally, we might access the Microsoft Translator Services depending on the developments in our research process. We aim to classify citizens’ answers to our open-ended questions. Here, we want to categorize how citizens conceptualize different dimensions of representation. Additionally, we would like to access Azure’s speaker recognition service to transcribe our focus group and one-to-one interviews. This is standard practice when applying qualitative methods. To the best of our knowledge, the real-time evaluation of survey answers by machine learning algorithms has yet to be adopted in current social science research. Therefore, the implications and contributions of this work could be far-reaching, as a successful implementation of our study through functions offered by Azure would open up new avenues in survey implementation for both respondents and researchers. The delivery of the survey through these means would mimic a human-assisted interaction in the questioning and prompting phases of the survey, which would be far more expensive to achieve through traditional channels of computer-assisted web or telephone interviewing. Finally, it would enhance our analytical capabilities on mass-collected open-ended data to a new standard for social science research.
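Of the real-time checks named above, Flesch's reading ease is simple enough to show in full. Below is a minimal sketch of the standard English formula (206.835 − 1.015 · words per sentence − 84.6 · syllables per word); the syllable counter is a crude vowel-group heuristic, and the project's actual multilingual implementation would necessarily differ:

```
import re

def count_syllables(word: str) -> int:
    # Crude approximation: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Standard English Flesch formula; higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```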
-
Miguel Angel Rios Gaona | Controlled Machine Translation with Large Language Models for the Technical Domain
Organisational unit: Zentrum für Translationswissenschaft
Abstract:
Current state-of-the-art Neural Machine Translation (NMT) models and Large Language Models (LLM) have shown promising results on machine translation of high-resource language pairs [5, 3]. However, in a high-risk and low-resource domain like the technical domain (e.g. clinical notes or engineering manuals), the accurate translation of terminology and correct document structure is crucial for exchanging information across international healthcare providers or researchers [6]. Moreover, the introduction of terminology and document structure constraints into neural models via instructions is currently an open problem [4, 11]: for example, controlling generation so that MT output translations match human translations in medical terminology, length, or grammar. Our goal is to incorporate terminology and document structure constraints into an LLM. We plan to add a dictionary of technical terms and in-domain technical data as instructions for fine-tuning a pre-trained model based on FLAN-T5 [11] or LLaMA [10]. We will study different strategies for adding dictionaries and constraints into LLMs, e.g. source constraints and instruction fine-tuning [4, 11]. We will test the proposed model on the English-German and German-English language pairs with medical and scientific paper abstracts [6, 1]. We will evaluate with automatic metrics [7, 8] and in-house human experts [9]. We plan to use one A100 40 GB GPU or V100 32 GB GPU for tuning our proposed model and compare it with related work. We require GPUs to develop our model, NMT baselines, and instruction fine-tune related work (e.g. FLAN-T5).
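To illustrate what constraints via instructions can look like, one common pattern (cf. [4, 11]) inlines dictionary entries into the fine-tuning prompt. The sketch below is an invented example, not necessarily the authors' prompt format; the term pair and sentence are made up:

```
# Invented illustration of a terminology-constrained instruction prompt;
# the term dictionary and the source sentence are placeholders.
TERMS = {"myocardial infarction": "Myokardinfarkt"}

def build_instruction(src: str, terms: dict[str, str]) -> str:
    constraints = "; ".join(f'translate "{s}" as "{t}"' for s, t in terms.items())
    return (
        "Translate the following English text into German. "
        f"Terminology constraints: {constraints}.\n"
        f"English: {src}\nGerman:"
    )

print(build_instruction("The patient suffered a myocardial infarction.", TERMS))
```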
Project timeline:
- NMT and LLM baselines, 01.02.24 to 01.03.24
- LLM instruction fine-tuning, 02.03.24 to 01.07.24
- Manual error annotation, 15.06.24 to 31.07.24
- Draft paper, 01.06.24 to 15.08.24
- Project report, 01.08.24 to 30.09.24
Project outcomes:
- Paper submitted to a peer-reviewed publication;
- Project report;
- Open source code and models.
References:
1. Alam, M., Kvapilíková, I., Anastasopoulos, A., Besacier, L., Dinu, G., Federico, M., Gallé, M., Jung, K.W., Koehn, P., & Nikoulina, V. (2021). Findings of the WMT Shared Task on Machine Translation Using Terminologies. Conference on Machine Translation.
2. Alves, D.M., Guerreiro, N.M., Alves, J., Pombal, J.P., Rei, R., Souza, J.G., Colombo, P., & Martins, A. (2023). Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning. ArXiv, abs/2310.13448.
3. Bawden, R., & Yvon, F. (2023). Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM. European Association for Machine Translation Conferences/Workshops.
4. Exel, M., Buschbeck-Wolf, B., Brandt, L., & Doneva, S. (2020). Terminology-Constrained Neural Machine Translation at SAP. EAMT.
5. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F.B., Wattenberg, M., Corrado, G.S., Hughes, M., & Dean, J. (2017). Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Transactions of the Association for Computational Linguistics, 5, 339-351.
6. Neves, M.L., Jimeno-Yepes, A., Névéol, A., Grozea, C., Siu, A., Kittner, M., & Verspoor, K.M. (2018). Findings of the WMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets. WMT.
7. Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). Bleu: a Method for Automatic Evaluation of Machine Translation. ACL.
8. Rei, R., Stewart, C.A., Farinha, A.C., & Lavie, A. (2020). COMET: A Neural Framework for MT Evaluation. EMNLP.
9. Rios, M., Chereji, R., Secară, A., & Ciobanu, D. (2023). Quality Analysis of Multilingual Neural Machine Translation Systems and Reference Test Translations for the English-Romanian language pair in the Medical Domain. European Association for Machine Translation Conferences.
10. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. ArXiv, abs/2302.13971.
11. Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., & Le, Q.V. (2021). Finetuned Language Models Are Zero-Shot Learners. ArXiv, abs/2109.01652.
-
Petro Tolochko | Determining Scientific Uncertainty in Academic Publications
Organisational unit: Institut für Publizistik- und Kommunikationswissenschaft
Abstract:
Misleading scientific information is increasingly discussed as one of the most pressing challenges to science (Druckman, 2022; Swire-Thompson and Lazer, 2022; West and Bergstrom, 2021). Its threat to planetary and human health “has reached crisis proportions” (West and Bergstrom, 2021, p. 1), and its impact on societies’ and individuals’ reactions to the COVID-19 pandemic has led the WHO to declare an “infodemic” (John, 2020). Research on misleading scientific information is heavily focused on social media (e.g., Renstrom, 2022). However, given that most people still only come in contact with science through its media portrayals (Schäfer et al., 2019), the misrepresentation of scientific information in news coverage might be even more problematic. One central aspect in which scientific findings are misrepresented is the failure to convey uncertainty (Druckman, 2022; Dumas-Mallet et al., 2018; Swire-Thompson and Lazer, 2022). Uncertainty is inherent to the self-correcting nature of science, and scientific findings are always limited by scientists’ decisions regarding sampling and statistical analyses (Gustafson and Rice, 2020). However, the uncertainty of scientific information is often misrepresented in news coverage (Dumas-Mallet et al., 2018; Sumner et al., 2016), and findings are frequently simplified and presented as certain, suggesting causal relationships where researchers describe correlation (Haber et al., 2018). While media logic plays a crucial role in this misrepresentation of scientific information, scholars urge us to acknowledge that the roots might also lie within science (West and Bergstrom, 2021). There are indications that misrepresentation of uncertainty already occurs in scientific articles or related press releases (West and Bergstrom, 2021; Haber et al., 2018). The failure to convey uncertainty has detrimental consequences for science communication. It can leave people misinformed about scientific issues. For example, they might overestimate the effectiveness of new medical discoveries (Dumas-Mallet et al., 2018). Alternatively, it might distort public perceptions of the scientific process. While most scientists understand that uncertainty is an inherent part of the scientific process and there are no “hard facts,” only degrees of plausibility (e.g., Russell, 2013), an average person might not. This misunderstanding might further be exacerbated by overly “deterministic” coverage of scientific evidence in the media. Furthermore, when findings initially presented as certain are not replicated later on (Dumas-Mallet et al., 2018), it might have detrimental effects on people’s trust in science. So far, there is only limited empirical evidence on the prevalence of uncertainty in science and science communication. Specifically, there is no systematic analysis of how the communication of scientific (un)certainty differs across a) different scientific disciplines and b) platforms of science communication (i.e., academic studies, press releases, news coverage). A large amount of data needs to be analyzed to fill these gaps. Thus, in this study, we will develop an automated method of measuring the concept of “uncertainty” in texts. We will then use this method to analyze the prevalence of (un)certainty in a large sample of scientific studies, their related press releases, and news coverage. We select studies from all major research disciplines.
The contribution of our study is thus three-fold: first, it would be the first to provide a large-scale, comprehensive analysis of the role of (un)certainty in science communication, adding a comparative perspective across disciplines and platforms. Second, by linking scientific studies and their related press releases and news coverage, we will create a unique dataset that will be used to explain at what stages of science communication (study, press release, news coverage) the degree of (un)certainty changes. Lastly, the measurement of (un)certainty will be a valuable tool in future research as the concept is of high relevance in science communication and other fields such as crisis communication (Sellnow and Seeger, 2021; O’Malley, 2012) and political science (e.g., Manski, 2013).
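As a baseline intuition for what measuring (un)certainty in text can mean, the sketch below scores a text by its share of hedging terms; this is an invented illustration, not the automated method the project will develop:

```
# Invented baseline, not the project's method: a naive dictionary-based
# hedging score as a first intuition for measuring (un)certainty in text.
import re

HEDGES = {"may", "might", "could", "suggests", "possibly", "appears",
          "likely", "uncertain", "preliminary", "estimate"}

def hedging_score(text: str) -> float:
    """Share of tokens that are hedging terms (0.0 = no hedges found)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in HEDGES for t in tokens) / len(tokens)

print(hedging_score("The results suggest the treatment may reduce risk."))
```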
Timeline: funding 2024
- 01.11.–31.12.2023: Application for funding
- 01.–14.01.2024: Internal review of applications and possible queries
- From 16.01.2024: Announcement of the funded projects by e-mail
- 17.01.–31.01.2024: Set-up of the Azure environments by the ZID, onboarding of users
- 01.02.–31.07.2024: Project implementation
- 01.08.–30.09.2024: Submission of the project reports
Contact
If you have questions about the funding, please use the Servicedesk form Anfrage zu Microsoft Azure (enquiry about Microsoft Azure).