Alan Ramponi | Publications

A list of peer-reviewed publications and associated materials is presented below. You can also check my Google Scholar page. Note that NLP, and AI more broadly, are conference-driven fields. For top-tier venues on the subject, see this page.

Selected publications

NAACL

Fine-grained Fallacy Detection with Human Label Variation

Ramponi, A., Daffara, A., and Tonelli, S.

In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

We introduce FAINA, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. FAINA includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian about migration, climate change, and public health given by two expert annotators. Through an extensive annotation study that allowed discussion over multiple rounds, we minimize annotation errors whilst keeping signals of human label variation. Moreover, we devise a framework that goes beyond single ground truth evaluation and simultaneously accounts for multiple (equally reliable) test sets and the peculiarities of the task, i.e., partial span matches, overlaps, and the varying severity of labeling errors. Our experiments across four fallacy detection setups show that multi-task and multi-label transformer-based approaches are strong baselines across all settings. We release our data, code, and annotation guidelines to foster research on fallacy detection and human label variation more broadly.
TACL

Language Varieties of Italy: Technology Challenges and Opportunities

Ramponi, A.

Transactions of the Association for Computational Linguistics, 2024

Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its speakers. However, most local languages and dialects in Italy are at risk of disappearing within a few generations. The NLP community has recently begun to engage with endangered languages, including those of Italy. Yet, most efforts assume that these varieties are under-resourced language monoliths with an established written form and homogeneous functions and needs, and thus highly interchangeable with each other and with high-resource, standardized languages. In this paper, we introduce the linguistic context of Italy and challenge the default machine-centric assumptions of NLP for Italy’s language varieties. We advocate for a shift in the paradigm from machine-centric to speaker-centric NLP, and provide recommendations and opportunities for work that prioritizes languages and their speakers over technological advances. To facilitate the process, we finally propose building a local community towards responsible, participatory efforts aimed at supporting vitality of languages and dialects of Italy.
ACL

Variationist: Exploring Multifaceted Variation and Bias in Written Language Data

Ramponi, A.*, Casula, C.*, and Menini, S.

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2024

Exploring and understanding language data is a fundamental stage in all areas dealing with human language. It allows NLP practitioners to uncover quality concerns and harmful biases in data before training, and helps linguists and social scientists to gain insight into language use and human behavior. Yet, there is currently a lack of a unified, customizable tool to seamlessly inspect and visualize language variation and bias across multiple variables, language units, and diverse metrics that go beyond descriptive statistics. In this paper, we introduce Variationist, a highly-modular, extensible, and task-agnostic tool that fills this gap. Variationist handles at once a potentially unlimited combination of variable types and semantics across diversity and association metrics with regard to the language unit of choice, and orchestrates the creation of up to five-dimensional interactive charts for over 30 variable type-semantics combinations. Through our case studies on computational dialectology, human label variation, and text generation, we show how Variationist enables researchers from different disciplines to effortlessly answer specific research questions or unveil undesired associations in language data. A Python library, code, documentation, and tutorials are made publicly available to the research community.
EACL

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

Van Der Goot, R., Üstün, A., Ramponi, A., Sharaf, I., and Plank, B.

In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 2021 — Outstanding paper award

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation.
COLING

Neural Unsupervised Domain Adaptation in NLP---A Survey

Ramponi, A., and Plank, B.

In Proceedings of the 28th International Conference on Computational Linguistics, 2020

Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques which do not require labeled target domain data. This is a more challenging yet a more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future NLP.

All publications

EVALITA

FadeIT at EVALITA 2026: Overview of the Fallacy Detection in Italian Social Media Texts Task

Ramponi, A., and Tonelli, S.

In Proceedings of the Ninth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, 2026

FadeIT is the first shared task on fallacy detection in social media texts in Italian, an understudied language for this task. FadeIT relies on Faina, a fallacy detection dataset that includes span-level annotations with overlaps for 20 fallacy types in social media texts about migration, climate change, and public health over a 4-year time period. The shared task is articulated into two subtasks at different granularities: i) post-level fallacy detection, aiming at predicting the fallacy types expressed in each input post, and ii) span-level fallacy detection, aiming at predicting all text segments expressing any given fallacy type in each input post. Participants' systems are evaluated against two equally valid gold standards (i.e., parallel annotations in Faina) to account for natural disagreement, in line with recent work advocating the importance of considering human label variation in subjective tasks. FadeIT has attracted wide interest at Evalita 2026 with a total of 25 runs submitted by 7 participant teams. In this paper, we present the task setup, including the data used and the evaluation criteria, as well as the results obtained by all participant teams, an analysis of their approaches, and insights for future research on the topic.
EMNLP

Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches

Ramponi, A.*, Rovera, M.*, Moro, R., and Tonelli, S.

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the task monolingually, i.e., having both the input and the retrieved claims in the same language. However, especially for languages with a limited availability of fact-checks and in case of global narratives, such as pandemics, wars, or international politics, it is crucial to be able to retrieve claims across languages. In this work, we examine strategies to improve the multilingual and crosslingual performance, namely selection of negative examples (in the supervised setting) and re-ranking (in the unsupervised setting). We evaluate all approaches on a dataset containing posts and claims in 47 languages (283 language combinations). We observe that the best results are obtained by using LLM-based re-ranking, followed by fine-tuning with negative examples sampled using a sentence similarity-based strategy. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
EMNLP

Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions

Savoldi, B., Ramponi, A., Negri, M., and Bentivogli, L.

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Converging societal and technical factors have transformed language technologies into user-facing applications used by the general public across languages. Machine Translation (MT) has become a global tool, with cross-lingual services now also supported by dialogue systems powered by multilingual Large Language Models (LLMs). Widespread accessibility has extended MT's reach to a vast base of lay users, many with little to no expertise in the languages or the technology itself. And yet, the understanding of MT consumed by such a diverse group of users (their needs, experiences, and interactions with multilingual systems) remains limited. In our position paper, we first trace the evolution of MT user profiles, focusing on non-experts and how their engagement with technology may shift with the rise of LLMs. Building on an interdisciplinary body of work, we identify three factors (usability, trust, and literacy) that are central to shaping user interactions and must be addressed to align MT with user needs. By examining these dimensions, we provide insights to guide the progress of more user-centered MT.
CLiC-it

WorthIt: Check-worthiness Estimation of Italian Social Media Posts

Daffara, A., Ramponi, A., and Tonelli, S.

In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), 2025

Check-worthiness estimation is the first, and a paramount, task in the automated fact-checking pipeline. It allows professional fact-checkers to cope with the increasing amount of mis/disinformative textual content being published online by prioritizing claims that are factual/verifiable and worthy of verification. Despite the long tradition of check-worthiness estimation in NLP, there is currently a lack of annotated resources and associated methods for Italian. Moreover, current datasets typically cover a single topic and focus on a limited time frame, affecting models' generalizability on out-of-distribution data. To fill these gaps, in this paper we introduce WorthIt, the first annotated dataset for factuality/verifiability and check-worthiness estimation of Italian social media posts that covers public discourse on migration, climate change, and public health issues across a large time period of six years. We describe the dataset creation in detail and conduct thorough experimentation with the WorthIt dataset using a wide array of encoder- and decoder-based models. Our results show that fine-tuning monolingual encoder-based models in a multi-task setting provides the best overall performance and that decoder-based models in a few-shot setup still struggle to capture the relation between factuality/verifiability and check-worthiness. We release our dataset, code, and associated materials to the research community.
ArgMining@ACL

ARG2ST at CQs-Gen 2025: Critical Questions Generation through LLMs and Usefulness-based Selection

Ramponi, A., Genoni, G., and Tonelli, S.

In Proceedings of the 12th Argument Mining Workshop, 2025

Critical questions (CQs) generation for argumentative texts is a key task to promote critical thinking and counter misinformation. In this paper, we present a two-step approach for CQs generation that i) uses a large language model (LLM) for generating candidate CQs, and ii) leverages a fine-tuned classifier for ranking and selecting the top-k most useful CQs to present to the user. We show that such usefulness-based CQs selection consistently improves the performance over the standard application of LLMs. Our system was designed in the context of a shared task on CQs generation hosted at the 12th Workshop on Argument Mining, and represents a viable approach to encourage future developments on CQs generation. Our code is made available to the research community.
W-NUT@NAACL

Proceedings of the Tenth Workshop on Noisy and User-generated Text

Bak, J., Van Der Goot, R., Jang, H., Buaphet, W., Ramponi, A., Xu, W., and Ritter, A.

In Proceedings of the Tenth Workshop on Noisy and User-generated Text, 2025

The W-NUT 2025 workshop focuses on a core set of natural language processing tasks on top of noisy and user-generated text, such as those found on social media, web forums and online reviews. The internet has democratized content creation leading to an explosion of informal user-generated text, publicly available in electronic format, motivating the need for NLP on noisy text to enable new data analytics applications. We have received a total of 18 main workshop submissions, of which 16 are included in the proceedings. The workshop will be held in hybrid in-person and virtual modes. We have two invited speakers: Su Lin Blodgett and Verena Blaschke, who have generously agreed to share their ongoing research work. We are very thankful to have them in our workshop. We would like to thank the Program Committee members who reviewed the papers, as well as all of the workshop participants for submitting their work.
Preprint

Generative AI Practices, Literacy, and Divides: An Empirical Analysis in the Italian Context

Savoldi, B., Attanasio, G., Gorodetskaya, O., Marchiori Manerba, M., Bassignana, E., Casola, S., Negri, M., Caselli, T., Bentivogli, L., Ramponi, A., Muti, A., Balbo, N., and Nozza, D.

arXiv preprint, 2025

The rise of Artificial Intelligence (AI) language technologies, particularly generative AI (GenAI) chatbots accessible via conversational interfaces, is transforming digital interactions. While these tools hold societal promise, they also risk widening digital divides due to uneven adoption and low awareness of their limitations. This study presents the first comprehensive empirical mapping of GenAI adoption, usage patterns, and literacy in Italy, based on newly collected survey data from 1,906 Italian-speaking adults. Our findings reveal widespread adoption for both work and personal use, including sensitive tasks like emotional support and medical advice. Crucially, GenAI is supplanting other technologies to become a primary information source: this trend persists despite low user digital literacy, posing a risk as users struggle to recognize errors or misinformation. Moreover, we identify a significant gender divide, particularly pronounced in older generations, where women are half as likely to adopt GenAI and use it less frequently than men. While we find literacy to be a key predictor of adoption, it only partially explains this disparity, suggesting that other barriers are at play. Overall, our data provide granular insights into the multipurpose usage of GenAI, highlighting the dual need for targeted educational initiatives and further investigation into the underlying barriers to equitable participation that competence alone cannot explain.
IJCoL

Introduction to the Special Issue on Natural Language for Artificial Intelligence in the Era of LLMs

Bassignana, E., Brunato, D., Polignano, M., and Ramponi, A.

IJCoL - Italian Journal of Computational Linguistics, 2024

The rapid advancement of Large Language Models (LLMs) has revolutionized the field of Natural Language Processing (NLP) and Artificial Intelligence (AI) in recent years. Transformer-based models such as GPT-3 and BERT have demonstrated remarkable capabilities in modeling and generating human-like text. These models have not only redefined the potential of AI systems but also revolutionized applications across a broad spectrum, including machine translation, sentiment analysis, question answering, and beyond. While the era of LLMs has significantly expanded the horizons of AI, it has also presented critical challenges in effectively and responsibly harnessing their capabilities. This special issue was conceived as a dedicated platform for exploring the latest advancements, methodologies, and applications of LLMs. It sought to foster collaboration and knowledge exchange within the NLP and AI communities, encouraging researchers and practitioners to address real-world challenges while also delving into the theoretical foundations of language in machine learning. The issue builds on discussions initiated at the NL4AI 2023 workshop held at AIxIA 2023 (Bassignana et al. 2023), where many contributions showcased the innovative use of LLMs for natural language understanding and generation tasks. The papers in this volume reflect a diverse and rich body of research, spanning a wide array of topics. They investigate cutting-edge applications, introduce novel methodologies, and provide theoretical insights into the use of LLMs, while also reflecting on the integration of multimodal information and cognitively inspired frameworks to tackle complex problems. More specifically, the issue covers multimodal approaches, domain-specific adaptations, and efficient model deployment. The contributions highlight both opportunities and challenges in advancing natural language technologies.
EMNLP

Delving into Qualitative Implications of Synthetic Data for Hate Speech Detection

Casula, C., Vecellio Salto, S., Ramponi, A., and Tonelli, S.

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The use of synthetic data for training models for a variety of NLP tasks is now widespread. However, previous work reports mixed results with regard to its effectiveness on highly subjective tasks such as hate speech detection. In this paper, we present an in-depth qualitative analysis of the potential and specific pitfalls of synthetic data for hate speech detection in English, with 3,500 manually annotated examples. We show that, across different models, synthetic data created through paraphrasing gold texts can improve out-of-distribution robustness from a computational standpoint. However, this comes at a cost: synthetic data fails to reliably reflect the characteristics of real-world data on a number of linguistic dimensions, it results in drastically different class distributions, and it heavily reduces the representation of both specific identity groups and intersectional hate.
CLiC-it

When You Doubt, Abstain: A Study of Automated Fact-checking in Italian Under Domain Shift

Valer, G., Ramponi, A., and Tonelli, S.

In Proceedings of the 9th Italian Conference on Computational Linguistics, 2023

Data for building fact-checking models for Italian is scarce, often contains ambiguous claims, and lacks textual diversity. This makes it hard to reliably apply such tools in the real world to support fact-checkers' work. In this paper, we propose a categorization of claim ambiguity and label the largest Italian test set based on it. Moreover, we create challenge sets across two axes of variation: genres and fact-checking sources. Our experiments using transformer-based semantic search show a large drop in performance under domain shift, and indicate the benefit of models' abstention in case of lacking evidence.
NL4AI

Preface to the Seventh Workshop on Natural Language for Artificial Intelligence (NL4AI 2023)

Bassignana, E., Brunato, D., Polignano, M., and Ramponi, A.

In Proceedings of the Seventh Workshop on Natural Language for Artificial Intelligence co-located with the 22nd International Conference of the Italian Association for Artificial Intelligence, 2023

The Natural Language for Artificial Intelligence (NL4AI) workshop, supported by the Special Interest Group on NLP of the Italian Association for Artificial Intelligence (AIxIA) and by the Italian Association of Computational Linguistics (AILC), aims at providing a broad overview of recent activities in the field of Human Language Technologies (HLT) in Italy. Since its first edition in 2017, the workshop has served as a platform for researchers to exchange experiences and insights on research and applications at the intersection of Natural Language Processing (NLP) and Artificial Intelligence (AI). As in previous years, the current edition of the workshop was co-located with the International Conference of the Italian Association for Artificial Intelligence (AIxIA 2023), which took place on November 6–7 in Rome, Italy. The program of the meeting is available on the official workshop website.
EVALITA

GeoLingIt at EVALITA 2023: Overview of the Geolocation of Linguistic Variation in Italy Task

Ramponi, A., and Casula, C.

In Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, 2023

GeoLingIt is the first shared task on geolocation of linguistic variation in Italy from social media posts comprising content in language varieties other than standard Italian (i.e., regional Italian, and languages and dialects of Italy). The task is articulated into two subtasks of increasing complexity for which only textual content is allowed: i) coarse-grained geolocation, aiming at predicting the region in which the variety expressed in the post is spoken, and ii) fine-grained geolocation, aiming at predicting its exact coordinates. Both tasks can be either at the country level (standard track) or restricted to a linguistic area of choice (special track). GeoLingIt has attracted wide interest at the Evalita 2023 evaluation campaign with 37 registrations and 35 submitted runs. In this paper, we present the task and data, the evaluation criteria, the participants' results, an analysis of their approaches, and the main insights from the shared task.
EVALITA

HaSpeeDe3 at EVALITA 2023: Overview of the Political and Religious Hate Speech Detection Task

Lai, M., Celli, F., Ramponi, A., Tonelli, S., Bosco, C., and Patti, V.

In Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, 2023

The Hate Speech Detection (HaSpeeDe3) task is the third edition of a shared task on the detection of hateful content in Italian tweets. It differs from the previous editions while maintaining continuity in analysing and contrasting hate speech (HS) on social media. While HaSpeeDe and HaSpeeDe2 focused on HS against immigrants, Muslims and Roma, HaSpeeDe3 explores hate speech in strongly polarised debates, concerning in particular politics and religion. It is articulated in two different tasks: A) In-domain political hate speech detection and B) Cross-domain hate speech detection about political and religious tweets. Task A consists of two different subtasks for which participants i) can only use the provided textual content of the tweet, or ii) can additionally employ contextual information about the tweet and its author. In Task B, which consists of two subtasks, participants are allowed to use any kind of external data for detecting hate speech in tweets about i) politics and ii) religion. Six teams from both academia and industry participated in the evaluation, with a total of 13 submitted runs for Task A and 16 for Task B.
VarDial@EACL

DiatopIt: A Corpus of Social Media Posts for the Study of Diatopic Language Variation in Italy

Ramponi, A., and Casula, C.

In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

We introduce DiatopIt, the first corpus specifically focused on diatopic language variation in Italy for language varieties other than Standard Italian. DiatopIt comprises over 15K geolocated social media posts from Twitter over a period of two years, including regional Italian usage and content fully written in local language varieties or exhibiting code-switching with Standard Italian. We detail how we tackled key challenges in creating such a resource, including the absence of orthography standards for most local language varieties and the lack of reliable language identification tools. We assess the representativeness of DiatopIt across time and space, and show that the density of non-Standard Italian content across areas correlates with actual language use. We finally conduct computational experiments and find that modeling diatopic variation on highly multilingual areas such as Italy is a complex task even for recent language models.
NAACL

Features or Spurious Artifacts? Data-centric Baselines for Fair and Robust Hate Speech Detection

Ramponi, A., and Tonelli, S.

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Avoiding reliance on dataset artifacts to predict hate speech is a cornerstone of robust and fair hate speech detection. In this paper we critically analyze lexical biases in hate speech detection via a cross-platform study, disentangling various types of spurious and authentic artifacts and analyzing their impact on out-of-distribution fairness and robustness. We experiment with existing approaches and propose simple yet surprisingly effective data-centric baselines. Our results on English data across four platforms show that distinct spurious artifacts require different treatments to ultimately attain both robustness and fairness in hate speech detection. To encourage research in this direction, we release all baseline models and the code to compute artifacts, pointing it out as a complementary and necessary addition to the data statements practice.
SemEval@NAACL

DH-FBK at SemEval-2022 Task 4: Leveraging Annotators' Disagreement and Multiple Data Views for Patronizing Language Detection

Ramponi, A., and Leonardelli, E.

In Proceedings of the 16th International Workshop on Semantic Evaluation, 2022

The subtle and typically unconscious use of patronizing and condescending language (PCL) in large-audience media outlets undesirably feeds stereotypes and strengthens power-knowledge relationships, perpetuating discrimination towards vulnerable communities. Due to its subjective and subtle nature, PCL detection is an open and challenging problem, both for computational methods and human annotators. In this paper we describe the systems submitted by the DH-FBK team to SemEval-2022 Task 4, aiming at detecting PCL towards vulnerable communities in English media texts. Motivated by the subjectivity of human interpretation, we propose to leverage annotators' uncertainty and disagreement to better capture the shades of PCL in a multi-task, multi-view learning framework. Our approach achieves competitive results, largely outperforming baselines and ranking on the top-left side of the leaderboard on both PCL identification and classification. Notably, our approach does not rely on any external data or model ensemble, making it a viable and attractive solution for real-world use.
PeerJ Comp Sci

Addressing Religious Hate Online: From Taxonomy Creation to Automated Detection

Ramponi, A., Testa, B., Tonelli, S., and Jezek, E.

PeerJ Computer Science, 2022

Abusive language in online social media is a pervasive and harmful phenomenon which calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A currently underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by the different annotation schemes that available datasets use, which inevitably lead to poor repurposing of data in wider contexts. Furthermore, religious hate is very much dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailored to religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. This scheme builds on a wider and highly interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages, English and Italian, that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection on low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.
W-NUT@EMNLP

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization

Van Der Goot, R., Ramponi, A., Zubiaga, A., Plank, B., Muller, B., San Vicente Roncal, I., Ljubešić, N., Çetinoğlu, Ö., Mahendra, R., Çolakoğlu, T., Baldwin, T., Caselli, T., and Sidorenko, W.

In Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

Lexical normalization is the task of transforming an utterance into its standardized form. This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation. Such variation is typical for social media on which information is shared in a multitude of ways, including diverse languages and code-switching. Since the seminal work of Han and Baldwin (2011) a decade ago, lexical normalization has attracted attention in English and multiple other languages. However, a common benchmark for comparing systems across languages with a homogeneous data and evaluation setup is still lacking. The MultiLexNorm shared task sets out to fill this gap. We provide the largest publicly available multilingual lexical normalization benchmark including 13 language variants. We propose a homogenized evaluation setup with both intrinsic and extrinsic evaluation. As extrinsic evaluation, we use dependency parsing and part-of-speech tagging with adapted evaluation metrics (a-LAS, a-UAS, and a-POS) to account for alignment discrepancies. The shared task hosted at W-NUT 2021 attracted 9 participants and 18 submissions. The results show that neural normalization systems outperform the previous state-of-the-art system by a large margin. Downstream parsing and part-of-speech tagging performance is positively affected but to varying degrees, with improvements of up to 1.72 a-LAS, 0.85 a-UAS, and 1.54 a-POS for the winning system.
NAACL

From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding

Van Der Goot, R., Sharaf, I., Imankulova, A., Üstün, A., Stepanović, M., Ramponi, A., Khairunnisa, S. O., Komachi, M., and Plank, B.

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
EACL

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

Van Der Goot, R., Üstün, A., Ramponi, A., Sharaf, I., and Plank, B.

In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 2021 — Outstanding paper award

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation.
Front Cell Dev Biol

Reconstruction of the Cytokine Signaling in Lysosomal Storage Diseases by Literature Mining and Network Analysis

Parolo, S., Tomasoni, D., Bora, P., Ramponi, A., Kaddi, C., Azer, K., Domenici, E., Neves-Zaph, S., and Lombardo, R.

Frontiers in Cell and Developmental Biology, 2021

Lysosomal storage diseases (LSDs) are characterized by the abnormal accumulation of substrates in tissues due to the deficiency of lysosomal proteins. Among the numerous clinical manifestations, chronic inflammation has been consistently reported for several LSDs. However, the molecular mechanisms involved in the inflammatory response are still not completely understood. In this study, we performed text-mining and systems biology analyses to investigate the inflammatory signals in three LSDs characterized by sphingolipid accumulation: Gaucher disease, Acid Sphingomyelinase Deficiency (ASMD), and Fabry Disease. We first identified the cytokines linked to the LSDs, and then built on the extracted knowledge to investigate the inflammatory signals. We found numerous transcription factors that are putative regulators of cytokine expression in a cell-specific context, such as the signaling axes controlled by STAT2, JUN, and NR4A2 as candidate regulators of the monocyte Gaucher disease cytokine network. Overall, our results suggest the presence of a complex inflammatory signaling in LSDs involving many cellular and molecular players that could be further investigated as putative targets of anti-inflammatory therapies.
COLING

Neural Unsupervised Domain Adaptation in NLP: A Survey

Ramponi, A., and Plank, B.

In Proceedings of the 28th International Conference on Computational Linguistics, 2020

Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques which do not require labeled target domain data. This is a more challenging yet more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future NLP.
EMNLP

Biomedical Event Extraction as Sequence Labeling

Ramponi, A., Van Der Goot, R., Lombardo, R., and Plank, B.

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model. BeeSL recasts the task as sequence labeling, taking advantage of a multi-label aware encoding strategy and jointly modeling the intermediate tasks via multi-task learning. BeeSL is fast, accurate, end-to-end, and unlike current methods does not require any external knowledge base or preprocessing tools. BeeSL outperforms the current best system (Li et al., 2019) on the Genia 2011 benchmark by 1.57% absolute F1 score reaching 60.22% F1, establishing a new state of the art for the task. Importantly, we also provide first results on biomedical event extraction without gold entity information. Empirical results show that BeeSL's speed and accuracy make it a viable approach for large-scale real-world scenarios.
LREC

Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing

Van Der Goot, R., Ramponi, A., Caselli, T., Cafagna, M., and De Mattei, L.

In Proceedings of the 12th Language Resources and Evaluation Conference, 2020

Lexical normalization is the task of translating non-standard social media data to a standard form. Previous work has shown that this is beneficial for many downstream tasks in multiple languages. However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data. In this paper, we discuss the creation of a lexical normalization dataset for Italian. After two rounds of annotation, a Cohen's kappa score of 78.64 is obtained. During this process, we also analyze the inter-annotator agreement for this task, which is only rarely done on datasets for lexical normalization, and when it is reported, the analysis usually remains shallow. Furthermore, we utilize this dataset to train a lexical normalization model and show that it can be used to improve dependency parsing of social media data. All annotated data and the code to reproduce the results are available at: http://bitbucket.org/robvanderg/normit.
LREC

Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

Ramponi, A., Plank, B., and Lombardo, R.

In Proceedings of the 12th Language Resources and Evaluation Conference, 2020

Biomedical event extraction is crucial for automatically extracting information from the ever-growing body of biomedical literature. Despite advances in the methods in recent years, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied to out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985.
IEEE Access

High-Precision Biomedical Relation Extraction for Reducing Human Curation Efforts in Industrial Applications

Ramponi, A., Giampiccolo, S., Tomasoni, D., Priami, C., and Lombardo, R.

IEEE Access, 2020

The body of biomedical literature is growing at an unprecedented rate, exceeding the ability of researchers to make effective use of this knowledge-rich amount of information. This growth has created interest in biomedical relation extraction approaches to extract domain-specific knowledge for diverse applications. Despite the great progress in the techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of Shared Tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. However, in industrial applications relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central to cutting down the manual curation effort, and thus to translating the research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than relying on application-specific training data. As a result, the system could be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining an F1 score comparable or superior to other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how it could be applied at a large scale.

Other

Knowledge Extraction from Biomedical Literature with Symbolic and Deep Transfer Learning Methods

Ramponi, A., Ph.D. thesis. University of Trento, Italy, 2021