Natural Language Processing Academic Digest [12.20]

cs.CL, 22 papers in total today

Transformer (1 paper)

【1】 Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations
Link: https://arxiv.org/abs/2112.09174

Authors: Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, Jishen Zhao
Affiliations: University of California San Diego; Facebook AI Research; University of California, Berkeley
Note: Accepted by AAAI 2022
Abstract: Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks. Theoretical results show that both are Turing-complete and can represent any context-free language (CFL). In practice, it is often observed that Transformer models have better representation power than LSTM. But the reason is barely understood. We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns. To achieve this goal, we introduce an oracle training paradigm, which forces the decomposition of the latent representation of LSTM and the Transformer and supervises with the transitions of the Pushdown Automaton (PDA) of the corresponding CFL. With the forced decomposition, we show that the performance upper bounds of LSTM and Transformer in learning CFL are close: both of them can simulate a stack and perform stack operations along with state transitions. However, the absence of forced decomposition leads to the failure of LSTM models to capture the stack and stack operations, while having a marginal impact on the Transformer model. Lastly, we connect the experiment on the prototypical PDA to a real-world parsing task to re-verify the conclusions.

QA | VQA | Question Answering | Dialogue (2 papers)

【1】 Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering
Link: https://arxiv.org/abs/2112.09658

Authors: Jiayu Ding, Siyuan Wang, Qin Chen, Zhongyu Wei
Note: 10 pages including references, 4 figures
Abstract: Recent years have witnessed impressive advances in challenging multi-hop QA tasks. However, these QA models may fail when faced with some disturbance in the input text, and their interpretability for conducting multi-hop reasoning remains uncertain. Previous adversarial attack works usually edit the whole question sentence, which has limited effect on testing the entity-based multi-hop inference ability. In this paper, we propose a multi-hop reasoning chain based adversarial attack method. We formulate the multi-hop reasoning chains starting from the query entity to the answer entity in the constructed graph, which allows us to align the question to each reasoning hop and thus attack any hop. We categorize the questions into different reasoning types and adversarially modify the part of the question corresponding to the selected reasoning hop to generate the distracting sentence. We test our adversarial scheme on three QA models on the HotpotQA dataset. The results demonstrate significant performance reduction on both answer and supporting facts prediction, verifying the effectiveness of our reasoning chain based attack method for multi-hop reasoning models and their vulnerability. Our adversarial re-training further improves the performance and robustness of these models.

【2】 WebGPT: Browser-assisted question-answering with human feedback
Link: https://arxiv.org/abs/2112.09332

Authors: Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Affiliation: OpenAI
Note: 30 pages
Abstract: We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
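
The rejection sampling described here is the familiar best-of-n trick: sample several candidate answers and keep the one the reward model scores highest. A minimal sketch, with generate_answer and reward_model as hypothetical stand-ins for the fine-tuned policy and the learned preference model:

    def best_of_n_answer(question, generate_answer, reward_model, n=16):
        """Sample n candidate answers, keep the one the reward model prefers."""
        candidates = [generate_answer(question) for _ in range(n)]
        scores = [reward_model(question, answer) for answer in candidates]
        return candidates[max(range(n), key=lambda i: scores[i])]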

Semantic Analysis (2 papers)

【1】 Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling
Link: https://arxiv.org/abs/2112.09488

Authors: Duc-Vu Nguyen, Linh-Bao Vo, Ngoc-Linh Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Affiliations: Multimedia Communications Laboratory, University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Note: In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021)
Abstract: Chinese word segmentation and part-of-speech tagging are necessary tasks in terms of computational linguistics and applications of natural language processing. Many researchers still debate the demand for Chinese word segmentation and part-of-speech tagging in the deep learning era. Nevertheless, resolving ambiguities and detecting unknown words are challenging problems in this field. Previous studies on joint Chinese word segmentation and part-of-speech tagging mainly follow the character-based tagging model focusing on modeling n-gram features. Unlike previous works, we propose a neural model named SpanSegTag for joint Chinese word segmentation and part-of-speech tagging following span labeling, in which the probability of each n-gram being a word with a part-of-speech tag is the main problem. We use the biaffine operation over the left and right boundary representations of consecutive characters to model the n-grams. Our experiments show that our BERT-based model SpanSegTag achieves competitive performance on the CTB5, CTB6, and UD datasets, and significant improvements on the CTB7 and CTB9 benchmark datasets compared with the current state-of-the-art methods using BERT or ZEN encoders.
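
The biaffine operation mentioned above scores each n-gram from the representations of its left and right boundary characters. A minimal PyTorch sketch of such a scorer; the dimensions and the bias-feature trick are illustrative assumptions, not the paper's exact configuration:

    import torch

    class BiaffineSpanScorer(torch.nn.Module):
        def __init__(self, hidden_dim, num_labels):
            super().__init__()
            # One bilinear form per label; the extra dimension is a bias feature.
            self.weight = torch.nn.Parameter(
                torch.randn(num_labels, hidden_dim + 1, hidden_dim + 1))

        def forward(self, left, right):
            # left, right: (batch, hidden_dim) boundary representations of a span.
            ones = left.new_ones(left.size(0), 1)
            left = torch.cat([left, ones], dim=-1)
            right = torch.cat([right, ones], dim=-1)
            # Returns (batch, num_labels): one score per candidate label.
            return torch.einsum('bi,lij,bj->bl', left, self.weight, right)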

【2】 Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets
Link: https://arxiv.org/abs/2112.09237

Authors: Michael Saxon, Xinyi Wang, William Yang Wang
Affiliation: University of California, Santa Barbara
Note: 5 pages, 4 figures, 2 tables
Abstract: Natural language inference (NLI) is an important task for producing useful models of human language. Unfortunately, large-scale NLI dataset production relies on crowdworkers who are prone to introduce biases in the sentences they write. In particular, without quality control they produce hypotheses from which the relational label can be predicted, without the premise, better than chance. We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of the hypotheses in NLI datasets, from which interventions and additional rounds of labeling can be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
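
One plausible reading of the bias-cluster procedure, sketched under assumptions (the embedding model, cluster count, and threshold below are all illustrative): cluster hypothesis embeddings and flag clusters whose label distribution is far from chance, i.e. hypotheses whose wording alone predicts the label.

    import numpy as np
    from sklearn.cluster import KMeans

    def find_bias_clusters(hypothesis_embeddings, labels, k=50, threshold=0.6):
        # labels: integer-coded NLI labels (0, 1, 2); chance is about 1/3.
        labels = np.asarray(labels)
        cluster_ids = KMeans(n_clusters=k, n_init=10).fit_predict(hypothesis_embeddings)
        biased = []
        for c in range(k):
            cluster_labels = labels[cluster_ids == c]
            if cluster_labels.size == 0:
                continue
            counts = np.bincount(cluster_labels)
            if counts.max() / counts.sum() > threshold:  # majority label dominates
                biased.append(c)
        return cluster_ids, biased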

Graph | Knowledge Graph | Knowledge (3 papers)

【1】 KGBoost: A Classification-based Knowledge Base Completion Method with Negative Sampling
Link: https://arxiv.org/abs/2112.09340

Authors: Yun-Cheng Wang, Xiou Ge, Bin Wang, C.-C. Jay Kuo
Affiliations: University of Southern California, Los Angeles, USA; National University of Singapore
Abstract: Knowledge base completion is formulated as a binary classification problem in this work, where an XGBoost binary classifier is trained for each relation using relevant links in knowledge graphs (KGs). The new method, named KGBoost, adopts a modularized design and attempts to find hard negative samples so as to train a powerful classifier for missing link prediction. We conduct experiments on multiple benchmark datasets, and demonstrate that KGBoost outperforms state-of-the-art methods across most datasets. Furthermore, as compared with models trained by end-to-end optimization, KGBoost works well under the low-dimensional setting so as to allow a smaller model size.
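
The per-relation setup is straightforward to picture: entity-pair embeddings become features for a binary XGBoost classifier, with corrupted triples as negatives. A rough sketch; the embeddings and the simple random corruption below stand in for the paper's modular design and hard-negative mining:

    import numpy as np
    import xgboost as xgb

    def train_relation_classifier(pos_pairs, entity_emb, seed=0):
        # pos_pairs: (head_id, tail_id) pairs observed for one relation.
        rng = np.random.default_rng(seed)
        neg_pairs = [(h, rng.integers(len(entity_emb))) for h, _ in pos_pairs]
        pairs = list(pos_pairs) + neg_pairs
        X = np.array([np.concatenate([entity_emb[h], entity_emb[t]]) for h, t in pairs])
        y = np.array([1] * len(pos_pairs) + [0] * len(neg_pairs))
        model = xgb.XGBClassifier(n_estimators=200, max_depth=6)
        model.fit(X, y)
        return model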

【2】 Link-Intensive Alignment for Incomplete Knowledge Graphs
Link: https://arxiv.org/abs/2112.09266

Authors: Vinh Van Tong, Thanh Trung Huynh, Thanh Tam Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen, Quyet Thang Huynh
Affiliations: Hanoi University of Science and Technology, Vietnam; Griffith University, Australia; The University of Queensland, Australia
Abstract: Knowledge graph (KG) alignment - the task of recognizing entities referring to the same thing in different KGs - is recognized as one of the most important operations in the field of KG construction and completion. However, existing alignment techniques often assume that the input KGs are complete and isomorphic, which is not true due to the real-world heterogeneity in domain, size, and sparsity. In this work, we address the problem of aligning incomplete KGs with representation learning. Our KG embedding framework exploits two feature channels: transitivity-based and proximity-based. The former captures the consistency constraints between entities via translation paths, while the latter captures the neighbourhood structure of KGs via an attention-guided relation-aware graph neural network. The two feature channels are jointly learned to exchange important features between the input KGs while enforcing the output representations of the input KGs in the same embedding space. Also, we develop a missing-links detector that discovers and recovers the missing links in the input KGs during the training process, which helps mitigate the incompleteness issue and thus improve the compatibility of the learned representations. The embeddings then are fused to generate the alignment result, and the high-confidence matched node pairs are updated to the pre-aligned supervision data to improve the embeddings gradually. Empirical results show that our model is up to 15.2% more accurate than the SOTA and is robust against different levels of incompleteness. We also demonstrate that the knowledge exchanged between the KGs helps reveal unseen facts from the knowledge graphs (a.k.a. knowledge completion), with results 3.5% higher than SOTA knowledge graph completion techniques.

【3】 Two-view Graph Neural Networks for Knowledge Graph Completion
Link: https://arxiv.org/abs/2112.09231

Authors: Vinh Tong, Dai Quoc Nguyen, Dinh Phung, Dat Quoc Nguyen
Affiliations: VinAI Research, Vietnam; Oracle Labs, Australia; Monash University, Australia
Abstract: In this paper, we introduce a novel GNN-based knowledge graph embedding model, named WGE, to capture entity-focused graph structure and relation-focused graph structure. In particular, given the knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. In addition, WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as nodes. WGE then proposes a new architecture of utilizing two vanilla GNNs directly on these two single graphs to better update vector representations of entities and relations, followed by a weighted score function to return the triple scores. Experimental results show that WGE obtains state-of-the-art performance on CoDEx, three new and challenging benchmark datasets for knowledge graph completion.
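
The two views are easy to construct explicitly. A sketch with networkx; whether a relation node is created per relation type or per triple occurrence (as below) is an assumption, not necessarily the paper's choice:

    import networkx as nx

    def build_two_views(triples):
        # triples: iterable of (head, relation, tail)
        entity_view, relation_view = nx.Graph(), nx.Graph()
        for h, r, t in triples:
            entity_view.add_edge(h, t)              # entities as nodes
            rel_node = ("rel", r, h, t)             # relation as an extra node
            relation_view.add_edge(h, rel_node)
            relation_view.add_edge(rel_node, t)
        return entity_view, relation_view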

Summarization | Information Extraction (1 paper)

【1】 Topic-Aware Encoding for Extractive Summarization
Link: https://arxiv.org/abs/2112.09572

Authors: Mingyang Song, Liping Jing
Affiliation: Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
Note: 4 pages, 0 figures
Abstract: Document summarization provides an instrument for faster understanding of a collection of text documents and has several real-life applications. With the growth of online text data, numerous summarization models have been proposed recently. The Sequence-to-Sequence (Seq2Seq) based neural summarization model is the most widely used in the summarization field due to its high performance: semantic information and structure information in the text are adequately considered when encoding. However, existing extractive summarization models pay little attention to the central topic information that could assist the generation of summaries, so they cannot ensure that the generated summary stays under the primary topic. A lengthy document can span several topics, and a single summary cannot do justice to all the topics. Therefore, the key to generating a high-quality summary is determining the central topic and building a summary based on it, especially for a long document. We propose a topic-aware encoding for document summarization to deal with this issue. This model effectively combines syntactic-level and topic-level information to build a comprehensive sentence representation. Specifically, a neural topic model is added to the neural-based sentence-level representation learning to adequately consider the central topic information for capturing the critical content in the original document. The experimental results on three public datasets show that our model outperforms the state-of-the-art models.

Reasoning | Analysis | Understanding | Explanation (1 paper)

【1】 Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations
Link: https://arxiv.org/abs/2112.09669

Authors: Siddhant Arora, Danish Pruthi, Norman Sadeh, William W. Cohen, Zachary C. Lipton, Graham Neubig
Affiliations: Carnegie Mellon University; Google AI
Note: AAAI 2022
Abstract: In attempts to "explain" predictions of machine learning models, researchers have proposed hundreds of techniques for attributing predictions to features that are deemed important. While these attributions are often claimed to hold the potential to improve human "understanding" of the models, surprisingly little work explicitly evaluates progress towards this aspiration. In this paper, we conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. They are challenged both to simulate the model on fresh reviews, and to edit reviews with the goal of lowering the probability of the originally predicted class. Successful manipulations would lead to an adversarial example. During the training (but not the test) phase, input spans are highlighted to communicate salience. Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control. For the BERT-based classifier, popular local explanations do not improve their ability to reduce the model confidence over the no-explanation case. Remarkably, when the explanation for the BERT model is given by the (global) attributions of a linear model trained to imitate the BERT model, people can effectively manipulate the model.

Semi-/Weakly-/Unsupervised | Uncertainty (1 paper)

【1】 Expedition: A System for the Unsupervised Learning of a Hierarchy of Concepts
Link: https://arxiv.org/abs/2112.09348

Author: Omid Madani
Affiliation: Cisco Secure Workload, Palo Alto, CA
Abstract: We present a system for bottom-up cumulative learning of myriad concepts corresponding to meaningful character strings, and their part-related and prediction edges. The learning is self-supervised in that the concepts discovered are used as predictors as well as targets of prediction. We devise an objective for segmenting with the learned concepts, derived from comparing to a baseline prediction system, that promotes making and using larger concepts, which in turn allows for predicting larger spans of text, and we describe a simple technique to promote exploration, i.e. trying out newly generated concepts in the segmentation process. We motivate and explain a layering of the concepts, to help separate the (conditional) distributions learnt among concepts. The layering of the concepts roughly corresponds to a part-whole concept hierarchy. With rudimentary segmentation and learning algorithms, the system is promising in that it acquires many concepts (tens of thousands in our small-scale experiments), and it learns to segment text well: when fed with English text with spaces removed, starting at the character level, much of what is learned respects word or phrase boundaries, and over time the average number of "bad" splits within segmentations, i.e. splits inside words, decreases as larger concepts are discovered and the system learns when to use them during segmentation. We report on promising experiments when the input text is converted to binary and the system begins with only two concepts, "0" and "1". The system is transparent, in the sense that it is easy to tell what the concepts learned correspond to, and which ones are active in a segmentation, or how the system "sees" its input. We expect this framework to be extensible and we discuss the current limitations and a number of directions for enhancing the learning and inference capabilities.

Recognition / Classification (2 papers)

【1】 Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
Link: https://arxiv.org/abs/2112.09301

Authors: Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schaefer, Tharindu Ranasinghe, Marcos Zampieri, Durgesh Nandini, Amit Kumar Jaiswal
Affiliations: University of Hildesheim, Germany; LDRP-ITR, Gandhinagar, India; University of Duisburg-Essen, Germany; Indian Institute of Science, Bangalore, India; DA-IICT, Gandhinagar, India; University of Wolverhampton, United Kingdom; Rochester Institute of Technology, USA
Abstract: The widespread presence of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. For the evaluation of these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. This subtrack has two sub-tasks. Task A is a binary classification problem (Hate and Not Offensive) offered for all three languages. Task B is a fine-grained classification problem with three classes (HATE speech, OFFENSIVE, and PROFANITY), offered for English and Hindi. Overall, 652 runs were submitted by 65 teams. The best classification algorithms for Task A achieve F1 measures of 0.91, 0.78 and 0.83 for Marathi, Hindi and English, respectively. This overview presents the tasks and the data development as well as the detailed results. The systems submitted to the competition applied a variety of technologies. The best performing algorithms were mainly variants of transformer architectures.

【2】 Continual Learning for Monolingual End-to-End Automatic Speech Recognition
Link: https://arxiv.org/abs/2112.09427

Authors: Steven Vander Eeckt, Hugo Van hamme
Affiliation: KU Leuven, Department of Electrical Engineering ESAT-PSI, Leuven, Belgium
Note: Submitted to ICASSP 2022. 5 pages, 1 figure
Abstract: Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.
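
The "closes the gap by more than 40%" figure corresponds to a simple normalization between the two bounds. A sketch of how such a number is computed (not code from the paper):

    def gap_closed(cl_score, finetuned_score, joint_score):
        """Fraction of the lower/upper-bound gap recovered by a CL method."""
        return (cl_score - finetuned_score) / (joint_score - finetuned_score)

    # e.g. gap_closed(0.72, 0.60, 0.80) -> 0.6, i.e. 60% of the gap closed.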

Retrieval (1 paper)

【1】 Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking
Link: https://arxiv.org/abs/2112.09628

Authors: Jheng-Hong Yang, Xueguang Ma, Jimmy Lin
Affiliation: David R. Cheriton School of Computer Science, University of Waterloo
Note: 8 pages, 1 figure
Abstract: Sparse lexical representation learning has demonstrated much progress in improving passage retrieval effectiveness in recent models such as DeepImpact, uniCOIL, and SPLADE. This paper describes a straightforward yet effective approach for sparsifying lexical representations for passage retrieval, building on SPLADE by introducing a top-$k$ masking scheme to control sparsity and a self-learning method to coax masked representations to mimic unmasked representations. A basic implementation of our model is competitive with more sophisticated approaches and achieves a good balance between effectiveness and efficiency. The simplicity of our methods opens the door for future explorations in lexical representation learning for passage retrieval.
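
Top-$k$ masking itself is a one-liner over a SPLADE-style vocabulary-sized weight vector, and the self-learning objective pushes the masked vector toward the unmasked one. A minimal sketch; the MSE form of the mimic loss is an assumption:

    import torch

    def top_k_mask(weights, k):
        # weights: (batch, vocab_size) non-negative lexical weights.
        mask = torch.zeros_like(weights)
        mask.scatter_(-1, torch.topk(weights, k, dim=-1).indices, 1.0)
        return weights * mask  # keep only the k largest weights per text

    def mimic_loss(weights, k):
        # Coax the masked representation to mimic the unmasked one.
        return torch.nn.functional.mse_loss(top_k_mask(weights, k), weights.detach())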

Representation (1 paper)

【1】 Hyperbolic Disentangled Representation for Fine-Grained Aspect Extraction
Link: https://arxiv.org/abs/2112.09215

Authors: Chang-You Tai, Ming-Yao Li, Lun-Wei Ku
Affiliation: Academia Sinica, Taipei, Taiwan
Abstract: Automatic identification of salient aspects from user reviews is especially useful for opinion analysis. There has been significant progress in utilizing weakly supervised approaches, which require only a small set of seed words for training aspect classifiers. However, there is always room for improvement. First, no weakly supervised approach fully utilizes the latent hierarchies between words. Second, each seed word's representation should have different latent semantics and be distinct when it represents a different aspect. In this paper, we propose HDAE, a hyperbolic disentangled aspect extractor in which a hyperbolic aspect classifier captures words' latent hierarchies, and an aspect-disentangled representation models the distinct latent semantics of each seed word. Compared to previous baselines, HDAE achieves average F1 performance gains of 18.2% and 24.1% on Amazon product review and restaurant review datasets, respectively. In addition, the embedding visualization demonstrates that HDAE is a more effective approach to leveraging seed words. An ablation study and a case study further attest to the effectiveness of the proposed components.

Word2Vec | Text | Words (1 paper)

【1】 Sublinear Time Approximation of Text Similarity Matrices
Link: https://arxiv.org/abs/2112.09631

Authors: Archan Ray, Nicholas Monath, Andrew McCallum, Cameron Musco
Affiliation: University of Massachusetts Amherst
Note: 25 pages, 10 figures
Abstract: We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
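
The $O(ns)$ budget comes from the classical Nyström construction: compute similarities only between all $n$ points and $s$ sampled landmarks, then reconstruct the rest through the pseudo-inverse of the landmark block. A sketch of that baseline construction; the paper's modifications for the indefinite case are not reproduced here:

    import numpy as np

    def nystrom_approx(sim, n, s, seed=0):
        # sim(i, j) -> similarity of points i and j (possibly expensive to call).
        rng = np.random.default_rng(seed)
        landmarks = rng.choice(n, size=s, replace=False)
        C = np.array([[sim(i, j) for j in landmarks] for i in range(n)])  # n x s
        W = C[landmarks]                                                  # s x s
        # Approximate full matrix: S ~ C @ pinv(W) @ C.T, rank at most s.
        return C @ np.linalg.pinv(W) @ C.T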

Other Neural Networks | Deep Learning | Models | Modeling (1 paper)

【1】 An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Link: https://arxiv.org/abs/2112.09153

Authors: Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
Affiliations: Carnegie Mellon University; Mila - Quebec AI Institute; University of Montreal; École Polytechnique de Montréal; Canada CIFAR AI Chair
Note: 30 pages
Abstract: The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
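
Jointly optimizing task loss and basin sharpness can be done in the spirit of sharpness-aware minimization: take the gradient at an adversarially perturbed copy of the weights, which penalizes sharp minima. A rough PyTorch sketch of one such update, as an illustration rather than the paper's exact procedure:

    import torch

    def sharpness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
        loss_fn(model, batch).backward()  # gradient at the current weights
        grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                                   for p in model.parameters() if p.grad is not None))
        eps = {}
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)        # move to the (approximate) worst nearby point
                eps[p] = e
        optimizer.zero_grad()
        loss_fn(model, batch).backward()  # gradient at the perturbed point
        with torch.no_grad():
            for p, e in eps.items():
                p.sub_(e)        # restore the original weights
        optimizer.step()         # apply the sharpness-aware gradient
        optimizer.zero_grad()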

Others (5 papers)

【1】 Transcribing Natural Languages for the Deaf via Neural Editing Programs
Link: https://arxiv.org/abs/2112.09600

Authors: Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li
Affiliations: The Australian National University; Data61-CSIRO; Huawei Cyberverse Lab; SenseTime
Abstract: This work studies the task of glossification, of which the aim is to transcribe natural spoken language sentences for the Deaf (hard-of-hearing) community into ordered sign language glosses. Previous sequence-to-sequence language models trained with paired sentence-gloss data often fail to capture the rich connections between the two distinct languages, leading to unsatisfactory transcriptions. We observe that despite different grammars, glosses effectively simplify sentences for the ease of deaf communication, while sharing a large portion of vocabulary with sentences. This has motivated us to implement glossification by executing a collection of editing actions, e.g. word addition, deletion, and copying, called editing programs, on their natural spoken language counterparts. Specifically, we design a new neural agent that learns to synthesize and execute editing programs, conditioned on sentence contexts and partial editing results. The agent is trained to imitate minimal editing programs, while exploring more widely the program space via policy gradients to optimize sequence-wise transcription quality. Results show that our approach outperforms previous glossification models by a large margin.
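
Executing an editing program over a spoken-language sentence is mechanical once the actions are fixed. A minimal sketch with an assumed COPY/DEL/ADD inventory (the gloss "?WH" below is purely illustrative):

    def execute_edit_program(tokens, program):
        """tokens: source words; program: e.g. [("COPY",), ("DEL",), ("ADD", gloss)]."""
        out, i = [], 0
        for action in program:
            if action[0] == "COPY":    # keep the current source word as a gloss
                out.append(tokens[i]); i += 1
            elif action[0] == "DEL":   # skip the current source word
                i += 1
            elif action[0] == "ADD":   # emit a gloss absent from the source
                out.append(action[1])
        return out

    # execute_edit_program(["do", "you", "like", "coffee"],
    #                      [("DEL",), ("COPY",), ("COPY",), ("COPY",), ("ADD", "?WH")])
    # -> ["you", "like", "coffee", "?WH"]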

【2】 Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Link: https://arxiv.org/abs/2112.09526

Authors: Diptesh Kanojia, Pushpak Bhattacharyya, Malhar Kulkarni, Gholamreza Haffari
Affiliations: Indian Institute of Technology Bombay, India; IITB-Monash Research Academy, India; Monash University, Australia
Note: Published at LREC 2020
Abstract: Cognates are present in multiple variants of the same text across different languages (e.g., "hund" in German and "hound" in English mean "dog"). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian language cognate dictionary and utilize linked Indian language Wordnets to generate cognate sets. Additionally, we use the Wordnet data to create a False Friends' dataset for eleven language pairs. We also evaluate the efficacy of our dataset using previously available baseline cognate detection approaches. We also perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.

【3】 A Multimodal Approach for Automatic Mania Assessment in Bipolar Disorder
Link: https://arxiv.org/abs/2112.09467

Author: Pınar Baki
Affiliation: Boğaziçi University (M.S. thesis, Graduate Program in Computer Engineering)
Abstract: Bipolar disorder is a mental health disorder that causes mood swings that range from depression to mania. Diagnosis of bipolar disorder is usually done based on patient interviews, and reports obtained from the caregivers of the patients. Subsequently, the diagnosis depends on the experience of the expert, and it is possible to confuse the disorder with other mental disorders. Automated processes in the diagnosis of bipolar disorder can help provide quantitative indicators, and allow easier observation of patients over longer periods. Furthermore, the need for remote treatment and diagnosis became especially important during the COVID-19 pandemic. In this thesis, we create a multimodal decision system based on recordings of the patient in acoustic, linguistic, and visual modalities. The system is trained on the Bipolar Disorder corpus. Comprehensive analysis of unimodal and multimodal systems, as well as various fusion techniques, is performed. Besides processing entire patient sessions using unimodal features, a task-level investigation of the clips is studied. Using acoustic, linguistic, and visual features in a multimodal fusion system, we achieved a 64.8% unweighted average recall score, which improves the state-of-the-art performance achieved on this dataset.

【4】 Neural Architectures for Biological Inter-Sentence Relation Extraction
Link: https://arxiv.org/abs/2112.09288

Authors: Enrique Noriega-Atala, Peter M. Lovett, Clayton T. Morrison, Mihai Surdeanu
Affiliation: The University of Arizona, Tucson, AZ, USA
Note: Accepted at the Scientific Document Understanding workshop at AAAI'22
Abstract: We introduce a family of deep-learning architectures for inter-sentence relation extraction, i.e., relations where the participants are not necessarily in the same sentence. We apply these architectures to an important use case in the biomedical domain: assigning biological context to biochemical events. In this work, biological context is defined as the type of biological system within which the biochemical event is observed. The neural architectures encode and aggregate multiple occurrences of the same candidate context mention to determine whether it is the correct context for a particular event mention. We propose two broad types of architectures: the first type aggregates multiple instances that correspond to the same candidate context with respect to an event mention before emitting a classification; the second type independently classifies each instance and uses the results to vote for the final class, akin to an ensemble approach. Our experiments show that the proposed neural classifiers are competitive and some achieve better performance than the previous state-of-the-art traditional machine learning methods, without the need for feature engineering. Our analysis shows that the neural methods particularly improve precision compared to traditional machine learning classifiers, and also demonstrates how the difficulty of inter-sentence relation extraction increases as the distance between the event and context mentions increases.
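
The two architecture families differ only in where the instances are merged. A schematic sketch, with encode and classify standing in for the paper's neural components:

    import numpy as np

    def aggregate_then_classify(instances, encode, classify):
        pooled = np.mean([encode(x) for x in instances], axis=0)  # merge first
        return classify(pooled)                                   # one decision

    def classify_then_vote(instances, encode, classify):
        votes = [classify(encode(x)) for x in instances]          # decide per instance
        return max(set(votes), key=votes.count)                   # majority vote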

【5】 Logically at the Factify 2022: Multimodal Fact Verification
Link: https://arxiv.org/abs/2112.09253

Authors: Jie Gao, Hella-Franziska Hoffmann, Stylianos Oikonomou, David Kiskovski, Anil Bandhakavi
Affiliation: Brookfoot Mills, Brookfoot Industrial Estate, Brighouse, United Kingdom
Note: Accepted at AAAI'22: First Workshop on Multimodal Fact-Checking and Hate Speech Detection, February 22 - March 1, 2022, Vancouver, BC, Canada
Abstract: This paper describes our participant system for the multi-modal fact verification (Factify) challenge at AAAI 2022. Despite recent advances in text-based verification techniques and large pre-trained multimodal models across vision and language, very limited work has been done in applying multimodal techniques to automate the fact-checking process, particularly considering the increasing prevalence of claims and fake news about images and videos on social media. In our work, the challenge is treated as a multimodal entailment task and framed as multi-class classification. Two baseline approaches are proposed and explored, including an ensemble model (combining two uni-modal models) and a multi-modal attention network (modeling the interaction between the image and text pair from claim and evidence documents). We conduct several experiments investigating and benchmarking different SoTA pre-trained transformers and vision models in this work. Our best model is ranked first on the leaderboard, obtaining a weighted average F-measure of 0.77 on both the validation and test sets. Exploratory analysis is also carried out on the Factify data set and uncovers salient patterns and issues (e.g., word overlapping, visual entailment correlation, source bias) that motivate our hypotheses. Finally, we highlight challenges of the task and the multimodal dataset for future research.
