Towards Full-Fledged Argument Search: A Framework for Extracting and Clustering Arguments from Unstructured Text

Date: 2023-03-31 10:32:48

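Argument search aims to identify arguments in natural language text. Previous work has tackled this task by combining keyword search with argument identification at the sentence or document level. Existing frameworks, however, usually cover only individual components of argument search and leave three aspects unaddressed: (1) argument-query matching: finding arguments that frame the topic slightly differently from the actual search query; (2) argument identification: finding arguments that span multiple sentences; (3) argument clustering: organizing the retrieved arguments by topical aspect. This paper proposes a framework that addresses these shortcomings. The authors propose (1) combining keyword search with precomputed topic clusters for argument-query matching, (2) applying a novel approach based on sentence-level sequence labeling for argument identification, and (3) presenting aggregated arguments to users based on topic-aware argument clustering. Experiments on several real-world debate data sets show that density-based clustering algorithms such as HDBSCAN are particularly well suited to argument-query matching. The sentence-level, BiLSTM-based sequence-labeling approach achieves a macro F1 score of 0.71. Finally, the evaluation of the argument clustering method indicates that fine-grained clustering of arguments by subtopic remains challenging but is worth exploring.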
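The abstract does not say how the keyword search and the precomputed topic clusters are combined. One plausible reading, sketched below purely as an assumption, is cluster-based expansion: documents that match the keyword directly are returned together with every document that shares a topic cluster with a direct hit. The function name `match_with_clusters`, the toy documents, and the cluster labels are all hypothetical; the only borrowed convention is that density-based clusterers such as HDBSCAN mark noise points with the label -1.

```python
from collections import defaultdict


def match_with_clusters(query, docs, cluster_of):
    """Return sorted ids of docs containing the query keyword, plus all
    docs that share a precomputed topic cluster with a direct hit."""
    q = query.lower()

    # Step 1: plain keyword search (case-insensitive substring match).
    hits = {doc_id for doc_id, text in docs.items() if q in text.lower()}

    # Step 2: group documents by their precomputed cluster label.
    members = defaultdict(set)
    for doc_id, label in cluster_of.items():
        if label != -1:  # -1 = noise point, belongs to no cluster
            members[label].add(doc_id)

    # Step 3: expand each direct hit to its whole cluster.
    expanded = set(hits)
    for doc_id in hits:
        label = cluster_of.get(doc_id, -1)
        if label != -1:
            expanded |= members[label]
    return sorted(expanded)


# Hypothetical toy data: d1 and d2 share a topic but not a keyword.
docs = {
    "d1": "Nuclear power plants produce little CO2.",
    "d2": "Atomic energy leaves radioactive waste behind.",
    "d3": "School uniforms reduce bullying.",
    "d4": "Reactors are expensive to build.",
}
cluster_of = {"d1": 0, "d2": 0, "d3": 1, "d4": -1}

print(match_with_clusters("nuclear", docs, cluster_of))  # ['d1', 'd2']
```

In this toy run, the query "nuclear" also retrieves d2, which never mentions the keyword but sits in the same cluster as d1. That is the kind of "slightly different framing" the abstract describes, under the assumptions stated above.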

Original title: Towards Full-Fledged Argument Search: A Framework for Extracting and Clustering Arguments from Unstructured Text

Original abstract: Argument search aims at identifying arguments in natural language texts. In the past, this task has been addressed by a combination of keyword search and argument identification on the sentence- or document-level. However, existing frameworks often address only specific components of argument search and do not address the following aspects: (1) argument-query matching: identifying arguments that frame the topic slightly differently than the actual search query; (2) argument identification: identifying arguments that consist of multiple sentences; (3) argument clustering: selecting retrieved arguments by topical aspects. In this paper, we propose a framework for addressing these shortcomings. We suggest (1) to combine the keyword search with precomputed topic clusters for argument-query matching, (2) to apply a novel approach based on sentence-level sequence-labeling for argument identification, and (3) to present aggregated arguments to users based on topic-aware argument clustering. Our experiments on several real-world debate data sets demonstrate that density-based clustering algorithms, such as HDBSCAN, are particularly suitable for argument-query matching. With our sentence-level, BiLSTM-based sequence-labeling approach we achieve a macro F1 score of 0.71. Finally, evaluating our argument clustering method indicates that a fine-grained clustering of arguments by subtopics remains challenging but is worthwhile to be explored.
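For readers unfamiliar with the macro F1 metric reported above, the following minimal sketch shows how it is computed in general: F1 is calculated separately for each label and the per-label scores are averaged without weighting, so rare labels count as much as frequent ones. The label names `ARG` and `NON` and the toy predictions are hypothetical and are not taken from the paper, whose exact label scheme is not given in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentence-level labels: argumentative vs. non-argumentative.
y_true = ["ARG", "ARG", "ARG", "NON", "NON"]
y_pred = ["ARG", "ARG", "NON", "NON", "ARG"]

print(round(macro_f1(y_true, y_pred), 3))  # 0.583
```

Here the per-label scores are 2/3 for `ARG` and 1/2 for `NON`, giving a macro average of 7/12.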