HTMOT: Hierarchical Topic Modelling Over Time

Posted: 2023-04-18 14:48:17

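Topic models have long offered an effective way to extract insights from text. Many models have been proposed, yet none of them models the temporality and the hierarchy of topics jointly. Modelling time yields more precise topics because it separates topics that are lexically close but temporally distinct, while modelling hierarchy gives a more detailed view of what a document corpus contains. In this study we therefore propose a new method, HTMOT, for hierarchical topic modelling over time. We train HTMOT with a new, more efficient implementation of Gibbs sampling. Specifically, we show that applying time modelling only to deep sub-topics makes it possible to extract specific stories or events, while high-level topics capture the broader themes of the corpus. Our results show that the training procedure is fast and extracts accurate high-level topics together with temporally precise sub-topics. We measure the model's performance with the Word Intrusion task and outline some limitations of this evaluation method, especially for hierarchical models. As a case study, we focus on developments in the space industry in 2020.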

Original title: HTMOT: Hierarchical Topic Modelling Over Time

Original abstract: Over the years, topic models have provided an efficient way of extracting insights from text. However, while many models have been proposed, none are able to model topic temporality and hierarchy jointly. Modelling time provides more precise topics by separating lexically close but temporally distinct topics while modelling hierarchy provides a more detailed view of the content of a document corpus. In this study, we therefore propose a novel method, HTMOT, to perform Hierarchical Topic Modelling Over Time. We train HTMOT using a new implementation of Gibbs sampling, which is more efficient. Specifically, we show that only applying time modelling to deep sub-topics provides a way to extract specific stories or events while high level topics extract larger themes in the corpus. Our results show that our training procedure is fast and can extract accurate high-level topics and temporally precise sub-topics. We measured our model's performance using the Word Intrusion task and outlined some limitations of this evaluation method, especially for hierarchical models. As a case study, we focused on the various developments in the space industry in 2020.
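
The Word Intrusion task mentioned in the abstract is commonly scored as the share of human annotators who correctly spot a planted word that does not belong to a topic. The following is a minimal Python sketch of that score, not the authors' evaluation code. The function name `model_precision`, the example topic words, and the annotator answers are all hypothetical and chosen only for illustration.

```python
def model_precision(intruder: str, choices: list[str]) -> float:
    """Return the fraction of annotator choices that match the true intruder."""
    if not choices:
        raise ValueError("need at least one annotator choice")
    hits = sum(choice == intruder for choice in choices)
    return hits / len(choices)


# Hypothetical topic shown to annotators: five top words plus one planted intruder.
topic_words = ["rocket", "launch", "orbit", "satellite", "booster", "banana"]
intruder = "banana"

# Hypothetical answers from four annotators asked to pick the odd word out.
choices = ["banana", "banana", "booster", "banana"]

score = model_precision(intruder, choices)
print(f"model precision: {score:.2f}")
```

A score close to 1 suggests the topic's top words are coherent enough that the intruder stands out. For a hierarchical model the picture is less clear, since a word that looks like an intruder in a sub-topic may fit the parent topic well, which is one way the limitations noted in the abstract could show up.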