DiPD: Disruptive Event Prediction Dataset from Twitter

Time: 2023-03-31 10:32:52

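Riots and protests that get out of control can cause serious disruption in a country. We have seen examples of this, such as the BLM movement, the climate strikes, and the CAA movement, all of which caused disruption on a large scale. Our motivation for creating this dataset is to support machine learning systems that give users insight into trending events and alert them to events that could disrupt the country. If an event starts to get out of control, monitoring it before it escalates allows it to be handled and mitigated. The dataset collects tweets about past or ongoing events known to have caused disruption and labels them 1. We also collect tweets considered non-eventful and label them 0, so that both classes can be used to train a classification system. The dataset contains 94,855 unique event records and 168,706 unique non-event records, for a total of 263,561 records. We extract several features from the tweets, such as the user's follower count and location, to gauge each tweet's impact and reach. The dataset may be useful for a range of event-related machine learning problems, such as event classification and event recognition.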
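The record counts above imply an imbalanced binary classification problem: about 36% of the records carry label 1. The following is a minimal sketch of that arithmetic in Python. The counts come from the abstract; the function name `class_summary` and the choice of the "balanced" weighting heuristic (total / (number of classes × class count)) are illustrative assumptions, not something the dataset authors describe.

```python
# Record counts as stated in the abstract.
EVENT_COUNT = 94855        # tweets labeled 1 (disruptive events)
NON_EVENT_COUNT = 168706   # tweets labeled 0 (non-events)


def class_summary(n_event: int, n_non_event: int) -> dict:
    """Summarize class balance for a two-class dataset.

    Hypothetical helper for illustration only; it is not part of DiPD.
    """
    total = n_event + n_non_event
    # "Balanced" heuristic (an assumption here, not the authors' method):
    # weight = total / (num_classes * class_count), so the rarer class
    # receives the larger weight.
    class_weights = {
        1: total / (2 * n_event),
        0: total / (2 * n_non_event),
    }
    return {
        "total": total,
        "event_share": n_event / total,
        "non_event_share": n_non_event / total,
        "class_weights": class_weights,
    }


summary = class_summary(EVENT_COUNT, NON_EVENT_COUNT)
print("total records:", summary["total"])  # 263561, matching the abstract
print("event share: {:.1%}".format(summary["event_share"]))
print("class weights:", summary["class_weights"])
```

With these weights, each class contributes the same total weight to a loss function, which is one common way to keep a classifier from simply favoring the larger non-event class.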

Original title: DiPD: Disruptive event Prediction Dataset from Twitter

Original abstract: Riots and protests, if gone out of control, can cause havoc in a country. We have seen examples of this, such as the BLM movement, climate strikes, CAA Movement, and many more, which caused disruption to a large extent. Our motive behind creating this dataset was to use it to develop machine learning systems that can give its users insight into the trending events going on and alert them about the events that could lead to disruption in the nation. If any event starts going out of control, it can be handled and mitigated by monitoring it before the matter escalates. This dataset collects tweets of past or ongoing events known to have caused disruption and labels these tweets as 1. We also collect tweets that are considered non-eventful and label them as 0 so that they can also be used to train a classification system. The dataset contains 94855 records of unique events and 168706 records of unique non-events, thus giving the total dataset 263561 records. We extract multiple features from the tweets, such as the user's follower count and the user's location, to understand the impact and reach of the tweets. This dataset might be useful in various event related machine learning problems such as event classification, event recognition, and so on.