DiPD: A Disruptive Event Prediction Dataset from Twitter
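Riots and protests that get out of control can cause havoc in a country. Examples include the BLM movement, the climate strikes, and the CAA movement, all of which caused large-scale disruption. The authors built this dataset to support machine learning systems that give users insight into trending events and alert them to events that could disrupt a nation. An event that is monitored early can be handled and mitigated before it escalates. The dataset collects tweets about past or ongoing events known to have caused disruption and labels them 1. It also collects tweets considered non-eventful and labels them 0, so that both classes are available for training a classifier. It contains 94855 unique event records and 168706 unique non-event records, for a total of 263561 records. Several features are extracted from each tweet, such as the user's follower count and the user's location, to gauge the tweet's impact and reach. The dataset may be useful for event-related machine learning problems such as event classification and event recognition.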
Original title: DiPD: Disruptive event Prediction Dataset from Twitter
Original abstract: Riots and protests, if gone out of control, can cause havoc in a country. We have seen examples of this, such as the BLM movement, climate strikes, CAA Movement, and many more, which caused disruption to a large extent. Our motive behind creating this dataset was to use it to develop machine learning systems that can give its users insight into the trending events going on and alert them about the events that could lead to disruption in the nation. If any event starts going out of control, it can be handled and mitigated by monitoring it before the matter escalates. This dataset collects tweets of past or ongoing events known to have caused disruption and labels these tweets as 1. We also collect tweets that are considered non-eventful and label them as 0 so that they can also be used to train a classification system. The dataset contains 94855 records of unique events and 168706 records of unique non-events, thus giving the total dataset 263561 records. We extract multiple features from the tweets, such as the user's follower count and the user's location, to understand the impact and reach of the tweets. This dataset might be useful in various event related machine learning problems such as event classification, event recognition, and so on.
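The record counts in the abstract imply a moderately imbalanced binary classification problem. The short Python sketch below checks the stated total and derives the share of event tweets along with a pair of class weights. The weighting formula is the common "balanced" heuristic and is an assumption made here for illustration; the abstract does not say how, or whether, the authors handle class imbalance. The function name `class_summary` is likewise hypothetical.

```python
# Record counts as stated in the abstract.
N_EVENT = 94_855       # tweets labeled 1 (disruptive events)
N_NON_EVENT = 168_706  # tweets labeled 0 (non-events)


def class_summary(n_pos: int, n_neg: int):
    """Return the total size, the positive-class share, and class weights.

    The weights use the common "balanced" heuristic,
    weight_c = total / (n_classes * count_c). This is an illustrative
    assumption, not a method described in the abstract.
    """
    total = n_pos + n_neg
    pos_share = n_pos / total
    weights = {1: total / (2 * n_pos), 0: total / (2 * n_neg)}
    return total, pos_share, weights


total, pos_share, weights = class_summary(N_EVENT, N_NON_EVENT)
print(total)                # 263561, matching the total in the abstract
print(round(pos_share, 3))  # share of tweets labeled 1
print(weights)              # the minority class (label 1) gets the larger weight
```

Weights of this kind could be passed to a classifier that accepts per-class weights, so that the less frequent event class is not drowned out by the more numerous non-event tweets.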