
Dashu's Algorithm Sharing (5): The DBSCAN Clustering Algorithm


Time: 2023-09-14 09:00:08

1. Introduction

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and one of the most cited in the scientific literature.

2. Principle

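DBSCAN is a density-based clustering algorithm with a fairly simple procedure: points that lie close to one another (a center point and its neighbors) are grouped into a cluster; the algorithm then keeps finding the neighbors of those neighbors and adds them to the cluster until the cluster can no longer grow, and only then moves on to the remaining unvisited points. Only neighbors that are themselves dense enough to be center points extend the search further; the others join the cluster as border points.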
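The expansion procedure above can be sketched in plain Python. This is a minimal, unoptimized illustration, not the author's code: the names `dbscan` and `region_query` are hypothetical, it assumes Euclidean distance, counts a point as its own neighbor, and uses a linear scan for every neighborhood query (real implementations use a spatial index).

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, idx, eps):
    """Return indices of all points within distance eps of points[idx], idx itself included."""
    px = points[idx]
    return [j for j, q in enumerate(points) if math.dist(px, q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a center (core) point: start a new cluster and grow it breadth-first
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                # j is also a center point, so its neighbours join the cluster too
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

A border point that is reachable from two clusters keeps the label of whichever cluster reaches it first, which is also how the classic formulation behaves.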

3. Algorithm Pseudocode

[Figure: pseudocode of the sub-method]

DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a dense region (minPts).

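In other words, DBSCAN has two main parameters: the distance Eps and the minimum number of neighbors MinPts. A center point and its neighbors can form a cluster only when the number of points within radius Eps of the center point is at least MinPts (in the original paper and in scikit-learn, this count includes the center point itself).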
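To make the roles of Eps and MinPts concrete, the following sketch labels every point as core (center), border or noise. It is an illustration under stated assumptions: `classify_points` is a hypothetical helper, distances are Euclidean, and each neighborhood includes the point itself, the same convention scikit-learn documents for `min_samples`.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' for the given Eps and MinPts."""
    n = len(points)
    # eps-neighbourhood of every point, the point itself included
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            kinds.append("border")  # not dense itself, but within Eps of a core point
        else:
            kinds.append("noise")
    return kinds


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (5, 0)]
    print(classify_points(pts, eps=1.0, min_pts=3))  # ['border', 'core', 'border', 'noise']
```

Raising MinPts or shrinking Eps turns more points into border or noise points; only core points can pull new points into a cluster.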

4. Application Code

Python

Example code

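import numpy as np
from sklearn.cluster import DBSCAN

KMS_PER_RADIAN = 6371.0088  # mean Earth radius in km


def main_fun():
    # (latitude, longitude) pairs in degrees
    loc_data = [(40.8379295833, -73.70228875), (40.750613794, -73.993434906),
                (40.6927066969, -73.8085984165), (40.7489736586, -73.9859616017),
                (40.8379525833, -73.70209875), (40.6997066969, -73.8085234165),
                (40.7484436586, -73.9857316017)]
    # The haversine metric works on radians, so eps is an angle: 1.5 km converted to radians.
    # (eps=10 would mean 10 radians, and every point would fall into one cluster.)
    epsilon = 1.5 / KMS_PER_RADIAN
    db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree',
                metric='haversine').fit(np.radians(loc_data))
    labels = db.labels_
    print(labels)
    print(db.core_sample_indices_)
    print(db.components_)
    # number of clusters, not counting the noise label -1
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    for i in range(n_clusters_):
        print(i)
        # np.where returns a tuple of index arrays; [0] holds the indices
        for j in np.where(labels == i)[0]:
            print(loc_data[j])


if __name__ == '__main__':
    main_fun()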
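With `metric='haversine'`, scikit-learn expects `[latitude, longitude]` in radians and measures great-circle distance as an angle, which is why `eps` has to be an angle as well. The sketch below spells out the haversine formula and the km-to-radians conversion; `haversine_radians` and `km_to_eps` are hypothetical helpers, and the Earth radius of 6371.0088 km is an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_radians(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees, returned in radians."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def km_to_eps(km):
    """Convert a distance in kilometres into the angular eps used with metric='haversine'."""
    return km / EARTH_RADIUS_KM


if __name__ == "__main__":
    a = (40.7489736586, -73.9859616017)
    b = (40.7484436586, -73.9857316017)
    d = haversine_radians(a, b)
    print(d * EARTH_RADIUS_KM)   # distance between the two points in km
    print(d <= km_to_eps(1.5))   # True: the points are neighbours for a 1.5 km eps
```

Multiplying an angular distance by the Earth's radius gives kilometres; dividing kilometres by the radius gives the `eps` value to pass to `DBSCAN`.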

Main result attributes

core_sample_indices_ : array, shape = [n_core_samples]: Indices of core samples.
components_ : array, shape = [n_core_samples, n_features]: Copy of each core sample found by training.
labels_ : array, shape = [n_samples]: Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.
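The relationship between these three attributes can be checked on a small example. The data and parameters below are made up purely for illustration: two tight groups of 1-D values plus one isolated value that should come out as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative 1-D data: two dense groups and one isolated point
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [20.0]])
db = DBSCAN(eps=0.15, min_samples=2).fit(X)

print(db.labels_)               # [ 0  0  0  1  1  1 -1]: cluster id per sample, -1 = noise
print(db.core_sample_indices_)  # [0 1 2 3 4 5]: indices of the core samples
# components_ is a copy of the core samples themselves
assert np.array_equal(db.components_, X[db.core_sample_indices_])
```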

See the official documentation for details: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

Scala

Dependencies

<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>nak_2.11</artifactId>
  <version>1.3</version>
</dependency>

<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>breeze_2.11</artifactId>
  <version>0.13</version>
</dependency>

Example code

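import breeze.linalg.DenseMatrix
import nak.cluster.{DBSCAN, GDBSCAN, Kmeans}

object DbscanExample extends App {
  // One row per point: (latitude, longitude) in degrees
  val matrix = DenseMatrix(
    (40.8379295833, -73.70228875),
    (40.6927066969, -73.8085984165),
    (40.7489736586, -73.9859616017),
    (40.8379525833, -73.70209875),
    (40.6997066969, -73.8085234165),
    (40.7484436586, -73.9857316017),
    (40.750613794, -73.993434906))

  // epsilon is in the units of the distance function. With Euclidean distance on raw
  // degrees, 1000.0 exceeds every pairwise distance, so all points form a single cluster.
  val gdbscan = new GDBSCAN(
    DBSCAN.getNeighbours(epsilon = 1000.0, distance = Kmeans.euclideanDistance),
    DBSCAN.isCorePoint(minPoints = 1)
  )
  val clusters = gdbscan cluster matrix
  // Print each cluster's id and size, then the coordinates of its points
  clusters.foreach { cluster =>
    println(cluster.id + ", " + cluster.points.length)
    cluster.points.foreach(p => p.value.data.foreach(println))
  }
}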

See the project repository for details: https://github.com/scalanlp/nak

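For details of the algorithm, see the reference.

Reference: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD-96, 1996.

Further reading: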

http://www.cs.fsu.edu/~ackerman/CIS5930/notes/DBSCAN.pdf

https://www.oreilly.com/ideas/clustering-geolocated-data-using-spark-and-dbscan
