zl程序教程

您现在的位置是:首页 >  大数据

当前栏目

【编程实践】计算文本、字符串相似度:Levenshtein 距离

计算编程 实践 字符串 文本 距离 相似
2023-09-27 14:24:38 时间

目录

Levenshtein 距离简介

Open AI:

我写的(网上抄的):


Levenshtein 距离简介

Perplexity AI:
https://www.perplexity.ai/

The Levenshtein distance is a measure of the similarity between two strings, which is calculated by counting the number of deletions, insertions, or substitutions required to transform one string into another[1][2][3]. It can be used to measure the difference between two words or phrases[4][5].

Levenshtein 距离是两个字符串之间相似性的度量,它是通过计算将一个字符串转换为另一个字符串所需的删除、插入或替换的次数来计算的 [1][2][3]。它可以用来衡量两个词或短语之间的差异[4][5]。

Open AI:

用Kotlin实现计算文本、字符串相似度:Levenshtein 距离的代码示例


/**
 * 计算两个文本、字符串之间的Levenshtein距离
 * @param s1 文本、字符串1
 * @param s2 文本、字符串2
 * @return Levenshtein距离
 */
fun levenshteinDistance(s1: String, s2: String): Int {
    var n = s1.length
    var m = s2.length
    if (n == 0) return m
    if (m == 0) return n
    var d = Array(n + 1) { IntArray(m + 1) }

    // 初始化
    for (i in 0..n) d[i][0] = i
    for (j in 0..m) d[0][j] = j

    // 计算
    for (i in 1..n) {
        for (j in 1..m) {
            val cost = if (s1[i - 1] == s2[j - 1]) 0 else 1
            d[i][j] = minOf(
                    d[i - 1][j] + 1,
                    d[i][j - 1] + 1,
                    d[i - 1][j - 1] + cost
            )
        }
    }
    return d[n][m]
}

我写的(网上抄的):

package com.github.aixcode.utils

import com.github.aixcode.utils.TextDistanceUtil.getSimilarity

object TextDistanceUtil {
    /**
     *  Returns the Levenshtein distance between a and b, result 0~1
     */
    fun getLevenshteinDistance(X: String, Y: String): Int {
        val m = X.length
        val n = Y.length
        val T = Array(m + 1) { IntArray(n + 1) }
        for (i in 1..m) {
            T[i][0] = i
        }
        for (j in 1..n) {
            T[0][j] = j
        }
        var cost: Int
        for (i in 1..m) {
            for (j in 1..n) {
                cost = if (X[i - 1] == Y[j - 1]) 0 else 1
                T[i][j] = Integer.min(Integer.min(T[i - 1][j] + 1, T[i][j - 1] + 1),
                    T[i - 1][j - 1] + cost)
            }
        }
        return T[m][n]
    }

    fun getSimilarity(x: String, y: String): Double {
        val maxLength = java.lang.Double.max(x.length.toDouble(), y.length.toDouble())
        return if (maxLength > 0) {
            // 如果需要,可以选择忽略大小写
            (maxLength - getLevenshteinDistance(x, y)) / maxLength
        } else 1.0
    }
}

fun main() {
    println(getSimilarity("c", "cql")) // 0.3333333333333333
    println(getSimilarity("cq", "cql")) // 0.6666666666666666
    println(getSimilarity("cql", "cql")) // 1.0
}