
php-dfa-sensitive: a PHP sensitive-word filtering package based on the DFA algorithm

Time: 2023-02-18 16:32:27

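You finally get a website online, and then people with ulterior motives start flooding it with spam ads and junk comments, so words that should never appear start showing up. Dealing with this means constantly reviewing and deleting content in the admin panel, and doing all of it by hand is a headache just to think about. The usual approach is to filter part of it in code and add manual review on top. Ideally the program would catch 100% of it, but a filter can only block words that have already turned up once and been added to the list; it cannot anticipate new ones.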

DFA algorithm (deterministic finite automaton)

Package on Packagist: https://packagist.org/packages/lustre/php-dfa-sensitive

GitHub: https://github.com/FireLustre/php-dfa-sensitive
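The following sketch shows the idea behind the package name, under stated assumptions. The word list is compiled into a trie, which is a DFA whose states are word prefixes and whose transitions are single characters. A scan then walks the text one character at a time, so the work at each start position is bounded by the length of the longest word, not by how many words the list contains. The helpers `chars()` and `buildTrie()` are hypothetical names for illustration. This is not the php-dfa-sensitive source, which may structure its tree differently.

```php
<?php
// Sketch only: compile a word list into a trie (a DFA over characters).
// chars() and buildTrie() are hypothetical helpers, not package code.

// Split a UTF-8 string into single characters (no mbstring needed).
function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Each character is a transition; the '__end' key marks an accepting state.
function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip blank entries
        }
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true;
        unset($node); // drop the reference before the next word
    }
    return $trie;
}

$trie = buildTrie(['小红', '小红红', '小莉莉']);

// '小红' and '小红红' share the states 小 -> 红; only their accepting states differ.
echo json_encode($trie, JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Shared prefixes are stored only once. That is why the lookup cost does not grow with the size of the word list.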

Installation

composer require lustre/php-dfa-sensitive

Import

use DfaFilter\SensitiveHelper;

Usage

1. Building the word tree from an array

$wordData = array(
    '小秘',
    '小李',
    '小红',
    '小红红',
    '小莉莉',
    // ......
);
$handle = SensitiveHelper::init()->setTree($wordData);

2. Building the word tree from a file

$wordFilePath = 'data/words.txt'; // sensitive-word file, one word per line
$handle = SensitiveHelper::init()->setTreeByFile($wordFilePath);
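Loading from a file comes down to reading the word list and building the same tree. The minimal sketch below assumes a UTF-8 file with one word per line, as the comment above describes. `loadWords()`, `chars()` and `buildTrie()` are hypothetical helpers, not the package's `setTreeByFile()` implementation. The temporary file only stands in for `data/words.txt`.

```php
<?php
// Sketch only: read a one-word-per-line file and build a trie from it.
// loadWords(), chars() and buildTrie() are hypothetical helpers.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true; // end of a sensitive word
        unset($node);
    }
    return $trie;
}

// One word per line: strip a UTF-8 BOM, trim spaces/CR, skip blank lines.
function loadWords(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Word file not readable: $path");
    }
    $words = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES) as $i => $line) {
        if ($i === 0) {
            $line = preg_replace('/^\xEF\xBB\xBF/', '', $line); // BOM
        }
        $line = trim($line);
        if ($line !== '') {
            $words[] = $line;
        }
    }
    return $words;
}

// Demo: a temporary file with a BOM, CRLF, a blank line and padded spaces.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "\xEF\xBB\xBF小红\r\n小莉莉\n\n  小李  \n");
$words = loadWords($path);
$trie = buildTrie($words);
unlink($path);
print_r($words);
```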

3. Checking whether the text contains sensitive words

$islegal = $handle->islegal($content);
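The sketch below shows what a check like `islegal()` has to do, assuming a trie built as in the hypothetical `buildTrie()` helper. Try each start position and follow transitions. If a word ends, report a hit. If no transition exists, move on to the next position. `containsBadWord()` is a hypothetical name, and this illustrates the technique rather than the package's code.

```php
<?php
// Sketch only: DFA-style containment check over a trie.
// chars(), buildTrie() and containsBadWord() are hypothetical helpers.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true; // accepting state
        unset($node);
    }
    return $trie;
}

// Try every start position; return as soon as any word is completed.
function containsBadWord(array $trie, string $content): bool
{
    $text = chars($content);
    $n = count($text);
    for ($start = 0; $start < $n; $start++) {
        $node = $trie;
        for ($i = $start; $i < $n && isset($node[$text[$i]]); $i++) {
            $node = $node[$text[$i]];
            if (isset($node['__end'])) {
                return true;
            }
        }
    }
    return false;
}

$trie = buildTrie(['小红', '小莉莉']);
var_dump(containsBadWord($trie, '今天见到了小红')); // bool(true)
var_dump(containsBadWord($trie, '今天见到了小莉')); // bool(false): 小莉 is only a prefix
```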

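4. Filtering (replacing) sensitive words

// Replace sensitive words with * (each word becomes as many * as it has characters)
$filterContent = $handle->replace($content, '*', true);
// Or replace each sensitive word with the string ***
$filterContent = $handle->replace($content, '***');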
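Here is a sketch of replacement under the same assumptions, mirroring the two calls above. With `$repeat = true`, each matched character becomes one replacement character. Otherwise the whole word becomes the replacement string once. `replaceWords()` and `longestMatchAt()` are hypothetical helpers.

The sketch takes the longest match at each position, so `小红红` is masked entirely instead of leaving a trailing `红`. The package exposes a `$match_type` argument for choosing among overlapping words; its exact semantics are not modelled here.

```php
<?php
// Sketch only: replace sensitive words found via a trie.
// chars(), buildTrie(), longestMatchAt() and replaceWords() are hypothetical.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true;
        unset($node);
    }
    return $trie;
}

// Length of the longest word starting at $start (0 if none).
function longestMatchAt(array $trie, array $text, int $start): int
{
    $node = $trie;
    $best = 0;
    for ($i = $start, $n = count($text); $i < $n && isset($node[$text[$i]]); $i++) {
        $node = $node[$text[$i]];
        if (isset($node['__end'])) {
            $best = $i - $start + 1; // keep going: a longer word may follow
        }
    }
    return $best;
}

// $repeat = true: one $replaceChar per matched character;
// $repeat = false: the whole word becomes $replaceChar once.
function replaceWords(array $trie, string $content, string $replaceChar, bool $repeat = false): string
{
    $text = chars($content);
    $n = count($text);
    $out = '';
    for ($i = 0; $i < $n; ) {
        $len = longestMatchAt($trie, $text, $i);
        if ($len > 0) {
            $out .= $repeat ? str_repeat($replaceChar, $len) : $replaceChar;
            $i += $len; // skip past the matched word
        } else {
            $out .= $text[$i++];
        }
    }
    return $out;
}

$trie = buildTrie(['小红', '小红红']);
echo replaceWords($trie, '我认识小红红和小李', '*', true), PHP_EOL; // 我认识***和小李
echo replaceWords($trie, '小红来了', '***'), PHP_EOL;              // ***来了
```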

5. Marking sensitive words

$markedContent = $handle->mark($content, '<mark>', '</mark>');
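Marking works the same way as replacement, except that the matched word is kept and wrapped in the two tags. The sketch below makes the same assumptions: the hypothetical `markWords()` helper and the longest match at each position.

```php
<?php
// Sketch only: wrap sensitive words in start/end tags.
// chars(), buildTrie(), longestMatchAt() and markWords() are hypothetical.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true;
        unset($node);
    }
    return $trie;
}

function longestMatchAt(array $trie, array $text, int $start): int
{
    $node = $trie;
    $best = 0;
    for ($i = $start, $n = count($text); $i < $n && isset($node[$text[$i]]); $i++) {
        $node = $node[$text[$i]];
        if (isset($node['__end'])) {
            $best = $i - $start + 1;
        }
    }
    return $best;
}

// Copy the text through, wrapping each matched word in $startTag ... $endTag.
function markWords(array $trie, string $content, string $startTag, string $endTag): string
{
    $text = chars($content);
    $n = count($text);
    $out = '';
    for ($i = 0; $i < $n; ) {
        $len = longestMatchAt($trie, $text, $i);
        if ($len > 0) {
            $out .= $startTag . implode('', array_slice($text, $i, $len)) . $endTag;
            $i += $len;
        } else {
            $out .= $text[$i++];
        }
    }
    return $out;
}

$trie = buildTrie(['小红', '小莉莉']);
echo markWords($trie, '小红和小莉莉', '<mark>', '</mark>'), PHP_EOL;
// <mark>小红</mark>和<mark>小莉莉</mark>
```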

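6. Getting the sensitive words in a text

// Get all sensitive words in the content
$sensitiveWordGroup = $handle->getBadWord($content);

// Get only one sensitive word (match type 1, word count 1)
$sensitiveWordGroup = $handle->getBadWord($content, 1, 1);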
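Collecting the words themselves, with an optional cap on how many to return, can be sketched as follows. `findBadWords()` and its `$limit` parameter are hypothetical. In this sketch, `0` means "all distinct words", a choice made to resemble the `$word_num = 0` default in the wrapper class; it is not a claim about the package's internals.

```php
<?php
// Sketch only: collect distinct sensitive words, optionally capped.
// chars(), buildTrie(), longestMatchAt() and findBadWords() are hypothetical.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true;
        unset($node);
    }
    return $trie;
}

function longestMatchAt(array $trie, array $text, int $start): int
{
    $node = $trie;
    $best = 0;
    for ($i = $start, $n = count($text); $i < $n && isset($node[$text[$i]]); $i++) {
        $node = $node[$text[$i]];
        if (isset($node['__end'])) {
            $best = $i - $start + 1;
        }
    }
    return $best;
}

// $limit = 0 returns every distinct word; $limit = 1 stops at the first one.
function findBadWords(array $trie, string $content, int $limit = 0): array
{
    $text = chars($content);
    $n = count($text);
    $found = [];
    for ($i = 0; $i < $n; ) {
        $len = longestMatchAt($trie, $text, $i);
        if ($len === 0) {
            $i++;
            continue;
        }
        $word = implode('', array_slice($text, $i, $len));
        if (!in_array($word, $found, true)) {
            $found[] = $word;
            if ($limit > 0 && count($found) >= $limit) {
                break; // enough words collected
            }
        }
        $i += $len;
    }
    return $found;
}

$trie = buildTrie(['小红', '小莉莉']);
print_r(findBadWords($trie, '小红和小莉莉，又见小红'));    // 小红, 小莉莉
print_r(findBadWords($trie, '小红和小莉莉，又见小红', 1)); // 小红 only
```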

Wrapping it in a class

<?php
namespace App\Services;

use DfaFilter\SensitiveHelper;

class SensitiveWords
{
  // Shared SensitiveHelper instance, built once per process
  protected static $handle = null;

  // Singleton: no direct construction or cloning
  private function __construct()
  {
  }

  private function __clone()
  {
  }

  /**
   * Get the shared instance.
   * $word_path (extra word-file paths) only takes effect on the first call.
   */
  public static function getInstance($word_path = [])
  {
    if (!self::$handle) {
      // Default word-list files, one word per line (pass the paths themselves)
      $default_path = [
        'data/1.txt',
        'data/2.txt',
        'data/3.txt',
        'data/4.txt',
      ];
      $paths = array_merge($default_path, $word_path);
      self::$handle = SensitiveHelper::init();
      foreach ($paths as $path) {
        self::$handle->setTreeByFile($path); // load each word file
      }
    }
    return self::$handle;
  }

  /**
   * Check whether the content contains sensitive words
   */
  public static function isLegal($content)
  {
    return self::getInstance()->islegal($content);
  }

  /**
   * Filter (replace) sensitive words
   */
  public static function replace($content, $replace_char = '', $repeat = false, $match_type = 1)
  {
    return self::getInstance()->replace($content, $replace_char, $repeat, $match_type);
  }

  /**
   * Mark sensitive words
   */
  public static function mark($content, $start_tag, $end_tag, $match_type = 1)
  {
    return self::getInstance()->mark($content, $start_tag, $end_tag, $match_type);
  }

  /**
   * Get the sensitive words in the content
   */
  public static function getBadWord($content, $match_type = 1, $word_num = 0)
  {
    return self::getInstance()->getBadWord($content, $match_type, $word_num);
  }
}

In the project, call SensitiveWords::getBadWord() to find out whether a text contains sensitive words:

$bad_word = SensitiveWords::getBadWord($content);
if (!empty($bad_word)) {
  throw new \Exception('Contains a sensitive word: ' . current($bad_word));
}

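With website sensitive words, we are always in a battle of wits with attackers. The DFA filter above is one algorithm for the job, not necessarily the best. In practice it usually has to be combined with other techniques to reduce how often such words get through, such as regular expressions, string filtering and "Martian script" filtering (火星文, text disguised with look-alike characters).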
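As one example of the string-level preprocessing mentioned above, here is a sketch that normalizes text before the DFA check. It removes separators inserted between characters (`小 红`, `小*红`), so they no longer hide a word. `normalize()` and the character classes it strips are assumptions, not part of the package.

Because normalization changes the text, it suits detection only, not `replace()` or `mark()`. It can also join neighbouring words into false positives. Look-alike "Martian script" substitutions would still need a separate mapping table.

```php
<?php
// Sketch only: strip separators before a trie-based containment check.
// chars(), buildTrie(), normalize() and containsBadWord() are hypothetical.

function chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (chars($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['__end'] = true;
        unset($node);
    }
    return $trie;
}

// Remove whitespace, punctuation and symbols (assumed separator set).
function normalize(string $content): string
{
    return preg_replace('/[\s\p{P}\p{S}]+/u', '', $content);
}

function containsBadWord(array $trie, string $content): bool
{
    $text = chars($content);
    $n = count($text);
    for ($start = 0; $start < $n; $start++) {
        $node = $trie;
        for ($i = $start; $i < $n && isset($node[$text[$i]]); $i++) {
            $node = $node[$text[$i]];
            if (isset($node['__end'])) {
                return true;
            }
        }
    }
    return false;
}

$trie = buildTrie(['小红']);
var_dump(containsBadWord($trie, '小 红'));            // bool(false): the space breaks the path
var_dump(containsBadWord($trie, normalize('小 红'))); // bool(true)
var_dump(containsBadWord($trie, normalize('小*红'))); // bool(true)
```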
