elasticsearch之拼音搜索
2023-09-27 14:28:03 时间
拼音搜索在中文搜索环境中是经常使用的一种功能,用户只需要输入关键词的拼音全拼或者拼音首字母,搜索引擎就可以搜索出相关结果。在国内,中文输入法基本上都是基于汉语拼音的,这种在符合用户输入习惯的条件下缩短用户输入时间的功能是非常受欢迎的;
一、安装拼音搜索插件
下载对应版本的elasticsearch-analysis-pinyin插件;
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.9.2/elasticsearch-analysis-pinyin-7.9.2.zip
在elasticsearch安装目录下的的plugin目录新建analysis-pinyin目录,并解压下载的安装包;
重启elasticsearch,可以看到已经正常加载拼音插件
[2022-01-13T20:37:25,368][INFO ][o.e.p.PluginsService ] [mango] loaded plugin [analysis-pinyin]
二、使用拼音插件
试一下分词效果,可以看到除了每个词的全频,还有每个字的首字母缩写;
POST _analyze
{
"analyzer": "pinyin",
"text": "我爱你,中国"
}
{
"tokens" : [
{
"token" : "wo",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "wanzg",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "ai",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 1
},
{
"token" : "ni",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 2
},
{
"token" : "zhong",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 3
},
{
"token" : "guo",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 4
}
]
}
自定义pinyin filter,并创建mapping;
PUT /milk
{
"settings": {
"analysis": {
"filter": {
"pinyin_filter":{
"type":"pinyin",
"keep_separate_first_letter" : false,
"keep_full_pinyin" : true,
"keep_original" : true,
"limit_first_letter_length" : 16,
"lowercase" : true,
"remove_duplicated_term" : true
}
},
"analyzer": {
"ik_pinyin_analyzer":{
"tokenizer":"ik_max_word",
"filter":["pinyin_filter"]
}
}
}
},
"mappings": {
"properties": {
"brand":{
"type": "text",
"analyzer": "ik_pinyin_analyzer"
},
"series":{
"type": "text",
"analyzer": "ik_pinyin_analyzer"
},
"price":{
"type": "float"
}
}
}
}
批量索引文档;
POST _bulk
{"index":{"_index":"milk", "_id":1}}}
{"brand":"蒙牛", "series":"特仑苏", "price":60}
{"index":{"_index":"milk", "_id":2}}}
{"brand":"蒙牛", "series":"真果粒", "price":40}
{"index":{"_index":"milk", "_id":3}}}
{"brand":"华山牧", "series":"华山牧", "price":49.90}
{"index":{"_index":"milk", "_id":4}}}
{"brand":"伊利", "series":"安慕希", "price":49.90}
{"index":{"_index":"milk", "_id":5}}}
{"brand":"伊利", "series":"金典", "price":49.90}
搜索tls,可以看到已经匹配到对应的记录;
POST milk/_search
{
"query": {
"match_phrase_prefix": {
"series": "tl"
}
}
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.691126,
"hits" : [
{
"_index" : "milk",
"_type" : "_doc",
"_id" : "1",
"_score" : 6.691126,
"_source" : {
"brand" : "蒙牛",
"series" : "特仑苏",
"price" : 60
}
}
]
}
}
可以看到查询直接使用MultiPhraseQuery来实现对两个位置字符的定位;
POST milk/_search
{
"query": {
"match_phrase_prefix": {
"series": "tl"
}
},
"highlight": {
"fields": {
"series": {}
}
}
}
{
"profile" : {
"shards" : [
{
"id" : "[OoNXoregTmKQAFotUgOeaA][milk][0]",
"searches" : [
{
"query" : [
{
"type" : "MultiPhraseQuery",
"description" : "series:\"(t tl) (li l lun)\"",
"time_in_nanos" : 177400,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 1,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 5400,
"match" : 12800,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 9200,
"advance_count" : 1,
"score" : 3900,
"build_scorer_count" : 2,
"create_weight" : 40800,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 105300
}
}
],
"rewrite_time" : 45700,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 14500
}
]
}
],
"aggregations" : [ ]
}
]
}
}
查询也可以返回高亮信息
POST milk/_search
{
"query": {
"match_phrase_prefix": {
"series": "tl"
}
},
"highlight": {
"fields": {
"series": {}
}
}
}
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.691126,
"hits" : [
{
"_index" : "milk",
"_type" : "_doc",
"_id" : "1",
"_score" : 6.691126,
"_source" : {
"brand" : "蒙牛",
"series" : "特仑苏",
"price" : 60
},
"highlight" : {
"series" : [
"<em>特</em><em>仑</em>苏"
]
}
}
]
}
}
相关文章
- Linux常用命令(第二版) --文件搜索命令
- Elasticsearch构建商品搜索系统
- 教你用Elastic Search:运行第一条Hello World搜索命令
- Lucene多字段搜索
- elasticsearch最全详细使用教程:搜索详解
- iOS投屏搜索不到设备如何解决?投屏怎么设置?
- 搜索存储过程,视图,函数内容
- 二分搜索插入
- Elasticsearch添加拼音搜索支持
- elasticsearch搜索推荐系列-拼音插件 elasticsearch-analysis-pinyin--analysis-ik(中文分词器)
- 【ElasticSearch】基于boost的细粒度搜索条件权重控制
- 【ElasticSearch搜索推荐】基于ngram分词机制实现index-time搜索推荐
- LeetCode_二叉搜索树_简单_700.二叉搜索树中的搜索
- ElasticSearch调优篇 11 - 搜索结果震荡问题解决+排查命令
- Elasticsearch调优篇 05 - Elasticsearch 搜索层面最全优化
- Android之文件搜索工具类
- Android 使软键盘的回车按钮变成搜索按钮(Kotlin)
- elasticsearch系列五:搜索详解(查询建议介绍、Suggester 介绍)
- elasticsearch系列四:搜索详解(搜索API、Query DSL)
- java 实现微信搜索附近人功能
- 大数据ELK(十):使用VSCode操作猎聘网职位搜索案例
- 二叉搜索树与双向链表
- [LeetCode] 108. Convert Sorted Array to Binary Search Tree 把有序数组转成二叉搜索树
- [LeetCode] 272. Closest Binary Search Tree Value II 最近的二叉搜索树的值 II
- 实战:Nodejs+Mongodb+Elasticsearch 实现简单的搜索
- 如何在Gihub上面精准搜索开源项目?
- 网页搜索自动补全功能如何实现,Elasticsearch来祝佬“一臂之力”
- python学习之美多商城(十七):商品部分:商品搜索、Elasticsearch搜索引擎(Docker部署及haystack对接)