zl程序教程

您现在的位置是:首页 >  数据库

当前栏目

Elasticsearch按某个字段去重-实现group by效果

elasticsearch 实现 效果 by 字段 某个 group
2023-09-27 14:28:03 时间

一、需求分析 

1、现实需求 

已知es中存储了一张学生课程信息宽表,里边包含有student_name、student_id、teacher_id、课程信息等字段。

现在根据学生姓名或者班级ID,得出学生所在班级的所有授课教师信息。 

2、需求分析 

既然是明细宽表,里边的教师的信息必然是有重复的,因此我们需要查询teacher_id的集合并去重,对teacher_id实现SQL中的distinct或group by操作。然后再拿着查询出的不重复的teacher_id集去教师表查询每个教师的相关信息。

2、解决思路  

方式(1)字段aggs与top_hits:在内存中完成操作,性能损耗大

方式(2)collapse折叠:性能高,api操作简单

方式(1)有损性能,不推荐使用,下边是方式(2)为的操作代码


二、操作示例 

下边是JSON查询语句,must指定匹配条件,includes查询指定字段,collapse指定去重字段

1、原生的JSON格式 

{
	"query": {
		"bool": {
			"must": [{
					"term": {"status": {"value": 1}}
				},
				{
					"bool": {
						"should": [{
								"term": { "student_name": {"value": "zhangsan" }}
							},
							{
								"terms": {"class_id": [101]}
							}
						]
					}
				}
			]
		}
	},
	"_source": {
		"includes": ["teacher_id"],
		"excludes": []
	},
	"collapse": {"field": "teacher_id"}
}

2、Java语言的API

Java的API参考关键代码示例如下:

    public List<String> queryTeacherIds(List<Long> classIds, String studentName) {
    	List<String> idList = new ArrayList<>();
    	try{
    		//自行获取客户端连接
            RestHighLevelClient client = esConf.getClient();
            
            BoolQueryBuilder mustBoolQueryBuilder = QueryBuilders.boolQuery();
            mustBoolQueryBuilder.must(QueryBuilders.termQuery("status", 2));
            
            BoolQueryBuilder shouldBoolQueryBuilder = QueryBuilders.boolQuery();
            if(CollectionUtils.isNotEmpty(classIds)){
            	shouldBoolQueryBuilder.should(QueryBuilders.termsQuery("class_id", classIds));
            }
            if(StringUtils.isNotBlank(studentName)){
            	shouldBoolQueryBuilder.should(QueryBuilders.termsQuery("student_name",studentName));
            }
            mustBoolQueryBuilder.must(shouldBoolQueryBuilder);
            
            String teacherId = "teacher_id";
            String [] includes = {teacherId};//指定查询哪些字段
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(mustBoolQueryBuilder).fetchSource(includes,null).collapse(new CollapseBuilder(teacherId));

            SearchRequest searchRequest = new SearchRequest(erpIndex).source(sourceBuilder);
            logger.info("searchRequest:{}", searchRequest.toString());
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            logger.info("searchResponse:{}", searchResponse.toString());
            
            if(searchResponse!=null && searchResponse.getHits().getHits().length > 0){
                SearchHits searchHits = searchResponse.getHits();
                SearchHit[] hits = searchHits.getHits();
                for(SearchHit hit:hits){
                    Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                    Object teacherIdObj = sourceAsMap.get(teacherId);
                    idList.add(String.valueOf(teacherIdObj));
                }
            }
        }catch ( Exception e){
            logger.error("getPopAllowanceIds err{}",e);
            Profiler.businessAlarm("AllowanceSkuEsServiceImpl.querySkuByPage.Exception", String.format("getErpAllowanceIds查询es出错:%s", e));
        }
    	return idList;
    }