
Elasticsearch Tutorial (11): Elasticsearch Bucket Aggregations (Query DSL)

2023-09-27 14:26:50

Introduction

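Aggregations are one of Elasticsearch's most powerful features, and they come up constantly in everyday development, much like count, sum, max, min, GROUP BY and HAVING do when you work with MySQL.
You don't need to master every aggregation type up front. Get fluent with the ones that are common in development first. Specialized types such as IP-address or geo-point (latitude/longitude) aggregations can wait until you actually need them. Learn breadth first, then depth.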

Prepare the data

PUT /pigg/_doc/1
{
  "name": "老亚瑟",
  "age": 30,
  "sex": "男",
  "group": "日落圣殿",
  "tag":["战士", "坦克"],
  "date": "2019-12-26",
  "friend": "安琪拉"
}

PUT /pigg/_doc/2
{
  "name": "安琪拉",
  "age": 16,
  "sex": "女",
  "group": "日落圣殿",
  "tag":["法师"],
  "date": "2019-01-01",
  "friend": ""
}

PUT /pigg/_doc/3
{
  "name": "凯",
  "age": 28,
  "sex": "男",
  "group": "长城守卫军",
  "tag":["战士"],
  "date": "2020-01-01"
}

PUT /pigg/_doc/4
{
  "name": "盾山",
  "age": 38,
  "sex": "男",
  "group": "长城守卫军",
  "tag":["辅助", "坦克"],
  "date": "2020-02-02"
}

PUT /pigg/_doc/5
{
  "name": "百里守约",
  "age": 18,
  "sex": "男",
  "group": "长城守卫军",
  "tag":["射手"],
  "date": "2020-03-03"
}

PUT /pigg/_doc/6
{
  "name": "李元芳",
  "age": 15,
  "sex": "男",
  "group": "长安",
  "tag":["刺客"],
  "date": "2020-03-23"
}

PUT /pigg/_doc/7
{
  "name": "陈咬金",
  "age": 40,
  "sex": "男",
  "group": "长安",
  "tag":["战士", "坦克"]
}

Metric aggregations

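For metric aggregations, see my earlier post "Elasticsearch笔记(五) 指标聚合 SQL DSL JavaAPI" (Elasticsearch Notes (5): Metric Aggregations with SQL, DSL and the Java API).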

Bucket aggregations

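A bucket aggregation turns a condition into a bucket, and every document that satisfies the condition is placed in that bucket.
Example: you have a pile of colored balloons. A bucket aggregation by color gives bucket 1: red balloons, bucket 2: yellow balloons, bucket 3: blue balloons.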
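That example should give you a rough idea of what bucket aggregations do. The field you aggregate on should preferably be a keyword field. A text field also works, but only after you enable fielddata on it. That loads the field's terms into JVM heap memory and can hurt performance badly.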

1 terms

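The terms bucket aggregation works like SQL's GROUP BY. The example below runs a terms aggregation on each hero's faction (group). The index was created by dynamic mapping, so each string field such as group also has a keyword subfield (group.keyword), and that is the field we aggregate on.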

GET /pigg/_search
{
   "size": 0, 
   "aggs": {
     "group_by_group": {
       "terms": {
         "field": "group.keyword"
       }
     }
   }
}

Result:

      "buckets" : [
        {
          "key" : "长城守卫军",
          "doc_count" : 3
        },
        {
          "key" : "日落圣殿",
          "doc_count" : 2
        },
        {
          "key" : "长安",
          "doc_count" : 2
        }
      ]
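To make the GROUP BY analogy concrete, here is a minimal plain-Python sketch of what the terms aggregation computes on the sample data. It is not Elasticsearch's implementation, and terms_agg is a hypothetical helper. It assumes the default ordering (doc_count descending, ties broken by key ascending) and the default size of 10. Under those assumptions it reproduces the buckets above.

```python
from collections import Counter

# The seven sample documents, reduced to the one field this sketch uses.
DOCS = [
    {"group": "日落圣殿"}, {"group": "日落圣殿"},
    {"group": "长城守卫军"}, {"group": "长城守卫军"}, {"group": "长城守卫军"},
    {"group": "长安"}, {"group": "长安"},
]

def terms_agg(docs, field, size=10):
    """Hypothetical helper: group docs by the exact value of `field` and count them."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Sort by doc_count descending, break ties by key ascending,
    # then keep the top `size` buckets (10 is the terms default).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]

for bucket in terms_agg(DOCS, "group"):
    print(bucket)
```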

2 Nested terms

An aggs block can contain another aggs block. Example: count the men and women in each faction.

GET /pigg/_search
{
   "size": 0, 
   "aggs": {
     "group_by_group": {
       "terms": {
         "field": "group.keyword"
       },
       "aggs": {
         "group_by_sex": {
           "terms": {
             "field": "sex.keyword"
           }
         }
       }
     }
   }
}

Result:

  "buckets" : [
        {
          "key" : "长城守卫军",
          "doc_count" : 3,
          "group_by_sex" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "男",
                "doc_count" : 3
              }
            ]
          }
        },
        {
          "key" : "日落圣殿",
          "doc_count" : 2,
          "group_by_sex" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "女",
                "doc_count" : 1
              },
              {
                "key" : "男",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "长安",
          "doc_count" : 2,
          "group_by_sex" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "男",
                "doc_count" : 2
              }
            ]
          }
        }
      ]
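The key point of nesting is that the inner aggregation runs separately inside each outer bucket and only sees that bucket's documents. Below is a plain-Python sketch of that idea on the sample data. nested_terms is a hypothetical name, and the sketch ignores bucket ordering and size.

```python
from collections import Counter, defaultdict

# Faction and sex of the seven sample documents.
DOCS = [
    {"group": "日落圣殿", "sex": "男"}, {"group": "日落圣殿", "sex": "女"},
    {"group": "长城守卫军", "sex": "男"}, {"group": "长城守卫军", "sex": "男"},
    {"group": "长城守卫军", "sex": "男"},
    {"group": "长安", "sex": "男"}, {"group": "长安", "sex": "男"},
]

def nested_terms(docs, outer, inner):
    """Hypothetical helper: outer terms buckets, each with its own inner terms counts."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[outer]].append(doc)   # route each doc to its outer bucket
    # The sub-aggregation only counts the documents inside each outer bucket.
    return {key: dict(Counter(d[inner] for d in members))
            for key, members in buckets.items()}

print(nested_terms(DOCS, "group", "sex"))
```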

3 filter: narrow down the documents first, then aggregate

Example 1: count the members of the 长安 faction

GET /pigg/_search
{
  "size": 0,
  "aggs": {
    "count_of_changan": {
      "filter": {
        "term": {
          "group.keyword": "长安"
        }
      }
    }
  }
}
This returns the same count as:
GET /pigg/_count
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "group.keyword": "长安"
        }
      }
    }
  }
}

Example 2: the average age of the 长安 faction's members

GET /pigg/_search
{
    "size":0,
    "aggs":{
        "avg_age_of_changan":{
            "filter":{
                "term":{
                    "group.keyword":"长安"
                }
            },
            "aggs":{
                "avg_age":{
                    "avg":{
                        "field":"age"
                    }
                }
            }
        }
    }
}

Result:

  "aggregations" : {
    "avg_age_of_changan" : {
      "doc_count" : 2,
      "avg_age" : {
        "value" : 27.5
      }
    }
  }
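A filter aggregation produces a single bucket containing only the matching documents, and any sub-aggregation runs on that bucket alone. Here is a plain-Python sketch of that behavior. filter_agg and avg_age are hypothetical helpers, and returning None for an empty bucket mirrors the null that avg reports when there are no values.

```python
# Faction and age of the seven sample documents.
DOCS = [
    {"group": "日落圣殿", "age": 30}, {"group": "日落圣殿", "age": 16},
    {"group": "长城守卫军", "age": 28}, {"group": "长城守卫军", "age": 38},
    {"group": "长城守卫军", "age": 18},
    {"group": "长安", "age": 15}, {"group": "长安", "age": 40},
]

def filter_agg(docs, predicate, sub_aggs=None):
    """Hypothetical helper: one bucket holding only the docs that match `predicate`."""
    matched = [d for d in docs if predicate(d)]
    bucket = {"doc_count": len(matched)}
    for name, fn in (sub_aggs or {}).items():
        bucket[name] = {"value": fn(matched)}   # sub-aggs see only the bucket's docs
    return bucket

def avg_age(docs):
    """Average age of the given docs, or None when there are no docs."""
    return sum(d["age"] for d in docs) / len(docs) if docs else None

print(filter_agg(DOCS, lambda d: d["group"] == "长安", {"avg_age": avg_age}))
```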

4 filters: multiple filter buckets

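The filters aggregation defines several buckets, each with its own filter, and puts every document that matches a bucket's filter into that bucket. A document can therefore land in more than one bucket.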

Example 1: count the 战士 (warrior) heroes and the 刺客 (assassin) heroes separately

GET /pigg/_search
{
    "aggs":{
        "count_of_tag":{
            "filters":{
                "filters":{
                    "tag_战士":{
                        "term":{
                            "tag.keyword":"战士"
                        }
                    },
                    "tag_刺客":{
                        "term":{
                            "tag.keyword":"刺客"
                        }
                    }
                }
            }
        }
    }
}

Result:

  "aggregations" : {
    "count_of_tag" : {
      "buckets" : {
        "tag_刺客" : {
          "doc_count" : 1
        },
        "tag_战士" : {
          "doc_count" : 3
        }
      }
    }
  }
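tag is a multi-valued field, so a term filter on it matches a document if any of its values equals the term. That is why 战士 counts three heroes even though two of them also carry 坦克. Below is a plain-Python sketch of the two buckets under that assumption. filters_agg is a hypothetical helper.

```python
# The tag arrays of the seven sample documents, in _id order.
TAGS = [
    ["战士", "坦克"], ["法师"], ["战士"], ["辅助", "坦克"],
    ["射手"], ["刺客"], ["战士", "坦克"],
]

def filters_agg(values_per_doc, named_filters):
    """Hypothetical helper: one bucket per named filter, counting the docs it matches."""
    return {
        name: {"doc_count": sum(1 for values in values_per_doc if pred(values))}
        for name, pred in named_filters.items()
    }

# A term query on a multi-valued field matches when ANY value equals the term.
buckets = filters_agg(TAGS, {
    "tag_战士": lambda tags: "战士" in tags,
    "tag_刺客": lambda tags: "刺客" in tags,
})
print(buckets)
```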

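Example 2: count people surnamed 李, people surnamed 陈, and people aged 20 or older, one bucket each. The age bucket uses gte, so age 20 itself is included.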


GET /pigg/_search
{
    "size": 0, 
    "aggs":{
        "count_of_tag":{
            "filters":{
                "filters":{
                    "姓李":{
                        "prefix":{
                            "name.keyword":"李"
                        }
                    },
                    "姓陈":{
                        "prefix":{
                            "name.keyword":"陈"
                        }
                    },
                    "年龄>=20":{
                        "range":{
                            "age":{
                                "gte":20
                            }
                        }
                    }
                }
            }
        }
    }
}

Result:

  "aggregations" : {
    "count_of_tag" : {
      "buckets" : {
        "姓李" : {
          "doc_count" : 1
        },
        "姓陈" : {
          "doc_count" : 1
        },
        "年龄>=20" : {
          "doc_count" : 4
        }
      }
    }
  }
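The buckets in a filters aggregation are independent, so they can overlap, and a document that matches no filter appears in none of them. Elasticsearch's filters aggregation can collect such leftovers in an extra bucket via its other_bucket option. The plain-Python sketch below (variable names are illustrative) checks both effects on the sample data. startswith stands in for the prefix query on name.keyword.

```python
# Name and age of the seven sample documents.
PEOPLE = [("老亚瑟", 30), ("安琪拉", 16), ("凯", 28), ("盾山", 38),
          ("百里守约", 18), ("李元芳", 15), ("陈咬金", 40)]

# Each named filter is an independent predicate, like the prefix/range queries above.
FILTERS = {
    "姓李": lambda name, age: name.startswith("李"),
    "姓陈": lambda name, age: name.startswith("陈"),
    "年龄>=20": lambda name, age: age >= 20,
}

buckets = {key: [name for name, age in PEOPLE if pred(name, age)]
           for key, pred in FILTERS.items()}
counts = {key: len(names) for key, names in buckets.items()}

# Buckets can overlap: one person may sit in two buckets...
overlap = set(buckets["姓陈"]) & set(buckets["年龄>=20"])
# ...and people matching no filter appear in no bucket at all.
unmatched = [name for name, age in PEOPLE
             if not any(pred(name, age) for pred in FILTERS.values())]

print(counts)
print(overlap, unmatched)
```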

5 range

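The range aggregation first divides the values into intervals. Each document goes into the bucket of the interval its field value falls in.
Intervals are defined with "from" and "to" and are half-open (closed on the left, open on the right): from 30 to 40 includes 30 but not 40.
Example: a range aggregation on age. The "missing": 0 setting treats documents with no age as age 0. All seven sample documents have an age, so it changes nothing here.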

GET /pigg/_search
{
  "aggs": {
    "age_rang": {
      "range": {
        "field": "age",
        "missing": 0,
        "ranges": [
          {
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40
          }
        ]
      }
    }
  }
}

Result:

 "buckets" : [
        {
          "key" : "*-30.0",
          "to" : 30.0,
          "doc_count" : 4
        },
        {
          "key" : "30.0-40.0",
          "from" : 30.0,
          "to" : 40.0,
          "doc_count" : 2
        },
        {
          "key" : "40.0-*",
          "from" : 40.0,
          "doc_count" : 1
        }
      ]
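The half-open rule decides where boundary values go: 30 lands in "30.0-40.0" and 40 lands in "40.0-*". Here is a plain-Python sketch of the bucketing and the key format seen in the result. range_agg is a hypothetical helper, and the missing substitution mimics the "missing" parameter.

```python
# The ages of the seven sample documents.
AGES = [30, 16, 28, 38, 18, 15, 40]

def range_agg(values, ranges, missing=None):
    """Hypothetical helper: count values in half-open [from, to) ranges."""
    if missing is not None:
        values = [missing if v is None else v for v in values]
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(1 for v in values
                    if v is not None
                    and (lo is None or v >= lo)    # "from" is inclusive
                    and (hi is None or v < hi))    # "to" is exclusive
        # Keys look like "*-30.0": an omitted bound is written as "*".
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": count})
    return buckets

result = range_agg(AGES, [{"to": 30}, {"from": 30, "to": 40}, {"from": 40}], missing=0)
for b in result:
    print(b)
```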

6 date range

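Example 1: count the documents from the last 7 days. "now-7d/d" means now minus 7 days, rounded down to the start of that day (in UTC unless a time_zone is given).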

GET /pigg/_search
{
    "size":0,
    "aggs":{
        "range":{
            "date_range":{
                "field":"date",
                "format": "yyyy-MM-dd", 
                "ranges":[
                    {
                        "from":"now-7d/d",
                        "to":"now"
                    }
                ]
            }
        }
    }
}

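Result. now is evaluated when the query runs; this output comes from a run on 2020-03-23, so running it on another day gives a different range: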

  "aggregations" : {
    "range" : {
      "buckets" : [
        {
          "key" : "2020-03-16-2020-03-23",
          "from" : 1.5843168E12,
          "from_as_string" : "2020-03-16",
          "to" : 1.584954696781E12,
          "to_as_string" : "2020-03-23",
          "doc_count" : 1
        }
      ]
    }
  }
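The "from" and "to" numbers above are epoch milliseconds. The plain-Python sketch below shows how they arise. It assumes the query ran at 2020-03-23T09:11:36.781Z (the "to" value in the result) and uses UTC. last_7_days and to_millis are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# The date field of the seven sample documents; 陈咬金 has no date.
DATES = ["2019-12-26", "2019-01-01", "2020-01-01", "2020-02-02",
         "2020-03-03", "2020-03-23", None]

def to_millis(dt):
    """Exact integer epoch milliseconds (avoids float rounding)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def last_7_days(now, dates):
    """Mimic from "now-7d/d" to "now": subtract 7 days, round down to midnight UTC."""
    start = (now - timedelta(days=7)).replace(hour=0, minute=0, second=0, microsecond=0)
    count = 0
    for s in dates:
        if s is None:
            continue                           # docs without a date are not counted
        d = datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        if start <= d < now:                   # from inclusive, to exclusive
            count += 1
    return to_millis(start), to_millis(now), count

# Assumption: the query ran at the instant given by "to" in the result above.
now = datetime(2020, 3, 23, 9, 11, 36, 781000, tzinfo=timezone.utc)
print(last_7_days(now, DATES))
```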

8 date histogram

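Buckets documents by a date field using a date interval. The available interval units are year, quarter, month, week, day, hour, minute and second. Recent versions do not accept fractional values such as 1.5h; use a smaller unit instead (90m). From Elasticsearch 7.2, the interval parameter used below is deprecated in favor of calendar_interval (calendar-aware units such as 1M or 1y) and fixed_interval (fixed lengths such as 90m or 7d).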

Example 1: count the people per month

GET /pigg/_search
{
    "size":0,
    "aggs":{
        "dates":{
            "date_histogram":{
                "field":"date",
                "interval":"month",
                "format":"yyyy-MM-dd"
            }
        }
    }
 }

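Result. Empty months between the first and last date are returned with doc_count 0, and 陈咬金, who has no date, is not counted: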

  "dates" : {
      "buckets" : [
        {
          "key_as_string" : "2019-01-01",
          "key" : 1546300800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-02-01",
          "key" : 1548979200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-03-01",
          "key" : 1551398400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-04-01",
          "key" : 1554076800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-05-01",
          "key" : 1556668800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-06-01",
          "key" : 1559347200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-07-01",
          "key" : 1561939200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-08-01",
          "key" : 1564617600000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-09-01",
          "key" : 1567296000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-10-01",
          "key" : 1569888000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-11-01",
          "key" : 1572566400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2019-12-01",
          "key" : 1575158400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-01-01",
          "key" : 1577836800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-02-01",
          "key" : 1580515200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-03-01",
          "key" : 1583020800000,
          "doc_count" : 2
        }
      ]
    }

Example 2: drop the buckets whose doc_count is 0

GET /pigg/_search
{
    "size":0,
    "aggs":{
        "dates":{
            "date_histogram":{
                "field":"date",
                "interval":"month",
                "format":"yyyy-MM",
                "min_doc_count":1
            }
        }
    }
}

Result: only months with at least one person are returned

  "dates" : {
      "buckets" : [
        {
          "key_as_string" : "2019-01",
          "key" : 1546300800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-12",
          "key" : 1575158400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-01",
          "key" : 1577836800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-02",
          "key" : 1580515200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-03",
          "key" : 1583020800000,
          "doc_count" : 2
        }
      ]
    }
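The two monthly examples differ only in min_doc_count. With the default of 0, every month between the first and the last non-empty month is returned, even if it is empty. With 1, the empty months disappear. Here is a plain-Python sketch of both behaviors under these assumptions: calendar months, UTC, and "YYYY-MM" keys instead of epoch milliseconds. month_histogram is a hypothetical helper.

```python
from collections import Counter

# The date field of the seven sample documents; the 7th has no date.
DATES = ["2019-12-26", "2019-01-01", "2020-01-01", "2020-02-02",
         "2020-03-03", "2020-03-23", None]

def month_histogram(dates, min_doc_count=0):
    """Hypothetical helper: calendar-month buckets, filling gaps between first and last month."""
    months = Counter(s[:7] for s in dates if s is not None)   # skip docs without a date
    if not months:
        return []
    first, last = min(months), max(months)    # "YYYY-MM" strings sort chronologically
    year, month = map(int, first.split("-"))
    buckets = []
    while True:
        key = f"{year:04d}-{month:02d}"
        count = months.get(key, 0)
        if count >= min_doc_count:            # min_doc_count=0 keeps the empty months
            buckets.append((key, count))
        if key == last:
            break
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets

print(len(month_histogram(DATES)))              # all months from 2019-01 to 2020-03
print(month_histogram(DATES, min_doc_count=1))  # only the non-empty months
```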

Example 3: restrict the date range first, then aggregate

Count the data from January 2020 onward. The range filter in query limits which documents the aggregation sees.

GET /pigg/_search
{
    "size":0,
    "query": {
      "bool": {
        "filter": {
          "range": {
            "date": {
              "gte": "2020-01-01"
            }
          }
        }
      }
    }, 
    "aggs":{
        "dates":{
            "date_histogram":{
                "field":"date",
                "interval":"month",
                "format":"yyyy-MM",
                "min_doc_count":1
            }
        }
    }
}

Result:

 "dates" : {
      "buckets" : [
        {
          "key_as_string" : "2020-01",
          "key" : 1577836800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-02",
          "key" : 1580515200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-03",
          "key" : 1583020800000,
          "doc_count" : 2
        }
      ]
    }
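The order of operations is what matters here: the query runs first, and the aggregation only buckets the documents the query returned. A plain-Python sketch of the two steps on the sample dates (variable names are illustrative) follows. It relies on ISO yyyy-MM-dd strings comparing in chronological order.

```python
from collections import Counter

# The date field of the seven sample documents; the 7th has no date.
DATES = ["2019-12-26", "2019-01-01", "2020-01-01", "2020-02-02",
         "2020-03-03", "2020-03-23", None]

# Step 1, the query: keep docs whose date is >= 2020-01-01.
matching = [d for d in DATES if d is not None and d >= "2020-01-01"]

# Step 2, the aggregation: bucket only the matching docs by month.
# Only non-empty months appear, as with min_doc_count: 1.
per_month = sorted(Counter(d[:7] for d in matching).items())
print(per_month)
```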

9 missing: a bucket for documents without a value

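Count the people who have no friend value.
老亚瑟's friend is "安琪拉" and 安琪拉's friend is "". Although "" is an empty string, it still counts as a value, so 安琪拉 is not missing.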

# 7 people in total and 2 of them have a friend value, so the result is 5
POST /pigg/_search?size=0
{
    "aggs" : {
        "account_without_friend" : {
            "missing" : { "field" : "friend.keyword" }
        }
    }
}
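The missing aggregation counts documents that have no value for the field, while an empty string is a real value. Below is a plain-Python sketch of that rule on the sample data. None stands for an absent field, and in Elasticsearch an explicit JSON null would be treated as missing too. missing_agg is a hypothetical helper.

```python
# The friend field of the seven sample documents; None = field absent.
FRIENDS = ["安琪拉", "", None, None, None, None, None]

def missing_agg(values):
    """Hypothetical helper: count docs with no value; "" still counts as a value."""
    return {"doc_count": sum(1 for v in values if v is None)}

print(missing_agg(FRIENDS))
```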

10 Field collapsing (collapse)

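Field collapsing is very handy. It groups the search hits by a field and returns only the top hit of each group according to the sort, and with inner_hits it can also return the top few members of each group in the same request. Strictly speaking, it is a search feature rather than an aggregation. The collapse field must be a single-valued keyword or numeric field with doc_values enabled.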

Example: group by faction (group), return the oldest person in each faction, and also show the two oldest people in each faction.

GET /pigg/_search
{
  "collapse": {
    "field": "group.keyword",
    "inner_hits":{
      "name": "old_age",
      "size": 2,
      "sort": [{"age": "desc"}]
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

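The factions in the result are ordered by the age of their oldest member. For example, 陈咬金 (40) and 老亚瑟 (30) are the oldest members of their factions, and since 40 > 30, 长安 ranks above 日落圣殿. 长城守卫军, whose oldest member is 盾山 (38), sits between them.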

  "hits" : [
      {
        "_index" : "pigg",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : null,
        "_source" : {
          "name" : "陈咬金",
          "age" : 40,
          "sex" : "男",
          "group" : "长安",
          "tag" : [
            "战士",
            "坦克"
          ]
        },
        "fields" : {
          "group.keyword" : [
            "长安"
          ]
        },
        "sort" : [
          40
        ],
        "inner_hits" : {
          "old_age" : {
            "hits" : {
              "total" : 2,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "7",
                  "_score" : null,
                  "_source" : {
                    "name" : "陈咬金",
                    "age" : 40,
                    "sex" : "男",
                    "group" : "长安",
                    "tag" : [
                      "战士",
                      "坦克"
                    ]
                  },
                  "sort" : [
                    40
                  ]
                },
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "6",
                  "_score" : null,
                  "_source" : {
                    "name" : "李元芳",
                    "age" : 15,
                    "sex" : "男",
                    "group" : "长安",
                    "tag" : [
                      "刺客"
                    ],
                    "date" : "2020-03-23"
                  },
                  "sort" : [
                    15
                  ]
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "pigg",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "盾山",
          "age" : 38,
          "sex" : "男",
          "group" : "长城守卫军",
          "tag" : [
            "辅助",
            "坦克"
          ],
          "date" : "2020-02-02"
        },
        "fields" : {
          "group.keyword" : [
            "长城守卫军"
          ]
        },
        "sort" : [
          38
        ],
        "inner_hits" : {
          "old_age" : {
            "hits" : {
              "total" : 3,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "4",
                  "_score" : null,
                  "_source" : {
                    "name" : "盾山",
                    "age" : 38,
                    "sex" : "男",
                    "group" : "长城守卫军",
                    "tag" : [
                      "辅助",
                      "坦克"
                    ],
                    "date" : "2020-02-02"
                  },
                  "sort" : [
                    38
                  ]
                },
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : null,
                  "_source" : {
                    "name" : "凯",
                    "age" : 28,
                    "sex" : "男",
                    "group" : "长城守卫军",
                    "tag" : [
                      "战士"
                    ],
                    "date" : "2020-01-01"
                  },
                  "sort" : [
                    28
                  ]
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "pigg",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "老亚瑟",
          "age" : 30,
          "sex" : "男",
          "group" : "日落圣殿",
          "tag" : [
            "战士",
            "坦克"
          ],
          "date" : "2019-12-26",
          "friend" : "安琪拉"
        },
        "fields" : {
          "group.keyword" : [
            "日落圣殿"
          ]
        },
        "sort" : [
          30
        ],
        "inner_hits" : {
          "old_age" : {
            "hits" : {
              "total" : 2,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : null,
                  "_source" : {
                    "name" : "老亚瑟",
                    "age" : 30,
                    "sex" : "男",
                    "group" : "日落圣殿",
                    "tag" : [
                      "战士",
                      "坦克"
                    ],
                    "date" : "2019-12-26",
                    "friend" : "安琪拉"
                  },
                  "sort" : [
                    30
                  ]
                },
                {
                  "_index" : "pigg",
                  "_type" : "_doc",
                  "_id" : "2",
                  "_score" : null,
                  "_source" : {
                    "name" : "安琪拉",
                    "age" : 16,
                    "sex" : "女",
                    "group" : "日落圣殿",
                    "tag" : [
                      "法师"
                    ],
                    "date" : "2019-01-01",
                    "friend" : ""
                  },
                  "sort" : [
                    16
                  ]
                }
              ]
            }
          }
        }
      }
    ]
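In short, collapse walks the hits in sort order, keeps the first hit it sees for each field value, and inner_hits expands each group. The plain-Python sketch below reproduces the three groups above. collapse is a hypothetical helper. As a simplifying assumption, it reuses the top-level sort for the inner hits, which matches this query because both sort by age descending.

```python
# The sample documents, reduced to the fields this sketch uses.
DOCS = [
    {"_id": "1", "name": "老亚瑟", "age": 30, "group": "日落圣殿"},
    {"_id": "2", "name": "安琪拉", "age": 16, "group": "日落圣殿"},
    {"_id": "3", "name": "凯", "age": 28, "group": "长城守卫军"},
    {"_id": "4", "name": "盾山", "age": 38, "group": "长城守卫军"},
    {"_id": "5", "name": "百里守约", "age": 18, "group": "长城守卫军"},
    {"_id": "6", "name": "李元芳", "age": 15, "group": "长安"},
    {"_id": "7", "name": "陈咬金", "age": 40, "group": "长安"},
]

def collapse(docs, field, sort_key, inner_size):
    """Hypothetical helper: top hit per `field` value, expanded with its top `inner_size` hits."""
    ordered = sorted(docs, key=sort_key)
    hits, seen = [], set()
    for doc in ordered:                  # walk hits in the top-level sort order
        if doc[field] in seen:
            continue                     # this group already has its top hit
        seen.add(doc[field])
        group = [d for d in ordered if d[field] == doc[field]]
        hits.append({"top": doc["name"],
                     "total": len(group),
                     "inner_hits": [d["name"] for d in group[:inner_size]]})
    return hits

def by_age_desc(doc):
    return -doc["age"]

for hit in collapse(DOCS, "group", by_age_desc, inner_size=2):
    print(hit)
```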