您现在的位置是：首页 > 数据库

当前栏目

[译]PostgreSQL HAVING子句

postgresql 子句 having

2023-06-13 09:12:02 时间

PostgreSQL HAVING子句

正文

了解GROUP BY和HAVING子句的工作原理可以帮助写出更加高效的SQL。这里用一个真实例子来说明这一点。

假设由2个表：country、invoice：

select * from country;
ctry_code  ctry_name
US      USA
CA      Canada
CN      China
..  ..
7 rows

select * from invoice;
invoice_nbr  ctry_cd  year_nbr  month_nbr  item_cnt  invoice_amt
2014001    BR    2014    3      13      162875
2021172    CA    2021    10      200      1299355
2020435    CN    2020    1      12      145654
2016201    US    2016    8      900      7125125
2021662    US    2021    4      100      800135
2018743    MX    2018    11      5      76124
..  ..  ..  ..  ..  ..
4 million rows

我想写一个在invoice上进行聚合的SQL：

1)对于每个country和year，除了USA外的所有countries

2)对于每个country，除了USA外的所有countries

3)对于每year，包括USA的

4)总计，包括USA的

这是我们想要从查询中得到的：

ctry_name	year_nbr	tot_cnt	tot_amt
Canada	2014	200	1300000
Canada	2015	220	1500000
Brazil	2014	200	1150000
Brazil	2015	180	1000000
Brazil	2018	150	750000
..	..	..	..
Canada	null	420	2800000
Brazil	null	530	2900000
..	..	..	..
All countries	2014	900	6406325
All countries	2015	1000	7306368
All countries	2018	1200	8206334
..	..	..	..
All countries	null	6124	41261346

这是给我们预期的一个结果。使用6个SQL用于6个country（USA除外）中的每个年度总计，6个SQL用于所有年份的国家统计，一个SQL用于所有country的年度统计，最有一个SQL用于全部总计。然后将这14个SQL于13个UNION操作组合在一起：

-- country, year totals
select c.ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name, i.year_nbr  -- grouping by country and year
UNION
....
.... -- unions for each of the 6 countries excluding USA
UNION
-- country totals
select c.ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name  -- grouping by just country
UNION
....
.... -- unions for each of the 6 countries excluding USA
UNION
-- totals for all countries by year
select 'All countries' as ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by i.year_nbr  -- grouping by just year for all countries
UNION
-- totals for all countries and all years
select 'All countries' as ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)

该查询虽然返回了预期结果，但是难以维护，并且效率低下。从invoice中获取14次数据，从而进行14次聚合，最后执行一组昂贵的UNION操作。

理解group by和having后，我们可以写一个更简单更加高效的SQL。和WHERE子句过滤原始数据一样，我们也可以使用having从group by结果集中进行过滤。下面的SQL使用having过滤掉了USA的汇总行，产生contry和year组合的聚合：

select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by cube(c.ctry_name, i.year_nbr)
having coalesce(c.ctry_name,'x') != 'USA' -- exclude USA summary rows

这个查询在做什么？在group by中使用CUBE来获取：

1)每个country的每一行，year组合

2)每个country的一行，包括所有year的数据

3)每个year的一行，包括所有country

4)所有year、所有country的一行

然后使用having子句排除ctry_name是USA的所有行。这个查询从invoice表仅获取一次数据，比原始SQL快很多。

注意，country总计的有一个null year，同样，year总计的有一个null country。我们使用coalesce函数将all countries进行转换，在having中使用coalesce，不会删除country名为NULL的行。

在这个例子中，我们只需要在4个聚合组的2个中排除USA的数据。如果要求在所有汇总行中排除USA的数据，我们将在agg之前使用WHERE子句过滤USA数据，而不是在having子句中，下面的SQL满足条件：

select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name != 'USA' -- exclude USA in all summaries
group by cube(c.ctry_name, i.year_nbr)

GROUP BY支持cube、rollup，值得花时间更好地理解。

此示例基于的实际 sql 有 24 个小查询，union每个查询中结合了两个以上的表。与原始代码的 200 多行相比，重写的 sql 仅 10 行长，并且在几秒钟内运行，而原始查询则需要半个多小时。

原文

https://smallthingssql.com/having-a-less-understood-sql-clause

猜你喜欢

关于MYSQL的优化全面详解
将GPT家族模型极限压缩，1700+亿参数稀疏性达50%性能不减，单GPU即可
Oracle 从公共库到租户库（oracle公共库租户库）
前端经常遇到的手写js题
jquerysortable的拖动方法示例详解
介绍一款 ABAP 代码搜索工具 RS_ABAP_SOURCE_SCAN 的使用方法
Linux文件管理：掌握的力量（linux文件星号）
Docker 容器中的老式 DOS BBS
Debian GNU/Linux系统现支持64位 RISC
在 docker 之间导出导入镜像「建议收藏」
MSSQL：数据库服务器的利器（mssql是什么）
Oracle之HisTrek你的全球大航行（oracle中globe）
ORA-39055: The string feature is not supported in version string. ORACLE 报错故障修复远程处理
配置Linux在小米3上的配置应用（linux小米3）
【深度学习】英伟达RTX 4080 12G换壳4070 Ti，性能接近上代旗舰3090 Ti

zl程序教程

当前栏目

[译]PostgreSQL HAVING子句

正文

原文

相关文章