PostgreSQL Spatial / Multi-Dimensional Sequence Generation
2023-09-27 14:27:57
Background
A one-dimensional database sequence is easy to understand: it auto-increments along a single dimension.
So how should a two-dimensional or multi-dimensional sequence be understood? It increments in lockstep across multiple dimensions.
Taking a two-dimensional sequence as an example, it should grow like this:
0,0 1,0 0,1 1,1 2,1 1,2 2,2 ......
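Before looking at the SQL implementation, the growth pattern can be simulated outside the database with a few lines of Python (an illustration only; the single counter below stands in for the two synchronized PostgreSQL sequences used later):

```python
from itertools import count

def make_seq_2d():
    """Simulate the 2-D sequence: each call advances a shared counter n
    and returns the diagonal point (n, n) plus the two points that are
    one step ahead in each dimension."""
    n = count(1)  # PostgreSQL sequences start at 1
    def seq_2d():
        v = next(n)
        return [(v, v), (v + 1, v), (v, v + 1)]
    return seq_2d

seq_2d = make_seq_2d()
print(seq_2d())  # [(1, 1), (2, 1), (1, 2)]
print(seq_2d())  # [(2, 2), (3, 2), (2, 3)]
```

Each call covers one "layer" of the plane; the fully incremented corner (n+1, n+1) is not returned because it becomes the diagonal point of the next call.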
So how do we generate the two-dimensional sequence above? It can in fact be built from multiple one-dimensional database sequences.
create sequence seq1;
create sequence seq2;

create or replace function seq_2d() returns point[] as $$
declare
  res point[];
begin
  select array_cat(res, array[point(nextval('seq1'), nextval('seq2'))]) into res;
  select array_cat(res, array[point(currval('seq1')+1, currval('seq2'))]) into res;
  select array_cat(res, array[point(currval('seq1'), currval('seq2')+1)]) into res;
  return res;
end;
$$ language plpgsql strict;
Test
test=# select seq_2d();
          seq_2d
---------------------------
 {"(1,1)","(2,1)","(1,2)"}
(1 row)

test=# select seq_2d();
          seq_2d
---------------------------
 {"(2,2)","(3,2)","(2,3)"}
(1 row)

test=# select seq_2d();
          seq_2d
---------------------------
 {"(3,3)","(4,3)","(3,4)"}
(1 row)

Three-Dimensional Sequences
A three-dimensional sequence is generated the same way:
0,0,0 1,0,0 0,1,0 0,0,1 1,1,0 0,1,1 1,0,1 1,1,1 2,1,1 1,2,1 1,1,2 2,2,1 1,2,2 2,1,2 2,2,2 ......
create sequence seq1;
create sequence seq2;
create sequence seq3;

create extension cube;

create or replace function seq_3d() returns cube[] as $$
declare
  res cube[];
begin
  select array_cat(res, array[cube(array[nextval('seq1'), nextval('seq2'), nextval('seq3')])]) into res;
  select array_cat(res, array[cube(array[currval('seq1')+1, currval('seq2'), currval('seq3')])]) into res;
  select array_cat(res, array[cube(array[currval('seq1'), currval('seq2')+1, currval('seq3')])]) into res;
  select array_cat(res, array[cube(array[currval('seq1'), currval('seq2'), currval('seq3')+1])]) into res;
  select array_cat(res, array[cube(array[currval('seq1')+1, currval('seq2')+1, currval('seq3')])]) into res;
  select array_cat(res, array[cube(array[currval('seq1'), currval('seq2')+1, currval('seq3')+1])]) into res;
  select array_cat(res, array[cube(array[currval('seq1')+1, currval('seq2'), currval('seq3')+1])]) into res;
  return res;
end;
$$ language plpgsql strict;
Example
test=# select seq_3d();
                                        seq_3d
---------------------------------------------------------------------------------------
 {"(1, 1, 1)","(2, 1, 1)","(1, 2, 1)","(1, 1, 2)","(2, 2, 1)","(1, 2, 2)","(2, 1, 2)"}
(1 row)

test=# select seq_3d();
                                        seq_3d
---------------------------------------------------------------------------------------
 {"(2, 2, 2)","(3, 2, 2)","(2, 3, 2)","(2, 2, 3)","(3, 3, 2)","(2, 3, 3)","(3, 2, 3)"}
(1 row)

Multi-Dimensional Sequences
By extension, a sequence of any number of dimensions can be generated the same way.
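The generalization can be sketched in Python (illustrative only; `make_seq_nd` is a hypothetical helper, and it returns each round's points as the set of coordinate combinations rather than in the exact order the SQL functions emit them):

```python
from itertools import count, product

def make_seq_nd(dims):
    """Generalize seq_2d/seq_3d to any number of dimensions: each call
    advances the shared counter to n and returns every point whose
    coordinates are n or n+1, except the fully incremented corner
    (n+1, ..., n+1), which becomes the diagonal point of the next call."""
    n = count(1)
    def step():
        v = next(n)
        return [p for p in product((v, v + 1), repeat=dims)
                if p != (v + 1,) * dims]
    return step

seq_3d = make_seq_nd(3)
print(seq_3d())  # 7 points mixing coordinates 1 and 2, excluding (2, 2, 2)
```

Each call therefore returns 2^dims - 1 points: 3 for two dimensions, 7 for three, matching the plpgsql functions above.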
Spatially ordered storage of multi-dimensional data plus a BRIN block-range index: as discussed earlier for spatially clustered storage, if rows are stored in this spatial (multi-dimensional sequence) order, a BRIN block-range index gives the best query efficiency on every dimension, so a single block-range index supports efficient filtering on any column.
Example
create sequence seq1;
create sequence seq2;
create sequence seq3;

create table tbl(c1 int, c2 int, c3 int);

create or replace function cluster_insert() returns void as $$
declare
begin
  insert into tbl values (nextval('seq1'), nextval('seq2'), nextval('seq3'));
  insert into tbl values (currval('seq1')+1, currval('seq2'), currval('seq3'));
  insert into tbl values (currval('seq1'), currval('seq2')+1, currval('seq3'));
  insert into tbl values (currval('seq1'), currval('seq2'), currval('seq3')+1);
  insert into tbl values (currval('seq1')+1, currval('seq2')+1, currval('seq3'));
  insert into tbl values (currval('seq1'), currval('seq2')+1, currval('seq3')+1);
  insert into tbl values (currval('seq1')+1, currval('seq2'), currval('seq3')+1);
end;
$$ language plpgsql strict;
Stress test: load a large volume of data.
vi test.sql
select count(*) from (select cluster_insert() from generate_series(1,100)) t;

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 1200
Check the multi-dimensional clustering:
test=# select * from tbl limit 10;
   c1    |   c2    |   c3
---------+---------+---------
 1992652 | 1992653 | 1992652
 1992573 | 1992574 | 1992578
 1992574 | 1992574 | 1992578
 1992573 | 1992575 | 1992578
 1992573 | 1992574 | 1992579
 1992574 | 1992575 | 1992578
 1992573 | 1992575 | 1992579
 1992574 | 1992574 | 1992579
 1992658 | 1992658 | 1992658
 1992659 | 1992658 | 1992658
(10 rows)
Create the BRIN block-range index.
create index idx on tbl using brin (c1,c2,c3);
test=# \dt+ tbl
                    List of relations
 Schema | Name |  Type |  Owner   | Size  | Description
--------+------+-------+----------+-------+-------------
 public | tbl  | table | postgres | 97 GB |
(1 row)

test=# \di+ idx
                        List of relations
 Schema | Name |  Type |  Owner   | Table |  Size  | Description
--------+------+-------+----------+-------+--------+-------------
 public | idx  | index | postgres | tbl   | 456 kB |
(1 row)
Now let's see how well this 456 kB index performs against 97 GB of data.
Filtering on any single column or any combination of columns:
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c2 between 1 and 1000;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c3 between 1 and 1000;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c2 between 100 and 2000;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c3 between 100 and 2000;
explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c2 between 100 and 2000 and c3 between 1 and 2000;
test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31623.80 rows=1 width=12) (actual time=27.302..50.284 rows=6997 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000))
   Rows Removed by Index Recheck: 229803
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=26.881..26.881 rows=12800 loops=1)
         Index Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000))
         Buffers: shared hit=662
 Planning time: 0.095 ms
 Execution time: 50.636 ms
(11 rows)

test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c2 between 1 and 1000;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31623.80 rows=1 width=12) (actual time=27.886..49.011 rows=6997 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c2 >= 1) AND (tbl.c2 <= 1000))
   Rows Removed by Index Recheck: 229803
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=27.512..27.512 rows=12800 loops=1)
         Index Cond: ((tbl.c2 >= 1) AND (tbl.c2 <= 1000))
         Buffers: shared hit=662
 Planning time: 0.040 ms
 Execution time: 49.348 ms
(11 rows)

test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c3 between 1 and 1000;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31623.80 rows=1 width=12) (actual time=25.238..46.292 rows=6997 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c3 >= 1) AND (tbl.c3 <= 1000))
   Rows Removed by Index Recheck: 229803
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=24.875..24.875 rows=12800 loops=1)
         Index Cond: ((tbl.c3 >= 1) AND (tbl.c3 <= 1000))
         Buffers: shared hit=662
 Planning time: 0.044 ms
 Execution time: 46.631 ms
(11 rows)

test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c2 between 100 and 2000;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31742.85 rows=1 width=12) (actual time=30.018..48.522 rows=6307 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c2 >= 100) AND (tbl.c2 <= 2000))
   Rows Removed by Index Recheck: 230493
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=27.273..27.273 rows=12800 loops=1)
         Index Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c2 >= 100) AND (tbl.c2 <= 2000))
         Buffers: shared hit=662
 Planning time: 0.049 ms
 Execution time: 48.829 ms
(11 rows)

test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c3 between 100 and 2000;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31742.85 rows=1 width=12) (actual time=27.565..46.347 rows=6307 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c3 >= 100) AND (tbl.c3 <= 2000))
   Rows Removed by Index Recheck: 230493
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=24.799..24.799 rows=12800 loops=1)
         Index Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c3 >= 100) AND (tbl.c3 <= 2000))
         Buffers: shared hit=662
 Planning time: 0.055 ms
 Execution time: 46.656 ms
(11 rows)

test=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 between 1 and 1000 and c2 between 100 and 2000 and c3 between 1 and 2000;
                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.tbl  (cost=650.23..31861.90 rows=1 width=12) (actual time=28.703..49.599 rows=6307 loops=1)
   Output: c1, c2, c3
   Recheck Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c2 >= 100) AND (tbl.c2 <= 2000) AND (tbl.c3 >= 1) AND (tbl.c3 <= 2000))
   Rows Removed by Index Recheck: 230493
   Heap Blocks: lossy=1280
   Buffers: shared hit=1942
   ->  Bitmap Index Scan on idx  (cost=0.00..650.23 rows=23810 width=0) (actual time=25.590..25.590 rows=12800 loops=1)
         Index Cond: ((tbl.c1 >= 1) AND (tbl.c1 <= 1000) AND (tbl.c2 >= 100) AND (tbl.c2 <= 2000) AND (tbl.c3 >= 1) AND (tbl.c3 <= 2000))
         Buffers: shared hit=662
 Planning time: 0.114 ms
 Execution time: 49.919 ms
(11 rows)

Summary
This article showed how to create and generate multi-dimensional sequences.
It also verified that when data is stored clustered in multi-dimensional sequence order, a block-range index achieves its strongest filtering: any column can be filtered efficiently.
If the columns of your data are not naturally correlated, you can follow the approach in this article to spatially redistribute the data in storage and obtain the same filtering power.