Lucene's .dvd and .dvm files are the DocValues files: column-oriented storage for field values
Time: 2023-09-14 09:11:56
public final class Lucene54DocValuesFormat extends DocValuesFormat
Lucene 5.4 DocValues format.
Encodes the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) with these strategies:
NUMERIC:
- Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see DirectWriter.
- Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
- GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and quotients are stored using Delta-compressed Numerics.
- Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.
- Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.
- Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
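The numeric strategies above can be sketched in a few lines of Python. This is only an illustration of the ideas, not Lucene's Java implementation: every function name here is hypothetical, the actual bitpacking step (DirectWriter) is reduced to counting how many bits each stored number would need, and the monotonic case treats the whole input as one block. Taking the GCD over the offsets from the minimum value is an assumption of this sketch.

```python
from functools import reduce
from math import gcd


def bits_required(max_value):
    """Bits needed to bitpack non-negative integers up to max_value."""
    return max(1, max_value.bit_length())


def delta_encode(values):
    """Delta-compressed: store each value as an offset from the minimum."""
    base = min(values)
    return base, [v - base for v in values]


def delta_decode(base, deltas):
    return [base + d for d in deltas]


def gcd_encode(values):
    """GCD-compressed: divide the offsets from the minimum by their GCD.

    Assumption of this sketch: the GCD is taken over (value - minimum).
    """
    base = min(values)
    divisor = reduce(gcd, (v - base for v in values)) or 1  # all equal -> 1
    return base, divisor, [(v - base) // divisor for v in values]


def gcd_decode(base, divisor, quotients):
    return [base + q * divisor for q in quotients]


def table_encode(values):
    """Table-compressed: store ordinals into a table of the unique values."""
    table = sorted(set(values))
    ordinal_of = {v: i for i, v in enumerate(table)}
    return table, [ordinal_of[v] for v in values]


def table_decode(table, ordinals):
    return [table[o] for o in ordinals]


def monotonic_encode(offsets):
    """Monotonic-compressed: store deviations from a straight line.

    The expected value at index i is first + round(slope * i); only the
    (small, non-negative) deviation from that expectation is kept.
    """
    n = len(offsets)
    slope = (offsets[-1] - offsets[0]) / (n - 1) if n > 1 else 0.0
    deviations = [o - (offsets[0] + round(slope * i)) for i, o in enumerate(offsets)]
    low = min(deviations)
    return offsets[0], slope, low, [d - low for d in deviations]


def monotonic_decode(first, slope, low, stored):
    return [first + round(slope * i) + low + s for i, s in enumerate(stored)]


# Usage: day-granularity timestamps in milliseconds.
DAY_MS = 86_400_000
timestamps = [DAY_MS * d for d in (19000, 19001, 19003, 19010)]
base, deltas = delta_encode(timestamps)
_, divisor, quotients = gcd_encode(timestamps)
print("bits per value, delta:", bits_required(max(deltas)))
print("bits per value, gcd:  ", bits_required(max(quotients)))
```

For date-like values the GCD variant needs far fewer bits per document than plain deltas, because the shared divisor is stored once instead of being repeated inside every value.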
BINARY:
- Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).
- Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
- Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
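The three binary layouts above can be illustrated with a small Python sketch. All names are hypothetical and the on-disk encoding is ignored; the point is only how a value is located. One assumption to flag: the text says the other values in a chunk "share prefixes" without saying against what, and this sketch measures the shared prefix against the first value of the chunk, which keeps every entry independently decodable.

```python
def write_fixed(values):
    """Fixed-width: one concatenated blob plus the single shared length."""
    length = len(values[0])
    if any(len(v) != length for v in values):
        raise ValueError("all values must have the same length")
    return b"".join(values), length


def read_fixed(blob, length, doc_id):
    start = doc_id * length  # direct addressing: docID * length
    return blob[start:start + length]


def write_variable(values):
    """Variable-width: one blob plus an end address per document."""
    end_addresses, total = [], 0
    for v in values:
        total += len(v)
        end_addresses.append(total)  # monotonically non-decreasing
    return b"".join(values), end_addresses


def read_variable(blob, end_addresses, doc_id):
    start = end_addresses[doc_id - 1] if doc_id > 0 else 0
    return blob[start:end_addresses[doc_id]]


def shared_prefix_length(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_compress(values, chunk_size=16):
    """Prefix-compressed: first value of each chunk in full, the rest as
    (shared prefix length, remaining suffix).

    Assumption of this sketch: the prefix is shared with the chunk's
    first value.
    """
    chunks = []
    for i in range(0, len(values), chunk_size):
        first, *rest = values[i:i + chunk_size]
        entries = []
        for v in rest:
            shared = shared_prefix_length(first, v)
            entries.append((shared, v[shared:]))
        chunks.append((first, entries))
    return chunks


def prefix_lookup(chunks, index, chunk_size=16):
    """Find the chunk by division, then rebuild the value from its prefix."""
    first, entries = chunks[index // chunk_size]
    position = index % chunk_size
    if position == 0:
        return first
    shared, suffix = entries[position - 1]
    return first[:shared] + suffix


# Usage
blob, ends = write_variable([b"a", b"", b"lucene", b"dv"])
print(read_variable(blob, ends, 2))
```

The end addresses only ever grow, which is why the variable-width layout can hand them to the monotonic numeric strategy.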
SORTED:
- Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
SORTED_SET:
- Single: if all documents have 0 or 1 value, then data are written like SORTED.
- SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
- SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
SORTED_NUMERIC:
- Single: if all documents have 0 or 1 value, then data are written like NUMERIC.
- SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
- SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
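The ordinal-based layouts described above (Sorted, SortedSet, the set table, and SortedNumeric) all reduce to plain lists of integers that the numeric strategies can then compress. The Python sketch below shows that reduction under simplifying assumptions: the function names are hypothetical, terms are kept in an in-memory list rather than written as Binary, and missing values for Sorted fields are not handled.

```python
def encode_sorted(doc_values):
    """Sorted: deduplicated sorted terms plus one ordinal per document."""
    terms = sorted(set(doc_values))
    ordinal_of = {t: i for i, t in enumerate(terms)}
    return terms, [ordinal_of[v] for v in doc_values]


def encode_sorted_set(doc_value_sets):
    """SortedSet: terms, a flat ordinal list, and a per-document index
    (offsets[d] .. offsets[d + 1]) into that list."""
    terms = sorted({t for values in doc_value_sets for t in values})
    ordinal_of = {t: i for i, t in enumerate(terms)}
    ordinals, offsets = [], [0]
    for values in doc_value_sets:
        ordinals.extend(sorted(ordinal_of[t] for t in set(values)))
        offsets.append(len(ordinals))
    return terms, ordinals, offsets


def lookup_sorted_set(terms, ordinals, offsets, doc_id):
    return [terms[o] for o in ordinals[offsets[doc_id]:offsets[doc_id + 1]]]


def encode_set_table(doc_value_sets, max_sets=256):
    """SortedSet table: one id per distinct set, if there are few of them.

    Returns None when the number of distinct sets is not below max_sets,
    in which case the plain SortedSet layout would be used instead.
    """
    terms, ordinals, offsets = encode_sorted_set(doc_value_sets)
    table, id_of, set_ids = [], {}, []
    for start, end in zip(offsets, offsets[1:]):
        key = tuple(ordinals[start:end])
        if key not in id_of:
            id_of[key] = len(table)
            table.append(key)
        set_ids.append(id_of[key])
    if len(table) >= max_sets:
        return None
    return terms, table, set_ids


def encode_sorted_numeric(doc_values):
    """SortedNumeric: a flat value list plus a per-document index into it."""
    values, offsets = [], [0]
    for vs in doc_values:
        values.extend(sorted(vs))  # duplicates are kept
        offsets.append(len(values))
    return values, offsets


# Usage
docs = [[b"b", b"a"], [], [b"c"], [b"a", b"b"]]
terms, ordinals, offsets = encode_sorted_set(docs)
print(lookup_sorted_set(terms, ordinals, offsets, 3))
```

In this sketch the per-document index is a list of start offsets, which is monotonically non-decreasing and therefore a natural fit for the monotonic numeric strategy.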
Files:
- .dvd: DocValues data
- .dvm: DocValues metadata
Source: http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html
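You can see that the DocValues files take up very little space: in the du -sm listing for this index, the .dvd and .dvm files are each reported as 1 MB, against 299 MB for the stored-fields file (.fdt).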
du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1 elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0 elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock