Lucene .dvd and .dvm files are the DocValues files: column-oriented storage for field values

Time: 2023-09-14 09:11:56

public final class Lucene54DocValuesFormat
extends DocValuesFormat

Lucene 5.4 DocValues format.

Encodes the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) with these strategies:

NUMERIC:

Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see DirectWriter.
Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and quotients are stored using Delta-compressed Numerics.
Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.
Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.
Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
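
The numeric strategies above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Lucene's implementation: the function names are hypothetical, the bit-packing step itself is left out (only the number of bits per value is computed), and the monotonic sketch handles a single block rather than a sequence of blocks.

```python
from functools import reduce
from math import gcd


def bits_required(max_value):
    """Bits per value needed to bit-pack unsigned integers in [0, max_value]."""
    return max(1, max_value.bit_length())


def delta_encode(values):
    """Delta-compressed: store each value as an offset from the minimum."""
    base = min(values)
    deltas = [v - base for v in values]
    return base, deltas, bits_required(max(deltas))


def gcd_encode(values):
    """GCD-compressed: divide the offsets from the minimum by their GCD."""
    base = min(values)
    # `or 1` covers the case where every value is identical (GCD of zeros is 0).
    divisor = reduce(gcd, (v - base for v in values)) or 1
    quotients = [(v - base) // divisor for v in values]
    return base, divisor, quotients, bits_required(max(quotients))


def table_encode(values):
    """Table-compressed: store the ordinal of each value in a lookup table."""
    table = sorted(set(values))
    ordinal_of = {v: i for i, v in enumerate(table)}
    ordinals = [ordinal_of[v] for v in values]
    return table, ordinals, bits_required(len(table) - 1)


def monotonic_encode(values):
    """Monotonic-compressed: store deviations from a straight-line estimate."""
    first, last, steps = values[0], values[-1], max(1, len(values) - 1)
    expected = [first + (last - first) * i // steps for i in range(len(values))]
    deviations = [v - e for v, e in zip(values, expected)]
    floor = min(deviations)  # shift so every stored deviation is non-negative
    shifted = [d - floor for d in deviations]
    return (first, last, floor), shifted, bits_required(max(shifted))


# Hypothetical sample data: timestamps in milliseconds at day granularity.
DAY_MS = 86_400_000
dates = [1_500_000_000_000 + k * DAY_MS for k in (0, 3, 1, 7)]
print(delta_encode(dates)[2])   # bits per value with plain deltas
print(gcd_encode(dates)[3])     # bits per value after dividing by the GCD
print(table_encode([10, 1000, 10, 100000, 1000])[1:])
print(monotonic_encode([0, 5, 9, 16, 20, 25])[1:])
```

For the day-granularity timestamps in this example, plain deltas need 30 bits per value, while the quotients left after dividing by the GCD need only 3, which is why dates are named as a typical GCD case.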

BINARY:

Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).
Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
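
A rough Python sketch of the three binary layouts follows. The function names are hypothetical, and the prefix sketch assumes that every later value in a chunk shares its prefix with the chunk's first value; the text above does not spell out which value the prefix is taken from, so treat that as an assumption. The reverse lookup index is not modeled.

```python
def fixed_width_get(data, length, doc_id):
    """Fixed-width: a document's value starts at docID * length."""
    start = doc_id * length
    return data[start:start + length]


def variable_width_encode(values):
    """Variable-width: concatenate the values and record each end address."""
    end_addresses, end = [], 0
    for value in values:
        end += len(value)
        end_addresses.append(end)
    return b"".join(values), end_addresses


def variable_width_get(data, end_addresses, doc_id):
    """A value spans from the previous document's end address to its own."""
    start = end_addresses[doc_id - 1] if doc_id > 0 else 0
    return data[start:end_addresses[doc_id]]


def shared_prefix_length(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def prefix_encode(sorted_terms, chunk_size=16):
    """Prefix-compressed: per chunk, one full value plus (shared, suffix) pairs."""
    chunks = []
    for i in range(0, len(sorted_terms), chunk_size):
        first, *rest = sorted_terms[i:i + chunk_size]
        entries = []
        for term in rest:
            shared = shared_prefix_length(first, term)
            entries.append((shared, term[shared:]))
        chunks.append((first, entries))
    return chunks


def prefix_get(chunks, term_index, chunk_size=16):
    """Rebuild one value from its chunk's first value and its stored suffix."""
    first, entries = chunks[term_index // chunk_size]
    position = term_index % chunk_size
    if position == 0:
        return first
    shared, suffix = entries[position - 1]
    return first[:shared] + suffix
```

In the variable-width case the end addresses never decrease, which is why they suit the Monotonic-compressed numeric strategy.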

SORTED:

Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
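
As a minimal sketch of the SORTED layout, the snippet below builds the deduplicated term dictionary and the per-document ordinals. The function names are hypothetical, and using -1 for a document without a value is an assumption of this sketch.

```python
def sorted_encode(doc_values):
    """Return (deduplicated sorted terms, per-document ordinals)."""
    terms = sorted({v for v in doc_values if v is not None})
    ordinal_of = {term: i for i, term in enumerate(terms)}
    # -1 marks a document without a value (an assumption of this sketch).
    ordinals = [-1 if v is None else ordinal_of[v] for v in doc_values]
    return terms, ordinals


def sorted_lookup(terms, ordinals, doc_id):
    """Resolve a document's ordinal back to its term, or None if missing."""
    ordinal = ordinals[doc_id]
    return None if ordinal < 0 else terms[ordinal]


terms, ordinals = sorted_encode([b"red", b"blue", None, b"red", b"green"])
print(terms)      # the dictionary, stored as Binary
print(ordinals)   # small integers, stored with a numeric strategy
```

Because each repeated term is stored once and documents hold only a small integer, a field with few distinct values compresses well.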

SORTED_SET:

Single: if all documents have 0 or 1 value, then data are written like SORTED.
SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
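
The choice between the three SORTED_SET layouts can be sketched as below. This is an illustration only: the function name, the dictionary keys, and the `table_limit` parameter are hypothetical, with 256 taken from the threshold quoted above.

```python
def sorted_set_encode(doc_values, table_limit=256):
    """Pick one of the three SORTED_SET layouts for per-document term sets."""
    terms = sorted({term for values in doc_values for term in values})
    ordinal_of = {term: i for i, term in enumerate(terms)}
    doc_ords = [tuple(sorted({ordinal_of[t] for t in values}))
                for values in doc_values]

    if all(len(ords) <= 1 for ords in doc_ords):
        # Single: same shape as SORTED; -1 marks a document without a value.
        return {"layout": "single", "terms": terms,
                "ordinals": [ords[0] if ords else -1 for ords in doc_ords]}

    unique_sets = sorted(set(doc_ords))
    if len(unique_sets) < table_limit:
        # Table: each distinct set gets an id; documents store only the id.
        set_id = {ords: i for i, ords in enumerate(unique_sets)}
        return {"layout": "table", "terms": terms, "table": unique_sets,
                "set_ids": [set_id[ords] for ords in doc_ords]}

    # General case: one flat ordinal list plus a per-document start offset.
    ordinal_list, offsets = [], [0]
    for ords in doc_ords:
        ordinal_list.extend(ords)
        offsets.append(len(ordinal_list))
    return {"layout": "sorted_set", "terms": terms,
            "ordinal_list": ordinal_list, "offsets": offsets}


docs = [{"a", "b"}, set(), {"b"}, {"a", "b"}]
print(sorted_set_encode(docs)["layout"])
print(sorted_set_encode(docs, table_limit=2)["layout"])
```

Lowering `table_limit` in the second call is only a way to force the general layout on a tiny input.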

SORTED_NUMERIC:

Single: if all documents have 0 or 1 value, then data are written like NUMERIC.
SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
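
For the general SortedNumeric case, a minimal sketch of the value list and the per-document index might look like this. The function names are hypothetical; unlike the SORTED_SET sketch, no term dictionary is involved and duplicate values within a document are kept.

```python
def sorted_numeric_encode(doc_values):
    """Flatten per-document sorted values into one list plus start offsets."""
    value_list, offsets = [], [0]
    for values in doc_values:
        value_list.extend(sorted(values))
        offsets.append(len(value_list))
    return value_list, offsets


def sorted_numeric_get(value_list, offsets, doc_id):
    """Slice out one document's values using two neighbouring offsets."""
    return value_list[offsets[doc_id]:offsets[doc_id + 1]]


value_list, offsets = sorted_numeric_encode([[3, 1, 3], [], [42]])
print(sorted_numeric_get(value_list, offsets, 0))
```

Both lists are plain integer sequences, so each can then be written with one of the numeric strategies; the offsets never decrease, which makes them a natural fit for the monotonic one.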

Files:

.dvd: DocValues data
.dvm: DocValues metadata

Source: http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html

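As the listing below shows, the DocValues files take up very little space: in this segment the .dvd and .dvm files are each reported as 1 MB, against 299 MB for the stored-fields data file (.fdt). Note that du -sm rounds sizes up to whole megabytes, so 1 means at most 1 MB.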

du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299     elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1       elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1       elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148     elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130     elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5       elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1       elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1       elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1       elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1       elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0       elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock
