Building an HBase Secondary Index with MapReduce


Posted: 2023-09-14 09:00:23

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;

public class IndexBuilder {

    // Must be a static, non-private nested class so Hadoop can instantiate it by reflection.
    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

        // Key: column qualifier; value: name of the index table for that column.
        // byte[] keys hash by identity, so this map is only iterated, never probed with a new array.
        private final Map<byte[], ImmutableBytesWritable> indexes =
                new HashMap<byte[], ImmutableBytesWritable>();
        private String columnFamily;

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            for (Map.Entry<byte[], ImmutableBytesWritable> entry : indexes.entrySet()) {
                byte[] val = value.getValue(Bytes.toBytes(columnFamily), entry.getKey());
                if (val == null) {
                    continue; // this row has no cell in the indexed column
                }
                // The index table's rowkey is the value from the source table.
                Put put = new Put(val);
                // The index table's cell content is the rowkey of the source row.
                // On HBase 1.0 and later, use put.addColumn(...) instead of put.add(...).
                put.add(Bytes.toBytes("f1"), Bytes.toBytes("id"), key.get());
                context.write(entry.getValue(), put);
            }
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            String tableName = conf.get("tableName");
            columnFamily = conf.get("columnFamily");
            String[] qualifiers = conf.getStrings("qualifiers");
            // One index table per indexed column, named <tableName>-<qualifier>.
            for (String q : qualifiers) {
                indexes.put(Bytes.toBytes(q),
                        new ImmutableBytesWritable(Bytes.toBytes(tableName + "-" + q)));
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = HBaseConfiguration.create();
        // Strip the generic Hadoop options and keep only the application arguments.
        String[] otherargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // Arguments: table name, column family, one or more column qualifiers.
        if (otherargs.length < 3) {
            System.err.println(
                    "Usage: IndexBuilder <table> <columnFamily> <qualifier> [<qualifier> ...]");
            System.exit(-1);
        }
        String tableName = otherargs[0];
        String columnFamily = otherargs[1];
        conf.set("tableName", tableName);
        conf.set("columnFamily", columnFamily);
        String[] qualifiers = new String[otherargs.length - 2];
        for (int i = 0; i < qualifiers.length; i++) {
            qualifiers[i] = otherargs[i + 2];
        }
        conf.setStrings("qualifiers", qualifiers);

        Job job = new Job(conf, tableName);
        job.setJarByClass(IndexBuilder.class);
        job.setMapperClass(MyMapper.class);
        job.setNumReduceTasks(0); // map-only job: the mapper writes the Puts directly
        job.setInputFormatClass(TableInputFormat.class);
        // MultiTableOutputFormat lets one job write to several tables.
        // The index tables (with column family "f1") must already exist.
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        Scan scan = new Scan();
        scan.setCaching(1000);
        TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class,
                ImmutableBytesWritable.class, Put.class, job);
        job.waitForCompletion(true);
    }
}
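To make the mapper's effect easier to see, here is a minimal in-memory sketch of the same inversion, with no HBase involved. The class name `IndexSketch`, the method `buildIndexes`, and the sample table `users` are hypothetical names chosen for illustration; only the naming rule `<tableName>-<qualifier>` and the "value becomes rowkey, rowkey becomes cell" rule come from the job above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the inversion the mapper performs (hypothetical helper, not HBase code). */
class IndexSketch {

    /**
     * rows: source rowkey -> (qualifier -> value).
     * Returns: index table name -> (indexed value -> source rowkey).
     */
    static Map<String, Map<String, String>> buildIndexes(
            String tableName,
            Map<String, Map<String, String>> rows,
            String... qualifiers) {
        Map<String, Map<String, String>> indexes = new TreeMap<>();
        for (String q : qualifiers) {
            indexes.put(tableName + "-" + q, new TreeMap<>());
        }
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            for (String q : qualifiers) {
                String value = row.getValue().get(q);
                if (value == null) {
                    continue; // the row has no cell for this qualifier
                }
                // Index rowkey = cell value; index cell = source rowkey.
                // A later row with the same value replaces the earlier entry.
                indexes.get(tableName + "-" + q).put(value, row.getKey());
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("row1", Map.of("name", "alice", "city", "paris"));
        rows.put("row2", Map.of("name", "bob", "city", "paris"));
        rows.put("row3", Map.of("name", "carol"));
        System.out.println(buildIndexes("users", rows, "name", "city"));
    }
}
```

In this sample, `row1` and `row2` share the city `paris`, so the `users-city` index ends up pointing only at `row2`. The MapReduce job has the same property: because the index rowkey is the raw column value, a default read of an index row returns only the most recently written source rowkey. If the indexed column is not unique, one common workaround (an assumption here, not something the original job does) is to append the source rowkey to the index rowkey.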

This article comes from the “点滴积累” blog. Please keep this source link: http://tianxingzhe.blog.51cto.com/3390077/1699774

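Secondary indexing in HBase: HBase's primary index is the rowkey, and rows can only be retrieved by rowkey. To run combined queries on the columns of a column family, the only option is a full table scan. For a large table that cost is unacceptable, which is why a secondary index is needed.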
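The difference between the two access paths can be sketched with plain Java maps standing in for tables. This is only an illustration under stated assumptions: `LookupSketch`, `scanForValue`, and `lookupViaIndex` are hypothetical names, and a real client would issue HBase scans and gets rather than map operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Contrast of a full scan with a two-step index lookup (hypothetical in-memory stand-in). */
class LookupSketch {

    /** Without an index: visit every row and test the column, i.e. a full table scan. */
    static List<String> scanForValue(
            Map<String, Map<String, String>> table, String qualifier, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (wanted.equals(row.getValue().get(qualifier))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    /** With an index keyed by value: one keyed read on the index, then one on the main table. */
    static Map<String, String> lookupViaIndex(
            Map<String, String> index,
            Map<String, Map<String, String>> table,
            String wanted) {
        String rowKey = index.get(wanted);
        return rowKey == null ? null : table.get(rowKey);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("name", "alice"));
        table.put("row2", Map.of("name", "bob"));

        Map<String, String> nameIndex = new TreeMap<>();
        nameIndex.put("alice", "row1");
        nameIndex.put("bob", "row2");

        System.out.println(scanForValue(table, "name", "bob"));
        System.out.println(lookupViaIndex(nameIndex, table, "bob"));
    }
}
```

The scan touches every row regardless of how many match, while the indexed path performs two keyed reads. The price is extra storage and the need to keep the index table in step with the main table.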