zl程序教程

您现在的位置是:首页 >  数据库

当前栏目

HBase管理、分析、修复和调试的自带工具hbck、hfile

HBase调试工具 分析 修复 管理 自带
2023-09-11 14:22:28 时间

 目录 

一、hbck

二、HFile

          三、snapshots

         四、Replication

五、Export

六、copyTable


一、hbck

  • hbck 工具用于Hbase底层文件系统的检测与修复,包含Master、RegionServer内存中的状态及HDFS上数据的状态之间的一致性、黑洞问题、定位元数据不一致问题等
  • 命令: hbase hbck -help  查看参数帮助选项
Usage: fsck [opts] {only tables}
 where [opts] are:
   -help Display help options (this)
   -details Display full report of all regions.
   -timelag <timeInSeconds>  Process only regions that  have not experienced any metadata updates in the last  <timeInSeconds> seconds.
   -sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with -fix
   -summary Print only summary of the tables and status.
   -metaonly Only check the state of the hbase:meta table.
   -sidelineDir <hdfs://> HDFS path to backup existing meta.
   -boundaries Verify that regions boundaries are the same between META and store files.
   -exclusive Abort if another hbck is exclusive or fixing.

  Datafile Repair options: (expert features, use with caution!)
   -checkCorruptHFiles     Check all Hfiles by opening them to make sure they are valid
   -sidelineCorruptHFiles  Quarantine corrupted HFiles.  implies -checkCorruptHFiles

 Replication options
   -fixReplication   Deletes replication queues for removed peers

  Metadata Repair options supported as of version 2.0: (expert features, use with caution!)
   -fixVersionFile   Try to fix missing hbase.version file in hdfs.
   -fixReferenceFiles  Try to offline lingering reference store files
   -fixHFileLinks  Try to offline lingering HFileLinks
   -noHdfsChecking   Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap
   -ignorePreCheckPermission  ignore filesystem permission pre-check

NOTE: Following options are NOT supported as of HBase version 2.0+.

  UNSUPPORTED Metadata Repair options: (expert features, use with caution!)
   -fix              Try to fix region assignments.  This is for backwards compatiblity
   -fixAssignments   Try to fix region assignments.  Replaces the old -fix
   -fixMeta          Try to fix meta problems.  This assumes HDFS region info is good.
   -fixHdfsHoles     Try to fix region holes in hdfs.
   -fixHdfsOrphans   Try to fix region dirs with no .regioninfo file in hdfs
   -fixTableOrphans  Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
   -fixHdfsOverlaps  Try to fix region overlaps in hdfs.
   -maxMerge <n>     When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
   -sidelineBigOverlaps  When fixing region overlaps, allow to sideline big overlaps
   -maxOverlapsToSideline <n>  When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
   -fixSplitParents  Try to force offline split parents to be online.
   -removeParents    Try to offline and sideline lingering parents and keep daughter regions.
   -fixEmptyMetaCells  Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)

  UNSUPPORTED Metadata Repair shortcuts
   -repair           Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles-fixHFileLinks
   -repairHoles      Shortcut for -fixAssignments -fixMeta -fixHdfsHoles

命令: hbase hbck -details   显示所有Region的完整报告

命令: hbase hbck -metaonly  只检测元数据表的状态,如下图:

快捷修复命令:

  • 命令:hbase hbck -repair -ignorePreCheckPermission   
  • 命令:hbase hbck -repairHoles -ignorePreCheckPermission   

二、HFile

  • 查看HFile文件内容工具,命令及参数如下:

三、snapshots

  • HBase快照功能丰富,有很多特征,创建时不需要关闭集群
  • 快照在几秒内就可以完成,几乎对整个集群没有任何性能影响。并且,它只占用一个微不足道的空间
  • 启用快速需设置 hbase-site.xml 文件的  hbase.snapshot.enabled 为True
  • 命令: snapshot 'PerTest','snapPerTest'     基于表 PerTest 创建名为 snapPerTest 的快照
  • 命令: list_snapshots     查看快照列表

 四、Replication

  • HBase replication是另外一个负载较轻的备份工具。被定义为列簇级别,可以工作在后台并且保证所有的编辑操作在集群复制链之间的同步
  • 复制有三种模式:主->从(master->slave),主<->主(master<->master)和循环(cyclic)。这种方法给你灵活的从任意数据中心获取数据并且确保它能获得在其他数据中心的所有副本。在一个数据中心发生灾难性故障的情况下,客户端应用程序可以利用DNS工具,重定向到另外一个备用位置
  • 注:对于一个存在的表,你需要通过本文描述的其他方法,手工的拷贝源表到目的表。复制仅仅在你启动它之后才对新的写/编辑操作有效

                                                            

  • 复制是一个强大的,容错的过程。它提供了“最终一致性”,意味着在任何时刻,最近对一个表的编辑可能无法应用到该表的所有副本,但是最终能够确保一致。

五、Export

  • Export是HBase一个内置的实用功能,它使数据很容易将hbase表内容输出成HDFS的SequenceFiles文件
  • 使用map reduce任务,通过一系列HBase API来获取指定表格的每一行数据,并且将数据写入指定的HDFS目录中
  • 命令语法: hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>     示例如下:

导出到HDFS的文件 

六、copyTable

  • 和导出功能类似,拷贝表也使用HBase API创建了一个mapreduce任务,以便从源表读取数据。不同的地方是拷贝表的输出是hbase中的另一个表,这个表可以在本地集群,也可以在远程集群
  • 它使用独立的“puts”操作来逐行的写入数据到目的表。如果你的表非常大,拷贝表将会导致目标region server上的memstore被填满,会引起flush操作并最终导致合并操作的产生,会有垃圾收集操作等等
  • 必须考虑到在HBase上运行mapreduce任务所带来的性能影响。对于大型的数据集,这种方法的效果不太理想
  • 命令语法:hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=PerTest2 PerTest    (copy名为PerTest的表到集群中的另外一个表PerTest2) 
  • 注意:若用到--new.name = xxx,首先这个新表要之前就被定义