HBase

Time: 2023-09-11 14:16:23
  • https://wiki.mozilla.org/Socorro:HBase
  • http://blog.cloudera.com/blog/2011/02/log-event-processing-with-hbase/
  • Column families
    • Example: A common column family Socorro uses is "ids:", and a common column in that family is "ids:ooid" (family "ids", qualifier "ooid"). Another column is "ids:hang".
    • The table schema enumerates the table's column families. Each column family carries settings for compression, the number of value versions retained, and caching.
    • A column family can store tens of thousands of values with different column qualifier names.
    • Retrieving data from multiple column families requires at least one block access (disk or memory) per column family. Accessing multiple columns in the same family can take as little as one block access.
    • If you specify just the column family name when retrieving data, the values for all columns in that column family will be returned.
    • If a record does not contain a value for a particular column in the set of columns you query for, no "null" is returned; the returned row simply has no entry for that column.
  • Manipulating a row
    • All manipulations are performed using a rowkey.
    • Setting a column to a value creates the row if it doesn't exist, or updates the column if it already exists.
    • Deleting a non-existent row or column is a no-op.
    • Counter column increments are atomic and very fast. StumbleUpon has some counters that they increment hundreds of times per second.
  • Tables are always ordered by their rowkeys
    • Scanning a range of a table based on a rowkey prefix or a start and end range is fast.
    • Retrieving a row by its key is fast.
    • Searching for a row requires a rowkey structure that you can easily do a range scan on, or a reverse index table.
    • A full scan on a table that contains billions of items is slow (although, unlike in an RDBMS, it isn't likely to cause performance problems).
    • If you are continually inserting rows that have similar rowkey prefixes, all of those writes land on a single RegionServer. Under heavy load, that server becomes a hotspot.
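
The row and column semantics described in the notes above can be illustrated with a small in-memory model. This is only a sketch for reasoning about behaviour: ToyTable and its methods are hypothetical names invented for this example, not the HBase client API, and the model ignores versions, regions, and storage entirely.

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory stand-in for one table. Hypothetical; NOT the HBase API."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed by the schema
        self.rows = {}                 # rowkey -> {"family:qualifier": value}
        self.keys = []                 # rowkeys, always kept in sorted order

    def put(self, rowkey, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if rowkey not in self.rows:    # setting a column creates the row
            self.rows[rowkey] = {}
            insort(self.keys, rowkey)
        self.rows[rowkey][column] = value

    def get(self, rowkey, columns):
        """A name ending in ':' selects a whole family; absent columns are omitted."""
        row = self.rows.get(rowkey, {})
        result = {}
        for name, value in row.items():
            for wanted in columns:
                if name == wanted or (wanted.endswith(":") and name.startswith(wanted)):
                    result[name] = value
        return result

    def delete(self, rowkey, column=None):
        row = self.rows.get(rowkey)
        if row is None:
            return                     # deleting a missing row is a no-op
        if column is not None:
            row.pop(column, None)      # deleting a missing column is a no-op
        if column is None or not row:  # a row with no columns no longer exists
            del self.rows[rowkey]
            self.keys.remove(rowkey)

    def increment(self, rowkey, column, amount=1):
        # Single-threaded toy: this read-modify-write is not atomic here.
        current = self.get(rowkey, [column]).get(column, 0)
        self.put(rowkey, column, current + amount)
        return current + amount

    def scan_prefix(self, prefix):
        """Yield (rowkey, columns) for keys starting with prefix, in key order."""
        start = bisect_left(self.keys, prefix)
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, dict(self.rows[key])


table = ToyTable(["ids"])
table.put("a1", "ids:ooid", "x")
table.put("a1", "ids:hang", "y")
table.put("a2", "ids:ooid", "z")
table.delete("no-such-row")                       # silently does nothing
print(table.get("a1", ["ids:"]))                  # every column in the family
print(table.get("a2", ["ids:ooid", "ids:hang"]))  # "ids:hang" is simply absent
print([key for key, _ in table.scan_prefix("a")])
```

Because the keys are kept sorted, a prefix scan only has to find the first matching key and read forward until the prefix stops matching, which is why range scans are cheap compared with a full scan.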
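
One common way to avoid concentrating writes on a single RegionServer is to "salt" the rowkey with a short prefix derived from a hash of the key, so that consecutive keys spread across the key space. The sketch below is an assumption-laden illustration rather than a recommendation from the notes above: salted_rowkey, salted_scan_prefixes, and the bucket count of 16 are all hypothetical choices made for this example.

```python
import hashlib


def salted_rowkey(rowkey: str, buckets: int = 16) -> str:
    """Prefix rowkey with a two-hex-digit bucket derived from its hash."""
    digest = hashlib.sha256(rowkey.encode("utf-8")).digest()
    bucket = digest[0] % buckets   # same key always maps to the same bucket
    return f"{bucket:02x}{rowkey}"


def salted_scan_prefixes(prefix: str, buckets: int = 16) -> list:
    """Return the prefixes to scan to cover an unsalted prefix in every bucket."""
    return [f"{bucket:02x}{prefix}" for bucket in range(buckets)]


# Keys sharing a date prefix would otherwise sort next to each other.
keys = [f"20230911-{i:06d}" for i in range(1000)]
used = {salted_rowkey(key)[:2] for key in keys}
print(len(used), "buckets used")
print(salted_scan_prefixes("20230911-")[:3])
```

The trade-off is that a range scan over the original prefix turns into one scan per bucket, so the bucket count should stay small.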