Logical Model of MapR-DB Binary Tables

Like Apache HBase, MapR-DB stores structured data as a nested series of maps. Each map consists of a set of key-value pairs, where the value can be the key in another map. Keys are kept in strict lexicographical order: 1, 10, and 113 come before 2, 20, and 213.

In descending order of granularity, the elements of a binary table are:

  • Key: Keys identify the rows in a table. In MapR-DB, the maximum supported size of a row key is 64 KB. However, the recommended practice is to keep it lower than a few hundred bytes.
  • Row: Rows span one or more column families and columns. In MapR-DB, the maximum supported size of a row is 2 GB. However, the recommended practice is to keep the size under 2 MB. In general, MapR-DB performs better with many small rows, rather than with fewer very large rows.
  • Column family: A column family is a key associated with a set of columns. Specify this association according to your individual use case, creating sets of columns. A column family can contain an arbitrary number of columns. MapR-DB binary tables support up to 64 column families.
  • Column: Columns are keys that are associated with a series of timestamps that define when the value in that column was updated.
  • Timestamp: The timestamp in a column specifies a particular data write to that column.
  • Value: The data written to that column at the specific timestamp.

This structure results in versioned values that you can access flexibly and quickly. Because Apache HBase and MapR-DB binary tables are sparse, any of the column values for a given key can be null.

Example Table

This example uses JSON notation for representational clarity. In this example, timestamps are arbitrarily assigned.

{
 "arbitraryFirstKey" : {
     "firstColumnFamily" : {
          "firstColumn" : {
               10 : "valueFive",
               7 : "valueThree",
               4 : "valueOne",
               }
          "secondColumn" : {
               16 : "valueEight",
               1 : "valueSeven",
               }
          }
     "secondColumnFamily" : {
          "firstColumn" : {
               37 : "valueFive",
               23 : "valueThree",
               11 : "valueSeven",
               4 : "valueOne",
               }
          "secondColumn" : {
               15 : "valueEight",
               }
          }
     }
 "arbitrarySecondKey" : {
     "firstColumnFamily" : {
          "firstColumn" : {
               10 : "valueFive",
               4 : "valueOne",
               }
          "secondColumn" : {
               16 : "valueEight",
               7 : "valueThree",
               1 : "valueSeven",
               }
          }
     "secondColumnFamily" : {
          "firstColumn" : {
               23 : "valueThree",
               11 : "valueSeven",
               }
          }
     }
}

Queries return the most recent timestamp by default. A query for the value in "arbitrarySecondKey"/"secondColumnFamily:firstColumn" returns valueThree. Specifying a timestamp with a query for "arbitrarySecondKey"/"secondColumnFamily:firstColumn"/11 returns valueSeven.