Partitioning

Partitioning Key

Partitioning by key range

How

  • Assign each partition a contiguous range of keys
  • If you know the boundaries between the ranges, you can determine which partition contains a given key (see the sketch below)
  • The partition boundaries can be chosen manually by an administrator, or automatically by the database
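
A minimal sketch of the boundary lookup described above, assuming the partition boundaries are already known; the boundary values and partition names are made up for illustration:

    import bisect

    # Hypothetical sorted upper-bound keys; partition i owns keys in the range
    # (boundaries[i-1], boundaries[i]]; the last boundary is a high sentinel.
    boundaries = ["f", "m", "t", "\uffff"]
    partitions = ["p0", "p1", "p2", "p3"]

    def partition_for_key(key: str) -> str:
        # Binary-search the boundary list to find the partition that owns the key.
        return partitions[bisect.bisect_left(boundaries, key)]

    print(partition_for_key("apple"))   # p0
    print(partition_for_key("orange"))  # p2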

Pros and Cons

  • Pros
    • Range scans are easy
  • Cons
    • Certain access patterns can lead to hot spots. For example, if the key is a timestamp, all writes within the same time period end up going to the same partition.

Applications

  • Bigtable and its open source equivalent HBase
  • RethinkDB
  • MongoDB before version 2.4

Partitioning by hash

How

  • Use a hash function to determine the partition for a given key (see the sketch below)
  • For partitioning purposes, the hash function need not be cryptographically strong
  • Some programming languages' built-in hash functions are not suitable for partitioning: for example, Java's Object.hashCode() and Ruby's Object#hash may return different hash values for the same key in different processes
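
A sketch of hash partitioning under the assumption that the hash space is split into equal contiguous ranges; MD5 is used here as a stable, process-independent hash, and the partition count is made up:

    import hashlib

    NUM_PARTITIONS = 8  # hypothetical; each partition owns an equal slice of the hash space

    def stable_hash(key: str) -> int:
        # Deterministic across processes, unlike e.g. Java's Object.hashCode().
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

    def partition_for_key(key: str) -> int:
        # Map the 128-bit hash into one of NUM_PARTITIONS contiguous hash ranges.
        return stable_hash(key) * NUM_PARTITIONS >> 128

    print(partition_for_key("user:42"), partition_for_key("user:43"))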

Pros and Cons

  • (Pro) Distributes keys fairly among the partitions
  • (Con) Inefficient for range queries on the partitioning key, since keys that were once adjacent are scattered across partitions

Examples

  • Hash functions
    • MD5: used by Cassandra and MongoDB
    • Fowler-Noll-Vo: used by Voldemort
  • Range queries on the partitioning key
    • Not supported on the primary key: Riak, Couchbase, Voldemort
    • Sent to all partitions: MongoDB

Partitioning and Secondary Indexes

Partitioning Secondary Indexes by Document

How

  • Partition the database by the ID that uniquely identifies a record
  • Each partition maintains its own secondary indexes, covering only the documents in that partition

Pros and Cons

  • Con: Reads on a secondary index must be scattered to all partitions and the results gathered (see the sketch below), which makes such reads expensive
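
A toy sketch of document-partitioned (local) secondary indexes: each partition indexes only its own documents, so a query on the indexed field has to be scattered to every partition and the results gathered; the documents and the "color" field are invented:

    # Two toy partitions; each keeps its documents plus a local index on "color".
    partitions = [
        {"docs": {1: {"color": "red"}, 2: {"color": "blue"}}, "index": {}},
        {"docs": {3: {"color": "red"}, 4: {"color": "green"}}, "index": {}},
    ]

    # Build each partition's local secondary index from its own documents only.
    for part in partitions:
        for doc_id, doc in part["docs"].items():
            part["index"].setdefault(doc["color"], []).append(doc_id)

    def find_by_color(color: str) -> list[int]:
        # Scatter/gather: the query has to be sent to every partition.
        results = []
        for part in partitions:
            results.extend(part["index"].get(color, []))
        return results

    print(find_by_color("red"))  # [1, 3], gathered from both partitions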

Applications

  • MongoDB, Riak, Cassandra, Elasticsearch, SolrCloud, and VoltDB

Partitioning Secondary Indexes by Term

How

  • Maintain a global secondary index that covers data in all partitions
  • Partition the global index by the indexed term itself, rather than by the same key as the primary data (see the sketch below)
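
A sketch of a term-partitioned global index, assuming the index is hash-partitioned by term; it shows that a read only touches the index partition owning the term, while indexing one document can touch several index partitions (the terms, document IDs, and partition count are invented):

    import hashlib

    NUM_INDEX_PARTITIONS = 4  # hypothetical; the global index is partitioned by term

    # index_partitions[i] maps term -> list of document IDs, for the terms owned by i.
    index_partitions = [dict() for _ in range(NUM_INDEX_PARTITIONS)]

    def index_partition_for_term(term: str) -> int:
        digest = int.from_bytes(hashlib.md5(term.encode()).digest(), "big")
        return digest * NUM_INDEX_PARTITIONS >> 128

    def index_document(doc_id: int, terms: list[str]) -> None:
        # One document write may update several index partitions (one per distinct term).
        for term in terms:
            part = index_partitions[index_partition_for_term(term)]
            part.setdefault(term, []).append(doc_id)

    def find(term: str) -> list[int]:
        # Only the single index partition that owns this term needs to be queried.
        return index_partitions[index_partition_for_term(term)].get(term, [])

    index_document(1, ["red", "hatchback"])
    index_document(2, ["red", "sedan"])
    print(find("red"))  # [1, 2]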

Pros and Cons

  • Pro: Reads are more efficient because we only need to query the partitions containing the terms we want
  • Con: Writes are slower and more complicated because a write to a single document may now affect multiple partitions of the index

Other issues

  • Updates to global secondary indexes are often asynchronous

Rebalancing Partitions

Requirements

  • While rebalancing, the data moved between nodes should be minimized
  • While rebalancing, the database should continue accepting reads and writes
  • After rebalancing, the load (data storage, read and write requests) should be shared fairly between the nodes in the cluster

Rebalancing strategies

(Don’t) hash mod N

  • When N (the number of nodes) changes, most of the keys will need to be moved from one node to another (see the sketch below)
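
A quick check of how badly hash mod N behaves as a rebalancing strategy; the key set below is arbitrary:

    import hashlib

    def h(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

    keys = [f"key{i}" for i in range(10_000)]
    # Compare node assignments when going from N = 10 nodes to N = 11 nodes.
    moved = sum(1 for k in keys if h(k) % 10 != h(k) % 11)
    print(f"{moved / len(keys):.0%} of keys would change nodes")  # roughly 90%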

Fixed number of partitions

  • Create many more partitions than there are nodes
  • Assign keys to partitions. This assignment will not change.
  • Assign several partitions to each node. This assignment will change when rebalancing happens.
  • When a new node is added, it can steal a few partitions from every existing node until the partitions are fairly distributed once again
  • When a node is removed from the cluster, its partitions are distributed fairly among the remaining nodes (see the sketch below)
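
A sketch of rebalancing with a fixed number of partitions: the key-to-partition mapping is never touched, only the partition-to-node assignment changes, and a joining node repeatedly steals a partition from the currently busiest node (node names and counts are made up):

    from collections import defaultdict

    NUM_PARTITIONS = 64  # fixed for the lifetime of the cluster; many more than nodes

    def initial_assignment(nodes: list[str]) -> dict[int, str]:
        # Round-robin the partitions over the initial set of nodes.
        return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

    def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
        per_node = defaultdict(list)
        for p, n in assignment.items():
            per_node[n].append(p)
        target = NUM_PARTITIONS // (len(per_node) + 1)  # fair share for the new node
        stolen = []
        while len(stolen) < target:
            busiest = max(per_node, key=lambda n: len(per_node[n]))
            stolen.append(per_node[busiest].pop())  # move one partition off the busiest node
        return {**assignment, **{p: new_node for p in stolen}}

    assignment = initial_assignment(["node0", "node1", "node2"])
    assignment = add_node(assignment, "node3")
    # Only the stolen partitions move; every key still maps to the same partition number.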

Dynamic Partitioning

  • When a partition grows beyond a configured size, split it into two partitions; when it shrinks below some threshold, merge it with an adjacent partition (see the sketch below)
  • The total number of partitions adapts to the data volume
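
A toy sketch of the split step in dynamic partitioning; the size threshold and data are invented:

    SPLIT_THRESHOLD = 4  # hypothetical maximum number of keys per partition

    # A key-range partition represented as a sorted list of (key, value) pairs.
    partition = [("a", 1), ("c", 2), ("f", 3), ("k", 4), ("p", 5)]

    def maybe_split(part):
        # Once a partition grows past the threshold, split it at its median key
        # so that roughly half of the data ends up on each side.
        if len(part) <= SPLIT_THRESHOLD:
            return [part]
        mid = len(part) // 2
        return [part[:mid], part[mid:]]

    left, right = maybe_split(partition)
    print(left, right)  # one half can now be transferred to another node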

Partitioning proportionally to nodes

  • Keep the number of partitions proportional to the number of nodes, i.e. a fixed number of partitions per node
  • When a new node joins, it picks a fixed number of existing partitions at random to split, taking over one half of each split partition

Examples

  • Fixed number of partitions: Riak, Elasticsearch, Couchbase, Voldemort

Automatic or Manual Rebalancing

(Don’t do) Fully automated

  • Most convenient, but could be problematic since the behaviour can be unpredictable and rebalancing is expensive
  • It can be a good thing to have a human in the loop for rebalancing. It’s slower than a fully automatic process, but it can help prevent operational surprises.

Semi-automated

  • The system generates a suggested partition assignment automatically, but requires an administrator to commit it before it takes effect

Fully manual

Examples

  • Semi-automated: Couchbase, Riak, and Voldemort

Request Routing

What component makes the routing decision

Any node

  • Send the request to any node; if that node does not own the relevant partition, it forwards the request to the node that does

Routing tier

  • Send all requests to a partition-aware routing tier, which forwards each request to the node responsible for it

Client

  • The client is aware of the partition assignment and sends each request directly to the appropriate node

How to make routing decision

  • Many distributed data systems rely on a separate coordination service such as ZooKeeper to maintain the authoritative mapping of partitions to nodes (see the sketch below)
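
A sketch of the routing decision itself, assuming the partition-to-node assignment has already been fetched from a coordination service such as ZooKeeper and cached by the routing component; the map, addresses, and partition count are invented:

    import hashlib

    NUM_PARTITIONS = 4

    # Hypothetical assignment, as it might be mirrored from the coordination service.
    partition_to_node = {0: "10.0.0.1:9042", 1: "10.0.0.2:9042",
                         2: "10.0.0.1:9042", 3: "10.0.0.3:9042"}

    def partition_for_key(key: str) -> int:
        digest = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
        return digest * NUM_PARTITIONS >> 128

    def route(key: str) -> str:
        # Whoever makes the routing decision (node, routing tier, or client) only
        # needs this map to forward the request to the right node.
        return partition_to_node[partition_for_key(key)]

    print(route("user:42"))  # address of the node currently owning this key's partition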

Real-world examples

  • HBase, SolrCloud, Kafka: use ZooKeeper to track partition assignment.
  • LinkedIn’s Espresso: uses Helix for cluster management (which in turn relies on ZooKeeper)
  • MongoDB: relies on its own config server implementation and mongos daemons as the routing tier.
  • Cassandra, Riak: use a gossip protocol among the nodes to disseminate any changes in cluster state. Requests can be sent to any node, and that node forwards them to the appropriate node for the requested partition
  • Couchbase: does not rebalance automatically, and is normally configured with a routing tier called moxi, which learns about routing changes from the cluster nodes

Real-World examples

MySQL