MongoDB: Advanced Concepts - Replication and Sharding

Piyush Rana, Software Consultant, Knoldus Software LLP

Agenda

1) What is Replication?

2) How is Replication Handled?

3) Replica Sets or Master-Slave Replication

4) What is Sharding, and Why?

5) Implementation of Sharding

Replication

> Replication is the process of synchronizing data across multiple servers.
> Replication provides redundancy and increases data availability: with multiple copies of the data on different database servers, a database is protected from the loss of a single server.
> Disaster recovery
> No downtime for maintenance (such as backups, index rebuilds, and compaction)
> Read scaling (extra copies to read from)

How Replication Works

> MongoDB achieves replication through replica sets. A replica set is a group of mongod instances that host the same data set.

- A replica set is a group of two or more nodes (generally, at least three nodes are required).
- In a replica set, one node is the primary and the remaining nodes are secondaries.
- All data replicates from the primary to the secondaries.
- During automatic failover or maintenance, the members hold an election and a new primary is chosen. When the failed node recovers, it rejoins the replica set and works as a secondary.

Replica Set Members

1) Primary
2) Secondaries
2.1) Priority 0 Replica Set Members
2.2) Hidden Replica Set Members
2.3) Delayed Replica Set Members
3) Arbiter

Primary Replica Set Member

The primary is the only member in the replica set that receives write operations. MongoDB applies write operations on the primary and then records the operations in the primary's oplog.

Secondary members replicate this log and apply the operations to their data sets.
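The write path described above can be sketched as a toy model. This is only an illustration, not MongoDB's implementation: the `Member` and `Primary` classes, the `sync` function, and the tuple format of the log entries are all assumptions made for the example.

```python
# Toy model of oplog-based replication (not MongoDB's actual implementation).

class Member:
    def __init__(self, name):
        self.name = name
        self.data = {}     # this member's copy of the data set
        self.applied = 0   # number of oplog entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class Primary(Member):
    def __init__(self, name):
        super().__init__(name)
        self.oplog = []    # ordered log of write operations

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.apply(entry)          # 1. apply the write on the primary
        self.oplog.append(entry)   # 2. record it in the primary's oplog


def sync(secondary, primary):
    """Replay every oplog entry the secondary has not applied yet."""
    for entry in primary.oplog[secondary.applied:]:
        secondary.apply(entry)


primary = Primary("node-a")
secondary = Member("node-b")

primary.write("set", "user:1", {"name": "Ann"})
primary.write("set", "user:2", {"name": "Bob"})
primary.write("delete", "user:1")

sync(secondary, primary)
print(secondary.data == primary.data)
```

Because the secondary replays the same operations in the same order, it converges to the primary's state once it has caught up.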

Priority 0 Replica Set Members

A secondary maintains a copy of the primary's data set. A priority 0 member is a secondary that cannot become primary. Priority 0 members cannot trigger elections. Otherwise these members function as normal secondaries.

A priority 0 member maintains a copy of the data set, accepts read operations, and votes in elections.

Configure a priority 0 member to prevent secondaries from becoming primary, which is particularly useful in multi-data center deployments.
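As a rough illustration, a replica set configuration with one priority 0 member might look like the dict below. The field names `_id`, `members`, `host`, and `priority` come from the replica set configuration document. The set name `rs0`, the host names, and the `electable_members` helper are hypothetical.

```python
# Replica set configuration expressed as a plain Python dict.
# The set name "rs0" and all host names are hypothetical.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-a.example.net:27017", "priority": 1},
        {"_id": 1, "host": "dc1-b.example.net:27017", "priority": 1},
        # Member in the remote data center: keeps a copy of the data and
        # votes, but can never become primary.
        {"_id": 2, "host": "dc2-a.example.net:27017", "priority": 0},
    ],
}


def electable_members(cfg):
    """Hypothetical helper: hosts that are allowed to become primary."""
    return [m["host"] for m in cfg["members"] if m.get("priority", 1) > 0]


print(electable_members(config))
```

In a multi-data center deployment, this keeps the primary in the main data center while the remote member still serves reads and votes.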

Hidden Replica Set Members

A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members must always be priority 0 members and so cannot become primary.

The db.isMaster() method does not display hidden members. Hidden members, however, may vote in elections.

Delayed Replica Set Members

Delayed members contain copies of a replica set's data set. However, a delayed member's data set reflects an earlier, or delayed, state of the set.

Delayed members:

- Must be priority 0 members. Set the priority to 0 to prevent a delayed member from becoming primary.
- Should be hidden members. Always prevent applications from seeing and querying delayed members.
- Do vote in elections for primary, if members[n].votes is set to 1.
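The rules for hidden and delayed members can be checked mechanically against a member document. The sketch below assumes the v2.6 field names `priority`, `hidden`, `slaveDelay` (a delay in seconds), and `votes`. The `check_member` helper and the host names are hypothetical.

```python
def check_member(member):
    """Hypothetical helper: return rule violations for one member document."""
    problems = []
    priority = member.get("priority", 1)
    hidden = member.get("hidden", False)
    delay = member.get("slaveDelay", 0)  # delay in seconds (v2.6 field name)

    if hidden and priority != 0:
        problems.append("hidden members must have priority 0")
    if delay > 0 and priority != 0:
        problems.append("delayed members must have priority 0")
    if delay > 0 and not hidden:
        problems.append("delayed members should be hidden")
    return problems


# A member delayed by one hour that follows all three rules.
delayed_ok = {"_id": 3, "host": "backup.example.net:27017",
              "priority": 0, "hidden": True, "slaveDelay": 3600, "votes": 1}

# A delayed member left with the default priority and visibility.
delayed_bad = {"_id": 4, "host": "oops.example.net:27017", "slaveDelay": 3600}

print(check_member(delayed_ok))
print(check_member(delayed_bad))
```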

Replica Set Arbiter

An arbiter does not have a copy of the data set and cannot become primary. Replica sets may include arbiters to add a vote in elections for primary.

Arbiters always have exactly 1 election vote, and thus allow replica sets to have an uneven number of voting members without the overhead of an additional member that replicates data.
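The arithmetic behind adding an arbiter can be sketched as follows, assuming that electing a primary requires a strict majority of the voting members. The function names are made up for the example.

```python
def majority(voting_members):
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """Voting members that can be down while a majority is still reachable."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

With two data-bearing members alone, losing either one leaves no majority. Adding an arbiter brings the set to three voters, which tolerates one failure without storing a third copy of the data.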

DEMO FOR REPLICATION

Create a replica set with 5 members of different kinds (e.g. primary, secondary, hidden, arbiter, and priority 0).

Insert data and watch the behavior of the delayed and arbiter members and of the other secondary members.

Take down the primary and trigger an election.

Adjust the priority of a replica set member and prevent a secondary from becoming primary.

Configure a non-voting replica set member.

Sharding

Sharding is a method for distributing data across multiple machines.

MongoDB uses sharding to support deployments with very large data sets and high throughput operations.

MongoDB supports horizontal scaling through sharding.

Shard Keys

To distribute the documents in a collection, MongoDB partitions the collection using the shard key.

The shard key consists of an immutable field or fields that exist in every document in the target collection.

You choose the shard key when sharding a collection. The choice of shard key cannot be changed after sharding.

Shard Key

Chunks: a chunk is a contiguous range of shard key values within a particular shard. MongoDB splits chunks when they grow beyond the configured chunk size, which is 64 megabytes by default.
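A simplified picture of chunk splitting is sketched below. It assumes a chunk is a sorted list of `(shard_key, size_mb)` pairs and that a split happens at the median key. Both are simplifications chosen for the example, and `split_if_needed` is a hypothetical name rather than a MongoDB function.

```python
CHUNK_SIZE_MB = 64  # default chunk size mentioned above


def split_if_needed(chunk):
    """chunk: sorted list of (shard_key, size_mb). Returns one or two chunks."""
    total = sum(size for _, size in chunk)
    if total <= CHUNK_SIZE_MB or len(chunk) < 2:
        return [chunk]
    middle = len(chunk) // 2  # split at the median shard key value
    return [chunk[:middle], chunk[middle:]]


chunk = [(key, 10) for key in range(8)]  # 8 documents of 10 MB: 80 MB total
parts = split_if_needed(chunk)
for part in parts:
    # first key, last key, and total size of each resulting chunk
    print(part[0][0], part[-1][0], sum(size for _, size in part))
```

Each resulting chunk still covers a contiguous range of shard key values, so the two halves can later be moved to different shards independently.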

The Perfect Shard Key

If you think about it, the perfect shard key would have the following characteristics:

All inserts, updates, and deletes would each be distributed uniformly across all of the shards in the cluster

All queries would be uniformly distributed across all of the shards in the cluster

All operations would only target the shards of interest: an update or delete would never be sent to a shard which didn't own the data being modified

Similarly, a query would never be sent to a shard which holds none of the data being queried
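The difference between a targeted operation and a broadcast one can be sketched with a small routing table. The chunk boundaries, the shard names, and the `user_id` shard key are all hypothetical. The sketch handles only equality queries and assumes non-negative key values.

```python
import bisect

# Hypothetical chunk table for a collection sharded on "user_id".
# Chunk i covers keys from CHUNK_LOWER_BOUNDS[i] up to the next lower bound.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_SHARDS = ["shardA", "shardB", "shardC", "shardA"]


def target_shards(query):
    """Return the shards a router would need to contact for an equality query."""
    if "user_id" in query:
        # The query includes the shard key: find the one chunk that owns it.
        index = bisect.bisect_right(CHUNK_LOWER_BOUNDS, query["user_id"]) - 1
        return [CHUNK_SHARDS[max(index, 0)]]
    # No shard key in the query: every shard may hold matching documents.
    return sorted(set(CHUNK_SHARDS))


print(target_shards({"user_id": 1500}))
print(target_shards({"email": "a@example.com"}))
```

A query that includes the shard key reaches a single shard, while one that omits it has to be sent to all of them.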

Hashed Vs Ranged Sharding

Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.

Range-based sharding divides data into contiguous ranges determined by the shard key values. In this model, documents with close shard key values are likely to be in the same chunk or shard.

Given a collection that uses a monotonically increasing value X as the shard key, ranged sharding routes all incoming inserts to the chunk that holds the highest key range, so a single shard receives every new write.

With a hashed index on X, consecutive values of X hash to widely separated values, so incoming inserts are distributed across the shards in the cluster.
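The two insert patterns can be compared with a small simulation. This is a sketch under assumptions: three shards, fixed range boundaries, and MD5 taken modulo the shard count as a stand-in for hashed placement. MongoDB assigns ranges of hash values to chunks rather than taking a modulo.

```python
import hashlib
from collections import Counter

SHARDS = 3


def ranged_shard(x, boundaries):
    """boundaries: upper bounds of the ranges; the last range is open-ended."""
    for shard, upper in enumerate(boundaries):
        if x < upper:
            return shard
    return len(boundaries)  # keys above every bound fall in the last range


def hashed_shard(x):
    """Stand-in for hashed placement: hash the key, then pick a shard."""
    digest = hashlib.md5(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


new_keys = range(1000, 2000)  # monotonically increasing X, above all old data
boundaries = [300, 600]       # ranges: X < 300, 300 <= X < 600, X >= 600

ranged = Counter(ranged_shard(x, boundaries) for x in new_keys)
hashed = Counter(hashed_shard(x) for x in new_keys)
print("ranged:", dict(ranged))
print("hashed:", dict(sorted(hashed.items())))
```

Under ranged placement every new key lands on the shard that owns the open-ended top range, while the hashed counts come out roughly equal across the three shards.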

Ranged sharding is most efficient when the shard key displays the following traits:

- Large shard key cardinality
- Low shard key frequency
- Non-monotonically changing shard keys

Demo For Sharding

Two shard servers, each running as a replica set

References

[1] MongoDB official documentation, https://docs.mongodb.com/v2.6

Thank you!