scott-hernandez
MongoDB replication internal architecture for 2.8. Abstract: Replication in MongoDB requires deep integration with almost every part of the codebase and has important hooks in systems such as storage, indexing, command processing, and querying. Most of the replication components were recently overhauled to make further improvements possible. This talk covers what those pieces are, how they interact, and the interesting choices made during their design. It covers the interaction of the replication protocols (commands, really), writes and write concern enforcement, consensus behaviors (elections, leader/follower, majority), and the depths of oplog generation and application on replicas. While a large part of the talk is a technical overview of the big pieces, we will dive into many important areas to ensure better understanding. The audience will be able to greatly affect which areas we focus on during the session, so come with ideas and a focus.
Replication Internals
Fitting Everything Together
2.8, Refactored
● Architecture as of 2.8
● Unit testable; more, and faster, cpp tests
● Many changes (heartbeats, locking, future)
● Interop with 2.6
● Larger replica sets
Large Blocks
● Topology Manager (state machine)
● Replication Coordinator (repl facade)
● Applier (replicate/apply oplog)
● Executor (network, heartbeats, serialization)
● Commands (re-config, init, status, etc)
● External (writes, storage, query, commands)
Blocks
[Diagram: Applier, Topology Manager (CFG), Replication Coordinator, Oplog, CMDs, Writes, Query, Executor]
Topology
● Maintains Authoritative State
o Heartbeat, ping, member state
o Roles and transitions
● Contains Decision Logic
● Unit Testable
● Serial Access
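The idea of a topology component that holds authoritative member state and pure decision logic can be sketched as below. This is a toy model, not MongoDB's implementation: `ToyTopology`, `MemberState`, and `currentPrimary` are hypothetical names, and the sketch assumes the caller serializes access, as the slide describes.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified member states; not MongoDB's real enum.
enum class MemberState { Unknown, Primary, Secondary, Down };

// Toy topology manager: keeps the authoritative view of each member and
// contains only decision logic, so it can be unit tested without a network.
class ToyTopology {
public:
    // Record the outcome of one heartbeat to a member.
    void processHeartbeat(const std::string& host, bool reachable, MemberState reported) {
        _members[host] = reachable ? reported : MemberState::Down;
    }

    // Host currently believed to be primary, or "" if none is known.
    std::string currentPrimary() const {
        for (const auto& entry : _members) {
            if (entry.second == MemberState::Primary) return entry.first;
        }
        return "";
    }

    // Last known state of a member; Unknown if no heartbeat was recorded.
    MemberState stateOf(const std::string& host) const {
        auto it = _members.find(host);
        return it == _members.end() ? MemberState::Unknown : it->second;
    }

private:
    // No locking here: this sketch assumes the caller serializes access.
    std::map<std::string, MemberState> _members;
};
```

Because the class has no network or threading dependencies, a test can feed it heartbeat results directly and check the resulting state.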
Examples
● updateConfig
● prepare*Response for commands
● getSyncSource, *
● setFollowerMode (state)
● processHeartbeat
● prepareHeartbeatResponse
PrepareHeartbeatResponse
Status TopologyCoordinatorImpl::prepareHeartbeatResponse(...) {
    // Check error conditions, then set response fields …
    response->setElectable(!_getMyUnelectableReason(...));
    response->setHbMsg(_getHbmsg(...));
    response->setTime(...);
    response->setOpTime(lastOpApplied);
    // Report the sync source only when one is set.
    if (!_syncSource.empty()) {
        response->setSyncingTo(_syncSource);
    }
… topology_coordinator_impl.cpp:628
Failover Scenario
[Diagram 1: heartbeats among P, S, S; health check (rsHB); active primary]
[Diagram 2: heartbeats among P, S, S; active primary; failed primary]
[Diagram 3: heartbeats among failed member, P, S; health check (rsHB)]
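The failover scenario turns on what a secondary concludes from its heartbeats. A minimal sketch of that decision follows, assuming a simplified rule: stand for election only when no reachable member is primary and a majority of the set is still visible. `MemberView`, `seesMajority`, and `shouldStandForElection` are hypothetical names, and real elections weigh more factors than this sketch does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heartbeat summary for one other member, as seen from this node.
struct MemberView {
    bool up;         // last heartbeat succeeded
    bool isPrimary;  // member reported itself as primary
};

// True when strictly more than half of the configured members are reachable.
// `others` excludes this node, which always counts itself as up.
bool seesMajority(const std::vector<MemberView>& others) {
    std::size_t total = others.size() + 1;
    std::size_t up = 1;
    for (const auto& m : others) {
        if (m.up) ++up;
    }
    return up * 2 > total;
}

// Simplified rule: a secondary considers standing for election only if no
// reachable member is primary and it can still see a majority of the set.
bool shouldStandForElection(const std::vector<MemberView>& others) {
    for (const auto& m : others) {
        if (m.up && m.isPrimary) return false;
    }
    return seesMajority(others);
}
```

In a three-member set, a secondary that loses the primary but still reaches the other secondary sees two of three members, so the rule allows an election; a secondary that is cut off from both peers sees only itself and stays put.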
Replication Coordinator
● Interface to other subsystems
● Uses executor to schedule
o Commands
o Elections, Initiate, Reconfig
o Role/State Changes
● Unit Testable
o With help, requires mocking out bridge for subsystems
Examples
● process*Response for commands
● awaitReplication* (for writes or migration)
● isReplEnabled
● canAcceptWrites*
Accepting writes
static bool checkIsMasterForDatabase(const std::string& db, ...) {
    if (!getReplicationCoordinator()->canAcceptWritesForDatabase(db)) {
        errorDetail->setErrCode(ErrorCodes::NotMaster);
        errorDetail->setErrMessage("Not primary while writing to " + ns);
        return false;
    }
    return true;
}
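The write check above goes through the coordinator as a facade, which is also what makes it mockable in unit tests. A minimal sketch of that pattern follows. `ReplCoordinatorLike`, `MockReplCoordinator`, and `checkCanWrite` are hypothetical names, and the rule that the `local` database is writable on any member is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical, minimal slice of a replication coordinator interface.
class ReplCoordinatorLike {
public:
    virtual ~ReplCoordinatorLike() = default;
    virtual bool canAcceptWritesForDatabase(const std::string& db) const = 0;
};

// Mock for unit tests: the role is fixed at construction, no network needed.
class MockReplCoordinator : public ReplCoordinatorLike {
public:
    explicit MockReplCoordinator(bool isPrimary) : _isPrimary(isPrimary) {}

    bool canAcceptWritesForDatabase(const std::string& db) const override {
        // Assumption for this sketch: "local" is writable on any member.
        return _isPrimary || db == "local";
    }

private:
    bool _isPrimary;
};

// Write-path gate that depends only on the interface, never on a concrete
// coordinator. On refusal it fills errMsg and returns false.
bool checkCanWrite(const ReplCoordinatorLike& coord,
                   const std::string& db,
                   std::string* errMsg) {
    if (!coord.canAcceptWritesForDatabase(db)) {
        *errMsg = "Not primary while writing to " + db;
        return false;
    }
    return true;
}
```

Passing the coordinator in by reference, rather than reaching for a global accessor, is a choice made for this sketch so that a test can swap in the mock.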
Applier
● Reads from *upstream* oplog
● Applies operations/transformations
● Mostly unchanged since 2.4
● Includes UpdatePosition commands
Applier
Read + Apply Decoupled
● Background oplog reader thread (net)
● Pool of oplog applier threads (by collection)
[Diagram: Repl Source, Network, Buffer, Applier Pool, DB1 to DB4, Local Oplog]
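The decoupling of reading and applying can be sketched as a producer/consumer pair around a buffer. This is an illustration under stated assumptions, not the server's code: `OplogEntry`, `OplogBuffer`, `applyBatch`, and `replicate` are hypothetical names, an entry is reduced to a namespace plus an integer, and one short-lived thread per namespace stands in for the applier pool.

```cpp
#include <condition_variable>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct OplogEntry { std::string ns; int value; };  // hypothetical, simplified entry
using Applied = std::map<std::string, std::vector<int>>;

// Buffer between the reader thread and the applying side.
class OplogBuffer {
public:
    void push(OplogEntry e) {
        { std::lock_guard<std::mutex> lk(_m); _q.push_back(std::move(e)); }
        _cv.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(_m); _closed = true; }
        _cv.notify_all();
    }
    // Blocks until entries arrive or the buffer is closed, then drains it.
    // An empty result means closed and fully drained.
    std::vector<OplogEntry> popBatch() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return !_q.empty() || _closed; });
        std::vector<OplogEntry> batch(_q.begin(), _q.end());
        _q.clear();
        return batch;
    }
private:
    std::mutex _m;
    std::condition_variable _cv;
    std::deque<OplogEntry> _q;
    bool _closed = false;
};

// Apply one batch with one worker per namespace; order within a namespace is kept.
void applyBatch(const std::vector<OplogEntry>& batch, Applied& applied) {
    std::map<std::string, std::vector<int>> byNs;
    for (const auto& e : batch) byNs[e.ns].push_back(e.value);
    // Resolve every target slot before starting threads, so workers only
    // touch their own vector and never the map itself.
    std::vector<std::pair<std::vector<int>*, const std::vector<int>*>> jobs;
    for (const auto& p : byNs) jobs.emplace_back(&applied[p.first], &p.second);
    std::vector<std::thread> workers;
    for (const auto& j : jobs) {
        workers.emplace_back([j] { j.first->insert(j.first->end(), j.second->begin(), j.second->end()); });
    }
    for (auto& w : workers) w.join();
}

// Reader thread fills the buffer while the calling thread applies batches.
Applied replicate(const std::vector<OplogEntry>& upstream) {
    OplogBuffer buffer;
    Applied applied;
    std::thread reader([&] {
        for (const auto& e : upstream) buffer.push(e);
        buffer.close();
    });
    for (;;) {
        std::vector<OplogEntry> batch = buffer.popBatch();
        if (batch.empty()) break;
        applyBatch(batch, applied);
    }
    reader.join();
    return applied;
}
```

Partitioning by namespace lets independent collections be applied in parallel while entries for the same collection stay in oplog order.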
Replication Operations
oplog entry (fields):
o = update, o2 = query
{ "ns" : "test.tags",
"op" : "u", "v" : 2, "ts": ...,
"o2" : { "_id" : 1 },
"o" : { "$set" : { "tags.4" : "e" } } }
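The entry above sets a fixed position, `tags.4`, so replaying it leaves the document unchanged. A minimal sketch of that idempotence follows, assuming a flat field-to-string document model. `Document`, `Collection`, `UpdateEntry`, and `applyUpdate` are hypothetical names, not the server's types.

```cpp
#include <map>
#include <string>

using Document = std::map<std::string, std::string>;  // flat field -> value
using Collection = std::map<int, Document>;           // _id -> document

// Hypothetical, simplified form of a "u" oplog entry.
struct UpdateEntry {
    int id;             // from o2: { "_id" : ... }
    std::string field;  // from o:  { "$set" : { field : value } }
    std::string value;
};

// Apply a $set-style update. Returns false if the target document is missing.
// Setting a field to a fixed value gives the same result however many times
// the entry is replayed.
bool applyUpdate(Collection& coll, const UpdateEntry& e) {
    auto it = coll.find(e.id);
    if (it == coll.end()) return false;
    it->second[e.field] = e.value;
    return true;
}
```

Applying the same `UpdateEntry` twice yields the same collection state as applying it once, which is the property a replica needs when it re-reads part of the oplog.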
Executor
● Serializes access to Topology state
● Serializes global state changes wrt db writes
● Processes network requests in IO pool
● Supports event/signal notification
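Serializing access through an executor can be sketched as a single worker thread draining a queue of callbacks in submission order. `SerialExecutor` is a hypothetical name, and the sketch leaves out the network I/O pool and event notification mentioned above.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Toy serial executor: every callback runs on one worker thread, in the
// order it was scheduled, so callbacks never race with each other.
class SerialExecutor {
public:
    SerialExecutor() : _worker([this] { run(); }) {}
    ~SerialExecutor() { shutdown(); }

    // Queue work. Work scheduled after shutdown() is never run.
    void schedule(std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _queue.push_back(std::move(work));
        }
        _cv.notify_one();
    }

    // Let the worker drain what is already queued, then join it.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _stopping = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) _worker.join();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cv.wait(lk, [this] { return _stopping || !_queue.empty(); });
                if (_queue.empty()) return;  // stopping and fully drained
                work = std::move(_queue.front());
                _queue.pop_front();
            }
            work();  // run outside the lock so callbacks can schedule more work
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _queue;
    bool _stopping = false;
    std::thread _worker;  // declared last so the other members exist before it starts
};
```

State that is only ever touched from scheduled callbacks, such as a topology object, then needs no locking of its own.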
Write Request
● Sent by user
● Interpreted by command subsystem
● Checked by replication coordinator
● Executed
● Idempotent entry recorded in oplog
● ~ Replicated
● ~ Possibly verified during user write request
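The last step, verifying replication during the user's write request, amounts to counting how many members have reached the write's position in the oplog. A minimal sketch follows, assuming optimes reduce to a single increasing integer. `majorityOf` and `writeConcernSatisfied` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest member count that is strictly more than half of the set.
std::size_t majorityOf(std::size_t memberCount) {
    return memberCount / 2 + 1;
}

// True when at least `w` members, the primary included, have applied the
// oplog up to `writeOpTime`. `lastApplied` holds one optime per member.
bool writeConcernSatisfied(std::uint64_t writeOpTime,
                           const std::vector<std::uint64_t>& lastApplied,
                           std::size_t w) {
    std::size_t acked = 0;
    for (std::uint64_t t : lastApplied) {
        if (t >= writeOpTime) ++acked;
    }
    return acked >= w;
}
```

A caller waiting on a majority write concern would re-evaluate this check as secondaries report progress, and return to the user once it holds or a timeout expires.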
Thanks
Questions?