Transitioning a 4 TB Health Care Security Auditing System to MongoDB
Michael Poremba
Director, Data Architecture
Practice Fusion
Introductions
Getting started
+ 20 years software engineering
+ Data architect / application architect
+ High-volume OLTP relational databases
+ Application performance and scalability
+ Domain experience: health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance
Michael Poremba @ Practice Fusion
+ Cloud-based electronic health records (EHR)
+ Over 100,000 health care providers in US
+ Over 90,000,000 patient medical records
+ OLTP database: weekday peak of ~40,000 transactions per second
+ 4 TB of security auditing records: ~50% of OLTP database storage
Practice Fusion
+ HIPAA: Health Insurance Portability and Accountability Act of 1996
+ Who did what to which patient’s medical record when?
+ Regulatory requirement—audit log must be kept and reviewed
+ Law enforcement and evidence in legal discovery
+ Save the audit log forever
+ Primary use cases:
Audit report in EHR: Security audit log viewer
Physician data analytics: Clinical quality measures (CQM)
HIPAA Security Audit Log
HIPAA Security Auditing on MongoDB
Project anatomy & lessons learned
Security Auditing – Legacy Architecture
[Figure: legacy architecture. Public load balancer; App 1, App 2, ..., App n; EHR (OLTP DB) holding the ActivityFeed and ActivityFeedParameter tables (multiplicity 4..8); ETL to CQM (reporting); Audit Report]
+ Latency on SAN increased
+ Response time slowed for writes
+ Database connections held longer
+ Connection pool expanded
+ User interface locked up—waiting
+ Users tried to log in again
+ Login is heaviest user operation
+ [Repeat]
The Log Jam
Found at: http://anchorhardwoods.com/wp-content/uploads/2011/08/log-jam.jpg
Audit Service – New Architecture
[Figure: new architecture. Public load balancer; App 1, App 2, ..., App n; Audit Service; AMQ queue listener; MongoDB audit log; Audit Report; ETL to CQM (reporting)]
+ Isolate auditing system from EHR OLTP database
+ Move audit I/O off the EHR SAN
+ New service interface for audit events
+ Scale out audit service
+ Scale out data store for auditing
Benefits of New Architecture
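The decoupling behind these benefits can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the project's audit service: an in-memory array stands in for the AMQ queue, and `AuditQueue`, `drain`, and the event type `PatientChartViewed` are hypothetical names. What it shows is that the EHR request path only enqueues an event and returns, while a separate listener writes to the audit store in batches.

```javascript
// In-memory stand-in for the AMQ queue (hypothetical; not an ActiveMQ client API).
class AuditQueue {
  constructor() {
    this.messages = [];
  }
  // Producer side: enqueue and return at once; the caller never waits on the audit store.
  publish(event) {
    this.messages.push(event);
  }
  // Consumer side: remove and return up to `max` messages in arrival order.
  take(max) {
    return this.messages.splice(0, max);
  }
  get depth() {
    return this.messages.length;
  }
}

// Hypothetical listener: drain the queue in batches into a store (an array here;
// in production this would be a bulk insert into MongoDB).
function drain(queue, store, batchSize) {
  let written = 0;
  while (queue.depth > 0) {
    const batch = queue.take(batchSize);
    store.push(...batch);
    written += batch.length;
  }
  return written;
}

const queue = new AuditQueue();
for (let i = 0; i < 5; i++) queue.publish({ typ: "PatientChartViewed", seq: i });
const store = [];
const written = drain(queue, store, 2);
console.log(written, queue.depth); // 5 0
```

In the real system the listener side is the Audit Service consuming from AMQ; the queue is what lets a slow audit store fall behind without holding EHR requests open.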
Project Objectives
+ New infrastructure for MongoDB
and AMQ
+ Modernize audit service API
+ Modernize audit report UI
+ Convert ~200 audit write operations
to new service API
+ Data warehouse ETL from MongoDB
+ Migrate 4 billion existing audit records
New Security Auditing System
+ Colette: program management
+ Ernest: services expert
+ Bhavik: test engineering
+ Michael: data architecture
+ Jeff: cluster architecture
+ Jay: MongoDB expert
+ Brett: AMQ expert
+ Bryan: infrastructure coordination
+ Carlos: data warehouse ETL
+ Transaction volume: Sustain 1,000 new documents per second
+ Data volume: scale to tens of billions of audit event records
+ High availability and disaster recovery: higher SLA than the EHR
+ Quick UI response time for interactive audit report
+ Tamper prevention and detection
No updates or deletes permitted on audit log
Security alerts when audit log is altered
+ Leverage industry standards for health care security audit logging
~300 distinct auditable user actions
Required and varying data elements
Security Auditing – Application Requirements
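As a rough check of how the two volume targets relate, assume the 1,000 documents per second rate were sustained around the clock every day (an upper bound, since load peaks on weekdays):

```latex
\begin{aligned}
1{,}000\ \tfrac{\text{docs}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} &= 8.64 \times 10^{7}\ \tfrac{\text{docs}}{\text{day}} \\
8.64 \times 10^{7}\ \tfrac{\text{docs}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{yr}} &\approx 3.15 \times 10^{10}\ \tfrac{\text{docs}}{\text{yr}}
\end{aligned}
```

That is on the order of 30 billion documents a year, in line with the "tens of billions" data volume target.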
[Figure: audit event model. AuditEvent associated with ParticipantObject (0..n), AuditSystem (1..1), and User (1..2)]
Health Care Industry Standards for Audit Logging
+ ISO 27789:2013: Health
Informatics – Audit trails for
electronic health records
+ ASTM E2147-01(2013):
Standard Specification for Audit
Disclosure Logs for Use in
Health Information Systems
+ FHIR SecurityEvent – resource
definition for auditing
{
"_id" : <BinaryData(4)>, // The audit event GUID
"docHash" : <String; Required>, // Tamper detection
"audOrgGuid" : <BinaryData(4); Required>, // Shard key prefix (shard key is { audOrgGuid, _id })
"crtdDttmUtc" : <Date; Required>, // Datetime record was inserted
"evnt" : { // Required subdocument
"dttmUtc" : <Date; Required>, // Date/time that event occurred
"typ" : <String; Required>, // Event record type; ~300 types
"ptDataTyp" : <String; Required>, // Standard set of patient data types
"actn" : <String; Required>, // Standard set of actions
"sys" : <String; Required> // Source system for audit event
},
"usr" : { // Required subdocument
"usrId" : <String; Required>, // Human-readable ID
"usrGuid" : <BinaryData(4); Required>, // Machine-readable ID
"dispNm" : <String; Required>, // Display name for user
"orgId" : <String; Required>,
"orgNm" : <String; Required>
},
"altUsr" : { // Optional subdocument for second user
... // Subdocument contains same properties as "usr"
},
"pt" : { // Optional subdocument
"ptId" : <String; Required>, // Human-readable ID for patient
"ptPracGuid" : <BinaryData(4); Required>, // Machine-readable ID for patient
"dispNm" : <String; Required>, // Display name for patient
"orgId" : <String; Required>,
"orgNm" : <String; Required>
},
"body" : { // Optional subdocument
... // Flattened list of attributes, specific to audit event subtype
}
}
JSON Document Schema for Audit Events
Schema Design – Lessons Learned
+ Prop nms strd per doc (property names are stored in every document)
Long names add up for large collections (ours: 1 TB)
Consider using abbreviated property names
Up-vote this feature request:
https://jira.mongodb.org/browse/SERVER-863
+ Know your application read/write patterns
+ Application responsible for data integrity
+ Be aware of data type behaviors:
Indexed string search is case sensitive
Several binary data types for UUID: use type 4
(default type is specific to database driver)
Found at: http://www.milesfinchinnovation.com/blog/wp-content/uploads/2013/02/iStock_000019474446Medium.jpg
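To see why long property names add up: BSON stores every field name as a NUL-terminated string inside every document, so each key costs its UTF-8 length plus one byte. The sketch below counts those bytes for a fragment of the audit schema and for hypothetical unabbreviated names (`createdDateTimeUtc`, `patientDataType`, and so on are illustrative, not names the project used). It ignores value bytes and any storage-engine compression; `fieldNameBytes` is a hypothetical helper.

```javascript
// Count bytes spent on field names in a BSON document. Each element stores its
// name as a NUL-terminated C string: UTF-8 length + 1 byte.
function fieldNameBytes(value) {
  if (Array.isArray(value)) {
    // BSON arrays are embedded documents keyed "0", "1", "2", ...
    return value.reduce((sum, v, i) => sum + String(i).length + 1 + fieldNameBytes(v), 0);
  }
  if (value === null || typeof value !== "object" || value instanceof Date || Buffer.isBuffer(value)) {
    return 0; // scalars, dates, and binary values contribute no field names
  }
  let total = 0;
  for (const [name, v] of Object.entries(value)) {
    total += Buffer.byteLength(name, "utf8") + 1 + fieldNameBytes(v);
  }
  return total;
}

// Abbreviated names from the audit schema (values are irrelevant here).
const shortDoc = { crtdDttmUtc: 0, evnt: { dttmUtc: 0, typ: 0, ptDataTyp: 0, actn: 0, sys: 0 } };
// Hypothetical unabbreviated equivalents.
const longDoc = {
  createdDateTimeUtc: 0,
  event: { dateTimeUtc: 0, type: 0, patientDataType: 0, action: 0, sourceSystem: 0 },
};

const savedPerDoc = fieldNameBytes(longDoc) - fieldNameBytes(shortDoc);
const docs = 4e9; // roughly the number of legacy audit records to migrate
console.log(fieldNameBytes(shortDoc), fieldNameBytes(longDoc), savedPerDoc); // 48 78 30
console.log(`${(savedPerDoc * docs) / 1e9} GB saved across ${docs / 1e9} billion documents`);
// 120 GB saved across 4 billion documents
```

Thirty bytes per document looks trivial, but multiplied across billions of documents it is a measurable share of a 1 TB collection.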
Schema Design – Lessons Learned
Leverage native data types:
+ Date
+ Boolean
+ Numeric: as strings, "1" + "1" = "11" and "11" + "1" = "111"
+ UUID: as strings, one value has many spellings:
"8c290139-f4e3-49c1-9ba2-a883defc6a15"
"8C290139-F4E3-49C1-9BA2-A883DEFC6A15"
"8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"
"8c290139f4e349c19ba2a883defc6a15"
"{8c290139-f4e3-49c1-9ba2-a883defc6a15}"
"{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
Found at: http://www.industryweek.com/innovation/innovation-one-size-fits-one
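Both pitfalls above are easy to demonstrate. The sketch assumes Node.js (for `Buffer`); `canonicalUuid` and `uuidToBinData4` are hypothetical helper names. It reduces all six UUID spellings from the slide to one canonical string, produces the base64 payload for a BinData subtype 4 value, and shows string concatenation posing as addition.

```javascript
// Reduce any of the textual spellings of a UUID to one canonical form:
// lowercase hex in 8-4-4-4-12 groups. Braces and hyphens are ignored.
function canonicalUuid(text) {
  const hex = text.replace(/[{}-]/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) throw new Error(`not a UUID: ${text}`);
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

// Base64 payload for BinData subtype 4 (RFC 4122 byte order), i.e. the
// string passed as BinData(4, "...") in the mongo shell.
function uuidToBinData4(text) {
  return Buffer.from(canonicalUuid(text).replace(/-/g, ""), "hex").toString("base64");
}

const spellings = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];
console.log(new Set(spellings.map(canonicalUuid)).size); // 1

// Numbers stored as strings concatenate instead of adding.
console.log("1" + "1", Number("1") + Number("1")); // 11 2
```

As a cross-check, `uuidToBinData4("001ae500-6e7b-471e-a063-7cf21f314add")` returns `"ABrlAG57Rx6gY3zyHzFK3Q=="`, the practice GUID used in the query plan example. Storing the 16 bytes as BinData rather than any of these strings makes equality, indexing, and size predictable.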
[Figure: legacy relational schema, with approximate row counts]
ActivityFeed (~4 billion)
ActivityFeedParameter (~30 billion)
Audit EventType (~300)
Action Type (10)
Patient Data Type (18)
User (~100,000)
Patient (~90 million)
Practice (~50,000)
Legacy Auditing System – Relational Schema
Issues around data normalization
+ New requirements introduced
+ Filter criteria and sort criteria
stored in five different tables
+ Audit events must be read into
memory for filtering and sorting
Join and expand data set by practice
Sort and filter expanded data set
+ Response time suffers for large
practices with many audit events
Schema Design – Lessons Learned
[Figure: the legacy relational tables ActivityFeed, ActivityFeedParameter, Audit EventType, Action Type, Patient Data Type, User, Patient, and Practice]
Denormalize with care:
{
"_id" : <BinaryData(4)>,
"docHash" : <String; Required>,
"audOrgGuid" : <BinaryData(4); Required>,
"crtdDttmUtc" : <Date; Required>,
"evnt" : {
"dttmUtc" : <Date; Required>,
"typ" : <String; Required>,
"ptDataTyp" : <String; Required>,
"actn" : <String; Required>,
"sys" : <String; Required>
},
"usr" : {
"usrId" : <String; Required>,
"usrGuid" : <BinaryData(4); Required>,
"dispNm" : <String; Required>,
"orgId" : <String; Required>,
"orgNm" : <String; Required>
},
"pt" : {
"ptId" : <String; Required>,
"ptPracGuid" : <BinaryData(4); Required>,
"dispNm" : <String; Required>,
"orgId" : <String; Required>,
"orgNm" : <String; Required>
},
"body" : { ... }
}
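A sketch of the denormalizing transform, in JavaScript, with hypothetical legacy row shapes: the talk does not list the ActivityFeed or ActivityFeedParameter column names, so every input property below is illustrative. It maps one ActivityFeed row, its parameter rows, and the looked-up user and optional patient into the document shape above; the `docHash` field is left out.

```javascript
// Hypothetical legacy row shapes; the real column names are not given in the talk.
function toAuditDocument(feed, params, user, patient, insertedAtUtc) {
  const doc = {
    _id: feed.activityFeedGuid,
    audOrgGuid: feed.practiceGuid, // owning practice; also the shard key prefix
    crtdDttmUtc: insertedAtUtc,
    evnt: {
      dttmUtc: feed.eventDateUtc,
      typ: feed.eventType, // resolved from the Audit EventType lookup
      ptDataTyp: feed.patientDataType, // resolved from the Patient Data Type lookup
      actn: feed.actionType, // resolved from the Action Type lookup
      sys: feed.sourceSystem,
    },
    // Names are copied as of the event; a later rename in the User or Patient
    // table is not reflected here. That is one trade-off of denormalizing.
    usr: {
      usrId: user.userId, usrGuid: user.userGuid, dispNm: user.displayName,
      orgId: user.orgId, orgNm: user.orgName,
    },
  };
  if (patient) {
    doc.pt = {
      ptId: patient.patientId, ptPracGuid: patient.patientPracticeGuid,
      dispNm: patient.displayName, orgId: patient.orgId, orgNm: patient.orgName,
    };
  }
  // The 4..8 ActivityFeedParameter rows collapse into one flat "body" subdocument.
  if (params.length > 0) {
    doc.body = Object.fromEntries(params.map((p) => [p.name, p.value]));
  }
  return doc; // docHash is not computed here
}

const doc = toAuditDocument(
  { activityFeedGuid: "e-1", practiceGuid: "org-1", eventDateUtc: new Date("2014-01-02T03:04:05Z"),
    eventType: "ChartViewed", patientDataType: "Medications", actionType: "Read", sourceSystem: "EHR" },
  [{ name: "encounterId", value: "enc-9" }, { name: "screen", value: "summary" }],
  { userId: "jdoe", userGuid: "u-1", displayName: "Dr. Jane Doe", orgId: "P100", orgName: "Example Clinic" },
  null,
  new Date("2014-06-01T00:00:00Z"),
);
console.log(Object.keys(doc).join(",")); // _id,audOrgGuid,crtdDttmUtc,evnt,usr,body
```

After this transform every filter and sort field the audit report needs lives in one document, so no join is required at read time.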
+ Millions of events per owning organization
+ Quick UI response time for interactive audit reports
+ Audit report UI allows events to be sorted/filtered five different ways
+ UI allows paging through audit events
+ Create a secondary index for each sort method
Index Design
+ Organization, event date DESC
db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.dttmUtc": -1} );
+ Organization, patient, event date DESC
db.auditEvent.ensureIndex( {"audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1} );
+ Organization, user, event date DESC
db.auditEvent.ensureIndex( {"audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1} );
+ Organization, patient data type, event date DESC
db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1} );
+ Organization, user action type, event date DESC
db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1} );
+ Document created date DESC
db.auditEvent.ensureIndex( {"crtdDttmUtc": -1} );
Index Definitions
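One way to keep each of the five UI sort options on its index is to derive the query from the index definition itself. A minimal sketch in JavaScript, assuming hypothetical option names (`date`, `patient`, `user`, `patientData`, `action`), a hypothetical helper `buildAuditQuery`, and plain strings in place of the BinData practice GUID. In the mongo shell the result would be used as `db.auditEvent.find(q.filter).sort(q.sort).limit(q.limit)`.

```javascript
// The five index key patterns above, keyed by a hypothetical UI sort option name.
const SORT_INDEXES = {
  date: { audOrgGuid: 1, "evnt.dttmUtc": -1 },
  patient: { audOrgGuid: 1, "pt.ptId": 1, "evnt.dttmUtc": -1 },
  user: { audOrgGuid: 1, "usr.usrId": 1, "evnt.dttmUtc": -1 },
  patientData: { audOrgGuid: 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1 },
  action: { audOrgGuid: 1, "evnt.actn": 1, "evnt.dttmUtc": -1 },
};

// Build find() arguments whose filter and sort line up with exactly one index:
// equality on audOrgGuid (the index prefix and shard key prefix), then a sort on
// the remaining index keys in the same order and direction.
function buildAuditQuery(orgGuid, sortBy, value, pageSize = 20) {
  const index = SORT_INDEXES[sortBy];
  if (!index) throw new Error(`unknown sort option: ${sortBy}`);
  const filter = { audOrgGuid: orgGuid };
  const sort = {};
  for (const [field, direction] of Object.entries(index)) {
    if (field !== "audOrgGuid") sort[field] = direction;
  }
  // Optional equality filter on the chosen dimension (e.g. one patient or one user).
  const dimension = Object.keys(sort)[0];
  if (value !== undefined && dimension !== "evnt.dttmUtc") filter[dimension] = value;
  return { filter, sort, limit: pageSize };
}

const q = buildAuditQuery("org-1", "user", "jdoe");
console.log(JSON.stringify(q));
// {"filter":{"audOrgGuid":"org-1","usr.usrId":"jdoe"},"sort":{"usr.usrId":1,"evnt.dttmUtc":-1},"limit":20}
```

Generating the sort from the index pattern keeps the two from drifting apart when an index is changed.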
+ Filter by practice GUID
+ Sort by event date/time (evnt.dttmUtc), descending order
+ Limit to 20 documents
db.auditEvent.find( {"audOrgGuid": BinData(4,"ABrlAG57Rx6gY3zyHzFK3Q==")} )
.sort( {"evnt.dttmUtc" : -1} ).limit(20).explain();
{
"clusteredType" : "ParallelSort",
"shards" : {
"RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
{
"cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc",
...
} ] }
...
"numshards" : 1,
...
Query Plan
Indexing Strategy – Lessons Learned
+ As with relational databases,
indexes are essential for efficient
queries
+ Learn how to use .explain()
to read query plans
+ Avoid collection scans: "cursor" : "BasicCursor"
+ For compound indexes, the query sort order must match the index sort order (or its exact reverse)
Found at: http://www.ebay.com/itm/13-pc-Hex-Shank-Titanium-Drill-Bit-Set-Quick-Change-Bits-/350526103504?pt=LH_DefaultDomain_0&hash=item519cfbdfd0
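The "avoid collection scans" lesson can be checked automatically, for example in a test suite that runs each report query through explain(). A minimal sketch in JavaScript that reads the legacy explain() shape shown in the query plan above; the helper name `collectionScans` is hypothetical. Servers from 3.0 onward report plans under `queryPlanner.winningPlan` and name a collection scan `COLLSCAN`, so the check would need adapting there.

```javascript
// Return the shards whose plan in a legacy (MongoDB 2.x) explain() result used a
// collection scan. Sharded output nests per-shard plans under "shards";
// unsharded output has "cursor" at the top level.
function collectionScans(explain) {
  const isScan = (plan) => String(plan.cursor || "").startsWith("BasicCursor");
  if (!explain.shards) return isScan(explain) ? ["(unsharded)"] : [];
  return Object.entries(explain.shards)
    .filter(([, plans]) => plans.some(isScan))
    .map(([shard]) => shard);
}

const indexed = {
  clusteredType: "ParallelSort",
  shards: {
    "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018": [
      { cursor: "BtreeCursor auditEvent_audOrgGuid_dttmUtc" },
    ],
  },
};
const scanned = { shards: { RepSet01: [{ cursor: "BasicCursor" }] } };
console.log(collectionScans(indexed).length, collectionScans(scanned).join(",")); // 0 RepSet01
```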
Principle of least privilege
+ MongoDB cluster not accessible from public Internet
+ Authentication and authorization enabled on cluster
+ Application users granted minimum permissions required
Signed audit events
+ Audit events signed with hash of audit event contents
+ Recompute hash on reads—test the data against hash value
+ Send security alert when hash does not match
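A minimal sketch of signing and verifying an audit event in Node.js. The slide says events are signed with a hash of their contents; a plain hash can be recomputed by anyone able to rewrite the document, so this sketch swaps in an HMAC (SHA-256 keyed with a secret kept outside MongoDB). That choice, the sorted-key serialization, and the names `canonicalize`, `computeDocHash`, and `verifyDocHash` are assumptions, not the project's implementation.

```javascript
const crypto = require("crypto");

// Serialize with sorted keys so identical content always yields identical
// bytes, whatever the property order. Dates become ISO-8601 strings.
function canonicalize(value) {
  if (value instanceof Date) return JSON.stringify(value.toISOString());
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  if (value !== null && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(value[k]))
      .join(",") + "}";
  }
  return JSON.stringify(value);
}

// HMAC-SHA256 over every field except docHash itself.
function computeDocHash(doc, secretKey) {
  const { docHash, ...content } = doc; // exclude the stored hash from its own input
  return crypto.createHmac("sha256", secretKey).update(canonicalize(content)).digest("hex");
}

// Recompute on read and compare in constant time.
function verifyDocHash(doc, secretKey) {
  const expected = Buffer.from(computeDocHash(doc, secretKey), "hex");
  const actual = Buffer.from(String(doc.docHash || ""), "hex");
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}

const key = "hypothetical-secret-from-a-key-store";
const auditEvent = { _id: "e1", audOrgGuid: "org-1", evnt: { typ: "ChartViewed", actn: "Read" } };
auditEvent.docHash = computeDocHash(auditEvent, key);
console.log(verifyDocHash(auditEvent, key)); // true
auditEvent.evnt.actn = "Delete"; // simulated tampering
console.log(verifyDocHash(auditEvent, key)); // false: send a security alert
```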
Oplog monitoring
+ Use mongo-connector Python scripts to monitor oplog
+ Watch for update and delete operations (oplog "op" codes "u" and "d") on the collection
+ Send security alert when data changes are detected
Tamper Prevention and Detection
Found at: http://legacymedia.localworld.co.uk/275663/Article/images/17639732/4416792.jpg
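The oplog check itself reduces to a filter over oplog entries. The talk used mongo-connector (Python) to tail the oplog; the sketch below shows only the filtering logic in JavaScript, applied to entries already read from `local.oplog.rs`, and does not reproduce mongo-connector's API. In a sharded cluster each shard's replica set has its own oplog to watch. As an extra assumption, it also flags a drop of the whole collection; `tamperAlerts` is a hypothetical name.

```javascript
// Oplog entries record each write with an "op" code ("i" insert, "u" update,
// "d" delete, "c" command, "n" no-op) and the namespace "ns" it touched.
// Inserts are the only expected writes to the audit collection.
function tamperAlerts(entries, db = "AuditLog", coll = "auditEvent") {
  const ns = `${db}.${coll}`;
  const alerts = [];
  for (const e of entries) {
    if (e.ns === ns && e.op === "u") {
      alerts.push({ kind: "update", id: e.o2 && e.o2._id }); // o2 holds the matched _id
    } else if (e.ns === ns && e.op === "d") {
      alerts.push({ kind: "delete", id: e.o && e.o._id }); // o holds the deleted _id
    } else if (e.ns === `${db}.$cmd` && e.op === "c" && e.o && e.o.drop === coll) {
      alerts.push({ kind: "drop", id: null }); // whole collection dropped
    }
  }
  return alerts;
}

const sample = [
  { op: "i", ns: "AuditLog.auditEvent", o: { _id: "e1" } },
  { op: "u", ns: "AuditLog.auditEvent", o2: { _id: "e1" }, o: { $set: { "evnt.actn": "Read" } } },
  { op: "d", ns: "AuditLog.auditEvent", o: { _id: "e2" } },
  { op: "i", ns: "EHR.patient", o: { _id: "p1" } },
];
console.log(tamperAlerts(sample).map((a) => `${a.kind}:${a.id}`).join(" ")); // update:e1 delete:e2
```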
Security – Lessons Learned
+ Minimize network access to
MongoDB cluster
+ Enable authentication
+ Leverage role-based
authorization
+ Use SSL (MongoDB Enterprise)
+ Disable REST interface and
HTTP status interface
Found at: http://www.harborfreight.com/3-1-2-half-inch-circular-padlock-98972.html
+ Shard the database to scale out
+ Begin with small number of shards (2 or 3)
+ Group all audit events from the same medical practice
Every audit event is “owned” by some practice
Audit report UI always queries events by medical practice
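+ Composite shard key on { audOrgGuid, _id } (practice GUID, then audit event GUID)
db.adminCommand({ enableSharding: "AuditLog" }); // sharding must first be enabled on the database
db.adminCommand({
shardCollection: "AuditLog.auditEvent",
key: { audOrgGuid: 1, _id: 1 } });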
Transaction Volume: 1,000 New Documents per Second
Found at: http://s3.amazonaws.com/Reconsales/800/0bfe72e0-9b06-42ac-9644-5727a3ca9c79.jpg
Sharding the Database – Lessons Learned
+ At the onset of development
determine whether to shard
+ Specify the shard key in queries
Allows mongos to route the query to the shards that own the data
Minimize distributed “scatter/gather” queries
Queries spanning chunks likely span shards
+ Choose a key that allows even balancing
Balancing is performed in 32 MB chunks
Design the shard key so chunks will not exceed 32 MB: a chunk whose documents all share one shard key value cannot be split
Found at: http://www.airbrushaction.com/content/sites/default/files/tipstricks-images/4_27.png
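A simplified model of how mongos uses the { audOrgGuid, _id } shard key to route a query, in JavaScript. Chunk boundaries, shard names, and string keys (instead of BinData GUIDs) are hypothetical, as are the helper names; real routing uses the chunk metadata held by the config servers.

```javascript
// Sentinels standing in for MongoDB's MinKey / MaxKey.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

function cmp(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Lexicographic comparison of [audOrgGuid, _id] shard key tuples.
function cmpKey(x, y) {
  return cmp(x[0], y[0]) || cmp(x[1], y[1]);
}

// Each chunk owns shard key values in [min, max) and lives on one shard.
function targetShards(chunks, orgGuid) {
  let hits = chunks; // without the shard key every chunk qualifies: scatter/gather
  if (orgGuid !== undefined) {
    // Equality on the key prefix selects [orgGuid, MinKey] .. [orgGuid, MaxKey].
    const lo = [orgGuid, MIN];
    const hi = [orgGuid, MAX];
    hits = chunks.filter((c) => cmpKey(c.min, hi) <= 0 && cmpKey(c.max, lo) > 0);
  }
  return [...new Set(hits.map((c) => c.shard))].sort();
}

// Hypothetical chunk layout: practice "org-b" is large enough to be split on _id.
const chunks = [
  { min: [MIN, MIN], max: ["org-b", MIN], shard: "shard1" },
  { min: ["org-b", MIN], max: ["org-b", "id-500"], shard: "shard2" },
  { min: ["org-b", "id-500"], max: ["org-m", MIN], shard: "shard3" },
  { min: ["org-m", MIN], max: [MAX, MAX], shard: "shard1" },
];

console.log(targetShards(chunks, "org-a").join(",")); // shard1
console.log(targetShards(chunks, "org-b").join(",")); // shard2,shard3
console.log(targetShards(chunks, undefined).join(",")); // shard1,shard2,shard3
```

The "org-b" case shows a query spanning chunks also spanning shards. With audOrgGuid alone as the key, all of org-b's events would share one key value and the split at "id-500" would be impossible; adding _id is what keeps a large practice's chunks splittable.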
High Availability and Disaster Recovery – Replica Sets
+ If audit log is down, then 100,000
health care providers are idle
+ Audit logging subsystem must be
more reliable than customer EHR
+ Node failover must be automatic
+ Protect against network and data
center failure scenarios
Found at: http://www.huntsmart.com/App_Themes/hs.com/ProductImages/250/DNSBC.jpg
Sharded Cluster Replicated Across Multiple Data Centers
[Figure: Primary DC and Disaster Recovery DC, with locations DC2 AZ1, DC2 AZ2, and DC3 AZ1. Components: config servers, mongos routers, AMQ brokers, arbiters, and replica set members of shards 1, 2, and 3 spread across the locations]
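Automatic failover depends on vote arithmetic: a replica set elects a primary only if the members that can still reach each other hold a strict majority of all votes. A sketch in JavaScript, assuming a hypothetical five-member shard spread over the three locations in the diagram (the actual member placement is not recoverable from the slide) and that all surviving members can reach one another; `canElectPrimary` is a hypothetical name.

```javascript
// A primary can be elected (or kept) only while surviving members hold a strict
// majority of all votes and at least one of them is a data-bearing member with
// priority > 0. Arbiters vote but hold no data.
function canElectPrimary(members, failedLocations) {
  const failed = new Set(failedLocations);
  const totalVotes = members.reduce((sum, m) => sum + m.votes, 0);
  const survivors = members.filter((m) => !failed.has(m.location));
  const survivingVotes = survivors.reduce((sum, m) => sum + m.votes, 0);
  const electable = survivors.some((m) => !m.arbiter && m.priority > 0);
  return survivingVotes > totalVotes / 2 && electable;
}

// Hypothetical layout for one shard (not the project's actual layout).
const shardMembers = [
  { host: "a", location: "DC2 AZ1", votes: 1, priority: 1, arbiter: false },
  { host: "b", location: "DC2 AZ2", votes: 1, priority: 1, arbiter: false },
  { host: "c", location: "DC3 AZ1", votes: 1, priority: 1, arbiter: false },
  { host: "arb1", location: "DC2 AZ2", votes: 1, priority: 0, arbiter: true },
  { host: "arb2", location: "DC3 AZ1", votes: 1, priority: 0, arbiter: true },
];

console.log(canElectPrimary(shardMembers, ["DC2 AZ1"])); // true
console.log(canElectPrimary(shardMembers, ["DC3 AZ1"])); // true
console.log(canElectPrimary(shardMembers, ["DC2 AZ1", "DC2 AZ2"])); // false
```

With this particular layout, losing any single location leaves a majority, but losing both DC2 zones does not; where the votes sit decides which data center failures are survived automatically and which need manual reconfiguration.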
Performance and Stress Testing – Lessons Learned
+ Acquire or build load testing tools
+ Test using a realistic, unbiased data set
+ Test the database cluster to confirm it sustains
the required write throughput
+ Ensure read & write performance meets
load requirements
+ Find the performance ceiling
+ Find and resolve bottlenecks
+ Tune IO and memory
Found at: http://www.webdesign.org/img_articles/21892/broken_chain.jpg
Data Migration – Lessons Learned
Data Migration
+ Parallelize data migration process
+ Identify and remove bottlenecks
+ Scale out MongoDB cluster to handle
heavy write load
+ Determine whether best to add
indexes before or after migration
+ It takes a while to extract, transform,
and load billions of documents
Found at: http://www.dennissy.com/wp-content/uploads/2010/07/house_moving_malaysia.jpg
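A sketch of splitting the migration into independent parallel streams, in JavaScript. It assumes the legacy table can be walked by a dense integer key (a hypothetical stand-in: if the key is a GUID, ranges would be taken over some other sortable column), and the worker count of 16 and the names `partitionRange` and `batches` are illustrative.

```javascript
// Split a dense integer key range [lo, hi) into near-equal, contiguous slices,
// one per ETL worker, so workers can extract, transform, and load in parallel
// without coordinating.
function partitionRange(lo, hi, workers) {
  const total = hi - lo;
  const slices = [];
  for (let w = 0; w < workers; w++) {
    const start = lo + Math.floor((total * w) / workers);
    const end = lo + Math.floor((total * (w + 1)) / workers);
    if (end > start) slices.push({ worker: w, start, end });
  }
  return slices;
}

// Within a slice, walk the keys in fixed-size batches (e.g. one bulk insert each).
function* batches(slice, batchSize) {
  for (let s = slice.start; s < slice.end; s += batchSize) {
    yield { start: s, end: Math.min(s + batchSize, slice.end) };
  }
}

const slices = partitionRange(0, 4e9, 16); // ~4 billion legacy rows, 16 hypothetical workers
console.log(slices.length, slices[0].end - slices[0].start); // 16 250000000
console.log([...batches({ start: 0, end: 2500 }, 1000)].map((b) => `${b.start}-${b.end}`).join(" "));
// 0-1000 1000-2000 2000-2500
```

Because slices do not overlap, a failed worker can be rerun for its slice alone; scaling out the cluster then raises how many workers can write at once before MongoDB becomes the bottleneck.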
Choosing the Appropriate Data Store
MongoDB over relational?
+ Scale out for transaction volume
and data volume
+ Highly varying document
structure
+ Developer productivity
Easy map between application and data store
+ Offload read activity in optimized
format different from data writes
(a.k.a. CQRS pattern)
Found at: http://www.meonuk.com/hammers-mauls
Choosing the Appropriate Data Store
Relational over MongoDB?
+ Complex normalized data model
+ Diverse read patterns requiring
joins
+ Ad hoc reporting and analysis
+ Data integrity difficult to manage
in application layer
Found at: http://3.bp.blogspot.com/_QUmmdgc7l6A/TTPUyRWFNPI/AAAAAAAAAO8/KV_i2c2lrRk/s1600/saws+various.jpg
MongoDB @ Practice Fusion
Upcoming MongoDB projects
+ Read cache for patient medical
records
+ Online patient intake process
+ Ad campaign segmentation
+ Scale-out data store for
patient clinical observations
Found at: http://jbirdmedia.org/vessels/images/uploads/framing-new-const-lg.jpg