Tech view on Regulatory Compliance
MarkLogic User Group Benelux Meetup, December 2016
Speaker: Alexander L. de Goeij



Page 2: Tech view on Regulatory Compliance

About me

• Architect / Consultant

• Financial Services: Core Trading

• Regulations: EMIR, MiFID II

• Architecture: Enterprise / Solution / Project Architect

• Consulting: IT Strategy, implementations, vendor selection, etc.

• Business degree, Tech addiction.

Page 3: Tech view on Regulatory Compliance

“Regulations really make my life more fun!” As said by no-one, ever.

Page 4: Tech view on Regulatory Compliance

“Regulations really make my life more exciting!” As said by everyone who gets to use cool databases!

Page 5: Tech view on Regulatory Compliance

The challenge we think we are facing:

Diagram: a single linear flow. Source Data is extracted, transformed and loaded into Some Application, which extracts, loads and sends the report to a Happy Regulator.

Page 6: Tech view on Regulatory Compliance

The actual challenge we are facing:

Diagram: Source Data is loaded into and extracted from a chain of systems (DB 1, Tool 2 up to Thing N, plus a “database you didn’t know still existed”), and the results reach several Happy Regulators via Email, FTP, REST and SOAP.

Page 7: Tech view on Regulatory Compliance

Current solution doesn’t work anymore:

• Auditability / process checks are now included in the regulations.

• Obligation to re-report.

• More complex Ad-Hoc requests from the Regulator.

• Not suited for Real-Time reporting.

• Waste of money…

Page 8: Tech view on Regulatory Compliance

What do we need?

• Auditability: keep original data in original format to prove results, keep track of ‘who-did-what’ with the data.

• Consistency: real-time requirement from regulator demands more than eventual consistency.

• Forward Flexibility: we know we don’t know what we will have to report tomorrow.
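The auditability requirement (original data in original format, plus ‘who-did-what’) can be illustrated with a minimal in-memory sketch, assuming an append-only design: each original document is stored byte-for-byte with a SHA-256 digest, and every action on it is appended to an audit trail. The names `AuditStore`, `ingest`, `record`, `verify` and `history` are hypothetical and do not come from MarkLogic or any other product.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    doc_id: str
    user: str
    action: str
    at: datetime


@dataclass
class AuditStore:
    """Hypothetical append-only store: originals are never overwritten."""
    originals: dict = field(default_factory=dict)  # doc_id -> (raw bytes, sha256 hex)
    trail: list = field(default_factory=list)      # append-only list of AuditEvent

    def ingest(self, doc_id: str, raw: bytes, user: str) -> str:
        if doc_id in self.originals:
            raise ValueError(f"{doc_id} already ingested; originals are immutable")
        digest = hashlib.sha256(raw).hexdigest()
        self.originals[doc_id] = (raw, digest)  # keep the original format as-is
        self.record(doc_id, user, "ingest")
        return digest

    def record(self, doc_id: str, user: str, action: str) -> None:
        """Append a 'who-did-what' entry; entries are never edited or removed."""
        self.trail.append(AuditEvent(doc_id, user, action, datetime.now(timezone.utc)))

    def verify(self, doc_id: str) -> bool:
        """Re-hash the stored original and compare it with the digest taken at ingest."""
        raw, digest = self.originals[doc_id]
        return hashlib.sha256(raw).hexdigest() == digest

    def history(self, doc_id: str) -> list:
        return [(e.user, e.action) for e in self.trail if e.doc_id == doc_id]


store = AuditStore()
store.ingest("trade-1", b"<trade id='1'/>", user="etl")
store.record("trade-1", "analyst", "transform:emir")
print(store.verify("trade-1"))   # True
print(store.history("trade-1"))  # [('etl', 'ingest'), ('analyst', 'transform:emir')]
```

Transformed versions (for example a regulator-specific format) would be stored as new documents linked to the original, so a reported figure can always be traced back to unchanged source bytes.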

Page 9: Tech view on Regulatory Compliance

Looking to technology for a better answer!

Page 10: Tech view on Regulatory Compliance

Your favorite RDBMS

• ACID, consistent, and blazing fast if you buy Exadata

• Normalize your way out, and fail.

• Not fit for processing/reporting across different data objects: e.g. Trades and Mortgages

• Try to do NoSQL with SQL (innovative, but terribly slow and impossible to maintain)

Example of what not to do: (two SQL screenshots on the original slide)
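As a hedged illustration of the “NoSQL with SQL” bullet, here is an entity-attribute-value (EAV) table in SQLite. EAV is our assumption of a typical form this takes, not the speaker’s screenshot: one generic table holds trades and mortgages alike, which keeps the schema flexible, but even a simple report needs one self-join per attribute and manual type casts. The entity IDs and attribute names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table for every kind of object: trades, mortgages, ...
conn.execute("CREATE TABLE eav (entity_id TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("T1", "type", "trade"), ("T1", "notional", "1000000"), ("T1", "currency", "EUR"),
        ("M1", "type", "mortgage"), ("M1", "principal", "250000"), ("M1", "currency", "EUR"),
    ],
)

# Reporting trades with notional and currency needs a self-join per attribute,
# and every value comes back as TEXT that must be cast by hand.
rows = conn.execute(
    """
    SELECT t.entity_id, CAST(n.value AS INTEGER) AS notional, c.value AS currency
    FROM eav AS t
    JOIN eav AS n ON n.entity_id = t.entity_id AND n.attribute = 'notional'
    JOIN eav AS c ON c.entity_id = t.entity_id AND c.attribute = 'currency'
    WHERE t.attribute = 'type' AND t.value = 'trade'
    """
).fetchall()
print(rows)  # [('T1', 1000000, 'EUR')]
```

Each extra field in a report adds another join against the same large table, which is one reason this approach tends to become slow and hard to maintain.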

Page 11: Tech view on Regulatory Compliance

MongoDB

• Free! Open Source! GridFS!

• Have to transform data on ingest (to JSON) as most data is XML

• Eventual consistency (AKA data loss) rules out real-time reporting.

• Good at homogeneous data.

• Still master-slave, with scaling issues

• Brilliant for RAD / prototyping!
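To make the ingest-transformation point concrete: storing XML messages as JSON documents means converting each one first, and the mapping itself is a design decision. This sketch uses Python’s standard `xml.etree.ElementTree`. The convention of prefixing attributes with `@`, using `#text` for mixed text, and collecting repeated elements into lists is our assumption (one of several common conventions), and the sample trade message is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element):
    """Convert an element to JSON-friendly data.

    Assumed convention: attributes become '@name' keys, text becomes '#text'
    when the element also has attributes or children, and repeated child
    tags are collected into a list.
    """
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element without attributes: just its text
    if text:
        node["#text"] = text
    return node


xml_msg = """<trade id="T1">
  <notional currency="EUR">1000000</notional>
  <party>A</party>
  <party>B</party>
</trade>"""

root = ET.fromstring(xml_msg)
doc = {root.tag: element_to_dict(root)}
print(json.dumps(doc))
# {"trade": {"@id": "T1", "notional": {"@currency": "EUR", "#text": "1000000"}, "party": ["A", "B"]}}
```

Note what is lost or reinterpreted along the way: the notional is now a string, the relative order of interleaved sibling elements is not kept, and namespaces or mixed content would need further rules.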

Where things go wrong:

Source: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Page 12: Tech view on Regulatory Compliance

Cassandra (DataStax)

• Favors data duplication over normalization

• Very fast (if you duplicate well) but does not do JOINs

• Used by ING as the main component of their Risk grid (source: YouTube)

• Excellent for time series data

Source: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
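One of the time-series patterns in the DataStax guide cited on this slide is query-first: the partition key combines a source identifier with a time bucket such as a day, and readings are clustered by timestamp inside each partition. The Python sketch below only simulates that layout in memory (no Cassandra driver involved); the instrument ID `ISIN-X`, the price field and the per-day bucket are illustrative assumptions.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

# Simulated table: partition key (instrument_id, day) -> rows sorted by event_time,
# roughly what PRIMARY KEY ((instrument_id, day), event_time) expresses in CQL.
partitions = defaultdict(list)


def insert(instrument_id: str, event_time: datetime, price: float) -> None:
    key = (instrument_id, event_time.date().isoformat())  # day bucket caps partition size
    insort(partitions[key], (event_time, price))           # clustering order: by time


def read_day(instrument_id: str, day: str, start: datetime, end: datetime):
    """Query served by a single partition: one instrument, one day, a time range."""
    rows = partitions.get((instrument_id, day), [])
    return [(t, p) for t, p in rows if start <= t < end]


insert("ISIN-X", datetime(2016, 12, 1, 9, 30), 101.5)
insert("ISIN-X", datetime(2016, 12, 1, 9, 0), 100.0)
insert("ISIN-X", datetime(2016, 12, 2, 9, 0), 99.0)

print(read_day("ISIN-X", "2016-12-01", datetime(2016, 12, 1, 9), datetime(2016, 12, 1, 10)))
```

A query that does not match the partition key (for example all instruments for one hour) would need a second table holding a duplicate copy of the same data, which is the duplication-over-normalization trade-off listed above.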

Page 13: Tech view on Regulatory Compliance

Hadoop

Source: http://hortonworks.com/products/data-center/hdp/

Page 14: Tech view on Regulatory Compliance

MarkLogic

• Focused on heterogeneously structured data

• Bitemporal, if you dare

• Semantics / RDF Triples

• ACID, Consistent, stores original file

• ABAC & redaction in enterprise version

• Rules, Workflows, Alerts, Triggers

• Not a COTS (commercial off-the-shelf) product!
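“Bitemporal” means every fact carries two time axes: valid time (when it was true in the business world) and system time (when the database recorded it). This is what makes an obligation to re-report answerable, because you can ask what you believed when the original report went out versus what you know now. The sketch below is plain Python, not MarkLogic’s bitemporal API; the `Version` layout, the `as_of` function and the trade data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

FOREVER = date.max


@dataclass(frozen=True)
class Version:
    trade_id: str
    notional: int
    valid_from: date      # business (valid) time
    valid_to: date
    system_from: date     # when this version was recorded
    system_to: date = FOREVER


def as_of(versions, trade_id: str, valid_at: date, known_at: date) -> Optional[Version]:
    """Return the version valid at `valid_at`, as the system knew it on `known_at`."""
    for v in versions:
        if (v.trade_id == trade_id
                and v.valid_from <= valid_at < v.valid_to
                and v.system_from <= known_at < v.system_to):
            return v
    return None


# Booked on 1 Dec with notional 1,000,000; on 5 Dec a correction to 1,200,000
# is recorded for the same valid period. The old version is closed in system
# time rather than deleted.
history = [
    Version("T1", 1_000_000, date(2016, 12, 1), FOREVER, date(2016, 12, 1), date(2016, 12, 5)),
    Version("T1", 1_200_000, date(2016, 12, 1), FOREVER, date(2016, 12, 5)),
]

reported = as_of(history, "T1", valid_at=date(2016, 12, 2), known_at=date(2016, 12, 2))
corrected = as_of(history, "T1", valid_at=date(2016, 12, 2), known_at=date(2016, 12, 6))
print(reported.notional, corrected.notional)  # 1000000 1200000
```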

Page 15: Tech view on Regulatory Compliance

Ok, so now what?

Page 16: Tech view on Regulatory Compliance

Two approaches to a solution

Infra approach:

• Build everything yourself, use open source components

E.g.:

• Hadoop

• Cassandra + Kafka

Platform approach:

• Focus on application and business logic, not on infra

E.g.:

• MarkLogic

• Spark (without Hadoop)

Page 17: Tech view on Regulatory Compliance

Infra approach (SMACK example: Spark, Mesos, Akka, Cassandra, Kafka)

• Used (and designed) by Netflix, LinkedIn, Uber, Twitter

• Massive amounts of event processing (IoT)

• HA and Geo distributed

• Scala, Python, R, Java(Script)

• Asynchronous everywhere

• Near impossible to destroy: reactive, self-healing, back-pressure.

Diagram: SMACK deployment. Play REST APIs, Akka Actors, Kafka, Spark and several Cassandra nodes run on Mesos OS (with Marathon and Zookeeper), on top of multiple Bare Metal servers.
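Back-pressure, one of the properties listed for this stack, means a fast producer is slowed down to the pace of its consumer instead of filling memory. Akka Streams, for example, does this by signalling demand between stages; the sketch below shows only the underlying idea with Python’s `asyncio.Queue`, where a bounded queue makes `put()` wait while the queue is full. The queue size, event count and sleep time are arbitrary assumptions.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int, log: list) -> None:
    for i in range(n):
        await queue.put(i)  # suspends while the queue is full: back-pressure
        log.append(("produced", i, queue.qsize()))
    await queue.put(None)   # sentinel: no more events


async def consumer(queue: asyncio.Queue, log: list) -> None:
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0.01)  # slow downstream work, e.g. a report writer
        log.append(("consumed", item))


async def main(n: int = 10, maxsize: int = 2) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)  # bounded buffer
    log: list = []
    await asyncio.gather(producer(queue, n, log), consumer(queue, log))
    return log


log = asyncio.run(main())
backlog = [entry[2] for entry in log if entry[0] == "produced"]
print("max queue size seen by producer:", max(backlog))  # never above maxsize (2)
```

Without `maxsize` the producer would race ahead and buffer everything in memory; with it, the producer simply waits, which is the behaviour the slide’s “near impossible to destroy” claim relies on.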

Page 19: Tech view on Regulatory Compliance

Platform approach

Diagram columns: Data Flows (Source Data, qualitative and quantitative), Data Stores (MarkLogic, plus “insert time series database here”), Analytics (Spark) and Feedback Loop, with the output going to a Happy Regulator. Functions of the platform layer:

• Schema transformations

• Business Rules

• Workflow

• Rights management
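One way to read this platform layer is as the place where forward flexibility comes from: a new reporting obligation becomes a new mapping entry or rule, not a new pipeline. A minimal sketch of that idea, assuming rules are kept as data next to a schema mapping; the field names (`tradeRef`, `ntnl`, `ccy`), the rule set and the threshold are hypothetical and not taken from EMIR, MiFID II or MarkLogic.

```python
from typing import Callable, Dict, List, Tuple

# Schema transformation: hypothetical source field -> reporting field.
MAPPING: Dict[str, str] = {"tradeRef": "trade_id", "ntnl": "notional", "ccy": "currency"}

# Business rules as data: (rule name, predicate on the transformed record).
Rule = Tuple[str, Callable[[dict], bool]]
RULES: List[Rule] = [
    ("trade_id present", lambda r: bool(r.get("trade_id"))),
    ("notional positive", lambda r: r.get("notional", 0) > 0),
    ("currency is 3 characters", lambda r: len(str(r.get("currency", ""))) == 3),
]


def transform(source: dict) -> dict:
    """Keep only mapped fields, renamed to the reporting schema."""
    return {MAPPING[k]: v for k, v in source.items() if k in MAPPING}


def validate(record: dict, rules: List[Rule]) -> List[str]:
    """Return the names of the rules the record breaks (empty list = reportable)."""
    return [name for name, check in rules if not check(record)]


good = transform({"tradeRef": "T1", "ntnl": 1_000_000, "ccy": "EUR", "desk": "rates"})
bad = transform({"tradeRef": "T2", "ntnl": -5, "ccy": "EURO"})
print(validate(good, RULES))  # []
print(validate(bad, RULES))   # ['notional positive', 'currency is 3 characters']

# A new (hypothetical) obligation tomorrow is one more entry, not a new pipeline:
RULES.append(("notional below threshold", lambda r: r.get("notional", 0) < 5_000_000))
```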

Page 20: Tech view on Regulatory Compliance

Main take-aways

• There are no one-stop solutions

• Don’t pick bleeding edge stuff if you need it to work

• Focus on Business benefit of investment in Regulatory Compliance

• Separate the platform from the project!

• Start small, think big

Page 22: Tech view on Regulatory Compliance

References

• https://academy.datastax.com/resources/getting-started-time-series-data-modeling

• http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

• http://hortonworks.com/products/data-center/hdp/

• https://www.linkedin.com/pulse/data-hubs-marklogic-vs-hadoop-kurt-cagle

• https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin

• http://www.datanami.com/2015/10/05/how-uber-uses-spark-and-hadoop

• https://blog.twitter.com/2015/handling-five-billion-sessions-a-day-in-real-time

• http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflix.html