MongoDB World London
6th November 2015
Robert Hill – Head of Big Data for Financial Services

Single View with MongoDB


Copyright © Capgemini 2014. All Rights Reserved

Single View with MongoDB | November 2014

Single View – No, It’s Not Tinder!

Single View is the formation of a unified view of an “entity” from a mix of source systems

These entities can be customers, employees, partners, suppliers, etc.

In reality, customers make up the vast majority of use cases, so this is commonly called Single View of Customer, or SVC

[Figure: Canonical Single View Architecture. Annotation on the matching stage: fuzzy matches customer records, generates link IDs, etc.]
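The matching stage named in the architecture can be illustrated with a minimal sketch in plain Python. It is not the method used in the deck: the field names (`name`, `postcode`), the 0.85 threshold, and the `LINK-` prefix are all assumptions made for illustration, and `difflib` stands in for whatever fuzzy-matching engine a real SVC tool would use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Average string similarity over two hypothetical fields."""
    fields = ("name", "postcode")
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def assign_link_ids(records, threshold=0.85):
    """Give records that fuzzy-match each other the same link ID.

    A small union-find keeps matches transitive: if A matches B and
    B matches C, all three share one link ID.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if similarity(a, b) >= threshold:
            parent[find(b["id"])] = find(a["id"])

    return {r["id"]: f"LINK-{find(r['id'])}" for r in records}


# Illustrative records from three imaginary source systems
records = [
    {"id": "crm-1", "name": "John Smith", "postcode": "SW1A 1AA"},
    {"id": "loans-7", "name": "Jon Smith", "postcode": "SW1A 1AA"},
    {"id": "web-3", "name": "Jane Doe", "postcode": "M1 2AB"},
]
links = assign_link_ids(records)
```

The source records are left untouched; only a mapping from record ID to link ID is produced, which is what lets downstream systems treat linked records as one entity.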

Why Care About a Single View?

Let’s say we end up with 100 “John Smiths” in our Data Warehouse:

• How many are different John Smiths in person?

• How many are simply different systems representing the same John Smith?

• How many are a single system representing the same John Smith multiple times?

• How many are a “John Smith” that has contacted us multiple times through differing channels, branches, or brands, in differing contexts – i.e., corporate CFO John Smith of XYZ Corp. is also citizen John Smith, who has a mortgage, auto loan, and a checking account?

Any customer-centric activity becomes very difficult when we actually cannot tell with certainty who a “customer” is…that includes Risk modelling, Fraud detection, and of course Customer Analytics for marketing and sales.

Taking the example of our CFO above, a bank would be hesitant to turn him down for another car loan given he might have his company invest 20 million with the bank’s business division, wouldn’t they?

Bad SVC is Bad, Bad Business

Lack of customer knowledge has a high potential cost – poor understanding of customer view data has been known to have huge business impacts.

For example, a customer data flaw in a demand forecasting system cost a major US airline $50 million in one year of operation, on top of the $40 million development cost to implement it.

When a global stationery retailer examined its views of a customer, it found that every real customer had roughly 2.5 “virtual” customer records across 75 source data systems. In short, the retailer had no way of really understanding the value of any given customer, or even a customer segment, via its sales and marketing data, and the resulting cost was estimated as a net loss of 9% per customer transaction.

Big Data is Making SVC Harder Than Ever

The growth in Data Lakes (or Data Hubs, depending…) means that companies store more information about entities than they have ever had access to before

More data is not the same as more information – what good is knowing everything about “John Smith” when you have records for 30,000 “John Smiths” stored from various sources in your Data Lake? Notice that we may have an order of magnitude more “John Smiths” than we had prior to Big Data

Big Data also means richer data: SVC processing must now detect duplicate customers from more varied data streams, such as web, images, voice, geospatial, etc. The matching algorithms become much harder, slower to develop, and more costly

The Data Hub exists so that new extracts/titrations of it can change to meet business needs, which places greater demands on the SVC solution to adapt to new data formats

Single View Affects All Big 4 Big Data Use Cases

Data Rationalisation

The Data Lake / Data Hub architecture enables companies to retain all data in original source formats

These source formats are rife with duplicated entity objects, and any use of the Data Lake in its native form for analytics or modelling may produce indeterminate and inaccurate results

The move from Extract/Transform/Load to Extract/Load/Transform has pushed this further down-stream

Fraud

Big Data is enabling longer retention of data, and richer sources of data including voice and image

Fraud is moving towards real-time detection and decisioning, where performance is important

But Big Data expands the difficulty in finding a “true” customer record to model against, and can exacerbate the performance issues of real-time or near-real-time fraud models

Risk

Similarly to Fraud, Big Data is enabling Risk models to have access to more and richer customer data, including social media, detailed web interactions, voice, and image data

This leads to more customer interactions in the data, and potentially better training data sets for better risk modelling – if a single customer can be identified to input into the models! The confusion matrix of the models is now highly dependent upon Single View.

Customer Analytics

As above, Customer Analytics and the CRM actions it enables (NBA, NBO, real-time targeting, etc.) are all potential beneficiaries of Big Data.

With CA, the risk of mis-identifying a customer is even greater, as the messaging directly to the customer may be obviously wrong. More subtly, constantly suggesting “customers also liked…” and being entirely wrong routinely suggests to customers that the company really doesn’t know or care about them.

Single View Challenges – It Can’t Be Rocket Science?

Speed of Comparison – SVC algorithms usually require retrieval of vast amounts of entity data for comparison and duplicate detection. Historically, this has made them poor candidates for executing from RDBMSs, and flat files in the landing area of DWs are common

Flexibility – as source systems change, the data design of an entity data object is the product of all the changes to the source systems that underlie it. In an RDBMS, this can have a large impact on the stability of the Customer table and associated reference data

Speed of Access – Real-Time decisioning requires very fast access to the underlying data, usually precluding joins in high-load environments

Reliability – as SVC stores begin to underlie more real-time processes, it becomes imperative that they have high availability and fail-over

Representation Flexibility – SVC processing can either combine or link entity data objects, depending upon the use cases being considered
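To put a rough number on the Speed of Comparison challenge, the sketch below counts the pairwise comparisons needed for the 30,000 “John Smith” records mentioned earlier. The second function shows blocking, a common way to cut that number by only comparing records that share a key; blocking is not discussed in the slides, and the `area` field and its 100 values are assumptions chosen to make the arithmetic easy to follow.

```python
from collections import defaultdict
from math import comb


def naive_pairs(n):
    """Comparisons needed if every record is compared with every other."""
    return comb(n, 2)


def blocked_pairs(records, block_key):
    """Comparisons needed if only records sharing a block key are compared."""
    block_sizes = defaultdict(int)
    for record in records:
        block_sizes[block_key(record)] += 1
    return sum(comb(size, 2) for size in block_sizes.values())


# 30,000 records spread evenly over 100 hypothetical postcode areas
records = [{"name": "John Smith", "area": i % 100} for i in range(30_000)]

all_pairs = naive_pairs(len(records))                             # 449,985,000
pairs_after_blocking = blocked_pairs(records, lambda r: r["area"])  # 4,485,000
```

Even with blocking, millions of record pairs still have to be fetched and scored, which is why the throughput of the underlying data store matters so much.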

How the Canonical Model Stacks Up…And Why It Is Falling Over

Speed of Comparison – RDBMS provides limited throughput for comparison processing

Flexibility – RDBMS has limited flexibility, and can require substantial redevelopment as source systems change

Speed of Access – RDBMS usually requires joins, limiting speed of access

Reliability – RDBMS may support clustering, but usually with extra software costs, e.g., Oracle RAC

Representation Flexibility – RDBMS usually requires joins or combining physical records destructively

Enter…Mongo! er, MongoDB

MongoDB is an exciting and powerful platform for implementing enterprise-class Single View solutions

The design of MongoDB enables implementations that avoid the pitfalls of traditional RDBMS-based Single View architectures, with a lower cost of implementation

Due to the on-going flexibility of MongoDB to handle source systems changes and mixed data types, it is very likely that the overall Total Cost of Ownership of MongoDB solutions will be lower for the entire solution lifecycle

MongoDB – a new Big Data SVC architecture

We envision that MongoDB will usually sit on top of a Data Lake (or ODS)

ETL has therefore been replaced with EL

Single View processing may (if possible) be moved into MongoDB, using MapReduce

Let’s look in detail…

New MongoDB Single View Architecture

MongoDB provides Fast Speed of Access…

MongoDB Avoids Joins

Innate to MongoDB is a database architecture that strives to minimise joins, which is a design philosophy for most Real-Time Decisioning databases

Embedded documents provide a way to de-normalise repeated source data with no performance hit (subject to growth of the object in size)
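As a rough illustration of embedding, the sketch below folds rows from two normalised tables into one customer document, so that a later read needs no join. It is plain Python operating on dictionaries rather than a MongoDB driver call, and the table and field names (`customers`, `accounts`, `balance`) are hypothetical.

```python
def embed_accounts(customers, accounts):
    """Fold account rows into their owning customer as an embedded list."""
    by_customer = {}
    for account in accounts:
        # Drop the foreign key: the parent document already carries the ID
        item = {k: v for k, v in account.items() if k != "customer_id"}
        by_customer.setdefault(account["customer_id"], []).append(item)

    return [
        {
            "_id": c["customer_id"],
            "name": c["name"],
            "accounts": by_customer.get(c["customer_id"], []),
        }
        for c in customers
    ]


# Illustrative normalised source rows
customers = [{"customer_id": 1, "name": "John Smith"}]
accounts = [
    {"customer_id": 1, "type": "mortgage", "balance": 250000},
    {"customer_id": 1, "type": "checking", "balance": 1200},
]

docs = embed_accounts(customers, accounts)
```

Each resulting document is the shape one might store in a customer collection; the caveat from the slide still applies, since the embedded list grows with every new account.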

Flexible Indexing

MongoDB provides flexible and powerful indexing features, that allow the system to access specific data objects rapidly. As most Single View uses have very specific and known access patterns, they are easily indexed

Where possible, Covered Queries allow MongoDB to return Indexed results from the in-memory indexes themselves, saving any disk access
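The condition for a Covered Query can be sketched as a simple check: every field the query filters on and every field it returns must be part of the index. The helper below is a simplified model written for this note, not a MongoDB API; it only handles top-level filter fields and inclusion-style projections, and the field names are hypothetical.

```python
def is_covered(index_fields, query_filter, projection):
    """Return True if the index alone could answer the query (simplified)."""
    included = {f for f, inc in projection.items() if inc and f != "_id"}
    if not included:
        # Without an inclusion list the whole document is returned,
        # so the documents themselves must be read
        return False

    returned = set(included)
    # _id is returned by default unless the projection excludes it
    if projection.get("_id", 1):
        returned.add("_id")

    needed = set(query_filter) | returned
    return needed <= set(index_fields)


# A hypothetical compound index on a customer collection
index = ["last_name", "postcode", "link_id"]

covered = is_covered(
    index,
    {"last_name": "Smith", "postcode": "SW1A 1AA"},
    {"_id": 0, "link_id": 1},
)
# Not covered: _id is returned by default and is not in the index
not_covered = is_covered(index, {"last_name": "Smith"}, {"link_id": 1})
```

Because Single View lookups tend to follow a handful of known patterns, it is usually feasible to design indexes so that the hot queries pass a check like this one.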

Horizontal Scalability

MongoDB is horizontally scalable through the use of sharding technology. Shards allow MongoDB instances to be added to achieve the desired levels of concurrent performance to large numbers of queries
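The idea behind sharding can be sketched as routing each document to one of several shards based on a hash of its shard key. This is a deliberate simplification: a real MongoDB cluster assigns ranges of key values (chunks) to shards and routes queries through its query routers, rather than taking a modulo as done here. The key format `cust-N` is an assumption.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard index from a stable hash of the shard key value."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Route 1,000 hypothetical customer IDs across 4 shards
placement = {}
for i in range(1000):
    key = f"cust-{i}"
    placement.setdefault(shard_for(key, 4), []).append(key)
```

The same key always lands on the same shard, so a lookup by customer ID touches only one shard, while the overall load spreads across all of them.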

Key to enabling the use of Single View data is the ability to access it quickly to perform Real-Time Decisioning.

MongoDB Provides Rapid Speed of Comparison…

MongoDB Integrates MapReduce

By embedding Map/Reduce processing, MongoDB provides a better way to run large dataset Single View processes

As MapReduce operates directly against the MongoDB database, data import and export are eliminated

MapReduce Allows Intelligent SV Algorithms

MapReduce can implement very powerful algorithms in JavaScript expressions

One of the primary uses of MapReduce is to find similar objects and tag them or collect them
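The "find similar objects and collect them" pattern can be sketched as a map step that emits a match key per record and a reduce step that gathers the record IDs per key. MongoDB's own MapReduce takes JavaScript functions; the version below is a plain Python emulation of the same shape, and the match key (normalised surname plus postcode) and field names are assumptions for illustration.

```python
from collections import defaultdict


def map_record(record):
    """Emit (match_key, record_id) pairs for one record."""
    key = (
        record["surname"].strip().lower(),
        record["postcode"].replace(" ", "").upper(),
    )
    yield key, record["id"]


def reduce_ids(key, ids):
    """Collect all record IDs that share a match key."""
    return sorted(ids)


def map_reduce(records, mapper, reducer):
    """Minimal in-memory map/reduce driver."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


records = [
    {"id": "crm-1", "surname": "Smith", "postcode": "SW1A 1AA"},
    {"id": "loans-7", "surname": " smith", "postcode": "sw1a1aa"},
    {"id": "web-3", "surname": "Doe", "postcode": "M1 2AB"},
]

grouped = map_reduce(records, map_record, reduce_ids)
# Keys with more than one record are duplicate candidates
candidates = {k: ids for k, ids in grouped.items() if len(ids) > 1}
```

The candidate groups could then be tagged with a link ID or passed to a more expensive fuzzy comparison.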

Data Access Speed

The same technologies that enable Rapid Speed of Access also enable the rapid execution of SVC:

• Indexing allows rapid data access if needed, including Covered Queries when possible

• Sharding again allows the MongoDB cluster to scale appropriately to handle large data volumes and loads, without the need for costly technologies such as Oracle RAC

Many Single View processing algorithms are slow and inefficient if they use a database, or rely upon difficult to manage flat files as data input and output

MongoDB Provides Representation Flexibility…

MongoDB can provide Referenced documents

Despite aiming to eliminate joins, MongoDB can flexibly support more normalized and linked records, using Referenced documents

This allows suspected duplicate customer documents to be linked to a real or generated customer master document, and not be overwritten.

Such an approach remains auditable and reportable at any time
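A minimal sketch of the linking approach, in plain Python dictionaries: suspected duplicates gain a reference to a master document, and the master lists its sources, while the original records are never overwritten. The field names `master_ref` and `source_refs` are hypothetical.

```python
import copy


def link_to_master(source_docs, master_id):
    """Link suspected duplicates to a master document by reference.

    Returns the master document and linked copies of the sources;
    the input documents are left unmodified.
    """
    linked = []
    for doc in source_docs:
        updated = copy.deepcopy(doc)
        updated["master_ref"] = master_id  # reference, not a merge
        linked.append(updated)

    master = {
        "_id": master_id,
        "source_refs": [doc["_id"] for doc in source_docs],
    }
    return master, linked


sources = [
    {"_id": "crm-1", "name": "John Smith", "phone": "020 7946 0000"},
    {"_id": "loans-7", "name": "Jon Smith", "phone": None},
]
master, linked = link_to_master(sources, "master-001")
```

Because every source value survives, a wrong link can be undone by removing the reference, and an auditor can see exactly which records were judged to be the same customer.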

Batch or Real-Time

Due to the power of MapReduce integrated into the MongoDB platform, various use cases may be catered for

The traditional, batch-oriented approach may be implemented and match keys written back to the MongoDB database

For certain cases, it may be desirable to not perform SVC comparisons and linking until query time, which allows fully flexible linking

A key design issue for many SVC implementations is how strongly to link or combine suspected duplicates. For applications such as maintaining a bank’s central records, it is usually not advised to eliminate suspected duplicates unless the algorithm is nearly 100% certain, or it is verified by human inspection. However, for a database merely running marketing operations, there is a much lower cost of combining suspected duplicates, even if they are false matches. MongoDB can easily cater for both
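The trade-off described above can be expressed as a small policy function: the same match score leads to a merge in a marketing database but only to a link for human review in core banking records. The threshold values and use-case names are assumptions chosen for illustration, not recommendations.

```python
# Hypothetical merge thresholds per use case
MERGE_THRESHOLDS = {
    "core_banking": 0.99,  # merge only when nearly certain
    "marketing": 0.80,     # false merges are cheap here
}
REVIEW_THRESHOLD = 0.60    # below this, treat records as unrelated


def resolve(score, use_case):
    """Decide what to do with a pair of suspected duplicates."""
    if score >= MERGE_THRESHOLDS[use_case]:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "link_for_review"
    return "keep_separate"


marketing_action = resolve(0.90, "marketing")     # "merge"
banking_action = resolve(0.90, "core_banking")    # "link_for_review"
```

Keeping the policy separate from the matching algorithm means one set of match scores can serve both kinds of application.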

MongoDB Provides Data Flexibility…

Flexible Document Formatting

By retaining source system data longer, Data Lakes increase the variability of source record formats. In a traditional RDBMS Data Warehouse, these changes are costly to implement and track

MongoDB’s flexible JSON/BSON document structure accepts variant record formats easily with no conversion hassles of existing records, no query re-writes, etc.
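To show what accepting variant record formats means in practice, the sketch below keeps two differently shaped customer documents side by side and reads them with one tolerant accessor. The list stands in for a collection, and both document shapes are hypothetical examples of an old and a new source format.

```python
def display_name(doc):
    """Read a customer name from either of two source formats."""
    if "name" in doc:  # older format: single name field
        return doc["name"]
    first = doc.get("first_name", "")
    last = doc.get("last_name", "")
    return f"{first} {last}".strip()


# Two record formats stored together, with no conversion of the older one
customer_docs = [
    {"_id": 1, "name": "John Smith"},
    {"_id": 2, "first_name": "Jane", "last_name": "Doe", "twitter": "@jdoe"},
]

names = [display_name(doc) for doc in customer_docs]
```

New fields such as `twitter` simply appear on the documents that have them; existing documents need no migration.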

Non-Structured Data Sources

MongoDB limits BSON documents to 16 MB, but can easily incorporate larger non-structured source data using GridFS.

GridFS stores very large image, video, audio and other non-structured data sources as chunks, each in their own document with metadata
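The chunking scheme can be sketched in plain Python: one metadata document describes the file, and the content is split into numbered chunk documents that point back to it. This emulates the layout rather than calling a GridFS driver; the 255 KiB chunk size is, to the best of my knowledge, the GridFS default, and the file ID and filename are made up.

```python
CHUNK_SIZE = 255 * 1024  # assumed to match the GridFS default chunk size


def to_gridfs_like(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Split bytes into a files document plus ordered chunk documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


# A 600 KiB stand-in for a scanned document or voice recording
payload = b"x" * (600 * 1024)
files_doc, chunks = to_gridfs_like("file-1", "call-recording.wav", payload)

# Reassemble by concatenating chunks in order of n
restored = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Since each chunk is an ordinary document, the file content can sit next to the customer data and be sharded and replicated in the same way.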

The ability to store non-structured content within MongoDB with Customer (or Entity) data often avoids the need for a separate Content Management System

Data Scalability

The power of Sharding does more than just allow improved speed – it allows MongoDB to accommodate data sources that simply grow and grow in size

New MongoDB technologies are expected shortly to further push data scalability within each Shard while constraining costs of growth

Big Data systems are incorporating more data sources, longer retention periods, and a great deal of non-structured data. MongoDB provides the flexibility to accommodate all of these and lower TCO

MongoDB Provides Reliability…

Multiple Redundancies

A deployed MongoDB instance has redundancy built into the Query Routing nodes, the data-bearing Shards, and the 3 Config Servers.

Within each Shard, data is apportioned between primary and backup data sets, with the backups often sited off-site for security and redundancy

This configuration also has inherent load balancing, allowing degraded responses from one unit to be balanced dynamically

As Single View becomes closely tied to customer-facing CRM and Real-Time Decisioning systems, it is imperative that their source of truth does not fail, particularly when used by on-line 24x7 customer channels.

Use Case: Internal Single View of Employee

Business Case

Capgemini needs to ensure it provides a flexible and adaptable HR function for its employees. The following requirements need to be improved and met by this function:

• Availability of real-time, accurate, and useful data (consolidated where possible)

• Single Employee View – masked data where needed

• Dashboarding and the ability to extract and manipulate data

• Improve data quality

• Reduce current support problems

Problem

• Capgemini has a variety of employee-related databases – the Oracle HR system, Leave Management System, Clarity time accounting, etc. Some key data is kept on spreadsheets and data comes from various sources

• HR must produce both ad-hoc and periodic reports to managers and employees, as well as use the data internally

• Most data is updated monthly, so every reporting cycle has to adjust the previous month’s summary reports as corrections are applied. This affects accuracy and quality

• HR, Recruiting, and even Managers and Employees require a comprehensive view of HR-related data, with appropriate data security and visibility rules strictly enforced.

Objectives / Scope

• Construct a Single View of Employee data, comprising HR, LMS, Clarity, Salary Reference Data, Bench and Roll-off data, using MongoDB

• Provide users with Tableau, Qlikview, or a similar reporting tool

• Build a template for SVC-type MongoDB projects
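The first objective can be pictured with a minimal sketch that folds rows from three source systems into one document per employee. It is plain Python, not the project's implementation, and every field name (`employee_id`, `grade`, `days`, `hours`, and so on) is a hypothetical stand-in for the real HR, LMS, and Clarity schemas, which the slides do not describe.

```python
def build_employee_view(hr_rows, leave_rows, time_rows):
    """Combine rows from three source systems into one document per employee."""
    view = {}

    def doc_for(employee_id):
        # Create an empty skeleton the first time an employee is seen
        return view.setdefault(
            employee_id,
            {"_id": employee_id, "hr": None, "leave": [], "timesheets": []},
        )

    for row in hr_rows:
        doc_for(row["employee_id"])["hr"] = {
            "name": row["name"],
            "grade": row["grade"],
        }
    for row in leave_rows:
        doc_for(row["employee_id"])["leave"].append(
            {"from": row["from"], "days": row["days"]}
        )
    for row in time_rows:
        doc_for(row["employee_id"])["timesheets"].append(
            {"week": row["week"], "hours": row["hours"]}
        )
    return list(view.values())


# Illustrative extracts from the three systems
hr_rows = [{"employee_id": "E100", "name": "A. Example", "grade": "C2"}]
leave_rows = [{"employee_id": "E100", "from": "2015-08-03", "days": 5}]
time_rows = [
    {"employee_id": "E100", "week": "2015-W40", "hours": 37.5},
    {"employee_id": "E200", "week": "2015-W40", "hours": 40.0},
]

employees = build_employee_view(hr_rows, leave_rows, time_rows)
```

An employee who appears in a feed but not in the HR extract (`E200` here) still gets a document with `hr` set to `None`, which makes data-quality gaps visible instead of silently dropping rows.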

Summary and Questions

MongoDB is an excellent platform for building Single View architectures and solutions

It solves a great many problems associated with existing RDBMS-based SVC solutions, especially in the areas of:

• Speed of Access

• Speed of Comparison

• Representation Flexibility

• Data Flexibility

• Reliability

As a result of these features, MongoDB provides a demonstrably lower Total Cost of Ownership for an SVC solution than previous-generation SVC solutions, but at the cost of a learning curve to master MongoDB’s intricacies and associated domain knowledge.

Contact information

Robert Hill
Head of Big Data for Financial Services
[email protected]

Capgemini London


The information contained in this presentation is proprietary. © 2014 Capgemini. All rights reserved.


About Capgemini

With almost 140,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2013 global revenues of EUR 10.1 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.

About MongoDB

MongoDB is the next-generation database that helps businesses transform their industries by harnessing the power of data. The world’s most sophisticated organizations, from cutting-edge startups to the largest companies, use MongoDB to create applications never before possible at a fraction of the cost of legacy databases. MongoDB is the fastest-growing database ecosystem, with over 8 million downloads, thousands of customers, and over 650 technology and service partners.

www.capgemini.com www.mongodb.com