BETTER WITH BITEMPORAL
MARKLOGIC WHITE PAPER • JUNE 2015

In our age of billion-dollar regulatory fines and time-consuming, costly litigation, a database must hold up as the main system of record. Unfortunately, traditional databases do not keep a complete history of the past. Only with a bitemporal database can you truly maintain a complete and accurate picture of the past, understanding exactly “what you knew” and “when you knew it.”


ASSESSMENT: DO YOU NEED BITEMPORAL? Before you go any further, it is probably helpful to first ask whether you might need bitemporal data management in your organization. If you answer “yes” to any of the following questions, then bitemporal is a solution that you should consider.

YES NO

1. Is tracking when events or transactions occur critical to your business? ✔

2. Are there ever cases when historical data needs to be updated? ✔

3. Do you run into circumstances in which there is a lag between when something happened in the real world, and when it was recorded in the database?

4. Do you get frequent requests from regulators to review historical data? ✔

5. Do you work in an industry in which the sequence of when you learn about certain information is significant, such as in law and intelligence?

6. Is the cost and complexity of storing and accessing historical data in your organization overwhelming?

7. Does managing and accessing historical data cost significant developer resources, or carry increasing risk over time?


CONTENTS

Introduction
  • The Cost of Not Having Bitemporal

Three Types of Temporality
  • Non-temporal
  • Unitemporal
  • Bitemporal

The Benefits of Bitemporal
  • Things You Can Do With Bitemporal
  • The Increasing Need for Bitemporal
  • Bitemporal Across Industries

Why Bitemporal Has Been Difficult

Why the Time for Bitemporal is Now
  • Key Features of Bitemporal in MarkLogic
  • Get Going Quickly
  • More Information


INTRODUCTION

Today, databases are the primary system of record, not paper. In this new reality, organizations are required to keep an accurate picture of all the facts, as they occur. For certain industries such as financial services, insurance, and healthcare, there are even laws that mandate how historical data is tracked and managed.

Unfortunately, traditional databases cannot provide a truly accurate picture of your business at different points-in-time. The reason is that traditional databases are unitemporal, and can only track start and end times along a single timeline. But, what if there is a lag between when something happened and when you found out about it? Which time should you record? Or, what if you realize you need to make a correction to when something happened, but do not want to overwrite any historical data? In those cases, a single timeline is not enough.

With a bitemporal database, you can store and query data along two timelines with timestamps for both valid time—when a fact occurred in the real world (“what you knew”), and also system time—when that fact was recorded to the database (“when you knew it”). By tracking events along two timelines with a bitemporal database, it is possible to keep a complete and accurate picture of your business at any given time for internal search and discovery purposes or for when regulators conduct audits.

Consider some of the new questions that a bitemporal database allows you to ask:

• What were my customer’s credit ratings last year as I knew them last quarter?

• What was our position with that security before the trade was amended?

• What did our intelligence indicate before we learned that new piece of information?

With a traditional unitemporal database, you can ask what your customer’s credit ratings looked like as you knew them today, but not yesterday or last quarter.
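The first question above can be made concrete with a small sketch. It is written in plain Python rather than any MarkLogic API, and the `Version` record, the `as_of` helper, the customer name, and all dates are hypothetical illustrations. Each version of a fact carries a valid period and a system period, and a query filters on both.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sentinel meaning "still current / no end recorded yet".
FOREVER = date.max


@dataclass(frozen=True)
class Version:
    """One immutable version of a fact, stamped on two timelines."""
    customer: str
    rating: str
    valid_start: date    # when the rating took effect in the real world
    valid_end: date      # exclusive end of validity
    system_start: date   # when this version was recorded in the database
    system_end: date     # when it was superseded (FOREVER if still current)


def as_of(versions, valid_at, known_at):
    """Return versions true at `valid_at`, as recorded at `known_at`."""
    return [
        v for v in versions
        if v.valid_start <= valid_at < v.valid_end
        and v.system_start <= known_at < v.system_end
    ]


# Illustrative data: the 2014 rating was first recorded as "A", then
# corrected to "BBB" on 2015-05-10. The old version is closed, not erased.
history = [
    Version("Acme", "A", date(2014, 1, 1), date(2015, 1, 1),
            date(2014, 1, 5), date(2015, 5, 10)),
    Version("Acme", "BBB", date(2014, 1, 1), date(2015, 1, 1),
            date(2015, 5, 10), FOREVER),
]

# "What was the rating last year, as I knew it last quarter?"
last_quarter_view = as_of(history, date(2014, 6, 30), date(2015, 3, 31))
# The same valid-time question, answered with today's knowledge.
current_view = as_of(history, date(2014, 6, 30), date(2015, 6, 30))
```

With this sample data, the view as of last quarter returns the original "A" rating, while the current view returns the corrected "BBB" rating. A store with only one timeline would have to pick one of the two answers.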

Only a bitemporal database allows you to go back and see an accurate and unaltered picture of historical data, including past and present changes. A bitemporal database is necessary for today’s enterprises to be able to accurately explore historical data, manage that data across systems, ensure full data integrity, and do more complex analysis.

MarkLogic® is an Enterprise NoSQL database that is best suited for storing and managing bitemporal data for the following reasons:

• Flexible Data Model – MarkLogic’s document-oriented data model is schema-agnostic and able to manage the complexities of bitemporal data that relational databases are ill-suited for, such as integrity constraints, evolving schemas, and multiple different data models.

• Enterprise Reliability – MarkLogic has the enterprise features that other new generation databases do not. MarkLogic is a proven database that runs mission-critical applications at hundreds of world-leading organizations.

• Bitemporal Out-of-the-Box – Bitemporal is a feature built-in to MarkLogic whereas other vendors make it an additional software add-on that increases cost and complexity.1

1 Hudson Foods recalled one-fifth of their annual output in 1997 due to an outbreak of E. coli, costing them an estimated $25 million. Their database only allowed them to see a current view of which beef came from which sources, and not a view of their data as it existed on the day the supplier processed the small batch of contaminated meat. This meant the entire product had to be recalled. For more information, see Richard T. Snodgrass’ book, Developing Time-Oriented Database Applications in SQL (ch. 2, 11).

THE COST OF NOT HAVING BITEMPORAL

One company’s $25 million loss has been attributed directly to not having bitemporal.1 It has cost (or perhaps saved) many politicians their jobs. In our age of super-regulation and the need to maintain provenance, immutability, and governance with historical data, the potential cost of not using bitemporal grows much larger. This is particularly true in industries such as financial services, where not having an accurate picture of the past has contributed to multi-billion dollar fines and further increases in regulation.


THREE TYPES OF TEMPORALITY

To understand bitemporal, you first have to understand how databases currently manage time. In relation to time, there are three basic categories of databases: non-temporal, unitemporal, and bitemporal. Each type is discussed below, using the example of when a patient was diagnosed with an allergy and when the doctor found out about it as a guide.

NON-TEMPORAL

Non-temporal databases store data with no time dimension. A fact is just a fact: there is no history, and it is only understood to be true at the current point in time. Data models that do not support a time dimension are just called snapshots.

Just imagine the example of when a patient was diagnosed with an allergy, which is an important piece of information considering the potential adverse and even deadly reactions that some patients can have to common medications like penicillin. With a non-temporal database, you would just see the current state, which would be either “patient has no allergy” or “patient has a positive allergy diagnosis,” as depicted in Figure 1 in which the shaded area represents when the fact is true.

In a non-temporal database, you just get a single view of the data without respect to time. It should not be surprising that non-temporal databases are very uncommon, as most applications deal with time-varying data.

UNITEMPORAL

Unitemporal databases support time across one dimension: valid time. Most people think of valid time as simply “time”: it represents when something happened in the real world. Valid time is tracked along a single timeline to answer questions such as: When did that patient get diagnosed with an allergy? How many patients have that same allergy? How long has the patient had the allergy? In the example of the patient with the allergy, it is clear from the graph in Figure 2 that the patient was diagnosed with an allergy at 9:00am along the valid timeline.

The problem is that valid time only shows a piece of the picture. Looking at the figure above, it would not be clear to an outside observer when the doctor learned that the patient was diagnosed with the allergy. What if it was the lab that first discovered the allergy, but there was a lag in time before the doctor actually found out about the lab results? That is valuable information that is not recorded in a unitemporal database. In this example, imagine if a drug was administered to the patient that day that caused an anaphylactic reaction—didn’t the doctor know not to administer that drug? Let’s look at how to solve this problem with a bitemporal database.

BITEMPORAL

A bitemporal database records timestamps for events along two dimensions of time: valid time and system time. Valid time tracks when an event occurred in the real world. System time (sometimes called “transaction time”) tracks when the event is recorded to the database. These two time dimensions are depicted graphically along both axes in Figure 3. In this example, valid time represents when the lab discovered the allergy, and system time represents when the doctor found out about it and recorded it to his chart.

Unitemporal databases make the false assumption that valid time is always equal to system time, and in doing so lose valuable information. Sometimes, as Figure 3 depicts, valid time is equal to system time. But you would not know that unless you had a bitemporal database. A bitemporal database records time along both dimensions independently so you can keep accurate records.

FIGURE 1: A nontemporal database does not store any time dimensions. (Figure shows a single state: POSITIVE ALLERGY DIAGNOSIS.)

FIGURE 2: A unitemporal database only tracks valid time. (Figure shows NO ALLERGY DIAGNOSIS and POSITIVE ALLERGY DIAGNOSIS along a single TIME axis.)

Using the example of the patient with the allergy, imagine that the doctor actually found out about the allergy at 10:30am, an hour and a half after the lab did their tests and concluded that the patient had an allergy. The lab noted that the patient had an allergy at 9:00am, but that information did not get to the doctor until 10:30am. This represents a lag between valid time and system time, and would look like Figure 4.

Taking this example a bit further, imagine that later on the same day, at 11:30am, the doctor gets a call from the lab saying that they just discovered that they did the tests incorrectly. The lab result was actually negative—the patient does not have an allergy. This correction is shown in Figure 5. With a bitemporal database, it is easy to make corrections to historical data, and the process does not overwrite any data.

By looking at Figure 5, we can ascertain the following facts:

• Before 10:30am (system time), the doctor did not know about the allergy

• At 10:30am (system time), the doctor recorded the patient having an allergy, which had been discovered by the lab at 9:00am (valid time)

• At 11:30am (system time), the lab and doctor discover the mistake and update the records to show that the patient does not have an allergy

With this timeline tracked across both axes, it is now possible to go back and see a true picture of events. This can be extremely helpful in understanding and avoiding mistakes, as the doctor’s decisions can be easily married to what he knew or did not know at any given point in time. In the setting of a hospital, drug allergies can be life threatening, so having an accurate record of when a patient was diagnosed and when care providers learn this information is critical.
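The allergy timeline above can be replayed in a few lines of code. This is a minimal sketch in plain Python, not MarkLogic code. The `Chart` class, the `t` helper, and the example day are hypothetical, and the model is simplified to a single fact per chart. A correction closes the system period of the old version instead of deleting it.

```python
from datetime import datetime

FOREVER = datetime.max  # hypothetical "open-ended" marker


def t(hour, minute=0):
    """Build a timestamp on an arbitrary example day."""
    return datetime(2015, 6, 1, hour, minute)


class Chart:
    """Append-only store: corrections close old versions, never delete them."""

    def __init__(self):
        self.versions = []

    def record(self, diagnosis, valid_from, recorded_at):
        # Close the system period of the version being superseded.
        for v in self.versions:
            if v["system_end"] == FOREVER:
                v["system_end"] = recorded_at
        self.versions.append({
            "diagnosis": diagnosis,
            "valid_start": valid_from,
            "valid_end": FOREVER,
            "system_start": recorded_at,
            "system_end": FOREVER,
        })

    def known(self, valid_at, known_at):
        """What did the chart say about `valid_at`, as of `known_at`?"""
        for v in self.versions:
            if (v["valid_start"] <= valid_at < v["valid_end"]
                    and v["system_start"] <= known_at < v["system_end"]):
                return v["diagnosis"]
        return None  # nothing had been recorded yet


chart = Chart()
# 10:30am: the doctor records the lab's 9:00am finding.
chart.record("POSITIVE ALLERGY", valid_from=t(9), recorded_at=t(10, 30))
# 11:30am: the lab reports its mistake and the record is corrected.
chart.record("NO ALLERGY", valid_from=t(9), recorded_at=t(11, 30))

before_doctor_knew = chart.known(valid_at=t(9, 30), known_at=t(10))
after_first_entry = chart.known(valid_at=t(9, 30), known_at=t(11))
after_correction = chart.known(valid_at=t(9, 30), known_at=t(12))
```

Asking about 9:30am valid time gives three different answers depending on system time: nothing at 10:00am, a positive diagnosis at 11:00am, and no allergy at noon. Both versions remain in `chart.versions`, so the superseded positive diagnosis is still available for review.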

FIGURE 3: A bitemporal database tracks both valid time and system time. (Axes: SYSTEM TIME, “When the doctor found out about it,” and VALID TIME, “When the lab discovered the allergy.” Regions: NO ALLERGY DIAGNOSIS and POSITIVE ALLERGY DIAGNOSIS.)

FIGURE 4: A bitemporal database tracks lags in information. (Same axes and regions as Figure 3, with the LAG between valid time and system time marked.)

FIGURE 5: A bitemporal database tracks corrections without overwriting data. (Same axes and regions as Figure 3, with the CORRECTION marked.)


The example of the allergy diagnosis may seem somewhat simple, but the same concept can be applied to any piece of data, whether it is when a financial trade occurred, when someone got insurance, or when someone owned a house. In all of these cases, START DATE and END DATE for both valid time and system time can be tracked in order to preserve the most accurate picture of reality.

TABLE 1: Comparing Unitemporal to Bitemporal for a variety of examples.

UNITEMPORAL: When did the lab results indicate that the patient had an allergy to penicillin?
BITEMPORAL: When did the lab results indicate that the patient had an allergy to penicillin, and when did the care provider learn about the allergy?

UNITEMPORAL: When was the sell order cancelled by the bank’s counter party?
BITEMPORAL: When was the sell order cancelled by the bank’s counter party, and when did the trader learn that it was cancelled?

UNITEMPORAL: What reference data existed regarding trade events on December 4th?
BITEMPORAL: What reference data did the trader actually have on December 4th?

UNITEMPORAL: When did John become eligible for insurance coverage, as the employment records indicate now?
BITEMPORAL: When did John become eligible for insurance coverage, as the employment records indicated in 2012?

THE BENEFITS OF BITEMPORAL

Bitemporal, simply put, gives you a better way to manage time. No alternative to bitemporal, even temporal versioning, can provide a seamless, queryable, flexible view of historical data. Bitemporal is a critical capability any organization can take advantage of, and the need for it is growing fastest in industries that face increasing regulatory pressure and litigation, such as financial services, insurance, and healthcare. In these industries, organizations must better account for all of their past actions with the onset of new laws and litigation, more frequent and in-depth audits, and increased fines for non-compliance. Organizations that better manage their historical data are able to reduce their risk and get through audits unscathed.

THINGS YOU CAN DO WITH BITEMPORAL

• Handle Regulation and Audits – Provide an accurate picture of the past to meet requirements for increased transparency and accountability

• Manage Risk – Create better risk models and improve business intelligence by analyzing true historical data

• Reduce Costs – Simplify architecture and reduce the cost and operational risk of storing redundant historical data

THE INCREASING NEED FOR BITEMPORAL

The need to better manage regulatory concerns is growing in general, though it is having a particularly significant impact in certain industries, such as financial services. Large banks have been hit with record-breaking fines in recent years, coupled with an increase in regulatory pressures. Since 2009, banks in the U.S. and Europe have paid over $128 billion to regulators, and 2014 was the biggest year ever, with $65 billion in penalties and fines, about 40% greater than in 2013.2

Today, regulators are more intrusive and carry out more vigorous enforcement as they drill into the details. According to Gerold Grasshoff, the global head of risk management and regulation at Boston Consulting Group, regulatory pressures are now a core issue for banks. “You have to change your operating model, change your products, change the legal risks now...Nothing is changing business models as much as the regulatory issues. That is the biggest strategic challenge.” To adapt to the changing way in which business is done, banks are having to change their IT and data management approaches to increase transparency.3

Other industries are also facing increased regulatory pressures. In healthcare, for example, there is the problem wrought by medical errors, the cost of which some reports estimate at $1 trillion.4 Knowing when and how errors occurred is critical to improving medical decision making and avoiding medical malpractice. And consider the growing cost of fraud and abuse across the healthcare industry, estimated to be anywhere between $82 billion and $272 billion in the U.S.5 Unfortunately, the general cost and complexity surrounding patient safety, malpractice litigation, and fraud and abuse is only increasing.6

2 James Sterngold, “For Banks, 2014 Was a Year of Big Penalties”, Dec. 30, 2014 <http://www.wsj.com/articles/no-more-regulatory-nice-guy-for-banks-1419957394>

3 Boston Consulting Group, “Building the Transparent Bank”, Dec. 2014 <https://www.bcgperspectives.com/Images/Building_the_Transparent_Bank_Dec_2014_tcm80-177814.pdf>

4 Andel, Davidow, Hollander, Moreno. “The economics of health care quality and medical errors.” Journal of Health Care Finance 39(1):39-50 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/23155743>

By implementing bitemporal data management, organizations can take a bold step towards lowering risk, improving transparency, and gaining a competitive advantage to outrun the competition.

BITEMPORAL ACROSS INDUSTRIES

FINANCIAL SERVICES

Bitemporal helps large banks better manage their data and adapt to the changes in laws and regulation that are impacting how business is done. For example, bitemporal helps by providing an accurate record of trades as they occur and are amended. After trades are made, they are later reconciled with counterparties and updates often occur before the trade is closed. With a unitemporal database, updates overwrite historical data, which can put enormous risk on individual traders and entire companies. Bitemporal provides an accurate picture of the entire lifecycle of a trade review, including when changes to counterparty names, transaction IDs, or price corrections occurred.

INSURANCE

In the insurance industry, bitemporal helps by providing a clear determination of coverage over the course of history, ensuring that even if there are retroactive changes, data is never overwritten.

5 Berwick, Hackbarth. “Eliminating waste in US health care.” JAMA 307(14):1513-6 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/22419800>

6 James Sterngold. “For Banks, 2014 Was a Year of Big Penalties.” Wall Street Journal, 2014.

TABLE 2: Bitemporal in Financial Services

BEFORE BITEMPORAL: What do we think the trader’s position was, and what information do we think was available to the trader around the time when the trade was executed?
AFTER BITEMPORAL: What was the trader’s exact position when the trade was executed, and what exact reference data was available at the time the trade was executed?

BEFORE BITEMPORAL: What were our customer’s credit ratings last year?
AFTER BITEMPORAL: What were our customer’s credit ratings last year, as we knew them last quarter?

BEFORE BITEMPORAL: What was our market exposure when that trade was made at 11:00am?
AFTER BITEMPORAL: What was our market exposure when that trade was made at 11:00am, as we knew it at 11:30am?

BEFORE BITEMPORAL: What was the company’s profit when we gave guidance?
AFTER BITEMPORAL: What did we think the company’s profit was when we gave guidance?

TABLE 3: Bitemporal in Insurance

BEFORE BITEMPORAL: What was the estimated impact of the disaster on insurance premiums?
AFTER BITEMPORAL: What was the estimated impact of the disaster on insurance premiums, before the data was adjusted retroactively?

BEFORE BITEMPORAL: Did the beneficiary have coverage at the point of diagnosis?
AFTER BITEMPORAL: Did the beneficiary have coverage at the point of diagnosis, before the legislation was enacted?

BEFORE BITEMPORAL: Was the employee with the company when the event occurred?
AFTER BITEMPORAL: Was the employee with the company when the event occurred, as indicated by your records at that time?

“We’re in an era of very, very vigorous enforcement, of heightened super regulation. It’s not a one-off thing.”

Benjamin Lawsky, Superintendent for Financial Services, New York State6


The insurance company can always go back and see a history of past coverage at any point in time in the past. An insurer may also want to know employee status, and may need an accurate picture of when an employee was actually with a company at any point in time, as they knew it at any point in time.

HEALTHCARE

Healthcare faces enormous challenges for all stakeholders, including providers, payers, and pharmaceutical and biotechnology companies. Bitemporal is one component of improvements in health IT that helps lower costs and improve outcomes by giving providers a more accurate picture of a patient’s history as varied teams direct the course of treatment, and an improved investigative tool when looking at adverse events. And, when payers receive billing codes for procedures, they are able to track the full history of each patient. Even if changes to insurance coverage were made retroactively, no part of the history is lost. There are also benefits to pharmaceutical and biotechnology companies as they are able to use bitemporal to enhance decision making in both research and business.

LAW AND INTELLIGENCE

Bitemporal helps paint a complete picture even when disparate facts are gathered piecemeal before and after certain events. With a more complete picture, government agencies have the ability to better understand motives and even better predict future events. During investigations, bitemporal enables law enforcement officers to go back and ask why they went down a certain path, which is particularly useful when investigations are resurrected from cold case files.

TABLE 4: Bitemporal in Healthcare

BEFORE BITEMPORAL: What did the patient’s chart look like when the medication was prescribed?
AFTER BITEMPORAL: What did the patient’s chart look like when the medication was prescribed, before the chart was updated with the lab results?

BEFORE BITEMPORAL: What was the coverage determination for that patient in June 2010?
AFTER BITEMPORAL: What was the coverage determination for that patient in June 2010, as we knew it in August 2010?

BEFORE BITEMPORAL: What did the clinical trial results indicate when you made the additional investment?
AFTER BITEMPORAL: What did the clinical trial results indicate when you made the additional investment, before the research results were updated?

TABLE 5: Bitemporal in Law and Intelligence

BEFORE BITEMPORAL: What was happening when we made the decision?
AFTER BITEMPORAL: What did we think was happening when we made the decision?

BEFORE BITEMPORAL: When did the event happen?
AFTER BITEMPORAL: When did the event happen, and when was that recorded?

BEFORE BITEMPORAL: Why do we currently think that we pursued that course of action?
AFTER BITEMPORAL: What were we thinking when we pursued that course of action?

“MarkLogic’s bitemporal offers the flexibility of correlating and delivering additional value of data (by providing intraday information, not just end-of-day information) to a diverse customer group—rapidly—that just hasn’t been fully realized before... In fact, MarkLogic’s bitemporal will provide an entirely new opportunity for our customers to perform additional analytics as well as enabling much richer capabilities in the area of compliance management.”

Paolo Pelizzoli, Global Head of Architecture, Global Technology Operations, Broadridge Financial Solutions


WHY BITEMPORAL HAS BEEN DIFFICULT

At this point you are likely asking, “If bitemporal is that important, why haven’t I heard about it?” Although there have been thousands of research papers written on the topic of temporal data in the past twenty years and the topic of bitemporal has been discussed by experts since the early 1990s, bitemporal is still relatively unknown.

Bitemporal clearly has incredible business value. Yet, most analysts on the business side do not even know they can ask for bitemporal data because it is so seldom put into production. The problem is that with relational databases the complexities of implementing and maintaining bitemporal generally outweigh the benefits. In fact, just handling ordinary temporal data in a relational database can be a huge challenge.

Unfortunately, despite efforts to make bitemporal data easier to manage in relational databases, bitemporal remains an unreachable goal with traditional tools. The number of experts in the world who can manage the complexities inherent in bitemporal implementations using relational databases is probably very small. Without going too deeply into the details of bitemporal data modelling, here are some of the key reasons why relational databases are ill-suited for bitemporal data management:

• Integrity Constraints – The relational data model comes with constraints such as referential integrity, entity integrity, and defined schemas that are not easily changed. Some constraints are specific to temporal data, such as child rows within a table only being able to include valid periods of time within the valid period of time defined by the parent row of the table. When bitemporal columns are added to a relational table, they can wreak havoc on the relational data model.

• Schema Evolution – There is incredible complexity when adding bitemporal to a relational model. Architectural and structural changes are temporal themselves, and when new columns are added with temporal dimensions or new tables are created as new data is ingested, the schema will change. Handling a changing schema and resulting changes in application code are complex projects already, even before trying to add bitemporal.

• Multiple Data Models – Handling schema evolution is a difficult challenge, but now imagine the task of handling multiple evolving schemas across multiple data models and data silos, and then aggregating them into a single source of truth. Data integration is an expensive task, but when bitemporal data is included, the complexity grows exponentially.

• Decline in Performance – Read and write performance typically dips because bitemporal queries must consider the additional axis of time in every query, and data usually spans multiple tables and in some cases even multiple servers. Attempts have been made to simplify queries and improve performance, but they have not gone far enough in eliminating the inherent complexity and performance issues caused by scattering bitemporal data across tables.

• Vendor Lock-in – Some vendors have begun to implement improvements in bitemporal. However, as happened in the past with implementing SQL standards, each vendor will implement them differently with their own syntax and then tack on an additional cost of the feature as an add-on.
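To give a sense of the temporal integrity constraint described in the first bullet above, here is a minimal sketch in plain Python. The function names and the policy and coverage data are hypothetical. It only checks that each child valid period lies within its parent's valid period; a relational implementation would need to enforce a rule like this on every insert and update, and on both timelines.

```python
from datetime import date


def contains(parent, child):
    """True if the child's valid period lies inside the parent's.

    Periods are (start, end) tuples with an exclusive end.
    """
    parent_start, parent_end = parent
    child_start, child_end = child
    return parent_start <= child_start and child_end <= parent_end


def violations(parent_period, child_periods):
    """Return the child periods that fall outside the parent period."""
    return [c for c in child_periods if not contains(parent_period, c)]


# Hypothetical example: a policy (parent) and its coverage lines (children).
policy = (date(2014, 1, 1), date(2015, 1, 1))
coverages = [
    (date(2014, 1, 1), date(2014, 7, 1)),   # inside the policy period
    (date(2014, 6, 1), date(2015, 3, 1)),   # runs past the policy end
]
bad = violations(policy, coverages)
```

Here `bad` holds only the second coverage period, which extends beyond the end of the policy.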

Oftentimes, the response to the challenges of implementing bitemporal in a relational database is to find the next best solution. Here are some of the common responses.

“Relational bitemporal offerings are not widely adopted because as time changes, the shape of the data usually changes as well… and RDBMS’ are not able to capture the evolving schema.”

Global Investment Bank


• “But, I can just use Slowly Changing Dimensions” – Attempts to use dimensional modelling “type two” Slowly Changing Dimensions (SCDs) as a way to approximate bitemporal data have been made in recent years, and the problems with this approach have been well documented.7 Using SCDs only approximates valid temporal data and results in many inconsistencies that are later difficult to uncover and fix. And, even if everything is designed properly, query performance will likely still be slow and results may not be reproducible.

• “But, I can just take frequent snapshots” – This approach, also referred to as “temporal versioning,” is a more common argument against bitemporal, as most organizations are already taking regular weekly or monthly snapshots of their data. This approach is stable and predictable. Unfortunately, this approach results in massive amounts of redundant data, immense storage costs, and still lots of lost information because of the gaps between snapshots. And, even if frequent snapshots are taken, regulators in most industries view this as increasingly unacceptable. Both regulators and data analysts have specific questions, require fast answers, and do not appreciate any gaps.

7 Tom Johnston. Bitemporal Data: Theory and Practice (Waltham, MA: Elsevier, 2014) 311 - 313.

• “But, I can just rely on my audit logs” – While useful for tracking event information, logs are not sufficient for bitemporal because they cannot be easily or quickly queried and would not meet standards for maintaining immutable records.

Bitemporal is the only approach to managing time that provides a quick and seamless way to look back at historical data, query it on the fly at any point-in-time, and work with it operationally just as you would with your most current data.8
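The information lost between snapshots, mentioned in the second response above, can be illustrated with a toy example. This is a sketch in plain Python with made-up dates and values. It takes weekly snapshots of a value that changes several times and lists the values that no snapshot ever captured.

```python
from datetime import date, timedelta

# Hypothetical change log, in date order: (day the value changed, new value).
changes = [
    (date(2015, 6, 1), 100),
    (date(2015, 6, 3), 250),   # short-lived value, corrected two days later
    (date(2015, 6, 5), 120),
    (date(2015, 6, 10), 130),
]


def value_on(day):
    """Latest value whose change date is on or before `day`."""
    current = None
    for changed_on, value in changes:
        if changed_on <= day:
            current = value
    return current


# Weekly snapshots starting on 2015-06-01.
snapshot_days = [date(2015, 6, 1) + timedelta(weeks=i) for i in range(3)]
snapshots = {day: value_on(day) for day in snapshot_days}

# Values that existed at some point but appear in no snapshot.
lost = [value for _, value in changes if value not in snapshots.values()]
```

The value 250 was in effect for two days but falls between the snapshots of June 1 and June 8, so it ends up in `lost`. A question about what the records showed on June 4 cannot be answered from the snapshots alone.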

WHY THE TIME FOR BITEMPORAL IS NOW

As an Enterprise NoSQL database, MarkLogic provides the flexibility required to make storing and managing bitemporal data a practical reality, without sacrificing performance on complex queries, data resiliency, or security. MarkLogic is also the only Enterprise NoSQL database that has bitemporal capability.

MarkLogic is schema-agnostic, and manages data as documents. This means that you do not have to maintain a strict schema that must be adhered to throughout the life of the database. If you have to integrate a new data source at a later date, you do not have to do complex ETL before loading that data into MarkLogic. The frustration of having to add a new column into a relational database simply disappears, whether you are adding a DATE column or anything else.

“Despite the near universality of time and the time-varying nature of the enterprise being modeled – a static and unmalleable configuration is rare and uninteresting – SQL quite frankly does a lousy job in capturing those aspects that are changing in time, or in providing constructs to effectively model, query, or modify such information.”

Richard Snodgrass, Developing Time-Oriented Database Applications in SQL8

8 Richard T. Snodgrass. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999. <http://www.cs.arizona.edu/~rts/tdbbook.pdf>

| Advantages of Bitemporal in MarkLogic | MarkLogic | Other DBs |
| --- | --- | --- |
| Schema-agnostic to handle schema evolution and multiple varying data models | ✔ | ✖ |
| Simpler coding and operations | ✔ | ✖ |
| Quicker time-to-value | ✔ | ✖ |
| Scalability, elasticity, and reduced storage costs | ✔ | ✖ |

Bitemporal data may have a lifespan of decades, and organizations need a database that can respond rapidly to keep pace with schema evolution as new data sources are added. MarkLogic makes it easy to ingest new data sources, and if there are conflicts that need to be resolved (e.g., new data source has the column name “SRC_DATE” but it should be “CLAIM_DATE”), MarkLogic makes it easy to perform the necessary transformations to ensure a standard vocabulary. With MarkLogic, you never have to worry about the constraints found with relational data modelling such as entity integrity, referential integrity, and denormalization—even when it comes to bitemporal data management.
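The vocabulary clean-up described above can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the helper `renameKeys` and the sample claim record are hypothetical and are not part of any MarkLogic API. A transformation of this kind would typically run while a new source is being ingested.

```javascript
// Hypothetical helper (not a MarkLogic API): returns a copy of `doc`
// with keys renamed according to `mapping`. Unmapped keys are kept as-is.
function renameKeys(doc, mapping) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const mapped = Object.prototype.hasOwnProperty.call(mapping, key);
    out[mapped ? mapping[key] : key] = value;
  }
  return out;
}

// Illustrative record from a newly added source with a non-standard name.
const incoming = { CLAIM_ID: "C-1001", SRC_DATE: "2014-04-03", AMOUNT: 250 };

// Map the source vocabulary onto the standard one before loading.
const standardized = renameKeys(incoming, { SRC_DATE: "CLAIM_DATE" });
console.log(standardized);
```

Because the helper returns a copy, the incoming record is left untouched, so the original source vocabulary can still be stored alongside the standardized one if needed.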

MarkLogic performs orders of magnitude better than relational databases for large-scale data integration projects, speeding up project delivery times by reducing the amount of time spent doing requirements gathering and data modelling, and improving the quality of prototypes. At Broadridge, a large financial services organization, it was remarked that “The first MarkLogic project took 60 days… It was estimated to take 3,000 days with existing technology.”

HOW BITEMPORAL WORKS IN MARKLOGIC

For those with a relational database background, working with temporal and bitemporal data in MarkLogic should be very familiar. The main difference is that rather than columns of dates in a table, that information now appears as timestamps within documents. MarkLogic stores and manages all data as documents, including bitemporal data.

Whether working with JSON or XML documents, a document is considered to be bitemporal if it includes timestamps for valid start and end times, and for system start and end times. One way to load a bitemporal document into MarkLogic is with MarkLogic Content Pump, or mlcp. You can also use the REST API. Or, you can load a bitemporal document using a simple JavaScript update query, which is shown below.

After loading bitemporal documents into MarkLogic, they are managed as a series of documents with range indexes for valid and system time axes. The valid and system time axes each serve as a container for a named pair of range indexes. And, the bitemporal documents are stored in temporal collections, which are logical groupings of temporal documents. You can create additional temporal collections if you have documents that require a different schema for the timestamps.

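declareUpdate();
// System times are left null; MarkLogic sets them when the document is inserted.
var root = {
  "tempdoc": {
    "systemStart": null,
    "systemEnd": null,
    "validStart": "2014-04-03T11:00:00",
    "validEnd": "2014-04-03T16:00:00",
    "content": "some data, like closing price"
  }
};
// Insert the document into the temporal collection named "temporalCollection".
temporal.documentInsert("temporalCollection", "exampledata.json", root);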

FIGURE 6. Updating a bitemporal document
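A document counts as bitemporal only when it carries all four timestamps, so an application may want to check a document before loading it. The following is a minimal sketch in plain JavaScript, not MarkLogic code: the function name `checkTemporalDoc` is hypothetical, and the `tempdoc` wrapper and field names simply mirror the example above. It assumes that null system timestamps are acceptable because the database sets system time itself.

```javascript
// Hypothetical pre-load check; not part of the MarkLogic API.
const REQUIRED = ["systemStart", "systemEnd", "validStart", "validEnd"];

function checkTemporalDoc(doc) {
  const body = doc && doc.tempdoc;
  if (!body || typeof body !== "object") return ["missing tempdoc wrapper"];
  const problems = [];
  // All four timestamp fields must be present (system values may be null).
  for (const field of REQUIRED) {
    if (!(field in body)) problems.push("missing " + field);
  }
  // Valid time is supplied by the application, so it must parse and be ordered.
  const start = Date.parse(body.validStart);
  const end = Date.parse(body.validEnd);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    problems.push("validStart and validEnd must be parseable timestamps");
  } else if (start >= end) {
    problems.push("validStart must be earlier than validEnd");
  }
  return problems;
}

const good = {
  tempdoc: {
    systemStart: null,
    systemEnd: null,
    validStart: "2014-04-03T11:00:00",
    validEnd: "2014-04-03T16:00:00",
    content: "some data, like closing price"
  }
};
// Missing system fields and a reversed valid period.
const bad = {
  tempdoc: { validStart: "2014-04-03T16:00:00", validEnd: "2014-04-03T11:00:00" }
};

console.log(checkTemporalDoc(good));
console.log(checkTemporalDoc(bad));
```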


After initial documents are loaded into MarkLogic, they are always kept and never changed. Even if a bitemporal document is “deleted”, MarkLogic still keeps the document, but its system end time is changed from infinity to the time of the delete. The same process works for updates: older versions are still kept and the “new” version is simply added. MarkLogic also does not allow updates to system start times. Once the system time is set for a collection, it continues to roll forward to further ensure the integrity of the data.
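The keep-everything behavior described above can be modeled with a small append-only store. This is an illustrative plain JavaScript sketch, not MarkLogic code: the class `TemporalCollection`, its methods, and the `INFINITY` constant are all hypothetical. It assumes timestamps are ISO 8601 UTC strings in one format, so they compare correctly as strings, and it deliberately leaves out details such as how overlapping valid-time periods are handled.

```javascript
// Illustrative model only; it mimics the behavior described in the text.
const INFINITY = "9999-12-31T23:59:59Z"; // stand-in for "end of time"

class TemporalCollection {
  constructor() {
    this.versions = [];       // every version ever written; none is removed
    this.lastSystemTime = ""; // system time only moves forward
  }

  // Reject any system time that does not roll forward.
  _advance(now) {
    if (now <= this.lastSystemTime) throw new Error("system time must roll forward");
    this.lastSystemTime = now;
  }

  // End the system period of the current version of `uri`, if there is one.
  _closeCurrent(uri, now) {
    const current = this.versions.find(
      v => v.uri === uri && v.systemEnd === INFINITY);
    if (current) current.systemEnd = now;
    return current;
  }

  // Insert or update: the old version is closed and a new one is appended.
  write(uri, validStart, validEnd, content, now) {
    this._advance(now);
    this._closeCurrent(uri, now);
    this.versions.push(
      { uri, validStart, validEnd, content, systemStart: now, systemEnd: INFINITY });
  }

  // "Delete": only the system end time changes; the version itself stays.
  remove(uri, now) {
    this._advance(now);
    if (!this._closeCurrent(uri, now)) throw new Error("no current version of " + uri);
  }
}

const prices = new TemporalCollection();
prices.write("price.json", "2014-04-03T11:00:00Z", "2014-04-03T16:00:00Z",
  "closing price 12.00", "2014-04-03T17:00:00Z");
prices.write("price.json", "2014-04-03T11:00:00Z", "2014-04-03T16:00:00Z",
  "closing price 12.50", "2014-04-04T09:00:00Z");
prices.remove("price.json", "2014-04-05T09:00:00Z");
console.log(prices.versions);
```

After the insert, the correction, and the delete, both versions are still present; only their system end times record when each stopped being current.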

Keeping track of the provenance of information with full governance and immutability is critical, which is why MarkLogic applies its security model to bitemporal documents. MarkLogic is certified by the National Information Assurance Partnership (NIAP) Common Criteria Evaluation and Validation Scheme, and uses Role Based Access Control (RBAC) by default to manage access to documents. This high level of security ensures that historical records are not tampered with, and that documents maintain their permissions over time.

KEY FEATURES OF BITEMPORAL IN MARKLOGIC

Insert, update (and never delete) – Ingest temporal JSON or XML documents with references to valid time using the Temporal API or mlcp, and make changes without losing any data as new versions are added

Complex temporal queries – Query the database along valid and system time axes using standard Allen and SQL operators when comparing time periods

Adapt to evolving schema – Avoid worrying about the changing shape of the data over time. Unlike relational databases, MarkLogic is schema-agnostic and can easily manage schema changes over time

Maintain a Last Stable Query Time – A special timestamp, called the LSQT (Last Stable Query Time), can be enabled in order to manage and coordinate system start times across systems

Combine with tiered storage – Use tiered storage to easily migrate historical data to less expensive storage tiers, without losing the ability to query the data

Combine with semantics – Assign bitemporal elements to documents, whether they are RDF triples, or documents that include RDF triples, giving you the ability to track how relationships change over time

Combine with geospatial – Gain the ability to track your data over time and space. MarkLogic stores geospatial data, and now you can accurately track how geospatial data changes over time

Take advantage of certified security – Manage bitemporal documents with the same certified security as all other documents, using Role Based Access Control (RBAC) or other security models

Scale quickly and easily – Avoid any concerns of under-provisioning with MarkLogic’s scale-out architecture, which allows you to easily add nodes to handle the increased demands of bitemporal data

FIGURE 7. A bitemporal query as viewed in MarkLogic Query Console
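To make the idea of querying along both time axes concrete, the following plain JavaScript sketch models a few of Allen's interval relations and an "as of" lookup. It is an illustration under stated assumptions rather than MarkLogic query syntax: the names `period`, `allen`, `covers`, and `asOf` are hypothetical, periods are assumed to be half-open (start inclusive, end exclusive), and timestamps are assumed to be ISO 8601 UTC strings in one format so that string comparison matches time order.

```javascript
// Periods are half-open [start, end) with ISO 8601 UTC string endpoints.
const period = (start, end) => ({ start, end });

// A few of Allen's interval relations (illustrative names only).
const allen = {
  before:   (a, b) => a.end < b.start,
  meets:    (a, b) => a.end === b.start,
  overlaps: (a, b) => a.start < b.start && b.start < a.end && a.end < b.end,
  contains: (a, b) => a.start < b.start && b.end < a.end,
  equals:   (a, b) => a.start === b.start && a.end === b.end
};

// True when instant `t` falls inside the half-open period `p`.
const covers = (p, t) => p.start <= t && t < p.end;

// "What did we believe at system time `knownAt` about valid time `validAt`?"
function asOf(versions, validAt, knownAt) {
  return versions.filter(
    v => covers(v.valid, validAt) && covers(v.system, knownAt));
}

// Two versions of the same fact: an original value and a later correction.
const versions = [
  { content: "price 12.00",
    valid:  period("2014-04-03T11:00:00Z", "2014-04-03T16:00:00Z"),
    system: period("2014-04-03T17:00:00Z", "2014-04-04T09:00:00Z") },
  { content: "price 12.50",
    valid:  period("2014-04-03T11:00:00Z", "2014-04-03T16:00:00Z"),
    system: period("2014-04-04T09:00:00Z", "9999-12-31T23:59:59Z") }
];

const validAt = "2014-04-03T12:00:00Z";
// What we knew on the evening of 3 April, before the correction.
const knownThen = asOf(versions, validAt, "2014-04-03T18:00:00Z").map(v => v.content);
// What we know once the correction has been recorded.
const knownNow = asOf(versions, validAt, "2014-04-05T00:00:00Z").map(v => v.content);
console.log(knownThen, knownNow);
```

The same valid-time question returns a different answer depending on the system time supplied, which is the “what you knew” and “when you knew it” distinction in miniature.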


HOW TO GET STARTED

Managing time is not easy. If it were, we probably could have avoided multi-billion dollar problems like Y2K.9 But, managing time is a necessity, and bitemporal is the future of managing time in databases as we seek to maintain a better record of “what we knew” and “when we knew it.” MarkLogic takes away the constraints that prevent the adoption of bitemporal, and is the best database for storing and managing bitemporal data.

GET GOING QUICKLY

1. Identify the questions your business cannot currently answer
2. Identify the business benefits of adding bitemporal
3. Assess the current data management environment
4. Engage with MarkLogic to discuss implementation
5. Download MarkLogic
6. Learn more in MarkLogic’s free training

9 According to the BBC and ComputerWorld, the estimated cost of the preparation and remediation for the “Year 2000 problem”, or Y2K, was $608 billion, and that’s not taking into account inflation. For more information: Robert L. Mitchell. “Y2K: The good, the bad and the crazy”. ComputerWorld (28 December 2009) <http://www.computerworld.com/article/2522197/it-management/y2k--the-good--the-bad-and-the-crazy.html?page=2>

MORE INFORMATION

• Read MarkLogic Documentation – Learn how to work with bitemporal data in MarkLogic at docs.marklogic.com/guide/temporal

• Watch a Presentation – Hear from a MarkLogic customer about “Why Banks Care About Bitemporal” www.marklogic.com/resources/why-banks-care-about-bitemporality/

• Schedule a Meeting – Discuss your particular use case with a MarkLogic sales representative by contacting us at [email protected]

“MarkLogic has a history of bringing advanced data management technology to market, and many of their customers and partners are accustomed to managing complex data in an agile manner. As a result, MarkLogic customers and partners, in general, have a more mature and creative view of how to manage and use data than do most other database users.”

Carl Olofson, Research Vice President for Data Management Software Research, IDC
