
[IEEE 2012 Second Symposium on Network Cloud Computing and Applications (NCCA) - London, United Kingdom (2012.12.3-2012.12.4)]


Flexible integration of eventually consistent distributed storage with strongly consistent databases

Olivier Parisot, Antoine Schlechter, Pascal Bauler, Fernand Feltz Département Informatique, Systèmes et Collaboration (ISC)

Centre de Recherche Public - Gabriel Lippmann Belvaux, Luxembourg

{parisot|schlecht|bauler|feltz}@lippmann.lu

Abstract—In order to design distributed business applications or services, the common practice consists in setting up a multi-tier architecture on top of a relational database. Due to the recent evolution of needs in terms of scalability and availability in cloud environments, the design of the data access layer has become significantly more complicated because of the trade-offs between consistency, scalability and availability that have to be made in accordance with the CAP theorem. An interesting compromise in this context consists in offering some flexibility at the consistency level, in order to allow multi-tier architectures to support partition tolerance while guaranteeing availability. This paper introduces a flexible data layer that guarantees availability and gives developers the ability to easily select the required execution context, by integrating eventually consistent storage with strongly consistent databases. A given query can either be executed in an eventually consistent but very scalable context or in a strongly consistent context with limited scalability. The benefits of the proposed framework are validated in a real-world use case.

Keywords—multi-layer architecture, data management, CAP theorem, eventual consistency, strong consistency

I. INTRODUCTION AND BACKGROUND

The design of complex business applications and services relying on relational data structures and non-trivial business logic is a well-explored research domain. The common and currently well-established solution consists in introducing a multi-tier software architecture [2] on top of a Relational Database Management System (RDBMS) in charge of data persistence [10] (Fig. 1). In these software solutions, the structured data are mapped onto tables in relational databases, which are queried to offer data access and analytics [14]. Multi-tier architectures based upon this persistence model have been very successful for various kinds of applications such as websites, business applications, and various services in the context of SOA [13]. In most implementations, the data access layer at the lowest level mainly acts as a wrapper of the RDBMS API and offers convenient data access. All important aspects such as transactions and consistency are mainly managed by the underlying RDBMS (and partially by application servers).

This type of software architecture is convenient for applications with a high demand on availability and consistency but with only very limited needs in terms of scalability and distributed data management.

Figure 1. A sample multi-layer architecture

In order to offer new business opportunities and to better respond to existing business needs, modern business applications have to handle additional constraints related to availability and scalability [9]. By doing so, they are confronted with the limitations of the CAP theorem [11, 12], which states that no data store can simultaneously guarantee the properties of Consistency, Availability, and Partition Tolerance.

In contrast to legacy software solutions, distributed systems have to take partition tolerance into account, which has deeply changed the domain of data management. RDBMS are now challenged by the emergence of new kinds of data stores in areas where an RDBMS is not the appropriate solution to handle all encountered data management issues [1]. And even if these new kinds of data stores, such as NoSQL systems, are typically found among Web 2.0 companies with huge volumes of data (such as Amazon or Google), their usage in business applications and services is promising and offers the opportunity to significantly improve the scalability of existing software solutions.

From a theoretical point of view, there are fundamental differences between the classic ACID properties (Atomicity, Consistency, Isolation, Durability) known from RDBMS and the more recent BASE properties (Basically Available, Soft State, Eventually Consistent) applied in NoSQL databases [15]. In terms of the CAP-Theorem, BASE-based data stores favor availability and partition tolerance over strong consistency, while ACID-based data stores favor strong consistency and availability over partition tolerance.

2012 IEEE Second Symposium on Network Cloud Computing and Applications

978-0-7695-4943-9/12 $26.00 © 2012 IEEE. DOI: 10.1109/NCCA.2012.23



Thus, when scalability and availability are required, some trade-offs on consistency have to be accepted and a BASE-style data store needs to be used. However, consistency constraints cannot be completely ignored; they have to be rigorously managed and loosened in a controlled way [16]. It is a non-trivial task to find the right balance between consistency and scalability [4]. To reach this goal, consistency flexibility is an interesting and promising approach [7], and several research teams are already working on this subject [3, 5].

On the other hand, several business scenarios do not rely on strong consistency. In fact, in multi-tier systems, the data which is currently displayed by the presentation layer is not continuously updated in real-time by the data layer. As a consequence, the presentation layer offers an eventually consistent user experience, even if the underlying data stores are strongly consistent.

Nevertheless, the exclusive use of a BASE-style data store is not sufficient for other use cases which heavily rely on strongly consistent data access. In fact, the integration of these new kinds of data stores is a non-trivial challenge, and it has huge implications on the application architecture, the design and the underlying development process [6].

This paper proposes an application architecture which leverages the advantages of both kinds of data stores where appropriate, in a convenient and easily accessible way. The proposed solution (FlexibleDL) is an innovative data access layer for use in multi-tier software architectures, ready to handle multiple consistency and scalability requirements. FlexibleDL offers two different access modes to the higher architecture layers: a first one where the data are accessed in an eventually consistent (and scalable) way, and another one where data are manipulated in a strongly consistent way. The access mode may be chosen on a per-request level, and all data are available through both modes.

The rest of this article is organized as follows. Firstly, related work on consistency flexibility is discussed. Secondly, the architecture of FlexibleDL and its internal design are described in detail. Finally, we present an implementation of FlexibleDL in a real-world use case.

II. RELATED WORK

Providing different levels of consistency is an interesting theme for architecture design. As the main goal of this paper is to provide the choice between scalability and consistency while guaranteeing availability, a key component to reach this goal is data replication, which can be implemented at the data store layer [21] or at the architecture layer [8]. This component is in charge of consistent data replication between a strongly consistent and a highly scalable data store. An interesting approach in the context of database replication is Lazy Primary Copy Replication [21]. This mechanism describes the replication of data updates between a master data store and a slave data store, in order to provide eventual consistency in the slave component. To ensure Lazy Primary Copy Replication, the following key points need to be managed:

• Data changes have to be grouped before replication from the master data store to the slave data store: the group should correspond to the transaction committed on the master.

• Each group of data changes has to be replicated from the master data store to the slave data store only if a transaction is committed on the master data store.

• In each group, ordering of the data changes has to be preserved.
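As a rough illustration of these three points, the following minimal Java sketch (all names hypothetical, not part of the paper's implementation) buffers changes per transaction, assigns a sequence number at commit, and applies each group to the replica strictly in commit order:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of Lazy Primary Copy Replication: changes are
// grouped per transaction, sequenced at commit, and applied to the
// replica in commit order.
class LazyReplicator {
    private final AtomicLong sequence = new AtomicLong(0);
    private final List<Object[]> buffer = new ArrayList<>();      // current transaction's changes
    private final Map<String, Object> replica = new HashMap<>();  // slave copy
    private long lastApplied = 0;

    void record(String key, Object value) { buffer.add(new Object[]{key, value}); }

    // On commit: group the buffered changes and tag them with a sequence number.
    long commit() {
        long id = sequence.incrementAndGet();
        apply(id, new ArrayList<>(buffer));   // in reality sent asynchronously over a bus
        buffer.clear();
        return id;
    }

    // Groups must arrive in order; ordering inside a group is preserved by the list.
    void apply(long id, List<Object[]> group) {
        if (id != lastApplied + 1) throw new IllegalStateException("out of order");
        for (Object[] ch : group) replica.put((String) ch[0], ch[1]);
        lastApplied = id;
    }

    Object read(String key) { return replica.get(key); }
}
```

In a real deployment the `apply` call would of course run asynchronously on the slave side; the sketch only shows the grouping and ordering invariants.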

Data replication can be complemented and improved by other techniques. Recently, innovative solutions have appeared in this field, like the Command Query Responsibility Segregation (CQRS) pattern [24, 25]. This solution, inspired by the Command Query Separation principle [17] and deeply rooted in Domain Driven Design [18], introduces a mechanism across the whole architecture to manage read and write operations differently. The underlying hypothesis of this solution is that read and write operations do not have the same needs in terms of consistency and scalability: read operations may be eventually consistent, while write operations need to be strongly consistent. As a consequence, two kinds of interactions are defined: the commands for the strongly consistent write operations, and the queries for the eventually consistent read operations. Thus, there are two architectural components: the CommandLayer that processes the commands and the QueryLayer that answers the queries (see Fig. 2).

The major aspect is that CQRS is an event-centric solution, based on a simple concept called domain event. Domain events represent a capture of completed changes on the domain model [18], and those changes are propagated over the whole architecture.

The event propagation is quite simple: when commands are invoked on the CommandLayer (for instance by the PresentationLayer), the domain model is updated and domain events are generated and dispatched to a central queue by a Dispatcher. On the other side, the QueryLayer receives domain events from the Dispatcher and applies them to its internal representation of the data, in order to be able to answer queries with the updated data. Due to this event-centric design, write operations resulting from commands are not instantly applied on the QueryLayer, and the update of the QueryLayer is delayed. As a result, the read operations executed on the QueryLayer are only eventually consistent.

CQRS is a good step forward in providing different consistency levels within a single software architecture. One major drawback of CQRS is that strongly consistent read operations are not possible.

Figure 2. Simple overview of CQRS architecture
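The propagation mechanism described above can be sketched in a few lines of Java (a minimal, single-process illustration with hypothetical names, not the CQRS implementations cited in [24, 25]):

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal single-process CQRS sketch: the command side updates the
// domain model and emits domain events; the query side applies the
// events to its own read-optimized representation.
class CqrsSketch {
    // Domain event: a capture of a completed change on the domain model.
    record DomainEvent(String entity, int newValue) {}

    static class CommandLayer {
        final Map<String, Integer> domainModel = new HashMap<>();
        final Consumer<DomainEvent> dispatcher;
        CommandLayer(Consumer<DomainEvent> dispatcher) { this.dispatcher = dispatcher; }
        void handleSetValue(String entity, int value) {   // a command
            domainModel.put(entity, value);               // strongly consistent write
            dispatcher.accept(new DomainEvent(entity, value));
        }
    }

    static class QueryLayer {
        final Map<String, Integer> readModel = new HashMap<>();
        void onEvent(DomainEvent e) { readModel.put(e.entity(), e.newValue()); }
        Integer query(String entity) { return readModel.get(entity); }  // eventually consistent read
    }
}
```

The gap between `dispatcher.accept` and `onEvent` is exactly where the delay, and hence the eventual consistency of the QueryLayer, comes from.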


III. FLEXIBLEDL

In this paper, we present the general architecture and a reference implementation of a flexible data layer, FlexibleDL, that offers the choice to execute a given query either in an eventually consistent but very scalable context or in a strongly consistent context with restricted scalability, depending on the needs.

FlexibleDL is inspired by the CQRS pattern, but instead of separating commands and queries, the suggested architecture proposes to separate contexts by level of consistency (strongly consistent or eventually consistent). It is based on the fact that different parts of an application do not have the same needs in terms of consistency and scalability. In addition, FlexibleDL is based on Lazy Primary Copy Replication to synchronize data between the strongly consistent context and the eventually consistent context.

A. Two data models

Before going further, it is important to introduce a distinction of data according to their usage [20]. In FlexibleDL, two kinds of data should be distinguished: the Relational Domain Model and the Data Transfer Object Model.

In the first place, the Relational Domain Model refers to the structured data managed by the system. Due to the nature of the data (relational, strongly consistent, readable and writable), this model is traditionally persisted in a RDBMS.

In the second place, the Data Transfer Object Model refers to the data that are retrieved, combined, transformed, and extracted from the Relational Domain Model. In other words, it refers to the results of queries, which take the form of Data Transfer Objects (DTOs). It is a de-normalized, duplicated and read-only copy of the Relational Domain Model, intended to be transferred between layers (between the presentation layer and the others, for example). Data Transfer Objects may be very large, they may be costly to build based on complex and long-running queries, and they may often be out of date. Traditionally, Data Transfer Objects, just like views, are volatile and non-persistent in an RDBMS. In order to speed up query execution on views, the concept of “materialized views” has been introduced in the world of RDBMS [26]. Inspired by this concept, FlexibleDL manages a persistent copy of the Data Transfer Object Model.

The following sections will explain how all these concepts are combined to design a flexible data layer.

B. General architecture

FlexibleDL is a framework acting as a data access layer in a multi-tier architecture (Fig. 3).

Figure 3. Positioning of FlexibleDL

FlexibleDL relies on three main components: the strongly consistent layer, the eventually consistent layer, and the synchroniser component (Fig. 4). The first two components should be considered as distinct Bounded Contexts [18] (one for the strongly consistent concern, the other for the eventually consistent concern), whereas the synchroniser component is responsible for the data exchange between the two other components.

Figure 4. Architecture of FlexibleDL

The next sub-sections will explain how these different components interact.

C. Strongly consistent layer

The strongly consistent layer is a traditional data layer, where strongly consistent write and read operations are executed. From a technical point of view, this component is a thin data management layer coupled to a fully ACID RDBMS. Strong consistency and transactional behavior are handled by the RDBMS, and our strongly consistent layer is a decorator that provides the functionalities described below.

The major components of the strongly consistent layer are the data repositories and the strongly consistent views (Fig. 5).

Figure 5. Architecture of the strongly consistent layer

On the one hand, a repository is an abstraction of the underlying data source providing read and write operations in order to manage data in the relational model [18]. In addition to providing convenient access to the relational model, it should notify a component called Interceptor (the role of this component is explained in the next sections of this paper) whenever a transaction is committed, i.e. when a set of data changes is committed. To avoid spurious notifications, data changes are not notified when a transaction is rolled back. The implementation works as follows:

• Data changes are represented by DataDiff objects (Table 1), which describe the type of data change (creation/deletion/update) and the current state of the managed object.

TABLE 1. DATA CHANGES DEFINITION

TypeOfChange { ADD, UPD, DEL }

DataDiff {
    Object modifiedObject
    String typeOfObject
    TypeOfChange typeOfChange
}

• Repositories offer methods to create, update and delete objects (Table 2). In these methods, DataDiff objects are built and stored in a list, and this list is managed during transaction processing. When a transaction is committed, the list of DataDiff objects is sent to the Interceptor. In practice, a convenient repository may be implemented on top of an existing Object Relational Mapping (ORM): in this scenario, interception of data changes may be done using call-back features of the ORM tool.

TABLE 2. PSEUDO CODE FOR REPOSITORY

Repository {
    List<DataDiff> l;
    void create(obj)  { l.add(new DataDiff(obj, ADD)); }
    void modify(obj)  { l.add(new DataDiff(obj, UPD)); }
    void delete(obj)  { l.add(new DataDiff(obj, DEL)); }
    void begin()      { l = new List<DataDiff>(); }
    void commit()     { Interceptor.intercept(l); l.clear(); }
    void rollback()   { l.clear(); }
}
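The transaction buffering of Table 2 can be exercised with a small runnable Java version (a simplified sketch with hypothetical names, not the paper's Hibernate-based repository):

```java
import java.util.*;
import java.util.function.Consumer;

// Runnable version of the Table 2 pseudocode: the repository buffers
// DataDiff entries during a transaction and hands the whole group to
// an interceptor callback only on commit; rollback discards the group.
class SketchRepository {
    enum TypeOfChange { ADD, UPD, DEL }
    record DataDiff(Object modifiedObject, TypeOfChange type) {}

    private List<DataDiff> buffer = new ArrayList<>();
    private final Consumer<List<DataDiff>> interceptor;
    SketchRepository(Consumer<List<DataDiff>> interceptor) {
        this.interceptor = interceptor;
    }
    void begin()          { buffer = new ArrayList<>(); }
    void create(Object o) { buffer.add(new DataDiff(o, TypeOfChange.ADD)); }
    void modify(Object o) { buffer.add(new DataDiff(o, TypeOfChange.UPD)); }
    void delete(Object o) { buffer.add(new DataDiff(o, TypeOfChange.DEL)); }
    void commit()         { interceptor.accept(List.copyOf(buffer)); buffer.clear(); }
    void rollback()       { buffer.clear(); }
}
```

Note that a rolled-back transaction never reaches the interceptor, which is exactly the "wrong notification" case the text rules out.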

On the other hand, each strongly consistent view is a component that is able to respond to queries, i.e. to build Data Transfer Objects from the relational model in a strongly consistent manner. As a result, queries defined and executed in this way are standard SQL, JPQL, or other queries (Table 3).

TABLE 3. PSEUDO CODE FOR STRONGLY CONSISTENT VIEW

// DTO : object used as result for queries
StronglyConsistentView<T extends DTO> {
    DBMSConnection c;
    Collection<T> executeQuery() { return c.executeSQLQuery(…); }
}

D. Eventually consistent layer

The eventually consistent layer is in charge of the Data Transfer Object Model: it manages a read-only and denormalized copy of the strongly consistent data. It is composed of eventually consistent views (Fig. 6).

Figure 6. Architecture of the eventually consistent layer

The concept of eventually consistent views defines a view as a component that is able to respond to queries by building Data Transfer Objects (Table 4) in an eventually consistent but fully scalable manner.

The essential part of the definition of an eventually consistent view is how the view is actually updated. The incremental update principle used here is similar to the well-known ‘incremental view maintenance’ mechanism [26]. In practice, the update is done by view refresher components (Table 5), which are responsible for reacting to data changes by incrementally updating the eventually consistent views. When a data change is notified to a view refresher instance, this instance has to update the view according to the kind of data change (create/update/delete) and according to the changed values (by using the new version of the data passed with the DataDiff object). In addition, there are transactional aspects to be taken into account when updating eventually consistent views. Each view has to stay in a coherent state, so all data changes in one event have to be applied to the view in one transaction. That is why the eventually consistent layer should have transactional support.

TABLE 4. PSEUDO CODE FOR EVENTUALLY CONSISTENT VIEW

// DTO : object used as result for queries
EventuallyConsistentView<T extends DTO> {
    List<T> executeQuery()
    Number lastAppliedEventIdentifier
    List<ViewRefresher> getVRList(typeOfObject)
    void beginTransaction()
    void commitTransaction()
}


TABLE 5. PSEUDO CODE FOR VIEW REFRESHER

ViewRefresher {
    private EventuallyConsistentView viewToUpdate;
    void onAdd(Object object);
    void onUpd(Object object);
    void onDel(Object object);
}
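To make the incremental maintenance concrete, here is a hypothetical Java refresher target that keeps a per-product counter up to date without re-running the underlying query (illustrative names, not from the reference implementation):

```java
import java.util.*;

// Hypothetical incremental view maintenance: the "view" is a simple
// per-product counter that is adjusted on each change instead of being
// recomputed from the relational model.
class ProductCountView {
    final Map<String, Integer> counts = new HashMap<>();

    // called by a view refresher on an ADD change
    void onAdd(String product) { counts.merge(product, 1, Integer::sum); }

    // called by a view refresher on a DEL change
    void onDel(String product) { counts.merge(product, -1, Integer::sum); }

    int count(String product)  { return counts.getOrDefault(product, 0); }
}
```

Each `onAdd`/`onDel` call costs O(1), whereas recomputing the count from the source tables would scan all rows; this is the payoff of the materialized-view approach.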

E. Synchroniser

Between the strongly consistent layer and the eventually consistent layer, the synchroniser component is responsible for making sure that the eventually consistent layer may execute eventually consistent read operations. Its goal is to perform a Lazy Primary Copy Replication [21], as outlined in the related work section.

From an architectural point of view, the synchroniser component is composed of several sub-components: Interceptor, Publisher, Listener and Applier (Fig. 7). These sub-components interact to implement the Lazy Primary Copy Replication algorithm, i.e. they ensure that data changes are asynchronously processed according to the following procedure:

• The data changes are captured from the strongly consistent layer.

• The data changes are grouped by transaction.

• The data changes are ordered by transaction order.

• The data changes are applied on the eventually consistent layer.

Figure 7. Architecture of the synchroniser component

On the strongly consistent side, the Interceptor is in charge of intercepting all data changes that have occurred on the strongly consistent layer. When strongly consistent write operations are processed, i.e. when a transaction is committed by a Repository instance, the data changes related to the committed transaction are listed by the Repository component and notified to the Interceptor component (Table 2, Table 7 left). When data changes are notified, the Interceptor component builds a DataChangedDomainEvent (Table 6). As explained previously, each DataChangedDomainEvent contains all the data changes that have occurred during a transaction. After that, the DataChangedDomainEvent is published by the Publisher (Table 7), which is a component connected to an Event Bus or an Enterprise Service Bus (ESB).

The Event Bus component decouples the strongly consistent layer and the eventually consistent layer by propagating data changes asynchronously. The Event Bus has to ensure that delivery and ordering of the DataChangedDomainEvents are guaranteed. The ordering requirement is critical to ensure Lazy Primary Copy Replication [21], and as a consequence to ensure eventual consistency. It is accomplished by the use of a unique identifier which is incrementally assigned by the Publisher.

TABLE 6. EVENT DEFINITION

DataChangedDomainEvent {
    Number identifier;
    List<DataDiff> diffs;
}

TABLE 7. PSEUDO CODE FOR EVENT BUILDING AND PUBLISHING

Interceptor {
    // called when a transaction is committed
    void intercept(diffs) {
        event = new DataChangedDomainEvent(diffs)
        Publisher.publish(event)
    }
}

Publisher {
    // called by Interceptor
    void publish(event) {
        event.id = incrementUniqueIdentifier()
        ESB.publish(event)
    }
}

On the eventually consistent side, the Listener component intercepts events from the Event Bus (Table 8) and passes them to the Applier component, which is in charge of processing the DataChangedDomainEvents on the eventually consistent layer. Its role is to update the Data Transfer Object Model according to the Relational Domain Model changes described by the events. To do this, the Applier component updates each eventually consistent view by invoking the corresponding view refresher instances according to the content of the DataChangedDomainEvents (Table 9).

TABLE 8. PSEUDO CODE FOR LISTENER

Listener {
    // called by ESB when an event is received
    void onEvent(event) {
        Applier.apply(event)
    }
}


TABLE 9. PSEUDO CODE FOR APPLIER

Applier {
    // called by Listener
    void apply(event) {
        foreach view in viewsList
            if (view.lastAppliedEventIdentifier < event.identifier)
                view.beginTransaction()
                foreach diff in event.diffs
                    vrList = view.getVRList(diff.typeOfObject)
                    foreach refresher in vrList
                        switch (diff.typeOfChange)
                            case ADD: refresher.onAdd(diff.modifiedObject)
                            case UPD: refresher.onUpd(diff.modifiedObject)
                            case DEL: refresher.onDel(diff.modifiedObject)
                    endforeach
                endforeach
                view.lastAppliedEventIdentifier = event.identifier
                view.commitTransaction()
            endif
        endforeach
    }
}

The ordering aspect is important: even if events are guaranteed to be delivered in the right order by the event publication mechanism managed by Publisher / Event Bus / Listener, each eventually consistent view has to know the identifier of the last successfully applied event, in order to explicitly manage the data freshness and correctness. An event with the identifier N can only be applied on an eventually consistent view if the identifier L of the last event applied is smaller than N.
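This guard against stale or duplicate application can be reduced to a few lines (a hypothetical Java condensation of the check performed by the Applier):

```java
// Hypothetical condensation of the Applier's ordering check: an event
// is applied at most once, and only if it is newer than the last event
// the view has already absorbed.
class OrderedView {
    long lastAppliedEventIdentifier = 0;
    int appliedCount = 0;

    // returns true if the event was applied, false if it was skipped
    boolean apply(long eventIdentifier) {
        if (lastAppliedEventIdentifier >= eventIdentifier) return false; // duplicate or stale
        appliedCount++;                            // stand-in for the refresher calls
        lastAppliedEventIdentifier = eventIdentifier;
        return true;
    }
}
```

Because the check and the watermark update happen inside the same view transaction in FlexibleDL, a crash between them cannot leave an event half-applied.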

F. Query definition and execution

Using FlexibleDL, queries can be defined in two contexts: a strongly consistent context or an eventually consistent context. According to the needs, several kinds of queries can be distinguished:

• The strongly consistent queries, executed in strongly consistent context only.

• The eventually consistent queries, executed in eventually consistent context only.

• The queries that can be executed in both consistent contexts.

In the first and the second case, the queries have to be defined in the context corresponding to the needs.

The last case is interesting: what should happen if a given query is needed in both contexts? As previously explained, the strongly consistent views are defined by SQL queries, while the eventually consistent views are defined by a view refreshing process. So in this case, the query has to be defined separately for each execution context, and both definitions must be equivalent.
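The equivalence requirement can be illustrated with a toy Java example (hypothetical names): the same "count per product" result is computed once by a full query over the source data and once by the incrementally maintained view, and the two must agree once all changes have been applied:

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy illustration of one query defined in both contexts:
// - strongly consistent: recomputed from the source of truth on demand
// - eventually consistent: maintained incrementally from change events
class DualQuery {
    final List<String> orderedProducts = new ArrayList<>();   // source of truth
    final Map<String, Long> eventualView = new HashMap<>();   // maintained copy

    void addOrderLine(String product) {
        orderedProducts.add(product);                 // strongly consistent write
        eventualView.merge(product, 1L, Long::sum);   // change event applied to the view
    }

    // Strongly consistent definition: full recomputation (stand-in for SQL).
    Map<String, Long> strongCount() {
        return orderedProducts.stream()
                .collect(Collectors.groupingBy(p -> p, Collectors.counting()));
    }

    // Eventually consistent definition: read the maintained view as-is.
    Map<String, Long> eventualCount() { return eventualView; }
}
```

Keeping both definitions in sync is a developer responsibility in FlexibleDL; a mismatch between them would silently produce different results depending on the chosen context.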

G. System initialization

The system initialization is a pre-requisite to exploit the overall architecture. During this initialization phase, two cases have to be distinguished according to the state of the relational database (behind the strongly consistent layer):

• The relational database is empty, and the eventually consistent layer is empty too: this corresponds to the start of an empty system with no data at all. There is no problem: the strongly consistent layer and the eventually consistent layer are both empty and are filled together.

• The relational database is already filled with data: the eventually consistent layer has to be initially synchronized with the strongly consistent layer. For that purpose, a specific ResyncEvent is used to resynchronize an eventually consistent view with the equivalent strongly consistent view. When this event is fired, all the data managed by the strongly consistent views are copied to the eventually consistent views. This is a costly operation, but it is needed to restore the system integrity after a crash, or to initialize the use of FlexibleDL in an existing and running system.
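A resynchronization of this kind might look as follows (hypothetical Java sketch: the eventual view is cleared and rebuilt from a full strongly consistent read, and its event watermark is reset):

```java
import java.util.*;
import java.util.function.Supplier;

// Hypothetical ResyncEvent handling: the eventually consistent view is
// cleared and rebuilt from a snapshot of the strongly consistent view,
// and its watermark is reset to the sequence number of that snapshot.
class ResyncableView {
    final Map<String, Integer> data = new HashMap<>();
    long lastAppliedEventIdentifier = 0;

    void resync(Supplier<Map<String, Integer>> strongSnapshot, long snapshotEventId) {
        data.clear();
        data.putAll(strongSnapshot.get());              // bulk copy from the RDBMS side
        lastAppliedEventIdentifier = snapshotEventId;   // later events resume from here
    }
}
```

Resetting the watermark to the snapshot's sequence number matters: it makes the ordering check skip any event that is already reflected in the copied data.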

As a result, FlexibleDL is a data access layer that is able to manage different levels of consistency. In particular, the framework offers the choice of the level of consistency for the execution of a given query, by providing both eventually and strongly consistent ways to retrieve the query results.

IV. CASE STUDY

A. The framework

In order to validate the approach described in this paper, a Java framework has been developed. This framework is an implementation of the FlexibleDL data access layer, which provides different levels of consistency for read operations. The major components have been implemented according to the technical requirements previously explained in this paper. Our implementation choices are summarized in Table 10.

TABLE 10. IMPLEMENTATION DETAILS

Component                   | Technical requirements                           | Implementation
----------------------------|--------------------------------------------------|----------------------------------------------------------
Strongly Consistent Layer   | Fully ACID; able to capture data changes         | RDBMS: HSQLDB [23]; Repository: Hibernate's JPA 2.0 implementation [22]; data changes are captured using Hibernate Interceptors
Eventually Consistent Layer | BASE; eventually consistent; transaction support | Based on Hazelcast [19], a scalable data distribution platform: each eventually consistent view is built on a distributed hash map
Synchroniser component      | Event transport; event ordering                  | Event Bus: based on Hazelcast's message passing system


B. A sample use case

A typical business application has different parts, i.e. bounded contexts [18], each with specific goals and constraints. The modules managing data edition (CRUD: Create, Read, Update, Delete), for instance, need strong consistency, whereas the modules responsible for data reporting need more scalability and are good candidates to work in an eventually consistent manner.

As an example, a simple Order Management System (OMS) has been developed using this framework. Since an OMS mixes CRUD features (such as product and customer management), OLTP features (such as order taking), and OLAP features (such as sales statistics), it effectively demonstrates the functions offered by FlexibleDL. The corresponding domain model is shown in Table 11.

TABLE 11. DOMAIN MODEL

Customer { String name; }
Product { String name; }
PurchaseOrderLine { Integer quantity; Product product; }
PurchaseOrder { Date date; Customer customer; List<PurchaseOrderLine> orderLines; }

In the given example, let us consider the specific request for the X most purchased products. This request can be executed in different contexts with different needs. On the one hand, when the request supports a critical business feature, such as ranking providers by importance in order to pay them differently, it should be executed in a strongly consistent context. On the other hand, if the request only shows informative data on a website, it can be executed in an eventually consistent context in order to be more scalable.

Using FlexibleDL, this request can be implemented to be efficient in both contexts.

First of all, the strongly consistent view component that gives access to PurchasedProductCount DTOs in a strongly consistent context is built around a simple and classic HQL (Hibernate Query Language) query (Table 12).

TABLE 12. QUERY USED BY THE STRONGLY CONSISTENT VIEW

select new PurchasedProductCount(ol.product.id, ol.product.name, count(*))
from PurchaseOrderLine ol
group by ol.product.id, ol.product.name
order by count(*) desc
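In plain Java terms, the aggregation this HQL query performs can be sketched with in-memory collections. The sketch below is purely illustrative (the class and method names are hypothetical, not part of the framework): it counts the order lines per product and ranks the products by that count, exactly like the group by / count(*) / order by combination above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory equivalent of the HQL query of Table 12:
// count the order lines per product and return the X most
// purchased product names, most frequent first.
public class TopProductsSketch {
    static List<String> topPurchased(List<String> orderLineProducts, int x) {
        // group by product name and count occurrences (the HQL "group by ... count(*)")
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String p : orderLineProducts) {
            counts.merge(p, 1L, Long::sum);
        }
        // order by count descending and keep the top X
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(x, entries.size()))) {
            top.add(e.getKey());
        }
        return top;
    }
}
```

In the strongly consistent context this computation is of course delegated to the RDBMS; the sketch only makes the semantics of the query explicit.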

Secondly, the EventuallyConsistentView component that gives access to the same DTOs, but relying on an eventually consistent context, is built around a distributed map managed by Hazelcast (Table 13).

TABLE 13. EVENTUALLY CONSISTENT VIEW IMPLEMENTATION

PurchasedProductCountECView {
  // distributed map containing the DTOs
  contentMap = Hazelcast.getMap( );

  // for query
  List<PurchasedProductCount> executeQuery() {
    return contentMap.values();
  }

  // for view update
  List<ViewRefresher> getVRList(typeOfObject) {
    vr = new VRWithPurchaseOrderLine(this);
    return new List<ViewRefresher>(vr);
  }
}

In line with the definition of an eventually consistent view, this distributed map is updated by the Synchronizer component through an appropriate view refresher instance (Table 14).

TABLE 14. VIEW REFRESHER IMPLEMENTATION

VRWithPurchaseOrderLine {
  PurchasedProductCountECView view;

  void onAdd(PurchaseOrderLine ol) { ... }
  void onUpd(PurchaseOrderLine ol) { ... }
  void onDel(PurchaseOrderLine ol) { ... }
}
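The interplay between the eventually consistent view and its view refresher can be sketched in runnable Java. This is an illustrative sketch only, not the framework's actual code: a ConcurrentHashMap stands in for the Hazelcast distributed map, the counts are kept as plain longs instead of DTOs, and the event-handler signatures are simplified to product identifiers.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: the eventually consistent view keeps a map
// from product id to its purchase-line count, and the view refresher
// callbacks maintain it incrementally from PurchaseOrderLine change
// events, instead of re-running the full query.
public class PurchasedProductCountSketch {
    final Map<Long, Long> contentMap = new ConcurrentHashMap<>();

    // query side: return the current (possibly stale) counts
    Collection<Long> executeQuery() {
        return contentMap.values();
    }

    // refresher side: one order line added for the given product
    void onAdd(long productId) {
        contentMap.merge(productId, 1L, Long::sum);
    }

    // refresher side: one order line removed for the given product
    void onDel(long productId) {
        // decrement, and drop the entry when the count reaches zero
        contentMap.computeIfPresent(productId, (id, c) -> c > 1 ? c - 1 : null);
    }

    // refresher side: an order line moved from one product to another
    void onUpd(long oldProductId, long newProductId) {
        onDel(oldProductId);
        onAdd(newProductId);
    }

    long countOf(long productId) {
        return contentMap.getOrDefault(productId, 0L);
    }
}
```

The design choice illustrated here is the essential one: reads never touch the relational database, and the cost of keeping the view fresh is paid incrementally on each change event.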

By using FlexibleDL, the simple OMS presented in this paper is able to manage CRUD operations with strong consistency, while providing reporting features with either eventual or strong consistency, depending on the use case.

V. CONCLUSION

Consistency flexibility is an important feature to improve the scalability of distributed business applications without losing support for strong consistency where needed. The proposed architecture can be used to smoothly migrate existing business solutions to more scalable software architectures, or to respond to read-intensive business requirements, without changing the overall software architecture.

This paper has presented a solution that supports both eventually and strongly consistent contexts at the same time. By combining eventually consistent data storage with strongly consistent databases, it provides the means to explicitly choose whether to execute a query in a strongly consistent context or in an eventually consistent one, according to the specific consistency and scalability needs. In future work, we will measure the scalability gain for typical systems when using this solution, and analyze the benefits of introducing eventually consistent write operations.


REFERENCES

[1] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland, "The end of an architectural era: (it's time for a complete rewrite)", Proceedings of the 33rd International Conference on Very Large Data Bases, September 23-27, 2007, Vienna, Austria.

[2] M. Ameling, M. Roy, B. Kemme, "Understanding and Evaluating Replication in Service Oriented Multi-tier Architectures", Software and Data Technologies, Communications in Computer and Information Science, Volume 47, ISBN 978-3-642-05200-2, Springer-Verlag Berlin Heidelberg, 2009, p. 91.

[3] Sanny Gustavsson, Sten F. Andler, "Self-stabilization and eventual consistency in replicated real-time databases", Proceedings of the First Workshop on Self-Healing Systems, November 18-19, 2002, Charleston, South Carolina.

[4] H. Wada, A. Fekete, L. Zhao, K. Lee, A. Liu, "Data Consistency Properties and the Tradeoffs in Commercial Cloud Storages: the Consumers' Perspective", Conference on Innovative Data Systems Research, 2011, Asilomar, California.

[5] M. Serafini, F. Junqueira, "Weak consistency as a last resort", Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware, 2010, Zurich, Switzerland.

[6] Zhou Wei, Guillaume Pierre, Chi-Hung Chi, "Consistent Join Queries in Cloud Data Stores".

[7] D. Agrawal, A. El Abbadi, S. Antony, S. Das, "Data management challenges in cloud computing infrastructures", Databases in Networked Information Systems, Lecture Notes in Computer Science, Volume 5999, 2010, pp. 1-10.

[8] T. Repantis, A. Iyengar, V. Kalogeraki, I. Rouvellou, "Consistent Replication in Distributed Multi-Tier Architectures", 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2011.

[9] Ricardo Jimenez-Peris, Marta Patiño-Martinez, Bettina Kemme, Francisco Perez-Sorrosal, Damian Serrano, "A System of Architectural Patterns for Scalable, Consistent and Highly Available Multi-Tier Service-Oriented Infrastructures", Architecting Dependable Systems VI, Springer-Verlag, Berlin, Heidelberg, 2009.

[10] S. M. Fernandes, J. Cachopo, "A New Architecture for Enterprise Applications with Strong Transactional Semantics", Lisbon INESC-ID/IST, 2011.

[11] Eric Brewer, "Towards Robust Distributed Systems", PODC (Principles of Distributed Computing) Keynote, July 2000.

[12] Seth Gilbert, Nancy Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, and Partition-Tolerant Web Services", ACM SIGACT News, Volume 33, Issue 2, June 2002.

[13] Wayne W. Eckerson, "Three Tier Client/Server Architecture: Achieving Scalability, Performance, and Efficiency in Client Server Applications", Open Information Systems 10, 1, January 1995: 3(20).

[14] Neal Leavitt, "Will NoSQL Databases Live Up to Their Promise?", Computer, v.43 n.2, pp. 12-14, February 2010.

[15] D. Pritchett, "BASE: An Acid Alternative", ACM Queue, 6(3):48-55, 2008.

[16] Peter Alvaro, Neil Conway, Joe Hellerstein, William R. Marczak, "Consistency Analysis in Bloom: a CALM and Collected Approach", Conference on Innovative Data Systems Research, 2011, Asilomar, California.

[17] B. Meyer, "Object-oriented Software Construction", Prentice Hall International Series in Computer Science, New York: Prentice-Hall, 1988.

[18] E. Evans, "Domain-Driven Design", Addison-Wesley, Boston, 2003.

[19] http://www.hazelcast.com/

[20] Pat Helland, "Data on the Outside Versus Data on the Inside", CIDR 2005: 144-153.

[21] M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, G. Alonso, "Understanding Replication in Databases and Distributed Systems", Proceedings of the 20th International Conference on Distributed Computing Systems, p. 464, April 10-13, 2000.

[22] http://www.hibernate.org

[23] http://www.hsqldb.org

[24] http://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf

[25] http://martinfowler.com/bliki/CQRS.html

[26] Ashish Gupta, Inderpal Singh Mumick, "Maintenance of Materialized Views: Problems, Techniques, and Applications", IEEE Data Eng. Bull. 18(2): 3-18, 1995.
