
People's Democratic Republic of Algeria

Ministry of Higher Education and Scientific Research

El-oued University Faculty of Science and Technology Department of Computer Science

Order №:

Serial №:

LMD Report Master Option: Artificial Intelligence and Distributed Systems

2014/2015

Prepared by: Abderrazak HENKA

Abderrahman MOUSSAOUI

Supervised by: Mouhamen Anouar NAOUI

Synchronization algorithm for cloud databases on mobile application

Acknowledgement

We thank ALLAH before all. After that, our best thanks go to our professor and supervisor,
Mouhamen Anouar NAOUI, because he gave us courage and hope; we thank him a lot.

Abstract

Mobile application development is a relatively new and popular domain. Mobile
applications mainly connect users to data and information that come from the Internet.
However, a connection is not always available or reliable. Therefore, developers need to
make their applications work without a connection.

Our report proposes a data synchronization algorithm between a mobile device and the cloud.
This algorithm is inspired by existing algorithms and aims to make synchronization as
efficient as possible with respect to bandwidth usage, number of requests and mobile storage.

Résumé

The development of mobile applications is a relatively new and popular field. Mobile
applications mainly connect users to data and information that come from the Internet.

However, a connection is not always available or reliable. Developers therefore need to
make their applications work offline.

Our thesis proposes a data synchronization algorithm between the mobile device and the
cloud. This algorithm is inspired by other existing models and aims to make synchronization
as efficient as possible with respect to bandwidth usage, number of requests and mobile
storage.


Table of Contents

Acknowledgement
Abstract
Résumé
Table of Contents
List of Tables
List of Figures
Introduction
Chapter 1 - Cloud Databases
Overview
1.1 Cloud Database
1.1.1 Architecture
1.2 Advantages of Cloud Database
1.3 Disadvantages of Cloud Database
1.4 Relational Databases and NoSQL Databases
1.4.1 Relational Database
1.4.2 NoSQL Database
1.5 Challenges to Develop Cloud Databases
1.5.1 Scalability
1.5.2 Availability
1.5.3 Consistency and Integrity
1.5.4 Database Security and Privacy
1.6 Industry Practices in Cloud Databases
Summary
Chapter 2 - Data Synchronization
Overview
2.1 Theoretical Models
2.2 Synchronization techniques
2.2.1 Wholesale synchronization
2.2.2 Status flag synchronization
2.2.3 Timestamp synchronization
2.2.4 Mathematical synchronization
2.2.5 Log synchronization
2.3 Conflict resolution
2.4 The CAP Theorem
2.5 Related Work (State of art)
Summary
Chapter 3 - System Design
Overview
3.1 Choosing a synchronization model
3.2 A database model for synchronization
3.2.1 Distributed Identity
3.2.2 Object Versioning (timestamps)
3.2.3 Tracking Deletes
3.3 Synchronization algorithm
3.4 Flexible Conflict Resolution
3.5 Modelling Requirements: Use Cases Diagram
3.5.1 Capturing the system requirement
3.5.2 Search for actors (outside the system)
3.5.3 Capture Use Cases (Inside the system)
3.5.4 Use Case diagram
3.6 Modeling System Workflows: Activity Diagrams
3.6.1 Add Items on Mobile Database
3.6.2 Edit an Item on Mobile Database
3.6.3 Delete an Item from Mobile Databases
3.6.4 Add an Item on Cloud Database
3.6.5 Edit an Item to Cloud Databases
3.6.6 Delete an Item from Cloud Databases
3.6.7 Start Synchronization process
3.7 Modelling a System’s Logical Structure: Class Diagrams
Summary
Chapter 4 - Implementation
4.1 Issue tracker system
4.1.1 Graphical User Interface
4.2 Implementation Detail
Summary
General Conclusion
References

List of Tables

Table 1: RDB and NoSQL Databases Comparison
Table 2: Comparison between the Synchronization Techniques

List of Figures

Figure 1: Cloud Databases Architecture
Figure 2: Combination of ordered and unordered data in an object-based system [9]
Figure 3: Revision diagram of clients A and B both modifying the same property "item1"
Figure 4: The synchronization model using status flags and timestamp synchronization
Figure 5: Use Case diagram
Figure 6: Add Item on Mobile Database activity diagram
Figure 7: Edit an Item on Mobile Database activity diagram
Figure 8: Delete an Item on Mobile Databases
Figure 9: Add an Item on Cloud Database
Figure 10: Edit an Item to Cloud Databases
Figure 11: Delete an Item from Cloud Databases
Figure 12: Start Synchronization Process

Introduction

The number of connected mobile devices in the world is rapidly increasing. A report
from IDC estimates that 87% of connected device sales by 2017 will be tablets and
smartphones [1]. This large share indicates that the total number of connected mobile
devices will keep growing rapidly. These devices differ from desktop computers in many ways
because they serve different purposes. They connect people to information such as their
social networks, work information and emails.

Synchronization of data between cloud databases and mobile applications has been developed
as a solution for sharing this information. There is a lot of research on synchronization,
which shows that the optimal solution depends on the context.

The goal of this report is to present a solution that bridges the gap in sharing data
between cloud databases and mobile applications.

The research of this report is driven by the following questions:

How do existing synchronization solutions apply to the domain of cloud databases and
mobile applications?

How can we simplify data synchronization between cloud databases and mobile applications?

How can we optimize a synchronization process to reduce the usage of mobile resources
such as communication and computation on mobile devices?

This report is organized as follows. Chapter 1 provides an overview of cloud computing and
cloud databases and their impact on mobile device usage. Chapter 2 presents theoretical
models of the synchronization problem and the existing solutions. Chapter 3 discusses the
design of our proposed solution to the synchronization problem. Finally, a report on the
implementation is presented in Chapter 4.

Chapter 1 - Cloud Databases

Overview

Cloud computing has been one of the most attractive technologies in recent times, and
databases have also moved to the cloud. Clients access a cloud database over the Internet
through a cloud database service provider. In other words, a cloud database is designed
for a virtualized computing environment. It is implemented using cloud computing, which
means that it uses the software and hardware resources of the cloud computing service
provider.

Relational databases ruled the Information Technology (IT) industry for almost 40 years,
but the last few years have seen changes in the way IT is used and viewed. Standalone
applications have been replaced with web-based applications, and dedicated servers with
multiple distributed servers and dedicated network storage.

This chapter takes a tour of cloud databases and their advantages and disadvantages, then
compares relational databases with NoSQL databases. Finally, it discusses the challenges
of cloud database development.

1.1 Cloud Database

Cloud databases are mainly used for data-intensive applications such as data warehousing,
data mining and business intelligence. These applications are read-intensive, scalable and
elastic in nature. Transactional data management applications such as banking, airline
reservation, online e-commerce and supply chain management are write-intensive. Databases
supporting such applications require ACID (Atomicity, Consistency, Isolation and
Durability) properties, but these databases are difficult to deploy in the cloud. Cloud
computing is growing at a very high pace in the IT industry around the world. Many
companies have started moving towards cloud computing and accessing their data from cloud
databases. A survey has shown that almost 36 percent of companies are running applications
through cloud services (Mimecast Survey, 2011). Cloud computing can be regarded as a new
dimension in the IT world in terms of cost savings and faster application performance.
This trend suggests that, in the near future, companies will start relying on cloud
applications. A cloud database is mostly used as a service, which is also called Database
as a Service (DBaaS) [2].

1.1.1 Architecture

The cloud database holds its data in several data centers at different locations. This
makes the structure of a cloud database different from that of a relational database
management system, and more complex. A cloud database has multiple nodes designed for
query services, spread across data centers in different geographical locations as well as
the corporate data centers. This linking is mandatory for easy and complete access to the
database over cloud services. There are different ways to access the database over cloud
services: a user can access it from a computer through the Internet, or from a mobile
phone via 3G or 4G services (Pizzete and Cabot 2012). Figure 1 shows the cloud database
architecture [2].


Figure 1: Cloud Databases Architecture

1.2 Advantages of Cloud Database

Cloud computing has given a new dimension to the IT industry, and companies are looking to
adopt cloud services rather than investing large sums in the infrastructure for their own
database systems. With this advance in cloud computing, the cloud database is also gaining
pace and securing a permanent place in the IT world. A number of advantages make it
attractive to a large number of companies, chiefly the services it offers at a low cost.
Companies that do not use a cloud database have to invest heavily in setting up their own
data centers and then hire separate staff to manage all the data center processes. Here
are a few advantages of adopting a cloud database [3].

1- Technology has changed the way business is done: people now shop over the Internet and
rely on online shopping to save time. This change has made companies think about the
fastest way to do business over the Internet. There was a time when software had to be
installed to access the company database, but nowadays employees do not even have time to
install software on their computers; they prefer to use readily available resources. They
prefer a cloud database so that they can access the information stored in their database
without wasting any time. [3]

2- A cloud database also saves a lot of money. The company does not need to invest in
setting up its own data centers and then hire extra staff to manage them. Moreover, after
setting up a data center, the company would also need to buy software and pay for its
maintenance. [3]

3- Cloud database service providers, or DBaaS providers, also free the customer from the
burden of making immediate changes to the database. In addition, they offer scalability at
peak times, so that the performance of the company's applications does not degrade. [3]

4- Cloud computing gives the freedom to access information from anywhere, without being
tied to a personal computer at home. This makes it a very powerful technology, and
companies prefer it because customers, employees and company authorities can get the
information they want from anywhere at any time. [3]

5- There are many other benefits of a cloud database that make it the best option
available to larger organizations and companies that need to hold terabytes of data. The
cloud database makes data available anytime from anywhere. [3]

1.3 Disadvantages of Cloud Database

Alongside its advantages, a cloud database also has disadvantages, and some of them can be
alarming for companies.

1- Companies have to pay for their usage of the cloud database as agreed with the
provider. The company is charged every time data is transferred from the database. If the
company's data traffic with the database is high, it may end up paying more than it
expected. [3]

2- The customer does not have full control over the server where the database is held, nor
over the software installed on it. The customer cannot do anything to strengthen the
security of the cloud database and has to rely on the provider alone. Security issues can
therefore be a big problem for companies. [3]

3- Data hosted on a cloud database is totally dependent on the service provider. The data
and information about a company are the organization's most important asset. An
organization cannot afford to lose information about its customers and company policies.
If the information falls into the wrong hands, the company or organization may face heavy
losses. [3]

4- Because masses of data are hosted on the cloud database, it is very difficult to
transfer that data to a local computer; a high-speed Internet connection is required. A
traditional database, on the other hand, can transfer data at a very high speed. [3]

5- A client who wants to switch the database from one service provider to another may face
problems, because each service provider uses its own methods and techniques for storing
data. The organization must therefore be very careful when selecting a DBaaS provider. [3]

6- With a cloud database, the data is fetched via the Internet, so if the server is down,
the data cannot be accessed. This causes huge losses when the information is not available
when needed.

1.4 Relational Databases and NoSQL Databases

In the earlier stages of computerization, there was more demand for transaction processing
applications. As the database industry matured and people accepted computers as part and
parcel of their lives, analytical applications became the focus of enterprises. They now
wanted to store data not only for transaction processing, but also to analyze consumer
trends and business needs. Enterprises want to use analytical knowledge to enhance their
business value. Enterprise applications are therefore broadly categorized into
transactional and analytical applications. Relational databases played a dominant role in
handling transactional data. Later on, industry leaders like IBM and Oracle added
analytical capabilities to their relational databases for data mining applications. In the
meantime, a number of other databases, such as column databases and object-oriented
databases, came onto the market [4] [5], but they could not displace relational databases.
Then the Internet revolution and Web 2.0 applications started producing massive, sparse
and unstructured data. RDBMSs are not suitable for handling massive sparse data sets with
loosely defined schemas. The need to store and process such big data defined the role of
NoSQL databases as cloud databases. RDBMSs and NoSQL databases are briefly discussed below.

1.4.1 Relational Database

The concept of relational databases is forty years old. It worked best in an era of
hardware limits such as small disk space, little memory, slow processors and limited
networking. A relational database has a rigid architecture based on tables, columns,
indexes, relationships and schemas. Data is stored in tables with predefined, complex
relationships, and column indexes are used for faster search. Highly skilled developers
and DBAs are required for database design and maintenance. Conventionally, relational
databases are used for transactional data. They include details at the lowest granularity
and contain sensitive operational data, such as employee data and credit card numbers, to
handle critical business operations. These databases are not well suited to cloud
environments because they do not support full-content data search and are difficult to
scale beyond a limit [6] [7].

1.4.2 NoSQL Database

NoSQL means ‘Not Only SQL’ or ‘Not Relational’. A NoSQL database is defined as a
non-relational, shared-nothing, horizontally scalable database without ACID guarantees
[7]. NoSQL implementations are further classified into key/value stores, document stores,
object stores, tuple stores, column stores and graph stores. They can store and retrieve
unstructured, semi-structured and structured data. They are item-oriented: a domain can be
compared to a table and contains items that may have different schemas. The items are
identified by keys, and all data relevant to a particular item is stored within that item.
This improves the scalability of these databases, because complex joins are not required
to regroup data from multiple tables. They can replicate and distribute data over many
servers, and they are dynamically provisioned on demand.
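The item-oriented model described above can be illustrated with a minimal in-memory sketch.
It is not the API of any real NoSQL product: the domain is modelled as a plain Python
dictionary, and the names `Domain`, `put`, `get` and `issues` are hypothetical and chosen
only for this example.

```python
from typing import Any, Dict

# Hypothetical in-memory "domain": item key -> item (a free-form set of attributes).
Domain = Dict[str, Dict[str, Any]]


def put(domain: Domain, key: str, item: Dict[str, Any]) -> None:
    """Store an item under its key; no schema is enforced on the item."""
    domain[key] = dict(item)


def get(domain: Domain, key: str) -> Dict[str, Any]:
    """Fetch the whole item with a single key lookup, so no join is needed."""
    return domain[key]


issues: Domain = {}

# Two items in the same domain with different schemas.
put(issues, "issue-1", {"title": "Crash on start", "tags": ["bug", "android"]})
put(issues, "issue-2", {"title": "Add dark mode", "votes": 12, "assignee": {"name": "A. H."}})

print(sorted(get(issues, "issue-1")))  # attribute names of the first item
print(sorted(get(issues, "issue-2")))  # a different set of attribute names
```

Because everything that belongs to an item is stored inside that item, a single key lookup
returns all of it, which is what allows items to be distributed over many servers
independently of each other.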

NoSQL databases have emerged to address the requirements of data management in the cloud,
as they follow BASE (Basically Available, Soft state, Eventually consistent) in contrast
to the ACID guarantees. They are therefore not suitable for update-intensive transactional
applications. They provide high availability at the cost of consistency [8].

RDB | NoSQL Databases
Data within a database is treated as a “whole”. | Each entity is considered an independent unit of data and can be freely moved from one machine to another.
RDBMSs support a centrally managed architecture. | They follow a distributed architecture.
They are statically provisioned. | They are dynamically provisioned.
It is difficult to scale them. | They are easily scalable.
They provide SQL to query data. | They use an API to query data (not as feature-rich as SQL).
ACID (atomicity, consistency, isolation and durability) compliant; the DBMS maintains consistency. | Follow BASE (basically available, soft state, eventually consistent); user accesses are guaranteed only at a single-key level.
They support on-line transaction processing applications. | They support Web 2.0 applications.
Oracle, MySQL, SQL Server etc. are popular RDBMSs. | Amazon SimpleDB, Yahoo's PNUTS, CouchDB etc. are popular NoSQL databases.

Table 1: RDB and NoSQL Databases Comparison

1.5 Challenges to Develop Cloud Databases

Cloud DBMSs should support the features of cloud computing as well as those of traditional
databases for wider acceptability, which is a Herculean task. The potential challenges
associated with cloud databases are as follows:

1.5.1 Scalability

The main feature of the cloud paradigm is scalability, which implies that resources can be
scaled up or down dynamically without causing any interruption in the service. The cloud
database must be able to scale out when the workload increases, which helps it maintain
the best performance and efficiency. This challenges developers to design databases that
can support an unlimited number of concurrent users and unlimited data growth. Enterprises
deal with huge volumes of data. Adding servers on demand solves the problem of scalability
only if the process and workload are parallelizable.
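The condition on parallelizable workloads can be made concrete with Amdahl's law, which is
not part of the cited sources and is used here only as an illustration. It assumes that a
fixed fraction of the workload can be spread perfectly over the servers while the rest
stays serial. The function name `amdahl_speedup` and the sample fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, servers: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the workload that can run in parallel,
    servers (n) is the number of servers the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


# Compare a half-parallel, a mostly parallel and a fully parallel workload.
for p in (0.5, 0.9, 1.0):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 4, 16, 64)]
    print(f"parallel fraction {p}: {speedups}")
```

Under this model, a workload that is only half parallelizable can never run twice as fast,
however many servers are added, whereas a fully parallelizable workload speeds up in
proportion to the number of servers.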


1.5.2 Availability

Availability of a database implies that the database is up and running 24 hours a day,
7 days a week, 365 days a year. It becomes necessary to replicate data across large
geographic distances to provide high data availability, durability and high levels of
fault tolerance. For example, Amazon’s S3 cloud storage service replicates data across
“regions” and “availability zones”.

1.5.3 Consistency and Integrity

Data integrity is the most critical requirement of all business applications and is
maintained through database constraints. A lack of data integrity results in unexpected
outputs. Cloud databases follow BASE (Basically Available, Soft state, Eventually
consistent) in contrast to the ACID (Atomicity, Consistency, Isolation and Durability)
guarantees. Cloud databases therefore support only eventual consistency, because data is
replicated at multiple distributed locations. It becomes difficult to maintain the
consistency of a transaction in a database that changes very quickly, especially in the
case of transactional data. Developers need to follow the BASE approach cautiously and
should not compromise data integrity in their enthusiasm to move to cloud databases.
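The effect of eventual consistency can be sketched with two replicas of the same data. The
sketch below is a simplification under stated assumptions: the `Replica` class, the integer
version numbers and the "highest version wins" merge rule are hypothetical choices for
this example and do not describe how any particular cloud database replicates data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """A hypothetical replica holding key -> (version, value)."""

    name: str
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, version: int) -> None:
        """Apply a write locally; other replicas do not see it yet."""
        self.store[key] = (version, value)

    def read(self, key: str) -> Optional[str]:
        """Return the locally known value, which may be stale."""
        entry = self.store.get(key)
        return entry[1] if entry else None

    def merge_from(self, other: "Replica") -> None:
        """Replication step: adopt every entry that has a higher version."""
        for key, (version, value) in other.store.items():
            if key not in self.store or self.store[key][0] < version:
                self.store[key] = (version, value)


eu = Replica("eu")
us = Replica("us")

eu.write("balance", "100", version=1)
us.merge_from(eu)                      # both replicas agree on "100"

eu.write("balance", "80", version=2)   # update reaches only one replica
stale = us.read("balance")             # the other replica still answers with the old value
us.merge_from(eu)                      # replication catches up
fresh = us.read("balance")

print(stale, fresh)
```

Between the second write and the next replication step, the two replicas give different
answers for the same key. This window is what makes transactional data, such as an account
balance, risky to handle under the BASE approach.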

1.5.4 Database Security and Privacy

Data physically stored in a particular country is subject to the local rules and
regulations of that country. The US Patriot Act allows the government to demand access to
the data stored on any computer. Amazon S3 only allows a customer to choose between US and
EU data storage options. If data is encrypted using a key that is not located at the host,
it is a little safer. Storing transactional data on an untrusted host involves risks.
Sensitive data is encrypted before being uploaded to the cloud to prevent unauthorized
access. An application running in the cloud should not be able to directly decrypt the
data before accessing it. Providing security and privacy to different databases on the
same hardware is also a big challenge.


1.6 Industry Practices in Cloud Databases

Cloud databases are designed to minimize the amount of hardware needed. They scale out
easily by distributing the database across multiple hosts/nodes as the load increases.
NoSQL databases have become synonymous with cloud databases. A few cloud databases
commonly used in the industry are listed below.

Amazon Simple Storage Service (S3) and Databases

Amazon SimpleDB

Google App's Bigtable

Hadoop

Windows Azure Cloud Storage

Microsoft SQL Server Data Services (SSDS)

MongoDB


Summary

This chapter has outlined the concept of the cloud database and presented some of its
aspects. Cloud databases appear to be a good solution to companies’ problems, and many
companies have started relying on cloud computing. The massive data generated by web-based
applications has changed database concepts and scenarios as a whole. Data centers are so
expensive that not all organizations are able to buy their own. The cloud database makes
data centers available to organizations of any size, and the growing popularity of cloud
databases marks the beginning of a new era of databases. Though cloud databases are not
ACID compliant, they are able to handle the massive workloads of web-based applications.
Cloud computing and cloud databases are set to rule the next decade by overcoming the
limitations they have.


Chapter 2 - Data Synchronization

Overview

Synchronization is a well-researched problem because it is used within a wide range
of software applications. The problem is that data is located on different hosts, each holding
its own copy, while requiring all hosts to be connected to the system at all times is not practical.

Data synchronization is used in several computer science fields, such as databases, file
systems and version control. In database systems, data synchronization is used to keep
databases equal to each other. In file systems, it is used for cooperation on the same file
(Google Drive, box.com …). In version control systems (like Git and Subversion), it is used
to handle and track change sets.

As we saw in the previous chapter, the current move towards cloud services makes
synchronization a hot topic again. The goal of our report is data synchronization between
cloud databases and multiple mobile devices.

The previous chapter introduced the concept of cloud databases and mobile services.
This chapter focuses on the core of synchronization by presenting theoretical models
(Section 2.1), listing different synchronization techniques with their pros and cons
(Section 2.2) and discussing conflict resolution (Section 2.3). It also mentions the theoretical
limits that a synchronization solution has (Section 2.4).

2.1 Theoretical Models

Data synchronization is divided into two domains: ordered data and unordered data. The
sequence “a b c” has a different meaning than “a c b”, which makes it ordered data. In set
theory, however, {a, b, c} is equivalent to {a, c, b}, so a set is categorized as unordered data.
It is hard to find an efficient solution for the synchronization problem when the data model is
treated as ordered data. In practice, however, the data model is a combination of the two
models: only the property values must be handled as ordered data. Figure 2 gives an
overview of ordered and unordered data in an object-based system [9].

Figure 2: Combination of ordered and unordered data in an object based systems [9]

For the solution in our report, we simplify the problem by treating the data as a
combination of the ordered and unordered models. This means that conflicts at the level of
property values cannot be resolved by merging; they have to be resolved by selecting one
version.

The problem involves a few remote clients, each holding a different set of data, so the
clients need to learn about updates and compute the difference between a few sets. The
challenge is to restrict the synchronization to a minimum amount of communication.


2.2 Synchronization techniques

This section describes the techniques that currently exist for unordered data synchronization:

2.2.1 Wholesale synchronization

The wholesale approach is the simplest algorithm. When the data is synchronized, one
of the devices sends all of its local data to the other devices. The other devices compute the
differences and send back the updated data. This is very inefficient, because usually only a
few changes have been made, yet all data is sent over the network for the comparison. Its
advantages are that it guarantees all changes are transmitted and that it is simple to
implement. [10] [11]

2.2.2 Status flag synchronization

With status flag synchronization, a client maintains information about the data in the
form of status flags. These flags indicate whether an item was modified, deleted or created.
When synchronizing, the client sends only the items that have a flag set. This is more
efficient than wholesale synchronization. It does not work well, however, when the system
synchronizes with multiple clients: in addition to knowing which data has changed, we also
need to know with which clients each update has already been synchronized. [10]
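
The idea can be illustrated with a minimal sketch. The flag values and the helper names `itemsToSend` and `clearFlags` are assumptions made for this illustration; they are not taken from a specific library.

```javascript
// Hypothetical flag values; the names are assumptions for illustration.
const Flag = { UNCHANGED: 0, CREATED: 1, MODIFIED: 2, DELETED: 3 };

// Return only the items that carry a flag, i.e. the ones that must be sent.
function itemsToSend(items) {
  return items.filter((item) => item.flag !== Flag.UNCHANGED);
}

// After a successful synchronization the flags are cleared.
function clearFlags(items) {
  items.forEach((item) => { item.flag = Flag.UNCHANGED; });
}

const localItems = [
  { id: 'a', title: 'first', flag: Flag.UNCHANGED },
  { id: 'b', title: 'second', flag: Flag.MODIFIED },
  { id: 'c', title: 'third', flag: Flag.CREATED },
];
const pending = itemsToSend(localItems).map((item) => item.id);
console.log(pending); // ids of the flagged items
```

A single flag per item cannot record which of several peers has already received a change, which is the multi-client limitation described above.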

2.2.3 Timestamp synchronization

With timestamp synchronization, each client records the last time each data item was
changed and keeps one timestamp per client that represents the last synchronization with
that client. During synchronization, only the items changed since the last synchronization
have to be sent. This is an improvement over status flags, but it is very inefficient in one
situation: when two clients that are each fully synchronized with other clients synchronize
with each other for the first time, they both send all their data even though there are no
differences between them. [10]
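
A minimal sketch of this selection step follows. The field name `changedAt`, the peer names and the helper `changedSince` are assumptions for illustration only.

```javascript
// Each item records when it was last changed (milliseconds, assumed field name).
const items = [
  { id: 'a', changedAt: 1000 },
  { id: 'b', changedAt: 2500 },
  { id: 'c', changedAt: 4000 },
];

// One timestamp per peer: when we last synchronized with that peer.
const lastSyncWith = { phone: 2000, tablet: 4000 };

// Items changed after the last synchronization with the given peer.
// A peer we never synchronized with has no timestamp, so it receives everything.
function changedSince(allItems, peer) {
  const since = lastSyncWith[peer] === undefined ? -Infinity : lastSyncWith[peer];
  return allItems.filter((item) => item.changedAt > since);
}

console.log(changedSince(items, 'phone').map((item) => item.id));
console.log(changedSince(items, 'laptop').length); // first-time peer: all items
```

The last call shows the weakness mentioned above: a first-time peer receives every item, even if it already holds identical data.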


2.2.4 Mathematical synchronization

This approach uses mathematical properties of the data that needs to be synchronized.
Choi et al. use a message digest to determine the changes in the data that needs to be
synchronized [12]; synchronization based on a message digest is a form of mathematical
synchronization. Their solution is independent of vendor-specific database features and
uses only standard SQL operations. However, it requires additional tables on the server,
including one table for each client, and it is highly dependent on the relational database
model because it uses JOINs and foreign keys. [10] [12]
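
The following sketch illustrates the general idea of digest-based change detection. It is not the exact scheme of Choi et al.: the choice of SHA-256, the in-memory rows and the helper names `rowDigest` and `changedRowIds` are assumptions made for this example.

```javascript
const nodeCrypto = require('crypto');

// Digest of a row: hash of its column values in a fixed column order.
// A separator keeps ('ab', 'c') distinct from ('a', 'bc').
function rowDigest(row, columns) {
  const hash = nodeCrypto.createHash('sha256');
  columns.forEach((column) => hash.update(String(row[column]) + '\u0000'));
  return hash.digest('hex');
}

// Compare the current rows with the digests stored at the last synchronization.
function changedRowIds(rows, columns, storedDigests) {
  return rows
    .filter((row) => rowDigest(row, columns) !== storedDigests[row.id])
    .map((row) => row.id);
}

const columns = ['title', 'done'];
const rows = [
  { id: 1, title: 'write report', done: false },
  { id: 2, title: 'review code', done: false },
];
// Digests as they were stored at the last synchronization.
const storedDigests = {};
rows.forEach((row) => { storedDigests[row.id] = rowDigest(row, columns); });

rows[1].done = true;                                   // a local modification
rows.push({ id: 3, title: 'new task', done: false });  // a new row, no stored digest
console.log(changedRowIds(rows, columns, storedDigests)); // ids of modified or new rows
```

Only the digests have to be stored and compared, at the cost of recomputing a hash for every row that is checked.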

2.2.5 Log synchronization

The log synchronization approach is widely used in databases. Changes to the data are
tracked and saved in logs, and these logs are then synchronized with other clients. When
a log is synchronized, every operation is replayed on the other clients. Logs can grow
significantly, as they store all operations in addition to the normal data. [10]
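
Replaying a log can be sketched as follows. The operation format (`type`, `id`, `values`) and the helpers `applyOperation` and `replay` are assumptions for illustration.

```javascript
// Apply one logged operation to a store (a Map from id to item).
function applyOperation(store, op) {
  switch (op.type) {
    case 'insert':
    case 'update':
      store.set(op.id, Object.assign({}, store.get(op.id), op.values));
      break;
    case 'delete':
      store.delete(op.id);
      break;
    default:
      throw new Error('unknown operation: ' + op.type);
  }
}

// Replay every operation of a log, in order, on another replica.
function replay(store, log) {
  log.forEach((op) => applyOperation(store, op));
  return store;
}

const log = [
  { type: 'insert', id: 't1', values: { title: 'draft', done: false } },
  { type: 'update', id: 't1', values: { done: true } },
  { type: 'insert', id: 't2', values: { title: 'temp' } },
  { type: 'delete', id: 't2' },
];
const replica = replay(new Map(), log);
console.log(replica.size, log.length);
```

The replica ends up with a single item while the log holds four operations, which illustrates why logs grow faster than the data they describe.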

Data synchronization can work in a single direction or in both directions. Synchronization
in both directions is called bi-directional synchronization. When changes occur only on
the server and the data is read-only on the client, the client only retrieves the changes from
the server; this is called download-only synchronization. When items are used and modified
by a single client only, the client just has to send its new changes to the server; this is called
upload-only synchronization. [13]

2.3 Conflict resolution

When two clients synchronize data, conflicts may arise. Consider the following events,
which are also shown in the revision diagram in Figure 3 [14]:

Client A gets item 1 from the server.

Client B gets item 1 from the server.

Clients A and B go offline.

Client A makes a change to item 1 and goes online to synchronize.

Client B makes a change to the same property as Client A and goes online again to
synchronize.

Figure 3: Revision diagram of Client A and B both modifying the same property "item1"

A conflict arises when Client B tries to synchronize, and conflict resolution needs to
be performed. Several resolution policies have been identified [15], in addition to custom
policies that can be defined for specific applications:

Originator Wins: Take the data item of the originator

Recipient Wins: Take the data item of the recipient

Client Wins: Take the data item of the client

Server Wins: Take the data item of the server

Recent Wins: Take the data item which has been updated recently in time

Duplication Apply: The requested modification is applied on a duplicated data

item while keeping the existing data item.
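
Some of these policies can be expressed as small interchangeable functions. The sketch below covers only three of them; the field name `updatedAt` and the helper `resolve` are assumptions made for this illustration.

```javascript
// Each policy receives the client's version and the server's version of an item
// and returns the one to keep. updatedAt is an assumed modification time field.
const policies = {
  clientWins: (client, server) => client,
  serverWins: (client, server) => server,
  recentWins: (client, server) =>
    (client.updatedAt > server.updatedAt ? client : server),
};

// Look up a policy by name and apply it to a conflicting pair.
function resolve(policyName, client, server) {
  const policy = policies[policyName];
  if (!policy) throw new Error('unknown policy: ' + policyName);
  return policy(client, server);
}

const fromClient = { id: 'item1', title: 'edited by A', updatedAt: 10 };
const fromServer = { id: 'item1', title: 'edited by B', updatedAt: 20 };
console.log(resolve('recentWins', fromClient, fromServer).title);
```

Keeping the policies behind one function makes it possible to choose a different policy per application, or per table, without changing the synchronization code.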


2.4 The CAP Theorem

In theoretical computer science, the CAP theorem (also known as Brewer's theorem)
states that it is impossible for a distributed system to simultaneously provide all three of
the following guarantees:

Consistency. (all nodes see the same data at the same time)

Availability. (a guarantee that every request receives a response about whether it

succeeded or failed)

Partition Tolerance. (the system continues to operate despite arbitrary message loss

or failure of part of the system)

The theorem was proven in 2002, when Seth Gilbert and Nancy Lynch published a formal
proof [16].

2.5 Related Work (State of art)

Bayou is a platform of replicated mobile databases on which collaborative applications can be
built. It lets several users share data while being disconnected from the rest of the system. [17]

Xmiddle is a mobile middleware for transparent sharing of XML documents in P2P networks.
It supports sharing of tree-structured data between peers: each peer offers the others an access
point through which they can replicate the data, or manipulate it online, in order to work
offline. [18]

SodaSync is a framework that provides a generic synchronization model for mobile enterprise
applications. [19]

SyncML is a specification for an interoperable data synchronization framework that uses an
XML-based model. [20]

Rsync is a software application and network protocol for synchronizing files, available for
Unix-like systems and Windows. [21]


Summary

Data is divided into ordered and unordered data. In reality, almost all the data we need
to synchronize is a combination of the two models. To simplify the problem, we treat the
data as unordered. Several techniques currently exist to solve this problem, each with its
advantages and disadvantages. At the end of the synchronization process, a conflict
resolution policy needs to be applied.

The next chapter designs a synchronization system inspired by the existing approaches.


Chapter 3 - System Design

Overview

In the previous chapter we reviewed work done by others on data synchronization. In this chapter we
discuss the design of our proposed solution to the synchronization problem. This includes choosing a
synchronization model for the synchronization between the cloud database and the mobile device
(Section 3.1), determining the enhancements we suggest to the data structure (Section 3.2), the details of
the proposed algorithm (Section 3.3), a flexible conflict resolution (Section 3.4) and UML diagrams
(Sections 3.5, 3.6 and 3.7).

The result of this chapter is a detailed design that we implement in the next chapter.

3.1 Choosing a synchronization model

The synchronization model is the core of a synchronization solution. Many synchronization models are
available; they were discussed in Section 2.2. This section evaluates them and lists their advantages and
disadvantages, which are summarized in Table 2.

The wholesale synchronization approach is the easiest one, but it requires an unnecessary amount of
bandwidth, because the system needs to send all data every time we synchronize. On a mobile device this
data usage affects the user's data plan as well as energy consumption.

The mathematical synchronization approach can be computationally expensive. The approach by Choi et al.
requires a large number of tables stored on the server; this number grows linearly with the number of
clients.

The status flag synchronization approach works well for keeping track of data updates made locally on the
client. On the server, however, keeping status flags for every client can grow out of control, and every
change on the server requires updating the status flags of all clients. This approach is clearly not viable on
the server, because the space used increases linearly with the number of clients. On the client, status flags
allow the system to track multiple changes to a single object as one single change.


The timestamp synchronization approach is good for keeping track of changes in general. Its
disadvantages are that the type of change is not saved and that the delete operation is tricky. One option
for tracking deletes is to clear the timestamp when an item is deleted, but then the timestamp information
is lost. However, timestamps work significantly better on the server, because only a single timestamp per
data item needs to be recorded. It is very important that the client’s clock is correct for timestamp
synchronization to work correctly.

The log synchronization approach works for keeping track of local updates made while offline, but it
uses additional memory for storing the changes, and it does not allow us to collapse multiple changes
to a single object into a single change. On the server we still need to maintain information about the time
when the log was last synchronized with each client. Log synchronization has the advantage that the
changes can be replayed identically on the server.

Technique                      Bandwidth    Storage      Track server  Track local  Computational
                               efficiency   efficiency   changes       changes      complexity
Wholesale synchronization      -            +            n/a           n/a          +
Mathematical synchronization   +/-          +            n/a           n/a          -
Status flag synchronization    +            +            -             +            +
Timestamp synchronization      +            +            +             +/-          +
Log synchronization            +            -            -             +            +

- = Weak, + = Strong, +/- = Normal, n/a = not applicable

Table 2: Comparison between the synchronization techniques

We want both the download and the upload synchronization steps to be as efficient as possible with
respect to bandwidth usage, number of requests and storage. We therefore choose the timestamp approach
for download synchronization and the status flag approach for upload synchronization. Both methods find
precisely the updates made remotely and locally, and therefore allow us to send and receive updates using
a single request each. Figure 4 shows the synchronization model.


Status flags (client)            Timestamps (server)
Id       …   Status              Id       …   __version
EE7B…    …   0                   EE7B…    …   1432160341530
CV87A…   …   3                   CV87A…   …   1432160330303

Figure 4: The synchronization model using status flags and timestamp synchronization

The timestamps are generated on the server for each change to an item and should be unique per table.
On the server we can use the latest assigned timestamp to detect which data has changed since the last
synchronization. Timestamps are discussed further in Section 3.2.2.

To detect the local changes made on the client, we use status flags. These flags indicate whether an item
is unchanged, modified, deleted or inserted. Local changes are then very easy to find: they are the items
whose status flag is not equal to ‘unchanged’.

Multiple changes to the same item result in only a single change being saved, and the same holds for
deletes and updates applied to a locally inserted item.


3.2 A database model for synchronization

To make our synchronization model work correctly, we have to make some changes to the structure of the
data we store and to the information we save for each data item.

3.2.1 Distributed Identity

To support creating new data items locally on a client while it is offline, the system needs an
identity that can easily be created on a client while avoiding the same identity being assigned more than
once on different clients. An incremental numeric identity such as an integer is unsuitable in this case,
because a client cannot determine the next unique integer across all clients. A distributed identity is
needed, one that can be generated independently and without coordination on a client, with the smallest
possible probability of generating a duplicate.

There are several options for a distributed identity, each with its advantages and disadvantages: [17]

GUIDs:

A GUID or UUID is a 128-bit identifier, commonly displayed as 32 hexadecimal digits in groups
separated by hyphens, such as {5c8913f2-e2b4-11e4-8a00-1681e6b88ec1}. The Internet Engineering
Task Force (IETF) standardized the UUID format in RFC 4122 [18]. GUIDs can be generated from random
numbers. GUIDs are not 100% unique: if we generate 2^128 + 1 GUIDs, the probability of a duplicate is 1.
Note, however, that generating 2^128 GUIDs would take 10790283070806 years with a billion computers
that each generate a billion GUIDs per second, so these identifiers are unique enough to be chosen as the
distributed identity.
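
The figure quoted above can be checked with exact integer arithmetic. The sketch below assumes a 365-day year; the variable names are chosen for this illustration.

```javascript
// BigInt arithmetic keeps the 128-bit value exact.
const totalGuids = 2n ** 128n;
// A billion computers, each generating a billion GUIDs per second.
const guidsPerSecond = 10n ** 9n * 10n ** 9n;
// Assumption: a year of 365 days.
const secondsPerYear = 365n * 24n * 60n * 60n;
// Integer division, so the result is rounded down to whole years.
const years = totalGuids / guidsPerSecond / secondsPerYear;
console.log(years.toString()); // 10790283070806
```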

Time-based/network-based identifiers

Identifiers can be generated from a combination of the MAC address of the machine and the current
time. MAC addresses are assigned to network adapters by their manufacturers, who use a system that
ensures all addresses are unique. An identifier containing this address would therefore be globally unique,
but in virtualized environments the MAC address is generated locally and is therefore not necessarily
unique. Another problem arises when multiple processors generate an identifier at the same time: the
result is the same identifier. This can happen when the system time is changed or when multiple
applications run on the same device.


Hierarchical identifiers

This identifier consists of a global client identifier combined with a number generated locally on the
device. The global identifier is assigned by the server to each client. Since both parts are guaranteed to be
unique, the resulting identifier is globally unique. In this case a global identifier has to be generated for
each client, and the client needs to be able to generate the local identifier. This adds work for the server,
which has to track the clients, and this is not possible on every client platform, because not all of them
offer a way to uniquely identify an application. In addition, the identities could change after a temporary
data wipe or after reinstalling the application.

In our implementation we use GUIDs, since both alternatives have problems that are hard to solve,
while GUIDs are robust and collisions are very unlikely.
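
As an illustration, a client can assign such an identifier at creation time without contacting the server. The sketch below uses `randomUUID` from Node.js's built-in `crypto` module (available since Node.js 14.17), which returns a random (version 4) UUID. The helper `createItem` is an assumption for this example, and the generator used in the actual mobile application may differ.

```javascript
const nodeCrypto = require('crypto');

// Create an item offline with a client-generated identifier.
function createItem(values) {
  return Object.assign({ id: nodeCrypto.randomUUID() }, values);
}

const first = createItem({ title: 'first task' });
const second = createItem({ title: 'second task' });
console.log(first.id, second.id); // two different identifiers
```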

3.2.2 Object Versioning (timestamps)

Every item is assigned a timestamp, which is unique per table and can be seen as the current version of
the item. We use the term “version” to refer to these timestamps. The timestamps have two purposes:

- Change detection for download synchronization
- Conflict detection when an item is modified or deleted

When the system performs a download synchronization, it includes the most recently assigned timestamp
in the response to a request. The next time the client sends the same request, it includes this version number
in the request to get only the items changed since the last time the request was sent. This includes additions,
modifications and deletes.

When an item is changed, we can use the timestamp to check whether the item has been changed on
the server by others in the meantime.

3.2.3 Tracking Deletes

When an item is deleted, we cannot remove it from the server right away. There may still be clients
working offline with this item, and they need to be notified of the deletion when they synchronize. Instead
of removing a deleted item from the database, the system puts it into a deleted state that indicates the item
was removed. We do this by recording its state in a special flag on the server.
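
Versioning and delete tracking can be sketched together with an in-memory stand-in for one server table. All names (`createTable`, `nextVersion`, `upsert`, `softDelete`, `changesSince`) and the rule used to keep versions unique are assumptions for illustration, not the implementation described in Chapter 4.

```javascript
// In-memory stand-in for one server table; field names are assumptions.
function createTable() {
  return { items: new Map(), lastVersion: 0 };
}

// Versions must be unique per table, so never reuse or go below the last one.
function nextVersion(table, now) {
  table.lastVersion = Math.max(now, table.lastVersion + 1);
  return table.lastVersion;
}

function upsert(table, item, now) {
  const version = nextVersion(table, now);
  table.items.set(item.id, Object.assign({}, item, { version, deleted: false }));
}

// A delete keeps the row, flagged as deleted, so offline clients can learn about it.
function softDelete(table, id, now) {
  const existing = table.items.get(id);
  if (!existing) return;
  const version = nextVersion(table, now);
  table.items.set(id, Object.assign({}, existing, { version, deleted: true }));
}

// Download synchronization: everything changed after the client's last version.
function changesSince(table, since) {
  const changes = [...table.items.values()].filter((item) => item.version > since);
  return { version: table.lastVersion, changes };
}

const table = createTable();
upsert(table, { id: 'a', title: 'one' }, 100);
upsert(table, { id: 'b', title: 'two' }, 100); // same clock value, still a new version
const firstSync = changesSince(table, 0);
softDelete(table, 'a', 150);
const secondSync = changesSince(table, firstSync.version);
console.log(firstSync.changes.length, secondSync.changes.length);
```

The second synchronization returns only the deleted item, still present but flagged, which is how an offline client learns about the deletion.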


3.3 Synchronization algorithm

In this section we describe the algorithm that we use to synchronize changes. The
synchronization algorithm keeps the local databases synchronized with the remote database server. When
a synchronization request is made, the algorithm executes these steps:

- Synchronize all items whose status flag is not “unchanged”.
- Send the request to the server, including the last version stored for the request URI, and insert the
response (the changes) into the local database.
- Replay the request on the local database and return the changes from the server.

When the user is offline, the steps that send the request to the server are omitted.

For subsequent requests, the version received with the last response is included in the URI. This allows
the server to send only the modifications made since the last time the request was sent.

Conflict detection is done on the server, between the pulling and pushing processes.

When the application is offline, the algorithm needs to handle the following situations that can occur when
changes are made locally:

- A new item is created locally and then updated locally. After the update the item should still have the
“inserted” status instead of changing to “updated”, since it is still a new item for the server.
- A new item is created locally and then deleted. After the delete the item should be removed locally and
the server should never know about it, since we cannot send a delete request for an item that the server
does not know about.
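
The two offline situations above can be sketched as status transitions on a local store. The status values and the helpers `insertLocal`, `updateLocal`, `deleteLocal` and `pendingChanges` are assumptions for illustration.

```javascript
const Status = { UNCHANGED: 0, INSERTED: 1, UPDATED: 2, DELETED: 3 };

function insertLocal(store, item) {
  store.set(item.id, Object.assign({}, item, { status: Status.INSERTED }));
}

function updateLocal(store, id, values) {
  const item = store.get(id);
  if (!item || item.status === Status.DELETED) return;
  // An item the server has never seen stays "inserted".
  const status = item.status === Status.INSERTED ? Status.INSERTED : Status.UPDATED;
  store.set(id, Object.assign({}, item, values, { status }));
}

function deleteLocal(store, id) {
  const item = store.get(id);
  if (!item) return;
  if (item.status === Status.INSERTED) {
    store.delete(id); // never uploaded: the server must not hear about it
  } else {
    item.status = Status.DELETED; // keep it so the delete can be uploaded
  }
}

// The items to upload: everything whose status is not "unchanged".
function pendingChanges(store) {
  return [...store.values()].filter((item) => item.status !== Status.UNCHANGED);
}

const store = new Map();
store.set('s1', { id: 's1', title: 'from server', status: Status.UNCHANGED });
insertLocal(store, { id: 'n1', title: 'new' });
updateLocal(store, 'n1', { title: 'new, edited' });  // stays INSERTED
insertLocal(store, { id: 'n2', title: 'temporary' });
deleteLocal(store, 'n2');                             // removed without a trace
updateLocal(store, 's1', { title: 'edited once' });
updateLocal(store, 's1', { title: 'edited twice' });  // still one pending change
console.log(pendingChanges(store).map((item) => item.id));
```

Because the status is stored on the item itself, repeated edits to the same item collapse into a single pending change.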

3.4 Flexible Conflict Resolution

Conflict resolution is an important aspect of the synchronization solution. The system has to detect
conflicting modifications by checking whether the version of the incoming item matches the version of the
item in the cloud database. A conflict occurs when they are different. Modification of a deleted item is
detected by checking whether the item’s state is marked as “deleted”. In Section 2.3 we listed different
strategies that can be used to resolve conflicts.
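
The version check can be sketched as follows, with the resolution policy passed in as a function so that it stays exchangeable. The helper names and the conflict labels are assumptions for illustration.

```javascript
// Classify an incoming change against the item stored in the cloud database.
// incoming.version is the version the client last received for this item.
function detectConflict(incoming, stored) {
  if (!stored) return 'none';                      // new to the server
  if (stored.deleted) return 'modified-deleted';   // someone deleted it meanwhile
  if (incoming.version !== stored.version) return 'version-mismatch';
  return 'none';
}

// Apply a change; on conflict, delegate the decision to the given policy.
function applyChange(incoming, stored, policy, newVersion) {
  const conflict = detectConflict(incoming, stored);
  if (conflict === 'none') {
    return { conflict, item: Object.assign({}, incoming, { version: newVersion }) };
  }
  return { conflict, item: policy(incoming, stored, newVersion) };
}

// Two example policies (assumed shapes).
const serverWins = (incoming, stored) => stored;
const clientWins = (incoming, stored, newVersion) =>
  Object.assign({}, incoming, { version: newVersion });

const stored = { id: 'x', title: 'edit by B', version: 7, deleted: false };
const staleFromA = { id: 'x', title: 'edit by A', version: 5 };
console.log(applyChange(staleFromA, stored, serverWins, 8).conflict);
```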

3.5 Modelling Requirements: Use Cases Diagram

Use case diagrams describe a system’s requirements from the outside looking in: they specify the value
that the system delivers to the user.


3.5.1 Capturing the system requirement

In the following, we describe the system requirements:

Requirement 1:

The modelled system shall allow mobile users to (View/add/edit/delete) items locally, so that the changes
are cached for later synchronization.

Requirement 2:

The system shall allow web API users to (View/add/edit/delete) items in the cloud database, so that they
can be synchronized with mobile devices later.

Requirement 3:

Mobile users shall be able to trigger the synchronization process to be up to date with the cloud database.

Requirement 4:

The system needs to check the (Mobile/Web API) user’s identity via a “user credential web service”.

3.5.2 Search for actors (outside the system):

Based on the previous requirements, we search for actors and determine their interaction with the system.

Mobile User

We capture the “Mobile User” actor as described in requirements 1, 3 and 4. They indicate the mobile
user's interactions with the system:

(View/add/edit/delete) items from local database.

Trigger Synchronization process.

Identity checking

Web API User

We capture the “Web API User” actor as described in requirements 2 and 4. They indicate this user's
interactions with the system:

(View/add/edit/delete) items from cloud database

Identity checking

User credential web service

We capture the “user credential web service” actor as described in requirement 4. It is responsible for:

Checking the (Web API/Mobile) User identity to provide the data.

Giving (Web API/Mobile) Users the right to interact with the cloud database.

3.5.3 Capture Use Cases (Inside the system)

Based on the previous requirements, we create 6 use cases:

(View/Add/Edit/Delete) items from mobile database

(View/Add/Edit/Delete) items from cloud database

Trigger Synchronization

Conflict Resolution

Log in / Log out

Check Identity

3.5.4 Use Case diagram

Figure 5: Use Case Diagram


3.6 Modeling System Workflows: Activity Diagrams

Activity diagrams allow us to specify how our system will achieve its goals.

3.6.1 Add Items on Mobile Database

Figure 6: Add Item on Mobile Database Activity Diagram


3.6.2 Edit an Item on Mobile Database

Figure 7 : Edit an Item on Mobile Database Activity Diagram


3.6.3 Delete an Item from Mobile Databases

Figure 8: Delete an Item on Mobile Databases


3.6.4 Add an Item on Cloud Database

Figure 9: Add an Item on Cloud Database


3.6.5 Edit an Item to Cloud Databases

Figure 10: Edit an Item to Cloud Databases

3.6.6 Delete an Item from Cloud Databases

Figure 11: Delete an Item from Cloud Databases


3.6.7 Start Synchronization process

Figure 12: Start Synchronization process

3.7 Modelling a System’s Logical Structure: Class Diagrams

Summary

In this chapter we discussed the design of our synchronization solution. The solution has the
following characteristics:

- Status flag synchronization combined with timestamp synchronization. As discussed in
Section 3.1, our solution uses status flag synchronization on the client and timestamp
synchronization on the server.
- Database model. In Section 3.2, we defined the properties that the data items should have. Each data
item has a GUID identifier and a version timestamp to identify changes and conflicts.
- Synchronization algorithm. Our algorithm is discussed in Section 3.3 and modeled in Sections 3.5,
3.6 and 3.7. The system keeps the local database synchronized with the server: it always pushes the
changed items first and then detects the conflicts. After the push operation has completed, the
resulting changes are applied to the local database.
- Conflict detection and resolution. Conflicts are always detected and resolved on the server
(Section 3.4).

In the next chapter we describe our motivating example, “gdevTracker”.


Chapter 4 - Implementation

Overview

In Chapter 3, we designed the solution for data synchronization and determined the steps that the
algorithm has to take care of.

In this chapter we present the implementation used in the “gdevTracker” application, an “issue tracking
system” built for the project sponsor, the company “gamadev”, which serves as our motivating example.

This chapter starts by introducing the “issue tracking system - gdevTracker” and showing some graphical
user interfaces (Section 4.1), then gives implementation details (Section 4.2).

4.1 Issue tracker system

This section describes the “issue tracking system” application, which is used as the backend for the
mobile application.

“An issue tracking system is a computer software package that manages and maintains lists of issues,

as needed by an organization.

Issue tracking systems are commonly used in an organization's customer support call center to create,

update, and resolve reported customer issues” – Wikipedia [19]

The application is divided into two components (web and mobile), which live at different sites. The web
application interacts with the cloud database, and the mobile application interacts with the local database.


4.1.1 Graphical User Interface

Web Application

Mobile Application


4.2 Implementation Detail

The previous sections introduced the project and presented some of its graphical user interfaces. This
section gives snippets of some parts of the application. The most important part of the application is the
synchronization, which is written in the JavaScript programming language. For the Web API:

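// Route that starts a synchronization, requested at:
// http://localhost:3000/sync
// Assumes `router` is an Express router and `Tasks` is the Mongoose model,
// both defined elsewhere in the application.
router.post('/sync/', function (req, res, next) {
  // get the uploaded data
  var tasks = req.body.tasks || [];
  // get the last synchronization version known to the mobile device
  var lastSync = req.body.lastSync || 0;
  // one server-generated version for all changes in this request
  var version = Date.now();
  // apply the uploaded changes (add / edit / delete) and collect the pending writes
  var writes = tasks.map(function (task) {
    task._v = version;
    switch (task.flag) {
      case 1: // create task
        return new Tasks(task).save();
      case 2: // edit task
        return Tasks.update({ _id: task._id }, task).exec();
      case 3: // delete task: keep the item, marked as deleted
        task.deleted = true;
        return Tasks.update({ _id: task._id }, task).exec();
      default: // unchanged or unknown flag: nothing to write
        return null;
    }
  });
  // query the data to download only after every write has finished
  Promise.all(writes).then(function () {
    return Tasks.find({ _v: { $gt: lastSync } }).exec();
  }).then(function (changes) {
    // return the downloaded data together with the new version
    res.json({ synced: true, _v: version, changes: changes });
  }).catch(next);
});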


For mobile:

Our code is hosted in the following GitHub repositories:

https://github.com/moussaoui91/mobile

https://github.com/moussaoui91/cloud

// upload the items changed since the last synchronization
Tasks.changedItems().then(function (tasks) {
  return $http.post('http://' + $rootScope.hostname + ':3000/sync', {
    tasks: tasks,
    lastSync: $localstorage.get('lastSync')
  });
})
// handle the response (returned data)
.then(function (res) {
  var serverTasks = res.data.changes;
  // store the version returned by the server
  Tasks.synced(res.data._v);
  $localstorage.set('lastSync', res.data._v);
  // apply the server changes (add / edit / delete) to the mobile database
  serverTasks.forEach(function (serverTask) {
    serverTask.id = serverTask._id;
    if (serverTask.deleted) {
      Tasks.remove(serverTask);
    } else {
      Tasks.edit(serverTask).then(function (res) {
        // no row was updated, so the item is new on this device
        if (!res.res.rowAffected) {
          Tasks.add(res.task);
        }
      });
    }
  });
});


Summary

This chapter described the motivating example of our synchronization solution between a mobile
application and the cloud, presenting some GUIs and code snippets.

The data of our motivating example is hosted on mongolab.com. It is easy to configure a cloud
database through such services, and the interface is simple enough to be used without a user guide. The
code is hosted on GitHub.


General Conclusion

Although network connections are getting faster, there will always be situations
where the connection does not work well. It would be great if applications continued to work
in these situations. With our proposed synchronization solution, it is possible to add offline
support.

We tried our best to find answers to the questions we posed in the Introduction.

How do existing synchronization solutions apply to the domain of cloud databases
and mobile applications?

The existing solutions described in Section 2.2 are examples of techniques that are
used in the domain of mobile devices. Each one has its advantages and disadvantages. The
techniques are summarized below:

- Wholesale approach: one of the devices sends all its local data to the other device; the
other device computes the differences and sends back the updated data.
- Status flag approach: the client maintains information about the data in the form of status
flags and sends only the items which have a flag set.
- Timestamp approach: each client maintains information about the last time data was
changed; only the items changed since the last synchronization have to be synchronized.
- Mathematical approach: this approach uses mathematical properties of the data that
needs to be synchronized.
- Log approach: changes to the data are tracked and saved in logs, and these logs are
synchronized with other clients.

How can we simplify data synchronization between cloud databases and mobile
applications?

Data is divided into ordered data and unordered data. To simplify the problem, we
treat the data as a combination of ordered and unordered data.

How can we optimize a synchronization process to use a minimal amount of
communication and computation on the mobile device?

To optimize the computation on mobile devices, the best idea is to calculate remotely and
use the data locally. That means executing the calculation on the server and receiving the
results on the mobile device.

To reduce the communication between the mobile devices and the server, we integrate
the change information into the data items themselves and ignore data that is not needed.
For example, for each data item we check the status flag to decide whether the item needs
to be synchronized, rather than synchronizing it in all cases. If some data is not needed, we
do not communicate it.


References

[1] L. Columbus, "IDC: 87% of connected devices by 2017 will be tablets and
smartphones," 12 September 2013. [Online]. Available:
http://www.forbes.com/sites/louiscolumbus/2013/09/12/idc-87-of-connected-devices-by-2017-will-be-tablets-and-smartphones/.
[Accessed 3 4 2015].

[2] W. Shehri, "Cloud Database Database as a Service," IJDMS, vol. 5, no.
12.

[3] A. Indu and G. Anu, "Cloud Databases: A Paradigm Shift in Databases," IJCSI

International Journal of Computer Science, vol. 9, no. 4, 7 2012.

[4] K. Donald, K. Tim and L. Simon, "An Evaluation of Alternative Architectures for

Transaction Processing in the Cloud," SIGMOD, no. 10, 2010.

[5] Daniel and Abadi, Column-oriented Database Systems, VLDB, 2009.

[6] T. R. Singh, "Cloud Computing: An Analysis," International Journal of Enterprise

Computing and Business Systems, vol. 1, no. 2, pp. 2230-8849, 2011.

[7] R. Cattell, "Scalable SQL and NoSQL Data Stores," ACM SIGMOD, vol. 39, no. 4, pp.

12-27, 2011.

[8] A. Mathur, "Cloud Based Distributed Databases: The Future Ahead," International

Journal on Computer Science and Engineering (IJCSE), vol. 3, no. No, 2011.

[9] C. M. Melman, A Generative Approach for Data Synchronization between Web and
Mobile, Delft University of Technology, 2013.

[10] S. David, T. Ari and A. Sachin, Efficient PDA Synchronization, vol. 1, IEEE
Transactions on Mobile Computing, 2003, pp. 40-51.

[11] A. S, S. D and T. D, On the Scalability of Data Synchronization Protocols for PDAs and
Mobile Devices, vol. 4, Network, IEEE, 2002, pp. 22-28.

[12] M.-Y. Choi, E.-A. Cho, D.-H. Park, C.-J. Moon and D. K. Baek, A Database
Synchronization Algorithm for Mobile Devices, vol. 2, Consumer Electronics, IEEE
Transactions, 2010, pp. 392-398.

[13] Microsoft, Mar 2014. [Online]. Available:
http://msdn.microsoft.com/en-us/library/bb726039.aspx. [Accessed 09 04 2015].


[14] S. Burckhardt, M. Fähndrich, D. Leijen and B. P. Wood, "Cloud Types for Eventual
Consistency," pp. 283-307, 2012.

[15] Y. Ledd, Y. Kim and H. Choi, "Conflict Resolution of Data Synchronization in Mobile

Environment," vol. 3044, pp. 196-205, 2004.

[16] S. Gilbert and N. Lynch, "Brewer's Conjecture and the feasibility of Consistent,

Available, Partition-tolerant Web Services," pp. 51-59, June 2002.

[17] P. Lucas, Mobile Devices and Mobile Data-Issues of Identity and Reference., Vols. 2-4,

pp. 323-336.

[18] M. M. S. R. Leach P. [Online]. Available: http://www.ietf.org/rfc/rfc4122.txt. [Accessed

14 04 2015].

[19] WikiPedia, "Issue tracking system," [Online]. Available:

http://en.wikipedia.org/wiki/Issue_tracking_system. [Accessed 27 05 2015].

[20] B. Rajkumar and al, Cloud computing and emerging IT platforms: Vision, hype, and

reality for delivering computing as the 5th utility, vol. 25, Future Generation Computer

Systems, 2009, pp. 599-616.

[21] wikipidia, [Online]. Available: http://en.wikipedia.org/wiki/Issue_tracking_system.

[Accessed 04 2015].
