People's Democratic Republic of Algeria
Ministry of Higher Education and Scientific Research
El-oued University Faculty of Science and Technology Department of Computer Science
№ Ordre:
№ Serial:
LMD Report Master Option: Artificial Intelligence and Distributed Systems
2014/2015
Prepared by: Abderrazak HENKA
Abderrahman MOUSSAOUI
Supervised by: Mouhamen Anouar NAOUI
Synchronization algorithm for cloud databases on mobile
application
Acknowledgement
We thank ALLAH before all else. After that, our deepest thanks go to our professor and
supervisor, Mouhamen Anouar NAOUI, for the courage and hope he gave us.
Abstract
Mobile application development is a relatively new and popular domain. Mobile
applications mainly connect users to data and information that come from the Internet.
However, a connection is not always available or reliable. Therefore, developers need to
make their applications work without a connection.
Our report proposes a data synchronization algorithm between a mobile device and the cloud.
The algorithm is inspired by existing algorithms and aims to make synchronization as
efficient as possible with respect to bandwidth usage, number of requests and mobile
storage.
Résumé
Mobile application development is a relatively new and popular field. Mobile applications
mainly connect users to data and information that come from the Internet.
However, a connection is not always available or reliable. Consequently, developers need
to make their applications work offline.
Our thesis proposes a data synchronization algorithm between the mobile device and the
cloud. The algorithm is inspired by certain models that already exist and aims to make
synchronization as efficient as possible with respect to bandwidth usage, number of
requests and mobile storage.
Table of Contents
Acknowledgement
Abstract
Résumé
Table of Contents
List of Tables
List of Figures
Introduction
Chapter 1 - Cloud Databases
Overview
1.1 Cloud Database
1.1.1 Architecture
1.2 Advantages of Cloud Database
1.3 Disadvantages of Cloud Database
1.4 Relational Databases and NoSQL Databases
1.4.1 Relational Database
1.4.2 NoSQL Database
1.5 Challenges to Develop Cloud Databases
1.5.1 Scalability
1.5.2 Availability
1.5.3 Consistency and Integrity
1.5.4 Database Security and Privacy
1.6 Industry Practices in Cloud Databases
Summary
Chapter 2 - Data Synchronization
Overview
2.1 Theoretical Models
2.2 Synchronization techniques
2.2.1 Wholesale synchronization
2.2.2 Status flag synchronization
2.2.3 Timestamp synchronization
2.2.4 Mathematical synchronization
2.2.5 Log synchronization
2.3 Conflict resolution
2.4 The CAP Theorem
2.5 Related Work (State of art)
Summary
Chapter 3 - System Design
Overview
3.1 Choosing a synchronization model
3.2 A database model for synchronization
3.2.1 Distributed Identity
3.2.2 Object Versioning (timestamps)
3.2.3 Tracking Deletes
3.3 Synchronization algorithm
3.4 Flexible Conflict Resolution
3.5 Modelling Requirements: Use Cases Diagram
3.5.1 Capturing the system requirement
3.5.2 Search for actors (outside the system)
3.5.3 Capture Use Cases (Inside the system)
3.5.4 Use Case diagram
3.6 Modeling System Workflows: Activity Diagrams
3.6.1 Add Items on Mobile Database
3.6.2 Edit an Item on Mobile Database
3.6.3 Delete an Item from Mobile Databases
3.6.4 Add an Item on Cloud Database
3.6.5 Edit an Item to Cloud Databases
3.6.6 Delete an Item from Cloud Databases
3.6.7 Start Synchronization process
3.7 Modelling a System’s Logical Structure: Class Diagrams
Summary
Chapter 4 - Implementation
4.1 Issue tracker system
4.1.1 Graphical User Interface
4.2 Implementation Detail
Summary
General Conclusion
References
List of Tables
TABLE 1: RDB AND NOSQL DATABASES COMPARISON
TABLE 2: COMPARISON BETWEEN THE SYNCHRONIZATION TECHNIQUES
List of Figures
FIGURE 1: CLOUD DATABASES ARCHITECTURE
FIGURE 2: COMBINATION OF ORDERED AND UNORDERED DATA IN AN OBJECT BASED SYSTEMS [9]
FIGURE 3: REVISION DIAGRAM OF CLIENT A AND B BOTH MODIFYING THE SAME PROPERTY "ITEM1"
FIGURE 4: THE SYNCHRONIZATION MODEL USING STATUS FLAGS AND TIMESTAMP SYNCHRONIZATION
FIGURE 5: USE CASE DIAGRAM
FIGURE 6: ADD ITEM ON MOBILE DATABASE ACTIVITY DIAGRAM
FIGURE 7: EDIT AN ITEM ON MOBILE DATABASE ACTIVITY DIAGRAM
FIGURE 8: DELETE AN ITEM ON MOBILE DATABASES
FIGURE 9: ADD AN ITEM ON CLOUD DATABASE
FIGURE 10: EDIT AN ITEM TO CLOUD DATABASES
FIGURE 11: DELETE AN ITEM FROM CLOUD DATABASES
FIGURE 12: START SYNCHRONIZATION PROCESS
Introduction
The number of connected mobile devices in the world is increasing rapidly. A report
from IDC estimates that 87% of connected device sales by 2017 will be tablets and
smartphones [1]. This large share suggests that the total number of connected devices
will keep growing rapidly. These devices differ from desktop computers in many ways
because they serve different purposes: they connect people to information such as their
social networks, work information and email.
Synchronization of data between cloud databases and mobile applications has been developed
as a solution for sharing information. A large body of research on synchronization shows
that the optimal solution depends on the context.
The goal of this report is to present a solution that bridges the gap in sharing data
between cloud databases and mobile applications.
The research of this report is driven by the following questions:
How do existing synchronization solutions apply to the domain of cloud databases and
mobile applications?
How can we simplify data synchronization between cloud databases and mobile
applications?
How can we optimize a synchronization process to reduce the usage of mobile resources
such as communication and computation on mobile devices?
This report is organized as follows. Chapter 1 provides an overview of cloud
computing and cloud databases and their impact on mobile device usage. Chapter 2 presents
theoretical models of the synchronization problem and the existing solutions. Chapter 3
discusses the design of our proposed solution to the synchronization problem. Finally,
Chapter 4 reports on the implementation.
Chapter 1 - Cloud Databases
Overview
Cloud computing has been among the most attractive technologies in recent times, and
databases have also moved to the cloud. Clients access a database over the Internet
from a cloud database service provider. In other words, a cloud database is designed for a
virtualized computing environment. It is implemented using cloud computing, which means
that it uses the software and hardware resources of the cloud computing service provider.
Relational databases ruled the Information Technology (IT) industry for almost 40
years, but the last few years have seen changes in the way IT is used and viewed.
Standalone applications have been replaced with web-based applications, and dedicated
servers with multiple distributed servers and dedicated network storage.
This chapter takes a tour of cloud databases and their advantages and disadvantages,
then compares relational databases with NoSQL databases. Finally, it discusses the
challenges in cloud database development.
1.1 Cloud Database
Cloud databases are mainly used for data-intensive applications such as data
warehousing, data mining and business intelligence. These applications are read-intensive,
scalable and elastic in nature. Transactional data management applications such as banking,
airline reservation, online e-commerce and supply chain management applications are
write-intensive. Databases supporting such applications require ACID (Atomicity,
Consistency, Isolation and Durability) properties, but these databases are difficult to deploy
in the cloud. Cloud computing is growing at a very high pace in the IT industry around the
world. Many companies have started moving towards cloud computing and accessing their
data from cloud databases. A survey has shown that almost 36 percent of companies are
running applications through cloud services (Mimecast Survey, 2011). Cloud computing
can be regarded as a new dimension in the IT world in terms of cost savings and faster
application performance. This trend suggests that in the near future companies will
rely on cloud applications. A cloud database is mostly used as a service, which is also
called Database as a Service (DBaaS) [2].
1.1.1 Architecture
A cloud database holds its data in different data centers located at different locations.
This makes the structure of a cloud database different from that of a relational database
management system, and more complex. A cloud database has multiple nodes, designed for
query services, that link data centers located in different geographical locations as
well as corporate data centers. This linking is mandatory for easy and complete access to
the database over the cloud services. There are different methods for accessing the
database over the cloud services: a user can access it from a computer through the
Internet, or from a mobile phone via 3G or 4G services (Pizzete and Cabot 2012). Figure 1
shows the cloud database architecture. [2]
Figure 1: Cloud Databases Architecture
1.2 Advantages of Cloud Database
Cloud computing has given a new dimension to the IT industry, and companies are
looking to adopt cloud services rather than investing large sums in infrastructure
for their own database systems. With this advance in cloud computing, the cloud database
is also establishing a permanent place in the IT world. A number of advantages make it
attractive to a large number of companies, chiefly its services at a low cost. Companies
that do not use a cloud database have to invest heavily in setting up their own data
centers and then hire separate staff to manage all the data center processes. Here are a
few advantages of adopting a cloud database. [3]
1- Technology has changed the way business is done: people now shop over the Internet
and rely on online shopping to save time. This change has led companies to think about
the fastest way to do business over the Internet. There was a time when software had to
be installed to access a company's database, but nowadays employees do not have time to
install software on their computers and prefer ready-to-use resources. They prefer the
cloud database because they can access the information stored in their database without
wasting any time. [3]
2- The other advantage of using a cloud database is that it saves a lot of money. The
company does not need to invest in setting up its own data centers and then hire extra
staff to manage them. Moreover, after setting up a data center, the company would also
need to buy software and pay for its maintenance. [3]
3- Cloud database service providers, or DBaaS providers, also free the customer from
the burden of making immediate changes to the database. In addition, they offer
scalability at peak times, so that the company's performance does not degrade. [3]
4- Cloud computing gives the freedom to access information from anywhere, without being
tied to a personal computer at home. This makes it a very powerful technology, and
companies prefer it because customers, employees or company authorities can get the
information they want from anywhere at any time. [3]
5- There are many other benefits of cloud databases as well, which make them the best
option available to larger organizations and companies that need to hold terabytes of
data. The cloud database makes data available anytime from anywhere. [3]
1.3 Disadvantages of Cloud Database
As there are advantages of using a cloud database, there are disadvantages as well,
and these can sometimes be alarming for companies.
1- Companies pay for the usage of the cloud database as agreed with the provider. Every
time data is transferred from the database, the company has to pay. If the company's
data traffic with the database is high, it may pay more than it expects. [3]
2- The other disadvantage of using a cloud database is that the client does not have
full control over the server where the database is held, nor over the software installed
on it. The client cannot strengthen the security of the cloud database and has to rely
on the provider alone. Security issues can therefore be a big problem for companies. [3]
3- The data hosted on the cloud database is totally dependent on the service
provider. The data and information about a company are its most important asset. An
organization cannot afford to lose information about its customers and company policies.
If the information falls into the wrong hands, the organization may face heavy losses. [3]
4- Because masses of data are hosted on the cloud database, it is very difficult to
transfer that data to a local computer; a high-speed Internet connection is required. A
traditional database, on the other hand, can transfer data at a very high speed. [3]
5- A client who wants to switch from one service provider to another may face problems,
because each service provider uses its own methods and techniques for storing data. The
organization must therefore be very careful in selecting a DBaaS provider. [3]
6- With a cloud database, data is fetched via the Internet, so if the server is down,
the data cannot be accessed. This causes huge losses when the information is not
available when needed.
1.4 Relational Databases and NoSQL Databases
In the earlier stages of computerization, there was more demand for transaction
processing applications. As the database industry matured and people accepted computers
as part and parcel of their lives, analytical applications became the focus of enterprises.
They now wanted to store data not only for transaction processing, but also to analyze
consumer trends and business needs. Enterprises want to use analytical knowledge to enhance
their business value. Enterprise applications are therefore broadly categorized into
transactional and analytical applications. Relational databases played the dominant role
in handling transactional data. Later on, industry leaders like IBM and Oracle added
analytical capabilities to their relational databases for data mining applications. In the
meantime, a number of other databases, such as column databases and object-oriented
databases, came onto the market [4] [5], but they could not displace relational databases.
Then the Internet revolution and Web 2.0 applications started producing massive, sparse
and unstructured data. RDBMSs are not suitable for handling massive sparse data sets with
loosely defined schemas. The need to store and process such big data defined the role of
NoSQL databases as cloud databases. RDBMSs and NoSQL databases are briefly discussed
below.
1.4.1 Relational Database
The concept of relational databases is forty years old. It worked best in an era of
hardware limits such as small disk space, little memory, slow processors and limited
networking. A relational database has a rigid architecture based on tables, columns,
indexes, relationships and schemas. Data is stored in tables with predefined complex
relationships, and column indexes are used for faster search. Highly skilled developers
and DBAs are required for database design and maintenance. Conventionally, relational
databases are used for transactional data. They include details at the lowest granularity
and contain sensitive and operational data, such as employee data and credit card numbers,
to handle critical business operations. These databases are not well suited to the cloud
environment because they do not support full-content data search and are difficult to scale
beyond a limit [6] [7].
1.4.2 NoSQL Database
NoSQL means ‘Not Only SQL’ or ‘Not Relational’. A NoSQL database is defined as a
non-relational, shared-nothing, horizontally scalable database without ACID guarantees
[7]. NoSQL implementations are classified further into key/value stores, document stores,
object stores, tuple stores, column stores and graph stores. They can store and retrieve
unstructured, semi-structured and structured data. They are item-oriented: a domain can be
compared to a table and contains items that may have different schemas. Items are
identified by keys, and all data relevant to a particular item is stored within that item.
This improves the scalability of these databases because complex joins are not required to
regroup data from multiple tables. They can replicate and distribute data over many
servers, and they are dynamically provisioned on demand.
NoSQL databases have emerged to address the requirements of data management in the cloud.
They follow BASE (Basically Available, Soft state, Eventually consistent) in contrast to
the ACID guarantees, so they are not suitable for update-intensive transactional
applications. They provide high availability at the cost of consistency [8].
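The item-oriented model described above can be illustrated with a minimal sketch. It
assumes a plain in-memory dictionary standing in for a domain; the names `domain`,
`put_item` and `get_item` are hypothetical and do not correspond to the API of any real
NoSQL product.

```python
# Hypothetical in-memory "domain": a collection of schema-less items keyed by id.
domain = {}


def put_item(key, **attributes):
    """Store all attributes of one item under a single key."""
    domain[key] = dict(attributes)


def get_item(key):
    """Fetch a whole item with one key lookup; no join across tables is needed."""
    return domain.get(key)


# Two items in the same domain with different schemas.
put_item("user:1", name="Amina", email="amina@example.org")
put_item("issue:7", title="Sync fails offline", tags=["sync", "mobile"], votes=3)

print(get_item("issue:7")["tags"])
print(sorted(get_item("user:1")))
```

Because each item carries all of its own data, items can be placed on different servers
independently, which is the property the paragraph above links to scalability.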
RDB | NoSQL Databases
Data within a database is treated as a “whole”. | Each entity is considered an independent unit of data and can be freely moved from one machine to another.
RDBMSs support a centrally managed architecture. | They follow a distributed architecture.
They are statically provisioned. | They are dynamically provisioned.
They are difficult to scale. | They are easily scalable.
They provide SQL to query data. | They use an API to query data (not as feature-rich as SQL).
ACID (atomicity, consistency, isolation and durability) compliant; the DBMS maintains consistency. | They follow BASE (basically available, soft state, eventually consistent); user accesses are guaranteed only at a single-key level.
They support online transaction processing applications. | They support Web 2.0 applications.
Oracle, MySQL, SQL Server, etc. are popular RDBMSs. | Amazon SimpleDB, Yahoo's PNUTS, CouchDB, etc. are popular NoSQL databases.
Table 1: RDB and NoSQL Databases Comparison
1.5 Challenges to Develop Cloud Databases
Cloud DBMSs should support the features of cloud computing as well as those of
traditional databases for wider acceptability, which is a Herculean task. The potential
challenges associated with cloud databases are as follows:
1.5.1 Scalability
The main feature of the cloud paradigm is scalability, which implies that resources can
be scaled up or down dynamically without interrupting the service. A cloud database must be
able to scale out when the workload increases, which helps it maintain performance and
efficiency. This challenges developers to build databases that can handle very large
numbers of concurrent users and sustained data growth. Enterprises deal with huge volumes
of data. Adding servers on demand solves the problem of scalability only if the process
and workload are parallelizable.
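One common way to quantify the last point is Amdahl's law, which this report does not cite;
it is used here only as an assumed model. The sketch below computes the ideal speedup when
a fraction of the workload can be spread over a number of servers. The function name and
the sample fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction, servers):
    """Ideal speedup when only `parallel_fraction` of the work can use `servers` machines."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


# Speedup for 1, 4, 16 and 64 servers at different degrees of parallelism.
for fraction in (0.5, 0.9, 1.0):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 4, 16, 64)]
    print(fraction, speedups)
```

Under this model, a workload that is only half parallelizable can never run more than
twice as fast, no matter how many servers are added.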
1.5.2 Availability
Availability of a database implies that the database is up and running 24 hours a day, 7
days a week, 365 days a year. It becomes necessary to replicate data across large
geographic distances to provide high data availability, durability and fault tolerance.
Amazon's S3 cloud storage service replicates data across “regions” and “availability
zones”.
1.5.3 Consistency and Integrity
Data integrity is the most critical requirement of all business applications and is
maintained through database constraints. A lack of data integrity results in unexpected
outputs. Cloud databases follow BASE (Basically Available, Soft state, Eventually
consistent) in contrast to the ACID (Atomicity, Consistency, Isolation and Durability)
guarantees: because data is replicated at multiple distributed locations, they support
only eventual consistency. It becomes difficult to maintain the consistency of a
transaction in a database that changes very quickly, especially in the case of
transactional data. Developers need to follow the BASE approach cautiously and should not
compromise data integrity in their enthusiasm to move to cloud databases.
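Eventual consistency can be illustrated with a minimal sketch. It assumes a
last-write-wins rule based on timestamps, which is only one possible convergence rule and
is not prescribed by the text above; the `Replica` class and the sample updates are
hypothetical, and equal timestamps are not handled.

```python
class Replica:
    """Hypothetical replica that stores key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def apply(self, key, timestamp, value):
        # Last-write-wins: keep only the update with the highest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


updates = [("balance", 1, 100), ("balance", 2, 80)]
a, b, c = Replica(), Replica(), Replica()

a.apply(*updates[0])
a.apply(*updates[1])          # replica a has received both updates
b.apply(*updates[0])          # replica b has only seen the first one
stale = (a.read("balance"), b.read("balance"))   # the replicas disagree for now

b.apply(*updates[1])          # the delayed update finally reaches b
c.apply(*updates[1])
c.apply(*updates[0])          # replica c receives the updates out of order
converged = (a.read("balance"), b.read("balance"), c.read("balance"))

print(stale, converged)
```

Between the two reads, a client querying replica b sees an outdated balance, which is
the kind of risk the paragraph above warns about for transactional data.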
1.5.4 Database Security and Privacy
Data physically stored in a particular country is subject to the local rules and
regulations of that country. The US Patriot Act allows the government to demand access to
data stored on any computer. Amazon S3 only allows a customer to choose between US and EU
data storage options. If data is encrypted using a key that is not located at the host, it
is somewhat safer. Storing transactional data on an untrusted host involves risks.
Sensitive data is encrypted before being uploaded to the cloud to prevent unauthorized
access. No application running in the cloud should be able to decrypt the data directly
before accessing it. Providing security and privacy to different databases on the same
hardware is also a big challenge.
1.6 Industry Practices in Cloud Databases
Cloud databases are designed to minimize the amount of hardware required. They scale out
easily by distributing the database across multiple hosts/nodes as the load increases. NoSQL
databases have become synonymous with cloud databases. A few cloud databases commonly used
in the industry are listed below.
Amazon Simple Storage Service (S3) and Databases
Amazon SimpleDB
Google App's Bigtable
Hadoop
Windows Azure Cloud Storage
Microsoft SQL Server Data Services (SSDS)
MongoDB
Summary
This chapter has outlined the concept of the cloud database and presented some of its
aspects. Cloud databases appear to be a good solution to companies' problems, and many
companies have started relying on cloud computing. The massive data generated by web-based
applications has changed database concepts and scenarios as a whole. Data centers are so
expensive that not all organizations can afford their own. The cloud database makes data
centers available to organizations of any size, and the growing popularity of cloud
databases marks the beginning of a new era of databases. Though cloud databases are not
ACID compliant, they are able to handle the massive workloads of web-based applications.
Cloud computing and cloud databases are set to rule the next decade by overcoming the
limitations they have.
Chapter 2 - Data Synchronization
Overview
Synchronization is a well-researched problem because it is used within a wide range
of software applications. The problem is that data is located on different hosts, each of
which has a copy, and requiring all hosts to be connected to the system at all times is
not practical.
Data synchronization is used in several computer science fields such as databases, file
systems and version control. In database systems, data synchronization is used to keep
databases identical to each other. In file systems, it is used for cooperation
on the same file (Google Drive, box.com, ...). In version control systems (like Git and
Subversion), it is used to handle and track change sets.
As we saw in the previous chapter, the current move to cloud services makes the
synchronization process a hot topic again. The goal of our report is data synchronization
between cloud databases and multiple mobile devices.
The previous chapter introduced the concept of cloud databases and mobile services.
This chapter focuses on the core of synchronization by presenting theoretical models
(Section 2.1), listing different synchronization techniques with their pros and cons
(Section 2.2), and discussing conflict resolution (Section 2.3). It also mentions the
theoretical limits of any synchronization solution (Section 2.4).
2.1 Theoretical Models
Data synchronization is divided into two domains: ordered data and unordered data. The sequence
“a b c” has a different meaning than “a c b”, so it is ordered data. In set theory, however, {a, b,
c} is equivalent to {a, c, b}, so a set is categorized as unordered data. It is hard to find an efficient
solution for the synchronization problem when the whole data model is treated as ordered data.
In practice, however, the data model is a combination of the two models: only
the property values must be handled as ordered data. Figure 2 shows an overview of ordered and
unordered data in an object-based system [9].
Figure 2: Combination of ordered and unordered data in an object based systems [9]
For the solution in our report, we simplify the problem by treating the data as a
combination of the ordered and unordered models. This means that conflicts
at the level of property values cannot be solved by merging, but have to be solved by
selecting one version.
The problem involves a few remote clients, each holding a different set of data, so the
clients need to learn about updates and compute the difference between their sets. The challenge
is to restrict the synchronization to a minimum amount of communication.
2.2 Synchronization techniques
This section describes the techniques that currently exist in unordered data synchronization:
2.2.1 Wholesale synchronization
The wholesale approach is the simplest algorithm. When the data is synchronized, one
of the devices sends all of its local data to the other devices. The other devices compute the
differences and send back the updated data. This is inefficient, because usually only a few
changes have been made, yet all data is sent over the network to make the comparison. Its
advantages are that it guarantees that all changes are transmitted and that it is simple to implement.
[10] [11]
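The wholesale approach can be sketched in a few lines of JavaScript. This is only a minimal illustration: the item shape ({ id, value }) and the function name wholesaleDiff are our own assumptions and are not taken from [10] or [11].

```javascript
// Wholesale synchronization sketch: the sender ships its complete data set
// and the receiver computes the differences itself.
// The item shape { id, value } is a hypothetical example.
function wholesaleDiff(receiverItems, senderItems) {
  const local = new Map(receiverItems.map((item) => [item.id, item]));
  const remote = new Map(senderItems.map((item) => [item.id, item]));
  const diff = { added: [], changed: [], removed: [] };
  for (const [id, item] of remote) {
    if (!local.has(id)) diff.added.push(item);
    else if (local.get(id).value !== item.value) diff.changed.push(item);
  }
  for (const [id, item] of local) {
    if (!remote.has(id)) diff.removed.push(item);
  }
  return diff;
}

const receiver = [{ id: "a", value: 1 }, { id: "b", value: 2 }];
const sender = [{ id: "a", value: 1 }, { id: "b", value: 5 }, { id: "c", value: 3 }];
// All three sender items cross the network, although only two of them differ.
const diff = wholesaleDiff(receiver, sender);
console.log(diff.added.length, diff.changed.length, diff.removed.length);
```

The sketch makes the cost visible: the size of the transfer depends on the size of the data set, not on the number of changes.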
2.2.2 Status flag synchronization
With status flag synchronization, a client maintains information about the data in the
form of status flags. These flags indicate whether an item was modified, deleted or created. When
synchronizing, the client sends only the items that have a flag set. This is more efficient
than wholesale synchronization. However, it does not work well when the system synchronizes with
multiple clients: in addition to knowing which data has changed, we
also need to track with which clients each update has already been synchronized.
[10]
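As a minimal sketch of this technique, the following JavaScript selects the flagged items and clears the flags after a successful upload. The numeric flag values and the function names are assumptions made for illustration only.

```javascript
// Status flag sketch. The numeric flag values are an assumption for illustration.
const Status = { UNCHANGED: 0, CREATED: 1, MODIFIED: 2, DELETED: 3 };

// Only items that carry a flag are uploaded.
function itemsToSend(items) {
  return items.filter((item) => item.status !== Status.UNCHANGED);
}

// After a successful upload, deleted items are dropped and the other flags are cleared.
function markSynced(items) {
  return items
    .filter((item) => item.status !== Status.DELETED)
    .map((item) => ({ ...item, status: Status.UNCHANGED }));
}

const localItems = [
  { id: "a", status: Status.UNCHANGED },
  { id: "b", status: Status.MODIFIED },
  { id: "c", status: Status.DELETED },
];
const upload = itemsToSend(localItems);   // items "b" and "c"
const afterSync = markSynced(localItems); // items "a" and "b", both unchanged
console.log(upload.length, afterSync.length);
```

Because there is a single flag per item, clearing it after synchronizing with one peer loses the information that other peers have not yet seen the change, which is the multi-client weakness described above.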
2.2.3 Timestamp synchronization
When timestamp synchronization is used, each client records
the last time each data item was changed, and a timestamp per client records the last synchronization with
that client. During synchronization, only the items changed since the last synchronization have
to be sent. This is an improvement over status flags, but it is very inefficient when
two clients that are each fully synchronized with other clients synchronize with one another for the first
time: both send all their data even though there are no changes between them. [10]
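The following JavaScript sketch illustrates the idea and its weakness. The field name modifiedAt and the per-peer bookkeeping in lastSync are hypothetical names chosen for this example.

```javascript
// Timestamp synchronization sketch. modifiedAt and lastSync are hypothetical names.
function changesSince(items, lastSyncWithPeer) {
  return items.filter((item) => item.modifiedAt > lastSyncWithPeer);
}

const items = [
  { id: "a", modifiedAt: 100 },
  { id: "b", modifiedAt: 250 },
];
// One timestamp per peer: the time of our last synchronization with it.
const lastSync = new Map([["clientB", 200]]);

// Known peer: only the item changed after time 200 is sent.
const forClientB = changesSince(items, lastSync.get("clientB"));
// Unknown peer: there is no timestamp yet, so every item is sent, even if the
// peer already holds identical data that it obtained from someone else.
const forClientC = changesSince(items, lastSync.get("clientC") || 0);
console.log(forClientB.length, forClientC.length);
```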
2.2.4 Mathematical synchronization
This approach uses mathematical properties of the data that needs to be synchronized. Choi
et al. use a message digest to determine the changes to the data that needs to be synchronized
[12]; synchronization based on message digests is thus a form of mathematical synchronization.
This solution is independent of vendor-specific database features and uses only standard SQL
operations. However, it requires additional tables on the server, including
a table for each client, and it is highly dependent on the relational database model because it
relies on JOINs and foreign keys. [10] [12]
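To illustrate only the general idea of detecting changes through digests, the sketch below compares the current digest of each row with a previously stored one. It is not the algorithm of Choi et al., which works with SQL tables on the server; the row shape and the function names are our own assumptions, and SHA-256 is chosen simply because it is available in the Node.js crypto module.

```javascript
// Message digest sketch (not the algorithm of [12]): a row has changed when its
// current digest differs from the digest stored at the last synchronization.
const { createHash } = require("crypto");

function digest(row) {
  // Serialize the fields in a fixed key order so that equal rows give equal digests.
  const canonical = JSON.stringify(Object.keys(row).sort().map((key) => [key, row[key]]));
  return createHash("sha256").update(canonical).digest("hex");
}

function changedRowIds(rows, storedDigests) {
  return rows
    .filter((row) => storedDigests.get(row.id) !== digest(row))
    .map((row) => row.id);
}

const rows = [{ id: 1, title: "a" }, { id: 2, title: "b" }];
// Digests recorded at the last synchronization.
const storedDigests = new Map(rows.map((row) => [row.id, digest(row)]));
rows[1].title = "b2"; // a local modification made afterwards
const changedIds = changedRowIds(rows, storedDigests);
console.log(changedIds);
```

The comparison itself needs no change tracking inside the application, at the price of computing and storing one digest per row and per synchronization partner.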
2.2.5 Log synchronization
The log synchronization approach is widely used in databases. Changes
to the data are tracked and saved in logs, and these logs are then synchronized with other clients. When
a log is synchronized, every operation is replayed on the other clients. Logs can grow
significantly, as they store all operations in addition to the normal data. [10]
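A minimal JavaScript sketch of replaying a log on another replica follows. The operation shape ({ type, id, data }) and the function names are hypothetical.

```javascript
// Log synchronization sketch. The operation shape { type, id, data } is hypothetical.
function applyOperation(store, op) {
  switch (op.type) {
    case "insert":
    case "update":
      store.set(op.id, { ...(store.get(op.id) || {}), ...op.data });
      break;
    case "delete":
      store.delete(op.id);
      break;
    default:
      throw new Error("unknown operation type: " + op.type);
  }
}

// Replays every log entry from position `from` onwards and returns the new log position.
function replay(store, log, from) {
  for (const op of log.slice(from)) applyOperation(store, op);
  return log.length;
}

const log = [
  { type: "insert", id: "t1", data: { title: "draft" } },
  { type: "update", id: "t1", data: { title: "final" } },
  { type: "insert", id: "t2", data: { title: "other" } },
  { type: "delete", id: "t2" },
];
const replica = new Map();
const position = replay(replica, log, 0);
console.log(position, replica.get("t1").title, replica.has("t2"));
```

Four log entries are transferred here although the replica ends up with a single item, which shows why logs can grow faster than the data itself.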
Data synchronization can work in a single direction or in both directions. Synchronization
in both directions is called bi-directional synchronization. When changes occur only on
the server and the data is read-only on the client, the client only retrieves the changes from the server;
this is called download-only synchronization. When items are used and modified only by a single
client, the client only has to send its new changes to the server; this is called upload-only
synchronization. [13]
2.3 Conflict resolution
When two clients synchronize data, their changes may conflict.
Consider the following events, which are also shown in the revision diagram in Figure 3 [14]:
- Client A gets item 1 from the server.
- Client B gets item 1 from the server.
- Clients A and B go offline.
- Client A makes a change to item 1 and goes online to synchronize.
- Client B makes a change to the same property as Client A and goes online again to
synchronize.
Figure 3: Revision diagram of Client A and B both modifying the same property "item1"
A conflict arises when Client B tries to synchronize, and conflict resolution needs to
be performed. Several resolution policies have been identified [15], in addition to custom
policies that can be used for specific applications:
- Originator Wins: Take the data item of the originator
- Recipient Wins: Take the data item of the recipient
- Client Wins: Take the data item of the client
- Server Wins: Take the data item of the server
- Recent Wins: Take the data item which has been updated most recently
- Duplication Apply: The requested modification is applied to a duplicate of the data
item while keeping the existing data item.
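Some of these policies can be expressed as small functions that receive the two conflicting versions and return the item or items to keep. The sketch below is only an illustration: the field updatedAt, the argument order and the way a new identifier is passed in are our own assumptions and are not taken from [15].

```javascript
// Conflict resolution policy sketch. updatedAt and the (client, server) argument
// order are hypothetical; each policy returns the list of items to keep.
const policies = {
  clientWins: (client, server) => [client],
  serverWins: (client, server) => [server],
  recentWins: (client, server) =>
    [client.updatedAt > server.updatedAt ? client : server],
  // Keep the existing item and store the requested modification as a copy under a new id.
  duplicationApply: (client, server, newId) => [server, { ...client, id: newId }],
};

function resolve(policyName, client, server, newId) {
  const policy = policies[policyName];
  if (!policy) throw new Error("unknown policy: " + policyName);
  return policy(client, server, newId);
}

const serverItem = { id: "item1", title: "from A", updatedAt: 10 };
const clientItem = { id: "item1", title: "from B", updatedAt: 20 };
const recent = resolve("recentWins", clientItem, serverItem);
const both = resolve("duplicationApply", clientItem, serverItem, "item1-copy");
console.log(recent[0].title, both.length);
```

Keeping the policies in a lookup table makes it easy to add a custom policy for a specific application without touching the rest of the synchronization code.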
2.4 The CAP Theorem
In theoretical computer science, the CAP theorem (also known as Brewer's theorem) states
that it is impossible for a distributed system to simultaneously provide all
three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it
succeeded or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss
or failure of part of the system)
The theorem was proven in 2002, when Seth Gilbert and Nancy Lynch published a proof of
it [16].
2.5 Related Work (State of art)
Bayou is a platform of replicated mobile databases on which collaborative applications can be built.
It lets several users share data while being disconnected from the rest of the system.
[17]
Xmiddle is a mobile middleware for the transparent sharing of XML documents in P2P
networks. It supports sharing of tree-structured data between peers; each peer offers an access point
to the others, so that they can replicate the data, manipulate it online, or work on it offline. [18]
SodaSync is a framework that provides a generic synchronization model for mobile enterprise
applications. [19]
SyncML is a specification for an interoperable data synchronization framework using an
XML-based model. [20]
Rsync is a software application and network protocol for synchronizing files on Unix-like and Windows systems. [21]
Summary
Data is divided into ordered and unordered data. In reality, almost all of the data we need to
synchronize is a combination of the two models. To simplify the problem, we treat
the collection of items as unordered data and only the property values as ordered data. Several
techniques currently exist to solve the synchronization problem, each
with its advantages and disadvantages. At the end of the synchronization process, a conflict
resolution policy needs to be applied.
The next chapter designs a synchronization system inspired by the existing approaches.
Chapter 3 - System Design
Overview
In the previous chapter we reviewed work done by others on data synchronization. In this chapter we
discuss the design of our proposal for the synchronization problem. This includes choosing a
synchronization model to perform the synchronization between the cloud database and the mobile device (Section 3.1),
determining the enhancements we suggest to the data structure (Section 3.2), detailing the proposed
algorithm (Section 3.3), creating a flexible conflict resolution (Section 3.4) and presenting UML diagrams
(Sections 3.5, 3.6 and 3.7).
We consider the result of this chapter to be a detailed design that we can implement in the next chapter.
3.1 Choosing a synchronization model
The synchronization model is the core of a synchronization solution. Many synchronization models are
available and were discussed in Section 2.2. This section reviews them critically and lists their advantages and
disadvantages, which are summarized in Table 2.
The wholesale synchronization approach is the easiest one, but it requires an unnecessary amount of bandwidth,
because the system needs to send all data each time we synchronize. On a mobile device this data usage has
an impact on the user's data plan and on energy usage as well.
The mathematical synchronization approach can be computationally expensive, and the approach by Choi et al.
requires a large number of tables stored on the server. This number depends linearly on the number of
clients.
The status flag synchronization approach works well for keeping track of data updates made locally on the
client. On the server, however, keeping status flags for every client can grow out of control, and
every change on the server has to update the status flags of all clients. This approach is therefore not a
viable solution on the server, because the amount of space used increases linearly with the number of
clients. On the client, status flags allow the system to track multiple changes to a single object as one
single change.
The timestamp synchronization approach is good for keeping track of changes in general. Its
disadvantages are that the type of change is not saved and that the delete operation is tricky. One option
for tracking deletes is to clear the timestamp when an item is deleted, but then the timestamp information is lost.
However, timestamps work significantly better on the server, because only a single timestamp per data
item needs to be recorded. It is very important that the client's clock is correct for timestamp
synchronization to work correctly.
The log synchronization approach works for keeping track of local updates made while offline, but
it uses additional memory for storing the changes, and it does not allow us to collapse multiple changes
to a single object into a single change. On the server we still need to maintain information about the time
when the log was last synchronized with each client. The log synchronization technique has the advantage that the
changes can be replayed identically on the server.
Technique                       Bandwidth     Storage       Track server   Track local   Computational
                                efficiency    efficiency    changes        changes       complexity
Wholesale synchronization       -             +             n/a            n/a           +
Mathematical synchronization    +/-           +             n/a            n/a           -
Status flag synchronization     +             +             -              +             +
Timestamp synchronization       +             +             +              +/-           +
Log synchronization             +             -             -              +             +
- = Weak, + = Strong, +/- = Normal, n/a = not applicable
Table 2: Comparison between the synchronization techniques
We want both the download and the upload synchronization steps to be as efficient as possible with
respect to bandwidth usage, number of requests and storage. We therefore choose the timestamp approach for
download synchronization and the status flag approach for upload synchronization. Both methods
find precisely the updates made remotely and locally, and therefore allow us to send and receive
updates using a single request each. Figure 4 shows the synchronization model.
Status flags (client)          Timestamps (server)
Id       …   Status            Id       …   __version
EE7B…    …   0                 EE7B…    …   1432160341530
CV87A…   …   3                 CV87A…   …   1432160330303
Figure 4: The synchronization model using status flags and timestamp synchronization
These timestamps are generated on the server for each change to an item and should be unique per table.
On the server we can use the latest assigned timestamp to detect which data has changed since the last
synchronization. The timestamps are discussed further in Section 3.2.2.
To detect the local changes made on the client, we use status flags. These flags indicate whether an item
is unchanged, modified, deleted or inserted. Local changes are then very easy to find: they are the items
whose status flag is not equal to ‘unchanged’.
Multiple changes to the same item result in only a single change being saved, and the same holds for
deletes and updates applied to a locally inserted item.
3.2 A database model for synchronization
To make our synchronization model work correctly, we have to make some changes to the structure of the
data we store and to the information we save for each data item.
3.2.1 Distributed Identity
To support creating new data items locally on a client while it is offline, the system needs
identities that can easily be created on a client without assigning the same identity twice on
different clients. An incrementing numeric identity such as an integer is unsuitable in this case, because we
cannot determine the next unique integer across all clients. What is needed is a distributed identity that
can be generated independently and without coordination on a client, with the smallest possible probability of generating
a duplicate.
There are several options for distributed identities, each with its advantages and disadvantages: [17]
GUIDs:
A GUID or UUID is a 128-bit identifier, commonly displayed as 32 hexadecimal digits in groups
separated by hyphens, such as {5c8913f2-e2b4-11e4-8a00-1681e6b88ec1}. The Internet Engineering
Task Force (IETF) standardized the UUID format in RFC 4122 [18]. GUIDs can be generated from random
numbers. GUIDs are not 100% unique: if we generate 2^128 + 1 GUIDs, the probability of a duplicate is 1.
Note, however, that generating 2^128 GUIDs would take 10790283070806 years with a billion computers that
each generate a billion GUIDs per second, so these identifiers are unique enough to be chosen as distributed identities.
Time-based/network-based identifiers
Identifiers can be generated from a combination of the MAC address of the machine and the current
time. A MAC address is assigned to a network adapter by its manufacturer, and manufacturers use a system that ensures all
addresses are unique. An identifier containing this address would therefore be globally unique, but in a
virtualized environment the MAC address is generated locally and is therefore not necessarily unique. Another
problem arises when multiple processes generate an identifier at the same time, which results in
the same identifier. This can happen when the system time is changed or when multiple applications run on
the same device.
Hierarchical identifiers
This identifier consists of a global client identifier combined with a number generated locally on the
device. The global identifier is assigned by the server to each client. Since both parts are guaranteed to be unique,
the resulting identifier is globally unique. In this case a global identifier has to be generated for each client, and
the client needs to be able to generate the local identifier. This adds work for the
server, which has to track the clients, and that is not possible on every client platform, because not all of
them offer a way to uniquely identify an application. In addition, the identities could change due to a
data wipe or a reinstallation of the application.
In our implementation we use GUIDs, since both alternatives have problems that are hard to solve, while
GUIDs are robust and the chance of a collision is very small.
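The estimate given for GUIDs can be checked with exact integer arithmetic. The sketch below assumes a year of 365 days and uses JavaScript BigInt values; it also generates one identifier with the randomUUID function of the Node.js crypto module, which is only one possible generator (it produces random, version 4 UUIDs in which some of the 128 bits are fixed) and is not necessarily the one used in our implementation.

```javascript
// Checking the GUID estimate with BigInt arithmetic (a year is taken as 365 days).
const { randomUUID } = require("crypto");

const totalGuids = 2n ** 128n;                  // number of distinct 128-bit values
const guidsPerSecond = 10n ** 9n * 10n ** 9n;   // a billion computers, a billion GUIDs per second each
const secondsPerYear = 365n * 24n * 60n * 60n;
const years = totalGuids / guidsPerSecond / secondsPerYear; // integer division
console.log(years.toString()); // 10790283070806

// Generating an identifier on the client, without any coordination with the server.
const id = randomUUID();
console.log(id.length); // 36 characters: 32 hexadecimal digits and 4 hyphens
```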
3.2.2 Object Versioning (timestamps)
Every item is assigned a timestamp, which is unique per table and can be seen as the current version of
the item. We use the term “version” to refer to these timestamps. They serve
two purposes:
- Change detection for download synchronization
- Conflict detection when an item is modified or deleted
When the system performs download synchronization, it includes the most recently assigned timestamp
in the response to a request. The next time the client sends the same request, it includes this version number
in the request to get only the items changed since the last time the request was sent. This includes additions,
modifications and deletions.
When an item is changed, we can use the timestamp to check whether the item has been changed on
the server by others in the meantime.
3.2.3 Tracking Deletes
When an item is deleted, we cannot remove it from the server, because there may still be clients
working offline with this item, and they need to be notified of the deletion when they synchronize. Instead of
removing a deleted item from the database, the system puts it into a deleted state that indicates the item was
removed. We do this by recording its state in a special flag on the server.
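Versions and the deleted flag can be illustrated together with a small in-memory table. This is a sketch under our own assumptions: the class VersionedTable and its method names are hypothetical, and the version is derived from the clock but forced to increase strictly so that it stays unique per table even when two changes happen within the same millisecond.

```javascript
// Sketch of a server-side table with versions and a deleted flag.
// Class and method names are hypothetical.
class VersionedTable {
  constructor() {
    this.rows = new Map();
    this.lastVersion = 0;
  }
  nextVersion() {
    // Date.now() alone may repeat within a millisecond, so force strictly increasing versions.
    this.lastVersion = Math.max(Date.now(), this.lastVersion + 1);
    return this.lastVersion;
  }
  upsert(id, data) {
    this.rows.set(id, { id, ...data, deleted: false, version: this.nextVersion() });
  }
  remove(id) {
    const row = this.rows.get(id);
    // The row is kept and only marked, so that offline clients can learn about the deletion.
    if (row) this.rows.set(id, { ...row, deleted: true, version: this.nextVersion() });
  }
  changesSince(version) {
    const changes = [...this.rows.values()].filter((row) => row.version > version);
    return { version: this.lastVersion, changes };
  }
}

const table = new VersionedTable();
table.upsert("a", { title: "first" });
table.upsert("b", { title: "second" });
const first = table.changesSince(0);              // both items
table.remove("a");
const second = table.changesSince(first.version); // only the deleted item "a"
console.log(first.changes.length, second.changes.length, second.changes[0].deleted);
```

The client stores the version returned with each response and sends it with the next request, so every download contains only additions, modifications and deletions made since then.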
3.3 Synchronization algorithm
In this section we describe the algorithm that we use to synchronize the changes. The
synchronization algorithm keeps the local databases synchronized with the remote database server. When
a synchronization request is made, the algorithm executes these steps:
- Synchronize all items whose status flag is not “unchanged”.
- Send the request to the server, including the last version that is stored for the request URI,
and insert the response (the changes) into the local database.
- Replay the request on the local database and return the changes received from the server.
When the user is offline, the steps that send requests to the server are omitted.
For subsequent requests, the version received with the last response is included in the URI. This allows
the server to send only the modifications made since the last time the request was sent.
Conflict detection is done on the server, between the pulling and pushing processes.
When the application is offline, the algorithm needs to handle situations that can occur when changes are made
locally:
- A new item is created locally and then updated locally. After the update the item should still have the
“inserted” status instead of changing to “updated”, because it is still a new item for the server.
- A new item is created locally and then deleted. After the delete the item should be removed locally and the server
should never know about it, because we cannot send a delete request for an item
that the server does not know about.
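The two offline situations above amount to a small set of rules for changing the status flag. The following sketch shows one way to encode them; the class LocalStore, its method names and the numeric flag values are assumptions for illustration and do not reproduce our mobile code.

```javascript
// Sketch of the local status flag rules. Names and flag values are hypothetical.
const Status = { UNCHANGED: 0, INSERTED: 1, UPDATED: 2, DELETED: 3 };

class LocalStore {
  constructor() {
    this.items = new Map();
  }
  // An item received from the server starts as unchanged.
  load(id, data) {
    this.items.set(id, { id, ...data, status: Status.UNCHANGED });
  }
  insert(id, data) {
    this.items.set(id, { id, ...data, status: Status.INSERTED });
  }
  update(id, data) {
    const item = this.items.get(id);
    if (!item || item.status === Status.DELETED) return;
    // A locally inserted item stays "inserted": it is still new for the server.
    const status = item.status === Status.INSERTED ? Status.INSERTED : Status.UPDATED;
    this.items.set(id, { ...item, ...data, status });
  }
  remove(id) {
    const item = this.items.get(id);
    if (!item) return;
    // The server never knew a locally inserted item, so it is simply dropped.
    if (item.status === Status.INSERTED) this.items.delete(id);
    else this.items.set(id, { ...item, status: Status.DELETED });
  }
  // Items to upload at the next synchronization.
  pending() {
    return [...this.items.values()].filter((item) => item.status !== Status.UNCHANGED);
  }
}

const store = new LocalStore();
store.load("s1", { title: "synced" });
store.insert("n1", { title: "new" });
store.update("n1", { title: "new, edited" }); // still INSERTED
store.insert("n2", { title: "temporary" });
store.remove("n2");                           // gone, nothing to upload
store.update("s1", { title: "changed" });     // UPDATED
console.log(store.pending().map((item) => item.id + ":" + item.status).join(" "));
```

However many local changes an item receives while offline, it appears at most once in the list of pending items.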
3.4 Flexible Conflict Resolution
Conflict resolution is an important aspect of the synchronization solution. The system has to detect
conflicting modifications by checking whether the version of the incoming item and the version of the item in the cloud database
match. A conflict occurs when they are different. Modification of a deleted item is detected by checking
whether the item's state is marked as “deleted”. In Section 2.3 we listed different strategies that can be used to resolve
conflicts.
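The detection rule can be sketched as a single comparison on the server. The field name baseVersion (the version the client last received for the item) and the returned labels are hypothetical names chosen for this example.

```javascript
// Conflict detection sketch. baseVersion is a hypothetical field holding the
// version that the client last received for the item.
function detectConflict(incoming, stored) {
  if (!stored) return "none";                      // the item is new to the server
  if (stored.deleted) return "deleted-on-server";  // modification of a deleted item
  if (incoming.baseVersion !== stored.version) return "version-mismatch";
  return "none";
}

const stored = { id: "item1", version: 1432160341530, deleted: false };
const fromA = { id: "item1", baseVersion: 1432160341530, title: "A" };
const fromB = { id: "item1", baseVersion: 1432160330303, title: "B" };
console.log(detectConflict(fromA, stored), detectConflict(fromB, stored));
```

Once a conflict is reported, any of the policies from Section 2.3 can be applied, which is what keeps the resolution flexible.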
3.5 Modelling Requirements: Use Cases Diagram
Use case diagrams describe a system's requirements from the outside looking in; they specify the value that the
system delivers to the user.
3.5.1 Capturing the system requirements
In the following, we describe the system requirements:
Requirement 1:
The modelled system shall allow mobile users to view, add, edit and delete items locally, in order to cache them for
later synchronization.
Requirement 2:
The system shall allow web API users to view, add, edit and delete items in the cloud database, in order to be
synchronized with mobile devices later.
Requirement 3:
Mobile users shall be able to trigger the synchronization process to be up to date with the cloud database.
Requirement 4:
The system needs to check the identity of mobile and web API users via a “user credential web service”.
3.5.2 Search for actors (outside the system):
Based on the previous requirements, we search for actors and determine their interaction with the system.
Mobile User
We capture the “Mobile User” actor as described in requirements 1, 3 and 4. They indicate the mobile user's interaction
with the system:
- View, add, edit and delete items in the local database.
- Trigger the synchronization process.
- Identity checking.
Web API User
We capture the “Web API User” actor as described in requirements 2 and 4. They indicate this user's interaction with the system:
- View, add, edit and delete items in the cloud database.
- Identity checking.
User credential web service
We capture the “user credential web service” actor as described in requirement 4. It is responsible for:
- Checking the identity of the web API or mobile user before providing the data.
- Giving web API and mobile users the right to interact with the cloud database.
3.5.3 Capture Use Cases (Inside the system)
Based on the previous requirements, we create 6 use cases:
- (View/Add/Edit/Delete) items from mobile database
- (View/Add/Edit/Delete) items from cloud database
- Trigger Synchronization
- Conflict Resolution
- Log in / Log out
- Check Identity
3.5.4 Use Case diagram
Figure 5: Use Case Diagram
3.6 Modeling System Workflows: Activity Diagrams
Activity diagrams allow us to specify how our system will achieve its goals.
3.6.1 Add Items on Mobile Database
Figure 6: Add Item on Mobile Database Activity Diagram
3.6.2 Edit an Item on Mobile Database
Figure 7: Edit an Item on Mobile Database Activity Diagram
3.6.3 Delete an Item from Mobile Databases
Figure 8: Delete an Item on Mobile Databases
3.6.4 Add an Item on Cloud Database
Figure 9: Add an Item on Cloud Database
3.6.5 Edit an Item to Cloud Databases
Figure 10: Edit an Item to Cloud Databases
3.6.6 Delete an Item from Cloud Databases
Figure 11: Delete an Item from Cloud Databases
3.6.7 Start Synchronization process
Figure 12: Start Synchronization process
3.7 Modelling a System’s Logical Structure: Class Diagrams
Summary
In this chapter we discussed the design of our synchronization solution. This solution has the
following characteristics:
- Status flag synchronization combined with timestamp synchronization. As discussed in
Section 3.1, our solution uses status flag synchronization on the client and timestamp
synchronization on the server.
- Database model. In Section 3.2, we defined the properties that the data items should have. Each data
item has a GUID identifier and a version timestamp to identify changes and conflicts.
- Synchronization algorithm. Our algorithm is discussed in Section 3.3 and modeled in Sections 3.5,
3.6 and 3.7. The system keeps the local database synchronized with the
server: it always pushes the changed items first, and conflicts are then detected. After the push operation has
completed, the resulting changes are applied to the local database.
- Conflict detection and resolution. Conflicts are always detected and resolved on the server
(Section 3.4).
In the next chapter we describe our motivating example, “gdevTracker”.
Chapter 4 - Implementation
Overview
In Chapter 3, we designed the solution for data synchronization and determined the steps that the
algorithm has to take care of.
In this chapter we present the implementation that is used in the “gdevTracker” application, an
“issue tracking system” built for the project sponsor, the company “gamadev”, which serves as our motivating example.
This chapter starts by introducing the issue tracking system gdevTracker in Section 4.1 and
showing some of its graphical user interfaces, and then gives implementation details in Section 4.2.
4.1 Issue tracker system
This section describes the “issue tracking system” application, which is used as the backend for the mobile
application.
“An issue tracking system is a computer software package that manages and maintains lists of issues,
as needed by an organization.
Issue tracking systems are commonly used in an organization's customer support call center to create,
update, and resolve reported customer issues” – Wikipedia [19]
The application is divided into two components (web and mobile), which live at different sites. The web
application interacts with the cloud database, and the mobile application interacts with the local database.
4.1.1 Graphical User Interface
Web Application
Mobile Application
4.2 Implementation Detail
The previous sections introduced the project and presented some of its graphical user interfaces. This section gives
code snippets for some parts of the application. The most important part of the application is the synchronization,
which is written in the JavaScript programming language. For the Web API:
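// Route that starts a synchronization: POST http://localhost:3000/sync
// Assumes `router` is an Express router and `Tasks` is the Mongoose model for tasks.
router.post('/sync/', async function (req, res, next) {
  try {
    // items uploaded by the mobile device (those with a status flag set)
    var tasks = req.body.tasks || [];
    // version received by the mobile device at its last synchronization (0 on the first one)
    var lastSync = Number(req.body.lastSync) || 0;
    // new version (timestamp) assigned to every item changed by this request
    var version = Date.now();
    // apply the uploaded additions, edits and deletions, waiting for each write to finish
    for (var i = 0; i < tasks.length; i++) {
      tasks[i]._v = version;
      switch (tasks[i].flag) {
        case 1: // create task
          await new Tasks(tasks[i]).save();
          break;
        case 2: // edit task
          await Tasks.updateOne({ _id: tasks[i]._id }, tasks[i]);
          break;
        case 3: // delete task: keep the item and only mark it as deleted
          tasks[i].deleted = true;
          await Tasks.updateOne({ _id: tasks[i]._id }, tasks[i]);
          break;
      }
    }
    // query the items changed since the last synchronization
    var changes = await Tasks.find({ _v: { $gt: lastSync } });
    // return the downloaded data together with the new version
    res.json({ synced: true, _v: version, changes: changes });
  } catch (err) {
    next(err);
  }
});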
Our code is hosted in two GitHub repositories:
https://github.com/moussaoui91/mobile
https://github.com/moussaoui91/cloud
For mobile:
// Upload the items changed since the last synchronization.
// Assumes the AngularJS services $http, $rootScope and $localstorage and the
// application's own Tasks service are injected into the surrounding controller.
Tasks.changedItems().then(function (tasks) {
  $http.post('http://' + $rootScope.hostname + ':3000/sync', {
    tasks: tasks,
    lastSync: $localstorage.get('lastSync')
  })
  // handle the response (returned data)
  .then(function (res) {
    var serverTasks = res.data.changes;
    Tasks.synced(res.data._v);
    $localstorage.set('lastSync', res.data._v);
    // apply each server change (add / edit / delete) to the mobile database
    for (var i in serverTasks) {
      serverTasks[i].id = serverTasks[i]._id;
      if (serverTasks[i].deleted) {
        Tasks.remove(serverTasks[i]);
      } else {
        Tasks.edit(serverTasks[i]).then(function (res) {
          // no local row was updated, so the item is added as a new one
          if (!res.res.rowAffected) {
            Tasks.add(res.task);
          }
        });
      }
    }
  });
});
Summary
This chapter described the motivating example of our synchronization solution between a mobile
application and the cloud, presenting some GUIs and code snippets.
The data of our motivating example is hosted on mongolab.com. It is easy to configure a cloud
database through such services; the interface is simple enough to use without a user guide. The code is hosted
on GitHub.
General Conclusion
Although network connections are getting faster, there will always be situations
where the connection does not work well. It is valuable if applications continue to work in those situations.
With our proposed synchronization, it is possible to add offline support.
We tried our best to find answers to the questions we posed in the Introduction.
How do existing synchronization solutions apply to the domain of cloud databases
and mobile applications?
The existing solutions described in Section 2.2 show examples of implementations that are
used in the domain of mobile devices. Each one has its advantages and disadvantages; the
approaches are:
- Wholesale approach: One of the devices sends all local data to the other device, and the other
device computes the differences and sends back the updated data.
- Status flag approach: The client maintains information about the data in the form of status
flags and sends only the items which have a flag set.
- Timestamp approach: Each client maintains information about the last time data was
changed, and only the items changed since the last sync have to be synchronized.
- Mathematical approach: This approach uses mathematical properties of the data that
needs to be synced.
- Log approach: Changes to the data are tracked and saved in logs, and these
logs are synced with other clients.
How can we simplify data synchronization between cloud databases and mobile
applications?
Data is divided into ordered and unordered data. To simplify the problem, we
treat the data as a combination of ordered and unordered data.
How can we optimize a synchronization process to use minimal amount of
communication and computation on mobile device?
To optimize the computation on mobile devices, the best idea is to calculate remotely and
use the data locally, that is, to execute the calculation on the server and
receive the results on the mobile device.
In order to reduce the communication between the mobile devices and the server, we
integrate the change information into the data itself and ignore data that is not needed.
For example, to decide whether a data item has to be synced we check its status
flag, rather than syncing it in all cases.
If we do not need some data, we do not communicate it.
References
[1] L. Columbus, "IDC: 87% of connected devices by 2017 will be tablets and
smartphones," 12 September 2013. [Online]. Available:
http://www.forbes.com/sites/louiscolumbus/2013/09/12/idc-87-of-connected-devices-by-2017-will-be-tablets-and-smartphones/.
[Accessed 3 4 2015].
[2] W. Shehri, "Cloud Database Database as a Service," IJDMS, vol. 5, no.
12.
[3] A. Indu and G. Anu, "Cloud Databases: A Paradigm Shift in Databases," IJCSI
International Journal of Computer Science, vol. 9, no. 4, 7 2012.
[4] D. Kossmann, T. Kraska and S. Loesing, "An Evaluation of Alternative Architectures for
Transaction Processing in the Cloud," SIGMOD, no. 10, 2010.
[5] D. Abadi, Column-oriented Database Systems, VLDB, 2009.
[6] T. R. Singh, "Cloud Computing: An Analysis," International Journal of Enterprise
Computing and Business Systems, vol. 1, no. 2, pp. 2230-8849, 2011.
[7] R. Cattell, "Scalable SQL and NoSQL Data Stores," ACM SIGMOD, vol. 39, no. 4, pp.
12-27, 2011.
[8] A. Mathur, "Cloud Based Distributed Databases: The Future Ahead," International
Journal on Computer Science and Engineering (IJCSE), vol. 3, no. No, 2011.
[9] C. M. Melman, A Generative Approach for Data Synchronization between Web and
Mobile, Delft University of Technology, 2013.
[10] D. Starobinski, A. Trachtenberg and S. Agarwal, Efficient PDA Synchronization, vol. 1, IEEE
Transactions on Mobile Computing, 2003, pp. 40-51.
[11] S. Agarwal, D. Starobinski and A. Trachtenberg, On the Scalability of Data Synchronization Protocols for PDAs and
Mobile Devices, vol. 4, Network, IEEE, 2002, pp. 22-28.
[12] M.-Y. Choi, E.-A. Cho, D.-H. Park, C.-J. Moon and D. K. Baek, A Database
Synchronization Algorithm for Mobile Devices, vol. 2, Consumer Electronics, IEEE
Transactions, 2010, pp. 392-398.
[13] Microsoft, Mar 2014. [Online]. Available: http://msdn.microsoft.com/en-us/library/bb726039.aspx.
[Accessed 09 04 2015].
[14] S. Burckhardt, M. Fähndrich, D. Leijen and B. P. Wood, "Cloud Types for Eventual
Consistency," pp. 283-307, 2012.
[15] Y. Lee, Y. Kim and H. Choi, "Conflict Resolution of Data Synchronization in Mobile
Environment," vol. 3044, pp. 196-205, 2004.
[16] S. Gilbert and N. Lynch, "Brewer's Conjecture and the Feasibility of Consistent,
Available, Partition-tolerant Web Services," pp. 51-59, June 2002.
[17] P. Lucas, Mobile Devices and Mobile Data: Issues of Identity and Reference, vols. 2-4,
pp. 323-336.
[18] P. Leach, M. Mealling and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace," RFC 4122.
[Online]. Available: http://www.ietf.org/rfc/rfc4122.txt. [Accessed
14 04 2015].
[19] Wikipedia, "Issue tracking system," [Online]. Available:
http://en.wikipedia.org/wiki/Issue_tracking_system. [Accessed 27 05 2015].
[20] R. Buyya et al., Cloud computing and emerging IT platforms: Vision, hype, and
reality for delivering computing as the 5th utility, vol. 25, Future Generation Computer
Systems, 2009, pp. 599-616.
[21] Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Issue_tracking_system.
[Accessed 04 2015].