Politecnico di Milano
Scuola di Ingegneria Industriale e dell'Informazione
Corso di Laurea Magistrale in Ingegneria Informatica
Dipartimento di Elettronica, Informazione e Bioingegneria
Thesis
Evaluating Client-Side Replicated NoSQL Databases
Approaches
Advisor: Prof. Raffaela MIRANDOLA
Co-advisor: Ing. Marco SCAVUZZO
Master's thesis by:
Claudio CARDINALE, student ID 849760
Academic Year 2016-2017
To Noemi...
Acknowledgments
My heartfelt thanks to Professor Raffaela Mirandola, who supported me in the
elaboration of this master thesis with her precious guidance, advice and suggestions.
Thanks to the co-advisor Marco Scavuzzo for his patience and professionalism, for
having helped me to orient myself on this broad and not very standardized topic and
for his prompt and efficient replies to my emails requesting help.
Many, many thanks to my friends and fellow students for their support, the
carefreeness I could experience in these 5 wonderful years and their friendship, which
has been very precious to me.
My deepest gratitude to my family for getting me to where I am today, for the
trust placed in me and the constant encouragement.
Last but not least, a special thanks goes to Noemi, who in difficult times has always
found the right words to motivate me, for having always believed in me, for her
patience and constant support during the course of my studies, and in particular
during this last period, and for having shared with me this significant experience.
Abstract
In today's applications, the amount of data is increasing exponentially, and data also
need to be replicated on different devices in realtime; these devices should be able to
use the data even when they are offline. The devices therefore need a local copy of the
database that is also available offline, called a local database.
The typical device that uses this kind of application is a smartphone, but web
applications such as collaborative software (like Google Docs) use them too.
Different solutions based on NoSQL databases have been proposed (both open source
and proprietary cloud-hosted), but of course custom solutions based on an RDBMS
are also feasible. The NoSQL-based solutions are not standardized and have no
established name, so we call them CS-NoSQL (client-side NoSQL).
NoSQL was chosen because in this setting data are unstructured and their amount
could be enormous.
One of the main advantages of CS-NoSQL systems is that they are a full-stack environment
(with other solutions we need to create the entire infrastructure), which allows
creating a simpler application with zero code.
The goal of this master thesis is to verify the performance of CS-NoSQL systems for the
classes of applications they are designed for, comparing them with a solution based on an
RDBMS.
To do that, we implemented a comparison solution based on PostgreSQL (an RDBMS
server) that replicates data on clients using websocket; we also implemented a simple
local database for it.
We created a custom framework to test this kind of system and ran benchmark
tests emulating the different classes of applications for which CS-NoSQL systems are
designed.
We discovered that they are very unstable with a reasonable amount of data and
that our proposed solution based on PostgreSQL is quicker (up to 10x). However,
we expect an improvement (in performance and stability) in the coming years, when
more native solutions will probably be developed.
Summary
Today, application data grow exponentially and must be replicated on several
devices in realtime; these devices should remain usable even when they are
offline. Such devices need a local copy of the database, also available offline,
called a local database.
The typical device that uses this kind of application is the smartphone, but
web applications such as collaborative software (like Google Docs) use it too.
Several solutions based on NoSQL databases have been proposed (both open source
and proprietary in the cloud), but custom solutions based on an RDBMS are also
feasible. The NoSQL-based solutions are not standardized and have no
established name, so we call them CS-NoSQL (client-side NoSQL).
NoSQL databases were chosen because the data handled are unstructured and
large.
One of the main advantages of CS-NoSQL systems is that they are a complete environment
(with the other solutions we have to create the entire infrastructure), which allows
creating a simpler application without code.
The goal of this thesis is to verify the performance of CS-NoSQL systems for the classes
of applications they are designed for, comparing them with a solution based on an
RDBMS.
To do so, we implemented a comparison solution based on PostgreSQL
(an RDBMS server) that replicates data on clients using websocket, implemented a
simple local database for it, created a custom framework to test
these systems and ran benchmark tests emulating the different classes of applications
for which CS-NoSQL systems were designed.
We found that they are very unstable with large amounts of data and that our
PostgreSQL-based solution is faster (up to 10 times). However, we expect
an improvement (in performance and stability) in the coming years, when
more native solutions will probably be developed.
Contents
Introduction
1 Background
1.1 HTTP
1.1.1 HTTP handshake
1.1.2 WebSocket
1.1.3 RESTful
1.2 Data Management
1.2.1 Data retrieve
1.2.2 Consistency, Partition tolerance, Availability
1.2.3 Partitioning and Distribution
1.3 Serverless
1.3.1 Platform as a Service
1.3.2 Software as a Service
2 State of Art
2.1 CS-NoSQL
2.1.1 Data model
2.1.2 Distributed issues
2.1.3 Publish/subscribe
2.1.4 Constraints, permissions and queries
2.2 Client Best Practices
2.2.1 Event driven approach
2.2.2 Client language
2.2.3 RESTful approach
2.2.4 Local database
3 Analysis of Some CS-NoSQL
3.1 Characteristics to be considered
3.2 Analysis
3.2.1 OpenSource
3.2.2 Software as a Service
4 A Proposed Comparing Traditional Approach
4.1 Technologies Background
4.1.1 PostgreSQL
4.1.2 Redis
4.1.3 Socket.io
4.2 Architecture Proposed
4.2.1 Database
4.2.2 Publish/Subscribe and Websocket server
4.2.3 Input server
4.2.4 Custom logic
4.2.5 Client library and Local database
4.3 Implementation
4.3.1 Websocket server
4.3.2 Client
5 Classes of Applications
5.1 Use Cases
5.1.1 Realtime chat
5.1.2 Collaborative software
5.1.3 Social Applications
5.2 Examples of Real CS-NoSQL Applications
5.2.1 Adobe DPS
5.2.2 Logitech Harmony Ultimate Home
5.2.3 CornerJob
6 Benchmarks strategy
6.1 Scaling Test
6.1.1 Scaling
6.1.2 Environments
6.2 Test Framework
6.3 Tests sets
6.3.1 Realtime chat
6.3.2 Collaborative software
6.3.3 Social
6.4 Adapters for systems
7 Benchmarks
7.1 Tests
7.1.1 Couchbase
7.1.2 Pouchdb
7.1.3 Gun.js
7.1.4 Traditional
7.2 Analysis of the Results and Lesson Learned
8 Conclusions
Bibliography
A Snippets
A.1 PostgreSQL realtime retrieve trigger
A.2 Socket.io custom logic
B Jepsen
List of Figures
1.1 Modern web application stack
1.2 HTTP 1.0 vs HTTP 1.1 (using keep-alive) [42]
1.3 HTTP2 multiple parallel requests [44]
1.4 Websocket with server events [87]
1.5 Database triangle [24]
2.1 Simple client architecture
4.1 Traditional Architecture Proposed
7.1 Chat (couchbase) benchmarks
7.2 Collaborative (couchbase) benchmarks
7.3 Social (couchbase) benchmarks
7.4 Chat (pouchdb) benchmarks
7.5 Collaborative (pouchdb) benchmarks
7.6 Social (pouchdb) benchmarks
7.7 Chat (gun.js) benchmarks
7.8 Collaborative (gun.js) benchmarks
7.9 Social (gun.js) benchmarks
7.10 Chat (traditional) benchmarks
7.11 Collaborative (traditional) benchmarks
7.12 Social (traditional) benchmarks
B.1 Jepsen latency raw
B.2 Jepsen latency quantiles
B.3 Jepsen rate
List of Tables
3.1 CS-NoSQL comparison
6.1 CS-NoSQL test environments
6.2 PostgreSQL (traditional approach) test environments
6.3 socket.io (traditional approach) test environments
6.4 Realtime chat clients
6.5 Collaborative software clients
6.6 Social clients
7.1 Chat (couchbase) benchmarks
7.2 Collaborative (couchbase) benchmarks
7.3 Social (couchbase) benchmarks
7.4 Chat (pouchdb) benchmarks
7.5 Collaborative (pouchdb) benchmarks
7.6 Social (pouchdb) benchmarks
7.7 Chat (gun.js) benchmarks
7.8 Collaborative (gun.js) benchmarks
7.9 Social (gun.js) benchmarks
7.10 Chat (traditional) benchmarks
7.11 Collaborative (traditional) benchmarks
7.12 Social (traditional) benchmarks
List of Algorithms
2.1 Event driven retrieve of data
Introduction
In today's applications, the amount of data is increasing exponentially [108], and data
also need to be replicated on different devices in realtime; these devices should be able
to use the data even when they are offline. The devices therefore need a local copy of the
database that is also available offline, called a local database.
The typical device that uses this kind of application is a smartphone, but web
applications such as collaborative software (like Google Docs) use them too [112].
So what we need is a database that can process big data and replicate itself
among devices in realtime.
NoSQL databases were born to process big data efficiently [99, 113], so the solutions
that replicate databases in realtime are based on them. They can be open source or
proprietary cloud-hosted; one of the most famous cloud solutions is Firebase [119].
These solutions are not standardized and have no established name, so we call
them CS-NoSQL (client-side NoSQL databases).
The goal of this master thesis is to verify the performance of CS-NoSQL systems for the
classes of applications they are designed for. Another objective is to explore issues
related to the development process, scalability and similar concerns.
We analyze the theory behind these databases, then we classify and analyze the ones
that have significant characteristics. We choose open source systems because we can
verify the characteristics declared by vendors, we can take finer-grained measurements
of their performance (including low-level data), and the technology behind them is
more standard than the technology used by cloud systems.
Moreover, we develop an alternative system based on a traditional database that
supports the same operations as native CS-NoSQL databases, and we use it as a
comparison system.
In order to test these specific databases we identify some significant classes of
applications where they should perform efficiently. The classes of applications differ in
elements that are typical of this kind of application, such as the number of clients
subscribed, the data structure and so on. Then we design a framework, since no existing
test framework analyzes this kind of application while considering the issues related to
it, such as the number of clients subscribed.
We prove that our classical comparison system is more efficient in terms of
performance, but also in terms of scalability and stability (CS-NoSQL systems are very
unstable with a reasonable amount of data). In some cases, it is also 10x faster.
On the other hand, CS-NoSQL systems are, in some cases, very efficient for the development
process: since they were born only for this purpose, they are a full-stack environment
(with other solutions we need to create the entire infrastructure), and this allows
creating a simpler application with zero code. This is very common in proprietary
cloud-hosted systems, which have the disadvantage of being proprietary:
if the system goes down or the cloud company fails, all data are lost and there
is no way to replicate the system on another server, so a new development is needed.
Many CS-NoSQL issues stem from the fact that the solutions are proprietary:
there is no standard way to run tests, no standard development process
and so on. This is a big cost for companies that want to use this kind of system, since
they need different training for each system used. Moreover, as said previously,
their performance is not impressive. At the moment, only a few companies use
them, but we expect big improvements on both main issues (performance and
development process) in the coming years when, probably, more native solutions will
be implemented, because the applications that need such systems are increasing.
Since CS-NoSQL systems are new and not standardized, the literature on them is poor, but
they are just a combination of technologies that are already well studied. The problem
with studying them is that the only information available comes from vendors. So we
choose some relevant theoretical characteristics and take this information from vendors
(or, if it is not available, we try to deduce it from other sources such as the source
code) for each relevant CS-NoSQL system, both cloud and open source. This
information includes: consistency, data granularity, distribution and partitioning,
security issues and so on.
As said previously, we develop a comparison system that is based on a traditional
database, using PostgreSQL and classical communication methods based on
websockets.
We identify some interesting classes of applications that differ in relevant
characteristics that are typical of this kind of application. Here too, we
use the information provided by the vendors.
We design a test strategy and develop a test framework, since there are no test
frameworks for this kind of system. This test framework allows testing scalability and
performance by emulating the classes of applications analyzed before, and it takes
performance indices that are very relevant for this kind of application, such as the time
needed to replicate new information to all the clients. Of course we also take
classical measurements such as raw read/write speed.
Finally, we execute some benchmarks on identical virtual
machines (distributed with the source code of the test framework) to make the tests
replicable and standardized. We execute tests on different combinations of
machine characteristics to expose the scalability and bottlenecks of the systems under test.
Thesis organization
In Chapter 1, we present the background: all the knowledge needed
to understand the following topics. The main topics presented are HTTP and the websocket
protocol, basic concepts and issues of data in a distributed system (talking also about
publish/subscribe and optimistic replication) and finally the concepts behind cloud
computing from the user's point of view.
In Chapter 2, we describe the state of the art of CS-NoSQL. As said
previously, there are no academic references to them, only references to the technologies
they use, so we present NoSQL databases from a theoretical point of view with all the
related aspects that we need (such as distribution). Then we show the best practices
to implement the local database on the client side.
In Chapter 3, we analyze some real CS-NoSQL systems, both open source and proprietary
cloud-hosted ones. We classify them in a common way, as said previously, to
compare them. Then we select a set of them to be used in the benchmark tests.
In Chapter 4, we show our proposed comparison solution. We describe all the elements
needed to build it, from the database to the client, such as the system that delivers
realtime messages.
In Chapter 5, we analyze the classes of applications for which CS-NoSQL systems are
designed according to vendors. We analyze use cases, and for each of them we show a real
case study.
In Chapter 6, we define our strategy to run benchmark tests. We analyze what we
need to do, what we need to measure and which tests we need to execute, and we also
briefly explain the framework we have implemented.
In Chapter 7, we show the results of the benchmark tests and comment on them.
In Chapter 8, we present the conclusions of our master thesis and describe possible
improvements and future research.
Chapter 1
Background
In this chapter and in chapter 2 we present all the theoretical knowledge needed
to understand the contribution of this thesis.
In 1.1 we analyze HTTP and then the websocket protocol (a protocol based
on HTTP), which allows efficiently sending messages in realtime to clients through
HTTP. Using a protocol that builds on HTTP is very important because many
applications of CS-NoSQL are web applications. In that section we analyze the entire
stack of a classical HTTP application, then we use it to introduce and explain the main
HTTP versions (1.0, 1.1, 2.0), and then we describe the main HTTP-based protocols that
we use in this master thesis: websocket (the most important one for this
thesis' topic, since it allows sending realtime notifications to clients) and RESTful (a
standard protocol to exchange messages between client and server).
Then in 1.2 we describe basic concepts and issues of data management, in particular
in a distributed system, talking also about publish/subscribe and optimistic
replication. We also analyze the database itself, because in traditional databases
there is no native event notification system: we have to check it every time (polling) or
we have to create a trigger to do that, as we show in the next sections. We also analyze
how to send realtime notifications of changes and the limits that led
to the creation of CS-NoSQL. We analyze basic concepts of realtime data retrieval
(triggers, publish/subscribe and the high-level steps needed), then we show the CAP
theorem (with eventual consistency and its validity today with the new applications)
and finally distributed systems with related issues and basic concepts: distribution
& partitioning, replication (master-slaves & multi-master) and standard frameworks
to process data in a distributed environment, such as MapReduce.
Finally, in 1.3 we quickly analyze the concepts behind cloud computing from the user's
point of view, since many systems that we analyze in the next chapters are cloud-hosted.
After this chapter, we have all the elements needed to understand all the topics
of this master thesis, except for the topics directly related to CS-NoSQL, such as
NoSQL databases themselves, which we show in the next chapter.
1.1 HTTP
There are many ways to build an application over HTTP (a web application),
but all of them have common parts.
Figure 1.1: Modern web application stack
In all classical HTTP applications the client sends a request to the HTTP
server, the HTTP server requests data from the database server, and finally the HTTP
server (application server) sends data to the client, as shown in figure 1.1.
In the rest of this section we describe the evolution of HTTP towards
realtime notifications. The core of the issue is the HTTP handshake analyzed
below: we need asynchronous communication, i.e. the server should be
able to send messages to the client at any time [101].
In the following sections we present standards related to HTTP: websocket (the most
important protocol for this master thesis' topic, which allows sending realtime
notifications to clients) and RESTful (a standard protocol to exchange messages between
client and server).
1.1.1 HTTP handshake
HTTP 1.0 The original version of HTTP (HTTP 1.0) only allowed connections
that must be closed immediately after receiving data [93], as shown in figure 1.2
(first image): the client requests something, then the server replies and finally the
connection is closed.
In this version the only approaches available for realtime notifications are: short
polling (scheduling a new request at a small fixed interval to check whether data
have changed), long polling (creating a normal request to the server, but the server
replies only when there are new data, after which the client creates a new request) and
event stream (a long request without end, where the server keeps sending data
to the client; this approach is used in video/music streaming).
All these approaches have many issues that reduce performance [120].
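To make the difference between the first two approaches concrete, the following is a minimal in-process sketch rather than a real HTTP exchange: the `Server` class, its method names and the timings are assumptions made purely for illustration. It only shows that short polling issues many requests that return nothing, while long polling holds a single request open until data arrive.

```python
import queue
import threading
import time


class Server:
    """Toy in-process 'server' (hypothetical) holding a queue of new data."""

    def __init__(self):
        self._updates = queue.Queue()

    def publish(self, item):
        self._updates.put(item)

    def short_poll(self):
        """Reply immediately: the new item, or None if nothing changed."""
        try:
            return self._updates.get_nowait()
        except queue.Empty:
            return None

    def long_poll(self, timeout):
        """Hold the request open until data arrive or the timeout expires."""
        try:
            return self._updates.get(timeout=timeout)
        except queue.Empty:
            return None


def short_polling_client(server, interval, max_requests):
    """Re-issue a request every `interval` seconds; count the requests made."""
    for requests_made in range(1, max_requests + 1):
        item = server.short_poll()
        if item is not None:
            return item, requests_made
        time.sleep(interval)
    return None, max_requests


def long_polling_client(server, timeout):
    """Issue one request and wait for the server to answer it."""
    return server.long_poll(timeout), 1


def demo():
    clients = {
        "short": lambda s: short_polling_client(s, interval=0.01, max_requests=200),
        "long": lambda s: long_polling_client(s, timeout=2.0),
    }
    results = {}
    for name, client in clients.items():
        server = Server()
        # The data change happens 0.1 s after the client starts waiting.
        threading.Timer(0.1, server.publish, args=("new-message",)).start()
        results[name] = client(server)
    return results
```

Calling `demo()` returns, for each approach, the item received and the number of requests it took: the short-polling client needs several requests before the data appear, the long-polling client exactly one.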
Figure 1.2: HTTP 1.0 vs HTTP 1.1 (using keep-alive) [42]
HTTP 1.1 This version of HTTP fixed some issues and made several improvements;
in particular it standardized the keep-alive option (which had been implemented
unofficially by many HTTP 1.0 clients) [124].
The keep-alive option allows using the same connection multiple times, i.e. after the
server replies the client can send another request over the same connection, as shown
in figure 1.2 (second image).
This was originally designed to request different files over the same connection, and
it is also the enabling technology for websocket, as we see below.
HTTP 2.0 It is the newest HTTP standard (2015) and introduced many improvements.
The most important are the possibility for the server to push data that
the client has not explicitly requested and the possibility to have multiple parallel
requests over a single TCP connection, as shown in figure 1.3 [92]. These features
also increase energy efficiency [100].
However, since it is new, it is not supported efficiently by many browsers [43] or by
many websites [140], so at the moment other solutions (like websocket) are preferred.
1.1.2 WebSocket
It is a protocol built over HTTP 1.1 that creates a sort of socket where
both the client and the server can send and receive messages at any time [106]. The
connection is opened as an HTTP 1.1 request that asks the server to upgrade it; once
accepted, the same connection is kept open and carries messages in both directions.
So this protocol allows efficient realtime notifications, where the client
immediately receives a notification from the server (for server events) [128], as shown
in figure 1.4.
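As an illustration of how such a connection is opened: RFC 6455 specifies that the client sends an HTTP 1.1 request with an `Upgrade: websocket` header and a random `Sec-WebSocket-Key`, and the server proves it understood the request by returning a derived `Sec-WebSocket-Accept` value. The sketch below computes that value and builds the 101 response. The function names and the simplified header parsing are assumptions of this sketch, and a real server would validate more fields.

```python
import base64
import hashlib

# GUID fixed by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(request: str) -> str:
    """Build the reply to a minimal upgrade request (simplified parsing)."""
    headers = {}
    for line in request.split("\r\n")[1:]:
        if ": " in line:
            name, value = line.split(": ", 1)
            headers[name.lower()] = value
    key = headers.get("sec-websocket-key")
    if headers.get("upgrade", "").lower() != "websocket" or key is None:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(key)}\r\n\r\n"
    )


# Sample request using the key from the RFC 6455 example.
SAMPLE_REQUEST = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n\r\n"
)
```

After the 101 response, the TCP connection stays open and both sides exchange websocket frames instead of HTTP messages; the framing itself is outside the scope of this sketch.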
Figure 1.3: HTTP2 multiple parallel requests [44]
Figure 1.4: Websocket with server events [87]
1.1.3 RESTful
As shown in figure 1.1 we can use RESTful calls to retrieve data. RESTful is a
sort of protocol to exchange messages between the client and the server over HTTP
(it uses standard HTTP methods) [142].
It is a simple protocol that uses the HTTP verbs: for example, to create new data
a POST call must be made, while to update data a PUT call must be made. This is
the CRUD (Create Read Update Delete) approach [121]. The idea is to perform simple
actions on resources that either return an answer immediately (like a GET call) or
execute an action immediately (like a POST call).
There is another protocol, older than RESTful and still used: SOAP. It is very powerful,
but RESTful is better for managing resources [125, 129].
RESTful is designed to work with resources: each resource is identified by
a unique URI and there are standard actions (HTTP verbs) for each resource. For
example we can have:
* Customers identified by http://example.com/customers
* Customer details with a unique ID (for example id = 10) identified by
  http://example.com/customers/10
  * Contacts of a customer identified by http://example.com/customers/10/contacts
It is useful for navigating a JSON structure, as we see in section 2.1.1. RESTful
cannot be used with websocket, but it can be used for short polling (GET), even if
that is not really a proper approach.
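The mapping between HTTP verbs, resource paths and CRUD actions can be sketched as follows. This is a toy dispatcher over an in-memory dictionary, not a web server: the `store` contents, the `handle` function and the naive id generation are all hypothetical and serve only to show how a path such as `/customers/10/contacts` walks a JSON-like structure.

```python
# Hypothetical in-memory data, shaped like a JSON document.
store = {"customers": {"10": {"name": "Alice", "contacts": {}}}}


def _walk(path):
    """Return (parent container, last key) for a path like '/customers/10'."""
    keys = [k for k in path.split("/") if k]
    parent = store
    for key in keys[:-1]:
        parent = parent[key]
    return parent, keys[-1]


def handle(method, path, body=None):
    """Map an HTTP verb on a resource path to a CRUD action: (status, data)."""
    try:
        parent, key = _walk(path)
    except (KeyError, IndexError, TypeError):
        return 404, None
    if not isinstance(parent, dict):
        return 404, None
    if method == "GET":  # Read
        return (200, parent[key]) if key in parent else (404, None)
    if method == "POST":  # Create a new item inside a collection
        collection = parent.get(key)
        if not isinstance(collection, dict):
            return 404, None
        new_id = str(len(collection) + 1)  # naive id, good enough for a sketch
        collection[new_id] = body
        return 201, new_id
    if method == "PUT":  # Update (replace) the resource
        parent[key] = body
        return 200, body
    if method == "DELETE":  # Delete
        if key in parent:
            del parent[key]
            return 204, None
        return 404, None
    return 405, None
```

For example, `handle("POST", "/customers/10/contacts", {"email": "a@example.com"})` creates a contact under customer 10, and a following `handle("GET", "/customers/10/contacts/1")` reads it back. Every call returns immediately, which is why this style fits polling but not server-initiated notifications.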
1.2 Data Management
In this section we briefly describe the basic concepts database systems need to
retrieve data in realtime: we analyze basic concepts such as triggers and
publish/subscribe, then the high-level steps needed to retrieve data in realtime.
We describe basic concepts and issues of data in a distributed system: consistency,
partition tolerance and availability (CAP theorem). We also analyze useful variations
such as eventual consistency and how the original CAP theorem has changed with the
new applications.
Finally we show the general idea behind a distributed system, analyzing the meaning
and consequences of partitioning and distributing, standard frameworks such as
MapReduce, and replication (master-slaves & multi-master).
1.2.1 Data retrieve
Relational databases were not designed to send notifications; the basic idea is:
when you want data, you get them.
So in traditional HTTP applications, as shown in figure 1.1, for each request the
HTTP server queries the database server.
If we want to make this process realtime we have to solve two main issues. First, how
can we be notified of a change, considering also the distributed case? The trivial
solution is a database trigger, as explained below. Second, how can we efficiently manage
clients that want information about changes to only some things
and not to everything? The solution is the publish/subscribe pattern [136],
as explained in the remainder of this section. Finally we recap everything needed.
Database trigger According to SQL:1999 (known as SQL3) [139] a trigger is a
procedure automatically called by the database when some specific events occur.
It has the following components: unique name, trigger event (insert, delete, update),
activation time (before or after the trigger event), trigger granularity (for each row,
for each statement), trigger condition (SQL condition), trigger actions (SQL procedure
to execute) and trigger timestamp (when the trigger was created).
If the event is fired and the condition is valid the trigger is called; if more
than one trigger has to be called, they are called in ascending timestamp order.
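The components listed above can be seen in a small runnable sketch. It uses SQLite, through Python's standard `sqlite3` module, as a stand-in for a server database; the table names, the trigger name and the `change_log` outbox table are hypothetical. SQLite supports only row-level triggers, so statement-level granularity is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    -- Hypothetical outbox table that a notifier process could drain.
    CREATE TABLE change_log (tbl TEXT, record_id INTEGER, new_body TEXT);

    CREATE TRIGGER messages_after_update      -- unique name
    AFTER UPDATE ON messages                  -- activation time + trigger event
    FOR EACH ROW                              -- granularity
    WHEN NEW.body <> OLD.body                 -- condition
    BEGIN                                     -- action
        INSERT INTO change_log VALUES ('messages', NEW.id, NEW.body);
    END;
    """
)

conn.execute("INSERT INTO messages (id, body) VALUES (1, 'hello')")
# Same value: the event fires but the condition is false, so nothing is logged.
conn.execute("UPDATE messages SET body = 'hello' WHERE id = 1")
# Different value: the condition holds and the action runs.
conn.execute("UPDATE messages SET body = 'hi all' WHERE id = 1")

changes = conn.execute(
    "SELECT tbl, record_id, new_body FROM change_log"
).fetchall()
```

Only the second update produces a row in `change_log`. In a realtime setup the trigger action would hand the change to a notification mechanism instead of a plain table.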
Publish/Subscribe In the publish/subscribe pattern (originally known as the news
subsystem) each client that is interested in some data/events subscribes to them,
while the server service publishes data that will be sent to the clients (by the broker
server(s)) without knowing which clients are subscribed [94]. Of course the
publisher need not be a server: every actor can be a publisher.
Publish/subscribe is not compatible with RESTful calls, since in the RESTful
approach data must be returned immediately when a request is made.
There are two types of subscriptions: topic-based (channels, i.e. a unique name
that identifies the messages) and content-based (messages are sent to subscribers
interested in some of their attributes or in their content).
The content-based system is more flexible, but an efficient distributed broker network
is not easy to develop [91].
Since we just need to receive change events on tables, the topic-based type is enough,
where the channel (topic) could be the pair table-record_id.
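A topic-based broker with table-record_id channels can be sketched in a few lines. This is a single-process illustration under stated assumptions: the `Broker` class, the `channel_for` helper and the callback-based delivery are hypothetical names, and a real deployment would use a network broker rather than direct function calls.

```python
from collections import defaultdict


class Broker:
    """Minimal topic-based broker: publishers never see the subscriber list."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to the channel's subscribers; return how many were reached."""
        # .get avoids creating an entry for channels nobody subscribed to.
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def channel_for(table, record_id):
    """Build the channel name from the pair table-record_id."""
    return f"{table}-{record_id}"


broker = Broker()
alice_inbox, bob_inbox = [], []

# Alice follows record 10 of 'customers'; Bob follows records 10 and 11.
broker.subscribe(channel_for("customers", 10), alice_inbox.append)
broker.subscribe(channel_for("customers", 10), bob_inbox.append)
broker.subscribe(channel_for("customers", 11), bob_inbox.append)

reached_10 = broker.publish(channel_for("customers", 10), {"name": "Alice B."})
reached_11 = broker.publish(channel_for("customers", 11), {"name": "Carl"})
reached_12 = broker.publish(channel_for("customers", 12), {"name": "Nobody"})
```

Each client receives only the changes on the records it subscribed to, which is the per-record filtering that polling cannot provide cheaply.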
Realtime retrieve Now we have all the elements to create a realtime application:
each client subscribes to the data that it needs, a trigger on the server publishes data
to a publish/subscribe system when they are updated, and finally the communication
between the client and the server goes over a websocket channel (so immediately after
data are published the publish/subscribe broker server sends them on this channel;
as previously said, RESTful cannot be used).
Of course there are still issues related to the distribution and partitioning of the
infrastructure.
1.2.2 Consistency, Partition tolerance, Availability
Figure 1.5 shows the CAP theorem applied to databases. It says that in
a distributed system only two of the following three properties can be guaranteed
[107, 113]: consistency (all nodes see the most recent write), availability
(all requests receive a response with data, though this does not ensure that the data
are the newest) and partition tolerance (the system works properly even if data are
partitioned among nodes and some of them are unreachable). ACID (Atomicity,
Consistency, Isolation, Durability) databases (traditional relational databases)
often choose availability and consistency [96].
Of course, if there are no network errors all properties hold.
In distributed systems the main issue is synchronization among nodes: there
is no common clock, so the order of commits is not easy to determine.
Figure 1.5: Database triangle [24]
Eventual consistency When strong consistency is not implemented, eventual
consistency is often implemented instead: the storage system guarantees that if no new
updates are made to the object, eventually all accesses will return the last updated
value [141].
Validity today Today there are new aspects to consider, like clusters partitioned
over a WAN, which can cause different issues such as high latency, so the CAP theorem is
not enough to classify these new situations. New classifications such as
PACELC have been proposed to solve these issues [90].
The idea behind PACELC (Partition Availability Consistency Else Latency
Consistency) is that if strong consistency must be guaranteed the latency can increase a
lot (time needed to receive all the acks from all replicas for every action), potentially
to infinity. So a system has to choose between low latency and strong consistency.
But since this classification is not yet used by many commercial and non-commercial
systems, in the next chapters the CAP classification is used: using PACELC would make
the classification inconsistent (since some systems are not classified under it). Moreover
some commercial systems do not give enough details to derive this classification
ourselves.
1.2.3 Partitioning and Distribution
There are some systems that use both ideas. For example HDFS (a distributed
file system) replicates data among nodes and also partitions data among nodes [135].
So a chunk of data is replicated several times but it is not present on all nodes. A
group of nodes is called a cluster.
To process data in a distributed environment, one simple, powerful and common
programming model is MapReduce.
The two main patterns to structure a distributed replicated architecture are
multi-master and master-slaves.
Partitioning & Distribution
Partitioning Partitioning data among different nodes means splitting data into
chunks according to some defined rules and putting different chunks on different
nodes [98, 113].
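To make the idea of a partitioning rule concrete, here is a small sketch that assigns records to nodes by hashing their identifier. The hash function, the record shape ({ id }) and the node count are assumptions chosen for illustration; real systems use their own rules (ranges, consistent hashing and so on).

```javascript
// Deterministic string hash kept in the unsigned 32-bit range (illustrative only).
function hashKey(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Split the records into `nodeCount` chunks: the rule is hash(id) mod nodeCount.
function partition(records, nodeCount) {
  const nodes = Array.from({ length: nodeCount }, () => []);
  for (const record of records) {
    nodes[hashKey(record.id) % nodeCount].push(record);
  }
  return nodes;
}

const chunks = partition([{ id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }], 3);
```

Because the rule is deterministic, every node (or client) can compute where a given record lives without asking a central directory.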
Distribution Distribution is the replication of the infrastructure among
different nodes: each node does the same thing. It is not a simple concept; in fact
there is the synchronization problem (formally known as consistency, as shown by
the CAP theorem) [107], for example in the following scenarios.
Commit: in a distributed database the problem is to find the correct order of commits;
since there is no common clock, we should also provide a distributed lock.
Publish/subscribe: in a publish/subscribe application distributed among different
nodes, how are data published on one node sent to all the subscribed clients, including
the ones subscribed on other nodes?
MapReduce MapReduce is a programming model where data are split among
different nodes and are processed by different nodes [105]. It is composed of three
main actions: map (filtering and sorting data), shuffle (grouping data by key and
sending data with the same key to the same reducer node) and reduce (summary
operations).
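The three actions can be sketched in a single process as follows. This is only an illustration of the data flow under our own assumptions (the function mapReduce and the word count job are hypothetical); in a real cluster each phase runs on different nodes.

```javascript
// Single-process sketch of the map / shuffle / reduce data flow.
function mapReduce(inputs, mapFn, reduceFn) {
  // Map: every input record produces a list of [key, value] pairs.
  const pairs = inputs.flatMap((input) => mapFn(input));

  // Shuffle: group values by key (in a cluster, same key -> same reducer node).
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce: summarize the values of each key.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

// Example job: count how many times each word appears.
const wordCounts = mapReduce(
  ["to be or not", "to be"],
  (line) => line.split(" ").map((word) => [word, 1]),
  (_word, ones) => ones.reduce((sum, n) => sum + n, 0)
);
```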
Multi Master & Master-Slaves replication To have data immediately available
(optimistic replication) we have to give up strong consistency and accept eventual
consistency.
There are different ways to resolve conflicts and to obtain eventual consistency. One of
the simplest is Last Write Wins: the write with the highest timestamp is kept,
overwriting the concurrent ones [133].
In order to keep strong consistency we need a distributed lock, which causes delays
(a problem we return to shortly).
Multi Master All data can be written on every node of the cluster, which replicates
them to the other nodes [51]. Of course in this kind of architecture there can be
issues due to the CAP theorem.
Master-Slaves All data are written on one node, which replicates them to the
others; these can be used only in read-only mode [49]. Of course in this kind of
architecture there are fewer issues than those explained previously. There can
still be issues with availability, as explained by the CAP theorem. A system that
supports multi-master replication can also be used in master-slaves mode.
1.3 Serverless
Serverless is a cloud computing architecture where the developer does not have
to think about the infrastructure. It lies half-way between Platform as a
Service and Software as a Service, and it is also known as Function as a
Service.
The serverless idea is a general one that can be implemented for different cases; a
classical implementation is for file systems [95].
In a serverless architecture the developer only has to think about developing the
application, not about how to scale the infrastructure, how to store data securely
and so on. The developer does not have to develop a real server: the server is already
set up, and he only has to configure it and possibly
develop some small extensions (in the database case the extensions can simply be the
triggers).
As the name Function as a Service suggests, we can also have systems where we only
develop a piece of code, a function, which is executed on an unknown infrastructure,
where we do not have to think about scalability and we pay just for usage. For example
we pay for the number of executions of that function; this is the model used by the
Amazon AWS implementation, AWS Lambda [2, 3]. This is a big advantage for the user,
because it offers an instant scalability that Platform as a Service does not have.
As we see in 3.2, this kind of architecture is fully developed on cloud platforms
and only partially developed in open source systems: with open source systems
we are free to configure and develop the server, but we do not get the
easy scalability of cloud systems.
The sections below show that the difference among these three
kinds of architectures is minimal and that they often overlap.
An example of overlapping is salesforce.com, a CRM (customer relationship
management) service born as Software as a Service [103]. It has since become a sort
of serverless architecture to build business applications, and it has also acquired
Heroku (a Platform as a Service company). However, even if it is now like a serverless
service, its pricing has remained that of a Software as a Service system,
where you do not pay for usage: you pay for accounts, and each account has some
limits [78].
1.3.1 Platform as a Service
Platform as a Service is a cloud computing architecture where the developer does
not have to think about the infrastructure. He chooses the platform on which to develop,
which is automatically configured and ready to use [117]. In this kind of architecture
the developer has to choose how to scale (it can be an automatic process), but in an
easy way: often he only has to choose the number of nodes.
One of the best known Platform as a Service offerings is Heroku [123]. Comparing
the pricing of Heroku [41] with the pricing of AWS Lambda [3], we can
observe that on Heroku we pay for the number of nodes per second, while on AWS
Lambda we pay just for the execution time. On AWS Lambda we do not have to do
anything to execute one function per second or thousands per second, while on
Heroku we have to set the right number of nodes or define a correct strategy to
dynamically create/remove nodes. So the serverless approach can reach the maximum
degree of resource sharing in cloud computing.
As far as the concept of a ready-to-use development platform is concerned, the
serverless architecture is similar to Platform as a Service, but it differs in other
aspects such as scalability, as we have seen with the AWS Lambda example.
1.3.2 Software as a Service
Software as a Service is a cloud computing architecture where there is no need or
possibility to develop anything [103]. One classic example is a webmail service that
can be configured for a private company.
One of the best known Software as a Service offerings is Google Apps for Work [31], a
service where the employees of a company have access to the company mail, company
docs, company cloud storage and so on. Everything is done only by configuring a
system, without developing anything. With Google Apps for Work the company pays
for the number of users, not for the usage [32]. This
approach is completely different from other cloud systems, but it is a good approach
for companies because it lets them predict the cost reliably.
The serverless approach is very similar to Software as a Service, since the developer
does not have to think about the infrastructure or how to build a reliable server; but,
unlike Software as a Service, in the serverless architecture the developer can
develop something, not only configure.
Chapter 2
State of Art
In this chapter we analyze CS-NoSQL databases and client best practices. In the
previous chapter we analyzed all the elements needed to understand these topics;
we also saw the limits of traditional technologies, which led to the creation of
CS-NoSQL. In fact CS-NoSQL databases allow increasing performance (theoretically) and
sending realtime notifications to clients easily, without manually implementing the
entire stack.
As we show, the concepts behind CS-NoSQL are well known and have many
academic references, but the idea behind CS-NoSQL that joins all of this knowledge
is not. In fact, it is based on the best practices of common technologies.
We analyze the key aspects of CS-NoSQL and their relationship with NoSQL
databases in 2.1. We analyze the main data models, distributed issues and
publish/subscribe applied to CS-NoSQL with their advantages, and other non-trivial
aspects such as constraints, permissions and queries.
Then in 2.2 we describe the client best practices to receive notifications (using
publish/subscribe) in a way that is transparent from the final user's point of view.
We show the event driven approach, an analysis of the client language used, the RESTful
approach and, last but not least, the local database.
This chapter is needed, together with the previous one, to understand and to
classify the real systems explained in 3. Moreover, it is a useful theoretical
starting point for the comparative approach we propose in 4, in
particular with regard to the client best practices.
2.1 CS-NoSQL
They are improperly called realtime databases even if they are not realtime
databases (a completely different thing); this name is used for commercial
purposes, since they send realtime notifications of changes in data.
These databases have some advantages. They can be easily partitioned and
distributed over different nodes, an important property. Moreover they easily allow
subscribing to different granularities of data and receiving notifications of changes
using publish/subscribe. This is done without writing any line of code on the server.
They also implement the publish/subscribe broker server. Furthermore they can
partially support constraints, permissions and queries. They
can also be easily implemented in a serverless approach with real databases. It is an
efficient approach that can also give many economic advantages in different situations.
Finally they implement client best practices by providing all libraries/frameworks.
It is easy to see why these databases are implemented over NoSQL databases.
In fact a NoSQL database is an unstructured database; it can support a subset of
the SQL instructions (but this is not mandatory).
NoSQL databases were born to create scalable databases for Big Data (a set of data
that is too large or too complex to be managed/processed with a traditional system)
applications, even if this means losing query expressiveness; in fact they can
support multiple masters [98].
They can be easily partitioned, but to do this efficiently they have to
give up either consistency or availability, as shown in figure 1.5; we show
more details when analyzing real applications in 3.
Data are stored in unstructured formats. We describe two main formats: key-value
and JSON. We use JSON when possible in our applications but, as we see, key-value
is very important in some applications and is in some respects a sort of basis for JSON.
Then we analyze distributed issues with pros and cons: we analyze the CAP theorem
and distribution and partitioning applied to CS-NoSQL.
Moreover we show how publish/subscribe is implemented in these databases in 2.1.3.
Finally we discuss non-trivial tasks and issues such as constraints, permissions and
queries.
2.1.1 Data model
In NoSQL databases data are organized in collections [113], which are like the
tables of traditional databases.
Each collection can be organized in different ways; we see the two most
used ones: key-value and document. For the document type we analyze JSON, which is one
possible format.
Each document can contain data in JSON format, XML format,
a file and so on.
Key-value Key-value is one of the simplest ways to store unstructured data. It is
also known as dictionary or hash [113]. The basic idea is that we have unique keys
and each of them has a linked value, i.e. a document that can be anything:
a file, a simple type, a complex type (like JSON) and so on.
The simplicity of this schema is the key to its success; in fact it is used in many NoSQL
systems.
If we want to apply publish/subscribe we can easily use topic-based publish/subscribe
with the pair collection-key as channel. So the only level of granularity is the value
(low granularity): we do not inspect it.
JSON JSON (JavaScript Object Notation) is a simple standard to store data [102];
it is not as powerful as XML but it is simpler, so processing is faster [127]. It
can be used as the document model in NoSQL databases.
Listing 2.1 shows a simple example of a JSON structure.
    {
      "main_array": [
        [
          {
            "title": "Element 11"
          },
          {
            "title": "Element 12"
          }
        ],
        [
          {
            "title": "Element 21"
          },
          {
            "title": "Element 22"
          }
        ]
      ]
    }
Listing 2.1: Simple JSON example
In a JSON structure there can be simple types, arrays or objects; in the previous
example (listing 2.1) we have:
  main object
    an array (called main_array)
      * two arrays
        · two objects each (with a title property)
So a NoSQL database that uses JSON documents is composed of JSON data like
this (a main JSON object). This structure is very powerful and allows
storing many complex structures.
A simple example is a system where we have some users, each of them with a
complex structure: each user is stored as an object in a main array.
RESTful is a good protocol for JSON and, in general, for NoSQL databases [132].
It is easy to see how this structure can be useful to subscribe only to a portion of
data, i.e. to a path (like /main_array/0/). For example, using a structure like the one
in listing 2.1, we can subscribe to the second element of the main array (an
inner array); then we receive notifications for changes to it (or to its inner elements,
including complex ones). So we can have a high granularity; different systems have
different limits on the maximum number of levels allowed.
If we want to apply publish/subscribe we can easily use topic-based publish/subscribe
with the pair collection-path as channel.
We can also see that the expressive power is the same as key-value, since we can
consider the path as our key (of course the structure of the stored data is different),
but it is more readable.
We analyze in more detail below what it means to subscribe to a specific path of the
JSON structure.
We can see that if we can subscribe only to the first level of the main object, the
final result is the same as a key-value structure.
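The following sketch shows how a path such as /main_array/1/ can be resolved against the structure of listing 2.1, and how a system could decide whether a change concerns a subscription. The helper names resolvePath and isAffected are hypothetical, and the matching rule (a subscription is notified for changes on the same path or on inner elements) is a simplifying assumption; real systems define their own rules.

```javascript
// The JSON structure of listing 2.1 as a native JavaScript object.
const doc = {
  main_array: [
    [{ title: "Element 11" }, { title: "Element 12" }],
    [{ title: "Element 21" }, { title: "Element 22" }],
  ],
};

// "/main_array/0/" -> ["main_array", "0"], ignoring empty segments.
function toSegments(path) {
  return path.split("/").filter((segment) => segment.length > 0);
}

// Walk the structure one segment at a time; undefined if the path does not exist.
function resolvePath(root, path) {
  let node = root;
  for (const segment of toSegments(path)) {
    if (node === null || typeof node !== "object" || !(segment in node)) {
      return undefined;
    }
    node = node[segment];
  }
  return node;
}

// Assumed rule: notify when the changed path is the subscribed path or lies below it.
function isAffected(subscribedPath, changedPath) {
  const changed = toSegments(changedPath);
  return toSegments(subscribedPath).every((segment, i) => changed[i] === segment);
}

const secondInnerArray = resolvePath(doc, "/main_array/1/");
const notified = isAffected("/main_array/1/", "/main_array/1/0/title");
const ignored = isAffected("/main_array/1/", "/main_array/0/");
```

With a single-level path the same helpers degenerate to a plain key lookup, which matches the observation that first-level subscriptions behave like key-value.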
2.1.2 Distributed issues
As introduced at the beginning, NoSQL databases can be easily distributed and
partitioned. Distribution and partitioning increase performance
but, often, we have to give up strong consistency.
These databases often implement MapReduce to process distributed data
efficiently [114].
CAP theorem The CAP theorem is one of the fundamental theorems for
distributed databases.
Since the main characteristic of NoSQL databases is scalability, they need
partition tolerance. In fact NoSQL databases often choose availability and partition
tolerance or consistency and partition tolerance [111, 113]. In 3 we analyze NoSQL
databases of both types.
Since they often do not implement strong consistency, eventual consistency is
implemented instead.
Distribution and partitioning Distribution and partitioning introduce different
issues.
But in NoSQL databases the master node can also be distributed, so the client
can commit on different nodes. This increases performance, since there is no
saturation of the server or of the locks needed to write data (commit), but it
can create problems in transaction consistency [98].
2.1.3 Publish/subscribe
Topic-based subscription is enough. The common idea in all of these databases
is to subscribe to the main events (the same events of a database trigger explained
in 1.2.1): inserted, updated, deleted.
We subscribe to these events at different levels of granularity, which depend on the
data model used and on the database itself. For example there are some databases that,
even if they support JSON, allow subscribing only to its first level (so it is like
a key-value).
Of course, as introduced at the beginning, the mechanism to subscribe to these data
and the trigger that publishes them are built into these databases. So no
other development is required: the system is ready to publish changes. The delivery
can be implemented using the different technologies that we analyzed in 1.1.
Of course we still have issues due to the fact that we also have to scale the
broker servers and, as previously said, this is not an easy task (even if it is
easier for topic-based than for content-based). Fortunately these databases provide an
integrated publish/subscribe that scales with the database itself.
2.1.4 Constraints, permissions and queries
As we see with real databases, constraints on data, permissions and queries are
not trivial tasks, especially in CS-NoSQL. In fact we have to replicate them on the
client.
Constraints and permissions In different systems more than one user can access
the same JSON object (document), so we need fine grained permissions (which can
be considered a constraint). Moreover we may want some traditional
constraints such as integrity constraints, data type constraints and so on. So we treat
everything as a constraint.
Different systems implement different constraints, since constraints can also affect the
performance of the system [98].
Queries There are no standards for queries in NoSQL databases, since they
depend on different elements: data model used, database used, MapReduce with query
support and so on [114]. Often key-value databases allow more powerful
queries.
There are some languages (Pig and Hive) built on top of MapReduce that allow
writing queries in standard ways [126].
2.2 Client Best Practices
In this subsection we analyze the best practices used in these kinds of applications
from the client's point of view. Some choices, like the data returned in some cases, can
be imposed by the server technology chosen. Figure 2.1 shows a simple
architecture.
Figure 2.1: Simple client architecture
Of course, since there is no standard, every application implements different things
and/or implements them in different ways. So this is just a summary of the most
used best practices.
Note that by application server (from now on, server) we mean the CS-NoSQL with its
integrated publish/subscribe broker server.
We can summarize them in some points. The client does not contact the
application server directly but sends all requests to the client framework. It
immediately receives a response with an eventual consistency logic (a read does not
return data older than a previous read), but of course the data may not reflect
the latest state of the server. Furthermore the server sends asynchronous notifications
of changes, and the client catches them using an event driven approach. The client
specifies which data it needs, so it receives notifications only for them. Moreover the
client uses a language that makes it easy to work with the event driven approach. The
client framework also uses RESTful to send/receive data synchronously. Finally the
client framework implements a local database, which stores all data written by the client
and all data received from the server via asynchronous notifications. This allows
returning data immediately when requested by the client.
Together, these practices allow using publish/subscribe efficiently to receive data in a
way that is transparent for the final user.
2.2.1 Event driven approach
Algorithm 2.1 shows a common and simple realtime retrieve of data in
the event driven approach. The basic idea of the event driven approach is to call a
callback when an event is fired [104]; a well-known example of event driven programming
is programming for desktop interfaces, where the events are the user inputs.
Of course this is not the only approach available, but it is easy to see that
it is very useful in asynchronous applications like the CS-NoSQL applications that we
discuss in this thesis.
Algorithm 2.1 Event driven retrieve of data
  db ← connect(DB_ADDRESS)                      ⊲ Connect to DB
  document ← selectDocument(db, DOCUMENT)       ⊲ Select document
  onChange(document, PATH, CALLBACK)            ⊲ Subscribe to change events
  procedure callback(newVal, oldVal)
      log(oldVal)                               ⊲ Log old value of the path
      log(newVal)                               ⊲ Log new value of the path
  end procedure
The basic idea is to subscribe a callback to change events of a specific path of
a document; for JSON documents the path is a reference to a specific
level of the JSON document. For example a path for the JSON of listing 2.1 can
be /main_array/0/, to subscribe to change events of the first array inside the
main array.
The callback can receive either the entire new content of the path or only the part
that changed; in the case of an array, only the child that changed.
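As an illustration, Algorithm 2.1 can be sketched in JavaScript with an in-memory stand-in for the database. The class LocalDocument and its methods onChange and set are hypothetical names introduced only for this example; in a real client the change would arrive from the server (for instance over a websocket) instead of being simulated by a local call to set, and this sketch passes the entire new value to the callback.

```javascript
// "/main_array/0/" -> ["main_array", "0"]
const splitPath = (path) => path.split("/").filter((s) => s.length > 0);

// Hypothetical in-memory stand-in for a CS-NoSQL document (not a real client library).
class LocalDocument {
  constructor(data) {
    this.data = data;
    this.listeners = new Map(); // normalized path -> array of callbacks
  }

  // Subscribe a callback to change events on a path (onChange in Algorithm 2.1).
  onChange(path, callback) {
    const key = splitPath(path).join("/");
    if (!this.listeners.has(key)) this.listeners.set(key, []);
    this.listeners.get(key).push(callback);
  }

  // Simulate an incoming change: replace the value at `path` and fire the
  // callbacks registered on exactly that path (a simplification).
  set(path, newVal) {
    const segments = splitPath(path);
    const last = segments.pop();
    let parent = this.data;
    for (const segment of segments) parent = parent[segment];
    const oldVal = parent[last];
    parent[last] = newVal;
    const callbacks = this.listeners.get(splitPath(path).join("/")) || [];
    for (const callback of callbacks) callback(newVal, oldVal);
  }
}

const eventDoc = new LocalDocument({ main_array: [["a", "b"], ["c"]] });
const seen = [];

// The callback of Algorithm 2.1: record the old and the new value of the path.
eventDoc.onChange("/main_array/0/", (newVal, oldVal) => {
  seen.push({ oldVal, newVal });
});

eventDoc.set("/main_array/0/", ["a", "b", "x"]); // fires the callback once
eventDoc.set("/main_array/1/", ["d"]);           // no subscriber on this path
```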
2.2.2 Client language
One of the most used languages in this kind of application is ECMAScript 5.0
(in some cases ECMAScript 6.0), commonly known by the name of its dialect: JavaScript.
In fact the best known ECMAScript engine, V8, is very efficient for event driven
programs; it is the engine used by the desktop port that we use, Node.js [137].
This efficiency is due to a good Just in Time compiler written in C++ [138]. All the
server technologies analyzed in the next chapters have ECMAScript clients; of course
we can create clients in other languages, since the protocols used are standard and
open. ECMAScript is a language where the event driven approach is easy to implement;
in fact the language has the fundamentals to execute asynchronous code, since it
has callbacks [101, 109].
Moreover ECMAScript is executed in only one thread, and the callbacks are also executed
in this thread. But some operations, like network requests, database connections
and so on, are executed by other threads in the background (the result is passed to the
callback, which is called in the main thread). If there is no CPU intensive code, this
approach is very efficient and avoids problems due to race conditions [137].
Node.js also has a standard package manager called NPM, with many libraries; this
allows writing small examples containing only the code needed for understanding, while
the other parts are handled by the libraries.
Furthermore JSON, as the acronym suggests, is derived from JavaScript. In fact
JavaScript can parse it easily, and JSON objects/arrays become native JavaScript
objects/arrays. So we can access them natively, without calling parsing
methods to iterate over the structure and without mapping the JSON onto already defined
classes (i.e. deserializing it to objects) as other languages such as Java do [134]; in
fact both approaches are not easily adaptable to structure changes.
2.2.3 RESTful approach
What we said previously about paths is also valid for RESTful URIs. RESTful is
useful to post/put/delete data and to retrieve them synchronously; synchronous
retrieval can be useful to populate the local database at the beginning, since RESTful
is not compatible with asynchronous requests. There is no reason to execute this kind
of operation asynchronously; of course these operations can be done over a websocket
connection, but they are still done in a sort of synchronous way.
So the best practice is to use RESTful for all the synchronous operations, to allow
getting all data or a portion of them via RESTful, and also to provide an
asynchronous interface integrated with the event driven approach, like websocket.
Of course, since JavaScript supports asynchronous operations, the result of these
synchronous operations is returned via a callback. These operations are synchronous
in the sense that after the request is sent a response is returned immediately, but
there are network delays; for that reason the code is executed in an asynchronous
way [109].
2.2.4 Local database
The best practice is to create a local database on the client that replicates server
data (only the data the client needs) using optimistic replication [133];
this enables what is commercially called optimistic UI [52].
It gives the ability to update the local database (and consequently the user
interface) even if there are network delays or if the network is down.
So the developer can call all the methods (get data, put data and so on) on the
database and see the effect even if there is no connection (often called offline mode).
Of course some server constraints/modifications, as well as permission constraints, are
applied when the network comes back up. Some of the constraints/modifications
applied by the server can also be implemented in the client, such as type constraints,
integrity constraints and so on. Of course the server rechecks everything again.
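A minimal sketch of such a local database follows. Writes are applied locally at once, queued, and sent to the server when the connection is available; if the server rejects a write, the optimistic change is rolled back. The class OptimisticStore, the synchronous sendToServer transport and the example type constraint are assumptions made for illustration; real frameworks are asynchronous and handle more cases (for instance several pending writes on the same key).

```javascript
// Illustrative local replica with optimistic writes (hypothetical API).
class OptimisticStore {
  // `sendToServer(key, value)` is an assumed transport: it returns true when
  // the server accepts the write and false when a server constraint rejects it.
  constructor(sendToServer) {
    this.local = new Map(); // local database read by the user interface
    this.pending = [];      // writes not yet confirmed by the server
    this.online = false;
    this.sendToServer = sendToServer;
  }

  get(key) {
    return this.local.get(key);
  }

  put(key, value) {
    const existed = this.local.has(key);
    const previous = this.local.get(key);
    this.local.set(key, value); // visible immediately, even offline
    this.pending.push({ key, value, existed, previous });
    if (this.online) this.flush();
  }

  setOnline(online) {
    this.online = online;
    if (online) this.flush();
  }

  // Send queued writes in order; roll back the ones the server rejects.
  flush() {
    while (this.pending.length > 0) {
      const write = this.pending.shift();
      if (!this.sendToServer(write.key, write.value)) {
        if (write.existed) this.local.set(write.key, write.previous);
        else this.local.delete(write.key);
      }
    }
  }
}

// Fake server that rechecks a type constraint: only strings are accepted.
const serverData = new Map();
const store = new OptimisticStore((key, value) => {
  if (typeof value !== "string") return false;
  serverData.set(key, value);
  return true;
});

store.put("title", "Draft"); // offline: applied locally only
store.put("count", 3);       // offline: applied locally, later rejected
const offlineTitle = store.get("title");
const offlineCount = store.get("count");
store.setOnline(true);       // network comes back up: pending writes are sent
```

While offline both writes are visible locally; once the connection returns, the accepted write reaches the server and the rejected one disappears from the local database.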
The client could also implement the query logic. But for simplicity these
systems often have simple query support or no query support (on neither client nor
server).
Conflicts can occur and different ways exist to resolve them [133].
One of the simplest ways to resolve conflicts is Last Write Wins: the
write with the highest timestamp is kept, overwriting the concurrent ones. This
approach is used by different systems in the implementation of the local database; the
timestamp used is the server's, taken when the message is actually received by the
server (not the update time on the client, which can be very old due to network
delays). Of course this approach is reliable when there are masters/supernodes; in
other cases techniques that take consensus into consideration are needed.
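A minimal sketch of Last Write Wins on a single value is shown below. The class LwwRegister and the numeric timestamps are assumptions for illustration; the timestamps are supposed to be assigned by the server when each message is received, as described above, and writes with equal timestamps are simply ignored here, which a real system would have to handle explicitly.

```javascript
// Last Write Wins register: keeps the value with the highest server timestamp.
class LwwRegister {
  constructor() {
    this.value = undefined;
    this.timestamp = -Infinity;
  }

  // `serverTimestamp` is assumed to be assigned by the server on reception.
  // Returns true if the write was applied, false if it lost against a newer one.
  apply(value, serverTimestamp) {
    if (serverTimestamp > this.timestamp) {
      this.value = value;
      this.timestamp = serverTimestamp;
      return true;
    }
    return false;
  }
}

// Two concurrent writes, stamped by the server in order of reception.
const updates = [
  { value: "from client A", serverTimestamp: 100 },
  { value: "from client B", serverTimestamp: 105 },
];

// Two replicas see the same updates in a different order...
const replica1 = new LwwRegister();
const replica2 = new LwwRegister();
updates.forEach((u) => replica1.apply(u.value, u.serverTimestamp));
[...updates].reverse().forEach((u) => replica2.apply(u.value, u.serverTimestamp));
```

Both replicas end with the same value regardless of delivery order, which is what makes this simple rule sufficient for eventual consistency when one server assigns the timestamps.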
Chapter 3
Analysis of Some CS-NoSQL
In this chapter we analyze some commercial CS-NoSQL systems, both open source and
proprietary cloud-hosted ones. We select some of them for further analysis; all the
systems selected are open source, since they give us much more control and
allow better tests. The set is composed of CS-NoSQL systems with different
characteristics, such as different CAP classes, different data structures and so on.
These characteristics are based on the theoretical concepts we have seen in chapter 2.
Some of the systems analyzed in this chapter are the same ones used in chapter 5 to
define important classes of applications. The CS-NoSQL systems selected for further
analysis are the ones we use in the test benchmarks in chapter 7.
In 3.1 we describe the main characteristics to consider. In that
section we also give a recap of the main characteristics of the CS-NoSQL systems
selected. Then in 3.2 we analyze the characteristics described before for some CS-NoSQL
systems. We do a full analysis of all the characteristics for the systems that we
select; for the others we give only a simple description of their main distinguishing
features.
3.1 Characteristics to be considered
We consider NoSQL databases with the following characteristics: JSON as data
model, to have a flexible standard that is easy to use; realtime notification
support with publish/subscribe (of course this can be provided by other plugins); and a
JavaScript library, since it is the language that we use.
Then, to distinguish different systems, we take different characteristics into
consideration. We consider data granularity for subscriptions: even if data are stored
in JSON, we can have granularity only at the first level of the JSON object (it is like
key-value) or have a key-value structure where the value is JSON. We also analyze
the classification according to the CAP theorem, considering
eventual consistency if available; this is done only for the server distribution, as we
do not have enough information to classify the local database (generally
it implements eventual consistency). We also take into consideration: distribution
(with MapReduce support) and partitioning, local database implemented, replication
model (multi-master or master-slaves), protocol used to send notifications, protocol
used to send data, constraints and permissions support (with user management) and
query support. This detailed analysis is done only for the systems that we test in the
next chapters.
As previously explained, the CAP theorem is not enough in many respects but,
today, many systems cannot be classified under other
schemes.
Table 3.1 shows the main characteristics of the databases selected for further
analysis in the next chapters; the selection was made by picking systems with different
characteristics.
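Table 3.1: CS-NoSQL comparison
Database  | CAP | Local DB | MapReduce | Replication  | User management | Notification | Interface   | Data granularity | Queries
----------+-----+----------+-----------+--------------+-----------------+--------------+-------------+------------------+--------
Couchbase | CP  | No       | Yes       | Master slave | Yes             | All          | Proprietary | Key-value        | Yes
PouchDB   | AP  | Yes      | Yes       | Multi master | Yes             | Long polling | RESTful     | Key-value        | Limited
Gun.js    | AP  | Yes      | No        | Multi master | Yes             | Websocket    | RESTful     | Fine grained     | No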
3.2 Analysis
Even if this kind of system is young, many systems have already been created,
both open source and SaaS.
Some SaaS systems are very famous and widely used, but they do not allow all the tests
and performance measurements needed for our study, since we do not know their
internal structure and replication, so we analyze in depth only open source systems. Of
course we quickly review the characteristics of the main SaaS systems.
We analyze some open source systems; some of them are covered only briefly because
they are not studied in the next chapters, but they are well known or have particular
characteristics. Then we analyze some widely used SaaS systems, which are not
studied in the next chapters.
Note that the information presented is taken from the official
sites of the products and, in most cases, has not been verified empirically. Moreover
the term realtime is used improperly.
3.2.1 OpenSource
We analyze some open source systems. For the systems that we select for benchmarks
we do a complete analysis, covering every important point previously explained.
For the others we only report their distinguishing features; in fact these systems do
not have good performance or they lack some features needed for our tests.
CouchBase CouchBase¹ is a NoSQL database with a realtime extension called
syncGateway² (every action must pass through this gateway). It is one of
the most famous open source systems; for that reason we test it.
According to the characteristics previously described, using syncGateway + Couchbase,
we have:
- Data granularity for subscriptions: key-value [15].
- Local database: none.
- CAP: CP (consistency and partition tolerance) [6].
- MapReduce: supported [10].
- Replication model: master-slave replication [12].
- Protocol used for notifications: websocket and all the ways explained in 1.1 to
  receive events [14].
- Protocol used to send data: proprietary.
- Constraints and permissions support: user management [13].
- Query support: N1QL, a superset of SQL to query JSON [11].
Other interesting features are a powerful cluster configuration [7] and full text search
[9]. Full text search finds what you are looking for even without exact matches.
It is not the same as the LIKE keyword in SQL, which only allows the use of
wildcards: full text search is case insensitive, it can ignore unimportant words
like 'is' (stop words, in technical terms) and it is tolerant of mistakes such as typos.
PouchDB. PouchDB³ is just a JavaScript library that interacts with a NoSQL
database, CouchDB. Since PouchDB relies on a stable and widely used system like
CouchDB, it is a system that we test. In addition to the interface, PouchDB also
implements a local database.
CouchDB. CouchDB⁴ is one of the most famous NoSQL databases and it is
very simple.
According to the characteristics previously described, using PouchDB + CouchDB,
we have:
¹ http://couchbase.com
² http://developer.couchbase.com/documentation/mobile/1.1.0/get-started/sync-gateway-overview/index.html
³ https://pouchdb.com/
⁴ https://couchdb.apache.org
- Data granularity for subscriptions: key-value [15].
- Local database: PouchDB [64, 65].
- CAP: AP (availability and partition tolerance) [16, 21], but eventual consistency is
  implemented [17].
- MapReduce: supported [18].
- Replication model: multi-master replication [19].
- Protocol used for notifications: long polling to receive events [15].
- Protocol used to send data: RESTful native interface [20].
- Constraints and permissions support: simple user management [22].
- Query support: limited [23].
These features allow interfacing with it efficiently and in a realtime way [38]. The only
negative aspect of CouchDB is that it has less query expressiveness (for
example, there is no equivalent of the SQL join) than other analogous NoSQL databases
like MongoDB [50].
Gun.js. Gun.js⁵ is a full stack CS-NoSQL implemented in JavaScript. It does not
build on any existing software; everything is implemented ad hoc. For
that reason, for some of its features and for the fact that it is implemented in JavaScript,
it is useful to test it.
An important characteristic is that there are no centralized structures [36]: no
central server is required and any client can be a server, so it is peer to peer (commonly
known as P2P) [131].
According to the characteristics previously described, using Gun.js, we have:
- Data granularity for subscriptions: JSON full path [35].
- Local database: yes, since any client can act as a server, as said previously.
- CAP: AP (availability and partition tolerance) [34].
- MapReduce: none.
- Replication model: multi-master, since it is fully distributed, as said previously.
- Protocol used for notifications: websocket [38].
- Protocol used to send data: websocket [38]. This choice follows from the fact that
  it is fully distributed, as said previously.
- Constraints and permissions support: authentication support in a P2P environment,
  using asymmetric cryptography [39].
- Query support: none.
⁵ http://gun.js.org
Other interesting features are graph support [37], i.e. the ability to explicitly link
documents together (of course, in any NoSQL database a link can be made manually using
a sort of ID), and reliable storage: it allows storing data on AWS S3 [40], a
reliable cloud storage [4] implemented by Amazon.
MemSQL. MemSQL⁶ is a scalable and replicated in-memory SQL database (it is
like traditional databases). It is very interesting but, since it is a hybrid system
that does not exploit the potential of NoSQL, it is not analyzed in this thesis.
Meteor. Meteor⁷ is a JavaScript client and server framework that uses MongoDB
[50] to create a CS-NoSQL application.
However, it is not fully open to other technologies and the server side can be fully
developed. The latter can be a problem for scalability and efficiency; in fact
it cannot be used as a serverless system, but only like a classical application.
For this reason it is not analyzed in this thesis.
3.2.2 Software as a Service
These kinds of systems can be easily adapted to a SaaS service, since they can
have no code on the backend, even if a more correct classification would be serverless.
In fact all the commercial systems advertise the ability to build the application
without thinking about the structure.
Firebase. Firebase⁸ is one of the most famous commercial cloud-hosted CS-NoSQL
systems; it is owned by Google. It provides just a JSON document where you can
subscribe at any level.
In order to offer scalability in an efficient way, it does not allow writing any line
of code on the backend; you can put only static resources on the backend (which of
course are not expensive in CPU time). On the other hand, it allows defining backend
rules that act like triggers to validate data, and it has some useful additional features
such as a login system.
Of course it provides libraries and protocols to access it efficiently and in a realtime
way, such as a RESTful interface, websocket and Event Stream.
⁶ http://www.memsql.com/product/
⁷ https://www.meteor.com/
⁸ https://firebase.google.com/
Furthermore, the cost system can be easily adapted to the data used; in fact, since the
infrastructure is closed it can be very efficient, and the queries are not complex [25].
Moreover, since it is one of the most used databases of this type, it can be easily
integrated with other external systems.
Pubnub. Pubnub⁹ is a cloud publish/subscribe topic-based system with storage
support. It does not have the power of CS-NoSQL servers, since the channels are
not related to data.
Of course there are open source publish/subscribe implementations like socket.io [79].
Pubnub has the advantage of being cloud hosted, so it also solves scalability issues.
Backand. Backand¹⁰ is a proprietary serverless architecture for web applications;
it is a publish/subscribe service like Pubnub, but it also allows more control
on the backend, like Firebase.
⁹ https://www.pubnub.com/
¹⁰ https://www.backand.com/
Chapter 4
A Proposed Comparing Traditional Approach
In this chapter we propose an approach based on a traditional database (RDBMS);
we use this approach as a baseline to compare the performance of CS-NoSQL in 7. Of
course, except for the database, we try to emulate the approaches of CS-NoSQL.
The proposed architecture could be implemented with any RDBMS that has
advanced trigger support. In fact, even if in some cases we use a proprietary solution,
an alternative (general) solution is provided.
First, in 4.1 we present the base technologies needed to explain the architecture:
PostgreSQL, Redis and socket.io.
Then, in 4.2 we explain the proposed architecture, which is general and composed
of different parts: database, publish/subscribe, websocket server, input server and
custom logic. We explain how to implement each part of the architecture using the
technologies explained before.
Finally, in 4.3 we show our implementation and explain in more detail the logic
that depends on the implementation rather than on the theoretical concepts: server logic and
client logic.
4.1 Technologies Background
4.1.1 PostgreSQL
RDBMSs are CA according to the CAP theorem, as shown in figure 1.5,
i.e. they do not support partitioning of data in an efficient way, although of course
they can be partitioned.
Moreover, they can be replicated according to the structures explained in 1.2.3:
multi-master or master-slave.
PostgreSQL¹ is a powerful and widely used database (so many libraries are available for it).
It can be partitioned [58] and replicated in a multi-master architecture (and so also
master-slave) [60].
Moreover, it has interesting characteristics that we use in the next sections. It
is a very flexible database that allows using custom languages [59]; one of them is
PL/sh², a language that allows executing shell commands, so we are able to call an
external program. It also has a feature to easily create a sort of publish/subscribe
system: a queue where you can publish messages and read them in order, using the
NOTIFY [57] and LISTEN [56] commands. NOTIFY is actually delivered
only after the transaction is committed. Furthermore, PostgreSQL has JSON as a data
type [55]; it could be useful, but we do not use it, to keep the approach standard.
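The commit-time delivery of notifications can be illustrated with a small in-memory model. This is only a sketch of the behaviour described above, not PostgreSQL code: the class `ToyTransaction`, the channel name `table_changes` and the payload strings are hypothetical, and a Node.js `EventEmitter` stands in for the notification queue.

```javascript
const { EventEmitter } = require("events");

// Toy model of a transaction that queues notifications until commit.
// It imitates the behaviour described in the text; it is not PostgreSQL code.
class ToyTransaction {
  constructor(bus) {
    this.bus = bus;     // shared channel bus (hypothetical stand-in)
    this.pending = [];  // notifications queued inside the transaction
  }
  notify(channel, payload) {
    this.pending.push({ channel, payload });
  }
  commit() {
    // Queued notifications are delivered in order, and only now.
    for (const { channel, payload } of this.pending) {
      this.bus.emit(channel, payload);
    }
    this.pending = [];
  }
  rollback() {
    this.pending = []; // nothing is ever delivered
  }
}

const bus = new EventEmitter();
const received = [];
bus.on("table_changes", (payload) => received.push(payload)); // plays the role of LISTEN

const tx = new ToyTransaction(bus);
tx.notify("table_changes", "insert:1");
tx.notify("table_changes", "update:1");
const seenBeforeCommit = received.length; // the listener has seen nothing yet
tx.commit();
```

In this model a listener can never observe a change that is later rolled back, which is the property the architecture relies on in the following sections.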
4.1.2 Redis
Redis³ is an in-memory key-value store [113]; data can be written to disk
at fixed intervals. It allows implementing many different elements: a cache
system [70], a publish/subscribe system [75], a queue system using lists [71, 76] and a
distributed lock system that can be used by external systems [72].
It also has interesting features such as data partitioning support [73] and master-slave
replication support [77]. Moreover, anything (image, text, JSON and so on) can be
inserted as the value (key-value storage), with a high storage limit (512 megabytes
per value) [71].
Unfortunately, to keep performance high, data are not stored immediately on
disk, but are written every second (default configuration) [74].
4.1.3 Socket.io
Socket.io⁴ is one of the most famous websocket servers, written in JavaScript.
It is built on top of engine.io⁵, which is like the transport layer in the ISO/OSI
stack: it is very efficient, but it is only a websocket implementation. As we show,
socket.io has many integrations and a native implementation (since the protocol
is open [84]), which makes it the best choice for our tests. Furthermore, it is rated
as the best open source websocket server in [88].
Socket.io has useful features for our application, such as Redis integration to create
a websocket cluster [85], user/session support [86] and P2P support [83], which could be
useful to implement a system like Gun.js. Moreover, messages are sent in FIFO order,
an important property for eventual consistency, as we show in the next sections.
¹ https://www.postgresql.org/
² https://github.com/petere/plsh
³ http://redis.io/
⁴ http://socket.io/
⁵ https://github.com/socketio/engine.io
Furthermore, socket.io allows emitting events to all the clients subscribed to that
event [81], so events are the channels of a publish/subscribe system. The client can
also send events that, generally, are caught only by the server; of course, since it is
publish/subscribe, there is no notification of successful delivery to the server (ack).
So the approach to use with it is the event-driven approach, and we can
have a distributed publish/subscribe with user/session support.
4.2 Architecture Proposed
Figure 4.1: Traditional Architecture Proposed
Figure 4.1 shows the proposed architecture; every part is explained
in the following sections.
We have also added an additional level: load balancer/CDN (content delivery
network). The load balancer is used to scale [97, 115], routing users to different socket.io
servers.
In fact, as we observe in the architecture that we propose, we can have more than one
socket.io server, but there is no built-in system to route users.
Since we do not need it in the test phase, we skip this level to avoid inserting other
levels to test.
As we see in 6, we do not partition PostgreSQL and Redis; in fact they are not the
bottleneck and we want to test only the realtime feature. In that way we keep things
simpler, since we do not also have to consider the issues related to partitioning,
such as latency.
While a CS-NoSQL offers a full stack solution (often with client local database
support), here we have to define every aspect of the stack. So we need to implement
what we have defined in 1.2.1, client best practices, related needed aspects and whatever
can be useful to emulate the CS-NoSQL approach.
We implement: an RDBMS that must be scalable among nodes and support triggers
to notify changes; a publish/subscribe system to publish notifications
of changes; a websocket server; an input server that receives the data to be sent to the
database; a custom logic level, an optional level where custom logic can be introduced; and
a client framework/libraries to communicate with the server (cluster). In this solution
we have data granularity at row level, since finer granularity is not required
by the study that follows.
4.2.1 Database
We use PostgreSQL, so the normal operations are guaranteed by it (it has good
libraries), and so is consistency.
The only thing we have to solve is the notification of changes to an external
system (a publish/subscribe system).
The trivial (and general) solution is a trigger that calls an external program (which
publishes data on the publish/subscribe system); this can be done easily using PL/sh,
as shown in appendix A.1. But calling an external program is not very efficient
(startup time); moreover we would need to find a way to call the external program only after
the commit.
So we can use another solution that is more efficient but not standard:
LISTEN/NOTIFY. The trigger publishes the message, then an external listener
republishes it on the other publish/subscribe system.
When there is a change, we have the following steps: the trigger publishes on
a predefined PostgreSQL channel for each event; a JavaScript listener listens
to the same PostgreSQL channel; finally, the JavaScript listener republishes the same
message on the publish/subscribe system, also setting namespace and rooms. There are
many examples useful for our use case; one of them (with a JavaScript listener) [62]
was modified and used in our final configuration published on GitHub. We can observe
that this approach is like a key-value approach.
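The republishing step can be sketched as a pure function that turns one database notification into a socket.io-style message. This is a minimal sketch under stated assumptions: the JSON payload format (`table`, `action`, `row`), the function name `toSocketMessage` and the use of `owner_id` as the room are illustrative choices and are not taken from the published code.

```javascript
// Hypothetical payload format: {"table": ..., "action": ..., "row": {...}}.
// The function only decides where the message should go; it does not send it.
function toSocketMessage(payload) {
  const { table, action, row } = JSON.parse(payload);
  return {
    namespace: `/${table}`,      // one namespace per table
    room: String(row.owner_id),  // one room per owner (assumption)
    event: action.toLowerCase(), // the event name is the SQL action
    data: row,
  };
}

const message = toSocketMessage(
  JSON.stringify({ table: "messages", action: "INSERT", row: { id: "a1", owner_id: 42 } })
);
```

A real listener would receive the payload from the PostgreSQL channel and hand the resulting message to the publish/subscribe system; both ends are left out here on purpose.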
4.2.2 Publish/Subscribe and Websocket server
We try to emulate the approach used in CS-NoSQL.
We use socket.io as the websocket server. As previously said, socket.io is built
on top of the websocket system (engine.io), so the publish/subscribe broker server
and the websocket server are on the same machine.
It implements a distributed (via Redis) publish/subscribe, i.e. different
publish/subscribe broker servers communicate through Redis. But in this way we do not know
the status of the entire network: we do not know which brokers are connected.
When a message is published on a broker server, the broker replicates it on Redis; then the other
broker servers read it and send it to the subscribed clients. So when the listener script
previously described publishes something, it simply writes it on Redis (without
calling any broker server) [82].
Everything is done automatically, without custom configuration on the broker server:
every event (channel) used by the listener can be subscribed to by the users.
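The way brokers cooperate through a shared channel can be pictured with the following toy model. It is a sketch only: a Node.js `EventEmitter` named `sharedBus` stands in for Redis, and `ToyBroker` is a hypothetical class, not part of socket.io.

```javascript
const { EventEmitter } = require("events");

// Toy stand-in for the Redis channel shared by all brokers.
const sharedBus = new EventEmitter();

class ToyBroker {
  constructor(name, bus) {
    this.name = name;
    this.clients = []; // callbacks of the locally connected clients
    // Every broker reads the shared channel and forwards to its own clients.
    bus.on("message", (msg) => this.clients.forEach((client) => client(msg)));
  }
  connect(client) {
    this.clients.push(client);
  }
}

const brokerA = new ToyBroker("A", sharedBus);
const brokerB = new ToyBroker("B", sharedBus);
const seenOnA = [];
const seenOnB = [];
brokerA.connect((msg) => seenOnA.push(msg));
brokerB.connect((msg) => seenOnB.push(msg));

// The database listener writes to the shared channel directly,
// without calling any broker.
sharedBus.emit("message", { event: "insert", id: "a1" });
```

Note that no broker knows which other brokers exist, which mirrors the remark above that the status of the entire network is unknown.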
Socket.io allows managing channels at a high level. We can use namespaces [80]
to distinguish different tables, so that the same events are identified
unambiguously for different tables, and rooms [80] to create a sort of
subchannel: a client could be subscribed only to one room that contains only events
related to some rows. Rooms can be very useful to set up permissions at row
level: a client can write and read only the rooms it is subscribed to, and
only the server can subscribe it to rooms, so custom logic is needed to do that.
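The role of namespaces and rooms can be sketched with a small in-memory router. This is an illustration of the routing idea only, not the socket.io API: `ToyRouter` and its methods are hypothetical names, and clients are represented by plain callbacks.

```javascript
// Toy router: namespace = table, room = group of rows, event = SQL action.
class ToyRouter {
  constructor() {
    this.rooms = new Map(); // "namespace/room" -> Set of client callbacks
  }
  // Only server-side code calls join, mirroring the rule in the text.
  join(namespace, room, client) {
    const key = `${namespace}/${room}`;
    if (!this.rooms.has(key)) this.rooms.set(key, new Set());
    this.rooms.get(key).add(client);
  }
  emit(namespace, room, event, data) {
    const members = this.rooms.get(`${namespace}/${room}`) || new Set();
    for (const client of members) client(event, data);
  }
}

const router = new ToyRouter();
const aliceInbox = [];
const bobInbox = [];
router.join("messages", "42", (event, data) => aliceInbox.push({ event, data }));
router.join("messages", "7", (event, data) => bobInbox.push({ event, data }));

// A change on a row owned by 42 reaches only the clients joined to room 42.
router.emit("messages", "42", "insert", { id: "a1", owner_id: 42 });
```

Because joining is a server-side decision, the set of rooms a client belongs to acts as its row-level permission list.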
4.2.3 Input server
A good approach is to use RESTful for communications that do not need to be
realtime events (they are synchronous requests).
But for simplicity we can also send the data from client to servers using websocket.
Using socket.io, the client also uses the events approach, which does not provide a successful
delivery notification system; of course we could develop our own, but for the tests
that we have to do this (ack of actions done by the client) is not important.
So for every socket.io event the server calls actions on the PostgreSQL server; we insert
this logic into the custom logic.
The important point of this approach is that the writes do not depend on Redis (which
has persistence problems) and the consistency of data is managed by PostgreSQL.
4.2.4 Custom logic
With socket.io we can insert custom logic for events sent by the client; a simple
example is shown in appendix A.2. But we cannot modify events generated by others:
in that case the socket.io server acts only as a switch of messages, i.e. in a socket.io
node we cannot modify messages sent by the PostgreSQL listener.
The two main operations to do at this level are: sending data to the database (we can define
different events for different database operations and map each
event to the corresponding database library method) and managing subscriptions to rooms,
which depends on the role of rooms. If we use them for authorization, a simple solution
is catching an auth event (sent by the client), where we subscribe that client only to some
rows based on the authentication result. For example, we can subscribe the client (join it to a
room) only to the rows that it owns (i.e. matching the owner_id field).
The only problem with this solution is that there is no persistence of the data
associated with a client. If the socket.io server dies, if the client changes server (if
we have a load-balanced organization) or if the connection is interrupted (and then
re-established), the data associated with a client are lost.
The only critical thing in our approach is authentication; in fact, if we are able
to identify the client we can store other elements (like the rooms joined) in other places.
A simple and standard approach to authenticate clients that is not centralized (so it
does not need persistence) is JWT (JSON Web Token) [116]. A resynchronization of
data may eventually be needed (for messages lost during the downtime), but even if
we do not implement it the solution remains eventually consistent.
Of course, for the topic of the thesis it is not necessary to develop this; in fact we need
just to test performance, as shown in 6.
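To make the idea of stateless authentication concrete, the sketch below signs and verifies an HS256 token using only the Node.js `crypto` module. It is a simplified illustration, not the code of the thesis: the helper names, the secret and the claims (`sub`, `rooms`) are assumptions, and checks that a production JWT library performs (header algorithm, expiry) are left out.

```javascript
const crypto = require("crypto");

// base64url encoding: base64 without padding, with the URL-safe alphabet.
const b64url = (buf) =>
  buf.toString("base64").replace(/=/g, "").replace(/\+/g, "-").replace(/\//g, "_");

function signToken(payload, secret) {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const signature = b64url(
    crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest()
  );
  return `${header}.${body}.${signature}`;
}

// Returns the claims if the signature matches, otherwise null.
function verifyToken(token, secret) {
  const [header, body, signature] = token.split(".");
  if (!header || !body || !signature) return null;
  const expected = b64url(
    crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest()
  );
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) return null;
  return JSON.parse(Buffer.from(body, "base64").toString("utf8"));
}

const secret = "demo-secret"; // hypothetical shared secret
const token = signToken({ sub: "user1", rooms: ["42"] }, secret);
const claims = verifyToken(token, secret);
```

Since the claims travel inside the token, any socket.io server that knows the secret can re-identify a reconnecting client and rejoin it to its rooms without shared session storage.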
4.2.5 Client library and Local database
We have to create a local database that is the final interface of our client. The
client contacts only it; the local database sends data through websocket and receives
notifications of changes through websocket, as we said previously.
We use Last Writer Wins to implement the local database, so the replication is
eventually consistent.
A local database implementation is thus an object that contains data and provides the
read/write methods for the user.
We have the following situations: when the user calls write/update, local data are updated
and, at the same time, the local database tries to update the remote servers, retrying until there
are no network errors; when data are updated by the server, a callback is called and it
updates local data.
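The two situations above can be sketched with a toy local database. This is a minimal model under stated assumptions: the class name `LocalDatabase`, the synchronous `sendToServer` transport that throws on network errors, and the use of a numeric `updatedAt` value to decide the last writer are all illustrative choices.

```javascript
// Toy local database: optimistic writes, last-writer-wins merge.
class LocalDatabase {
  constructor(sendToServer) {
    this.rows = new Map();            // id -> { value, updatedAt }
    this.sendToServer = sendToServer; // hypothetical transport; throws on network error
    this.outbox = [];                 // writes not yet accepted by the server
  }
  write(id, value, updatedAt) {
    this.rows.set(id, { value, updatedAt }); // local data are updated first
    this.outbox.push({ id, value, updatedAt });
    this.flush();
  }
  flush() {
    // Retry pending writes in order; stop at the first network error.
    while (this.outbox.length > 0) {
      try {
        this.sendToServer(this.outbox[0]);
        this.outbox.shift();
      } catch (err) {
        return;
      }
    }
  }
  onRemoteChange(id, value, updatedAt) {
    const current = this.rows.get(id);
    // Last writer wins: keep the change with the newest timestamp.
    if (!current || updatedAt >= current.updatedAt) {
      this.rows.set(id, { value, updatedAt });
    }
  }
  read(id) {
    const row = this.rows.get(id);
    return row ? row.value : undefined;
  }
}

let online = false;
const sent = [];
const db = new LocalDatabase((write) => {
  if (!online) throw new Error("network down");
  sent.push(write);
});

db.write("row1", "draft", 1);          // offline: visible locally, queued remotely
const visibleOffline = db.read("row1");
online = true;
db.flush();                            // connection is back: the queue is drained
db.onRemoteChange("row1", "old", 0);   // stale remote change is ignored
db.onRemoteChange("row1", "final", 2); // newer remote change wins
```

The application only ever calls `write` and `read`, so it keeps working while the network is down and converges once the queued writes are delivered.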
Often data are modified by the server: default values, data updated by triggers,
constraints, auto-generated IDs and so on. Since we write data on the client database,
we would have to implement all of them in the client, but to do that we would have to create a
sort of SQL interpreter on the client. Of course, if there are no network problems the
issue could be solved by waiting for the answer of the server, but this means making
the application not optimistic.
This issue is solved by other systems in the following ways (some of which come for free
since they are NoSQL): simplified constraints (the client can replicate
them easily), simplified defaults (the client can replicate them easily), no triggers, and
unique IDs that can be generated by the client. For our application and tests we can
skip everything except the ID; to solve it we can use a UUID (Universally Unique
Identifier) [118], which can be easily generated by the client, the socket.io server and the
PostgreSQL server.
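As a small illustration, a client running on Node.js can generate such an identifier with the standard `crypto.randomUUID` function; a browser client would need an equivalent. The helper `newRow` and the field names are hypothetical.

```javascript
const { randomUUID } = require("crypto");

// The client picks the primary key itself, so the row can be stored locally
// before the server has seen it.
function newRow(fields) {
  return { id: randomUUID(), ...fields };
}

const row = newRow({ owner_id: 42, text: "hello" });
const other = newRow({ owner_id: 42, text: "hello" });
```

Two rows with identical content still receive different identifiers, so no round trip to the server is needed to obtain a key.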
4.3 Implementation
We tried to use ECMAScript wherever possible, with an event-driven approach. The
code, configurations and Vagrant installer scripts are on GitHub⁶; the installation
instructions can be found in the readme. We also used the promise pattern [66] to make
callbacks more readable.
⁶ https://github.com/carduz/master-thesis-source/tree/master/proposed_solution
We briefly analyze the structure of the code; everything is commented,
so we explain only the critical points.
Database, sample table, trigger with related functions: these are very simple and
follow the structure. As we said previously, the database is neither distributed nor
partitioned.
The database listener is very simple and follows the structure previously explained. We
put it in a separate section because (as we have done in the Vagrant installer
scripts) it can be deployed on another machine (only one machine; in fact it cannot
be distributed); of course it can also be deployed on the same machine as the database.
This listener sends an event to the clients of a specific room (we analyze the logic of rooms
below) for every action done on the table (insert, update, delete) as a socket.io event
(the event name is the SQL action). As we said previously, the table is specified as the
namespace.
The Redis server does not require additional code, so only the Vagrant installer script is
provided.
The websocket server is explained below since it is more complex. It can be distributed
using a load balancer, but only the installer script for a single machine is provided (no
load balancer script), since we do not want to test the load balancer (that would mean another
level to test).
Finally, we briefly describe below what the client does.
4.3.1 Websocket server
The behavior is simple, since much of the work is done automatically, as we said
previously; we briefly recap what we have done.
We define the same actions for all tables (specified as namespaces via of).
We define an auth event callback that identifies the client.
We define a join event callback that allows a client to subscribe to the rooms enabled
for it. In this case we have provided a simple permission system where the room
number is the owner_id of the row. When a client joins a room, all rows of that room
are sent to the client (initial retrieve), emulating the insert event created by the
database listener after having done a read all for that room. Of course the number
of rows can be limited.
We map client data events (add, put, delete) to SQL functions (insert, update, delete),
making some checks. If there is an error (such as no permission) we discard the
command and send the correcting command only to the client that sent the request, to
update its local database. For example, if a client tries to add a row
using an owner_id that is not enabled for it, the server discards the add and sends
a delete event to that client.
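The mapping and the correcting reply can be sketched as a pure decision function. This is only an illustration: `handleClientEvent`, the `allowedOwners` list and the shape of the returned object are hypothetical, and only the rejected add described above is given a correcting reply.

```javascript
// Client events mapped to SQL actions, as in the text.
const ACTIONS = { add: "insert", put: "update", delete: "delete" };

// Hypothetical handler: returns what the server would do for one client event.
function handleClientEvent(client, event, row) {
  const action = ACTIONS[event];
  if (!action) return { execute: null, reply: null }; // unknown event: ignored
  if (!client.allowedOwners.includes(row.owner_id)) {
    // No permission: discard, and tell only this client how to undo its
    // optimistic local change. Only the "add" case is described in the text.
    const reply = event === "add" ? { event: "delete", data: { id: row.id } } : null;
    return { execute: null, reply };
  }
  return { execute: { action, row }, reply: null };
}

const client = { allowedOwners: [42] };
const accepted = handleClientEvent(client, "add", { id: "a1", owner_id: 42 });
const rejected = handleClientEvent(client, "add", { id: "a2", owner_id: 7 });
```

Keeping the decision separate from the database call makes the permission rule easy to test without a running PostgreSQL or socket.io server.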
Using this approach, as said previously, the consistency of data modification is
managed by PostgreSQL and it is like any normal modern HTTP application; in fact
data do not pass through Redis in this phase.
The normal read is managed automatically and it is eventually consistent, since the
replication to the local database is not synchronized but socket.io guarantees FIFO
delivery. There is only one critical point: what if there is a change during the initial
retrieve, considering that the client is not able to distinguish data (initial retrieve or
other events)? We could have different situations.
- A notify event is called before the initial retrieve is executed. Updates and deletes
  are discarded since there is no data; insertions are added to the local database, but they
  cannot be newer than the data sent by the initial retrieve (for the same reason
  as the next point). Of course, updates/deletes for rows already added are applied. So it
  remains eventually consistent.
- A notify event is called (and so is the websocket event) before the operation is
  committed and so visible to the select done in the meantime (this would cause the select
  to return data older than the notify event sent previously). This is impossible, since the
  notify is actually delivered only after commit.
- A notify event is called while the initial data are being sent to the client. Since socket.io
  is executed in a single thread and delivery is FIFO, the initial data are
  sent before the new ones.
- A notify change arrives before the answer of the select (but the select command was
  already sent). In this case we lose the change, but it is still eventually consistent. Of
  course, improvements such as a queue of events locked until the initial retrieve is finished
  would be good, but they are not implemented, to keep the code simple.
So if events are fired before the initial retrieve, they are not a problem, since they are
discarded or at worst lead to eventual consistency. Likewise, if events are fired after
the initial retrieve, they are not a problem, since they are caught (thanks to the FIFO
guarantee). But events sent during the initial retrieve are a problem, since we can lose
something because we could discard them; of course we keep eventual
consistency.
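The improvement mentioned above, a queue of events locked until the initial retrieve is finished, could look like the following sketch. It is not part of the published implementation: `BufferedSubscription`, its method names and the `deliver` callback are hypothetical.

```javascript
// Toy subscription that holds change events back until the initial rows
// have been delivered, then replays them in arrival order.
class BufferedSubscription {
  constructor(deliver) {
    this.deliver = deliver; // hypothetical function that sends to the client
    this.ready = false;
    this.queue = [];
  }
  onChange(event, row) {
    if (this.ready) this.deliver(event, row);
    else this.queue.push([event, row]); // initial retrieve still running
  }
  onInitialRows(rows) {
    for (const row of rows) this.deliver("insert", row);
    this.ready = true;
    for (const [event, row] of this.queue) this.deliver(event, row);
    this.queue = [];
  }
}

const delivered = [];
const sub = new BufferedSubscription((event, row) => delivered.push(`${event}:${row.id}`));
sub.onChange("update", { id: "a1" }); // arrives while the select is pending
sub.onInitialRows([{ id: "a1" }]);    // the select answer arrives
sub.onChange("delete", { id: "a1" }); // normal event afterwards
```

With this queue the change that arrives before the answer of the select is no longer lost; a replayed event may repeat a change already contained in the select result, so the client would have to apply events idempotently.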
4.3.2 Client
We have two main classes: the socket.io client, a class that simply maps the functions
and callbacks that the local database needs to socket.io commands, and the general local database.
The general local database has to do the following operations: keep a local copy of
data; when a change is requested, update local data and then send the update to the
server, retrying if there are network delays; when a notification of change is
received, update local data and call a callback that signals that data were updated;
and, as explained previously, add a UUID to each row created. Of course the final
client interacts only with the local database.
Chapter 5
Classes of Applications
In this chapter we analyze the main use cases, with related real case studies,
where CS-NoSQL systems are suggested and where they should perform very well. So
these are the cases for which CS-NoSQL systems are designed. Since there is no
literature, the information is taken from the vendors; the vendors used are Firebase
(Google) and Pubnub, both explained in 3.2.2.
The use cases analyzed in this chapter are tested in the benchmarks in 7; in
6 we analyze how to test these use cases and which parameters to change to obtain different
significant tests.
In 5.1 we analyze the main use cases, with important differences in their characteristics
(such as the average number of reads compared to the number of writes). We also
highlight the characteristics that best exploit the advantages of CS-NoSQL from
a theoretical point of view. The use cases analyzed are: realtime chat, collaborative
software and social applications.
Then, in 5.2 we analyze some real case studies, where the use cases seen previously
are applied with success by some important companies. This means that in some
cases CS-NoSQL systems are a good solution.
5.1 Use Cases
We have different elements to consider. The most important one is the number
of reads, i.e. whether #reads >> #writes, but the structure of data is also very important,
i.e. whether JSON could increase performance/readability. Of course we also need to consider
aspects such as subscription granularity, notification structure, and the constraints,
modifications and permissions needed by the server.
Now we analyze some use cases; they are the most common according to vendors.
More use cases can be found on the Pubnub solutions site¹.
We observe that the traditional comparing approach can be adapted to every use case,
but this adaptation requires time. In contrast, CS-NoSQL systems adapt themselves
automatically; we show in 5.2 that this is a key to their success.
We need to use unique ID references, as shown in the use cases; it is one of the few
kinds of reference supported by most NoSQL databases (all the databases we use support it).
In fact, in NoSQL databases we do not have referential integrity checks [130] (some
databases try to implement them with explicit checks, but this is not standard), so
in the traditional approach we do not use the referential integrity check either (it slows
the application down).
¹ https://www.pubnub.com/solutions/
5.1.1 Realtime chat
It is a classical use case, shown as an example by different vendors [26, 67]. The chat
considered is a room-based chat, i.e. there are some rooms to which many users are
subscribed, and they see all the messages of that room. So we expect #reads >> #writes.
However, we need to do many checks on the server, such as permission to
post in that room, identity check (name shown) and so on. Looking at the store
structure [27], in particular at the messages of the rooms store (sample shown in listing 5.1),
the JSON structure seems very useful.
{
  "messages": {
    "room1": [
      {
        "test": "message 1",
        "user": "user1"
      },
      {
        "test": "message 2",
        "user": "user2"
      }
    ],
    "room2": []
  },
  "room-users": {
    "room1": ["user1", "user2"],
    "room2": ["user1"]
  },
  "users": {
    "user1": {
      "roomsAllowed": ["room1", "room2"]
    },
    "user2": {
      "roomsAllowed": ["room1"]
    }
  }
}
Listing 5.1: Chat data structure
We can observe that we need to link every message to a room and to a user.
With JSON we can make a direct link (child of) with only one element, so we
need to choose whether to link with the room (all messages of a room) or with the user
(all messages of a user). The room is chosen, since the users subscribe to it and want to
receive notifications based on it (they want the messages of a room, not the messages of a
user), while the user is linked using an ID reference.
With a traditional approach we link both (room and user) using
ID references. We use the room ID as the socket.io room, so we manage permissions and
automatically send notifications of room changes (so it is the same approach as the
CS-NoSQL solution).
So we have three tables: messages, rooms and users; every row of messages
has a reference to a user and a room. If we want to link users to rooms (permissions)
we need a pivot table that links users and rooms. The JSON version seems to be
more readable.
Since permissions and notifications are managed analogously in the traditional
approach, we expect CS-NoSQL to show only small improvements over the traditional approach,
given only by the fact that architecture and delivery are optimized for #reads >> #writes.
Moreover, we only add data that we know to be unique, and NoSQL databases should
be better in this respect. But since we have #reads >> #writes, we should
not see its effects.
We can observe that if we want to efficiently retrieve the users of a room, we can
do it easily with a traditional approach (it is a query on the users table), but to do it
with NoSQL (since we do not have queries) we have to replicate data (room-users),
as shown in listing 5.1.
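The replication of data mentioned above can be sketched as a function that derives the room-users index from the roomsAllowed lists of the users. The function name `buildRoomUsers` is hypothetical; the sample data follow the chat data structure of this section.

```javascript
// Build the duplicated "room-users" index from the users' roomsAllowed lists.
function buildRoomUsers(users) {
  const roomUsers = {};
  for (const [userId, user] of Object.entries(users)) {
    for (const room of user.roomsAllowed) {
      if (!roomUsers[room]) roomUsers[room] = [];
      roomUsers[room].push(userId);
    }
  }
  return roomUsers;
}

const users = {
  user1: { roomsAllowed: ["room1", "room2"] },
  user2: { roomsAllowed: ["room1"] },
};
const roomUsers = buildRoomUsers(users);
```

In the relational version the same answer comes from a query over the pivot table, so nothing has to be kept in sync; in the JSON version the application must update both places on every permission change.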
5.1.2 Collaborative software
Collaborative software is software that allows more than one person to work
together on the same document; one of the most famous examples is
Google Docs [33]. A simple collaborative application is provided as an example by Firebase
[28]. Since its data structure is not simple, we consider a simplified
version (we do not consider aspects such as multiple documents or permissions), shown in
listing 5.2.
{
  "history": [
    {
      "timestamp": 1490506829,
      "changeObject": {
        "start": 20,
        "change": "abcd"
      },
      "user": "user1"
    },
    {
      "timestamp": 1490506830,
      "changeObject": {
        "start": 22,
        "change": "Impact",
        "end": 24
      },
      "user": "user2"
    },
    {
      "timestamp": 1490506835,
      "changeObject": {
        "start": 22,
        "change": -2
      },
      "user": "user2"
    }
  ],
  "users": {
    "user1": {
      "position": 10
    },
    "user2": {
      "position": 15,
      "positionEnd": 20
    }
  }
}
Listing 5.2: Collaborative software data structure
We work using a history of changes; in fact, since there is no lock, users cannot
change the same element at the same time.
The structure of the history is simple: we can add text (first change), we can apply a
font to a portion of text (second change), we can remove some characters (third change).
Moreover, we can subscribe to users to see their cursor position in realtime.
It is easy to observe how the JSON structure is well suited to this kind of application;
it is easy to imagine extending this application by adding detailed child fields in change.
Of course, in the traditional application we can do the same thing by serializing
something (also JSON) to a string.
We also have two subscriptions (users and history) that are easy to manage and easy
to read with the JSON structure.
But of course we can manage them with a traditional approach; in fact they
become two different tables. We have: a history table with timestamp, user reference and
changeObject (which could be the JSON as a string) fields, and a users table with position and
positionEnd fields.
We only add data that we know to be unique, and NoSQL databases should be
better with this kind of constraint. Since we have #reads ≃ #writes we should see
differences due to the efficiency of additions.
5.1.3 Social Applications
A social application is a classical use case, shown as an example by different vendors
[29, 69]. It is composed of classical elements: users, relationships (followers), posts,
comments, likes. A data structure that uses the power of JSON could be the one
shown in listing 5.3, but the structure actually used is different, as shown in
listing 5.4 [30].
{
  "users": {
    "user1": {},
    "user2": {},
    "user3": {}
  },
  "relationships": [
    ["user1", "user2"],
    ["user2", "user3"]
  ],
  "posts": [
    {
      "user": "user1",
      "content": "aaa",
      "likes": [
        "user1",
        "user2"
      ],
      "comments": [
        {
          "timestamp": 1490506829,
          "user": "user2",
          "content": "bbbb"
        }
      ]
    }
  ]
}
Listing 5.3: Possible social structure
{
  "users": {
    "user1": {
      "posts": ["post1"]
    },
    "user2": {"posts": []},
    "user3": {"posts": []}
  },
  "followers": {
    "user1": ["user2"],
    "user2": ["user1", "user3"],
    "user3": ["user2"]
  },
  "likes": {
    "post1": ["user1", "user2"]
  },
  "comments": {
    "post1": [
      {
        "timestamp": 1490506829,
        "user": "user2",
        "content": "bbbb"
      }
    ]
  },
  "posts": {
    "post1": {
      "user": "user1",
      "content": "aaa"
    }
  }
}
Listing 5.4: Social structure
Even if the first data structure seems good, the second is better. In fact, we can
subscribe just once to the main object, while in the first structure we have to
subscribe again for every new post. A large number of subscriptions could be a problem.
Moreover, the second approach is easier to port to traditional technologies.
For simplicity, permissions and related aspects are skipped. As in the previous use
case, even if we have more than one subscription we can manage them easily (with
different tables). We have: users, likes (with references to user and post), comments
(with references to user and post), and posts (with a reference to user).
However, we could have some problems due to the absence of integrity constraints.
For example, we could add a reference to a post in the likes object while that
post is being deleted (in fact, we do not even have locks).
We can observe that if we want to efficiently retrieve each user's posts, we can do it
easily with a traditional approach (it is a query on users table), but to do it with
NoSQL (since we do not have queries) we have to replicate data (posts under users),
as shown in listing 5.4. We expect #reads >> #writes, since it is common to have a lot
of followers (so every write has to be replicated to a lot of followers).
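The replication of posts under users can be sketched as a write that updates two places at once. This is only an illustration over a plain in-memory object shaped like the social structure; the helper names addPost and postsOf are hypothetical, and a real CS-NoSQL client would perform the two writes through its own API.

```javascript
// Hypothetical helper: adds a post to a store with "users" and "posts" maps.
// The post id is written twice: under "posts" and under the author's own
// "posts" index, because there is no query to derive one from the other.
function addPost(store, postId, userId, content) {
  store.posts[postId] = { user: userId, content: content };
  if (!store.users[userId]) {
    store.users[userId] = { posts: [] };
  }
  store.users[userId].posts.push(postId);
  return store;
}

// Reading a user's posts is then a direct lookup instead of a query.
function postsOf(store, userId) {
  const user = store.users[userId];
  return user ? user.posts.map((id) => store.posts[id]) : [];
}
```

The price of the fast read is that the two copies can diverge: without integrity constraints nothing prevents one write from succeeding while the other fails.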
It seems that everything can be managed efficiently with a traditional approach, so,
as in the first use case, we expect only minor improvements, due to the efficient
architecture and the optimized delivery.
5.2 Examples of Real CS-NoSQL Applications
In this section we quickly analyze some real stories of CS-NoSQL applications.
For each case we show the problem and the solution, and link it to the use cases
shown previously.
Of course, since the cases are taken from vendors, the technologies used are
proprietary, cloud-hosted ones, so there can be some differences compared to the tests
we can do with open source technologies. In fact, proprietary cloud-hosted technologies
sometimes have extra features or extra performance in some particular conditions.
5.2.1 Adobe DPS
Adobe DPS (Digital Publishing Solution) is a collaborative software used for
publishing mobile app experiences. It is developed over pubnub [1].
Problem: Since it is a fully collaborative software, different people should be able
to work together on the same project from different devices and from
different locations, as in Google Docs [33].
Solution: After analyzing the development costs of custom solutions built over
classical systems, a commercial CS-NoSQL solution was chosen. It sends notifications
about changes to the projects to all the connected devices. It also allows
introducing new features in the future, such as server-to-server notifications.
The chosen system also offers global redundancy.
Use case: We can easily observe that this is the use case shown in 5.1.2. Of course
the structure of the data is different, but the idea of saving changes is not. The
flexibility of JSON gives us the possibility to use the same model in more complex
situations than a simple text document (we only need to change the content of the
change field, as explained previously).
5.2.2 Logitech Harmony Ultimate Home
Logitech Harmony Ultimate Home is a home automation hub that allows controlling
the house from the app and by other means. It is developed over pubnub [48].
Problem: With the app the user can control the hub from any location, and the data
exchanged with the hub are a stream of data. So a secure and reliable solution to
send a stream of data is needed.
Solution: A commercial CS-NoSQL solution was chosen. It is used to send realtime
data from the app to the hub when the user is away from home. Moreover, the hub sends
a stream of realtime data from different devices (lights, temperatures, etc.) to the
mobile app.
Use case: We have not studied this use case previously, but it is marked as one of
the common use cases by pubnub [68].
5.2.3 CornerJob
CornerJob is a location-based job recruitment app. It is developed over pubnub
[5]. Its main feature is the chat: when a job seeker applies for a job, a new chat
is created.
Problem: Since the chat is the most important part of the application, the company
wanted a standard chat on a reliable technology with low development and
maintenance costs.
Solution: After analyzing the development costs of custom solutions built over
classical systems, and after trying free plans, a commercial CS-NoSQL solution
was chosen, because it allowed building a chat system quickly and reliably.
In fact, the CS-NoSQL is a full stack system that takes care of every step
needed, so there is no need to think about scalability and communication among
internal components. Moreover, since a proprietary, cloud-hosted commercial system
was used, the company does not have to think about the maintenance of the
infrastructure.
Use case: We can easily observe that this is the use case shown in 5.1.1. Of course
it is a little different, since we do not have chat rooms but only private
messages between two users. So some aspects change: it is no longer true that
#reads >> #writes, the version for the traditional approach is no longer valid as it is
(minor changes are needed), and so on.
Chapter 6
Benchmarks strategy
In this chapter we describe the idea behind the benchmarks, whose results are
published and commented in chapter 7. We show how to test the classes of applications
analyzed in the previous chapter, how to test the scalability of different systems and
which indices to use to compare systems. So we present theoretical concepts and we
design our test framework from a theoretical point of view.
We analyze the theoretical concepts of scalability and how to test scalability with
our systems in section 6.1.
We show the idea behind some test frameworks and we design our test framework
for CS-NoSQL in section 6.2.
Then we show how to test the classes of applications analyzed in the previous chapter
in a general way in section 6.3.
Finally we explain how to integrate the tests of different classes of applications into
our test framework in section 6.4.
6.1 Scaling Test
In this section we show how we can test the scaling of different solutions. We
analyze the scaling that we do and the reasons for those choices. Then we show
different test environments based on the previous choices.
6.1.1 Scaling
We know that NoSQL databases implement partitioning very efficiently [98]. So
we expect that, with a lot of data that require partitioning, the performance of NoSQL
solutions would be much better than that of the traditional approach.
Moreover, since strong consistency constraints are often relaxed, replication
is also more efficient.
Both aspects lead to horizontal scalability (also called out/in) [122], since we
increase the number of servers (not aspects like their power).
Furthermore, since CS-NoSQL are a full stack solution, the internal realtime
delivery server is also scaled horizontally (it is scaled with the database).
So, in order to keep things simple, to keep things standard (partitioning is implemented
in different ways depending on the data model) and to avoid adding an additional level
to test, we decided to skip this part and test without horizontal scaling.
Moreover, since our goal is to compare these kinds of applications with a standard
approach, testing horizontal scaling too would add another variable factor that
could change our results.
For the same reason we do not scale horizontally the traditional approach that we
proposed. Here by scaling we mean partitioning and replication of the postgreSQL
server and replication of the socket.io machines; redis is not a bottleneck, so it does
not need scaling. Instead we do vertical scaling (also called up/down) [122], which we
explain in the next section.
6.1.2 Environments
In this section we show the different configurations that we want to test. These
configurations were found after some minor empirical tests. We have also set them
to have the same maximum and minimum sum of resources used (counting only the
non-static servers).
As said previously, CS-NoSQL are a full stack solution implemented in just one
server, so we can create a common environment for them, shown below.
The traditional approach proposed, instead, has different servers, so it needs
a more detailed discussion.
Socket.io and gun.js can be considered single-threaded, so we do not scale the CPU
for them. In fact, we have a main thread where callbacks and other operations are
executed; furthermore, the background operations (executed in other threads) for network
delivery (the background operations used by socket.io) are not CPU intensive [137].
To keep things more general, all the tests are done using virtual machines (using
virtualbox, see footnote 1) [110] on the same physical machine.
So all the components and technical characteristics are the same (such as RAM
speed).
Note that by CPU we mean a standard modern CPU (i7 generation); of course the
same CPU was used for all servers.
CS-NoSQL: Since we have only one server, we can simply follow table 6.1. As
we said previously, gun.js can be considered single-threaded, so we can test it with just
one CPU (so we have only two cases, based on the RAM).
Footnote 1: https://www.virtualbox.org
Table 6.1: CS-NoSQL test environments
N° RAM [GB] N° CPU
1 2 2
2 2 3
3 4 2
4 4 3
Comparing traditional approach: Here we have 4 servers, and for each of them we
use a virtual machine:
- PostgreSQL: this is a critical point and we follow table 6.2.
- Redis: this is not a critical point, nor is it something to test. So we can
  consider it static; we always use a machine with 1 CPU and 512 MB RAM.
- Listener: this is not a critical point, nor is it something to test. So we can
  consider it static; we always use a machine with 1 CPU and 512 MB RAM.
- Socket.io: this is a critical point and we follow table 6.3. As we said previously,
  socket.io can be considered single-threaded, so we can test it with just one CPU.
Of course we have to test all the combinations, so we have 8 tests to do.
Table 6.2: postgreSQL (traditional approach) test environments
N° RAM [GB] N° CPU
1 1 1
2 1 2
3 2 1
4 2 2
Table 6.3: socket.io (traditional approach) test environments
N° RAM [GB] N° CPU
1 1 1
2 2 1
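The count of 8 tests follows from pairing every postgreSQL environment with every socket.io environment. A minimal sketch of that enumeration is given here; the RAM and CPU values are copied from tables 6.2 and 6.3, while the helper name combineEnvironments and the field names are hypothetical.

```javascript
// Environment sizes copied from tables 6.2 (postgreSQL) and 6.3 (socket.io).
const postgresEnvs = [
  { ramGb: 1, cpus: 1 },
  { ramGb: 1, cpus: 2 },
  { ramGb: 2, cpus: 1 },
  { ramGb: 2, cpus: 2 },
];
const socketIoEnvs = [
  { ramGb: 1, cpus: 1 },
  { ramGb: 2, cpus: 1 },
];

// Hypothetical helper: pairs every database environment with every
// delivery environment and sums the resources of the two scaled servers.
function combineEnvironments(dbEnvs, deliveryEnvs) {
  const combos = [];
  for (const postgres of dbEnvs) {
    for (const socketIo of deliveryEnvs) {
      combos.push({
        postgres,
        socketIo,
        // redis and the listener are static, so they are not counted
        totalRamGb: postgres.ramGb + socketIo.ramGb,
        totalCpus: postgres.cpus + socketIo.cpus,
      });
    }
  }
  return combos;
}

const traditionalEnvs = combineEnvironments(postgresEnvs, socketIoEnvs);
console.log(traditionalEnvs.length); // 8
```

The smallest and largest combinations sum to 2 GB with 2 CPUs and 4 GB with 3 CPUs, which are the same bounds as the CS-NoSQL environments of table 6.1.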
6.2 Test Framework
Since the systems used are custom or new, there are no stable frameworks to
test their performance. So, following the guidelines of an existing framework called
YCSB [89], we built our own framework, which tests only what we need and in the way
we want. The framework, with the tests prepared (and related aspects like SQL tables),
can be found on github (see footnote 2), with instructions in the readme.
However, we tested only performance (throughput and latency). A further analysis should
test the consistency of the distributed environment; to do that there are tools
like jepsen [47] (we briefly analyze it in appendix B).
The framework we built is written in javascript, so it is easy to integrate it with
other platforms. Since we know which indices we need, we just have
to implement general aspects. We built a generator of data that follows the data
structure; it is shown in section 6.3. We also built a general client that allows sending
data and reading local databases (or receiving data); it is shown in section 6.4.
What we need to test are write and read performance. In fact, as we said, we have
different models of data, but we do not have complex operations, so we have only
basic writes and replications of written data.
Since our critical point is replication, we have to test with more than one client.
We can consider a write completed when it is replicated to all clients.
So we need at least one client to make writes and at least one to check replications
(called reads). The number of clients (writers and readers) depends on the type of
application.
We emulate different situations where we have (for example) 1 writer and 100
reader clients. So an important piece of information is the latency of the synchronization
to all these clients. We call execution the writing/reading of data according to the
number of writers/readers specified in the model.
So we measure different things:
- Latency to synchronize all clients and throughput (requests/latency to synchronize
  everything), for every execution.
- Mean latency (with variance) to get data, for each client.
- Final throughput: total number of requests (reads and writes) per second.
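A minimal sketch of how these indices could be computed from raw per-client latencies is given below. It assumes that the latency to synchronize all clients is the latency of the slowest client and that the variance is the population variance; both choices, as well as the function names, are illustrative assumptions and not taken from the thesis framework.

```javascript
// latencies: per-client latency (in seconds) to receive the data of one execution.
// requests: number of requests (reads and writes) issued by that execution.
function executionMetrics(latencies, requests) {
  // Assumption: all clients are synchronized when the slowest one has the data.
  const syncLatency = Math.max(...latencies);
  const mean = latencies.reduce((sum, x) => sum + x, 0) / latencies.length;
  // Assumption: population variance (divide by n).
  const variance =
    latencies.reduce((sum, x) => sum + (x - mean) ** 2, 0) / latencies.length;
  return {
    syncLatency,
    throughput: requests / syncLatency, // requests / latency to synchronize everything
    mean,
    variance,
  };
}

// Final throughput: total number of requests (reads and writes) per second.
function finalThroughput(totalRequests, durationSeconds) {
  return totalRequests / durationSeconds;
}
```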
Each test unit is composed of different clients, and we execute more than one unit
in parallel.
To do that we have a test manager that checks everything and communicates with
clients; in fact, every client is implemented as a separate process.
The creation of processes and the communication are managed by the framework in a
transparent way. We only need to map the system-specific client functions to the
general client (this is done by the adapter).
Since it is not a general framework, the framework does not generate charts; it
only generates raw data, and charts can be generated using external and powerful
tools like spreadsheet software. Of course the test framework is executed on another
virtual machine (with high resources) or on the host machine.
Footnote 2: https://github.com/carduz/master-thesis-source/tree/master/tests
6.3 Test sets
In this section we analyze what every element of the classes of applications
(analyzed previously) needs. We also define for each of them the number of clients to be
used for tests; for some of them we can define more versions (e.g. a test with 1 reader
client, then a test with 10 reader clients) to test scalability and adaptability of
replication and concurrent writing. Of course this is a simplified simulation of the
behavior of these classes of applications.
All data shown (like the number of clients) were found after some minor empirical
tests. In some cases data structures need some changes to be adapted to the model
of the database used. These are minor changes (simplifications) that are not shown
here; of course the code on github contains everything.
We should also consider the size of data but, after some experiments, we have seen
that some systems do not support a large amount of data, so the test would be
inconsistent. Of course we analyze data volume adaptability, where by data volume we
mean the volume of data in the database (not the size of a single field).
6.3.1 Realtime chat
Clients: As we said previously, since it is a room-based chat, we expect #reads
>> #writes. This example is useful to test the performance of a lot of clients
subscribed to the same subscription. So a reasonable number of clients, considering the
environments previously defined, could be the one shown in table 6.4, where we have
#readers >> #writers. That table also shows the number of rooms.
Table 6.4: Realtime chat clients
N° N° Writer N° Reader N° Room
1 1 10 1
2 1 100 1
3 10 100 5
Data generation: After an initial data setup (e.g. creation of users or rooms),
to run the tests we just have to create messages. So our generator is simply a
fake text generator; other data, like the user, can be static. As we analyze in the next
paragraphs, we have to manage the room to write to.
Writing: We want each writer to be subscribed to only one room.
Reading: We want each reader to be subscribed to 2 rooms (if possible).
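The room constraints for writers and readers can be sketched as follows. The round-robin assignment is an assumption made only for illustration (the thesis does not state how rooms are picked), and assignRooms is a hypothetical helper name.

```javascript
// Hypothetical helper: assigns rooms to clients in round-robin order.
// Each client gets roomsPerClient rooms, or fewer when not enough rooms
// exist (the "if possible" case).
function assignRooms(clientCount, roomCount, roomsPerClient) {
  const perClient = Math.min(roomsPerClient, roomCount);
  const assignment = [];
  for (let client = 0; client < clientCount; client++) {
    const rooms = [];
    for (let k = 0; k < perClient; k++) {
      rooms.push((client + k) % roomCount);
    }
    assignment.push(rooms);
  }
  return assignment;
}

// Third configuration of table 6.4: 10 writers, 100 readers, 5 rooms.
const writerRooms = assignRooms(10, 5, 1); // one room per writer
const readerRooms = assignRooms(100, 5, 2); // two rooms per reader
```

With a single room, as in the first two configurations, every reader falls back to one subscription.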
6.3.2 Collaborative software
Clients: As we said previously, we expect #reads ≃ #writes. This example is useful
to test the performance of concurrent writes. So a reasonable number of clients,
considering the environments previously defined, could be the one shown in table
6.5, where we have #readers ≃ #writers.
Table 6.5: Collaborative software clients
N° N° Writer N° Reader
1 1 1
2 10 10
3 100 100
Data generation: After an initial data setup (e.g. creation of users), to run the tests
we just have to create change elements. A change is a JSON object with the structure
shown in listing 6.1. There are two elements: start, i.e. the start position of the
change, and change, i.e. the new text that replaces the old one starting from the start
position. We can consider everything static except for start and change, which should
be a fake number and a fake text respectively.
{
  "timestamp": 1490506829,
  "changeObject": {
    "start": 20,
    "change": "abcd"
  },
  "user": "user1"
}
Listing 6.1: Collaborative software change structure
Writing: Here the writing is trivial: all clients write to history (the list of changes).
Reading: Here the reading is trivial: all clients subscribe to history (the list of changes).
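A generator for the two non-static fields could look like the following sketch. It produces only the changeObject part (start and change), leaving the static fields to the caller. The bounds, the alphabet and the injectable random source are assumptions added for illustration; makeChangeGenerator is a hypothetical name.

```javascript
// Hypothetical generator of the non-static part of a change:
// a fake start position and a fake text.
// "random" is injectable so that runs can be made reproducible.
function makeChangeGenerator(random = Math.random, maxStart = 1000, maxLength = 10) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz";
  return function nextChangeObject() {
    const length = 1 + Math.floor(random() * maxLength);
    let text = "";
    for (let i = 0; i < length; i++) {
      text += alphabet[Math.floor(random() * alphabet.length)];
    }
    return {
      start: Math.floor(random() * maxStart),
      change: text,
    };
  };
}

// Usage: each writer asks for a new changeObject before every write.
const nextChangeObject = makeChangeGenerator();
const sampleChange = nextChangeObject();
```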
6.3.3 Social
Clients: As we said previously, we expect #reads >> #writes. This example is
useful to test the performance of a lot of clients subscribed to multiple subscriptions.
So a reasonable number of clients, considering the environments previously defined,
could be the one shown in table 6.6, where we have #readers >> #writers.
In a social application, for each person there are a lot of users that see the writes.
These users are followers and sometimes also followers of their direct followers.
Table 6.6: Social clients
N° N° Writer N° Reader
1 1 10
2 1 100
3 10 100
Data generation: After an initial data setup (e.g. creation of users), we have
different elements:
- Posts: we should generate fake texts (contents).
- Comments: we should link users to posts. Of course, since there are no checks,
  we can generate random ids (also non-existing ones) for references.
- Likes: we can observe that they are like comments. So, to simplify, we can skip
  them.
We can say that there are more comments than posts; for simplicity we can send a
post every 9 comments. When we generate a post, we have a new post that can have
comments and that users can subscribe to. So we have one subscription to new posts
and multiple subscriptions for comments, one for every post.
Writing: As said previously, a post is created every 9 comments, and this
is done by all writers. We want each post to have two writers that write
comments (if possible). The writers can comment on all posts, also older ones.
Reading: All clients subscribe to posts, so they read all the new posts. A client is
subscribed to half of the posts. Even if new posts are created, the old subscriptions
are not deleted. So the more time passes, the more subscriptions are created.
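The ratio of one post every 9 comments can be sketched as an endless sequence of operations for one writer. The way a comment picks its target post (cycling over the posts created so far) is an assumption for illustration only, and socialOperations is a hypothetical name.

```javascript
// Hypothetical generator: yields one "post" operation followed by
// commentsPerPost "comment" operations, forever.
function* socialOperations(commentsPerPost = 9) {
  let postCount = 0;
  while (true) {
    postCount += 1;
    yield { type: "post", postId: "post" + postCount };
    for (let i = 0; i < commentsPerPost; i++) {
      // Assumption: comments cycle over all existing posts, also older ones.
      const target = 1 + (i % postCount);
      yield { type: "comment", postId: "post" + target };
    }
  }
}

// Usage: take the first 20 operations of the sequence.
const firstOperations = [];
for (const operation of socialOperations()) {
  firstOperations.push(operation);
  if (firstOperations.length === 20) break;
}
```

Since every post opens a new comment subscription and old ones are never removed, the number of subscriptions held by a reader grows with the number of posts generated.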
6.4 Adapters for systems
The general client needs: a login method, a join method to subscribe to a channel and
to a table/document, a write command, and a data callback that is called when there are
new data (of course the new data are passed as an argument).
We have already shown that the systems can provide authentication. But since it is
implemented in different ways, it could add another variable factor that
can change the final results, so we skip it since it is not our main target to test. Now
we quickly analyze how to implement these for all the platforms.
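The four elements of the general client can be illustrated with a purely in-memory adapter. This is a sketch under stated assumptions: the class name, the method names and the shared Map used as a message bus are all hypothetical, and no real system (Couchbase, PouchDB, gun.js or socket.io) is contacted.

```javascript
// Hypothetical in-memory adapter exposing the four elements of the general
// client: login, join, write and a data callback.
class InMemoryAdapter {
  // bus: Map from channel name to the Set of adapters subscribed to it.
  constructor(bus) {
    this.bus = bus;
    this.user = null;
    this.callback = null;
  }

  // Authentication is skipped in the tests, so login only stores the user.
  login(user) {
    this.user = user;
  }

  // Subscribes this client to a channel (or table/document).
  join(channel) {
    if (!this.bus.has(channel)) {
      this.bus.set(channel, new Set());
    }
    this.bus.get(channel).add(this);
  }

  // Registers the callback invoked with every new piece of data.
  onData(callback) {
    this.callback = callback;
  }

  // Asynchronous-style write: delivers the data to every subscriber.
  write(channel, data) {
    const subscribers = this.bus.get(channel) || new Set();
    for (const client of subscribers) {
      if (client.callback) {
        client.callback(channel, data);
      }
    }
    return Promise.resolve();
  }
}
```

A real adapter would keep the same four entry points and replace the bodies with calls to the client library of the system under test.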
CouchBase
- Subscription: we cannot manage channels; we can only choose which document
  we subscribe to. So we receive notifications for every change in the documents
  we are subscribed to. We could use filters [8] as a workaround to this problem,
  but they are not so efficient and they are not so easy to use in a real
  environment (due to the permissions needed to create filters).
- Write command: it is a normal asynchronous call.
- Data callback: it returns new data from the local database.
PouchDB
- Subscription: we cannot manage channels; we can only choose which document
  we subscribe to. So we receive notifications for every change in the documents
  we are subscribed to. We could use filters [63] as a workaround to this problem,
  but they are not so efficient and they are not so easy to use in a real
  environment (due to the permissions needed to create filters).
- Write command: it is a normal asynchronous call.
- Data callback: it returns new data from the local database.
Gun.js
- Subscription: we can subscribe to any level of the JSON structure, so we do
  not have problems (only the messages that we need are delivered).
- Write command: it is a normal asynchronous call.
- Data callback: it returns new data from the local database.
Comparing traditional approach
- Subscription: we can create custom channels and join them using join, so only
  the messages that we need are delivered.
- Write command: it is a normal asynchronous call.
- Data callback: it returns new data from the local database.
Chapter 7
Benchmarks
In this chapter we show the benchmark tests that we have done according to
what we have seen in the previous chapter, for the classes of applications and for the
systems we previously defined.
We analyze the tests done, with some assumptions derived from practical
experiments, in section 7.1. Test executions are organized by technology; for each
technology they are organized by class of application.
Finally we recap the results obtained, showing why in most cases our traditional
comparative approach is better, in section 7.2.
7.1 Tests
Here we show the data from the execution of the tests. We tried, when possible, to
simplify them to avoid running some tests.
Moreover, even if we measure different data (such as request throughput or variance
of latency), as explained in the previous chapter, we show only the final throughput
(in tables) and the average latency in seconds (in charts), since the other data would
make the comparison confusing. Of course the tool developed generates the other data,
and the tests can be easily reproduced.
For each system we have executed the test for every class of applications. For each
of them we have a table (with executions) and a chart.
In the tables we report results for different numbers of clients and for different
environments, but in the charts we report only results for different numbers of clients
using the best environment.
For each system we give a short analysis of the results for: scalability, data
volume adaptability (what happens to the entire system if the data stored increase),
latency stability and write performance (what can be deduced from the indirect
measurements we make).
Then we come to some observations for each class of applications. Remember that
the main aspects to test for each of them are: chat (we test subscription delivery
efficiency), collaborative (we test the efficiency of concurrent writes), social (we test
the efficiency of multiple subscriptions; we also recall that the number of subscriptions
increases over time).
The tests were executed for 30 seconds; more seconds were necessary to finish all the
executions. The environment was stable and we did not need to execute the same
test more than once (to take an average value).
For each configuration of the number of clients, for each class of applications and
for each system, we had to find the right value of concurrency. It is the number of
concurrent executions, where an execution is the writing/reading of data according to the
number of writers/readers specified by the model. When an execution is finished another
one is started (to keep the same concurrency value).
So the total number of writes/reads started together is equal to the #writes/reads in
the model multiplied by the concurrency.
Concurrency influences the number of tasks sent to the same writer, but often a
writer (depending on the implementation of the server's client) sends data
sequentially. So to send data concurrently we have to emulate more writers.
This value is found empirically: the value chosen is the first value (starting
from the lowest) that guarantees the maximum throughput in the best environment.
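Both rules can be written down in a few lines. The sketch below assumes that the candidate concurrency values have already been measured and are passed in as a list; the function names and the shape of the measurement objects are hypothetical.

```javascript
// Total number of writes (or reads) started together:
// the number in the model multiplied by the concurrency.
function startedTogether(countInModel, concurrency) {
  return countInModel * concurrency;
}

// Hypothetical helper: picks the first concurrency value, starting from the
// lowest, that reaches the maximum measured throughput.
// measurements: array of { concurrency, throughput } taken in the best environment.
function chooseConcurrency(measurements) {
  const sorted = [...measurements].sort((a, b) => a.concurrency - b.concurrency);
  const maxThroughput = Math.max(...sorted.map((m) => m.throughput));
  return sorted.find((m) => m.throughput === maxThroughput).concurrency;
}
```

For example, a model with 1 writer and 10 readers run with a concurrency of 200 starts 200 writes and 2000 reads together.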
We run the tests as a black box: we do not know the details of the systems.
So, for example, if writers are slow we do not know if the problem is the database
itself or the realtime delivery level. But if we observe that concurrent writes take the
same time as non-concurrent writes, we can expect that the bottleneck is the realtime
delivery level.
Some tests were not executed, since they could not be set up in a reasonable time
(> 60 seconds) or the server died.
7.1.1 Couchbase
General properties
Scalability: Couchbase strongly depends on the CPU, but it has low scalability.
We can observe from the tests that if we increase the number of CPUs the performance
increases for small volumes, but does not increase significantly for a big dataset. The
CPU is saturated immediately, while the RAM is kept free; we have seen this also
by monitoring the machine during the tests.
Data volume adaptability: Couchbase depends on the total volume of data.
As data gradually increase, the latency increases; in fact (during pre-tests, in the
development phase) we needed to clear the database to have a reasonable speed
(something not needed with other systems). We observe this for all classes of applications
and for all executions.
Latency stability: The couchbase latency is stable during the entire process for all
classes of applications and for all executions. Of course there are some small errors
in the measurements that are due to different aspects, such as the time to send all
the initial data.
Write performance: If we have a big amount of writes (not necessarily concurrent)
they are very slow, increasing the latency. We can easily observe this in the
collaborative benchmark when we have 100 writers.
At the same time, it seems that concurrent writes are not so critical (they do not
influence the result much compared to a normal write). In fact we are also able
to use a big concurrency factor.
Classes of applications
We observe the throughput obtained for the different configurations and
environments. For each class of application we have a separate table.
At the same time we observe the latency during the entire process for the best
environment for each configuration, shown with different colors (the legend shows the
pair #writers-#readers). It is shown as a chart: on the abscissa we have the relative
time (in seconds) into the process, whereas on the ordinate we have the latency
value (in seconds). For each class of application we have a separate chart.
Chat: The throughput is shown in table 7.1, whereas the latency is shown in figure
7.1. Latency is very high when we have more writes (configuration 1 for high
concurrency and case 3), but a big amount of readers is well managed.
Collaborative: The throughput is shown in table 7.2, whereas the latency is shown
in figure 7.2. Since we have more total writes the latency is higher in the first
configuration, but it seems that concurrent writes are well managed if they are few. If
we have a lot of concurrent writes, as in the third configuration, we have very bad
performance.
Social: The throughput is shown in table 7.3, whereas the latency is shown in figure
7.3. When we have more readers, and so a lot of subscriptions, things go bad.
Table 7.1: Chat (couchbase) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 10 200 2 2 310
2 1 10 200 2 3 322
3 1 10 200 4 2 311
4 1 10 200 4 3 327
5 1 100 40 2 2 440
6 1 100 40 2 3 615
7 1 100 40 4 2 443
8 1 100 40 4 3 734
9 10 100 15 2 2 39
10 10 100 15 2 3 47
11 10 100 15 4 2 40
12 10 100 15 4 3 49
Figure 7.1: Chat (couchbase) benchmarks
Table 7.2: Collaborative (couchbase) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 1 350 2 2 94
2 1 1 350 2 3 129
3 1 1 350 4 2 98
4 1 1 350 4 3 137
5 10 10 15 2 2 80
6 10 10 15 2 3 87
7 10 10 15 4 2 81
8 10 10 15 4 3 89
9 100 100 1 2 2 1
10 100 100 1 2 3 1
11 100 100 1 4 2 1
12 100 100 1 4 3 1
Figure 7.2: Collaborative (couchbase) benchmarks
Table 7.3: Social (couchbase) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 10 250 2 2 620
2 1 10 250 2 3 622
3 1 10 250 4 2 620
4 1 10 250 4 3 623
5 1 100 80 2 2 401
6 1 100 80 2 3 405
7 1 100 80 4 2 402
8 1 100 80 4 3 406
9 10 100 20 2 2 49
10 10 100 20 2 3 50
11 10 100 20 4 2 50
12 10 100 20 4 3 51
Figure 7.3: Social (couchbase) benchmarks
7.1.2 Pouchdb
General properties
Scalability: Pouchdb depends on CPU and RAM. In fact, all tests show that if we
increase both we obtain a significant increase in performance. But at the same time,
for a big data volume we do not have an increase in performance.
Data volume adaptability: It seems that pouchdb does not depend on the volume
of data; of course more specific tests should be done.
Latency stability: The latency is not stable during the entire process. We can
easily observe this in the first executions for each class of application; in fact, in that
case we have a high level of concurrency and a high throughput.
Write performance: Writes are very slow, increasing the latency. It also seems
that concurrent writes are a problem (we had to skip some tests).
Classes of applications
We observe the throughput obtained for the different configurations and
environments. For each class of application we have a separate table.
At the same time we observe the latency during the entire process for the best
environment for each configuration, shown with different colors (the legend shows the
pair #writers-#readers). It is shown as a chart: on the abscissa we have the relative
time (in seconds) into the process, whereas on the ordinate we have the latency
value (in seconds). For each class of application we have a separate chart.
Chat: The throughput is shown in table 7.4, whereas the latency is shown in figure
7.4. Latency is high when we have more writers, but a big amount of readers is well
managed (latency is not influenced).
Collaborative: The throughput is shown in table 7.5, whereas the latency is shown
in figure 7.5. Concurrent writes are not well managed: the latency increases and the
throughput decreases.
Social: The throughput is shown in table 7.6, whereas the latency is shown in figure
7.6. When we have more readers, and so a lot of subscriptions, things go bad.
Table 7.4: Chat (pouchdb) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 10 100 2 2 745
2 1 10 100 2 3 830
3 1 10 100 4 2 820
4 1 10 100 4 3 1030
5 1 100 1 2 2 70
6 1 100 1 2 3 70
7 1 100 1 4 2 88
8 1 100 1 4 3 100
9 10 100 1 2 2 15
10 10 100 1 2 3 20
11 10 100 1 4 2 16
12 10 100 1 4 3 25
Figure 7.4: Chat (pouchdb) benchmarks
Table 7.5: Collaborative (pouchdb) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 1 100 2 2 390
2 1 1 100 2 3 390
3 1 1 100 4 2 420
4 1 1 100 4 3 420
5 10 10 10 2 2 63
6 10 10 10 2 3 70
7 10 10 10 4 2 65
8 10 10 10 4 3 74
9 100 100 1 2 2 NONE
10 100 100 1 2 3 NONE
11 100 100 1 4 2 NONE
12 100 100 1 4 3 NONE
Figure 7.5: Collaborative (pouchdb) benchmarks
Table 7.6: Social (pouchdb) benchmarks
N° N° Writer N° Reader Concurrency RAM [GB] N° CPU Throughput [req/s]
1 1 10 100 2 2 650
2 1 10 100 2 3 664
3 1 10 100 4 2 670
4 1 10 100 4 3 725
5 1 100 1 2 2 4
6 1 100 1 2 3 4
7 1 100 1 4 2 4
8 1 100 1 4 3 5
9 10 100 1 2 2 3
10 10 100 1 2 3 3
11 10 100 1 4 2 3
12 10 100 1 4 3 3
Figure 7.6: So ial (pou hdb) ben hmarks
7.1.3 Gun.js
Recall that we use only 1 CPU for the Gun.js server.
General properties
Scalability Gun.js strongly depends on the RAM. It often dies by running out of memory (only after some seconds of testing, so tests lasting a few seconds do not crash it), but at the same time we have seen that it saturates the only core used. The core therefore becomes the bottleneck and it is not possible to scale it. More RAM does not increase performance, but it lets us execute some tests that would crash with less RAM.
Data volume adaptability Gun.js seems to depend strongly on the volume of data. In all tests, and particularly in the social one (where the data grow over time), the latency increases as the data increase.
Latency stability The latency is stable during the entire run; it is affected by the data volume (but in a regular way), as said previously. No test shows significant oscillations.
Write performances Writes are very slow and increase the latency. It also seems that concurrent writes are a problem (we had to skip some tests).
Classes of applications
We report the throughput obtained for the different configurations and environments. Each class of application has its own table.
We also report the latency over the entire run for the best environment of each configuration, drawn in different colors (the legend shows the pair #writers-#readers). In each chart the abscissa is the relative time into the run (in seconds) and the ordinate is the latency (in seconds). Each class of application has its own chart.
Chat The throughput is shown in table 7.7 and the latency in figure 7.7. A large number of readers is handled well (latency is not affected).
Collaborative The throughput is shown in table 7.8 and the latency in figure 7.8. Concurrent writes are not handled well: latency increases and throughput decreases.
Social The throughput is shown in table 7.9 and the latency in figure 7.9. With more readers, and therefore many subscriptions, performance does not degrade much, so multiple subscriptions are handled well (not perfectly, but better than in the other systems).
Table 7.7: Chat (gun.js) benchmarks
N°  Writers  Readers  Concurrency  RAM [GB]  Throughput [req/s]
1   1        10       100          2         270
2   1        10       100          4         277
3   1        100      90           2         NONE
4   1        100      90           4         190
5   10       100      1            2         NONE
6   10       100      1            4         NONE
Figure 7.7: Chat (gun.js) benchmarks
Table 7.8: Collaborative (gun.js) benchmarks
N°  Writers  Readers  Concurrency  RAM [GB]  Throughput [req/s]
1   1        1        100          2         165
2   1        1        100          4         168
3   10       10       10           2         20
4   10       10       10           4         20
5   100      100      1            2         NONE
6   100      100      1            4         NONE
Figure 7.8: Collaborative (gun.js) benchmarks
Table 7.9: Social (gun.js) benchmarks
N°  Writers  Readers  Concurrency  RAM [GB]  Throughput [req/s]
1   1        10       160          2         330
2   1        10       160          4         333
3   1        100      15           2         NONE
4   1        100      15           4         114
5   10       100      1            2         NONE
6   10       100      1            4         NONE
Figure 7.9: Social (gun.js) benchmarks
7.1.4 Traditional
Recall that we use only 1 CPU for the socket.io server. Moreover, after some tests we have seen that PostgreSQL is not the bottleneck, so we can avoid testing different environments for it.
General properties
Scalability It does not depend on the RAM. Analyzing the machine, we have seen that it saturates the only core used, so the core becomes the bottleneck and it is not possible to scale it. There is no difference between the RAM configurations because the limit imposed by the CPU is reached first, so RAM issues should become visible only with very little RAM.
Data volume adaptability It does not seem to depend on the volume of data; of course, more specific tests should be done.
Latency stability The latency is not very stable during the run: for all classes of applications and for all executions we see large oscillations. At the same time, the concurrency level is also high, so many tasks are in execution and it is statistically more likely that some of them collide (they try to write at the same time, so they are queued).
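One way to turn the qualitative notion of "stable" latency into a number is the coefficient of variation (standard deviation divided by mean) of the latency samples. The following is a minimal sketch under that assumption; the metric and the sample values are illustrative and were not part of the tests above.

```javascript
// Coefficient of variation of a list of latencies (population standard deviation / mean).
// A value near 0 means a stable latency; larger values mean stronger oscillations.
function coefficientOfVariation(latencies) {
  const n = latencies.length;
  const mean = latencies.reduce((sum, x) => sum + x, 0) / n;
  const variance = latencies.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return Math.sqrt(variance) / mean;
}

// Illustrative values in seconds, not measured data.
const steady = coefficientOfVariation([1, 1, 1, 1]);
const oscillating = coefficientOfVariation([0.5, 1.5, 0.5, 1.5]);
```

Both illustrative series have the same mean latency of 1 second, yet the second one oscillates and therefore gets a higher coefficient, which is the property the paragraph above describes.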
Write performances Writes are very quick. Concurrent writes also do not seem to be a problem: the performance remains the same in all tests.
Classes of applications
We report the throughput obtained for the different configurations and environments. Each class of application has its own table.
We also report the latency over the entire run for the best environment of each configuration, drawn in different colors (the legend shows the pair #writers-#readers). In each chart the abscissa is the relative time into the run (in seconds) and the ordinate is the latency (in seconds). Each class of application has its own chart.
Chat The throughput is shown in table 7.10 and the latency in figure 7.10. A large number of readers is handled well (latency is not affected), and the latency is not affected by the number of writers either.
Collaborative The throughput is shown in table 7.11 and the latency in figure 7.11. Concurrent writes seem to be handled reasonably well: the increase in latency is small (compared with the other systems) and so is the decrease in throughput.
Social The throughput is shown in table 7.12 and the latency in figure 7.12. With more readers, and therefore many subscriptions, things go well, so multiple subscriptions are handled well.
Table 7.10: Chat (traditional) benchmarks
N°  Writers  Readers  Concurrency  socket.io RAM [GB]  Throughput [req/s]
1   1        10       150          2                   425
2   1        10       150          4                   409
3   1        100      15           2                   2345
4   1        100      15           4                   1982
5   10       100      3            2                   630
6   10       100      3            4                   635
Figure 7.10: Chat (traditional) benchmarks
Table 7.11: Collaborative (traditional) benchmarks
N°  Writers  Readers  Concurrency  socket.io RAM [GB]  Throughput [req/s]
1   1        1        70           2                   62
2   1        1        70           4                   70
3   10       10       15           2                   50
4   10       10       15           4                   57
5   100      100      2            2                   35
6   100      100      2            4                   34
Figure 7.11: Collaborative (traditional) benchmarks
Table 7.12: Social (traditional) benchmarks
N°  Writers  Readers  Concurrency  socket.io RAM [GB]  Throughput [req/s]
1   1        10       150          2                   1908
2   1        10       150          4                   1828
3   1        100      20           2                   6314
4   1        100      20           4                   5919
5   10       100      3            2                   670
6   10       100      3            4                   642
Figure 7.12: Social (traditional) benchmarks
7.2 Analysis of the Results and Lessons Learned
We can observe that there is no real advantage in using CS-NoSQL. The traditional solution proposed is often better. It has:
• The highest throughput: except for the first two configurations of the collaborative tests.
• Low latency: except for the collaborative tests.
• Efficient subscription delivery: latency does not increase when the number of subscriptions increases, as shown in the chat test.
• Efficient multiple subscription delivery: even if the number of subscriptions increases over time during the test, the latency remains the same, as shown in the social test.
• Better performance when there are many concurrent writes: as shown in the collaborative tests.
• No dependence on data volume.
Of course, there are some points where it is not as good as CS-NoSQL:
• Latency stability.
• Concurrent writes for few data: in the first configurations of the collaborative tests, latency and throughput are not so good. But if we increase the data we obtain better results than the other systems.
• Scalability. As we have seen, it depends on the processor. But as we have seen in 4.2, we can use a load balancer. Moreover, since we used standard technologies, we can use standard tools [53] to balance the load between local cores.
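As an illustration of the last point, PM2 [53] can start one process per core through an ecosystem file. The following is a minimal sketch: the file name `ecosystem.config.js`, the application name and the entry script `server.js` are hypothetical, and this configuration was not part of the tests above (which used a single core).

```javascript
// ecosystem.config.js (hypothetical file, application name and entry script)
const pm2Config = {
  apps: [
    {
      name: 'socketio-server',
      script: 'server.js',
      instances: 'max',     // one process per available core
      exec_mode: 'cluster', // PM2 cluster mode: the processes share the listening port
    },
  ],
};

// PM2 loads the file as a CommonJS module.
if (typeof module !== 'undefined') module.exports = pm2Config;
```

Such a file is typically started with `pm2 start ecosystem.config.js`. Note that running socket.io in several processes generally also requires sticky sessions and a shared adapter such as socket.io-redis [85], so this sketch covers only the process management side.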
No CS-NoSQL offers as many advantages as the traditional solution proposed.
The only advantage of CS-NoSQL, apart from the easy configuration (since they are a full stack environment), seems to be the performance when there are few concurrent writes, that is when #reads ≃ #writes. This is the main expectation that we had when analyzing the classes of applications in 5.
So we showed that, except for the collaborative tests, we obtain better results with a traditional approach. Moreover, the traditional approach proposed seems more stable. Of course NoSQL databases are designed to partition themselves better, but this is useless if the basic performance is not good.
Chapter 8
Conclusions
By analyzing performance tests of the classes of applications typical of CS-NoSQL, we obtained important results: CS-NoSQL are not as efficient as we expected, given that they are dedicated systems based on NoSQL databases, which are known to be very efficient in many situations [98]. Often the traditional approach we proposed for comparison is more efficient, in some cases even 10x faster. So, with the current technologies, we would suggest a custom solution based on traditional technology, even for realtime applications.
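The order of magnitude of this gap can be checked directly on the benchmark tables. The following minimal sketch copies the best throughput of each system for the configuration with 1 writer and 100 readers from tables 7.4, 7.7 and 7.10 (chat) and 7.6, 7.9 and 7.12 (social); the object layout and the function name `speedup` are illustrative. The concurrency levels differ between systems, so the ratios are indicative only.

```javascript
// Best throughput [req/s] for 1 writer and 100 readers, copied from the tables of chapter 7.
const best = {
  chat: { pouchdb: 100, gunjs: 190, traditional: 2345 },
  social: { pouchdb: 5, gunjs: 114, traditional: 6314 },
};

// Ratio between the traditional solution and another system for one class of application.
function speedup(test, system) {
  return best[test].traditional / best[test][system];
}

const chatVsGun = speedup('chat', 'gunjs');        // about 12x
const chatVsPouch = speedup('chat', 'pouchdb');    // about 23x
const socialVsGun = speedup('social', 'gunjs');    // about 55x
```

For these configurations the ratio is above 10 in every case, which is consistent with the claim above.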
Moreover, the results of our benchmarks highlighted that CS-NoSQL are neither very stable over time nor very scalable. They are often poorly documented: sometimes we had problems finding theoretical information (such as information about the CAP theorem) and using some less common features. This is because these systems are not widely used.
They do not have a standard (there is no literature about them), which means that there are many problems if we want to use them commercially. There are no standard ways of testing, additional development skills are required and the support for these systems could be dropped at any time. So, if we consider these elements (which are a cost) and the unimpressive performance, it seems that this kind of system is only a higher cost for companies; of course, more dedicated studies should be done to reach more definitive results.
In fact, proprietary cloud-hosted solutions could be very good, also for the cost model, especially if the alternative solutions are also based on the cloud. As we said, cloud solutions have many advantages, such as an easy configuration, a full stack environment and zero code required. But since they are not open and are not based on standard technologies (as other cloud solutions are), if their support is dropped the entire application is lost. Of course, a study of this topic should be done, but it is not easy since, as said previously, we can manage only few elements.
Future work includes testing consistency (in the way explained in appendix B) and partitioning (NoSQL databases should be very efficient with partitioning).
Another important direction for future work is designing finer-grained tests that look inside the infrastructure of CS-NoSQL, so that we can measure the time of every single action: the time required to update the local database, to deliver realtime notifications, to process a write, to publish new data and so on.
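A minimal sketch of how such per-action timings could be collected in Node.js follows. The helper `createPhaseTimer` and the phase name are hypothetical; a real instrumentation would have to be placed inside the CS-NoSQL code paths, which is exactly the difficult part.

```javascript
// Accumulates elapsed time per named phase, e.g. 'process write' or 'deliver notification'.
function createPhaseTimer() {
  const totals = {};
  return {
    // Starts timing a phase and returns a function that stops it.
    start(phase) {
      const begin = process.hrtime.bigint(); // monotonic clock, in nanoseconds
      return function stop() {
        const elapsedMs = Number(process.hrtime.bigint() - begin) / 1e6;
        const entry = totals[phase] || (totals[phase] = { count: 0, totalMs: 0 });
        entry.count += 1;
        entry.totalMs += elapsedMs;
        return elapsedMs;
      };
    },
    // Returns the accumulated { count, totalMs } per phase.
    report() {
      return totals;
    },
  };
}

const timer = createPhaseTimer();
const stopWrite = timer.start('process write');
JSON.stringify({ simulated: 'work' }); // stands in for the action being measured
stopWrite();
```

Returning a `stop` function instead of wrapping a callback lets the same helper time both synchronous code and event-driven code, where the end of an action is observed in a different callback from its start.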
But since NoSQL has been an important topic in recent years, since the target applications of these systems have been growing in recent years and since big companies (like Google) are investing in it, we expect an improvement in the future.
Bibliography
[1] Adobe DPS case study. https://www.pubnub.com/customers/adobe/.
[2] AWS Lambda. https://aws.amazon.com/lambda/.
[3] AWS Lambda pricing. https://aws.amazon.com/lambda/pricing/.
[4] AWS S3. https://aws.amazon.com/s3/.
[5] CornerJob case study. https://www.pubnub.com/customers/pubnub-the-perfect-chat-solution-for-cornerjob/.
[6] Couchbase CAP theorem. http://developer.couchbase.com/documentation/server/current/concepts/data-management.html.
[7] Couchbase cluster. http://developer.couchbase.com/documentation/server/current/clustersetup/manage-cluster-intro.html.
[8] Couchbase filters. https://developer.couchbase.com/documentation/mobile/1.4/guides/sync-gateway/server-integration/index.html?language=ios.
[9] Couchbase full text search. http://blog.couchbase.com/2016/february/couchbase-4.5-developer-preview-couchbase-fts.
[10] Couchbase MapReduce. http://developer.couchbase.com/documentation/server/current/architecture/incremental-map-reduce-views.html.
[11] Couchbase N1QL. http://www.couchbase.com/n1ql.
[12] CouchBase replication. http://docs.couchbase.com/admin/admin/Tasks/tasks-manage-replication.html.
[13] Couchbase users. http://developer.couchbase.com/documentation/mobile/current/develop/guides/sync-gateway/authorizing-users/index.html.
[14] Couchbase websocket. https://github.com/couchbase/sync_gateway/wiki/WebSocket-Based-Changes-Feed.
[15] CouchDB changes notifications. http://guide.couchdb.org/draft/notifications.html.
[16] CouchDB Clustering (partitioning). http://guide.couchdb.org/draft/clustering.html.
[17] CouchDB eventual consistency. http://docs.couchdb.org/en/2.0.0/intro/consistency.html.
[18] CouchDB MapReduce. https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views.
[19] CouchDB replication. http://docs.couchdb.org/en/2.0.0/replication/.
[20] CouchDB RESTful. http://docs.couchdb.org/en/2.0.0/api/.
[21] CouchDB Scaling introduction. http://guide.couchdb.org/draft/scaling.html.
[22] CouchDB security. http://guide.couchdb.org/draft/security.html.
[23] CouchDB views. http://guide.couchdb.org/draft/views.html.
[24] Database triangle. http://blog.scottlogic.com/dgorst/assets/mongodb-vs-couchdb/nosql-triangle.png.
[25] Firebase pricing. https://firebase.google.com/pricing/.
[26] Firechat. https://github.com/firebase/firechat.
[27] Firechat data structure. https://github.com/firebase/firechat/blob/master/rules.json.
[28] Firepad. https://github.com/firebase/firepad.
[29] Friendly Pix. https://github.com/firebase/friendlypix.
[30] Friendly Pix data structure. https://github.com/firebase/friendlypix/blob/master/web/database-rules.json.
[31] Google apps for work. https://gsuite.google.com/index.html.
[32] Google apps for work pricing. https://gsuite.google.com/pricing.html.
[33] Google docs. https://docs.google.com/document.
[34] Gun.js CAP theorem. https://github.com/amark/gun/wiki/CAP-Theorem.
[35] Gun.js data format. https://github.com/amark/gun/wiki/GUN%E2%80%99s-Data-Format-%28JSON%29.
[36] Gun.js distributed structure. https://github.com/amark/gun/wiki/Getting-Started-%28v0.3.x%29#distributed.
[37] Gun.js graphs. https://github.com/amark/gun/wiki/Graphs.
[38] Gun.js realtime. https://github.com/amark/gun/wiki/Getting-Started-%28v0.3.x%29#real-time-sync.
[39] Gun.js security. https://github.com/amark/gun/wiki/Security%2C-Authentication%2C-Authorization.
[40] Gun.js storage. https://github.com/amark/gun/wiki/Using-Amazon-S3-for-Storage.
[41] Heroku pricing. https://www.heroku.com/pricing.
[42] HTTP persistent connections. https://www.safaribooksonline.com/library/view/http-the-definitive/1565925092/ch04s05.html.
[43] HTTP2 Browser support. http://caniuse.com/#feat=http2.
[44] HTTP2.0 Multiplex. https://assets.wp.nginx.com/wp-content/uploads/2015/10/HTTP2.png.
[45] Jepsen postgreSQL explanation. https://aphyr.com/posts/282-jepsen-postgres.
[46] Jepsen postgreSQL test tool. https://github.com/jepsen-io/jepsen/tree/master/postgres-rds.
[47] Jepsen tool. https://github.com/jepsen-io/jepsen.
[48] Logitech Harmony Ultimate Home case study. https://www.pubnub.com/customers/logitech/.
[49] Master-Slave replication. https://en.wikipedia.org/wiki/Master/slave_(technology).
[50] Mongo DB. https://www.mongodb.com/.
[51] Multi Master replication. https://en.wikipedia.org/wiki/Multi-master_replication.
[52] Optimistic UI. http://info.meteor.com/blog/optimistic-ui-with-meteor-latency-compensation.
[53] PM2. https://github.com/Unitech/pm2.
[54] PostgreSQL Functions. https://www.postgresql.org/docs/9.5/static/sql-createfunction.html.
[55] PostgreSQL JSON. https://www.postgresql.org/docs/9.5/static/datatype-json.html.
[56] PostgreSQL Listen. https://www.postgresql.org/docs/9.5/static/sql-listen.html.
[57] PostgreSQL Notify. https://www.postgresql.org/docs/9.5/static/sql-notify.html.
[58] PostgreSQL Partitioning. https://www.postgresql.org/docs/9.5/static/ddl-partitioning.html.
[59] PostgreSQL Procedural Languages. https://www.postgresql.org/docs/9.5/static/xplang.html.
[60] PostgreSQL Replication. https://www.postgresql.org/docs/9.5/static/different-replication-solutions.html.
[61] PostgreSQL Triggers. https://www.postgresql.org/docs/9.5/static/sql-createtrigger.html.
[62] Postgres's publish-subscribe features made better with JSON. https://blog.andyet.com/2015/04/06/postgres-pubsub-with-json/.
[63] PouchDB filter. https://pouchdb.com/2015/04/05/filtered-replication.html.
[64] PouchDB live replication. https://pouchdb.com/guides/replication.html#live\discretionary-replication.
[65] PouchDB local database. https://pouchdb.com/guides/databases.html.
[66] Promise pattern. https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise.
[67] PubNub chat. https://www.pubnub.com/solutions/chat/.
[68] PubNub home automation. https://www.pubnub.com/solutions/home-automation-and-machine-signaling/.
[69] PubNub live blogging. https://www.pubnub.com/solutions/live-blogging/.
[70] Redis cache. http://redis.io/topics/lru-cache.
[71] Redis data types. http://redis.io/topics/data-types.
[72] Redis distributed lock. http://redis.io/topics/distlock.
[73] Redis partitioning. http://redis.io/topics/partitioning.
[74] Redis persistence. https://redis.io/topics/persistence#aof-advantages.
[75] Redis Publish/Subscribe. http://redis.io/topics/pubsub.
[76] Redis queue. http://redis.io/commands/rpoplpush#pattern-reliable-queue.
[77] Redis replication. http://redis.io/topics/replication.
[78] Salesforce.com pricing. http://www.salesforce.com/eu/platform/pricing/.
[79] Socket.io. http://socket.io/.
[80] socket.io channels. http://socket.io/docs/rooms-and-namespaces/.
[81] socket.io emit event. https://socket.io/docs/server-api/#namespace-emit-eventname-args.
[82] socket.io-emitter. https://github.com/socketio/socket.io-emitter.
[83] socket.io P2P. https://github.com/socketio/socket.io-p2p.
[84] socket.io protocol. https://github.com/socketio/socket.io-protocol.
[85] socket.io redis. https://github.com/socketio/socket.io-redis.
[86] socket.io users. https://www.npmjs.com/package/socket.io.users.
[87] Websocket. http://www.ibm.com/developerworks/library/wa-reverseajax2/fig01.gif.
[88] Websocket servers comparison. https://medium.com/denizozger/finding-the-right-node-js-websocket-implementation-b63bfca0539#.isyo3prtn.
[89] YCSB. https://github.com/brianfrankcooper/YCSB.
[90] Daniel Abadi. Consistency tradeoffs in modern distributed database system design: Cap is only part of the story. Computer, 45(2):37-42, 2012. http://dx.doi.org/10.1109/MC.2012.33.
[91] Guruduth Banavar, Tushar Chandra, Bodhi Mukherjee, Jay Nagarajarao, Robert E Strom, and Daniel C Sturman. An efficient multicast protocol for content-based publish-subscribe systems. In Distributed Computing Systems, 1999. Proceedings. 19th IEEE International Conference on, pages 262-272. IEEE, 1999. http://dx.doi.org/10.1109/ICDCS.1999.776528.
[92] Mike Belshe, Martin Thomson, and Roberto Peon. Hypertext Transfer Protocol Version 2 (HTTP/2). RFC Editor, 2015. http://dx.doi.org/10.17487/RFC7540.
[93] Tim Berners-Lee, Roy Fielding, and Henrik Frystyk. Hypertext Transfer Protocol -- HTTP/1.0. RFC Editor, 1996. http://dx.doi.org/10.17487/RFC1945.
[94] Ken Birman and Thomas Joseph. Exploiting virtual synchrony in distributed systems. SOSP '87 Proceedings of the eleventh ACM Symposium on Operating systems principles, pages 123-138, 1987. http://dx.doi.org/10.1145/37499.37515.
[95] William J Bolosky, John R Douceur, David Ely, and Marvin Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. 28(1):34-43, 2000. http://dx.doi.org/10.1145/345063.339345.
[96] Eric Brewer. Cap twelve years later: How the "rules" have changed. Computer, 45(2):23-29, 2012. http://dx.doi.org/10.1109/MC.2012.37.
[97] Brad Cain, Abbie Barbir, Raj Nair, and Oliver Spatscheck. Known content network (cn) request-routing mechanisms. RFC Editor, 2003. http://dx.doi.org/10.17487/RFC3568.
[98] Rick Cattell. Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39:12-27, 2010. http://dx.doi.org/10.1145/1978915.1978919.
[99] Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks and Applications, 19(2):171-209, 2014. http://dx.doi.org/10.1007/s11036-013-0489-0.
[100] Shaiful Alam Chowdhury, Varun Sapra, and Abram Hindle. Client-Side Energy Efficiency of HTTP/2 for Web and Mobile App Developers. Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, 5, 2016. http://dx.doi.org/10.1109/SANER.2016.77.
[101] Flaviu Cristian. Synchronous and asynchronous. Communications of the ACM, 39(4):88-97, 1996. http://dx.doi.org/10.1145/227210.227231.
[102] Douglas Crockford. The application/json Media Type for JavaScript Object Notation (JSON). RFC Editor, 2006. http://dx.doi.org/10.17487/RFC4627.
[103] Michael Cusumano. Cloud computing and SaaS as new computing platforms. Communications of the ACM, 53(4):27-29, 2010. http://dx.doi.org/10.1145/1721654.1721667.
[104] Frank Dabek, Nickolai Zeldovich, Frans Kaashoek, David Mazières, and Robert Morris. Event-driven programming for robust software. In Proceedings of the 10th workshop on ACM SIGOPS European workshop, pages 186-189. ACM, 2002. http://dx.doi.org/10.1145/1133373.1133410.
[105] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008. http://dx.doi.org/10.1145/1327452.1327492.
[106] Ian Fette and Alexey Melnikov. The WebSocket Protocol. RFC Editor, 2011. http://dx.doi.org/10.17487/RFC6455.
[107] Armando Fox and Eric A Brewer. Harvest, yield, and scalable tolerant systems. Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on, pages 174-178, 1999. http://dx.doi.org/10.1109/HOTOS.1999.798396.
[108] John Gantz and David Reinsel. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze the future, 2007(2012):1-16, 2012.
[109] Jesse James Garrett et al. Ajax: A new approach to web applications. 2005.
[110] Charles David Graziano. A performance analysis of xen and kvm hypervisors for hosting the xen worlds project. 2011.
[111] Katarina Grolinger, Wilson A Higashino, Abhinav Tiwari, and Miriam AM Capretz. Data management in cloud environments: Nosql and newsql data stores. Journal of Cloud Computing: Advances, Systems and Applications, 2(1):22, 2013. http://dx.doi.org/10.1186/2192-113X-2-22.
[112] Carl A Gutwin, Michael Lippold, and TC Graham. Real-time groupware in the browser: testing the performance of web-based networking. In Proceedings of the ACM 2011 conference on Computer supported cooperative work, pages 167-176. ACM, 2011. http://dx.doi.org/10.1145/1958824.1958850.
[113] Jing Han, E Haihong, Guan Le, and Jian Du. Survey on NoSQL database. Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on, pages 363-366, 2011. http://dx.doi.org/10.1109/ICPCA.2011.6106531.
[114] Robin Hecht and Stefan Jablonski. Nosql evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on, pages 336-341. IEEE, 2011. http://dx.doi.org/10.1109/CSC.2011.6138544.
[115] Markus Hofmann and Leland R Beaumont. Content networking: architecture, protocols, and practice. Elsevier, 2005.
[116] Michael Jones, John Bradley, and Nat Sakimura. Json web token (jwt). RFC Editor, 2015. http://dx.doi.org/10.17487/RFC7519.
[117] George Lawton. Developing software online with platform-as-a-service technology. Computer, 41(6):13-15, 2008. http://dx.doi.org/10.1109/MC.2008.185.
[118] Paul J Leach, Michael Mealling, and Rich Salz. A universally unique identifier (uuid) urn namespace. 2005. http://dx.doi.org/10.17487/rfc4122.
[119] Georg JP Link, Dominik Siemon, Gert-Jan de Vreede, and Susanne Robra-Bissantz. Evaluating anchored discussion to foster creativity in online collaboration. In CYTED-RITOS International Workshop on Groupware, pages 28-44. Springer, 2015. http://dx.doi.org/10.1007/978-3-319-22747-4_3.
[120] Salvatore Loreto, P Saint-Andre, S Salsano, and G Wilkins. Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP. RFC Editor, 2011. http://dx.doi.org/10.17487/RFC6202.
[121] James Martin and Savant Institute. Managing the data-base environment. Prentice-Hall Englewood Cliffs (NJ), 1983.
[122] Maged Michael, Jose E Moreira, Doron Shiloach, and Robert W Wisniewski. Scale-up x scale-out: A case study using nutch/lucene. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pages 1-8. IEEE, 2007. http://dx.doi.org/10.1109/IPDPS.2007.370631.
[123] Neil Middleton, Richard Schneeman, et al. Heroku: Up and Running. O'Reilly Media, Inc., 2013.
[124] Jeffrey C Mogul, Jim Gettys, Henrik Frystyk, Tim Berners-Lee, and Roy T Fielding. Hypertext Transfer Protocol -- HTTP/1.1. RFC Editor, 1997. http://dx.doi.org/10.17487/RFC2068.
[125] Gavin Mulligan, Denis Gračanin, et al. A comparison of SOAP and REST implementations of a service based interaction independence middleware framework. Proceedings of the 2009 Winter Simulation Conference (WSC), pages 1423-1432, 2009. http://dx.doi.org/10.1109/WSC.2009.5429290.
[126] Nikos Ntarmos, Ioannis Patlakas, and Peter Triantafillou. Rank join queries in nosql databases. Proceedings of the VLDB Endowment, 7(7):493-504, 2014. http://dx.doi.org/10.14778/2732286.2732287.
[127] Nurzhan Nurseitov, Michael Paulson, Randall Reynolds, and Clemente Izurieta. Comparison of JSON and XML Data Interchange Formats: A Case Study. Scenario, pages 157-162, 2012. https://pdfs.semanticscholar.org/8432/1e662b24363e032d680901627aa1bfd6088f.pdf.
[128] Victoria Pimentel and Bradford G Nickerson. Communicating and Displaying Real-Time Data with WebSocket. IEEE Internet Computing, 16:45-53, 2012. http://dx.doi.org/10.1109/MIC.2012.64.
[129] Paul Prescod. Roots of the REST/SOAP Debate. In Extreme Markup Languages®. Citeseer, 2002.
[130] Ansar Rafique. Evaluating nosql technologies for historical financial data, 2013.
[131] Rüdiger Schollmeier. A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications. Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 101-102, 2001. http://dx.doi.org/10.1109/P2P.2001.990434.
[132] Rami Sellami, Sami Bhiri, and Bruno Defude. ODBAPI: a unified REST API for relational and NoSQL data stores. 2014 IEEE International Congress on Big Data, pages 653-660, 2014. http://dx.doi.org/10.1109/BigData.Congress.2014.98.
[133] Marc Shapiro. Optimistic replication and resolution. In Encyclopedia of Database Systems, pages 1995-1995. Springer, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_258.
[134] Sang Shin. Introduction to json (javascript object notation). Presentation www.javapassion.com, 2010.
[135] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1-10, 2010. http://dx.doi.org/10.1109/MSST.2010.5496972.
[136] Feng Tian, Berthold Reinwald, Hamid Pirahesh, Tobias Mayr, and Jussi Myllymaki. Implementing a scalable XML publish/subscribe system using relational database systems. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 479-490, 2004. http://dx.doi.org/10.1145/1007568.1007623.
[137] Stefan Tilkov and Steve Vinoski. Node.js: Using JavaScript to build high-performance network programs. IEEE Internet Computing, 14(6):80, 2010. http://dx.doi.org/10.1109/MIC.2010.145.
[138] Devesh Tiwari and Yan Solihin. Architectural characterization and similarity analysis of sunspider and Google's V8 Javascript benchmarks. Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on, pages 221-232, 2012. http://dx.doi.org/10.1109/ISPASS.2012.6189228.
[139] Can Türker and Michael Gertz. Semantic integrity support in sql: 1999 and commercial (object-) relational database management systems. The VLDB Journal, The International Journal on Very Large Data Bases, 10(4):241-269, 2001. http://dx.doi.org/10.1007/s007780100050.
[140] Matteo Varvello, Kyle Schomp, David Naylor, Jeremy Blackburn, Alessandro Finamore, and Konstantina Papagiannaki. Is the Web HTTP/2 Yet? Lecture Notes in Computer Science, 9631:218-232, 2016. http://dx.doi.org/10.1007/978-3-319-30505-9_17.
[141] Werner Vogels. Eventually consistent. Communications of the ACM, 52(1):40-44, 2009. http://dx.doi.org/10.1145/1435417.1435432.
[142] Erik Wilde. Putting things to REST. School of Information, 2007.
Appendix A
Snippets
A.1 PostgreSQL realtime retrieve trigger
We call a command to send updated data (notify changes) to the publish/subscribe system, as described in 4.2.1. To do that we create a PostgreSQL function [54], shown in listing A.1, that calls our external program; then we create a PostgreSQL after trigger [61] that calls this function (so we receive insert/delete/update events), as shown in listing A.2.
This is standard SQL with triggers, although some PostgreSQL-specific functions were used; of course the languages change between database systems and needs, but the idea remains valid.
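CREATE OR REPLACE FUNCTION notify(text, text, text, text) RETURNS text AS '
#!/bin/bash
# Run the external notifier in the background so that the trigger does not block.
# The arguments are quoted because the JSON rows may contain spaces.
node notify "$1" "$2" "$3" "$4" </dev/null >/dev/null 2>&1 &
' LANGUAGE plsh;
Listing A.1: Notify function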
CREATE OR REPLACE FUNCTION notify_trigger_proc() RETURNS trigger AS $notify_trigger_proc$
DECLARE
    old_v TEXT;
    new_v TEXT;
    row_id TEXT;
BEGIN
    -- OLD is not available on INSERT and NEW is not available on DELETE.
    IF TG_OP = 'INSERT' THEN
        old_v := '';
    ELSE
        old_v := (SELECT ('[' || row_to_json(OLD) || ']')::json ->> 0);
    END IF;
    IF TG_OP = 'DELETE' THEN
        new_v := '';
        row_id := OLD.id;
    ELSE
        new_v := (SELECT ('[' || row_to_json(NEW) || ']')::json ->> 0);
        row_id := NEW.id;
    END IF;
    PERFORM notify(row_id, TG_OP, old_v, new_v);
    -- The return value of an AFTER row trigger is ignored.
    RETURN NULL;
END;
$notify_trigger_proc$ LANGUAGE plpgsql;

CREATE TRIGGER notify_trigger
AFTER INSERT OR UPDATE OR DELETE ON DATA_TABLE
FOR EACH ROW
EXECUTE PROCEDURE notify_trigger_proc();
Listing A.2: Notify trigger
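On the receiving side, the external program started by listing A.1 gets four positional arguments: the row identifier, the operation, and the old and new rows as JSON text. The following is a minimal sketch of how such a program could decode them. The function `parseNotifyArgs` and the event shape are hypothetical and are not the program used in this work; forwarding the event to the publish/subscribe system is left out.

```javascript
// Hypothetical decoding step of the notifier: turns the four arguments into one event object.
function parseNotifyArgs(argv) {
  const [id, operation, oldJson, newJson] = argv;
  return {
    id,
    operation, // value of TG_OP: 'INSERT', 'UPDATE' or 'DELETE'
    // The trigger passes an empty string when the old or the new row does not exist.
    oldRow: oldJson ? JSON.parse(oldJson) : null,
    newRow: newJson ? JSON.parse(newJson) : null,
  };
}

// In the real script the arguments would come from process.argv.slice(2).
const event = parseNotifyArgs(['42', 'INSERT', '', '{"id":42,"text":"hello"}']);
```

Keeping the decoding in a pure function makes it possible to test it without a database or a shell.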
A.2 Socket.io custom logic
socket.on('chat message', function (msg) {
    // Broadcast the message reversed, as an example of custom server-side logic.
    io.emit('chat message', msg.split('').reverse().join(''));
});
Listing A.3: Socket.io customization
Appendix B
Jepsen
We executed the Jepsen framework to test a PostgreSQL server deployed in a virtual machine with 4 GB of RAM and 1 CPU (on the same physical machine as the other tests, of course).
The test used was the standard Jepsen PostgreSQL test [45, 46]; it emulates simultaneous bank transfers and raw reads of accounts. It counts a failure when a transfer cannot be executed because it would leave a negative amount (i.e. other transfers happened in the meantime).
The results obtained are shown as charts in figures B.1, B.2 and B.3. The charts also show the read failures, even though there are none; transfer info is the time required to check the amount before doing the transfer.
Figure B.1: Jepsen latency raw
We can observe that the tests perform many transfers and reads for some seconds. The charts report the measures for every operation (or some of them) for every second of the test (or chunks of seconds). The measures are: the time (latency) needed to process the operation, shown in B.1; the time (latency) needed to process the operation by quantile, shown in B.2; and the throughput of every operation, shown in B.3.
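The conditions exercised by the bank test can be illustrated with a minimal single-process sketch. The real Jepsen test is written in Clojure and runs concurrent clients against the database; the functions and values below are illustrative assumptions that only mirror the two rules described above: a transfer that would leave a negative amount is counted as a failure, and a read always sees the same total.

```javascript
// Moves `amount` between two accounts; refuses transfers that would leave a negative balance.
function transfer(accounts, from, to, amount) {
  if (accounts[from] - amount < 0) return false; // counted as a failed transfer
  accounts[from] -= amount;
  accounts[to] += amount;
  return true;
}

// A read sums all balances; the total must not change whatever transfers were executed.
function total(accounts) {
  return accounts.reduce((sum, balance) => sum + balance, 0);
}

// Illustrative balances, not the values used by Jepsen.
const accounts = [10, 10];
const accepted = transfer(accounts, 0, 1, 7); // leaves [3, 17]
const rejected = transfer(accounts, 0, 1, 5); // refused: account 0 would drop below zero
```

After both attempts the total is still 20, which is the invariant a consistent database has to preserve under concurrent transfers.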
Figure B.2: Jepsen latency quantiles
Figure B.3: Jepsen rate