Riad MOKADEM, June 22nd, 2006
Algebraic Signatures For Scalable Distributed Data Structures
Thesis presentation
CERIA Laboratory
[email protected]
http://ceria.dauphine.fr/riadmokadem/riad.html


PLAN

1. Introduction
2. SDDS-2005
3. Algebraic Signatures (Backup Scheme, Concurrency Update)
4. Algebraic Cumulative Signatures
5. String Matching in SDDS-2005
6. Performance Measurements
7. Conclusion & Future Work

Introduction: Facts

• New architectures call for new data structures and file systems.
• Data in distributed RAM.
• Scalability.
• Parallel queries…
• An SDDS is a new class of data structures, specific to multicomputers, P2P, grids…
• For any application needing scalability and fast response time.

SDDSs Family

• SDDS-2005 is based on the RP* SDDS principles.

SDDS RP* Scheme

• Files are range partitioning (RP*) based.
• Records are kept in distributed RAM. Record = (key, non-key field).
• Buckets split using the median key, as in a B-tree.
• Clients are not synchronously informed about splits, so a client may send a query to an incorrect server.
• Servers forward incorrectly addressed queries and send back Image Adjustment Messages to adjust the client image.
• Supported queries: key search queries and range queries.
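The addressing scheme above can be sketched as follows. This is a minimal single-process sketch, not the SDDS-2005 implementation: the class names `Servers` and `Client`, the bucket capacity, and the shape of the Image Adjustment Message (a pair of lower bound and bucket number) are all assumptions made for illustration.

```python
import bisect

INF = float("inf")


class Servers:
    """All server buckets of one RP* file; bucket b holds keys in (low, high]."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # hypothetical bucket capacity, in records
        self.buckets = [{"low": -INF, "high": INF, "records": {}}]

    def owner(self, key):
        return next(b for b, bk in enumerate(self.buckets)
                    if bk["low"] < key <= bk["high"])

    def split(self, b):
        # Split at the median key: the old bucket keeps the lower half.
        old = self.buckets[b]
        keys = sorted(old["records"])
        median = keys[(len(keys) - 1) // 2]
        upper = {k: v for k, v in old["records"].items() if k > median}
        old["records"] = {k: v for k, v in old["records"].items() if k <= median}
        self.buckets.append({"low": median, "high": old["high"], "records": upper})
        old["high"] = median

    def iam(self, addressed, b):
        # Image Adjustment Message, sent only when the query was misaddressed.
        return None if addressed == b else (self.buckets[b]["low"], b)

    def insert(self, addressed, key, value):
        b = self.owner(key)  # forwarding happens here when addressed != b
        self.buckets[b]["records"][key] = value
        if len(self.buckets[b]["records"]) > self.capacity:
            self.split(b)  # clients are not told about the split
        return self.iam(addressed, b)

    def search(self, addressed, key):
        b = self.owner(key)
        return self.buckets[b]["records"].get(key), self.iam(addressed, b)


class Client:
    """Keeps an image of the file: sorted lower bounds and their bucket numbers."""

    def __init__(self, servers):
        self.servers = servers
        self.lows, self.ids = [-INF], [0]

    def address(self, key):
        return self.ids[bisect.bisect_left(self.lows, key) - 1]

    def adjust(self, iam):
        if iam is not None and iam[1] not in self.ids:
            pos = bisect.bisect_left(self.lows, iam[0])
            self.lows.insert(pos, iam[0])
            self.ids.insert(pos, iam[1])

    def insert(self, key, value):
        self.adjust(self.servers.insert(self.address(key), key, value))

    def search(self, key):
        value, iam = self.servers.search(self.address(key), key)
        self.adjust(iam)
        return value
```

A client with an outdated image still gets a correct answer, because the servers forward the query; its image only converges as Image Adjustment Messages arrive.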

Objective: New Capabilities of SDDS-2005

Using Algebraic Signatures:
• Parallel store and restore of a file to/from disk storage: the SDDS Backup Scheme.
• Concurrent access and useless update detection: the Record Update Scheme.

Using Cumulative Algebraic Signatures:
• Protection against incidental viewing of data on the servers: encoded data in the bucket.
• Scans (non-key parallel search) with various string matches: prefix, string, longest common…


SDDS-2005 Architecture

[Figure: client nodes running applications, a Name node, and server nodes connected through the network; each server holds a RAM bucket and runs a listen thread and work threads.]

• Multithread architecture.

Internal Organization Of Bucket SDDS-2005

[Figure: bucket layout with a header, an index (SDDS B+-tree with root and leaf headers) and data pages holding the records.]

• Index: a few KBytes up to MBytes.
• Data file: dozens of MBytes up to GBytes.
• File-mapped structures.

Communication Server - Client

[Figure: SDDS-2005 clients exchange client requests and client responses over the network with the threads of the SDDS-2005 server.]

Demo Client Interface

[Screenshot: choice of « Search command »: search by content or search by key.]


Algebraic Signatures? [ICDE’04]

• Galois Field $GF(2^f)$, $f \gg 1$.
• Each symbol has a size of f bits.
• f = 8 or f = 16 in SDDS-2005: $GF(2^8)$ for ASCII strings, $GF(2^{16})$ for Unicode strings.
• XOR is used for the + and - operations.
• Antilog and log tables are used for * and /, using a primitive element $\alpha$.
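The table-based arithmetic above can be illustrated with a short sketch for $GF(2^8)$. The generator polynomial `0x11D` and the choice $\alpha = 2$ are assumptions (a common pairing for this field); the slides do not say which polynomial SDDS-2005 uses, and the function names are hypothetical.

```python
PRIMITIVE_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1, with alpha = 2


def build_tables(poly=PRIMITIVE_POLY):
    """Return (log, antilog) tables for GF(2^8) built from powers of alpha."""
    log = [0] * 256
    antilog = [0] * 510  # doubled, so log[a] + log[b] never needs a modulo
    x = 1
    for i in range(255):
        antilog[i] = x
        antilog[i + 255] = x
        log[x] = i
        x <<= 1          # multiply by alpha
        if x & 0x100:    # reduce modulo the generator polynomial
            x ^= poly
    return log, antilog


LOG, ANTILOG = build_tables()


def gf_add(a, b):
    """Addition and subtraction are both XOR."""
    return a ^ b


def gf_mul(a, b):
    """a * b = antilog(log a + log b), with 0 handled separately."""
    if a == 0 or b == 0:
        return 0
    return ANTILOG[LOG[a] + LOG[b]]


def gf_div(a, b):
    """a / b = antilog(log a - log b); division by zero is undefined."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return ANTILOG[LOG[a] - LOG[b] + 255]
```

Doubling the antilog table trades 255 extra entries for the removal of a modulo on every multiplication, which is one of several possible table layouts.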

Calculus of Algebraic Signatures

• 1-symbol signature: $\mathrm{Sign}_{\alpha}(P) = \sum_{i=1}^{n} p_i \alpha^{i}$, with $P = (p_1, p_2, \dots, p_n)$.
• N-symbol signature: $\mathrm{Sign}_{\alpha,N}(P) = (\mathrm{Sign}_{\alpha}(P), \mathrm{Sign}_{\alpha^{2}}(P), \dots, \mathrm{Sign}_{\alpha^{N}}(P))$, built on the base $(\alpha, \alpha^{2}, \alpha^{3}, \dots)$.
• Typical collision probability: $2^{-Nf}$.
• In SDDS-2005: N = 1 or N = 2.
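A minimal sketch of the 1-symbol and N-symbol signatures in $GF(2^8)$ follows. The polynomial `0x11D`, $\alpha = 2$ and all function names are assumptions for illustration; only the formulas come from the slide.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def times_alpha_pow(p, e):
    """p * alpha^e in GF(2^8), computed as antilog(log p + e)."""
    return 0 if p == 0 else ANTILOG[(LOG[p] + e) % 255]


def signature(symbols, k=1):
    """1-symbol signature on the base alpha^k: XOR of p_i * (alpha^k)^i, i = 1..n."""
    sig = 0
    for i, p in enumerate(symbols, start=1):
        sig ^= times_alpha_pow(p, k * i)
    return sig


def signature_n(symbols, n=2):
    """N-symbol signature: (Sign_alpha, Sign_alpha^2, ..., Sign_alpha^N)."""
    return tuple(signature(symbols, k) for k in range(1, n + 1))
```

Changing a single symbol at position i shifts the 1-symbol signature by a non-zero multiple of $\alpha^{i}$, so such a change is always detected; collisions need changes in at least two symbols.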

Backup Scheme [WDAS’03]

Need for RAM file backup on disk:
• Backup only: protection against RAM failure; the file remains in RAM.
• Eviction: RAM sharing among different SDDS files.
• Restore: the SDDS file is loaded from disk to RAM.

Backup Scheme [WDAS’03]

• 2 mapped files: Data_file, Index_file.
• Bucket paging: signed data pages of 64 KB, signed index pages of 256 B.
• List of page signatures = bucket map, also backed up on the disk.
• Page signature = 2-symbol algebraic signature in $GF(2^{16})$ = 4 B long, much shorter than SHA-1.

Parallel Backup on Disk Storage

[Figure: the client multicasts the Store command to the servers; after an update or insertion of a record R, each server writes its RAM bucket to its disk storage; Restore loads the file back.]

• Write to the disk only the parts (pages) changed since the last backup.
• Restore: restore the file.
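The signature-driven backup can be sketched as below. To stay short, the sketch signs 64-byte pages with a 2-symbol signature in $GF(2^8)$, whereas the slides describe 64 KB data pages signed in $GF(2^{16})$; the polynomial, the page size, the dictionary standing in for the disk, and all names are assumptions.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()
PAGE_SIZE = 64  # hypothetical page size for the sketch


def page_signature(page):
    """2-symbol algebraic signature (Sign_alpha, Sign_alpha^2) of one page."""
    sig = [0, 0]
    for i, p in enumerate(page, start=1):
        if p:
            sig[0] ^= ANTILOG[(LOG[p] + i) % 255]
            sig[1] ^= ANTILOG[(LOG[p] + 2 * i) % 255]
    return tuple(sig)


def backup(bucket, disk, bucket_map):
    """Write to `disk` only the pages whose signature differs from `bucket_map`.

    Returns the new bucket map and the list of page numbers written.
    """
    new_map, written = [], []
    for start in range(0, len(bucket), PAGE_SIZE):
        number = start // PAGE_SIZE
        page = bytes(bucket[start:start + PAGE_SIZE])
        sig = page_signature(page)
        if number >= len(bucket_map) or bucket_map[number] != sig:
            disk[number] = page
            written.append(number)
        new_map.append(sig)
    return new_map, written


def restore(disk):
    """Reload the bucket from its pages on disk."""
    return bytearray(b"".join(disk[n] for n in sorted(disk)))
```

A first call with an empty bucket map writes every page; later calls recompute the signatures and write only the pages whose signature changed.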

Update Scheme [PDMST DEXA06]

Normal update:
• Compare Signature_before and Signature_after of each record.
• Send the update only if these signatures differ.
• The client sends only the effectively changed data.

Blind update:
• No prior search of the record.

Concurrency management:
• Record signatures are used as timestamps.
• Clients read every record without any wait.
• The client sends back Signature_before for comparison with the one stored.
• There is a conflict if these signatures differ.
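The normal update and the conflict test can be sketched as follows. This is an illustrative single-process sketch: the `Server` class, the return strings, and the use of a 1-symbol signature in $GF(2^8)$ with polynomial `0x11D` are assumptions, not the SDDS-2005 protocol messages.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def signature(symbols):
    """1-symbol algebraic signature: XOR of p_i * alpha^i, i = 1..n."""
    sig = 0
    for i, p in enumerate(symbols, start=1):
        if p:
            sig ^= ANTILOG[(LOG[p] + i) % 255]
    return sig


class Server:
    def __init__(self):
        self.records = {}

    def read(self, key):
        return self.records[key]  # no lock: clients never wait

    def update(self, key, new_value, signature_before):
        # The record signature plays the role of a timestamp.
        if signature(self.records[key]) != signature_before:
            return "conflict"  # another client changed the record meanwhile
        self.records[key] = new_value
        return "updated"


def client_update(server, key, before, after):
    """Send the update only if the before and after signatures differ."""
    signature_before, signature_after = signature(before), signature(after)
    if signature_before == signature_after:
        return "skipped"  # useless update, nothing is sent
    return server.update(key, after, signature_before)
```

With a 1-byte signature two different values can collide, in which case a real change would be skipped; this is why the collision probability $2^{-Nf}$ matters when choosing N and f.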

Concurrent Blind Updates

[Figure: a client and servers s1, s2, …, sk holding buckets of data in RAM.]

• The client sends the blind updates: update R1 with v1 (Signature_after Sgn'1) and update R2 with v2 (Signature_after Sgn'2).
• Search of R1 and R2, and calculus of Signature_before: Sgn1 for R1, Sgn2 for R2.
• Comparison for R1: Sgn'1 = Sgn1, no update.
• Comparison for R2: Sgn'2 ≠ Sgn2, update R2? The pair (v2, Sgn2) is exchanged and Sgn''2 is calculated.
• Sgn''2 = Sgn2: update R2 with v2.
• Sgn''2 ≠ Sgn2: concurrent update by another client, no update.


Cumulative Algebraic Signatures? [VLDB-DBISP2P’05]

• Encodes each symbol $p_i$ in the record $P = (p_1, p_2, \dots, p_i, \dots, p_n)$ with the signature of the prefix ending at $p_i$.
• Protects against incidental data viewing on the servers: decoding is necessary.
• Speeds up string matching.

Encoding/ Decoding

• Record structure: key data, non-key data.
• Encoding and decoding concern only the non-key data.
• Encoding and decoding are done on the clients (signatures are calculated on the clients).

Encoding/ Decoding

[Figure: the client encodes P before the insertion of P on servers Serv1, Serv2, Serv3; a search for P is a signature match on the servers, and the client decodes the result.]

• Encoding $P(p_1, p_2, \dots, p_n) \to P(p''_1, p''_2, \dots, p''_n)$:
  $p'_i = p_i \alpha^{i} = \mathrm{antilog}(\log p_i + i)$
  $p''_i = p''_{i-1} + p'_i = p''_{i-1} \oplus p'_i$
• Decoding $P(p''_1, p''_2, \dots, p''_n) \to P(p_1, p_2, \dots, p_n)$:
  $p'_i = p''_i - p''_{i-1} = p''_i \oplus p''_{i-1}$
  $p_i = p'_i / \alpha^{i} = \mathrm{antilog}(\log p'_i - i)$
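The encoding and decoding formulas can be sketched directly in $GF(2^8)$. The polynomial `0x11D`, $\alpha = 2$ and the function names are assumptions; the zero symbol is handled separately because it has no logarithm.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def times_alpha_pow(p, e):
    """p * alpha^e, i.e. antilog(log p + e); a negative e divides."""
    return 0 if p == 0 else ANTILOG[(LOG[p] + e) % 255]


def encode(record):
    """p''_i = p''_(i-1) XOR p_i * alpha^i: each symbol becomes a prefix signature."""
    encoded, cumulative = [], 0
    for i, p in enumerate(record, start=1):
        cumulative ^= times_alpha_pow(p, i)
        encoded.append(cumulative)
    return bytes(encoded)


def decode(encoded):
    """p'_i = p''_i XOR p''_(i-1), then p_i = p'_i / alpha^i."""
    record, previous = [], 0
    for i, c in enumerate(encoded, start=1):
        record.append(times_alpha_pow(c ^ previous, -i))
        previous = c
    return bytes(record)
```

The last encoded symbol is the signature of the whole record, which is what the string-matching operations later exploit.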

Protection against incidental viewing of Data

Example: P = « SOUTENANCE_RIAD_MOKADEM »

Encoding:
$P' = (p'_1, p'_2, \dots, p'_{23}) = (\alpha S,\; p'_1 + \alpha^{2} O,\; \dots,\; p'_{22} + \alpha^{23} M)$

$\mathrm{Sign}(P) = \mathrm{Sign}(M) = \alpha S + \alpha^{2} O + \dots + \alpha^{23} M$

• In SDDS-2005: 1 symbol (1 B) per signature in $GF(2^8)$.


String Search [LNCS’05]

• Cumulative signatures allow searching in non-key data.
• Various string matches: prefix, string, longest common prefix, longest common string.
• For prefix and string search, the data to search for is not sent; only its signature is sent. This gives better confidentiality and faster messaging.
• Previous works (string matching): Boyer-Moore, Karp-Rabin, Knuth-Morris-Pratt, Quick Search…

Our approach:
• Prefix search, complete search
• Sequential search, n-gram search
• Longest common prefix
• Longest common string

Prefix Search

Example: search for the prefix S = ‘PARIS DAUPHINE’.
• The client calculates Sc = Sign(S) = Sign(‘E’).
• It sends only Sc and Size = 15 to the servers.

On the server, for each encoded record:
• P1 = PARIS DAUPHINE UNIVERSITY: Sign(p15) = Sc, prefix found in P1.
• P2 = DAUPHINE PARIS UNIVERSITY: Sign(p15) ≠ Sc.
• Complexity: O(1).
• Collision resolution on the client.
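A sketch of the prefix test on an encoded record follows. Because every stored symbol is already the signature of the prefix ending there, the server compares a single symbol. The polynomial `0x11D`, $\alpha = 2$ and the function names are assumptions, and the size sent is simply the length of the prefix.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def encode(record):
    """Cumulative encoding: symbol i becomes the signature of the prefix p_1..p_i."""
    encoded, cumulative = [], 0
    for i, p in enumerate(record, start=1):
        if p:
            cumulative ^= ANTILOG[(LOG[p] + i) % 255]
        encoded.append(cumulative)
    return bytes(encoded)


def prefix_query(prefix):
    """Client side: only the signature Sc and the size are sent (prefix non-empty)."""
    return encode(prefix)[-1], len(prefix)


def has_prefix(encoded_record, sc, size):
    """Server side: one comparison per record, hence O(1)."""
    return size <= len(encoded_record) and encoded_record[size - 1] == sc


sc, size = prefix_query(b"PARIS DAUPHINE")
p1 = encode(b"PARIS DAUPHINE UNIVERSITY")
p2 = encode(b"DAUPHINE PARIS UNIVERSITY")
print(has_prefix(p1, sc, size), has_prefix(p2, sc, size))
```

A `True` answer can be a collision (about 1 chance in 256 with 1-byte signatures), so the client still decodes the returned record and checks the prefix itself.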

Complete Search (full match)

Record structure: Lr | Es | K | Lc | Sg | Data
• Lr: length of the record.
• Es: pointer to the next record.
• K: key of the record.
• Lc: version.
• Sg: signature of the record.

• The client sends the signature S to search for.
• On the server: sequential traversal of the records, comparing S with the algebraic signature Sg stored in the header of each record (test the 1st symbol, then the 2nd if they are equal).
• Complexity: O(1).

Sequential Search

Example: search for the string S = ‘PARIS’ in the record P = UNIVERSITE PARIS DAUPHINE.
• The client sends Sc = Sign(‘PARIS’) = Sign(‘S’) and the size l = 5 to the servers.

On the server:
• Sequential comparison of signatures (l = 5 symbols): Sign(‘E’) ≠ Sc, …
• $(\mathrm{Sign}(\text{‘S’}) \oplus \mathrm{Sign}(\text{‘P’})) / \alpha^{i} = S_c$: S found.
• Complexity: O(n - l), with n the size of record P.
• Collision resolution on the client.
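The sliding comparison can be sketched as follows. The signature of a window is obtained by XOR-ing the cumulative signature at its last symbol with the one stored just before its first symbol, then dividing by $\alpha^{j}$, where j is the window offset. The polynomial `0x11D`, $\alpha = 2$ and the function names are assumptions.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def encode(record):
    """Cumulative encoding: symbol i becomes the signature of the prefix p_1..p_i."""
    encoded, cumulative = [], 0
    for i, p in enumerate(record, start=1):
        if p:
            cumulative ^= ANTILOG[(LOG[p] + i) % 255]
        encoded.append(cumulative)
    return bytes(encoded)


def candidate_offsets(encoded_record, sc, size):
    """Server side: offsets j (0-based) whose window signature equals Sc."""
    offsets = []
    for j in range(len(encoded_record) - size + 1):
        before = encoded_record[j - 1] if j else 0
        window = encoded_record[j + size - 1] ^ before  # sum of p_i * alpha^i over the window
        if window:
            window = ANTILOG[(LOG[window] - j) % 255]   # divide by alpha^j
        if window == sc:
            offsets.append(j)
    return offsets


def resolve_collisions(record, string, offsets):
    """Client side: keep only the offsets where the decoded record really matches."""
    return [j for j in offsets if record[j:j + len(string)] == string]


record, string = b"UNIVERSITE PARIS DAUPHINE", b"PARIS"
sc = encode(string)[-1]
offsets = candidate_offsets(encode(record), sc, len(string))
print(resolve_collisions(record, string, offsets))
```

The server never sees the searched string, only Sc and the size; false candidates produced by 1-byte collisions are filtered out on the client.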

String search by n-Gram

Example: search by digram (n = 2).
• The client sends S = ‘Dauphine’ and the size l to the server.
• Calculus of the n-gram table T = [s'1 … s'l-n], with s'i = Sign(si … si+n).

T: meta table of n-grams
Sign  | j | d
S(da) | 1 | 0
S(au) | 2 | 0
S(up) | 3 | 0
S(ph) | 4 | 0
S(hi) | 5 | 0
S(in) | 6 | 0
S(ne) |   |

On the server:
• si ≠ ne and si not in T: jump = l-1 = 7 positions.
• …
• up ≠ ne and up in T: jump = j+1 = 4 positions.
• …
• S found after 6 comparisons and 5 shifts.
• Complexity: O((m-l)/(l-n+1)), with m the size of the record.

Longest Common Prefix: Example

• The client sends the string S = UNIVERSITE PARIS DAUPHINE to the servers S1 and S2.

On the servers:
• Record P in S1 (UNIVERSITE DAUPHINE): equality in p1, p2, p4, p8; inequality in p16, p12; equality in p10, p11; L = 11.
• Record P' in S2 (UNIVERSITE PARIS9 DAUPHINE): equality in p1, p2, p4, p8, p16; inequality in p25, p21, p19, p17; L = 16.
• S1 and S2 send L to the client; the client selects L = 16.
• Complexity: best case O(1), worst case O(log2(L - L')), where L and L' are the sizes of successive longest prefixes.
• Collision resolution on the servers.
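The probing pattern of the example (p1, p2, p4, p8, p16, then halving) can be sketched as a doubling search followed by a binary search on the cumulative signatures. This is a sketch under assumptions: the polynomial `0x11D`, $\alpha = 2$, the function names, and the choice to compare the encoded query with the encoded record symbol by symbol at the probed positions. It ignores collisions, which the slides resolve on the servers.

```python
def build_tables(poly=0x11D):  # assumption: generator polynomial, alpha = 2
    log, antilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        antilog[i], log[x] = x, i
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    return log, antilog


LOG, ANTILOG = build_tables()


def encode(record):
    """Cumulative encoding: symbol i becomes the signature of the prefix p_1..p_i."""
    encoded, cumulative = [], 0
    for i, p in enumerate(record, start=1):
        if p:
            cumulative ^= ANTILOG[(LOG[p] + i) % 255]
        encoded.append(cumulative)
    return bytes(encoded)


def longest_common_prefix(encoded_record, encoded_query):
    """Length L of the longest prefix whose cumulative signatures agree."""
    limit = min(len(encoded_record), len(encoded_query))

    def same(length):  # prefixes of this length have equal signatures
        return encoded_record[length - 1] == encoded_query[length - 1]

    low, high, probe = 0, limit + 1, 1  # `low` matches, `high` does not
    while probe <= limit:               # doubling phase: 1, 2, 4, 8, ...
        if same(probe):
            low, probe = probe, probe * 2
        else:
            high = probe
            break
    while high - low > 1:               # binary search between low and high
        middle = (low + high) // 2
        if same(middle):
            low = middle
        else:
            high = middle
    return low
```

When the first probe already fails the answer comes after one comparison, which corresponds to the best case O(1) quoted on the slide.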

Longest Common String: Example

• The client sends the string S = BIENVENUES LABORATOIRE CERIA DAUPHINE to the server S1 (P and P' are in S1).

On the server:
• Record P = LABORATOIRE RECHECHE INFORMATIQUE: L = 12.
• Record P' = LABORATOIRE CERIA DAUPHINE UNIVERSITY FRANCE: L = 27.
• S1 sends L = 27 to the client.
• Complexity per record: best case O(1) + 1/N, worst case O(n*l), with N the size of the bucket, n the size of the record and l the size of the string.
• Collision resolution on the server.


Hardware Configuration

• 1.8 GHz P4 servers
• 800 MHz P3 client
• 500 MHz P3 name server
• 1 Gb/s Ethernet
• Windows 2K Server OS

File Storage Performance Analysis

Bucket size (MB) | Number of records | Signature calculus (ms) | Signature calculus per MB (ms) | Total store time (ms) | Store time for 0 % change (ms) | Gain (%) | Store time for 5 % change (ms) | Gain (%)
1.88 | 100 | 46 | 24.46 | 562 | 50 | 91.1 | 65 | 88.43
2.7 | 150 | 78 | 28.8 | 781 | 82 | 89.51 | 95 | 87.83
17.6 | 1000 | 438 | 24.88 | 5078 | 438 | 91.38 | 453 | 91.07
158 | 10000 | 4068 | 25.74 | 46406 | 4071 | 91.23 | 4085 | 91.19
393 | 25000 | 11003 | 27.9 | 117859 | 11013 | 91.33 | 11018 | 90.65

[Figure: scalability, storage time (ms) versus number of records.]
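The slide does not define the Gain columns. Assuming the gain is the store time saved relative to the total store time, the first row of the table is reproduced as follows (the symbols $T_{\text{total}}$ and $T_{\text{partial}}$ are introduced here for illustration only):

```latex
\text{Gain} = \frac{T_{\text{total}} - T_{\text{partial}}}{T_{\text{total}}},
\qquad
\frac{562 - 50}{562} \approx 91.1\,\%,
\qquad
\frac{562 - 65}{562} \approx 88.43\,\%
```

Under this reading, the store time for an unchanged bucket is essentially the signature calculus time, since no page is written.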

SHA-1/ Algebraic Signatures

Bucket size (MB) | Number of records | Algebraic signature calculus (ms) | SHA-1 calculus (ms) | Initial store time with SHA-1 (ms) | Initial store time with alg. sign. (ms) | SHA-1 store time for 5 % change (ms) | Alg. sign. store time for 5 % change (ms) | Gain (%)
1.88 | 100 | 46 | 70 | 602 | 562 | 85 | 65 | 30
2.7 | 150 | 78 | 103 | 799 | 781 | 119 | 95 | 25
17.6 | 1000 | 438 | 680 | 5278 | 5078 | 697 | 453 | 53
158 | 10000 | 4068 | 6088 | 47906 | 46406 | 6102 | 4085 | 49
393 | 25000 | 11003 | 15403 | 119342 | 117859 | 15418 | 11018 | 40

[Figure: storage time (ms) versus bucket size (MB), algebraic signature versus cryptographic signature.]

• SHA-1 signatures: 20 bytes.
• Algebraic signatures: 4 bytes.

Performance of Backup command

• 1st request: signature calculus (375 ms), storage of all pages (4922 ms).
• 2nd request: no bucket change (375 ms).
• 3rd request: 1 page changed (375 + 16 ms).

Update Measurements

Update | Update time, with change (ms) | Update time, no change (ms)
Normal update | 0.92 | 0.28
Blind update | 0.74 | 0.20

• Avoids lost updates.

Cost of Encoding/ Decoding Data

• Encoding: 0.045 ms/KB. Insertion: 0.25 ms/KB.
• Decoding: 0.042 ms/KB. Search: 0.28 ms/KB.
• Ratios shown for decoding/search and encoding/insertion: 16 % and 14 %.
• In return: protection against incidental viewing of data on the servers, and string matching possibilities.

Response Time For String Matching

String Match
Record position | Record size (B) | String size (B) | Offset (B) | Time (ms)
1 | 20 | 5 | 13 | 0.44
1 | 100 | 20 | 70 | 0.68
1 | 100 | 20 | 80 | 0.682
100 | 100 | 20 | 70 | 72.5
100 | 100 | 30 | 70 | 71.7
200 | 100 | 20 | 70 | 165

Prefix Match
Record position | Record size (B) | Prefix size (B) | Time (ms)
1 | 100 | 20 | 0.369
100 | 250 | 20 | 37.8
100 | 250 | 35 | 37.78
200 | 250 | 20 | 71.3
300 | 250 | 20 | 120.53
500 | 250 | 20 | 197.5

• Key search: 0.27 ms/KB.

Response Time For String Matching

Longest Prefix Match
Record position | Size of inserted data (B) | Size of last record (B) | Size of prefix to search (size of prefix found) (B) | Time to search (ms)
1 | 50 | 50 | 25 (20) | 0.372
1/100 | 250 | 50 | 25 (20) | 43
49/100 | 250 | 50 | 25 (20) | 46.2
99/100 | 250 | 50 | 25 (20) | 47

Longest Common String Match
Record position | Size of inserted data (B) | Size of last record (prefix) (B) | Size of string to search (size of string found) (B) | Offset of string in record (B) | Time to search (ms)
1 | 100 | 100 | 22 (20) | 70 | 0.62
100 | 100 | 22 | 10 (5) | 10 | 290
100 | 100 | 45 | 15 (10) | 10 | 470
100 | 120 | 45 | 15 (10) | 10 | 565

n-Gram Search Measurements

[Figure: search time (ms) versus size of the searched string (bytes), digram search versus cumulative search; search in 1 record (300 bytes).]

• Up to (l-1) times faster than the cumulative search algorithm, with l the size of the string to search.

Comparison (String Matching)

Size of case | Size of records | Size of last record | Size of data to search | Offset in last record | Non-encoded alg. sign. search | Karp-Rabin search | Cumulative sign. search
100 | 250 | 25 | 10 | 5 | 205 | 151 | 147
200 | 250 | 25 | 10 | 5 | 368 | 275 | 268
500 | 250 | 25 | 10 | 5 | 1123 | 725 | 702
100 | 250 | 45 | 35 | 5 | 180 | 168 | 126

• Cumulative signatures reduce string matching times.

Comparison (String Matching)

[Figure: search time (ms) versus number of records for XOR + algebraic signatures, Karp-Rabin and cumulative signatures; string search for data < 32 B (left) and data > 32 B (right).]

Gain of cumulative search:
• Versus previous algorithms (Karp-Rabin): saving of 5 % for strings < 32 B, saving of +20 % for strings > 32 B.
• Versus non-encoded data: saving of 30 %.

Example of prefix search

[Screenshot: prefix search operation, with the dialog at the client and the result received from the servers.]


Conclusion

• Algebraic signatures are an efficient basis for new SDDS-2005 capabilities (backup, updates).
• Cumulative algebraic signatures are efficient for incidental view protection and string search.
• Prototype SDDS-2005: up and running, submitted to DBWorld, free download at http://ceria.dauphine.fr

Future Work

• More on n-grams.
• Alternative signature schemes (inverse signatures using the Horner scheme).
• Delta compression using cumulative signatures.
• Protection against silent corruption.
• Alternative GF multiplication methods (Prefetch, Broder, Tables of …).
• Collision resolution on the clients.
• SDDS-2005 as part of a Virtual Repository of eGov documents (EGov Project).

Acknowledgements

Work partly supported by:
• CEE Project eGov
• MS Research
• CEE ICONS Project
• IBM Almaden Res. Cntr
