Riad MOKADEM – June 22, 2006. Algebraic Signatures For Scalable Distributed Data Structures
Riad [email protected]
http://ceria.dauphine.fr/riadmokadem/riad.html
Algebraic Signatures For Scalable
Distributed Data Structures
Thesis presentation
CERIA Laboratory
PLAN
1. Introduction
2. SDDS-2005
3. Algebraic Signatures: Backup Scheme, Concurrent Update
4. Cumulative Algebraic Signatures
5. String Matching in SDDS-2005
6. Performance Measurements
7. Conclusion & Future Work
Facts
New architectures call for new data structures and file systems, with data held in distributed RAM.
Scalability.
Parallel queries…
An SDDS is a new class of data structures specific to multicomputers, P2P systems, grids…, for any application needing scalability and fast response time.
SDDS Family
SDDS-2005 is based on the RP* SDDS principles
SDDS RP* Scheme
• Files use range partitioning (RP*); records are stored in distributed RAM; record = (key, non-key field).
• Buckets split using the median key, like in a B-tree.
• Clients are not synchronously informed about splits, so a client may send a query to an incorrect server. Servers forward incorrectly addressed queries and send back Image Adjustment Messages to adjust the client image.
• Queries: key search queries and range queries.
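A minimal Python sketch of the RP* behaviour described above: median-key splits, a possibly outdated client image, forwarding of misaddressed queries, and Image Adjustment Messages (IAMs). It assumes a hypothetical bucket capacity of 4 records, in-process objects instead of networked servers, and forwarding simplified to a direct lookup of the correct bucket. The class and method names are illustrative, not SDDS-2005 code.

```python
import bisect

CAPACITY = 4  # hypothetical bucket capacity (records) that triggers a split


class Bucket:
    def __init__(self, bid, low, high):
        self.bid, self.low, self.high = bid, low, high  # bucket covers keys in [low, high)
        self.records = {}

    def covers(self, key):
        return self.low <= key < self.high


class RPFile:
    """Server side: the buckets of one RP* file; their ranges partition the key space."""

    def __init__(self):
        self.buckets = [Bucket(0, float("-inf"), float("inf"))]

    def split(self, b):
        keys = sorted(b.records)
        median = keys[len(keys) // 2]
        new = Bucket(len(self.buckets), median, b.high)  # upper half moves to a new bucket
        for k in keys[len(keys) // 2:]:
            new.records[k] = b.records.pop(k)
        b.high = median
        self.buckets.append(new)

    def owner(self, key):
        return next(b for b in self.buckets if b.covers(key))


class Client:
    def __init__(self, rp_file):
        self.file = rp_file
        self.image = [(float("-inf"), 0)]  # sorted (lower bound, bucket id); may be outdated
        self.iams = 0                       # number of IAMs received

    def _route(self, key):
        lows = [low for low, _ in self.image]
        target = self.file.buckets[self.image[bisect.bisect_right(lows, key) - 1][1]]
        if not target.covers(key):
            # Addressing error: the query is forwarded (simplified to a direct lookup)
            # and the correct server sends an IAM that adjusts the client image.
            target = self.file.owner(key)
            bisect.insort(self.image, (target.low, target.bid))
            self.iams += 1
        return target

    def insert(self, key, value):
        target = self._route(key)
        target.records[key] = value
        if len(target.records) > CAPACITY:
            self.file.split(target)  # the client is not informed of the split

    def search(self, key):
        return self._route(key).records.get(key)
```

A fresh client starts with a one-bucket image, so its first query for a key beyond the first split lands on bucket 0, is forwarded, and triggers exactly one IAM.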
Objective: New Capabilities for SDDS-2005
• Parallel store/restore of the file to/from disk storage: the SDDS backup scheme (using algebraic signatures).
• Concurrent access and detection of useless updates: the record update scheme (using algebraic signatures).
• Protection against incidental viewing of data on the servers: encoded data in the buckets (using cumulative algebraic signatures).
• Scans (non-key parallel search) with various string matches: prefix, string, longest common… (using cumulative algebraic signatures).
SDDS-2005 Architecture
[Figure: Multithread architecture. Client machines run applications and communicate over the network with the servers; each server holds a RAM bucket and runs a listen thread and work threads. A name server is also shown.]
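The listen-thread/work-thread organization can be sketched with Python's `threading` and `queue` modules. This is a hypothetical single-server illustration: a list of requests stands in for the network, a dict for the RAM bucket, and `run_server` is not part of SDDS-2005.

```python
import queue
import threading


def run_server(requests, n_workers=3):
    """Hypothetical sketch: one listen thread feeds a work queue drained by work threads."""
    bucket, lock = {}, threading.Lock()  # stand-in for the RAM bucket
    work, responses = queue.Queue(), queue.Queue()

    def listen():
        for req in requests:           # stand-in for receiving client messages
            work.put(req)
        for _ in range(n_workers):     # one stop marker per work thread
            work.put(None)

    def work_thread():
        while (req := work.get()) is not None:
            op, key, value = req
            with lock:                 # serialize access to the shared bucket
                if op == "insert":
                    bucket[key] = value
                    responses.put((key, "ok"))
                else:                  # "search"
                    responses.put((key, bucket.get(key)))

    threads = [threading.Thread(target=listen)]
    threads += [threading.Thread(target=work_thread) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bucket, [responses.get() for _ in range(responses.qsize())]
```

Decoupling reception from processing lets the listen thread keep accepting messages while work threads execute queries.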
Internal Organization of an SDDS-2005 Bucket
[Figure: bucket layout with a header, an index (SDDS B+-tree: root, leaf headers) and data pages holding the records.]
• Index: from a few KB up to MBs; file-mapped structures.
• Data file: from dozens of MB up to GB.
Communication Server - Client
[Figure: several SDDS-2005 clients send requests over the network to an SDDS-2005 server, whose threads process them and return the responses to the clients.]
Demo Client Interface
[Screenshot: choice of « Search command »: search by content or search by key.]
Algebraic Signatures? [ICDE’04]
• Galois field $GF(2^f)$, $f \gg 1$.
• Each symbol has size f bits.
• f = 8 or f = 16 in SDDS-2005: $GF(2^8)$ for ASCII strings, $GF(2^{16})$ for Unicode strings.
• XOR is used for the + and − operations.
• Antilog and log tables are used for * and /.
• Signatures use a primitive element α.
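The log/antilog arithmetic can be sketched as follows for $GF(2^8)$. The primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and the primitive element α = 2 are assumptions made for this sketch; the slides do not name them.

```python
# GF(2^8) arithmetic with log/antilog tables.
# Assumed primitive polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x11D); primitive element alpha = 2.
PRIM_POLY = 0x11D

EXP = [0] * 510  # antilog table, doubled so that LOG[a] + LOG[b] needs no modulo
LOG = [0] * 256  # LOG[0] is unused: 0 has no logarithm
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1               # multiply by alpha = 2
    if x & 0x100:
        x ^= PRIM_POLY    # reduce modulo the primitive polynomial
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_add(a, b):
    """Addition and subtraction are both XOR in GF(2^f)."""
    return a ^ b


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]  # antilog(log a + log b)


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]  # antilog(log a - log b)
```

For example, `gf_mul(0x80, 2)` returns `0x1D`: shifting 0x80 left overflows bit 8 and is reduced by 0x11D. A table lookup replaces a bitwise polynomial multiplication, which is why SDDS-2005 uses log/antilog tables.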
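Computing Algebraic Signatures
• 1-symbol signature: $\mathrm{sign}_\beta(P) = \sum_{i=1}^{n} p_i \beta^i$, with $P = (p_1, p_2, \dots, p_n)$ and $\beta \in \{\alpha, \alpha^2, \alpha^3, \dots\}$
• N-symbol signature: $\mathrm{sign}_{\alpha,N}(P) = (\mathrm{sign}_{\alpha}(P), \mathrm{sign}_{\alpha^2}(P), \dots, \mathrm{sign}_{\alpha^N}(P))$
• Typical collision probability: $2^{-Nf}$.
• In SDDS-2005: N = 1 or N = 2.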
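A direct transcription of these formulas, for $GF(2^8)$ with the same assumed primitive polynomial 0x11D and α = 2; `sign` and `n_sign` are illustrative names. Each term $p_i\alpha^{ki}$ is computed as antilog(log p_i + k·i), with exponents taken modulo 255.

```python
# Algebraic signatures over GF(2^8), assuming primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)  # multiply by alpha, reduce if needed


def mul_alpha_pow(p, e):
    """p * alpha^e computed as antilog(log p + e)."""
    return 0 if p == 0 else EXP[(LOG[p] + e) % 255]


def sign(data, k=1):
    """1-symbol signature sign_beta(P) = sum_{i=1..n} p_i * beta^i with beta = alpha^k."""
    s = 0
    for i, p in enumerate(data, start=1):
        s ^= mul_alpha_pow(p, k * i)  # '+' in GF(2^8) is XOR
    return s


def n_sign(data, n=2):
    """N-symbol signature (sign_alpha, sign_alpha^2, ..., sign_alpha^N)."""
    return tuple(sign(data, k) for k in range(1, n + 1))
```

Changing a single symbol $p_i$ by δ changes $\mathrm{sign}_\alpha(P)$ by $\delta\alpha^i \neq 0$, so any single-symbol change is always detected; the signature is also linear, $\mathrm{sign}(P \oplus Q) = \mathrm{sign}(P) \oplus \mathrm{sign}(Q)$.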
Backup Scheme [WDAS’03]
Need for backing up the RAM file to disk:
• Backup only: protection against RAM failure; the file remains in RAM.
• Eviction: RAM sharing among different SDDS files.
• Restore: loading the SDDS file from disk into RAM.
Backup Scheme [WDAS’03]
• Two mapped files: Data_file and Index_file.
• Bucket paging: signed data pages of 64 KB, signed index pages of 256 B.
• The list of page signatures forms the bucket map, which is also backed up on disk.
• Page signature = 2-symbol algebraic signature in $GF(2^{16})$, 4 B long: much shorter than a SHA-1 digest (20 B).
Parallel Backup on Disk Storage
[Figure: parallel backup. A client multicasts a Store command to the RAM buckets; after updates or insertions of a record R, each bucket writes to its own disk storage.]
Only the pages changed since the last backup are written to disk. The Restore command reloads the file from disk into RAM.
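The page-level backup idea can be sketched as follows. Assumptions: the 2-symbol page signatures are computed in $GF(2^8)$ (0x11D, α = 2) rather than $GF(2^{16})$ as on the slide, to keep the tables small; a dict stands in for the disk; `BucketBackup` is a hypothetical name. The page length is stored with each signature because appending zero bytes does not change an algebraic signature.

```python
# Page-level backup driven by algebraic signatures (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data, k=1):
    """sign_{alpha^k}(P) = sum_{i>=1} p_i * alpha^(k*i)."""
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + k * i) % 255]
    return s


def page_signature(page):
    # 2-symbol signature plus the page length (trailing zero bytes leave signatures unchanged)
    return (len(page), sign(page, 1), sign(page, 2))


class BucketBackup:
    def __init__(self, page_size=64 * 1024):  # 64 KB data pages, as on the slide
        self.page_size = page_size
        self.bucket_map = []  # page signatures of the last backup (also kept on disk)
        self.disk = {}        # hypothetical stand-in for the disk copy: page number -> bytes

    def backup(self, data):
        """Write only the pages whose signature changed; return their numbers."""
        pages = [data[i:i + self.page_size] for i in range(0, len(data), self.page_size)]
        new_map, written = [], []
        for n, page in enumerate(pages):
            sig = page_signature(page)
            if n >= len(self.bucket_map) or self.bucket_map[n] != sig:
                self.disk[n] = bytes(page)
                written.append(n)
            new_map.append(sig)
        for n in range(len(pages), len(self.bucket_map)):
            self.disk.pop(n, None)  # drop pages beyond the current end of the bucket
        self.bucket_map = new_map
        return written

    def restore(self):
        """Reload the bucket image from the disk copy."""
        return b"".join(self.disk[n] for n in range(len(self.bucket_map)))
```

A second backup of an unchanged bucket writes nothing; flipping one byte rewrites only the page that contains it.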
Update Scheme [PDMST DEXA06]
Normal update: compare Signature_before and Signature_after of each record, and send the update only if these signatures differ. The client thus sends only data that actually changed.
Blind update: no prior search (read) of the record.
Management of concurrency: record signatures are used as timestamps. Clients read every record without any wait. A client sends back the Before_Signature for comparison with the stored one; there is a conflict if these signatures differ.
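A sketch of useless-update detection and signature-based conflict detection, assuming 2-symbol signatures in $GF(2^8)$ (0x11D, α = 2) and in-process objects; `Server`, `normal_update` and the returned status strings are hypothetical.

```python
# Useless-update and conflict detection with record signatures (sketch).
# Assumptions: 2-symbol signatures in GF(2^8), primitive polynomial 0x11D, alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data, k=1):
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + k * i) % 255]
    return s


def sign2(value):
    """2-symbol record signature (sign_alpha, sign_alpha^2)."""
    return (sign(value, 1), sign(value, 2))


class Server:
    def __init__(self):
        self.records = {}  # key -> (value, stored signature)

    def insert(self, key, value, sig):
        self.records[key] = (value, sig)

    def read(self, key):
        return self.records[key]  # no lock: readers never wait

    def update(self, key, new_value, new_sig, before_sig=None):
        _, stored = self.records[key]
        if new_sig == stored:
            return "useless"   # signatures equal: the update would change nothing
        if before_sig is not None and before_sig != stored:
            return "conflict"  # the record changed since this client read it
        self.records[key] = (new_value, new_sig)
        return "updated"


def normal_update(server, key, modify):
    """Read, modify locally, and send the update only if the signature changed."""
    value, before = server.read(key)
    new_value = modify(value)
    after = sign2(new_value)   # signatures are computed on the client
    if after == before:
        return "not sent"
    return server.update(key, new_value, after, before_sig=before)
```

A blind update is a call to `update` without `before_sig`: the server only checks whether the new signature differs from the stored one.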
Concurrent Blind Updates
Figure: a client searches records R1 and R2 in the RAM data buckets s1, s2, …, sk and computes Signature_before (Sgn1, Sgn2) and Signature_after (Sgn'1, Sgn'2).
• R1: Sgn'1 = Sgn1, so there is no update.
• R2: Sgn'2 ≠ Sgn2, so the client sends v2 with Sgn2. The server computes the current signature Sgn''2: if Sgn''2 = Sgn2, R2 is updated to v2; if Sgn''2 ≠ Sgn2, another client has updated R2 concurrently and there is no update.
Cumulative Algebraic Signatures? [VLDB-DBISP2P’05]
• Encodes each symbol $p_i$ of the record $P = (p_1, p_2, \dots, p_i, \dots, p_n)$ as the signature of the prefix ending at $p_i$.
• Protects against incidental viewing of data on the servers.
• Decoding is necessary.
• Speeds up string matching.
Encoding/ Decoding
Record structure: key data, non-key data.
Encoding/decoding concerns only the non-key data and is done in the clients (signatures are calculated in the clients).
Encoding/ Decoding
[Figure: the client encodes record P before its insertion on a server (Serv1, Serv2, Serv3); a search for P is resolved on the server by signature match, and the client decodes the records it receives.]
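Encoding: $P = (p_1, p_2, \dots, p_n) \mapsto P'' = (p''_1, p''_2, \dots, p''_n)$, with $p'_i = p_i \alpha^i = \mathrm{antilog}(\log p_i + i)$ and $p''_i = p''_{i-1} + p'_i = p''_{i-1} \oplus p'_i$, where $p''_0 = 0$.
Decoding: $P'' = (p''_1, p''_2, \dots, p''_n) \mapsto P = (p_1, p_2, \dots, p_n)$, with $p'_i = p''_i - p''_{i-1} = p''_i \oplus p''_{i-1}$ and $p_i = p'_i / \alpha^i = \mathrm{antilog}(\log p'_i - i)$.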
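The encoding and decoding formulas translate directly into Python. This sketch assumes $GF(2^8)$ with primitive polynomial 0x11D and α = 2. It also checks the property the string searches rely on: the encoded symbol $p''_k$ equals the signature of the prefix $p_1 \dots p_k$.

```python
# Cumulative algebraic signature encoding/decoding (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(data):
    """p'_i = p_i * alpha^i;  p''_i = p''_{i-1} XOR p'_i  (p''_0 = 0)."""
    out, acc = bytearray(), 0
    for i, p in enumerate(data, start=1):
        if p:
            acc ^= EXP[(LOG[p] + i) % 255]  # antilog(log p_i + i)
        out.append(acc)
    return bytes(out)


def decode(enc):
    """p'_i = p''_i XOR p''_{i-1};  p_i = p'_i / alpha^i = antilog(log p'_i - i)."""
    out, prev = bytearray(), 0
    for i, c in enumerate(enc, start=1):
        d = c ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = c
    return bytes(out)


def sign(data):
    """sign_alpha(P) = sum_{i=1..n} p_i * alpha^i."""
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + i) % 255]
    return s
```

Encoding keeps the record length unchanged (one 1-byte symbol per byte in $GF(2^8)$), and the last encoded symbol is the signature of the whole record.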
Protection against incidental viewing of Data
Example:
P = « SOUTENANCE_RIAD_MOKADEM » (23 symbols)
Encoding: $P'' = (p''_1, p''_2, \dots, p''_{23}) = (\alpha S,\ p''_1 + \alpha^2 O,\ \dots,\ p''_{22} + \alpha^{23} M)$
$\mathrm{sign}_\alpha(P) = p''_{23} = \alpha S + \alpha^2 O + \dots + \alpha^{23} M$, i.e. the encoded last symbol M.
In SDDS-2005: 1 symbol (1 B) per signature, in $GF(2^8)$.
String Search [LNCS’05]
Cumulative signatures: search in non-key data, with various string matches:
• prefix, string, longest common prefix, longest common string.
For prefix and string search, the data to search is not sent (only its signature is sent):
• better confidentiality,
• faster messaging.
Previous work (string matching): Boyer-Moore, Karp-Rabin, Knuth-Morris-Pratt, Quick Search…
Our approach:
• Prefix search, complete search
• Sequential search, n-gram search
• Longest common prefix
• Longest common string
Prefix Search
Example: search for the prefix S = 'PARIS DAUPHINE'.
• The client calculates Sc = Sign(S) = Sign('E'), the cumulative signature ending at the last symbol 'E', and sends only Sc and the size l = 14 to the servers.
• In each server, for each encoded record Pj: if Sign(p14) = Sc, the prefix is found in Pj.
  - Encoded record 'PARIS DAUPHINE UNIVERSITY': Sign(p14) = Sc, prefix found.
  - Encoded record 'DAUPHINE PARIS UNIVERSITY': Sign(p14) ≠ Sc.
• Complexity: O(1) per record.
• Collision resolution takes place in the client.
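A sketch of the prefix search under the same $GF(2^8)$ assumptions (0x11D, α = 2). The server compares one encoded symbol per record; the client then resolves collisions by decoding the returned records. In a real deployment the server would ship the matching records to the client; here the client reads the hypothetical `bucket` dict directly.

```python
# Prefix search on cumulatively encoded records (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data):
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + i) % 255]
    return s


def encode(data):
    return bytes(sign(data[:k]) for k in range(1, len(data) + 1))  # p''_k = sign(p_1..p_k)


def decode(enc):
    out, prev = bytearray(), 0
    for i, c in enumerate(enc, start=1):
        d = c ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = c
    return bytes(out)


def server_prefix_search(bucket, sc, size):
    """Server side, O(1) per record: compare the encoded symbol p''_size with Sc."""
    return [rid for rid, enc in bucket.items() if len(enc) >= size and enc[size - 1] == sc]


def client_prefix_search(bucket, prefix):
    sc, size = sign(prefix), len(prefix)  # only Sc and the size leave the client
    candidates = server_prefix_search(bucket, sc, size)
    # collision resolution in the client: decode the candidate records and check them
    return [rid for rid in candidates if decode(bucket[rid]).startswith(prefix)]
```

Here `encode` computes each prefix signature separately for brevity; it produces the same symbols as the incremental XOR encoding.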
Complete Search
Full match.
Record structure: Lr | Es | K | Lc | Sg | Data
• Lr: length of the record
• Es: pointer to the next record
• K: key of the record
• Lc: version
• Sg: signature of the record
The client sends the signature S of the string to search.
In the server: sequential scan of the records, comparing S with the algebraic signature Sg stored in the header of each record (test the 1st symbol, then the 2nd if they are equal).
Complexity: O(1) per record.
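A sketch of the complete search with a 2-symbol signature in $GF(2^8)$ (0x11D, α = 2 assumed); the record header is reduced to the hypothetical fields `K`, `Sg` and `Data`.

```python
# Complete (full-match) search against the signature Sg stored in each record header (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data, k=1):
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + k * i) % 255]
    return s


def sign2(data):
    return (sign(data, 1), sign(data, 2))


def make_record(key, data):
    """Hypothetical record: key K, header signature Sg, Data (Lr, Es, Lc omitted)."""
    return {"K": key, "Sg": sign2(data), "Data": data}


def full_match(bucket, sig):
    """Server side: sequential scan comparing sig with the Sg stored in each header."""
    hits = []
    for rec in bucket:
        sg = rec["Sg"]
        # 'and' short-circuits: the 2nd symbol is compared only if the 1st one is equal
        if sg[0] == sig[0] and sg[1] == sig[1]:
            hits.append(rec["K"])
    return hits
```

Because Sg is precomputed at insertion, each record costs at most two 1-byte comparisons.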
Sequential Search
Example: search for the string S = 'PARIS' in record P = 'UNIVERSITE PARIS DAUPHINE'.
• The client sends Sc = Sign('PARIS') = Sign('S') and the size l = 5 to the servers.
• In the server: sequential comparison of the signatures of windows of l = 5 symbols:
  Sign('E') ≠ Sc, …
  $(p''_{i+l} \oplus p''_i) / \alpha^i = S_c$ ⇒ S found (here i = 11, the window ending at 'S').
• Complexity: O(n − l), n: size of record P.
• Collision resolution takes place on the client.
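The window test follows from the cumulative encoding: $p''_{i+l} \oplus p''_i = \sum_{j=i+1}^{i+l} p_j \alpha^j = \alpha^i\,\mathrm{sign}_\alpha(p_{i+1} \dots p_{i+l})$, so dividing by $\alpha^i$ yields the signature of the window. A sketch under the same $GF(2^8)$ assumptions (0x11D, α = 2); `seq_search` returns candidate offsets, which the client would still verify.

```python
# Sequential string search on a cumulatively encoded record (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data):
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + i) % 255]
    return s


def encode(data):
    out, acc = bytearray(), 0
    for i, p in enumerate(data, start=1):
        if p:
            acc ^= EXP[(LOG[p] + i) % 255]
        out.append(acc)
    return bytes(out)


def seq_search(enc, sc, l):
    """Return 0-based offsets i where sign(p_{i+1}..p_{i+l}) == Sc (candidates)."""
    hits = []
    for i in range(len(enc) - l + 1):
        xv = enc[i + l - 1] ^ (enc[i - 1] if i > 0 else 0)  # p''_{i+l} XOR p''_i
        win = EXP[(LOG[xv] - i) % 255] if xv else 0          # divide by alpha^i
        if win == sc:
            hits.append(i)
    return hits
```

Each window costs one XOR and one table lookup, hence the O(n − l) complexity per record.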
String Search by n-Gram
Example: search by digram (n = 2).
• The client sends S = 'DAUPHINE' and its size l = 8 to the server.
• On the server: calculation of the n-gram table $T = [s'_1 \dots s'_{l-n}]$, with $s'_i = \mathrm{Sign}(s_i \dots s_{i+n-1})$.

T: meta table of n-grams
Sign    j  d
S(da)   1  0
S(au)   2  0
S(up)   3  0
S(ph)   4  0
S(hi)   5  0
S(in)   6  0

The last digram of S, S(ne), is compared with the digram ending the current window:
• si ≠ ne and si not in T: jump l − 1 = 7 positions.
• ……
• up ≠ ne and up in T: jump j + 1 = 4 positions.
• … S found after 6 comparisons and 5 shifts.
Complexity: O((m − l)/(l − n + 1)), m: size of the record.
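A sketch of the n-gram search on an encoded record. The shift rule is an assumption: a Horspool-style jump of (l − n + 1) − j when the window's last n-gram has signature $s'_j$ in T, and l − n + 1 otherwise. This reproduces the slide's 7 positions for 'si' and 4 positions for 'up' (j = 3). Window and n-gram signatures are derived from the cumulative symbols as in the sequential search; $GF(2^8)$ with 0x11D and α = 2 is assumed.

```python
# n-gram (here digram) search on a cumulatively encoded record (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D, alpha = 2, Horspool-style shifts.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def sign(data):
    s = 0
    for i, p in enumerate(data, start=1):
        if p:
            s ^= EXP[(LOG[p] + i) % 255]
    return s


def encode(data):
    out, acc = bytearray(), 0
    for i, p in enumerate(data, start=1):
        if p:
            acc ^= EXP[(LOG[p] + i) % 255]
        out.append(acc)
    return bytes(out)


def sub_sign(enc, start, length):
    """Signature of p_{start+1}..p_{start+length}, from the cumulative symbols."""
    xv = enc[start + length - 1] ^ (enc[start - 1] if start > 0 else 0)
    return EXP[(LOG[xv] - start) % 255] if xv else 0


def ngram_search(enc, s, n=2):
    l, m = len(s), len(enc)
    sc = sign(s)
    grams = [sign(s[j - 1:j - 1 + n]) for j in range(1, l - n + 2)]  # s'_1 .. s'_{l-n+1}
    last = grams[-1]
    table = {}
    for j, g in enumerate(grams[:-1], start=1):  # T = [s'_1 .. s'_{l-n}]
        table[g] = j                             # keep the rightmost j per signature
    hits, i = [], 0
    while i <= m - l:
        g = sub_sign(enc, i + l - n, n)          # n-gram ending the current window
        if g == last and sub_sign(enc, i, l) == sc:
            hits.append(i)                       # candidate; collisions resolved later
        i += (l - n + 1) - table[g] if g in table else l - n + 1
    return hits
```

Keeping the rightmost j for each signature makes every jump no longer than the exact-gram jump, so signature collisions can only shorten shifts, never skip a true occurrence.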
Longest Common Prefix
Example: the client sends the string S = 'UNIVERSITE PARIS DAUPHINE' to servers S1 and S2.
On the servers:
• Record P in S1 = 'UNIVERSITE DAUPHINE': equality at p1, p2, p4, p8; inequality at p16, p12; equality at p10, p11, so L = 11.
• Record P' in S2 = 'UNIVERSITE PARIS9 DAUPHINE': equality at p1, p2, p4, p8, p16; inequality at p25, p21, p19, p17, so L = 16.
S1 and S2 send L to the client, which selects L = 16.
Complexity: best case O(1), worst case O(log2(L − L')) (L, L': sizes of successive longest prefixes).
Collision resolution takes place on the servers.
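A sketch of the server-side longest-common-prefix computation: it probes prefix lengths 1, 2, 4, 8, …, then binary-searches between the last match and the first mismatch, comparing cumulative signatures only. The exact probe order on the slide may differ from this binary search. Collision resolution decodes the record and checks the result, with an exact scan as fallback; $GF(2^8)$ with 0x11D and α = 2 is assumed, and `lcp_length` is a hypothetical name.

```python
# Longest common prefix on a cumulatively encoded record (sketch).
# Assumptions: GF(2^8) with primitive polynomial 0x11D and alpha = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = (x << 1) ^ (0x11D if x & 0x80 else 0)


def encode(data):
    out, acc = bytearray(), 0
    for i, p in enumerate(data, start=1):
        if p:
            acc ^= EXP[(LOG[p] + i) % 255]
        out.append(acc)
    return bytes(out)


def decode(enc):
    out, prev = bytearray(), 0
    for i, c in enumerate(enc, start=1):
        d = c ^ prev
        out.append(EXP[(LOG[d] - i) % 255] if d else 0)
        prev = c
    return bytes(out)


def lcp_length(enc, s):
    """Server side: length of the longest common prefix of the encoded record and s."""
    cs = encode(s)                # cumulative signatures of the client's string
    limit = min(len(enc), len(cs))

    def eq(k):                    # prefixes of length k match (up to collisions)
        return enc[k - 1] == cs[k - 1]

    lo, k = 0, 1
    while k <= limit and eq(k):   # probe p1, p2, p4, p8, ...
        lo, k = k, 2 * k
    hi = min(k, limit + 1)        # first length assumed not to match
    while hi - lo > 1:            # binary search between lo and hi
        mid = (lo + hi) // 2
        if eq(mid):
            lo = mid
        else:
            hi = mid
    rec = decode(enc)             # collision resolution on the server
    if rec[:lo] != s[:lo] or (lo < limit and rec[lo] == s[lo]):
        lo = 0                    # signature collision: fall back to an exact scan
        while lo < limit and rec[lo] == s[lo]:
            lo += 1
    return lo
```

Each server returns its L, and the client keeps the maximum, as in the example above.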
Longest Common String
Example: the client sends the string S = 'BIENVENUES LABORATOIRE CERIA DAUPHINE ......' to server S1, which holds records P and P'.
On the server:
• Record P = 'LABORATOIRE RECHERCHE INFORMATIQUE': L = 12.
• Record P' = 'LABORATOIRE CERIA DAUPHINE UNIVERSITY FRANCE': L = 27.
S1 sends L = 27 to the client.
Complexity per record: best case O(1) + 1/N, worst case O(n·l) (N: size of the bucket, n: size of the record, l: size of the string).
Collision resolution takes place on the server.
Hardware Configuration
• Servers: 1.8 GHz P4
• Client: 800 MHz P3
• Name server: 500 MHz P3
• Network: 1 Gbps Ethernet
• OS: Windows 2K Server
File Storage Performance Analysis
Bucket size (MB) | Number of records | Signature computation (ms) | Signature computation per MB (ms) | Total store time (ms) | Store time for 0 % change (ms) | Gain (%) | Store time for 5 % change (ms) | Gain (%)
1.88 | 100 | 46 | 24.46 | 562 | 50 | 91.1 | 65 | 88.43
2.7 | 150 | 78 | 28.8 | 781 | 82 | 89.51 | 95 | 87.83
17.6 | 1000 | 438 | 24.88 | 5078 | 438 | 91.38 | 453 | 91.07
158 | 10000 | 4068 | 25.74 | 46406 | 4071 | 91.23 | 4085 | 91.19
393 | 25000 | 11003 | 27.9 | 117859 | 11013 | 91.33 | 11018 | 90.65
[Figure: Scalability. Storage time (ms) vs. number of records.]
SHA-1/ Algebraic Signatures
Bucket size (MB) | Number of records | Algebraic signature computation (ms) | SHA-1 computation (ms) | Initial store time with SHA-1 (ms) | Initial store time with alg. sign. (ms) | SHA-1 store time for 5 % change (ms) | Alg. sign. store time for 5 % change (ms) | Gain (%)
1.88 | 100 | 46 | 70 | 602 | 562 | 85 | 65 | 30
2.7 | 150 | 78 | 103 | 799 | 781 | 119 | 95 | 25
17.6 | 1000 | 438 | 680 | 5278 | 5078 | 697 | 453 | 53
158 | 10000 | 4068 | 6088 | 47906 | 46406 | 6102 | 4085 | 49
393 | 25000 | 11003 | 15403 | 119342 | 117859 | 15418 | 11018 | 40
[Figure: storage time (ms) vs. bucket size (MB), algebraic signature vs. cryptographic signature.]
SHA-1 Signatures: 20 Bytes
Algebraic Signatures: 4 Bytes
Performance of the Backup Command
• 1st request: signature computation (375 ms) and storage of all pages (4922 ms).
• 2nd request: no bucket change (375 ms).
• 3rd request: 1 page changed (375 + 16 ms).
Update Measurements
Update | Update time with change (ms) | Update time with no change (ms)
Normal update | 0.92 | 0.28
Blind update | 0.74 | 0.20
Avoids lost updates.
Cost of Encoding/ Decoding Data
• Encoding: 0.045 ms/KB; decoding: 0.042 ms/KB.
• Insertion: 0.25 ms/KB; search: 0.28 ms/KB.
[Chart: Decoding/Search and Encoding/Insertion: 16 % and 14 %.]
Benefits: protection against incidental viewing of data on the servers; string matching possibilities.
Response Time For String Matching
String match:
Record position | Record size (B) | String size (B) | Offset (B) | Time (ms)
1 | 20 | 5 | 13 | 0.44
1 | 100 | 20 | 70 | 0.68
1 | 100 | 20 | 80 | 0.682
100 | 100 | 20 | 70 | 72.5
100 | 100 | 30 | 70 | 71.7
200 | 100 | 20 | 70 | 165

Prefix match:
Record position | Record size (B) | Prefix size (B) | Time (ms)
1 | 100 | 20 | 0.369
100 | 250 | 20 | 37.8
100 | 250 | 35 | 37.78
200 | 250 | 20 | 71.3
300 | 250 | 20 | 120.53
500 | 250 | 20 | 197.5

Key search: 0.27 ms/KB
Response Time For String Matching
Longest prefix match:
Record position | Size of inserted data (B) | Size of last record (B) | Size of prefix to search (size of prefix found) (B) | Time to search (ms)
1 | 50 | 50 | 25 (20) | 0.372
1/100 | 250 | 50 | 25 (20) | 43
49/100 | 250 | 50 | 25 (20) | 46.2
99/100 | 250 | 50 | 25 (20) | 47

Longest common string match:
Record position | Size of inserted data (B) | Size of last record (prefix) (B) | Size of string to search (size of string found) (B) | Offset of string in record (B) | Time to search (ms)
1 | 100 | 100 | 22 (20) | 70 | 0.62
100 | 100 | 22 | 10 (5) | 10 | 290
100 | 100 | 45 | 15 (10) | 10 | 470
100 | 120 | 45 | 15 (10) | 10 | 565
n-Gram search Measurements
Up to (l − 1) times faster than the cumulative search algorithm (l: size of the string to search).
[Chart: search time (ms) vs. string search size (bytes), digram search vs. cumulative search; search in 1 record (300 bytes).]
Comparison (String Matching)
Size of case | Size of records | Size of last record | Size of data to search | Offset in last record | Non-encoded alg. sign. search | Karp-Rabin search | Cumulative sign. search
100 | 250 | 25 | 10 | 5 | 205 | 151 | 147
200 | 250 | 25 | 10 | 5 | 368 | 275 | 268
500 | 250 | 25 | 10 | 5 | 1123 | 725 | 702
100 | 250 | 45 | 35 | 5 | 180 | 168 | 126
Cumulative signatures reduced string matching times.
Comparison (String Matching)
[Chart: search time (ms) vs. number of records for XOR + algebraic signatures, Karp-Rabin and cumulative signatures; string search for data < 32 B (left) and data > 32 B (right).]
Gain of cumulative search:
• Over previous algorithms (Karp-Rabin): saving of 5 % for strings < 32 B and +20 % for strings > 32 B.
• Over non-encoded data: saving of 30 %.
Example of Prefix Search
[Screenshots: Prefix Search operation; dialog at the client; result received from the servers.]
Conclusion
• Algebraic signatures are an efficient basis for new SDDS-2005 capabilities (backup, updates).
• Cumulative algebraic signatures are efficient for incidental-view protection and string search.
• Prototype SDDS-2005:
  o up and running
  o submitted to DBWorld
  o free download at http://ceria.dauphine.fr
Future Work
• More on n-grams.
• Alternative signature schemes (inverse signatures using the Horner scheme).
• Delta compression using cumulative signatures.
• Protection against silent corruption.
• Alternative GF multiplication methods (prefetch, Broder, tables of …).
• Collision resolution on the clients.
• SDDS-2005 as part of a virtual repository of eGov documents (eGov project).
Acknowledgements
Work partly supported by: the CEE eGov Project; MS Research; the CEE ICONS Project; IBM Almaden Research Center.