Tagging schema design for high performance

Data structures for cloud tag storage

Page 1: Data structures for cloud tag storage

Tagging schema design for high performance

Page 2: Data structures for cloud tag storage

Plan

▪ Tagging basics
▪ Database challenges
▪ Tagging solutions
▪ Pros and cons
▪ Q&A session

Page 3: Data structures for cloud tag storage

Tagging terms

• A tag is a non-hierarchical keyword or term assigned to a piece of information
• Tags are generally chosen informally and personally by the item's creator or by its viewer
• If tags are assigned by the creator and are limited, it is a taxonomy
• If tags are assigned by the viewer and are unlimited, it is a folksonomy
• Came into wide use from 2003 with the Flickr and Delicious web sites
• Tags are usually shown inline as well as in a tag cloud

Page 4: Data structures for cloud tag storage

Tagging challenges

+ Advantages:
1. The vocabulary used directly reflects the user's own vocabulary
2. Flexibility: the user can add or remove tags
3. Multi-dimensional nature: users can assign any number and combination of tags to express a concept

lead to

- Problems:
4. Specialized tags or tags with no meaning to anyone but their author, misspellings, singular/plural forms, compound words
5. Tags that are often ambiguous, overly personalized, or poorly applied
6. Synonyms, acronyms, and homonyms, which aren't handled well

Page 5: Data structures for cloud tag storage

Database challenges

1. Performance
2. Query awkwardness
3. Database size
4. Housekeeping

Page 6: Data structures for cloud tag storage

Highly normalized approach
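The slide's schema diagram did not survive extraction. As a minimal sketch, assuming the usual three-table layout (documents, tags, and a link table), the normalized model could look like the following in SQLite. The table names, column names, and sample rows are hypothetical and are not taken from the slides.

```python
import sqlite3

# Hypothetical three-table layout; all names here are illustrative only.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag      (id INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
CREATE TABLE document_tag (
    document_id INTEGER NOT NULL REFERENCES document(id),
    tag_id      INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (document_id, tag_id)
);
CREATE INDEX document_tag_by_tag ON document_tag (tag_id, document_id);
"""


def make_normalized_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, title, tags):
    conn.execute("INSERT INTO document (id, title) VALUES (?, ?)", (doc_id, title))
    for name in tags:
        # Each distinct tag name is stored once and referenced by id.
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO document_tag (document_id, tag_id) "
            "SELECT ?, id FROM tag WHERE name = ?",
            (doc_id, name),
        )


def tags_of(conn, doc_id):
    """All tags of one document, sorted by name."""
    rows = conn.execute(
        "SELECT t.name FROM tag t "
        "JOIN document_tag dt ON dt.tag_id = t.id "
        "WHERE dt.document_id = ? ORDER BY t.name",
        (doc_id,),
    )
    return [name for (name,) in rows]


def documents_with_tag(conn, name):
    """Ids of all documents carrying the given tag."""
    rows = conn.execute(
        "SELECT dt.document_id FROM document_tag dt "
        "JOIN tag t ON t.id = dt.tag_id "
        "WHERE t.name = ? ORDER BY dt.document_id",
        (name,),
    )
    return [doc_id for (doc_id,) in rows]


conn = make_normalized_db()
add_document(conn, 1, "Hashing passwords", ["php", "mysql", "encryption"])
add_document(conn, 2, "Generating GUIDs", ["php", "guid"])
```

Every read goes through at least one join, which is the price paid for storing each tag name exactly once.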

Page 7: Data structures for cloud tag storage

Denormalized approach
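The slide's diagram is missing here as well, so the exact layout the author benchmarked is unknown. One common reading of "denormalized" is a single text column holding all tags of a document in a delimited form. The sketch below assumes that reading; the `<tag>` delimiter format, the table layout, and the helper names are assumptions for illustration.

```python
import sqlite3


def encode_tags(tags):
    """Pack tags into one delimited string, e.g. ['php', 'guid'] -> '<guid><php>'."""
    return "".join(f"<{tag}>" for tag in sorted(set(tags)))


def decode_tags(encoded):
    """Unpack '<guid><php>' back into ['guid', 'php']."""
    return [tag for tag in encoded.strip("<>").split("><") if tag]


def make_denormalized_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document ("
        " id INTEGER PRIMARY KEY, title TEXT NOT NULL, tags TEXT NOT NULL)"
    )
    return conn


def add_document(conn, doc_id, title, tags):
    conn.execute(
        "INSERT INTO document (id, title, tags) VALUES (?, ?, ?)",
        (doc_id, title, encode_tags(tags)),
    )


def tags_of(conn, doc_id):
    """All tags of one document: a single-row lookup, no join needed."""
    row = conn.execute("SELECT tags FROM document WHERE id = ?", (doc_id,)).fetchone()
    return decode_tags(row[0]) if row else []


def documents_with_tag(conn, tag):
    # Substring match on the delimited form. The delimiters keep 'sql' from
    # matching 'mysql', but the predicate cannot use an ordinary index.
    rows = conn.execute(
        "SELECT id FROM document WHERE instr(tags, ?) > 0 ORDER BY id",
        (f"<{tag}>",),
    )
    return [doc_id for (doc_id,) in rows]


conn = make_denormalized_db()
add_document(conn, 1, "Hashing passwords", ["php", "mysql", "encryption"])
add_document(conn, 2, "Generating GUIDs", ["php", "guid"])
```

Fetching the tags of a known document becomes a single-row read, while searching by tag falls back to string matching over every row.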

Page 8: Data structures for cloud tag storage

Complex data type approach
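The slide names the approach but its diagram is lost. In a DBMS with native array types (PostgreSQL arrays, for example) the tags would sit in one array column. SQLite has no array type, so the sketch below only emulates the idea: the tags are stored as a JSON-encoded list and a user-defined SQL function, here called `array_contains`, tests membership. The function name, table layout, and sample data are all assumptions.

```python
import json
import sqlite3


def _array_contains(encoded, tag):
    """SQL helper: 1 if the JSON-encoded list contains the tag, else 0."""
    return 1 if tag in json.loads(encoded) else 0


def make_array_db():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so it can be called from SQL.
    conn.create_function("array_contains", 2, _array_contains)
    conn.execute(
        "CREATE TABLE document ("
        " id INTEGER PRIMARY KEY, title TEXT NOT NULL, tags TEXT NOT NULL)"
    )
    return conn


def add_document(conn, doc_id, title, tags):
    conn.execute(
        "INSERT INTO document (id, title, tags) VALUES (?, ?, ?)",
        (doc_id, title, json.dumps(list(tags))),
    )


def tags_of(conn, doc_id):
    """All tags of one document, in the order they were stored."""
    row = conn.execute("SELECT tags FROM document WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else []


def documents_with_tag(conn, tag):
    """Ids of documents whose tag list contains the given tag."""
    rows = conn.execute(
        "SELECT id FROM document WHERE array_contains(tags, ?) ORDER BY id",
        (tag,),
    )
    return [doc_id for (doc_id,) in rows]


conn = make_array_db()
add_document(conn, 1, "Hashing passwords", ["php", "mysql", "encryption"])
add_document(conn, 2, "Generating GUIDs", ["php", "guid"])
```

This emulation evaluates the function on every row. Whether a native array column can be indexed for membership tests depends on the DBMS, which is one reason to measure on the target system.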

Page 9: Data structures for cloud tag storage

Full-text-search oriented solutions

Stack Overflow: <php><mysql><guid><encryption>
JSON: {"tags":["php", "apache2", "openinviter"]}
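To make the idea concrete, here is a toy inverted index over the Stack Overflow style tag string shown above. It is only a sketch of the principle behind full-text search (each term maps to the set of documents containing it). Real FTS engines add tokenization rules, ranking, and on-disk index structures that are not modeled here, and the class and function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches one '<tag>' token in a string such as '<php><mysql>'.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def parse_tags(encoded):
    """Split '<php><mysql>' into ['php', 'mysql']."""
    return TAG_PATTERN.findall(encoded)


class TagIndex:
    """Toy inverted index: tag -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, encoded_tags):
        for tag in parse_tags(encoded_tags):
            self._postings[tag].add(doc_id)

    def search_all(self, *tags):
        """Ids of documents that carry every one of the given tags."""
        if not tags:
            return []
        posting_sets = [self._postings.get(tag, set()) for tag in tags]
        return sorted(set.intersection(*posting_sets))


index = TagIndex()
index.add(1, "<php><mysql><guid><encryption>")
index.add(2, "<php><guid>")
index.add(3, "<mysql>")
```

An AND search is a set intersection of posting lists, so `index.search_all("php", "guid")` only touches the two lists involved rather than every document.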

Page 10: Data structures for cloud tag storage

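Full-text-search approaches

[Diagram: two deployment options.
Approach 1: the application server talks to a database that runs FTS inside the DB.
Approach 2: the application server talks to a database and to a separate FTS server (Lucene, Sphinx, Elastic, Solr, Xapian, etc.).
Data model labels in the diagram: "Relational/denormalized/FTS model" and "+ FTS model".]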

Page 11: Data structures for cloud tag storage

Housekeeping

Denormalized/FTS
1. Change all affected tags in all documents if a tag name changes

FTS
1. Rebuild the FTS index when it becomes fragmented
2. Refresh the FTS index if it isn't refreshed on COMMIT
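The first housekeeping item can be sketched as follows, assuming a denormalized table that stores tags in one delimited text column of the form `<tag1><tag2>`. That storage format, the table layout, and the function name are assumptions for illustration. A model that stores each tag name once, in its own table, would only need to update that one row.

```python
import sqlite3


def rename_tag_denormalized(conn, old, new):
    """Rewrite every document row that carries the old tag; return rows touched."""
    cur = conn.execute(
        "UPDATE document SET tags = replace(tags, ?, ?) WHERE instr(tags, ?) > 0",
        (f"<{old}>", f"<{new}>", f"<{old}>"),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, tags TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO document (id, tags) VALUES (?, ?)",
    [(1, "<js><php>"), (2, "<css><js>"), (3, "<mysql>")],
)

# Two of the three documents carry '<js>', so two rows are rewritten.
touched = rename_tag_denormalized(conn, "js", "javascript")
```

The cost of a rename grows with the number of tagged documents, and a document that already carries the new tag would end up with a duplicate that this sketch does not clean up.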

Page 12: Data structures for cloud tag storage

Test example

StackOverflow posts via http://data.stackexchange.com/
From 31/07/2008 to 21/12/2012
Posts: 2 680 474
Applied tags: 7 791 527
Used unique tags: 30 485
Max tags count for a post: 5

Page 13: Data structures for cloud tag storage

Comparison

Initial population time

Model               Insert time, seconds
Relational          1048
Denormalized        1205
Complex data type   2086
Full text search    1950

[Bar chart: insert time per model, seconds]

Page 14: Data structures for cloud tag storage

Comparison

DB size

Model               Size total, MB   Data size, MB   Index size, MB
Relational          1166             338             828
Denormalized        1080             376             704
Complex data type   1134             256             878
Full text search    1055             416             639

[Bar chart: index size, data size, and total size per model, MB]

Page 15: Data structures for cloud tag storage

Comparison

Search by document id and all tag retrieval

Model               Speed with cold cache, seconds   Speed with hot cache, seconds
Relational          0.2                              0.003
Denormalized        0.07                             0.002
Complex data type   0.9                              0.002
Full text search    0.3                              0.001

[Bar charts: speed with cold cache and speed with hot cache per model, seconds]

Page 16: Data structures for cloud tag storage

Comparison

Search using 1 tag and all tag retrieval

Model               Speed with cold cache, seconds   Speed with hot cache, seconds
Relational          1                                0.005
Denormalized        0.7                              0.004
Complex data type   1.7                              0.005
Full text search    0.7                              0.002

[Bar charts: speed with cold cache and speed with hot cache per model, seconds]

Page 17: Data structures for cloud tag storage

Comparison

Search by AND using 2 tags and all tag retrieval

Model               Speed with cold cache, seconds   Speed with hot cache, seconds
Relational          40                               34
Denormalized        34                               20
Complex data type   34                               14
Full text search    20                               2

[Bar chart: search speed with hot and cold cache per model, seconds]
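The slides do not show the queries that were timed. As a rough illustration of what "search by AND using 2 tags and all tag retrieval" can mean in the relational model, the sketch below finds documents carrying every requested tag with GROUP BY and HAVING, then returns all tags of those documents. The schema, names, and sample posts are hypothetical.

```python
import sqlite3


def make_db(posts):
    """Build a small in-memory relational tag store from {doc_id: [tags]}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE document_tag (
            document_id INTEGER NOT NULL,
            tag_id      INTEGER NOT NULL REFERENCES tag(id),
            PRIMARY KEY (document_id, tag_id)
        );
    """)
    for doc_id, tags in posts.items():
        for name in tags:
            conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
            conn.execute(
                "INSERT INTO document_tag (document_id, tag_id) "
                "SELECT ?, id FROM tag WHERE name = ?",
                (doc_id, name),
            )
    return conn


def documents_with_all_tags(conn, tags):
    """Map each document that has every requested tag to its full tag list."""
    wanted = sorted(set(tags))
    if not wanted:
        return {}
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT dt.document_id, t.name
        FROM document_tag dt JOIN tag t ON t.id = dt.tag_id
        WHERE dt.document_id IN (
            SELECT dt2.document_id
            FROM document_tag dt2 JOIN tag t2 ON t2.id = dt2.tag_id
            WHERE t2.name IN ({placeholders})
            GROUP BY dt2.document_id
            HAVING COUNT(*) = ?          -- matched every requested tag
        )
        ORDER BY dt.document_id, t.name
    """
    result = {}
    for doc_id, name in conn.execute(query, (*wanted, len(wanted))):
        result.setdefault(doc_id, []).append(name)
    return result


conn = make_db({
    1: ["php", "mysql", "encryption"],
    2: ["php", "guid"],
    3: ["mysql", "php"],
})
matches = documents_with_all_tags(conn, ["php", "mysql"])
```

The query needs two joins plus an aggregation, which gives a feel for the "queries awkwardness" the deck lists among the database challenges.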

Page 18: Data structures for cloud tag storage

Comparison

Cloud tag population

Model                       Speed, seconds
Relational                  20
Relational simplified       18
Relational without FK       202
Denormalized                18
Complex data type (array)   21
FTS                         40

[Bar chart: cloud tag population speed per model, seconds]
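Populating a tag cloud boils down to counting how often each tag is used. The query the author timed is not shown, so the following is only a sketch for the relational model, with a hypothetical schema and sample data: one GROUP BY over the link table, most-used tags first.

```python
import sqlite3


def make_db(posts):
    """Build a small in-memory relational tag store from {doc_id: [tags]}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE document_tag (
            document_id INTEGER NOT NULL,
            tag_id      INTEGER NOT NULL REFERENCES tag(id),
            PRIMARY KEY (document_id, tag_id)
        );
    """)
    for doc_id, tags in posts.items():
        for name in tags:
            conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
            conn.execute(
                "INSERT INTO document_tag (document_id, tag_id) "
                "SELECT ?, id FROM tag WHERE name = ?",
                (doc_id, name),
            )
    return conn


def tag_cloud(conn, limit):
    """Return (tag name, usage count) pairs, most used first, ties by name."""
    rows = conn.execute(
        "SELECT t.name, COUNT(*) AS uses "
        "FROM document_tag dt JOIN tag t ON t.id = dt.tag_id "
        "GROUP BY t.id, t.name "
        "ORDER BY uses DESC, t.name "
        "LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


conn = make_db({
    1: ["php", "mysql", "encryption"],
    2: ["php", "guid"],
    3: ["mysql", "php"],
})
top_two = tag_cloud(conn, 2)
```

In the denormalized and complex-data-type models the same counts require unpacking every document's tag list first, so the relative cost can differ a lot between storage models.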

Page 19: Data structures for cloud tag storage

Pros & Cons

Model               Space consumption   Search performance   Insert performance   Maintenance   Additional housekeeping   Risk of failure   Search queries development
Relational          worst               worst                highest              minimal       not required              no                worst
Denormalized        moderate            moderate             good                 required      required                  no                moderate
Complex data type   moderate            moderate             worst                required      required                  no                moderate
Full text search    optimal             optimal              moderate             required      required                  yes               optimal

Page 20: Data structures for cloud tag storage

Conclusion

1. Choose your best model based on:
• Performance (search/insert/update)
• Space consumption
• Engineer experience
• Hardware cost
• Software cost

2. Each storage model should be checked on your RDBMS - don't be afraid to try and measure

3. Understanding how complex data types are stored internally is crucial
4. Understanding how FTS works internally is crucial
5. Investigate your DBMS's unique features

There is no silver bullet for tag storage model!

Page 21: Data structures for cloud tag storage

Q&A

Page 22: Data structures for cloud tag storage

Contacts

Feel free to ask any db-related questions: [email protected]