E-Commerce and Graph-driven Applications: Experiences and Optimizations while moving to Linked Data

Dr. Andreas Both, Head of Research and Development, UNISTER GmbH, Germany

LDBC, Barcelona, 2015-03-20

Unister Group

e-commerce company

founded 2002

major B2C web portals in Germany (and Europe)

verticals: travel, flights, travel packages, retail, ...

integrated business model

10 million unique users per month (Germany, AGOF)

increased number of employees

2003: 1

2015: 1600

Use Case

Agenda for e-commerce companies:

take advantage of linked data

unchain datastores from schema

Requirements:

fast

robust

scalable

→ Users: I want it all. I want it now.

Typical Data Structures and Queries

hierarchical (directed) region graph

hotels and regions might have many features

typical queries: select hotels by several features
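Whether a constraint such as `uo:locatedIn uor:Europe` needs the region hierarchy at query time depends on whether `uo:locatedIn` is materialized for every ancestor region, which the slides do not say. If it is not, the constraint has to be expanded over the directed region graph first. The Python sketch below shows one way to do that with a breadth-first traversal; the region names, the hotel assignments and the parent-to-children dictionary are hypothetical.

```python
from collections import deque

# Hypothetical directed region graph: parent region -> list of child regions.
REGION_CHILDREN = {
    "Europe": ["Germany", "Spain"],
    "Germany": ["Saxony", "Bavaria"],
    "Saxony": ["Leipzig"],
    "Spain": ["Catalonia"],
    "Catalonia": ["Barcelona"],
}


def descendant_regions(root: str, children: dict[str, list[str]]) -> set[str]:
    """Return root plus every region reachable from it (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        region = queue.popleft()
        for child in children.get(region, []):
            if child not in seen:  # a region may be reachable via several parents
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical hotel -> region assignments.
HOTEL_REGION = {"hotel1": "Leipzig", "hotel2": "Barcelona", "hotel3": "Bavaria", "hotel4": "Tokyo"}

europe = descendant_regions("Europe", REGION_CHILDREN)
hotels_in_europe = sorted(h for h, r in HOTEL_REGION.items() if r in europe)
print(hotels_in_europe)  # ['hotel1', 'hotel2', 'hotel3']
```

The `seen` set keeps the traversal linear even when a region is reachable from more than one parent, which a directed graph (unlike a tree) allows.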

Example Query

    PREFIX uo:   <http://ontology.unister.de/ontology#>
    PREFIX uor:  <http://ontology.unister.de/resource/>
    PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
    PREFIX uos:  <http://ontology.unister.de/skos/>

    SELECT DISTINCT ?s {
      ?s a uo:Hotel ;
         uo:hasFeature uorf:56, uorf:18, uorf:21, uorf:210, uorf:5, uorf:211, uorf:34, uorf:17 ;
         uo:locatedIn uor:Europe ;
         uo:suitableFor uos:Family
    } LIMIT 10

Experiences: standard search process

A. search for attributes

   1. very selective

   2. less selective

B. pick a region

C. sort the results

D. limit the selection

Setting:

Dataset: 71,600 hotels, 278,277 resources, 3,022,583 literals

Virtuoso: version 7.1 (fast track [1]), 824 MB, buffer size: 70,000

Experiments: 20 runs; charts show the median

[1] https://github.com/v7fasttrack/virtuoso-opensource/tree/feature/emergent
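The measurement setting (20 runs per query, median reported) can be reproduced with a small harness. In this Python sketch, `fake_query` is a hypothetical stand-in for whatever client call sends the SPARQL query to Virtuoso; the harness only times it with `time.perf_counter` and summarizes the runs with `statistics.median`, adding the minimum and maximum as a rough indicator of how stable the response times are.

```python
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Time run_query `runs` times and summarize the wall-clock durations (seconds)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Hypothetical stand-in for a real SPARQL request against the store.
def fake_query() -> list[int]:
    return [i * i for i in range(10_000)]


summary = benchmark(fake_query)
print(f"median={summary['median']:.6f}s  spread={summary['max'] - summary['min']:.6f}s")
```

The slides do not state whether warm-up runs were excluded from the 20, so this sketch keeps every run; discarding a few initial runs would be a one-line change.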

Requirements for Industrial Applicability (in e-commerce)

requirements for replacing traditional databases:

fast: short response time

search query refinement → shorter response time

robust: similar answer times

easy to scale up

efficient use of system resources

→ requirements not fulfilled

Example Query

    PREFIX uo:   <http://ontology.unister.de/ontology#>
    PREFIX uor:  <http://ontology.unister.de/resource/>
    PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
    PREFIX uos:  <http://ontology.unister.de/skos/>

    SELECT DISTINCT ?s {
      ?s a uo:Hotel ;
         uo:hasFeature uorf:56, uorf:18, uorf:21, uorf:210, uorf:5, uorf:211, uorf:34, uorf:17 ;
         uo:locatedIn uor:Europe ;
         uo:suitableFor uos:Family
    } LIMIT 10

Data Preparation

    hotel entity     p1   p2   p3  ...   pn
    hotel1            0    0    1  ...    0
    hotel2            1    0    1  ...    1
    hotel3            1    1    1  ...    0
    hotel4            1    0    1  ...    1
    ...             ...  ...  ...  ...  ...
    hotelm            0    0    1  ...    0

BitSet representation of (hotel) properties: p ≙ 0010...0

Advantages:

no index

very small

operations in-memory

easy update

easy insert
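A minimal Python sketch of this representation, assuming one bitset per property in which bit i is set when hotel i has that property (the slides do not fix the orientation, the bitset implementation, or how hotel positions are assigned). Python integers act as arbitrary-length bitsets, so no library is needed; the rows are the first four hotels of the table above, and `build_property_bitsets` and `set_property` are hypothetical helper names.

```python
# First four rows of the table (hotel position -> properties it has); "pn" is the last column.
ROWS = {
    0: {"p3"},              # hotel1
    1: {"p1", "p3", "pn"},  # hotel2
    2: {"p1", "p2", "p3"},  # hotel3
    3: {"p1", "p3", "pn"},  # hotel4
}


def build_property_bitsets(rows: dict[int, set[str]]) -> dict[str, int]:
    """Invert hotel rows into one integer bitset per property (bit i = hotel i)."""
    bitsets: dict[str, int] = {}
    for hotel_id, props in rows.items():
        for prop in props:
            bitsets[prop] = bitsets.get(prop, 0) | (1 << hotel_id)
    return bitsets


def set_property(bitsets: dict[str, int], prop: str, hotel_id: int, value: bool) -> None:
    """Update or insert: set or clear a single bit; no index has to be rebuilt."""
    mask = 1 << hotel_id
    current = bitsets.get(prop, 0)
    bitsets[prop] = current | mask if value else current & ~mask


bitsets = build_property_bitsets(ROWS)
# Hotels having both p1 and pn: a single bitwise AND.
both = bitsets["p1"] & bitsets["pn"]
print([i for i in range(len(ROWS)) if both >> i & 1])  # [1, 3]
```

Updating or inserting a hotel only flips single bits, which illustrates the "easy update" and "easy insert" points; filtering by several properties becomes a chain of bitwise ANDs.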

Data Preparation

BitSet setting, Virtuoso adaptations:

16,507 stored properties → 63,109,198 B RAM used

Virtuoso: 824 MB → 706 MB

Virtuoso set-up update: buffer size=60000
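The numbers above allow a quick back-of-the-envelope check, sketched in Python. It assumes that "MB" means 10^6 bytes (the slide does not distinguish MB from MiB) and ignores the buffer-size change and any cache overhead; the hotel count comes from the dataset described earlier.

```python
STORED_PROPERTIES = 16_507
BITSET_BYTES = 63_109_198   # RAM used by the bitsets (from the slide)
VIRTUOSO_BEFORE_MB = 824    # assumed decimal megabytes
VIRTUOSO_AFTER_MB = 706
HOTELS = 71_600

avg_bytes_per_property = BITSET_BYTES / STORED_PROPERTIES
dense_bytes_per_property = -(-HOTELS // 8)  # ceil(HOTELS / 8): one bit per hotel, uncompressed
bitset_mb = BITSET_BYTES / 1e6
virtuoso_saved_mb = VIRTUOSO_BEFORE_MB - VIRTUOSO_AFTER_MB
total_after_mb = VIRTUOSO_AFTER_MB + bitset_mb

print(f"average per property: {avg_bytes_per_property:.0f} B (dense bound {dense_bytes_per_property} B)")
print(f"bitsets {bitset_mb:.1f} MB, Virtuoso saved {virtuoso_saved_mb} MB, total {total_after_mb:.1f} MB")
```

Under these assumptions the bitsets take about 63 MB while Virtuoso shrinks by 118 MB, so the combined footprint (about 769 MB) stays below the original 824 MB. The average of roughly 3,823 bytes per stored property is also well under the 8,950 bytes a dense, one-bit-per-hotel bitset over all 71,600 hotels would need, so not all stored bitsets can be full-length dense bitsets; the slides do not explain how they are stored.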

Implemented Process: Virtuoso plugin

(with kind help of the OpenLink team and the GeoKnow project [2])

1. interpret bif:contains (workaround!)

2. request bitsets from memcache via JNI (workaround!)

3. compute hotels using bit operations on the addressed bitsets

4. map hotel IDs to Virtuoso literal IDs (workaround!): query the IDs from Virtuoso via literal selection, which requires a special predicate for each hotel resource

5. return a cursor on the result set

[2] This work has been supported by grants from the European Union's 7th Framework Programme provided for the project GeoKnow (GA no. 318159), cf. http://geoknow.eu
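Steps 3 to 5 can be illustrated with the Python sketch below. It is not the plugin's code: the real process runs inside Virtuoso, fetches the bitsets from memcache via JNI and hands back a Virtuoso cursor, none of which is modelled here. `PROPERTY_BITSETS`, `HOTEL_TO_VIRTUOSO_ID` and all ID values are hypothetical, and a Python generator plays the role of the cursor.

```python
from typing import Iterator

# Hypothetical bitsets already fetched from the cache (step 2): bit i = hotel i.
PROPERTY_BITSETS = {
    "uorf:56": 0b101101,
    "uorf:18": 0b100111,
    "uorf:21": 0b111101,
}

# Hypothetical mapping from bitset position to an ID known to the triple store (step 4).
HOTEL_TO_VIRTUOSO_ID = {0: 9001, 1: 9002, 2: 9003, 3: 9004, 4: 9005, 5: 9006}


def matching_hotels(properties: list[str]) -> int:
    """Step 3: AND the addressed bitsets (bit i = hotel i)."""
    if not properties:
        return 0  # this sketch treats an empty constraint list as "no match"
    result = PROPERTY_BITSETS.get(properties[0], 0)
    for prop in properties[1:]:
        result &= PROPERTY_BITSETS.get(prop, 0)  # unknown property -> empty bitset
    return result


def cursor(properties: list[str]) -> Iterator[int]:
    """Steps 4-5: lazily map each set bit to a store ID, like a cursor over the result."""
    bits = matching_hotels(properties)
    position = 0
    while bits:
        if bits & 1:
            yield HOTEL_TO_VIRTUOSO_ID[position]
        bits >>= 1
        position += 1


print(list(cursor(["uorf:56", "uorf:18", "uorf:21"])))  # [9001, 9003, 9006]
```

Because the generator maps IDs lazily, a consumer that stops after a few rows only pays for the IDs it actually reads.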

Preliminary Results of A: properties in BitSets

Observations:

more complex queries → shorter response time

stable response times

warm-up required

Preliminary Results of B: non-selective property in Virtuoso

Observations:

the less selective feature, answered within Virtuoso, has the largest impact on computation time

Preliminary Results of C: order by

Observations:

sorting is not done in the BitSet, but might be possible to implement in the future

Preliminary Results of D: limit 10

Observations:

limit is not done in the BitSet, but might be possible to implement in the future
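One conceivable way to evaluate LIMIT inside the BitSet, offered as an assumption rather than a description of the plugin, is to stop scanning once k set bits have been found. Without ORDER BY, SPARQL leaves the order of solutions unspecified, so returning the k lowest hotel positions is a valid answer to `LIMIT k`. The Python sketch isolates the lowest set bit with the common `x & -x` trick; `first_k_hotels` is a hypothetical name.

```python
def first_k_hotels(bits: int, k: int) -> list[int]:
    """Return positions of up to k set bits, lowest first, without scanning the rest."""
    if bits < 0:
        raise ValueError("bitset must be non-negative")
    found = []
    while bits and len(found) < k:
        lowest = bits & -bits                  # isolate the lowest set bit
        found.append(lowest.bit_length() - 1)  # its position = hotel index
        bits ^= lowest                         # clear it and continue
    return found


matches = 0b1011_0110_0001  # hypothetical result of ANDing property bitsets
print(first_k_hotels(matches, 3))  # [0, 5, 6]
```

Combining this with ORDER BY is harder, because the sort key (for example a price) is not part of the bitset.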

Discussion

Summary:

proven good performance

query time is robust

very resource efficient

no schema required

→ if a star pattern is recognizable, then use the BitSet optimization

ToDos (not production ready):

overcome workarounds

tighten the integration

generalize interface

extend to ElasticSearch (→ Virtuoso with full-text index cluster)
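The rule "if a star pattern is recognizable, then use the BitSet optimization" implies a check on the query's triple patterns before evaluation. The Python sketch below is one hypothetical form of that check: a basic graph pattern counts as a star when every triple shares the same subject variable, and triples with a bitset-backed predicate (an assumed set, `BITSET_PREDICATES`) and a constant object are routed to the BitSet side, while the rest, such as the region constraint, stays with the store. Parsing SPARQL is out of scope, so the triples of the example query are given directly.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumption: predicates whose values are materialized as bitsets.
BITSET_PREDICATES = {"uo:hasFeature", "uo:suitableFor"}


def is_variable(term: str) -> bool:
    return term.startswith("?")


def goes_to_bitset(t: Triple) -> bool:
    """Answerable from bitsets if the predicate is bitset-backed and the object is fixed."""
    return t.predicate in BITSET_PREDICATES and not is_variable(t.obj)


def split_star_pattern(bgp: list[Triple]) -> Optional[tuple[list[Triple], list[Triple]]]:
    """Return (bitset_part, store_part) if all triples share one subject variable, else None."""
    subjects = {t.subject for t in bgp}
    if len(subjects) != 1 or not is_variable(next(iter(subjects))):
        return None  # not a star: fall back to normal query evaluation
    bitset_part = [t for t in bgp if goes_to_bitset(t)]
    store_part = [t for t in bgp if not goes_to_bitset(t)]
    return bitset_part, store_part


# The star pattern of the example query as (subject, predicate, object) tuples.
query = [Triple("?s", "a", "uo:Hotel")]
query += [Triple("?s", "uo:hasFeature", f"uorf:{n}") for n in (56, 18, 21, 210, 5, 211, 34, 17)]
query += [Triple("?s", "uo:locatedIn", "uor:Europe"), Triple("?s", "uo:suitableFor", "uos:Family")]

bitset_part, store_part = split_star_pattern(query)
print(len(bitset_part), [t.predicate for t in store_part])  # 9 ['a', 'uo:locatedIn']
```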

Take Away Messages

e-commerce use case requires short and robust request times

BitSet-driven extension has proven its value

→ basic requirements of the e-commerce scenario fulfilled

→ still flexible (schemaless), but performant

taking advantage of external data structures is possible (in Virtuoso)

Dr. Andreas Both, Head of Research and Development, Unister GmbH, Leipzig, Germany

+49 341 65050 24496

http://www.unister.de
