When a relational database doesn’t work And why a graph database might help

Wed 1130 aasman_jans_color


Page 1: Wed 1130 aasman_jans_color

When a relational database  doesn’t work

And why a graph database might help

Page 2: Wed 1130 aasman_jans_color

Contents

• Franz and customers
• Two Use Cases
– Amdocs: a real time semantic platform for telecom that knows everything about everyone in real time
– Real time news and social network analysis using the Linked Open Data Cloud
• Scalability?
• Integration with other NoSQL databases – Solr, MongoDB

Page 3: Wed 1130 aasman_jans_color

Franz Inc – Who We Are

• Private, founded 1984
• We are an AI and Semantic Technology company
• Out of Berkeley

Page 4: Wed 1130 aasman_jans_color
Page 5: Wed 1130 aasman_jans_color

(1 (2 3) (4 5) (6 7) (8 9) (10 11) (12 13) (14 15)(16 17) (18 19 20 21 22 23 24 27 28) (29 30))

Page 6: Wed 1130 aasman_jans_color

Bob

Alice

Craig

Bill

Page 7: Wed 1130 aasman_jans_color
Page 8: Wed 1130 aasman_jans_color

How is it different from an RDB and why is it more flexible?

• No Schema.
– Say whatever you want to say but
– ontologies may constrain what you put in triple store
• No Link Tables
– because you can do one-to-many relationships directly
• No Indexing Choices
– Can add new data attributes (predicates) on-the-fly that will be real-time available for querying, because everything is automatically indexed.
• Takes anything you give it: it is trivial to consume
– Rows and columns from RDB, XML, RDF(S), OWL, Text and Extracted Entities, JSON
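The points on this slide can be illustrated with a toy in-memory triple store. This is a minimal sketch in plain Python, not AllegroGraph's API: the `TripleStore` class, the `knows` and `twitterHandle` predicates and the sample values are all assumptions made for illustration. It shows a one-to-many relationship stored without a link table, and a new predicate added on the fly that is immediately queryable because every position of every triple is indexed.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store (hypothetical): indexes every triple by subject, predicate and object."""

    def __init__(self):
        self.triples = set()
        self.index = {"s": defaultdict(set), "p": defaultdict(set), "o": defaultdict(set)}

    def add(self, s, p, o):
        """Store a triple and register it in all three indexes."""
        triple = (s, p, o)
        self.triples.add(triple)
        for position, value in zip("spo", triple):
            self.index[position][value].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        hits = set(self.triples)
        for position, value in zip("spo", (s, p, o)):
            if value is not None:
                hits &= self.index[position].get(value, set())
        return hits


store = TripleStore()

# One-to-many relationship stated directly, no link table needed.
store.add("Bob", "knows", "Alice")
store.add("Bob", "knows", "Craig")
store.add("Bob", "knows", "Bill")

# A brand-new predicate: no schema change, no index to declare.
store.add("Bob", "twitterHandle", "@bob")

friends = sorted(o for _, _, o in store.match(s="Bob", p="knows"))
handles = store.match(p="twitterHandle")
```

In a relational design the `knows` relationship would typically need a separate join table, and the new attribute would need a schema migration before it could be queried.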

Page 9: Wed 1130 aasman_jans_color

AllegroGraph: RDF Graph Store

(Architecture diagram; component labels:)
• REST, Backup/Restore, Replication, Warm Failover, Security, Management
• Rules, Java, Sparql, Prolog Rules, Clif++, Geo, SNA, Time, RDFS+, JavaScript
• Session Management, Query Engine, Federation
• Storage layer (compression, indexing, freetext, transactions)

Page 10: Wed 1130 aasman_jans_color
Page 11: Wed 1130 aasman_jans_color
Page 12: Wed 1130 aasman_jans_color

Use Case Amdocs

Build a semantic platform that knows everything about everyone in real time.

Page 13: Wed 1130 aasman_jans_color
Page 14: Wed 1130 aasman_jans_color
Page 15: Wed 1130 aasman_jans_color
Page 16: Wed 1130 aasman_jans_color
Page 17: Wed 1130 aasman_jans_color
Page 18: Wed 1130 aasman_jans_color
Page 19: Wed 1130 aasman_jans_color
Page 20: Wed 1130 aasman_jans_color
Page 21: Wed 1130 aasman_jans_color
Page 22: Wed 1130 aasman_jans_color

Telco Call Center Volume Quadruples Since 2007

• On average, each call
– Lasts 10 minutes
– Goes through 68 screens
• One call costs 3 months’ profit from that customer
• It’s getting worse every day!

Page 23: Wed 1130 aasman_jans_color

Typical Interaction Begins in the Dark

(Diagram labels: Bill, Plan, Past Payments, Device, Calculator (avg peak usage), Past Interactions (Memos), Statements)

The unknown – why calling? How to help?

No real-time context - insight & guidance

High AHT, poor FCR, low customer and agent satisfaction

Page 24: Wed 1130 aasman_jans_color
Page 25: Wed 1130 aasman_jans_color

AIDA Maps Events to Concepts

Events from many source systems are transformed into a set of related business concepts

Many events → Triple Store with business concepts

(Diagram labels: Interactions, Bills, Orders, Payments, Collections, Charge dispute, Pay instructions, Device Activated, Device heartbeat, Device changes, Subscriptions, Individual, Customer)

• Subjective "good payer"
• Patterns "always pays 2 days late"
• Trends "improving payer"
• Geospatial "within 5 miles of the tower"
• Time "within 5 minutes of an outage"
• Chronology of events
• Probability "probably will call about the bill"
• Absence of occurrence "missed payment"
• Relationship between "friend of a friend"

Page 26: Wed 1130 aasman_jans_color

(Architecture diagram; component labels:)
• Events, Scheduled Events, Actions
• Amdocs Event Collector, Amdocs Integration Framework
• SBA Application Server with containers: Event Ingestion, Inference, Decision Engine
• Inference Engine (Business Rules), Bayesian Belief Network
• "Sesame"
• AllegroGraph Triple Store DB
• Operational Systems: CRM, RM, OMS
• Event Data Sources: NW, Web 2.0

Page 27: Wed 1130 aasman_jans_color

AIDA Event Collection

(Diagram: Amdocs Event Collector stages Event Sources, Collection, Parsing, Mapping, Publishing; followed by Inference & Decision stages Ingestion, Decision)

• Events are collected from many heterogeneous, configured event sources
– Phone calls, texting, video upload, roaming, etc.
– iTune download, web site interaction, media upload
– Emails, support calls
– Bill payment or non-payment
– Phones stop working or disconnect
• All fused and mapped into a single event knowledge base
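The collection, parsing, mapping and publishing stages can be sketched as a few small functions. This is an illustrative sketch only: the JSON event format, the field names (`id`, `type`, `customer`, `time`) and the predicate names are assumptions, not the Amdocs Event Collector's actual formats. The point is that events from different sources end up as triples in one shared knowledge base.

```python
import json


def parse(raw):
    """Parsing stage: every source is assumed (for this sketch) to emit JSON."""
    return json.loads(raw)


def map_to_triples(event):
    """Mapping stage: turn one parsed event into (subject, predicate, object) triples."""
    event_id = f"event:{event['id']}"
    return [
        (event_id, "type", event["type"]),
        (event_id, "customer", f"customer:{event['customer']}"),
        (event_id, "time", event["time"]),
    ]


def publish(triples, knowledge_base):
    """Publishing stage: fuse the triples into the single event knowledge base."""
    knowledge_base.update(triples)


# Collection stage: two hypothetical events from different source systems.
raw_events = [
    '{"id": 1, "type": "PhoneCall", "customer": "c42", "time": "2010-09-22T10:00:00"}',
    '{"id": 2, "type": "BillPayment", "customer": "c42", "time": "2010-09-23T09:30:00"}',
]

knowledge_base = set()
for raw in raw_events:
    publish(map_to_triples(parse(raw)), knowledge_base)

# All events about one customer, regardless of which system produced them.
events_for_c42 = sorted(s for s, p, o in knowledge_base
                        if p == "customer" and o == "customer:c42")
```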

Page 28: Wed 1130 aasman_jans_color

AIDA Semantic Inference

• Define rules to operate to create higher level concepts
– Event (mapping) rules - Map event data into the domain ontology
– Automatic rules – Compute new properties defined by the ontology
– On-demand rules - perform inference for the services
• Rules triggered upon event ingestion, service request or schedule
• Semantic rule inference generates new triples from existing ones

(Diagram: ontology fragment with Customer, Bills, Charges, Amount, Due Date, Payments, Payment Pattern, "Timeliness", Devices, Make, Model, Status; values Good, Bad, OnTime, Early, Late, Improving, Worsening)
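"Generates new triples from existing ones" can be sketched as simple forward chaining: apply every rule to the known triples and repeat until nothing new appears. This is a minimal sketch, not AIDA's inference engine; the predicates `madePayment`, `hasLatePayment` and `paymentPattern` and the two rules are hypothetical, chosen to echo the Payments, "Timeliness" and Payment Pattern concepts in the diagram.

```python
def late_payer_rule(triples):
    """If a customer made a payment whose timeliness is Late, flag the customer."""
    late_payments = {s for s, p, o in triples if p == "timeliness" and o == "Late"}
    return {(customer, "hasLatePayment", "true")
            for customer, p, payment in triples
            if p == "madePayment" and payment in late_payments}


def bad_pattern_rule(triples):
    """If a customer is flagged with a late payment, derive a Bad payment pattern."""
    return {(customer, "paymentPattern", "Bad")
            for customer, p, o in triples
            if p == "hasLatePayment" and o == "true"}


def infer(triples, rules):
    """Forward chaining: apply all rules until no rule yields a new triple."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


facts = {
    ("cust:1", "madePayment", "pay:7"),
    ("pay:7", "timeliness", "Late"),
    ("cust:2", "madePayment", "pay:8"),
    ("pay:8", "timeliness", "OnTime"),
}

# The second rule depends on the output of the first, so two passes are needed.
result = infer(facts, [bad_pattern_rule, late_payer_rule])
derived = result - facts
```

Here `derived` holds only the triples the rules produced; the original facts are left untouched, which mirrors the idea that higher level concepts are layered on top of the raw event data.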

Page 29: Wed 1130 aasman_jans_color

Semantic Inference – Using Business Rules to generate high level concepts

• AIDA provides Workbench for business rule construction
• Utilizes a sophisticated magnetic block GUI for business analysts
• Rules triggered to infer and generate new business concepts

"Late Payment" defined in Workbench

rule PaymentDetails.timeliness
{
  if date within EarlyPeriod days after customerBill.billDate
  then timeliness = Early ;
  else if date not within LatePeriod days after customerBill.billDate
  then timeliness = Late ;
  else timeliness = OnTime ;
}

Each business rule defines an attribute. This rule defines an attribute of the PaymentDetails class called timeliness

All classes and their attributes are defined in the application ontology

Java code
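For readers who want to trace the timeliness rule by hand, here is a Python transcription. It is a sketch under stated assumptions: the slide gives no values for EarlyPeriod and LatePeriod, so the 10 and 30 days below are placeholders, and "within N days after" is read as the closed interval from the bill date to the bill date plus N days. The slide's own generated code is Java; this is not that code.

```python
from datetime import date, timedelta

EARLY_PERIOD_DAYS = 10  # assumed value, not given on the slide
LATE_PERIOD_DAYS = 30   # assumed value, not given on the slide


def within_days_after(day, start, days):
    """True if `day` falls in [start, start + days] (assumed reading of the rule)."""
    return start <= day <= start + timedelta(days=days)


def timeliness(payment_date, bill_date,
               early_period=EARLY_PERIOD_DAYS, late_period=LATE_PERIOD_DAYS):
    """Same branch order as the rule: Early, then Late, otherwise OnTime."""
    if within_days_after(payment_date, bill_date, early_period):
        return "Early"
    if not within_days_after(payment_date, bill_date, late_period):
        # Read literally, this branch also catches payments dated before the bill.
        return "Late"
    return "OnTime"


bill = date(2010, 9, 1)
early = timeliness(date(2010, 9, 5), bill)
on_time = timeliness(date(2010, 9, 20), bill)
late = timeliness(date(2010, 10, 15), bill)
```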

Page 30: Wed 1130 aasman_jans_color

Decisioning – Probabilistic Assessment

• AIDA also incorporates Bayesian Belief Networks (BBN)
• These are graphical models for reasoning under uncertainty
• Important part of decision making – the likelihood of something happening, estimated by how often it occurred in the past (primarily used in medical research until recently)
• Evidence consists of observations on certain nodes leading to conclusions

(Diagram: Evidence → Conclusions, with nodes Payment Pattern, Bill, Payment Arrangement Setup, Payment, Expect Payment)
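The idea of estimating likelihood from past frequency and then reasoning from evidence to a conclusion can be sketched with the smallest possible belief network: one evidence node (payment pattern) pointing at one conclusion node (whether payment is expected). Everything here is an assumption for illustration: the history counts are invented, the node values are hypothetical, and a real BBN would have many more nodes.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def learn_cpt(history):
    """Estimate P(outcome | pattern) from how often each outcome occurred in the past."""
    pattern_counts = Counter(pattern for pattern, _ in history)
    cpt = defaultdict(dict)
    for (pattern, outcome), count in Counter(history).items():
        cpt[pattern][outcome] = Fraction(count, pattern_counts[pattern])
    return dict(cpt)


def predict(cpt, pattern_belief):
    """Combine (possibly uncertain) evidence about the pattern into P(outcome)."""
    result = defaultdict(Fraction)
    for pattern, weight in pattern_belief.items():
        for outcome, probability in cpt[pattern].items():
            result[outcome] += weight * probability
    return dict(result)


# Invented history: (payment pattern, what happened with the next bill).
history = ([("Good", "paid")] * 8 + [("Good", "missed")] * 2
           + [("Bad", "paid")] * 3 + [("Bad", "missed")] * 7)

cpt = learn_cpt(history)

# Certain evidence: the customer is known to have a Bad payment pattern.
known_bad = predict(cpt, {"Bad": 1})

# Uncertain evidence: the pattern is equally likely to be Good or Bad.
unsure = predict(cpt, {"Good": Fraction(1, 2), "Bad": Fraction(1, 2)})
```

Using `Fraction` keeps the arithmetic exact, so the learned probabilities can be compared directly against the counts they came from.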

Page 31: Wed 1130 aasman_jans_color

Presenting insight to the CSR

• Prediction on reason for the call – ranked by probability
• Process opens relevant screen for reference and action
• Presentation of recent interactions and events
• Prioritized Recommended treatment and script

Page 32: Wed 1130 aasman_jans_color

First application: CRM
Amdocs Guided Interaction Advisor

First Call Resolution
• Increase up to 15%

Average Handling Time
• Reduce up to 30%

Training Costs
• Reduce up to 25%

Page 33: Wed 1130 aasman_jans_color

Triples all the way down

Page 34: Wed 1130 aasman_jans_color

So why a triple store

• Flexibility, flexibility and flexibility
– Change the schema on a daily basis
– Customers create new policies which in turn will create new schemas on the fly
• Needed to work with meaning
– RDF describes data
• Needed to be declarative for everything
– Most RTBI is a combination of data in the DB and java variables in the application.

Page 35: Wed 1130 aasman_jans_color
Page 36: Wed 1130 aasman_jans_color

Text Intelligence for DOD/IS

Page 37: Wed 1130 aasman_jans_color

How would you do this with your standard search engine

• Give me a newspaper text with a republican and a democrat that serve on two subcommittees that have the same parent committee.
• Which [democrat|republican] is most vocal in the oil spill disaster
• Given this text, find all the other texts that have the same people and the same main topics but not democrats in the text.
• Which newspaper favors [democrats|republicans]
• Which [democrat|republican|senator|representative] got most of the attention in the last week.
• Give me the distribution of the most important topics yesterday
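The first question above is a graph pattern, not a keyword search, which is why it is hard for a standard search engine. The sketch below runs that pattern over a handful of triples in plain Python. The data is entirely made up, and the predicate names `party`, `serves-on` and `parent` are assumptions; only `has-people` comes from the slides. A real deployment would express the same pattern as a SPARQL or Prolog query against the triple store.

```python
# Hypothetical sample data: two articles, three politicians, three subcommittees.
triples = {
    ("article:1", "has-people", "person:A"),
    ("article:1", "has-people", "person:B"),
    ("article:2", "has-people", "person:A"),
    ("article:2", "has-people", "person:C"),
    ("person:A", "party", "Republican"),
    ("person:B", "party", "Democrat"),
    ("person:C", "party", "Democrat"),
    ("person:A", "serves-on", "subcommittee:X"),
    ("person:B", "serves-on", "subcommittee:Y"),
    ("person:C", "serves-on", "subcommittee:Z"),
    ("subcommittee:X", "parent", "committee:P"),
    ("subcommittee:Y", "parent", "committee:P"),
    ("subcommittee:Z", "parent", "committee:Q"),
}


def objects(data, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for s, p, o in data if s == subject and p == predicate}


def bipartisan_sibling_articles(data):
    """Articles naming a Republican and a Democrat who serve on two different
    subcommittees that share a parent committee."""
    hits = set()
    for article in {s for s, p, _ in data if p == "has-people"}:
        people = objects(data, article, "has-people")
        republicans = [x for x in people if "Republican" in objects(data, x, "party")]
        democrats = [x for x in people if "Democrat" in objects(data, x, "party")]
        for rep in republicans:
            for dem in democrats:
                for sub_r in objects(data, rep, "serves-on"):
                    for sub_d in objects(data, dem, "serves-on"):
                        shared = (objects(data, sub_r, "parent")
                                  & objects(data, sub_d, "parent"))
                        if sub_r != sub_d and shared:
                            hits.add(article)
    return hits


matches = bipartisan_sibling_articles(triples)
```

Only `article:1` qualifies: the two politicians in `article:2` sit on subcommittees with different parents.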

Page 38: Wed 1130 aasman_jans_color

The process

• We spider daily > 300 on-line newspapers and thousands of blogs
• And search specifically for all the members of the senate and house of representatives and the executive branch
• Apply entity extractor to the text and extract main concepts
– About 150 triples per text…
• Hook up these concepts with a detailed database of each politician and with information from the linked open data cloud

Page 39: Wed 1130 aasman_jans_color
Page 40: Wed 1130 aasman_jans_color
Page 41: Wed 1130 aasman_jans_color
Page 42: Wed 1130 aasman_jans_color

From News Article to

• People (has-people)
– And their roles
• Places (has-places)
– And the county, state, country they are in
• Organizations (has-organizations)
– Government departments, company names, etc.
• Main Categories (has-domains)
– Politics, sports, ministries, energy, finance, economics, ecology, oil, mining industry, etc.
• Main Concepts (has-main-groups)
– Other important nouns and phrases in a text

Page 43: Wed 1130 aasman_jans_color
Page 44: Wed 1130 aasman_jans_color

LOD cloud – Sept 22 2010

latest LOD cloud

Page 45: Wed 1130 aasman_jans_color
Page 46: Wed 1130 aasman_jans_color

AllegroText

Page 47: Wed 1130 aasman_jans_color
Page 48: Wed 1130 aasman_jans_color

• A little demo?

Page 49: Wed 1130 aasman_jans_color

How scalable is this?

Page 50: Wed 1130 aasman_jans_color
Page 51: Wed 1130 aasman_jans_color
Page 52: Wed 1130 aasman_jans_color
Page 53: Wed 1130 aasman_jans_color
Page 54: Wed 1130 aasman_jans_color
Page 55: Wed 1130 aasman_jans_color
Page 56: Wed 1130 aasman_jans_color

Loading

Page 57: Wed 1130 aasman_jans_color

Queries

• Query planner now takes 99% of SPARQL 1.0, automatically compiles it into query graph flow language…

Page 58: Wed 1130 aasman_jans_color
Page 59: Wed 1130 aasman_jans_color
Page 60: Wed 1130 aasman_jans_color

You can write this by hand if you want to optimize yourself.

Page 61: Wed 1130 aasman_jans_color
Page 62: Wed 1130 aasman_jans_color
Page 63: Wed 1130 aasman_jans_color
Page 64: Wed 1130 aasman_jans_color
Page 65: Wed 1130 aasman_jans_color
Page 66: Wed 1130 aasman_jans_color
Page 67: Wed 1130 aasman_jans_color

This will actually work on Prolog with rules too!

Page 68: Wed 1130 aasman_jans_color
Page 69: Wed 1130 aasman_jans_color

Query performance notes: Wins

• Indices are small enough to fit in memory of conventional machines
• Simultaneous access to indices (see next slide)
• Pipe line architecture
– Stream based processing (all nodes can be active in parallel. Most nodes can begin before the end of data is reached.)
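The "most nodes can begin before the end of data is reached" property can be demonstrated with Python generators. This is only an analogy for the pipeline idea: generators are lazy but run on a single thread, so the sketch shows streaming between nodes, not true parallelism, and the node names (`scan`, `filter_predicate`, `project_object`) are made up rather than taken from the query graph flow language.

```python
scan_log = []  # records which triples the scan node has read so far


def scan(triples):
    """Source node: emits triples one at a time."""
    for triple in triples:
        scan_log.append(triple)
        yield triple


def filter_predicate(stream, predicate):
    """Filter node: passes on only triples with the given predicate."""
    for s, p, o in stream:
        if p == predicate:
            yield (s, p, o)


def project_object(stream):
    """Projection node: keeps only the object of each triple."""
    for _, _, o in stream:
        yield o


data = [
    ("Bob", "knows", "Alice"),
    ("Bob", "age", "42"),
    ("Bob", "knows", "Craig"),
]

pipeline = project_object(filter_predicate(scan(data), "knows"))

# The first result is available after the scan has read just one triple.
first = next(pipeline)
scanned_before_first_result = len(scan_log)

# Draining the pipeline pulls the remaining triples through all nodes.
rest = list(pipeline)
```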

Page 70: Wed 1130 aasman_jans_color
Page 71: Wed 1130 aasman_jans_color

The end