Graph Connect London – 19 Nov 2013, by Sebastian Verheughe
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor
Telenor Norway: Subsidiary of the Telenor Group
~1 billion GBP in mobile revenues 2012
Sebastian Verheughe
Lead Developer for Neo4j solution, Coding Architect
Disclaimer
The presentation is not identical to the implementation due to security reasons, but shows how we have modeled and solved the problem in general.
However, all presented data (numbers & charts) are real, unfiltered and extracted from the production logs.
A very important aspect of our business
Telenor Norway Middleware Services
MOBILE MW: BUSINESS LOGIC & DATA
Providing business logic and data for all channels in the mobile value chain
Handles users with access to X00,000 resources
[Diagram: the middleware sits between many Channels and many Backends]
used by 42 channels
calls 35 sub-systems
10,000 code classes
500 requests/second
20,000 orders/day
Our Problem
20 minutes to calculate all accessible resources
1500 lines of SQL to implement the authorization logic
“solved” by caching, at the cost of data going stale
and the solution did not scale…
Why a Graph Database?
The questions we wanted answered required traversal of tree structures.
Which resources does the user have access to?
[Diagram: example graph. Node labels: User: R. Branson, Sales, Finance, Production, HR, Sub, Subscription, Tablet, Phone. Relationship labels: Access, Parent Company, Part of Company, Subscription Owner, Uses Subscription.]
Tailored Read Model
The model makes read queries as simple and efficient as possible.
First find your questions, then model your graph
graph model ≠ relational model
High Level Architecture
[Diagram components: Clients, Classic MW Services, RDBMS, tx log, other sources, Message Queue, Resource Authorization (check access), Neo4j]
Conditional Rules
ACCESS is given with the following include parameters: access to subsidiaries and access to content
Only find children of PARENT COMPANY given access to subsidiaries is allowed
Only look at PART OF COMPANY given access to content is allowed
Only look at SUBSCRIPTION OWNER given access to content is allowed
Different Access Needs
Roles: Umbrella Admin, Super Admin, Admin
Access levels: Access (Node Only), Access Content, Access Subsidiaries & Content (S&C)
Graph Algorithm
Prerequisite: The user node
1. Follow all ACCESS relationships and read the access parameters on the relationship
2. Follow all PARENT COMPANY relationships given access to subsidiaries is allowed
3. Follow all PART OF COMPANY relationships given access to content is allowed
4. Follow all SUBSCRIPTION OWNER relationships given access to content is allowed
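The four steps above can be sketched with plain Java collections. This is a minimal in-memory illustration, not the Neo4j implementation: the class AccessTraversalSketch, its methods, and the assumption that every relationship points from parent to child are hypothetical and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory graph; relationships are assumed to point from parent to child. */
final class AccessTraversalSketch {
    enum RelType { ACCESS, PARENT_COMPANY, PART_OF_COMPANY, SUBSCRIPTION_OWNER }

    /** Directed edge; the two flags are only meaningful on ACCESS edges. */
    static final class Edge {
        final RelType type;
        final String target;
        final boolean toSubsidiaries;
        final boolean toContent;

        Edge(RelType type, String target, boolean toSubsidiaries, boolean toContent) {
            this.type = type;
            this.target = target;
            this.toSubsidiaries = toSubsidiaries;
            this.toContent = toContent;
        }
    }

    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void connect(String from, RelType type, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(type, to, false, false));
    }

    void grantAccess(String user, String to, boolean toSubsidiaries, boolean toContent) {
        outgoing.computeIfAbsent(user, k -> new ArrayList<>())
                .add(new Edge(RelType.ACCESS, to, toSubsidiaries, toContent));
    }

    Set<String> accessibleResources(String user) {
        Set<String> result = new LinkedHashSet<>();
        for (Edge access : outgoing.getOrDefault(user, Collections.emptyList())) {
            if (access.type != RelType.ACCESS) continue;
            // Step 1: read the access parameters stored on the ACCESS relationship.
            Set<RelType> allowed = EnumSet.noneOf(RelType.class);
            if (access.toSubsidiaries) allowed.add(RelType.PARENT_COMPANY);   // step 2
            if (access.toContent) {
                allowed.add(RelType.PART_OF_COMPANY);                         // step 3
                allowed.add(RelType.SUBSCRIPTION_OWNER);                      // step 4
            }
            // Breadth-first walk that only follows the relationship types this grant allows.
            Deque<String> queue = new ArrayDeque<>();
            queue.add(access.target);
            Set<String> seen = new HashSet<>();
            while (!queue.isEmpty()) {
                String node = queue.poll();
                if (!seen.add(node)) continue;
                result.add(node);
                for (Edge e : outgoing.getOrDefault(node, Collections.emptyList())) {
                    if (allowed.contains(e.type)) queue.add(e.target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AccessTraversalSketch g = new AccessTraversalSketch();
        g.connect("Corp", RelType.PARENT_COMPANY, "Sub");
        g.connect("Corp", RelType.PART_OF_COMPANY, "Sales");
        g.connect("Sales", RelType.SUBSCRIPTION_OWNER, "Phone");
        g.grantAccess("admin", "Corp", false, true);
        System.out.println(g.accessibleResources("admin"));
    }
}
```

The allowed relationship types are computed once per ACCESS relationship, so two grants held by the same user with different parameters are traversed independently.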
Solution Value
1. Performance optimized from minutes to seconds.
2. Simplicity of writing and understanding business rules for the query traversal.
3. Scalability through performance, allowing us to onboard more corporate customers (the project business case)
Autonomous service with its own life-cycle and data repository.
Authorization Complexity
• Not a collection of isolated customer trees *
• Not all users of a customer have equal access
• Not a fixed schema, form or size for all customers
• Real-time updated with customer & product data
The data form a highly connected living graph
* Covered later in Technical Details
How we Started with Neo4j
1. Searched the internet for articles about graph databases and different solutions.
2. Downloaded and quickly prototyped the solution we liked that matched our requirements (Neo4j).
3. Workshop with Neo4j and our project developers to quickly gain competence and ensure design QA.
4. Solution QA with Neo4j before production and help with performance issues / tuning.
Lessons Learned
• Choose a solution/technology that fits your problem
• New way of thinking – build competence in org.
• Profile your Java code to make it really fast
• Don’t put everything into the graph (functional creep)
• Need to know how traversal works (e.g. shortest path)
• Benchmark the graph to evaluate your traversal speed (we get 1/10 of raw traversal speed)
Alternative In-Memory RDBMS
Option 1: Use existing database
- Performance issues due to shared data / suboptimal structure
- Complexity since SQL not designed for traversal
Option 2: Separate database
+ Might reach same performance as graph db
+ Familiar technology
- Complexity since SQL not designed for traversal
Decided to go with our instinct: Graph Database
Different Graph Structures
Get all accessible subscriptions (chart axis in ms, ticks at 1 000 and 2 000):
Company X (147 000): 1 700 ms
Company Y (52 000): 750 ms
Company Z (95 000): 1 300 ms
Data from test – repeated prod sampling gave ~2.4 sec for 215,000 subscriptions
Different Graph Structures
Check access to single subscription (chart axis in ms, ticks at 1 000 and 2 000):
Company X (147 000): 1 ms
Company Y (52 000): 1 ms
Company Z (95 000): 1 ms
Production Performance
Retrieve all accessible resources:
            RDBMS Disk   RDBMS Mem Cached   Graph In-Heap
Company X   12 min       18 sec             < 2 sec
Company Y   22 min       58 sec             < 2 sec
Company Z   3 min        15 sec             < 2 sec
Check single resource access: 1 ms
No operational problems in production
Technical Details
Production Details
Graph Size: 27 million nodes (pre-warmed in heap), ~1x properties, ~2x relationships
Traffic Volume: ~1000 req/min during biz hours, ~40K daily real-time updates
Performance: Avg: 1 ms, 99% < 4 ms, 99.9% < 9 ms
JVM: Sun 6, 20 GB Heap (~15 GB pre-warmed), CMS GC, no FULL GC in prod. Restarted daily for full database sync.
[Chart: Production Has Access Query, time (ms)]
[Chart: Production All Queries, time (ms), with garbage collection marked]
Implementing the Algorithm
Let's look at the Neo4j Traversal Framework
Iterable<Node> getAccessibleResources(…) {
    Evaluator myEvaluator = …   // decides which visited nodes are returned
    Expander myExpander = …     // decides which relationships are followed
    return Traversal.description()
        .evaluator(myEvaluator)
        .expand(myExpander)
        .traverse(startNode).nodes();
}
Implementing the Algorithm
Evaluator is a simple filter, e.g. for Node type
class MyEvaluator implements Evaluator {
    public Evaluation evaluate(Path path) {
        if (<I am interested in this node>)
            return Evaluation.INCLUDE_AND_CONTINUE;
        else
            return Evaluation.EXCLUDE_AND_CONTINUE;
    }
}
Create an Evaluator for each use-case.
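The idea of one filter per use-case can be illustrated without Neo4j. The sketch below is an assumption-laden stand-in: the Node class, the collect method and the two predicates are hypothetical, and a java.util.function.Predicate plays the role of the Evaluator. It only shows that the filter decides what is returned, while the traversal itself keeps going either way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for an evaluator: a per-use-case filter over traversed nodes. */
final class EvaluatorSketch {
    static final class Node {
        final String id;
        final String type;

        Node(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** One filter per use-case, e.g. "return only subscriptions". */
    static final Predicate<Node> SUBSCRIPTIONS_ONLY = n -> n.type.equals("Subscription");
    static final Predicate<Node> EVERYTHING = n -> true;

    /** Applies the filter to nodes in traversal order; excluded nodes are skipped, not a stop signal. */
    static List<String> collect(Iterable<Node> traversed, Predicate<Node> evaluator) {
        List<String> included = new ArrayList<>();
        for (Node n : traversed) {
            if (evaluator.test(n)) {
                included.add(n.id);   // corresponds to INCLUDE_AND_CONTINUE
            }
            // otherwise corresponds to EXCLUDE_AND_CONTINUE: leave it out and keep going
        }
        return included;
    }

    public static void main(String[] args) {
        List<Node> traversed = Arrays.asList(
                new Node("Corp", "Company"),
                new Node("s1", "Subscription"),
                new Node("Sales", "Department"),
                new Node("s2", "Subscription"));
        System.out.println(collect(traversed, SUBSCRIPTIONS_ONLY));
    }
}
```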
Implementing the Algorithm
The custom Expander contains business rules!
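class ResAuthExpander implements PathExpander<PathExpander> {
    …
    public … expand(Path path, BranchState<…> state) {
        Relationship rel = path.lastRelationship();
        if (rel != null && rel.isType(ACCESS)) {
            // Read the access parameters stored on the ACCESS relationship
            boolean accToSub = (Boolean) rel.getProperty(ACCESS_TO_SUBSIDIARIES);
            boolean accToCont = (Boolean) rel.getProperty(ACCESS_TO_CONTENT);
            // Remember which relationships this branch may follow from here on
            state.set(getExpander(accToSub, accToCont));
        }
        return state.get().expand(…);
    }
Single expander class to control business logic
BranchState lets you keep state during traversal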
Implementing the Algorithm
Generates the valid relationships to traverse.
    public StandardExpander getExpander(boolean accToSub, boolean accToCont) {
        // add() returns a new expander, so the result must be reassigned
        StandardExpander exp = StandardExpander.DEFAULT.add(ACCESS, …);
        if (accToSub) exp = exp.add(PARENT_COMPANY, …);
        if (accToCont) exp = exp.add(PART_OF_COMPANY, …).add(SUBSCRIPTION_OWNER, …);
        return exp;
    }
}
U-Turn Strategy
Does the user have access to subscription X?
Up to find path quickly, down to check access.
Reversing the traversal reduces the work from about n/2 to 2d visited nodes, where n and d are the tree size and depth (we went from 1 s to 1 ms).
[Diagram: traversal steps 1 to 8 between User, Access and Subscription X]
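A minimal sketch of the U-turn idea, under stated assumptions: every node has at most one parent (a plain tree, which the real data is not), and the downward check is simplified to comparing the relationship types crossed on the way up against the parameters of the grant found at an ancestor. The class UTurnSketch and all of its members are hypothetical. With depth d, the loop visits at most d ancestors instead of scanning the user's whole subtree of size n.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical tree model: each node has at most one parent. */
final class UTurnSketch {
    enum RelType { PARENT_COMPANY, PART_OF_COMPANY, SUBSCRIPTION_OWNER }

    static final class ParentLink {
        final String parent;
        final RelType type;

        ParentLink(String parent, RelType type) {
            this.parent = parent;
            this.type = type;
        }
    }

    static final class Grant {
        final boolean toSubsidiaries;
        final boolean toContent;

        Grant(boolean toSubsidiaries, boolean toContent) {
            this.toSubsidiaries = toSubsidiaries;
            this.toContent = toContent;
        }
    }

    private final Map<String, ParentLink> parentOf = new HashMap<>();
    private final Map<String, Map<String, Grant>> grants = new HashMap<>(); // user -> node -> grant

    void connect(String parent, RelType type, String child) {
        parentOf.put(child, new ParentLink(parent, type));
    }

    void grantAccess(String user, String node, boolean toSubsidiaries, boolean toContent) {
        grants.computeIfAbsent(user, k -> new HashMap<>()).put(node, new Grant(toSubsidiaries, toContent));
    }

    /** Walks up from the resource; at each ancestor with a grant, checks the path back down. */
    boolean hasAccess(String user, String resource) {
        Map<String, Grant> userGrants = grants.getOrDefault(user, Collections.emptyMap());
        Set<RelType> crossed = EnumSet.noneOf(RelType.class); // relationship types seen so far
        String node = resource;
        while (node != null) {
            Grant grant = userGrants.get(node);
            if (grant != null && permits(grant, crossed)) return true;
            ParentLink link = parentOf.get(node);
            if (link == null) break;       // reached the root without a usable grant
            crossed.add(link.type);
            node = link.parent;
        }
        return false;
    }

    /** A grant covers the path only if its parameters allow every relationship type on it. */
    private static boolean permits(Grant grant, Set<RelType> crossed) {
        if (crossed.contains(RelType.PARENT_COMPANY) && !grant.toSubsidiaries) return false;
        boolean needsContent = crossed.contains(RelType.PART_OF_COMPANY)
                || crossed.contains(RelType.SUBSCRIPTION_OWNER);
        return !needsContent || grant.toContent;
    }

    public static void main(String[] args) {
        UTurnSketch g = new UTurnSketch();
        g.connect("Corp", RelType.PART_OF_COMPANY, "Sales");
        g.connect("Sales", RelType.SUBSCRIPTION_OWNER, "X");
        g.grantAccess("admin", "Corp", false, true);
        System.out.println(g.hasAccess("admin", "X"));
    }
}
```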
The Zigzag Problem
What if we also have reversed access to the subscription payer?
Solvable by adding state to the traversal (or checking the path)
[Diagram labels: User, Access, Op, IT, Subscriptions, Ed, Jo, payer relationships]
The Many-to-Many Problem
Does the user have access to department Op?
The nodes Op & IT may be connected through many subscriptions
Traversal becomes time consuming (e.g. M2M market). However, we only needed to implement the rule for direct access to sub.
[Diagram labels: User, Access, Op, IT, Subscription, owner, payer]
Deployment View
• Two equal instances of Neo4j embedded in Tomcat
• Access through Java API due to need for custom logic
• Using Neo4j 1.8 without HA (did not like ZooKeeper)
[Diagram components: RDBMS, tx log, Message Queue, two Resource Authorization instances, each with its own Neo4j]
Dual Model Cost
There are also some drawbacks with dual models
• Not possible to simply join the ACL with resource tables in the relational database - queries needed redesign
• The complexity added by code and infrastructure necessary to manage an additional model.
• Not ordinary competence (in Norway at least)
Unexplored Areas
Combining Access Control List & Graph
• Best of both worlds (simple logic, fast lookup)
Algorithm
– Find all affected users when the graph is updated
– Invalidate users' access control list
– Calculate all accessible resources for each user
– Store result in users' access control list
Could then skip the U-turn and many-to-many problem.
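Since this area was unexplored, the following is only a speculative sketch of how the invalidate, recalculate and store steps might fit together. The class AclCacheSketch is hypothetical, the graph traversal is injected as a plain function, and finding the affected users (the first step) is left to the caller.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical per-user ACL that is rebuilt from a graph traversal on updates. */
final class AclCacheSketch {
    private final Function<String, Set<String>> traversal;   // user -> accessible resources
    private final Map<String, Set<String>> acl = new HashMap<>();

    AclCacheSketch(Function<String, Set<String>> traversal) {
        this.traversal = traversal;
    }

    /** Called with the users affected by a graph update (finding them is not shown here). */
    void onGraphUpdated(Collection<String> affectedUsers) {
        for (String user : affectedUsers) {
            acl.remove(user);                                              // invalidate
            Set<String> resources = new HashSet<>(traversal.apply(user));  // calculate
            acl.put(user, resources);                                      // store
        }
    }

    /** Single-resource check becomes a set lookup; no traversal is needed at read time. */
    boolean hasAccess(String user, String resource) {
        Set<String> resources = acl.get(user);
        return resources != null && resources.contains(resource);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        graph.put("ann", new HashSet<>(Arrays.asList("s1", "s2")));
        AclCacheSketch cache = new AclCacheSketch(
                user -> graph.getOrDefault(user, new HashSet<>()));
        cache.onGraphUpdated(Arrays.asList("ann"));
        System.out.println(cache.hasAccess("ann", "s1"));
    }
}
```

In this sketch the ACL is only as fresh as the last call to onGraphUpdated, so the quality of the affected-user detection determines how stale a lookup can be.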
Was it worth it?
Yes!
The user experience is important in Telenor
Questions?