Copyright © 2017 Oracle and/or its affiliates. All rights reserved. |
Anomaly Detection in Medicare Provider Data using OAAgraph
Sungpack Hong (Oracle Labs)
Mark Hornick (Oracle Advanced Analytics)
Francisco Morales (Oracle Labs)
March 21, 2018
Safe Harbor Statement
The following is intended to outline our research activities and general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Insights
Approach problem from two perspectives
• Graph Analytics, then Machine Learning
– Compute graph metric(s)
– Add to structured data
– Build predictive model using graph metric
• Machine Learning, then Graph Analytics
– Build model(s) and score or classify data
– Add to graph
– Explore graph or compute new metrics using ML result
OAAgraph
• An R package integrating Parallel Graph AnalytiX with Oracle R Enterprise
• Single, unified interface
– Work with R data.frame proxy objects (ore.frame) for database data and familiar functions across ML and graph
– Results available as R data.frame proxy objects allowing further processing
• R users take advantage of powerful, complementary technologies available with Oracle Database
– Highly scalable PGX engine, part of Oracle Spatial and Graph option
– Integrated with Oracle R Enterprise, part of Oracle Advanced Analytics option
PGX (Parallel Graph AnalytiX)
• In-memory graph engine
• Fast, parallel, built-in graph algorithms
• 35+ graph algorithms
• Graph query (pattern-matching) via PGQL
• Custom algorithm compilation (advanced use case)
• PGX also available on Hadoop and NoSQL
Built-in algorithms by category:
– Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s
– Ranking and Walking: Pagerank, Personalized Pagerank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
– Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar
– Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
– Link Prediction: SALSA (Twitter’s Who-to-follow)
– Other Classics: Vertex Cover, Minimum Spanning-Tree (Prim’s)
Oracle R Enterprise
• Use Oracle Database as a high performance compute environment
• Transparency layer
– Leverage proxy objects (ore.frames): data remains in the database
– Overload R functions that translate functionality to SQL
– Use standard R syntax to manipulate database data
• Parallel, distributed ML algorithms
– Scalability and performance
– Exposes in-database machine learning algorithms from ODM
– Additional R-based algorithms executing at the database server
• Embedded R execution
– Store and invoke R scripts in Oracle Database
– Data-parallel, task-parallel, and non-parallel execution
– Invoke R scripts at Oracle Database server from R or SQL
– Use open source CRAN packages
[Diagram: an R client with Oracle R Enterprise and SQL interfaces (SQL*Plus, SQL Developer, …) connect to Oracle Database on the database server machine, which holds user tables and in-database statistics]
OAAgraph with Oracle Database
[Diagram: on the client side, an R client with ORE and OAAgraph; on the database server side, Oracle Database and the PGX server]
# Connect R client to Oracle Database using ORE
R> ore.connect(..)
# Connect to PGX server using OAAgraph
R> oaa.graphConnect(...)
Data Sources
• Graph data represented as two tables
– Nodes with properties
– Edges with properties

Node Table
Node ID    Node Prop 1 (name)    Node Prop 2 (age)    …
1238       John                  39                   …
1299       Paul                  41                   …
4818       …                     …                    …

Edge Table
From Node    To Node    Edge Prop 1 (relation)    …
1238         1299       Likes                     …
1299         4818       FriendOf                  …
1299         6637       FriendOf                  …

[Diagram: both tables reside in Oracle Database on the database server, alongside the PGX server]
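As a rough illustration of how the two-table representation maps to a graph, the sketch below builds an in-memory adjacency structure from rows shaped like the example tables. It is plain Python and does not use OAAgraph or PGX; the function name `build_graph` and the tuple layout are assumptions made for this example, and only the fully specified example rows are included.

```python
from collections import defaultdict

# Rows shaped like the example node and edge tables (fully specified rows only).
node_rows = [
    (1238, {"name": "John", "age": 39}),
    (1299, {"name": "Paul", "age": 41}),
]
edge_rows = [
    (1238, 1299, {"relation": "Likes"}),
    (1299, 4818, {"relation": "FriendOf"}),
    (1299, 6637, {"relation": "FriendOf"}),
]


def build_graph(node_rows, edge_rows):
    """Return (node_props, out_edges) built from a node table and an edge table."""
    node_props = {node_id: dict(props) for node_id, props in node_rows}
    out_edges = defaultdict(list)
    for src, dst, props in edge_rows:
        # Endpoints that are missing from the node table get empty properties.
        node_props.setdefault(src, {})
        node_props.setdefault(dst, {})
        out_edges[src].append((dst, props["relation"]))
    return node_props, dict(out_edges)


node_props, out_edges = build_graph(node_rows, edge_rows)
```

Here `out_edges[1299]` lists the two FriendOf edges leaving node 1299, and `node_props[1238]` carries John's properties.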
Loading Graph
[Diagram: the R client (ORE, OAAgraph) on the client side; Oracle Database and the PGX server on the database server side, with node and edge data loaded from the database into PGX]
# Load graph into PGX:
# Graph load happens at the server side.
# Returns an OAAgraph object, a proxy for the graph in PGX
R> mygraph <- oaa.graph (EdgeTable, NodeTable, ...)
Running Graph Algorithm
[Diagram: the R client (ORE, OAAgraph) on the client side; Oracle Database and the PGX server on the database server side]
# e.g. compute Pagerank for every node in the graph
# Execution occurs on the PGX server side
R> result1 <- pagerank (mygraph, ... )
Exporting the result to DB
[Diagram: the R client (ORE, OAAgraph) on the client side; on the database server side, the PGX server writes NODES and EDGES tables into Oracle Database]
# Export result to DB as Table(s)
R> oaa.create(mygraph, nodeTableName = "node",
              nodeProperties = c("pagerank", … ),
              … )
Anomaly Detection in Healthcare Billing
Background and Introduction
About the Dataset
• A public dataset from the US Centers for Medicare & Medicaid Services (CMS)
– Health-care billing data for CY 2012
– Aggregated medical transactions: 9,153,272 records with 29 variables
– Transactions between 880,644 medical providers and CMS, with total amounts > $77B for the year
– Per provider/service aggregate counts, plus mean and sd of the submitted, allowed, and payment amounts
Anomalies in this Demo
• Information in the dataset
– Providers (doctors) and their services (treatment, operation, prescription, …)
– Specialties of providers (e.g. pediatrics, dermatology, …)
• Observation
– Doctors of the same specialty provide similar services
– What if a doctor performs a lot of treatments that typically belong to other specialties?
• E.g. a cardiologist doing plastic surgery?
• How do we find such cases?
– By applying graph analysis to this dataset
[Image: "There is a spy among us" (an internet meme)]
Creating a Graph From the Dataset
• A graph capturing relationships between providers and services
– Vertex: health-care providers (LHS) and health-care services (RHS)
– Edge: there is an edge if the provider has given the service
– The result is an undirected, bipartite graph
• Vertices have associated properties
– e.g. specialty, name, …
[Diagram: Health Providers (e.g. Dr. Victor Frankenstein, Podiatrist) on the left, linked to Health Services (e.g. Prescribe Aspirin) on the right]
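A minimal sketch of this construction, assuming billing rows have been reduced to (provider, specialty, service) triples. The identifiers and sample rows are invented for illustration and are not taken from the CMS data; the code is plain Python rather than OAAgraph.

```python
from collections import defaultdict

# Hypothetical billing rows: (provider, specialty, service).
records = [
    ("dr_a", "Podiatry", "svc_foot_exam"),
    ("dr_a", "Podiatry", "svc_aspirin"),
    ("dr_b", "Cardiology", "svc_aspirin"),
    ("dr_b", "Cardiology", "svc_ecg"),
    ("dr_a", "Podiatry", "svc_aspirin"),  # repeated pair, still a single edge
]


def build_bipartite(records):
    """Build an undirected bipartite graph: provider vertices vs. service vertices."""
    neighbors = defaultdict(set)
    specialty = {}
    for provider, spec, service in records:
        p = ("provider", provider)  # tag each vertex with its side
        s = ("service", service)
        neighbors[p].add(s)  # undirected: store the edge in both directions
        neighbors[s].add(p)
        specialty[p] = spec  # vertex property
    return dict(neighbors), specialty


neighbors, specialty = build_bipartite(records)
edge_count = sum(len(n) for v, n in neighbors.items() if v[0] == "provider")
```

Tagging each vertex with its side keeps a provider ID and a service code from colliding if they happen to share a value.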
Graph Approach – Basic Idea
[Diagram: provider vertices grouped by specialty (Internal Medicine, Plastic Surgery), linked to service vertices such as "Administration of influenza virus vaccine"]
• In the graph view, providers of the same specialty are close to each other
– They are closely connected by the common services that they provide
• We consider it anomalous if a provider vertex is exceptionally close to vertices of another specialty
• But how do we define such closeness, and how do we find such vertices?
Graph Algorithm: Closeness
• How do we define whether two vertices in the graph are close to each other?
• Shortest path
– Using edge weights as the 'distance' metric
• Edge weight can be 1 (hop distance)
– Classic graph algorithms: Dijkstra, Bellman-Ford, …
• Some considerations
– What if there are multiple paths?
– What about high-degree vertices in between?
[Diagram: two alternative connection patterns among vertices A, B, C, D, compared side by side]
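The "multiple paths" consideration can be made concrete with a small sketch: a breadth-first search that returns both the hop distance and the number of distinct shortest paths. The two toy graphs are invented for illustration. They have the same hop distance between `a` and `b`, yet one pair is connected through three intermediate vertices instead of one, a difference that the shortest-path distance alone does not show.

```python
from collections import deque


def hop_distance_and_paths(neighbors, source, target):
    """BFS from source; return (hop distance, number of shortest paths) to target."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors.get(u, ()):
            if v not in dist:
                # First time v is reached: one hop further than u.
                dist[v] = dist[u] + 1
                paths[v] = paths[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest path into v, arriving through u.
                paths[v] += paths[u]
    return dist.get(target), paths.get(target, 0)


# Toy graphs (hypothetical): a and b share one vs. three intermediate vertices.
single_link = {"a": ["x"], "x": ["a", "b"], "b": ["x"]}
triple_link = {
    "a": ["x", "y", "z"],
    "x": ["a", "b"], "y": ["a", "b"], "z": ["a", "b"],
    "b": ["x", "y", "z"],
}

one = hop_distance_and_paths(single_link, "a", "b")    # (2, 1)
three = hop_distance_and_paths(triple_link, "a", "b")  # (2, 3)
```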
Graph Algorithm: Personalized Pagerank
• Personalized Pagerank (PPR)
– A variant of the Pagerank algorithm*
– Given a set of starting vertices
– Repeat a random walk (with restart) from the starting vertices
– Compute the probability of visiting each vertex in the graph
– The computed value is a natural relative distance (or closeness) of vertices from the starting set
• Vertices that are 'close' are naturally visited more often
• Shared edges also make a vertex visited more often
* The algorithm becomes the normal Pagerank algorithm if the starting set equals the set of all vertices in the graph
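The random walk with restart can be sketched as a simple power iteration. This is an illustrative plain-Python version, not the PGX implementation. The function name, the damping factor of 0.85, and the fixed iteration count are assumptions. The sketch also assumes that every vertex appears as a key in `neighbors` and that the starting set is a non-empty subset of those keys.

```python
def personalized_pagerank(neighbors, start_set, damping=0.85, iterations=100):
    """Approximate visit probabilities of a random walk that restarts in start_set."""
    vertices = list(neighbors)
    # Restart distribution: uniform over the starting vertices, zero elsewhere.
    restart = {v: (1.0 / len(start_set) if v in start_set else 0.0) for v in vertices}
    score = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to the start set.
        nxt = {v: (1.0 - damping) * restart[v] for v in vertices}
        for u in vertices:
            out = neighbors[u]
            if not out:
                # Dead end: send the walker back to the start set.
                for v in vertices:
                    nxt[v] += damping * score[u] * restart[v]
                continue
            share = damping * score[u] / len(out)
            for v in out:
                nxt[v] += share
        score = nxt
    return score


# Toy bipartite graph (hypothetical): p1 and p2 share service s1; p3 is unconnected to them.
toy = {
    "p1": ["s1"], "p2": ["s1", "s2"], "p3": ["s3"],
    "s1": ["p1", "p2"], "s2": ["p2"], "s3": ["p3"],
}
ppr = personalized_pagerank(toy, {"p1"})
```

In this toy graph `p2` receives a positive score because it shares a service with the starting vertex, while `p3` stays at zero because no walk from `p1` can reach it.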
Anomaly Detection Procedure (sketch)
1. Select a specialty
2. Find the set of doctors of the specialty (starting set)
3. Compute Personalized Pagerank from the starting set
4. Find doctors of other specialties that have high values
– Pick a threshold value (set from the minimum PPR value among the starting vertices)
– Mark high-valued vertices as anomalous
[Diagram: bipartite graph of about 900,000 doctors and 6,000 HCPCS service codes joined by about 9,000,000 edges; doctors of the selected specialty form the specialty set, and doctors of other specialties with high PPR values are marked anomalous]
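Step 4 of the procedure might look like the following sketch, which takes already computed PPR scores as input. The scores and provider names are made up for illustration. Treating a score equal to the threshold as "high" is an assumption, since the source does not say whether the comparison is strict. The sketch further assumes that every provider has a score and that the chosen specialty has at least one provider.

```python
def flag_anomalies(ppr, specialty, target_specialty):
    """Return (threshold, flagged providers) for one specialty.

    ppr:       provider -> PPR score computed from the specialty's starting set
    specialty: provider -> specialty name
    """
    start = [v for v, s in specialty.items() if s == target_specialty]
    # Threshold: the minimum PPR value among the starting vertices.
    threshold = min(ppr[v] for v in start)
    flagged = [
        v for v, s in specialty.items()
        if s != target_specialty and ppr[v] >= threshold  # ">=" is an assumption
    ]
    # Highest-scoring candidates first.
    flagged.sort(key=lambda v: ppr[v], reverse=True)
    return threshold, flagged


# Hypothetical scores for a walk started from the Optometry providers.
scores = {"opt1": 0.20, "opt2": 0.12, "oph1": 0.15, "card1": 0.13, "derm1": 0.01}
specialties = {
    "opt1": "Optometry", "opt2": "Optometry",
    "oph1": "Ophthalmology", "card1": "Cardiology", "derm1": "Dermatology",
}
threshold, flagged = flag_anomalies(scores, specialties, "Optometry")
```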
Dealing with False Positives
• There can be a large number of false positives, though
• In the case of Optometry, a lot of providers score higher than the threshold
• This is because some specialties are naturally close to each other
[Figure: Distribution of PPR score (from Optometry). Blue: Optometry doctors. Red: other doctors with high PPR values. The threshold is marked, with the annotation "98.5% Ophthalmology"]
Dealing with False Positives
• Very simple collaborative filtering
– Group the anomaly candidates by their specialties
– Focus on the groups with the fewest providers
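A small sketch of this grouping step, assuming the anomaly candidates and their specialties are available as plain Python values. The names and sample data are hypothetical. Large groups, such as many candidates from one neighboring specialty, sort last, so the rare and more suspicious groups come first.

```python
from collections import defaultdict


def rank_candidate_groups(candidates, specialty):
    """Group anomaly candidates by specialty; smallest groups come first."""
    groups = defaultdict(list)
    for provider in candidates:
        groups[specialty[provider]].append(provider)
    # Sort by group size, then by specialty name for a stable order.
    return sorted(groups.items(), key=lambda item: (len(item[1]), item[0]))


# Hypothetical candidates flagged for a walk started from another specialty.
candidate_specialty = {
    "oph1": "Ophthalmology", "oph2": "Ophthalmology", "oph3": "Ophthalmology",
    "card1": "Cardiology",
}
ranked = rank_candidate_groups(["oph1", "oph2", "oph3", "card1"], candidate_specialty)
```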
Browsing the record of the provider
Demo
Using OAAgraph with RStudio
Summary
• OAAgraph provides powerful, scalable graph analytics enabled from R in Oracle Database with Oracle R Enterprise
• Graph analytics is well-positioned for solving large-scale anomaly detection problems with Spatial and Graph PGX
Learn More about Oracle’s R Technologies…
http://oracle.com/goto/R