Company Confidential – Do Not Distribute 1


Two Goals…

  • Supercomputing Market Leadership
  • Grow Profitably

…Three Business Units

  • HPC Systems: Breakthrough supercomputing performance and scalability
  • Storage & Data Management: Scale-out, manageable high-performance storage systems
  • YarcData: Solutions for Big Data analytics


Created a new division within Cray

100% focused on Big Data solutions

Rapidly-growing, multi-billion market

Hired experienced management team with deep Enterprise experience


“Needle in a Haystack” (Paradigm: SEARCH) vs. “Needle in a Needlestack!” (Paradigm: DISCOVERY)

Discovery is about finding QUESTIONS as well as finding the answers to questions


G00232737

YarcData's uRiKA Shows Big Data Is More Than Hadoop and Data Warehouses

Published: 11 September 2012

Analyst(s): Carl Claunch

The hype about big data is mostly on Hadoop or data warehouses, but big data involves a much wider and varied set of needs, practices and technologies. We offer recommendations for IT organizations seeking a solution to "graph" problems, including use of the uRiKA graph appliance.

Impacts

• IT organizations faced with previously infeasible graph-style discovery problems may succeed using a focused solution like uRiKA.

• To address all their data requirements, IT organizations may be forced to duplicate data between systems such as uRiKA and transactional systems.

Recommendations

• Select candidates to place on uRiKA where processing is graph-oriented, the data is large-scale, and discovery of complex relationships is a core focus.

• Validate the appropriateness of specialized systems and achievability of performance targets by proof-of-concept and pilot tests.

• Carefully define the volume and performance requirements for processing data.

• If requirements can be met using fewer machine types and the economics make sense, use the fewest platform types possible.

Analysis

To listen to the hype, you might think big data is only about Hadoop, but Gartner deems big data to comprise a wider, more varied set of needs, practices and technologies. The uRiKA graph appliance from YarcData, a new company spinoff of Cray, illustrates our point and is used here to highlight just one of the classes of big data problems that are poorly addressed by traditional systems.


Source: Gartner


Figure 3 | The network of the top borrowers. Nodes represent financial institutions (selected as explained in the text). Outgoing links represent the estimated potential impact of an institution to another one (see Methods). The nodes are positioned within a circle of radius 1, centred in 0. The distance of each node from the center is 1 − R(D)_i, while the angle increases linearly with R(D) from 0 to 10π. Thus, the closer a node is to the center the higher is its DebtRank (the intuition here is its centrality). A node in the center (DebtRank = 1) is able to put under distress the entire economic value of the network. DebtRank decreases by moving outwards and leftwards along the spiral. The diagram allows at the same time to visualise the structure of the network and to compare the importance of any two given nodes. The size and the color of the node reflects the DebtRank value (larger and red nodes have higher DebtRank). The color of a link reflects the DebtRank of the node from which it originates (red links originate from node with high DebtRank and make high impact to the destination nodes). (a) Period one, at the beginning of the crisis (see also inset of Fig. 2a). Most of the nodes have low levels of DebtRank, i.e., they are located close to the border. (b) Period four (peak of the crisis). Nodes have comparable levels of Debt Rank. However, they are also much more central, i.e., they can impact a large fraction of the total economic value. A single default is likely to trigger a systemic failure.

Figure 4 | Debt Rank, asset size and fragility. Scatter plot of DebtRank versus asset size, measured as a fraction (in %) of the total of the asset size in the network. For sake of simplicity, in the experiment, asset size was assumed constant during the time span of the data. Notice that institutions such as UBS or CITIGROUP alone account for almost 10% of the total assets. The size of each bubble is proportional to the outstanding debt of the institution while the color reflects its fragility, defined as the ratio of debt over market capitalization in the given period, as in the previous section. (a) Period one. Since the outstanding debt was very low or zero, most nodes appear small and have levels of DebtRank below 0.3, but comparable among each other. (b) Period four. Many institutions have a Debt Rank larger than 0.5, i.e. each can impact, alone, the majority of the economic value in the network. The outstanding debt in this period is close to the peak for all the institutions, as reflected by the size of the bubbles. Notice also a higher fragility: most bubbles are red, although with some heterogeneity.

www.nature.com/scientificreports

SCIENTIFIC REPORTS | 2 : 541 | DOI: 10.1038/srep00541 4


Telecom/Mobile

Life Sciences/Biology

Social Networking

Supply Chain

Healthcare/Medicine

Intelligence/Security

Internet/WWW

Finance


Common Theme: DISCOVERING RELATIONSHIPS spanning large, fragmented data sets, and doing so at an increasing speed

• Risk
    Current: Departmental computations & aggregations
    Emerging: Risk as a function of the connections between counterparties

• Compliance
    Current: Recognition of known behaviors
    Emerging: Proactively discover new patterns of suspicious behavior

• Customer Insight
    Current: Profitability, retention and marketing
    Emerging: Follow connections across customer social networks to drive revenue

• Cyber Security
    Current: Siloed examination of network incidents
    Emerging: Find hidden threats through a connected view of network activity


No ACID

No SQL

Key Value

Column Oriented

Relational Extensions

RDBMS

Document Stores

In Memory


Graphs are hard to Partition
    High cost to follow relationships that span Cluster Nodes

Graphs are not Predictable
    High cost to follow multiple competing paths which cannot be pre-fetched/cached

Graphs are highly Dynamic
    High cost to load multiple, constantly changing datasets into in-memory graph models

*Source: Hennessy, J. and Patterson, D., “Computer Architecture: A Quantitative Approach”, 2012 edition
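The partitioning cost named on this slide can be made concrete with a small sketch. It assumes a toy edge list and a naive modulo placement of vertices onto cluster nodes (both hypothetical, not taken from the deck) and counts how many edges span cluster nodes, i.e. how many relationship hops would need network traffic to follow.

```python
def cut_edges(edges, num_nodes):
    """Count edges whose endpoints land on different cluster nodes.

    Placement is a naive modulo assignment (vertex_id % num_nodes),
    an illustrative assumption rather than any real system's scheme.
    """
    def place(vertex):
        return vertex % num_nodes

    return sum(1 for u, v in edges if place(u) != place(v))


# Toy graph (hypothetical): a ring of 8 vertices plus two long-range links.
edges = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]

for n in (1, 2, 4):
    crossing = cut_edges(edges, n)
    print(f"{n} cluster node(s): {crossing} of {len(edges)} edges cross a partition")
```

On a single shared-memory image no edge crosses a boundary; once the same graph is split, most traversals in this toy case leave the local partition.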


uRiKA Advantages

  • Massively Multi-threaded: 128 Hardware Threads/processor, Up to 8,192 Processors
  • Large Shared Memory: Up to 512 TB
  • Highly Scalable I/O: Up to 350 TB/hr

Business Challenge

  • Graphs are hard to Partition
  • Graphs are not Predictable
  • Graphs are highly Dynamic

Real-time, Interactive Analytics on Big Data Graph Problems
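As a rough back-of-the-envelope check on the maximum figures quoted on this slide, the arithmetic below derives the total hardware thread count and the time to fill the largest memory configuration at the peak I/O rate. It assumes the quoted maxima apply simultaneously and that peak I/O is sustained, which the slide does not state.

```python
# Maximum figures as quoted on the slide.
THREADS_PER_PROCESSOR = 128
MAX_PROCESSORS = 8_192
MAX_MEMORY_TB = 512
MAX_IO_TB_PER_HOUR = 350

# Total hardware threads at the largest configuration.
max_threads = THREADS_PER_PROCESSOR * MAX_PROCESSORS

# Hours to load a full 512 TB at the peak rate (assumes the rate is sustained).
hours_to_fill_memory = MAX_MEMORY_TB / MAX_IO_TB_PER_HOUR

print(max_threads)                     # 1048576
print(round(hours_to_fill_memory, 2))  # 1.46
```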


• Data Center sees another Linux server
• Applications see industry standard interface
    • RDF, SPARQL, Java, Gadgets…
• Reuse Existing Skillsets
    • Java, OSGi, App Server, SOA, ESB, Web toolkit…
• No Lock-in
    • All applications and artifacts built on uRiKA can be run on other platforms
• Subscription Pricing Model

[Stack diagram labels]
SPARQL | RDF | Java | Gadgets/Mashups
Linux | Apache | Tomcat/Jena
Graph Analytic Tools: Management, Security, Data Pipeline
Graph Analytic Database: RDF datastore and SPARQL engine
Graph Analytic Platform: Shared-memory, Multi-threaded, Scalable I/O graph-optimized hardware


• Install in your Data Center like any other purchased equipment
    • … But pay a subscription fee for use of the box instead of purchasing the box up front
• Incrementally add processors and memory as use-cases expand
• License for as little as a one-year term
    • Enables firms to prove out the value incrementally


[Architecture diagram labels]

Data Sources (Structured and Unstructured; External and Internal, via Data Gateways)
    Core Systems, Ops Data, Emails, Social Media, Market Data, News Feeds, Web Logs, Incident Reports, Call Data Records, Netflow Data, IDS Logs

BI/DW Infrastructure
    EDW; Customer Mart, Risk Mart, Finance Mart, Compliance Mart; Departmental DWs

Analytical Outputs
    Regulatory Capital, Average Balance, Relationship Depth, Product Profitability

Graph Analytics
    Paths to Fraud, Developing plots, Developing cyber attacks, Impactful market events, Ultimate counterparty exposure


• Load any type of data source
• Hypothesize with partial information
• Validate with existing knowledge/data

“In the amount of time it takes to validate one hypothesis, we can now validate 1,000 hypotheses – massively improving our success rate and systematizing serendipity.” (YarcData Govt Customer)

Entire dataset

Unknown Relationships

Fast real time response


• Stores information in the following format – an RDF Triple

    subject        predicate    object
    David Jones    WorksFor     Bank of Nebraska LLC

• Each RDF triple is a pair of vertices and an edge in a directed graph:

    (David Jones) --WorksFor--> (Bank of Nebraska LLC)
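The triple model above can be sketched in a few lines of plain Python. This is a minimal illustration, not the uRiKA datastore or its API: triples are stored as tuples, and the hypothetical helper `match` acts as a tiny stand-in for a SPARQL-style pattern in which `None` plays the role of a variable. Only the David Jones triple comes from the slide; the other two are invented for the example.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

triples: List[Triple] = [
    ("David Jones", "WorksFor", "Bank of Nebraska LLC"),  # from the slide
    ("Mary Smith", "WorksFor", "Bank of Nebraska LLC"),   # hypothetical
    ("David Jones", "Knows", "Mary Smith"),               # hypothetical
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None matches anything."""
    for subj, pred, obj in store:
        if ((s is None or subj == s)
                and (p is None or pred == p)
                and (o is None or obj == o)):
            yield (subj, pred, obj)


# "Who works for Bank of Nebraska LLC?": follow WorksFor edges into that vertex.
employees = [subj for subj, _, _ in
             match(triples, p="WorksFor", o="Bank of Nebraska LLC")]
print(employees)
```

Each tuple is one directed edge, so a query is simply a pattern over edges; a production system would express the same pattern in SPARQL.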


• Goal: Proactively identify patterns of activity and threat candidates by aggregating intelligence and analysis
• Data sets: Reference data, People, Places, Things, Organizations, Communications…
• Technical Challenges: Volume, Variety and Velocity of data; Inaccurate, incomplete and falsified data
• Users: Intelligence Analysts
• Usage model: Search for patterns of activity and graphically explore relationships between entities for candidate behavior and activities
• Augmenting: Existing Hadoop cluster and multiple data appliances

[Figure labels: Concept; Fertilizer Factory Transport]


Discover “similar” Patients to optimize treatment

The Challenge
• Longitudinal, historical data spanning all events, symptoms, diagnoses, diseases, treatments, prescriptions, etc. of 10M patients including genetics and family history
• Ad-hoc, constantly changing definition of “similarity” based on thousands of parameters
• Interactive, real-time response during consultation

uRiKA Solution
• uRiKA holds entire relationship graph in memory
• Identify “similar” patients based on ad-hoc physician specified patterns
• Interactive, real-time access by entire physician community

Business Value
• Consistent selection of the most effective treatment for each patient by each doctor every time

[Figure: attribute checklist for Patient: Jean Generic, with columns Blood pressure, Prior myocardial infarction, Hypertension, Body mass index, HDL, Anti-hypertension meds]
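The idea of an ad-hoc, physician-specified similarity pattern can be sketched as follows. This is an illustration only: the patient records, attribute keys and thresholds are hypothetical values invented for the example, and a plain in-memory dictionary stands in for the relationship graph. The point is that the definition of “similar” is data supplied at query time rather than a fixed schema.

```python
# Hypothetical patient records; attribute names loosely follow the slide's figure.
patients = {
    "Jean Generic": {"hypertension": True, "prior_mi": True, "bmi": 31.0, "hdl": 38},
    "Patient A": {"hypertension": True, "prior_mi": True, "bmi": 29.5, "hdl": 41},
    "Patient B": {"hypertension": False, "prior_mi": False, "bmi": 22.0, "hdl": 60},
}


def similar_patients(records, pattern):
    """Return names of patients whose attributes satisfy every test in the pattern.

    `pattern` maps an attribute name to a predicate, so a new notion of
    similarity needs only a new dictionary, not a schema change.
    """
    return [
        name
        for name, attrs in records.items()
        if all(key in attrs and test(attrs[key]) for key, test in pattern.items())
    ]


# One possible ad-hoc pattern (thresholds are illustrative, not clinical advice).
pattern = {
    "hypertension": lambda value: value is True,
    "prior_mi": lambda value: value is True,
    "bmi": lambda value: value >= 28,
}

print(similar_patients(patients, pattern))
```

Changing the pattern dictionary between consultations changes the cohort immediately, which is the behavior the slide describes as an ad-hoc definition of similarity.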


Quickly adapt surveillance to new rules and patterns

The Challenge
• Continuous stream of possible compliance events
• New compliance rules require constant updates of signals and patterns
• Current solutions require offline preparation and are based on rigid rules

uRiKA Solution
• Entire relationship graph in memory
• New Patterns/templates can be identified and added in real-time
• Graphical interactive exploration of relationships between people, places, things, organizations, communications, etc.

Business Value
• A responsive, flexible event detection platform that adapts to new knowledge

Figure Source: Graph-based technologies for intelligence analysis, T. Coffman, S. Greenblatt, S. Marcus, Commun. ACM, 47(3):45-47, 2004


Monitor, analyze complex operational processes to reduce risk

The Challenge
• In large institutions, operations (e.g. Trade Lifecycle) can be complex, inter-connected processes involving people, technology and process steps
• Multiple data collection points exist, but hard to combine to get a holistic picture
• Hard to monitor if critical controls are effective

uRiKA Solution
• Ability to hold entire ‘operational model’ in memory
• Can seamlessly add additional incremental data sources relevant to analysis
• Interactive, real-time querying capability allows ‘constant’ monitoring of operations

Business Value
• Better control of operational risk by combining disparate, disconnected instrumentation data for a complex operation


Understand financial risk emerging from connections in a financial system

The Challenge
• The financial crisis illustrated that counterparty risk is a function of the network of counterparties, not just each in isolation
• E.g. Ford, its suppliers, employees are all counterparties with interconnected exposures to a lender like BofA

uRiKA Solution
• Ability to hold entire dependency graph of positions in memory
• Foundation to run complex graph analytic algorithms without losing interactivity

Business Value
• Ability to gain additional perspective on complex counterparty risk in addition to existing procedures for risk calculations
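The claim that counterparty risk depends on the network, not on each counterparty in isolation, can be illustrated with a small traversal. This is a deliberately crude worst-case sketch, not DebtRank and not a uRiKA algorithm: the dependency graph, names and exposure amounts are hypothetical, and it assumes every counterparty that depends on a defaulting one is fully affected.

```python
from collections import deque

# Hypothetical dependency graph: an edge A -> B means B's health depends on A.
dependents = {
    "Ford": ["Supplier X", "Ford employees"],
    "Supplier X": ["Supplier X employees"],
}

# Hypothetical lender exposure to each counterparty (arbitrary units).
lender_exposure = {
    "Ford": 100.0,
    "Supplier X": 20.0,
    "Ford employees": 5.0,
    "Supplier X employees": 2.0,
    "Unrelated Co": 50.0,
}


def affected(start, graph):
    """Breadth-first search: everything reachable from the defaulting party."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def network_exposure(start, graph, exposure):
    """Worst-case exposure: sum over the defaulter and all its dependents."""
    return sum(exposure.get(name, 0.0) for name in affected(start, graph))


print(lender_exposure["Ford"])                                 # 100.0 in isolation
print(network_exposure("Ford", dependents, lender_exposure))   # 127.0 via the network
```

Looking at Ford in isolation understates the lender's position; following the dependency edges surfaces the additional connected exposure.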


• Goal: Proactively identify optimal treatments for patients based on treatment results for “similar” patients
• Data sets: Longitudinal patient data, Family history, Genetics, Reference data
• Technical Challenges: Ad-hoc constantly changing definition of “similarity” across thousands of constantly changing attributes
• Users: Doctors
• Usage model: Compare results of candidate treatment options for “similar” patients based on ad-hoc physician specified patterns
• Augmenting: Existing data warehouse


• Goal: Proactively identify unknown cyber threats by examining all relationships
• Data sets: IP, MAC, BGP, Firewall, DNS, Netflow, Whois, NVD, CIDR…
• Technical Challenges: Volume and Velocity of data; Temporal dependencies; Real-time response
• Users: Cyber Analysts
• Usage model: Iterative analysis of all patterns across all traffic to explore deviations in frequency of occurrence, derivative patterns of known threats and linking patterns through relationships in offline data
• Augmenting: Existing data appliances


• Goal: Proactively identify new patterns of healthcare fraud (perpetrator/provider/patient nexus) by examining all healthcare relationships
• Data sets: Provider, Beneficiary, Policy, Claims, Billing, Services, Outcomes, Social Network, Organization…
• Technical Challenges: Volume and Velocity of data; Entity Resolution; Complex Inter-relationships; Temporal dependencies
• Users: Fraud Inspectors/Analysts
• Usage model: Analyze outcome and cost for various disease trajectories and identify outliers for treatment optimization and fraud investigation; Separate fraud patterns from benign treatment or legitimate errors
• Augmenting: Existing data warehouse, Predictive Analytics

[Figure labels: Provider, Patient, Social Network, Billing, Claims, External Sources, Outcomes, Services, Medicare, Medicaid]


• When you can interactively ask complex, graph-oriented questions of large, diverse data sets, you can discover patterns and relationships that are beyond the capability of other systems

• You can solve analytical challenges you thought were intractable
