HPCC Platform: Big Data Analytics and Delivery (http://hpccsystems.com). LexisNexis’ massive parallel-processing open-source computing platform. Big Data Cloud Meet Up, September 8th, 2011.

BigDataCloud Sept 8 2011 meetup - Big Data Analytics for Health by Charles Kaminski of LexisNexis

DESCRIPTION

Big Data Analytics for Health - Insights from the Healthcare Industry. - Charles Kaminski, LexisNexis

Page 1: BigDataCloud Sept 8 2011 meetup - Big Data Analytics for Health by Charles Kaminski of LexisNexis

HPCC Platform

Big Data Analytics and Delivery

http://hpccsystems.com

LexisNexis’ massive parallel-processing open-source computing platform

Big Data Cloud Meet Up

September 8th, 2011

Page 2

Who’s been using the HPCC Platform and why?

• Very large businesses
• Federal agencies
• National research labs

• It’s 4 to 10 times faster
• Products and solutions are built much faster
• Very complex problems can be modeled and solved
• It’s proven

Page 3

What’s changed?

We just Open-Sourced!

The HPCC Platform is now available to you.

Page 4

Big Data…It’s our business.

Big Data

Open Source Components

• Insurance
• Financial Services
• Cyber Security
• Government
• Health Care
• Retail
• Telecommunications
• Transportation & Logistics
• Weblog Analysis

INDUSTRY SOLUTIONS

• Customer Data Integration
• Data Fusion
• Fraud Detection and Prevention
• Know Your Customer
• Master Data Management
• Weblog Analysis

• Online Reservations

Page 5

The Platform’s Major Parts

• Thor – Data ingestion, hygiene, refining, transformation, linking, fusion
• Roxie – Data Delivery Engine
  • Supports complex queries and distributed indexes
  • Low latency: latencies grow logarithmically
• ECL – One language
  • Highly expressive and efficient declarative language
  • Solve complex problems
  • Encourage code reuse
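The note that latencies grow logarithmically matches how lookups in a sorted or tree-structured index generally behave. The sketch below is not Roxie code and says nothing about Roxie's internals; it is a plain binary search in Python, with the hypothetical helper `lookup_steps`, that counts comparisons to show how slowly the work grows as the index gets larger.

```python
def lookup_steps(sorted_keys, target):
    """Binary-search sorted_keys for target; return (found, comparisons made)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps


# A range object is indexable, so it stands in for a sorted index
# without allocating a million-element list.
for size in (1_000, 1_000_000):
    found, steps = lookup_steps(range(size), size - 1)
    print(f"{size:>9} keys: found={found}, comparisons={steps}")
```

Growing the index a thousandfold only about doubles the number of comparisons, which is the general shape a logarithmic latency curve would have.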

Page 6

How we’re different

It’s not a group of disparate technologies or competing visions bolted together.

It’s one platform with a clear, proven vision.

This by itself is powerful.

Page 7

How we’re different

• You can transcend MapReduce
• Build transformative data graphs and applications using ECL
• Solve very complex Big Data problems
• Don’t struggle to fit your Big Data problem into groups of MapReduce jobs

Page 8

How we’re different

• No need to munge the data before ingestion
• No complex block file system
• No need to tune number of tasks for different jobs
• Data Delivery Engine is included
• Use a single language for data cleansing, transformation, linking, fusion, and delivery
• ECL promotes language extension and code reuse
• Data graphs are built and optimized by the system
• The system-generated C++ is highly optimized
• Code execution is optimized
• Low and predictable latencies
• Modeling data problems as data problems leads to richer solutions

Page 9

Challenges Facing Health Care Enterprises

Challenges facing the health insurance industry

Disparate data spread across separate physical locations

Scale of data. BIG Data is getting BIGGER.

Adding relationships exponentially expands the size of the BIG Data analytics challenge.
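A rough way to see why relationships inflate the problem: among n entities there are n(n-1)/2 possible pairwise links, and the number of paths that can be followed from one entity is bounded by d^k for an average of d links per entity and k hops. The sketch below only does that arithmetic; the figures (1,000 and 1,000,000 entities, 10 links each) and the function names are illustrative assumptions, not numbers from the talk.

```python
def possible_pairs(n):
    """Number of distinct pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def max_paths(avg_links, hops):
    """Upper bound on paths of length `hops` when each entity has avg_links links."""
    return avg_links ** hops


for n in (1_000, 1_000_000):
    print(f"{n:>9} entities -> {possible_pairs(n):,} possible pairs")

for hops in (1, 2, 3, 4):
    print(f"{hops} hop(s) at 10 links each -> up to {max_paths(10, hops):,} paths")
```

The pair count grows quadratically with the number of entities, while the path count grows exponentially with the number of hops followed.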

LexisNexis has leveraged parallel-processing computing platforms and large-scale graph analytics for over a decade.

Page 10

Potential Fraud – a POC for the State of New York

Applied social network analytics to information provided by the State of New York and public data supplied by LexisNexis to identify relationships between a group of New York Medicaid recipients living in high-end condominiums located within the same complex and any links those individuals might have to medical facilities or others providing care to New York Medicaid recipients.

Page 11

What’s entailed (high level)

Mix first-party data with public and third-party data sources

Addition of External Data

• Adds fidelity to existing entities
• Adds new linkages into the analysis
• Adds new entities into the analysis
• Exposes ring leaders and brokers that don’t directly participate
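To make the effect of external data concrete, here is a minimal Python sketch, not the method used in the POC. It assumes a tiny, invented set of records (`recipient_A`, `clinic_1`, `broker_X` and so on are hypothetical) and uses a breadth-first search to show two recipients who look unrelated in first-party data becoming linked once external records add a new entity and new linkages.

```python
from collections import defaultdict, deque


def connected(edges, start, goal):
    """Breadth-first search: is goal reachable from start over undirected edges?"""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


# Hypothetical first-party records: each recipient is tied to one clinic.
first_party = [("recipient_A", "clinic_1"), ("recipient_B", "clinic_2")]
# Hypothetical external records: one person is a contact of both clinics.
external = [("broker_X", "clinic_1"), ("broker_X", "clinic_2")]

print(connected(first_party, "recipient_A", "recipient_B"))
print(connected(first_party + external, "recipient_A", "recipient_B"))
```

In this toy data `broker_X` never appears in the first-party records, which is the kind of non-participating broker the last bullet describes.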

Page 12

How we did it

• Graph / Network: 3 billion derived public data relationships between people, merged with risk indicators.

• Graph analytics examine up to 20 billion data points to create variables that allow for predictive analysis incorporating relationship context and associated risk.

• Targets fraud across all sectors, including Healthcare, Financial Services and Government.
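As an illustration of what a relationship-derived variable might look like, the Python sketch below computes, for each person, how many direct associates they have and how many of those carry a risk indicator. The function name, the feature names and the sample data are all hypothetical; the variables actually produced by the platform are not described in the slides.

```python
from collections import defaultdict


def relationship_features(edges, risky):
    """Per-person counts of direct associates and of associates flagged as risky."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    features = {}
    for person, associates in neighbors.items():
        risky_count = len(associates & risky)
        features[person] = {
            "associates": len(associates),
            "risky_associates": risky_count,
            "risky_share": risky_count / len(associates),
        }
    return features


# Hypothetical relationship graph and risk indicator set.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4")]
risky = {"p3"}

features = relationship_features(edges, risky)
for person in sorted(features):
    print(person, features[person])
```

Columns like these could then be fed to a predictive model alongside each person's own attributes, so that the model sees relationship context as well as individual risk.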

Page 13

Cluster Visualization Introduction

• How many of them are living in expensive residences, own expensive property, or drive expensive cars?
• How many recipients are contacts of medical businesses?
• How many medical businesses are associated with any of the people in the cluster?
• How many are currently receiving benefits?

Diagram labels: Medicaid Recipient; Expensive Residence; Owns Expensive Property; Owns Expensive Vehicles; Business Contact of Medical Business Entity
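Questions like these reduce to counting cluster members whose attributes match a condition. The Python sketch below assumes each member is a record with boolean flags named after the diagram labels; the field names, the helper `count_where` and the four sample members are invented for illustration.

```python
# Hypothetical cluster members with flags mirroring the diagram labels.
cluster = [
    {"id": "m1", "medicaid_recipient": True, "expensive_residence": True,
     "expensive_property": False, "expensive_vehicle": True,
     "medical_business_contact": False},
    {"id": "m2", "medicaid_recipient": True, "expensive_residence": False,
     "expensive_property": False, "expensive_vehicle": False,
     "medical_business_contact": True},
    {"id": "m3", "medicaid_recipient": False, "expensive_residence": True,
     "expensive_property": True, "expensive_vehicle": False,
     "medical_business_contact": True},
    {"id": "m4", "medicaid_recipient": True, "expensive_residence": False,
     "expensive_property": True, "expensive_vehicle": False,
     "medical_business_contact": True},
]


def count_where(members, predicate):
    """Count the members for which predicate(member) is true."""
    return sum(1 for member in members if predicate(member))


def has_expensive_asset(member):
    """True if the member has any of the three expensive-asset flags set."""
    return (member["expensive_residence"]
            or member["expensive_property"]
            or member["expensive_vehicle"])


recipients = count_where(cluster, lambda m: m["medicaid_recipient"])
affluent_recipients = count_where(
    cluster, lambda m: m["medicaid_recipient"] and has_expensive_asset(m))
recipient_contacts = count_where(
    cluster, lambda m: m["medicaid_recipient"] and m["medical_business_contact"])

print(recipients, affluent_recipients, recipient_contacts)
```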

Page 14

Cluster Visualization

Page 15

City Walk Sample: Vehicle Statistics

What is the list of preferred expensive vehicles?

Make Description   # Owned   Make Description   # Owned
Mercedes-Benz      46        Chevrolet          2
Lexus              41        Hummer             2
BMW                27        Jeep               2
Infiniti           13        Nissan             2
Acura              9         Toyota             2
Lincoln            8         Aston Martin       1
Audi               7         Bentley            1
Land Rover         7         Cadillac           1
Porsche            6         GMC                1
Jaguar             5         Honda              1
Mercedes Benz      3         Volkswagen         1
Saab               3         Volvo              1
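The table lists "Mercedes-Benz" (46) and "Mercedes Benz" (3) as separate makes, which looks like a spelling variant of one make rather than two makes; that reading is an assumption. The Python sketch below shows the kind of hygiene step that would merge such variants before counting. The helper `normalize_make` is hypothetical and only a few rows from the table are used.

```python
from collections import Counter


def normalize_make(make):
    """Fold case, hyphens and repeated spaces so spelling variants share one key."""
    return " ".join(make.replace("-", " ").split()).casefold()


# A few rows taken from the table above.
rows = [("Mercedes-Benz", 46), ("Lexus", 41), ("BMW", 27), ("Mercedes Benz", 3)]

totals = Counter()
for make, owned in rows:
    totals[normalize_make(make)] += owned

for make, owned in totals.most_common():
    print(make, owned)
```

With the two spellings merged, the combined count for that make is 46 + 3 = 49.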

Page 16

Dominant buyers and sellers at City Walk

Property deed reference counts

Name                 Deeds Held   Name            Deeds Held
Hudson Eight         78           Mike Greem      21
Hudson Five          74           Scott Hill      21
Hudson First         73           Betty Donaway   21
Hudson Nine          65           Al Clark        19
Harry Anderson       45           Dave Miller     17
Hudson Ten           41           Mark Walker     16
Hudson Seven         39           Mike Smith      16
Home Nationwide      33           Val Edwards     15
Hudson Three         33           Eric Garcia     14
Brian Smith          28           Dane Young      14
Alan Stevens         25           Bill Moore      14
Chris Doe            24           Karen Carter    14
Sophie Davis         23           Casey Baker     14
Washington Mutual    23           Art Nelson      14
Fleet Mortgage Co.   21           Cathy Parker    13

Page 17

The engineering story

One guy (Joe Prichard). Three weeks. Less than part time.

The platform lets him focus on the data.

Joe’s a lot of fun to work with.

Page 18

Do you build other POCs?

Yes

Page 19

What next?

Try us out!

• Virtual Machine
• Binaries
• EC2 Data Script
• Ensemble Recipe…Juan from Canonical

Page 20

Contact Information

Charles Kaminski

Senior Architect

Academic Development Lead

HPCC Systems

[email protected]

402-619-9413
