1 Data Quality: Opportunities, Data, and Examples


Page 1: 1 Data Quality: Opportunities, Data, and Examples

1

Data Quality: Opportunities, Data, and Examples

Page 2: 1 Data Quality: Opportunities, Data, and Examples

2

Page 3: 1 Data Quality: Opportunities, Data, and Examples

3

Better and More Data

– Level of analysis
  • Take a quick look at what/why use data
  • Linking data from disparate and third party sources
– Explore data types
– Typical issues & Tricks
  • Cross validation and sourcing
  • Reverse Look-up
  • GIS layering
  • Backfill from text correlated to codes
– Information from operations
  • Text analytics

Page 4: 1 Data Quality: Opportunities, Data, and Examples

4

General Organizational Overview

An information business focused on risk taking.
Make. Sell. Serve.

Sales and Distribution
• Producer Segmentation
• Market Planning
• Revenue Forecasting
• Cross sell and Up sell
• Retention and Profitability

Underwriting
• Risk Selection and Pricing
• Portfolio Management
• Premium Adequacy
• Billing and Collections Management

Claims
• Payment Accuracy
• Claim Collaboration
  > Fraud Detection
  > Subrogation
  > Risk Transfer
  > 3rd Party Deductible
  > Reinsurance Recoverable

Page 5: 1 Data Quality: Opportunities, Data, and Examples

5

Same Problems – Different Lines of Business

• Personal – Auto, HO, Umbrella

• Small Commercial – BOP, CPP

• Middle Market Commercial – CPP w/GL, CP, Crime, CIM, B&M, WC, Auto

• Large Commercial Accounts

• Commercial Auto

• Workers Comp

• Umbrella/Excess

• Specialty Lines – D&O, EPL, E&O, Farm, FI


Page 6: 1 Data Quality: Opportunities, Data, and Examples

6

Data Types and Forms

• Structured data
• Semi-structured data
• Unstructured data
• Text
• Spatial
• Pictographic
• Graphic
• Voice
• Video

Page 7: 1 Data Quality: Opportunities, Data, and Examples

7

Data

• Archive, Legacy Systems
• Current System
• Claim
• Multiple States
• Billing Systems
• Finance Systems
• CRM Systems, other data
• Policy
• Multiple Underwriting Systems

Medical Data - Bill Review - PPO - Case Management - Paradigm

Multiple Data Systems which must be pulled together for analysis. Great opportunity for cross-validation and sourcing

ACTIONS

• Identify Data Systems
• Get right data from right systems
• Overcome internal Organizational Barriers
• Bridge to legacy systems and archived data
• Augment to create rich data mining environment
• Expect the need to negotiate for resources

Vendors/Partners

External Data
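
The cross-validation opportunity mentioned on this slide can be sketched in a few lines. This is only an illustration, assuming each system can be exported as a list of records that share a key; the field names `policy_no` and `zip` and the function `cross_validate` are hypothetical and not taken from the deck.

```python
def cross_validate(claim_records, policy_records, key, field):
    """Return (key, claim value, policy value) wherever the two systems disagree."""
    # Index the policy system by the shared key for quick look-up.
    policy_by_key = {record[key]: record for record in policy_records}
    mismatches = []
    for claim in claim_records:
        policy = policy_by_key.get(claim[key])
        # Only compare records that exist in both systems.
        if policy is not None and policy[field] != claim[field]:
            mismatches.append((claim[key], claim[field], policy[field]))
    return mismatches


# Hypothetical sample records from a claim system and a policy system.
claims = [
    {"policy_no": "A1", "zip": "06101"},
    {"policy_no": "B2", "zip": "10001"},
    {"policy_no": "C3", "zip": "60601"},
]
policies = [
    {"policy_no": "A1", "zip": "06101"},
    {"policy_no": "B2", "zip": "10010"},
]

print(cross_validate(claims, policies, key="policy_no", field="zip"))
```

Records that appear in only one system are skipped here; in practice they would likely be reported separately as a sourcing gap.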

Page 8: 1 Data Quality: Opportunities, Data, and Examples

8

Some typical external data sources and vendors

• Dun & Bradstreet
• Experian
• Bureau of Labor Statistics
• Market Stance
• AM Best
• Equifax
• US Census
• Claritas
• Melissa Data
• ISO
• GIS vendors
• U&C Data sets
• Code sets for ICDs and CPTs…

Page 9: 1 Data Quality: Opportunities, Data, and Examples

9

Data Glitches – historical and on-going

Systemic changes to data not process related
– Changes in data layout / data types
– Changes in scale / format
– Temporary reversion to defaults
– Missing and default values
– Gaps in time series
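
Two of the glitches listed above, gaps in time series and reversion to defaults, lend themselves to simple automated checks. The sketch below is one possible approach, assuming monthly data and a known suspect default value; the function names and the sample default `1900-01-01` are illustrative assumptions.

```python
from datetime import date


def missing_months(observed):
    """Return (year, month) pairs with no observation between the first and last date."""
    seen = {(d.year, d.month) for d in observed}
    first, last = min(seen), max(seen)
    gaps = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


def default_share(values, default):
    """Return the fraction of values equal to a suspected default."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == default) / len(values)


# Hypothetical load dates with January 2021 missing.
loads = [date(2020, 11, 3), date(2020, 12, 9), date(2021, 2, 1)]
print(missing_months(loads))

# Hypothetical birth dates where the entry screen default was left in place.
births = ["1900-01-01", "1985-06-02", "1900-01-01", "1990-11-30"]
print(default_share(births, "1900-01-01"))
```

A sudden rise in the default share for one period is a hint that a system temporarily reverted to defaults.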

Page 10: 1 Data Quality: Opportunities, Data, and Examples

10

Process Reasons for poor data entry

Page 11: 1 Data Quality: Opportunities, Data, and Examples

11

Defining Issues - sample

• Constants
• Definition Mismatches
• Filler Containing Data
• Inconsistent Cases
• Inconsistent Data Types
• Inconsistent Null Rules
• Invalid Keys
• Invalid Values
• Miscellaneous
• Missing Values
• Orphans
• Out of Range
• Pattern Exceptions
• Potential Constants
• Potential Defaults
• Potential Duplicates
• Potential Invalids
• Potential Redundant Values
• Potential Unused Fields
• Rule Exceptions
• Unused Fields

Source Data

1 - Define Issues
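
A few of the issue categories above (Missing Values, Constants, Pattern Exceptions, Out of Range, Orphans) can be illustrated with a small profiling sketch. It assumes a column arrives as a plain Python list; `profile_column` and `find_orphans` are hypothetical helper names, and the checks are simplified versions of what a profiling tool would do.

```python
import re


def profile_column(values, pattern=None, valid_range=None):
    """Count or list simple data quality issues for one column."""
    non_missing = [v for v in values if v not in (None, "")]
    issues = {
        "missing_values": len(values) - len(non_missing),
        # A column holding a single distinct value is a candidate constant.
        "constant": len(non_missing) > 0 and len(set(non_missing)) == 1,
        "pattern_exceptions": [],
        "out_of_range": [],
    }
    if pattern is not None:
        rx = re.compile(pattern)
        issues["pattern_exceptions"] = [
            v for v in non_missing if not rx.fullmatch(str(v))
        ]
    if valid_range is not None:
        low, high = valid_range
        issues["out_of_range"] = [v for v in non_missing if not low <= v <= high]
    return issues


def find_orphans(child_keys, parent_keys):
    """Return child keys that have no matching parent key."""
    parents = set(parent_keys)
    return [k for k in child_keys if k not in parents]


# Hypothetical ZIP code and age columns.
print(profile_column(["12345", "1234", "", "90210"], pattern=r"\d{5}"))
print(profile_column([34, 51, 212, None], valid_range=(0, 120)))
print(find_orphans(["P1", "P9"], ["P1", "P2"]))
```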

Page 12: 1 Data Quality: Opportunities, Data, and Examples

12

MORE ISSUES… Mapping across sources: Same Fact, Different Terms

Data Element Concept
Name: Country Identifiers; Context; Definition; Unique ID: 5769; Conceptual Domain; Maintenance Org.; Steward; Classification; Registration Authority; Others

Data Elements
Name; Context; Definition; Unique ID: 4572; Value Domain; Maintenance Org.; Steward; Classification; Registration Authority; Others

ISO 3166        ISO 3166       ISO 3166       ISO 3166       ISO 3166
English Name    French Name    2-Alpha Code   3-Alpha Code   3-Numeric Code
Algeria         L'Algérie      DZ             DZA            012
Belgium         Belgique       BE             BEL            056
China           Chine          CN             CHN            156
Denmark         Danemark       DK             DNK            208
Egypt           Egypte         EG             EGY            818
France          La France      FR             FRA            250
. . .           . . .          . . .          . . .          . . .
Zimbabwe        Zimbabwe       ZW             ZWE            716
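
Mapping the same fact across different terms usually comes down to a look-up from every known representation to one canonical key. The sketch below uses only the rows shown on this slide and picks the 2-alpha code as the canonical form; that choice and the name `to_alpha2` are assumptions for illustration.

```python
# Rows taken from the slide: English name, French name, 2-alpha, 3-alpha, 3-numeric.
ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]

# Build one look-up table from every representation to the 2-alpha code.
LOOKUP = {}
for english, french, alpha2, alpha3, numeric in ROWS:
    for term in (english, french, alpha2, alpha3, numeric):
        LOOKUP[term.casefold()] = alpha2


def to_alpha2(term):
    """Map any known country representation to its 2-alpha code, or None."""
    return LOOKUP.get(str(term).strip().casefold())


for source_value in ["DZA", "012", "Belgique", " china ", "Atlantis"]:
    print(repr(source_value), "->", to_alpha2(source_value))
```

Unknown values return `None` rather than a guess, so unmapped terms can be routed to manual review.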

Page 13: 1 Data Quality: Opportunities, Data, and Examples

13

Data Filling

• Manual
• Statistical Imputation
• Temporal
• Spatial
• Spatial-temporal
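
Two of the filling approaches listed above can be sketched briefly. The slide names only the categories, so the specific methods here (median imputation for "Statistical Imputation" and carry-forward for "Temporal") are assumed examples rather than the presenter's choices.

```python
from statistics import median


def impute_median(values):
    """Statistical imputation: replace missing entries with the median of known ones."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def fill_forward(values):
    """Temporal fill: carry the last known value forward through gaps."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


print(impute_median([10, None, 30, 20]))
print(fill_forward([None, 5, None, None, 7]))
```

Leading gaps stay empty under carry-forward because there is no earlier value to use.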

Page 14: 1 Data Quality: Opportunities, Data, and Examples

14

Geographic Hierarchy

Page 15: 1 Data Quality: Opportunities, Data, and Examples

15

Deriving Data = Power

• Totals: Household Income
• Trends: Rate of Medical Bill Increases
• Ratios: Claims/Premium, Target/Median
• Friction: Level of inconvenience, ratio of rental to damage
• Sequences: Lawyer-Doctor, Auto-Life Policy
• Circumstances: Minimal Impact Severe Trauma
• Temporal: Loss shortly after adding collision
• Spatial: Distance to Service, proximity of stakeholders
• Logged: Progress Notes, Diaries, Who did it, When, “Why”
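
As a rough illustration of two derived fields from this slide, the sketch below computes a Claims/Premium ratio and the number of days between adding a coverage and the loss. The function names and sample figures are hypothetical.

```python
from datetime import date


def loss_ratio(claims_paid, premium):
    """Ratio: claims divided by premium, or None when premium is zero."""
    return claims_paid / premium if premium else None


def days_since_coverage_added(coverage_added, loss_date):
    """Temporal: days between adding a coverage and the date of loss."""
    return (loss_date - coverage_added).days


print(loss_ratio(600.0, 1000.0))
print(days_since_coverage_added(date(2020, 3, 1), date(2020, 3, 11)))
```

A short interval in the second measure corresponds to the "loss shortly after adding collision" pattern.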

Page 16: 1 Data Quality: Opportunities, Data, and Examples

16

Deriving Data = Power (Cont’d)

• Behavioral: Deviation from past usage, spike buying
• Experience Profiles: Vendor, Doctor, Premium Audit
• Channel: How applied, How reported, Service Chain
• Legal Jurisdiction: Venue Disposition, Rules
• Demographics: Working, Weekly wage, lost income
• Firmographics: Industry Class Code Vs Injuries Claimed
• Inflation: Wage, Medical, Goods, Auto, COLA
• Gov’t Statistics: Crime Rate, Employment, Traffic
• Other Stats: Rents, Occupancy, Zoning, Mgd Care

Page 17: 1 Data Quality: Opportunities, Data, and Examples

17

“Search” versus “Discover”

                           Search (goal-oriented)    Discover (opportunistic)
Structured Data            Data Retrieval            Data Mining
Unstructured Data (Text)   Information Retrieval     Text Mining

Page 18: 1 Data Quality: Opportunities, Data, and Examples

18

Searching

Word Replacement Lists

Input Value: [Jim]
Transformed Input Value: [JAMES]

Replacement list: Jimmy → JAMES, Jim → JAMES, James → JAMES

Returns “Similar Matches”
All Records Found: Jimmy, Jim, James
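
The word replacement idea on this slide can be sketched as a dictionary look-up applied to both the search term and the stored values. The list contents follow the slide; the function names and the extra record "Joan" are assumptions added for illustration.

```python
# Word replacement list from the slide: name variants map to one standard form.
REPLACEMENTS = {"JIMMY": "JAMES", "JIM": "JAMES", "JAMES": "JAMES"}


def standardize(name):
    """Transform an input value to its standard form (upper case if unlisted)."""
    key = name.strip().upper()
    return REPLACEMENTS.get(key, key)


def search(records, query):
    """Return every record whose standardized form matches the standardized query."""
    target = standardize(query)
    return [record for record in records if standardize(record) == target]


print(standardize("Jim"))
print(search(["Jimmy", "Jim", "James", "Joan"], "Jim"))
```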

Page 19: 1 Data Quality: Opportunities, Data, and Examples

19

Motivation for Text Mining

• Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation)

• Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.

Chart: Unstructured or Semi-structured Information, 90%; Structured Numerical or Coded Information, 10%

Page 20: 1 Data Quality: Opportunities, Data, and Examples

20

Convergence of Disciplines Example

Page 21: 1 Data Quality: Opportunities, Data, and Examples

21

Techniques for attacking text data:

• Rules-based
• Statistical Text Analysis and Clustering
• Linguistic and Semantic Clustering
• Support Vector Machines
• Pattern Matching or other statistical algorithms
• Neural Networks

Combination of methods from above

Text is like a data iceberg

Page 22: 1 Data Quality: Opportunities, Data, and Examples

22

Claims processing – Progress notes and Diaries

CLAIMS ADJUSTER

• Medical Management Staff
• Special Investigation Unit
• NICB
• Vendor Management
• Consulting Engineers
• Hearing Representative
• Structured Settlement Unit
• Recovery Staff
• Legal Staff

• Home Office Staff
• Field Office Claim Staff
• Insured Risk Manager
• Agent or Broker

• Diary forward – “call Dr Jones next week”
• Business Rule – large loss review
• System Reminder – update case reserves
• Correspondence Tracking – legal letter sent

Service

Page 23: 1 Data Quality: Opportunities, Data, and Examples

23

Semantic processing: Named Entity Extraction

• Identify and type language features
• Examples:
  • People names
  • Company names
  • Geographic location names
  • Dates
  • Monetary amount
  • Phone #, zipcodes, SSN, FEIN
  • Others… (domain specific)
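
For the more regular entity types above (dates, monetary amounts, phone numbers, SSNs), a rules-based pass with regular expressions is one possible starting point. The patterns below are simplified assumptions for US-style formats; people, company, and location names generally need dictionaries or statistical models and are not covered by this sketch.

```python
import re

# Simplified, assumed patterns; real adjuster notes will need broader rules.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "money": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def extract_entities(text):
    """Return (entity type, matched text) pairs found in a note."""
    found = []
    for label, rx in PATTERNS.items():
        for match in rx.finditer(text):
            found.append((label, match.group()))
    return found


# Hypothetical progress note.
note = "Called Dr Jones on 3/14/2008 at 860-555-0142; bill for $1,250.00 sent."
for entity in extract_entities(note):
    print(entity)
```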

Page 24: 1 Data Quality: Opportunities, Data, and Examples

24

Forklift Hits Ladder
Ladder in Doorway
Forklift Couldn’t Stop
No Barrier Signs
Forklift Brakes Defective
Cooking Oil on Floor
Forklift Going Too Fast
Brake Maintenance Delayed
Housekeeping Inadequate
Speed Limits Not Enforced
Or
Lack of Personnel
No Policy
No Enforcement
No Enforcement

Feedback to UW

Page 25: 1 Data Quality: Opportunities, Data, and Examples

25

Data Quality: Opportunities, Data, and Examples