Biometrics, Identity and Big Data Analytics

Dr. Charles Li
Analytics Solution Center
[email protected]

© 2013 IBM Corporation

Li charles biometrics analytics & big data 122013a for release


Page 2

© 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes

Topics

• Biometrics, Identity & ID Management
• Views on Biometrics Technology and System
• Big Data Analytics and Challenges
• Identity Establishment from All Sources
• Identity and Biometrics in the Cloud
• Identity and Biometrics Analytics in Motion
• Summary

Page 3

Biometrics, Identity and ID Management

[Diagram: Identity Management. Labels: Identity Establishment; Players; Entitlement(s); Actions; Identity; Trust (Rules); Status (Environment); Reputation (History)]

Page 4

Views on Biometrics Technology and System

What is missing?

Page 5

Big Data Concept

Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner.

• Variety: data in many forms (structured, unstructured, text and multimedia)
• Velocity: data in motion; analysis of streaming data to enable decisions within fractions of a second
• Volume: data at scale, from terabytes to zettabytes

Page 6

Analytics Concept

Structured Data & Unstructured Content
Made consumable and accessible to everyone

• Descriptive Analytics
  - What is happening?
  - What exactly is the problem?
  - How many, how often, where?
  - What actions are needed?
• Predictive Analytics
  - What could happen? (Simulation)
  - What if these trends continue? (Forecasting)
  - What will happen next if? (Predictive Modelling)
• Prescriptive Analytics
  - How can we achieve the best outcome? (Optimisation)
  - How can we achieve the best outcome and address variability? (Stochastic Optimisation)
• Content Analytics: extracting insight, concepts and relationships
• Visual Analytics: deep insights to improve visualization and marketing interactions

Biometrics annotations on the diagram: Biometrics Quality Monitoring; Biometrics Reports

Page 7

Biometrics Data at Scale - Static & Single Instance

• 1 billion arrivals worldwide in 2012 (United States: 100-200 million international arrivals in 2012): 1 exabyte of travel data
• Unique Identification Authority of India (UIDAI) plans to enroll 1.2 billion citizens (UID Program; enroll million /day; half a billion by 2014): 3-4 exabytes of biometric and biographic data
• Prolific usage of mobile phones (6 billion mobile phones): 6 exabytes of behavior data
• ID cards / border crossings / benefits / multiple instances, 7,000,000,000 x (10 Print 0.5-1 MB + Face 200 KB + Iris KB): 7 exabytes
• EU VIS Biometrics Matching System (BMS) at 70 million individuals and 100K daily enrollments: ~100 terabytes
• US DoS has in the range of 100 million faces, plus others: at least 10-50 terabytes
• DHS IDENT, over 150 million identities and 125,000 transactions daily: ~100-300 terabytes
• FBI NGI, over 100 million fingerprints and more coming, plus faces/iris: ~100-200 terabytes

Units:
1 Gigabyte = 1000 MB
1 Terabyte = 1000 GB
1 Petabyte = 1000 TB
1 Exabyte = 1000 PB
1 Zettabyte = 1000 EB
1 Yottabyte = 1000 ZB

Many instances, history, transactions, logs… data in reality
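The per-person arithmetic behind these estimates can be checked with a few lines of Python. This is a rough sketch, not the author's calculation. It uses the decimal units listed on the slide and the record sizes quoted there (ten-print 0.5-1 MB, face 200 KB). The slide gives no iris size, so `iris_bytes` is left as a parameter defaulting to zero, and `records_per_person` is a hypothetical multiplier for repeat captures.

```python
# Decimal storage units, as listed on the slide (1 GB = 1000 MB, and so on).
KB = 10**3
MB = 10**6
TB = 10**12
PB = 10**15
EB = 10**18


def raw_storage_bytes(people, tenprint_bytes, face_bytes,
                      iris_bytes=0, records_per_person=1):
    """Raw image storage for a population, in bytes.

    iris_bytes defaults to 0 because the slide does not state an iris size;
    records_per_person is a hypothetical multiplier for repeat captures.
    """
    per_record = tenprint_bytes + face_bytes + iris_bytes
    return people * per_record * records_per_person


# Population figure used on the slide: 7,000,000,000 people.
low = raw_storage_bytes(7_000_000_000, 500 * KB, 200 * KB)
high = raw_storage_bytes(7_000_000_000, 1 * MB, 200 * KB)
print(f"single instance: {low / PB:.1f} to {high / PB:.1f} PB")

# EU VIS scale from the slide: 70 million individuals.
vis = raw_storage_bytes(70_000_000, 1 * MB, 200 * KB)
print(f"70 million individuals: {vis / TB:.0f} TB")
```

Under these assumptions, one record per person for 7 billion people lands in the petabyte range, and the 70 million case comes out close to the ~100 terabytes quoted for EU VIS. Reaching exabytes would take on the order of a thousand records per person, or the history, transaction and log data mentioned in the closing line.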

Page 8

Big Data Sources

System Transaction, Log and Transition Data - Several Times More!

Page 9

Other Big Data Examples

• Healthcare: 150 exabytes global size of "Big Data", growing between 1.2 and 2.4 EB per year
• For every session, the NY Stock Exchange captures 1 terabyte of trade information
• AT&T transfers about 30 petabytes of data through its network daily
• The Hadron Collider at CERN generates 40 terabytes of usable data per day
• Facebook processes 500+ terabytes of data daily
• Google processes more than 24 petabytes of data in a single day
• Twitter processes 12 terabytes of data daily
• By 2016, annual Internet traffic will reach 1.3 zettabytes

We don't have the most challenging problem!

Page 10

Biometric Performance at Giga Scale*

• "Brute Force" De-Duplication
  - Cumulative de-duplication: total number of checks = N(N-1)/2 (the "combination problem")
  - De-duplicating a 100 million population enrollment results in 4,999,999,950,000,000 checks!
  - About 15 years to complete at 10 million matches per second
• Biometric Accuracy Challenge
  - FMR at 1 identification false match per million
  - 500 false matches with 1 million enrollment population (de-duplication)
  - 5 million false matches with 100 million enrollment population

Prohibitive! We have some unique challenges!

* Courtesy of Bojan Cukic
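The figures on this slide can be reproduced with a short calculation. The sketch below is illustrative only: it assumes every pair of enrollments is compared exactly once and that comparisons err independently. The slide does not spell out how its false match rate maps to individual comparisons, so `ASSUMED_FMR` (one in a billion per comparison) is an assumption, chosen because it yields the 500 and 5 million counts under this simple model.

```python
def pairwise_checks(n):
    """Number of one-to-one comparisons needed to de-duplicate n enrollments."""
    return n * (n - 1) // 2


def years_to_complete(checks, matches_per_second):
    """Wall-clock years at a fixed matching rate (365-day years)."""
    seconds_per_year = 365 * 24 * 3600
    return checks / matches_per_second / seconds_per_year


def expected_false_matches(n, fmr_per_comparison):
    """Expected false matches if each comparison errs independently at the given rate."""
    return pairwise_checks(n) * fmr_per_comparison


checks = pairwise_checks(100_000_000)
print(f"{checks:,} checks")
print(f"{years_to_complete(checks, 10_000_000):.1f} years at 10 million matches/s")

# Assumption: per-comparison false match rate, not stated in this form on the slide.
ASSUMED_FMR = 1e-9
for n in (1_000_000, 100_000_000):
    print(f"{n:,} enrollments: about {round(expected_false_matches(n, ASSUMED_FMR)):,} false matches")
```

Because the number of checks grows with the square of the population, a hundredfold increase in enrollments multiplies both the work and the expected false matches by roughly ten thousand.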

Page 11

Face the Challenges

• Identity Establishment with All Data Sources
  - Leverage Entity Resolution Technologies
  - Leverage 'Context Accumulation'
• Biometrics Services in the Cloud
  - Leverage Big Data Infrastructure, Platforms
  - Leverage Software Services
• Biometrics and Identity Analytics in Motion
  - Monitor quality
  - Monitor performance

Page 12

Identity Establishment with All Sources

• Biometrics (physical and behavioral)
• Biographic information
• Behavior data (social media usage)
• Travel data (API, PNR)
• Credit card/banking information
• Web or mobile app usage behavior
  - Emails
  - Multimedia
• Spatial and temporal information

Entity/Identity Resolution with All Sources

Entity/identity resolution is a complex process involving the application of sophisticated algorithms across multiple heterogeneous data sources to resolve multiple records into a single fused view of an individual.

• Reduce search space and computing resources
• Complement to low-quality images
• Cost and benefits tradeoff
• Systematic research necessary
• Successful programs
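As a toy illustration of resolving records from several sources into one fused view, the sketch below links records that share an identifying value and merges them transitively with a union-find structure. It is a minimal example under strong assumptions, not the entity resolution technology the slides refer to: matching is exact, and the field names (`passport`, `email`) and sample records are hypothetical. Production systems typically add fuzzy matching and scoring.

```python
from collections import defaultdict


def resolve_entities(records, key_fields):
    """Group records that share any value on any key field (transitively).

    records: list of dicts; key_fields: names of fields treated as identifying.
    Returns a list of clusters, each a sorted list of record indexes.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    # Index records by (field, value) so only records sharing a value are
    # linked; this avoids comparing every pair of records.
    first_seen = {}
    for idx, rec in enumerate(records):
        for field in key_fields:
            value = rec.get(field)
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Hypothetical records from three sources.
records = [
    {"source": "travel", "passport": "X123", "email": None},
    {"source": "banking", "passport": None, "email": "a@example.org"},
    {"source": "web", "passport": "X123", "email": "a@example.org"},
    {"source": "travel", "passport": "Y999", "email": None},
]
print(resolve_entities(records, ["passport", "email"]))
```

The first two records have nothing in common directly, but the third shares a passport number with one and an email address with the other, so all three end up in the same cluster. Indexing by shared values rather than comparing all pairs is also one simple way to reduce the search space.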

Page 13

Biometrics Services in the Cloud - Leverage Big Data Infrastructure, Platform and Software Services

[Diagram: cloud service stack]
• Cloud Solutions (Software and Business Process as a Service): Smarter Commerce; Smarter Cities; Social Business; Business Analytics and Optimization; Enterprise+
• Cloud Services (Infrastructure and Platform as a Service)
  - Application Services: Application Lifecycle; Application Resources; Application Environments; Application Management; Integration
  - Infrastructure Platform: Management and Administration; Availability and Performance; Security and Compliance; Usage and Accounting; Enterprise
• Service models: Infrastructure (IaaS); Platform (PaaS); Software (SaaS); Business Process (BPaaS)
• Deployment: Private, Public and Hybrid Models

[Diagram: biometric services behind a standard interface]
• Standard Interface
• Services: Enrolment Service; 1:1 Identification Service; ….
• Biometric data: Fingerprint; Iris; Face
• Process Data (repeated across parallel processing nodes)

Note: Cloud & Big Data are not the same
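One way to picture the "Standard Interface" box is as an abstract service contract that any backend can implement. The sketch below is purely illustrative: the class and method names (`BiometricService`, `enroll`, `verify`) are hypothetical and do not correspond to any product or standard named in the slides. The toy backend compares templates by exact equality, whereas a real matcher would compute a similarity score against a threshold.

```python
from abc import ABC, abstractmethod


class BiometricService(ABC):
    """Hypothetical vendor-neutral interface; method names are illustrative."""

    @abstractmethod
    def enroll(self, subject_id: str, modality: str, template: bytes) -> None:
        ...

    @abstractmethod
    def verify(self, subject_id: str, modality: str, template: bytes) -> bool:
        ...


class InMemoryService(BiometricService):
    """Toy backend that matches templates by exact byte equality."""

    def __init__(self):
        self._store = {}

    def enroll(self, subject_id, modality, template):
        self._store[(subject_id, modality)] = template

    def verify(self, subject_id, modality, template):
        return self._store.get((subject_id, modality)) == template


# Callers depend only on the interface, so the backend can be swapped.
service: BiometricService = InMemoryService()
service.enroll("subject-1", "fingerprint", b"\x01\x02")
print(service.verify("subject-1", "fingerprint", b"\x01\x02"))
print(service.verify("subject-1", "iris", b"\x01\x02"))
```

Client code written against the abstract class does not change when a different matching backend is substituted, which is the property a standard interface is meant to provide.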

Page 14

Exemplary Progress

• A prototype: leveraging the cloud for Big Data biometrics
  - E. Kohlwey et al., "Leveraging the Cloud for Big Data Biometrics", 2011
  - A prototype system for generalized searching of cloud-scale biometric data, as well as an application of this system to the task of matching a collection of synthetic human iris images
  - Implemented with Hadoop (MapReduce framework)
• Successful deployment of identification algorithms for the India UID program
  - Non-traditional matching vendor technologies
• Biometrics as a Service
  - Business process as a service
  - Software as a service
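The map/reduce pattern behind such a prototype can be sketched in plain Python. This is not the Hadoop API and not the cited prototype's implementation. It assumes the gallery is already split into non-empty partitions, that iris codes are small integers compared by Hamming distance, and that the subject IDs and 8-bit codes are made-up sample data (real iris codes are far longer and carry masks).

```python
from functools import reduce


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def map_partition(probe, partition):
    """Map step: best (distance, subject_id) within one non-empty gallery partition."""
    return min((hamming(probe, code), subject_id) for subject_id, code in partition)


def reduce_best(left, right):
    """Reduce step: keep the closer of two candidate matches."""
    return min(left, right)


def search(probe, partitions):
    """Run the map step on every partition, then reduce to a single best match."""
    candidates = map(lambda part: map_partition(probe, part), partitions)
    return reduce(reduce_best, candidates)


# Made-up gallery split across two partitions.
partitions = [
    [("s1", 0b10110010), ("s2", 0b01001101)],
    [("s3", 0b10110110), ("s4", 0b11111111)],
]
print(search(0b10110011, partitions))
```

Each map call touches only its own partition, so partitions can be searched on separate machines, and the reduce step only has to combine one small candidate per partition.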

Page 15

Challenges

• Focus on Parallelism and Scalability
  - Excellent research and testing areas
  - Bring algorithms into the operational environment
• Explore defining a biometrics-as-a-service program: a new way of thinking about acquisition
  - Business process as a service
  - Software as a service
• Encourage partnership among Big Data & Analytics developers and traditional biometrics solution providers
  - Big Data and Analytics players

Page 16

Big Data Appliance Examples

• IBM Netezza
• Oracle Exadata
• Teradata
• EMC Greenplum
• SAP HANA
• Schooner Appliance MySQL

Example (CBP): 40 TB of data per appliance (a few hundred cores each), hosted on a little more than a dozen appliances, supports 30-40% of DHS's operations.

Page 17

Biometrics and Identity Analytics in Motion

• ROC curve calibration along the security vs. convenience tradeoff
  - Allow systems to dynamically change operating criteria based on the live situation
  - This is a real challenge due to the needed ground truth…
• Quality Feedback to the Collection
  - Avoid collecting 'bad' data that degrades the system
• Operating Metrics Monitoring
  - Rates of enrollment, rejection, etc.
  - Geo-location and temporal information
• Fuse all data sources based on real-time feedback
  - Dynamically allocate fusion algorithms and configurations
• Provide controlled parallelism
  - System and algorithm levels
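The security vs. convenience tradeoff can be made concrete by recomputing the operating threshold from labeled match scores. The sketch below is a minimal illustration, assuming that higher scores mean a better match and that ground-truth labels (genuine vs. impostor) are available, which is exactly the hard part the slide points out. The scores are made-up sample values, and `pick_threshold` is a hypothetical helper, not part of any system described in the deck.

```python
def error_rates(genuine, impostor, threshold):
    """FMR and FNMR at a threshold, where score >= threshold means 'match'."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def pick_threshold(genuine, impostor, max_fmr):
    """Lowest observed score threshold whose FMR does not exceed max_fmr."""
    for t in sorted(set(genuine) | set(impostor)):
        fmr, _ = error_rates(genuine, impostor, t)
        if fmr <= max_fmr:
            return t
    return None


# Made-up comparison scores with known ground truth.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.10, 0.35, 0.42, 0.70, 0.22]

strict = pick_threshold(genuine, impostor, max_fmr=0.0)    # security first
relaxed = pick_threshold(genuine, impostor, max_fmr=0.2)   # convenience first
print("strict:", strict, error_rates(genuine, impostor, strict))
print("relaxed:", relaxed, error_rates(genuine, impostor, relaxed))
```

Tightening the allowed false match rate pushes the threshold up and rejects more genuine users; relaxing it does the opposite. A system that changes its operating criteria dynamically would rerun this kind of selection as fresh labeled scores arrive.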

Page 18

One Approach - Streams Technology at Work

Near Real Time on Big Data Platform

• Continuous ingestion
• Continuous analysis

Achieve scale:
  - By partitioning applications into software components
  - By distributing across stream-connected hardware hosts

Infrastructure provides services for:
  - Scheduling analytics across hardware hosts
  - Establishing streaming connectivity

Operators: Transform; Filter / Sample; Classify; Correlate; Annotate

Where appropriate, elements can be fused together for lower communication latency.
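The operator chain named on this slide can be imitated with Python generators, where each stage consumes events one at a time and passes results downstream. This is only a single-process sketch of the idea, not the API of any streams product. The event fields (`id`, `quality`), the quality cutoffs and the site name are made-up assumptions, and the Correlate operator is omitted for brevity.

```python
def transform(events):
    """Normalize raw events (here: clamp quality into the 0..100 range)."""
    for e in events:
        yield {**e, "quality": max(0, min(100, e["quality"]))}


def filter_low_quality(events, min_quality):
    """Drop events whose capture quality is below the cutoff."""
    for e in events:
        if e["quality"] >= min_quality:
            yield e


def classify(events):
    """Attach a coarse label derived from the quality score."""
    for e in events:
        yield {**e, "label": "good" if e["quality"] >= 80 else "marginal"}


def annotate(events, site):
    """Tag each event with the site that processed it."""
    for e in events:
        yield {**e, "site": site}


def pipeline(source):
    # Generators are chained lazily, so events are pulled through the
    # stages one at a time instead of being batched between stages.
    return annotate(classify(filter_low_quality(transform(source), 40)), "site-A")


# Made-up capture events.
source = [
    {"id": 1, "quality": 95},
    {"id": 2, "quality": 20},
    {"id": 3, "quality": 130},
    {"id": 4, "quality": 55},
]
for event in pipeline(iter(source)):
    print(event)
```

Chaining the stages inside one process is loosely analogous to fusing operators for lower communication latency; a distributed deployment would instead place stages on different hosts and connect them over the network.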

Page 19

Summary

• Re-focus on Identity
  - Biometrics as an enabling technology
• Re-thinking on
  - Open architecture
  - Vendor-agnostic solution via biometrics middleware
• Big Impact by Big Data and Cloud Technologies
  - Biometrics as a Service to leverage cloud computing
• Big Data Real-Time Platform
  - Near-real-time analytics requirements


Page 21

A New Look - Identity and Biometrics Analytics

[Diagram: identity and biometrics analytics on a Big Data platform]
• Inputs
  - High volume: Travel Data; Banking Data; Spatial Data; Temporal Data
  - Real-time feeds: Biometrics Capture Data; Biographic Data
  - Unstructured data: Social Media; Info on Web; Behavioral data
• Processing
  - Stream in Parallel (Real Time)
  - Big Data Platform
  - Entity/Identity Resolution
  - Big Data Solution
  - Pipeline Identification Services, including many models
  - Massively Parallel Processing
• Outputs
  - Report - Descriptive Analytics
  - Predictive Models
  - Business Workflow Resolution
  - Visualization Analytics
  - Content Analytics