Usama Fayyad talk in South Africa: From BigData to Data Science
79
From BigData to Data Science: Predictive Analytics in a Changing Data Landscape Usama Fayyad, Ph.D. Group Chief Data Officer and CIO Risk, Finance, & Treasury Technology Barclays Twitter: @usamaf Overviews for South Africa July 15-16, 2015
Usama Fayyad talk in South Africa: From BigData to Data Science
1. From BigData to Data Science: Predictive Analytics in a
Changing Data Landscape Usama Fayyad, Ph.D. Group Chief Data
Officer and CIO Risk, Finance, & Treasury Technology Barclays
Twitter: @usamaf Overviews for South Africa July 15-16, 2015
2. Outline Big Data all around us: the problems that Data
Science is NOT addressing The CDO role and Data Axioms Some of the
issues in BigData Case studies Context and sentiment analysis A
case-study on IOT Yahoo! predictions at scale Summary and
conclusions
3. Why should banks worry about data? 100 years ago Smaller
Much more local Deep understanding of customer needs Deep knowledge
of customer Now Digital, decline of the branch Global No face of
the customer New generation of millennials A big part of banking
was knowing the customer intimately personal relationships making
KYC, risk modelling simple
4. Why data is important: Every customer interaction in an
opportunity to capture data, learn & act An instant car loan
product offering is displayed in the app Connie logs into BMB to
check her balance. From cookies and browsing behaviour we know she
is looking for a car loan CRM data is used to pre-calculate Connies
borrowing limit for a car loan Based on multiple internal and
external data sources and predictive models, we identify cross-sell
opportunities Connie is offered a competitively priced bespoke
offer for car servicing / MOT Connies journey is enhanced based on
previous multivariate testing results User experience is
continuously, iteratively improved by capturing user interaction in
real- time during every session Connie has a personalised journey
based on pre-calculated limit for the car loan amount Data Capture/
Opportunity to learn Actions Customer Interaction Event (branch,
telephony, digital, mobile, sales) Millions of customer
interactions per day Customer Interaction CRM Predictive Analytics
Multivariate Testing Targeted offers during browsing Product
Discovery Cross-sell Measure feedback per session Every interaction
is an opportunity to capture data, to learn and to act Big Data
platform Imagine we could move back to this model 100 years ago on
demand, consumable, understandable information to build intimate
relationships & an understanding of the customer
5. What matters in the age of analytics? 1. Being able to
exploit all the data that is available Not just what you have
available What you can acquire and use to enhance your actions 2.
Proliferating analytics throughout the organization Make every part
of your business smarter Actions and not just insights 3. Driving
significant business value Embedding analytics into every area of
your business can significantly drive top line revenues and/or
bottom line cost efficiencies
6. Data Fusion: Can we bring all data together for analysis
& action? Customer and Client Interactions Central Data Fusion
Engine Ingesting, persisting, processing and servicing in
Real-time. Analysis TransactionsSocial Data Trade Application Logs
Network Traffic RiskMarketingFinancial CrimeFraudCyber Security
DaaS
7. Big Data: is a mix of structured, semi-structured, and
unstructured data Typically breaks barriers for traditional RDB
storage Typically breaks limits of indexing by rows Typically
requires intensive pre-processing before each query to extract some
structure usually using Map-Reduce type operations Above leads to
messy situations with no standard recipes or architecture: hence
the need for data scientists Conduct Data Expeditions Discovery and
learning on the spot Why Big Data? A new term, with associated Data
Scientist positions A Data Scientist is someone who Knows a lot
more software engineering than Statisticians & Knows a lot more
Statistics than software engineers
8. The 4-Vs of Big Data Big Data is Characterized by the 3-Vs:
Volume: larger than normal challenging to load/process Expensive to
do ETL Expensive to figure out how to index and retrieve Multiple
dimensions that are key Velocity: Rate of arrival poses real-time
constraints on what are typically batch ETL operations If you fall
behind catching up is extremely expensive (replicate very expensive
systems) Must keep up with rate and service queries on-the-fly
Variety: Mix of data types and varying degrees of structure
Non-standard schema Lots of BLOBs and CLOBs DB queries dont know
what to do with semi-structured and unstructured data
9. 9 Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
Male, age 32 Lives in SF Lawyer Searched on from London last week
Searched on: Italian restaurant Palo Alto Checks Yahoo! Mail daily
via PC & Phone Has 25 IM Buddies, Moderates 3 Y! Groups, and
hosts a 360 page viewed by 10k people Searched on: Hillary Clinton
Clicked on Sony Plasma TV SS ad Registration Campaign Behavior
Unknown Spends 10 hour/week On the internet Purchased Da Vinci Code
from Amazon Classic Data: e.g. Yahoo! User DNA
10. 10 Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
Male, age 32 Lives in SF Lawyer Searched on from London last week
Searched on: Italian restaurant Palo Alto Checks Yahoo! Mail daily
via PC & Phone Has 25 IM Buddies, Moderates 3 Y! Groups, and
hosts a 360 page viewed by 10k people Searched on: Hillary Clinton
Clicked on Sony Plasma TV SS ad Spends 10 hour/week On the internet
Purchased Da Vinci Code from Amazon How Data Explodes: really big
Social Graph (FB) Likes & friends likes Professional netwk -
reputation Web searches on this person, hobbies, work,
locationMetaData on everything Blogs, publications, news, local
papers, job info, accidents
11. The Distinction between Classic Data and Big Data is fast
disappearing Most real data sets nowadays come with a serious mix
of semi- structured and unstructured components: o Images o Video o
Text descriptions and news, blogs, etc o User and customer
commentary o Reactions on social media: e.g. Twitter is a mix of
data anyway Using standard transforms, entity extraction, and new
generation tools to transform unstructured raw data into
semi-structured analyzable data
12. Text Data: The Big Driver We speak of big data and the
Variety in 3-Vs Reality: biggest driver of growth of Big Data has
been text data o Most work on analysis of images and video data has
really been reduced to analysis of surrounding text Nowhere more so
than on the internet Map-Reduce popularized by Google to address
the problem of processing large amounts of text data: o Many
operations with each being a simple operation but done at large
scale o Indexing a full copy of the web o Frequent re-indexing
13. A few words on: The Chief Data Officer Why are companies
creating this position? There is a fundamental realisation that
Data needs to become a primary value driver at organizations o We
have lots of Data o We spend much on it: in technology and people o
We are not realising the value we expect from it A strong business
need to create the CDO role: o Traditional companies are not
following, but adopting the model that actually works in other
data-intensive industries CDO has a seat at executive table: the
voice of Data Data done right is an essential element to unify
large enterprises to unlock value from business synergies
14. Fundamental Data Principles to Support Analytics Usamas
Obvious Data Axioms 1. Data gains value exponentially when
integrated and coalesced. When fragmented: dramatic value loss
takes place; Increased costs; Reduced utility/integrity; Increased
security risks 2. Fusing Data together from disparate/independent
sources is difficult to achieve and impossible to maintain Hence
only viable approach is: Intercepting and documenting at the source
Fusing at the source Controlling lifecycle and flow
15. Fundamental Data Principles to Support Analytics Usamas
Obvious Data Axioms 3. Standardisation is essential For sustained
ability to integrate data sources and hence growing value; For
simplifying down-stream systems and apps For enforcing discipline
as a firm increases its data sources 4. Data governance and policy
must be centralised Needs to be enforced strongly else we slip into
chaos and a Babylon of terms/languages An Enterprise Data
Architecture spanning structured and unstructured data 5. Recency
Matters -> data streaming in modelling and scoring Often,
accuracy of prediction drops quickly with time (e.g. consumer
shopping) Value of alerts drop exponentially with time Ability to
trigger responses based on real-time scoring critical Streaming,
real-time model updates, real-time scoring
16. Fundamental Data Principles to Support Analytics Usamas
Obvious Data Axioms 6. Abstraction layer of data from
Apps/platforms is a must Rapid renewal & modernization: the
pace of change and development of technology are very rapid Design
for migration and infrastructure replacement via abstraction layers
that remove tech dependencies Encryption and Masking: Persisting
unencrypted confidential and secret data (even within secure
firewalls) is an invitation for problems and risks 7. Data is a
primary competency and not a side-activity supporting other
processes Hence specialized skills and know-how are a must
Generalists will create a hopeless mess Data is difficult:
modelling, architecture, and design to support analytics
17. Driving the need for integrated processes, data &
architecture Regulatory Demand Data quality, completeness, lineage
and aggregation Company financial reporting presented at most
granular level through various lenses: business units, legal
entities, regional, sector level, Faster turnaround for
increasingly complex ad hoc data requests Additional sophisticated
stress tests delivered in a joined-up manner between Risk and
Finance Enhanced hybrid capital computation methodologies Business
Drivers Better and smarter controls Capital precision Better
customer experience Better colleagues experience More cost
effective
18. Data Governance is BORING for most Policy, governance &
control Business ownership & data stewardship Definitions &
metadata Permit to build (change governance) Reporting &
analytics Data architecture Data sourcing Reconciliation Data
quality management Documentation, tracking & audit but it
shouldnt be!
19. Reality check: So what should users of analytics in Big
Data world do? Load the Data into a Data Lake Your life will be so
much easier as you can now do Data Acrobatics & other amazing
data feats
20. The Data Lake according to Waterline We loaded the Data!
Congratulations Now What?
21. From a Data Lake to Amazon Browser Amazon Simple Search
Amazon gives you facets Product details
22. Reality check: So where do analysts & Data Scientists
spend all their time? Lets Mine the Data
23. Reality check: So what do Technology people worry about
these days? To Hadoop or not to Hadoop? When to use techniques
requiring Map-Reduce and grid computing? Typically organizations
try to use Map-Reduce for everything to do with Big Data This is
actually very inefficient and often irrational Certain operations
require specialized storage o Updating segment memberships over
large numbers of users o Defining new segments on user or usage
data
24. Drivers of Hadoop in large enterprises Cost of storage
Fastest growing demand is more storage Data in Data Warehouses have
traditionally required expensive storage technology: $20K to $50k
per terabyte per year cost of premium storage $2K per terabyte much
lower per year cost of Hadoop on commodity storage
25. Analysis & programming software PIG HIPI
26. Hadoop Stack
27. Big Data Landscape
28. Data Platforms Landscape Map
29. Reality check: If storage is biggest driver of Hadoop
adoption, what is next biggest? ETL Replaces expensive licenses
Much higher performance with lower infrastructure costs
(processors, memory) Flexibility in changing schema and
representation Flexibility on taking on unstructured and
semi-structured data Plus suite of really cool tools
30. Turning the 3-Vs of Big Data into Value Understand context
and content What are appropriate actions? Is it Ok to associate my
brand with this content? Is content sad?, happy?, serious?,
informative? Understand community sentiment What is the emotion? Is
it negative or positive? What is the health of my brand online?
Understand customer intent? What is each individual trying to
achieve? Can we predict what to do next? Critical in cross-sell,
personalization, monetization, advertising, etc
31. Many business uses of predictive analytics Analytic
technique Uses in business Marketing and sales Identify potential
customers; establish the effectiveness of a Understanding customer
behavior model churn, affinities, propensities, Web analytics &
metrics model user preferences from data, collaborative filtering,
targeting, Fraud detection Identify fraudulent transactions Credit
scoring credit worthiness of a customer requesting a loan, secured
and loans Manufacturing process analysis Identify the causes of
manufacturing problems Portfolio trading optimize a portfolio of
financial instruments by maximizing returns & minimizing risks
Healthcare Application fraud detection, cost optimization,
detection of events like epidemics, etc... Insurance fraudulent
claim detection, risk assessment Security and Surveillance
intrusion detection, sensor data analysis, remote sensing,
detection, link analysis, etc... Application Log File Analytics
Understanding application, network, event logs in IT
32. Case Studies: 1. Context Analysis (unstructured data) 2.
IOT Case Study 3. Yahoo! Predictive Modeling
33. Reality Check So who is the company we think is best at
handling Big Data?
34. Case Study: Biggest Big Data in advertising? Understanding
context for ads What Ad would you place here?
35. Case Study: Biggest Big Data in advertising? Understanding
context for ads Damaging to Brand?
36. 38 Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
The Display Ads Challenge Today What Ad would you place here?
37. NetSeer: Solving accuracy issues Ambiguity, waste, brand,
safety Why did Google Serve this Ad? 39 this is how NetSeer
actually sees this content
38. NetSeer: How it works 40 high MPG ford low emission fuel
efficiency ECONOMY CARS economy vehicles microscope lenses reading
glasses autofocus bifocal refraction VISION TOOLS eye chart focus
groups A/B testing consumer study surveying blind study analytics
MARKET RESEARCH ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ WEBSITE.COM ~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ electric vehicles
service record safety rating focus DISCERNS AND MONETIZES HUMAN
INTENT + Identifies Concepts expressed on a page + Disambiguates
language + Builds increasingly rich profile over time 52M 2.3B
CONCEPTS RELATIONSHIPS BETWEEN CONCEPTS
39. NetSeer: Intent for Display Positive vs negative content re
a particular topic
40. Problem: Hard to understand user intent Contextual Ad
served by Google What NetSeer Sees:
41. Case Studies: 1. Context Analysis (unstructured data) 2.
IOT Case Study 3. Yahoo! Predictive Modeling
42. The Connected Cow Joseph Sirosh VP, Machine Learning
Microsoft
43. Case Studies: 1. Context Analysis (unstructured data) 2.
IOT Case Study 3. Yahoo! Predictive Modeling
44. Yahoo! One of Largest Destinations on the Web 80% of the
U.S. Internet population uses Yahoo! Over 600 million users per
month globally! Global network of content, commerce, media, search
and access products 100+ properties including mail, TV, news,
shopping, finance, autos, travel, games, movies, health, etc. 25+
terabytes of data collected each day Representing 1000s of
cataloged consumer behaviors More people visited Yahoo! in the past
month than: Use coupons Vote Recycle Exercise regularly Have
children living at home Wear sunscreen regularly Sources: Mediamark
Research, Spring 2004 and comScore Media Metrix, February 2005.
Data is used to develop content, consumer, category and campaign
insights for our key content partners and large advertisers
45. Yahoo! Big Data A league of its own Terrabytes of
Warehoused Data 25 49 94 100 500 1,000 5,000 Amazon Korea Telecom
AT&T Y!LiveStor Y!Panama Warehouse Walmart Y!Main warehouse
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING TRAVEL, CREDIT CARD
PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET Y! Data Challenge
Exceeds others by 2 orders of magnitude Millions of Events
Processed Per Day 50 120 225 2,000 14,000 SABRE VISA NYSE YSM Y!
Global
46. Behavioral Targeting (BT) Search Ad Clicks Content Search
Clicks BT Targeting ads to consumers whose recent behaviors online
indicate which product category is relevant to them
47. Male, age 32 Lives in SF Lawyer Searched on from London
last week Searched on: Italian restaurant Palo Alto Checks Yahoo!
Mail daily via PC & Phone Has 25 IM Buddies, Moderates 3 Y!
Groups, and hosts a 360 page viewed by 10k people Searched on:
Hillary Clinton Clicked on Sony Plasma TV SS ad Registration
Campaign Behavior Unknown Spends 10 hour/week On the internet
Purchased Da Vinci Code from Amazon Yahoo! User DNA On a per
consumer basis: maintain a behavioral/interests profile and
profitability (user value and LTV) metrics
48. How it works | Network + Interests + Modelling Analyze
predictive patterns for purchase cycles in over 100 product
categories In each category, build models to describe behaviour
most likely to lead to an ad response (i.e. click). Score each user
for fit with every categorydaily. Target ads to users who get
highest relevance scores in the targeting categories Varying
Product Purchase CyclesMatch Users to the ModelsRewarding Good
BehaviourIdentify Most Relevant Users
49. Recency Matters, So Does Intensity Active now and with
feeling
50. Differentiation | Category specific modelling time
intensityscore time intensityscore IntenseClickZone Example 1:
Category Automotive Example 2: Category Travel/Last Minute
Different models allow us to weight and determine intensity and
recency Alt Behaviour 1: 5 pages, 2 search keywords, 1 search
click, 1 ad click Alt Behaviour 1: 5 pages, 2 search keywords, 1
search click, 1 ad click IntenseClickZone
51. Differentiation | Category specific modelling time
intensityscore Intense Click Zone Example 1: Category Automotive
Different models allow us to weight and determine intensity and
recency with no further activity, decay takes effect Alt Behaviour
1: 5 pages, 2 search keywords, 1 search click, 1 ad click user is
in the Intense Click Zone
52. Automobile Purchase Intender Example A test ad-campaign
with a major Euro automobile manufacturer Designed a test that
served the same ad creative to test and control groups on Yahoo
Success metric: performing specific actions on Jaguar website Test
results: 900% conversion lift vs. control group Purchase Intenders
were 9 times more likely to configure a vehicle, request a price
quote or locate a dealer than consumers in the control group ~3x
higher click through rates vs. control group
53. Mortgage Intender Example We found: 1,900,000 people
looking for mortgage loans. +122% CTR Lift Mortgages Home Loans
Refinancing Ditech Financing section in Real Estate Mortgage Loans
area in Finance Real Estate section in Yellow Pages +626% Conv Lift
Example search terms qualified for this target: Example Yahoo!
Pages visited: Source: Campaign Click thru Rate lift is determined
by Yahoo! Internal research. Conversion is the number of qualified
leads from clicks over number of impressions served. Audience size
represents theaudiencewithinthis behavioralinterest categorythat
hasthe highestpropensitytoengagewitha brandorproductandtoclickon
anoffer. Date: March 2006 Results from a client campaign on Yahoo!
Network Example: Mortgages
54. Experience summary at Yahoo! Dealing with one of the
largest data sources (25 Terabyte per day) Behavioral Targeting
business was grown from $20M to > $400M in 3 years of
investment! Yahoo! Specific? -- BigData critical to operations Ad
targeting creates huge value Right teams to build technology (3
years of recruiting) Search is a BigData problem (but this has
moved to mainstream)
55. Lessons Learned A lot more data than qualified talent
Finding talent in BigData is very difficult Retaining talent in
BigData is even harder At Yahoo! we created central group that
drove huge value to company Data people need to feel like they have
critical mass Makes it easier to attract the right people Makes it
easier to retain Drive data efforts by business need, not by
technology priorities Chief Data Officer role at Yahoo! now
popular
56. Where does this leave us? 1. What matters in the age of
analytics?. Being able to exploit all the data that is available
Proliferating analytics throughout the organization Driving
significant business value 2. Where are we today? New Data
Landscape via Hadoop Confusion by marketers, analysts, and
technical community Struggling with basics of managing data because
of the new flexibility 3. What are the real issues? Data management
and governance Talent for Data and Data Science is rare but
critical Data and Analytics are specialisms that need management
and know-how: not for generalists
57. The early days of mass auto production
58. Todays Auto: It just works! No need to understand what
happens when you turn on ignition Very complex inside, but all
simplicity on the outside
59. Data Science, AI, Algorithms Data Science Artificial
Intelligence / Machine Learning Algorithms Data as a Service (DaaS)
to provide our business units with better, faster, cheaper data
Algorithms Create leading edge Algorithms for Automation via AI,
Machine Learning, and leading edge data science Automation Tech via
AI CoE for Intelligent Automation via AI, ML, case-based reasoning
& Planning Secure Cloud Data Lab Secure Cloud data lab with
open source stack to be able to leverage external datasets and work
with the start-ups to innovate at scale Experimentation evaluation
Data-fuelled evaluation of experimentation at scale, e.g. digital
experiences empowered by data Data Solutions for Customers/ Clients
Create a Lab for innovative solutions for customers and Clients KEY
ADVANCED CAPABILITIES FRAUD / CYBERSECURITY Reduced false alarms
Less write-offs Better intrusion detection, cyber attacks, internal
financial crime BUSINESS IMPACT - EXAMPLES RAPID AUTOMATION
Learning algorithms automate and roboticise manual tasks
automatically by crowd-sourcing Automated reporting and reduced
manual reconciliation SMARTER TRADING ALGORITHMS Smarter trading
algorithms and quantitative analytics enabled by new and more
powerful technology TARGETED MARKETING Leverage automated learning
and classification to customer acquisition, retention, and
activation REVENUE DRIVER Leverage social media / unstructured data
for Cross-sell, up-sell, next-best-action, CRM, etc.
60. Barclays Local Insights: Providing insights about people
www.insights.barclays.co.uk
61. Application Dates: Rising Eagles Graduate Programme:
April-June | Explorer Programme: 1 week in July We recruit on a
rolling bases & early applications are advised find out more
and apply at joinus.barclays.com/Africa
62. Technology opportunities at Barclays We seek world class
technologists, data scientists, problem solvers, Data systems
engineers the team that will reinvent Financial Services Create
game changing new products and services for our Africa businesses
across retail, business banking, Card, Corporate/IB, and
Wealth/Investment management that do not only generate new
experiences and growth but new business models. We are looking for
hungry data talent that is looking to move the needle in all our
businesses. Yassi Hadjibashi Chief Data Officer, Barclays Africa -
DSI Africa is a huge growth opportunity for Barclays. If you know
your Data Science, We want you! If you love Hadoop & BigData,
We love you! And if you want to be part of an industry
transformation around Data in Financial Services, come help us make
it happen! Dr. Usama Fayyad Group Chief Data Officer, Barclays Bank
Security, Data, and Software as code. Basically everything
automated for sophisticated rapid innovation, deployment cycles,
and new age software engineering. Come join our journey and drive
this with backing of most senior leadership and attention!!! Peter
Rix Chief Technology Officer, Barclays Africa