Data Science and Your Financial Journey 10/3/2015 Winnie Cheng, Chief Data Scientist @Bankrate.com

UIUC Reflections 2015: Data Science and Your Financial Journey

Page 1: UIUC Reflections 2015: Data Science and Your Financial Journey

Data Science and Your Financial Journey

10/3/2015

Winnie Cheng, Chief Data Scientist

@Bankrate.com

Page 2: UIUC Reflections 2015: Data Science and Your Financial Journey

What is Bankrate.com?

EMPOWERS YOU TO MANAGE $ FOR YOUR LIFE GOALS

Image Source: http://www.blue-fs.co.uk/

Credit Cards

Mortgage

Auto Loans

Investments

Insurance

Retirement

Page 3: UIUC Reflections 2015: Data Science and Your Financial Journey

Faces behind Bankrate.com

HQ in sunny Palm Beach, Florida

Offices in NYC, SF and major US cities

500+ Employees

Page 4: UIUC Reflections 2015: Data Science and Your Financial Journey

Where does Data Science come in?

EVERYWHERE.

1. Information Relevancy: what you want to know, when you want to know it

2. Search & Reachability: get the word out there

3. Marketplace Intelligence: connect what you need to who we know

SVM

Anomaly Detection

Page 5: UIUC Reflections 2015: Data Science and Your Financial Journey

• Determine: Are you looking to buy a home or plan for retirement?

• User Segmentation divides visitors into groups with similar characteristics

• What do we know?

– For each user, articles read on our site

• Group users with similar preferences into a segment

– Machine learning can help here

• Compute user-to-user similarity

• Apply Hierarchical Clustering Algorithm

Distance = 1 - similarity

User Intent and Segmentation: Who are you?

PREDICTING YOUR LIFE STAGE
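The segmentation steps on this slide (user-to-user similarity, then hierarchical clustering with distance = 1 - similarity) can be sketched roughly as follows. This is a minimal illustration, not the production method: the Jaccard similarity over sets of articles read, the average-linkage rule, the `max_distance` cutoff, and the user and article names are all assumptions made for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity between two users = overlap of the article sets they read (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_users(reads, max_distance):
    """Agglomerative clustering with distance = 1 - similarity (average linkage assumed)."""
    users = sorted(reads)
    dist = {frozenset((u, v)): 1.0 - jaccard(reads[u], reads[v])
            for u, v in combinations(users, 2)}
    clusters = [[u] for u in users]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[frozenset((u, v))] for u in c1 for v in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it, unless it is too far apart.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]

# Hypothetical reading histories: user -> set of article ids.
reads = {
    "u1": {"mortgage-rates", "first-home", "closing-costs"},
    "u2": {"mortgage-rates", "first-home"},
    "u3": {"401k-basics", "ira-rollover"},
    "u4": {"401k-basics", "ira-rollover", "social-security"},
}
segments = cluster_users(reads, max_distance=0.5)
```

With these toy histories the two home-buying readers end up in one segment and the two retirement readers in another; the cutoff controls how coarse the segments are.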

Page 6: UIUC Reflections 2015: Data Science and Your Financial Journey

Nodes: Users

Edges: User-User Similarity

Visualizing User Segments

Page 7: UIUC Reflections 2015: Data Science and Your Financial Journey

Visualizing User Segments

Car Buyers

Tax

Retirement

Home Purchase

Page 8: UIUC Reflections 2015: Data Science and Your Financial Journey

• Previous approach useful for looking at how users naturally form groups

• But what if we want to find a very specific group of users?

– E.g., those who are looking to buy a home for the first time?

• Find list of relevant articles with item-to-item similarity

– Start with a few articles on first-time home buying

– Similarity model helps identify more articles relevant to first-time homebuying

• People that read X also read Y

• If a user has read any of the relevant articles, they are in the group

Finding a specific user segment

TURNING THE PROBLEM AROUND
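One way to read the approach on this slide: score article pairs by how many readers they share ("people that read X also read Y"), grow the seed list with articles similar to a seed, then put every user who read any relevant article into the segment. The sketch below assumes a Jaccard score over reader sets and a fixed similarity threshold; both, along with the article and user names, are illustrative choices rather than details from the talk.

```python
from collections import defaultdict
from itertools import combinations

def item_similarity(reads):
    """Jaccard similarity between articles, based on which users read them (assumed metric)."""
    readers = defaultdict(set)
    for user, articles in reads.items():
        for article in articles:
            readers[article].add(user)
    sim = {}
    for a, b in combinations(sorted(readers), 2):
        both = len(readers[a] & readers[b])
        if both:
            score = both / len(readers[a] | readers[b])
            sim[(a, b)] = sim[(b, a)] = score
    return sim

def expand_seeds(seeds, sim, threshold):
    """Add every article whose similarity to some seed article meets the threshold."""
    relevant = set(seeds)
    for (a, b), score in sim.items():
        if a in seeds and score >= threshold:
            relevant.add(b)
    return relevant

def segment(reads, relevant):
    """A user is in the segment if they read any relevant article."""
    return sorted(u for u, articles in reads.items() if articles & relevant)

# Hypothetical reading histories.
reads = {
    "u1": {"first-home-guide", "fha-loans"},
    "u2": {"first-home-guide", "fha-loans"},
    "u3": {"fha-loans"},
    "u4": {"ira-rollover", "401k-basics"},
    "u5": {"401k-basics"},
}
sim = item_similarity(reads)
relevant = expand_seeds({"first-home-guide"}, sim, threshold=0.5)
members = segment(reads, relevant)
```

Here user u3 never read the seed article but is still picked up, because the article they did read is strongly co-read with the seed.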

Page 9: UIUC Reflections 2015: Data Science and Your Financial Journey

• Product Managers are constantly improving the look-and-feel of our site, and Digital Content teams work hard to produce informative articles and engaging videos.

Moneyball: Site Optimization Framework

ALGO-DRIVEN IMPROVEMENT OF SITE

Moneyball – Decide what works, what doesn’t

• How to determine whether this headline is better than another variation?

• How to serve the ‘best’ headline or design variation in a timely manner?

• How to do this with thousands of components changing at the same time?

Or “Rates are going up with more home buyers”?

Page 10: UIUC Reflections 2015: Data Science and Your Financial Journey

• Want to pick and serve variations that have the highest user engagement (click-through rate, CTR)

• Analogous to an abstract class of math problem: the Multi-Armed Bandit

• Slot Machines in a Casino

– Which machine (arm) to pull next to maximize my chance of hitting the jackpot without going broke?

Multi-Armed Bandit Problem

Slot Machines -> Design Variations

Pull It -> Show it (Page Impression)

Jackpot -> User Click

Cost to Pull -> Cost of showing ‘bad’ variation and losing the click

Casino to Site Optimization

Page 11: UIUC Reflections 2015: Data Science and Your Financial Journey

• Algorithm for multi-armed bandit problem that minimizes the regret

• Key idea:

– For each variation, estimate and model the distribution of CTR

– Pick the next variation by sampling from these distributions and taking the one with the highest sampled CTR

Bayesian Bandit

[Figure: probability (y-axis) vs. CTR (x-axis) for each variation]

W1: 56.37%, W2: 43.63%

W1: 9.82%, W2: 90.18%

Steady state CTR

Variation 1: 25%, Variation 2: 30%
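The key idea above (model each variation's CTR as a distribution, sample from each, serve the highest draw) is commonly implemented as Thompson sampling. The sketch below is one such version under stated assumptions: a Beta distribution with a uniform prior for each CTR, and simulated clicks standing in for real page impressions. The true CTRs of 25% and 30% are borrowed from the slide; everything else is illustrative.

```python
import random

def choose_variation(clicks, impressions, rng):
    """Sample a plausible CTR for each variation and serve the one with the highest draw."""
    draws = [rng.betavariate(1 + c, 1 + (n - c))  # Beta(1, 1) prior is an assumption
             for c, n in zip(clicks, impressions)]
    return max(range(len(draws)), key=draws.__getitem__)

def simulate(true_ctr, rounds, seed=0):
    """Replay the bandit against simulated users with fixed (hidden) click rates."""
    rng = random.Random(seed)
    clicks = [0] * len(true_ctr)
    impressions = [0] * len(true_ctr)
    for _ in range(rounds):
        arm = choose_variation(clicks, impressions, rng)
        impressions[arm] += 1                 # "pull" = show the variation
        if rng.random() < true_ctr[arm]:      # "jackpot" = user click
            clicks[arm] += 1
    return clicks, impressions

clicks, impressions = simulate([0.25, 0.30], rounds=20000)
share = [n / sum(impressions) for n in impressions]
```

Early on both variations get traffic because their sampled CTRs overlap; as evidence accumulates, the distribution for the weaker variation rarely produces the top draw, so most impressions shift to the stronger one without a fixed test period.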

Page 12: UIUC Reflections 2015: Data Science and Your Financial Journey

Big Data Stack to Support Real-Time Streaming

Full-feedback tech stack

• Designed low-latency big data platform from the ground up

• Worked closely with data engineering team

• Ensure 24/7 availability and scalability to multiple data centers

Page 13: UIUC Reflections 2015: Data Science and Your Financial Journey

• Great algorithms and technology to provide relevant information to users

• But how do users find Bankrate.com?

– Portion of traffic comes from major search engines

• SEO team makes our site easily reachable from these engines

• Data Science helps SEO team:

– Understand connectivity of pages within our site

– Assess PageRank of URLs relative to each other

– Perform Path Analysis and identify dead links

Search Engine Optimization (SEO)
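As a rough illustration of two of the analyses listed on this slide, the sketch below runs a textbook power-iteration PageRank over an internal link graph and lists links whose target is missing from the crawl. The URL paths, the damping factor of 0.85, and the definition of a dead link as "target not in the crawled set" are assumptions for the example, not details from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no valid outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages
                       if not any(t in links for t in links[p]))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = [t for t in links[p] if t in links]  # ignore dead targets
            for t in targets:
                new_rank[t] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank

def dead_links(links):
    """(source, target) pairs whose target was not found in the crawl."""
    return sorted((p, t) for p, targets in links.items()
                  for t in targets if t not in links)

# Hypothetical crawl of a small site.
links = {
    "/": ["/mortgages", "/credit-cards"],
    "/mortgages": ["/", "/mortgages/calculator"],
    "/mortgages/calculator": ["/mortgages"],
    "/credit-cards": ["/", "/old-promo"],
}
ranks = pagerank(links)
broken = dead_links(links)
```

Comparing `ranks` across URLs gives the relative importance mentioned above, and `broken` is the list an SEO team could hand to content owners for cleanup.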

Page 14: UIUC Reflections 2015: Data Science and Your Financial Journey

Network Graph with Neo4j

Page 15: UIUC Reflections 2015: Data Science and Your Financial Journey

• Having accurate tags on articles is important as they serve as inputs for content recommendation, other machine learning algorithms and internal data reporting and visualization

• Approach:

– Apply supervised learning to suggest which tags should be associated with a new article based on the words in it

– Text is messy, so several Natural Language Processing (NLP) techniques are needed to get a good set of words

• Stemming, Term Frequency Filtering, etc.

• Topic modeling to visualize processed text

Content Tagging

IMPROVE CATEGORIZATION OF ARTICLES
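To make the text-cleaning step concrete, here is a small sketch of stemming followed by term frequency filtering. The suffix stripper is a deliberately crude stand-in for a real stemmer (such as Porter's), and the `min_count` threshold and sample headlines are assumptions chosen for illustration.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "ly", "s")  # toy list; a real stemmer has many more rules

def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into alphabetic words, and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def filter_by_term_frequency(token_lists, min_count):
    """Drop terms that occur fewer than min_count times across the whole corpus."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [[t for t in tokens if counts[t] >= min_count] for tokens in token_lists]

# Hypothetical article headlines.
docs = [
    "Refinancing loans when rates fall",
    "Refinanced my loan at a lower rate",
    "The loan rate outlook for retirees",
]
tokens = [tokenize(d) for d in docs]
cleaned = filter_by_term_frequency(tokens, min_count=2)
```

After stemming, "Refinancing" and "Refinanced" collapse to the same term, and the frequency filter removes words that appear only once, leaving a smaller vocabulary for the classifier.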

Page 16: UIUC Reflections 2015: Data Science and Your Financial Journey

Topic Modeling

LATENT DIRICHLET ALLOCATION (LDA) CLUSTERING ALGORITHM (# TOPICS=10)

Page 17: UIUC Reflections 2015: Data Science and Your Financial Journey

• Train classification algorithm (RandomForest, SGD)

– Present it with articles and how each is tagged

Tag Suggestion as Classification Problem

SHOULD WE TAG THIS ARTICLE WITH KEYWORD X?

addit isnt mile particularli knock increasingli fear suffer 24 whose privat group monitor monitor monitor monitor monitor expos mother had state better certainli deviat trauma must senat norm woman woman around familiar watch watch solut know fall fall shadow surviv requir choic enabl mother-in-law $59 benefit she she quick bone still old mental replac $100000 see cost cost video home home home home home home home 80 genworth said clock pattern away abl abl abl abl figur figur estim health testifi told plu patient situat base let tub stay sinc care care care care comparison last testimoni my technolog technolog surround fell place unabl committe long-term chang think first first live point schedul independ elder elder alreadi famili famili famili nurs nurs nurs nurs save save fee commun univers regist system system system system system system system long by stuck reli privaci privaci privaci monthli devast immedi devis tell tell friend friend door big hundr electr ridden phone broke $9 instant $3 he 10 october insignific hour possibl provid older bed guilt didnt aliv featur comput almost certain deliv stroke lie engin floor polic other frail imag arent insur normal assist assist reach someon alert alert healthi moment aros medicaid physic billion billion wife professor no spent issu issu issu medicar missouri

Actual Tags: retired, investing

Predicted Tags: retired, investing

Post-Processed Text of Article
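The question "should we tag this article with keyword X?" can be framed as one binary classifier per tag, trained on stemmed words like those above. The slide names RandomForest and SGD; the sketch below takes the SGD route with a logistic loss on bag-of-words features, which is an assumption about the setup rather than a description of it. The tiny training set, learning rate, and epoch count are likewise illustrative.

```python
import math
import random

def predict_proba(tokens, weights, bias):
    """Probability that the tag applies, from binary word-presence features."""
    score = bias + sum(weights.get(t, 0.0) for t in set(tokens))
    return 1.0 / (1.0 + math.exp(-score))

def train_sgd(docs, labels, epochs=200, lr=0.5, seed=0):
    """Logistic regression fitted by stochastic gradient descent, one article at a time."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(docs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = predict_proba(docs[i], weights, bias) - labels[i]
            bias -= lr * error
            for token in set(docs[i]):
                weights[token] = weights.get(token, 0.0) - lr * error
    return weights, bias

# Hypothetical stemmed articles; label 1 means the article carries the tag "retired".
docs = [
    ["retir", "medicar", "nurs", "care"],
    ["elder", "care", "long-term", "insur"],
    ["mortgag", "rate", "home", "buyer"],
    ["credit", "card", "reward", "fee"],
]
labels = [1, 1, 0, 0]
weights, bias = train_sgd(docs, labels)

p_tag = predict_proba(["medicar", "nurs", "care", "wife"], weights, bias)
p_no_tag = predict_proba(["mortgag", "rate", "fee"], weights, bias)
```

Words never seen in training (such as "wife" here) contribute nothing to the score, so the suggestion rests on the known vocabulary; in practice one such model per tag would be trained on the full tagged archive.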

Page 18: UIUC Reflections 2015: Data Science and Your Financial Journey

• Due diligence in onboarding new lenders

How to ensure lenders are protected from malicious traffic and click bots?

• Fraud Detection Models as semi-supervised machine learning

– Identify outliers

– Construct initial training set

– Train model to predict whether a given click is fraudulent or not

• Some observations:

– Foreign countries, site crawl

– Intent inconsistencies

Bankrate.com a Trusted Marketplace

CLICK FRAUD DETECTION – HOW TO ENSURE LENDERS ARE PROTECTED?

Users (Home Buyers)

Banks & Lenders
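The three fraud-detection steps on this slide (identify outliers, construct an initial training set, train a model) might look roughly like the sketch below. It is only a toy version under several assumptions: the two session features (`clicks_per_min`, `seconds_on_page`) are hypothetical, outliers are found with a median/MAD z-score, and the trained model is a simple nearest-centroid classifier rather than whatever was used in practice.

```python
from statistics import median

FEATURES = ("clicks_per_min", "seconds_on_page")  # hypothetical session features

def robust_z(values):
    """Median/MAD-based z-scores; large positive values mark outliers."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0
    return [(v - med) / (1.4826 * mad) for v in values]

def seed_labels(sessions, high=3.5, low=1.0):
    """Steps 1-2: extreme outliers become fraud seeds (1), very typical sessions legit seeds (0)."""
    seeds = {}
    for i, z in enumerate(robust_z([s["clicks_per_min"] for s in sessions])):
        if z > high:
            seeds[i] = 1
        elif abs(z) < low:
            seeds[i] = 0
    return seeds  # sessions in between stay unlabeled

def train_centroids(sessions, seeds):
    """Step 3: fit one centroid per class on the seed set (assumes both classes have seeds)."""
    centroids = {}
    for label in (0, 1):
        rows = [sessions[i] for i, y in seeds.items() if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in FEATURES]
    return centroids

def predict(session, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((session[f] - c) ** 2 for f, c in zip(FEATURES, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

rates = [1, 2, 1, 2, 3, 40, 55, 6]
dwell = [40, 35, 50, 30, 25, 1, 2, 20]
sessions = [{"clicks_per_min": r, "seconds_on_page": d} for r, d in zip(rates, dwell)]

seeds = seed_labels(sessions)
centroids = train_centroids(sessions, seeds)
ambiguous = predict(sessions[7], centroids)
new_click = predict({"clicks_per_min": 35, "seconds_on_page": 2}, centroids)
```

The point of the semi-supervised framing is that nobody hand-labels clicks up front: the outlier step supplies the first labels, and the trained model then scores the ambiguous and future traffic.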

Page 19: UIUC Reflections 2015: Data Science and Your Financial Journey

Model-driven insights:

Can we assess how likely a user is to get a mortgage approval? Suggest remedial actions?

Can we predict CTR and conversion from market conditions and site dynamics?

Can we anticipate demand for specific financial products? (e.g., refinance season)

… and more, to improve your financial journey

Assessing Lead Quality and Demand

TAKING IT FURTHER

Users (Home Buyers)

Banks & Lenders

Page 20: UIUC Reflections 2015: Data Science and Your Financial Journey

Join our Data Science team!

Use Bankrate.com for your $-questions!

[email protected]
