All You Need to Know to Build a Product Knowledge Graph
Nasser Zalmout, Amazon
Chenwei Zhang, Amazon
Xian Li, Amazon
Yan Liang, Amazon
Xin Luna Dong, Amazon → Facebook
Outline
Overview and Introduction: 20 min
Knowledge Extraction: 40 min
Knowledge Cleaning: 25 min
Break: 20 min
Ontology Mining: 25 min
Applications: 20 min
Conclusion and Future Directions: 10 min
Overview and Introduction
Overview and Introduction 20 min
Knowledge Extraction
Knowledge Cleaning
Q&A
Break
Ontology Mining
Applications
Conclusion and Future Directions
Q&A
Knowledge Graph Example for 2 Songs
[Figure: knowledge graph for the songs “Shake it off” and “Love Story”. Entities: mid127, mid128, mid129, mid345, mid346. Entity types: Recording, Genre. Relationships: artist, song_writer, genre, name, birth_date, type. Values: “Taylor Swift”, “Taylor Alison Swift”, 12/13/1989, “Pop”, “Dance-pop”, “Country pop”. Legend: Entity type, Entity, Relationship.]
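The graph in this example can be stored as (subject, relationship, object) triples. The following is a minimal sketch of that idea, not the representation used by the authors. The extracted slide does not preserve which edge connects which node, so the assignment of identifiers to songs, artist, and genres below is assumed for illustration only, and the helper names `build_index` and `artists_of_song` are hypothetical.

```python
# Triples (subject, relationship, object) for the two-song example.
# ASSUMPTION: the ID-to-entity assignment is illustrative; the slide text
# does not say which mid belongs to which song, artist, or genre.
TRIPLES = [
    ("mid345", "type", "Recording"),
    ("mid345", "name", "Shake it off"),
    ("mid345", "artist", "mid127"),
    ("mid345", "genre", "mid128"),
    ("mid346", "type", "Recording"),
    ("mid346", "name", "Love Story"),
    ("mid346", "artist", "mid127"),
    ("mid346", "song_writer", "mid127"),
    ("mid346", "genre", "mid129"),
    ("mid127", "name", "Taylor Swift"),
    ("mid127", "name", "Taylor Alison Swift"),
    ("mid127", "birth_date", "12/13/1989"),
    ("mid128", "type", "Genre"),
    ("mid128", "name", "Dance-pop"),
    ("mid129", "type", "Genre"),
    ("mid129", "name", "Country pop"),
]


def build_index(triples):
    """Group objects by subject and relationship: index[s][p] -> [o, ...]."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return index


def artists_of_song(index, song_name):
    """Follow name -> artist -> name to list all names of a song's artists."""
    names = []
    for properties in index.values():
        if song_name in properties.get("name", []):
            for artist_id in properties.get("artist", []):
                names.extend(index.get(artist_id, {}).get("name", []))
    return names


index = build_index(TRIPLES)
print(artists_of_song(index, "Love Story"))
```

An entity can carry several values for the same relationship (here two names for one artist), which is why each relationship maps to a list rather than a single value.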
Product Graph Example for 2 Songs
[Figure: the song graph for “Shake it off” and “Love Story” (entities mid127, mid128, mid129, mid345, mid346; relationships artist, song_writer, genre, name, birth_date, type), extended with product nodes mid567 to mid571. The product nodes are linked by product edges, have the types Release and Track, and carry the ASIN values B0035QUXWR, B0067XLIG8, B0035QUXWQ, and B0067XLIG4.]
Product Graph Example for 2 Products
Use Case I: Providing Information
Use Case II: Providing Choices
Use Case III: Improving Search
Use Case IV: Improving Recommendation
Product Graph vs. Knowledge Graph
[Figure: three candidate relationships, (A), (B), and (C), between a generic KG and a PG. The selected option (✅) shows a generic KG and a product KG (hardline, softline, consumables, etc.), labeled with Movie, Music, Book, etc.]
Product Graph vs. Knowledge Graph
But, Is The Problem Harder?
[Figure: generic KG and product KG (hardline, softline, consumables, etc.), labeled with Movie, Music, Book, etc.]
We focus on retail products in this tutorial
Challenges in Building Product Graph I
❑ Sparse and noisy structured data
Challenges in Building Product Graph II
❑ Extremely complex domains
❑ How to identify the millions of product types?
❑ How to organize types into a taxonomy tree?
[Figure: Sellers’ view and Buyers’ view]
Challenges in Building Product Graph III
❑ Big variety across product types
❑ Different attributes apply to different product types
❑ Different value vocabularies and different patterns
Scale Up in 3 Dimensions
Thousands of attributes
Millions of categories
Hundreds of languages
Big challenge: Limited training labels for large-scale, rich data
Can We Build A Self-Driving Product Understanding System?
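Our Goal: Self-Driving Product Knowledge Collection
[Figure: AutoKnow takes the Catalog, the Taxonomy, and User logs, and produces the Product KG.]
Catalog:
Product   | Type   | Flavor | Color
Product 1 | Snacks | Cherry |
Product 2 | Candy  | ?      | ?
Product 3 | Candy  | Choc.  | Gold
Taxonomy: Grocery; Snacks, Drinks; Candy
Product KG: taxonomy nodes Grocery, Snacks, Drinks, Candy, Pretzels; products Prod. 1, Prod. 2, Prod. 3 with hasType, flavor, and color edges; value Gold; Choc. linked to Chocolate by a synonym edge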
Dong et al., AutoKnow: Self-driving knowledge collection for products of thousands of types, SigKDD, 2020.
[Figure annotations on the output Product KG: New Type, New Value, Corrected Value]
Amazon AutoKnow: Self-Driving Product Knowledge Collection
● #Types ↑ 3X
● Defect rate ↓ up to 68 percentage points
[Figure: AutoKnow architecture]
Input Data: PT Taxonomy; Catalog; Behavioral Signals (e.g., search logs, reviews, Q&A)
Ontology Suite: Taxonomy Enrichment; Relation Discovery
Data Suite: Data Imputation; Data Cleaning; Synonym Discovery
Ontology
Broad Graph: {product, attribute, value}; {value, synonym, value}
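The two triple shapes named for the Broad Graph, {product, attribute, value} and {value, synonym, value}, can be illustrated with the small catalog from the earlier slides. This is a sketch under stated assumptions, not AutoKnow's implementation: the function names are hypothetical, missing values are simply skipped (imputation is not modeled), and treating "Chocolate" as the canonical form of "Choc." is an assumed direction for the synonym edge.

```python
# Catalog rows from the slide; None marks a missing value.
CATALOG = [
    {"product": "Product 1", "type": "Snacks", "flavor": "Cherry", "color": None},
    {"product": "Product 2", "type": "Candy", "flavor": None, "color": None},
    {"product": "Product 3", "type": "Candy", "flavor": "Choc.", "color": "Gold"},
]

# {value, synonym, value} triples.
# ASSUMPTION: the second value is treated as the canonical form.
SYNONYMS = [("Choc.", "synonym", "Chocolate")]


def catalog_to_triples(rows):
    """Turn catalog rows into {product, attribute, value} triples."""
    triples = []
    for row in rows:
        for attribute, value in row.items():
            if attribute == "product" or value is None:
                continue  # skip the key column and missing values
            predicate = "hasType" if attribute == "type" else attribute
            triples.append((row["product"], predicate, value))
    return triples


def canonical_values(triples, synonyms):
    """Rewrite values to their canonical form using the synonym triples."""
    canonical = {variant: target for variant, _, target in synonyms}
    return [(s, p, canonical.get(o, o)) for s, p, o in triples]


triples = catalog_to_triples(CATALOG)
for triple in canonical_values(triples, SYNONYMS):
    print(triple)
```

Keeping synonyms as their own triples, rather than overwriting catalog values, leaves the original seller-provided value available while still allowing lookups by a normalized form.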
Self Driving to Navigate a Large Space
• Automatic: Fully ML-based
• Annotation free: Weak learning based on existing Catalog data and user behavior
• One-size-fits-all: Few taxonomy-aware models
• Self guidance: Identify important attributes and categories to focus efforts
Key Intuition I. Learning w. Limited Labels Generated from Existing Catalog Data
[Figure: the Catalog table (Product 1 to Product 3 with Type, Flavor, and Color values, some of them missing) and the Taxonomy (Grocery, Snacks, Drinks, Candy)]
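One common way to generate labels from existing catalog data is distant supervision: wherever a catalog value appears in a product's text, the matching tokens are tagged as that attribute. The sketch below illustrates the idea with BIO tags. It is an assumed, simplified labeling scheme for illustration, and the product title and the function name `bio_tags` are hypothetical.

```python
def bio_tags(tokens, value, attribute):
    """Tag every occurrence of `value` in `tokens` with B-/I-<attribute>.

    Matching is case-insensitive and on whole tokens. Tokens that are not
    part of a match keep the tag "O".
    """
    value_tokens = value.lower().split()
    tags = ["O"] * len(tokens)
    if not value_tokens:
        return tags
    lowered = [token.lower() for token in tokens]
    width = len(value_tokens)
    for start in range(len(tokens) - width + 1):
        if lowered[start:start + width] == value_tokens:
            tags[start] = f"B-{attribute}"
            for position in range(start + 1, start + width):
                tags[position] = f"I-{attribute}"
    return tags


# Hypothetical product title paired with a catalog flavor value.
title = "Dark chocolate pretzels with sea salt".split()
print(list(zip(title, bio_tags(title, "dark chocolate", "flavor"))))
```

Labels produced this way are noisy, since a value can appear in the text with a different meaning or be missing from the catalog altogether, so they are usually treated as weak supervision rather than ground truth.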
Key Intuition II. Rich Customer Behavior
[Figure: the product KG (taxonomy nodes Grocery, Snacks, Drinks, Candy, Pretzels; Prod. 1 to Prod. 3 with hasType, flavor, color, and synonym edges) next to customer behavior data (Query 1 to Query 3 and Prod. 1 to Prod. 3). Behavior edge labels: purchase, co-purchase, co-view, mention.]
Key Intuition III. Product Categories as First-Class Citizen in Modeling
[Figure: taxonomy nodes Grocery, Snacks, Drinks, Candy, and Pretzels, with Prod. 1, Prod. 2, and Prod. 3 attached by hasType edges]
Key Intuition IV. Leverage Graph Structure
• Taxonomy is a tree
• KG is a graph
• Customer behavior forms a graph
[Figure: the taxonomy (Grocery, Snacks, Drinks, Candy, Pretzels), the product KG (Prod. 1 to Prod. 3 with hasType, flavor, color, and synonym edges), and the behavior graph (Query 1 to Query 3 and Prod. 1 to Prod. 3 with purchase, co-purchase, and co-view edges)]
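Because the taxonomy is a tree, each type has a single chain of ancestors, and products attached with hasType can be rolled up to any ancestor type. The following sketch illustrates that property only. The parent-child placement (Candy and Pretzels under Snacks, Snacks and Drinks under Grocery) is an assumption read from the figure layout, the hasType assignments follow the catalog table, and the function names are hypothetical.

```python
# Child -> parent edges of the taxonomy tree.
# ASSUMPTION: the placement of Candy and Pretzels under Snacks is inferred.
PARENT = {
    "Snacks": "Grocery",
    "Drinks": "Grocery",
    "Candy": "Snacks",
    "Pretzels": "Snacks",
}

# hasType edges, following the catalog table in the slides.
HAS_TYPE = {"Prod. 1": "Snacks", "Prod. 2": "Candy", "Prod. 3": "Candy"}


def ancestors(product_type, parent=PARENT):
    """Return the ancestors of a type, nearest first, up to the root."""
    chain = []
    current = product_type
    while current in parent:
        current = parent[current]
        chain.append(current)
    return chain


def products_under(product_type, has_type=HAS_TYPE, parent=PARENT):
    """Return products whose type is `product_type` or one of its descendants."""
    return sorted(
        product
        for product, assigned in has_type.items()
        if assigned == product_type or product_type in ancestors(assigned, parent)
    )


print(ancestors("Candy"))
print(products_under("Snacks"))
```

The ancestor chain is one way a model could share signal between a sparse type and its better-populated parents, which is the kind of structure this intuition points at.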
Key Intuition V. Leverage Different Modalities
Key Techniques in AutoKnow
Deliver the Data Business
1: High precision models
1,000: E2E pipeline + AutoML to reduce modeling cost
1,000,000: Scale-up to reduce #models (100s attributes, 1000s categories, 10s languages)
1,000,000,000: Higher yield from multi-modal models
Tutorial Structure
[Figure: the AutoKnow architecture (Input Data, Ontology Suite, Data Suite, Ontology, Broad Graph), annotated with the tutorial sections Sec 2, Sec 3, Sec 4, and Sec 5. Applications]
Section Structure
• Problem Definition: What are unique challenges for PG beyond generic KGs?
• Short answer -- key intuition: What are key intuitions for building product KGs?
• Long answer -- details: What are practical tips?
• Reflection/short answer: Can we apply the techniques to other domains?
Key Questions We Answer in This Tutorial
• Q1. What are the unique challenges in building a product knowledge graph, and what are the solutions?
• Q2. Are these techniques applicable to building knowledge graphs for other domains?
• Q3. What are practical tips for taking this to production?