Webinar - Product Matching - Palombo (20160428)

Page 1: Webinar - Product Matching - Palombo (20160428)

Dato Confidential

Fraud Detection Webinar

Alon Palombo, Data Scientist

[email protected]

Product Matching Webinar

Page 2: Webinar - Product Matching - Palombo (20160428)

Agenda

• Who is Dato?
• Data science workflow
• What is product matching?
• Demo using real public data
• Questions

Page 3: Webinar - Product Matching - Palombo (20160428)

Dato: We Intelligent Applications

45+ and growing fast!

Page 4: Webinar - Product Matching - Palombo (20160428)

Customers

Page 5: Webinar - Product Matching - Palombo (20160428)

Data Science workflow

Input: Unstructured Data

Stages: Ingest, Transform, Model, Deploy

Page 6: Webinar - Product Matching - Palombo (20160428)

What is product matching?

• In 2016, global e-commerce sales are expected to reach $1.92 trillion.
• Online retailers and price comparison sites curate product catalogues by aggregating from multiple sources.
• Product matching is the task of keeping these catalogues free of duplicates, full of attributes per product, and consistent across different sites.
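
To make the duplicate problem concrete, here is a minimal sketch of how two offers for the same product might be compared by title. It is an illustration only, not the method shown in the demo: the token-set Jaccard measure, the helper names (tokens, jaccard) and the sample titles are all assumptions made up for this example.

```python
import re


def tokens(title):
    """Lowercase a product title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two titles: shared tokens divided by all tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 1.0  # two empty titles are treated as identical
    return len(ta & tb) / len(union)


# Hypothetical offers, as they might arrive from different sources.
offer_a = "Apple iPhone 6s 64GB Space Gray"
offer_b = "iPhone 6S (64 GB) - Space Gray - Apple"
offer_c = "Samsung Galaxy S7 32GB Black"

same_product_score = jaccard(offer_a, offer_b)
different_product_score = jaccard(offer_a, offer_c)
```

The first pair scores well above the second even though word order, casing and punctuation differ. Note that "64GB" and "64 GB" still fail to match, which is the kind of inconsistency that makes the task hard in practice.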

Page 7: Webinar - Product Matching - Palombo (20160428)

Difficulty

Aggregate multiple sources:

• Structured Attributes
• Reviews
• Images
• Description

Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.

Page 8: Webinar - Product Matching - Palombo (20160428)

Definition

• Ironically, there are similar names for very similar problems:
• Entity resolution
• Record linking
• De-duplication
• Reference reconciliation
• Data matching
• and more…

Page 9: Webinar - Product Matching - Palombo (20160428)

Definition

• In GraphLab Create we distinguish between Record Linkage and De-duplication.
• Record Linkage refers to matching structured query records to a fixed set of reference records with the same schema.
• De-duplication refers to assigning an entity label to each row. Records with the same label are likely to correspond to the same real-world entity.
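
The difference between the two tasks can be sketched in plain Python. This is not GraphLab Create's API: the function names (link_records, deduplicate), the token-overlap similarity, the 0.5 threshold and the sample records are assumptions chosen only to show the shape of each task's input and output.

```python
import re


def similarity(a, b):
    """Jaccard similarity between the alphanumeric token sets of two strings."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def link_records(queries, references):
    """Record linkage: map each query to the index of its most similar reference."""
    return [
        max(range(len(references)), key=lambda i: similarity(q, references[i]))
        for q in queries
    ]


def deduplicate(records, threshold=0.5):
    """De-duplication: give every row an entity label (union-find over similar pairs)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups
    return [find(i) for i in range(len(records))]


# Hypothetical data: a fixed reference set, incoming queries, and a messy catalogue.
references = ["Canon EOS 70D DSLR Camera Body", "Nikon D7200 DSLR Camera Body"]
queries = ["canon eos 70d body", "Nikon D7200 camera"]
catalogue = [
    "Canon EOS 70D DSLR Camera Body",
    "canon eos 70d dslr body",
    "Nikon D7200 DSLR Camera Body",
    "Nikon D7200 DSLR Camera",
]

links = link_records(queries, references)
labels = deduplicate(catalogue)
```

Record linkage returns one reference per query, while de-duplication returns one label per row of a single table. The all-pairs loop is quadratic in the number of rows, so a real catalogue would need some form of blocking or nearest-neighbour search instead.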

Page 10: Webinar - Product Matching - Palombo (20160428)

Product matching demo – using real public data

Page 11: Webinar - Product Matching - Palombo (20160428)

Summary

• Product matching is at the heart of e-commerce.
• Many relevant similar problems with similar solutions.
• Easy exploration, modeling, and evaluation using GraphLab Create.

Page 12: Webinar - Product Matching - Palombo (20160428)

Our machine learning course

https://www.coursera.org/learn/ml-foundations

Page 13: Webinar - Product Matching - Palombo (20160428)

Questions?

[email protected]