Wine Informatics Dr. Bernard Chen Ph.D. University of Central Arkansas


Wine Informatics

Dr. Bernard Chen, Ph.D.
University of Central Arkansas


Data science

Data science is a field of study that combines techniques and theories from distinct fields, such as data mining, scientific methods, math and statistics, visualization, natural language processing, and domain knowledge, to discover useful information in domain-related data.


Domain Knowledge in Wine

The quality of a wine is usually assured by wine certification, which is generally assessed by physicochemical and sensory tests.

Existing data mining research focuses on physicochemical laboratory tests much more than on sensory tests.


Domain Knowledge in Wine

It is very interesting to mine useful information from sensory testing notes to answer questions such as:

“What makes a wine a 90+ wine?”
“What characteristics are shared by 90+ Napa Cabernet Sauvignons?”
“Which groups of wines share similarities?”
“What characteristics distinguish wines from France from wines from Italy?”


Domain Knowledge in Wine

The success of data science research on wine sensory data relies on consistent reviews from prestigious experts.

Several popular wine magazines provide widely accepted sensory reviews of the wines produced each year, such as Wine Spectator [13], Wine Advocate [14], and Decanter [15].


Wine Spectator Review Example

Kosta Browne Pinot Noir Sonoma Coast 2009

Ripe and deeply flavored, concentrated and well-structured, this full-bodied red offers a complex mix of black cherry, wild berry and raspberry fruit that's pure and persistent, ending with a pebbly note and firm tannins. Drink now through 2018. 5,818 cases made.


Wine Spectator

Our first dataset is compiled from the list of “Top 100 Wines of 2011” [16] by Wine Spectator, a lifestyle magazine that focuses on wine and wine culture.

Their reviews are straight and to the point.


Review Example

Kosta Browne Pinot Noir Sonoma Coast 2009

Ripe and deeply flavored, concentrated and well-structured, this full-bodied red offers a complex mix of black cherry, wild berry and raspberry fruit that's pure and persistent, ending with a pebbly note and firm tannins. Drink now through 2018. 5,818 cases made.


Ann C. Noble’s Wine Aroma Wheel


Our own wine wheel

Starting from the “Top 100 Wines of 2011”, we analyzed all one hundred wine reviews and added all necessary categories and subcategories, arriving at a total of 547 distinct attributes.

When looking at our finished list, we noticed many cases where groups of attributes were really just permutations of the same thing.

An example would be the following three attributes: FRESHLY-CUT APPLE, RIPE APPLE, and APPLE.
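One way such variants could be consolidated is with a lookup table from raw attributes to a canonical attribute. The following is a minimal sketch under that assumption, not the authors' actual procedure: `SYNONYMS` and `normalize` are hypothetical names, and only the three APPLE attributes come from the slide.

```python
# Hypothetical mapping from raw review attributes to a canonical attribute.
# Only the three APPLE variants come from the slide.
SYNONYMS = {
    "FRESHLY-CUT APPLE": "APPLE",
    "RIPE APPLE": "APPLE",
    "APPLE": "APPLE",
}


def normalize(attributes):
    """Map each raw attribute to its canonical form, dropping duplicates.

    Attributes missing from SYNONYMS are kept unchanged (upper-cased).
    """
    canonical = []
    for attr in attributes:
        name = SYNONYMS.get(attr.upper(), attr.upper())
        if name not in canonical:
            canonical.append(name)
    return canonical


# Illustrative input: two apple variants collapse into one attribute.
raw = ["Freshly-cut apple", "ripe apple", "black cherry"]
print(normalize(raw))
```

With a table like this, a review that mentions both a freshly-cut and a ripe apple note contributes a single APPLE attribute to the wine's attribute vector.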


Hierarchical Clustering

Dendrogram

Venn Diagram of Clustered Data

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
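To make the idea of a dendrogram concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. The slides do not state which linkage or distance was used, so single linkage and Hamming distance are assumptions here, and the wine vectors are made up for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def agglomerative(vectors, distance=hamming):
    """Single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of item indices. This history is the
    information a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(distance(vectors[i], vectors[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


# Made-up binary attribute vectors for four wines (illustrative only).
wines = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
for a, b, d in agglomerative(wines):
    print(a, "+", b, "at distance", d)
```

Each printed merge corresponds to one junction in a dendrogram; cutting the tree at a chosen distance yields the flat clusters.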


Distance Measure


Distance Measure Example

WINE | CHERRY | CHEWY TANNINS | BEAUTY
WINE1 | 1 | 1 | 1
WINE2 | 0 | 0 | 1
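The formula on the preceding Distance Measure slide did not survive extraction, so the measure actually used is not known from this text. As an assumption, the sketch below applies Jaccard distance, a common choice for binary attribute vectors, to the two rows of the example, reading the header as three attributes (CHERRY, CHEWY TANNINS, BEAUTY).

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary attribute vectors.

    1 - (attributes present in both) / (attributes present in either).
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # two empty vectors are treated as identical
    return 1 - both / either


wine1 = [1, 1, 1]  # row WINE1 from the slide
wine2 = [0, 0, 1]  # row WINE2 from the slide

# One shared attribute out of three present in either wine: 1 - 1/3.
print(jaccard_distance(wine1, wine2))
```

Under this assumed measure the two wines are at distance 2/3 (about 0.667): they share one of the three attributes that appear in either review.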


Clustering Results


Clustering Results

[Figure: clusters labeled 1 to 11]


Clustering Results

Ref# | Vintage | Type | Varietal
1 | 2008 | RED | MERLOT (.53) - CABERNET FRANC (.29) - CABERNET SAUVIGNON (.13) - MALBEC (.04) - PETIT VERDOT (.01)
2 | 2008 | RED | CABERNET SAUVIGNON
3 | 2009 | RED | PINOT NOIR
4 | 2007 | RED | CABERNET SAUVIGNON
5 | 2007 | RED | SANGIOVESE (.90) - CANAIOLO/COLORINO (.10)
6 | 2004 | RED | TEMPRANILLO

Ref# | World | Country | Region | Alcohol | Price | Drink Begin | Drink End
1 | NEW | United States | Washington | | $35 | NOW | 2020
2 | NEW | United States | Washington | 14.5% | $37 | NOW | 2018
3 | NEW | United States | California | 13.9% | $45 | NOW | 2019
4 | NEW | United States | Washington | 14.6% | $32 | NOW | 2019
5 | OLD | Italy | Tuscany | 14% | $22 | NOW | 2022
6 | OLD | Spain | Castilla y Leon | | $15 | NOW | 2015


Clustering Results

CLUSTER #3 – 6 Instances – Attribute Information

Attribute | Number of Wines | Attribute Weight
BLACKBERRY | 6 | 3
LONG FINISH | 5 | 2
SPICE | 4 | 3
FRUIT | 3 | 1
BLACK CHERRY | 3 | 3
RED | 3 | 2
FOCUSED | 3 | 1
EXCELLENT FINISH | 3 | 2
RIPE | 3 | 1
TANNINS_LOW | 3 | 2
TANNINS_HIGH | 3 | 2

Suggestions

This cluster represents the fruity aspect of new-world wines, focusing on powerful notes of blackberry and black cherry, as well as a commanding finish.
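A "Number of Wines" column like the one in this cluster summary can be produced by counting, for each attribute, how many wines in the cluster mention it. The sketch below shows one way to do that; `attribute_counts` is a hypothetical helper, the attribute lists are made up rather than taken from the slide, and the "Attribute Weight" column is not computed because the slides do not define it.

```python
from collections import Counter


def attribute_counts(cluster):
    """Count how many wines in a cluster mention each attribute."""
    counts = Counter()
    for attributes in cluster:
        counts.update(set(attributes))  # count each attribute once per wine
    return counts


# Made-up attribute lists for three wines (not the slide's data).
cluster = [
    ["BLACKBERRY", "SPICE", "LONG FINISH"],
    ["BLACKBERRY", "BLACK CHERRY"],
    ["BLACKBERRY", "SPICE"],
]
for attribute, n in attribute_counts(cluster).most_common():
    print(attribute, n)
```

Sorting by count puts the attributes shared by most wines first, which is the ordering the cluster summary uses.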


Conclusion

In this paper, we discuss wine reviews and how their attributes can play an integral role in grouping different wines together.

We show that, using only the attributes of a wine review, we can group together wines that have a similar world region, monetary value, vintage, type, and varietal.


Thanks

Questions?