36
Application of Decision- Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic Commerce / Spring 2001, VoL 5, No.3, pp. 45-62. Copyright © 2001 M.E. Sharpe, Inc. All rights reserved. Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu Chang, Matthew Nelson

Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts

Paper By:

Source:

International Journal of Electronic Commerce / Spring 2001, VoL 5, No.3, pp. 45-62. Copyright © 2001 M.E. Sharpe, Inc. All rights reserved.

Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu Chang, Matthew Nelson

Page 2: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

2

Paper Authors: JONG WOO KIM is an assistant professor in the depart

ment of statistics at Chungnam National University in Teajon, Korea.

BYUNG HUN LEE is a researcher in marketing at MPC LTD. in Korea High-level timing goals.

MICHAEL J. SHAW is director of the Center for Information Systems and Technology Management at the University of Illinois at Urbana Champaign.

HSIN-LU CHANG is a Ph.D. student in information systems at the School of Commerce, University of Illinois at Urbana-Champaign.

MATTHEW NELSON currently is a second-year Ph.D. student in information systems at the University of Illinois.

Page 3: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

3

Paper References:1. Agrawal, R.; Aming, A.; Bollinger, T.; Mehta, M.; Schafer, J.; and Srikant, R. The Quest data

mining system. In Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining. Portland, OR, 1996, pp. 244-249.

2. Allen, c.; Kania, D.; and Yaeckel, B. Internet World Guide to One-to-One Web Marketing. New York: John Wiley, 1998.

3. Berry, M.J .A., and Linoff, G. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: Wiley Computer Publishing, 1997.

4. BroadVision. A BroadVision one-ta-one white paper. White paper, BroadVision (1996).5. BroadVision. http://broadvision.com (2000). 6. Business Week Graphic: On-line sales are soaring. Business Week (June 22, 1998) (based on

data from Forrester Research).7. Chaturvedi, A.R; Hutchinson, G.K.; and Nazareth, D.L. Supporting complex real-time decision

making through machine learning. Decision Support Systems, 10, 2 (September 1993),213-233. 8. CLIPS. CLIPS Riference Manual (Version 6.05). CLIPS, 1997. 9. Garbonell, J.G., and Michalski, R.S. Machine learning: A historical and methodological analysis. Al

Magazine (fall 1983), 69-79. 10. Gupta, 0.;, Digiovanni, M.; Norita, H.; and Goldberg, K. Jester 2.0: Evaluation of a new linear time

collaborative filtering algorithm. 22nd International ACM SIGIR Conference on Research and Development in Informa tion Retrieval (August 1999), pp. 291-292.

11. Johnson, R.A., and Wichern, D.W. Applied Multivariate Statistical Analysis. Englewood-Cliffs, NJ: Prentice Hall International, 1992.

12. Kim, ]., Lee, K., Shaw, M. J., Chang, H., and Nelson, M. A preference scoring tedmique to personalized advertisements on Internet storefront. Working paper, University of lllinois at Urbana-Champaign (2000).

13. Lee, K. Personalized advertisement techniques for one-to-one marketing on Internet stores. Master's thesis, Chungnam National University, 2000.

Page 4: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

4

Paper References: Cont.

14. LikeMinds. LikeMinds'·personalization server 3.0 technology backgrounder. White paper, Andromedia (1999).

15. Mehta, M.; Agrawal, R,: and Rissanen, J. SLIQ: A fast scalable classifier for data mining. In Proceeding of the Fifth International Conference on Extending Database Technology. Avignon (March 1996).

16. Net Perceptions. www.netperceptions.com (2000). 17. Peppers, D., and Rogers, M. The One to One Future: Building Relationships One Customer at a

Time. New York: Currency and Doubleday, 1993.18. Quinlan, J.R. Induction of decision-trees. In J .W. Chavlik, W. Jude, and T.G. Dietterich (eds.),

Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann, 1990, pp. 57-69. 19. Quinlan, J.R. Leaming decision-tree classifiers. ACM Computing Surveys, 28, 1 (March 1996), 71-

72. 20. Schafer, J.B.; Konstan, J.;: and Riedl, J. Recommendation system in e commerce. In Proceedings

of the ACM Conference on Electronic Commerce, Denver (November 3-5,1999), pp. 158-166. 21. Shaw, M.J Machine leaming methods for intelligent decision support. Decision Support Systems,

10,2 (September 1993), 79-83. 22. Shaw, M.J.; Blanning, R; Strader, T.; and Whinston, A. Handbook on Electronic Commerce. Berlin

and Heidelberg: Springer-Verlag, 2000.23. Tsai, L.-H., and Koehler, GJ The accuracy of concepts learned from induction. Decision Support

Systems, 10,2 (September 1993), 161-172. 24. Weiss, S.M., and Indurkhya, Nitin. Predictive Data Mining: A Practical Guide

Page 5: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

5

Reference for the paper: http://an5qy7ag4q.scholar.serialssolutions.com/?

sid=google&auinit=JW&aulast=Kim&atitle=Application+of+Decision-Tree+Induction+Techniques+to+Personalized+Advertisements+on+Internet+Storefronts&title=International+journal+of+electronic+commerce&volume=5&issue=3&date=2001&spage=45&issn=1086-4415

FOR MORE INFO...

STONY BROOK UNIVERSITY LIBRARIES

Page 6: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

6

Introduction This paper studies personalized recommendation

techniques that suggest products or services to the customers of Internet storefronts based on their demographics or past purchasing behavior

Page 7: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

7

Currently available techniques Currently available recommendation

techniques are primarily based: Collaborative filtering Preference scoring Rule-based approaches

Page 8: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

8

Currently available techniques cont.

Collaborative filtering selects advertisements for customers based on the opinions of other customers with similar past preferences.

The preference-scoring approach to personalized recommendation uses a preference-score concept to select personalized advertisements based on initial customer profile, purchase history, and behavior in Internet stores

In the rule-based approach, marketing rules from marketing experts are a core component in providing personalized advertisements.

Page 9: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

9

Rule based approach

Based on the marketing rule that “undergraduates in Indiana whose income is greater than $80000 prefer to own Luxury cars”, car advertisements are displayed whenever customers who live in Indiana access the Internet storefront.

Page 10: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

10

Rule based approach

Page 11: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

11

Issues in Rule based approach The marketing rules for personalized

recommendation usually come from marketing experts in that domain. The acquisition of marketing rules, however, is

unsystematic, time-consuming, and dependent on the intuition of marketing experts.

The marketing rules in a knowledge base need to be continuously updated and changed, but manual change management is a difficult task.

Page 12: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

12

Proposed Solution To overcome the defects of the rule-

based approach in personalized advertisements, the authors propose decision-tree induction techniques that extract marketing rules from databases in Internet stores.

Rule-based personalization has two phases: marketing-rule extraction, and real-time advertisement selection

Page 13: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

13

Proposed Solution In order to obtain valuable marketing rules,

decision tree induction technique is used to analyze purchase-transaction histories, customer profiles, and product information.

The extracted marketing rules are stored in a marketing-rule base and are used for real-time personalized-advertisement selection when customers visit the Internet store

Page 14: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

14

Proposed Solution Schematic

Page 15: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

15

Marketing Rule Extraction The extraction of

useful marketing rules using decision-tree induction techniques begins by defining a hierarchy tree of product categories to extract marketing rules at various abstraction levels of product categories

MP3 Files Sporting goods

Pop Music Classical Music

Symphony ConcertRap

Dance Metal

Shakira Britany

Kim Rolla

Page 16: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

16

Marketing Rule Extraction

The rule-extraction phase has four steps: (1) selecting learning data, (2) generating target variables, (3) constructing a decision tree, and (4) selecting a marketing rule

Page 17: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

17

Marketing Rule Ext. – Phase 1

Step 1 Training data sets and test data sets are

selected from customer records, making it possible to begin with a data set of manageable size. The size of a data set is determined by such factors as number of customer-profile records, the volume of disk storage, and the computing capacity of the inductive learning tools.

Page 18: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

18

Marketing Rule Ext. – Phase 1 Step 2

In the next step, values of target variables are generated for the selected data records. In the present approach, target variables are whether or not a customer prefers a specific product category in the hierarchy tree. Since the target variables do not exist in the databases, they must be generated. This is done using the purchase-transaction database.

Page 19: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

19

Marketing Rule Ext. – Phase 1 For example, customer profile records can have the following

attributes:

CUSTOMER = <ID, age, gender, hobby, location, married, job, ..., education>  

After the target variables are generated, CUSTOMER relation is extended as follows.

CUSTOMER' = <ID, age, gender, hobby, location, married, job, ..., education, t_MP3_file, t_Sporting_Goods/Leisure, t_Pop_Music, t_Classical_Music, ....., t_Fitness_Accessories>  

   where t_MP3_file, Etc. are binary variables to specify a customer's preference for the specific product categories. That is, the binary variable is assigned to 1 when the customer prefers the corresponding product category.

Page 20: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

20

Marketing Rule Ext. – Phase 1 Step 3

The third step is decision-tree construction. Using inductive learning tools and the training data set, decision trees are inducted for all the target variables.

Step 4 After the decision tree is constructed, useful

marketing rules are filtered from constructed decision-trees based on the validation results from test data sets and accuracy of the rules.

Page 21: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

21

Marketing Rule Ext. – Phase 1

Page 22: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

22

TARGET-VARIABLE GENERATION

There are several possible ways to generate target variables based on the purchase-transaction database: (1) the counting-based method, (2) the expected-value-based method, (3) the statistics-based method, and (4) the subcategory-based method.

Page 23: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

23

Counting-based method The counting-based method, based on the number of

purchases in a specific product category, makes it possible to decide whether or not the customer prefers the product category.

That is,

    PRE[subij] = 1 if NP[subij] >= Omegawhere, NP[subij] = the number of

purchase of customer i for product category j

Omega is a minimum threshold, which is determined by analysts.

    PRE[subij] = 0 otherwise    

Page 24: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

24

Expected-value-based method

N[subj] is the number of products in product category j.Alpha is a multiplier for the expected value. Usually alpha is equal to or greater than 1. When alpha is 1, if customer i purchased more items in product category j than the expected number of purchases of product category j, this means that customer i prefers product category j

Page 25: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

25

Statistics-based method

Statistical values like the mean, the median, the first quartile, and the third quartile are used to generate target variables

Page 26: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

26

Subcategory based method In the case of non-leaf-node product

categories, the corresponding target variables can be generated based on subcategory target variable values (when the target variables for subcategories are already generated using one of the above methods).

PRE[subij] = 1 if PRE[subik] = 1 for some subcategory k of j    PRE[subij] = 0 otherwise

Page 27: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

27

MARKETING-RULE SELECTION A decision tree includes marketing rules that link customer

demographics with preferences for product categories. Since marketing rules may have low predictability or may not be accurate, the valuable marketing rules have to be selected from constructed decision trees.

Two phase heuristics are used for marketing-rule selection.    1. Choose decision trees whose predictability (1 - misclassification rate), with respect to a training data set, is greater than a certain threshold.    2. Select nodes from the filtered decision trees whose accuracy (i.e., purity) is greater than a certain threshold.    

The selected nodes in the decision trees are translated in the rule structure. Here is an example of a translated rule.

If age < 30 and gender = male then Ballad, with accuracy = .9 and level = 3.

Page 28: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

28

ADVERTISEMENT SELECTION Personalized advertisements are selected

on a real-time basis. Let M be the number of advertisements to

be displayed in an Internet storefront, and L the depth of the product-category hierarchy tree.

The advertisement-selection algorithm is as follows. STEP 1 m: = 0,l: = L, generate fact set using the

specific customer's profile.

Page 29: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

29

Advertisement Selection Algorithm Continued.

STEP 2 Until m < M and 1 > 0 2.1 Fire marketing rules of level 1 with

customer's facts. 2.2 If there are matching rules, matched rules

are fired by accuracy order. The product categories appeared in consequent parts of the rules are selected as preferred product categories for the specific customer. Choose advertisements for that category from the advertisement database. The number of selected advertisements should be less than M - m.

2.3 If m = M, exit. 2.4 Update m to reflect the additionally selected

advertisements, and update l to l-1.

Page 30: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

30

Advertisement Selection Algorithm Continued.

STEP 3 If m < M, randomly select M - m advertisements

from the advertisement database

Page 31: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

31

EXPERIMENT EXPERIMENTAL DATA COLLECTION

Data from 330 respondents were gathered. At first, the respondents were asked to select preferred product categories in MP3 music files and sporting goods or leisure equipment. Then they were asked to choose five MP3 music files among the 32 displayed products. The displayed products were equally selected from all MP3 leaf nodes. The respondents were also asked to purchase five sporting goods or leisure equipment items among the 24 displayed products. The displayed products were equally selected from all sporting goods or leisure equipment leaf nodes (three products from each of the eight leaf nodes). On the next two pages, respondents were asked to rank MP3 music files and sporting goods/leisure equipment items on a five-point personal-interest scale (1 = low interest to 5 = high interest). 16 advertisements for MP3 music files were displayed to rank interests about the advertisements, which were selected one item from each leaf node. In addition, 16 advertisements for sporting goods or leisure equipment items were displayed to rank interests about those, which were two selected items from each leaf node. Finally, the respondents were asked to specify their profiles, including age, gender, job, and so on.

Page 32: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

32

MARKETING-RULE EXTRACTIONThe 330 respondents were divided into two data sets, one for marketing-rule generation, and the other for testing the effectiveness of the proposed approach. For marketing-rule generation, 198 responses (60% of the total data set) were used. Among the 198 data items, 70 percent were used to construct decision trees, and 30 percent to validate constructed decision trees. For target-variable generation, an expected value-based approach of alpha= 1 was used. All told, 37 decision trees were constructed (26 for MP3 music files and 11 for sporting goods or leisure equipment). Decision trees with predictability greater than 65 percent were selected in the marketing-rule selection step. Decision rules with accuracy greater than 65 percent were selected as valuable marketing rules.

Page 33: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

33

EFFECTIVENESS TEST To test the effectiveness of the proposed decision-tree

induction approach, the results were compared to the preference-scoring approach and a random selection. The data of 132 respondents (40% of the total data set) were isolated for effectiveness testing. Three advertisements were selected for each respondent: that is, one advertisement using each of the three approaches for each respondent. In the case of the decision-tree induction approach, the marketing rules generated earlier were used to select personalized advertisements. In the case of the preference-scoring approach, advertisements were selected based on the respondents' own interest-manifestation behavior (purchase history, profile, preferred product categories). In the case of random selection, advertisements were selected on a simple random basis. The means of the five-point-scale personal-interest scores for the selected advertisements were compared statistically

Page 34: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

34

RESULT

Page 35: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

35

Conclusion Statistically, there were differences among the five-point-scale

scored means of customer' interest in advertisements from the three different approaches: decision-tree induction, preference scoring, and random selection. Thus, the effectiveness of personalized-recommendation techniques is affected by the decision algorithm utilized in the personalized- advertisement selection process.

The proposed decision-tree induction approach is effective in generating marketing rules as a means to assist rule-based personalized recommendations.

In the case of sporting goods or leisure equipment, decision-tree induction gave the best results. But in the case of MP3 music files, preference scoring gave the best results. This indicates that the appropriateness of personalized-advertisement techniques depends on the characteristics of product categories. Thus it is necessary to study how various personalized-advertisement techniques can be used cooperatively.

Page 36: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic

Source: International Journal of Electronic Commerce 5 no3 45-62 Spr 2001

36

ThankYou

Next Presentation – 5/2/2006 Topic - New Advances in Data Mining

By: Group 14

Madhavarapu,Chidroop Sandhuria,Deepanshu

DON’T FORGET TO COME TO CLASS!