
[IEEE 2008 First International Conference on Emerging Trends in Engineering and Technology - Nagpur, Maharashtra, India (2008.07.16-2008.07.18)] 2008 First International Conference


Usage of Nearest Neighborhood, Decision Tree and Bayesian Classification Techniques in Development of Weight Management Counseling System

Sunita Soni
Reader, BIT, Durg (C.G. 491001), India
[email protected]

Jyothi Pillai
Sr. Lecturer, BIT, Durg (C.G. 491001), India
[email protected]

Abstract

Case Based Reasoning (CBR) is an approach for solving a new problem by remembering a previous similar situation and reusing information and knowledge from that situation. Selection and generation of cases are two important components of a CBR system. Obesity is one of the most significant public health problems facing the world. Children have been weighing progressively more since the 1970s, the first phase of an obesity epidemic that has now entered a second phase of serious health problems related to overweight, including diabetes, certain types of cancer, and cardiovascular disease. Proper counseling on nutrition and appropriate physical activity can control the problem of obesity. In a previous paper, we proposed a Case Based Framework for weight management counseling of obese children. In this paper, three Data Mining techniques (Nearest Neighborhood, Decision Tree and Bayesian Classification) are applied to distributed case bases for case retrieval and case adaptation.

Keywords- Case Based Reasoning, Obesity, Nearest Neighborhood, Decision Tree, Classification, Euclidean Distance, Data Mining.

1. Introduction

Family history of obesity, snacking on high-energy foods and lack of physical activity are important factors influencing obesity. Of obese children, 50-80% will remain obese as adults and fall into the risk group for diabetes, hypertension, coronary heart disease and many other obesity-related diseases. Case-based reasoning tries to solve new problems by reusing solutions that were applied to similar past problems and adapting the results to fit the new problem situation.

In a previous paper, we proposed a Case Based Framework for weight management counseling of obese children. In this paper, three Data Mining techniques (Nearest Neighborhood, Decision Tree and Bayesian Classification) are applied to distributed case bases for case retrieval and case adaptation. In section 2, we present an introduction to obesity and weight management, followed by an introduction to CBR. Section 3 describes the system architecture. In section 4, we detail our proposed expert system. Our plan for future work is presented in section 5.

2. Literature Review

2.1 Obesity And Weight Management Counseling

Obesity is defined as an excessive accumulation of body fat. Pediatric obesity has multiple causes centered on an imbalance between energy in (calories obtained from food) and energy out (calories expended in the basal metabolic rate and physical activity) [5]. Among the factors influencing obesity, family history of obesity, snacking on high-energy foods and lack of physical activity were found to be the most important. Increased physical activity, such as playing outdoor games, walking and cycling, should be encouraged in children, together with health education regarding dietary habits and sedentary lifestyle.

2.2 Case Based Reasoning

Knowledge Management (KM) can be defined as a discipline for realizing an integrated approach to managing and sharing the overall information, which may be stored in databases and documents or represented by the unarticulated experience of individuals. When dealing with medical information management, one of the most effective approaches is Case Based Reasoning [10].

Figure 1: R4 cycle
[Figure 1 shows the CBR cycle: RETRIEVE (find similar problems), REUSE (propose solutions from retrieved cases), REVISE (adapt and repair proposed solution), RETAIN (integrate in case-base).]

First International Conference on Emerging Trends in Engineering and Technology
978-0-7695-3267-7/08 $25.00 © 2008 IEEE
DOI 10.1109/ICETET.2008.239

A case-based reasoning (CBR) system adapts old solutions to meet new demands, explains and critiques new situations using old instances (called cases) and performs reasoning from precedents to interpret new
problems. A general CBR system is described by the R4 cycle, which comprises the four processes shown in Figure 1.

2.3 Nearest Neighborhood Technique

One of the important techniques in classification is the Nearest Neighborhood technique based on Euclidean distance, which is mathematically represented as

$$\text{Distance} = \sqrt{\sum_{i=1}^{n} (w_i - u_i)^2}$$

where $w_i$ is the weight of the value of attribute $x_i$ of the case stored in the case library and $u_i$ is the weight of attribute $x_i$ of the new case. The similarity between the attributes is computed on the basis of the numerical representation of these attributes.

2.4 Decision Tree

Decision tree analysis has long been used when a multi-stage decision is involved. A widely used induction algorithm is ID3, which builds a decision tree from a database of cases using an information-theoretic approach: at each node it chooses as the split attribute (best attribute) the one with the highest information gain.

2.5 Bayesian Theory

Bayesian Classification is a probabilistic technique of pattern recognition. The Naive Bayes method is a method of classification applicable to categorical data, based on Bayes' theorem. For each example in the test set, Bayesian Classification determines the probability that the example belongs to each class and assigns it to the class with the highest probability.

Given the classified sample set D (the domain of fruits):
• X: a sample with unknown class (red and round)
• H: some hypothesis (e.g. X belongs to class ‘Apple’)
• P(X|H): posteriori probability of X conditioned on H (given that X is an apple, the probability that X is red and round)
• P(H): prior probability of H (the probability that any given sample is an apple)
• P(X): prior probability of X (the probability that any given sample is red and round)

For the classification problem, determine P(H|X), the probability that H holds given the observed sample X. By Bayes' theorem,

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$$
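The Naive Bayes method applies this theorem under the assumption that the attributes are conditionally independent given the class. The resulting decision rule can be sketched as follows; the symbols $C_j$ (the j-th class, playing the role of H) and $x_1, \dots, x_n$ (the attribute values of X) are notation introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
P(C_j \mid X) &= \frac{P(X \mid C_j)\, P(C_j)}{P(X)}, \\
P(X \mid C_j) &= \prod_{k=1}^{n} P(x_k \mid C_j)
  \quad \text{(naive independence assumption)}, \\
\hat{C} &= \arg\max_{j} \; P(C_j) \prod_{k=1}^{n} P(x_k \mid C_j).
\end{align*}
```

Because $P(X)$ is the same for every class, it can be dropped when only the most probable class is needed.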

3. System Architecture

This paper integrates Data Mining (DM) and CBR techniques for weight management counseling. The architecture of the proposed system is shown in Figure 2. It includes two modules: a Knowledge Management Module and a Case Based Reasoning Module. In the Knowledge Management Module, the relevant data sets are collected, cleaned and preprocessed to remove discrepancies and inconsistencies and so improve data quality [12]. ID3 and Naïve Bayesian Classification are used to classify the past cases. In the Case Based Reasoning Module, entering a new case of an obese child triggers the system for weight management counseling, which uses the rules discovered in the Knowledge Management Module. The reasoning module of the CBR mechanism then uses DM techniques (Nearest Neighborhood, Decision Tree and Bayesian Classification) to seek the most similar case.

Figure 2: The architecture of proposed system

4. Modules Of Case-Based Expert System For Weight Management Counseling

4.1 Knowledge Management Module

This module includes the following components:

4.1.1 Data Repository. The data repository consists of the domain knowledge in weight management counseling, which comprises the BMI chart, expert advice, etc.

4.1.2 Knowledge Miner. In this paper, we have applied three Data Mining techniques (Nearest Neighborhood, Decision Tree and Bayesian Classification) to distributed case bases for case retrieval and case adaptation. In the retrieval process, Euclidean distance was first used to calculate the degree of similarity between every former case and the new case and to retrieve the most similar one. Next, ID3 was used to build a decision tree from the database of cases. Finally, a Naïve Bayesian classifier was applied to classify cases and retrieve similar cases.
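The split-attribute choice that ID3 makes at the root can be illustrated with a short sketch. It computes the information gain of each attribute from the per-class counts as they appear in Table 2; the function names and the dictionary layout are assumptions made for this illustration, not the authors' implementation, and the sketch covers only the root split rather than the full recursive tree construction.

```python
import math

# Class counts per attribute value, in the order (S1, S2, S3, S4),
# transcribed from the count half of Table 2.
COUNTS = {
    "Diet": {
        "Overeating": (0, 5, 0, 8),
        "Normal Diet": (0, 1, 2, 1),
        "Balanced Diet": (1, 1, 1, 0),
    },
    "Food type": {
        "Fatty Food": (0, 5, 0, 4),
        "Normal Food": (1, 2, 3, 5),
    },
    "Physical Activity": {
        "Indoor": (1, 1, 0, 4),
        "Outdoor Irregular": (0, 1, 2, 3),
        "Outdoor Regular": (0, 5, 1, 2),
    },
    "Family": {
        "Mother Obese": (0, 2, 0, 4),
        "Father Obese": (0, 0, 1, 2),
        "Both Obese": (1, 3, 2, 3),
        "No one": (0, 2, 0, 0),
    },
    "Exercise": {
        "No Exercise": (1, 0, 0, 4),
        "Irregular": (0, 1, 0, 5),
        "Regular": (0, 6, 3, 0),
    },
}
CLASS_TOTALS = (1, 7, 3, 9)  # number of cases per suggestion S1..S4


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(value_counts, class_totals):
    """Entropy of the class distribution minus the weighted entropy
    remaining after splitting on one attribute."""
    n = sum(class_totals)
    remainder = sum(sum(c) / n * entropy(c) for c in value_counts.values())
    return entropy(class_totals) - remainder


gains = {attr: information_gain(vals, CLASS_TOTALS) for attr, vals in COUNTS.items()}
best = max(gains, key=gains.get)  # attribute ID3 would split on first

for attribute, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attribute:18s} gain = {gain:.3f}")
print("root split:", best)
```

ID3 would then repeat the same computation on each branch, using only the cases that reach it, until every leaf is pure or no attributes remain.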

[Figure 2 diagram labels: Expert Dietician; Databases; External Resources; Knowledge Management Module; Knowledge Miner; Case Based Reasoning Module; Case Builder; Case Bases; Distributed Case Bases; Case Retrieval; Case Adaptation; New Case Formation; Weight Management Counselling; Patient]

4.1.3 Case Based Rule Structure. The case bases are the repositories representing the collection of obesity and weight management knowledge, which the case builder uses to automatically create new case bases.

4.2 Case Based Reasoning Module

The reasoning procedure of the Case Builder (the main component of this module) is listed below [4]:

4.2.1 Input New Case. Input the data of a new case of an obese child, which triggers the proposed system.

4.2.2 Case Retrieval. CBR retrieves old cases, where each case from the case base provides an alternative solution and a prediction of possible outcomes for the problem. Consider the new case C_new described in Table 1. By the Euclidean distance formula for case retrieval, it can be verified that C_new is most similar to cases C18, C27, C35, C54 and C88 of the case library, each at the minimum distance of 1.414213562. Using the ID3 technique, the solution for the new case is S2, which is derived from the decision tree shown in Figure 3. Finally, using the Bayesian Classification technique and Table 2, the probability of S1, S3 and S4 is 0 and the probability of S2 is 1; that is, for the new case C_new, S2 is more likely than S1, S3 and S4.

4.2.3 Case Adaptation. The purpose of case adaptation is to modify the retrieved case to solve the problems of the new case.

4.2.4 Revise Case. When the proposed solutions are not suitable for the new case, revisions can be conducted.

4.2.5 Save Case. Save the case into the case base to enhance its completeness and to consolidate the self-learning mechanism of the system.
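The reasoning procedure above can be sketched as a small retrieve, reuse, revise and retain loop. This is a minimal illustration under stated assumptions: the `Case` and `CaseBase` classes, their method names and the `expert_override` parameter are hypothetical, retrieval uses plain Euclidean distance only, and adaptation is reduced to copying the retrieved suggestion.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Case:
    name: str
    features: tuple   # numeric attribute codes, as in Table 1
    solution: str     # suggestion label, e.g. "S2"


@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, features):
        """RETRIEVE: return the stored case closest to the new problem."""
        return min(self.cases, key=lambda c: math.dist(c.features, features))

    def reuse(self, retrieved):
        """REUSE: propose the retrieved case's solution for the new problem."""
        return retrieved.solution

    def revise(self, proposed, expert_override=None):
        """REVISE: let a (hypothetical) expert replace an unsuitable proposal."""
        return expert_override if expert_override is not None else proposed

    def retain(self, name, features, solution):
        """RETAIN: store the solved case so later queries can reuse it."""
        case = Case(name, tuple(features), solution)
        self.cases.append(case)
        return case


# Two cases taken from Table 1, plus the new case C_new.
base = CaseBase([
    Case("C18", (3, 1, 1, 4, 1), "S2"),
    Case("C96", (2, 1, 3, 4, 3), "S1"),
])
new_features = (2, 1, 1, 3, 1)

nearest = base.retrieve(new_features)
proposal = base.reuse(nearest)
final = base.revise(proposal)            # no expert override in this run
base.retain("C_new", new_features, final)
print(nearest.name, final, len(base.cases))
```

Keeping the four steps as separate methods mirrors the R4 cycle and leaves room to swap the retrieval step for a decision tree or Bayesian classifier.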

5. Conclusion

Based on the results obtained by applying the three Data Mining techniques to test data in the case retrieval process of CBR, ID3 and Naïve Bayesian classification were found to give more precise results than the Euclidean distance method. Our future work will include testing and implementing the CBR module with real data and comparing more machine-learning techniques, such as neural networks and genetic algorithms.

6. Acknowledgement

The authors wish to express their heartfelt gratitude to the Management, Bhilai Institute of Technology, Durg, India for their inspiring encouragement and support towards the completion of the work.

Table 1. Case data

Case    Diet  Food type  Physical Activity  Family  Exercise  Euclidean Distance  Suggestion
C2      3     2          3                  4       1         2.645751311         S2
C5      3     2          1                  4       3         2.645751311         S1,S2
C9      3     2          1                  4       1         1.732050808         S2
C10     3     1          3                  4       3         3.16227766          S1,S2
C12     3     1          2                  4       3         2.645751311         S4
C14     1     1          2                  4       1         1.732050808         S3
C18     3     1          1                  4       1         1.414213562         S2
C23     3     2          2                  3       2         2                   S4
C27     3     2          1                  3       1         1.414213562         S2
C29     3     1          3                  3       2         2.449489743         S4
C35     3     1          1                  3       2         1.414213562         S4
C38     3     2          3                  2       2         2.828427125         S4
C47     3     1          3                  2       2         2.645751311         S4
C54     1     1          1                  2       1         1.414213562         S3
C69     3     1          2                  1       1         2.449489743         S2
C73     1     2          1                  3       2         1.732050808         S2
C85     1     2          2                  3       3         2.645751311         S4
C88     2     1          2                  4       1         1.414213562         S3
C96     2     1          3                  4       3         3                   S1
C100    2     2          1                  1       1         2.236067977         S2
C_new   2     1          1                  3       1                             ?
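The Euclidean Distance column of Table 1 can be recomputed from the attribute codes. The sketch below does so and lists the cases closest to C_new; the attribute codes are copied from Table 1, while the function and variable names are chosen for this illustration only.

```python
import math

# (Diet, Food type, Physical Activity, Family, Exercise) codes from Table 1.
CASES = {
    "C2": (3, 2, 3, 4, 1),   "C5": (3, 2, 1, 4, 3),   "C9": (3, 2, 1, 4, 1),
    "C10": (3, 1, 3, 4, 3),  "C12": (3, 1, 2, 4, 3),  "C14": (1, 1, 2, 4, 1),
    "C18": (3, 1, 1, 4, 1),  "C23": (3, 2, 2, 3, 2),  "C27": (3, 2, 1, 3, 1),
    "C29": (3, 1, 3, 3, 2),  "C35": (3, 1, 1, 3, 2),  "C38": (3, 2, 3, 2, 2),
    "C47": (3, 1, 3, 2, 2),  "C54": (1, 1, 1, 2, 1),  "C69": (3, 1, 2, 1, 1),
    "C73": (1, 2, 1, 3, 2),  "C85": (1, 2, 2, 3, 3),  "C88": (2, 1, 2, 4, 1),
    "C96": (2, 1, 3, 4, 3),  "C100": (2, 2, 1, 1, 1),
}
C_NEW = (2, 1, 1, 3, 1)


def euclidean(w, u):
    """Distance = sqrt(sum_i (w_i - u_i)^2) between two attribute vectors."""
    return math.sqrt(sum((wi - ui) ** 2 for wi, ui in zip(w, u)))


distances = {name: euclidean(attrs, C_NEW) for name, attrs in CASES.items()}
d_min = min(distances.values())

# All cases tied at the minimum distance, ordered by case number.
nearest = sorted(
    (name for name, d in distances.items() if math.isclose(d, d_min)),
    key=lambda name: int(name[1:]),
)
print(nearest, round(d_min, 9))
```

The five tied cases carry different suggestions in Table 1 (S2, S2, S4, S3, S3), so a distance-only retrieval returns several candidate solutions for C_new rather than a single one.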

7. References

[1] Daniel D. Wu, Rosina Weber and Fredric D. Abramson: A Case-Based Framework for Leveraging Nutrigenomics Knowledge and Personalized Nutrition Counseling. The 2004 European Conference on Case-Based Reasoning.
[2] David W. Aha: The Omnipresence of Case-Based Reasoning in Science and Application. Navy Center for Applied Research in Artificial Intelligence.
[3] Yoon-Joo Park, Byung-Chun Kim and Se-Hak Chun: New knowledge extraction technique using probability for case-based reasoning: application to medical diagnosis. Expert Systems, Vol. 23, No. 1, 2006.
[4] Rainer Schmidt, Olga Vorobieva, Lothar Gierl: Case-Based Adaptation Problems in Medicine.
[5] Adolf J. Ariza, Helen J. Binns: Childhood nutritional status: ongoing surveillance is necessary. Swiss Med Wkly, 2004.
[6] David B. Leake, Andrew Kinley and David Wilson: Learning to Improve Case Adaptation by Introspective Reasoning and CBR. Proceedings of the First International Conference on Case Based Reasoning, Sesimbra, 1995.
[7] Michael B. Zimmermann, Carolyn Gübeli, Claudia Püntener, Luciano Molinari: Overweight and Obesity in 6-12 year old children in Switzerland. Swiss Med Wkly, 2004.
[8] Schmidt, R., Montani, S., Bellazzi, R., Portinale, L., Gierl, L.: Case-Based Reasoning for Medical Knowledge-Based Systems. Int. J. Med. Info., 2001.
[9] Seitz, A., Uhrmacher, A. M., Damm, D.: Case-Based Prediction in Experimental Medical Studies. Artificial Intelligence in Medicine, 1999.
[10] Mu-Jung Huang, Mu-Yen Chen, Show-Chin Lee: Integrating data mining with case-based reasoning for chronic disease prognosis and diagnosis. Science Direct, Expert Systems with Applications 32 (2007) 856-857.
[11] Hyunchul Ahn, Kyoung-jae Kim, Ingoo Han: A case-based reasoning system with two-dimensional reduction technique for customer classification. Science Direct, Expert Systems with Applications 32 (2007) 1011-1019.

Figure 3. The classification tree for obesity

Table 2. The case data with counts and probabilities

Counts

Attribute          Value              S1   S2   S3   S4
Diet               Overeating         0    5    0    8
Diet               Normal Diet        0    1    2    1
Diet               Balanced Diet      1    1    1    0
Food type          Fatty Food         0    5    0    4
Food type          Normal Food        1    2    3    5
Physical Activity  Indoor             1    1    0    4
Physical Activity  Outdoor Irregular  0    1    2    3
Physical Activity  Outdoor Regular    0    5    1    2
Family             Mother Obese       0    2    0    4
Family             Father Obese       0    0    1    2
Family             Both Obese         1    3    2    3
Family             No one             0    2    0    0
Exercise           No Exercise        1    0    0    4
Exercise           Irregular          0    1    0    5
Exercise           Regular            0    6    3    0
Suggestion         (total)            1    7    3    9

Probabilities

Attribute          Value              S1   S2   S3   S4
Diet               Overeating         0/1  5/7  0/3  8/9
Diet               Normal Diet        0/1  1/7  2/3  1/9
Diet               Balanced Diet      1/1  1/7  1/3  0/9
Food type          Fatty Food         0/1  5/7  0/3  4/9
Food type          Normal Food        1/1  2/7  3/3  5/9
Physical Activity  Indoor             1/1  1/7  0/3  4/9
Physical Activity  Outdoor Irregular  0/1  1/7  2/3  3/9
Physical Activity  Outdoor Regular    0/1  5/7  1/3  2/9
Family             Mother Obese       0/1  2/7  0/3  4/9
Family             Father Obese       0/1  0/7  1/3  2/9
Family             Both Obese         1/1  3/7  2/3  3/9
Family             No one             0/1  2/7  0/3  0/9
Exercise           No Exercise        1/1  0/7  0/3  4/9
Exercise           Irregular          0/1  1/7  0/3  5/9
Exercise           Regular            0/1  6/7  3/3  0/9
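The Bayesian step of case retrieval can be reproduced from the counts in Table 2. The sketch below is illustrative only: the source does not state how the numeric codes of Table 1 map to the category names of Table 2, so the category values assigned to C_new here (Balanced Diet, Normal Food, Outdoor Regular, Mother Obese, Regular) are an assumption inferred by matching the Table 1 codes against the Table 2 counts. The function name `posterior` is likewise hypothetical.

```python
from fractions import Fraction

CLASSES = ("S1", "S2", "S3", "S4")
CLASS_COUNTS = {"S1": 1, "S2": 7, "S3": 3, "S4": 9}

# Per-class counts (S1, S2, S3, S4) for each attribute value, from Table 2.
COUNTS = {
    "Diet": {"Overeating": (0, 5, 0, 8), "Normal Diet": (0, 1, 2, 1),
             "Balanced Diet": (1, 1, 1, 0)},
    "Food type": {"Fatty Food": (0, 5, 0, 4), "Normal Food": (1, 2, 3, 5)},
    "Physical Activity": {"Indoor": (1, 1, 0, 4),
                          "Outdoor Irregular": (0, 1, 2, 3),
                          "Outdoor Regular": (0, 5, 1, 2)},
    "Family": {"Mother Obese": (0, 2, 0, 4), "Father Obese": (0, 0, 1, 2),
               "Both Obese": (1, 3, 2, 3), "No one": (0, 2, 0, 0)},
    "Exercise": {"No Exercise": (1, 0, 0, 4), "Irregular": (0, 1, 0, 5),
                 "Regular": (0, 6, 3, 0)},
}


def posterior(case):
    """Naive Bayes posterior P(S_j | case), normalized over the four classes."""
    n = sum(CLASS_COUNTS.values())
    scores = {}
    for j, label in enumerate(CLASSES):
        score = Fraction(CLASS_COUNTS[label], n)          # prior P(S_j)
        for attribute, value in case.items():
            # Conditional P(value | S_j) as a count ratio, no smoothing.
            score *= Fraction(COUNTS[attribute][value][j], CLASS_COUNTS[label])
        scores[label] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("every class has zero probability for this case")
    return {label: score / total for label, score in scores.items()}


# ASSUMPTION: category values for C_new inferred from the Table 1 codes.
c_new = {
    "Diet": "Balanced Diet",
    "Food type": "Normal Food",
    "Physical Activity": "Outdoor Regular",
    "Family": "Mother Obese",
    "Exercise": "Regular",
}
result = posterior(c_new)
print({label: float(p) for label, p in result.items()})
```

Because the count ratios are used without smoothing, a single zero count removes a class entirely, which is why three of the four suggestions receive probability 0 for this case.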

[Figure 3 diagram labels: decision nodes Diet, Physical Activity, Family, Exercise; branch values Balanced, Normal, Overeating; Only Indoor, Outdoor Irregular, Outdoor Regular; Mother Obese, Father Obese, Both Parents Obese; No, Irregular, Regular; leaf suggestions S1, S2, S3 and S1,S2]