
[IEEE 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE) - Khon Kaen, Thailand (2013.05.29-2013.05.31)] The 2013 10th International Joint


Using Association Rules to Identify Root Causes of CRD in Broilers

Suthathip Maneewongvatana, Songrit Maneewongvatana, Tanate Lojitamnuay, Mallika Juthasong

Department of Computer Engineering, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand

Abstract—Chronic Respiratory Disease (CRD) in broilers poses management challenges for the poultry meat industry because the disease is often discovered too late. Raw meat from infected chickens must be excluded from processing into cooked poultry products. Many factors in the raising process are thought to contribute to the disease. Finding the root causes of CRD would therefore help farmers control and prevent the disease, and would help processing facilities estimate the available amount of raw meat more accurately. In this paper, we apply the association rules technique to identify the root causes of CRD. Candidate factors were first identified by skilled veterinarians. Data collected from several farms were then used to find sets of factors that could potentially be root causes. From the results we identify a set of general factors that are common root causes across all farms, as well as sets of farm-dependent factors.

Keywords—CRD; Broiler Farm Industry; Association Rules

I. INTRODUCTION

Thailand is known as one of the world’s kitchens, exporting agricultural goods and processed food [1]. The annual production capacity of cooked poultry products in Thailand is also increasing, in the form of ready-to-eat and Halal food. As a result, the broiler farm business is growing rapidly in every region of Thailand. To control the standard of each farm, the Department of Livestock Development (DLD) has launched a set of standards for broiler farms covering the management of the house, the environment and farm hygiene [2].

Although standard procedures exist that focus on preventing serious diseases and anomalies that can occur during the broiler raising cycle, we still found large losses from unobservable disease (subclinical symptoms). One major cause of rejected chickens is CRD (Chronic Respiratory Disease), a condition in which the animals in a house become sick and their entrails are infected. CRD-infected chickens are detected and rejected only when they reach the plucking station at the slaughterhouse. Farmers therefore cannot put preventive mechanisms in place, and they cannot predict the number of infected chickens in time. A large number of rejected chickens not only reduces farmers’ profit but also disrupts the supply chain for cooked poultry products: too little raw meat is supplied to cooked poultry production, which lowers the trust placed in the food manufacturer.

To reduce the losses caused by the large number of rejections, many broiler companies try to control the number of CRD-infected chickens by controlling the significant factors that affect infection. Unfortunately, there is no substantial report or study on the main factors behind infection in broiler farms. One reason is that farms in different regions have different environments, such as weather conditions, humidity, water conditions, chicken density, and caretaker skill, so the same standard cannot be used to manage different farms.

This work proposes a method for analyzing the data that every broiler farm normally collects throughout the raising cycle. We apply a data mining algorithm to historical data to determine the important factors that may lead to a large number of CRD-infected chickens. Various parameters taken from different documents were reviewed by a skilled veterinarian to identify the factors assumed to be relevant to the occurrence of CRD. Association rules representing the effect of these factors on the number of infected chickens were then generated with the Apriori algorithm. The same method can be adapted and applied to many different farms, based on their own collected data, to identify farm-specific causes of CRD. The goal of this work is to identify both general and farm-dependent causes of CRD in broilers.

The data used in the experiment were gathered from real broiler farms in different regions of Thailand. First, we determined the important general factors from the data of all farms. Second, we applied the same method to the data of each farm to analyze farm-dependent causes. The factors discovered in the experiment will be further analyzed by a skilled veterinarian to assess their plausibility and gather supporting information. The list of possible factors, together with prevention measures recommended by an expert, will be used to construct an expert system to support broiler farmers in the future.

II. LITERATURE REVIEW

Many computer applications support farmers’ decisions in complicated processes. In [3], an online intelligent diagnosis system for fish diseases was developed. The system collects fish disease information, treatment recommendations, and prevention measures obtained from domain experts. However, in many problem domains the root causes of a problem cannot be identified directly. Statistical methods have therefore been applied to analyze the root causes of a given problem from historical data.

2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE)

978-1-4799-0806-6/13/$31.00 ©2013 IEEE


Many statistical and data mining algorithms are used to identify the causes of problems in a process. Paredes-Oliva et al. [4] proposed root cause analysis of network anomalies using Apriori frequent itemset mining. This method can detect anomalies that recur from similar causes, such as the same IP address or ports. The Bayesian network is another method widely used in root cause analysis [5, 6]. In this method, all possible causes of a problem are represented as vertices connected in a graph. The relationships between cause and effect vertices are represented as edges with conditional probability values. Diagnosis identifies the probability of each parent cause when the evidence (the problem) is observed [7]. How well the root cause of a problem is captured depends on the completeness of the model structure defined by the domain experts.

The difficulty with using a Bayesian network in this work is that there are many related factors, and we cannot model the correct relation network for this domain. Moreover, in diagnosing problems on broiler farms, we cannot analyze each factor individually, because different factors may be correlated. For example, a high density of chickens in an all-male house may be an important factor affecting the number of infected chickens, whereas the same problem may not appear in an all-female house with high density. Hence, analyzing associations among different factors is more suitable for this problem, and association rule generation was selected for this experiment.

III. DATA MINING TECHNIQUES

A. Association Rules

Association rule mining is a popular data mining technique for finding rules that associate two or more factors with another factor in a large data sample. Association rule analysis was originally proposed for the problem known as market basket analysis, to discover hidden purchase patterns in large transaction data [9].

The first step in generating rules is to find the frequent itemsets. Let $I = \{i_1, i_2, \ldots, i_d\}$ be the set of all items and $T = \{t_1, t_2, \ldots, t_N\}$ be the set of all transactions. Each transaction $t_i$ contains a subset of the items in $I$, that is, $t_i \subseteq I$. A set of items $A \subseteq t_i$ in a transaction is referred to as an itemset, and the frequency of an itemset, written $\sigma(A)$ here, is the number of transactions that contain it. Apriori [10] is an algorithm for mining association rules by constructing all frequent itemsets. It controls the exponential growth of candidate itemsets by means of a minimum support threshold. Initially, every candidate 1-itemset is considered and its support is calculated as

$$\mathrm{support}(A) = \frac{\sigma(A)}{N} \qquad (1)$$

where $N$ is the total number of transactions. Candidates whose support is lower than the threshold are then discarded, which eliminates unimportant itemsets and leaves the frequent itemsets. Next, the algorithm iteratively generates new candidate $k$-itemsets using only the frequent itemsets found in the previous step and calculates their support values. Itemsets with support below the threshold are eliminated. The algorithm terminates when no new frequent itemsets are generated. Finally, the association rules are generated from the frequent itemsets. A frequent itemset $A$ is partitioned into two sets, $B$ and $A - B$ ($B \subset A$), and the confidence of the rule $B \rightarrow A - B$ is calculated as

$$\mathrm{confidence}(B \rightarrow A - B) = \frac{\sigma(A)}{\sigma(B)} \qquad (2)$$

Rules with low confidence may be discarded. In the end, we obtain the association rules with acceptable support and confidence levels. An association rule is an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint itemsets with $X \subseteq I$ and $Y \subseteq I$. The importance of an association rule is measured in terms of support and confidence, as explained above. Support indicates how often a rule applies in the data, whereas confidence measures the reliability of the inference made by the rule. For example, a high confidence for the rule $X \rightarrow Y$ means that $Y$ is very likely to be present in transactions that contain $X$.
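The procedure described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper (the experiments rely on Weka): the function names `apriori` and `generate_rules`, the `attribute=value` item encoding, and the toy transactions are all hypothetical, and support is recounted naively by scanning every transaction.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: every single item is a candidate.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


def generate_rules(frequent, min_confidence):
    """Split each frequent itemset A into B -> A - B and keep confident rules."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                b = frozenset(antecedent)
                # Ratio of supports equals the ratio of transaction counts.
                confidence = supp / frequent[b]
                if confidence >= min_confidence:
                    rules.append((b, itemset - b, supp, confidence))
    return rules


# Hypothetical toy data: one transaction per raising generation.
transactions = [
    {"Gender=Male", "ParentFarm=N", "InfectedLv=10"},
    {"Gender=Male", "ParentFarm=N", "InfectedLv=10"},
    {"Gender=Male", "ParentFarm=N", "InfectedLv=1"},
    {"Gender=Female", "ParentFarm=N", "InfectedLv=1"},
    {"Gender=Female", "ParentFarm=M", "InfectedLv=1"},
]
frequent = apriori(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.5)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(supp, 2), round(conf, 2))
```

Looking up `frequent[b]` is safe because every subset of a frequent itemset is itself frequent, which is the same property the pruning step relies on.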

IV. METHODOLOGY

A. Database

The broiler raising information used in this experiment was taken from real data recorded at various contract farms of one company, so document formats are similar across farms. The data cover 250 farms in 4 regions of Thailand and were collected over 3 years (2010-2012). The documents record information on the entire broiler cycle. Each raising cycle spans around 40 days (5-7 weeks) after the chicks arrive at the house. Chickens from the same source are raised in a closed environment with controlled temperature and humidity until they are large enough. They are then caught and shipped to the slaughterhouse. After the chickens are removed, the houses are cleaned and wait for a new flock. Hence, a farm can raise several generations of broilers each year. The dataset contains a total of 12,059 generation records.

For each raising generation, many parameters related to the entire broiler raising cycle (Fig. 1) were collected in different documents. These include the hatchery record, which describes the source of the eggs, the breeding, and the parent flock. Another important document is the daily record, which collects general information while the chickens stay in a house, such as farm location, daily temperature, daily humidity, density, gender of the chickens in the house, caretaker, and vaccine usage. Each document is labeled with a chicken code that identifies the source of the chicks, the raising farm, the house number, and the generation, so data for a given generation can be gathered from the different documents. Finally, at the slaughterhouse, a document records the number of chickens rejected because of CRD. Table I summarizes a sample of the parameters related to the different periods of broiler raising.

The evidence variable for this experiment is the percentage of infected chickens, that is, the ratio of the number of infected chickens to the total number of chickens in each house.

Fig. 1. Broiler Raising Cycle

B. Parameter Selection

Parameter selection is the process of choosing suitable parameters from the data at hand and removing redundant or irrelevant attributes [8]. In this work, a large number of parameters are collected from the broiler raising documents. To remove the irrelevant ones and speed up the analysis, we need to select only the parameters relevant to the number of infected chickens. We began the selection process by asking a skilled veterinarian to select the set of parameters. The resulting set of important parameters is marked with the symbol * in Table I.

TABLE I. SAMPLE PARAMETERS COLLECTED FROM ENTIRE RAISING CYCLE

| State | Parameter | Description | Data format and allowed values^a |
|---|---|---|---|
| Eggs production | Parent flock code * | The code to identify the flock of parents | Predefined codes |
| | Breeding * | Chicken’s breeding | {A, B, C} |
| | Grade | Chicken parent’s grade | |
| | Age | Chicken parent’s age in weeks | Integer |
| | Parent Farm * | Farm of parents | Predefined codes |
| Hatching | Hatchery | Hatchery code | Predefined codes |
| | Incubator Type | Type of incubator | {A, B} |
| | Incubator Number | Incubator number | Integer |
| | Infertile Eggs | Number of infertile eggs | Integer |
| Raising | Temperature * | Inside-house temperature | Real |
| | Number of chickens * | Number of chickens in a house | Integer |
| | Area * | House’s area in m² | Real |
| | Farm Type * | Farm standard type | Predefined standard type |
| | Region | Farm location | Predefined region code |
| | Average weight | Average weight of chickens in each flock | Real |
| | Drug used | Quantity of medicine or vaccine used in each house | Real |
| | Litter moisture | Moisture level of the litter on the house’s base | {A, B, C, D, E} |
| | Gender * | Gender of chickens in a house | {Male, Female, Mix} |
| | Size * | Target size of chickens in a house | {S, M, L} |
| Slaughter | Infected chickens | Number of infected chickens found at the slaughterhouse | Integer |

a. Some predefined codes and sensitive allowed values have been altered due to the policy of the source owners.

C. Data Preprocessing

Before analyzing the data, we need to validate, transform and standardize the obtained parameters through the following steps:

Validate the farm code data, gender code, size code, hatchery code, parent flock code, breeding, and farm standard code.

Convert temperature values recorded in degrees Fahrenheit to degrees Celsius.

Calculate the mean and variance of temperature and humidity from the daily records over an entire cycle, and discretize the humidity mean and variance into levels.

For the humidity mean and variance, there are 10 levels, each covering a range of 10% humidity. For example, a mean humidity of 0%-10% is assigned to level 1 and a mean humidity of 70%-80% is assigned to level 8.

Calculate chicken density by dividing the number of chickens in the house by the house’s area. The density is separated into 10 levels as follows:

o Level 1 if density is between 0 - 3.5 chickens/m2

o Level 2 if density is between 3.5 - 7 chickens/m2

o Level 3 if density is between 7 - 10.5 chickens/m2

o Level 4 if density is between 10.5 - 14 chickens/m2


o Level 5 if density is between 14 - 17.5 chickens/m2

o Level 6 if density is between 17.5 - 21 chickens/m2

o Level 7 if density is between 21 - 24.5 chickens/m2

o Level 8 if density is between 24.5 - 28 chickens/m2

o Level 9 if density is between 28 - 31.5 chickens/m2

o Level 10 if density is between 31.5 - 35 chickens/m2

Discretize the percentage of culled and dead chickens into levels 1-10, where level 1 covers the lowest percentage (0-10%) and level 10 the highest (90%-100%).

Calculate the percentage of infected chickens by dividing the number of infected chickens by the total number of chickens in a house. The percentage of infected chickens is separated into 10 levels as follows:

o Level 1 if number of infected chicken is between 0% - 10%

o Level 2 if number of infected chicken is between 10% - 20%

o Level 3 if number of infected chicken is between 20% - 30%

o Level 4 if number of infected chicken is between 30% - 40%

o Level 5 if number of infected chicken is between 40% - 50%

o Level 6 if number of infected chicken is between 50% - 60%

o Level 7 if number of infected chicken is between 60% - 70%

o Level 8 if number of infected chicken is between 70% - 80%

o Level 9 if number of infected chicken is between 80% - 90%

o Level 10 if number of infected chicken is between 90% - 100%
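The preprocessing steps listed above could be sketched as follows in Python. This is an illustrative sketch only: the record layout (`temp_unit`, `daily_temp`, `daily_humidity`, `n_chickens`, `area_m2`, `n_infected`) and the function names are hypothetical, and the paper does not state which level a value exactly on a boundary belongs to, so the sketch assumes it falls into the lower level.

```python
import math
from statistics import mean, pvariance


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def to_level(value, width, n_levels=10):
    """Map a non-negative value to a level 1..n_levels of equal width.

    Assumption: a value exactly on a boundary goes to the lower level
    (for example, exactly 10% maps to level 1). Values beyond the last
    range are clamped to the highest level.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return min(max(math.ceil(value / width), 1), n_levels)


def preprocess(record):
    """Turn one raw generation record into discrete features."""
    temps = record["daily_temp"]
    if record["temp_unit"] == "F":  # hypothetical unit flag
        temps = [fahrenheit_to_celsius(t) for t in temps]
    density = record["n_chickens"] / record["area_m2"]
    infected_pct = 100.0 * record["n_infected"] / record["n_chickens"]
    return {
        "TempAvg": round(mean(temps)),
        "TempVar": pvariance(temps),
        "HumidAvg": to_level(mean(record["daily_humidity"]), 10.0),
        "Density": to_level(density, 3.5),        # 3.5 chickens/m2 per level
        "InfectedLv": to_level(infected_pct, 10.0),  # 10% per level
    }


# Hypothetical example record for one raising generation.
raw = {
    "temp_unit": "F",
    "daily_temp": [86.0, 89.6],
    "daily_humidity": [72.0, 78.0],
    "n_chickens": 12000,
    "area_m2": 1000.0,
    "n_infected": 11400,
}
features = preprocess(raw)
print(features)
```

Keeping the binning in a single `to_level` helper means the humidity, density, and infection levels all follow the same boundary convention, which matters when rules from different farms are compared.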

D. Association Rule Generation

After data preprocessing, the association rules are generated using the Weka [8] software. The level of infected chickens was set as the class attribute of the rules. Because flocks with a large share of infected chickens are rare, the Apriori algorithm was run with a minimum support of only 0.1. However, we set the minimum confidence to 0.5 in order to obtain high-confidence rules. With this configuration, at most 10 rules are generated.

We separated the experiment into two levels. For the first level, the general causes of infected chickens can be captured by generating association rules from historical data of all different farms, whereas, in the second level, we generated the farm-dependent rules by analyzing data from different generations of an individual farm.
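The two-level experiment can be illustrated with a small Python sketch. It is not the Weka procedure used in the paper: instead of Apriori it enumerates antecedents by brute force (up to a hypothetical `max_antecedent` attribute-value pairs), fixes the consequent to the class attribute, and applies the thresholds quoted above (minimum support 0.1, minimum confidence 0.5, at most 10 rules). The function names and the toy records are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def without(records, attr):
    """Return copies of the records with one attribute removed."""
    return [{k: v for k, v in rec.items() if k != attr} for rec in records]


def class_rules(records, class_attr, min_support=0.1, min_confidence=0.5,
                max_rules=10, max_antecedent=3):
    """Brute-force rules of the form (attribute=value, ...) -> class value."""
    n = len(records)
    antecedent_count = Counter()
    rule_count = Counter()
    for rec in records:
        label = rec[class_attr]
        pairs = sorted((k, v) for k, v in rec.items() if k != class_attr)
        for size in range(1, max_antecedent + 1):
            for antecedent in combinations(pairs, size):
                antecedent_count[antecedent] += 1
                rule_count[(antecedent, label)] += 1
    rules = []
    for (antecedent, label), count in rule_count.items():
        support = count / n
        confidence = count / antecedent_count[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))
    # Highest confidence first, then highest support; strings break ties.
    rules.sort(key=lambda r: (-r[3], -r[2], str(r[0]), str(r[1])))
    return rules[:max_rules]


def rules_per_farm(records, farm_attr, class_attr, **kwargs):
    """Second level: mine each farm's generations separately."""
    by_farm = defaultdict(list)
    for rec in records:
        by_farm[rec[farm_attr]].append(rec)
    return {
        farm: class_rules(without(recs, farm_attr), class_attr, **kwargs)
        for farm, recs in by_farm.items()
    }


# Hypothetical preprocessed records (one per generation).
records = [
    {"Farm": "F1", "LowTempAvg": "26", "Density": "4", "InfectedLv": "10"},
    {"Farm": "F1", "LowTempAvg": "26", "Density": "3", "InfectedLv": "10"},
    {"Farm": "F1", "LowTempAvg": "28", "Density": "4", "InfectedLv": "6"},
    {"Farm": "F2", "LowTempAvg": "26", "Density": "3", "InfectedLv": "1"},
    {"Farm": "F2", "LowTempAvg": "26", "Density": "3", "InfectedLv": "1"},
    {"Farm": "F2", "LowTempAvg": "28", "Density": "3", "InfectedLv": "10"},
]
general = class_rules(without(records, "Farm"), "InfectedLv")  # first level
per_farm = rules_per_farm(records, "Farm", "InfectedLv")       # second level
for farm, farm_rules in per_farm.items():
    print(farm, farm_rules[0])
```

In this toy data a low average temperature of 26 leads to the highest infection level on farm F1 but not on farm F2, which is the kind of farm-dependent difference the second level is meant to expose.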

V. EXPERIMENTAL RESULTS

Sample general and farm-dependent association rules generated by the Apriori algorithm are presented in Table II and Table III.

TABLE II. SAMPLE OF GENERAL ASSOCIATION RULES

All farms

| No | Rules^b | Confidence |
|---|---|---|
| 1 | If AvgHighTemp = 31 and ParentFarm = N then InfectedLv = 10 | 0.63 |
| 2 | If HumidVar = 1 and Size = L and ParentFarm = N then InfectedLv = 10 | 0.60 |
| 3 | If Gender = Male and ParentFarm = N then InfectedLv = 10 | 0.59 |
| 4 | If HumidVar = 1 and Hatchery = XXIHC1 and ParentFarm = N then InfectedLv = 10 | 0.59 |
| 5 | If HumidVar = 1 and Breeding = A and ParentFarm = N then InfectedLv = 10 | 0.59 |
| 6 | If HumidVar = 1 and Hatchery = XXIHC1 and Breeding = A and ParentFarm = N then InfectedLv = 10 | 0.57 |
| 7 | If HighTempVar = 2 and Hatchery = XXIHC1 and ParentFarm = N then InfectedLv = 10 | 0.57 |
| 8 | If HighTempVar = 2 and Breeding = A and ParentFarm = N then InfectedLv = 10 | 0.57 |
| 9 | If HighTempVar = 2 and Hatchery = XXIHC1 and Breeding = A and ParentFarm = N then InfectedLv = 10 | 0.57 |
| 10 | If HumidityVar = 1 and Density = 4 and ParentFarm = N then InfectedLV = 10 | 0.56 |

b. Some predefined codes and sensitive allowed values have been altered due to the policy of the source owners.

TABLE III. SAMPLE OF FARM-DEPENDENT ASSOCIATION RULES

Farm 1

| No | Rules^c | Confidence |
|---|---|---|
| 1 | If LowTempAvg. = 26 then InfectedLv = 10 | 1 |
| 2 | If LowTempAvg. = 26 and LowTempVar = 3 then InfectedLv = 10 | 1 |
| 3 | If LowTempAvg. = 26 and Farm Type = Standard then InfectedLv = 10 | 1 |
| 4 | If LowTempAvg = 26 and Breeding = A then InfectedLv = 10 | 1 |
| 5 | If LowTempAvg = 28 and Density = 4 then Infected = 6 | 1 |
| 6 | If LowTempAvg = 28 and Size = L then InfectedLv = 6 | 1 |

Farm 2

| No | Rules | Confidence |
|---|---|---|
| 1 | If HighTempVar = 2 and Density = 3 and Breeding = A then InfectedLv = 10 | 0.83 |
| 3 | If HumidAvg = 8 and HighTempVar = 2 and Density = 3 and Breeding = A then InfectedLv = 10 | 0.83 |
| 4 | If HighTempVar = 2 and HumidVar = 0 and Density = 3 and Breeding = A then InfectedLv = 10 | 0.83 |
| 5 | If HighTempVar = 2 and Density = 3 and Farm Type = Standard and Breeding = A then InfectedLv = 10 | 0.83 |
| 6 | If HumidityAvg = 8 and HighTempVar = 2 and HumidVar = 0 and Density = 3 and Breeding = A then InfectedLv = 10 | 0.83 |
| 7 | If HumidityAvg = 8 and HighTempLv = 2 and Farm Type = Standard and Density = 3 and Breeding = A then InfectedLv = 10 | 0.83 |

c. Some predefined codes and sensitive allowed values have been altered due to the policy of the source owners.

The rules obtained from the Apriori algorithm, shown in Tables II and III, consist of a cause part and an effect part. For example, the first rule of Table II can be interpreted as follows: if the average high temperature in a house equals 31 °C and the chickens in the house come from parents at farm N, then there is a 63% chance that 90%-100% of the chickens in the flock will be infected.

VI. DISCUSSION

The results of analyzing historical data with the Apriori algorithm show that the same method yields different sets of rules when applied to the data from all sample farms and to the data from specific farms. For example, an average low temperature of 26 °C appears as one of the causes of infected chickens in farm 1, but we did not find this result in farm 2. Hence, using association rules at different levels of analysis, as in our experiment, can reveal both general and farm-dependent causes of CRD.

The results of this experiment were broadly reviewed and supported by a skilled husbandman. However, we still need to assemble time series data for certain factors, for example using temperature from different raising periods instead of analyzing only its mean value. Moreover, we should include parameters beyond the set selected by the expert in the analysis, in order to discover unexpected factors affecting CRD infection.

Farmers can use the results of this analysis as a preliminary examination of the CRD problem and set up suitable preventive and healing mechanisms. Moreover, the results can be extended into an expert system that recommends procedures to relieve the CRD problem in different broiler farms.

VII. CONCLUSION

In this work we proposed the application of association rules to identify the causes of CRD in broilers, which is a major problem in the broiler farm industry and in cooked-chicken food manufacturing. Real data covering the entire broiler raising cycle, from egg production through hatching and raising to slaughtering, were obtained from different farms in different regions of Thailand. General association rules and farm-dependent association rules were generated by analyzing data from all farms and from specific farms, respectively. These results reveal the possible global and farm-dependent causes of CRD.

ACKNOWLEDGMENT

We would like to thank the providers of the broiler record data, whom we cannot name because of their privacy policy.

REFERENCES

[1] Thailand, Kitchen of the World (Online). Available : http://www.prnewswire.com/news-releases/thailand-kitchen-of-the-world-128022523.html.

[2] Broiler Association. Broiler farm standard in Thailand Today (Online). Available : http://www.broilerassociation.or.th/index.php?p=foodsafety_detail&lang=en&id=1.

[3] L. Daoliang, F. Zetian, D. Yanqing, Fish-Expert: a web-based expert system for fish disease diagnosis, Expert Systems with Applications 23, 2002, pp. 311–320.

[4] I. Paredes-Oliva, X. Dimitropoulos, M. Molina, P. Barlet-Ros and D. Brauckhoff, Automating Root-Cause Analysis of Network Anomalies, using Frequent Itemset Mining, SIGCOMM’10, August 30–September 3, 2010, New Delhi, India, 2010, pp. 467-468.

[5] A. Alaeddini, I. Dogan, Using Bayesian networks for root cause analysis in statistical process control, Expert Systems with Applications 38, 2011, pp.11230–11243.

[6] S. Dey, J.A. Stori, A Bayesian network approach to root cause diagnosis of process variations, International Journal of Machine Tools & Manufacture 45, 2005, pp.5–91.

[7] I. Ben-Gal, Bayesian Networks, in Ruggeri F., Faltin F. & Kenett R.,Encyclopedia of Statistics in Quality & Reliability, Wiley & Sons, 2007.

[8] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Elsevier, 2005.

[9] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Second Edition, Elsevier, 2006.

[10] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.
