An efficient mining algorithm for maximal weighted frequent patterns in transactional databases


Unil Yun a, Hyeonil Shin b, Keun Ho Ryu a, EunChul Yoon c

a Department of Computer Science, College of Electrical & Computer Engineering, Chungbuk National University, South Korea
b Car Infotainment Research Department, LG Electronics, South Korea
c Department of Electronics Engineering, Konkuk University, South Korea

Article info

Article history:
Received 18 April 2011
Received in revised form 1 February 2012
Accepted 5 February 2012
Available online 21 February 2012

Keywords:
Data mining
Weighted frequent pattern mining
Maximal frequent pattern mining
Vertical bitmap
Prefix tree

Abstract

In the field of data mining, there have been many studies on mining frequent patterns due to its broad applications in mining association rules, correlations, sequential patterns, constraint-based frequent patterns, graph patterns, emerging patterns, and many other data mining tasks. We present a new algorithm for mining maximal weighted frequent patterns from a transactional database. Our mining paradigm prunes unimportant patterns and reduces the size of the search space. However, maintaining the anti-monotone property without loss of information must be considered, and thus our algorithm prunes weighted infrequent patterns and uses a prefix-tree with weight-descending order. In comparison, a previous algorithm, MAFIA, scales exponentially with the longest pattern length. Our algorithm outperformed MAFIA in a thorough experimental analysis on real data. In addition, our algorithm is more efficient and scalable.

© 2012 Elsevier B.V. All rights reserved.

    1. Introduction

Data mining is defined as the process of non-trivial extraction of previously unknown and potentially useful information from data stored in databases [9,16,20,23,33]. Data mining is used to find patterns (or itemsets) hidden within data, and associations among the patterns. In particular, frequent pattern mining plays an essential role in many data mining tasks such as mining association rules [1], interesting measures [3,26], correlations [22,34], sequential patterns [8,29,41,42], constraint-based frequent patterns [5,40], graph patterns [35], emerging patterns [11,19,27] and approximate patterns [39]. Mining information and knowledge from very large databases is not easy, since processing large datasets takes a long time and the number of discovered patterns can be significant and redundant. Frequent pattern mining is used to discover the complete set of frequent patterns in a transaction database with respect to a minimum support. Frequent patterns have a well-known anti-monotone property [1]: if a pattern is infrequent, all of its super patterns must be infrequent. By this property, a frequent pattern of length x leads to (2^x − 2) shorter non-empty frequent patterns. For instance, if pattern {a,b,c,d} is frequent, all subsets of {a,b,c,d} including {a}, {b}, {c}, {a,b}, {a,c}, ..., and {b,c,d} are also frequent. To avoid

mining all frequent patterns, we can mine only those that are maximal frequent patterns [4,43]. A pattern X is a maximal frequent pattern if X is frequent and every proper super pattern of X is infrequent. The problem of mining maximal frequent patterns is how to discover the complete set of maximal frequent patterns. Mining maximal frequent patterns can reduce the size of the search space dramatically. However, while items have different importance in reality, this characteristic is still not considered. For this reason, weighted frequent pattern mining algorithms [2,7,31] have been suggested. Weight-based pattern mining is powerful in that it not only reduces the size of the search space but also mines more important patterns. The main focus in weighted frequent pattern mining is on satisfying the anti-monotone property, since this property is generally broken when different weights are applied to different items. Even if a pattern is weighted as infrequent, its super patterns can be weighted as frequent, since super patterns of a low-weight pattern can receive a high weight after other items with higher weights are added. For this reason, previous weighted frequent pattern mining algorithms [36–38] used the maximum weight of the transaction database (or of each conditional database) instead of each item weight to maintain the anti-monotone property.
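The subset blow-up that motivates mining only maximal patterns is easy to check numerically. A minimal Python sketch (ours, for illustration; the helper name is not from the paper):

```python
from itertools import chain, combinations

def proper_nonempty_subsets(pattern):
    """Enumerate all non-empty proper subsets of a pattern."""
    items = sorted(pattern)
    return list(chain.from_iterable(
        combinations(items, k) for k in range(1, len(items))))

# A frequent pattern of length x implies 2**x - 2 shorter
# non-empty frequent patterns, e.g. x = 4 gives 14 subsets.
subsets = proper_nonempty_subsets({'a', 'b', 'c', 'd'})
print(len(subsets))  # 14
```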

Several applications make use of maximal frequent pattern mining or weighted frequent pattern mining. For example, using maximal frequent patterns with respect to a series of support thresholds, we can approximate and summarize the support information of all frequent patterns [28]. As another example, using

0950-7051/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.knosys.2012.02.002

Corresponding author. Tel.: +822 450 3349; fax: +822 3437 5235.
E-mail addresses: [email protected] (U. Yun), [email protected] (H. Shin), [email protected] (K.H. Ryu), [email protected] (E. Yoon).

Knowledge-Based Systems 33 (2012) 53–64



maximal frequent patterns, we can find positive feature patterns [32] that are mined from edge images. Negative feature patterns are mined from the complements of edge images and used to prune non-face candidate images. These feature patterns can be used to effectively train face detectors [32]. Similarly, we can find emerging patterns [11] that are frequent in positive samples and infrequent in negative samples. These patterns can be used to develop effective classifiers [19,20]. Additionally, maximal frequent patterns can be used to build an effective heart attack prediction system [27]. These applications reflect a characteristic of maximal frequent patterns: they can serve as a border between frequent and infrequent patterns. Meanwhile, weighted frequent pattern mining also has several applications. First, item weights can be used to calculate the cluster allocation profit, which is the sum of the ratio of total occurrence frequency to the cluster of XML documents, as described in [18]. Second, several efficient recommendation systems [10,13] have been devised based on WFP mining. Chen [10] proposed a recommendation system that increases profit from cross-selling without losing recommendation accuracy. Its consideration of product profitability for sellers is based on weighted frequent patterns. Third, alarm weights directly affect the accuracy and validity of the results in alarm correlation in communication networks, as demonstrated in [21]. The proper determination of weights reflects both the objective information of the alarm and the subjective judgment of experts.
More extensions with weight constraints have been developed, such as mining weighted association rules [31], mining weighted association rules without pre-assigned weights [30], mining weighted sequential patterns [41,42], mining weighted closed patterns [36], mining frequent patterns with dynamic weights [2], mining weighted graphs [24], mining weighted sub-trees or sub-structures [25], and mining weighted frequent XML query patterns [15]. In the above applications, weighted frequent pattern mining discovers important patterns, and maximal frequent pattern mining extracts fewer patterns by compressing the frequency information. In this paper, we take a systems approach to the problem of mining maximal frequent patterns with weight constraints. We propose an efficient algorithm called MWFIM (Maximal Weighted Frequent Itemset Mining) that mines the entire set of maximal weighted frequent patterns from a transaction database. We conduct an extensive experimental characterization of MWFIM against a state-of-the-art maximal pattern mining method, MAFIA [6]. Using some standard machine-learning benchmark datasets, our results indicate that MWFIM outperforms the MAFIA algorithm. The main contributions of this paper are as follows: (1) we define the problem of incorporating maximal frequent pattern mining and weighted frequent pattern mining, (2) introduce maximal weighted frequent pattern mining, (3) propose a pruning technique with head weighted support for reducing the size of the search space, and (4) implement our algorithm, MWFIM, and

conduct an extensive experimental study comparing MWFIM with MAFIA. The remainder of this paper is organized as follows. In Section 2, we provide some background information on this subject. In Section 3, we suggest maximal weighted frequent pattern mining and describe the MWFIM algorithm. Section 4 provides our extensive experimental results. Finally, some concluding remarks are presented in Section 5.

    2. Background

Let I = {i1, i2, i3, ..., in} be a unique set of items. A transaction database, TDB, is a set of transactions in which each transaction, denoted as a tuple ⟨tid, X⟩, contains a unique transaction identifier (tid) and a set of items (X). We call X ⊆ I a pattern, and we call X a k-pattern if the cardinality of pattern X is k. The support of a pattern is the number of transactions containing that pattern in the TDB. The problem of frequent pattern mining is how to find a complete set of patterns satisfying minimum support in a TDB. To prune infrequent patterns, frequent pattern mining usually uses the anti-monotone property [1]. That is, if pattern X is an infrequent pattern, all super patterns of X must be infrequent. Using this property, infrequent patterns can be pruned early. Frequent pattern mining has been studied in the area of data mining due to its broad applications in mining association rules [1], interesting measures or correlations [3,22,26,34], sequential patterns [8,29,35,41,42], constraint-based frequent patterns [5], graph patterns [35], emerging patterns [11,19,27] and other data mining tasks. These approaches have focused on enhancing the efficiency of algorithms, devising techniques for search strategies, data structures, and data formats. Moreover, constraint-based pattern mining approaches [17] have been proposed to reduce the number of uninteresting patterns in terms of user focus. These studies have focused on how a concise pattern or other important patterns are found. However, there has been no research on mining maximal frequent patterns with weight constraints.

    2.1. Weighted frequent pattern mining

The main focus of weighted frequent pattern (WFP) mining concerns the anti-monotone property, since this property is usually broken when different weights are applied to items and patterns. That is, although pattern X is weighted as infrequent, the super patterns of X can be weighted as frequent. Mining association rules with weighted items (MINWAL) [7] defines a weighted support, which is calculated by multiplying the support of a pattern with that pattern's average weight. In weighted association rule mining (WARM), the problem of breaking the anti-monotone property is solved by using a weighted support and devising a weighted downward closure property. However, the weighted support of pattern {A,B} in WARM is the weight ratio of the transactions containing both A and B to the weight of all transactions, and thus WARM does not consider this support measure. Recently, weighted frequent pattern mining algorithms [39,40] based on the pattern-growth approach have been developed. Weight-based pattern mining not only reduces the size of the search space but also finds more important patterns. The weight of an item is a non-negative real number assigned to reflect the importance of each item in the TDB. Given a set of items, I = {i1, i2, i3, ..., in}, the weight of a pattern is formally defined as follows:

Weight(P) = (∑_{i=1}^{length(P)} Weight(p_i)) / length(P).

The value achieved when multiplying the support of a pattern with the weight of the pattern is the weighted support of that pattern. That is, given pattern P, the weighted support is defined as WSupport(P) = Weight(P) × Support(P). A pattern is called a weighted frequent pattern if the weighted support of the pattern is no less than the minimum support threshold.
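These definitions translate directly into Python. A minimal sketch (the helper names and toy data below are ours, not from the paper):

```python
def weight(pattern, w):
    """Weight(P): average weight of the items in pattern P."""
    return sum(w[i] for i in pattern) / len(pattern)

def support(pattern, tdb):
    """Support(P): number of transactions containing P."""
    return sum(1 for t in tdb if set(pattern) <= t)

def wsupport(pattern, tdb, w):
    """WSupport(P) = Weight(P) * Support(P)."""
    return weight(pattern, w) * support(pattern, tdb)

# Toy data for illustration only.
w = {'a': 0.8, 'b': 0.75, 'c': 0.45}
tdb = [{'a', 'b', 'c'}, {'a', 'b', 'c'}, {'a', 'c'}, {'a', 'b'}]

print(wsupport(('a', 'c'), tdb, w))  # 0.625 * 3 = 1.875
```

With a minimum support threshold of 2, {a,c} here would be weighted infrequent (1.875 < 2) even though its plain support is 3.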

    2.2. Maximal frequent pattern mining

Pattern X is a maximal frequent pattern (MFP) if X is frequent and every one of its proper super patterns is infrequent. For instance, if pattern {d,g,h} is frequent and its only proper super pattern, {d,g,h,f}, is infrequent, then {d,g,h} is a maximal frequent pattern. The supports of all frequent patterns are needed to extract association rules. However, it is often impractical to generate an entire set of frequent patterns or closed patterns

when there are very long patterns present in the data [6]. The set of maximal frequent patterns is the smallest possible expression



of data that can still be used to extract a set of frequent patterns. Once the set of frequent itemsets (FI) is generated, the support information can be easily recomputed from the transaction database. Burdick [6] proposed a novel MFP mining algorithm, MAFIA (MAximal Frequent Itemset Algorithms). This algorithm uses a vertical bitmap representation, where the count of a pattern is based on the column in the bitmap. In contrast, the FPmax [12] algorithm uses a horizontal format. It is also a depth-first MFP mining algorithm, but uses an FP-tree structure based on the pattern-growth approach. Weighted frequent pattern mining discovers important patterns, and maximal frequent pattern mining extracts fewer patterns by compressing the frequency information; thus, maximal frequent pattern mining with weight constraints can be used to find fewer but important frequent patterns. However, several difficult problems remain in this incorporation. We describe these problems and how to handle them in detail in the next section.
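Because support is anti-monotone, maximality only needs to be checked against 1-item extensions. A toy sketch of that check (ours; it does not reproduce MAFIA's vertical-bitmap machinery):

```python
def support(pattern, tdb):
    """Number of transactions containing the pattern."""
    return sum(1 for t in tdb if set(pattern) <= t)

def is_maximal_frequent(pattern, items, tdb, min_sup):
    """X is an MFP iff X is frequent and every 1-item extension is
    infrequent; anti-monotonicity then covers all larger supersets."""
    pattern = set(pattern)
    if support(pattern, tdb) < min_sup:
        return False
    return all(support(pattern | {i}, tdb) < min_sup
               for i in set(items) - pattern)

tdb = [{'a', 'b'}, {'a', 'b'}, {'a', 'b', 'c'}]
items = {'a', 'b', 'c'}
print(is_maximal_frequent({'a', 'b'}, items, tdb, min_sup=2))  # True
print(is_maximal_frequent({'a'}, items, tdb, min_sup=2))       # False
```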

    3. MWFIM: Maximal Weighted Frequent Itemset Mining

    3.1. Maximal weighted frequent pattern

First, we propose maximal frequent pattern mining with weight constraints, a joint paradigm that considers both approaches: maximal weighted frequent pattern (MWFP) mining. In MWFP mining, weighted frequent patterns are first found, and maximal frequent patterns are then mined from the weighted frequent patterns.

Definition 3.1 (Maximal Weighted Frequent Pattern, MWFP). A pattern is defined as a maximal weighted frequent pattern if the pattern has no weighted frequent superset.

In this approach, MWFP mining discovers candidate weighted frequent patterns using MaxW (the maximum weight of the database) first, and maximal frequent patterns are then mined. To maintain the anti-monotone property of MWFP mining, MaxW is used to determine whether a pattern is an approximate weighted frequent pattern before checking the supersets (whether the pattern has any supersets weighted as frequent). Finally, some of the maximal frequent but weighted infrequent patterns are pruned, and new maximal frequent patterns are mined as candidate MWFPs. The new maximal frequent patterns are checked again for lossless MWFP mining. The integration of a weighted frequent pattern with a maximal frequent pattern is obvious. The weight constraints have to be considered before applying the superset checking. For example, suppose the minimum support threshold is 2, and that the transaction database in Table 1a and its weight range (0.4–0.8) are used as normalized weights in Table 1b. The set of candidate weighted frequent patterns using the Apriori principle [1] is {{a}, {b}, {c}, {d}, {e}, {f}, {g}, {a,b}, {a,c}, {a,d}, {a,f}, {a,g}, {b,c}, {b,d}, {b,f}, {c,d}, {c,f}, {d,f}, {d,g}, {a,b,c}, {a,b,d}, {a,d,f}, {a,c,d}, {a,c,f}, {a,d,g}, {b,c,d}, {b,c,f}, {c,d,f}, {a,b,c,d}, {a,b,c,f}, {a,c,d,f}, {a,b,d,f}, {b,c,d,f}, and {a,b,c,d,f}}, and MaxW is 0.8. Then, the patterns {a,b,c,d,f} and {a,d,g} are maximal since they have no proper supersets. However, neither is a real weighted frequent pattern, since the weighted support of each is less than the minimum support (2), and thus they must be pruned by real weighted frequent pattern checking. Consequently, new maximal frequent patterns {{a,b,c,d}, {a,b,c,f}, {a,c,d,f}, {a,b,d,f}, {b,c,d,f}, {a,d}, {a,g}} are mined again. Among these patterns, only the pattern {a,b,c,d} is a real weighted frequent pattern, and thus is an MWFP. In contrast, {{a,b,c,f}, {a,b,d,f}, {a,c,d,f}, {b,c,d,f}, {a,d}, {a,g}} are pruned and generate fewer 1-level subsets. Therefore, new candidate patterns {{a,b,f}, {a,c,f}, {a,d,f}, {b,c,f}, {b,d,f}, {c,d,f}, {g}} are generated for mining the remaining MWFPs (subsets of the MWFPs are not generated). Pattern {a,c,f} is an MWFP, but the other candidate patterns are not. Finally, the resulting set of MWFPs, {{a,b,c,d}, {a,c,f}}, is ultimately generated. As shown in the above example, to maintain the anti-monotone property in MWFP mining, we use the maximum weight (MaxW) when deciding whether an item weighted as infrequent must be pruned. For this reason, some of the candidate patterns can be exactly weighted as infrequent. Thus, we must check whether the candidate patterns are exactly weighted as frequent. A problem occurs when a candidate pattern is not exactly weighted as frequent: if we remove the candidate pattern weighted as infrequent, then certain information on other patterns may be lost. Even if the candidate is weighted as infrequent, weighted frequent patterns that belong to subsets of the candidate can exist, and it may thus be possible to find an MWFP from these subsets. In conclusion, if a generated candidate pattern is exactly weighted as frequent, then we check whether it has a proper weighted frequent superset in MWFP mining. Only if it has no proper weighted frequent superset is the candidate pattern an MWFP that can be inserted into the resulting set of MWFP mining. In contrast, if the candidate is not exactly weighted as frequent, we must repeatedly check whether all the 1-level reduced subsets of the candidate are weighted as frequent. This means that if the candidate pattern of a leaf node is weighted as infrequent when the MWFP mining algorithm traverses in depth-first order for the generation of candidate MWFP patterns, then its parent node must be visited and the node pattern checked. Our MWFP mining search strategies involve this additional handling, and can find the entire set of MWFPs of a TDB without a loss of information.
Another consideration in mining MWFPs is as follows. While the

subsets of an MFP are guaranteed to be frequent, some of the subsets of an MWFP can be weighted as infrequent. For instance, as shown in Fig. 1, with a minimum support of 2 and a list of three

    Table 1

    Example transaction database and item weights.

(a) Transaction database

TID   Transaction
100   a, b, c, d, f, g
200   a, b, c, d, f
300   b, d, e, h, i
400   a, d, e, g
500   a, b, c, d, f, g
600   e, f, g, h

(b) Weight table

Item   Weight   Support
a      0.7      4
b      0.6      4
c      0.8      3
d      0.65     5
e      0.45     3
f      0.5      4
g      0.4      4
h      0.5      2
i      0.45     1

    Fig. 1. A weighted frequent pattern {a,b,c} and its subsets.



weighted items (a, b, c) with weights {a:0.8, b:0.75, c:0.45}, the pattern {a,b,c} is exactly weighted as frequent since its weighted support is 2. Moreover, it has no proper superset weighted as frequent, and thus it is an MWFP. However, one of its subsets is not exactly weighted as frequent: the weighted support of pattern {a,c} is 1.875, which is less than the minimum support. Therefore, unlike an MFP, from which a complete set of frequent patterns can be generated, an MWFP does not inherit this characteristic. However, MWFP mining still has wide applications, such as mining negative and positive patterns for classification.
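The weighted supports driving the running example can be checked directly against Table 1. A small Python verification (a sketch; the helper name is ours):

```python
# Table 1: transactions (as item sets) and item weights.
tdb = [set('abcdfg'), set('abcdf'), set('bdehi'),
       set('adeg'), set('abcdfg'), set('efgh')]
w = {'a': 0.7, 'b': 0.6, 'c': 0.8, 'd': 0.65, 'e': 0.45,
     'f': 0.5, 'g': 0.4, 'h': 0.5, 'i': 0.45}
min_sup = 2

def wsupport(pattern):
    """WSupport(P) = average item weight of P times support of P."""
    avg_w = sum(w[i] for i in pattern) / len(pattern)
    sup = sum(1 for t in tdb if set(pattern) <= t)
    return avg_w * sup

print(wsupport('abcd'))   # 0.6875 * 3 ~= 2.0625 >= 2: MWFP
print(wsupport('acf'))    # ~0.667 * 3 ~= 2.0    >= 2: MWFP
print(wsupport('abcdf'))  # 0.65   * 3  = 1.95   <  2: pruned
print(wsupport('adg'))    # ~0.583 * 3 ~= 1.75   <  2: pruned
```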

    3.2. Search strategies for MWFP mining

In this section, we present a conceptual framework of the item subset lattice, along with our search strategies for MWFP mining. Assume there is a weight-descending total ordering ≤WD of the items I in a database. If item i occurs before item j in the ordering, we denote this as i ≤WD j. This ordering can be used to enumerate the item subset lattice, a partial ordering over the power set S of items I. We define the partial order ≤ on S1, S2 ∈ S such that S1 ≤ S2 if S1 ⊆ S2. Fig. 2 shows a sample of a complete subset lattice for four items. The top element in the lattice is the empty set (denoted as {} or root), and each lower level k includes all k-patterns. The k-level patterns are sorted in weight-descending order on each level, and all children nodes are associated with the earliest subset in the previous level. The pattern identifying each node will be referred to as the head of the node, while the possible extensions of the node are named the tail. For example, consider node P in Fig. 3. The head of P is {a,b}, and its tail is the set {c,d}. The search space used in mining maximal weighted frequent patterns is a prefix-tree consisting only of weighted frequent items. Specifically, approximately weighted frequent items are used to maintain the anti-monotone property. Using prefix weights, we can find such items without any loss of information.

A prefix-tree is a subset lattice, as shown in Fig. 2. The tail contains all items weighted as approximately frequent, the weights of which are no larger than any item weight of the head. To prune infrequent items, not every exact weighted support for these items is needed. Instead, we multiply the weight of the item's head by its support. By Lemma 1, this approximate weighted support, W, is always at least as large as each exact weighted support. In addition, W is the maximum weighted support of the head's subsets, and thus the anti-monotone property is always maintained. To mine the maximal weighted frequent patterns from the prefix-tree, we traverse the tree in depth-first order. At each node P, each element in the tail of the node is generated and counted as a 1-level extension. If the weighted support of {P's head} ∪ {1-extension} is less than the minimum support threshold, any super pattern in the sub-tree rooted at {P's head} ∪ {1-extension} will be weighted as infrequent.

Lemma 1. In a prefix-tree with weight-descending order, the weight of a pattern P containing only head items is always equal to or larger than the weight of any super pattern containing P.

Proof. Suppose items a1, a2, a3, ..., an, ak are sorted in weight-descending order, so that w1 ≥ w2 ≥ w3 ≥ ... ≥ wn ≥ wk > 0. Then

w1/1 ≥ (w1 + w2)/2 ⟺ 2w1 ≥ w1 + w2 ⟺ w1 ≥ w2,

which is always true. Thus, the weight of pattern {a1} is always equal to or larger than that of pattern {a1,a2}. In addition, the weight of pattern {a1,a2} is always equal to or larger than that of pattern {a1,a2,a3}, since

(w1 + w2)/2 ≥ (w1 + w2 + w3)/3 ⟺ 3(w1 + w2) ≥ 2(w1 + w2 + w3) ⟺ w1 + w2 ≥ 2w3.

In general, the weight of pattern {a1,a2,a3, ...,an} is (w1 + w2 + w3 + ... + wn)/n, and it is always equal to or larger than that of pattern {a1,a2,a3, ...,an,ak}, which is (w1 + w2 + w3 + ... + wn + wk)/(n + 1):

(w1 + w2 + w3 + ... + wn)/n ≥ (w1 + w2 + w3 + ... + wn + wk)/(n + 1)
⟺ (n + 1)(w1 + w2 + w3 + ... + wn) ≥ n(w1 + w2 + w3 + ... + wn + wk)
⟺ w1 + w2 + w3 + ... + wn ≥ n·wk,

which holds since w1 + w2 + w3 + ... + wn is always equal to or larger than n·wk. □

Fig. 3 shows items in weight-descending order and their prefix-tree. By Lemma 1, the weight of pattern {a,b} is always equal to or larger than that of its super patterns, {a,b,c} and {a,b,d}. As shown in Fig. 3, the weight of pattern {a,b} is 0.75, and that of {a,b,c} is (0.8 + 0.7 + 0.6)/3 = 0.7. The weight of {a,b,d} is (0.8 + 0.7 + 0.5)/3 ≈ 0.667, and is also equal to or less than that of its parent node pattern, {a,b}.

Lemma 2. In a weight-descending prefix-tree, if a node pattern is weighted as infrequent, then all of its child nodes are also weighted as infrequent. Thus, in a depth-first traversal, if a pattern is exactly weighted as infrequent, then its sub-tree traversal is stopped and the remaining child nodes are not considered.

    Fig. 2. An example of a subset lattice for four items.

    Fig. 3. An example of a prex-tree with weight-descending order.



Proof. Each child node pattern is always a super pattern of its parent node pattern in a prefix-tree. By Lemma 1, the weight of each child node pattern is always equal to or less than that of its parent node pattern. In addition, the support of a parent pattern is always equal to or larger than that of its super patterns, since support cannot increase as the length of a pattern increases. Thus, all child node patterns in a prefix-tree of weight-descending order must have weighted supports less than or equal to the weighted support of their parent node pattern. □

As shown in Fig. 4, the support of pattern {a,b} is 5, and thus the weighted support of pattern {a,b} is 0.75 × 5 = 3.75. If the minimum support threshold (min_sup) is 3, pattern {a,b} is weighted as frequent. However, the weighted support of {a,b,c}, 0.7 × 4 = 2.8, is not larger than min_sup, and thus {a,b,c} is weighted as infrequent. The weighted support of {a,b,d} is 0.667 × 3 ≈ 2, and therefore {a,b,d} is also weighted as infrequent. It is not necessary to check whether pattern {a,b,c,d} is weighted as frequent, since it is a child node of the weighted infrequent pattern {a,b,c}.
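Lemma 2's pruning decision can be replayed with the numbers stated for Fig. 4 (the weights and supports are taken from the text; the helper is ours):

```python
# Weights and supports as given for Fig. 4.
w = {'a': 0.8, 'b': 0.7, 'c': 0.6, 'd': 0.5}
sup = {('a', 'b'): 5, ('a', 'b', 'c'): 4, ('a', 'b', 'd'): 3}
min_sup = 3

def wsupport(pattern):
    """Average item weight times support, using the tabulated values."""
    return sum(w[i] for i in pattern) / len(pattern) * sup[pattern]

print(wsupport(('a', 'b')))       # 0.75 * 5 = 3.75 >= 3: weighted frequent
print(wsupport(('a', 'b', 'c')))  # 0.70 * 4 = 2.8  <  3: prune its sub-tree
print(wsupport(('a', 'b', 'd')))  # ~0.667 * 3 ~= 2 <  3: prune its sub-tree
```

By Lemma 2, {a,b,c,d} never needs to be examined: it is a child of the weighted infrequent node {a,b,c}.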

Lemma 3. In a weight-descending prefix-tree, if the head of a node N is weighted as infrequent, then node N must be a leaf node.

Proof. The pattern of N is a subset of the pattern of N's child node (denoted as C). Even if the pattern of N is weighted as infrequent, some of its supersets could, in principle, be weighted as frequent. However, if all items are sorted in weight-descending order, the weighted support of C is always less than or equal to the weighted support of N's head, by Lemma 1. Thus, if the pattern of N is weighted as infrequent, C must also be weighted as infrequent. Consequently, all child nodes of N are always weighted as infrequent. Further, a sub-tree rooted at N cannot contain any weighted frequent patterns. For this reason, node N, the pattern of which is weighted as infrequent, is a leaf node. □

Reaching a leaf node P in the depth-first traversal of the tree, we obtain a candidate for the result set of maximal weighted frequent patterns; however, only a real weighted frequent pattern can be a candidate. Meanwhile, a weighted frequent superset of P may have already been discovered. Therefore, we need to check whether a superset of candidate pattern P is already contained in the result set. Only the weighted frequent patterns whose supersets are not weighted as frequent can be added to the result set. The largest possible frequent pattern contained in the sub-tree rooted at P is the H ∪ T (Head union Tail) of P. As shown in Fig. 3, since the head of P is {a,b} and the tail is {c,d}, the H ∪ T of P is {a,b,c,d}. If {a,b,c,d} is discovered to be weighted frequent, it is not necessary to traverse any subsets of H ∪ T. Thus, we can prune the entire sub-tree rooted at node P. The items of tails are always sorted in descending order of their weights, and the item order is always fixed. Thus, there is no need to reorder items for any sub-trees, and we can omit the reordering time.

To discover the exact result set of MWFPs, we have to check all MWFPs to ensure that no superset of any pattern has already been discovered before adding the pattern to the MWFPs. A progressive focusing technique [14] was introduced to improve superset checking performance without excessive accesses to the entire MWFP set. The basic idea is as follows. Even if the entire MWFP set is large, at any given node only fragments of the MWFP set are possible supersets of the pattern at that node. Thus, we use a local MWFP set, which is the subset of the entire MWFP set relevant at the node, to check supersets effectively. In our MWFP mining, the local MWFP set for the root is initialized as the null set. Suppose that we are examining node K and are about to traverse K_n, where K_n = K ∪ {i} (i is an item of K's tail). The local MWFP set for K_n contains all of the patterns in the local MWFP set for K that also contain the item used to extend K when forming K_n. After the sub-tree traversal of K_n, the local MWFP set containing the MWFPs of K_n is inserted into the global MWFP set. Consequently, candidate patterns are no longer required to conduct superset checks against the global MWFP set. Instead, the local MWFP set consists of all supersets of the current node. Thus, if the local MWFP set of a candidate node is empty, then the global MWFP set contains no supersets. On the contrary, if the local MWFP set is not empty, then a superset will be found in the global MWFP set. Our MWFP mining framework in weight-descending order is as follows. First, a candidate pattern is checked to determine whether its real weighted support is at least the minimum support threshold (min_sup). If the pattern is genuinely weighted as frequent, its supersets (extensions) have the possibility of being genuinely weighted frequent patterns. Therefore, the combinations of the pattern and each approximately weighted frequent item of the pattern's tail make up its 1-level extensions. Finally, the real weighted frequent pattern of a leaf node has to be checked to determine whether it has a weighted frequent superset.
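The traversal just described can be condensed into a short sketch. This is our simplification, not the paper's MWFIM implementation: children are pruned with their exact weighted support (which is safe by Lemma 2), and superset checking scans the global result set instead of using progressive focusing with local MWFP sets. On Table 1 with min_sup = 2 it recovers the two MWFPs derived in Section 3.1:

```python
def mine_mwfp(tdb, w, min_sup):
    """Depth-first MWFP mining over a weight-descending prefix tree
    (simplified sketch; see the caveats in the text above)."""
    def wsup(p):
        avg_w = sum(w[i] for i in p) / len(p)
        sup = sum(1 for t in tdb if set(p) <= t)
        return avg_w * sup

    # MaxW pruning of single items, then weight-descending order.
    maxw = max(w.values())
    items = sorted(
        (i for i in w
         if maxw * sum(1 for t in tdb if i in t) >= min_sup),
        key=lambda i: -w[i])

    results = []

    def visit(head, tail):
        extended = False
        for k, item in enumerate(tail):
            child = head + (item,)
            if wsup(child) >= min_sup:   # Lemma 2: infrequent children
                extended = True          # have all-infrequent sub-trees
                visit(child, tail[k + 1:])
        if head and not extended:
            # Leaf candidate: keep unless a weighted frequent superset
            # was already recorded in an earlier branch.
            if not any(set(head) <= set(r) for r in results):
                results.append(head)

    visit((), list(items))
    return results

# Table 1 data again: the miner recovers the two MWFPs from the text.
tdb = [set('abcdfg'), set('abcdf'), set('bdehi'),
       set('adeg'), set('abcdfg'), set('efgh')]
w = {'a': 0.7, 'b': 0.6, 'c': 0.8, 'd': 0.65, 'e': 0.45,
     'f': 0.5, 'g': 0.4, 'h': 0.5, 'i': 0.45}
print(mine_mwfp(tdb, w, min_sup=2))  # [('c', 'a', 'd', 'b'), ('c', 'a', 'f')]
```

Note the sorted item order ⟨c, a, d, b, f, e, g⟩ matches the weight-descending list used in Section 3.3, and the heads come out in that tree order, i.e. {a,b,c,d} and {a,c,f}.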

    Fig. 5. Prex-tree for mining maximal weighted frequent patterns.

    Fig. 4. An example weighted frequent pattern and its prex-tree.



    3.3. Example for MWFP mining

In this section, we show an example of mining maximal weighted frequent patterns (MWFPs). Fig. 5 shows an example of a prefix-tree for maximal weighted frequent pattern (MWFP) mining of the transaction database (TDB) shown in Table 1. Suppose that the minimum support threshold (min_sup) is 2; the weight list is then ⟨a:0.7, b:0.6, c:0.8, d:0.65, e:0.45, f:0.5, g:0.4, h:0.5, i:0.45⟩. We extend each level of the node pattern and traverse the prefix-tree in depth-first order until no more weighted frequent patterns can be generated. In our approach, we consider important maximal frequent patterns with weight constraints, so MWFIM can remove weighted infrequent items such as items h and i. As a result, the prefix tree of MWFIM does not store weighted infrequent items, so memory usage can be reduced.

In this example, when the head of a node is {}, the weighted frequent item list sorted in weight-descending order, with a maximum weight of 0.8 in the TDB, is ⟨c, a, d, b, f, e, g⟩. This list becomes the tail of the root. The real weighted support of the first item of the tail, c, is larger than min_sup, and thus the 1-level extensions of node {c} can be generated as its child nodes. When {c} is a head, the available tail is {a, d, b, f}; each of these items is approximately weighted frequent using the weight information of the head, {c}. To generate 1-level extensions through the union of head {c} and each item of the tail, {c} and {a} first make {c,a}, a child node of node {c}. The real weighted support of {c,a} is larger than min_sup, so {c,a} is a real weighted frequent pattern. Thus, if the tail of head {c,a} is not {}, we can traverse the child nodes of {c,a}. To find the tail, the approximate weighted support of {c,a} ∪ {i} must be calculated for each item i with a lower order than the member items of {c,a}. Maintaining the anti-monotone property, the tail of {c,a} is {d, b, f}, so we can generate a child node of {c,a}, namely {c,a,d}. This pattern, {c,a,d}, is a real weighted frequent pattern, and thus we can find the tail of the next head, {c,a,d}, which is {b, f}. The 1-level extension of {c,a,d,b} can be generated since {c,a,d,b} is a real weighted frequent pattern. This extension is {c,a,d,b,f}, since the tail of {c,a,d,b} is {f}. However, {c,a,d,b,f} is not genuinely weighted frequent (0.65 × 3 = 1.95 < min_sup), so no more extensions are possible, and {c,a,d,b,f} is just a leaf node. If a leaf-node pattern has real weighted support and passes the superset check (i.e., it has no real weighted frequent superset), it can be an MWFP. Nevertheless, {c,a,d,b,f} does not, so we have to find MWFPs among the 4-level subsets of {c,a,d,b,f}. The candidates are {c,a,d,b}, {c,a,d,f}, {c,a,b,f}, {c,d,b,f}, and {a,d,b,f}. One of these five candidates, {c,a,d,b}, is the parent node of {c,a,d,b,f}, as shown in Fig. 5, and thus {c,a,d,b} is checked as an MWFP candidate when none of its child nodes is an MWFP. The other candidates need no additional checks here, since their sub-trees are traversed later. Returning to the parent node {c,a,d,b}: it is genuinely weighted frequent and has no weighted frequent superset.

Thus, {c,a,d,b} is an MWFP and is inserted into the result set. For the next search, the other child nodes of {c,a,d,b}'s parent node are traversed in depth-first order. When {c,a,d} is set as a head, only pattern {c,a,d,f} can be generated as its 1-level extension. However, {c,a,d,f} is not genuinely weighted frequent (0.6625 × 3 = 1.9875 < min_sup), so {c,a,d,f} becomes a leaf node. The depth-first traversal of the sub-tree of {c,a,d} ends, and our search returns to the root of sub-tree {c,a,d}. A child node of {c,a,d}, namely {c,a,d,b}, is an MWFP, and therefore {c,a,d} cannot be an MWFP. We must traverse the sibling nodes of {c,a,d} for the next step. The next sibling node of {c,a,d}, {c,a,b}, is genuinely weighted frequent, so its child nodes can be generated. When the head of the node is {c,a,b}, its tail is {f}, so a new node, {c,a,b,f}, is extended as a child node of {c,a,b}. This is a leaf node (no further extensions are possible) since it is not genuinely weighted frequent (0.65 × 3 = 1.95 < min_sup). Since the sub-tree rooted at {c,a,b} has no MWFPs, {c,a,b} can be a candidate MWFP. Thus, we need to check whether pattern {c,a,b} has any weighted frequent supersets. We have already discovered an MWFP, {c,a,d,b}, that is a superset of {c,a,b}. Therefore, {c,a,b} fails the superset check and is not an MWFP. Returning to our traversal, we arrive at {c,a,f}, the next sibling node of {c,a,b}, and check whether 1-level extensions of {c,a,f} are possible. Indeed, {c,a,f} is genuinely weighted frequent, but it has no tail and thus becomes a leaf node; no further extensions are possible. Thus, we have to check whether it has any weighted frequent supersets. No superset of {c,a,f} is found in the result set of already discovered MWFPs, so {c,a,f} is an MWFP. This pattern is inserted into the result set. Next, {c,a,f} has no remaining sibling nodes, and we return to its parent node.

The sub-tree rooted at parent node {c,a} already contains MWFPs, and therefore {c,a} is a subset of these MWFPs; we can skip the superset-checking step for {c,a}. Next, we traverse the sub-tree rooted at {c,d}, the next sibling of {c,a}. Node {c,d} has tail {b, f}, and the combination {c,d} ∪ {b} makes the first child node of {c,d}. The generated pattern, {c,d,b}, is a real weighted frequent pattern, so we can generate a new candidate as the 1-level extension of {c,d,b} using its tail. The only 1-level extension, {c,d,b,f}, is not a genuinely weighted frequent pattern (0.6375 × 3 = 1.9125 < min_sup), so we stop the traversal of this sub-tree and return to its parent node, {c,d,b}. Node {c,d,b} is a subset of {c,a,d,b}, an already discovered MWFP, and is therefore not an MWFP. The remaining sibling node, {c,d,f}, is not genuinely weighted frequent (0.65 × 3 = 1.95 < min_sup), and it has no remaining sibling nodes, so we return to its parent node, {c,d}. However, {c,d} is not an MWFP either, since it is a subset of an already discovered MWFP. Next, {c,b} is the right-sibling node of {c,d}, so we extend its sub-tree and search it in depth-first order. Node {c,b} is genuinely weighted frequent, but {c,b,f} is not.

Returning to the root of the sub-tree, only {c,b} can be a candidate. However, {c,b} is also a subset of an already discovered MWFP, so it cannot be inserted into the result set. We traverse the next sibling node, {c,f}, but {c,f} is not genuinely weighted frequent, so {c,f} cannot be an MWFP. Furthermore, it has no right-sibling node, so we return to its parent node, {c}. Without checking supersets, we know that pattern {c} cannot be an MWFP, since the sub-tree rooted at {c} contains one or more MWFPs. We then traverse the remaining search spaces for the right-sibling nodes of {c}. The remainders are {a}, {d}, {b}, {f}, {e}, and {g}. By traversing the prefix-tree rooted at {}, all of the MWFPs are discovered.
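The real weighted support used throughout this walkthrough is the pattern's support multiplied by the average weight of its items. A minimal sketch reproducing the leaf checks above (weights as listed in the example, supports as stated in the text):

```python
# Real weighted support = (average item weight) * support.
# Weights from the example's weight list; supports as given in the text.
WEIGHT = {'a': 0.7, 'b': 0.6, 'c': 0.8, 'd': 0.65, 'e': 0.45,
          'f': 0.5, 'g': 0.4, 'h': 0.5, 'i': 0.45}
MIN_SUP = 2

def weighted_support(pattern, support):
    avg_w = sum(WEIGHT[i] for i in pattern) / len(pattern)
    return avg_w * support

# The leaf checks from the walkthrough:
print(weighted_support("cadbf", 3))  # 0.65   * 3 = 1.95   < min_sup
print(weighted_support("cadf", 3))   # 0.6625 * 3 = 1.9875 < min_sup
print(weighted_support("cdbf", 3))   # 0.6375 * 3 = 1.9125 < min_sup
```

Each result falls below min_sup = 2, which is why these extensions become leaf nodes in the example.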

    3.4. Vertical bitmap representation

    Fig. 6. An example of a vertical bitmap representation and the AND-operation applied to the bits.

    Our MWFIM mining algorithm adopts the vertical bitmap representation used in MAFIA [6]. Each bitmap stands for a pattern in the database, and each bit in a bitmap represents whether a given transaction contains the corresponding pattern. Initially, each bitmap corresponds to a 1-level pattern, i.e., a single item. As the patterns whose supports are counted in the transaction database grow recursively longer, the vertical bitmap representation combines naturally with this pattern extension. For instance, the bitmap for pattern {a,b} can be generated easily by performing an AND-operation on all of the bits in the bitmaps for {a} and {b}. Next, to count the number of transactions that contain {a,b}, we only need to count the number of set bits in the {a,b} bitmap, which equals the number of transactions that contain {a,b}, as shown in Fig. 6b. In short, the bitmap representation is ideal for both candidate pattern generation and support counting.
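The bitmap mechanics can be sketched with plain Python integers as bit vectors (bit t set iff transaction t contains the item). The four transactions below are illustrative, not those of Table 1:

```python
# Vertical bitmaps: one integer per item, bit t = 1 iff transaction t
# contains the item. A pattern's bitmap is the bitwise AND of its items'
# bitmaps; its support is the population count of the result.

transactions = [{'a', 'b'}, {'a'}, {'a', 'b', 'c'}, {'b'}]

def build_bitmaps(transactions):
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps

bm = build_bitmaps(transactions)
ab = bm['a'] & bm['b']           # bitmap for pattern {a, b}
support_ab = bin(ab).count('1')  # transactions containing both a and b
print(support_ab)                # 2 (transactions 0 and 2)
```

Extending a pattern by one item is a single AND over the existing bitmap, which is what makes support counting cheap during recursive extension.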

In previous algorithms [6,14], the bitmap representation is applied to store all the items of the transactions in the TDB. In the MWFIM algorithm, by contrast, the bitmap structure keeps only the weighted frequent items of the transactions in the TDB, so memory usage can be reduced. For example, suppose that the minimum support (min_sup) is 2, given the transaction database in Table 1a and the weight table in Table 1b. The frequent item list is ⟨a:4, b:4, c:3, d:5, e:3, f:4, g:4, h:2, i:1⟩, and the weight list of the items is ⟨a:0.7, b:0.6, c:0.8, d:0.65, e:0.45, f:0.5, g:0.4, h:0.5, i:0.45⟩. As the vertical bitmaps in Fig. 6a show, our MWFIM algorithm does not need to store items h and i in the vertical bitmaps, because their maximal weighted supports (MaxW support) are less than the minimum support (2), and no pattern including item h or i can be a weighted frequent pattern.
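This MaxW-based pruning can be reproduced directly from the listed supports and weights. The sketch below is illustrative, not the paper's code; an item survives only if support × MaxW ≥ min_sup:

```python
# Supports and weights from the example above; MaxW is the maximum weight.
supports = {'a': 4, 'b': 4, 'c': 3, 'd': 5, 'e': 3,
            'f': 4, 'g': 4, 'h': 2, 'i': 1}
weights  = {'a': 0.7, 'b': 0.6, 'c': 0.8, 'd': 0.65, 'e': 0.45,
            'f': 0.5, 'g': 0.4, 'h': 0.5, 'i': 0.45}
MIN_SUP = 2
max_w = max(weights.values())  # MaxW = 0.8

# Keep an item only if its maximal possible weighted support reaches min_sup.
kept = {i for i, s in supports.items() if s * max_w >= MIN_SUP}
pruned = set(supports) - kept
print(sorted(pruned))          # ['h', 'i']: 2*0.8 = 1.6 and 1*0.8 = 0.8, both < 2
```

Because the pruned items never enter the vertical bitmaps, every bitmap row is shorter and all subsequent AND-operations are cheaper.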

    3.5. MWFIM: MWFP mining algorithm

    We will now present the MWFIM algorithm.

    ALGORITHM [MWFIM]: Maximal Weighted Frequent Itemset Mining

    Input:  (1) A transaction database: TDB;
            (2) Weights of the items within the weight range MinW–MaxW;
            (3) Minimum support threshold: min_sup
    Output: The complete set of maximal weighted frequent patterns

    Begin
    1. Let MWFP be the set of maximal weighted frequent patterns.
       Initialize MWFP ← {};
    2. Scan TDB once to find the global weighted frequent items, i.e., items
       satisfying support × MaxW ≥ min_sup;
    3. Sort the items in weight-descending order;
    4. Scan the TDB again and build vertical bitmaps storing the weighted
       frequent candidate items of the transactions in the TDB;
    5. Call MWFIM(root, MWFP, false);
    End

    Procedure MWFIM(Current node C, MWFP, Boolean isHUT)
    1:  isAdded = false;
    2:  allWF = true;
    3:  HeadWS = C.Head.Weight × C.Head.Support;
    4:  If (HeadWS ≥ min_sup)
    5:    For each item i in all the remaining items
          // remaining items that have lower orders than the items of C.Head
    6:      If (C.Head.Weight × support(C.Head ∪ {i}) ≥ min_sup), then
            insert i into C.Tail;
    7:      Else allWF = false;
    8:    For each item i in C.Tail
    9:      If (i is the first item in C.Tail) isHUT = true;
    10:     Else isHUT = false;
    11:     extended_C = C ∪ {i};
    12:     isAdded_local = MWFIM(extended_C, MWFP, isHUT);
    13:     If (isAdded_local) isAdded = true;
    14:     If (isHUT and allWF = true) return isAdded;
    15: If (C.Tail == {})
    16:   If (not ExistSubset(C.Head, MWFP)) {
          // C.Head is not a subset of an already discovered MWFP
    17:     Insert C.Head into MWFP;
    18:     isAdded = true; }
    19:   Else isAdded = false;
    20: return isAdded;

In the MWFIM algorithm, the TDB is scanned once, and the weighted frequent items are found and sorted in weight-descending order. To store all of the transactions and the corresponding items, vertical bitmaps are generated. Next, the MWFIM algorithm calls the recursive procedure MWFIM(Current node C, MWFP, Boolean isHUT). In the procedure, a flag, isAdded, is set to true only if some child node of the current node C yields an MWFP; otherwise, it is set to false. When the flag is true, we know without checking for supersets that the head of C cannot be a maximal weighted frequent pattern. When the procedure call ends, it returns its isAdded flag. Another flag of the procedure, allWF, is set to true when all of the remaining items that have lower orders than the items of C.Head are weighted frequent. The other flag, isHUT, means that the current candidate is H ∪ T when isHUT is true. If isHUT and allWF

    Table 3
    Parameter settings for the scalability test.

    (a) T10I4Dx datasets
    Data set       |T|  |I|  |L|   # of items  # of trans (K)  Size (MB)
    T10I4D100K     10   4    2000  1000        100             3.92
    T10I4D200K     10   4    2000  1000        200             8.05
    T10I4D400K     10   4    2000  1000        400             15.71
    T10I4D600K     10   4    2000  1000        600             23.56
    T10I4D800K     10   4    2000  1000        800             31.42
    T10I4D1000K    10   4    2000  1000        1000            39.27
    T10I4D2000K    10   4    2000  1000        2000            78.55
    T10I4D3000K    10   4    2000  1000        3000            117.83
    T10I4D4000K    10   4    2000  1000        4000            157.11
    T10I4D5000K    10   4    2000  1000        5000            196.39

    (b) TaLbNc datasets
    Data set                 |T|  |I|  |L|   # of items  # of trans (K)  Size (MB)
    T10.L1000.N10000D100K    10   4    1000  10,000      100             5.08
    T20.L2000.N20000D100K    20   4    2000  20,000      100             10.99
    T30.L3000.N30000D100K    30   4    3000  30,000      100             16.82
    T40.L4000.N40000D100K    40   4    4000  40,000      100             22.66
    T10.L1000.N10000D1000K   10   4    1000  10,000      1000            50.82
    T20.L2000.N20000D1000K   20   4    2000  20,000      1000            109.9
    T30.L3000.N30000D1000K   30   4    3000  30,000      1000            168.28
    T40.L4000.N40000D1000K   40   4    4000  40,000      1000            226.63

    Table 2
    Characteristics of benchmark datasets.

    Data set       Size (M)  # of trans  # of items  Avg. (max.) trans size
    Pumsb          15.9      49,046      2113        74 (74)
    Accidents      33.8      340,183     572         45 (45)
    Retail         3.97      88,162      16,470      13 (50)
    BMS-Webview1   0.97      59,602      497         2.5 (267)


    Fig. 10. Number of patterns (accidents dataset).

    are true, we do not have to traverse any subsets of H ∪ T (C.Head ∪ C.Tail) in line 14. The weighted support of C.Head is used to check whether the current candidate pattern is genuinely weighted frequent in line 4. Line 6 prunes weighted infrequent patterns with the maximum weight, which is the weight of C.Head. The MWFIM algorithm adopts depth-first traversal of the prefix-tree. If C.Tail is not empty, C is extended as extended_C, and the procedure MWFIM(extended_C, MWFP, isHUT) is called recursively in line 12. However, if C.Tail is empty, then C.Head is a candidate pattern, and thus the procedure checks whether C.Head has a weighted frequent superset in line 16. If it does not, C.Head is a maximal weighted frequent pattern and is inserted into the MWFP set.
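The traversal just described can be condensed into a short sketch. This is a simplification (no vertical bitmaps, no HUT pruning, explicit superset checks against the global result set, and a small illustrative dataset instead of Table 1), not the MWFIM implementation itself:

```python
# Simplified sketch of the MWFIM traversal: depth-first extension in
# weight-descending order, tails pruned with the head's maximum weight,
# patterns kept only if no discovered MWFP is a proper superset.

TDB = [set('cadb'), set('cadbf'), set('caf'), set('db')]  # toy database
W = {'c': 0.8, 'a': 0.7, 'd': 0.65, 'b': 0.6, 'f': 0.5}   # toy weights
MIN_SUP = 1.5
ORDER = sorted(W, key=W.get, reverse=True)                # weight-descending

def support(p):
    return sum(1 for t in TDB if p <= t)

def wsup(p):  # real weighted support = average weight * support
    return (sum(W[i] for i in p) / len(p)) * support(p)

def mine(head, tail, result):
    for k, i in enumerate(tail):
        ext = head | {i}
        if wsup(ext) >= MIN_SUP:
            # approximate pruning: the head's maximum weight bounds any
            # extension's weighted support (anti-monotone upper bound)
            max_w = W[max(ext, key=W.get)]
            new_tail = [j for j in tail[k + 1:]
                        if max_w * support(ext | {j}) >= MIN_SUP]
            mine(ext, new_tail, result)
    # post-order maximality check: keep head iff no proper superset found
    if head and not any(head < m for m in result):
        result.append(frozenset(head))

result = []
mine(set(), ORDER, result)
print([''.join(sorted(m)) for m in result])
```

On this toy database the mining returns {c,a} and {d,b}; every other weighted frequent pattern is either below min_sup or subsumed by one of these two.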

    Fig. 7. Runtime (Pumsb dataset).

    Fig. 8. Number of patterns (Pumsb dataset).

    Fig. 9. Runtime (accidents dataset).

    Fig. 11. Runtime (retail dataset).

    Fig. 12. Number of patterns (retail dataset).

    Fig. 13. Runtime (BMS-Webview1 dataset).


    4. Performance evaluation

Using real and synthetic datasets, we report our experimental results on the performance of the MWFIM algorithm compared with the state-of-the-art maximal pattern mining algorithm, MAFIA [6]. MWFIM and MAFIA both use vertical bitmaps for candidate pattern generation and support counting; however, MWFIM is the first maximal weighted frequent pattern mining algorithm. The main purposes of this experiment are to demonstrate how effectively non-maximal weighted patterns can be pruned, and to show the effectiveness of maximal weighted frequent patterns. Additionally, we run a scalability test and analyze the memory usage and the quality of the patterns found by MWFIM.

    4.1. Test environment and datasets

In our experiments, we used four real datasets and several synthetic datasets. Table 2 shows the characteristics of these datasets (Pumsb, Accidents, Retail, BMS-Webview1, and the T10I4Dx datasets). The Pumsb dataset includes census data for populations and housing. The Accidents dataset contains anonymized traffic accident data; it is quite dense, and therefore a large number of frequent patterns will be mined even for very high minimum support values. The Retail dataset is sparse and contains market-basket data from a retail supermarket store. The BMS-Webview1 dataset contains several months of sparse click-stream data from an e-commerce Web site. These four real datasets can be obtained from the Frequent Itemset Mining

    Fig. 14. Number of patterns (BMS-Webview1 dataset).

    Fig. 15. Runtime (T10I4Dx dataset).

    Fig. 16. Runtime (TaLbNc dataset).

    Fig. 17. Memory usage (Pumsb dataset).


(FIMI) dataset repository (http://www.fimi.cs.helsinki.fi/data/). These datasets do not have weight values for their items, and therefore a random generation function is used to generate their weights. Table 3 summarizes the parameter settings, where |T| is the average size of a transaction, |I| is the average size of the maximal potentially large itemsets, |L| is the maximum number of potential frequent patterns, and N is the number of items. As shown in Table 3a and b, we use synthetic T10I4Dx and TaLbNc datasets. The T10I4Dx datasets contain from 100K to 5000K transactions, and the TaLbNc datasets have from 10K to 40K items with 100K or 1000K transactions. These synthetic datasets were generated with the IBM dataset generator. Our MWFIM algorithm was written in Visual C++. Experiments were performed on a processor operating at 2.40 GHz with 2048 MB of memory on the Microsoft Windows 7 operating system.

    4.2. Experimental results on execution time

We analyze the evaluation results for the Accidents, Pumsb, Retail, and BMS-Webview1 datasets in Figs. 7–14. The normalized weights of the items are between 0.3 and 0.6, 0.5 and 0.8, 0.3 and 0.6, and 0.4 and 0.8, respectively. Figs. 7–14 show that MWFIM runs faster and generates fewer patterns than the MAFIA algorithm in all cases. Specifically, fewer patterns are found as the minimum support is increased. Fig. 7 compares the results for the Pumsb dataset and shows that MWFIM outperforms MAFIA in all cases. Likewise, Fig. 9 shows that MWFIM is faster than MAFIA on the Accidents dataset, which is a dense TDB. Figs. 8, 10, and 12 show that MAFIA mines a very large number of patterns with these settings. For example, on the Pumsb dataset, MAFIA finds 108,804 patterns with a minimum support of 54% and 146,882 patterns with a minimum support of 52%. For the Accidents dataset, another dense dataset, MWFIM is faster than MAFIA, as shown in Fig. 9, and generates fewer patterns, as shown in Fig. 10. In particular, Fig. 10 shows that the number of unimportant patterns is considerably reduced by MWFIM. In Figs. 11–14, we provide the evaluation results for two sparse datasets, Retail and BMS-Webview1. On these datasets, our experiments show that MWFIM gives the best performance in terms of both the number of patterns and the runtime. In conclusion, our experiments show that the number of patterns found by MWFIM is several orders of magnitude smaller than the number discovered by MAFIA. Moreover, the maximal weighted frequent patterns mined by MWFIM are fewer and more important than the maximal frequent patterns of MAFIA, since the weight constraints reflect which items are more important even when their frequency is low. Therefore, MWFP mining can prune much larger search spaces than MFP mining when fitting parameters are set. Meanwhile, the runtime increases as the minimum support becomes lower.

    4.3. Scalability test

The T10I4Dx datasets are used to test scalability with respect to the number of transactions, and the TaLbNc datasets are used to test scalability with respect to the number of attributes. In these experiments, MWFIM scales much better than the MAFIA algorithm. First, we ran a scalability test on MWFIM with respect to the number of transactions, from 100K to 5000K. The minimum support is set to 0.1% for 100–500K transactions and to 0.3% for 1000–5000K transactions. The normalized weights of the items are set between 0.3 and 0.6. Fig. 15 shows that the slope of MWFIM is lower than that of MAFIA, and that MWFIM is also faster. Second, we compare MWFIM with MAFIA with the number of attributes ranging from 10K to 40K. In this test, the number of transactions is increased from 100K to 1000K, with a minimum support of 0.5% for 100K transactions and 0.8% for 1000K transactions. The normalized weights of the items are set between 0.3 and 0.6. Fig. 16 shows that MWFIM is much more scalable than MAFIA in terms of the number of attributes. In comparison with the MAFIA algorithm, MWFIM not only runs faster but also scales better.

    4.4. Memory consumption

In this experiment, we checked the memory usage of MWFIM and MAFIA using the four real datasets. From Figs. 17–22, we can see that MWFIM uses less memory than MAFIA. MWFIM pushes the normalized weights deeply into the mining process, and therefore weighted infrequent patterns are not considered in the next mining step. Memory usage is generally proportional to the number of result patterns. In addition, unimportant patterns

    Fig. 18. Memory usage (accidents dataset).

    Fig. 19. Memory usage (retail dataset).

    Fig. 20. Memory usage (BMS-Webview1 dataset).


    [5] F. Bonchi, C. Lucchese, Pushing tougher constraints in frequent pattern mining, in: PAKDD, May 2005.
    [6] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, T. Yiu, MAFIA: a maximal frequent itemset algorithm, IEEE Transactions on Knowledge and Data Engineering 17 (11) (2005) 1490–1504.
    [7] C.H. Cai, A.W. Fu, C.H. Cheng, W.W. Kwong, Mining association rules with weighted items, in: Proceedings of the International Database Engineering and Applications Symposium, IDEAS'98, Cardiff, Wales, UK, 1998, pp. 68–77.
    [8] J. Chang, Mining weighted sequential patterns in a sequence database with a time-interval weight, Knowledge-Based Systems 24 (1) (2011) 1–9.
    [9] M.S. Chen, J. Han, P.S. Yu, Data mining: an overview from a database perspective, IEEE Transactions on Knowledge and Data Engineering 8 (1996) 866–883.
    [10] L. Chen, F. Hsu, M. Chen, Y. Hsu, Developing recommender systems with the consideration of product profitability for sellers, Information Sciences 178 (4) (2008) 1032–1048.
    [11] G. Dong, J. Li, Efficient mining of emerging patterns: discovering trends and differences, in: Proceedings of the 1999 International Conference on Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, 1999, pp. 43–52.
    [12] G. Grahne, J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering 17 (10) (2005) 1347–1362.
    [13] J. Ge, Y. Qiu, Z. Chen, Cooperative recommendation system based on ontology construction, in: 7th International Conference on Grid and Cooperative Computing, October 2008, pp. 691–694.
    [14] K. Gouda, M.J. Zaki, Efficiently mining maximal frequent itemsets, in: Proceedings of the IEEE International Conference on Data Mining, 2001, pp. 163–170.
    [15] M.S. Gu, J.H. Hwang, et al., Mining the weighted frequent XML query pattern, in: IEEE International Workshop on Semantic Computing and Applications, July 2008.
    [16] J. Han, M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, 2005.
    [17] J. Han, J. Pei, Y. Yin, R. Mao, Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining and Knowledge Discovery 8 (1) (2004) 53–87.
    [18] J.H. Hwang, K.H. Ryu, A weighted common structure based clustering technique for XML documents, Journal of Systems and Software 83 (7) (2010) 1267–1274.
    [19] J. Li, G. Dong, K. Ramamohanarao, L. Wong, DeEPs: a new instance-based lazy discovery and classification system, Machine Learning 54 (2) (2004) 99–124.
    [20] A.H. Lim, C.S. Lee, Processing online analytics with classification and association rule mining, Knowledge-Based Systems 23 (3) (2010) 248–255.
    [21] T. Li, X. Li, H. Xiao, An effective algorithm for mining weighted association rules in telecommunication networks, in: International Conference on Computational Intelligence and Security Workshops, 2007, pp. 425–428.
    [22] Y.C. Li, J.S. Yeh, C.C. Chang, Isolated items discarding strategy for discovering high utility itemsets, Data & Knowledge Engineering 64 (2008) 198–217.
    [23] H. Mannila, H. Toivonen, Levelwise search and borders of theories in knowledge discovery, Data Mining and Knowledge Discovery 1 (3) (1997) 241–258.
    [24] M. McGlohon, L. Akoglu, C. Faloutsos, Weighted graphs and disconnected components: patterns and a generator, in: KDD, 2008.
    [25] S. Nowozin, K. Tsuda, Weighted substructure mining for image analysis, in: IEEE Conference on Computer Vision and Pattern Recognition, June 2007.
    [26] E.R. Omiecinski, Alternative interest measures for mining associations in databases, IEEE Transactions on Knowledge and Data Engineering (2003).
    [27] S.B. Patil, Y.S. Kumaraswamy, Intelligent and effective heart attack prediction system using data mining and artificial neural network, European Journal of Scientific Research 31 (4) (2009) 642–656.
    [28] J. Pei, G. Dong, W. Zou, J. Han, Mining condensed frequent-pattern bases, Knowledge and Information Systems 6 (5) (2004) 570–594.
    [29] J. Pei, J. Han, et al., Mining sequential patterns by pattern-growth: the PrefixSpan approach, IEEE Transactions on Knowledge and Data Engineering (October) (2004).
    [30] K. Sun, F. Bai, Mining weighted association rules without pre-assigned weights, IEEE Transactions on Knowledge and Data Engineering 20 (4) (2008).
    [31] F. Tao, Weighted association rule mining using weighted support and significance framework, in: ACM SIGKDD, August 2003.
    [32] W. Tsao, A.J.T. Lee, Y. Liu, T. Chang, H. Lin, A data mining approach to face detection, Pattern Recognition 43 (2010) 1039–1049.
    [33] Y. Wu, Y. Chen, R. Chang, Mining negative generalized knowledge from relational databases, Knowledge-Based Systems 24 (1) (2011) 134–145.
    [34] H. Xiong, S. Shekhar, P.N. Tan, V. Kumar, Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs, in: ACM SIGKDD, August 2004.
    [35] X. Yan, J. Han, gSpan: graph-based substructure pattern mining, in: IEEE ICDM'02, December 2002.
    [36] U. Yun, Mining lossless closed frequent patterns with weight constraints, Knowledge-Based Systems 20 (2007) 86–97.
    [37] U. Yun, Efficient mining of weighted interesting patterns with a strong weight and/or support affinity, Information Sciences 177 (17) (2007) 3477–3499.
    [38] U. Yun, On pushing weight constraints deeply into frequent itemset mining, Intelligent Data Analysis 13 (2) (2009).
    [39] U. Yun, K. Ryu, Approximate weighted frequent pattern mining with/without noisy environments, Knowledge-Based Systems 24 (1) (2011) 73–82.
    [40] U. Yun, An efficient mining of weighted frequent patterns with length decreasing support constraints, Knowledge-Based Systems 21 (8) (2008) 741–752.
    [41] U. Yun, K. Ryu, Weighted approximate sequential pattern mining within tolerance factors, Intelligent Data Analysis 15 (4) (2011) 551–569.
    [42] U. Yun, K. Ryu, Discovering important sequential patterns with length-decreasing weighted support constraints, International Journal of Information Technology and Decision Making 9 (4) (2010) 575–599.
    [43] X. Zeng, J. Pei, K. Wang, J. Li, PADS: a simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining, Knowledge and Information Systems 20 (3) (2009) 375–391.
