32
Basic Level in Formal Concept Analysis: Interesting Concepts and Psychological Ramifications Radim Belohlavek, Martin Trnecka Palacky University, Olomouc, Czech Republic Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 1 / 32

Radim Belohlavek, Martin Trneckatrnecka.inf.upol.cz/publications/slides/ijcai13.pdf · Basic Idea Belohlavek R., Klir G.: Concepts and Fuzzy Logic. MIT Press, 2011. Two research directions:

  • Upload
    haminh

  • View
    220

  • Download
    4

Embed Size (px)

Citation preview

Basic Level in Formal Concept Analysis:Interesting Concepts and Psychological Ramifications

Radim Belohlavek, Martin Trnecka

Palacky University, Olomouc, Czech Republic

!

!

!

POSSIBILISTIC INFORMATION:

A Tutorial

George J. Klir

State University of New York (SUNY)

Binghamton, New York 13902, USA

[email protected]

prepared for International Centre for Information and Uncertainty, Palacky University, Olomouc!

!

!

!

!

!

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 1 / 32

Basic Idea

Belohlavek R., Klir G.: Concepts and Fuzzy Logic. MIT Press, 2011.

Two research directions:

A) Formal Concept analysis (data minig in general) benefits from Psychology of Concepts(utilizing phenomena studied by PoC).

B) Psychology of Concepts benefits from data mining (FCA).

Psychology of Concepts: big area in cognitive psychology, empirical study of humanconcepts.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 2 / 32

Motivation

Number of patterns extracted from data may become too large to be reasonablycomprehended by a user.

This is in particular true of formal concept analysis.

Users finds some extracted concepts important (relevant, interesting), some lessimportant, some even “artificial” and not interesting.

Select only important concepts.

Several approaches have been proposed, e.g.:

– Indices enabling us to sort concepts according to their relevance.Kuznetsov’s stability index

– Taking into account additional user’s knowledge (background knowledge) to filter relevantconceptsBelohlavek et al.: attribute dependency formulas, constrained concept lattices

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 3 / 32

Previous Work

Another possibility is that the user judgment on concepts’ importance is due to certainpsychological processes and phenomena.

Our approach: important are concepts from the basic level.

Belohlavek R., Trnecka M.: Basic Level of Concepts in Formal Concept Analysis.In: F. Domenach, D.I. Ignatov, and J. Poelmans (Eds.): ICFCA 2012, LNAI 7278,Springer, Heidelberg, 2012, pp. 28-44.

In our previous work we demonstrated that the basic level phenomenon may be utilizedto select important formal concepts.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 4 / 32

Q: What is this?

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 5 / 32

A: Dog

. . . Why dog?

There is a number of other possibilities:

Animal

Mammal

Canine beast

Retriever

Golden Retriever

Marley

. . . So why dog?: Because “dog” is a basic level concept.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 6 / 32

Basic Level Phenomenon

Extensively studied phenomenon in psychology of concepts.

When people categorize (or name) objects, they prefer to use certain kind of concepts.

Such concepts are called the concepts of the basic level.

Definition of basic level concepts?:Are cognitively economic to use; “carve the world well”.

One feature: Basic level concepts are a compromise between the most general andmost specific ones.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 7 / 32

Basic Level Definitions

The psychological literature does not contain a single, robust and uniquely intepretabledefinition of the notion of basic level.

Several informal definitions.

There exists (semi)formalized metrics from psychologists.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 8 / 32

Our Paper (First Step in B)

We formalize within FCA five selected approaches to basic level.

Goal: Explore their relationship.

Several challenges:

– Every formal model of basic level will be simplistic (potentially a point of criticism from thepsychological standpoint).

– Various issues which are problematic or not yet fully understood from a psychologicalviewpoint (less knowledge and extensive domain knowledge).

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 9 / 32

Introduction to Formal Concept Analysis (FCA)

Method for analysis of tabular data (Rudolf Wille).

Input: binary objects (rows) × attributes (columns) data table called formal context(denoted by 〈X,Y, I〉); the binary relation I ⊆ X × Y tells us whether attribute y ∈ Yapplies to object x ∈ X, i.e. 〈x, y〉 ∈ I, or not, i.e. 〈x, y〉 6∈ I.

Output:

1 Hierarchically ordered collection of clusters:

– called concept lattice,– clusters are called formal concepts,– hierarchy = subconcept-superconcept.

2 Data dependencies:

– called attribute implications,– not all (would be redundant), only representative set.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 10 / 32

Formal Concept

The notion of a concept is inspired by Port-Royal logic (traditional logic):

concept := extent + intent

extent = objects covered by conceptintent = attributes covered by concept.

A formal concept of 〈X,Y, I〉 is any pair 〈A,B〉 consisting of A ⊆ X and B ⊆ Ysatisfying A↑ = B and B↓ = A where

A↑ = {y ∈ Y | for each x ∈ X : 〈x, y〉 ∈ I}, and

B↓ = {x ∈ X | for each y ∈ Y : 〈x, y〉 ∈ I}.

That is, A↑ is the set of all attributes common to all objects from A and B↓ is the setof all objects having all the attributes from B, respectively.

Example (Concept dog)

extent = collection of all dogs (foxhound, poodle, . . . ),intent = {barks, has four limbs, has tail, . . . }

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 11 / 32

Concept lattice

The set B(X,Y, I) of all formal concepts of 〈X,Y, I〉 equipped with thesubconcept-superconcept ordering ≤ is called the concept lattice of 〈X,Y, I〉.〈A1, B1〉 ≤ 〈A2, B2〉 means that 〈A1, B1〉 is more specic than 〈A2, B2〉.≤ captures the intuition behind DOG ≤ MAMMAL (the concept of a dog is morespecific than the concept of a mammal).

Example

a b c d

1 × ×2 × ×3 × × ×4 × × ×

1,2,3,4 c

1,3a, c

3,4b, c

2,4c, d

3a, b, c

4b, c, d

a, b, c, d

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 12 / 32

Basic Level Metrics

We created a metrics (functions assigning numbers to concepts).

Large number is stronger indicates that the concept is basic level concept.

For formal context 〈X,Y, I〉 which describes all the available information regarding theobjects and attributes. For a given approach M to basic level, we define a function BLMmapping every concept 〈A,B〉 in the concept lattice B(X,Y, I) to [0,∞) or to [0, 1]

BLM(A,B) is interpreted as the degree to which 〈A,B〉 belongs to the basic level.

A basic level is thus naturally seen as a graded (fuzzy) set rather than a clear-cut set ofconcepts.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 13 / 32

Similarity Approach (S)

Based on informal (one of the first) definition from Eleanor Rosch (1970s).

Basic level concept satisfies three conditions:

1. The objects of this concept are similar to each other.2. The objects of the superordinate concepts are significantly less similar.3. The objects of the subordinate concepts are only slightly more similar.

Formalized in our previous paper (Belohlavek, Trnecka, ICFCA 2012).

BLS(A,B) = α1 ⊗ α2 ⊗ α3,

Where αi represents the degree of validity of the above condition i = 1, 2, 3 and ⊗represents a truth function of many-valued conjunction.

The definition of degrees αi involves definitions of appropriate similarity measures andutilizes basic principles of fuzzy logic.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 14 / 32

Cue Validity Approach (CV)

Proposed by Eleanor Rosch (1976).

Based on the notion of a cue validity of attribute y for concept c, i.e. the conditionalprobability p(c|y) that an object belongs to c given that it has y.

The total cue validity for c is defined the sum of cue validities of each of the attributesof c.

Basic level concepts are those with a high total cue validity.

We consider the following probability space: X (objects) are the elementary events, 2X

(sets of objects) are the events, the probability distribution is given by P ({x}) = 1|X| for

every object x ∈ X. For an event A ⊆ X then, P (A) = |A|/|X|. The eventcorresponding to a set {y, . . . } ⊆ Y of attributes is {y, . . . }↓.

BLCV(A,B) =∑y∈B

P (A|{y}↓) =∑y∈B

|A ∩ {y}↓||{y}↓|

.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 15 / 32

Category Feature Collocation Approach (CFC)

Proposed by Jones (1983).

Defined as product p(c|y) · p(y|c) of the cue validity p(c|y) and the so-called categoryvalidity p(y|c).The total CFC for c may then be defined as the sum of collocations of c and eachattribute.

Basic level concepts may then again be understood as concepts with a high total CFC.

BLCFC(A,B) =∑y∈Y

(p(c|y) · p(y|c)) =∑y∈Y

(|A ∩ {y}↓||{y}↓|

· |A ∩ {y}↓|

|A|

).

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 16 / 32

Category Utility Approach (CU)

Proposed by Corter and Gluck (1992).

Utilizes the notation of category utility

cu(c) = p(c) ·∑y∈Y

[p(y|c)2 − p(y)2].

Algorithm COBWEB is base on the CU.

BLCFC(A,B) = P (A) ·∑y∈Y

[(P ({y}↓ ∩A)

P (A)

)2

− P ({y}↓)2]=

=|A||X|·∑y∈Y

[(|{y}↓ ∩A||A|

)2

− |{y}↓|

|X|)2

].

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 17 / 32

Predictability Approach (P)

Based on the idea, frequently formulated in the literature.

Basic level concepts are abstract concepts that still make it possible to predict well theattributes of their objects (enables good prediction, for short).

This approach has not been formalized in the psychological literature.

We introduce a graded (fuzzy) predicate pred such that pred(c) ∈ [0, 1] is naturallyinterpreted as the truth degree of proposition “concept c = 〈A,B〉 enables goodprediction”.

We use the principles of fuzzy logic to obtain the truth degrees β1, β2, and β3 of thefollowing three propositions:

1. c has high pred ;2. c has a significantly higher pred than its upper neighbors;3. c has only a slightly smaller pred than its lower neighbors.

This is done in a way analogous to how we approached a similar problem withformalizing the similarity approach to basic level.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 18 / 32

Extracting Interesting Concepts - Datasets

Sports,20 objects – sports and 10 attributes (on land, on ice, individual sport . . . ), 37 formalconcepts.

Drinks,68 objects – drinks and 25 attributes – regarding the composition of drinks (containsalcohol, contains caffeine, contains vitamins, various kinds of minerals), 320 formalconcepts.

DBLP,6980 objects – authors of papers in computer science and 19 attributes – conferencesin computer science, 2495 formal concepts.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 19 / 32

Extracting Interesting Concepts - Experimental Evaluation

Concepts selected for basic level are, for example:

Sports dataset

“ball games”

“land sports”

“individual sports”

“winter collective sports”

. . .

DBLP dataset

“Authors publishing in top databaseconferences”

“Authors publishing in theoryconferences”

. . .

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 20 / 32

Extracting Interesting Concepts - Experimental Evaluation

Concepts selected for basic level are, for example:

Drinks dataset

“beers”,

“energy drinks containing coffein”,

“liquers”,

“milk drinks”,

“mineral waters”,

“milk drinks”,

“sweet vitamin drinks”,

“sparkling drinks”,

“wines”,

. . .

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 21 / 32

Comparison of Basic Level Metrics

Different approaches describe single phenomenon – basic level of concepts.

For every input data 〈X,Y, I〉, a given metric BLM (i.e. M being S, CV, CFC, CU, orP) determines a ranking of formal concepts in B(X,Y, I), i.e. determines the linearquasiorder ≤M defined by

〈A1, B1〉 ≤M 〈A2, B2〉 iff BLM(A1, B1) ≤ BLM(A2, B2).

We examined the pairwise similarities of the rankings ≤S, ≤CV, ≤CU, ≤CFC, and ≤P

for various datasets.

We used the Kendall tau coefficient to assess the similarities.

Kendall tau coefficient τ(≤M,≤N) of rankings ≤M and ≤N is a real number in [−1, 1].High values indicate agreement of rankings with 1 in case the rankings coincide. Lowvalues indicate disagreement with −1 in case one ranking is the reverse of the other.

Even though this approach might seem rather strict, significant patterns were obtained.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 22 / 32

Similarity of Rankings Basic Level Concepts

S CV CFC CU PS 1.000 -0.093 -0.215 -0.139 -0.175CV -0.093 1.000 0.737 0.754 0.170CFC -0.215 0.737 1.000 0.789 0.229CU -0.139 0.754 0.789 1.000 0.161P -0.175 0.170 0.229 0.161 1.000

Table : Kendall tau coefficients for Sports data.

S CV CFC CU PS 1.000 0.103 0.151 0.067 -0.036CV 0.103 1.000 0.555 0.581 -0.232CFC 0.151 0.555 1.000 0.811 -0.061CU 0,067 0.581 0.811 1.000 -0.064P -0.036 -0.232 -0.061 -0.064 1.000

Table : Kendall tau coefficients for Drinks data.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 23 / 32

Similarity of Rankings Basic Level Concepts

S CV CFC CU PS 1.000 0.051 0.050 0.039 -0.104CV 0.051 1.000 0.701 0.699 -0.211CFC 0.050 0.701 1.000 0.791 -0.164CU 0.039 0.699 0.791 1.000 -0.107P -0.104 -0.211 -0.164 -0.107 1.000

Table : Kendall tau coefficients for synthetic 75× 25 datasets.

S CV CFC CU PS 1.000 0.245 0.303 0.350 -0.136CV 0.245 1.000 0.636 0.631 -0.540CFC 0.303 0.636 1.000 0.727 -0.645CU 0.350 0.631 0.727 1.000 -0.458P -0.136 -0.540 -0.645 -0.458 1.000

Table : Kendall tau coefficients for synthetic 100× 50 datasets.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 24 / 32

Similarity of sets of top r basic level concepts

Rank correlation may be too strict a criterion.

Namely, instead of basic-level ranking, one is arguably more interested in the setconsisting of the top r concepts of B(X,Y, I) according to the ranking ≤M for a givenmetric M. We denote such set by

TopMr

with the provision that

a) if the (r+ 1)-st, . . . , (r+ k)-th concepts are tied with the r-th one the ranking, we add thek concepts to TopM

r .b) we do not include concepts to which the metric assigns 0.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 25 / 32

Similarity of Sets of Top r Basic Level Concepts

Given metrics M and N, we are interested in whether and to what extent are the setsTopMr and TopNr similar.

For this purpose, we propose the measure of similarity.

For formal concepts 〈C,D〉, 〈E,F 〉 ∈ B(X,Y, I), denote by s(〈C,D〉, 〈E,F 〉) anappropriately defined degree of similarity. We present results utilizing the similaritybased on the simple matching coefficient but other options such as the Jaccardcoefficient yield similar results. For two metrics M and N, and a given r = 1, 2, 3, . . . ,we define:

S(TopMr ,TopNr ) = min(IMN, INM)

where

IMN =

∑〈C,D〉∈TopM

rmax〈E,F 〉∈TopN

rs(〈C,D〉, 〈E,F 〉)

|TopMr |and

INM =

∑〈E,F 〉∈TopN

rmax〈C,D〉∈TopM

rs(〈C,D〉, 〈E,F 〉)

|TopNr |.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 26 / 32

Similarity of Sets of Top r Basic Level Concepts

S(TopMr ,TopNr ) may naturally be interpreted as the truth degree of the proposition

“for most concepts in TopMr there is a similar concept in TopNr and vice versa”.

Because of this interpretation and because S is a reflexive and symmetric fuzzy relationwith suitable furhter properties, S is a good candidate for measuring similarity.

Clearly, high values of S indicate high similarity and S(TopMr ,TopNr ) = 1 iff

TopMr = TopNr .

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 27 / 32

Similarity of Sets of Top r Basic Level Concepts

0 10 20 30 40 500.4

0.5

0.6

0.7

0.8

0.9

1

r

S

S−CV

S−CU

S−P

S−CFC

CV−CU

CV−P

CV−CFC

CU−P

CU−CFC

P−CFC

Figure : Similarities S of sets of top r concepts for 75× 25 random datasets.

0 10 20 30 40 500.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

r

S

S−CV

S−CU

S−P

S−CFC

CV−CU

CV−P

CV−CFC

CU−P

CU−CFC

P−CFC

Figure : Similarities S of sets of top r concepts for 100× 50 random datasets.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 28 / 32

Similarity of Sets of Top r Basic Level Concepts

0 10 20 30 40 500.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

r

S

S−CV

S−CU

S−P

S−CFC

CV−CU

CV−P

CV−CFC

CU−P

CU−CFC

P−CFC

Figure : Similarities S of sets of top r concepts for 100× 50 random datasets.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 29 / 32

Results

CU, CFC, and CV may naturally be considered as a group of metrics with significantlysimilar behavior, while S and P represent separate, singleton groups.

This observation contradicts the current psychological knowledge. Namely, the(informal) descriptions of S, P, and CU are traditionally considered as essentiallyequivalent descriptions of the notion of basic level in the psychological literature. Onthe other hand, CFC has been proposed by psychologists as a supposedly significantimprovement of CV and the same can be said of CU versus CFC.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 30 / 32

Conclusions

We formalized within FCA five selected approaches to the basic level phenomenon.

The experiments performed indicate a mutual consistency of the proposed basic levelmetrics but also an interesting pattern.

Formalization of basic level in the framework of formal concept analysis is beneficial forthe psychological investigations themselves because it helps put them on a solid, formalground.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 31 / 32

Thank you.

Belohlavek R., Trnecka M. (DAMOL) Basic Level in Formal Concept Analysis August 8, 2013 32 / 32