
Vol.8 No.4  J. of Comput. Sci. & Technol.  1993

DKBLM — Deep Knowledge Based Learning Methodology

Ma Zhifang

Dept. of Computer Science, Jilin University, Changchun 130023
Received September 13, 1991; revised February 18, 1992.

Abstract

To solve the Imperfect Theory Problem (ITP) faced by Explanation Based Generalization (EBG), this paper proposes a methodology, named Deep Knowledge Based Learning Methodology (DKBLM), and gives an implementation of DKBLM, called Hierarchically Distributed Learning System (HDLS). As an example of HDLS's application, this paper shows a learning system (MLS) in the meteorology domain and its running on a simplified example.

DKBLM can acquire experiential knowledge with causality in it. It is applicable to those kinds of domains in which experiments are relatively difficult to carry out, and in which there exist many available knowledge systems at different levels for the same domain (such as weather forecasting).

Keywords: Machine learning, explanation based learning, deep knowledge.

1. Introduction

One of the explanation based approaches is Explanation Based Generalization (EBG) [1]. A summary of EBG is as follows.

Given:
* Goal Concept: a definition of the concept to be learned in terms of high-level or functional properties which are not available in the representation of an example.
* Training Example: a representation of a specific example of the concept in terms of lower-level features.
* Domain Theory: a set of inference rules and facts for proving that a training example meets the high-level definition of the concept.
* Operationality Criterion: a specification of how the definition of a concept must be represented so that the concept can be efficiently recognized.

Determine:
* General operational conditions for being a member of the concept.

Given this information, the first task in EBG is to construct an explanation of why the training example satisfies the goal concept, using the inference rules in the domain theory. This explanation takes the form of a proof tree composed of inference rules which proves that the training example is a member of the concept. Next, this explanation is generalized to obtain a set of sufficient conditions under which this explanation structure holds in general. These conditions represent general operational conditions for being a member of the concept.
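The two EBG steps above (build a proof tree, then read off its operational leaves) can be sketched in a few lines of Python. This is a propositional toy, not the paper's system: the rule base and feature names (a "cup" domain) are illustrative, and generalization is reduced to collecting the operational leaf conditions.

```python
# A minimal, propositional sketch of EBG's explanation step.
# RULES maps a conclusion to alternative condition lists; EXAMPLE holds the
# lower-level features of one training example. All names are illustrative.
RULES = {
    "cup": [["liftable", "holds_liquid"]],
    "liftable": [["light", "has_handle"]],
    "holds_liquid": [["concave", "upward"]],
}
EXAMPLE = {"light", "has_handle", "concave", "upward"}

def explain(goal, rules, facts):
    """Return a proof tree (goal, subtrees), or None if no explanation exists."""
    if goal in facts:                      # an operational leaf
        return (goal, [])
    for body in rules.get(goal, []):
        subtrees = [explain(g, rules, facts) for g in body]
        if all(subtrees):
            return (goal, subtrees)
    return None

def operational_conditions(tree):
    """Collect the proof tree's leaves: the sufficient, operational conditions."""
    goal, subs = tree
    if not subs:
        return [goal]
    return [leaf for s in subs for leaf in operational_conditions(s)]

tree = explain("cup", RULES, EXAMPLE)
print(operational_conditions(tree))   # the operational definition of "cup"
```

If any subgoal cannot be proved, `explain` returns None, which corresponds to the explanation failure that motivates the rest of this paper.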

It is obvious that the learning ability of EBG lies in its large amount of pre-possessed domain or non-domain knowledge. So it is rational to demand that an EBG system have perfect knowledge. But it is well known that there are many difficulties in equipping an EBG system with perfect knowledge from the beginning of building it. Without perfect knowledge, EBG systems may not get right learning results. This is called the "Imperfect Theory Problem" (ITP) [2].

There are four methodologies so far which may be used to solve ITP.

(1) Experiment methodology [2,3]: It is applicable to those kinds of domains, such as physics or chemistry, in which experiments are relatively easy to make. But it may not be applicable to those kinds of domains, such as weather forecasting, in which experiments are relatively difficult or expensive to conduct.

(2) Induction methodology [4,5]: It forms hypotheses at the domain-knowledge level from information at the data level with similarity-based learning techniques. It may output efficient experiential knowledge, as it has done so far. But it may not output the causality underlying the experiential knowledge, which some domain experts prefer.

(3) Analogy methodology [6,7,8]: Referring to the knowledge of other domains is an idea that has been used for a long time. Although many advances in analogy have been made in recent years, there are still some difficulties. First, for a given domain, it is difficult to find a similar one. Second, although a similar domain may be found, there may be no available knowledge systems which can be used for the analysis. Third, the techniques (such as concept correspondence, data transformation, and understanding between different domains, as well as analogy itself) still need to be developed for practical use.

(4) Deep knowledge based learning methodology (DKBLM) (this paper): It is applicable to those kinds of domains in which experiments are relatively difficult to make, and in which there exist many available knowledge systems at different levels in the same domain (such as in weather forecasting). It can acquire experiential knowledge with causality in it, as shown in the next section.

2. Deep Knowledge Based Learning Methodology (DKBLM)

Based on the view of the abstract knowledge structure in [9], we put forward the HYPOTHESIS OF EXPLANATION BY DEEP KNOWLEDGE, namely, a kind of causality in D^(i) may be explained with K^(i+h) (h = 1, 2, ...) in the same D^(i).

In this section, we first introduce the members of HDLS, the learning agents (LAs); then we introduce the organic whole, the hierarchically distributed learning system (HDLS), composed of LAs. All learning agents in an HDLS have a unique scheme of external features and a unique structure of internal logic.

The scheme of an LA's external features is:

AGENT-NAME <name-string>
ID-NUMBER <number>
DOMAIN <string>
HOST-MACHINE <string>
OWNER <name-string>

The external features are information which is to be perceived by HDLS and other LAs in the same HDLS. LAs in different HDLS's in different application domains may have different schemes.

Fig.1. An LA's internal logic. (The figure shows the user or LAs in upper levels passing a task to the LA; inside the LA, an "explanation and generalization" part and a "hypothesization and justification" part both work with the knowledge base of the LA, and the LA may call LAs in deeper levels.)

The structure of an LA's internal logic is abstractly composed of two parts, as shown in Fig. 1. Any LA in an HDLS is allowed to accept a learning task from the user or from other LAs in upper levels. Given a training example

P(v1, ..., vn) ::> C    (1)


the learning task of an LA is to try to find a P'(u1, ..., um) such that

P(v1, ..., vn) :< P'(u1, ..., um)    (2)
P'(u1, ..., um) ::> C    (3)

where vi (i = 1, 2, ..., n) and uj (j = 1, ..., m) are constants or variables, and formula (1) means that there is an example, whose description is P(v1, ..., vn), belonging to class C. Formulas (2) and (3) jointly mean generalizing P(v1, ..., vn) to P'(u1, ..., um) such that any object satisfying P'(u1, ..., um) belongs to class C (for a detailed explanation of the symbols :< and ::>, see also [10]). We denote Pi(x1, ..., xn) as Pi for simplicity from now on.

Definition 1. We use (P1, ..., Pm) or P to denote P1 ∧ ... ∧ Pm, and call it a conjunctive set; and we use {P1, ..., Pm} or P to denote P1 ∨ ... ∨ Pm, and call it a disjunctive set.

Definition 2. If Q(w1, ..., wh) (or Q for simplicity) can be deduced from a conjunctive set P or its subset in a knowledge base KB, then we denote it as P → Q.

Definition 3. Suppose P is a conjunctive set; we define, in KB:

P^(0) = P;
P^(n) = { C | ∪_{i=0}^{n-1} P^(i) → C }    (n = 1, 2, ...);
P^(*) = ∪_{i=0}^{∞} P^(i).

Definition 4. If P1, ..., Ps are all those conjunctive sets in KB which satisfy P1 → Q, ..., Ps → Q, then we denote it as P ⇒ Q, where P = {P1, ..., Ps}.

Definition 5. Suppose A and B are two disjunctive sets whose elements are all conjunctive sets; we define

A * B = { (a1, ..., ap, b1, ..., bq) | (a1, ..., ap) ∈ A and (b1, ..., bq) ∈ B }.

Definition 6. We define

C^(0) = { (C) };
C^(n) = ∪ P1 * ... * Pm, the union taken over all (Q1, ..., Qm) ∈ C^(n-1) and all P1, ..., Pm with (∀i)(Pi ⇒ Qi) in KB;
C^(*) = ∪_{i=0}^{∞} C^(i).

In the "explanation and generalization (EG)" part, the LA first tries to explain formula (1) using its own knowledge. If the explanation succeeds, the LA begins to generalize the explanation tree. If the generalization succeeds, it returns a generalized result to the user or the other LA which calls it. If the generalization fails, it returns a failure or an ungeneralized explanation tree to the user or the other LA which calls it. This "explanation and generalization (EG)" part is similar to the traditional EBG. If the explanation fails, the LA steps into the "hypothesization and justification (HJ)" part.

Theorem 1. An explanation succeeds in KB if and only if (∃N1)(∃N2)(∃e)(e ∈ C^(N1) ∧ e ∈ P^(N2)).

Theorem 2. An explanation succeeds in KB if and only if (∃N)(∃e)(e ∈ C^(N) ∧ e ∈ P^(*)).

Theorem 3. An explanation succeeds in KB if and only if (∃N)(C ∈ P^(N)).
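The layered closure of Definition 3 and the success test of Theorem 3 can be computed directly by forward chaining. The sketch below is propositional and the KB contents are illustrative: each rule is a (condition set, conclusion) pair, and the function reports the first N with C in P^(N), or None if the fixpoint is reached without deriving C.

```python
# Compute the layers P^(0), P^(1), ... of Definition 3 by forward chaining,
# and apply Theorem 3: the explanation succeeds iff C appears in some P^(N).
def closure_layers(p0, kb):
    """kb: list of (body_set, head). Yields P^(0), P^(1), ... until fixpoint."""
    known = set(p0)
    layer = set(p0)
    while layer:
        yield layer
        nxt = {head for body, head in kb
               if body <= known and head not in known}
        known |= nxt
        layer = nxt

# An illustrative propositional KB, not from the paper.
KB = [({"a", "b"}, "c"), ({"c"}, "d"), ({"d", "a"}, "goal")]

def explanation_succeeds(p0, kb, c):
    """Return the smallest N with c in P^(N), or None if no such N exists."""
    for n, layer in enumerate(closure_layers(p0, kb)):
        if c in layer:
            return n
    return None

print(explanation_succeeds({"a", "b"}, KB, "goal"))   # -> 3
```

When the result is None, the gap between C^(*) and P^(*) remains, which is exactly the situation the HJ part below is designed to handle.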

Definition 7. Suppose Q ∈ C^(*) and Q = (Q1, ..., Qm), and P ∈ P^(*); we define that P → Q means (P → Q1) ∧ ... ∧ (P → Qm).

In the "hypothesization and justification (HJ)" part, the LA chooses elements Q ∈ C^(*) and P ∈ P^(*), makes a hypothesis "P → Q" to try to bridge the gap between C^(*) and P^(*), and briefly evaluates the possibility L(P, Q) of the hypothesis "P → Q" (the two pieces of meta-knowledge below may be useful for the evaluation). If L(P, Q) is less than a pre-fixed threshold, the hypothesis "P → Q" is given up. If HJ cannot find any P → Q whose L(P, Q) is great enough, the LA returns a failure to the user or the LA which calls it. If an L(P, Q) is great enough, the LA justifies the hypothesis "P → Q" by first preparing the content of a communication letter calling deep LAs, and then waiting for answers from the deep LAs. After the HJ part receives an answer queue, it reorders the queue and sends it to the EG part.
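The HJ control step (score each candidate hypothesis, discard those below the pre-fixed threshold, and order the survivors best-first before consulting deeper LAs) can be sketched as follows. The scoring function passed in is a stand-in for L(P, Q); the paper evaluates it with the two pieces of meta-knowledge below.

```python
# A sketch of HJ's hypothesis filtering: keep the hypotheses "P -> Q" whose
# score L(P, Q) reaches a pre-fixed threshold, ordered best-first.
def select_hypotheses(candidates, score, threshold):
    """candidates: iterable of (P, Q) pairs; score: a stand-in for L(P, Q)."""
    scored = [(score(p, q), p, q) for p, q in candidates]
    kept = [(l, p, q) for l, p, q in scored if l >= threshold]
    # Best-first order, so the most plausible hypothesis is justified first.
    return [(p, q) for l, p, q in sorted(kept, key=lambda t: -t[0])]
```

If the returned list is empty, the LA reports failure, matching the behaviour described above.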

Definition 8. Suppose x and y are two objects with time factors; the time difference TD between x and y is (cf. [11])

TD(x, y) = | the time of x's occurring − the time of y's occurring |.

META-KNOWLEDGE-I: Suppose TD(P, C) = T for a training example P ::> C, and TD(P', Q) = t for any rule P' → Q in a KB; then ∀e ∈ C^(i), TD(e, C) = t * i, and ∀e ∈ P^(j), TD(e, P) = t * j. So the LA need only take those Q ∈ C^(i) and P ∈ P^(j) satisfying t * (i + j) = T into consideration.

Definition 9. Suppose r1: P1 → Q and r2: P2 → Q are inference rules in a KB; the relative coefficient RC(P1, P2) of the two rules is defined as

RC(P1, P2) = |P1 ∩ P2| / |P1 ∪ P2|

where |X| is the number of members in a set X. It is easy to see that 0 ≤ RC(P1, P2) ≤ 1.

META-KNOWLEDGE-II: Suppose HJ chooses a "P → Q" and there is an inference rule P' ⇒ Q in a KB; then use RC(P, P') as an aid to the evaluation.

When LAi in level i questions LAj in level j with question ξ, and LAj answers LAi with answer η, LAi uses SFS^(i) to transform ξ into level j, and LAj uses CFS^(j) to transform η into level i. Here we enlarge the definitions of SFS^(i) and CFS^(j) from the limit of j = i + 1 to that of j > i.
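Definition 9 is the Jaccard similarity of the two rules' condition sets, which is one line of Python. The condition-set contents below are illustrative.

```python
# Definition 9 as code: the relative coefficient RC of two rules with the
# same conclusion is the Jaccard similarity of their condition sets.
def rc(p1, p2):
    """RC(P1, P2) = |P1 & P2| / |P1 | P2|, always in [0, 1]."""
    p1, p2 = set(p1), set(p2)
    return len(p1 & p2) / len(p1 | p2)

# Two hypothetical rules sharing one of three distinct conditions:
print(rc({"cold_advection", "jet"}, {"cold_advection", "front"}))   # -> 1/3
```

A hypothesis whose conditions overlap heavily with an existing rule for the same conclusion thus receives a higher evaluation, as META-KNOWLEDGE-II prescribes.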

Abstractly speaking, a hierarchically distributed learning system (HDLS) is a vector

HDLS = <A1, ..., An>    (n > 0)

where Ai = {Ai1, ..., Aiki} (i = 1, 2, ..., n) and the Aij (j = 1, 2, ..., ki) are the LAs introduced above. Each Aij (i = 1, 2, ..., n−1, j ∈ {1, 2, ..., ki}) is granted the ability to call any Ast (i < s ≤ n, t ∈ {1, 2, ..., ks}), while Ast is not permitted to call Aij.

Theorem 4. There is no deadlock of calling between LAs in an HDLS defined above.
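The intuition behind Theorem 4 can be checked mechanically: every permitted call goes from an LA at level i to an LA at a strictly deeper level s > i, so any chain of calls strictly increases the level and must terminate; no call cycle, hence no deadlock, is possible. The level assignment and call list below are illustrative (they mirror the MLS example of the next section).

```python
# Under the HDLS calling law, a call cycle would require at least one call
# that does not strictly increase the level; checking for such a call is
# enough to rule out deadlock.
def permitted(level_of, caller, callee):
    """The HDLS law: a call is permitted only toward a strictly deeper level."""
    return level_of[caller] < level_of[callee]

def has_call_cycle(level_of, calls):
    """True iff some call violates the law (the only way a cycle could arise)."""
    return any(not permitted(level_of, a, b) for a, b in calls)

LEVELS = {"LA1": 1, "LA2": 2, "LA4": 2, "LA3": 3}
CALLS = [("LA1", "LA2"), ("LA1", "LA4"), ("LA2", "LA3")]
print(has_call_cycle(LEVELS, CALLS))   # False: every call goes deeper
```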

3. An Application of HDLS in the Meteorology Domain

Based on the idea of DKBLM, we have developed a learning system called Meteorology Learning System (MLS) following the pattern of HDLS. The following is a simplified example.

Example. INPUT TRAINING EXAMPLE: There are three live meteorological facts:

F-1: There is a 500hpa jet at time T (its head is at place P, its tail is in a cold zone);
F-2: There is an 850hpa jet at time T (its head is at place P, its tail is in a warm zone);
F-3: There is a rain storm at place P at time T+12.

LEARNING TASK: Suppose F-3 has a relation to F-1 and F-2. MLS tries to acquire a more general rule for rain storm forecasting from the training example.

ASSUMPTION: MLS is composed of LA1 (weather forecasting agent), LA2 (meteorological principle agent), LA3 (geography agent) and LA4 (hydromechanics agent). The law of calling between these LAs is defined at present as

MLS = <{LA1}, {LA2, LA4}, {LA3}>.

LA1 has the following rules:

R-1-1: IF a cloud cluster of rain storm exists, THEN there will be a rain storm in 12 hours;


R-1-2: IF a jet comes from a cold zone, THEN the jet is a cold advection;
R-1-3: IF a jet comes from a warm zone, THEN the jet is a warm advection.

LA2 has the following rules:

R-2-1: IF steam is condensed, THEN there will be cloud clusters;
R-2-2: IF X is a cloud cluster of rain storm, THEN X is a cloud cluster;
R-2-3: IF a cold advection exists, THEN a cold condensing factor exists;
R-2-4: IF X is a warm advection, THEN X comes from a warm zone.

LA3 has the following rules:

R-3-1: IF P is a warm zone, THEN there is an amount of steam above P.

LA4 has the following rules:

R-4-1: IF CA is a cold advection, THEN CA goes down;
R-4-2: IF WA is a warm advection, THEN WA goes up.

For simplicity, we let

OS(LA1) = OS(LA2) = OS(LA3) = OS(LA4)
AS(LA1) = AS(LA2) = AS(LA3) = AS(LA4) = { (omitted here) }
RS(LA1) = {R-1-1, R-1-2, R-1-3}
RS(LA2) = {R-2-1, R-2-2, R-2-3, R-2-4}
RS(LA3) = {R-3-1}
RS(LA4) = {R-4-1, R-4-2}
CFS(LA1) = null
SFS(LA1) = CFS(LA2) = SFS(LA2) = CFS(LA4) = SFS(LA4) = CFS(LA3) = SELF-MAPPING
SFS(LA3) = null
K(LAi) = (AS(LAi), RS(LAi), SFS(LAi), CFS(LAi))    (i = 1, 2, 3, 4)
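The MLS run can be replayed in miniature with the rules above: with LA1's rules alone, the explanation tree is incomplete, and only after the hypothesis H-1-1 (justified by the deeper LAs) is added does the training fact "rain storm" become derivable. The proposition names below paraphrase the rules; they are not the paper's syntax.

```python
# The MLS example in propositional miniature: LA1's own rules leave a gap in
# the explanation of "rain_storm"; adding the hypothesis H-1-1 closes it.
def derivable(goal, facts, rules):
    """Simple forward chaining; rules are (condition_set, conclusion) pairs."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return goal in known

FACTS = {"jet_500hpa_from_cold_zone", "jet_850hpa_from_warm_zone"}
LA1_RULES = [
    ({"cloud_cluster_of_rain_storm"}, "rain_storm"),           # R-1-1
    ({"jet_500hpa_from_cold_zone"}, "cold_advection_500hpa"),  # R-1-2
    ({"jet_850hpa_from_warm_zone"}, "warm_advection_850hpa"),  # R-1-3
]
H_1_1 = ({"cold_advection_500hpa", "warm_advection_850hpa"},
         "cloud_cluster_of_rain_storm")

print(derivable("rain_storm", FACTS, LA1_RULES))             # False: the gap
print(derivable("rain_storm", FACTS, LA1_RULES + [H_1_1]))   # True
```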

MLS RUNNING: For the input training example, LA1 builds up an incomplete explanation tree (Fig. 2) using R-1-1, R-1-2 and R-1-3. To complete the tree, LA1 forms a hypothesis:

H-1-1: IF there are a 500hpa cold advection and an 850hpa warm advection at place P, THEN there will be a cloud cluster of rain storm at place P.

To justify H-1-1, LA1 calls LA2.

Fig.2. An incomplete explanation tree formed by LA1. (The tree links RAIN STORM back to CLOUD CLUSTER OF RAIN STORM, whose support is still unknown ("?"); below the gap stand the 500hpa COLD ADVECTION, from the 500hpa JET (cold zone), and the 850hpa WARM ADVECTION, from the 850hpa JET (warm zone).)

To explain H-1-1, LA2 forms another incomplete explanation tree (Fig. 3) using R-2-1, R-2-2, and R-2-3. To complete the tree, LA2 forms a hypothesis:

H-2-1: IF there is an 850hpa warm advection, THEN there is steam.

LA2 then forms a new hypothesis H-2-2 from H-2-1 by using R-2-4:

H-2-2: There is steam in a warm zone.

Fig.3. An incomplete explanation tree formed by LA2. (The tree links CLOUD CLUSTER OF RAIN STORM back to CLOUD CLUSTER; the CONDENSING FACTOR comes from the 500hpa COLD ADVECTION, while the support for STEAM from the 850hpa WARM ADVECTION is still unknown ("?").)

To justify H-2-2, LA2 calls LA3. LA3 justifies H-2-2 by using R-3-1. So H-2-1 is justified by H-2-2. Then LA2 returns to LA1 a modified H-1-1 (named H'-1-1).


H'-1-1: IF there are a 500hpa cold advection and an 850hpa warm advection at place P, THEN there will be a cloud cluster of rain storm at place P,
CONDITION: the 500hpa cold advection and the 850hpa warm advection should meet at the same point in space.

To see whether the CONDITION in H'-1-1 is satisfied, LA1 calls LA4. LA4 justifies, by using R-4-1 and R-4-2, that the CONDITION is satisfied. So finally LA1 has H-1-1 justified. (The generalization of H-1-1 is similar to that of the traditional EBL.)

It is easy to see that this kind of learning cannot be substituted by similarity-based learning, analogy learning, experiential learning, or the traditional EBL at the same domain level. It makes use of several existing knowledge systems to fulfill an explanation of a training example to some extent. So it may benefit from Distributed Artificial Intelligence (DAI) in learning ability, system organization, knowledge management, and so on.

References

[1] G. DeJong and R. Mooney, Explanation-based learning: an alternative view. Tech. Rep. UILU-ENG-86-2208, 1986.
[2] S. Rajamoney, G. DeJong and B. Faltings, Towards a model of conceptual knowledge acquisition through directed experimentation. IJCAI-85, 1985, pp.688-690.
[3] S. Rajamoney, The classification, detection and handling of imperfect theory problems. IJCAI-87, 1987, pp.205-207.
[4] B. Silver, Learning equation solving methods from examples. IJCAI-83, 1983, pp.429-431.
[5] S. Salzberg, Heuristics for inductive learning. IJCAI-85, 1985, pp.603-609.
[6] P. Winston, Artificial Intelligence. Addison-Wesley, 1977.
[7] J. Carbonell, Experiential learning in analogical problem solving. AAAI-82, 1982, pp.168-171.
[8] J. Carbonell, Learning by analogy: formulating and generalizing plans from past experience. In Machine Learning, Ed. by R. Michalski, J. Carbonell and T. Mitchell, Vol.1, Springer-Verlag, 1984.
[9] Z. Ma and W. Zou, On the levels of knowledge and deep knowledge. Proc. of Int'l. Conf. for Young Computer Scientists, 1991, Beijing, pp.442-445.
[10] R. Michalski, J. Carbonell and T. Mitchell (Eds.), Machine Learning, Vol.1. Springer-Verlag, 1984.
[11] Z. Ma, R. Zhao and F. Su, A representation for time and inference on persistence with uncertainty in weather forecast. Proc. of Int'l. Conf. on Next Generation Computing Systems, 1989, Beijing, pp.22-26.