Daedalus and Icarus, statistical expert systems for agriculture. Leen Nys and Luc Duchateau, former assistants of Prof. Paul Darius

Daedalus and Icarus, statistical expert systems for agriculture · 2014-12-09 · Daedalus and Icarus, statistical expert systems for agriculture Leen Nys and Luc Duchateau former




Experimental design: theory versus application

Daedalus and Icarus


Chaos and structure

TAXSY – A Rule-based Expert System Shell Developed with SAS® Software


• Darius, P. (1984) Expert Systems and Statistics, SEAS Proceedings, Spring Meeting

• Darius, P. (1986) Building Expert Systems with the Help of Existing Software, COMPSTAT 1986: Proceedings in Computational Statistics

• Darius, P. (1988) Statistical Expert Systems: Some Implementation and Experimentation Aspects. Österreichische Zeitschrift für Statistik und Informatik

• Darius, P. (1990) A toolbox for adding knowledge-based modules to existing statistical software. Annals of Mathematics and Artificial Intelligence

• Demonstration of TAXSY at International Summer School on Computational Aspects of Model Choice, Charles University Prague, 1-14 July 1991


• Expert systems use explicitly coded knowledge (often in the form of IF-THEN rules) to solve problems for which a (numerical) algorithmic solution is not appropriate

• TAXSY is an expert system shell completely written in SAS

• It consists of a set of SAS programs which, with the addition of datasets with rules and code, form a flexible system for knowledge-based consultation

• The heart of TAXSY is the inference engine

• The inference engine is capable of backward chaining on rules

• One needs to specify an attribute as the goal of the inference process (e.g. name of a test)

• The inference engine will repeatedly invoke the rules to find a value for the goal attribute.

TAXSY (AF Application)


TAXSY needs a rule base in the form of a SAS dataset, with RULES of the following format:

• IF (attribute) (operator) (value)

• AND (attribute) (operator) (value)

• …

• THEN (attribute) (operator) (value)

RULES (SAS Datasets)


• To obtain a value that cannot be inferred from rules, TAXSY invokes an appropriate interface.

• The PROMPTS dataset should contain, for each such attribute, the name of the AF application TAXSY has to start:

  – Simple menu

  – Sophisticated applications involving the construction of SAS programs based on information previously obtained and processing of their results.
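The interplay of goal attribute, rules and prompts described above can be sketched in a few lines. This is only an illustration in Python, not TAXSY's SAS implementation: the rule base, the attribute names (`test`, `response`, `groups`) and the function names are hypothetical, the two supported operators are an assumption, and prompts are modelled as plain callables instead of AF applications.

```python
from typing import Callable, Dict, List, Optional, Tuple

Condition = Tuple[str, str, str]          # (attribute, operator, value)
Rule = Tuple[List[Condition], Condition]  # (IF/AND conditions, THEN conclusion)

# Hypothetical rule base; attribute names and values are invented for illustration.
RULES: List[Rule] = [
    ([("response", "=", "continuous"), ("groups", "=", "2")], ("test", "=", "t-test")),
    ([("response", "=", "continuous"), ("groups", "=", "more")], ("test", "=", "anova")),
    ([("response", "=", "counts")], ("test", "=", "chi-square")),
]


def holds(actual: Optional[str], operator: str, expected: str) -> bool:
    """Evaluate one (attribute, operator, value) condition; unknown values never hold."""
    if actual is None:
        return False
    if operator == "=":
        return actual == expected
    if operator == "!=":
        return actual != expected
    raise ValueError(f"unsupported operator: {operator}")


def find_value(goal: str, rules: List[Rule],
               prompts: Dict[str, Callable[[], str]],
               facts: Dict[str, str]) -> Optional[str]:
    """Backward chaining: try every rule that concludes `goal`, resolving the
    attributes in its conditions recursively; ask a prompt only when no rule
    can infer the attribute."""
    if goal in facts:
        return facts[goal]
    concluding = [rule for rule in rules if rule[1][0] == goal]
    for conditions, (_, _, value) in concluding:
        if all(holds(find_value(attr, rules, prompts, facts), op, val)
               for attr, op, val in conditions):
            facts[goal] = value
            return value
    if not concluding and goal in prompts:
        facts[goal] = prompts[goal]()  # stands in for starting an AF application
        return facts[goal]
    return None
```

Calling `find_value("test", RULES, prompts, {})` with prompts for `response` and `groups` asks only the questions needed by the rules it tries: once `response` turns out to be `counts`, `groups` is never requested.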


PROMPTS (SAS Datasets)


STRUCTURE dataset

• contains metadata

• stores information about variables and observations and about relations between variables


STRUCTURE (SAS Datasets)


STRATEGY dataset

• at a given stage, the inference process generally needs only a limited number of rules

• the rule dataset is split into a number of modules through the STRATEGY dataset

• this speeds up the search process


STRATEGY (SAS Datasets)


DAEDALUS Description, Analysis and Experimental Design for AgricuLtUral Systems


Innovative aspects

Innovative aspects – OOPS

Innovative aspects – No name approach


Innovative aspects – Mixed model


Object “Experiment”


Instance variables


Low level strategy – Analysis specification

Part  Attribute                      Operator  Value
if    filled dataset                 is        present
and   design structure for analysis  is        specified
then  analysis specification         is        done
if    filled dataset                 is        absent
and   message for missing dataset    is        given
then  analysis specification         is        abandoned
if    design structure for analysis  is        not specified
and   message for non specification  is        given
then  analysis specification         is        abandoned

Item name                      Item value (= method name)
filled dataset                 Check_existence_of_dataset
design structure for analysis  Determine_design
message for missing dataset    Show_message_missing_dataset
message for non specification  Show_message_non_specification
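One way to read the two tables together: each attribute in a rule is an item whose value is obtained by calling the method named in the item table on the experiment object. The Python sketch below illustrates that reading only; the `Experiment` class, its fields, the message texts and the lowercase method names are hypothetical stand-ins, not the DAEDALUS implementation.

```python
class Experiment:
    """Hypothetical stand-in for the 'Experiment' object."""

    def __init__(self, dataset=None, design=None):
        self.dataset = dataset
        self.design = design
        self.messages = []

    def check_existence_of_dataset(self):
        return "present" if self.dataset is not None else "absent"

    def determine_design(self):
        return "specified" if self.design is not None else "not specified"

    def show_message_missing_dataset(self):
        self.messages.append("No filled dataset found.")
        return "given"

    def show_message_non_specification(self):
        self.messages.append("Design structure was not specified.")
        return "given"


# Item name -> method name, mirroring the item table.
ITEMS = {
    "filled dataset": "check_existence_of_dataset",
    "design structure for analysis": "determine_design",
    "message for missing dataset": "show_message_missing_dataset",
    "message for non specification": "show_message_non_specification",
}

# Each rule: (list of (attribute, value) conditions, value of 'analysis specification').
STRATEGY = [
    ([("filled dataset", "present"),
      ("design structure for analysis", "specified")], "done"),
    ([("filled dataset", "absent"),
      ("message for missing dataset", "given")], "abandoned"),
    ([("design structure for analysis", "not specified"),
      ("message for non specification", "given")], "abandoned"),
]


def run_strategy(experiment, strategy=STRATEGY, items=ITEMS):
    """Return the conclusion of the first rule whose conditions all hold.
    An item's method is called at most once, and only when a rule needs it."""
    cache = {}

    def value_of(item):
        if item not in cache:
            cache[item] = getattr(experiment, items[item])()
        return cache[item]

    for conditions, conclusion in strategy:
        if all(value_of(item) == expected for item, expected in conditions):
            return conclusion
    return None
```

Because conditions are evaluated lazily, a message method runs only when the rule that mentions it is actually reached, so an experiment with a dataset and a design produces no messages at all.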


Determine_design_method

• Determine_design_method: three options

o Already available through ICARUS

o Choose from a list of designs and assign variables

o Construct a design


Variables type


Variables relationship


Low level strategy – Design for analysis

Part  Attribute                         Operator  Value
if    set of variance components        is        estimable
then  design for analysis               is        allowed
if    set of variance components        is        partially estimable
and   message for partial estimability  is        given
then  design for analysis               is        allowed
if    set of variance components        is        not estimable
and   message for non estimability      is        given
then  design for analysis               is        not allowed

Item name                         Item value (= method name)
set of variance components        Check_estimability_of_variance_components
message for partial estimability  Show_message_partial_estimability
message for non estimability      Show_message_non_estimability

Check_estimability_of_variance_components

• Non estimability due to

– Specific missing value pattern

– Misspecification of the design structure

• Method based on stratum concept and REML

– A stratum needs to have rank at least one after projection on the space spanned by the treatment factors; otherwise no degrees of freedom remain to estimate the residual variance


Updating the experiment object


ICARUS Prototype to ‘explore’ experimental designs


• Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment for experimental design. COMPSTAT, 1992, Neuchâtel

• Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment to explore design of an experiment. SOFTSTAT, 1993, Heidelberg

• Invited talk given by P. Darius and M. Nys at the University of Augsburg, 1994

• Invited talk for researchers in chemistry given by P. Darius and M. Nys at the Research Center of Boehringer Mannheim, Tutzing, July 1994



• Tool to assist statisticians and experimenters to ‘explore’ experimental designs during the design phase

• ‘Design’ as an object

  – Interactive

  – Graphical

• Strategy based: incorporates statistical knowledge

• Implementation in Objectworks/Smalltalk on SUN workstations

Window-based


Representation of a Design

• Text: Fields were chosen in 5 countries. Each field was divided into 4 plots, ...

• Name: Factorial 2^4, Latin Square, Split-plot, ...

• Model (design-matrix): $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

• Dataset:

  Block  Treatment
  1      3
  1      2
  1      1
  2      2
  2      1
  2      3

• Others: e.g. Kirk: SPF-p.qr

• Graph of crossing/nesting relationship of factors: VARIETY* PLANT


Hasse diagrams

• nesting = partial ordering, drawn as a Hasse diagram

• B* above A: A is nested in B (each level of A occurs only within 1 level of B); B* = B is a fixed factor

Example diagrams (factors shown in each):

• Two-way Completely Randomized Design: DIET*, HORMONE*, CHICKEN

• Nested (hierarchical) Design: BATCH, SAMPLE, SUBSAMPLE

• Split-Plot Design: BATCH, IRRIGATION*, PLOT, VARIETY*, SUBPLOT

• Repeated Measures Design: AGE GROUP, TREATMENT*, SUBJECT, TIME*, SUBJECT-TIME
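Since nesting is a partial ordering, the lines of a Hasse diagram are its covering pairs: B is joined to A when A is nested in B and no other factor sits between them. The Python sketch below computes those pairs from a nesting relation. It illustrates the general idea and is not the ICARUS (Smalltalk) code; the function names are hypothetical, and the nesting assumed for the nested design (SUBSAMPLE within SAMPLE within BATCH) is an assumption, because the diagram itself is not reproduced in the text.

```python
def transitive_closure(nested_in):
    """nested_in maps each factor to the factors it is declared to be nested in.
    Returns, for each factor, every factor above it in the partial order."""
    closure = {factor: set(uppers) for factor, uppers in nested_in.items()}
    changed = True
    while changed:
        changed = False
        for factor in closure:
            inherited = set()
            for upper in closure[factor]:
                inherited |= closure.get(upper, set())
            if not inherited <= closure[factor]:
                closure[factor] |= inherited
                changed = True
    return closure


def hasse_edges(nested_in):
    """Return the covering pairs (upper, lower), i.e. the lines of the Hasse diagram."""
    above = transitive_closure(nested_in)
    edges = set()
    for lower, uppers in above.items():
        for upper in uppers:
            # upper covers lower only if no other factor lies strictly between them
            between = any(upper in above.get(mid, set())
                          for mid in uppers if mid != upper)
            if not between:
                edges.add((upper, lower))
    return sorted(edges)


# Assumed nesting for the nested (hierarchical) design.
NESTED_DESIGN = {
    "BATCH": set(),
    "SAMPLE": {"BATCH"},
    "SUBSAMPLE": {"SAMPLE", "BATCH"},
}
```

For `NESTED_DESIGN`, `hasse_edges` drops the redundant BATCH to SUBSAMPLE link and keeps only the chain BATCH, SAMPLE, SUBSAMPLE.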


• 3-way factorial: SEX*, DIET*, HORMONE* above CHICKEN

• Latin-square: ROW, COLUMN, TREATMENT* above PLOT


A model for the experimental design process

One-shot approach?

Interactive process?

Our model:

• Defining/re-defining the problem

• Entering/re-entering the design

• Design representation

• Design evaluation:

  – practical aspects

  – analysis aspects


Defining the problem

An engineer wants to study the stretch of a piece of metal punched in a die. Three factors are considered:

• Lubricant (L): none, mill oil and added lubricant

• Thickness of the steel (T): 8 and 10 mm

• Steel type (S): standard and AK steel

For a given combination of L, T and S it is easy to punch 3 pieces, one after the other. The experimenter plans to repeat the entire experiment a week later.

(based on Lorenzen and Anderson, 1993)

DESIGN ? ANALYSIS?


Entering the design

Problem description (abstract)

Computer representation (formal)

Two options are proposed

option 1

• describe all elements (treatment factors, blocking = design factors, experimental units, ...) of the design

• ‘easy’ questions are asked to find the relationship between all the elements


Description of the elements

Name        Total number of levels  Random/fixed
lubricant   3                       fixed
thickness   2                       fixed
steel type  2                       fixed
week        2                       random
piece       72                      random

‘easy’ questions to find relationship


option 2

• Treatment factor (name) applied on (experimental unit)

• Treatment factor random/fixed

• Design factor (name)

• Response variables (name) measured on (observational unit)

• ‘More difficult’ questions are asked to find the relationship between all the elements


Design representation

• Hasse diagram

• Model

• Dataset

See later


Design evaluation:

– analysis aspects (see later)

– practical aspects

Example: randomized order?

Problems:

– Very difficult to run exp. in random order

Solutions:

– If # treatm. > 1 then group experimental units


Re-defining the problem

Re-entering the design

• Since it takes a long time to wipe down a die, one lubricant was selected and all combinations of thickness and steel type were run before another lubricant was used

• All 3 pieces are punched before going to the next thickness and steel type

• Piece is not the experimental unit of lubricant, steel type or thickness, but it is the observational unit on which the stretch is measured

• Die is the experimental unit for lubricant

• A group of 3 pieces is the experimental unit for thickness and steel type


Design representation

• Hasse diagram of factors

• Model

• Dataset


Effects of the maximal model      df
Week                              1
Lubricant                         2
Week*lubricant                    2
Die                               0
Steel type                        1
Thickness                         1
Type*thickness                    1
Lubricant*type                    2
Lubricant*thickness               2
Lubricant*type*thickness          2
Week*type                         1
Week*thickness                    1
Week*type*thickness               1
Week*lubricant*type               2
Week*lubricant*thickness          2
Week*lubricant*type*thickness     2
Die*type                          0
Die*thickness                     0
Die*type*thickness                0
Group                             0
Piece                             47

(I) For designs with a ‘(balanced) complete response structure’ (Taylor and Hilton)

• An interaction term for factors not connected by a line in the Hasse diagram is added to the model (= maximal model)

• The degrees of freedom are calculated by the rules given in Taylor and Hilton (and in most textbooks)

Maximal model: some effects can have 0 df


Effects of the maximal model                                  df
Week                                                          1
Lubricant                                                     2
Week*lubricant + die                                          2
Steel type                                                    1
Thickness                                                     1
Type*thickness                                                1
Lubricant*type                                                2
Lubricant*thickness                                           2
Lubricant*type*thickness                                      2
Week*type                                                     1
Week*thickness                                                1
Week*type*thickness                                           1
Week*lubricant*type + die*type                                2
Week*lubricant*thickness + die*thickness                      2
Week*lubricant*type*thickness + die*type*thickness + group    2
Piece                                                         47

(II) Effects with 0 df are confounded with effects with df ≠ 0

Based on expected mean squares (EMS), e.g. EMS(die) = EMS(week*lubricant) + x · variance(die)


Effects of the maximal model                                  df
Week                                                          1
Lubricant                                                     2
Week*lubricant + die                                          2
Steel type                                                    1
Thickness                                                     1
Type*thickness                                                1
Lubricant*type                                                2
Lubricant*thickness                                           2
Lubricant*type*thickness                                      2
Week*type                                                     |
Week*thickness                                                |
Week*type*thickness                                           |
Week*lubricant*type + die*type                                |
Week*lubricant*thickness + die*thickness                      |
Week*lubricant*type*thickness + die*type*thickness + group    | 9 (pooled)
Piece                                                         47

(III) The interactions between the treatment factors and the design (= blocking) factors are pooled

Error strata: week, die, group, piece


Up to now:

Balanced complete response structure:

• Model I (= maximal model): rules of Taylor and Hilton (effects of model and df)

• Model II: confounding

• Model III: pooling

For orthogonal structure:

• Hasse diagram of ‘relevant’ effects (factors of cross-classifications)

• Add μ on top of Hasse diagram

• For each effect: total number of levels

• For df: start at the top

  – df for μ is 1

  – df for a specific effect = total number of levels for that effect minus the sum of the df of all effects above it
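The df rule can be checked mechanically on a fragment of the steel example. The sketch below is a hypothetical Python illustration, not part of ICARUS: the level counts come from the design description (week 2, lubricant 3, steel type 2, thickness 2), while the shape of the effects diagram (each main effect directly below μ, each two-factor interaction below its two main effects) is an assumption, since the diagram is not reproduced in the text.

```python
def degrees_of_freedom(levels, directly_above):
    """levels: effect -> total number of levels.
    directly_above: effect -> effects joined to it from above in the Hasse diagram.
    Returns effect -> df, using df = levels minus the sum of the df of all effects above."""

    def all_above(effect):
        # Collect every effect reachable by walking upwards in the diagram.
        found, stack = set(), list(directly_above.get(effect, ()))
        while stack:
            upper = stack.pop()
            if upper not in found:
                found.add(upper)
                stack.extend(directly_above.get(upper, ()))
        return found

    df = {}

    def df_of(effect):
        if effect not in df:
            df[effect] = levels[effect] - sum(df_of(u) for u in all_above(effect))
        return df[effect]

    for effect in levels:
        df_of(effect)
    return df


# Fragment of the steel example; the diagram structure is assumed.
LEVELS = {
    "mu": 1,
    "week": 2,
    "lubricant": 3,
    "week*lubricant": 6,
    "type": 2,
    "thickness": 2,
    "type*thickness": 4,
}
DIRECTLY_ABOVE = {
    "week": ["mu"],
    "lubricant": ["mu"],
    "week*lubricant": ["week", "lubricant"],
    "type": ["mu"],
    "thickness": ["mu"],
    "type*thickness": ["type", "thickness"],
}
```

For week*lubricant this gives 6 − (1 + 1 + 2) = 2 and for type*thickness 4 − (1 + 1 + 1) = 1, in agreement with the df listed for these effects in the model tables.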


Hasse diagram of effects (model III)

• 4 error-strata

• No interaction between week (design factor) and treatment factors

• Remark: 2 pieces per group instead of 3 (as in the original design description)

Evaluation of design: analysis aspects

Evaluation of design: analysis aspects – power

[Figure: Power for lubricant (if # weeks = 2). Labels: standardized minimal detectable difference, α]


Re-entering of the design

3 weeks instead of 2 weeks -> significant increase in power

Conclusion:

Efficient way for researchers and statistical consultants to explore the experimental design during the design phase