Daedalus and Icarus, statistical expert systems for agriculture · 2014-12-09


Leen Nys and Luc Duchateau

former assistants of Prof. Paul Darius

Experimental design: theory versus application


TAXSY


Daedalus and Icarus

The agricultural experimental stations


Chaos and structure

TAXSY – A Rule-based Expert System Shell Developed with SAS® Software

• Darius, P. (1984) Expert Systems and Statistics. SEAS Proceedings, Spring Meeting

• Darius, P. (1986) Building Expert Systems with the Help of Existing Software. COMPSTAT 1986: Proceedings in Computational Statistics

• Darius, P. (1988) Statistical Expert Systems: Some Implementation and Experimentation Aspects. Österreichische Zeitschrift für Statistik und Informatik

• Darius, P. (1990) A toolbox for adding knowledge-based modules to existing statistical software. Annals of Mathematics and Artificial Intelligence

• Demonstration of TAXSY at International Summer School on Computational Aspects of Model Choice, Charles University Prague, 1-14 July 1991


• Expert systems use explicitly coded knowledge (often in the form of IF-THEN rules) to solve problems for which a (numerical) algorithmic solution is not appropriate

• TAXSY is an expert system shell completely written in SAS

• It consists of a set of SAS programs which, with the addition of datasets with rules and code, form a flexible system for knowledge-based consultation


• The heart of TAXSY is the inference engine
• The inference engine is capable of backward chaining on rules
• One needs to specify an attribute as the goal of the inference process (e.g. the name of a test)
• The inference engine repeatedly invokes the rules to find a value for the goal attribute
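To make the backward-chaining idea concrete, here is a minimal Python sketch. It is not TAXSY's SAS code: the rule representation (a list of IF/AND conditions plus one THEN conclusion), the `ask` callback standing in for a prompting interface, and the example rules and test names are all hypothetical. Only the operators `is` and `is not` are handled, and the rule base is assumed to contain no cycles.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]          # (attribute, operator, value)
Rule = Tuple[List[Triple], Triple]     # (IF/AND conditions, THEN part)


def holds(facts: Dict[str, str], cond: Triple) -> bool:
    """Test one condition against the values established so far."""
    attr, op, value = cond
    if op == "is":
        return facts.get(attr) == value
    if op == "is not":
        return attr in facts and facts[attr] != value
    raise ValueError(f"unsupported operator: {op}")


def infer(goal: str, rules: List[Rule], facts: Dict[str, str],
          ask: Callable[[str], Optional[str]]) -> Optional[str]:
    """Backward chaining: find a value for `goal`, solving subgoals first."""
    if goal in facts:
        return facts[goal]
    candidates = [r for r in rules if r[1][0] == goal and r[1][1] == "is"]
    if not candidates:
        # The value cannot be inferred from rules: ask an interface instead.
        answer = ask(goal)
        if answer is not None:
            facts[goal] = answer
        return answer
    for conditions, (_, _, value) in candidates:
        fired = True
        for cond in conditions:
            infer(cond[0], rules, facts, ask)  # each attribute is a subgoal
            if not holds(facts, cond):
                fired = False
                break
        if fired:
            facts[goal] = value
            return value
    return None  # rules exist for the goal, but none of them fired


# Hypothetical two-rule base with the name of a test as the goal attribute.
rules: List[Rule] = [
    ([("response type", "is", "continuous"), ("number of groups", "is", "two")],
     ("test", "is", "t-test")),
    ([("response type", "is", "count")], ("test", "is", "chi-square test")),
]
answers = {"response type": "continuous", "number of groups": "two"}
print(infer("test", rules, {}, answers.get))  # t-test
```

Because conditions are checked one at a time, an attribute is only asked for when an earlier condition of the same rule has already succeeded.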

TAXSY (AF Application)


TAXSY needs a rule base in the form of a SAS dataset, with RULES of the following format:

• IF (attribute) (operator) (value)
• AND (attribute) (operator) (value)
• …
• THEN (attribute) (operator) (value)
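Since the rule base is a SAS dataset, one rule naturally occupies several rows, one per IF/AND/THEN part. The Python sketch below regroups such flat rows into rules; the `rule_id` column that ties the rows of one rule together and the field names are hypothetical assumptions, not TAXSY's documented layout. The example rows encode two of the analysis-specification rules used in DAEDALUS.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class RuleRow(NamedTuple):
    rule_id: int    # hypothetical key shared by the rows of one rule
    part: str       # "if", "and" or "then"
    attribute: str
    operator: str
    value: str


def rows_to_rules(rows: List[RuleRow]) -> List[Tuple[List[Triple], Triple]]:
    """Regroup flat rule rows into (conditions, conclusion) pairs."""
    rules = []
    # sorted() is stable, so the IF/AND/THEN order inside a rule is kept.
    for rule_id, group in groupby(sorted(rows, key=lambda r: r.rule_id),
                                  key=lambda r: r.rule_id):
        parts = list(group)
        labels = [p.part for p in parts]
        if len(parts) < 2 or labels != (
                ["if"] + ["and"] * (len(parts) - 2) + ["then"]):
            raise ValueError(f"rule {rule_id} is malformed: {labels}")
        conditions = [(p.attribute, p.operator, p.value) for p in parts[:-1]]
        last = parts[-1]
        rules.append((conditions, (last.attribute, last.operator, last.value)))
    return rules


rows = [
    RuleRow(1, "if", "filled dataset", "is", "present"),
    RuleRow(1, "and", "design structure for analysis", "is", "specified"),
    RuleRow(1, "then", "analysis specification", "is", "done"),
    RuleRow(2, "if", "filled dataset", "is", "absent"),
    RuleRow(2, "and", "message for missing dataset", "is", "given"),
    RuleRow(2, "then", "analysis specification", "is", "abandoned"),
]
for conditions, conclusion in rows_to_rules(rows):
    print(conditions, "=>", conclusion)
```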


RULES (SAS Datasets)


• To obtain a value that cannot be inferred from rules, TAXSY invokes an appropriate interface.

• For each such attribute, the PROMPTS dataset should contain the name of the SAS/AF application that TAXSY has to start. These range from:

o simple menus

o sophisticated applications that construct SAS programs based on information obtained earlier and process their results
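The PROMPTS mechanism amounts to a lookup from attribute to interface. In the Python sketch below, the table contents, the interface names and the two stand-in interfaces are hypothetical; a real simple menu would wait for the user's choice, and a sophisticated application would generate and run SAS code. The resulting `ask` function has the shape of a fallback used when a value cannot be inferred from rules.

```python
from typing import Callable, Dict, Optional

Interface = Callable[[str], Optional[str]]

# Hypothetical PROMPTS table: attribute -> name of the interface to start.
PROMPTS: Dict[str, str] = {
    "response variable": "choose_from_menu",
    "filled dataset": "inspect_dataset",
}


def make_ask(prompts: Dict[str, str],
             interfaces: Dict[str, Interface]) -> Interface:
    """Build the fallback that starts the interface registered for an attribute."""
    def ask(attribute: str) -> Optional[str]:
        name = prompts.get(attribute)
        if name is None:
            return None  # no interface registered: the value stays unknown
        return interfaces[name](attribute)
    return ask


# Stand-ins: a "menu" that picks the first option, and a dataset check.
menu_options = ["yield", "plant height"]
datasets = {"example": [1.0, 2.0]}
interfaces: Dict[str, Interface] = {
    "choose_from_menu": lambda attribute: menu_options[0],
    "inspect_dataset": lambda attribute: (
        "present" if datasets.get("example") else "absent"),
}

ask = make_ask(PROMPTS, interfaces)
print(ask("response variable"), ask("filled dataset"), ask("block size"))
# yield present None
```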



PROMPTS (SAS Datasets)


STRUCTURE dataset
• contains metadata
• stores information about variables and observations, and about relations between variables




STRUCTURE (SAS Datasets)


STRATEGY dataset
• at a given stage, the inference process generally needs only a limited number of rules
• the STRATEGY dataset splits the rule dataset into a number of modules
• this speeds up the search process
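The benefit of the STRATEGY modules is that the engine only scans the rules of the active module. A minimal Python sketch, assuming each rule is tagged with a module name; the module names echo DAEDALUS's high-level strategy, while the rules themselves are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[List[Triple], Triple]


def split_into_modules(tagged: List[Tuple[str, Rule]]) -> Dict[str, List[Rule]]:
    """Group a flat rule base into modules: module name -> its rules."""
    modules: Dict[str, List[Rule]] = defaultdict(list)
    for module, rule in tagged:
        modules[module].append(rule)
    return dict(modules)


def rules_to_search(modules: Dict[str, List[Rule]], active: str) -> List[Rule]:
    """At a given stage only the rules of the active module are searched."""
    return modules.get(active, [])


# Placeholder rules; only their module tags matter here.
tagged = [
    ("specdes", ([("filled dataset", "is", "present")],
                 ("analysis specification", "is", "done"))),
    ("specdes", ([("filled dataset", "is", "absent")],
                 ("analysis specification", "is", "abandoned"))),
    ("respvar", ([("response variable name", "is", "given")],
                 ("response variable", "is", "known"))),
]
modules = split_into_modules(tagged)
print(len(tagged), "rules in total,",
      len(rules_to_search(modules, "respvar")), "searched")
```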





STRATEGY (SAS Datasets)


DAEDALUS Description, Analysis and Experimental Design for AgricuLtUral Systems

Innovative aspects

Innovative aspects – OOPS

Innovative aspects – No name approach

Innovative aspects – Mixed model

Object “Experiment”

Instance variables

Instance variables and methods


High level strategy

Attribute | Value | Task | Rules-Prompts
- | - | Analysis specification | specdes
Analysis specification | done | Response variable | respvar
Response variable | known | Response type | Resptype
Response type | known | Design for analysis | desianal
Design for analysis | allowed | Fit model | Fitmodel
Fit model | done | Check outliers | Outlier
Check outliers | done | Variance function | varfunc
Variance function | correct | Density function | Densityf
Density function | correct | Independence assumption | Independ
Independence assumption | correct | Full analysis | analysis
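Read row by row, the table is a chain: each task starts only when the previous attribute has the listed value. The Python sketch below encodes the table as data and runs it; `run_module` is a hypothetical callback standing in for the rule module that carries out a task and returns the resulting value, and the outcomes in the example are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (attribute, required value, task, rules-prompts module), as in the table.
Step = Tuple[Optional[str], Optional[str], str, str]
HIGH_LEVEL_STRATEGY: List[Step] = [
    (None, None, "analysis specification", "specdes"),
    ("analysis specification", "done", "response variable", "respvar"),
    ("response variable", "known", "response type", "Resptype"),
    ("response type", "known", "design for analysis", "desianal"),
    ("design for analysis", "allowed", "fit model", "Fitmodel"),
    ("fit model", "done", "check outliers", "Outlier"),
    ("check outliers", "done", "variance function", "varfunc"),
    ("variance function", "correct", "density function", "Densityf"),
    ("density function", "correct", "independence assumption", "Independ"),
    ("independence assumption", "correct", "full analysis", "analysis"),
]


def run_strategy(steps: List[Step],
                 run_module: Callable[[str, str], str]) -> Dict[str, str]:
    """Run the tasks in order; stop at the first unmet precondition."""
    results: Dict[str, str] = {}
    for attribute, required, task, module in steps:
        if attribute is not None and results.get(attribute) != required:
            break
        results[task] = run_module(module, task)
    return results


# Hypothetical outcomes: every module succeeds except the variance check.
outcomes = {"specdes": "done", "respvar": "known", "Resptype": "known",
            "desianal": "allowed", "Fitmodel": "done", "Outlier": "done",
            "varfunc": "incorrect"}
done = run_strategy(HIGH_LEVEL_STRATEGY, lambda module, task: outcomes[module])
print(list(done))  # stops after 'variance function'
```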




Low-level strategy – Analysis specification

Part | Attribute | Operator | Value
if | filled dataset | is | present
and | design structure for analysis | is | specified
then | analysis specification | is | done
if | filled dataset | is | absent
and | message for missing dataset | is | given
then | analysis specification | is | abandoned
if | design structure for analysis | is not | specified
and | message for non specification | is | given
then | analysis specification | is | abandoned

Item name | Item value (= method name)
filled dataset | Check-existence_of_dataset
design structure for analysis | Determine_design
message for missing dataset | Show_message_missing_dataset
message for non specification | Show_message_non_specification
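The item table links each attribute to a method of the experiment object, so a value is only computed when a rule needs it. A minimal Python sketch, assuming a hypothetical `Experiment` class; the method names are the table's names adapted to Python identifiers (for example `Check-existence_of_dataset` becomes `check_existence_of_dataset`), and the three rules are the analysis-specification rules listed above. Because `all()` stops at the first false condition, a message method only runs when the earlier conditions of its rule hold.

```python
from typing import Dict, List, Optional, Tuple


class Experiment:
    """Hypothetical stand-in for the DAEDALUS 'Experiment' object."""

    def __init__(self, dataset: Optional[list], design: Optional[str]):
        self.dataset = dataset
        self.design = design
        self.messages: List[str] = []

    def check_existence_of_dataset(self) -> str:
        return "present" if self.dataset else "absent"

    def determine_design(self) -> str:
        return "specified" if self.design else "unspecified"

    def show_message_missing_dataset(self) -> str:
        self.messages.append("No filled dataset is available.")
        return "given"

    def show_message_non_specification(self) -> str:
        self.messages.append("The design structure is not specified.")
        return "given"


ITEM_METHODS: Dict[str, str] = {
    "filled dataset": "check_existence_of_dataset",
    "design structure for analysis": "determine_design",
    "message for missing dataset": "show_message_missing_dataset",
    "message for non specification": "show_message_non_specification",
}

RULES: List[Tuple[List[Tuple[str, str, str]], str]] = [
    ([("filled dataset", "is", "present"),
      ("design structure for analysis", "is", "specified")], "done"),
    ([("filled dataset", "is", "absent"),
      ("message for missing dataset", "is", "given")], "abandoned"),
    ([("design structure for analysis", "is not", "specified"),
      ("message for non specification", "is", "given")], "abandoned"),
]


def analysis_specification(exp: Experiment) -> str:
    cache: Dict[str, str] = {}

    def value(item: str) -> str:
        if item not in cache:  # invoke each method at most once
            cache[item] = getattr(exp, ITEM_METHODS[item])()
        return cache[item]

    for conditions, conclusion in RULES:
        if all((value(a) == v) if op == "is" else (value(a) != v)
               for a, op, v in conditions):
            return conclusion
    return "undetermined"


exp = Experiment(dataset=[1.0, 2.0], design=None)
print(analysis_specification(exp), exp.messages)
```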

Determine_design_method

• Determine_design_method: three options

o Already available through ICARUS

o Choose from a list of designs and assign variables

o Construct a design

Variable types

Variable relationships

Low-level strategy – Design for analysis

Part | Attribute | Operator | Value
if | set of variance components | is | estimable
then | design for analysis | is | allowed
if | set of variance components | is | partially estimable
and | message for partial estimability | is | given
then | design for analysis | is | allowed
if | set of variance components | is not | estimable
and | message for non estimability | is | given
then | design for analysis | is not | allowed

Item name | Item value (= method name)
set of variance components dataset | Check_estimability_of_variance_components
message for partial estimability | Show_message_partial_estimability
message for non estimability | Show_message_non_estimability

Check_estimability_of_variance_components

• Non estimability due to

– Specific missing value pattern

– Misspecification of the design structure

• Method based on stratum concept and REML

– A stratum needs a rank of at least one after the space spanned by the treatment factors has been projected out; otherwise no degrees of freedom remain to estimate its residual variance

Implementation

• SAS did a poor job of estimating variance components

• IML is used instead

• The obtained rank for the different strata is stored in the cov_parms list
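The estimability check can be sketched with exact linear algebra. The Python code below (in place of the SAS/IML implementation, which is not shown here) computes the rank that a stratum's indicator columns keep after projecting out the intercept, the treatment factors and any coarser strata, as rank([X, Z]) − rank(X). This is a simplified reading of the stratum concept that uses main-effect indicator columns only; the split-plot layouts in the example are hypothetical. A rank of 0 means no degrees of freedom remain to estimate that stratum's residual variance.

```python
from fractions import Fraction
from typing import List, Sequence

Column = List[int]


def indicator_columns(labels: Sequence) -> List[Column]:
    """One 0/1 column per level of a factor."""
    levels = sorted(set(labels), key=str)
    return [[1 if x == level else 0 for x in labels] for level in levels]


def rank(columns: List[Column]) -> int:
    """Exact rank by Gaussian elimination over the rationals."""
    if not columns:
        return 0
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    r = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def stratum_rank(stratum: Sequence, treatments: Sequence[Sequence],
                 coarser: Sequence[Sequence] = ()) -> int:
    """Rank left in a stratum after projecting out the intercept,
    the treatment factors and any coarser strata."""
    base: List[Column] = [[1] * len(stratum)]
    for factor in list(treatments) + list(coarser):
        base += indicator_columns(factor)
    return rank(base + indicator_columns(stratum)) - rank(base)


variety = ["a", "b"] * 4
# Each irrigation level on a single plot: plot variance not estimable.
irr_2 = ["wet"] * 4 + ["dry"] * 4
plot_2 = [1] * 4 + [2] * 4
print(stratum_rank(plot_2, [irr_2, variety]))  # 0
# Two plots per irrigation level: two df left in the plot stratum.
irr_4 = ["wet", "wet", "dry", "dry"] * 2
plot_4 = [1, 1, 2, 2, 3, 3, 4, 4]
print(stratum_rank(plot_4, [irr_4, variety]))  # 2
```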


Updating the experiment object

Beyond Daedalus


ICARUS Prototype to ‘explore’ experimental designs

• Nys, M., P. Darius and M. Marasinghe (1992) An interactive window-based environment for experimental design. COMPSTAT 1992, Neuchâtel

• Nys, M., P. Darius and M. Marasinghe (1993) An interactive window-based environment to explore design of an experiment. SOFTSTAT 1993, Heidelberg

• Invited talk given by P. Darius and M. Nys at the University of Augsburg, 1994

• Invited talk for researchers in chemistry given by P. Darius and M. Nys at the Research Center of Boehringer Mannheim, Tutzing, July 1994

ICARUS Prototype to ‘Explore’ Experimental Designs

• Tool to assist statisticians and experimenters to ‘explore’ experimental designs during the design phase

• ‘Design’ as an object
– Interactive
– Graphical

• Strategy based: incorporates statistical knowledge

• Implementation in Objectworks/Smalltalk on SUN workstations

Window-based

Representation of a Design

• Text

• Name

• Model (design-matrix)

• Dataset

• Others

• Graph of crossing/nesting relationship of factors

Factorial 2⁴, Latin Square, Split-plot, ...

Fields were chosen in 5 countries. Each field was divided into 4 plots,...

$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

Block | Treatment
1 | 3
1 | 2
1 | 1
2 | 2
2 | 1
2 | 3

e.g. Kirk: SPF-p.qr

VARIETY * PLANT

Hasse diagrams: nesting = partial ordering → Hasse diagram

Example: B* drawn above A

A is nested in B (each level of A occurs within only one level of B); B* means that B is a fixed factor
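The nesting definition translates directly into a check on the data. A minimal Python sketch, assuming each factor is given as a list of level labels per observation and that no two factors define the same grouping; the split-plot layout is hypothetical. Dropping the nesting relations implied by transitivity leaves the lines of the Hasse diagram (without μ).

```python
from typing import Dict, Hashable, Sequence, Set, Tuple


def is_nested(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """True if each level of A occurs within only one level of B."""
    seen: Dict[Hashable, Hashable] = {}
    for x, y in zip(a, b):
        if seen.setdefault(x, y) != y:
            return False
    return True


def hasse_lines(factors: Dict[str, Sequence[Hashable]]) -> Set[Tuple[str, str]]:
    """(lower, upper) pairs of the nesting order, minus implied pairs."""
    nested = {(a, b) for a in factors for b in factors
              if a != b and is_nested(factors[a], factors[b])}
    return {(a, c) for (a, c) in nested
            if not any((a, b) in nested and (b, c) in nested for b in factors)}


layout = {
    "IRRIGATION": ["wet", "wet", "dry", "dry", "wet", "wet", "dry", "dry"],
    "PLOT": [1, 1, 2, 2, 3, 3, 4, 4],
    "VARIETY": ["a", "b"] * 4,
    "SUBPLOT": list(range(8)),
}
print(sorted(hasse_lines(layout)))
# [('PLOT', 'IRRIGATION'), ('SUBPLOT', 'PLOT'), ('SUBPLOT', 'VARIETY')]
```

SUBPLOT is also nested in IRRIGATION, but that relation follows from SUBPLOT → PLOT → IRRIGATION, so it gets no line of its own.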

Hasse diagrams of some standard designs (* = fixed factor):
• Two-way Completely Randomized Design: DIET*, HORMONE*, CHICKEN
• Nested (hierarchical) Design: BATCH, SAMPLE, SUBSAMPLE
• Split-Plot Design: BATCH, IRRIGATION*, PLOT, VARIETY*, SUBPLOT
• Repeated Measures Design: AGE GROUP, TREATMENT*, SUBJECT, TIME*, SUBJECT-TIME
• 3-way factorial: SEX*, DIET*, HORMONE*, CHICKEN
• Latin square: ROW, COLUMN, TREATMENT*, PLOT

A model for the experimental design process

One-shot approach?

Interactive process?

Our model:
• Defining/re-defining the problem
• Entering/re-entering the design
• Design representation
• Design evaluation:
– practical aspects
– analysis aspects


Defining the problem

An engineer wants to study the stretch of a piece of metal punched in a die. Three factors are considered:
• Lubricant (L): none, mill oil and added lubricant
• Thickness of the steel (T): 8 and 10 mm
• Steel type (S): standard and AK steel
For a given combination of L, T and S it is easy to punch 3 pieces, one after the other. The experimenter plans to repeat the entire experiment a week later.

(based on Lorenzen and Anderson, 1993)

DESIGN? ANALYSIS?


Entering the design

Problem description (abstract)

Computer representation (formal)

Two options are proposed.

Option 1

• describe all elements of the design (treatment factors, blocking = design factors, experimental units, ...)

• ‘easy’ questions are asked to find the relationship between all the elements

Description of the elements

Name | Total number of levels | Random/fixed
lubricant | 3 | fixed
thickness | 2 | fixed
steel type | 2 | fixed
week | 2 | random
piece | 72 | random

‘easy’ questions to find relationship

Option 2

• Treatment factor (name) applied on (experimental unit)

• Treatment factor random/fixed

• Design factor (name)

• Response variables (name) measured on (observational unit)

• ‘More difficult’ questions are asked to find the relationship between all the elements


Design representation

• Hasse diagram

• Model

• Dataset

See later

Design evaluation:
– analysis aspects (see later)
– practical aspects

Problem: it is very difficult to run the experiments in random order

Solution: if # treatments > 1, group the experimental units

Randomized order?

Example


Re-defining the problem

Re-entering the design

• Since it takes a long time to wipe down a die, one lubricant was selected and all combinations of thickness and steel type were run before another lubricant was used

• All 3 pieces are punched before going to the next thickness and steel type

• Piece is not the experimental unit of lubricant, steel type or thickness, but the observational unit on which the stretch is measured

• Die is the experimental unit for lubricant

• A group of 3 pieces is the experimental unit for thickness and steel type


• Hasse diagram of factors

• Model

• Dataset

Design representation

Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant | 2
Die | 0
Steel type | 1
Thickness | 1
Type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Week*type | 1
Week*thickness | 1
Week*type*thickness | 1
Week*lubricant*type | 2
Week*lubricant*thickness | 2
Week*lubricant*type*thickness | 2
Die*type | 0
Die*thickness | 0
Die*type*thickness | 0
Group | 0
Piece | 47

(I) For designs with a ‘(balanced) complete response structure’ (Taylor and Hilton):

• An interaction term is added to the model for factors not connected by a line in the Hasse diagram (= maximal model)

• The degrees of freedom are calculated by the rules given in Taylor and Hilton (and in most textbooks)

Maximal model: some effects can have 0 df

Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant + die | 2
Steel type | 1
Thickness | 1
Type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Week*type | 1
Week*thickness | 1
Week*type*thickness | 1
Week*lubricant*type + die*type | 2
Week*lubricant*thickness + die*thickness | 2
Week*lubricant*type*thickness + die*type*thickness + group | 2
Piece | 47

(II) Effects with 0 df are confounded with effects with df > 0

Based on EMS, e.g. EMS(die) = EMS(week*lubricant) + x · variance(die)

Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant + die | 2
Steel type | 1
Thickness | 1
Type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Pooled: week*type, week*thickness, week*type*thickness, week*lubricant*type + die*type, week*lubricant*thickness + die*thickness, week*lubricant*type*thickness + die*type*thickness + group | 9
Piece | 47

(III) The interactions between the treatment factors and the design (= blocking) factors are pooled

Error strata: week, die, group, piece

Up to now:

Balanced complete response structure → Model I (= maximal model): rules of Taylor and Hilton (effects of the model and df)

Model I → Model II: confounding

Model II → Model III: pooling

For orthogonal structure:

• Hasse diagram of ‘relevant’ effects (factors of cross-classifications)

• Add μ on top of Hasse diagram

• For each effect: total number of levels

• For the df, start at the top:

– df for μ is 1

– df for a specific effect = total number of levels for that effect − sum of the df of all effects above it in the Hasse diagram
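The df rule can be written as a short recursion over the Hasse diagram. In this Python sketch (names hypothetical) each effect is given with its total number of levels and the effects directly above it; the df of an effect is its number of levels minus the df of all effects above it, directly or indirectly. The example is a randomized complete block design with 4 blocks and 3 treatments, chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def degrees_of_freedom(effects: Dict[str, Tuple[int, List[str]]]) -> Dict[str, int]:
    """effects: name -> (total number of levels, effects directly above it)."""

    def above(name: str) -> Set[str]:
        found: Set[str] = set()
        stack = list(effects[name][1])
        while stack:
            parent = stack.pop()
            if parent not in found:
                found.add(parent)
                stack.extend(effects[parent][1])
        return found

    df: Dict[str, int] = {}

    def compute(name: str) -> int:
        if name not in df:
            levels = effects[name][0]
            df[name] = levels - sum(compute(p) for p in above(name))
        return df[name]

    for name in effects:
        compute(name)
    return df


rcbd = {
    "mu": (1, []),                         # df for mu is 1
    "BLOCK": (4, ["mu"]),
    "TREATMENT": (3, ["mu"]),
    "PLOT": (12, ["BLOCK", "TREATMENT"]),  # 4 blocks x 3 treatments
}
print(degrees_of_freedom(rcbd))
# {'mu': 1, 'BLOCK': 3, 'TREATMENT': 2, 'PLOT': 6}
```

As a check, the df add up to the number of observations (1 + 3 + 2 + 6 = 12).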

Hasse diagram of effects (model III)

• 4 error-strata

• No interaction between week (design factor) and treatment factors

• Remark: 2 pieces per group instead of 3 (as in the original design description)

Evaluation of design: analysis aspects

Evaluation of design: analysis aspects – power

[Figure: power for lubricant (if # weeks = 2) against the standardized minimal detectable difference, for significance level α]


Re-entering the design

3 weeks instead of 2 weeks → marked increase in power

Conclusion:

An efficient way to explore the experimental design during the design phase, for

• researchers

• statistical consultants

A Personal Note to End ...
