Daedalus and Icarus, statistical expert systems for agriculture
Leen Nys and Luc Duchateau
former assistants of Prof. Paul Darius
Experimental design: theory versus application
Daedalus and Icarus
The agricultural experimental stations
Chaos and structure
TAXSY A Rule-based Expert System Shell Developed with SAS® Software
• Darius, P. (1984) Expert Systems and Statistics, SEAS Proceedings, Spring Meeting
• Darius, P. (1986) Building Expert Systems with the Help of Existing Software, COMPSTAT 1986: Proceedings in Computational Statistics
• Darius, P. (1988) Statistical Expert Systems: Some Implementation and Experimentation Aspects. Österreichische Zeitschrift für Statistik und Informatik
• Darius, P. (1990) A toolbox for adding knowledge-based modules to existing statistical software. Annals of Mathematics and Artificial Intelligence
• Demonstration of TAXSY at International Summer School on Computational Aspects of Model Choice, Charles University Prague, 1-14 July 1991
• Expert systems use explicitly coded knowledge (often in the form of IF-THEN rules) to solve problems for which a (numerical) algorithmic solution is not appropriate
• TAXSY is an expert system shell completely written in SAS
• It consists of a set of SAS programs which, with the addition of datasets with rules and code, form a flexible system for knowledge-based consultation
• The heart of TAXSY is the inference engine
• The inference engine is capable of backward chaining on rules
• One needs to specify an attribute as the goal of the inference process (e.g. name of a test)
• The inference engine will repeatedly invoke the rules to find a value for the goal attribute.
TAXSY (AF Application)
TAXSY needs a rule base in the form of a SAS dataset, with RULES of the following format:
• IF (attribute) (operator) (value)
• AND (attribute) (operator) (value)
• …
• THEN (attribute) (operator) (value)
• To obtain a value that cannot be inferred from rules, TAXSY invokes an appropriate interface.
• The PROMPTS dataset should contain, for each such attribute, the name of the AF application TAXSY has to start:
o Simple menu
o Sophisticated applications involving the construction of SAS programs based on information previously obtained and processing of their results.
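The backward-chaining idea described above can be sketched in a few lines of Python. This is only an illustration, not TAXSY's SAS implementation: the rule base, the attribute names other than "name of test", and the function names `holds` and `infer` are hypothetical, and the PROMPTS dataset is reduced to a dictionary of callables.

```python
from typing import Callable, Dict, List, Optional, Tuple

Condition = Tuple[str, str, str]          # (attribute, operator, value)
Rule = Tuple[List[Condition], Condition]  # (IF/AND conditions, THEN conclusion)

# Hypothetical rule base, standing in for the RULES dataset.
RULES: List[Rule] = [
    ([("response type", "=", "continuous"), ("number of groups", "=", "2")],
     ("name of test", "=", "t-test")),
    ([("response type", "=", "continuous"), ("number of groups", "=", "more than 2")],
     ("name of test", "=", "one-way ANOVA")),
]


def holds(actual: Optional[str], operator: str, wanted: str) -> bool:
    """Evaluate one condition; only '=' and '<>' are supported in this sketch."""
    if operator == "=":
        return actual == wanted
    if operator == "<>":
        return actual is not None and actual != wanted
    raise ValueError(f"unsupported operator: {operator}")


def infer(goal: str,
          rules: List[Rule],
          prompts: Dict[str, Callable[[], str]],
          facts: Dict[str, str]) -> Optional[str]:
    """Backward chaining: find a value for `goal`, asking prompts when no rule applies."""
    if goal in facts:                      # value already known
        return facts[goal]
    candidates = [rule for rule in rules if rule[1][0] == goal]
    for conditions, (_attr, _op, value) in candidates:
        # Each condition attribute becomes a sub-goal.
        if all(holds(infer(a, rules, prompts, facts), op, v)
               for a, op, v in conditions):
            facts[goal] = value
            return value
    if goal in prompts:                    # not inferable: invoke the interface
        facts[goal] = prompts[goal]()
        return facts[goal]
    return None


# Hypothetical prompts, standing in for AF applications that ask the user.
prompts = {"response type": lambda: "continuous",
           "number of groups": lambda: "2"}
print(infer("name of test", RULES, prompts, {}))  # t-test
```

The sketch does not guard against circular rules, and it stops at the first rule that fires.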
STRUCTURE dataset
• contains metadata
• stores information about variables and observations and about relations between variables
STRATEGY dataset
• the inference process generally needs only a limited number of rules at a given stage
• the rule dataset is split into a number of modules through the STRATEGY dataset
• this speeds up the search process
TAXSY (AF Application) with its SAS datasets: RULES, PROMPTS, STRUCTURE, STRATEGY
DAEDALUS Description, Analysis and Experimental Design for AgricuLtUral Systems
Innovative aspects
Innovative aspects – OOPS
Innovative aspects – No name approach
Innovative aspects – Mixed model
Object “Experiment”
Instance variables
High level strategy
Attribute | Value | Task | Rules-Prompts
(none) | (none) | Analysis specification | specdes
Analysis specification | done | Response variable | respvar
Response variable | known | Response type | Resptype
Response type | known | Design for analysis | desianal
Design for analysis | allowed | Fit model | Fitmodel
Fit model | done | Check outliers | Outlier
Check outliers | done | Variance function | varfunc
Variance function | correct | Density function | Densityf
Density function | correct | Independence assumption | Independ
Independence assumption | correct | Full analysis | analysis
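The high-level strategy can be read as a chain: a task becomes available once the attribute set by the previous task has the required value. A minimal sketch in Python follows. The table contents are copied from the slide, but the function `next_task`, the `state` dictionary and the reading of the table as a simple precondition chain are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (attribute, required value, task, rules/prompts module), copied from the table.
HIGH_LEVEL_STRATEGY = [
    (None, None, "Analysis specification", "specdes"),
    ("Analysis specification", "done", "Response variable", "respvar"),
    ("Response variable", "known", "Response type", "Resptype"),
    ("Response type", "known", "Design for analysis", "desianal"),
    ("Design for analysis", "allowed", "Fit model", "Fitmodel"),
    ("Fit model", "done", "Check outliers", "Outlier"),
    ("Check outliers", "done", "Variance function", "varfunc"),
    ("Variance function", "correct", "Density function", "Densityf"),
    ("Density function", "correct", "Independence assumption", "Independ"),
    ("Independence assumption", "correct", "Full analysis", "analysis"),
]


def next_task(state: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (task, module) for the first open task whose precondition holds.

    `state` maps each task already handled to the value it obtained
    (hypothetical bookkeeping, not part of the original system).
    """
    for attribute, value, task, module in HIGH_LEVEL_STRATEGY:
        if task in state:                       # task already has a value
            continue
        if attribute is None or state.get(attribute) == value:
            return task, module
    return None                                 # nothing left, or chain is blocked


print(next_task({}))                                    # ('Analysis specification', 'specdes')
print(next_task({"Analysis specification": "done"}))    # ('Response variable', 'respvar')
```

With this reading, an outcome such as "abandoned" for the analysis specification blocks every later task, because no row has that value as its precondition.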
Low level strategy - Analysis specification
Part | Attribute | Operator | Value
if | filled dataset | is | present
and | design structure for analysis | is | specified
then | analysis specification | is | done
if | filled dataset | is | absent
and | message for missing dataset | is | given
then | analysis specification | is | abandoned
if | design structure for analysis | is | not specified
and | message for non specification | is | given
then | analysis specification | is | abandoned
Item name | Item value (=method-name)
filled dataset | Check-existence_of_dataset
design structure for analysis | Determine_design
message for missing dataset | Show_message_missing_dataset
message for non specification | Show_message_non_specification
Determine_design_method
• Determine_design_method: three options
o Already available through ICARUS
o Choose from a list of designs and assign variables
o Construct a design
Variables type
Variables relationship
Low level strategy – Design for analysis
Part | Attribute | Operator | Value
if | set of variance components | is | estimable
then | design for analysis | is | allowed
if | set of variance components | is | partially estimable
and | message for partial estimability | is | given
then | design for analysis | is | allowed
if | set of variance components | is | not estimable
and | message for non estimability | is | given
then | design for analysis | is | not allowed
Item name | Item value (=method-name)
set of variance components | Check_estimability_of_variance_components
message for partial estimability | Show_message_partial_estimability
message for non estimability | Show_message_non_estimability
Check_estimability_of_variance_components
• Non estimability due to
– Specific missing value pattern
– Misspecification of the design structure
• Method based on stratum concept and REML
– A stratum needs to have at least rank one after projection on the space spanned by the treatment factors; otherwise no degrees of freedom remain to estimate the residual variance
Implementation
• SAS did a bad job in estimating variance components
• IML is used instead
• The obtained rank for the different strata is stored in the cov_parms list
Updating the experiment object
Beyond Daedalus
ICARUS Prototype to ‘explore’ experimental designs
• Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment for experimental design. COMPSTAT, 1992, Neuchatel
• Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment to explore design of an experiment. SOFTSTAT, 1993, Heidelberg
• Invited talk given by P. Darius and M. Nys at the University of Augsburg, 1994
• Invited talk for researchers in chemistry given by P. Darius and M. Nys at the Research Center of Boehringer Mannheim, Tutzing, July 1994
• Tool to assist statisticians and experimenters to ‘explore’ experimental designs during the design phase
• ‘Design’ as an object
– Interactive
– Graphical
• Strategy based: incorporates statistical knowledge
• Implementation in Objectworks/Smalltalk on SUN workstations
Window-based
Representation of a Design
• Text: Fields were chosen in 5 countries. Each field was divided into 4 plots, ...
• Name: Factorial 2^4, Latin Square, Split-plot, ...
• Model (design-matrix): $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
• Dataset:
Block | Treatment
1 | 3
1 | 2
1 | 1
2 | 2
2 | 1
2 | 3
• Others: e.g. Kirk: SPF-p.qr
• Graph of crossing/nesting relationship of factors: VARIETY* PLANT
Hasse diagrams
• nesting = partial ordering, which can be drawn as a Hasse diagram
• B* connected by a line to A: A is nested in B (each level of A occurs only within 1 level of B)
• B* = B is a fixed factor
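The definition of nesting ("each level of A occurs only within 1 level of B") can be checked directly on a dataset. The sketch below is illustrative only: the function name `is_nested` and the small batch/sample dataset are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def is_nested(rows: Iterable[Dict[str, Hashable]], inner: str, outer: str) -> bool:
    """True if every level of `inner` occurs within exactly one level of `outer`."""
    outer_level_of = {}
    for row in rows:
        # Remember the first outer level seen for this inner level;
        # any later disagreement means the factors are not nested.
        if outer_level_of.setdefault(row[inner], row[outer]) != row[outer]:
            return False
    return True


# Hypothetical data: samples s1, s2 come from batch 1; s3, s4 from batch 2.
rows = [
    {"batch": 1, "sample": "s1"},
    {"batch": 1, "sample": "s2"},
    {"batch": 2, "sample": "s3"},
    {"batch": 2, "sample": "s4"},
]
print(is_nested(rows, inner="sample", outer="batch"))  # True
print(is_nested(rows, inner="batch", outer="sample"))  # False
```

Nesting is not symmetric: here sample is nested in batch but batch is not nested in sample, which is why it induces a partial ordering.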
Examples of Hasse diagrams (factor labels as shown in the diagrams; * = fixed factor):
• Two-way Completely Randomized Design: DIET*, HORMONE*, CHICKEN
• Nested (hierarchical) Design: BATCH, SAMPLE, SUBSAMPLE
• Split-Plot Design: BATCH, IRRIGATION*, PLOT, VARIETY*, SUBPLOT
• Repeated Measures Design: AGE GROUP, TREATMENT*, SUBJECT, TIME*, SUBJECT-TIME
• 3-way factorial: SEX*, DIET*, HORMONE*, CHICKEN
• Latin-square: ROW, COLUMN, TREATMENT*, PLOT
A model for the experimental design process
One-shot approach ?
Interactive process ?
Our model:
• Defining/re-defining the problem
• Entering/re-entering the design
• Design representation
• Design evaluation:
o practical aspects
o analysis aspects
Defining the problem
An engineer wants to study the stretch of a piece of metal punched in a die. Three factors are considered:
• Lubricant (L): none, mill oil and added lubricant
• Thickness of the steel (T): 8 and 10 mm
• Steel type (S): standard and AK steel
For a given combination of L, T and S it is easy to punch 3 pieces, one after the other. The experimenter plans to repeat the entire experiment a week later.
(based on Lorenzen and Anderson, 1993)
DESIGN ? ANALYSIS?
Entering the design
Problem description (abstract)
Computer representation (formal)
Two options are proposed
option 1
• describe all elements (treatment factors, blocking = design factors, experimental units, ...) of the design
• ‘easy’ questions are asked to find the relationship between all the elements
Description of the elements
Name | Total number of levels | Random/fixed
lubricant | 3 | fixed
thickness | 2 | fixed
steel type | 2 | fixed
week | 2 | random
piece | 72 | random
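The total of 72 levels for piece follows from the problem description: every combination of lubricant, thickness and steel type is run in each week, with 3 pieces punched per combination. A short check in Python (the variable names are only illustrative):

```python
from math import prod

# Levels of the crossed factors, as listed in the description of the elements.
levels = {"lubricant": 3, "thickness": 2, "steel type": 2, "week": 2}
pieces_per_combination = 3  # "it is easy to punch 3 pieces"

total_pieces = prod(levels.values()) * pieces_per_combination
print(total_pieces)  # 72
```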
‘easy’ questions to find relationship
option 2
• Treatment factor (name) applied on (experimental unit)
• Treatment factor random/fixed
• Design factor (name)
• Response variables (name) measured on (observational unit)
• ‘More difficult’ questions are asked to find the relationship between all the elements
Design representation
• Hasse diagram
• Model
• Dataset
See later
Design evaluation:
- analysis aspects (see later)
- practical aspects
Problems:
- Very difficult to run the experiment in random order
Solutions:
- If # treatments > 1, then group the experimental units
Randomized order?
Example
Re-defining the problem
Re-entering the design
• Since it takes a long time to wipe down a die, one lubricant was selected and all combinations of thickness and steel type were run before another lubricant was used
• All 3 pieces are punched before going to the next thickness and steel type
• Piece is not the experimental unit of lubricant, steel type or thickness, but it is the observational unit on which the stretch is measured
• Die is the experimental unit for lubricant
• A group of 3 pieces is the experimental unit for thickness and steel type
• Hasse diagram of factors
• Model
• Dataset
Design representation
Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant | 2
Die | 0
Steel type | 1
Thickness | 1
type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Week*type | 1
Week*thickness | 1
Week*type*thickness | 1
Week*lubricant*type | 2
Week*lubricant*thickness | 2
Week*lubricant*type*thickness | 2
Die*type | 0
Die*thickness | 0
Die*type*thickness | 0
Group | 0
Piece | 47
(I) For designs with a ‘(balanced) complete response structure’ (Taylor and Hilton)
• An interaction term for factors not connected by a line in a Hasse diagram is added to the model (= maximal model)
• The degrees of freedom are calculated by the rules given in Taylor and Hilton (and in most text books)
Maximal model: some effects can have 0 df
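The construction of the maximal model can be sketched as follows: every set of factors in which no two factors are connected through nesting gives one effect. The nesting relations used here (die within week and lubricant, group within die, type and thickness, piece within group) are an assumption inferred from the re-entered design description, and "connected by a line" is read as "related by nesting, directly or indirectly". The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

FACTORS = ["week", "lubricant", "die", "type", "thickness", "group", "piece"]

# ASSUMED direct nesting relations: factor -> factors it is nested in.
NESTED_IN: Dict[str, Set[str]] = {
    "die": {"week", "lubricant"},
    "group": {"die", "type", "thickness"},
    "piece": {"group"},
}


def ancestors(factor: str) -> Set[str]:
    """All factors that `factor` is nested in, directly or indirectly."""
    found: Set[str] = set()
    stack = list(NESTED_IN.get(factor, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(NESTED_IN.get(parent, ()))
    return found


def connected(a: str, b: str) -> bool:
    """True if one factor is nested in the other."""
    return a in ancestors(b) or b in ancestors(a)


def maximal_model(factors: List[str]) -> List[str]:
    """Main effects plus an interaction for every set of mutually unconnected factors."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            if all(not connected(a, b) for a, b in combinations(combo, 2)):
                terms.append("*".join(combo))
    return terms


terms = maximal_model(FACTORS)
print(len(terms))  # 21
print(terms)
```

Under these assumed relations the sketch yields 21 effects, the same number as in the table of the maximal model. It does not compute the degrees of freedom.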
Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant + die | 2
Steel type | 1
Thickness | 1
type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Week*type | 1
Week*thickness | 1
Week*type*thickness | 1
Week*lubricant*type + die*type | 2
Week*lubricant*thickness + die*thickness | 2
Week*lubricant*type*thickness + die*type*thickness + group | 2
Piece | 47
(II) Effects with 0 df are confounded with effects with df > 0
Based on EMS, e.g. EMS(die) = EMS(week*lubricant) + x · variance(die)
Effects of the maximal model | df
Week | 1
Lubricant | 2
Week*lubricant + die | 2
Steel type | 1
Thickness | 1
type*thickness | 1
Lubricant*type | 2
Lubricant*thickness | 2
Lubricant*type*thickness | 2
Pooled (Week*type; Week*thickness; Week*type*thickness; Week*lubricant*type + die*type; Week*lubricant*thickness + die*thickness; Week*lubricant*type*thickness + die*type*thickness + group) | 9
Piece | 47
(III) The interactions between the treatment and design (= blocking) factors are pooled
Error strata: week, die, group, piece
Up to now:
Balanced complete response structure
-> (rules of Taylor and Hilton: effects of model and df) -> Model I (= maximal model)
-> (confounding) -> Model II
-> (pooling) -> Model III
For orthogonal structure:
• Hasse diagram of ‘relevant’ effects (factors of cross-classifications)
• Add μ on top of the Hasse diagram
• For each effect: total number of levels
• For df, start at the top:
o df for μ is 1
o df for a specific effect = total number of levels for that effect – sum of the df of the effects on top of it
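The df rule above can be written as a short recursive computation. The sketch below applies it to a small part of the example (μ, week, lubricant and week*lubricant). The function name and the way the Hasse diagram is encoded, as a mapping from each effect to all effects above it, are assumptions made for illustration.

```python
from typing import Dict, Set


def degrees_of_freedom(levels: Dict[str, int],
                       above: Dict[str, Set[str]]) -> Dict[str, int]:
    """df(effect) = number of levels - sum of df of all effects above it."""
    df: Dict[str, int] = {}

    def compute(effect: str) -> int:
        if effect not in df:
            df[effect] = levels[effect] - sum(compute(e) for e in above[effect])
        return df[effect]

    for effect in levels:
        compute(effect)
    return df


# Total number of levels per effect: 2 weeks, 3 lubricants, 6 combinations.
levels = {"mu": 1, "week": 2, "lubricant": 3, "week*lubricant": 6}
# All effects above each effect in the Hasse diagram (mu is on top).
above = {
    "mu": set(),
    "week": {"mu"},
    "lubricant": {"mu"},
    "week*lubricant": {"mu", "week", "lubricant"},
}
print(degrees_of_freedom(levels, above))
# {'mu': 1, 'week': 1, 'lubricant': 2, 'week*lubricant': 2}
```

The results agree with the df listed for Week, Lubricant and Week*lubricant in the tables of effects.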
Hasse diagram of effects (model III)
• 4 error-strata
• No interaction between week (design factor) and treatment factors
• Remark: 2 pieces per group instead of 3 (as in the original design description)
Evaluation of design: analysis aspects
Evaluation of design: analysis aspects - power
Figure: power for lubricant (if # weeks = 2); labels: standardized minimal detectable difference, α
Re-entering of the design
3 weeks instead of 2 weeks -> significant increase in power
Conclusion:
Efficient way to explore the experimental design during the design phase, for
• researchers
• statistical consultants