Research Collection
Report
Optimal sampling schemes based on the anticipated variance with lack of fit
Author(s): Mandallaz, Daniel
Publication Date: 2001
Permanent Link: https://doi.org/10.3929/ethz-a-004158007
Rights / License: In Copyright - Non-Commercial Use Permitted
Daniel Mandallaz
Optimal Sampling Schemes Based on the
Anticipated Variance with Lack of Fit
Chair of Forest Inventory and Planning
Swiss Federal Institute of Technology (ETH), Zurich
2001
Published by:
Chair of Forest Inventory and Planning
Department of Forest and Wood Sciences
CH-8092 Zurich
Publisher:
Professur für Forsteinrichtung und Waldwachstum
Departement Wald- und Holzforschung
ETH Zentrum
CH-8092 Zurich
Acknowledgements
I would like to express my thanks to Professor P. Bachmann, Chair of Forest Inventory and Planning at ETH Zurich, for his continuous support as well as for the working environment he succeeded in creating. Thanks are also due to Dr. A. Lanz and E. Kaufmann at the Swiss Federal Research Institute WSL in Birmensdorf for providing data from the Swiss National Inventory and for many helpful discussions. Last but not least, I thank my friend J.-F. Didisheim for his proofreading; the remaining errors are my own responsibility.
To Léa Marine
and
Philomène
Abstract
This technical report generalizes and improves on previous work on optimal sampling schemes for forest inventory. It also gives more detailed mathematical derivations of previously published results. The sampling procedures are optimal in the sense that they minimize the anticipated variance for given costs or, conversely, the costs for a given anticipated variance. The anticipated variance is defined as the average of the design-based variance under a simple stochastic model for the location of the trees. This location model, the local Poisson forest, assumes that trees are uniformly and independently distributed within a given number of Poisson-strata. We consider two-phase two-stage cluster sampling schemes in which costs and terrestrial second-phase sampling density can vary over domains. The estimation procedure is based on post-stratification with respect to so-called working-strata, which need not be identical with the Poisson-strata (usually unknown); this discrepancy induces a lack of fit. It is then possible to derive the optimal sampling schemes analytically. Simulations and data from the Swiss National Inventory illustrate the theory.
Summary (Zusammenfassung)
This report generalizes and improves earlier work on optimal sampling plans for forest inventory. It also gives more detailed mathematical derivations than the articles already published. Optimal means that the anticipated variance is minimized for given costs, or conversely. The anticipated variance is the mean of the classical sampling variance under a stochastic model that generates the spatial location of the trees. In this spatial model, the local Poisson model, the trees are distributed independently and uniformly within Poisson-strata. We consider two-phase two-stage cluster sampling, for which the costs and the terrestrial sampling density of the second phase may vary between domains. The estimation procedure uses post-stratification with respect to so-called working-strata, which need not be identical with the (mostly unknown) Poisson-strata; this produces a "lack of fit". The optimal sampling plans can be derived analytically. Simulations and data from the Swiss National Forest Inventory illustrate the theory.
Summary (Résumé)
This report generalizes and improves earlier results on optimal sampling designs for forest inventory, in the sense that the anticipated variance is minimal for a given cost, or conversely. Moreover, it gives more complete mathematical proofs than the articles already published. The anticipated variance is the mean of the design-based variance with respect to a stochastic model for the spatial distribution of the trees. This model, the local Poisson model, assumes that the trees are distributed independently and uniformly within strata (called Poisson-strata). We consider two-phase two-stage cluster sampling designs for which the terrestrial sampling density of the second phase as well as the costs can vary from one domain to another. The estimation procedure relies on post-stratification with respect to strata, called working-strata, which are not necessarily identical to the Poisson-strata, which are moreover usually unknown. This produces a lack of fit. The optimal sampling designs can be calculated analytically. Simulations and data from the Swiss National Forest Inventory illustrate the theory.
Contents
1 Introduction
1.1 Preliminaries
1.2 Formulation of the Problem and Definitions
2 Basic Concepts
2.1 Reminder
2.2 Post-Stratified Estimates
2.3 Internal Linear Models
3 Design-Based Aspects
3.1 Calculation of the Design-Based Variance
3.2 Variance Estimation
4 The Anticipated Variance
4.1 Mathematical Prerequisites
4.2 Calculation of the Anticipated Variance
4.3 Interpretation and Estimation of the Lack of Fit
5 Optimization
5.1 Costs
5.2 Discrete PPS Approximation
5.3 Optimal Sampling Schemes
6 Examples
6.1 Simulations
6.2 Swiss National Inventory
7 Conclusions
A Calculation of the Anticipated Variance under Cluster Sampling

List of Figures
6.1 Empirical Variance E.V. v. Mean Estimated Variance M.E.V.
6.2 Mean Estimated Variance M.E.V. v. Anticipated Variance without Lack of Fit A.V.
6.3 Mean Estimated Variance M.E.V. v. Anticipated Variance with Lack of Fit A.V.L.
6.4 Empirical E.RED v. Mean Estimated M.RED Variance Reduction (in %)
6.5 Distribution of DBH in SNI1
6.6 Gamma Values for two Concentric Circles according to DBH Threshold in cm
6.7 Ratio q according to DBH Threshold in cm
6.8 Relative Anticipated Error: D = 1
6.9 Surface Areas of Optimal Circles: D = 1
6.10 Relative Anticipated Error: D = 1, $m_1 = 11.7$

List of Tables
4.1 ANOVA with Working-Strata
6.1 Installation and Traveling Costs
6.2 Parameters of SNI
6.3 Components of Variance in SNI
6.4 Optimal Concentric Circles
6.5 Optimal Sampling Scheme with D = 5
6.6 Optimal Sampling Scheme with D = 1
6.7 Optimal Sampling Scheme with D = 1 and $m_1 = 11.7$
6.8 Empirical and Anticipated Relative Errors in %
Chapter 1
Introduction
1.1 Preliminaries
The choice of an efficient sampling design is of paramount importance. Though optimal sampling schemes based on anticipated variances have received much attention in sampling theory, primarily in the context of socio-economic studies (see e.g. Särndal et al. (1992)), they have been largely ignored by forest inventory specialists. Let us emphasize that no meaningful optimization is possible in sampling theory without some kind of "super-population model" under which an anticipated variance can be defined. A technical report by Mandallaz (1997) and recent papers by Mandallaz and Ye (1999) and Mandallaz and Lanz (2001) presented optimal sampling schemes based on the anticipated variance under a local Poisson model for the spatial distribution of the trees. This approach differs from the general framework inasmuch as only the location of the trees is random, whereas the response variable of interest, e.g. tree timber volume, is fixed. The present work extends these results to the case where the Poisson-strata, the idealized true model chosen by Nature but usually unknown, are not identical with the so-called working-strata, chosen by the statistician to calculate post-stratified estimates. This discrepancy between Poisson- and working-strata induces a so-called lack-of-fit term in the resulting anticipated variance of the post-stratified estimates, which has important consequences with respect to optimization and applications. Though this technical report is almost self-contained, the reader is advised to read Mandallaz and Ye (1999) first in order to become acquainted with the new concepts. Chapter 2 below gives a short review. The definitions must be taken literally; in particular, one must distinguish carefully between Poisson-strata, working-strata and sampling-strata.
1.2 Formulation of the Problem and Definitions
We consider a forested region $F$, assumed to be a subset of the Euclidean plane $\mathbb{R}^2$, whose surface area, measured in ha, is $\lambda(F)$, and a well-defined population $P$ of $N$ trees in $F$, identified by their labels $i = 1, 2, \ldots, N$. We write $i \in G$ if the $i$-th tree belongs to a set $G \subset F$; the surface area of an arbitrary set $G$ is always denoted by $\lambda(G)$. The response variable of interest, measured or observed at a given time point on each tree in $P$, is denoted by $Y_i$ and is assumed to be error-free. Given $F$, our objective is to estimate

$$\bar{Y}_F = \frac{1}{\lambda(F)} \sum_{i \in F} Y_i \tag{1.1}$$
We assume that $F$ is partitioned into $D$ domains, $F = \bigcup_{g=1}^{D} F_g$ with $F_g \cap F_k = \emptyset$ for $g \neq k$. In each domain $F_g$ the Poisson-strata are denoted by $F_{1,gk}$ with $k = 1, 2, \ldots, P_{1g}$. According to Mandallaz and Ye (1999), the local Poisson model holds if the locations of all trees $i \in F_{1,gk}$ can be viewed as realizations of independent, uniformly distributed random points in $F_{1,gk}$. We emphasize that the Poisson-strata $F_{1,gk}$ need not be known to the inventorist; we simply assume that "Nature" has chosen this model for the location of the trees. In contrast, we define in each domain $F_g$ the working-strata $F_{2,gk}$ with $k = 1, 2, \ldots, P_{2g}$, which are used to construct post-stratified estimates in each domain. For each sample point (see below) one has to know to which working-stratum it belongs. For a better intuitive understanding one can view each working-stratum as a union of finer Poisson-strata. Note also that domains, working-strata and Poisson-strata need not be simply connected sets of the plane, i.e. each of them can consist of several non-contiguous components.
The sampling procedure is as follows:

1. First phase
Draw a set $s_1$ of $n_1$ points independently and uniformly distributed in $A \supset F$; these are the origins of the clusters (or tracts). The $M$ points of the cluster with origin $x$ are $x + e_l$, $l = 1, 2, \ldots, M$, with $e_1 = 0$, so that $x = x + e_1$. For each $x \in s_1$ let
$$M_g(x) = \sum_{l=1}^{M} I_{F_g}(x + e_l)$$
be the number of points of the cluster hitting the domain $F_g$. Set $M(x) = \sum_{g=1}^{D} M_g(x)$ and $s_{1,g} = \{x \in s_1 \mid M_g(x) \neq 0\}$. Furthermore, let $A_g = \{x \in \mathbb{R}^2 \mid M_g(x) \neq 0\}$ and note that the set $A \supset F$ can be chosen as $A = \{x \in \mathbb{R}^2 \mid M(x) \neq 0\}$. $A$ can be viewed as the first-phase sampling-stratum. The set $s_1$ is usually drawn by a systematic grid on aerial photographs (in agreement with general practice, we treat systematic samples as random samples). For each cluster $x \in s_1$ one defines a prediction of the local density at the cluster level by
$$\hat{Y}_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\,\hat{Y}(x + e_l) \tag{1.2}$$
where $\hat{Y}(x + e_l)$ is the prediction at the point $x + e_l$, which depends on the working-stratum to which $x + e_l$ belongs. The actual calculation of $\hat{Y}(x + e_l)$ is discussed in Section 2.3.

2. Second phase
In each set $s_{1,g}$ draw $n_{2,g}$ clusters out of the $n_{1,g}$ by equal-probability sampling without replacement to obtain the subset $s_{2,g} \subset s_{1,g}$. For each $x \in s_{2,g}$ one can calculate at each point $x + e_l$ either the local density $Y(x + e_l)$ and the residual $R(x + e_l) = Y(x + e_l) - \hat{Y}(x + e_l)$ if one-stage sampling is used, or the estimated local density $Y^*(x + e_l)$ and the estimated residual $R^*(x + e_l) = Y^*(x + e_l) - \hat{Y}(x + e_l)$ if two-stage sampling is used (see Sections 2.1 and 2.3). The $A_g$ can be viewed as the (conditional) second-phase sampling-strata: if $x$ is uniform in $A$, then, given $x \in A_g$, $x$ is uniform in $A_g$. In practice $s_{2,g}$ is obtained from a terrestrial grid that is coarser than the aerial grid. Again, we assume that this can be treated as a random subsample in the above sense. Note that the terrestrial sampling density can vary over domains. The terrestrial quantities at the cluster level are defined as
$$Y^*_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\,Y^*(x + e_l) \tag{1.3}$$
$$R^*_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\,R^*(x + e_l) = Y^*_{c,g}(x) - \hat{Y}_{c,g}(x) \tag{1.4}$$
for two-stage sampling, and similarly, by dropping the asterisk, for one-stage sampling. We shall need the following assumptions:
Assumption I (no overlap)
We assume that a cluster cannot be spread over several domains, so that $M_{g_1}(x) M_{g_2}(x) = 0$ whenever $g_1 \neq g_2$. The $A_g$ therefore also form a partition of $A$. $n_{1,g}$ is the number of non-void clusters hitting $F_g$, $n_1$ is the number of non-void clusters hitting $F$, and $n_1 = \sum_{g=1}^{D} n_{1,g}$. Therefore $M(x) = M_g(x)$ whenever $x \in A_g$.

Assumption II (negligible boundaries)
$$\frac{\lambda(A_g)}{\lambda(A)} = \frac{\lambda(F_g)}{\lambda(F)} := p_g$$

Assumption III (constant effective cluster size)
$$E_{x \in A} M(x) = E_{x \in A_g} M_g(x) := E_g M_g(x) \quad \forall g$$

The assumption that a cluster cannot be spread over several domains (though it may be spread over several Poisson-strata or working-strata within the same domain) is purely technical; in practice this will seldom occur. If it does, we would split the cluster into several components and adjust the sample sizes accordingly. The assumption of negligible boundaries is not a concern for sufficiently large forested areas $F_g$, which is the context of this work. Whenever Assumption III is severely violated, one has good reasons to use different cluster geometries in the domains $F_g$, and the theory presented here is not applicable. Note that Assumptions I-III all hold under simple random sampling or if there is a single domain.
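The cluster-level quantities (1.2) to (1.4) are simple averages over the cluster points that fall in $F_g$. The following Python sketch illustrates them for a single cluster; the function name `cluster_quantities`, the input layout (one domain label per point, `None` for points outside $F$) and the numbers are hypothetical, not part of the report.

```python
from typing import List, Optional, Tuple


def cluster_quantities(
    domains: List[Optional[int]],  # domain g of each point x + e_l (None = outside F)
    y_hat: List[float],            # predictions \hat Y(x + e_l)
    y_star: List[float],           # estimated local densities Y*(x + e_l)
    g: int,
) -> Tuple[int, Optional[float], Optional[float], Optional[float]]:
    """Return M_g(x) and the cluster means (1.2), (1.3), (1.4) for domain g."""
    inside = [l for l, d in enumerate(domains) if d == g]   # points with I_{F_g}(x + e_l) = 1
    m_g = len(inside)                                        # M_g(x)
    if m_g == 0:
        return 0, None, None, None                           # cluster does not hit F_g
    yc_hat = sum(y_hat[l] for l in inside) / m_g             # (1.2)
    yc_star = sum(y_star[l] for l in inside) / m_g           # (1.3)
    rc_star = sum(y_star[l] - y_hat[l] for l in inside) / m_g  # (1.4): mean point residual
    return m_g, yc_hat, yc_star, rc_star


# Hypothetical cluster with M = 4 points: three in domain 1, one outside the forest.
m_g, yc_hat, yc_star, rc_star = cluster_quantities(
    [1, 1, None, 1], [300.0, 250.0, 0.0, 280.0], [320.0, 240.0, 0.0, 300.0], g=1
)
print(m_g, round(yc_hat, 2), round(yc_star, 2), round(rc_star, 2))
```

By linearity, the mean of the point residuals equals $Y^*_{c,g}(x) - \hat Y_{c,g}(x)$, which is the second equality in (1.4).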
Chapter 2
Basic Concepts
2.1 Reminder
For easier reference we briefly review the concepts of local density, estimated local density and anticipated variance for a given domain $F_g$. Full details are given in Mandallaz (1997) and Mandallaz and Ye (1999).
We consider sampling schemes based on inclusion circles. To each tree we assign a circle $K_i$ centered on the tree. The $i$-th tree is sampled from the random point $u$ whenever $u$ falls within $K_i$; the indicator variable $I_i(u)$ is then 1, otherwise 0. By symmetry, this is equivalent to saying that the $i$-th tree lies inside the circle of the same radius centered on the point, which is the definition used for field work. The situation is slightly more intricate with strata, and there are three possible versions. Let us first define the local density $Y_0(u)$ at the point $u$ as
$$Y_0(u) = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} \frac{Y_i I_i(u)}{\pi_i^g} \tag{2.1}$$
$\pi_i^g$ is the first-stage inclusion probability and $\lambda(F_g)\pi_i^g$ is the inclusion area. In this version we adjust for edge effects only at the forest boundary, that is $\lambda(F_g)\pi_i^g = \lambda(F_g \cap K_i)$. The second version adjusts for edge effects at the boundary of the Poisson-strata. In this case $I_i(u) = 0$ if $u$ and the $i$-th tree are not in the same Poisson-stratum, say $F_{1,gk}$. The conditional inclusion probability $\pi_i^{1,gk}$, given that $u \in F_{1,gk}$ and $i \in F_{1,gk}$, is defined by $\lambda(F_{1,gk})\pi_i^{1,gk} = \lambda(F_{1,gk} \cap K_i)$, and we set
$$Y_1(u) = \frac{1}{\lambda(F_{1,gk})} \sum_{i \in F_{1,gk}} \frac{Y_i I_i(u)}{\pi_i^{1,gk}} \tag{2.2}$$
This version is of theoretical use only, since we do not know the Poisson-strata. The third version adjusts for edge effects at the boundary of the working-strata, which is the version one should ideally use in field work. Again $I_i(u) = 0$ if $u$ and the $i$-th tree are not in the same working-stratum, say $F_{2,gk}$. The conditional inclusion probability $\pi_i^{2,gk}$, given that $u \in F_{2,gk}$ and $i \in F_{2,gk}$, is defined by $\lambda(F_{2,gk})\pi_i^{2,gk} = \lambda(F_{2,gk} \cap K_i)$, and we set
$$Y_2(u) = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} \frac{Y_i I_i(u)}{\pi_i^{2,gk}} \tag{2.3}$$
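A small simulation illustrates (2.1) and the unbiasedness of $Y_0(u)$ for $\bar Y_g$ when $u$ is uniform in $F_g$. This is a hypothetical set-up, not the simulation of Chapter 6: 40 trees are placed at random on a 1 ha square treated as a torus, so that boundary effects vanish and $\lambda(F_g)\pi_i^g = \lambda(K_i)$ holds exactly; tree values and circle radii are invented.

```python
import math
import random

SIDE_M = 100.0                          # side of a square domain F_g in m (hypothetical)
AREA_HA = SIDE_M * SIDE_M / 10_000.0    # lambda(F_g) = 1 ha


def torus_dist2(a, b):
    """Squared distance on the torus; wrapping the square removes all boundary effects."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx, dy = min(dx, SIDE_M - dx), min(dy, SIDE_M - dy)
    return dx * dx + dy * dy


def local_density(u, trees):
    """Y_0(u) of (2.1) per ha, using lambda(F_g) * pi_i^g = lambda(K_i) (no boundary effects)."""
    total = 0.0
    for position, y, radius in trees:
        if torus_dist2(u, position) <= radius * radius:          # I_i(u) = 1
            total += y / (math.pi * radius * radius / 10_000.0)  # Y_i / lambda(K_i) in ha
    return total


rng = random.Random(1)
trees = [((rng.uniform(0, SIDE_M), rng.uniform(0, SIDE_M)),  # tree location
          rng.uniform(0.2, 2.0),                            # Y_i, e.g. volume in m^3 (made up)
          rng.uniform(8.0, 15.0))                           # radius of K_i in m (made up)
         for _ in range(40)]
true_mean = sum(y for _, y, _ in trees) / AREA_HA   # \bar Y_g, cf. (1.1)
points = [(rng.uniform(0, SIDE_M), rng.uniform(0, SIDE_M)) for _ in range(20_000)]
mc_mean = sum(local_density(u, trees) for u in points) / len(points)
print(f"true mean {true_mean:.2f}, Monte Carlo mean of Y_0(u) {mc_mean:.2f}")
```

The torus is only a device to make (2.6) exact; on a real forest boundary the inclusion areas $\lambda(F_g \cap K_i)$ would have to be computed explicitly.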
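If the point $u$ is uniformly distributed in $F_g$, then
$$E_u Y_0(u) = \bar{Y}_g$$
$$E(Y_1(u) \mid u \in F_{1,gk}) = \frac{1}{\lambda(F_{1,gk})} \sum_{i \in F_{1,gk}} Y_i := \bar{Y}_{1,gk}$$
$$E(Y_2(u) \mid u \in F_{2,gk}) = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} Y_i := \bar{Y}_{2,gk}$$
$$E_u Y_1(u) = E_u Y_2(u) = \bar{Y}_g$$
Hence, $Y_1(u)$ and $Y_2(u)$ yield unbiased estimates for the strata. Let us now calculate the variance of (2.2) when the point $u$ is uniformly distributed in $F_g$. With $p_{1,gk} = \lambda(F_{1,gk})/\lambda(F_g)$ we get
$$V_{u \in F_g} Y_1(u) = \frac{1}{\lambda(F_g)} \int_{F_g} \big(Y_1(u) - \bar{Y}_g\big)^2 du = \sum_{k=1}^{P_{1g}} p_{1,gk} \frac{1}{\lambda(F_{1,gk})} \int_{F_{1,gk}} \big(Y_1(u) - \bar{Y}_g\big)^2 du$$
Write $Y_1(u) - \bar{Y}_g = Y_1(u) - \bar{Y}_{1,gk} + \bar{Y}_{1,gk} - \bar{Y}_g$ whenever $u \in F_{1,gk}$ and expand the square. The cross-product term vanishes, because $\bar{Y}_{1,gk}$ is the conditional mean of $Y_1(u)$ given $u \in F_{1,gk}$, and we get
$$V_{u \in F_g} Y_1(u) = \sum_{k=1}^{P_{1g}} p_{1,gk} V(Y_1(u) \mid u \in F_{1,gk}) + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \tag{2.4}$$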
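Decomposition (2.4) is the usual within/between split of a variance over a partition. A minimal numerical check, assuming (hypothetically) that $Y_1(u)$ is known on an equal-area grid of points, so that the share of grid points in a stratum plays the role of $p_{1,gk}$; the values are made up.

```python
from statistics import fmean, pvariance

# Hypothetical values of Y_1(u) (m^3/ha) on an equal-area grid, grouped by Poisson-stratum.
values_by_stratum = {
    "k=1": [120.0, 80.0, 95.0, 140.0],
    "k=2": [300.0, 260.0],
    "k=3": [10.0, 0.0, 25.0, 5.0, 20.0, 0.0],
}

all_values = [v for vals in values_by_stratum.values() for v in vals]
n = len(all_values)
mean_g = fmean(all_values)                                         # \bar Y_g

direct = pvariance(all_values)                                     # left-hand side of (2.4)
within = sum(len(v) / n * pvariance(v) for v in values_by_stratum.values())
between = sum(len(v) / n * (fmean(v) - mean_g) ** 2 for v in values_by_stratum.values())
print(direct, within + between)
```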
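As $Y_1(u)$ is a Horvitz-Thompson estimate, its variance is given by
$$V(Y_1(u) \mid u \in F_{1,gk}) = \frac{1}{\lambda^2(F_{1,gk})} \left( \sum_{i \in F_{1,gk}} \frac{Y_i^2 (1 - \pi_i^{1,gk})}{\pi_i^{1,gk}} + \sum_{i \neq j \in F_{1,gk}} \frac{\pi_{ij}^{1,gk} - \pi_i^{1,gk}\pi_j^{1,gk}}{\pi_i^{1,gk}\pi_j^{1,gk}} Y_i Y_j \right) \tag{2.5}$$
where $\pi_{ij}^{1,gk} = E(I_i(u) I_j(u) \mid u \in F_{1,gk}) = \lambda(K_i \cap K_j \cap F_{1,gk})/\lambda(F_{1,gk})$ are the pairwise conditional inclusion probabilities.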
Following Mandallaz and Ye (1999), we define the anticipated variance of $Y_1(u)$ as the average $E_\omega V(Y_1(u) \mid u \in F_{1,gk})$, where $\omega$ stands for the random location of the trees in $F_{1,gk}$ under the Poisson model. Hence, all the probabilities occurring in (2.5) depend on the location of the trees. However, $\lambda(F_{1,gk})\pi_i^{1,gk} \approx \lambda(K_i)$ as long as the $i$-th tree is in the "interior" of $F_g$. The assumption of negligible boundary effects states that
$$\lambda(F_g)\pi_i^g(\omega) = \lambda(F_{1,gk})\pi_i^{1,gk}(\omega) = \lambda(F_{2,gk})\pi_i^{2,gk}(\omega) \approx \lambda(K_i) \tag{2.6}$$
Under this assumption we can calculate $E_\omega V(Y_1(u) \mid u \in F_{1,gk})$. We need only $E_\omega E_u(I_i(u,\omega) I_j(u,\omega) \mid u \in F_{1,gk})$. Interchanging the order of integration, and since for a given $u$ the random variables $I_i(u,\omega)$ are independent, we obtain, neglecting boundary effects, $E_\omega \pi_{ij}^{1,gk}(\omega) = \pi_i^{1,gk}\pi_j^{1,gk}$. Hence, when taking the anticipated variance in (2.5), the second term vanishes. The first term in (2.5) is
$$\frac{1}{\lambda^2(F_{1,gk})} \sum_{i \in F_{1,gk}} \frac{Y_i^2}{\pi_i^{1,gk}} - \frac{1}{\lambda^2(F_{1,gk})} \sum_{i \in F_{1,gk}} Y_i^2$$
As $\lambda(F_{1,gk}) \to \infty$ the second term tends to zero, since the number of trees grows proportionally to $\lambda(F_{1,gk})$, but the first does not. Substituting this into (2.4), noting that $p_{1,gk}/\lambda(F_{1,gk}) = 1/\lambda(F_g)$ and also that, neglecting boundary effects, $\lambda(F_{1,gk})\pi_i^{1,gk} = \lambda(F_g)\pi_i^g$, we obtain the following result:
Under simple random sampling the asymptotic anticipated variance is given by
$$E_\omega V(Y_1(u) \mid u \in F_g) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^g} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \tag{2.7}$$
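Under (2.6), $\lambda(F_g)\pi_i^g = \lambda(K_i)$, so the first term of (2.7) equals $\frac{1}{\lambda(F_g)}\sum_{i} Y_i^2/\lambda(K_i)$. The sketch below evaluates both terms of (2.7) for a hypothetical domain; the function name, the tree records and the stratum areas are invented for illustration.

```python
import math
from collections import defaultdict


def anticipated_variance_srs(trees, stratum_area_ha):
    """Return the two terms of (2.7) for one domain F_g.

    trees: list of (poisson_stratum, y_i, circle_radius_m)
    stratum_area_ha: lambda(F_{1,gk}) in ha for every Poisson-stratum k
    """
    area_g = sum(stratum_area_ha.values())                     # lambda(F_g)
    # (1 / lambda^2(F_g)) * sum Y_i^2 / pi_i^g, with lambda(F_g) * pi_i^g = lambda(K_i)
    first = sum(y * y / (math.pi * r * r / 10_000.0) for _, y, r in trees) / area_g
    totals = defaultdict(float)
    for k, y, _ in trees:
        totals[k] += y
    mean_g = sum(totals.values()) / area_g                     # \bar Y_g
    # p_{1,gk} * (\bar Y_{1,gk} - \bar Y_g)^2 summed over the Poisson-strata
    between = sum(a_k / area_g * (totals[k] / a_k - mean_g) ** 2
                  for k, a_k in stratum_area_ha.items())
    return first, between


first, between = anticipated_variance_srs(
    [("a", 2.0, 10.0), ("a", 2.0, 10.0)], {"a": 1.0, "b": 1.0}
)
print(first, between)   # stratum "b" is an empty Poisson-stratum
```

The second term is the lack of homogeneity between Poisson-strata; it does not shrink as the inclusion circles grow, whereas the first term does.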
Under the assumption of negligible boundary effects (2.6) we have $Y_0(u) = Y_1(u) = Y_2(u) := Y(u)$. The mathematics of boundary effects is beyond exact general results; for a semi-quantitative approach based on integral geometry see Mandallaz (1997). Suffice it to say that boundary effects are negligible if the inclusion circles are small with respect to the strata, which is common sense. We emphasize once again that this assumption is only required in the calculation of the anticipated variance, in order to derive the optimal sampling schemes, and not for the calculation of the estimates based on actual inventory data. Hence, we assume from now on that the anticipated variance under simple random sampling is given by
$$E_\omega V(Y_k(u) \mid u \in F_g) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^g} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2, \qquad k = 0, 2 \tag{2.8}$$
The corresponding formula for cluster sampling rests upon the same arguments and is given in (4.4), with the proof in the Appendix.
In two-stage sampling let us denote by $s_2(u)$ the set of first-stage trees selected at the point $u$. On each of the selected trees $i \in s_2(u)$ one obtains an approximation $Y_i^*$ of the exact value $Y_i$. In the finite set $s_2(u)$ one draws a subsample $s_3(u) \subset s_2(u)$ of trees. For each tree $i \in s_3(u)$ one measures the exact variable $Y_i$. Let us now define the second-stage indicator variable
$$J_i(u) = \begin{cases} 1 & \text{if } i \in s_3(u) \\ 0 & \text{if } i \notin s_3(u) \end{cases} \tag{2.9}$$
$p_i = P(J_i(u) = 1 \mid I_i(u) = 1)$ are the second-stage conditional inclusion probabilities. The trees in $s_2(u)$ are sampled independently of each other, so that $p_{ij} = P(J_i(u)J_j(u) = 1 \mid I_i(u)I_j(u) = 1) = p_i p_j$. The estimated local density $Y_0^*(u)$ at the point $u$ is defined as
$$Y_0^*(u) = \frac{1}{\lambda(F_g)} \left( \sum_{i \in F_g} \frac{Y_i^* I_i(u)}{\pi_i^g} + \sum_{i \in s_3(u)} \frac{R_i}{\pi_i^g p_i} \right) \tag{2.10}$$
where $R_i = Y_i - Y_i^*$ are the residuals at the tree level. Adjusting for edge effects at the working-strata boundaries we set, as in (2.3),
$$Y_2^*(u) = \frac{1}{\lambda(F_{2,gk})} \left( \sum_{i \in F_{2,gk}} \frac{Y_i^* I_i(u)}{\pi_i^{2,gk}} + \sum_{i \in s_3(u)} \frac{R_i}{\pi_i^{2,gk} p_i} \right) \tag{2.11}$$
whenever $u$ and $i$ are in the same working-stratum $F_{2,gk}$. By construction we have for (2.10) and (2.11)
$$E(Y_k^*(u) \mid u) = Y_k(u), \qquad k = 0, 2 \tag{2.12}$$
Since $Y_0^*(u)$ is a Horvitz-Thompson estimate of $Y_0(u)$ with independent drawings, we have according to (2.10) and (2.11)
$$V(Y_0^*(u) \mid u) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2 I_i(u)(1 - p_i)}{(\pi_i^g)^2 p_i} := V_0(u) \tag{2.13}$$
$$V(Y_2^*(u) \mid u) = \frac{1}{\lambda^2(F_{2,gk})} \sum_{i \in F_{2,gk}} \frac{R_i^2 I_i(u)(1 - p_i)}{(\pi_i^{2,gk})^2 p_i} := V_2(u) \tag{2.14}$$
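Under the assumption of negligible boundary effects we have for both versions the important result
$$V_g := E_{u \in F_g} V_k(u) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \left( \frac{R_i^2}{\pi_i^g p_i} - \frac{R_i^2}{\pi_i^g} \right), \qquad k = 0, 2 \tag{2.15}$$
Using, for $k = 0, 2$,
$$V_{u \in F_g}(Y_k^*(u)) = V_{u \in F_g} E(Y_k^*(u) \mid u) + E_{u \in F_g} V(Y_k^*(u) \mid u) = V_{u \in F_g}(Y_k(u)) + V_g \tag{2.16}$$
we obtain by the previous results
$$E_\omega V(Y_k^*(u)) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2 - R_i^2}{\pi_i^g} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 + \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^g p_i} \tag{2.17}$$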
To simplify this expression we assume that the prediction model $\mathcal{M}$ at the tree level satisfies
$$Y_i = Y_i^* + R_i, \qquad E_{\mathcal{M}} R_i = \mathrm{COV}_{\mathcal{M}}(Y_i^*, R_i) = 0 \quad \forall i \tag{2.18}$$
As a consequence $E_{\mathcal{M}}(Y_i^2 - R_i^2) = Y_i^{*2}$. Hence, by taking $E_{\mathcal{M}} E_\omega V(Y_k^*(u))$ we can replace the term $Y_i^2 - R_i^2$ by $Y_i^{*2}$. To simplify the notation we shall henceforth omit the expectation with respect to the model $\mathcal{M}$ and write
$$E_\omega V(Y_k^*(u)) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2}}{\pi_i^g} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 + \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^g p_i} \tag{2.19}$$
Since we are dealing with bounded random variables only, we have $\frac{1}{\lambda(F_g)} \sum_{i \in F_g} Y_i - \frac{1}{\lambda(F_g)} \sum_{i \in F_g} Y_i^* \to 0$ in probability and in the mean as $\lambda(F_g) \to \infty$. Also, we shall henceforth use $Y_2(u)$, $Y_2^*(u)$ and $V_2(u)$ at the points $u = x + e_l$ of the cluster $x$. To simplify the notation we write $Y(u)$, $Y^*(u)$ and $V(u)$.
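The three terms of (2.19) can be evaluated directly once tree-level quantities are available. A hypothetical sketch, assuming negligible boundary effects ($\lambda(F_g)\pi_i^g = \lambda(K_i)$) and plugging in the realised $R_i^2$ in place of their model expectations; records, names and numbers are invented.

```python
from collections import defaultdict


def anticipated_variance_two_stage(trees, stratum_area_ha):
    """Return the three terms of (2.19) for one domain F_g.

    trees: list of (poisson_stratum, Y*_i, R_i, lambda(K_i) in ha, p_i)
    stratum_area_ha: lambda(F_{1,gk}) in ha for every Poisson-stratum k
    """
    area_g = sum(stratum_area_ha.values())                          # lambda(F_g)
    first = sum(ys * ys / a for _, ys, _, a, _ in trees) / area_g     # Y*_i^2 term
    second_stage = sum(r * r / (a * p) for _, _, r, a, p in trees) / area_g
    totals = defaultdict(float)
    for k, ys, r, _, _ in trees:
        totals[k] += ys + r                                          # Y_i = Y*_i + R_i, cf. (2.18)
    mean_g = sum(totals.values()) / area_g
    between = sum(a_k / area_g * (totals[k] / a_k - mean_g) ** 2
                  for k, a_k in stratum_area_ha.items())
    return first, between, second_stage


terms = anticipated_variance_two_stage(
    [("a", 3.0, 1.0, 0.05, 0.5), ("b", 1.0, -1.0, 0.05, 1.0)], {"a": 1.0, "b": 1.0}
)
print(terms, sum(terms))
```

Lowering $p_i$ only inflates the last term, which is the trade-off against second-stage measurement costs exploited later in the optimization.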
2.2 Post-Stratified Estimates
We consider only two-phase two-stage procedures, since the two-phase one-stage procedure is a special case. Within each domain $F_g$ one obtains, according to Mandallaz and Ye (1999), the following two-phase two-stage point estimate:
$$\hat{Y}_g^* = \frac{\sum_{x \in s_{1,g}} M_g(x) \hat{Y}_{c,g}(x)}{\sum_{x \in s_{1,g}} M_g(x)} + \frac{\sum_{x \in s_{2,g}} M_g(x) R_{c,g}^*(x)}{\sum_{x \in s_{2,g}} M_g(x)} \tag{2.20}$$
Recall that the predictions $\hat{Y}_{c,g}(x)$ at the cluster level are based, by (1.2), on the working-strata. The relative surface areas can be estimated by the overall proportion of points falling within a domain, i.e. by
$$\hat{p}_g = \frac{\sum_{x \in s_1} M_g(x)}{\sum_{x \in s_1} M(x)} = \frac{\sum_{x \in s_{1,g}} M_g(x)}{\sum_{x \in s_1} M(x)} \tag{2.21}$$
Intuitively it is clear that one can combine the domain estimates into a global estimate by setting
$$\hat{Y}^* = \sum_{g=1}^{D} \hat{p}_g \hat{Y}_g^* \tag{2.22}$$
Using (2.20) and (2.21) this can be rewritten as
$$\hat{Y}^* = \frac{\sum_{x \in s_1} M(x) \hat{Y}_c(x)}{\sum_{x \in s_1} M(x)} + \sum_{g=1}^{D} \hat{p}_g \left( \frac{\sum_{x \in s_{2,g}} M_g(x) R_{c,g}^*(x)}{\sum_{x \in s_{2,g}} M_g(x)} \right) \tag{2.23}$$
where we have set
$$\hat{Y}_c(x) = \frac{1}{M(x)} \sum_{l=1}^{M} I_F(x + e_l) \hat{Y}(x + e_l) \tag{2.24}$$
Thus (2.23) is the mean of all the predictions plus the mean residual within each domain, weighted by the estimated surface areas of the domains; this is a straightforward generalization of results given in Mandallaz and Ye (1999). Before calculating the design-based and anticipated variances of (2.22) we return to the calculation of the predictions within each domain. This technical section is taken from Mandallaz (1997).
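The equivalence of (2.22) and (2.23) can be checked numerically. In the hypothetical sketch below every first-phase cluster lies in a single domain (Assumption I) and is stored as $(x, g, M_g(x), \hat Y_{c,g}(x))$; the second-phase residuals $R^*_{c,g}(x)$ are given for a subsample. All data are invented.

```python
from collections import defaultdict

# Hypothetical first-phase clusters: (cluster id x, domain g, M_g(x), \hat Y_{c,g}(x))
S1 = [(1, 1, 4, 310.0), (2, 1, 3, 280.0), (3, 1, 4, 0.0), (4, 2, 2, 150.0),
      (5, 2, 4, 420.0), (6, 2, 1, 90.0), (7, 1, 4, 260.0)]
# Hypothetical second-phase subsample: cluster id x -> R*_{c,g}(x)
S2 = {1: 25.0, 3: -10.0, 5: 40.0, 6: -5.0}

sum_m1, sum_my1, sum_m2, sum_mr2 = (defaultdict(float) for _ in range(4))
for x, g, m, yc_hat in S1:
    sum_m1[g] += m
    sum_my1[g] += m * yc_hat
    if x in S2:
        sum_m2[g] += m
        sum_mr2[g] += m * S2[x]

total_m = sum(sum_m1.values())
y_g = {g: sum_my1[g] / sum_m1[g] + sum_mr2[g] / sum_m2[g] for g in sum_m1}  # (2.20)
p_hat = {g: sum_m1[g] / total_m for g in sum_m1}                           # (2.21)

est_222 = sum(p_hat[g] * y_g[g] for g in y_g)                                # (2.22)
mean_pred = sum(sum_my1.values()) / total_m        # M-weighted mean of all predictions
est_223 = mean_pred + sum(p_hat[g] * sum_mr2[g] / sum_m2[g] for g in y_g)   # (2.23)
print(est_222, est_223)
```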
2.3 Internal Linear Models
Since the predictions are defined within each domain, we consider a single domain in this section. As far as the design-based variance is concerned, the predictions can be arbitrary and, in particular, obtained from existing models not based on the data of the inventory at hand (so-called external models). In particular, the predictions could be biased, but this is irrelevant since the bias is removed by adding the mean residual. Usually, however, one has to construct the predictions with a model based on the data from the inventory at hand (so-called internal models). We follow Mandallaz (1991).
We consider one-stage sampling first. At the point level the theoretical predictions are obtained via a design-based linear model of the form
$$\hat{Y}_\beta(x) = \bar{Y} + \beta^t (Z(x) - \bar{Z}), \qquad \beta \in \mathbb{R}^p,\ Z(x) \in \mathbb{R}^p$$
where $Z(x)$ is the vector of auxiliary variables available in the large sample. Here $Z(x)$ is a vector of 0/1 indicator variables based on the working-strata. At the cluster level we set accordingly
$$\hat{Y}_{\beta,c}(x) = \bar{Y} + \beta^t (Z_c(x) - \bar{Z}), \qquad \beta \in \mathbb{R}^p,\ Z_c(x) \in \mathbb{R}^p$$
These predictions are purely theoretical, since $\bar{Y}$ and $\bar{Z}$ are unknown and $\beta$ is, for the time being, arbitrary. These definitions ensure that
$$\frac{E_{x \in A} M(x) \hat{Y}_{\beta,c}(x)}{E_{x \in A} M(x)} = \bar{Y} \quad \forall \beta$$
The theoretical residuals are defined as $R_{\beta,c}(x) = Y_c(x) - \hat{Y}_{\beta,c}(x)$. By construction the theoretical residuals have zero mean, namely
$$\bar{R}_\beta = \frac{E_{x \in A} M(x) R_{\beta,c}(x)}{E_{x \in A} M(x)} = 0 \tag{2.25}$$
We assume that the model has an intercept term, which is generally the case in practice. This means that one component of $Z(x)$, say the first, $Z_1(x)$, is constant and equal to 1. We therefore partition the vectors as $Z(x)^t = (1, Z^*(x)^t)$ and $\beta^t = (\beta_1, \beta^{*t})$, where $t$ is the transposition operator. In two-phase sampling the optimal choice of the unknown parameter vector $\beta$ is determined by minimizing the residual variance term, i.e.
$$\min_\beta E_x M^2(x) \big(Y_c(x) - \bar{Y} - \beta^t (Z_c(x) - \bar{Z})\big)^2$$
Differentiating with respect to $\beta$ leads to the normal equations for the optimal theoretical choice $\beta_0$:
$$E_x M^2(x) (Z_c(x) - \bar{Z})(Z_c(x) - \bar{Z})^t \beta_0 = E_x M^2(x) (Y_c(x) - \bar{Y})(Z_c(x) - \bar{Z}) \tag{2.26}$$
It follows from the normal equations that the optimal theoretical residuals and predictions are uncorrelated in the sense that
$$E_{x \in A} M^2(x) (R_{\beta_0,c}(x) - \bar{R})(\hat{Y}_{\beta_0,c}(x) - \bar{Y}) = 0 \tag{2.27}$$
From this we obtain the following decomposition, which plays a key role in the calculation of the anticipated variance:
$$E_{x \in A} M^2(x) (R_c(x) - \bar{R})^2 = E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2 - E_{x \in A} M^2(x) (\hat{Y}_c(x) - \bar{Y})^2 \tag{2.28}$$
The $(p, p)$ matrix on the left-hand side of (2.26) is singular for models with an intercept term, since in this case the first row and the first column are identically zero. An elegant solution is to use generalized inverses, see Mandallaz (1991), which unfortunately is not always a suitable procedure when using standard statistical software packages. We therefore provide an alternative approach. Rewriting the normal equations in terms of the reduced vectors $Z_c^*(x)$ and $\beta^*$, we see that the general solution of the original normal equations is given by
$$E_x M^2(x) (Z_c^*(x) - \bar{Z}^*)(Z_c^*(x) - \bar{Z}^*)^t \beta_0^* = E_x M^2(x) (Y_c(x) - \bar{Y})(Z_c^*(x) - \bar{Z}^*)$$
The optimal $\beta_0^t$ is therefore equal to $(\beta_1, \beta_0^{*t})$, where $\beta_1$ is arbitrary.
In practice the theoretical normal equations are obviously not available, and we solve instead their sample versions, i.e.
$$\sum_{x \in s_2} M^2(x) (Z_c^*(x) - \hat{Z}_2^*)(Z_c^*(x) - \hat{Z}_2^*)^t \hat{\beta}_0^* = \sum_{x \in s_2} M^2(x) (Y_c(x) - \hat{Y}_2)(Z_c^*(x) - \hat{Z}_2^*)$$
In other words, $\hat{\beta}_0^*$ is obtained by linear regression, with weights $M^2(x)$, of the centered response variable $Y_c(x) - \hat{Y}_2$ on the centered explanatory variables without the intercept term, i.e. on $Z_c^*(x) - \hat{Z}_2^*$. The empirical predictions are given by
$$\hat{Y}_c(x) = \hat{Y}_2 + \hat{\beta}_0^{*t} (Z_c^*(x) - \hat{Z}_2^*)$$
Most software packages give the predictions, say $P_c(x)$, of $\hat{Y}_c(x) - \hat{Y}_2$ directly, so that one can also write $\hat{Y}_c(x) = \hat{Y}_2 + P_c(x)$. The empirical residuals are then given by
$$R_c(x) = Y_c(x) - \hat{Y}_c(x) = (Y_c(x) - \hat{Y}_2) - \hat{\beta}_0^{*t} (Z_c^*(x) - \hat{Z}_2^*)$$
which by construction satisfy
$$\hat{R}_2 = \frac{\sum_{x \in s_2} M(x) R_c(x)}{\sum_{x \in s_2} M(x)} = 0$$
Hence, the point estimate is finally given by
$$\hat{Y}_{c,reg} = \hat{Y}_2 + \hat{\beta}_0^{*t} (\hat{Z}_1^* - \hat{Z}_2^*) = \hat{Y}_2 + \hat{\beta}_0^t (\hat{Z}_1 - \hat{Z}_2)$$
where $\hat{\beta}_0^t = (\beta_1, \hat{\beta}_0^{*t})$ and $\beta_1$ is arbitrary. Note that the first component of $\hat{Z}_1 - \hat{Z}_2$ is zero and that no statistically relevant quantity depends on the arbitrary choice of $\beta_1$. In short, the optimal design-based point estimate and its estimated variance can be obtained by standard regression procedures.
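With a single auxiliary variable besides the intercept, the sample normal equations above reduce to one scalar equation. The hypothetical sketch below centres $Y_c(x)$ and $Z_c^*(x)$ by their $M(x)$-weighted means over $s_2$ (the centring under which $\hat R_2 = 0$ holds by construction), computes $\hat\beta_0^*$ with weights $M^2(x)$, forms $\hat Y_{c,reg}$, and checks the zero mean residual and the sample analogue of (2.28). Data and the first-phase mean $\hat Z_1^*$ are invented.

```python
# Hypothetical second-phase clusters: (M(x), Z*_c(x), Y_c(x)); here Z*_c(x) is taken to be
# the share of the cluster's points lying in one working-stratum (an illustrative choice).
S2 = [(4, 1.00, 350.0), (3, 0.67, 280.0), (4, 0.25, 120.0), (2, 0.50, 200.0),
      (4, 0.00, 60.0), (1, 1.00, 400.0), (4, 0.75, 300.0)]
Z1_STAR = 0.55   # hypothetical first-phase estimate \hat Z*_1


def wmean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


m = [c[0] for c in S2]
z = [c[1] for c in S2]
y = [c[2] for c in S2]
y2 = wmean(y, m)                              # \hat Y_2 (M-weighted)
z2 = wmean(z, m)                              # \hat Z*_2 (M-weighted)
w = [mi * mi for mi in m]                     # regression weights M^2(x)
zc = [zi - z2 for zi in z]                    # centred explanatory variable
yc = [yi - y2 for yi in y]                    # centred response
beta = (sum(wi * a * b for wi, a, b in zip(w, zc, yc))
        / sum(wi * a * a for wi, a in zip(w, zc)))          # \hat beta*_0

pred = [y2 + beta * a for a in zc]            # \hat Y_c(x)
resid = [yi - pr for yi, pr in zip(y, pred)]  # R_c(x)
y_reg = y2 + beta * (Z1_STAR - z2)            # \hat Y_{c,reg}
print(round(beta, 2), round(y_reg, 2))
```

With several working-strata, $Z_c^*(x)$ becomes a vector and the scalar division is replaced by solving the weighted normal equations; one indicator must then be dropped so that the reduced matrix is not singular.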
It is also worth noting that direct regression of $Y_c(x)$ on $Z_c(x)$ with weights $M^2(x)$ leads to residuals satisfying $\sum_{x \in s_2} M^2(x) R_c(x) = 0$ instead of the more intuitive condition $\sum_{x \in s_2} M(x) R_c(x) = 0$. In simple cluster sampling $M(x) \equiv 1$, so that in this case one could also use ordinary least squares. Further modifications are possible by estimating the matrix
$$E_x M^2(x) (Z_c^*(x) - \bar{Z}^*)(Z_c^*(x) - \bar{Z}^*)^t$$
in the large sample; see Mandallaz (1991) for details.
In two-phase two-stage sampling the theoretical results remain the same, since the extra term due to two-stage sampling does not depend on the prediction model used. For estimation purposes it suffices to replace $Y_c(x)$ everywhere by $Y_c^*(x)$, since this is precisely what the variance estimate does (Mandallaz and Ye (1999)). Finally, one can also use standard least squares of the $Y(x_l)$ or $Y^*(x_l)$ on the $Z(x_l)$, that is, ignoring the cluster structure of the data, and then define the predictions and residuals at the cluster level. In this case we still have a zero mean residual (2.25), but (2.27) may be violated. However, extensive simulations have shown that the differences between the point and variance estimates obtained with the various methods, including the model-based technique presented in Mandallaz (1991), are usually small, and that the sampling error of $\hat{\beta}$ can be neglected (that this holds asymptotically is proved in Mandallaz (1991)). In our case the standard least squares estimates based on a simple ANOVA model with the working-strata as groups have a clear intuitive interpretation (which is not the case for the model at the cluster level): if $x_0 + e_l \in F_{2,gk}$ for a given $x_0 \in s_1$, then $\hat{Y}(x_0 + e_l)$ is the ordinary mean, i.e. ignoring the cluster structure, of all the estimated local densities $Y^*(x_m)$ with $x_m \in F_{2,gk}$, i.e. at points in the same working-stratum. This is the estimate we shall use, while assuming that (2.27) holds exactly when calculating the anticipated variance. Note that this is true for simple random sampling ($M(x) \equiv 1$). Intuitively, it will hold approximately whenever the design-based intra-cluster correlation of the residuals is small. Furthermore, one can assume asymptotically ($n_2 \to \infty$) that for $x_0 + e_l \in F_{2,gk}$ the prediction $\hat{Y}(x_0 + e_l)$ is equal to the true working-stratum mean $\bar{Y}_{2,gk} = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} Y_i$.
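The ANOVA predictions described above can be sketched as follows: for each working-stratum, the prediction is the ordinary (unweighted) mean of the estimated local densities $Y^*(x_m)$ at the terrestrial points in that stratum, ignoring the cluster structure. Stratum labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical terrestrial points x_m: (working-stratum, Y*(x_m) in m^3/ha)
TERRESTRIAL = [("conifer", 420.0), ("conifer", 380.0), ("conifer", 455.0),
               ("broadleaf", 250.0), ("broadleaf", 310.0),
               ("open", 40.0), ("open", 0.0), ("open", 65.0)]


def anova_predictions(points):
    """Ordinary mean of Y*(x_m) within each working-stratum (clusters ignored)."""
    groups = defaultdict(list)
    for stratum, y_star in points:
        groups[stratum].append(y_star)
    return {stratum: fmean(values) for stratum, values in groups.items()}


pred = anova_predictions(TERRESTRIAL)
# Every first-phase point x_0 + e_l in a working-stratum receives that stratum's mean.
residuals = [y_star - pred[stratum] for stratum, y_star in TERRESTRIAL]
print(pred, sum(residuals))
```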
Chapter 3
Design-Based Aspects
3.1 Calculation of the Design-Based Variance
We now derive the design-based variance of the overall estimate (2.22) or, equivalently, of (2.23). We calculate it conditionally on the sample sizes $n_{1,g}$ and $n_{2,g}$. Also, all the random variables occurring are bounded, so that convergence in probability implies convergence in the mean. For instance, one can write (the last equality follows from Assumptions II and III)
$$E_{x \in A}(\hat{p}_g) = E_{x \in A} \frac{\sum_{x \in s_1} M_g(x)}{n_1 E_{x \in A} M(x) \big(1 + o_p(n_1^{-1/2})\big)} = \frac{E_{x \in A} M_g(x)}{E_{x \in A} M(x)} + o(n_1^{-1/2}) = p_g + o(n_1^{-1/2})$$
We shall use such approximations repeatedly, without explicit mention, and keep only the first-order terms. For the expectation and variance operators $E$ and $V$, the index 1 refers to the first phase, 2 to the second phase and 3 to the second stage. Hence, using (2.23), we have
bY := E3j1;2 bY � =
Px2s1 M(x)bYc(x)P
x2s1 M(x)+
DXg=1
pg
Px2s2;g Mg(x)Rc;g(x)P
x2s2;g Mg(x)
!(3.1)
since E3Y�(x+ el) = Y (x+ el). Furthermore,
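$$E_{2,3|1}\hat{Y}^* = \frac{\sum_{x\in s_1} M(x)\hat{Y}_c(x)}{\sum_{x\in s_1} M(x)} + \sum_{g=1}^{D}\hat{p}_g\left(\frac{\sum_{x\in s_{1,g}} M_g(x)R_{c,g}(x)}{\sum_{x\in s_{1,g}} M_g(x)}\right)$$

and, by using the definition (2.21), also

$$E_{2,3|1}\hat{Y}^* = \frac{\sum_{x\in s_1} M(x)Y_c(x)}{\sum_{x\in s_1} M(x)} \tag{3.2}$$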
Thus, according to Mandallaz and Ye (1999), $\hat{Y}^*$ is asymptotically design-unbiased (ADU), with a bias of order $o(n_{2,g}^{-1})$ (it is even exactly unbiased for simple random sampling with an external model). To calculate the variance we first use the decomposition

$$V_{1,2,3}\hat{Y}^* = V_{1,2}(E_{3|1,2}\hat{Y}^*) + E_{1,2}(V_{3|1,2}\hat{Y}^*) \tag{3.3}$$
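To calculate the second term in (3.3) we need

$$V_{3|1,2}\left(\frac{\sum_{x\in s_{2,g}} M_g(x)R^*_{c,g}(x)}{\sum_{x\in s_{2,g}} M_g(x)}\right) = V_{3|1,2}\left(\frac{\sum_{x\in s_{2,g}} M_g(x)Y^*_{c,g}(x)}{\sum_{x\in s_{2,g}} M_g(x)}\right)$$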
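Since the second-stage samplings are independent, we have by (2.14)

$$V_{3|1,2}\left(\frac{\sum_{x\in s_{2,g}} M_g(x)Y^*_{c,g}(x)}{\sum_{x\in s_{2,g}} M_g(x)}\right) = \frac{\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}}\sum_{l=1}^{M} I_{F_g}(x+e_l)V(x+e_l)}{n_{2,g}\left(\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}} M_g(x)\right)^2}$$

Therefore

$$E_{2|1}(V_{3|1,2}\hat{Y}^*) = \sum_{g=1}^{D}\hat{p}_g^2\,\frac{1}{n_{2,g}}\,\frac{\frac{1}{n_{1,g}}\sum_{x\in s_{1,g}}\sum_{l=1}^{M} I_{F_g}(x+e_l)V(x+e_l)}{\left(\frac{1}{n_{1,g}}\sum_{x\in s_{1,g}} M_g(x)\right)^2}$$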
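Taking now the expectation with respect to the first phase, and since $\hat{p}_g^2 = p_g^2 + o(n_{1,g}^{-1})$, we obtain

$$E_1E_{2|1}V_{3|1,2}\hat{Y}^* = \sum_{g=1}^{D} p_g^2\,\frac{1}{n_{2,g}}\,\frac{E_gM_g(x)V_g}{E_g^2M_g(x)}$$

where $E_gM_g(x) = E_{x\in A_g}M_g(x)$ and, by (2.15),

$$V_g = \frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\left(\frac{R_i^2}{\pi_i^gp_i} - \frac{R_i^2}{\pi_i^g}\right)$$

We therefore have the intermediate result

$$E_{1,2}(V_{3|1,2}\hat{Y}^*) = \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}E_gM_g(x)}\,\frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\left(\frac{R_i^2}{\pi_i^gp_i} - \frac{R_i^2}{\pi_i^g}\right) \tag{3.4}$$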
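We still need $V_{1,2}(E_{3|1,2}\hat{Y}^*) = V_{1,2}\hat{Y} = V_1E_{2|1}\hat{Y} + E_1V_{2|1}\hat{Y}$. According to (3.2) the first term is

$$V_1\left(\frac{\sum_{x\in s_1} M(x)Y_c(x)}{\sum_{x\in s_1} M(x)}\right) = \frac{1}{n_1E^2_{x\in A}M(x)}\,E_{x\in A}M^2(x)\bigl(Y_c(x) - \bar{Y}\bigr)^2$$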
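For the second term we have

$$V_{2|1}\hat{Y} = V_{2|1}\left(\sum_{g=1}^{D}\hat{p}_g\,\frac{\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}} M_g(x)R_{c,g}(x)}{\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}} M_g(x)}\right)$$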
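Under Assumption I, and since the second-phase drawings are independent between domains, this is equivalent to

$$\sum_{g=1}^{D}\hat{p}_g^2\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\frac{1}{n_{2,g}}\,\frac{\frac{1}{n_{1,g}}\sum_{x\in s_{1,g}} M_g^2(x)\bigl(R_{c,g}(x) - \bar{R}_{1,g}\bigr)^2}{\left(\frac{1}{n_{1,g}}\sum_{x\in s_{1,g}} M_g(x)\right)^2}$$

where $\bar{R}_{1,g} = \sum_{x\in s_{1,g}} M_g(x)R_{c,g}(x)\big/\sum_{x\in s_{1,g}} M_g(x)$. Therefore we obtain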
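$$E_1V_{2|1}\hat{Y} = \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}}\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\frac{E_gM_g^2(x)\bigl(R_{c,g}(x) - \bar{R}_g\bigr)^2}{E_g^2M_g(x)}$$

Collecting the pieces, we see that under Assumption I the asymptotic variance is given by

$$\begin{aligned}
V_{1,2,3}(\hat{Y}^*) ={}& \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}}\,\frac{1}{E_gM_g(x)}\,\frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\left(\frac{R_i^2}{\pi_i^gp_i} - \frac{R_i^2}{\pi_i^g}\right)\\
&+ \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}}\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\frac{E_gM_g^2(x)\bigl(R_{c,g}(x) - \bar{R}_g\bigr)^2}{E_g^2M_g(x)}\\
&+ \frac{1}{n_1}\,\frac{EM^2(x)\bigl(Y_c(x) - \bar{Y}\bigr)^2}{E^2M(x)}
\end{aligned} \tag{3.5}$$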
This is the result given in Mandallaz and Lanz (2001). The first term is the second-stage variance within the domains, the second is the residual variance within the domains and the last term is the variance one would get if the true local densities were known at all $n_1$ points.
3.2 Variance Estimation
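We shall now derive an ADU estimate of the asymptotic theoretical variance (3.5). By analogy with the case of one domain treated in Mandallaz (1997), we can expect that the first two terms in (3.5) can be partially estimated by

$$A := \sum_{g=1}^{D}\frac{\hat{p}_g^2}{n_{2,g}\bar{M}_g^2}\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}} M_g^2(x)\bigl(R^*_{c,g}(x) - \bar{R}^*_{2,g}\bigr)^2$$

with $\bar{R}^*_{2,g} = \sum_{x\in s_{2,g}} M_g(x)R^*_{c,g}(x)\big/\sum_{x\in s_{2,g}} M_g(x)$ and $\bar{M}_g = \frac{1}{n_{1,g}}\sum_{x\in s_{1,g}} M_g(x)$. Likewise, if all the $Y^*_c(x)$ for $x\in s_1$ were available, the last term in (3.5) could be partially estimated by

$$\frac{1}{n_1^2\bar{M}_1^2}\sum_{x\in s_1} M^2(x)\bigl(Y^*_c(x) - \hat{Y}^*\bigr)^2$$

with $\bar{M}_1 = \frac{1}{n_1}\sum_{x\in s_1} M(x)$. Under Assumption I this is equal to

$$\frac{1}{n_1^2\bar{M}_1^2}\sum_{g=1}^{D}\sum_{x\in s_{1,g}} M_g^2(x)\bigl(Y^*_{c,g}(x) - \hat{Y}^*\bigr)^2$$

This can be estimated by using Horvitz-Thompson estimates within each $s_{1,g}$:

$$B := \frac{1}{n_1^2\bar{M}_1^2}\sum_{g=1}^{D}\frac{n_{1,g}}{n_{2,g}}\sum_{x\in s_{2,g}} M_g^2(x)\bigl(Y^*_{c,g}(x) - \hat{Y}^*\bigr)^2$$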
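We therefore set

$$\hat{V}(\hat{Y}^*) = \sum_{g=1}^{D}\frac{\hat{p}_g^2}{n_{2,g}\bar{M}_g^2}\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\frac{1}{n_{2,g}}\sum_{x\in s_{2,g}} M_g^2(x)\bigl(R^*_{c,g}(x) - \bar{R}^*_{2,g}\bigr)^2 + \frac{1}{n_1^2\bar{M}_1^2}\sum_{g=1}^{D}\frac{n_{1,g}}{n_{2,g}}\sum_{x\in s_{2,g}} M_g^2(x)\bigl(Y^*_{c,g}(x) - \hat{Y}^*\bigr)^2 \tag{3.6}$$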
We shall now prove that (3.6) is indeed an ADU estimate of (3.5), provided Assumptions I-III hold. To this end we shall asymptotically replace all the point estimates appearing in (3.6) by their true values (e.g. $\bar{M}_g$ by $E_gM_g(x)$, $\hat{Y}^*$ by $\bar{Y}$ and so on) and use the relations

$$E_gM_g^2(x)\bigl(R^*_{c,g}(x) - \bar{R}_g\bigr)^2 = E_gM_g^2(x)\bigl(R_{c,g}(x) - \bar{R}_g\bigr)^2 + E_gM_g(x)V_g \tag{3.7}$$

$$E_gM_g^2(x)\bigl(Y^*_{c,g}(x) - \bar{Y}_g\bigr)^2 = E_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2 + E_gM_g(x)V_g \tag{3.8}$$

where $\bar{Y}_g = \frac{1}{\lambda(F_g)}\sum_{i\in F_g} Y_i$. They can be checked by direct calculations, similar to those of the second-stage variance given in the proof of (3.5). It follows from (3.7) that

$$E_{1,2,3}(A) = \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}E_g^2M_g(x)}\left(1 - \frac{n_{2,g}}{n_{1,g}}\right)\Bigl(E_gM_g^2(x)\bigl(R_{c,g}(x) - \bar{R}_g\bigr)^2 + E_gM_g(x)V_g\Bigr)$$
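To calculate $E_{1,2,3}(B)$ we set $n_{1,g} = n_1p_g$, write $Y^*_{c,g}(x) - \bar{Y} = (Y^*_{c,g}(x) - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})$, expand the square, and use $E_{3|1,2}Y^*_{c,g}(x) = Y_{c,g}(x)$ and (3.8). Collecting the terms yields

$$E_{1,2,3}(B) = \frac{1}{n_1E^2M(x)}\sum_{g=1}^{D} p_gE_gM_g(x)V_g + \frac{1}{n_1E^2M(x)}\sum_{g=1}^{D} p_gE_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}\bigr)^2$$

Under Assumptions I-III we can rewrite the last term of (3.5) by using the relation

$$E_{x\in A}M^2(x)\bigl(Y_c(x) - \bar{Y}\bigr)^2 = \sum_{g=1}^{D} p_gE_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}\bigr)^2$$

and check, by calculating $E(A) + E(B)$ with $n_{1,g} = n_1p_g$, that (3.6) is an ADU estimate of (3.5). In particular, the $V_g$ contribution $-p_g^2/n_{1,g} = -p_g/n_1$ coming from $E(A)$ cancels the $V_g$ contribution $p_g/n_1$ coming from $E(B)$, so that exactly the first term of (3.5) remains.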
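The estimate (3.6) is straightforward to compute once the cluster-level quantities are available. The following Python sketch assumes that, for each domain, the values $M_g(x)$, $R^*_{c,g}(x)$ and $Y^*_{c,g}(x)$ for $x\in s_{2,g}$ have already been computed, together with $\hat p_g$, $n_{1,g}$ and $\bar M_g$. The names `DomainSample` and `variance_estimate` are hypothetical and do not refer to any existing software.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class DomainSample:
    p_hat: float             # estimated relative area p_g of domain g
    n1: int                  # n_{1,g}: first-phase points in domain g
    M_bar: float             # mean of M_g(x) over s_{1,g}
    M: Sequence[float]       # M_g(x) for the n_{2,g} clusters in s_{2,g}
    R_star: Sequence[float]  # R*_{c,g}(x) for x in s_{2,g}
    Y_star: Sequence[float]  # Y*_{c,g}(x) for x in s_{2,g}


def variance_estimate(domains, n1, M1_bar, Y_hat_star):
    """Return A + B, the variance estimate (3.6) of the two-phase estimator."""
    A = 0.0
    B = 0.0
    for d in domains:
        M = np.asarray(d.M, dtype=float)
        R = np.asarray(d.R_star, dtype=float)
        Y = np.asarray(d.Y_star, dtype=float)
        n2 = M.size
        R_bar = np.sum(M * R) / np.sum(M)            # weighted mean residual over s_{2,g}
        ss_res = np.sum(M**2 * (R - R_bar) ** 2)
        A += d.p_hat**2 / (n2 * d.M_bar**2) * (1 - n2 / d.n1) * ss_res / n2
        # Horvitz-Thompson expansion factor n_{1,g}/n_{2,g} for the s_{2,g} sum
        B += d.n1 / n2 * np.sum(M**2 * (Y - Y_hat_star) ** 2)
    B /= n1**2 * M1_bar**2
    return float(A + B)
```

When every first-phase point is also a terrestrial point ($n_{2,g} = n_{1,g}$), the factor $1 - n_{2,g}/n_{1,g}$ makes the residual part $A$ vanish, as one would expect.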
Chapter 4
The Anticipated Variance
4.1 Mathematical Prerequisites
The calculation of the anticipated variance under cluster sampling is technically intricate. In order to have analytically tractable results we need two further assumptions.

Assumption IV:

$$E_{x\in A_g}\bigl[M_g(x)Y_{c,g}(x)\,\big|\,M_g(x)\bigr] = M_g(x)\bar{Y}_g$$

Assumption V:

$$E_{x\in A_g}\Bigl[\sum_{k=1}^{P_{2g}} M_{2,gk}(x)\bar{Y}_{2,gk}\,\Big|\,M_g(x)\Bigr] = M_g(x)\bar{Y}_g$$
Assumptions I-V are all satisfied under simple random sampling. Assumptions IV and V essentially say that the number of points per cluster falling within the forest area is not related to the values of the response variable. This could be violated if, for instance, values of timber volume at the forest edge differed markedly from values in the interior. In any case, these are boundary effects, specific to each given forest and therefore beyond general theory. Furthermore, they are negligible for very large forested areas. Following Mandallaz and Ye (1999) and Mandallaz and Lanz (2001), we introduce the inflation factors due to cluster sampling with respect to Poisson-strata, working-strata and domains according to

$$\begin{aligned}
(1+\theta_{1,g})\sigma^2_{1,g} &= \frac{V_{x\in A_g}\Bigl(\sum_{k=1}^{P_{1g}} M_{1,gk}(x)(\bar{Y}_{1,gk} - \bar{Y}_g)\Bigr)}{E_gM_g(x)}\\
(1+\theta_{2,g})\sigma^2_{2,g} &= \frac{V_{x\in A_g}\Bigl(\sum_{k=1}^{P_{2g}} M_{2,gk}(x)(\bar{Y}_{2,gk} - \bar{Y}_g)\Bigr)}{E_gM_g(x)}\\
(1+\tilde{\theta}_2)\tilde{\sigma}^2_2 &= \frac{V_{x\in A}\Bigl(\sum_{g=1}^{D}\sum_{k=1}^{P_{2g}} M_{2,gk}(x)(\bar{Y}_{2,gk} - \bar{Y})\Bigr)}{E_{x\in A}M(x)}\\
(1+\theta)\sigma^2 &= \frac{V_{x\in A}\Bigl(\sum_{g=1}^{D} M_g(x)(\bar{Y}_g - \bar{Y})\Bigr)}{E_{x\in A}M(x)}
\end{aligned} \tag{4.1}$$
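In the above equations the variances between Poisson-strata, between working-strata and between domains are defined as

$$\begin{aligned}
\sigma^2_{1,g} &= \sum_{k=1}^{P_{1g}} p_{1,gk}(\bar{Y}_{1,gk} - \bar{Y}_g)^2\\
\sigma^2_{2,g} &= \sum_{k=1}^{P_{2g}} p_{2,gk}(\bar{Y}_{2,gk} - \bar{Y}_g)^2\\
\tilde{\sigma}^2_2 &= \sum_{g=1}^{D}\sum_{k=1}^{P_{2g}} p_gp_{2,gk}(\bar{Y}_{2,gk} - \bar{Y})^2\\
\sigma^2 &= \sum_{g=1}^{D} p_g(\bar{Y}_g - \bar{Y})^2
\end{aligned} \tag{4.2}$$

The relative surface areas are $p_{1,gk} = \lambda(F_{1,gk})/\lambda(F_g)$ and $p_{2,gk} = \lambda(F_{2,gk})/\lambda(F_g)$. $M_{1,gk}(x)$ is the number of points of the cluster with origin $x$ falling within the Poisson-stratum $F_{1,gk}$; likewise $M_{2,gk}(x)$ for the working-stratum $F_{2,gk}$.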
We define the lack of fit in the $g$-th domain as

$$\psi^2_g = (1+\theta_{1,g})\sigma^2_{1,g} - (1+\theta_{2,g})\sigma^2_{2,g} \tag{4.3}$$

It is zero if the working-strata coincide with the Poisson-strata. We shall return later to the interpretation and estimation of $\psi^2_g$.
We need the following two important results, the proofs of which are given in
the Appendix.
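Lemma 1: anticipated variance of local densities with respect to Poisson-strata

$$E_\omega\,\frac{E_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2}{E_g^2M_g(x)} = \frac{1}{E_gM_g(x)}\left(\frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\frac{Y_i^2}{\pi_i^g} + (1+\theta_{1,g})\sigma^2_{1,g}\right) \tag{4.4}$$

Lemma 2: variance of predictions with respect to working-strata

$$\frac{E_gM_g^2(x)\bigl(\hat{Y}_{c,g}(x) - \bar{Y}_g\bigr)^2}{E_g^2M_g(x)} = \frac{1}{E_gM_g(x)}(1+\theta_{2,g})\sigma^2_{2,g} \tag{4.5}$$

We also need the following decomposition, which holds under Assumptions I, II, III and V.

Lemma 3

$$(1+\tilde{\theta}_2)\tilde{\sigma}^2_2 = \sum_{g=1}^{D} p_g(1+\theta_{2,g})\sigma^2_{2,g} + (1+\theta)\sigma^2 \tag{4.6}$$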
Proof:
Under Assumptions III and V the expectation is zero and therefore

$$V_{x\in A}\Bigl(\sum_{g=1}^{D}\sum_{k=1}^{P_{2g}} M_{2,gk}(x)(\bar{Y}_{2,gk} - \bar{Y})\Bigr) = E_{x\in A}\Bigl(\sum_{g=1}^{D}\sum_{k=1}^{P_{2g}} M_{2,gk}(x)(\bar{Y}_{2,gk} - \bar{Y})\Bigr)^2$$

Under Assumption I the cross terms with different indices $g$ vanish. Conditioning on $x\in A_g$ we get, under Assumption II, the equivalent expression

$$\sum_{g=1}^{D} p_gE_{x\in A_g}\Bigl(\sum_{k=1}^{P_{2g}} M_{2,gk}(x)(\bar{Y}_{2,gk} - \bar{Y})\Bigr)^2$$

We further set $\bar{Y}_{2,gk} - \bar{Y} = (\bar{Y}_{2,gk} - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})$ and note that $\sum_{k=1}^{P_{2g}} M_{2,gk}(x) = M_g(x)$. Expanding the square, we see by Assumption V and after conditioning on $M_g(x)$ that the cross-product term also vanishes. To complete the proof one checks that $(1+\theta)\sigma^2 = \frac{E_{x\in A}M^2(x)}{E_{x\in A}M(x)}\sum_{g=1}^{D} p_g(\bar{Y}_g - \bar{Y})^2$ under Assumptions I-III. Collecting the terms and dividing by $E_{x\in A}M(x)$, as in (4.1), we obtain (4.6).
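Finally, we need the following result, which holds under Assumptions I-IV.

Lemma 4

$$\frac{1}{n_1}\,\frac{E_{x\in A}M^2(x)\bigl(Y_c(x) - \bar{Y}\bigr)^2}{E^2_{x\in A}M(x)} = \sum_{g=1}^{D}\frac{p_g^2}{n_{1,g}}\,\frac{E_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2}{E_g^2M_g(x)} + \frac{1}{n_1E^2_{x\in A}M(x)}\sum_{g=1}^{D} p_gE_gM_g^2(x)(\bar{Y}_g - \bar{Y})^2 \tag{4.7}$$

Proof
Write $Y_c(x) - \bar{Y} = (Y_c(x) - \bar{Y}_g) + (\bar{Y}_g - \bar{Y})$ whenever $x\in A_g$. Then

$$E_{x\in A}M^2(x)\bigl(Y_c(x) - \bar{Y}\bigr)^2 = \sum_{g=1}^{D} p_gE_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2 + \sum_{g=1}^{D} p_gE_gM_g^2(x)(\bar{Y}_g - \bar{Y})^2$$

This results from the fact that the cross-product term is equal to

$$2\sum_{g=1}^{D} p_g(\bar{Y}_g - \bar{Y})\,E_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)$$

which vanishes under Assumption IV since

$$2E_gM_g^2(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr) = 2E_g\Bigl(M_g(x)\,E_g\bigl[M_g(x)\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)\,\big|\,M_g(x)\bigr]\Bigr)$$

Replacing $n_{1,g}$ by $p_gn_1$ we obtain (4.7).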
4.2 Calculation of the Anticipated Variance
We can now calculate the anticipated variance. The second term in (3.5) contains the residuals, for which we use the decomposition (2.28). This term appears once with the factor $1/n_{2,g}$, for which we use Lemma 1, Lemma 2 and the lack-of-fit term, and once more with the factor $-1/n_{1,g}$, for which we use Lemma 3 and Lemma 4 in order to combine it with the last term in (3.5). Finally, we use the prediction model at the tree level (2.18) and the arguments leading to (2.19). This allows us to shift the expression $-R_i^2/\pi_i^g$ appearing in the first term of (3.5) into the first term of Lemma 1. After some algebra this leads to the following fundamental result:
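Under Assumptions I-V the asymptotic anticipated variance of (2.22) is given by

$$\begin{aligned}
E_\omega V_{1,2,3}(\hat{Y}^*) ={}& \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}E_gM_g(x)}\,\frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\frac{R_i^2}{\pi_i^gp_i} + \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}E_gM_g(x)}\,\frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\frac{Y_i^{*2}}{\pi_i^g}\\
&+ \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}E_gM_g(x)}\,\psi^2_g + \frac{1}{n_1EM(x)}(1+\tilde{\theta}_2)\tilde{\sigma}^2_2
\end{aligned} \tag{4.8}$$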
The first term derives from the second-stage sampling within domains, the second is the anticipated variance within domains as if they were global Poisson forests, the third derives from the lack of fit of the prediction models used and the fourth from the overall heterogeneity of the working-strata. If the working-strata coincide with the Poisson-strata, then the lack-of-fit terms are zero and we get the result given in Mandallaz and Lanz (2001). We used $E_gM_g(x)$ instead of $E_{x\in A}M(x)$ in the above formula in order to emphasize the additive structure of the variance terms across domains. Besides, one can conjecture that (4.8) remains approximately valid even if Assumption III (constant cluster size) is violated. In contrast to the previous results, we note that as $n_1 \to \infty$ the heterogeneity of the strata is not entirely removed unless the lack-of-fit term is zero. We shall see that this term will play a key role in the optimization. In the next section we shall give an intuitive interpretation of the lack of fit as well as a technique to estimate it.
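To see how the four terms of (4.8) trade off against the sample sizes, the sketch below evaluates (4.8) from per-domain summary quantities. It assumes these summaries are known (or estimated elsewhere) and that Assumption III holds, so that a single $EM(x)$ applies to all domains; the names `DomainTerms` and `anticipated_variance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainTerms:
    p: float     # p_g, relative area of domain g
    n2: float    # n_{2,g}, second-phase clusters in domain g
    S_R: float   # (1/lambda^2(F_g)) * sum_i R_i^2 / (pi_i^g p_i), second-stage term
    S_Y: float   # (1/lambda^2(F_g)) * sum_i Y*_i^2 / pi_i^g, Poisson-forest term
    psi2: float  # lack of fit psi_g^2 as defined in (4.3)


def anticipated_variance(domains, n1, EM, het):
    """Evaluate (4.8); EM = E M(x), het = (1 + theta~_2) * sigma~_2^2."""
    within = sum(d.p**2 / (d.n2 * EM) * (d.S_R + d.S_Y + d.psi2) for d in domains)
    return within + het / (n1 * EM)
```

Letting `n1` grow removes only the last term: the lack-of-fit contribution, which scales with $1/n_{2,g}$, remains, in line with the remark above.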
4.3 Interpretation and Estimation of the Lack of Fit
It is enough to consider a single domain. In order to illustrate the main point, we consider one-stage simple random sampling (then $M_g(x) \equiv 1$ and $1+\theta_{1,g} = 1+\theta_{2,g} = 1$). Let us assume that each working-stratum is the union of Poisson-strata. That is, we set $F_{2,gk} = \bigcup_{j=1}^{P_{1gk}} F_{1,gkj}$, where $F_{1,gkj}$ is the $j$-th Poisson-stratum of the $k$-th working-stratum in the $g$-th domain. With this notation we obtain $\sigma^2_{1,g} = \sum_{k=1}^{P_{2g}}\sum_{j=1}^{P_{1gk}} p_{1,gkj}(\bar{Y}_{1,gkj} - \bar{Y}_g)^2$. Writing $\bar{Y}_{1,gkj} - \bar{Y}_g = (\bar{Y}_{1,gkj} - \bar{Y}_{2,gk}) + (\bar{Y}_{2,gk} - \bar{Y}_g)$ and expanding the square, we obtain

$$\psi^2_g = \sigma^2_{1,g} - \sigma^2_{2,g} = \sum_{k=1}^{P_{2g}}\sum_{j=1}^{P_{1gk}} p_{1,gkj}(\bar{Y}_{1,gkj} - \bar{Y}_{2,gk})^2$$

(the cross-product term vanishes because $\bar{Y}_{2,gk}$ is the area-weighted mean of the $\bar{Y}_{1,gkj}$ within the $k$-th working-stratum), so that the lack-of-fit term is simply the remaining heterogeneity not accounted for by the working-strata. Let us now consider the general case. We assume that we have a one-phase two-stage inventory with a given cluster structure (it could also be a two-phase inventory, but we only need the terrestrial data). The idea is very simple: we break down the overall sum of squares of the estimated local densities according to the ANOVA model based on the working-strata. The asymptotic anticipated values of the sums of squares "Model" and "Residual" are obtained by the techniques used in Sections 4.1 and 4.2. Table 4.1 gives the results.

Table 4.1: ANOVA with Working-Strata

$$\begin{array}{lll}
\text{Source} & \text{Sum of Squares} & \text{Anticipated Values}\\
\hline
\text{Model} & \sum_{x\in s_{2,g}} M_g^2(x)\bigl(\hat{Y}_{c,g}(x) - \hat{Y}^*_g\bigr)^2 & n_{2,g}E_gM_g(x)(1+\theta_{2,g})\sigma^2_{2,g}\\
\text{Residual} & \sum_{x\in s_{2,g}} M_g^2(x)\bigl(Y^*_{c,g}(x) - \hat{Y}_{c,g}(x)\bigr)^2 & n_{2,g}E_gM_g(x)(\psi^2_g + \eta^2_g)\\
\text{Total} & \sum_{x\in s_{2,g}} M_g^2(x)\bigl(Y^*_{c,g}(x) - \hat{Y}^*_g\bigr)^2 & n_{2,g}E_gM_g(x)\bigl(\eta^2_g + (1+\theta_{1,g})\sigma^2_{1,g}\bigr)
\end{array}$$
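The single-domain identity $\psi^2_g = \sigma^2_{1,g} - \sigma^2_{2,g}$ for nested strata under one-stage simple random sampling can be checked numerically. The Python sketch below uses made-up relative areas and stratum means (purely illustrative, not data from this report) for one domain with two working-strata, each the union of two Poisson-strata; the function name is hypothetical.

```python
import numpy as np


def nested_strata_terms(p1, y1):
    """Between-stratum variances for one domain with nested strata.

    p1[k][j]: relative area p_{1,gkj} of Poisson-stratum j in working-stratum k
    y1[k][j]: mean density of that Poisson-stratum
    Areas are relative to the domain and must sum to 1.
    Returns (sigma2_1, sigma2_2, psi2) for one-stage simple random sampling.
    """
    p1 = [np.asarray(a, dtype=float) for a in p1]
    y1 = [np.asarray(a, dtype=float) for a in y1]
    p2 = np.array([a.sum() for a in p1])                                # p_{2,gk}
    y2 = np.array([np.sum(a * b) / a.sum() for a, b in zip(p1, y1)])   # working-stratum means
    y_g = np.sum(p2 * y2)                                               # domain mean
    sigma2_1 = sum(np.sum(a * (b - y_g) ** 2) for a, b in zip(p1, y1))
    sigma2_2 = np.sum(p2 * (y2 - y_g) ** 2)
    # remaining heterogeneity of the Poisson-strata around their working-stratum mean
    psi2 = sum(np.sum(a * (b - m) ** 2) for a, b, m in zip(p1, y1, y2))
    return float(sigma2_1), float(sigma2_2), float(psi2)


# Illustrative domain: two working-strata, each made of two Poisson-strata
s1, s2, psi2 = nested_strata_terms([[0.2, 0.3], [0.1, 0.4]], [[100, 200], [300, 500]])
print(s1 - s2, psi2)  # the two numbers agree up to floating-point rounding
```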
The pure-error term $\eta^2_g$ is defined as

$$\eta^2_g = \frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\frac{Y_i^{*2}}{\pi_i^g} + \frac{1}{\lambda^2(F_g)}\sum_{i\in F_g}\frac{R_i^2}{\pi_i^gp_i} \tag{4.9}$$
We can also interpret Table 4.1 as a classical analysis of variance. We introduce the observed coefficient of determination $R^2_g = \text{Model}/\text{Total}$ and we can write, in terms of the anticipated values of Table 4.1,

$$\frac{(1-R^2_g)\,\text{Total}}{n_{2,g}E_gM_g(x)} - \eta^2_g = \psi^2_g$$

The lack-of-fit term $\psi^2_g$ is therefore a decreasing function of the goodness of fit of the model, as expressed by the coefficient of determination $R^2_g$, whence the expression lack of fit. Note that $R^2_g$ is also a decreasing function of $\psi^2_g$. The estimated relative variance reduction satisfies

$$\text{M.RED} = \frac{\hat{V}(\hat{Y}^*_{g,\text{one-phase}}) - \hat{V}(\hat{Y}^*_{g,\text{two-phase}})}{\hat{V}(\hat{Y}^*_{g,\text{one-phase}})} = \left(1 - \frac{n_{2,g}}{n_{1,g}}\right)R^2_g \tag{4.10}$$
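The ANOVA decomposition of Table 4.1 and the relation (4.10) translate into a few lines of code. The sketch below assumes that the working-stratum predictions $\hat Y_{c,g}(x)$ and an estimate of the pure-error term $\eta^2_g$ (for instance from (4.12)) are already available, and it estimates $E_gM_g(x)$ by the mean of $M_g(x)$ over $s_{2,g}$; the function name and that choice are assumptions made for illustration. Because of sampling error, the resulting lack-of-fit estimate can turn out negative.

```python
import numpy as np


def anova_lack_of_fit(M, Y_star, Y_fit, eta2_hat, n1g):
    """Sums of squares of Table 4.1, R^2_g, a lack-of-fit estimate and M.RED of (4.10)."""
    M = np.asarray(M, dtype=float)            # M_g(x), x in s_{2,g}
    Y_star = np.asarray(Y_star, dtype=float)  # Y*_{c,g}(x), estimated local densities
    Y_fit = np.asarray(Y_fit, dtype=float)    # working-stratum predictions of Y_{c,g}(x)
    n2 = M.size
    Y_bar = np.sum(M * Y_star) / np.sum(M)    # one-phase estimate of the domain mean
    model = np.sum(M**2 * (Y_fit - Y_bar) ** 2)
    residual = np.sum(M**2 * (Y_star - Y_fit) ** 2)
    total = np.sum(M**2 * (Y_star - Y_bar) ** 2)
    R2 = model / total
    EM_hat = M.mean()                         # assumption: estimate of E_g M_g(x)
    psi2 = (1 - R2) * total / (n2 * EM_hat) - eta2_hat
    m_red = (1 - n2 / n1g) * R2
    return {"model": model, "residual": residual, "total": total,
            "R2": R2, "psi2": psi2, "m_red": m_red}
```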
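To estimate the pure-error term, we rewrite it as a density:

$$\eta^2_g = \frac{1}{\lambda(F_g)}\sum_{i\in F_g}\frac{Y_i^{*2}}{\lambda(K_i)} + \frac{1}{\lambda(F_g)}\sum_{i\in F_g}\frac{R_i^2}{\lambda(K_i)p_i} \tag{4.11}$$

Hence, at a given point $u$ we can construct the estimate

$$\eta^{*2}_g(u) = \frac{1}{\lambda(F_g)}\sum_{i\in F_g}\frac{Y_i^{*2}I_i(u)}{\pi_i^g\lambda(K_i)} + \frac{1}{\lambda(F_g)}\sum_{i\in s_2(u)}\frac{R_i^2}{\pi_i^g\lambda(K_i)p_i^2} \tag{4.12}$$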
We can obtain an estimate of $\eta^2_g$ and of its variance by using the previous techniques with $Y^*(x+e_l) = \eta^{*2}_g(x+e_l)$. Note also that $\eta^2_g$ can be estimated for any new sampling scheme with data from a past inventory: simply replace $K_i$ by $K_{i,\text{new}}$, $\pi_i^g$ by $\pi^g_{i,\text{old}}$ and $p_i^2$ by $p_{i,\text{old}}p_{i,\text{new}}$ in formula (4.12). At the end of Section 5.2 we shall give an alternative procedure to estimate the pure-error term, which requires only aggregated data and not the tree data at the plot level. Consequently, by the ANOVA table, one can also estimate the lack-of-fit term $\psi^2_g$ if the cluster geometry of the existing inventory is the same as the cluster geometry of the inventory for which we want to predict the anticipated variance (the inclusion probabilities appearing in the pure-error terms can differ). The crucial point is that we do not need to know the Poisson-strata to do that: we only assume that they exist. If the working-strata are available in the form of thematic maps, it is possible, using simulations and GIS, to estimate the term $(1+\theta_{2,g})\sigma^2_{2,g}$ for any cluster structure. This is not possible for the term $(1+\theta_{1,g})\sigma^2_{1,g}$ of the Poisson-strata. However, one can estimate the term $\sigma^2_{1,g}$ from the inventory at hand (with clusters of nominal size $M$, simply compute $M$ ANOVA tables for simple random sampling and pool the resulting estimates of $\sigma^2_{1,g}$; likewise for $\sigma^2_{2,g}$). According to Mandallaz and Ye (1999) it can be conjectured that

$$1 \le 1+\theta_{1,g} \le 1+\theta_{2,g} \le E_gM_g(x) + \frac{V_gM_g(x)}{E_gM_g(x)}$$

In short, one can estimate the lack-of-fit term if the geometry of the cluster is constant, or at least work out a plausible scenario if it is not.
Chapter 5
Optimization
5.1 Costs
We shall use the following terminology, which is adapted from Mandallaz and Ye (1999).

1. The first-phase costs can be written as $\sum_{g=1}^{D} n_{1,g}E_gM_g(x)c_{1g}$, which under Assumption III is asymptotically equivalent to $n_1EM(x)\sum_{g=1}^{D} p_gc_{1g}$. To simplify, we shall write this as $n_1EM(x)c_1$, where $c_1$ is the mean first-phase cost per point.
2. Within each domain $F_g$ and for $n_{2,g}$ second-phase clusters, the total costs of traveling between and within clusters, together with the installation costs, will be written as $\tau_g(n_{2,g})$. The total cost is therefore $\sum_{g=1}^{D}\tau_g(n_{2,g})$.
3. The linearized version of the traveling costs between clusters within domains will be written as $\alpha_g + s_gn_{2,g}$. The intercept terms $\alpha_g$ must be subtracted from the overall budget $C$, so that we set $\tilde{C} = C - \sum_{g=1}^{D}\alpha_g$.
4. Let $c_{og}$ denote the mean installation cost per point and $T_{cg}$ the mean traveling time within clusters. Set $c_{cg} = c_{og} + T_{cg}/EM(x)$. Then one has $\tau_g(n_{2,g}) \approx \alpha_g + c_{2g}n_{2,g}$, where $c_{2g} = EM(x)c_{cg} + s_g$.
5. $c_{21g}$ is the mean unit cost per first-stage tree to obtain the approximate value $Y_i^*$ of $Y_i$. This might entail, for instance, the time required to measure only the diameter at 1.3 m.
6. $c_{22g}$ is the mean unit cost per second-stage tree to perform the extra measurements required to know the exact response $Y_i$. This might entail, for instance, the time required to measure the diameter at 7 m and the tree height.

Consequently, we shall use the following cost constraint when $n_1$ and the $n_{2,g}$ are given:

$$C = EM(x)n_1c_1 + \sum_{g=1}^{D}\tau_g(n_{2,g}) + EM(x)\sum_{g=1}^{D} n_{2,g}\Bigl(c_{21g}\sum_{i\in F_g}\pi_i^g + c_{22g}\sum_{i\in F_g}\pi_i^gp_i\Bigr) \tag{5.1}$$
and the following linear approximation when we also optimize the sample sizes:

$$\tilde{C} = EM(x)n_1c_1 + \sum_{g=1}^{D} c_{2g}n_{2,g} + EM(x)\sum_{g=1}^{D} n_{2,g}\Bigl(c_{21g}\sum_{i\in F_g}\pi_i^g + c_{22g}\sum_{i\in F_g}\pi_i^gp_i\Bigr) \tag{5.2}$$
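The linearized budget (5.2) can be written as a small helper, which is convenient for checking that a proposed allocation is affordable. In this sketch, `m1` and `m2` stand for $\sum_i\pi_i^g$ and $\sum_i\pi_i^gp_i$ (expected numbers of first- and second-stage trees per point), costs are in whatever unit the budget uses, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainCost:
    n2: float   # n_{2,g}, second-phase clusters
    c2: float   # c_{2g}, linearized cost per second-phase cluster
    c21: float  # c_{21g}, cost per first-stage tree
    c22: float  # c_{22g}, cost per second-stage tree
    m1: float   # sum_i pi_i^g, expected first-stage trees per point
    m2: float   # sum_i pi_i^g p_i, expected second-stage trees per point


def linearized_cost(n1, EM, c1, domains):
    """Right-hand side of (5.2), i.e. the budget C~ = C - sum(alpha_g) it consumes."""
    first_phase = EM * n1 * c1
    travel_and_install = sum(d.c2 * d.n2 for d in domains)
    trees = EM * sum(d.n2 * (d.c21 * d.m1 + d.c22 * d.m2) for d in domains)
    return first_phase + travel_and_install + trees
```

Grouping $c_{2g} + EM(x)(c_{21g}m_{1g} + c_{22g}m_{2g})$ into a single cost per cluster gives the quantity $c^*_{2g}$ that appears in (5.30).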
5.2 Discrete PPS Approximation
It was shown in Mandallaz and Ye (1999) that the optimal sampling schemes must be found in the class of discrete PPS approximations. We follow here the new version given in Mandallaz and Lanz (2001). We give the calculations for the true values $Y_i$. The results will be used in the next section, mutatis mutandis, for the predictions $Y_i^*$. A single domain needs to be considered. We assume that the $N$ values $Y_i$ of the response variable $Y$ are partitioned into classes $C_l$, $l = 1, 2, \ldots, K$, and that the first-stage inclusion probabilities $\pi_i$ are stepwise constant, i.e. we set $\pi_i = \tilde{\pi}_l$ whenever $Y_i\in C_l$. In practice $Y_i$ is usually the timber volume of the $i$-th tree. In Mandallaz and Ye (1999) the optimal discrete approximation was found in the class $\pi_i = \beta f(Y_i)$, where $\beta$ is a constant and the stepwise constant function $f$ is defined by $f(Y_i) = E^*(Y\mid Y\in C_l)$ whenever $Y_i\in C_l$, where $E^*$ denotes the expectation with respect to the discrete distribution of the random variable $Y$ with values $Y_i$. According to Mandallaz and Lanz (2001) this choice can be slightly improved by using $g(Y_i) = \sqrt{E^*(Y^2\mid Y\in C_l)}$ instead whenever $Y_i\in C_l$. We set

$$\gamma := \frac{\sum_{i=1}^{N} Y_i^2/f(Y_i)}{\sum_{i=1}^{N} Y_i} \ge 1 \tag{5.3}$$
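This inequality follows from the following calculations:

$$\sum_{i=1}^{N}\frac{Y_i^2}{f(Y_i)} = NE^*\left(\frac{Y^2}{f(Y)}\right) = N\sum_{l=1}^{K} P(Y\in C_l)\,E^*\left(\frac{Y^2}{f(Y)}\,\Big|\,Y\in C_l\right)$$

which, by the definition of the function $f(\cdot)$ and the properties of the conditional expectation, is equal to

$$N\sum_{l=1}^{K} P(Y\in C_l)\,\frac{E^*(Y^2\mid Y\in C_l)}{E^*(Y\mid Y\in C_l)} \ge N\sum_{l=1}^{K} P(Y\in C_l)\,E^*(Y\mid Y\in C_l) = NE^*(Y) = \sum_{i=1}^{N} Y_i$$

since $E^*(Y^2\mid Y\in C_l) \ge E^{*2}(Y\mid Y\in C_l)$ for every class.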
Likewise we define

$$\tilde{\gamma} := \frac{\sum_{i=1}^{N} Y_i^2/g(Y_i)}{\sum_{i=1}^{N} Y_i} = \frac{\sum_{i=1}^{N} g(Y_i)}{\sum_{i=1}^{N} Y_i} \ge 1 \tag{5.4}$$
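The second equality results from

$$\sum_{i=1}^{N}\frac{Y_i^2}{g(Y_i)} = NE^*\left(\frac{Y^2}{g(Y)}\right) = N\sum_{l=1}^{K} P(Y\in C_l)\,E^*\left(\frac{Y^2}{g(Y)}\,\Big|\,Y\in C_l\right)$$

which, by the definition of the function $g(\cdot)$ and the properties of the conditional expectation, is equal to

$$N\sum_{l=1}^{K} P(Y\in C_l)\,\frac{E^*(Y^2\mid Y\in C_l)}{\sqrt{E^*(Y^2\mid Y\in C_l)}} = N\sum_{l=1}^{K} P(Y\in C_l)\sqrt{E^*(Y^2\mid Y\in C_l)} = \sum_{i=1}^{N} g(Y_i)$$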
The inequality results from the fact that for $Y_i\in C_l$ we have

$$g(Y_i) = \sqrt{E^*(Y^2\mid Y\in C_l)} \ge \sqrt{E^{*2}(Y\mid Y\in C_l)} = E^*(Y\mid Y\in C_l)$$

and consequently also

$$\sum_{i=1}^{N} g(Y_i) \ge N\sum_{l=1}^{K} P(Y\in C_l)\,E^*(Y\mid Y\in C_l) = NE^*(Y) = \sum_{i=1}^{N} Y_i$$

In contrast, we note that by construction we have

$$\sum_{i=1}^{N} f(Y_i) = N\sum_{l=1}^{K} P(Y\in C_l)\,E^*(Y\mid Y\in C_l) = NE^*(Y) = \sum_{i=1}^{N} Y_i$$

We note that $\gamma = \tilde{\gamma} = 1$ when the number of classes $K$ is equal to the number of distinct $Y_i$ values. From (5.4) we have at once

$$\sum_{i=1}^{N}\frac{Y_i^2}{g(Y_i)} = \sum_{i=1}^{N} g(Y_i) = \tilde{\gamma}\,\lambda(F)\,\bar{Y} \tag{5.5}$$
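We also have the inequalities

$$\gamma \ge \tilde{\gamma}^2 \ge \tilde{\gamma} \tag{5.6}$$

The second holds because $\tilde{\gamma} \ge 1$. To prove the first, set

$$a_l = \sqrt{\frac{P(Y\in C_l)\,E^*(Y^2\mid Y\in C_l)}{E^*(Y\mid Y\in C_l)}} \quad\text{and}\quad b_l = \sqrt{P(Y\in C_l)\,E^*(Y\mid Y\in C_l)}$$

Then check that $\gamma = \sum_l a_l^2\sum_l b_l^2\big/E^{*2}(Y)$ and use the Cauchy-Schwarz inequality

$$\sum_l a_l^2\sum_l b_l^2 \ge \Bigl(\sum_l a_lb_l\Bigr)^2 \tag{5.7}$$

to get the result, since $\sum_l a_lb_l = \sum_l P(Y\in C_l)\sqrt{E^*(Y^2\mid Y\in C_l)} = \tilde{\gamma}E^*(Y)$. For one single class ($K = 1$) it can be checked that $\tilde{\gamma}^2 = \gamma$.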
When $Y_i$ is the timber volume based on the DBH (Diameter at Breast Height, i.e. 1.3 m above the ground), numerical integrations for many analytical distributions and direct calculations based on empirical DBH distributions from the Swiss National Inventory (SNI) show that for two or more classes $C_l$ one has $\tilde{\gamma}^2 \approx \gamma$ (see Section 6.2).
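The coefficients $\gamma$ and $\tilde\gamma$ depend only on the tree values and the class boundaries, so (5.3)-(5.6) are easy to check numerically. In this sketch $E^*$ is taken as the plain average over the listed trees (each tree weighted equally), and the classes are defined by cut points `edges`; the function name and the toy values are hypothetical.

```python
import numpy as np


def gamma_coefficients(Y, edges):
    """Return (gamma, gamma_tilde) of (5.3) and (5.4).

    Y: positive tree values Y_i; edges: sorted class boundaries,
    giving K = len(edges) + 1 classes C_l.
    """
    Y = np.asarray(Y, dtype=float)
    cls = np.searchsorted(np.asarray(edges, dtype=float), Y, side="right")
    f = np.empty_like(Y)  # f(Y_i) = E*(Y | Y in C_l), class mean
    g = np.empty_like(Y)  # g(Y_i) = sqrt(E*(Y^2 | Y in C_l)), class root mean square
    for c in np.unique(cls):
        in_class = cls == c
        f[in_class] = Y[in_class].mean()
        g[in_class] = np.sqrt(np.mean(Y[in_class] ** 2))
    gamma = np.sum(Y**2 / f) / np.sum(Y)
    gamma_tilde = np.sum(g) / np.sum(Y)  # equals sum(Y**2 / g) / sum(Y) by (5.4)
    return float(gamma), float(gamma_tilde)


gamma, gamma_tilde = gamma_coefficients([1, 2, 3, 4], edges=[2.5])
print(gamma >= gamma_tilde**2 >= gamma_tilde >= 1.0)  # (5.6)
```

With a single class (`edges=[]`) the function returns $\tilde\gamma^2 = \gamma$, and with one class per distinct value both coefficients equal 1, matching the special cases mentioned above.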
We now apply the above results to the anticipated variance (4.8), and in particular to the term $\sum_{i=1}^{N} Y_i^2/\pi_i$ (note that $Y_i = Y_i^*$ in one-stage sampling). By the Cauchy-Schwarz inequality with $a_i = Y_i/\sqrt{\pi_i}$ and $b_i = \sqrt{\pi_i}$ one has

$$\sum_{i=1}^{N}\frac{Y_i^2}{\pi_i} \ge \frac{\bigl(\sum_{i=1}^{N} Y_i\bigr)^2}{\sum_{i=1}^{N}\pi_i}$$

with equality if and only if $\pi_i = \beta Y_i$ for all $i$, so that the lower bound is achieved with inclusion probabilities proportional to size (PPS). In practice this is only possible when $Y_i$ is the basal area and the $\pi_i$ are obtained with the angle count technique. In general, the best we can do is to approximate the PPS sampling scheme by using stepwise constant inclusion probabilities, leading to the well-known techniques with concentric circles. This can be achieved by using the $f(Y_i)$ and $g(Y_i)$ defined above.
From (5.3), (5.4) and (5.6) we have

$$\sum_{i=1}^{N}\frac{Y_i^2}{f(Y_i)} = \gamma\sum_{i=1}^{N} Y_i \ge \tilde{\gamma}\sum_{i=1}^{N} Y_i = \sum_{i=1}^{N}\frac{Y_i^2}{g(Y_i)}$$

Hence, we can conjecture that the discrete PPS approximation based on the $g(Y_i)$ is better than the approximation based on the $f(Y_i)$. We now prove that this choice is indeed optimal. Let us denote by $\tilde{\pi}_l$ the inclusion probabilities of any sampling scheme with stepwise constant inclusion probabilities, i.e. $\pi_i = \tilde{\pi}_l$ whenever $Y_i\in C_l$, and by $N_l$ the number of $Y_i$ in $C_l$ (not necessarily distinct). Then write

$$\Bigl(\sum_{i=1}^{N}\frac{Y_i^2}{\pi_i}\Bigr)\Bigl(\sum_{i=1}^{N}\pi_i\Bigr) = \Bigl(\sum_{l=1}^{K}\frac{1}{\tilde{\pi}_l}\sum_{Y_i\in C_l} Y_i^2\Bigr)\Bigl(\sum_{l=1}^{K} N_l\tilde{\pi}_l\Bigr)$$
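Since $\sum_{Y_i\in C_l} Y_i^2 = N_lE^*(Y^2\mid Y\in C_l)$, the right-hand side can be rewritten as

$$\Bigl(\sum_{l=1}^{K}\frac{1}{\tilde{\pi}_l}N_lE^*(Y^2\mid Y\in C_l)\Bigr)\Bigl(\sum_{l=1}^{K} N_l\tilde{\pi}_l\Bigr)$$

Let

$$a_l = \frac{\sqrt{N_l}\sqrt{E^*(Y^2\mid Y\in C_l)}}{\sqrt{\tilde{\pi}_l}} \quad\text{and}\quad b_l = \sqrt{N_l\tilde{\pi}_l}$$

Then use the Cauchy-Schwarz inequality $\sum_l a_l^2\sum_l b_l^2 \ge (\sum_l a_lb_l)^2$, with $\sum_l a_lb_l = \sum_l N_l\sqrt{E^*(Y^2\mid Y\in C_l)} = \sum_{i=1}^{N} g(Y_i)$, to obtain

$$\sum_{i=1}^{N}\frac{Y_i^2}{\pi_i} \ge \frac{\bigl(\sum_{i=1}^{N} g(Y_i)\bigr)^2}{\sum_{i=1}^{N}\pi_i} = \frac{\tilde{\gamma}^2\bigl(\sum_{i=1}^{N} Y_i\bigr)^2}{\sum_{i=1}^{N}\pi_i} := \frac{\tilde{\gamma}^2\bigl(\sum_{i=1}^{N} Y_i\bigr)^2}{m_1}$$

with equality if and only if $b_l = \beta a_l$ for all $l$, which is equivalent to $\tilde{\pi}_l = \beta\sqrt{E^*(Y^2\mid Y\in C_l)}$.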
In other words, the lower bound is precisely achieved when the discrete inclusion probabilities are given by the $g(Y_i)$, which completes the proof. $\sum_{i=1}^{N}\pi_i = m_1$ is the expected number of trees sampled. Therefore, under a (cost) constraint, the lower bound is uniquely defined. For the $\pi_i$ based on the $f(Y_i)$ one obtains similarly $\sum_{i=1}^{N} Y_i^2/\pi_i = \gamma\bigl(\sum_{i=1}^{N} Y_i\bigr)^2/m_1$, which according to (5.6) is indeed larger. However, in practice, the difference between the two lower bounds is likely to be small.
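The lower bounds above can be compared on a toy population. The sketch below (illustrative values, hypothetical function names) scales each candidate set of inclusion probabilities to the same expected number of trees $m_1$ and evaluates $\sum_i Y_i^2/\pi_i$ for exact PPS, for the discrete approximations based on $g(\cdot)$ and on $f(\cdot)$, and for a single fixed-area plot (constant $\pi_i$). It does not enforce $\pi_i \le 1$.

```python
import numpy as np


def class_values(Y, edges, kind):
    """Stepwise function per class: 'f' gives the class mean, 'g' the class root mean square."""
    Y = np.asarray(Y, dtype=float)
    cls = np.searchsorted(np.asarray(edges, dtype=float), Y, side="right")
    out = np.empty_like(Y)
    for c in np.unique(cls):
        in_class = cls == c
        if kind == "f":
            out[in_class] = Y[in_class].mean()
        else:
            out[in_class] = np.sqrt(np.mean(Y[in_class] ** 2))
    return out


def compare_designs(Y, edges, m1):
    """Sum of Y_i^2 / pi_i for several designs, all scaled so that sum(pi) == m1."""
    Y = np.asarray(Y, dtype=float)
    sizes = {
        "pps": Y,                          # exact PPS (feasible only for basal area)
        "g": class_values(Y, edges, "g"),  # discrete PPS based on g(.)
        "f": class_values(Y, edges, "f"),  # discrete PPS based on f(.)
        "constant": np.ones_like(Y),       # single fixed-area plot
    }
    result = {}
    for name, size in sizes.items():
        pi = m1 * size / size.sum()
        result[name] = float(np.sum(Y**2 / pi))
    return result


print(compare_designs([1.0, 2.0, 3.0, 4.0], edges=[2.5], m1=2.0))
```

On this toy example the ordering PPS < $g$-based < $f$-based < constant appears, and the $g$- and $f$-based values differ only slightly, as anticipated in the text.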
One can obviously apply this technique with the predictions $Y_i^*$ instead of the true values $Y_i$ to define the optimal discrete approximation, based on $g(Y_i^*)$, of the scheme with first-stage inclusion probabilities exactly proportional to the predictions $Y_i^*$, the so-called PPP scheme.
Let us now consider the second-stage term $\sum_{i=1}^{N} R_i^2/(\pi_ip_i)$ appearing in (4.8). By Cauchy-Schwarz one has

$$\sum_{i=1}^{N}\frac{R_i^2}{\pi_ip_i} \ge \frac{\bigl(\sum_{i=1}^{N}|R_i|\bigr)^2}{\sum_{i=1}^{N}\pi_ip_i} := \frac{\bigl(\sum_{i=1}^{N}|R_i|\bigr)^2}{m_2}$$

Hence, the lower bound will be achieved if the unconditional second-stage probabilities $\pi_ip_i$ are proportional to the errors $|R_i|$, i.e. if we have a so-called PPE scheme. Again, the lower bound is uniquely defined under a constraint on the expected number of second-stage trees $m_2 = \sum_{i=1}^{N}\pi_ip_i$. In practice the optimal $\pi_i$ are given and, with a model predicting $|R_i|$ from the $Y_i^*$, we can implement exact PPE; there is no need for a discrete approximation.
The definitions of $\gamma$ and $\tilde{\gamma}$ rest explicitly upon the definitions of the $f(\cdot)$ and $g(\cdot)$ functions. Up to now, most of the existing inventories, if not all, use inclusion probabilities $\pi_i$ which are not directly related to either $f(\cdot)$ or $g(\cdot)$. Following Lanz (2000), we define the coefficient $\kappa$, which is valid for any $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, by setting

$$\kappa = \frac{\sum_{i=1}^{N} Y_i^2/\pi_i\;\sum_{i=1}^{N}\pi_i}{\bigl(\sum_{i=1}^{N} Y_i\bigr)^2} \ge 1 \tag{5.8}$$

The inequality results again from Cauchy-Schwarz, and $\kappa = 1$ only for exact PPS. By using (5.3) and (5.4) one checks that if $\pi_i = \beta f(Y_i)$ then $\kappa = \gamma$, and that if $\pi_i = \beta g(Y_i)$ then $\kappa = \tilde{\gamma}^2$. Writing $s_i = \lambda(F)\pi_i$ for the inclusion area, one can write

$$\kappa = \frac{E^*\bigl(Y_i^2/s_i\bigr)\,E^*s_i}{(E^*Y_i)^2} \tag{5.9}$$
which is useful for estimating $\kappa$ from the empirical DBH distribution. Finally, the following relation is important to calculate the anticipated variance:

$$\frac{1}{\lambda^2(F)}\sum_{i=1}^{N}\frac{Y_i^2}{\pi_i} = \frac{\kappa}{m_1}\bar{Y}^2 \tag{5.10}$$

where $m_1 = \sum_{i=1}^{N}\pi_i$ is the expected number of trees sampled, which can be obtained from

$$m_1 = \mathcal{N}E^*s_i \tag{5.11}$$

where $\mathcal{N} = N/\lambda(F)$ is the density of stems per ha. In the next section we shall use the above results to derive the optimal sampling schemes, which will turn out to be a combination of PPP and PPE schemes.
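Relations (5.9)-(5.11) allow $\kappa$ and $m_1$ to be computed from an empirical list of tree values and their inclusion areas, e.g. for an existing concentric-circle design. The sketch below assumes that the listed trees represent the empirical distribution $E^*$ with equal weights and that the areas $s_i$ are in ha, so that the stem density is in stems per ha; the function name and the two-circle example are illustrative only.

```python
import numpy as np


def kappa_and_m1(Y, s, stems_per_ha):
    """kappa by (5.9), m_1 by (5.11) and the first-stage term of (5.10).

    Y: tree values Y_i (e.g. volume in m^3); s: inclusion areas s_i in ha.
    E* is taken as the equally weighted average over the listed trees.
    """
    Y = np.asarray(Y, dtype=float)
    s = np.asarray(s, dtype=float)
    kappa = np.mean(Y**2 / s) * np.mean(s) / np.mean(Y) ** 2  # (5.9)
    m1 = stems_per_ha * np.mean(s)                             # (5.11)
    Y_bar = stems_per_ha * np.mean(Y)                          # mean density, e.g. m^3/ha
    first_stage = kappa / m1 * Y_bar**2                        # right-hand side of (5.10)
    return float(kappa), float(m1), float(first_stage)


# Hypothetical two-circle design: 0.02 ha for small trees, 0.05 ha for large ones
kappa, m1, term = kappa_and_m1([0.3, 0.8, 1.5, 2.4], [0.02, 0.02, 0.05, 0.05], 400.0)
```

Choosing inclusion areas exactly proportional to $Y_i$ gives $\kappa = 1$, the PPS case.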
5.3 Optimal Sampling Schemes
We shall minimize the anticipated variance (4.8) for given costs and given sample sizes $n_1$ and $n_{2,g}$ within the classes $\pi_i^g = \beta_{1g}g_g(Y_i^*)$ and $\pi_i^gp_i = \beta_{2g}|R_i|$ by using Lagrange multipliers. We assume that Assumptions I-V hold. To simplify the formulae we shall use the abbreviation $M = E_{x\in A}M(x) = E_gM_g(x)$ in this section. Within each domain $F_g$ we use the optimal PPS approximation $g_g(\cdot)$ with $\tilde{\gamma}_g$ based on the predicted values $Y_i^*$, $i\in F_g$. We also set $\varepsilon_g = \overline{|R|}_g/\bar{Y}_g$ with $\overline{|R|}_g = \frac{1}{\lambda(F_g)}\sum_{i\in F_g}|R_i|$; $\varepsilon_g$ is the relative prediction error at the tree level in the domain $F_g$. Proofs are essentially the same as in Mandallaz and Ye (1999), so that we only give the end results. The optimal first-stage inclusion probabilities are given by

$$\lambda(F_g)\pi_i^g = \frac{C - Mn_1c_1 - \sum_{h=1}^{D}\tau_h(n_{2,h})}{\sum_{h=1}^{D} p_h\bar{Y}_h\bigl(\sqrt{c_{21h}}\,\tilde{\gamma}_h + \sqrt{c_{22h}}\,\varepsilon_h\bigr)}\;\frac{p_g\,g_g(Y_i^*)}{n_{2,g}M\sqrt{c_{21g}}} \tag{5.12}$$

where the second-stage inclusion probabilities satisfy

$$\lambda(F_g)\pi_i^gp_i = \frac{C - Mn_1c_1 - \sum_{h=1}^{D}\tau_h(n_{2,h})}{\sum_{h=1}^{D} p_h\bar{Y}_h\bigl(\sqrt{c_{21h}}\,\tilde{\gamma}_h + \sqrt{c_{22h}}\,\varepsilon_h\bigr)}\;\frac{p_g|R_i|}{n_{2,g}M\sqrt{c_{22g}}} \tag{5.13}$$
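After some algebra one obtains the lower bound $\text{MAV}(\hat{Y}^*)$ of the anticipated variance for given $n_1$ and $n_{2,g}$:

$$\text{MAV}(\hat{Y}^*) = \frac{\Bigl(\sum_{g=1}^{D} p_g\bar{Y}_g\bigl(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\bigr)\Bigr)^2}{C - Mn_1c_1 - \sum_{g=1}^{D}\tau_g(n_{2,g})} + \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}M}\psi^2_g + \frac{1}{n_1M}(1+\tilde{\theta}_2)\tilde{\sigma}^2_2 \tag{5.14}$$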
Except for the second term, which contains the lack of fit, this is the expression given in Mandallaz and Lanz (2001) (or in Mandallaz and Ye (1999) in the case of one domain). This term has far-reaching consequences. Indeed, in contrast to the previous papers, the partial derivatives $\partial\,\text{MAV}(\hat{Y}^*)/\partial n_{2,g}$ are no longer positive, and one can find $n_{2,g}$ values with zero derivatives yielding a true minimum. For arbitrary $\tau_g(n_{2,g})$ this can, in principle, be done numerically. To go further we linearize, as usual, the traveling costs according to $\tau_g(n_{2,g}) = \alpha_g + c_{2g}n_{2,g}$ and set $\tilde{C} = C - \sum_{g=1}^{D}\alpha_g$. To simplify the notation we set $\Delta = \sum_{g=1}^{D} p_g\bar{Y}_g\bigl(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\bigr)$.
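We have to solve the equations

$$\frac{\partial\,\text{MAV}(\hat{Y}^*)}{\partial n_{2,g}} = \frac{\Delta^2c_{2g}}{\bigl(\tilde{C} - Mn_1c_1 - \sum_{h=1}^{D} n_{2,h}c_{2h}\bigr)^2} - \frac{1}{M}\,\frac{p_g^2\psi^2_g}{n_{2,g}^2} = 0 \tag{5.15}$$

and

$$\frac{\partial\,\text{MAV}(\hat{Y}^*)}{\partial n_1} = \frac{\Delta^2Mc_1}{\bigl(\tilde{C} - Mn_1c_1 - \sum_{g=1}^{D} n_{2,g}c_{2g}\bigr)^2} - \frac{1}{M}\,\frac{(1+\tilde{\theta}_2)\tilde{\sigma}^2_2}{n_1^2} = 0 \tag{5.16}$$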
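Dividing the two equations and setting $n_{1,g} = n_1p_g$, we obtain the relation at the optimum

$$\frac{n_{2,g}}{n_{1,g}} = \frac{\psi_g\sqrt{M}\sqrt{c_1}}{\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2\sqrt{c_{2g}}} \tag{5.17}$$

This leads to

$$\sum_{g=1}^{D} n_{2,g}c_{2g} = \frac{n_1\sqrt{M}\sum_{g=1}^{D} p_g\sqrt{c_1c_{2g}}\,\psi_g}{\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2} \tag{5.18}$$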
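Substituting (5.18) into (5.16) and taking the square root, we get a linear equation for $n_1$. This gives us $n_1$, the $n_{1,g} = n_1p_g$ and, by (5.17), the $n_{2,g}$. The solutions read

$$n_1 = \frac{\tilde{C}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2}{M\sqrt{c_1}\,\Lambda} \tag{5.19}$$

$$n_{2,g} = \frac{\tilde{C}p_g\psi_g}{\sqrt{M}\sqrt{c_{2g}}\,\Lambda} \tag{5.20}$$

where we have set

$$\Lambda := \sum_{g=1}^{D} p_g\bar{Y}_g\Bigl(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g + \frac{1}{\sqrt{M}}\sqrt{c_{2g}}\,\phi_g\Bigr) + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2$$

and $\phi_g = \psi_g/\bar{Y}_g$ is the relative lack of fit. Substituting the optimal sample sizes back into (5.14), we obtain (after tedious but elementary algebra) the absolute lower bound of the anticipated variance as

$$\text{MAV}_{n_1,n_{2,g}}(\hat{Y}^*) = \frac{\Bigl(\sum_{g=1}^{D} p_g\bar{Y}_g\bigl(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g + \frac{1}{\sqrt{M}}\sqrt{c_{2g}}\,\phi_g\bigr) + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2\Bigr)^2}{\tilde{C}} \tag{5.21}$$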
If the lack-of-fit terms $\phi_g$ tend towards zero, we see by (5.20) that the number of terrestrial plots should be as small as possible, and that in this case (5.21) yields the result given in Mandallaz and Lanz (2001), so that we have perfect consistency.
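For easier reference we summarize the solutions yielding the absolute lower bound:

$$\begin{aligned}
n_1 &= \frac{\tilde{C}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2}{M\sqrt{c_1}\,\Lambda}\\
n_{1,g} &= n_1p_g\\
\frac{n_{2,g}}{n_{1,g}} &= \frac{\psi_g}{\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2}\sqrt{\frac{Mc_1}{c_{2g}}}\\
\lambda(F_g)\pi_i^g &= \sqrt{\frac{c_{2g}}{c_{21g}}}\,\frac{1}{\sqrt{M}}\,\frac{g_g(Y_i^*)}{\psi_g} = m_{1g}\,\frac{g_g(Y_i^*)}{\tilde{\gamma}_g\bar{Y}_g}\\
\lambda(F_g)\pi_i^gp_i &= \sqrt{\frac{c_{2g}}{c_{22g}}}\,\frac{1}{\sqrt{M}}\,\frac{|R_i|}{\psi_g} = m_{2g}\,\frac{|R_i|}{\overline{|R|}_g}\\
m_{1g} &= \sqrt{\frac{c_{2g}}{c_{21g}}}\,\frac{1}{\sqrt{M}}\,\frac{\tilde{\gamma}_g\bar{Y}_g}{\psi_g}\\
m_{2g} &= \sqrt{\frac{c_{2g}}{c_{22g}}}\,\frac{1}{\sqrt{M}}\,\frac{\overline{|R|}_g}{\psi_g}
\end{aligned} \tag{5.22}$$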
$m_{1g} = \sum_{i\in F_g}\pi_i^g$ and $m_{2g} = \sum_{i\in F_g}\pi_i^gp_i$ are the expected numbers of first- and second-stage trees. It can be checked that these solutions satisfy the cost constraint. If one uses the $f_g(\cdot)$ instead of the $g_g(\cdot)$ functions, then one must replace $\tilde{\gamma}_g$ by $\sqrt{\gamma_g}$ in (5.21) to obtain the lower bound, which is slightly increased according to (5.6).
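The closed-form solutions (5.19)-(5.22) can be evaluated directly. The following sketch assumes all inputs are known (in practice they would be estimated, e.g. $\psi_g$ via Table 4.1), writes `het` for $(1+\tilde\theta_2)\tilde\sigma^2_2$, and uses a hypothetical function name. It ignores integer rounding and the natural restrictions $n_{2,g} \le n_{1,g}$ and $\pi_i^g \le 1$.

```python
import numpy as np


def optimal_allocation(C_tilde, M, c1, het, p, Ybar, c21, c22, c2, gamma_t, eps, psi):
    """Optimal n_1, n_{2,g}, m_{1g}, m_{2g} and the bound (5.21).

    het = (1 + theta~_2) * sigma~_2^2; per-domain arrays: p_g, Ybar_g, c_{21g},
    c_{22g}, c_{2g}, gamma~_g, eps_g and the absolute lack of fit psi_g.
    """
    p, Ybar, c21, c22, c2, gamma_t, eps, psi = (
        np.asarray(a, dtype=float) for a in (p, Ybar, c21, c22, c2, gamma_t, eps, psi)
    )
    phi = psi / Ybar                                              # relative lack of fit
    Lam = np.sum(p * Ybar * (np.sqrt(c21) * gamma_t + np.sqrt(c22) * eps
                             + np.sqrt(c2) * phi / np.sqrt(M))) + np.sqrt(c1 * het)
    n1 = C_tilde * np.sqrt(het) / (M * np.sqrt(c1) * Lam)         # (5.19)
    n2 = C_tilde * p * psi / (np.sqrt(M) * np.sqrt(c2) * Lam)     # (5.20)
    m1 = np.sqrt(c2 / c21) * gamma_t * Ybar / (np.sqrt(M) * psi)  # (5.22)
    m2 = np.sqrt(c2 / c22) * eps * Ybar / (np.sqrt(M) * psi)      # (5.22), mean |R| = eps * Ybar
    return {"n1": n1, "n2": n2, "m1": m1, "m2": m2, "Lambda": Lam, "MAV": Lam**2 / C_tilde}
```

A quick check of such an implementation is that the returned sizes exhaust the budget $\tilde C$ in (5.2) and reproduce $\Lambda^2/\tilde C$ when substituted into (5.14) with linearized traveling costs.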
Intuitively speaking, we see that the optimal inclusion probabilities attempt to mimic exact PPS by a combination of discrete PPP based on the $g_g(Y_i^*)$ and exact PPE based on the error $|R_i|$.

If the lack-of-fit terms vanish, it is possible, according to Mandallaz and Lanz (2001), to obtain a true minimum by requiring the expected number of first-stage trees to be equal to a preassigned constant $m_{1g}$ in each domain. Let us do the same with the lack of fit.
To fulfill this constraint we set

$$\lambda(F_g)\pi_i^g = \frac{m_{1g}\,g_g(Y_i^*)}{\tilde{\gamma}_g\bar{Y}_g} \tag{5.23}$$

and we optimize the second-stage inclusion probabilities in the class $\pi_i^gp_i = \beta_{2g}|R_i|$. For given $n_1$ and $n_{2,g}$ we use again the Lagrange multiplier technique, and we obtain the optimal second-stage inclusion probabilities as

$$\lambda(F_g)\pi_i^gp_i = \frac{1}{\mu}\,\frac{p_g}{n_{2,g}\sqrt{c_{22g}}}\,|R_i| \tag{5.24}$$

where we have set

$$\mu = \frac{M\sum_{h=1}^{D} p_h\overline{|R|}_h\sqrt{c_{22h}}}{C - Mn_1c_1 - \sum_{h=1}^{D}\tau_h(n_{2,h}) - M\sum_{h=1}^{D} n_{2,h}m_{1h}c_{21h}}$$
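Using these expressions in (4.8), we obtain the anticipated variance for given $m_{1g}$:

$$\begin{aligned}
E_\omega V_{1,2,3}(\hat{Y}^*\mid m_{1g}) ={}& \frac{\Bigl(\sum_{g=1}^{D} p_g\overline{|R|}_g\sqrt{c_{22g}}\Bigr)^2}{C - Mn_1c_1 - \sum_{g=1}^{D}\tau_g(n_{2,g}) - M\sum_{g=1}^{D} n_{2,g}m_{1g}c_{21g}} + \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}Mm_{1g}}\tilde{\gamma}_g^2\bar{Y}_g^2\\
&+ \sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}M}\psi^2_g + \frac{1}{n_1M}(1+\tilde{\theta}_2)\tilde{\sigma}^2_2
\end{aligned} \tag{5.25}$$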
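Linearizing the traveling costs and using the same techniques as for solving (5.15) and (5.16), we obtain the solutions

$$n_1 = \frac{\tilde{C}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2}{M\sqrt{c_1}\,\Lambda_1},\qquad n_{2,g} = \frac{\tilde{C}p_g\sqrt{\tilde{\gamma}_g^2\bar{Y}_g^2/m_{1g} + \psi^2_g}}{\sqrt{M\tilde{c}_{2g}}\,\Lambda_1} \tag{5.26}$$

where we have set $\tilde{c}_{2g} = c_{2g} + Mm_{1g}c_{21g}$ and

$$\Lambda_1 = \sum_{g=1}^{D} p_g\overline{|R|}_g\sqrt{c_{22g}} + \frac{1}{\sqrt{M}}\sum_{g=1}^{D} p_g\sqrt{\tilde{c}_{2g}}\sqrt{\frac{\tilde{\gamma}_g^2\bar{Y}_g^2}{m_{1g}} + \psi^2_g} + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2$$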
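Tedious but elementary calculations give the corresponding lower bound of the anticipated variance for given $m_{1g}$ as

$$\text{MAV}(\hat{Y}^*\mid m_{1g}) = \frac{\Bigl(\sum_{g=1}^{D} p_g\bar{Y}_g\Bigl(\varepsilon_g\sqrt{c_{22g}} + \frac{1}{\sqrt{M}}\sqrt{\tilde{c}_{2g}}\sqrt{\frac{\tilde{\gamma}_g^2}{m_{1g}} + \phi_g^2}\Bigr) + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2\Bigr)^2}{\tilde{C}} \tag{5.27}$$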
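Again, if the lack-of-fit terms $\phi_g$ vanish, we get the same lower bound as given in Mandallaz and Lanz (2001). Furthermore, one can minimize (5.27) with respect to the $m_{1g}$. This is equivalent to minimizing the terms

$$\tilde{c}_{2g}\left(\frac{\tilde{\gamma}_g^2\bar{Y}_g^2}{m_{1g}} + \psi^2_g\right)$$

which, by the definition of $\tilde{c}_{2g}$ (the remaining terms of the expanded product do not depend on $m_{1g}$), is equivalent to minimizing

$$\frac{c_{2g}\bar{Y}_g^2\tilde{\gamma}_g^2}{m_{1g}} + Mm_{1g}c_{21g}\psi^2_g$$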
It can be checked that doing so leads to exactly the same solutions and lower bound as in (5.21) and (5.22), i.e. without constraint on the expected number of first-stage trees. Thus we have perfect consistency with the previous results. In much the same way, we finally consider the sampling schemes for given expected values $m_{1g}$ and $m_{2g}$ of the numbers of first-stage and second-stage trees. That is, we set

$$\lambda(F_g)\pi_i^g = \frac{m_{1g}\,g_g(Y_i^*)}{\tilde{\gamma}_g\bar{Y}_g},\qquad \lambda(F_g)\pi_i^gp_i = \frac{m_{2g}|R_i|}{\overline{|R|}_g} \tag{5.28}$$
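Substituting this into (4.8) and using the relation (5.5), we obtain

$$E_\omega V_{1,2,3}(\hat{Y}^*\mid m_{1g},m_{2g}) = \frac{1}{M}\sum_{g=1}^{D}\frac{p_g^2}{n_{2,g}}\left(\frac{\overline{|R|}_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2\bar{Y}_g^2}{m_{1g}} + \psi^2_g\right) + \frac{1}{n_1M}(1+\tilde{\theta}_2)\tilde{\sigma}^2_2 \tag{5.29}$$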
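Linearizing the traveling costs, we obtain the constraint

$$\tilde{C} = Mn_1c_1 + \sum_{g=1}^{D} n_{2,g}c^*_{2g} \tag{5.30}$$

where $c^*_{2g} = c_{2g} + M(c_{21g}m_{1g} + c_{22g}m_{2g})$. Using the Lagrange technique as for deriving (5.19), (5.20) and (5.26), we get after some algebra the solutions

$$n_1 = \frac{\tilde{C}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2}{M\sqrt{c_1}\,\Omega},\qquad n_{2,g} = \frac{\tilde{C}p_gA_g\sqrt{M}}{M\sqrt{c^*_{2g}}\,\Omega} \tag{5.31}$$

where we have set

$$A_g = \sqrt{\frac{\overline{|R|}_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2\bar{Y}_g^2}{m_{1g}} + \psi^2_g}$$

and

$$\Omega = \frac{1}{\sqrt{M}}\sum_{g=1}^{D} p_g\sqrt{c^*_{2g}}\,A_g + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2$$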
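After tedious but simple algebra, substituting these expressions into (5.29) leads to the following lower bound of the anticipated variance for given $m_{1g}$ and $m_{2g}$:

$$\text{MAV}(\hat{Y}^*\mid m_{1g},m_{2g}) = \frac{\Bigl(\frac{1}{\sqrt{M}}\sum_{g=1}^{D} p_g\bar{Y}_g\sqrt{c^*_{2g}}\sqrt{\frac{\varepsilon_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2}{m_{1g}} + \phi_g^2} + \sqrt{c_1}\sqrt{1+\tilde{\theta}_2}\,\tilde{\sigma}_2\Bigr)^2}{\tilde{C}} \tag{5.32}$$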
Again, one can minimize the expression
T = c�2g
0@ jRj2gm2g
+~ 2gY2
g
m1g
+�2g
1A41
appearing in the lower bound (5.32) with respect to $m_{1g}$ and $m_{2g}$. It can be checked
that the $m_{1g}$ and $m_{2g}$ given in (5.22) satisfy the equations
$\partial T/\partial m_{1g} = \partial T/\partial m_{2g} = 0$.
Furthermore, if we substitute the corresponding solutions into (5.32), we get the absolute
lower bound given in (5.21). Thus we have full consistency between all the optimal
solutions: without constraint on the numbers $m_{1g}$, $m_{2g}$ of trees sampled,
for given $m_{1g}$ only, or for given $m_{1g}$ and $m_{2g}$. The end result does not depend on
the sequence chosen for optimization. In this sense the lower bounds for fixed $m_{1g}$
and $m_{2g}$ given in section 5.2 justify a posteriori the strategy of seeking a global
minimum in the discrete PPP class and the exact PPE class. Let us also emphasize
that in principle one can find the optimal solutions numerically for arbitrary
traveling cost functions $\tau(n_{2,g})$. However, because of the inherent uncertainty of the
traveling cost functions, we believe that linear approximations of, say, a square root
law, as given in Mandallaz (1997), should be adequate in practice. Besides, only such
approximations yield the qualitative insight required for calculating the relative efficiency of
various sampling schemes and for determining the achievable lower bounds.
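The check can be carried out by hand. Write $a = Mc_{21g}$ and $b = Mc_{22g}$, so that $c^*_{2g} = c_{2g} + a\,m_{1g} + b\,m_{2g}$, and take $\delta_g^2$ to be the lack-of-fit term in $T$. Dividing the two stationarity equations gives $m_{2g}/m_{1g} = \sqrt{a\,\overline{|R|}_g^{\,2}/(b\,\tilde\gamma_g^2\bar Y_g^2)}$. Substituting back, the cross terms cancel and leave $m_{1g} = \frac{\tilde\gamma_g\bar Y_g}{\delta_g}\sqrt{\frac{c_{2g}}{Mc_{21g}}}$ and $m_{2g} = \frac{\overline{|R|}_g}{\delta_g}\sqrt{\frac{c_{2g}}{Mc_{22g}}}$. This is our own derivation from $T$, not a copy of (5.22). The Python sketch below evaluates it and lets one confirm numerically that the gradient of $T$ vanishes there. In the example we assume $M = 1$ (one plot per sample point), express all terms relative to $\bar Y_g$, and convert $c_{2g}$ from hours to minutes. With the JU row of Tables 6.2 and 6.4 this gives the $m_{1g}$, $m_{2g}$ listed for JU in Table 6.5.

```python
import math


def T(m1, m2, c2, c21, c22, M, R2, gY2, d2):
    """Objective T = c*_{2g} (R2/m2 + gY2/m1 + d2) for one domain g, with
    c*_{2g} = c2 + M (c21 m1 + c22 m2), R2 = mean |R|^2,
    gY2 = (gamma~ Ybar)^2 and d2 = lack-of-fit term."""
    c2_star = c2 + M * (c21 * m1 + c22 * m2)
    return c2_star * (R2 / m2 + gY2 / m1 + d2)


def stationary_m(c2, c21, c22, M, R2, gY2, d2):
    """Closed-form solution of dT/dm1 = dT/dm2 = 0 (derivation in the text above)."""
    m1 = math.sqrt(c2 * gY2 / (M * c21 * d2))
    m2 = math.sqrt(c2 * R2 / (M * c22 * d2))
    return m1, m2


if __name__ == "__main__":
    # JU: c_2g = 3.52 h = 211.2 min, c_21g = 1.9 min, c_22g = 4.8 min,
    # eps_g = 0.15, gamma~_g = 1.18, relative lack of fit 0.47; M = 1 assumed.
    m1, m2 = stationary_m(3.52 * 60, 1.9, 4.8, 1.0, 0.15**2, 1.18**2, 0.47**2)
    print(round(m1, 1), round(m2, 1))  # 26.5 2.1
```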
Chapter 6
Examples
6.1 Simulations
We re-analyze the simulation results given in Mandallaz and Ye (1999) by using,
in two-phase sampling, the anticipated variance with lack of fit. The simulations
are based on an almost real and relatively large forest of 1'090 ha with 7 strata
(defined by the stand age) and 380'000 trees above 12 cm DBH (diameter at breast
height). It is almost real in the sense that this forest was constructed by using
software for the automatic interpretation of aerial photographs. The procedure is too
sophisticated to be explained here and the reader may consult Ye (1995) for details.
The positions of the 380'000 (visible) trees were estimated automatically from the
aerial photographs, together with the surface area of the canopy. The DBH and
volumes of the trees were then simulated according to a prediction model (based
on previous studies), which relates canopy parameters to the volume. The resulting
error on the tree volume is largely irrelevant for our purposes. We consider this
forest as one single domain (D = 1). The true average timber volume is 320 m³ per
ha, with a between-strata coefficient of variation $100 \times \sigma_2/\bar Y = 65\%$ (rather than 35%
as given in Mandallaz and Ye (1999), due to a typing error). For each sampling
scheme we simulated, for the first phase, 500 systematic 200 m × 200 m grids, yielding
on average $n_1 = 293.6$ points with $M(x) \neq 0$ per run. The second phase is the
400 m × 400 m sub-grid, with expected value $n_2 = 73.4$. The cluster consists of the 4
vertices of a 50 m × 50 m square, yielding $E\,M(x) = 3.7$ and $V\,M(x) = 0.6$. The 32
sampling schemes were:
• 12 schemes with 1 circle with surface areas between 100 m² and 500 m², plot
symbol ⋆ in Figs. 6.1, 6.2 and 6.3;
• 8 schemes with 2 concentric circles of (100, 600), (200, 300), (200, 400), (200, 500),
(200, 600), (300, 400), (300, 500), (400, 500) m², with a DBH threshold at 36 cm, plot symbol □;
• 7 schemes with 3 concentric circles of (60, 300, 900), (200, 300, 400),
(200, 300, 500), (200, 300, 600), (300, 400, 500), (300, 400, 600), (400, 500, 600) m²,
with DBH thresholds at 24 cm and 41 cm, plot symbol •;
• 5 schemes with angle-count factors k = 2, 2.5, 3, 3.5, 4, plot symbol △.
Figure 6.1: Empirical Variance E.V. v. Mean Estimated Variance M.E.V.
E.V. = −135.9 + 0.95 × M.E.V., R² = 0.96
Legend: 1 circle: ⋆, 2 circles: □, 3 circles: •, angle count: △

Predictions were based on ordinary least squares (see section 2.3). In order
to reduce the computing time, adjustments for boundary effects were performed
only at the forest edge and not at the boundaries of the working-strata (i.e. we
used $Y_1(u)$); the resulting bias is negligible. The 500 runs allow us to calculate
the empirical variance of the 500 point estimates as well as the mean of the 500
estimated variances, calculated under the assumption of random sampling. Under
random sampling both have the same expected values. The inflation factor for
the working-strata was estimated for each of the 32 schemes by a single large
simulation (about 10000 random clusters). One has $1+\theta_2 \approx 2.30$. We estimate the lack
of fit by using the ANOVA decomposition given in Table 1 for a single domain. In
this example we have one-stage sampling, and therefore $p_i \equiv 1$ and $R_i \equiv 0$. The
pure error term is known exactly, as is $\sigma_2^2$. Hence we have an estimate of
$(1+\theta_2)\sigma_2^2$, the model sum of squares, for the 32 examples. In theory this quantity
does not depend on the inclusion circles used and should therefore be constant.
However, because of the sampling errors on $1+\theta$, the values $(1+\hat\theta)\sigma_2^2/(n_2M)$ fall between 238
and 266, with a mean of 251. Subtracting the pure error from the mean of the
500 estimated variances, we get an estimate of $(1+\hat\theta)\sigma_1^2/(n_1M)$ and therefore also of $\delta^2/(n_2M)$
for each of the 32 schemes. Again, the theoretical lack of fit is independent of
the sampling scheme used. Because of the sampling error of the mean estimated
variance, the values of $\delta^2/(n_2M)$ fall between 188 and 233, with a mean of 214. Most of
this variability is due to the sampling schemes with one single circle. The mean
coefficients of determination $R^2$ increased from 0.40 (one circle) to 0.44 (two circles),
0.45 (three circles) and 0.47 (angle count).
Fig. 6.1 gives the relationship between the empirical variance E.V. (i.e. the
empirical variance of the 500 point estimates obtained for each sampling scheme,
which is an unbiased estimate of the true variance under systematic sampling) and
the mean estimated variance M.E.V. (i.e. the mean of the 500 variance estimates
obtained by treating the systematic sample as a random sample). Fig. 6.2 gives the
relationship between the mean estimated variance M.E.V. and the anticipated
variance A.V. without lack of fit, and Fig. 6.3 gives the relationship between the
M.E.V. and the anticipated variance with lack of fit A.V.L. Though the $R^2$ of
the regression lines in Fig. 6.2 and Fig. 6.3 are both very close to 1, the intercept and
slope are much closer to their ideal values of 0 and 1 in Fig. 6.3.
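The two summaries compared in Figs. 6.1 to 6.3 are easy to compute from simulation output. The Python sketch below assumes that each run returns one point estimate and one variance estimate. The function names and the synthetic numbers are hypothetical. E.V. uses the usual $n-1$ denominator, and the straight lines are ordinary least-squares fits reported together with their $R^2$.

```python
import statistics


def ev_mev(point_estimates, variance_estimates):
    """Return (E.V., M.E.V.) for one sampling scheme.

    E.V.: empirical variance (n - 1 denominator) of the point estimates over the runs.
    M.E.V.: mean of the per-run variance estimates, each computed as if the
    systematic sample were a random sample.
    """
    return statistics.variance(point_estimates), statistics.mean(variance_estimates)


def ols(x, y):
    """Ordinary least-squares line y = a + b * x; returns (a, b, R^2)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    return my - b * mx, b, sxy**2 / (sxx * syy)


if __name__ == "__main__":
    import random

    random.seed(1)
    # Hypothetical output of 500 runs for one scheme (synthetic, not the report's data).
    estimates = [random.gauss(320.0, 20.0) for _ in range(500)]
    variance_estimates = [random.uniform(300.0, 600.0) for _ in range(500)]
    print(ev_mev(estimates, variance_estimates))
    # Across schemes one would then regress E.V. on M.E.V., as in Fig. 6.1:
    # a, b, r2 = ols(mev_per_scheme, ev_per_scheme)
```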
In practice, of course, the regression lines are not known, and it is therefore better
to use the anticipated variance with lack of fit. In general, treating a systematic
sample as a random sample overestimates the error, so that we are usually on the
safe side.

Figure 6.2: Mean Estimated Variance M.E.V. v. Anticipated Variance without Lack of Fit A.V.
M.E.V. = 249.2 + 0.77 × A.V., R² = 0.99
Legend: 1 circle: ⋆, 2 circles: □, 3 circles: •, angle count: △

Figure 6.3: Mean Estimated Variance M.E.V. v. Anticipated Variance with Lack of Fit A.V.L.
M.E.V. = 17.02 + 0.91 × A.V.L., R² = 0.99
Legend: 1 circle: ⋆, 2 circles: □, 3 circles: •, angle count: △

Figure 6.4: Empirical E.RED. v. Mean Estimated M.RED. Variance Reduction (in %)
E.RED. = −25 + 1.43 × M.RED., R² = 0.50
Legend: 1 circle: ⋆, 2 circles: □, 3 circles: •, angle count: △

In this example the mean estimated error over-estimates the empirical
error for the volume by about 25% on average. This is in agreement with results
obtained for the basal area in a similar forest by using geostatistical methods
(Mandallaz (1993) and Mandallaz (2000)), which, however, are rather difficult to use for
estimation and even more so for optimization. Since the relationships between
E.V., M.E.V. and A.V.L. are linear, the anticipated variance with lack of fit will
almost certainly select the more efficient of two schemes if they really differ, and also
give a good conservative estimate of the error. In this example the pure-error terms
account on average for about 25% (one circle) to 10% (angle count) of the variance of
$Y(u)$, the working-strata for 40% to 50% and the lack of fit for 35% to 40%. The
simulations also show that the estimation of the lack of fit and of the inflation factor
requires rather large samples to be reliable. This implies that one should investigate
the stability of the optimal solutions by considering many scenarios for these
parameters. Fig. 6.4 displays the relationship between the empirical relative variance
reduction E.RED. and the estimated mean relative variance reduction M.RED.,
defined by (4.10). The empirical reductions are significantly smaller but, of course,
unknown in practice, whereas the M.RED. are close to the theoretical values
obtained from (4.10). The simulations show that the anticipated variance with lack of
fit approximates the behavior of the empirical and estimated variances very well.
6.2 Swiss National Inventory
This example is based on the first Swiss National Inventory of 1983-1985 (SNI1) and
on the second Swiss National Inventory of 1993-1995 (SNI2). SNI1 is a one-phase
two-stage simple sampling scheme using plots with two concentric circles of 200 m²
and 500 m², with DBH thresholds at 12 cm and 36 cm. In all, $n_2$ = 10'974 plots
were available from a 1 km × 1 km grid, with $m_1 \approx 11.7$. The second-stage sampling
procedure essentially used equal probability sampling, with second-stage inclusion
probabilities of about 0.33 for $\mathrm{DBH}_i$ < 60 cm and equal to 1 for $\mathrm{DBH}_i \geq$ 60 cm,
which resulted in $m_2 \approx 4$. SNI2 is a two-phase two-stage sampling scheme with the same
concentric plots, but with only half of the original plots, taken from the
$\sqrt2$ km × $\sqrt2$ km sub-grid, whereas the first phase for aerial photographs is based
on a 0.5 km × 0.5 km grid, leading to $n_1$ = 51'296, $n_2$ = 6'412 and $m_1 \approx 12$.
The second-stage procedure was changed to implement PPE with $m_2 \approx 2$.
Switzerland (CH) is divided into D = 5 domains: Jura (JU), Swiss Plateau (SP),
Pre-Alps (PA), Alps (AL) and Southern Alps (SA). The working-strata are based
on the semi-automatic interpretation of aerial photographs to determine the average
tree height above the ground (with 25 points per plot). This leads to 5 working-strata
within each domain: 0 m, 0 to 10 m, 10 to 20 m, 20 to 30 m and more than 30 m. The
corresponding $R^2$'s fall into the range 0.12 to 0.24, which is not as good as the usual
stand-map stratification with $R^2 \approx 0.4$, but requires only the aerial photographs
available from the Swiss Topographic Survey. The post-stratification procedure used
the 5 working-strata within each domain separately, which explains why $R^2$ = 0.27
for CH is higher than the $R^2$'s in each domain. The cost parameters, as well as
the $p_g$ and $\bar Y_g$, are based on the first inventory, whereas the lack of fit
and the pure error had to be estimated from the second.
We shall now briefly discuss the construction of the function $\tau(n_{2,g})$ for this
particular case. $t_g$ is the mean time required to go, usually by car, either from the
lodging facility to the topographic fixed point (marked on the aerial photograph and
easily accessible) nearest to the sample plot, or from one fixed point to the nearest
fixed point. $c_g$, the mean installation time, is the time required to access the vicinity
of the sample plot from the nearest fixed point, usually by walking, plus the time
required to locate and secure exactly the center of the permanent plot. The total
installation time increases approximately linearly with the number of sample points.
Let $n_{0,g} = 10974\,p_g$ be the number of terrestrial plots available from SNI1 in each
region. With a square root law for the traveling time from fixed point to fixed point
we can write $\tau(n_{2,g}) \approx n_{2,g}c_g + t_g\sqrt{n_{2,g}}\sqrt{n_{0,g}}$. With this choice we ensure that the
total traveling time is the one observed when $n_{2,g} = n_{0,g}$. One then linearizes the
traveling time over a given range of $n_{2,g}$, according to $t_g\sqrt{n_{2,g}}\sqrt{n_{0,g}} \approx \alpha_g + s_g n_{2,g}$,
to obtain $\tau(n_{2,g}) \approx n_{2,g}c_g + s_g n_{2,g} + \alpha_g = n_{2,g}c_{2g} + \alpha_g$ with $c_{2g} = c_g + s_g$. The
$R^2$'s for the six linear regressions were all above 0.97 in their respective ranges.
Taking into account the inherent difficulty of assessing the traveling costs, the fit is
therefore more than acceptable. Recall that $\tilde C = C - \sum_{g=1}^{D}\alpha_g$. Table 6.1 (based on
Lanz (2000)) gives the parameters required to calculate the $c_{2g}$ values. Table 6.2
(where, according to (5.9), a square-root factor is given for easier comparison with the optimal $\tilde\gamma$)
gives a summary of the parameters required for the optimization. The cost for the
orientation and interpretation of an aerial photograph is constant, $c_1$ = 10 minutes.
Note that all the terrestrial costs are given in total time units for a crew of two
persons (i.e. the effective time spent is only half of the values given).
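The linearization step can be reproduced in a few lines of Python. The sketch below is ours. It fits $\alpha_g + s_g n_{2,g}$ by ordinary least squares to $t_g\sqrt{n_{2,g}}\sqrt{n_{0,g}}$ over the range given in Table 6.1. The text does not state which values of $n_{2,g}$ were used in the regressions, so the evenly spaced grid is an assumption. For JU it gives $s_g \approx 0.27$, $c_{2g} \approx 3.52$, an $R^2$ just above 0.97, and an intercept within about 1% of the tabulated 1'038 hrs.

```python
import math


def linearize_travel(t_g, n0_g, n_lo, n_hi, k=1000):
    """Least-squares line alpha + s * n through t_g * sqrt(n * n0_g), fitted on an
    evenly spaced grid of k values of n in [n_lo, n_hi] (the grid is an assumption).
    Returns (alpha, s, R^2)."""
    ns = [n_lo + (n_hi - n_lo) * i / (k - 1) for i in range(k)]
    ys = [t_g * math.sqrt(n * n0_g) for n in ns]
    mn, my = sum(ns) / k, sum(ys) / k
    sxx = sum((n - mn) ** 2 for n in ns)
    sxy = sum((n - mn) * (y - my) for n, y in zip(ns, ys))
    syy = sum((y - my) ** 2 for y in ys)
    s = sxy / sxx
    return my - s * mn, s, sxy**2 / (sxx * syy)


if __name__ == "__main__":
    # JU (Table 6.1): t_g = 0.80 h, range 400-10'000; p_g = 0.18, n_{0,g} = 10974 * p_g.
    alpha, s, r2 = linearize_travel(0.80, 10974 * 0.18, 400, 10000)
    print(f"alpha={alpha:.0f} h, s={s:.3f} h, c_2g={3.25 + s:.2f} h, R2={r2:.3f}")
```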
Table 6.3 gives the proportion of the variance due to the pure error, the working-strata
and the lack of fit, as well as the ratios $\sigma_{2,g}/\bar Y_g$ needed to calculate
$\tilde\sigma_2/\bar Y = 0.42$ (last cell in Table 6.3). Note that in this example all the inflation
factors $1+\theta_{1,g}$, $1+\theta_{2,g}$ are equal to 1 (simple random sampling). The pure error
terms for the SNI inclusion circles were obtained in each region according to the
formulae (5.9), (5.10), (5.11) and the DBH distribution in the regions. The lack-of-fit
terms were then estimated by reconstructing the ANOVA tables (4.1) from the empirical
errors obtained in each region by using the second phase only and the corresponding
$R^2$'s. The SNI result for CH assumed the $p_g$ to be known exactly and therefore ignored
the $\sigma^2/n_1$ term. This small adjustment changes the relative error from 0.88% to 0.89%.
Table 6.1: Installation and Traveling Costs
      c_g     t_g     n_2,g range      α_g     s_g     c_2g
      (hrs)   (hrs)                    (hrs)   (hrs)   (hrs)
JU    3.25    0.80      400-10'000     1'038   0.27    3.52
SP    3.01    0.74      400-10'000     1'037   0.27    3.28
PA    4.31    1.04      400-10'000     1'386   0.36    4.67
AL    5.20    1.20      600-16'000     2'537   0.41    5.61
SA    6.96    1.44      200-6'000      1'160   0.52    7.48
CH    4.46    1.04    2'000-20'000     4'950   0.55    5.01
Legend
$c_g$: installation time, $t_g$: traveling time, $\alpha_g$: intercept for traveling cost, $s_g$: slope
for traveling cost, $c_{2g}$: linearized installation and traveling cost per point.
Table 6.2: Parameters of SNI
      p_g    N_g   c_2g    c_21g   c_22g   √(5.9)   ε_g   Ȳ_g       δ_g/Ȳ_g   R²_g
      (%)          (hrs)   (min)   (min)            (%)   (m³/ha)   (%)       (%)
JU     18    468   3.52    1.9     4.8     1.28     15    328       47        15
SP     21    454   3.28    1.9     4.8     1.26     14    403       49        24
PA     19    508   4.67    2.0     4.8     1.26     16    419       53        20
AL     30    445   5.61    2.1     5.1     1.31     19    292       70        21
SA     12    425   7.48    2.1     5.1     1.48     22    178       72        12
CH    100    460   5.01    2.0     5.0     1.30     16    332       57        27
Legend
$p_g$: relative surface area, $N_g$: stem density, $c_{2g}$: linearized installation and traveling
cost per point, $c_{21g}$: unit cost per first-stage tree, $c_{22g}$: unit cost per second-stage
tree, √(5.9): square-root factor from equation (5.9), $\varepsilon_g$: relative prediction error at tree level,
$\bar Y_g$: timber volume per ha, $\delta_g/\bar Y_g$: relative lack of fit, $R^2_g$: coefficient of determination.
Table 6.3: Components of Variance in SNI
      pure error   working-strata   lack of fit   σ_2,g/Ȳ_g
      (%)          (%)              (%)           (%)
JU    35           15               50            26
SP    28           24               48            35
PA    26           20               54            32
AL    20           21               59            42
SA    29           12               59            32
CH    24           27               49            42
Legend
pure error: pure-error variance as a proportion of $V_gY(u)$; working-strata: $V_g\hat Y(u)/V_gY(u) = R^2_g$;
lack of fit: $\delta_g^2/V_gY(u)$; $\sigma_{2,g}/\bar Y_g$: coefficient of variation between working-strata.

In order to illustrate the results on discrete PPP, the empirical distribution of
DBH (as obtained from SNI1 for CH) is displayed in Fig. 6.5. Fig. 6.6 displays
$\gamma$ and $\tilde\gamma$ as a function of the DBH threshold for two concentric circles (for the
variable $Y_i^*$, which is the timber volume based only on the DBH). The minima
are $\tilde\gamma = 1.18699$ and $\gamma = 1.41467$, both at DBH = 31 cm. Fig. 6.7 shows that the
ratio $q = \gamma/\tilde\gamma^2$ is very close to 1. The threshold values for the optimal scheme with
three concentric circles are DBH = 24 cm and DBH = 41 cm, with $\tilde\gamma = 1.09$. By
comparison, the scheme with one circle only gives $\tilde\gamma = 1.67$. Hence the efficiency
gain from one to two circles is substantial, whereas the gain from two to three
circles is marginal if one takes the increased complexity into account. The SNI inclusion
probabilities were determined on the basis of empirical studies performed at the end of the
seventies (in the district of Nidwald). They correspond neither to $f(\cdot)$, nor to $g(\cdot)$, nor to
any other closed mathematical expression but, as we shall see, they served their
purpose very well indeed.
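To put the quoted $\tilde\gamma$ values into perspective: at fixed $m_{1g}$ the first-stage pure-error term $\tilde\gamma_g^2\bar Y_g^2/m_{1g}$ in (5.29) is proportional to $\tilde\gamma_g^2$. The Python lines below are our own arithmetic on the CH values given above. Going from one to two circles roughly halves this term, whereas a third circle reduces it by only about 16%. The last value is the ratio $q$ at the two-circle minima, reading $q$ as $\gamma/\tilde\gamma^2$.

```python
# Minimal gamma-tilde values quoted in the text for CH (variable Y*_i).
GAMMA_TILDE = {"one circle": 1.67, "two circles": 1.18699, "three circles": 1.09}
GAMMA_TWO_CIRCLES = 1.41467  # minimum of gamma for two circles, same DBH threshold


def first_stage_factor(g_from, g_to):
    """Multiplier of the first-stage term gamma~^2 * Ybar^2 / m_1 at fixed m_1
    when gamma~ changes from g_from to g_to."""
    return (g_to / g_from) ** 2


one_to_two = first_stage_factor(GAMMA_TILDE["one circle"], GAMMA_TILDE["two circles"])
two_to_three = first_stage_factor(GAMMA_TILDE["two circles"], GAMMA_TILDE["three circles"])
q = GAMMA_TWO_CIRCLES / GAMMA_TILDE["two circles"] ** 2
print(f"{one_to_two:.2f} {two_to_three:.2f} {q:.3f}")  # 0.51 0.84 1.004
```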
Figure 6.5: Distribution of DBH in SNI1
Legend: f relative frequency in %, DBH diameter class in cm
Table 6.4 gives the key parameters of the optimal two concentric circles.
Table 6.5 gives the parameters of the optimal sampling scheme with an overall
budget C = 44'307 hrs ($\tilde C$ = 37'149 hrs) equal to the variable costs obtained with
SNI2. Likewise, Table 6.6 gives the optimal sampling scheme for Switzerland treated
as D = 1 domain only, whereas Table 6.7 adds the constraint $m_1 = 11.7$
($\tilde C$ = 39'457 hrs in both cases).
Table 6.8 displays the relative empirical and anticipated errors for SNI2 and
for the various optimal sampling schemes.
Recall that the anticipated errors are based on SNI1 data except for the
lack-of-fit terms, which had to be based on SNI2. The perfect agreement between the
first two columns in Table 6.8 is therefore tautological: it simply shows that, up to
very small rounding errors, the calculations are consistent. Fig. 6.8 and Fig. 6.9
display the relative anticipated error and the surface areas of the inclusion circles
for the optimal scheme with D = 1 as a function of the number $n_2$ of terrestrial
plots, when the number $n_1$ of aerial photographs is optimal for given $n_2$ (this is
easily obtained by differentiation from (5.14)). Likewise, Fig. 6.10 displays the
relative error for the optimal scheme with D = 1 and $m_1 = 11.7$, that is, when the
number of first-stage trees is equal to the observed SNI1 value.

Figure 6.6: Gamma Values for Two Concentric Circles according to DBH Threshold in cm
Legend: upper curve $\gamma$, lower curve $\tilde\gamma$

Figure 6.7: Ratio $q = \gamma/\tilde\gamma^2$ according to DBH Threshold in cm
Table 6.4: Optimal Concentric Circles
      threshold   γ̃      g_1g(Y*_i)   g_2g(Y*_i)
      (cm)                (m³)         (m³)
JU    31          1.18    0.33         2.00
SP    31          1.17    0.33         2.07
PA    32          1.17    0.36         2.18
AL    32          1.19    0.35         2.29
SA    31          1.25    0.28         2.44
CH    31          1.19    0.32         2.15
Legend
$\tilde\gamma$: from equation (5.4), $g_{1g}(Y_i^*)$ and $g_{2g}(Y_i^*)$: values of the step function $g(\cdot)$.
Table 6.5: Optimal Sampling Scheme with D = 5
      small    large    m_1g   m_2g   n_1,g    n_2,g   n_2,g/n_1,g   error
      circle   circle
      (m²)     (m²)                                    (%)           (%)
JU    226      1'368    26.5   2.1    4'154      999   24            1.73
SP    170      1'067    24.3   1.8    4'846    1'547   32            1.49
PA    192      1'162    26.1   2.3    4'385    1'319   30            1.69
AL    217      1'418    21.5   2.2    6'924    1'749   25            1.88
SA    319      2'783    25.4   2.9    2'769      380   14            4.01
CH                                    23'079   5'995                 0.85
Legend
$m_{1g}$: number of first-stage trees per point, $m_{2g}$: number of second-stage trees per
point, $n_{1,g}$: number of first-phase points, $n_{2,g}$: number of second-phase points.
Table 6.6: Optimal Sampling Scheme with D = 1
      small    large    m_1    m_2    n_1      n_2     n_2/n_1   error
      circle   circle
      (m²)     (m²)                                    (%)       (%)
CH    207      1'393    25.6   2.2    23'668   5'859   25        0.86
Legend
$m_1$: number of first-stage trees per point, $m_2$: number of second-stage trees per
point, $n_1$: number of first-phase points, $n_2$: number of second-phase points.
Table 6.7: Optimal Sampling Scheme with D = 1 and m_1 = 11.7
      small    large    m_1    m_2    n_1      n_2     n_2/n_1   error
      circle   circle
      (m²)     (m²)                                    (%)       (%)
CH    95       636      11.7   1.9    22'878   6'393   28        0.89
Legend
$m_1$: number of first-stage trees per point, $m_2$: number of second-stage trees per
point, $n_1$: number of first-phase points, $n_2$: number of second-phase points.
Table 6.8: Empirical and Anticipated Relative Errors in %
      SNI2   SNI2   D = 5   D = 1   D = 1, m_1 = 11.7
      e.e.   a.e.   a.e.    a.e.    a.e.
JU    1.81   1.81   1.73    1.70    1.80
SP    1.72   1.71   1.49    1.64    1.73
PA    1.88   1.87   1.69    1.81    1.84
AL    1.88   1.88   1.87    1.86    1.86
SA    3.18   3.18   4.01    3.06    3.15
CH    0.89   0.89   0.85    0.86    0.89
Legend
e.e.: empirical relative error, a.e.: anticipated relative error.
Figure 6.8: Relative Anticipated Error: D = 1
Legend: a.e. relative anticipated error in %, n number of terrestrial plots
Figure 6.9: Surface Areas of Optimal Circles: D = 1
Legend: s.a. surface areas in m², n number of terrestrial plots
Figure 6.10: Relative Anticipated Error: D = 1, m_1 = 11.7
Legend: a.e. relative anticipated error in %, n number of terrestrial plots
Comments
1. The optimal scheme with D = 5 yields surface areas for the large circles that
are too large for field work, because of complex boundary adjustments and
slope corrections, especially in the Alps. With respect to SNI2 the relative
error is decreased by 3.5% or, equivalently, the cost is reduced by 7%, which
does not justify the increased complexity. The same conclusion was given
in Mandallaz and Lanz (2001), however without taking the lack of fit into
account.
2. Essentially the same can be said for the simplified optimal scheme with D = 1.
3. The optimal scheme with D = 1 and $m_1 = 11.7$ is very close to SNI2,
except for the number of aerial photographs and the surface area of the small
circle, which are halved. The errors are the same. In this sense, the SNI is
remarkably "optimal".
4. In practical terms the optimum is rather "flat" with respect to the resulting
error, which remains within narrow bounds over a large range of $n_2$. The
characteristics of the design, on the other hand, vary much more.
5. The only feasible way to increase the efficiency of the SNI substantially
is therefore to reduce the lack-of-fit term by improving the prediction model
based on the aerial photographs, thus increasing the $R^2$ from 0.20 to
0.30 to 0.40.
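The equivalence between the two figures quoted in comment 1 follows from (5.32): for a given design the minimal anticipated variance is proportional to $1/\tilde C$, so the relative error scales like $\tilde C^{-1/2}$. A design whose error is smaller by a fraction $r$ at equal budget reaches the reference error with only $(1-r)^2$ of that budget. A small Python sketch of this conversion follows (the function names are ours). It applies to the variable budget $\tilde C$, not to the fixed costs $\alpha_g$.

```python
import math


def equivalent_cost_saving(error_reduction):
    """Fraction of C~ saved when a design whose relative error is smaller by
    error_reduction (at equal C~) is scaled down to the reference error,
    assuming the anticipated variance is proportional to 1 / C~."""
    return 1.0 - (1.0 - error_reduction) ** 2


def equivalent_error_reduction(cost_saving):
    """Inverse conversion: error reduction corresponding to a given cost saving."""
    return 1.0 - math.sqrt(1.0 - cost_saving)


print(f"{100 * equivalent_cost_saving(0.035):.1f} %")  # 6.9 %
```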
Chapter 7
Conclusions
Mathematically speaking, we have completely solved the optimization problem of
two-phase two-stage cluster sampling schemes in the case where the known working-strata
used for post-stratification are not identical with the unknown Poisson-strata
generating the random location of the trees, which is a major advantage for
applications. The only assumption required is that the sample points must be allocated
error-free to the working-strata. The setup is very general: costs and terrestrial
sampling densities are allowed to vary across domains, as are the number and
structure of the concentric circles defining the first-stage inclusion probabilities.
The surface areas of the optimal concentric circles depend on the sample sizes used,
whereas the corresponding optimal DBH thresholds do not. The optimal second-stage
inclusion probabilities can be readily implemented if a prediction model for
the absolute error is available. One can find the optimal sample sizes with or
without constraint on the number of first- and second-stage trees. It is possible to give
closed analytical results after linearizing the traveling costs between clusters. If such
an approximation is questionable, one can optimize numerically. To implement the
techniques presented in this work, one must have some approximate prior knowledge
of the first- and second-stage measurement costs, the traveling costs and the forest
structure: DBH distribution, surface areas and mean values in the working-strata,
prediction error at the tree level and lack of fit of the post-stratification procedure
used. This information can be based on a past inventory: in such a case it is possible
to estimate the anticipated variance of any other inventory performed with simple
random sampling. This is also possible under cluster sampling, provided that the
cluster geometry of the past and future inventory is the same (only the geometry,
not the number and structure of concentric circles, nor the second-stage procedure).
If the geometry differs, one can at least work out a plausible scenario. It is in some
sense remarkable that the end results of such a complex optimization task should
require only simple algebra to be implemented. In any case, the anticipated variance
with lack of fit is a very useful tool to investigate many possible alternatives,
to find the theoretical optimum and to investigate its stability. Preliminary
validations indicate that the anticipated variance approximates very well the behavior
of the empirical variance under two-phase sampling. It was also shown that the
design of the Swiss National Inventory is very close to the optimal scheme with
respect to the relative error achieved, though its characteristics differ from those of
the theoretically optimal scheme.
Future work should consider more case studies for validation, particularly with
respect to the stability of the optimum. At a more theoretical level one should
attempt to calculate the anticipated variance under models more sophisticated than
the local Poisson model and generalize the method to the multivariate situation.
References
Lanz, A. (2000) Optimal sample design for extensive forest inventories. Ph.D. thesis,
ETH Zurich, Chair of Forest Inventory and Planning.
Mandallaz, D. and Lanz, A. (2001) Further results for optimal sampling schemes
based on the anticipated variance. To appear in Canadian Journal of Forest
Research.
Mandallaz, D. and Ye, R. (1999) Optimal two-phase two-stage sampling schemes
based on the anticipated variance. Canadian Journal of Forest Research, 29,
1691-1708.
Mandallaz, D. (1991) A unified approach to sampling theory for forest inventory
based on infinite population models. Ph.D. thesis, ETH Zurich, Chair of Forest
Inventory and Planning.
Mandallaz, D. (1993) Geostatistical methods for double sampling schemes:
applications to combined forest inventory. Technical report, ETH Zurich, Chair of
Forest Inventory and Planning.
Mandallaz, D. (1997) The anticipated variance: a tool for the optimization of
forest inventories. Technical report, ETH Zurich, Chair of Forest Inventory and
Planning.
Mandallaz, D. (2000) Estimation of the spatial covariance in universal kriging:
application to forest inventory. Environmental and Ecological Statistics, 7, 263-284.
Saerndal, C., Swensson, B. and Wretman, J. (1992) Model assisted survey sampling.
Springer Series in Statistics, New York.
Ye, R. (1995) Waldsimulation auf der Basis automatischer Luftbildmessung und
unter Kontrolle von GIS [Forest simulation based on automatic aerial photograph
measurement under GIS control]. Ph.D. thesis, Albert-Ludwigs-Universität, Freiburg im
Breisgau, Germany.
Appendix A
Calculation of the Anticipated Variance under Cluster Sampling
Detailed proofs of the key results Lemma 1 (4.4) and Lemma 2 (4.5) are provided
below. For ease of notation we omit the symbol $\omega$. Also, when dealing with the
Poisson-strata we shall write $F_{gk}$, $\bar Y_{gk}$ instead of $F_{1,gk}$, $\bar Y_{1,gk}$. We calculate
$$
E_g M_g^2(x)\,(Y_{c,g}(x)-\bar Y_g)^2 = E_g\left(\sum_{l=1}^{M} I_{F_g}(x_l)\,(Y(x_l)-\bar Y_g)\right)^2
\qquad (A.1)
$$
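which is equal to
$$
E_g\left(\sum_{l=1}^{M} I_{F_g}(x_l)(Y(x_l)-\bar Y_g)^2 + \sum_{l\neq k}^{M} I_{F_g}(x_l)I_{F_g}(x_k)(Y(x_l)-\bar Y_g)(Y(x_k)-\bar Y_g)\right)
$$
and hence to
$$
\sum_{l=1}^{M} P(x_l\in F_g)\,V(Y(x_l)\mid x_l\in F_g) + \sum_{l\neq k}^{M} P(x_l\in F_g, x_k\in F_g)\,E_g\{(Y(x_l)-\bar Y_g)(Y(x_k)-\bar Y_g)\mid x_l\in F_g, x_k\in F_g\}
\qquad (A.2)
$$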
Note that, given $x_l \in F_g$, $x_l$ is uniformly distributed in $F_g$. By taking the expectation
with respect to $\omega$, the first term yields $E_gM_g(x)$ times the anticipated variance under
simple random sampling, which is given by (2.8). The extra term is due to cluster
sampling and is more tricky. We need to calculate
$$
E_\omega E_g\{(Y(x_l)-\bar Y_g)(Y(x_k)-\bar Y_g)\mid x_l\in F_g, x_k\in F_g\}
$$
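First we split the event $\{x_k\in F_g\}\cap\{x_l\in F_g\}$ across the Poisson-strata into the disjoint events
$$
\left(\bigcup_{j=1}^{P_{1g}}\{x_k\in F_{gj}\}\cap\{x_l\in F_{gj}\}\right)\cup\left(\bigcup_{i\neq j}^{P_{1g}}\{x_k\in F_{gi}\}\cap\{x_l\in F_{gj}\}\right) := \bigcup_s A_s
$$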
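and use the decomposition rule for conditional expectation on the disjoint $A_s$:
$$
E\Big(Z \,\Big|\, \bigcup_s A_s\Big) = \frac{\sum_s E(Z\mid A_s)\,P(A_s)}{\sum_s P(A_s)}
$$
Note that $\sum_s P(A_s) = P(x_k\in F_g, x_l\in F_g)$. The following expressions are needed:
$$
E_\omega E_{x\mid\omega}\{(Y(x_l)-\bar Y_g)(Y(x_k)-\bar Y_g)\mid x_k\in F_{gj}, x_l\in F_{gj}\}
\qquad (A.3)
$$
$$
E_\omega E_{x\mid\omega}\{(Y(x_l)-\bar Y_g)(Y(x_k)-\bar Y_g)\mid x_k\in F_{gi}, x_l\in F_{gj}\}, \quad i\neq j
\qquad (A.4)
$$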
To calculate the first expression, write $Y(x_l)-\bar Y_g = Y(x_l)-\bar Y_{gj}+\bar Y_{gj}-\bar Y_g$ and
likewise for $Y(x_k)-\bar Y_g$, expand the product, interchange the order of expectation
and neglect boundary effects, so that $E_\omega Y(u) = \bar Y_{gj}$ for all $u\in F_{gj}$, to finally obtain
$$
E_xE_{\omega\mid x}Y(x_k)Y(x_l) - \bar Y_{gj}^2 + (\bar Y_{gj}-\bar Y_g)^2
$$
At this point we make the assumption that the same tree cannot be sampled
from two different points of the cluster, which is nearly always the case in
practice, with the possible exception of the angle count method and very large trees.
That is, we assume that
$$
I_i(x_k,\omega)\,I_i(x_l,\omega) = 0 \qquad \forall i,\ \forall k\neq l
\qquad (A.5)
$$
Under this assumption we obtain after some algebra
$$
E_{\omega\mid x}Y(x_k)Y(x_l) = \frac{1}{\lambda^2(F_{gj})}\sum_{i\neq i'\in F_{gj}} Y_iY_{i'} = \bar Y_{gj}^2 - \frac{1}{\lambda^2(F_{gj})}\sum_{i\in F_{gj}} Y_i^2
$$
As $\lambda(F_{gj})\to\infty$ the second term vanishes. Hence (A.3) is asymptotically
given by $(\bar Y_{gj}-\bar Y_g)^2$. To calculate (A.4), we interchange the order of
expectation, write $Y(x_l)-\bar Y_g = Y(x_l)-\bar Y_{gj}+\bar Y_{gj}-\bar Y_g$ and
$Y(x_k)-\bar Y_g = Y(x_k)-\bar Y_{gi}+\bar Y_{gi}-\bar Y_g$, and obtain with the same arguments $(\bar Y_{gi}-\bar Y_g)(\bar Y_{gj}-\bar Y_g)$.
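The second term in (A.2) is therefore equal to
$$
\sum_{j=1}^{P_{1g}}\sum_{l\neq k}^{M}(\bar Y_{gj}-\bar Y_g)^2\,P(x_k\in F_{gj}, x_l\in F_{gj}) + \sum_{i\neq j}^{P_{1g}}\sum_{l\neq k}^{M}(\bar Y_{gi}-\bar Y_g)(\bar Y_{gj}-\bar Y_g)\,P(x_k\in F_{gi}, x_l\in F_{gj})
\qquad (A.6)
$$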
To go further, consider the random variables giving the number of points
of a cluster falling into a given stratum:
$$
M_{1,gj}(x) = \sum_{l=1}^{M} I_{F_{1,gj}}(x_l)
\qquad (A.7)
$$
Straightforward calculations yield
$$
E_gM_{1,gj}^2(x) = \sum_{l\neq k}^{M} P(x_l\in F_{gj}, x_k\in F_{gj}) + E_gM_{1,gj}(x)
\qquad (A.8)
$$
$$
E_gM_{1,gi}(x)\,M_{1,gj}(x) = \sum_{k\neq l}^{M} P(x_k\in F_{gi}, x_l\in F_{gj}), \quad i\neq j
\qquad (A.9)
$$
Substituting (A.8) and (A.9) into (A.6), we see that (A.6) is equal to
$$
E_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)(\bar Y_{gj}-\bar Y_g)\right)^2 - \sum_{j=1}^{P_{1g}} E_gM_{1,gj}(x)\,(\bar Y_{gj}-\bar Y_g)^2
\qquad (A.10)
$$
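Using $E_gM_{1,gj}(x) = p_{1,gj}\,E_gM_g(x)$ with $p_{1,gj} = \lambda(F_{1,gj})/\lambda(F_g)$, substituting (A.10) into
(A.2) and using (2.8) yields
$$
\frac{E_gM_g^2(x)\,(Y_{c,g}(x)-\bar Y_g)^2}{E_g^2M_g(x)} = \frac{1}{E_gM_g(x)\,\lambda^2(F_g)}\sum_{i\in F_g}\frac{Y_i^2}{\pi_i^g} + \frac{E_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)(\bar Y_{gj}-\bar Y_g)\right)^2}{E_g^2M_g(x)}
\qquad (A.11)
$$
which is precisely Lemma 1 (4.4), since $E_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)(\bar Y_{gj}-\bar Y_g)\right) = 0$.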
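The counting identities (A.8) and (A.9), the step from (A.6) to (A.10), and the zero-mean property used for (A.11) involve nothing but indicator bookkeeping, so they can be verified exactly on a toy example. The Python sketch below is ours and every ingredient is hypothetical:
- a 6 × 6 torus of cells, so that the boundary effects neglected in the text vanish;
- three strata of unequal size standing for the $F_{gj}$;
- a square cluster of $M = 4$ points and invented stratum means;
- a cluster origin uniform over the whole torus, treated as a single domain.

Expectations are computed exactly with fractions, so the checks are equalities rather than approximations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy layout (not from the report): a 6 x 6 torus of cells,
# split into three strata of 12, 8 and 16 cells.
W = H = 6
CELLS = list(product(range(W), range(H)))
STRATUM = {(i, j): 0 if i < 2 else (1 if j < 2 else 2) for i, j in CELLS}
OFFSETS = [(0, 0), (2, 0), (0, 2), (2, 2)]  # assumed cluster: square of M = 4 points
M = len(OFFSETS)
P_X = Fraction(1, len(CELLS))  # cluster origin x uniform over the cells
STRATA = range(3)


def cluster(x):
    """Stratum labels of the M points x_1, ..., x_M of the cluster at x."""
    return [STRATUM[((x[0] + dx) % W, (x[1] + dy) % H)] for dx, dy in OFFSETS]


def expect(f):
    """Exact expectation over the uniform cluster origin."""
    return sum(P_X * f(cluster(x)) for x in CELLS)


def count(labels, j):
    """M_j(x): number of cluster points in stratum j, cf. (A.7)."""
    return sum(s == j for s in labels)


def pair_sum(i, j):
    """Sum over ordered pairs l != k of P(x_k in F_i, x_l in F_j)."""
    return expect(lambda s: sum(s[k] == i and s[l] == j
                                for k in range(M) for l in range(M) if k != l))


A8 = [(expect(lambda s: count(s, j) ** 2), pair_sum(j, j) + expect(lambda s: count(s, j)))
      for j in STRATA]
A9 = [(expect(lambda s: count(s, i) * count(s, j)), pair_sum(i, j))
      for i in STRATA for j in STRATA if i != j]

# (A.6) versus (A.10), with hypothetical stratum means and p_j = share of cells.
means = [Fraction(300), Fraction(420), Fraction(250)]
shares = [Fraction(sum(STRATUM[c] == j for c in CELLS), len(CELLS)) for j in STRATA]
overall = sum(p * m for p, m in zip(shares, means))
a = [m - overall for m in means]
A6 = sum(a[i] * a[j] * pair_sum(i, j) for i in STRATA for j in STRATA)
A10 = (expect(lambda s: sum(count(s, j) * a[j] for j in STRATA) ** 2)
       - sum(expect(lambda s: count(s, j)) * a[j] ** 2 for j in STRATA))
mean_zero = expect(lambda s: sum(count(s, j) * a[j] for j in STRATA))

print(all(l == r for l, r in A8), all(l == r for l, r in A9), A6 == A10, mean_zero)
# True True True 0
```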
To prove Lemma 2 (4.5), $\hat Y(x_l)$ is replaced asymptotically by $\bar Y_{2,gk}$ whenever
$x_l\in F_{2,gk}$. One then uses exactly the same technique as above, but with the sets
$A_s$ now defined by the working-strata instead of the Poisson-strata, and with the
random variable $Y(x_l)$ replaced by the constant $\bar Y_{2,gk}$ whenever $x_l\in F_{2,gk}$. The
term containing $\sum_{i\in F_g} Y_i^2/\pi_i^g$ no longer appears, since $M_g(x)$ is the only random
variable in (4.5).