Research Collection Report Optimal sampling schemes based on the anticipated variance with lack to fit Author(s): Mandallaz, Daniel Publication Date: 2001 Permanent Link: https://doi.org/10.3929/ethz-a-004158007 Rights / License: In Copyright - Non-Commercial Use Permitted This page was generated automatically upon download from the ETH Zurich Research Collection . For more information please consult the Terms of use . ETH Library




Daniel Mandallaz

Optimal Sampling Schemes Based on the Anticipated Variance with Lack of Fit

Chair of Forest Inventory and Planning
Swiss Federal Institute of Technology (ETH), Zurich

2001


Published by:

Chair of Forest Inventory and Planning

Department of Forest and Wood Sciences

CH-8092 Zurich

Herausgeber:

Professur für Forsteinrichtung und Waldwachstum

Departement Wald- und Holzforschung

ETH Zentrum

CH-8092 Zurich


Acknowledgements

I would like to express my thanks to Professor P. Bachmann, Chair of Forest Inventory and Planning at ETH Zurich, for his continuous support as well as for the working environment he succeeded in creating. Thanks are also due to Dr. A. Lanz and E. Kaufmann at the Swiss Federal Research Institute WSL in Birmensdorf for providing data from the Swiss National Inventory and for many helpful discussions. Last but not least, I thank my friend J.-F. Didisheim for his proofreading; the remaining errors are my own responsibility.


À Léa Marine et Philomène


Abstract

This technical report generalizes and improves on previous work on optimal sampling schemes for forest inventory. It also gives more detailed mathematical derivations of previously published results. The sampling procedures are optimal in the sense that they minimize the anticipated variance for given costs, or conversely minimize the costs for a given anticipated variance. The anticipated variance is defined as the average of the design-based variance under a simple stochastic model for the location of the trees. This location model, the local Poisson forest, assumes that trees are uniformly and independently distributed within a given number of Poisson-strata. We consider two-phase two-stage cluster sampling schemes in which costs and terrestrial second-phase sampling density can vary over domains. The estimation procedure is based on post-stratification with respect to so-called working-strata, which need not be identical to the Poisson-strata; the latter are usually unknown, and the discrepancy induces a lack of fit. It is then possible to derive the optimal sampling schemes analytically. Simulations and data from the Swiss National Inventory illustrate the theory.

Zusammenfassung

Dieser Bericht verallgemeinert und verbessert frühere Arbeiten über optimale Stichprobenpläne für die Waldinventur. Es werden ferner ausführlichere mathematische Herleitungen gegeben als in den schon publizierten Artikeln. Optimal bedeutet, dass die antizipierte Varianz bei vorgegebenen Kosten minimiert wird, oder umgekehrt. Die antizipierte Varianz ist das Mittel der klassischen Stichprobenvarianz unter einem stochastischen Modell, welches die räumliche Lage der Bäume erzeugt. In diesem räumlichen Modell, dem lokalen Poisson-Modell, sind die Bäume unabhängig und uniform innerhalb der Poisson-Straten verteilt. Wir betrachten zwei-phasige zwei-stufige Trakt-Stichproben, für welche die Kosten und die terrestrische Stichprobendichte der zweiten Phase zwischen Gebieten variieren können. Das Schätzverfahren verwendet Post-Stratifizierung bezüglich sogenannter Arbeits-Straten, welche mit den meistens unbekannten Poisson-Straten nicht identisch sein müssen, was einen "lack of fit" erzeugt. Es ist möglich, die optimalen Stichprobenpläne analytisch abzuleiten. Simulationen und Daten der Schweizerischen Landesforstinventur illustrieren die Theorie.

Résumé

Ce rapport généralise et améliore des résultats antérieurs sur les plans de sondage optimaux pour l'inventaire des forêts, au sens que la variance anticipée est minimale pour un coût donné, ou inversement. De plus, il donne des démonstrations mathématiques plus complètes que dans les articles déjà publiés. La variance anticipée est la moyenne de la variance sous le plan de sondage par rapport à un modèle stochastique pour la distribution spatiale des arbres. Ce modèle, le modèle poissonien local, suppose que les arbres sont répartis indépendamment et uniformément à l'intérieur de strates (dites de Poisson). Nous considérons des plans de sondage en satellites à deux phases et deux degrés pour lesquels la densité de sondage terrestre de la deuxième phase ainsi que les coûts peuvent varier d'un domaine à l'autre. La procédure d'estimation repose sur une post-stratification par rapport à des strates, dites de travail, qui ne sont pas forcément identiques aux strates de Poisson, d'ailleurs inconnues le plus souvent. Ceci engendre un déficit d'ajustement ("lack of fit"). Il est possible de calculer analytiquement les plans de sondage optimaux. Des simulations et les données de l'Inventaire Forestier National Suisse illustrent la théorie.


Contents

1 Introduction
    1.1 Preliminaries
    1.2 Formulation of the Problem and Definitions

2 Basic Concepts
    2.1 Reminder
    2.2 Post-Stratified Estimates
    2.3 Internal Linear Models

3 Design-Based Aspects
    3.1 Calculation of the Design-Based Variance
    3.2 Variance Estimation

4 The Anticipated Variance
    4.1 Mathematical Prerequisites
    4.2 Calculation of the Anticipated Variance
    4.3 Interpretation and Estimation of the Lack of Fit

5 Optimization
    5.1 Costs
    5.2 Discrete PPS Approximation
    5.3 Optimal Sampling Schemes

6 Examples
    6.1 Simulations
    6.2 Swiss National Inventory

7 Conclusions

A Calculation of the Anticipated Variance under Cluster Sampling


List of Figures

6.1 Empirical Variance E.V. v. Mean Estimated Variance M.E.V.
6.2 Mean Estimated Variance M.E.V. v. Anticipated Variance without Lack of Fit A.V.
6.3 Mean Estimated Variance M.E.V. v. Anticipated Variance with Lack of Fit A.V.L.
6.4 Empirical E.RED v. Mean Estimated M.RED variance reduction (in %)
6.5 Distribution of DBH in SNI1
6.6 Gamma Values for two Concentric Circles according to DBH Threshold in cm
6.7 Ratio q according to DBH Threshold in cm
6.8 Relative Anticipated Error: D = 1
6.9 Surface Areas of Optimal Circles: D = 1
6.10 Relative Anticipated Error: D = 1, m_1 = 11.7


List of Tables

4.1 ANOVA with Working-Strata
6.1 Installation and Traveling Costs
6.2 Parameters of SNI
6.3 Components of Variance in SNI
6.4 Optimal Concentric Circles
6.5 Optimal Sampling Scheme with D = 5
6.6 Optimal Sampling Scheme with D = 1
6.7 Optimal Sampling Scheme with D = 1 and m_1 = 11.7
6.8 Empirical and Anticipated Relative Errors in %


Chapter 1

Introduction

1.1 Preliminaries

The choice of an efficient sampling design is paramount. Although optimal sampling schemes based on anticipated variances have received much attention in sampling theory, primarily in the context of socio-economic studies (see e.g. Särndal et al. (1992)), they have been largely ignored by forest inventory specialists. We emphasize that no meaningful optimization is possible in sampling theory without some kind of "super-population model" under which an anticipated variance can be defined. A technical report by Mandallaz (1997) and recent papers by Mandallaz and Ye (1999) and Mandallaz and Lanz (2001) presented optimal sampling schemes based on the anticipated variance under a local Poisson model for the spatial distribution of the trees. The idea differs from the general framework inasmuch as only the location of the trees is random and not the response variable of interest, e.g. tree timber volume, which is fixed. The present work extends these results to the case where the Poisson-strata, the idealized true model chosen by Nature but usually unknown, are not identical to the so-called working-strata, chosen by the statistician to calculate post-stratified estimates. This discrepancy between Poisson- and working-strata induces a so-called lack-of-fit term in the resulting anticipated variance of the post-stratified estimates, which has important consequences for optimization and applications. Although this technical report is almost self-contained, the reader is advised to read Mandallaz and Ye (1999) first in order to become acquainted with the new concepts. Chapter 2 below gives a short review. One must take the definitions word for word and in particular distinguish carefully between Poisson-strata, working-strata and sampling-strata.

1.2 Formulation of the Problem and Definitions

We consider a forested region $F$, assumed to be a subset of the Euclidean plane $\mathbb{R}^2$, whose surface area, measured in ha, is $\lambda(F)$, and a well-defined population $P$ of $N$ trees in $F$, which are identified by their labels $i = 1, 2, \ldots, N$. We shall write $i \in G$ if the $i$-th tree belongs to a set $G \subset F$; the surface area of an arbitrary set $G$ is always denoted by $\lambda(G)$. The response variable of interest, measured or observed at a given time point on each tree in $P$, is denoted by $Y_i$ and is assumed to be error-free. Given $F$, our objective is to estimate

\[ \bar{Y}_F = \frac{1}{\lambda(F)} \sum_{i \in F} Y_i \tag{1.1} \]


We assume that $F$ is partitioned into $D$ domains, $F = \cup_{g=1}^{D} F_g$ with $F_g \cap F_k = \emptyset$ for $g \neq k$. In each domain $F_g$ the Poisson-strata are denoted by $F_{1,gk}$ with $k = 1, 2, \ldots, P_{1g}$. According to Mandallaz and Ye (1999), the local Poisson model holds if the locations of all trees $i \in F_{1,gk}$ can be viewed as the realizations of independently uniformly distributed random points in $F_{1,gk}$. We emphasize that the Poisson-strata $F_{1,gk}$ do not need to be known to the inventorist; we simply assume that "Nature" has chosen this model for the location of the trees. In contrast, we define in each domain $F_g$ the working-strata $F_{2,gk}$ with $k = 1, 2, \ldots, P_{2g}$, which will be used to construct post-stratified estimates in each domain. For each sample point (see below) one has to know to which working-stratum it belongs. For a better intuitive understanding one can view each working-stratum as the union of finer Poisson-strata. Note also that domains, working-strata and Poisson-strata do not need to be connected sets of the plane, i.e. they can each consist of several non-contiguous components.

The sampling procedure is as follows:

1. First phase

   Draw a set $s_1$ of $n_1$ points in $A \supset F$, independently and uniformly distributed, which are the origins of the clusters (or trakts). The $M$ points of the cluster with origin $x$ are $x + e_l$, $l = 1, 2, \ldots, M$. For each $x \in s_1$ let $M_g(x) = \sum_{l=1}^{M} I_{F_g}(x + e_l)$ be the number of points of the cluster with origin $x = x + e_1$ hitting the domain $F_g$. Set $M(x) = \sum_{g=1}^{D} M_g(x)$ and $s_{1,g} = \{x \in s_1 \mid M_g(x) \neq 0\}$. Furthermore, let $A_g = \{x \in \mathbb{R}^2 \mid M_g(x) \neq 0\}$ and note that the set $A \supset F$ can be chosen as $A = \{x \in \mathbb{R}^2 \mid M(x) \neq 0\}$. $A$ can be viewed as the first-phase sampling-stratum. The set $s_1$ is usually drawn by a systematic grid on aerial photographs (in agreement with general practice we shall treat systematic samples as random samples). One defines for each cluster $x \in s_1$ a prediction of the local density at the cluster level according to:

   \[ \hat{Y}_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\, \hat{Y}(x + e_l) \tag{1.2} \]

   where $\hat{Y}(x + e_l)$ is the prediction at the point $x + e_l$, which depends on the working-stratum to which $x + e_l$ belongs. The actual calculation of $\hat{Y}(x + e_l)$ is discussed in section 2.3.

2. Second phase

   In each set $s_{1,g}$ draw $n_{2,g}$ clusters out of the $n_{1,g}$ by equal probability sampling without replacement to obtain the subset $s_{2,g} \subset s_{1,g}$. For each $x \in s_{2,g}$ one can calculate at each point $x + e_l$ either the local density $Y(x + e_l)$ and the residual $R(x + e_l) = Y(x + e_l) - \hat{Y}(x + e_l)$ if one-stage sampling is used, or the estimated local density $Y^*(x + e_l)$ and the estimated residual $R^*(x + e_l) = Y^*(x + e_l) - \hat{Y}(x + e_l)$ if two-stage sampling is used (see sections 2.1 and 2.3). The $A_g$ can be viewed as the (conditional) second-phase sampling-strata: if $x$ is uniform in $A$ and given that $x$ is in $A_g$, then $x$ is uniform in $A_g$. In practice $s_{2,g}$ is obtained by a coarser terrestrial grid than the aerial grid. Again, we shall assume that this can be treated as a random subsample in the above sense. Note that the terrestrial sampling density can vary over domains. The terrestrial quantities at the cluster level are defined as:

   \[ Y^*_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\, Y^*(x + e_l) \tag{1.3} \]

   \[ R^*_{c,g}(x) = \frac{1}{M_g(x)} \sum_{l=1}^{M} I_{F_g}(x + e_l)\, R^*(x + e_l) = Y^*_{c,g}(x) - \hat{Y}_{c,g}(x) \tag{1.4} \]

for two-stage sampling, and similarly, by dropping the $*$, for one-stage sampling. We shall need the following assumptions:

Assumption I (no overlap)

We shall assume that a cluster cannot be spread over several domains, so that $M_{g_1}(x) M_{g_2}(x) = 0$ whenever $g_1 \neq g_2$. The $A_g$ therefore also form a partition of $A$, and $M(x) = M_g(x)$ whenever $x \in A_g$. Here $n_{1,g}$ is the number of non-void clusters hitting $F_g$, $n_1$ is the number of non-void clusters hitting $F$, and $n_1 = \sum_{g=1}^{D} n_{1,g}$.

Assumption II (negligible boundaries)

\[ \frac{\lambda(A_g)}{\lambda(A)} = \frac{\lambda(F_g)}{\lambda(F)} := p_g \]

Assumption III (constant effective cluster size)

\[ E_{x \in A} M(x) = E_{x \in A_g} M_g(x) := E_g M_g(x) \quad \forall g \]

The assumption that a cluster cannot be spread over several domains (though it may be spread over several Poisson-strata or working-strata within the same domain) is purely technical. In practice a cluster will seldom straddle several domains; when it does, we split the cluster into several components and adjust the sample sizes accordingly. The assumption of negligible boundaries is not a concern for sufficiently large forested areas $F_g$, which is the context of this work. Whenever Assumption III is severely violated one has good reasons to use different cluster geometries in the domains $F_g$, and the theory presented here is not applicable. Note that Assumptions I-III all hold under simple random sampling or if we have a single domain.
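The cluster-level quantities $M_g(x)$ and $\hat{Y}_{c,g}(x)$ of equation (1.2) amount to counting and averaging over the points of a cluster. The following is only a minimal sketch: the function name `cluster_summary`, the encoding of "outside the forest" as `-1`, and the numbers are illustrative assumptions, not part of the report.

```python
import numpy as np

def cluster_summary(domain_of_point, point_predictions, n_domains):
    """Cluster-level quantities for one cluster with origin x.

    domain_of_point: length-M integer array; entry l is the index g of the
        domain hit by the point x + e_l, or -1 if the point lies outside F
        (hypothetical encoding).
    point_predictions: length-M array with the predictions Yhat(x + e_l).
    Returns (M_g, Yhat_cg); Yhat_cg[g] is NaN where M_g[g] == 0.
    """
    domain_of_point = np.asarray(domain_of_point)
    point_predictions = np.asarray(point_predictions, dtype=float)
    M_g = np.zeros(n_domains, dtype=int)
    Yhat_cg = np.full(n_domains, np.nan)
    for g in range(n_domains):
        hits = domain_of_point == g                      # indicator I_{F_g}(x + e_l)
        M_g[g] = hits.sum()                              # M_g(x)
        if M_g[g] > 0:
            Yhat_cg[g] = point_predictions[hits].mean()  # equation (1.2)
    return M_g, Yhat_cg

# Hypothetical cluster of M = 5 points; the third point falls outside the forest.
M_g, Yhat_cg = cluster_summary([0, 0, -1, 0, 0],
                               [10.0, 12.0, 99.0, 14.0, 16.0],
                               n_domains=2)
M_x = M_g.sum()   # M(x); under Assumption I at most one M_g(x) is non-zero
```

In this example the cluster belongs to $s_{1,g}$ for the first domain only, which is the situation required by Assumption I.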


Chapter 2

Basic Concepts

2.1 Reminder

For easier reference we shall briefly review the concepts of local density, estimated local density and anticipated variance for a given domain $F_g$. Full details are given in Mandallaz (1997) and Mandallaz and Ye (1999).

We consider sampling schemes based on inclusion circles. To each tree we assign a circle $K_i$ centered on the tree. The $i$-th tree is sampled from the random point $u$ whenever $u$ falls within $K_i$; the indicator variable $I_i(u)$ is then 1, otherwise 0. By symmetry, this is equivalent to saying that the $i$-th tree is inside the circle centered on the point, which is the definition used for field work. The situation is slightly more intricate with strata, and we have three possible versions. Let us first define the local density $Y_0(u)$ at the point $u$ as

\[ Y_0(u) = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} \frac{Y_i I_i(u)}{\pi_i^{g}} \tag{2.1} \]

$\pi_i^{g}$ is the first-stage inclusion probability and $\lambda(F_g)\pi_i^{g}$ is the inclusion area. In this version we adjust for edge effects only at the forest boundary, that is $\lambda(F_g)\pi_i^{g} = \lambda(F_g \cap K_i)$. The second version adjusts for edge effects at the boundary of the Poisson-strata. In this case $I_i(u) = 0$ if $u$ and the $i$-th tree are not in the same Poisson-stratum, say $F_{1,gk}$. The conditional inclusion probability $\pi_i^{1,gk}$, given that $u \in F_{1,gk}$ and $i \in F_{1,gk}$, is defined by $\lambda(F_{1,gk})\pi_i^{1,gk} = \lambda(F_{1,gk} \cap K_i)$, and we set

\[ Y_1(u) = \frac{1}{\lambda(F_{1,gk})} \sum_{i \in F_{1,gk}} \frac{Y_i I_i(u)}{\pi_i^{1,gk}} \tag{2.2} \]

This version is of theoretical use only since we do not know the Poisson-strata. The third version adjusts for edge effects at the boundary of the working-strata, which is the version one should ideally use in field work. Again $I_i(u) = 0$ if $u$ and the $i$-th tree are not in the same working-stratum, say $F_{2,gk}$. The conditional inclusion probability $\pi_i^{2,gk}$, given that $u \in F_{2,gk}$ and $i \in F_{2,gk}$, is defined by $\lambda(F_{2,gk})\pi_i^{2,gk} = \lambda(F_{2,gk} \cap K_i)$, and we set

\[ Y_2(u) = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} \frac{Y_i I_i(u)}{\pi_i^{2,gk}} \tag{2.3} \]


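If the point $u$ is uniformly distributed in $F_g$ then

\[ E_u Y_0(u) = \bar{Y}_g \]

\[ E(Y_1(u) \mid u \in F_{1,gk}) = \frac{1}{\lambda(F_{1,gk})} \sum_{i \in F_{1,gk}} Y_i := \bar{Y}_{1,gk} \]

\[ E(Y_2(u) \mid u \in F_{2,gk}) = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} Y_i := \bar{Y}_{2,gk} \]

\[ E_u Y_1(u) = E_u Y_2(u) = \bar{Y}_g \]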
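The unbiasedness of the local density can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions, not the report's procedure: the domain is a unit square with periodic (torus) boundaries, so that every inclusion area equals the full circle area and no edge correction is needed; all circles share one hypothetical radius; tree positions and responses are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

area_F = 1.0                      # lambda(F_g): unit square, arbitrary units
n_trees = 200
radius = 0.05                     # common radius of the inclusion circles K_i
trees = rng.uniform(0.0, 1.0, size=(n_trees, 2))
y = rng.uniform(0.1, 1.0, size=n_trees)        # fixed responses Y_i

# On the torus lambda(F_g ∩ K_i) is the full circle area for every tree.
pi_i = np.pi * radius**2 / area_F

def local_density(u):
    """Local density Y_0(u) of (2.1) for points u given as an (n, 2) array."""
    d = np.abs(u[:, None, :] - trees[None, :, :])
    d = np.minimum(d, 1.0 - d)                 # periodic distance per coordinate
    inside = (d**2).sum(axis=2) <= radius**2   # indicators I_i(u)
    return (inside * y).sum(axis=1) / (pi_i * area_F)

u = rng.uniform(0.0, 1.0, size=(20_000, 2))    # uniform sample points
mc_mean = local_density(u).mean()              # Monte Carlo estimate of E_u Y_0(u)
true_density = y.sum() / area_F                # spatial mean of the domain
print(mc_mean, true_density)
```

The two printed values should agree up to Monte Carlo error, which shrinks as the number of sample points grows.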

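Hence, $Y_1(u)$ and $Y_2(u)$ yield unbiased estimates for the strata. Let us now calculate the variance of (2.2) when the point $u$ is uniformly distributed in $F_g$. With $p_{1,gk} = \frac{\lambda(F_{1,gk})}{\lambda(F_g)}$ we get

\[ V_{u \in F_g} Y_1(u) = \frac{1}{\lambda(F_g)} \int_{F_g} (Y_1(u) - \bar{Y}_g)^2 = \sum_{k=1}^{P_{1g}} p_{1,gk} \frac{1}{\lambda(F_{1,gk})} \int_{F_{1,gk}} (Y_1(u) - \bar{Y}_g)^2 \]

Write $Y_1(u) - \bar{Y}_g = Y_1(u) - \bar{Y}_{1,gk} + \bar{Y}_{1,gk} - \bar{Y}_g$ whenever $u \in F_{1,gk}$ and expand the square. The cross-product term vanishes and we get

\[ V_{u \in F_g} Y_1(u) = \sum_{k=1}^{P_{1g}} p_{1,gk}\, V(Y_1(u) \mid u \in F_{1,gk}) + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \tag{2.4} \]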

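As $Y_1(u)$ is a Horvitz-Thompson estimate, its variance is given by

\[ V(Y_1(u) \mid u \in F_{1,gk}) = \frac{1}{\lambda^2(F_{1,gk})} \left( \sum_{i \in F_{1,gk}} \frac{Y_i^2 (1 - \pi_i^{1,gk})}{\pi_i^{1,gk}} + \sum_{i \neq j \in F_{1,gk}} Y_i Y_j \frac{\pi_{ij}^{1,gk} - \pi_i^{1,gk} \pi_j^{1,gk}}{\pi_i^{1,gk} \pi_j^{1,gk}} \right) \tag{2.5} \]

where $\pi_{ij}^{1,gk} = E(I_i(u) I_j(u) \mid u \in F_{1,gk}) = \frac{\lambda(K_i \cap K_j \cap F_{1,gk})}{\lambda(F_{1,gk})}$ are the pairwise conditional inclusion probabilities.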

According to Mandallaz and Ye (1999) we define the anticipated variance of $Y_1(u)$ as the average $E_\omega V(Y_1(u) \mid u \in F_{1,gk})$, where $\omega$ stands for the random location of the trees in $F_{1,gk}$ under the Poisson model. Hence, all the probabilities occurring in (2.5) depend on the location of the trees. However, $\lambda(F_{1,gk}) \pi_i^{1,gk} \equiv \lambda(K_i)$ as long as the $i$-th tree is in the "interior" of $F_g$. The assumption of negligible boundary effects states that

\[ \lambda(F_g) \pi_i^{g}(\omega) = \lambda(F_{1,gk}) \pi_i^{1,gk}(\omega) = \lambda(F_{2,gk}) \pi_i^{2,gk}(\omega) \equiv \lambda(K_i) \tag{2.6} \]

Under this assumption we can calculate $E_\omega V(Y_1(u) \mid u \in F_{1,gk})$. We need only $E_\omega E_u (I_i(u, \omega) I_j(u, \omega) \mid u \in F_{1,gk})$. Interchanging the order of integration, and since for a given $u$ the random variables $I_i(u, \omega)$ are independent, we obtain, neglecting boundary effects, $E_\omega \pi_{ij}^{1,gk}(\omega) = \pi_i^{1,gk} \pi_j^{1,gk}$. Hence, by taking the anticipated variance in (2.5) the second term vanishes. The first term in (2.5) is

\[ \frac{1}{\lambda^2(F_{1,gk})} \sum_{i \in F_{1,gk}} \frac{Y_i^2}{\pi_i^{1,gk}} - \frac{1}{\lambda^2(F_{1,gk})} \sum_{i \in F_{1,gk}} Y_i^2 \]

As $\lambda(F_{1,gk}) \to \infty$ the second term tends to zero but not the first. Substituting this into (2.4), noting that $\frac{p_{1,gk}}{\lambda(F_{1,gk})} = \frac{1}{\lambda(F_g)}$ and also that, neglecting boundary effects, $\lambda(F_{1,gk}) \pi_i^{1,gk} = \lambda(F_g) \pi_i^{g}$, we obtain the following result:

Under simple random sampling the asymptotic anticipated variance is given by

\[ E_\omega V(Y_1(u) \mid u \in F_g) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^{g}} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \tag{2.7} \]
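The leading term of (2.7) can be illustrated by simulation in the simplest setting. The sketch assumes a single Poisson-stratum (so the between-strata term vanishes), a unit square with periodic boundaries (so boundary effects are exactly absent), a common hypothetical circle radius and invented responses. Because the spatial mean does not depend on the tree locations $\omega$ here, the anticipated variance equals the variance of $Y(u)$ over joint draws of $(\omega, u)$, and by translation invariance the point $u$ may be held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

area_F = 1.0                          # lambda(F_g), one Poisson-stratum
n_trees = 50
radius = 0.1                          # common radius of the inclusion circles
pi_i = np.pi * radius**2 / area_F     # inclusion probability without edge effects
y = rng.uniform(0.5, 1.5, size=n_trees)   # fixed responses Y_i

n_rep = 20_000
u = np.array([0.5, 0.5])              # fixed sample point (translation invariance)

# One independent uniform tree pattern omega per replicate.
trees = rng.uniform(0.0, 1.0, size=(n_rep, n_trees, 2))
inside = ((trees - u)**2).sum(axis=2) <= radius**2      # indicators I_i(u, omega)
Y_u = (inside * y).sum(axis=1) / (pi_i * area_F)        # local density per replicate

simulated = Y_u.var()
# Exact finite-area value: first term of (2.5) averaged over omega.
exact = (y**2 * (1.0 - pi_i) / pi_i).sum() / area_F**2
# Asymptotic leading term, as in (2.7) with a single stratum.
leading = (y**2 / pi_i).sum() / area_F**2
print(simulated, exact, leading)
```

The gap between `exact` and `leading` is the term $\lambda^{-2}\sum Y_i^2$, which the text drops as the surface area grows.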

Under the assumption of negligible boundary effects (2.6) we have $Y_0(u) = Y_1(u) = Y_2(u) := Y(u)$. Boundary effects do not admit exact general results. For a semi-quantitative approach based on integral geometry see Mandallaz (1997). It suffices to say that boundary effects are negligible if the inclusion circles are small with respect to the strata, which is common sense. We emphasize once again that this assumption is only required in the calculation of the anticipated variance in order to derive the optimal sampling schemes, and not for the calculation of the estimates based on actual inventory data. Hence, we shall assume from now on that the anticipated variance under simple random sampling is given by

\[ E_\omega V(Y_k(u) \mid u \in F_g) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^{g}} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \qquad k = 0, 2 \tag{2.8} \]

The corresponding formula for cluster sampling rests upon the same arguments and is given in (4.4), with the proof in the Appendix.

In two-stage sampling let us denote by $s_2(u)$ the set of first-stage trees selected at the point $u$. On each of the selected trees $i \in s_2(u)$ one gets an approximation $Y_i^*$ of the exact value $Y_i$. From the finite set $s_2(u)$ one draws a subsample $s_3(u) \subset s_2(u)$ of trees. For each tree $i \in s_3(u)$ one measures the exact variable $Y_i$. Let us now define the second-stage indicator variable:

\[ J_i(u) = \begin{cases} 1 & \text{if } i \in s_3(u) \\ 0 & \text{if } i \notin s_3(u) \end{cases} \tag{2.9} \]

$p_i = P(J_i(u) = 1 \mid I_i(u) = 1)$ are the second-stage conditional inclusion probabilities. The trees in $s_2(u)$ are sampled independently from each other, so that $p_{ij} = P(J_i(u) J_j(u) = 1 \mid I_i(u) I_j(u) = 1) = p_i p_j$. The estimated local density $Y_0^*(u)$ at the point $u$ is defined as

\[ Y_0^*(u) = \frac{1}{\lambda(F_g)} \left( \sum_{i \in F_g} \frac{Y_i^* I_i(u)}{\pi_i^{g}} + \sum_{i \in s_3(u)} \frac{R_i}{\pi_i^{g} p_i} \right) \tag{2.10} \]

where $R_i = Y_i - Y_i^*$ are the residuals at the tree level. Adjusting for edge effects at the working-strata boundaries we set, as in (2.3),

\[ Y_2^*(u) = \frac{1}{\lambda(F_{2,gk})} \left( \sum_{i \in F_{2,gk}} \frac{Y_i^* I_i(u)}{\pi_i^{2,gk}} + \sum_{i \in s_3(u)} \frac{R_i}{\pi_i^{2,gk} p_i} \right) \tag{2.11} \]

whenever $u$ and $i$ are in the same working-stratum $F_{2,gk}$. By construction we have for (2.10, 2.11)

\[ E(Y_k^*(u) \mid u) = Y_k(u) \qquad k = 0, 2 \tag{2.12} \]

Since $Y_0^*(u)$ is a Horvitz-Thompson estimate of $Y_0(u)$ with independent drawings, we have according to (2.10, 2.11)

\[ V(Y_0^*(u) \mid u) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2 I_i(u) (1 - p_i)}{(\pi_i^{g})^2 p_i} := V_0(u) \tag{2.13} \]

\[ V(Y_2^*(u) \mid u) = \frac{1}{\lambda^2(F_{2,gk})} \sum_{i \in F_{2,gk}} \frac{R_i^2 I_i(u) (1 - p_i)}{(\pi_i^{2,gk})^2 p_i} := V_2(u) \tag{2.14} \]
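For a handful of trees, (2.12) and (2.13) can be verified exactly by enumerating every possible second-stage subsample. The sketch below does this for four trees that are all assumed to be selected at the first stage ($I_i(u) = 1$); the surface area, probabilities and tree values are invented for illustration.

```python
import itertools
import numpy as np

area_F = 2.0                                  # lambda(F_g), hypothetical
pi_i = np.array([0.02, 0.02, 0.05, 0.05])     # first-stage inclusion probabilities
y_exact = np.array([1.2, 0.8, 2.5, 3.1])      # exact values Y_i
y_approx = np.array([1.0, 1.0, 2.0, 3.5])     # approximations Y_i^*
p_i = np.array([0.3, 0.3, 0.8, 0.8])          # second-stage inclusion probabilities
resid = y_exact - y_approx                    # tree-level residuals R_i

def estimated_density(selected):
    """Y_0^*(u) of (2.10); `selected` flags the trees of s_2(u) kept in s_3(u)."""
    first = (y_approx / pi_i).sum()
    second = (resid[selected] / (pi_i[selected] * p_i[selected])).sum()
    return (first + second) / area_F

# Enumerate all 2^4 subsamples with their probabilities (independent drawings).
mean = 0.0
second_moment = 0.0
for flags in itertools.product([False, True], repeat=len(p_i)):
    selected = np.array(flags)
    prob = np.prod(np.where(selected, p_i, 1.0 - p_i))
    value = estimated_density(selected)
    mean += prob * value
    second_moment += prob * value**2
variance = second_moment - mean**2

Y_u = (y_exact / pi_i).sum() / area_F                              # (2.1) with I_i(u) = 1
V_u = (resid**2 * (1.0 - p_i) / (pi_i**2 * p_i)).sum() / area_F**2  # (2.13)
print(mean, Y_u, variance, V_u)
```

The enumerated mean reproduces $Y_0(u)$ and the enumerated variance reproduces $V_0(u)$, up to floating-point rounding.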

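Under the assumption of negligible boundary effects we have for both versions the important result

\[ V_g := E_{u \in F_g} V_k(u) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \left( \frac{R_i^2}{\pi_i^{g} p_i} - \frac{R_i^2}{\pi_i^{g}} \right) \qquad k = 0, 2 \tag{2.15} \]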

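Using for $k = 0, 2$

\[ V_{u \in F_g}(Y_k^*(u)) = V_{u \in F_g} E(Y_k^*(u) \mid u) + E_{u \in F_g} V(Y_k^*(u) \mid u) = V_{u \in F_g}(Y_k(u)) + V_g \tag{2.16} \]

we obtain by the previous results

\[ E_\omega V(Y_k^*(u)) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2 - R_i^2}{\pi_i^{g}} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 + \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^{g} p_i} \tag{2.17} \]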

To simplify this expression we assume that the prediction model $M$ at the tree level satisfies

\[ Y_i = Y_i^* + R_i, \qquad E_M R_i = \mathrm{COV}_M(Y_i^*, R_i) = 0 \quad \forall i \tag{2.18} \]

As a consequence $E_M(Y_i^2 - R_i^2) = Y_i^{*2}$. Hence, by taking $E_M E_\omega V(Y_k^*(u))$ we can replace the term $Y_i^2 - R_i^2$ by $Y_i^{*2}$. To simplify the notation we shall henceforth omit the expectation with respect to the model $M$ and write

\[ E_\omega V(Y_k^*(u)) = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2}}{\pi_i^{g}} + \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 + \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^{g} p_i} \tag{2.19} \]

Since we are dealing with bounded random variables only, we have $\frac{1}{\lambda(F_g)} \sum_{i \in F_g} Y_i \to \frac{1}{\lambda(F_g)} \sum_{i \in F_g} Y_i^*$ in probability and in the mean as $\lambda(F_g) \to \infty$. Also, we shall henceforth use $Y_2(u)$, $Y_2^*(u)$ and $V_2(u)$ at the points $u = x + e_l$ of the cluster with origin $x$. To simplify the notation we write $Y(u)$, $Y^*(u)$ and $V(u)$.

2.2 Post-Stratified Estimates

We consider only two-phase two-stage procedures, as the two-phase one-stage procedure is a special case. Within each domain $F_g$ one obtains, according to Mandallaz and Ye (1999), the following two-phase two-stage point estimate

\[ \hat{Y}_g^* = \frac{\sum_{x \in s_{1,g}} M_g(x) \hat{Y}_{c,g}(x)}{\sum_{x \in s_{1,g}} M_g(x)} + \frac{\sum_{x \in s_{2,g}} M_g(x) R^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)} \tag{2.20} \]

Recall that the predictions $\hat{Y}_{c,g}(x)$ at the cluster level are based, by (1.2), on the working-strata. The relative surface areas can be estimated by the overall proportion of points falling within a domain, i.e. by

\[ p_g = \frac{\sum_{x \in s_1} M_g(x)}{\sum_{x \in s_1} M(x)} = \frac{\sum_{x \in s_{1,g}} M_g(x)}{\sum_{x \in s_1} M(x)} \tag{2.21} \]

Intuitively it is clear that one can combine the domain estimates into a global estimate by setting

\[ \hat{Y}^* = \sum_{g=1}^{D} p_g \hat{Y}_g^* \tag{2.22} \]


Using (2.20) and (2.21) this can be rewritten as

\[ \hat{Y}^* = \frac{\sum_{x \in s_1} M(x) \hat{Y}_c(x)}{\sum_{x \in s_1} M(x)} + \sum_{g=1}^{D} p_g \left( \frac{\sum_{x \in s_{2,g}} M_g(x) R^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)} \right) \tag{2.23} \]

where we have set

\[ \hat{Y}_c(x) = \frac{1}{M(x)} \sum_{l=1}^{M} I_F(x + e_l)\, \hat{Y}(x + e_l) \tag{2.24} \]

Equation (2.23) is the mean of all the predictions plus the weighted mean (according to the estimated surface areas of the domains) of the mean residual within each domain, which is a straightforward generalization of results given in Mandallaz and Ye (1999). Before calculating the design-based and anticipated variances of (2.22) we go back to the calculation of the predictions within each domain. This technical section is taken from Mandallaz (1997).
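The estimates (2.20) to (2.22) and the equivalent form (2.23) can be sketched on a toy data set. Everything below is invented for illustration: seven first-phase clusters in two domains, their cluster-level predictions, and residuals for the clusters that also belong to the second phase. Assumption I is taken to hold, so each cluster hits exactly one domain and $M(x) = M_g(x)$.

```python
import numpy as np

# Hypothetical first-phase sample s_1, one entry per cluster.
domain = np.array([0, 0, 0, 1, 1, 1, 1])          # domain g hit by the cluster
M = np.array([5, 4, 5, 3, 5, 5, 2])               # M(x) = M_g(x) under Assumption I
pred = np.array([210.0, 190.0, 250.0, 120.0, 90.0, 140.0, 100.0])  # Yhat_{c,g}(x)
in_s2 = np.array([True, False, True, True, False, True, False])   # x in s_{2,g}
resid = np.array([12.0, 0.0, -8.0, 5.0, 0.0, -3.0, 0.0])  # R*_{c,g}(x), used only in s_2

n_domains = 2

def mean_residual(g):
    """M_g-weighted mean residual over s_{2,g} (second term of (2.20))."""
    s2g = (domain == g) & in_s2
    return (M[s2g] * resid[s2g]).sum() / M[s2g].sum()

def domain_estimate(g):
    """Two-phase two-stage point estimate (2.20) for domain g."""
    s1g = domain == g
    mean_pred = (M[s1g] * pred[s1g]).sum() / M[s1g].sum()
    return mean_pred + mean_residual(g)

# Estimated relative surface areas (2.21).
p_g = np.array([M[domain == g].sum() for g in range(n_domains)]) / M.sum()

# Global estimate (2.22).
Y_g = np.array([domain_estimate(g) for g in range(n_domains)])
Y_global = (p_g * Y_g).sum()

# Same quantity through (2.23): overall mean prediction plus weighted mean residuals.
Y_global_alt = (M * pred).sum() / M.sum() + sum(
    p_g[g] * mean_residual(g) for g in range(n_domains)
)
print(Y_global, Y_global_alt)
```

Both routes give the same number because the weights $p_g$ cancel the domain-specific denominators of the prediction term.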

2.3 Internal Linear Models

Since the predictions are defined within each domain, we consider in this section a single domain. As far as the design-based variance is concerned, the predictions can be arbitrary and in particular can be obtained from existing models that are not based on the data of the inventory at hand (so-called external models). In particular, there could be a bias in the predictions, but this is irrelevant since the bias is removed by adding the mean residual. Usually, however, one has to construct the predictions with a model based on the data from the inventory at hand (so-called internal models). We follow Mandallaz (1991).

We consider one-stage sampling first. At the point level the theoretical predictions are obtained via a design-based linear model of the form

\[ \hat{Y}_\beta(x) = \bar{Y} + \beta^t (Z(x) - \bar{Z}), \qquad \beta \in \mathbb{R}^p, \; Z(x) \in \mathbb{R}^p \]

where $Z(x)$ is the vector of auxiliary variables available in the large sample. $Z(x)$ is further defined as a vector of 0/1 indicator variables based on the working-strata. At the cluster level we set consequently

\[ \hat{Y}_{\beta,c}(x) = \bar{Y} + \beta^t (Z_c(x) - \bar{Z}), \qquad \beta \in \mathbb{R}^p, \; Z_c(x) \in \mathbb{R}^p \]

These predictions are purely theoretical since $\bar{Y}$ and $\bar{Z}$ are unknown and $\beta$ is, for the time being, arbitrary. These definitions ensure that

\[ \frac{E_{x \in A} M(x) \hat{Y}_{\beta,c}(x)}{E_{x \in A} M(x)} = \bar{Y} \quad \forall \beta \]

The theoretical residuals are defined as

\[ R_{\beta,c}(x) = Y_c(x) - \hat{Y}_{\beta,c}(x) \]

By construction the theoretical residuals have zero mean, namely

\[ \bar{R}_\beta = \frac{E_{x \in A} M(x) R_{\beta,c}(x)}{E_{x \in A} M(x)} = 0 \tag{2.25} \]

We shall assume that the model has an intercept term, which is generally the case in practice. This means that one component of $Z(x)$, say the first, $Z_1(x)$, is constant and equal to 1. We therefore partition the vectors as $Z(x)^t = (1, Z^*(x)^t)$ and $\beta^t = (\beta_1, \beta^{*t})$, where $t$ is the transposition operator. In two-phase sampling the optimal choice of the unknown parameter vector $\beta$ is determined by minimizing the residual variance term, i.e.

\[ \min_\beta \; E_x M^2(x) \left( Y_c(x) - \bar{Y} - \beta^t (Z_c(x) - \bar{Z}) \right)^2 \]

Differentiating with respect to $\beta$ leads to the normal equations for the optimal theoretical choice $\beta_0$:

\[ E_x M^2(x) (Z_c(x) - \bar{Z})(Z_c(x) - \bar{Z})^t \beta_0 = E_x M^2(x) (Y_c(x) - \bar{Y})(Z_c(x) - \bar{Z}) \tag{2.26} \]

It follows from the normal equations that the optimal theoretical residuals and predictions are uncorrelated in the sense that

\[ E_{x \in A} M^2(x) (R_{\beta_0,c}(x) - \bar{R})(\hat{Y}_{\beta_0,c}(x) - \bar{Y}) = 0 \tag{2.27} \]

From this we have the following decomposition, which plays a key role in the calculation of the anticipated variance:

\[ E_{x \in A} M^2(x) (R_c(x) - \bar{R})^2 = E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2 - E_{x \in A} M^2(x) (\hat{Y}_c(x) - \bar{Y})^2 \tag{2.28} \]

The $(p, p)$ matrix on the left-hand side of (2.26) is singular for models with an intercept term, since in this case the first row and the first column are identically zero. An elegant solution is to use generalized inverses, see Mandallaz (1991), which unfortunately is not always a suitable procedure when using standard statistical software packages. We therefore provide below an alternative approach. Rewriting the normal equations in terms of the reduced vectors $Z^*_c(x)$ and $\beta^*$ we see that the general solution of the original normal equations is given by:

$$E_x M^2(x) (Z^*_c(x) - \bar{Z}^*)(Z^*_c(x) - \bar{Z}^*)^t \beta^*_0 = E_x M^2(x) (Y_c(x) - \bar{Y})(Z^*_c(x) - \bar{Z}^*)$$

The optimal $\beta^t$ is therefore equal to $(\beta_1, \beta^{*t}_0)$, where $\beta_1$ is arbitrary.

In practice the theoretical normal equations are obviously not available and we solve instead their sample versions, i.e.

$$\sum_{x \in s_2} M^2(x) (Z^*_c(x) - \hat{Z}^*_2)(Z^*_c(x) - \hat{Z}^*_2)^t \hat{\beta}^*_0 = \sum_{x \in s_2} M^2(x) (Y_c(x) - \hat{Y}_2)(Z^*_c(x) - \hat{Z}^*_2)$$

In other words, $\hat{\beta}^*_0$ is obtained by linear regression with weights $M^2(x)$ of the centered response variable $Y_c(x) - \hat{Y}_2$ on the centered explanatory variables without the intercept term, i.e. on $Z^*_c(x) - \hat{Z}^*_2$.

The empirical predictions are given by

$$\hat{Y}_c(x) = \hat{Y}_2 + \hat{\beta}^{*t}_0 (Z^*_c(x) - \hat{Z}^*_2)$$

Most software packages will give directly the predictions, say $P_c(x)$, of $\hat{Y}_c(x) - \hat{Y}_2$, so that one can also write $\hat{Y}_c(x) = \hat{Y}_2 + P_c(x)$.

The empirical residuals are then given by

$$R_c(x) = Y_c(x) - \hat{Y}_c(x) = (Y_c(x) - \hat{Y}_2) - \hat{\beta}^{*t}_0 (Z^*_c(x) - \hat{Z}^*_2)$$

which by construction satisfy

$$\hat{R}_2 = \frac{\sum_{x \in s_2} M(x) R_c(x)}{\sum_{x \in s_2} M(x)} = 0$$
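The centered weighted regression described above can be sketched in a few lines. This is only an illustration under stated assumptions: a single reduced auxiliary variable (so the normal equations reduce to one scalar equation), and sample means taken as the M(x)-weighted means of the cluster values. The function name and argument names are hypothetical.

```python
def centered_weighted_fit(y_c, z_c, m):
    """Regress the centered response on one centered auxiliary variable.

    y_c : cluster-level responses Y_c(x) for x in s_2
    z_c : cluster-level values of a single reduced auxiliary variable
    m   : number of points per cluster, M(x); regression weights are M(x)**2
    """
    m_sum = sum(m)
    # Assumption: the sample means are the M(x)-weighted means over s_2.
    y2_hat = sum(mi * yi for mi, yi in zip(m, y_c)) / m_sum
    z2_hat = sum(mi * zi for mi, zi in zip(m, z_c)) / m_sum
    # Scalar version of the sample normal equations, without intercept.
    num = sum(mi ** 2 * (zi - z2_hat) * (yi - y2_hat)
              for mi, yi, zi in zip(m, y_c, z_c))
    den = sum(mi ** 2 * (zi - z2_hat) ** 2 for mi, zi in zip(m, z_c))
    beta_star = num / den
    predictions = [y2_hat + beta_star * (zi - z2_hat) for zi in z_c]
    residuals = [yi - pi for yi, pi in zip(y_c, predictions)]
    return beta_star, predictions, residuals
```

Because both variables are centered at their M(x)-weighted means, the M(x)-weighted sum of the residuals vanishes whatever the value of the slope, which is the zero-mean property stated above.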


Hence, the point estimate is finally given by

$$\hat{Y}_{c,reg} = \hat{Y}_2 + \hat{\beta}^{*t}_0 (\hat{Z}^*_1 - \hat{Z}^*_2) = \hat{Y}_2 + \hat{\beta}^t_0 (\hat{Z}_1 - \hat{Z}_2)$$

where $\hat{\beta}^t_0 = (\beta_1, \hat{\beta}^{*t}_0)$ and $\beta_1$ is arbitrary. Note that the first component of $\hat{Z}_1 - \hat{Z}_2$ is zero and that none of the statistically relevant quantities depends on the arbitrary choice of $\beta_1$.

In short, the optimal design-based point estimate and its estimated variance can be obtained by standard regression procedures.

It is also worthwhile to note that direct regression of the $Y_c(x)$ on the $Z_c(x)$ with weight $M^2(x)$ leads to residuals satisfying $\sum_{x \in s_2} M^2(x) R_c(x) = 0$ instead of $\sum_{x \in s_2} M(x) R_c(x) = 0$, which is more intuitive. In simple cluster sampling $M(x) \equiv 1$, so that one could in this case also use ordinary least squares.

Further modifications are possible by estimating the matrix

$$E_x M^2(x) (Z^*_c(x) - \bar{Z}^*)(Z^*_c(x) - \bar{Z}^*)^t$$

in the large sample, see Mandallaz (1991) for details.

In two-phase two-stage sampling the theoretical results remain the same since the extra term due to two-stage sampling does not depend on the prediction model used. For estimation purposes it suffices to replace everywhere $Y_c(x)$ by $Y^*_c(x)$, since this is precisely what the variance estimate does (Mandallaz and Ye (1999)).

Finally, one can also use standard least squares of the $Y(x_l)$ or $Y^*(x_l)$ on the $Z(x_l)$, that is, by ignoring the cluster structure of the data, and then define the predictions and residuals at the cluster level. In this case we still have zero mean residual (2.25) but (2.27) may be violated. However, extensive simulations have shown that the differences between the point and variance estimates obtained with the various methods, including the model-based technique presented in Mandallaz (1991), are usually small and that the sampling error of $\hat{\beta}$ can be neglected (that this holds asymptotically is proved in Mandallaz (1991)). In our case the standard least squares estimates based on a simple ANOVA model with the working-strata as groups have a clear intuitive interpretation (which is not the case for the model at the cluster level): if $x_0 + e_l \in F_{2,gk}$ for a given $x_0 \in s_1$ then $\hat{Y}(x_0 + e_l)$ is the ordinary mean, i.e. ignoring the cluster structure, of all the estimated local densities $Y^*(x_m)$ with $x_m \in F_{2,gk}$, i.e. at points in the same working-stratum. This is the estimate we shall use, while assuming that (2.27) holds exactly when calculating the anticipated variance. Note that this is true for simple random sampling ($M(x) \equiv 1$). Intuitively this will hold approximately whenever the design-based intra-cluster correlation of the residual is small. Furthermore, one can assume asymptotically ($n_2 \to \infty$) that for $x_0 + e_l \in F_{2,gk}$ the prediction $\hat{Y}(x_0 + e_l)$ is equal to the true working-stratum mean $\bar{Y}_{2,gk} = \frac{1}{\lambda(F_{2,gk})} \sum_{i \in F_{2,gk}} Y_i$.


Chapter 3

Design-Based Aspects

3.1 Calculation of the Design-Based Variance

We shall now derive the design-based variance of the overall estimate (2.22) or of (2.23), which is equivalent. We shall calculate it conditionally on the sample sizes $n_{1,g}$ and $n_{2,g}$. Also, all the random variables occurring are bounded, so that convergence in probability implies convergence in the mean. For instance one can write

$$E_{x \in A}(\hat{p}_g) = E_{x \in A} \frac{\sum_{x \in s_1} M_g(x)}{n_1 E_{x \in A} M(x) (1 + o_p(n_1^{-1/2}))} = \frac{E_{x \in A} M_g(x)}{E_{x \in A} M(x)} + o(n_1^{-1/2}) = p_g + o(n_1^{-1/2})$$

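We shall use repeatedly such approximations, without explicit mention, and keep only the first order terms. With respect to the expectation and variance operators $E$ and $V$ the index 1 refers to the first phase, 2 to the second phase and 3 to the second stage. Hence, using (2.23), we have

$$\hat{Y} := E_{3|1,2} \hat{Y}^* = \frac{\sum_{x \in s_1} M(x) \hat{Y}_c(x)}{\sum_{x \in s_1} M(x)} + \sum_{g=1}^{D} \hat{p}_g \left( \frac{\sum_{x \in s_{2,g}} M_g(x) R_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)} \right) \tag{3.1}$$

since $E_3 Y^*(x + e_l) = Y(x + e_l)$. Furthermore,

$$E_{2,3|1} \hat{Y}^* = \frac{\sum_{x \in s_1} M(x) \hat{Y}_c(x)}{\sum_{x \in s_1} M(x)} + \sum_{g=1}^{D} \hat{p}_g \left( \frac{\sum_{x \in s_{1,g}} M_g(x) R_{c,g}(x)}{\sum_{x \in s_{1,g}} M_g(x)} \right)$$

and, by using the definition (2.21), also

$$E_{2,3|1} \hat{Y}^* = \frac{\sum_{x \in s_1} M(x) Y_c(x)}{\sum_{x \in s_1} M(x)} \tag{3.2}$$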

Thus, according to Mandallaz and Ye (1999), $\hat{Y}^*$ is asymptotically design unbiased (ADU), with a bias of order $o(n_{2,g}^{-1})$ (even exactly unbiased for simple random sampling with external model). To calculate the variance we first use the decomposition

$$V_{1,2,3} \hat{Y}^* = V_{1,2}(E_{3|1,2} \hat{Y}^*) + E_{1,2}(V_{3|1,2} \hat{Y}^*) \tag{3.3}$$

To calculate the second term in (3.3) we need

$$V_{3|1,2} \frac{\sum_{x \in s_{2,g}} M_g(x) R^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)} = V_{3|1,2} \frac{\sum_{x \in s_{2,g}} M_g(x) Y^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)}$$


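Since the second-stage samplings are independent we have by (2.14)

$$V_{3|1,2} \frac{\sum_{x \in s_{2,g}} M_g(x) Y^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)} = \frac{\frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} \sum_{l=1}^{M} I_{F_g}(x + e_l) V(x + e_l)}{n_{2,g} \left( \frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} M_g(x) \right)^2}$$

Therefore

$$E_{2|1}(V_{3|1,2} \hat{Y}^*) = \sum_{g=1}^{D} \hat{p}_g^2 \frac{1}{n_{2,g}} \frac{\frac{1}{n_{1,g}} \sum_{x \in s_{1,g}} \sum_{l=1}^{M} I_{F_g}(x + e_l) V(x + e_l)}{\left( \frac{1}{n_{1,g}} \sum_{x \in s_{1,g}} M_g(x) \right)^2}$$

Taking now the expectation with respect to the first phase and since $\hat{p}_g^2 = p_g^2 + o(n_{1,g}^{-1})$

$$E_1 E_{2|1} V_{3|1,2} \hat{Y}^* = \sum_{g=1}^{D} p_g^2 \frac{1}{n_{2,g}} \frac{E_g M_g(x) V_g}{E_g^2 M_g(x)}$$

where $E_g M_g(x) = E_{x \in A_g} M_g(x)$ and $V_g = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \left( \frac{R_i^2}{\pi_i^g p_i} - \frac{R_i^2}{\pi_i^g} \right)$ by (2.15). We therefore have the intermediate result

$$E_{1,2}(V_{3|1,2} \hat{Y}^*) = \sum_{g=1}^{D} p_g^2 \frac{1}{n_{2,g} E_g M_g(x)} \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \left( \frac{R_i^2}{\pi_i^g p_i} - \frac{R_i^2}{\pi_i^g} \right) \tag{3.4}$$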

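We still need $V_{1,2}(E_{3|1,2} \hat{Y}^*) = V_{1,2} \hat{Y} = V_1 E_{2|1} \hat{Y} + E_1 V_{2|1} \hat{Y}$. According to (3.2) the first term is

$$V_1 \frac{\sum_{x \in s_1} M(x) Y_c(x)}{\sum_{x \in s_1} M(x)} = \frac{1}{n_1 E^2_{x \in A} M(x)} E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2$$

For the second term we have

$$V_{2|1} \hat{Y} = V_{2|1} \sum_{g=1}^{D} \hat{p}_g \frac{\frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} M_g(x) R_{c,g}(x)}{\frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} M_g(x)}$$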

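Under Assumption I and since the second-phase drawings are independent between domains, this is asymptotically equivalent to

$$\sum_{g=1}^{D} \hat{p}_g^2 \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \frac{1}{n_{2,g}} \frac{\frac{1}{n_{1,g}} \sum_{x \in s_{1,g}} M_g^2(x) (R_{c,g}(x) - \bar{R}_{1,g})^2}{\left( \frac{1}{n_{1,g}} \sum_{x \in s_{1,g}} M_g(x) \right)^2}$$

where $\bar{R}_{1,g} = \frac{\sum_{x \in s_{1,g}} M_g(x) R_{c,g}(x)}{\sum_{x \in s_{1,g}} M_g(x)}$. Therefore we obtain

$$E_1 V_{2|1} \hat{Y} = \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g}} \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \frac{E_g M_g^2(x) (R_{c,g}(x) - \bar{R}_g)^2}{E_g^2 M_g(x)}$$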

Collecting the pieces we see that under Assumption I the asymptotic variance is given by

$$
\begin{aligned}
V_{1,2,3}(\hat{Y}^*) = {} & \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g}} \frac{1}{E_g M_g(x)} \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \left( \frac{R_i^2}{\pi_i^g p_i} - \frac{R_i^2}{\pi_i^g} \right) \\
& + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g}} \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \frac{E_g M_g^2(x) (R_{c,g}(x) - \bar{R}_g)^2}{E_g^2 M_g(x)} \\
& + \frac{1}{n_1} \frac{E M^2(x) (Y_c(x) - \bar{Y})^2}{E^2 M(x)}
\end{aligned}
\tag{3.5}
$$

This is the result given in Mandallaz and Lanz (2001). The first term is the second-stage variance within the domains, the second is the residual variance within the domains and the last term is the variance one would get if the true local densities were known at all $n_1$ points.

3.2 Variance Estimation

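We shall now derive an ADU estimate of the asymptotic theoretical variance (3.5). By analogy with the case of one domain treated in Mandallaz (1997) we can expect that the first two terms in (3.5) can be partially estimated by

$$A := \sum_{g=1}^{D} \frac{\hat{p}_g^2}{n_{2,g} \bar{M}_g^2} \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} M_g^2(x) (R^*_{c,g}(x) - \bar{R}^*_{2,g})^2$$

with $\bar{R}^*_{2,g} = \frac{\sum_{x \in s_{2,g}} M_g(x) R^*_{c,g}(x)}{\sum_{x \in s_{2,g}} M_g(x)}$ and $\bar{M}_g = \frac{\sum_{x \in s_{1,g}} M_g(x)}{n_{1,g}}$. Likewise, if all the $Y^*_c(x)$ for $x \in s_1$ were available, the last term in (3.5) could be partially estimated by

$$\frac{1}{n_1^2 \bar{M}_1^2} \sum_{x \in s_1} M^2(x) (Y^*_c(x) - \hat{Y}^*)^2$$

with $\bar{M}_1 = \frac{\sum_{x \in s_1} M(x)}{n_1}$. Under Assumption I this is equal to

$$\frac{1}{n_1^2 \bar{M}_1^2} \sum_{g=1}^{D} \sum_{x \in s_{1,g}} M_g^2(x) (Y^*_{c,g}(x) - \hat{Y}^*)^2$$

This can be estimated by using Horvitz-Thompson estimates within each $s_{1,g}$ by

$$B := \frac{1}{n_1^2 \bar{M}_1^2} \sum_{g=1}^{D} \frac{n_{1,g}}{n_{2,g}} \sum_{x \in s_{2,g}} M_g^2(x) (Y^*_{c,g}(x) - \hat{Y}^*)^2$$

We therefore set

$$
\begin{aligned}
\hat{V}(\hat{Y}^*) = {} & \sum_{g=1}^{D} \frac{\hat{p}_g^2}{n_{2,g} \bar{M}_g^2} \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \frac{1}{n_{2,g}} \sum_{x \in s_{2,g}} M_g^2(x) (R^*_{c,g}(x) - \bar{R}^*_{2,g})^2 \\
& + \frac{1}{n_1^2 \bar{M}_1^2} \sum_{g=1}^{D} \frac{n_{1,g}}{n_{2,g}} \sum_{x \in s_{2,g}} M_g^2(x) (Y^*_{c,g}(x) - \hat{Y}^*)^2
\end{aligned}
\tag{3.6}
$$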

We shall now prove that (3.6) is indeed an ADU estimate of (3.5), provided Assumptions I-III hold. To this end we shall replace asymptotically all the point estimates appearing in (3.6) by their true values (e.g. $\bar{M}_g$ by $E_g M_g(x)$, $\hat{Y}^*$ by $\bar{Y}$ and so on) and use the relations

$$E_g M_g^2(x) (R^*_{c,g}(x) - \bar{R}_g)^2 = E_g M_g^2(x) (R_{c,g}(x) - \bar{R}_g)^2 + E_g M_g(x) V_g \tag{3.7}$$

$$E_g M_g^2(x) (Y^*_{c,g}(x) - \bar{Y}_g)^2 = E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g)^2 + E_g M_g(x) V_g \tag{3.8}$$

where $\bar{Y}_g = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} Y_i$. They can be checked by direct calculations, similar to those of the second-stage variance given in the proof of (3.5). It follows from (3.7) that

$$E_{1,2,3}(A) = \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} E_g^2 M_g(x)} \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) \left( E_g M_g^2(x) (R_{c,g}(x) - \bar{R}_g)^2 + E_g M_g(x) V_g \right)$$

To calculate $E_{1,2,3}(B)$ we set $n_{1,g} = n_1 p_g$, write $(Y^*_{c,g}(x) - \bar{Y}) = (Y^*_{c,g}(x) - \bar{Y}_g + \bar{Y}_g - \bar{Y})$, expand the square, use $E_{3|1,2} Y^*_{c,g}(x) = Y_{c,g}(x)$ and (3.8). Collecting the terms yields

$$E_{1,2,3}(B) = \frac{1}{n_1 E^2 M(x)} \sum_{g=1}^{D} p_g E_g M_g(x) V_g + \frac{1}{n_1 E^2 M(x)} \sum_{g=1}^{D} p_g E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y})^2$$

Under Assumptions I-III we can rewrite the last term of (3.5) by using the relation

$$E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2 = \sum_{g=1}^{D} p_g E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y})^2$$

and check, by calculating $E(A) + E(B)$ with $n_{1,g} = n_1 p_g$, that (3.6) is an ADU estimate of (3.5).
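The variance estimate (3.6) is a sum of two sample quantities, A and B, and can be evaluated directly from second-phase data. The following is a minimal sketch, assuming the data are supplied per domain as plain lists; the dictionary keys, the function name and the way the inputs are packaged are hypothetical and not taken from the report.

```python
def variance_estimate(domains, n1, m1_bar, y_hat):
    """Evaluate A + B as in (3.6).

    domains : list of dicts, one per domain g, with keys
        "p_hat"  : estimated relative domain size
        "n1g"    : number of first-phase clusters in the domain
        "mg_bar" : mean number of points per first-phase cluster in the domain
        "m"      : M_g(x) for the second-phase clusters
        "y_star" : estimated cluster-level local densities Y*_{c,g}(x)
        "r_star" : empirical cluster-level residuals R*_{c,g}(x)
    n1, m1_bar : first-phase sample size and overall mean cluster size
    y_hat      : overall point estimate
    """
    a_term = 0.0
    b_sum = 0.0
    for d in domains:
        m, y, r = d["m"], d["y_star"], d["r_star"]
        n2g = len(m)
        # M-weighted mean residual in the second-phase sample of the domain
        r_bar = sum(mi * ri for mi, ri in zip(m, r)) / sum(m)
        ss_r = sum(mi ** 2 * (ri - r_bar) ** 2 for mi, ri in zip(m, r))
        a_term += (d["p_hat"] ** 2 / (n2g * d["mg_bar"] ** 2)
                   * (1 - n2g / d["n1g"]) * ss_r / n2g)
        ss_y = sum(mi ** 2 * (yi - y_hat) ** 2 for mi, yi in zip(m, y))
        # Expansion factor n1g / n2g from second-phase to first-phase sample
        b_sum += d["n1g"] / n2g * ss_y
    b_term = b_sum / (n1 ** 2 * m1_bar ** 2)
    return a_term + b_term
```

The sketch takes the residuals as given; in practice they would come from the regression fit at the cluster level.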


Chapter 4

The Anticipated Variance

4.1 Mathematical Prerequisites

The calculation of the anticipated variance under cluster sampling is technically intricate. In order to have analytically tractable results we need two further assumptions.

Assumption IV:

$$E_{x \in A_g} [M_g(x) Y_{c,g}(x) \mid M_g(x)] = M_g(x) \bar{Y}_g$$

Assumption V:

$$E_{x \in A_g} \left[ \sum_{k=1}^{P_{2g}} M_{2,gk}(x) \bar{Y}_{2,gk} \mid M_g(x) \right] = M_g(x) \bar{Y}_g$$

Assumptions I-V are all satisfied under simple random sampling. Assumptions IV and V essentially say that the number of points per cluster falling within the forest area is not related to the values of the response variable. This could be violated if for instance values of timber volume at the forest edge differed markedly from values in the interior. In any case, these are boundary effects, specific to each given forest and therefore beyond general theory. Furthermore, they are negligible for very large forested areas. Following Mandallaz and Ye (1999) and Mandallaz and Lanz (2001) we introduce the inflation factors due to cluster sampling with respect to Poisson-strata, working-strata and domains according to

$$
\begin{aligned}
(1 + \theta_{1,g}) \beta_{1,g}^2 &= \frac{V_{x \in A_g} \left( \sum_{k=1}^{P_{1g}} M_{1,gk}(x) (\bar{Y}_{1,gk} - \bar{Y}_g) \right)}{E_g M_g(x)} \\
(1 + \theta_{2,g}) \beta_{2,g}^2 &= \frac{V_{x \in A_g} \left( \sum_{k=1}^{P_{2g}} M_{2,gk}(x) (\bar{Y}_{2,gk} - \bar{Y}_g) \right)}{E_g M_g(x)} \\
(1 + \tilde{\theta}_2) \tilde{\beta}_2^2 &= \frac{V_{x \in A} \left( \sum_{g=1}^{D} \sum_{k=1}^{P_{2g}} M_{2,gk}(x) (\bar{Y}_{2,gk} - \bar{Y}) \right)}{E_{x \in A} M(x)} \\
(1 + \theta) \beta^2 &= \frac{V_{x \in A} \left( \sum_{g=1}^{D} M_g(x) (\bar{Y}_g - \bar{Y}) \right)}{E_{x \in A} M(x)}
\end{aligned}
\tag{4.1}
$$


In the above equations the variances between Poisson-strata, between working-strata and between domains are defined as

$$
\begin{aligned}
\beta_{1,g}^2 &= \sum_{k=1}^{P_{1g}} p_{1,gk} (\bar{Y}_{1,gk} - \bar{Y}_g)^2 \\
\beta_{2,g}^2 &= \sum_{k=1}^{P_{2g}} p_{2,gk} (\bar{Y}_{2,gk} - \bar{Y}_g)^2 \\
\tilde{\beta}_2^2 &= \sum_{g=1}^{D} \sum_{k=1}^{P_{2g}} p_g p_{2,gk} (\bar{Y}_{2,gk} - \bar{Y})^2 \\
\beta^2 &= \sum_{g=1}^{D} p_g (\bar{Y}_g - \bar{Y})^2
\end{aligned}
\tag{4.2}
$$

The relative surface areas are $p_{1,gk} = \frac{\lambda(F_{1,gk})}{\lambda(F_g)}$ and $p_{2,gk} = \frac{\lambda(F_{2,gk})}{\lambda(F_g)}$. $M_{1,gk}(x)$ is the number of points of the $x$-th cluster falling within the Poisson-stratum $F_{1,gk}$, likewise $M_{2,gk}(x)$ for the working-stratum $F_{2,gk}$.

We define the lack of fit in the $g$-th domain as

$$\Delta_g^2 = (1 + \theta_{1,g}) \beta_{1,g}^2 - (1 + \theta_{2,g}) \beta_{2,g}^2 \tag{4.3}$$

It is zero if the working-strata coincide with the Poisson-strata. We shall return later to interpreting and estimating $\Delta_g^2$.

We need the following two important results, the proofs of which are given in the Appendix.

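Lemma 1: anticipated variance of local densities with respect to Poisson-strata

$$E_{\omega} \frac{E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g)^2}{E_g^2 M_g(x)} = \frac{1}{E_g M_g(x)} \left( \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^g} + (1 + \theta_{1,g}) \beta_{1,g}^2 \right) \tag{4.4}$$

Lemma 2: variance of predictions with respect to working-strata

$$\frac{E_g M_g^2(x) (\hat{Y}_{c,g}(x) - \bar{Y}_g)^2}{E_g^2 M_g(x)} = \frac{1}{E_g M_g(x)} (1 + \theta_{2,g}) \beta_{2,g}^2 \tag{4.5}$$

We also need the following decomposition, which holds under Assumptions I, II, III and V.

Lemma 3

$$(1 + \tilde{\theta}_2) \tilde{\beta}_2^2 = \sum_{g=1}^{D} p_g (1 + \theta_{2,g}) \beta_{2,g}^2 + (1 + \theta) \beta^2 \tag{4.6}$$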


Proof:

Under Assumptions III and V we have zero expectation and therefore

$$V_{x \in A} \left( \sum_{g=1}^{D} \sum_{k=1}^{P_{2g}} M_{2,gk}(x) (\bar{Y}_{2,gk} - \bar{Y}) \right) = E_{x \in A} \left( \sum_{g=1}^{D} \sum_{k=1}^{P_{2g}} M_{2,gk}(x) (\bar{Y}_{2,gk} - \bar{Y}) \right)^2$$

Under Assumption I the cross terms with different indexes $g$ vanish. Conditioning on $x \in A_g$ we get under Assumption II the equivalent expression

$$\sum_{g=1}^{D} p_g E_{x \in A_g} \left( \sum_{k=1}^{P_{2g}} M_{2,gk}(x) (\bar{Y}_{2,gk} - \bar{Y}) \right)^2$$

We shall further set $\bar{Y}_{2,gk} - \bar{Y} = \bar{Y}_{2,gk} - \bar{Y}_g + \bar{Y}_g - \bar{Y}$ and note that $\sum_{k=1}^{P_{2g}} M_{2,gk}(x) = M_g(x)$. Expanding the square we see by Assumption V and after conditioning on $M_g(x)$ that the cross-product term also vanishes. To complete the proof one checks that $(1 + \theta) \beta^2 = \frac{E_{x \in A} M^2(x)}{E_{x \in A} M(x)} \sum_{g=1}^{D} p_g (\bar{Y}_g - \bar{Y})^2$ under Assumptions I-III. Collecting the terms we obtain (4.6).
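The decomposition (4.6) can be checked numerically in the special case of simple random sampling, where all inflation factors equal one, so that it reduces to a classical within/between split of the variances defined in (4.2). The sketch below assumes that special case; the data layout and function name are hypothetical.

```python
def between_variances(domains):
    """Between-strata and between-domain variances of (4.2).

    domains : list of (p_g, strata), where strata is a list of
              (p_2gk, mean_2gk) pairs whose p_2gk sum to 1 within the domain.
    Returns (per-domain working-strata variances, overall working-strata
    variance, between-domain variance).
    """
    mean_g = [sum(p * y for p, y in strata) for _, strata in domains]
    mean_all = sum(pg * yg for (pg, _), yg in zip(domains, mean_g))
    var_2g = [sum(p * (y - yg) ** 2 for p, y in strata)
              for (_, strata), yg in zip(domains, mean_g)]
    var_2_overall = sum(pg * p * (y - mean_all) ** 2
                        for pg, strata in domains for p, y in strata)
    var_domains = sum(pg * (yg - mean_all) ** 2
                      for (pg, _), yg in zip(domains, mean_g))
    return var_2g, var_2_overall, var_domains
```

With these quantities, the overall working-strata variance equals the p_g-weighted sum of the per-domain variances plus the between-domain variance, which is (4.6) with unit inflation factors.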

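Finally we need the following result, which holds under Assumptions I-IV.

Lemma 4

$$\frac{1}{n_1} \frac{E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2}{E^2_{x \in A} M(x)} = \sum_{g=1}^{D} \frac{p_g^2}{n_{1,g}} \frac{E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g)^2}{E_g^2 M_g(x)} + \frac{1}{n_1 E^2_{x \in A} M(x)} \sum_{g=1}^{D} p_g E_g M_g^2(x) (\bar{Y}_g - \bar{Y})^2 \tag{4.7}$$

Proof:

Write $Y_c(x) - \bar{Y} = Y_c(x) - \bar{Y}_g + \bar{Y}_g - \bar{Y}$ whenever $x \in A_g$. Then

$$E_{x \in A} M^2(x) (Y_c(x) - \bar{Y})^2 = \sum_{g=1}^{D} p_g E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g)^2 + \sum_{g=1}^{D} p_g E_g M_g^2(x) (\bar{Y}_g - \bar{Y})^2$$

This results from the fact that the cross-product term is equal to

$$2 \sum_{g=1}^{D} p_g (\bar{Y}_g - \bar{Y}) E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g)$$

which vanishes under Assumption IV since

$$2 E_g M_g^2(x) (Y_{c,g}(x) - \bar{Y}_g) = 2 E_g \left\{ M_g(x) E_g \left[ M_g(x) (Y_{c,g}(x) - \bar{Y}_g) \mid M_g(x) \right] \right\}$$

Replacing $n_{1,g}$ by $p_g n_1$ we obtain (4.7).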

4.2 Calculation of the Anticipated Variance

We can now calculate the anticipated variance. The second term in (3.5) contains the residuals for which we use the decomposition (2.28). This term appears once with the factor $\frac{1}{n_{2,g}}$, for which we use Lemma 1, Lemma 2 and the lack-of-fit term, and once more with the factor $-\frac{1}{n_{1,g}}$, for which we use Lemma 3 and Lemma 4 in order to combine it with the last term in (3.5). Finally, we use the prediction model at the tree level (2.18) and the arguments leading to (2.19). This allows us to shift the expression $-\frac{R_i^2}{\pi_i^g}$ appearing in the first term of (3.5) into the first term of Lemma 1. After some algebra this leads to the following fundamental result:

Under Assumptions I-V the asymptotic anticipated variance of (2.22) is given by

$$
\begin{aligned}
E_{\omega} V_{1,2,3}(\hat{Y}^*) = {} & \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} E_g M_g(x)} \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^g p_i} \\
& + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} E_g M_g(x)} \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2}}{\pi_i^g} \\
& + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} E_g M_g(x)} \Delta_g^2 \\
& + \frac{1}{n_1 E M(x)} (1 + \tilde{\theta}_2) \tilde{\beta}_2^2
\end{aligned}
\tag{4.8}
$$

The first term derives from the second-stage sampling within domains, the second is the anticipated variance within domains as if they were global Poisson forests, the third derives from the lack of fit of the prediction models used and the fourth from the overall heterogeneity of the working-strata. If the working-strata coincide with the Poisson-strata, then the lack-of-fit terms are zero and we get the result given in Mandallaz and Lanz (2001). We used $E_g M_g(x)$ instead of $E_{x \in A} M(x)$ in the above formula in order to emphasize the additive structure of the variance terms across domains. Besides, one can conjecture that (4.8) remains approximately valid even if Assumption III (constant cluster size) is violated. In contrast to the previous results we note that as $n_1 \to \infty$ the heterogeneity of the strata is not entirely removed unless the lack-of-fit term is zero. We shall see that this term will play a key role for optimization. In the next section we shall give an intuitive interpretation for the lack of fit as well as a technique to estimate it.
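To make the structure of (4.8) concrete, the following sketch adds up its four terms from aggregated inputs. It assumes a constant expected cluster size across domains (Assumption III) and that the per-domain sums have already been divided by the squared surface area; the dictionary keys and the function name are hypothetical.

```python
def anticipated_variance(domains, n1, em, heterogeneity):
    """Sum the four terms of (4.8).

    domains : list of dicts with keys
        "p"            : relative domain size p_g
        "n2"           : second-phase sample size n_{2,g}
        "second_stage" : second-stage term, sum of R_i^2 / (pi_i p_i),
                         already divided by the squared surface area
        "first_stage"  : first-stage term, sum of Y*_i^2 / pi_i,
                         already divided by the squared surface area
        "lack_of_fit"  : lack-of-fit term of the domain
    n1            : first-phase sample size
    em            : expected number of points per cluster
    heterogeneity : inflated overall variance between working-strata
    """
    total = 0.0
    for d in domains:
        factor = d["p"] ** 2 / (d["n2"] * em)
        total += factor * (d["second_stage"] + d["first_stage"]
                           + d["lack_of_fit"])
    return total + heterogeneity / (n1 * em)
```

Letting `n1` grow in this sketch only removes the last term; the lack-of-fit contribution stays attached to the second-phase sample sizes, in line with the remark above.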

4.3 Interpretation and Estimation of the Lack of Fit

It is enough to consider a single domain. In order to illustrate the main point we consider one-stage simple random sampling (then $M_g(x) \equiv 1$ and $1 + \theta_{1,g} = 1 + \theta_{2,g} = 1$). Let us assume that each working-stratum is the union of Poisson-strata. That is, we set $F_{2,gk} = \bigcup_{j=1}^{P_{1gk}} F_{1,gkj}$ where $F_{1,gkj}$ is the $j$-th Poisson-stratum of the $k$-th working-stratum in the $g$-th domain. With this notation we obtain $\beta_{1,g}^2 = \sum_{k=1}^{P_{2g}} \sum_{j=1}^{P_{1gk}} p_{1,gkj} (\bar{Y}_{1,gkj} - \bar{Y}_g)^2$. Writing $\bar{Y}_{1,gkj} - \bar{Y}_g = \bar{Y}_{1,gkj} - \bar{Y}_{2,gk} + \bar{Y}_{2,gk} - \bar{Y}_g$ and expanding the square we obtain

$$\Delta_g^2 = \beta_{1,g}^2 - \beta_{2,g}^2 = \sum_{k=1}^{P_{2g}} \sum_{j=1}^{P_{1gk}} p_{1,gkj} (\bar{Y}_{1,gkj} - \bar{Y}_{2,gk})^2$$

So that the lack-of-fit term is simply the remaining heterogeneity not accounted for by the working-strata.

Let us now consider the general case. We assume that we have a one-phase two-stage inventory with a given cluster structure (it could also be a two-phase inventory but we only need the terrestrial data). The idea is very simple: we break down the overall sum of squares of the estimated local densities according to the ANOVA model based on the working-strata. The asymptotic anticipated values of the sums of squares "Model" and "Residual" are obtained by the techniques used in sections 4.1 and 4.2. Table 4.1 gives the results.

Table 4.1: ANOVA with Working-Strata

| Source | Sum of Squares | Anticipated Values |
| Model | $\sum_{x \in s_{2,g}} M_g^2(x) (\hat{Y}_{c,g}(x) - \hat{Y}^*_g)^2$ | $n_{2,g} E_g M_g(x) (1 + \theta_{2,g}) \beta_{2,g}^2$ |
| Residual | $\sum_{x \in s_{2,g}} M_g^2(x) (Y^*_{c,g}(x) - \hat{Y}_{c,g}(x))^2$ | $n_{2,g} E_g M_g(x) (\Gamma_g^2 + \Delta_g^2)$ |
| Total | $\sum_{x \in s_{2,g}} M_g^2(x) (Y^*_{c,g}(x) - \hat{Y}^*_g)^2$ | $n_{2,g} E_g M_g(x) (\Gamma_g^2 + (1 + \theta_{1,g}) \beta_{1,g}^2)$ |
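The single-domain identity for nested strata derived at the start of this section (simple random sampling, working-strata formed as unions of Poisson-strata) is easy to verify numerically. The sketch below assumes exactly that setting; the input layout and function name are hypothetical.

```python
def lack_of_fit_nested(working_strata):
    """Lack of fit under simple random sampling with nested strata.

    working_strata : list of working-strata, each a list of
                     (p_1gkj, mean_1gkj) pairs for its Poisson-strata;
                     the relative areas sum to 1 over the whole domain.
    Returns (difference of the two between-strata variances,
             heterogeneity of Poisson-strata within working-strata).
    """
    mean_g = sum(p * y for ws in working_strata for p, y in ws)
    var_poisson = sum(p * (y - mean_g) ** 2
                      for ws in working_strata for p, y in ws)
    var_working = 0.0
    within = 0.0
    for ws in working_strata:
        p2 = sum(p for p, _ in ws)                  # area of working-stratum
        mean2 = sum(p * y for p, y in ws) / p2      # working-stratum mean
        var_working += p2 * (mean2 - mean_g) ** 2
        within += sum(p * (y - mean2) ** 2 for p, y in ws)
    return var_poisson - var_working, within
```

Both returned values agree, which illustrates that the lack of fit is the heterogeneity left unexplained by the working-strata.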

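The pure-error term $\Gamma_g^2$ is defined as

$$\Gamma_g^2 = \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2}}{\pi_i^g} + \frac{1}{\lambda^2(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\pi_i^g p_i} \tag{4.9}$$

We can also interpret Table 4.1 as a classical analysis of variance. We introduce the observed coefficient of determination $R_g^2 = \frac{\text{Model}}{\text{Total}}$ and we can write

$$\frac{(1 - R_g^2) \, \text{Total}}{n_{2,g} E_g M_g(x)} - \Gamma_g^2 = \Delta_g^2$$

The lack-of-fit term $\Delta_g^2$ is therefore a decreasing function of the goodness of fit of the model, as expressed by the coefficient of determination $R_g^2$, whence the expression lack of fit. Note that $R_g^2$ is also a decreasing function of $\Gamma_g^2$. The estimated relative variance reduction satisfies

$$\text{M.RED} = \frac{\hat{V}(\hat{Y}^*_{g,\text{one-phase}}) - \hat{V}(\hat{Y}^*_{g,\text{two-phase}})}{\hat{V}(\hat{Y}^*_{g,\text{one-phase}})} = \left( 1 - \frac{n_{2,g}}{n_{1,g}} \right) R_g^2 \tag{4.10}$$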

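To estimate the pure-error term we rewrite it as a density

$$\Gamma_g^2 = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2}}{\lambda(K_i)} + \frac{1}{\lambda(F_g)} \sum_{i \in F_g} \frac{R_i^2}{\lambda(K_i) p_i} \tag{4.11}$$

Hence, at a given point $u$ we can construct the estimate

$$\Gamma_g^{*2}(u) = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} \frac{Y_i^{*2} I_i(u)}{\pi_i^g \lambda(K_i)} + \frac{1}{\lambda(F_g)} \sum_{i \in s_2(u)} \frac{R_i^2}{\pi_i^g \lambda(K_i) p_i^2} \tag{4.12}$$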

We can obtain an estimate of $\Gamma_g^2$ and of its variance by using the previous techniques with $Y^*(x + e_l) = \Gamma_g^{*2}(x + e_l)$. Note also that $\Gamma_g^2$ can be estimated for any new sampling scheme with data from a past inventory: simply replace $K_i$ by $K_{i,new}$, $\pi_i^g$ by $\pi^g_{i,old}$ and $p_i^2$ by $p_{i,old} p_{i,new}$ in formula (4.12). At the end of section 5.2 we shall give an alternative procedure to estimate the pure-error term, which requires only aggregated data and not the tree data at the plot level. Consequently, by the ANOVA table, one can also estimate the lack-of-fit term $\Delta_g^2$ if the cluster geometry of the existing inventory is the same as the cluster geometry of the inventory for which we want to predict the anticipated variance (the inclusion probabilities appearing in the pure-error terms can differ). The crucial point is that we do not need to know the Poisson-strata to do that: we only assume that they do exist. If the working-strata are available in the form of thematic maps it is possible, using simulations and GIS, to estimate the term $(1 + \theta_{2,g}) \beta_{2,g}^2$ for any cluster structure. This is not possible for the term $(1 + \theta_{1,g}) \beta_{1,g}^2$ of the Poisson-strata. However, one can estimate the term $\beta_{1,g}^2$ from the inventory at hand (with clusters of nominal size $M$ simply compute $M$ ANOVA tables for simple random sampling and pool the resulting estimates of $\beta_{1,g}^2$, likewise for $\beta_{2,g}^2$). According to Mandallaz and Ye (1999) it can be conjectured that

$$1 \le 1 + \theta_{1,g} \le 1 + \theta_{2,g} \le E_g M_g(x) + \frac{V_g M_g(x)}{E_g M_g(x)}$$

In short, one can estimate the lack-of-fit term if the geometry of the cluster is constant or at least work out a plausible scenario if it is not.
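The ANOVA-based relations of this section (coefficient of determination, lack of fit obtained from the residual sum of squares and the pure-error term, and the relative variance reduction (4.10)) can be combined into one small helper. This is a sketch only: the sums of squares and the pure-error estimate are taken as given inputs, and the function and argument names are hypothetical.

```python
def anova_summary(model_ss, total_ss, n1g, n2g, em, pure_error):
    """Summaries derived from the ANOVA with working-strata.

    model_ss, total_ss : observed "Model" and "Total" sums of squares
    n1g, n2g           : first- and second-phase sample sizes in the domain
    em                 : expected number of points per cluster in the domain
    pure_error         : estimate of the pure-error term
    Returns (coefficient of determination, lack-of-fit estimate,
             estimated relative variance reduction).
    """
    r2 = model_ss / total_ss
    # Residual mean square per cluster point, minus the pure-error part
    lack_of_fit = (1 - r2) * total_ss / (n2g * em) - pure_error
    reduction = (1 - n2g / n1g) * r2
    return r2, lack_of_fit, reduction
```

A negative lack-of-fit value can occur in finite samples because both inputs are estimates; how to handle that case is left open here.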


Chapter 5

Optimization

5.1 Costs

We shall use the following terminology, which is adapted from Mandallaz and Ye (1999).

1. The first-phase costs can be written as $\sum_{g=1}^{D} n_{1,g} E_g M(x) c_{1g}$, which is asymptotically equivalent under Assumption III to $n_1 E M(x) \sum_{g=1}^{D} p_g c_{1g}$. To simplify we shall write this as $n_1 E M(x) c_1$ where $c_1$ is the mean first-phase cost per point.

2. Within each domain $F_g$ and for $n_{2,g}$ second-phase clusters, the total costs of traveling between and within clusters, together with the installation costs, will be written as $\phi_g(n_{2,g})$. The total cost is therefore $\sum_{g=1}^{D} \phi_g(n_{2,g})$.

3. The linearized version of the traveling costs between clusters within domains will be written as $\delta_g + s_g n_{2,g}$. The intercept terms $\delta_g$ must be subtracted from the overall budget $C$, so that we set $\tilde{C} = C - \sum_{g=1}^{D} \delta_g$.

4. Let $c_{og}$ denote the mean installation costs per point and $T_{cg}$ the mean traveling time within clusters. Set $c_{cg} = c_{og} + \frac{T_{cg}}{E M(x)}$. Then one has

$$\phi_g(n_{2,g}) \approx \delta_g + c_{2g} n_{2,g}$$

where $c_{2g} = E M(x) c_{cg} + s_g$.

5. $c_{21g}$ is the mean unit cost per first-stage tree to obtain the approximate value $Y^*_i$ of $Y_i$. This might entail for instance the time required to measure only the diameter at 1.3 m.

6. $c_{22g}$ is the mean unit cost per second-stage tree to perform the extra measurements required to know the exact response $Y_i$. This might entail for instance the time required to measure diameter at 7 m and the tree height.

Consequently we shall use the following cost constraint when $n_1$ and the $n_{2,g}$ are given

$$C = E M(x) n_1 c_1 + \sum_{g=1}^{D} \phi_g(n_{2,g}) + E M(x) \sum_{g=1}^{D} n_{2,g} \left( c_{21g} \sum_{i \in F_g} \pi_i^g + c_{22g} \sum_{i \in F_g} \pi_i^g p_i \right) \tag{5.1}$$

and the following linear approximation when we also optimize the sample sizes

$$\tilde{C} = E M(x) n_1 c_1 + \sum_{g=1}^{D} c_{2g} n_{2,g} + E M(x) \sum_{g=1}^{D} n_{2,g} \left( c_{21g} \sum_{i \in F_g} \pi_i^g + c_{22g} \sum_{i \in F_g} \pi_i^g p_i \right) \tag{5.2}$$
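The linearized cost constraint (5.2) is a plain sum of a first-phase term and, for each domain, a cluster term and a tree-measurement term. A minimal sketch of its evaluation follows; the per-domain sums of inclusion probabilities are assumed to be supplied already aggregated, and all key and function names are hypothetical.

```python
def linearized_cost(n1, em, c1, domains):
    """Evaluate the right-hand side of the linear cost approximation (5.2).

    n1, em, c1 : first-phase sample size, expected number of points per
                 cluster, mean first-phase cost per point
    domains    : list of dicts with keys
        "n2"       : number of second-phase clusters
        "c2"       : linearized cost per second-phase cluster
        "c21"      : unit cost per first-stage tree
        "c22"      : unit cost per second-stage tree
        "sum_pi"   : sum of first-stage inclusion probabilities
        "sum_pi_p" : sum of unconditional second-stage probabilities
    """
    cost = em * n1 * c1
    for d in domains:
        cost += d["c2"] * d["n2"]
        cost += em * d["n2"] * (d["c21"] * d["sum_pi"]
                                + d["c22"] * d["sum_pi_p"])
    return cost
```

Such a function can serve as the constraint when comparing candidate sample sizes under a fixed reduced budget.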

5.2 Discrete PPS Approximation

It was shown in Mandallaz and Ye (1999) that the optimal sampling schemes must be found in the class of discrete PPS approximations. We follow here the new version given in Mandallaz and Lanz (2001). We give the calculations for the true values $Y_i$. The results will be used in the next section mutatis mutandis for the predictions $Y^*_i$. Only a single domain needs to be considered. We assume that the $N$ values $Y_i$ of the response variable $Y$ are partitioned into classes $C_l$, $l = 1, 2, \ldots, K$ and that the first-stage inclusion probabilities $\pi_i$ are stepwise constant, i.e. we set $\pi_i = \tilde{\pi}_l$ whenever $Y_i \in C_l$. In practice $Y_i$ is usually the timber volume of the $i$-th tree. In Mandallaz and Ye (1999) the optimal discrete approximation was found in the class $\pi_i = \alpha f(Y_i)$ where $\alpha$ is a constant and the stepwise constant function $f$ is defined according to $f(Y_i) = E^*(Y \mid Y \in C_l)$ whenever $Y_i \in C_l$, where $E^*$ denotes the expectation with respect to the discrete distribution of the random variable $Y$ with values $Y_i$. According to Mandallaz and Lanz (2001) this choice can be slightly improved by using $g(Y_i) = \sqrt{E^*(Y^2 \mid Y \in C_l)}$ instead whenever $Y_i \in C_l$. We set

$$\gamma := \frac{\sum_{i=1}^{N} \frac{Y_i^2}{f(Y_i)}}{\sum_{i=1}^{N} Y_i} \ge 1 \tag{5.3}$$

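This inequality follows from the following calculations:

$$\sum_{i=1}^{N} \frac{Y_i^2}{f(Y_i)} = N E^* \frac{Y^2}{f(Y)} = N \sum_{l=1}^{K} P(Y \in C_l) E^* \left( \frac{Y^2}{f(Y)} \mid Y \in C_l \right)$$

which by the definition of the function $f(\cdot)$ and the property of the conditional expectation is equal to

$$N \sum_{l=1}^{K} P(Y \in C_l) \frac{E^*(Y^2 \mid Y \in C_l)}{E^*(Y \mid Y \in C_l)} \ge N \sum_{l=1}^{K} P(Y \in C_l) E^*(Y \mid Y \in C_l) = N E^*(Y) = \sum_{i=1}^{N} Y_i$$

Likewise we define

$$\tilde{\gamma} := \frac{\sum_{i=1}^{N} \frac{Y_i^2}{g(Y_i)}}{\sum_{i=1}^{N} Y_i} = \frac{\sum_{i=1}^{N} g(Y_i)}{\sum_{i=1}^{N} Y_i} \ge 1 \tag{5.4}$$

The second equality results from

$$\sum_{i=1}^{N} \frac{Y_i^2}{g(Y_i)} = N E^* \frac{Y^2}{g(Y)} = N \sum_{l=1}^{K} P(Y \in C_l) E^* \left( \frac{Y^2}{g(Y)} \mid Y \in C_l \right)$$

which by the definition of the function $g(\cdot)$ and the property of the conditional expectation is equivalent to

$$N \sum_{l=1}^{K} P(Y \in C_l) \frac{E^*(Y^2 \mid Y \in C_l)}{\sqrt{E^*(Y^2 \mid Y \in C_l)}} = N \sum_{l=1}^{K} P(Y \in C_l) \sqrt{E^*(Y^2 \mid Y \in C_l)} = \sum_{i=1}^{N} g(Y_i)$$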


The inequality results from the fact that for $Y_i \in C_l$ we have

$$g(Y_i) = \sqrt{E^*(Y^2 \mid Y \in C_l)} \ge \sqrt{E^{*2}(Y \mid Y \in C_l)} = E^*(Y \mid Y \in C_l)$$

and consequently also

$$\sum_{i=1}^{N} g(Y_i) \ge N \sum_{l=1}^{K} P(Y \in C_l) E^*(Y \mid Y \in C_l) = N E^*(Y) = \sum_{i=1}^{N} Y_i$$

In contrast we note that by construction we have

$$\sum_{i=1}^{N} f(Y_i) = N \sum_{l=1}^{K} P(Y \in C_l) E^*(Y \mid Y \in C_l) = N E^* Y = \sum_{i=1}^{N} Y_i$$

We note that $\gamma = \tilde{\gamma} = 1$ when the number of classes $K$ is equal to the number of distinct $Y_i$ values. From (5.4) we have at once

$$\sum_{i=1}^{N} \frac{Y_i^2}{g(Y_i)} = \sum_{i=1}^{N} g(Y_i) = \tilde{\gamma} \lambda(F) \bar{Y} \tag{5.5}$$

We also have the inequalities

$$\gamma \ge \tilde{\gamma}^2 \ge \tilde{\gamma} \tag{5.6}$$

The second holds because $\tilde{\gamma} \ge 1$. To prove the first set

$$a_l = \sqrt{P(Y \in C_l) \frac{E^*(Y^2 \mid Y \in C_l)}{E^*(Y \mid Y \in C_l)}} \quad \text{and} \quad b_l = \sqrt{P(Y \in C_l) E^*(Y \mid Y \in C_l)}$$

Then check that $\gamma = \frac{\sum_l a_l^2 \sum_l b_l^2}{E^{*2}(Y)}$ and use the Cauchy-Schwarz inequality

$$\sum_l a_l^2 \sum_l b_l^2 \ge \left( \sum_l a_l b_l \right)^2 \tag{5.7}$$

to get the result. For one single class ($K = 1$) it can be checked that $\tilde{\gamma}^2 = \gamma$. When $Y_i$ is the timber volume based on the DBH (Diameter at Breast Height, i.e. 1.3 m above the ground) numerical integrations for many analytical distributions and direct calculations based on empirical DBH distributions from the Swiss National Inventory (SNI) show that for two and more classes $C_l$ one has $\tilde{\gamma}^2 \approx \gamma$ (see section 6.2).
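The two coefficients of (5.3) and (5.4) and the inequalities (5.6) can be checked on a toy population. The following sketch assumes strictly positive values and a user-supplied classification function; the names are hypothetical and the tree values are invented purely for illustration.

```python
from math import sqrt

def pps_coefficients(values, class_of):
    """Coefficients of the discrete PPS approximations based on f and g.

    values   : strictly positive response values Y_i
    class_of : function mapping a value to its class label
    Returns (coefficient based on class means f,
             coefficient based on class root mean squares g).
    """
    classes = {}
    for y in values:
        classes.setdefault(class_of(y), []).append(y)
    # f: conditional mean within the class; g: conditional root mean square
    f = {c: sum(ys) / len(ys) for c, ys in classes.items()}
    g = {c: sqrt(sum(y * y for y in ys) / len(ys))
         for c, ys in classes.items()}
    total = sum(values)
    coef_f = sum(y * y / f[class_of(y)] for y in values) / total
    coef_g = sum(y * y / g[class_of(y)] for y in values) / total
    return coef_f, coef_g
```

For example, `pps_coefficients([1, 2, 3, 10, 20, 60], lambda y: y >= 10)` uses two classes split at 10 and returns two values satisfying the chain of inequalities in (5.6).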

We now apply the above results to the anticipated variance (4.8) and in particular to the term $\sum_{i=1}^{N} \frac{Y_i^2}{\pi_i}$ (note that $Y_i = Y^*_i$ in one-stage sampling). By the Cauchy-Schwarz inequality with $a_i = \frac{Y_i}{\sqrt{\pi_i}}$ and $b_i = \sqrt{\pi_i}$ one has

$$\sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} \ge \frac{\left( \sum_{i=1}^{N} Y_i \right)^2}{\sum_{i=1}^{N} \pi_i}$$

with equality if and only if $\pi_i = \alpha Y_i \; \forall i$, so that the lower bound is achieved with inclusion probabilities proportional to size (PPS). In practice this is only possible when $Y_i$ is the basal area and the $\pi_i$ are obtained with the angle count technique. In general the best we can do is to approximate the PPS sampling scheme by using stepwise constant inclusion probabilities, leading to the well-known techniques with concentric circles. This can be achieved by using the $f(Y_i)$ and $g(Y_i)$ defined above. From (5.3), (5.4), (5.6) we have

$$\sum_{i=1}^{N} \frac{Y_i^2}{f(Y_i)} = \gamma \sum_{i=1}^{N} Y_i \ge \tilde{\gamma} \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N} \frac{Y_i^2}{g(Y_i)}$$

Hence, we can conjecture that the discrete PPS approximation based on the $g(Y_i)$ is better than the approximation based on the $f(Y_i)$. We now prove that this choice is indeed optimal. Let us denote by $\tilde{\pi}_l$ the inclusion probabilities of any sampling scheme with stepwise constant inclusion probabilities, i.e. $\pi_i = \tilde{\pi}_l$ whenever $Y_i \in C_l$, and by $N_l$ the number of $Y_i$ in $C_l$ (not necessarily distinct). Then write

$$\left( \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} \right) \left( \sum_{i=1}^{N} \pi_i \right) = \left( \sum_{l=1}^{K} \frac{1}{\tilde{\pi}_l} \sum_{Y_i \in C_l} Y_i^2 \right) \left( \sum_{l=1}^{K} N_l \tilde{\pi}_l \right)$$

Since $\sum_{Y_i \in C_l} Y_i^2 = N_l E^*(Y^2 \mid Y \in C_l)$, the right-hand side can be rewritten as

$$\left( \sum_{l=1}^{K} \frac{1}{\tilde{\pi}_l} N_l E^*(Y^2 \mid Y \in C_l) \right) \left( \sum_{l=1}^{K} N_l \tilde{\pi}_l \right)$$

Let

$$a_l = \frac{\sqrt{N_l} \sqrt{E^*(Y^2 \mid Y \in C_l)}}{\sqrt{\tilde{\pi}_l}} \quad \text{and} \quad b_l = \sqrt{N_l \tilde{\pi}_l}$$

Then use the Cauchy-Schwarz inequality $\sum_l a_l^2 \sum_l b_l^2 \ge \left( \sum_l a_l b_l \right)^2$ to obtain

$$\sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} \ge \frac{\left( \sum_{i=1}^{N} g(Y_i) \right)^2}{\sum_{i=1}^{N} \pi_i} = \frac{\tilde{\gamma}^2 \left( \sum_{i=1}^{N} Y_i \right)^2}{\sum_{i=1}^{N} \pi_i} = \frac{\tilde{\gamma}^2 \left( \sum_{i=1}^{N} Y_i \right)^2}{m_1}$$

with equality if and only if $b_l = \alpha a_l \; \forall l$, which is equivalent to $\tilde{\pi}_l = \alpha \sqrt{E^*(Y^2 \mid Y \in C_l)}$. In other words the lower bound is precisely achieved when the discrete inclusion probabilities are given by the $g(Y_i)$, which completes the proof. $\sum_{i=1}^{N} \pi_i = m_1$ is the expected number of trees sampled. Therefore, under a (cost) constraint, the lower bound is uniquely defined. For the $\pi_i$ based on the $f(Y_i)$ one obtains similarly $\sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} = \frac{\gamma \left( \sum_{i=1}^{N} Y_i \right)^2}{m_1}$, which according to (5.6) is indeed larger. However, in practice, the difference between the two lower bounds is likely to be small.

One can obviously apply this technique with the predictions $Y_i^*$ instead of the true values $Y_i$ to define the optimal discrete approximation, based on $g(Y_i^*)$, of the scheme with first-stage inclusion probabilities exactly proportional to the predictions $Y_i^*$, the so-called PPP scheme.

Let us now consider the second-stage term $\sum_{i=1}^{N} \frac{R_i^2}{\pi_i p_i}$ appearing in (4.8). By Cauchy-Schwarz one has

$$\sum_{i=1}^{N} \frac{R_i^2}{\pi_i p_i} \geq \frac{\left(\sum_{i=1}^{N} |R_i|\right)^2}{\sum_{i=1}^{N} \pi_i p_i} := \frac{\left(\sum_{i=1}^{N} |R_i|\right)^2}{m_2}$$

Hence, the lower bound will be achieved if the unconditional second-stage probabilities $\pi_i p_i$ are proportional to the errors $|R_i|$, i.e. if we have a so-called PPE scheme. Again, the lower bound is uniquely defined under a constraint for the expected number of second-stage trees $m_2 = \sum_{i=1}^{N} \pi_i p_i$. In practice the optimal $\pi_i$'s are given and, with a model predicting $|R_i|$ with the $Y_i^*$, we can implement exact PPE, and there is no need for a discrete approximation.
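The PPE bound can be illustrated in the same spirit. This is a small sketch under stated assumptions: the residuals and the target $m_2$ are hypothetical, and `ppe_probabilities` is an illustrative helper, not a routine from the report. It compares the second-stage term under probabilities proportional to $|R_i|$ with equal probabilities having the same expected number of second-stage trees.

```python
def second_stage_term(residuals, probs):
    """sum_i R_i^2 / (pi_i p_i) for given unconditional second-stage probabilities."""
    return sum(r * r / q for r, q in zip(residuals, probs))


def ppe_probabilities(residuals, m2):
    """Unconditional probabilities proportional to |R_i| that sum to m2."""
    total = sum(abs(r) for r in residuals)
    return [m2 * abs(r) / total for r in residuals]


# Hypothetical residuals R_i and expected number of second-stage trees m2.
residuals = [0.10, -0.25, 0.05, 0.40, -0.20]
m2 = 2.0

ppe = ppe_probabilities(residuals, m2)
equal = [m2 / len(residuals)] * len(residuals)
bound = sum(abs(r) for r in residuals) ** 2 / m2

print(second_stage_term(residuals, ppe), second_stage_term(residuals, equal), bound)
```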


The definitions of $\gamma$ and $\tilde{\gamma}$ rest explicitly upon the definitions of the $f()$ and $g()$ functions. Up to now most of the existing inventories, if not all, use inclusion probabilities $\pi_i$ which are not directly related to either $f()$ or $g()$. According to Lanz (2000) we define the coefficient $\kappa$, which is valid for any $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, by setting

$$\kappa = \frac{\sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} \sum_{i=1}^{N} \pi_i}{\left(\sum_{i=1}^{N} Y_i\right)^2} \geq 1 \qquad (5.8)$$

The inequality results again from Cauchy-Schwarz and $\kappa = 1$ only for exact PPS. By using (5.3) and (5.4) one checks that if $\pi_i = \alpha f(Y_i)$ then $\kappa = \gamma$ and that if $\pi_i = \alpha g(Y_i)$ then $\kappa = \tilde{\gamma}^2$. Writing $s_i = \lambda(F)\pi_i$ for the inclusion area one can write

$$\kappa = \frac{E_*\!\left(\frac{Y_i^2}{s_i}\right) E_* s_i}{\left(E_* Y_i\right)^2} \qquad (5.9)$$

which is useful for estimating $\kappa$ from the empirical DBH distribution. Finally, the following relation is important to calculate the anticipated variance

$$\frac{1}{\lambda^2(F)} \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i} = \frac{\kappa}{m_1} \bar{Y}^2 \qquad (5.10)$$

where $m_1 = \sum_{i=1}^{N} \pi_i$ is the expected number of trees sampled, which can be obtained from

$$m_1 = \bar{N} E_* s_i \qquad (5.11)$$

$\bar{N} = \frac{N}{\lambda(F)}$ is the density of stems per ha. In the next section we shall use the above results to derive the optimal sampling schemes, which will turn out to be a combination of PPP and PPE schemes.
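The relations (5.8), (5.10) and (5.11) are simple enough to verify on a toy population. The sketch below uses hypothetical tree volumes and a hypothetical forest area of 2 ha; the function name `kappa` is chosen here for illustration.

```python
import math


def kappa(values, probs):
    """Coefficient (5.8): (sum Y_i^2/pi_i)(sum pi_i) / (sum Y_i)^2."""
    return sum(y * y / p for y, p in zip(values, probs)) * sum(probs) / sum(values) ** 2


# Hypothetical population: tree volumes (m^3) in a forest of area lambda(F) = 2 ha.
volumes = [0.2, 0.3, 0.5, 0.9, 1.4, 2.2, 3.1]
area_ha = 2.0

pi_const = [0.02] * len(volumes)        # one circle: constant inclusion probability
pi_pps = [0.01 * y for y in volumes]    # exact PPS: probability proportional to Y_i

kappa_const = kappa(volumes, pi_const)
kappa_pps = kappa(volumes, pi_pps)

# (5.11): m1 = (N / lambda(F)) * mean inclusion area, with s_i = lambda(F) * pi_i.
inclusion_areas = [area_ha * p for p in pi_const]
stem_density = len(volumes) / area_ha
m1 = stem_density * sum(inclusion_areas) / len(inclusion_areas)

# (5.10): (1 / lambda(F)^2) sum Y_i^2 / pi_i = kappa * Ybar^2 / m1.
mean_volume_per_ha = sum(volumes) / area_ha
lhs = sum(y * y / p for y, p in zip(volumes, pi_const)) / area_ha ** 2
rhs = kappa_const * mean_volume_per_ha ** 2 / m1

print(kappa_const, kappa_pps, m1, lhs, rhs)
```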

5.3 Optimal Sampling Schemes

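We shall minimize the anticipated variance (4.8) for given costs and given sample sizes $n_1$ and $n_{2,g}$ within the classes $\pi_i^g = \alpha_{1g}\, g_g(Y_i^*)$ and $\pi_i^g p_i = \alpha_{2g} |R_i|$ by using Lagrange multipliers. We assume that Assumptions I-V hold. To simplify the formulae we shall use the abbreviation $M = E_{x \in A} M(x) = E_g M_g(x)$ in this section. Within each domain $F_g$ we use the optimal PPS approximation $g_g(\cdot)$ with $\tilde{\gamma}_g$ based on the predicted values $Y_i^*$, $i \in F_g$. We also set $\varepsilon_g = \frac{\overline{|R|}_g}{\bar{Y}_g}$ with $\overline{|R|}_g = \frac{1}{\lambda(F_g)} \sum_{i \in F_g} |R_i|$. $\varepsilon_g$ is the relative prediction error at the tree level in the domain $F_g$. Proofs are essentially the same as in Mandallaz and Ye (1999) so that we only give the end results. The optimal first-stage inclusion probabilities are given by

$$\lambda(F_g)\pi_i^g = \frac{C - M n_1 c_1 - \sum_{g=1}^{D} \varphi_g(n_{2,g})}{\sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\right)} \cdot \frac{p_g\, g_g(Y_i^*)}{n_{2,g} M \sqrt{c_{21g}}} \qquad (5.12)$$

where the second-stage inclusion probabilities satisfy

$$\lambda(F_g)\pi_i^g p_i = \frac{C - M n_1 c_1 - \sum_{g=1}^{D} \varphi_g(n_{2,g})}{\sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\right)} \cdot \frac{p_g\, |R_i|}{n_{2,g} M \sqrt{c_{22g}}} \qquad (5.13)$$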


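After some algebra one obtains the lower bound $\mathrm{MAV}(\hat{Y}^*)$ of the anticipated variance for given $n_1$ and $n_{2,g}$

$$\mathrm{MAV}(\hat{Y}^*) = \frac{\left(\sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\right)\right)^2}{C - M n_1 c_1 - \sum_{g=1}^{D} \varphi_g(n_{2,g})} + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} M} \Delta_g^2 + \frac{1}{n_1 M}(1 + \tilde{\theta}_2)\tilde{\beta}_2^2 \qquad (5.14)$$

But for the second term, containing the lack of fit, this is the expression given in Mandallaz and Lanz (2001) (or Mandallaz and Ye (1999) in the case of one domain). This term has far-reaching consequences. Indeed, in contrast to the previous papers, the partial derivatives $\frac{\partial\, \mathrm{MAV}(\hat{Y}^*)}{\partial n_{2,g}}$ are no longer positive and one can find $n_{2,g}$ values with zero derivatives yielding a true minimum. For arbitrary $\varphi_g(n_{2,g})$ this can be done numerically in principle. To go further we linearize as usual the traveling costs according to $\varphi_g(n_{2,g}) = \nu_g + c_{2g} n_{2,g}$ and set $\tilde{C} = C - \sum_{g=1}^{D} \nu_g$. To simplify the notation we set $\zeta = \sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g\right)$. We have to solve the equations

$$\frac{\partial\, \mathrm{MAV}(\hat{Y}^*)}{\partial n_{2,g}} = \frac{\zeta^2 c_{2g}}{\left(\tilde{C} - M n_1 c_1 - \sum_{g=1}^{D} n_{2,g} c_{2g}\right)^2} - \frac{1}{M} \frac{p_g^2 \Delta_g^2}{n_{2,g}^2} = 0 \qquad (5.15)$$

and

$$\frac{\partial\, \mathrm{MAV}(\hat{Y}^*)}{\partial n_1} = \frac{\zeta^2 M c_1}{\left(\tilde{C} - M n_1 c_1 - \sum_{g=1}^{D} n_{2,g} c_{2g}\right)^2} - \frac{1}{M} \frac{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}{n_1^2} = 0 \qquad (5.16)$$

Dividing the two equations and setting $n_{1,g} = n_1 p_g$ we obtain the relation at the optimum

$$\frac{n_{2,g}}{n_{1,g}} = \frac{\Delta_g \sqrt{M} \sqrt{c_1}}{\sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}\, \sqrt{c_{2g}}} \qquad (5.17)$$

This leads to

$$\sum_{g=1}^{D} n_{2,g} c_{2g} = \frac{n_1 \sum_{g=1}^{D} p_g \sqrt{c_1 c_{2g}}\, \Delta_g \sqrt{M}}{\sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}} \qquad (5.18)$$

Substituting (5.18) into (5.16) and taking the square root we get a linear equation for $n_1$. This gives us $n_1$ and the $n_{1,g} = n_1 p_g$ and by (5.17) the $n_{2,g}$. The solutions read

$$n_1 = \frac{\tilde{C} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}}{M \sqrt{c_1}\, \Psi} \qquad (5.19)$$

$$n_{2,g} = \frac{\tilde{C} p_g \Delta_g}{\sqrt{M} \sqrt{c_{2g}}\, \Psi} \qquad (5.20)$$

where we have set

$$\Psi := \sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g + \frac{1}{\sqrt{M}} \sqrt{c_{2g}}\,\delta_g\right) + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}$$

$\delta_g = \frac{\Delta_g}{\bar{Y}_g}$ is the relative lack of fit. Replacing the optimal sample sizes back into (5.14) we obtain (after tedious but elementary algebra) the absolute lower bound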


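of the anticipated variance as

$$\mathrm{MAV}_{n_1, n_{2,g}}(\hat{Y}^*) = \frac{\left(\sum_{g=1}^{D} p_g \bar{Y}_g \left(\sqrt{c_{21g}}\,\tilde{\gamma}_g + \sqrt{c_{22g}}\,\varepsilon_g + \frac{1}{\sqrt{M}} \sqrt{c_{2g}}\,\delta_g\right) + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}\right)^2}{\tilde{C}} \qquad (5.21)$$

If the lack-of-fit terms $\Delta_g$ tend towards zero we see by (5.20) that the number of terrestrial plots should be as small as possible and that in this case (5.21) yields the result given in Mandallaz and Lanz (2001), so that we have perfect consistency.

For easier reference we summarize the solutions yielding the absolute lower bound. They read:

$$\begin{aligned}
n_1 &= \frac{\tilde{C} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}}{M \sqrt{c_1}\, \Psi} \\
n_{1,g} &= n_1 p_g \\
\frac{n_{2,g}}{n_{1,g}} &= \frac{\Delta_g}{\sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}} \sqrt{\frac{M c_1}{c_{2g}}} \\
\lambda(F_g)\pi_i^g &= \sqrt{\frac{c_{2g}}{c_{21g}}}\, \frac{1}{\sqrt{M}}\, \frac{g_g(Y_i^*)}{\Delta_g} = m_{1g} \frac{g_g(Y_i^*)}{\tilde{\gamma}_g \bar{Y}_g} \\
\lambda(F_g)\pi_i^g p_i &= \sqrt{\frac{c_{2g}}{c_{22g}}}\, \frac{1}{\sqrt{M}}\, \frac{|R_i|}{\Delta_g} = m_{2g} \frac{|R_i|}{\overline{|R|}_g} \\
m_{1g} &= \sqrt{\frac{c_{2g}}{c_{21g}}}\, \frac{1}{\sqrt{M}}\, \frac{\tilde{\gamma}_g \bar{Y}_g}{\Delta_g} \\
m_{2g} &= \sqrt{\frac{c_{2g}}{c_{22g}}}\, \frac{1}{\sqrt{M}}\, \frac{\overline{|R|}_g}{\Delta_g}
\end{aligned} \qquad (5.22)$$

$m_{1g} = \sum_{i \in F_g} \pi_i^g$ and $m_{2g} = \sum_{i \in F_g} \pi_i^g p_i$ are the expected numbers of first and second-stage trees. It can be checked that these solutions satisfy the cost constraint. If one uses the $f_g()$ instead of the $g_g()$ functions then one must replace $\tilde{\gamma}_g$ by $\sqrt{\gamma_g}$ in (5.21) to obtain the lower bound, which is slightly increased according to (5.6).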
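The closed-form optimum can be sketched in a few lines. This is an illustration under stated assumptions, not the author's software: the function and key names are hypothetical, `between_sd` stands for $\sqrt{(1+\tilde{\theta}_2)\tilde{\beta}_2^2}$ in absolute units, and the worked case takes the CH values reported in Tables 6.2 and 6.4 with $M = 1$, $c_1 = 10$ minutes, $\tilde{\beta}_2/\bar{Y} = 0.42$ and a budget $\tilde{C}$ read as 39'457 hours, all costs converted to minutes. Under these readings the result should land close to the D = 1 scheme of Table 6.6, and the linearized cost constraint is checked explicitly.

```python
import math


def optimal_design(domains, c1, cluster_size, budget, between_sd):
    """Closed-form optimum (5.19)-(5.22) for linearized travel costs.

    Each domain dict holds: p (area share), mean (timber volume per ha),
    gamma_t, eps and delta (relative prediction error and lack of fit),
    and the unit costs c2, c21, c22 in the same time unit as c1 and budget.
    """
    M = cluster_size
    psi = math.sqrt(c1) * between_sd
    for d in domains:
        psi += d["p"] * d["mean"] * (
            math.sqrt(d["c21"]) * d["gamma_t"]
            + math.sqrt(d["c22"]) * d["eps"]
            + math.sqrt(d["c2"] / M) * d["delta"]
        )
    n1 = budget * between_sd / (M * math.sqrt(c1) * psi)            # (5.19)
    plan = []
    for d in domains:
        lack = d["delta"] * d["mean"]                               # absolute lack of fit
        n2 = budget * d["p"] * lack / (math.sqrt(M * d["c2"]) * psi)  # (5.20)
        m1 = math.sqrt(d["c2"] / (d["c21"] * M)) * d["gamma_t"] * d["mean"] / lack
        m2 = math.sqrt(d["c2"] / (d["c22"] * M)) * d["eps"] * d["mean"] / lack
        plan.append({"n2": n2, "m1": m1, "m2": m2})
    return n1, plan, psi ** 2 / budget                              # (5.21)


# CH row of Tables 6.2 and 6.4; hours converted to minutes (assumed reading).
ch = {"p": 1.0, "mean": 332.0, "gamma_t": 1.19, "eps": 0.16, "delta": 0.57,
      "c2": 5.01 * 60, "c21": 2.0, "c22": 5.0}
budget_min = 39457 * 60
n1, plan, mav = optimal_design([ch], c1=10.0, cluster_size=1.0,
                               budget=budget_min, between_sd=0.42 * 332.0)
rel_error_pct = 100 * math.sqrt(mav) / ch["mean"]

# Linearized cost actually spent by the solution.
spent = 1.0 * n1 * 10.0 + sum(
    r["n2"] * (ch["c2"] + 1.0 * (r["m1"] * ch["c21"] + r["m2"] * ch["c22"]))
    for r in plan
)
print(round(n1), round(plan[0]["n2"]), plan[0]["m1"], plan[0]["m2"], rel_error_pct)
```

Small deviations from the tabulated sample sizes are to be expected because the inputs are rounded table values.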

Intuitively speaking we see that the optimal inclusion probabilities attempt to mimic exact PPS by a combination of discrete PPP based on the $g_g(Y_i^*)$ and exact PPE based on the error $|R_i|$.

If the lack-of-fit terms vanish it is possible, according to Mandallaz and Lanz (2001), to obtain a true minimum by requiring the expected number of first-stage trees to be equal to a preassigned constant $m_{1g}$ in each domain. Let us do the same with the lack of fit.

To fulfill this constraint we set

$$\lambda(F_g)\pi_i^g = \frac{m_{1g}\, g_g(Y_i^*)}{\tilde{\gamma}_g \bar{Y}_g} \qquad (5.23)$$

and we optimize the second-stage inclusion probabilities in the class $\pi_i^g p_i = \alpha_{2g} |R_i|$. For given $n_1$ and $n_{2,g}$ we use again the Lagrange multiplier technique and we obtain the optimal second-stage inclusion probabilities as

$$\lambda(F_g)\pi_i^g p_i = \frac{1}{\eta}\, \frac{p_g}{n_{2,g} \sqrt{c_{22g}}}\, |R_i| \qquad (5.24)$$


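where we have set

$$\eta = \frac{M \sum_{g=1}^{D} p_g \overline{|R|}_g \sqrt{c_{22g}}}{C - M n_1 c_1 - \sum_{g=1}^{D} \varphi_g(n_{2,g}) - M \sum_{g=1}^{D} n_{2,g} m_{1g} c_{21g}}$$

Using these expressions in (4.8) we obtain the anticipated variance for given $m_{1g}$

$$E_\omega V_{1,2,3}(\hat{Y}^* \mid m_{1g}) = \frac{\left(\sum_{g=1}^{D} p_g \overline{|R|}_g \sqrt{c_{22g}}\right)^2}{C - M n_1 c_1 - \sum_{g=1}^{D} \varphi_g(n_{2,g}) - M \sum_{g=1}^{D} n_{2,g} m_{1g} c_{21g}} + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} M m_{1g}} \tilde{\gamma}_g^2 \bar{Y}_g^2 + \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g} M} \Delta_g^2 + \frac{1}{n_1 M}(1 + \tilde{\theta}_2)\tilde{\beta}_2^2 \qquad (5.25)$$

Linearizing the traveling costs and using the same techniques as for solving (5.15) and (5.16) we obtain the solutions

$$n_1 = \frac{\tilde{C} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}}{M \sqrt{c_1}\, \Psi_1} \qquad n_{2,g} = \frac{\tilde{C} p_g \sqrt{\frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2}}{\sqrt{M \tilde{c}_{2g}}\, \Psi_1} \qquad (5.26)$$

where we have set

$$\tilde{c}_{2g} = c_{2g} + M m_{1g} c_{21g}$$

and

$$\Psi_1 = \sum_{g=1}^{D} p_g \overline{|R|}_g \sqrt{c_{22g}} + \frac{1}{\sqrt{M}} \sum_{g=1}^{D} p_g \sqrt{\tilde{c}_{2g}} \sqrt{\frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2} + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}$$

Tedious but elementary calculations give the corresponding lower bound of the anticipated variance for given $m_{1g}$ as

$$\mathrm{MAV}(\hat{Y}^* \mid m_{1g}) = \frac{\left(\sum_{g=1}^{D} p_g \bar{Y}_g \left(\varepsilon_g \sqrt{c_{22g}} + \frac{1}{\sqrt{M}} \sqrt{\tilde{c}_{2g}} \sqrt{\frac{\tilde{\gamma}_g^2}{m_{1g}} + \delta_g^2}\right) + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}\right)^2}{\tilde{C}} \qquad (5.27)$$

Again, if the lack-of-fit terms $\Delta_g$ vanish we get the same lower bound as given in Mandallaz and Lanz (2001). Furthermore, one can minimize (5.27) with respect to the $m_{1g}$. This is equivalent to minimizing the terms

$$\tilde{c}_{2g} \left(\frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2\right)$$

which, by the definition of $\tilde{c}_{2g}$, is equivalent to minimizing

$$\frac{c_{2g} \bar{Y}_g^2 \tilde{\gamma}_g^2}{m_{1g}} + M m_{1g} c_{21g} \Delta_g^2$$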


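It can be checked that doing so leads to exactly the same solutions and lower bound as in (5.21) and (5.22), i.e. without constraint on the expected number of first-stage trees. Thus we have perfect consistency with the previous results. In much the same way we finally consider the sampling schemes for given expected values $m_{1g}$ and $m_{2g}$ of the numbers of first-stage and second-stage trees. That is, we set

$$\lambda(F_g)\pi_i^g = \frac{m_{1g}\, g_g(Y_i^*)}{\tilde{\gamma}_g \bar{Y}_g} \qquad \lambda(F_g)\pi_i^g p_i = \frac{m_{2g} |R_i|}{\overline{|R|}_g} \qquad (5.28)$$

Substituting this into (4.8) and using the relation (5.5) we obtain

$$E_\omega V_{1,2,3}(\hat{Y}^* \mid m_{1g}, m_{2g}) = \frac{1}{M} \sum_{g=1}^{D} \frac{p_g^2}{n_{2,g}} \left(\frac{\overline{|R|}_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2\right) + \frac{1}{n_1 M}(1 + \tilde{\theta}_2)\tilde{\beta}_2^2 \qquad (5.29)$$

Linearizing the traveling costs we obtain the constraint

$$\tilde{C} = M n_1 c_1 + \sum_{g=1}^{D} n_{2,g} c_{2g}^* \qquad (5.30)$$

where $c_{2g}^* = c_{2g} + M(c_{21g} m_{1g} + c_{22g} m_{2g})$. Using the Lagrange technique as for deriving (5.19), (5.20) and (5.26) we get after some algebra the solutions

$$n_1 = \frac{\tilde{C} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}}{M \sqrt{c_1}\, \Lambda} \qquad n_{2,g} = \frac{\tilde{C} p_g A_g \sqrt{M}}{M \sqrt{c_{2g}^*}\, \Lambda} \qquad (5.31)$$

where we have set

$$A_g = \sqrt{\frac{\overline{|R|}_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2}$$

and

$$\Lambda = \frac{1}{\sqrt{M}} \sum_{g=1}^{D} p_g \sqrt{c_{2g}^*}\, A_g + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}$$

After tedious but simple algebra substituting these expressions into (5.29) leads to the following lower bound of the anticipated variance for given $m_{1g}$ and $m_{2g}$.

$$\mathrm{MAV}(\hat{Y}^* \mid m_{1g}, m_{2g}) = \frac{\left(\frac{1}{\sqrt{M}} \sum_{g=1}^{D} p_g \bar{Y}_g \sqrt{c_{2g}^*} \sqrt{\frac{\varepsilon_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2}{m_{1g}} + \delta_g^2} + \sqrt{c_1} \sqrt{(1 + \tilde{\theta}_2)\tilde{\beta}_2^2}\right)^2}{\tilde{C}} \qquad (5.32)$$

Again, one can minimize the expression

$$T = c_{2g}^* \left(\frac{\overline{|R|}_g^2}{m_{2g}} + \frac{\tilde{\gamma}_g^2 \bar{Y}_g^2}{m_{1g}} + \Delta_g^2\right)$$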


appearing in the lower bound (5.32) with respect to $m_{1g}$ and $m_{2g}$. It can be checked that the $m_{1g}$ and $m_{2g}$ given in (5.22) satisfy the equations $\frac{\partial T}{\partial m_{1g}} = \frac{\partial T}{\partial m_{2g}} = 0$. Furthermore, if we set the corresponding solutions into (5.32) we get the absolute lower bound given in (5.21). Thus we have full consistency between all the optimal solutions, either without constraint on the numbers $m_{1g}$, $m_{2g}$ of trees sampled, or for given $m_{1g}$ only, or for given $m_{1g}$ and $m_{2g}$. The end result does not depend on the sequence chosen for optimization. In this sense the lower bounds for fixed $m_{1g}$ and $m_{2g}$ given in section 5.2 justify a posteriori the strategy of seeking a global minimum in the discrete PPP class and the exact PPE class. Let us also emphasize the fact that in principle one can numerically find the optimal solutions for arbitrary traveling cost functions $\varphi(n_{2,g})$. However, due to the inherent uncertainty of the traveling cost functions we believe that linear approximations, of say a square root law, as given in Mandallaz (1997) should be adequate in practice. Besides, they alone can yield the qualitative insight required for calculating the relative efficiency of various sampling schemes and for determining the achievable lower bounds.


Chapter 6

Examples

6.1 Simulations

We re-analyze the simulation results given in Mandallaz and Ye (1999) by using the anticipated variance with lack of fit in two-phase sampling. The simulations are based on an almost real and relatively large forest of 1'090 ha with 7 strata (defined by the stand age) and 380'000 trees above 12 cm DBH (diameter at breast height). Almost real in the sense that this forest was constructed by using software for the automatic interpretation of aerial photographs. The procedure is too sophisticated to be explained here and the reader may consult Ye (1995) for details. The positions of the 380'000 (visible) trees were estimated automatically from the aerial photographs, together with the surface area of the canopy. The DBH and volumes of the trees were then simulated according to a prediction model (based on previous studies), which relates canopy parameters to the volume. The resulting error on the tree volume is largely irrelevant for our purposes. We consider this forest as one single domain ($D = 1$). The true average timber volume is 320 m³ per ha with a between-strata coefficient of variation $100 \times \frac{\beta_2}{\bar{Y}} = 65\%$ (rather than 35% as given in Mandallaz and Ye (1999), due to a typing error). For each sampling scheme we simulated for the first phase 500 systematic grids 200 m × 200 m yielding an expected value of $n_1 = 293.6$ with $M(x) \neq 0$ per run. The second phase is the sub-grid 400 m × 400 m with expected value $n_2 = 73.4$. The cluster consists of the 4 vertices of a 50 m × 50 m square, yielding $E M(x) = 3.7$ and $V M(x) = 0.6$. The 32 sampling schemes were:

• 12 schemes with 1 circle with surface areas between 100 m² and 500 m², plot symbol $\star$ in Fig. 6.1, 6.2, 6.3.

• 8 schemes with 2 concentric circles of (100, 600), (200, 300), (200, 400), (200, 500), (200, 600), (300, 400), (300, 500), (400, 500) (m²) with DBH thresholds at 36 cm, plot symbol $\Box$.

• 7 schemes with 3 concentric circles of (60, 300, 900), (200, 300, 400), (200, 300, 500), (200, 300, 600), (300, 400, 500), (300, 400, 600), (400, 500, 600) m² with DBH thresholds at 24 cm and 41 cm, plot symbol $\circ$.

• 5 schemes with angle-count factors $k = (2, 2.5, 3, 3.5, 4)$, plot symbol $\triangle$.

Predictions were based on ordinary least squares (see section 2.3). In order to reduce the computing time, adjustments for boundary effects were performed only at the forest edge and not at the boundaries of the working-strata (i.e. we used $Y_1(u)$); the resulting bias is negligible. The 500 runs allow us to calculate the empirical variance of the 500 point estimates as well as the mean of the 500 estimated variances, calculated under the assumption of random sampling. Under random sampling both have the same expected values. The inflation factor for the working-strata was estimated, for each of the 32 schemes, by a single large simulation (10'000 random clusters). One has $1 + \theta_2 \approx 2.30$. We estimate the lack of fit by using the ANOVA decomposition given in Table 1 for a single domain. In this example we have one-stage sampling and therefore $p_i \equiv 1$ and $R_i \equiv 0$. The pure error term is known exactly as well as $\beta_2^2$. Hence, we have an estimate of $(1 + \theta_2)\beta_2^2$, the model sum of squares, for the 32 examples. In theory this quantity does not depend on the inclusion circles used and should therefore be constant. However, due to the sampling errors on $1 + \theta$, the values $\frac{(1 + \hat{\theta})\beta_2^2}{n_2 M}$ fall between 238 and 266 with a mean of 251. Subtracting the pure error from the mean of the 500 estimated variances we get an estimate of $\frac{(1 + \hat{\theta})\beta_1^2}{n_1 M}$ and therefore also of $\frac{\Delta^2}{n_2 M}$ for each of the 32 schemes. Again, the theoretical lack of fit is independent of the sampling schemes used. Because of the sampling error of the mean estimated variance the values of $\frac{\Delta^2}{n_2 M}$ fall between 188 and 233 with a mean of 214. Most of this variability is due to the sampling schemes with one single circle. The mean coefficients of determination $R^2$ increased from 0.4 (one circle) to 0.44 (two circles), to 0.45 (three circles) and to 0.47 (angle count).

Figure 6.1: Empirical Variance E.V. v. Mean Estimated Variance M.E.V.
E.V. = −135.9 + 0.95 × M.E.V., $R^2 = 0.96$
Legend: 1 circle: $\star$, 2 circles: $\Box$, 3 circles: $\circ$, angle count: $\triangle$

Fig. 6.1 gives the relationship between the empirical variance E.V. (i.e. the empirical variance of the 500 point estimates obtained for each sampling scheme, which is an unbiased estimate of the true variance under systematic sampling) and the mean estimated variance M.E.V. (i.e. the mean of the 500 variance estimates obtained by treating the systematic sample as a random sample). Fig. 6.2 gives the relationship between the mean estimated variance M.E.V. and the anticipated variance A.V. without lack of fit and Fig. 6.3 gives the relationship between the M.E.V. and the anticipated variance with lack of fit A.V.L. Though the $R^2$ of the regression lines in Fig. 6.2 and Fig. 6.3 are both very close to 1, the intercept and slope are much closer to their ideal values of 0 and 1 in Fig. 6.3.

In practice, of course, the regression lines are not known and it is therefore better to use the anticipated variance with lack of fit. In general, treating a systematic sample as a random sample overestimates the error, so that we are usually on the safe side. In this example the mean estimated error overestimates the empirical error for the volume by ca. 25% on average. This is in agreement with results obtained for the basal area in a similar forest by using geostatistical methods (Mandallaz (1993) and Mandallaz (2000)), which, however, are rather difficult to use for estimation and even more so for optimization. Since the relationships between E.V., M.E.V. and A.V.L. are linear the anticipated variance with lack of fit will almost certainly select the more efficient of two schemes if they really differ and also give a good conservative estimate of the error. In this example the pure-error terms account on average for ca. 25% (one circle) to 10% (angle count) of the variance (of $Y(u)$), the working-strata for 40% to 50% and the lack of fit for 35% to 40%. The simulations also show that the estimation of the lack of fit and of the inflation factor requires rather large samples to be reliable. This implies that one should investigate the stability of the optimal solutions by considering many scenarios for these parameters. Fig. 6.4 displays the relationship between the empirical relative variance reduction E.RED. and the estimated mean relative variance reduction M.RED., defined by (4.10). The empirical reductions are significantly smaller, but of course unknown in practice, whereas the M.RED. are close to the theoretical values obtained from (4.10). The simulations show that the anticipated variance with lack of fit approximates very well the behavior of the empirical and estimated variances.

Figure 6.2: Mean Estimated Variance M.E.V. v. Anticipated Variance without Lack of Fit A.V.
M.E.V. = 249.2 + 0.77 × A.V., $R^2 = 0.99$
Legend: 1 circle: $\star$, 2 circles: $\Box$, 3 circles: $\circ$, angle count: $\triangle$

Figure 6.3: Mean Estimated Variance M.E.V. v. Anticipated Variance with Lack of Fit A.V.L.
M.E.V. = 17.02 + 0.91 × A.V.L., $R^2 = 0.99$
Legend: 1 circle: $\star$, 2 circles: $\Box$, 3 circles: $\circ$, angle count: $\triangle$

Figure 6.4: Empirical E.RED. v. Mean Estimated M.RED. variance reduction (in %)
E.RED. = −25 + 1.43 × M.RED., $R^2 = 0.50$
Legend: 1 circle: $\star$, 2 circles: $\Box$, 3 circles: $\circ$, angle count: $\triangle$


6.2 Swiss National Inventory

This example is based on the first 1983-1985 Swiss National Inventory (SNI1) and on the second 1993-1995 Swiss National Inventory (SNI2). SNI1 is a one-phase two-stage simple sampling scheme using plots with two concentric circles: 200 m² and 500 m² with DBH thresholds at 12 cm and 36 cm. In all, $n_2 = 10'974$ plots were available from a 1 km × 1 km grid with $m_1 \approx 11.7$. The second-stage sampling procedure used essentially equal probability sampling, with a selection probability of about 0.33 for $DBH_i < 60$ cm and of 1 for $DBH_i \geq 60$ cm, which resulted in $m_2 \approx 4$. SNI2 is a two-phase two-stage sampling scheme with the same concentric plots, but with only half the original plots from the sub-grid $\sqrt{2}$ km × $\sqrt{2}$ km, whereas the first phase for aerial photographs is based on a 0.5 km × 0.5 km grid leading to $n_1 = 51'296$, $n_2 = 6'412$, $m_1 \approx 12$. The second-stage procedure was changed to implement PPE with $m_2 \approx 2$.

Switzerland (CH) is divided into $D = 5$ domains: Jura (JU), Swiss Plateau (SP), Pre-Alps (PA), Alps (AL) and Southern Alps (SA). The working-strata are based on the semi-automatic interpretation of aerial photographs to determine the average tree height above the ground (with 25 points per plot). This leads to 5 working-strata within each domain: 0 m, 0 m − 10 m, 10 m − 20 m, 20 m − 30 m, > 30 m. The corresponding $R^2$'s fall into the range 0.12 − 0.24, which is not as good as the usual stand-map stratification with $R^2 \approx 0.4$, but requires only the aerial photographs available from the Swiss Topographic Survey. The post-stratification procedure used the 5 working-strata within each domain separately, which explains why $R^2 = 0.27$ for (CH) is higher than the $R^2$'s in each domain. The cost parameters, as well as the $p_g$, $\bar{Y}_g$, are based on the first inventory, whereas the estimation of the lack of fit and of the pure error had to be based on the second.

We shall now briefly discuss the construction of the function $\varphi(n_{2,g})$ for this particular case. $t_g$ is the mean time required to go, usually by car, from either the lodging facility to the topographic fixed point (marked on the aerial photograph and easily accessible) nearest to the sample plot or from one fixed point to the nearest fixed point. $c_g$, the mean installation time, is the time required to access the vicinity of the sample plot from the nearest fixed point, usually by walking, plus the time required to locate and secure exactly the center of the permanent plot. The total installation time increases approximately linearly with the number of sample points. Let $n_{0,g} = 10'974\, p_g$ be the number of terrestrial plots available from SNI1 in each region. With a square root law for the traveling time from fixed point to fixed point we can write $\varphi(n_{2,g}) \approx n_{2,g} c_g + t_g \sqrt{n_{2,g}} \sqrt{n_{0,g}}$. With this choice we ensure that the total traveling time is the one observed when $n_{2,g} = n_{0,g}$. Then one linearizes the traveling time, over a given range for $n_{2,g}$, according to $t_g \sqrt{n_{2,g}} \sqrt{n_{0,g}} = \nu_g + s_g n_{2,g}$, to obtain $\varphi(n_{2,g}) \approx n_{2,g} c_g + s_g n_{2,g} + \nu_g = n_{2,g} c_{2g} + \nu_g$ with $c_{2g} = c_g + s_g$. The $R^2$'s for the six linear regressions were all above 0.97 in their respective range. Taking into account the inherent difficulty of assessing the traveling costs the fit is therefore more than acceptable. Recall that $\tilde{C} = C - \sum_{g=1}^{D} \nu_g$. Table 6.1 (based on Lanz (2000)) gives the parameters required to calculate the $c_{2g}$ values. Table 6.2 (where, according to (5.9), $\sqrt{\kappa}$ is given for easier comparison with the optimal $\tilde{\gamma}$) gives a summary of the parameters required for the optimization. The cost for the orientation and interpretation of an aerial photograph is constant, $c_1 = 10$ minutes. Note that all the terrestrial costs are given in total time units for a crew of two persons (i.e. the effective time spent is only half of the values given).
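The linearization of the square-root law can be sketched as an ordinary least-squares fit over the stated range. The grid of 500 equally spaced points is an assumption of this sketch (the report does not say how the regressions were set up), and the function name is illustrative. The inputs are the JU values of Table 6.1 with $n_{0,g} = 10'974 \times 0.18$.

```python
import math


def linearize_travel_time(t_g, n0_g, n_low, n_high, grid_points=500):
    """Least-squares line nu_g + s_g * n approximating t_g * sqrt(n * n0_g) on [n_low, n_high]."""
    step = (n_high - n_low) / (grid_points - 1)
    xs = [n_low + k * step for k in range(grid_points)]
    ys = [t_g * math.sqrt(x * n0_g) for x in xs]
    mean_x = sum(xs) / grid_points
    mean_y = sum(ys) / grid_points
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# JU row of Table 6.1: t_g = 0.80 hrs, c_g = 3.25 hrs, range 400 to 10'000 plots.
n0_ju = 10974 * 0.18                      # plots available from SNI1 in the Jura
intercept, slope = linearize_travel_time(0.80, n0_ju, 400, 10000)
c2_ju = 3.25 + slope                      # c_2g = c_g + s_g

print(round(intercept), round(slope, 2), round(c2_ju, 2))
```

Under these assumptions the fitted slope and the resulting $c_{2g}$ agree with the JU row of Table 6.1 to two decimals, while the intercept differs from the tabulated value by less than one percent, which is within what rounding of $p_g$ and the choice of grid can explain.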

Table 6.3 gives the proportion of the variance due to the pure error, the working-strata and the lack of fit, as well as the $\frac{\beta_{2,g}}{\bar{Y}_g}$ needed to calculate $\frac{\tilde{\beta}_2}{\bar{Y}} = 0.42$ (last cell in Table 6.3). Note that in this example all the inflation factors $1 + \theta_{1,g}$, $1 + \theta_{2,g}$ are equal to 1 (simple random sampling). The pure error terms for the SNI inclusion circles were obtained in each region according to the formulae (5.9), (5.10), (5.11) and the DBH distribution in the regions. The lack-of-fit terms were then estimated by reconstructing the ANOVA tables (4.1) from the empirical errors obtained in each region by using the second phase only and the corresponding $R^2$'s. The SNI result for CH assumed the $p_g$ to be known exactly and therefore ignored the $\frac{1}{n_1}\beta^2$ term. This small adjustment changes the relative error from 0.88% to 0.89%.

Table 6.1: Installation and Traveling Costs

| | $c_g$ (hrs) | $t_g$ (hrs) | $n_{2,g}$ range | $\nu_g$ (hrs) | $s_g$ (hrs) | $c_{2g}$ (hrs) |
|---|---|---|---|---|---|---|
| JU | 3.25 | 0.80 | 400-10'000 | 1'038 | 0.27 | 3.52 |
| SP | 3.01 | 0.74 | 400-10'000 | 1'037 | 0.27 | 3.28 |
| PA | 4.31 | 1.04 | 400-10'000 | 1'386 | 0.36 | 4.67 |
| AL | 5.20 | 1.20 | 600-16'000 | 2'537 | 0.41 | 5.61 |
| SA | 6.96 | 1.44 | 200-6'000 | 1'160 | 0.52 | 7.48 |
| CH | 4.46 | 1.04 | 2'000-20'000 | 4'950 | 0.55 | 5.01 |

Legend
$c_g$: installation time, $t_g$: traveling time, $\nu_g$: intercept for traveling cost, $s_g$: slope for traveling cost, $c_{2g}$: linearized installation and traveling cost per point.

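Table 6.2: Parameters of SNI

| | $p_g$ (%) | $\bar{N}_g$ | $c_{2g}$ (hrs) | $c_{21g}$ (min) | $c_{22g}$ (min) | $\sqrt{\kappa}$ | $\varepsilon_g$ (%) | $\bar{Y}_g$ (m³/ha) | $\delta_g$ (%) | $R_g^2$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| JU | 18 | 468 | 3.52 | 1.9 | 4.8 | 1.28 | 15 | 328 | 47 | 15 |
| SP | 21 | 454 | 3.28 | 1.9 | 4.8 | 1.26 | 14 | 403 | 49 | 24 |
| PA | 19 | 508 | 4.67 | 2.0 | 4.8 | 1.26 | 16 | 419 | 53 | 20 |
| AL | 30 | 445 | 5.61 | 2.1 | 5.1 | 1.31 | 19 | 292 | 70 | 21 |
| SA | 12 | 425 | 7.48 | 2.1 | 5.1 | 1.48 | 22 | 178 | 72 | 12 |
| CH | 100 | 460 | 5.01 | 2.0 | 5.0 | 1.30 | 16 | 332 | 57 | 27 |

Legend
$p_g$: relative surface area, $\bar{N}_g$: stem density, $c_{2g}$: linearized installation and traveling cost per point, $c_{21g}$: unit cost per first-stage tree, $c_{22g}$: unit cost per second-stage tree, $\sqrt{\kappa}$: from equation (5.9), $\varepsilon_g$: relative prediction error at tree level, $\bar{Y}_g$: timber volume per ha, $\delta_g$: relative lack of fit, $R_g^2$: coefficient of determination.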

In order to illustrate the results on discrete PPP the empirical distribution of DBH (as obtained from SNI1 for CH) is displayed in Fig. 6.5. Fig. 6.6 displays $\gamma$, $\tilde{\gamma}$ as a function of the DBH threshold for the two concentric circles (for the variable $Y_i^*$, which is the timber volume based only on the DBH). The minima are $\tilde{\gamma} = 1.18699$ and $\gamma = 1.41467$, both at DBH = 31 cm. Fig. 6.7 shows that the ratio $q = \frac{\gamma}{\tilde{\gamma}^2}$ is very close to 1. The threshold values for the optimal scheme with three concentric circles are DBH = 24 cm and DBH = 41 cm with $\tilde{\gamma} = 1.09$. By comparison the scheme with one circle only gives $\tilde{\gamma} = 1.67$. Hence, the efficiency gain from one to two circles is substantial, whereas the gain from two to three circles is marginal if one takes the increased complexity into account. The $\pi_i$ were determined on the basis of empirical studies performed at the end of the seventies (in the district of Nidwald). They correspond neither to $f()$, nor to $g()$, nor to any other closed mathematical expression, but, as we shall see, they did serve their purpose very well indeed.

Table 6.3: Components of Variance in SNI

| | Pure error (%) | Working-strata, $R_g^2$ (%) | Lack of fit (%) | $\frac{\beta_{2,g}}{\bar{Y}_g}$ (%) |
|---|---|---|---|---|
| JU | 35 | 15 | 50 | 26 |
| SP | 28 | 24 | 48 | 35 |
| PA | 26 | 20 | 54 | 32 |
| AL | 20 | 21 | 59 | 42 |
| SA | 29 | 12 | 59 | 32 |
| CH | 24 | 27 | 49 | 42 |

Legend
Pure error, working-strata ($\frac{V_g \hat{Y}(u)}{V_g Y(u)} = R_g^2$) and lack of fit are given as proportions of $V_g Y(u)$; $\frac{\beta_{2,g}}{\bar{Y}_g}$: coefficient of variation between working-strata.

Figure 6.5: Distribution of DBH in SNI1

Legend: f relative frequency in %, DBH diameter class in cm

Table 6.4 gives the key parameters of the optimal two concentric circles. Table 6.5 gives the parameters of the optimal sampling scheme with an overall budget $C = 44'307$ hrs ($\tilde{C} = 37'149$ hrs) equal to the variable costs obtained with SNI2. Likewise, Table 6.6 gives the optimal sampling scheme for Switzerland consisting of $D = 1$ domain only, whereas Table 6.7 adds the constraint $m_1 = 11.7$ ($\tilde{C} = 39'457$ in both cases).

Table 6.8 displays the relative empirical and anticipated errors for SNI2 and for the various optimal sampling schemes.

Recall that the anticipated errors are based on SNI1 data except for the lack-of-fit terms which had to be based on SNI2. The perfect agreement between the first two columns in Table 6.8 is therefore tautological; it simply shows that up to very small rounding errors the calculations are consistent. Fig. 6.8 and Fig. 6.9 display the relative anticipated error and the surface areas of the inclusion circles for the optimal scheme with $D = 1$ as a function of the number $n_2$ of terrestrial plots when the number $n_1$ of aerial photographs is optimal for given $n_2$ (this can be easily obtained by differentiation from (5.14)). Likewise, Fig. 6.10 displays the relative error for the optimal scheme with $D = 1$ and $m_1 = 11.7$, that is when the number of first-stage trees is equal to the observed SNI1 value.

Figure 6.6: Gamma Values for two Concentric Circles according to DBH Threshold in cm
Legend: $\gamma$ upper curve, $\tilde{\gamma}$ lower curve

Figure 6.7: Ratio $q = \frac{\gamma}{\tilde{\gamma}^2}$ according to DBH Threshold in cm


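Table 6.4: Optimal Concentric Circles

| | threshold (cm) | $\tilde{\gamma}$ | $g_{1g}(Y_i^*)$ (m³) | $g_{2g}(Y_i^*)$ (m³) |
|---|---|---|---|---|
| JU | 31 | 1.18 | 0.33 | 2.00 |
| SP | 31 | 1.17 | 0.33 | 2.07 |
| PA | 32 | 1.17 | 0.36 | 2.18 |
| AL | 32 | 1.19 | 0.35 | 2.29 |
| SA | 31 | 1.25 | 0.28 | 2.44 |
| CH | 31 | 1.19 | 0.32 | 2.15 |

Legend
$\tilde{\gamma}$: from equation (5.4), $g_{1g}(Y_i^*)$ and $g_{2g}(Y_i^*)$: values of step function $g()$.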

Table 6.5: Optimal Sampling Scheme with D = 5

| | small circle (m²) | large circle (m²) | $m_{1g}$ | $m_{2g}$ | $n_{1,g}$ | $n_{2,g}$ | $\frac{n_{2,g}}{n_{1,g}}$ (%) | error (%) |
|---|---|---|---|---|---|---|---|---|
| JU | 226 | 1'368 | 26.5 | 2.1 | 4'154 | 999 | 24 | 1.73 |
| SP | 170 | 1'067 | 24.3 | 1.8 | 4'846 | 1'547 | 32 | 1.49 |
| PA | 192 | 1'162 | 26.1 | 2.3 | 4'385 | 1'319 | 30 | 1.69 |
| AL | 217 | 1'418 | 21.5 | 2.2 | 6'924 | 1'749 | 25 | 1.88 |
| SA | 319 | 2'783 | 25.4 | 2.9 | 2'769 | 380 | 14 | 4.01 |
| CH | | | | | 23'079 | 5'995 | | 0.85 |

Legend
$m_{1g}$: number of first-stage trees per point, $m_{2g}$: number of second-stage trees per point, $n_{1,g}$: number of first-phase points, $n_{2,g}$: number of second-phase points.

Table 6.6: Optimal Sampling Scheme with D = 1

| | small circle (m²) | large circle (m²) | $m_1$ | $m_2$ | $n_1$ | $n_2$ | $\frac{n_2}{n_1}$ (%) | error (%) |
|---|---|---|---|---|---|---|---|---|
| CH | 207 | 1'393 | 25.6 | 2.2 | 23'668 | 5'859 | 25 | 0.86 |

Legend
$m_1$: number of first-stage trees per point, $m_2$: number of second-stage trees per point, $n_1$: number of first-phase points, $n_2$: number of second-phase points.


Table 6.7: Optimal Sampling Scheme with D = 1 and $m_1 = 11.7$

| | small circle (m²) | large circle (m²) | $m_1$ | $m_2$ | $n_1$ | $n_2$ | $\frac{n_2}{n_1}$ (%) | error (%) |
|---|---|---|---|---|---|---|---|---|
| CH | 95 | 636 | 11.7 | 1.9 | 22'878 | 6'393 | 28 | 0.89 |

Legend
$m_1$: number of first-stage trees per point, $m_2$: number of second-stage trees per point, $n_1$: number of first-phase points, $n_2$: number of second-phase points.
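The scheme with a preassigned number of first-stage trees, equations (5.24), (5.26) and (5.27), can be sketched in the same way. As before this is only an illustration: names are hypothetical, `between_sd` stands for the square root of the inflated between-strata variance in absolute units, and the CH inputs are the rounded values of Tables 6.2 and 6.4 with $M = 1$, $c_1 = 10$ minutes and the budget $\tilde{C}$ read as 39'457 hours. With $m_1 = 11.7$ the output should come close to Table 6.7.

```python
import math


def design_for_given_m1(domains, c1, cluster_size, budget, between_sd):
    """Optimum for preassigned first-stage tree numbers m1 (linearized costs)."""
    M = cluster_size
    rows = []
    zeta = 0.0                      # sum_g p_g * mean |R|_g * sqrt(c22g)
    psi1 = math.sqrt(c1) * between_sd
    for d in domains:
        c2_tilde = d["c2"] + M * d["m1"] * d["c21"]
        spread = math.sqrt((d["gamma_t"] * d["mean"]) ** 2 / d["m1"]
                           + (d["delta"] * d["mean"]) ** 2)
        abs_resid = d["eps"] * d["mean"]
        zeta += d["p"] * abs_resid * math.sqrt(d["c22"])
        psi1 += d["p"] * (abs_resid * math.sqrt(d["c22"])
                          + math.sqrt(c2_tilde / M) * spread)
        rows.append((d, c2_tilde, spread, abs_resid))
    n1 = budget * between_sd / (M * math.sqrt(c1) * psi1)
    n2 = [budget * d["p"] * spread / (math.sqrt(M * c2_tilde) * psi1)
          for d, c2_tilde, spread, _ in rows]
    # Budget left for second-stage trees, then m2g from (5.24).
    remaining = budget - M * n1 * c1 - sum(n * r[1] for n, r in zip(n2, rows))
    m2 = [remaining / (M * zeta) * d["p"] * abs_resid / (n * math.sqrt(d["c22"]))
          for n, (d, _, _, abs_resid) in zip(n2, rows)]
    return n1, n2, m2, psi1 ** 2 / budget


ch = {"p": 1.0, "mean": 332.0, "gamma_t": 1.19, "eps": 0.16, "delta": 0.57,
      "c2": 5.01 * 60, "c21": 2.0, "c22": 5.0, "m1": 11.7}
n1, n2, m2, mav = design_for_given_m1([ch], c1=10.0, cluster_size=1.0,
                                      budget=39457 * 60, between_sd=0.42 * 332.0)
rel_error_pct = 100 * math.sqrt(mav) / ch["mean"]
print(round(n1), round(n2[0]), round(m2[0], 1), round(rel_error_pct, 2))
```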

Table 6.8: Empirical and Anticipated Relative Errors in %

| | SNI2 e.e. | SNI2 a.e. | D = 5 a.e. | D = 1 a.e. | D = 1, $m_1 = 11.7$ a.e. |
|---|---|---|---|---|---|
| JU | 1.81 | 1.81 | 1.73 | 1.70 | 1.80 |
| SP | 1.72 | 1.71 | 1.49 | 1.64 | 1.73 |
| PA | 1.88 | 1.87 | 1.69 | 1.81 | 1.84 |
| AL | 1.88 | 1.88 | 1.87 | 1.86 | 1.86 |
| SA | 3.18 | 3.18 | 4.01 | 3.06 | 3.15 |
| CH | 0.89 | 0.89 | 0.85 | 0.86 | 0.89 |

Legend
e.e. empirical relative error, a.e. anticipated relative error.

Figure 6.8: Relative Anticipated Error: D = 1

Legend: a.e. relative anticipated error in %, n number of terrestrial plots


Figure 6.9: Surface Areas of Optimal Circles: D = 1

Legend: s.a. surface areas in m2, n number of terrestrial plots

Figure 6.10: Relative Anticipated Error: D = 1;m1 = 11:7

Legend: a.e. relative anticipated error in %, n number of terrestrial plots


Comments

1. The optimal scheme with D = 5 yields surface areas for the large circles which are too large for field work because of complex boundary adjustments and slope corrections, especially in the Alps. With respect to SNI2 the relative error is decreased by 3.5%, or, equivalently, the cost is reduced by 7%, which does not justify the increased complexity. The same conclusion was reached in Mandallaz and Lanz (2001), however without taking the lack of fit into account.

2. Essentially the same can be said for the simplified optimal scheme with D = 1.

3. The optimal scheme with D = 1 and m1 = 11.7 is very close to the SNI2, except for the number of aerial photographs and the surface area of the small circle, which are halved. The errors are the same. In this sense, the SNI is remarkably "optimal".

4. In practical terms the optimum is rather "flat" with respect to the resulting error, which remains within narrow bounds over a large range of n2. The characteristics of the design, on the other hand, vary much more.

5. The only feasible way to substantially increase the efficiency of the SNI is therefore to reduce the lack-of-fit term by improving the prediction model based on the aerial photographs, thus increasing the R² from 0.20 to 0.30–0.40.
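The equivalence between error reduction and cost reduction quoted in comment 1 can be checked with a short calculation. The sketch below assumes that the variance is inversely proportional to the total cost, so that the relative error scales as the inverse square root of the cost. This scaling and the function name `equivalent_cost_reduction` are assumptions made for illustration and are not taken from the report.

```python
def equivalent_cost_reduction(error_reduction: float) -> float:
    """Cost saving that gives back the original error, assuming variance ~ 1 / cost."""
    # A relative error lowered by the fraction r at fixed cost means the
    # variance is multiplied by (1 - r)**2. Under the assumed scaling the
    # original error is then reached with the cost multiplied by (1 - r)**2.
    return 1.0 - (1.0 - error_reduction) ** 2


saving = equivalent_cost_reduction(0.035)  # 3.5% error reduction from comment 1
print(f"{100 * saving:.1f}%")
```

Under this assumption a 3.5% reduction of the relative error corresponds to a cost saving of about 7%, in line with the figure given in comment 1.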


Chapter 7

Conclusions

Mathematically speaking, we have completely solved the optimization problem for two-phase two-stage cluster sampling schemes in the case where the known working-strata used for post-stratification are not identical with the unknown Poisson-strata generating the random location of the trees, which is a major advantage for applications. The only assumption required is that the sample points must be allocated error-free to the working-strata. The setup is very general: costs and terrestrial sampling densities are allowed to vary across domains, as well as the number and structure of the concentric circles defining the first-stage inclusion probabilities. The surface areas of the optimal concentric circles depend on the sample sizes used, whereas the corresponding optimal DBH thresholds do not. The optimal second-stage inclusion probabilities can be readily implemented if a prediction model for the absolute error is available. One can find the optimal sample sizes with or without constraints on the number of first- and second-stage trees. It is possible to give closed analytical results after linearizing the traveling costs between clusters. If such an approximation is questionable one can optimize numerically.

To implement the techniques presented in this work one must have some approximate prior knowledge of the first- and second-stage measurement costs, the traveling costs and the forest structure: DBH distribution, surface areas and mean values in the working-strata, prediction error at the tree level and lack of fit of the post-stratification procedure used. This information can be based on a past inventory: in such a case it is possible to estimate the anticipated variance of any other inventory performed with simple random sampling. This is also possible under cluster sampling provided that the cluster geometry of the past and future inventory is the same (only the geometry, not the number and structure of concentric circles, nor the second-stage procedure). If the geometry differs one can at least work out a plausible scenario.

It is in some sense remarkable that the end results of such a complex optimization task should require only simple algebra to be implemented. In any case, the anticipated variance with lack of fit is a very useful tool to investigate many possible alternatives, to find the theoretical optimum and to investigate its stability. Preliminary validations indicate that the anticipated variance approximates very well the behavior of the empirical variance under two-phase sampling. It was also shown that the design of the Swiss National Inventory is very close to the optimal scheme with respect to the relative error achieved, though its characteristics differ from those of the theoretically optimal scheme.

Future work should consider more case studies for validation, particularly with

respect to the stability of the optimum. At a more theoretical level one should

attempt to calculate the anticipated variance under models more sophisticated than

the local Poisson model and generalize the method to the multivariate situation.


References

Lanz, A. (2000) Optimal sample design for extensive forest inventories. Ph.D. thesis,

ETH Zurich, Chair of Forest Inventory and Planning.

Mandallaz, D. and Lanz, A. (2001) Further results for optimal sampling schemes

based on the anticipated variance. To appear in Canadian Journal of Forest

Research.

Mandallaz, D. and Ye, R. (1999) Optimal two-phase two-stage sampling schemes based on the anticipated variance. Canadian Journal of Forest Research, 29, 1691–1708.

Mandallaz, D. (1991) A unified approach to sampling theory for forest inventory based on infinite population models. Ph.D. thesis, ETH Zurich, Chair of Forest Inventory and Planning.

Mandallaz, D. (1993) Geostatistical methods for double sampling schemes: applications to combined forest inventory. Technical report, ETH Zurich, Chair of Forest Inventory and Planning.

Mandallaz, D. (1997) The anticipated variance: a tool for the optimization of forest inventories. Technical report, ETH Zurich, Chair of Forest Inventory and Planning.

Mandallaz, D. (2000) Estimation of the spatial covariance in universal kriging: Application to forest inventory. Environmental and Ecological Statistics, 7, 263–284.

Särndal, C., Swensson, B. and Wretman, J. (1992) Model assisted survey sampling. Springer Series in Statistics, New York.

Ye, R. (1995) Waldsimulation auf der Basis automatischer Luftbildmessung und unter Kontrolle von GIS. Ph.D. thesis, Albert-Ludwigs-Universität, Freiburg im Breisgau, Germany.


Appendix A

Calculation of the Anticipated Variance under Cluster Sampling

Detailed proofs of the key results, Lemma 1 (4.4) and Lemma 2 (4.5), are provided below. For ease of notation we omit the symbol $\omega$. Also, when dealing with the Poisson-strata we shall write $F_{gk}$, $\bar{Y}_{gk}$ instead of $F_{1,gk}$, $\bar{Y}_{1,gk}$. We calculate

$$\mathbb{E}_g M_g^2(x)\,\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2 = \mathbb{E}_g\left(\sum_{l=1}^{M} I_{F_g}(x_l)\,\bigl(Y(x_l) - \bar{Y}_g\bigr)\right)^2 \tag{A.1}$$

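which is equal to

$$\mathbb{E}_g\left(\sum_{l=1}^{M} I_{F_g}(x_l)\,\bigl(Y(x_l) - \bar{Y}_g\bigr)^2 + \sum_{l \neq k}^{M} I_{F_g}(x_l)\, I_{F_g}(x_k)\,\bigl(Y(x_l) - \bar{Y}_g\bigr)\bigl(Y(x_k) - \bar{Y}_g\bigr)\right)$$

and hence to

$$\sum_{l=1}^{M} \mathbb{P}(x_l \in F_g)\,\mathbb{V}\bigl(Y(x_l) \mid x_l \in F_g\bigr) + \sum_{l \neq k}^{M} \mathbb{P}(x_l \in F_g,\, x_k \in F_g)\,\mathbb{E}_g\bigl\{\bigl(Y(x_l) - \bar{Y}_g\bigr)\bigl(Y(x_k) - \bar{Y}_g\bigr) \mid x_l \in F_g,\, x_k \in F_g\bigr\} \tag{A.2}$$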

Note that, given $x_l \in F_g$, $x_l$ is uniformly distributed in $F_g$. By taking the expectation with respect to $\omega$, the first term yields $\mathbb{E}_g M_g(x)$ times the anticipated variance under simple random sampling, which is given by (2.8). The extra term is due to cluster sampling and is trickier. We need to calculate

$$\mathbb{E}_\omega \mathbb{E}_g\bigl\{\bigl(Y(x_l) - \bar{Y}_g\bigr)\bigl(Y(x_k) - \bar{Y}_g\bigr) \mid x_l \in F_g,\, x_k \in F_g\bigr\}$$

First we split the event $\{x_k \in F_g\} \cap \{x_l \in F_g\}$ across the Poisson-strata into the disjoint events

$$\left(\bigcup_{j=1}^{P_{1g}} \{x_k \in F_{gj}\} \cap \{x_l \in F_{gj}\}\right) \cup \left(\bigcup_{i \neq j}^{P_{1g}} \{x_k \in F_{gi}\} \cap \{x_l \in F_{gj}\}\right) := \bigcup_s A_s$$

and use the decomposition rule for conditional expectation on disjoint $A_s$

$$\mathbb{E}\bigl(Z \mid \cup_s A_s\bigr) = \frac{\sum_s \mathbb{E}(Z \mid A_s)\,\mathbb{P}(A_s)}{\sum_s \mathbb{P}(A_s)}$$

Note that $\sum_s \mathbb{P}(A_s) = \mathbb{P}(x_k \in F_g \cap x_l \in F_g)$. The following expressions are needed:

$$\mathbb{E}_\omega \mathbb{E}_{x \mid \omega}\bigl\{\bigl(Y(x_l) - \bar{Y}_g\bigr)\bigl(Y(x_k) - \bar{Y}_g\bigr) \mid x_k \in F_{gj},\, x_l \in F_{gj}\bigr\} \tag{A.3}$$

$$\mathbb{E}_\omega \mathbb{E}_{x \mid \omega}\bigl\{\bigl(Y(x_l) - \bar{Y}_g\bigr)\bigl(Y(x_k) - \bar{Y}_g\bigr) \mid x_k \in F_{gi},\, x_l \in F_{gj}\bigr\}, \quad i \neq j \tag{A.4}$$
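The decomposition rule for conditional expectations over disjoint events used above can be checked on a small example. The sketch below uses a hypothetical finite sample space with six equally likely outcomes, an arbitrary variable Z and three arbitrary disjoint events; none of these objects come from the report, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical sample space: outcomes 0..5, each with probability 1/6.
prob = {w: Fraction(1, 6) for w in range(6)}
z = {w: w * w for w in range(6)}      # an arbitrary random variable Z
events = [{0, 1}, {2}, {4, 5}]        # pairwise disjoint events A_s


def p(event):
    """Probability of an event."""
    return sum(prob[w] for w in event)


def cond_exp(event):
    """E(Z | event), computed directly from the definition."""
    return sum(z[w] * prob[w] for w in event) / p(event)


union = set().union(*events)
lhs = cond_exp(union)                                   # E(Z | union of A_s)
rhs = sum(cond_exp(a) * p(a) for a in events) / sum(p(a) for a in events)
print(lhs == rhs)
```

Both sides agree exactly because the numerator of the right-hand side is the sum of Z times the probability over the union, which is how the rule is applied to the events defined by the Poisson-strata.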

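To calculate the first expression write $Y(x_l) - \bar{Y}_g = Y(x_l) - \bar{Y}_{gj} + \bar{Y}_{gj} - \bar{Y}_g$ and likewise for $Y(x_k) - \bar{Y}_g$, expand the product, interchange the order of expectation and neglect boundary effects so that $\mathbb{E}_\omega Y(u) = \bar{Y}_{gj}\ \forall u \in F_{gj}$ to finally obtain

$$\mathbb{E}_x \mathbb{E}_{\omega \mid x}\, Y(x_k) Y(x_l) - \bar{Y}_{gj}^2 + \bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)^2$$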

At this point we make the assumption that the same tree cannot be sampled from two different points of the cluster, which is nearly always the case in practice, with the possible exception of the angle count method and very large trees. That is, we assume that

$$I_i(x_k, \omega)\, I_i(x_l, \omega) = 0 \quad \forall i,\ \forall k \neq l \tag{A.5}$$

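Under this assumption we obtain after some algebra

$$\mathbb{E}_{\omega \mid x}\, Y(x_k) Y(x_l) = \frac{1}{\lambda^2(F_{gj})} \sum_{i \neq j \in F_{gj}} Y_i Y_j = \bar{Y}_{gj}^2 - \frac{1}{\lambda^2(F_{gj})} \sum_{i \in F_{gj}} Y_i^2$$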

As $\lambda(F_{gj}) \to \infty$ the second term vanishes. Hence, the first expression (A.3) is asymptotically given by $(\bar{Y}_{gj} - \bar{Y}_g)^2$. To calculate the second expression (A.4), we interchange the order of expectation, write $Y(x_l) - \bar{Y}_g = Y(x_l) - \bar{Y}_{gj} + \bar{Y}_{gj} - \bar{Y}_g$ and $Y(x_k) - \bar{Y}_g = Y(x_k) - \bar{Y}_{gi} + \bar{Y}_{gi} - \bar{Y}_g$ to obtain with the same arguments $(\bar{Y}_{gi} - \bar{Y}_g)(\bar{Y}_{gj} - \bar{Y}_g)$.

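The second term in (A.2) is therefore equal to

$$\sum_{j=1}^{P_{1g}} \sum_{l \neq k}^{M} \bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)^2\, \mathbb{P}(x_k \in F_{gj},\, x_l \in F_{gj}) + \sum_{i \neq j}^{P_{1g}} \sum_{l \neq k}^{M} \bigl(\bar{Y}_{gi} - \bar{Y}_g\bigr)\bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)\, \mathbb{P}(x_k \in F_{gi},\, x_l \in F_{gj}) \tag{A.6}$$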

To go further consider the random variables associated with the number of points of a cluster falling into a given stratum. They read

$$M_{1,gj}(x) = \sum_{l=1}^{M} I_{F_{1,gj}}(x_l) \tag{A.7}$$

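Straightforward calculations yield

$$\mathbb{E}_g M_{1,gj}^2(x) = \sum_{l \neq k}^{M} \mathbb{P}(x_l \in F_{gj},\, x_k \in F_{gj}) + \mathbb{E}_g M_{1,gj}(x) \tag{A.8}$$

$$\mathbb{E}_g M_{1,gi}(x)\, M_{1,gj}(x) = \sum_{k \neq l}^{M} \mathbb{P}(x_k \in F_{gi},\, x_l \in F_{gj}), \quad i \neq j \tag{A.9}$$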


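Substituting (A.8, A.9) into (A.6) we see that (A.6) is equal to

$$\mathbb{E}_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)\bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)\right)^2 - \sum_{j=1}^{P_{1g}} \mathbb{E}_g M_{1,gj}(x)\bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)^2 \tag{A.10}$$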
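The step from (A.6) to (A.10) rests on an identity that already holds for every single realization of the cluster, before expectations are taken. The sketch below checks it numerically for one arbitrary configuration. The cluster size, the number of strata, the random stratum assignments and the deviations `d` (standing in for the differences between stratum means and the domain mean) are all hypothetical values chosen for illustration.

```python
import random

random.seed(0)
M, P = 5, 3                                     # cluster points and strata (hypothetical)
d = [random.uniform(-1, 1) for _ in range(P)]   # stand-ins for Ybar_gj - Ybar_g

# stratum[l] is the stratum containing point l, or None if the point
# falls outside the domain F_g.
stratum = [random.choice([0, 1, 2, None]) for _ in range(M)]
ind = [[1 if stratum[l] == j else 0 for l in range(M)] for j in range(P)]
m = [sum(ind[j]) for j in range(P)]             # point counts per stratum, as in (A.7)

# Realization-wise counterpart of (A.6): sums over pairs of distinct points,
# with the probabilities replaced by products of indicators.
pair_sum = sum(
    d[i] * d[j] * ind[i][k] * ind[j][l]
    for i in range(P) for j in range(P)
    for k in range(M) for l in range(M) if k != l
)

# Realization-wise counterpart of (A.10).
closed_form = sum(m[j] * d[j] for j in range(P)) ** 2 - sum(
    m[j] * d[j] ** 2 for j in range(P)
)
print(abs(pair_sum - closed_form) < 1e-12)
```

Taking expectations of both sides over the position of the cluster gives the equality of (A.6) and (A.10); the check itself does not depend on the distribution of the points.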

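Using $\mathbb{E}_g M_{1,gj}(x) = p_{1,gj}\, \mathbb{E}_g M_g(x)$ with $p_{1,gj} = \frac{\lambda(F_{1,gj})}{\lambda(F_g)}$, substituting (A.10) into (A.2) and using (2.8) yields

$$\frac{\mathbb{E}_g M_g^2(x)\,\bigl(Y_{c,g}(x) - \bar{Y}_g\bigr)^2}{\mathbb{E}_g^2 M_g(x)} = \frac{1}{\mathbb{E}_g M_g(x)\, \lambda^2(F_g)} \sum_{i \in F_g} \frac{Y_i^2}{\pi_i^g} + \frac{\mathbb{E}_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)\bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)\right)^2}{\mathbb{E}_g^2 M_g(x)} \tag{A.11}$$

which is precisely Lemma 1 (4.4) since $\mathbb{E}_g\left(\sum_{j=1}^{P_{1g}} M_{1,gj}(x)\bigl(\bar{Y}_{gj} - \bar{Y}_g\bigr)\right) = 0$.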

To prove Lemma 2 (4.5), $\hat{Y}(x_l)$ is replaced asymptotically by $\bar{Y}_{2,gk}$ whenever $x_l \in F_{2,gk}$. Then one uses exactly the same technique as above, but with the sets $A_s$ now defined by the working-strata instead of the Poisson-strata and with the random variable $Y(x_l)$ replaced by the constant $\bar{Y}_{2,gk}$ whenever $x_l \in F_{2,gk}$. The term containing $\sum_{i \in F_g} \frac{Y_i^2}{\pi_i^g}$ no longer appears since $M_g(x)$ is the only random variable in (4.5).
