Elementary statistics for foresters Lecture 1 Socrates/Erasmus Program @ WAU Spring semester 2005/2006


Elementary statistics for foresters

Lecture 1

Socrates/Erasmus Program @ WAU

Spring semester 2005/2006


Instead of introduction:

a few quotes from smart people


Statistics is a pain

• Every normal person who takes it knows that it is (almost always) badly taught, unreadable, and even when you follow the idea, you can't imagine where to apply it.


So, why learn it?

• The reason to learn this stuff is that it is terribly useful in very practical ways.

• The difference between 50 and 100 plots established in the field may not seem important to someone in a warm, dry office, but it matters to somebody on a wet, cold, 60% slope.

• The reason to understand this stuff is that it can save real money, real time and real sweat.


Good news about statistics

• Normal people can learn it, with little math background or attitude.

• It's true that many statisticians come from mathematics, but it isn't necessary or useful for most applications.

• Ordinary people who can see statistics in perspective are often the most innovative and credible users of statistics.


Statistics has little to do with math

• There are some exact probabilities that are used, and you may need math to calculate them, but this is a detail.

• All the powerful ideas are logical ideas.

• Calculators (or computers) now do all the math; you only have to work with the logical ideas.


Statistics has little to do with math

• There are only a few important ideas, even though there is a mass of names and symbols swirling around them.

• There are no more than a handful of important equations too; they just have lots of special forms.


Statistics is a matter of practice

• Besides, statistics is a normal procedure many of you are expected to know how to use.

• Just like driving a truck, it's a necessary part of doing the job.

• Like using a chain saw or juggling, it's a matter of practice and the right approach.


Don't be intimidated!

Look at some of the people that do this work - if they can learn it, so can you.


Basic concepts and definitions

• However, before we start a real statistical adventure, we have to introduce a few definitions and basic ideas so that we can communicate more easily throughout the course.

• Let's start with a few definitions of statistics itself.


Statistics

• Statistics is the science and practice of developing human knowledge through the use of empirical data.

• Statistics is a method of analysis concerned with:
– collecting the data,
– summarizing the data, and
– drawing conclusions based on the data.


Statistics

• Statistics is a discipline which deals with the
– collection,
– organization, and
– interpretation of data.

• Statistics is a collection of methods for
– planning experiments,
– obtaining data,
– and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on that data.


Descriptive statistics

• Descriptive statistics are used to summarize or describe characteristics of a known set of data.

• Used if we want to describe or summarize data in a clear and concise way using graphical and/or numerical methods.


Descriptive statistics

• For example: we can consider everybody in the class as a group to be described. Each person can be a source of data for such an analysis.

• A characteristic of this data may be for example age, weight, height, sex, country of origin, etc.


Descriptive statistics

• Closer-to-forestry example: we can consider all pine stands in central Poland as a group to be characterized.

• Each stand can be described by its area, age, site index, average height, QMD, volume per hectare, volume increment per hectare per year, amount of carbon sequestered, species composition, damage index, ...


Graphical description of data

• Pictures are very informative and can tell the entire story about the data.

• We can use different plots for different sorts of variables, for example bar plots, histograms, pie charts, box plots, ...


Numerical description of data

• Numerical description is used to capture the main features of the data quickly using special values: statistical measures.

• Graphical and numerical methods will be discussed later (during the lecture on descriptive statistics).


Inferential statistics

• Inferential statistics goes far beyond the simple description of the data.

• It means using a sample to make inferences about the larger set of data from which the sample was chosen.


Inferential statistics

• For example: we can consider participants of this course as a sample of all Socrates students taking part in all courses at the WAU this academic year and calculate the average age of all of us.

• Then we could infer that the average age of all Socrates students is about the same as ours (as in our sample).


Inferential statistics

• Closer-to-forestry example: we can consider pine stands used for the large-scale inventory in Poland and calculate e.g. an average volume per hectare.

• Then we could infer that the average volume per hectare of all pine stands in Poland is about the same as in our sub-population (our sample).


Population

• A population is the complete collection of elements to be studied.

• A population is what we are interested in. As in the previous examples, our population of interest could be all pine stands in Poland, all trees in a given forest tract, all Socrates students at WAU this year, ...


Population consists of individuals

• A population is a collection of individuals, each of which can be described with data.

• The whole set of pine stands in Poland in fact consists of single stands. Each stand can be described by some characteristics such as its area, age, site index, average height, QMD, volume per hectare, volume increment per hectare per year, amount of carbon sequestered, species composition, damage index, ...


Sample

• A sample is a part of a population, a subset of elements drawn from the population.


Parameter

• A parameter is a characteristic of the entire population.

• For example: an average age (characteristic) of all spruce stands in Finland (population) is a parameter.


Statistic

• A statistic is a characteristic of a sample (note the second meaning of the word "statistics").

• For example: an average age (characteristic) of spruce stands chosen for measurement during an inventory (sample) is a statistic.


Estimator

• An estimator is a statistic (coming from a sample) used to make inferences about a parameter (of the entire population).


Inferential statistics

• Taking these together: inferential statistics is used to figure out parameters (characteristics of the population) based on statistics (characteristics coming from a sample), which serve as estimators.

• This is a major way of performing statistical analyses.
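The relationship between parameter, statistic and estimator can be sketched in a few lines of Python. The population below is made up purely for illustration (randomly generated stand ages), and the names `population_ages`, `mu` and `x_bar` are assumptions, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: ages (years) of 1,000 stands, generated for illustration only.
rng = random.Random(42)
population_ages = [rng.randint(20, 120) for _ in range(1000)]

# Parameter: a characteristic of the entire population (usually unknown in practice).
mu = statistics.mean(population_ages)

# Sample: a randomly drawn subset of the population.
sample_ages = rng.sample(population_ages, 50)

# Statistic: a characteristic of the sample; here it serves as an estimator of mu.
x_bar = statistics.mean(sample_ages)

print(f"parameter (population mean): {mu:.1f}")
print(f"statistic (sample mean):     {x_bar:.1f}")
```

In a real inventory only the sample mean would be available; the population mean is computed here only because the whole population was simulated.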


Variable

• A variable is a value or characteristic observed on units in the sample that can vary from unit to unit in the sample.

• Variables are just attributes of an individual.

• Example: people in our lab can be described using various characteristics, such as sex, hair colour, country, height, weight, IQ, ...


Variable

• Closer-to-forestry example: all trees in one of the forest tracts in the Rogów forest district are somehow similar (they have a bole, branches, crown, leaves, ...), but

• they can be described by a whole bunch of characteristics (variables, attributes) such as species, height, DBH, crown length, crown ratio, volume, taper, form factor, ...


Variables

• qualitative (which means: describing belonging to a group or category, eg. sex, hair color, tree species)

• quantitative (which means: possible to measure using a numerical scale, or numeric values for which addition and averaging make sense, eg. DBH, height, crown ratio, ...).


Variables

• If a variable can take only a finite (or countable) set of distinct values, it is a discrete variable (e.g. age in years, DBH class, ...)

• If a variable can take any value (or any value from a given interval), it is a continuous variable (e.g. height, DBH, ...)

• In many cases, due to measurement limitations or simplifications, continuous variables can be treated as discrete (e.g. DBH rounded to 1 mm)


How to measure / arrange variables?

• nominal scale: data consist of names, labels or categories, with no particular ordering scheme (eg. species)

• ordinal scale: data can be arranged in some order, but differences are meaningless in terms of values (eg. damage index)

• interval scale: data can be arranged in some order with meaningful differences between values (eg. DBH class)

• other


Distribution

• Variables have their distribution.

• Distribution of a variable gives the values the variable can take and how often it takes on each value.

• We'll talk more about variable distributions during the lecture on descriptive statistics, and more about theoretical distributions during the lecture on distributions.
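As a small illustration of "which values a variable takes and how often", the following sketch tabulates the empirical distribution of a qualitative variable. The species list is invented for the example.

```python
from collections import Counter

# Hypothetical sample of tree species recorded on a plot (illustrative data only).
species = ["pine", "pine", "spruce", "oak", "pine", "spruce", "pine", "birch"]

counts = Counter(species)                            # absolute frequencies
n = len(species)
relative = {sp: c / n for sp, c in counts.items()}   # relative frequencies

# Print the values the variable takes and how often each one occurs.
for sp, c in counts.most_common():
    print(f"{sp:<7} {c:>2}  {relative[sp]:.3f}")
```

The relative frequencies always sum to 1, which is what makes them comparable between samples of different sizes.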


Values and equations

• population parameters: μ, σ², σ, α, β, δ

• sample statistics: x̄, s², s, a, b, ...

• Indices/subscripts: i, j, xᵢ, yᵢ

– Not very informative to use just i, j, k

• Sum: ∑ with index, ∑ without index

• Using brackets
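As an illustration of this notation, the sample mean, sample variance and sample standard deviation can be written with an indexed sum. These are the standard textbook definitions, assumed here because the slide does not spell them out; $n$ denotes the sample size.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```

Brackets matter: in general $\left(\sum_{i=1}^{n} x_i\right)^2 \neq \sum_{i=1}^{n} x_i^2$, because the first expression squares the total while the second sums the squares.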


Sampling


Sample

• Sample is a part (subset) of the population

• Sample (individuals to be measured) has to be chosen in a specific way

• A sample is used to make inferences about the entire population (using a special procedure called estimation)


Why sample?

• Sometimes all the individuals are measured

• This is referred to as "a census" or "a 100% sample"

• When do we measure all items?
– when the area is small
– when the values are high
– when credibility is a value in itself (legal cases)
– when the sampling process causes problems (boundary)


Why sample?

• Population size

• Cost

• Time constraints

• Destruction and disruption (e.g. explosives)

• Improving the accuracy (!)
– Accurate and careful measurements
– Acceptable sampling error
– Means: eliminate measurement error and accept sampling error instead


Why sample?

• Relative answers (when eg. volume per hectare more important than the total volume of the entire area; also if not possible to delineate the total area of interest)


Good sample?

• Randomly selected

• Probability of selecting a particular individual from the population is known, but not necessarily equal to the probability of selecting another

• Accurate, precise, and unbiased

• Wise and efficient

• Flexible


Accuracy and precision

• Accuracy is the degree of conformity of a measured/calculated quantity to its actual (true) value.

• Precision is the degree to which further measurements or calculations will show the same or similar results.


The target analogy

• Repeated measurements are compared to arrows that are fired at a target.


Accuracy

• Accuracy describes the closeness of arrows to the bullseye at the target center. Arrows that strike closer to the bullseye are considered more accurate.

• The closer the measurements are to the accepted value, the more accurate they are considered to be.


Precision

• Precision could be the size of the arrow cluster when many arrows are fired.

• When all arrows are grouped tightly together, the cluster is considered precise since they all struck close to the same spot, if not necessarily near the bullseye.

• The measurements are precise, though not necessarily accurate.


Accuracy and precision

• Note that if only one arrow is fired, precision is the size of the cluster we would expect if the shot were repeated many times under the same conditions.

• The concept of repeating the experiment (repeatedly drawing a sample) will follow us throughout this course.


Precision and accuracy

• The results of a measurement or calculations can be accurate but not precise, precise but not accurate, neither, or both;

• If a result is both accurate and precise, it is called valid.

• Possible cases again:
– High accuracy, low precision
– High precision, low accuracy
– Low precision, low accuracy
– High precision, high accuracy
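The four cases can be imitated with a small simulation. This is only a sketch under stated assumptions: accuracy is modelled as the absence of a systematic offset (bias), precision as a small random spread, and the true value, bias and spread figures are invented for the example.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VALUE = 25.0  # hypothetical true tree height in metres (assumed for illustration)

def measure(bias, spread, n=1000):
    """Simulate n repeated measurements with a systematic bias and random spread."""
    return [TRUE_VALUE + bias + rng.gauss(0, spread) for _ in range(n)]

cases = {
    "high accuracy, low precision": measure(bias=0.0, spread=2.0),
    "high precision, low accuracy": measure(bias=3.0, spread=0.2),
    "low precision, low accuracy": measure(bias=3.0, spread=2.0),
    "high precision, high accuracy": measure(bias=0.0, spread=0.2),
}

for name, values in cases.items():
    offset = statistics.mean(values) - TRUE_VALUE  # distance of the cluster centre from the bullseye
    scatter = statistics.stdev(values)             # size of the arrow cluster
    print(f"{name:<30} offset={offset:+.2f}  scatter={scatter:.2f}")
```

In terms of the target analogy, `offset` says how far the centre of the arrow cluster lies from the bullseye, and `scatter` says how large the cluster is.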


Precision and accuracy

[Figure: targets arranged in a grid; rows: biased, unbiased; columns: precise, imprecise]


Error and bias

• The related terms in statistics are error (random variability in measurements) and bias (non-random or directed effects caused by a factor or factors unrelated to the observed variable).


Methods of sampling

• There are many of them, enough to fill a whole thick textbook (e.g. the classic textbook by William G. Cochran), but a few schemes are the most popular.

http://www.amazon.com/

448 pages, $101.75

Reviewed as „A classic and the bible of sampling techniques”


Methods of sampling

• random sampling: members of the population are selected in such a way that each member has an equal chance of being selected.

• stratified sampling: the population is first split into at least two sub-populations (so-called strata) that are similar in some way (e.g. gender), and then a sample is drawn separately from each stratum.
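The two schemes above can be sketched as follows. The list of stands, the species strata and the 10% proportional allocation are all assumptions made for the example; other allocations are equally possible.

```python
import random

rng = random.Random(2006)

# Hypothetical sampling frame: 100 stands, each labelled with a species stratum.
stands = [{"id": i, "species": "pine" if i < 70 else "spruce"} for i in range(100)]

# Simple random sampling: every stand has the same chance of being selected.
srs = rng.sample(stands, 10)

# Stratified sampling: split into strata, then draw a separate random sample from each stratum.
strata = {}
for stand in stands:
    strata.setdefault(stand["species"], []).append(stand)

# Proportional allocation (10% of each stratum) is one possible choice, assumed here.
stratified = []
for name, members in strata.items():
    stratified.extend(rng.sample(members, len(members) // 10))

print(len(srs), len(stratified))
```

With proportional allocation the stratified sample always contains 7 pine and 3 spruce stands, whereas the species mix of the simple random sample varies from draw to draw.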


Methods of sampling

• cluster sampling: the population is divided into clusters (groups), and then a few clusters are randomly selected; all members from selected clusters are measured

• systematic sampling: a first element of the population (starting point) is chosen randomly and then every kth element in the population is chosen for measurement
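A minimal sketch of cluster and systematic sampling, using an invented population of 60 numbered trees grouped into 6 plots. Choosing the random starting point among the first k elements is a common variant and is assumed here.

```python
import random

rng = random.Random(7)

# Hypothetical population: 60 trees numbered 0..59, grouped into 6 plots (clusters) of 10 trees.
trees = list(range(60))
clusters = [trees[i:i + 10] for i in range(0, 60, 10)]

# Cluster sampling: pick a few clusters at random and measure every tree in them.
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [tree for cluster in chosen_clusters for tree in cluster]

# Systematic sampling: random starting point, then every k-th element.
k = 6
start = rng.randrange(k)            # random start among the first k elements (assumption)
systematic_sample = trees[start::k]

print(cluster_sample)
print(systematic_sample)
```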


How to choose a random sample?

• Statistical tables with random numbers

• Other tools assuring randomness of a sample (Lotto, hat, ...)

• Computer random number generators
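For the computer-based option, one possible sketch uses Python's standard `random` module. The frame of plot numbers and the helper `draw_sample` are hypothetical; fixing the seed is a design choice that makes the draw reproducible and documentable.

```python
import random

# Hypothetical sampling frame: plot numbers 1..200.
frame = list(range(1, 201))

def draw_sample(frame, n, seed):
    """Draw n distinct elements from the frame without replacement, reproducibly."""
    return random.Random(seed).sample(frame, n)

first = draw_sample(frame, 5, seed=2006)
second = draw_sample(frame, 5, seed=2006)

print(first)
print(first == second)  # same seed, same sample
```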

Random numbers table

[Figure: table of random numbers]

Sampling frame

• In order to be able to draw a sample, we have to have a sampling frame

• The sampling frame is the list of the population from which the sample is drawn.

• In order to make inferences from survey data, the researcher must understand how the sampling frame defines the population represented, as well as which population groups are excluded.


Sampling

• A few practical examples of drawing a sample from a population using tables, computer software, and other techniques.


Literature

• http://en.wikipedia.org/wiki/Category:Statistics

• http://www.statsoft.com/textbook/stathome.html

• http://www.stats.gla.ac.uk/steps/glossary/index.html

• http://www.proaxis.com/~johnbell/sfpp/sfppc.htm