DATA ANALYSIS

ERRORS IN CHEMICAL ANALYSIS

Normal phrases used in describing the results of an analysis:
•“pretty sure”
•“very sure”
•“most likely”
•“improbable”

These are replaced by mathematical statistical tests.

Is there such a thing as “ERROR FREE ANALYSIS”?

•It is impossible to eliminate errors.
•They can only be minimized.
•They can only be approximated to an acceptable precision.

How reliable are our data?

TO OVERCOME ERRORS
•Carry out replicate measurements.
•Analyse accurately known standards.
•Perform statistical tests on data.

Mean/Average

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

$x_i$ = individual values of x
N = number of replicate measurements

Median
•The middle value when the data are arranged in ascending order and the number of data is odd.
•The average of the two middle values when the data are arranged in ascending order and the number of data is even.
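As a minimal sketch of the two definitions above, the mean and median can be computed with Python's standard `statistics` module. The five results are borrowed from the calcium-in-rock example that appears later in these notes; the four-value list is a hypothetical subset used only to show the even case.

```python
from statistics import mean, median

# Replicate results (% Ca) from the calcium-in-rock example later in the notes.
results = [14.35, 14.41, 14.40, 14.32, 14.37]

x_bar = mean(results)    # sum of the x_i divided by N
x_med = median(results)  # middle value of the sorted data (N is odd here)

# Hypothetical even-sized subset: the median is the average of the two middle values.
even_med = median([14.32, 14.35, 14.37, 14.40])

print(round(x_bar, 2), x_med, round(even_med, 2))
```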

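Errors

Absolute Error
Absolute error = measured value – true value

Absolute error, $d = x - \mu$

Relative error = $\frac{d}{\mu}$

Percent relative error = $\frac{d}{\mu} \times 100$

where x is the measured value and $\mu$ is the true value.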
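The error definitions above can be sketched as three small functions. The measured and true values below are hypothetical numbers chosen only for illustration, and the function names are not from the original notes.

```python
def absolute_error(measured, true_value):
    """Absolute error: measured value minus true value (the sign is kept)."""
    return measured - true_value


def relative_error(measured, true_value):
    """Absolute error divided by the true value."""
    return absolute_error(measured, true_value) / true_value


def percent_relative_error(measured, true_value):
    """Relative error expressed as a percentage."""
    return relative_error(measured, true_value) * 100


# Hypothetical example: a result of 19.8 against a true value of 20.0.
d = absolute_error(19.8, 20.0)
pct = percent_relative_error(19.8, 20.0)
print(round(d, 2), round(pct, 2))
```

A negative result means the measurement is lower than the true value.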

STATISTICS

Range
The difference between the highest and the lowest result.

Standard Deviation (SD), s
A measure of the precision of a population of data.

Small sample size:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

N - 1 = degrees of freedom

Population (infinite number of data):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

N = number of replicates

The smaller the SD, the more precise the analysis.

Variance, V

The square of the standard deviation.
For a sample, $V = s^2$
For a population, $V = \sigma^2$

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

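Relative Standard Deviation (RSD) and Coefficient of Variation (CV)

$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 1000 \ \text{(ppt)}$$

$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\ \%$$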
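The spread statistics defined above (range, s, variance, RSD, CV) can be sketched with the standard library. The data are again the calcium-in-rock results used later in these notes; `stdev` divides by N - 1 and `pstdev` divides by N.

```python
from statistics import mean, pstdev, stdev, variance

# Replicate results (% Ca) from the calcium-in-rock example later in the notes.
results = [14.35, 14.41, 14.40, 14.32, 14.37]

x_bar = mean(results)
data_range = max(results) - min(results)  # highest minus lowest result
s = stdev(results)        # sample standard deviation (N - 1 in the denominator)
sigma = pstdev(results)   # population formula (N in the denominator)
v = variance(results)     # sample variance, equal to s squared
rsd_ppt = s / x_bar * 1000    # relative standard deviation, parts per thousand
cv_percent = s / x_bar * 100  # coefficient of variation, percent

print(f"range={data_range:.2f} s={s:.3f} V={v:.5f} "
      f"RSD={rsd_ppt:.2f} ppt CV={cv_percent:.3f} %")
```

For a small data set the sample value s is slightly larger than the population-formula value, because of the N - 1 denominator.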

PRECISION
•Relates to the reproducibility or repeatability of a result.
•How similar are values obtained in exactly the same way?
•Expressed through the deviation from the mean:

$$d_i = |x_i - \bar{x}|$$

ACCURACY
•Measures the agreement between the experimental mean and the true value (which may not be known!).

DIFFERENCE BETWEEN ACCURACY AND PRECISION

Good precision does not guarantee accuracy.

[Figure: four panels labelled high precision/high accuracy, high precision/low accuracy, low precision/low accuracy, and low precision/high accuracy.]

TYPES OF ERROR IN EXPERIMENTAL DATA

Gross Errors
•Serious, but occur very seldom in analysis.
•Usually obvious; they give outlier readings.
•Detectable by carrying out sufficient replicate measurements.
•The experiment must be repeated.
e.g. faulty instrument, contaminated reagent.

Random (indeterminate) Error
•Data are scattered approximately symmetrically about a mean value.
•Affects precision; can only be controlled.
•Dealt with statistically.
e.g. physical and chemical variables.

Systematic (determinate) Error
•Determinable, and can presumably be either avoided or corrected.
•Several possible sources.
•Readings are all too high or all too low.
•Causes bias in the technique; the bias can be either positive or negative.
•Affects accuracy.

SOURCES OF SYSTEMATIC ERROR

Instrument Error
Frequent calibration is needed, both for apparatus such as volumetric flasks and burettes and for electronic devices such as spectrometers.
Examples:
•Fluctuation in power supply
•Temperature changes

Method Error
Due to inadequacies in the physical or chemical behaviour of reagents or reactions (e.g. slow or incomplete reactions).
Difficult to detect, and the most serious systematic error.
Example:
•The small excess of reagent required to make an indicator undergo the colour change that signals the completion of a reaction.

Personal Error
Sources: physical handicap, prejudice, lack of competence.
Examples:
•Insensitivity to colour changes
•Tendency to estimate scale readings so as to improve precision
•Preconceived idea of the “true” value

SYSTEMATIC ERRORS
Systematic errors can be:
•Constant (e.g. error in burette reading; less important for larger values of the reading).
•Proportional (e.g. presence of a given proportion of interfering impurity in the sample; equally significant for all values of measurement).

Minimise Errors:
•Minimise instrument errors by careful recalibration and good maintenance of equipment.
•Minimise personal errors by care and self-discipline.
•Method errors are the most difficult; the “true” value may not be known. Three approaches to minimise them:
  - Analysis of certified standards (SRM)
  - Use of 2 or more independent methods
  - Analysis of blanks

SIGNIFICANT FIGURES
•The minimum number of digits needed to write a value in scientific notation without a loss in accuracy.
•Zero is significant only when:
  - it occurs in the middle of a number:
    401: 3 significant figures
    6.0015: 5 significant figures
  - it is the last digit to the right of the decimal point:
    3.00: 3 significant figures
    $6.00 \times 10^2$: 3 significant figures
    0.0500: 3 significant figures

Addition - Subtraction
Use the same number of decimal places as the number with the fewest decimal places.

12.2 (1 dp) + 0.365 (3 dp) + 1.04 (2 dp) = 13.605 = 13.6 (1 dp)

Multiplication - Division
Use the same number of significant figures as the number with the fewest significant figures.

$$\frac{40.1 \times 0.1633}{204.228} = 3.21 \times 10^{-2} \ \text{(3 sf)}$$
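The two rounding rules can be sketched in Python. The helper `round_sig` is a hypothetical name introduced here, not part of the notes; it rounds a value to a chosen number of significant figures. The inputs are the numbers from the two worked examples.

```python
from math import floor, log10


def round_sig(value, sig):
    """Round value to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    decimals = sig - 1 - floor(log10(abs(value)))
    return round(value, decimals)


# Addition: keep the fewest decimal places among the terms (1 dp from 12.2).
total = round(12.2 + 0.365 + 1.04, 1)

# Multiplication/division: keep the fewest significant figures (3 sf from 40.1).
quotient = round_sig(40.1 * 0.1633 / 204.228, 3)

print(total, quotient)
```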

USE OF STATISTICS IN DATA EVALUATION
•Defining the interval of values around a sample mean ($\bar{x}$) within which the population mean ($\mu$) can be expected to lie with a given probability. The limits of this interval are called confidence limits.
•Determining the number of replicates required to assure (at a desired probability) that an experimental mean ($\bar{x}$) falls within a predicted interval of values around the population mean ($\mu$).
•Estimating the probability that the experimental mean ($\bar{x}$) and the true value ($\mu$) are different, or that two experimental means are different (t test).
•Estimating the probability that data from two experiments are different in precision (F test).
•Deciding when to accept/reject outliers among replicates (Q test).
•Treating calibration data.
•Quality control.

CONFIDENCE LIMITS AND CONFIDENCE INTERVAL

Confidence Limits
The limits of the interval around the mean ($\bar{x}$) that probably contains $\mu$.
Extreme values of $\mu$: $a < \mu < b$

Confidence Interval
The magnitude of the confidence limits.

Confidence Level
Fixes the level of probability that the mean is within the confidence limits.
99.7%, 99%, 95%, 90%, 80%, 68%, 50%

CONFIDENCE LIMITS (CL)
•SINCE THE EXACT VALUE OF THE POPULATION MEAN, $\mu$, CANNOT BE DETERMINED, ONE MUST USE STATISTICAL THEORY TO SET LIMITS AROUND THE MEASURED MEAN, $\bar{x}$, THAT PROBABLY CONTAIN $\mu$.
•CL ONLY HAVE MEANING WHEN THE MEASURED STANDARD DEVIATION, s, IS A GOOD APPROXIMATION OF THE POPULATION STANDARD DEVIATION, $\sigma$, AND THERE IS NO BIAS IN THE MEASUREMENT.
•CL WHEN $\sigma$ IS KNOWN (POPULATION):

$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

N = number of measurements

Values for z at various confidence levels are found in Table 1.

Table 1: VALUES FOR z

Confidence Level, %    z
50                     0.67
68                     1.00
80                     1.28
90                     1.64
95                     1.96
95.4                   2.00
99                     2.58
99.7                   3.00
99.9                   3.29

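Examples:

At the 99% confidence level, z = 2.58:

$$\mu = \bar{x} \pm \frac{2.58\sigma}{\sqrt{N}}$$

At the 95% confidence level, z = 1.96:

$$\mu = \bar{x} \pm \frac{1.96\sigma}{\sqrt{N}}$$

At the 90% confidence level, z = 1.64:

$$\mu = \bar{x} \pm \frac{1.64\sigma}{\sqrt{N}}$$

or, written as limits at the 99% confidence level,

$$\bar{x} - \frac{2.58\sigma}{\sqrt{N}} < \mu < \bar{x} + \frac{2.58\sigma}{\sqrt{N}}$$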
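A minimal sketch of the z-based confidence limits follows. The z values are taken from the table above; the mean, the population standard deviation and N in the usage line are hypothetical numbers chosen only for illustration, and the function name is an assumption.

```python
from math import sqrt

# z values copied from the table of z values in these notes.
Z_TABLE = {50: 0.67, 68: 1.00, 90: 1.64, 95: 1.96, 99: 2.58, 99.7: 3.00, 99.9: 3.29}


def z_confidence_limits(x_bar, sigma, n, level=95):
    """Return (lower, upper) limits for mu when sigma is known."""
    half_width = Z_TABLE[level] * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical inputs: mean 14.37, sigma 0.04, five measurements.
low_95, high_95 = z_confidence_limits(14.37, 0.04, 5, level=95)
low_99, high_99 = z_confidence_limits(14.37, 0.04, 5, level=99)
print(f"95%: {low_95:.3f} to {high_95:.3f}")
print(f"99%: {low_99:.3f} to {high_99:.3f}")
```

The 99% limits come out wider than the 95% limits, as expected from the larger z.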

CL for Small Data Set (N ≤ 20), $\sigma$ not known:

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

•Values of t depend on the degrees of freedom (N - 1) and the confidence level (from the t table).
•t is also known as “Student's t” and will be used in hypothesis tests.

VALUES OF t AT VARIOUS CONFIDENCE LEVELS

Degrees of     Confidence Level
Freedom        90 %     95 %     99 %
1              6.31     12.71    63.66
2              2.92     4.30     9.92
3              2.35     3.18     5.84
4              2.13     2.78     4.60
5              2.02     2.57     4.03
6              1.94     2.45     3.71
7              1.90     2.36     3.50
8              1.86     2.31     3.36
9              1.83     2.26     3.25
10             1.81     2.23     3.17
11             1.80     2.20     3.11
12             1.78     2.18     3.06
13             1.77     2.16     3.01
14             1.76     2.14     2.98
15             1.75     2.13     2.95
16             1.75     2.12     2.92
17             1.74     2.11     2.90
18             1.73     2.10     2.88
19             1.73     2.09     2.86
20             1.72     2.09     2.85
infinity       1.64     1.96     2.58

Example:

Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32% and 14.37%. Calculate the confidence interval at 95% confidence level.

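$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

Mean, $\bar{x}$ = 14.37
Standard deviation, s = 0.037
From the table, at the 95% confidence level with N - 1 = 4, t = 2.78.

Confidence interval,

$$\mu = 14.37 \pm \frac{2.78 \times 0.037}{\sqrt{5}} = 14.37 \pm 0.05$$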

At different confidence levels,

Confidence Level    Confidence Interval
90%                 $\mu$ = 14.37% ± 0.04
95%                 $\mu$ = 14.37% ± 0.05
99%                 $\mu$ = 14.37% ± 0.08

Summary:
If the confidence level is increased, the confidence interval also widens, and the probability that $\mu$ lies within the interval increases.
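The calcium example can be reproduced with a short sketch. The t values for 4 degrees of freedom are copied from the t table in these notes; s is rounded to three decimals to mirror the value (0.037) used in the worked calculation, which is an assumption about how the slide arrived at its figures.

```python
from math import sqrt
from statistics import mean, stdev

# t values for N - 1 = 4 degrees of freedom, copied from the t table in the notes.
T_DF4 = {90: 2.13, 95: 2.78, 99: 4.60}

results = [14.35, 14.41, 14.40, 14.32, 14.37]  # % Ca in rock
n = len(results)
x_bar = round(mean(results), 2)
s = round(stdev(results), 3)  # rounded to 0.037, as in the worked example

# Half-width of the confidence interval, t*s/sqrt(N), at each confidence level.
half_widths = {level: t * s / sqrt(n) for level, t in T_DF4.items()}

for level, hw in half_widths.items():
    print(f"{level}%: {x_bar:.2f} ± {hw:.2f}")
```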

Example:

Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32%, 14.45%, 14.50%, 14.25% and 14.37%. Calculate the confidence interval.

OTHER USAGE OF CONFIDENCE INTERVAL
•To determine the number of replicates needed for the mean to be within the confidence interval.
•To determine systematic error.

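TO DETERMINE NUMBER OF REPLICATES

If $\sigma$ is known ($s \to \sigma$),

$$\bar{x} - \mu = \pm\frac{z\sigma}{\sqrt{N}} \quad\Rightarrow\quad N = \left(\frac{z\sigma}{\bar{x} - \mu}\right)^2$$

If $\sigma$ is unknown,

$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{N}} \quad\Rightarrow\quad N = \left(\frac{ts}{\bar{x} - \mu}\right)^2$$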

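Example:
Calculate the number of replicates needed to reduce the confidence interval to 1.5 g/mL at the 95% confidence level. Given s = 2.4 g/mL.

At the 95% confidence level, t = 1.96 (the value for infinite degrees of freedom),

$$N = \left(\frac{ts}{\bar{x} - \mu}\right)^2 = \left(\frac{1.96 \times 2.4}{1.5}\right)^2 \approx 10$$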
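The replicate calculation can be sketched as a one-line function. Rounding up to the next whole number with `ceil` is an assumption made here so that the interval is at least as narrow as requested; the function name is hypothetical.

```python
from math import ceil


def replicates_needed(t, s, interval):
    """Smallest whole N for which t*s/sqrt(N) does not exceed `interval`."""
    return ceil((t * s / interval) ** 2)


# Values from the example: t = 1.96 (95%), s = 2.4, target interval 1.5.
n_needed = replicates_needed(1.96, 2.4, 1.5)
print(n_needed)
```

Because t itself depends on N - 1, a stricter treatment would repeat the calculation with the t value for the N just found; the example in the notes uses the infinite-degrees-of-freedom value throughout.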

TO DETERMINE SYSTEMATIC ERROR

Example
A standard solution gave an absorption reading of 0.470 at a particular wavelength. Ten measurements were made on a sample and the mean was 0.461, with a standard deviation of 0.003. Show whether systematic error exists in the measurements at the 95% confidence level.

Answer
At the 95% confidence level, N – 1 = 9, t = 2.26,

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.461 \pm \frac{2.26 \times 0.003}{\sqrt{10}} = 0.461 \pm 0.002$$

The calculation gives confidence limits of $0.459 < \mu < 0.463$.

•Does the true mean 0.470 lie within the interval?
•Is systematic error present?
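The check posed by the two questions above can be sketched as follows. The numbers come from the example; the function and variable names are hypothetical, and the conclusion follows the reasoning of the notes (a reference value outside the confidence limits suggests systematic error).

```python
from math import sqrt


def confidence_limits(x_bar, s, n, t):
    """Return (lower, upper) limits x_bar ± t*s/sqrt(N)."""
    half_width = t * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


low, high = confidence_limits(x_bar=0.461, s=0.003, n=10, t=2.26)
standard = 0.470  # reading of the standard solution

# If the standard lies outside the limits, systematic error is indicated.
systematic_error_indicated = not (low <= standard <= high)
print(f"{low:.3f} < mu < {high:.3f}; systematic error: {systematic_error_indicated}")
```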

TESTING A HYPOTHESIS

[Flowchart: Observations → Hypothesis → Model → Valid? If NO: reject. If YES: basis for further experiments.]

[Diagram labels: Hypothesis, Data, Theory]

•Normally, measured data are not always the same and seldom agree exactly.
•We use statistics to test whether the disagreement is significant.

NULL HYPOTHESIS

Null Hypothesis, Ho: the values of two measured quantities do not differ (significantly) UNLESS we can prove that the two values are significantly different.

“Innocent until proven guilty”

•In a null hypothesis test, the value of a parameter calculated from the equation is compared with the parameter value from the table.
•If the calculated value is smaller than the table value, the hypothesis is accepted; if it is larger, the hypothesis is rejected.

The null hypothesis can be used to:
•Compare $\bar{x}$ and $\mu$
•Compare $\bar{x}_1$ and $\bar{x}_2$ from two sets of data
•Compare $s_1$ and $s_2$, or $\sigma_1$ and $\sigma_2$
•Compare s and $\sigma$

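t TEST

Comparison between the experimental mean and the true mean ($\bar{x}$ and $\mu$): the presence of systematic error

Steps:

i) Choose the test statistic.

If $\sigma$ is not known,

$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{N}} \quad\Rightarrow\quad t = \frac{(\bar{x} - \mu)\sqrt{N}}{s}$$

If $\sigma$ is known,

$$\bar{x} - \mu = \pm\frac{z\sigma}{\sqrt{N}} \quad\Rightarrow\quad z = \frac{(\bar{x} - \mu)\sqrt{N}}{\sigma}$$

ii) Calculate t or z (tcalc) from the data, taking its absolute value.

iii) Compare tcalc and ttable.

iv) If tcalc > ttable:
Reject the Null Hypothesis (Ho), i.e. $\bar{x} \neq \mu$.
•The difference is due to systematic error.

v) If tcalc < ttable:
Accept the Null Hypothesis (Ho), i.e. $\bar{x} = \mu$.
•The difference is due to random error.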

t test

Example:
The sulphur content of a sample of kerosene was found to be 0.123%. A new method was used on the same sample and the following data were obtained:
%S: 0.112, 0.118, 0.115, 0.119

Show whether systematic error is present in the new method.

Null Hypothesis, Ho: $\bar{x} = \mu$

$\bar{x}$ = 0.116%, $\mu$ = 0.123%
s = 0.0032

$$t_{calc} = \frac{(\bar{x} - \mu)\sqrt{N}}{s} = \frac{(0.116 - 0.123)\sqrt{4}}{0.0032} = -4.38$$

so the magnitude of tcalc is 4.38.

ttable = 3.18 (95%, N - 1 = 3)

Since tcalc > ttable, Ho is rejected: the two means are significantly different, and thus systematic error is present.

Other Solution:

$$(\bar{x} - \mu)_{calculated} = 0.116 - 0.123 = -0.007 \ \text{(experimental data)}$$

$$\left(\pm\frac{ts}{\sqrt{N}}\right)_{table} = \pm\frac{3.18 \times 0.0032}{\sqrt{4}} = \pm 0.0051 \ \text{(table value)}$$

Since

$$|\bar{x} - \mu|_{calculated} > \left(\frac{ts}{\sqrt{N}}\right)_{table}$$

(i.e. 0.007 > 0.0051), Ho is rejected: the difference ($\bar{x} - \mu$) is significant and there is systematic error in the measurement.
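Both routes to the same decision in the sulphur example can be sketched in a few lines. The summary statistics (mean 0.116, s 0.0032, N 4) and the table value 3.18 are taken from the notes; the function name is hypothetical.

```python
from math import sqrt


def t_calc(x_bar, mu, s, n):
    """Magnitude of t for comparing an experimental mean with a true value."""
    return abs(x_bar - mu) * sqrt(n) / s


x_bar, mu, s, n = 0.116, 0.123, 0.0032, 4
t_table = 3.18  # 95% confidence level, N - 1 = 3

# Route 1: compare t_calc with t_table.
t = t_calc(x_bar, mu, s, n)
reject_h0 = t > t_table

# Route 2: compare |x_bar - mu| with the critical difference t_table*s/sqrt(N).
critical_difference = t_table * s / sqrt(n)
reject_h0_alt = abs(x_bar - mu) > critical_difference

print(f"t_calc = {t:.2f}, critical difference = {critical_difference:.4f}")
print(reject_h0, reject_h0_alt)
```

The two comparisons are algebraically equivalent, so they always give the same decision.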

2. Comparing two mean values, $\bar{x}_1$ and $\bar{x}_2$

Normally used to determine whether two samples are identical or not.

The difference between the means of two sets of the same analysis provides information on the similarity of the samples or the existence of random error.

Data: $\bar{x}_1$, $\bar{x}_2$ and $s_1$, $s_2$

Ho: $\bar{x}_1 = \bar{x}_2$

We want to test whether $\bar{x}_1 - \bar{x}_2 = 0$.

$$\mu_1 = \bar{x}_1 \pm \frac{ts_1}{\sqrt{N_1}}, \qquad \mu_2 = \bar{x}_2 \pm \frac{ts_2}{\sqrt{N_2}}$$

Assume $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$.

Calculate the value of t:

$$t_{calc} = \pm\frac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}}$$

Compare tcalc with ttable; if tcalc < ttable, Ho is accepted.

The pooled standard deviation, $s_p$, is calculated using:

$$s_p = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - N_s}}$$

where
N1, N2 are the numbers of data in sets 1 and 2
Ns is the number of data sets

Example:
The source of a wine (the vineyard) is identified by determining the alcohol content of different barrels.

Barrel 1: 6 determinations, $\bar{x}_1$ = 12.61%
Barrel 2: 4 determinations, $\bar{x}_2$ = 12.53%

sp = 0.07%

Do the two wines come from different sources?

Ho: $\bar{x}_1 = \bar{x}_2$

$$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{12.61 - 12.53}{0.07}\sqrt{\frac{6 \times 4}{6 + 4}} = 1.77$$

From the t table at 95% confidence, with 6 + 4 – 2 = 8 degrees of freedom, ttable is 2.31.

Since tcalc is smaller than ttable, Ho is accepted: statistically, the two wines come from the same source.
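The two-mean comparison can be sketched as below, using the barrel data from the example. The function names are hypothetical. `pooled_std` implements the pooled formula for two data sets (Ns = 2); the individual s values for the barrels are not given in the notes, so it is only exercised on a trivial hypothetical case.

```python
from math import sqrt


def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation for two data sets (Ns = 2)."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def t_two_means(x1, x2, sp, n1, n2):
    """Magnitude of t for comparing two experimental means."""
    return abs(x1 - x2) / sp * sqrt(n1 * n2 / (n1 + n2))


# Barrel data from the example; sp = 0.07% is given directly.
t = t_two_means(12.61, 12.53, 0.07, 6, 4)
t_table = 2.31  # 95% confidence, 6 + 4 - 2 = 8 degrees of freedom
same_source = t < t_table
print(f"t_calc = {t:.2f}, same source: {same_source}")

# Hypothetical check: equal s in both sets gives the same pooled value.
print(round(pooled_std(0.07, 6, 0.07, 4), 2))
```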

COMPARING THE PRECISION OF TWO MEASUREMENTS (THE F-TEST)

•Is Method A more precise than Method B?

•Is there any significant difference between both methods?

$$F = \frac{V_1}{V_2} = \frac{s_1^2}{s_2^2}$$

with degrees of freedom = N – 1 for each variance.

Ho: the precision is identical; s1 = s2

If Fcalc < Ftable, Ho is accepted.

Since the values of F (from the table) are always greater than 1, the smaller variance (the more precise method) always becomes the denominator: $V_1 > V_2$, so $F \geq 1$.

Example:
The determination of CO in a mixture of gases using the standard procedure gave an s value of 0.21 ppm. The method was modified twice, giving s1 of 0.15 (12 degrees of freedom) and s2 of 0.12 (12 degrees of freedom).

Are the two modified methods more precise than the standard?

Ho: s1 = sstd and Ho: s2 = sstd

$$F_1 = \frac{s_{std}^2}{s_1^2} = \frac{0.21^2}{0.15^2} = 1.96$$

$$F_2 = \frac{s_{std}^2}{s_2^2} = \frac{0.21^2}{0.12^2} = 3.06$$

In the standard method, $s \to \sigma$ and the degrees of freedom become infinite.
Refer to the F table: numerator = ∞ and denominator = 12, giving a critical F value of 2.30.

CONCLUSIONS
•Since F1 < Ftable, Ho: s1 = sstd is accepted: the first modification is not significantly more precise than the standard method.
•Since F2 > Ftable, Ho: s2 = sstd is rejected: the second modification is significantly more precise than the standard method.
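The F-test arithmetic in the CO example can be sketched as follows. The critical value 2.30 is the one quoted in the notes; the function name is hypothetical, and placing the larger standard deviation in the numerator follows the rule that F is never less than 1.

```python
def f_ratio(s_a, s_b):
    """Ratio of variances with the larger variance in the numerator (F >= 1)."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return (larger / smaller) ** 2


s_std, s1, s2 = 0.21, 0.15, 0.12  # ppm, from the CO example
f_table = 2.30  # critical F quoted in the notes (numerator infinity, denominator 12)

f1 = f_ratio(s_std, s1)
f2 = f_ratio(s_std, s2)

print(f"F1 = {f1:.2f}, Ho accepted: {f1 < f_table}")
print(f"F2 = {f2:.2f}, Ho accepted: {f2 < f_table}")
```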

THE DIXON TEST OR THE Q TEST

A way of detecting an outlier, i.e. a datum that statistically does not belong to the set.

Example:
Data: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10

By inspection, 10.45 seems to be outside the normal range of the data. This is easier to see when the numbers are arranged in increasing or decreasing order:

10.05, 10.05, 10.10, 10.10, 10.15, 10.45

If this datum (10.45) is eliminated, the mean will change from its original value!

$$Q_{expt} = \frac{|x_q - x_n|}{w}$$

where
xq is the questionable datum
xn is its nearest neighbour
w is the difference between the highest and the lowest value (range).

Qexpt (or Qcalc) is compared with Qcritical (or Qtable), and the null hypothesis is checked.

$$Q_{expt} = \frac{|10.45 - 10.15|}{10.45 - 10.05} = 0.75$$

Qcritical (95%, n = 6) = 0.625
Qexpt > Qcritical

The datum (10.45) can be rejected.
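The Q calculation above can be sketched as a small function. It only handles a suspect value at either end of the sorted data, which is the case the Q test addresses; the function name is hypothetical and the critical value is the one quoted for n = 6 at 95%.

```python
def q_statistic(data, suspect):
    """Q = |suspect - nearest neighbour| / range, for the lowest or highest value."""
    ordered = sorted(data)
    spread = ordered[-1] - ordered[0]  # w, the range of the data
    if suspect == ordered[-1]:
        neighbour = ordered[-2]
    elif suspect == ordered[0]:
        neighbour = ordered[1]
    else:
        raise ValueError("suspect must be the lowest or highest value")
    return abs(suspect - neighbour) / spread


data = [10.05, 10.10, 10.15, 10.05, 10.45, 10.10]
q = q_statistic(data, 10.45)
q_critical = 0.625  # 95% confidence level, n = 6
reject = q > q_critical
print(f"Q = {q:.2f}, reject 10.45: {reject}")
```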

VALUE OF Q

Number of        Confidence Level
Observations     90 %     95 %     99 %
3                0.941    0.970    0.994
4                0.765    0.829    0.926
5                0.642    0.710    0.821
6                0.560    0.625    0.740
7                0.507    0.568    0.680
8                0.468    0.526    0.634
9                0.437    0.493    0.599
10               0.412    0.466    0.568

Example:
An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.23, 56.00, 56.08

Arrange the data in order: 55.45, 56.00, 56.04, 56.08, 56.23
Suspected data: 55.45 OR 56.23
Qtable for 5 determinations at 95% = 0.710

For 56.23:

$$Q_{calc} = \frac{|56.23 - 56.08|}{56.23 - 55.45} = 0.19$$

Since Qcalc < Qtable, the datum cannot be rejected.

For 55.45:

$$Q_{calc} = \frac{|55.45 - 56.00|}{56.23 - 55.45} = 0.71$$

Since Qcalc (0.705 before rounding) does not exceed Qtable (0.710), the datum cannot be rejected.



Example:
An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.80, 56.23, 56.00, 55.30, 55.08, 54.80 and 55.80. Should any datum be rejected at the 90% confidence level?

$$Q_{expt} = \frac{|x_q - x_n|}{w}$$


