edith-curtis
DATA ANALYSIS
ERRORS IN CHEMICAL ANALYSIS
Everyday phrases used to describe the results of an analysis:
• “pretty sure”
• “very sure”
• “most likely”
• “improbable”
These are replaced by mathematical statistical tests.

Is there such a thing as “ERROR FREE ANALYSIS”?
• It is impossible to eliminate errors.
• Errors can only be minimised.
• Results can only be approximated to an acceptable precision.

How reliable are our data?
To overcome errors:
• Carry out replicate measurements.
• Analyse accurately known standards.
• Perform statistical tests on data.
Mean/Average
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
$x_i$ = individual values of $x$; $N$ = number of replicate measurements
Median
With the data arranged in ascending order: the middle value if the number of data is odd, or the average of the two middle values if the number of data is even.
Errors
Absolute error = measured value – true value
Absolute error, $d = x - \mu$
Relative error = $d / \mu$
Percent relative error = $(d / \mu) \times 100$
where $x$ is the measured value and $\mu$ is the true value.
STATISTICS
Range
The difference between the highest and lowest results.
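Standard Deviation (SD), s
A measure of the precision of a population of data.

Small sample size:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
N − 1 = degrees of freedom

Population (N approaching infinity):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
N = number of replicates

The smaller the SD, the more precise the analysis.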
Variance, V
The square of the standard deviation.
For a sample, $V = s^2$. For a population, $V = \sigma^2$.
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
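Relative Standard Deviation (RSD) and Coefficient of Variation (CV)
$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 1000 \ \text{(ppt)}$$
$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100 \ \%$$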
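The quantities defined in this section (mean, median, range, s, V, RSD and CV) can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the choice of Python and the variable names are assumptions, and the data are the five calcium results quoted in the confidence-interval example later in these notes.

```python
import statistics

# Calcium-in-rock replicates (% Ca) taken from the later worked example.
data = [14.35, 14.41, 14.40, 14.32, 14.37]

mean = statistics.mean(data)          # x-bar = sum(x_i) / N
median = statistics.median(data)      # middle value of the sorted data
data_range = max(data) - min(data)    # highest minus lowest result
s = statistics.stdev(data)            # sample SD, divides by N - 1
variance = statistics.variance(data)  # V = s^2
rsd_ppt = s / mean * 1000             # relative SD, parts per thousand
cv_percent = s / mean * 100           # coefficient of variation, %

print(f"mean={mean:.2f} median={median:.2f} range={data_range:.2f}")
print(f"s={s:.3f} V={variance:.5f} RSD={rsd_ppt:.1f} ppt CV={cv_percent:.2f} %")
```

`statistics.pstdev` and `statistics.pvariance` divide by N instead of N − 1 and correspond to the population formulas.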
PRECISION
• Relates to the reproducibility or repeatability of a result.
• How similar are values obtained in exactly the same way?
• Useful for measuring deviation from the mean: $d_i = x_i - \bar{x}$

ACCURACY
• Measurement of agreement between the experimental mean and the true value (which may not be known!).
DIFFERENCE BETWEEN ACCURACY AND PRECISION
Good precision does not guarantee accuracy.
[Figure: four panels labelled "high precision, high accuracy", "high precision, low accuracy", "low precision, low accuracy" and "low precision, high accuracy".]
TYPES OF ERROR IN EXPERIMENTAL DATA

Gross Errors
• Serious, but occur very seldom in analysis.
• Usually obvious: they give outlier readings.
• Detectable by carrying out sufficient replicate measurements.
• Experiments must be repeated.
e.g. faulty instrument, contaminated reagent.

Random (indeterminate) Error
• Data are scattered approximately symmetrically about a mean value.
• Affects precision; can only be controlled.
• Dealt with statistically.
e.g. physical and chemical variables.

Systematic (determinate) Error
• Determinable, and presumably can be either avoided or corrected.
• Several possible sources.
• Readings are all too high or all too low.
• Causes bias in the technique; can be either positive or negative.
• Affects accuracy.
SOURCES OF SYSTEMATIC ERROR

Instrument Error
Frequent calibration is needed, both for apparatus such as volumetric flasks and burettes and for electronic devices such as spectrometers.
Examples:
• Fluctuation in power supply
• Temperature changes

Method Error
Due to inadequacies in the physical or chemical behaviour of reagents or reactions (e.g. slow or incomplete reactions).
Difficult to detect, and the most serious systematic error.
Example:
• The small excess of reagent required to make an indicator undergo the colour change that signals the completion of a reaction.

Personal Error
Sources: physical handicap, prejudice, lack of competence.
Examples:
• Insensitivity to colour changes
• Tendency to estimate scale readings so as to improve precision
• Preconceived idea of the “true” value

SYSTEMATIC ERRORS
Systematic errors can be:
• Constant (e.g. an error in a burette reading; less important for larger values of the reading).
• Proportional (e.g. the presence of a given proportion of interfering impurity in the sample; equally significant for all values of the measurement).
Minimise Errors:
• Minimise instrument errors by careful recalibration and good maintenance of equipment.
• Minimise personal errors by care and self-discipline.
• Method errors are the most difficult, because the “true” value may not be known. Three approaches to minimise them:
  (a) Analysis of certified standards (SRM)
  (b) Use of 2 or more independent methods
  (c) Analysis of blanks
SIGNIFICANT FIGURES
• The minimum number of digits needed to write a value in scientific notation without a loss in accuracy.
• Zero is significant only when:
  (a) it occurs in the middle of a number
      401: 3 significant figures
      6.0015: 5 significant figures
  (b) it is the last digit to the right of the decimal point
      3.00: 3 significant figures
      $6.00 \times 10^{2}$: 3 significant figures
      0.0500: 3 significant figures
Addition-Subtraction
Use the same number of decimal places as the number with the fewest decimal places.
12.2 (1 dp) + 0.365 (3 dp) + 1.04 (2 dp) = 13.605 = 13.6 (1 dp)
Multiplication - Division
Use the same number of significant figures as the number with the fewest significant figures.
$$\frac{40.1 \times 0.1633}{204.228} = 3.21 \times 10^{-2} \quad \text{(3 sf)}$$
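The two rounding rules can be checked numerically. In the sketch below, `round_sig` is a hypothetical helper written for this illustration (it is not a standard-library function), and the numbers are the ones used in the two examples above.

```python
import math

def round_sig(value: float, sig: int) -> float:
    """Round value to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))  # power of ten of the leading digit
    return round(value, sig - 1 - exponent)

# Addition: keep the fewest decimal places among the terms (1 dp, from 12.2).
total = round(12.2 + 0.365 + 1.04, 1)

# Multiplication/division: keep the fewest significant figures (3 sf, from 40.1).
quotient = round_sig(40.1 * 0.1633 / 204.228, 3)

print(total, quotient)
```

Because floats are stored in binary, values that sit exactly on a rounding boundary may not round the way a hand calculation would; for such cases the `decimal` module is a safer choice.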
USE OF STATISTICS IN DATA EVALUATION
• Defining the interval of values around a sample mean ($\bar{x}$) within which the population mean ($\mu$) can be expected with a given probability. The limits of this interval are called confidence limits.
• Determining the number of replicates required to assure (at a desired probability) that an experimental mean ($\bar{x}$) falls within a predicted interval of values around the population mean ($\mu$).
• Estimating the probability that the experimental mean ($\bar{x}$) and the true value ($\mu$) are different, or that two experimental means are different (t test).
• Estimating the probability that data from two experiments are different in precision (F test).
• Deciding when to accept or reject outliers among replicates (Q test).
• Treating calibration data.
• Quality control.
CONFIDENCE LIMITS AND CONFIDENCE INTERVAL

Confidence Limits
The interval around the mean that probably contains $\mu$.
Extreme values of $\mu$: $a < \mu < b$

Confidence Interval
The magnitude of the confidence limits.

Confidence Level
Fixes the level of probability that the mean is within the confidence limits.
99.7%, 99%, 95%, 90%, 80%, 68%, 50%
CONFIDENCE LIMITS (CL)
• Since the exact value of the population mean, $\mu$, cannot be determined, one must use statistical theory to set limits around the measured mean, $\bar{x}$, that probably contain $\mu$.
• CL only have meaning when the measured standard deviation, $s$, is a good approximation of the population standard deviation, $\sigma$, and there is no bias in the measurement.
• CL when $\sigma$ is known (population):
$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$
N = number of measurements
Values for z at various confidence levels are found in Table 1.

Table 1. VALUES FOR z
Confidence Level, %    z
50                     0.67
68                     1.00
80                     1.28
90                     1.64
95                     1.96
95.4                   2.00
99                     2.58
99.7                   3.00
99.9                   3.29
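Examples:
At 99% confidence level, z = 2.58:
$$\mu = \bar{x} \pm \frac{2.58\,\sigma}{\sqrt{N}}$$
At 95% confidence level, z = 1.96:
$$\mu = \bar{x} \pm \frac{1.96\,\sigma}{\sqrt{N}}$$
At 90% confidence level, z = 1.64:
$$\mu = \bar{x} \pm \frac{1.64\,\sigma}{\sqrt{N}}$$
or, written as limits at the 99% level,
$$\bar{x} - \frac{2.58\,\sigma}{\sqrt{N}} < \mu < \bar{x} + \frac{2.58\,\sigma}{\sqrt{N}}$$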
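The tabulated z values can be reproduced from the standard normal distribution, and the limits follow directly from the formula. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function names and the numbers in the last line are hypothetical, chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided z for a confidence level given as a fraction, e.g. 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def z_limits(mean: float, sigma: float, n: int, confidence: float):
    """Return (lower, upper) confidence limits: mean +/- z*sigma/sqrt(N)."""
    half_width = z_value(confidence) * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_value(level):.2f}")

# Hypothetical numbers, for illustration only: mean 10.0, sigma 0.2, N = 4.
print(z_limits(10.0, 0.2, 4, 0.95))
```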
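CL For Small Data Set (N ≤ 20), $\sigma$ not known
$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
Here $z$ is replaced by $t$ and $\sigma$ by $s$.
• Values of t depend on the degrees of freedom, (N − 1), and the confidence level (from the t table).
• t is also known as ‘Student’s t’ and will be used in hypothesis tests.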
VALUES OF t AT VARIOUS CONFIDENCE LEVELS
Degrees of     Confidence Level
Freedom        90 %     95 %     99 %
1              6.31     12.70    63.66
2              2.92     4.30     9.92
3              2.35     3.18     5.84
4              2.13     2.78     4.60
5              2.02     2.57     4.03
6              1.94     2.45     3.71
7              1.90     2.36     3.50
8              1.86     2.31     3.36
9              1.83     2.26     3.25
10             1.81     2.23     3.17
11             1.80     2.20     3.11
12             1.78     2.18     3.06
13             1.77     2.16     3.01
14             1.76     2.14     2.98
15             1.75     2.13     2.95
16             1.75     2.12     2.92
17             1.74     2.11     2.90
18             1.73     2.10     2.88
19             1.73     2.09     2.86
20             1.72     2.09     2.85
infinity       1.64     1.96     2.58
Example:
Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32% and 14.37%. Calculate the confidence interval at the 95% confidence level.
$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
Mean, $\bar{x}$ = 14.37
Standard deviation, s = 0.037
From the table, at 95% confidence level, N − 1 = 4, t = 2.78.

Confidence interval,
$$\mu = 14.37 \pm \frac{2.78 \times 0.037}{\sqrt{5}} = 14.37 \pm 0.05$$
At different confidence levels:
Confidence Level    Confidence Interval
90%                 μ = 14.37% ± 0.04
95%                 μ = 14.37% ± 0.05
99%                 μ = 14.37% ± 0.08

Summary:
If the confidence level is increased, the confidence interval also widens, and the probability that $\mu$ lies within the interval increases.
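The calcium example can be reproduced in a few lines. This is a sketch under stated assumptions: the t values for 4 degrees of freedom are copied from the t table above, the variable and function names are illustrative, and s is carried unrounded, so a half-width may differ in the last digit from a hand calculation that uses s = 0.037.

```python
import math
import statistics

# t values for N - 1 = 4 degrees of freedom, copied from the t table.
T_DF4 = {90: 2.13, 95: 2.78, 99: 4.60}

data = [14.35, 14.41, 14.40, 14.32, 14.37]  # % Ca
mean = statistics.mean(data)
s = statistics.stdev(data)                   # sample SD (N - 1 in the denominator)
n = len(data)

def half_width(t: float) -> float:
    """Half-width of the confidence interval, t*s/sqrt(N)."""
    return t * s / math.sqrt(n)

for level, t in T_DF4.items():
    print(f"{level}%: mu = {mean:.2f} +/- {half_width(t):.3f}")
```

The half-width grows with the confidence level, which is the behaviour stated in the summary.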
Example:
Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32%, 14.45%, 14.50%, 14.25% and 14.37%. Calculate the confidence interval.
OTHER USES OF THE CONFIDENCE INTERVAL
• To determine the number of replicates needed for the mean to be within the confidence interval.
• To determine systematic error.
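TO DETERMINE NUMBER OF REPLICATES

If $\sigma$ is known ($s \to \sigma$),
$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}} \quad\Rightarrow\quad N = \left(\frac{z\sigma}{\bar{x} - \mu}\right)^2$$

If $\sigma$ is unknown,
$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{N}} \quad\Rightarrow\quad N = \left(\frac{ts}{\bar{x} - \mu}\right)^2$$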
Example:
Calculate the number of replicates needed to reduce the confidence interval to 1.5 g/mL at the 95% confidence level. Given s = 2.4 g/mL.

At 95% confidence level, t = 1.96,
$$N = \left(\frac{ts}{\bar{x} - \mu}\right)^2 = \left(\frac{1.96 \times 2.4}{1.5}\right)^2 \approx 10$$
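The same calculation as a short sketch; `replicates_needed` is a hypothetical helper name. It rounds up, because N must be a whole number of measurements.

```python
import math

def replicates_needed(t: float, s: float, half_width: float) -> int:
    """Smallest whole N for which t*s/sqrt(N) does not exceed half_width."""
    return math.ceil((t * s / half_width) ** 2)

n = replicates_needed(t=1.96, s=2.4, half_width=1.5)
print(n)
```

The example uses t = 1.96, the value for infinite degrees of freedom. Strictly, t depends on N − 1, so for small N one would look up t again for the N obtained and repeat the calculation until N stops changing.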
TO DETERMINE SYSTEMATIC ERROR
Example
A standard solution gave an absorption reading of 0.470 at a particular wavelength. Ten measurements were made on a sample and the mean was 0.461, with a standard deviation of 0.003. Show whether systematic error exists in the measurements at the 95% confidence level.

Answer
At 95% confidence level, N – 1 = 9, t = 2.26,
$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.461 \pm \frac{2.26 \times 0.003}{\sqrt{10}} = 0.461 \pm 0.002$$
The calculation gives confidence limits of $0.459 < \mu < 0.463$.
• Does the true mean, 0.470, lie within the interval?
• Is systematic error present?
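The two questions can be answered by checking whether the true value lies inside the confidence limits. The sketch below uses the numbers of this example; the function and variable names are illustrative assumptions.

```python
import math

def t_limits(mean: float, s: float, n: int, t: float):
    """Return (lower, upper) confidence limits: mean +/- t*s/sqrt(N)."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

lower, upper = t_limits(mean=0.461, s=0.003, n=10, t=2.26)
true_value = 0.470
inside = lower <= true_value <= upper

print(f"{lower:.3f} < mu < {upper:.3f}; true value inside: {inside}")
```

Here 0.470 falls outside the limits, which indicates that a systematic error is present at the 95% confidence level.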
TESTING A HYPOTHESIS
[Flow chart: Observations; Hypothesis; Model; Valid? NO: Reject. YES: Basis for further experiments.]
[Diagram labels: Hypothesis, Data, Theory]
• Measured data are seldom identical and rarely agree exactly.
• We use statistics to test the disagreement.
NULL HYPOTHESIS
Null Hypothesis, Ho: the values of two measured quantities do not differ (significantly) UNLESS we can prove that the two values are significantly different.
“Innocent until proven guilty”
• In a null hypothesis test, the value of a parameter calculated from the equation is compared with the parameter value from the table.
• If the calculated value is smaller than the table value, the hypothesis is accepted, and vice versa.

The null hypothesis can be used to:
• Compare $\bar{x}$ and $\mu$
• Compare $\bar{x}_1$ and $\bar{x}_2$ from two sets of data
• Compare $s_1$ and $s_2$, or $\sigma_1$ and $\sigma_2$
• Compare $s$ and $\sigma$
t TEST
Comparison between experimental mean and true mean ($\bar{x}$ and $\mu$): the presence of systematic error

Steps:
i) If $\sigma$ is not known,
$$\bar{x} - \mu = \pm\frac{ts}{\sqrt{N}}, \qquad t = \frac{(\bar{x} - \mu)\sqrt{N}}{s}$$
If $\sigma$ is known,
$$z = \frac{(\bar{x} - \mu)\sqrt{N}}{\sigma}$$
ii) Calculate t or z (tcalc) from the data.
iii) Compare tcalc and ttable.
iv) If tcalc > ttable, reject the null hypothesis (Ho), i.e. $\bar{x} \neq \mu$.
• The difference is due to systematic error.
v) If tcalc < ttable, accept the null hypothesis (Ho), i.e. $\bar{x} = \mu$.
• The difference is due to random error.
t test
Example:
The sulphur content of a sample of kerosene was found to be 0.123%. A new method was used on the same sample and the following data were obtained:
%S: 0.112, 0.118, 0.115, 0.119
Show whether systematic error is present in the new method.
Null hypothesis, Ho: $\bar{x} = \mu$
$\bar{x}$ = 0.116%, $\mu$ = 0.123%
s = 0.0032
$$t_{calc} = \frac{|\bar{x} - \mu|\sqrt{N}}{s} = \frac{|0.116 - 0.123|\sqrt{4}}{0.0032} = 4.38$$
ttable = 3.18 (95%, N − 1 = 3)
Since tcalc > ttable, Ho is rejected: the two means are significantly different and thus systematic error is present.

Other Solution:
$$|\bar{x} - \mu|_{calculated} = |0.116 - 0.123| = 0.007 \quad \text{(experimental data)}$$
$$|\bar{x} - \mu|_{table} = \frac{ts}{\sqrt{N}} = \frac{3.18 \times 0.0032}{\sqrt{4}} = 0.0051 \quad \text{(table value)}$$
Since $|\bar{x} - \mu|_{calculated} > |\bar{x} - \mu|_{table}$ (i.e. 0.007 > 0.0051), Ho is rejected, the difference ($\bar{x} - \mu$) is significant and there is systematic error in the measurement.
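The comparison of an experimental mean with a true value can be sketched as follows, using the summary values quoted in this example (mean 0.116, true value 0.123, s = 0.0032, N = 4) and the tabulated t of 3.18. The function name is an assumption made for the illustration.

```python
import math

def t_calc(mean: float, mu: float, s: float, n: int) -> float:
    """t = |x-bar - mu| * sqrt(N) / s for a mean compared with a true value."""
    return abs(mean - mu) * math.sqrt(n) / s

t = t_calc(mean=0.116, mu=0.123, s=0.0032, n=4)
T_TABLE = 3.18            # 95 %, N - 1 = 3 degrees of freedom, from the t table
reject_h0 = t > T_TABLE   # True means the difference is significant

print(f"t_calc = {t:.3f}, reject Ho: {reject_h0}")
```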
2. Comparing two mean values, $\bar{x}_1$ and $\bar{x}_2$
Normally used to determine whether two samples are identical or not.
The difference between the means of two sets of the same analysis provides information on the similarity of the samples or the existence of random error.
Data: $\bar{x}_1$, $\bar{x}_2$ and $s_1$, $s_2$
Ho: $\bar{x}_1 = \bar{x}_2$
We want to test whether $\bar{x}_1 - \bar{x}_2 = 0$.
$$\mu_1 = \bar{x}_1 \pm \frac{ts_1}{\sqrt{N_1}}, \qquad \mu_2 = \bar{x}_2 \pm \frac{ts_2}{\sqrt{N_2}}$$
Assume $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$.
Calculate the value of t:
$$t_{calc} = \pm\frac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}}$$
Compare tcalc with ttable; if tcalc < ttable, Ho is accepted.
The pooled standard deviation, $s_p$, is calculated using:
$$s_p = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - N_s}}$$
where $N_1$, $N_2$ are the numbers of data in sets 1 and 2, and $N_s$ is the number of data sets.
Example:
The source of a wine (the vineyard) is identified by determining the alcohol content of different barrels.
Barrel 1: 6 determinations, $\bar{x}_1$ = 12.61%
Barrel 2: 4 determinations, $\bar{x}_2$ = 12.53%
sp = 0.07%
Test whether the two wines come from different sources.

Ho: $\bar{x}_1 = \bar{x}_2$
$$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{12.61 - 12.53}{0.07}\sqrt{\frac{6 \times 4}{6 + 4}} = 1.77$$
From the t table at 95% confidence, with degrees of freedom 6 + 4 – 2 = 8, ttable is 2.31.
Since tcalc is smaller than ttable, Ho is accepted and the two wines are statistically from the same source.
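A sketch of the two-mean comparison with a pooled standard deviation, using the wine figures from this example. The function names are hypothetical; `pooled_sd` takes (N, s) pairs and implements the pooled formula with the number of data sets subtracted in the denominator.

```python
import math

def pooled_sd(groups) -> float:
    """groups: list of (N, s) pairs, one per data set; returns s_p."""
    numerator = sum((n - 1) * s ** 2 for n, s in groups)
    dof = sum(n for n, _ in groups) - len(groups)  # N1 + N2 - Ns
    return math.sqrt(numerator / dof)

def t_two_means(mean1: float, mean2: float, n1: int, n2: int, sp: float) -> float:
    """t = |x1 - x2| / s_p * sqrt(N1*N2 / (N1 + N2))."""
    return abs(mean1 - mean2) / sp * math.sqrt(n1 * n2 / (n1 + n2))

t = t_two_means(12.61, 12.53, n1=6, n2=4, sp=0.07)
T_TABLE = 2.31               # 95 %, 6 + 4 - 2 = 8 degrees of freedom
same_source = t < T_TABLE    # True means Ho is accepted

print(f"t_calc = {t:.2f}, Ho accepted: {same_source}")
```

In the example s_p is given directly; `pooled_sd` would be used when only the individual standard deviations of the two barrels are known.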
COMPARING THE PRECISION OF TWO MEASUREMENTS (THE F-TEST)
• Is Method A more precise than Method B?
• Is there any significant difference between the two methods?
$$F = \frac{s_1^2}{s_2^2} = \frac{V_1}{V_2}$$
with degrees of freedom = N – 1.
Ho: the precision is identical; $s_1 = s_2$
Then, if Fcalc < Ftable, Ho is accepted.
Since the values of F (from the table) are always greater than 1, the smaller variance (the more precise) always becomes the denominator: $V_1 > V_2$, so $F \geq 1$.
Example:
The determination of CO in a mixture of gases using the standard procedure gave an s value of 0.21 ppm. The method was modified twice, giving s1 of 0.15 (12 degrees of freedom) and s2 of 0.12 (12 degrees of freedom).
Are the two modified methods more precise than the standard?

Ho: $s_1 = s_{std}$ and Ho: $s_2 = s_{std}$
$$F_1 = \frac{s_{std}^2}{s_1^2} = \frac{0.21^2}{0.15^2} = 1.96 \qquad F_2 = \frac{s_{std}^2}{s_2^2} = \frac{0.21^2}{0.12^2} = 3.06$$
In the standard method, $s \to \sigma$ and the degrees of freedom become infinite.
Refer to the F table: numerator = ∞ and denominator = 12, giving a critical F value of 2.30.

CONCLUSIONS
• Ho: $s_1 = s_{std}$. Since F1 < Ftable, Ho is accepted.
• Ho: $s_2 = s_{std}$. Since F2 > Ftable, Ho is rejected.
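The F comparison for the CO example can be sketched as below. The critical value 2.30 is the one quoted from the F table in the example; the function name and loop labels are illustrative assumptions.

```python
def f_ratio(s_a: float, s_b: float) -> float:
    """F = larger variance / smaller variance, so that F >= 1."""
    v_a, v_b = s_a ** 2, s_b ** 2
    return max(v_a, v_b) / min(v_a, v_b)

F_TABLE = 2.30   # numerator df = infinity, denominator df = 12 (from the example)
s_std = 0.21     # standard procedure, ppm

for label, s_mod in (("modification 1", 0.15), ("modification 2", 0.12)):
    f = f_ratio(s_std, s_mod)
    verdict = "Ho rejected" if f > F_TABLE else "Ho accepted"
    print(f"{label}: F = {f:.2f}, {verdict}")
```

Only the second modification gives an F above the critical value, so only it is significantly more precise than the standard procedure.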
THE DIXON TEST OR THE Q TEST
A way of detecting an outlier: a datum that statistically does not belong to the set.

Example:
Data: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10
By inspection, 10.45 seems to lie outside the normal range of the data. This is easier to see when the numbers are arranged in increasing or decreasing order:
10.05, 10.05, 10.10, 10.10, 10.15, 10.45
If this datum (10.45) is eliminated, the mean will change from its original value!
$$Q_{expt} = \frac{|x_q - x_n|}{w}$$
where
$x_q$ is the questionable datum
$x_n$ is its nearest neighbour
$w$ is the difference between the highest and the lowest value (range).
Qexpt (or Qcalc) is compared with Qcritical (or Qtable), and the null hypothesis is checked.
$$Q_{expt} = \frac{10.45 - 10.15}{10.45 - 10.05} = 0.75$$
Qcritical (95%, n = 6) = 0.625
Qexpt > Qcritical
The datum (10.45) can be rejected.
VALUES OF Q
Number of        Confidence Level
Observations     90 %     95 %     99 %
3                0.941    0.970    0.994
4                0.765    0.829    0.926
5                0.642    0.710    0.821
6                0.560    0.625    0.740
7                0.507    0.568    0.680
8                0.468    0.526    0.634
9                0.437    0.493    0.599
10               0.412    0.466    0.568
Example:
An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.23, 56.00, 56.08
Arrange the data in order: 55.45, 56.00, 56.04, 56.08, 56.23
Suspected data: 55.45 OR 56.23
Qtable for 5 determinations, 95% = 0.710

For 56.23:
$$Q_{calc} = \frac{56.23 - 56.08}{56.23 - 55.45} = 0.19$$
Since Qcalc < Qtable, the datum cannot be rejected.

For 55.45:
$$Q_{calc} = \frac{56.00 - 55.45}{56.23 - 55.45} = 0.71$$
Since Qcalc = Qtable (it does not exceed Qtable), the datum cannot be rejected.
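The Q test is easy to automate. In the sketch below `q_expt` is a hypothetical helper that tests either the highest or the lowest value of a data set; the critical values are copied from the Q table at 95%, and the calcite values are those of the sorted list in the example.

```python
def q_expt(data, suspect: str = "high") -> float:
    """Q = gap between the suspect value and its nearest neighbour / range."""
    ordered = sorted(data)
    w = ordered[-1] - ordered[0]           # range; assumes not all values are equal
    if suspect == "high":
        gap = ordered[-1] - ordered[-2]    # highest value vs its neighbour
    else:
        gap = ordered[1] - ordered[0]      # lowest value vs its neighbour
    return gap / w

Q_TABLE_95 = {5: 0.710, 6: 0.625}          # from the Q table, 95 % level

first = [10.05, 10.10, 10.15, 10.05, 10.45, 10.10]
calcite = [55.45, 56.04, 56.23, 56.00, 56.08]

for name, data, suspect in (("first set", first, "high"),
                            ("calcite", calcite, "high"),
                            ("calcite", calcite, "low")):
    q = q_expt(data, suspect)
    reject = q > Q_TABLE_95[len(data)]
    print(f"{name} ({suspect}): Q = {q:.3f}, reject: {reject}")
```

Carrying three decimal places shows that the low calcite value gives a Q just below the critical 0.710, which is why it is retained.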
Example:
An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.80, 56.23, 56.00, 55.30, 55.08, 54.80 and 55.80. Should any data be rejected at the 90% confidence level?