Van Loco 2002

Received: 15 January 2002Accepted: 18 April 2002

Abstract The correlation coefficientis commonly used to evaluate the de-gree of linear association betweentwo variables. However, it can beshown that a correlation coefficientvery close to one might also be ob-tained for a clear curved relation-ship. Other statistical tests, like theLack-of-fit and Mandels fitting testthus appear more suitable for thevalidation of the linear calibrationmodel. A number of cadmium cali-bration curves from atomic absorp-tion spectroscopy were assessed fortheir linearity. All the investigatedcalibration curves were characterizedby a high correlation coefficient (r >0.997) and low quality coeffi-cient (QC

Assessing the linearity of calibration curves

Graphite furnace atomic absorption spectroscopy (GF-AAS) is known to have a limited linear calibration range.In order to assess the linearity of the calibration process,several calibration lines for cadmium were constructedover a period of 4 months. These calibrations were per-formed using standard solutions prepared from the corre-sponding high purity metal Baker Cd Atomic AbsorptionStandard of 1000 g/ml (National Institute for Standardsand Technology NIST traceable). The GF-AAS wasprogrammed to produce a calibration curve with the fol-lowing concentrations: 0, 0.8, 1.6, 2.4, 3.2 and 4.0 ng/ml.The solutions were injected in duplicate.

The linearity of the calibration process was investi-gated by means of the Lack-of-fit test [2], Mandels fit-ting test value [5], the quality coefficient (QC) [2,4] andr [3,2]. The results are summarized in Table 1.

The Lack-of-fit test and Mandels fitting test are com-monly used to ascertain whether the chosen regressionmodel adequately fits the data. The test values for thesetwo statistical tests follows an F-distribution and the sig-nificance of the test values can be calculated. On thecontrary, the QC and r are used to arbitrary accept or re-ject the LRM. The equations of the QC and r for LRMsare given below:

(1)

(2)with Yi the measured response and Yi the response pre-dicted by the model.

The results in Table 1 shows that for the Lack-of-fittest the LRM must systematically be rejected at the 95%confidence level (Fcrit,95% = 4.53), and for Mandels fit-ting test even rejected at the 99% confidence level(Fcrit,99% = 10.56). Thus, despite the fact that r and QCare greater than 0.997 and lower than 5%, respectively,the linearity of the calibration lines were rejected on thebasis of the before mentioned F-tests. This corroboratesthe statements of the Analytical Methods Committee [3]that r should be used with care when evaluating the lin-earity of calibration lines. Moreover, questions arise re-garding the significance of QC, for which an upper limitof 5% was proposed in assessing the suitability of a cali-bration process [4]. Here, even with a QC-value less than3%, the LRM is rejected at the 95% confidence level(Table 1).

Alternatively, the residual plots give useful informa-tion to validate the chosen regression model. The residu-al plot can be used to check whether the underlying as-sumptions, like normality of the residuals and homosce-dasticity, are met as for evaluating the goodness of fit ofthe regression model [2]. Figure 1a shows a residual plotfor an LRM. The U-shaped residual plot indicates that acurvilinear regression model should be preferred over anLRM.

Several authors [2, 6, 811] recommended alternativecalibration functions when linearity of the calibrationcurve has to be rejected. In order to correct the non-lin-earity, a quadratic curvilinear function (f(x) = a + bx+cx2) was chosen. The Lack-of-fit tests for the quadraticregression model (QRM) are summarized in Table 1. Thetest for Lack-of-fit reveals that this QRM adequately fitsthe calibration data at 99% (highly significant) confi-dence level and at the 95% (significant) confidence levelin all cases except one. In determining whether the orderof the polynomial regression model is appropriate, the

282

rX X Y Y

X X Y Yi i

i i

=

( )( )( ) ( )2 2

Table 1 The F-value of theLack-of-fit (LOF) test (Fcrit,95% = 4.53) and Mandelsfitting test (Fcrit,95%= 5.12) arecompared with the quality co-efficient and the correlation co-efficient for several linear cali-bration lines of Cd. For thequadratic regression model, theF-value of the Lack-of-fit test(Fcrit,95%=4.76) and the P-valuefor testing significance of thesecond order coefficient for thequadratic regression model arerepresented. The significantvalues at the 95% confidencelevel are underlined

Linear regression model Quadratic regression modelLOF Mandels QC (%) r LOF P-value on

test value second-ordercoefficient

11.08 51.46 3.93 0.9982 0.63 0.000019.42 56.84 4.23 0.9978 1.58 0.0000

7.13 26.29 3.67 0.9985 0.94 0.00066.99 37.73 3.79 0.9984 0.18 0.0002

11.43 58.21 4.03 0.9981 0.31 0.000029.91 53.02 3.53 0.9986 4.08 0.000049.80 71.07 3.76 0.9984 5.69 0.000023.77 73.86 3.19 0.9989 1.66 0.000031.95 63.37 3.24 0.9988 3.55 0.0000

7.49 33.50 2.92 0.9991 0.54 0.00039.99 55.19 3.95 0.9983 0.15 0.0000

10.71 28.65 4.70 0.9975 1.89 0.000525.21 79.60 3.34 0.9987 1.62 0.000013.16 35.74 3.37 0.9987 1.93 0.0002

QCY Y

Yn

i i

(%)

=

100 1

2

283

significance of the second order coefficient is estimated.The P-value on the second order coefficient, shown inTable 1, is systematically smaller than 1%. Consequent-ly, a lower order model should not be considered. In ad-dition, residual plots (Fig. 1b) were constructed for thisQRM. The residuals were randomly scattered within ahorizontal band around the centre line. Therefore, theQRM was chosen as the reference model. It is noted thatan increase of the variance is observed at higher concen-trations.

Predictions made on the basis of the fitted curvefor linear (LRM) and quadratic (QRM) regressionmodels

To gauge the agreement/disagreement between predictedconcentrations calculated from the LRM and the QRM, amid-scale calibration standard (2 ng/ml) was systemati-cally injected in duplicate. The instrument signal corre-sponds to a point close to the centroid of the data cloud,where the confidence limits for the regression line ofLRM is the narrowest.

To compare the outcome of both regression models,the predicted concentration of the mid-scale standardwas expressed both as a recovery rate and as a relativedeviation. Hence, the following equations were used:

(3)Relative deviation (%) = 100-recovery (%) (4)In order to investigate possible effects of time, severalcalibration lines were produced over a period of almost 4months. The mid-scale standard was determined twice atthe beginning and at the end of an analysis. The recoveryrate and relative deviation of the results are summarizedin Table 2.

If both curves yield equivalent results and are not bi-ased, the recovery rate should be around 100%. Figure 2

Recovery (%) = determined concentrationnominal concentration 100%

Fig. 1 Plots of residuals for (a)the linear regression model(LRM) and (b) the quadraticregression model (QRM) ver-sus predicted values

Fig. 2 Recovery results for the mid-scale standard calculated witha second-order calibration curve (QRM) and a linear calibrationcurve (LRM)

shows the recovery rates for the mid-scale standard cal-culated with the LRM and the QRM. It clearly appearsthat the median from the LRM differs from the theoreti-cal value of 100%. In general, the recovery rates areoverestimated when calculated with the LRM. A system-atic error of about 4% was found.

This result is supported by the Wilcoxon one-sampletest [12]. The null hypothesis (median = 100%) is reject-ed for the results determined with the LRM (P = 0.0004),but not for those derived from the QRM (P = 0.1788).The one-sample student t-test gave similar outcomes(LRM: P = 8.3.108, QRM: P = 0.177).

Furthermore, to gauge whether the results calculatedwith the QRM were more accurate than those obtainedfrom the LRM the relative deviation of the data werecompared. The non-parametric paired sign test waschosen for this evaluation because the relative deviationswere not symmetrically distributed. The sign test is analternative to the paired t-test, when the underlying as-sumptions of the student t-test are violated. The signtest makes use of positive and negative signs depending

on the difference between the values in conditions 1 and2. The null-hypothesis is rejected if the number of posi-tive and negative signs is statistically different. The null-hypothesis stating equivalent results from both modelsmust be rejected in this case (n- = 14, n+ = 2, P 0.997. In addition, the present resultsraise the question about the relevance of the QC in as-sessing the process calibration. Other statistical tests like

the Lack-of-fit and Mandels fitting test seems more ap-propriate for evaluating the linearity of the calibrationcurve during method validation. Preferably, the Lack-of-fit and Mandels fitting test should be used in conjunc-tion with an evaluation of the residual plot.

Furthermore, it is shown that a straight-line modelwith r >0.997 and QC

5. Mandel J (1964) The statistical analy-sis of experimental data. Wiley, NewYork

6. Ratkowsky DA (1990) Handbook ofnonlinear regression models. MarcelDekker, New York

7. ISO 118432: 2000 (2000) Capabilityof detection Part 2: Methodology inthe linear calibration case. InternationalOrganization for Standardization(ISO), Geneva

8. ISO 84662: 2001 (2001) Water quali-ty Calibration and evaluation of ana-lytical methods and estimation of per-formance characteristics, Part 2: Cali-bration strategy for nonlinear second-order calibration functions. ISO, Geneva

9. MacTaggart DL, Farwell SO (1992) J AOAC Int 75: 594608

10. Wang X, Smeyers-Verbeke J, MassartDL (1992) Analusis 20: 209215

11. Funk W, Dammann V, Donnevert G(1995) Quality assurance in analyticalchemistry. VCH, Weinheim

12. Sheskin DJ (2000) Handbook of para-metric and non-parametric statisticalprocedures, 2nd edn. Chapman andHall/CRC Press, Boca Raton

13. Levene H (1960) In: Olkin I. et al.(eds) Contributions to probability andstatistics: Essays in honor of HaroldHotelling. Stanford University Press,Stanford, pp 278292

285

Documents

Van Loco 2002