
PanelComposer: A Web-Based Panel Construction Tool for Multivariate Analysis of Disease Biomarker Candidates


Seul-Ki Jeong,*,† Keun Na,‡ Kwang-Youl Kim,§ Hoguen Kim,⊥ and Young-Ki Paik*,†,§

†Yonsei Proteome Research Center and Biomedical Proteome Research Center, ‡Graduate Program in Functional Genomics, §Department of Biochemistry, Department of the Integrated Omics for Biomedical Science (World Class University Graduate Program), and ⊥Department of Pathology, College of Medicine, Yonsei University, Seoul 120-749, Korea

*S Supporting Information

ABSTRACT: Measuring and evaluating diagnostic efficiency is important in biomarker discovery and validation. The receiver operating characteristic (ROC) curve is a graphical plot for assessing the performance of a classifier or predictor that can be used to test the sensitivity and specificity of diagnostic biomarkers. In this study, we describe PanelComposer, a Web-based software tool that uses statistical results from proteomic expression data, validates biomarker candidates based on ROC curves and area under the ROC curve (AUC) values using a logistic regression model, and provides an ordered list that includes ROC graphs and AUC values for proteins (individually or in combination). This tool allows users to easily compare and assess the effectiveness and diagnostic efficiency of single or multiprotein biomarker candidates. PanelComposer is available publicly at http://panelcomposer.proteomix.org/ and is compatible with major Web browsers.

KEYWORDS: biomarker evaluation, protein expression, multivariate analysis, panel, bioinformatics

■ INTRODUCTION

A disease biomarker is defined as a molecular signature that reflects different stages of disease before or after treatment. Biomarkers are often used to monitor prognosis during treatment.1 The goal of biomarker discovery is to develop non-invasive tests that permit early disease detection, patient classification, and disease progression or recurrence monitoring.2 To better identify biomarkers, the efficacy of a differentially expressed protein in patients with a specific disease compared to that in healthy individuals should be thoroughly evaluated. In addition, the diagnostic performance of the differentially expressed proteins should be compared to that of other known marker proteins.3

The receiver operating characteristic (ROC) curve is generally used as a statistical method to assess the performance of a binary classifier that can distinguish two categories, such as disease versus nondisease or moderate versus severe disease states.4 The ROC curve is created by plotting the sensitivity of a test against 1 − specificity (the false-positive rate) and can illustrate the efficiency of a test at various cutoff points. ROC curves have also been shown to be useful for determining the optimal threshold for a given test. For classifiers or biomarkers, the threshold can be defined as a criterion to distinguish one group from another. The area under the ROC curve (AUC) is the most widely used measure of biomarker performance. The AUC allows assessment of the performance of each test by means of a single measurement and, therefore, is often used to compare the performance of two or more tests.5,6
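The construction of a ROC curve and its AUC described above can be illustrated with a minimal Python sketch. It is not the PanelComposer implementation (which the paper describes as Java-based); the function names and the toy data are hypothetical, and the sketch assumes that a higher marker value indicates the positive group.

```python
def roc_points(scores, labels):
    """Return ROC points as (false-positive rate, sensitivity), one per cutoff.

    scores: marker values; higher is assumed to mean "more likely positive".
    labels: 1 for the positive group, 0 for the negative group.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort by descending score so that lowering the cutoff adds samples in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Consume all samples tied at this cutoff before emitting a point.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical values, not taken from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = auc_trapezoid(points)  # 0.8125 for this toy data
```

For this toy data, 13 of the 16 positive/negative pairs are ranked correctly, which matches the trapezoidal area of 0.8125; an AUC of 0.5 corresponds to a marker with no discriminating ability.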

Single component (protein) biomarkers have been widely used for disease diagnosis (e.g., α-fetoprotein for hepatocellular carcinoma (HCC) and prostate-specific antigen for prostate cancer); however, some limitations exist in their performance and applicable range in providing accurate assessments of disease states. The combination of several proteins into a biomarker panel has been proposed as a means to overcome these limitations, as this strategy may increase the diagnostic efficiency compared to that of a single protein.7−10

Received: May 12, 2012
Published: November 9, 2012

Technical Note

pubs.acs.org/jpr

© 2012 American Chemical Society | dx.doi.org/10.1021/pr3004387 | J. Proteome Res. 2012, 11, 6277−6281

Several different methods have been reported for determining biomarker panel composition, such as logistic regression,4,11,12 random forest,13−15 and machine learning algorithms.16,17 To combine multiple proteins into a single panel, we employed a logistic regression model because it is easy to interpret in its linear form.18,19 Unlike other methods, logistic regression also does not require the data set to be normally distributed or linearly related, nor does it require equal variance within each group.18 Further, logistic regression models can be easily updated when new or additional data sets become available.19 Logistic regression can model numerical or categorical values into a single variable ranging from 0 to 1 to evaluate the performance of a panel of several candidate proteins by using the ROC methodology.9,10,20
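How a logistic regression model collapses several marker values into one score between 0 and 1 can be sketched as follows. This is an illustration only: it fits the model by plain gradient descent, whereas R's glm uses iteratively reweighted least squares, and the function names, learning rate, epoch count, and two-marker toy data are all assumptions rather than details from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss.

    Teaching sketch only; it is not the fitting procedure used by R's glm.
    """
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def panel_score(weights, x):
    """Combine the marker values of one sample into a single score in (0, 1)."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical expression values for two markers; first four samples are
# negatives (label 0), last four are positives (label 1).
rows = [(1.0, 0.5), (1.2, 0.7), (0.9, 1.4), (1.1, 1.8),
        (2.0, 1.5), (2.2, 1.1), (1.8, 2.0), (2.1, 0.6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights = fit_logistic(rows, labels)
scores = [panel_score(weights, x) for x in rows]
```

The resulting per-sample scores can then be treated like the values of a single marker and passed to a ROC analysis, which is the sense in which a panel is evaluated "by using the ROC methodology".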

In our previous work, we experienced many difficulties in statistical analyses related to biomarker panel construction.21,22 Several ROC software packages are available; however, each is either not freeware,23−25 not easy to use,25 or not capable of analyzing panels.25,26 In this paper, we present PanelComposer, a novel Web-based tool for composing biomarker panels. We demonstrate its usefulness for evaluating candidate biomarkers that can distinguish HCC from other liver-related diseases as a sample case. PanelComposer both utilizes statistical results from proteomic expression data and validates biomarker candidates based on ROC curves and AUC values by using a logistic regression model.

■ METHODS

We analyzed a sample data set containing expression data obtained by multiple reaction monitoring (MRM) for vitamin D binding protein (VDBP), ceruloplasmin (CP), and apolipoprotein A-1 (ApoA1). We used human plasma isolated from healthy controls and patients with various disease conditions (liver diseases, HCC, and other cancers; see Supplementary Table 1 for more details). MRM analysis was performed using Acquity UPLC (Waters, Milford, MA) and 4000 QTRAP LC-MSMS system (AB-SCIEX, Framingham, MA) following the MIDAS data acquisition workflow.27 Quantitative analysis of MRM data was performed using MultiQuant (AB-SCIEX, version 1.1). PanelComposer was implemented on a three-tiered architecture (i.e., Web interface, logistic regression, and ROC analyzer). After the logistic regression model was used to combine multiple biomarkers into one panel, a stepwise selection method to automatically find an optimal biomarker panel was implemented using the “glm” function of the R software environment (version 2.9.2). A Java-based (version 1.6.0) program was written to plot ROC curves, calculate the respective AUC values, perform cross-validation (CV), and obtain other statistical results. Details on these methods related to data acquisition, software implementation, and data analysis and the corresponding data sets are described in Supporting Information.
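The stepwise selection mentioned above is not specified in detail in the main text, so the following Python sketch shows only the general idea of greedy forward selection. The selection criterion is passed in as a hypothetical callable (higher is better); R's step function, by comparison, minimizes AIC by default. The marker names are those of the sample data set, but the criterion values are invented for illustration.

```python
def forward_stepwise(candidates, criterion):
    """Greedy forward selection of a marker panel.

    candidates: list of marker names.
    criterion: hypothetical callable mapping a tuple of names to a score,
               higher is better (e.g., a penalized or cross-validated fit).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-marker extension of the current panel.
        trials = [(criterion(tuple(selected + [name])), name) for name in remaining]
        score, name = max(trials)
        if score <= best_score:
            break  # no extension improves the criterion, so stop
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected, best_score


# Invented criterion values (not results from the paper).
toy_scores = {
    frozenset(["ApoA1"]): 0.67,
    frozenset(["VDBP"]): 0.63,
    frozenset(["CP"]): 0.61,
    frozenset(["ApoA1", "VDBP"]): 0.79,
    frozenset(["ApoA1", "CP"]): 0.70,
    frozenset(["VDBP", "CP"]): 0.66,
    frozenset(["ApoA1", "VDBP", "CP"]): 0.78,
}
panel, panel_score_value = forward_stepwise(
    ["ApoA1", "VDBP", "CP"], lambda names: toy_scores[frozenset(names)]
)
```

With these invented values the search adds ApoA1 first, then VDBP, and stops because adding CP does not improve the criterion. A greedy search of this kind examines far fewer panels than an exhaustive one, at the cost of possibly missing the best combination.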

■ RESULTS AND DISCUSSION

Data Analysis Flow and User Interface

PanelComposer performs several steps when analyzing expression data to assess the effectiveness of a biomarker candidate (Figure 1). First, expression data and supplementary information including gene or protein names and diagnostic categories are submitted to the analysis. Expression data must be input in a comma-separated values (CSV) file format satisfying some requirements (see Supplementary Figure 1). Next, the user designates positive and negative categories, according to the disease state. Examples of positive and negative groups are disease versus nondisease, cancerous versus normal samples, or moderate versus severe disease onset. More than one category can be selected as positive or negative (e.g., normal, hepatitis, and cirrhosis can all be selected as negative and HCC chosen as positive). Third, for the internal validation, the user can select one of three CV methods. ROC graphs are plotted on the basis of the expression data and selected categories, and the AUC value and p value are calculated to enable evaluation of each biomarker candidate. The p values from Mann−Whitney U test were used to test whether protein expression levels were significantly different in two categories (positive vs negative). After testing the performance of each protein, if more than one protein is evaluated (i.e., a panel), ROC analysis can be performed following a composition step through logistic regression, and the user can select a multicomponent panel combined by logistic regression and evaluated on the basis of ROC and AUC.

Figure 1. PanelComposer interface. (1) Submit datasheet: expression data arranged in a compatible format is submitted. (2) Select positive and negative groups. (3) Set cross-validation method. (4) Compute ROCs, AUCs, and p values. (5) AUC and ROC results: results are displayed in a table and ROC graph. (6) Combine proteins in panel: two or more proteins can be easily combined to determine their effectiveness in a panel. (7) Perform a different analysis: positive and negative groups can easily be redefined to yield updated ROC graphs for the selected biomarker candidates.

ROC graphs and their AUC values for specific biomarker candidates are produced on the basis of the selected positives and negatives and subsequently represented in tabular form. The resultant table lists the gene or protein names, AUC values, and other related information (Supplementary Figure 2A). Moreover, potential biomarkers are tabulated in descending order according to their AUC value. The user can select two or more proteins from the table for inclusion in a biomarker panel. If the user prefers an automatic selection of the biomarker panel among all proteins, PanelComposer recommends an optimal biomarker panel candidate through a stepwise selection method based on logistic regression.

The results from the performance evaluation of each panel are also depicted as a ROC graph and AUC value. The ROC graph and statistical table display the sensitivity (%), 100 − specificity (%), 95% confidence interval (CI), and Y-index for the expression of each sample group tested (Supplementary Figure 2B). Y-index is useful for selecting an optimal cutoff (the point at which the Y-index reaches its maximum value), and it measures the performance of a diagnostic test.28 Proteins with high AUC values typically have high Y-index values. For panel analysis, additional information is also provided. For instance, pairwise comparison between a panel of biomarkers and the individual proteins is performed using the Mann−Whitney U test and represented with a p value (Supplementary Figure 2C).

Furthermore, to identify the most suitable biomarker candidates, the sensitivity of all panels and single proteins under a fixed specificity can be analyzed. Table 1 summarizes the results obtained from each protein and all panels and provides meaningful specificity values (e.g., 99, 95, 90, 80, and 70% specificity). This would also be useful for making cross comparisons with several fixed points among all of the proteins and panels (Table 1 and Supplementary Figure 2D). If there is a certain level of specificity as a criterion established for a useful biomarker, this analysis result may offer a basis for selecting the best biomarker candidate with the highest sensitivity while satisfying the established criterion. For example, for a biomarker to distinguish HCC from cirrhosis, when we do not have any criterion on the sensitivity or specificity, the ApoA1/VDBP panel has a higher AUC value than ApoA1. However, when we have such criteria for the specificity of the biomarker, which is 99%, ApoA1 is better than the ApoA1/VDBP panel. At a specificity of 99%, the sensitivities of ApoA1 and the ApoA1/VDBP panel are 4.56 and 2.73%, respectively (Supplementary Figure 2D). Finally, we provide the accuracy of the model measured by CV as well as the Pearson’s correlation coefficient to guide the selection of proteins for panel composition (Supplementary Figure 2E).

Table 1. Representative Cases of the PanelComposer Application

| case^a | category | ApoA1 | VDBP | CP | ApoA1/VDBP | ApoA1/VDBP/CP |
|---|---|---|---|---|---|---|
| case A | p value^b | 0.03 | 0.085 | 0.171 | <0.01 | <0.01 |
| case A | AUC (CI of AUC) | 0.669 (0.663, 0.675) | 0.634 (0.627, 0.640) | 0.606 (0.600, 0.612) | 0.791 (0.786, 0.796) | 0.792 (0.786, 0.797) |
| case A | sensitivity (%, CI) under 99% specificity | 4.56 (4.08, 5.03) | 2.73 (2.48, 2.98) | 12.2 (11.83, 12.57) | 2.73 (2.48, 2.98) | 2.67 (2.39, 2.94) |
| case A | sensitivity (%, CI) under 95% specificity | 18.98 (18.2, 19.75) | 2.67 (2.39, 2.94) | 14.44 (13.9, 14.99) | 28.58 (27.9, 29.25) | 16.6 (16.11, 17.09) |
| case A | sensitivity (%, CI) under 90% specificity | 23.64 (22.84, 24.45) | 16.42 (15.91, 16.94) | 23.69 (23.16, 24.21) | 54.64 (53.96, 55.33) | 54.44 (53.68, 55.21) |
| case A | sensitivity (%, CI) under 80% specificity | 42.33 (41.48, 43.18) | 21.47 (20.79, 22.14) | 33.36 (32.61, 34.1) | 71.42 (70.71, 72.13) | 73.82 (73.25, 74.39) |
| case A | highest Y-index | 28.62 | 33.85 | 26.26 | 59.02 | 59.75 |
| case A | ApoA1/VDBP/CP vs^c | <0.01 | <0.01 | <0.01 | 0.994 | |
| case A | ApoA1/VDBP vs^d | <0.01 | <0.01 | <0.01 | | |
| case B | p value^b | <0.01 | 0.301 | <0.01 | <0.01 | <0.01 |
| case B | AUC (CI of AUC) | 0.792 (0.789, 0.796) | 0.549 (0.545, 0.553) | 0.672 (0.668, 0.676) | 0.812 (0.809, 0.816) | 0.810 (0.807, 0.814) |
| case B | sensitivity (%, CI) under 99% specificity | 11.62 (11.24, 12.01) | 0.0 (0.0, 0.0) | 8.18 (7.9, 8.46) | 7.31 (7.09, 7.53) | 9.13 (8.88, 9.38) |
| case B | sensitivity (%, CI) under 95% specificity | 38.33 (37.73, 38.94) | 11.67 (11.22, 12.11) | 19.87 (19.41, 20.32) | 11.4 (10.97, 11.83) | 15.44 (15.12, 15.77) |
| case B | sensitivity (%, CI) under 90% specificity | 49.96 (49.38, 50.53) | 15.31 (14.85, 15.77) | 28.09 (27.65, 28.53) | 46.84 (46.24, 47.45) | 48.98 (48.43, 49.53) |
| case B | sensitivity (%, CI) under 80% specificity | 68.51 (67.97, 69.05) | 34.73 (34.12, 35.34) | 52.09 (51.59, 52.59) | 75.07 (74.57, 75.56) | 70.78 (70.23, 71.32) |
| case B | highest Y-index | 53.09 | 19.38 | 36.23 | 55.11 | 54.73 |
| case B | ApoA1/VDBP/CP vs^c | <0.01 | <0.01 | <0.01 | 0.506 | |
| case B | ApoA1/VDBP vs^d | <0.01 | <0.01 | <0.01 | | |
| case C | p value^b | 0.695 | 0.023 | 0.379 | 0.016 | <0.01 |
| case C | AUC (CI of AUC) | 0.527 (0.522, 0.532) | 0.636 (0.630, 0.641) | 0.553 (0.547, 0.558) | 0.644 (0.639, 0.649) | 0.719 (0.714, 0.725) |
| case C | sensitivity (%, CI) under 99% specificity | 0.0 (0.0, 0.0) | 0.0 (0.0, 0.0) | 7.36 (6.97, 7.74) | 19.07 (18.46, 19.67) | 16.51 (16.07, 16.95) |
| case C | sensitivity (%, CI) under 95% specificity | 3.93 (3.68, 4.18) | 5.33 (5.06, 5.61) | 11.91 (11.41, 12.41) | 23.96 (23.38, 24.53) | 18.84 (18.33, 19.36) |
| case C | sensitivity (%, CI) under 90% specificity | 8.0 (7.52, 8.48) | 13.0 (12.64, 13.36) | 19.04 (18.48, 19.61) | 30.82 (30.18, 31.47) | 40.51 (39.77, 41.26) |
| case C | sensitivity (%, CI) under 80% specificity | 15.42 (14.2, 16.64) | 27.98 (27.38, 28.57) | 23.62 (23.0, 24.25) | 47.4 (46.65, 48.15) | 47.47 (46.65, 48.28) |
| case C | highest Y-index | 12.16 | 28.82 | 14.62 | 32.67 | 35.89 |
| case C | ApoA1/VDBP/CP vs^c | <0.01 | <0.01 | <0.01 | <0.01 | |
| case C | ApoA1/VDBP vs^d | <0.01 | 0.04 | <0.01 | | |

a: Case A: normal vs cirrhosis. Case B: normal, cirrhosis, and chronic hepatitis vs HCC, cholangiocarcinoma, stomach cancer, and pancreatic cancer. Case C: HCC vs cholangiocarcinoma, stomach cancer, and pancreatic cancer.
b: The p values as determined by the Mann−Whitney U test for assessing differences in expression levels between the positive and negative groups.
c: The p values as determined by the Mann−Whitney U test for assessing whether use of the ApoA1/VDBP/CP AUC distribution differs significantly in all cases.
d: The p values as determined by the Mann−Whitney U test for assessing whether use of the ApoA1/VDBP AUC distribution differs significantly in all cases.
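The Y-index and the sensitivity under a fixed specificity reported above can both be read off the ROC operating points. The Python sketch below assumes the Y-index is the Youden index, J = sensitivity + specificity − 1, as in ref 28, expressed here as a fraction rather than a percentage; the function names and the ROC points are hypothetical.

```python
def youden_index(points):
    """Return (maximum J, operating point), with J = sensitivity + specificity - 1.

    points: ROC points as (false_positive_rate, sensitivity) pairs, so that
    J simplifies to sensitivity - false_positive_rate.
    """
    best = max(points, key=lambda p: p[1] - p[0])
    return best[1] - best[0], best


def sensitivity_at_specificity(points, specificity):
    """Highest sensitivity among operating points that meet the specificity floor."""
    max_fpr = 1.0 - specificity
    # Small tolerance guards against floating-point error in 1.0 - specificity.
    eligible = [sens for fpr, sens in points if fpr <= max_fpr + 1e-12]
    return max(eligible)


# Hypothetical ROC points (false-positive rate, sensitivity), not from the paper.
points = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
          (0.5, 0.75), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]
j_max, j_point = youden_index(points)
sens_99 = sensitivity_at_specificity(points, 0.99)
sens_75 = sensitivity_at_specificity(points, 0.75)
```

Because the fixed-specificity sensitivity looks at only one region of the curve while the AUC averages over all of it, two candidates can rank differently under the two measures, as in the ApoA1 versus ApoA1/VDBP comparison at 99% specificity.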

Use of PanelComposer To Determine a Biomarker Panel for HCC

We used PanelComposer to evaluate the performance of three proteins that are expressed in liver tissue and secreted into the blood as biomarkers for HCC.29 The relative expression levels of VDBP, CP, and ApoA1 were determined by MRM for each group of patients, which included those with liver cirrhosis, chronic hepatitis, HCC, cholangiocarcinoma, stomach cancer, and pancreatic cancer (see Supplementary Table 2). Three different cases were assessed in our study. To identify a biomarker for HCC, we compared protein levels in samples from patients with HCC and cirrhosis (case A). As a proof-of-concept for PanelComposer and its ability to handle complicated cases easily without modification of input data, we compared noncancerous (i.e., normal, cirrhosis, and chronic hepatitis) versus cancerous samples (i.e., HCC, cholangiocarcinoma, stomach cancer, and pancreatic cancer; case B) and compared HCC samples to samples for other cancers (i.e., cholangiocarcinoma, stomach cancer, and pancreatic cancer; case C). The 10-fold CV, which is commonly used in many studies,30 was used for all three cases. Our analysis of HCC versus cirrhosis patients revealed that the AUC value of ApoA1 (0.67) was slightly greater than those of VDBP (0.63) and CP (0.61), indicating that ApoA1 may be a better diagnostic marker than VDBP and CP for distinguishing HCC from cirrhosis (the results from each case are summarized in Table 1). According to the p value, only ApoA1 expression differed significantly (p value <0.05) in HCC versus cirrhosis samples. These data are consistent with published reports identifying ApoA1 as a potential biomarker for HCC.31−33 ApoA1 also exhibited a much higher AUC value than VDBP or CP (0.79 vs 0.55 and 0.67 for VDBP and CP, respectively) in our analysis of noncancerous versus cancerous samples. However, the comparison of HCC to other cancers revealed that VDBP had a better AUC value than ApoA1 or CP (0.64 vs 0.53 and 0.55 for ApoA1 and CP, respectively).

These results suggest that ApoA1 may be more reliable for distinguishing HCC from cirrhosis but not from other cancers (i.e., cholangiocarcinoma, stomach cancer, and pancreatic cancer). Nevertheless, the expression of VDBP, which yielded a p value of 0.02, was significantly different between HCC and other cancers. When ApoA1 and VDBP were combined into one panel, the test performance improved in all cases (Table 1). Therefore, we tested whether an ApoA1/VDBP panel would provide better predictive results than each protein alone. The Mann−Whitney U test results indicated that the ApoA1/VDBP panel has better predictive results than the individual proteins, suggesting that assessing the expression of both biomarkers in a panel yields better results in all cases (p value <0.05).

Interestingly, combining all three proteins did not always produce better results than the ApoA1/VDBP panel. For example, significant differences between the ApoA1/VDBP/CP and ApoA1/VDBP panels were not detected in AUC values in our analyses of HCC versus cirrhosis or noncancerous versus cancerous samples. Taken together, the HCC example illustrated the effectiveness and convenience of PanelComposer for designing a biomarker panel.
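The Mann−Whitney U test used for the comparisons above can be sketched in Python as follows. The paper does not state whether an exact or an approximate p value was computed, so this sketch assumes the large-sample normal approximation with a tie correction and no continuity correction; the function name and the two toy samples are hypothetical.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample_a, two-sided p value). Tied values get
    midranks and the variance is tie-corrected. Not suitable when every
    value is identical (the variance would be zero).
    """
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


# Hypothetical marker values for a positive and a negative group.
positives = [0.9, 0.8, 0.6, 0.4]
negatives = [0.7, 0.55, 0.3, 0.2]
u_stat, p_value = mann_whitney_u(positives, negatives)
auc_from_u = u_stat / (len(positives) * len(negatives))
```

The U statistic divided by the product of the two group sizes equals the AUC of the same marker (0.8125 for this toy data), which is why a marker with an AUC near 0.5 also tends to give a large p value. With samples this small, an exact test would normally be preferable to the approximation used here.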

■ CONCLUSIONS

Assessing the performance of biomarker candidates is critical for biomarker discovery and requires extensive, rigorous statistical analysis. We implemented logistic regression models to calculate the predictive value of biomarker panels by evaluating the components individually or in combination, using ROC and AUC values. These analyses were conducted to compare patients with different liver diseases against patients with HCC. PanelComposer is an important tool for evaluating the performance of biomarker candidates under various conditions. Further, PanelComposer allows users to define and redefine positive and negative subgroups through a simple Web interface and easily combine several proteins into one panel for evaluation. The results produced by this tool are comprehensive and easy to interpret. Thus, PanelComposer is widely applicable to any type of disease model and biomarker discovery process.

■ ASSOCIATED CONTENT

*S Supporting Information

Supplementary Figure 1. Example and description of a PanelComposer-compatible data set. Supplementary Figure 2. PanelComposer outputs for HCC biomarker determination. Supplementary Table 1. Sample information in detail. Supplementary Table 2. Sample data set used for this study. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*Phone: +82-2-2123-4242. Fax: +82-2-393-6589. E-mail: [email protected] or [email protected]

The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

We thank Kyung-Sik Kim and Si-Young Song at Yonsei University College of Medicine for supplying HCC specimens for other related projects. This study was supported by grants from MediStar (A112047 to S.-K.J.), the National Project for Personalized Genomic Medicine (A111218-11 to Y.-K.P.), the National R&D Program for Cancer Control, Ministry of Health and Welfare (1120200 to Y.-K.P.), and a World Class University grant (R31-2008-000-10086-0).

■ REFERENCES

(1) Paik, Y. K.; Kim, H.; Lee, E. Y.; Kwon, M. S.; Cho, S. Y. Overview and Introduction to Clinical Proteomics. Methods Mol. Biol. 2008, 428, 1−31.
(2) Alizadeh, A. A.; Eisen, M. B.; Davis, R. E.; Ma, C.; Lossos, I. S.; et al. Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling. Nature 2000, 403, 503−511.
(3) Hanash, S. M.; Pitteri, S. J.; Faca, V. M. Mining the Plasma Proteome for Cancer Biomarkers. Nature 2008, 452, 571−579.
(4) Zweig, M. H.; Campbell, G. Receiver-Operating Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine. Clin. Chem. 1993, 39, 561−577.
(5) Obuchowski, N. A. Fundamentals of Clinical Research for Radiologists. Am. J. Roentgenol. 2005, 184, 364−372.
(6) Shultz, E. K. Multivariate Receiver-Operating Characteristic Curve Analysis: Prostate Cancer Screening as an Example. Clin. Chem. 1995, 41, 1248−1255.
(7) Lundberg, M.; Thorsen, S. B.; Assarsson, E.; Villablanca, A.; Tran, B.; et al. Multiplexed Homogeneous Proximity Ligation Assays for High-Throughput Protein Biomarker Research in Serological Material. Mol. Cell. Proteomics 2012, 10, DOI: M110.004978.
(8) Schwamborn, K.; Gaisa, N. T.; Henkel, C. Tissue and Serum Proteomic Profiling for Diagnostic and Prognostic Bladder Cancer Biomarkers. Expert Rev. Proteomics 2010, 7, 897−906.
(9) Chatterjee, M.; Mohapatra, S.; Ionan, A.; Bawa, G.; Ali-Fehmi, R.; et al. Diagnostic Markers of Ovarian Cancer by High-Throughput Antigen Cloning and Detection on Arrays. Cancer Res. 2006, 66, 1181−1190.
(10) Lu, H.; Goodell, V.; Disis, M. L. Humoral Immunity Directed against Tumor-Associated Antigens As Potential Biomarkers for the Early Diagnosis of Cancer. J. Proteome Res. 2008, 7, 1388−1394.
(11) Wang, M.; Long, R. E.; Comunale, M. A.; Junaidi, O.; Marrero, J.; et al. Novel fucosylated Biomarkers for the Early Detection of Hepatocellular Carcinoma. Cancer Epidemiol., Biomarkers Prev. 2009, 18, 1914−1921.
(12) Rom, W. N.; Goldberg, J. D.; Addrizzo-Harris, D.; Watson, H. N.; Khilkin, M.; et al. Identification of an Autoantibody Panel To Separate Lung Cancer from Smokers and Nonsmokers. BMC Cancer 2010, 10, 234.
(13) Wiseman, S. M.; Melck, A.; Masoudi, H.; Ghaidi, F.; Goldstein, L.; et al. Molecular Phenotyping of Thyroid Tumors Identifies a Marker Panel for Differentiated Thyroid Cancer Diagnosis. Ann. Surg. Oncol. 2008, 15, 2811−2826.
(14) Anderson, K. S.; Sibani, S.; Wallstrom, G.; Qiu, J.; Mendoza, E. A.; et al. Protein Microarray Signature of Autoantibody Biomarkers for the Early Detection of Breast Cancer. J. Proteome Res. 2010, 10, 85−96.
(15) Farlow, E. C.; Patel, K.; Basu, S.; Lee, B. S.; Kim, A. W.; et al. Development of a Multiplexed Tumor-Associated Autoantibody-Based Blood Test for the Detection of Non-Small Cell Lung Cancer. Clin. Cancer Res. 2010, 16, 3452−3462.
(16) Wang, X.; Yu, J.; Sreekumar, A.; Varambally, S.; Shen, R.; et al. Autoantibody Signatures in Prostate Cancer. N. Engl. J. Med. 2005, 353, 1224−1235.
(17) Hudson, M. E.; Pozdnyakova, I.; Haines, K.; Mor, G.; Snyder, M. Identification of Differentially Expressed Proteins in Ovarian Cancer Using High-Density Protein Microarrays. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 17494−17499.
(18) Ayer, T.; Chhatwal, J.; Alagoz, O.; Kahn, C. E. Informatics in Radiology: Comparison of Logistic Regression and Artificial Neural Network Models in Breast Cancer Risk Estimation. Radiographics 2010, 30, 13−22.
(19) Bewick, V.; Cheek, L.; Ball, J. Statistics Review 14: Logistic Regression. Crit. Care 2005, 112−118.
(20) Muller, R.; Mockel, M. Logistic Regression and CART in the Analysis of Multimarker Studies. Clin. Chim. Acta 2008, 394, 1−6.
(21) Kong, D. H.; Jung, J. W.; Na, K.; Jeong, S. K.; Paik, Y. K.; et al. Normalization Using a Tagged-Internal Standard Assay for Analysis of Antibody Arrays and the Evaluation of Serological Biomarkers for Liver Disease. Anal. Chim. Acta 2012, 718, 92−98.
(22) Na, K.; Lee, E. Y.; Lee, H. J.; Kim, K. Y.; Lee, H.; et al. Human Plasma Carboxylesterase 1, a Novel Serologic Biomarker Candidate for Hepatocellular Carcinoma. Proteomics 2009, 9, 3989−3999.
(23) Sonego, P.; Kocsor, A.; Pongor, S. ROC Analysis: Applications to the Classification of Biological Sequences and 3D Structures. Briefings Bioinf. 2008, 9, 198−209.
(24) Stephan, C.; Wesseling, S.; Schink, T.; Jung, K. Comparison of Eight Computer Programs for Receiver-Operating Characteristic Analysis. Clin. Chem. 2003, 49, 433−439.
(25) Vergara, I. A.; Norambuena, T.; Ferrada, E.; Slater, A. W.; Melo, F. StAR: A Simple Tool for the Statistical Comparison of ROC Curves. BMC Bioinf. 2008, 9, 265.
(26) Eng, J. ROC Analysis: Web-Based Calculator for ROC Curves; http://www.jrocfit.org (accessed April 30, 2012).
(27) Unwin, R. D.; Griffiths, J. R.; Whetton, A. D. A Sensitive Mass Spectrometric Method for Hypothesis-Driven Detection of Peptide Post-translational Modifications: Multiple Reaction Monitoring-Initiated Detection and Sequencing (MIDAS). Nat. Protoc. 2009, 4, 870−877.
(28) Youden, W. J. Index for Rating Diagnostic Tests. Cancer 1950, 3, 32−35.
(29) Safran, M.; Dalah, I.; Alexander, J.; Rosen, N.; Iny Stein, T.; et al. GeneCards Version 3: The Human Gene Integrator. Database 2010, baq020.
(30) Smit, S.; Hoefsloot, H. C.; Smilde, A. K. Statistical Data Processing in Clinical Proteomics. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2007, 866, 77−88.
(31) Qiu, J. G.; Fan, J.; Liu, Y. K.; Zhou, J.; Dai, Z.; et al. Screening and Detection of Portal Vein Tumor Thrombi-Associated Serum Low Molecular Weight Protein Biomarkers in Human Hepatocellular Carcinoma. J. Cancer Res. Clin. Oncol. 2008, 134, 299−305.
(32) Fernandez-Irigoyen, J.; Santamaría, E.; Sesma, L.; Munoz, J.; Riezu, J. I.; et al. Oxidation of Specific Methionine and Tryptophan Residues of Apolipoprotein A-I in Hepatocarcinogenesis. Proteomics 2005, 5, 4964−4972.
(33) Steel, L. F.; Shumpert, D.; Trotter, M.; Seeholzer, S. H.; Evans, A. A.; et al. A Strategy for the Comparative Analysis of Serum Proteomes for the Discovery of Biomarkers for Hepatocellular Carcinoma. Proteomics 2003, 3, 601−609.
