Receiver Operating Characteristic Curve Explorer and Tester

Receiver Operating Characteristic Curve Explorer and Tester (ROCCET) is an open-access web server for performing biomarker analysis using ROC curve analysis on metabolomic data sets. ROCCET is designed specifically for performing and assessing a standard binary classification test (disease vs. control). It accepts metabolite data tables, with or without clinical/observational variables, as input and uses these data for extensive biomarker analysis and biomarker identification. A menu-based navigation system allows users to identify or assess the clinical variables and/or metabolites that carry the most diagnostic or class-predictive information. ROCCET supports both manual and semi-automated feature selection and can automatically generate a variety of mathematical models that maximize the sensitivity and specificity of the biomarker(s) while minimizing the number of biomarkers used in the model. ROCCET also supports rigorous assessment of the quality and robustness of newly discovered biomarkers using permutation testing, hold-out testing and cross-validation.

Background – ROC curves in biomarker discovery
Biomarkers are commonly defined as measured characteristics that may be used as indicators of some biological state or condition. They may be genes, chemicals, proteins, physiological parameters, imaging data or histological measurements. Biomarkers can consist of a single component (e.g. blood glucose) or multiple components (a biomarker panel such as acylcarnitines). Medical biomarkers fall into five major categories: 1) diagnostic (used to identify whether you have a disease or condition); 2) prognostic (used to determine how well you will do with the disease or condition); 3) predictive (used to determine whether you may get the disease); 4) efficacy or monitoring (used to determine how well a drug or treatment is working against the disease); and 5) exposure (used to determine whether you have been exposed to a drug, food, toxin or other substance). Good biomarkers should exhibit high sensitivity (the fraction of truly positive, i.e. diseased, subjects that the test correctly identifies as positive) and high specificity (the fraction of truly negative, i.e. healthy, subjects that the test correctly identifies as negative). A perfect biomarker or biomarker panel would be 100% sensitive (classifying everyone in the sick group as sick) and 100% specific (classifying no one in the healthy group as sick).
In practice, however, few biomarkers are perfect, so there is usually a trade-off between sensitivity and specificity. In medical biomarker studies it is increasingly common to report this trade-off using a Receiver Operating Characteristic (ROC) curve. ROC curves plot the sensitivity of a biomarker on the y axis against the false positive rate (1 - specificity) on the x axis, with each point on the curve corresponding to a different decision threshold. An image of different ROC curves is shown in Figure 1. ROC curves provide a simple visual way to choose the decision threshold (cut-off) for a biomarker or a combination of biomarkers that gives the best balance of sensitivity and specificity.
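To make the sensitivity/specificity trade-off concrete, the minimal Python sketch below sweeps a decision threshold over the values of a single biomarker and records one ROC point (1 - specificity, sensitivity) per threshold. It assumes that higher values indicate disease, and it picks a cut-off with Youden's index (sensitivity + specificity - 1), which is one common criterion but not necessarily the one ROCCET uses. The function names `roc_points` and `youden_threshold` and the toy data are hypothetical.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (false positive rate, true positive rate, threshold) for every observed score.

    Assumption: a sample is called positive (disease) when its score >= threshold.
    labels: 1 = disease, 0 = control.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one disease and one control sample")
    # Threshold above every score: nothing is called positive, so the curve starts at (0, 0).
    points = [(0.0, 0.0, float("inf"))]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos, t))  # (1 - specificity, sensitivity, threshold)
    return points

def youden_threshold(points):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1 = TPR - FPR."""
    fpr, tpr, t = max(points, key=lambda p: p[1] - p[0])
    return t, tpr, 1.0 - fpr  # (threshold, sensitivity, specificity)

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(youden_threshold(roc_points(scores, labels)))  # (0.4, 1.0, 0.75)
```

With this toy data, calling every sample with a value of at least 0.4 "diseased" catches all four disease samples (sensitivity 1.0) while misclassifying one of four controls (specificity 0.75).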
The area under the ROC curve (AUC) summarizes the overall accuracy, or separation performance, of the biomarker (or biomarkers): an AUC of 1.0 indicates perfect separation of the two groups, while an AUC of 0.5 is no better than random guessing. Because it is a single number, the AUC can readily be used to compare different biomarkers, biomarker combinations or models. As a rule of thumb, the fewer biomarkers a model needs to reach a given AUC, the better.
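The AUC also has a useful probabilistic reading: it equals the probability that a randomly chosen disease sample has a higher biomarker value than a randomly chosen control (the Mann-Whitney U statistic divided by the number of disease/control pairs), which matches the trapezoidal area under the empirical ROC curve when ties are counted as one half. The minimal sketch below computes the AUC that way; it assumes higher values indicate disease, and the function name `empirical_auc` is hypothetical rather than part of ROCCET.

```python
from itertools import product

def empirical_auc(scores, labels):
    """Fraction of (disease, control) pairs in which the disease sample scores higher.

    labels: 1 = disease, 0 = control. Tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one disease and one control sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(empirical_auc(scores, labels))  # 0.875 (14 of 16 pairs correctly ordered)
```

A perfectly separating biomarker returns 1.0, and one whose values carry no information about the class returns values near 0.5.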

Metabolomics
ROCCET’s ROC curve generation and analysis is specifically tailored for metabolomics data sets. Metabolomics data sets produced by high-throughput analytical chemistry techniques typically consist of large matrices of measured metabolite values (concentrations or signal intensities) for many samples. Comparing groups or subsets of samples within such data usually involves univariate statistical analysis, multivariate analysis such as Partial Least Squares - Discriminant Analysis (PLS-DA), or machine learning classification procedures such as Support Vector Machines (SVM).
Accordingly, ROCCET offers two kinds of analytical modules: a univariate module and a multivariate module. In the univariate module, each variable is evaluated individually (with a t-test) and ranked by its separation performance (the AUC of its ROC curve), which is reported together with a confidence interval (CI) and a computed optimal threshold. In the multivariate module, one can choose among three techniques (SVM, PLS-DA and Random Forests) for classifying and selecting metabolites or clinical variables for optimal ROC performance. The analysis produces the top-performing multi-variable model(s) based on their ROC curve characteristics and presents the significant variables (clinical variables and/or metabolites) contributing to each model (via “ROC explorer”). ROCCET also supports manually selecting specific variables to include in a given biomarker model; these variables can be analyzed using “ROC tester”. The quality and robustness of newly discovered biomarkers or biomarker panels can then be rigorously assessed using permutation testing, hold-out testing and cross-validation.
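The univariate workflow described above (a per-variable t-test, ranking by AUC, and a confidence interval), together with the idea behind permutation testing, can be sketched in plain Python. This is an illustration under stated assumptions, not ROCCET's implementation: it uses Welch's t statistic, a percentile bootstrap for a 95% CI of the AUC, and a one-sided label-permutation test, and ROCCET's exact procedures may differ. The names `rank_features`, `bootstrap_auc_ci` and `permutation_p_value` and the toy data are hypothetical.

```python
import random
import statistics

def auc(disease, control):
    """Empirical AUC: P(disease value > control value), ties counted as 1/2."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in disease for b in control)
    return wins / (len(disease) * len(control))

def welch_t(disease, control):
    """Welch's t statistic (unequal variances); NaN if both groups are constant."""
    se = (statistics.variance(disease) / len(disease)
          + statistics.variance(control) / len(control)) ** 0.5
    diff = statistics.mean(disease) - statistics.mean(control)
    return float("nan") if se == 0 else diff / se

def bootstrap_auc_ci(disease, control, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUC, resampling within each group (assumption)."""
    rng = random.Random(seed)
    boot = [auc(rng.choices(disease, k=len(disease)), rng.choices(control, k=len(control)))
            for _ in range(n_boot)]
    cuts = statistics.quantiles(boot, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

def permutation_p_value(disease, control, n_perm=1000, seed=0):
    """One-sided p-value: how often randomly shuffled labels give an AUC >= the observed one."""
    rng = random.Random(seed)
    observed = auc(disease, control)
    pooled = list(disease) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if auc(pooled[:len(disease)], pooled[len(disease):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rank_features(table, labels):
    """Rank variables by max(AUC, 1 - AUC) so variables lower in disease also rank high."""
    rows = []
    for name, values in table.items():
        disease = [v for v, y in zip(values, labels) if y == 1]
        control = [v for v, y in zip(values, labels) if y == 0]
        a = auc(disease, control)
        rows.append((name, max(a, 1.0 - a), welch_t(disease, control)))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical toy table: variable name -> one value per sample; labels 1 = disease, 0 = control.
table = {"metabolite_A": [5, 6, 7, 8, 1, 2, 3, 4], "metabolite_B": [1, 5, 2, 6, 3, 7, 4, 8]}
labels = [1, 1, 1, 1, 0, 0, 0, 0]
for name, a, t in rank_features(table, labels):
    print(name, a, round(t, 2))
print(bootstrap_auc_ci([5, 6, 7, 8], [1, 2, 3, 4]), permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4]))
```

Taking max(AUC, 1 - AUC) is a design choice in this sketch: a metabolite that is consistently lower in disease is just as useful a biomarker as one that is higher, and the sign of the t statistic records the direction.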
ROCCET generates a variety of colorful, journal-ready graphs and tables (see Figures 1 and 2) and allows users to download all generated files, including tables (CSV format), graphs (PNG or PDF) and the processing history as an R file (which can be read as a plain text file). A tutorial for using ROCCET is offered in the following reference. Training data sets are available on the ROCCET website for experimenting with the tool.