Open Access

Visualization and Comparison of Single and Combined Parametric and Nonparametric Discriminant Methods for Leukemia Type Recognition Based on Gene Expression


A gene expression data set from a leukemia microarray study, containing 3051 genes and 38 tumor mRNA training samples, was used to discriminate between the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) groups. In this paper, single and combined discriminant methods were applied to a small set of the most discriminative genes, selected according to Wilks’ lambda or the leave-one-out error of the first nearest neighbor classifier. For the linear, quadratic, regularized, uncorrelated discrimination, kernel, nearest neighbor and naive Bayesian classifiers, two-dimensional graphs of the decision boundaries and discriminant functions are presented for diagnostic purposes. Cross-validation and leave-one-out errors were used as measures of classifier performance to support diagnoses derived from this genomic data set. A small number of the best discriminating genes, from two to ten, was sufficient to build discriminant methods with good performance. Nearest neighbor methods were especially useful. The results presented herein were comparable with those obtained by other authors using larger numbers of genes. The linear, quadratic, uncorrelated Bayesian and regularized discrimination methods were subjected to bagging or boosting in order to assess the accuracy of the resulting ensembles. The analysis led to the conclusion that resampling ensembles were not beneficial for two-dimensional discrimination.
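The gene selection and error estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a univariate form of Wilks' lambda (within-group sum of squares divided by total sum of squares, computed gene by gene), squared Euclidean distance for the first nearest neighbor classifier, and purely synthetic data in place of the leukemia measurements. The function names, the equal group sizes and the mean shift are all hypothetical choices made for illustration.

```python
import numpy as np


def wilks_lambda_per_gene(X, y):
    """Univariate Wilks' lambda per column: within-group SS / total SS.

    Values near 0 indicate strong group separation, values near 1 none.
    Assumes every gene has non-zero variance.
    """
    total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within_ss = np.zeros(X.shape[1])
    for label in np.unique(y):
        group = X[y == label]
        within_ss += ((group - group.mean(axis=0)) ** 2).sum(axis=0)
    return within_ss / total_ss


def loo_error_1nn(X, y):
    """Leave-one-out error rate of the first nearest neighbor classifier."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)      # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)      # a sample may not be its own neighbor
    nearest = dist.argmin(axis=1)
    return float(np.mean(y[nearest] != y))


def make_synthetic_data(n_samples=38, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for expression data (NOT the leukemia data set)."""
    rng = np.random.default_rng(seed)
    y = np.array([0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2))
    X = rng.normal(size=(n_samples, n_genes))
    X[y == 1, :n_informative] += 3.0    # hypothetical mean shift in group 1
    return X, y


def select_top_genes(X, y, k=2):
    """Indices of the k genes with the smallest Wilks' lambda."""
    return np.argsort(wilks_lambda_per_gene(X, y))[:k]


if __name__ == "__main__":
    X, y = make_synthetic_data()
    top = select_top_genes(X, y, k=2)
    print("selected genes:", top)
    print("LOO 1-NN error on selected genes:", loo_error_1nn(X[:, top], y))
```

Selecting k = 2 genes mirrors the two-dimensional setting discussed in the abstract; larger k (up to ten) can be tried by changing the argument. Note that selecting genes on all samples and then estimating the leave-one-out error on the same samples is optimistic, so in a stricter protocol the selection step would be repeated inside each leave-one-out fold.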

