<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1748-7188-5-14</ui><ji>1748-7188</ji><fm>
<dochead>Research</dochead>
<bibl>
<title>
<p>ANMM4CBR: a case-based reasoning method for gene expression data classification</p>
</title>
<aug>
<au id="A1"><snm>Yao</snm><fnm>Bangpeng</fnm><insr iid="I1"/><email>ybp02@mails.tsinghua.edu.cn</email></au>
<au ca="yes" id="A2"><snm>Li</snm><fnm>Shao</fnm><insr iid="I1"/><email>shaoli@mail.tsinghua.edu.cn</email></au>
</aug>
<insg>
<ins id="I1"><p>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China</p></ins>
</insg>
<source>Algorithms for Molecular Biology</source>
<issn>1748-7188</issn>
<pubdate>2010</pubdate>
<volume>5</volume>
<issue>1</issue>
<fpage>14</fpage>
<url>http://www.almob.org/content/5/1/14</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1748-7188-5-14</pubid><pubid idtype="pmpid">20051140</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>4</day><month>8</month><year>2009</year></date></rec><acc><date><day>6</day><month>1</month><year>2010</year></date></acc><pub><date><day>6</day><month>1</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Yao and Li; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms.</p>
</sec>
<sec>
<st>
<p>Method</p>
</st>
<p>In order to obtain a robust classifier, we propose a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method. ANMM4CBR employs case-based reasoning (CBR) for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, to select the most informative genes, we perform feature selection by additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is robust to noise in the data.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and <it>k</it>-nearest neighbor (<it>k</it>NN), especially when the data contain a high level of noise.</p>
</sec>
<sec>
<st>
<p>Availability</p>
</st>
<p>The source code is attached as an additional file of this paper.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Gene microarray technology has recently become a fundamental tool in biomedical research, enabling us to observe the expression of thousands of genes simultaneously at the transcriptional level. Two typical problems that researchers want to solve using microarray data are: (1) discovering informative genes for classification based on different cell types or diseases <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>; (2) clustering and arranging genes according to their similarity in expression patterns <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Here we focus on the former, in particular on microarray classification using gene expression data, which has attracted extensive attention in the last few years. It is believed that gene expression profiling could be a precise and systematic approach for cancer diagnosis and clinical-outcome prediction <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>.</p>
<p>Over about ten years of research, many algorithms have been applied to microarray classification, such as nearest neighbor (NN) <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>, artificial neural networks <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>, boosting <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>, support vector machine (SVM) <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, etc. Many commonly used classifiers are rule-based or statistics-based. One challenge for these methods on microarray data is the small sample size. With a limited number of training samples, it is difficult to obtain domain knowledge for rule-based systems or to estimate accurate parameters (such as means and standard deviations) for statistics-based approaches.</p>
<p>Rather than adopting rule-based or statistics-based classification methods, in this paper we use a case-based reasoning (CBR) <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp> approach to design a robust microarray classifier. CBR usually requires much less domain knowledge than rule-based or statistics-based systems, because it does not rely heavily on statistical assumptions about the data during classification. It maintains a case-base of previous problems and their solutions, and solves new problems by reference to this case-base. NN can be viewed as the simplest form of CBR. In an extensive comparative study, <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp> concluded that NN performed better than more sophisticated methods. Moreover, <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> observed that CBR is particularly useful for applications in life sciences, where we lack sufficient knowledge either for formal representation or for parameter estimation. <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp> reviewed previous work on applying CBR to bioinformatics. In microarray classification, however, apart from its simplest form, NN, CBR classifiers have been considered in only a few studies <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
</abbrgrp> and were tested only on simple data sets.</p>
<p>In order to design an effective classifier, the dimensionality of microarray data should be reduced. Of the thousands of genes in a microarray data set, only a small fraction are informative in terms of biological meaning or classification performance <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp>. In this work we propose a novel additive nonparametric margin maximum (ANMM) method for feature selection. Three properties make ANMM well suited to feature selection for microarray data: (1) ANMM is a nonparametric method that requires less restrictive assumptions about the original data, and is thus suitable for microarray data <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. (2) The feature reduction criterion of ANMM is defined based on gene pre-selection and sample clustering, which makes ANMM insensitive to outliers and mislabeled samples. (3) ANMM and CBR are closely related, so ANMM feature selection can improve the performance of CBR classification.</p>
<p>Using ANMM for feature selection and CBR for classification, we establish the novel ANMM4CBR method in this paper. The performance of ANMM4CBR is tested on one simulated data set and four publicly available data sets, and compared with some well-known methods including SVM, <it>k</it>NN and LogitBoost, as well as with other CBR methods that have been applied to microarray classification. We show that ANMM4CBR achieves favorable classification results, especially on data that contain a high level of noise.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Overview of ANMM4CBR</p>
</st>
<p>In a microarray data classification problem, we are given <it>N </it>training samples <inline-formula>
<graphic file="1748-7188-5-14-i1.gif"/>
</inline-formula>, where <it>x</it><sub><it>i</it></sub> is an <it>M</it>-dimensional vector in the feature space and <it>y</it><sub><it>i</it></sub> &#8712; {0, &#8943;, <it>K </it>- 1} is the class label. The set of samples in the <it>k</it>th class is denoted as <it>&#969;</it><sub><it>k</it></sub>, <it>i.e. x</it><sub><it>i</it></sub> &#8712; <it>&#969;</it><sub><it>k</it></sub> means <it>y</it><sub><it>i</it></sub> = <it>k</it>. The genes are denoted as <inline-formula>
<graphic file="1748-7188-5-14-i2.gif"/>
</inline-formula>, where <it>&#981;</it><sub><it>m</it></sub>(<it>x</it>) is the expression value of sample <it>x </it>on the <it>m</it>th gene. The learning task is to select a subset of the genes and to define a similarity measure based on the selected genes. Given an unlabeled sample, we then predict its category using the selected genes and the defined similarity measure.</p>
<p>In this paper, we propose a CBR-based method to construct the classifier. CBR classifiers follow a strategy that plays a vital role in human decision making: they solve new problems by retrieving previously solved cases from a case-base. The process of solving new cases contributes new information to the system, and this information can be used to solve future cases. In <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>, the CBR method is described in terms of four phases. In the first phase, CBR <it>retrieves </it>old cases similar to the new one. The second phase <it>reuses </it>the solutions of the retrieved cases to solve the new case. The third phase <it>revises </it>the solution, e.g. by a human. Finally, the fourth phase <it>retains </it>the useful information obtained when solving this case.</p>
<p>Here we focus on the <it>retrieving </it>and <it>reusing </it>phases, and propose a novel ANMM4CBR method for classification (see Figure <figr fid="F1">1</figr>). For feature selection, we develop a novel ANMM method, which additively optimizes a nonparametric margin maximum criterion. We define this criterion based on gene pre-selection and sample clustering to make it robust to noise and outliers. In our CBR classifier, each class has its own case-base. For a test case, we retrieve similar cases from each case-base and combine the results from all case-bases to assign a class label.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Framework of ANMM4CBR for microarray classification</p></caption><text>
   <p><b>Framework of ANMM4CBR for microarray classification</b>. ANMM4CBR contains two modules: ANMM for feature selection and CBR for classification. Both modules are suitable for microarray data, which are usually noisy and offer only a small number of training samples.</p>
</text><graphic file="1748-7188-5-14-1"/></fig>
<p>In keeping with the CBR paradigm, the predictions for test samples could be revised and then added to the case-bases. The <it>revising </it>and <it>retaining </it>phases, however, are not the focus of this paper and are not discussed further. The ANMM and CBR modules are described in detail below.</p>
</sec>
<sec>
<st>
<p>Additive Nonparametric Margin Maximum for Feature Selection</p>
</st>
<p>Here we introduce an ANMM feature selection method, which uses an additive method to optimize a nonparametric margin maximum (NMM) criterion. The NMM criterion is defined based on <it>nearest between-class distance maximization </it>and <it>furthest within-cluster distance minimization</it>. We first describe the NMM criterion, and then present the additive optimization method.</p>
<sec>
<st>
<p>Nonparametric Margin Maximum (NMM) Criterion</p>
</st>
<p>The goal of feature selection is to identify <it>informative genes </it>from thousands of available genes. Informative genes are those that have high discriminative power and low pairwise correlation <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. Selecting informative genes not only helps overcome the curse of dimensionality and thus improve prediction accuracy, but also helps reveal meaningful biological explanations of the data set. In principle, any wrapper or filter feature selection method, such as the t-test or mutual information measurement, can be used. However, one drawback of these approaches is that the feature selection criterion is designed independently of the classifier. In <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>, it was observed that almost all feature selection methods make assumptions about the distribution of the data, and these assumptions usually affect the performance of the classifiers. Therefore, it is important to design a feature selection method that suits the classification method to be used.</p>
<p>Bressan and Vitri&#224; <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp> showed that there is a close link between nonparametric discriminant analysis (NDA) <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp> and instance-based classifiers. In that work, a modified NDA was applied to improve the performance of NN for face recognition. Since CBR-based methods are also instance-based classifiers, we expect that the idea of NDA can also improve, or at least not degrade, the performance of CBR. Our NMM criterion is based on the notion of NDA. Instead of directly using the furthest within-class distance as in the original NDA method, our method first groups the training samples in each class <it>&#969;</it><sub><it>k</it></sub> into clusters {<inline-formula>
<graphic file="1748-7188-5-14-i3.gif"/>
</inline-formula>} so that the samples in each cluster have similar patterns. The objective of NMM is to maximize the between-class distance of samples while minimizing the within-cluster distance. For a sample <it>x</it><sub><it>i</it></sub> &#8712; <it>&#969;</it><sub><it>k</it></sub>, we define its nearest between-class neighbor as</p>
<p>
<display-formula id="M1">
<graphic file="1748-7188-5-14-i4.gif"/>
</display-formula>
</p>
<p>Similarly, its furthest within-cluster neighbor is defined as</p>
<p>
<display-formula id="M2">
<graphic file="1748-7188-5-14-i5.gif"/>
</display-formula>
</p>
<p>where <it>C </it>[<it>x</it>] indicates the cluster that <it>x </it>belongs to.</p>
<p>Then the nonparametric margin of <it>x</it><sub><it>i</it></sub> is</p>
<p>
<display-formula id="M3">
<graphic file="1748-7188-5-14-i6.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1748-7188-5-14-i7.gif"/>
</inline-formula> is the nonparametric nearest between-class distance for <it>x</it><sub><it>i</it></sub>, and <inline-formula>
<graphic file="1748-7188-5-14-i8.gif"/>
</inline-formula> is the furthest within-cluster distance. Intuitively, the larger &#920;<sub><it>i</it></sub> is, the more likely <it>x</it><sub><it>i</it></sub> is to be correctly classified. Therefore the learning objective of NMM is to select a subset of genes <inline-formula>
<graphic file="1748-7188-5-14-i9.gif"/>
</inline-formula> from &#934; that maximizes the nonparametric margin over all samples, <it>i.e</it>. to maximize</p>
<p>
<display-formula id="M4">
<graphic file="1748-7188-5-14-i10.gif"/>
</display-formula>
</p>
<p>where <it>&#948;</it><sub><it>i</it></sub> is the sample <it>x</it><sub><it>i</it></sub> represented in the space of selected features, <it>&#948;</it><sub><it>i</it></sub> = [<it>h</it><sub>1</sub>(<it>x</it><sub><it>i</it></sub>), &#8943;, <it>h</it><sub><it>T</it></sub>(<it>x</it><sub><it>i</it></sub>)]<sup><it>T</it></sup>.</p>
<p>Not surprisingly, if each class contains only one cluster, the NMM criterion equals the optimization objective of NDA (see Proof 1). Since NDA has been shown to be closely related to instance-based classifiers such as NN <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>, we believe that our margin maximum criterion also benefits the design of a robust CBR classifier. Moreover, we replace the furthest within-class distance with the furthest within-cluster distance, which makes our approach more robust to outliers: the outliers that commonly occur in microarray data can make the furthest within-class distance extremely large. Another major difference between our method and NDA is that NDA performs feature reduction by finding weighted combinations of all the features, while NMM selects a subset of features. This property is important because the selected features can be used to reveal biological significance.</p>
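<p>To make Equation (4) concrete, the following Python sketch computes the nonparametric margin &#920;<sub><it>i</it></sub> of every sample and their sum for a given subset of genes. It is an illustration under stated assumptions, not the authors' implementation: it assumes squared Euclidean distance in the selected-gene space and integer cluster labels that are unique across classes, and the function name <it>nmm_criterion</it> is hypothetical.</p>
```python
import numpy as np


def nmm_criterion(X, y, clusters, genes):
    """Sum of nonparametric margins, measured only on the columns in `genes`.

    Assumes squared Euclidean distance; `clusters` holds one cluster id per
    sample, with ids unique across classes (hypothetical helper).
    """
    Z = np.asarray(X, dtype=float)[:, genes]   # delta_i: samples in the selected-gene space
    y = np.asarray(y)
    clusters = np.asarray(clusters)
    # Pairwise squared Euclidean distances between all samples
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    margins = np.empty(len(Z))
    for i in range(len(Z)):
        between = D[i, y != y[i]].min()               # nearest between-class neighbor
        within = D[i, clusters == clusters[i]].max()  # furthest within-cluster neighbor
        margins[i] = between - within                 # Theta_i
    return margins.sum(), margins
```
<p>A larger criterion value means that, on the chosen genes, samples lie farther from the other classes than from the most distant member of their own cluster.</p>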
<sec>
<st>
<p>Proof 1</p>
</st>
<p>The Nonparametric Margin Maximum (NMM) criterion in Equation (4) can be expanded as follows:</p>
<p>
<display-formula id="M5">
<graphic file="1748-7188-5-14-i11.gif"/>
</display-formula>
</p>
<p>When each class contains only one cluster, we have</p>
<p>
<display-formula id="M6">
<graphic file="1748-7188-5-14-i12.gif"/>
</display-formula>
</p>
<p>where <it>S</it><sup><it>B</it></sup> and <it>S</it><sup><it>W</it></sup> are the between-class and within-class scatter matrices of NDA, respectively. Therefore we can conclude that when each class contains only one cluster,</p>
<p>
<display-formula id="M7">
<graphic file="1748-7188-5-14-i13.gif"/>
</display-formula>
</p>
<p>where the left-hand side is the NMM criterion and the right-hand side is the NDA optimization criterion.&#160;&#160;&#160;&#9633;</p>
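<p>The key step of Proof 1 can be sketched with the identity ||<it>v</it>||<sup>2</sup> = tr(<it>vv</it><sup><it>T</it></sup>). This sketch rests on assumptions that the equation images do not let us confirm: squared Euclidean distances; a 0/1 matrix <it>A</it> that selects the chosen genes, so that <it>&#948;</it><sub><it>i</it></sub> = <it>A</it><sup><it>T</it></sup><it>x</it><sub><it>i</it></sub>; hypothetical notation <it>x</it><sub><it>i</it></sub><sup><it>E</it></sup> and <it>x</it><sub><it>i</it></sub><sup><it>I</it></sup> for the nearest between-class and furthest within-class neighbors; and unweighted scatter matrices <it>S</it><sup><it>B</it></sup> and <it>S</it><sup><it>W</it></sup> built from the differences to those neighbors (the weighting used in the original NDA may differ).</p>
```latex
\sum_{i} \Big( \| A^{\top}(x_i - x_i^{E}) \|^{2} - \| A^{\top}(x_i - x_i^{I}) \|^{2} \Big)
  = \operatorname{tr}\Big( A^{\top} \Big[ \sum_{i} (x_i - x_i^{E})(x_i - x_i^{E})^{\top}
      - \sum_{i} (x_i - x_i^{I})(x_i - x_i^{I})^{\top} \Big] A \Big)
  = \operatorname{tr}\big( A^{\top} (S^{B} - S^{W}) A \big)
```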
</sec>
</sec>
<sec>
<st>
<p>Feature Pre-selection and Clustering</p>
</st>
<p>In our method, we normalize the original data and then perform feature pre-selection and sample clustering to define within-cluster neighbors. We use the same normalization method as in <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>, which includes a base-10 log transformation followed by normalization to mean 0 and variance 1. For data sets that contain negative values, the log transformation is skipped.</p>
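<p>A minimal Python sketch of this normalization follows, assuming samples are rows and genes are columns. Two details are our assumptions rather than statements from the reference: the log is skipped whenever any value is non-positive (the log of zero is undefined), and standardization is applied to each gene across samples. The function name <it>normalize</it> is hypothetical.</p>
```python
import numpy as np


def normalize(X):
    """Log10-transform when all values are positive, then standardize each
    gene (column) to mean 0 and variance 1 across samples.

    Skipping the log whenever any value is non-positive, and standardizing
    per gene rather than per sample, are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    if (X > 0).all():
        X = np.log10(X)          # base-10 log transformation
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std      # zero mean, unit variance per gene
```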
<p>In microarray data, the number of genes is extremely large compared with the small number of samples. Many of these genes are not differentially expressed across classes and thus carry little useful information. Too many non-informative genes in the data are likely to degrade the clustering results, so we perform gene pre-selection before clustering. Another benefit of removing non-informative genes is that it drastically reduces the computational burden of subsequent processing.</p>
<p>Approaches for removing non-informative genes have been studied extensively, for instance the t-test <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>, mutual information (MI) maximization <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>, etc. Instead of these parametric methods, we use a nonparametric scoring algorithm presented in <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp>. For a binary problem involving two classes <it>&#969;</it><sub>0</sub> and <it>&#969;</it><sub>1</sub>, the score of a feature <it>&#981;</it><sub><it>m</it></sub> is</p>
<p>
<display-formula id="M8">
<graphic file="1748-7188-5-14-i14.gif"/>
</display-formula>
</p>
<p>where &#10214;A&#10215; equals 1 if A is true and 0 otherwise, and |<it>&#969;</it>| is the number of samples in <it>&#969;</it>.</p>
<p>Genes whose scores are below a threshold <it>&#952;</it><sub><it>p</it></sub> are removed, and the remaining genes are used for further processing. Compared with parametric methods such as the t-test and MI maximization, this method is less sensitive to outliers, since it does not rely on summary statistics of the data (mean, standard deviation, etc.), which can be strongly affected by outliers.</p>
<p>This nonparametric method can easily be generalized to multiclass problems by considering all possible binary (pairwise) cases. For a <it>K</it>-class problem, the score of a feature <it>&#981;</it><sub><it>m</it></sub> is</p>
<p>
<display-formula id="M9">
<graphic file="1748-7188-5-14-i15.gif"/>
</display-formula>
</p>
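<p>Because Equations (8) and (9) survive only as images, the Python sketch below is one plausible reading, not a transcription. It assumes the binary score is the fraction of cross-class pairs (<it>x</it><sub><it>i</it></sub> &#8712; <it>&#969;</it><sub>0</sub>, <it>x</it><sub><it>j</it></sub> &#8712; <it>&#969;</it><sub>1</sub>) for which the indicator &#10214;<it>&#981;</it><sub><it>m</it></sub>(<it>x</it><sub><it>i</it></sub>) &gt; <it>&#981;</it><sub><it>m</it></sub>(<it>x</it><sub><it>j</it></sub>)&#10215; holds, taken in whichever direction is larger so that scores lie in [0.5, 1], and that the multiclass score averages the binary scores over all pairs of classes. The helper names are hypothetical.</p>
```python
import itertools

import numpy as np


def pair_score(a, b):
    """Assumed binary score of one gene: fraction of pairs (x_i in omega_0,
    x_j in omega_1) with phi(x_i) > phi(x_j), taken in the larger direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    frac = (a[:, None] > b[None, :]).mean()   # mean of the indicator over all pairs
    return max(frac, 1.0 - frac)


def gene_scores(X, y):
    """Assumed multiclass score: average binary score over all class pairs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pairs = list(itertools.combinations(np.unique(y), 2))
    scores = np.zeros(X.shape[1])
    for k0, k1 in pairs:
        for m in range(X.shape[1]):
            scores[m] += pair_score(X[y == k0, m], X[y == k1, m])
    return scores / len(pairs)


def preselect(X, y, theta_p=0.7):
    """Indices of genes whose score is not below theta_p."""
    return np.flatnonzero(gene_scores(X, y) >= theta_p)
```
<p>Only ranks enter the score, so a single extreme expression value shifts it by at most a few pairs, which matches the robustness argument above.</p>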
<p>After gene pre-selection, we group the samples in each class into clusters. Although many clustering approaches are available, hierarchical clustering <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> is the most commonly used for microarray analysis, owing to its good performance <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp> and to the fact that it does not require the number of clusters to be specified in advance.</p>
<p>We use the most common, agglomerative type of hierarchical clustering. At the initial level, each sample forms its own cluster. At each subsequent level, the two 'nearest' clusters are merged into one larger cluster. We use <it>method = 'furthest'</it>, which means that the distance between two clusters is the maximum distance between any sample in one cluster and any sample in the other. The 'furthest' metric is used because it is less sensitive to outliers than other metrics such as 'nearest' and 'average'. We empirically set a threshold <it>&#952;</it><sub><it>h</it></sub> for clustering: for each class, the clustering procedure terminates when the distance between every pair of remaining clusters is larger than <it>&#952;</it><sub><it>h</it></sub>.</p>
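<p>The per-class clustering step can be sketched in Python as a naive agglomerative loop with complete ('furthest') linkage that stops once every pair of remaining clusters is farther apart than <it>&#952;</it><sub><it>h</it></sub>. We assume Euclidean distance on the pre-selected genes; the helper names are hypothetical. SciPy's scipy.cluster.hierarchy.linkage with method='complete' followed by fcluster with criterion='distance' should typically yield the same partition more efficiently.</p>
```python
import numpy as np


def complete_linkage_clusters(X, theta_h):
    """Agglomerative clustering with complete ('furthest') linkage.

    Merging stops once every pair of remaining clusters is farther apart
    than theta_h. Returns a list of clusters, each a list of row indices.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    clusters = [[i] for i in range(len(X))]     # every sample starts alone
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: largest distance between any two members
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or best[0] > d:
                    best = (d, a, b)
        d, a, b = best
        if d > theta_h:
            break                               # all clusters farther apart than theta_h
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                         # b > a, so index a stays valid
    return clusters


def cluster_labels(X, y, theta_h):
    """Cluster each class separately; return one globally unique id per sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.empty(len(X), dtype=int)
    next_id = 0
    for k in np.unique(y):
        idx = np.flatnonzero(y == k)
        for members in complete_linkage_clusters(X[idx], theta_h):
            labels[idx[members]] = next_id
            next_id += 1
    return labels
```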
</sec>
<sec>
<st>
<p>Additive optimization method</p>
</st>
<p>The NMM criterion is optimized additively, in an iterative procedure that selects one feature per iteration. Assuming that after the (<it>t</it> - 1)-th iteration the margin is <it>J</it><sub><it>t</it>-1</sub>, at iteration <it>t</it> the feature <it>h</it><sub><it>t</it></sub> is selected to maximize</p>
<p>
<display-formula id="M10">
<graphic file="1748-7188-5-14-i16.gif"/>
</display-formula>
</p>
<p>During the optimization procedure, however, each time a feature is selected, the nearest between-class neighbor and the furthest within-cluster neighbor of each sample might change. In other words, optimizing <it>J</it><sub><it>t</it></sub> might change <it>J</it><sub><it>t</it>-1</sub>, and for each sample, many other samples might become its nearest between-class neighbor or furthest within-cluster neighbor in subsequent iterations. We would therefore have to maintain the distance between every pair of samples at each iteration, which is computationally expensive. To reduce the computational complexity, we maximize the following formula instead of directly optimizing Equation (4),</p>
<p>
<display-formula id="M11">
<graphic file="1748-7188-5-14-i17.gif"/>
</display-formula>
</p>
<p>Proof 2 shows that Equation (11) is a lower bound of Equation (4), so Equation (4) can be increased by maximizing Equation (11).</p>
<sec>
<st>
<p>Proof 2</p>
</st>
<p>
<display-formula id="M12">
<graphic file="1748-7188-5-14-i18.gif"/>
</display-formula>
</p>
<p>With the criterion of Equation (11), at each iteration each feature can be evaluated independently to select the best one, regardless of the features selected in previous iterations. This implies that we could simply score every feature on the training set and select the top-ranked ones. However, <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp> observed that simply combining the top-ranked genes often does not form a good feature set. One reason is that the top-ranked genes can be highly correlated, so the selected features might contain much redundant information. To overcome this problem, similar to the way the boosting method <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp> does, we assign weights <inline-formula>
<graphic file="1748-7188-5-14-i19.gif"/>
</inline-formula> to the training samples. Initially all samples have the same weight. After each feature is selected, the weights are updated so that samples with larger margins receive lower weights, and vice versa. The weights of the samples are updated by</p>
<p>
<display-formula id="M13">
<graphic file="1748-7188-5-14-i20.gif"/>
</display-formula>
</p>
<p>where <inline-formula>
<graphic file="1748-7188-5-14-i21.gif"/>
</inline-formula>, and <it>&#945; </it>is a positive parameter. The additive optimization procedure is summarized in Figure <figr fid="F2">2</figr>.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Additive optimization of the NMM criterion</p></caption><text>
   <p><b>Additive optimization of the NMM criterion</b>. flag<sub><it>m </it></sub>is true if <it>&#981;</it><sub><it>m </it></sub>has already been selected, and false otherwise.</p>
</text><graphic file="1748-7188-5-14-2"/></fig>
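<p>Since Equations (11) and (13) are available only as images, the following Python sketch is one plausible reading of the additive procedure rather than the authors' code (their implementation is in Additional file 1). It assumes (i) squared Euclidean distance, under which &#920;<sub><it>i</it></sub> splits into a sum of per-gene terms; (ii) neighbors fixed once on all pre-selected genes, which is how we read the motivation for Equation (11); and (iii) an exponential weight update, <it>w</it><sub><it>i</it></sub> multiplied by exp(-<it>&#945;&#952;</it><sub><it>i</it></sub>) and renormalized. All names are hypothetical.</p>
```python
import numpy as np


def fixed_neighbors(X, y, clusters):
    """Nearest between-class and furthest within-cluster neighbor of each
    sample, computed once on all pre-selected genes (squared Euclidean)."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nb = np.empty(len(X), dtype=int)
    fw = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        other = np.flatnonzero(y != y[i])
        same = np.flatnonzero(clusters == clusters[i])
        nb[i] = other[np.argmin(D[i, other])]
        fw[i] = same[np.argmax(D[i, same])]
    return nb, fw


def anmm_select(X, y, clusters, n_features, alpha):
    """Greedy additive selection with boosting-style sample weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    clusters = np.asarray(clusters)
    nb, fw = fixed_neighbors(X, y, clusters)
    # theta[i, m]: margin of sample i on gene m alone; rows sum to Theta_i
    theta = (X - X[nb]) ** 2 - (X - X[fw]) ** 2
    w = np.full(len(X), 1.0 / len(X))        # equal initial weights
    flag = np.zeros(X.shape[1], dtype=bool)  # flag_m: gene m already selected
    selected = []
    for _ in range(n_features):
        score = w @ theta                    # weighted margin contributed by each gene
        score[flag] = -np.inf                # never select a gene twice
        m = int(np.argmax(score))
        selected.append(m)
        flag[m] = True
        # Samples with a large margin on gene m lose weight, and vice versa
        w = w * np.exp(-alpha * theta[:, m])
        w = w / w.sum()
    return selected
```
<p>Down-weighting samples that the chosen gene already separates steers later iterations toward genes that help the remaining, poorly separated samples, which is how the procedure avoids picking many redundant top-ranked genes.</p>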
</sec>
</sec>
</sec>
<sec>
<st>
<p>Case-Based Reasoning Classifier</p>
</st>
<p>Rather than using a traditional CBR method in which all samples form a single case-base, we treat the samples of each class as a separate case-base. For a <it>K</it>-class problem, there are <it>K </it>case-bases <inline-formula>
<graphic file="1748-7188-5-14-i22.gif"/>
</inline-formula>. Given an input sample <it>x</it>, ANMM4CBR retrieves several similar cases from each case-base. The distance between <it>x </it>and a sample <it>x' </it>in a case-base is measured by</p>
<p>
<display-formula id="M14">
<graphic file="1748-7188-5-14-i23.gif"/>
</display-formula>
</p>
<p>If the case-base <it>&#969;</it><sub><it>k</it></sub> contains <it>l</it><sub><it>k</it></sub> samples, the <it>&#946;</it>&#183;<it>l</it><sub><it>k</it></sub> cases with the smallest distances from <it>x </it>are retrieved as similar cases, where <it>&#946; </it>is a parameter that controls the number of samples retrieved from each case-base. The distance between <it>x </it>and <it>&#969;</it><sub><it>k</it></sub>, <it>D</it>(<it>x</it>, <it>&#969;</it><sub><it>k</it></sub>), is the average of the <it>&#946;</it>&#183;<it>l</it><sub><it>k</it></sub> retrieved distances. ANMM4CBR calculates the distance from <it>x </it>to each case-base <it>&#969;</it><sub><it>k</it></sub> and assigns <it>x </it>to the class with the minimum distance <it>D</it>(<it>x</it>, <it>&#969;</it><sub><it>k</it></sub>).</p>
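<p>The retrieve-and-reuse rule above can be sketched in a few lines of Python. This is a sketch under assumptions: Euclidean distance over the selected genes stands in for Equation (14), and because <it>&#946;</it>&#183;<it>l</it><sub><it>k</it></sub> need not be an integer, we round it up and retrieve at least one case. The function name <it>cbr_predict</it> is hypothetical.</p>
```python
import math

import numpy as np


def cbr_predict(x, X_train, y_train, genes, beta=0.3):
    """Assign x to the class whose case-base is closest on average.

    Euclidean distance over `genes` stands in for Equation (14); retrieving
    ceil(beta * l_k) cases (at least one) is an assumed rounding rule.
    """
    X_train = np.asarray(X_train, dtype=float)[:, genes]
    x = np.asarray(x, dtype=float)[genes]
    y_train = np.asarray(y_train)
    best_label, best_dist = None, math.inf
    for k in np.unique(y_train):
        case_base = X_train[y_train == k]                # one case-base per class
        d = np.sqrt(((case_base - x) ** 2).sum(axis=1))  # distance to every case
        n_ret = max(1, math.ceil(beta * len(case_base)))
        D_k = np.sort(d)[:n_ret].mean()                  # average over retrieved cases
        if best_dist > D_k:
            best_label, best_dist = k, D_k
    return best_label
```
<p>Retrieving a fixed fraction of each case-base, rather than a fixed number of neighbors overall, keeps small classes from being outvoted by large ones.</p>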
</sec>
</sec>
<sec>
<st>
<p>Results and Discussion</p>
</st>
<p>We carried out experiments on simulated data as well as real microarray data to test the performance of ANMM4CBR. Four parameters must be chosen in ANMM4CBR: the gene pre-selection threshold <it>&#952;</it><sub><it>p</it></sub>, the cluster-stopping threshold <it>&#952;</it><sub><it>h</it></sub>, the weight-updating parameter <it>&#945;</it>, and the case-retrieval parameter <it>&#946;</it>. We empirically set <it>&#952;</it><sub><it>p</it></sub> and <it>&#946; </it>to 0.7 and 0.3 respectively, which means that genes with scores smaller than 0.7 are removed in the gene pre-selection procedure, and CBR retrieves 0.3|<it>&#969;</it>| cases from a case-base that contains |<it>&#969;</it>| cases. The other two parameters, <it>&#952;</it><sub><it>h</it></sub> and <it>&#945;</it>, are data-dependent, so we chose them by cross-validation. After the whole data set was split into training and test sets, we used five-fold cross-validation on the training set to evaluate the performance of ANMM4CBR with different values of <it>&#952;</it><sub><it>h</it></sub> and <it>&#945;</it>. The best combination of <it>&#952;</it><sub><it>h</it></sub> and <it>&#945; </it>was then used to train an ANMM4CBR classifier on all training samples. The candidate values are 0.8, 0.9, &#8943;, 1.5 for <it>&#952;</it><sub><it>h</it></sub> and 0.3, 0.4, &#8943;, 1.0 for <it>&#945;</it>. Please see additional file <supplr sid="S1">1</supplr> for the source code of the ANMM4CBR method.</p>
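<p>The tuning protocol can be sketched in Python as a grid search over the 8 &#215; 8 candidate pairs with five-fold cross-validation on the training set. The callback that trains and evaluates ANMM4CBR is hypothetical, and the random, unstratified fold assignment is our assumption.</p>
```python
import itertools

import numpy as np

THETA_H_GRID = [round(0.8 + 0.1 * i, 1) for i in range(8)]   # 0.8, 0.9, ..., 1.5
ALPHA_GRID = [round(0.3 + 0.1 * i, 1) for i in range(8)]     # 0.3, 0.4, ..., 1.0


def select_parameters(n_train, cv_error, n_folds=5, seed=0):
    """Pick (theta_h, alpha) with the lowest mean error over n_folds folds.

    cv_error(train_idx, val_idx, theta_h, alpha) is a hypothetical callback that
    trains ANMM4CBR on train_idx and returns its error rate on val_idx.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_train), n_folds)
    best = None
    for theta_h, alpha in itertools.product(THETA_H_GRID, ALPHA_GRID):
        errors = []
        for f in range(n_folds):
            val_idx = folds[f]
            train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            errors.append(cv_error(train_idx, val_idx, theta_h, alpha))
        mean_error = float(np.mean(errors))
        if best is None or best[0] > mean_error:
            best = (mean_error, theta_h, alpha)
    return best[1], best[2]
```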
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>We provide the source code and a readme file as an additional file.</b> The code was compiled with Visual Studio 2005.</p>
</text>
<file name="1748-7188-5-14-S1.ZIP">
   <p>Click here for file</p>
</file>
</suppl>
<sec>
<st>
<p>Simulation</p>
</st>
<p>We first consider simulated data, using a noisy version of the simulated data in <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. The original data assume three different normal distributions for both insignificant genes (null cases) and significant genes. The data set contains 72 samples (47 positive and 25 negative) and 1000 genes, of which 10 are significantly differentially expressed. Please refer to <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp> for more details of these data.</p>
<p>We compared ANMM4CBR with several typical classification methods, including support vector machine (SVM) <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp> with a linear kernel, <it>k</it>-nearest neighbor (<it>k</it>NN, with <it>k </it>= 3), and LogitBoost <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. Of these three algorithms, only LogitBoost combines feature selection with classification, so SVM and <it>k</it>NN require a separate feature selection step. We tested two feature selection methods. One is the between-group to within-group (BW) ratio method described in <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. The BW ratio for gene <it>m </it>is</p>
<p>
<display-formula id="M15">
<graphic file="1748-7188-5-14-i24.gif"/>
</display-formula>
</p>
<p>where <it>x</it>(<it>m</it>) and <it>x</it><sub><it>k</it></sub>(<it>m</it>) denote the average expression value of gene <it>m </it>across all samples and across the samples of class <it>k</it>, respectively, <it>x</it><sub><it>i</it>, <it>m</it></sub> is the expression value of gene <it>m </it>in the <it>i</it>th sample, and &#10214;&#183;&#10215; is the indicator function introduced with Equation (8). The other feature selection method is the Minimum Redundancy - Maximum Relevance (MRMR) method proposed in <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>, which has proved very effective for microarray data analysis. Rather than simply picking the top-ranked genes, MRMR also minimizes redundant information in the selected genes by measuring the correlations between genes. We used the FCQ criterion to optimize MRMR, which computes the maximum relevance <it>V</it><sub><it>F</it></sub> with the F-test and the minimum redundancy <it>W</it><sub><it>c</it></sub> with the Pearson correlation coefficient, and combines them as a quotient, max(<it>V</it><sub><it>F</it></sub>/<it>W</it><sub><it>c</it></sub>).</p>
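<p>For reference, the two baseline selectors can be sketched in Python. <it>bw_ratio</it> computes the between-group over within-group sum of squares for each gene. <it>mrmr_fcq</it> is a greedy FCQ variant in which the one-way ANOVA F statistic is obtained from the BW ratio as BW&#183;(<it>n</it> - <it>K</it>)/(<it>K</it> - 1), and redundancy is taken as the mean absolute Pearson correlation with the genes already selected; these details, and both function names, are our assumptions rather than the reference implementation.</p>
```python
import numpy as np


def bw_ratio(X, y):
    """Between-group to within-group sum-of-squares ratio of each gene."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        centroid = Xk.mean(axis=0)
        between += len(Xk) * (centroid - overall) ** 2
        within += ((Xk - centroid) ** 2).sum(axis=0)
    return between / within


def mrmr_fcq(X, y, n_features):
    """Greedy MRMR (FCQ): F-test relevance divided by the mean absolute
    Pearson correlation with the genes already selected."""
    X = np.asarray(X, dtype=float)
    n, K = len(X), len(np.unique(y))
    F = bw_ratio(X, y) * (n - K) / (K - 1)       # one-way ANOVA F statistic
    corr = np.abs(np.corrcoef(X, rowvar=False))  # gene-by-gene |Pearson r|
    selected = [int(np.argmax(F))]               # start from the most relevant gene
    while n_features > len(selected):
        # Guard against division by zero for genes uncorrelated with the set
        redundancy = np.maximum(corr[:, selected].mean(axis=1), 1e-12)
        score = F / redundancy
        score[selected] = -np.inf                # never pick a gene twice
        selected.append(int(np.argmax(score)))
    return selected
```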
<p>The simulated data was randomly divided into three equal parts, two of which were used for training and the third for testing. In each experiment we constructed a noisy training set by assigning a randomly chosen, incorrect label to 20% of the training samples. We used noisy labels to test how the algorithms perform in the presence of noise, which is common in real microarray data. A second reason is that, without noise in the training data, all algorithms used in this paper reached 100% test accuracy when an appropriate number of features was chosen. The classifiers were trained on the noisy training samples, and the test error rates were computed on the test samples. To obtain more replicable results <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>, we repeated this procedure 100 times. We also investigated the performance of ANMM4CBR without feature pre-selection and sample clustering.</p>
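<p>The label-corruption step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name add_label_noise and the 60-sample toy label vector are hypothetical, and each corrupted sample is assumed to receive a label drawn uniformly from the incorrect classes.</p>

```python
import numpy as np


def add_label_noise(y, fraction, rng):
    """Give a randomly chosen, incorrect label to `fraction` of the samples in y.

    Returns the noisy labels and the indices of the relabelled samples.
    """
    y_noisy = y.copy()
    classes = np.unique(y)
    n_flip = int(round(fraction * len(y)))
    flipped = rng.choice(len(y), size=n_flip, replace=False)   # no sample relabelled twice
    for i in flipped:
        wrong_labels = classes[classes != y[i]]                # every label except the true one
        y_noisy[i] = rng.choice(wrong_labels)
    return y_noisy, flipped


# Toy usage: 60 hypothetical training labels, 20% of them corrupted.
rng = np.random.default_rng(0)
y_train = np.repeat([0, 1], 30)
y_noisy, flipped = add_label_noise(y_train, 0.20, rng)
print(len(flipped), int(np.sum(y_noisy != y_train)))          # 12 12
```

<p>Only the training labels are corrupted; the test labels stay clean, so test accuracy measures how well a classifier recovers the true class structure despite the noise.</p>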
<p>Figure <figr fid="F3">3</figr> shows the distribution of the training samples on the top 3 features selected by different feature selection methods. The BW method does not separate the two classes well, because the mislabeled samples (mis-specifications) make the data inseparable under the BW criterion. In the ANMM method, the samples of each class were clustered into many groups, as illustrated in Figure <figr fid="F3">3(c)</figr>. The mislabeled samples fell into different groups from the other samples, so they had little influence on the feature selection procedure. Figure <figr fid="F3">3(c)</figr> also shows that, apart from the mislabeled samples, the training samples of different classes were well separated. The result of ANMM without feature pre-selection and sample clustering is shown in Figure <figr fid="F3">3(b)</figr>. It is even worse than that obtained by BW, which shows that feature pre-selection and sample clustering substantially improve the performance of ANMM on noisy data.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Visualization of training samples using top 3 selected features by different feature selection methods</p></caption><text>
   <p><b>Visualization of training samples using top 3 selected features by different feature selection methods</b>. The feature selection methods are: (a) BW, (b) ANMM without feature pre-selection and sample clustering, (c) ANMM. Results of MRMR are omitted due to space limitations; Figure 4 shows that MRMR did not perform better than BW on this data set. In these figures, different marker types represent samples of different classes, and the mislabeled samples (mis-specifications) are drawn with red edges. In (c), samples in different clusters are filled with different colors.</p>
</text><graphic file="1748-7188-5-14-3"/></fig>
<p>Boxplots of the accuracy of the various methods are shown in Figure <figr fid="F4">4</figr>. For each method, the number of features was chosen to minimize the average error rate. ANMM4CBR achieved much higher accuracy than the other methods. Since all approaches reach 100% test accuracy when no noise is added to the training data, this shows that ANMM4CBR is very robust to noisy data, whereas the performance of the other methods is undermined by the noise in the training samples.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Boxplots of the accuracy on simulated data</p></caption><text>
   <p><b>Boxplots of the accuracy on simulated data</b>. "Values" denotes the accuracy. Each column corresponds to one algorithm: 1 - BW+<it>k</it>NN; 2 - MRMR+<it>k</it>NN; 3 - BW+SVM; 4 - MRMR+SVM; 5 - LogitBoost; 6 - ANMM4CBR without feature pre-selection and sample clustering; 7 - ANMM4CBR.</p>
</text><graphic file="1748-7188-5-14-4"/></fig>
</sec>
<sec>
<st>
<p>Real Data</p>
</st>
<sec>
<st>
<p>Data sets and experimental set up</p>
</st>
<p>In this section we carry out experiments on four widely studied, publicly available real data sets, which are briefly described below. Please refer to the original papers for more details of each data set.</p>
<sec>
<st>
<p>Leukemia</p>
</st>
<p>This data set comes from a study <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> of 72 leukemia patients profiled with the Affymetrix HuGeneFL array. It contains 47 cases of acute lymphoblastic leukemia (ALL) and 25 cases of acute myeloid leukemia (AML), with the expression levels of 7,129 genes.</p>
</sec>
<sec>
<st>
<p>Colon</p>
</st>
<p>The Colon data contains the expression levels of 40 tumor and 22 normal colon tissue samples, measured with an Affymetrix oligonucleotide array complementary to more than 6,500 human genes. We used the 2,000 genes with the highest minimal intensity across the samples, as selected by <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>SRBCT</p>
</st>
<p>The SRBCT data <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> contains gene expression data of 2,308 genes from cDNA microarrays. The 63 samples include four subtypes of small, round blue cell tumors of childhood: 12 neuroblastoma (NB), 20 rhabdomyosarcoma (RMS), 8 non-Hodgkin lymphoma (NHL), and 23 Ewing family of tumors (EWS).</p>
</sec>
<sec>
<st>
<p>GCM</p>
</st>
<p>GCM (Global Cancer Map) <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> is a very complex data set consisting of 198 human tumor samples covering 14 different cancer types, with the expression levels of 16,063 genes. Please refer to <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> for details of this data set.</p>
<p>Each experiment followed the same procedure as on the simulated data: each data set was randomly split into three parts, two for training and the remaining one for testing. For each method, this procedure was repeated 100 times, and the mean and standard deviation of the accuracy were used for performance evaluation.</p>
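<p>The repeated random-split evaluation can be sketched as a small harness that returns the mean and standard deviation of the test accuracy. This is an illustration under assumptions, not the authors' code: the names repeated_holdout and nearest_centroid_predict are hypothetical, the nearest-centroid rule is only a stand-in for whichever classifier is being evaluated (it is not one of the methods compared here), and the population standard deviation (NumPy's default) is assumed because the text does not specify the estimator.</p>

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test sample to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in classes])
    sq_dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(sq_dist, axis=1)]


def repeated_holdout(X, y, predict, n_runs=100, seed=0):
    """Repeat a random split into thirds (two for training, one for testing) and
    return the mean and standard deviation of the test accuracy in percent."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        parts = np.array_split(rng.permutation(len(y)), 3)
        train, test = np.concatenate(parts[:2]), parts[2]
        y_pred = predict(X[train], y[train], X[test])
        accuracies.append(100.0 * np.mean(y_pred == y[test]))
    # np.std uses ddof=0 by default; the text does not say which estimator was used
    return float(np.mean(accuracies)), float(np.std(accuracies))


# Toy usage on hypothetical, well-separated two-class data.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 20)) + 2.0 * y[:, None]
mean_acc, std_acc = repeated_holdout(X, y, nearest_centroid_predict)
print(f"{mean_acc:.1f} +/- {std_acc:.1f}")
```

<p>Passing the classifier as a function keeps the split-and-average logic identical across methods, which is what makes the mean and standard deviation in Table 1 comparable.</p>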
</sec>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>As on the simulated data, we compared ANMM4CBR with SVM, <it>k</it>NN and LogitBoost, using BW and MRMR to select features for SVM and <it>k</it>NN classification. Since the standard SVM is designed for binary classification, on multiclass data sets we used the one-versus-all (OVA) <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> approach, which first solves several binary problems and then combines their results to solve the multiclass problem. Given a <it>k</it>-class problem, OVA trains <it>k </it>binary classifiers, each of which separates one class from all the others. A new sample is assigned the class label of the classifier with the largest real-valued output among the <it>k </it>classifiers. For LogitBoost, we followed the approach of <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>, in which multiclass problems are solved by combining OVA results in a Bayesian framework.</p>
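<p>The OVA decision rule can be sketched as follows. In this hypothetical illustration each binary classifier is a least-squares linear scorer fitted to +1/-1 targets, used only as a stand-in for the binary SVMs in the experiments; the names ova_fit and ova_predict are invented for the example, and the Bayesian combination used for LogitBoost is not modeled.</p>

```python
import numpy as np


def add_bias(X):
    """Append a constant column so each linear scorer has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def ova_fit(X, y):
    """Train one binary scorer per class: class k (+1) against all other classes (-1).

    Each scorer here is a least-squares linear fit, a stand-in for a binary SVM.
    """
    classes = np.unique(y)
    Xb = add_bias(X)
    weights = []
    for k in classes:
        target = np.where(y == k, 1.0, -1.0)
        w, *_ = np.linalg.lstsq(Xb, target, rcond=None)
        weights.append(w)
    return classes, np.column_stack(weights)          # one weight column per class


def ova_predict(X, classes, W):
    """Label each sample with the class whose scorer gives the largest real-valued output."""
    scores = add_bias(X) @ W                          # shape: (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]


# Toy usage: three hypothetical, well-separated classes in two dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 6.0], [6.0, 0.0], [-6.0, -6.0]])
y = np.repeat([0, 1, 2], 20)
X = centers[y] + rng.normal(size=(60, 2))
classes, W = ova_fit(X, y)
print(np.mean(ova_predict(X, classes, W) == y))
```

<p>Because the final label is an argmax over real-valued outputs, the binary scorers must produce outputs on a comparable scale; this is one reason OVA is usually paired with margin-based or probabilistic binary classifiers.</p>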
<p>Table <tblr tid="T1">1</tblr> gives the classification results of the six methods on the four microarray data sets. The algorithms perform differently on different data sets. On the Leukemia data, all methods achieve comparable results, with ANMM4CBR and MRMR+SVM performing slightly better. On the Colon data, ANMM4CBR achieves the highest accuracy for every number of selected features, roughly 1 to 2 percentage points above the best competing method. On the SRBCT data, the ranking depends on the number of features: when the feature number is small, SVM and LogitBoost perform better than ANMM4CBR, whereas when it is large, ANMM4CBR performs better. Table <tblr tid="T1">1</tblr> also shows that the results of ANMM4CBR on GCM are not encouraging; on this data set SVM performs better than the other algorithms for most feature numbers.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Average classification accuracy and standard deviation. </p></caption><tblbdy cols="7">
      <r>
         <c cspan="2" ca="center">
            <p>
               <b># Iterations (features per classifier)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>10</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>20</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>30</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>40</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>50</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Leukemia</p>
         </c>
         <c ca="center">
            <p>BW+<it>k</it>NN</p>
         </c>
         <c ca="center">
            <p>95.7 &#177; 1.2</p>
         </c>
         <c ca="center">
            <p>96.9 &#177; 1.8</p>
         </c>
         <c ca="center">
            <p>96.6 &#177; 2.2</p>
         </c>
         <c ca="center">
            <p>96.6 &#177; 1.2</p>
         </c>
         <c ca="center">
            <p>96.8 &#177; 1.7</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+<it>k</it>NN</p>
         </c>
         <c ca="center">
            <p><b>96.5 </b>&#177; <b>2.5</b></p>
         </c>
         <c ca="center">
            <p>96.4 &#177; 2.1</p>
         </c>
         <c ca="center">
            <p>97.4 &#177; 1.7</p>
         </c>
         <c ca="center">
            <p>96.9 &#177; 2.2</p>
         </c>
         <c ca="center">
            <p>95.8 &#177; 2.4</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>BW+SVM</p>
         </c>
         <c ca="center">
            <p>95.6 &#177; 1.3</p>
         </c>
         <c ca="center">
            <p>95.7 &#177; 1.7</p>
         </c>
         <c ca="center">
            <p>95.9 &#177; 2.2</p>
         </c>
         <c ca="center">
            <p>96.2 &#177; 2.3</p>
         </c>
         <c ca="center">
            <p>96.9 &#177; 1.2</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+SVM</p>
         </c>
         <c ca="center">
            <p>96.4 &#177; 2.5</p>
         </c>
         <c ca="center">
            <p>96.8 &#177; 3.6</p>
         </c>
         <c ca="center">
            <p><b>97.6 </b>&#177; <b>2.0</b></p>
         </c>
         <c ca="center">
            <p><b>97.1 </b>&#177; <b>2.7</b></p>
         </c>
         <c ca="center">
            <p>96.8 &#177; 3.4</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>LogitBoost</p>
         </c>
         <c ca="center">
            <p>95.3 &#177; 2.9</p>
         </c>
         <c ca="center">
            <p>96.0 &#177; 2.4</p>
         </c>
         <c ca="center">
            <p>96.6 &#177; 1.8</p>
         </c>
         <c ca="center">
            <p>96.6 &#177; 2.8</p>
         </c>
         <c ca="center">
            <p>96.7 &#177; 1.7</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>ANMM4CBR</p>
         </c>
         <c ca="center">
            <p>96.3 &#177; 2.4</p>
         </c>
         <c ca="center">
            <p><b>97.5 </b>&#177; <b>1.7</b></p>
         </c>
         <c ca="center">
            <p>97.3 &#177; 1.8</p>
         </c>
         <c ca="center">
            <p>96.6 &#177; 1.7</p>
         </c>
         <c ca="center">
            <p><b>97.0 </b>&#177; <b>2.3</b></p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Colon</p>
         </c>
         <c ca="center">
            <p>BW+<it>k</it>NN</p>
         </c>
         <c ca="center">
            <p>81.2 &#177; 8.1</p>
         </c>
         <c ca="center">
            <p>82.8 &#177; 7.5</p>
         </c>
         <c ca="center">
            <p>83.5 &#177; 4.2</p>
         </c>
         <c ca="center">
            <p>83.4 &#177; 5.3</p>
         </c>
         <c ca="center">
            <p>83.6 &#177; 6.5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+<it>k</it>NN</p>
         </c>
         <c ca="center">
            <p>83.7 &#177; 4.3</p>
         </c>
         <c ca="center">
            <p>83.6 &#177; 7.9</p>
         </c>
         <c ca="center">
            <p>84.2 &#177; 6.0</p>
         </c>
         <c ca="center">
            <p>83.8 &#177; 5.9</p>
         </c>
         <c ca="center">
            <p>83.5 &#177; 6.9</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>BW+SVM</p>
         </c>
         <c ca="center">
            <p>84.0 &#177; 4.3</p>
         </c>
         <c ca="center">
            <p>83.6 &#177; 6.4</p>
         </c>
         <c ca="center">
            <p>83.6 &#177; 6.0</p>
         </c>
         <c ca="center">
            <p>84.2 &#177; 7.2</p>
         </c>
         <c ca="center">
            <p>84.5 &#177; 7.9</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+SVM</p>
         </c>
         <c ca="center">
            <p>85.4 &#177; 5.8</p>
         </c>
         <c ca="center">
            <p>84.1 &#177; 6.6</p>
         </c>
         <c ca="center">
            <p>84.0 &#177; 4.0</p>
         </c>
         <c ca="center">
            <p>84.6 &#177; 7.0</p>
         </c>
         <c ca="center">
            <p>84.7 &#177; 8.1</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>LogitBoost</p>
         </c>
         <c ca="center">
            <p>84.4 &#177; 4.3</p>
         </c>
         <c ca="center">
            <p>84.5 &#177; 8.9</p>
         </c>
         <c ca="center">
            <p>83.6 &#177; 4.9</p>
         </c>
         <c ca="center">
            <p>84.2 &#177; 6.8</p>
         </c>
         <c ca="center">
            <p>84.1 &#177; 4.6</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>ANMM4CBR</p>
         </c>
         <c ca="center">
            <p><b>86.3 </b>&#177; <b>6.1</b></p>
         </c>
         <c ca="center">
            <p><b>86.7 </b>&#177; <b>5.6</b></p>
         </c>
         <c ca="center">
            <p><b>86.2 </b>&#177; <b>4.2</b></p>
         </c>
         <c ca="center">
            <p><b>86.5 </b>&#177; <b>5.6</b></p>
         </c>
         <c ca="center">
            <p><b>85.6 </b>&#177; <b>4.4</b></p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>SRBCT</p>
         </c>
         <c ca="center">
            <p>BW+<it>k</it>NN (50)</p>
         </c>
         <c ca="center">
            <p>94.4 &#177; 4.2</p>
         </c>
         <c ca="center">
            <p>97.7 &#177; 2.1</p>
         </c>
         <c ca="center">
            <p>97.9 &#177; 1.3</p>
         </c>
         <c ca="center">
            <p>98.2 &#177; 1.6</p>
         </c>
         <c ca="center">
            <p>98.0 &#177; 1.2</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+<it>k</it>NN (50)</p>
         </c>
         <c ca="center">
            <p>78.4 &#177; 9.0</p>
         </c>
         <c ca="center">
            <p>97.4 &#177; 1.9</p>
         </c>
         <c ca="center">
            <p>98.6 &#177; 1.0</p>
         </c>
         <c ca="center">
            <p>98.8 &#177; 0.9</p>
         </c>
         <c ca="center">
            <p>98.2 &#177; 0.8</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>BW+SVM (97)</p>
         </c>
         <c ca="center">
            <p>94.0 &#177; 3.2</p>
         </c>
         <c ca="center">
            <p>98.0 &#177; 1.4</p>
         </c>
         <c ca="center">
            <p>98.4 &#177; 1.2</p>
         </c>
         <c ca="center">
            <p>98.8 &#177; 0.9</p>
         </c>
         <c ca="center">
            <p>99.2 &#177; 0.3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+SVM (95)</p>
         </c>
         <c ca="center">
            <p>81.0 &#177; 10.5</p>
         </c>
         <c ca="center">
            <p><b>98.2 </b>&#177; <b>1.0</b></p>
         </c>
         <c ca="center">
            <p><b>98.9 </b>&#177; <b>1.3</b></p>
         </c>
         <c ca="center">
            <p>99.1 &#177; 0.7</p>
         </c>
         <c ca="center">
            <p>99.2 &#177; 0.2</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>LogitBoost (102)</p>
         </c>
         <c ca="center">
            <p><b>94.9 </b>&#177; <b>3.1</b></p>
         </c>
         <c ca="center">
            <p>97.3 &#177; 1.8</p>
         </c>
         <c ca="center">
            <p>98.0 &#177; 1.6</p>
         </c>
         <c ca="center">
            <p>98.6 &#177; 1.1</p>
         </c>
         <c ca="center">
            <p>98.6 &#177; 0.6</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>ANMM4CBR (50)</p>
         </c>
         <c ca="center">
            <p>90.3 &#177; 5.5</p>
         </c>
         <c ca="center">
            <p>97.3 &#177; 1.5</p>
         </c>
         <c ca="center">
            <p>98.8 &#177; 1.2</p>
         </c>
         <c ca="center">
            <p><b>99.3 </b>&#177; <b>0.7</b></p>
         </c>
         <c ca="center">
            <p><b>99.7 </b>&#177; <b>0.3</b></p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>GCM</p>
         </c>
         <c ca="center">
            <p>BW+<it>k</it>NN (50)</p>
         </c>
         <c ca="center">
            <p>46.2 &#177; 4.7</p>
         </c>
         <c ca="center">
            <p>47.4 &#177; 7.0</p>
         </c>
         <c ca="center">
            <p>51.2 &#177; 4.9</p>
         </c>
         <c ca="center">
            <p>52.6 &#177; 6.2</p>
         </c>
         <c ca="center">
            <p>54.1 &#177; 5.8</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+<it>k</it>NN (50)</p>
         </c>
         <c ca="center">
            <p>41.1 &#177; 7.1</p>
         </c>
         <c ca="center">
            <p>42.7 &#177; 8.1</p>
         </c>
         <c ca="center">
            <p>51.5 &#177; 1.6</p>
         </c>
         <c ca="center">
            <p>58.3 &#177; 4.9</p>
         </c>
         <c ca="center">
            <p>60.5 &#177; 5.9</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>BW+SVM (254)</p>
         </c>
         <c ca="center">
            <p>53.7 &#177; 5.1</p>
         </c>
         <c ca="center">
            <p>58.1 &#177; 9.8</p>
         </c>
         <c ca="center">
            <p>59.0 &#177; 6.6</p>
         </c>
         <c ca="center">
            <p><b>66.6 </b>&#177; <b>6.7</b></p>
         </c>
         <c ca="center">
            <p>66.9 &#177; 3.6</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>MRMR+SVM (259)</p>
         </c>
         <c ca="center">
            <p>51.0 &#177; 7.7</p>
         </c>
         <c ca="center">
            <p><b>60.3 </b>&#177; <b>7.0</b></p>
         </c>
         <c ca="center">
            <p><b>61.8 </b>&#177; <b>3.7</b></p>
         </c>
         <c ca="center">
            <p>64.8 &#177; 8.2</p>
         </c>
         <c ca="center">
            <p><b>67.8 </b>&#177; <b>4.6</b></p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>LogitBoost (273)</p>
         </c>
         <c ca="center">
            <p><b>57.1 </b>&#177; <b>4.9</b></p>
         </c>
         <c ca="center">
            <p>60.1 &#177; 1.9</p>
         </c>
         <c ca="center">
            <p>60.6 &#177; 4.0</p>
         </c>
         <c ca="center">
            <p>62.1 &#177; 5.7</p>
         </c>
         <c ca="center">
            <p>65.1 &#177; 5.4</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>ANMM4CBR (50)</p>
         </c>
         <c ca="center">
            <p>41.1 &#177; 1.2</p>
         </c>
         <c ca="center">
            <p>51.0 &#177; 8.1</p>
         </c>
         <c ca="center">
            <p>57.2 &#177; 6.9</p>
         </c>
         <c ca="center">
            <p>61.1 &#177; 1.4</p>
         </c>
         <c ca="center">
            <p>63.3 &#177; 3.9</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Each experiment was repeated for 100 runs. The best result in each column for each data set is shown in bold. Here the iteration number is the number of features used by each single classifier. In the OVA case, the total number of genes may exceed the iteration number, since OVA solves a multiclass problem by combining several binary ones. The numbers in parentheses are the average numbers of features selected by each method when the iteration number is 50. See Table 2 for an additional experiment on the multiclass GCM data set.</p>
   </tblfn></tbl>
<p>We now take a closer look at the results in Table <tblr tid="T1">1</tblr>. ANMM4CBR performs consistently better than all the other algorithms on the Colon data, while it achieves only comparable results on the Leukemia data. This is likely because Leukemia is a relatively easy data set on which many algorithms have reported impressive results, so it is not surprising that all six algorithms in our experiment achieve similarly good results. In contrast, it was reported in <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp> that the Colon data might suffer from sample contamination, and therefore the better performance of ANMM4CBR on the Colon data demonstrates its robustness to noise in the data sets.</p>
<p>Although ANMM4CBR performs best on SRBCT when 40 or 50 features are used, on the two multiclass data sets it generally does not achieve results comparable to those of SVM and LogitBoost. Table <tblr tid="T1">1</tblr> shows that SVM and LogitBoost perform better than ANMM4CBR, and that ANMM4CBR performs better than <it>k</it>NN in most settings. However, we argue that this does not mean that ANMM4CBR cannot achieve good results on multiclass problems. Note that, like <it>k</it>NN, ANMM4CBR can be applied directly to a multiclass problem, so in ANMM4CBR the number of iterations equals the number of selected features. In contrast, SVM and LogitBoost use the OVA method to make the final prediction, which requires solving <it>k </it>binary problems, where <it>k</it> is the number of classes. When each binary classifier selects <it>s </it>features, the total number of selected features is at most <it>s </it>&#215; <it>k</it>, i.e. <it>O</it>(<it>s </it>&#215; <it>k</it>). This means that, with the same iteration number, SVM and LogitBoost use more features than ANMM4CBR and <it>k</it>NN.</p>
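<p>The effect of OVA on the total feature count can be illustrated by taking the union of the genes chosen by each binary problem. In this hypothetical sketch each one-versus-all problem keeps the s genes with the largest absolute class-mean difference, a simple stand-in score rather than BW or MRMR, on random toy data with 14 classes (the number of classes in GCM). The number T of distinct genes always lies between s and s &#215; k; on GCM, Table 2 reports T = 86 for s = 10 (upper bound 140) and T = 259 for s = 50 (upper bound 700), so the binary classifiers share many genes but still use far more than s in total.</p>

```python
import numpy as np


def top_s_per_binary_problem(X, y, s):
    """For each one-versus-all problem, keep the s genes with the largest absolute
    difference between the class-k mean and the mean of all other samples.
    (This score is a simple stand-in, not the BW or MRMR criterion.)"""
    selections = []
    for k in np.unique(y):
        in_k = y == k
        score = np.abs(X[in_k].mean(axis=0) - X[~in_k].mean(axis=0))
        selections.append(set(np.argsort(score)[::-1][:s].tolist()))
    return selections


# Toy usage: random data with 14 classes, the number of classes in GCM.
rng = np.random.default_rng(0)
X = rng.normal(size=(140, 1000))
y = np.repeat(np.arange(14), 10)
s = 10
selections = top_s_per_binary_problem(X, y, s)
T = len(set().union(*selections))        # distinct genes over all k binary classifiers
print(s, T, s * len(selections))         # T lies between s and s * k
```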
<p>We therefore performed an additional experiment on GCM, comparing ANMM4CBR with MRMR+SVM, which showed the best performance on the GCM data in Table <tblr tid="T1">1</tblr>. In each comparison of this experiment, the number of features selected by ANMM4CBR was set equal to the total number of distinct genes selected by all the binary classifiers. Because the experiment was repeated 100 times and the total gene number may differ between runs, we first carried out the SVM experiments and then calculated the total number of genes. The results, shown in Table <tblr tid="T2">2</tblr>, demonstrate that ANMM4CBR outperforms SVM in every setting when both use the same number of genes, by a large margin when few genes are used (62.7% versus 51.0% for <it>s</it> = 10) and by a smaller one as <it>s</it> increases (70.0% versus 67.8% for <it>s</it> = 50).</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Comparison of MRMR+SVM and ANMM4CBR on GCM data. </p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b><it>s</it>/<it>T</it></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>10/86</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>20/157</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>30/209</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>40/243</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>50/259</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>SVM+MRMR</p>
         </c>
         <c ca="center">
            <p>51.0 &#177; 3.7</p>
         </c>
         <c ca="center">
            <p>60.3 &#177; 4.0</p>
         </c>
         <c ca="center">
            <p>61.8 &#177; 2.4</p>
         </c>
         <c ca="center">
            <p>64.8 &#177; 4.5</p>
         </c>
         <c ca="center">
            <p>67.8 &#177; 3.5</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>ANMM4CBR</p>
         </c>
         <c ca="center">
            <p><b>62.7 </b>&#177; <b>4.8</b></p>
         </c>
         <c ca="center">
            <p><b>66.1 </b>&#177; <b>2.4</b></p>
         </c>
         <c ca="center">
            <p><b>67.9 </b>&#177; <b>3.5</b></p>
         </c>
         <c ca="center">
            <p><b>69.1 </b>&#177; <b>1.9</b></p>
         </c>
         <c ca="center">
            <p><b>70.0 </b>&#177; <b>2.9</b></p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p><it>s </it>is the number of genes in each binary SVM classifier, and <it>T </it>is the total number of distinct genes, <it>i.e</it>. the number of genes used by ANMM4CBR. In each column the higher accuracy is shown in bold.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Comparison with MOE4CBR</p>
</st>
<p>Since ANMM4CBR is a CBR-based method, we would like to compare it with other CBR methods that have been applied to microarray classification problems. Because neither the source code nor the data sets used in <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp> are available, we did not compare our method with the Gene-CBR method of <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>. Instead, we compared ANMM4CBR with the mixture of experts for case-based reasoning (MOE4CBR) method <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, which builds CBR classifiers based on the idea of a mixture of experts. We applied ANMM4CBR to the same microarray data with the same experimental setup as in <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, <it>i.e</it>., using the training and test sets suggested in <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> for the Leukemia data, and using leave-one-out cross-validation on the Lung data with the results averaged over 20 trials. The Lung data contains 39 lung cancer samples with 18,117 gene expression levels, divided into two categories: recurrence (23 samples) and nonrecurrence (16 samples). The Lung data was not used in the previous experiments because it contains missing values. As in <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, missing values were imputed using the weighted <it>k</it>-nearest neighbor method <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>.</p>
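<p>Weighted <it>k</it>-nearest neighbor imputation can be sketched as below, in the spirit of the method cited above rather than as its exact implementation. Assumptions, all hypothetical: the matrix is arranged as genes x samples; a missing value of gene g in sample j is estimated from the k most similar genes that are observed in sample j; similarity is the root-mean-square difference over the samples observed in both genes; and neighbors are weighted by inverse distance. The function name knn_impute and the default k = 10 are illustrative, since the parameters used for the Lung data are not given here.</p>

```python
import numpy as np


def knn_impute(X, k=10, eps=1e-12):
    """Fill NaNs in a genes x samples matrix by inverse-distance-weighted averaging
    over the k most similar genes that are observed in the missing sample."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for g in np.where(missing.any(axis=1))[0]:
        obs_cols = np.where(~missing[g])[0]                  # samples observed for gene g
        for j in np.where(missing[g])[0]:
            cand = np.where(~missing[:, j])[0]               # genes observed in sample j
            cand = cand[cand != g]
            diff = X[np.ix_(cand, obs_cols)] - X[g, obs_cols]
            shared = (~np.isnan(diff)).sum(axis=1)           # samples observed in both genes
            valid = shared > 0
            if not valid.any():
                continue                                     # no usable neighbour: leave NaN
            dist = np.sqrt(np.nansum(diff[valid] ** 2, axis=1) / shared[valid])
            order = np.argsort(dist)[:k]                     # k nearest candidate genes
            weights = 1.0 / (dist[order] + eps)              # closer genes count more
            values = X[cand[valid][order], j]
            out[g, j] = np.sum(weights * values) / np.sum(weights)
    return out


# Toy usage: gene 1 has one missing value and matches gene 0 elsewhere.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, np.nan, 4.0],
    [10.0, 20.0, 30.0, 40.0],
])
print(knn_impute(X, k=1))
```

<p>Imputing from similar genes rather than from the row or column mean exploits the co-expression structure of microarray data, which is why this family of methods is commonly preferred for expression matrices.</p>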
<p>In <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, the classification accuracies on the Leukemia and Lung data are 74% and 70%, respectively, with 712 of the 7,129 genes selected for the Leukemia data and 1,811 of the 18,117 genes selected for the Lung data. When the same numbers of genes are selected, ANMM4CBR achieves 91% on Leukemia and 75% on Lung. Moreover, on the Leukemia data ANMM4CBR obtains its best result, 94%, when only 23 genes are selected. This shows that ANMM4CBR outperforms MOE4CBR, especially on the Leukemia data set.</p>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>In the present work, we proposed a novel ANMM4CBR method for microarray classification. For feature selection, we proposed an ANMM method that additively optimizes a nonparametric margin maximization criterion defined on the basis of feature pre-selection and sample clustering. For classification, we adopted a CBR method in which the samples of each class form a case-base.</p>
<p>Several properties make ANMM4CBR well suited to microarray data classification. (1) The criterion used in ANMM, which is based on the nearest between-class distance and the furthest within-cluster distance, makes the feature selection less sensitive to noise or outliers in the data. (2) In the classification phase ANMM4CBR uses a case-based reasoning method, which has proved suitable for problems in the life sciences <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. (3) In microarray data the number of samples is too small to estimate the distribution of the data accurately. In each step of ANMM4CBR (feature pre-selection, clustering, feature selection and classification), we use nonparametric approaches, which require less restrictive assumptions about the data. (4) There are links between the ANMM feature selection and the CBR classifier. Furthermore, ANMM4CBR can directly solve multiclass problems without converting them into many binary ones.</p>
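<p>The criterion in point (1) can be illustrated on a single feature. The sketch below is one plausible reading only, not the ANMM objective as defined in the Methods section: we assume that the margin of a sample is its distance to the nearest sample of a different class minus its distance to the furthest sample of its own cluster, that cluster labels are given (with clusters nested within classes), and that features are ranked by the summed margin. The names nonparametric_margin and rank_features are hypothetical, and the additive, iterative optimization of ANMM is not modeled.</p>

```python
import numpy as np


def nonparametric_margin(x, y, cluster):
    """Per-sample margin on one feature: distance to the nearest sample of another
    class minus distance to the furthest sample of the same cluster."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])                  # pairwise 1-D distances
    other_class = y[:, None] != y[None, :]
    same_cluster = cluster[:, None] == cluster[None, :]
    nearest_between = np.where(other_class, dist, np.inf).min(axis=1)
    furthest_within = np.where(same_cluster, dist, 0.0).max(axis=1)
    return nearest_between - furthest_within


def rank_features(X, y, cluster):
    """Rank features by their summed per-sample margin, best first."""
    scores = np.array([nonparametric_margin(X[:, m], y, cluster).sum()
                       for m in range(X.shape[1])])
    return np.argsort(scores)[::-1]


# Toy usage: feature 0 separates the classes, feature 1 interleaves them.
y = np.array([0, 0, 0, 1, 1, 1])
cluster = np.array([0, 0, 1, 2, 2, 3])   # hypothetical sub-clusters within each class
X = np.array([
    [0.0, 0.0],
    [0.1, 5.0],
    [0.5, 0.2],
    [5.0, 4.9],
    [5.1, 0.1],
    [5.5, 5.1],
])
print(rank_features(X, y, cluster))      # [0 1]
```

<p>Measuring within-class spread only inside each cluster means that a sample which sits apart from the rest of its class, such as a mislabeled one placed in its own cluster, does not inflate the within-class term for everyone else.</p>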
<p>Our future research will focus on two directions. The first is to simplify the choice of parameters and of the number of genes in ANMM4CBR: several parameters must be tuned, and selecting an optimal set of parameters for a new data set is time-consuming. Moreover, ANMM4CBR requires the number of selected features to be specified in advance. The second direction is to further investigate the relationship between ANMM and CBR, which was not theoretically justified in this paper. We believe that a better algorithm can be designed by revealing the relationships between the feature selection approach and the classifier.</p>
</sec>
<sec>
<st>
<p>List of abbreviations</p>
</st>
<p>ANMM4CBR: additive nonparametric margin maximization for case-based reasoning; ANMM: additive nonparametric margin maximization; NMM: nonparametric margin maximization; CBR: case-based reasoning; SVM: support vector machine; NN: nearest neighbor; <it>k</it>NN: <it>k</it>-nearest neighbor; NDA: nonparametric discriminant analysis; MI: mutual information; BW: between-group to within-group; MRMR: minimum redundancy - maximum relevance; OVA: one-versus-all; MOE4CBR: mixture of experts for case-based reasoning.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>SL conceived and coordinated the research. BY designed the algorithms, carried out the experiments and drafted the manuscript. SL participated in the design of the experiments and helped to draft the manuscript. Both authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We thank Mr. Nan Chen in our laboratory for useful discussion and pre-processing of the data set. This work is supported by the National Natural Science Foundation of PR China (Nos. 60934004, 90709013 and 60721003).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Discovering and analysis of inflammatory disease-related genes using cDNA microarrays</p></title><aug><au><snm>Heller</snm><fnm>RA</fnm></au><au><snm>Schena</snm><fnm>M</fnm></au><au><snm>Chai</snm><fnm>A</fnm></au><au><snm>Shalon</snm><fnm>D</fnm></au><au><snm>Bedilion</snm><fnm>T</fnm></au><au><snm>Gilmore</snm><fnm>J</fnm></au><au><snm>Woolley</snm><fnm>DE</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au></aug><source>P Natl Acad Sci USA</source><pubdate>1997</pubdate><volume>94</volume><fpage>2150</fpage><lpage>2155</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.94.6.2150</pubid></xrefbib></bibl><bibl id="B2"><title><p>Cluster analysis and display of genome-wide expression patterns</p></title><aug><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au></aug><source>P Natl Acad Sci USA</source><pubdate>1998</pubdate><volume>95</volume><fpage>14863</fpage><lpage>14868</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.95.25.14863</pubid></xrefbib></bibl><bibl id="B3"><title><p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p></title><aug><au><snm>Golub</snm><fnm>TR</fnm></au><au><snm>Slonim</snm><fnm>DK</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Huard</snm><fnm>C</fnm></au><au><snm>Gaasenbeek</snm><fnm>M</fnm></au><au><snm>Mesirov</snm><fnm>JP</fnm></au><au><snm>Coller</snm><fnm>H</fnm></au><au><snm>Loh</snm><fnm>ML</fnm></au><au><snm>Downing</snm><fnm>JR</fnm></au><au><snm>Caligiuri</snm><fnm>MA</fnm></au><au><snm>Bloomfield</snm><fnm>CD</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Science</source><pubdate>1999</pubdate><volume>286</volume><fpage>531</fpage><lpage>537</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.286.5439.531</pubid><pubid idtype="pmpid" link="fulltext">10521349</pubid></pubidlist></xrefbib></bibl><bibl 
id="B4"><title><p>Instance-based concept learning from multiclass DNA microarray data</p></title><aug><au><snm>Berrar</snm><fnm>D</fnm></au><au><snm>Bradbury</snm><fnm>I</fnm></au><au><snm>Dubitzky</snm><fnm>W</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>73</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-73</pubid><pubid idtype="pmcid">1402330</pubid><pubid idtype="pmpid" link="fulltext">16483361</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks</p></title><aug><au><snm>Khan</snm><fnm>J</fnm></au><au><snm>Wei</snm><fnm>JS</fnm></au><au><snm>Ringn&#233;r</snm><fnm>M</fnm></au><au><snm>Saal</snm><fnm>LH</fnm></au><au><snm>Ladanyi</snm><fnm>M</fnm></au><au><snm>Westermann</snm><fnm>F</fnm></au><au><snm>Berthold</snm><fnm>F</fnm></au><au><snm>Schwab</snm><fnm>M</fnm></au><au><snm>Antonescu</snm><fnm>CR</fnm></au><au><snm>Peterson</snm><fnm>C</fnm></au><au><snm>Meltzer</snm><fnm>PS</fnm></au></aug><source>Nat Med</source><pubdate>2001</pubdate><volume>7</volume><fpage>673</fpage><lpage>679</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/89044</pubid><pubid idtype="pmcid">1282521</pubid><pubid idtype="pmpid" link="fulltext">11385503</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Boosting for tumor classification with gene expression data</p></title><aug><au><snm>Dettling</snm><fnm>M</fnm></au><au><snm>B&#252;hlmann</snm><fnm>P</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>1061</fpage><lpage>1069</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btf867</pubid><pubid idtype="pmpid" link="fulltext">12801866</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Support vector machine classification and validation of cancer tissue samples using microarray expression 
data</p></title><aug><au><snm>Furey</snm><fnm>TS</fnm></au><au><snm>Cristianini</snm><fnm>N</fnm></au><au><snm>Duffy</snm><fnm>N</fnm></au><au><snm>Bednarski</snm><fnm>DW</fnm></au><au><snm>Schummer</snm><fnm>M</fnm></au><au><snm>Haussler</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2000</pubdate><volume>16</volume><fpage>906</fpage><lpage>914</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/16.10.906</pubid><pubid idtype="pmpid" link="fulltext">11120680</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><aug><au><snm>Kolodner</snm><fnm>J</fnm></au></aug><source>Case-Based Reasoning</source><publisher>Morgan Kaufmann</publisher><pubdate>1993</pubdate></bibl><bibl id="B9"><title><p>Comparison of discrimination methods for the classification of tumors using gene expression data</p></title><aug><au><snm>Dudoit</snm><fnm>S</fnm></au><au><snm>Fridlyand</snm><fnm>J</fnm></au><au><snm>Speed</snm><fnm>TP</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>2002</pubdate><volume>97</volume><fpage>77</fpage><lpage>87</lpage><xrefbib><pubid idtype="doi">10.1198/016214502753479248</pubid></xrefbib></bibl><bibl id="B10"><title><p>Application of case-based reasoning in molecular biology</p></title><aug><au><snm>Jurisica</snm><fnm>I</fnm></au><au><snm>Glasgow</snm><fnm>J</fnm></au></aug><source>Artif Intell Mag</source><pubdate>2004</pubdate><volume>25</volume><fpage>85</fpage><lpage>95</lpage></bibl><bibl id="B11"><title><p>Gene-CBR: a case-based reasoning tool for cancer diagnosis using microarray data sets</p></title><aug><au><snm>D&#237;az</snm><fnm>F</fnm></au><au><snm>Fdez-Riverola</snm><fnm>F</fnm></au><au><snm>Corchado</snm><fnm>JM</fnm></au></aug><source>Comput Intell</source><pubdate>2006</pubdate><volume>22</volume><fpage>254</fpage><lpage>268</lpage><xrefbib><pubid idtype="doi">10.1111/j.1467-8640.2006.00287.x</pubid></xrefbib></bibl><bibl id="B12"><title><p>Data mining for case-based reasoning in high-dimensional 
biological domains</p></title><aug><au><snm>Arshadi</snm><fnm>N</fnm></au><au><snm>Jurisica</snm><fnm>I</fnm></au></aug><source>IEEE T Knowl Data En</source><pubdate>2005</pubdate><volume>17</volume><fpage>1127</fpage><lpage>1137</lpage><xrefbib><pubid idtype="doi">10.1109/TKDE.2005.124</pubid></xrefbib></bibl><bibl id="B13"><title><p>A nonparametric scoring algorithm for identifying informative genes from microarray data</p></title><aug><au><snm>Park</snm><fnm>PJ</fnm></au><au><snm>Pagano</snm><fnm>M</fnm></au><au><snm>Bonetti</snm><fnm>M</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>2005</pubdate><volume>6</volume><fpage>310</fpage><lpage>321</lpage></bibl><bibl id="B14"><title><p>Nonparametric methods for identifying differentially expressed genes in microarray data</p></title><aug><au><snm>Troyanskaya</snm><fnm>OG</fnm></au><au><snm>Garber</snm><fnm>ME</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Altman</snm><fnm>RB</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><fpage>1454</fpage><lpage>1461</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.11.1454</pubid><pubid idtype="pmpid" link="fulltext">12424116</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Case-based reasoning: foundational issues, methodological variations, and system approaches</p></title><aug><au><snm>Aamodt</snm><fnm>A</fnm></au><au><snm>Plaza</snm><fnm>E</fnm></au></aug><source>AI Commun</source><pubdate>1994</pubdate><volume>7</volume><fpage>39</fpage><lpage>59</lpage></bibl><bibl id="B16"><title><p>Minimum redundancy feature selection from microarray gene expression data</p></title><aug><au><snm>Ding</snm><fnm>C</fnm></au><au><snm>Peng</snm><fnm>H</fnm></au></aug><source>Proceedings of the 2003 IEEE Bioinformatics Conference</source><pubdate>2003</pubdate><fpage>523</fpage><lpage>528</lpage></bibl><bibl id="B17"><title><p>Nonparametric discriminant 
analysis and nearest neighbor classification</p></title><aug><au><snm>Bressan</snm><fnm>M</fnm></au><au><snm>Vitri&#224;</snm><fnm>J</fnm></au></aug><source>Pattern Recogn Lett</source><pubdate>2003</pubdate><volume>24</volume><fpage>2743</fpage><lpage>2749</lpage><xrefbib><pubid idtype="doi">10.1016/S0167-8655(03)00117-X</pubid></xrefbib></bibl><bibl id="B18"><title><p>Nonparametric discriminant analysis</p></title><aug><au><snm>Fukunaga</snm><fnm>K</fnm></au><au><snm>Mantock</snm><fnm>J</fnm></au></aug><source>IEEE T Pattern Anal</source><pubdate>1983</pubdate><volume>5</volume><fpage>671</fpage><lpage>678</lpage><xrefbib><pubid idtype="doi">10.1109/TPAMI.1983.4767461</pubid></xrefbib></bibl><bibl id="B19"><title><p>Robust and accurate cancer classification with gene expression profiling</p></title><aug><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Zhang</snm><fnm>K</fnm></au><au><snm>Jiang</snm><fnm>T</fnm></au></aug><source>Proceedings of the 2005 IEEE Bioinformatics Conference</source><pubdate>2005</pubdate><fpage>310</fpage><lpage>321</lpage></bibl><bibl id="B20"><title><p>A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments</p></title><aug><au><snm>Pan</snm><fnm>W</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><fpage>546</fpage><lpage>554</lpage><xrefbib><pubid idtype="doi">10.1093/bioinformatics/18.4.546</pubid></xrefbib></bibl><bibl id="B21"><title><p>Hierarchical clustering schemes</p></title><aug><au><snm>Johnson</snm><fnm>SC</fnm></au></aug><source>Psychometrika</source><pubdate>1967</pubdate><volume>32</volume><fpage>241</fpage><lpage>253</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/BF02289588</pubid><pubid idtype="pmpid">5234703</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>A decision-theoretic generalization of on-line learning and an application to 
boosting</p></title><aug><au><snm>Freund</snm><fnm>Y</fnm></au><au><snm>Schapire</snm><fnm>R</fnm></au></aug><source>J Comput Syst Sci</source><pubdate>1997</pubdate><volume>55</volume><fpage>119</fpage><lpage>139</lpage><xrefbib><pubid idtype="doi">10.1006/jcss.1997.1504</pubid></xrefbib></bibl><bibl id="B23"><title><p>Structured polychotomous machine diagnosis of multiple cancer types using gene expression</p></title><aug><au><snm>Koo</snm><fnm>JY</fnm></au><au><snm>Sohn</snm><fnm>I</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>JW</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>950</fpage><lpage>958</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl029</pubid><pubid idtype="pmpid" link="fulltext">16452113</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Evaluating the replicability of significance tests for comparing learning algorithms</p></title><aug><au><snm>Bouckaert</snm><fnm>R</fnm></au><au><snm>Frank</snm><fnm>E</fnm></au></aug><source>Advances in Knowledge Discovery and Data Mining</source><pubdate>2004</pubdate><volume>3056</volume><fpage>3</fpage><lpage>12</lpage></bibl><bibl id="B25"><title><p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</p></title><aug><au><snm>Alon</snm><fnm>U</fnm></au><au><snm>Barkai</snm><fnm>N</fnm></au><au><snm>Notterman</snm><fnm>DA</fnm></au><au><snm>Gish</snm><fnm>K</fnm></au><au><snm>Ybarra</snm><fnm>S</fnm></au><au><snm>Mack</snm><fnm>D</fnm></au><au><snm>Levine</snm><fnm>AJ</fnm></au></aug><source>P Natl Acad Sci USA</source><pubdate>1999</pubdate><volume>96</volume><fpage>6745</fpage><lpage>6750</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.96.12.6745</pubid></xrefbib></bibl><bibl id="B26"><title><p>Multiclass cancer diagnosis using tumor gene expression 
signatures</p></title><aug><au><snm>Ramaswamy</snm><fnm>S</fnm></au><au><snm>Tamayo</snm><fnm>P</fnm></au><au><snm>Rifkin</snm><fnm>R</fnm></au><au><snm>Mukherjee</snm><fnm>S</fnm></au><au><snm>Yeang</snm><fnm>CH</fnm></au><au><snm>Angelo</snm><fnm>M</fnm></au><au><snm>Ladd</snm><fnm>C</fnm></au><au><snm>Reich</snm><fnm>M</fnm></au><au><snm>Latulippe</snm><fnm>E</fnm></au><au><snm>Mesirov</snm><fnm>JP</fnm></au><au><snm>Poggio</snm><fnm>T</fnm></au><au><snm>Gerald</snm><fnm>W</fnm></au><au><snm>Loda</snm><fnm>M</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Golub</snm><fnm>TR</fnm></au></aug><source>P Natl Acad Sci USA</source><pubdate>2001</pubdate><volume>98</volume><fpage>15149</fpage><lpage>15154</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.211566398</pubid></xrefbib></bibl><bibl id="B27"><title><p>Using uncorrelated discriminant analysis for tissue classification with gene expression data</p></title><aug><au><snm>Ye</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>T</fnm></au><au><snm>Xiong</snm><fnm>T</fnm></au><au><snm>Janardan</snm><fnm>R</fnm></au></aug><source>IEEE/ACM T Comput Biol Bioinfor</source><pubdate>2004</pubdate><volume>1</volume><fpage>181</fpage><lpage>190</lpage><xrefbib><pubid idtype="doi">10.1109/TCBB.2004.45</pubid></xrefbib></bibl><bibl id="B28"><title><p>Missing value estimation methods for DNA microarrays</p></title><aug><au><snm>Troyanskaya</snm><fnm>OG</fnm></au><au><snm>Cantor</snm><fnm>M</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Brown</snm><fnm>P</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Altman</snm><fnm>RB</fnm></au></aug><source>Bioinformatics</source><pubdate>2001</pubdate><volume>17</volume><fpage>520</fpage><lpage>525</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/17.6.520</pubid><pubid idtype="pmpid" link="fulltext">11395428</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>