<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
<ui>1748-7188-7-5</ui>
<ji>1748-7188</ji>
<fm>
<dochead>Research</dochead>
<bibl>
<title><p>A normalization strategy for comparing tag count data</p></title>
<aug>
<au id="A1" ca="yes"><snm>Kadota</snm><fnm>Koji</fnm><insr iid="I1"/><insr iid="I2"/><email>kadota@bi.a.u-tokyo.ac.jp</email></au>
<au id="A2"><snm>Nishiyama</snm><fnm>Tomoaki</fnm><insr iid="I3"/><email>tomoakin@kenroku.kanazawa-u.ac.jp</email></au>
<au id="A3"><snm>Shimizu</snm><fnm>Kentaro</fnm><insr iid="I1"/><email>shimizu@bi.a.u-tokyo.ac.jp</email></au>
</aug>
<insg>
<ins id="I1"><p>Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan</p></ins>
<ins id="I2"><p>Project on Health and Anti-aging, Kanagawa Academy of Science and Technology, 3-2-1 Sakado, Takatsu-ku, Kawasaki, Kanagawa 213-0012, Japan</p></ins>
<ins id="I3"><p>Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa 920-0934, Japan</p></ins>
</insg>
<source>Algorithms for Molecular Biology</source>
<issn>1748-7188</issn>
<pubdate>2012</pubdate>
<volume>7</volume>
<issue>1</issue>
<fpage>5</fpage>
<url>http://www.almob.org/content/7/1/5</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1748-7188-7-5</pubid><pubid idtype="pmpid">22475125</pubid></pubidlist></xrefbib></bibl>
<history><rec><date><day>1</day><month>12</month><year>2011</year></date></rec><acc><date><day>5</day><month>4</month><year>2012</year></date></acc><pub><date><day>5</day><month>4</month><year>2012</year></date></pub></history><cpyrt><year>2012</year><collab>Kadota et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec><st><p>Abstract</p></st>
<sec><st><p>Background</p></st>
<p>High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.</p>
</sec>
<sec><st><p>Results</p></st>
<p>We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (<it>edgeR</it>, <it>DESeq</it>, <it>baySeq</it>, and <it>NBPSeq</it>), each with its default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure of both sensitivity and specificity. Overall, packages that used our strategy in the data normalization step performed well. This result was also observed for a real experimental dataset.</p>
</sec>
<sec><st><p>Conclusion</p></st>
<p>Our results show that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.</p>
</sec>
</sec>
</abs>
</fm>
<bdy>
<sec><st><p>Background</p></st>
<p>Development of next-generation sequencing technologies has enabled biological features such as gene expression and histone modification to be quantified as tag count data by ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Different from hybridization-based microarray technologies <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, sequencing-based technologies do not require prior information about the genome or transcriptome sequences of the samples of interest <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Therefore, researchers can profile the expression of not only well-annotated model organisms but also poorly annotated non-model organisms. RNA-seq in such organisms enables the gene structures and expression levels to be determined.</p>
<p>One important task for RNA-seq is to identify differential expression (DE) for genes or transcripts. As in microarray analysis, we typically start the analysis with a so-called "gene expression matrix," where each row indicates a gene (or transcript), each column indicates a sample (or library), and each cell indicates the number of reads mapped to the gene in the sample. In general, there are two major factors for accurately quantifying and normalizing RNA-seq data: gene length and sequencing depth (or total read counts). Normalization by gene length is important for comparing different genes within a sample because longer genes tend to yield more sequenced reads <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Previous approaches for normalizing length include defining an effective length for a gene that may have two or more transcript isoforms of different lengths, and normalizing by that length <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>.</p>
<p>Normalization by sequencing depth is particularly important for comparing genes in different samples because different samples generally have different total read counts. Previous approaches include (i) global scaling so that a summary statistic such as the mean or upper-quartile value of read counts for each sample (or library) becomes a common value and (ii) standardization of distribution so that the read count distributions become the same across samples <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Some groups recently reported that over-representation of genes with higher expression in one of the samples, i.e., biased differential expression, has a negative impact on data normalization and consequently can lead to biased estimates of true differentially expressed genes (DEGs) <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. To reduce the effect of such genes on the data normalization step, Robinson and Oshlack reported a simple but effective global scaling method, the trimmed mean of M values (TMM) method, where a scaling factor for the normalization is calculated as a weighted trimmed mean of the log ratios between two classes of samples (i.e., Samples A vs. B) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The concept of the TMM method is the basis for developing our normalization strategy.</p>
<p>In this paper, we focus on normalization related to sequencing depth as well as the TMM normalization method. We believe the TMM method can be improved. Consider, for example, a hypothetical dataset containing a total of 1000 genes, where (i) 200 genes (i.e., 200/1000 = 20%) are detected as DEGs when comparing Samples A vs. B (<it>P</it><sub>DEG </sub>= 20%), (ii) 180 of the 200 DEGs are highly expressed in Sample A (i.e., <it>P</it><sub>A </sub>= 180/200 = 90%), (iii) the dataset can be perfectly normalized by applying a normalization factor calculated based only on the remaining 800 non-DEGs, and (iv) individual DEGs (or non-DEGs) have a negative (or positive) impact on calculation of the normalization factor. In this case, the two parameters should ideally be estimated as <it>P</it><sub>DEG </sub>= 20% and <it>P</it><sub>A </sub>= 90%. Currently, the TMM method implicitly uses fixed values for these two parameters (i.e., <it>P</it><sub>DEG </sub>= 60% and <it>P</it><sub>A </sub>= 50%) unless users explicitly provide arbitrary values <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. This is probably because automatic estimation of the <it>P</it><sub>DEG </sub>value is difficult in practice.</p>
<p>Hardcastle and Kelly <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> recently proposed an R <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> package, <it>baySeq</it>, for differential expression analysis of RNA-seq data. A notable advantage of this method is that an objective <it>P</it><sub>DEG </sub>value is produced by calculating multiple models of differential expression. This method also inspired us in our improvement of the normalization of RNA-seq data. Our normalization strategy, named TbT, consists of TMM <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and <it>baySeq </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, used twice and once respectively in a TMM-<it>baySeq</it>-TMM pipeline. We show the importance of estimating the <it>P</it><sub>DEG </sub>value according to the <it>true P</it><sub>DEG </sub>value for individual datasets. The results were obtained using simulated and real datasets.</p>
</sec>
<sec><st><p>Results and Discussion</p></st>
<p>RNA-seq data must be normalized before differential expression analysis can be conducted on them. Some R packages exist for comparing two groups of samples <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, and each package uses its own normalization method and gene ranking algorithm. For example, the R package <it>edgeR </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> uses the TMM method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> for data normalization and an exact test for negative binomial (NB) distribution <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> for gene ranking. A good normalization method coupled with gene ranking methods should produce good ranked gene lists where <it>true </it>DEGs can easily be detected as top-ranked and non-DEGs are bottom-ranked, when all genes are ranked according to the degree of DE.</p>
<p>Following our previous studies <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, we used the area under the receiver operating characteristic (ROC) curve (i.e., AUC) to evaluate individual combinations on sensitivity and specificity simultaneously. A good combination should therefore have a high AUC value (i.e., high sensitivity and specificity). In the remainder of this paper, we first describe our normalization strategy (called TbT). We then evaluate a total of eight package combinations: four R packages for differential expression analysis (<it>edgeR </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, <it>DESeq </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, <it>baySeq </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and <it>NBPSeq </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>) with default normalization settings (which we call <it>edgeR</it>/default, <it>DESeq</it>/default, <it>baySeq</it>/default, and <it>NBPSeq</it>/default) and the same four packages with TbT normalization (i.e., <it>edgeR </it>coupled with TbT (<it>edgeR</it>/TbT), <it>DESeq</it>/TbT, <it>baySeq</it>/TbT, and <it>NBPSeq</it>/TbT). Finally, we discuss guidelines for meaningful differential expression analysis.</p>
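<p>As a minimal sketch of how an AUC value can be obtained from a ranked gene list (this is not the code used in this study), the following Python function computes the AUC in its pairwise form: the fraction of (true DEG, non-DEG) pairs in which the true DEG receives the higher score, with tied pairs counted as one half. The function name auc_from_scores and the toy scores are illustrative assumptions.</p>

```python
def auc_from_scores(scores, is_deg):
    """Return the AUC of `scores` for separating true DEGs from non-DEGs.

    scores : list of floats, larger means stronger evidence of DE
    is_deg : list of bools, True for a true DEG
    """
    deg_scores = [s for s, d in zip(scores, is_deg) if d]
    non_scores = [s for s, d in zip(scores, is_deg) if not d]
    if not deg_scores or not non_scores:
        raise ValueError("need at least one DEG and one non-DEG")

    # Count DEG/non-DEG pairs in which the DEG is ranked higher;
    # tied pairs contribute one half.
    wins = 0.0
    for s_deg in deg_scores:
        for s_non in non_scores:
            if s_deg > s_non:
                wins += 1.0
            elif s_deg == s_non:
                wins += 0.5
    return wins / (len(deg_scores) * len(non_scores))


# Toy example: four genes, two of which are true DEGs.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_truth = [True, False, True, False]
toy_auc = auc_from_scores(toy_scores, toy_truth)
```

<p>The pairwise loop is quadratic in the number of genes and is meant only to make the definition explicit; a rank-based formulation would be preferable for datasets with many thousands of genes.</p>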
<p>Note that the <it>baySeq </it>package was executed on data scaled to reads per million mapped reads (RPM) in each sample. The procedures in the <it>baySeq </it>package and in the other three packages (<it>edgeR</it>, <it>DESeq</it>, and <it>NBPSeq</it>) are not intended for use with RPM-normalized data, i.e., the original raw count data should be used as the input. However, we found that the use of RPM-normalized data generally yields higher AUC values than the use of raw count data when executing the <it>baySeq </it>package. We also found that the use of RPM data did not positively affect the results when the other three packages were executed. Accordingly, all of the results relating to the <it>baySeq </it>package were obtained using the RPM-normalized data. This includes step 2 in the TbT normalization and the gene ranking of DEGs using two <it>baySeq</it>-related combinations (<it>baySeq</it>/TbT and <it>baySeq</it>/default).</p>
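<p>The RPM scaling mentioned above can be sketched as follows. This is a plain illustration of dividing each count by its library size and multiplying by one million; the function name rpm_normalize and the toy matrix are assumptions made for the example and are not part of any of the packages discussed.</p>

```python
def rpm_normalize(counts):
    """Scale each library (column) of a count matrix to reads per million.

    counts : list of rows (genes), each row a list of raw counts per library
    """
    n_libs = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]
    if min(lib_sizes) == 0:
        raise ValueError("every library needs at least one mapped read")
    return [
        [row[j] * 1e6 / lib_sizes[j] for j in range(n_libs)]
        for row in counts
    ]


# Toy matrix: three genes (rows) by two libraries (columns).
toy_counts = [[10, 40], [30, 60], [60, 100]]
toy_rpm = rpm_normalize(toy_counts)
```

<p>After scaling, every column sums to one million, so libraries of different sequencing depth become directly comparable under the assumption that total RNA output is similar across samples.</p>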
<sec><st><p>Outline of TbT normalization strategy</p></st>
<p>The key feature of TbT is that data assigned as potential DEGs are removed before the normalization factor is calculated. We explain the concept of TbT by using simulation data that follow a negative binomial (NB) distribution (three libraries from Sample A vs. three libraries from Sample B; i.e., {A<sub>1</sub>, A<sub>2</sub>, A<sub>3</sub>} vs. {B<sub>1</sub>, B<sub>2</sub>, B<sub>3</sub>}). The simulation conditions were that (i) 20% of genes were DEGs (<it>P</it><sub>DEG </sub>= 20%), (ii) 90% of the DEGs were expressed at a higher level in Sample A (<it>P</it><sub>A </sub>= 90%), and (iii) the level of DE was four-fold.</p>
<p>The NB model is generally applicable when the tag count data are based on biological replicates. It has been noted that the variance of biological replicate read counts for a gene (<it>V</it>) is higher than the mean (<it>&#956;</it>) of the read counts (e.g., <it>V </it>= <it>&#956; + &#981;&#956;</it><sup>2 </sup>where <it>&#981; </it>&gt; 0) and that the extra dispersion parameter <it>&#981; </it>tends to have large (or small) values when <it>&#956; </it>is small (or large) <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. To mimic this mean-dispersion relationship in the simulation, we used an empirical distribution of these values (<it>&#956; </it>and <it>&#981;</it>) calculated from Arabidopsis data available in the <it>NBPSeq </it>package <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. For details, see the Methods section.</p>
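<p>To make the mean-variance relationship above concrete, the sketch below draws NB counts as a gamma-Poisson mixture, which gives a variance of the form mean + dispersion x mean squared. It is an illustration only: the simulation in this study uses empirical mean and dispersion values from the <it>NBPSeq </it>package, whereas the values of mu and phi below, and the helper names sample_poisson and sample_nb, are arbitrary assumptions.</p>

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson variate by Knuth's multiplication method.

    Suitable only for modest `lam`, because exp(-lam) underflows
    for very large values.
    """
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(mu, phi, rng):
    """Draw one NB count with mean mu and variance mu + phi * mu**2."""
    # Gamma-Poisson mixture: the Poisson rate is gamma distributed
    # with shape 1/phi and scale phi*mu (mean mu, variance phi*mu**2).
    rate = rng.gammavariate(1.0 / phi, phi * mu)
    return sample_poisson(rate, rng)


rng = random.Random(1)
mu, phi = 10.0, 0.2   # illustrative values, not taken from the paper
draws = [sample_nb(mu, phi, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)
```

<p>With these settings the theoretical variance is 10 + 0.2 x 100 = 30, i.e., three times the mean, so the sample variance should clearly exceed the sample mean, in contrast to a Poisson model where the two would be equal.</p>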
<p>An M-A plot of the simulation data, after scaling to RPM in each library, is shown in Figure <figr fid="F1">1a</figr>. The horizontal axis indicates the average expression level of a gene across the two groups, and the vertical axis indicates the log-ratio (Sample B relative to Sample A). As shown by the black horizontal line, the median log-ratio for non-DEGs based on the RPM-normalized data (0.543) is clearly offset from zero because of the DEGs introduced under the above three conditions. Therefore, the primary aim of our method is to accurately estimate the percentage of true DEGs (<it>P</it><sub>DEG</sub>) and trim the corresponding DEGs so that the median log-ratio for non-DEGs is as close to zero as possible when our TbT normalization factors are used.</p>
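<p>The quantities plotted in an M-A plot, and the median log-ratio used above as a diagnostic, can be sketched as follows. The code assumes base-2 logarithms and uses the group means of normalized values as input; the function names ma_values and median_log_ratio and the toy values are illustrative assumptions.</p>

```python
import math
from statistics import median


def ma_values(mean_a, mean_b):
    """Return (M, A) for one gene from its group means (both positive).

    M is the log2 ratio of Sample B relative to Sample A;
    A is the average log2 expression across the two groups.
    """
    m = math.log2(mean_b) - math.log2(mean_a)
    a = (math.log2(mean_a) + math.log2(mean_b)) / 2.0
    return m, a


def median_log_ratio(pairs):
    """Median M value over (mean_a, mean_b) pairs, skipping zero means."""
    ms = [ma_values(x, y)[0] for x, y in pairs if x > 0 and y > 0]
    return median(ms)


# Toy non-DEGs whose Sample B values are all inflated by a factor of 2,
# so the median log-ratio is offset from zero by exactly 1.
toy_pairs = [(8.0, 16.0), (4.0, 8.0), (32.0, 64.0)]
toy_offset = median_log_ratio(toy_pairs)
```

<p>A well-normalized dataset should give a median log-ratio near zero for non-DEGs; a systematic inflation of one sample, as in the toy values, shows up as a non-zero offset.</p>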
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Outline of TbT normalization strategy</p></caption><text>
   <p><b>Outline of TbT normalization strategy</b>. Left panel: M-A plot for negative binomially distributed simulation data from Ref. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, after scaling to RPM in each sample. Magenta and black dots indicate DEGs (20% of all genes; <it>P</it><sub>DEG </sub>= 20%) and non-DEGs (80%), respectively. Of all DEGs, 90% are four-fold higher in Sample A than in Sample B (<it>P</it><sub>A </sub>= 90%). Each dot represents a gene. Right panel: same plot but colored differently. TbT estimates <it>P</it><sub>DEG </sub>as 16.8% for these data. Gray dots indicate genes estimated as non-DEGs by step 2 in TbT. Note that the median log-ratio for true non-DEGs when data normalization is performed using the TbT normalization factors (0.045) is closer to zero than that using the TMM normalization factors (0.170).</p>
</text><graphic file="1748-7188-7-5-1" hint_layout="double"/></fig>
<p>To accomplish this, our normalization method consists of three steps: (1) temporary normalization, (2) identification of DEGs, and (3) final normalization of data after eliminating those DEGs. We used the TMM method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> at steps 1 and 3 and an empirical Bayesian method implemented in the <it>baySeq </it>package <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> at step 2. Other methods could have been used, but our choices seemed to produce good ranked gene lists with high sensitivity and specificity (i.e., a high AUC value). We observed that the median log-ratio for non-DEGs based on our TbT normalization factors (0.045) was closer to zero than the log-ratio based on the TMM normalization factors (0.170) that corresponds to the result of TbT right after step 1 (Figure <figr fid="F1">1b</figr>).</p>
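<p>The three steps above can be sketched in simplified form as follows. This is an illustration under loudly stated assumptions and not the TbT implementation: the TMM method is replaced by an unweighted trimmed mean of log-ratios (30% trimmed from each tail, mirroring the fixed 60%/50% values discussed earlier), and the empirical Bayesian step of <it>baySeq </it>is replaced by a plain fold-change cutoff used purely as a placeholder. All function names and toy values are hypothetical.</p>

```python
import math


def trimmed_mean_log_ratio(a, b, keep, trim_percent=30):
    """Unweighted trimmed mean of log2(b/a) over genes flagged in `keep`.

    A simplified stand-in for TMM: no precision weights and no
    trimming on average expression.
    """
    ms = sorted(
        math.log2(y / x)
        for x, y, k in zip(a, b, keep)
        if k and x > 0 and y > 0
    )
    if not ms:
        raise ValueError("no usable genes left for normalization")
    cut = len(ms) * trim_percent // 100   # genes dropped from each tail
    core = ms[cut:len(ms) - cut] or ms
    return sum(core) / len(core)


def tbt_like_offsets(a, b, fold=2.0):
    """Return (temporary offset, final offset) on the log2 scale."""
    everything = [True] * len(a)
    # Step 1: temporary normalization using all genes.
    temp = trimmed_mean_log_ratio(a, b, everything)
    # Step 2: flag potential DEGs. A fold-change cutoff is used here
    # only as a placeholder for the empirical Bayesian method.
    cutoff = math.log2(fold)
    non_deg = [
        not (x > 0 and y > 0 and abs(math.log2(y / x) - temp) > cutoff)
        for x, y in zip(a, b)
    ]
    # Step 3: final normalization using only the retained non-DEGs.
    final = trimmed_mean_log_ratio(a, b, non_deg)
    return temp, final


# Toy data: four genes are four-fold higher in Sample A, six are unchanged.
toy_a = [400.0] * 4 + [100.0] * 6
toy_b = [100.0] * 10
temp_offset, final_offset = tbt_like_offsets(toy_a, toy_b)
```

<p>In this toy case the DEGs are so numerous and one-sided that one of them survives the symmetric trimming in step 1 and pulls the temporary offset away from zero, whereas the offset recomputed in step 3 from the retained non-DEGs returns to zero.</p>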
<p>This result supports the validity of our strategy of removing potential DEGs before calculating the normalization factor. Recall that the true values for <it>P</it><sub>DEG </sub>and <it>P</it><sub>A </sub>in this simulation were 20% and 90%, respectively. Our TbT method estimated <it>P</it><sub>DEG </sub>as 16.8% and <it>P</it><sub>A </sub>as 76.3%. We found that 64.4% of the true DEGs were among the estimated DEGs (i.e., sensitivity = 64.4%) and that the overall accuracy was 89.0%. Some researchers might think that the TMM method (i.e., <it>P</it><sub>DEG </sub>= 60% and <it>P</it><sub>A </sub>= 50%) must be able to remove many more true DEGs than our TbT method (i.e., higher sensitivity). This is true, but the TMM method tends to trim many more non-DEGs than our method (i.e., lower specificity), especially when most DEGs are highly expressed in one of the samples (corresponding to our simulation conditions with high <it>P</it><sub>DEG </sub>and <it>P</it><sub>A </sub>values). These characteristics of the two normalization methods and the results shown in Figure <figr fid="F1">1</figr> indicate that the balance of sensitivity and specificity in the assignment of both DEGs and non-DEGs is critical. Our TbT method was originally designed to normalize tag count data for various scenarios including such biased differential expression.</p>
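<p>The relationship between the quoted proportions can be checked with a short calculation. The sketch below assumes that sensitivity means the fraction of true DEGs that were assigned as DEGs, and derives the implied specificity and overall accuracy from the true and estimated <it>P</it><sub>DEG </sub>values; the function name confusion_from_rates is an assumption made for the example.</p>

```python
def confusion_from_rates(p_deg_true, p_deg_est, sensitivity):
    """Return (specificity, accuracy) given proportions in [0, 1].

    p_deg_true  : fraction of genes that are true DEGs
    p_deg_est   : fraction of genes assigned as DEGs
    sensitivity : fraction of true DEGs that were assigned as DEGs
    """
    tp = sensitivity * p_deg_true          # true DEGs assigned as DEGs
    fp = p_deg_est - tp                    # non-DEGs assigned as DEGs
    tn = (1.0 - p_deg_true) - fp           # non-DEGs kept as non-DEGs
    specificity = tn / (1.0 - p_deg_true)
    accuracy = tp + tn
    return specificity, accuracy


# Values quoted in the text for the example simulation.
spec, acc = confusion_from_rates(0.20, 0.168, 0.644)
```

<p>Under this reading, the quoted values are mutually consistent: the calculation gives an accuracy of approximately 0.890 and an implied specificity of approximately 0.951.</p>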
<p>The successful removal of DEGs in the data normalization step generally increases both the sensitivity and specificity of the subsequent differential expression analysis. Indeed, when an exact test implemented in the R package <it>edgeR </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> was used in common for gene ranking, the TbT normalization method showed a higher AUC value (i.e., <it>edgeR</it>/TbT = 90.0%) than the default (the TMM method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> in this package) normalization method (i.e., <it>edgeR</it>/default = 88.9%). We also observed the same trend for the other combinations: <it>DESeq</it>/TbT = 88.7%, <it>DESeq</it>/default = 87.4%, <it>baySeq</it>/TbT = 90.2%, <it>baySeq</it>/default = 78.2%, <it>NBPSeq</it>/TbT = 90.1%, and <it>NBPSeq</it>/default = 80.9%. These results also suggest that our TbT normalization strategy can successfully be combined with the four existing R packages and that the TbT method outperforms the other normalization methods implemented in these packages.</p>
</sec>
<sec><st><p>Simulation results</p></st>
<p>Note that different trials of simulation analysis generally yield different AUC values even if the same simulation conditions are used. It is therefore important to assess the statistical significance, if any, of the improvement obtained with our proposed method. The distributions of AUC values for two <it>edgeR</it>-related combinations (<it>edgeR</it>/TbT and <it>edgeR</it>/default) under three conditions (<it>P</it><sub>A </sub>= 50, 70, and 90% with a fixed <it>P</it><sub>DEG </sub>value of 20%) are shown in Figure <figr fid="F2">2</figr>. The performances of the two combinations were very similar when <it>P</it><sub>A </sub>= 50% (Figure <figr fid="F2">2a</figr>; <it>p</it>-value = 0.95, Wilcoxon rank sum test). This is reasonable because the average estimate of the <it>P</it><sub>A </sub>values by TbT in the 100 trials (49.62%) was quite close to the truth (i.e., 50%) and TMM uses a fixed <it>P</it><sub>A </sub>value of 50%. The higher the <it>P</it><sub>A </sub>value (&gt; 50%) estimated by TbT, the greater the expected performance gain of TbT over TMM.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Distributions of AUC values for two <it>edgeR</it>-related combinations</p></caption><text>
   <p><b>Distributions of AUC values for two <it>edgeR</it>-related combinations</b>. Simulation results for 100 trials under <it>P</it><sub>A </sub>= (a) 50%, (b) 70%, and (c) 90%, with <it>P</it><sub>DEG </sub>= 20%. Left panel: box plots for AUC values. Right panel: scatter plots for AUC values. If the two combinations performed identically, all points would lie on the black (<it>y </it>= <it>x</it>) line. A point below (or above) the black line indicates that the AUC value from the <it>edgeR</it>/TbT combination is higher (or lower) than that from the <it>edgeR</it>/default combination.</p>
</text><graphic file="1748-7188-7-5-2" hint_layout="double"/></fig>
<p>In contrast to the unbiased case above (<it>P</it><sub>A </sub>= 50%), we observed the clear superiority of TbT under the other two conditions (<it>P</it><sub>A </sub>= 70 and 90%). A significant improvement resulting from the use of TbT may seem doubtful because of the very small difference between the two average AUC values (e.g., 90.52% for <it>edgeR</it>/TbT and 90.26% for <it>edgeR</it>/default when <it>P</it><sub>A </sub>= 70%; left panel of Figure <figr fid="F2">2b</figr>), but the <it>edgeR</it>/TbT combination did outperform the <it>edgeR</it>/default combination in all of the 100 trials under the two conditions (right panels of Figures <figr fid="F2">2b</figr> and <figr fid="F2">2c</figr>), and the <it>p</it>-values were lower than 0.01 (Wilcoxon rank sum test).</p>
<p>Table <tblr tid="T1">1</tblr> shows the average AUC values for the two <it>edgeR</it>-related combinations under the various simulation conditions (<it>P</it><sub>DEG </sub>= 5-30% and <it>P</it><sub>A </sub>= 50-100%). Overall, <it>edgeR</it>/TbT performed better than <it>edgeR</it>/default for most of the simulation conditions analyzed. The relative performance of TbT compared to the default method (i.e., the TMM method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> in this case) improves as the <it>P</it><sub>A </sub>value increases from 50%. This is because our estimated values for <it>P</it><sub>DEG </sub>and <it>P</it><sub>A </sub>are closer to the <it>true </it>values than the fixed values of TMM (<it>P</it><sub>DEG </sub>= 60% and <it>P</it><sub>A </sub>= 50%; see Table <tblr tid="T2">2</tblr>). Closer estimates increase the overall accuracy of the DE assignment and lead directly to higher AUC values. This success primarily comes from our three-step normalization strategy, TbT (the TMM-<it>baySeq</it>-TMM pipeline).</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Average AUC values for two <it>edgeR</it>-related combinations.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b><it>P</it><sub>A </sub>= 50%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>60%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>70%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>80%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>100%</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(a) <it>edgeR</it>/TbT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p><it>P</it><sub>DEG </sub>= 5%</p>
         </c>
         <c ca="center">
            <p>90.52</p>
         </c>
         <c ca="center">
            <p>
               <b>89.92</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.58</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.67</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.59</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.10</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.33</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.23</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.80</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.14*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>91.02*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.39*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.43</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.53</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.52*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.60*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.41*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.41*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>90.71</p>
         </c>
         <c ca="center">
            <p>
               <b>90.66*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.23*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.67*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.00*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.46*</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(b) <it>edgeR</it>/default</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.52</b>
            </p>
         </c>
         <c ca="center">
            <p>89.92</p>
         </c>
         <c ca="center">
            <p>90.56</p>
         </c>
         <c ca="center">
            <p>90.62</p>
         </c>
         <c ca="center">
            <p>90.50</p>
         </c>
         <c ca="center">
            <p>89.95</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>90.33</p>
         </c>
         <c ca="center">
            <p>90.21</p>
         </c>
         <c ca="center">
            <p>90.73</p>
         </c>
         <c ca="center">
            <p>89.99</p>
         </c>
         <c ca="center">
            <p>90.74</p>
         </c>
         <c ca="center">
            <p>89.89</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>90.43</p>
         </c>
         <c ca="center">
            <p>90.49</p>
         </c>
         <c ca="center">
            <p>90.26</p>
         </c>
         <c ca="center">
            <p>90.00</p>
         </c>
         <c ca="center">
            <p>89.24</p>
         </c>
         <c ca="center">
            <p>88.40</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.71</b>
            </p>
         </c>
         <c ca="center">
            <p>90.54</p>
         </c>
         <c ca="center">
            <p>89.58</p>
         </c>
         <c ca="center">
            <p>89.35</p>
         </c>
         <c ca="center">
            <p>87.20</p>
         </c>
         <c ca="center">
            <p>84.55</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Average AUC values over a total of 100 trials for each simulation condition: (a) <it>edgeR</it>/TbT and (b) <it>edgeR</it>/default. Simulation data contain a total of 20,000 genes: <it>P</it><sub>DEG </sub>% of the genes are DEGs, and <it>P</it><sub>A </sub>% of the DEGs are higher in Sample A. A total of 24 conditions (four <it>P</it><sub>DEG </sub>values &#215; six <it>P</it><sub>A </sub>values) are shown. The highest AUC value for each condition is in bold. AUC values with asterisks indicate significant improvements (<it>p</it>-value &lt; 0.01, Wilcoxon rank sum test).</p>
   </tblfn></tbl>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Estimated values for <it>P</it><sub>DEG </sub>and <it>P</it><sub>A </sub>by TbT.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>True <it>P</it><sub>A </sub>= 50%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>60%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>70%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>80%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>100%</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>(a) Estimated <it>P</it><sub>DEG </sub>(%)</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p><it>P</it><sub>DEG </sub>= 5%</p>
         </c>
         <c ca="left">
            <p>5.65</p>
         </c>
         <c ca="center">
            <p>5.44</p>
         </c>
         <c ca="center">
            <p>5.68</p>
         </c>
         <c ca="center">
            <p>5.61</p>
         </c>
         <c ca="center">
            <p>5.67</p>
         </c>
         <c ca="center">
            <p>5.54</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="left">
            <p>9.38</p>
         </c>
         <c ca="center">
            <p>9.39</p>
         </c>
         <c ca="center">
            <p>9.58</p>
         </c>
         <c ca="center">
            <p>9.28</p>
         </c>
         <c ca="center">
            <p>9.54</p>
         </c>
         <c ca="center">
            <p>9.31</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="left">
            <p>17.14</p>
         </c>
         <c ca="center">
            <p>17.41</p>
         </c>
         <c ca="center">
            <p>17.21</p>
         </c>
         <c ca="center">
            <p>17.22</p>
         </c>
         <c ca="center">
            <p>17.11</p>
         </c>
         <c ca="center">
            <p>17.01</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="left">
            <p>25.47</p>
         </c>
         <c ca="center">
            <p>25.19</p>
         </c>
         <c ca="center">
            <p>24.87</p>
         </c>
         <c ca="center">
            <p>25.15</p>
         </c>
         <c ca="center">
            <p>24.61</p>
         </c>
         <c ca="center">
            <p>24.34</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>(b) Estimated <it>P</it><sub>A </sub>(%)</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="left">
            <p>49.44</p>
         </c>
         <c ca="center">
            <p>55.08</p>
         </c>
         <c ca="center">
            <p>59.55</p>
         </c>
         <c ca="center">
            <p>65.56</p>
         </c>
         <c ca="center">
            <p>70.02</p>
         </c>
         <c ca="center">
            <p>74.35</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="left">
            <p>50.66</p>
         </c>
         <c ca="center">
            <p>56.27</p>
         </c>
         <c ca="center">
            <p>61.64</p>
         </c>
         <c ca="center">
            <p>67.47</p>
         </c>
         <c ca="center">
            <p>73.98</p>
         </c>
         <c ca="center">
            <p>79.51</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="left">
            <p>49.62</p>
         </c>
         <c ca="center">
            <p>57.41</p>
         </c>
         <c ca="center">
            <p>63.67</p>
         </c>
         <c ca="center">
            <p>69.17</p>
         </c>
         <c ca="center">
            <p>75.49</p>
         </c>
         <c ca="center">
            <p>82.30</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="left">
            <p>50.05</p>
         </c>
         <c ca="center">
            <p>56.58</p>
         </c>
         <c ca="center">
            <p>63.34</p>
         </c>
         <c ca="center">
            <p>70.08</p>
         </c>
         <c ca="center">
            <p>72.47</p>
         </c>
         <c ca="center">
            <p>76.05</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>(c) Sensitivity</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="left">
            <p>62.17</p>
         </c>
         <c ca="center">
            <p>59.53</p>
         </c>
         <c ca="center">
            <p>62.13</p>
         </c>
         <c ca="center">
            <p>62.06</p>
         </c>
         <c ca="center">
            <p>61.79</p>
         </c>
         <c ca="center">
            <p>60.32</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="left">
            <p>63.61</p>
         </c>
         <c ca="center">
            <p>63.41</p>
         </c>
         <c ca="center">
            <p>64.85</p>
         </c>
         <c ca="center">
            <p>62.59</p>
         </c>
         <c ca="center">
            <p>64.27</p>
         </c>
         <c ca="center">
            <p>62.31</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="left">
            <p>67.37</p>
         </c>
         <c ca="center">
            <p>68.15</p>
         </c>
         <c ca="center">
            <p>67.24</p>
         </c>
         <c ca="center">
            <p>66.68</p>
         </c>
         <c ca="center">
            <p>65.13</p>
         </c>
         <c ca="center">
            <p>63.98</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="left">
            <p>71.07</p>
         </c>
         <c ca="center">
            <p>70.09</p>
         </c>
         <c ca="center">
            <p>68.53</p>
         </c>
         <c ca="center">
            <p>68.69</p>
         </c>
         <c ca="center">
            <p>63.99</p>
         </c>
         <c ca="center">
            <p>59.56</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>(d) Specificity</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="left">
            <p>97.35</p>
         </c>
         <c ca="center">
            <p>97.43</p>
         </c>
         <c ca="center">
            <p>97.31</p>
         </c>
         <c ca="center">
            <p>97.39</p>
         </c>
         <c ca="center">
            <p>97.31</p>
         </c>
         <c ca="center">
            <p>97.37</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="left">
            <p>96.70</p>
         </c>
         <c ca="center">
            <p>96.67</p>
         </c>
         <c ca="center">
            <p>96.62</p>
         </c>
         <c ca="center">
            <p>96.69</p>
         </c>
         <c ca="center">
            <p>96.60</p>
         </c>
         <c ca="center">
            <p>96.64</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="left">
            <p>95.53</p>
         </c>
         <c ca="center">
            <p>95.39</p>
         </c>
         <c ca="center">
            <p>95.40</p>
         </c>
         <c ca="center">
            <p>95.26</p>
         </c>
         <c ca="center">
            <p>95.01</p>
         </c>
         <c ca="center">
            <p>94.84</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="left">
            <p>94.25</p>
         </c>
         <c ca="center">
            <p>94.22</p>
         </c>
         <c ca="center">
            <p>94.00</p>
         </c>
         <c ca="center">
            <p>93.67</p>
         </c>
         <c ca="center">
            <p>92.42</p>
         </c>
         <c ca="center">
            <p>90.89</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>(e) Accuracy</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="left">
            <p>95.58</p>
         </c>
         <c ca="center">
            <p>95.52</p>
         </c>
         <c ca="center">
            <p>95.54</p>
         </c>
         <c ca="center">
            <p>95.61</p>
         </c>
         <c ca="center">
            <p>95.52</p>
         </c>
         <c ca="center">
            <p>95.50</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="left">
            <p>93.36</p>
         </c>
         <c ca="center">
            <p>93.31</p>
         </c>
         <c ca="center">
            <p>93.42</p>
         </c>
         <c ca="center">
            <p>93.26</p>
         </c>
         <c ca="center">
            <p>93.34</p>
         </c>
         <c ca="center">
            <p>93.17</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="left">
            <p>89.86</p>
         </c>
         <c ca="center">
            <p>89.90</p>
         </c>
         <c ca="center">
            <p>89.73</p>
         </c>
         <c ca="center">
            <p>89.50</p>
         </c>
         <c ca="center">
            <p>88.99</p>
         </c>
         <c ca="center">
            <p>88.62</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="left">
            <p>87.25</p>
         </c>
         <c ca="center">
            <p>86.94</p>
         </c>
         <c ca="center">
            <p>86.32</p>
         </c>
         <c ca="center">
            <p>86.13</p>
         </c>
         <c ca="center">
            <p>83.84</p>
         </c>
         <c ca="center">
            <p>81.43</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Estimates averaged over 100 trials for (a) <it>P</it><sub>DEG </sub>and (b) <it>P</it><sub>A</sub>. The (c) sensitivity, (d) specificity, and (e) accuracy of the estimation are also shown, all expressed as percentages.</p>
   </tblfn></tbl>
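<p>The sensitivity, specificity, and accuracy reported in Table <tblr tid="T2">2</tblr> can be illustrated with a minimal sketch. It assumes the standard definitions based on counts of true and false positives and negatives, which are not spelled out in this section. The function name <it>classification_metrics</it> and the toy gene sets are hypothetical and serve only as an illustration.</p>

```python
def classification_metrics(true_deg, estimated_deg, n_genes):
    """Return sensitivity, specificity, and accuracy as percentages.

    Assumes the standard confusion-matrix definitions; gene identifiers
    are arbitrary hashable labels.
    """
    true_deg, estimated_deg = set(true_deg), set(estimated_deg)
    tp = len(true_deg.intersection(estimated_deg))  # true DEGs called as DEGs
    fp = len(estimated_deg - true_deg)              # non-DEGs called as DEGs
    fn = len(true_deg - estimated_deg)              # true DEGs that were missed
    tn = n_genes - tp - fp - fn                     # non-DEGs left uncalled
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / n_genes,
    }


# Hypothetical toy example: 20 genes, 4 true DEGs, 5 genes called as DEGs.
metrics = classification_metrics({0, 1, 2, 3}, {0, 1, 2, 10, 11}, n_genes=20)
print(metrics)
```

<p>Because non-DEGs greatly outnumber DEGs in these simulations, accuracy is dominated by specificity, which is one reason to read the three measures together.</p>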
<p>Table <tblr tid="T3">3</tblr> shows the simulation results for the other six combinations. In nearly all conditions, TbT performed better than the default normalization methods implemented in the other three packages (<it>DESeq </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, <it>baySeq </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and <it>NBPSeq </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>). Among the four default procedures (<it>edgeR</it>/default, <it>DESeq</it>/default, <it>baySeq</it>/default, and <it>NBPSeq</it>/default), the <it>edgeR</it>/default combination outperforms the others. This result suggests the superiority of the default normalization method (i.e., TMM) implemented in the <it>edgeR </it>package and supports the validity of our choices at steps 1 and 3 in our TbT normalization strategy. To make the analysis reproducible, the R-code for obtaining a small portion of the above results is given in Additional file <supplr sid="S1">1</supplr>.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Average AUC values for other six combinations.</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b><it>P</it><sub>A </sub>= 50%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>60%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>70%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>80%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>100%</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(a) <it>DESeq</it>/TbT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p><it>P</it><sub>DEG </sub>= 5%</p>
         </c>
         <c ca="center">
            <p>
               <b>85.03</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>83.94</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>85.20</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>85.31</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>85.12*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>84.60*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>
               <b>86.94</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>86.90</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>87.42*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>86.80*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>87.61*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>86.95*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>89.05</p>
         </c>
         <c ca="center">
            <p>
               <b>89.23</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.18*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.33*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>88.97*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>88.95*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.30</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.20*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.79*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.11*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.44*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>88.80*</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(b) <it>DESeq</it>/default</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>85.03</p>
         </c>
         <c ca="center">
            <p>83.92</p>
         </c>
         <c ca="center">
            <p>85.13</p>
         </c>
         <c ca="center">
            <p>85.19</p>
         </c>
         <c ca="center">
            <p>84.84</p>
         </c>
         <c ca="center">
            <p>84.18</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>86.93</p>
         </c>
         <c ca="center">
            <p>86.85</p>
         </c>
         <c ca="center">
            <p>87.27</p>
         </c>
         <c ca="center">
            <p>86.46</p>
         </c>
         <c ca="center">
            <p>87.07</p>
         </c>
         <c ca="center">
            <p>86.15</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>
               <b>89.06</b>
            </p>
         </c>
         <c ca="center">
            <p>89.19</p>
         </c>
         <c ca="center">
            <p>88.93</p>
         </c>
         <c ca="center">
            <p>88.62</p>
         </c>
         <c ca="center">
            <p>87.76</p>
         </c>
         <c ca="center">
            <p>86.84</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>90.30</p>
         </c>
         <c ca="center">
            <p>90.00</p>
         </c>
         <c ca="center">
            <p>88.94</p>
         </c>
         <c ca="center">
            <p>87.95</p>
         </c>
         <c ca="center">
            <p>85.36</p>
         </c>
         <c ca="center">
            <p>81.98</p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(c) <it>baySeq</it>/TbT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>
               <b>89.91</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.45</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.91*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.17*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.93*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.36*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>
               <b>89.89</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.90*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.46*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.79*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.28*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.02*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.39*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.46*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.40*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.49*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.21*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.47*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.80</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.55*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.44*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.69*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.26*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>88.33*</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(d) <it>baySeq</it>/default</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>89.67</p>
         </c>
         <c ca="center">
            <p>89.27</p>
         </c>
         <c ca="center">
            <p>88.62</p>
         </c>
         <c ca="center">
            <p>88.69</p>
         </c>
         <c ca="center">
            <p>86.37</p>
         </c>
         <c ca="center">
            <p>86.18</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>89.80</p>
         </c>
         <c ca="center">
            <p>89.55</p>
         </c>
         <c ca="center">
            <p>89.52</p>
         </c>
         <c ca="center">
            <p>87.71</p>
         </c>
         <c ca="center">
            <p>84.14</p>
         </c>
         <c ca="center">
            <p>83.86</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>90.22</p>
         </c>
         <c ca="center">
            <p>88.78</p>
         </c>
         <c ca="center">
            <p>88.92</p>
         </c>
         <c ca="center">
            <p>87.85</p>
         </c>
         <c ca="center">
            <p>79.09</p>
         </c>
         <c ca="center">
            <p>69.65</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>90.76</p>
         </c>
         <c ca="center">
            <p>90.05</p>
         </c>
         <c ca="center">
            <p>87.21</p>
         </c>
         <c ca="center">
            <p>79.69</p>
         </c>
         <c ca="center">
            <p>65.45</p>
         </c>
         <c ca="center">
            <p>53.37</p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(e) <it>NBPSeq</it>/TbT</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.75*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.18</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.80*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.90*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.78*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.33*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.59*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.47*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>91.00*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.34*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>91.14*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.49*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.67*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.72*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.70*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.68*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.42*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.37*</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>
               <b>90.92</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.83*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.32*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>90.74*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.89*</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>89.23*</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>(f) <it>NBPSeq</it>/default</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>90.48</p>
         </c>
         <c ca="center">
            <p>90.00</p>
         </c>
         <c ca="center">
            <p>89.71</p>
         </c>
         <c ca="center">
            <p>89.58</p>
         </c>
         <c ca="center">
            <p>87.85</p>
         </c>
         <c ca="center">
            <p>87.60</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>90.34</p>
         </c>
         <c ca="center">
            <p>90.15</p>
         </c>
         <c ca="center">
            <p>90.11</p>
         </c>
         <c ca="center">
            <p>88.46</p>
         </c>
         <c ca="center">
            <p>86.19</p>
         </c>
         <c ca="center">
            <p>85.38</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>20%</p>
         </c>
         <c ca="center">
            <p>90.39</p>
         </c>
         <c ca="center">
            <p>89.12</p>
         </c>
         <c ca="center">
            <p>89.22</p>
         </c>
         <c ca="center">
            <p>88.29</p>
         </c>
         <c ca="center">
            <p>81.59</p>
         </c>
         <c ca="center">
            <p>73.93</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>30%</p>
         </c>
         <c ca="center">
            <p>90.84</p>
         </c>
         <c ca="center">
            <p>90.26</p>
         </c>
         <c ca="center">
            <p>87.45</p>
         </c>
         <c ca="center">
            <p>81.96</p>
         </c>
         <c ca="center">
            <p>70.97</p>
         </c>
         <c ca="center">
            <p>60.73</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Results for (a) <it>DESeq</it>/TbT, (b) <it>DESeq</it>/default, (c) <it>baySeq</it>/TbT, (d) <it>baySeq</it>/default, (e) <it>NBPSeq</it>/TbT, and (f) <it>NBPSeq</it>/default. For each package, the higher AUC value of the two normalization methods is in bold. Asterisks mark AUC values that are significantly improved (<it>p</it>-value &lt; 0.01, Wilcoxon rank sum test).</p>
   </tblfn></tbl>
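<p>As an illustration of the AUC values compared in Table <tblr tid="T3">3</tblr>, the following minimal sketch computes an AUC from gene-level scores and true DEG labels. It uses the pairwise (Mann-Whitney) formulation of the AUC and is not the code used in this study. It assumes that a higher score means stronger evidence of DE, so <it>p</it>-values would need to be negated first. The function name <it>auc_from_scores</it> and the toy data are hypothetical.</p>

```python
def auc_from_scores(scores, is_deg):
    """AUC as the fraction of (DEG, non-DEG) pairs ranked correctly.

    Ties between a DEG and a non-DEG count as half a correct pair.
    This pairwise form is quadratic in the number of genes, so it is
    meant for illustration rather than for 20,000-gene data sets.
    """
    pos = [s for s, d in zip(scores, is_deg) if d]
    neg = [s for s, d in zip(scores, is_deg) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: six genes, three of which are true DEGs.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_deg = [True, True, False, True, False, False]
auc = auc_from_scores(scores, is_deg)
print(f"AUC = {100 * auc:.2f}%")
```

<p>An AUC of 50% corresponds to a random ranking of genes, which gives a baseline for reading the lowest values in the table.</p>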
<suppl id="S1">
<title><p>Additional file 1</p></title>
<text><p><b>R-code for simulation analysis</b>. Executing this R-code with the default parameter settings (i.e., <it>rep_num </it>= 100, <it>param1 </it>= 4, ..., and <it>param6 </it>= 090) produces two output files named "Fig1.png" and "resultNB_020_090.txt". The former is the same as Figure <figr fid="F1">1</figr>. The latter contains the raw data for Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr>, and <tblr tid="T3">3</tblr> when <it>P</it><sub>DEG </sub>= 20% and <it>P</it><sub>A </sub>= 90%. The values given as <it>rep_num</it>, <it>param1</it>, ..., and <it>param6 </it>indicate the number of trials (<it>rep_num</it>), fold-change of differential expression (<it>param1</it>-fold), number of libraries for sample A (<it>param2</it>), number of libraries for sample B (<it>param3</it>), total number of genes (<it>param4</it>), true <it>P</it><sub>DEG </sub>(<it>param5</it>), and true <it>P</it><sub>A </sub>(<it>param6</it>), respectively. For example, to obtain the raw results when <it>P</it><sub>DEG </sub>= 30% and <it>P</it><sub>A </sub>= 60%, the values for <it>param5 </it>and <it>param6 </it>should be changed to "030" and "060", respectively.</p></text>
<file name="1748-7188-7-5-S1.R">
   <p>Click here for file</p>
</file>
</suppl>
<p>Recall that the level of DE for DEGs was four-fold in this simulation framework and the shape of the distribution for introduced DEGs is the same as that of non-DEGs (left panel of Figure <figr fid="F1">1</figr>). This indicates that some DEGs introduced as higher expression in Sample A (or Sample B) can display positive (or negative) M values even after adjustment by the median M value for non-DEGs. In other words, there are some DEGs whose log-ratio signs (i.e., directions of DE) are different from the original intentions. Although the simulation framework regarding the introduction of DEGs was the same as that described in the TMM study <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, this may weaken the validity of the current simulation framework.</p>
<p>To mitigate this concern, we performed simulations with compatible directions of DE by adding a floor value of fold-changes (&gt; 1.2-fold) when introducing DEGs. In this simulation, the fold-changes for DEGs were randomly sampled from "1.2 + a gamma distribution with shape = 2.0 and scale = 0.5." Accordingly, the minimum and mean fold-changes were approximately 1.2 and 2.2 (= 1.2 + 2.0 &#215; 0.5), respectively. We confirmed the superiority of TbT under the various simulation conditions (<it>P</it><sub>DEG </sub>= 5-30% and <it>P</it><sub>A </sub>= 50-100%) with the above simulation framework (data not shown). An M-A plot of the simulation result when <it>P</it><sub>DEG </sub>= 20% and <it>P</it><sub>A </sub>= 90% is given in Additional file <supplr sid="S2">2</supplr>. The R-code for obtaining the full results under the simulation condition is given in Additional file <supplr sid="S3">3</supplr>.</p>
<suppl id="S2">
<title><p>Additional file 2</p></title>
<text><p><b>Result of TbT using simulation data with &gt; 1.2-fold of DEGs</b>. Legends in this figure are essentially the same as those described in Figure <figr fid="F1">1</figr>. The difference between the two is the distributions of DEGs (magenta dots). This simulation does not have DEGs with low fold-changes (&#8804; 1.2-fold) and the average fold-change is theoretically 2.2. The R code for obtaining the full results under the simulation condition (i.e., <it>P</it><sub>DEG </sub>= 20% and <it>P</it><sub>A </sub>= 90%) is given in Additional file <supplr sid="S3">3</supplr>.</p></text>
<file name="1748-7188-7-5-S2.PPT">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title><p>Additional file 3</p></title>
<text><p><b>R-code for obtaining simulation results with &gt; 1.2-fold of DEGs</b>. After execution of this R-code with default parameter settings (i.e., <it>rep_num </it>= 100, <it>param1 </it>= c(1.2, 2.0, 0.5),..., and <it>param6 </it>= 090), two output files named "Additional2.png" and "resultNB2_020_090.txt" can be obtained. The former is the same as Additional file <supplr sid="S2">2</supplr>. The format of the latter output file is essentially the same as the "resultNB_020_090.txt" file obtained by executing Additional file <supplr sid="S1">1</supplr>. The main difference between the current code and Additional file <supplr sid="S1">1</supplr> is in the parameter settings for producing the distributions of DEGs at <it>param1</it>. The parameter values (1.2, 2.0, and 0.5) indicated in <it>param1 </it>are used for the minimum fold-change (= 1.2) and for random sampling of fold-change values from a gamma distribution with shape (= 2.0) and scale (= 0.5) parameters, respectively.</p></text>
<file name="1748-7188-7-5-S3.R">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec><st><p>Iterative normalization approach</p></st>
<p>Recall that TbT outperforms TMM (see Table <tblr tid="T1">1</tblr> and Figure <figr fid="F2">2</figr>) by virtue of our DEG elimination strategy for normalizing tag count data, and that the identification of DEGs in TbT is performed using <it>baySeq </it>with the TMM normalization factors at step 2. It is therefore expected that the accuracy of the DEG identification at step 2 can be increased by using <it>baySeq </it>with the TbT factors instead of the TMM factors when <it>P</it><sub>A </sub>&gt; 50%. This advanced DEG elimination procedure (the TbT-<it>baySeq</it>-TMM pipeline) can produce normalization factors (say "TbT1") that differ from the original ones. As illustrated in Figure <figr fid="F3">3a</figr>, the procedure can be repeated until the calculated normalization factors converge.</p>
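<p>The repeat-until-convergence idea described above can be sketched as a generic fixed-point loop. This is only an illustrative Python sketch, not the R-code of the Additional files: the names <it>iterate_factors </it>and <it>toy_update </it>are hypothetical, and <it>toy_update </it>is a toy stand-in for one round of "identify DEGs with the current factors, then recompute the factors on the remaining genes".</p>

```python
def iterate_factors(initial_factors, update_factors, tol=1e-6, max_iter=50):
    """Repeat a normalization update until the factors stop changing.

    update_factors is a hypothetical callable standing in for one round of
    DEG identification followed by recalculation of normalization factors.
    Returns the final factors and the number of rounds performed.
    """
    factors = list(initial_factors)
    for n_iter in range(1, max_iter + 1):
        new_factors = list(update_factors(factors))
        # Largest absolute change of any factor in this round.
        change = max(abs(a - b) for a, b in zip(new_factors, factors))
        factors = new_factors
        if tol > change:
            return factors, n_iter
    return factors, max_iter


def toy_update(factors):
    # Toy stand-in only: moves each factor halfway toward a fixed target.
    target = [0.9, 1.1]
    return [f + 0.5 * (t - f) for f, t in zip(factors, target)]


final_factors, n_rounds = iterate_factors([1.0, 1.0], toy_update)
print(n_rounds, [round(f, 4) for f in final_factors])
```

<p>In a real analysis each round would involve a full run of the DEG identification step, so a small fixed number of rounds (as in Figure <figr fid="F3">3</figr>) may be more practical than a strict tolerance.</p>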
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Results of iterative TbT approach</p></caption><text>
   <p><b>Results of iterative TbT approach</b>. (a) Procedure for iterative TbT approach until the third iteration, and simulation results under <it>P</it><sub>A </sub>= (b) 50%, (c) 70%, and (d) 90%, with <it>P</it><sub>DEG </sub>= 20%. Left panel: accuracies of DEG identifications when step 2 in our DEG elimination strategy is performed using the following normalization factors: TMM (<it>Default</it>), TbT (<it>First</it>), TbT1 (<it>Second</it>), and TbT2 (<it>Third</it>). Right panel: AUC values when the following normalization factors are combined with the <it>edgeR </it>package: TbT (<it>Default</it>), TbT1 (<it>First</it>), TbT2 (<it>Second</it>), and TbT3 (<it>Third</it>).</p>
</text><graphic file="1748-7188-7-5-3" hint_layout="double"/></fig>
<p>The results under three simulation conditions (<it>P</it><sub>A </sub>= 50, 70, and 90% with a fixed <it>P</it><sub>DEG </sub>value of 20%) are shown in Figures <figr fid="F3">3b-d</figr>. The left panels show the accuracies of DEG identifications when step 2 in our DEG elimination procedures is performed using the following normalization factors: TMM (<it>Default</it>), TbT (<it>First</it>), TbT1 (<it>Second</it>), and TbT2 (<it>Third</it>). As expected, the iterative approach does not positively affect the results when <it>P</it><sub>A </sub>= 50% (Figure <figr fid="F3">3b</figr>). Indeed, the performances of the <it>baySeq</it>/TMM combination (<it>Default</it>) and the <it>baySeq</it>/TbT2 combination (<it>Third</it>) are not statistically distinguishable (<it>p </it>= 0.38, Wilcoxon rank sum test). In contrast, when <it>P</it><sub>A </sub>= 70%, the use of the <it>baySeq</it>/TbT combination (<it>First</it>) clearly increases the accuracy compared to the <it>baySeq</it>/TMM combination (<it>Default</it>), though the subsequent iterations do not improve the accuracy further (Figure <figr fid="F3">3c</figr>, left panel). When <it>P</it><sub>A </sub>= 90%, an advantageous trend for the iterative approach was also observed until the second iteration (<it>Second</it>; the <it>baySeq</it>/TbT1 combination) (Figure <figr fid="F3">3d</figr>, left panel).</p>
<p>The right panels for Figures <figr fid="F3">3b-d</figr> show the AUC values when the following normalization factors are combined with the <it>edgeR </it>package: TbT (<it>Default</it>), TbT1 (<it>First</it>), TbT2 (<it>Second</it>), and TbT3 (<it>Third</it>). The overall trend is the same as that of the accuracies shown in the left panels: the iterative TbT approach can outperform the original TbT approach when the degree of biased differential expression is high (<it>P</it><sub>A </sub>&gt; 50%). We confirmed the utility of the iterative approach with the other three packages (<it>DESeq</it>, <it>baySeq</it>, and <it>NBPSeq</it>) (data not shown). These results suggest that the iterative approach can be recommended, especially when the <it>P</it><sub>A </sub>value estimated by the original TbT method is displaced from 50%.</p>
<p>Nevertheless, we should emphasize that the improvement of the iterative TbT approach over the original TbT approach is much smaller than that of TbT over the default normalization methods implemented in the four R packages investigated (Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). For example, when <it>P</it><sub>A </sub>= 70%, the average difference in AUC values between <it>edgeR</it>/TbT3 and <it>edgeR</it>/TbT is 0.02% (Figure <figr fid="F3">3c</figr>), while that between <it>edgeR</it>/TbT and <it>edgeR</it>/default is 0.26% (Figure <figr fid="F2">2b</figr>). Note also that the <it>baySeq </it>package used in step 2 of our TbT method is much more computationally intensive than the other three packages, so <it>n </it>iterations of TbT require roughly <it>n</it>-fold computation time. Speeding up our proposed DEG elimination strategy is therefore an important task for future work. The R-code for obtaining a small portion of the above results is given in Additional file <supplr sid="S4">4</supplr>.</p>
<suppl id="S4">
<title><p>Additional file 4</p></title>
<text><p><b>R-code for obtaining raw results shown in Figure </b><figr fid="F3">3</figr>. After execution of this R-code with default parameter settings (i.e., <it>rep_num </it>= 100, <it>param1 </it>= 4,..., and <it>param7 </it>= 5000), four output files named "iteration0_020_090.txt", "iteration1_020_090.txt", "iteration2_020_090.txt", and "iteration3_020_090.txt" can be obtained. The box plots for <it>Default</it>, <it>First</it>, <it>Second</it>, and <it>Third </it>shown in Figure <figr fid="F3">3</figr> are produced using values in two columns (named "accuracy" and "AUC(edgeR/TbT)") in the first, second, third, and fourth file, respectively. The <it>p</it>-values were calculated based on the Wilcoxon rank sum test.</p></text>
<file name="1748-7188-7-5-S4.R">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec><st><p>Real data (wildtype vs. <it>RDR6 </it>knockout dataset used in <it>baySeq </it>study)</p></st>
<p>Finally, we show results from an analysis similar to that described in Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In brief, Hardcastle and Kelly compared two wildtype and two <it>RNA-dependent RNA polymerase 6 </it>(<it>RDR6</it>) knockout <it>Arabidopsis thaliana </it>leaf samples by sequencing small RNAs (sRNAs). From a total of 70,619 unique sRNA sequences, they identified 657 differentially expressed (DE) sRNAs that uniquely match tasRNA, which is produced by <it>RDR6</it>, and that are decreased in <it>RDR6 </it>mutants and regarded as provisional true positives. Therefore, we assume that the logical values for <it>P</it><sub>DEG </sub>and <it>P</it><sub>A </sub>are at least 0.93% (= 657/70,619) and around 100%, respectively. In accordance with that study <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, the evaluation metric here is that a good method should be able to rank those true positives as highly as possible. Recall that the strategy for TbT is to normalize data after the elimination of such DE sRNAs for such a purpose.</p>
<p>The TbT estimated 7.8% of <it>P</it><sub>DEG </sub>(5,495 <it>potential </it>DE sRNAs) and 70.2% of <it>P</it><sub>A</sub>. We found that the 5,495 sRNAs included 255 of the 657 true positives. This suggests that our strategy was effective because the original percentage (657/70,619 = 0.93%) of true positives decreased ((657 - 255)/(70,619 - 5,495) = 0.62%) before the TbT normalization factor was calculated at step 3. In summary, the TbT normalization factor was calculated based on 65,124 (= 70,619 - 5,495) potentially non-DE sRNAs after 255 out of the 657 provisional DE sRNAs were eliminated.</p>
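<p>The arithmetic in the preceding paragraph can be reproduced with a few lines of Python. The counts are those quoted in the text; the variable names are ours and purely illustrative.</p>

```python
total_srnas = 70619      # unique sRNA sequences in the dataset
true_positives = 657     # provisional true positives (tasRNA-associated)
potential_de = 5495      # sRNAs flagged as potential DE at step 2
removed_true = 255       # true positives among the flagged sRNAs

# sRNAs and true positives that remain when the step-3 factors are calculated.
remaining_srnas = total_srnas - potential_de
remaining_true = true_positives - removed_true

pct_before = 100.0 * true_positives / total_srnas
pct_after = 100.0 * remaining_true / remaining_srnas

print(remaining_srnas)         # 65124
print(round(pct_before, 2))    # 0.93
print(round(pct_after, 2))     # 0.62
```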
<p>A true discovery plot (the number of provisional true positives when an arbitrary number of top-ranked sRNAs is selected as differentially expressed) is shown in Figure <figr fid="F4">4a</figr>. Note that this figure is essentially the same as Figure five in Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, so we chose the colors for indicating individual R packages and the ranges for both axes to be as similar as possible to the original. Since the original study <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> reported that another package (<it>DEGseq </it><abbrgrp><abbr bid="B26">26</abbr></abbrgrp>) was the best when the range in the figure was evaluated, we also analyzed the package with the same parameter settings as in Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and obtained a reproducible result for <it>DEGseq</it>.</p>
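<p>The quantity plotted in a true discovery plot can be computed as in the following minimal Python sketch. The function name <it>true_discoveries </it>and the toy identifiers are hypothetical and are not taken from the Additional files.</p>

```python
def true_discoveries(ranked_ids, true_ids, top_counts):
    """Count provisional true positives among the top-k ranked items.

    ranked_ids is ordered from most to least differentially expressed.
    Returns a dict mapping each requested k to the number of hits in the top k.
    """
    true_set = set(true_ids)
    wanted = set(top_counts)
    hits = 0
    counts = {}
    for k, item in enumerate(ranked_ids, start=1):
        if item in true_set:
            hits += 1
        if k in wanted:
            counts[k] = hits
    return counts


ranked = ["s3", "s1", "s7", "s2", "s5", "s4", "s6"]   # toy ranking, best first
truth = ["s1", "s2", "s3"]                             # toy true positives
print(true_discoveries(ranked, truth, [1, 3, 5]))      # {1: 1, 3: 2, 5: 3}
```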
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Results for real data</p></caption><text>
   <p><b>Results for real data</b>. (a) Number of tasRNA-associated sRNAs (i.e., provisional true discoveries) for given numbers of top-ranked sRNAs obtained from individual combinations. Combinations of individual R packages with TbT and default normalization methods are indicated by dashed and solid lines, respectively. For easy comparison with the previous study, results of <it>DEGseq </it>with the same parameter settings as in the previous study are also shown (solid yellow line). (b) Full ROC plots. Plots on left side (roughly the [0.00, 0.05] region on the <it>x</it>-axis) are essentially the same as those shown in Figure 4a. The R-code for producing Figure 4 is available in Additional file <supplr sid="S5">5</supplr>.</p>
</text><graphic file="1748-7188-7-5-4" hint_layout="double"/></fig>
<p>Three combinations (<it>baySeq</it>/TbT, <it>edgeR</it>/TbT, and <it>edgeR</it>/default) outperformed the <it>DEGseq </it>package. The higher performances of these combinations were also observed from the full ROC curves (Figure <figr fid="F4">4b</figr>). The <it>baySeq</it>/TbT combination displayed the highest AUC value (74.6%), followed by <it>edgeR</it>/default (70.0%) and <it>edgeR</it>/TbT (69.3%). Recall that the <it>edgeR</it>/default combination uses the TMM normalization method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and that the basic strategy (i.e., potential DEGs are not used) for data normalization is essentially the same as that of our TbT. This result confirms the previous findings <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>: potential DE entities have a negative impact on data normalization, and their presence consequently reduces their own chance of being top-ranked.</p>
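<p>For a ranked list without ties, the AUC equals the fraction of (true positive, non-true positive) pairs in which the true positive is ranked higher. A minimal Python sketch of this calculation follows; the function name <it>auc_from_ranking </it>and the toy data are hypothetical, and this is not necessarily how the AUC values were computed in Additional file <supplr sid="S5">5</supplr>.</p>

```python
def auc_from_ranking(ranked_ids, true_ids):
    """AUC of a tie-free ranking (best first) against a set of true positives."""
    true_set = set(true_ids)
    n_pos = sum(1 for item in ranked_ids if item in true_set)
    n_neg = len(ranked_ids) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative item")
    negatives_seen = 0
    concordant = 0
    for item in ranked_ids:
        if item in true_set:
            # Every negative not yet seen is ranked below this positive.
            concordant += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return concordant / (n_pos * n_neg)


ranked = ["s3", "s1", "s7", "s2", "s5", "s4", "s6"]   # toy ranking, best first
truth = ["s1", "s2", "s3"]                             # toy true positives
auc = auc_from_ranking(ranked, truth)
print(round(100.0 * auc, 2))                           # 91.67
```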
<p>Three combinations (<it>edgeR</it>/default, <it>DESeq</it>/default, and <it>baySeq</it>/default) performed differently between the current study and the original one <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The difference for the first two combinations can be explained by the different choices for the <it>default </it>normalization methods. Hardcastle and Kelly <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> used a simple normalization method by adjusting the total number of reads in each library for both packages with a reasonable explanation for why the recommended method (i.e., the default method we used here) implemented in the <it>DESeq </it>package was not used. The TMM normalization method that we used as the <it>default </it>in the <it>edgeR </it>package was probably not implemented in the package when they conducted their evaluation. We found that both procedures (i.e., <it>edgeR </it>and <it>DESeq </it>packages with library-size normalization) performed poorly on average (data not shown).</p>
<p>The difference between the current result (<it>baySeq</it>/default; solid red line in Figure <figr fid="F4">4a</figr>) and the previous result (dashed red line in Figure five in Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>) might be explained by the fact that bootstrap resampling was conducted a different number of times for estimating the empirical distribution on the parameters of the NB distribution. Although the current result was obtained using 10,000 iterations of resampling as suggested in the package, we sometimes obtained a similar result to the previous one when we analyzed <it>baySeq</it>/default using 1,000 iterations of resampling. We therefore infer that the previous result was probably obtained by taking a small sample, such as 1,000 iterations. In any case, we found that those results for the <it>baySeq</it>/default combination with different parameter settings were overall inferior to the <it>baySeq</it>/TbT combination. For reproducibility, the R-code for obtaining the results in Figure <figr fid="F4">4</figr> and AUC values for individual combinations is given in Additional file <supplr sid="S5">5</supplr>.</p>
<suppl id="S5">
<title><p>Additional file 5</p></title>
<text><p><b>R-code for producing Figure </b><figr fid="F4">4</figr> <b>and AUC values for individual combinations</b>. We obtained an input file (named "rdr6_wt.RData") from Dr. T.J. Hardcastle (the corresponding author of Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>). After execution of this R-code, three output files (arbitrarily named "Fig4a.png", "Fig4b.png", and "AUCvalue_Fig4b.txt") can be obtained.</p></text>
<file name="1748-7188-7-5-S5.R">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec><st><p>Conclusion</p></st>
<p>We described a strategy (called TbT as an acronym for the TMM-<it>baySeq</it>-TMM procedure) for normalizing tag count data. We evaluated the feasibility of TbT based on three commonly used R packages (<it>edgeR</it>, <it>DESeq</it>, and <it>baySeq</it>) and a recently published package <it>NBPSeq</it>, using a variety of simulation data and a real dataset. By comparing the default procedures recommended in the individual packages (<it>edgeR</it>/default, <it>DESeq</it>/default, <it>baySeq</it>/default, and <it>NBPSeq</it>/default) and procedures where our proposed TbT was used in the normalization step instead of the default normalization method (<it>edgeR</it>/TbT, <it>DESeq</it>/TbT, <it>baySeq</it>/TbT, and <it>NBPSeq</it>/TbT), the effectiveness of TbT has been suggested for increasing the sensitivity and specificity of differential expression analysis of tag count data such as RNA-seq.</p>
<p>Our study demonstrated that the elimination of potential DEGs is essential for obtaining good normalized data. In other words, the elimination of the DEGs before data normalization can increase both sensitivity and specificity for identifying DEGs. Conventional approaches consisting of two steps (i.e., data normalization and gene ranking) cannot accomplish this aim in principle. The two-step approach includes the default procedures recommended in individual packages (<it>edgeR</it>/default, <it>DESeq</it>/default, <it>baySeq</it>/default, and <it>NBPSeq</it>/default). Our proposed approach consists of a total of four steps (data normalization, DEG identification, data normalization, and DEG identification). This procedure enables potential DEGs to be eliminated before the second normalization (step 3).</p>
<p>Our TbT normalization strategy is a proposed pipeline for the first three steps, where the TMM normalization method is used at steps 1 and 3 and the empirical Bayesian method implemented in the <it>baySeq </it>package is used at step 2. This is because our strategy was originally designed to improve the TMM method, the default method implemented in the <it>edgeR </it>package. As demonstrated in the current simulation results comparing two groups (for example, samples A and B), the default normalization methods implemented in the existing R packages performed poorly in simulations where almost all the DEGs are highly expressed in Sample A (i.e., the case of <it>P</it><sub>A </sub>&gt;&gt; 50% when the range is defined as 50% &#8804; <it>P</it><sub>A </sub>&#8804; 100%). Although the negative impact derived from such biased differential expression gradually increases according to the increased proportion of DEGs in the data, our strategy can eliminate some of those DEGs before data normalization (Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr>, and <tblr tid="T3">3</tblr>). The use of the empirical Bayesian method implemented in the <it>baySeq </it>package primarily contributes to solving this problem.</p>
<p>Although we focused on expression-level data in this study, similar analysis of differences in ChIP-seq tag counts would benefit from this method. It is natural to expect that loss of the function of histone modification enzymes will lead to biased distribution of the difference between compared conditions in the corresponding ChIP-seq analysis, in a similar way to the <it>RDR6 </it>case. We observed relatively high performances for <it>NBPSeq</it>/TbT when analyzing simulation data (Tables <tblr tid="T1">1</tblr> and <tblr tid="T3">3</tblr>) and <it>baySeq</it>/TbT when analyzing a real dataset (Figure <figr fid="F4">4</figr>). However, this might simply be because the simulation and real data used in this study were derived from the <it>NBPSeq </it>study <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and the <it>baySeq </it>study <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, respectively. In this sense, the <it>edgeR</it>/TbT combination might be suitable because it performed comparably to the individual bests. The DEG elimination strategy we proposed here could be applied for many other combinations of methods, e.g., the use of an exact test for NB distribution <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> for detecting potential DEGs at step 2. A more extensive study with other recently proposed methods (e.g., Ref. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>) based on many real datasets should still be performed.</p>
</sec>
<sec><st><p>Methods</p></st>
<p>All analyses were basically performed using R (ver. 2.14.1) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and Bioconductor <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
<sec><st><p>Simulation details</p></st>
<p>The negative binomially distributed simulation data used in Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr>, and <tblr tid="T3">3</tblr> and Figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr>, and <figr fid="F3">3</figr> were produced using an R generic function <it>rnbinom</it>. Each dataset consisted of 20,000 genes &#215; 6 samples (3 of Sample A vs. 3 of Sample B). Of the 20,000 genes, the <it>P</it><sub>DEG </sub>% were DEGs at the four-fold level, and <it>P</it><sub>A </sub>% of the <it>P</it><sub>DEG </sub>% was higher in Sample A. For example, the simulation condition for Figure <figr fid="F1">1</figr> used 20% of <it>P</it><sub>DEG </sub>and 90% of <it>P</it><sub>A</sub>, giving 4,000 (= 20,000 &#215; 0.20) DEGs, 3,600 (= 20,000 &#215; 0.20 &#215; 0.9) of which are highly expressed in Sample A in the simulation dataset.</p>
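<p>The gene counts implied by a given simulation condition follow directly from <it>P</it><sub>DEG </sub>and <it>P</it><sub>A</sub>. The short Python sketch below reproduces the example above; the function name <it>deg_design </it>is hypothetical, and percentages are passed as integers to keep the arithmetic exact.</p>

```python
def deg_design(n_genes, p_deg_pct, p_a_pct):
    """Return (number of DEGs, DEGs higher in Sample A, DEGs higher in Sample B)."""
    n_deg = n_genes * p_deg_pct // 100
    n_up_a = n_deg * p_a_pct // 100
    return n_deg, n_up_a, n_deg - n_up_a


# Condition used for Figure 1: 20,000 genes, P_DEG = 20%, P_A = 90%.
print(deg_design(20000, 20, 90))   # (4000, 3600, 400)
```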
<p>The variance of the NB distribution can generally be modelled as <it>V </it>= <it>&#956; </it>+ <it>&#981;&#956;</it><sup>2</sup>. The empirical distribution of read counts for producing the mean (<it>&#956;</it>) and dispersion (<it>&#981;</it>) parameters of the model was obtained from Arabidopsis data (three biological replicates for both the treated and non-treated samples) in Ref. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The simulations were performed using a total of 24 combinations of <it>P</it><sub>DEG </sub>(= 5, 10, 20, and 30%) and <it>P</it><sub>A </sub>(= 50, 60, 70, 80, 90, and 100%) values. The full R-code for obtaining the simulation data is described in Additional file <supplr sid="S1">1</supplr>. The parameter <it>param1 </it>in Additional file <supplr sid="S1">1</supplr> corresponds to the degree of fold-change.</p>
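<p>As a small illustration of the variance model and the simulation grid described above, the following Python sketch evaluates <it>V </it>for toy parameter values and enumerates the 24 conditions. The helper names are hypothetical. The conversion to a <it>size </it>parameter relies on the documented parameterization of <it>rnbinom </it>by <it>mu </it>and <it>size </it>(variance = mu + mu^2/size); whether the Additional files call <it>rnbinom </it>in exactly this way is an assumption.</p>

```python
def nb_variance(mu, phi):
    """Variance of the NB model V = mu + phi * mu^2."""
    return mu + phi * mu ** 2


def nb_size(phi):
    """Size parameter matching dispersion phi under the mu/size parameterization.

    With variance mu + mu^2 / size, the dispersion phi corresponds to 1 / size.
    """
    return 1.0 / phi


mu, phi = 100.0, 0.25             # toy values, not estimates from real data
print(nb_variance(mu, phi))       # 2600.0
print(nb_size(phi))               # 4.0

# The 24 simulation conditions: every combination of P_DEG and P_A (in %).
p_deg_values = [5, 10, 20, 30]
p_a_values = [50, 60, 70, 80, 90, 100]
conditions = [(d, a) for d in p_deg_values for a in p_a_values]
print(len(conditions))            # 24
```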
<p>Simulations with different types of DEG distribution were also performed in this study. The fold-change values for individual genes were randomly sampled from a gamma distribution with shape and scale parameters. Specifically, an R generic function <it>rgamma </it>with respective values of 2.0 and 0.5 for the shape and scale parameters was used. This roughly gives respective values of 0.0 and 1.0 for the minimum and mean fold-changes. We added an offset value of 1.2 to prevent low fold-changes for introduced DEGs, giving respective values of 1.2 and 2.2 for the minimum and mean fold-changes. The full R-code for obtaining the simulation data is described in Additional file <supplr sid="S3">3</supplr>. The values in <it>param1 </it>in Additional file <supplr sid="S3">3</supplr> correspond to those parameters.</p>
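<p>The "offset plus gamma" sampling of fold-changes described above can be sketched in Python with the standard library, whose <it>random.gammavariate </it>takes shape and scale arguments. The function name <it>sample_fold_changes </it>and the seed are illustrative assumptions; the original analysis used <it>rgamma </it>in R.</p>

```python
import random


def sample_fold_changes(n, floor=1.2, shape=2.0, scale=0.5, seed=0):
    """Draw n fold-changes as floor + Gamma(shape, scale)."""
    rng = random.Random(seed)
    return [floor + rng.gammavariate(shape, scale) for _ in range(n)]


fold_changes = sample_fold_changes(100000)
mean_fc = sum(fold_changes) / len(fold_changes)

# Theoretical mean is floor + shape * scale = 1.2 + 2.0 * 0.5 = 2.2.
print(round(min(fold_changes), 2), round(mean_fc, 1))
```

<p>Because a gamma variate is strictly positive, every sampled fold-change exceeds the floor of 1.2, and the sample mean should be close to the theoretical value of 2.2.</p>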
</sec>
<sec><st><p>Wildtype vs. <it>RDR6 </it>knockout dataset used in <it>baySeq </it>study</p></st>
<p>The dataset was obtained by e-mail from the author of Ref. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The dataset (named "rdr6_wt.RData") consists of 70,619 sRNAs &#215; 4 samples (2 wildtype and 2 <it>RDR6 </it>knockout samples). Of the 70,619 sRNAs, 657 were used as true DE sRNAs whose expressions were higher in the wildtype than the <it>RDR6 </it>knockout samples.</p>
</sec>
<sec><st><p>Gene ranking with default procedure</p></st>
<p>Gene lists ranked according to differential expression are a prerequisite for calculating AUC values. The input data for differential expression analysis using five R packages (<it>edgeR </it>ver. 2.4.1, <it>DESeq </it>ver. 1.6.1, <it>baySeq </it>ver. 1.8.1, <it>NBPSeq </it>ver. 0.1.4, and <it>DEGseq </it>ver. 1.6.2) is basically the raw count data where each row indicates the gene (or transcript), each column indicates the sample (or library), and each cell indicates the number of reads mapped to the gene in the sample. The execution of the <it>baySeq </it>package was performed using data after scaling to RPM (reads per million mapped reads).</p>
<p>The analysis using the <it>edgeR </it>package with default settings (i.e., the <it>edgeR</it>/default combination) was performed using four functions (<it>calcNormFactors</it>, <it>estimateCommonDisp</it>, <it>estimateTagwiseDisp</it>, and <it>exactTest</it>) in the package <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The TMM normalization factor can be obtained from the output object after applying the <it>calcNormFactors </it>function <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The genes were ranked in ascending order of the <it>p</it>-values.</p>
<p>The <it>DESeq</it>/default combination was performed using three functions (<it>estimateSizeFactors</it>, <it>estimateDispersions</it>, and <it>nbinomTest</it>) in the package. The genes were ranked in ascending order of the <it>p</it>-values adjusted for multiple-testing with the Benjamini-Hochberg procedure.</p>
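<p>The ranking by Benjamini-Hochberg adjusted <it>p</it>-values can be illustrated with the following Python sketch. It re-implements the standard step-up adjustment for illustration only; the function name <it>bh_adjust </it>and the toy <it>p</it>-values are hypothetical, and the analysis itself relied on the values reported by the <it>DESeq </it>package.</p>

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.5, 0.03, 0.001]                 # toy p-values for four genes
adj = bh_adjust(pvals)
ranking = sorted(range(len(pvals)), key=lambda i: adj[i])
print(ranking)                                    # [3, 0, 2, 1]
```

<p>Note that the adjustment can produce ties among adjusted values, in which case the order among the tied genes depends on the sorting rule used.</p>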
<p>The <it>baySeq</it>/default combination was performed using two functions (<it>getPriors.NB </it>and <it>getLikelihoods.NB</it>) in the package <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> for the RPM data. The empirical distribution on parameters of the NB distribution was estimated by bootstrapping from the data. We took sample sizes of (i) 2,000 iterations for the simulation data shown in Tables <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr>, and <tblr tid="T3">3</tblr>, Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>, and Additional file <supplr sid="S2">2</supplr> (see Additional files <supplr sid="S1">1</supplr> and <supplr sid="S3">3</supplr>), (ii) 5,000 iterations for the simulation data shown in Figure <figr fid="F3">3</figr> (Additional file <supplr sid="S4">4</supplr>), and (iii) 10,000 iterations for real data (Additional file <supplr sid="S5">5</supplr>). The genes were ranked in descending order of the posterior likelihood of the model for differential expression.</p>
<p>The <it>NBPSeq</it>/default combination was performed using the <it>nbp.test </it>function in the package <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The genes were ranked in ascending order of the <it>p</it>-values of the exact NB test.</p>
<p>The analysis using the <it>DEGseq </it>package <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> was performed for benchmarking the current study and a previous study <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, both of which analyzed the same real dataset. There are multiple methods in the <it>DEGseq </it>package <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Following from the previous study, we used an MA plot-based method with random sampling (MARS), i.e., the <it>DEGexp </it>function with method = "MARS" option was used. A higher absolute value for the statistics indicates a higher degree of differential expression. Accordingly, the genes were ranked in descending order of the absolute value. Note that the execution of this package (ver. 1.6.2) was performed using R 2.13.1 because we encountered an error when executing the more recent version (ver. 1.8.0) using R 2.14.1.</p>
</sec>
<sec><st><p>TbT normalization strategy</p></st>
<p>Our proposed strategy is an analysis pipeline consisting of three steps. In step 1, the TMM normalization factors are calculated by using the <it>calcNormFactors </it>function in the <it>edgeR </it>package with the raw count data. These factors are used for calculating <it>effective </it>library sizes, i.e., library sizes multiplied by the TMM factors.</p>
<p>In step 2, potential DEGs are identified by using the <it>baySeq </it>package with the RPM data. Different from the above <it>baySeq</it>/default combination, the analysis is performed using the effective library sizes. The effective library sizes are introduced when constructing a <it>countData </it>object, the input data for the <it>getPriors.NB </it>function. By applying the subsequent <it>getLikelihoods.NB </it>function, the percentage of DEGs in the data (the <it>P</it><sub>DEG </sub>value) and the corresponding potential DEGs can be obtained.</p>
<p>In step 3, TMM normalization factors are again calculated based on the raw count data after eliminating the estimated DEGs. The TbT normalization factors are defined as (the TMM normalization factors calculated in this step) &#215; (library sizes after eliminating the DEGs)/(library sizes before eliminating the DEGs). As the TbT normalization factors are comparable with the original TMM normalization factors such as those calculated in step 1, effective library sizes can also be calculated by multiplying library sizes by the TbT factors.</p>
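<p>The definition of the TbT normalization factors given above amounts to a simple rescaling, sketched below in Python with toy numbers. The function names and all values are hypothetical; in the actual pipeline the step-3 TMM factors come from the <it>calcNormFactors </it>function of <it>edgeR</it>, which is not reproduced here.</p>

```python
def tbt_factors(step3_tmm_factors, lib_sizes_after, lib_sizes_before):
    """TbT factor = step-3 TMM factor x (library size after / before DEG removal)."""
    return [
        f * after / before
        for f, after, before in zip(step3_tmm_factors, lib_sizes_after, lib_sizes_before)
    ]


def effective_library_sizes(library_sizes, norm_factors):
    """Effective library size = library size x normalization factor."""
    return [size * f for size, f in zip(library_sizes, norm_factors)]


# Toy values for two libraries (not taken from any dataset).
lib_before = [1000000, 2000000]     # library sizes computed from all genes
lib_after = [800000, 1900000]       # library sizes after removing estimated DEGs
tmm_step3 = [1.25, 0.80]            # TMM factors calculated on the remaining genes

factors = tbt_factors(tmm_step3, lib_after, lib_before)
eff_sizes = effective_library_sizes(lib_before, factors)
print([round(f, 2) for f in factors])      # [1.0, 0.76]
print([round(s) for s in eff_sizes])       # [1000000, 1520000]
```

<p>With this definition, the effective library size obtained from a TbT factor equals the step-3 TMM factor multiplied by the library size after DEG elimination.</p>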
<p>The four combinations coupled with the TbT normalization strategy (<it>edgeR</it>/TbT, <it>DESeq</it>/TbT, <it>baySeq</it>/TbT, and <it>NBPSeq</it>/TbT) were analyzed for comparison with the above four combinations coupled with the default normalization strategy. The <it>edgeR</it>/TbT combination introduced the TbT normalization factors instead of the original TMM factors. The <it>NBPSeq</it>/TbT combination introduced the TbT normalization factors in the <it>nbp.test </it>function. The remaining two combinations (<it>DESeq</it>/TbT and <it>baySeq</it>/TbT) introduced the effective library sizes, i.e., the original library sizes multiplied by the TbT factors.</p>
</sec>
</sec>
<sec><st><p>List of abbreviations used</p></st>
<p>DE: differential expression; DEG: differentially expressed gene; EB: embryonic body; RPM: reads per million (normalization); sRNA: small RNA; tasRNA: <it>TAS </it>locus-derived small RNA; TMM: trimmed mean of M values (method).</p>
</sec>
<sec><st><p>Competing interests</p></st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec><st><p>Authors' contributions</p></st>
<p>KK performed analyses and drafted the paper. TN provided helpful comments and refined the manuscript. KS supervised the critical discussion. All the authors read and approved the final manuscript.</p>
</sec>
</bdy>
<bm>
<ack>
<sec><st><p>Acknowledgements</p></st>
<p>The authors thank Dr. TJ Hardcastle for providing the dataset used in the <it>baySeq </it>study. This study was supported by KAKENHI (21710208 and 24500359 to KK and 22128008 to TN) from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing</p></title><aug><au><snm>Weber</snm><fnm>AP</fnm></au><au><snm>Weber</snm><fnm>KL</fnm></au><au><snm>Carr</snm><fnm>K</fnm></au><au><snm>Wilkerson</snm><fnm>C</fnm></au><au><snm>Ohlrogge</snm><fnm>JB</fnm></au></aug><source>Plant Physiol</source><pubdate>2007</pubdate><volume>144</volume><issue>1</issue><fpage>32</fpage><lpage>42</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.107.096677</pubid><pubid idtype="pmcid">1913805</pubid><pubid idtype="pmpid" link="fulltext">17351049</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>The impact of next-generation sequencing technology on genetics</p></title><aug><au><snm>Mardis</snm><fnm>ER</fnm></au></aug><source>Trends Genet</source><pubdate>2008</pubdate><volume>24</volume><issue>3</issue><fpage>133</fpage><lpage>141</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2007.12.007</pubid><pubid idtype="pmpid" link="fulltext">18262675</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Quantitative monitoring of gene expression patterns with a complementary DNA microarray</p></title><aug><au><snm>Schena</snm><fnm>M</fnm></au><au><snm>Shalon</snm><fnm>D</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Science</source><pubdate>1995</pubdate><volume>270</volume><issue>5235</issue><fpage>467</fpage><lpage>470</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.270.5235.467</pubid><pubid idtype="pmpid" link="fulltext">7569999</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Expression monitoring by hybridization to high-density oligonucleotide 
arrays</p></title><aug><au><snm>Lockhart</snm><fnm>DJ</fnm></au><au><snm>Dong</snm><fnm>H</fnm></au><au><snm>Byrne</snm><fnm>MC</fnm></au><au><snm>Follettie</snm><fnm>MT</fnm></au><au><snm>Gallo</snm><fnm>MV</fnm></au><au><snm>Chee</snm><fnm>MS</fnm></au><au><snm>Mittmann</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>C</fnm></au><au><snm>Kobayashi</snm><fnm>M</fnm></au><au><snm>Horton</snm><fnm>H</fnm></au><au><snm>Brown</snm><fnm>EL</fnm></au></aug><source>Nat Biotechnol</source><pubdate>1996</pubdate><volume>14</volume><issue>13</issue><fpage>1675</fpage><lpage>1680</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1296-1675</pubid><pubid idtype="pmpid" link="fulltext">9634850</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>3' tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer</p></title><aug><au><snm>Asmann</snm><fnm>YW</fnm></au><au><snm>Klee</snm><fnm>EW</fnm></au><au><snm>Thompson</snm><fnm>EA</fnm></au><au><snm>Perez</snm><fnm>EA</fnm></au><au><snm>Middha</snm><fnm>S</fnm></au><au><snm>Oberg</snm><fnm>AL</fnm></au><au><snm>Therneau</snm><fnm>TM</fnm></au><au><snm>Smith</snm><fnm>DI</fnm></au><au><snm>Poland</snm><fnm>GA</fnm></au><au><snm>Wieben</snm><fnm>ED</fnm></au><au><snm>Kocher</snm><fnm>JP</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>531</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-531</pubid><pubid idtype="pmcid">2781828</pubid><pubid idtype="pmpid" link="fulltext">19917133</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Transcript length bias in RNA-seq data confounds systems biology</p></title><aug><au><snm>Oshlack</snm><fnm>A</fnm></au><au><snm>Wakefield</snm><fnm>MJ</fnm></au></aug><source>Biology Direct</source><pubdate>2009</pubdate><volume>4</volume><fpage>14</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1745-6150-4-14</pubid><pubid idtype="pmcid">2678084</pubid><pubid 
idtype="pmpid" link="fulltext">19371405</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Mapping and quantifying mammalian transcriptomes by RNA-Seq</p></title><aug><au><snm>Mortazavi</snm><fnm>A</fnm></au><au><snm>Williams</snm><fnm>BA</fnm></au><au><snm>McCue</snm><fnm>K</fnm></au><au><snm>Schaeffer</snm><fnm>L</fnm></au><au><snm>Wold</snm><fnm>B</fnm></au></aug><source>Nat Methods</source><pubdate>2008</pubdate><volume>5</volume><issue>7</issue><fpage>621</fpage><lpage>628</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1226</pubid><pubid idtype="pmpid" link="fulltext">18516045</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome</p></title><aug><au><snm>Sultan</snm><fnm>M</fnm></au><au><snm>Schulz</snm><fnm>MH</fnm></au><au><snm>Richard</snm><fnm>H</fnm></au><au><snm>Magen</snm><fnm>A</fnm></au><au><snm>Klingenhoff</snm><fnm>A</fnm></au><au><snm>Scherf</snm><fnm>M</fnm></au><au><snm>Seifert</snm><fnm>M</fnm></au><au><snm>Borodina</snm><fnm>T</fnm></au><au><snm>Soldatov</snm><fnm>A</fnm></au><au><snm>Parkhomchuk</snm><fnm>D</fnm></au><au><snm>Schmidt</snm><fnm>D</fnm></au><au><snm>O&apos;Keeffe</snm><fnm>S</fnm></au><au><snm>Haas</snm><fnm>S</fnm></au><au><snm>Vingron</snm><fnm>M</fnm></au><au><snm>Lehrach</snm><fnm>H</fnm></au><au><snm>Yaspo</snm><fnm>ML</fnm></au></aug><source>Science</source><pubdate>2008</pubdate><volume>321</volume><issue>5891</issue><fpage>956</fpage><lpage>960</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1160342</pubid><pubid idtype="pmpid" link="fulltext">18599741</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoforms switching during cell 
differentiation</p></title><aug><au><snm>Trapnell</snm><fnm>C</fnm></au><au><snm>Williams</snm><fnm>BA</fnm></au><au><snm>Pertea</snm><fnm>G</fnm></au><au><snm>Mortazavi</snm><fnm>A</fnm></au><au><snm>Kwan</snm><fnm>G</fnm></au><au><snm>van Baren</snm><fnm>MJ</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au><au><snm>Wold</snm><fnm>BJ</fnm></au><au><snm>Pachter</snm><fnm>L</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2010</pubdate><volume>28</volume><fpage>511</fpage><lpage>515</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1621</pubid><pubid idtype="pmcid">3146043</pubid><pubid idtype="pmpid" link="fulltext">20436464</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Accurate quantification of transcriptome from RNA-Seq data by effective length normalization</p></title><aug><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Seo</snm><fnm>CH</fnm></au><au><snm>Lim</snm><fnm>B</fnm></au><au><snm>Yang</snm><fnm>JO</fnm></au><au><snm>Oh</snm><fnm>J</fnm></au><au><snm>Kim</snm><fnm>M</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>B</fnm></au><au><snm>Kang</snm><fnm>C</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>39</volume><issue>2</issue><fpage>e9</fpage><xrefbib><pubidlist><pubid idtype="pmcid">3025570</pubid><pubid idtype="pmpid" link="fulltext">21059678</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Estimation of alternative splicing isoform frequencies from RNA-Seq data</p></title><aug><au><snm>Nicolae</snm><fnm>M</fnm></au><au><snm>Mangul</snm><fnm>S</fnm></au><au><snm>Mandoiu</snm><fnm>II</fnm></au><au><snm>Zelikovsky</snm><fnm>A</fnm></au></aug><source>Algorithms Mol Biol</source><pubdate>2011</pubdate><volume>6</volume><fpage>9</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1748-7188-6-9</pubid><pubid idtype="pmcid">3107792</pubid><pubid idtype="pmpid" link="fulltext">21504602</pubid></pubidlist></xrefbib></bibl><bibl 
id="B12"><title><p>Stem cell transcriptome profiling via massive-scale mRNA sequencing</p></title><aug><au><snm>Cloonan</snm><fnm>N</fnm></au><au><snm>Forrest</snm><fnm>AR</fnm></au><au><snm>Kolle</snm><fnm>G</fnm></au><au><snm>Gardiner</snm><fnm>BB</fnm></au><au><snm>Faulkner</snm><fnm>GJ</fnm></au><au><snm>Brown</snm><fnm>MK</fnm></au><au><snm>Taylor</snm><fnm>DF</fnm></au><au><snm>Steptoe</snm><fnm>AL</fnm></au><au><snm>Wani</snm><fnm>S</fnm></au><au><snm>Bethel</snm><fnm>G</fnm></au><au><snm>Robertson</snm><fnm>AJ</fnm></au><au><snm>Perkins</snm><fnm>AC</fnm></au><au><snm>Bruce</snm><fnm>SJ</fnm></au><au><snm>Lee</snm><fnm>CC</fnm></au><au><snm>Ranade</snm><fnm>SS</fnm></au><au><snm>Peckham</snm><fnm>HE</fnm></au><au><snm>Manning</snm><fnm>JM</fnm></au><au><snm>McKernan</snm><fnm>KJ</fnm></au><au><snm>Grimmond</snm><fnm>SM</fnm></au></aug><source>Nat Methods</source><pubdate>2008</pubdate><volume>5</volume><issue>7</issue><fpage>613</fpage><lpage>619</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1223</pubid><pubid idtype="pmpid" link="fulltext">18516046</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays</p></title><aug><au><snm>Marioni</snm><fnm>JC</fnm></au><au><snm>Mason</snm><fnm>CE</fnm></au><au><snm>Mane</snm><fnm>SM</fnm></au><au><snm>Stephens</snm><fnm>M</fnm></au><au><snm>Gilad</snm><fnm>Y</fnm></au></aug><source>Genome Res</source><pubdate>2008</pubdate><volume>18</volume><issue>9</issue><fpage>1509</fpage><lpage>1517</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.079558.108</pubid><pubid idtype="pmcid">2527709</pubid><pubid idtype="pmpid" link="fulltext">18550803</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE 
data</p></title><aug><au><snm>Balwierz</snm><fnm>PJ</fnm></au><au><snm>Carninci</snm><fnm>P</fnm></au><au><snm>Daub</snm><fnm>CO</fnm></au><au><snm>Kawai</snm><fnm>J</fnm></au><au><snm>Hayashizaki</snm><fnm>Y</fnm></au><au><snm>Van Belle</snm><fnm>W</fnm></au><au><snm>Beisel</snm><fnm>C</fnm></au><au><snm>van Nimwegen</snm><fnm>E</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><issue>7</issue><fpage>R79</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-7-r79</pubid><pubid idtype="pmcid">2728533</pubid><pubid idtype="pmpid" link="fulltext">19624849</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments</p></title><aug><au><snm>Bullard</snm><fnm>JH</fnm></au><au><snm>Purdom</snm><fnm>E</fnm></au><au><snm>Hansen</snm><fnm>KD</fnm></au><au><snm>Dudoit</snm><fnm>S</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><fpage>94</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-94</pubid><pubid idtype="pmcid">2838869</pubid><pubid idtype="pmpid" link="fulltext">20167110</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>A scaling normalization method for differential expression analysis of RNA-seq data</p></title><aug><au><snm>Robinson</snm><fnm>MD</fnm></au><au><snm>Oshlack</snm><fnm>A</fnm></au></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><fpage>R25</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-3-r25</pubid><pubid idtype="pmcid">2864565</pubid><pubid idtype="pmpid" link="fulltext">20196867</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>edgeR: a Bioconductor package for differential expression analysis of digital gene expression 
data</p></title><aug><au><snm>Robinson</snm><fnm>MD</fnm></au><au><snm>McCarthy</snm><fnm>DJ</fnm></au><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>1</issue><fpage>139</fpage><lpage>140</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp616</pubid><pubid idtype="pmcid">2796818</pubid><pubid idtype="pmpid" link="fulltext">19910308</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>baySeq: empirical Bayesian methods for identifying differential expression in sequence count data</p></title><aug><au><snm>Hardcastle</snm><fnm>TJ</fnm></au><au><snm>Kelly</snm><fnm>KA</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><fpage>422</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-422</pubid><pubid idtype="pmcid">2928208</pubid><pubid idtype="pmpid" link="fulltext">20698981</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>R: A Language and Environment for Statistical Computing</p></title><aug><au><cnm>R Development Core Team</cnm></au></aug><publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher><pubdate>2011</pubdate></bibl><bibl id="B20"><title><p>Differential expression analysis for sequence count data</p></title><aug><au><snm>Anders</snm><fnm>S</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><fpage>R106</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-10-r106</pubid><pubid idtype="pmcid">3218662</pubid><pubid idtype="pmpid" link="fulltext">20979621</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>The NBP negative binomial model for assessing differential gene expression from RNA-Seq</p></title><aug><au><snm>Di</snm><fnm>Y</fnm></au><au><snm>Schafer</snm><fnm>DW</fnm></au><au><snm>Cumbie</snm><fnm>JS</fnm></au><au><snm>Chang</snm><fnm>JH</fnm></au></aug><source>Stat 
Appl Genet Mol Biol</source><pubdate>2011</pubdate><volume>10</volume><fpage>art24</fpage></bibl><bibl id="B22"><title><p>Small-sample estimation of negative binomial dispersion, with applications to SAGE data</p></title><aug><au><snm>Robinson</snm><fnm>MD</fnm></au><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Biostatistics</source><pubdate>2008</pubdate><volume>9</volume><fpage>321</fpage><lpage>332</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17728317</pubid></xrefbib></bibl><bibl id="B23"><title><p>A weighted average difference method for detecting differentially expressed genes from microarray data</p></title><aug><au><snm>Kadota</snm><fnm>K</fnm></au><au><snm>Nakai</snm><fnm>Y</fnm></au><au><snm>Shimizu</snm><fnm>K</fnm></au></aug><source>Algorithms Mol Biol</source><pubdate>2008</pubdate><volume>3</volume><fpage>8</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1748-7188-3-8</pubid><pubid idtype="pmcid">2464587</pubid><pubid idtype="pmpid" link="fulltext">18578891</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity</p></title><aug><au><snm>Kadota</snm><fnm>K</fnm></au><au><snm>Nakai</snm><fnm>Y</fnm></au><au><snm>Shimizu</snm><fnm>K</fnm></au></aug><source>Algorithms Mol Biol</source><pubdate>2009</pubdate><volume>4</volume><fpage>7</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1748-7188-4-7</pubid><pubid idtype="pmcid">2679019</pubid><pubid idtype="pmpid" link="fulltext">19386098</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Evaluating methods for ranking differentially expressed genes applied to MicroArray Quality Control data</p></title><aug><au><snm>Kadota</snm><fnm>K</fnm></au><au><snm>Shimizu</snm><fnm>K</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2011</pubdate><volume>12</volume><fpage>227</fpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1186/1471-2105-12-227</pubid><pubid idtype="pmcid">3128035</pubid><pubid idtype="pmpid" link="fulltext">21639945</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>DEGseq: an R package for identifying differentially expressed genes from RNA-seq data</p></title><aug><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Feng</snm><fnm>Z</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>1</issue><fpage>136</fpage><lpage>138</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp612</pubid><pubid idtype="pmpid" link="fulltext">19855105</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Proportion statistics to detect differentially expressed genes: a comparison with log-ratio statistics</p></title><aug><au><snm>Bergemann</snm><fnm>TL</fnm></au><au><snm>Wilson</snm><fnm>J</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2011</pubdate><volume>12</volume><fpage>228</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-12-228</pubid><pubid idtype="pmcid">3224106</pubid><pubid idtype="pmpid" link="fulltext">21649912</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Bioconductor: open software development for computational biology and 
bioinformatics</p></title><aug><au><snm>Gentleman</snm><fnm>RC</fnm></au><au><snm>Carey</snm><fnm>VJ</fnm></au><au><snm>Bates</snm><fnm>DM</fnm></au><au><snm>Bolstad</snm><fnm>B</fnm></au><au><snm>Dettling</snm><fnm>M</fnm></au><au><snm>Dudoit</snm><fnm>S</fnm></au><au><snm>Ellis</snm><fnm>B</fnm></au><au><snm>Gautier</snm><fnm>L</fnm></au><au><snm>Ge</snm><fnm>Y</fnm></au><au><snm>Gentry</snm><fnm>J</fnm></au><au><snm>Hornik</snm><fnm>K</fnm></au><au><snm>Hothorn</snm><fnm>T</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Iacus</snm><fnm>S</fnm></au><au><snm>Irizarry</snm><fnm>R</fnm></au><au><snm>Leisch</snm><fnm>F</fnm></au><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Maechler</snm><fnm>M</fnm></au><au><snm>Rossini</snm><fnm>AJ</fnm></au><au><snm>Sawitzki</snm><fnm>G</fnm></au><au><snm>Smith</snm><fnm>C</fnm></au><au><snm>Smyth</snm><fnm>G</fnm></au><au><snm>Tierney</snm><fnm>L</fnm></au><au><snm>Yang</snm><fnm>JY</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au></aug><source>Genome Biol</source><pubdate>2004</pubdate><volume>5</volume><fpage>R80</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-10-r80</pubid><pubid idtype="pmcid">545600</pubid><pubid idtype="pmpid" link="fulltext">15461798</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm>
</art>