<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1748-7188-1-10</ui>
	<ji>1748-7188</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>P-value based visualization of codon usage data</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Meinicke</snm>
					<fnm>Peter</fnm>
					<insr iid="I1"/>
					<email>pmeinic@gwdg.de</email>
				</au>
				<au id="A2">
					<snm>Brodag</snm>
					<fnm>Thomas</fnm>
					<insr iid="I2"/>
					<email>Thomas.Brodag@T-Online.de</email>
				</au>
				<au id="A3">
					<snm>Fricke</snm>
					<mnm>Florian</mnm>
					<fnm>Wolfgang</fnm>
					<insr iid="I3"/>
					<email>wfricke@gwdg.de</email>
				</au>
				<au id="A4">
					<snm>Waack</snm>
					<fnm>Stephan</fnm>
					<insr iid="I2"/>
					<email>waack@cs.uni-goettingen.de</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Abteilung Bioinformatik, Institut f&#252;r Mikrobiologie und Genetik, Georg-August-Universit&#228;t G&#246;ttingen, Goldschmidtstr. 1, 37077 G&#246;ttingen, Germany</p>
				</ins>
				<ins id="I2">
					<p>Institut f&#252;r Numerische und Angewandte Mathematik, Universit&#228;t G&#246;ttingen, Lotzestr. 16, 37083 G&#246;ttingen, Germany</p>
				</ins>
				<ins id="I3">
					<p>G&#246;ttingen Genomics Laboratory, Universit&#228;t G&#246;ttingen, Grisebachstr. 8, 37077 G&#246;ttingen, Germany</p>
				</ins>
			</insg>
			<source>Algorithms for Molecular Biology</source>
			<issn>1748-7188</issn>
			<pubdate>2006</pubdate>
			<volume>1</volume>
			<issue>1</issue>
			<fpage>10</fpage>
			<url>http://www.almob.org/content/1/1/10</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16808834</pubid><pubid idtype="doi">10.1186/1748-7188-1-10</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>13</day>
					<month>3</month>
					<year>2006</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>29</day>
					<month>6</month>
					<year>2006</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>29</day>
					<month>6</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Meinicke et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>Two important and still unsolved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for the visualization of multivariate data have been shown to be effective tools for codon usage analysis. Here we propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for the comparison of two distributions. Experimental results on four microbial genomes indicate that the new method is well suited for the analysis of horizontal gene transfer and translational selection. Compared with the widely used correspondence analysis, our method did not suffer from outlier sensitivity and showed better clustering of putative alien genes in most cases.</p>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The standard genetic code is redundant, since different triplet codons may code for the same amino acid. In general, codon usage shows organism-specific patterns. However, codon usage variation within a single genome can be an important source of information about gene expression levels and events of horizontal gene transfer. In particular, dimensionality reduction methods have been widely used for the analysis of codon usage patterns in microbial genomes. These methods provide a low-dimensional point representation of genes, where the proximity of gene-specific points indicates similar codon usage of the associated genes. The resulting two-dimensional scatter plots therefore give a genome-wide overview that may reveal clusters of genes as groups of nearby points. Such clusters can, for instance, provide evidence for horizontal gene transfer (groups of putative alien genes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>) or for translational selection (groups of highly expressed genes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>).</p>
			<p>As a standard method for scatter plot visualization of codon usage data, researchers mostly resort to correspondence analysis (CA), which was originally developed for the analysis of contingency tables <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The original formulation does not make it completely clear how CA should be applied to codon counts. Because different preprocessing and normalization schemes have been proposed, the use of CA in codon usage studies has not been without controversy <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Nevertheless, CA has been applied to the analysis of many bacterial genomes, including those of <it>Escherichia coli </it><abbrgrp><abbr bid="B1">1</abbr><abbr bid="B3">3</abbr></abbrgrp>, <it>Bacillus subtilis </it><abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, <it>Borrelia burgdorferi </it><abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, <it>Chlamydia trachomatis </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, <it>Mycoplasma genitalium </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, <it>Helicobacter pylori </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and <it>Pseudomonas aeruginosa </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
			<p>Recently, self-organizing maps <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> have been proposed as an alternative visualization method for codon usage data <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Although this method provides a simultaneous clustering of the data, which may be useful in certain contexts, it requires the user to choose the size of a discrete grid onto which the genes are mapped in a non-linear way. The grid size is a critical parameter of the method and directly controls the final clustering in the visualization. Unfortunately, the grid size of self-organizing maps is a so-called <it>hyperparameter</it>, which usually cannot be inferred from the data in an unsupervised manner. Therefore, the resulting visualizations bear the risk of being highly subjective.</p>
			<p>Here we present a visualization method that has been tailored to the analysis of codon usage data and does not depend on hyperparameters that are difficult to tune. It is based on multidimensional scaling and a new similarity measure for codon usage data. In the following, we first introduce our probabilistic similarity measure for codon usage tables and outline the corresponding algorithm for multidimensional scaling based on P-values. We then present visualizations for the analysis of four microbial genomes and discuss our results in comparison with those obtained from the classical correspondence analysis method.</p>
		</sec>
		<sec>
			<st>
				<p>P-values for multidimensional scaling</p>
			</st>
			<p>For the analysis of codon usage tables we developed a special similarity measure derived from the well-known chi-square test for the comparison of two distributions. Unlike the classical chi-square test, we do not decide whether two distributions are equal or not; instead, we only use the corresponding P-values to compute a similarity measure for the underlying codon usage tables. For each pair of genes we compare the corresponding codon distributions on the basis of the codon frequencies in the two genes. To obtain a single similarity score, we average the P-values of the amino-acid-specific chi-square tests. We start with the counts <graphic file="1748-7188-1-10-i1.gif"/> for codon <graphic file="1748-7188-1-10-i2.gif"/> of amino acid <it>a</it><sub><it>i </it></sub>in the <it>j</it>-th gene. Summed over the <it>L</it><sub><it>i </it></sub>different codons for amino acid <it>a</it><sub><it>i</it></sub>, these counts yield <graphic file="1748-7188-1-10-i3.gif"/>. Note that <it>n</it><sub><it>ij </it></sub>corresponds to the number of occurrences of amino acid <it>a</it><sub><it>i </it></sub>in gene <it>j</it>. With these counts we compute the chi-square statistic for each amino acid <it>a</it><sub><it>i </it></sub>and each pair (<it>j</it>, <it>k</it>) of genes:</p>
			<p>
				<graphic file="1748-7188-1-10-i4.gif"/>
			</p>
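			<p>Because the statistic above is reproduced only as an image, the following Python sketch rests on two assumptions that are not taken from the paper: codon tables are built with the standard genetic code (stop codons excluded), and the per-amino-acid statistic has a common textbook two-sample form for samples of unequal size (written out in the docstring), with codons unused in both genes skipped. The names <it>codon_table</it> and <it>chi_square_pair</it> are hypothetical, as is the convention of returning zero when an amino acid is absent from one gene.</p>
			<p>
```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon list for each amino acid; stop codons ('*') are excluded.
SYNONYMS = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def codon_table(seq):
    """Counts n_ijl for one gene j: {amino acid a_i: [count of each synonymous codon]}."""
    seq = seq.upper()
    counts = dict.fromkeys(CODE, 0)
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
    return {aa: [counts[c] for c in codons] for aa, codons in SYNONYMS.items()}


def chi_square_pair(counts_j, counts_k):
    """Assumed two-sample chi-square statistic for one amino acid a_i.

    With totals n_ij = sum(counts_j) and n_ik = sum(counts_k), the statistic is
        sum_l (sqrt(n_ik/n_ij) * n_ijl - sqrt(n_ij/n_ik) * n_ikl)**2 / (n_ijl + n_ikl),
    which reduces to sum_l (n_ijl - n_ikl)**2 / (n_ijl + n_ikl) for equal totals.
    """
    if len(counts_j) != len(counts_k):
        raise ValueError("both genes must use the same codon order")
    n_j, n_k = sum(counts_j), sum(counts_k)
    if n_j == 0 or n_k == 0:
        # Hypothetical convention: amino acid absent, so no evidence of a difference.
        return 0.0
    w_j, w_k = math.sqrt(n_k / n_j), math.sqrt(n_j / n_k)
    chi2 = 0.0
    for c_j, c_k in zip(counts_j, counts_k):
        if c_j + c_k == 0:
            continue  # a codon unused in both genes contributes nothing
        chi2 += (w_j * c_j - w_k * c_k) ** 2 / (c_j + c_k)
    return chi2
```
			</p>
			<p>Identical tables, and more generally proportional tables, give a statistic of zero under this form, which is what makes the corresponding P-value equal to one.</p>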
			<p>The classical chi-square test for the comparison of two distributions is based on the following proposition: under the null hypothesis that the two samples were drawn from the same probability distribution, the variable <graphic file="1748-7188-1-10-i5.gif"/> is asymptotically chi-square distributed with <it>L</it><sub><it>i </it></sub>degrees of freedom. Here we do not perform a chi-square test, but rather calculate the P-value <it>P</it><sub><it>ijk </it></sub>associated with the chi-square statistic <graphic file="1748-7188-1-10-i5.gif"/>. The P-values are obtained from the chi-square probability function, which can be expressed as an incomplete gamma function <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. A small value of <it>P</it><sub><it>ijk </it></sub>indicates a significant difference between the codon distributions of genes <it>j </it>and <it>k </it>with respect to amino acid <it>a</it><sub><it>i</it></sub>. For a genome with <it>M </it>genes we then assemble the <it>M </it>&#215; <it>M </it>matrix <b>S </b>of similarity scores with non-negative elements</p>
			<p>
				<graphic file="1748-7188-1-10-i6.gif"/>
			</p>
			<p>where <it>n</it><sub><it>a </it></sub>is the number of amino acids. Note that <b>S </b>has unit diagonal elements, i.e. <it>S</it><sub><it>jj </it></sub>= 1, because the chi-square statistic of two identical tables is zero and the corresponding P-value is one. Since every P-value lies in [0, 1], all off-diagonal elements are in the range [0, 1] as well.</p>
			<p>In order to derive a suitable low-dimensional point representation of genes, we apply classical multidimensional scaling (see e.g. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>) to the above similarities. The objective is to find a two-dimensional point configuration whose interpoint distances reflect the codon usage similarities of the corresponding genes. To perform classical scaling based on similarities, we first transform the similarity matrix <b>S </b>into a positive semi-definite matrix <b>C </b>by subtracting the smallest eigenvalue &#955;<sub>min </sub>of <b>S </b>from each of its diagonal elements:</p>
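			<p>For integer degrees of freedom &#957;, the chi-square upper-tail probability, i.e. the regularized upper incomplete gamma function <it>Q</it>(&#957;/2, <it>x</it>/2), has a closed form, so the P-values and the averaged similarity matrix can be sketched without external libraries. The Python sketch below is illustrative only: the function names are hypothetical, the degrees of freedom are passed in by the caller (the text above states <it>L</it><sub><it>i</it></sub>), and the P-values are assumed to be stored as a nested list <it>p</it>[<it>i</it>][<it>j</it>][<it>k</it>] for amino acid <it>i</it> and genes <it>j</it>, <it>k</it>.</p>
			<p>
```python
import math


def chi2_pvalue(x, dof):
    """P(X >= x) for X chi-square distributed with a positive integer dof.

    Evaluates Q(dof/2, x/2) in closed form using the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if not (isinstance(dof, int) and dof >= 1):
        raise ValueError("dof must be a positive integer")
    if not x >= 0:
        raise ValueError("the chi-square statistic must be non-negative")
    y = x / 2.0
    if dof % 2 == 0:
        # Even dof: Q(n, y) = exp(-y) * sum_{m=0}^{n-1} y**m / m!
        term = total = 1.0
        for m in range(1, dof // 2):
            term *= y / m
            total += term
        return math.exp(-y) * total
    # Odd dof: start at Q(1/2, y) = erfc(sqrt(y)), then add y**(m+1/2) e**(-y) / Gamma(m+3/2).
    total = math.erfc(math.sqrt(y))
    term = math.exp(-y) * math.sqrt(y) / math.gamma(1.5)
    for m in range((dof - 1) // 2):
        total += term
        term *= y / (m + 1.5)
    return total


def similarity_matrix(p):
    """S[j][k] = (1 / n_a) * sum_i p[i][j][k], with p of shape n_a x M x M."""
    n_a, m = len(p), len(p[0])
    return [[sum(p_i[j][k] for p_i in p) / n_a for k in range(m)] for j in range(m)]
```
			</p>
			<p>With SciPy available, <it>scipy.stats.chi2.sf(x, dof)</it> computes the same upper-tail probability and also accepts non-integer degrees of freedom.</p>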
			<p><b>C </b>= <b>S </b>- &#955;<sub>min</sub><b>I </b>&#160;&#160;&#160; (3)</p>
			<p>where <b>I </b>is the <it>M </it>&#215; <it>M </it>identity matrix. Note that all diagonal elements of <b>C </b>remain equal, namely 1 - &#955;<sub>min</sub>. Using the <it>M </it>&#215; <it>M </it>centering matrix <b>H </b>with elements</p>
			<p>
				<graphic file="1748-7188-1-10-i7.gif"/>
			</p>
			<p>we finally obtain the matrix</p>
			<p><b>B </b>= <b>HCH</b>. &#160;&#160;&#160; (5)</p>
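			<p>Assuming the usual centering matrix with elements <it>H</it><sub><it>jk</it></sub> = &#948;<sub><it>jk</it></sub> - 1/<it>M</it> (its definition above is shown only as an image), the product <b>HCH</b> equals <b>C</b> with its row and column means subtracted and its grand mean added back, so <b>H</b> never has to be formed explicitly. The following Python sketch, with hypothetical function names, checks this identity on a small matrix.</p>
			<p>
```python
def centering_matrix(m):
    """H with elements H[j][k] = delta_jk - 1/m."""
    return [[(1.0 if j == k else 0.0) - 1.0 / m for k in range(m)] for j in range(m)]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[j][t] * b[t][k] for t in range(n)) for k in range(n)] for j in range(n)]


def double_center(c):
    """B = H C H via B[j][k] = c[j][k] - row_mean[j] - col_mean[k] + grand_mean."""
    m = len(c)
    row = [sum(r) / m for r in c]
    col = [sum(c[j][k] for j in range(m)) / m for k in range(m)]
    grand = sum(row) / m
    return [[c[j][k] - row[j] - col[k] + grand for k in range(m)] for j in range(m)]


C = [[1.2, 0.7, 0.2], [0.7, 1.2, 0.4], [0.2, 0.4, 1.2]]
H = centering_matrix(3)
B_explicit = matmul(matmul(H, C), H)
B_fast = double_center(C)
# Both agree up to rounding, and every row and column of B sums to (about) zero.
```
			</p>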
			<p>It can be shown that for a positive semi-definite matrix <b>C </b>the distance matrix <b>D</b>, whose elements are obtained by the standard transformation <graphic file="1748-7188-1-10-i8.gif"/>, is Euclidean and that <b>B </b>is a centered inner product matrix (<abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, p. 402). Therefore principal components can be obtained from a (partial) eigenvalue decomposition of <b>B</b>. For 2D visualization we thus compute the two leading eigenvectors x<sub>1 </sub>and x<sub>2 </sub>of <b>B </b>associated with the largest and second-largest eigenvalues, respectively. The <it>M </it>components of x<sub>1 </sub>and x<sub>2 </sub>provide the <it>x</it><sub>1 </sub>and <it>x</it><sub>2 </sub>coordinates of the <it>M </it>genes, which are used for the scatter plot visualization.</p>
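			<p>The complete scaling step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: eigenvalues and eigenvectors are obtained with a textbook cyclic Jacobi rotation method (a substitute chosen here only to avoid external dependencies), <b>HCH</b> is computed in its row and column mean form, and all function names are hypothetical. Pure-Python Jacobi costs O(<it>M</it>&#179;) operations per sweep, so for genomes with thousands of genes a library routine such as <it>numpy.linalg.eigh</it>, or a partial solver such as <it>scipy.sparse.linalg.eigsh</it> for only the two leading eigenvectors, would be the practical choice.</p>
			<p>
```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-22):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if tol > off:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q of A times P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q of P^T times A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V times P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[j][j] for j in range(n)], v


def classical_mds_2d(s):
    """2-D gene coordinates from a similarity matrix S given as nested lists."""
    m = len(s)
    lam_min = min(jacobi_eigh(s)[0])
    # C = S - lambda_min * I is positive semi-definite.
    c = [[s[j][k] - (lam_min if j == k else 0.0) for k in range(m)] for j in range(m)]
    # B = H C H, computed by removing row/column means and adding the grand mean.
    row = [sum(r) / m for r in c]
    col = [sum(c[j][k] for j in range(m)) / m for k in range(m)]
    grand = sum(row) / m
    b = [[c[j][k] - row[j] - col[k] + grand for k in range(m)] for j in range(m)]
    w, v = jacobi_eigh(b)
    top = sorted(range(m), key=lambda i: w[i], reverse=True)[:2]
    # Components of the two leading eigenvectors give the x1 and x2 coordinates.
    return [(v[j][top[0]], v[j][top[1]]) for j in range(m)]


S = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
coords = classical_mds_2d(S)
# Genes 0 and 1 (and genes 2 and 3) end up on the same side of the x1 axis.
```
			</p>
			<p>In this toy matrix the two leading eigenvalues are not well separated beyond the first, so only the <it>x</it><sub>1</sub> coordinate carries the two-cluster structure; the sign of each eigenvector is arbitrary, which is why plots from different runs or programs may appear mirrored or rotated.</p>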
		</sec>
		<sec>
			<st>
				<p>Experimental results</p>
			</st>
			<sec>
				<st>
					<p>Data sets</p>
				</st>
				<p>To evaluate our multidimensional scaling (MDS) approach, we focused on visualizations of ribosomal protein genes and putative alien genes for different microbial genomes. Ribosomal protein genes belong to the class of highly expressed genes, which tend to use codons associated with the prevalent tRNAs of the organism. If translational selection is one of the main sources of codon preferences in a particular genome, codon usage can in turn be used to predict putative highly expressed genes <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Another source of codon usage variation in microbial genomes is the insertion of foreign DNA by horizontal gene transfer. Thus, putative alien genes may also be predicted on the basis of codon usage analysis <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B21">21</abbr></abbrgrp>. While ribosomal protein genes can be identified from the annotations of completely sequenced genomes, reliable information about putative alien genes is much more difficult to obtain. We combined predictions of the SIGI-HMM tool <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> with existing references from the literature in order to obtain suitable test sets for our evaluations. SIGI-HMM is based on a hidden Markov model for the detection of genomic islands and, in contrast to our MDS-based visualization method, it explicitly uses information about the locations of genes on the corresponding chromosomes. However, unlike MDS, SIGI-HMM does not consider codon usage correlations between different amino acids. Because each method relies exclusively on one of these two complementary kinds of information, the two provide fundamentally different approaches to codon usage analysis <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
				<p>For the evaluation of the MDS-based visualizations we analyzed the microbial genomes of <it>Escherichia coli </it>K-12, <it>Bacillus subtilis</it>, <it>Vibrio cholerae </it>and <it>Thermus thermophilus </it>HB8. We used annotated DNA sequence data in the EMBL format, publicly available from the EBI <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Ribosomal protein genes were extracted from the datasets of the completely annotated genomes. Putative alien genes were selected according to the following information. On chromosome 1 of <it>V. cholerae</it>, SIGI-HMM predicted two genomic islands that comprise a gene cluster for a toxin-coregulated pilus and fragments of a temperate filamentous phage, as described in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Both clusters are closely associated with the pathogenicity of <it>V. cholerae</it>. For <it>Bacillus subtilis</it>, ten integrated prophages have been described on the basis of experimental evidence and theoretical considerations <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Nine of these prophages overlap with genomic islands predicted by SIGI-HMM. For <it>Escherichia coli </it>K-12, the authors of <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> used different compositional variables and estimated that about 18% of the genome has been imported by horizontal gene transfer. In contrast, SIGI-HMM predicted 580 genes (13.6%) to be putatively alien. The largest genomic islands comprise the cryptic prophages CP4-6, DLP12, e14, Rac, Qin, CP4-44, CPS-53, Eut, CP4-57, and the phage-like element KpLE2 (reviewed in <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>). For the extremophilic bacterium <it>Thermus thermophilus </it>HB8, no genomic islands have been described so far. SIGI-HMM predicted a contiguous cluster of 5 genes associated with functions in cell wall biosynthesis to be putatively alien.
The total number of putative alien genes and the number of ribosomal protein genes for all species considered here are summarized in Table <tblr tid="T1">1</tblr>. <supplr sid="S1">Additional file 1</supplr> provides a detailed list of all putative alien genes used for the visualization.</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Number of genes used for the visualization for all species under consideration. Given are the total number of genes on the respective chromosome, the number of ribosomal protein genes and the number of putative alien genes.</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="left">
								<p>Species</p>
							</c>
							<c ca="center">
								<p># genes (total)</p>
							</c>
							<c ca="center">
								<p># ribosomal protein genes</p>
							</c>
							<c ca="center">
								<p># putative alien genes</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>E. coli</it>
								</p>
							</c>
							<c ca="center">
								<p>4254</p>
							</c>
							<c ca="center">
								<p>61</p>
							</c>
							<c ca="center">
								<p>206</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>B. subtilis</it>
								</p>
							</c>
							<c ca="center">
								<p>4106</p>
							</c>
							<c ca="center">
								<p>57</p>
							</c>
							<c ca="center">
								<p>317</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>V. cholerae </it>Chr1</p>
							</c>
							<c ca="center">
								<p>2736</p>
							</c>
							<c ca="center">
								<p>64</p>
							</c>
							<c ca="center">
								<p>41</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>V. cholerae </it>Chr2</p>
							</c>
							<c ca="center">
								<p>1092</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>216</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>T. thermophilus</it>
								</p>
							</c>
							<c ca="center">
								<p>1973</p>
							</c>
							<c ca="center">
								<p>60</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<suppl id="S1">
					<title>
						<p>Additional File 1</p>
					</title>
					<text>
						<p>provides an Excel table (XLS) containing a detailed list of all putative alien genes used for the visualization.</p>
					</text>
					<file name="1748-7188-1-10-S1.xls">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Visualization</p>
				</st>
				<p>We compared our multidimensional scaling (MDS) approach with the correspondence analysis (CA) method as implemented in J. Peden's <it>CodonW </it>program <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Computations were based on <it>relative synonymous codon usage </it>(<it>RSCU</it>) values, which is the most common way to perform CA on codon usage data <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For both methods, the resulting coordinates were normalized to unit variance along the two leading CA factors and MDS principal components, respectively.</p>
				<p>The CA-based visualization for <it>E. coli </it>(Fig. <figr fid="F1">1</figr>) shows the typical "rabbit head" structure described in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The "ears" correspond to two low-density branches of the distribution. The "left ear" in the upper left corner shows a cluster of ribosomal protein genes, while putative alien genes are mainly located around the other branch of the distribution. The MDS plot in Fig. <figr fid="F1">1</figr> shows a similar picture, with ribosomal protein genes and putative alien genes again concentrated in the two branches of the distribution, which here appears rotated by 180 degrees. A comparison of the two visualizations shows that most ribosomal protein genes are well clustered in both plots, while putative alien genes are slightly more concentrated in the MDS plot. Note that the CA-based visualization shows an outlier at the lower boundary of the plot which is not among the putative alien genes.</p>
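				<p>The RSCU transformation and the unit-variance normalization mentioned above can be sketched briefly in Python. The sketch assumes the usual RSCU definition (the observed count of a codon divided by the mean count over the synonymous codons of its amino acid) and centers each plot axis before scaling it to unit population variance; the function names are hypothetical, and how CodonW treats amino acids that are absent from a gene is not asserted here (the sketch simply returns zeros in that case).</p>
				<p>
```python
import statistics


def rscu(counts):
    """RSCU values for the synonymous codons of one amino acid in one gene.

    RSCU = observed codon count / mean count over the synonymous codons.
    Returns zeros when the amino acid does not occur (a convention assumed here).
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    expected = total / len(counts)
    return [c / expected for c in counts]


def unit_variance(values):
    """Center one plot coordinate and scale it to unit population variance."""
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("coordinate has zero variance")
    mean = statistics.fmean(values)
    return [(x - mean) / sd for x in values]
```
				</p>
				<p>An RSCU value of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1 indicate preferred codons.</p>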
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Scatter plots for <it>E. coli </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively</p>
					</caption>
					<text>
						<p>Scatter plots for <it>E. coli </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively. Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.</p>
					</text>
					<graphic file="1748-7188-1-10-1"/>
				</fig>
				<p>For <it>B. subtilis </it>(Fig. <figr fid="F2">2</figr>), both visualization methods show a good clustering of putative alien genes and ribosomal protein genes in the branches of the distribution. Again, the lower boundary of the CA plot is determined by an outlier which does not belong to the set of putative alien genes.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Scatter plots for <it>B. subtilis </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively</p>
					</caption>
					<text>
						<p>Scatter plots for <it>B. subtilis </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively. Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.</p>
					</text>
					<graphic file="1748-7188-1-10-2"/>
				</fig>
				<p>For the first chromosome of <it>V. cholerae </it>(Fig. <figr fid="F3">3</figr>), the comparison shows a situation similar to that for <it>B. subtilis</it>: in both plots, most of the ribosomal protein and putative alien genes are well clustered in the two branches of the distribution. In the lower left corner of the CA-based plot there is an outlier which is not in the set of putative alien genes. As chromosome 2 of <it>V. cholerae </it>does not contain any ribosomal protein genes, the visualization of this replicon is restricted to putative alien genes (Fig. <figr fid="F4">4</figr>). These genes are slightly more concentrated in the MDS-based plot. Again, the lower boundary of the CA plot is determined by an outlier which is not among the putative alien genes.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Scatter plots for <it>V. cholerae </it>(chromosome 1) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively</p>
					</caption>
					<text>
						<p>Scatter plots for <it>V. cholerae </it>(chromosome 1) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively. Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.</p>
					</text>
					<graphic file="1748-7188-1-10-3"/>
				</fig>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Scatter plots for <it>V. cholerae </it>(chromosome 2) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively</p>
					</caption>
					<text>
						<p>Scatter plots for <it>V. cholerae </it>(chromosome 2) based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively. Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.</p>
					</text>
					<graphic file="1748-7188-1-10-4"/>
				</fig>
				<p>For <it>T. thermophilus </it>(Fig. <figr fid="F5">5</figr>), the outlier sensitivity of CA results in a highly distorted plot from which it is difficult to draw any conclusions at all. While ribosomal protein genes are clumped together with the remaining genes in a small region of the plot, putative alien genes are widely scattered in a region of low density. In contrast, the MDS-based plot places the putative alien genes close together in a tail at the right border, and the ribosomal protein genes show at least some weak clustering in the upper right part of the core distribution.</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Scatter plots for <it>T. thermophilus </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively</p>
					</caption>
					<text>
						<p>Scatter plots for <it>T. thermophilus </it>based on the first two components of correspondence analysis (left, CA) and P-value based multidimensional scaling (right, MDS), respectively. Red dots: ribosomal protein genes; blue dots: putative alien genes; yellow dots: all other genes.</p>
					</text>
					<graphic file="1748-7188-1-10-5"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We proposed an approach for the visualization of codon usage data using multidimensional scaling (MDS). In that context we introduced a novel similarity measure for codon usage tables, derived from the classical chi-square test. An important feature of our P-value based similarity measure is that it does not involve any hyperparameters. Therefore, a subjective "bias" on the visualization due to user-adjusted parameters is avoided. In most cases, our comparisons with the widely used correspondence analysis (CA) method showed slightly better clustering of putative alien genes for our P-value based visualization. In particular, the results indicate that our approach is more robust than the CA-based visualization method. The outlier sensitivity of CA is apparent in the plots for all species considered here and has already been observed in previous studies <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. While in most cases the CA-based visualizations are still useful in terms of a suitable clustering of ribosomal protein and putative alien genes, for <it>T. thermophilus </it>this sensitivity results in a distorted plot that complicates interpretation.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The work was partially supported by BMBF project MediGrid (01AK803G).</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Evidence for horizontal gene transfer in Escherichia coli speciation</p>
				</title>
				<aug>
					<au>
						<snm>M&#233;digue</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Rouxel</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Vigier</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>H&#233;naut</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Danchin</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1991</pubdate>
				<volume>222</volume>
				<fpage>851</fpage>
				<lpage>856</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-2836(91)90575-Q</pubid>
						<pubid idtype="pmpid" link="fulltext">1762151</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Analysis of codon usage patterns of bacterial genomes using the self-organizing map</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Badger</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kearney</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>792</fpage>
				<lpage>800</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11319263</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Codon usage and gene expression</p>
				</title>
				<aug>
					<au>
						<snm>Holm</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1986</pubdate>
				<volume>14</volume>
				<fpage>3075</fpage>
				<lpage>3087</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">339722</pubid>
						<pubid idtype="pmpid">2938078</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases</p>
				</title>
				<aug>
					<au>
						<snm>Shields</snm>
						<fnm>DC</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1987</pubdate>
				<volume>15</volume>
				<fpage>8023</fpage>
				<lpage>8040</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">306324</pubid>
						<pubid idtype="pmpid">3118331</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Correspondence analysis: a neglected multivariate method</p>
				</title>
				<aug>
					<au>
						<snm>Hill</snm>
						<fnm>MO</fnm>
					</au>
				</aug>
				<source>Appl Stat</source>
				<pubdate>1974</pubdate>
				<volume>23</volume>
				<fpage>340</fpage>
				<lpage>354</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2347127</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Use and misuse of correspondence analysis in codon usage studies</p>
				</title>
				<aug>
					<au>
						<snm>Perri&#232;re</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Thioulouse</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>4548</fpage>
				<lpage>4555</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">137129</pubid>
						<pubid idtype="pmpid" link="fulltext">12384602</pubid>
						<pubid idtype="doi">10.1093/nar/gkf565</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>NRSub: a non-redundant data base for the Bacillus subtilis genome</p>
				</title>
				<aug>
					<au>
						<snm>Perri&#232;re</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Gouy</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1994</pubdate>
				<volume>22</volume>
				<fpage>5525</fpage>
				<lpage>5529</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310112</pubid>
						<pubid idtype="pmpid">7838704</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Codon usage and lateral gene transfer in Bacillus subtilis</p>
				</title>
				<aug>
					<au>
						<snm>Moszer</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Rocha</snm>
						<fnm>EP</fnm>
					</au>
					<au>
						<snm>Danchin</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Curr Opin Microbiol</source>
				<pubdate>1999</pubdate>
				<volume>2</volume>
				<fpage>524</fpage>
				<lpage>528</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1369-5274(99)00011-9</pubid>
						<pubid idtype="pmpid" link="fulltext">10508724</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Replicational and transcriptional selection on codon usage in Borrelia burgdorferi</p>
				</title>
				<aug>
					<au>
						<snm>McInerney</snm>
						<fnm>JO</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>10698</fpage>
				<lpage>10703</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">27958</pubid>
						<pubid idtype="pmpid" link="fulltext">9724767</pubid>
						<pubid idtype="doi">10.1073/pnas.95.18.10698</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases</p>
				</title>
				<aug>
					<au>
						<snm>Lafay</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Lloyd</snm>
						<fnm>AT</fnm>
					</au>
					<au>
						<snm>McLean</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Devine</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Wolfe</snm>
						<fnm>KH</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>1642</fpage>
				<lpage>1649</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148367</pubid>
						<pubid idtype="pmpid" link="fulltext">10075995</pubid>
						<pubid idtype="doi">10.1093/nar/27.7.1642</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces</p>
				</title>
				<aug>
					<au>
						<snm>Romero</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Zavala</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Musto</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>2084</fpage>
				<lpage>2090</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">105376</pubid>
						<pubid idtype="pmpid" link="fulltext">10773076</pubid>
						<pubid idtype="doi">10.1093/nar/28.10.2084</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Prokaryotic Genome Evolution as Assessed by Multivariate Analysis of Codon Usage Patterns</p>
				</title>
				<aug>
					<au>
						<snm>McInerney</snm>
						<fnm>JO</fnm>
					</au>
				</aug>
				<source>Microbial and Comparative Genomics</source>
				<pubdate>1997</pubdate>
				<volume>2</volume>
				<fpage>1</fpage>
				<lpage>10</lpage>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Absence of translationally selected synonymous codon usage bias in Helicobacter pylori</p>
				</title>
				<aug>
					<au>
						<snm>Lafay</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Atherton</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
				</aug>
				<source>Microbiology</source>
				<pubdate>2000</pubdate>
				<volume>146</volume>
				<issue>Pt 4</issue>
				<fpage>851</fpage>
				<lpage>860</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10784043</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa</p>
				</title>
				<aug>
					<au>
						<snm>Gupta</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Ghosh</snm>
						<fnm>TC</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2001</pubdate>
				<volume>273</volume>
				<fpage>63</fpage>
				<lpage>70</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(01)00576-5</pubid>
						<pubid idtype="pmpid" link="fulltext">11483361</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<aug>
					<au>
						<snm>Kohonen</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Self-Organizing Maps</source>
				<publisher>Springer, Berlin</publisher>
				<pubdate>1995</pubdate>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome</p>
				</title>
				<aug>
					<au>
						<snm>Kanaya</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kinouchi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Abe</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Kudo</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Yamada</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Nishi</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Mori</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Ikemura</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2001</pubdate>
				<volume>276</volume>
				<fpage>89</fpage>
				<lpage>99</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(01)00673-4</pubid>
						<pubid idtype="pmpid" link="fulltext">11591475</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>INCA: synonymous codon usage analysis and clustering by means of self-organizing map</p>
				</title>
				<aug>
					<au>
						<snm>Supek</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Vlahovicek</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>2329</fpage>
				<lpage>2330</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth238</pubid>
						<pubid idtype="pmpid" link="fulltext">15059815</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<aug>
					<au>
						<snm>Press</snm>
						<fnm>WH</fnm>
					</au>
					<au>
						<snm>Flannery</snm>
						<fnm>BP</fnm>
					</au>
					<au>
						<snm>Teukolsky</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Vetterling</snm>
						<fnm>WT</fnm>
					</au>
				</aug>
				<source>Numerical Recipes in C</source>
				<publisher>Cambridge University Press, Cambridge</publisher>
				<edition>2</edition>
				<pubdate>1992</pubdate>
			</bibl>
			<bibl id="B19">
				<aug>
					<au>
						<snm>Mardia</snm>
						<fnm>KV</fnm>
					</au>
					<au>
						<snm>Kent</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Bibby</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Multivariate Analysis</source>
				<publisher>Academic Press, London</publisher>
				<pubdate>1979</pubdate>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Predicted highly expressed genes of diverse prokaryotic genomes</p>
				</title>
				<aug>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Mrazek</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2000</pubdate>
				<volume>182</volume>
				<issue>18</issue>
				<fpage>5238</fpage>
				<lpage>5250</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">94675</pubid>
						<pubid idtype="pmpid" link="fulltext">10960111</pubid>
						<pubid idtype="doi">10.1128/JB.182.18.5238-5250.2000</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>SIGI: score-based identification of genomic islands</p>
				</title>
				<aug>
					<au>
						<snm>Merkl</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>22</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">394314</pubid>
						<pubid idtype="pmpid" link="fulltext">15113412</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-22</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models</p>
				</title>
				<aug>
					<au>
						<snm>Waack</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Keller</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Asper</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Brodag</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Damm</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Fricke</snm>
						<fnm>WF</fnm>
					</au>
					<au>
						<snm>Surovcik</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Meinicke</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Merkl</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>7</volume>
				<fpage>142</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1471-2105-7-142</pubid>
						<pubid idtype="pmpid" link="fulltext">16542435</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>European Bioinformatics Institute</p>
				</title>
				<url>http://www.ebi.ac.uk/genomes/</url>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Lysogenic conversion by a filamentous phage encoding cholera toxin</p>
				</title>
				<aug>
					<au>
						<snm>Waldor</snm>
						<fnm>MK</fnm>
					</au>
					<au>
						<snm>Mekalanos</snm>
						<fnm>JJ</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1996</pubdate>
				<volume>272</volume>
				<issue>5270</issue>
				<fpage>1910</fpage>
				<lpage>1914</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8658163</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>The complete genome sequence of the gram-positive bacterium Bacillus subtilis</p>
				</title>
				<aug>
					<au>
						<snm>Kunst</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Ogasawara</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Moszer</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Albertini</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Alloni</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Azevedo</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Bertero</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Bessi&#232;res</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Bolotin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Borchert</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Borriss</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Boursier</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Brans</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Braun</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Brignell</snm>
						<fnm>SC</fnm>
					</au>
					<au>
						<snm>Bron</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Brouillet</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bruschi</snm>
						<fnm>CV</fnm>
					</au>
					<au>
						<snm>Caldwell</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Capuano</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Carter</snm>
						<fnm>NM</fnm>
					</au>
					<au>
						<snm>Choi</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Codani</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Connerton</snm>
						<fnm>IF</fnm>
					</au>
					<au>
						<snm>Danchin</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1997</pubdate>
				<volume>390</volume>
				<issue>6657</issue>
				<fpage>249</fpage>
				<lpage>256</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/36786</pubid>
						<pubid idtype="pmpid" link="fulltext">9384377</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Complete nucleotide sequence of a skin element excised by DNA rearrangement during sporulation in Bacillus subtilis</p>
				</title>
				<aug>
					<au>
						<snm>Takemaru</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Mizuno</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sato</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Takeuchi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kobayashi</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Microbiology</source>
				<pubdate>1995</pubdate>
				<volume>141</volume>
				<issue>Pt 2</issue>
				<fpage>323</fpage>
				<lpage>327</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7704261</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Characterization of PBSX, a defective prophage of Bacillus subtilis</p>
				</title>
				<aug>
					<au>
						<snm>Wood</snm>
						<fnm>HE</fnm>
					</au>
					<au>
						<snm>Dawson</snm>
						<fnm>MT</fnm>
					</au>
					<au>
						<snm>Devine</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>McConnell</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1990</pubdate>
				<volume>172</volume>
				<issue>5</issue>
				<fpage>2667</fpage>
				<lpage>2674</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">208911</pubid>
						<pubid idtype="pmpid">2110147</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Bacillus subtilis bacteriophage SPbeta: localization of the prophage attachment site, and specialized transduction</p>
				</title>
				<aug>
					<au>
						<snm>Zahler</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Korman</snm>
						<fnm>RZ</fnm>
					</au>
					<au>
						<snm>Rosenthal</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Hemphill</snm>
						<fnm>HE</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1977</pubdate>
				<volume>129</volume>
				<issue>1</issue>
				<fpage>556</fpage>
				<lpage>558</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">234961</pubid>
						<pubid idtype="pmpid">401505</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Molecular archaeology of the Escherichia coli genome</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<issue>16</issue>
				<fpage>9413</fpage>
				<lpage>9417</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">21352</pubid>
						<pubid idtype="pmpid" link="fulltext">9689094</pubid>
						<pubid idtype="doi">10.1073/pnas.95.16.9413</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Prophages and bacterial genomics: what have we learned so far?</p>
				</title>
				<aug>
					<au>
						<snm>Casjens</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>49</volume>
				<issue>2</issue>
				<fpage>277</fpage>
				<lpage>300</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2958.2003.03580.x</pubid>
						<pubid idtype="pmpid" link="fulltext">12886937</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>CodonW</p>
				</title>
				<url>http://codonw.sourceforge.net/</url>
			</bibl>
		</refgrp>
	</bm>
</art>
