Algorithms for Molecular Biology

Open Access Research

Phylogenetic comparative assembly

Peter Husemann1,2* and Jens Stoye1,3

Author Affiliations

1 AG Genominformatik, Technische Fakultät, Bielefeld University, Germany

2 International Graduate School in Bioinformatics and Genome Research, Bielefeld University, Germany

3 Institute for Bioinformatics, Center for Biotechnology (CeBiTec), Bielefeld University, Germany

Algorithms for Molecular Biology 2010, 5:3 doi:10.1186/1748-7188-5-3

Published: 4 January 2010

Abstract

Background

Recent high-throughput sequencing technologies generate large amounts of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, several contigs often remain that cannot be assembled any further. Closing all the gaps to obtain the whole genomic sequence is still costly and time-consuming.

Results

Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that records, for each pair of contigs, the likelihood that they are adjacent.

Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible, and the best alternatives where necessary.
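To make the two steps concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each reference genome already supplies a score for every candidate contig pair, that a reference's influence decays with its tree distance as 1 / (1 + distance), and that a simple greedy pass stands in for the layout step. The names `combine_adjacency_scores`, `greedy_layout`, `per_reference_scores`, and `tree_distance` are hypothetical, and contig orientation is ignored.

```python
from collections import defaultdict


def combine_adjacency_scores(per_reference_scores, tree_distance):
    """Merge per-reference adjacency scores into one weighted graph.

    Assumption (not from the paper): a reference at tree distance d
    contributes with weight 1 / (1 + d).
    """
    combined = defaultdict(float)
    for ref, scores in per_reference_scores.items():
        weight = 1.0 / (1.0 + tree_distance[ref])
        for (a, b), score in scores.items():
            # Store each undirected contig pair under a canonical key.
            combined[tuple(sorted((a, b)))] += weight * score
    return dict(combined)


def greedy_layout(combined):
    """Pick adjacencies greedily, heaviest first.

    An edge is accepted only if both contigs still have fewer than two
    neighbours and the edge does not close a cycle, so the result is a
    set of linear contig chains.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = defaultdict(int)
    accepted = []
    # Sort by descending weight; break ties by contig names for determinism.
    for (a, b), _ in sorted(combined.items(), key=lambda kv: (-kv[1], kv[0])):
        if degree[a] >= 2 or degree[b] >= 2:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # would close a cycle
        parent[root_a] = root_b
        degree[a] += 1
        degree[b] += 1
        accepted.append((a, b))
    return accepted


# Toy input: two hypothetical references scoring three contigs.
per_reference_scores = {
    "refA": {("c1", "c2"): 10.0, ("c2", "c3"): 8.0},
    "refB": {("c1", "c3"): 6.0, ("c2", "c3"): 2.0},
}
tree_distance = {"refA": 0.0, "refB": 1.0}

combined = combine_adjacency_scores(per_reference_scores, tree_distance)
layout = greedy_layout(combined)
print(layout)
```

In this toy input the closer reference refA dominates, so the chain c1, c2, c3 is kept and the weaker c1/c3 adjacency is dropped because it would close a cycle. Unlike the layout graph described above, this greedy simplification commits to a single ordering and does not retain the best alternatives.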

Conclusions

Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while also being much faster.