Open Access Highly Accessed Review Article

Data compression for sequencing data

Sebastian Deorowicz1 and Szymon Grabowski2*

Author Affiliations

1 Institute of Informatics, Silesian University of Technology, Gliwice, Poland

2 Institute of Applied Computer Science, Lodz University of Technology, Łódź, Poland

Algorithms for Molecular Biology 2013, 8:25  doi:10.1186/1748-7188-8-25

Published: 19 November 2013

Abstract

Post-Sanger sequencing methods produce tons of data, and there is general agreement that the challenge of storing and processing them must be addressed with data compression. In this review we first answer the question “why compression” in a quantitative manner. We then answer the questions “what” and “how” by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we return to the question “why compression” and give other, perhaps surprising, answers that demonstrate the pervasiveness of data compression techniques in computational biology.