Algorithms for Molecular Biology

Open Access Software article

Periodic pattern detection in sparse boolean sequences

Ivan Junier1,2, Joan Hérisson2 and François Képès2*

Author Affiliations

1 Institut des Systèmes Complexes Paris Île-de-France, 57-59 rue Lhomond, F-75005, Paris, France

2 Epigenomics Project, Genopole, CNRS UPS3201, UniverSud Paris, University of Evry, Genopole Campus 1 - Genavenir 6, 5 rue Henri Desbruères - F-91030 EVRY cedex, France

Algorithms for Molecular Biology 2010, 5:31 doi:10.1186/1748-7188-5-31

Published: 10 September 2010

Abstract

Background

The specific positions of functionally related genes along the DNA have been shown to reflect the interplay between chromosome structure and genetic regulation. By investigating the statistical properties of the distances separating such genes, several studies have highlighted various periodic trends. In many cases, however, groups built from co-functional or co-regulated genes are small and contain erroneous members (data contamination), so that the resulting statistics are difficult to exploit. In addition, gene positions are not expected to follow a perfectly ordered pattern along the DNA. Within this scope, we present an algorithm that aims to highlight periodic patterns in sparse boolean sequences, i.e., sequences of the type 010011011010... where the ratio of the number of 1's (denoting here the transcription start of a gene) to the number of 0's is small.
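To make the notion of a sparse boolean sequence concrete, here is a minimal sketch of how such a sequence can be reduced to the list of positions of its 1's, which is the representation a position-based analysis would naturally work with. The function names `ones_positions` and `ones_to_zeros_ratio` are hypothetical and do not come from the published software.

```python
def ones_positions(bits):
    """Return the 0-based indices of the 1's in a boolean string such as '0100110'."""
    return [i for i, b in enumerate(bits) if b == "1"]


def ones_to_zeros_ratio(bits):
    """Ratio of the number of 1's to the number of 0's (small for a sparse sequence)."""
    zeros = bits.count("0")
    if zeros == 0:
        raise ValueError("sequence contains no 0's")
    return bits.count("1") / zeros


# The short excerpt quoted in the text (not itself sparse, it only shows the format).
excerpt = "010011011010"
excerpt_positions = ones_positions(excerpt)

# A toy sparse sequence: three 1's among 300 sites (illustrative values only).
sparse = "".join("1" if i in (10, 110, 210) else "0" for i in range(300))
sparse_ratio = ones_to_zeros_ratio(sparse)
```

In a genomic setting the positions would typically be stored directly rather than as a string, since the sequence is as long as the chromosome while the number of 1's is only the number of genes in the group.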

Results

The algorithm is particularly robust with respect to strong signal distortions such as the addition of 1's at arbitrary positions (contaminated data), the deletion of existing 1's in the sequence (missing data) and the presence of disorder in the position of the 1's (noise). This robustness property stems from an appropriate exploitation of the remarkable alignment properties of periodic points in solenoidal coordinates.
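The alignment idea can be illustrated with a rough sketch, assuming that viewing a solenoid of period P face-on reduces each position x to the phase x mod P, so that points spaced by P fall on top of each other. This is not the authors' algorithm or scoring function: the alignment measure below (the mean resultant length of the phases) is a standard circular statistic swapped in for illustration, and the names `phase_alignment` and `scan_periods` as well as the toy data are assumptions.

```python
import math


def phase_alignment(positions, period):
    """Mean resultant length of the phases 2*pi*(x mod period)/period.

    Values near 1 mean the points pile up at one phase; values near 0 mean
    the phases cancel out around the circle.
    """
    if not positions:
        raise ValueError("positions must be non-empty")
    angles = [2.0 * math.pi * (x % period) / period for x in positions]
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s) / len(positions)


def scan_periods(positions, periods):
    """Score every trial period; returns {period: alignment score}."""
    return {p: phase_alignment(positions, p) for p in periods}


# Toy data (illustrative values only): a period-100 pattern with the three
# distortions named in the text.
true_period = 100
clean = [k * true_period for k in range(12)]
jitter = [0, 3, -2, 4, -5, 1, 2, -3, 0, 5, -4, 2]            # noise
noisy = [x + d for x, d in zip(clean, jitter)]
kept = [x for i, x in enumerate(noisy) if i not in (2, 7)]    # missing data
contaminants = [37, 451, 768]                                 # contaminated data
positions = sorted(kept + contaminants)

scores = scan_periods(positions, range(60, 141))
print(f"alignment at period {true_period}: {scores[true_period]:.3f}")
```

With so few points spanning only about eleven periods, neighbouring trial periods score similarly, so a scan of this kind indicates a range of plausible periods rather than a single sharp value.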

Conclusions

The efficiency of the algorithm is demonstrated in situations where standard Fourier-based spectral methods are poorly adapted. We also show how the proposed framework makes it possible to identify the 1's that participate in the periodic trends, i.e., how it assigns a positional score to individual genes, in the same spirit as the sequence score. The software is available for public use at http://www.issb.genopole.fr/MEGA/Softwares/iSSB_SolenoidalApplication.zip.