Algorithms for Molecular Biology - Latest Articles
http://www.almob.org
The latest research articles published by Algorithms for Molecular Biology
2014-12-16T12:00:00Z

BicPAM: Pattern-based biclustering for biomedical data analysis
Background:
Biclustering, the discovery of sets of objects with a coherent pattern across a subset of conditions, is a critical task for studying a wide set of biomedical problems in which molecular units or patients are meaningfully related to a set of properties. The challenging combinatorial nature of this task led to the development of approaches with restrictions on the allowed type, number and quality of biclusters. In contrast, recent biclustering approaches relying on pattern mining methods can exhaustively discover flexible structures of robust biclusters. However, these approaches are only prepared to discover constant biclusters, and their underlying contributions remain dispersed.
Methods:
The proposed BicPAM biclustering approach integrates existing principles made available by state-of-the-art pattern-based approaches with two new contributions. First, BicPAM is the first efficient attempt to exhaustively mine non-constant types of biclusters, including additive and multiplicative coherencies in the presence or absence of symmetries. Second, BicPAM provides strategies to effectively compose different biclustering structures and to handle arbitrary levels of noise, whether inherent to the data or associated with discretization procedures.
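To make the additive and multiplicative coherencies mentioned above concrete, the following minimal Python sketch checks whether a given submatrix satisfies either one. It is not BicPAM's mining procedure. It assumes the usual definitions (each row equals a base row plus, or times, a row-specific factor), and the function names and tolerance parameter are illustrative.

```python
def is_additive(rows, tol=1e-9):
    """True if every row equals the first row plus a row-specific shift."""
    base = rows[0]
    for row in rows:
        shifts = [x - b for x, b in zip(row, base)]
        if max(shifts) - min(shifts) > tol:
            return False
    return True


def is_multiplicative(rows, tol=1e-9):
    """True if every row equals the first row times a row-specific factor.

    Assumes the first row has no zero entries (illustrative simplification).
    """
    base = rows[0]
    for row in rows:
        ratios = [x / b for x, b in zip(row, base)]
        if max(ratios) - min(ratios) > tol:
            return False
    return True


# Rows differ by constant shifts (+2, -1): additive (shifting) coherence.
shifted = [[1, 4, 2], [3, 6, 4], [0, 3, 1]]
# Rows differ by constant factors (x2, x0.5): multiplicative (scaling) coherence.
scaled = [[1, 4, 2], [2, 8, 4], [0.5, 2, 1]]
```

A mining algorithm has to find such row and column subsets inside a large noisy matrix; the check above only verifies a candidate that is already given.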
Results:
Results show BicPAM's superiority against its peers and its ability to retrieve unique types of biclusters of interest, to efficiently deliver exhaustive solutions and to successfully recover planted biclusters in datasets with varying levels of missing values and noise. Its application over gene expression data leads to unique solutions with heightened biological relevance.
Conclusions:
BicPAM integrates existing dispersed efforts towards pattern-based biclustering and provides the first critical strategies to efficiently discover exhaustive solutions of biclusters with shifting, scaling and symmetric assumptions, with varying quality and underlying structures. Additionally, BicPAM dynamically adapts its behavior to mine data with different levels of missing values and noise.
http://www.almob.org/content/9/1/27
Rui Henriques, Sara Madeira
Algorithms for Molecular Biology 2014, 9:27
2014-12-16T12:00:00Z
doi:10.1186/s13015-014-0027-z

Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models
Background:
Finding new functional fragments in biological sequences is a challenging problem. Methods addressing this problem commonly search for clusters of pattern occurrences that are statistically significant. A measure of statistical significance is the P-value of a number of pattern occurrences, i.e. the probability of finding at least S occurrences of words from a pattern in a random text of length N generated according to a given probability model. All words of the pattern are assumed to be of the same length.
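The P-value defined above can be illustrated by brute force for tiny inputs. The Python sketch below enumerates every text of length N under a Bernoulli (independent letters) model and sums the probabilities of those with at least S occurrences, counting overlapping occurrences. It is exponential in N and is not the SufPref algorithm; the function names and the example model are assumptions made for illustration.

```python
from itertools import product


def count_occurrences(text, words):
    """Count (possibly overlapping) occurrences of any word of the pattern."""
    m = len(next(iter(words)))  # all words share the same length
    return sum(text[i:i + m] in words for i in range(len(text) - m + 1))


def brute_force_p_value(words, letter_probs, n, s):
    """P(at least s occurrences in a random text of length n), Bernoulli model."""
    words = set(words)
    total = 0.0
    for letters in product(letter_probs, repeat=n):
        text = "".join(letters)
        if count_occurrences(text, words) >= s:
            prob = 1.0
            for ch in letters:
                prob *= letter_probs[ch]
            total += prob
    return total


# Hypothetical two-letter model with equiprobable letters.
model = {"A": 0.5, "B": 0.5}
p_one = brute_force_p_value({"AA"}, model, n=3, s=1)  # texts AAA, AAB, BAA
p_two = brute_force_p_value({"AA"}, model, n=3, s=2)  # text AAA only
```

Enumerating all texts is feasible only for very short N, which is why exact methods rely on recursive equations over compact structures instead.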
Results:
We present a novel algorithm, SufPref, that computes an exact P-value for Hidden Markov models (HMM). The algorithm is based on recursive equations on text sets related to pattern occurrences; the equations can be used for any probability model. The algorithm inductively traverses a specific data structure, an overlap graph. The nodes of the graph are associated with the overlaps of words from the pattern. The edges are associated with the prefix and suffix relations between overlaps. An original feature of our data structure is that the pattern need not be explicitly represented in nodes or leaves. The algorithm relies on the Cartesian product of the overlap graph and the graph of HMM states; this approach is analogous to the automaton approach from JBCB 4: 553-569. The gain in size of the SufPref data structure leads to significant improvements in space and time complexity compared to existing algorithms. The algorithm SufPref was implemented as a C++ program; the program can be used both as a Web server and as a standalone program for Linux and Windows. The program interface admits special formats to describe probability models of various types (HMM, Bernoulli, Markov); a pattern can be described with a list of words, a PSSM, a degenerate pattern or a word and a number of mismatches. It is available at http://server2.lpm.org.ru/bio/online/sf/. The program was applied to compare the sensitivity and specificity of methods for TFBS prediction based on P-values computed for Bernoulli models, Markov models of orders one and two and HMMs. The experiments show that the methods have approximately the same qualities.
http://www.almob.org/content/9/1/25
Mireille Régnier, Evgenia Furletova, Victor Yakovlev, Mikhail Roytberg
Algorithms for Molecular Biology 2014, 9:25
2014-12-16T12:00:00Z
doi:10.1186/s13015-014-0025-1

A constraint solving approach to model reduction by tropical equilibration
Model reduction is a central topic in systems biology and dynamical systems theory, used for instance to reduce the complexity of detailed models, find important parameters, and develop multi-scale models. While singular perturbation theory is a standard mathematical tool to analyze the different time scales of a dynamical system and decompose the system accordingly, tropical methods provide a simple algebraic framework to perform these analyses systematically in polynomial systems. The crux of these methods is in the computation of tropical equilibrations. In this paper we show that constraint-based methods, using reified constraints for expressing the equilibration conditions, make it possible to numerically solve non-linear tropical equilibration problems that are out of reach of standard computation methods. We illustrate this approach first with the detailed reduction of a simple biochemical mechanism, the Michaelis-Menten enzymatic reaction model, and second, with large-scale performance figures obtained on the http://biomodels.net repository.
http://www.almob.org/content/9/1/24
Sylvain Soliman, François Fages, Ovidiu Radulescu
Algorithms for Molecular Biology 2014, 9:24
2014-12-04T12:00:00Z
doi:10.1186/s13015-014-0024-2

Atom mapping with constraint programming
Chemical reactions are rearrangements of chemical bonds. Each atom in an educt molecule thus appears again in a specific position of one of the reaction products. This bijection between educt and product atoms is not reported by chemical reaction databases, however, so that the “Atom Mapping Problem” of finding this bijection is left as an important computational task for many practical applications in computational chemistry and systems biology. Elementary chemical reactions feature a cyclic imaginary transition state (ITS) that imposes additional restrictions on the bijection between educt and product atoms that are not taken into account by previous approaches. We demonstrate that Constraint Programming is well-suited to solving the Atom Mapping Problem in this setting. The performance of our approach is evaluated for a manually curated subset of chemical reactions from the KEGG database featuring various ITS cycle layouts and reaction mechanisms.
http://www.almob.org/content/9/1/23
Martin Mann, Feras Nahar, Norah Schnorr, Rolf Backofen, Peter Stadler, Christoph Flamm
Algorithms for Molecular Biology 2014, 9:23
2014-11-29T00:00:00Z
doi:10.1186/s13015-014-0023-3

A priori assessment of data quality in molecular phylogenetics
Sets of sequence data used in phylogenetic analysis are often plagued by both random noise and systematic biases. Since the commonly used methods of phylogenetic reconstruction are designed to produce trees, it is an important task to evaluate these trees a posteriori. Preferably, however, one would like to assess the suitability of the input data for phylogenetic analysis a priori and, if possible, obtain information on how to prune the data sets to improve the quality of phylogenetic reconstruction without introducing unwarranted biases. In the last few years several different approaches, algorithms, and software tools have been proposed for this purpose. Here we provide an overview of the state of the art and briefly discuss the most pressing open problems.
http://www.almob.org/content/9/1/22
Bernhard Misof, Karen Meusemann, Björn von Reumont, Patrick Kück, Sonja Prohaska, Peter Stadler
Algorithms for Molecular Biology 2014, 9:22
2014-09-12T00:00:00Z
doi:10.1186/s13015-014-0022-4

Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures
Background:
Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains.
Results:
Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of O(n^6 D^5) for sequence length n and D distinct distance values, it is possible to reduce this to O(n^4) for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules.
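As a small illustration of the graph-distance itself, the Python sketch below computes it for a single, fixed secondary structure given in dot-bracket notation. It assumes that backbone bonds and base pairs are both edges of weight 1 and uses breadth-first search; it does not compute the Boltzmann-ensemble distribution that the article addresses, and the function names are illustrative.

```python
from collections import deque


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def graph_distance(structure, src, dst):
    """Shortest path length between two nucleotides (0-based positions).

    Edges: backbone neighbours (i, i+1) and base pairs, all of weight 1
    (an assumption made for this sketch).
    """
    pairs = pair_table(structure)
    n = len(structure)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        neighbours = [u - 1, u + 1]
        if u in pairs:
            neighbours.append(pairs[u])
        for v in neighbours:
            if 0 <= v < n and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # unreachable only if src or dst is out of range


hairpin = "((...))"
d_ends_paired = graph_distance(hairpin, 0, 6)      # the ends form a base pair
d_ends_open = graph_distance(".......", 0, 6)      # backbone only
```

A sampling approach in the spirit of the abstract could apply such a routine to each structure drawn from the ensemble and tabulate the resulting distances.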
Conclusions:
The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html.
http://www.almob.org/content/9/1/19
Jing Qin, Markus Fricke, Manja Marz, Peter Stadler, Rolf Backofen
Algorithms for Molecular Biology 2014, 9:19
2014-09-11T00:00:00Z
doi:10.1186/1748-7188-9-19

Optimal computation of all tandem repeats in a weighted sequence
Background:
Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment.
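For orientation, the definition of an exact tandem repeat above can be checked naively on an ordinary (non-weighted) string. The Python sketch below reports every position and period at which a factor is immediately followed by an identical copy. It is a brute-force baseline with cubic worst-case time, not Crochemore's partitioning algorithm and not the weighted-sequence variant; the function name is illustrative.

```python
def naive_squares(s):
    """Return all (start, period) with s[start:start+p] == s[start+p:start+2p]."""
    found = []
    n = len(s)
    for start in range(n):
        for period in range(1, (n - start) // 2 + 1):
            if s[start:start + period] == s[start + period:start + 2 * period]:
                found.append((start, period))
    return found


repeats = naive_squares("abab")  # the factor "ab" occurs twice in a row
```

In a weighted sequence each position holds a probability distribution over letters rather than a single letter, so the equality test would have to be replaced by a probabilistic criterion.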
Results:
Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
http://www.almob.org/content/9/1/21
Carl Barton, Costas Iliopoulos, Solon Pissis
Algorithms for Molecular Biology 2014, 9:21
2014-08-16T12:00:00Z
doi:10.1186/s13015-014-0021-5

Using the message passing algorithm on discrete data to detect faults in boolean regulatory networks
Background:
An important problem in systems biology is to model gene regulatory networks, which can then be utilized to develop novel therapeutic methods for cancer treatment. Knowledge about which proteins/genes are dysregulated in a regulatory network, such as the Mitogen Activated Protein Kinase (MAPK) Network, can be used not only to decide which therapy to use for a particular case of cancer, but also to help discover effective targets for new drugs.
Results:
In this work we demonstrate how one can start from a model signal transduction network derived from prior knowledge, and infer from gene expression data the probable locations of dysregulations in the network. Our model is based on Boolean networks, and the inference problem is solved using a version of the message passing algorithm. We performed simulation experiments on synthetic data to verify the efficacy of the algorithm as compared to the results from the much more computationally intensive Markov Chain Monte-Carlo methods. We also applied the model to analyze data collected from fibroblasts, thereby demonstrating how this model can be used on real-world data.
http://www.almob.org/content/9/1/20
Anwoy Mohanty, Aniruddha Datta, Vijayanagaram Venkatraj
Algorithms for Molecular Biology 2014, 9:20
2014-08-16T12:00:00Z
doi:10.1186/s13015-014-0020-6

Not assessing the efficiency of multiple sequence alignment programs
One can search for messages in the digits of π or a Kazakhstan telephone book, but there may be hidden messages closer to home. A recent publication in this journal purportedly compared a set of multiple sequence alignment programs. The real purpose of the article may have been to remind readers how to present scientific data.
http://www.almob.org/content/9/1/18
Andrew Torda
Algorithms for Molecular Biology 2014, 9:18
2014-07-05T00:00:00Z
doi:10.1186/1748-7188-9-18

RNA-RNA interaction prediction using genetic algorithm
Background:
RNA-RNA interaction plays an important role in the regulation of gene expression and cell development. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. In the RNA-RNA interaction prediction problem, two RNA sequences are given as inputs and the goal is to find the optimal secondary structure both within each of the two RNAs and between them. Several algorithms have been proposed to predict RNA-RNA interaction structure. However, most of them suffer from high computational time.
Results:
In this paper, we introduce a novel genetic algorithm called GRNAs to predict the RNA-RNA interaction. Applied to some standard datasets, the proposed algorithm achieves appropriate accuracy and lower time complexity in comparison to the other state-of-the-art algorithms. In the proposed algorithm, each individual is a secondary structure of two interacting RNAs. The minimum free energy is considered as a fitness function for each individual. Over successive generations, the algorithm uses crossover and mutation operations to converge towards the optimal secondary structure (minimum free energy structure) of the two interacting RNAs.
Conclusions:
The algorithm is suitable for joint secondary structure prediction. The results achieved on a set of known interacting RNA pairs are compared with those of other related algorithms, and the effectiveness and validity of the proposed algorithm have been demonstrated. It has been shown that the time complexity of each iteration of the algorithm is comparable to that of the other approaches.
http://www.almob.org/content/9/1/17
Soheila Montaseri, Fatemeh Zare-Mirakabad, Nasrollah Moghadam-Charkari
Algorithms for Molecular Biology 2014, 9:17
2014-06-29T00:00:00Z
doi:10.1186/1748-7188-9-17