Email updates

Keep up to date with the latest news and content from Algorithms for Molecular Biology and BioMed Central.

Open Access Research

Random generation of RNA secondary structures according to native distributions

Markus E Nebel, Anika Scheid* and Frank Weinberg

Author Affiliations

Department of Computer Science, University of Kaiserslautern, Germany

For all author emails, please log on.

Algorithms for Molecular Biology 2011, 6:24  doi:10.1186/1748-7188-6-24

Published: 12 October 2011

Abstract

Background

Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest.

Results

In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold) ours has two major advantages: Firstly, after a preprocessing step in time <a onClick="popup('http://www.almob.org/content/6/1/24/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.almob.org/content/6/1/24/mathml/M1">View MathML</a> for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity <a onClick="popup('http://www.almob.org/content/6/1/24/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.almob.org/content/6/1/24/mathml/M2">View MathML</a> while other algorithms typically have a runtime in <a onClick="popup('http://www.almob.org/content/6/1/24/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.almob.org/content/6/1/24/mathml/M3">View MathML</a>. Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with logarithmized probabilities.

Conclusion

A number of experimental results shows that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen webcite and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too.

Keywords:
Random generation; stochastic context-free grammars; RNA secondary structures