Research
Random generation of RNA secondary structures according to native distributions
Department of Computer Science, University of Kaiserslautern, Germany
Algorithms for Molecular Biology 2011, 6:24 doi:10.1186/1748-7188-6-24
Published: 12 October 2011Abstract
Background
Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest.
Results
In this article, we present a new general framework for deriving algorithms for the
non-uniform random generation of combinatorial objects according to the encoding and
probability distribution implied by a stochastic context-free grammar. Briefly, the
framework extends on the well-known recursive method for (uniform) random generation
and uses the popular framework of admissible specifications of combinatorial classes,
introducing weighted combinatorial classes to allow for the non-uniform generation
by means of unranking. This framework is used to derive an algorithm for the generation
of RNA secondary structures of a given fixed size. We address the random generation
of these structures according to a realistic distribution obtained from real-life
data by using a very detailed context-free grammar (that models the class of RNA secondary
structures by distinguishing between all known motifs in RNA structure). Compared
to well-known sampling approaches used in several structure prediction tools (such
as SFold) ours has two major advantages: Firstly, after a preprocessing step in time
for the computation of all weighted class sizes needed, with our approach a set of
m random secondary structures of a given structure size n can be computed in worst-case time complexity
while other algorithms typically have a runtime in
. Secondly, our approach works with integer arithmetic only which is faster and saves
us from all the discomforting details of using floating point arithmetic with logarithmized
probabilities.
Conclusion
A number of experimental results shows that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen webcite and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too.



