<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-2-13</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of <it>cis</it>-regulatory modules</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Boeva</snm>
               <fnm>Valentina</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>valeyo@yandex.ru</email>
            </au>
            <au id="A2">
               <snm>Cl&#233;ment</snm>
               <fnm>Julien</fnm>
               <insr iid="I3"/>
               <email>Julien.Clement@info.unicaen.fr</email>
            </au>
            <au id="A3">
               <snm>R&#233;gnier</snm>
               <fnm>Mireille</fnm>
               <insr iid="I2"/>
               <email>Mireille.Regnier@inria.fr</email>
            </au>
            <au id="A4">
               <snm>Roytberg</snm>
               <mi>A</mi>
               <fnm>Mikhail</fnm>
               <insr iid="I4"/>
               <insr iid="I5"/>
               <email>mroytberg@impb.psn.ru</email>
            </au>
            <au id="A5">
               <snm>Makeev</snm>
               <mi>J</mi>
               <fnm>Vsevolod</fnm>
               <insr iid="I1"/>
               <insr iid="I6"/>
               <email>makeev@genetika.ru</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institute of Genetics and Selection of Industrial Microorganisms, GosNIIGenetika, 117545 Moscow, Russia</p>
            </ins>
            <ins id="I2">
               <p>MIGEC, INRIA Rocquencourt, 78153 Le Chesnay, France</p>
            </ins>
            <ins id="I3">
               <p>GREYC, CNRS UMR 6072, Laboratoire d'informatique, 14032 Caen, France</p>
            </ins>
            <ins id="I4">
               <p>Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Puschino, Moscow Region, Russia</p>
            </ins>
            <ins id="I5">
               <p>Puschino State University, Puschino, Moscow Region, Russia</p>
            </ins>
            <ins id="I6">
               <p>Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2007</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>13</fpage>
         <url>http://www.almob.org/content/2/1/13</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17927813</pubid>
               <pubid idtype="doi">10.1186/1748-7188-2-13</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>10</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>10</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Boeva et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p><it>cis</it>-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of <it>p</it>-values for simultaneous occurrences of different motifs which can overlap.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We developed and implemented an algorithm that computes the <it>p</it>-value of the event that <it>s </it>different motifs occur, respectively, <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s </it></sub>or more times, possibly with overlaps, in a random text. Motifs can be represented with most popular motif models, but in all cases without indels. Zero- or first-order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of <it>cis</it>-regulatory modules involved in <it>D. melanogaster </it>early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA.</p>
            </sec>
            <sec>
               <st>
                  <p>Method</p>
               </st>
               <p>The algorithm, which computes the exact probability of simultaneous motif occurrences, is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. It runs in <it>O</it>(<it>n</it>|&#931;|(<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| + <it>K</it>|&#931;|<sup><it>K</it></sup>) &#8719;<sub><it>i </it></sub><it>k</it><sub><it>i</it></sub>) time, where <it>n </it>is the length of the text, |&#931;| is the alphabet size, <it>m </it>is the maximal motif length, |<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| is the total number of words in the motifs, <it>K </it>is the order of the Markov model, and <it>k</it><sub><it>i </it></sub>is the number of occurrences of the <it>i</it>th motif.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The primary objective of the program is to assess the likelihood that a given DNA segment is a CRM regulated by a known set of regulatory factors. The program can also be used to select an appropriate threshold for PWM scanning and to assess the similarity of different motifs.</p>
            </sec>
            <sec>
               <st>
                  <p>Availability</p>
               </st>
               <p>Project web page, stand-alone version and documentation can be found at <url>http://bioinform.genetika.ru/AhoPro/</url></p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>During the past few years, a number of computational tools have been designed <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> for locating potential <it>transcription factor binding sites </it>(TFBSs) in nucleotide sequences, e.g., in compilations of sequences upstream of putative co-regulated genes. In parallel, experimental approaches were developed <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, which allowed identification of binding motifs for many different transcription factors. Experimental <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and bioinformatic <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> studies demonstrated that sequences of regulatory DNA that bind transcription factors can exhibit many different types of architecture. In eukaryotes, TFBSs found in DNA sequences often form rather dense clusters: this was demonstrated by both experimental <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp> and computational <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp> methods. Such clusters can contain sites binding the same factor or several different factors <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The <it>cis</it>-regulatory module (CRM) in these cases contains, respectively, homotypic or heterotypic clusters of motifs specifically recognized by binding proteins <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
         <p>The particular arrangement of motifs in a homotypic or heterotypic cluster is not random, and it is commonly accepted that the motif arrangement within a CRM is important for its functionality <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Bioinformatics studies indicate that antagonistic factors often bind to overlapping sites <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, whereas synergistic factors are often positioned within a fixed distance <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, often close to a multiple of 10.2 bp, the DNA double-helix pitch value <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
         <p>Non-random arrangements of TFBSs within regulatory segments of DNA sequences are exploited in several TFBS identification tools, and it was observed that cooperativity-based discrimination of TFBSs surpasses the performance of models for individual TFBSs <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>On observing a cluster of TFBSs in some genome segment, one can calculate the probability of observing similar site arrangements in a random sequence. This idea of evaluating the statistical significance of heterotypic clusters of sites has been implemented in many programs, including ClusterDraw <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, ModuleSearcher <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, MCAST <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, eCIS-ANALYST <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, Cister <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, Cluster-Buster <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and TargetExplorer <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. At the moment, such programs use empirical procedures, such as motif counting in biological and simulated sequences, to assess the significance of observed site clustering. It is, however, highly desirable to have a good statistical measure of site clustering, and we believe that the best measure is the <it>p</it>-value of obtaining the observed cluster by chance in a random sequence of a Markov or Bernoulli (the common name for a Markov chain of order 0) type. In the case of heterotypic clusters, one needs to take into account possible overlapping occurrences of different motifs, a problem that was considered difficult until now <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. In the case of homotypic clusters, an approximate statistical scoring function was constructed <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B31">31</abbr></abbrgrp>; this approach has been implemented in algorithms like FLYENHANCER <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, SCORE <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, and CLUSTER <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. However, this approximation performs poorly for highly overlapping TFBSs. One cannot ignore site overlapping if the motifs are fuzzy (highly degenerate), which is often the case for so-called "shadow sites" <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. In the case of heterotypic clusters, competing factors can bind even to very well determined motifs that overlap.</p>
         <sec>
            <st>
               <p>Representation of protein binding motifs in nucleotide sequences</p>
            </st>
            <p>Experimental studies of protein binding to DNA usually identify a DNA segment, or a word in the DNA text, as a probable binding target. A protein can bind to a number of similar DNA words <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>; the whole collection of such words can be called a motif. The simplest motif representation is the enumeration of sequences that can be bound by a transcription factor (TF) <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Sometimes, information about binding sites can be obtained from SELEX <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp> or Protein Binding Microarray (PBM) experiments <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. However, such experiments may not give an exhaustive list of binding site sequences, so one needs to expand the list of putative binding sites using an appropriate criterion, which raises the problem of generalizing from several known examples.</p>
            <p>For instance, several words aligned with mismatches can be generalized to an IUPAC string (like RSTGACTNMNW for AP-1 binding sites <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>) by disregarding correlated substitutions in different motif positions <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Another example of generalization is the set of words that deviate from a consensus word by fewer than a given number of mismatches.</p>
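            <p>Both generalizations can be turned into explicit word lists. The following is a minimal sketch, assuming the standard IUPAC nucleotide code; the function names expand_iupac and mismatch_neighbourhood are illustrative and are not part of the program described in this paper. The second function accepts words with at most max_mismatches mismatches, so "fewer than d mismatches" corresponds to max_mismatches = d - 1.</p>
            <p><![CDATA[
```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(pattern):
    """Return the set of all DNA words matched by an IUPAC string."""
    choices = [IUPAC[symbol] for symbol in pattern]
    return {"".join(word) for word in product(*choices)}


def mismatch_neighbourhood(consensus, max_mismatches):
    """Return all DNA words that differ from the consensus in at most
    max_mismatches positions (brute force over all words of that length)."""
    words = set()
    for candidate in product("ACGT", repeat=len(consensus)):
        distance = sum(1 for a, b in zip(candidate, consensus) if a != b)
        if distance <= max_mismatches:
            words.add("".join(candidate))
    return words
```
]]></p>
            <p>Under these assumptions, the AP-1 string RSTGACTNMNW expands to 256 words, since its degenerate positions allow 2, 2, 4, 2, 4 and 2 bases. The mismatch enumeration grows as 4 to the power of the word length, so it is practical only for short consensus words.</p>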
            <p>The most popular way to represent binding sites is a Position Weight Matrix (PWM), which is also called position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. For a motif of length <it>D </it>over an alphabet &#931; with |&#931;| symbols, a PWM is a |&#931;| &#215; <it>D </it>matrix: each row corresponds to a symbol of the alphabet &#931;, and each column to a position in the motif. For DNA texts, one has &#931; = {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>}. The PWM score of a substring of length <it>D </it>is defined as <inline-formula><m:math name="1748-7188-2-13-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>D</m:mi></m:msubsup><m:mrow><m:msub><m:mi>m</m:mi><m:mrow><m:mi>&#969;</m:mi><m:mo stretchy="false">(</m:mo><m:mi>i</m:mi><m:mo stretchy="false">)</m:mo><m:mo>,</m:mo><m:mi>i</m:mi></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWaqaaiabd2gaTnaaBaaaleaaiiGacqWFjpWDcqGGOaakcqWGPbqAcqGGPaqkcqGGSaalcqWGPbqAaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabdYeambqdcqGHris5aaaa@3BC0@</m:annotation></m:semantics></m:math></inline-formula>, where <it>i </it>is a position in the substring, <it>&#969;</it>(<it>i</it>) is the symbol at position <it>i </it>of the substring, and <it>m</it><sub><it>&#945;, i </it></sub>is the score in row <it>&#945;</it>, column <it>i </it>of the matrix. Given a cutoff value, one thus obtains the list of words of length <it>D </it>that score higher than this cutoff; these words represent possible DNA binding sites for the protein.</p>
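            <p>The conversion of a PWM and a cutoff into a word list can be sketched as follows. This is a brute-force illustration that scores every word of the motif length, not the procedure used by the program described here; the function names and the matrix toy_pwm with its values are hypothetical and serve only as an example.</p>
            <p><![CDATA[
```python
from itertools import product

ALPHABET = "ACGT"


def pwm_score(word, pwm):
    """Sum the matrix entries pwm[symbol][i] over the positions i of the word."""
    return sum(pwm[symbol][i] for i, symbol in enumerate(word))


def words_above_cutoff(pwm, cutoff):
    """List every word of the motif length whose PWM score exceeds the cutoff."""
    length = len(pwm["A"])
    words = ("".join(w) for w in product(ALPHABET, repeat=length))
    return sorted(w for w in words if pwm_score(w, pwm) > cutoff)


# Hypothetical matrix with one row per symbol and three columns;
# the values are illustrative only and do not describe a real factor.
toy_pwm = {
    "A": [2, 0, 0],
    "C": [0, 2, 0],
    "G": [0, 0, 2],
    "T": [0, 1, 0],
}
```
]]></p>
            <p>With this toy matrix and a cutoff of 4, only the words ACG (score 6) and ATG (score 5) are retained. Exhaustive scoring examines |&#931;|<sup><it>D</it></sup> words, so it is suitable only for short motifs.</p>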
            <p>Any of the three motif representations above can be converted to a list of words. The same is true for many other representations of motifs. In this study, we consider only the motifs that can be represented as a set of words.</p>
         </sec>
         <sec>
            <st>
               <p>P-value for clusters of motif occurrences: problem formulation</p>
            </st>
            <p>The objective of this work is to develop a statistical criterion to assess clustering of TFBSs. Intuitively, a TFBS cluster is a DNA segment simultaneously containing "too many" TFBSs for given factor proteins; such a segment can often operate as a CRM regulated by these TFs. Formally, the problem we address is as follows. Let <it>s </it>sets of words <inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula> be given. Typically, each set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i </it></sub>is associated with a TF motif. Given an <it>s</it>-tuple of integers (<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>), we compute the corresponding <it>p</it>-value, that is, the probability of finding at least <it>k</it><sub><it>i </it></sub>occurrences of words from each set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i </it></sub>in a random text of size <it>n</it>. We assume that the texts in which motifs are searched for are generated by a Bernoulli process or a Markov model of order <it>K</it>. If (<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) occurrences of motifs <inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula> are found in a DNA segment, the <it>p</it>-value can be used to judge whether such numbers of occurrences could have arisen by chance.</p>
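            <p>For very short texts, the <it>p</it>-value defined above can be checked by exhaustive enumeration. The sketch below is not the algorithm developed in this paper: it simply sums the probabilities of all texts of length <it>n </it>that satisfy every occurrence threshold, and it assumes a Bernoulli (order 0) text model. The function names and the dictionary of letter probabilities are hypothetical.</p>
            <p><![CDATA[
```python
from itertools import product


def count_occurrences(text, words):
    """Count possibly overlapping occurrences in text of any word of the set."""
    return sum(
        1
        for start in range(len(text))
        for word in words
        if text.startswith(word, start)
    )


def brute_force_p_value(motifs, thresholds, n, base_probs):
    """Probability that a Bernoulli text of length n contains at least
    thresholds[i] occurrences of words from motifs[i], for every i.

    All texts of length n are enumerated, so this is feasible only for
    small n; it serves as a reference value, not as a practical method.
    """
    alphabet = sorted(base_probs)
    total = 0.0
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        if all(
            count_occurrences(text, words) >= k
            for words, k in zip(motifs, thresholds)
        ):
            prob = 1.0
            for letter in letters:
                prob *= base_probs[letter]
            total += prob
    return total
```
]]></p>
            <p>For example, with equiprobable letters, the two single-word motifs {AC} and {CA}, thresholds (1, 1) and <it>n </it>= 3, only the texts ACA and CAC qualify, so the function returns 2/64. In both texts the occurrences overlap, which is exactly the situation that independent per-motif estimates fail to capture.</p>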
         </sec>
      </sec>
      <sec>
         <st>
            <p>Related work</p>
         </st>
         <p>Most previous work addresses counting problems for a single set of words <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. In contrast, in this paper we count occurrences separately for several sets of words <inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>, where each set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>represents one TFBS motif.</p>
         <p>All methods of solving the problem of <it>p</it>-value calculations for multiple occurrences of words from a set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> study some basic languages. Let <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>) be the set of texts of length <it>n </it>containing at least <it>k </it>occurrences of <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. The desired <it>p</it>-value would therefore be the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)). Let <inline-formula><m:math name="1748-7188-2-13-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mi>&#8459;</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGae83cHGeabaGaem4AaSgaaaaa@3A01@</m:annotation></m:semantics></m:math></inline-formula> be the set of texts of all lengths that contain exactly <it>k </it>words of <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, the last one occurring as a suffix <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. For any H<sub><it>j </it></sub>in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, let <inline-formula><m:math name="1748-7188-2-13-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mrow><m:msub><m:mtext>H</m:mtext><m:mi>j</m:mi></m:msub></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGaeeisaG0aaSbaaWqaaiabdQgaQbqabaaaleaacqWGRbWAaaaaaa@3BB4@</m:annotation></m:semantics></m:math></inline-formula> be the subset of <inline-formula><m:math name="1748-7188-2-13-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mi>&#8459;</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGae83cHGeabaGaem4AaSgaaaaa@3A01@</m:annotation></m:semantics></m:math></inline-formula> where H<sub><it>j </it></sub>is a suffix. One observes that a text contains at least <it>k </it>occurrences if and only if it admits a prefix in <inline-formula><m:math name="1748-7188-2-13-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mi>&#8459;</m:mi><m:mi>k</m:mi></m:msubsup><m:mo>=</m:mo><m:msub><m:mo>&#8746;</m:mo><m:mrow><m:msub><m:mtext>H</m:mtext><m:mi>j</m:mi></m:msub><m:mo>&#8712;</m:mo><m:mi>&#8459;</m:mi></m:mrow></m:msub><m:msubsup><m:mi>&#8475;</m:mi><m:mrow><m:msub><m:mtext>H</m:mtext><m:mi>j</m:mi></m:msub></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGae83cHGeabaGaem4AaSgaaOGaeyypa0JaeSOkIu1aaSbaaSqaaiabbIeainaaBaaameaacqWGQbGAaeqaaSGaeyicI4Sae83cHGeabeaakiab=TrisnaaDaaaleaacqqGibasdaWgaaadbaGaemOAaOgabeaaaSqaaiabdUgaRbaaaaa@46ED@</m:annotation></m:semantics></m:math></inline-formula>. One defines <inline-formula><m:math name="1748-7188-2-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>r</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math></inline-formula> (<it>p</it>) as the probability that a text of size <it>p </it>is in the set <inline-formula><m:math name="1748-7188-2-13-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mrow><m:msub><m:mtext>H</m:mtext><m:mi>j</m:mi></m:msub></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGaeeisaG0aaSbaaWqaaiabdQgaQbqabaaaleaacqWGRbWAaaaaaa@3BB4@</m:annotation></m:semantics></m:math></inline-formula>. If no word in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> is a subword of another word in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)) of finding at least <it>k </it>occurrences of words from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> in a random text of length <it>n </it>satisfies</p>
         <p>
            <display-formula>
               <m:math name="1748-7188-2-13-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>L</m:mi>
                           <m:mi>n</m:mi>
                        </m:msub>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>&#8459;</m:mi>
                        <m:mo>;</m:mo>
                        <m:mi>k</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munder>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mo>&#8804;</m:mo>
                                 <m:mi>n</m:mi>
                              </m:mrow>
                           </m:munder>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munder>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mtext>H</m:mtext>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                       <m:mo>&#8712;</m:mo>
                                       <m:mi>&#8459;</m:mi>
                                    </m:mrow>
                                 </m:munder>
                                 <m:mrow>
                                    <m:msubsup>
                                       <m:mi>r</m:mi>
                                       <m:mi>j</m:mi>
                                       <m:mi>k</m:mi>
                                    </m:msubsup>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>p</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFqbaucqGGOaakcqWGmbatdaWgaaWcbaGaemOBa4gabeaakiabcIcaOmrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaGabaiab+TqiijabcUda7iabdUgaRjabcMcaPiabcMcaPiabg2da9maaqafabaWaaabuaeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaOGaeiikaGIaemiCaaNaeiykaKcaleaacqqGibasdaWgaaadbaGaemOAaOgabeaaliabgIGiolab+Tqiibqab0GaeyyeIuoaaSqaaiabdchaWjabgsMiJkabd6gaUbqab0GaeyyeIuoaaaa@577F@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>Therefore, one tries to compute the sequence of (<inline-formula><m:math name="1748-7188-2-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>r</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math></inline-formula> (<it>p</it>)) values.</p>
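         <p>The decomposition above can be checked on a toy example. The following minimal sketch enumerates all texts over a two-letter alphabet under a uniform Bernoulli model. It assumes that <it>r</it>(<it>p</it>) for word H<sub><it>j</it></sub> is the probability that a text of size <it>p </it>ends with H<sub><it>j</it></sub> and that this occurrence is exactly the <it>k</it>-th occurrence of a word from the motif. The function names and the example motifs are hypothetical and serve only as an illustration.</p>
         <p><![CDATA[
```python
from fractions import Fraction
from itertools import product


def occurrence_ends(text, motif):
    """Sorted (end_position, word) pairs for every occurrence of a motif word."""
    hits = []
    for word in motif:
        start = text.find(word)
        while start != -1:
            hits.append((start + len(word), word))
            start = text.find(word, start + 1)  # allow overlapping occurrences
    return sorted(hits)


def p_at_least_k(motif, k, n, alphabet="ab"):
    """P(L_n(H; k)) by enumeration: fraction of texts with at least k occurrences."""
    good = sum(
        1
        for letters in product(alphabet, repeat=n)
        if len(occurrence_ends("".join(letters), motif)) >= k
    )
    return Fraction(good, len(alphabet) ** n)


def r(motif, j, k, p, alphabet="ab"):
    """Probability that a text of size p has exactly k occurrences, the last one
    being motif[j] and ending at position p (assumed meaning of r_j^k(p))."""
    good = 0
    for letters in product(alphabet, repeat=p):
        hits = occurrence_ends("".join(letters), motif)
        if len(hits) == k and hits[-1] == (p, motif[j]):
            good += 1
    return Fraction(good, len(alphabet) ** p)


def check_decomposition(motif, k, n, alphabet="ab"):
    """Compare both sides of P(L_n(H; k)) = sum over p and j of r_j^k(p)."""
    rhs = sum(
        r(motif, j, k, p, alphabet)
        for p in range(1, n + 1)
        for j in range(len(motif))
    )
    return p_at_least_k(motif, k, n, alphabet) == rhs
```
]]></p>
         <p>The check relies on the hypothesis stated above: since no word of the motif is a subword of another, at most one occurrence ends at any given position, so the <it>k</it>-th occurrence is well defined.</p>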
         <sec>
            <st>
               <p>Linear induction</p>
            </st>
            <p>In the first class of methods <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>, one computes, implicitly or explicitly, probabilities <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)) up to a given text length <it>n</it>. Such methods are intrinsically linear in <it>n</it>. In <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp> one relies on a recurrence relation on <inline-formula><m:math name="1748-7188-2-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>r</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math></inline-formula> (<it>n</it>) that extends the one originally given in <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Typically, one step will cost <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<it>m</it>), where <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> is a set of words of length <it>m </it>and |<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| is its cardinality. Time complexity is <it>O </it>(<it>n</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<it>m</it>) and, relying on a combinatorial property, <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> achieves optimal space complexity <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| log |<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<it>m</it>). However, the authors of <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> do not consider occurrences of several motifs and restrict themselves to the Bernoulli model. The authors of <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> consider the Markov model, but still use a single motif for TFBS.</p>
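            <p>To make the linear dependence on <it>n </it>concrete, here is a minimal sketch of a classical recurrence for the first-occurrence probabilities of a single word under a Bernoulli model. It is not the recurrence of the cited works, which handle sets of words. The function names and the dictionary of letter probabilities are assumptions made for illustration.</p>
            <p><![CDATA[
```python
from math import prod


def word_probability(word, letter_prob):
    """Probability of a word under a Bernoulli model (independent letters)."""
    return prod(letter_prob[c] for c in word)


def first_occurrence_probs(word, letter_prob, n):
    """f[t] = probability that the first occurrence of `word` ends at position t.

    Appending `word` (length m) to a text of length t - m that avoids it, and
    splitting by the position where the first occurrence ends, gives
        u(t - m) * P(word) = f[t] + sum over proper borders j of
                             f[t - m + j] * P(word[j:]),
    where u(s) is the probability that a text of length s avoids `word`.
    """
    m = len(word)
    # Proper border lengths j: the prefix of length j equals the suffix of length j.
    borders = [j for j in range(1, m) if word[:j] == word[m - j:]]
    tail_prob = {j: word_probability(word[j:], letter_prob) for j in borders}
    p_word = word_probability(word, letter_prob)

    f = [0.0] * (n + 1)
    avoid = 1.0  # u(t - m), maintained incrementally
    for t in range(m, n + 1):
        f[t] = avoid * p_word - sum(f[t - m + j] * tail_prob[j] for j in borders)
        avoid -= f[t - m + 1]  # u(t - m + 1) = u(t - m) - f[t - m + 1]
    return f


def p_at_least_one(word, letter_prob, n):
    """Probability of at least one occurrence in a text of length n."""
    return sum(first_occurrence_probs(word, letter_prob, n))
```
]]></p>
            <p>Each step of the loop costs a number of operations proportional to the number of borders of the word, so the whole computation is linear in <it>n</it>, in line with the discussion above.</p>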
         </sec>
         <sec>
            <st>
               <p>Algebraic Formulae</p>
            </st>
            <p>In a second class of methods <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>, a preprocessing step computes the <it>generating functions</it></p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-13-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>r</m:mi>
                              <m:mi>j</m:mi>
                              <m:mi>k</m:mi>
                           </m:msubsup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>z</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mi>n</m:mi>
                              </m:munder>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>r</m:mi>
                                    <m:mi>j</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msubsup>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:msup>
                                    <m:mi>z</m:mi>
                                    <m:mi>n</m:mi>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaOGaeiikaGIaemOEaONaeiykaKIaeyypa0ZaaabuaeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaOGaeiikaGIaemOBa4MaeiykaKIaemOEaO3aaWbaaSqabeaacqWGUbGBaaaabaGaemOBa4gabeqdcqGHris5aOGaeiOla4caaa@4432@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>In a second step, probabilities <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)) are either extracted from the generating function or approximated.</p>
            <p>In <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B53">53</abbr></abbrgrp>, <inline-formula><m:math name="1748-7188-2-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>r</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math></inline-formula> (<it>z</it>) are the solutions of a system of equations. To derive these equations, the authors build an automaton that recognizes these languages <inline-formula><m:math name="1748-7188-2-13-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#8475;</m:mi><m:mrow><m:msub><m:mtext>H</m:mtext><m:mi>j</m:mi></m:msub></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFBeIudaqhaaWcbaGaeeisaG0aaSbaaWqaaiabdQgaQbqabaaaleaacqWGRbWAaaaaaa@3BB4@</m:annotation></m:semantics></m:math></inline-formula> (one can prove that they are regular).</p>
            <p>A language approach <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> or an induction <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> leads to a formal expression that depends on the overlaps between words. The main drawback is that these methods need to compute the determinant of a matrix of polynomials of very large dimension, e.g. <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|). This <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<sup>2</sup>) <it>symbolic computation </it>may be more expensive than the extraction step or the linear computation above, which involve only <it>arithmetic operations </it>on real numbers.</p>
            <p>When the preprocessing step is feasible, the extraction step reduces to solving a linear recurrence of degree <it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|; therefore, its complexity is <it>O </it>(<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<it>n</it>) and a classical optimization yields <it>O </it>(<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| log <it>n</it>). There exist good implementations that are numerically stable. One may cite the REGEXPCOUNT <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> and EXCEP <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> programs, which rely on the Fast Fourier Transform.</p>
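            <p>The extraction step can be pictured as unrolling a linear recurrence whose order is the degree of the denominator of a rational generating function. The sketch below does this for a generic quotient of two polynomials, in time proportional to the degree times <it>n</it>. It does not implement the logarithmic optimization or the FFT-based methods mentioned above, and the function name is an assumption made for illustration.</p>
            <p><![CDATA[
```python
from fractions import Fraction


def series_coefficients(numerator, denominator, n):
    """First n + 1 Taylor coefficients of numerator(z) / denominator(z).

    Polynomials are coefficient lists in increasing degree, and denominator[0]
    must be non-zero. From a(z) * denominator(z) = numerator(z) one gets
        a[t] = (numerator[t] - sum_{i=1..d} denominator[i] * a[t - i]) / denominator[0].
    """
    d0 = Fraction(denominator[0])
    if d0 == 0:
        raise ValueError("denominator must have a non-zero constant term")
    coeffs = []
    for t in range(n + 1):
        value = Fraction(numerator[t]) if t < len(numerator) else Fraction(0)
        for i in range(1, min(t, len(denominator) - 1) + 1):
            value -= denominator[i] * coeffs[t - i]
        coeffs.append(value / d0)
    return coeffs


# Illustrative input, worked out by hand and not taken from the cited programs:
# for the single word "aa" over a uniform two-letter alphabet, the classical
# autocorrelation formula gives the first-occurrence generating function
# (z^2 / 4) / (1 - z / 2 - z^2 / 4).
first_occurrence_aa = series_coefficients(
    [0, 0, Fraction(1, 4)], [1, Fraction(-1, 2), Fraction(-1, 4)], 5
)
```
]]></p>
            <p>Exact rational arithmetic is used here only to keep the toy example free of rounding issues; the numerically stable implementations cited above work with floating-point numbers.</p>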
            <p>Finally, approximations are available, the computation of which is constant with respect to <it>n</it>, but not to <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. One approach is the compound Poisson approximation <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>, but this approximation is not precise enough <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. Asymptotic results can also be derived from the algebraic formulae above <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B58">58</abbr></abbrgrp>, not needing an explicit expression for <inline-formula><m:math name="1748-7188-2-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>r</m:mi><m:mi>j</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCdaqhaaWcbaGaemOAaOgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math></inline-formula> (<it>z</it>), and therefore avoiding the expensive determinant computation. Time complexity, typically, is the one for computing all possible overlaps, that is approximately <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|<sup>2</sup>). This yields extremely precise results when the expected number of occurrences, <it>nP </it>(H), is very small <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> or close to 1 <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> (the most frequently studied case). The case <it>nP </it>(H) ~2 is treated in <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Nevertheless, the extension to larger values of <it>k</it>, or to multiple occurrences and multisets, is still open.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>Here we describe our approach in detail.</p>
         <p>A motif assigned to a TF is a finite set of words <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = (H<sub>1</sub>, ..., H<sub>r</sub>) where each word represents one putative TF binding site in DNA. Note that the words in a motif can, in general, have different lengths. However, no word from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> can contain another word from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> as a substring. We consider, as an occurrence of motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> in text <it>T</it>, any occurrence of any word <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>&#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> in <it>T</it>. Below all texts and words in motifs are sequences on a given alphabet &#931;.</p>
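         <p>The two conventions of this paragraph, namely that no word of a motif contains another one and that every, possibly overlapping, occurrence of any word counts, can be illustrated by a short sketch. The function names and the DNA words used in the example are hypothetical.</p>
         <p><![CDATA[
```python
def is_valid_motif(motif):
    """True if no word of the motif contains another word of the motif as a substring."""
    words = list(motif)
    return not any(
        i != j and words[i] in words[j]
        for i in range(len(words))
        for j in range(len(words))
    )


def motif_occurrences(text, motif):
    """Sorted (start, word) pairs for every, possibly overlapping, occurrence of a motif word."""
    hits = []
    for word in motif:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)  # step by one to keep overlaps
    return sorted(hits)


# Example: two overlapping words of equal length, with 0-based start positions.
example_hits = motif_occurrences("TATATA", ["TATA", "ATAT"])
```
]]></p>
         <p>In this example the text TATATA contains three occurrences of the motif: TATA at positions 0 and 2, and ATAT at position 1.</p>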
         <p>Let (<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>) be <it>s </it>different motifs. Our objective is to calculate the probability (<it>p</it>-value) that motifs (<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>) have respectively at least (<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) possibly overlapping occurrences in a random text <it>T</it><sub><it>n</it></sub>.</p>
         <p>To be more precise, there is a probability distribution defined on the set &#931;<sup><it>n </it></sup>of all texts of length <it>n </it>in the alphabet &#931;; the most widely used models are random Bernoulli trials and a Markov model of order <it>K</it>. Denote as <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) the set of all texts of length <it>n </it>containing at least <it>k</it><sub><it>i </it></sub>possibly overlapping occurrences of each motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i</it></sub>; <it>i </it>= 1, ..., <it>s</it>. Then the desired <it>p</it>-value is the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)) of the set <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) with respect to the given probability distribution on &#931;<sup><it>n</it></sup>.</p>
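         <p>For very short texts, this <it>p</it>-value can be computed directly from its definition by enumerating all texts of length <it>n</it>, which gives a reference value for testing faster algorithms. The following minimal sketch assumes a Bernoulli model; the function names and the representation of motifs as lists of strings are assumptions made for illustration, and this is not the algorithm proposed in this paper.</p>
         <p><![CDATA[
```python
from itertools import product
from math import prod


def count_occurrences(text, motif):
    """Number of possibly overlapping occurrences in `text` of words from `motif`."""
    return sum(
        1
        for word in motif
        for start in range(len(text) - len(word) + 1)
        if text.startswith(word, start)
    )


def brute_force_p_value(motifs, thresholds, n, letter_prob):
    """Probability that a Bernoulli text of length n contains at least
    thresholds[i] occurrences of motifs[i] for every i.

    The cost grows as |alphabet| ** n, so this is only usable for tiny n.
    """
    alphabet = sorted(letter_prob)
    p_value = 0.0
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        if all(count_occurrences(text, h) >= k for h, k in zip(motifs, thresholds)):
            p_value += prod(letter_prob[c] for c in letters)
    return p_value
```
]]></p>
         <p>For instance, with the two single-word motifs (aa) and (b), thresholds (1, 1), <it>n </it>= 3 and equiprobable letters a and b, only the texts aab and baa qualify, so the <it>p</it>-value is 2/8.</p>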
         <p>Our approach to the calculation of this <it>p</it>-value is similar to that published in <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, which was used there to calculate seed sensitivity in local alignment search. The approach exploits the fact that the algorithm of Aho and Corasick <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> can be modified to efficiently determine whether a given text belongs to the set <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>). Ideas published in <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> and <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> can be adapted to compute the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)) that the random text <it>T</it><sub><it>n </it></sub>&#8712; &#931;<sup><it>n </it></sup>belongs to the set <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>).</p>
         <p>We start from the simplest case of one motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> for which we calculate the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; 1)) that text <it>T</it><sub><it>n </it></sub>contains at least one occurrence of the motif with respect to a Bernoulli probability distribution. More complicated cases (arbitrary number of occurrences; arbitrary number of motifs; Markov distribution) will be discussed in the following sections.</p>
         <sec>
            <st>
               <p>Construction of Aho-Corasick traversal</p>
            </st>
            <p>Aho and Corasick <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> proposed an algorithm that determines whether a given text <it>T </it>contains an occurrence of a word from a given set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. The basic data structure is a prefix tree which is a variant of the classical trie <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula><abbrgrp><abbr bid="B42">42</abbr></abbrgrp> that may be built on the set of words <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. Let <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> denote the set of prefixes of these words. In the following, we identify a word <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> with node <it>Node </it>(<it>q</it>) at the end of the branch labeled by <it>q</it>. In particular, the root is identified with the empty string <it>&#949;</it>. The length of a prefix is the depth of <it>Node </it>(<it>q</it>).</p>
            <p>The classic Aho-Corasick algorithm is a tree traversal determined by a <it>transition function </it><inline-formula><m:math name="1748-7188-2-13-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#948;</m:mi><m:mo>:</m:mo><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>&#215;</m:mo><m:mi>&#931;</m:mi><m:mo>&#8594;</m:mo><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF0oazcqGG6aGocqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae43cHGeabeaakiabgEna0kabfo6atjabgkziUkabdgfarnaaBaaaleaacqGFlecsaeqaaaaa@4341@</m:annotation></m:semantics></m:math></inline-formula> defined as follows. For any pair (<it>p, a</it>) in <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> &#215; &#931;, <it>&#948; </it>(<it>p, a</it>) is the longest suffix of the concatenation <it>pa </it>that belongs to <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>. Note that <it>&#948; </it>(<it>p, a</it>) = <it>pa </it>iff <it>pa </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>.</p>
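            <p>The definition of <it>&#948; </it>can be checked directly on a small motif. The Python sketch below is illustrative only and is not taken from the original work: the names <it>prefixes</it> and <it>delta</it>, the numbering of prefixes by length and then alphabetically, and the brute-force suffix search are assumptions made here for clarity rather than efficiency. It uses the word set of Table <tblr tid="T1">1</tblr>.</p>
            <p><![CDATA[
```python
def prefixes(words):
    """Return the prefix set Q of `words`, ordered by length, then alphabetically."""
    q = {w[:k] for w in words for k in range(len(w) + 1)}
    return sorted(q, key=lambda p: (len(p), p))


def delta(p, a, q_set):
    """Longest suffix of the concatenation p + a that belongs to q_set."""
    pa = p + a
    for start in range(len(pa) + 1):
        if pa[start:] in q_set:
            return pa[start:]
    raise ValueError("unreachable: the empty word is always in the prefix set")


words = ["AAA", "AAC", "ACA", "ACC", "CCT"]
Q = prefixes(words)                       # '', 'A', 'C', 'AA', 'AC', 'CC', ...
Q_set = set(Q)
index = {p: i for i, p in enumerate(Q)}   # hypothetical node numbering

# One row per prefix, one column per letter A, C, G, T.
table = [[index[delta(p, a, Q_set)] for a in "ACGT"] for p in Q]

for i, row in enumerate(table):
    print(i, repr(Q[i]), row)
```
]]></p>
            <p>Under the assumed numbering, each printed row can be compared with the corresponding row of Table <tblr tid="T1">1</tblr>.</p>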
            <p>Given a text <it>T </it>read from left to right, let <it>T </it>[<it>i</it>] denote the letter of <it>T </it>at position <it>i</it>. Let <it>q</it><sub><it>i </it></sub>be the longest suffix of the text <it>T</it>[1] &#8943; <it>T </it>[<it>i</it>] that belongs to <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>. The sequence of nodes visited during the traversal is defined by the words <it>q</it><sub><it>i </it></sub>that satisfy the inductive relationship</p>
            <p>
               <display-formula>&#8704;<it>i </it>&#8805; 0, <it>q</it><sub><it>i</it>+1 </sub>= <it>&#948; </it>(<it>q</it><sub><it>i</it></sub>, <it>T </it>[<it>i </it>+ 1]),</display-formula>
            </p>
            <p>with the initial condition <it>q</it><sub>0 </sub>= <it>&#949;</it>.</p>
            <p><b>Example: </b>Let <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> be the set {AAA, AAC, ACA, ACC, CCT}. The corresponding tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> is depicted in Figure <figr fid="F1">1</figr>. Values of <it>&#948; </it>function are given in Table <tblr tid="T1">1</tblr>. Aho-Corasick traversal of tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> according to text <it>T </it>= 'ATGCCAACCTT' produces the following sequence of nodes {<it>q</it><sub><it>i</it></sub>}<sub><it>i </it>&#8805; 1 </sub>in <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> (the numbers of corresponding nodes in Figure <figr fid="F1">1</figr> are shown in square brackets): A[1], <it>&#949;</it>[0], <it>&#949;</it>[0], C[2], CC[5], A[1], AA[3], AAC[7], ACC[9], CCT[10], <it>&#949;</it>[0].</p>
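            <p>The traversal in this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: <it>build_delta</it> and <it>traverse</it> are hypothetical names, the transition function is tabulated by brute force from its definition, and nodes are numbered by prefix length and then alphabetically, an ordering assumed here to match Figure <figr fid="F1">1</figr>.</p>
            <p><![CDATA[
```python
def build_delta(words, alphabet="ACGT"):
    """Tabulate delta(p, a) for every prefix p and letter a, straight from the definition."""
    q = sorted({w[:k] for w in words for k in range(len(w) + 1)},
               key=lambda p: (len(p), p))
    q_set = set(q)

    def step(p, a):
        pa = p + a
        # The empty suffix always qualifies, so next() cannot fail.
        return next(pa[s:] for s in range(len(pa) + 1) if pa[s:] in q_set)

    return q, {(p, a): step(p, a) for p in q for a in alphabet}


def traverse(text, delta):
    """Return the states q_1, ..., q_n visited while reading `text`."""
    state = ""                       # q_0 is the empty word (the root)
    visited = []
    for letter in text:
        state = delta[(state, letter)]   # q_{i+1} = delta(q_i, T[i+1])
        visited.append(state)
    return visited


words = ["AAA", "AAC", "ACA", "ACC", "CCT"]
Q, delta = build_delta(words)
number = {p: i for i, p in enumerate(Q)}     # assumed node numbering

path = traverse("ATGCCAACCTT", delta)
numbered = [(p, number[p]) for p in path]
print(numbered)
```
]]></p>
            <p>A word of the motif has been found whenever the current state is one of the words of the set; in this run that happens at AAC, ACC and CCT.</p>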
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Values of <it>&#948; </it>function for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT}.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p><it>q</it>\<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>A</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>G</p>
                     </c>
                     <c ca="center">
                        <p>T</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Values of <it>&#948; </it>(<it>q</it>, <it>&#945;</it>) function for <it>q </it>&#8712; <it>Q </it>and <it>&#945; </it>= <it>A</it>, <it>C</it>, <it>G</it>, <it>T </it>constructed for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT}.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT} with dashed links for the <it>&#948; </it>function</p>
               </caption>
               <text>
                  <p><b>Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT} with dashed links for the <it>&#948; </it>function</b>. Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT}. Dashed colored links represent <it>&#948; </it>function for internal node (5) &#8211; in red, and for marked node (7) corresponding to the word AAC &#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> &#8211; in purple.</p>
               </text>
               <graphic file="1748-7188-2-13-1"/>
            </fig>
            <p><inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> and transition function <it>&#948; </it>can be efficiently constructed with an algorithm proposed by Aho and Corasick <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. Both the time and the space required by the algorithm are proportional to the sum of the lengths of all words in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>.</p>
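            <p>A common way to obtain this bound is to build the trie first and then fill in the transitions breadth-first, reusing the transitions of each state's longest proper suffix state. The Python sketch below follows that textbook scheme and is not claimed to be the authors' code; the names <it>build_automaton</it>, <it>label</it>, <it>goto</it>, <it>fail</it> and <it>terminal</it> are assumptions introduced here, states are numbered in order of creation rather than as in Figure <figr fid="F1">1</figr>, and all words are assumed to use only letters of the given alphabet.</p>
            <p><![CDATA[
```python
from collections import deque


def build_automaton(words, alphabet="ACGT"):
    """Return (label, delta, terminal) for the Aho-Corasick automaton of `words`."""
    label = [""]         # label[s]: prefix spelled on the path from the root to s
    goto = [{}]          # goto[s][a]: trie child of state s along letter a
    terminal = [False]   # terminal[s]: True when a word ends at state s
    for w in words:
        s = 0
        for a in w:
            if a not in goto[s]:
                goto[s][a] = len(goto)
                goto.append({})
                label.append(label[s] + a)
                terminal.append(False)
            s = goto[s][a]
        terminal[s] = True

    fail = [0] * len(goto)        # longest proper suffix of label[s] that is a prefix
    delta = [{} for _ in goto]    # complete transition table
    queue = deque()
    for a in alphabet:            # from the root, a missing letter leads back to the root
        child = goto[0].get(a, 0)
        delta[0][a] = child
        if child:
            queue.append(child)
    while queue:                  # breadth-first: shallower states are completed first
        s = queue.popleft()
        terminal[s] = terminal[s] or terminal[fail[s]]
        for a in alphabet:
            child = goto[s].get(a)
            if child is None:
                delta[s][a] = delta[fail[s]][a]
            else:
                delta[s][a] = child
                fail[child] = delta[fail[s]][a]
                queue.append(child)
    return label, delta, terminal


label, delta, terminal = build_automaton(["AAA", "AAC", "ACA", "ACC", "CCT"])
```
]]></p>
            <p>Each state is handled once and each of its letters once, so the work in this sketch grows with the number of states times the alphabet size; the number of states never exceeds the total length of the words plus one.</p>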
            <p>The combination of tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> and transition function <it>&#948; </it>makes it possible to solve numerous pattern-matching problems: finding the first occurrence of a word from a given set, finding all occurrences, word counting, <it>etc</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Bernoulli text model. Probability to find at least one occurrence of a single motif</p>
            </st>
            <p>In this section we consider the simplest case. One computes the <it>p</it>-value for a single motif in a text <it>T</it><sub><it>n </it></sub>of length <it>n</it>, assuming that <it>T</it><sub><it>n </it></sub>is generated by independent Bernoulli random trials over alphabet &#931;. The algorithm computes probabilities <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; 1)) by induction on <it>n</it>.</p>
            <p>To describe the algorithm we divide the set &#931;<sup><it>i </it></sup>of all texts <it>T</it><sub><it>i </it></sub>of length <it>i </it>into classes that do and do not contain occurrences of <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>.</p>
            <p><b>Definition 1 </b><it>A text T<sub><it>i </it></sub>belongs to class C</it><sub><it>i </it></sub>(0; <it>q</it>) <it>iff</it></p>
            <p><it>1. Length of T</it><sub><it>i </it></sub><it>is i</it>,</p>
            <p><it>2. T</it><sub><it>i </it></sub><it>does not contain words from </it><inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>,</p>
            <p><it>3. A traversal AC </it>(<inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula>, <it>T</it><sub><it>i</it></sub>) <it>ends at node q</it>.</p>
            <p><it>A text T</it><sub><it>i </it></sub><it>belongs to class G</it><sub><it>i </it></sub>(1) <it>iff</it></p>
            <p><it>(i) Length of T</it><sub><it>i </it></sub><it>is i</it>,</p>
            <p><it>(ii) T</it><sub><it>i </it></sub><it>does contain at least one occurrence of a word from </it><inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>.</p>
            <p>For a given number <it>i </it>larger than <it>m</it>, the classes <it>C</it><sub><it>i </it></sub>(0; <it>q</it>), where <it>q </it>is in <inline-formula><m:math name="1748-7188-2-13-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>\</m:mo><m:mi>&#8459;</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaakiabcYfaCjab=Tqiibaa@3AFC@</m:annotation></m:semantics></m:math></inline-formula>, together with the class <it>G</it><sub><it>i </it></sub>(1), form a partition of the set &#931;<sup><it>i </it></sup>of all texts of length <it>i</it>, i.e., any text of length <it>i </it>belongs either to a class <it>C</it><sub><it>i </it></sub>(0; <it>q</it>) for some <it>q </it>in <inline-formula><m:math name="1748-7188-2-13-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>\</m:mo><m:mi>&#8459;</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaakiabcYfaCjab=Tqiibaa@3AFC@</m:annotation></m:semantics></m:math></inline-formula>, or to the class <it>G</it><sub><it>i </it></sub>(1). Indeed, condition 3 means that <it>q </it>is the longest suffix of <it>T</it><sub><it>i </it></sub>that belongs to <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>. It follows from condition 2 that the classes <it>C</it><sub><it>i </it></sub>(0; <it>q</it>) are empty if <it>q </it>is in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. A text <it>T</it><sub><it>i </it></sub>of length <it>i </it>is in <it>G</it><sub><it>i </it></sub>(1) if and only if a node of <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> was visited during the traversal.</p>
            <p>Let <b>P </b>(<it>C</it><sub><it>n </it></sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>n </it></sub>(1)) denote probabilities that a text <it>T</it><sub><it>n </it></sub>belongs to class <it>C</it><sub><it>n </it></sub>(0; <it>q</it>) and <it>G</it><sub><it>n </it></sub>(1), respectively. Then, <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; 1) = <it>G</it><sub><it>n </it></sub>(1); therefore the desired <it>p</it>-value <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; 1)) is equal to <b>P </b>(<it>G</it><sub><it>n </it></sub>(1)).</p>
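            <p>As a reference point for the definitions above, <b>P </b>(<it>G</it><sub><it>n </it></sub>(1)) can be obtained for very small <it>n </it>by enumerating every text of length <it>n </it>and summing the Bernoulli probabilities of those that contain a word of the motif. The sketch below does exactly that; the function name <it>brute_force_p_value</it> and the dictionary of character probabilities are illustrative assumptions. Its cost grows as |&#931;|<sup><it>n</it></sup>, so it is only useful for checking a faster method on toy inputs.</p>

```python
import math
from itertools import product


def brute_force_p_value(words, probs, n):
    """Hypothetical helper: P(G_n(1)) by exhaustive enumeration.

    words: the motif, given as a collection of strings;
    probs: dict mapping each character of the alphabet to its probability;
    n:     text length.
    """
    total = 0.0
    for chars in product(sorted(probs), repeat=n):
        text = "".join(chars)
        if any(word in text for word in words):
            # Bernoulli model: the characters of the text are independent.
            total += math.prod(probs[a] for a in chars)
    return total


print(brute_force_p_value(["AA"], {"A": 0.5, "B": 0.5}, 3))
```

            <p>In this toy case the texts containing AA are AAA, AAB and BAA, each of probability 1/8.</p>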
            <p>The algorithm calculates the probabilities <b>P </b>(<it>C</it><sub><it>i </it></sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>i </it></sub>(1)) by induction on the length <it>i</it>. For <it>i </it>= 0, the only text is the empty word <it>&#949;</it>, so that <b>P </b>(<it>C</it><sub>0 </sub>(0; <it>&#949;</it>)) = 1; <b>P </b>(<it>C</it><sub>0 </sub>(0; <it>q</it>)) = 0, for any <it>q </it>&#8800; <it>&#949;</it>; <b>P </b>(<it>G</it><sub>0 </sub>(1)) = 0.</p>
            <p>The values of <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>i</it>+1 </sub>(1)) are calculated using only the values of <b>P </b>(<it>C</it><sub><it>i </it></sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>i </it></sub>(1)). Therefore, values for shorter lengths need not be stored, and the required space is proportional to the size of <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> (see section <it>Extensions and complexity </it>below).</p>
            <p>The calculation of the values <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>i</it>+1 </sub>(1)) is based on the following observations. Let <it>U </it>be a set of texts of the same length over the alphabet &#931;, <b>P </b>(<it>U</it>) the probability of <it>U </it>in the Bernoulli model and <it>a </it>a character in &#931;. Let <it>U</it>&#183;<it>a </it>be the set of all concatenations of a text from <it>U </it>with <it>a</it>, i.e., <it>U</it>&#183;<it>a </it>= {<it>xa</it>|<it>x </it>&#8712; <it>U</it>}. Then, in the Bernoulli model,</p>
            <p>
               <display-formula id="M1"><b>P </b>(<it>U</it>&#183;<it>a</it>) = <b>P </b>(<it>U</it>) <b>P </b>(<it>a</it>).</display-formula>
            </p>
            <p>Then the following relations hold for any <it>i </it>&#8712; {1, ..., <it>n </it>- 1} and any character <it>a </it>&#8712; &#931;:</p>
            <p>(i) if the text <it>T</it><sub><it>i </it></sub>contains a word from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> then all its concatenations with characters from &#931; would contain a word from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; i.e.,</p>
            <p>
               <display-formula id="M2"><it>G</it><sub><it>i </it></sub>(1)&#183;<it>a </it>&#8834; <it>G</it><sub><it>i</it>+1 </sub>(1).</display-formula>
            </p>
            <p>(ii) if the text <it>T</it><sub><it>i </it></sub>does not contain a word from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> and belongs to <it>C</it><sub><it>i </it></sub>(0; <it>q</it>), i.e., ends with <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>\</m:mo><m:mi>&#8459;</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaakiabcYfaCjab=Tqiibaa@3AFC@</m:annotation></m:semantics></m:math></inline-formula>, then its concatenation <it>T</it><sub><it>i</it></sub>&#183;<it>a </it>belongs to the class determined by the result of the Aho-Corasick transition function <it>&#948; </it>(<it>q, a</it>); i.e.,</p>
            <p>
               <display-formula id="M3">if <it>&#948; </it>(<it>q</it>, <it>a</it>) &#8712; <m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math>,&#160;&#160;&#160;then <it>C</it><sub><it>i </it></sub>(0; <it>q</it>)&#183;<it>a </it>&#8834; <it>C</it><sub><it>i</it>+1 </sub>(0; <it>&#948; </it>(<it>q, a</it>))</display-formula>
            </p>
            <p>
               <display-formula id="M4">otherwise&#160;&#160;&#160;<it>C</it><sub><it>i </it></sub>(0; <it>q</it>)&#183;<it>a </it>&#8834; <it>G</it><sub><it>i</it>+1 </sub>(1).</display-formula>
            </p>
            <p>Remembering that classes <it>C</it><sub><it>i </it></sub>(0; <it>q</it>) for different <it>q </it>and <it>G</it><sub><it>i </it></sub>(1) form a partition of &#931;<sup><it>i</it></sup>, we obtain the following relation for the texts containing words from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>:</p>
            <p>
               <display-formula id="M5">
                  <m:math name="1748-7188-2-13-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>G</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mo>{</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8746;</m:mo>
                                 <m:mrow>
                                    <m:mi>a</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>&#931;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>G</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>a</m:mi>
                                 <m:mo>}</m:mo>
                                 <m:mo>&#8746;</m:mo>
                                 <m:mo>{</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8746;</m:mo>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>a</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>;</m:mo>
                                          <m:mi>&#948;</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>a</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#8712;</m:mo>
                                          <m:mi>&#8459;</m:mi>
                                       </m:mrow>
                                    </m:munder>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>0</m:mn>
                                       <m:mo>;</m:mo>
                                       <m:mi>q</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#8901;</m:mo>
                                       <m:mi>a</m:mi>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                                 <m:mo>.</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGhbWrdaWgaaWcbaGaemyAaKMaey4kaSIaeGymaedabeaakiabcIcaOiabigdaXiabcMcaPiabg2da9iabcUha7naatafabaGaem4raC0aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqaIXaqmcqGGPaqkcqGHflY1cqWGHbqycqGG9bqFcqGHQicYcqGG7bWEdaWeqbqaaiabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeGimaaJaei4oaSJaemyCaeNaeiykaKIaeyyXICTaemyyaeMaeiyFa0haleaacqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGG7aWoiiGacqWF0oazcqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGHiiIZt0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqGFlecsaeqaniablQIivbGccqGGUaGlaSqaaiabdggaHjabgIGiolabfo6atbqab0GaeSOkIufaaaa@72A7@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Similarly, classes of texts that do not contain words from <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> satisfy</p>
            <p>
               <display-formula id="M6">
                  <m:math name="1748-7188-2-13-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mo>&#8704;</m:mo>
                                       <m:msup>
                                          <m:mi>q</m:mi>
                                          <m:mo>&#8242;</m:mo>
                                       </m:msup>
                                       <m:mo>&#8712;</m:mo>
                                       <m:msub>
                                          <m:mi>Q</m:mi>
                                          <m:mi>&#8459;</m:mi>
                                       </m:msub>
                                       <m:mo>\</m:mo>
                                       <m:mi>&#8459;</m:mi>
                                       <m:mo>:</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>0</m:mn>
                                       <m:mo>;</m:mo>
                                       <m:msup>
                                          <m:mi>q</m:mi>
                                          <m:mo>&#8242;</m:mo>
                                       </m:msup>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munder>
                                             <m:mo>&#8746;</m:mo>
                                             <m:mrow>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>q</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>a</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>;</m:mo>
                                                <m:mi>&#948;</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>q</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>a</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>=</m:mo>
                                                <m:msup>
                                                   <m:mi>q</m:mi>
                                                   <m:mo>&#8242;</m:mo>
                                                </m:msup>
                                             </m:mrow>
                                          </m:munder>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>C</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>0</m:mn>
                                             <m:mo>;</m:mo>
                                             <m:mi>q</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8901;</m:mo>
                                             <m:mi>a</m:mi>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mo>.</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaGaeyiaIiIafmyCaeNbauaacqGHiiIZcqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaakiabcYfaCjab=TqiijabcQda6aqaaiabdoeadnaaBaaaleaacqWGPbqAcqGHRaWkcqaIXaqmaeqaaOGaeiikaGIaeGimaaJaei4oaSJafmyCaeNbauaacqGGPaqkcqGH9aqpdaWeqbqaaiabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeGimaaJaei4oaSJaemyCaeNaeiykaKIaeyyXICTaemyyaegaleaacqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGG7aWoiiGacqGF0oazcqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGH9aqpcuWGXbqCgaqbaaqab0GaeSOkIufakiabc6caUaaaaaa@67F3@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
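            <p>The sets united on the right-hand sides of (5) and (6) are pairwise disjoint, so taking probabilities and applying (1) suggests a simple forward computation over the nodes of the tree. The Python sketch below is one possible reading of that computation, not the authors' code: the names <it>build_delta</it> and <it>p_value_at_least_one</it>, the dictionary of character probabilities and the data layout are illustrative assumptions, and the character probabilities are assumed to sum to 1.</p>

```python
from collections import deque


def build_delta(words, alphabet):
    """Hypothetical helper: transition table and terminal flags of the word tree."""
    children, terminal = [{}], [False]
    for word in words:
        q = 0
        for a in word:
            if a not in children[q]:
                children.append({})
                terminal.append(False)
                children[q][a] = len(children) - 1
            q = children[q][a]
        terminal[q] = True
    delta = [dict() for _ in children]
    fail = [0] * len(children)
    queue = deque()
    for a in alphabet:
        delta[0][a] = children[0].get(a, 0)
        if a in children[0]:
            queue.append(children[0][a])
    while queue:                      # breadth-first over the tree
        q = queue.popleft()
        terminal[q] = terminal[q] or terminal[fail[q]]
        for a in alphabet:
            if a in children[q]:
                child = children[q][a]
                fail[child] = delta[fail[q]][a]
                delta[q][a] = child
                queue.append(child)
            else:
                delta[q][a] = delta[fail[q]][a]
    return delta, terminal


def p_value_at_least_one(words, probs, n):
    """Hypothetical helper: P(G_n(1)) by induction on the text length."""
    alphabet = sorted(probs)
    delta, terminal = build_delta(words, alphabet)
    c = [0.0] * len(delta)            # c[q] holds P(C_i(0; q))
    c[0] = 1.0                        # i = 0: the empty text sits at the root
    g = 0.0                           # g holds P(G_i(1))
    for _ in range(n):
        nxt = [0.0] * len(delta)
        for q, mass in enumerate(c):
            if mass == 0.0:
                continue
            for a in alphabet:
                target = delta[q][a]
                if terminal[target]:
                    g += mass * probs[a]            # relation (5): a word was completed
                else:
                    nxt[target] += mass * probs[a]  # relation (6): still no occurrence
        c = nxt                       # only the values for the current length are kept
    return g


print(p_value_at_least_one(["AA"], {"A": 0.5, "B": 0.5}, 3))
```

            <p>For the motif {AA} over a two-letter alphabet with equal character probabilities and <it>n </it>= 3, the sketch returns 0.375, in agreement with the three texts AAA, AAB and BAA out of eight. The contribution of <it>G</it><sub><it>i </it></sub>(1)&#183;<it>a </it>in (5) needs no explicit loop: because the character probabilities sum to 1, the accumulated value of <it>g </it>carries over unchanged.</p>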
            <p>Classes <it>C</it><sub><it>i </it></sub>(0; <it>q</it>) for different <it>q </it>in <inline-formula><m:math name="1748-7188-2-13-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>\</m:mo><m:mi>&#8459;</m:mi></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaakiabcYfaCjab=Tqiibaa@3AFC@</m:annotation></m:semantics></m:math></inline-formula> and <it>G</it><sub><it>i </it></sub>(1) form a partition of &#931;<sup><it>i</it></sup>; classes <it>C</it><sub><it>i </it></sub>(0; <it>q</it>) are empty if <it>q </it>is in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. Because the sets united in (5) and (6) are pairwise disjoint, relations (5) and (6), together with (1), yield the recursive expressions for the probabilities <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(0; <it>q</it>)) and <b>P </b>(<it>G</it><sub><it>i</it>+1 </sub>(1)) in the Bernoulli case:</p>
            <p>
               <display-formula id="M7">
                  <m:math name="1748-7188-2-13-i16" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>G</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>G</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>+</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>:</m:mo>
                                    <m:mi>&#948;</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>&#8459;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>0</m:mn>
                                 <m:mo>;</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>a</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFqbaucqGGOaakcqWGhbWrdaWgaaWcbaGaemyAaKMaey4kaSIaeGymaedabeaakiabcIcaOiabigdaXiabcMcaPiabcMcaPiabg2da9iab=bfaqjabcIcaOiabdEeahnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaeGymaeJaeiykaKIaeiykaKIaey4kaSYaaabuaeaacqWFqbaucqGGOaakcqWGdbWqdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabicdaWiabcUda7iabdghaXjabcMcaPiabcMcaPiabgwSixlabdchaWjabcIcaOiabdggaHjabcMcaPaWcbaGaeiikaGIaemyCaeNaeiilaWIaemyyaeMaeiykaKIaeiOoaOdcciGae4hTdqMaeiikaGIaemyCaeNaeiilaWIaemyyaeMaeiykaKIaeyicI48enfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae03cHGeabeqdcqGHris5aOGaeiilaWcaaa@6E5E@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>
               <display-formula id="M8">
                  <m:math name="1748-7188-2-13-i17" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>C</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mn>0</m:mn>
                           <m:mo>;</m:mo>
                           <m:msup>
                              <m:mi>q</m:mi>
                              <m:mo>&#8242;</m:mo>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>:</m:mo>
                                    <m:mi>&#948;</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:msup>
                                       <m:mi>q</m:mi>
                                       <m:mo>&#8242;</m:mo>
                                    </m:msup>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>0</m:mn>
                                 <m:mo>;</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>a</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFqbaucqGGOaakcqWGdbWqdaWgaaWcbaGaemyAaKMaey4kaSIaeGymaedabeaakiabcIcaOiabicdaWiabcUda7iqbdghaXzaafaGaeiykaKIaeiykaKIaeyypa0ZaaabuaeaacqWFqbaucqGGOaakcqWGdbWqdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabicdaWiabcUda7iabdghaXjabcMcaPiabcMcaPiabgwSixlabdchaWjabcIcaOiabdggaHjabcMcaPaWcbaGaeiikaGIaemyCaeNaeiilaWIaemyyaeMaeiykaKIaeiOoaOdcciGae4hTdqMaeiikaGIaemyCaeNaeiilaWIaemyyaeMaeiykaKIaeyypa0JafmyCaeNbauaaaeqaniabggHiLdGccqGGUaGlaaa@5E0F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The run-time for each step of the computation of <it>C</it><sub><it>i</it>+1 </sub>(0; <it>q</it>) and <it>G</it><sub><it>i</it>+1 </sub>(1) is <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>|&#183;|&#931;|); therefore the total time of all <it>n </it>stages of <it>p</it>-value computation is <it>O </it>(|<inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>|&#183;|&#931;|&#183;<it>n</it>).</p>
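            <p>The recursion above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this work: the names build_automaton, p_at_least_one and letter_prob are hypothetical, all words of the motif are assumed to have the same length, and the automaton is built naively by trimming leading letters rather than by an efficient construction. The dictionary c holds the probabilities of the classes C(0; q) and the scalar g holds the probability of the class G(1).</p>
            <p>
```python
def build_automaton(motif, alphabet):
    """Return (states, delta, accepting) for a set of equal-length words.

    States are all prefixes of motif words (the empty prefix is the root).
    delta[q][a] is the longest suffix of q + a that is itself a state.
    """
    states = {""}
    for word in motif:
        for end in range(1, len(word) + 1):
            states.add(word[:end])
    delta = {}
    for q in states:
        delta[q] = {}
        for a in alphabet:
            candidate = q + a
            # Drop leading letters until the candidate is a known prefix;
            # the loop always stops because the empty string is a state.
            while candidate not in states:
                candidate = candidate[1:]
            delta[q][a] = candidate
    return states, delta, set(motif)


def p_at_least_one(motif, letter_prob, n):
    """Probability that a Bernoulli text of length n contains a motif word."""
    states, delta, accepting = build_automaton(motif, list(letter_prob))
    c = {q: 0.0 for q in states}  # c[q] plays the role of P(C_i(0; q))
    c[""] = 1.0                   # the empty text contains no occurrence
    g = 0.0                       # g plays the role of P(G_i(1))
    for _ in range(n):
        nxt = {q: 0.0 for q in states}
        for q, mass in c.items():
            if mass == 0.0:
                continue
            for a, pa in letter_prob.items():
                target = delta[q][a]
                if target in accepting:
                    g += mass * pa            # the first occurrence ends here
                else:
                    nxt[target] += mass * pa  # still no occurrence
        c = nxt
    return g
```
            </p>
            <p>For the single-word motif {AA} over a two-letter alphabet with equal letter probabilities and <it>n </it>= 3, the sketch returns 0.375, which agrees with direct enumeration (the texts AAA, AAB and BAA out of eight). Each of the <it>n </it>iterations visits every pair of a state and a letter once, in line with the run-time estimate given above.</p>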
            <p>The approach described in this section can be readily extended to the case of multiple occurrences of motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>. The detailed procedure can be found in Additional file <supplr sid="S1">1</supplr>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Bernoulli text model. Probability to find multiple occurrences of a single motif. The detailed description of the algorithm for the <it>p</it>-value calculation in the case of multiple occurrences of a single motif.</p>
               </text>
               <file name="1748-7188-2-13-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Bernoulli text model. Probability to find multiple occurrences of multiple motifs</p>
            </st>
            <p>DNA transcription is usually regulated by several factors that interact with DNA simultaneously and specifically recognize different DNA sites. An individual regulatory segment of DNA can contain many binding sites for several factors, often substantially overlapping with each other <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. This raises the problem of studying co-occurring motifs.</p>
            <p>Let (<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>) be <it>s </it>different motifs. Our objective is to calculate the probability that motifs (<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>) have respectively at least (<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) possibly overlapping occurrences in the random text <it>T</it><sub><it>n </it></sub>of the length <it>n</it>. This <it>p</it>-value is the probability <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)) to obtain text <it>T</it><sub><it>n </it></sub>belonging to the set of texts <it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>). In this section, we suppose that the probability of each text is given by the Bernoulli model; the Markov case is considered in the next subsection. The recursion for multiple occurrences of multiple motifs obtained here is rather intricate. We therefore suggest that the reader first consult Additional file <supplr sid="S1">1</supplr>, where we describe the recursion for the simpler case of multiple occurrences of a single motif.</p>
            <p>Let us consider the union <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> of individual motifs <inline-formula><m:math name="1748-7188-2-13-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#8459;</m:mi><m:mo>=</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>&#8746;</m:mo><m:mo>&#8943;</m:mo><m:mo>&#8746;</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecscqGH9aqpcqWFlecsdaWgaaWcbaGaeGymaedabeaakiabgQIiilabl+UimjabgQIiilab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@4249@</m:annotation></m:semantics></m:math></inline-formula>. It contains all words that belong to any of motifs <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i</it></sub>. The tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> is constructed for the overall set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, its nodes <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> contain all possible prefixes of all motifs from (<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>). A node of the tree <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> can belong to some motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>k </it></sub>or simultaneously to several different motifs from {<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j</it></sub>}<sub>1&#8804;<it>j</it>&#8804;<it>s</it></sub>. Let each node <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> be marked with numbers <it>j </it>of motifs <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>to which it belongs. Nodes, corresponding to proper prefixes of <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, remain unmarked. The transition function <inline-formula><m:math name="1748-7188-2-13-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#948;</m:mi><m:mo>:</m:mo><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub><m:mo>&#215;</m:mo><m:mi>&#931;</m:mi><m:mo>&#8594;</m:mo><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWF0oazcqGG6aGocqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae43cHGeabeaakiabgEna0kabfo6atjabgkziUkabdgfarnaaBaaaleaacqGFlecsaeqaaaaa@4341@</m:annotation></m:semantics></m:math></inline-formula> is defined as it was defined in the case of a single motif for the unified motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>.</p>
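            <p>The marking of the nodes of the tree for the union of motifs can be sketched in Python as follows. This is only an illustration under stated assumptions: the names mark_union_states and transition are hypothetical, motifs are given as a list of sets of words, and all words are assumed to have the same length, so that a node is marked exactly when it is a word of the corresponding motif.</p>
            <p>
```python
def mark_union_states(motifs):
    """Map every prefix of the union motif to the motif numbers it belongs to.

    motifs is a list of sets of words; motif numbers start at 1.
    Proper prefixes that are not words of any motif get an empty set,
    which means that they stay unmarked.
    """
    marks = {"": set()}
    for j, motif in enumerate(motifs, start=1):
        for word in motif:
            for end in range(1, len(word) + 1):
                marks.setdefault(word[:end], set())
            marks[word].add(j)
    return marks


def transition(marks, q, a):
    """Return the longest suffix of q + a that is a node of the tree."""
    candidate = q + a
    # The empty prefix is always a node, so the loop terminates.
    while candidate not in marks:
        candidate = candidate[1:]
    return candidate
```
            </p>
            <p>For example, with the two motifs {ACG, ACT} and {ACT, GGT}, the node ACT is marked with both numbers 1 and 2, the node ACG is marked with 1 only, and the proper prefix AC remains unmarked.</p>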
            <p>All texts <it>T</it><sub><it>n </it></sub>of length <it>n </it>are classified into classes depending on occurrences of different <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j</it></sub>. In this case it is difficult to introduce the target class <it>G</it>, since when the target number of occurrences <it>k</it><sub><it>i </it></sub>is attained for some motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i</it></sub>, the corresponding value <it>k</it><sub><it>j </it></sub>may not yet be attained for another motif <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j</it></sub>. Therefore we need to introduce the occurrence index of a set of motifs.</p>
            <p><b>Definition 2 </b><it>Let the target number of occurrences of motif </it><inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i </it></sub><it>be k</it><sub><it>i</it></sub>. <it>Then, the occurrence index </it><inline-formula><m:math name="1748-7188-2-13-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#923;</m:mi><m:mrow><m:mo stretchy="false">(</m:mo><m:msub><m:mi>k</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>k</m:mi><m:mi>s</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqHBoatdaWgaaWcbaGaeiikaGIaem4AaS2aaSbaaWqaaiabigdaXaqabaWccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGRbWAdaWgaaadbaGaem4CamhabeaaliabcMcaPaqabaaaaa@39F8@</m:annotation></m:semantics></m:math></inline-formula> (<it>l</it><sub>1</sub>, ..., <it>l</it><sub><it>s</it></sub>) <it>of a set of motifs </it>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>) <it>in the text T</it><sub><it>n </it></sub><it>containing l</it><sub><it>i </it></sub><it>possibly overlapping occurrences of each </it><inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>i </it></sub><it>is an s-vector the ith component of which can be calculated as follows</it>:</p>
            <p>
               <display-formula id="M9">
                  <m:math name="1748-7188-2-13-i20" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mrow>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:msub>
                                    <m:mi>&#923;</m:mi>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>k</m:mi>
                                          <m:mn>1</m:mn>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:mn>...</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>k</m:mi>
                                          <m:mi>s</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>l</m:mi>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:mn>...</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>l</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">]</m:mo>
                              </m:mrow>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msub>
                              <m:mi>&#955;</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="left">
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>l</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mtext>if</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>l</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8804;</m:mo>
                                             <m:msub>
                                                <m:mi>k</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>,</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>k</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mtext>if</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>l</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&gt;</m:mo>
                                             <m:msub>
                                                <m:mi>k</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>.</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqGGBbWwcqqHBoatdaWgaaWcbaGaeiikaGIaem4AaS2aaSbaaWqaaiabigdaXaqabaWccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGRbWAdaWgaaadbaGaem4CamhabeaaliabcMcaPaqabaGccqGGOaakcqWGSbaBdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdYgaSnaaBaaaleaacqWGZbWCaeqaaOGaeiykaKIaeiyxa01aaSbaaSqaaiabdMgaPbqabaGccqGH9aqpiiGacqWF7oaBdaWgaaWcbaGaemyAaKgabeaakiabg2da9maaceqabaqbaeaabiWaaaqaaiabdYgaSnaaBaaaleaacqWGPbqAaeqaaaGcbaGaemyAaKMaemOzaygabaGaemiBaW2aaSbaaSqaaiabdMgaPbqabaGccqGHKjYOcqWGRbWAdaWgaaWcbaGaemyAaKgabeaakiabcYcaSaqaaiabdUgaRnaaBaaaleaacqWGPbqAaeqaaaGcbaGaemyAaKMaemOzaygabaGaemiBaW2aaSbaaSqaaiabdMgaPbqabaGccqGH+aGpcqWGRbWAdaWgaaWcbaGaemyAaKgabeaakiabc6caUaaaaiaawUhaaaaa@6BCA@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p><b>Definition 3 </b><it>A text T</it><sub><it>i </it></sub><it>belongs to class C</it><sub><it>i </it></sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>), 0 &#8804; <it>&#955;</it><sub><it>j </it></sub>&#8804; <it>k</it><sub><it>j</it></sub>, 1 &#8804; <it>j </it>&#8804; <it>s</it>, <it>iff</it></p>
            <p><it>1. The length of T</it><sub><it>i </it></sub><it>equals i</it>,</p>
            <p><it>2. The occurrence index of motifs </it>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>) <it>in text T</it><sub><it>i </it></sub><it>is equal to </it>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>),</p>
            <p><it>3. A traversal AC </it>(<inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula>, <it>T</it><sub><it>i</it></sub>) <it>ends in node q</it>.</p>
            <p><it>A text T</it><sub><it>i </it></sub><it>belongs to class G</it><sub><it>i </it></sub>(<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>) <it>if it belongs to the union of classes</it></p>
            <p>
               <display-formula id="M10">
                  <m:math name="1748-7188-2-13-i21" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>G</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>k</m:mi>
                              <m:mn>1</m:mn>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:mn>...</m:mn>
                           <m:mo>,</m:mo>
                           <m:msub>
                              <m:mi>k</m:mi>
                              <m:mi>s</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8746;</m:mo>
                                 <m:mrow>
                                    <m:mi>q</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>&#8459;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>k</m:mi>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:mn>...</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>k</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:msub>
                                 <m:mo>;</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGhbWrdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdUgaRnaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaem4AaS2aaSbaaSqaaiabdohaZbqabaGccqGGPaqkcqGH9aqpdaWeqbqaaiabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaem4AaS2aaSbaaSqaaiabigdaXaqabaGccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGRbWAdaWgaaWcbaGaem4CamhabeaakiabcUda7iabdghaXjabcMcaPaWcbaGaemyCaeNaeyicI48enfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeqdcqWIQisvaOGaeiOla4caaa@5CF8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The desired <it>p</it>-value <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)) is equal to <b>P </b>(<it>G</it><sub><it>n </it></sub>(<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)). The value is calculated iteratively. Again, we have a sum over all possible tree nodes <it>q </it>and symbols <it>a</it>. Now, <it>q'</it>, the image of the transition function <it>&#948; </it>(<it>q, a</it>) can belong simultaneously to several motifs {<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j</it></sub>}<sub>1&#8804;<it>j</it>&#8804;<it>s</it></sub>. Thus, the resulting probability <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q'</it>)) that text <it>T</it><sub><it>i</it>+1 </sub>belongs to class <it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q'</it>) is calculated as</p>
            <p>
               <display-formula id="M11">
                  <m:math name="1748-7188-2-13-i22" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>C</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>&#955;</m:mi>
                              <m:mn>1</m:mn>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:mn>...</m:mn>
                           <m:mo>,</m:mo>
                           <m:msub>
                              <m:mi>&#955;</m:mi>
                              <m:mi>s</m:mi>
                           </m:msub>
                           <m:mo>;</m:mo>
                           <m:msup>
                              <m:mi>q</m:mi>
                              <m:mo>&#8242;</m:mo>
                           </m:msup>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>:</m:mo>
                                    <m:mi>&#948;</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>a</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:msup>
                                       <m:mi>q</m:mi>
                                       <m:mo>&#8242;</m:mo>
                                    </m:msup>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mn>1</m:mn>
                                          </m:msub>
                                          <m:mo>,</m:mo>
                                          <m:mn>...</m:mn>
                                          <m:mo>,</m:mo>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mi>s</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#8712;</m:mo>
                                          <m:mi>J</m:mi>
                                       </m:mrow>
                                    </m:munder>
                                    <m:mrow>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>r</m:mi>
                                          <m:mn>1</m:mn>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:mn>...</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>r</m:mi>
                                          <m:mi>s</m:mi>
                                       </m:msub>
                                       <m:mo>;</m:mo>
                                       <m:mi>q</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#8901;</m:mo>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>a</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFqbaucqGGOaakcqWGdbWqdaWgaaWcbaGaemyAaKMaey4kaSIaeGymaedabeaakiabcIcaOGGaciab+T7aSnaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIae43UdW2aaSbaaSqaaiabdohaZbqabaGccqGG7aWocuWGXbqCgaqbaiabcMcaPiabcMcaPiabg2da9maaqafabaWaaabuaeaacqWFqbaucqGGOaakcqWGdbWqdaWgaaWcbaGaemyAaKgabeaakiabcIcaOiabdkhaYnaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemOCai3aaSbaaSqaaiabdohaZbqabaGccqGG7aWocqWGXbqCcqGGPaqkcqGGPaqkcqGHflY1cqWGWbaCcqGGOaakcqWGHbqycqGGPaqkaSqaaiabcIcaOiabdkhaYnaaBaaameaacqaIXaqmaeqaaSGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemOCai3aaSbaaWqaaiabdohaZbqabaWccqGGPaqkcqGHiiIZcqWFkbGsaeqaniabggHiLdaaleaacqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGG6aGocqGF0oazcqGGOaakcqWGXbqCcqGGSaalcqWGHbqycqGGPaqkcqGH9aqpcuWGXbqCgaqbaaqab0GaeyyeIuoaaaa@8070@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where the second sum runs over all allowed <it>s</it>-tuples of indexes (<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>), which together form the set of <it>s</it>-tuples <b>J</b>. An <it>s</it>-tuple of indexes (<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>) belongs to <b>J </b>if it satisfies the following conditions for every <it>j</it>, 1 &#8804; <it>j </it>&#8804; <it>s</it>:</p>
            <p>1. if <it>q' </it>&#8713; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>then <it>r</it><sub><it>j </it></sub>= <it>&#955;</it><sub><it>j</it></sub>,</p>
            <p>2. if <it>q' </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>and <it>&#955;</it><sub><it>j </it></sub>&lt;<it>k</it><sub><it>j </it></sub>then <it>r</it><sub><it>j </it></sub>= <it>&#955;</it><sub><it>j </it></sub>- 1,</p>
            <p>3. if <it>q' </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>and <it>&#955;</it><sub><it>j </it></sub>= <it>k</it><sub><it>j </it></sub>then <it>r</it><sub><it>j </it></sub>= <it>k</it><sub><it>j </it></sub>or <it>r</it><sub><it>j </it></sub>= <it>k</it><sub><it>j </it></sub>- 1.</p>
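            <p>The three conditions defining <b>J </b>can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the implementation used in this work: the function name <it>predecessor_tuples</it> and the boolean vector <it>hits</it> (entry <it>j </it>is true when <it>q' </it>belongs to motif <it>j</it>) are hypothetical, and the sketch additionally discards tuples with a negative component, since an occurrence count cannot be negative.</p>
            <p>
```python
from itertools import product


def predecessor_tuples(lam, k, hits):
    """Enumerate the set J of index tuples r allowed to precede lam.

    lam  -- capped occurrence indexes (lambda_1, ..., lambda_s) after the step
    k    -- required occurrence numbers (k_1, ..., k_s)
    hits -- hits[j] is True when the target node q' belongs to motif j
    Assumes 0 is the lowest and k[j] the highest value of lam[j].
    """
    choices = []
    for lam_j, k_j, hit in zip(lam, k, hits):
        if not hit:
            options = [lam_j]          # condition 1: the count is unchanged
        elif lam_j != k_j:
            options = [lam_j - 1]      # condition 2: lam_j is below the cap
        else:
            options = [k_j, k_j - 1]   # condition 3: the count is saturated
        # Extra assumption of this sketch: negative counts are impossible.
        choices.append([r_j for r_j in options if r_j >= 0])
    # J is the Cartesian product of the per-motif options.
    return list(product(*choices))
```
            </p>
            <p>For example, with <it>k </it>= (2, 2), <it>&#955; </it>= (1, 2) and a node <it>q' </it>that belongs to both motifs, the sketch returns the tuples (0, 2) and (0, 1). If some <it>&#955;</it><sub><it>j </it></sub>= 0 while <it>q' </it>belongs to motif <it>j</it>, the set is empty, because a text that ends with an occurrence of the motif contains at least one occurrence of it.</p>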
         </sec>
         <sec>
            <st>
               <p>Implementation details</p>
            </st>
            <p>Our basic data structure is the prefix tree; we use its standard representation <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> [see also Additional files <supplr sid="S2">2</supplr> and <supplr sid="S3">3</supplr> for <it>Tree construction from PWM motif representation</it>]. Each tree node <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> is supplied with several additional variables.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Tree construction from PWM motif representation. A brief description of the procedure for constructing the prefix tree from a PWM motif representation.</p>
               </text>
               <file name="1748-7188-2-13-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Tree construction from PWM motif representation. Steps of the prefix tree construction for a PWM and a given cut-off.</p>
               </text>
               <file name="1748-7188-2-13-S3.bmp">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>At stage (<it>i </it>+ 1) of the probability computation, the values <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>)) are computed from the values <b>P </b>(<it>C</it><sub><it>i </it></sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>)) obtained at the previous stage of induction. Therefore, at stage (<it>i </it>+ 1), one no longer needs the values calculated at stage (<it>i </it>- 1). Thus, each node is supplied with two (<it>k</it><sub>1 </sub>+ 1) &#215; &#8943; &#215; (<it>k</it><sub><it>s </it></sub>+ 1) arrays of real values, <b>C</b><sub><b>0 </b></sub>and <b>C</b><sub><b>1</b></sub>, for storing <b>P </b>(<it>C</it><sub><it>i </it></sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>)) and <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>)) for all values 0 &#8804; <it>&#955;</it><sub><it>j </it></sub>&#8804; <it>k</it><sub><it>j</it></sub>. <b>C</b><sub><b>0 </b></sub>stores the probabilities for even text lengths and <b>C</b><sub><b>1 </b></sub>those for odd text lengths.</p>
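            <p>The two-array scheme can be sketched as follows. This is an illustrative layout only, not the data structure of the actual program: the class name <it>NodeBuffers </it>and its methods are hypothetical, the arrays are flattened into Python lists, and each dimension is given <it>k</it><sub><it>j </it></sub>+ 1 entries because <it>&#955;</it><sub><it>j </it></sub>ranges over 0, ..., <it>k</it><sub><it>j</it></sub>.</p>
            <p>
```python
from math import prod


class NodeBuffers:
    """Per-node storage: two flattened arrays selected by text-length parity."""

    def __init__(self, k):
        # One slot per capped index value 0..k_j in every dimension.
        self.shape = tuple(k_j + 1 for k_j in k)
        size = prod(self.shape)
        # c[0] plays the role of C0 (even lengths), c[1] of C1 (odd lengths).
        self.c = [[0.0] * size, [0.0] * size]

    def flat_index(self, lam):
        """Row-major position of the tuple (lambda_1, ..., lambda_s)."""
        idx = 0
        for lam_j, dim in zip(lam, self.shape):
            idx = idx * dim + lam_j
        return idx

    def current(self, i):
        """Array holding the probabilities for text length i."""
        return self.c[i % 2]

    def following(self, i):
        """Array that receives the probabilities for text length i + 1."""
        return self.c[(i + 1) % 2]
```
            </p>
            <p>Before stage (<it>i </it>+ 1) the array returned by <it>following</it>(<it>i</it>) has to be reset to zero; it still contains the values of stage (<it>i </it>- 1), which are no longer needed.</p>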
            <p>In the implementation, the calculation of the values <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q'</it>)) from <b>P </b>(<it>C</it><sub><it>i </it></sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q</it>)) for all <it>q'</it>, <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> and (<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>): 0 &#8804; <it>&#955;</it><sub><it>j </it></sub>&#8804; <it>k</it><sub><it>j</it></sub>, 1 &#8804; <it>j </it>&#8804; <it>s</it>, is performed in parallel. Initially, we set all the values <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q'</it>)) to 0. Then we iterate over all tuples (<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>), where <it>q </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula> and (<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>): 0 &#8804; <it>r</it><sub><it>j </it></sub>&#8804; <it>k</it><sub><it>j</it></sub>, 1 &#8804; <it>j </it>&#8804; <it>s</it>. For each tuple (<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>) and each letter <it>a </it>&#8712; &#931;, we find the prefix <it>q' </it>= <it>&#948; </it>(<it>q, a</it>) and compute the value <b>P </b>(<it>C</it><sub><it>i </it></sub>(<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>))&#183;<it>p</it>(<it>a</it>). We then add this value to <b>P </b>(<it>C</it><sub><it>i</it>+1 </sub>(<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>; <it>q'</it>)), where the tuple (<it>&#955;</it><sub>1</sub>, ..., <it>&#955;</it><sub><it>s</it></sub>) satisfies the conditions inverse to those of formula (11):</p>
            <p>1. if <it>q' </it>&#8713; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>then <it>&#955;</it><sub><it>j </it></sub>= <it>r</it><sub><it>j</it></sub>,</p>
            <p>2. if <it>q' </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>and <it>r</it><sub><it>j </it></sub>&lt;<it>k</it><sub><it>j </it></sub>then <it>&#955;</it><sub><it>j </it></sub>= <it>r</it><sub><it>j </it></sub>+ 1,</p>
            <p>3. if <it>q' </it>&#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula><sub><it>j </it></sub>and <it>r</it><sub><it>j </it></sub>= <it>k</it><sub><it>j </it></sub>then <it>&#955;</it><sub><it>j </it></sub>= <it>r</it><sub><it>j</it></sub>.</p>
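            <p>The forward update described above can be sketched end to end, from automaton construction to the final summation. The sketch is illustrative and rests on several assumptions that are not taken from the actual program: the names <it>build_automaton </it>and <it>cluster_p_value </it>are hypothetical, each motif is given as an explicit set of words, letters are drawn independently with probabilities <it>p</it>(<it>a</it>), and the per-node arrays are replaced by a sparse dictionary keyed by (node, index tuple). Motif membership is propagated along the failure links of the automaton, and the final sum is taken over all nodes.</p>
            <p>
```python
from collections import defaultdict, deque


def build_automaton(motifs, alphabet):
    """Aho-Corasick automaton; hits[q] lists the motifs with a word ending at q."""
    goto, hits = [{}], [set()]
    for j, words in enumerate(motifs):
        for word in words:
            q = 0
            for a in word:
                if a not in goto[q]:
                    goto.append({})
                    hits.append(set())
                    goto[q][a] = len(goto) - 1
                q = goto[q][a]
            hits[q].add(j)
    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    queue = deque()
    for a in alphabet:
        child = goto[0].get(a, 0)
        delta[0][a] = child
        if child:
            queue.append(child)
    while queue:                      # breadth-first: shallower nodes are complete
        q = queue.popleft()
        hits[q] |= hits[fail[q]]      # words ending at a suffix also end here
        for a in alphabet:
            child = goto[q].get(a)
            if child is None:
                delta[q][a] = delta[fail[q]][a]
            else:
                fail[child] = delta[fail[q]][a]
                delta[q][a] = child
                queue.append(child)
    return delta, hits


def cluster_p_value(motifs, k, probs, n):
    """P(motif j occurs at least k[j] times, for all j, in a text of length n)."""
    delta, hits = build_automaton(motifs, list(probs))
    cur = defaultdict(float)
    cur[(0, (0,) * len(k))] = 1.0     # empty text: root node, no occurrences
    for _ in range(n):
        nxt = defaultdict(float)      # all values of the next stage start at 0
        for (q, r), mass in cur.items():
            for a, p_a in probs.items():
                q2 = delta[q][a]
                # Inverse conditions 1-3: keep r_j, or raise it by one up to k_j.
                lam = tuple(min(r_j + 1, k_j) if j in hits[q2] else r_j
                            for j, (r_j, k_j) in enumerate(zip(r, k)))
                nxt[(q2, lam)] += mass * p_a
        cur = nxt
    target = tuple(k)
    return sum(mass for (q, lam), mass in cur.items() if lam == target)
```
            </p>
            <p>As a check, for the two single-word motifs {AC} and {CA} with <it>k </it>= (1, 1), a uniform four-letter alphabet and <it>n </it>= 3, only the texts ACA and CAC contain both motifs, so the sketch returns 2/64 = 0.03125.</p>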
            <p>At stage <it>i </it>= <it>n</it>, the desired <it>p</it>-value is the sum</p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-13-i23" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>G</m:mi>
                              <m:mi>n</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>k</m:mi>
                              <m:mn>1</m:mn>
                           </m:msub>
                           <m:mo>,</m:mo>
                           <m:mn>...</m:mn>
                           <m:mo>,</m:mo>
                           <m:msub>
                              <m:mi>k</m:mi>
                              <m:mi>s</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>q</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>&#8459;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>C</m:mi>
                                    <m:mi>n</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>k</m:mi>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:mn>...</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>k</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:msub>
                                 <m:mo>;</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>.</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFqbaucqGGOaakcqWGhbWrdaWgaaWcbaGaemOBa4gabeaakiabcIcaOiabdUgaRnaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaem4AaS2aaSbaaSqaaiabdohaZbqabaGccqGGPaqkcqGGPaqkcqGH9aqpdaaeqbqaaiab=bfaqjabcIcaOiabdoeadnaaBaaaleaacqWGUbGBaeqaaOGaeiikaGIaem4AaS2aaSbaaSqaaiabigdaXaqabaGccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGRbWAdaWgaaWcbaGaem4CamhabeaakiabcUda7iabdghaXjabcMcaPiabcMcaPiabc6caUaWcbaGaemyCaeNaeyicI48enfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae43cHGeabeqdcqGHris5aaaa@6328@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
         </sec>
         <sec>
            <st>
               <p>Markov text model</p>
            </st>
            <p>The tree approach and the recursion (11) can be readily extended to calculate <it>p</it>-values of motif occurrences in random texts generated by a Markov model of order <it>K</it>. Under a Markov model of order <it>K</it>, the probability <it>p</it>(<it>a</it>) in (11) depends on the <it>K </it>previous letters. Thus, if the length |<it>q</it>| of the prefix <it>q </it>is less than <it>K</it>, one cannot calculate <it>p</it>(<it>a</it>) knowing only the prefix <it>q</it>. To overcome this, we divide each class <it>C</it><sub><it>i </it></sub>(<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>), where |<it>q</it>| = <it>d </it>&lt;<it>min </it>(<it>K</it>, <it>i</it>), into subclasses <it>C</it><sub><it>i </it></sub>(<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>, <it>w</it>); each subclass corresponds to a word <it>w </it>of length <it>min </it>(<it>K</it>, <it>i</it>) - <it>d</it>. A text <it>T</it><sub><it>i </it></sub>of length <it>i </it>then belongs to class <it>C</it><sub><it>i </it></sub>(<it>r</it><sub>1</sub>, ..., <it>r</it><sub><it>s</it></sub>; <it>q</it>, <it>w</it>) if the suffix of <it>T</it><sub><it>i </it></sub>of length <it>min </it>(<it>K</it>, <it>i</it>) equals <it>w</it>&#183;<it>q</it>.</p>
            <p>Figure <figr fid="F2">2</figr> gives an example for a Markov model of order <it>K </it>= 1. The tree is constructed for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT}. The text <it>T </it>= ATGCCAACCTT produces the following sequence of nodes {<it>q</it><sub><it>i</it></sub>}<sub><it>i</it>&#8805;1 </sub>(the numbers of the corresponding nodes in Figure <figr fid="F2">2</figr> are shown in square brackets): A[4], (<it>&#949;, T</it>)[3], (<it>&#949;, G</it>)[2], C[5], CC[8], A[4], AA[6], AAC[10], ACC[12], CCT[13], (<it>&#949;, T</it>)[3].</p>
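            <p>The node sequence in this example can be reproduced with a short Python sketch. It is an illustration only and not the AHOPRO implementation: the function names (prefixes, node_sequence, label) are our own assumptions, nodes are identified by their prefix labels rather than by the node numbers of Figure <figr fid="F2">2</figr>, and the empty prefix is printed as "eps".</p>
            <p><![CDATA[
```python
def prefixes(words):
    """All prefixes of the words, including the empty prefix (the tree nodes)."""
    return {w[:d] for w in words for d in range(len(w) + 1)}


def node_sequence(text, words, K=1):
    """For each text prefix T_i, return the pair (q, w): q is the longest
    suffix of T_i that is a prefix of some word, and w holds the letters
    before q that are still needed when |q| is shorter than min(K, i)."""
    nodes = prefixes(words)
    max_len = max(len(w) for w in words)
    sequence = []
    for i in range(1, len(text) + 1):
        seen = text[:i]
        # Longest suffix of the text read so far that labels a tree node.
        q = ""
        for d in range(min(max_len, i), 0, -1):
            if seen[i - d:] in nodes:
                q = seen[i - d:]
                break
        # Number of extra letters the Markov(K) model has to remember.
        extra = max(min(K, i) - len(q), 0)
        w = seen[i - len(q) - extra:i - len(q)]
        sequence.append((q, w))
    return sequence


def label(q, w):
    """Format a node as in the text: plain q, or (eps, w) for short prefixes."""
    return q if not w else "(%s, %s)" % (q or "eps", w)


H = ["AAA", "AAC", "ACA", "ACC", "CCT"]
print(", ".join(label(q, w) for q, w in node_sequence("ATGCCAACCTT", H)))
```
]]></p>
            <p>Run as written, the sketch prints A, (eps, T), (eps, G), C, CC, A, AA, AAC, ACC, CCT, (eps, T), which agrees with the sequence listed above up to the node numbers.</p>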
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {aaa, aac, aca, acc, cct} with dashed links for <it>&#948; </it>function under Markov(1) model</p>
               </caption>
               <text>
                  <p><b>Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {aaa, aac, aca, acc, cct} with dashed links for <it>&#948; </it>function under Markov(1) model</b>. Tree <inline-formula><m:math name="1748-7188-2-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8459;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecscqGGPaqkaaa@3AF1@</m:annotation></m:semantics></m:math></inline-formula> for the set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> = {AAA, AAC, ACA, ACC, CCT} under Markov model of order 1. Dashed colored links represent <it>&#948; </it>function for internal node (8) &#8211; in red, and for marked node (10) corresponding to the word AAC &#8712; <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> &#8211; in purple.</p>
               </text>
               <graphic file="1748-7188-2-13-2"/>
            </fig>
            <p>The recursive equations for probabilities <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; 1)), <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)), and <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@3F88@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>s</it></sub>)) can be obtained from the corresponding formulae (7&#8211;8), (11&#8211;13) and (16) by replacing the probabilities <it>p</it>(<it>a</it>) with <it>p</it>(<it>a</it>|<it>t</it>[1] &#8943; <it>t </it>[<it>K</it>]), where</p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-13-i24" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>t</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>&#8943;</m:mo>
                           <m:mi>t</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>K</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="left">
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mi>w</m:mi>
                                             <m:mo>&#8901;</m:mo>
                                             <m:mi>q</m:mi>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mtext>if&#160;</m:mtext>
                                             <m:mn>0</m:mn>
                                             <m:mo>&#8804;</m:mo>
                                             <m:mi>d</m:mi>
                                             <m:mo>&lt;</m:mo>
                                             <m:mi>K</m:mi>
                                             <m:mo>,</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mi>K</m:mi>
                                             <m:mtext>-suffix&#160;of&#160;</m:mtext>
                                             <m:mi>q</m:mi>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mtext>otherwise</m:mtext>
                                             <m:mo>.</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG0baDcqGGBbWwcqaIXaqmcqGGDbqxcqWIVlctcqWG0baDcqGGBbWwcqWGlbWscqGGDbqxcqGH9aqpdaGabeqaauaabaqaciaaaeaacqWG3bWDcqGHflY1cqWGXbqCaeaacqqGPbqAcqqGMbGzcqqGGaaicqaIWaamcqGHKjYOcqWGKbazcqGH8aapcqWGlbWscqGGSaalaeaacqWGlbWscqqGTaqlcqqGZbWCcqqG1bqDcqqGMbGzcqqGMbGzcqqGPbqAcqqG4baEcqqGGaaicqqGVbWBcqqGMbGzcqqGGaaicqWGXbqCaeaacqqGVbWBcqqG0baDcqqGObaAcqqGLbqzcqqGYbGCcqqG3bWDcqqGPbqAcqqGZbWCcqqGLbqzcqGGUaGlaaaacaGL7baaaaa@67AD@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The Markov extension is currently implemented for <it>K </it>= 1.</p>
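            <p>The case distinction above can be written as a small helper. This is a sketch under our own naming (markov_context is a hypothetical function, not part of AHOPRO); it returns the letters on which the probability of the next letter is conditioned, given the node prefix q and the extra word w stored with the subclass.</p>
            <p><![CDATA[
```python
def markov_context(q, w, K):
    """Return the letters t[1]...t[K] that condition p(a | t[1]...t[K]).

    q is the prefix labelling the current node; w is the extra word kept
    with the subclass and is only needed while len(q) is smaller than K.
    """
    d = len(q)
    if d < K:
        return w + q      # first case: the concatenation w.q
    return q[d - K:]      # otherwise: the K-suffix of q


# Order-1 examples taken from the node sequence of the ATGCCAACCTT text.
print(markov_context("", "T", 1))    # node (eps, T): context comes from w
print(markov_context("CC", "", 1))   # node CC: context is the last letter of q
```
]]></p>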
         </sec>
         <sec>
            <st>
               <p>Complexity</p>
            </st>
            <p>To summarize, the computation of <b>P </b>(<it>L</it><sub><it>n </it></sub>(<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>; <it>k</it>)) for one set <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula> requires a computation of <inline-formula><m:math name="1748-7188-2-13-i25" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>P</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>C</m:mi><m:mi>i</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>l</m:mi><m:mo>,</m:mo><m:mi>q</m:mi><m:mo stretchy="false">)</m:mo><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mrow><m:mn>0</m:mn><m:mo>&#8804;</m:mo><m:mi>l</m:mi><m:mo>&lt;</m:mo><m:mi>k</m:mi><m:mo>,</m:mo><m:mi>q</m:mi><m:mo>&#8712;</m:mo><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadaqaaGqabiab=bfaqjabcIcaOiabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemiBaWMaeiilaWIaemyCaeNaeiykaKIaeiykaKcacaGLOaGaayzkaaWaaSbaaSqaaiabicdaWiabgsMiJkabdYgaSjabgYda8iabdUgaRjabcYcaSiabdghaXjabgIGiolabdgfarnaaBaaameaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqGFlecsaeqaaaWcbeaaaaa@4F8E@</m:annotation></m:semantics></m:math></inline-formula> for <it>i </it>&#8804; <it>n</it>. For each iteration, the time complexity is <it>O </it>(<it>k</it>|<inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>| |&#931;|), where |&#931;| is the size of the alphabet. One traverses the tree <it>n </it>times. As |<inline-formula><m:math name="1748-7188-2-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>Q</m:mi><m:mi>&#8459;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGrbqudaWgaaWcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae83cHGeabeaaaaa@38B9@</m:annotation></m:semantics></m:math></inline-formula>| is upper bounded by (<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|), where <it>m </it>is the maximal length of a word in <inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>, this yields the overall <it>O </it>(<it>nkm</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>||&#931;|) time complexity and an <it>O </it>(<it>km</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|) space complexity.</p>
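            <p>The iteration over classes indexed by an occurrence count l below k and a node q can be sketched in Python for a single set under the Bernoulli model. This is a simplified illustration in the spirit of the recursion, not a transcription of the formulae or of the AHOPRO code: the names build_automaton and p_value are hypothetical, the transition function is built naively rather than in linear time, and texts that reach k occurrences are simply dropped so that the p-value is one minus the remaining mass.</p>
            <p><![CDATA[
```python
def build_automaton(words, alphabet):
    """Nodes are the prefixes of the words (Aho-Corasick style).

    Returns (delta, hits): delta[q][a] is the node reached from q on letter
    a, and hits[q] is the number of words of the set that are suffixes of q.
    """
    nodes = {w[:d] for w in words for d in range(len(w) + 1)}
    wordset = set(words)
    delta, hits = {}, {}
    for q in nodes:
        hits[q] = sum(1 for d in range(len(q)) if q[d:] in wordset)
        delta[q] = {}
        for a in alphabet:
            s = q + a
            while s not in nodes:   # drop leading letters until a node is reached
                s = s[1:]
            delta[q][a] = s
    return delta, hits


def p_value(words, probs, n, k):
    """P(at least k occurrences of the words in a Bernoulli text of length n)."""
    delta, hits = build_automaton(words, list(probs))
    # state[(l, q)]: probability of texts with l occurrences ending in node q.
    state = {(0, ""): 1.0}
    for _ in range(n):              # one pass over all classes per text position
        nxt = {}
        for (l, q), pr in state.items():
            for a, pa in probs.items():
                q2 = delta[q][a]
                l2 = l + hits[q2]
                if l2 < k:          # texts with k or more occurrences are dropped
                    nxt[(l2, q2)] = nxt.get((l2, q2), 0.0) + pr * pa
        state = nxt
    return 1.0 - sum(state.values())


uniform = {a: 0.25 for a in "ACGT"}   # hypothetical letter probabilities
print(p_value(["AAA", "AAC", "ACA", "ACC", "CCT"], uniform, n=100, k=3))
```
]]></p>
            <p>At most k times the number of nodes states are kept, and each iteration follows one transition per letter from each state, which mirrors the per-iteration cost stated above.</p>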
            <p>When several sets are involved, the number of nodes in the tree <inline-formula><m:math name="1748-7188-2-13-i26" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">T</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>&#8746;</m:mo><m:mo>&#8943;</m:mo><m:mo>&#8746;</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFtepvcqGGOaakcqWFlecsdaWgaaWcbaGaeGymaedabeaakiabgQIiilabl+UimjabgQIiilab=TqiinaaBaaaleaacqWGZbWCaeqaaOGaeiykaKcaaa@43E3@</m:annotation></m:semantics></m:math></inline-formula> becomes <it>O </it>(<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|) with <it>m </it>equal to the maximal length of a word in <inline-formula><m:math name="1748-7188-2-13-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#8459;</m:mi><m:mo>=</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mn>1</m:mn></m:msub><m:mo>&#8746;</m:mo><m:mo>&#8943;</m:mo><m:mo>&#8746;</m:mo><m:msub><m:mi>&#8459;</m:mi><m:mi>s</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecscqGH9aqpcqWFlecsdaWgaaWcbaGaeGymaedabeaakiabgQIiilabl+UimjabgQIiilab=TqiinaaBaaaleaacqWGZbWCaeqaaaaa@4249@</m:annotation></m:semantics></m:math></inline-formula>. Additional memory in each node is &#8719;<sub><it>i </it></sub><it>k</it><sub><it>i</it></sub>. Therefore, the time complexity is <it>O </it>(<it>nm</it>|&#931;|&#8719;<sub><it>i </it></sub><it>k</it><sub><it>i</it></sub>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|) and the space complexity is <it>O </it>(<it>m </it>&#8719;<sub><it>i </it></sub><it>k</it><sub><it>i </it></sub>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>|). In the Markov model of order <it>K</it>, one memorizes |&#931;|<sup><it>K - d </it></sup>predecessors for each node at depth <it>d</it>, 0 &#8804; <it>d </it>&lt;<it>K</it>. In other words, the number of classes becomes (<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| + <it>K</it>|&#931;|<sup><it>K</it></sup>). Therefore, the space complexity is <it>O </it>((<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| + <it>K </it>|&#931;|<sup><it>K</it></sup>) &#8719;<sub><it>i </it></sub><it>k</it><sub><it>i</it></sub>) and the running time is <it>O </it>(<it>n</it>|&#931;|(<it>m</it>|<inline-formula><m:math name="1748-7188-2-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8459;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFlecsaaa@3762@</m:annotation></m:semantics></m:math></inline-formula>| + <it>K </it>|&#931;|<sup><it>K </it></sup>)&#8719;<sub><it>i </it></sub><it>k</it><sub><it>i</it></sub>). This <it>additive </it>increment compares favorably to simple induction methods <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B53">53</abbr></abbrgrp> that introduce a <it>multiplicative O </it>(<it>K</it>|&#931;|<sup><it>K</it></sup>) factor in time and space complexity for the Markov(<it>K</it>) model.</p>
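            <p>The difference between the additive and the multiplicative dependence on the Markov order can be made concrete with a few numbers. The sketch below only evaluates the two bounds quoted above, ignoring constant factors and the common factor given by the product of the occurrence counts; the function name class_counts and the values m = 10 and 100 words are hypothetical and chosen purely for illustration.</p>
            <p><![CDATA[
```python
def class_counts(m, h, K, sigma):
    """Evaluate the two class-count bounds for a Markov model of order K.

    m: maximal word length, h: number of words in the set,
    sigma: alphabet size. Returns (additive, multiplicative).
    """
    tree_nodes = m * h                              # upper bound on the number of nodes
    additive = tree_nodes + K * sigma ** K          # tree approach described above
    multiplicative = tree_nodes * K * sigma ** K    # simple induction methods
    return additive, multiplicative


for K in (1, 2, 3):
    print(K, class_counts(m=10, h=100, K=K, sigma=4))
```
]]></p>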
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>We developed an algorithm for the exact calculation of the <it>p</it>-value for multiple occurrences of multiple motifs with possible overlaps. The running time is linear in the text length and depends on the alphabet size, the maximal motif length, the number of words in the motifs, and the number of occurrences of each motif. The algorithm was implemented in the AHOPRO software. Below we give examples of how <it>p</it>-values can be used for studying gene regulation <it>in silico</it>, particularly for selecting optimal cutoff values for motifs represented by PWMs. In the subsection <it>'Comparison with simulation and approximation methods' </it>we compare our <it>p</it>-value computations with the results of Monte Carlo simulations and the Poisson approximation. Our results confirm the accuracy of our algorithm and show in which cases the Poisson approximation <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B11">11</abbr></abbrgrp> cannot be employed. In the subsection <it>'Optimal cutoffs'</it>, we apply AHOPRO to choose an appropriate cutoff score for Position Weight Matrices. In the subsection <it>'Assessment of gene regulation'</it>, we show how AHOPRO can be used for studying regulatory regions containing heterotypic clusters of TFBSs to distinguish genes that are regulated by given transcription factors from those that are not.</p>
         <p>As a model example, in this section we use the data published in <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> on regulatory clusters in <it>D. melanogaster</it>. This compilation includes information on</p>
         <p>(i) known binding motifs for transcription factors,</p>
         <p>(ii) known CRM regions, and</p>
         <p>(iii) known regulatory interactions.</p>
         <sec>
            <st>
               <p>Comparison with simulation and approximation methods</p>
            </st>
            <p>In our first example we use the <it>even-skipped stripe 2 </it>enhancer (<it>eve2</it>) <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> of length 728 bp, which is known to contain binding sites for the TFs <it>bicoid, kruppel </it>and <it>hunchback</it>. Below we compare the <it>p</it>-values calculated by the AHOPRO program, and those calculated using the compound Poisson approximation, with <it>p</it>-values computed through Monte Carlo simulations.</p>
            <sec>
               <st>
                  <p>AhoPro and Monte Carlo comparisons</p>
               </st>
                <p>Table <tblr tid="T2">2</tblr> compares the <it>p</it>-values calculated with AHOPRO and with Monte Carlo simulation under the Bernoulli model M0. The corresponding results for the first-order Markov model M1 are displayed in Table <tblr tid="T3">3</tblr>. The letter probabilities for M0 and the transition matrix for M1 were estimated from the <it>eve2 </it>sequence. We used the PWM cutoff values taken from <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, i.e., 5.3, 5.0, and 6.2 for <it>bicoid, kruppel</it>, and <it>hunchback</it>, respectively. With these threshold values we found 3, 4, and 2 occurrences of the respective motifs in the <it>eve2 </it>sequence. Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr> list the <it>p</it>-values, i.e., the probabilities of finding at least the observed number of motif occurrences in a random text of length <it>L</it>, where <it>L </it>is the length of the <it>eve2 </it>enhancer. The number of Monte Carlo simulations was set to 10<sup>6 </sup>everywhere, except for the triplet (<it>bcd&amp;kr&amp;hb</it>), where we ran 10<sup>7 </sup>simulations. The probability of finding the observed numbers of occurrences of (<it>bcd&amp;kr&amp;hb</it>) simultaneously in the same simulated sequence is extremely low; we therefore increased the number of simulations so that the product of this probability and the number of simulations is greater than 1.</p>
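                <p>The Monte Carlo estimate used as a reference can be sketched as follows for the Bernoulli model M0. This is a minimal illustration under stated assumptions, not the simulation code used for the tables: motifs are given as explicit word lists rather than PWMs with cutoffs, the letter probabilities are a hypothetical uniform distribution rather than those estimated from the enhancer, and the function names are our own. The helper enough_trials encodes the rule that the product of the probability and the number of simulations should exceed 1.</p>
                <p><![CDATA[
```python
import random


def count_occurrences(text, words):
    """Count occurrences of any word of the set, overlaps included."""
    return sum(text.startswith(w, i) for i in range(len(text)) for w in words)


def monte_carlo_p_value(words, probs, n, k, trials, seed=0):
    """Estimate P(at least k occurrences) over random Bernoulli texts of length n."""
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[a] for a in letters]
    successes = 0
    for _ in range(trials):
        text = "".join(rng.choices(letters, weights=weights, k=n))
        if count_occurrences(text, words) >= k:
            successes += 1
    return successes / trials


def enough_trials(p, trials):
    """True if the expected number of successful simulations exceeds 1."""
    return p * trials > 1


uniform = {a: 0.25 for a in "ACGT"}   # hypothetical letter probabilities
estimate = monte_carlo_p_value(["AAC", "ACC"], uniform, n=200, k=8, trials=2000)
print(estimate, enough_trials(estimate, 2000))
```
]]></p>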
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Comparison of <it>p</it>-values calculated by the AHOPRO program, by Monte Carlo simulations and by compound Poisson distribution formula under the M0 model</p>
                  </caption>
                  <tblbdy cols="7">
                     <r>
                        <c ca="left">
                           <p>MOTIF, CUTOFF</p>
                        </c>
                        <c ca="center">
                           <p>OCC.</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO</p>
                        </c>
                        <c ca="center">
                           <p>MONTE CARLO</p>
                        </c>
                        <c ca="center">
                           <p>POISSON</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO/MC</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO/POISSON</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>bcd</it>, 5.3</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c ca="center">
                           <p>0.012</p>
                        </c>
                        <c ca="center">
                           <p>0.012</p>
                        </c>
                        <c ca="center">
                           <p>0.010</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>1.10</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>kr</it>, 5.0</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>0.0044</p>
                        </c>
                        <c ca="center">
                           <p>0.0044</p>
                        </c>
                        <c ca="center">
                           <p>0.0033</p>
                        </c>
                        <c ca="center">
                           <p>1.01</p>
                        </c>
                        <c ca="center">
                           <p>1.34</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>hb</it>, 6.2</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>0.013</p>
                        </c>
                        <c ca="center">
                           <p>0.013</p>
                        </c>
                        <c ca="center">
                           <p>0.012</p>
                        </c>
                        <c ca="center">
                           <p>0.99</p>
                        </c>
                        <c ca="center">
                           <p>1.04</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>bcd &amp; kr</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>3&amp;4</p>
                        </c>
                        <c ca="center">
                           <p>0.00025</p>
                        </c>
                        <c ca="center">
                           <p>0.00026</p>
                        </c>
                        <c ca="center">
                           <p>3.6E-05</p>
                        </c>
                        <c ca="center">
                           <p>0.99</p>
                        </c>
                        <c ca="center">
                           <p>7.10</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>bcd &amp; kr &amp; hb</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>3&amp;4&amp;2</p>
                        </c>
                        <c ca="center">
                           <p>6.54E-06</p>
                        </c>
                        <c ca="center">
                           <p>5.8E-06</p>
                        </c>
                        <c ca="center">
                           <p>4.34E-07</p>
                        </c>
                        <c ca="center">
                           <p>1.13</p>
                        </c>
                        <c ca="center">
                           <p>7.13</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Comparison of <it>p</it>-values calculated for the Markov(0) model by the AHOPRO program with <it>p</it>-values calculated by Monte Carlo simulations and by the compound Poisson formula for motifs of the <it>D. melanogaster</it> developmental transcription factors <it>bicoid</it>, <it>kruppel</it> and <it>hunchback</it>.</p>
                  </tblfn>
               </tbl>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Comparison of <it>p</it>-values calculated by the AHOPRO program, by Monte Carlo simulation and by compound Poisson distribution formula under the M1 model</p>
                  </caption>
                  <tblbdy cols="7">
                     <r>
                        <c ca="left">
                           <p>MOTIF, CUTOFF</p>
                        </c>
                        <c ca="center">
                           <p>OCC.</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO</p>
                        </c>
                        <c ca="center">
                           <p>MONTE CARLO</p>
                        </c>
                        <c ca="center">
                           <p>POISSON</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO/MC</p>
                        </c>
                        <c ca="center">
                           <p>AHOPRO/POISSON</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>bcd</it>, 5.3</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c ca="center">
                           <p>0.013</p>
                        </c>
                        <c ca="center">
                           <p>0.014</p>
                        </c>
                        <c ca="center">
                           <p>0.012</p>
                        </c>
                        <c ca="center">
                           <p>0.998</p>
                        </c>
                        <c ca="center">
                           <p>1.11</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>kr</it>, 5.0</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>0.011</p>
                        </c>
                        <c ca="center">
                           <p>0.011</p>
                        </c>
                        <c ca="center">
                           <p>0.008</p>
                        </c>
                        <c ca="center">
                           <p>1.01</p>
                        </c>
                        <c ca="center">
                           <p>1.43</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>hb</it>, 6.2</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.11</p>
                        </c>
                        <c ca="center">
                           <p>0.9987</p>
                        </c>
                        <c ca="center">
                           <p>1.25</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>bcd &amp; kr</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>3&amp;4</p>
                        </c>
                        <c ca="center">
                           <p>0.00051</p>
                        </c>
                        <c ca="center">
                           <p>0.00051</p>
                        </c>
                        <c ca="center">
                           <p>9.62E-05</p>
                        </c>
                        <c ca="center">
                           <p>0.9991</p>
                        </c>
                        <c ca="center">
                           <p>5.34</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>bcd &amp; kr &amp; hb</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>3&amp;4&amp;2</p>
                        </c>
                        <c ca="center">
                           <p>6.9E-05</p>
                        </c>
                        <c ca="center">
                           <p>6.97E-05</p>
                        </c>
                        <c ca="center">
                           <p>1.08E-05</p>
                        </c>
                        <c ca="center">
                           <p>0.9889</p>
                        </c>
                        <c ca="center">
                           <p>6.36</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Comparison of <it>p</it>-values calculated by the AHOPRO program for the Markov(1) model with those calculated by Monte Carlo simulations and by the compound Poisson formula for motifs of the <it>D. melanogaster</it> developmental transcription factors <it>bicoid</it>, <it>kruppel</it>, and <it>hunchback</it>.</p>
                  </tblfn>
               </tbl>
               <p>The AHOPRO <it>p</it>-values agree closely with those obtained from simulated random sequences (Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>), which confirms the accuracy of our algorithm: the AHOPRO/MC ratio differs from 1 by about 1% or less in every row except the triplet under M0 (1.13).</p>
            </sec>
            <sec>
               <st>
                  <p>Poisson approximation</p>
               </st>
               <p>In practical applications, the compound Poisson distribution <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> is widely used to assess <it>p</it>-values of multiple motif occurrences <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B8">8</abbr><abbr bid="B34">34</abbr><abbr bid="B65">65</abbr></abbrgrp>. Here we apply it to compute the probability of observing the given number of motif occurrences when the probabilities of individual words are calculated under the M0 or M1 model described above. The corresponding columns of Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr> show that the Poisson approximation can substantially underestimate the <it>p</it>-value: the AHOPRO value exceeds the Poisson value by a factor of 1.04 to 1.43 for single motifs and by a factor of about 5 to 7 for motif combinations. This most probably happens because the Poisson approximation ignores possible overlaps between motif occurrences and treats the occurrences as independent. The error increases when the <it>p</it>-value is calculated for simultaneous occurrences of several factors, as in the last two rows of each table. In this case, the Poisson approximation <it>p</it>-value for a combination of several TFs is calculated as the product of the <it>p</it>-values calculated independently for each TF. In reality, motif occurrences can overlap, especially when the motifs resemble each other; the independence assumption is then violated, which causes the error.</p>
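               <p>The product rule described above can be sketched as follows. This is a simplified illustration only: it uses a plain Poisson tail rather than the compound Poisson distribution of the cited works, and the function names and the rate values are hypothetical.</p>

```python
import math


def poisson_tail(lam, k):
    """P(N >= k) for N ~ Poisson(lam), i.e. one minus the mass below k."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def independent_joint_pvalue(lams, ks):
    """Joint p-value under the independence assumption: product of the tails."""
    p = 1.0
    for lam, k in zip(lams, ks):
        p *= poisson_tail(lam, k)
    return p


# Hypothetical expected counts for two motifs and their observed counts.
p_joint = independent_joint_pvalue([0.5, 0.4], [3, 4])
```

               <p>The tabulated Poisson values follow the same product pattern: multiplying the rounded single-motif values of Table 3 for <it>bcd</it> and <it>kr</it>, 0.012 and 0.008, gives 9.6E-05, in line with the 9.62E-05 reported for the pair.</p>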
            </sec>
         </sec>
         <sec>
            <st>
               <p>Optimal cutoffs</p>
            </st>
            <p>Below, we use AHOPRO to determine the optimal cutoff values for PWMs of regulatory factors, given the sequence of a regulatory region presumed to interact with these factors. The distribution of occurrences of TF binding sites in the corresponding experimentally confirmed regulatory regions is strongly biased <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. In CRMs, binding sites tend to occur in clusters, which is not the case for random sequences.</p>
            <p>Different cutoff values yield different numbers of putative binding sites of different quality. The higher the cutoff value, the closer the motif occurrences are to the consensus and the fewer occurrences there are. Therefore, for a given factor it is reasonable to select the cutoff value that minimizes the probability of finding, in a random sequence, at least the number of motif occurrences observed in the regulatory region.</p>
            <p>As an example, we again considered the transcription factors <it>bicoid</it> and <it>kruppel</it>, which are known to regulate the <it>even-skipped stripe 2</it> (<it>eve2</it>) enhancer. To select the optimal cutoff values we used the following procedure. First, for cutoff values varied from 3 to 8.5, we counted the motif occurrences in the <it>eve2</it> sequence with a score greater than the cutoff. Thus, each pair of cutoff values (<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>) corresponds to (<it>k</it><sub>1</sub>, <it>k</it><sub>2</sub>) occurrences of the <it>bicoid</it> and <it>kruppel</it> motifs, respectively. Second, for each pair (<it>k</it><sub>1</sub>, <it>k</it><sub>2</sub>) we computed the <it>p</it>-value <it>P</it><sub><it>n</it></sub>(<it>k</it><sub>1</sub>(<it>S</it><sub>1</sub>), <it>k</it><sub>2</sub>(<it>S</it><sub>2</sub>)), denoted below as <it>P</it>(<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>): the probability of obtaining at least <it>k</it><sub>1</sub> occurrences of <it>bicoid</it> with scores greater than <it>S</it><sub>1</sub> and at least <it>k</it><sub>2</sub> occurrences of <it>kruppel</it> with scores greater than <it>S</it><sub>2</sub>. Figure <figr fid="F3">3</figr> shows a 3D surface whose points (<it>x, y, z</it>) correspond to (<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>, -log<sub>10</sub> <it>P</it>(<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>)), i.e., the cutoff value for the <it>bicoid</it> motif, the cutoff value for the <it>kruppel</it> motif, and the negative decimal logarithm of the corresponding <it>p</it>-value calculated for the M1 model. The view of the surface from above is shown in Figure <figr fid="F3">3C</figr>. The maximal value of -log<sub>10</sub> <it>P</it>(<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>), 6.3044, is attained at the <it>bicoid</it> cutoff <it>S</it><sub>1</sub> = 5.1 and the <it>kruppel</it> cutoff <it>S</it><sub>2</sub> = 5.6. With these cutoff values, the <it>eve2</it> enhancer sequence contains <it>k</it><sub>1</sub> = 6 and <it>k</it><sub>2</sub> = 4 occurrences of the <it>bicoid</it> and <it>kruppel</it> motifs defined by the corresponding PWMs. We believe that the sites found at this optimal <it>p</it>-value are the best candidates for functional TF binding sites.</p>
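            <p>The cutoff selection procedure can be sketched as a grid search. The sketch below is illustrative only: <it>best_cutoffs</it>, its callback arguments and the toy values are hypothetical names and data, the <it>p</it>-value computation itself (performed by AHOPRO in this work) is abstracted as a function supplied by the caller, and the grid step is an assumption.</p>

```python
import math


def best_cutoffs(cutoffs_1, cutoffs_2, count_1, count_2, joint_pvalue):
    """Return (S1, S2, -log10 P) with the largest -log10 P on the cutoff grid.

    count_1(S1) and count_2(S2) give the observed numbers of occurrences
    scoring above each cutoff; joint_pvalue(S1, k1, S2, k2) must return a
    strictly positive probability (hypothetical stand-in for an exact method).
    """
    best = None
    for s1 in cutoffs_1:
        for s2 in cutoffs_2:
            k1, k2 = count_1(s1), count_2(s2)
            score = -math.log10(joint_pvalue(s1, k1, s2, k2))
            if best is None or score > best[2]:
                best = (s1, s2, score)
    return best


# Toy usage with made-up site scores and a made-up p-value function.
site_scores_1 = [5.5, 5.2, 3.1]
site_scores_2 = [6.0, 4.1]
grid = [3.0 + 0.5 * i for i in range(12)]  # 3.0 .. 8.5, step 0.5 assumed
result = best_cutoffs(
    grid, grid,
    lambda s: sum(1 for x in site_scores_1 if x > s),
    lambda s: sum(1 for x in site_scores_2 if x > s),
    lambda s1, k1, s2, k2: 0.5 ** (k1 * s1 + k2 * s2),
)
```

            <p>An exhaustive search is a natural choice here because the number of occurrences changes in steps as the cutoff varies, so the resulting surface is not smooth.</p>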
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>P-value distribution for <it>eve2 </it>and random sequences</p>
               </caption>
               <text>
                  <p><b>P-value distribution for <it>eve2 </it>and random sequences</b>. Distribution of log<sub>10 </sub>(<it>Pvalue</it>) calculated for the M1 model as a function of cutoff values for PWMs for BICOID and KRUPPEL in the <it>even-skipped stripe 2 </it>enhancer (A), in a random sequence (B). View from above: eve2 sequence (C), random sequence (D).</p>
               </text>
               <graphic file="1748-7188-2-13-3"/>
            </fig>
            <p>For comparison, we simulated random sequences with the same length as the <it>eve2</it> enhancer and the same dinucleotide probabilities. In most of the simulated sequences, with the cutoff values for <it>bicoid</it> and <it>kruppel</it> set to (<it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>) = (5.1, 5.6), we found no more than one occurrence of each motif. The average number of occurrences is 0.54 for <it>bicoid</it> and 0.31 for <it>kruppel</it>. The average <it>p</it>-value is 0.633. We took one of the random sequences and compared the <it>p</it>-values calculated for various cutoff values in this random sequence (Figures <figr fid="F3">3B, 3D</figr>) with those in the real biological sequence of the <it>eve2</it> enhancer (Figures <figr fid="F3">3A, 3C</figr>). There are two major differences between the <it>p</it>-value distributions in the regulated sequence and in the random sequence. First, the <it>p</it>-values in the random sequence are much greater than those in the enhancer sequence. In particular, the maximal &#8211; log(<it>pvalue</it>) for this random sequence is about 1.02, which is 6.17 times smaller than the maximal &#8211; log(<it>pvalue</it>) for the enhancer sequence (see also Table <tblr tid="T4">4</tblr>). Second, the shapes of the <it>p</it>-value distributions differ. For the enhancer sequence, there are only a few distinct peaks, at (4.3, 5.6), (4.3, 6.8), (5.1, 5.6), and (5.1, 6.8), whereas for the random sequence we see ridges between (2.2, 2.0) and (2.2, 4.8), and between (2.8, 2.0) and (2.8, 4.8). As expected, it is impossible to choose an appropriate cutoff for the PWMs of the factors from the random sequence data (Figures <figr fid="F3">3B</figr> and <figr fid="F3">3D</figr>).</p>
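            <p>One way to generate random sequences that preserve dinucleotide statistics is to simulate a first-order Markov chain whose transition probabilities are estimated from the reference sequence. The sketch below takes this approach as an assumption; the text does not specify the generator actually used, and the function names and the toy sequence are hypothetical.</p>

```python
import random
from collections import Counter, defaultdict


def estimate_m1(seq):
    """Transition probabilities P(next | current) from adjacent letter pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(seq, seq[1:]):
        pair_counts[current][following] += 1
    return {current: {b: n / sum(row.values()) for b, n in row.items()}
            for current, row in pair_counts.items()}


def simulate_m1(transitions, start, length, seed=0):
    """Random text of the given length (at least 1) from the M1 model.

    Assumes every letter that can be reached has at least one outgoing
    transition in `transitions`; otherwise a KeyError is raised.
    """
    rng = random.Random(seed)
    text = [start]
    for _ in range(length - 1):
        row = transitions[text[-1]]
        letters = list(row)
        weights = [row[b] for b in letters]
        text.append(rng.choices(letters, weights=weights)[0])
    return "".join(text)


# Toy usage: the reference sequence is illustrative, not the eve2 enhancer.
reference = "ACGTACGGTACA"
random_text = simulate_m1(estimate_m1(reference), reference[0], len(reference))
```

            <p>In this sketch the first letter is fixed to that of the reference sequence for simplicity; drawing it from the letter frequencies would be an equally reasonable choice.</p>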
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Comparison of <it>p</it>-values and cutoff for different sets of DNA sequences</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c ca="left">
                        <p>regulatory regions regulated by bicoid</p>
                     </c>
                     <c ca="center">
                        <p>minimal <it>p</it>-value</p>
                     </c>
                     <c ca="center">
                        <p>Cut-off</p>
                     </c>
                     <c ca="left">
                        <p>regulatory regions not regulated by bicoid</p>
                     </c>
                     <c ca="center">
                        <p>minimal <it>p</it>-value</p>
                     </c>
                     <c ca="center">
                        <p>Cut-off</p>
                     </c>
                     <c ca="left">
                        <p>random seq.</p>
                     </c>
                     <c ca="center">
                        <p>minimal <it>p</it>-value</p>
                     </c>
                     <c ca="center">
                        <p>Cut-off</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Btd crm</p>
                     </c>
                     <c ca="center">
                        <p>3.24E-05</p>
                     </c>
                     <c ca="center">
                        <p>3.4</p>
                     </c>
                     <c ca="left">
                        <p>Gt p. enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.023</p>
                     </c>
                     <c ca="center">
                        <p>2.7</p>
                     </c>
                     <c ca="left">
                        <p>seq. 1</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="center">
                        <p>2.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hb P2</p>
                     </c>
                     <c ca="center">
                        <p>4.13E-05</p>
                     </c>
                     <c ca="center">
                        <p>3.7</p>
                     </c>
                     <c ca="left">
                        <p>Hb upstream enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.053</p>
                     </c>
                     <c ca="center">
                        <p>4.4</p>
                     </c>
                     <c ca="left">
                        <p>seq. 2</p>
                     </c>
                     <c ca="center">
                        <p>0.12</p>
                     </c>
                     <c ca="center">
                        <p>1.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Kni cis element</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                     </c>
                     <c ca="center">
                        <p>5.3</p>
                     </c>
                     <c ca="left">
                        <p>Eve stripe 4+6 enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>3.6</p>
                     </c>
                     <c ca="left">
                        <p>seq. 3</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Kr CD-1 enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.0001</p>
                     </c>
                     <c ca="center">
                        <p>5.1</p>
                     </c>
                     <c ca="left">
                        <p>Eve stripe 3+7 enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.58</p>
                     </c>
                     <c ca="center">
                        <p>2.5</p>
                     </c>
                     <c ca="left">
                        <p>seq. 4</p>
                     </c>
                     <c ca="center">
                        <p>0.065</p>
                     </c>
                     <c ca="center">
                        <p>1.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Otd early enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.024</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>Ftz upstream enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.037</p>
                     </c>
                     <c ca="center">
                        <p>5.8</p>
                     </c>
                     <c ca="left">
                        <p>seq. 5</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sal blastoder. enh.</p>
                     </c>
                     <c ca="center">
                        <p>8.62E-04</p>
                     </c>
                     <c ca="center">
                        <p>6.5</p>
                     </c>
                     <c ca="left">
                        <p>Ftz</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>3.3</p>
                     </c>
                     <c ca="left">
                        <p>seq. 6</p>
                     </c>
                     <c ca="center">
                        <p>0.0087</p>
                     </c>
                     <c ca="center">
                        <p>3.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tll PD enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.26</p>
                     </c>
                     <c ca="center">
                        <p>4.2</p>
                     </c>
                     <c ca="left">
                        <p>Ubx PBX enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.196</p>
                     </c>
                     <c ca="center">
                        <p>6.7</p>
                     </c>
                     <c ca="left">
                        <p>seq. 7</p>
                     </c>
                     <c ca="center">
                        <p>0.024</p>
                     </c>
                     <c ca="center">
                        <p>2.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tll AD+PD enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.025</p>
                     </c>
                     <c ca="center">
                        <p>8.1</p>
                     </c>
                     <c ca="left">
                        <p>Ubx BXD enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.698</p>
                     </c>
                     <c ca="center">
                        <p>4.6</p>
                     </c>
                     <c ca="left">
                        <p>seq. 8</p>
                     </c>
                     <c ca="center">
                        <p>0.17</p>
                     </c>
                     <c ca="center">
                        <p>3.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Eve stripe 2 enh.</p>
                     </c>
                     <c ca="center">
                        <p>4.04E-05</p>
                     </c>
                     <c ca="center">
                        <p>5.1</p>
                     </c>
                     <c ca="left">
                        <p>Ubx BX enh. (BRE)</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>7.5</p>
                     </c>
                     <c ca="left">
                        <p>seq. 9</p>
                     </c>
                     <c ca="center">
                        <p>0.092</p>
                     </c>
                     <c ca="center">
                        <p>2.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Eve stripe 1 enh.</p>
                     </c>
                     <c ca="center">
                        <p>8.09E-06</p>
                     </c>
                     <c ca="center">
                        <p>5.2</p>
                     </c>
                     <c ca="left">
                        <p>Ems upstream enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.276</p>
                     </c>
                     <c ca="center">
                        <p>4.4</p>
                     </c>
                     <c ca="left">
                        <p>seq. 10</p>
                     </c>
                     <c ca="center">
                        <p>0.052</p>
                     </c>
                     <c ca="center">
                        <p>3.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Eve stripe 5 enh.</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>3.8</p>
                     </c>
                     <c ca="left">
                        <p>En stripe enh. (intr. 1)</p>
                     </c>
                     <c ca="center">
                        <p>0.049</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>seq. 11</p>
                     </c>
                     <c ca="center">
                        <p>0.13</p>
                     </c>
                     <c ca="center">
                        <p>1.7</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Median</p>
                     </c>
                     <c ca="center">
                        <p>8.62E-04</p>
                     </c>
                     <c ca="center">
                        <p>5.1</p>
                     </c>
                     <c ca="left">
                        <p>Median</p>
                     </c>
                     <c ca="center">
                        <p>0.196</p>
                     </c>
                     <c ca="center">
                        <p>4.4</p>
                     </c>
                     <c ca="left">
                        <p>Median</p>
                     </c>
                     <c ca="center">
                        <p>0.1128</p>
                     </c>
                     <c ca="center">
                        <p>2.6</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Comparison of minimal <it>p</it>-values and the corresponding best cutoffs for the <it>bicoid</it> PWM, calculated (i) in regulatory regions that are regulated by <it>bicoid</it>, (ii) in regulatory regions that are not regulated by <it>bicoid</it>, and (iii) in random sequences of the same length and with the same dinucleotide distribution as the <it>even-skipped stripe 2 </it>enhancer.</p>
               </tblfn>
            </tbl>
            <p>We would also like to address the choice between the M0 and M1 models. We observed that in almost all cases the <it>p</it>-value calculated for the M0 model is smaller than the <it>p</it>-value calculated for the M1 model. This can probably be explained by the fact that the M1 model takes into account more information about the real sequence than the M0 model does. Nevertheless, the difference is not crucial; for instance, the greatest ratio between the <it>p</it>-values calculated under the M0 and M1 models for <it>bicoid </it>and <it>kruppel </it>is about 3.62 for the <it>eve2 </it>enhancer. Thus, the M0 model can serve equally well in practical applications.</p>
         </sec>
         <sec>
            <st>
               <p>Assessment of gene regulation</p>
            </st>
            <p>Enhancers may contain clusters of TF binding sites for gene regulators. In such cases, <it>p</it>-value computation can be used to distinguish genes that are regulated by a given transcription factor from those that are not. To illustrate this, we took the PWM for the TF <it>bicoid </it>and calculated <it>p</it>-values for different cutoff values in three sets of sequences:</p>
            <p>- regulatory regions which are regulated by <it>bicoid</it>, the positive set;</p>
            <p>- regulatory regions which are not regulated by <it>bicoid</it>, the negative set;</p>
            <p>- random sequences of the same length as the <it>eve2 </it>enhancer and with the same dinucleotide distribution, the random set.</p>
            <p>The minimal <it>p</it>-value and the corresponding cutoff value for each of the 11 sequences in each set are presented in Table <tblr tid="T4">4</tblr>. Comparing the <it>p</it>-values, we observed that those calculated for the positive set were generally much smaller than those calculated for the negative and random sets.</p>
            <p>The median <it>p</it>-value in the positive set is 8.62E-04. There are some exceptions, however: for instance, the <it>tailless PD </it>enhancer, with a minimal <it>p</it>-value of 0.26, and the <it>even-skipped stripe 5 </it>enhancer, with a minimal <it>p</it>-value of 0.27. Although these genes are reported to be regulated by <it>bicoid </it>and these sequences contain experimentally confirmed individual <it>bicoid </it>binding sites, the sequences do not contain clusters of <it>bicoid </it>binding sites.</p>
            <p>Most <it>p</it>-values calculated for the negative set (the second set in Table <tblr tid="T4">4</tblr>) are considerably higher than the <it>p</it>-values calculated for the positive set. However, we observed rather small <it>p</it>-values for the <it>giant posterior </it>enhancer (0.023), the <it>hunchback </it>upstream enhancer (0.053), the <it>fushi tarazu </it>upstream enhancer (0.037), the <it>ultrabithorax BX </it>enhancer (0.05), and the <it>engrailed </it>stripe enhancer (0.049). We believe that this can be explained by the fact that these regions contain clusters of binding sites of regulatory factors whose motifs are similar to the <it>bicoid </it>motif. Indeed, it was experimentally shown that TF <it>kruppel </it>regulates the <it>giant </it>posterior enhancer, TF <it>tailless </it>regulates the <it>hunchback </it>upstream enhancer and the <it>ultrabithorax BX </it>enhancer, and TF <it>fushi tarazu </it>regulates the <it>fushi tarazu </it>upstream enhancer, the <it>ultrabithorax BX </it>enhancer and the <it>engrailed </it>stripe enhancer. The motifs of <it>kruppel, tailless </it>and <it>fushi tarazu </it>all exhibit some similarity to the <it>bicoid </it>motif. This observation shows the need for some sort of conditional <it>p</it>-values in order to distinguish between true <it>bicoid </it>clusters and clusters of weak <it>bicoid </it>sites induced by the presence of clusters of other TF sites <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. Moreover, the apparent false positive hit (<it>p</it>-value = 0.05, cutoff = 7.5) in a region that was not reported to be regulated by <it>bicoid </it>seems to be related to real <it>bicoid </it>binding, although this binding is not necessarily functional.</p>
            <p>For the random set, i.e., sequences simulated with the same dinucleotide probabilities as in the <it>even-skipped stripe 2 </it>enhancer, we observe a rather broad range of minimal <it>p</it>-values, from 0.0087 for the 6th sample to 0.25 for the 3rd sample. This shows that the predictive power of this approach is limited to the case of regulatory sequences containing clusters of motifs.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this work we have developed an algorithm, inspired by the Aho-Corasick pattern matching algorithm, that allows precise calculation of the probability of finding a given motif conformation in a random text. It was implemented in the AHOPRO software for the Bernoulli model and the Markov model of order 1 of random sequences. There would be no difficulty in extending our approach to Markov models of order <it>k</it>, <it>k </it>> 1. We compared probabilities computed with AHOPRO with those computed using the compound Poisson distribution and showed that, in the case of multiple occurrences of multiple motifs, the Poisson approximation often substantially underestimates the <it>p</it>-value.</p>
         <p>As we have demonstrated, the statistical significance of multiple motif occurrences in a text can be efficiently calculated with a simple algorithm. This can provide an independent criterion for improving the results of site extraction algorithms, which still perform rather poorly. P-values and E-values are used in programs such as BLAST and are quantities to which practicing biologists are accustomed. Thus, adopting this measure for motif extraction (for single or multiple motif occurrences) would greatly help users who employ motif extraction analysis as a preliminary stage for experiments in the lab. On the other hand, our algorithm is not tied to a particular motif extraction program, and it takes the most general motif representation, the list of allowed words <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, as input. Thus, it can be used when the results of several motif extraction algorithms are compared, for instance in the interpretation of ChIP-chip experiments <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. In addition, our algorithm AHOPRO can easily be extended to amino acid sequences and applied to the identification of protein domain signatures.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>VM initiated the study by pointing out the biological problem. JC suggested the initial idea of using the Aho-Corasick structure. The final version of the algorithm was developed in discussions between JC, VB, MR and MAR. JC and VB developed the implementation. VB obtained the results on simulated and biological sequences. VB designed the web site. MR, MAR, VB and VM participated in manuscript writing. MR and VM coordinated the study. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Thanks to Andrey Mironov, Stephen Small, Dmitri Papatsenko, Bruno Salvy and Philippe Flajolet for helpful comments and suggestions. Thanks to Alexander Favorov for help with the programming. Thanks to Tim Barker for correcting the English in the manuscript. This research was partially supported by INTAS #04-83-3994 and #05-1000008-8028, French Program EcoNet-12635WG, the RFBR grants 07-04-01584 and 06-04-49249, and by Russian Federation Agency in Science and Innovation State Contract 02.531.11.9003.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Practical strategies for discovering regulatory DNA sequence motifs</p>
            </title>
            <aug>
               <au>
                  <snm>MacIsaac</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Fraenkel</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>e36</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1447654</pubid>
                  <pubid idtype="pmpid" link="fulltext">16683017</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020036</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A survey of motif discovery methods in an integrated framework</p>
            </title>
            <aug>
               <au>
                  <snm>Sandve</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Drablos</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Biol Direct</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <fpage>11</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1479319</pubid>
                  <pubid idtype="pmpid" link="fulltext">16600018</pubid>
                  <pubid idtype="doi">10.1186/1745-6150-1-11</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Computational approaches to identify promoters and cis-regulatory elements in plant genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Florquin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lescot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marchal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>van de Peer</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2003</pubdate>
            <volume>132</volume>
            <issue>3</issue>
            <fpage>1162</fpage>
            <lpage>1176</lpage>
            <note>Review.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">167057</pubid>
                  <pubid idtype="pmpid" link="fulltext">12857799</pubid>
                  <pubid idtype="doi">10.1104/pp.102.017715</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>DNA microarray technologies for measuring protein-DNA interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Bulyk</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Curr Opin Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>17</volume>
            <issue>4</issue>
            <fpage>422</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.copbio.2006.06.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">16839757</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Transcriptional regulatory code of a eukaryotic genome</p>
            </title>
            <aug>
               <au>
                  <snm>Harbison</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Rinaldi</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>MacIsaac</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Danford</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hannett</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Tagne</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Reynolds</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Zeitlinger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pokholok</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rolfe</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Takusagawa</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Gifford</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Fraenkel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>99</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02800</pubid>
                  <pubid idtype="pmpid" link="fulltext">15343339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Discovering functional transcription-factor combinations in the human cell cycle</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Shendure</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>6</issue>
            <fpage>848</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1142475</pubid>
                  <pubid idtype="pmpid" link="fulltext">15930495</pubid>
                  <pubid idtype="doi">10.1101/gr.3394405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A self-organizing system of repressor gradients establishes segmental complexity in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Clyde</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Corado</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Pare</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Small</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>426</volume>
            <issue>6968</issue>
            <fpage>849</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02189</pubid>
                  <pubid idtype="pmpid" link="fulltext">14685241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <issue>10</issue>
            <fpage>776</fpage>
            <lpage>784</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.10.776</pubid>
                  <pubid idtype="pmpid" link="fulltext">10705431</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Homotypic regulatory clusters in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Lifanov</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Nazina</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>579</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430164</pubid>
                  <pubid idtype="pmpid" link="fulltext">12670999</pubid>
                  <pubid idtype="doi">10.1101/gr.668403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>New computational approaches for analysis of cis-regulatory networks</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Rust</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Schilstra</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>De Buysscher</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Griffin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wold</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Bolouri</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2002</pubdate>
            <volume>246</volume>
            <fpage>86</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2002.0619</pubid>
                  <pubid idtype="pmpid" link="fulltext">12027436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A computational genomics approach to the identification of gene networks</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <issue>18</issue>
            <fpage>3594</fpage>
            <lpage>3604</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146952</pubid>
                  <pubid idtype="pmpid" link="fulltext">9278479</pubid>
                  <pubid idtype="doi">10.1093/nar/25.18.3594</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Control of tailless expression by bicoid, dorsal and synergistically interacting terminal system regulatory elements</p>
            </title>
            <aug>
               <au>
                  <snm>Liaw</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Lengyel</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Mech Dev</source>
            <pubdate>1993</pubdate>
            <volume>40</volume>
            <issue>1&#8211;2</issue>
            <fpage>47</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0925-4773(93)90087-E</pubid>
                  <pubid idtype="pmpid">8443106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Cooperative interactions between paired domain and homeodomain</p>
            </title>
            <aug>
               <au>
                  <snm>Jun</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Desplan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1996</pubdate>
            <volume>122</volume>
            <issue>9</issue>
            <fpage>2639</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8787739</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>[Constructive synergism of regulatory genes expressed in the course of the eye and muscle development and regeneration]</p>
            </title>
            <aug>
               <au>
                  <snm>Mitashev</snm>
                  <fnm>VI</fnm>
               </au>
               <au>
                  <snm>Koussoulakos</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zinov'eva</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Ozerniuk</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Mikaelian</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Shmukler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Smirnova Iu</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Izv Akad Nauk Ser Biol</source>
            <pubdate>2001</pubdate>
            <issue>3</issue>
            <fpage>261</fpage>
            <lpage>75</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11433936</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Regulatory modules shared within gene classes as well as across gene classes can be detected by the same in silico approach</p>
            </title>
            <aug>
               <au>
                  <snm>Klingenhoff</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Frech</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Werner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>2002</pubdate>
            <volume>2</volume>
            <fpage>S17</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11808874</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Identifying combinatorial regulation of transcription factors and binding motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Kato</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hata</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Banerjee</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Futcher</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>MQ</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>8</issue>
            <fpage>R56</fpage>
            <note>Epub 2004 Jul 28.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">507881</pubid>
                  <pubid idtype="pmpid" link="fulltext">15287978</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-8-r56</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Combinatorial motif analysis and hypothesis generation on a genomic scale</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Sandmeyer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McLaughlin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kibler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>3</issue>
            <fpage>222</fpage>
            <lpage>32</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.3.222</pubid>
                  <pubid idtype="pmpid" link="fulltext">10869015</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes</p>
            </title>
            <aug>
               <au>
                  <snm>Jegga</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Sherwood</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Carman</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Pinski</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Pestian</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Aronow</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>9</issue>
            <fpage>1408</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186658</pubid>
                  <pubid idtype="pmpid" link="fulltext">12213778</pubid>
                  <pubid idtype="doi">10.1101/gr.255002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Identification of the binding sites of regulatory proteins in bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rhodius</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <issue>18</issue>
            <fpage>11772</fpage>
            <lpage>11777</lpage>
            <note>Epub 2002 Aug 14.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">129344</pubid>
                  <pubid idtype="pmpid" link="fulltext">12181488</pubid>
                  <pubid idtype="doi">10.1073/pnas.112341999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A regulatory code for neurogenic gene expression in the Drosophila embryo</p>
            </title>
            <aug>
               <au>
                  <snm>Markstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zinzen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Markstein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yee</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Erives</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stathopoulos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2004</pubdate>
            <volume>131</volume>
            <issue>10</issue>
            <fpage>2387</fpage>
            <lpage>2394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/dev.01124</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128669</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Distance preferences in distribution of binding motifs and hierarchical levels in organization of transcription regulatory information</p>
            </title>
            <aug>
               <au>
                  <snm>Makeev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lifanov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nazina</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>20</issue>
            <fpage>6016</fpage>
            <lpage>6026</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">219477</pubid>
                  <pubid idtype="pmpid" link="fulltext">14530449</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg799</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Exploring genetic regulatory networks in metazoan development: methods and models</p>
            </title>
            <aug>
               <au>
                  <snm>Halfon</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Michelson</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2002</pubdate>
            <volume>10</volume>
            <issue>3</issue>
            <fpage>131</fpage>
            <lpage>143</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12209016</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>ClusterDraw web server: a tool to identify and visualize clusters of binding motifs for transcription factors</p>
            </title>
            <aug>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>8</issue>
            <fpage>1032</fpage>
            <lpage>1034</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm047</pubid>
                  <pubid idtype="pmpid" link="fulltext">17308342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Computational detection of cis-regulatory modules</p>
            </title>
            <aug>
               <au>
                  <snm>Aerts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van Loo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 2</issue>
            <fpage>II5</fpage>
            <lpage>II14</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1052</pubid>
                  <pubid idtype="pmpid" link="fulltext">14534164</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Searching for statistically significant regulatory modules</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 2</issue>
            <fpage>II16</fpage>
            <lpage>II25</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1054</pubid>
                  <pubid idtype="pmpid" link="fulltext">14534166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Laverty</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>9</issue>
            <fpage>R61</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522868</pubid>
                  <pubid idtype="pmpid" link="fulltext">15345045</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-9-r61</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Detection of cis-element clusters in higher eukaryotic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Frith</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>10</issue>
            <fpage>878</fpage>
            <lpage>889</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.10.878</pubid>
                  <pubid idtype="pmpid" link="fulltext">11673232</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Cluster-Buster: Finding dense clusters of motifs in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3666</fpage>
            <lpage>3668</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkg540</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824389</pubid>
                  <pubid idtype="pmcid">168947</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Target Explorer: an automated tool for the identification of new target genes for a specified set of transcription factors</p>
            </title>
            <aug>
               <au>
                  <snm>Sosinsky</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bonin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Honig</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3589</fpage>
            <lpage>3592</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168951</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824372</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg544</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Searching for transcription factor binding site clusters: how true are true positives?</p>
            </title>
            <aug>
               <au>
                  <snm>Krivan</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>2</issue>
            <fpage>413</fpage>
            <lpage>416</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S021972000400065X</pubid>
                  <pubid idtype="pmpid" link="fulltext">15297989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Extraction of Functional Binding Sites from Unique Regulatory Regions: The <it>Drosophila</it> Early Developmental Enhancers</p>
            </title>
            <aug>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lifanov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nazina</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Desplan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>470</fpage>
            <lpage>481</lpage>
            <note>[Preliminary version in Drosophila Workshop, Washington 2001; article published online before print in February 2002].</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155290</pubid>
                  <pubid idtype="pmpid" link="fulltext">11875036</pubid>
                  <pubid idtype="doi">10.1101/gr.212502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Genome-wide Analysis of Clustered Dorsal Binding Sites Identifies Putative Target Genes in the Drosophila Embryo</p>
            </title>
            <aug>
               <au>
                  <snm>Markstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Markstein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Markstein</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <issue>2</issue>
            <fpage>763</fpage>
            <lpage>768</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">117379</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752406</pubid>
                  <pubid idtype="doi">10.1073/pnas.012591199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation</p>
            </title>
            <aug>
               <au>
                  <snm>Rebeiz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reeves</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Posakony</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <issue>15</issue>
            <fpage>9888</fpage>
            <lpage>9893</lpage>
            <note>Epub 2002 Jul 09.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">125053</pubid>
                  <pubid idtype="pmpid" link="fulltext">12107285</pubid>
                  <pubid idtype="doi">10.1073/pnas.152320899</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Homotypic regulatory clusters in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Lifanov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Nazina</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>579</fpage>
            <lpage>588</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430164</pubid>
                  <pubid idtype="pmpid" link="fulltext">12670999</pubid>
                  <pubid idtype="doi">10.1101/gr.668403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Methods for calculating the probabilities of finding patterns in sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Staden</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1989</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>89</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2720468</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>In vitro selection of RNA molecules that bind specific ligands</p>
            </title>
            <aug>
               <au>
                  <snm>Ellington</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Szostak</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1990</pubdate>
            <volume>346</volume>
            <fpage>818</fpage>
            <lpage>822</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/346818a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1697402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase</p>
            </title>
            <aug>
               <au>
                  <snm>Tuerk</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1990</pubdate>
            <volume>249</volume>
            <fpage>505</fpage>
            <lpage>510</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.2200121</pubid>
                  <pubid idtype="pmpid" link="fulltext">2200121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities</p>
            </title>
            <aug>
               <au>
                  <snm>Berger</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Philippakis</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Qureshi</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Estep</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Bulyk</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>1429</fpage>
            <lpage>1435</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1246</pubid>
                  <pubid idtype="pmpid" link="fulltext">16998473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Modeling Transcriptional Regulation in Chondrogenesis Using Particle Swarm Optimization</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yokota</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB2005</source>
            <pubdate>2005</pubdate>
            <fpage>311</fpage>
            <lpage>317</lpage>
         </bibl>
         <bibl id="B40">
            <title>
               <p>IUPAC codes</p>
            </title>
            <url>http://bioinformatics.org/sms2/iupac.html</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Selection of DNA binding sites by regulatory proteins. Functional specificity and pseudosite competition</p>
            </title>
            <aug>
               <au>
                  <snm>Berg</snm>
                  <fnm>OG</fnm>
               </au>
            </aug>
            <source>J Biomol Struct Dyn</source>
            <pubdate>1988</pubdate>
            <volume>6</volume>
            <issue>2</issue>
            <fpage>275</fpage>
            <lpage>297</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3271524</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <aug>
               <au>
                  <snm>Knuth</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>The Art of Computer Programming, Sorting and Searching</source>
            <publisher>Addison-Wesley</publisher>
            <pubdate>1973</pubdate>
            <volume>3</volume>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Computing exact P-values for DNA motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tromp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>5</issue>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl662</pubid>
                  <pubid idtype="pmpid" link="fulltext">17237046</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Finding Motifs in Promoter Regions</p>
            </title>
            <aug>
               <au>
                  <snm>Hertzberg</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zuk</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Getz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Domany</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <issue>3</issue>
            <fpage>314</fpage>
            <lpage>330</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/cmb.2005.12.314</pubid>
                  <pubid idtype="pmpid" link="fulltext">15857245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Exact distribution of word occurrences in a random sequence of letters</p>
            </title>
            <aug>
               <au>
                  <snm>Robin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Daudin</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>J Appl Prob</source>
            <pubdate>1999</pubdate>
            <volume>36</volume>
            <fpage>179</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1239/jap/1032374240</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The Occurrence of Sequence of Patterns in Repeated Dependent Experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Chrysaphinou</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Papastavridis</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Theory of Probability and Applications</source>
            <pubdate>1990</pubdate>
            <volume>79</volume>
            <fpage>167</fpage>
            <lpage>173</lpage>
         </bibl>
         <bibl id="B47">
            <title>
               <p>String Overlaps, Pattern Matching and Nontransitive Games</p>
            </title>
            <aug>
               <au>
                  <snm>Guibas</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Odlyzko</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Journal of Combinatorial Theory, Series A</source>
            <pubdate>1981</pubdate>
            <volume>30</volume>
            <fpage>183</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0097-3165(81)90005-4</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Central Limit Theorem for Renewal Theory for Several Patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Tanushev</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Arratia</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>1997</pubdate>
            <volume>4</volume>
            <fpage>35</fpage>
            <lpage>44</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9109036</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Motif Statistics</p>
            </title>
            <aug>
               <au>
                  <snm>Nicod&#232;me</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Salvy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Flajolet</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Theoretical Computer Science</source>
            <pubdate>2002</pubdate>
            <volume>287</volume>
            <issue>2</issue>
            <fpage>593</fpage>
            <lpage>618</lpage>
            <note>[Preliminary version at ESA'99].</note>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0304-3975(01)00264-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>A Unified Approach to Word Occurrences Probabilities</p>
            </title>
            <aug>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Discrete Applied Mathematics</source>
            <pubdate>2000</pubdate>
            <volume>104</volume>
            <fpage>259</fpage>
            <lpage>280</lpage>
            <note>[Special issue on Computational Biology; preliminary version at RECOMB'98].</note>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0166-218X(00)00195-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <aug>
               <au>
                  <snm>Szpankowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Average Case Analysis of Algorithms on Sequences</source>
            <publisher>New York: John Wiley and Sons</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Counting occurrences for a finite set of words: an inclusion-exclusion approach</p>
            </title>
            <aug>
               <au>
                  <snm>Bassino</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cl&#233;ment</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fayolle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nicod&#232;me</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>2007 International Conference on Analysis of Algorithms (AofA'07), Discrete Mathematics and Theoretical Computer Science</source>
            <pubdate>2007</pubdate>
            <fpage>12</fpage>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Searching for Multiple Words in Markov Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Park</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Spouge</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>INFORMS Journal on Computing</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <issue>4</issue>
            <fpage>341</fpage>
            <lpage>347</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1287/ijoc.1040.0095</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Regexpcount, a symbolic package for counting problems on regular expressions and words</p>
            </title>
            <aug>
               <au>
                  <snm>Nicod&#232;me</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Fundamenta Informaticae</source>
            <pubdate>2003</pubdate>
            <volume>56</volume>
            <issue>1&#8211;2</issue>
            <fpage>71</fpage>
            <lpage>88</lpage>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Detecting localized repeats in genomic sequences: A new strategy and its application to <it>B. subtilis </it>and <it>A. thaliana </it>sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Klaerr-Blanchard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chiapello</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Coward</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Comput Chem</source>
            <pubdate>2000</pubdate>
            <volume>24</volume>
            <fpage>57</fpage>
            <lpage>70</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0097-8485(99)00047-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">10642880</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains</p>
            </title>
            <aug>
               <au>
                  <snm>Reinert</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schbath</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>223</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9672830</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Comparison of statistical significance criteria</p>
            </title>
            <aug>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>2</issue>
            <fpage>537</fpage>
            <lpage>551</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720006002028</pubid>
                  <pubid idtype="pmpid" link="fulltext">16819801</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Mathematical Tools for Regulatory Signals Extraction</p>
            </title>
            <aug>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics of Genome Regulation and Structure</source>
            <publisher>Kluwer Academic Publishers</publisher>
            <editor>Kolchanov N, Hofestaedt R</editor>
            <pubdate>2004</pubdate>
            <fpage>61</fpage>
            <lpage>70</lpage>
            <note>[Preliminary version at BGRS'02].</note>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Rare events and Conditional Events on random strings</p>
            </title>
            <aug>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Denise</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>DMTCS</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <issue>2</issue>
            <fpage>191</fpage>
            <lpage>214</lpage>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Assessing the significance of Sets of Words</p>
            </title>
            <aug>
               <au>
                  <snm>Boeva</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cl&#233;ment</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>CPM'05, Lecture Notes in Computer Science</source>
            <publisher>Springer-Verlag</publisher>
            <pubdate>2005</pubdate>
            <volume>3537</volume>
            <fpage>358</fpage>
            <lpage>370</lpage>
            <note>[Proc. CPM'05, Jeju Island, Korea].</note>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Multi-seed lossless filtration</p>
            </title>
            <aug>
               <au>
                  <snm>Kucherov</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>No&#233;</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Roytberg</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proceedings of the 15th Annual Combinatorial Pattern Matching Symposium (CPM), Istanbul (Turkey), Lecture Notes in Computer Science</source>
            <publisher>Springer-Verlag</publisher>
            <editor>Sahinalp S, Muthukrishnan S, Dogrusoz U</editor>
            <pubdate>2004</pubdate>
            <volume>3109</volume>
            <fpage>297</fpage>
            <lpage>310</lpage>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Efficient String Matching</p>
            </title>
            <aug>
               <au>
                  <snm>Aho</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Corasick</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>CACM</source>
            <pubdate>1975</pubdate>
            <volume>18</volume>
            <issue>6</issue>
            <fpage>333</fpage>
            <lpage>340</lpage>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Regulation of even-skipped stripe 2 in the Drosophila embryo</p>
            </title>
            <aug>
               <au>
                  <snm>Small</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Blair</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>EMBO Journal</source>
            <pubdate>1992</pubdate>
            <volume>11</volume>
            <issue>13</issue>
            <fpage>4047</fpage>
            <lpage>4057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">556915</pubid>
                  <pubid idtype="pmpid">1327756</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains</p>
            </title>
            <aug>
               <au>
                  <snm>Reinert</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schbath</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>223</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9672830</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Identification of regulatory regions which confer muscle-specific gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>278</volume>
            <fpage>167</fpage>
            <lpage>81</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.1700</pubid>
                  <pubid idtype="pmpid" link="fulltext">9571041</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>An Assessment of Computational Tools for the Discovery of Transcription Factor Binding Sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Favorov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mironov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>R&#233;gnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simonis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nature Biotechnology</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1053</pubid>
                  <pubid idtype="pmpid" link="fulltext">15637633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Separating real motifs from their artifacts</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S30</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11472990</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
