<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-1-21</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>EXMOTIF: efficient structured motif extraction</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Zhang</snm>
               <fnm>Yongqiang</fnm>
               <insr iid="I1"/>
               <email>zhangy0@cs.rpi.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Zaki</snm>
               <mi>J</mi>
               <fnm>Mohammed</fnm>
               <insr iid="I1"/>
               <email>zaki@cs.rpi.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, USA</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2006</pubdate>
         <volume>1</volume>
         <issue>1</issue>
         <fpage>21</fpage>
         <url>http://www.almob.org/content/1/1/21</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17109757</pubid>
               <pubid idtype="doi">10.1186/1748-7188-1-21</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>23</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Zhang and Zaki; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Extracting motifs from sequences is a mainstay of bioinformatics. We study the problem of mining structured motifs, which allow variable-length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that, given one or more sequences and a structured motif template, extracts all <it>frequent </it>structured motifs that meet a quorum <it>q</it>. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>EXMOTIF is efficient in terms of both time and space and is shown empirically to outperform RISO, a state-of-the-art algorithm. It is also successful in finding potential single/composite transcription factor binding sites.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>EXMOTIF is a useful and efficient tool for discovering structured motifs, especially in DNA sequences. The algorithm is available as open source at <url>http://www.cs.rpi.edu/~zaki/software/exMotif/</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>Analyzing and interpreting sequence data is an important task in bioinformatics. One critical aspect of such interpretation is to extract important motifs (patterns) from sequences. The motif extraction problem poses two challenges: one is to design an efficient algorithm to enumerate the frequent motifs; the other is to statistically validate the extracted motifs and report the significant ones.</p>
         <p>Motifs can be classified into two main types. If no variable gaps are allowed in the motif, it is called a <it>simple motif</it>. For example, in the genome of <it>Saccharomyces cerevisiae</it>, the binding sites of the transcription factor GAL4 have as consensus the simple motif CGG[11,11]CCG <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Here [11,11] denotes a fixed "gap" of 11 don't-care positions. If variable gaps are allowed in a motif, it is called a <it>structured motif</it>. A structured motif can be regarded as an ordered collection of simple motifs with gap constraints between each pair of adjacent simple motifs. For example, many <it>retrotransposons </it>in the <it>Ty1-copia </it>group <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> have as consensus the structured motif MT[115,136]MTNTAYGG[121,151]GTNGAYGAY. Here MT, MTNTAYGG and GTNGAYGAY are three simple motifs; [115,136] and [121,151] are variable gap constraints ([minimum gap, maximum gap]) allowed between the adjacent simple motifs. More formally, a structured motif, <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, is specified in the form:</p>
         <p><it>M</it><sub>1</sub>[<it>l</it><sub>1</sub>, <it>u</it><sub>1</sub>]<it>M</it><sub>2</sub>[<it>l</it><sub>2</sub>, <it>u</it><sub>2</sub>]<it>M</it><sub>3 </sub>... <it>M</it><sub><it>k</it>-1</sub>[<it>l</it><sub><it>k</it>-1</sub>, <it>u</it><sub><it>k</it>-1</sub>]<it>M</it><sub><it>k</it></sub></p>
         <p>where <it>M</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>k</it>, is a simple motif <it>component</it>, and <it>l</it><sub><it>i </it></sub>and <it>u</it><sub><it>i </it></sub>(for 1 &#8804; <it>i </it>&lt;<it>k </it>and where 0 &#8804; <it>l</it><sub><it>i </it></sub>&#8804; <it>u</it><sub><it>i</it></sub>), are the minimum and maximum number of gaps allowed between <it>M</it><sub><it>i </it></sub>and <it>M</it><sub><it>i</it>+1</sub>, respectively. Note that a gap is defined to be the number of intervening positions after <it>M</it><sub><it>i </it></sub>but before <it>M</it><sub><it>i</it>+1</sub>. In other words, if <it>s</it><sub><it>i </it></sub>and <it>e</it><sub><it>i </it></sub>represent the start and end positions of component <it>M</it><sub><it>i</it></sub>, then for <it>i </it>&#8712; [1, <it>k </it>- 1], the number of gaps is given as <it>g</it><sub><it>i </it></sub>= <it>s</it><sub><it>i</it>+1 </sub>- <it>e</it><sub><it>i </it></sub>- 1, and we require that <it>g</it><sub><it>i </it></sub>&#8712; [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>]. The number of simple motif components, <it>k</it>, is also called the <it>length </it>of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. Let <it>W</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&lt;<it>k</it>, denote the span of the gap range, [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>], which is calculated as: <it>W</it><sub><it>i </it></sub>= <it>u</it><sub><it>i </it></sub>- <it>l</it><sub><it>i </it></sub>+ 1.</p>
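         <p>The gap condition can be checked mechanically. The following Python sketch is an illustration only and is not part of EXMOTIF: it assumes that an occurrence is given as a list of inclusive (start, end) positions, one pair per component, and it takes the gap to be the number of intervening positions between adjacent components. The names gaps, spans and satisfies, and the example positions, are hypothetical.</p>

```python
from typing import List, Tuple

Position = Tuple[int, int]  # inclusive (start, end) of one component
GapRange = Tuple[int, int]  # (l_i, u_i)


def gaps(occurrence: List[Position]) -> List[int]:
    """Intervening positions between each pair of adjacent components."""
    return [
        occurrence[i + 1][0] - occurrence[i][1] - 1
        for i in range(len(occurrence) - 1)
    ]


def spans(gap_ranges: List[GapRange]) -> List[int]:
    """Span W_i = u_i - l_i + 1 of each gap range."""
    return [u - l + 1 for l, u in gap_ranges]


def satisfies(occurrence: List[Position], gap_ranges: List[GapRange]) -> bool:
    """True when every gap g_i lies in its range [l_i, u_i]."""
    observed = gaps(occurrence)
    if len(observed) != len(gap_ranges):
        return False
    return all(u >= g >= l for g, (l, u) in zip(observed, gap_ranges))


# Hypothetical occurrence with component lengths 3, 2 and 4.
occurrence = [(1, 3), (6, 7), (10, 13)]
gap_ranges = [(0, 3), (1, 3)]
print(gaps(occurrence), spans(gap_ranges), satisfies(occurrence, gap_ranges))
```

         <p>For this made-up occurrence both gaps equal 2, which lies inside the ranges [0,3] and [1,3], and the corresponding spans are 4 and 3.</p>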
         <p>In the structured motif extraction problem, the component motifs <it>M</it><sub><it>i </it></sub>are <it>unknown </it>before the extraction. However, we do provide some <it>known </it>parameters to restrict the structured motifs to be extracted, including: (i) <it>k </it>&#8211; the <it>length </it>of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>; (ii) |<it>M</it><sub><it>i</it></sub>| &#8211; the length of each component <it>M</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, for 1 &#8804; <it>i </it>&#8804; <it>k</it>; and (iii) [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>] &#8211; the gap range between <it>M</it><sub><it>i </it></sub>and <it>M</it><sub><it>i</it>+1</sub>, for 1 &#8804; <it>i </it>&lt;<it>k</it>. All these parameters define a <it>structured motif template</it>, <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>, for the structured motifs to be extracted from a set of sequences <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. A structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> matching the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is called an <it>instance </it>of <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>. We use <it>K </it>to denote the number of symbols (not counting gaps) in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> and use <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>[<it>j</it>] (with 1 &#8804; <it>j </it>&#8804; <it>K</it>) to denote the <it>j</it>th symbol of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>.</p>
         <p>Let <it>&#948;</it><sub><it>S </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) denote the number of occurrences of an instance motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in a sequence <it>S </it>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. Let <it>d</it><sub><it>S </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) = 1 if <it>&#948;</it><sub><it>S </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) > 0 and <it>d</it><sub><it>S </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) = 0 if <it>&#948;</it><sub><it>S </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) = 0. The <it>support </it>of motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in the sequence set is defined as <m:math name="1748-7188-1-21-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#960;</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msub><m:mo>&#8721;</m:mo><m:mrow><m:mi>S</m:mi><m:mo>&#8712;</m:mo><m:mi mathvariant="script">S</m:mi></m:mrow></m:msub><m:mrow><m:msub><m:mi>d</m:mi><m:mi>S</m:mi></m:msub></m:mrow></m:mstyle><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaiiGacqWFapaCcqGGOaakimaacqGFZestcqGGPaqkcqGH9aqpdaaeqaqaaiabdsgaKnaaBaaaleaacqWGtbWuaeqaaaqaaiabdofatjabgIGiolab+jr8tbqab0GaeyyeIuoakiabcIcaOiab+ntinjabcMcaPaaa@47FD@</m:annotation></m:semantics></m:math>, i.e., the number of sequences in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> that contain at least one occurrence of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. The <it>weighted support </it>of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is defined as <m:math name="1748-7188-1-21-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>w</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msub><m:mo>&#8721;</m:mo><m:mrow><m:mi>S</m:mi><m:mo>&#8712;</m:mo><m:mi mathvariant="script">S</m:mi></m:mrow></m:msub><m:mrow><m:msub><m:mi>&#948;</m:mi><m:mi>S</m:mi></m:msub></m:mrow></m:mstyle><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaiiGacqWFapaCdaWgaaWcbaacbiGae43DaChabeaakiabcIcaOGWaaiab9ntinjabcMcaPiabg2da9maaqababaGae8hTdq2aaSbaaSqaaiabdofatbqabaaabaGaem4uamLaeyicI4Sae0NeXpfabeqdcqGHris5aOGaeiikaGIae03mH0KaeiykaKcaaa@49FC@</m:annotation></m:semantics></m:math>, i.e., total number of occurrences of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> over all sequences in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. We use <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math> (<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) to denote the set of all occurrences of a structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. Given a user-specified quorum threshold <it>q </it>&#8805; 1, a motif whose support (or weighted support, when mining a single sequence) is at least <it>q </it>is called <it>frequent</it>.</p>
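         <p>The support and the weighted support can be illustrated with a brute-force counter. This Python sketch is not the EXMOTIF algorithm; it makes several assumptions that the text does not state: components are literal strings matched exactly, gaps are non-negative, positions are 0-based, and each distinct tuple of component start positions counts as one occurrence. The function names and the example sequences are hypothetical.</p>

```python
from typing import List, Tuple


def count_occurrences(seq: str, components: List[str],
                      gap_ranges: List[Tuple[int, int]]) -> int:
    """delta_S: occurrences of the motif in one sequence (brute force).

    Assumption: one occurrence per distinct tuple of component starts.
    """

    def extend(i: int, prev_end: int) -> int:
        # prev_end is the 0-based index of the last symbol of component i - 1.
        if i == len(components):
            return 1
        lo, hi = gap_ranges[i - 1]
        comp = components[i]
        return sum(
            extend(i + 1, prev_end + g + len(comp))
            for g in range(lo, hi + 1)
            if seq.startswith(comp, prev_end + 1 + g)
        )

    first = components[0]
    return sum(
        extend(1, s + len(first) - 1)
        for s in range(len(seq))
        if seq.startswith(first, s)
    )


def support(seqs, components, gap_ranges) -> int:
    """pi: number of sequences with at least one occurrence."""
    return sum(
        1 for s in seqs if count_occurrences(s, components, gap_ranges) > 0
    )


def weighted_support(seqs, components, gap_ranges) -> int:
    """pi_w: total number of occurrences over all sequences."""
    return sum(count_occurrences(s, components, gap_ranges) for s in seqs)


seqs = ["ACGTTACG", "ACGACG", "TTTT"]  # made-up sequences
motif, gap_ranges = ["A", "G"], [(1, 4)]  # the motif A[1,4]G
q = 2
print(support(seqs, motif, gap_ranges), weighted_support(seqs, motif, gap_ranges))
print(support(seqs, motif, gap_ranges) >= q)  # frequent under the quorum?
```

         <p>On these made-up sequences the motif A[1,4]G occurs 2, 3 and 0 times, so its support is 2 and its weighted support is 5; with <it>q </it>= 2 it would be reported as a common motif.</p>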
         <p>There are two main tasks in the structured motif extraction problem: a) <it>Common Motifs </it>&#8211; find all motifs <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in a set of sequences <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, such that the support of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is at least <it>q</it>, b) <it>Repeated Motifs </it>&#8211; find all motifs in a single sequence <it>S</it>, such that the weighted support of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is at least <it>q</it>. Furthermore, the structured motif extraction problem allows several variations:</p>
         <p>&#8226; <it>Substitutions</it>: <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math> may consist of similar motifs, as measured by <it>Hamming Distance </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, instead of exact matches, to the simple motifs in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. We can either allow for at most <it>&#949;</it><sub><it>i </it></sub>errors for each simple motif <it>M</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>k</it>, or at most <it>&#949; </it>errors for the whole structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>.</p>
         <p>&#8226; <it>Overlapping Components</it>: The variable gap constraints (<it>l</it><sub><it>i </it></sub>and <it>u</it><sub><it>i</it></sub>) can take on a limited range of <it>negative </it>values, allowing a search for overlapping simple motifs. We allow two adjacent components <it>M</it><sub><it>i </it></sub>and <it>M</it><sub><it>i</it>+1 </sub>to overlap, but we require that <it>M</it><sub><it>i</it>+1 </sub>does not precede <it>M</it><sub><it>i</it></sub>. This condition can be satisfied by the following constraints on the gap range [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>]: -|<it>M</it><sub><it>i</it></sub>| &#8804; <it>l</it><sub><it>i </it></sub>&#8804; <it>u</it><sub><it>i</it></sub>, for <it>i </it>&#8712; [1, <it>k</it>). For example, a search with the motif template NNN[-2,2]NNN (where 'N' stands for any of the four DNA bases A, C, G, T) may discover the pattern ACG[-2,2]CGA, which covers an overlapped occurrence, ACGA, as well as a non-overlapped occurrence, ACG--CGA, at the two extremes of the gap range.</p>
         <p>&#8226; <it>Motif Length Ranges</it>: Each simple motif <it>M</it><sub><it>i </it></sub>in a template <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> can be of a range of lengths, i.e., |<it>M</it><sub><it>i</it></sub>| &#8712; [<it>l</it><sub><it>a</it></sub>, <it>l</it><sub><it>b</it></sub>], where <it>l</it><sub><it>a </it></sub>and <it>l</it><sub><it>b </it></sub>are the lower and upper bounds on the desired length.</p>
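         <p>Two of the variations above lend themselves to a short illustration. The Python sketch below is not taken from EXMOTIF; under the assumption that components are plain strings, it shows a Hamming distance for the substitution variant, a check of the overlap constraint on a gap range, and a helper that lays out two components for a given (possibly negative) gap. The names hamming, valid_gap_range and layout are hypothetical.</p>

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def valid_gap_range(m_len: int, lo: int, hi: int) -> bool:
    """Overlap constraint -|M_i| no greater than l_i no greater than u_i."""
    return hi >= lo >= -m_len


def layout(m1: str, m2: str, gap: int) -> Optional[str]:
    """Place m2 after m1 with the given gap; '-' marks gap positions.

    Returns None when m2 would precede m1 or when overlapping
    symbols disagree.
    """
    start2 = len(m1) + gap
    if 0 > start2:
        return None
    width = max(len(m1), start2 + len(m2))
    cells = ["-"] * width
    for offset, motif in ((0, m1), (start2, m2)):
        for j, sym in enumerate(motif):
            if cells[offset + j] not in ("-", sym):
                return None  # conflicting symbols in the overlap
            cells[offset + j] = sym
    return "".join(cells)


print(layout("ACG", "CGA", -2), layout("ACG", "CGA", 2))
print(hamming("ACG", "ACA"), valid_gap_range(3, -2, 2))
```

         <p>With gap -2 the helper yields ACGA and with gap 2 it yields ACG--CGA, the two extremes described for the template NNN[-2,2]NNN.</p>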
         <p>Table <tblr tid="T1">1</tblr> shows four example DNA sequences <it>S</it><sub>1</sub>, <it>S</it><sub>2</sub>, <it>S</it><sub>3</sub>, <it>S</it><sub>4 </sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>; a structured motif template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>, where <it>M</it><sub>1 </sub>= NNN, <it>M</it><sub>2 </sub>= NN and <it>M</it><sub>3 </sub>= NNNN, and [0,3] and [1,3] are the intervening gap ranges between the components; and a quorum threshold <it>q </it>= 2. The length of the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> is <it>k </it>= 3 and the number of symbols in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is <it>K </it>= 3 + 2 + 4 = 9. The spans of the gap ranges are <it>W</it><sub>1 </sub>= <it>u</it><sub>1 </sub>- <it>l</it><sub>1 </sub>+ 1 = 4 and <it>W</it><sub>2 </sub>= <it>u</it><sub>2 </sub>- <it>l</it><sub>2 </sub>+ 1 = 3. If no substitutions are allowed, there are five frequent structured motifs in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> matching the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>, namely <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>1 </sub>= CCG[0,3]TA[1,3]GAAC (shown in bold) and <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>2 </sub>= CCG[0,3]TA[1,3]AACC which occur in <it>S</it><sub>1 </sub>and <it>S</it><sub>2</sub>; <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>3 </sub>= TAT[0,3]GG[1,3]ACCA (shown underlined), <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>4 </sub>= TAT[0,3]GA[1,3]CCAT and <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>5 </sub>= TAT[0,3]GG[1,3]CCAT, which occur in <it>S</it><sub>2 </sub>and <it>S</it><sub>3</sub>. If substitutions are allowed, say <it>e</it><sub>1 </sub>= <it>e</it><sub>3 </sub>= 1, then the occurrence of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>6 </sub>= TAA[0,3]GG[1,3]CCCT (shown underlined) in <it>S</it><sub>4 </sub>will be considered to match motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><sub>5</sub>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Structured motif extraction.</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>Sequence <it>S</it><sub>1 </sub>(&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>):</p>
                  </c>
                  <c ca="left">
                     <p><b>CCGTA</b>CC<b>GAA</b>CCTCAAA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Sequence <it>S</it><sub>2 </sub>(&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>):</p>
                  </c>
                  <c ca="left">
                     <p><b>CCG</b>T<b><ul>TAT</ul>A</b><ul>G</ul><b><ul>G</ul>A<ul>AC</ul></b><ul>CA</ul>TT</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Sequence <it>S</it><sub>3 </sub>(&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>):</p>
                  </c>
                  <c ca="left">
                     <p><ul>TAT</ul><ul>GG</ul>A<ul>ACCA</ul>TCTT</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Sequence <it>S</it><sub>4 </sub>(&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>):</p>
                  </c>
                  <c ca="left">
                     <p><ul>TAA</ul>C<ul>GG</ul>AT<ul>CCCT</ul>TT</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Structured Motif Template (<m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>):</p>
                  </c>
                  <c ca="left">
                     <p>NNN[0,3]NN[1,3]NNNN</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Quorum (<it>q</it>):</p>
                  </c>
                  <c ca="left">
                     <p>2</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
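         <p>The example of Table 1 can be checked with a brute-force sketch in Python. It is not the EXMOTIF method: each structured motif is simply translated into a regular expression, and the sequences containing at least one occurrence are counted against the quorum. The names SEQUENCES, motif_regex and support are hypothetical, and the sketch assumes exact matches and non-negative gap ranges only.</p>

```python
import re

# The four example sequences of Table 1.
SEQUENCES = {
    "S1": "CCGTACCGAACCTCAAA",
    "S2": "CCGTTATAGGAACCATT",
    "S3": "TATGGAACCATCTT",
    "S4": "TAACGGATCCCTTT",
}


def motif_regex(components, gaps):
    """Build a regular expression for M1[l1,u1]M2...Mk (non-negative gaps only)."""
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap range between adjacent components")
    parts = [re.escape(components[0])]
    for (low, high), component in zip(gaps, components[1:]):
        parts.append(".{%d,%d}" % (low, high))  # between low and high arbitrary symbols
        parts.append(re.escape(component))
    return re.compile("".join(parts))


def support(components, gaps, sequences=SEQUENCES):
    """Return the sorted names of the sequences with at least one occurrence."""
    pattern = motif_regex(components, gaps)
    return sorted(name for name, seq in sequences.items() if pattern.search(seq))


if __name__ == "__main__":
    gaps = [(0, 3), (1, 3)]
    print(support(["CCG", "TA", "GAAC"], gaps))  # CCG[0,3]TA[1,3]GAAC
    print(support(["TAT", "GG", "CCAT"], gaps))  # TAT[0,3]GG[1,3]CCAT
```

         <p>With the quorum <it>q </it>= 2, a motif is frequent when the list returned by support contains at least two sequence names.</p>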
         <p>In this paper, we propose EXMOTIF, an efficient algorithm for both structured motif extraction problems. It uses an inverted index of symbol positions and enumerates all structured motifs by <it>positional joins </it>over this index. The variable gap constraints are checked at the same time as the joins are performed, which makes the extraction considerably more efficient. To save time and space, we keep only the start positions of each intermediate pattern during the positional join.</p>
      </sec>
      <sec>
         <st>
            <p>Related work</p>
         </st>
         <p>Many simple motif extraction algorithms have been proposed, primarily for extracting transcription factor binding sites, where each motif consists of a unique binding site <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> or two binding sites separated by a fixed number of gaps <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. A pattern with a single component is also called a <it>monad pattern</it>. Structured motif extraction problems, in which a variable number of gaps is allowed, have attracted much attention recently; the structured motifs can be extracted either from multiple sequences <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> or from a single sequence <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. In many cases, more than one transcription factor may cooperatively regulate a gene. Such patterns are called <it>composite regulatory patterns</it>. To detect composite regulatory patterns, one may apply single binding site identification algorithms to detect each component separately. However, this solution may fail when some components are not very strong (significant). Thus it is necessary to detect whole composite regulatory patterns directly (even those with weak components), since the gaps and the other, possibly strong, components can increase a pattern's significance.</p>
         <p>Several algorithms address composite pattern discovery for patterns with two components, which are called <it>dyad patterns</it>. Helden et al. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> propose a method for dyad analysis, which exhaustively counts the number of occurrences of each possible pair of patterns in the sequences and then assesses their statistical significance. This method can only deal with a fixed number of gaps between the two components. MITRA <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> first casts the composite pattern discovery problem as a larger monad discovery problem and then applies an exhaustive monad discovery algorithm. It can handle several mismatches but can only handle sequences less than 60 kilo-bases long. Co-Bind <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> models composite transcription factors with Position Weight Matrices (PWMs) and finds PWMs that maximize the joint likelihood of occurrences of the two binding site components. Co-Bind uses Gibbs sampling to select binding sites and then refines the PWMs a fixed number of times. Co-Bind may miss some binding sites since not all patterns in the sequences are considered. Moreover, with a fixed number of refinement iterations, the procedure may not converge to the globally optimal dyad PWM.</p>
         <p>SMILE <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> describes four variants of increasing generality for common structured motif extraction, and proposes two solutions for them. The two approaches for the first problem, in which the structured motif template consists of two components with a gap range between them, both start by building a generalized suffix tree for the input sequences and extracting the first component. Then in the first approach, the second component is extracted by simply jumping in the sequences from the end of the first one to the second within the gap range. In the second approach, the suffix tree is temporarily modified so as to extract the second component from the modified suffix tree directly. The drawback of SMILE is that its time and space complexity are exponential in the number of gaps between the two components. In order to reduce the time during the extraction of the structured motifs, <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> presents a parallel algorithm, PSmile, based on SMILE, where the search space is well-partitioned among the available processors.</p>
         <p>RISO <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> improves SMILE in two aspects. First, instead of building the whole suffix tree for the input sequences, RISO builds a suffix tree only up to a certain level <it>l</it>, called a <it>factor tree</it>, which leads to a large space saving. Second, a new data structure called <it>box-link </it>is proposed to store the information about how to jump within the DNA sequences from one simple component (box) to the subsequent one in the structured motif. This accelerates the extraction process and avoids exponential time and space consumption (in the gaps) as in SMILE. In RISO, after the generalized factor tree is built, the box-links are constructed by exhaustively enumerating all the possible structured motifs in the sequences and are added to the leaves of the factor tree. Then the extraction process begins during which the factor tree may be temporarily and partially modified so as to extract the subsequent simple motifs. Since during the box-link construction, the structured motif occurrences are exhaustively enumerated and the frequency threshold is never used to prune the candidate structured motifs, RISO needs a lot of computation during this step.</p>
         <p>For the repeated structured motif identification problem, the frequency closure property that "all the subsequences of a frequent sequence must be frequent" no longer holds, since the frequency of a pattern can exceed the frequency of its sub-patterns. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> introduces a closure-like property that helps prune the search space without missing any frequent pattern. The two algorithms proposed in <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> can extract, within one sequence, all frequent patterns of length no greater than a length threshold, which can be either manually specified or automatically determined. However, this method requires that all the gap ranges [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>] between adjacent <it>symbols </it>in the structured motif be the same, i.e., [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>] = [<it>l</it>, <it>u</it>] for all <it>i </it>&#8712; [1, <it>k </it>- 1]. Moreover, approximate matches are not allowed for the structured motif.</p>
      </sec>
      <sec>
         <st>
            <p>The EXMOTIF algorithm</p>
         </st>
         <p>We first introduce our basic approach for the common structured motif extraction problem. We then successively optimize it for various practical scenarios.</p>
         <sec>
            <st>
               <p>The basic approach</p>
            </st>
            <p>Let us assume that we are extracting all structured motif instances from <it>n </it>sequences <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> = {<it>S</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>n</it>}, each of which satisfies the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> and occurs in at least <it>q </it>sequences of <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. We assume for the moment that no substitutions are allowed in any of the simple motifs. We also assume that all <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, 1 &#8804; <it>i </it>&#8804; <it>n </it>and the extracted motifs are over the DNA alphabet, &#931;<sub>DNA</sub>. EXMOTIF first converts each <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, 1 &#8804; <it>i </it>&#8804; <it>n </it>into an equivalent inverted format <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, where we associate with each symbol in the sequence <it>S</it><sub><it>i </it></sub>its <it>pos-list</it>, a <it>sorted </it>list of the positions where the symbol occurs in <it>S</it><sub><it>i</it></sub>. Then for each symbol we combine its pos-list in each <it>S</it><sub><it>i </it></sub>to obtain its pos-list in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. More formally, for a symbol <it>X </it>&#8712; &#931;<sub>DNA</sub>, its pos-list in <it>S</it><sub><it>i </it></sub>is given as <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) = {<it>j </it>| <it>S</it><sub><it>i</it></sub>[<it>j</it>] = <it>X</it>, <it>j </it>&#8712; [1, |<it>S</it><sub><it>i</it></sub>|]}, where <it>S</it><sub><it>i</it></sub>[<it>j</it>] is the symbol at position <it>j </it>in <it>S</it><sub><it>i</it></sub>, and |<it>S</it><sub><it>i</it></sub>| denotes the length of <it>S</it><sub><it>i</it></sub>. Its pos-list across all sequences <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is obtained by grouping the pos-lists of each sequence, and is given as <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>) = {&#10216; <it>i</it>, | <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>)|, <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>)&#10217; | <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>}, where <it>i </it>is the <it>sequence identifier </it>of <it>S</it><sub><it>i</it></sub>, and | <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>)| denotes the cardinality of the pos-list <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) in sequence <it>S</it><sub><it>i</it></sub>. For our example sequences in Table <tblr tid="T1">1</tblr>, the pos-list for each DNA base is given in Table <tblr tid="T2">2</tblr>. For example, A occurs in sequence <it>S</it><sub>1 </sub>at the positions {5, 9, 10, 15, 16, 17}, thus the entries in A's pos-list are {<b>1</b>, <b>6</b>, 5, 9, 10, 15, 16, 17}.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Pos-lists.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>X</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>pos-lists</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="left">
                        <p>{<b>1</b>,<b>6</b>,5,9,10,15,16,17, <b>2</b>,<b>5</b>,6,8,11,12,15, <b>3</b>,<b>4</b>,2,6,7,10, <b>4</b>,<b>3</b>,2,3,7}</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C</p>
                     </c>
                     <c ca="left">
                        <p>{<b>1</b>,<b>7</b>,1,2,6,7,11,12,14, <b>2</b>,<b>4</b>,1,2,13,14, <b>3</b>,<b>3</b>,8,9,12, <b>4</b>,<b>4</b>,4,9,10,11}</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>G</p>
                     </c>
                     <c ca="left">
                        <p>{<b>1</b>,<b>2</b>,3,8, <b>2</b>,<b>3</b>,3,9,10, <b>3</b>,<b>2</b>,4,5, <b>4</b>,<b>2</b>,5,6}</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>T</p>
                     </c>
                     <c ca="left">
                        <p>{<b>1</b>,<b>2</b>,4,13, <b>2</b>,<b>5</b>,4,5,7,16,17, <b>3</b>,<b>5</b>,1,3,11,13,14, <b>4</b>,<b>5</b>,1,8,12,13,14}</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sequence identifiers (<it>i</it>) and cardinality of <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) are marked in bold.</p>
               </tblfn>
            </tbl>
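            <p>The construction of the pos-lists can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the EXMOTIF implementation: the names SEQUENCES, pos_list and pos_list_all are hypothetical, and each sequence contributes an explicit (identifier, cardinality, positions) triple instead of the flat encoding shown in Table 2.</p>

```python
# The four example sequences of Table 1, in the order S1..S4.
SEQUENCES = [
    "CCGTACCGAACCTCAAA",  # S1
    "CCGTTATAGGAACCATT",  # S2
    "TATGGAACCATCTT",     # S3
    "TAACGGATCCCTTT",     # S4
]


def pos_list(symbol, sequence):
    """Sorted 1-based positions at which symbol occurs in sequence."""
    return [j for j, s in enumerate(sequence, start=1) if s == symbol]


def pos_list_all(symbol, sequences):
    """One (identifier, cardinality, positions) triple per sequence."""
    triples = []
    for i, sequence in enumerate(sequences, start=1):
        positions = pos_list(symbol, sequence)
        triples.append((i, len(positions), positions))
    return triples


if __name__ == "__main__":
    for symbol in "ACGT":
        print(symbol, pos_list_all(symbol, SEQUENCES))
```

            <p>Because the positions are collected in a single left-to-right scan, each pos-list is sorted by construction, which is the property the positional joins rely on.</p>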
            <sec>
               <st>
                  <p>Positional joins</p>
               </st>
               <p>We first extend the notion of pos-lists to cover structured motifs. The pos-list of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is given as the set of start positions of all the matches of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <it>S</it><sub><it>i</it></sub>. Let <it>X</it>, <it>Y </it>&#8712; &#931;<sub>DNA </sub>be any two symbols, and let <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = <it>X</it>[<it>l</it>, <it>u</it>]<it>Y </it>be a structured motif. Given the pos-lists of <it>X </it>and <it>Y </it>in <it>S</it><sub><it>i </it></sub>for 1 &#8804; <it>i </it>&#8804; <it>n</it>, namely, <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) and <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>), the pos-list of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <it>S</it><sub><it>i </it></sub>can be obtained by a positional join as follows: for a position <it>x </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>), if there exists a position <it>y </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>), such that <it>l </it>&#8804; <it>y </it>- <it>x </it>- 1 &#8804; <it>u</it>, it means that <it>Y </it>follows <it>X </it>within the variable gap range [<it>l</it>, <it>u</it>] in the sequence <it>S</it><sub><it>i</it></sub>, and thus we can add <it>x </it>to the pos-list of motif <it>X</it>[<it>l</it>, <it>u</it>]<it>Y</it>. Let <it>d </it>be the number of gaps between <it>x </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) and <it>y </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>), given as <it>d </it>= <it>y </it>- <it>x </it>- 1.</p>
               <p>Then, in general, there are three cases to consider in the positional join algorithm:</p>
               <p>&#8226; <it>d </it>&lt;<it>l</it>: Advance <it>y </it>to the next element in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>).</p>
               <p>&#8226; <it>d </it>> <it>u</it>: Advance <it>x </it>to the next element in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>).</p>
               <p>&#8226; <it>l </it>&#8804; <it>d </it>&#8804; <it>u</it>: Save this occurrence in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>[<it>l</it>, <it>u</it>]<it>Y</it>, <it>S</it><sub><it>i</it></sub>), and then advance <it>x </it>to the next element in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>).</p>
               <p>The pos-list for <it>X</it>[<it>l</it>, <it>u</it>]<it>Y </it>can be computed in time linear in the lengths of <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>) and <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>), i.e., the complexity of a positional join is <it>O</it>(|<m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>)| + |<m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>)|). In essence, each time we advance <it>x </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>X</it>, <it>S</it><sub><it>i</it></sub>), we check if there exists a <it>y </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>Y</it>, <it>S</it><sub><it>i</it></sub>) that satisfies the given gap constraint. Instead of searching for the matching <it>y </it>from the beginning of the pos-list each time, we search from the last position used to compare with <it>x</it>. This results in fast positional joins. For example, during the positional join for the motif A[0,1]T in <it>S</it><sub>4</sub>, with <it>l </it>= 0 and <it>u </it>= 1, we scan the pos-lists of A and T for <it>S</it><sub>4 </sub>in Table <tblr tid="T2">2</tblr>, i.e. <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>A</it>, <it>S</it><sub>4</sub>) = {2, 3, 7} and <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>T</it>, <it>S</it><sub>4</sub>) = {1, 8, 12, 13, 14}. Initially, <it>x </it>= 2 and <it>y </it>= 1. This gives <it>d </it>= 1 - 2 - 1 = -2 &lt; <it>l</it>, thus we advance <it>y </it>to 8. Next, <it>d </it>= 8 - 2 - 1 = 5 > <it>u</it>, thus we advance <it>x </it>to 3. Then, <it>d </it>= 8 - 3 - 1 = 4 > <it>u</it>, thus we advance <it>x </it>to 7. Next, <it>d </it>= 8 - 7 - 1 = 0 &#8712; [<it>l</it>, <it>u</it>], so we store <it>x </it>= 7 in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>A</it>[0, 1]<it>T</it>, <it>S</it><sub>4</sub>). We would advance <it>x </it>but since we have already reached the end of <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>A</it>, <it>S</it><sub>4</sub>), the positional join stops. Thus the final pos-list of A[0,1]T in <it>S</it><sub>4 </sub>is: <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>A</it>[0, 1]<it>T</it>, <it>S</it><sub>4</sub>) = {7}. After we obtain the pos-list of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in each <it>S</it><sub><it>i </it></sub>for 1 &#8804; <it>i </it>&#8804; <it>n</it>, we can combine them together to obtain the pos-list of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. For example, the full pos-list of A[0,1]T for <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is: {<b>2</b>, <b>2</b>, 6, 15, <b>3</b>, <b>2</b>, 2, 10, <b>4, 1</b>, 7}. Here each non-empty per-sequence pos-list is preceded by its sequence identifier and its length (shown in bold). Since three sequences contribute a non-empty pos-list, the support of A[0,1]T is 3. The pseudo-code for the positional joins for a given sequence <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> is shown in Figure <figr fid="F1">1</figr>. The full pos-list is obtained by concatenating the pos-lists from each sequence <it>S</it><sub><it>i</it></sub>.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>Positional Joins Algorithm</p>
                  </caption>
                  <text>
                     <p>Positional Joins Algorithm.</p>
                  </text>
                  <graphic file="1748-7188-1-21-1"/>
               </fig>
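               <p>The three cases of the positional join can also be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their pseudo-code is the one in Figure 1): the function names positional_join and combine_pos_lists are hypothetical, both input pos-lists are assumed to be sorted in ascending order, and an empty list stands in for a sequence without any match. The data are the pos-lists of A and T in S4 and the per-sequence pos-lists of A[0,1]T quoted in the text.</p>

```python
def positional_join(pos_x, pos_y, l, u):
    """Pos-list of X[l,u]Y in one sequence, given the sorted pos-lists of X and Y."""
    result = []
    i, j = 0, 0  # i indexes pos_x, j indexes pos_y; neither pointer ever moves back
    while i != len(pos_x) and j != len(pos_y):
        d = pos_y[j] - pos_x[i] - 1  # number of gap positions between x and y
        if d > u:
            i += 1                   # gap too large: advance x
        elif d >= l:
            result.append(pos_x[i])  # gap inside [l, u]: save x, then advance x
            i += 1
        else:
            j += 1                   # gap below l: advance y
    return result


def combine_pos_lists(per_sequence):
    """Concatenate per-sequence pos-lists, each preceded by its sequence id and length."""
    full = []
    for seq_id in sorted(per_sequence):
        positions = per_sequence[seq_id]
        if positions:  # sequences without a match contribute nothing
            full.extend([seq_id, len(positions)] + positions)
    return full


# Pos-lists of A and T in S4, with l = 0 and u = 1, as in the worked example.
print(positional_join([2, 3, 7], [1, 8, 12, 13, 14], 0, 1))  # [7]

# Per-sequence pos-lists of A[0,1]T; the empty list for sequence 1 is an assumption.
per_sequence = {1: [], 2: [6, 15], 3: [2, 10], 4: [7]}
print(combine_pos_lists(per_sequence))  # [2, 2, 6, 15, 3, 2, 2, 10, 4, 1, 7]
print(sum(1 for p in per_sequence.values() if p))  # support: 3
```

               <p>Because each loop iteration advances exactly one of the two pointers, the number of iterations is bounded by the sum of the two pos-list lengths, which matches the linear complexity stated above.</p>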
               <p>Given a longer motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, the positional joins start with the last two symbols, and proceed by successively joining the pos-list of the current symbol with the intermediate pos-list of the suffix. That is, the intermediate pos-list for a <it>(l+1)</it>-length pattern (with <it>l </it>&#8805; 1) is obtained by doing a positional join of the pos-list of the pattern's first symbol, called the <it>head symbol</it>, with the pos-list of its <it>l</it>-length suffix, called the <it>tail</it>. As the computation progresses the previous tail pos-lists are discarded. Combined with the fact that only start positions are kept in a pos-list, this saves both time and space.</p>
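               <p>The right-to-left joining order for a longer motif can be illustrated as follows. This is only a sketch: the helper names, the toy sequence ACGTAGT and the motif A[0,1]G[0,0]T are hypothetical choices made for the example, and positions are 1-based. Only the current tail pos-list is kept in memory; each join overwrites it.</p>

```python
def positional_join(head_pos, tail_pos, l, u):
    """Start positions in head_pos followed by some tail start within gap range [l, u]."""
    result = []
    i, j = 0, 0
    while i != len(head_pos) and j != len(tail_pos):
        d = tail_pos[j] - head_pos[i] - 1
        if d > u:
            i += 1
        elif d >= l:
            result.append(head_pos[i])
            i += 1
        else:
            j += 1
    return result


def symbol_pos_list(symbol, sequence):
    """1-based positions at which symbol occurs in sequence."""
    return [k + 1 for k, c in enumerate(sequence) if c == symbol]


def motif_pos_list(symbols, gaps, sequence):
    """Pos-list of symbols[0][gaps[0]]symbols[1]...; gaps[k] is the (l, u) range after symbols[k]."""
    tail = symbol_pos_list(symbols[-1], sequence)  # start with the last symbol
    for k in range(len(symbols) - 2, -1, -1):      # walk towards the first symbol
        head = symbol_pos_list(symbols[k], sequence)
        l, u = gaps[k]
        tail = positional_join(head, tail, l, u)   # the previous tail pos-list is discarded
    return tail


# Hypothetical example: A[0,1]G[0,0]T in the toy sequence ACGTAGT.
print(motif_pos_list(["A", "G", "T"], [(0, 1), (0, 0)], "ACGTAGT"))  # [1, 5]
```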
               <p>In order to enumerate all frequent motif instances <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>, EXMOTIF computes the pos-list for each <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> and reports <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> only if its support is no less than the quorum (<it>q</it>). A straightforward approach is to directly perform positional joins on the symbols from the end to the start for each <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. This approach leads to much redundant computation since simple motif components may be shared among several structured motifs. EXMOTIF, in contrast, performs two steps: it first computes the pos-lists for all simple motifs in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> by doing positional joins on pos-lists of its symbols, and it then computes the pos-list for each structured motif by doing positional joins on pos-lists of its simple motif components. EXMOTIF handles both simple and structured motifs uniformly, by adding the gap range [0, 0] between adjacent symbols within each simple motif <it>M</it><sub><it>i</it></sub>. For our example in Table <tblr tid="T1">1</tblr>, the structured motif template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> becomes: N[0,0]N[0,0]N[0,1]N[0,0]N[2,3]N[0,0]N[0,0]N[0,0]N. Also since we only report frequent motifs, we can prune the candidate patterns during the positional joins based on the closure property of support (note however that this cannot be done for weighted support).</p>
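               <p>The uniform representation of a template can be reproduced with a few lines of Python. The function name expand_template and its argument layout are hypothetical. The simple motif lengths 3, 2 and 4 and the gap ranges [0,1] and [2,3] are read off the expanded template quoted above.</p>

```python
def expand_template(lengths, gaps):
    """Insert the gap range [0,0] between adjacent symbols of every simple motif.

    lengths: lengths of the simple motifs, left to right
    gaps:    (l, u) gap ranges between consecutive simple motifs
    """
    parts = []
    for k, length in enumerate(lengths):
        # "N" * 3 is "NNN"; joining its characters with "[0,0]" gives N[0,0]N[0,0]N
        parts.append("[0,0]".join("N" * length))
        if k != len(lengths) - 1:
            l, u = gaps[k]
            parts.append(f"[{l},{u}]")
    return "".join(parts)


print(expand_template([3, 2, 4], [(0, 1), (2, 3)]))
# N[0,0]N[0,0]N[0,1]N[0,0]N[2,3]N[0,0]N[0,0]N[0,0]N
```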
            </sec>
            <sec>
               <st>
                  <p>Extraction of the simple motifs</p>
               </st>
               <p>Given a template motif <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>, we know the lengths of the simple motif components desired. A naive approach is to directly do positional joins on the symbols from the end to the start of each simple motif. However, since some simple motifs are of the same length and the longer simple motifs can be obtained by doing positional joins on the shorter simple motifs/symbols, we can avoid some redundant computation. Note also that the gap range inside the simple motif is always [0,0].</p>
               <p>Let <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math> = {<it>L</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>m</it>}, where <it>L</it><sub><it>i </it></sub>is the length of each simple motif in <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> and assume <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math> is sorted in the ascending order. For each <it>L</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>m</it>, we need to enumerate <m:math name="1748-7188-1-21-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>&#931;</m:mi><m:mrow><m:mtext>DNA</m:mtext></m:mrow></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow><m:mrow><m:msub><m:mi>L</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaabdaqaaiabfo6atnaaBaaaleaacqqGebarcqqGobGtcqqGbbqqaeqaaaGccaGLhWUaayjcSdWaaWbaaSqabeaacqWGmbatdaWgaaadbaGaemyAaKgabeaaaaaaaa@3799@</m:annotation></m:semantics></m:math> possible simple motifs. Let <m:math name="1748-7188-1-21-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>m</m:mi><m:mi>a</m:mi><m:msub><m:mi>x</m:mi><m:mi>&#8466;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaieGacqWFTbqBcqWFHbqycqWF4baEdaWgaaWcbaacdaGae4NeHWeabeaaaaa@3BBF@</m:annotation></m:semantics></m:math> be the maximum length in <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math>. We can compute the pos-lists of simple motifs sequentially from length 1 to <m:math name="1748-7188-1-21-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>m</m:mi><m:mi>a</m:mi><m:msub><m:mi>x</m:mi><m:mi>&#8466;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaieGacqWFTbqBcqWFHbqycqWF4baEdaWgaaWcbaacdaGae4NeHWeabeaaaaa@3BBF@</m:annotation></m:semantics></m:math>. But this may waste time in enumerating some simple motifs of lengths that are not in <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math>. Instead, EXMOTIF first computes the pos-lists for the simple motifs of lengths that are powers of 2. Formally, let <it>J </it>be an integer such that 2<sup><it>J </it></sup>&#8804; <m:math name="1748-7188-1-21-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>m</m:mi><m:mi>a</m:mi><m:msub><m:mi>x</m:mi><m:mi>&#8466;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaieGacqWFTbqBcqWFHbqycqWF4baEdaWgaaWcbaacdaGae4NeHWeabeaaaaa@3BBF@</m:annotation></m:semantics></m:math> &lt; 2<sup><it>J</it>+1</sup>. We extract the patterns of length 2<sup><it>j </it></sup>by doing positional joins on the <it>pos-lists </it>of patterns of length 2<sup><it>j</it>-1 </sup>for all 1 &#8804; <it>j </it>&#8804; <it>J</it>. For example, when <m:math name="1748-7188-1-21-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>m</m:mi><m:mi>a</m:mi><m:msub><m:mi>x</m:mi><m:mi>&#8466;</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaieGacqWFTbqBcqWFHbqycqWF4baEdaWgaaWcbaacdaGae4NeHWeabeaaaaa@3BBF@</m:annotation></m:semantics></m:math> = 11, EXMOTIF first computes the pos-lists for simple motifs of length 2<sup>0 </sup>= 1, 2<sup>1 </sup>= 2, 2<sup>2 </sup>= 4 and 2<sup>3 </sup>= 8.</p>
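               <p>The doubling step can be sketched as follows, under several assumptions: the function names and the toy sequence ACGTAGT are hypothetical, a set lookup is used here in place of the merge-style positional join for brevity, and motifs with an empty pos-list are simply dropped. With the gap range [0,0], a tail of length <it>k</it> must start exactly <it>k</it> positions after the start of its head.</p>

```python
def power_of_two_lengths(max_len):
    """All lengths 2**j with 2**j not exceeding max_len, in ascending order."""
    lengths = [1]
    while max_len >= lengths[-1] * 2:
        lengths.append(lengths[-1] * 2)
    return lengths


def double_simple_motifs(pos_lists, k):
    """From the pos-lists of length-k simple motifs, build those of length 2k."""
    doubled = {}
    for head, head_pos in pos_lists.items():
        for tail, tail_pos in pos_lists.items():
            tail_starts = set(tail_pos)
            # gap [0,0]: the tail starts right after the head ends, at x + k
            joined = [x for x in head_pos if x + k in tail_starts]
            if joined:
                doubled[head + tail] = joined
    return doubled


print(power_of_two_lengths(11))  # [1, 2, 4, 8]

# Hypothetical toy sequence ACGTAGT with 1-based positions.
sequence = "ACGTAGT"
length1 = {s: [i + 1 for i, c in enumerate(sequence) if c == s] for s in "ACGT"}
length2 = double_simple_motifs(length1, 1)
length4 = double_simple_motifs(length2, 2)
print(length2["GT"])    # [3, 6]
print(sorted(length4))  # ['ACGT', 'CGTA', 'GTAG', 'TAGT']
```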
               <p>EXMOTIF then computes the pos-lists for the simple motifs of <it>L</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math>, by doing positional joins on simple motifs whose <it>pos-list</it>(s) have already been computed and their lengths sum to <it>L</it><sub><it>i</it></sub>. For example, when <it>L</it><sub><it>i </it></sub>= 11, EXMOTIF has to join motifs of lengths 8, 2, and 1. It first obtains all motifs of length 8 + 2 = 10, and then joins the motifs of lengths 10 and 1, to get the pos-lists of all simple motifs of length 10 + 1 = 11. The pos-lists for the simple motifs of length <it>L</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8466;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFsectaaa@376D@</m:annotation></m:semantics></m:math> are kept for further use in the structured motif extraction. At the end of the first phase, EXMOTIF has computed the pos-lists for all simple motif components that can satisfy the template.</p>
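               <p>The choice of which pos-lists to join for a given length can be illustrated with a binary decomposition. Note that the text only gives the example of length 11; generalizing it to the binary representation of the length, as well as the function names below, is an assumption made for this sketch.</p>

```python
def decompose_length(length):
    """Split length into distinct powers of two, largest first (e.g. 11 gives 8, 2, 1)."""
    parts = []
    power = 1
    while length >= power * 2:  # find the largest power of two not exceeding length
        power *= 2
    while length > 0:
        if length >= power:
            parts.append(power)
            length -= power
        power //= 2
    return parts


def join_plan(length):
    """Sequence of joins as (left length, right length, resulting length)."""
    parts = decompose_length(length)
    plan = []
    total = parts[0]
    for part in parts[1:]:
        plan.append((total, part, total + part))
        total += part
    return plan


print(decompose_length(11))  # [8, 2, 1]
print(join_plan(11))         # [(8, 2, 10), (10, 1, 11)]
```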
            </sec>
            <sec>
               <st>
                  <p>Extraction of the structured motifs</p>
               </st>
               <p>We extract the structured motifs by doing positional joins on the pos-lists of the simple motifs from the end to the start in the structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. Formally, let <it>H</it>[<it>l</it>, <it>u</it>]<it>T </it>be an intermediate structured motif, with simple motif <it>H </it>as the head, and a suffix structured motif <it>T </it>as tail. Then <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>H</it>[<it>l</it>, <it>u</it>]<it>T</it>) can be obtained by doing positional joins on <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>H</it>) and <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>T</it>). Since <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>H</it>) keeps only the start positions, we need to compute the corresponding end positions for those occurrences of <it>H</it>, to check the gap constraints. Since only exact matches or substitutions are allowed for simple motifs, the end position is simply <it>s </it>+ |<it>H</it>| - 1 for a start position <it>s</it>.</p>
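               <p>The join of a simple motif head with a structured tail differs from the symbol-level join only in that the gap is measured from the end position of the head. The following sketch makes this explicit; the function name join_head_tail and the pos-lists in the example are hypothetical, and both pos-lists are assumed to be sorted.</p>

```python
def join_head_tail(head_pos, head_len, tail_pos, l, u):
    """Pos-list of H[l,u]T from the start positions of H (of length head_len) and of T."""
    result = []
    i, j = 0, 0
    while i != len(head_pos) and j != len(tail_pos):
        head_end = head_pos[i] + head_len - 1  # end position s + |H| - 1
        d = tail_pos[j] - head_end - 1         # gap between the end of H and the start of T
        if d > u:
            i += 1                             # tail too far away: advance the head
        elif d >= l:
            result.append(head_pos[i])         # keep only the start position of H
            i += 1
        else:
            j += 1                             # tail too close or before the head: advance it
    return result


# Hypothetical data: |H| = 3, P(H) = {1, 5, 10}, P(T) = {6, 9, 20}, gap range [1, 2].
print(join_head_tail([1, 5, 10], 3, [6, 9, 20], 1, 2))  # [1, 5]
```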
            </sec>
            <sec>
               <st>
                  <p>Full-position recovery</p>
               </st>
               <p>In our positional join approach, we retain only the motif start positions in order to save time and space. However, some applications need the full position of each occurrence, i.e., the set of matching positions for each symbol in the motif. EXMOTIF records some "indices" during the positional joins in order to facilitate full-position recovery.</p>
               <p>For each suffix of a structured motif, <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, starting at position <it>i </it>with 1 &#8804; <it>i </it>&#8804; |<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>|, we keep its pos-list, <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>, and an index list, <m:math name="1748-7188-1-21-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">N</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFneVtaaa@383B@</m:annotation></m:semantics></m:math><sub><it>i</it></sub>. For each entry, say <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>[<it>j</it>], in the pos-list <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>, the corresponding index entry <m:math name="1748-7188-1-21-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">N</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFneVtaaa@383B@</m:annotation></m:semantics></m:math><sub><it>i</it></sub>[<it>j</it>], points to the first entry, say <it>f</it>, in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it>+1 </sub>that satisfies the gap range with respect to <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>[<it>j</it>], i.e., <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it>+1</sub>[<it>f</it>] - <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>[<it>j</it>] - 1 &#8712; [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>]. Note that <m:math name="1748-7188-1-21-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi mathvariant="script">N</m:mi><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>&#8499;</m:mi><m:mo>|</m:mo></m:mrow></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFneVtdaWgaaWcbaWaaqWaaeaacqWFZestaiaawEa7caGLiWoaaeqaaaaa@3CAF@</m:annotation></m:semantics></m:math> is never used. Also note that <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) = <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub>1</sub>. Let <it>s </it>be a start position for the structured motif in sequence <it>S</it>, and let <it>s </it>be the <it>j</it><sub><it>s</it></sub>-th entry in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub>1</sub>, i.e., <it>s </it>= <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub>1</sub>[<it>j</it><sub><it>s</it></sub>]. Let <it>F </it>store a full position starting from <it>s</it>, and let <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> store the set of all full positions. Figure <figr fid="F2">2</figr> shows the pseudo-code for recovering full positions starting from <it>s</it>. This recursive algorithm has four parameters: <it>i </it>denotes a (suffix) position in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, <it>j </it>gives the <it>j</it>-th entry in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it></sub>, <it>F </it>denotes an intermediate full position, and <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> denotes the set of all the full occurrences. The algorithm is initially called with <it>i </it>= 2, <it>j </it>= <m:math name="1748-7188-1-21-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">N</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFneVtaaa@383B@</m:annotation></m:semantics></m:math><sub>1</sub>[<it>j</it><sub><it>s</it></sub>], <it>F </it>= {<it>s</it>}, and <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = &#8709;. Starting at the first index in <it>P</it><sub><it>i</it></sub> that satisfies the gap range with respect to the last position in <it>F</it>, we compute all positions <it>j</it>' &#8712; [<it>j</it>, |<it>P</it><sub><it>i</it></sub>|] that satisfy the gap range (line 3). That is, we find all positions <it>j</it>' such that <it>P</it><sub><it>i</it></sub>[<it>j</it>'] - <it>F</it>[<it>i </it>- 1] - 1 = <it>d </it>&#8712; [<it>l</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>]. For each such position <it>j</it>', we add it in turn to the intermediate full position and make another recursive call (line 5), passing the first index position <it>N</it><sub><it>i</it></sub>[<it>j</it>'] in <it>P</it><sub><it>i</it>+1 </sub>that can satisfy the gap range with respect to <it>P</it><sub><it>i</it></sub>[<it>j</it>']. Thus in each call we keep following the indices from one pos-list to the next, and we obtain a full position starting from <it>s </it>when we reach the last pos-list, <m:math name="1748-7188-1-21-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mtext mathvariant="script">P</m:mtext><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>&#8499;</m:mi><m:mo>|</m:mo></m:mrow></m:mrow></m:msub></m:mrow></m:semantics></m:math>. 
Note that at each suffix position <it>i</it>, since <it>j </it>only marks the first position in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math><sub><it>i</it>+1 </sub>that satisfies the gap constraints, we also need to consider all the subsequent positions <it>j</it>' > <it>j </it>that may satisfy the corresponding gap range.</p>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>Indexed Full Position Recovery Algorithm</p>
                  </caption>
                  <text>
                     <p>Indexed Full Position Recovery Algorithm.</p>
                  </text>
                  <graphic file="1748-7188-1-21-2"/>
               </fig>
               <p>Consider the example shown in Fig. <figr fid="F3">3</figr> to recover the full positions for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = CCG[0,3]TA[1,3]GAAC. Under each symbol we show two columns. The left column corresponds to the intermediate pos-lists as we proceed from right to left, whereas the right column stores the indices into the previous pos-list. For example, the middle column gives the pos-list <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>TA</it>[1,3]<it>GAAC</it>) = {<b>1</b>, <b>1</b>, 4, <b>2</b>, <b>2</b>, 5, 7, <b>3</b>, <b>1</b>, 1}. For each position <it>x </it>&#8712; <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>TA</it>[1,3]<it>GAAC</it>) (excluding the sequence identifiers and the cardinality), the right column records an index in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>GAAC</it>) which corresponds to the first position in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>GAAC</it>) that satisfies the gap range with respect to <it>x</it>. For example, for position <it>x </it>= 5 (at index 6), the first position in <m:math name="1748-7188-1-21-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext></m:semantics></m:math>
(<it>GAAC</it>) that satisfies the gap range [1,3] is 10 (since in this case there are 3 gaps between the end of TA at position 6 and start of GAAC at position 10), and it occurs at index 6. Likewise, for each position in the current pos-list we store which positions in the previous pos-list were extended. With this indexed information, full-position recovery becomes straightforward. We begin with the start positions of the occurrences. We then keep following the indices from one pos-list to the next, until we reach the last pos-list. Since the index only marks the first position that satisfies the gap range, we still need to check if the following positions satisfy the gap range. At each stage in the full position recovery, we maintain a list of intermediate position prefixes <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> that match up to the <it>j</it>-th position in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. For example, to recover the full position for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = CCG[0,3]TA[1,3]GAAC, considering start position 1 (with <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1)}) in sequence 2, we follow index 6 to get position 5 in the middle pos-list, to get <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1, 5)}. Since the next position after 5 is 7, which is also within the gap range [0,3], we update <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1, 5), (1, 7)}. For position 5, we follow index 6 to get position 10 in the rightmost pos-list, to get <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1, 5, 10)}; for position 7, we follow index 6 to get position 10 in the right pos-list, to get <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1, 7, 10)}. Likewise, we can recover the full-position in sequence 1, which is <m:math name="1748-7188-1-21-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8497;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFXeIraaa@3787@</m:annotation></m:semantics></m:math> = {(1, 4, 8)}. During the full-position recovery, we can also count the number of full-positions, i.e., occurrences, of each structured motif. For example, there are 3 occurrences of CCG[0,3]TA[1,3]GAAC.</p>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Indexed Full-position Recovery Example</p>
                  </caption>
                  <text>
                     <p>Indexed Full-position Recovery Example.</p>
                  </text>
                  <graphic file="1748-7188-1-21-3"/>
               </fig>
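               <p>The recovery procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the EXMOTIF implementation: the function names (build_index, recover, full_positions) are hypothetical, the pos-lists are held per sequence as plain sorted lists with 0-based indices (rather than the flattened lists with sequence identifiers and cardinalities shown in Fig. 3), and the gap is assumed to be measured from the end of one simple motif to the start of the next, as in the worked example.</p>

```python
from bisect import bisect_left


def build_index(cur, nxt, cur_len, lo):
    """For each start in cur, return the index of the first start in nxt
    whose gap to the end of the current component is at least lo."""
    return [bisect_left(nxt, p + cur_len + lo) for p in cur]


def recover(pos, idx, lens, gaps, i, j, partial, out):
    """Extend partial (starts of components 0..i-1) with entries pos[i][j:]."""
    prev_end = partial[-1] + lens[i - 1]  # first position after component i-1
    lo, hi = gaps[i - 1]
    for jp in range(j, len(pos[i])):
        gap = pos[i][jp] - prev_end
        if gap > hi:
            break  # pos-lists are sorted, so later entries are farther away
        if lo > gap:
            continue  # defensive check; the index should already exclude these
        if i == len(pos) - 1:
            out.append(tuple(partial) + (pos[i][jp],))
        else:
            # Follow the stored index into the next pos-list.
            recover(pos, idx, lens, gaps, i + 1, idx[i][jp],
                    partial + [pos[i][jp]], out)


def full_positions(pos, lens, gaps):
    """Return all full positions for one sequence.

    pos[i]  -- sorted start positions of component i in the sequence
    lens[i] -- length of component i
    gaps[i] -- (l, u) gap range between components i and i+1
    """
    if len(pos) == 1:
        return [(s,) for s in pos[0]]
    idx = [build_index(pos[i], pos[i + 1], lens[i], gaps[i][0])
           for i in range(len(pos) - 1)]
    out = []
    for js, s in enumerate(pos[0]):
        recover(pos, idx, lens, gaps, 1, idx[0][js], [s], out)
    return out


# Sequence 2 of the example: CCG at 1, TA at 5 and 7, GAAC at 10.
seq2 = full_positions([[1], [5, 7], [10]], [3, 2, 4], [(0, 3), (1, 3)])
# Sequence 1 of the example: CCG at 1, TA at 4, GAAC at 8.
seq1 = full_positions([[1], [4], [8]], [3, 2, 4], [(0, 3), (1, 3)])
print(seq2, seq1)
```

               <p>With the positions quoted in the example, the sketch yields (1, 5, 10) and (1, 7, 10) for sequence 2 and (1, 4, 8) for sequence 1, i.e., the three occurrences counted in the text.</p>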
            </sec>
            <sec>
               <st>
                  <p>Length ranges for simple motifs</p>
               </st>
               <p>EXMOTIF also allows variation in the lengths of the simple motifs to be found. For example, a motif template may be specified as <it>M</it><sub>1</sub>[5,10] <it>M</it><sub>2</sub>, |<it>M</it><sub>1</sub>| &#8712; [2,4], and |<it>M</it><sub>2</sub>| &#8712; [6,7], which means that we have to consider NN, NNN, and NNNN as the possible templates for <it>M</it><sub>1 </sub>and similarly for <it>M</it><sub>2</sub>. A straightforward way for handling length ranges is to enumerate exhaustively all the possible sub-templates of <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> with simple motifs of fixed lengths and then to extract each sub-template separately. Instead, EXMOTIF performs an optimized extraction: it enumerates the sub-templates with a depth-first search and reuses the partial pos-lists created along the way.</p>
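               <p>To make the size of the search space concrete, the following Python sketch enumerates the fixed-length sub-templates for the example above. It illustrates only the straightforward exhaustive enumeration, not the optimized depth-first extraction with pos-list reuse; the function name sub_templates and its argument layout are assumptions made for this illustration.</p>

```python
from itertools import product


def sub_templates(length_ranges, gap_ranges):
    """Enumerate every fixed-length sub-template of a motif template.

    length_ranges -- one (min, max) length pair per simple motif
    gap_ranges    -- one (l, u) gap range between consecutive simple motifs
    """
    ranges = [range(lo, hi + 1) for lo, hi in length_ranges]
    result = []
    for lengths in product(*ranges):
        parts = []
        for k, n in enumerate(lengths):
            parts.append("N" * n)  # a simple motif of fixed length n
            if k != len(lengths) - 1:
                lo, hi = gap_ranges[k]
                parts.append(f"[{lo},{hi}]")
        result.append("".join(parts))
    return result


# Template M1[5,10]M2 with |M1| in [2,4] and |M2| in [6,7].
templates = sub_templates([(2, 4), (6, 7)], [(5, 10)])
print(len(templates), templates[0])
```

               <p>For this template there are 3 x 2 = 6 sub-templates; many of them share the same choice for the first simple motif, which is why reusing partial pos-lists pays off.</p>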
            </sec>
         </sec>
         <sec>
            <st>
               <p>Handling substitutions</p>
            </st>
            <p>As mutations are a common phenomenon in biological sequences, we allow substitutions in the extracted motifs. That is, two motif instances may be considered to be the same if they are within the allowed substitution thresholds. EXMOTIF allows users to specify the number of substitutions allowed for the whole motif (<it>&#949;</it>), and also a per simple motif threshold (<it>&#949;</it><sub><it>i</it></sub>, <it>i </it>&#8712; [1, <it>k</it>]). There are two types of substitutions we consider.</p>
            <sec>
               <st>
                  <p>Position-specific substitutions</p>
               </st>
               <p>Here we allow a position (a DNA symbol) in the instance motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> to be substituted with 1 or 2 other DNA symbols. All such neighbors will contribute to the frequency of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. For example, for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = <it>ACG</it>[4,6]<it>TT</it>, if we allow <it>e</it><sub>1 </sub>= 1 substitution in motif <it>M</it><sub>1 </sub>= <it>ACG </it>at position 2, then <it>AAG</it>[4,6]<it>TT</it>, <it>ACG</it>[4,6]<it>TT</it>, or <it>AGG</it>[4,6]<it>TT </it>may contribute to the frequency of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>. Instead of enumerating all of these separately, EXMOTIF can directly mine relevant motifs using IUPAC symbols (see Table <tblr tid="T3">3</tblr>). EXMOTIF simply constructs the pos-lists for the relevant IUPAC symbols by scanning sequences in <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math> once. Then it mines the motif instances as in the basic approach, since all allowed substitutions have already been incorporated into the relevant IUPAC symbols. Let <it>v</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>k</it>, denote the maximum number of DNA symbols that each position of simple motif <it>M</it><sub><it>i </it></sub>may stand for; it determines the set of IUPAC symbols that can appear in the motif. When <it>v</it><sub><it>i </it></sub>= 1 (i.e., each position allows only 1 DNA symbol), the alphabet used is {A, C, G, T}; when <it>v</it><sub><it>i </it></sub>= 2 (i.e., each position may allow up to 2 DNA symbols), the expanded alphabet is {A, C, G, T, R, Y, K, M, S, W}; and when <it>v</it><sub><it>i </it></sub>= 3 (i.e., each position may allow up to 3 DNA symbols), the expanded alphabet is {A, C, G, T, R, Y, K, M, S, W, B, D, H, V}. For example, when <it>v</it><sub>1 </sub>= 2, instead of reporting <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = <it>ACG</it>[4,6]<it>TT </it>as the mined instance, EXMOTIF may report <it>ASG</it>[4,6]<it>TT </it>as an instance, where S stands for either C or G (see Table <tblr tid="T3">3</tblr>). EXMOTIF also allows the user to specify the maximum number of IUPAC symbols that can appear in each simple motif, <it>e</it><sub><it>i</it></sub>, 1 &#8804; <it>i </it>&#8804; <it>k</it>.</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>IUPAC alphabet (&#931;<sub>IUPAC</sub>).</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="center">
                           <p>Symbol</p>
                        </c>
                        <c ca="center">
                           <p>A</p>
                        </c>
                        <c ca="center">
                           <p>C</p>
                        </c>
                        <c ca="center">
                           <p>G</p>
                        </c>
                        <c ca="center">
                           <p>T</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Bases</p>
                        </c>
                        <c ca="center">
                           <p>A</p>
                        </c>
                        <c ca="center">
                           <p>C</p>
                        </c>
                        <c ca="center">
                           <p>G</p>
                        </c>
                        <c ca="center">
                           <p>T</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Symbol</p>
                        </c>
                        <c ca="center">
                           <p>U</p>
                        </c>
                        <c ca="center">
                           <p>R</p>
                        </c>
                        <c ca="center">
                           <p>Y</p>
                        </c>
                        <c ca="center">
                           <p>K</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Bases</p>
                        </c>
                        <c ca="center">
                           <p>U</p>
                        </c>
                        <c ca="center">
                           <p>A,G</p>
                        </c>
                        <c ca="center">
                           <p>C,T</p>
                        </c>
                        <c ca="center">
                           <p>G,T</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Symbol</p>
                        </c>
                        <c ca="center">
                           <p>M</p>
                        </c>
                        <c ca="center">
                           <p>S</p>
                        </c>
                        <c ca="center">
                           <p>W</p>
                        </c>
                        <c ca="center">
                           <p>B</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Bases</p>
                        </c>
                        <c ca="center">
                           <p>A,C</p>
                        </c>
                        <c ca="center">
                           <p>G,C</p>
                        </c>
                        <c ca="center">
                           <p>A,T</p>
                        </c>
                        <c ca="center">
                           <p>C,G,T</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Symbol</p>
                        </c>
                        <c ca="center">
                           <p>D</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>V</p>
                        </c>
                        <c ca="center">
                           <p>N</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Bases</p>
                        </c>
                        <c ca="center">
                           <p>A,G,T</p>
                        </c>
                        <c ca="center">
                           <p>A,C,T</p>
                        </c>
                        <c ca="center">
                           <p>A,C,G</p>
                        </c>
                        <c ca="center">
                           <p>A,C,G,T</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
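               <p>As a rough illustration of how Table 3 can drive pos-list construction, the Python sketch below encodes the DNA part of the IUPAC alphabet and builds the pos-list of a symbol by scanning a sequence once. The names IUPAC, alphabet and pos_list are hypothetical and chosen only for this example; the RNA symbol U is omitted, and positions are 1-based as in the text.</p>

```python
# DNA bases covered by each IUPAC symbol (Table 3, without U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def alphabet(v):
    """Symbols that stand for at most v DNA bases."""
    return [s for s, bases in IUPAC.items() if v >= len(bases)]


def pos_list(sequence, symbol):
    """1-based positions in sequence that match the given IUPAC symbol."""
    bases = IUPAC[symbol]
    return [i for i, b in enumerate(sequence, start=1) if b in bases]


two_base_alphabet = alphabet(2)
s_positions = pos_list("ACGTTGCA", "S")  # S stands for C or G
print(two_base_alphabet, s_positions)
```

               <p>Calling alphabet with 1, 2 and 3 reproduces the three alphabets listed above, with 4, 10 and 14 symbols respectively.</p>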
            </sec>
            <sec>
               <st>
                  <p>Arbitrary substitutions</p>
               </st>
               <p>Here we allow a DNA symbol in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> to be substituted with other symbols across all positions (i.e., in a position independent manner), up to the allowed maximum errors per motif (or per component). To count the support for a motif, EXMOTIF has to consider all of its <it>neighbors </it>as well, which are defined as all the motifs (including itself) within <it>Hamming distance, &#949; </it>(or per motif <it>e</it><sub><it>i</it></sub>). Then the support of an instance motif is calculated as the total number of sequences in which its neighbors (including itself) are present. As always, the motif is frequent if its support meets the quorum <it>q</it>, that is, its neighbors are present in at least <it>q </it>distinct sequences.</p>
               <p>The main challenge is that when arbitrary, position independent substitutions are allowed, we cannot do support checking during each positional join, since the support of the current motif may be below quorum, but combined with its neighbors it may meet quorum. Thus EXMOTIF does support checking at two points. First, it checks for quorum after the pos-lists of all the simple motifs in <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> have been computed, provided the per motif error thresholds <it>e</it><sub><it>i </it></sub>have been specified. In this case each simple motif must be frequent to be extended to a structured motif. Second, it checks for quorum after the pos-lists of all the structured motifs that satisfy <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> are computed.</p>
               <sec>
                  <st>
                     <p>Determining neighbors</p>
                  </st>
                  <p>In order to quickly find all the existing neighbors of a motif within the allowed error thresholds, EXMOTIF first computes all the exact structured motifs, and stores them into a hash table to facilitate fast lookup. Then for each extracted structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, EXMOTIF enumerates all its possible neighbors and checks whether they exist in the hash table. One problem is that the number of possible neighbors of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> can be quite large. When we allow <it>&#949;</it><sub><it>i </it></sub>substitutions for simple component <it>M</it><sub><it>i </it></sub>in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, for 1 &#8804; <it>i </it>&#8804; <it>k</it>, the number of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>'s neighbors is given as <m:math name="1748-7188-1-21-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:mo stretchy="false">[</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>0</m:mn></m:mrow><m:mrow><m:msub><m:mi>e</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msubsup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>M</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mi>j</m:mi></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>&#8901;</m:mo><m:msup><m:mn>3</m:mn><m:mi>j</m:mi></m:msup></m:mrow></m:mstyle><m:mo stretchy="false">]</m:mo></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqeWaqaaiabcUfaBnaaqadabaWaaeWaaeaafaqabeGabaaabaWaaqWaaeaacqWGnbqtdaWgaaWcbaGaemyAaKgabeaaaOGaay5bSlaawIa7aaqaaiabdQgaQbaaaiaawIcacaGLPaaacqGHflY1cqaIZaWmdaahaaWcbeqaaiabdQgaQbaaaeaacqWGQbGAcqGH9aqpcqaIWaamaeaacqWGLbqzdaWgaaadbaGaemyAaKgabeaaa0GaeyyeIuoakiabc2faDbWcbaGaemyAaKMaeyypa0JaeGymaedabaGaem4AaSganiabg+Givdaaaa@4B8B@</m:annotation></m:semantics></m:math>. For example, for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = AACGTT[1,5]AGTTCC, when we allow one substitution for each simple motif, the number of its neighbors is 361; when we allow two substitutions per component, the number of its neighbors is 23,716. Instead of enumerating the potentially large number of neighbors (many of which may not even occur in the sequence set <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>) for each structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> individually, EXMOTIF utilizes the observation that many motifs have shared neighbors, and thus previously computed support information can be reused. EXMOTIF enumerates neighbors in two steps. In the first step, for each <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, it enumerates <it>aggregate </it>neighbor motifs, replacing the allowed number of errors <it>e</it><sub><it>i </it></sub>with as many 'N' symbols (which stands for A,C,G, or T). The number of possible aggregate neighbors is given as <m:math name="1748-7188-1-21-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>M</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:msub><m:mi>&#949;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqeWaqaamaabmaabaqbaeqabiqaaaqaamaaemaabaGaemyta00aaSbaaSqaaiabdMgaPbqabaaakiaawEa7caGLiWoaaeaaiiGacqWF1oqzdaWgaaWcbaGaemyAaKgabeaaaaaakiaawIcacaGLPaaaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHpis1aaaa@3DF8@</m:annotation></m:semantics></m:math>. In the second step, it computes the support for each aggregate neighbor by expanding each 'N' with each DNA symbol, looking up the hash table for the support of the corresponding motif, and adding the supports for all matching motifs. Since the motifs matching an aggregate are also neighbors of each other, the support of the aggregate can be re-used to compute the support of other matching motifs as well. Once the supports for all aggregate neighbors have been computed, the final support of the structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> can be obtained. Thus for each <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, the number of "neighbors" to consider can be as low as <m:math name="1748-7188-1-21-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>M</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:msub><m:mi>&#949;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqeWaqaamaabmaabaqbaeqabiqaaaqaamaaemaabaGaemyta00aaSbaaSqaaiabdMgaPbqabaaakiaawEa7caGLiWoaaeaaiiGacqWF1oqzdaWgaaWcbaGaemyAaKgabeaaaaaakiaawIcacaGLPaaaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHpis1aaaa@3DF8@</m:annotation></m:semantics></m:math>.</p>
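                  <p>The two counting formulas above can be checked numerically. The following is a minimal sketch, not part of EXMOTIF itself: the function names neighbor_count and aggregate_count are hypothetical, and a structured motif is reduced to the list of its component lengths together with the per-component error bounds.</p>
                  <p>
```python
from math import comb, prod

def neighbor_count(lengths, errors):
    """Number of motifs within the per-component Hamming bounds (motif itself included).

    For every component, choose j of its positions (j from 0 up to the bound) and
    replace each chosen symbol by one of the 3 other DNA symbols.
    """
    return prod(
        sum(comb(m, j) * 3 ** j for j in range(e + 1))
        for m, e in zip(lengths, errors)
    )

def aggregate_count(lengths, errors):
    """Number of aggregate neighbors: choose which positions of each component become 'N'."""
    return prod(comb(m, e) for m, e in zip(lengths, errors))

# AACGTT[1,5]AGTTCC has two components of length 6.
print(neighbor_count([6, 6], [1, 1]))   # 361
print(neighbor_count([6, 6], [2, 2]))   # 23716
print(aggregate_count([6, 6], [1, 1]))  # 36
print(aggregate_count([6, 6], [2, 2]))  # 225
```
                  </p>
                  <p>For the motif AACGTT[1,5]AGTTCC, the sketch reproduces the 361 and 23,716 neighbors quoted above, against 36 and 225 aggregate neighbors, respectively.</p>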
                  <p>Consider the example shown in Figure <figr fid="F4">4</figr>, for the structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = TAA[0,3]GG[1,3]CCTT (taken from our example in Table <tblr tid="T1">1</tblr>); assume that <it>&#949;</it><sub>1 </sub>= 1, <it>&#949;</it><sub>2 </sub>= 0 and <it>&#949;</it><sub>3 </sub>= 1. There are three possible aggregates for TAA, namely TAN, TNA, and NAA, and four aggregates for CCTT, namely CCTN, CCNT, CNTT, and NCTT, giving a total of 12 aggregate neighbors for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, as illustrated in the figure. EXMOTIF processes each aggregate neighbor in turn. Using a hash-table (or direct lookup table if there are only a few neighbors), it checks if the aggregate neighbor has been processed previously. If yes, it moves on to the next aggregate. If not, it gathers the support information from all of its matching structured motifs, to compute its total support. Next, it also updates the neighbor support value for each of the matching motifs, so that once an aggregate is processed, we no longer require its information. All we need to know is whether it has been processed or not. For example, once the support of the first aggregate TAN[0,3]GG[1,3]CCTN for the example motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> above is computed, EXMOTIF also updates the neighbor supports for all other matching structured motifs, such as <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> = TAC[0,3]GG[1,3]CCTG. Later when processing <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>, EXMOTIF can skip the above aggregate and focus on the not yet processed aggregates, e.g., NAC[0,3]GG[1,3]NCTG, and so on.</p>
                  <fig id="F4">
                     <title>
                        <p>Figure 4</p>
                     </title>
                     <caption>
                        <p>Aggregate Neighbors</p>
                     </caption>
                     <text>
                        <p>Aggregate Neighbors.</p>
                     </text>
                     <graphic file="1748-7188-1-21-4"/>
                  </fig>
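                  <p>The enumeration of aggregate neighbors and their expansion into concrete motifs can be sketched as follows. This is an illustrative sketch under simplifying assumptions: a structured motif is represented as a tuple of its simple components, the gap ranges are omitted because substitutions do not affect them, and the helper names component_aggregates, aggregate_neighbors and expand are hypothetical.</p>
                  <p>
```python
from itertools import combinations, product

DNA = "ACGT"

def component_aggregates(component, errors):
    """All ways to replace exactly `errors` positions of one component with 'N'."""
    result = []
    for positions in combinations(range(len(component)), errors):
        symbols = list(component)
        for pos in positions:
            symbols[pos] = "N"
        result.append("".join(symbols))
    return result

def aggregate_neighbors(components, errors):
    """Cartesian product of the per-component aggregates."""
    per_component = [component_aggregates(c, e) for c, e in zip(components, errors)]
    return list(product(*per_component))

def expand(aggregate):
    """All concrete motifs matched by an aggregate (each 'N' becomes A, C, G or T)."""
    flat = "|".join(aggregate)  # '|' only marks component boundaries
    slots = [DNA if ch == "N" else ch for ch in flat]
    return [tuple("".join(choice).split("|")) for choice in product(*slots)]

# TAA[0,3]GG[1,3]CCTT with one error in the first and third components.
aggs = aggregate_neighbors(["TAA", "GG", "CCTT"], [1, 0, 1])
print(len(aggs))                                             # 12
print(("TAC", "GG", "CCTG") in expand(("TAN", "GG", "CCTN")))  # True
```
                  </p>
                  <p>Each aggregate with two 'N' symbols expands to 16 concrete motifs, only some of which are expected to be present in the hash table of exact motifs.</p>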
                  <p>The pseudo-code for arbitrary substitutions is given in Figure <figr fid="F5">5</figr>. The procedure takes as input the hash-table &#8461; containing all structured motifs <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> and their supports <it>&#960;</it>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>), the quorum <it>q</it>, and the per simple motif errors <it>e</it><sub><it>i </it></sub>or the global error <it>&#949; </it>for the structured motifs. For each structured motif we also maintain its aggregate support <it>&#960;</it><sup><it>aggregate</it></sup>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>), which is initially set to 0 (line 1). Initially we create all the aggregate neighbors for each extracted structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> (lines 3&#8211;7). For each such aggregate neighbor <it>G </it>(line 8), if it has not been processed, we compute its support by adding the individual supports of all its matching motifs <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> (lines 11&#8211;12). Note that these support values are found quickly via the hash-table &#8461;. Once the support of an aggregate neighbor is known, we immediately update the aggregate support (<it>&#960;</it><sup><it>aggregate</it></sup>) for each of its contributing matching motifs <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> (lines 13&#8211;14). Note that since each motif has already contributed to the support of the aggregate neighbor (<it>&#960; </it>(<it>G</it>)), we must subtract the initial support of <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> (<it>&#960;</it>(<m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>)) to avoid over-counting. Finally, once all the aggregate neighbors have been processed, we output the structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, provided <it>&#960;</it>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) + <it>&#960;</it><sup><it>aggregate</it></sup>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) meets the quorum requirement (line 14).</p>
                  <fig id="F5">
                     <title>
                        <p>Figure 5</p>
                     </title>
                     <caption>
                        <p>Arbitrary Substitutions</p>
                     </caption>
                     <text>
                        <p>Arbitrary Substitutions.</p>
                     </text>
                     <graphic file="1748-7188-1-21-5"/>
                  </fig>
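                  <p>The overall procedure can be approximated by the sketch below. It is not a transcription of the pseudo-code in Figure 5. As an assumption made for clarity, it pools sets of sequence identifiers by union instead of adding and subtracting numeric supports, so no explicit correction for over-counting is needed. Motifs are tuples of components that share one template, gap ranges are omitted, and the names aggregates, expand and frequent_with_substitutions are hypothetical.</p>
                  <p>
```python
from itertools import combinations, product

def aggregates(components, errors):
    """Aggregate neighbors: exactly e_i positions of component i replaced by 'N'."""
    per_component = []
    for comp, e in zip(components, errors):
        variants = []
        for positions in combinations(range(len(comp)), e):
            variants.append(
                "".join("N" if i in positions else ch for i, ch in enumerate(comp))
            )
        per_component.append(variants)
    return list(product(*per_component))

def expand(aggregate):
    """Concrete motifs matched by an aggregate ('|' marks component boundaries)."""
    flat = "|".join(aggregate)
    slots = ["ACGT" if ch == "N" else ch for ch in flat]
    return (tuple("".join(choice).split("|")) for choice in product(*slots))

def frequent_with_substitutions(table, errors, quorum):
    """table maps each exact motif to the set of sequence ids that contain it.

    Returns the motifs whose support, pooled over all neighbors, meets the quorum.
    """
    neighbor_ids = {motif: set(ids) for motif, ids in table.items()}
    processed = set()  # aggregates already handled; their contents are not kept
    for motif in table:
        for agg in aggregates(motif, errors):
            if agg in processed:
                continue
            processed.add(agg)
            # Hash-table lookup of every concrete motif matched by the aggregate.
            matching = [m for m in expand(agg) if m in table]
            pooled = set().union(*(table[m] for m in matching))
            # Motifs matching the same aggregate are neighbors of each other.
            for m in matching:
                neighbor_ids[m] |= pooled
    return {m: len(ids) for m, ids in neighbor_ids.items() if len(ids) >= quorum}

table = {
    ("TAA", "GG", "CCTT"): {1},
    ("TAC", "GG", "CCTG"): {2},
    ("GGG", "GG", "CCTT"): {3},
}
print(frequent_with_substitutions(table, errors=(1, 0, 1), quorum=2))
```
                  </p>
                  <p>In this toy input the first two motifs are neighbors of each other and each reaches a pooled support of 2, whereas the third motif has no neighbor in the table and is dropped at quorum 2.</p>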
               </sec>
               <sec>
                  <st>
                     <p>Counting support</p>
                  </st>
                  <p>There are two methods to record the support for each motif. In the first method, we associate each motif with a bit vector, <m:math name="1748-7188-1-21-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">V</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFveVvaaa@384B@</m:annotation></m:semantics></m:math>. Each bit, <m:math name="1748-7188-1-21-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">V</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFveVvaaa@384B@</m:annotation></m:semantics></m:math><sub><it>i </it></sub>for 1 &#8804; <it>i </it>&#8804; <it>n </it>(where <it>n </it>= |<m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>|) indicates whether the motif is present in the sequence <it>S</it><sub><it>i </it></sub>&#8712; <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. The support of the motif is the number of set bits in <m:math name="1748-7188-1-21-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">V</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFveVvaaa@384B@</m:annotation></m:semantics></m:math>. Thus to obtain the support for a motif, we can simply union the bit vectors of all its (aggregate) neighbors. Using one bit to represent a sequence saves space, and also saves time via the union operation. However, since we need <it>n </it>fixed bits for each motif to store its bit vector, this is not efficient if there are many sequences, and if a motif occurs only in a small number of sequences, which leads to a sparse bit vector. Thus in the second method, EXMOTIF associates each motif with an identifier array, <m:math name="1748-7188-1-21-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">Q</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqeFuaaa@3841@</m:annotation></m:semantics></m:math>, which stores only the identifiers of the sequences in which the motif occurs. EXMOTIF can then obtain the support for a motif by scanning the identifier arrays of its neighbors in linear time. For example, consider again our motif (from Table <tblr tid="T1">1</tblr>), TAT[0,1]GG[2,3]CCAT, which occurs in <it>S</it><sub>2 </sub>and <it>S</it><sub>3</sub>. Its bit vector is thus <m:math name="1748-7188-1-21-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">V</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFveVvaaa@384B@</m:annotation></m:semantics></m:math> = {0110} and its identifier array <m:math name="1748-7188-1-21-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">Q</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqeFuaaa@3841@</m:annotation></m:semantics></m:math> = {2, 3}.</p>
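                  <p>The two representations can be compared with a small sketch. It assumes that sequence identifiers start at 1, that a bit vector is held in a Python integer, and that each identifier array is sorted; the function names are hypothetical, and the second neighbor in the usage lines is invented purely for illustration.</p>
                  <p>
```python
from heapq import merge

def to_bitvector(seq_ids):
    """Encode sequence ids as an integer bit set (bit i-1 stands for sequence i)."""
    vec = 0
    for i in seq_ids:
        vec |= 2 ** (i - 1)
    return vec

def support_from_bitvectors(vectors):
    """Union the bit vectors of all neighbors and count the set bits."""
    union = 0
    for v in vectors:
        union |= v
    return bin(union).count("1")

def support_from_id_arrays(arrays):
    """Merge sorted identifier arrays and count the distinct sequence ids."""
    count = 0
    last = None
    for seq_id in merge(*arrays):
        if seq_id != last:
            count += 1
            last = seq_id
    return count

# The motif occurs in sequences 2 and 3; a made-up neighbor occurs in 3 and 4.
motif_ids, neighbor_ids = [2, 3], [3, 4]
print(support_from_bitvectors([to_bitvector(motif_ids), to_bitvector(neighbor_ids)]))  # 3
print(support_from_id_arrays([motif_ids, neighbor_ids]))                               # 3
```
                  </p>
                  <p>Both representations yield the same support; the identifier arrays use space proportional to the number of occurrences rather than to the total number of sequences.</p>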
               </sec>
               <sec>
                  <st>
                     <p>Creating positional weight matrices</p>
                  </st>
                  <p>For any frequent structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, we can summarize the information about its neighbors (including <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) by computing a <it>Positional Weight Matrix </it>(PWM). The PWM for a structured motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> gives for each non-gap position the likelihood of occurrence for each symbol in &#931;<sub>DNA</sub>. The PWM <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math> for <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is calculated as follows:</p>
                  <p>
                     <m:math name="1748-7188-1-21-i21" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>f</m:mi>
                                                   <m:mrow>
                                                      <m:mi>i</m:mi>
                                                      <m:mi>j</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                                <m:mo>+</m:mo>
                                                <m:msub>
                                                   <m:mi>p</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:msubsup>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>k</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:mrow>
                                                         <m:mrow>
                                                            <m:mo>|</m:mo>
                                                            <m:mrow>
                                                               <m:msub>
                                                                  <m:mi>&#931;</m:mi>
                                                                  <m:mrow>
                                                                     <m:mtext>DNA</m:mtext>
                                                                  </m:mrow>
                                                               </m:msub>
                                                            </m:mrow>
                                                            <m:mo>|</m:mo>
                                                         </m:mrow>
                                                      </m:mrow>
                                                   </m:msubsup>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>f</m:mi>
                                                         <m:mrow>
                                                            <m:mi>k</m:mi>
                                                            <m:mi>j</m:mi>
                                                         </m:mrow>
                                                      </m:msub>
                                                      <m:mo>+</m:mo>
                                                      <m:msub>
                                                         <m:mi>p</m:mi>
                                                         <m:mi>k</m:mi>
                                                      </m:msub>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi mathvariant="script">W</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:mi>ln</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mfrac>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>r</m:mi>
                                                         <m:mrow>
                                                            <m:mi>i</m:mi>
                                                            <m:mi>j</m:mi>
                                                         </m:mrow>
                                                      </m:msub>
                                                   </m:mrow>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>p</m:mi>
                                                         <m:mi>i</m:mi>
                                                      </m:msub>
                                                   </m:mrow>
                                                </m:mfrac>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                              <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaafaqabeqacaaabaGaemOCai3aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGH9aqpdaWcaaqaaiabdAgaMnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaey4kaSIaemiCaa3aaSbaaSqaaiabdMgaPbqabaaakeaadaaeWaqaaiabdAgaMnaaBaaaleaacqWGRbWAcqWGQbGAaeqaaOGaey4kaSIaemiCaa3aaSbaaSqaaiabdUgaRbqabaaabaGaem4AaSMaeyypa0JaeGymaedabaWaaqWaaeaacqqHJoWudaWgaaadbaGaeeiraqKaeeOta4KaeeyqaeeabeaaaSGaay5bSlaawIa7aaqdcqGHris5aaaakiabcYcaSaqaaGWaaiab=zr8xnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeyypa0JagiiBaWMaeiOBa42aaeWaaeaadaWcaaqaaiabdkhaYnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaaGcbaGaemiCaa3aaSbaaSqaaiabdMgaPbqabaaaaaGccaGLOaGaayzkaaaaaiaaxMaacaWLjaWaaeWaaeaacqaIXaqmaiaawIcacaGLPaaaaaa@6FBB@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </p>
                  <p>where <it>f</it><sub><it>ij </it></sub>and <it>r</it><sub><it>ij </it></sub>represent the observed and relative frequencies of symbol <it>i </it>at position <it>j</it>, respectively, <it>p</it><sub><it>i </it></sub>is the prior probability of symbol <it>i</it>, and <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math><sub><it>ij </it></sub>is the weight (log-likelihood) of observing symbol <it>i </it>at position <it>j</it>. Whereas <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math> gives the likelihood of observing a given symbol in a given position in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, it does not account for the degree to which some symbols are conserved at some positions. We can adjust the weights <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math><sub><it>ij </it></sub>by considering the information content at each position. The <it>information content </it>for a PWM is given as:</p>
                  <p>
                     <m:math name="1748-7188-1-21-i22" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtable>
                                 <m:mtr>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>&#8464;</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mi>ln</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msub>
                                             <m:mi>p</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mi>ln</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>p</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>&#8464;</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:mstyle displaystyle="true">
                                             <m:munderover>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mrow>
                                                   <m:mi>i</m:mi>
                                                   <m:mo>=</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mrow>
                                                      <m:mo>|</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>&#931;</m:mi>
                                                            <m:mrow>
                                                               <m:mtext>DNA</m:mtext>
                                                            </m:mrow>
                                                         </m:msub>
                                                      </m:mrow>
                                                      <m:mo>|</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                             </m:munderover>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>&#8464;</m:mi>
                                                   <m:mrow>
                                                      <m:mi>i</m:mi>
                                                      <m:mi>j</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:mstyle>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>&#8464;</m:mi>
                                             <m:mi mathvariant="script">W</m:mi>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:mstyle displaystyle="true">
                                             <m:munderover>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mrow>
                                                   <m:mi>j</m:mi>
                                                   <m:mo>=</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                                <m:mi>K</m:mi>
                                             </m:munderover>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>&#8464;</m:mi>
                                                   <m:mi>j</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                              <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mn>2</m:mn>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaafaqabeqadaaabaacdaGae8heHK0aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGH9aqpcqWGYbGCdaWgaaWcbaGaemyAaKMaemOAaOgabeaakiGbcYgaSjabc6gaUjabcIcaOiabdkhaYnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeiykaKIaeyOeI0IaemiCaa3aaSbaaSqaaiabdMgaPbqabaGccyGGSbaBcqGGUbGBcqGGOaakcqWGWbaCdaWgaaWcbaGaemyAaKgabeaakiabcMcaPiabcYcaSaqaaiab=brijnaaBaaaleaacqWGQbGAaeqaaOGaeyypa0ZaaabCaeaacqWFqessdaWgaaWcbaGaemyAaKMaemOAaOgabeaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaadaabdaqaaiabfo6atnaaBaaameaacqqGebarcqqGobGtcqqGbbqqaeqaaaWccaGLhWUaayjcSdaaniabggHiLdGccqGGSaalaeaacqWFqessdaWgaaWcbaGae8NfXFfabeaakiabg2da9maaqahabaGae8heHK0aaSbaaSqaaiabdQgaQbqabaaabaGaemOAaOMaeyypa0JaeGymaedabaGaem4saSeaniabggHiLdaaaOGaaCzcaiaaxMaadaqadaqaaiabikdaYaGaayjkaiaawMcaaaaa@7BF1@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </p>
                  <p>where <it>K </it>is the number of symbols in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>; <m:math name="1748-7188-1-21-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8464;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqessaaa@3769@</m:annotation></m:semantics></m:math><sub><it>ij </it></sub>is the information content of symbol <it>i </it>at position <it>j</it>; <m:math name="1748-7188-1-21-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8464;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqessaaa@3769@</m:annotation></m:semantics></m:math><sub><it>j </it></sub>is the information content over all bases at position <it>j</it>; and <m:math name="1748-7188-1-21-i24" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#8464;</m:mi><m:mi mathvariant="script">W</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqessdaWgaaWcbaGae8NfXFfabeaaaaa@3978@</m:annotation></m:semantics></m:math> is the information content of the entire matrix <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math>. To allow mismatches at less conserved positions to be more easily tolerated than those at highly conserved positions, we multiply each <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math><sub><it>ij </it></sub>by <m:math name="1748-7188-1-21-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8464;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFqessaaa@3769@</m:annotation></m:semantics></m:math><sub><it>j</it></sub>, which is larger for more conserved positions. As a result, the corrected weight of each element in the PWM <m:math name="1748-7188-1-21-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">W</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vaaa@384D@</m:annotation></m:semantics></m:math> becomes:</p>
                  <p>
                     <m:math name="1748-7188-1-21-i25" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:msubsup>
                                 <m:mi mathvariant="script">W</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                                 <m:mi>c</m:mi>
                              </m:msubsup>
                              <m:mo>=</m:mo>
                              <m:msub>
                                 <m:mi>&#8464;</m:mi>
                                 <m:mi>j</m:mi>
                              </m:msub>
                              <m:mi>ln</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>r</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>p</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                              <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mn>3</m:mn>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFwe=vdaqhaaWcbaGaemyAaKMaemOAaOgabaGaem4yamgaaOGaeyypa0Jae8heHK0aaSbaaSqaaiabdQgaQbqabaGccyGGSbaBcqGGUbGBdaqadaqaamaalaaabaGaemOCai3aaSbaaSqaaiabdMgaPjabdQgaQbqabaaakeaacqWGWbaCdaWgaaWcbaGaemyAaKgabeaaaaaakiaawIcacaGLPaaacaWLjaGaaCzcamaabmaabaGaeG4mamdacaGLOaGaayzkaaaaaa@4F98@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </p>
                  <p>Then we can calculate the PWM score, <m:math name="1748-7188-1-21-i26" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8475;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFBeIuaaa@377D@</m:annotation></m:semantics></m:math>, for a structured motif, <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, by summing up the positional weights for the bases in <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, given as <m:math name="1748-7188-1-21-i27" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#8475;</m:mi><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>K</m:mi></m:msubsup><m:mrow><m:msubsup><m:mi mathvariant="script">W</m:mi><m:mrow><m:mi>&#8499;</m:mi><m:mo stretchy="false">[</m:mo><m:mi>j</m:mi><m:mo stretchy="false">]</m:mo><m:mi>j</m:mi></m:mrow><m:mi>c</m:mi></m:msubsup></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFBeIucqGH9aqpdaaeWaqaaiab=zr8xnaaDaaaleaacqWFZestcqGGBbWwcqWGQbGAcqGGDbqxcqWGQbGAaeaacqWGJbWyaaaabaGaemOAaOMaeyypa0JaeGymaedabaGaem4saSeaniabggHiLdaaaa@48AB@</m:annotation></m:semantics></m:math>. Thus for each <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, its PWM score and PWM information content can be further used to measure whether <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> is a significant motif.</p>
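                  <p>Equations (1) to (3) and the PWM score can be illustrated with a minimal sketch. It assumes that the neighbors of the motif have already been reduced to equal-length strings of their non-gap symbols and that the priors <it>p</it><sub><it>i </it></sub>are supplied by the caller. The names <it>pwm_tables</it> and <it>pwm_score</it> are hypothetical and are not part of EXMOTIF.</p>

```python
import math

DNA = "ACGT"  # the alphabet Sigma_DNA


def pwm_tables(sites, priors):
    """Return (r, W, info, Wc) for equal-length sites, following Eqs. (1)-(3).

    sites  : list of strings over DNA (assumed: non-gap symbols only)
    priors : dict mapping each symbol i to its prior probability p_i
    """
    K = len(sites[0])
    r, W, Wc, info = {}, {}, {}, []
    for j in range(K):
        # f_ij: observed count of symbol i at position j
        f = {i: sum(1 for s in sites if s[j] == i) for i in DNA}
        denom = sum(f[k] + priors[k] for k in DNA)
        for i in DNA:
            r[i, j] = (f[i] + priors[i]) / denom      # Eq. (1), relative frequency
            W[i, j] = math.log(r[i, j] / priors[i])   # Eq. (1), log-likelihood weight
        # Eq. (2): information content over all bases at position j
        info_j = sum(
            r[i, j] * math.log(r[i, j]) - priors[i] * math.log(priors[i])
            for i in DNA
        )
        info.append(info_j)
        for i in DNA:
            Wc[i, j] = info_j * W[i, j]               # Eq. (3), corrected weight
    return r, W, info, Wc


def pwm_score(motif, Wc):
    """Sum the corrected weights of the symbols of motif, position by position."""
    return sum(Wc[motif[j], j] for j in range(len(motif)))
```

                  <p>The list <it>info</it> holds the per-position information content, and its sum gives the information content of the entire matrix. With uniform priors, a fully conserved column receives a larger value than a mixed column, so a mismatch at that column lowers the score more.</p>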
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Solving the repeated structured motif identification problem</p>
            </st>
            <p>In the repeated structured motif identification problem, the frequency closure property (that all the subsequences of a frequent sequence must be frequent) no longer holds. For example, the sequence GCTTT has three occurrences of the pattern G[1,3]T, but its sub-pattern G has only one occurrence. Thus we cannot apply the closure property to prune candidates. Nevertheless, a bound on the frequency of a sub-pattern can be established and used for pruning.</p>
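            <p>The counter-example above can be checked with a minimal sketch. The helper <it>count_occurrences</it> is a hypothetical name introduced only for illustration; it is a brute-force count, not the positional-join machinery of EXMOTIF, and it assumes that a gap range [l, u] means between l and u symbols separate the two components.</p>
```python
def count_occurrences(seq, first, last, lo, hi):
    """Count pairs (i, j) where seq[i] == first, seq[j] == last,
    and between lo and hi symbols lie strictly between them."""
    count = 0
    for i, symbol in enumerate(seq):
        if symbol != first:
            continue
        for gap in range(lo, hi + 1):
            j = i + 1 + gap  # position of the second component
            if len(seq) > j and seq[j] == last:
                count += 1
    return count

# Pattern G[1,3]T versus its sub-pattern G in the sequence GCTTT.
pattern_count = count_occurrences("GCTTT", "G", "T", 1, 3)
subpattern_count = "GCTTT".count("G")
print(pattern_count, subpattern_count)  # 3 1
```
            <p>The pattern occurs more often than its sub-pattern because the single G pairs with each of the three T symbols, which is why closure-based pruning is unsafe here.</p>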
            <p><b>Theorem 1. </b><it>Let </it><m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> = <it>M</it><sub>1 </sub>... <it>M</it><sub><it>k </it></sub><it>be a structured motif and </it><m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> = <it>M</it><sub><it>i </it></sub>... <it>M</it><sub><it>k </it></sub><it>be a suffix of </it><m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, <it>for </it>1 &#8804; <it>i </it>&#8804; <it>k</it>. <it>If the weighted support of </it><m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math><it>is &#960;</it><sub><it>w </it></sub>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>), <it>then </it><m:math name="1748-7188-1-21-i28" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>w</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mo stretchy="false">)</m:mo><m:mo>&#8805;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>w</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:msub><m:mi>W</m:mi><m:mi>m</m:mi></m:msub></m:mrow></m:mstyle></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaiiGacqWFapaCdaWgaaWcbaGaem4DaChabeaakiabcIcaOGWaaiqb+ntinzaafaGaeiykaKIaeyyzIm7aaSaaaeaacqWFapaCdaWgaaWcbaGaem4DaChabeaakiabcIcaOiab+ntinjabcMcaPaqaamaaradabaGaem4vaC1aaSbaaSqaaiabd2gaTbqabaaabaGaemyBa0Maeyypa0JaeGymaedabaGaemyAaKMaeyOeI0IaeGymaedaniabg+Givdaaaaaa@500D@</m:annotation></m:semantics></m:math>, <it>where W</it><sub><it>m </it></sub>= <it>u</it><sub><it>m </it></sub>- <it>l</it><sub><it>m </it></sub>+ 1 <it>is the span of the gap range for m </it>&#8712; [1, <it>k </it>- 1].</p>
            <p><it>Proof</it>. Let <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) be the occurrence set of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> and <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math>(<m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>) be the occurrence set of <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>. For each occurrence of <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> in <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math>(<m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>), we can extend it to get occurrences of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> in <m:math name="1748-7188-1-21-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">O</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFoe=taaa@383D@</m:annotation></m:semantics></m:math>(<m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>) by adding <it>M</it><sub>1 </sub>... <it>M</it><sub><it>i</it>-1 </sub>before <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>. This leads to at most <m:math name="1748-7188-1-21-i29" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:msub><m:mi>W</m:mi><m:mi>m</m:mi></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqeWaqaaiabdEfaxnaaBaaaleaacqWGTbqBaeqaaaqaaiabd2gaTjabg2da9iabigdaXaqaaiabdMgaPjabgkHiTiabigdaXaqdcqGHpis1aaaa@37E9@</m:annotation></m:semantics></m:math> occurrences of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math> for any occurrence of <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>. Thus <m:math name="1748-7188-1-21-i30" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:mi mathvariant="script">O</m:mi><m:mo stretchy="false">(</m:mo><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>|</m:mo></m:mrow><m:mo>&#8901;</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:msub><m:mi>W</m:mi><m:mi>m</m:mi></m:msub></m:mrow></m:mstyle><m:mo>&#8805;</m:mo><m:mrow><m:mo>|</m:mo><m:mrow><m:mi mathvariant="script">O</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaadaabdaqaaGWaaiab=5q8pjabcIcaOiqb=ntinzaafaGaeiykaKcacaGLhWUaayjcSdGaeyyXIC9aaebmaeaacqWGxbWvdaWgaaWcbaGaemyBa0gabeaaaeaacqWGTbqBcqGH9aqpcqaIXaqmaeaacqWGPbqAcqGHsislcqaIXaqma0Gaey4dIunakiabgwMiZoaaemaabaGae8NdX=KaeiikaGIae83mH0KaeiykaKcacaGLhWUaayjcSdaaaa@5567@</m:annotation></m:semantics></m:math>, which immediately gives <m:math name="1748-7188-1-21-i28" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>w</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mo stretchy="false">)</m:mo><m:mo>&#8805;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>w</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>m</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:msub><m:mi>W</m:mi><m:mi>m</m:mi></m:msub></m:mrow></m:mstyle></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaiiGacqWFapaCdaWgaaWcbaGaem4DaChabeaakiabcIcaOGWaaiqb+ntinzaafaGaeiykaKIaeyyzIm7aaSaaaeaacqWFapaCdaWgaaWcbaGaem4DaChabeaakiabcIcaOiab+ntinjabcMcaPaqaamaaradabaGaem4vaC1aaSbaaSqaaiabd2gaTbqabaaabaGaemyBa0Maeyypa0JaeGymaedabaGaemyAaKMaeyOeI0IaeGymaedaniabg+Givdaaaaaa@500D@</m:annotation></m:semantics></m:math>. &#160;&#160;&#160; &#9633;</p>
            <p>With Theorem 1, EXMOTIF can calculate a support bound for any suffix <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> of <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, given the quorum requirement <it>q</it>. For example, assume that the motif template is NN[3,5]NNN[0,4]NNN and <it>q </it>= 100, with <it>W</it><sub>1 </sub>= 5 - 3 + 1 = 3 and <it>W</it><sub>2 </sub>= 4 - 0 + 1 = 5. When processing the suffix component <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> = NNN, we require that <it>&#960;</it><sub><it>w</it></sub>(<m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>) &#8805; <m:math name="1748-7188-1-21-i31" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mrow><m:mn>100</m:mn></m:mrow><m:mrow><m:mn>3</m:mn><m:mo>&#215;</m:mo><m:mn>5</m:mn></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaiabigdaXiabicdaWiabicdaWaqaaiabiodaZiabgEna0kabiwda1aaaaaa@338B@</m:annotation></m:semantics></m:math> = 6; when processing <m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math> = NNN[0,4]NNN, we require that <it>&#960;</it><sub><it>w</it></sub>(<m:math name="1748-7188-1-21-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:msup><m:mi>&#8499;</m:mi><m:mo>&#8242;</m:mo></m:msup><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbwvMCKfMBHbqedmvETj2BSbWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aqee0evGueE0jxyaibaieIgFLIOYR2NHOxjYhrPYhrPYpI8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0lXdbba9pGe9qqFf0dXdHuk9fr=xfr=xfrpiWZqaaeaabiGaaiaacaqabeaadaqacqaaaOqaaGWaaiqb=ntinzaafaaaaa@406D@</m:annotation></m:semantics></m:math>) &#8805; <m:math name="1748-7188-1-21-i32" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mrow><m:mn>100</m:mn></m:mrow><m:mn>3</m:mn></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaiabigdaXiabicdaWiabicdaWaqaaiabiodaZaaaaaa@307C@</m:annotation></m:semantics></m:math> = 33. Thus even the weaker bounds can lead to some pruning.</p>
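            <p>The bounds in this example can be reproduced with a short sketch. The function name <it>suffix_support_bounds</it> and its argument layout are assumptions made for illustration, not part of EXMOTIF. The sketch rounds down, as the worked numbers in the text do; comparing against the unrounded quotient would give a slightly stricter test.</p>
```python
from math import prod

def suffix_support_bounds(gaps, q):
    """Map i to floor(q / (W_1 * ... * W_{i-1})) for each suffix M_i ... M_k.

    gaps is the list of (l_m, u_m) gap ranges; q is the quorum.
    """
    widths = [u - l + 1 for (l, u) in gaps]  # W_m = u_m - l_m + 1
    k = len(gaps) + 1                        # number of simple motif components
    return {i: q // prod(widths[:i - 1]) for i in range(1, k + 1)}

# Template NN[3,5]NNN[0,4]NNN with q = 100.
bounds = suffix_support_bounds([(3, 5), (0, 4)], 100)
print(bounds)  # {1: 100, 2: 33, 3: 6}
```
            <p>Entry 3 is the bound for the last component NNN, entry 2 is the bound for NNN[0,4]NNN, and entry 1 is the quorum itself for the full motif.</p>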
         </sec>
         <sec>
            <st>
               <p>The complete EXMOTIF algorithm: complexity analysis</p>
            </st>
            <p>The pseudo-code for the complete EXMOTIF algorithm is shown in Figure <figr fid="F6">6</figr>. The program takes as inputs the set of sequences <m:math name="1748-7188-1-21-i33" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">S</m:mi><m:mo>=</m:mo><m:msubsup><m:mrow><m:mo>{</m:mo><m:msub><m:mi>S</m:mi><m:mi>i</m:mi></m:msub><m:mo>}</m:mo></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>n</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=ucqGH9aqpcqGG7bWEcqWGtbWudaWgaaWcbaGaemyAaKgabeaakiabc2ha9naaDaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGUbGBaaaaaa@43EE@</m:annotation></m:semantics></m:math>, the motif template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> = <it>M</it><sub>1</sub>[<it>l</it><sub>1</sub>, <it>u</it><sub>1</sub>] ... [<it>l</it><sub><it>k</it>-1</sub>, <it>u</it><sub><it>k</it>-1</sub>] <it>M</it><sub><it>k</it></sub>, the quorum threshold <it>q</it>, the number of errors or IUPAC symbols allowed per simple motif <m:math name="1748-7188-1-21-i34" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>e</m:mi><m:mo>=</m:mo><m:msubsup><m:mrow><m:mo>{</m:mo><m:msub><m:mi>e</m:mi><m:mi>i</m:mi></m:msub><m:mo>}</m:mo></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWFLbqzcqGH9aqpcqGG7bWEcqWGLbqzdaWgaaWcbaGaemyAaKgabeaakiabc2ha9naaDaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGRbWAaaaaaa@39CC@</m:annotation></m:semantics></m:math>, and the set of IUPAC symbols to use per simple motif, <m:math name="1748-7188-1-21-i35" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>v</m:mi><m:mo>=</m:mo><m:msubsup><m:mrow><m:mo>{</m:mo><m:msub><m:mi>v</m:mi><m:mi>i</m:mi></m:msub><m:mo>}</m:mo></m:mrow><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieqacqWF2bGDcqGH9aqpcqGG7bWEcqWG2bGDdaWgaaWcbaGaemyAaKgabeaakiabc2ha9naaDaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGRbWAaaaaaa@3A10@</m:annotation></m:semantics></m:math> (only for position-specific substitutions). As outlined in Figure <figr fid="F6">6</figr>, EXMOTIF supports several variations of motif extraction, as described above: exact matching, position-specific substitutions via IUPAC symbols, arbitrary substitutions, and repeated motif identification.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>EXMOTIF Algorithm</p>
               </caption>
               <text>
                  <p>EXMOTIF Algorithm.</p>
               </text>
               <graphic file="1748-7188-1-21-6"/>
            </fig>
            <p>EXMOTIF initially adjusts the support thresholds if the task is repeated motif identification (lines 1&#8211;2). Exact matching and position-specific substitutions are handled by the same approach; the only difference is that, while enumerating the simple motifs, EXMOTIF uses the appropriate IUPAC alphabet (specified by <it>v</it><sub><it>i </it></sub>for component <it>M</it><sub><it>i</it></sub>; lines 6&#8211;7). The structured motifs are then found via positional joins over the simple motifs (line 8), performed as described in Figure <figr fid="F1">1</figr>. For arbitrary substitutions, EXMOTIF first enumerates the simple motifs (line 9) and checks their aggregate support (i.e., including the supports of all neighbors within error <it>&#949;</it><sub><it>i</it></sub>). From these, the structured motifs are enumerated and stored in a hash table (&#8461;; line 11). Lastly, the aggregate support of all these motifs is computed as described in Figure <figr fid="F5">5</figr> (line 12), and those that meet the quorum are output. Finally, if desired, EXMOTIF recovers the full positions of each occurrence via the procedure outlined in Figure <figr fid="F2">2</figr>.</p>
            <p>To analyze the computational complexity of EXMOTIF, we first consider the cost of extracting the simple motifs. Let <it>m </it>be the length of the longest simple motif component in the structured template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math>. Note that there are potentially |&#931;|<sup><it>m </it></sup>frequent simple motifs at that length, but due to the quorum requirement, many of these will not be frequent. Nevertheless, in the worst case <it>O</it>(|&#931;|<sup><it>m</it></sup>) simple components may be extracted. For a simple motif of length <it>m</it>, EXMOTIF uses <it>O</it>(log(<it>m</it>)) positional joins to obtain its support, and each such join takes <it>O</it>(<it>N</it>) time, where <m:math name="1748-7188-1-21-i36" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>N</m:mi><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>n</m:mi></m:msubsup><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>S</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGobGtcqGH9aqpdaaeWaqaamaaemaabaGaem4uam1aaSbaaSqaaiabdMgaPbqabaaakiaawEa7caGLiWoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6gaUbqdcqGHris5aaaa@3B71@</m:annotation></m:semantics></m:math> is the sum of the lengths of all the sequences <it>S</it><sub><it>i </it></sub>in the database <m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>. Thus, extracting the simple motifs takes time <it>O</it>(<it>N </it>log(<it>m</it>)|&#931;|<sup><it>m</it></sup>) in the worst case.</p>
            <p>With |&#931;|<sup><it>m </it></sup>simple motifs, there are <it>O</it>(|&#931;|<sup><it>mk</it></sup>) potential structured motifs, though a vast majority of these will not meet the quorum requirement. Extracting the structured motifs then takes time <it>O</it>(<it>kN</it>|&#931;|<sup><it>mk</it></sup>) for the exact match and position-specific substitution cases. For arbitrary substitutions there is additional cost of enumerating aggregate neighbors and computing their support. For each motif we have to consider <m:math name="1748-7188-1-21-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>k</m:mi></m:msubsup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msub><m:mi>M</m:mi><m:mi>i</m:mi></m:msub></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:msub><m:mi>&#949;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqeWaqaamaabmaabaqbaeqabiqaaaqaamaaemaabaGaemyta00aaSbaaSqaaiabdMgaPbqabaaakiaawEa7caGLiWoaaeaaiiGacqWF1oqzdaWgaaWcbaGaemyAaKgabeaaaaaakiaawIcacaGLPaaaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdUgaRbqdcqGHpis1aaaa@3DF8@</m:annotation></m:semantics></m:math> = <it>km</it><sup><it>e </it></sup>aggregate neighbors, where <it>e </it>= max<sub><it>i</it></sub>{<it>e</it><sub><it>i</it></sub>}. Furthermore, an aggregate neighbor can have <it>k</it>|&#931;|<sup><it>e </it></sup>matching motifs. Thus the time complexity of extracting all the structured motifs is <it>O</it>(<it>kN</it>|&#931;|<sup><it>mk </it></sup>+ <it>k</it><sup>2</sup><it>m</it><sup><it>e</it></sup>|&#931;|<sup><it>e</it></sup>) for arbitrary substitutions. Since typically <it>mk </it>> <it>e </it>and <it>N </it>> <it>m</it><sup><it>e</it></sup>, the time complexity is essentially <it>O</it>(<it>kN</it>|&#931;|<sup><it>mk</it></sup>). Combined with the cost for simple motif extraction, the computational complexity of EXMOTIF is then given as <it>O</it>(log(<it>m</it>) <it>N </it>|&#931;|<sup><it>m </it></sup>+ <it>kN</it>|&#931;|<sup><it>km</it></sup>) = <it>O</it>(<it>kN</it>|&#931;|<sup><it>km</it></sup>).</p>
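            <p>To see why the structured-motif term dominates, the two worst-case terms can be evaluated for sample parameters. The values below (alphabet size 4, m = 4, k = 3, N = 200,000) are hypothetical and chosen only for illustration, and the base-2 logarithm is an assumption, since the asymptotic bound does not fix a base.</p>
```python
from math import log2

def worst_case_terms(sigma, m, k, N):
    """Evaluate the two worst-case cost terms from the analysis above."""
    simple = log2(m) * N * sigma ** m      # log(m) * N * |Sigma|^m
    structured = k * N * sigma ** (m * k)  # k * N * |Sigma|^(mk)
    return simple, structured

# Hypothetical parameters: DNA alphabet, three components of length four.
simple, structured = worst_case_terms(sigma=4, m=4, k=3, N=200_000)
print(structured / simple)
```
            <p>Even for these small values the structured term exceeds the simple term by several orders of magnitude, consistent with the final bound keeping only the structured-motif term.</p>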
         </sec>
      </sec>
      <sec>
         <st>
            <p>Experimental results</p>
         </st>
         <p>EXMOTIF has been implemented in C++ and compiled with g++ v4.0.0 at optimization level 3 (-O3). We performed experiments on a Macintosh PowerPC G5 with dual 2.7GHz processors and 4GB memory running Mac OS X v10.4.5. We compare our results with the latest version of RISO <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> (called RISOTTO <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>), the best previous algorithm for the structured motif extraction problem.</p>
         <sec>
            <st>
               <p>EXMOTIF and RISO: comparison</p>
            </st>
            <p>For comparison, we extract structured motifs from 1,062 non-coding sequences (a total of 196,736 nucleotides) located between two divergent genes in the genome of <it>B. subtilis </it><abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Figures <figr fid="F7">7</figr> and <figr fid="F8">8</figr> compare the running time (in seconds) for EXMOTIF and RISO using exact matching and approximate matching, respectively. Experiments were done for different gap ranges, numbers of components, and quorum thresholds. Note that EXMOTIF has two options: one (shown as "exMOTIF" in the figures) reports only the number of sequences where the structured motifs occur; the other (shown as "exMOTIF(#)") reports both the number of sequences where the structured motifs occur and the actual occurrences. Also note that the current implementation of RISO <it>does not </it>report the actual occurrences; it reports only the frequency.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>EXMOTIF vs. RISO: Exact Matching</p>
               </caption>
               <text>
                  <p>EXMOTIF vs. RISO: Exact Matching.</p>
               </text>
               <graphic file="1748-7188-1-21-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>EXMOTIF vs. RISO: Approximate Matching</p>
               </caption>
               <text>
                  <p>EXMOTIF vs. RISO: Approximate Matching.</p>
               </text>
               <graphic file="1748-7188-1-21-8"/>
            </fig>
            <sec>
               <st>
                  <p>Exact matching</p>
               </st>
               <p>In the first experiment, shown in Figure <figr fid="F7">7(a)</figr>, we randomly generated 100 structured motif templates, with <it>k </it>&#8712; [2,4] simple motifs of length <it>l </it>&#8712; [4,7] (<it>k </it>and <it>l </it>are selected uniformly at random within the given ranges). The gap range between each pair of simple motifs is a random sub-interval of [0, 200]. The x-axis is sorted on the number of motifs extracted. For clarity we plot average times for the methods when the number of motifs extracted falls into the given range on the x-axis. For example, the time plotted for the range [10<sup>2</sup>, 10<sup>3</sup>) is the average time for all the random templates that produce between 100 and 1000 motifs. We find that the average running time for RISO across all extracted motifs is 120.7s, whereas EXMOTIF takes 88.4s when reporting only the supports, and 91.3s when also reporting all the occurrences. The median times were 26.3s, 8.5s, and 9.2s, respectively, indicating a roughly 3-fold speed-up of EXMOTIF over RISO.</p>
               <p>In the next set of experiments we varied one parameter while keeping the others fixed. We set the default quorum to 12% (<it>q </it>= 127), the default gap ranges to [0,100], the default simple motif length to <it>l </it>= 4 (NNNN), and the default number of components <it>k </it>= 3 (e.g., NNNN[0,100]NNNN[0,100]NNNN). In Figure <figr fid="F7">7(b)</figr>, we plot the time as a function of the number of simple motifs <it>k </it>in the template. We find that as the number of components increases, the time gap between EXMOTIF and RISO widens; for <it>k </it>= 4 simple motifs, EXMOTIF is around 5 times faster than RISO. Figure <figr fid="F7">7(c)</figr> shows the effect of increasing gap ranges, from [0,0] to [0,200]. We find that as the gap range increases, the time for EXMOTIF increases at a slower rate than for RISO. For [0,200], EXMOTIF is 3&#8211;4 times faster than RISO, depending on whether only the frequency or the full occurrences are reported. In Figure <figr fid="F7">7(d)</figr>, as the quorum threshold increases, the running time goes down for both methods. For quorum 24%, EXMOTIF is 4&#8211;5 times faster than RISO. As support decreases, the gap narrows somewhat, but EXMOTIF remains 2&#8211;3 times faster. Finally, Figure <figr fid="F7">7(e)</figr> plots the effect of increasing simple motif lengths <it>l </it>&#8712; [2,6]. We find that the time first increases and then decreases. This is because there are a large number of motif occurrences for length 3 and length 4, but relatively few occurrences for length 5 and length 6. Depending on the motif lengths, EXMOTIF can be 3&#8211;40 times faster than RISO for comparable output, i.e., reporting only the support. EXMOTIF remains up to 5 times faster when also reporting the actual occurrences.</p>
               <p>To compare the performance for extracting structured motifs with length ranges, we used the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> = <it>M</it><sub>1</sub>[50, 100] <it>M</it><sub>2</sub>[1,50]<it>M</it><sub>3</sub>[20, 100]<it>M</it><sub>4 </sub>with <it>q </it>= 12%, where |<it>M</it><sub>1</sub>| &#8712; [2,4], |<it>M</it><sub>2</sub>| &#8712; [3,4], |<it>M</it><sub>3</sub>| &#8712; [5,6], |<it>M</it><sub>4</sub>| &#8712; [4,5]. EXMOTIF took 78.4s, whereas RISO took 1640.9s to extract 14,174 motifs.</p>
            </sec>
            <sec>
               <st>
                  <p>Approximate matching</p>
               </st>
               <p>In the first experiment, shown in Figure <figr fid="F8">8(a)</figr>, we randomly generated 30 structured motif templates, with <it>k </it>&#8712; [2,3] simple motifs of length <it>l </it>&#8712; [3,6] (<it>k </it>and <it>l </it>are selected uniformly at random within the given ranges). The gap range between each pair of simple motifs is a random sub-interval of [10, 30]. The x-axis is sorted on the number of motifs extracted, and average times are plotted for the extracted number of motifs in the given range. We find that the average running time for RISO is 334.5s, whereas EXMOTIF takes 59.3s when reporting only the support, and 176.7s when also reporting all the occurrences. Thus EXMOTIF is on average 5 times faster than RISO, with comparable output.</p>
               <p>Figures <figr fid="F8">8(b)&#8211;(e)</figr> plot the time for approximate matching as a function of different parameters. We set the default quorum to 12% (<it>q </it>= 127, out of |<m:math name="1748-7188-1-21-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">S</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFse=uaaa@3845@</m:annotation></m:semantics></m:math>| = 1062 sequences), the default gap ranges to [12,22], the default simple motif length to <it>l </it>= 6 (NNNNNN), and the default number of components <it>k </it>= 2 (e.g., NNNNNN[12,22]NNNNNN). Figure <figr fid="F8">8(b)</figr> shows how increasing gap ranges affect the running time; for gap range [8,26] between the two motif components, EXMOTIF is 2&#8211;3 times faster than RISO. In Figure <figr fid="F8">8(c)</figr>, we increase the number of arbitrary substitutions allowed for each simple motif; a pair (<it>&#949;</it><sub>1</sub>, <it>&#949;</it><sub>2</sub>) on the x-axis denotes that <it>&#949;</it><sub>1 </sub>substitutions are allowed for motif component <it>M</it><sub>1</sub>, and <it>&#949;</it><sub>2 </sub>for <it>M</it><sub>2</sub>. We can see that EXMOTIF is always faster than RISO. It is 9 times faster when only frequencies are reported, and it can be up to 5 times faster when full occurrences are reported, though for some cases the difference is slight.</p>
               <p>Figure <figr fid="F8">8(d)</figr> plots the effect of the quorum threshold. Compared to RISO, EXMOTIF performs much better for low quorum, e.g., for <it>q </it>= 4% EXMOTIF is 4&#8211;5 times faster than RISO. Finally in Figure <figr fid="F8">8(e)</figr>, as the simple motif lengths increase, the time for both EXMOTIF and RISO increases, and we find that EXMOTIF can be 2&#8211;3 times faster.</p>
               <p>We also studied the effect of quorum and allowed substitutions. Table <tblr tid="T4">4</tblr> shows the comparative results for EXMOTIF and RISO. Here we used the template <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> = NNNNNN[12, 22]NNNNNN to extract motifs from the 1062 subsequences from <it>B. subtilis</it>. We vary the quorum from low (5%) to high (90%), and vary the number of errors <it>e</it><sub><it>i </it></sub>per simple motif (with more errors allowed for higher quorum). For a comparable output (when only the frequency is reported), EXMOTIF outperforms RISO, especially for high quorum and high number of errors. It is interesting that for this latter case, reporting all occurrences incurs significant overhead. For example for <it>q </it>= 90% and with (<it>e</it><sub>1 </sub>= 3, <it>e</it><sub>2 </sub>= 3), EXMOTIF is 20 times faster than RISO, but EXMOTIF(#) is 3 times slower!</p>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Comparison of EXMOTIF and RISO for different quorums and allowed substitutions.</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="center">
                           <p>Quorum</p>
                        </c>
                        <c ca="left">
                           <p>#Substitutions</p>
                        </c>
                        <c ca="left">
                           <p>RISO</p>
                        </c>
                        <c ca="left">
                           <p>EXMOTIF</p>
                        </c>
                        <c ca="left">
                           <p>EXMOTIF(#)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>5%</p>
                        </c>
                        <c ca="left">
                           <p>(0, 0)</p>
                        </c>
                        <c ca="left">
                           <p>1.82s</p>
                        </c>
                        <c ca="left">
                           <p>1.42s</p>
                        </c>
                        <c ca="left">
                           <p>1.52s</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>30%</p>
                        </c>
                        <c ca="left">
                           <p>(1, 1)</p>
                        </c>
                        <c ca="left">
                           <p>63.01s</p>
                        </c>
                        <c ca="left">
                           <p>58.91s</p>
                        </c>
                        <c ca="left">
                           <p>64.52s</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>60%</p>
                        </c>
                        <c ca="left">
                           <p>(2, 2)</p>
                        </c>
                        <c ca="left">
                           <p>2763.31s</p>
                        </c>
                        <c ca="left">
                           <p>328.43s</p>
                        </c>
                        <c ca="left">
                           <p>2317.35s</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>90%</p>
                        </c>
                        <c ca="left">
                           <p>(3, 3)</p>
                        </c>
                        <c ca="left">
                           <p>13682.13s</p>
                        </c>
                        <c ca="left">
                           <p>707.56s</p>
                        </c>
                        <c ca="left">
                           <p>41464.93s</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The template used is <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> = <it>NNNNNN</it>[12,22]<it>NNNNNN</it>. #Substitutions shows the number of errors (<it>e</it><sub>1</sub>, <it>e</it><sub>2</sub>) allowed for the two simple components.</p>
                  </tblfn>
               </tbl>
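               <p>The speed-up factors quoted in the text can be rechecked from the running times in Table <tblr tid="T4">4</tblr>. The short Python sketch below does this arithmetic; the variable and function names are assumptions made for illustration, and the numbers are copied from the table.</p>

```python
# Rows of Table 4: quorum, (e1, e2), RISO, EXMOTIF, EXMOTIF(#); times in seconds.
table4 = [
    ("5%", (0, 0), 1.82, 1.42, 1.52),
    ("30%", (1, 1), 63.01, 58.91, 64.52),
    ("60%", (2, 2), 2763.31, 328.43, 2317.35),
    ("90%", (3, 3), 13682.13, 707.56, 41464.93),
]


def speedup(baseline_seconds, other_seconds):
    """How many times faster 'other' is than 'baseline' (below 1 means slower)."""
    return baseline_seconds / other_seconds


for quorum, subs, riso, exmotif, exmotif_occ in table4:
    print(
        f"q={quorum:>4} subs={subs}: "
        f"EXMOTIF {speedup(riso, exmotif):.2f}x, "
        f"EXMOTIF(#) {speedup(riso, exmotif_occ):.2f}x vs. RISO"
    )
```

               <p>For the last row this gives a factor of about 19.3 for EXMOTIF, and about 0.33 for EXMOTIF(#), i.e., roughly 3 times slower than RISO, which matches the rounded figures in the text.</p>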
            </sec>
         </sec>
         <sec>
            <st>
               <p>Real applications</p>
            </st>
            <sec>
               <st>
                  <p>Discovery of single transcription factor binding sites</p>
               </st>
               <p>We evaluate our algorithm by extracting the conserved features of known transcription factor binding sites in yeast. In particular we used the binding sites for the Zinc (Zn) factors <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. There are 11 binding sites listed for the Zn cluster, 3 of which are simple motifs. The remaining 8 are structured, as shown in Table <tblr tid="T5">5</tblr>. For the evaluation, we first form several structured motif templates according to the conserved features in the binding sites. Then we extract the frequent structured motifs satisfying these templates from the upstream regions of 68 genes regulated by zinc factors <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. We used the -1000 to -1 upstream regions, truncating the region if and where it overlaps with an upstream open reading frame (ORF). After extraction, since binding sites cannot have many occurrences in the ORF regions, we drop motifs that also occur frequently in the ORF regions (i.e., within the genes). Finally, we calculate the Z-scores for the remaining frequent motifs, and rank them by descending Z-scores. In our experiments, we set the minimum quorum threshold to 7% within the upstream regions and the maximum support threshold to 30% in the ORF regions. We use the shuffling program from SMILE <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to compute the Z-scores. The shuffling program randomly shuffles the original input sequences to obtain a new <it>shuffled </it>set of sequences.</p>
               <p>Then it computes, for each extracted frequent motif, its support (<it>&#960;</it>) and weighted support (<it>&#960;</it><sub><it>w</it></sub>) in the shuffled set. For a given frequent motif <m:math name="1748-7188-1-21-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi>&#8499;</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFZestaaa@3790@</m:annotation></m:semantics></m:math>, let <it>&#956; </it>and <it>&#963; </it>be the mean and standard deviation of its support across different sets (about 30) of shuffled sequences. Then the Z-score for each motif is calculated as: <m:math name="1748-7188-1-21-i37" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi mathvariant="script">Z</m:mi><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>&#960;</m:mi><m:mo stretchy="false">(</m:mo><m:mi>&#8499;</m:mi><m:mo stretchy="false">)</m:mo><m:mo>&#8722;</m:mo><m:mi>&#956;</m:mi></m:mrow><m:mi>&#963;</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaatuuDJXwAK1uy0HwmaeXbfv3ySLgzG0uy0Hgip5wzaGqbaiab=Lr8Ajabg2da9maalaaabaacciGae4hWdaNaeiikaGccdaGae03mH0KaeiykaKIaeyOeI0Iae4hVd0gabaGae43Wdmhaaaaa@4BE7@</m:annotation></m:semantics></m:math>. Likewise we can also calculate the Z-score for each frequent motif by using the weighted support (which is also applicable for the repeated structured motif identification problem). As shown in Table <tblr tid="T5">5</tblr>, we can successfully predict GAL4, GAL4 chips, LEU3, PPR1 and PUT3 with the highest rank. CAT8 and LYS also have high ranks. We were thus able to extract all eight transcription factors for the Zinc factors with high confidence. As a comparison, with the same dataset RISO can only predict GAL4, LEU3 and PPR1.</p>
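               <p>The Z-score computation described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the shuffled supports in the example are made-up numbers rather than results from the paper, and the use of the population standard deviation is an assumption, since the text does not say which estimator the SMILE shuffling program uses.</p>

```python
from statistics import mean, pstdev


def z_score(observed_support, shuffled_supports):
    """Z = (pi(M) - mu) / sigma over the supports seen in shuffled sets.

    Assumption: sigma is the population standard deviation (pstdev).
    """
    mu = mean(shuffled_supports)
    sigma = pstdev(shuffled_supports)
    if sigma == 0:
        raise ValueError("support does not vary across the shuffled sets")
    return (observed_support - mu) / sigma


# Made-up supports of one motif in 8 shuffled sets (mean 5, std. dev. 2).
shuffled = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(11, shuffled))  # 3.0
```

               <p>The same function applies unchanged to the weighted support, by passing the weighted values observed in the original and shuffled sets.</p>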
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>Regulons of Zn cluster proteins.</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c ca="left">
                           <p>TF Name</p>
                        </c>
                        <c ca="left">
                           <p>Known Motif</p>
                        </c>
                        <c ca="left">
                           <p>Predicted Motifs</p>
                        </c>
                        <c ca="left">
                           <p>Num-Motifs</p>
                        </c>
                        <c ca="left">
                           <p>Ranking</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>GAL4</p>
                           <p>GAL4 chips</p>
                        </c>
                        <c ca="left">
                           <p>CGGRnnRCYnYnCnCCG</p>
                        </c>
                        <c ca="left">
                           <p>CGG[11,11]CCG</p>
                        </c>
                        <c ca="left">
                           <p>1634(3346)</p>
                        </c>
                        <c ca="left">
                           <p>1/1</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>CAT8</p>
                        </c>
                        <c ca="left">
                           <p>CGGnnnnnnGGA</p>
                        </c>
                        <c ca="left">
                           <p>CGG[6,6]GGA</p>
                        </c>
                        <c ca="left">
                           <p>1621(3356)</p>
                        </c>
                        <c ca="left">
                           <p>147/13</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>HAP1</p>
                        </c>
                        <c ca="left">
                           <p>CGGnnnTAnCGGCGGnnnTAnCGGnnnTA</p>
                        </c>
                        <c ca="left">
                           <p>CGG[6,6]CGG</p>
                        </c>
                        <c ca="left">
                           <p>1621(3356)</p>
                        </c>
                        <c ca="left">
                           <p>111/146</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LEU3</p>
                        </c>
                        <c ca="left">
                           <p>RCCGGnnCCGGY</p>
                        </c>
                        <c ca="left">
                           <p>CCG[4,4]CGG</p>
                        </c>
                        <c ca="left">
                           <p>1588(3366)</p>
                        </c>
                        <c ca="left">
                           <p>2/1</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LYS</p>
                        </c>
                        <c ca="left">
                           <p>WWWTCCRnYGGAWWW</p>
                        </c>
                        <c ca="left">
                           <p>TCC[3,3]GGA</p>
                        </c>
                        <c ca="left">
                           <p>1605(3360)</p>
                        </c>
                        <c ca="left">
                           <p>33/21</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>PPR1</p>
                        </c>
                        <c ca="left">
                           <p>WYCGGnnWWYKCCGAW</p>
                        </c>
                        <c ca="left">
                           <p>CGG[6,6]CCG</p>
                        </c>
                        <c ca="left">
                           <p>1621(3356)</p>
                        </c>
                        <c ca="left">
                           <p>1/2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>PUT3</p>
                        </c>
                        <c ca="left">
                           <p>YCGGnAnGCGnAnnnCCGA</p>
                           <p>CGGnAnGCnAnnnCCGA</p>
                        </c>
                        <c ca="left">
                           <p>CGG[10,11]CCG</p>
                        </c>
                        <c ca="left">
                           <p>727(4035)</p>
                        </c>
                        <c ca="left">
                           <p>1/1</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>TF Name stands for transcription factor name; Known Motif stands for the known binding sites corresponding to the transcription factors in TF Name column; Predicted Motifs stands for the motifs predicted by EXMOTIF; Num-Motifs gives the final (original) number of motifs extracted (final is after pruning those motifs that are also frequent in the ORF regions); Ranking stands for the Z-score ranking based on support/weighted support.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Discovery of composite regulatory patterns</p>
               </st>
               <p>The complex transcriptional regulatory network in eukaryotic organisms usually requires interactions of multiple transcription factors. A potential application of EXMOTIF is to extract such composite regulatory binding sites from DNA sequences. We took two such transcription factors, URS1H and UASH, which are involved in early meiotic expression during sporulation and are known to cooperatively regulate 11 yeast genes <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. These 11 genes are also listed in SCPD <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, the promoter database of <it>Saccharomyces cerevisiae</it>. In 10 of those genes the URS1H binding site appears downstream from UASH; in the remaining one (HOP1) the binding sites are reversed. We took the binding sites for the 10 genes (all except HOP1), and after their multiple alignment, we obtained their consensus: taTTTtGGAGTaata[4,179]ttGGCGGCTAA (the lowercase letters are less conserved, whereas the uppercase letters are the most conserved). Table <tblr tid="T6">6</tblr> shows the binding sites for UASH and URS1H for the 10 genes, their start positions, their alignment, and the consensus pattern. The gap between the sites is obtained by subtracting the length of UASH, 15, from the position difference (since the start position of UASH is given). The smallest gap is <it>l </it>= 119 - 100 - 15 = 4 and the largest is <it>u </it>= 288 - 94 - 15 = 179. Based on the most conserved parts of the consensus, we formed the composite motif template: <m:math name="1748-7188-1-21-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">T</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBamrtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaWaaeGaeaaakeaaimaacqWFtepvaaa@3847@</m:annotation></m:semantics></m:math> = NNN[1,1]NNNNN[10,185]NNNNNNNNN (note the 6 additional gap positions added to [4,179] to account for the non-conserved positions). We then extracted the structured motifs in the upstream regions of the 10 genes. We used the -800 to -1 upstream regions, and truncated the segment if it overlaps with an upstream ORF. The numbers of substitutions for NNN, NNNNN and NNNNNNNNN were set to <it>&#949;</it><sub>1 </sub>= 1, <it>&#949;</it><sub>2 </sub>= 2 and <it>&#949;</it><sub>3 </sub>= 1, respectively. The quorum threshold was set to <it>q </it>= 0.7 within the upstream regions, and the maximum support within genes was set to 0.1%. The rank of the true motif TTT[1,1]GGAGT[10,185]GGCGGCTAA was 290 (out of 5284 final motifs) with a Z-score of 22.61.</p>
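               <p>The gap arithmetic above can be rechecked with a short Python sketch. It uses the start positions of the first four genes in Table <tblr tid="T6">6</tblr> and the two positions quoted for the largest gap, written here as negative offsets on the assumption that they follow the table's sign convention. The function names are hypothetical.</p>

```python
UASH_LENGTH = 15  # length of the UASH site, as stated in the text

# (gene, UASH start, URS1H start), copied from the first rows of Table 6.
sites = [
    ("ZIP1", -42, -22),
    ("MEI4", -121, -98),
    ("DMC1", -175, -143),
    ("SPO13", -119, -100),
]


def gap(uash_start, urs1h_start, uash_length=UASH_LENGTH):
    """Number of positions between the end of UASH and the start of URS1H."""
    return urs1h_start - uash_start - uash_length


def widen(gap_range, extra_positions=6):
    """Shift both bounds to absorb the non-conserved flanking positions."""
    low, high = gap_range
    return (low + extra_positions, high + extra_positions)


gaps = {gene: gap(uash, urs1h) for gene, uash, urs1h in sites}
print(gaps)               # {'ZIP1': 5, 'MEI4': 8, 'DMC1': 17, 'SPO13': 4}
print(gap(-288, -94))     # 179, the largest gap quoted in the text
print(widen((4, 179)))    # (10, 185), the gap range used in the template
```

               <p>The 6 extra positions correspond to the less conserved letters that flank the gap in the consensus (aata after GGAGT and tt before GGCGGCTAA), which are treated as part of the gap in the template.</p>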
               <tbl id="T6">
                  <title>
                     <p>Table 6</p>
                  </title>
                  <caption>
                     <p>UASH and URS1H binding sites.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Genes</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>UASH</b>
                           </p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>
                              <b>URS1H</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Gap</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Site</p>
                        </c>
                        <c ca="center">
                           <p>Pos</p>
                        </c>
                        <c ca="center">
                           <p>Site</p>
                        </c>
                        <c ca="center">
                           <p>Pos</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ZIP1</p>
                        </c>
                        <c ca="center">
                           <p>GATTCGGAAGTAAAA</p>
                        </c>
                        <c ca="center">
                           <p>-42</p>
                        </c>
                        <c ca="center">
                           <p>==TCGGCGGCTAAAT</p>
                        </c>
                        <c ca="center">
                           <p>-22</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MEI4</p>
                        </c>
                        <c ca="center">
                           <p>TCTTTCGGAGTCATA</p>
                        </c>
                        <c ca="center">
                           <p>-121</p>
                        </c>
                        <c ca="center">
                           <p>==TGGGCGGCTAAAT</p>
                        </c>
                        <c ca="center">
                           <p>-98</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>DMC1</p>
                        </c>
                        <c ca="center">
                           <p>TTGTGTGGAGAGATA</p>
                        </c>
                        <c ca="center">
                           <p>-175</p>
                        </c>
                        <c ca="center">
                           <p>AAATAGCCGCCCA==</p>
                        </c>
                        <c ca="center">
                           <p>-143</p>
                        </c>
                        <c ca="center">
                           <p>17</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SPO13</p>
                        </c>
                        <c ca="center">
                           <p>TAATTAGGAGTATAT</p>
                        </c>
                        <c ca="center">
                           <p>-119</p>
                        </c>
                        <c ca="center">
                           <p>AAATAGCCGCCGA==</p>
                        </c>
                        <c ca="center">
                           <p>-100</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MER1</p>
                        </c>
                        <c ca="center">
                           <p>GGTTTTGTAGTTCTA</p>
                        </c>
                        <c ca="center">
                           <p>-152</p>
                        </c>
                        <c ca="center">
                           <p>TTTTAGCCGCCGA==</p>
                        </c>
                        <c ca="center">
                           <p>-115</p>
                        </c>
                        <c ca="center">
                           <p>22</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>SPO16</p>
                        </c>
                        <c ca="center">
                           <p>CATTGTGATGTATTT</p>
                        </c>
                        <c ca="center">
                           <p>-201</p>
                        </c>
                        <c ca="center">
                           <p>==TGGGCGGCTAAAA</p>
                        </c>
                        <c ca="center">
                           <p>-90</p>
                        </c>
                        <c ca="center">
                           <p>96</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>REC104</p>
                        </c>
                        <c ca="center">
                           <p>CAATTTGGAGTAGGC</p>
                        </c>
                        <c ca="center">
                           <p>-182</p>
                        </c>
                        <c ca="center">
                           <p>==TTGGCGGCTATTT</p>
                        </c>
                        <c ca="center">
                           <p>-93</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>RED1</p>
                        </c>
                        <c ca="center">
                           <p>ATTTCTGGAGATATC</p>
                        </c>
                        <c ca="center">
                           <p>-355</p>
                        </c>
                        <c ca="center">
                           <p>==TCAGCGGCTAAAT</p>
                        </c>
                        <c ca="center">
                           <p>-167</p>
                        </c>
                        <c ca="center">
                           <p>173</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>REC114</p>
                        </c>
                        <c ca="center">
                           <p>GATTTTGTAGGAATA</p>
                        </c>
                        <c ca="center">
                           <p>-288</p>
                        </c>
                        <c ca="center">
                           <p>==TGGGCGGCTAACT</p>
                        </c>
                        <c ca="center">
                           <p>-94</p>
                        </c>
                        <c ca="center">
                           <p>179</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MEK1</p>
                        </c>
                        <c ca="center">
                           <p>TCATTTGTAGTTTAT</p>
                        </c>
                        <c ca="center">
                           <p>-233</p>
                        </c>
                        <c ca="center">
                           <p>==ATGGCGGCTAAAT</p>
                        </c>
                        <c ca="center">
                           <p>-150</p>
                        </c>
                        <c ca="center">
                           <p>68</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Consensus</p>
                        </c>
                        <c ca="center">
                           <p>taTTTtGGAGTaata</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>==ttGGCGGCTAA==</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>[4,179]</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion and future work</p>
         </st>
         <p>In this paper, we introduced EXMOTIF, an efficient algorithm for extracting structured motifs from one or more biological sequences, and showed its application to discovering single and composite regulatory binding sites. The structured motif template assumes that the gap range between each pair of simple motifs is known. In future work, we plan to address motif discovery when the gap ranges themselves are unknown. Another potential direction is to extract structured profile (position weight matrix) patterns directly.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors contributed equally to this work.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported in part by NSF CAREER Award IIS-0092978, DOE Career Award DE-FG02-02ER25538, and NSF grants EIA-0103708 and EMT-0432098. We also thank the anonymous referees for their helpful suggestions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>SCPD: A Promoter Database of the Yeast Saccharomyces Cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <issue>7&#8211;8</issue>
            <fpage>607</fpage>
            <lpage>11</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10487868</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Structured Motifs Search</p>
            </title>
            <aug>
               <au>
                  <snm>Policriti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vitacolonna</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Morgante</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zuccolo</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Symposium on Research in Computational Molecular Biology</source>
            <pubdate>2004</pubdate>
            <fpage>133</fpage>
            <lpage>139</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>On-line Approximate String Searching Algorithms: Survey and Experimental Results</p>
            </title>
            <aug>
               <au>
                  <snm>Michailidis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Margaritis</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>International Journal of Computer Mathematics</source>
            <pubdate>2002</pubdate>
            <volume>79</volume>
            <issue>8</issue>
            <fpage>867</fpage>
            <lpage>888</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>24</issue>
            <fpage>5549</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140044</pubid>
                  <pubid idtype="pmpid" link="fulltext">12490723</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3586</fpage>
            <lpage>3588</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169024</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824371</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A Consensus Based Algorithm for Finding Transcription Factor Binding Sites</p>
            </title>
            <aug>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mauri</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Workshop on Genomes: Information Structure and Complexity</source>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B7">
            <title>
               <p>An algorithm for finding signals of unknown length in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mauri</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S207</fpage>
            <lpage>14</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The value of prior knowledge in discovering motifs with MEME</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Elkan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>3rd Int'l Conference on Intelligent Systems for Molecular Biology</source>
            <pubdate>1995</pubdate>
            <fpage>21</fpage>
            <lpage>29</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Spelling Approximate Repeated or Common Motifs Using a Suffix Tree</p>
            </title>
            <aug>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
            </aug>
            <source>3rd Latin American Symposium on Theoretical Informatics</source>
            <pubdate>1998</pubdate>
            <fpage>374</fpage>
            <lpage>390</lpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Scoring functions for transcription factor binding site prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Friberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>von Rohr</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gonnet</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>84</fpage>
            <url>http://www.biomedcentral.com/1471-2105/6/84</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1140076</pubid>
                  <pubid idtype="pmpid" link="fulltext">15807889</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Discovering regulatory elements in non-coding sequences by analysis of spaced dyads</p>
            </title>
            <aug>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rios</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <issue>8</issue>
            <fpage>1808</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102821</pubid>
                  <pubid idtype="pmpid" link="fulltext">10734201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Finding composite regulatory patterns in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>Suppl 1</issue>
            <fpage>S354</fpage>
            <lpage>63</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169566</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Genome-wide analysis of bacterial promoter regions</p>
            </title>
            <aug>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Keich</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Gelfand</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Pacific Symposium on Biocomputing</source>
            <pubdate>2003</pubdate>
            <fpage>29</fpage>
            <lpage>40</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12603015</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Extracting Structured Motifs Using a Suffix Tree &#8211; Algorithms and Application to Promoter Consensus Identification</p>
            </title>
            <aug>
               <au>
                  <snm>Marsan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>345</fpage>
            <lpage>354</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11108467</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Efficient Extraction of Structured Motifs Using Box-links</p>
            </title>
            <aug>
               <au>
                  <snm>Carvalho</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Freitas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>String Processing and Information Retrieval Conference</source>
            <pubdate>2004</pubdate>
            <fpage>267</fpage>
            <lpage>278</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A highly scalable algorithm for the extraction of cis-regulatory regions</p>
            </title>
            <aug>
               <au>
                  <snm>Carvalho</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Freitas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Asia-Pacific Bioinformatics Conference</source>
            <pubdate>2005</pubdate>
            <fpage>273</fpage>
            <lpage>283</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>RISOTTO: Fast extraction of motifs with mismatches</p>
            </title>
            <aug>
               <au>
                  <snm>Pisanti</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Carvalho</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Marsan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
            </aug>
            <source>7th Latin American Theoretical Informatics Symposium</source>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>A parallel algorithm for the extraction of structured motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Carvalho</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Freitas</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
            </aug>
            <source>19th ACM Symposium on Applied Computing</source>
            <pubdate>2004</pubdate>
            <fpage>147</fpage>
            <lpage>153</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Pattern Discovery in Biosequences</p>
            </title>
            <aug>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vilo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ukkonen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>International Colloquium on Grammatical Inference</source>
            <pubdate>1998</pubdate>
            <fpage>257</fpage>
            <lpage>270</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Incremental Paradigms of Motif Discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Apostolico</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Parida</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>15</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15072686</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Conservative extraction of over-represented extensible motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Apostolico</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Comin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Parida</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>Suppl 1</issue>
            <fpage>i9</fpage>
            <lpage>i18</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15961503</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Mining Periodic Patterns with Gap Requirement from Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cheung</snm>
                  <fnm>DWL</fnm>
               </au>
               <au>
                  <snm>Yip</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>ACM Int'l Conference on Management of Data</source>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Tandem repeats finder: a program to analyze DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Benson</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <issue>2</issue>
            <fpage>573</fpage>
            <lpage>80</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148217</pubid>
                  <pubid idtype="pmpid" link="fulltext">9862982</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Identifying target sites for cooperatively binding factors</p>
            </title>
            <aug>
               <au>
                  <snm>Thakurta</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>7</issue>
            <fpage>608</fpage>
            <lpage>621</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11448879</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>SPADE: An Efficient Algorithm for Mining Frequent Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Zaki</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Machine Learning Journal</source>
            <pubdate>2001</pubdate>
            <volume>42</volume>
            <fpage>1</fpage>
            <lpage>31</lpage>
         </bibl>
      </refgrp>
   </bm>
</art>
