<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-1-14</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Allen</snm>
               <mi>E</mi>
               <fnm>Jonathan</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jeallen@umiacs.umd.edu</email>
            </au>
            <au id="A2">
               <snm>Salzberg</snm>
               <mi>L</mi>
               <fnm>Steven</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>salzberg@umiacs.umd.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Computer Science, University of Maryland, College Park, MD 20742, USA</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2006</pubdate>
         <volume>1</volume>
         <issue>1</issue>
         <fpage>14</fpage>
         <url>http://www.almob.org/content/1/1/14</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16934144</pubid>
               <pubid idtype="doi">10.1186/1748-7188-1-14</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>25</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Allen and Salzberg; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies when alternative splicing only occurs under specific biological conditions. Non-expression based computational methods support identification of rarely expressed transcripts.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>A non-expression based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The computational method is implemented in the program ExAlt and an analysis of prediction accuracy is given for <it>Drosophila melanogaster</it>.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>ExAlt identifies the structure of most alternatively spliced exons in the test set and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on <it>Drosophila </it>genomes to search for new cases of alternative splicing.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>High-throughput sequencing of expression data provides compelling evidence that the long-held hypothesis "one gene produces one protein" holds far less often than previously thought. Surveys of the human genome estimate that as many as 70% of human genes produce more than one transcribed form <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Examples are found in a variety of metazoan organisms, confirming that a significant number of genes produce multiple distinct transcripts <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Alternative splicing is an important biological mechanism for producing multiple distinct transcripts from a single gene locus. Exon-intron junctions are pieced together to produce differing mRNAs. In some cases alternative exon splicing leads to different functional proteins, thereby increasing protein diversity. In other cases an alternatively spliced exon leads to non-functional mRNA, effectively regulating gene expression <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>Given an input genomic sequence and the locations of gene regions, our goal is to find the functional exons originating from each gene locus, identifying their respective amino acid codons and splice sites. Figure <figr fid="F1">1</figr> shows examples of the alternatively spliced exons examined in this study: intron retention (IR), cassette exon (CE), and multiple splice sites (MS). Also considered are constitutive exons (CS), defined as exons included with the same splice site boundaries in all functional mRNA forms.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Three forms of alternative splicing: Intron Retention (IR), Cassette Exon (CE), and Multiple Splice sites (MS)</p>
            </caption>
            <text>
               <p>Three forms of alternative splicing: Intron Retention (IR), Cassette Exon (CE), and Multiple Splice sites (MS).</p>
            </text>
            <graphic file="1748-7188-1-14-1"/>
         </fig>
         <sec>
            <st>
               <p>Related work</p>
            </st>
            <p>Gene expression provides evidence for large numbers of alternatively spliced genes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. The most reliable high throughput evidence for alternative splicing comes from full length cDNAs, which are limited in coverage across all biological states. Expressed Sequence Tags (ESTs) supplement the coverage of full length cDNAs but still fail to capture all expressed forms <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Genomic sequence patterns can potentially be used to identify alternative splicing in less commonly expressed genes and recent work has focused on developing computational methods to predict alternative splicing without direct evidence of gene expression. This work is divided into two types: explicit and implicit alternative splicing prediction.</p>
            <sec>
               <st>
                  <p>Explicit alternative splicing prediction</p>
               </st>
                <p>Sorek et al. looked at cassette exons in human and mouse and found a striking pattern of increased intron conservation distinct from constitutive exons <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A list of features was compiled, including exon length, sequence conservation and k-mer counts <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, and used in a support vector machine (SVM) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> to classify cassette and constitutive exons. Yeo et al. developed a regularized least-squares classifier, called ACESCAN <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, to identify cassette exons in human/mouse orthologs using a similar feature set. An SVM cassette exon classifier was developed for <it>Caenorhabditis elegans </it>using only single species features and was extended to predict cassette exons in intron sequence <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. <it>Drosophila melanogaster </it>exons matched to <it>Drosophila pseudoobscura </it>orthologs with conserved flanking intron sequence were observed by Philipps et al. to be enriched for alternatively spliced exons <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Implicit alternative splicing prediction</p>
               </st>
                <p>An alternative approach is to predict multiple overlapping gene structures, or a single gene structure overlapping existing alternative annotation. Explicit features of alternative splicing are not scored, but by virtue of having multiple overlapping high-scoring gene structures, alternative splicing is implied. One method sampled paths <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> in the generalized hidden Markov model (GHMM) of the single isoform gene finder SLAM <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Recurring overlapping high-scoring parses were reported as candidates for alternative splicing. Another approach is to find an exon splicing pattern with the highest scoring alignment to profile hidden Markov models (profile-HMMs) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The human genome was searched for cassette exons and intron retention events using a reference annotation <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Predicted gene structures with scores exceeding the reference gene structure were inferred to be examples of alternative splicing.</p>
               <p>The work most similar to the model introduced in this article is the pair-HMM UNCOVER <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, which finds exons in sequence annotated as introns and was tested on human/mouse intron pairs. Unlike the cassette exon classification methods <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, models were trained using examples of protein coding exons without explicitly distinguishing between constitutive exons and cassette exons. Since the input sequence is assumed to be an intron, predicted exons are inferred to be alternatively spliced.</p>
                <p>The method presented in this article extends the GHMMs used in single isoform gene finding <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> to explicitly model features of alternative and constitutive exons. The features of the explicit alternative splicing prediction methods (k-mer counts, exon lengths, and sequence conservation) are used to predict multiple splice sites and intron retention events along with cassette exons and constitutive exons. Cross-species sequence conservation is incorporated using components of the single isoform phylogenetic HMM gene finders <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Following the phylogenetic shadowing principle, a multiple sequence alignment is assumed to be obtainable from closely related species <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. In contrast, the pair-HMM method simultaneously predicts a pairwise alignment and the exon structure, making it potentially better suited to incorporating a difficult-to-align, more distantly related organism. Conservation from greater evolutionary distances may improve discriminative power in identifying functional nucleotides, but with the potential trade-off of detecting a smaller set of conserved alternative splicing events <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
               <p>The remainder of the article describes our computational prediction model and reports on prediction accuracy in <it>Drosophila melanogaster</it>.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Graphical model of alternative splicing</p>
            </st>
            <p>The prediction model is designed to predict multiple overlapping exons in a sequence believed to contain a single exon or intron, rather than the complete gene from start codon to stop codon. Input is expected to be a target sequence previously annotated by a single-isoform gene annotation tool such as a gene finder, cDNA alignment or some other annotation source. In cases where the input sequence contains untranslated regions, it is assumed that the coding boundary is known. Thus, the problem of translation start/stop site prediction is not addressed here.</p>
            <p>Alternative splicing increases the number of candidate acceptor/donor pairs compared to the constitutive exon equivalent. Figure <figr fid="F2">2</figr> shows four candidate splice sites: an acceptor site <it>a</it><sub>0</sub>, and three donor sites <it>d</it><sub>0</sub>, <it>d</it><sub>1</sub>, and <it>d</it><sub>2</sub>. In a single isoform gene finder, only one of the four exons labeled constitutive in Figure <figr fid="F2">2</figr> represents a viable exon. Allowing for alternative splicing means all three donor sites are potentially functional. For example, in Figure <figr fid="F2">2</figr>, the eighth candidate splicing type from the top has two functional donor sites, <it>d</it><sub>0 </sub>(marked <it>MD</it>1) and <it>d</it><sub>1 </sub>(marked <it>MD</it>2), leading to two different functional exons. More than two functional donor or acceptor sites can occur, which would lead to a model of unbounded size. The combinatorial possibilities are reduced to a finite number by using one symbol per splice type to represent all functional splice sites beyond the second. For example, <it>MDN </it>is the symbol used to represent the third functional donor site, <it>d</it><sub>2 </sub>in Figure <figr fid="F2">2</figr>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Alternative and single isoform exon candidates</p>
               </caption>
               <text>
                  <p>Alternative and single isoform exon candidates. Four splice sites are shown, one acceptor <it>a</it><sub>0</sub>, and three donor sites, <it>d</it><sub>0</sub>, <it>d</it><sub>1</sub>, and <it>d</it><sub>2</sub>, and the begin (B) and end (E) of the input sequence. There are four candidate constitutive exons (Constitutive Exon), three candidate cassette exons (Cassette Exon), and candidate exons with multiple functional donor sites (Multiple Splice Site Exon). See text for a description of the splice types: single acceptor (SA), cassette acceptor (CA), single donor (SD), cassette donor (CD), multiple donor 1 (MD1), multiple donor 2 (MD2), and multiple donor above two in number (MDN).</p>
               </text>
               <graphic file="1748-7188-1-14-2"/>
            </fig>
            <p>Donor sites are divided into five types: the single functional constitutive donor <it>SD</it>, the alternative donor for cassette exons <it>CD</it>, and the multiple functional donors <it>MD</it>1, <it>MD</it>2, and <it>MDN</it>. <it>MD</it>1 is the leftmost functional donor, <it>MD2 </it>is the donor immediately downstream of <it>MD1</it>, and <it>MDN </it>represents any additional downstream donors. The classification scheme extends in the same way to acceptor sites: <it>SA </it>(single constitutive acceptor), <it>CA </it>(acceptor for cassette exon), <it>MA1 </it>(first multiple acceptor), <it>MA2 </it>(second multiple acceptor), and <it>MAN </it>(multiple acceptors beyond the second).</p>
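            <p>As a minimal sketch of the donor labelling rule just described, the Python fragment below assigns a symbol to each functional donor of one exon. The function name <it>label_donors</it>, the integer positions, and the assumption that coordinates increase from left to right are illustrative only and are not taken from ExAlt; cassette donors (<it>CD</it>) depend on exon-level context and are not covered.</p>
            <p><![CDATA[
```python
def label_donors(functional_donor_positions):
    """Map each functional donor position of one exon to a splice-type symbol.

    Illustrative helper, not ExAlt code: a single donor is SD; with several,
    the leftmost is MD1, the next downstream is MD2, and the rest share MDN.
    """
    ordered = sorted(set(functional_donor_positions))
    if len(ordered) == 1:
        return {ordered[0]: "SD"}
    symbols = ["MD1", "MD2"]
    return {
        pos: symbols[rank] if rank in (0, 1) else "MDN"
        for rank, pos in enumerate(ordered)
    }


# Hypothetical coordinates for four functional donors of the same exon.
print(label_donors([300, 120, 210, 450]))
```
]]></p>
            <p>Because every donor after the second collapses onto <it>MDN</it>, the label alphabet stays fixed however many functional donors an exon has. The same sketch would apply to acceptors with <it>SA</it>, <it>MA1</it>, <it>MA2</it>, and <it>MAN</it>.</p>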
            <p>The intron retention splice site labeled GT in Figure <figr fid="F1">1</figr> (the 5' end of the retained intron) forms the basis of the splicing types: <it>SD-IR</it>, <it>MD1-IR</it>, <it>MD2-IR</it>, and <it>MDN-IR</it>. The intron retention acceptor labeled AG in Figure <figr fid="F1">1</figr> (3' end of the retained intron) forms the basis for the splicing types: <it>SA-IR</it>, <it>MA1-IR</it>, <it>MA2-IR</it>, and <it>MAN-IR</it>. There are five end of sequence conditions: beginning of the sequence (Beg), end of a constitutive intron (END-INTRON<sub><it>C</it></sub>), end of an alternative intron (END-INTRON<sub><it>A</it></sub>), end of a constitutive exon (END-EXON<sub><it>C</it></sub>), and end of an alternative exon (END-EXON<sub><it>A</it></sub>).</p>
            <p>Splice sites and end of sequence conditions are called signals, and ordered signal pairs define the exon/intron intervals in an alternative exon splicing model. Figure <figr fid="F3">3</figr> shows a portion of the model and two example sets of states aligned to genomic sequence. The model in Figure <figr fid="F3">3</figr> predicts alternative splicing in internal exons for the three splicing types in Figure <figr fid="F1">1</figr> plus constitutive exons. The states represent sequence intervals between pairs of signals. The top right example in Figure <figr fid="F3">3</figr> shows an initial "Upstream Constitutive Intron" state between signal pair (Beg, SD), which marks an intron proximal to a constitutive splice site, followed by states for each downstream exon interval: "Internal First Exon of IR" (SD,SD-IR), "Retained Intron" (SD-IR,SA-IR), and "Internal Last Exon of IR" (SA-IR,SD), ending in the "Downstream Constitutive Intron" state (SD,END-INTRON<sub><it>C</it></sub>). The bottom right example in Figure <figr fid="F3">3</figr> shows "Upstream Alternative Intron" (Beg,MA1), "Multiple Acceptor 1" (MA1,MA2), "Single Donor" (MA2,SD), and "Downstream Constitutive Intron" (SD, END-INTRON<sub><it>C</it></sub>). States not shown in Figure <figr fid="F3">3</figr> model rarer forms of splicing, including combinations of alternative splicing events and splicing in exons at the end of genes. The complete model is given in the Methods section.</p>
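            <p>The idea that ordered signal pairs define states can be sketched as a lookup from consecutive signals to state names. The fragment below covers only the multiple acceptor example described above; the table <it>STATE_FOR_PAIR</it>, the function name, and the coordinates are hypothetical, and the complete model contains many more pairs.</p>
            <p><![CDATA[
```python
# Partial, illustrative lookup: only the signal pairs named in the
# multiple-acceptor example are listed; the full model has more states.
STATE_FOR_PAIR = {
    ("Beg", "MA1"): "Upstream Alternative Intron",
    ("MA1", "MA2"): "Multiple Acceptor 1",
    ("MA2", "SD"): "Single Donor",
    ("SD", "END-INTRON_C"): "Downstream Constitutive Intron",
}


def states_from_signals(signals):
    """Turn an ordered list of (signal_name, position) into state intervals."""
    intervals = []
    for (left, start), (right, end) in zip(signals, signals[1:]):
        state = STATE_FOR_PAIR[(left, right)]  # KeyError for unlisted pairs
        intervals.append((state, start, end))
    return intervals


# Hypothetical signal positions along a target sequence.
example = [("Beg", 0), ("MA1", 140), ("MA2", 170), ("SD", 290),
           ("END-INTRON_C", 400)]
for row in states_from_signals(example):
    print(row)
```
]]></p>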
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Left image shows a portion of the graphical model for alternatively spliced exons</p>
               </caption>
               <text>
                  <p>Left image shows a portion of the graphical model for alternatively spliced exons. The right side of the figure shows two examples of parsing a target sequence. The top right example parses an intron retention sequence and the bottom right example parses a multiple splice site sequence. Blue states output partial subsequences of alternatively spliced exons, beige states are exons beginning with an acceptor and ending with a donor, and green states are introns.</p>
               </text>
               <graphic file="1748-7188-1-14-3"/>
            </fig>
            <sec>
               <st>
                  <p>Phylogenetic generalized hidden Markov model definition</p>
               </st>
               <p>A phylogenetic generalized hidden Markov model (PGHMM) extends a model described in the single isoform gene finder Shadower <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The method described here models higher order nucleotide dependencies and is applied to the alternative exon splicing model introduced in Figure <figr fid="F3">3</figr>. An input multiple sequence alignment <it>X </it>= <it>S</it><sub>0</sub>,..., <it>S</it><sub><it>m </it></sub>includes the target sequence <it>S</it><sub>0 </sub>and <it>m </it>informant species. <it>X</it>[<it>k</it>] is the <it>k</it>th column of <it>X</it>, and <it>X</it>[<it>i</it>, <it>j</it>] denotes the columns from position <it>i </it>to <it>j </it>inclusive. The PGHMM is defined as a 7-tuple (<it>Q</it>, <it>&#960;</it>, &#931;, <it>R</it>, <it>&#968;</it>, <it>O</it>, <it>L</it>):</p>
               <p>&#8226; <it>Q </it>&#8211; the set of states, with <it>q</it>, <it>q</it>' &#8712; <it>Q</it></p>
               <p>&#8226; <it>P</it><sub><it>&#960;</it></sub>(<it>q</it>) &#8211; the probability of beginning in state <it>q</it></p>
               <p>&#8226; &#931; &#8211; the set of nucleotides {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>} emitted in the model</p>
               <p>&#8226; <it>P</it><sub><it>R</it></sub>(<it>q</it>|<it>q</it>') &#8211; the transition probabilities from state <it>q</it>' to state <it>q</it></p>
               <p>&#8226; <it>&#968; </it>&#8211; the set of phylogenetic parameters</p>
               <p>&#8226; <it>P</it><sub><it>O </it></sub>(<it>X </it>[<it>i</it>, <it>j</it>]|<it>q</it>, <it>&#968;</it>) &#8211; the probability of emitting sequence alignment columns from <it>i </it>to <it>j </it>in state <it>q </it>using phylogenetic parameter set <it>&#968;</it></p>
               <p>&#8226; <it>P</it><sub><it>L</it>, <it>q </it></sub>(<it>j </it>- <it>i </it>+ 1) &#8211; the probability of the state <it>q </it>emitting the series of columns of length <it>j </it>- <it>i </it>+ 1</p>
               <p>The parse of the multiple sequence alignment <it>X </it>is a series of partitions <it>t </it>= (<it>t</it><sub>0</sub>, <it>t</it><sub>1</sub>, ..., <it>t</it><sub><it>n</it></sub>), with state <it>q</it><sub><it>i </it></sub>outputting a contiguous series of columns in <it>X </it>from position <it>b</it><sub><it>i </it></sub>to <it>e</it><sub><it>i </it></sub>inclusive in partition <it>t</it><sub><it>i </it></sub>= (<it>b</it><sub><it>i</it></sub>, <it>e</it><sub><it>i</it></sub>, <it>q</it><sub><it>i</it></sub>). The parse spans the entire multiple sequence alignment <it>X</it>, so that <it>b</it><sub><it>i </it>+ 1 </sub>= <it>e</it><sub><it>i </it></sub>+ 1. The joint probability of parse <it>t </it>and alignment <it>X </it>is:</p>
               <p>
                  <m:math name="1748-7188-1-14-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>=</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mi>O</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:msub>
                                          <m:mi>b</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>e</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>q</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:mi>&#968;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#215;</m:mo>
                                       <m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mi>&#960;</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>q</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#215;</m:mo>
                                       <m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mrow>
                                             <m:mi>L</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:msub>
                                                <m:mi>q</m:mi>
                                                <m:mn>0</m:mn>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>e</m:mi>
                                          <m:mn>0</m:mn>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#215;</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8719;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>n</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>P</m:mi>
                                                <m:mi>O</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:msub>
                                          <m:mi>b</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>e</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>q</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:mi>&#968;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#215;</m:mo>
                                       <m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mi>R</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>q</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>q</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#215;</m:mo>
                                       <m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mrow>
                                             <m:mi>L</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:msub>
                                                <m:mi>q</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>e</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msub>
                                          <m:mi>b</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>.</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </p>
               <p>Each interval from <it>b </it>to <it>e </it>begins and ends with a signal covering a fixed-length window, <it>Sig</it><sub><it>b </it></sub>and <it>Sig</it><sub><it>e </it></sub>respectively. The donor signal window width, <it>W</it><sub><it>don</it></sub>, is set to 9 for all donor types (SD, CD, MD1, MD2, MDN, SD-IR, MD1-IR, MD2-IR, MDN-IR). When the window covers columns <it>X</it>[<it>k</it>, <it>k </it>+ 8], the consensus splice site is in subsequence <it>S</it><sub>0</sub>[<it>k </it>+ 3, <it>k </it>+ 4] = <it>GT</it>. The acceptor window width, <it>W</it><sub><it>acc</it></sub>, is set to 24 for all acceptor types (SA, CA, MA1, MA2, MAN, SA-IR, MA1-IR, MA2-IR, MAN-IR), covering columns <it>X</it>[<it>k </it>- 23, <it>k</it>] with the consensus splice site in subsequence <it>S</it><sub>0</sub>[<it>k </it>- 3, <it>k </it>- 2] = <it>AG</it>. For the signals that begin and end the sequence, the window parameter <it>W</it><sub><it>Sig </it></sub>is set to 0, since non-splice-site signals are not explicitly modeled.</p>
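               <p>The window coordinates above can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch assumes the target sequence <it>S</it><sub>0</sub> is held as a 0-based Python string, which is an illustration choice and not a statement about the authors' implementation.</p>

```python
W_DON = 9    # donor window width given in the text
W_ACC = 24   # acceptor window width given in the text


def donor_window(k):
    """Donor window starting at column k: columns X[k, k+8], GT at [k+3, k+4]."""
    window = (k, k + W_DON - 1)
    consensus = (k + 3, k + 4)
    return window, consensus


def acceptor_window(k):
    """Acceptor window ending at column k: columns X[k-23, k], AG at [k-3, k-2]."""
    window = (k - (W_ACC - 1), k)
    consensus = (k - 3, k - 2)
    return window, consensus


def has_consensus(s0, span, dinucleotide):
    """Check that the inclusive span of the target sequence s0 holds the dinucleotide."""
    first, last = span
    return first >= 0 and s0[first:last + 1] == dinucleotide


# Hypothetical toy sequences, chosen only to exercise the coordinates.
donor_seq = "CAGGTAAGT"                # 9 columns, GT at offsets 3 and 4
acceptor_seq = "T" * 20 + "AGGT"       # 24 columns, AG at offsets 20 and 21
print(has_consensus(donor_seq, donor_window(0)[1], "GT"))
print(has_consensus(acceptor_seq, acceptor_window(23)[1], "AG"))
```

               <p>Both windows are inclusive intervals, so the donor window spans 9 columns and the acceptor window spans 24 columns.</p>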
               <p>State <it>q</it>, emitting columns <it>X</it>[<it>b</it>, <it>e</it>], models the downstream signal <it>Sig</it><sub><it>e </it></sub>but excludes the upstream signal <it>Sig</it><sub><it>b</it></sub>. For example, when <it>q </it>= <it>Multiple Donor </it>1, <it>q </it>outputs the columns between two donor sites, <it>Sig</it><sub><it>b </it></sub>= <it>MD</it>1 and <it>Sig</it><sub><it>e </it></sub>= <it>MD</it>2. The exon interval is scored from <it>b </it>to <it>e </it>- <it>W</it><sub><it>don </it></sub>inclusive, and the <it>W</it><sub><it>don </it></sub>donor columns are scored from <it>e </it>- <it>W</it><sub><it>don </it></sub>+ 1 to <it>e </it>inclusive. The upstream donor site window <it>MD</it>1 spans the interval <it>b </it>- <it>W</it><sub><it>don </it></sub>to <it>b </it>- 1 and is scored in the previous state.</p>
               <p>The probability of a state emitting a series of columns becomes:</p>
               <p>
                  <m:math name="1748-7188-1-14-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mi>O</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>X</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>b</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>e</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>|</m:mo>
                           <m:mi>q</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#968;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>b</m:mi>
                                 </m:mrow>
                                 <m:mi>e</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>P</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>X</m:mi>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo stretchy="false">]</m:mo>
                                 <m:mo>|</m:mo>
                                 <m:mi>S</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>c</m:mi>
                                 <m:mi>t</m:mi>
                                 <m:mi>M</m:mi>
                                 <m:mi>o</m:mi>
                                 <m:mi>d</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>X</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>e</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>z</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#968;</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaudaWgaaWcbaGaem4ta8eabeaakiabcIcaOiabdIfayjabcUfaBjabdkgaIjabcYcaSiabdwgaLjabc2faDjabcYha8jabdghaXjabcYcaSGGaciab=H8a5jabcMcaPiabg2da9maarahabaGaemiuaaLaeiikaGIaemiwaGLaei4waSLaem4AaSMaeiyxa0LaeiiFaWNaem4uamLaemyzauMaemiBaWMaemyzauMaem4yamMaemiDaqNaemyta0Kaem4Ba8MaemizaqMaemyzauMaemiBaWMaeiikaGIaemiwaGLaeiilaWIaemyzauMaeiilaWIaem4AaSMaeiilaWIaemOEaONaeiilaWIaemyCaeNaeiilaWIae8hYdKNaeiykaKIaeiykaKcaleaacqWGRbWAcqGH9aqpcqWGIbGyaeaacqWGLbqza0Gaey4dIunaaaa@6C9F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>The probability of emitting each column in the alignment is defined by a sequence model returned by <it>SelectModel</it>(<it>X</it>, <it>e</it>, <it>k</it>, <it>z</it>, <it>q</it>, <it>&#968;</it>). The choice of sequence model is determined by the current position <it>k </it>in alignment <it>X</it>, the end position <it>e </it>of the scored interval, the current state <it>q</it>, the protein coding phase <it>z</it>, and the phylogenetic parameters <it>&#968;</it>. If <it>q </it>is an exon state and <it>k </it>is within the coding region, the coding phase <it>z </it>is 0, 1 or 2; otherwise <it>z </it>= -1. When <it>q </it>is an exon state and <it>k </it>is outside the coding region, an untranslated exon region is implied. The sequence models are divided into three "template" categories, <m:math name="1748-7188-1-14-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>M</m:mi><m:mrow><m:mi>S</m:mi><m:mi>i</m:mi><m:msub><m:mi>g</m:mi><m:mi>e</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGnbqtdaWgaaWcbaGaem4uamLaemyAaKMaem4zaC2aaSbaaWqaaiabdwgaLbqabaaaleqaaaaa@3367@</m:annotation></m:semantics></m:math>, <it>M</it><sub><it>codon</it></sub>, and <it>M</it><sub><it>non-coding </it></sub>and an instance of one of these three types is returned by the function:</p>
               <p><it>SelectModel </it>(<it>X</it>, <it>e</it>, <it>k</it>, <it>z</it>, <it>q</it>, <it>&#968;</it>) =</p>
               <p>
                  <m:math name="1748-7188-1-14-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>M</m:mi>
                                                <m:mrow>
                                                   <m:mi>S</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:msub>
                                                      <m:mi>g</m:mi>
                                                      <m:mi>e</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>e</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>&#968;</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mi>k</m:mi>
                                             <m:mo>></m:mo>
                                             <m:mi>e</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>W</m:mi>
                                                <m:mrow>
                                                   <m:mi>S</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:msub>
                                                      <m:mi>g</m:mi>
                                                      <m:mi>e</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>M</m:mi>
                                                <m:mrow>
                                                   <m:mi>c</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>d</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>n</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>z</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>&#968;</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mi>z</m:mi>
                                             <m:mo>&#8800;</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>M</m:mi>
                                                <m:mrow>
                                                   <m:mi>n</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>n</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>c</m:mi>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>d</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mi>n</m:mi>
                                                   <m:mi>g</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>&#968;</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mi>o</m:mi>
                                             <m:mi>t</m:mi>
                                             <m:mi>h</m:mi>
                                             <m:mi>e</m:mi>
                                             <m:mi>r</m:mi>
                                             <m:mi>w</m:mi>
                                             <m:mi>i</m:mi>
                                             <m:mi>s</m:mi>
                                             <m:mi>e</m:mi>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                              <m:mo>}</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGadeqaauaabeqadiaaaeaacqWGnbqtdaWgaaWcbaGaem4uamLaemyAaKMaem4zaC2aaSbaaWqaaiabdwgaLbqabaaaleqaaOGaeiikaGIaemiwaGLaeiilaWIaem4AaSMaeiilaWIaemyzauMaeiilaWccciGae8hYdKNaeiykaKcabaGaem4AaSMaeyOpa4JaemyzauMaeyOeI0Iaem4vaC1aaSbaaSqaaiabdofatjabdMgaPjabdEgaNnaaBaaameaacqWGLbqzaeqaaaWcbeaaaOqaaiabd2eannaaBaaaleaacqWGJbWycqWGVbWBcqWGKbazcqWGVbWBcqWGUbGBaeqaaOGaeiikaGIaemiwaGLaeiilaWIaem4AaSMaeiilaWIaemOEaONaeiilaWIae8hYdKNaeiykaKcabaGaemOEaONaeyiyIKRaeyOeI0IaeGymaedabaGaemyta00aaSbaaSqaaiabd6gaUjabd+gaVjabd6gaUjabgkHiTiabdogaJjabd+gaVjabdsgaKjabdMgaPjabd6gaUjabdEgaNbqabaGccqGGOaakcqWGybawcqGGSaalcqWGRbWAcqGGSaalcqWFipqEcqGGPaqkaeaacqWGVbWBcqWG0baDcqWGObaAcqWGLbqzcqWGYbGCcqWG3bWDcqWGPbqAcqWGZbWCcqWGLbqzaaaacaGL7bGaayzFaaaaaa@85B2@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
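               <p>The case analysis of <it>SelectModel </it>and the product <it>P</it><sub><it>O</it></sub> can be sketched as follows. This is a minimal illustration under stated assumptions: the arguments <it>z_of </it>and <it>column_log_prob </it>are hypothetical callables standing in for the coding-phase lookup and for the per-column probability of the chosen model, and the product is accumulated in log space for numerical convenience.</p>

```python
def select_model(k, e, z, w_sig_e):
    """Return the template that scores column k, testing the cases in order.

    k        current column
    e        last column of the scored interval
    z        coding phase (0, 1 or 2), or -1 outside the coding region
    w_sig_e  width of the downstream signal window (0 if no signal is modeled)
    """
    if k > e - w_sig_e:
        return "M_Sig_e"        # column lies inside the downstream signal window
    if z != -1:
        return "M_codon"        # coding column with a defined phase
    return "M_non-coding"       # everything else


def log_p_emit(b, e, z_of, w_sig_e, column_log_prob):
    """Sum of log P(X[k] | selected model) for k = b..e, i.e. log P_O.

    z_of and column_log_prob are hypothetical callables supplied by the caller.
    """
    total = 0.0
    for k in range(b, e + 1):
        model = select_model(k, e, z_of(k), w_sig_e)
        total += column_log_prob(model, k)
    return total


# Hypothetical exon interval [10, 20] closed by a donor window of width 9.
labels = [select_model(k, 20, k % 3, 9) for k in range(10, 21)]
print(labels)
```

               <p>With a window width of 9 and <it>e </it>= 20, the condition <it>k </it>&gt; <it>e </it>- 9 selects the signal template for exactly the nine columns 12 to 20; with a width of 0 the signal template is never selected.</p>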
               <p>In theory, each state could maintain separate sequence models. For example, the "Internal First Exon of IR" state could model codon usage separately from the "Internal Last Exon of IR" state. In practice, this results in far too many parameters to estimate from the available training data. Instead, the states are tied to 10 candidate sequence models returned by <it>SelectModel</it>. The models are listed below with the analogous Markov models commonly used in single-isoform <it>ab initio </it>gene finders.</p>
               <p>&#8226; <it>M</it><sub><it>non-coding </it></sub>(<it>X</it>, <it>k</it>, <it>&#968;</it>). 3rd order homogeneous Markov model: <it>P</it>(<it>S</it><sub>0</sub>[<it>k</it>]|<it>S</it><sub>0</sub>[<it>k </it>- 3, <it>k </it>- 1])</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>AUTR</it></sub>(<it>X</it>, <it>k</it>, <it>&#968;</it>) &#8211; alternative 5'/3' untranslated region (AUTR)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>CUTR </it></sub>(<it>X</it>, <it>k</it>, <it>&#968;</it>) &#8211; constitutive 5'/3' untranslated region (CUTR)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>AI </it></sub>(<it>X</it>, <it>k</it>, <it>&#968;</it>) &#8211; alternative intron (AI)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>CI </it></sub>(<it>X</it>, <it>k</it>, <it>&#968;</it>) &#8211; constitutive intron (CI)</p>
               <p>&#8226; <it>M</it><sub><it>codon </it></sub>(<it>X</it>, <it>k</it>, <it>z</it>, <it>&#968;</it>). 3rd order inhomogeneous 3-periodic Markov model: <it>P</it><sup><it>z </it></sup>(<it>S</it><sub>0</sub>[<it>k</it>]|<it>S</it><sub>0</sub>[<it>k </it>- 3, <it>k </it>- 1])</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>AE </it></sub>(<it>X</it>, <it>k</it>, <it>z</it>, <it>&#968;</it>) &#8211; alternative exon (AE)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>CE </it></sub>(<it>X</it>, <it>k</it>, <it>z</it>, <it>&#968;</it>) &#8211; constitutive exon (CE)</p>
               <p>&#8226; <it>M</it><sub><it>don </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>). 1st order inhomogeneous Markov model (WAM): <it>P</it><sup>9-(<it>e </it>- <it>k</it>) </sup>(<it>S</it><sub>0</sub>[<it>k</it>]|<it>S</it><sub>0</sub>[<it>k </it>- 1])</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>SD </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>) &#8211; constitutive/single donor (SD)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>AD </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>) &#8211; alternative donor (AD) (covers all alternative donor types)</p>
               <p>&#8226; <it>M</it><sub><it>acc </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>). 1st order inhomogeneous Markov model (WAM): <it>P</it><sup>24-(<it>e </it>- <it>k</it>) </sup>(<it>S</it><sub>0</sub>[<it>k</it>]|<it>S</it><sub>0</sub>[<it>k </it>- 1])</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>SA </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>) &#8211; constitutive/single acceptor (SA)</p>
               <p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;- <it>M</it><sub><it>AA </it></sub>(<it>X</it>, <it>k</it>, <it>e</it>, <it>&#968;</it>) &#8211; alternative acceptor (AA) (covers all alternative acceptor types)</p>
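               <p>The tying of states to ten sequence models, and the indexing used by the analogous Markov models, can be summarized in a short sketch. The dictionary layout and the function names are hypothetical, and the target sequence is assumed to be a 0-based Python string.</p>

```python
# The ten tied models listed above: name -> (template, Markov order of the analogue).
TIED_MODELS = {
    "AUTR": ("non-coding", 3), "CUTR": ("non-coding", 3),
    "AI": ("non-coding", 3), "CI": ("non-coding", 3),
    "AE": ("codon", 3), "CE": ("codon", 3),
    "SD": ("don", 1), "AD": ("don", 1),
    "SA": ("acc", 1), "AA": ("acc", 1),
}


def wam_position(k, e, width):
    """Index of column k inside a signal window of the given width ending at e.

    This is the superscript width - (e - k) of the WAM; it runs from 1 to width.
    """
    return width - (e - k)


def context(s0, k, order):
    """Bases S0[k - order, k - 1] that the Markov model conditions on."""
    return s0[max(0, k - order):k]


print(len(TIED_MODELS))
print([wam_position(k, 20, 9) for k in range(12, 21)])
print(context("ACGTAC", 3, 3))
```

               <p>The codon template additionally selects one of three conditional distributions by the coding phase <it>z</it>, which is why it is called 3-periodic.</p>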
            </sec>
         </sec>
         <sec>
            <st>
               <p>From Markov models to evolutionary models</p>
            </st>
            <p>Our implementation differs from a single-isoform <it>ab initio </it>gene finder in two key ways: 1) separate models are maintained for the two splicing types, alternative and constitutive, and 2) the nucleotide dependencies are modeled using an evolutionary framework. The choice to separate sequence models for the two splicing types is motivated by previous work explicitly classifying alternatively spliced exons and by the hypothesis that a splice site can be activated or deactivated by proximal splicing factors that bind to the pre-mRNA sequence and interact directly (or indirectly) with the spliceosome. The presence of splicing factors in conjunction with the characteristics of the splice site is expected to determine splice site usage. There is growing evidence that many alternative splicing events follow this model <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <p>Each sequence model estimates the probability of emitting column <it>X</it>[<it>k</it>] using a phylogenetic tree. Figure <figr fid="F4">4</figr> shows a schematic of the phylogenetic tree for four <it>Drosophila </it>species used in testing. The goal is to compute the probability of the observed column having evolved from a common ancestral sequence. For the tree in Figure <figr fid="F4">4</figr>, assume the ancestral base at the root ("Ancestor 1") to be A and the base at the descendant node ("Ancestor 2") to be C. The probability of A evolving to C is computed using a nucleotide substitution model. The HKY model <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> was chosen for three features: it distinguishes transition from transversion mutation events (assumed to be a fixed parameter), it includes a nucleotide equilibrium model, and it uses the evolutionary time interval defined by the tree branch length. The probability of emitting column <it>X</it>[<it>k</it>] is found by summing, over all possible ancestral sequences, the probability of their having evolved into the observed column. The Felsenstein phylogenetic tree scoring procedure <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> computes this sum in linear time with respect to the number of nodes in the tree.</p>
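            <p>The column probability described above can be illustrated with a minimal sketch of Felsenstein's pruning procedure using the closed-form HKY transition probabilities. The tree topology, branch lengths, equilibrium frequencies and transition/transversion parameter below are hypothetical values chosen for illustration, not those of Figure <figr fid="F4">4</figr>. Treating a gap as missing data, and scaling the rate so that branch lengths are expected substitutions per site, are likewise assumptions of this sketch.</p>

```python
import math

BASES = "ACGT"
PURINES = "AG"


def hky_prob(i, j, t, pi, kappa):
    """P(base i evolves to base j over branch length t) under the HKY model."""
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    # Rate scaling so that t is measured in expected substitutions per site.
    beta = 1.0 / (2.0 * pi_r * pi_y
                  + 2.0 * kappa * (pi["A"] * pi["G"] + pi["C"] * pi["T"]))
    group = pi_r if j in PURINES else pi_y   # total frequency of j's class
    e1 = math.exp(-beta * t)
    e2 = math.exp(-beta * t * (1.0 + group * (kappa - 1.0)))
    if i == j:
        return pi[j] + pi[j] * (1.0 / group - 1.0) * e1 + (group - pi[j]) / group * e2
    if (i in PURINES) == (j in PURINES):     # transition
        return pi[j] + pi[j] * (1.0 / group - 1.0) * e1 - pi[j] / group * e2
    return pi[j] * (1.0 - e1)                # transversion


def partial(node, pi, kappa):
    """Likelihood of the data below node, for each possible base at node.

    A leaf is a one-character string (a base, or "-" treated as missing data);
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if node in (x, "-") else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partial(child, pi, kappa)
        for x in BASES:
            result[x] *= sum(hky_prob(x, y, t, pi, kappa) * below[y] for y in BASES)
    return result


def column_probability(tree, pi, kappa):
    """Probability of one alignment column: sum over root bases, weighted by pi."""
    root = partial(tree, pi, kappa)
    return sum(pi[x] * root[x] for x in BASES)


# Hypothetical four-leaf tree and parameters.
EXAMPLE_PI = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
EXAMPLE_TREE = [([("A", 0.05), ("A", 0.05)], 0.1), ("C", 0.2), ("A", 0.4)]
print(column_probability(EXAMPLE_TREE, EXAMPLE_PI, 2.0))
```

            <p>Each node is visited once and each branch costs a constant amount of work, which gives the linear running time in the number of nodes.</p>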
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Phylogenetic tree for four species of <it>Drosophila</it></p>
               </caption>
               <text>
                  <p>Phylogenetic tree for four species of <it>Drosophila</it>. Each branch <it>i </it>has a branch length of <it>b</it><sub><it>i</it></sub>.</p>
               </text>
               <graphic file="1748-7188-1-14-4"/>
            </fig>
            <p>The nucleotide equilibrium parameters of the HKY model are naturally suited to incorporate the nucleotide bias found in the different sequence models (e.g. donor, codon, etc.). With a multiple sequence alignment as input, an intuitive extension to the <it>ab initio </it>Markov model is to use the preceding <it>o </it>bases from each input sequence to estimate the likelihood of the current nucleotide (where <it>o </it>is the order of the Markov model). For example, in Figure <figr fid="F4">4</figr>, estimating the probability of nucleotide C at "Ancestor 2" having evolved from nucleotide A at "Ancestor 1" should reflect the nucleotide equilibrium of <it>D. melanogaster </it>and <it>D. simulans</it>, given the <it>o </it>previous bases in the input alignment for the two species. Similarly, estimating the probability of the root ancestral base being A (Ancestor 1) should reflect the nucleotide equilibrium among all four species given the <it>o </it>previous nucleotides in the input alignment from all four species. If <it>d</it><sub><it>v </it></sub>is the number of descendants at node <it>v</it>, the number of parameters is <m:math name="1748-7188-1-14-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msub><m:mo>&#8721;</m:mo><m:mi>v</m:mi></m:msub><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msub><m:mi>d</m:mi><m:mi>v</m:mi></m:msub><m:mo>+</m:mo><m:mi>o</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeqaqaaiabisda0maaCaaaleqabaGaemizaq2aaSbaaWqaaiabdAha2bqabaWccqGHRaWkcqWGVbWBcqGHRaWkcqaIXaqmaaaabaGaemODayhabeqdcqGHris5aaaa@3835@</m:annotation></m:semantics></m:math> (<it>v </it>enumerating over all nodes in the tree) leaving too many parameters to reliably estimate, given the current limits on training sizes.</p>
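            <p>A small calculation makes the size of this parameter space concrete. The list of <it>d</it><sub><it>v </it></sub>values is hypothetical (a balanced four-species tree, counting internal nodes only, which is an assumption of this sketch), and the order <it>o </it>= 3 is taken from the non-coding and codon models. The second function gives, for comparison, the count obtained when frequencies are collected from each organism independently.</p>

```python
def joint_parameter_count(descendants_per_node, order):
    """Sum over nodes v of 4 ** (d_v + o + 1)."""
    return sum(4 ** (d_v + order + 1) for d_v in descendants_per_node)


def per_organism_parameter_count(num_organisms, order):
    """(m + 1) * 4 ** (o + 1), where m + 1 is the number of organisms."""
    return num_organisms * 4 ** (order + 1)


# Hypothetical balanced tree on four species: root (4 descendants) and
# two internal nodes (2 descendants each).
print(joint_parameter_count([4, 2, 2], 3))
print(per_organism_parameter_count(4, 3))
```

            <p>Under these assumptions the joint scheme needs 73728 parameters per sequence model, against 1024 for the per-organism scheme.</p>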
            <p>Obtaining frequency counts from each organism independently reduces the number of parameters for each sequence model to (<it>m </it>+ 1) &#215; 4<sup><it>o </it>+ 1 </sup>(where <it>m </it>+ 1 is the number of organisms). Let <m:math name="1748-7188-1-14-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:msup><m:mi>S</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mn>0</m:mn></m:msub><m:mo>,</m:mo><m:mo>&#8230;</m:mo><m:mo>,</m:mo><m:msub><m:msup><m:mi>S</m:mi><m:mo>&#8242;</m:mo></m:msup><m:msup><m:mi>m</m:mi><m:mo>&#8242;</m:mo></m:msup></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqaIWaamaeqaaOGaeiilaWIaeSOjGSKaeiilaWIafm4uamLbauaadaWgaaWcbaGafmyBa0Mbauaaaeqaaaaa@34C3@</m:annotation></m:semantics></m:math> be the sequences descendant from node <it>v</it>. If <it>c</it>(<m:math name="1748-7188-1-14-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:msup><m:mi>S</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F6E@</m:annotation></m:semantics></m:math>[<it>k </it>- <it>o</it>, <it>k </it>- 1], <it>n</it>) returns the number of times each nucleotide <it>n </it>&#8712; {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>} was observed to follow the substring <m:math name="1748-7188-1-14-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:msup><m:mi>S</m:mi><m:mo>&#8242;</m:mo></m:msup><m:mi>i</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaaaa@2F6E@</m:annotation></m:semantics></m:math>[<it>k </it>- <it>o</it>, <it>k </it>- 1] at position k in a training alignment, the nucleotide equilibrium at node <it>v </it>for each nucleotide <it>n </it>is:</p>
            <p>
               <m:math name="1748-7188-1-14-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:msup>
                                 <m:mi>m</m:mi>
                                 <m:mo>&#8242;</m:mo>
                              </m:msup>
                           </m:munderover>
                           <m:mrow>
                              <m:mi>c</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:msup>
                                    <m:mi>S</m:mi>
                                    <m:mo>&#8242;</m:mo>
                                 </m:msup>
                                 <m:mi>i</m:mi>
                              </m:msub>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo stretchy="false">[</m:mo>
                        <m:mi>k</m:mi>
                        <m:mo>&#8722;</m:mo>
                        <m:mi>o</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>k</m:mi>
                        <m:mo>&#8722;</m:mo>
                        <m:mn>1</m:mn>
                        <m:mo stretchy="false">]</m:mo>
                        <m:mo>,</m:mo>
                        <m:mi>n</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>/</m:mo>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:msup>
                                 <m:mi>m</m:mi>
                                 <m:mo>&#8242;</m:mo>
                              </m:msup>
                           </m:munderover>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munder>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>n</m:mi>
                                          <m:mo>&#8242;</m:mo>
                                       </m:msup>
                                       <m:mo>&#8712;</m:mo>
                                       <m:mo>{</m:mo>
                                       <m:mi>A</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>C</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>G</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:munder>
                                 <m:mrow>
                                    <m:mi>c</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:msup>
                                          <m:mi>S</m:mi>
                                          <m:mo>&#8242;</m:mo>
                                       </m:msup>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">[</m:mo>
                                    <m:mi>k</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mi>o</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>k</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo stretchy="false">]</m:mo>
                                    <m:mo>,</m:mo>
                                    <m:msup>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8242;</m:mo>
                                    </m:msup>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWbqaaiabdogaJjabcIcaOiqbdofatzaafaWaaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGimaadabaGafmyBa0Mbauaaa0GaeyyeIuoakiabcUfaBjabdUgaRjabgkHiTiabd+gaVjabcYcaSiabdUgaRjabgkHiTiabigdaXiabc2faDjabcYcaSiabd6gaUjabcMcaPiabc+caViabcIcaOmaaqahabaWaaabuaeaacqWGJbWycqGGOaakcuWGtbWugaqbamaaBaaaleaacqWGPbqAaeqaaOGaei4waSLaem4AaSMaeyOeI0Iaem4Ba8MaeiilaWIaem4AaSMaeyOeI0IaeGymaeJaeiyxa0LaeiilaWIafmOBa4MbauaacqGGPaqkcqGGPaqkaSqaaiqbd6gaUzaafaGaeyicI4Saei4EaSNaemyqaeKaeiilaWIaem4qamKaeiilaWIaem4raCKaeiilaWIaemivaqLaeiyFa0habeqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGimaadabaGafmyBa0Mbauaaa0GaeyyeIuoaaaa@7121@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Tree branch lengths are assumed to be fixed, but functional sequence elements are expected to exhibit a slower rate of substitution. Each sequence model maintains substitution rates that either expand the branch lengths of the tree (for rates greater than 1) or contract them (for rates less than 1). Longer branch lengths allow mutations to accumulate in a column without incurring a scoring penalty, whereas shorter branch lengths reward perfectly preserved columns. The codon, intron, and untranslated region states use two substitution rates: one for when the input column is conserved (no mutations observed) and a second for when the column is not conserved. If a mutation is observed in a codon where the encoded amino acid is preserved, the higher substitution rate is selected to better accept the mutation. In the case of splice sites, each base is assumed to be subject to selective pressure and a single rate is used. An optimal parse of the multiple sequence alignment <it>X </it>is found by taking the log ratio of a state emitting the columns in <it>X </it>versus an equivalent model assuming all nucleotides are equally probable (the "background" state). Using a dynamic programming matrix <it>D</it>(<it>j</it>, <it>q</it>) initialized to</p>
            <p>
               <m:math name="1748-7188-1-14-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>l</m:mi>
                        <m:mi>o</m:mi>
                        <m:mi>g</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>P</m:mi>
                                 <m:mi>O</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>X</m:mi>
                              <m:mo stretchy="false">[</m:mo>
                              <m:mn>0</m:mn>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                              <m:mo stretchy="false">]</m:mo>
                              <m:mo>|</m:mo>
                              <m:mi>q</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>&#968;</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#215;</m:mo>
                              <m:msub>
                                 <m:mi>P</m:mi>
                                 <m:mi>&#960;</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>q</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#215;</m:mo>
                              <m:msub>
                                 <m:mi>P</m:mi>
                                 <m:mrow>
                                    <m:mi>L</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>q</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>j</m:mi>
                              <m:mo>+</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>P</m:mi>
                                 <m:mi>O</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>X</m:mi>
                              <m:mo stretchy="false">[</m:mo>
                              <m:mn>0</m:mn>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                              <m:mo stretchy="false">]</m:mo>
                              <m:mo>|</m:mo>
                              <m:mi>b</m:mi>
                              <m:mi>a</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>k</m:mi>
                              <m:mi>g</m:mi>
                              <m:mi>r</m:mi>
                              <m:mi>o</m:mi>
                              <m:mi>u</m:mi>
                              <m:mi>n</m:mi>
                              <m:mi>d</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>&#968;</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mfrac>
                        <m:mo stretchy="false">)</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieGacqWFSbaBcqWFVbWBcqWFNbWzcqGGOaakdaWcaaqaaiabdcfaqnaaBaaaleaacqWGpbWtaeqaaOGaeiikaGIaemiwaGLaei4waSLaeGimaaJaeiilaWIaemOAaOMaeiyxa0LaeiiFaWNaemyCaeNaeiilaWccciGae4hYdKNaeiykaKIaey41aqRaemiuaa1aaSbaaSqaaiab+b8aWbqabaGccqGGOaakcqWGXbqCcqGGPaqkcqGHxdaTcqWGqbaudaWgaaWcbaGaemitaWKaeiilaWIaemyCaehabeaakiabcIcaOiabdQgaQjabgUcaRiabigdaXiabcMcaPaqaaiabdcfaqnaaBaaaleaacqWGpbWtaeqaaOGaeiikaGIaemiwaGLaei4waSLaeGimaaJaeiilaWIaemOAaOMaeiyxa0LaeiiFaWNaemOyaiMaemyyaeMaem4yamMaem4AaSMaem4zaCMaemOCaiNaem4Ba8MaemyDauNaemOBa4MaemizaqMaeiilaWIae4hYdKNaeiykaKcaaiabcMcaPaaa@7432@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>entries for each nucleotide <it>j </it>and state <it>q </it>are assigned a value:</p>
            <p>
               <m:math name="1748-7188-1-14-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>D</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>q</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>m</m:mi>
                                    <m:mi>a</m:mi>
                                    <m:msub>
                                       <m:mi>x</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:msup>
                                             <m:mi>q</m:mi>
                                             <m:mo>&#8242;</m:mo>
                                          </m:msup>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mi>l</m:mi>
                                    <m:mi>o</m:mi>
                                    <m:mi>g</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>P</m:mi>
                                             <m:mi>O</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>X</m:mi>
                                          <m:mo stretchy="false">[</m:mo>
                                          <m:mi>i</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo stretchy="false">]</m:mo>
                                          <m:mo>|</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>&#968;</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#215;</m:mo>
                                          <m:msub>
                                             <m:mi>P</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>q</m:mi>
                                          <m:mo>|</m:mo>
                                          <m:msup>
                                             <m:mi>q</m:mi>
                                             <m:mo>&#8242;</m:mo>
                                          </m:msup>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#215;</m:mo>
                                          <m:msub>
                                             <m:mi>P</m:mi>
                                             <m:mrow>
                                                <m:mi>L</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>q</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>i</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mn>1</m:mn>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>P</m:mi>
                                             <m:mi>O</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>X</m:mi>
                                          <m:mo stretchy="false">[</m:mo>
                                          <m:mi>i</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo stretchy="false">]</m:mo>
                                          <m:mo>|</m:mo>
                                          <m:mi>b</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>c</m:mi>
                                          <m:mi>k</m:mi>
                                          <m:mi>g</m:mi>
                                          <m:mi>r</m:mi>
                                          <m:mi>o</m:mi>
                                          <m:mi>u</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mi>d</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>&#968;</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mfrac>
                                    <m:mo>+</m:mo>
                                    <m:mi>D</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:msup>
                                       <m:mi>q</m:mi>
                                       <m:mo>&#8242;</m:mo>
                                    </m:msup>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeGabaaabaGaemiraqKaeiikaGIaemOAaOMaeiilaWIaemyCaeNaeiykaKIaeyypa0dabaacbiGae8xBa0Mae8xyaeMae8hEaG3aaSbaaSqaaiabdMgaPjabcYcaSiqbdghaXzaafaaabeaakiab=XgaSjab=9gaVjab=DgaNjabcIcaOmaalaaabaGaemiuaa1aaSbaaSqaaiabd+eapbqabaGccqGGOaakcqWGybawcqGGBbWwcqWGPbqAcqGGSaalcqWGQbGAcqGGDbqxcqGG8baFcqWGXbqCcqGGSaaliiGacqGFipqEcqGGPaqkcqGHxdaTcqWGqbaudaWgaaWcbaGaemivaqfabeaakiabcIcaOiabdghaXjabcYha8jqbdghaXzaafaGaeiykaKIaey41aqRaemiuaa1aaSbaaSqaaiabdYeamjabcYcaSiabdghaXbqabaGccqGGOaakcqWGQbGAcqGHsislcqWGPbqAcqGHRaWkcqaIXaqmcqGGPaqkaeaacqWGqbaudaWgaaWcbaGaem4ta8eabeaakiabcIcaOiabdIfayjabcUfaBjabdMgaPjabcYcaSiabdQgaQjabc2faDjabcYha8jabdkgaIjabdggaHjabdogaJjabdUgaRjabdEgaNjabdkhaYjabd+gaVjabdwha1jabd6gaUjabdsgaKjabcYcaSiab+H8a5jabcMcaPaaacqGHRaWkcqWGebarcqGGOaakcqWGPbqAcqGGSaalcuWGXbqCgaqbaiabcMcaPaaaaaa@8FC3@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Exons are recovered from the parse ending at the highest-scoring entry max<sub><it>q </it></sub><it>D</it>(<it>N </it>- 1, <it>q</it>), where <it>N </it>is the length of <it>X</it>. The runtime of the algorithm is <it>O</it>(|<it>Q</it>|<sup>2 </sup>&#215; <it>N</it><sup>2</sup>), where |<it>Q</it>| is the number of states in the model. The PGHMM can easily be transformed into a single-species alternative exon predictor by assuming a single input sequence and a single-node phylogenetic tree. The evolutionary models then reduce to their single-sequence Markov model equivalents, which are used to measure the impact of sequence conservation on prediction performance.</p>
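            <p>As an illustration only, the recurrence above can be sketched in Python as follows. This is a minimal sketch and not the ExAlt implementation: the names <it>best_parse</it>, <it>log_odds</it>, <it>init</it>, <it>trans</it> and <it>length</it> are hypothetical, the emission log ratio is supplied by the caller as a black box, and the sketch assumes that a segment covers columns <it>i </it>through <it>j </it>inclusive and that its predecessor ends at column <it>i </it>- 1, so that segments do not overlap.</p>
            <p><![CDATA[
```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def best_parse(n_cols, states, log_odds, init, trans, length):
    """Highest-scoring segmentation of alignment columns 0..n_cols-1.

    All arguments are assumptions of this sketch:
      log_odds(q, i, j) -- log P_O(X[i..j] | q) - log P_O(X[i..j] | background)
      init[q]           -- initial state probability P_pi(q)
      trans[(qp, q)]    -- transition probability P_T(q | qp); missing means 0
      length(q, d)      -- probability that state q emits a segment of d columns
    Returns (score, segments), each segment being (start, end, state).
    """
    D = [{q: NEG_INF for q in states} for _ in range(n_cols)]
    back = [{q: None for q in states} for _ in range(n_cols)]

    for j in range(n_cols):
        for q in states:
            # Initialization: one segment in state q spanning columns 0..j.
            best = (log_odds(q, 0, j) + safe_log(init.get(q, 0.0))
                    + safe_log(length(q, j + 1)))
            arg = None
            # Recurrence: the last segment covers columns i..j and the
            # previous segment, in state qp, ends at column i - 1.
            for i in range(1, j + 1):
                seg = log_odds(q, i, j) + safe_log(length(q, j - i + 1))
                if seg == NEG_INF:
                    continue
                for qp in states:
                    score = (D[i - 1][qp] + seg
                             + safe_log(trans.get((qp, q), 0.0)))
                    if score > best:
                        best, arg = score, (i - 1, qp)
            D[j][q] = best
            back[j][q] = arg

    # Traceback from the best entry in the final column.
    q = max(states, key=lambda s: D[n_cols - 1][s])
    total = D[n_cols - 1][q]
    j = n_cols - 1
    segments = []
    while True:
        prev = back[j][q]
        start = 0 if prev is None else prev[0] + 1
        segments.append((start, j, q))
        if prev is None:
            break
        j, q = prev
    segments.reverse()
    return total, segments
```
]]></p>
            <p>When <it>log_odds </it>is evaluated in constant time, the four nested loops over <it>j</it>, <it>q</it>, <it>i </it>and the previous state give the <it>O</it>(|<it>Q</it>|<sup>2 </sup>&#215; <it>N</it><sup>2</sup>) runtime stated above.</p>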
         </sec>
         <sec>
            <st>
               <p>Experiments</p>
            </st>
            <p>The alternative exon splicing model was implemented in a program called ExAlt and tested on a target genome, <it>Drosophila melanogaster</it>, using three informant species: <it>Drosophila simulans, Drosophila yakuba</it>, and <it>Drosophila erecta</it>. This study focuses on the three species most closely related to <it>D. melanogaster </it>(with available genomic data) to avoid using inaccurate multiple sequence alignments, which can occur when dealing with more distantly related species. Testing is based on 1339 <it>D. melanogaster </it>exons from 1160 gene loci. Of the original 600 alternatively spliced test exons, 572 (95%) were aligned to at least one of the three informant species, as were 767 of the 777 constitutive exons (99%). As an option, ExAlt predicts exons in the absence of alignment evidence; however, the candidate exons with no cross-species sequence conservation formed too small a data set (3% = 38/1377) to make meaningful comparisons between performance on exons with and without detectable cross-species conservation. Therefore, the remaining 97% of the exons, which show some cross-species sequence conservation, were selected to evaluate the impact of sequence conservation on prediction performance, with the understanding that additional work will be needed (as more data becomes available) to analyze prediction performance in the non-conserved exons.</p>
            <p>The goal of the experiments was to test ExAlt's ability to take a single input sequence presumed to contain an exon and correctly predict all of the exon/intron boundaries. The experiments were constructed to measure the impact of using gene structure information and cross-species sequence conservation on prediction performance. ExAlt outputs exon coordinates and exon splicing type labels. A randomly selected 60% of the data was used to evaluate sequence conservation patterns and for training and testing with 10-fold cross-validation. The remaining 40% was held out from the initial training and test phase so that, once development of the system was complete, the software could be tested on an independent data set and the reproducibility of the initial performance results verified. The pipeline for generating the test data is described in the Methods section.</p>
            <p>Since the absence of evidence for alternative splicing does not prove the existence of a constitutive exon, a constitutive exon is defined for evaluation purposes to be an exon from a gene with a single known isoform, where each splice site is supported by at least 5 ESTs (or other cDNAs) aligned with 95% identity or higher. The hypothesis is that these genes have sufficient expression evidence to predict the presence or absence of alternative splicing.</p>
            <sec>
               <st>
                  <p>Sequence conservation</p>
               </st>
               <p>The training set confirmed that splice sites and protein coding sequence were conserved between <it>D. melanogaster </it>and each of the three informant species. Of the constitutive di-nucleotide splice sites (AG and GT) annotated in <it>D. melanogaster</it>, 99% were found in the matching aligned informant sequence. Alternative splice sites were only slightly less frequently conserved, with over 95% found in the matched informant species. Table <tblr tid="T1">1</tblr> shows, by exon type, the percentage of exons with matches to each of the informant species that are missing a splice site. Exons with multiple duplicate functional splice sites (MS and IR exons) less frequently shared all splice sites with the informant species. In <it>D. simulans</it>, for example, 12% of the multiple splice site exons (MS in Table <tblr tid="T1">1</tblr>) and 8% of the exons with retained introns (IR in Table <tblr tid="T1">1</tblr>) were missing a splice site. In nearly every case, the lack of conservation in alternative splicing affects only one exon isoform, leaving another shared exon isoform in place. In the vast majority of cases, the lack of observed conservation is not due to misalignments or missing sequence, although a small percentage of cases are affected by this problem.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Percentage of <it>D. melanogaster </it>annotated exons missing at least one splice site in <it>D. simulans, D. yakuba </it>and <it>D. erecta</it>.</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>
                              <it>D. simulans</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>D. yakuba</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>D. erecta</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>CS</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>CE</p>
                        </c>
                        <c ca="center">
                           <p>1</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>MS</p>
                        </c>
                        <c ca="center">
                           <p>10,1</p>
                        </c>
                        <c ca="center">
                           <p>9,0</p>
                        </c>
                        <c ca="center">
                           <p>11,0</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>IR</p>
                        </c>
                        <c ca="center">
                           <p>8,0</p>
                        </c>
                        <c ca="center">
                           <p>16,2</p>
                        </c>
                        <c ca="center">
                           <p>20,3</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Percentages are organized by exon type: constitutive exons (CS), cassette exons (CE), exons with multiple splice sites (MS), and exons with intron retention (IR). The second number associated with the MS and IR rows is the percentage of exons where the non-conserved splice site is constitutive (used in all isoforms).</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Prediction performance</p>
               </st>
               <p>ExAlt's prediction accuracy was measured at the exon level, with an exon counted as correct only when both its predicted left and right boundaries matched the test exon. Internal exons begin with an acceptor and end with a donor. Initial exons begin with a transcription or translation start site and end with a donor site. Terminal exons begin with an acceptor and end with a transcription or translation stop site. Single exons begin with a transcription or translation start site and end with a transcription or translation stop site. (Single exons in the test were of the intron retention splicing type.) Sensitivity (the percentage of the test exons correctly detected) and specificity (the percentage of predicted exons that match the test set) were used to measure performance.</p>
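               <p>These two measures can be written down directly. The following Python sketch is illustrative only; the function name and the representation of an exon as a (left, right) coordinate pair are assumptions made for the example and are not taken from ExAlt.</p>
               <p><![CDATA[
```python
def exon_sensitivity_specificity(predicted, reference):
    """Exon-level sensitivity and specificity, both as percentages.

    Each exon is a (left, right) coordinate pair (an assumption of this
    sketch). A predicted exon is correct only if both boundaries match a
    reference exon exactly.
    """
    pred = set(predicted)
    ref = set(reference)
    correct = len(pred.intersection(ref))
    # Sensitivity: share of reference (test) exons that were predicted.
    sens = 100.0 * correct / len(ref) if ref else float("nan")
    # Specificity: share of predicted exons that are in the reference set.
    spec = 100.0 * correct / len(pred) if pred else float("nan")
    return sens, spec
```
]]></p>
               <p>For a test sequence with two overlapping exons, predicting only one of them with exact boundaries gives 50% sensitivity and 100% specificity under these definitions.</p>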
               <p>Table <tblr tid="T2">2</tblr> shows ExAlt's performance on the held-out set compared to the union of different publicly available gene predictions and an initial known exon given as input. This tests the ability to improve an existing annotation, where an initial exon and reading frame are known. Since many test sequences contained multiple overlapping exons, one exon was chosen at random and used as input. Experiments were repeated 10 times and the average taken. Results are listed in Table <tblr tid="T2">2</tblr> as ExAlt-Exon for the ExAlt predictions informed by cross-species sequence conservation. Exon sensitivity and specificity are high since at least one predicted exon matched the test exon. For example, in the case of multiple splice site exons with two overlapping exons, a "naive" program predicting only the input exon would achieve 50% sensitivity and 100% specificity. When only a single exon isoform exists, the naive program achieves 100% sensitivity and 100% specificity. For the results in Table <tblr tid="T2">2</tblr>, it was therefore important to compare the decrease in specificity relative to the naive method in cases where only a single exon isoform occurs against the gains in sensitivity when multiple overlapping exons occur. Two <it>ab initio </it>single isoform gene finders were included in the comparison, Augustus <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and SNAP <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Also included is the single isoform gene finder N-SCAN <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, which uses cross-species conservation with <it>Drosophila yakuba, Drosophila pseudoobscura</it>, and <it>Anopheles gambiae </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Prediction performance of ExAlt. Sensitivity (Sens) and Specificity (Spec) are shown for exons.</p>
                  </caption>
                  <tblbdy cols="11">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Constitutive</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Cassette</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Multiple Splice</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Intron Retention</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>All Exons</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="11">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Exon</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>96</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>89</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>94</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>61</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>89</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>84</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>94</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>N-SCAN-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>86</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>89</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>65</p>
                        </c>
                        <c ca="center">
                           <p>79</p>
                        </c>
                        <c ca="center">
                           <p>55</p>
                        </c>
                        <c ca="center">
                           <p>78</p>
                        </c>
                        <c ca="center">
                           <p>82</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Augustus-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>82</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>81</p>
                        </c>
                        <c ca="center">
                           <p>63</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>52</p>
                        </c>
                        <c ca="center">
                           <p>73</p>
                        </c>
                        <c ca="center">
                           <p>81</p>
                        </c>
                        <c ca="center">
                           <p>79</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SNAP-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>79</p>
                        </c>
                        <c ca="center">
                           <p>64</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                        <c ca="center">
                           <p>51</p>
                        </c>
                        <c ca="center">
                           <p>76</p>
                        </c>
                        <c ca="center">
                           <p>81</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SNAP+N-SCAN-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>73</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                        <c ca="center">
                           <p>68</p>
                        </c>
                        <c ca="center">
                           <p>57</p>
                        </c>
                        <c ca="center">
                           <p>70</p>
                        </c>
                        <c ca="center">
                           <p>83</p>
                        </c>
                        <c ca="center">
                           <p>72</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Augustus+N-SCAN-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>79</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>68</p>
                        </c>
                        <c ca="center">
                           <p>73</p>
                        </c>
                        <c ca="center">
                           <p>57</p>
                        </c>
                        <c ca="center">
                           <p>68</p>
                        </c>
                        <c ca="center">
                           <p>83</p>
                        </c>
                        <c ca="center">
                           <p>76</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Aug.+SNAP+N-SCAN-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>70</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>71</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>64</p>
                        </c>
                        <c ca="center">
                           <p>60</p>
                        </c>
                        <c ca="center">
                           <p>64</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>84</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Row 1 shows ExAlt performance using an input exon and default parameter settings (ExAlt-Exon). Rows 2&#8211;7 show the union of the input exon with the predictions of different combinations of three gene finders (N-SCAN, SNAP, and Augustus). Bold values mark the best result in each column. (Aug. = Augustus)</p>
                  </tblfn>
               </tbl>
               <p>The coordinates of the start and stop codons were included as input to ExAlt but were excluded from the input to the gene finders, which potentially makes it harder for the gene finders to accurately predict initial, terminal and single exons. The scoring was therefore relaxed for these exon types: an initial exon was counted as correct if the gene finder correctly predicted the donor site, a terminal exon was counted as correct if the gene finder correctly predicted the acceptor site, and a single exon was counted as correct if the prediction overlapped the known single exon. The gene finders were also run on longer stretches of genomic sequence than ExAlt and had the additional, challenging task of determining gene boundaries. A gene finder may predict an initial, terminal or single exon that overlaps an internal exon in the test set; such a prediction was counted as an incorrect exon prediction. If the start and stop codon information were integrated into the prediction process of the gene finders, their individual prediction performance would likely improve. However, because considerable effort has gone into carefully training and tuning the gene finders for annotating long stretches of genomic sequence, the current predictions serve as a reasonable baseline for measuring differences in prediction performance. The input exon plus the union of all three single-isoform gene finders recovers more of the correct multiple splice site exons (71% versus ExAlt's 67%), but at the cost of a large reduction in specificity (64% versus ExAlt's 94%). In the other cases, ExAlt matches or improves on the performance of the union of multiple gene finders.</p>
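               <p>The relaxed matching rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used in this study: the Exon record, the is_correct function, the forward-strand inclusive coordinates, and the exact-match rule for internal exons are all hypothetical choices made for the example.</p>

```python
from typing import NamedTuple


class Exon(NamedTuple):
    """Hypothetical exon record: forward strand, inclusive coordinates."""
    start: int  # first base of the exon (acceptor side for non-initial exons)
    end: int    # last base of the exon (donor side for non-terminal exons)
    kind: str   # "initial", "internal", "terminal" or "single"


def is_correct(pred_start: int, pred_end: int, known: Exon) -> bool:
    """Return True if a gene finder prediction counts as correct."""
    if known.kind == "initial":
        # Start codon was not given to the gene finder: donor site only.
        return pred_end == known.end
    if known.kind == "terminal":
        # Stop codon was not given to the gene finder: acceptor site only.
        return pred_start == known.start
    if known.kind == "single":
        # Any overlap with the known single exon is accepted.
        return min(pred_end, known.end) >= max(pred_start, known.start)
    # Assumed here: internal exons need both boundaries to match exactly.
    return (pred_start, pred_end) == (known.start, known.end)


print(is_correct(90, 200, Exon(100, 200, "initial")))    # donor site matches
print(is_correct(300, 420, Exon(300, 400, "terminal")))  # acceptor site matches
print(is_correct(150, 600, Exon(100, 200, "single")))    # overlaps the known exon
print(is_correct(90, 200, Exon(100, 200, "internal")))   # start boundary differs
```

               <p>Under these assumed rules, only the last call is rejected, because an internal exon is accepted only when both of its boundaries are reproduced.</p>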
               <p>Table <tblr tid="T3">3</tblr> compares the prediction performance of ExAlt-Exon from Table <tblr tid="T2">2</tblr> to ExAlt predictions made with different parameter settings. To measure the impact of using gene structure information as input, ExAlt-Exon was compared to two alternatives, shown in Table <tblr tid="T3">3</tblr> as ExAlt-Frame and ExAlt-Default. ExAlt-Frame makes predictions without using exon coordinates as input but is limited to predicting exons that maintain reading frame consistency with the rest of the known gene. ExAlt-Default is given no gene structure information and checks all three possible reading frames before selecting the exons from the highest scoring reading frame. As expected, starting with an initial known exon improved overall performance, but even when gene structure information was withheld from the input, a majority of the exon coordinates were correctly recovered (67% overall).</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Exon prediction accuracy using different ExAlt parameter settings.</p>
                  </caption>
                  <tblbdy cols="11">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Constitutive</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Cassette</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Multiple Splice</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Intron Retention</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>All Exons</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="11">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Exon</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>96</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                        <c ca="center">
                           <p>94</p>
                        </c>
                        <c ca="center">
                           <p>61</p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                        <c ca="center">
                           <p>94</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Exon-<it>ab initio</it></p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>88</p>
                        </c>
                        <c ca="center">
                           <p>100</p>
                        </c>
                        <c ca="center">
                           <p>85</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                        <c ca="center">
                           <p>83</p>
                        </c>
                        <c ca="center">
                           <p>70</p>
                        </c>
                        <c ca="center">
                           <p>87</p>
                        </c>
                        <c ca="center">
                           <p>87</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Frame</p>
                        </c>
                        <c ca="center">
                           <p>96</p>
                        </c>
                        <c ca="center">
                           <p>95</p>
                        </c>
                        <c ca="center">
                           <p>70</p>
                        </c>
                        <c ca="center">
                           <p>80</p>
                        </c>
                        <c ca="center">
                           <p>53</p>
                        </c>
                        <c ca="center">
                           <p>87</p>
                        </c>
                        <c ca="center">
                           <p>48</p>
                        </c>
                        <c ca="center">
                           <p>82</p>
                        </c>
                        <c ca="center">
                           <p>72</p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Frame-<it>ab initio</it></p>
                        </c>
                        <c ca="center">
                           <p>97</p>
                        </c>
                        <c ca="center">
                           <p>87</p>
                        </c>
                        <c ca="center">
                           <p>72</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                        <c ca="center">
                           <p>56</p>
                        </c>
                        <c ca="center">
                           <p>76</p>
                        </c>
                        <c ca="center">
                           <p>48</p>
                        </c>
                        <c ca="center">
                           <p>80</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                        <c ca="center">
                           <p>82</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Frame-Single</p>
                        </c>
                        <c ca="center">
                           <p>96</p>
                        </c>
                        <c ca="center">
                           <p>97</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                        <c ca="center">
                           <p>85</p>
                        </c>
                        <c ca="center">
                           <p>45</p>
                        </c>
                        <c ca="center">
                           <p>92</p>
                        </c>
                        <c ca="center">
                           <p>31</p>
                        </c>
                        <c ca="center">
                           <p>92</p>
                        </c>
                        <c ca="center">
                           <p>66</p>
                        </c>
                        <c ca="center">
                           <p>94</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Default</p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                        <c ca="center">
                           <p>58</p>
                        </c>
                        <c ca="center">
                           <p>63</p>
                        </c>
                        <c ca="center">
                           <p>49</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>43</p>
                        </c>
                        <c ca="center">
                           <p>74</p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                        <c ca="center">
                           <p>79</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Default-<it>ab initio</it></p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                        <c ca="center">
                           <p>75</p>
                        </c>
                        <c ca="center">
                           <p>58</p>
                        </c>
                        <c ca="center">
                           <p>55</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                        <c ca="center">
                           <p>36</p>
                        </c>
                        <c ca="center">
                           <p>58</p>
                        </c>
                        <c ca="center">
                           <p>65</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Default-Single</p>
                        </c>
                        <c ca="center">
                           <p>89</p>
                        </c>
                        <c ca="center">
                           <p>90</p>
                        </c>
                        <c ca="center">
                           <p>56</p>
                        </c>
                        <c ca="center">
                           <p>66</p>
                        </c>
                        <c ca="center">
                           <p>41</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                        <c ca="center">
                           <p>28</p>
                        </c>
                        <c ca="center">
                           <p>83</p>
                        </c>
                        <c ca="center">
                           <p>61</p>
                        </c>
                        <c ca="center">
                           <p>85</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>N-SCAN</p>
                        </c>
                        <c ca="center">
                           <p>87</p>
                        </c>
                        <c ca="center">
                           <p>84</p>
                        </c>
                        <c ca="center">
                           <p>51</p>
                        </c>
                        <c ca="center">
                           <p>80</p>
                        </c>
                        <c ca="center">
                           <p>33</p>
                        </c>
                        <c ca="center">
                           <p>66</p>
                        </c>
                        <c ca="center">
                           <p>31</p>
                        </c>
                        <c ca="center">
                           <p>66</p>
                        </c>
                        <c ca="center">
                           <p>57</p>
                        </c>
                        <c ca="center">
                           <p>78</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Augustus</p>
                        </c>
                        <c ca="center">
                           <p>75</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>27</p>
                        </c>
                        <c ca="center">
                           <p>53</p>
                        </c>
                        <c ca="center">
                           <p>27</p>
                        </c>
                        <c ca="center">
                           <p>59</p>
                        </c>
                        <c ca="center">
                           <p>26</p>
                        </c>
                        <c ca="center">
                           <p>57</p>
                        </c>
                        <c ca="center">
                           <p>47</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>SNAP</p>
                        </c>
                        <c ca="center">
                           <p>76</p>
                        </c>
                        <c ca="center">
                           <p>72</p>
                        </c>
                        <c ca="center">
                           <p>42</p>
                        </c>
                        <c ca="center">
                           <p>61</p>
                        </c>
                        <c ca="center">
                           <p>29</p>
                        </c>
                        <c ca="center">
                           <p>56</p>
                        </c>
                        <c ca="center">
                           <p>27</p>
                        </c>
                        <c ca="center">
                           <p>62</p>
                        </c>
                        <c ca="center">
                           <p>50</p>
                        </c>
                        <c ca="center">
                           <p>67</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Columns are organized by exon type: Constitutive, Cassette, Multiple Splice, Intron Retention, and all exons counted together (All Exons). Rows 1&#8211;2 show ExAlt performance using an input exon with default parameters, as in Table 2 (ExAlt-Exon), and with no informant species (ExAlt-Exon-<it>ab initio</it>). Rows 3&#8211;5 show ExAlt performance using an input coding frame with default parameters (ExAlt-Frame), with no informant species (ExAlt-Frame-<it>ab initio</it>), and with at most one exon predicted per test sequence (ExAlt-Frame-Single). Rows 6&#8211;8 show ExAlt performance using no gene structure information with default parameters (ExAlt-Default), with no informant species (ExAlt-Default-<it>ab initio</it>), and with at most one exon predicted per test sequence (ExAlt-Default-Single). Rows 9&#8211;11 show the output of the three single-isoform gene finders N-SCAN, Augustus, and SNAP.</p>
                  </tblfn>
               </tbl>
               <p>ExAlt-Exon, ExAlt-Frame, and ExAlt-Default were each compared to their respective <it>ab initio</it> equivalents: ExAlt-Exon-<it>ab initio</it>, ExAlt-Frame-<it>ab initio</it>, and ExAlt-Default-<it>ab initio</it>. Each <it>ab initio</it> version is the GHMM equivalent of the PGHMM and uses only the target <it>D. melanogaster</it> sequence as input. In all cases, the multi-species version of ExAlt produced fewer false positive predictions than the equivalent <it>ab initio</it> version, with little or no reduction in sensitivity.</p>
               <p>Finally, the trade-off between predicting multiple overlapping exons and predicting at most one exon per test sequence was measured. Because the hold-out set is composed of 57% constitutive exons, 18% MS exons, 17% SE exons, and 9% IR exons, both single-exon prediction versions of ExAlt (ExAlt-Frame-Single and ExAlt-Default-Single) captured a large percentage of the exons simply by correctly predicting one exon per sequence. When ExAlt was given the coding frame and restricted to predicting at most one exon, an exon was correctly predicted in 94% of the sequences (ExAlt-Frame-Single in Table <tblr tid="T3">3</tblr>). Allowing ExAlt to predict overlapping exons (ExAlt-Frame in Table <tblr tid="T3">3</tblr>) lowered specificity from 94% to 89% but increased the fraction of correctly annotated exons from 66% to 72%. The last three rows of Table <tblr tid="T3">3</tblr> show single-isoform gene finding performance for N-SCAN, Augustus, and SNAP, which provides an additional point of reference for how well conventional gene finders performed in the evaluated gene regions.</p>
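               <p>The trade-off described above can be made concrete with a small Python sketch. It assumes the convention common in gene finding, in which sensitivity is the fraction of known exons that are predicted and specificity is the fraction of predicted exons that are correct; the function name, the coordinate pairs, and the toy locus are hypothetical and are not taken from the test set.</p>

```python
def sens_spec(predicted, known):
    """Exon-level sensitivity and specificity for exact coordinate matches."""
    predicted, known = set(predicted), set(known)
    correct = len(predicted.intersection(known))
    sens = correct / len(known) if known else 0.0
    spec = correct / len(predicted) if predicted else 0.0
    return sens, spec


# Toy locus with three annotated, overlapping exons given as (start, end).
known = {(100, 200), (100, 250), (130, 250)}

single = {(100, 200)}                          # at most one exon predicted
multi = {(100, 200), (100, 250), (120, 250)}   # overlapping exons allowed

print(sens_spec(single, known))  # every prediction correct, but few exons found
print(sens_spec(multi, known))   # more exons found, at the cost of a false positive
```

               <p>In this toy case the single-exon prediction is fully specific but recovers only one of three exons, whereas the overlapping predictions recover two of three exons and include one incorrect exon.</p>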
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>ExAlt results on the initial training and testing set in percentages.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>All Exons</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Sens</p>
                        </c>
                        <c ca="center">
                           <p>Spec</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Exon</p>
                        </c>
                        <c ca="center">
                           <p>82/-2</p>
                        </c>
                        <c ca="center">
                           <p>94/-1</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ExAlt-Exon-<it>ab initio</it></p>
                        </c>
                        <c ca="center">
                           <p>84/-3</p>
                        </c>
                        <c ca="center">
                           <p>86/+2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>N-SCAN-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>82/0</p>
                        </c>
                        <c ca="center">
                           <p>82/-2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Augustus-Exon Union</p>
                        </c>
                        <c ca="center">
                           <p>81/0</p>
                        </c>
                        <c ca="center">
               