Abstract
In this paper, we study the problem of constructing perfect phylogenies for threestate characters. Our work builds on two recent results. The first result states that for threestate characters, the local condition of examining all subsets of three characters is sufficient to determine the global property of admitting a perfect phylogeny. The second result applies tools from minimal triangulation theory to the partition intersection graph to determine if a perfect phylogeny exists. Despite the wealth of combinatorial tools and algorithms stemming from the chordal graph and minimal triangulation literature, it is unclear how to use such approaches to efficiently construct a perfect phylogeny for threestate characters when the data admits one. We utilize structural properties of both the partition intersection graph and the original data in order to achieve a competitive time bound.
Keywords:
Perfect phylogeny; Chordal graph; Minimal triangulation; Minimal separatorBackground
In this paper, we study the problem of constructing phylogenies, or evolutionary trees, to describe ancestral relationships between a set of observed taxa. Each taxon is represented by a sequence and the evolutionary tree provides an explanation of branching patterns of mutation events transforming one sequence into another.
We will focus on the widely studied infinite sites model from population genetics, in which the mutation of any character can occur at most once in the phylogeny. Without recombination, the phylogeny is a tree called a perfect phylogeny. The problem of determining if a set of binary sequences fits the infinite sites model without recombination corresponds to determining if the data can be derived on a perfect phylogeny. A generalization of the infinite sites model is the infinite alleles model, in which any character can mutate multiple times but each mutation of the character must lead to a distinct allele (state). Again, without recombination, the phylogeny is tree, called a multistate perfect phylogeny. Correspondingly, the problem of determining if multistate data fits the infinitealleles model without recombination corresponds to determining if the data can be derived on a multistate perfect phylogeny.
Dress and Steel [1] and Kannan and Warnow [2] both give algorithms that construct perfect phylogenies for threestate characters when one exists. The goal of this work is to extend the results in [3] using the minimal separators of the partition intersection graph to create a three state construction algorithm that is competitive with Dress and Steel’s algorithm.
Notation and prior results
The input to our problem is a set of n taxa defined over a set of mcharacters
The perfect phylogeny problem is to determine whether the taxa defined by a matrix M can be displayed on a tree T such that
1. each taxon of M labels exactly one node in T,
2. each leaf in T is labeled by a taxon of M,
3. each node of T is labeled by a species,
4. for every character χ^{i} and for every state
Any tree satisfying conditions 1  4 is called a perfect phylogeny for M. Any character satisfying condition 4 is said to be compatible with T. The general perfect phylogeny problem (with no constraints on r, n, and m) is NPcomplete [4,5]. However, the perfect phylogeny problem becomes polynomially solvable (in n and m) when r is fixed. For r = 2, this follows from the Splits Equivalence Theorem [6,7]. For r = 3, Dress and Steel gave an O(nm^{2}) algorithm [1] and for r = 3 or 4, Kannan and Warnow gave an O(n^{2}m) algorithm [2]. For any fixed r, Agarwala and FernándezBaca gave an O(2^{3r}(nm^{3} + m^{4})) algorithm [8], which was improved to O(2^{2r}nm^{2}) by Kannan and Warnow [9].
Definition 2.1
[7,10]For a set of input taxa M, the partition intersection graph G(M) is obtained by associating
a vertex for each character state and an edge between two vertices
Note that by definition, there are no edges in the partition intersection graph between states of the same character. It will be useful to consider the partition intersection graph G(χ^{i}, χ^{j}, χ^{k}) of the submatrix of M defined by the three characters χ^{i}, χ^{j}, χ^{k}.
Definition 2.2
A graph H is chordal, or triangulated, if there are no induced chordless cycles of length four or greater in H.
See [11] and [12] for further details on chordal graphs.
Consider coloring the vertices of the partition intersection graph G(M) by colors 1, 2, …, m as follows. For each character χ^{i}, assign color i to the vertices
Theorem 2.3
[7,10]A set of taxa M admits a perfect phylogeny if and only if the corresponding partition intersection graph G(M) has a proper triangulation.
A triangulation of a graph G is minimal if it does not have a proper subgraph that is also a triangulation of G. Theorem 2.3 can be restated in terms of proper minimal triangulations of G(M) because removing edges from a proper triangulation will preserve the coloring of
the graph. If G(M) has a proper triangulation H, then a perfect phylogeny for M can be constructed from a clique tree of H.
1. the nodes of
2. for each vertex v of G, the maximal cliques containing v form a connected subtree of
That is, given a clique tree
Theorem 2.4
[10,13]G is a chordal graph if and only if there is a tree T such that each vertex u of G induces a subtree T_{u} of T and uv is an edge of G if and only if subtrees T_{u} and T_{v} share at least one node.
In particular, if a pair of character states appear in the same species of a perfect phylogeny for M but not in any input taxon of M, this pair defines a fill edge to add to obtain a proper triangulation of the partition intersection graph. This fill edge preserves the proper coloring because intersecting subtrees from the same character would contradict conditions 3 and 4 of the perfect phylogeny definition.
To illustrate some of these notions, consider the example in Figure
1. The species with sequence 2100 defines a fill edge
Figure 1. Partition intersection graphs and perfect phylogenies. A 3state matrix M, partition intersection graph G(M), and perfect phylogeny T. There are no species with missing values in T.
In recent work, it is shown that there is a complete description of minimal obstruction sets for threestate characters analogous to a wellknown result on obstruction sets for binary characters (the four gamete condition) [3]. These results allow us to expand upon recent work of Gusfield [14] which uses properties of triangulations and minimal separators of partition intersection graphs to solve several problems related to multistate perfect phylogenies.
An (a,b)separator of a graph G is a set of vertices whose removal from G separates a and b. A minimal (a,b)separator is an (a,b)separator such that no proper subset is an (a,b)separator, and a minimal separator is a separator that is a minimal (a,b)separator for some pair of vertices a and b. For a set of vertices X, let GX be the induced subgraph of G after removing vertices X. If S and S^{′} are two minimal separators of G, we say S is parallel to S^{′} if there is a single connected component C of G − S^{′} such that S ⊆ C ∪ S^{′} (otherwise S and S^{′ }cross). A pair of vertices a and b cross S if S is an (a,b)separator. The neighborhood of a set of vertices X is N(X) = {v ∈ G − X : (u, v) ∈ E(G) for some u ∈ X}. A component C of GS is full if the neighborhood N(C) is equal to S. The following characterization of minimal separators is critical to our arguments.
Lemma 2.5
[15]Let S be a subset of vertices of graph G. Then S is a minimal separator of G if and only if GS has two or more full components.
In a colored graph, a legal separator is a separator such that no two vertices have the same color. Let Δ_{G} denote the minimal separators of graph G. For S ∈ Δ_{G}, we saturateS by adding edges between every pair of vertices in S to create a clique. For Q ⊆ Δ_{G}, G_{Q} denotes the graph obtained by saturating every S ∈ Q. The following theorem shows the connection between minimal triangulations and collections of parallel minimal separators of a graph.
Theorem 2.6
(Minimal Triangulation Theorem
[1618]). Suppose Q ⊆ Δ_{G} is a maximal set of pairwise parallel minimal separators of G. Then G_{Q} is a minimal triangulation of G and
The following are necessary and sufficient conditions for the existence of a perfect phylogeny for data over arbitrary number of states. We refer the reader to [14] for the proofs.
Theorem 2.7
(Theorem 2 (MSP) [14]). For input M over r states (r ≥ 2), there is a perfect phylogeny for M if and only if there is a set Q of pairwise parallel legal minimal separators in G(M) such that every illegal minimal separator in G(M) is crossed by at least one separator in Q.
Theorem 2.8
(Theorem 3 (MSPN) [14]). For input M over r states (r ≥ 2), there is a perfect phylogeny for M if and only if there is a set Q of pairwise parallel legal minimal separators in partition intersection graph G(M) such that every monochromatic pair of nodes in G(M) is separated by some separator in Q.
For the special case of input M with characters over three states (r = 3), the partition intersection graph satisfies additional structure and the following theorems give necessary and sufficient conditions for the existence of a perfect phylogeny for M[3].
Theorem 2.9
[3]Given an input set M with at most three states per character (r ≤ 3), M admits a perfect phylogeny if and only if every subset of three characters of M admits a perfect phylogeny.
Furthermore, there is an explicit description of all minimal obstruction sets to the existence of a perfect phylogeny.
Theorem 2.10
[3]For input M over 3state characters, there exists a perfect phylogeny for M if and only if both of the following conditions hold:
1. for every pair of columns of M, the partition intersection graph induced by the columns is acyclic and
2. for every triple of columns of M, the partition intersection graphs induced by the columns does not contain any of the graphs shown in Figure 2 up to relabeling of the character states.
Figure 2. Minimal obstruction sets. Minimal obstruction sets for threestate characters up to relabeling. The boxes highlight the input entries that are identical for three of the obstruction sets.
This complete characterization of minimal obstruction sets allows us to simplify Theorem 2.8 in the case r = 3.
Theorem 2.11
[3]For input M on at most three states per character (r ≤ 3), there is a threestate perfect phylogeny for M if and only if the partition intersection graph for every pair of characters is acyclic and every monochromatic pair of vertices in G(M) is separated by a legal minimal separator.
Theorem 2.11 shows that the requirement of Theorem MSPN that the legal minimal separators in Q be pairwise parallel can be removed for the case of input data over threestate characters. The condition in Theorem 2.11 that the input is over three state characters is necessary, as there are examples showing that the theorem does not extend to data with fourstate characters.
All of the legal minimal separators for threestate input can be found in O(nm^{2}) time and the algorithm to check if each monochromatic pair is separated by a legal minimal separator can be performed during the algorithm for generating the legal minimal separators (see Section “Proper triangulation algorithm”). Therefore, the 3state perfect phylogeny decision problem can be solved in O(nm^{2}) time using minimal separators. However, it is not clear how minimal separators can be used to solve the construction problem in a similar time bound. In [14], Gusfield used the minimal separator approach and integer linear programming methods to solve both the decision and construction problem for kstate perfect phylogeny. Since integer linear programming methods in general do not have polynomial time bounds, this naturally leads to the following question: is there an O(nm^{2}) algorithm for the construction problem for 3state perfect phylogeny using the separator approach? In this paper, we answer in the affirmative, and show that any algorithm which explicitly computes the partition intersection graph has a time bound of at least O(nm + m^{2}).
We first study the structure of separators in the partition intersection graph for 3state input with the goal of answering this question. We first state two lemmas from [3].
Lemma 2.12
(Lemma 3.4 [3]). Let M be a set of input taxa with at most three states per character, and consider any three characters χ^{i}, χ^{j}, χ^{k}in M. If the partition intersection graph G(χ^{i}, χ^{j}, χ^{k}) is properly triangulatable, then the only possible chordless cycles in G(χ^{i}, χ^{j}, χ^{k}) are chordless 4cycles, with two colors appearing once and the remaining color appearing twice.
Lemma 2.12 implies that if a subset of three characters χ^{i}, χ^{j}, χ^{k} in M is properly triangulatable, then there is a unique set of edges F(χ^{i}, χ^{j}, χ^{k}) that must be added to triangulate the chordless cycles in G(χ^{i}, χ^{j}, χ^{k}). Construct a new graph G^{′}(M) on the same vertices as G(M) with edge set
Lemma 2.13
(Lemmas 4.2, 4.3, 4.7 [3]) Let M be a set of input taxa with at most three states per character, and suppose G(M) is properly triangulatable. Then G^{′}(M) cannot contain a chordless cycle with one or more Fedges. If C is a chordless cycle in G^{′}(M) with only Eedges, then C has length exactly four with four distinct colors.
Structure of separators
In this section, our goal is to study the relationship between minimal separators in G(M) and G^{′}(M) when M is a set of taxa over 3state characters. Our ultimate goal is to show that it suffices to consider only the legal minimal separators of G(M) while disregarding the illegal minimal separators. We first prove the following theorem on the separator structure of G^{′}(M).
Theorem 3.1
Let M be a set of taxa over 3state characters. M allows a perfect phylogeny if and only if G^{′}(M) (the partition intersection graph G(M) together with Fedges) does not contain any illegal minimal separators.
Proof
Suppose M allows a perfect phylogeny and suppose there is an illegal minimal separator S in G^{′}(M) with a monochromatic pair of vertices u and v. By Lemma 2.5, there exist two full components C, D of G  S, and by definition of a full component, there are paths connecting u and v in both C∪{u,v} and D∪{u,v}. Consider the shortest such paths P_{C} and P_{D} respectively (note that there are no chords within P_{C} and no chords within P_{D}). Since C and D are components separated by S, there are no edges between C and D. Also, u and v are not adjacent in G^{′}(M) since u and v have the same color and G^{′}(M) contains no illegal edges. This implies the union of P_{C} and P_{D} creates a chordless cycle. By Lemma 2.13, G^{′}(M) cannot contain any chordless cycles of length five or greater or chordless cycles with Fedges, so the union of the paths P_{C} and P_{D} must be a four cycle C and in particular, must be a cycle u → x → v → x^{′} → u, where u and v have the same color. C is a chordless four cycle in G(M) on at most three colors, which cannot occur since we have triangulated all such cycles by Fedges. This contradiction implies S cannot be an illegal minimal separator.
Now, suppose G^{′}(M) does not contain any illegal minimal separators. By Theorem 2.7, graph G^{′}(M) has a proper triangulation and since G(M) is a subgraph of G^{′}(M), G(M) also has a proper triangulation. It follows that M has a perfect phylogeny. □
This suggests that analyzing the minimal separators of G^{′}(M) suffices for 3state construction. However, the algorithm for enumerating the minimal separators of G(M) necessary for proper triangulations in O(nm^{2}) time uses M (rather than G(M)), and it is not clear if it is possible to extend this approach to enumerate the necessary minimal separators of G^{′}(M). In order to use techniques in [14], the the goal of our next two results will be to describe the relationship between the minimal separators of G^{′}(M) and the legal minimal separators of G(M) when M has a perfect phylogeny.
Lemma 3.2
Let M be a set of taxa over 3state characters allowing a perfect phylogeny. Then H is a proper minimal triangulation of G(M) if and only if H is a minimal triangulation of G^{′}(M).
Proof
Suppose H is a proper minimal triangulation of G(M). Each Fedge of G^{′}(M) comes from a chordless cycle of length four on three colors (see Lemma 2.12), so this edge must appear in any proper triangulation of G(M). Hence the Fedges must be edges of H, so G^{′}(M) ⊆ H and H is a proper triangulation of G^{′}(M). If H is not minimal with respect to G^{′}(M), there exists H^{′} such that G^{′}(M) ⊆ H^{′} ⊂ H and thus G(M) ⊆ H^{′} ⊂ H, contradicting the minimality of H with respect to G(M). Thus H is a minimal triangulation of G^{′}(M).
Conversely, suppose M allows a perfect phylogeny and H is a minimal triangulation of G^{′}(M). By Theorem 2.6, H = G^{′}(M)_{Q} for a set Q of maximal pairwise parallel minimal separators of G^{′}(M), and these minimal separators must be legal by Theorem 3.1. Every edge in H not in G(M) is either an Fedge of G^{′}(M) or a fill edge defined by Q, and in both cases it must be a legal fill edge. Therefore H is a proper triangulation of G(M). If there is some proper triangulation H^{′} of G(M) where G(M) ⊆ H^{′} ⊆ H then the Fedges of G^{′}(M) must be edges of H^{′}, otherwise H^{′} has a chordless four cycle. Thus H^{′} is a proper triangulation of G^{′}(M), and because H is a proper minimal triangulation of G^{′}(M) it must be that H^{′} = H. Therefore H is also a proper minimal triangulation of G(M). □
Let
Theorem 3.3
Suppose M is a set of taxa on 3state characters that allows a perfect phylogeny.
Then the legal minimal separators of G(M) are exactly the minimal separators of G^{′}(M) (i.e.,
Proof
Assume M has a perfect phylogeny. Consider a minimal separator S of G^{′}(M), and suppose Q is a set of maximal pairwise parallel minimal separators of G^{′}(M) with S ∈ Q. Let H = G^{′}(M)_{Q}. H is a minimal triangulation of G^{′}(M) by Theorem 2.6, and H is a proper minimal triangulation of G(M) by Lemma 3.2. By Theorem 2.6, Q is precisely the set of minimal separators of H. Furthermore, because H is also a minimal triangulation of G(M), the same theorem states that Q is a subset of the minimal separators of G(M). Therefore S ∈ Δ_{G(M)}, so
Conversely, let
Now consider a minimal separator S^{′} of G(M). If an Fedge f = xy crosses S^{′}, there is a four cycle x → u → y → v → x in G(M) with monochromatic pair u,v, and further, u,v ∈ S^{′}. Hence S^{′} is illegal, and any legal minimal separator of G(M) is not crossed by any Fedge. From our previous argument, this implies
The second half of the proof of Theorem 3.3 proves the following.
Corollary 3.4
Suppose M is a set of taxa on 3state characters that allows a perfect phylogeny.
If
We now prove the main result of this section.
Theorem 3.5
Suppose M is a set of taxa on 3state characters. Then M has a perfect phylogeny if and only if any maximal pairwise parallel set of legal minimal separators Q of G(M) induces a proper minimal triangulation G(M)_{Q }of G(M).
Proof
First, suppose that M has a perfect phylogeny, and let Q be a maximal pairwise parallel set of legal minimal separators of G(M). We show that G(M)_{Q} is a proper triangulation of G(M). By Theorem 3.3, Q is a maximal set of minimal separators of G^{′}(M), and they are pairwise parallel because the connected components of each minimal separator in Q are the same in G(M) and G^{′}(M) (Corollary 3.4). Hence H = G^{′}(M)_{Q} is a minimal triangulation of G^{′}(M) with minimal separator set Q (Theorem 2.6), and by Lemma 3.2, H is a proper minimal triangulation of G(M). Because Δ_{H} = Q, Theorem 2.6 implies Q is a maximal pairwise parallel set of minimal separators of G(M) and therefore H = G(M)_{Q}. Thus H = G(M)_{Q} is a proper minimal triangulation of G(M).
For the converse, pick any maximal pairwise parallel set of legal minimal separators Q of G(M) that induces a proper minimal triangulation G(M)_{Q} of G(M). Then M has a perfect phylogeny by Theorem 2.3. □
Proper triangulation algorithm
In this section, we build on techniques developed in [14] to generate the minimal separators of G^{′}(M) and their parallel relations in O(nm^{2}) time. This will allow us to use a greedy approach to pick a maximal pairwise parallel set of legal minimal separators. These minimal separators will then define a set of fill edges for a proper minimal triangulation, and a perfect phylogeny will be constructed in the form of a clique tree using Maximum Cardinality Search (MCS).
Lemma 4.1
[14]Let Q be a set of maximal pairwise parallel legal minimal separators of a partition intersection graph G(M). Then for each S ∈ Q, S < m.
Define
Algorithm: proper triangulation for 3state characters
1. Stop if there is a pair of characters whose partition intersection graph contains a cycle.
2. Compute
3. Stop if there is a monochromatic pair not separated by any legal minimal separator.
4. Compute the crossing relations for
5. Greedily construct a maximal pairwise parallel subset Q of
6. Add edges to G(M) to make each S ∈ Q a clique. Call this graph G_{Q}.
7. Use MCS to construct a clique tree for G_{Q}.
We proceed with a series of lemmas that will be used in Theorem 4.11 to show that each step is O(nm^{2}). The following simple observation is important for many of our time bounds.
Observation 4.2
Let M be a set of taxa whose characters have at most three states. Then G(M) has O(m) vertices (one vertex per state of each character) and O(m^{2}) edges.
Step 2 of the algorithm uses concepts from [2,8,9,14], which we detail here for completeness. A proper cluster is a bipartition of the taxa (i.e. the taxa are split into two disjoint nonempty sets) such that each character shares at most one state across the bipartition, and at least one character is not shared across this bipartition [8,9]. There are O(m) proper clusters when r is fixed. In particular, suppose χ is not shared across the bipartition of a proper cluster. Then the proper cluster also creates a bipartition of χ’s character states (see Figure 3). Hence, we can compute the set of proper clusters by exhaustively checking, for each character, if some bipartition of its states split the taxa into a proper cluster (there are O(2^{r}) ways to split each character).
Figure 3. Minimal separators and proper clusters. In this figure, the bipartition abcdef gives rise to the proper clusters ab and cdef. The shared character states
Proper clusters generate the minimal separators in
Lemma 4.3
[14]For any set of taxa M on 3state characters,
Proof
Our previous discussion proves that
In the proof of Lemma 4.3, we showed how to compute the Spartition of the taxa for
Lemma 4.4
[14]Let M be a set of 3state taxa and
Before discussing the running time required to compute crossing relations, we first state two structural lemmas on minimal separators; the second follows from a lemma in [19].
Lemma 4.5
[18]Let S and S^{′ }be nonparallel minimal separators. Then for each full component C of G − S^{′}, S has a vertex in C.
Lemma 4.6
(Lemma 3.10,
[19]). Let S and S^{′ }be two minimal separators of a graph G. Then S and S^{′ }are parallel if and only if there exists a full component C_{S }of G − S and a connected component
Because of the slight change from Lemma 3.10 in [19] and for completeness, we give a proof of Lemma 4.6.
Proof
Suppose S and S^{′} are parallel. Since S is a minimal separator, there are at least two full components in G − S and because S^{′} is parallel to S, there is a full component C_{1} of G − S that does not intersect S^{′}. C_{1} is connected in G − S^{′}, so there is a connected component C of G − S^{′} containing C_{1}.
Now, suppose there are C_{S} and
Lemma 4.7
There is an O(nm^{2}) algorithm to calculate the crossing relations of
Proof
Let
Conversely, assume that S and S^{′}are not parallel. Let C_{1} be a full component of G(M) − S and C_{2} be a full component of G(M) − S^{′}. By Lemma 4.5, there is a vertex
It suffices to check for each full component C of G(M) − S and connected component C^{′} of G(M) − S^{′} if t(C) ⊆ t(C^{′}). There are O(m^{2}) pairs of legal minimal separators, and this check takes O(n) time (O(nm^{2}) time overall) when the Spartition has been calculated for each
It is critical for our time bound that any proper minimal triangulation of G(M) have O(n) minimal separators because this impacts the computation of edges contained in the proper minimal triangulation. A result bounding the number of minimal separators in an earlier version of this paper (Lemma 7 in [20]) was incorrect, as demonstrated in Figure 4. We present a corrected bound for the number of minimal separators in the following Lemma.
Figure 4. Bounding minimal separators of proper minimal triangulations. An example of the bound in Lemma 4.8. H has five minimal separators, obtained from every pair of vertices from the set
Lemma 4.8.
Suppose that H is a proper minimal triangulation of G(M). Then H has at most 2n − 3 minimal separators.
Proof
Let
First, consider any leaf a of
To complete the proof, we show a similar result for internal nodes of
Therefore any node of
Remark
The proof of Lemma 4.8 requires minimality of the triangulation, but it does not require that M lacks missing values or that the number of states for each character is bounded.
This Lemma along with the fact that each
Lemma 4.9
Suppose that H is a proper minimal triangulation of G(M) obtained by saturating a maximal pairwise parallel legal set of minimal separators Q. Then H has O(n) minimal separators, O(m) vertices, and O(m^{2}) edges. Furthermore, H can be calculated in O(nm^{2}) time.
Proof
The minimal separator bound follows from Lemma 4.8, and the vertex and edges bounds follow from Observation 4.2 and the fact that H and G(M) have the same vertex set. In order to calculate H, we must calculate the fill edge set E(H) − E(G(M)). Recall that, by Theorem 2.6, the fill edges of H are obtained by saturating each minimal separator in Q. Each S ∈ Q has fewer than m vertices by Lemma 4.1 and Q = O(n) by Lemma 4.8. It is straightforward to check for each S ∈ Q and each pair u, v ∈ S if uv defines a fill edge with an amortized running time of O(nm^{2}). □
In [23], Tarjan and Yannakakis developed Maximum Cardinality Search (MCS), which recognizes chordal graphs in linear time. Blair and Peyton [11] showed how MCS can be used to construct a clique tree for a chordal graph while retaining the linear time bound.
Lemma 4.10
[11]Let G be a chordal graph. Then Maximum Cardinality Search (MCS) can be implemented
to produce a clique tree
Combining these lemmas show that our minimal separator algorithm for constructing perfect phylogenies for r = 3 is competitive with the algorithm of Dress and Steel [1], giving our main result.
Theorem 4.11
The algorithm Proper Triangulation for 3State Characters runs in O(nm^{2}) time.
Proof
The first step can be implemented in O(m^{2}) time as follows. Each pair of characters has a partition intersection graph with
at most six vertices, and it is straightforward to check for cycles. There are O(m^{2}) such pairs of characters. Lemma 4.3 states that step two takes O(nm^{2}) time. For the third and fourth step, we first compute the connected components of
G(M) − S for each
Large partition intersection graphs
Ideally, one would like to find an O(n^{2}m) or O(nm) algorithm for 3state perfect phylogeny (i.e., m is squarefree). In this section, we will construct a family of 3state matrices M that have a perfect phylogeny and Λ(m^{2}) edges in G(M). This discourages attempts to improve our time bound using an approach that explicitly computes the partition intersection graph.
Any 3state character compatible with a perfect phylogeny can be obtained from choosing
any two edges of the phylogeny, removing them, and using the three resulting subtrees
to define each taxon’s state for that character. 2state characters are obtained in
a similar manner, removing a single edge instead of two edges. Therefore, if a 3state
matrix M with distinct columns (up to relabeling) has a perfect phylogeny,
Consider the tree T with taxa t_{1}, t_{2}, …, t_{n} as depicted in Figure
5, and suppose i < j. We construct the character χ^{(i,j)} using the partition {t_{1}, t_{2}, …, t_{i}}, {t_{i + 1}, t_{i + 2}, …, t_{j}}, {t_{j + 1}, t_{j + 2}, …, t_{n}} as in Figure
5. Each set in the partition is called the cell 0, cell 1, and cell 2 of χ^{(i,j)}, respectively. That is, χ^{(i,j)}(t_{1}) = 0, χ^{(i,j)}(t_{i + 1}) = 1, χ^{(i,j)}(t_{j + 1}) = 2, and so on. Let M^{∗} be the matrix whose columns are the characters χ^{(i,j)} for 1 ≤ i < j < n. T is clearly a perfect phylogeny for M^{∗}, and
Figure 5. Characters of a perfect phylogeny with a large partition intersection graph. A 3state character created using “intervals” of taxa from a fully resolved tree
T. The 0^{th} piece of χ^{(i,j)} is the interval
Observation 5.1
Let χ^{(i,j) }and
For example, the cell 1 of χ^{(3,5)} and cell 1 χ^{(4,6)} share taxon t_{5} so
Conclusions
We have demonstrated how to use the minimal separator approach introduced in [14] to construct a perfect phylogeny for 3state data in O(nm^{2}) time. We also constructed a 3state matrix M with a perfect phylogeny that has Λ(m^{2}) edges. Thus, any explicit analysis of the edges of G(M) or of a proper triangulation of G(M) is inadequate to speed up our approach. Faster proper triangulation algorithms should use M for computation instead of G(M) aided with theoretical results about G(M). Constructing tree representations in order to minimally triangulate a graph without explicitly computing the fill edges was studied in [19] in order to achieve a faster time bound, and it would be interesting to see if these ideas can be extended to find a faster construction algorithm for 3state perfect phylogeny.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The authors wrote the paper. All authors read and approved the final manuscript.
Acknowledgements
This research was partially supported by NSF grants CCF0515378, IIS0803564, and CCF1017580. We thank an anonymous referee for careful reading and valuable comments.
References

Kannan S, Warnow T: Inferring evolutionary history from DNA sequences.
SIAM J Comput 1994, 23:713737. Publisher Full Text

Lam F, Gusfield D, Sridhar S: Generalizing the four gamete condition and splits equivalence theorem: perfect phylogeny on three state characters.
SIAM J Discrete Math 2011, 25:11441175. Publisher Full Text

Bodlaender H, Fellows M, Warnow T: Two strikes against perfect phylogeny.
Automata, Languages and Programming, 19th International Colloquium, ICALP 1992, Vienna, Austria, July 1317, 1992, Proceedings. LNCS 1992, 623:273283.

Steel MA: The complexity of reconstructing trees from qualitative characters and subtrees.
J Classification 1992, 9:91116. Publisher Full Text

Gusfield D: Efficient algorithms for inferring evolutionary trees.
Networks 1991, 21:1928. Publisher Full Text

Semple C, Steel M: Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications. New York: Oxford University Press; 2003.

Agarwala R, FernandezBaca D: A polynomialtime algorithm for the perfect phylogeny problem when the number of character states is fixed.
SIAM J Comput 1994, 23:12161224. Publisher Full Text

Kannan S, Warnow T: A fast algorithm for the computation and enumeration of perfect phylogenies.
SIAM J Comput 1997, 26:17491763. Publisher Full Text

Buneman P: A characterisation of rigid circuit graphs.
Discrete Math 1974, 9:205212. Publisher Full Text

Blair J, Peyton B: An introduction to chordal graphs and clique trees.
IMA Volumes in Mathematics and its Applications 1994, 56:127.

Dirac GA: On rigid circuit graphs.
Abh Math Sem Univ Hamburg 1961, 25:7176. Publisher Full Text

Gavril F: The intersection graphs of subtrees in trees are exactly the chordal graphs.
J Comb Theory, Ser B 1974, 16:4756. Publisher Full Text

Gusfield D: The multistate perfect phylogeny problem with missing and removable data.
J Comput Biol 2010, 17:383399. PubMed Abstract  Publisher Full Text

Golumbic M: Algorithmic graph theory and perfect graphs. Volume 57 of Annals of Discrete Mathematics. Essex, UK: Elsevier Science Publishers; 2004.

Kloks T, Kratsch D, Spinrad J: On treewidth and minimum fillin of asteroidal triplefree graphs.
Theor Comput Sci 1997, 175(2):309335. Publisher Full Text

Parra A, Scheffler P: How to use the minimal separators of a graph for its chordal triangulation.
Automata, Languages and Programming, 22nd International Colloquium, ICALP 1995, Szeged, Hungary, July 1014, 1995, Proceedings. LNCS 1995, 944:123134.

Parra A, Scheffler P: Characterizations and algorithmic applications of chordal graph embeddings.
Discret Appl Math 1997, 79:171188. Publisher Full Text

Berry A, Bordat JP, Heggernes P, Simonet G, Villanger Y: A widerange algorithm for minimal triangulation from an arbitrary ordering.
J Algorithms 2006, 58:3366. Publisher Full Text

Gysel R, Lam F, Gusfield D: Constructing perfect phylogenies and proper triangulations for threestate characters.
Algorithms in Bioinformatics, 11th International Workshop, WABI 2011, Saarbrücken, Germany, September 57, 2011, Proceedings. LNCS 2011, 6833:104115.

Ho C, Lee R: Counting clique trees and computing perfect elimination schemes in parallel.
Inf Process Lett 1989, 31:6168. Publisher Full Text

Blair J, Peyton B: On finding minimumdiameter clique trees.

Tarjan R, Yannakakis M: Simple lineartime algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs.
SIAM J Comput 1984, 13:566579. Publisher Full Text