
Incompatible quartets, triplets, and characters

Abstract

We study a long-standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters: There exists a function f(r) such that, for any set C of r-state characters, C is compatible if and only if every subset of f(r) characters of C is compatible. We show that for every r≥2, there exists an incompatible set C of $\Omega(r^2)$ r-state characters such that every proper subset of C is compatible. This improves the previous lower bound of f(r)≥r given by Meacham (1983), and f(4)≥5 given by Habib and To (2011). For the case when r=3, Lam, Gusfield and Sridhar (2011) recently showed that f(3)=3. We give an independent proof of this result and completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern.

Our lower bound on f(r) is proven via a result on quartet compatibility that may be of independent interest: For every n≥4, there exists an incompatible set Q of $\Omega(n^2)$ quartets over n labels such that every proper subset of Q is compatible. We show that such a set of quartets can have size at most 3 when n=5, and at most $O(n^3)$ for arbitrary n. We contrast our results on quartets with the case of rooted triplets: For every n≥3, if R is an incompatible set of more than n−1 triplets over n labels, then some proper subset of R is incompatible. We show this bound is tight by exhibiting, for every n≥3, a set R of n−1 triplets over n taxa such that R is incompatible, but every proper subset of R is compatible.

Background

The multi-state character compatibility (or perfect phylogeny) problem is a basic question in computational phylogenetics [1]. Given a set C of characters, we are asked whether there exists a phylogenetic tree that displays every character in C; if so, C is said to be compatible, and incompatible otherwise. The problem is known to be NP-complete [2, 3], but certain special cases are known to be polynomially solvable [4–10]. See [11] for more on the perfect phylogeny problem.

In this paper we study a long standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters.

Conjecture 1

There exists a function f(r) such that, for any set C of r-state characters, C is compatible if and only if every subset of f(r) characters of C is compatible.

If Conjecture 1 is true, it would follow that we can determine whether any set C of r-state characters is compatible by testing the compatibility of each subset of f(r) characters of C and, in case of incompatibility, output a subset of at most f(r) characters of C that is incompatible. This would allow us to reduce the character removal problem (i.e., finding a subset of characters to remove from C so that the remaining characters are compatible) to the f(r)-hitting set problem, which is fixed-parameter tractable [12].

A classic result on binary character compatibility shows that f(2)=2; see [1, 6, 13–15]. In 1975, Fitch [16, 17] gave an example of a set C of three 3-state characters such that C is incompatible, but every pair of characters in C is compatible, showing that f(3)≥3. In 1983, Meacham [15] generalized this example to r-state characters for every r≥3, demonstrating a lower bound of f(r)≥r for all r; see also [9]. For the case of r=3, Lam, Gusfield, and Sridhar [9] recently established that f(3)=3.

While the previous results could lead one to conjecture that f(r)=r for all r, Habib and To [18] recently disproved this possibility by exhibiting a set C of five 4-state characters such that C is incompatible, but every proper subset of C is compatible, showing that f(4)≥5. They conjectured that f(r)≥r+1 for every r≥4.

The main result of this paper proves the conjecture stated in [18] by giving a quadratic lower bound on f(r). Formally, we show that for every r≥2, there exists a set C of r-state characters such that all of the following conditions hold.

  1. C is incompatible.

  2. Every proper subset of C is compatible.

  3. $|C| = \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$.

Therefore, $f(r) \ge \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$ for every r≥2.

Our proof relies on a new result on quartet compatibility that we believe is of independent interest. We show that for every n≥4, there exists a set Q of quartets over a set of n labels such that all of the following conditions hold.

  1. Q is incompatible.

  2. Every proper subset of Q is compatible.

  3. $|Q| = \lfloor (n-2)/2 \rfloor \cdot \lceil (n-2)/2 \rceil + 1$.

This improves the previous lower bound of n−2, given in [3], on the maximum cardinality of such an incompatible set of quartets. We show that such a set of quartets can have size at most 3 when n=5, and at most $O(n^3)$ for arbitrary n. We note here that the construction given in [18] showing that f(4)≥5 can be viewed as a special case of the construction given here when n=6.

We study the compatibility of three-state characters further. The work of [9] completely characterized the sets of pairwise compatible 3-state characters by the existence of one of four forbidden intersection patterns. An alternative characterization of this result was given in [10] and was partially derived using the results of [9]. In this paper, we give a proof that f(3)=3 that is independent of the results in [9], and we completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern.

We contrast our result on quartet compatibility with a result on the compatibility of rooted triplets: For every n≥3, if R is an incompatible set of triplets over n labels, and |R|>n−1, then some proper subset of R is incompatible. We show this bound is tight by exhibiting, for every n≥3, a set R of n−1 triplets over n labels such that R is incompatible, but every proper subset of R is compatible.

Preliminaries

Given a graph G, we represent the vertices and edges of G by V(G) and E(G), respectively. We use the abbreviated notation uv for an edge {u,v} ∈ E(G). For any e ∈ E(G), G − e represents the graph obtained from G by deleting edge e. For an integer i, we use [i] to represent the set {1, 2, …, i}.

Unrooted phylogenetic trees

An unrooted phylogenetic tree (or just tree) is a tree T whose leaves are in one-to-one correspondence with a label set L(T), and that has no vertex of degree two. See Figure 1(a) for an example. For a collection $\mathcal{T}$ of trees, the label set of $\mathcal{T}$, denoted $L(\mathcal{T})$, is the union of the label sets of the trees in $\mathcal{T}$. A tree is binary if every internal (non-leaf) vertex has degree three. A quartet is a binary tree with exactly four leaves. A quartet with label set {a,b,c,d} is denoted ab|cd if the path between the leaves labeled a and b does not intersect the path between the leaves labeled c and d.

Figure 1

A phylogenetic tree and a restricted subtree. (a) shows a tree T witnessing that the quartets $q_1 = ab|ce$, $q_2 = cd|bf$, and $q_3 = ad|ef$ are compatible; T is also a witness that the characters $\chi_{q_1} = ab|ce|d|f$, $\chi_{q_2} = cd|bf|a|e$, and $\chi_{q_3} = ad|ef|b|c$ are compatible; (b) shows T|{a,b,c,d,e}.

For a tree T and a label set $L \subseteq L(T)$, the restriction of T to L, denoted by T|L, is the tree obtained from the minimal subtree of T connecting all the leaves with labels in L by suppressing vertices of degree two. See Figure 1(b) for an example. A tree T displays another tree T′ if T′ can be obtained from T|L(T′) by contracting edges. A tree T displays a collection of trees $\mathcal{T}$ if T displays every tree in $\mathcal{T}$. If such a tree T exists, then we say that $\mathcal{T}$ is compatible; otherwise, we say that $\mathcal{T}$ is incompatible. See Figure 1(a) for an example. Determining whether a collection of unrooted trees is compatible is NP-complete [3].
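To make the definition of ab|cd concrete, the following Python sketch tests whether a tree displays a quartet by checking that the a-b path and the c-d path are vertex-disjoint. It assumes, purely for illustration, that a tree is stored as an adjacency dictionary whose leaf nodes are named by their labels; the helper names (`make_tree`, `tree_path`, `displays_quartet`) are hypothetical. The example tree joins three cherries {a,b}, {c,d}, {e,f} at a central vertex; it is one tree displaying $q_1$, $q_2$, and $q_3$, and is not claimed to be the tree drawn in Figure 1(a).

```python
from collections import deque

def make_tree(edges):
    """Build an adjacency dict for an undirected tree from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def tree_path(adj, u, v):
    """Vertices on the unique u-v path, recovered from BFS parent pointers."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    nodes, x = set(), v
    while x is not None:
        nodes.add(x)
        x = parent[x]
    return nodes

def displays_quartet(adj, quartet):
    """quartet = ((a, b), (c, d)) stands for ab|cd; displayed iff the paths are disjoint."""
    (a, b), (c, d) = quartet
    return not (tree_path(adj, a, b) & tree_path(adj, c, d))

# Three cherries {a,b}, {c,d}, {e,f} attached to a central vertex z.
T = make_tree([("z", "u1"), ("z", "u2"), ("z", "u3"),
               ("u1", "a"), ("u1", "b"), ("u2", "c"), ("u2", "d"),
               ("u3", "e"), ("u3", "f")])
print(displays_quartet(T, (("a", "b"), ("c", "e"))))  # True
```

For a non-binary tree the same test still applies: disjoint paths mean that T|{a,b,c,d} is exactly the resolved quartet ab|cd.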

Multi-state characters

There is also a notion of compatibility for sets of partitions of a label set L. A character χ on L is a partition of L; the parts of χ are called states. If χ has at most r parts, then χ is an r-state character. Given a tree T with L = L(T) and a state s of χ, we denote by $T_s(\chi)$ the minimal subtree of T connecting all leaves with labels having state s for χ. We say that χ is convex on T, or equivalently that T displays χ, if the subtrees $T_i(\chi)$ and $T_j(\chi)$ are vertex-disjoint for all states i and j of χ with $i \ne j$. A collection C of characters is compatible if there exists a tree T on which every character in C is convex. If no such tree exists, then we say that C is incompatible. See Figure 1(a) for an example. The perfect phylogeny problem (or character compatibility problem) is to determine whether a given set of characters is compatible.
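Convexity can be checked directly from this definition. The sketch below, under the same illustrative assumptions as before (adjacency-dict trees with leaves named by their labels, a character given as a dict from label to state; all helper names hypothetical), builds each $T_s(\chi)$ as the union of paths from one leaf of state s to the others, and then tests pairwise vertex-disjointness.

```python
from collections import deque
from itertools import combinations

def make_tree(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def tree_path(adj, u, v):
    """Vertices on the unique u-v path (BFS parent pointers)."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    nodes, x = set(), v
    while x is not None:
        nodes.add(x)
        x = parent[x]
    return nodes

def spanning_subtree(adj, leaves):
    """Vertex set of the minimal subtree connecting the given leaves, i.e. T_s(chi)."""
    leaves = list(leaves)
    nodes = {leaves[0]}
    for x in leaves[1:]:
        nodes |= tree_path(adj, leaves[0], x)
    return nodes

def is_convex(adj, character):
    """character maps each leaf label to its state; convex iff the T_s are pairwise disjoint."""
    by_state = {}
    for leaf, state in character.items():
        by_state.setdefault(state, []).append(leaf)
    subtrees = [spanning_subtree(adj, leaves) for leaves in by_state.values()]
    return all(not (s & t) for s, t in combinations(subtrees, 2))

T = make_tree([("z", "u1"), ("z", "u2"), ("z", "u3"),
               ("u1", "a"), ("u1", "b"), ("u2", "c"), ("u2", "d"),
               ("u3", "e"), ("u3", "f")])
print(is_convex(T, {"a": 0, "b": 0, "c": 1, "e": 1, "d": 2, "f": 3}))  # True
```

The character tested is $\chi_{q_1} = ab|ce|d|f$; singleton states give one-vertex subtrees, so they can only clash with a subtree that passes through that leaf, which never happens.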

For a collection C of characters, the intersection graph of C, which we denote by G(C), is the undirected graph G=(V,E) that has a vertex $c_i$ for each character c ∈ C and each state i of c, and an edge $c_i d_j$ precisely when there is a taxon having state i for character c and state j for character d. Note that G(C) cannot have an edge between vertices associated with different states of the same character.
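A direct construction of G(C) follows the definition taxon by taxon. In this sketch (an assumption for illustration, not the authors' code), each character is a dict from taxon to state over the same taxon set, and the vertex $c_i$ is represented by the pair (index of c, state i), so states of different characters never collide.

```python
from itertools import combinations

def intersection_graph(characters):
    """G(C) for characters given as dicts taxon -> state (all on the same taxa).
    The vertex (k, s) stands for state s of the k-th character."""
    vertices = {(k, s) for k, ch in enumerate(characters) for s in ch.values()}
    edges = set()
    for t in characters[0]:
        # Each taxon joins its states in every pair of distinct characters.
        for (k, c), (m, d) in combinations(enumerate(characters), 2):
            edges.add(frozenset({(k, c[t]), (m, d[t])}))
    return vertices, edges

alpha = {"t1": 1, "t2": 1, "t3": 2, "t4": 2}
beta = {"t1": 1, "t2": 2, "t3": 1, "t4": 2}
V, E = intersection_graph([alpha, beta])
print(len(V), len(E))  # 4 4
```

Here the four edges form a cycle through both states of each character, which is exactly the situation Corollary 1 below rules out for a compatible pair.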

A graph G is chordal if there are no induced chordless cycles of length four or greater in G. In [19], Buneman established a fundamental connection between the perfect phylogeny problem and chordal graphs, which we now describe. For a given set C of characters, suppose we assign a unique color to each character c ∈ C and color each vertex of G(C) corresponding to a state of c with the color assigned to c. A proper triangulation of G(C) is a chordal supergraph of G(C) in which every edge has endpoints with different colors.

Theorem 1.

A set C of characters is compatible if and only if G(C) has a proper triangulation.

Since there is no proper triangulation for a cycle in G(C) involving only vertices from two characters, we have the following corollary.

Corollary 1.

Let C be a collection of two characters. Then C is compatible if and only if G(C) is acyclic.
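Corollary 1 gives a linear-time style test for a pair of characters: build the bipartite intersection graph and look for a cycle. The sketch below uses union-find for cycle detection; the function name `pair_compatible` and the dict encoding of characters (same taxon set for both) are illustrative assumptions.

```python
def pair_compatible(alpha, beta):
    """Corollary 1: {alpha, beta} is compatible iff G({alpha, beta}) is acyclic.
    Characters are dicts taxon -> state over the same taxa."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each distinct (alpha-state, beta-state) pair realised by some taxon is one edge.
    edges = {(("alpha", alpha[t]), ("beta", beta[t])) for t in alpha}
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v are already connected, so this edge closes a cycle
        parent[ru] = rv
    return True

print(pair_compatible({"t1": 1, "t2": 1, "t3": 2, "t4": 2},
                      {"t1": 1, "t2": 2, "t3": 1, "t4": 2}))  # False
```

Using a set for the edges matters: several taxa realising the same pair of states contribute a single edge, not a spurious 2-cycle.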

Quartet rules

We now introduce quartet (closure) rules, which were originally used in the contexts of psychology [20] and linguistics [21]. The idea is that, for a collection Q of quartets, every tree that displays Q may necessarily also display another quartet $q \notin Q$; if so, we write $Q \vdash q$.

Example 1.

Let Q = {ab|ce, ae|cd}. Then the tree of Figure 1(b) displays Q, and furthermore, it is easy to see that it is the only tree that displays Q. Hence, $Q \vdash ab|de$, $Q \vdash ab|cd$, and $Q \vdash be|cd$.

We use the following quartet rules in this paper:

$$\{ab|cd,\ ab|ce\} \vdash ab|de \qquad \text{(R1)}$$

$$\{ab|cd,\ ac|de\} \vdash ab|ce \qquad \text{(R2)}$$

For the purposes of this paper, we define the closure of an arbitrary collection Q of quartets, denoted $\overline{Q}$, as the minimal set of quartets that contains Q and has the property that if $\{q_1, q_2\} \vdash q_3$ by (R1) or (R2) for some $q_1, q_2 \in \overline{Q}$, then $q_3 \in \overline{Q}$. Clearly, any tree that displays Q must also display $\overline{Q}$. We will use the following lemma, which follows by repeated application of (R1) and is formally proven in [22].
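The closure $\overline{Q}$ can be computed as a fixed point: keep applying (R1) and (R2) to pairs of quartets until nothing new appears. The sketch below is one straightforward (not optimized) way to do this, assuming quartets are stored as a frozenset of their two sides so that ab|cd, ba|dc, and cd|ab compare equal; the helper names are hypothetical. It reproduces Example 1.

```python
from itertools import product

def quartet(a, b, c, d):
    """The quartet ab|cd, stored as a frozenset of its two sides."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})

def orientations(q):
    """All 8 ways of reading q as (a, b, c, d) with q = ab|cd."""
    p1, p2 = (tuple(side) for side in q)
    for x, y in ((p1, p2), (p2, p1)):
        for a, b in (x, x[::-1]):
            for c, d in (y, y[::-1]):
                yield a, b, c, d

def closure(Q):
    """Smallest superset of Q closed under (R1) and (R2)."""
    closed = set(Q)
    changed = True
    while changed:
        changed = False
        for q1, q2 in product(list(closed), repeat=2):
            for a, b, c, d in orientations(q1):
                for w, x, y, z in orientations(q2):
                    if z in (a, b, c, d):
                        continue  # both rules need a fifth label
                    new = None
                    if (w, x, y) == (a, b, c):
                        new = quartet(a, b, d, z)  # (R1): {ab|cd, ab|cz} |- ab|dz
                    elif (w, x, y) == (a, c, d):
                        new = quartet(a, b, c, z)  # (R2): {ab|cd, ac|dz} |- ab|cz
                    if new is not None and new not in closed:
                        closed.add(new)
                        changed = True
    return closed

Q = {quartet("a", "b", "c", "e"), quartet("a", "e", "c", "d")}  # Example 1
print(len(closure(Q)))  # 5
```

Because both rules are sound, every derived quartet is displayed by the unique tree of Example 1, and here the closure reaches all five quartets that tree displays.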

Lemma 1.

Let Q be an arbitrary set of quartets with $\{x, y, z_1, \ldots, z_k\} \subseteq L(Q)$. If

$$\bigcup_{i=1}^{k-1} \{xy | z_i z_{i+1}\} \subseteq \overline{Q},$$

then $xy | z_1 z_k \in \overline{Q}$.

We refer the reader to [1, 23] for more on quartet rules.

Incompatible quartets

For every s,t≥2, we fix a set of labels $L_{s,t} = \{a_1, a_2, \ldots, a_s, b_1, b_2, \ldots, b_t\}$ and define the set

$$Q_{s,t} = \{a_1 b_1 | a_s b_t\} \cup \bigcup_{i=1}^{s-1} \bigcup_{j=1}^{t-1} \{a_i a_{i+1} | b_j b_{j+1}\}$$

of quartets with $L(Q_{s,t}) = L_{s,t}$. We denote the quartet $a_1 b_1 | a_s b_t$ by $q_0$, and a quartet of the form $a_i a_{i+1} | b_j b_{j+1}$ by $q_{i,j}$.

Observation 1.

For all s,t ≥ 2, $|Q_{s,t}| = (s-1)(t-1) + 1$.

Lemma 2.

For all s,t≥2, $Q_{s,t}$ is incompatible.

Proof.

For each $i \in [s-1]$,

$$\bigcup_{j=1}^{t-1} \{a_i a_{i+1} | b_j b_{j+1}\} \subseteq Q_{s,t} \subseteq \overline{Q_{s,t}}.$$

Then, by Lemma 1, $a_i a_{i+1} | b_1 b_t \in \overline{Q_{s,t}}$ for each $i \in [s-1]$. So

$$\bigcup_{i=1}^{s-1} \{b_1 b_t | a_i a_{i+1}\} \subseteq \overline{Q_{s,t}}.$$

Then, again by Lemma 1, $b_1 b_t | a_1 a_s \in \overline{Q_{s,t}}$. But then $\{a_1 b_1 | a_s b_t,\ b_1 b_t | a_1 a_s\} \subseteq \overline{Q_{s,t}}$, so any tree that displays $Q_{s,t}$ must display both $a_1 b_1 | a_s b_t$ and $b_1 b_t | a_1 a_s$. However, no such tree exists, since these are two different quartets on the same four labels. Hence, $Q_{s,t}$ is incompatible. □

Lemma 3.

For all s,t≥2, every proper subset of $Q_{s,t}$ is compatible.

Proof.

Since every subset of a compatible set of quartets is compatible, it suffices to show that for every $q \in Q_{s,t}$, $Q_{s,t} \setminus \{q\}$ is compatible. Let $q \in Q_{s,t}$. Either $q = q_0$ or $q = q_{x,y}$ for some $1 \le x < s$ and $1 \le y < t$. In either case, we exhibit a tree witnessing that $Q_{s,t} \setminus \{q\}$ is compatible.

  • Case 1. Suppose $q = q_0$. We build the tree T as follows. There is a node for each label in $L_{s,t}$ and two additional nodes $a^*$ and $b^*$, joined by the edge $a^* b^*$. There is an edge $a_x a^*$ for every $a_x \in L_{s,t}$, and an edge $b_x b^*$ for every $b_x \in L_{s,t}$. There are no other nodes or edges in T. See Figure 2(a) for an illustration. Now consider any quartet $q' \in Q_{s,t} \setminus \{q_0\}$. Then $q' = a_i a_{i+1} | b_j b_{j+1}$ for some $1 \le i < s$ and $1 \le j < t$, and the minimal subgraph of T connecting the leaves with labels in $\{a_i, a_{i+1}, b_j, b_{j+1}\}$ is the quartet $q'$. Hence T displays $q'$.

Figure 2

Illustrating the proof of Lemma 3. (a) Case 1: a tree that displays $Q_{s,t} \setminus \{q_0\}$. (b) Case 2: a tree that displays $Q_{s,t} \setminus \{q_{x,y}\}$.

  • Case 2. Suppose $q = q_{x,y}$ for some $1 \le x < s$ and $1 \le y < t$. We build the tree T as follows. There is a node for each label in $L_{s,t}$ and six additional nodes $a_\ell$, $b_\ell$, $\ell$, $h$, $a_h$, and $b_h$, with edges $a_\ell \ell$, $b_\ell \ell$, $\ell h$, $h a_h$, and $h b_h$. For every $a_i \in L_{s,t}$, there is an edge $a_i a_\ell$ if $i \le x$, and an edge $a_i a_h$ if $i > x$. For every $b_j \in L_{s,t}$, there is an edge $b_j b_\ell$ if $j \le y$, and an edge $b_j b_h$ if $j > y$. There are no other nodes or edges in T. See Figure 2(b). Now consider any quartet $q' \in Q_{s,t} \setminus \{q_{x,y}\}$. Either $q' = q_0$ or $q' = q_{i,j}$ where $i \ne x$ or $j \ne y$. If $q' = q_0$, then the minimal subgraph of T connecting the leaves with labels in $\{a_1, b_1, a_s, b_t\}$ is the subtree of T induced by the nodes in $\{a_1, a_\ell, \ell, b_\ell, b_1, a_s, a_h, h, b_h, b_t\}$. Suppressing all degree-two vertices results in $q_0$, so T displays $q'$. So assume that $q' = a_i a_{i+1} | b_j b_{j+1}$ where $i \ne x$ or $j \ne y$. We define the following subset of the nodes in T:

    $$V = \begin{cases}
    \{a_i, a_{i+1}, a_\ell, \ell, b_\ell, b_j, b_{j+1}\} & \text{if } i < x \text{ and } j < y,\\
    \{a_i, a_{i+1}, a_\ell, \ell, b_y, b_\ell, h, b_h, b_{y+1}\} & \text{if } i < x \text{ and } j = y,\\
    \{a_i, a_{i+1}, a_\ell, \ell, h, b_h, b_j, b_{j+1}\} & \text{if } i < x \text{ and } j > y,\\
    \{a_x, a_\ell, \ell, h, a_h, a_{x+1}, b_\ell, b_j, b_{j+1}\} & \text{if } i = x \text{ and } j < y,\\
    \{a_x, a_\ell, \ell, h, a_h, a_{x+1}, b_h, b_j, b_{j+1}\} & \text{if } i = x \text{ and } j > y,\\
    \{a_i, a_{i+1}, a_h, h, \ell, b_\ell, b_j, b_{j+1}\} & \text{if } i > x \text{ and } j < y,\\
    \{a_i, a_{i+1}, a_h, h, b_y, b_\ell, \ell, b_h, b_{y+1}\} & \text{if } i > x \text{ and } j = y,\\
    \{a_i, a_{i+1}, a_h, h, b_h, b_j, b_{j+1}\} & \text{if } i > x \text{ and } j > y.
    \end{cases}$$

Now, the subgraph of T induced by the nodes in V is the minimal subgraph of T connecting the leaves with labels in $q'$, and suppressing all degree-two vertices gives $q'$. Hence, T displays $q'$. □
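The two witness trees of Lemma 3 can be checked exhaustively for small s and t: each tree should display every quartet of $Q_{s,t}$ except the removed one. The sketch below assumes the edge rule $b_j b_\ell$ for $j \le y$ in Case 2, and writes the extra nodes $a^*, b^*$ and $a_\ell, b_\ell, \ell, h, a_h, b_h$ as the hypothetical names "A*", "B*", "Al", "Bl", "L", "H", "Ah", "Bh".

```python
from collections import deque

def make_tree(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def tree_path(adj, u, v):
    """Vertices on the unique u-v path (BFS parent pointers)."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    nodes, x = set(), v
    while x is not None:
        nodes.add(x)
        x = parent[x]
    return nodes

def displays(adj, q):
    (a, b), (c, d) = q
    return not (tree_path(adj, a, b) & tree_path(adj, c, d))

def quartets(s, t):
    """Q_{s,t} keyed by 'q0' or by (i, j), using the 1-based indices of the text."""
    Q = {"q0": (("a1", "b1"), (f"a{s}", f"b{t}"))}
    for i in range(1, s):
        for j in range(1, t):
            Q[(i, j)] = ((f"a{i}", f"a{i + 1}"), (f"b{j}", f"b{j + 1}"))
    return Q

def case1_tree(s, t):
    """Every a_i hangs off A*, every b_j off B*, and A*B* is an edge."""
    edges = [("A*", "B*")]
    edges += [(f"a{i}", "A*") for i in range(1, s + 1)]
    edges += [(f"b{j}", "B*") for j in range(1, t + 1)]
    return make_tree(edges)

def case2_tree(s, t, x, y):
    """The Case 2 tree for q_{x,y}."""
    edges = [("Al", "L"), ("Bl", "L"), ("L", "H"), ("H", "Ah"), ("H", "Bh")]
    edges += [(f"a{i}", "Al" if i <= x else "Ah") for i in range(1, s + 1)]
    edges += [(f"b{j}", "Bl" if j <= y else "Bh") for j in range(1, t + 1)]
    return make_tree(edges)

def lemma3_holds(s, t):
    """Each witness tree displays exactly Q_{s,t} minus the removed quartet."""
    Q = quartets(s, t)
    for removed in Q:
        T = case1_tree(s, t) if removed == "q0" else case2_tree(s, t, *removed)
        for key, q in Q.items():
            if displays(T, q) != (key != removed):
                return False
    return True

print(lemma3_holds(3, 4))  # True
```

The check is stronger than the lemma needs: it also confirms that the removed quartet is not displayed, which is consistent with Lemma 2.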

With $s = \lfloor n/2 \rfloor$ and $t = \lceil n/2 \rceil$ (so that $s, t \ge 2$ and $s + t = n$), Observation 1 and Lemmas 2 and 3 imply the following theorem.

Theorem 2.

For every integer n≥4, there exists a set Q of quartets over n taxa such that all of the following conditions hold.

  1. Q is incompatible.

  2. Every proper subset of Q is compatible.

  3. $|Q| = \lfloor (n-2)/2 \rfloor \cdot \lceil (n-2)/2 \rceil + 1$.

Incompatible quartets on five taxa

When Q is a set of quartets over five taxa, we show that the set of quartets given by Theorem 2 is as large as possible. We hope that the technique used in the proof of the following theorem might be useful in proving tight bounds for n>5.

Theorem 3.

If Q is an incompatible set of quartets over five taxa such that every proper subset of Q is compatible, then |Q|≤3.

Proof.

Let Q be an incompatible set of quartets with L(Q)={a,b,c,d,e}, and assume w.l.o.g. that $q_0 = ab|cd \in Q$. We show that Q contains an incompatible subset of at most three quartets. If Q contains two different quartets on the same four taxa, then Q contains an incompatible pair of quartets. So, we may assume that each quartet is on a unique subset of four of the five taxa. Hence, every pair of quartets in Q shares exactly three taxa. We have the following two cases.

  • Case 1: Q contains at least one of the quartets ac|be, ac|de, ad|be, ad|ce, ae|bc, ae|bd, bc|de, or bd|ce. W.l.o.g. we may assume that Q contains $q_1 = ac|de$, as all other cases are symmetric. By (R2), $\{q_0, q_1\} \vdash ab|ce$. Then, by (R1), $\{q_0, q_1, ab|ce\} \vdash ab|de$. Then, again by (R1), $\{q_0, q_1, ab|ce, ab|de\} \vdash bc|de$. Now let $Q' = \{q_0, q_1, ab|ce, ab|de, bc|de\}$. Since Q′ contains exactly one quartet on each of the five four-taxon subsets of {a,b,c,d,e}, any quartet in Q must be either in Q′ or pairwise incompatible with a quartet in Q′. Since Q′ is compatible (it is displayed by the tree with cherries {a,b} and {d,e} and with c attached to the internal path) but, by assumption, Q is incompatible, Q must contain a quartet $q_2$ that is pairwise incompatible with some quartet in Q′. Every tree that displays $\{q_0, q_1\}$ displays all of Q′, so $\{q_0, q_1, q_2\}$ is an incompatible subset of Q.

  • Case 2: Q contains none of the quartets ac|be, ac|de, ad|be, ad|ce, ae|bc, ae|bd, bc|de, or bd|ce. Then every quartet in Q is either of the form ab|xy with $\{x,y\} \subseteq \{c,d,e\}$, or of the form cd|xe with $x \in \{a,b\}$. All of these quartets are displayed by the tree obtained from ab|cd by attaching e to its internal edge, so Q is compatible, contradicting our assumption that Q is incompatible.

In either case, the theorem holds. □
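For five taxa the statement of Theorem 3 is small enough to confirm by brute force, independently of the case analysis above. The sketch below (an illustrative check, not part of the original argument) enumerates the 15 binary trees on five taxa, each a caterpillar (p,q)|m|(r,s) displaying one quartet per four-taxon subset, encodes each set of quartets as a 15-bit mask, and lists the sizes of all incompatible sets whose proper subsets are compatible. It relies on the fact that a compatible set is always displayed by some binary tree on the same taxa.

```python
from itertools import combinations

TAXA = "abcde"

def quartet(a, b, c, d):
    """The quartet ab|cd as a frozenset of its two sides."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})

# All 15 quartets on five taxa: three topologies on each four-taxon subset.
QUARTETS = []
for w, x, y, z in combinations(TAXA, 4):
    QUARTETS += [quartet(w, x, y, z), quartet(w, y, x, z), quartet(w, z, x, y)]
INDEX = {q: k for k, q in enumerate(QUARTETS)}
N = len(QUARTETS)

def tree_masks():
    """Bitmask of displayed quartets for each of the 15 binary trees on five taxa,
    each of which is a caterpillar (p,q)|m|(r,s)."""
    masks = set()
    for m in TAXA:
        rest = [v for v in TAXA if v != m]
        p = rest[0]
        for q in rest[1:]:
            r, s = [v for v in rest[1:] if v != q]
            shown = [quartet(p, q, r, s), quartet(p, q, m, r), quartet(p, q, m, s),
                     quartet(r, s, m, p), quartet(r, s, m, q)]
            masks.add(sum(1 << INDEX[sq] for sq in shown))
    return masks

TREES = tree_masks()
# A set of quartets is compatible iff some binary tree displays all of them.
COMPATIBLE = [any(mask & ~tm == 0 for tm in TREES) for mask in range(1 << N)]

def minimal_incompatible_sizes():
    """Sizes of incompatible sets whose proper subsets are all compatible."""
    sizes = set()
    for mask in range(1 << N):
        if COMPATIBLE[mask]:
            continue
        if all(COMPATIBLE[mask & ~(1 << k)] for k in range(N) if mask >> k & 1):
            sizes.add(bin(mask).count("1"))
    return sizes

print(minimal_incompatible_sizes())  # {2, 3}
```

Size 2 comes from two different quartets on the same four taxa, and size 3 is attained, for example, by $Q_{2,3}$.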

Incompatible quartets on arbitrarily many taxa

We say a set Q of compatible quartets is redundant if $Q \setminus \{q\} \vdash q$ for some $q \in Q$; otherwise, we say that Q is irredundant. The following lemma establishes a connection between sets of irredundant quartets and minimal sets of incompatible quartets.

Lemma 4.

If Q is incompatible, but every proper subset of Q is compatible, then every proper subset of Q is irredundant.

Proof.

Suppose that Q is incompatible and every proper subset of Q is compatible. Furthermore, suppose that some proper subset Q′ of Q is redundant. Since every compatible superset of a redundant set of quartets is also redundant, we may assume w.l.o.g. that there is a unique quartet $q \in Q \setminus Q'$ (i.e., $|Q| = |Q'| + 1$). Since Q′ is redundant, there exists a $q' \in Q'$ such that $Q' \setminus \{q'\} \vdash q'$. But then $(Q' \setminus \{q'\}) \cup \{q\} = Q \setminus \{q'\}$ is incompatible, since any tree displaying it would also display q′ and hence all of Q. This contradicts the assumption that every proper subset of Q is compatible. □

It follows from Lemma 4 that any upper bound on the maximum cardinality of an irredundant set of quartets, plus one, is an upper bound on the maximum cardinality of a set of quartets satisfying the first two conditions of Theorem 2. The following theorem is proven in [22].

Theorem 4.

Let Q be a set of quartets over a set of n taxa. If Q is irredundant, then Q has cardinality at most $(n-3)(n-2)^2/3$.

Lemma 4 together with Theorem 4 gives the following upper bound on the maximum cardinality of a set Q of quartets over n>5 taxa that satisfies the first two conditions of Theorem 2.

Theorem 5.

Let Q be a set of incompatible quartets over a set of n taxa such that every proper subset of Q is compatible. Then $|Q| \le (n-3)(n-2)^2/3 + 1$.
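The gap between the quadratic lower bound of Theorem 2 and the cubic upper bound of Theorem 5 can be tabulated directly. This small sketch evaluates $|Q_{s,t}|$ with $s = \lfloor n/2 \rfloor$, $t = \lceil n/2 \rceil$, and the upper bound read as $(n-3)(n-2)^2/3 + 1$; the function names are illustrative.

```python
def lower_bound(n):
    """|Q_{s,t}| with s = floor(n/2), t = ceil(n/2) (Observation 1, Theorem 2)."""
    s, t = n // 2, (n + 1) // 2
    return (s - 1) * (t - 1) + 1

def upper_bound(n):
    """Theorem 5, read as (n-3)(n-2)^2/3 + 1."""
    return (n - 3) * (n - 2) ** 2 / 3 + 1

for n in range(5, 11):
    print(n, lower_bound(n), round(upper_bound(n), 2))
```

For n=5 the lower bound is 3, which Theorem 3 shows is already optimal, while the general upper bound gives 7.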

Incompatible characters

There is a natural correspondence between quartet compatibility and character compatibility that we now describe. Let Q be a set of quartets, n=|L(Q)|, and r=n−2. For each $q = ab|cd \in Q$, we define the r-state character corresponding to q, denoted $\chi_q$, as the character where a and b have state 0 for $\chi_q$; c and d have state 1 for $\chi_q$; and, for each $\ell \in L(Q) \setminus \{a,b,c,d\}$, there is a state s of $\chi_q$ such that ℓ is the only label with state s for $\chi_q$ (see Example 2). We define the set of r-state characters corresponding to Q by

$$C_Q = \bigcup_{q \in Q} \{\chi_q\}.$$
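The map $q \mapsto \chi_q$ is easy to implement. In this sketch a quartet is a pair of pairs and a character a dict from label to state; the numbering of the singleton states (2, 3, … in sorted label order) is an arbitrary illustrative choice, since only the partition matters. The quartets are those of Figure 1(a).

```python
def character_from_quartet(q, labels):
    """chi_q for q = ((a, b), (c, d)): a, b get state 0, c, d get state 1,
    and every other label gets its own singleton state 2, 3, ..."""
    (a, b), (c, d) = q
    chi = {a: 0, b: 0, c: 1, d: 1}
    next_state = 2
    for x in sorted(labels):
        if x not in chi:
            chi[x] = next_state
            next_state += 1
    return chi

labels = "abcdef"
Q = [(("a", "b"), ("c", "e")), (("c", "d"), ("b", "f")), (("a", "d"), ("e", "f"))]
C_Q = [character_from_quartet(q, labels) for q in Q]
print(C_Q[0])  # {'a': 0, 'b': 0, 'c': 1, 'e': 1, 'd': 2, 'f': 3}
```

Each $\chi_q$ has 2 + (n − 4) = n − 2 = r states, as required.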

Example 2.

Consider the quartets and characters given in Figure 1(a): $\chi_{q_1}$, $\chi_{q_2}$, and $\chi_{q_3}$ are the characters corresponding to $q_1$, $q_2$, and $q_3$, respectively.

The following lemma relating quartet compatibility to character compatibility is well known [24], and its proof is omitted here.

Lemma 5.

A set Q of quartets is compatible if and only if $C_Q$ is compatible.

The next theorem allows us to use our result on quartet compatibility to establish a lower bound on f(r).

Theorem 6.

Let Q be a set of incompatible quartets over n labels such that every proper subset of Q is compatible, and let r=n−2. Then, there exists a set C of |Q| r-state characters such that C is incompatible, but every proper subset of C is compatible.

Proof.

We claim that $C_Q$ is such a set of incompatible r-state characters. Since $\chi_{q_1} \ne \chi_{q_2}$ for any two distinct quartets $q_1, q_2 \in Q$, we have $|C_Q| = |Q|$. Since Q is incompatible, $C_Q$ is incompatible by Lemma 5. Let C′ be any proper subset of $C_Q$. Then there is a proper subset Q′ of Q such that $C' = C_{Q'}$. Since Q′ is compatible, C′ is compatible by Lemma 5. □

Theorem 2 together with Theorem 6 gives the main theorem of this paper.

Theorem 7.

For every integer r≥2, there exists a set C of r-state characters such that all of the following hold.

  1. C is incompatible.

  2. Every proper subset of C is compatible.

  3. $|C| = \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$.

Proof.

By Theorem 2 and Observation 1, there exists a set Q of $\lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$ quartets over r+2 labels that is incompatible but whose proper subsets are all compatible, namely $Q_{\lfloor (r+2)/2 \rfloor, \lceil (r+2)/2 \rceil}$. The theorem follows from Theorem 6. □

The quadratic lower bound on f(r) follows from Theorem 7.

Corollary 2.

$f(r) \ge \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$ for every r≥2.

Three-State Characters

In the remainder of this section we focus on the case when r=3, and thus, fix C to be an arbitrary set of 3-state characters over a set S of taxa. Lam, Gusfield, and Sridhar [9] recently established that f(3)=3, and they completely characterized the sets of pairwise compatible 3-state characters by the existence of one of four forbidden intersection patterns. We give an independent proof that f(3)=3. We then completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern. Our proof uses several structural results from the algorithm for the three-state perfect phylogeny problem given by Kannan and Warnow [7].

The Algorithm of Kannan and Warnow

The algorithm of [7] takes a divide-and-conquer approach to determining the compatibility of a set of three-state characters. An instance is reduced to subproblems by finding a partition $S_1, S_2$ of the taxon set S of C with both of the following properties:

  1. $2 \le |S_i| \le n-2$ for $i = 1, 2$.

  2. Whenever C is compatible on S, there is a perfect phylogeny P that contains an edge e whose removal breaks P into subtrees $P_1$ and $P_2$ with $L(P_i) = S_i$ for $i = 1, 2$.

A partition of S satisfying both of these properties is a legal partition, and the following theorem shows that finding such a partition for a given set of characters is the crux of the algorithm.

Theorem 8 ([7]).

Given a set C of three-state characters, we can, in O(nk) time, either find a legal partition of S or determine that the set of characters is incompatible.

Finding a legal partition

We now discuss how such a legal partition is found for a set C of three-state characters. Let T be a tree witnessing that C is compatible. The canonical labeling of T is the labeling where, for each internal node v of T and each character α ∈ C, if there are leaves x and y in different components of T−{v} such that α(x)=α(y), then α(v)=α(x); otherwise α(v)=∗, where ∗ denotes a dummy state for C. Note that such a labeling of T always exists and is unique. We will assume that every compatible tree for C is canonically labeled.
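The canonical labeling can be computed node by node from its definition: for an internal node v, collect the leaf states in each component of T−{v} and keep the state (if any) that occurs in at least two components. The sketch below assumes adjacency-dict trees whose leaves are exactly the keys of the character dict, and uses "*" for the dummy state; the function names are hypothetical.

```python
def make_tree(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def component_states(adj, start, removed, character):
    """States of the leaves in the component of T - {removed} containing start."""
    stack, seen, states = [start], {removed, start}, set()
    while stack:
        x = stack.pop()
        if x in character:
            states.add(character[x])
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return states

def canonical_labeling(adj, character, dummy="*"):
    """Label each internal node with the state seen in two components of T - {v}."""
    labels = dict(character)  # leaves keep their observed states
    for v in adj:
        if v in character:
            continue
        counts = {}
        for u in adj[v]:
            for s in component_states(adj, u, v, character):
                counts[s] = counts.get(s, 0) + 1
        shared = [s for s, k in counts.items() if k >= 2]
        if len(shared) > 1:
            raise ValueError("character is not convex on this tree")
        labels[v] = shared[0] if shared else dummy
    return labels

T = make_tree([("z", "u1"), ("z", "u2"), ("z", "u3"),
               ("u1", "a"), ("u1", "b"), ("u2", "c"), ("u2", "d"),
               ("u3", "e"), ("u3", "f")])
print(canonical_labeling(T, {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2})["z"])  # *
```

In this example the central node receives the dummy state, so contracting equal-state edges leaves a star S, one of the tree-structures of Figure 3.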

The tree-structure for a character α in T is formed by repeatedly contracting edges of T connecting nodes that have the same state (other than ∗) for α. Note that this tree does not depend on the order of the edge contractions and is thus well defined. Furthermore, there is exactly one node for each state (other than the dummy state) of α, and each node labeled by ∗ has degree at least three. A tree-structure for α that is formed from some compatible tree for C is called a realizable tree-structure for α. There are four possible realizable tree-structures for a three-state character α, which are shown in Figure 3.

Figure 3

The four possible realizable tree-structures for a three-state character α. (a) A path $P_i$ for each $i \in \{1,2,3\}$. (b) A star S.

To find a realizable tree-structure for a character α, the algorithm examines the pairwise intersection patterns of α with every other character β ∈ C, and applies the following rules to rule out possible tree-structures for α.

Rule 1.

Let α and β be two characters of C. If, under some relabeling of the states of α and β, we have that $\alpha_1 \subseteq \beta_1$, $\alpha_2 \cap \beta_2 \ne \emptyset$, and $\alpha_3 \cap \beta_2 \ne \emptyset$, then $P_1$ is not a realizable tree-structure for α. If this is the case, we say that α and β match Rule 1 with respect to $\alpha_1$.

Rule 2.

Let α and β be two characters of C. If, under some relabeling of the states of α and β, the intersections $\alpha_1 \cap \beta_1$, $\alpha_2 \cap \beta_1$, $\alpha_2 \cap \beta_2$, and $\alpha_3 \cap \beta_2$ are all nonempty, then $P_2$ is the only possible realizable tree-structure for α. If this is the case, we say that α and β match Rule 2 with respect to $\alpha_2$.

The set $Q_\alpha^C$ of candidate tree-structures for α consists of all those possible tree-structures for α that are not ruled out after comparing the intersection pattern of α with every other character in C and applying Rules 1 and 2.
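The following sketch computes $Q_\alpha^C$ by applying the two rules to every other character. It makes two assumptions that should be checked against [7]: Rule 1 is read as $\alpha_1 \subseteq \beta_1$ with $\alpha_2, \alpha_3$ both meeting $\beta_2$, and $P_s$ is taken to be the path whose middle node is state s (the reading under which Rule 2 forces $P_2$). Characters are dicts from taxon to state with three states each; the five-taxon characters below are a hypothetical example built to mirror the pattern in the proof of Theorem 10.

```python
from itertools import permutations

def pattern(alpha, beta):
    """Pairs (alpha-state, beta-state) whose intersection is nonempty."""
    return {(alpha[t], beta[t]) for t in alpha}

def rule2_states(alpha, beta):
    """States a such that alpha and beta match Rule 2 with respect to alpha_a."""
    E, found = pattern(alpha, beta), set()
    for x, a, y in permutations(set(alpha.values()), 3):
        for b1, b2 in permutations(set(beta.values()), 2):
            if {(x, b1), (a, b1), (a, b2), (y, b2)} <= E:
                found.add(a)
    return found

def rule1_states(alpha, beta):
    """States a with alpha_a inside a single beta state b1, while the other two
    alpha states both meet a common beta state b2 != b1 (Rule 1 as read here)."""
    E, found = pattern(alpha, beta), set()
    for a, y, z in permutations(set(alpha.values()), 3):
        inside = {beta[t] for t in alpha if alpha[t] == a}
        if len(inside) != 1:
            continue
        (b1,) = inside
        if any({(y, b2), (z, b2)} <= E for b2 in set(beta.values()) - {b1}):
            found.add(a)
    return found

def candidates(alpha, others):
    """Q_alpha^C: ('P', s) is the path with state s in the middle, ('S',) the star."""
    cand = {("P", s) for s in set(alpha.values())} | {("S",)}
    for beta in others:
        for s in rule1_states(alpha, beta):
            cand.discard(("P", s))
        for s in rule2_states(alpha, beta):
            cand &= {("P", s)}
    return cand

alpha = {"t1": 1, "t2": 1, "t3": 2, "t4": 3, "t5": 3}
beta = {"t1": 1, "t2": 2, "t3": 1, "t4": 2, "t5": 3}
gamma = {"t1": 1, "t2": 1, "t3": 2, "t4": 3, "t5": 2}
print(candidates(alpha, [beta]))         # {('P', 1)}
print(candidates(alpha, [beta, gamma]))  # set()
```

An empty candidate set certifies incompatibility via Corollary 3 below; here Rule 2 (from β) and Rule 1 (from γ) together eliminate every tree-structure for α.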

The following theorem, which follows from [7], shows that a legal partition is found by choosing an arbitrary α ∈ C for which $Q_\alpha^C \ne \emptyset$. Furthermore, if there is an α ∈ C for which $Q_\alpha^C = \emptyset$, then C is incompatible.

Theorem 9 ([7]).

If $Q_\alpha^C \ne \emptyset$, then we can find a legal partition of S.

Corollary 3.

A set C of 3-state characters is compatible if and only if $Q_\alpha^C \ne \emptyset$ for every α ∈ C.

Tight bounds on three-state character compatibility

We use Corollary 3 to give upper bounds on the maximum cardinality of a minimal set of incompatible three-state characters.

Theorem 10.

Let C be a set of three-state characters on species set S. Then C is incompatible if and only if there exists a character α ∈ C, and two distinct states $\alpha_i$ and $\alpha_j$ of α, such that both of the following hold:

  1. There is a β ∈ C where the intersection pattern of α and β matches Rule 2 with respect to $\alpha_i$.

  2. There is a γ ∈ C where the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_j$.

Proof.

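(⇐) If both conditions hold, then Rule 2 leaves only $P_i$, and also only $P_j$, as a candidate tree-structure for α; since $i \ne j$, we have $Q_\alpha^C = \emptyset$, and C is incompatible by Corollary 3.

(⇒) Suppose C is incompatible. If C is not pairwise compatible, then by Corollary 1 there is a pair α, β ∈ C whose intersection graph contains a cycle. Since the intersection graph is bipartite, this cycle must have length at least four and contain at least two states of each character. Let $\alpha_i$ and $\alpha_j$ be two states of α on this cycle. Then the intersection pattern of α and β matches Rule 2 with respect to both $\alpha_i$ and $\alpha_j$, and so the theorem holds. So we may assume that C is incompatible but pairwise compatible.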

It follows from Corollary 3 that there exists an α ∈ C such that $Q_\alpha^C = \emptyset$. Then there must exist a character β ∈ C such that the intersection pattern of α and β matches Rule 2 with respect to some state $\alpha_i$ of α; otherwise $S \in Q_\alpha^C$, since Rule 1 never rules out the star. Hence, $Q_\alpha^C \subseteq \{P_i\}$. Then, since $Q_\alpha^C = \emptyset$, there must be a character γ ∈ C whose intersection pattern with α places a constraint on $Q_\alpha^C$ that prevents it from containing $P_i$. There are two possibilities.

Case 1: There is a state $\alpha_j$ of α with $j \ne i$ such that the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_j$. In this case the theorem holds.

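Case 2: The intersection pattern of α and γ matches Rule 1 with respect to $\alpha_i$. W.l.o.g., we fix i=1 and relabel the states of α, β, and γ so that $\alpha_1 \cap \beta_1$, $\alpha_1 \cap \beta_2$, $\alpha_2 \cap \beta_1$, $\alpha_3 \cap \beta_2$, $\alpha_2 \cap \gamma_2$, and $\alpha_3 \cap \gamma_2$ are nonempty, and $\alpha_1 \subseteq \gamma_1$. Such a labeling exists since, by assumption, α and β match Rule 2 with respect to $\alpha_1$, and α and γ match Rule 1 with respect to $\alpha_1$.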

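If $\alpha_2 \cap \gamma_1 \ne \emptyset$, then the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_2$, in which case the theorem holds. If $\alpha_3 \cap \gamma_1 \ne \emptyset$, then the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_3$, in which case the theorem holds. So we may assume that $\gamma_1 \subseteq \alpha_1$, and hence $\alpha_1 = \gamma_1$. Now, since $\alpha_1 \cap \beta_1 \ne \emptyset$, $\alpha_1 \cap \beta_2 \ne \emptyset$, and $\alpha_1 = \gamma_1$, we have that both $\beta_1 \cap \gamma_1 \ne \emptyset$ and $\beta_2 \cap \gamma_1 \ne \emptyset$.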

$\gamma_3$ must have a nonempty intersection with at least one state of α, and since $\alpha_1 = \gamma_1$, we have that $\alpha_1 \cap \gamma_3 = \emptyset$. So $\gamma_3$ has a nonempty intersection with either $\alpha_2$ or $\alpha_3$. Due to the symmetry of the intersection graph of α and β, we may assume, w.l.o.g., that $\alpha_3 \cap \gamma_3 \ne \emptyset$.

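By assumption, $\alpha_2 \cap \gamma_1 = \emptyset$, and if $\alpha_2 \cap \gamma_3 \ne \emptyset$, then the intersection graph of α and γ contains the cycle $\alpha_2 \gamma_2 \alpha_3 \gamma_3$, contradicting our assumption that C is pairwise compatible. So we may assume that $\alpha_2 \subseteq \gamma_2$. Then, since $\beta_1 \cap \alpha_2 \ne \emptyset$, we have that $\beta_1 \cap \gamma_2 \ne \emptyset$.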

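Let $s \in \alpha_3 \cap \beta_2$. Since, by assumption, $\alpha_3 \cap \gamma_1 = \emptyset$, we have that either $s \in \gamma_2$ or $s \in \gamma_3$. However, if $s \in \gamma_2$, then $\beta_2 \cap \gamma_2 \ne \emptyset$ and the intersection graph of β and γ contains the cycle $\beta_1 \gamma_1 \beta_2 \gamma_2$, contradicting our assumption that C is pairwise compatible. Hence $s \in \gamma_3$ and $\beta_2 \cap \gamma_3 \ne \emptyset$.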

We have now established all of the edges of the intersection graph of α, β, and γ represented by the solid edges in Figure 4. Now, let $s_5 \in \alpha_3 \cap \gamma_2$. Then $s_5$ must be in some state of β. If $s_5 \in \beta_1$, then $\beta_1 \cap \alpha_3 \ne \emptyset$ and the intersection graph of β and α contains a cycle, contradicting our assumption that C is pairwise compatible. If $s_5 \in \beta_2$, then $\beta_2 \cap \gamma_2 \ne \emptyset$, and the intersection graph of β and γ contains a cycle, again contradicting our assumption that C is pairwise compatible. Hence $s_5 \in \beta_3$. Then $\beta_3 \cap \alpha_3 \ne \emptyset$ and $\beta_3 \cap \gamma_2 \ne \emptyset$, witnessing the dotted edges in Figure 4. So the intersection pattern of β and α matches Rule 2 with respect to $\beta_2$, and the intersection pattern of β and γ matches Rule 2 with respect to $\beta_1$. Hence the theorem holds. □

Figure 4

Illustrating the proof of Theorem 10.

Note that in the statement of Theorem 10, the characters β and γ are not necessarily distinct. In cases where they are not distinct, C contains an incompatible pair.

Corollary 4.

A set C of 3-state characters is compatible if and only if every subset of at most three characters of C is compatible.

In [9], it was also shown that we can determine the compatibility of a pairwise compatible set C of three-state characters by testing the intersection patterns of C for the existence of one of a set of four forbidden patterns. As a corollary to Theorem 10, we have that a single forbidden pattern suffices to determine the compatibility of C.

Corollary 5.

A pairwise compatible set C of 3-state characters is compatible if and only if the partition intersection graph of C does not contain, up to relabeling of characters and states, the subgraph of Figure 5.

Figure 5

The forbidden subgraph for 3-state character compatibility.

Note that each edge of the graph of Figure 5 has one endpoint which is a state in α. It follows that we can find such a subgraph in the partition intersection graph of C by testing the intersection pattern of each pair of characters in C [10]. Furthermore, all p occurrences of the forbidden subgraph in the intersection graph of m characters on n taxa can be found in $O(m^2 n + p)$ time. Whereas the forbidden subgraph given here is witnessed by eight taxa (or edges), each of the four forbidden subgraphs of [9] is witnessed by five taxa, making them better suited for taxon removal problems.
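The pairwise nature of the pattern gives a simple test for Theorem 10: for each character α, collect the states of α with respect to which some other character matches Rule 2, and report the pattern if two distinct states appear. This sketch only decides whether the pattern occurs; it does not enumerate all p occurrences. Characters are dicts from taxon to state, and the example characters are hypothetical, chosen to be pairwise compatible.

```python
from itertools import permutations

def rule2_states(alpha, beta):
    """States a such that alpha and beta match Rule 2 with respect to alpha_a."""
    E = {(alpha[t], beta[t]) for t in alpha}
    found = set()
    for x, a, y in permutations(set(alpha.values()), 3):
        for b1, b2 in permutations(set(beta.values()), 2):
            if {(x, b1), (a, b1), (a, b2), (y, b2)} <= E:
                found.add(a)
    return found

def has_forbidden_pattern(characters):
    """Theorem 10: some alpha matches Rule 2 with respect to two distinct states."""
    for k, alpha in enumerate(characters):
        states = set()
        for m, beta in enumerate(characters):
            if m != k:
                states |= rule2_states(alpha, beta)
        if len(states) >= 2:
            return True
    return False

alpha = {"t1": 1, "t2": 1, "t3": 2, "t4": 3, "t5": 3}
beta = {"t1": 1, "t2": 2, "t3": 1, "t4": 2, "t5": 3}
gamma = {"t1": 1, "t2": 1, "t3": 2, "t4": 3, "t5": 2}
print(has_forbidden_pattern([alpha, beta]))         # False
print(has_forbidden_pattern([alpha, beta, gamma]))  # True
```

In the three-character example, β matches Rule 2 with respect to $\beta_2$ (against α) and $\beta_1$ (against γ), which is the configuration reached at the end of the proof of Theorem 10.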

Incompatible Triplets

A rooted phylogenetic tree (or just rooted tree) is a tree whose leaves are in one-to-one correspondence with a label set L(T), that has a distinguished vertex called the root, and in which no vertex other than the root has degree two. See Figure 6(a) for an example. A rooted tree is binary if the root vertex has degree two and every other internal (non-leaf) vertex has degree three. A triplet is a rooted binary tree with exactly three leaves. A triplet with label set {a,b,c} is denoted ab|c if the path between the leaves labeled a and b avoids the path between the leaf labeled c and the root vertex. For a tree T and a label set $L \subseteq L(T)$, let T′ be the minimal subtree of T connecting all the leaves with labels in L. The restriction of T to L, denoted by T|L, is the rooted tree obtained from T′ by distinguishing the vertex of T′ closest to the root of T as the root of T′, and suppressing every vertex other than the root having degree two. A rooted tree T displays another rooted tree T′ if T′ can be obtained from T|L(T′) by contracting edges. A rooted tree T displays a collection $\mathcal{T}$ of rooted trees if T displays every tree in $\mathcal{T}$. If such a tree T exists, then we say that $\mathcal{T}$ is compatible; otherwise, we say that $\mathcal{T}$ is incompatible. Given a collection $\mathcal{T}$ of rooted trees, it can be determined in polynomial time whether $\mathcal{T}$ is compatible [3, 25].

Figure 6

Example of rooted phylogenetic trees. (a) shows a tree T that is a witness that the triplets ab|c, de|b, ef|c, and ec|b are compatible; (b) shows the tree T restricted to the label set {a,b,c,e}.

The following theorem follows from the connection between collections of unrooted trees that have at least one label common to all the trees and collections of rooted trees [3].

Theorem 11.

Let $Q$ be a collection of quartets such that every quartet in $Q$ contains a common label $\ell$. Let $R$ be the set of triplets such that $ab|c \in R$ if and only if $ab|c\ell \in Q$. Then $Q$ is compatible if and only if $R$ is compatible.
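The correspondence in Theorem 11 is purely syntactic, so it is straightforward to express in code. In this illustrative sketch, the encodings are assumptions: a quartet $ab|cd$ is the pair of pairs `((a, b), (c, d))`, and a triplet $ab|c$ is the tuple `(a, b, c)`. The function drops the common label $\ell$, passed as `ell`, and makes the other label on $\ell$'s side the outgroup.

```python
def quartets_to_triplets(quartets, ell):
    """Map each quartet ab|c ell to the triplet ab|c, as in Theorem 11.

    A quartet is ((a, b), (c, d)) meaning ab|cd; a triplet is (a, b, c)
    meaning ab|c. Raises ValueError if a quartet does not contain `ell`.
    """
    triplets = []
    for left, right in quartets:
        if ell in left:
            left, right = right, left  # put the side containing ell on the right
        if ell not in right:
            raise ValueError(f"quartet {left}|{right} does not contain {ell!r}")
        outgroup = right[0] if right[1] == ell else right[1]
        triplets.append((left[0], left[1], outgroup))
    return triplets


Q = [(("a", "b"), ("c", "l")), (("l", "d"), ("a", "c"))]
print(quartets_to_triplets(Q, "l"))  # [('a', 'b', 'c'), ('a', 'c', 'd')]
```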

Let $R$ be a collection of triplets. For a subset $S \subseteq L(R)$, we define the graph $[R,S]$ as the graph having a vertex for each label in $S$, and an edge $\{a,b\}$ if and only if $ab|c \in R$ for some $c \in S$. The following theorem is from page 439 of [26].

Theorem 12.

A collection $R$ of rooted triplets is compatible if and only if $[R,S]$ is not connected for every $S \subseteq L(R)$ with $|S| \ge 3$.
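For small examples, Theorem 12 can be checked by brute force. The following illustrative sketch is exponential in $|L(R)|$ and encodes a triplet $ab|c$ as the tuple `(a, b, c)`. It builds $[R,S]$ as defined above and reports incompatibility as soon as some $S$ with $|S| \ge 3$ yields a connected graph. It is meant to make the statement concrete, not to replace the polynomial-time algorithms cited earlier.

```python
from itertools import combinations


def aho_graph(R, S):
    """Adjacency sets of [R, S]: one vertex per label in S and an edge {a, b}
    whenever ab|c is in R for some c in S. Triplets are (a, b, c) = ab|c."""
    S = set(S)
    adj = {x: set() for x in S}
    for a, b, c in R:
        if a in S and b in S and c in S:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary vertex of a non-empty graph."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)


def compatible_by_theorem_12(R):
    """R is compatible iff [R, S] is disconnected for every S with |S| >= 3.
    Exponential in |L(R)|; intended for small examples only."""
    labels = sorted({x for t in R for x in t})
    for k in range(3, len(labels) + 1):
        for S in combinations(labels, k):
            if is_connected(aho_graph(R, S)):
                return False
    return True


# Triplets from the caption of Figure 6.
caption = [("a", "b", "c"), ("d", "e", "b"), ("e", "f", "c"), ("e", "c", "b")]
print(compatible_by_theorem_12(caption))  # True
print(compatible_by_theorem_12([("a", "b", "c"), ("a", "c", "b")]))  # False
```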

Corollary 6.

Let R be a set of rooted triplets such that R is incompatible but every proper subset of R is compatible. Then, [R,L(R)] is connected.

We now contrast our result on quartet compatibility with a result on triplets.

Theorem 13.

For every n≥3, if R is an incompatible set of triplets over n labels, and |R|>n−1, then some proper subset of R is incompatible.

Proof.

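For the sake of contradiction, let $R$ be a set of triplets such that $R$ is incompatible, every proper subset of $R$ is compatible, $|L(R)| = n$, and $|R| > n-1$. By Corollary 6, $[R,L(R)]$ is connected. Each triplet of $R$ contributes exactly one edge to $[R,L(R)]$, which is a graph on $n$ vertices. If two triplets of $R$ contribute the same edge, let $t$ be one of them; then $[R \setminus \{t\}, L(R)] = [R,L(R)]$. Otherwise, $[R,L(R)]$ has $|R| \ge n$ edges, so it contains a cycle $C$ (of length at least three, since the graph is simple). Let $e$ be any edge of $C$, and let $t$ be the triplet that contributed $e$. Since $[R,L(R)] - e$ is connected and is a subgraph of $[R \setminus \{t\}, L(R)]$, the latter graph is also connected.

In both cases, let $R' = R \setminus \{t\}$. The graph $[R',L(R)]$ is connected. In particular, every label of $L(R)$ lies on an edge contributed by a triplet of $R'$, so $L(R') = L(R)$. By Theorem 12, $R'$ is incompatible. But $R' \subsetneq R$, contradicting the assumption that every proper subset of $R$ is compatible. □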
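The key step of the proof, finding a triplet whose removal leaves $[R,L(R)]$ connected, can be made concrete. This brute-force sketch tries each triplet in turn. Triplets are encoded as tuples `(a, b, c)` for $ab|c$, and the function name is hypothetical. When $[R,L(R)]$ is connected and $|R| > |L(R)| - 1$, the argument above guarantees that the function returns a triplet. For the set $\{ab_3|b_1, ab_1|b_2, ab_2|b_3\}$, whose graph is a star on four vertices, every removal disconnects the graph, so the function returns `None`.

```python
def removable_triplet(R):
    """Return a triplet t in R such that [R - {t}, L(R)] is still connected,
    or None if no such triplet exists. Triplets are (a, b, c) meaning ab|c."""
    labels = {x for t in R for x in t}

    def connected(triplets):
        adj = {x: set() for x in labels}
        for a, b, _ in triplets:
            adj[a].add(b)
            adj[b].add(a)
        start = next(iter(labels))
        seen, stack = {start}, [start]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == len(labels)

    for i, t in enumerate(R):
        if connected(R[:i] + R[i + 1:]):
            return t
    return None


# Four triplets over four labels; the edge {a, b} is contributed twice.
R = [("a", "b", "c"), ("a", "b", "d"), ("b", "c", "a"), ("c", "d", "a")]
print(removable_triplet(R))  # ('a', 'b', 'c')
R3 = [("a", "b3", "b1"), ("a", "b1", "b2"), ("a", "b2", "b3")]
print(removable_triplet(R3))  # None
```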

To show the bound is tight, we first prove a more restricted form of Theorem 2.

Theorem 14.

For every $n \ge 4$, there exists a set of quartets $Q$ with $|L(Q)| = n$, and a label $\ell \in L(Q)$, such that all of the following hold.

  1. Every $q \in Q$ contains a leaf labeled by $\ell$.
  2. $Q$ is incompatible.
  3. Every proper subset of $Q$ is compatible.
  4. $|Q| = n-2$.

Proof.

Consider the set of quartets $Q_{2,n-2}$. By Lemmas 2 and 3, $Q_{2,n-2}$ is incompatible, but every proper subset of $Q_{2,n-2}$ is compatible. The set $Q_{2,n-2}$ contains exactly $n-2$ quartets. By construction, there are two labels in $L$ that are present in all the quartets in $Q_{2,n-2}$. Let $\ell$ be one of them. □

The following is a consequence of Theorems 14 and 11.

Corollary 7.

For every n≥3, there exists a set R of triplets with |L(R)|=n such that all of the following hold.

  1. $R$ is incompatible.
  2. Every proper subset of $R$ is compatible.
  3. $|R| = n-1$.

The generalization of the Fitch-Meacham examples given in [9] can also be expressed in terms of triplets. For any $r \ge 2$, let $L = \{a, b_1, b_2, \ldots, b_r\}$. Let

$$R_r = \{ab_r|b_1\} \cup \bigcup_{i=1}^{r-1} \{ab_i|b_{i+1}\}.$$

Let $Q = \{ab|c\ell : ab|c \in R_r\}$ for some label $\ell \notin L$. The set $C_Q$ of $r$-state characters corresponding to the quartet set $Q$ is exactly the set of characters built for $r$ in [9]. In the partition intersection graph of $C_Q$ (following the terminology in [9]), the labels $\ell$ and $a$ correspond to the end cliques, and the remaining $r$ labels $\{b_1, b_2, \ldots, b_r\}$ correspond to the $r$ tower cliques. From Lemma 5 and Theorem 11, $R_r$ is compatible if and only if $Q$ is compatible.
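The triplet form of these examples is easy to generate and test mechanically. The sketch below is illustrative only; the label names `a`, `b1`, …, `br` and the helper names are our own choices. It builds $R_r$ as defined above and uses a recursive Aho-graph test in the style of [25] to check that $R_r$ is incompatible while removing any single triplet makes it compatible. Since compatibility is inherited by subsets, checking the sets $R_r \setminus \{t\}$ suffices for all proper subsets. Each $R_r$ has $r$ triplets over $r+1$ labels, so for the values of $r$ tested, it is a set of the kind described in Corollary 7, with $n = r+1$.

```python
def fitch_meacham_triplets(r):
    """R_r = {a b_r | b_1} together with {a b_i | b_{i+1} : 1 <= i <= r - 1}.
    A triplet xy|z is encoded as the tuple (x, y, z)."""
    b = [f"b{i}" for i in range(1, r + 1)]
    return [("a", b[-1], b[0])] + [("a", b[i], b[i + 1]) for i in range(r - 1)]


def compatible(labels, R):
    """Recursive Aho-graph test: split the labels into the connected components
    of the graph with an edge {x, y} per triplet xy|z inside `labels`; fail if
    that graph is connected on three or more labels."""
    labels = set(labels)
    if len(labels) <= 2:
        return True
    inside = [t for t in R if set(t) <= labels]
    adj = {x: set() for x in labels}
    for x, y, _ in inside:
        adj[x].add(y)
        adj[y].add(x)
    comps, seen = [], set()
    for start in labels:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    if len(comps) == 1:
        return False
    return all(compatible(c, inside) for c in comps)


def is_minimally_incompatible(R):
    labels = {x for t in R for x in t}
    if compatible(labels, R):
        return False
    # Compatibility is inherited by subsets, so dropping one triplet at a time suffices.
    return all(compatible(labels, R[:i] + R[i + 1:]) for i in range(len(R)))


for r in range(2, 7):
    R = fitch_meacham_triplets(r)
    print(r, len(R), is_minimally_incompatible(R))  # prints: r r True
```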

Conclusion

We have shown that for every $r \ge 2$, $f(r) \ge \lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$, by showing that for every $n \ge 4$, there exists an incompatible set $Q$ of $\lfloor (n-2)/2 \rfloor \cdot \lceil (n-2)/2 \rceil + 1$ quartets over a set of $n$ labels such that every proper subset of $Q$ is compatible. Previous results [1],[6],[9],[13]-[15], along with our discussion in Section Incompatible Characters, show that our lower bound on $f(r)$ is tight for $r=2$ and $r=3$. For quartets, our discussion in Section Incompatible Quartets gives an upper bound on the maximum cardinality of a minimal set of incompatible quartets. However, this argument does not extend to multi-state characters. Indeed, an upper bound on the maximum cardinality of a minimal set of incompatible $r$-state characters remains a central open question. We give the following conjecture.
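The two bounds above are the same expression, evaluated at $r$ and at $r = n - 2$, respectively. As a quick arithmetic check, the following code (function names are ours) evaluates $\lfloor r/2 \rfloor \cdot \lceil r/2 \rceil + 1$, which equals $\lfloor r^2/4 \rfloor + 1$. The expression gives 2 and 3 for $r = 2, 3$, where the bound is tight. It gives 5 for $r = 4$, matching the earlier bound $f(4) \ge 5$ of Habib and To [18], and it grows quadratically after that.

```python
def f_lower_bound(r):
    """floor(r/2) * ceil(r/2) + 1, the lower bound on f(r) stated above."""
    return (r // 2) * ((r + 1) // 2) + 1


def quartet_set_size(n):
    """floor((n-2)/2) * ceil((n-2)/2) + 1 quartets over n labels,
    which equals f_lower_bound(n - 2)."""
    return f_lower_bound(n - 2)


print([f_lower_bound(r) for r in range(2, 8)])  # [2, 3, 5, 7, 10, 13]
print([quartet_set_size(n) for n in range(4, 8)])  # [2, 3, 5, 7]
```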

Conjecture 2.

$f(r) \in \Theta(r^2)$.

A less ambitious goal would be to narrow the gap between the upper bound of $O(n^3)$ and the lower bound of $\Omega(n^2)$ on the maximum cardinality of a minimal incompatible set of quartets over $n$ taxa given in Section Incompatible Quartets. Note that, due to Theorem 6, a proof of Conjecture 2 would also show that the number of incompatible quartets given in the statement of Theorem 2 is as large as possible.

Endnote

ᵃ Rule 2 was stated incorrectly in [7].

References

  1. Semple C, Steel M: Phylogenetics. Oxford Lecture Series in Mathematics and its Applications. USA: Oxford University Press 2003.

  2. Bodlaender H, Fellows M, Warnow T: Two strikes against perfect phylogeny. Automata, Languages and Programming, Volume 623 of Lecture Notes in Computer Science. Edited by: Kuich W. Berlin/Heidelberg: Springer 1992, 273-283.

  3. Steel M: The complexity of reconstructing trees from qualitative characters and subtrees. J Classif. 1992, 9: 91-116. 10.1007/BF02618470.

  4. Agarwala R, Fernández-Baca D: A polynomial-time algorithm for the perfect phylogeny problem when the number of character states is fixed. SIAM J Comput. 1994, 23 (6): 1216-1224. 10.1137/S0097539793244587.

  5. Dress A, Steel M: Convex tree realizations of partitions. Appl Math Lett. 1992, 5 (3): 3-6. 10.1016/0893-9659(92)90026-6.

  6. Gusfield D: Efficient algorithms for inferring evolutionary trees. Networks. 1991, 21: 19-28. 10.1002/net.3230210104.

  7. Kannan S, Warnow T: Inferring evolutionary history from DNA sequences. SIAM J Comput. 1994, 23 (4): 713-737. 10.1137/S0097539791222171.

  8. Kannan S, Warnow T: A fast algorithm for the computation and enumeration of perfect phylogenies. SIAM J Comput. 1997, 26 (6): 1749-1763. 10.1137/S0097539794279067.

  9. Lam F, Gusfield D, Sridhar S: Generalizing the splits equivalence theorem and four Gamete condition: perfect phylogeny on three-state characters. SIAM J Discrete Math. 2011, 25 (3): 1144-1175. 10.1137/090776305.

  10. Shutters B, Fernández-Baca D: A simple characterization of the minimal obstruction sets for three-state perfect phylogenies. Appl Math Lett. 2012, 25 (9): 1226-1229. 10.1016/j.aml.2012.02.060.

  11. Fernández-Baca D: The Perfect Phylogeny Problem. Steiner Trees in Industry. Dordrecht: Kluwer 2001, 203-234.

  12. Niedermeier R, Rossmanith P: An efficient fixed-parameter algorithm for 3-Hitting Set. J Discrete Algorithms. 2003, 1: 89-102. 10.1016/S1570-8667(03)00009-1.

  13. Buneman P: The recovery of trees from measurements of dissimilarity. Mathematics in the Archeological and Historical Sciences. Edinburgh: Edinburgh University Press 1971, 387-395.

  14. Estabrook GF, Johnson J, McMorris FR: A mathematical foundation for the analysis of cladistic character compatibility. Math Biosci. 1976, 29 (1-2): 181-187. 10.1016/0025-5564(76)90035-3.

  15. Meacham CA: Theoretical and computational considerations of the compatibility of qualitative taxonomic characters. Numerical Taxonomy, Volume G1 of Nato ASI series. Heidelberg: Springer 1983, 304-314.

  16. Fitch WM: Toward finding the tree of maximum parsimony. Proceedings of the 8th International Conference on Numerical Taxonomy. San Francisco: Freeman 1975, 189-230.

  17. Fitch WM: On the problem of discovering the most parsimonious tree. Am Nat. 1977, 111 (978): 223-257. 10.1086/283157.

  18. Habib M, To TH: On a conjecture of compatibility of multi-states characters. Algorithms in Bioinformatics, Volume 6833 of Lecture Notes in Computer Science. Edited by: Przytycka T, Sagot MF. Berlin/Heidelberg: Springer 2011, 116-127.

  19. Buneman P: A characterization of rigid circuit graphs. Discrete Math. 1974, 9: 205-212. 10.1016/0012-365X(74)90002-8.

  20. Colonius H, Schulze HH: Tree structures for proximity data. Br J Math Stat Psychol. 1981, 34 (2): 167-180. 10.1111/j.2044-8317.1981.tb00626.x.

  21. Dekker MCH: Reconstruction methods for derivation trees. Master’s thesis. Amsterdam: Vrije Universiteit 1986.

  22. Dietrich M, McCartin C, Semple C: Bounding the maximum size of a minimal definitive set of quartets. Inf Process Lett. 2012, 112 (16): 651-655. 10.1016/j.ipl.2012.06.001.

  23. Grünewald S, Huber KT: Identifying and defining trees. Reconstructing Evolution: New Mathematical and Computational Advances. Edited by: Gascuel O, Steel M. Oxford University Press 2007.

  24. Steel M: Personal communication. 2012.

  25. Aho AV, Sagiv Y, Szymanski TG, Ullman JD: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J Comput. 1981, 10 (3): 405-421. 10.1137/0210030.

  26. Bryant D, Steel M: Extension operations on sets of leaf-labelled trees. Adv Appl Math. 1995, 16: 425-453. 10.1006/aama.1995.1020.

Acknowledgements

We thank Sylvain Guillemot, Mike Steel, and Rob Gysel for valuable comments. This work was supported in part by the National Science Foundation under grants CCF-1017189 and DEB-0829674.

Author information

Corresponding author

Correspondence to Brad Shutters.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BS was responsible for the lower bounds on character compatibility, the upper and lower bounds on quartet compatibility, the characterization of three-state character compatibility, and wrote all portions of the manuscript other than the section on triplet compatibility. SV was responsible for the upper and lower bounds on triplet compatibility, contributed to the lower bounds on quartet and character compatibility, and wrote the portion of the manuscript on triplet compatibility. DFB supervised the project. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Shutters, B., Vakati, S. & Fernández-Baca, D. Incompatible quartets, triplets, and characters. Algorithms Mol Biol 8, 11 (2013). https://doi.org/10.1186/1748-7188-8-11

  • DOI: https://doi.org/10.1186/1748-7188-8-11

Keywords