In this work we introduce and study a class of set systems that arise in various ways from trees, graphs and intervals. We are interested in this class because it can provide a setting in which certain hard optimization problems can be solved efficiently, and we provide a particular example of this for a parsimony problem on phylogenetic networks.
We first recall some standard phylogenetic terminology (for more details, the reader can consult [1]). Recall that a hierarchyHH on a finite set XX is a collection of sets with the property that the intersection of any two sets is either empty or equal to one of the two sets.
A hierarchy is maximum if |H|=2|X|−1|H|=2|X|−1, which is the largest possible cardinality. In this case HH corresponds to the set of clusters c(T)c(T) of some rooted binary tree TT with leaf set XX (a cluster of TT is the set of leaves that are separated from the root of the tree by any vertex). A maximum hierarchy necessarily contains {x}{x} for each x∈Xx∈X, as well as XX itself; we will refer to these |X|+1|X|+1 sets as the trivial clusters of XX. More generally, any hierarchy containing all the trivial clusters corresponds to the clusters c(T)c(T) of a rooted tree TT with leaf set XX (examples of these concepts are illustrated in Fig. 1(a), (b)). Note that a hierarchy HH is maximum if and only if (i) HH contains all the trivial clusters, and (ii) each set C∈HC∈H of size greater than 1 can be written as a disjoint union C=A⊔BC=A⊔B, for two (disjoint) sets A,B∈HA,B∈H.
We say that a collection CC of subsets of a finite set XX is a bureaucracy if (i) C≠0̸C≠0̸ and 0̸∉C0̸∉C, and (ii) every hierarchy H⊆CH⊆C can be extended to a maximum hierarchy H′H′ such that H⊆H′⊆CH⊆H′⊆C. In this case, we also say that CC is bureaucratic.
Simple examples of bureaucracies include two extreme cases: the set of clusters of a binary tree, and the set P(X)P(X) of all non-empty subsets of XX. Notice that {{a},{b},{c},{a,b},{a,b,c}}{{a},{b},{c},{a,b},{a,b,c}} and {{a},{b},{c},{b,c},{a,b,c}}{{a},{b},{c},{b,c},{a,b,c}} are both bureaucratic subsets of P(X)P(X) for X={a,b,c}X={a,b,c} but their intersection, {{a},{b},{c},{a,b,c}}{{a},{b},{c},{a,b,c}}, is not. In particular, for an arbitrary subset YY of P(X)P(X) (e.g. Y={{a},{b},{c},{a,b,c}}Y={{a},{b},{c},{a,b,c}}), there may not be a unique minimal bureaucratic subset of P(X)P(X) containing YY.
In the next section we describe a more extensive list of examples, but first we describe some properties and provide a characterization of bureaucracies. In the following lemma, given two sets AA and BB from CC we say that BBcoversAA if A⊊BA⊊B and there is no set C∈CC∈C with A⊊C⊊BA⊊C⊊B.
While it is beyond the scope of this short note, it could be of interest to characterize maximal bureaucratic set systems. The following computational question also seems of interest:
Question. Is there an algorithm for deciding whether or not CC is bureaucratic that runs in time polynomial in |C||C| and |X||X|?