Intersection number (graph theory)

In the mathematical field of graph theory, the intersection number of a graph $$G=(V,E)$$ is the smallest number of elements in a representation of $$G$$ as an intersection graph of finite sets. In such a representation, each vertex is represented as a set, and two vertices are connected by an edge whenever their sets have a common element. Equivalently, the intersection number is the smallest number of cliques needed to cover all of the edges of $$G$$.

A set of cliques that cover all edges of a graph is called a clique edge cover or edge clique cover, or even just a clique cover, although the last term is ambiguous: a clique cover can also be a set of cliques that cover all vertices of a graph. Sometimes "covering" is used in place of "cover". As well as being called the intersection number, the minimum number of these cliques has been called the R-content, edge clique cover number, or clique cover number. The problem of computing the intersection number has been called the intersection number problem, the intersection graph basis problem, covering by cliques, the edge clique cover problem, and the keyword conflict problem.

Every graph with $$n$$ vertices and $$m$$ edges has intersection number at most $$\min(m,n^2/4)$$. The intersection number is NP-hard to compute or approximate, but fixed-parameter tractable.

Intersection graphs
Let $$\mathcal{F}$$ be any family of sets, allowing sets in $$\mathcal{F}$$ to be repeated. Then the intersection graph of $$\mathcal{F}$$ is an undirected graph that has a vertex for each set in $$\mathcal{F}$$ and an edge between each two sets that have a nonempty intersection. Every graph can be represented as an intersection graph in this way. The intersection number of the graph is the smallest number $$k$$ such that there exists a representation of this type for which the union of the sets in $$\mathcal{F}$$ has $$k$$ elements. The problem of finding an intersection representation of a graph with a given number of elements is known as the intersection graph basis problem.
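As a minimal illustration of this definition, the following Python sketch builds the intersection graph of a family given as a list of sets. The function name `intersection_graph` and the use of list indices as vertices are assumptions made for this example only.

```python
from itertools import combinations

def intersection_graph(family):
    """Return (vertices, edges) of the intersection graph of a list of sets.

    Vertices are indices into `family`, so repeated sets give distinct vertices.
    """
    vertices = list(range(len(family)))
    edges = {
        (i, j)
        for i, j in combinations(vertices, 2)
        if family[i] & family[j]  # the two sets share at least one element
    }
    return vertices, edges

# A three-vertex path, represented over a universe of two elements.
family = [{"a"}, {"a", "b"}, {"b"}]
vertices, edges = intersection_graph(family)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

Here the union of the sets has two elements, so this representation shows that the path on three vertices has intersection number at most two.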

Clique edge covers
An alternative definition of the intersection number of a graph $$G$$ is that it is the smallest number of cliques in $$G$$ (complete subgraphs of $$G$$) that together cover all of the edges of $$G$$. A set of cliques with this property is known as a clique edge cover or edge clique cover, and for this reason the intersection number is also sometimes called the edge clique cover number.
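The covering condition can be checked mechanically. The sketch below, with the hypothetical helper name `is_edge_clique_cover`, tests whether a collection of vertex sets consists of cliques of the graph and together covers every edge; it does not attempt to find a smallest cover.

```python
from itertools import combinations

def is_edge_clique_cover(edges, cliques):
    """Check that every set in `cliques` is a clique and that all edges are covered."""
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cliques:
        for pair in combinations(clique, 2):
            pair = frozenset(pair)
            if pair not in edge_set:
                return False  # two members are not adjacent: not a clique
            covered.add(pair)
    return covered == edge_set

# A triangle 1-2-3 with a pendant edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(is_edge_clique_cover(edges, [{1, 2, 3}, {3, 4}]))  # True
print(is_edge_clique_cover(edges, [{1, 2, 3}]))          # False
```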

Equivalence
The equality of the intersection number and the edge clique cover number is straightforward to prove. In one direction, suppose that $$G$$ is the intersection graph of a family $$\mathcal{F}$$ of sets whose union $$U$$ has $$k$$ elements. Then for any element $$x\in U$$, the subset of vertices of $$G$$ corresponding to sets that contain $$x$$ forms a clique: any two vertices in this subset are adjacent, because their sets have a nonempty intersection containing $$x$$. Further, every edge in $$G$$ is contained in one of these cliques: if an edge comes from a non-empty intersection of sets containing an element $$y$$, then that edge is contained in the clique for $$y$$. Therefore, the edges of $$G$$ can be covered by $$k$$ cliques, one per element of $$U$$.

In the other direction, if a graph $$G$$ can be covered by $$k$$ cliques, then each vertex $$v$$ of $$G$$ may be represented by a subset of the cliques, the ones that contain vertex $$v$$. Two of these subsets, for two vertices $$u$$ and $$v$$, have a nonempty intersection if and only if there is a clique in the intersection that contains both of them, if and only if there is an edge $$uv$$ included in one of the covering cliques.
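Both directions of this argument are constructive, and can be sketched in a few lines of Python. The function names `cliques_from_sets` and `sets_from_cliques` are illustrative assumptions, not established terminology.

```python
def cliques_from_sets(sets_by_vertex):
    """One clique per element x: the vertices whose set contains x."""
    cliques = {}
    for v, s in sets_by_vertex.items():
        for x in s:
            cliques.setdefault(x, set()).add(v)
    return cliques

def sets_from_cliques(vertices, cliques):
    """Represent each vertex by the indices of the covering cliques containing it."""
    return {v: {i for i, c in enumerate(cliques) if v in c} for v in vertices}

# A triangle 1-2-3 with a pendant edge 3-4, covered by two cliques.
cover = [{1, 2, 3}, {3, 4}]
sets = sets_from_cliques([1, 2, 3, 4], cover)
print(sets)                     # {1: {0}, 2: {0}, 3: {0, 1}, 4: {1}}
print(cliques_from_sets(sets))  # {0: {1, 2, 3}, 1: {3, 4}}
```

In this example the round trip recovers the original two cliques, and the sets use a universe of two elements, matching the size of the cover.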

Applications
The representation of a graph as an abstract intersection graph of sets can be used to construct more concrete geometric intersection representations of the same graph. In particular, if a graph has intersection number $$k$$, it can be represented as an intersection graph of $$k$$-dimensional unit hyperspheres (its sphericity is at most $$k$$).

A clique cover can be used as a kind of adjacency labelling scheme for a graph, in which one labels each vertex by a binary value with a bit for each clique, zero if it does not belong to the clique and one if it belongs. Then two vertices are adjacent if and only if the bitwise and of their labels is nonzero. The length of the labels is the intersection number of the graph. This method was used in an early application of intersection numbers, for labeling a set of keywords so that conflicting keywords could be quickly detected, by E. Kellerman of IBM. For this reason, another name for the problem of computing intersection numbers is the keyword conflict problem. Similarly, in computational geometry, representations based on the intersection number have been considered as a compact representation for visibility graphs, but there exist geometric inputs for which this representation requires a near-quadratic number of cliques.
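The labelling scheme described above might be sketched as follows, using Python integers as bit vectors. The names `bit_labels` and `adjacent` are hypothetical, and the clique cover is assumed to be given.

```python
def bit_labels(vertices, cliques):
    """Label each vertex with an integer whose i-th bit is set iff it lies in clique i."""
    labels = {v: 0 for v in vertices}
    for i, clique in enumerate(cliques):
        for v in clique:
            labels[v] |= 1 << i
    return labels

def adjacent(labels, u, v):
    """Two distinct vertices are adjacent iff their labels share a set bit."""
    return u != v and (labels[u] & labels[v]) != 0

# A triangle 1-2-3 with a pendant edge 3-4, covered by two cliques.
labels = bit_labels([1, 2, 3, 4], [{1, 2, 3}, {3, 4}])
print(adjacent(labels, 1, 3), adjacent(labels, 1, 4))  # True False
```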

Another class of applications comes from scheduling problems in which multiple users of a shared resource should be scheduled for time slots, in such a way that incompatible requests are never scheduled for the same time slot but all pairs of compatible requests are given at least one time slot together. The intersection number of a graph of compatibilities gives the minimum number of time slots needed for such a schedule. In the design of compilers for very long instruction word computers, a small clique cover of a graph of incompatible operations can be used to represent their incompatibilities by a small number of artificial resources, allowing resource-based scheduling techniques to be used to assign operations to instruction slots.

Shephard and Vetta observe that the intersection number of any network equals the minimum number of constraints needed in an integer programming formulation of the problem of computing maximum independent sets, in which one has a 0-1 variable per vertex and a constraint that in each clique of a clique cover the variables sum to at most one. They argue that, for the intersection graphs of paths in certain fiber optic communications networks, these intersection numbers are small, explaining the relative ease of solving certain optimization problems in allocating bandwidth on the networks.
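Written out, with $$C_1,\dots,C_k$$ denoting the cliques of an edge clique cover (notation assumed here for illustration), a formulation of this kind reads

$$\max \sum_{v\in V} x_v \quad \text{subject to} \quad \sum_{v\in C_i} x_v \le 1 \text{ for } i=1,\dots,k, \qquad x_v\in\{0,1\} \text{ for } v\in V.$$

Because every edge lies in some $$C_i$$, no two adjacent vertices can both have $$x_v=1$$, and conversely any independent set contains at most one vertex of each clique.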

In statistics and data visualization, edge clique covers of a graph representing statistically indistinguishable pairs of variables are used to produce compact letter displays that assist in visualizing multiple pairwise comparisons, by assigning a letter or other visual marker for each clique and using these to provide a graphical representation of which variables are indistinguishable.

In the analysis of food webs describing predator-prey relationships among animal species, a competition graph or niche overlap graph is an undirected graph in which the vertices represent species, and edges represent pairs of species that both compete for the same prey. These can be derived from a directed acyclic graph representing predator-prey relations by drawing an edge $$u-v$$ in the competition graph whenever there exists a prey species $$w$$ such that the predator-prey relation graph has edges $$u\to w$$ and $$v\to w$$. Every competition graph must have at least one isolated vertex, and the competition number of an arbitrary graph represents the smallest number of isolated vertices that could be added to make it into a competition graph. Biologically, if part of a competition graph is observed, then the competition number represents the smallest possible number of unobserved prey species needed to explain it. The competition number is at most equal to the intersection number: one can transform any undirected graph into a competition graph by adding a prey species for each clique in an edge clique cover. However, this relation is not exact, because it is also possible for the predator species to be prey of other species. In a graph with $$n$$ vertices, at most $$n-2$$ of them can be the prey of more than one other species, so the competition number is at least the intersection number minus $$n-2$$.
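The derivation of a competition graph from a predator-prey relation can be sketched as below. The function name `competition_graph`, the dictionary encoding, and the species in the example are all invented for illustration.

```python
from itertools import combinations

def competition_graph(prey_of):
    """Return the edges of the competition graph.

    `prey_of` maps each species u to the set of species w with an edge u -> w.
    """
    predators_of = {}
    for predator, prey in prey_of.items():
        for w in prey:
            predators_of.setdefault(w, set()).add(predator)
    edges = set()
    for predators in predators_of.values():
        # All predators of one prey species compete pairwise, forming a clique.
        for u, v in combinations(sorted(predators), 2):
            edges.add((u, v))
    return edges

prey_of = {
    "fox": {"rabbit", "mouse"},
    "owl": {"mouse"},
    "hawk": {"rabbit"},
    "rabbit": set(),
    "mouse": set(),
}
print(sorted(competition_graph(prey_of)))  # [('fox', 'hawk'), ('fox', 'owl')]
```

Each prey species contributes one clique of the competition graph, which is the link to edge clique covers used in the bound above.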

Edge clique covers have also been used to infer the existence of protein complexes, systems of mutually interacting proteins, from protein–protein interaction networks describing only the pairwise interactions between proteins. More generally, Guillaume and Latapy have argued that, for complex networks of all types, replacing the network by a bipartite graph connecting its vertices to the cliques in a clique cover highlights the structure in the network.

Upper bounds
Trivially, a graph with $$m$$ edges has intersection number at most $$m$$. Each edge is itself a two-vertex clique. There are $$m$$ of these cliques and together they cover all the edges. It is also true that every graph with $$n$$ vertices has intersection number at most $$\lfloor n^2/4\rfloor$$. More strongly, the edges of every $$n$$-vertex graph can be covered by at most $$\lfloor n^2/4\rfloor$$ cliques, all of which are either single edges or triangles. An algorithm for finding this cover is simple: remove any two adjacent vertices and inductively cover the remaining graph. Restoring the two removed vertices, cover edges to their shared neighbors by triangles, leaving edges to unshared neighbors as two-vertex cliques. The inductive cover has at most $$\lfloor (n-2)^2/4\rfloor$$ cliques, and the two removed vertices contribute at most $$n-1$$ cliques, maximized when all other vertices are unshared neighbors and the edge between the two vertices must be used as a clique. Adding these two quantities gives $$\lfloor n^2/4\rfloor$$ cliques total. This generalizes Mantel's theorem that a triangle-free graph has at most $$\lfloor n^2/4\rfloor$$ edges, for in a triangle-free graph the only optimal clique edge cover has one clique per edge and therefore the intersection number equals the number of edges.
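The inductive procedure above can be unrolled into a loop that repeatedly removes a pair of adjacent vertices. The following Python sketch assumes the graph is given as a dictionary from vertices to neighbor sets; the name `edge_triangle_cover` is hypothetical.

```python
def edge_triangle_cover(adj):
    """Cover all edges using only single edges and triangles.

    `adj` maps each vertex to the set of its neighbors (undirected, no loops).
    Returns a list of frozensets of size 2 or 3.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    cover = []
    while True:
        # Pick any remaining edge uv; stop once the graph has no edges left.
        u = next((x for x in adj if adj[x]), None)
        if u is None:
            return cover
        v = next(iter(adj[u]))
        shared = adj[u] & adj[v]
        for w in shared:
            cover.append(frozenset((u, v, w)))  # triangle covers uw, vw and uv
        for w in adj[u] - shared - {v}:
            cover.append(frozenset((u, w)))     # unshared neighbor of u
        for w in adj[v] - shared - {u}:
            cover.append(frozenset((v, w)))     # unshared neighbor of v
        if not shared:
            cover.append(frozenset((u, v)))     # uv is in no triangle chosen above
        # Remove u and v, then continue with the remaining graph.
        for x in (u, v):
            for w in adj[x]:
                adj[w].discard(x)
            del adj[x]

# The complete bipartite graph K_{3,3} is triangle-free with 9 edges.
k33 = {i: ({3, 4, 5} if i < 3 else {0, 1, 2}) for i in range(6)}
print(len(edge_triangle_cover(k33)))  # 9
```

For $$K_{3,3}$$ the bound $$\lfloor 6^2/4\rfloor = 9$$ is attained, as expected for a triangle-free graph with that many edges.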

An even tighter bound is possible when the number of edges is strictly greater than $$\tfrac{n^2}{4}$$. Let $$p$$ be the number of pairs of vertices that are not connected by an edge in the given graph $$G$$, and let $$t$$ be the unique integer for which $$(t-1)t\le p < t(t+1)$$. Then the intersection number of $$G$$ is at most $$p+t$$. Graphs that are the complement of a sparse graph have small intersection numbers: the intersection number of any $$n$$-vertex graph $$G$$ is at most $$2e^2(d+1)^2\ln n$$, where $$e$$ is the base of the natural logarithm and $$d$$ is the maximum degree of the complement graph of $$G$$.
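As a small worked example of the bound $$p+t$$, the sketch below computes it from the numbers of vertices and edges. The function name `complement_bound` is an assumption, and the function does not itself check the condition that the number of edges exceeds $$n^2/4$$.

```python
def complement_bound(n, m):
    """Return p + t for an n-vertex graph with m edges, with p and t as in the text."""
    p = n * (n - 1) // 2 - m   # number of non-adjacent pairs of vertices
    t = 1
    while t * (t + 1) <= p:    # find the t with (t-1)t <= p < t(t+1)
        t += 1
    return p + t

# Six vertices and 13 edges: a complete graph with two edges removed.
n, m = 6, 13
print(complement_bound(n, m), n * n // 4)  # 4 9
```

In this case $$p=2$$ and $$t=2$$, so the bound is 4, compared with $$\lfloor n^2/4\rfloor = 9$$.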

It follows from deep results on the structure of claw-free graphs that, when a connected $$n$$-vertex claw-free graph has at least three independent vertices, it has intersection number at most $$n$$. It remains an unsolved problem whether this is true of all claw-free graphs without requiring them to have large independent sets. An important subclass of the claw-free graphs are the line graphs, graphs representing edges and touching pairs of edges of some other graph $$G$$. An optimal clique cover of the line graph $$L(G)$$ may be formed with one clique for each triangle in $$G$$ that has two or three degree-2 vertices, and one clique for each vertex that has degree at least two and is not a degree-two vertex of one of these triangles. The intersection number is the number of cliques of these two types.

In the Erdős–Rényi–Gilbert model of random graphs, in which all graphs on $$n$$ labeled vertices are equally likely (or equivalently, each edge is present or absent, independently of other edges, with probability $$\tfrac12$$), the intersection number of an $$n$$-vertex random graph is with high probability $$\Theta\left(\frac{n^2}{\log^2n}\right),$$ smaller by a factor of $$\log^2 n$$ than the number of edges. In these graphs, the maximum cliques have (with high probability) only a logarithmic number of vertices, implying that this many of them are needed to cover all edges. The trickier part of the bound is proving that it is possible to find enough logarithmically-sized cliques to cover many edges, allowing the remaining leftover edges to be covered by two-vertex cliques.

Much of the early research on intersection numbers involved calculating these numbers on various specific graphs, such as the graphs formed by removing a complete subgraph or a perfect matching from a larger complete graph.

Computational complexity
Testing whether a given graph $$G$$ has intersection number at most a given number $$k$$ is NP-complete. Therefore, it is also NP-hard to compute the intersection number of a given graph. In turn, the hardness of the intersection number has been used to prove that it is NP-complete to recognize the squares of split graphs.

The problem of computing the intersection number is, however, fixed-parameter tractable: that is, it can be solved in an amount of time bounded by a polynomial in $$n$$ multiplied by a larger but computable function of the intersection number $$k$$. This may be shown by observing that there are at most $$2^k$$ distinct closed neighborhoods in the graph – two vertices that belong to the same set of cliques have the same neighborhood – and that the graph formed by selecting one vertex per closed neighborhood has the same intersection number as the original graph. Therefore, in polynomial time the input can be reduced to a smaller kernel with at most $$2^k$$ vertices and $$O(4^k)$$ edges. Applying an exponential time dynamic programming search procedure over subsets of edges of this kernel gives time $$2^{O(4^k)}+n^{O(1)}$$, double exponential in $$k$$. The double-exponential dependence on $$k$$ cannot be reduced to single exponential by a kernelization of polynomial size, unless the polynomial hierarchy collapses, and if the exponential time hypothesis is true then double-exponential dependence is necessary regardless of whether kernelization is used. On graphs of bounded treewidth, dynamic programming on a tree decomposition of the graph can find the intersection number in linear time, but simpler algorithms based on finite sets of reduction rules do not work.
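The grouping step of this reduction, keeping one vertex per distinct closed neighborhood, might look as follows in Python. This is only a sketch of that single step under the dictionary encoding assumed here, with the hypothetical name `collapse_closed_twins`. Corner cases are not handled: for instance, a component that is a single edge collapses to one isolated vertex, so a full implementation would need to treat such components separately.

```python
def collapse_closed_twins(adj):
    """Keep one representative vertex per distinct closed neighborhood.

    `adj` maps each vertex to the set of its neighbors.
    Returns the subgraph induced on the representatives.
    """
    groups = {}
    for v, nbrs in adj.items():
        closed = frozenset(nbrs | {v})          # closed neighborhood N[v]
        groups.setdefault(closed, []).append(v)
    keep = {members[0] for members in groups.values()}
    return {v: adj[v] & keep for v in keep}

# A triangle 1-2-3 with a pendant edge 3-4: vertices 1 and 2 have N[1] = N[2].
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(collapse_closed_twins(adj) == {1: {3}, 3: {1, 4}, 4: {3}})  # True
```

In this example the reduced graph is a three-vertex path, whose two edges need two cliques, the same number as the original graph.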

The problem cannot be approximated in polynomial time with an approximation ratio better than $$n^c$$, for some constant $$c$$, and the best approximation ratio known is better than the trivial $$O(n^2)$$ by only a polylogarithmic factor. Researchers in this area have also investigated the computational efficiency of heuristics, without guarantees on the solution quality they produce, and their behavior on real-world networks.

More efficient algorithms are known for certain special classes of graphs. The intersection number of an interval graph is always equal to its number of maximal cliques, which may be computed in polynomial time. More generally, in chordal graphs, the intersection number may be computed by an algorithm that considers the vertices in an elimination ordering of the graph (an ordering in which each vertex and its later neighbors form a clique) and that, for each vertex $$v$$, forms a clique for $$v$$ and its later neighbors whenever at least one of the edges incident to $$v$$ is not covered by any earlier clique. It is also possible to find the intersection number in linear time in circular-arc graphs. However, although these graphs have only a polynomial number of cliques to choose among for the cover, having few cliques alone is not enough to make the problem easy: there exist families of graphs with polynomially many cliques for which the intersection number remains NP-hard. The intersection number can also be found in polynomial time for graphs whose maximum degree is five, but is NP-hard for graphs of maximum degree six. On planar graphs, computing the intersection number exactly remains NP-hard, but it has a polynomial-time approximation scheme based on Baker's technique.
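The procedure described above for chordal graphs can be sketched as follows. The elimination ordering is assumed to be supplied by the caller and is not validated, and the name `chordal_clique_cover` is hypothetical.

```python
def chordal_clique_cover(adj, order):
    """Greedy edge clique cover along an elimination ordering.

    `adj` maps each vertex to its neighbor set; `order` is assumed to be an
    ordering in which each vertex and its later neighbors form a clique.
    """
    position = {v: i for i, v in enumerate(order)}
    covered = set()
    cover = []
    for v in order:
        later = {w for w in adj[v] if position[w] > position[v]}
        # Edges from v to earlier neighbors were already covered when those
        # neighbors were processed, so only edges to later neighbors are checked.
        if any(frozenset((v, w)) not in covered for w in later):
            clique = later | {v}
            cover.append(clique)
            for a in clique:
                for b in clique:
                    if a != b:
                        covered.add(frozenset((a, b)))
    return cover

# A triangle 1-2-3 with a pendant edge 3-4; 4, 1, 2, 3 is an elimination ordering.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(chordal_clique_cover(adj, [4, 1, 2, 3]))  # [{3, 4}, {1, 2, 3}]
```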