Pick an -vertex graph uniformly at random. Pick another one. Will it have the same chromatic number? Or if not, how different are their chromatic numbers likely to be if is large?
The chromatic number of a random graph is a random variable, so what we are really asking is: Is this random variable essentially deterministic? That is, is the weight of the distribution concentrated on one value, or on just a few values which are close together?
Until recently we did not know whether or not this is the case. In this blog post I’ll describe a new result showing that, at least for infinitely many , the chromatic number of a random -vertex graph is not concentrated on fewer than about consecutive values.
A uniform -vertex graph is generated by the random graph where we include each possible edge independently with probability , and more generally we can ask about the chromatic number of .
Results about generally fall into one of two categories: Can we find (i) upper and lower bounds for its typical value, and (ii) bounds on how much it varies about this typical value?
The typical value
On (i), we have the well-known 1987 result of Bollobás who showed that, with high probability (whp), . This was improved and generalised several times, the current best bounds are from 2016:
(I am writing out this result because it will become important when talking about question (ii) later.)
Upper concentration bounds
So how about the width of the distribution of — what is the length of the shortest interval (or rather sequence of intervals) which is likely to contain ?
Of course (1) is already a weak concentration type result, giving an explicit such interval of length . However, it turns out that if we don’t have to specify the interval, we can do a lot better.
The starting point for question (ii) is the classic 1987 result of Shamir and Spencer, who showed that for any function , is whp contained in some sequence of intervals of length about . If quickly enough, however, much sharper concentration holds: in 1997, Alon and Krivelevich reduced the interval length to only in the case . So here the chromatic number behaves almost like a deterministic function; almost all of the weight of the distribution is on two consecutive values.
The opposite question
In view of strong results showing that the chromatic number is sharply concentrated, in the late 1980s Bollobás raised the following question (later popularized by him and Erdős): Can we find any examples where χ() is not very sharply concentrated? This is trivially true for p = 1 − 1/(10n) (as noted by Alon and Krivelevich), but how about non-trivial examples, and what about the most natural special case, p = 1/2?
This question remained open for quite some time, for a simple reason: while we have a number of standard tools to prove upper bounds on the concentration of a random variable (for example the martingale concentration argument of Shamir and Spencer), we have few ways of giving lower concentration bounds. (Unless we can work out the entire approximate distribution, for example by proving asymptotic normality.)
A lower bound for concentration
I will sketch the main ideas needed for the following result:
Theorem 1 (H. 2019; H., Riordan 2021). Let ε > 0, and let be a sequence of intervals such that whp. Then there are infinitely many values such that
Of course we can also replace with some function which tends to slowly. Up to this -term, the lower bound matches Shamir and Spencer’s upper bound.
A word of caution: The theorem only says that there are some so that is not very sharply concentrated. It does not tell us what these values are, and no bound is given for the other . In fact, we do not believe the conclusion of Theorem 1 is true for every — more on this later!
Basic proof strategyThe proof needs two ingredients:
(1) A (weak) concentration type result
A result that says that, whp, for some well-behaved function and an error term . Here, is much larger than the scale on which we are trying to prove non-concentration. We also need a lower bound on the slope of , where we will specify later.
(2) A coupling result
A coupling that shows that, for and some slightly larger , and are close together. More specifically, we need a coupling of and with (with as above) so that with probability at least , say,
Now suppose is a sequence of intervals as in Theorem 1. As shown in the picture, we now know that is concentrated around a function with slope more than (blue area), but it also follows from the coupling that the slope between the concentration intervals (red lines) is at most .
If all intervals were short, then as we increase , eventually this would lead to a contradiction: an interval lying outside the blue area. So there must be at least one long interval.
Working out the numbers, under certain reasonable conditions there is an interval of length about . We will have of order , of order and close to .
Independence number and large independent setsSo what’s the function ? This turns out to be the independence number of , that is, the size of the largest independent vertex set. Matula and independently Bollobás and Erdős proved that the independence number of behaves almost deterministically: for most , whp it takes an explicitly known deterministic value .
What does this have to do with ? Each colour class in a colouring (i.e. all the vertices of one particular colour) is independent, because neighbouring vertices must be coloured differently. So there are at most vertices in each colour class, and therefore . We saw in (1) that this easy lower bound is in fact asymptotically correct.
We call an independent set of size an -set. It is plausible that the optimal colouring of contains all or almost all -sets as colour classes. Roughly speaking, this is because the expected number of -colourings is dominated by colourings with as many -sets as possible.
Let count the number of -sets, then it can be shown that is approximately Poisson with mean , where . In particular, typically varies by about . If we must use all -sets in the colouring, then it seems plausible that the overall number of colours we end up with also varies by roughly this amount. (Divided by , as we will see in a moment.)
The weak concentration resultLuckily we already have a suitable estimate for in (1). Writing for this estimate, then for most ,
So what is the effect of having one extra -set in ? Using one colour for this -set, we need to colour the remaining vertices. So one extra -set should save us about
colours. So extra -sets should reduce by about .
The couplingWe construct the coupling in the following way: take vertices (we will pick later). Choose vertex sets of size uniformly at random and make them independent sets, and then include every edge outside these -sets independently with probability .
The inner graph on vertices is simply . It is also clear that, if is the outer graph on vertices, then
This is because any colouring of can be extended to a colouring of by giving a new colour to each of the -sets.
The trouble is that is not distributed as — we have disturbed the distribution by planting the -sets. The key point is that, as long as is not too large — less than the standard deviation of — then and are very similar. This makes sense on an intuitive level: the random graph doesn’t `notice’ the extra -sets as long as we have planted fewer than the natural fluctuation .
So we let (or in fact works). For the formal proof, we bound the total variation distance of the two random graph models.
Tying it all togetherThe two ingredients above can be combined to show that for every , there is some nearby with concentration interval length , with . If we pick so that is close to , Theorem 1 follows.
How close to Shamir and Spencer’s upper bound can we actually get? At the moment, nothing better than for some unspecified function is possible. The main bottleneck is the size of the error term in (1).
Konstantinos Panagiotou and I have been working on an improved estimate for in a paper we are currently writing up. Assuming this result, Oliver Riordan and I can prove that there are infinitely many such that
The best upper concentration bound for is , due to Alon.
Which of these is closer to the truth? We conjecture that, for the worst case , the width of the distribution of matches our lower bound up to the constant. However, the concentration behaviour seems to be very different for different , as described below.
The Zigzag conjectureLet be the standard deviation of . We already gave a heuristic argument that , coming from fluctuations in the number of -sets.
There is another conjectured lower bound coming from fluctuations in , the number of -sets, namely
The Zigzag conjecture, made recently by Bollobás, Heckel, Morris, Panagiotou, Riordan and Smith, states that at least for most , is essentially the maximum of these two lower bounds. Ignoring -terms, we have the following simplified statement.
Define by letting . Let
Then, if denotes the standard deviation of ,
It is not hard to show that the function `zigzags’ between and , roughly linearly in , as shown in the picture (the orange and blue lines are the lower bounds coming from – and -sets, respectively).
There’s a detailed heuristic explanation of the conjecture in the paper. We also think that we have a pretty good idea of the behaviour of at the extreme points. In particular, we believe that in the `worst case’, is of order , while in the `best case’ it’s of order .