**Shattering**

Let us return to extremal problems for families of sets and describe several basic theorems and basic open problems. In the next part we will discuss a nice proof technique called “shifting” or “compression.”

## The Sauer-Shelah (-Perles -Vapnik-Chervonenkis) Lemma:

(Here we write .) A family **shatters **a set if for every there is such that . Note that in order to shatter a set of size you need at least sets. But how many sets will guarantee that you shatter a set with elements? The Sauer-Shelah Lemma answers this question!

**Theorem (Sauer-Shelah):** If then there exists a set , such that shatters .

Hmm LaTeX can go wild, let me try that again:

**Theorem (Sauer-Shelah):** If then there exists a set , such that shatters .

(This is much better.)

The simplest way to prove this theorem is by induction. Here is a variation I just heard from Noga. **Proof **(a little sketchy)**:** Prove a slightly stronger fact that every family shatters at least as many sets as . Consider the subfamily of sets in the family not containing . By induction shatters at least as many subsets of as . Next consider , and = . By induction shatters as many subsets of as its cardinality. The number of subsets of shattered by and sum up to at least + = , and every subset of shattered by is shattered by . Sababa? not quite! there may be subsets shattered by both and . But note that in this case both and latex are shattered by . **walla**!

OK, let me give a few more words of explanation about the proof. When I say “every family” do I mean “every family that satisfies the condition of the theorem?” (And If I do why can I use induction?) No, i really **mean** every family. The point is that if every family shatters at least as many sets as its size then a family which satisfies shatters more sets than this sum. But then it must shatter a set of size larger than because there are not enough subsets of the set of size at most .

One additional word about the proof. Considering the sets containing ‘1’ and those not containing ‘1’ and then applying an inductive argument which can be simple or extremely complicated, direct or involving several additional methods, is a very basic proof method in extremal set theory.

This theorem can be described as an “**eigentheorem**” because it is important in many different places. It was mentioned (with an algebraic proof by Frankl and Pach) in Gowers’ blog and also, in another context, in Kowalski’s blog. Sauer proved it in response to a problem of Erdos. Shelah (with Perles) proved it as a useful lemma for Shelah’s theory of stable models. (At some later time, Benjy Weiss asked Perles about such a result in the context of ergodic theory and Perles who forgot that he proved it once proved it again.) Vapnik and Chervonenkis proved it in the context of statistical learning theory. The **VC-dimension** of a family of sets is the size of the largest set the family shatters. This notion plays an important role in learning theory, statistics and probability theory. There are several applications of the Sauer-Shelah theorem to analysis and to the geometry of Banach spaces, and there are several extensions and refinements.

## The Kruskal-Katona Theorem

We let denote the set of positive integers and let = .

Let be a subset of . The **Shadow** is the set of all (k-1)-subsets of which are contained in at least one set in .

We would like to answer the following question. How **small** can the Shadow of a family of m k-sets be? The precise answer to this question is given by the Kruskal-Katona’s theorem.

First write + + + , where .

It is not hard to see that (when k is fixed) there is a unique way to write m in this manner.

(Hint: Choose to be the largest integer so that . With this choice and you can continue choosing so that . Any other choice for will fail.)

**Kruskal Katona Theorem**: Let be a subset of . Suppose that . Then

+ .

**Shadow**

The Kruskal-Katona theorem is very basic both in extremal combinatorics and in algebraic combinatorics. An earlier similar theorem about families of multisets (or monomials) was proved by Macaulay in 1927.

## The Erdos-Rado Sunflower Theorem

A sunflower (a.k.a. Delta-system) of size is a family of sets such that every element that belongs to more than oneofthe sets belongs to all of them. A basic and simple result of Erdos and Rado asserts that

**Erdos-Rado sunflower theorem:** There is a function so that every family of -sets with more than members contains a sunflower of size .

(We denote by the smallest integer that suffices for the assertion of the theorem to be true.)

**Proof:** Let be a family of -sets without a sunflower of size . Let be a maximum subfamily of pairwise disjoint sets in . Since a family of pairwise disjoint sets is a sunflower, we must have . Now let . For every consider the family . Now, the size of is at most and the size of each is at most . (Because a sunflower in gives a sunflower in as well.) We get that . This gives . **Walla!**

## The Erdos-Rado Sunflower Conjecture

One of the most famous open problems in extremal combinatorics is:

**The Erdos-Rado conjecture: **Prove that .

Here, is a constant depending on . This is most interesting already for .

(Sept 09: The Erdos-Rado sunflower (Delta-system) conjecture was proposed by Tim Gowers, among other problems, for a “polymath project.”)

## The Harris-Kleitman theorem

An ideal of sets is a family of sets closed under taking subsets. Ideals are also called, down-families and simplicial complexes. A filter of sets is a family closed under taking supersets.

**The Harris-Kleitman Theorem:** Let and be two ideals in , . then:

This theorem asserts that for random subsets of the events “to be in ” and “to be in ” have **nonnegative correlation**. This can be seen as a first result in a wide and important area of **correlation inequalities**. The proof is by a rather easy induction.

By considering the complement of in we get that the events: “to be in ” and “to be in ” have nonpositive correlation, whenever is an ideal and is a filter.

### Solving Problem II from Lecture 1.

**Problem II: **What is the largest size of a family of subsets of such that for every two sets in the intersection of and is non empty **and** their union is not . (here, .)

**Answer:**. (This is a theorem by D. J. Kleitman.)

**Proof:** The family of all sets containing ‘1’ and missing ‘2’ is an example with sets. To show that there is no larger example, take the family and turn it to an ideal by closing it to subsets. This ideal will still satisfy the condition that the union of two sets is not and therefore . Now take the family again and make it a filter by closing it to supersets. This filter is intersecting (the intersection of two sets belonging to it is not empty) and therefore also . The Harris-Kleitman theorem asserts that the events “a random set belongs to ” and “a random set belongs to ” have nonpositive correlation. The probability of a random set to belong to is at most 1/2; the probability of a random set to belong to ist also at most 1/2, and, therefore, the probability of a random set to belong to both and is at most 1/4. In other words, the number of sets in is at most . Of course every set in is both in and in **Sababa!**

** **

## The Two Families Theorem

Bollobas’s Two Families theorem is beautiful and useful.

**Theorem:** Let and be two families of sets having the following properties:

1) for every , , and .

2) For every ,

3) for every , .

Then .

Of course, the bound of this theorem is sharp, as seen by taking the s to be all k-subsets of a set with elements and letting be the complement of . There is a very nice proof (due to Katona, I think) that is similar to the proofs we saw in part I for Sperner’s theorem and for Erdos-Ko-Rado’s theorem. Here is a sketch:

**Proof:** You can assume that and for every . (Just add fresh new elements to these sets if needed.) Let X be the union of all the s and s. Suppose that . **Count pairs ** where is an ordering of X, and is an index with the property that all elements of come before all elements of in the ordering. And now note first that the intersection property we described guarantees that for every there is at most one possible . So the total number of pairs is at most . On the other hand for every the number of for which the elements of come before those of is precisely . So and this gives . **Ahla!**

Reminder: Arabic words which are part of Hebrew slang: **ashkara** – for real; **sababa** – cool, wonderful; **walla** – true; **ahla** – great; **yalla** – hurry up, c’mon, let’s go.

We end this post with two other central open problems in extremal combinatorics.

## Chvatal’s conjecture

For a family of sets and an element denote by .

**Chvatal’s conjecture**: Let be an ideal of sets and let be an intersecting subfamily, (i.e., for every , ). Then there is an element so that .

There is something in Chvatal’s conjecture that resembles Frankl’s problem from the very first post.

## Simonyi’s conjecture

Here is a beautiful conjecture by Gabor Simonyi. Let , be two families of subsets of an -set (a set with n elements). Suppose that the following two conditions hold:

1) For every and

implies .

2) For every and

implies .

Then .

## Happy new year

**And to all the readers: happy new (Jewish) year!**

Pingback: Math and Food Blogs « Dave a.g.p.’s Blog

Pingback: Lovasz’s Two Families Theorem « Combinatorics and more

Pingback: Extremal Combinatorics on Permutations « Combinatorics and more

Pingback: How the g-Conjecture Came About « Combinatorics and more

Pingback: Cup Sets, Sunflowers, and Matrix Multiplication | Combinatorics and more

Pingback: Analysis of Boolean Functions – week 1 | Combinatorics and more

Pingback: Analysis of Boolean Functions – week 4 | Combinatorics and more

Pingback: Influence, Threshold, and Noise | Combinatorics and more

Pingback: Updates and plans III. | Combinatorics and more

Pingback: Polymath10: The Erdos Rado Delta System Conjecture | Combinatorics and more

Pingback: Polymath10: The Erdos Rado Delta System Conjecture | Combinatorics and more

What I never quite understood about the Kruskal-Katona theorem is the ideology behind looking at *uniform* set families. Why this ultimately turns out more useful than considering just *all* families? Is it trivial / easy to determine the smallest possible size of the shadow of a (not necessarily $k$-uniform) set family of given size $m$?