# Extremal Combinatorics III: Some Basic Theorems

.

Shattering

Let us return to extremal problems for families of sets and describe several basic theorems and basic open problems. In the next part we will discuss a nice proof technique called “shifting” or “compression.”

## The Sauer-Shelah (-Perles -Vapnik-Chervonenkis) Lemma:

(Here we write $N = \{1,2,\dots,n\}$.) A family ${\cal F} \subset 2^N$ shatters a set $S \subset N$ if for every $R \subset S$ there is $F \in {\cal F}$ such that $S \cap F=R$. Note that in order to shatter a set of size $m$  you need at least $2^m$ sets. But how many sets will guarantee that you shatter a set with $m$ elements? The Sauer-Shelah Lemma answers this question!

Theorem (Sauer-Shelah): If  $\cal F > {n \choose 0} + {n \choose 1} + \dots + {n \choose k}$  then there exists a set $S$, $|S|=k+1$ such that $\cal F$ shatters $S$.

Hmm LaTeX can go wild, let me try that again:

Theorem (Sauer-Shelah): If  $|{\cal F}| > {n \choose 0} + {n \choose 1} + \dots + {n \choose k}$ then there exists a set $S$, $|S|=k+1$ such that $\cal F$ shatters $S$.

(This is much better.)

The simplest way to prove this theorem is by induction. Here is a variation I just heard from Noga. Proof (a little sketchy): Prove a slightly stronger fact that every family $\cal F$ shatters at least as many sets as $|{\cal F}|$. Consider the subfamily ${\cal F}_0$ of sets in the family not containing $'1'$. By induction ${\cal F}_0$ shatters at least as many subsets of $N' = \{2,3, \dots , n\}$ as $|{\cal F}_0|$. Next consider ${\cal F}_1 =$ $\{S \in {\cal F}: 1 \in S\}$, and ${\cal F}'_1$ = $\{S \backslash \{1\}:$ $S \in {\cal F}, 1 \in S \}$.  By induction ${\cal F}'_1$ shatters as many subsets of $\{2,3,\dots,n\}$ as its cardinality. The number of subsets of $N'$ shattered by ${\cal F}_0$ and ${\cal F}'_1$ sum up to at least $|{\cal F}_0|$+$|{\cal F}'_1|$ = $|{\cal F}|$, and every subset of $N'$ shattered by ${\cal F}'_1$ is shattered by ${\cal F}_1 \subset {\cal F}$. Sababa? not quite! there may be subsets $R \subset N'$ shattered by both ${\cal F}_0$ and ${\cal F}'_1$. But note that in this case both $R$ and latex $R \cup \{1\}$ are shattered by $\cal F$. walla!

OK, let me give a few more words of explanation about the proof. When I say “every family” do I mean “every family that satisfies the condition of the theorem?” (And If I do why can I use induction?) No, i really mean every family. The point is that if every family shatters  at least as many sets as its size then a family which satisfies  $|{\cal F}| > {n \choose 0} + {n \choose 1} + \dots + {n \choose k}$ shatters more sets than this sum. But then it must shatter a set of size larger than $k$ because there are not enough subsets of the set $\{1,2,\dots,n\}$ of size at most $k$.

One additional word about the proof. Considering the sets containing ‘1’ and those not containing ‘1’ and then applying an inductive argument which can be simple or extremely complicated, direct or involving several additional methods, is a very basic proof method in extremal set theory.

This theorem can be described as an “eigentheorem” because it is important in many different places. It was mentioned (with an algebraic proof by Frankl and Pach) in Gowers’ blog and also, in another context, in Kowalski’s blog. Sauer proved it in response to a problem of Erdos. Shelah (with Perles) proved it as a useful lemma for Shelah’s theory of stable models. (At some later time, Benjy Weiss asked Perles about such a result in the context of ergodic theory and Perles who forgot that he proved it once proved it again.) Vapnik and Chervonenkis proved it in the context of statistical learning theory. The VC-dimension of a family of sets is the size of the largest set the family shatters. This notion plays an important role in learning theory, statistics and probability theory. There are several applications of the Sauer-Shelah theorem to analysis and to the geometry of Banach spaces, and there are several extensions and refinements.

## The Kruskal-Katona Theorem

We let $\bf N$ denote the set of positive integers and let ${{\bf N} \choose {k}}$  = $\{S: S \subset {\bf N}, |S|=k\}$.

Let ${\cal F}$ be a subset of ${{\bf N} \choose {k}}$. The Shadow $\partial {\cal F}$ is the set of all (k-1)-subsets of $\bf N$  which are contained in at least one set in $\cal F$.

We would like to answer the following question. How small can the Shadow of a family of m k-sets be? The precise answer to this question is given by the Kruskal-Katona’s theorem.