Emmanuel Abbe and Erdal Arıkan
This post is authored by Emmanuel Abbe
A new class of codes, called polar codes, recently made a breakthrough in coding theory.
In his seminal work of 1948, Shannon had characterized the highest rate (speed of transmission) at which one could reliably communicate over a discrete memoryless channel (a noise model); he called this limit the capacity of the channel. However, he used a probabilistic method in his proof and left open the problem of reaching this capacity with coding schemes of manageable complexities. In the 90’s, codes were found (turbo codes and LDPC rediscovered) with promising results in that direction. However, mathematical proofs could only be provided for few specific channel cases (pretty much only for the so-called binary erasure channel). In 2008, Erdal Arıkan at Bilkent University invented polar codes, providing a new mathematical framework to solve this problem.
Besides allowing rigorous proofs for coding theorems, an important attribute of polar codes is, in my opinion, that they bring a new perspective on how to handle randomness (beyond the channel coding problem). Indeed, after a couple of years of digestion of Arıkan’s work, it appears that there is a rather general phenomenon underneath the polar coding idea. The technique consist in applying a specific linear transform, constructed from many Kronecker products of a well-chosen small matrix, to a high-dimensional random vector (some assumptions are required on the vector distribution but let’s keep a general framework for now). The polarization phenomenon, if it occurs, then says that the transformed vector can be split into two parts (two groups of components): one of maximal randomness and one of minimal one (nearly deterministic). The polarization terminology comes from this antagonism. We will see below a specific example. But a remarkable point is that the separation procedure as well as the algorithm that reconstructs the original vector from the purely random components have low complexities (nearly linear). On the other hand, it is still an open problem to characterize mathematically if a given component belongs to the random or deterministic part. But there exist tractable algorithms to figure this out accurately.
Let us consider the simplest setting. Let be i.i.d. Bernoulli() and assume that $n$ is a power of 2. Define to be the matrix obtained by taking Kronecker products of , and multiply with over to get . Note that has same entropy as , since is invertible (its inverse is itself). However, if the entropy of is uniformly spread out over its components, i.e., , the entropy of may not be, since its components are correlated. In any case, we can write where are the `past components’. The polarization phenomenon then says that, except for a vanishing fraction of indices (w.r. to ), each term tends to either 0 or 1.
Here denotes the Shannon conditional entropy (it is not a random variable like the conditional expectation but a scalar); if you don’t know the definition, what matters is to know that ` close to 0′ means that is nearly a deterministic function of , and ` close to 1′ means that is nearly independent of and uniformly distributed.
Also note that defining to be the set of indices where is close to 1, we have that restricted on is roughly i.i.d. Bernoulli(1/2). Thus, randomness has been extracted, and what is remarkable is that the compression procedure (the matrix multiplication) and the reconstruction algorithm (recovering the deterministic entries from the pure bits) can be done in . This is a consequence of the Kronecker structure of , which allows the use of `divide and conquer’ algorithms (a la FFT). The error probability for the reconstruction is roughly .
Over the past two years, this result has been extended to random variables valued in , to some mixing processes, to some universal settings and to some multi-dimensional objects. For example, if the ‘s are i.i.d. random vectors of dimension rather than random variables, we can use a (component-wise) polarization transform that produces again special ‘s, which are, conditioned on the past, equivalent to linear forms of independent pure bits. The number of extremal structures is hence no longer 2 as for but is equal to the number of linear forms over . This also raised interesting connections with combinatorics topics (in particular I found matroid theory to be useful in this problem).
The original paper is “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels“, published in July 2009 in the IEEE Transactions on Information Theory. Arıkan received the IEEE Information Theory best paper award this year. This paper deals with the polarization of `channels’, whereas here I discussed the polarization of `sources’ (less general but same technique).
Emmanuel Abbe, firstname.lastname@example.org
Additional links: historical take on polar codes;