
Emmanuel Abbe: Erdal Arıkan’s Polar Codes

Click here for the most recent polymath3 research thread. A new thread is coming soon.

Emmanuel Abbe and Erdal Arıkan

This post is authored by Emmanuel Abbe

A new class of codes, called polar codes, recently made a breakthrough in coding theory.

In his seminal work of 1948, Shannon characterized the highest rate (speed of transmission) at which one can reliably communicate over a discrete memoryless channel (a noise model); he called this limit the capacity of the channel. However, his proof used a probabilistic method and left open the problem of reaching this capacity with coding schemes of manageable complexity. In the 90’s, codes with promising results in that direction were found (turbo codes, and the rediscovered LDPC codes). However, mathematical proofs could only be provided for a few specific channels (pretty much only for the so-called binary erasure channel). In 2008, Erdal Arıkan at Bilkent University invented polar codes, providing a new mathematical framework to solve this problem.

Besides allowing rigorous proofs of coding theorems, an important attribute of polar codes is, in my opinion, that they bring a new perspective on how to handle randomness (beyond the channel coding problem). Indeed, after a couple of years of digestion of Arıkan’s work, it appears that there is a rather general phenomenon underneath the polar coding idea. The technique consists in applying a specific linear transform, constructed from many Kronecker products of a well-chosen small matrix, to a high-dimensional random vector (some assumptions are required on the vector distribution, but let’s keep a general framework for now). The polarization phenomenon, if it occurs, then says that the transformed vector can be split into two parts (two groups of components): one of maximal randomness and one of minimal randomness (nearly deterministic). The polarization terminology comes from this antagonism. We will see a specific example below. A remarkable point is that the separation procedure, as well as the algorithm that reconstructs the original vector from the purely random components, have low complexity (nearly linear). On the other hand, it is still an open problem to characterize mathematically whether a given component belongs to the random or to the deterministic part. But there exist tractable algorithms to figure this out accurately.
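To make the "specific linear transform" concrete before the example below, here is a minimal sketch (in Python/NumPy) of the Kronecker-power construction applied to a binary vector; the function names are mine, and the choice of the 2×2 kernel (Arıkan’s kernel, introduced in the next paragraph) is illustrative rather than taken from the post.

```python
import numpy as np

def kronecker_power(kernel, m):
    """m-fold Kronecker product of a small 0/1 kernel matrix, reduced mod 2."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(m):
        G = np.kron(G, kernel) % 2
    return G

# Arikan's 2x2 kernel (the "well-chosen small matrix" of the post).
G2 = np.array([[1, 0],
               [1, 1]], dtype=np.uint8)

m = 3                                                 # n = 2**m = 8 components
Gn = kronecker_power(G2, m)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2**m, dtype=np.uint8)     # a binary random vector
u = x @ Gn % 2                                        # its transform over GF(2)
print(x, u, sep="\n")
```

In practice one would not form the dense matrix at all: the recursive Kronecker structure lets the transform be computed with on the order of n log n XOR operations, which is what makes the nearly linear complexities mentioned above possible.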

Let us consider the simplest setting. Let $X_1,...,X_n$ be i.i.d. Bernoulli($p$) and assume that $n$ is a power of 2. Define $G_n$ to be the matrix obtained by taking $\log_2(n)$ Kronecker products of $G_2=\begin{pmatrix}1&0\\1&1\end{pmatrix}$, and multiply $X=(X_1,...,X_n)$ with $G_n$ over $GF(2)$ to get $U=(U_1,...,U_n)$. Note that $U$ has the same entropy as $X$, since $G_n$ is invertible (it is its own inverse). However, while the entropy of $X$ is uniformly spread out over its components, i.e., $H(X)=nH(X_1)$, the entropy of $U$ may not be, since its components are correlated. In any case, we can write $H(U)= \sum_i H(U_i|U^{i-1})$, where $U^{i-1}=(U_1,...,U_{i-1})$ are the `past components’. The polarization phenomenon then says that, except for a vanishing fraction of indices $i$ (with respect to $n$), each term $H(U_i|U^{i-1})$ tends to either 0 or 1.
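As a small sanity check of this claim, here is a hedged sketch (my own code, not part of Arıkan’s construction) that computes the conditional entropies $H(U_i|U^{i-1})$ exactly for a small $n$ by enumerating all $2^n$ source vectors; already at $n=16$ most of the values begin to drift toward 0 or 1.

```python
import numpy as np
from itertools import product

def polar_transform_matrix(n):
    """G_n: the log2(n)-fold Kronecker power of G_2 = [[1,0],[1,1]], over GF(2)."""
    G2 = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    Gn = np.array([[1]], dtype=np.uint8)
    while Gn.shape[0] < n:
        Gn = np.kron(Gn, G2) % 2
    return Gn

def conditional_entropies(n, p):
    """Exact H(U_i | U^{i-1}) for U = X G_n, X i.i.d. Bernoulli(p), by full enumeration."""
    Gn = polar_transform_matrix(n)
    probs = {}                                 # joint distribution of U
    for bits in product([0, 1], repeat=n):
        x = np.array(bits, dtype=np.uint8)
        u = tuple(x @ Gn % 2)
        px = float(np.prod(np.where(x == 1, p, 1.0 - p)))
        probs[u] = probs.get(u, 0.0) + px

    def prefix_entropy(i):                     # H(U^i) = H(U_1, ..., U_i)
        marg = {}
        for u, q in probs.items():
            marg[u[:i]] = marg.get(u[:i], 0.0) + q
        return -sum(q * np.log2(q) for q in marg.values() if q > 0)

    H = [prefix_entropy(i) for i in range(n + 1)]
    # chain rule: H(U_i | U^{i-1}) = H(U^i) - H(U^{i-1})
    return [H[i] - H[i - 1] for i in range(1, n + 1)]

if __name__ == "__main__":
    for i, h in enumerate(conditional_entropies(n=16, p=0.11), start=1):
        print(f"H(U_{i} | U^{i-1}) = {h:.3f}")
```

The conditional entropies sum to $nH(X_1)$ by the chain rule, so the transform only redistributes the entropy; polarization proper is an asymptotic statement as $n$ grows.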
