
Motivation for ABNNR
Can we build a large distance code over a large alphabet using a smaller alphabet code? This is where ABNNR and AEL codes come in.

The ABNNR code addresses this using a repetition code: every symbol of the original codeword is repeated, and the repeated symbols together with an expander graph generate the larger code. The AEL code, instead of just repeating symbols, uses another code to generate an intermediate code (a parity code in the example below); this intermediate code together with an expander graph generates the larger code.

Definition of Expander graphs
Consider a bipartite graph $$G(L,R,E)$$ where $$L$$ is the set of left vertices, $$R$$ is the set of right vertices and $$E$$ is the set of edges between left and right vertices. Every vertex in $$L$$ has degree $$d$$. Let $$|L|=n$$ and $$|R|=m$$, $$m\le n$$. The graph $$G$$ is said to be an $$(n,m,d,\delta,\gamma)$$ expander if every $$S\subseteq L$$ with $$|S|\le\delta n$$ has at least $$\gamma|S|$$ neighbors in $$R$$.
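On small graphs the expansion property can be checked directly by brute force over all subsets. Below is a sketch in Python; the graph and the parameter values are toy choices of my own, not taken from the text:

```python
from itertools import combinations

def is_expander(neighbors, delta, gamma):
    """Brute-force check of the expansion property defined above:
    every left subset S with |S| <= delta*n must have at least
    gamma*|S| distinct neighbors on the right.
    neighbors[v] lists the right vertices adjacent to left vertex v."""
    n = len(neighbors)
    for size in range(1, int(delta * n) + 1):
        for S in combinations(range(n), size):
            nbhd = set().union(*(set(neighbors[v]) for v in S))
            if len(nbhd) < gamma * size:
                return False
    return True

# Toy 2-regular bipartite graph with 4 left and 4 right vertices.
G = [[0, 1], [1, 2], [2, 3], [3, 0]]
print(is_expander(G, delta=0.5, gamma=1.5))  # True
```

Brute force is exponential in $$n$$, so this is only an illustration of the definition, not how expanders are certified in practice (explicit constructions are used instead).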

So in an expander graph, even a small subset of left vertices (of size up to $$\delta n$$) has a large neighborhood on the right: a small fraction of the left vertices already reaches most of the right vertex set.

Consider a code with alphabet size $$q=2^a$$, where each letter can be represented by $$a$$ bits (let us take this alphabet to be $$\mathbb{F}_{q}$$). As an example, consider a repetition code $$(n,k,\delta n)$$ over $$\{0,1\}$$. If we split each codeword into $$\dfrac{n}{a}$$ blocks of $$a$$ consecutive bits, the resulting code can be represented as $$(\dfrac{n}{a},\dfrac{k}{a},\delta\dfrac{n}{a})$$. Note that $$n$$ and $$k$$ reduce to $$\dfrac{n}{a}$$ and $$\dfrac{k}{a}$$ as we are grouping $$a$$ consecutive bits at a time. Consequently the distance can drop by a factor of up to $$a$$.

We can represent the above code using a bipartite graph as follows: the left-hand side of the bipartite graph consists of $$n$$ vertices and the right-hand side consists of $$\dfrac{n}{a}$$ vertices, where each group of $$a$$ consecutive vertices on the left maps to a single vertex on the right. That is, vertices $$1,\ldots,a$$ map to vertex $$1$$ on the right, vertices $$a+1,\ldots,2a$$ map to vertex $$2$$ on the right, and so on.

https://wiki.cse.buffalo.edu/cse545/sites/wiki.cse.buffalo.edu.cse545/files/70/abnnr_1.jpg
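The splitting of a binary codeword into blocks of $$a$$ consecutive bits described above can be sketched as follows (a toy illustration; the bit string is made up):

```python
def fold(codeword_bits, a):
    """Group a binary codeword into blocks of `a` consecutive bits,
    giving a word of length n/a over the alphabet {0,1}^a.
    Assumes a divides the codeword length."""
    assert len(codeword_bits) % a == 0
    return [tuple(codeword_bits[i:i + a])
            for i in range(0, len(codeword_bits), a)]

bits = [1, 0, 1, 1, 0, 0]
print(fold(bits, 2))  # [(1, 0), (1, 1), (0, 0)]
```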

Alon, Bruck, Naor, Naor, and Roth [ABNNR] Construction
The ABNNR construction requires:
1. an $$(n,k,\delta n)$$ code $$C$$ that is a repetition code, and
2. a bipartite $$d$$-regular graph $$G$$ that is a $$(\delta,\gamma)$$ expander.

Step 1: Encode a message of $$k$$ bits into $$n$$ bits using the code $$C$$.
Step 2: Assign each of the $$n$$ bits to a left vertex of the expander graph.
Step 3: Each right vertex has $$d$$ neighbors; assign each right vertex the $$d$$-bit string read off from its $$d$$ neighbors.
Step 4: The elements on the right vertices form the desired codeword.

https://wiki.cse.buffalo.edu/cse545/sites/wiki.cse.buffalo.edu.cse545/files/70/abnnr.jpg
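The aggregation step of ABNNR can be sketched in Python. This is a toy of my own construction (the graph, the inner codeword, and the function name are made-up illustrations, not the full construction):

```python
def abnnr_encode(codeword, right_neighbors):
    """Sketch of the ABNNR aggregation step: `codeword` holds one bit
    per left vertex of the bipartite graph, and right_neighbors[u]
    lists the d left neighbors of right vertex u. Each right vertex
    collects the bits of its d neighbors into one symbol of the new
    alphabet {0,1}^d."""
    return [tuple(codeword[v] for v in nbrs) for nbrs in right_neighbors]

# Toy 2-regular graph: right vertex u sees left vertices u and u+1 (mod 4).
right_nbrs = [[0, 1], [1, 2], [2, 3], [3, 0]]
word = [1, 0, 1, 1]
print(abnnr_encode(word, right_nbrs))  # [(1, 0), (0, 1), (1, 1), (1, 1)]
```

Note how a single non-zero bit on the left makes every right symbol that sees it non-zero; this is the mechanism the distance argument below relies on.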

Additive codes
A code $$C$$ is said to be additive if its alphabet $$E$$ forms a vector space over a ground field $$F_q$$, and $$C$$ is linear over $$F_q$$. For an additive code, the weight of any non-zero codeword is at least the distance of the code.

Since additive codes are linear codes, we argue that for any linear $$[n,k,d]_q$$ code $$C$$, the minimum distance satisfies $$d = \min\, wt(c)$$ over all $$c \in C$$ with $$c \neq 0$$.

To show that $$d$$ equals the minimum weight, we show that $$d$$ is no more than the minimum weight and no less than it.

Consider $$\Delta(0,c')$$ where $$c'$$ is a non-zero codeword in $$C$$ of minimum weight. Its distance from $$0$$ is equal to its weight, thus we have $$d\leq wt(c')$$.

Consider $$c_1\neq c_2\in C$$ such that $$\Delta(c_1,c_2)=d$$. Note that $$c_1-c_2 \in C$$ (since $$C$$ is a linear code). Now $$wt(c_1 - c_2)=\Delta(c_1,c_2)=d$$. Since the non-zero symbols of $$c_1-c_2$$ occur exactly in the positions where the two codewords differ, the minimum Hamming weight of any non-zero codeword in $$C$$ is at most $$d$$.

From the two arguments we conclude that for an additive code the minimum weight of a non-zero codeword equals the distance of the code; in particular, the weight of any non-zero codeword is at least $$d$$.
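The equality of minimum distance and minimum non-zero weight can be verified by brute force on a small linear code. A sketch, assuming a made-up $$[4,2]_2$$ generator matrix of my own choosing:

```python
from itertools import product

def min_distance_and_weight(generator, q=2):
    """Enumerate all codewords of the linear code spanned by the rows
    of `generator` over F_q, then compare the minimum pairwise Hamming
    distance with the minimum weight of a non-zero codeword."""
    k = len(generator)
    codewords = set()
    for msg in product(range(q), repeat=k):
        cw = tuple(sum(m * g for m, g in zip(msg, col)) % q
                   for col in zip(*generator))
        codewords.add(cw)
    nonzero = [c for c in codewords if any(c)]
    min_wt = min(sum(1 for s in c if s) for c in nonzero)
    min_dist = min(sum(1 for a, b in zip(c1, c2) if a != b)
                   for c1 in codewords for c2 in codewords if c1 != c2)
    return min_dist, min_wt

# Toy [4,2] binary code.
G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(min_distance_and_weight(G))  # (2, 2)
```

Both quantities come out equal, as the argument above predicts.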

Theorem 1 Given an $$(n,k,\delta n)$$ code $$C$$ and a bipartite, $$d$$-regular graph $$G$$ that is a $$(\delta,\gamma)$$ expander, the encoding process above creates an $$(n, \dfrac{k}{d}, \gamma\delta n)_{2^d}$$ code.

Proof: We start with $$k$$ bits, which can be interpreted as $$\dfrac{k}{d}$$ elements of $$\{0,1\}^d$$, so the new code reduces the dimension by a factor of $$d$$. Assuming $$C$$ is additive, we know that any non-zero codeword of $$C$$ has weight at least $$\delta n$$. Since $$G$$ is an expander graph, among the $$d$$-tuples obtained at least $$\gamma\delta n$$ are non-zero. Therefore the distance of the final code is at least $$\gamma\delta n$$. Hence this encoding produces an $$(n, \dfrac{k}{d}, \gamma\delta n)_{2^d}$$ code.

Alon, Edmonds and Luby [AEL] code
Construction: In the ABNNR construction, we assigned values to the edges of the expander graph $$G$$ using a repetition code on the vertices of the left partition. In the AEL code, we use another code $$C_0$$ for this purpose. For AEL encoding we define a special type of expander graph.

Definition: Let $$G$$ be a $$d$$-regular, bipartite graph with a set $$L$$ of left vertices and a set $$R$$ of right vertices satisfying $$|L|=|R|=n$$. $$G$$ is a $$(d,\epsilon)$$-uniform graph if $$\forall X \subseteq L$$, $$Y\subseteq R$$ with $$\beta = \dfrac{|X|}{n}$$, $$\gamma=\dfrac{|Y|}{n}$$, the number of edges between $$X$$ and $$Y$$ is $$\geq(\beta \cdot \gamma - \epsilon)\cdot d \cdot n$$.

AEL Construction
The construction requires:
1. an $$(n,k,D)_{q^a}$$ code $$C$$,
2. a $$(d,R\cdot d,\delta \cdot d)_q$$ code $$C_0$$,
3. a $$(d,\epsilon)$$-uniform graph $$G$$ (with $$n$$ left and $$n$$ right vertices), and
4. $$a=R\cdot d$$.

Step 1: Start with a message for $$C$$, i.e. $$k$$ elements of an alphabet of size $$q^a$$; encoding this message gives $$n$$ elements of an alphabet of size $$q^a$$.
Step 2: Assign each of the $$n$$ elements to a left vertex of $$G$$, so each left vertex holds an element of an alphabet of size $$q^a=q^{R\cdot d}$$. This element can act as a message for $$C_0$$.
Step 3: Encode each element using $$C_0$$ to obtain $$d$$ elements from an alphabet of size $$q$$. (In ABNNR all $$d$$ outgoing edges are assigned the same value, i.e. a repetition code; in AEL we use $$C_0$$ to generate the $$d$$ elements for each vertex.)
Step 4: Place one of these $$d$$ elements on each edge leaving the vertex.
Step 5: Assign to each right vertex the $$d$$-tuple of values on its incident edges.

https://wiki.cse.buffalo.edu/cse545/sites/wiki.cse.buffalo.edu.cse545/files/70/ael_1.jpg
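The AEL edge-labelling steps above can be sketched in Python. This is a toy of my own construction: the graph, the tiny inner code standing in for $$C_0$$, and the function names are made-up assumptions, not from the text:

```python
def ael_encode(outer_codeword, inner_encode, left_edges):
    """Toy sketch of the AEL steps above. outer_codeword holds one
    symbol per left vertex, each acting as a message for C0;
    inner_encode maps such a symbol to d alphabet-q elements;
    left_edges[v] is the ordered list of the d right endpoints of
    the edges leaving left vertex v. The j-th inner symbol is placed
    on the j-th edge, and each right vertex collects its incident
    edge labels into one d-tuple."""
    n = len(left_edges)
    buckets = [[] for _ in range(n)]
    for v, sym in enumerate(outer_codeword):
        inner = inner_encode(sym)        # C0 encoding: d elements of F_q
        for j, u in enumerate(left_edges[v]):
            buckets[u].append(inner[j])  # label for edge (v, u)
    return [tuple(b) for b in buckets]

# Inner code standing in for C0: a [2,1] binary repetition code.
rep2 = lambda bit: [bit, bit]
left_edges = [[0, 1], [1, 2], [2, 3], [3, 0]]
print(ael_encode([1, 0, 1, 1], rep2, left_edges))  # [(1, 1), (1, 0), (0, 1), (1, 1)]
```

Swapping `rep2` for any other length-$$d$$ inner encoder recovers the generality of AEL over ABNNR: the repetition code is just one possible $$C_0$$.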

Theorem 4 The AEL construction produces an $$(n, R\cdot k, \delta - \dfrac{\epsilon\cdot n}{D})_{q^d}$$ code (here the third parameter denotes the relative distance).

Proof: $$k$$ elements of size $$q^a$$ each are given as input to the code $$C$$, and $$q^a=q^{R\cdot d}$$; therefore the input message has length $$R\cdot k$$ over an alphabet of size $$q^d$$. Let $$\beta$$ be the relative distance of the code $$C$$, i.e. $$D=\beta\cdot n$$. Since the code outputs $$n$$ tuples of $$d$$ elements each, the code length is $$n$$. It remains to establish a bound on the distance of the code.

Assuming $$C$$ and $$C_0$$ are additive codes, from the definition of $$C$$ a non-zero codeword has length $$n$$ and distance $$D=\beta\cdot n$$, i.e. its weight is at least $$D=\beta\cdot n$$.

In order to bound the weight of the output word, we bound the number of right vertices that have all their incident edges labelled zero. Let $$X$$ be the subset of the left vertices of $$G$$ that are non-zero, so $$|X|\geq \beta\cdot n$$. Let $$Y$$ be the set of right vertices of $$G$$ that are assigned the value $$0$$ (all incident edges are $$0$$).

Since $$C_0$$ has relative distance $$\delta$$, every element of $$X$$ has at most $$(1- \delta)\cdot d$$ of its edges set to zero. Thus the number of edges leaving $$X$$ that are labelled zero is $$\leq |X|\cdot (1-\delta)\cdot d$$. Since every member of $$Y$$ is labelled $$0$$, the number of edges from $$X$$ to $$Y$$ is $$\leq |X|\cdot (1-\delta)\cdot d$$.

We also have that $$G$$ is a $$(d,\epsilon)$$ uniform graph which means that number of edges from $$X$$ to $$Y \geq ( \dfrac{ |X|\cdot |Y|}{n^2} - \epsilon) \cdot  d \cdot n$$.

From the above two statements we get:

$$( \dfrac{ |X|\cdot |Y|}{n^2} - \epsilon) \cdot d \cdot n \leq$${number of edges from X to Y } $$\leq |X|\cdot (1-\delta) \cdot d$$.

This gives the following bound on $$|Y|$$ :

$$|Y|\leq( 1 - \delta + \dfrac {\epsilon\cdot n}{|X|})\cdot n$$

The minimum number of non-zero right-vertex symbols is $$n-|Y| \geq ( \delta - \dfrac {\epsilon\cdot n}{|X|}) \cdot n$$.

So the relative distance is at least $$\delta - \dfrac {\epsilon \cdot n}{|X|} \geq \delta - \dfrac{\epsilon\cdot n}{D}$$, using $$|X|\geq \beta\cdot n = D$$. Therefore AEL constructs an $$( n,R \cdot k,\delta - \dfrac{\epsilon\cdot n}{D})_{q^d} $$ code.

Decoding AEL code
Decoding is done by reversing the encoding process: starting from the output word (the final codeword over the large alphabet) we traverse backward along the edges of the graph $$G$$ and form a candidate codeword of $$C_{0}$$ for each vertex on the left side of the bipartite graph. We then use the decoding algorithm for $$C_{0}$$ to recover the $$n$$ left-vertex symbols, and then apply the decoding algorithm for $$C$$ to get the original message back.

Summarizing the above:

Step 1: Traverse along the edges from each right vertex to its $$d$$ neighbors.
Step 2: Using the edge labels, form a candidate codeword for each vertex on the left side.
Step 3: Apply the decoding algorithm of $$C_{0}$$ to recover the left-vertex symbols.
Step 4: Apply the decoding algorithm of $$C$$ to recover the original message.
(Steps 3 and 4 assume that the decoders of $$C_{0}$$ and $$C$$ run in linear time; the overall decoding is linear only if both are, since even one non-linear decoder makes the overall algorithm non-linear.) The theorem below shows that there exist $$q$$-ary codes satisfying the parameters below that can be decoded in linear time from a $$\leq \dfrac{\delta}{2}$$ fraction of errors.
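The edge-reversal in the first two decoding steps can be sketched as follows. The helper name and the toy graph are my own assumptions; the received word is the output of a small AEL-style encoding in which right vertices collected edge labels by visiting left vertices in increasing order:

```python
def ael_gather(received, left_edges):
    """Route each coordinate of a received right-vertex tuple back
    along its edge, so every left vertex reassembles the (possibly
    corrupted) C0 codeword that the decoder of C0 is then run on.
    left_edges[v] lists the ordered right endpoints of v's edges,
    and each right tuple is assumed to hold its labels in the order
    left vertices were processed during encoding."""
    n = len(left_edges)
    slot = [0] * n            # next unread position in each right tuple
    candidates = []
    for v in range(n):
        word = []
        for u in left_edges[v]:
            word.append(received[u][slot[u]])
            slot[u] += 1
        candidates.append(word)
    return candidates

left_edges = [[0, 1], [1, 2], [2, 3], [3, 0]]
received = [(1, 1), (1, 0), (0, 1), (1, 1)]  # toy received word
print(ael_gather(received, left_edges))  # [[1, 1], [0, 0], [1, 1], [1, 1]]
```

Each returned list is then handed to the decoder of $$C_{0}$$ (Step 3); corrupted right symbols show up as errors inside these candidate words.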

Theorem 5 (Guruswami and Indyk): For all $$R,\delta$$ satisfying $$R + \delta < 1$$, there is an $$n_{0} < \infty$$ such that for all $$n > n_{0}$$ there exist $$q$$-ary codes of rate $$R$$ with relative distance $$\geq \delta$$ that are uniquely decodable from a $$\leq \dfrac {\delta}{2}$$ fraction of errors in linear time.

Proof: As mentioned above, by reversing the encoding we can recover the message. The following assumptions are made in the proof of the theorem:

- The final output codeword has $$\tau \cdot n$$ errors.
- We have a constant-time algorithm to decode $$C_{0}$$.
- We have a linear-time algorithm to decode $$C$$ if there are $$\leq \dfrac{\beta \cdot n}{2}$$ errors.

Sketch: Take the bipartite graph used in the encoding process and follow the edges from the right vertices to the left vertices; the errors in the received word propagate to the left vertices as we traverse the edges. We can uniquely decode a left vertex if the number of errors among its incident edges is $$< \dfrac{\delta\cdot d}{2}$$, i.e. fewer than half the distance of $$C_{0}$$. So let $$X'$$ be the set of left vertices with $$\geq \dfrac{\delta\cdot d}{2}$$ errors, and let $$Y'$$ be the set of right vertices that are uncorrupted, so $$|Y'| = (1-\tau)\cdot n$$ (the $$\tau\cdot n$$ corrupted symbols and the $$(1-\tau)\cdot n$$ correct symbols together account for all $$n$$ output symbols).

Since the graph $$G$$ is a $$(d,\epsilon)$$-uniform graph, there are at least $$(\dfrac{|X'|\cdot(1-\tau)}{n}-\epsilon)\cdot d\cdot n$$ edges between $$X'$$ and $$Y'$$. But every vertex in $$X'$$ has at least $$\dfrac{\delta\cdot d}{2}$$ errors, which means each vertex in $$X'$$ has at most $$(1-\dfrac{\delta}{2})\cdot d$$ neighbors in $$Y'$$. We can then show that:

$$(\dfrac{|X'|\cdot(1-\tau)}{n}-\epsilon)\cdot d\cdot n \leq$$ (number of edges from $$X'$$ to $$Y'$$) $$\leq (1-\dfrac{\delta}{2})\cdot d\cdot |X'|$$, which on simplification gives: $$|X'|\leq\dfrac{\epsilon\cdot n}{\dfrac{\delta}{2}-\tau}$$

Choose $$\tau\leq\dfrac{\delta}{2}-\dfrac{2\cdot\epsilon}{\beta}$$, so that $$\dfrac{\delta}{2}-\tau\geq\dfrac{2\cdot\epsilon}{\beta}$$ (recall $$\beta=\dfrac{D}{n}$$ is the relative distance of $$C$$, whose decoder is assumed to handle up to $$\dfrac{D}{2}=\dfrac{\beta\cdot n}{2}$$ errors). Substituting into the bound above,

we get: $$|X'|\leq\dfrac{\epsilon\cdot n}{2\cdot\epsilon/\beta}=\dfrac{\beta \cdot n}{2}$$

Therefore we can decode from a fraction $$\tau = \dfrac{\delta}{2}-\dfrac{2\cdot\epsilon}{\beta}$$ of errors in linear time by choosing $$C,C_{0}$$ appropriately.