User:Ahughes6/Example/Testing

d-separable
Definition: A $$t$$ x $$n$$ matrix $$M$$ is $$d$$-separable if and only if for all $$S_1 \neq S_2 \subseteq [n]$$ with $$|S_1|,|S_2| \leq d$$ we have $$\bigcup_{j \in S_1} M_j \neq \bigcup_{i \in S_2} M_i$$, where $$M_j$$ denotes the $$j^{th}$$ column of $$M$$, viewed as the set of rows in which it has a 1.

Decoding algorithm
First we describe another way to look at the group testing problem, using a notation that makes decoding easier to state. Group testing can be interpreted as follows:

Group Testing: Given input $$M$$ and $$\mathbf{r}$$ such that $$\mathbf{r} = M \mathbf{x}$$ (where each $$\mathbf{r}_i$$ is the logical OR of the $$\mathbf{x}_j$$'s with $$M_{i,j} = 1$$), output $$\mathbf{x}$$.
 * Take $$ M_j $$ to be the $$ j^{th} $$ column of $$M$$
 * Define $$ S_{M_j} \subseteq [t] $$ so that $$ M_j(i) = 1 $$ if and only if $$ i \in S_{M_j} $$
 * This gives that $$ S_\mathbf{r} = \bigcup_{j \in [n], \mathbf{x}_j = 1} S_{M_j} $$
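The set view above can be checked on a small instance. The following is a minimal sketch, assuming 0-based indices and a hypothetical 3 x 4 matrix chosen only for illustration; the helper names `column`, `support` and `boolean_product` are not from the source.

```python
def column(M, j):
    """Return the j-th column of M (0-based) as a list."""
    return [row[j] for row in M]

def support(v):
    """Return the set of (0-based) positions where the 0/1 vector v equals 1."""
    return {i for i, bit in enumerate(v) if bit}

def boolean_product(M, x):
    """Compute r = Mx with Boolean arithmetic: r_i is the OR over j of (M[i][j] AND x[j])."""
    return [int(any(m and b for m, b in zip(row, x))) for row in M]

# Hypothetical testing matrix and input vector, chosen only for illustration.
M = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1, 0, 1, 0]

r = boolean_product(M, x)

# S_r should equal the union of the column supports S_{M_j} over all j with x_j = 1.
union = set()
for j in support(x):
    union |= support(column(M, j))

print(r, support(r) == union)
```

Here the support of `r` coincides with the union of the supports of the columns selected by `x`, which is exactly the identity stated in the last bullet.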

This formalizes the relation between $$\mathbf{x}$$, the columns of $$M$$, and $$\mathbf{r}$$ in a way better suited to reasoning about $$d$$-separable and $$d$$-disjunct matrices. The algorithm to decode a $$d$$-separable matrix is as follows:

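Given a $$t$$ x $$n$$ matrix $$M$$ that is $$d$$-separable and a result vector $$\mathbf{r}$$, the following algorithm runs in time $$n^{\mathcal{O}(d)}$$, since there are $$n^{\mathcal{O}(d)}$$ subsets to try. Because $$M$$ is $$d$$-separable, at most one subset can match.
 * 1) For each $$T \subseteq [n]$$ such that $$|T| \leq d$$ check if $$S_\mathbf{r} = \bigcup_{j \in T} S_{M_j} $$; if so, output $$T$$ as the set of positions $$j$$ with $$\mathbf{x}_j = 1$$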
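The exhaustive search can be sketched as follows. This is an illustrative implementation under the assumption of 0-based indices; the function name `decode_separable` and the 4 x 4 example matrix are hypothetical, not taken from the source.

```python
from itertools import combinations

def decode_separable(M, r, d):
    """Brute-force decoder: try every T with |T| <= d and return the one whose
    column supports union to S_r. Returns None if no such T exists."""
    t, n = len(M), len(M[0])
    S_r = {i for i in range(t) if r[i]}
    col_support = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for size in range(d + 1):
        for T in combinations(range(n), size):
            union = set().union(*(col_support[j] for j in T))
            if union == S_r:
                return set(T)
    return None

# Hypothetical matrix with distinct nonzero columns, hence 1-separable.
M = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

print(decode_separable(M, [0, 1, 0, 1], d=1))
```

The loop visits every subset of size at most $$d$$, which is where the $$n^{\mathcal{O}(d)}$$ running time comes from.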

d-disjunct
In the literature, disjunct matrices are also called superimposed codes and $$d$$-cover-free families.

Definition: A $$t$$ x $$n$$ matrix $$M$$ is $$d$$-disjunct if $$\forall S \subseteq [n]$$ such that $$|S| \leq d$$ and $$\forall j \notin S$$, $$ \exists i$$ such that $$M_{i,j} = 1$$ but $$\forall k \in S, M_{i,k} = 0$$. Denoting by $$ M_a $$ the $$a^{th}$$ column of $$M$$ and by $$S_{M_a} \subseteq [t]$$ the set for which $$ M_a(b) = 1 $$ if and only if $$ b \in S_{M_a} $$, this says that $$M$$ is $$d$$-disjunct if and only if $$ S_{M_j} \not\subseteq \bigcup_{k \in S} S_{M_k}$$ for every such $$S$$ and $$j$$.
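For small matrices the definition can be tested directly. The sketch below is a brute-force check, assuming 0-based indices; the name `is_d_disjunct` and the example matrix are hypothetical. It only examines sets $$S$$ of the largest allowed size, which suffices because a union over a smaller set is contained in the union over a larger one.

```python
from itertools import combinations

def is_d_disjunct(M, d):
    """Return True if no column support is contained in the union of the
    supports of any d other columns (brute force, exponential in d)."""
    t, n = len(M), len(M[0])
    supp = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for j in range(n):
        others = [k for k in range(n) if k != j]
        # Sets of maximal size are enough: unions only grow when S grows.
        for S in combinations(others, min(d, len(others))):
            covered = set().union(*(supp[k] for k in S))
            if supp[j] <= covered:
                return False
    return True

# Hypothetical 4 x 4 matrix: every column has two 1's and no column contains another.
M = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

print(is_d_disjunct(M, 1), is_d_disjunct(M, 2))
```

For this matrix the first column is covered by the union of the second and third columns, so it is 1-disjunct but not 2-disjunct.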

Claim: $$M$$ is $$d$$-disjunct implies $$M$$ is $$d$$-separable

Proof: (by contradiction) Let $$M$$ be a $$t$$ x $$n$$ $$d$$-disjunct matrix. Assume for contradiction that $$M$$ is not $$d$$-separable. Then there exist $$T_1, T_2 \subseteq [n]$$ with $$T_1 \neq T_2$$ and $$|T_1|,|T_2| \leq d$$ such that $$\bigcup_{i \in T_1} S_{M_i} = \bigcup_{i \in T_2} S_{M_i}$$. Since $$T_1 \neq T_2$$, we may assume without loss of generality that $$T_2 \setminus T_1$$ is nonempty (otherwise swap the two sets). Then $$ \exists j \in T_2 \setminus T_1 $$ such that $$ S_{M_j} \subseteq \bigcup_{k \in T_1} S_{M_k}$$. Since $$|T_1| \leq d$$ and $$j \notin T_1$$, this contradicts the fact that $$M$$ is $$d$$-disjunct. Therefore $$M$$ is $$d$$-separable. $$\Box $$

Decoding Algorithm
The algorithm for $$d$$-separable matrices runs in time $$n^{\mathcal{O}(d)}$$, which is polynomial in $$n$$ only for constant $$d$$. The following gives a faster algorithm for $$d$$-disjunct matrices, in which, given our bounds for $$t$$, $$d$$ enters the running time as a multiplicative factor instead of as an exponent. The algorithm is given in the proof of the following lemma:

Lemma 1: There exists an $$\mathcal{O}(nt)$$ time decoding for any $$d$$-disjunct $$t$$ x $$n$$ matrix.
 * Observation 1: For any matrix $$M$$ and given $$M\mathbf{x} = \mathbf{r}$$, if $$\mathbf{r}_i = 1 $$ then $$ \exists j $$ such that $$ M_{i,j} = 1 $$ and $$ \mathbf{x}_j = 1 $$, where $$ 1 \leq i \leq t $$ and $$ 1 \leq j \leq n $$. Conversely, if $$\mathbf{r}_i = 0 $$ then $$ \forall j $$, if $$ M_{i,j} = 1 $$ then $$ \mathbf{x}_j = 0 $$. This is the case because $$\mathbf{r}_i$$ is the logical OR of the $$ \mathbf{x}_j$$'s where $$ M_{i,j} = 1 $$.
 * Observation 2: For any $$d$$-disjunct matrix $$M$$ and any set $$ T = \{j | \mathbf{x}_j = 1\} $$ with $$ |T| \leq d $$, for each $$ j \notin T $$ where $$ 1 \leq j \leq n $$ there exists some $$i$$ where $$ 1 \leq i \leq t $$ such that $$ M_{i,j} = 1$$ but $$ M_{i,l} = 0 \text{ }\forall l \in T$$. For this row $$\mathbf{r}_i = 0 $$, which certifies that $$\mathbf{x}_j = 0$$.

Proof of Lemma 1: Given as input $$ \mathbf{r} \in \{0,1\}^t $$ and $$ M $$, use the following algorithm:
 * 1) For each $$ j \in [n] $$ set $$\mathbf{x}_j = 1 $$
 * 2) For $$ i = 1 \ldots t $$, if $$ \mathbf{r}_i = 0 $$ then for all $$ j \in [n] $$, if $$ M_{i,j} = 1 $$ set $$ \mathbf{x}_j = 0 $$

By Observation 1, step 2 only sets $$ \mathbf{x}_j $$ to 0 when some row $$i$$ has $$ M_{i,j} = 1 $$ and $$ \mathbf{r}_i = 0 $$, which is impossible if $$ \mathbf{x}_j$$ is supposed to be 1. Therefore step 2 never assigns the value 0 to such an $$ \mathbf{x}_j $$, leaving it as a 1. By Observation 2, for every $$ \mathbf{x}_j$$ that is supposed to be 0 there is at least one $$i$$ such that $$ M_{i,j} = 1 $$ and $$ \mathbf{r}_i = 0 $$, so step 2 sets that $$ \mathbf{x}_j $$ to 0. The output is therefore exactly $$\mathbf{x}$$. Step 2 inspects each entry of $$M$$ at most once, so this takes time $$ \mathcal{O}(nt) $$ overall. $$\Box$$
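The two steps translate almost line for line into code. This is a minimal sketch, assuming 0-based indices and a hypothetical 1-disjunct example matrix; the function name `decode_disjunct` is not from the source.

```python
def decode_disjunct(M, r):
    """O(nt) decoder for a d-disjunct matrix M and outcome vector r."""
    t, n = len(M), len(M[0])
    x = [1] * n                      # step 1: start with every x_j = 1
    for i in range(t):               # step 2: each negative test clears its items
        if r[i] == 0:
            for j in range(n):
                if M[i][j] == 1:
                    x[j] = 0
    return x

# Hypothetical 4 x 4 matrix in which no column contains another (1-disjunct).
M = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

# Outcome vector produced by x = [0, 0, 1, 0].
print(decode_disjunct(M, [0, 1, 0, 1]))
```

The nested loops touch each of the $$nt$$ entries at most once, matching the running time claimed in Lemma 1. The result is only guaranteed to be correct when the number of ones in $$\mathbf{x}$$ is at most $$d$$.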

Upper Bounds for Non-Adaptive Group Testing
The results for these upper bounds rely mostly on the properties of $$d$$-disjunct matrices. Besides giving good upper bounds, $$d$$-disjunct matrices also come with the efficient decoding algorithm of Lemma 1. First the following lemma is proved, since both constructions rely on it:

Lemma 2: Let $$ M $$ be a $$t$$ x $$n$$ matrix and let $$a_{max} \leq w_{min} \leq t $$ be integers. If $$M$$ satisfies the two conditions below, then $$M$$ is $$d'$$-disjunct, where $$ d' = \lfloor \frac{w_{min} - 1}{a_{max}} \rfloor $$.
 * 1) $$ \forall j \in [n] \text{, } |S_{M_j}| \geq w_{min} $$
 * 2) $$ \forall i \neq j \in [n] \text{, } |S_{M_i} \cap S_{M_j}| \leq a_{max} $$

Note: these conditions are stronger than a condition on subsets of size $$d$$, since they apply to every column and every pair of columns in the matrix. Whichever column $$i$$ is chosen, it contains at least $$w_{min}$$ 1's, and any two columns share at most $$a_{max}$$ 1's.

Proof of Lemma 2: Let $$ d' = \lfloor \frac{w_{min} - 1}{a_{max}} \rfloor $$ and fix an arbitrary $$ S \subseteq [n] $$ with $$ |S| \leq d' $$ and an arbitrary $$ j \notin S $$. Say that there is a match between $$i \in S \text{ and } j \notin S$$ in a given row if column $$i$$ has a 1 in the same row position as column $$j$$. By condition 2 each $$i \in S$$ contributes at most $$a_{max}$$ matches, so the total number of matches is $$ \leq a_{max} \cdot d' \leq a_{max} \cdot (\frac{w_{min} - 1}{a_{max}}) = w_{min} - 1 < \text{ } w_{min} $$, i.e. column $$j$$ has fewer matches than it has ones (by condition 1). Therefore there must be a row with all 0s in the columns of $$S$$ but a 1 in column $$j$$. $$\Box $$
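To see the lemma at work on a concrete matrix, one can compute $$w_{min}$$ and $$a_{max}$$ directly and read off the guaranteed disjunctness. The following sketch assumes at least two columns; the function name `lemma2_parameters` and the example matrix are hypothetical, and the case $$a_{max} = 0$$ (pairwise disjoint columns, where the quotient is undefined) is reported as `None`.

```python
def lemma2_parameters(M):
    """Return (w_min, a_max, d_prime), where d_prime = floor((w_min - 1) / a_max)
    is the disjunctness guaranteed by Lemma 2 (None when a_max == 0)."""
    t, n = len(M), len(M[0])
    supp = [{i for i in range(t) if M[i][j]} for j in range(n)]
    w_min = min(len(s) for s in supp)
    a_max = max(len(supp[i] & supp[j])
                for i in range(n) for j in range(i + 1, n))
    d_prime = (w_min - 1) // a_max if a_max > 0 else None
    return w_min, a_max, d_prime

# Hypothetical 4 x 4 matrix: column weight 2, any two columns share at most one row.
M = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

print(lemma2_parameters(M))
```

The value returned is only a lower bound on the disjunctness: a matrix may be $$d$$-disjunct for a larger $$d$$ than the lemma certifies.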

We will now generate constructions for the bounds.

Randomized Construction
This first construction uses a probabilistic argument, in particular the Chernoff bound, to show the desired property. This randomized construction gives $$ t(d,n) \leq \mathcal{O}(d^2 \log{n}) $$. The following theorem gives the result needed.

Theorem 1: There exists a random $$d$$-disjunct matrix with $$\mathcal{O}(d^2 \log{n}) $$ rows.

Proof of Theorem 1: Begin by building a random $$t$$ x $$n$$ matrix $$ M $$ with $$ t = cd^2 \log{n} $$ (where $$c$$ will be picked later). It will be shown that $$M $$ is $$\Omega(d)$$-disjunct. Each entry $$M_{i,j} \in \{0,1\}$$, and we let $$M_{i,j} = 1$$ independently with probability $$\frac{1}{d}$$ for $$i \in [t] $$ and $$j \in [n] $$. Now fix $$ j \in [n] $$. Denote by $$T_j \subseteq [t] $$ the set of rows in which the $$j^{th}$$ column of $$M$$ has a 1. Then the expectation is $$\mathbb{E}[|T_j|] = \frac{t}{d}$$. Using the Chernoff bound with deviation parameter $$\delta = \frac{1}{2} $$ gives $$ \mathrm{Pr}[ |T_j| < \frac{t}{2d}] \leq e^{\frac{-t}{12d}} = e^{\frac{-cd\log{n}}{12}} \leq n^{-2d} $$ if $$ c \geq 24 $$. Taking the union bound over all columns gives $$ \mathrm{Pr}[\exists j$$, $$ |T_j| < \frac{t}{2d}] \leq n \cdot n^{-2d} \leq n^{-d}$$. This gives $$ \mathrm{Pr}[\forall j $$, $$ |T_j| \geq \frac{t}{2d}] \geq 1 - n^{-d}$$. Therefore $$ w_{min} \geq \frac{t}{2d} $$ with probability $$ \geq 1 - n^{-d} $$.

Now suppose $$j \neq k \in [n] $$ and $$ i \in [t]$$; then $$\mathrm{Pr} [M_{i,j} = M_{i,k} = 1] = \frac{1}{d^2} $$, so $$\mathbb{E}[|T_j \cap T_k|] = \frac{t}{d^2}$$. Using the Chernoff bound on the upper tail gives $$\mathrm{Pr}[ |T_j \cap T_k| > \frac{2t}{d^2}] \leq e^{\frac{-t}{3d^2}} = e^{\frac{-c\log{n}}{3}} \leq n^{-4} $$ if $$ c \geq 12 $$. By the union bound over $$(j,k) $$ pairs $$ \mathrm{Pr}[\exists (j,k) $$ such that $$ |T_j \cap T_k| > \frac{2t}{d^2}] \leq n^2 \cdot n^{-4} = n^{-2}$$. Combined with the bound on the column weights, this gives that $$ a_{max} \leq \frac{2t}{d^2} $$ and $$w_{min} \geq \frac{t}{2d} $$ with probability $$ \geq 1 - n^{-d} - n^{-2} \geq 1 - \frac{1}{n} $$ (when $$d \geq 2$$). Note that by changing $$ c $$ the probability $$1 - \frac{1}{n}$$ can be made to be $$1 - \frac{1}{poly(n)}$$. Thus, by Lemma 2, $$M$$ is $$d'$$-disjunct with $$ d' = \lfloor\frac{\frac{t}{2d} - 1}{\frac{2t}{d^2}}\rfloor \approx \frac{d}{4} $$. Running the same argument with $$ 4d $$ in place of $$ d $$, which changes $$t$$ only by a constant factor, shows that $$ M$$ is $$d$$-disjunct.
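The approximation $$d' \approx \frac{d}{4}$$ can be made explicit by expanding the quotient and substituting $$t = cd^2\log{n}$$ (a short worked step, using only the quantities defined in the proof):

$$ \frac{\frac{t}{2d} - 1}{\frac{2t}{d^2}} = \frac{t}{2d} \cdot \frac{d^2}{2t} - \frac{d^2}{2t} = \frac{d}{4} - \frac{d^2}{2t} = \frac{d}{4} - \frac{1}{2c\log{n}} $$

The correction term $$\frac{1}{2c\log{n}}$$ does not depend on $$d$$ and shrinks as $$n$$ or $$c$$ grows, so the quotient stays within a small additive term of $$\frac{d}{4}$$.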

Note that in this proof $$ t = cd^2\log{n} = \mathcal{O}(d^2\log{n}) $$, thus giving the upper bound of $$ t(d,n) \leq \mathcal{O}(d^2 \log{n}) $$. $$\Box $$
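The random matrix from the proof is straightforward to sample. The sketch below assumes the natural logarithm (the source does not fix a base), rounds $$t$$ up to an integer, and uses the hypothetical name `random_test_matrix`; the default $$c = 24$$ is the constant required by the first Chernoff step.

```python
import math
import random

def random_test_matrix(n, d, c=24, seed=None):
    """Sample a t x n 0/1 matrix with t = ceil(c * d^2 * ln n), where each
    entry is 1 independently with probability 1/d."""
    rng = random.Random(seed)
    t = math.ceil(c * d * d * math.log(n))
    return [[1 if rng.random() < 1.0 / d else 0 for _ in range(n)]
            for _ in range(t)]

M = random_test_matrix(n=20, d=2, seed=0)

# Empirical w_min and a_max, to compare against the bounds t/(2d) and 2t/d^2.
t, n = len(M), len(M[0])
supp = [{i for i in range(t) if M[i][j]} for j in range(n)]
w_min = min(len(s) for s in supp)
a_max = max(len(supp[i] & supp[j]) for i in range(n) for j in range(i + 1, n))
print(t, w_min, a_max)
```

The proof only guarantees the bounds on $$w_{min}$$ and $$a_{max}$$ with high probability, so a particular sample should be checked before it is relied on.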

Strongly Explicit Construction
It is possible to prove a bound of $$ t(d,n) \leq \mathcal{O}(d^2\log^2{n}) $$ using a strongly explicit code. Although this bound is worse by a $$ \log{n} $$ factor it is preferable because this produces a strongly explicit construction instead of a randomized one.

Theorem 2: There exists a strongly explicit $$d$$-disjunct matrix with $$\mathcal{O}(d^2\log^2{n}) $$ rows.

This proof will use the properties of concatenated codes along with the properties of disjunct matrices to construct a code that will satisfy the bound we are after.

Proof of Theorem 2: Let $$ C \subseteq \{0,1\}^t, |C| = n $$ such that $$ C = \{\mathbf{c}_1,\ldots,\mathbf{c}_n\} $$. Denote $$M_C$$ as the matrix with its $$i^{th}$$ column being $$\mathbf{c}_i$$. By Lemma 2, if a code $$C^*$$ can be found that satisfies the following two conditions, where $$|\mathbf{c}|$$ denotes the number of ones in $$\mathbf{c}$$, then $$ M_{C^*} $$ is $$ \lfloor \frac{w_{min} - 1}{a_{max}} \rfloor $$-disjunct:
 * 1) $$ \forall \mathbf{c}_i \in C^* \text{, } |\mathbf{c}_i| \geq w_{min} $$
 * 2) $$ \forall \mathbf{c}^1 \neq \mathbf{c}^2 \in C^* \text{, } |\{i  | \mathbf{c}^1_i = \mathbf{c}^2_i = 1\}| \leq a_{max} $$

To complete the proof another concept must be introduced: code concatenation, which is used to obtain such a code $$C^*$$.

Kautz-Singleton '64

Let $$C^* = C_{out} \circ C_{in}$$. Let $$C_{out}$$ be a $$[q,k]_q$$-Reed-Solomon code. Let $$C_{in} : [q] \rightarrow \{0,1\}^q$$ be such that for $$i \in [q]$$, $$C_{in}(i) = (0,\ldots,0,1,0,\ldots,0)$$ where the 1 is in the $$i^{th}$$ position. Then $$n = q^k$$, $$t = q^2$$, and $$w_{min} = q$$.

---

Example: Let $$k = 1, q = 3, C_{out} = \{(0,0,0), (1,1,1), (2,2,2)\}$$. Below, $$M_C$$ denotes the matrix of codewords for $$C_{out}$$ and $$M_{C^*}$$ denotes the matrix of codewords for $$C^* = C_{out} \circ C_{in}$$, where each column is a codeword. The overall image shows the transition from the outer code to the concatenated code.
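As a sketch of what the two matrices look like for this example, assume that the inner code maps the symbol $$s \in \{0,1,2\}$$ to the unit vector with its 1 in position $$s+1$$ (this indexing convention is an assumption, since the definition only speaks of the $$i^{th}$$ position for $$i \in [q]$$). Under that assumption:

$$ M_C = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix}, \qquad M_{C^*} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

Each column of $$M_{C^*}$$ has exactly $$q = 3$$ ones, and $$M_{C^*}$$ has $$q^2 = 9$$ rows, consistent with $$w_{min} = q$$ and $$t = q^2$$.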



---

Divide the rows of $$M_{C^*}$$ into sets of size $$q$$ and number them as $$(i,j) \in [q] \text{ x } [q]$$ where $$i$$ indexes the set of rows and $$j$$ indexes the row in the set. If $$M_{(i,j),k_1} = M_{(i,j),k_2} = 1$$ then note that $$\mathbf{c}_{k_1}(i) = \mathbf{c}_{k_2}(i) = j$$ where $$\mathbf{c}_{k_1}, \mathbf{c}_{k_2} \in C_{out} $$. So that means $$|M_{k_1} \cap M_{k_2}| = q - \Delta(\mathbf{c}_{k_1}, \mathbf{c}_{k_2})$$. Since $$ \Delta(\mathbf{c}_{k_1}, \mathbf{c}_{k_2}) \geq q - k + 1$$ it gives that $$|M_{k_1} \cap M_{k_2}| \leq k - 1$$ so let $$a_{max} = k - 1$$. Since $$t = q^2$$, the entries in each column of $$M_{C^*}$$ can be looked at as $$q$$ sets of $$q$$ entries where only one of the entries is nonzero (by definition of $$C_{in}$$) which gives a total of $$q$$ nonzero entries in each column. Therefore $$w_{min} = q $$ and $$d =_{def} \lfloor \frac{w_{min} - 1}{a_{max}} \rfloor$$ (so $$M_{C^*}$$ is $$d$$-disjunct).

Now pick $$q$$ and $$k$$ such that $$\lfloor \frac{q-1}{k-1}\rfloor = d$$ (so $$\lfloor \frac{q}{k}\rfloor \approx d$$). Since $$q^k = n$$ we have $$k = \frac{\log{n}}{\log{q}} \leq \log{n}$$. Since $$q \approx kd$$ and $$t = q^2$$ it gives that $$t = q^2 \approx (kd)^2 \leq (d \log{n})^2$$. $$\Box$$
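The whole construction fits in a few lines when $$q$$ is prime, so that arithmetic modulo $$q$$ forms the field needed by the Reed-Solomon code. The sketch below makes that assumption (prime powers would need proper finite-field arithmetic), maps symbol $$s$$ to the unit vector with its 1 at 0-based position $$s$$, and uses the hypothetical name `kautz_singleton`.

```python
from itertools import product

def kautz_singleton(q, k):
    """Build the q^2 x q^k matrix M_{C*} for prime q (assumption).
    Outer code: evaluations of all polynomials of degree < k at 0..q-1 (mod q).
    Inner code: symbol s -> length-q unit vector with a 1 at position s."""
    columns = []
    for coeffs in product(range(q), repeat=k):
        # Reed-Solomon codeword: evaluate the polynomial at every field element.
        codeword = [sum(c * pow(alpha, e, q) for e, c in enumerate(coeffs)) % q
                    for alpha in range(q)]
        col = []
        for symbol in codeword:
            block = [0] * q
            block[symbol] = 1       # inner unit-vector encoding
            col.extend(block)
        columns.append(col)
    # Transpose so that codewords become the columns of the matrix.
    return [[col[i] for col in columns] for i in range(q * q)]

M = kautz_singleton(q=3, k=2)
n = len(M[0])
supp = [{i for i in range(len(M)) if M[i][j]} for j in range(n)]
w_min = min(len(s) for s in supp)
a_max = max(len(supp[i] & supp[j]) for i in range(n) for j in range(i + 1, n))
print(len(M), n, w_min, a_max)
```

For $$q = 3$$ and $$k = 2$$ this yields a 9 x 9 matrix with column weight 3 in which any two columns share at most $$k - 1 = 1$$ row, so Lemma 2 certifies that it is 2-disjunct.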

Thus we have a strongly explicit construction for a code that can be used to form a group testing matrix and so $$t(d,n) \leq (d \log{n})^2$$.

For non-adaptive testing we have shown that $$\Omega(d\log{n}) \leq t(d,n)$$ and we have that (i) $$t(d,n) \leq \mathcal{O}(d^2\log^2{n})$$ (strongly explicit) and (ii) $$t(d,n) \leq \mathcal{O}(d^2\log{n})$$ (randomized). Recent work by Porat and Rothschild presented an explicit construction (i.e. deterministic but not strongly explicit) achieving $$t(d,n) \leq \mathcal{O}(d^2\log{n})$$; however, it is not shown here. There is also a lower bound for disjunct matrices of $$t(d,n) \geq \Omega(\frac{d^2}{\log{d}}\log{n})$$, which is not shown here either.