User:OmriBH/sandbox

Definition
Consider a finite set of $$n$$-dimensional grid points:


 * $$S\subset \mathbb{Z}^n$$

For each dimension $$i\in [n]$$, let $$S_i$$ denote the (n-1)-dimensional projection of $$S$$ onto the coordinates $$[n]\setminus \{i\}$$.

Then the Discrete Loomis–Whitney inequality holds:


 * $$|S|^{n-1}\leq \prod_{i=1}^n |S_i|$$.
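The statement can be checked directly on small examples. The following is a minimal sketch, not part of the source: the helper names `projections` and `loomis_whitney_holds` and the example set are assumptions made for illustration, and coordinates are indexed from 0 in the code.

```python
from math import prod

def projections(points, n):
    """Return [S_1, ..., S_n], where S_i drops the i-th coordinate of every point."""
    return [{p[:i] + p[i + 1:] for p in points} for i in range(n)]

def loomis_whitney_holds(points, n):
    """Check |S|^(n-1) <= prod_i |S_i| for a finite set of integer points."""
    sizes = [len(s_i) for s_i in projections(points, n)]
    return len(points) ** (n - 1) <= prod(sizes)

# Hypothetical example: four points in Z^3.
S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
print([len(s_i) for s_i in projections(S, 3)], loomis_whitney_holds(S, 3))
```

For the full cube $$\{0,1\}^3$$ the two sides are equal ($$8^2 = 4\cdot 4\cdot 4$$), so the bound is tight there.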

An explicit proof of the discrete inequality, using an efficient runtime construction, was shown in 2018 by Ngo et al., and is presented in the following sections.

General Case Proof Alt.
In the general case, we can generalize the algorithm presented in the previous section by an inductive construction. Using the same notation, we have:
 * $$S\subseteq \bowtie_{i=1}^n S_i$$

We wish to prove:
 * $$|S|^{n-1}\leq \prod_{i=1}^n |S_i|$$.

Intuitively, we construct all candidate elements of $$S$$ by starting with only the $$(n-1)$$-th and $$n$$-th coordinates, taken from $$S_n$$ and $$S_{n-1}$$ respectively, and then adding one projection and its corresponding coordinate at a time, calculating the join of the projections held so far. After adding all projections and dimensions we get the join of all the projections, which contains $$S$$.

For each step $$i$$, denote:
 * $$q_i = \bowtie_{j=i}^n \pi_{[i, n]} (S_j)$$
 * $$L_i = \pi_i (q_i)$$

Where:
 * $$S\subseteq q_1 = \bowtie_{i=1}^n S_i$$

Note that:
 * $$ q_i = \bigcup_{t\in L_i} q_i[t]$$

Since $$q_i, L_i$$ are join problems of smaller magnitude, we can construct them recursively.

Indeed, note that:
 * $$L_i = \bigcap_{j=i+1}^n \pi_i (S_j)$$

and for every $$t\in L_i$$:
 * $$q_i[t]=(\bowtie_{j=i+1}^n \pi_{[i+1, n]} (S_j[t])) \cap (\{t\}\times\pi_{[i+1, n]}(S_i))$$

Denote $$Z_i=\bowtie_{j=i+1}^n \pi_{[i+1, n]} (S_j[t])$$. If $$\left|Z_i\right|\geq \left|\pi_{[i+1, n]}(S_i)\right|$$, it is more efficient to intersect $$\pi_{[i+1, n]}(S_i)$$ with each of the projections composing $$Z_i$$ than to calculate the join $$Z_i$$ itself, and vice versa.

Therefore, the step of the algorithm for dimension $$i$$ is as follows (the correctness of the sizes is shown later in the proof):
 * Calculate $$L_i = \bigcap_{j=i+1}^n \pi_i (S_j)$$
 * Initialize $$q_i=\emptyset$$
 * For each $$t\in L_i$$:
   * If $$\left|\pi_{[i+1, n]}(S_i)\right|\geq\prod_{j=i+1}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i-1}$$ then:
     * Calculate $$Z_i$$ recursively, by continuing to the $$(i+1)$$-th step with the respective projections according to $$t$$.
     * Filter $$Z_i$$ by $$\pi_{[i+1, n]}(S_i)$$.
     * Set the result to be $$q_i[t]$$
   * Otherwise:
     * Filter $$\pi_{[i+1, n]}(S_i)$$ against every $$\pi_{[i+1, n]} (S_j[t])$$.
     * Set the result to be $$q_i[t]$$
   * Add $$\{t\}\times q_i[t]$$ to $$q_i$$
 * Return $$q_i$$
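The coordinate-by-coordinate structure of these steps can be sketched as follows. This is a simplified variant under stated assumptions, not the algorithm itself: it fixes one coordinate at a time and keeps only the values that are consistent with every projection, but it omits the size comparison that chooses between the two branches, so it does not carry the running-time guarantee. The name `join_projections` and the example are hypothetical, and coordinates are indexed from 0.

```python
from math import prod

def join_projections(projs, n):
    """Return all x in Z^n such that dropping coordinate j of x gives a tuple
    of projs[j], for every j. This is the join of the n projections (n >= 2)."""
    # prefixes[j][m] holds the length-m prefixes of the tuples in projs[j].
    prefixes = [[{t[:m] for t in proj} for m in range(n)] for proj in projs]

    def drop(x, j):
        # Remove coordinate j from the partial tuple x, if x already has it.
        return x[:j] + x[j + 1:] if j < len(x) else x

    def candidates(k):
        # Values of coordinate k that occur in every projection containing it.
        value_sets = []
        for j in range(n):
            if j != k:
                pos = k if k < j else k - 1  # index of coordinate k in projs[j]
                value_sets.append({t[pos] for t in projs[j]})
        return set.intersection(*value_sets)

    result = set()

    def extend(prefix):
        if len(prefix) == n:
            result.add(prefix)
            return
        for t in candidates(len(prefix)):
            x = prefix + (t,)
            # Keep x only if every projection has a tuple that agrees with it.
            if all(drop(x, j) in prefixes[j][len(drop(x, j))] for j in range(n)):
                extend(x)

    extend(())
    return result

# Hypothetical example in Z^3.
S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
projs = [{p[:i] + p[i + 1:] for p in S} for i in range(3)]
q_1 = join_projections(projs, 3)
print(S <= q_1, len(q_1) ** 2 <= prod(len(p) for p in projs))
```

The returned set plays the role of $$q_1$$: it contains $$S$$, and its size obeys the same bound as $$S$$.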

We prove correctness by showing that the following invariant holds for each step $$1\leq i\leq n-1$$ of the recursion:
 * $$\left|q_i\right| = \left|\bowtie_{j=i}^n \pi_{[i, n]} (S_j)\right|\leq \prod_{j=i}^n \left|\pi_{[i, n]}(S_j)\right|^\frac{1}{n-i}$$

For the base case, $$i=n-1$$, we have:
 * $$L_{n-1} = \pi_{n-1} (S_n)$$

And for every $$t\in L_{n-1}$$:
 * $$q_{n-1}[t]=\pi_n (S_{n-1})$$

Thus, the algorithm will build $$q_{n-1}$$ by:
 * $$q_{n-1}=\bigcup_{t\in L_{n-1}} q_{n-1}[t]=\bigcup_{t\in \pi_{n-1} (S_n)} (\{t\}\times\pi_n (S_{n-1}))= \pi_{n-1} (S_n) \times \pi_n (S_{n-1})$$

And we have:
 * $$\left|q_{n-1}\right|\leq \left|\pi_{n-1} (S_n)\right| \cdot \left|\pi_n (S_{n-1})\right|$$

which proves the base step.

For the general step $$i$$, we have:
 * $$L_i = \bigcap_{j=i+1}^n \pi_i (S_j)$$
 * $$q_i[t]=\bowtie_{j=i+1}^n \pi_{[i+1, n]} (S_j[t]) \cap \{t\}\times\pi_{[i+1, n]}(S_i)$$
 * $$q_i = \bigcup_{t\in L_i} q_i[t]$$

First, we claim that for each $$q_i[t]$$:
 * $$\left|q_i[t]\right|\leq \prod_{j=i}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i}$$

From the definition of $$q_i[t]$$, we have:
 * $$\left|q_i[t]\right|\leq \min(\left|\bowtie_{j=i+1}^n \pi_{[i+1, n]} (S_j[t])\right|,\left|\{t\}\times\pi_{[i+1, n]}(S_i)\right|)$$

By the induction hypothesis:
 * $$\left|\bowtie_{j=i+1}^n \pi_{[i+1, n]} (S_j[t])\right|\leq \prod_{j=i+1}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i-1}$$

Simplifying:
 * $$\left|q_i[t]\right|\leq \min(\prod_{j=i+1}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i-1},\left|\pi_{[i+1, n]}(S_i)\right|)$$

Since for every $$a, b\geq 0, 0\leq c\leq 1$$ we have $$\min(a,b)\leq a^c\cdot b^{1-c}$$, we can choose $$c=\frac{n-i-1}{n-i}$$:
 * $$\left|q_i[t]\right|\leq(\prod_{j=i+1}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i-1})^\frac{n-i-1}{n-i}\cdot\left|\pi_{[i+1, n]}(S_i)\right|^\frac{1}{n-i}=\prod_{j=i+1}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i}\cdot\left|\pi_{[i+1, n]}(S_i)\right|^\frac{1}{n-i}=\prod_{j=i}^n \left|\pi_{[i+1, n]}(S_j[t])\right|^\frac{1}{n-i}$$

Proving the claim.
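The elementary inequality $$\min(a,b)\leq a^c\cdot b^{1-c}$$ used in the claim follows by assuming $$a\leq b$$ and writing $$a = a^c\cdot a^{1-c}\leq a^c\cdot b^{1-c}$$. As a sanity check only, the sketch below tests it on random inputs; the function name and the sampling ranges are assumptions made for illustration.

```python
import random

def weighted_geometric_mean(a, b, c):
    """Return a^c * b^(1-c) for a, b >= 0 and 0 <= c <= 1."""
    return a ** c * b ** (1 - c)

random.seed(0)
violations = 0
for _ in range(1000):
    a, b = random.uniform(0, 100), random.uniform(0, 100)
    c = random.random()
    # A small tolerance absorbs floating-point rounding.
    if min(a, b) > weighted_geometric_mean(a, b, c) + 1e-9:
        violations += 1
print(violations)
```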

We now prove the size bound for $$q_i$$:
 * $$\left|q_i\right|=\sum_{t\in L_i} \left|q_i[t]\right|\leq\sum_{t\in L_i}\prod_{j=i}^n \left|\pi_{[i+1, n]} (S_j[t])\right|^\frac{1}{n-i}$$

Note that for every $$t\in L_i$$, $$\left|\pi_{[i+1, n]} (S_j[t])\right|=\left|\pi_{[i, n]} (S_j) \ltimes \{t\}_i\right|$$, where the notation represents the semijoin query where $$t$$ is the $$i$$-th coordinate. Thus:
 * $$=\sum_{t\in L_i}\prod_{j=i}^n \left|\pi_{[i, n]} (S_j)\ltimes \{t\}_i\right|^\frac{1}{n-i}$$

Since the set $$S_i$$ does not have the $$i$$-th coordinate, its factor does not depend on $$t$$ and can be taken out of the sum:
 * $$=\sum_{t\in L_i}\prod_{j=i}^n \left|\pi_{[i, n]} (S_j)\ltimes \{t\}_i\right|^\frac{1}{n-i}=\left|\pi_{[i, n]} (S_i)\right|^\frac{1}{n-i}\sum_{t\in L_i}\prod_{j=i+1}^n \left|\pi_{[i, n]} (S_j)\ltimes \{t\}_i\right|^\frac{1}{n-i}$$

By using Hölder's inequality:
 * $$\leq\left|\pi_{[i, n]} (S_i)\right|^\frac{1}{n-i}\prod_{j=i+1}^n (\sum_{t\in L_i}\left|\pi_{[i, n]} (S_j)\ltimes \{t\}_i\right|)^\frac{1}{n-i}=\left|\pi_{[i, n]} (S_i)\right|^\frac{1}{n-i}\prod_{j=i+1}^n \left|\pi_{[i, n]} (S_j)\right|^\frac{1}{n-i}=\prod_{j=i}^n \left|\pi_{[i, n]} (S_j)\right|^\frac{1}{n-i}$$

This completes the induction. In particular, for the last case $$i=1$$, since $$S\subseteq q_1$$, we get:
 * $$|S|^{n-1}\leq \prod_{i=1}^n |S_i|$$
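The Hölder step used in the proof has the form $$\sum_t\prod_{j=1}^m a_{j,t}^{1/m}\leq\prod_{j=1}^m\left(\sum_t a_{j,t}\right)^{1/m}$$ for non-negative $$a_{j,t}$$, with $$m=n-i$$. The sketch below evaluates both sides on random non-negative integers; the function name `holder_sides` and the test data are assumptions made for illustration, and the check is numeric, not a proof.

```python
import random
from math import prod

def holder_sides(a):
    """a[j][t] >= 0 for j in range(m). Return (lhs, rhs) of the Hölder bound
    sum_t prod_j a[j][t]^(1/m) <= prod_j (sum_t a[j][t])^(1/m)."""
    m = len(a)
    width = len(a[0])
    lhs = sum(prod(row[t] ** (1 / m) for row in a) for t in range(width))
    rhs = prod(sum(row) ** (1 / m) for row in a)
    return lhs, rhs

random.seed(1)
a = [[random.randint(0, 9) for _ in range(6)] for _ in range(3)]
lhs, rhs = holder_sides(a)
print(lhs <= rhs + 1e-9)
```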

Original 1949 Proof
We present an algorithm, based on the original 1949 proof by Loomis and Whitney, that, given the $$(n-1)$$-dimensional projections of $$S$$, finds every possible element of $$S$$ and shows that the number of elements satisfies:


 * $$|S|^{n-1}\leq \prod_{i=1}^n|S_i|$$.

The algorithm is inductive: assuming a construction for $$n-1$$ dimensions, we present a construction for the $$n$$-th dimension of $$S$$. Unlike the previous algorithm, this algorithm is not efficient, since the inductive construction introduces exponential running time.

For $$n=2$$, the construction is trivial: consider the Cartesian product of the two projections onto the $$x, y$$ axes.
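The base case can be illustrated on a small planar set. This is a sketch only; the function name and the example points are hypothetical.

```python
def base_case_candidates(S):
    """For S in Z^2: S_1 drops the x coordinate, S_2 drops the y coordinate.
    The Cartesian product of the two projections contains S."""
    S_1 = {y for (_, y) in S}
    S_2 = {x for (x, _) in S}
    candidates = {(x, y) for x in S_2 for y in S_1}
    return S_1, S_2, candidates

S = {(1, 2), (1, 3), (4, 2)}
S_1, S_2, candidates = base_case_candidates(S)
print(len(S), len(S_1) * len(S_2), S <= candidates)  # prints: 3 4 True
```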

For the inductive step, we first calculate the projection of each of the $$(n-1)$$-dimensional projections of $$S$$ onto the first coordinate (except the first projection, $$S_1$$):
 * $$L_{i} = \pi_{1} (S_i) $$

Denote the union of the results:
 * $$L = \bigcup_{i=2}^n L_{i} = \{l_1, l_2, ..., l_r\}$$

For each $$j = 1, 2, ..., r$$, calculate the restriction of each projection $$S_i$$ to the tuples whose first coordinate is $$l_j$$. Denote the resulting set by $$S_i^j$$.

Note that by applying the induction hypothesis for $$n-1$$ dimensions, we can construct the sets $$S^j$$ from the sets $$S_i^j$$ for every $$i=2, 3,..., n$$, getting:
 * $$\left|S^j\right|^{n-2} \leq \prod_{i=2}^n \left|S_i^{j}\right|$$

Note that each $$S^j$$ contains all elements of $$S$$ where the first coordinate is fixed to be $$l_j$$. This gives the possible set of elements in $$S$$: $$S \subseteq \bigcup_{j=1}^r S^j$$. It remains to show that the number of constructed elements satisfies the required bound.

Note that $$\left|S^j \right| \leq \left|S_1\right|$$. Therefore, we can multiply the last two inequalities respectively:
 * $$\left|S^j\right|^{n-1} \leq \left|S_1\right| \prod_{i=2}^n \left|S_i^{j}\right|$$

Note that $$\sum_{j=1}^r {\left|S_i^j\right|} = \left|S_i\right|$$ for every coordinate $$i$$, and $$\sum_{j=1}^r {\left|S^j\right|} = \left|S\right|$$. Therefore:


 * $$\left|S\right|=\sum_{j=1}^r {\left|S^j\right|} \leq \sum_{j=1}^r \left|S_1\right|^\frac{1}{n-1} \prod_{i=2}^n \left|S_i^{j}\right|^\frac{1}{n-1}$$

By using Hölder's inequality:
 * $$\left|S\right|\leq \left|S_1\right|^\frac{1}{n-1} \prod_{i=2}^n \left(\sum_{j=1}^r \left|S_i^{j}\right|\right)^\frac{1}{n-1} = \left|S_1\right|^\frac{1}{n-1} \prod_{i=2}^n \left|S_i\right|^\frac{1}{n-1} = \prod_{i=1}^n \left|S_i\right|^\frac{1}{n-1}$$

Therefore:
 * $$\left|S\right|^{n-1}\leq \prod_{i \in [n]} \left|S_i\right|$$

As required.
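The slicing by the first coordinate can be illustrated for $$n=3$$. The sketch below is an illustration under assumptions: it slices a known set $$S$$ directly instead of reconstructing the sets $$S^j$$ from the projections, and the helper name `slice_by_first` and the example set are hypothetical.

```python
from collections import defaultdict

def slice_by_first(tuples):
    """Group tuples by their first coordinate and drop that coordinate."""
    slices = defaultdict(set)
    for t in tuples:
        slices[t[0]].add(t[1:])
    return slices

# Hypothetical 3-dimensional example.
S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
S_2 = {(x1, x3) for (x1, _, x3) in S}  # projection that drops coordinate 2
S_3 = {(x1, x2) for (x1, x2, _) in S}  # projection that drops coordinate 3

S_slices = slice_by_first(S)     # the elements of S above each l_j
S2_slices = slice_by_first(S_2)  # the sets S_2^j
S3_slices = slice_by_first(S_3)  # the sets S_3^j

for l in sorted(S_slices):
    # For n = 3 the inductive bound reads |S^j| <= |S_2^j| * |S_3^j|.
    print(l, len(S_slices[l]), len(S2_slices[l]) * len(S3_slices[l]))
```

The slice sizes of each projection add up to the size of that projection, which is the identity used before applying Hölder's inequality.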

3-Dimensions Proof Example
Consider three sets $$A, B, C \subset \mathbb{R}$$, and three relations $$R \subseteq A\times B, S\subseteq B\times C,$$ and $$T\subseteq C \times A$$. For simplicity, consider the case $$\left|R\right|=\left|S\right|=\left|T\right|=k$$. Let $$J$$ be the set of all $$(x, y, z)$$ such that $$\left(x,y\right)\in R,\left(y,z\right)\in S$$ and $$\left(z,x\right)\in T$$. It suffices to find all elements of $$J$$ and to show that:
 * $$|J|\leq \sqrt{\left|R\right|\left|S\right|\left|T\right|} = k^{1.5}$$

Intuitively, we divide $$S$$ by the $$y$$ coordinate into "heavy hitters" ($$y$$ values which occur more than a threshold number of times) and "non-hitters", then use each group separately to build $$J$$.

Let $$D$$ be the set of all $$y \in B$$ that appear more than $$\sqrt{k}$$ times in $$S$$ and let $$G$$ be the complement of $$D$$:
 * $$D=\left\{ y\mid\left|\left\{ \left(y,z\right)\in S\right\} \right|>\sqrt{k}\right\}$$
 * $$G=\left\{ y\mid\left|\left\{ \left(y,z\right)\in S\right\} \right|\leq\sqrt{k}\right\}$$

Define $$S_G$$ and $$S_D$$ to be the subsets of $$S$$ whose $$y$$ coordinate belongs to $$G$$ and $$D$$ respectively:
 * $$S_{G}=\left\{ \left(y,z\right)\in S\mid y\in G\right\}$$
 * $$S_{D}=\left\{ \left(y,z\right)\in S\mid y\in D\right\}$$
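The split into heavy and light values can be sketched as follows. The function name `split_heavy_light` and the example relation are assumptions made for illustration; the test `d * d > k` is the strict condition $$d>\sqrt{k}$$ written without floating-point arithmetic.

```python
from collections import Counter

def split_heavy_light(S):
    """S is a set of (y, z) pairs with |S| = k. Return (S_G, S_D, D), where D
    holds the y values that occur more than sqrt(k) times in S."""
    k = len(S)
    degree = Counter(y for (y, _) in S)
    D = {y for y, d in degree.items() if d * d > k}
    S_D = {(y, z) for (y, z) in S if y in D}
    S_G = S - S_D
    return S_G, S_D, D

# Hypothetical relation with k = 9: y = 0 occurs 5 times, the rest once each.
S = {(0, z) for z in range(5)} | {(1, 0), (2, 0), (3, 0), (4, 0)}
S_G, S_D, D = split_heavy_light(S)
print(D, len(S_D), len(S_G))  # prints: {0} 5 4
```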

Note that the natural joins of $$S_{G}$$ with $$R$$ and of $$S_D$$ with $$T$$ satisfy:
 * $$J\subseteq \left(R\bowtie S_{G}\right) \cup \left(S_{D}\bowtie T\right)$$

The union is disjoint, since $$S_{D}, S_{G}$$ are disjoint. Furthermore, we can filter the second component by a semijoin with $$R$$ to narrow the computation:
 * $$J\subseteq \left(R\bowtie S_{G}\right) \cup \left(S_{D}\bowtie T\ltimes R\right)$$

By calculating these joins we can get all the potential elements of $$J$$. In order to prove that $$|J|\leq \sqrt{\left|R\right|\left|S\right|\left|T\right|} = k^{1.5}$$, it is enough to show that:
 * $$\left|J\right|\leq\left|R\bowtie S_G\right| + \left|T\bowtie S_D\ltimes R\right| \leq k\sqrt{k}$$

It remains to examine the size of each component of the union.

For $$R\bowtie S_{G}$$, we argue:
 * $$\left|R\bowtie S_{G}\right|\leq \sum_{y\in G} \left|R_x[y]\right|\cdot\left|{S_G}_z [y]\right|\leq\sqrt{k}\sum_{y\in G} \left|R_x[y]\right| = \sqrt{k}\sum_{y\in R\setminus D} \left|R_x[y]\right|$$
 * $$ = \sqrt{k}\sum_{y\in R} \left|R_x[y]\right| - \sqrt{k}\sum_{y\in D} \left|R_x[y]\right| = k\sqrt{k} - \sqrt{k}\sum_{y\in D} \left|R_x[y]\right|$$
 * $$ = k\sqrt{k} - \sqrt{k}\left|R\ltimes D\right|$$

Here $$R_x[y]$$ stands for all $$x$$ values of $$R$$ where $$y$$ is fixed, and $${S_G}_z[y]$$ stands for all $$z$$ values of $$S_G$$ where $$y$$ is fixed. By the construction of $$S_G$$, $$\left|{S_G}_z [y]\right|\leq\sqrt{k}$$ for every $$y\in G$$, and since $$\left|R\right|=k$$ it holds that $$\sqrt{k}\sum_{y\in R} \left|R_x[y]\right|=\sqrt{k}\left|R\right|=k\sqrt{k}$$.

Additionally, $$\sum_{y\in D} \left|R_x[y]\right|$$ is exactly the size of the semijoin of $$R$$ with $$D$$, that is, $$\left|R\ltimes D\right|$$.

Recall that we want to prove that $$\left|J\right|\leq\left|R\bowtie S_G\right| + \left|T\bowtie S_D\ltimes R\right| \leq k\sqrt{k}$$, thus, all that is left is to show that:
 * $$\left|T\bowtie S_D\ltimes R\right| \leq \sqrt{k}\left|R\ltimes D\right|$$

Let us bound the number of occurrences of each $$z$$ value in $$T\bowtie S_D\ltimes R$$. Note that $$\left|D\right| \leq \sqrt{k}$$ due to the pigeonhole principle, since every $$y\in D$$ appears more than $$\sqrt{k}$$ times in $$S$$ and $$\left|S\right|=k$$.

Let $$C_D$$ be the set of all $$z\in C$$ for which there exist more than $$\left|D\right|$$ tuples $$\left(y,z\right)\in S_D$$:


 * $$C_D=\left\{ z\mid\left|\left\{ y\mid\left(y,z\right)\in S_D\right\} \right|>\left|D\right|\right\} $$.

Note that for every $$z\in C$$: $$\left\{ y\mid\left(y,z\right)\in S_{D}\right\} \subseteq D$$. Thus:


 * $$\left|\left\{ y\mid\left(y,z\right)\in S_D\right\} \right|\leq\left|D\right|$$

Then, $$C_D = \emptyset$$.

Now, consider $$(x,y,z)\in T\bowtie S_D\ltimes R$$. Note that $$(x,z)\in R\ltimes D$$. From the last result, there are at most $$\sqrt{k}$$ values of $$\hat{y}\in D$$ for which $$(\hat{y},z)\in S_D$$. Thus, there are at most $$\sqrt{k}$$ tuples $$(x,\hat{y},z)$$ for which $$(x,\hat{y},z)\in T\bowtie S_D\ltimes R$$. Therefore: (Is this correct?)
 * $$\left|T\bowtie S_D\ltimes R\right| \leq \sqrt{k}\left|R\ltimes D\right|$$

As required, meaning:
 * $$\left|J\right|\leq k\sqrt{k}$$
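The construction of this section can be sketched in code. This is a minimal illustration under assumptions, not a reference implementation: the function name `triangle_join`, the indexing of $$S$$ by $$y$$, and the example relations are hypothetical, and the threshold is taken from $$k=\left|S\right|$$.

```python
from collections import defaultdict

def triangle_join(R, S, T):
    """R holds (x, y) pairs, S holds (y, z) pairs, T holds (z, x) pairs.
    Return J = {(x, y, z) : (x, y) in R, (y, z) in S, (z, x) in T}."""
    k = len(S)
    z_by_y = defaultdict(set)
    for y, z in S:
        z_by_y[y].add(z)
    # D: y values occurring more than sqrt(k) times in S (heavy hitters).
    D = {y for y, zs in z_by_y.items() if len(zs) ** 2 > k}

    J = set()
    # Light part: R joined with S_G, then filtered by T.
    for x, y in R:
        if y in z_by_y and y not in D:
            for z in z_by_y[y]:
                if (z, x) in T:
                    J.add((x, y, z))
    # Heavy part: S_D joined with T, then filtered by R.
    for z, x in T:
        for y in D:
            if z in z_by_y[y] and (x, y) in R:
                J.add((x, y, z))
    return J

# Hypothetical example with |R| = |S| = |T| = 3.
R = {(0, 0), (0, 1), (1, 0)}
S = {(0, 0), (0, 1), (1, 0)}
T = {(0, 0), (0, 1), (1, 0)}
print(sorted(triangle_join(R, S, T)))
```

The light loop touches at most $$\sqrt{k}$$ values of $$z$$ per tuple of $$R$$, and the heavy loop touches at most $$\left|D\right|\leq\sqrt{k}$$ values of $$y$$ per tuple of $$T$$, which is where the $$k\sqrt{k}$$ scale comes from.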

General Case Proof
In the general case, we can generalize the algorithm presented in the previous section by an inductive construction. Intuitively, step $$i$$ handles the projection $$S_i$$: it splits it, as before, according to the "heavy hitters" of all the next coordinates, joins it with the heavy hitters from the previous step, and adds the outcome to the previous result while filtering it according to the current projection.

Denote by $$J_i$$ the result of the $$i$$-th step.

Consider the $$i$$-th step of the inductive construction. Construct the following sets:
 * $$F_i = \pi_{[i+1,n]}[S_i] \cap \pi_{[i+1,n]}[D_{i-1}]$$
 * $$D_i=\left\{ x\in F_i \mid \left| D_{i-1} [x]\right| > \frac{P}{\left|S_i\right|}\right\}$$
 * $$G_i=\left\{ x\in F_i \mid \left| D_{i-1} [x]\right|\leq\frac{P}{\left|S_i\right|}\right\}$$

Where $$D_1$$ is defined trivially as $$S_1$$.

Then, construct the next step (assume $$J_1=\emptyset$$):
 * $$J_i = J_{i-1} \cup (D_{i-1}\bowtie S_i \ltimes G_i)$$

The last step is an exception: since $$F_n$$ is empty, so are $$G_n$$ and $$D_n$$, and we define instead:
 * $$J_n = J_{n-1} \cup (D_{n-1}\bowtie S_n)= J_{n-1} \cup (D_{n-1}\times S_n)$$

In order to prove correctness, we first prove an invariant of the algorithm. For every $$2\leq i < n$$, we claim:
 * $$\pi_{[i+1, n]}[J\setminus J_i]\subseteq D_i$$

We prove the claim by induction. For the base case $$i=2$$, we have:
 * $$J_2 = (\emptyset) \cup (D_1\bowtie S_2\ltimes G_2)= S_1\bowtie S_2\ltimes G_2$$

We can write:
 * $$S_1\bowtie S_2 = (S_1\bowtie S_2\ltimes G_2) \cup (S_1\bowtie S_2\ltimes D_2)$$

Since $$J\subseteq S_1\bowtie S_2$$, then we have:
 * $$J\setminus J_2 \subseteq S_1\bowtie S_2\ltimes D_2$$

Therefore:
 * $$\pi_{[3, n]}[J\setminus J_2]\subseteq \pi_{[3, n]}[S_1\bowtie S_2\ltimes D_2]\subseteq D_2$$

As required.

For the general case, assuming the induction hypothesis:
 * $$\pi_{[i, n]}[J\setminus J_{i-1}]\subseteq D_{i-1}$$

We also have:
 * $$\pi_{[i+1, n]}[J]\subseteq \pi_{[i+1, n]}[S_i]$$

Therefore:
 * $$\pi_{[i+1, n]}[J\setminus J_{i-1}]\subseteq \pi_{[i+1, n]}[D_{i-1}\cap S_i]= G_i \cup D_i$$

Thus, every $$t\in J\setminus J_{i-1}$$ has its projection either in $$G_i$$ or in $$D_i$$. Assuming $$\pi_{[i+1, n]}[t]\in G_i$$, then together with the induction hypothesis:
 * $$t\in D_{i-1}\bowtie S_i \ltimes G_i$$

Since:
 * $$J_i = J_{i-1} \cup (D_{i-1}\bowtie S_i \ltimes G_i)$$
 * $$J\setminus J_i = J\setminus (J_{i-1} \cup (D_{i-1}\bowtie S_i \ltimes G_i))$$

Then $$t\notin J\setminus J_i$$, meaning $$\pi_{[i+1, n]}[J\setminus J_i] \subseteq D_i$$, which proves the invariant.

From this invariant, the correctness of the construction follows. For step $$n-1$$, we have:
 * $$\pi_{n}[J\setminus J_{n-1}]\subseteq D_{n-1}$$

Recall:
 * $$J_n = J_{n-1} \cup (D_{n-1}\times S_n)$$

Thus:
 * $$J\setminus J_n = J\setminus (J_{n-1} \cup (D_{n-1}\times S_n))\subseteq (J\setminus J_{n-1}) \cap (J\setminus (D_{n-1}\times S_n))$$

Consider $$t\in J\setminus J_n$$, then $$t\in J\setminus J_{n-1}$$ and $$t\in J\setminus (D_{n-1}\times S_n)$$. From the first component, $$\pi_{n}[t]\in D_{n-1}$$, then $$t\in D_{n-1}\times S_n$$, contradicting $$t\in J\setminus (D_{n-1}\times S_n)$$. Therefore $$J\setminus J_n=\emptyset$$.

This implies $$J\subseteq J_n$$. We can then compute $$J$$ by filtering $$J_n$$ against each $$S_i$$.

It remains to show that the resulting $$J_n$$ is of the required size. For that, we prove the following invariant by induction for $$i\geq 2$$:
 * $$\left|J_i\right|\leq P$$

First, consider the base case $$i=2$$:
 * $$J_2 = (\emptyset) \cup (D_1\bowtie S_2\ltimes G_2)= S_1\bowtie S_2\ltimes G_2$$

Thus:
 * $$\left|J_2\right| = \left|S_1\bowtie S_2\ltimes G_2\right|\leq\sum_{t\in G_2}\left|S_1[t]\right|\cdot\left| S_2[t]\right|$$
 * $$\leq\sum_{t\in G_2}\left| S_2[t]\right|\frac{P}{\left|S_2\right|}\leq\left|S_2\right|\frac{P}{\left|S_2\right|}$$
 * $$=P$$

For the general $$i$$-th step:
 * $$J_i = J_{i-1} \cup (D_{i-1}\bowtie S_i \ltimes G_i)$$

Using the induction hypothesis $$\left|J_{i-1}\right|\leq P$$:
 * $$\left|J_i\right| \leq \left|J_{i-1}\right| + \left|D_{i-1}\bowtie S_i \ltimes G_i\right|$$
 * $$\leq P + \left|D_{i-1}\bowtie S_i \ltimes G_i\right|$$

Thus, it is enough to show (INCORRECT):
 * $$\left|D_{i-1}\bowtie S_i \ltimes G_i\right|\leq P$$

And indeed:
 * $$\left|D_{i-1}\bowtie S_i \ltimes G_i\right|\leq\sum_{t\in G_i} \left|D_{i-1}[t]\right|\cdot \left|S_i[t]\right|$$
 * $$\leq\sum_{t\in G_i}\left|S_i[t]\right|\cdot\frac{P}{\left|S_i\right|}\leq\left|S_i\right|\cdot\frac{P}{\left|S_i\right|}$$
 * $$=P$$

As required.