## Arithmetic progression in some subsets of $\mathbb{Z}_N$ (Part 2)

Last time I have mentioned the idea of the proof (of theorem 2). Now I continue and give a full detail here.

Lemma 7 The Bohr neighbourhood $\mathcal{B}(\mathcal{R}, \frac{\eta}{64})$ contains an arithmetic progression, named $P$, of length at least $\frac{\eta^3}{2^{17}\log(1/\beta)}N^\frac{\eta^2}{2^{11}\log(1/\beta)}$.

Proof. By Lemma 6 (see last post), every element in $\mathcal{R}$ is in the $\pm1$-linear span of a set $\Lambda$ of size at most $m = 2^{11}\eta^{-2}\log(1/\beta)$. Hence we have $\mathcal{B}(\Lambda, \frac{\eta}{64m}) \subset \mathcal{B}(\mathcal{R}, \frac{\eta}{64})$ (for, pick $x\in$LHS, then $\|\lambda x\|\leq\frac{\eta}{64m}$ for all $\lambda\in\Lambda$, and so by triangle inequality, together with the fact that elements in $\mathcal{R}$ can be written in the form $\sum^{m}_{i=1} \epsilon_i \lambda_i$, we have $\|\sum^{m}_{i=1} \epsilon_i \lambda_i x\|\leq m\|\lambda_i x\|\leq\frac{\eta}{64}$.) Note that by Lemma 5, LHS contains an AP of length at least $\frac{\eta}{64m}N^\frac{1}{m}$. The result follows. $\Box$

Lemma 8 For at least $(1-\frac{\eta}{16})N$ values of $x$ we have $|(x+P)\cap B|\leq\frac{16\beta}{\eta}|P|$.

Proof. Suppose not. Then more than $\frac{\eta}{16}N$ values of $x$ we have the reversed inequality. So

$\displaystyle \begin{array}{rcl} |P||B| &=& \sum_{x\in\mathbb{Z}_N} |\{(p, b) \in P\times B: p+x = b\}| \text{\quad (Since there are } |P||B| \text{ many choices of } x\text{)} \\ &=& \sum_{x\in\mathbb{Z}_N} |(x+P)\cap B| \\&>& \frac{\eta N}{16}\frac{16\beta}{\eta}|P| \\&=& |P||B|,\end{array}$

contradiction. $\Box$

Now let $C = \{x\in\mathbb{Z}_N: |(x+P)\cap B| \leq\frac{16\beta}{\eta}|P|\}$. By Lemma 8, $|C|\geq (1-\frac{\eta}{16})N$. Since the size of $C$ is large, it cannot have huge Fourier coefficient for $r\neq 0$.
(In fact, for $r\neq 0$, $|\hat{C}(r)| = |\widehat{C^c}(r)|\leq |C^c|\leq \frac{\eta N}{16}\leq \frac{\eta |C|}{8}$, where $C^c$ is the complement of $C$ and the first equality is because of the fact $\widehat{\mathbb{Z}_N}(r) = 0$ whenever $r\neq 0$.)

We now choose $D\subset C$, using probabilistic method, with certain size so that the Fourier coefficient of $D$ is small.

Lemma 9 Let $t\geq 2^{14}\eta^{-2}\log(N)$. Also assume $t\leq\frac{|C|}{2}$. Then there exists $D\subset C$ with size $t$ such that $\sup_{r\neq 0} |\hat{D}(r)|\leq\frac{\eta t}{4}$.

Proof. Choose a set $E\subset C$ by letting each $c\in C$ be in $E$ with probability $t/|C|$, these choices being independent.
We first compute the expectations and variances of the Fourier coefficient of $E$. Recall the notation $\omega = e^{\frac{2\pi i}{N}}$.

$\displaystyle \begin{array}{rcl} \mathbb{E}(\hat{E}(r)) &=& \sum_{x\in C} \mathbb{E}(E(x)\omega^{rx}) \\ &=& \sum_{x\in C} \omega^{rx} p \\ &=& p\hat{C}(r).\end{array}$

$\displaystyle \begin{array}{rcl} \text{Var}(\hat{E}(r)) &=& \mathbb{E}(|\hat{E}(r)|^2) -|\mathbb{E}(\hat{E}(r))|^2\\ &=& \mathbb{E}(|\sum_{x\in C}E(x)\omega^{rx}|^2)-p^2|\hat{C}(r)|^2\\ &=& \mathbb{E}(\sum_{x\in C}|E(x)|^2+\sum_{x\neq y}E(x)E(y)\omega^{r(x-y)})\\&\phantom{=}&-p^2(\sum_{x\in C}|C(x)|^2+\sum_{x\neq y}|C(x)||C(y)|\omega^{r(x-y)})\\ &=& p|C|+p^2\sum_{x\neq y}\omega^{r(x-y)}-p^2|C|-p^2\sum_{x\neq y}\omega^{r(x-y)}\\ &=& p|C|-p^{2}|C|. \end{array}$

Now consider the random variable $X^{(r)}_j = (E(j)-p)\omega^{rj}$. By above computations we have $\mathbb{E}(X^{(r)}_j) = 0$, $\text{Var}(X^{(r)}_j) = p|C|-p^2|C| \geq \frac{p|C|}{2} = \frac{t}{2}\geq \frac{\eta t}{24}$. Also it is easy to see that $|X^{(r)}_j| \leq 1$. Therefore, we can apply Lemma 4 with these random variables $X^{(r)}_j$.

Case 1: $r\neq0$. Then

$\displaystyle \begin{array}{rcl} \mathbb{P}(|\hat{E}(r)|\geq\frac{\eta t}{6}) &\leq&\mathbb{P}(|\hat{E}(r)-\mathbb{E}(\hat{E}(r))|\geq\frac{\eta t}{24})\text{\qquad (Triangle inequality)}\\&<&4e^{-\frac{\eta^2t^2}{8\times24^2(p|C|-p^2|C|)}}\text{\qquad (Lemma 4)}\\&<&4e^{-\frac{\eta^2t}{5000}}\end{array}$

Case 2: $r=0$. Then $\displaystyle\mathbb{P}(||E|-t|\geq\frac{\eta t}{24})<4e^{-\frac{\eta^2t}{5000}}$ by Lemma 4.

Therefore, if $t \geq 2^{14}\eta^{-2}\log{N}$, $4e^{-\frac{\eta^2t}{5000}} < 1/2$, and so there is a positive probability such that $||E|-t| < \frac{\eta t}{24}$ and $|\hat{E}(r)|<\frac{\eta t}{6}$ happens. By adding or deleting suitable number of elements (which is at most $\frac{\eta t}{24}$)  from $E$ we will get $D$ (Since the number of changed elements is at most $\frac{\eta t}{24}$, $|\hat{D}(r)-\hat{E}(r)| \leq \frac{\eta t}{24}$ and so $|\hat{D}(r)|\leq \frac{\eta t}{4}$). $\Box$

With the same proof we get the following lemma:

Lemma 10 Let $t\geq 2^{14}\eta^{-2}\log(N)$. Also assume $t\leq\beta N/2$. Then there exists $X\subset B$ with size $t$ such that $|\hat{X}(r) - \frac{t\hat{B}(r)}{|B|}|\leq\frac{\eta t}{12}$ for $r\neq0$.

Remark: The above two lemmas actually mean that for any set $G\subset \mathbb{Z}_N$, there is always a subset $H$ such that 1. $|G|/2\geq|H|\geq 2^{14}\eta^{-2}\log{N}$; 2. $H$ has Fourier coefficients close to $\frac{|H|}{|G|}\hat{G}(r)$.

Lemma 11 Let $S = (B\setminus X)\cup D$. Assume $(B\setminus X)\cap D=\emptyset$. Then $\sup_{r\in\mathcal{R}}|\hat{S}(r)|\leq\eta S - \frac{\eta t}{6}$ and $|\hat{S}(r)|\leq \frac{\eta|S|}{2} + \frac{\eta t}{3}$ for $r\in\mathcal{R}$.

(Little remark: I avoid the notion “multiset” which is used in original paper. Although the concept of multiset is easy to understand but somehow there is ambiguity, and so I prefer saying that $S$ is disjoint union of two sets instead of saying $S$ is a multiset.)
Proof. Note that
$\displaystyle \begin{array}{rcl} \hat{S}(r)&=&\hat{B}(r)-\hat{X}(r)+\hat{D}(r)\\&=&(1-\frac{t}{|B|})\hat{B}(r)+(\hat{D}(r)-(\hat{X}(r)-\frac{t\hat{B}(r)}{|B|})).\end{array}$
If $r\in\mathcal{R}$,

$\displaystyle \begin{array}{rcl} |\hat{S}(r)|&\leq&(1-\frac{t}{|B|})|\hat{B}(r)|+\frac{\eta t}{3}\text{\qquad (By above calculation, lemma 9 and lemma 10)}\\&\leq&|\hat{B}(r)|-\frac{\eta t}{2}+\frac{\eta t}{3}\text{\qquad (Definition of }\mathcal{R}\text{)}\\&\leq&\eta|B|-\frac{\eta t}{6}\\&=&\eta|S|-\frac{\eta t}{6}.\text{\qquad (Since }|S|=|B|\text{)}\end{array}$

If $r\not\in\mathcal{R}$,

$\displaystyle \begin{array}{rcl} |\hat{S}(r)|&\leq&(1-\frac{t}{|B|})|\hat{B}(r)|+\frac{\eta t}{3}\\&\leq&\frac{\eta|B|}{2}+\frac{\eta t}{3}\text{\qquad (Again definition of }\mathcal{R}\text{)}\\&=&\frac{\eta|S|}{2}+\frac{\eta t}{3}.\end{array}$

The result follows. $\Box$

Write $D=\{d_1, \cdots, d_t\}$. Let $D'$ be any set obtained by replacing $d_j$ by $x_j+d_j$, where $x_j\in P$. (Recall: $P$ is an AP in the Bohr neighbourhood of $\mathcal{R}$)

Lemma 12 Suppose that $t\leq\frac{\eta\beta N}{10}$. Let $S' = (B\setminus X)\cup D'$. Assume the union is disjoint. Then $\sup_{r\neq0} |\widehat{S'}(r)| < \eta |S'|$.

Proof. If $r\not\in\mathcal{R}$, as we changed the elements in $D$, $|\widehat{S'}(r)-\hat{S}(r)| \leq \sum_{x\in D'} + \sum_{x\in D} 1 = 2t$. Hence

$\displaystyle \begin{array}{rcl} |\widehat{S'}(r)|&\leq&2t+|\hat{S}(r)|\\&\leq&\frac{\eta|S'|}{2}+\frac{\eta t}{2}+2t\\&<&\frac{\eta|S'|}{2}+5t\\&\leq&\frac{\eta|S'|}{2}+\frac{\eta|S'|}{2}\\&=&\eta|S|.\end{array}$

If $r\in\mathcal{R}$, recall $P\subset\mathcal{B}(\mathcal{R}, \frac{\eta}{64})$, then

$\displaystyle \begin{array}{rcl} |\widehat{S'}(r)|&\leq&2t+|\hat{S}(r)|\\&\leq&\sum_{j=1}^{t} |\omega^{r(d_j+x_j)}-\omega^{rd_j}|\\&\leq&t\sup_{j}|\omega^{rx_j}-1|\\&\leq&\frac{\eta t}{8}.\text{\qquad(Since }\|rx_j\|\leq\frac{\eta}{64}\text{ together with some elementary inequalities)}\end{array}$

Therefore $|\widehat{S'}(r)|\leq\frac{\eta t}{8}+|\hat{S}(r)| = \frac{\eta t}{8}+\eta|S'|-\frac{\eta t}{6} < \eta|S'|$. $\Box$

We are almost done (finally!). Recall $B\subset A$ is the set such that $\sup_{r\neq0}|\hat{B}(r)|$ is minimal subject to the constraint $|B| = \beta N$. Therefore if $D'\subset A$, $S'$ could not exist! In another words, no such choice of $x_1,\cdots, x_t$ such that one could construct $S'$.

Lemma 13 There exists some $j$ such that $d_j + P$ is contained in $B\cup A^c$, except for at most $t$ elements, which could lie in $A\setminus B$.

Proof. Suppose not. Then for all $j$, more than $t$ elements in $d_j+P$ lie in $A\setminus B$. Choose $x_1\in P$ such that $d_1+x_1\in A\setminus B$. Choose $x_2\in P$ such that $d_2+x_2 \in A\setminus (B\cup\{d_1+x_1\})$. Continue this way, choose $x_t\in P$ such that $d_t+x_t\in A\setminus (B\cup \bigcup_{j=1}^{t-1}\{d_j+x_j\})$. This gives us a $D'\subset A$, contradiction. $\Box$

Let $j$ be such that Lemma 13 holds. Recall that $d_j\in C$. Therefore $|(d_j+P)\cap B|\leq\frac{16\beta}{\eta}|P|$. Therefore, $A^c$ contains at least $(1-\frac{16\beta}{\eta})|d_j+P|$ and minus at most $t$ points.
If one choose $t\leq\frac{16\beta}{\eta}|P|$, $A^c$ will contain a portion $(1-\frac{32\beta}{\eta})$ of $d_j+P$. Suppose further that the parameters have also been chosen such that $|P|\geq\frac{\eta}{32\beta}$. Then $A^c$ contains more than $(1-\frac{32\beta}{\eta})$ of $P$ must contain an AP of length at least $\frac{\eta}{96\beta}$.

Remark: If we choose $\beta$ such that $|A|<\beta N$, then it is even impossible to define $B$. But in this case $A^c$ will contain an AP of length at least $1/\beta > \eta/96\beta$.

Proposition 14 Suppose that $t\leq\frac{16\beta}{\eta}|P|$, $|P|\geq\frac{\eta}{32\beta}$, and $t$ satisfies the condition in Lemma 9, 10. Then $A^c$ contains an AP of length at least $\frac{\eta}{96\beta}$.
By taking $\beta=e^{-\alpha\sqrt{\log{N}}/16}$, $t = 2^{14}\eta^{-2}\log{N}$, with some painful calculations we can get theorem 2. (Actually I didn’t check this, it may be wrong. But I think this part is less important compared to the whole argument.)