Stat/Empirical Process

1. Symmetrization

repaired_stat 2023. 1. 10. 18:37

This post is based on the following notes (Lecture notes in empirical process theory, Kengo Kato, 2019).

https://drive.google.com/file/d/0B7C_CufYq6j6QU5rblF2Yl85d3c/view?resourcekey=0-ItZa4Z1yrAGhUa7scVo_aw

 



 

  • The main object is to study probability estimates of the random quantity $$ \| P_n - P \|_\mathcal{F} := \sup_{f \in \mathcal{F}} | P_n f - P f |, $$ and limit theorems for the empirical process $(P_n - P) f,\ f \in \mathcal{F}.$
  • The symmetrization replaces $ \sum^n_{i = 1} (f(X_i) − P f)$ by $\sum^n_{i = 1} \varepsilon_i f(X_i)$ with independent Rademacher random variables $\varepsilon_1, \cdots , \varepsilon_n$ independent of $X_1, \cdots , X_n.$
  • A Rademacher random variable $\varepsilon$ is a random variable taking $\pm 1$ with equal probability, that is, $P(\varepsilon = 1) = P(\varepsilon = −1) = \frac{1}{2}.$
  • $\mathbb{E}_\varepsilon$ denotes the expectation with respect to $\varepsilon_1, \varepsilon_2,\cdots$ only.
  • Also, $\mathbb{E}_X$ denotes the expectation with respect to $X_1, X_2,\cdots$ only.
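As a quick numerical illustration (my own sketch, not from the notes), the following Python snippet estimates $\| P_n - P \|_\mathcal{F}$ for the illustrative class $\mathcal{F} = \{ x \mapsto 1\{x \leq t\} \}$ over a grid of thresholds, with $X_i \sim \mathrm{Unif}(0, 1)$ (so $P f_t = t$), together with its Rademacher-symmetrized analogue; the class, sample size, and grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class F = {x -> 1{x <= t}} over a grid of thresholds; X ~ Uniform(0, 1),
# so P f_t = t. Then ||P_n - P||_F is (a grid version of) the KS statistic.
n = 1000
ts = np.linspace(0.05, 0.95, 19)
X = rng.uniform(size=n)

Pn_f = (X[:, None] <= ts).mean(axis=0)   # P_n f_t for each t
P_f = ts                                 # P f_t = t
sup_dev = np.abs(Pn_f - P_f).max()       # ||P_n - P||_F

# Rademacher variables: +-1 with probability 1/2 each, independent of X
eps = rng.choice([-1, 1], size=n)
sym = np.abs((eps[:, None] * (X[:, None] <= ts)).mean(axis=0)).max()

print(f"||P_n - P||_F       ~ {sup_dev:.4f}")
print(f"symmetrized version ~ {sym:.4f}")
```

Both quantities are typically of order $n^{-1/2}$, which is the comparison the symmetrization inequalities below make precise.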

 


 

1.1 Symmetrization inequalities

The following is the simplest symmetrization inequality.

 

  • Theorem 1. Suppose that $P f = 0$ for all $f \in \mathcal{F}$. Let $\varepsilon_1, \cdots, \varepsilon_n$ be independent Rademacher random variables independent of $X_1, \cdots , X_n$. Let $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ be a non-decreasing convex function, and let $\mu :\mathcal{F} \to \mathbb{R}$ be a bounded functional such that $\{f + \mu(f) : f \in \mathcal{F}\}$ is pointwise measurable. Then $$ \mathbb{E}\left[\Phi \left( \frac{1}{2} \left\| \sum^n_{i = 1} \varepsilon_i f(X_i) \right\|_\mathcal{F} \right)\right] \leq \mathbb{E}\left[\Phi \left( \left\| \sum^n_{i = 1} f(X_i) \right\|_\mathcal{F} \right)\right] \leq \mathbb{E}\left[\Phi \left( 2 \left\| \sum^n_{i = 1} \varepsilon_i (f(X_i) + \mu(f)) \right\|_\mathcal{F} \right)\right].  \cdots (1) $$
    • Proof) We begin by proving the left inequality. We claim that for any disjoint index sets $A, B \subset \{1, 2, \cdots, n\},$ $$ \mathbb{E}\left[\Phi \left( \left\| \sum_{i \in A} f(X_i) \right\|_\mathcal{F} \right)\right] \leq \mathbb{E}\left[\Phi \left( \left\| \sum_{i \in A \cup B} f(X_i) \right\|_\mathcal{F} \right)\right]. \cdots (2) $$ Indeed, by the definition of pointwise measurability, there exists a countable subset $\mathcal{G} \subset \mathcal{F}$ such that for any $f \in \mathcal{F}$ there exists a sequence $g_m \in \mathcal{G}$ with $g_m \to f$ pointwise. Then $$ \left \| \sum_{i \in A} f(X_i) \right \|_\mathcal{F} = \left \| \sum_{i \in A} f(X_i) \right \|_\mathcal{G} = \left \| \sum_{i \in A} f(X_i) + \mathbb{E} \left[ \sum_{i \in B} f(X_i) \right] \right \|_\mathcal{G} $$ because $P f = 0$ for each $f \in \mathcal{F}$. Fixing any $x_i \in S, i \in A,$ we have that $$ \left \| \sum_{i \in A} f(x_i) + \mathbb{E} \left[ \sum_{i \in B} f(X_i) \right] \right \|_\mathcal{G} = \left \| \mathbb{E} \left[ \sum_{i \in A} f(x_i) + \sum_{i \in B} f(X_i) \right] \right \|_\mathcal{G} \leq \mathbb{E} \left[ \left \| \sum_{i \in A} f(x_i) + \sum_{i \in B} f(X_i) \right \|_\mathcal{G} \right] $$ (by Jensen's inequality, since the map $v \mapsto \| v \|_\mathcal{G}$ is convex), and since $\Phi$ is non-decreasing and convex, $$ \Phi \left( \left \| \sum_{i \in A} f(x_i) + \mathbb{E} \left[ \sum_{i \in B} f(X_i) \right] \right \|_\mathcal{G} \right) \leq \Phi \left( \mathbb{E} \left[ \left \| \sum_{i \in A} f(x_i) + \sum_{i \in B} f(X_i) \right \|_\mathcal{G} \right] \right) \leq \mathbb{E} \left[ \Phi \left( \left \| \sum_{i \in A} f(x_i) + \sum_{i \in B} f(X_i) \right \|_\mathcal{G} \right) \right], $$ where the second inequality follows from Jensen's inequality. Applying Fubini's theorem and using the fact that $ \| \sum_{i \in A \cup B} f(X_i) \|_\mathcal{G} = \| \sum_{i \in A \cup B} f(X_i) \|_\mathcal{F} $, we obtain the inequality (2). From this, we have \begin{aligned}
      \mathbb{E}_X \left[ \Phi \left( \left\| \sum^n_{i = 1} \varepsilon_i f(X_i) \right\|_\mathcal{F} \right) \right] 
      & = \mathbb{E}_X \left[ \Phi \left( \left\| \sum_{i : \varepsilon_i = 1} f(X_i) - \sum_{i : \varepsilon_i = -1} f(X_i) \right\|_\mathcal{F} \right) \right] \\
      & \leq \frac{1}{2} \mathbb{E}_X \left[ \Phi \left( 2 \left\| \sum_{i : \varepsilon_i = 1} f(X_i) \right\|_\mathcal{F} \right) \right] +  \frac{1}{2} \mathbb{E}_X \left[ \Phi \left( 2 \left\| \sum_{i : \varepsilon_i = -1} f(X_i) \right\|_\mathcal{F} \right) \right] \\
      & \leq \mathbb{E}_X \left[ \Phi \left( 2 \left\| \sum^n_{i = 1} f(X_i) \right\|_\mathcal{F} \right) \right],
      \end{aligned} where the first inequality uses the triangle inequality together with the convexity of $\Phi$, and the second follows from the claim (2) (with $\Phi(2\,\cdot)$ in place of $\Phi$) applied conditionally on $\varepsilon_1, \cdots, \varepsilon_n$ with $A = \{i : \varepsilon_i = 1\}$ (resp. $A = \{i : \varepsilon_i = -1\}$) and $B = \{1, \cdots, n\} \setminus A$. Taking the expectation with respect to $\varepsilon_1, \cdots, \varepsilon_n$ (Fubini's theorem) and applying the resulting bound with $\Phi(\cdot / 2)$ in place of $\Phi$, which is again non-decreasing and convex, leads to the left inequality in (1).
      For the opposite inequality, let $X_{n + 1}, \cdots, X_{2n}$ be an independent copy of $X_1, \cdots, X_n$; since $Pf = 0$, we have $\mathbb{E}[f(X_{n + i})] = 0$ for each $f \in \mathcal{F}$. Using the argument used to prove the inequality (2), we have that $$ \mathbb{E}\left[\Phi \left( \left\| \sum^n_{i = 1} f(X_i) \right\|_\mathcal{F} \right)\right] = \mathbb{E}\left[\Phi \left( \left\| \sum^n_{i = 1} (f(X_i) - \mathbb{E}[f(X_{n + i})]) \right\|_\mathcal{F} \right)\right] \leq \mathbb{E}\left[\Phi \left( \left\| \sum^n_{i = 1} (f(X_i) - f(X_{n + i})) \right\|_\mathcal{F} \right)\right].  \cdots (3) $$ Because $(X_i , X_{n + i}) \overset{d}{=} (X_{n + i} , X_i)$ for each $1 \leq i \leq n$, and $(X_i , X_{n + i}), 1 \leq i \leq n$ are independent, the last expression in (3) is equal to \begin{aligned}
      \mathbb{E}\left[\Phi \left( \left\| \sum^n_{i = 1} \varepsilon_i(f(X_i) - f(X_{n + i})) \right\|_\mathcal{F} \right)\right]
      & \leq \frac{1}{2} \mathbb{E} \left[ \Phi \left( 2 \left\| \sum^n_{i = 1} \varepsilon_i(f(X_i) + \mu(f)) \right\|_\mathcal{F} \right) \right] + \frac{1}{2} \mathbb{E} \left[ \Phi \left( 2 \left\| \sum^n_{i = 1} \varepsilon_i(f(X_{n + i}) + \mu(f)) \right\|_\mathcal{F} \right) \right] \\
      & = \mathbb{E} \left[ \Phi \left( 2 \left\| \sum^n_{i = 1} \varepsilon_i(f(X_i) + \mu(f)) \right\|_\mathcal{F} \right) \right].
      \end{aligned}  This completes the proof.
    • We will often use the symmetrization inequality with $\Phi(x) = x^p$ for some $p \geq 1$ and $\mu(f) = Pf$ when $\mathcal{F}$ is not $P$-centered. In that case, applying (1) to the centered class $\{f - Pf : f \in \mathcal{F}\}$, we have $$ \frac{1}{2^p}\mathbb{E}\left[ \left\| \sum^n_{i = 1} \varepsilon_i (f(X_i) - Pf) \right\|^p_\mathcal{F} \right] \leq \mathbb{E}\left[ \left\| \sum^n_{i = 1} (f(X_i) - Pf) \right\|^p_\mathcal{F} \right] \leq 2^p \mathbb{E}\left[ \left\| \sum^n_{i = 1} \varepsilon_i f(X_i) \right\|^p_\mathcal{F} \right].$$ There is an analogous symmetrization inequality for probabilities.
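The displayed moment inequality can be checked numerically. Below is a small Monte Carlo sanity check (my own sketch, not from the notes) with $p = 1$ and the illustrative class of indicators $f_t(x) = 1\{x \leq t\}$ over a grid of thresholds, $X_i \sim \mathrm{Unif}(0, 1)$ (so $P f_t = t$); the sample size, grid, and replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the p = 1 symmetrization inequality
#   (1/2) E||sum eps_i (f(X_i) - Pf)||_F <= E||sum (f(X_i) - Pf)||_F
#                                        <= 2 E||sum eps_i f(X_i)||_F
# for the illustrative class F = {x -> 1{x <= t}}, X ~ Uniform(0, 1).
n, reps = 200, 2000
ts = np.linspace(0.1, 0.9, 9)

def sup_norm(M):
    # ||.||_F = sup over the grid of thresholds, for each replication
    return np.abs(M).max(axis=1)

X = rng.uniform(size=(reps, n))
eps = rng.choice([-1, 1], size=(reps, n))
F = (X[..., None] <= ts)           # f_t(X_i), shape (reps, n, |grid|)
centered = F - ts                  # f_t(X_i) - P f_t

lhs = sup_norm((eps[..., None] * centered).sum(axis=1)).mean() / 2
mid = sup_norm(centered.sum(axis=1)).mean()
rhs = 2 * sup_norm((eps[..., None] * F).sum(axis=1)).mean()

print(f"{lhs:.2f} <= {mid:.2f} <= {rhs:.2f}")
```

The three estimates line up as the inequality predicts, with a visible gap on both sides: the constants $2^{-p}$ and $2^p$ are not tight for this class.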

 

  • Theorem 2. Let $\varepsilon_1, \cdots, \varepsilon_n$ be independent Rademacher random variables independent of $X_1, \cdots , X_n$. Let $\mu :\mathcal{F} \to \mathbb{R}$ be a bounded functional such that $\{f + \mu(f) : f \in \mathcal{F}\}$ is pointwise measurable. Then for every $x > 0$, $$ \beta_n(x) \mathbb{P} \left\{ \left\| \sum^n_{i = 1} f(X_i) \right\|_\mathcal{F} > x \right\} \leq 2 \mathbb{P} \left\{ 4 \left\| \sum^n_{i = 1} \varepsilon_i( f(X_i) + \mu(f) ) \right\|_\mathcal{F} > x \right\} ,$$ where $\beta_n(x)$ is any constant such that $\beta_n(x) \leq \inf_{f \in \mathcal{F}} \mathbb{P} \{|\sum^n_{i = 1} f(X_i)| < x/2\}.$ In particular, when $Pf = 0$ for all $f \in \mathcal{F}$, we may take $\beta_n(x) = 1 - (4n / x^2) \sup_{f \in \mathcal{F}} Pf^2.$
    • Proof) The second assertion follows from Markov's inequality: when $Pf = 0$, $\mathbb{P}\{|\sum^n_{i = 1} f(X_i)| \geq x/2\} \leq (4/x^2) \mathbb{E}[(\sum^n_{i = 1} f(X_i))^2] = (4n/x^2) Pf^2.$ We shall prove the first assertion. Let $X_{n + 1}, \cdots, X_{2n}$ be an independent copy of $X_1, \cdots, X_n$. If $\| \sum^n_{i = 1} f(X_{n + i}) \|_\mathcal{F} > x$, then there is a function $\tilde{f} \in \mathcal{F}$ that may depend on $X_{n + 1}, \cdots, X_{2n}$ such that $| \sum^n_{i = 1} \tilde{f}(X_{n + i}) | > x.$ For such $\tilde{f},$ we have \begin{aligned}
      \beta_n(x) & \leq \mathbb{P}\left\{ \left| \sum^n_{i = 1} \tilde{f}(X_i) \right| < \frac{x}{2} \ \middle| \ X_{n + 1}, \cdots, X_{2n} \right\} \\
      & \leq \mathbb{P}\left\{ \left| \sum^n_{i = 1} (\tilde{f}(X_i) - \tilde{f}(X_{n + i})) \right| > \frac{x}{2} \ \middle| \ X_{n + 1}, \cdots, X_{2n} \right\} \\
      & \leq \mathbb{P}\left\{ \left\| \sum^n_{i = 1} (f(X_i) - f(X_{n + i})) \right\|_\mathcal{F} > \frac{x}{2} \ \middle| \ X_{n + 1}, \cdots, X_{2n} \right\}.
      \end{aligned} Here the second inequality holds because, on the event $\{ | \sum^n_{i = 1} \tilde{f}(X_i) | < x/2 \}$, the triangle inequality together with $| \sum^n_{i = 1} \tilde{f}(X_{n + i}) | > x$ gives $| \sum^n_{i = 1} (\tilde{f}(X_i) - \tilde{f}(X_{n + i})) | \geq | \sum^n_{i = 1} \tilde{f}(X_{n + i}) | - | \sum^n_{i = 1} \tilde{f}(X_i) | > x - x/2 = x/2.$ The far left and right hand sides do not depend on $\tilde{f}$, and the inequality between them is valid on the event $\{\| \sum^n_{i = 1} f(X_{n + i}) \|_\mathcal{F} > x\}.$ Hence, taking expectations, we have $$ \beta_n(x) \mathbb{P} \left\{ \left\| \sum^n_{i = 1} f(X_{n + i}) \right\|_\mathcal{F} > x \right\} \leq \mathbb{P}\left\{ \left\| \sum^n_{i = 1} (f(X_i) - f(X_{n + i})) \right\|_\mathcal{F} > \frac{x}{2} \right\}, $$ where the left-hand side equals $\beta_n(x) \mathbb{P} \{ \| \sum^n_{i = 1} f(X_i) \|_\mathcal{F} > x \}$ since $X_{n + 1}, \cdots, X_{2n}$ is an independent copy of $X_1, \cdots, X_n$. Because $(X_i , X_{n + i}) \overset{d}{=} (X_{n + i} , X_i)$ for each $1 \leq i \leq n$, and $(X_i , X_{n + i}), 1 \leq i \leq n$ are independent, the right-hand side is equal to \begin{aligned}
      \mathbb{P}\left\{ \left\| \sum^n_{i = 1} \varepsilon_i (f(X_i) - f(X_{n + i})) \right\|_\mathcal{F} > \frac{x}{2} \right\}
      & \leq \mathbb{P}\left\{ \left\| \sum^n_{i = 1} \varepsilon_i (f(X_i) + \mu(f)) \right\|_\mathcal{F} > \frac{x}{4} \right\} + \mathbb{P}\left\{ \left\| \sum^n_{i = 1} \varepsilon_i (f(X_{n + i}) + \mu(f)) \right\|_\mathcal{F} > \frac{x}{4} \right\} \\
      & = 2 \mathbb{P}\left\{ \left\| \sum^n_{i = 1} \varepsilon_i (f(X_i) + \mu(f)) \right\|_\mathcal{F} > \frac{x}{4} \right\}.
      \end{aligned} This completes the proof.
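As a Monte Carlo sanity check of Theorem 2 (again my own sketch, not from the notes), take $\mu \equiv 0$ and the already-centered illustrative class $f_t(x) = 1\{x \leq t\} - t$ with $X_i \sim \mathrm{Unif}(0, 1)$; then $\sup_{f \in \mathcal{F}} Pf^2 = \sup_t t(1 - t) = 1/4$, so $\beta_n(x) = 1 - n/x^2$. All constants below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Check beta_n(x) P{||sum f(X_i)|| > x} <= 2 P{4 ||sum eps_i f(X_i)|| > x}
# for the centered class f_t(x) = 1{x <= t} - t, X ~ Uniform(0, 1),
# where sup_f Pf^2 = 1/4 and hence beta_n(x) = 1 - n / x^2.
n, reps = 200, 2000
ts = np.linspace(0.1, 0.9, 9)
x = 1.2 * np.sqrt(n)          # chosen so that beta_n(x) > 0
beta = 1 - n / x**2

X = rng.uniform(size=(reps, n))
eps = rng.choice([-1, 1], size=(reps, n))
F = (X[..., None] <= ts) - ts                   # f_t(X_i), centered

sup_plain = np.abs(F.sum(axis=1)).max(axis=1)   # ||sum_i f(X_i)||_F
sup_sym = np.abs((eps[..., None] * F).sum(axis=1)).max(axis=1)

lhs = beta * (sup_plain > x).mean()
rhs = 2 * (4 * sup_sym > x).mean()
print(f"beta_n(x) P(...) ~ {lhs:.3f} <= {rhs:.3f} ~ 2 P(4 ||...|| > x)")
```

The right-hand side is much larger here because the threshold $x/4$ is easy for the symmetrized sup to exceed; the bound is a tool for transferring tail estimates, not a tight comparison.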

 

1.2 The contraction principle

A function
