Intro Reproducing kernel Hilbert spaces are particularly important because of the celebrated representer theorem, which states that every function in an RKHS that minimizes an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training samples. Spaces 1. Vector space: A set of vectors with operations of addition and scalar multiplication satisfying ..
Lebesgue-Stieltjes Integral Definition Suppose $G(·)$ is a right-continuous, non-decreasing step function having jumps at $x_1, x_2, \cdots$. Then for any function $f(·)$, we define the integral $$ \int^b_a f(x) ~ d G(x) \equiv \sum_{j ~: ~ a < x_j \leq b} f(x_j) \cdot \{G(x_j) - G(x_j-)\} = \sum_{j ~ : ~ a < x_j \leq b} f(x_j) \cdot \Delta G(x_j) $$ where $ \Delta G(x_j) = G(x_j) - G(x_j-) $. T..
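As a minimal sketch of the definition above: when $G$ is a step function, the integral reduces to a finite sum over the jump points lying in $(a, b]$. The function name `stieltjes_step_integral`, the dictionary `jumps` (mapping each $x_j$ to $\Delta G(x_j)$), and the example jump sizes are hypothetical choices made for illustration only.

```python
def stieltjes_step_integral(f, jumps, a, b):
    """Sum f(x_j) * Delta G(x_j) over the jump points with a < x_j <= b."""
    return sum(f(x) * dg for x, dg in jumps.items() if a < x <= b)


# Assumed example: G jumps by 0.5, 0.25, 0.25 at x = 1, 2, 3.
jumps = {1.0: 0.5, 2.0: 0.25, 3.0: 0.25}

# The jump at x = 1 is excluded because the sum runs over a < x_j <= b.
result = stieltjes_step_integral(lambda x: x ** 2, jumps, 1.0, 3.0)
print(result)
```

Note the half-open interval: moving the lower limit from 1 to 0 would bring the jump at $x = 1$ into the sum.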
Radon-Nikodym theorem Let $\mu$ and $\nu$ be $\sigma$-finite measures on some measurable space $(X, \mathcal{A})$. Then there exists a measurable function $f : X \to [0, \infty)$ satisfying $$ \nu(A) = \int_A f ~ d\mu \text{ for all } A \in \mathcal{A}, \qquad f = \frac{d\nu}{d\mu}, $$ if and only if $\nu$ is absolutely continuous with respect to $\mu$. This function $f$ is called the Radon–Nikodym derivative.
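On a finite set the theorem can be illustrated directly: the derivative is the ratio of point masses wherever $\mu$ puts positive mass, and absolute continuity fails exactly when $\nu$ charges a point that $\mu$ does not. The sketch below assumes measures given as dictionaries of point masses; the name `radon_nikodym_discrete` and the example masses are hypothetical.

```python
def radon_nikodym_discrete(nu, mu):
    """Return f = d(nu)/d(mu) for measures on a finite set, given as dicts of point masses."""
    f = {}
    for x in set(mu) | set(nu):
        m, v = mu.get(x, 0.0), nu.get(x, 0.0)
        if m == 0:
            if v != 0:
                # nu charges a mu-null point, so no density exists.
                raise ValueError("nu is not absolutely continuous w.r.t. mu")
            f[x] = 0.0  # the value on a mu-null point is arbitrary
        else:
            f[x] = v / m
    return f


# Assumed example measures on the three-point set {a, b, c}.
mu = {"a": 0.5, "b": 0.25, "c": 0.25}
nu = {"a": 0.25, "b": 0.5, "c": 0.25}
f = radon_nikodym_discrete(nu, mu)

# Recover nu(A) as the integral of f over A with respect to mu.
A = {"a", "b"}
nu_A = sum(f[x] * mu[x] for x in A)
print(f, nu_A)
```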
proxy variable A measurable variable that is used in place of a variable that cannot be measured. For example, since husbands and wives usually have similar views, an interviewer might use the view expressed by a wife who is present in place of the view that could not be expressed by an absent husband.
Preliminary Gaussian concentration The centered Gaussian random variable $X$ on $\mathbb{R}$ with variance $\sigma^2 > 0$ has density given by $$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \text{exp}\left( \frac{-x^2}{2\sigma^2}\right).$$ Important property of the Gaussian If $X \sim N(0, \sigma^2)$, we have, for any $t > 0$, $$P(|X| \geq t) \leq \frac{\sigma\sqrt{2}}{t\sqrt{\pi}} \text{exp} \left(\frac{..
Definition Suppose we have independent observations $z_i$ $(i = 1, \ldots, n)$ with expectations $\mu_i$ and variances $V(\mu_i)$, where $V$ is some known function. Later we shall relax this specification and say $\mathrm{var}(z_i) \propto V(\mu_i)$. We suppose that for each observation $\mu_i$ is some known function of a set of parameters $\beta_1, \cdots, \beta_r$. Then for each observation we define the quas..
Setup Let $X_1, X_2, \cdots $ be independent variables, and let $w_{ijn}(\cdot, \cdot)$ be Borel functions such that $Var[w_{ijn}(X_i, X_j)]$ is finite. Put $$ W(n) = \sum_{1 \leq i \leq n} \sum_{1 \leq j \leq n} w_{ijn}(X_i, X_j), ~~~ (\text{such as } W(n) = \sum_{1 \leq i \leq n} \sum_{1 \leq j \leq n} a_{ij} X_i X_j) $$ and $$ W_{ij} = w_{ijn}(X_i, X_j) + w_{jin}(X_j, X_i). $$ The index $n$ i..
Big O and little o A sequence $x_n$ of non-random vectors is said to be $O(1)$ if it is bounded and $o(1)$ if it converges to zero. If $a_n$ is a sequence of non-random positive scalars, then $$ x_n = O(a_n) \text{ means } \frac{x_n}{a_n} = O(1) $$ (that is, $\frac{x_n}{a_n}$ is bounded), and $$ x_n = o(a_n) \text{ means } \frac{x_n}{a_n} = o(1) $$ (that is, $\frac{x_n}{a_n}$ converges to zero). Bi..
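A small numerical illustration of the two definitions, using scalar sequences for simplicity: with $a_n = n^2$, the sequence $x_n = 3n^2 + n$ has a bounded ratio (so it is $O(n^2)$), while $x_n = n$ has a ratio tending to zero (so it is $o(n^2)$). The helper `ratios` and the chosen sequences are assumptions for illustration; checking finitely many terms only suggests the limiting behaviour and proves nothing.

```python
def ratios(x, a, ns):
    """Evaluate x_n / a_n at each index n in ns."""
    return [x(n) / a(n) for n in ns]


ns = [10, 100, 1000, 10000]

# x_n = 3n^2 + n against a_n = n^2: the ratio is 3 + 1/n, which stays bounded.
big_o = ratios(lambda n: 3 * n ** 2 + n, lambda n: n ** 2, ns)

# x_n = n against a_n = n^2: the ratio is 1/n, which shrinks toward zero.
little_o = ratios(lambda n: n, lambda n: n ** 2, ns)

print(big_o)
print(little_o)
```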
Inner products Definition Definition 1 (Inner product). A function $\langle \cdot , \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is an inner product if $\langle x , x \rangle \geq 0$, $\langle x , x \rangle = 0 \Leftrightarrow x = 0$ (positivity); $\langle x , y \rangle = \langle y , x \rangle$ (symmetry); $\langle x + y , z \rangle = \langle x , z \rangle + \langle y , z \rangl..
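As a quick sanity check of the properties listed above, the sketch below evaluates positivity, symmetry, and additivity for the standard dot product on $\mathbb{R}^3$. The function `dot` and the sample vectors are hypothetical choices; integer entries are used so that the equalities hold exactly, and a check on a few vectors is an illustration, not a proof.

```python
def dot(x, y):
    """Standard dot product on R^n, the usual example of an inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Assumed sample vectors with integer entries (exact arithmetic).
x, y, z = (1, -2, 3), (4, 0, -1), (2, 5, 7)
zero = (0, 0, 0)
x_plus_y = tuple(xi + yi for xi, yi in zip(x, y))

positivity = dot(x, x) > 0 and dot(zero, zero) == 0
symmetry = dot(x, y) == dot(y, x)
additivity = dot(x_plus_y, z) == dot(x, z) + dot(y, z)

print(positivity, symmetry, additivity)
```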
Hoeffding's inequality Let $X_1, \cdots , X_n$ be independent random variables such that $a_{i} \leq X_{i} \leq b_{i}$ almost surely. Consider the sum of these random variables, $S_n = X_1 + \cdots + X_n.$ Then Hoeffding's theorem states that, for all $t > 0$, $$ P \left( S_{n} - E \left[S_{n}\right] \geq t \right) \leq \text{exp} \left( - \frac{2t^2}{\sum _{i = 1}^{n}(b_{i} - a_{i})^2} \right) ..
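The one-sided bound stated above can be compared with a Monte Carlo estimate. The sketch below assumes $X_i \sim \text{Uniform}[0, 1]$ (so $a_i = 0$, $b_i = 1$, $E[S_n] = n/2$); the function names, the sample size, and the seed are illustrative assumptions, and the simulation only shows that the bound holds in this example, usually with considerable slack.

```python
import math
import random


def hoeffding_bound(t, ranges):
    """exp(-2 t^2 / sum_i (b_i - a_i)^2), the bound on P(S_n - E[S_n] >= t)."""
    return math.exp(-2 * t ** 2 / sum((b - a) ** 2 for a, b in ranges))


def empirical_tail(t, n, trials, seed=0):
    """Monte Carlo estimate of P(S_n - E[S_n] >= t) for Uniform[0, 1] summands."""
    rng = random.Random(seed)
    mean = n * 0.5
    hits = sum(
        1
        for _ in range(trials)
        if sum(rng.random() for _ in range(n)) - mean >= t
    )
    return hits / trials


n, t = 50, 5.0
bound = hoeffding_bound(t, [(0.0, 1.0)] * n)  # equals exp(-1) here
freq = empirical_tail(t, n, trials=20000)
print(f"empirical tail = {freq:.4f}, Hoeffding bound = {bound:.4f}")
```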