Intro Reproducing kernel Hilbert spaces (RKHS) are particularly important because of the celebrated representer theorem, which states that every function in an RKHS that minimizes an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training samples. Spaces 1. Vector space: A set of vectors with operations of addition and scalar multiplication satisfying ..
Lebesgue-Stieltjes Integral Definition Suppose $G(\cdot)$ is a right-continuous, non-decreasing step function having jumps at $x_1, x_2, \cdots$. Then for any function $f(\cdot)$, we define the integral $$\int_a^b f(x) \, dG(x) \equiv \sum_{j : a < x_j \leq b} f(x_j) \cdot \{ G(x_j) - G(x_j-) \} = \sum_{j : a < x_j \leq b} f(x_j) \cdot \Delta G(x_j)$$ where $\Delta G(x_j) = G(x_j) - G(x_j-)$. T..
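The jump-sum definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a general integrator: the helper name `stieltjes_step_integral`, the dictionary representation of the jumps, and the fair-die example are all assumptions introduced here, not part of the original post.

```python
def stieltjes_step_integral(f, jumps, a, b):
    """Integrate f against a step function G over (a, b].

    `jumps` maps each jump location x_j to its jump size
    Delta G(x_j) = G(x_j) - G(x_j-).
    """
    # Only jumps with a < x_j <= b contribute, as in the definition.
    return sum(f(x) * dG for x, dG in jumps.items() if a < x <= b)


# Hypothetical example: G is the CDF of a fair six-sided die,
# so it jumps by 1/6 at each of 1, ..., 6.
die_jumps = {k: 1 / 6 for k in range(1, 7)}

# Integrating f(x) = x over (0, 6] gives the expected value of the die.
mean = stieltjes_step_integral(lambda x: x, die_jumps, 0, 6)

# On (1, 6] the jump at x = 1 is excluded because the sum requires a < x_j.
tail = stieltjes_step_integral(lambda x: x, die_jumps, 1, 6)
```

The half-open interval matters: moving the lower limit from 0 to 1 drops the jump at 1, so the two results differ by exactly $f(1) \cdot \Delta G(1)$.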
Radon-Nikodym theorem Let $\mu$ and $\nu$ be $\sigma$-finite measures on some measurable space $(X, \mathcal{A})$. Then there exists a non-negative measurable function $f = \frac{d\nu}{d\mu}$ satisfying $\nu(A) = \int_A f \, d\mu$ for every $A \in \mathcal{A}$ if and only if $\nu$ is absolutely continuous with respect to $\mu$. This function $f$ is called the Radon–Nikodym derivative.
proxy variable A measurable variable that is used in place of a variable that cannot be measured. For example, since husbands and wives usually have similar views, an interviewer might use the view expressed by a wife who is present in place of the view that could not be expressed by an absent husband.
Preliminary Gaussian concentration The centered Gaussian random variable $X$ on $\mathbb{R}$ with variance $\sigma^2 > 0$ has density given by $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$. Important property of the Gaussian If $X \sim N(0, \sigma^2)$, we have, for any $t > 0$, $$P(|X| \geq t) \leq \frac{\sigma\sqrt{2}}{t\sqrt{\pi}} \text{exp} \left(\frac{..
Definition Suppose we have independent observations $z_i$ $(i = 1, \ldots, n)$ with expectations $\mu_i$ and variances $V(\mu_i)$, where $V$ is some known function. Later we shall relax this specification and say $\operatorname{var}(z_i) \propto V(\mu_i)$. We suppose that for each observation $\mu_i$ is some known function of a set of parameters $\beta_1, \cdots, \beta_r$. Then for each observation we define the quas..
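Setup Let $X_1, X_2, \cdots$ be independent variables, and let $w_{ijn}(\cdot, \cdot)$ be Borel functions such that $\operatorname{Var}[w_{ijn}(X_i, X_j)]$ is finite. Put $W(n) = \sum_{1 \leq i \leq n} \sum_{1 \leq j \leq n} w_{ijn}(X_i, X_j)$ (such as $W(n) = \sum_{1 \leq i \leq n} \sum_{1 \leq j \leq n} a_{ij} X_i X_j$) and $W_{ij} = w_{ijn}(X_i, X_j) + w_{jin}(X_j, X_i)$. The index n i..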
Big O and little o A sequence $x_n$ of non-random vectors is said to be $O(1)$ if it is bounded and $o(1)$ if it converges to zero. If $a_n$ is a sequence of non-random positive scalars, then $x_n = O(a_n)$ means $x_n / a_n = O(1)$ (that is, $x_n / a_n$ is bounded), and $x_n = o(a_n)$ means $x_n / a_n = o(1)$ (that is, $x_n / a_n$ converges to zero). Bi..
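A quick numerical look at the ratio $x_n / a_n$ can make the two definitions concrete. The sketch below is only illustrative: the sequences $x_n = 3n + 5$ and $x_n = \sqrt{n}$ and the helper `ratios` are chosen here as assumptions, and evaluating finitely many terms suggests, but does not prove, an asymptotic statement.

```python
def ratios(x, a, ns):
    """Return x_n / a_n for each n in ns."""
    return [x(n) / a(n) for n in ns]


ns = [10 ** k for k in range(1, 7)]  # n = 10, 100, ..., 1_000_000

# x_n = 3n + 5 with a_n = n: the ratio stays bounded, consistent with O(n).
big_o = ratios(lambda n: 3 * n + 5, lambda n: n, ns)

# x_n = sqrt(n) with a_n = n: the ratio shrinks toward zero, consistent with o(n).
little_o = ratios(lambda n: n ** 0.5, lambda n: n, ns)
```

Here `big_o` settles near 3 without vanishing, so the first sequence is $O(n)$ but not $o(n)$, while `little_o` keeps decreasing toward zero.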
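Inner products Definition Definition 1 (Inner product). A function $\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is an inner product if $\langle x, x \rangle \geq 0$, $\langle x, x \rangle = 0 \Leftrightarrow x = 0$ (positivity); $\langle x, y \rangle = \langle y, x \rangle$ (symmetry); $\langle x + y , z \rangle = \langle x , z \rangle + \langle y , z \rangl..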
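As a sanity check, the properties visible above (positivity, symmetry, and additivity in the first argument) can be spot-checked for the standard dot product on $\mathbb{R}^2$. This is a sketch under assumptions: the helpers `dot` and `add` and the sample vectors are hypothetical, integer coordinates are used so that equality is exact, and passing on a handful of vectors is evidence rather than proof.

```python
import itertools


def dot(x, y):
    """Standard dot product of two equal-length tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))


def add(x, y):
    """Componentwise sum of two equal-length tuples."""
    return tuple(xi + yi for xi, yi in zip(x, y))


zero = (0, 0)
vectors = [zero, (1, 2), (-3, 4), (5, -1)]

# Positivity: <x, x> >= 0, with equality exactly when x is the zero vector.
positivity = all(
    dot(x, x) >= 0 and ((dot(x, x) == 0) == (x == zero)) for x in vectors
)

# Symmetry: <x, y> = <y, x>.
symmetry = all(
    dot(x, y) == dot(y, x) for x, y in itertools.product(vectors, repeat=2)
)

# Additivity in the first argument: <x + y, z> = <x, z> + <y, z>.
additivity = all(
    dot(add(x, y), z) == dot(x, z) + dot(y, z)
    for x, y, z in itertools.product(vectors, repeat=3)
)
```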
Hoeffding's inequality Let $X_1, \cdots, X_n$ be independent random variables such that $a_i \leq X_i \leq b_i$ almost surely. Consider the sum of these random variables, $S_n = X_1 + \cdots + X_n$. Then Hoeffding's theorem states that, for all $t > 0$, $$ P \left( S_{n} - E \left[S_{n}\right] \geq t \right) \leq \text{exp} \left( - \frac{2t^2}{\sum _{i = 1}^{n}(b_{i} - a_{i})^2} \right) ..
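To get a feel for how conservative the bound is, one can compare it with a Monte Carlo estimate of the same tail probability. The sketch below makes several assumptions that are not in the original post: the $X_i$ are taken to be uniform on $[a_i, b_i]$, the function names `hoeffding_bound` and `empirical_tail` are made up for illustration, and the values $n = 50$, $t = 5$, and the random seed are arbitrary.

```python
import math
import random


def hoeffding_bound(t, ranges):
    """Hoeffding upper bound on P(S_n - E[S_n] >= t) for X_i in [a_i, b_i]."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return math.exp(-2 * t ** 2 / denom)


def empirical_tail(t, ranges, trials, rng):
    """Monte Carlo estimate of P(S_n - E[S_n] >= t), assuming uniform X_i."""
    mean = sum((a + b) / 2 for a, b in ranges)  # E[S_n] for uniform X_i
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(a, b) for a, b in ranges)
        if s - mean >= t:
            hits += 1
    return hits / trials


rng = random.Random(0)          # fixed seed so the run is repeatable
ranges = [(0.0, 1.0)] * 50      # n = 50 variables, each bounded in [0, 1]
t = 5.0

bound = hoeffding_bound(t, ranges)                       # exp(-2 * 25 / 50) = exp(-1)
estimate = empirical_tail(t, ranges, trials=20_000, rng=rng)
```

For this setup the bound equals $e^{-1}$, and the simulated frequency should fall well below it, since Hoeffding's inequality uses only the ranges $b_i - a_i$ and not the actual variance of the $X_i$.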