Dependency graph

Legend

Boxes: definitions
Ellipses: theorems and lemmas
Blue border: the statement of this result is ready to be formalized; all prerequisites are done
Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
Blue background: the proof of this result is ready to be formalized; all prerequisites are done
Green border: the statement of this result is formalized
Green background: the proof of this result is formalized
Dark green background: the proof of this result and all its ancestors are formalized

Corollary 1.2.13

\begin{align*} \mathcal B_\xi (\mu , \nu ) = \frac{1}{2}\left((P \circ \xi )(\mathcal X) - (P \circ \xi )\left[x \mapsto \left\vert \xi _0\frac{d \mu }{d(P \circ \xi )}(x) - \xi _1\frac{d \nu }{d(P \circ \xi )}(x)\right\vert \right] \right) \: . \end{align*}

LaTeX Lean

Corollary 4.1.2

For \(\alpha \in (0,1)\),

\begin{align*} \mathcal B_\xi (\mu , \nu ) \le e^{-(1 - \alpha ) R_\alpha (\xi _0\mu , \xi _1\nu )} = \xi _0^\alpha \xi _1^{1-\alpha } e^{-(1 - \alpha ) R_\alpha (\mu , \nu )} \: . \end{align*}

LaTeX

Corollary 2.3.69Bretagnolle-Huber inequality

TODO: move this somewhere after the definition of \(KL\).

For \(\mu , \nu \in \mathcal P(\mathcal X)\),

\begin{align*} \operatorname{KL}(\mu , \nu ) & \ge - \log (1 - \operatorname{TV}^2(\mu , \nu )) \: , \\ \text{or equivalently, } \operatorname{TV}(\mu , \nu ) & \le \sqrt{1 - \exp \left(-\operatorname{KL}(\mu , \nu ) \right)} \: . \end{align*}

LaTeX

Corollary 2.3.33

For \(f: \mathbb {R} \to \mathbb {R}\) a convex function, for all \(x \in \mathbb {R}\) ,

\begin{align*} f(x) & = f(1) + f’_+(1) (x - 1) + \int _{y} \phi _{1,y}(x) \partial \gamma _f \: . \end{align*}

LaTeX Lean

Corollary 2.3.70

TODO: move this somewhere after the definition of \(\operatorname{H}_\alpha \).

Let \(\mu , \nu \in \mathcal P(\mathcal X)\). For \(\alpha {\gt} 0\),

\begin{align*} \operatorname{H}_{1+\alpha }(\mu , \nu ) & \ge \frac{1}{\alpha } \left( (1 - \operatorname{TV}(\mu , \nu ))^{-\alpha } - 2 \right) \: , \\ \text{or equivalently, } 1 - \operatorname{TV}(\mu , \nu ) & \ge \exp \left(-\frac{1}{\alpha } \log \left( 2 + \alpha \operatorname{H}_{1+\alpha }(\mu , \nu ) \right)\right) \: . \end{align*}

LaTeX

Corollary 2.3.59

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(E\) be an event. Then \(D_f(\mu , \nu ) \ge d_f(\mu (E), \nu (E))\).

LaTeX

Corollary 2.3.62

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(\kappa : \mathcal X \rightsquigarrow [0,1]\). Then

\begin{align*} D_f(\mu , \nu ) \ge d_f((\kappa \circ \mu )[X], (\kappa \circ \nu )[X]) \: . \end{align*}

LaTeX

Corollary 2.3.39

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\xi \) be a measure on \(\mathcal Y\). Then \(D_f(\mu \times \xi , \nu \times \xi ) = D_f(\mu , \nu )\).

LaTeX

Corollary 2.3.48

Let \(a,b \in [0, +\infty )\) and let \(\mu , \nu \) be two measures on \(\mathcal X\).

\begin{align*} D_{\phi _{a,b}}(\mu , \nu ) = \mathcal I_{(a,b)}(\mu , \nu ) + \frac{1}{2} \left\vert a \mu (\mathcal X) - b \nu (\mathcal X) \right\vert + \text{sign}(b-a)\frac{1}{2}(a \mu (\mathcal X) - b \nu (\mathcal X)) \: . \end{align*}

LaTeX Lean

Corollary 2.9.8

For \(\mu , \nu \in \mathcal P(\mathcal X)\) and \(\alpha \in (0,1)\), \(\lambda \le 1/2\) ,

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) & \le \alpha ^{1 - \lambda } \operatorname{H}_{1 - \lambda }(\mu , \nu ) \: . \end{align*}

LaTeX

Corollary 4.1.19Change of measure - mean

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(E\) be an event on \(\mathcal X\). Let \(\beta \in \mathbb {R}\). Then

\begin{align*} \nu (E) e^{\operatorname{KL}(\mu , \nu ) + \beta } \ge \mu (E) - \mu \left\{ \log \frac{d \mu }{d \nu } - \operatorname{KL}(\mu , \nu ) {\gt} \beta \right\} \: . \end{align*}

LaTeX

Corollary 4.1.7

Let \(\alpha , \beta \in (0, 1)\). Let \(P, Q : \{ 0,1\} \rightsquigarrow \mathcal X\). We write \(\pi _\alpha \) for the probability measure on \(\{ 0,1\} \) with \(\pi _\alpha (\{ 0\} ) = \alpha \). Then

\begin{align*} \operatorname{KL}(\pi _\beta \otimes Q, \pi _\alpha \otimes P) & \ge \operatorname{kl}(B_\beta (Q(0), Q(1)), B_\alpha (P(0), P(1))) \: . \end{align*}

LaTeX

Corollary 4.1.8

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(\alpha , \beta \in (0, 1)\). Let \(P, Q : \{ 0,1\} \rightsquigarrow \mathcal X\). We write \(\pi _\alpha \) for the probability measure on \(\{ 0,1\} \) with \(\pi _\alpha (\{ 0\} ) = \alpha \). Then

\begin{align*} \operatorname{KL}(\pi _\beta \otimes Q, \pi _\beta \otimes P) & \ge \operatorname{kl}(B_\beta (Q(0), Q(1)), B_\alpha (P(0), P(1))) - \operatorname{kl}(\beta , \alpha ) \: . \end{align*}

LaTeX

Corollary 4.1.6

Let \(\pi , \xi \in \mathcal P(\Theta )\) and \(P, Q : \Theta \rightsquigarrow \mathcal X\). Suppose that the loss \(\ell '\) takes values in \([0,1]\). Then

\begin{align*} \operatorname{KL}(\pi \otimes Q, \pi \otimes P) & \ge \operatorname{kl}(\mathcal R_\pi ^Q, \mathcal R_\xi ^P) - \operatorname{KL}(\pi , \xi ) \: . \end{align*}

LaTeX

Corollary 3.1.5

For \(\mu \in \mathcal X\), \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal Y \rightsquigarrow \mathcal Z\) two Markov kernels,

\begin{align*} I(\mu \otimes (\eta \circ \kappa )) \le I(\mu \otimes \kappa ) \: . \end{align*}

LaTeX

Corollary 3.2.3

For \(\mu \in \mathcal P(\{ 0,1\} )\) and \(\kappa : \{ 0,1\} \rightsquigarrow \mathcal Y\),

\begin{align*} I_{\operatorname{KL}}^R(\mu , \kappa ) = \operatorname{JS}_{\mu _0}(\kappa _0, \kappa _1) \: . \end{align*}

LaTeX

Corollary 2.7.6

Let \(\mu , \nu \) be two probability measures. Then

\begin{align*} 1 - \sqrt{1 - (1 - \operatorname{H^2}(\mu , \nu ))^2} \le 1 - \operatorname{TV}(\mu , \nu ) \le 1 - \operatorname{H^2}(\mu , \nu ) \: . \end{align*}

LaTeX

Corollary 2.7.5

Let \(\mu , \nu \) be two probability measures. Then

\begin{align*} \frac{1}{2}(1 - \operatorname{H^2}(\mu , \nu ))^2 \le 1 - \operatorname{TV}(\mu , \nu ) - \frac{1}{2}(1 - \operatorname{TV}(\mu , \nu ))^2 \: . \end{align*}

LaTeX

Corollary 2.7.7

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and let \(n \in \mathbb {N}\), and \(\mu ^{\otimes n}, \nu ^{\otimes n}\) be their product measures on \(\mathcal X^n\). Then

\begin{align*} \frac{1}{2}e^{-n R_{1/2}(\mu , \nu )} \le 1 - \sqrt{1 - e^{-n R_{1/2}(\mu , \nu )}} \le 1 - \operatorname{TV}(\mu ^{\otimes n}, \nu ^{\otimes n}) \le e^{-\frac{1}{2} n R_{1/2}(\mu , \nu )} \: . \end{align*}

LaTeX

Corollary 2.6.18

Let \(\mu , \nu , \xi \) be three measures on \(\mathcal X\) and let \(\alpha \in (0, 1)\). Then

\begin{align*} (1 - \alpha ) R_\alpha (\mu , \nu ) = \alpha \operatorname{KL}(\mu ^{(\alpha , \nu )}, \mu ) + (1 - \alpha )\operatorname{KL}(\mu ^{(\alpha , \nu )}, \nu ) \: . \end{align*}

LaTeX

Corollary 2.6.23

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and \(\xi , \lambda \in \mathcal P(\mathcal Y)\). Then \(R_\alpha (\mu \times \xi , \nu \times \lambda ) = R_\alpha (\mu , \nu ) + R_\alpha (\xi , \lambda )\).

LaTeX

Lemma B.2.5

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a finite kernel. Then for \((\nu \otimes \kappa )\)-almost all \((x, y)\), \(\frac{d (\mu \otimes \kappa )}{d (\nu \otimes \kappa )}(x,y) = \frac{d\mu }{d\nu }(x)\).

LaTeX Lean

Lemma B.2.9

Let \(\mu \in \mathcal M(\mathcal X)\) be a finite measure and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then \((\mu \otimes \eta )\)-almost surely,

\begin{align*} \frac{d (\mu \otimes \kappa )}{d (\mu \otimes \eta )} = \frac{d \kappa }{d \eta } \: . \end{align*}

LaTeX Lean

Corollary B.1.2

For all \(x \in \mathcal X\), for \(\eta (x)\)-almost all \(y \in \mathcal Y\), \(\frac{d \kappa }{d \eta }(x, y) = \frac{d \kappa (x)}{d \eta (x)}(y)\).

LaTeX Lean

Corollary 2.2.6

Let \(\mu , \nu \) be two measures on \(\mathcal X\), \(\xi \in \mathcal M(\{ 0,1\} )\) and let \(E\) be an event on \(\mathcal X\). Let \(\mu _E\) and \(\nu _E\) be the two Bernoulli distributions with respective means \(\mu (E)\) and \(\nu (E)\). Then \(\mathcal I_\xi (\mu , \nu ) \ge \mathcal I_\xi (\mu _E, \nu _E)\).

LaTeX Lean

Corollary 2.2.10

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\),

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = -\frac{1}{2} \left\vert \xi _0 \mu (\mathcal X) - \xi _1 \nu (\mathcal X)\right\vert + \frac{1}{2}\left( \nu \left[ x \mapsto \left\vert \xi _0\frac{d \mu }{d\nu }(x) - \xi _1 \right\vert \right] + \xi _0 \mu _{\perp \nu }(\mathcal X)\right) \: . \end{align*}

LaTeX Lean

Corollary 4.1.13

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and \(E\) an event. Then

\begin{align*} \sqrt{\mu (E)} + \sqrt{\nu (E^c)} \ge \exp \left(-\frac{1}{2} R_{1/2}(\mu , \nu )\right) = 1 - \operatorname{H^2}(\mu , \nu ) \: . \end{align*}

LaTeX

Definition 8

The Bayes binary risk between measures \(\mu \) and \(\nu \) with respect to prior \(\xi \in \mathcal M(\{ 0,1\} )\), denoted by \(\mathcal B_\xi (\mu , \nu )\), is the Bayes risk \(\mathcal R^P_\xi \) for \(\Theta = \mathcal Y = \mathcal Z = \{ 0,1\} \), \(\ell (y,z) = \mathbb {I}\{ y \ne z\} \), \(P\) the kernel sending 0 to \(\mu \) and 1 to \(\nu \) and prior \(\xi \). That is,

\begin{align*} \mathcal B_\xi (\mu , \nu ) = \inf _{\hat{y} : \mathcal X \rightsquigarrow \{ 0,1\} }\left(\xi _0 (\hat{y} \circ \mu )(\{ 1\} ) + \xi _1 (\hat{y} \circ \nu )(\{ 0\} )\right) \: , \end{align*}

in which the infimum is over Markov kernels.

If the prior is a probability measure with weights \((\pi , 1 - \pi )\), we write \(B_\pi (\mu , \nu ) = \mathcal B_{(\pi , 1 - \pi )}(\mu , \nu )\) .

LaTeX Lean

Definition 4Bayes estimator

An estimator \(\hat{y}\) is said to be a Bayes estimator for a prior \(\pi \in \mathcal P(\Theta )\) if \(R^P_\pi (\hat{y}) = \mathcal R^P_\pi \).

LaTeX Lean

Definition 2Bayesian risk

The Bayesian risk of an estimator \(\hat{y}\) on \((P, y, \ell ')\) for a prior \(\pi \in \mathcal M(\Theta )\) is \(R^P_\pi (\hat{y}) = \pi \left[\theta \mapsto r^P_\theta (\hat{y})\right]\) . It can also be expanded as \(R^P_\pi (\hat{y}) = (\pi \otimes (\hat{y} \circ P))\left[ (\theta , z) \mapsto \ell '(y(\theta ), z) \right]\) .

LaTeX Lean

Definition 48

For \(\mu \in \mathcal M(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\), a Bayesian inverse of \(\kappa \) is a Markov kernel \(\kappa _\mu ^\dagger : \mathcal Y \rightsquigarrow \mathcal X\) such that \(\mu \otimes \kappa = ((\kappa \circ \mu ) \otimes \kappa _\mu ^\dagger )_\leftrightarrow \) in which \((\cdot )_\leftrightarrow \) denotes swapping the two coordinates. If such an inverse exists it is unique up to a \((\kappa \circ \mu )\)-null set, and we talk about the Bayesian inverse of \(\kappa \) with respect to \(\mu \).

LaTeX Lean

Definition 3Bayes risk

The Bayes risk of \((P, y, \ell ')\) for prior \(\pi \in \mathcal M(\Theta )\) is \(\mathcal R^P_\pi = \inf _{\hat{y} : \mathcal X \rightsquigarrow \mathcal Z} R^P_\pi (\hat{y})\) , where the infimum is over Markov kernels.

The Bayes risk of \((P, y, \ell ')\) is \(\mathcal R^*_B = \sup _{\pi \in \mathcal P(\Theta )} \mathcal R^P_\pi \: .\)

LaTeX Lean

Definition 35

The sample complexity of simple binary hypothesis testing with prior \((\pi , 1 - \pi ) \in \mathcal P(\{ 0, 1\} )\) at risk level \(\delta \in \mathbb {R}_{+, \infty }\) is

\begin{align*} n(\mu , \nu , \pi , \delta ) = \min \{ n \in \mathbb {N} \mid B_\pi (\mu ^{\otimes n}, \nu ^{\otimes n}) \le \delta \} \: . \end{align*}

This is the sample complexity \(n_\xi ^P(\delta )\) of Definition 9 specialized to simple binary hypothesis testing.

LaTeX

Definition 47Blackwell sufficiency

We define a partial order on kernels by the following. Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal X \rightsquigarrow \mathcal Z\) (with same domain as \(\kappa \)). Then \(\kappa \) is Blackwell sufficient for \(\eta \), denoted by \(\eta \le _B \kappa \), if there exists a Markov kernel \(\xi : \mathcal Y \rightsquigarrow \mathcal Z\) such that \(\eta = \xi \circ \kappa \).

LaTeX

Definition 29Chernoff divergence

The Chernoff divergence of order \(\alpha {\gt} 0\) between two measures \(\mu \) and \(\nu \) on \(\mathcal X\) is

\begin{align*} C_\alpha (\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)}\max \{ R_\alpha (\xi , \mu ), R_\alpha (\xi , \nu )\} \: . \end{align*}

LaTeX Lean

Definition 12Conditional divergence

Let \(D\) be a divergence. The conditional divergence of kernels \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) with respect to a measure \(\mu \in \mathcal M(\mathcal X)\) is \(\mu [x \mapsto D(\kappa (x), \eta (x)]\). It is denoted by \(D(\kappa , \eta \mid \mu )\).

LaTeX

Definition 18Conditional f-divergence

Let \(f : \mathbb {R} \to \mathbb {R}\), \(\mu \) a measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) two Markov kernels from \(\mathcal X\) to \(\mathcal Y\). The conditional f-divergence between \(\kappa \) and \(\eta \) with respect to \(\mu \) is

\begin{align*} D_f(\kappa , \eta \mid \mu ) = \mu \left[x \mapsto D_f(\kappa (x), \eta (x))\right] \end{align*}

if \(x \mapsto D_f(\kappa (x), \eta (x))\) is \(\mu \)-integrable and \(+\infty \) otherwise.

LaTeX Lean

Definition 24Conditional Hellinger \(\alpha \)-divergence

Let \(\mu \) be a measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two kernels. The conditional Hellinger divergence of order \(\alpha \in (0,+\infty ) \backslash \{ 1\} \) between \(\kappa \) and \(\eta \) conditionally to \(\mu \) is

\begin{align*} \operatorname{H}_\alpha (\kappa , \eta \mid \mu ) = D_{f_\alpha }(\kappa , \eta \mid \mu ) \: , \end{align*}

for \(f_\alpha : x \mapsto \frac{x^{\alpha } - 1}{\alpha - 1}\).

LaTeX Lean

Definition 22Conditional Kullback-Leibler divergence

Let \(\mu \) be a measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two kernels. The conditional Kullback-Leibler divergence between \(\kappa \) and \(\eta \) with respect to \(\mu \) is

\begin{align*} \operatorname{KL}(\kappa , \eta \mid \mu ) = \mu \left[x \mapsto \operatorname{KL}(\kappa (x), \eta (x))\right] \end{align*}

if \(x \mapsto \operatorname{KL}(\kappa (x), \eta (x))\) is \(\mu \)-integrable and \(+\infty \) otherwise.

LaTeX Lean

Definition 32

Let \(\kappa : \mathcal Z \rightsquigarrow \mathcal X \times \mathcal Y\). The conditional mutual information of \(\kappa \) with respect to \(\nu \in \mathcal M(\mathcal Z)\) is

\begin{align*} I(\kappa \mid \nu ) = \operatorname{KL}(\kappa , \kappa _X \times \kappa _Y \mid \nu ) \: . \end{align*}

LaTeX

Definition 26Conditional Rényi divergence

Let \(\mu \) be a measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two kernels. The conditional Rényi divergence of order \(\alpha \in (0,+\infty ) \backslash \{ 1\} \) between \(\kappa \) and \(\eta \) conditionally to \(\mu \) is

\begin{align*} R_\alpha (\kappa , \eta \mid \mu ) =\frac{1}{\alpha - 1} \log (1 + (\alpha - 1) \operatorname{H}_\alpha (\kappa , \eta \mid \mu )) \: . \end{align*}

LaTeX Lean

Definition 19Curvature measure

Let \(f: \mathbb {R} \to \mathbb {R}\) be a convex function. Then its right derivative \(f'_+(x) \coloneqq \lim _{y \downarrow x}\frac{f(y) - f(x)}{y - x}\) is a Stieltjes function (a monotone right continuous function) and it defines a measure \(\gamma _f\) on \(\mathbb {R}\) by \(\gamma _f((x,y]) \coloneqq f'_+(y) - f'_+(x)\) . [ Lie12 ] calls \(\gamma _f\) the curvature measure of \(f\).

LaTeX Lean

Definition 14

The DeGroot statistical information between finite measures \(\mu \) and \(\nu \) for \(\pi \in [0,1]\) is \(I_\pi (\mu , \nu ) = \mathcal I_{(\pi , 1 - \pi )}(\mu , \nu )\) .

LaTeX Lean

Definition 50

We define \(f'(\infty ) := \limsup _{x \to + \infty } f(x)/x\). This can be equal to \(+\infty \) (but not \(-\infty \)).

LaTeX Lean

Definition 38Deterministic kernel

The deterministic kernel defined by a measurable function \(f : \mathcal X \to \mathcal Y\) is the kernel \(d_f: \mathcal X \rightsquigarrow \mathcal Y\) defined by \(d_f(x) = \delta _{f(x)}\), where for any \(y \in \mathcal Y\), \(\delta _y\) is the Dirac probability measure at \(y\).

LaTeX Lean

Definition 10Divergence

A divergence between measures is a function \(D\) which for any measurable space \(\mathcal X\) and any two measures \(\mu , \nu \in \mathcal M(\mathcal X)\), returns a value \(D(\mu , \nu ) \in \mathbb {R} \cup \{ +\infty \} \).

LaTeX

Definition 11Data-processing inequality

A divergence \(D\) is said to satisfy the data-processing inequality (DPI) if for all measurable spaces \(\mathcal X, \mathcal Y\), all \(\mu , \nu \in \mathcal M(\mathcal X)\) and all Markov kernels \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\),

\begin{align*} D(\kappa \circ \mu , \kappa \circ \nu ) \le D(\mu , \nu ) \: . \end{align*}

LaTeX

Definition 15

The \(E_\gamma \) or hockey-stick divergence between finite measures \(\mu \) and \(\nu \) for \(\gamma \in (0,+\infty )\) is \(E_\gamma (\mu , \nu ) = \mathcal I_{(1,\gamma )}(\mu , \nu )\) .

LaTeX Lean

Definition 17f-divergence

Let \(f : \mathbb {R} \to \mathbb {R}\) and let \(\mu , \nu \) be two measures on a measurable space \(\mathcal X\). The f-divergence between \(\mu \) and \(\nu \) is

\begin{align*} D_f(\mu , \nu ) = \nu \left[x \mapsto f\left(\frac{d \mu }{d \nu }(x)\right)\right] + f’(\infty ) \mu _{\perp \nu }(\mathcal X) \end{align*}

if \(x \mapsto f\left(\frac{d \mu }{d \nu }(x)\right)\) is \(\nu \)-integrable and \(+\infty \) otherwise.

LaTeX Lean

Definition 39Finite kernel

A kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) is said to be finite if there exists \(C {\lt} +\infty \) such that for all \(x \in \mathcal X\), \(\kappa (x)(\mathcal Y) \le C\).

LaTeX Lean

Definition 6Generalized Bayes estimator

The generalized Bayes estimator for prior \(\pi \in \mathcal P(\Theta )\) on \((P, y, \ell ')\) is the deterministic estimator \(\mathcal X \to \mathcal Z\) given by

\begin{align*} x \mapsto \arg \min _z P_\pi ^\dagger (x)\left[\theta \mapsto \ell ’(y(\theta ), z)\right] \: , \end{align*}

if there exists such a measurable argmin.

LaTeX Lean

Definition 28Squared Hellinger distance

Let \(\mu , \nu \) be two measures. The squared Hellinger distance between \(\mu \) and \(\nu \) is

\begin{align*} \operatorname{H^2}(\mu , \nu ) = D_f(\mu , \nu ) \quad \text{with } f: x \mapsto \frac{1}{2}\left( 1 - \sqrt{x} \right)^2 \: . \end{align*}

LaTeX Lean

Definition 23Hellinger \(\alpha \)-divergence

Let \(\mu , \nu \) be two measures on \(\mathcal X\). The Hellinger divergence of order \(\alpha \in [0,+\infty )\) between \(\mu \) and \(\nu \) is

\begin{align*} \operatorname{H}_\alpha (\mu , \nu ) = \left\{ \begin{array}{ll} \nu \{ x \mid \frac{d\mu }{d\nu } (x) = 0\} & \text{for } \alpha = 0 \\ \operatorname{KL}(\mu , \nu ) & \text{for } \alpha = 1 \\ D_{f_\alpha }(\mu , \nu ) & \text{for } \alpha \in (0,+\infty ) \backslash \{ 1\} \end{array}\right. \end{align*}

with \(f_\alpha : x \mapsto \frac{x^{\alpha } - 1}{\alpha - 1}\).

LaTeX Lean

Definition 30Jensen-Shannon divergence

The Jensen-Shannon divergence indexed by \(\alpha \in (0,1)\) between two measures \(\mu \) and \(\nu \) is

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) = \alpha \operatorname{KL}(\mu , \alpha \mu + (1 - \alpha )\nu ) + (1 - \alpha ) \operatorname{KL}(\nu , \alpha \mu + (1 - \alpha )\nu ) \: . \end{align*}

LaTeX

Definition 37Kernel

Let \(\mathcal X, \mathcal Y\) be two measurable spaces. A probability transition kernel (or simply kernel) from \(\mathcal X\) to \(\mathcal Y\) is a measurable map from \(\mathcal X\) to \(\mathcal M (\mathcal Y)\), the measurable space of measures on \(\mathcal Y\). We write \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) for a kernel \(\kappa \) from \(\mathcal X\) to \(\mathcal Y\).

LaTeX Lean

Definition 44Composition

Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal Y \rightsquigarrow \mathcal Z\) be two kernels. The composition of \(\kappa \) and \(\eta \) is the kernel \(\eta \circ \kappa : \mathcal X \rightsquigarrow \mathcal Z\) such that for all measurable functions \(f : \mathcal Z \to \mathbb {R}_{+,\infty }\) and all \(x \in \mathcal X\),

\begin{align*} \int _z f(z) \partial (\eta \circ \kappa )(x) = \int _y \int _z f(z) \partial \eta (y) \partial \kappa (x) \: . \end{align*}

LaTeX Lean

Definition 42Composition-product

Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : (\mathcal X \times \mathcal Y) \rightsquigarrow \mathcal Z\) be two s-finite kernels. The composition-product of \(\kappa \) and \(\eta \) is a kernel \(\kappa \otimes \eta : \mathcal X \rightsquigarrow (\mathcal Y \times \mathcal Z)\) such that for all measurable functions \(f : \mathcal Y \times \mathcal Z \to \mathbb {R}_{+,\infty }\) and \(x \in \mathcal X\) ,

\begin{align*} \int _{y,z} f(y,z) \partial (\kappa \otimes \eta )(x) = \int _y \int _z f(y,z) \partial \eta (x,y) \partial \kappa (x) \: . \end{align*}

LaTeX Lean

Definition 46Parallel product

Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal X' \rightsquigarrow \mathcal Y'\) be two s-finite kernels. The parallel product of \(\kappa \) and \(\eta \) is the kernel \(\kappa \parallel \eta : \mathcal X \times \mathcal X' \rightsquigarrow \mathcal Y \times \mathcal Y'\) such that for all measurable functions \(f : \mathcal Y \times \mathcal Y' \to \mathbb {R}_{+,\infty }\) and all \(x \in \mathcal X \times \mathcal X'\),

\begin{align*} \int _{y,y'} f(y,y’) \partial (\kappa \parallel \eta )(x,x’) = \int _y \int _{y'} f(y,y’) \partial \eta (x’) \partial \kappa (x) \: . \end{align*}

LaTeX Lean

Definition 45Product

Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal X \rightsquigarrow \mathcal Z\) be two s-finite kernels. The product of \(\kappa \) and \(\eta \) is the kernel \(\kappa \times \eta : \mathcal X \rightsquigarrow \mathcal Y \times \mathcal Z\) such that for all measurable functions \(f : \mathcal Y \times \mathcal Z \to \mathbb {R}_{+,\infty }\) and all \(x \in \mathcal X\),

\begin{align*} \int _{y,z} f(y,z) \partial (\kappa \times \eta )(x) = \int _y \int _z f(y,z) \partial \eta (x) \partial \kappa (x) \: . \end{align*}

LaTeX Lean

Definition 49

Let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. The Radon-Nikodym derivative of \(\kappa \) with respect to \(\eta \), denoted by \(\frac{d \kappa }{d \eta }\), is a measurable function \(\mathcal X \times \mathcal Y \to \mathbb {R}_{+, \infty }\) with \(\kappa = \frac{d \kappa }{d \eta } \cdot \eta + \kappa _{\perp \eta }\), where for all \(x\), \(\kappa _{\perp \eta }(x) \perp \eta (x)\).

LaTeX Lean

Definition 21Kullback-Leibler divergence

Let \(\mu , \nu \) be two measures on \(\mathcal X\). The Kullback-Leibler divergence between \(\mu \) and \(\nu \) is

\begin{align*} \operatorname{KL}(\mu , \nu ) = \mu \left[\log \frac{d \mu }{d \nu }\right] \end{align*}

if \(\mu \ll \nu \) and \(x \mapsto \log \frac{d \mu }{d \nu }(x)\) is \(\mu \)-integrable and \(+\infty \) otherwise.

LaTeX Lean

Definition 40Markov kernel

A kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) is said to be a Markov kernel if for all \(x \in \mathcal X\), \(\kappa (x)\) is a probability measure.

LaTeX Lean

Definition 43Composition-product of a measure and a kernel

Let \(\mu \in \mathcal M(\mathcal X)\) be an s-finite measure and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be an s-finite kernel. Let \(\mathcal U\) be a measurable space with a unique element \(u\). Let \(\mu _k : \mathcal U \rightsquigarrow \mathcal X\) be the constant kernel with value \(\mu \). The composition-product of \(\mu \) and \(\kappa \) is the measure on \(\mathcal M(\mathcal X \times \mathcal Y)\) defined by \((\mu _k \otimes \kappa ) (u)\) .

LaTeX Lean

Definition 36Measurable space of measures

Let \(\mathcal B\) be the Borel \(\sigma \)-algebra on \(\mathbb {R}_{+,\infty }\). Let \(\mathcal X\) be a measurable space. For a measurable set \(s\) of \(\mathcal X\), let \((\mu \mapsto \mu (s))^* \mathcal B\) be the \(\sigma \)-algebra on \(\mathcal M(\mathcal X)\) defined by the comap of the evaluation function at \(s\). Then \(\mathcal M(\mathcal X)\) is a measurable space with \(\sigma \)-algebra \(\bigsqcup _{s} (\mu \mapsto \mu (s))^* \mathcal B\) where the supremum is over all measurable sets \(s\).

LaTeX Lean

Definition 5Minimax risk

The minimax risk of \((P, y, \ell ')\) is \(\mathcal R^* = \inf _{\hat{y} : \mathcal X \rightsquigarrow \mathcal Z} \sup _{\theta \in \Theta } r^P_\theta (\hat{y})\) .

LaTeX Lean

Definition 31

The mutual information is, for \(\rho \in \mathcal M(\mathcal X \times \mathcal Y)\) ,

\begin{align*} I(\rho ) = \operatorname{KL}(\rho , \rho _X \times \rho _Y) \: . \end{align*}

LaTeX

Definition 33

Let \(D\) be a divergence between measures. The left \(D\)-mutual information for a measure \(\mu \in \mathcal M(\mathcal X)\) and a kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) is

\begin{align*} I_D^L(\mu , \kappa ) = \inf _{\xi \in \mathcal P(\mathcal Y)} D(\mu \times \xi , \mu \otimes \kappa ) \: . \end{align*}

LaTeX

Definition 34

Let \(D\) be a divergence between measures. The right \(D\)-mutual information for a measure \(\mu \in \mathcal M(\mathcal X)\) and a kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) is

\begin{align*} I_D^R(\mu , \kappa ) = \inf _{\xi \in \mathcal P(\mathcal Y)} D(\mu \otimes \kappa , \mu \times \xi ) \: . \end{align*}

LaTeX

Definition 9

The sample complexity of Bayesian estimation with respect to a prior \(\pi \in \mathcal M(\Theta )\) at risk level \(\delta \in \mathbb {R}_{+,\infty }\) is

\begin{align*} n_\pi ^P(\delta ) = \min \{ n \in \mathbb {N} \mid \mathcal R_\pi ^{P^{\otimes n}} \le \delta \} \: . \end{align*}

LaTeX

Definition 25Rényi divergence

Let \(\mu , \nu \) be two measures on \(\mathcal X\). The Rényi divergence of order \(\alpha \in \mathbb {R}\) between \(\mu \) and \(\nu \) is

\begin{align*} R_\alpha (\mu , \nu ) = \left\{ \begin{array}{ll} \operatorname{KL}(\mu , \nu ) & \text{for } \alpha = 1 \\ \frac{1}{\alpha - 1} \log (\nu (\mathcal X) + (\alpha - 1) \operatorname{H}_\alpha (\mu , \nu )) & \text{for } \alpha \neq 1 \end{array}\right. \end{align*}

LaTeX Lean

Definition 27

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\alpha \in (0, +\infty ) \backslash \{ 1\} \). Let \(p = \frac{d \mu }{d (\mu + \nu )}\) and \(q = \frac{d \nu }{d (\mu + \nu )}\). We define a measure \(\mu ^{(\alpha , \nu )}\), absolutely continuous with respect to \(\mu + \nu \) with density

\begin{align*} \frac{d \mu ^{(\alpha , \nu )}}{d (\mu + \nu )} = p^\alpha q^{1 - \alpha } e^{-(\alpha - 1)R_\alpha (\mu , \nu )} \: . \end{align*}

LaTeX Lean

Definition 1Risk

The risk of an estimator \(\hat{y}\) on the estimation problem \((P, y, \ell ')\) at \(\theta \in \Theta \) is \(r^P_\theta (\hat{y}) = (\hat{y} \circ P)(\theta )\left[z \mapsto \ell '(y(\theta ), z)\right]\) .

LaTeX Lean

Definition 7

The Bayes risk increase \(I^P_{\pi }(\kappa )\) of a kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal X'\) with respect to the estimation problem \((P, y, \ell ')\) and the prior \(\pi \in \mathcal M(\Theta )\) is the difference of the Bayes risk of \((\kappa \circ P, y, \ell ')\) and that of \((P, y, \ell ')\). That is,

\begin{align*} I^P_{\pi }(\kappa ) & = \mathcal R^{\kappa \circ P}_\pi - \mathcal R^P_\pi \\ & = \inf _{\hat{y} : \mathcal X' \rightsquigarrow \mathcal Z} (\pi \otimes (\hat{y} \circ \kappa \circ P))\left[(\theta , z) \mapsto \ell ’(y(\theta ), z)\right] - \inf _{\hat{y} : \mathcal X \rightsquigarrow \mathcal Z} (\pi \otimes (\hat{y} \circ P))\left[(\theta , z) \mapsto \ell ’(y(\theta ), z)\right] \: . \end{align*}

LaTeX Lean

Definition 41s-finite kernel

A kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) is said to be a s-finite if it is equal to a countable sum of finite kernels.

LaTeX Lean

Definition 13

The statistical information between measures \(\mu \) and \(\nu \) with respect to prior \(\xi \in \mathcal M(\{ 0,1\} )\) is \(\mathcal I_\xi (\mu , \nu ) = \min \{ \xi _0 \mu (\mathcal X), \xi _1 \nu (\mathcal X)\} - \mathcal B_\xi (\mu , \nu )\). This is the risk increase \(I_\xi ^P(d_{\mathcal X})\) in the binary hypothesis testing problem for \(d_{\mathcal X} : \mathcal X \rightsquigarrow *\) the Markov kernel to the point space.

LaTeX Lean

Definition 20

For \(a,b \in (0, +\infty )\) let \(\phi _{a,b} : \mathbb {R} \to \mathbb {R}\) be the function defined by

\begin{align*} \phi _{a,b}(x) & = \max \left\{ 0, a x - b \right\} & \text{ for } a \le b \: , \\ \phi _{a,b}(x) & = \max \left\{ 0, b - a x \right\} & \text{ for } a {\gt} b \: . \end{align*}

LaTeX Lean

Definition 16

The total variation distance between finite measures \(\mu \) and \(\nu \) is \(\operatorname{TV}(\mu , \nu ) = \mathcal I_{(1,1)}(\mu , \nu )\) .

LaTeX Lean

Lemma B.2.1

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two s-finite kernels. Then

if \(\mu \otimes \kappa \ll \nu \otimes \eta \) then \(\mu \otimes \kappa \ll \mu \otimes \eta \),
if \(\mu \otimes \kappa \ll \nu \otimes \eta \) and \(\kappa (x) \ne 0\) for all \(x\) then \(\mu \ll \nu \),
if \(\mu \ll \nu \) and \(\mu \otimes \kappa \ll \mu \otimes \eta \) then \(\mu \otimes \kappa \ll \nu \otimes \eta \).

In particular,

if \(\kappa (x) \ne 0\) for all \(x\) then \(\mu \otimes \kappa \ll \nu \otimes \eta \iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)\) .
If \(\mu \ll \nu \) then \(\mu \otimes \kappa \ll \nu \otimes \eta \iff \mu \otimes \kappa \ll \mu \otimes \eta \) .

LaTeX

Lean

MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd
MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd'
MeasureTheory.Measure.absolutelyContinuous_of_compProd

Lemma B.2.10

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) be two \(\sigma \)-finite measures and let \(p\) be a predicate on \(\mathcal X\). If \(p\) is true \(\mu \)-almost surely, then for \(\nu \)-almost all \(x\), either \(\frac{d\mu }{d\nu }(x) = 0\) or \(p(x)\).

LaTeX Lean

Lemma 1.2.14

Let \(\hat{y}_B\) be the generalized Bayes estimator for simple binary hypothesis testing. The distribution \(\ell \circ (\mathrm{id} \parallel \hat{y}_B) \circ (\pi \otimes P)\) (in which \(\ell \) stands for the associated deterministic kernel) is a Bernoulli with mean \(\mathcal B_\pi (\mu , \nu )\).

LaTeX

Lemma 1.2.11

\begin{align*} \mathcal B_\xi (\mu , \nu ) = \inf _{E \text{ event}} \left( \xi _0 \mu (E) + \xi _1 \nu (E^c) \right) \: . \end{align*}

LaTeX Lean

Lemma 4.1.1 [ ZL18 ]

Let \(\zeta \) be a measure such that \(\mu \ll \zeta \) and \(\nu \ll \zeta \). Let \(p = \frac{d \mu }{d\zeta }\) and \(q = \frac{d \nu }{d\zeta }\). For \(\alpha \in (0,1)\), for \(g_\alpha (x) = \min \{ (\alpha -1)x, \alpha x\} \),

\begin{align*} \mathcal B_\xi (\mu , \nu ) = e^{-(1 - \alpha ) R_\alpha (\xi _0\mu , \xi _1\nu )} \int _x \exp \left(g_\alpha \left( \log \frac{\xi _1 q(x)}{\xi _0 p(x)} \right)\right) \partial (\xi _0\mu )^{(\alpha , \xi _1\nu )} \: . \end{align*}

LaTeX

Lemma 1.2.8

For all measures \(\mu , \nu \), \(\mathcal B_\xi (\mu , \nu ) \le \min \{ \xi _0 \mu (\mathcal X), \xi _1 \nu (\mathcal X)\} \) .

LaTeX Lean

Lemma 4.1.3

\begin{align*} \mathcal B_\xi (\mu , \nu ) \le e^{- C_1(\xi _0\mu , \xi _1\nu )} \: . \end{align*}

LaTeX

Lemma 1.2.3

For all \(a, b {\gt} 0\), \(\mathcal B_\xi (\mu , \nu ) = \mathcal B_{(a \xi _0, b \xi _1)}(a^{-1} \mu , b^{-1} \nu )\) .

LaTeX Lean

Lemma 1.2.7

For all measures \(\mu , \nu \), \(\mathcal B_\xi (\mu , \nu ) \ge 0\) .

LaTeX

Lemma 1.2.4

\(\mathcal B_\xi (\mu , \nu ) = \mathcal B_{(1,1)}(\xi _0\mu , \xi _1\nu )\) .

LaTeX Lean

Lemma 1.2.15Dummy lemma: bayesBinaryRisk properties

Dummy node to summarize properties of the Bayes binary risk.

LaTeX

Lemma 1.2.6

For \(\mu \in \mathcal M(\mathcal X)\), \(\mathcal B_\xi (\mu , \mu ) = \min \{ \xi _0, \xi _1\} \mu (\mathcal X)\) .

LaTeX Lean

Lemma 1.2.9

For \(\mu , \nu \in \mathcal M(\mathcal X)\) and \(\xi \in \mathcal M(\{ 0,1\} )\), \(\mathcal B_\xi (\mu , \nu ) = \mathcal B_{\xi _{\leftrightarrow }}(\nu , \mu )\) where \(\xi _{\leftrightarrow } \in \mathcal M(\{ 0,1\} )\) is such that \(\xi _{\leftrightarrow }(\{ 0\} ) = \xi _1\) and \(\xi _{\leftrightarrow }(\{ 1\} ) = \xi _0\). For \(\pi \in [0,1]\), \(B_\pi (\mu , \nu ) = B_{1 - \pi }(\nu , \mu )\) .

LaTeX Lean

Lemma 1.1.8

The Bayesian risk of a Markov kernel \(\hat{y} : \mathcal X \rightsquigarrow \mathcal Z\) with respect to a prior \(\pi \in \mathcal M(\Theta )\) on \((P, y, \ell ')\) satisfies

\begin{align*} R^P_\pi (\hat{y}) = ((P_\pi ^\dagger \times \hat{y}) \circ P \circ \pi )\left[(\theta , z) \mapsto \ell ’(y(\theta ), z)\right] \: , \end{align*}

whenever the Bayesian inverse \(P_\pi ^\dagger \) of \(P\) with respect to \(\pi \) exists (Definition 48).

LaTeX Lean

Lemma 1.1.9

The Bayesian risk of a Markov kernel \(\hat{y} : \mathcal X \rightsquigarrow \mathcal Z\) with respect to a prior \(\pi \in \mathcal M(\Theta )\) on \((P, y, \ell ')\) satisfies

\begin{align*} R^P_\pi (\hat{y}) \ge (P \circ \pi )\left[x \mapsto \inf _{z \in \mathcal Z} P_\pi ^\dagger (x) \left[\theta \mapsto \ell ’(y(\theta ), z)\right]\right] \: . \end{align*}

LaTeX Lean

Lemma 1.1.10

The Bayesian risk of the generalized Bayes estimator \(\hat{y}_B\) is

\begin{align*} R^P_\pi (\hat{y}_B) = (P \circ \pi )\left[x \mapsto \inf _{z \in \mathcal Z} P_\pi ^\dagger (x) \left[\theta \mapsto \ell ’(y(\theta ), z)\right]\right] \: . \end{align*}

LaTeX Lean

Lemma 1.2.1

The Bayesian inverse of a kernel \(P : \{ 0,1\} \rightsquigarrow \mathcal X\) with respect to a prior \(\xi \in \mathcal M(\{ 0,1\} )\) is \(P_\xi ^\dagger (x) = \left(\xi _0\frac{d P(0)}{d(P \circ \xi )}(x), \xi _1\frac{d P(1)}{d(P \circ \xi )}(x)\right)\) (almost surely w.r.t. \(P \circ \xi = \xi _0 P(0) + \xi _1 P(1)\)).

LaTeX Lean

Lemma A.7.6

Let \(\mu \in \mathcal M(\mathcal X)\), \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and \(\eta : \mathcal Y \rightsquigarrow \mathcal Z\). Then \((\eta \circ \kappa \circ \mu )\)-a.e.,

\begin{align*} (\eta \circ \kappa )_\mu ^\dagger = \kappa _{\mu }^\dagger \circ \eta _{\kappa \circ \mu }^\dagger \: . \end{align*}

LaTeX Lean

Lemma A.7.3

For \(\mu \in \mathcal M(\mathcal X)\) s-finite, \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel and \(\kappa _\mu ^\dagger \) the Bayesian inverse of \(\kappa \) with respect to \(\mu \), these objects satisfy the equality \(\kappa _\mu ^\dagger \circ \kappa \circ \mu = \mu \).

LaTeX Lean

Lemma A.7.5

Let \(\mu \in \mathcal M (\mathcal X)\) and let \(\textup{id} : \mathcal X \rightsquigarrow \mathcal X\) be the identity kernel. Then \(\textup{id}_\mu ^\dagger = \textup{id}\).

LaTeX Lean

Lemma A.7.8Dummy lemma: bayesInv properties

Dummy node to summarize properties of the Bayesian inverse.

LaTeX

Lemma A.7.4

For \(\mathcal X\) and \(\mathcal Y\) two standard Borel spaces, \(\mu \in \mathcal M(\mathcal X)\) s-finite and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel, \((\kappa _\mu ^\dagger )_{\kappa \circ \mu }^\dagger = \kappa \) (\(\mu \)-a.e.).

LaTeX Lean

Lemma 1.2.2

For \(\Theta = \{ 0,1\} \), the Bayes risk of a prior \(\xi \in \mathcal M(\{ 0,1\} )\) is

\begin{align*} \mathcal R^P_\xi = (P \circ \xi )\left[x \mapsto \inf _{z \in \mathcal Z} \left( \xi _0\frac{d P(0)}{d(P \circ \xi )}(x)\ell ’(y(0), z) + \xi _1\frac{d P(1)}{d(P \circ \xi )}(x)\ell ’(y(1), z) \right) \right] \: . \end{align*}

LaTeX Lean

Lemma 1.1.14

When the generalized Bayes estimator is well defined, the Bayes risk with respect to the prior \(\pi \in \mathcal M(\Theta )\) for \(\mathcal Y = \mathcal Z = \Theta \), \(y = \mathrm{id}\) and \(\ell ' = \mathbb {I}\{ \theta \ne z\} \) is

\begin{align*} \mathcal R^P_\pi = (P \circ \pi )[1] - (P \circ \pi )\left[x \mapsto \sup _{z \in \mathcal Z} \frac{d P_z}{d (P \circ \pi )}(x) \pi \{ z\} \right] \: . \end{align*}

When \(\pi \) is a probability measure and \(P\) is a Markov kernel, \((P \circ \pi )[1] = 1\).

LaTeX

Lemma 1.1.13

\begin{align*} \mathcal R^P_\pi = (P \circ \pi )[1] - (P \circ \pi )\left[x \mapsto \sup _{z \in \mathcal Z} P_\pi ^\dagger (x) \{ z\} \right] \: . \end{align*}

LaTeX

Lemma 1.1.16

Suppose that \(\Theta \) is finite and let \(\xi \in \mathcal P(\Theta )\). The Bayes risk with respect to the prior \(\pi \in \mathcal P(\Theta )\) for \(\mathcal Y = \mathcal Z = \Theta \), \(y = \mathrm{id}\), \(P\) a Markov kernel and \(\ell ' = \mathbb {I}\{ \theta \ne z\} \) satisfies

\begin{align*} \mathcal R^P_\pi \le 1 - \left(\prod _{\theta \in \Theta } \pi _\theta ^{\xi _\theta }\right) (P \circ \pi )\left[x \mapsto \prod _{\theta \in \Theta } \left(\frac{d P_\theta }{d (P \circ \pi )}(x)\right)^{\xi _\theta } \right] \: . \end{align*}

LaTeX

Lemma 1.1.5

For \(P : \Theta \rightsquigarrow \mathcal X\) and \(\kappa : \Theta \times \mathcal X \rightsquigarrow \mathcal X'\) a Markov kernel, \(\mathcal R^{P \otimes \kappa }_\pi \le \mathcal R^{P}_\pi \) .

LaTeX Lean

Lemma 1.1.6

For \(P : \Theta \rightsquigarrow \mathcal X\) and \(\kappa : \Theta \times \mathcal X \rightsquigarrow \mathcal X'\) a Markov kernel, \(\mathcal R^{P \otimes \kappa }_\pi \le \mathcal R^{(P \otimes \kappa )_{\mathcal X'}}_\pi \) , in which \((P \otimes \kappa )_{\mathcal X'} : \mathcal\Theta \rightsquigarrow \mathcal X'\) is the kernel obtained by marginalizing over \(\mathcal X\) in the output of \(P \otimes \kappa \) .

LaTeX

Lemma 1.1.7

The Bayes risk \(\mathcal R_\pi ^P\) is concave in \(P : \Theta \rightsquigarrow \mathcal X\) .

LaTeX

Lemma 1.1.3

The Bayes risk of a prior \(\pi \in \mathcal M(\Theta )\) on \((P, y, \ell ')\) with \(P\) a constant Markov kernel is

\begin{align*} \mathcal R^P_\pi = \inf _{z \in \mathcal Z} \pi \left[\theta \mapsto \ell ’(y(\theta ), z)\right] \: . \end{align*}

In particular, it does not depend on \(P\).

LaTeX

Lemma 1.1.12

When the generalized Bayes estimator is well defined, the Bayes risk with respect to the prior \(\pi \in \mathcal M(\Theta )\) is

\begin{align*} \mathcal R^P_\pi = (P \circ \pi )\left[x \mapsto \inf _{z \in \mathcal Z} \pi \left[\theta \mapsto \frac{d P_\theta }{d (P \circ \pi )}(x) \ell ’(y(\theta ), z)\right]\right] \: . \end{align*}

LaTeX

Lemma 1.1.2

The Bayes risk of a prior \(\pi \in \mathcal M(\Theta )\) on \((P, y, \ell ')\) with \(P\) a Markov kernel satisfies

\begin{align*} \mathcal R^P_\pi \le \inf _{z \in \mathcal Z} \pi \left[ \theta \mapsto \ell ’(y(\theta ), z) \right] \: . \end{align*}

LaTeX Lean

Lemma 1.1.1

\(\mathcal R_B^* \le \mathcal R^*\).

LaTeX Lean

Lemma 1.4.1

If \(n \le m\) then \(\mathcal R_\pi ^{P^{\otimes n}} \ge \mathcal R_\pi ^{P^{\otimes m}}\).

LaTeX

Lemma 4.2.1

For \(\delta \ge \min \{ \pi , 1 - \pi \} \), the sample complexity of simple binary hypothesis testing is \(n(\mu , \nu , \pi , \delta ) = 0\) .

LaTeX

Lemma 4.2.4

For \(\delta \le \min \{ \pi , 1 - \pi \} \), the sample complexity of simple binary hypothesis testing satisfies

\begin{align*} n(\mu , \nu , \pi , \delta ) \ge \frac{h_2(\pi ) - h_2(\delta )}{\operatorname{JS}_\pi (\mu , \nu )} \: , \end{align*}

in which \(h_2: x \mapsto x\log \frac{1}{x} + (1 - x)\log \frac{1}{1 - x}\) is the binary entropy function.

LaTeX

Lemma 4.2.3

For \(\delta \le \pi \le 1/2\), the sample complexity of simple binary hypothesis testing satisfies

\begin{align*} n(\mu , \nu , \pi , \delta ) \ge \frac{\log \frac{\pi }{2\delta }}{R_{1/2}(\mu , \nu )} \: . \end{align*}

LaTeX

Lemma 4.2.2

The sample complexity of simple binary hypothesis testing satisfies \(n(\mu , \nu , \pi , \delta ) \le n_0\) , with \(n_0\) the smallest natural number such that

\begin{align*} \log \frac{1}{\delta } \le \sup _{\alpha \in (0,1)} \left( n_0 (1 - \alpha )R_\alpha (\mu , \nu ) + \alpha \log \frac{1}{\pi } + (1 - \alpha )\log \frac{1}{1 - \pi } \right) \end{align*}

LaTeX

Lemma 4.1.16

Consider an estimation problem with loss \(\ell ' : \mathcal Y \times \mathcal Z \to [0,1]\). Let \(\pi , \zeta \in \mathcal P(\Theta )\) and \(P, Q : \Theta \rightsquigarrow \mathcal X\) be such that \(\zeta \otimes Q \ll \pi \otimes P\). Then for all \(\beta \in \mathbb {R}\),

\begin{align*} \mathcal R_\pi ^P e^{\beta } \ge \mathcal R_\zeta ^Q - (\zeta \otimes Q)\left\{ \log \frac{d (\zeta \otimes Q)}{d (\pi \otimes P)} {\gt} \beta \right\} \: . \end{align*}

LaTeX

Lemma 4.1.24

For \(\alpha \in (0,1)\), let \(\pi _\alpha \in \mathcal P(\{ 0,1\} )\) be the measure \((\alpha , 1 - \alpha )\). Let \(\alpha , \gamma \in (0,1)\), \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(P : \{ 0,1\} \rightsquigarrow \mathcal X\) be the kernel with \(P(0) = \mu \) and \(P(1) = \nu \) . Then for all \(\beta {\gt} 0\) and \(\varepsilon {\gt}0\) ,

\begin{align*} \mathcal B_{\pi _\alpha }(\mu , \nu ) e^{\inf _{\xi \in \mathcal P(\mathcal X)} R_{1 + \varepsilon }(\pi _\gamma \times \xi , \pi _\alpha \otimes P) + \beta } & \ge \min \{ \gamma , 1 - \gamma \} - e^{-\varepsilon \beta } \: . \end{align*}

LaTeX

Lemma 4.1.17

Consider an estimation problem with loss \(\ell ' : \mathcal Y \times \mathcal Z \to [0,1]\). Let \(\pi , \zeta \in \mathcal P(\Theta )\) and \(P : \Theta \rightsquigarrow \mathcal X\). Then for all \(\beta \in \mathbb {R}\),

\begin{align*} \mathcal R_\pi ^P e^{\beta } \ge \mathcal R_\zeta ^{d_{\mathcal X}} - \inf _{\xi \in \mathcal P(\mathcal X)}(\zeta \times \xi )\left\{ \log \frac{d (\zeta \times \xi )}{d (\pi \otimes P)} {\gt} \beta \right\} \: , \end{align*}

in which the infimum over \(\xi \) is restricted to probability measures such that \(\zeta \times \xi \ll \pi \otimes P\) and \(d_{\mathcal X} : \Theta \rightsquigarrow *\) is the discard kernel.

LaTeX

Lemma 4.1.20

\begin{align*} & \mathcal B_{\pi _\alpha }(\mu , \nu ) e^{\operatorname{kl}(\gamma , \alpha ) + (1 - \gamma ) R_\gamma (\mu , \nu ) + \beta } \\ & \ge \min \{ \gamma , 1 - \gamma \} - (\pi _\gamma \times \mu ^{(\gamma , \nu )})\left\{ \log \frac{d (\pi _\gamma \times \mu ^{(\gamma , \nu )})}{d (\pi _\alpha \otimes P)} {\gt} \operatorname{KL}(\pi _\gamma \times \mu ^{(\gamma , \nu )}, \pi _\alpha \otimes P) + \beta \right\} \: . \end{align*}

LaTeX

Lemma 4.1.25

Let \(\mu , \nu , \xi \) be three probability measures on \(\mathcal X\) and let \(E\) be an event on \(\mathcal X\). For \(\beta {\gt} 0\) ,

\begin{align*} \mu (E) e^{\operatorname{KL}(\xi , \mu ) + \sqrt{\beta \operatorname{Var}_{\xi }\left[\log \frac{d\xi }{d\mu }\right]}} + \nu (E^c) e^{\operatorname{KL}(\xi , \nu ) + \sqrt{\beta \operatorname{Var}_{\xi }\left[\log \frac{d\xi }{d\nu }\right]}} \ge 1 - \frac{2}{\beta } \: . \end{align*}

LaTeX

Lemma 2.8.1

\(C_1(\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)}\max \{ \operatorname{KL}(\xi , \mu ), \operatorname{KL}(\xi , \nu )\} \) .

LaTeX Lean

Lemma 2.8.4

\begin{align*} C_1(\mu , \nu ) = \max _{\alpha \in [0,1]} (1 - \alpha )R_\alpha (\mu , \nu ) \: . \end{align*}

LaTeX

Lemma 2.8.3Monotonicity

The function \(\alpha \mapsto C_\alpha (\mu , \nu )\) is monotone.

LaTeX

Lemma 2.8.2Symmetry

\(C_\alpha (\mu , \nu ) = C_\alpha (\nu , \mu )\).

LaTeX

Lemma 2.3.23

Let \(\mu , \nu \) be measures on \(\mathcal X\), where \(\mu \) is finite, and let \(\xi \) be a finite measure on \(\mathcal Y\). Then \(D_f(x \mapsto \mu , x \mapsto \nu \mid \xi ) = D_f(\mu , \nu ) \xi (\mathcal X)\).

LaTeX Lean

Lemma 2.3.25

Let \(\mu \) be a finite measure on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, where \(\kappa \) is a Markov kernel. Then \(D_f(\kappa , \eta \mid \mu ) \ne \infty \) if and only if

for \(\mu \)-almost all \(x\), \(y \mapsto f \left( \frac{d\kappa (x)}{d\eta (x)}(y) \right)\) is \(\eta (x)\)-integrable,
\(x \mapsto \int _y f \left( \frac{d\kappa (x)}{d\eta (x)}(y) \right) \partial \eta (x)\) is \(\mu \)-integrable,
either \(f'(\infty ) {\lt} \infty \) or for \(\mu \)-almost all \(x\), \(\kappa (x) \ll \eta (x)\).

LaTeX Lean

Lemma 2.3.22

Let \(\mu \) be a measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) two Markov kernels. If \(f(1) = 0\) then \(D_f(\kappa , \eta \mid \mu ) \ge 0\).

LaTeX Lean

Lemma 2.3.28Dummy lemma: condFDiv properties

Dummy node to summarize properties of conditional \(f\)-divergences.

LaTeX

Lemma 2.4.4

Let \(\mu , \nu \) be finite measures on \(\mathcal X\), \(\xi \) be a finite measure on \(\mathcal Y\). Then \(\operatorname{KL}(x \mapsto \mu , x \mapsto \nu \mid \xi ) = \operatorname{KL}(\mu , \nu ) \xi (\mathcal X)\).

LaTeX Lean

Lemma 2.4.2

\(\operatorname{KL}(\kappa , \eta \mid \mu ) = D_f(\kappa , \eta \mid \mu )\) for \(f: x \mapsto x \log x\).

LaTeX Lean

Lemma 2.3.32

For \(f: \mathbb {R} \to \mathbb {R}\) a convex function and \(x,y \in \mathbb {R}\),

\begin{align*} f(y) - f(x) - (y - x)f’_+(x) & = \int _{z \in (x,y]} (y - z) \partial \gamma _f & \text{ if } x \le y \: , \\ f(y) - f(x) - (y - x)f’_+(x) & = \int _{z \in (y,x]} (z - y) \partial \gamma _f & \text{ if } y \le x \: . \end{align*}

LaTeX Lean

Lemma 2.3.30

For \(f,g: \mathbb {R} \to \mathbb {R}\) two convex functions, the curvature measure of \(f+g\) is \(\gamma _{f+g} = \gamma _f + \gamma _g\) .

LaTeX

Lemma 2.3.29

For \(a \ge 0\) and \(f: \mathbb {R} \to \mathbb {R}\) a convex function, the curvature measure of \(af\) is \(\gamma _{af} = a \gamma _f\) .

LaTeX

Lemma 2.3.34

The curvature measure of the function \(\phi _{a,b}\) is \(\gamma _{\phi _{a,b}} = a\delta _{b/a}\) , where \(\delta _x\) is the Dirac measure at \(x\).

LaTeX

Lemma 2.1.1Second marginal

Let \(D\) be a divergence that satisfies the DPI. Let \(\mathcal\mu , \nu \in \mathcal M(\mathcal X)\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\). Then

\begin{align*} D(\kappa \circ \mu , \eta \circ \nu ) \le D(\mu \otimes \kappa , \nu \otimes \eta ) \: . \end{align*}

LaTeX

Lemma 2.1.4Conditioning increases divergence

Let \(D\) be a divergence that satisfies the DPI and for which \(D(\kappa , \eta \mid \mu ) = D(\mu \otimes \kappa , \mu \otimes \eta )\). Let \(\mathcal\mu \in \mathcal M(\mathcal X)\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be Markov kernels. Then

\begin{align*} D(\kappa \circ \mu , \eta \circ \mu ) \le D(\kappa , \eta \mid \mu ) \: . \end{align*}

LaTeX

Lemma 2.1.3

Let \(D\) be a divergence that satisfies the DPI. Let \(\mathcal\mu , \nu \in \mathcal M(\mathcal X)\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then

\begin{align*} D(\mu \otimes \kappa , \nu \otimes \kappa ) = D(\mu , \nu ) \: . \end{align*}

LaTeX

Lemma 2.1.2First marginal

Let \(D\) be a divergence that satisfies the DPI. Let \(\mathcal\mu , \nu \in \mathcal M(\mathcal X)\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be Markov kernels. Then

\begin{align*} D(\mu , \nu ) \le D(\mu \otimes \kappa , \nu \otimes \eta ) \: . \end{align*}

LaTeX

Lemma 2.3.37

Let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel and \(\nu \in \mathcal M(\mathcal X)\) be a finite measure. Suppose that for all finite measures \(\mu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \), \(D_f(\kappa \circ \mu , \kappa \circ \nu ) \le D_f(\mu , \nu )\). Then the same is true without the absolute continuity hypothesis.

LaTeX Lean

Lemma A.7.2a.e.-uniqueness of the Bayesian inverse

Let \(\mu \in \mathcal M(\mathcal X)\) be a finite measure and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a finite kernel. If \(\eta : \mathcal Y \rightsquigarrow \mathcal X\) is such that \(\mu \otimes \kappa = ((\kappa \circ \mu ) \otimes \eta )_\leftrightarrow \), then \(\eta (y) = \kappa _\mu ^\dagger (y)\) for \((\kappa \circ \mu )\)-almost all \(y\).

LaTeX Lean

Lemma A.7.1Existence of the Bayesian inverse

For \(\mathcal X\) standard Borel, \(\mu \) and \(\kappa \) s-finite, the Bayesian inverse of \(\kappa \) with respect to \(\mu \) exists and is obtained by disintegration of the measure \(\mu \otimes \kappa \) on \(\mathcal X \times \mathcal Y\) into a measure \(\kappa \circ \mu \in \mathcal M(\mathcal Y)\) and a Markov kernel \(\kappa _\mu ^\dagger : \mathcal Y \rightsquigarrow \mathcal X\).

LaTeX

Lemma 2.4.20

Let \(\mu , \nu \) be two measures and \(E\) an event. Then \(\mu (E)\log \frac{\mu (E)}{\nu (E)} \le \mu \left[\mathbb {I}(E)\log \frac{d \mu }{d \nu }\right]\) .

LaTeX

Lemma 2.3.14

Let \(\mu _1, \mu _2\) and \(\nu \) be finite measures on \(\mathcal X\), with \(\mu _1 \ll \nu \) and \(\mu _2 \perp \nu \). Then \(D_f(\mu _1 + \mu _2, \nu ) = D_f(\mu _1, \nu ) + \mu _2(\mathcal X) f'(\infty )\).

LaTeX Lean

Lemma 2.3.7

\(D_{f + g}(\mu , \nu ) = D_f(\mu , \nu ) + D_g(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.8

For finite measures \(\mu \) and \(\nu \) with \(\mu (\mathcal X) = \nu (\mathcal X)\), for all \(a \in \mathbb {R}\), \(D_{f + a(x - 1)}(\mu , \nu ) = D_{f}(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.16

Let \(\mu _1, \mu _2, \nu \) be three finite measures on \(\mathcal X\). Then \(D_f(\mu _1 + \mu _2, \nu ) \le D_f(\mu _1, \nu ) + \mu _2(\mathcal X) f'(\infty )\).

LaTeX Lean

Lemma 2.3.15Superseded by Lemma 2.3.16

Let \(\mu _1, \mu _2, \nu \) be three finite measures on \(\mathcal X\) with \(\mu _1 \ll \nu \) and \(\mu _2 \ll \nu \). Then \(D_f(\mu _1 + \mu _2, \nu ) \le D_f(\mu _1, \nu ) + \mu _2(\mathcal X) f'(\infty )\).

LaTeX Lean

Lemma 2.3.11

\((\mu , \nu ) \mapsto D_f(a \mu + b \nu , \nu )\) is an \(f\)-divergence for the function \(x \mapsto f(ax + b)\)

LaTeX

Lemma 2.3.12

\((\mu , \nu ) \mapsto D_f(\mu , a \mu + b \nu )\) is an \(f\)-divergence for the function \(x \mapsto (ax+b)f\left(\frac{x}{ax+b}\right)\) .

LaTeX

Lemma 2.3.13

\((\mu , \nu ) \mapsto D_f(\nu , a \mu + b \nu )\) is an \(f\)-divergence for the function \(x \mapsto (ax+b)f\left(\frac{1}{ax+b}\right)\) .

LaTeX

Lemma 2.3.60

For all \(y \in [0,1]\), \(x \mapsto d_f(x, y)\) is convex and attains a minimum at \(x = y\).

LaTeX

Lemma 2.3.61

Let \(\mu , \nu \in \mathcal P([0,1])\). Then

\begin{align*} D_f(\mu , \nu ) \ge d_f(\mu [X], \nu [X]) \: . \end{align*}

LaTeX

Lemma 2.3.44

Let \(\mu , \nu \) be two finite measures on a standard Borel space \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. \(D_f(\kappa \circ \mu , \eta \circ \nu ) \le D_f(\mu \otimes \kappa , \nu \otimes \eta )\)

LaTeX Lean

Lemma 2.3.27

Let \(\mu \) be a finite measure on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, such that \(\kappa (x) \ne 0\) for all \(x\). Then \(D_f(\mu \otimes \kappa , \mu \otimes \eta ) = D_f(\kappa , \eta \mid \mu )\).

LaTeX Lean

Lemma 2.3.26

LaTeX Lean

Lemma 2.3.58

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow (\mathcal X \times \mathcal Y)\) be a Markov kernel such that for all \(x\), \((\kappa (x))_X = \delta _x\). Then \(D_f(\kappa \circ \mu , \kappa \circ \nu ) = D_f(\mu , \nu )\).

LaTeX

Lemma 2.3.2

For \(\nu \) a finite measure, for all \(a \in \mathbb {R}\), \(D_{x \mapsto a}(\mu , \nu ) = a \nu (\mathcal X)\).

LaTeX Lean

Lemma 2.3.17

Let \(\mu \) and \(\nu \) be two finite measures on \(\mathcal X\). Then \(D_f(\mu , \nu ) = D_f(\frac{d\mu }{d\nu }\cdot \nu , \nu ) + f'(\infty ) \mu _{\perp \nu }(\mathcal X)\).

LaTeX Lean

Lemma 2.3.4

Let \(\mu , \nu \) be two probability measures. Assume \(f'(\infty ) = + \infty \) and \(f(1) = 0\). Then \(D_f(\mu , \nu ) = 0\) if and only if \(\mu = \nu \).

LaTeX Lean

Lemma 2.3.63

Let \(\pi , \xi \in \mathcal P(\Theta )\) and \(P, Q : \Theta \rightsquigarrow \mathcal X\). Suppose that the loss \(\ell '\) takes values in \([0,1]\). Then

\begin{align*} D_f(\pi \otimes Q, \xi \otimes P) & \ge d_f(\mathcal R_\pi ^Q, \mathcal R_\xi ^P) \: . \end{align*}

LaTeX

Lemma 2.3.5

\(D_{x \mapsto x}(\mu , \nu ) = \mu (\mathcal X)\).

LaTeX Lean

Lemma 2.3.66

If \(f(1) = 0\), \(g(1) = 0\), \(f'(1) = 0\), \(g'(1) = 0\), and both \(f\) and \(g\) have a second derivative, then

\begin{align*} D_f(\mu , \nu ) \le \sup _x \frac{f''(x)}{g''(x)} D_g(\mu , \nu ) \end{align*}

LaTeX

Lemma 2.3.53

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) be finite measures with \(\mu \ll \nu \) and let \(g : \mathcal X \to \mathcal Y\) be a measurable function. Denote by \(g^* \mathcal Y\) the comap of the \(\sigma \)-algebra on \(\mathcal Y\) by \(g\). Then \(D_f(g_* \mu , g_* \nu ) = D_f(\mu _{| g^* \mathcal Y}, \nu _{| g^* \mathcal Y})\) .

LaTeX

Lemma 2.3.9

Let \(\mu \) and \(\nu \) be two measures on \(\mathcal X\) and let \(g : \mathcal X \to \mathcal Y\) be a measurable embedding. Then \(D_f(g_* \mu , g_* \nu ) = D_f(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.65

If \(f \le g\), then \(D_f(\mu , \nu ) \le D_g(\mu , \nu )\).

LaTeX

Lemma 2.3.6

For all \(a \ge 0\), \(D_{a f}(\mu , \nu ) = a D_{f}(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.1

For \(\mu \) and \(\nu \) two finite measures, \(D_f(\mu , \nu )\) is finite if and only if \(x \mapsto f\left(\frac{d \mu }{d \nu }(x)\right)\) is \(\nu \)-integrable and either \(f'(\infty ) {\lt} \infty \) or \(\mu \ll \nu \).

LaTeX Lean

Lemma 2.3.20

Let \(\mu , \nu \) be two probability measures. If \(f(1) = 0\) then \(D_f(\mu , \nu ) \ge 0\).

LaTeX Lean

Lemma 2.3.49

Let \(a,b \in [0, +\infty )\). Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(D_{\phi _{a,b}}(\kappa \circ \mu , \kappa \circ \nu ) \le D_{\phi _{a,b}}(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.21Dummy lemma: fDiv properties

Dummy node to summarize properties of \(f\)-divergences.

LaTeX

Lemma 2.3.3

If \(f(1) = 0\) then \(D_{f}(\mu , \mu ) = 0\).

LaTeX Lean

Lemma 2.3.47

Let \(a,b \in [0, +\infty )\) and let \(\mu , \nu \) be two measures on \(\mathcal X\).

\begin{align*} D_{\phi _{a,b}}(\mu , \nu ) & = \text{sign}(b-a)\frac{1}{2}(a \mu (\mathcal X) - b \nu (\mathcal X)) + \frac{1}{2}\nu \left[ \left\vert a \frac{d\mu }{d\nu } - b \right\vert \right] + \frac{1}{2}a \mu _{\perp \nu }(\mathcal X) \: , \end{align*}

in which \(\text{sign}(b-a)\) is \(1\) if \(b-a {\gt} 0\) and \(-1\) if \(b-a \le 0\).

LaTeX

Lean

ProbabilityTheory.fDiv_statInfoFun_eq_StatInfo_of_nonneg_of_le
ProbabilityTheory.fDiv_statInfoFun_eq_StatInfo_of_nonneg_of_gt

Lemma 2.3.10

\(D_f(\mu , \nu ) = D_{x \mapsto xf(1/x)}(\nu , \mu )\) .

LaTeX

Lemma 1.2.10

The generalized Bayes estimator for the Bayes binary risk with prior \(\xi \in \mathcal M(\{ 0,1\} )\) is \(x \mapsto \text{if } \xi _1\frac{d \nu }{d(P \circ \xi )}(x) \le \xi _0\frac{d \mu }{d(P \circ \xi )}(x) \text{ then } 0 \text{ else } 1\), i.e. it is equal to \(\mathbb {I}_E\) for \(E = \{ x \mid \xi _1\frac{d \nu }{d(P \circ \xi )}(x) {\gt} \xi _0\frac{d \mu }{d(P \circ \xi )}(x)\} \) .

LaTeX Lean

Lemma 1.1.15

The generalized Bayes estimator for prior \(\pi \in \mathcal P(\Theta )\) on the estimation problem defined by \(\mathcal Y = \mathcal Z = \Theta \), \(y = \mathrm{id}\) and \(\ell ' = \mathbb {I}\{ \theta \ne z\} \) is

\begin{align*} x \mapsto \arg \max _z \left( \pi \{ z\} \frac{d P_z}{d (P \circ \pi )}(x) \right) \: . \end{align*}

LaTeX

Lemma 2.7.2

Let \(\mu , \nu \) be two probability measures. Then \(2 \operatorname{H^2}(\mu , \nu ) \le R_{1/2}(\mu , \nu )\).

LaTeX

Lemma 2.7.3

Let \(\mu , \nu \) be two probability measures. Then \(\operatorname{H^2}(\mu , \nu ) \le \operatorname{TV}(\mu , \nu )\).

LaTeX

Lemma 2.5.7

\((\mu , \nu ) \mapsto \operatorname{H}_\alpha (\mu , \nu )\) is convex.

LaTeX

Lemma 2.5.2

For \(\alpha \in (0,1)\cup (1, \infty )\), \(\mu \) a finite measure and \(\nu \) a probability measure, if \(\operatorname{H}_\alpha (\mu , \nu ) {\lt} \infty \) then

\begin{align*} \operatorname{H}_\alpha (\mu , \nu ) = \frac{1}{\alpha - 1} \left( \int _x \left(\frac{d \mu }{d \nu }(x)\right)^\alpha \partial \nu - 1 \right) \: . \end{align*}

LaTeX Lean

Lemma 2.5.1

For \(\alpha \in [0, 1)\) and finite measures \(\mu , \nu \), \(\operatorname{H}_\alpha (\mu , \nu ) {\lt} \infty \).

LaTeX Lean

Lemma 2.5.6

Let \(\mu , \nu \) be two probability measures. Then \(\operatorname{H}_\alpha (\mu , \nu ) \ge 0\).

LaTeX Lean

Lemma 2.5.8Dummy lemma: hellingerAlpha properties

Dummy node to summarize properties of the Hellinger \(\alpha \)-divergence.

LaTeX

Lemma 2.5.4

For \(\alpha \in (0, 1)\) and finite measures \(\mu , \nu \) with \(\mu (\mathcal X) = \nu (\mathcal X)\), \((1 - \alpha ) \operatorname{H}_\alpha (\mu , \nu ) = \alpha \operatorname{H}_{1 - \alpha }(\nu , \mu )\).

LaTeX Lean

Lemma C.0.1

If \(\mu \) and \(\nu \) are two finite measures and \(f'(\infty ) {\lt} \infty \), then \(x \mapsto f\left(\frac{d\mu }{d\nu }(x)\right)\) is \(\nu \)-integrable.

LaTeX Lean

Lemma 2.3.24

Let \(\mu \) be a finite measure on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels from \(\mathcal X\) to \(\mathcal Y\). Then \(p \mapsto f \left(\frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )}(p)\right)\) is \((\mu \otimes \eta )\)-integrable iff

\(x \mapsto D_f(\kappa (x), \eta (x))\) is \(\mu \)-integrable and
for \(\mu \)-almost all \(x\), \(y \mapsto f \left( \frac{d\kappa (x)}{d\eta (x)}(y) \right)\) is \(\eta (x)\)-integrable.

LaTeX Lean

Lemma 2.4.11

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two Markov kernels such that \(\mu \otimes \kappa \ll \nu \otimes \eta \). Then \(p \mapsto \log \frac{d \mu \otimes \kappa }{d \nu \otimes \eta }(p)\) is \(\mu \otimes \kappa \)-integrable if and only if the following hold:

\(x \mapsto \log \frac{d \mu }{d \nu }(x)\) is \(\mu \)-integrable
\(x \mapsto \int _y \log \frac{d \kappa (x)}{d \eta (x)}(y) \partial \kappa (x)\) is \(\mu \)-integrable
for \(\mu \)-almost all \(x\), \(y \mapsto \log \frac{d \kappa (x)}{d \eta (x)}(y)\) is \(\kappa (x)\)-integrable

LaTeX Lean

Lemma 2.5.3

For \(\alpha \in (0,1)\cup (1, \infty )\), \(\mu , \nu \) two sigma-finite measures,

\begin{align*} \int _x \left(\frac{d \mu }{d \nu }(x)\right)^\alpha \partial \nu = \int _x \left(\frac{d \nu }{d \mu }(x)\right)^{1 - \alpha } \partial \mu \: . \end{align*}

LaTeX Lean

Lemma 2.9.2

\(\operatorname{JS}_\alpha \) is an \(f\)-divergence for \(f(x) = \alpha x \log (x) - (\alpha x + 1 - \alpha ) \log (\alpha x + 1 - \alpha )\) .

LaTeX

Lemma 2.9.4

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(\alpha \in (0, 1)\). Then

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)}\left( \alpha \operatorname{KL}(\mu , \xi ) + (1 - \alpha )\operatorname{KL}(\nu , \xi ) \right) \: . \end{align*}

The infimum is attained at \(\xi = \alpha \mu + (1 - \alpha ) \nu \).

LaTeX

Lemma 2.9.5

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(\alpha \in (0, 1)\). Let \(\pi _\alpha = (\alpha , 1 - \alpha ) \in \mathcal P(\{ 0,1\} )\) and let \(P : \{ 0,1\} \rightsquigarrow \mathcal X\) be the kernel with \(P(0) = \mu \) and \(P(1) = \nu \). Then

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)} \operatorname{KL}\left( \pi _\alpha \otimes P, \pi _\alpha \times \xi \right) \: . \end{align*}

The infimum is attained at \(\xi = \alpha \mu + (1 - \alpha ) \nu \).

LaTeX

Lemma 2.9.1

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) = \operatorname{KL}(\pi _\alpha \otimes P, \pi _\alpha \times (P \circ \pi _\alpha )) \: . \end{align*}

LaTeX

Lemma 2.9.7

For \(\mu , \nu \in \mathcal P(\mathcal X)\) and \(\alpha , \lambda \in (0,1)\) ,

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) \le \frac{(1 - \alpha )^{1 - \lambda } \alpha ^\lambda }{\lambda ^{1 - \lambda } (1 - \lambda )^\lambda } (1 - \lambda )\operatorname{H}_\lambda (\mu , \nu ) \: . \end{align*}

In particular,

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) & \le \left(\frac{1 - \alpha }{\alpha }\right)^{1 - 2 \alpha } (1 - \alpha )\operatorname{H}_\alpha (\mu , \nu ) \: , \\ \operatorname{JS}_\alpha (\mu , \nu ) & \le \alpha \operatorname{H}_{1 - \alpha }(\mu , \nu ) \: . \end{align*}

LaTeX

Lemma 2.9.6

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\). Let \(n \in \mathbb {N}\) and write \(\mu ^{\otimes n}\) for the product measure on \(\mathcal X^n\) of \(n\) times \(\mu \). Then \(\operatorname{JS}_\alpha (\mu ^{\otimes n}, \nu ^{\otimes n}) \le n \operatorname{JS}_\alpha (\mu , \nu )\).

LaTeX

Lemma 2.9.3

For \(\alpha \in (0,1)\) and \(\mu , \nu \in \mathcal M(\mathcal X)\),

\begin{align*} \operatorname{JS}_\alpha (\mu , \nu ) = \operatorname{JS}_{1 - \alpha }(\nu , \mu ) \: . \end{align*}

LaTeX

Lemma A.6.2Dummy lemma: kernel properties

Dummy node to summarize kernel properties.

LaTeX

Lemma 2.4.19

Let \(\mu , \nu \) be two measures and \(E\) an event. Let \(\mu _{|E}\) be the measure defined by \(\mu _{|E}(A) = \frac{\mu (A \cap E)}{\mu (E)}\) and define \(\nu _{|E}\), \(\mu _{| E^c}\) and \(\nu _{| E^c}\) similarly. Let \(\operatorname{kl}(p,q) = p\log \frac{p}{q} + (1-p)\log \frac{1-p}{1-q}\) be the Kullback-Leibler divergence between Bernoulli distributions with means \(p\) and \(q\). Then

\begin{align*} \operatorname{KL}(\mu , \nu ) \ge \operatorname{kl}(\mu (E), \nu (E)) + \mu (E) \operatorname{KL}(\mu _{|E}, \nu _{|E}) + \mu (E^c) \operatorname{KL}(\mu _{|E^c}, \nu _{|E^c}) \: . \end{align*}

LaTeX

Lemma 2.4.10

\((\mu , \nu ) \mapsto \operatorname{KL}(\mu , \nu )\) is convex.

LaTeX

Lemma 2.4.1

\(\operatorname{KL}(\mu , \nu ) = D_f(\mu , \nu )\) for \(f: x \mapsto x \log x\).

LaTeX Lean

Lemma 2.4.9Converse Gibbs’ inequality

Let \(\mu , \nu \) be two probability measures. Then \(\operatorname{KL}(\mu , \nu ) = 0\) if and only if \(\mu = \nu \).

LaTeX Lean

Lemma 4.1.5

Let \(\pi , \xi \in \mathcal P(\Theta )\) and \(P, Q : \Theta \rightsquigarrow \mathcal X\). Suppose that the loss \(\ell '\) takes values in \([0,1]\). Then

\begin{align*} \operatorname{KL}(\pi \otimes Q, \xi \otimes P) & \ge \operatorname{kl}(\mathcal R_\pi ^Q, \mathcal R_\xi ^P) \: . \end{align*}

LaTeX

Lemma 4.1.9

Let \(\mu , \nu , \xi \in \mathcal P(\mathcal X)\) and let \(\alpha , \beta \in (0, 1)\). Let \(P : \{ 0,1\} \rightsquigarrow \mathcal X\) be the kernel with \(P(0) = \mu \) and \(P(1) = \nu \). We write \(\pi _\alpha \) for the probability measure on \(\{ 0,1\} \) with \(\pi _\alpha (\{ 0\} ) = \alpha \). Let \(\bar{\beta } = \min \{ \beta , 1 - \beta \} \). Then

\begin{align*} \operatorname{KL}(\pi _\alpha \otimes P, \pi _\alpha \times \xi ) & \ge \operatorname{kl}(B_\alpha (\mu , \nu ), \bar{\beta }) - \operatorname{kl}(\alpha , \bar{\beta }) \: , \\ \operatorname{KL}(\pi _\beta \times \xi , \pi _\beta \otimes P) & \ge \operatorname{kl}(\bar{\beta }, B_\alpha (\mu , \nu )) - \operatorname{kl}(\bar{\beta }, \alpha ) \: . \end{align*}

LaTeX

Lemma 2.4.7

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\). Then \(\operatorname{KL}(\mu , \nu ) \ge \mu (\mathcal X) \log \frac{\mu (\mathcal X)}{\nu (\mathcal X)}\).

LaTeX Lean

Lemma 2.4.8Gibbs’ inequality

Let \(\mu , \nu \) be two probability measures. Then \(\operatorname{KL}(\mu , \nu ) \ge 0\).

LaTeX Lean

Lemma 2.4.16

Let \(\mu _1, \nu _1\) be finite measures on \(\mathcal X\) and \(\mu _2, \nu _2\) probability measures on \(\mathcal{Y}\). Then

\[ \operatorname{KL}(\mu _1\times \mu _2, \nu _1 \times \nu _2) = \operatorname{KL}(\mu _1, \nu _1) + \operatorname{KL}(\mu _2, \nu _2) \mu _1 (\mathcal X) \: . \]

LaTeX Lean

Lemma 2.4.21Dummy lemma: KL properties

Dummy node to summarize properties of the Kullback-Leibler divergence.

LaTeX

Lemma 2.6.17

Let \(\mu , \nu , \xi \) be three measures on \(\mathcal X\) and let \(\alpha \in (0, 1)\). Then

\begin{align*} \operatorname{KL}(\xi , \mu ^{(\alpha , \nu )}) = \alpha \operatorname{KL}(\xi , \mu ) + (1 - \alpha )\operatorname{KL}(\xi , \nu ) - (1 - \alpha ) R_\alpha (\mu , \nu ) \: . \end{align*}

LaTeX

Lemma 2.4.3

\(\operatorname{KL}(\mu , \mu ) = 0\).

LaTeX Lean

Lemma 2.3.19

Let \(\mu \) be a finite measure and \(\nu \) be a probability measure on the same space \(\mathcal X\). Then \(f(\mu (\mathcal X)) \le D_f(\mu , \nu )\).

LaTeX Lean

Lemma 2.3.18Superseded by Lemma 2.3.19

Let \(\mu \) be a finite measure and \(\nu \) be a probability measure on the same space \(\mathcal X\), such that \(\mu \ll \nu \). Then \(f(\mu (\mathcal X)) \le D_f(\mu , \nu )\).

LaTeX Lean

Lemma 4.1.14Change of measure lemma

Let \(\mu , \nu \) be two measures on \(\mathcal X\) with \(\mu \ll \nu \) and let \(E\) be an event on \(\mathcal X\). Let \(\beta \in \mathbb {R}\). Then

\begin{align*} \nu (E) e^{\beta } \ge \mu (E) - \mu \left\{ \log \frac{d \mu }{d \nu } {\gt} \beta \right\} \: . \end{align*}

LaTeX Lean

Lemma 4.1.183 points change of measure

Let \(\mu , \nu , \xi \in \mathcal P(\mathcal X)\) and let \(E\) be an event on \(\mathcal X\). Let \(\beta _1, \beta _2 \in \mathbb {R}\). Then

\begin{align*} \mu (E) e^{\beta _1} + \nu (E^c) e^{\beta _2} \ge 1 - \xi \left\{ \log \frac{d \xi }{d \mu } {\gt} \beta _1 \right\} - \xi \left\{ \log \frac{d \xi }{d \nu } {\gt} \beta _2 \right\} \: . \end{align*}

LaTeX Lean

Lemma 4.1.15Change of measure - functions

Let \(\mu , \nu \) be two measures on \(\mathcal X\) with \(\mu \ll \nu \) and let \(f : \mathcal X \to [0,1]\) be a measurable function. Let \(\beta \in \mathbb {R}\). Then

\begin{align*} \nu [f] e^{\beta } \ge \mu [f] - \mu \left\{ \log \frac{d \mu }{d \nu } {\gt} \beta \right\} \: . \end{align*}

LaTeX

Lemma 4.1.21Change of measure - variance

Let \(\mu , \nu \) be two measures on \(\mathcal X\) such that \(\mu \left[\left(\log \frac{d \mu }{d \nu }\right)^2\right] {\lt} \infty \). Let \(E\) be an event on \(\mathcal X\) and let \(\beta {\gt} 0\). Then

\begin{align*} \nu (E) e^{\operatorname{KL}(\mu , \nu ) + \sqrt{\operatorname{Var}_\mu [\log \frac{d \mu }{d \nu }]\beta }} \ge \mu (E) - \frac{1}{\beta } \: . \end{align*}

LaTeX

Lemma 4.3.1

For \(\mu , \nu \in \mathcal P(\mathcal X)\) with \(\mu \ll \nu \) and \(n \in \mathbb {N}\), \(\nu ^{\otimes \mathbb {N}}_{| \mathcal F_n}\)-almost surely, \(\frac{d \mu ^{\otimes \mathbb {N}}_{| \mathcal F_n}}{d \nu ^{\otimes \mathbb {N}}_{| \mathcal F_n}}(x) = \prod _{m=1}^n \frac{d \mu }{d \nu }(x_m)\).

LaTeX

Lemma 4.3.2

For \(\mu , \nu \in \mathcal P(\mathcal X)\) with \(\mu \ll \nu \), \(\nu _\tau \)-almost surely, \(\frac{d \mu _\tau }{d \nu _\tau }(x) = \prod _{n=1}^\tau \frac{d \mu }{d \nu }(x_n)\).

LaTeX

Lemma A.1.1

Let \(\mathcal X, \mathcal Y\) be measurable spaces, and let \(f : \mathcal X \to \mathcal M(\mathcal Y)\) such that for all measurable sets \(s\) of \(\mathcal X\), the function \(x \mapsto f(x)(s)\) is measurable. Then \(f\) is measurable.

LaTeX Lean

Lemma 3.1.1

For \(\mu \in \mathcal M(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel,

\begin{align*} I(\mu \otimes \kappa ) = KL(\mu \otimes \kappa , \mu \times (\kappa \circ \mu )) = KL(\kappa , \kappa \circ \mu \mid \mu ) \: , \end{align*}

where in the conditional divergence the measure \(\kappa \circ \mu \) should be understood as the constant kernel from \(\mathcal X\) to \(\mathcal Y\) with that value.

LaTeX

Lemma 3.1.3

\begin{align*} I(\pi _\alpha \otimes P) = \operatorname{JS}_\alpha (\mu , \nu ) \: . \end{align*}

LaTeX

Lemma 3.1.2

\(I(\rho _\leftrightarrow ) = I(\rho )\) .

LaTeX

Lemma 3.2.1

For \(\mu \in \mathcal P(\{ 0,1\} )\) and \(\kappa : \{ 0,1\} \rightsquigarrow \mathcal Y\),

\begin{align*} I_{\operatorname{KL}}^L(\mu , \kappa ) = (1 - \mu _0) R_{\mu _0}(\kappa _0, \kappa _1) \: . \end{align*}

LaTeX

Lemma 3.2.5

Let \(\pi \in \mathcal P(\Theta )\) and \(P : \Theta \rightsquigarrow \mathcal X\). Suppose that the loss \(\ell '\) of an estimation task with kernel \(P\) takes values in \([0,1]\). Then

\begin{align*} I_{D_f}^L(\pi , P) & \ge d_f(\mathcal R_\pi ^{d_\Theta }, \mathcal R_\pi ^P) \: , \\ I_{D_f}^R(\pi , P) & \ge d_f(\mathcal R_\pi ^P, \mathcal R_\pi ^{d_\Theta }) \: . \end{align*}

LaTeX

Lemma 3.2.2

For \(\mu \in \mathcal P(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\),

\begin{align*} I_{\operatorname{KL}}^R(\mu , \kappa ) = I(\mu \otimes \kappa ) \: . \end{align*}

LaTeX

Lemma B.2.2

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Let \(\mu \sqcap \nu \) denote the infimum of \(\mu \) and \(\nu \). Then

if \(\mu \perp \nu \) then \(\mu \otimes \kappa \perp \nu \otimes \eta \),
\(\mu \otimes \kappa \perp \nu \otimes \eta \iff (\mu \sqcap \nu ) \otimes \kappa \perp (\mu \sqcap \nu ) \otimes \eta \), and the same holds for any measure which is equivalent to \(\mu \sqcap \nu \), like \(\frac{d \mu }{d \nu } \cdot \nu \) ,
if \(\mu \otimes \kappa \perp \nu \otimes \eta \) then for \((\mu \sqcap \nu )\)-almost every \(x\), \(\kappa (x) \perp \eta (x)\).

LaTeX

Lemma 4.1.4

For probability measures,

\begin{align*} 1 - \operatorname{TV}(\mu , \nu ) \le e^{- C_1(\mu , \nu )} \: . \end{align*}

LaTeX

Lemma 2.6.6

The cumulant generating function of \(\log \frac{d\mu }{d\nu }\) under \(\nu \) is \(\alpha \mapsto (\alpha - 1) R_\alpha (\mu , \nu )\).

LaTeX Lean

Lemma 2.6.7

Set \(\alpha {\gt} 0\). If \(R_{1+\alpha }(\mu , \nu ) {\lt} \infty \), the cumulant generating function of \(\log \frac{d\mu }{d\nu }\) under \(\mu \) has value \(\alpha R_{1+\alpha }(\mu , \nu )\) at \(\alpha \).

LaTeX Lean

Lemma 4.1.23

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(E\) be an event on \(\mathcal X\). Let \(\alpha ,\beta {\gt} 0\). Then

\begin{align*} \nu (E) e^{R_{1+\alpha }(\mu , \nu ) + \beta } \ge \mu (E) - e^{-\alpha \beta } \: . \end{align*}

LaTeX Lean

Lemma 4.1.26

Let \(\mu , \nu , \xi \) be three probability measures on \(\mathcal X\) and let \(E\) be an event on \(\mathcal X\). Let \(\alpha , \beta \ge 0\). Then

\begin{align*} \mu (E) e^{R_{1+\alpha }(\xi , \mu ) + \beta } + \nu (E^c) e^{R_{1+\alpha }(\xi , \nu ) + \beta } \ge 1 - 2 e^{-\alpha \beta } \: . \end{align*}

LaTeX Lean

Lemma 4.1.22Change of measure - c.g.f.

For \(\mu , \nu \) finite measures and \(\alpha , \beta {\gt} 0\),

\begin{align*} \mu \left\{ \log \frac{d \mu }{d \nu } {\gt} R_{1+\alpha }(\mu , \nu ) + \beta \right\} \le e^{- \alpha \beta } \: . \end{align*}

LaTeX Lean

Lemma 2.6.14

Let \(\mu , \nu \) be two finite measures. Then \(\alpha \mapsto R_\alpha (\mu , \nu )\) is continuous on \([0, 1]\) and on \([0, \sup \{ \alpha \mid R_\alpha (\mu , \nu ) {\lt} \infty \} )\).

LaTeX

Lemma 2.6.9

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(E\) be an event. Let \(\mu _E\) and \(\nu _E\) be the two Bernoulli distributions with respective means \(\mu (E)\) and \(\nu (E)\). Then \(R_\alpha (\mu , \nu ) \ge R_\alpha (\mu _E, \nu _E)\).

LaTeX

Lemma 2.6.19

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and let \(\alpha \in (0, 1)\). Then

\begin{align*} (1 - \alpha ) R_\alpha (\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)}\left( \alpha \operatorname{KL}(\xi , \mu ) + (1 - \alpha )\operatorname{KL}(\xi , \nu ) \right) \: . \end{align*}

The infimum is attained at \(\xi = \mu ^{(\alpha , \nu )}\).

LaTeX

Lemma 2.6.20

\begin{align*} (1 - \alpha ) R_\alpha (\mu , \nu ) = \inf _{\xi \in \mathcal P(\mathcal X)} \operatorname{KL}\left( \pi _\alpha \times \xi , \pi _\alpha \otimes P \right) \: . \end{align*}

The infimum is attained at \(\xi = \mu ^{(\alpha , \nu )}\).

LaTeX

Lemma 2.6.3

For \(\alpha \in (0,1)\cup (1, \infty )\) and finite measures \(\mu , \nu \), if \(\left(\frac{d \mu }{d \nu }\right)^\alpha \) is integrable with respect to \(\nu \) and \(\mu \ll \nu \) then

\begin{align*} R_\alpha (\mu , \nu ) = \frac{1}{\alpha - 1} \log \int _x \left(\frac{d \mu }{d \nu }(x)\right)^\alpha \partial \nu \: . \end{align*}

LaTeX Lean

Lemma 2.6.4

For \(\alpha \in (0,1)\cup (1, \infty )\) and finite measures \(\mu , \nu \), if \(\left(\frac{d \mu }{d \nu }\right)^\alpha \) is integrable with respect to \(\nu \) and \(\mu \ll \nu \) then

\begin{align*} R_\alpha (\mu , \nu ) = \frac{1}{\alpha - 1} \log \int _x \left(\frac{d \mu }{d \nu }(x)\right)^{\alpha - 1} \partial \mu \: . \end{align*}

LaTeX Lean

Lemma 2.7.1Hellinger and Rényi

Let \(\mu , \nu \) be two probability measures. Then \(R_{1/2}(\mu , \nu ) = -2\log (1 - \operatorname{H^2}(\mu , \nu ))\).

LaTeX

Lemma 2.6.13

Let \(\mu , \nu \) be two finite measures. Then \(\alpha \mapsto R_\alpha (\mu , \nu )\) is nondecreasing on \([0, + \infty )\).

LaTeX

Corollary 2.6.25

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\). Let \(n \in \mathbb {N}\) and write \(\mu ^{\otimes n}\) for the product measure on \(\mathcal X^n\) of \(n\) times \(\mu \). Then \(R_\alpha (\mu ^{\otimes n}, \nu ^{\otimes n}) = n R_\alpha (\mu , \nu )\).

LaTeX

Lemma 2.6.27Dummy lemma: Renyi properties

Dummy node to summarize properties of the Rényi divergence.

LaTeX

Lemma 2.6.5

For \(\alpha \in (0, 1)\) and finite measures \(\mu , \nu \) with \(\mu (\mathcal X) = \nu (\mathcal X)\),

\[ (1 - \alpha ) R_\alpha (\mu , \nu ) = \alpha R_{1 - \alpha }(\nu , \mu ) \: . \]

LaTeX Lean

Lemma 2.6.11

Let \(\mu , \nu \) be two finite measures. \(R_1(\mu , \nu ) = \lim _{\alpha \uparrow 1} R_\alpha (\mu , \nu )\).

LaTeX

Lemma 2.6.12

Let \(\mu , \nu \) be two finite measures such that there exists \(\alpha {\gt} 1\) with \(R_\alpha (\mu , \nu )\) finite. Then \(R_1(\mu , \nu ) = \lim _{\alpha \downarrow 1} R_\alpha (\mu , \nu )\).

LaTeX

Lemma 2.6.10

Let \(\mu , \nu \) be two finite measures. \(R_0(\mu , \nu ) = \lim _{\alpha \downarrow 0} R_\alpha (\mu , \nu )\).

LaTeX

Lemma 2.6.2

For \(\alpha \in [0, 1)\) and finite measures \(\mu , \nu \),

\[ R_\alpha (\mu , \nu ) = \infty \iff \mu \perp \nu \: . \]

LaTeX Lean

Lemma 2.6.1

For \(\mu \) a sigma-finite measure and \(\nu \) a finite measure

\[ R_0(\mu , \nu ) = - \log (\nu \{ x \mid \frac{d \mu }{d \nu }(x) {\gt} 0\} ) \: . \]

LaTeX Lean

Lemma 2.6.15

For \(\alpha \in (0,1)\), \(\mu ^{(\alpha , \nu )} \ll \mu \) and \(\mu ^{(\alpha , \nu )} \ll \nu \).

LaTeX

Lemma 4.3.4

For \(\mu , \nu \) two probability measures on \(\mathcal X\) and \(\alpha \in (0,1)\), \((\mu ^{(\alpha , \nu )})_\tau = (\mu _\tau )^{(\alpha , \nu _\tau )}\).

LaTeX

Lemma 1.1.18

For \(\kappa : \mathcal X \rightsquigarrow \mathcal X'\) and \(\eta : \mathcal X' \rightsquigarrow \mathcal X''\) two Markov kernels, \(I^P_\pi (\eta \circ \kappa ) = I^P_\pi (\kappa ) + I^{\kappa \circ P}_\pi (\eta )\) .

LaTeX Lean

Lemma 1.1.19Data-processing inequality

For any measurable space \(\mathcal X\), let \(d_{\mathcal X} : \mathcal X \rightsquigarrow *\) be the Markov kernel to the point space. For all Markov kernels \(\kappa : \mathcal X \rightsquigarrow \mathcal X'\),

\begin{align*} I_\pi ^P(d_{\mathcal X}) \ge I_\pi ^{\kappa \circ P}(d_{\mathcal X'}) \: . \end{align*}

LaTeX Lean

Lemma 1.1.17

For \(\kappa \) a Markov kernel, \(I^P_\pi (\kappa ) \ge 0\) .

LaTeX

Lemma A.7.7

Let \(\mu \in \mathcal M(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) such that \(\kappa _\mu ^\dagger \) exists and there exists \(\nu \in \mathcal M(\mathcal X)\) such that \(\kappa _\mu ^\dagger \). Then for \(\mu \)-almost all \(x\) and \((\kappa \circ \mu )\)-almost all \(y\),

\begin{align*} \frac{d \kappa _\mu ^\dagger (y)}{d \mu }(x) = \frac{d \kappa (x)}{d(\kappa \circ \mu )}(y) \: . \end{align*}

LaTeX

Lemma B.2.6

Let \(\mu , \nu , \xi \) be \(\sigma \)-finite measures on \(\mathcal X\).

If \(\mu \ll \nu \) then \(\xi \)-almost surely, \(\frac{d \mu }{d \xi } = \frac{d \mu }{d \nu } \frac{d \nu }{d \xi }\).
If \(\nu \ll \xi \) then \(\nu \)-almost surely, \(\frac{d \mu }{d \xi } = \frac{d \mu }{d \nu } \frac{d \nu }{d \xi }\).

LaTeX

Lean

MeasureTheory.Measure.rnDeriv_mul_rnDeriv
MeasureTheory.Measure.rnDeriv_mul_rnDeriv'

Lemma B.3.2

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be finite kernels with \(\kappa (x) \ll \eta (x)\) \(\nu \)-a.e.. Let \(\mathcal B\) be the sigma-algebra on \(\mathcal X \times \mathcal Y\) obtained by taking the comap of the sigma-algebra of \(\mathcal Y\) by the projection. Then for \((\nu \otimes \eta )\)-almost every \((x,y)\),

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\eta \circ \nu )}(y) & = (\nu \otimes \eta )\left[ \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )} \mid \mathcal B \right](x,y) \: . \end{align*}

LaTeX Lean

Lemma B.3.3 [ Csi63 ]

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a finite kernel. Let \(\mathcal B\) be the sigma-algebra on \(\mathcal X \times \mathcal Y\) obtained by taking the comap of the sigma-algebra of \(\mathcal Y\) by the projection. Then for \((\nu \otimes \kappa )\)-almost every \((x,y)\),

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\kappa \circ \nu )}(y) & = (\nu \otimes \kappa )\left[ (x, y) \mapsto \frac{d \mu }{d \nu }(x) \mid \mathcal B \right](x,y) \: . \end{align*}

LaTeX Lean

Lemma B.2.11

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then for \((\nu \otimes \eta )\)-almost all \((x, y)\),

\begin{align*} \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )}(x,y) = \frac{d\mu }{d\nu }(x)\frac{d \kappa }{d \eta }(x,y) \: . \end{align*}

This implies that the equality is true for \(\nu \)-almost all \(x\), for \(\eta (x)\)-almost all \(y\).

LaTeX Lean

Lemma B.2.8

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Let \(\mu ' = \left(\frac{\partial \mu }{\partial \nu }\right) \cdot \nu \) and \(\kappa ' = \left(\frac{\partial \kappa }{\partial \eta }\right) \cdot \eta \). Then for \((\nu \otimes \eta )\)-almost all \(z\), \(\frac{\partial (\mu ' \otimes \kappa ')}{\partial (\nu \otimes \eta )}(z) = \frac{\partial (\mu \otimes \kappa )}{\partial (\nu \otimes \eta )}(z)\).

LaTeX Lean

Lemma B.2.4

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Let \(\mu _{\parallel \nu } = \left(\frac{\partial \mu }{\partial \nu }\right) \cdot \nu \). Then for \((\nu \otimes \eta )\)-almost all \(z\), \(\frac{d (\mu _{\parallel \nu } \otimes \kappa )}{d (\nu \otimes \eta )}(z) = \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )}(z)\).

LaTeX Lean

Lemma B.3.1

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \), \(g : \mathcal X \to \mathcal Y\) a measurable function and denote by \(g^* \mathcal Y\) the comap of the \(\sigma \)-algebra on \(\mathcal Y\) by \(g\). Then \(\nu \)-almost everywhere,

\begin{align*} \frac{d g_*\mu }{d g_*\nu }(g(x)) = \nu \left[ \frac{d \mu }{d \nu } \mid g^* \mathcal Y\right](x) \: . \end{align*}

LaTeX Lean

Lemma B.4.2

\begin{align*} \frac{d g_*\mu }{d g_*\nu }(g(x)) = \frac{d \mu _{| g^* \mathcal Y}}{d \nu _{| g^* \mathcal Y}}(x) \: . \end{align*}

LaTeX

Lemma 2.6.16

\(\frac{d \mu ^{(\alpha , \nu )}}{d \nu } = \left(\frac{d\mu }{d\nu }\right)^\alpha e^{-(\alpha - 1) R_\alpha (\mu , \nu )}\) , \(\nu \)-a.e., and \(\frac{d \mu ^{(\alpha , \nu )}}{d \mu } = \left(\frac{d\nu }{d\mu }\right)^{1 - \alpha } e^{-(\alpha - 1) R_\alpha (\mu , \nu )}\) , \(\mu \)-a.e..

LaTeX

Lemma B.4.1

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) with \(\mu \ll \nu \) and let \(\mathcal A\) be a sub-\(\sigma \)-algebra of \(\mathcal X\). Then \(\frac{d \mu _{| \mathcal A}}{d \nu _{| \mathcal A}}\) is \(\nu _{| \mathcal A}\)-almost everywhere (hence also \(\nu \)-a.e.) equal to \(\nu \left[ \frac{d \mu }{d \nu } \mid \mathcal A\right]\).

LaTeX Lean

Lemma B.1.1

Let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. If for some \(f\) and \(\xi \), \(\kappa = f \cdot \eta + \xi \) with \(\xi (x) \perp \eta (x)\) for all \(x\), then for all \(x\), \(f(x, y) = \frac{d \kappa (x)}{d \eta (x)}(y)\) for \(\eta (x)\)-almost all \(y \in \mathcal Y\).

LaTeX Lean

Lemma B.2.3

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two s-finite kernels. We denote \(\frac{d\mu }{d\nu }\cdot \nu \) by \(\mu _{\parallel \nu }\). Then

\begin{align*} (\mu \otimes \kappa )_{\perp (\nu \otimes \eta )} = \mu _{\perp \nu } \otimes \kappa + (\mu _{\parallel \nu } \otimes \kappa )_{\perp (\mu _{\parallel \nu } \otimes \eta )} \: . \end{align*}

LaTeX

Lemma 2.3.71

On probability measures, the statistical information \(\mathcal I_\xi \) is an \(f\)-divergence for the function \(\phi _{\xi _0, \xi _1}\) of Definition 20.

LaTeX

Lemma 2.2.8

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\), for any measure \(\zeta \) with \(\mu \ll \zeta \) and \(\nu \ll \zeta \) ,

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = - \frac{1}{2} \left\vert \xi _0\mu (\mathcal X) - \xi _1\nu (\mathcal X) \right\vert + \frac{1}{2} \zeta \left[x \mapsto \left\vert \xi _0\frac{d \mu }{d\zeta }(x) - \xi _1\frac{d \nu }{d\zeta }(x)\right\vert \right] \: . \end{align*}

This holds in particular for \(\zeta = P \circ \xi \).

LaTeX Lean

Lemma 2.2.11

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\),

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = \min \{ \xi _0 \mu (\mathcal X), \xi _1 \nu (\mathcal X)\} - \inf _{E \text{ event}} \left( \xi _0 \mu (E) + \xi _1 \nu (E^c) \right) \: . \end{align*}

LaTeX Lean

Lemma 2.2.7

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\), for any measure \(\zeta \) with \(\mu \ll \zeta \) and \(\nu \ll \zeta \) ,

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = \min \{ \xi _0\mu (\mathcal X), \xi _1\nu (\mathcal X)\} - \zeta \left[x \mapsto \min \left\{ \xi _0\frac{d \mu }{d\zeta }(x), \xi _1\frac{d \nu }{d\zeta }(x)\right\} \right] \: . \end{align*}

This holds in particular for \(\zeta = P \circ \xi \).

LaTeX

Lean

ProbabilityTheory.statInfo_eq_min_sub_lintegral
ProbabilityTheory.statInfo_eq_min_sub_lintegral'
ProbabilityTheory.toReal_statInfo_eq_min_sub_integral

Lemma 2.2.12

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\),

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = - \max \{ 0, \xi _1 \nu (\mathcal X) - \xi _0 \mu (\mathcal X) \} + \sup _{E \text{ event}} \left( \xi _1 \nu (E) - \xi _0 \mu (E) \right) \\ & = - \max \{ 0, \xi _0 \mu (\mathcal X) - \xi _1 \nu (\mathcal X) \} + \sup _{E \text{ event}} \left( \xi _0 \mu (E) - \xi _1 \nu (E) \right) \: . \end{align*}

LaTeX

Lemma 2.2.3

For \(\mu , \nu \in \mathcal M(\mathcal X)\), \(\mathcal I_\xi (\mu , \nu ) \le \min \{ \xi _0 \mu (\mathcal X), \xi _1 \nu (\mathcal X)\} \) .

LaTeX Lean

Lemma 2.2.2

For \(\mu , \nu \in \mathcal M(\mathcal X)\), \(\mathcal I_\xi (\mu , \nu ) \ge 0\) .

LaTeX

Lemma 2.2.13Dummy lemma: statInfo properties

Dummy node to summarize properties of the statistical information.

LaTeX

Lemma 2.2.1

For \(\mu \in \mathcal M(\mathcal X)\), \(\mathcal I_\xi (\mu , \mu ) = 0\) .

LaTeX Lean

Lemma 2.2.4

For \(\mu , \nu \in \mathcal M(\mathcal X)\) and \(\xi \in \mathcal M(\{ 0,1\} )\), \(\mathcal I_\xi (\mu , \nu ) = \mathcal I_{\xi _\leftrightarrow }(\nu , \mu )\) .

LaTeX Lean

Lemma 4.1.12

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and \(E\) an event. Let \(\alpha \in (0,1)\). Then

\begin{align*} \mu (E)^\alpha + \nu (E^c)^{1 - \alpha } \ge \exp \left(-(1 - \alpha ) R_{\alpha }(\mu , \nu )\right) \: . \end{align*}

LaTeX

Lemma 4.1.27

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and let \(E\) be an event on \(\mathcal X\). Let \(\alpha {\gt} 0\). Then

\begin{align*} \mu (E) + \nu (E^c) \ge \frac{1}{2}\exp \left( - C_{1+\alpha }(\mu , \nu ) - \frac{\log 4}{\alpha }\right) \: . \end{align*}

LaTeX

Lemma 4.1.28

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\), let \(n \in \mathbb {N}\) and let \(E\) be an event on \(\mathcal X^n\). For all \(\alpha {\gt} 0\),

\begin{align*} \mu ^{\otimes n}(E) + \nu ^{\otimes n}(E^c) \ge \frac{1}{2}\exp \left( - n C_{1+\frac{\alpha }{\sqrt{n}}}(\mu , \nu ) - \frac{\log 4}{\alpha }\sqrt{n}\right) \: . \end{align*}

LaTeX

Lemma 2.2.27

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(E\) be an event. Let \(\mu _E\) and \(\nu _E\) be the two Bernoulli distributions with respective means \(\mu (E)\) and \(\nu (E)\). Then \(\operatorname{TV}(\mu , \nu ) \ge \operatorname{TV}(\mu _E, \nu _E)\).

LaTeX

Lemma 2.3.72

On probability measures, the total variation distance \(\operatorname{TV}\) is an \(f\)-divergence for the function \(x \mapsto \frac{1}{2}\vert x - 1 \vert \).

LaTeX

Lemma 2.2.21Integral form of the total variation distance

For finite measures \(\mu , \nu \),

\begin{align*} \operatorname{TV}(\mu , \nu ) & = \nu \left[ x \mapsto \max \left\{ 0 , \frac{d \mu }{d\nu }(x) - 1 \right\} \right] + \mu _{\perp \nu }(\mathcal X) & \text{ if } \mu (\mathcal X) \le \nu (\mathcal X) \: , \\ \operatorname{TV}(\mu , \nu ) & = \nu \left[ x \mapsto \max \left\{ 0 , 1 - \frac{d \mu }{d\nu }(x) \right\} \right] & \text{ if } \mu (\mathcal X) \ge \nu (\mathcal X) \: . \end{align*}

LaTeX

Lemma 2.2.22

For finite measures \(\mu , \nu \),

\begin{align*} \operatorname{TV}(\mu , \nu ) & = -\frac{1}{2} \left\vert \mu (\mathcal X) - \nu (\mathcal X)\right\vert + \frac{1}{2}\left( \nu \left[ x \mapsto \left\vert \frac{d \mu }{d\nu }(x) - 1 \right\vert \right] + \mu _{\perp \nu }(\mathcal X)\right) \: . \end{align*}

LaTeX

Lemma 2.2.20

For finite measures \(\mu , \nu \), for any measure \(\zeta \) with \(\mu \ll \zeta \) and \(\nu \ll \zeta \) ,

\begin{align*} \operatorname{TV}(\mu , \nu ) & = - \frac{1}{2} \left\vert \mu (\mathcal X) - \nu (\mathcal X) \right\vert + \frac{1}{2} \zeta \left[x \mapsto \left\vert \frac{d \mu }{d\zeta }(x) - \frac{d \nu }{d\zeta }(x)\right\vert \right] \: . \end{align*}

This holds in particular for \(\zeta = P \circ \xi \).

LaTeX

Lemma 2.2.23

For finite measures \(\mu , \nu \),

\begin{align*} \operatorname{TV}(\mu , \nu ) & = \min \{ \mu (\mathcal X), \nu (\mathcal X)\} - \inf _{E \text{ event}} \left( \mu (E) + \nu (E^c) \right) \: . \end{align*}

LaTeX

Lemma 2.2.19

For finite measures \(\mu , \nu \), for any measure \(\zeta \) with \(\mu \ll \zeta \) and \(\nu \ll \zeta \) ,

\begin{align*} \operatorname{TV}(\mu , \nu ) & = \min \{ \mu (\mathcal X), \nu (\mathcal X)\} - \zeta \left[x \mapsto \min \left\{ \frac{d \mu }{d\zeta }(x), \frac{d \nu }{d\zeta }(x)\right\} \right] \: . \end{align*}

This holds in particular for \(\zeta = P \circ \xi \).

LaTeX

Lemma 2.2.25

Let \(\mathcal F = \{ f : \mathcal X \to \mathbb {R} \mid \Vert f \Vert _\infty \le 1\} \). Then for \(\mu , \nu \) finite measures with \(\mu (\mathcal X) = \nu (\mathcal X)\), \(\frac{1}{2} \sup _{f \in \mathcal F} \left( \mu [f] - \nu [f] \right) \le \operatorname{TV}(\mu , \nu )\).

LaTeX

Lemma 2.2.16

For \(\mu , \nu \in \mathcal M(\mathcal X)\), \(\operatorname{TV}(\mu , \nu ) \le \min \{ \mu (\mathcal X), \nu (\mathcal X)\} \) .

LaTeX Lean

Lemma 2.7.4

Let \(\mu , \nu \) be two probability measures. Then \(\operatorname{TV}(\mu , \nu ) \le \sqrt{\operatorname{H^2}(\mu , \nu )(2 - \operatorname{H^2}(\mu , \nu ))}\).

LaTeX

Lemma 2.2.15

For \(\mu , \nu \in \mathcal M(\mathcal X)\), \(\operatorname{TV}(\mu , \nu ) \ge 0\) .

LaTeX Lean

Lemma 2.2.18Dummy lemma: TV properties

Dummy node to summarize properties of the total variation distance.

LaTeX

Lemma 2.2.14

For \(\mu \in \mathcal M(\mathcal X)\), \(\operatorname{TV}(\mu , \mu ) = 0\) .

LaTeX Lean

Theorem 1.2.12

The Bayes risk of simple binary hypothesis testing for prior \(\xi \in \mathcal M(\{ 0,1\} )\) is

\begin{align*} \mathcal B_\xi (\mu , \nu ) = (P \circ \xi )\left[x \mapsto \min \left\{ \xi _0\frac{d \mu }{d(P \circ \xi )}(x), \xi _1\frac{d \nu }{d(P \circ \xi )}(x)\right\} \right] \: . \end{align*}

LaTeX

Lean

ProbabilityTheory.bayesBinaryRisk_eq_lintegral_min
ProbabilityTheory.toReal_bayesBinaryRisk_eq_integral_min

Theorem 2.3.68

Let \(\mu , \nu \in \mathcal P(\mathcal X)\). If \(f(1) = 0\),

\begin{align*} D_f(\mu , \nu ) & \ge f(1 + \operatorname{TV}(\mu , \nu )) + f(1 - \operatorname{TV}(\mu , \nu )) \: , \\ \text{ and } D_f(\mu , \nu ) & \ge (1 + \operatorname{TV}(\mu , \nu ))f\left(\frac{1}{1 + \operatorname{TV}(\mu , \nu )}\right) + (1 - \operatorname{TV}(\mu , \nu ))f\left(\frac{1}{1 - \operatorname{TV}(\mu , \nu )}\right) \: . \end{align*}

LaTeX

Theorem C.0.2

For \(f\) convex, \(f\left(\mu [g \mid m]\right) \le \mu [f \circ g \mid m]\).

LaTeX

Theorem 1.2.5Data-processing inequality

For \(\mu , \nu \in \mathcal M(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel, \(\mathcal B_\xi (\kappa \circ \mu , \kappa \circ \nu ) \ge \mathcal B_\xi (\mu , \nu )\) .

LaTeX Lean

Theorem 1.1.4Data-processing inequality

For \(P : \Theta \rightsquigarrow \mathcal X\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal X'\) a Markov kernel, \(\mathcal R^{\kappa \circ P}_\pi \ge \mathcal R^{P}_\pi \) (where the estimation problems differ only in the kernel).

LaTeX Lean

Theorem 2.2.5Data-processing inequality

For \(\mu , \nu \in \mathcal M(\mathcal X)\), \(\xi \in \mathcal M(\{ 0,1\} )\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel, \(\mathcal I_\xi (\kappa \circ \mu , \kappa \circ \nu ) \le \mathcal I_\xi (\mu , \nu )\) .

LaTeX Lean

Theorem A.6.1Disintegration in standard Borel spaces

If either \(\mathcal X\) is countable of \(\mathcal Y\) has a countably generated \(\sigma \)-algebra, and \(\mathcal Z\) is standard Borel, then for every s-finite kernel \(\kappa : \mathcal X \rightsquigarrow \mathcal Y \times \mathcal Z\), there exists a Markov kernel \(\kappa _{Z \mid Y} : \mathcal X \times \mathcal Y \rightsquigarrow \mathcal Z\) that disintegrates \(\kappa \).

LaTeX

Theorem 2.3.55

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Then \(D_f(\kappa \circ \mu , \eta \circ \nu ) \le D_f(\mu \otimes \kappa , \nu \otimes \eta )\).

LaTeX Lean

Theorem 2.3.45Conditioning increases f-divergence

Let \(\mu \) be a measure on a standard Borel space \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, such that \(\kappa (x) \ne 0\) for all \(x\). Then \(D_f(\kappa \circ \mu , \eta \circ \mu ) \le D_f(\mu \otimes \kappa , \mu \otimes \eta ) = D_f(\kappa , \eta \mid \mu )\)

LaTeX Lean

Theorem 2.3.56Conditioning increases f-divergence

Let \(\mu \) be a finite measure \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Then \(D_f(\kappa \circ \mu , \eta \circ \mu ) \le D_f(\mu \otimes \kappa , \mu \otimes \eta ) = D_f(\kappa , \eta \mid \mu )\) .

LaTeX Lean

Lemma 2.3.38Composition-product with a kernel

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(D_f(\mu \otimes \kappa , \nu \otimes \kappa ) = D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.3.64Joint convexity

The function \((\mu , \nu ) \mapsto D_f(\mu , \nu )\) is convex.

LaTeX

Theorem 2.3.46Data-processing

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel, where both \(\mathcal X\) and \(\mathcal Y\) are standard Borel. Then \(D_f(\kappa \circ \mu , \kappa \circ \nu ) \le D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.3.50Data-processing

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(D_f(\kappa \circ \mu , \kappa \circ \nu ) \le D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.3.41Data-processing

LaTeX Lean

Theorem 2.3.36 [ LV06 , Lie12 ]

\begin{align*} D_f(\mu , \nu ) = f(1) \nu (\mathcal X) + f’_+(1)(\mu (\mathcal X) - \nu (\mathcal X)) + \int _\pi I_\pi (\mu , \nu ) \partial \Gamma _f \: . \end{align*}

LaTeX

Theorem 2.3.35

For two finite measures \(\mu , \nu \in \mathcal M(\mathcal X)\),

\begin{align*} D_f(\mu , \nu ) = f(1) \nu (\mathcal X) + f’_+(1)(\mu (\mathcal X) - \nu (\mathcal X)) + \int _y D_{\phi _{1,y}}(\mu , \nu ) \partial \gamma _f \: . \end{align*}

LaTeX Lean

Theorem 2.3.67 [ SV16 ]

Suppose that \(f(1) = g(1) = 0\) and that \(f'(1) = g'(1) = 0\), and that \(g{\gt}0\) on \((0,1) \cup (1, +\infty )\). Suppose that the space \(\mathcal X\) has at least two disjoint non-empty measurable sets. Then

\begin{align*} \sup _{\mu , \nu \in \mathcal P(\mathcal X)} \frac{D_f(\mu , \nu )}{D_g(\mu , \nu )} = \sup _{t {\gt} 0, t \ne 1} \frac{f(t)}{g(t)} \: . \end{align*}

LaTeX

Theorem 2.3.43Marginals

Let \(\mu \) and \(\nu \) be two measures on \(\mathcal X \times \mathcal Y\) where \(\mathcal Y\) is standard Borel, and let \(\mu _X, \nu _X\) be their marginals on \(\mathcal X\). Then \(D_f(\mu _X, \nu _X) \le D_f(\mu , \nu )\). Similarly, for \(\mathcal X\) standard Borel and \(\mu _Y, \nu _Y\) the marginals on \(\mathcal Y\), \(D_f(\mu _Y, \nu _Y) \le D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.3.57Marginals

Let \(\mu \) and \(\nu \) be two measures on \(\mathcal X \times \mathcal Y\), and let \(\mu _X, \nu _X\) be their marginals on \(\mathcal X\). Then \(D_f(\mu _X, \nu _X) \le D_f(\mu , \nu )\). Similarly, for \(\mu _Y, \nu _Y\) the marginals on \(\mathcal Y\), \(D_f(\mu _Y, \nu _Y) \le D_f(\mu , \nu )\).

LaTeX

Lean

ProbabilityTheory.fDiv_fst_le'
ProbabilityTheory.fDiv_snd_le'

Theorem 2.3.42

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two Markov kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then \(D_f(\mu , \nu ) \le D_f(\mu \otimes \kappa , \nu \otimes \eta )\).

LaTeX Lean

Theorem 2.3.54

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two Markov kernels. Then \(D_f(\mu , \nu ) \le D_f(\mu \otimes \kappa , \nu \otimes \eta )\).

LaTeX Lean

Theorem 2.3.40

Let \(\mu , \nu \in \mathcal M (\mathcal X)\) be two finite measures and let \(g : \mathcal X \to \mathcal Y\) be a measurable function. Then \(D_f(g_* \mu , g_* \nu ) \le D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.3.51

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\mathcal A\) be a sub-\(\sigma \)-algebra of \(\mathcal X\). Then \(D_f(\mu _{| \mathcal A}, \nu _{| \mathcal A}) \le D_f(\mu , \nu )\).

LaTeX Lean

Theorem 2.5.5Data-processing

Let \(\alpha {\gt} 0\), \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(\operatorname{H}_\alpha (\kappa \circ \mu , \kappa \circ \nu ) \le \operatorname{H}_\alpha (\mu , \nu )\).

LaTeX Lean

Theorem 2.3.31

If \(f\) and \(g\) are two Stieltjes functions with associated measures \(\mu _f\) and \(\mu _g\) and \(f\) is continuous on \([a, b]\), then

\begin{align*} \int _x f(x) \partial \mu _g = f(b)g(b) - f(a)g(a) - \int _x g(x) \partial \mu _f \: . \end{align*}

LaTeX

Theorem 1.1.11

When the generalized Bayes estimator is well defined, it is a Bayes estimator. The value of the Bayes risk with respect to the prior \(\pi \in \mathcal M(\Theta )\) is then

\begin{align*} \mathcal R^P_\pi = (P \circ \pi )\left[x \mapsto \inf _{z \in \mathcal Z} P_\pi ^\dagger (x) \left[\theta \mapsto \ell ’(y(\theta ), z)\right]\right] \: . \end{align*}

LaTeX

Lean

ProbabilityTheory.isBayesEstimator_of_isGenBayesEstimator
ProbabilityTheory.bayesRiskPrior_eq_of_hasGenBayesEstimator

Theorem 2.3.52

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\). Then \(\sup _{\mathcal A \text{ finite}} D_f(\mu _{| \mathcal A}, \nu _{| \mathcal A}) = D_f(\mu , \nu )\).

LaTeX

Theorem A.1.2

Two kernels \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) are equal iff for all measurable functions \(f : \mathcal Y \to \mathbb {R}_{+,\infty }\) and all \(x \in \mathcal X\),

\begin{align*} \int _y f(y) \partial \kappa (x) = \int _y f(y) \partial \eta (x) \: . \end{align*}

LaTeX

Theorem 2.4.13Chain rule, kernel version

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) two Markov kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then \(\operatorname{KL}(\mu \otimes \kappa , \nu \otimes \eta ) = \operatorname{KL}(\mu , \nu ) + \operatorname{KL}(\kappa , \eta \mid \mu )\).

LaTeX Lean

Theorem 2.4.12

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) two Markov kernels. Then \(\operatorname{KL}(\mu \otimes \kappa , \nu \otimes \eta ) = \operatorname{KL}(\mu , \nu ) + \operatorname{KL}(\mu \otimes \kappa , \mu \otimes \eta )\).

LaTeX

Theorem 2.4.14Chain rule, with Bayesian inverse

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) two Markov kernels with Bayesian inverses \(\kappa _\mu ^\dagger \) and \(\eta _\nu ^\dagger \). Then

\begin{align*} \operatorname{KL}(\mu \otimes \kappa , \nu \otimes \eta ) = \operatorname{KL}(\kappa \circ \mu , \eta \circ \nu ) + \operatorname{KL}(\kappa _\mu ^\dagger , \eta _\nu ^\dagger \mid \kappa \circ \mu ) \: . \end{align*}

LaTeX

Theorem 2.4.6Data-processing

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(\operatorname{KL}(\kappa \circ \mu , \kappa \circ \nu ) \le \operatorname{KL}(\mu , \nu )\).

LaTeX Lean

Theorem 2.4.15Chain rule, product version

Let \(\mu , \nu \) be two finite measures on \(\mathcal X \times \mathcal Y\), where \(\mathcal Y\) is standard Borel. Then \(\operatorname{KL}(\mu , \nu ) = \operatorname{KL}(\mu _X, \nu _X) + \operatorname{KL}(\mu _{Y|X}, \nu _{Y|X} \mid \mu _X)\).

LaTeX Lean

Theorem 2.4.5Marginals

Let \(\mu \) and \(\nu \) be two measures on \(\mathcal X \times \mathcal Y\), and let \(\mu _X, \nu _X\) be their marginals on \(\mathcal X\). Then \(\operatorname{KL}(\mu _X, \nu _X) \le \operatorname{KL}(\mu , \nu )\). Similarly, for \(\mu _Y, \nu _Y\) the marginals on \(\mathcal Y\), \(\operatorname{KL}(\mu _Y, \nu _Y) \le \operatorname{KL}(\mu , \nu )\).

LaTeX

Lean

ProbabilityTheory.kl_fst_le
ProbabilityTheory.kl_snd_le

Theorem 2.4.18Tensorization - finite product

Let \(I\) be a finite index set. Let \((\mu _i)_{i \in I}, (\nu _i)_{i \in I}\) be probability measures on spaces \((\mathcal X_i)_{i \in I}\). Then

\begin{align*} \operatorname{KL}(\prod _{i \in I} \mu _i, \prod _{i \in I} \nu _i) = \sum _{i \in I} \operatorname{KL}(\mu _i, \nu _i) \: . \end{align*}

LaTeX Lean

Theorem 2.4.17Tensorization

For \(\mu _1\) a probability measure on \(\mathcal X\), \(\nu _1\) a finite measure on \(\mathcal{X}\) and \(\mu _2, \nu _2\) two probability measures on \(\mathcal Y\),

\begin{align*} \operatorname{KL}(\mu _1 \times \mu _2, \nu _1 \times \nu _2) = \operatorname{KL}(\mu _1, \nu _1) + \operatorname{KL}(\mu _2 \times \nu _2) \: . \end{align*}

LaTeX Lean

Theorem 4.3.3

For \(\mu , \nu \in \mathcal P(\mathcal X)\), \(\operatorname{KL}(\mu _\tau , \nu _\tau ) = \mu [\tau ] \operatorname{KL}(\mu , \nu )\).

LaTeX

Theorem 4.1.11

For \(\alpha , \beta \in (0, 1/2)\),

\begin{align*} \beta \log \frac{\alpha }{B_\alpha (\mu , \nu )} + (1 - \beta ) \log \frac{1 - \alpha }{1 - B_\alpha (\mu , \nu )} & \le (1 - \beta ) R_{\beta }(\mu , \nu ) \: . \end{align*}

As a consequence,

\begin{align*} \beta \log \frac{\alpha }{B_\alpha (\mu , \nu )} & \le (1 - \beta ) (R_{\beta }(\mu , \nu ) + \log 2) \: . \end{align*}

In particular, \(\log \frac{\alpha }{B_\alpha (\mu , \nu )} \le R_{1/2}(\mu , \nu ) + \log 2\) .

LaTeX

Theorem 3.1.4Data-processing

For \(\rho \in \mathcal M(\mathcal X \times \mathcal Y)\), \(\kappa : \mathcal X \rightsquigarrow \mathcal X'\) and \(\eta : \mathcal Y \rightsquigarrow \mathcal Y'\) two Markov kernels,

\begin{align*} I((\kappa \parallel \eta ) \circ \rho ) \le I(\rho ) \: . \end{align*}

LaTeX

Theorem 3.2.4

If the divergence \(D\) satisfies the data-processing inequality, then for all \(\mu \in \mathcal M(\mathcal X)\), \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) and all Markov kernels \(\eta : \mathcal Y \rightsquigarrow \mathcal Z\) then

\begin{align*} I_D^L(\mu , \eta \circ \kappa ) \le I_D^L(\mu , \kappa ) \: , \\ I_D^R(\mu , \eta \circ \kappa ) \le I_D^R(\mu , \kappa ) \: . \end{align*}

LaTeX

Theorem 2.6.21Chain rule

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two Markov kernels. Then \(R_\alpha (\mu \otimes \kappa , \nu \otimes \eta ) = R_\alpha (\mu , \nu ) + R_\alpha (\kappa , \eta \mid \mu ^{(\alpha , \nu )})\).

LaTeX

Theorem 2.6.22Chain rule, with Bayesian inverse

\begin{align*} R_\alpha (\mu \otimes \kappa , \nu \otimes \eta ) = R_\alpha (\kappa \circ \mu , \eta \circ \nu ) + R_\alpha (\kappa _\mu ^\dagger , \eta _\nu ^\dagger \mid (\kappa \circ \mu )^{(\alpha , \eta \circ \nu )}) \: . \end{align*}

LaTeX

Theorem 2.6.8Data-processing

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a Markov kernel. Then \(R_\alpha (\kappa \circ \mu , \kappa \circ \nu ) \le R_\alpha (\mu , \nu )\).

LaTeX Lean

Theorem 2.6.24Tensorization - finite product

Let \(I\) be a finite index set. Let \((\mu _i)_{i \in I}, (\nu _i)_{i \in I}\) be probability measures on measurable spaces \((\mathcal X_i)_{i \in I}\). Then \(R_\alpha (\prod _{i \in I} \mu _i, \prod _{i \in I} \nu _i) = \sum _{i \in I} R_\alpha (\mu _i, \nu _i)\).

LaTeX

Theorem 2.6.26Tensorization - countable product

Let \(I\) be a countable index set. Let \((\mu _i)_{i \in I}, (\nu _i)_{i \in I}\) be probability measures on measurable spaces \((\mathcal X_i)_{i \in I}\). Then \(R_\alpha (\prod _{i \in I} \mu _i, \prod _{i \in I} \nu _i) = \sum _{i \in I} R_\alpha (\mu _i, \nu _i)\).

LaTeX

Theorem 4.3.5

For \(\mu , \nu \) two probability measures on \(\mathcal X\) and \(\alpha \in (0,1)\), \(R_\alpha (\mu _\tau , \nu _\tau ) = \mu ^{\alpha , \nu }[\tau ] R_\alpha (\mu , \nu )\).

LaTeX

Theorem B.2.7Chain rule for Radon-Nikodym derivatives

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels with \(\mu \otimes \kappa \ll \mu \otimes \eta \). Then for \((\nu \otimes \eta )\)-almost all \((x,y)\),

\begin{align*} \frac{d(\mu \otimes \kappa )}{d(\nu \otimes \eta )}(x, y) = \frac{d \mu }{d \nu }(x) \frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )}(x, y) \: . \end{align*}

LaTeX Lean

Theorem 2.2.9Integral form of the statistical information

For finite measures \(\mu , \nu \) and \(\xi \in \mathcal M(\{ 0,1\} )\),

\begin{align*} \mathcal I_\xi (\mu , \nu ) & = \nu \left[ x \mapsto \max \left\{ 0 , \xi _0\frac{d \mu }{d\nu }(x) - \xi _1 \right\} \right] + \xi _0 \mu _{\perp \nu }(\mathcal X) & \text{ if } \xi _0 \mu (\mathcal X) \le \xi _1 \nu (\mathcal X) \: , \\ \mathcal I_\xi (\mu , \nu ) & = \nu \left[ x \mapsto \max \left\{ 0 , \xi _1 - \xi _0\frac{d \mu }{d\nu }(x) \right\} \right] & \text{ if } \xi _0 \mu (\mathcal X) \ge \xi _1 \nu (\mathcal X) \: . \end{align*}

LaTeX

Lean

ProbabilityTheory.toReal_statInfo_eq_integral_max_of_le
ProbabilityTheory.toReal_statInfo_eq_integral_max_of_ge

Theorem 4.1.10Fano’s inequality, binary case

Let \(\mu , \nu \in \mathcal P(\mathcal X)\) and let \(\alpha \in (0, 1)\).

\begin{align*} h_2(\alpha ) - h_2(B_\alpha (\mu , \nu )) \le \operatorname{JS}_\alpha (\mu , \nu ) \: , \end{align*}

in which \(h_2: x \mapsto x\log \frac{1}{x} + (1 - x)\log \frac{1}{1 - x}\) is the binary entropy function.

LaTeX

Theorem 4.1.29

Let \(\mu , \nu \) be two probability measures on \(\mathcal X\) and let \((E_n)_{n \in \mathbb {N}}\) be events on \(\mathcal X^n\). For all \(\gamma \in (0,1)\),

\begin{align*} \limsup _{n \to +\infty } \frac{1}{n}\log \frac{1}{\gamma \mu ^{\otimes n}(E_n) + (1 - \gamma )\nu ^{\otimes n}(E_n^c)} \le C_1(\mu , \nu ) \: . \end{align*}

LaTeX

Theorem 2.2.17Data-processing inequality

For \(\mu , \nu \in \mathcal M(\mathcal X)\) and \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) a Markov kernel, \(\operatorname{TV}(\kappa \circ \mu , \kappa \circ \nu ) \le \operatorname{TV}(\mu , \nu )\) .

LaTeX Lean

Theorem 2.2.26

Let \(\mathcal F = \{ f : \mathcal X \to \mathbb {R} \mid \Vert f \Vert _\infty \le 1\} \). Then for \(\mu , \nu \) finite measures with \(\mu (\mathcal X) = \nu (\mathcal X)\), \(\operatorname{TV}(\mu , \nu ) = \frac{1}{2} \sup _{f \in \mathcal F} \left( \mu [f] - \nu [f] \right)\).

LaTeX

Theorem 2.2.24

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) with \(\mu (\mathcal X) \leq \nu (\mathcal X)\).
Then \(TV(\mu , \nu ) = \sup _{E \text{ event}} \left( \mu (E) - \nu (E) \right)\).

LaTeX