Lower bounds for hypothesis testing based on information theory

B Radon-Nikodym derivatives and disintegration

B.1 Kernel Radon-Nikodym derivative

Definition 49

Let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. The Radon-Nikodym derivative of \(\kappa \) with respect to \(\eta \), denoted by \(\frac{d \kappa }{d \eta }\), is a measurable function \(\mathcal X \times \mathcal Y \to \mathbb {R}_{+, \infty }\) with \(\kappa = \frac{d \kappa }{d \eta } \cdot \eta + \kappa _{\perp \eta }\), where for all \(x\), \(\kappa _{\perp \eta }(x) \perp \eta (x)\).
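
As an illustration (an example added here, not part of the definition), suppose \(\mathcal Y\) is countable with the discrete \(\sigma \)-algebra. A pair \((f, \xi )\) satisfying the decomposition \(\kappa = f \cdot \eta + \xi \) with \(\xi (x) \perp \eta (x)\) can then be written explicitly:

\begin{align*} f(x, y) = \begin{cases} \frac{\kappa (x)(\{ y\} )}{\eta (x)(\{ y\} )} & \text{if } \eta (x)(\{ y\} ) > 0 \\ 0 & \text{otherwise} \end{cases} \: , \qquad \xi (x)(A) = \kappa (x)\left(A \cap \{ y \mid \eta (x)(\{ y\} ) = 0\} \right) \: . \end{align*}

The value \(0\) in the second case is an arbitrary convention chosen for this sketch: that set is \(\eta (x)\)-null, so any other value yields the same measure \(f \cdot \eta \).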

Lemma B.1.1

Let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. If for some \(f\) and \(\xi \), \(\kappa = f \cdot \eta + \xi \) with \(\xi (x) \perp \eta (x)\) for all \(x\), then for all \(x\), \(f(x, y) = \frac{d \kappa (x)}{d \eta (x)}(y)\) for \(\eta (x)\)-almost all \(y \in \mathcal Y\).

Proof

Let \(f\) and \(\xi \) be such that \(\kappa = f \cdot \eta + \xi \) with \(\xi (x) \perp \eta (x)\) for all \(x\). Then for all \(x \in \mathcal X\), \(\kappa (x) = f(x, \cdot ) \cdot \eta (x) + \xi (x)\) with \(\xi (x) \perp \eta (x)\). By the uniqueness result for the Radon-Nikodym derivative of measures, \(f(x, \cdot ) = \frac{d \kappa (x)}{d \eta (x)}\) \(\eta (x)\)-almost everywhere.

Corollary B.1.2

For all \(x \in \mathcal X\), for \(\eta (x)\)-almost all \(y \in \mathcal Y\), \(\frac{d \kappa }{d \eta }(x, y) = \frac{d \kappa (x)}{d \eta (x)}(y)\).

Proof

Apply Lemma B.1.1.

B.2 Composition-product

B.2.1 Absolute Continuity

Lemma B.2.1

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two s-finite kernels. Then

  • if \(\mu \otimes \kappa \ll \nu \otimes \eta \) then \(\mu \otimes \kappa \ll \mu \otimes \eta \),

  • if \(\mu \otimes \kappa \ll \nu \otimes \eta \) and \(\kappa (x) \ne 0\) for all \(x\) then \(\mu \ll \nu \),

  • if \(\mu \ll \nu \) and \(\mu \otimes \kappa \ll \mu \otimes \eta \) then \(\mu \otimes \kappa \ll \nu \otimes \eta \).

In particular,

  • if \(\kappa (x) \ne 0\) for all \(x\) then \(\mu \otimes \kappa \ll \nu \otimes \eta \iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)\),

  • if \(\mu \ll \nu \) then \(\mu \otimes \kappa \ll \nu \otimes \eta \iff \mu \otimes \kappa \ll \mu \otimes \eta \).
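
The hypothesis \(\kappa (x) \ne 0\) in the second point cannot simply be dropped. A minimal counterexample, with all objects chosen here for illustration: take \(\mathcal X = \{ 0, 1\} \), \(\mu = \delta _0\), \(\nu = \delta _1\) and \(\kappa = 0\) the zero kernel. Then

\begin{align*} \mu \otimes \kappa = 0 \ll \nu \otimes \eta \quad \text{for every } \eta \: , \qquad \text{but} \qquad \mu (\{ 0\} ) = 1 > 0 = \nu (\{ 0\} ) \: , \end{align*}

so \(\mu \not\ll \nu \).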

Proof
Lemma B.2.2

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Let \(\mu \sqcap \nu \) denote the infimum of \(\mu \) and \(\nu \). Then

  1. if \(\mu \perp \nu \) then \(\mu \otimes \kappa \perp \nu \otimes \eta \),

  2. \(\mu \otimes \kappa \perp \nu \otimes \eta \iff (\mu \sqcap \nu ) \otimes \kappa \perp (\mu \sqcap \nu ) \otimes \eta \), and the same holds for any measure which is equivalent to \(\mu \sqcap \nu \), like \(\frac{d \mu }{d \nu } \cdot \nu \) ,

  3. if \(\mu \otimes \kappa \perp \nu \otimes \eta \) then for \((\mu \sqcap \nu )\)-almost every \(x\), \(\kappa (x) \perp \eta (x)\).

Proof

First, let’s state two facts about mutually singular measures that we will use without proof:

  • \(\mu \perp (\nu + \nu ') \iff \mu \perp \nu \ \wedge \ \mu \perp \nu '\),

  • if \(\mu \ll \mu '\) and \(\mu ' \ll \mu \) then \(\mu \perp \nu \iff \mu ' \perp \nu \).

1. Let \(s\) be a measurable set of \(\mathcal X\) such that \(\mu (s) = 0\) and \(\nu (s^c) = 0\). One can check that the sets \(s \times \mathcal Y\) and \(s^c \times \mathcal Y\) demonstrate \(\mu \otimes \kappa \perp \nu \otimes \eta \).

2. Write \(\mu = \frac{d \mu }{d \nu } \cdot \nu + \mu _{\perp \nu }\) and \(\nu = \frac{d \nu }{d \mu } \cdot \mu + \nu _{\perp \mu }\). Then

\begin{align*} \mu \otimes \kappa \perp \nu \otimes \eta & \iff \left(\frac{d \mu }{d \nu } \cdot \nu \right) \otimes \kappa \perp \left(\frac{d \nu }{d \mu } \cdot \mu \right) \otimes \eta \\ & \qquad \wedge \ \left(\frac{d \mu }{d \nu } \cdot \nu \right) \otimes \kappa \perp \nu _{\perp \mu } \otimes \eta \\ & \qquad \wedge \ \mu _{\perp \nu } \otimes \kappa \perp \left(\frac{d \nu }{d \mu } \cdot \mu \right) \otimes \eta \\ & \qquad \wedge \ \mu _{\perp \nu } \otimes \kappa \perp \nu _{\perp \mu } \otimes \eta \end{align*}

The last three lines hold since \(\mu \perp \nu _{\perp \mu }\) and \(\mu _{\perp \nu } \perp \nu \) (apply point 1, using that each part of a decomposition is dominated by the measure it decomposes). Only the first line remains. It then suffices to prove that \(\mu \sqcap \nu \), \(\frac{d \mu }{d \nu } \cdot \nu \) and \(\frac{d \nu }{d \mu } \cdot \mu \) are equivalent.

TODO

3. By 2. it suffices to consider the case \(\nu = \mu \) and show that for \(\mu \)-almost every \(x\), \(\kappa (x) \perp \eta (x)\). Let \(s\) be a measurable set of \(\mathcal X \times \mathcal Y\) such that \((\mu \otimes \kappa )(s) = 0\) and \((\mu \otimes \eta )(s^c) = 0\). Then

\begin{align*} 0 & = (\mu \otimes \kappa )(s) \\ & = \int _{x} \kappa (x)(\{ y \mid (x, y) \in s\} ) \partial \mu \: . \end{align*}

Hence for \(\mu \)-almost all \(x\), \(\kappa (x)(\{ y \mid (x, y) \in s\} ) = 0\). Similarly, for \(\mu \)-almost all \(x\), \(\eta (x)(\{ y \mid (x, y) \in s^c\} ) = 0\). Since \(\{ y \mid (x, y) \in s^c\} = \{ y \mid (x, y) \in s\} ^c\), we have a measurable set witnessing \(\kappa (x) \perp \eta (x)\) for \(\mu \)-almost all \(x\).
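
The statements of Lemma B.2.2 can be checked by hand on finite spaces. The following is a minimal sketch, not the formal proof: it assumes finite discrete spaces (where two measures are mutually singular exactly when their supports are disjoint, and the infimum is the pointwise minimum of the masses). All names and data (`comp_prod`, `singular`, `inf_measure`, the measures and kernels) are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction as F

def comp_prod(mu, kappa):
    """Composition-product mu ⊗ kappa on finite spaces, as {(x, y): mass}."""
    return {(x, y): mu[x] * kappa[x][y] for x in mu for y in kappa[x]}

def support(m):
    """Points carrying positive mass."""
    return {p for p, mass in m.items() if mass > 0}

def singular(m1, m2):
    """On a finite discrete space, m1 ⊥ m2 iff their supports are disjoint."""
    return support(m1).isdisjoint(support(m2))

def inf_measure(mu, nu):
    """Infimum of two measures on a finite discrete space: pointwise minimum."""
    return {x: min(mu[x], nu[x]) for x in mu}

# Hypothetical data on X = {0, 1, 2}, Y = {"a", "b"}.
mu = {0: F(1, 2), 1: F(1, 2), 2: F(0)}
nu = {0: F(0), 1: F(1, 2), 2: F(1, 2)}
kappa = {0: {"a": F(1, 2), "b": F(1, 2)},
         1: {"a": F(1), "b": F(0)},
         2: {"a": F(1), "b": F(0)}}
eta = {0: {"a": F(1), "b": F(0)},
       1: {"a": F(0), "b": F(1)},
       2: {"a": F(1, 2), "b": F(1, 2)}}

m = inf_measure(mu, nu)  # only x = 1 carries positive mass

# Point 2: both sides of the equivalence, computed independently.
point2_lhs = singular(comp_prod(mu, kappa), comp_prod(nu, eta))
point2_rhs = singular(comp_prod(m, kappa), comp_prod(m, eta))

# Point 3: kappa(x) ⊥ eta(x) for every x charged by the infimum.
point3 = all(singular(kappa[x], eta[x]) for x in support(m))
```

In this example neither \(\mu \perp \nu \) nor \(\kappa (x) \perp \eta (x)\) for all \(x\) holds, yet the composition-products are mutually singular, because the kernels are singular at the only point charged by both measures.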

B.2.2 Radon-Nikodym derivative and singular part

Lemma B.2.3

Let \(\mu , \nu \) be two \(\sigma \)-finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two s-finite kernels. We denote \(\frac{d\mu }{d\nu }\cdot \nu \) by \(\mu _{\parallel \nu }\). Then

\begin{align*} (\mu \otimes \kappa )_{\perp (\nu \otimes \eta )} = \mu _{\perp \nu } \otimes \kappa + (\mu _{\parallel \nu } \otimes \kappa )_{\perp (\mu _{\parallel \nu } \otimes \eta )} \: . \end{align*}
Proof
Lemma B.2.4

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels. Let \(\mu _{\parallel \nu } = \frac{d \mu }{d \nu } \cdot \nu \). Then for \((\nu \otimes \eta )\)-almost all \(z\), \(\frac{d (\mu _{\parallel \nu } \otimes \kappa )}{d (\nu \otimes \eta )}(z) = \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )}(z)\).

Proof
Lemma B.2.5

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a finite kernel. Then for \((\nu \otimes \kappa )\)-almost all \((x, y)\), \(\frac{d (\mu \otimes \kappa )}{d (\nu \otimes \kappa )}(x,y) = \frac{d\mu }{d\nu }(x)\).

Proof

We can suppose \(\mu \ll \nu \) without loss of generality (by Lemma B.2.4). That implies \(\mu \otimes \kappa \ll \nu \otimes \kappa \). It suffices to show that the integrals of the two functions agree on all sets in the \(\pi \)-system of products of measurable sets. Let \(s, t\) be two measurable sets of \(\mathcal X\) and \(\mathcal Y\) respectively.

\begin{align*} \int _{p \in s \times t} \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \kappa )}(p) \partial (\nu \otimes \kappa ) & = (\mu \otimes \kappa ) (s \times t) \: . \end{align*}
\begin{align*} \int _{p \in s \times t} \frac{d \mu }{d \nu }(p_1) \partial (\nu \otimes \kappa ) & = \int _{x \in s} \int _{y \in t} \frac{d \mu }{d \nu }(x) \partial \kappa (x) \partial \nu \\ & = \int _{x \in s} \frac{d \mu }{d \nu }(x) \kappa (x)(t) \partial \nu \\ & = \int _{x \in s} \kappa (x)(t) \partial \mu \\ & = (\mu \otimes \kappa ) (s \times t) \: . \end{align*}

Lemma B.2.6

Let \(\mu , \nu , \xi \) be \(\sigma \)-finite measures on \(\mathcal X\).

  1. If \(\mu \ll \nu \) then \(\xi \)-almost surely, \(\frac{d \mu }{d \xi } = \frac{d \mu }{d \nu } \frac{d \nu }{d \xi }\).

  2. If \(\nu \ll \xi \) then \(\nu \)-almost surely, \(\frac{d \mu }{d \xi } = \frac{d \mu }{d \nu } \frac{d \nu }{d \xi }\).

Proof
Theorem B.2.7 Chain rule for Radon-Nikodym derivatives

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels with \(\mu \otimes \kappa \ll \mu \otimes \eta \). Then for \((\nu \otimes \eta )\)-almost all \((x,y)\),

\begin{align*} \frac{d(\mu \otimes \kappa )}{d(\nu \otimes \eta )}(x, y) = \frac{d \mu }{d \nu }(x) \frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )}(x, y) \: . \end{align*}
Proof

By the first point of Lemma B.2.6, \((\nu \otimes \eta )\)-almost surely,

\begin{align*} \frac{d(\mu \otimes \kappa )}{d(\nu \otimes \eta )} = \frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )} \frac{d (\mu \otimes \eta )}{d (\nu \otimes \eta )} \end{align*}

Then, by Lemma B.2.5, \(\frac{d (\mu \otimes \eta )}{d (\nu \otimes \eta )}\) is almost everywhere equal to \(\frac{d\mu }{d\nu }\).

Lemma B.2.8

Let \(\mu , \nu \) be two measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Let \(\mu ' = \frac{d \mu }{d \nu } \cdot \nu \) and \(\kappa ' = \frac{d \kappa }{d \eta } \cdot \eta \). Then for \((\nu \otimes \eta )\)-almost all \(z\), \(\frac{d (\mu ' \otimes \kappa ')}{d (\nu \otimes \eta )}(z) = \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )}(z)\).

Proof

Lemma B.2.9

Let \(\mu \in \mathcal M(\mathcal X)\) be a finite measure and \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then \((\mu \otimes \eta )\)-almost surely,

\begin{align*} \frac{d (\mu \otimes \kappa )}{d (\mu \otimes \eta )} = \frac{d \kappa }{d \eta } \: . \end{align*}
Proof

That last lemma is significant because it means that we can get a kind of kernel derivative without any assumptions on the spaces. As long as we are happy with \(\mu \)-almost sure statements, we can use \(\frac{d (\mu \otimes \kappa )}{d (\mu \otimes \eta )}\) (which is jointly measurable) instead of the possibly non-existent \(\frac{d \kappa }{d \eta }\).

Lemma B.2.10

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) be two \(\sigma \)-finite measures and let \(p\) be a predicate on \(\mathcal X\). If \(p\) is true \(\mu \)-almost surely, then for \(\nu \)-almost all \(x\), either \(\frac{d\mu }{d\nu }(x) = 0\) or \(p(x)\).
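
One way to see this (a sketch under the extra assumption that \(A = \{ x \mid \neg p(x)\} \) is measurable; this is not necessarily the route taken by the formal proof): since \(\frac{d\mu }{d\nu } \cdot \nu \le \mu \) and \(\mu (A) = 0\),

\begin{align*} \int _{x \in A} \frac{d \mu }{d \nu }(x) \partial \nu \le \mu (A) = 0 \: , \end{align*}

hence \(\frac{d\mu }{d\nu }(x) = 0\) for \(\nu \)-almost all \(x \in A\), while \(p(x)\) holds for every \(x \notin A\).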

Proof

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be two finite kernels, with either \(\mathcal X\) countable or \(\mathcal{Y}\) countably generated. Then for \((\nu \otimes \eta )\)-almost all \((x, y)\),

\begin{align*} \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )}(x,y) = \frac{d\mu }{d\nu }(x)\frac{d \kappa }{d \eta }(x,y) \: . \end{align*}

This implies that the equality is true for \(\nu \)-almost all \(x\), for \(\eta (x)\)-almost all \(y\).

Proof

First, by Theorem B.2.7, for \((\nu \otimes \eta )\)-almost all \((x,y)\),

\begin{align*} \frac{d(\mu \otimes \kappa )}{d(\nu \otimes \eta )}(x, y) = \frac{d \mu }{d \nu }(x) \frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )}(x, y) \: . \end{align*}

It remains to show that \(\frac{d \mu }{d \nu }(x) \frac{d(\mu \otimes \kappa )}{d(\mu \otimes \eta )}(x, y)\) is \((\nu \otimes \eta )\)-almost everywhere equal to \(\frac{d\mu }{d\nu }(x)\frac{d \kappa }{d \eta }(x,y)\). By Lemma B.2.9, \((\mu \otimes \eta )\)-almost surely,

\begin{align*} \frac{d \kappa }{d \eta } = \frac{d (\mu \otimes \kappa )}{d (\mu \otimes \eta )} \: . \end{align*}

We then use Lemma B.2.10 to turn that into a \((\nu \otimes \eta )\)-almost sure equality:

\begin{align*} \frac{d(\mu \otimes \eta )}{d(\nu \otimes \eta )}\frac{d \kappa }{d \eta } = \frac{d(\mu \otimes \eta )}{d(\nu \otimes \eta )}\frac{d (\mu \otimes \kappa )}{d (\mu \otimes \eta )} \: . \end{align*}

Finally, by Lemma B.2.5, \(\frac{d(\mu \otimes \eta )}{d(\nu \otimes \eta )} = \frac{d\mu }{d\nu }\) almost surely.
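
The product formula can be verified exactly on finite spaces, where Radon-Nikodym derivatives are ratios of point masses wherever the reference measure is positive. The snippet below is a minimal sketch under that assumption, using exact rational arithmetic; the helper names (`comp_prod`, `rn_deriv`) and all numerical data are hypothetical and chosen only for illustration. The data deliberately include a point where \(\nu \) vanishes and a point where \(\eta (x)\) vanishes, to show that the identity only concerns \((\nu \otimes \eta )\)-charged points.

```python
from fractions import Fraction as F

def comp_prod(mu, kappa):
    """Composition-product mu ⊗ kappa on finite spaces, as {(x, y): mass}."""
    return {(x, y): mu[x] * kappa[x][y] for x in mu for y in kappa[x]}

def rn_deriv(m1, m2):
    """Density of m1 w.r.t. m2, defined only where m2 has positive mass.

    Points of zero m2-mass form an m2-null set and are left out."""
    return {p: m1[p] / m2[p] for p in m2 if m2[p] > 0}

# Hypothetical data on X = {0, 1, 2}, Y = {0, 1}.
mu = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}
nu = {0: F(1, 3), 1: F(2, 3), 2: F(0)}          # mu is not << nu (x = 2)
kappa = {0: {0: F(1, 2), 1: F(1, 2)},
         1: {0: F(1, 2), 1: F(1, 2)},
         2: {0: F(1, 3), 1: F(2, 3)}}
eta = {0: {0: F(1, 4), 1: F(3, 4)},
       1: {0: F(1), 1: F(0)},                    # kappa(1) is not << eta(1)
       2: {0: F(1), 1: F(0)}}

# Left-hand side: derivative of the composition-products.
lhs = rn_deriv(comp_prod(mu, kappa), comp_prod(nu, eta))

# Right-hand side: product of the two derivatives, on the same points.
rhs = {(x, y): (mu[x] / nu[x]) * (kappa[x][y] / eta[x][y]) for (x, y) in lhs}
```

Both dictionaries have the same keys, namely the points charged by \(\nu \otimes \eta \), and `lhs == rhs` holds exactly since the computation uses `Fraction`.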

B.3 Composition

Lemma B.3.1

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \), \(g : \mathcal X \to \mathcal Y\) a measurable function and denote by \(g^* \mathcal Y\) the comap of the \(\sigma \)-algebra on \(\mathcal Y\) by \(g\). Then \(\nu \)-almost everywhere,

\begin{align*} \frac{d g_*\mu }{d g_*\nu }(g(x)) = \nu \left[ \frac{d \mu }{d \nu } \mid g^* \mathcal Y\right](x) \: . \end{align*}
Proof

We show that the integrals of the two functions agree on all \(g^* \mathcal Y\)-measurable sets. It suffices to show equality on all sets of \(\mathcal X\) of the form \(g^{-1}(t)\) for \(t\) a measurable set of \(\mathcal Y\). TODO: also show measurability.

\begin{align*} \int _{x \in g^{-1}(t)}\frac{d g_*\mu }{d g_*\nu }(g(x)) \partial \nu & = \int _{y \in t}\frac{d g_*\mu }{d g_*\nu }(y) \partial (g_*\nu ) \\ & = g_*\mu (t) \: . \end{align*}
\begin{align*} \int _{x \in g^{-1}(t)}\nu \left[ \frac{d \mu }{d \nu } \mid g^* \mathcal Y\right](x) \partial \nu & = \int _{x \in g^{-1}(t)}\frac{d \mu }{d \nu }(x) \partial \nu \\ & = \mu (g^{-1}(t)) \\ & = g_*\mu (t) \: . \end{align*}
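
On a finite space the identity of Lemma B.3.1 becomes a statement about averages over the fibres of \(g\). The following is a minimal sketch under that assumption, with exact rational arithmetic; the helpers `pushforward` and `cond_exp`, the map `g` and the two measures are hypothetical names and data introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction as F

def pushforward(m, g):
    """Image measure g_* m on a finite space."""
    out = defaultdict(F)
    for x, mass in m.items():
        out[g(x)] += mass
    return dict(out)

def cond_exp(f, nu, g):
    """nu[f | sigma(g)]: nu-weighted average of f over each fibre of g."""
    num, den = defaultdict(F), defaultdict(F)
    for x, w in nu.items():
        num[g(x)] += f.get(x, F(0)) * w
        den[g(x)] += w
    return {x: num[g(x)] / den[g(x)] for x in nu if den[g(x)] > 0}

def g(x):
    """Hypothetical measurable map X = {0, 1, 2, 3} -> Y = {0, 1}."""
    return x % 2

# Hypothetical measures with mu << nu.
nu = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}
mu = {0: F(1, 2), 1: F(1, 8), 2: F(1, 4), 3: F(1, 8)}

# d mu / d nu, defined where nu has positive mass.
dmu_dnu = {x: mu[x] / nu[x] for x in nu if nu[x] > 0}

# Left-hand side: derivative of the pushforwards, evaluated at g(x).
g_mu, g_nu = pushforward(mu, g), pushforward(nu, g)
lhs = {x: g_mu[g(x)] / g_nu[g(x)] for x in nu if g_nu[g(x)] > 0}

# Right-hand side: conditional expectation of d mu / d nu given sigma(g).
rhs = cond_exp(dmu_dnu, nu, g)
```

Here `lhs` and `rhs` agree at every point, and both are constant on each fibre of `g`, as expected of a \(g^* \mathcal Y\)-measurable function.
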
Lemma B.3.2

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \) and let \(\kappa , \eta : \mathcal X \rightsquigarrow \mathcal Y\) be finite kernels with \(\kappa (x) \ll \eta (x)\) \(\nu \)-a.e. Let \(\mathcal B\) be the sigma-algebra on \(\mathcal X \times \mathcal Y\) obtained by taking the comap of the sigma-algebra of \(\mathcal Y\) by the projection. Then for \((\nu \otimes \eta )\)-almost every \((x,y)\),

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\eta \circ \nu )}(y) & = (\nu \otimes \eta )\left[ \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )} \mid \mathcal B \right](x,y) \: . \end{align*}
Proof

Let \(\pi _Y : \mathcal X \times \mathcal Y \to \mathcal Y\) be the projection \(\pi _Y(x,y) = y\). Remark that \(\kappa \circ \mu = \pi _{Y*}(\mu \otimes \kappa )\) and similarly for \(\nu , \eta \). By Lemma B.3.1, \((\nu \otimes \eta )\)-almost everywhere,

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\eta \circ \nu )}(y) & = \frac{d \pi _{Y*}(\mu \otimes \kappa )}{d \pi _{Y*}(\nu \otimes \eta )}(\pi _Y((x,y))) \\ & = (\nu \otimes \eta )\left[ \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \eta )} \mid \mathcal B\right](x,y) \: . \end{align*}
Lemma B.3.3 [ Csi63 ]

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \) and let \(\kappa : \mathcal X \rightsquigarrow \mathcal Y\) be a finite kernel. Let \(\mathcal B\) be the sigma-algebra on \(\mathcal X \times \mathcal Y\) obtained by taking the comap of the sigma-algebra of \(\mathcal Y\) by the projection. Then for \((\nu \otimes \kappa )\)-almost every \((x,y)\),

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\kappa \circ \nu )}(y) & = (\nu \otimes \kappa )\left[ (x, y) \mapsto \frac{d \mu }{d \nu }(x) \mid \mathcal B \right](x,y) \: . \end{align*}
Proof

Apply Lemma B.3.2 to get, \((\nu \otimes \kappa )\)-almost everywhere,

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\kappa \circ \nu )}(y) & = (\nu \otimes \kappa )\left[ \frac{d (\mu \otimes \kappa )}{d (\nu \otimes \kappa )} \mid \mathcal B\right](x,y) \: . \end{align*}

Finally, by Lemma B.2.5, \(\frac{d (\mu \otimes \kappa )}{d (\nu \otimes \kappa )} = \frac{d \mu }{d \nu }\) a.e.
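
For intuition, here is the discrete case written out (an illustration assuming \(\mathcal X\) and \(\mathcal Y\) countable with discrete \(\sigma \)-algebras; the weights \(w_y\) are notation introduced here). For \(y\) with \((\kappa \circ \nu )(\{ y\} ) > 0\), using \(\mu (\{ x\} ) = \frac{d \mu }{d \nu }(x) \nu (\{ x\} )\), which holds since \(\mu \ll \nu \),

\begin{align*} \frac{d(\kappa \circ \mu )}{d(\kappa \circ \nu )}(y) = \frac{\sum _{x} \mu (\{ x\} ) \kappa (x)(\{ y\} )}{\sum _{x} \nu (\{ x\} ) \kappa (x)(\{ y\} )} = \sum _{x} \frac{d \mu }{d \nu }(x) \, w_y(x) \: , \qquad w_y(x) = \frac{\nu (\{ x\} ) \kappa (x)(\{ y\} )}{\sum _{x'} \nu (\{ x'\} ) \kappa (x')(\{ y\} )} \: . \end{align*}

The weights \(w_y\) sum to one, so the right-hand side is the average of \(\frac{d \mu }{d \nu }\) over \(x\), conditionally on \(y\), under \(\nu \otimes \kappa \).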

B.4 Sigma-algebras

For \(\mathcal A\) a sub-\(\sigma \)-algebra and \(\mu \) a measure, we write \(\mu _{| \mathcal A}\) for the measure restricted to the \(\sigma \)-algebra.

Lemma B.4.1

Let \(\mu , \nu \) be two finite measures on \(\mathcal X\) with \(\mu \ll \nu \) and let \(\mathcal A\) be a sub-\(\sigma \)-algebra of \(\mathcal X\). Then \(\frac{d \mu _{| \mathcal A}}{d \nu _{| \mathcal A}}\) is \(\nu _{| \mathcal A}\)-almost everywhere (hence also \(\nu \)-a.e.) equal to \(\nu \left[ \frac{d \mu }{d \nu } \mid \mathcal A\right]\).

Proof

The restriction \(\mu _{| \mathcal A}\) is the map of \(\mu \) by the identity, seen as a function from \(\mathcal X\) with its \(\sigma \)-algebra to \(\mathcal X\) with \(\mathcal{A}\). We can thus apply Lemma B.3.1.
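
A quick sanity check in the extreme case (an illustration, assuming \(\nu (\mathcal X) > 0\)): take \(\mathcal A = \{ \emptyset , \mathcal X\} \). Every \(\mathcal A\)-measurable function is constant, and both sides reduce to the same constant:

\begin{align*} \frac{d \mu _{| \mathcal A}}{d \nu _{| \mathcal A}} = \frac{\mu (\mathcal X)}{\nu (\mathcal X)} \: , \qquad \nu \left[ \frac{d \mu }{d \nu } \mid \mathcal A\right] = \frac{1}{\nu (\mathcal X)} \int _{x} \frac{d \mu }{d \nu }(x) \partial \nu = \frac{\mu (\mathcal X)}{\nu (\mathcal X)} \: , \end{align*}

where the last equality uses \(\mu \ll \nu \).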

Lemma B.4.2

Let \(\mu , \nu \in \mathcal M(\mathcal X)\) with \(\mu \ll \nu \), \(g : \mathcal X \to \mathcal Y\) a measurable function and denote by \(g^* \mathcal Y\) the comap of the \(\sigma \)-algebra on \(\mathcal Y\) by \(g\). Then \(\nu \)-almost everywhere,

\begin{align*} \frac{d g_*\mu }{d g_*\nu }(g(x)) = \frac{d \mu _{| g^* \mathcal Y}}{d \nu _{| g^* \mathcal Y}}(x) \: . \end{align*}
Proof

Combine Lemma B.3.1 and Lemma B.4.1.