Introduction
The goal of this project is to formalize information divergences between probability measures, as well as results about error bounds for (sequential) hypothesis testing. This is not an introductory text: we will not explain why the objects described here are useful, where they come from, or what applications the results might have. For all those considerations, we refer the reader to the recent book [PW24].
Our goal is to compile properties of information divergences and their uses in hypothesis testing. Most software libraries come with tutorials that teach how to use them, as well as documentation listing all the objects and methods the library provides: in this project, we write the “documentation” of a part of the “information theory library”.
This document and the Lean implementation
This document is generated from a LaTeX file written by the authors, using the leanblueprint tool developed by Patrick Massot (https://github.com/PatrickMassot/leanblueprint). It contains links to a Lean implementation of the results, but there is no technical guarantee that the statements written here and those in the code match in any way. In particular, nothing in this document is automatically generated from the code. In summary, although the authors strive to keep this document as close to the Lean implementation as possible, the only formally verified statements are those written in Lean.
Assumptions
Information divergences are usually considered for probability measures, but we will be more granular about the assumptions we impose on the measures. Radon-Nikodym derivatives are the basic building block of all the information divergences we consider, and they do not have good properties unless the measures are sigma-finite; hence every measure in this document is assumed to be sigma-finite. Many results additionally require the measures to be finite, and some are about probability measures specifically.
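As an illustration, here is a minimal Lean sketch, not the project's actual definition, of an integral of the shape of an f-divergence, in which the sigma-finiteness assumption appears as instance arguments. Measure.rnDeriv and SigmaFinite are Mathlib declarations; the name fDivSketch and the exact shape of the integral are invented here for illustration only.

    import Mathlib

    open MeasureTheory

    variable {α : Type*} [MeasurableSpace α]

    /-- Illustrative sketch only (not the project's definition): an integral of
    the shape ∫ f(dμ/dν) dν, built from Mathlib's Radon-Nikodym derivative
    `Measure.rnDeriv`. The `SigmaFinite` instance arguments encode the blanket
    assumption discussed above: `μ.rnDeriv ν` only has good properties when both
    measures are sigma-finite. -/
    noncomputable def fDivSketch (f : ℝ → ℝ) (μ ν : Measure α)
        [SigmaFinite μ] [SigmaFinite ν] : ℝ :=
      ∫ x, f ((μ.rnDeriv ν x).toReal) ∂ν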
In most results, the measures can be defined on a general measurable space, without additional assumptions. However, some statements, notably those involving conditional divergences, will require the sigma-algebra to be countably generated or the space to be standard Borel. A sigma-algebra is countably generated if there is a countable family of sets that generates it; for example, the Borel sigma-algebra of the real line is generated by the countable family of intervals with rational endpoints. The assumption holds in particular in any standard Borel space, which covers the vast majority of applications.
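Both hypotheses exist as type classes in Mathlib, and the implication just mentioned should be available to instance search. A sketch, assuming a recent Mathlib in which the instance deriving MeasurableSpace.CountablyGenerated from StandardBorelSpace is registered:

    import Mathlib

    open MeasurableSpace

    -- `CountablyGenerated α` says the sigma-algebra on `α` is generated by
    -- countably many sets; `StandardBorelSpace α` says `α` carries the Borel
    -- sigma-algebra of some Polish topology. The second implies the first,
    -- which instance search should find in a recent Mathlib.
    example {α : Type*} [MeasurableSpace α] [StandardBorelSpace α] :
        CountablyGenerated α :=
      inferInstance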
Notation
For a function
Some references
A very useful book (draft) about information theory and hypothesis testing: [PW24]
Main reference for the properties of the Rényi divergence: [VEH14]