4 Concentration inequalities
4.1 Sub-Gaussian random variables
A real-valued random variable \(X\) is \(\sigma^2\)-sub-Gaussian if for any \(\lambda \in \mathbb{R}\),
\[ P[\exp(\lambda X)] \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \: . \]
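As a quick illustration (not part of the formal development), a centered Gaussian satisfies this definition with its own variance: if \(X \sim \mathcal{N}(0, \sigma^2)\), then for every \(\lambda \in \mathbb{R}\),
\[ P[\exp(\lambda X)] = \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \: , \]
so the defining inequality holds with equality. Similarly, by Hoeffding's lemma, a random variable \(X\) taking values in an interval \([a, b]\) is such that \(X - P[X]\) is \((b-a)^2/4\)-sub-Gaussian.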
If \(X\) is \(\sigma _1^2\)-sub-Gaussian and \(Y\) is \(\sigma _2^2\)-sub-Gaussian, and \(X\) and \(Y\) are independent, then \(X + Y\) is \((\sigma _1^2 + \sigma _2^2)\)-sub-Gaussian.
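A short sketch of why this holds, using the moment generating function definition above: for any \(\lambda \in \mathbb{R}\), by independence,
\[ P[\exp(\lambda (X + Y))] = P[\exp(\lambda X)] \, P[\exp(\lambda Y)] \le \exp\left(\frac{\lambda^2 \sigma_1^2}{2}\right) \exp\left(\frac{\lambda^2 \sigma_2^2}{2}\right) = \exp\left(\frac{\lambda^2 (\sigma_1^2 + \sigma_2^2)}{2}\right) \: . \]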
For \(X\) a \(\sigma^2\)-sub-Gaussian random variable, for any \(t \ge 0\),
\[ P(X \ge t) \le \exp\left(-\frac{t^2}{2 \sigma^2}\right) \: . \]
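A sketch of the usual Chernoff argument, assuming the definition above: for any \(\lambda > 0\), Markov's inequality applied to \(\exp(\lambda X)\) gives
\[ P(X \ge t) \le e^{-\lambda t} \, P[\exp(\lambda X)] \le \exp\left(\frac{\lambda^2 \sigma^2}{2} - \lambda t\right) \: , \]
and for \(t > 0\) the choice \(\lambda = t / \sigma^2\), which minimizes the right-hand side, yields the bound \(\exp(-t^2 / (2 \sigma^2))\); for \(t = 0\) the bound is trivial.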
Let \(X_1, \ldots, X_n\) be independent random variables such that \(X_i\) is \(\sigma_i^2\)-sub-Gaussian for \(i \in [n]\). Then for any \(t \ge 0\),
\[ P\left(\sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{t^2}{2 \sum_{i=1}^n \sigma_i^2}\right) \: . \]
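A brief sketch, assuming the two previous results: applying the lemma on sums of independent sub-Gaussian random variables inductively shows that \(\sum_{i=1}^n X_i\) is \(\big(\sum_{i=1}^n \sigma_i^2\big)\)-sub-Gaussian, and the tail bound for sub-Gaussian random variables applied with \(\sigma^2 = \sum_{i=1}^n \sigma_i^2\) then gives the inequality above.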
Let \(X_1, \ldots , X_n\) be random variables such that \(X_i - P[X_i]\) is \(\sigma _{X,i}^2\)-sub-Gaussian for \(i \in [n]\). Let \(Y_1, \ldots , Y_m\) be random variables such that \(Y_i - P[Y_i]\) is \(\sigma _{Y,i}^2\)-sub-Gaussian for \(i \in [m]\). Suppose further that the vectors \(X\) and \(Y\) are independent and that \(\sum _{i = 1}^m P[Y_i] \le \sum _{i = 1}^n P[X_i]\). Then
4.2 Concentration of the sums of rewards in bandit models
In the array model, for \(t \in \mathbb {N}\), the random variable \((N_{t,a}, S_{t, a})_{a \in \mathcal{A}}\) has the same distribution as \((N_{t,a}, \sum _{s=0}^{N_{t,a}-1} \omega _{2, s, a})_{a \in \mathcal{A}}\).
In the array model, for \(k \in \mathbb {N}\), the random variable \(\sum _{s=0}^{k-1} \omega _{2, s, a}\) has the same distribution as a sum of \(k\) i.i.d. random variables with law \(\nu (a)\).
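To illustrate how this identity is used, here is a sketch under the extra assumption that \(\nu(a)\) has mean \(\mu_a\) and that a \(\nu(a)\)-distributed variable minus \(\mu_a\) is \(\sigma^2\)-sub-Gaussian: by the identity, \(\sum_{s=0}^{k-1} \omega_{2,s,a}\) has the same law as a sum of \(k\) i.i.d. \(\nu(a)\)-distributed variables, so the tail bound for sums of independent sub-Gaussian random variables above gives, for any \(t \ge 0\),
\[ P\left(\sum_{s=0}^{k-1} \omega_{2,s,a} - k \mu_a \ge t\right) \le \exp\left(-\frac{t^2}{2 k \sigma^2}\right) \: . \]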
By definition of \(P_{\mathcal{A}}\) (Definition 3.3).
In the array model, for \(t \in \mathbb {N}\), \(a \in \mathcal{A}\), and a measurable set \(B \subseteq \mathbb {N} \times \mathbb {R}\),
As a consequence, this also holds for any algorithm-environment sequence.
In the array model,
As a consequence, this also holds for any algorithm-environment sequence.
4.2.1 Sub-Gaussian rewards
TODO: extend this section to sub-Gaussian distributions with a constant other than 1.
Let \(\nu (a)\) be a 1-sub-Gaussian distribution on \(\mathbb {R}\) for each arm \(a \in \mathcal{A}\).
Let \(\nu (a)\) be a 1-sub-Gaussian distribution on \(\mathbb {R}\) for each arm \(a \in \mathcal{A}\). Let \(c \ge 0\) be a real number and \(k\) a positive natural number. Then
The same upper bound holds for the upper tail: