D Proofs of results from Section 5
D.1 Gaussian bandits
First, since \(\mathcal C^{sp} \subseteq \mathcal C\), for any algorithm \(\mathcal A\) and \(\mu \in \mathcal D\), \(R_{H_{\mathcal C}, T}(\mathcal A, \mu ) \ge R_{H_{\mathcal C^{sp}}, T}(\mathcal A, \mu )\). It thus suffices to prove a lower bound on \(R_{H_{\mathcal C^{sp}}, T}\).
Let \(H_{\Delta }(\mu ) = \frac{2}{\min _{k : \Delta _k > 0} \Delta _k^2} + \sum _{k:\Delta _k > 0} \frac{2}{\Delta _k^2}\). It was shown in [GK16] that for all \(\mu \in \mathcal D\), this function satisfies the inequalities \(H_\Delta (\mu ) \le H_{\mathcal C^{sp}}(\mu ) \le 2 H_\Delta (\mu )\). Thus \(R_{H_{\mathcal C}, T}(\mathcal A, \mu ) \ge R_{H_\Delta , T}(\mathcal A, \mu ) / 2\). From this point on, we use a construction similar to the one used in [CL16] to prove a lower bound on the ratio to \(H_\Delta \) for Bernoulli bandits. We define a Gaussian problem \(\mu \) by \(\mu _1 = 0\) (or any arbitrary value) and \(\mu _k = \mu _1 - k \Delta \) for all \(k \in \{ 2, \ldots , K\} \), for some arbitrary \(\Delta > 0\). We apply Corollary 1 to \(\mu \) and \(\lambda ^{(2)}, \ldots , \lambda ^{(K)}\), where each \(\lambda ^{(j)}\) is identical to \(\mu \) except that \(\lambda ^{(j)}_j = \mu _1 + (\mu _1 - \mu _j)\).
For our specific choice of \(\lambda ^{(j)}\),
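assuming unit-variance Gaussian arms (a normalization consistent with the constants in \(H_\Delta \)), only coordinate \(j\) contributes to the divergence between \(\mu \) and \(\lambda ^{(j)}\), and
\[
\mathrm{KL}\big(\mu _j, \lambda ^{(j)}_j\big) = \frac{1}{2} \big( \lambda ^{(j)}_j - \mu _j \big)^2 = 2 (\mu _1 - \mu _j)^2 \,.
\]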
We now use that \(\mu _k = \mu _1 - k \Delta \).
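For each \(j \ge 2\) this gives \(\mu _1 - \mu _j = j \Delta \), hence
\[
\mathrm{KL}\big(\mu _j, \lambda ^{(j)}_j\big) = 2 j^2 \Delta ^2 \,,
\qquad
H_\Delta (\mu ) = \frac{2}{(2\Delta )^2} + \sum _{k=2}^{K} \frac{2}{k^2 \Delta ^2}
= \frac{1}{\Delta ^2} \Big( \frac{1}{2} + 2 \sum _{k=2}^{K} \frac{1}{k^2} \Big) \,.
\]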
We finally have the lower bound
D.2 Bernoulli bandits
We consider the best arm identification task in bandits with two arms, both belonging to the same one-parameter exponential family. Two distributions in that family with means \(\mu _1, \mu _2\) correspond to natural parameters \(\xi _1, \xi _2\), and the Kullback-Leibler divergence between them can be written
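\[
\mathrm{KL}(\mu _1, \mu _2) = d(\xi _2, \xi _1) = \phi (\xi _2) - \phi (\xi _1) - (\xi _2 - \xi _1) \, \phi '(\xi _1) \,,
\]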
where \(\phi : \mathbb {R} \to \mathbb {R}\) is a convex function specific to the exponential family and \(d\) is its Bregman divergence. The mean parameter \(\mu _1\) and the corresponding natural parameter \(\xi _1\) are related by the equation \(\phi '(\xi _1) = \mu _1\) (or \(\xi _1 = \phi '^{-1}(\mu _1)\) since \(\phi '\) is invertible). In that setting, we want to compute
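\[
\max _{\omega \, : \, \omega _1 + \omega _2 = 1, \ \omega \ge 0} \ \inf _{x} \ \big( \omega _1 \, \mathrm{KL}(x, \mu _1) + \omega _2 \, \mathrm{KL}(x, \mu _2) \big) \,,
\]
whose inverse is the static proportions difficulty \(H_{\mathcal C^{sp}}(\mu )\).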
In the one-parameter exponential family setting described above,
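the value of this max-min problem is
\[
\mathrm{KL}(x^\star , \mu _1) = \mathrm{KL}(x^\star , \mu _2) \,,
\]
for a point \(x^\star \) with an explicit expression, derived below.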
The infimum in the definition of the difficulty is attained for any \(\omega \) at \(x(\omega ) = \phi '(\omega _1 \xi _1 + \omega _2 \xi _2)\). The maximum over the simplex is attained at \(\omega ^\star \) such that \(\omega ^\star _1 = \frac{\phi '^{-1}(x^\star ) - \xi _2}{\xi _1 - \xi _2}\), with \(x^\star = x(\omega ^\star ) = \frac{\phi (\xi _1) - \phi (\xi _2)}{\xi _1 - \xi _2}\).
We parametrize by the natural parameters:
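by the duality \(d(\xi _i, y) = \mathrm{KL}(\phi '(y), \mu _i)\), with \(x = \phi '(y)\),
\[
\omega _1 \, \mathrm{KL}(x, \mu _1) + \omega _2 \, \mathrm{KL}(x, \mu _2) = \omega _1 \, d(\xi _1, y) + \omega _2 \, d(\xi _2, y) \,.
\]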
The optimality condition for \(y\) is \(\omega _1 \frac{\partial }{\partial y}d(\xi _1, y) + \omega _2 \frac{\partial }{\partial y}d(\xi _2, y) = 0\). That derivative is \(\frac{\partial }{\partial y}d(x, y) = - (x - y)\phi ''(y)\). We obtain
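\[
\big( \omega _1 (\xi _1 - y) + \omega _2 (\xi _2 - y) \big) \, \phi ''(y) = 0 \,,
\quad \text{hence} \quad y = \omega _1 \xi _1 + \omega _2 \xi _2 \,,
\]
since \(\phi ''\) is positive by strict convexity of \(\phi \).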
We note for later the property
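\[
\omega _1 \frac{\partial }{\partial y} d\big(\xi _1, \omega _1 \xi _1 + \omega _2 \xi _2\big) + \omega _2 \frac{\partial }{\partial y} d\big(\xi _2, \omega _1 \xi _1 + \omega _2 \xi _2\big) = 0 \,.
\]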
We now want to compute
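\[
\max _{\omega } \Big( \omega _1 \, d\big(\xi _1, \omega _1 \xi _1 + \omega _2 \xi _2\big) + \omega _2 \, d\big(\xi _2, \omega _1 \xi _1 + \omega _2 \xi _2\big) \Big) \,.
\]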
At the optimal value for \(\omega \) the gradient is zero:
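with \(y = \omega _1 \xi _1 + (1 - \omega _1) \xi _2\) and \(\frac{\mathrm{d} y}{\mathrm{d} \omega _1} = \xi _1 - \xi _2\),
\[
0 = d(\xi _1, y) - d(\xi _2, y) + (\xi _1 - \xi _2) \Big( \omega _1 \frac{\partial }{\partial y} d(\xi _1, y) + \omega _2 \frac{\partial }{\partial y} d(\xi _2, y) \Big) \,.
\]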
We use the property noted above to get that \(\omega _2 \frac{\partial }{\partial y} d(\xi _2, \omega _1 \xi _1 + (1 - \omega _1) \xi _2) = - \omega _1 \frac{\partial }{\partial y} d(\xi _1, \omega _1 \xi _1 + (1 - \omega _1) \xi _2)\). We simplify the equation to
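\[
d\big(\xi _1, \omega _1 \xi _1 + \omega _2 \xi _2\big) = d\big(\xi _2, \omega _1 \xi _1 + \omega _2 \xi _2\big) \,.
\]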
We expand the Bregman divergence.
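With \(y = \omega _1 \xi _1 + \omega _2 \xi _2\),
\[
\phi (\xi _1) - \phi (y) - (\xi _1 - y) \, \phi '(y) = \phi (\xi _2) - \phi (y) - (\xi _2 - y) \, \phi '(y) \,,
\quad \text{that is,} \quad \phi '(y) = \frac{\phi (\xi _1) - \phi (\xi _2)}{\xi _1 - \xi _2} \,.
\]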
Solving this equation for \(y\) also gives the value of \(\omega \) thanks to \(y = \omega _1 \xi _1 + (1 - \omega _1) \xi _2\). We get \(\omega _1 = \frac{y - \xi _2}{\xi _1 - \xi _2}\), and \(y\) is given by the equation above. The value of the objective is then
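\[
\omega _1 \, d(\xi _1, y) + \omega _2 \, d(\xi _2, y) = d(\xi _1, y) = d(\xi _2, y) \,,
\]
using \(\omega _1 + \omega _2 = 1\) and the equality of the two divergences at the optimum.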
But we can simplify this further since \(d(\xi _1, y) = \mathrm{KL}(\phi '(y), \mu _1)\) (also equal to \(\mathrm{KL}(\phi '(y), \mu _2)\)).
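The value of the max-min problem is therefore \(\mathrm{KL}(x^\star , \mu _1) = \mathrm{KL}(x^\star , \mu _2)\) with \(x^\star = \phi '(y) = \frac{\phi (\xi _1) - \phi (\xi _2)}{\xi _1 - \xi _2}\), which proves the characterization stated above.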
If the distributions with parameters \(\mu _1\) and \(\mu _2\) are \(\sigma ^2\)-sub-Gaussian, then
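\[
\mathrm{KL}(x^\star , \mu _1) = \mathrm{KL}(x^\star , \mu _2)
\ge \frac{1}{2\sigma ^2} \max \big( (x^\star - \mu _1)^2, \, (x^\star - \mu _2)^2 \big)
\ge \frac{(\mu _1 - \mu _2)^2}{8 \sigma ^2} \,.
\]
Indeed \(\mathrm{KL}(x, \mu _i) \ge (x - \mu _i)^2 / (2 \sigma ^2)\) for \(\sigma ^2\)-sub-Gaussian distributions, and since \(x^\star \) lies between \(\mu _2\) and \(\mu _1\), the larger of the two squared distances is at least \((\mu _1 - \mu _2)^2 / 4\).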
For an exponential family of Gaussians with the same variance \(\sigma ^2\), there is equality, and the two terms of the maximum are equal.
Gaussian case
For Gaussian distributions, the functions used above are
\(\phi (a) = \frac{1}{2} \sigma ^2 a^2\) with \(\phi '(a) = a \sigma ^2\), \(\phi '^{-1}(x) = x / \sigma ^2\), \(\phi (\phi '^{-1}(x)) = \frac{1}{2 \sigma ^2} x^2\)
\(d(a, b) = \sigma ^2(\frac{1}{2} a^2 - \frac{1}{2} b^2 - (a - b) b) = \frac{1}{2}\sigma ^2(a - b)^2\)
\(\mathrm{KL}(x, y) = \frac{1}{2 \sigma ^2}(x - y)^2\).
Using these values in Lemma 7 gives a static proportions difficulty equal to the inverse of \(\frac{1}{8 \sigma ^2}(\mu _1 - \mu _2)^2\).
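Indeed, in that case \(x^\star = \frac{\phi (\xi _1) - \phi (\xi _2)}{\xi _1 - \xi _2} = \frac{\sigma ^2 (\xi _1 + \xi _2)}{2} = \frac{\mu _1 + \mu _2}{2}\), and \(\mathrm{KL}(x^\star , \mu _1) = \frac{1}{2 \sigma ^2} \big( \frac{\mu _1 - \mu _2}{2} \big)^2 = \frac{(\mu _1 - \mu _2)^2}{8 \sigma ^2}\).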
Bernoulli case
For Bernoulli distributions, the functions used above are
\(\phi (a) = \log (1 + e^a)\) with \(\phi '(a) = \frac{e^a}{1 + e^a}\), \(\phi '^{-1}(x) = \log \frac{x}{1 - x}\), \(\phi (\phi '^{-1}(x)) = - \log (1 - x)\)
\(d(a, b) = \log (1 + e^a) - \log (1 + e^b) - (a - b) \frac{e^b}{1 + e^b}\)
\(\mathrm{KL}(x, y) = x \log \frac{x}{y} + (1-x) \log \frac{1-x}{1-y}\).
Using these values in Lemma 7 proves Lemma 1.
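Explicitly, for Bernoulli arms, \(x^\star = \frac{\phi (\xi _1) - \phi (\xi _2)}{\xi _1 - \xi _2} = \log \frac{1 - \mu _2}{1 - \mu _1} \Big/ \log \frac{\mu _1 (1 - \mu _2)}{\mu _2 (1 - \mu _1)}\), and the static proportions difficulty is \(1 / \mathrm{KL}(x^\star , \mu _1)\).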
We now gather a few limits that will be useful in the proof of Theorem 8. These results use the explicit formulas for \(H_{\mathcal C^{sp}}\) derived above.
For \(x \in (0,1/2)\), let \(\mu (x) = (x(1+x), x)\), \(\lambda ^{(1)}(x) = (x/2, x)\), \(\lambda ^{(2)}(x) = (x(1+x), 1/2)\). Then Corollary 1 gives
As \(x \to 0\), the quantity on the right converges to a limit strictly greater than 1 (approximately 1.22).