B Proofs of results from Section 3
B.1 Proof of the lower bound Theorem 2
The proof of this inequality follows the standard bandit lower bound argument, which can be found for example in [ GMS19 ] . The Kullback-Leibler divergence between the observations up to \(T\) under models \(\mu \) and \(\lambda \) is \(\sum _{k=1}^K \mathbb {E}_\mu [N_{T,k}] \mathrm{KL}(\mu _k, \lambda _k)\). By the data processing inequality, this Kullback-Leibler divergence is larger than the KL between Bernoulli distributions of means \(\mathbb {P}_{\mu , \mathcal A}(E)\) and \(\mathbb {P}_{\lambda , \mathcal A}(E)\) for any event \(E\). We apply this to \(E = \{ \hat{i}_T = i^\star (\mu )\} \) to obtain
We use the inequality \(\mathrm{kl}(a, b) \ge a \log \frac{1}{b} - \log 2\), then \(\mathbb {P}_{\mu , \mathcal A}(\hat{i}_T = i^\star (\mu )) = 1 - p_{\mu , T}(\mathcal A)\), \(\mathbb {P}_{\lambda , \mathcal A}(\hat{i}_T = i^\star (\mu )) \le p_{\lambda , T}(\mathcal A)\) (since \(i^\star (\lambda ) \ne i^\star (\mu )\)) to get
By definition, \(p_{\lambda , T}(\mathcal A) = \exp (-T R_{H,T}(\mathcal A, \lambda )^{-1} H(\lambda )^{-1})\),. We get
Dividing by \(T H(\lambda )^{-1}\) and using \(H(\lambda ) \le \sqrt{T}\) gives the result.
B.2 Additional results
Let \(\mu \in \mathcal D\) and let \(\mathcal A\) be an algorithm with \(\lim _{T \to + \infty } p_{\mu , T}(\mathcal A) = 0\). Let \(D(\mu ) \subseteq \mathrm{Alt}(\mu )\) be a set such that \(\sup _{\lambda \in D(\mu )} H(\lambda ) {\lt} +\infty \). Then
For \(T\) large enough, we can apply Theorem 2 for any \(\lambda \in D(\mu )\), hence we can take an infimum over \(\lambda \in D(\mu )\) to get
Taking a limit when \(T \to +\infty \) and using \(\lim _{T \to + \infty } p_{\mu , T}(\mathcal A) = 0\), we get the inequality we want to prove.
For all \(x \in \mathbb {R}\), let \(\mathrm{Alt}_x(\mu ) = \mathrm{Alt}(\mu ) \cap \{ \lambda \in \mathcal D \mid H(\lambda ) \le x\} \). For all consistent algorithm families \(\mathcal A\),
Let \(\mu \in \mathcal D\) and \(x {\gt} 0\). We apply Theorem 10 to \(\mathrm{Alt}_{x}(\mu )\).
The left hand side is larger than \((\liminf _{T\to +\infty }\sup _{\lambda \in \mathcal D} R_{H,T}(\mathcal A, \lambda ))^{-1}\), which is now independent of \(\mu \) and \(x\). We then take on the right hand side first an infimum over \(\mu \), then a liminf over \(x\). Doing it in this order leads to the tighter bound (compared to \(\inf _\mu \liminf _x\)).