3 Lower bounds on the difficulty ratio
Most of the bounds on the difficulty ratio we derive are consequences of the following theorem.
Theorem 2. Let \(H : \mathcal D \to \mathbb{R}^+\) be an arbitrary difficulty function. Let \(\mu , \lambda \in \mathcal D\) be such that \(i^*(\lambda ) \ne i^*(\mu )\) and \(H(\lambda ) \le \sqrt{T}\). Then for any algorithm \(\mathcal A\),
\[
R_{H, T}(\mathcal A, \lambda ) \ge \frac{1 - p_{\mu , T}(\mathcal A)}{H(\lambda ) \sum_{k=1}^K \mathbb{E}_{\mu , \mathcal A}[N_{k, T}/T] \, \mathrm{KL}(\mu _k, \lambda _k) + \log (2)/\sqrt{T}} \: .
\]
The proof of this inequality follows the standard bandit lower bound argument, appealing to the data processing inequality for the KL divergence, which can be found for example in [GMS19]; it is given in Appendix B. The only mildly original step is to put \(H(\lambda )\) on the right-hand side of the inequality instead of writing a lower bound on \(p_{\lambda , T}(\mathcal A)\) (which would give a bound akin to Lemma 6 of [BGS22] when taking the limit as \(T \to + \infty \)).
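In outline, and up to the bookkeeping of Appendix B, the argument combines two standard facts. First, for any \([0,1]\)-valued, \(\mathcal F_T\)-measurable random variable \(Z\), the data processing inequality gives
\[
\sum_{k=1}^K \mathbb{E}_{\mu , \mathcal A}[N_{k, T}] \, \mathrm{KL}(\mu _k, \lambda _k) \ge \mathrm{kl}\left( \mathbb{E}_{\mu , \mathcal A}[Z], \mathbb{E}_{\lambda , \mathcal A}[Z] \right) ,
\]
where \(\mathrm{kl}\) is the relative entropy between Bernoulli distributions. Second, \(\mathrm{kl}(p, q) \ge p \log (1/q) - \log 2\). Taking \(Z\) to be the indicator that the algorithm returns \(i^*(\mu )\), so that \(\mathbb{E}_{\mu , \mathcal A}[Z] = 1 - p_{\mu , T}(\mathcal A)\) and \(\mathbb{E}_{\lambda , \mathcal A}[Z] \le p_{\lambda , T}(\mathcal A)\) (returning \(i^*(\mu )\) is an error under \(\lambda \) since \(i^*(\lambda ) \ne i^*(\mu )\)), we obtain
\[
\log \frac{1}{p_{\lambda , T}(\mathcal A)} \le \frac{\sum_{k=1}^K \mathbb{E}_{\mu , \mathcal A}[N_{k, T}] \, \mathrm{KL}(\mu _k, \lambda _k) + \log 2}{1 - p_{\mu , T}(\mathcal A)} \: .
\]
Rearranging, then using \(H(\lambda ) \le \sqrt{T}\) to bound the \(\log 2\) term, gives the inequality of Theorem 2.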
Theorem 3. For any consistent algorithm family \(\mathcal A\), for all \(\mu \in \mathcal D\) and all sets \(D(\mu ) \subseteq \mathrm{Alt}(\mu )\),
\[
\sup_{\lambda \in D(\mu )} \limsup_{T \to +\infty } R_{H, T}(\mathcal A, \lambda ) \ge \left( \max_{\omega \in \triangle _K} \inf_{\lambda \in D(\mu )} H(\lambda ) \sum_{k=1}^K \omega _k \, \mathrm{KL}(\mu _k, \lambda _k) \right)^{-1} \: .
\]
In particular,
\[
\sup_{\lambda \in \mathcal D} \limsup_{T \to +\infty } R_{H, T}(\mathcal A, \lambda ) \ge \left( \inf_{\mu \in \mathcal D} \max_{\omega \in \triangle _K} \inf_{\lambda \in \mathrm{Alt}(\mu )} H(\lambda ) \sum_{k=1}^K \omega _k \, \mathrm{KL}(\mu _k, \lambda _k) \right)^{-1} \: .
\]
Proof. Let \(\mu \in \mathcal D\). Since \(\triangle _K\) is compact, the sequence \((\mathbb{E}_{\mu , \mathcal A}[N_{T}/T])_{T \in \mathbb{N}}\) has a subsequence, indexed by some \((T_n)_{n \in \mathbb{N}}\), which converges to a vector \(\omega _{\mu } \in \triangle _K\). Let \(\lambda \in \mathrm{Alt}(\mu )\). Since \(H(\lambda ) \le \sqrt{T_n}\) for \(n\) large enough, Theorem 2 gives
\[
\left( R_{H, T_n}(\mathcal A, \lambda ) \right)^{-1} \le \frac{H(\lambda ) \sum_{k=1}^K \mathbb{E}_{\mu , \mathcal A}[N_{k, T_n}/T_n] \, \mathrm{KL}(\mu _k, \lambda _k) + \log (2)/\sqrt{T_n}}{1 - p_{\mu , T_n}(\mathcal A)} \: .
\]
Since \(\mathcal A\) is consistent, \(1 - p_{\mu , T_n}(\mathcal A) \to 1\). Taking a limit as \(n \to +\infty \), we have
\[
\left( \liminf_{n \to +\infty } R_{H, T_n}(\mathcal A, \lambda ) \right)^{-1} \le H(\lambda ) \sum_{k=1}^K \omega _{\mu , k} \, \mathrm{KL}(\mu _k, \lambda _k) \: .
\]
Since the liminf along a subsequence is at most the limsup of the whole sequence, the same bound holds for \(\left( \limsup_{T \to +\infty } R_{H, T}(\mathcal A, \lambda ) \right)^{-1}\). We finally take an infimum over \(\lambda \in D(\mu )\) on both sides of the inequality, and replace \(\omega _{\mu }\) by a maximum over the simplex: since \(\omega _{\mu } \in \triangle _K\), \(\inf_{\lambda \in D(\mu )} H(\lambda ) \sum_k \omega _{\mu , k} \mathrm{KL}(\mu _k, \lambda _k) \le \max_{\omega \in \triangle _K} \inf_{\lambda \in D(\mu )} H(\lambda ) \sum_k \omega _k \mathrm{KL}(\mu _k, \lambda _k)\). This proves the first statement. The second inequality is obtained by choosing \(D(\mu ) = \mathrm{Alt}(\mu )\) and taking an infimum over \(\mu \in \mathcal D\).
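As a simple illustration, for a singleton set \(D(\mu ) = \{ \lambda \} \) the maximum over the simplex puts all the weight on the coordinate with the largest divergence, and the first statement reads
\[
\limsup_{T \to +\infty } R_{H, T}(\mathcal A, \lambda ) \ge \left( H(\lambda ) \max_{k \in [K]} \mathrm{KL}(\mu _k, \lambda _k) \right)^{-1} \: .
\]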
The second inequality of Theorem 3 recovers Theorem 1 of [KTH22], which they prove differently: they introduce typical concentration events, reduce the study to those events and use a change of measure. Their proof does not give an explicit non-asymptotic version of the bound, unlike Theorem 2. In contrast, our short proof is a direct application of the data processing inequality for the KL divergence.
Instead of an inequality on the supremum of the limsup of \(R_{H, T}(\mathcal A, \mu )\) as in Theorem 3, we can also get a bound on the liminf of the supremum of \(R_{H, T}(\mathcal A, \mu )\) over sets with bounded \(H\): see Theorem 10 in Appendix B. We will use Theorem 3 to describe the asymptotic difficulty of fixed budget identification. We could derive bounds for a fixed \(T\) by using Theorem 2 instead, at the cost of second order terms and of restricting the alternative to problems with \(H\) bounded by \(\sqrt{T}\), that is, to problems which are not too hard at time \(T\).
Corollary 1. Let \(\mu , \lambda ^{(1)}, \ldots , \lambda ^{(K)} \in \mathcal D\) be such that for all \(j \in [K]\), \(i^*(\lambda ^{(j)}) \ne i^*(\mu )\), \(H(\lambda ^{(j)}) > 0\), and \(\lambda ^{(j)}\) differs from \(\mu \) only along coordinate \(j\). Then for all algorithms \(\mathcal A\) such that \(\lim _{T \to +\infty } p_{\mu , T}(\mathcal A) = 0\),
\[
\max_{j \in [K]} \limsup_{T \to +\infty } R_{H, T}(\mathcal A, \lambda ^{(j)}) \ge \sum_{j=1}^K \frac{1}{H(\lambda ^{(j)}) \, \mathrm{KL}(\mu _j, \lambda ^{(j)}_j)} \: .
\]
Proof. We apply the first inequality of Theorem 3 with \(D(\mu ) = \{ \lambda ^{(1)}, \ldots , \lambda ^{(K)}\} \) (its proof only uses \(p_{\mu , T_n}(\mathcal A) \to 0\) at \(\mu \), which is the assumption of the corollary). Since \(\lambda ^{(j)}\) differs from \(\mu \) only along coordinate \(j\), we have \(\sum_k \omega _k \mathrm{KL}(\mu _k, \lambda ^{(j)}_k) = \omega _j \mathrm{KL}(\mu _j, \lambda ^{(j)}_j)\) for all \(\omega \in \triangle _K\). The optimal \(\omega \) equalizes \(H(\lambda ^{(j)}) \omega _j \mathrm{KL}(\mu _j, \lambda _j^{(j)})\) over all \(j\), which gives the result.
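Explicitly, writing \(c_j = H(\lambda ^{(j)}) \mathrm{KL}(\mu _j, \lambda ^{(j)}_j)\), the equalizing allocation and the resulting maximin value are
\[
\omega _j = \frac{c_j^{-1}}{\sum_{m=1}^K c_m^{-1}} \: , \qquad \max_{\omega \in \triangle _K} \min_{j \in [K]} c_j \omega _j = \left( \sum_{j=1}^K c_j^{-1} \right)^{-1} \: ,
\]
since any \(\omega \) with \(\min_j c_j \omega _j\) above that value would satisfy \(\omega _j > c_j^{-1} \big( \sum_m c_m^{-1} \big)^{-1}\) for all \(j\) and hence \(\sum_j \omega _j > 1\). Inverting the maximin value yields the sum in the corollary.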
The sum on the right-hand side of Corollary 1 is very close to the quantity \(h^*\) defined in [CL16] in the setting of Bernoulli bandits, with \(H\) the sum of inverse squared gaps. This is due to the similar construction of a set of points in the alternative that each differ from a given \(\mu \in \mathcal D\) in one coordinate only. That construction was reused by [AKK\(^{+}\)21] to bound a quantity called expected policy regret, and by [YT22] to prove a lower bound for fixed budget BAI in linear bandits.
The main advantage of Corollary 1 is that it is simpler to use than Theorem 10, but it can lead to worse bounds. For example, for BAI in two-arm Gaussian bandits with known variance 1 and \(H = H_{\mathcal C^{sp}}\), Theorem 3 gives \(\sup _{\lambda \in \mathcal D} R_{H, \infty }(\mathcal A, \lambda ) \ge 1\), while the best bound that can be achieved with Corollary 1 is \(1/2\). That task is very simple, as remarked by [KCG16]: the oracle fixed proportions are independent of the means (both arms are played equally), so that the algorithm playing those proportions has \(\sup _{\lambda \in \mathcal D} R_{H, \infty }(\mathcal A, \lambda ) \le 1\). Theorem 3 shows that this is tight: no adaptive algorithm can beat it everywhere. We could not reach that conclusion with the weaker Corollary 1, since it only proves a \(1/2\) lower bound.
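To see why the oracle proportions do not depend on the means in that example, recall that \(\mathrm{KL}(a, b) = (a - b)^2/2\) for unit-variance Gaussians and write \(\Delta = |\mu _1 - \mu _2|\). For an allocation \(\omega \in \triangle _2\), the closest alternatives merge the two means into a common value \(x\), so that
\[
\inf_{\lambda \in \mathrm{Alt}(\mu )} \sum_{k=1}^2 \omega _k \, \mathrm{KL}(\mu _k, \lambda _k) = \inf_{x \in \mathbb{R}} \left( \omega _1 \frac{(\mu _1 - x)^2}{2} + \omega _2 \frac{(\mu _2 - x)^2}{2} \right) = \frac{\omega _1 \omega _2 \Delta ^2}{2} \: ,
\]
with minimizer \(x = \omega _1 \mu _1 + \omega _2 \mu _2\). The maximizer over \(\triangle _2\) is \(\omega = (1/2, 1/2)\) whatever the value of \(\Delta \), with maximal value \(\Delta ^2/8\), which is the observation of [KCG16] quoted above.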