On the Existence of a Complexity in Fixed Budget Bandit Identification

B Proofs of results from Section 3

B.1 Proof of the lower bound Theorem 2

Proof of Theorem 2

The proof of this inequality follows the standard bandit lower bound argument, which can be found for example in [ GMS19 ] . The Kullback-Leibler divergence between the observations up to \(T\) under models \(\mu \) and \(\lambda \) is \(\sum _{k=1}^K \mathbb {E}_\mu [N_{T,k}] \mathrm{KL}(\mu _k, \lambda _k)\). By the data processing inequality, this Kullback-Leibler divergence is larger than the KL between Bernoulli distributions of means \(\mathbb {P}_{\mu , \mathcal A}(E)\) and \(\mathbb {P}_{\lambda , \mathcal A}(E)\) for any event \(E\). We apply this to \(E = \{ \hat{i}_T = i^\star (\mu )\} \) to obtain

\begin{align*} \mathrm{kl}(\mathbb {P}_{\mu , \mathcal A}(\hat{i}_T = i^\star (\mu )), \mathbb {P}_{\lambda , \mathcal A}(\hat{i}_T = i^\star (\mu ))) \le \sum _{k=1}^K \mathbb {E}_\mu [N_{T,k}] \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

We use the inequality \(\mathrm{kl}(a, b) \ge a \log \frac{1}{b} - \log 2\), then \(\mathbb {P}_{\mu , \mathcal A}(\hat{i}_T = i^\star (\mu )) = 1 - p_{\mu , T}(\mathcal A)\), \(\mathbb {P}_{\lambda , \mathcal A}(\hat{i}_T = i^\star (\mu )) \le p_{\lambda , T}(\mathcal A)\) (since \(i^\star (\lambda ) \ne i^\star (\mu )\)) to get

\begin{align*} (1 - p_{\mu , T}(\mathcal A)) \log \frac{1}{p_{\lambda , T}(\mathcal A)} - \log 2 \le \sum _{k=1}^K \mathbb {E}_\mu [N_{T,k}] \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

By definition, \(p_{\lambda , T}(\mathcal A) = \exp (-T R_{H,T}(\mathcal A, \lambda )^{-1} H(\lambda )^{-1})\),. We get

\begin{align*} (1 - p_{\mu , T}(\mathcal A)) T R_{H,T}(\mathcal A, \lambda )^{-1} H(\lambda )^{-1} - \log 2 \le \sum _{k=1}^K \mathbb {E}_\mu [N_{T,k}] \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

Dividing by \(T H(\lambda )^{-1}\) and using \(H(\lambda ) \le \sqrt{T}\) gives the result.

B.2 Additional results

Theorem 10

Let \(\mu \in \mathcal D\) and let \(\mathcal A\) be an algorithm with \(\lim _{T \to + \infty } p_{\mu , T}(\mathcal A) = 0\). Let \(D(\mu ) \subseteq \mathrm{Alt}(\mu )\) be a set such that \(\sup _{\lambda \in D(\mu )} H(\lambda ) {\lt} +\infty \). Then

\begin{align*} (\liminf _{T\to +\infty }\sup _{\lambda \in D(\mu )}R_{H,T}(\mathcal A, \lambda ))^{-1} \le \max _{\omega \in \triangle _K} \inf _{\lambda \in D(\mu )} H(\lambda )\sum _{k=1}^K \omega _k \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}
If \(\mathcal A\) is consistent, then it satisfies in particular the condition of the theorem \(\lim _{T \to + \infty } p_{\mu , T}(\mathcal A) = 0\).

Proof

For \(T\) large enough, we can apply Theorem 2 for any \(\lambda \in D(\mu )\), hence we can take an infimum over \(\lambda \in D(\mu )\) to get

\begin{align*} (\sup _{\lambda \in D(\mu )}R_{H,T}(\mathcal A, \lambda ))^{-1} (1 - p_{\mu , T}(\mathcal A)) - \frac{\log 2}{\sqrt{T}} & \le \inf _{\lambda \in D(\mu )} H(\lambda )\sum _{k=1}^K \mathbb {E}_\mu [\frac{N_{T,k}}{T}] \mathrm{KL}(\mu _k, \lambda _k) \\ & \le \max _{\omega \in \triangle _K} \inf _{\lambda \in D(\mu )} H(\lambda )\sum _{k=1}^K \omega _k \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

Taking a limit when \(T \to +\infty \) and using \(\lim _{T \to + \infty } p_{\mu , T}(\mathcal A) = 0\), we get the inequality we want to prove.

Corollary 2

For all \(x \in \mathbb {R}\), let \(\mathrm{Alt}_x(\mu ) = \mathrm{Alt}(\mu ) \cap \{ \lambda \in \mathcal D \mid H(\lambda ) \le x\} \). For all consistent algorithm families \(\mathcal A\),

\begin{align*} (\liminf _{T\to +\infty }\sup _{\lambda \in \mathcal D}R_{H,T}(\mathcal A, \lambda ))^{-1} \le \liminf _{x \to \infty } \inf _{\mu \in \mathcal D} \max _{\omega \in \triangle _K} \inf _{\lambda \in \mathrm{Alt}_x(\mu )} H(\lambda )\sum _{k=1}^K \omega _k \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

Proof

Let \(\mu \in \mathcal D\) and \(x {\gt} 0\). We apply Theorem 10 to \(\mathrm{Alt}_{x}(\mu )\).

\begin{align*} (\liminf _{T\to +\infty }\sup _{\lambda \in \mathrm{Alt}_x(\mu )}R_{H,T}(\mathcal A, \lambda ))^{-1} \le \max _{\omega \in \triangle _K} \inf _{\lambda \in \mathrm{Alt}_{x}(\mu )} H(\lambda )\sum _{k=1}^K \omega _k \mathrm{KL}(\mu _k, \lambda _k) \: . \end{align*}

The left hand side is larger than \((\liminf _{T\to +\infty }\sup _{\lambda \in \mathcal D} R_{H,T}(\mathcal A, \lambda ))^{-1}\), which is now independent of \(\mu \) and \(x\). We then take on the right hand side first an infimum over \(\mu \), then a liminf over \(x\). Doing it in this order leads to the tighter bound (compared to \(\inf _\mu \liminf _x\)).