6 Practical Algorithms

The algorithms presented in the previous chapters are theoretical algorithms, in the very basic sense that they perform operations on real numbers to decide what they will sample, and those numbers are not representable in a computer. In practice, we need to use floating point numbers (or bounded integers or rationals), both for the algorithm and for the rewards that are sampled from the environment.

We will want to perform experiments about the algorithms for which we have theoretical guarantees, and that means using an implementation of the algorithms that can actually run on a computer. However if we use float in the specification of the algorithms, we won’t be able to prove anything, because those numbers lack any good algebraic properties.

Our strategy to resolve that conflict is to describe first the algorithm as a function of real numbers, and then automatically generate a version of the algorithm that uses floating point numbers, by using a translation tactic (similar to to_additive). The result is that we have only one hand-written algorithm, and even though the theorems don’t directly apply to the floating point version, we are assured that the only difference between the experimental and theoretical algorithms is the result of the translation tactic. This is in stark contrast with the situation in the literature, where the theoretical algorithm is specified in pseudo-code in LaTeX and the experiments are done on a separately written implementation, typically in Python. The experimental version may or may not be also implementing a series of tricks or optimizations of constants.

TODO: describe the translation tactic. It should basically be like to_additive.

Alternative strategy: atomic typeclasses

The ETC algorithm makes sense for any reward type on which we can define addition and the maximum. As written, it also requires division by natural numbers but we could equivalently compare sums of rewards instead of empirical means. The UCB algorithm also requires multiplication, a square root and a logarithm.

We can write the algorithm under typeclasses stating that those operations exist. Those typeclasses would be satisfied by both the real numbers and floating point numbers. That way, the algorithm is only written once. We would still only get theorems about the real number version because proving theorems will actually require algebraic properties on the reward type, but we don’t need to write a translation tactic.

Error sensitivity analysis

A more theoretically satisfying but more involved method to link the Float and Real worlds is to assume in the theoretical analysis that each of the operations performed by the algorithm is imprecise: it is only supposed to give the expected result up to a specified error margin. Then we could prove regret bounds on the algorithm that apply under that imprecise assumption.

Finally, we need to prove such error bounds for Float operations. The bounds might depend on the magnitude of the numbers involved.

This method would give theoretical guarantees about the algorithm that is actually running, but the theoretical analysis would depart from the standard literature on bandits and would be more tedious because of the need to track error terms.