The Explore-Then-Commit Algorithm
Arm pulled by the ETC algorithm at time n + 1.
- For n < K * m - 1, this is arm (n + 1) % K (so that the arm pulled at time t is t % K throughout the exploration phase).
- For n = K * m - 1, this is the arm with the highest empirical mean after the exploration phase.
- For n ≥ K * m, this is the same arm as at time n.
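The case split above fully determines the schedule once the committed arm is known. A minimal standalone Lean sketch, assuming the empirically best arm is handed in as a parameter (`etcArm` and `bestArm` are hypothetical names, not the library's; the real `ETC.nextArm` instead computes the next arm from the observed history):

```lean
/-- Sketch of the ETC schedule: the arm pulled at time `n`, given `bestArm`,
the arm with the highest empirical mean after the exploration phase. -/
def etcArm (K m : Nat) (hK : 0 < K) (bestArm : Fin K) (n : Nat) : Fin K :=
  if n < K * m then
    -- Exploration phase: cycle through the K arms in round-robin order.
    ⟨n % K, Nat.mod_lt n hK⟩
  else
    -- Commit phase: keep pulling the empirically best arm.
    bestArm
```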
The next arm pulled by ETC is chosen in a measurable way.
The Explore-Then-Commit algorithm: a deterministic algorithm that chooses the next arm according
to ETC.nextArm.
Equations
- Bandits.etcAlgorithm hK m = Learning.detAlgorithm (Bandits.ETC.nextArm hK m) ⋯ ⟨0, hK⟩
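In this equation, ⟨0, hK⟩ is presumably the arm pulled at time 0 (arm 0, well-formed since hK : 0 < K), and the elided argument ⋯ is presumably the measurability proof for ETC.nextArm mentioned above.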
For n < K * m, the arm pulled at time n is the arm n % K.
The arm pulled at time K * m is the arm with the highest empirical mean after the exploration
phase.
For n ≥ K * m, the arm pulled at time n + 1 is the same as the arm pulled at time n.
For n ≥ K * m, the arm pulled at time n is the same as the arm pulled at time K * m.
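Continuing the hypothetical `etcArm` sketch from above, both of these statements collapse to the fact that the schedule is constant on the commit phase:

```lean
/-- In the `etcArm` sketch: past the exploration phase, the pulled arm is
always the committed arm (a sketch of the two statements above). -/
theorem etcArm_eq_bestArm (K m : Nat) (hK : 0 < K) (bestArm : Fin K)
    {n : Nat} (h : K * m ≤ n) :
    etcArm K m hK bestArm n = bestArm := by
  unfold etcArm
  exact if_neg (Nat.not_lt.mpr h)
```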
At time K * m, the number of pulls of each arm is equal to m.
For n ≥ K * m, the number of pulls of arm a at time n is equal to m + (n - K * m) if a is the
arm with the highest empirical mean after the exploration phase, and to m otherwise.
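Writing $T_a(n)$ for the number of pulls of arm $a$ up to time $n$ and $\hat a$ for the empirically best arm after exploration (notation assumed here for exposition), the two statements above combine into

$$T_a(n) = m + (n - Km)\,\mathbf{1}\{a = \hat a\} \qquad \text{for } n \ge Km.$$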
If the algorithm chooses arm a at time K * m, then the total reward obtained by pulling arm a
during the exploration phase is at least the total reward obtained by pulling the optimal arm.
The probability that the ETC algorithm chooses arm a at time K * m is at most
exp(- m * Δ_a^2 / 4), where Δ_a is the suboptimality gap of arm a.
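A sketch of the standard argument behind this bound, assuming 1-subgaussian rewards (the assumption under which the constant 4 is the usual one): choosing arm $a$ at time $Km$ forces $\hat\mu_a \ge \hat\mu_{a^*}$, where $a^*$ is the optimal arm, and the difference of the two empirical means over $m$ pulls each is $\sqrt{2/m}$-subgaussian, so

$$\mathbb{P}(\hat\mu_a \ge \hat\mu_{a^*}) = \mathbb{P}\big(\hat\mu_a - \mu_a - (\hat\mu_{a^*} - \mu_{a^*}) \ge \Delta_a\big) \le \exp\Big(-\frac{m \Delta_a^2}{4}\Big).$$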
Bound on the expectation of the number of pulls of each arm by the ETC algorithm.
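Combining the pull-count identity with the misselection probability above gives the usual form of this bound (in the notation assumed earlier):

$$\mathbb{E}[T_a(n)] \le m + (n - Km)\exp\Big(-\frac{m \Delta_a^2}{4}\Big) \qquad \text{for } n \ge Km.$$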
Regret bound for the ETC algorithm.
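Via the regret decomposition $R_n = \sum_a \Delta_a\,\mathbb{E}[T_a(n)]$, the bound on the expected pull counts yields the classical ETC regret bound (see Lattimore and Szepesvári, *Bandit Algorithms*, Chapter 6), which the formalized statement presumably matches:

$$R_n \le m \sum_a \Delta_a + (n - Km) \sum_a \Delta_a \exp\Big(-\frac{m \Delta_a^2}{4}\Big).$$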