UCB algorithm #
noncomputable def
Bandits.ucbWidth'
{α : Type u_1}
[DecidableEq α]
(c : ℝ)
(n : ℕ)
(h : { x : ℕ // x ∈ Finset.Iic n } → α × ℝ)
(a : α)
:
The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.
Equations
- Bandits.ucbWidth' c n h a = √(c * Real.log (↑n + 1) / ↑(Bandits.pullCount' n h a))
Instances For
noncomputable def
Bandits.ucbNextArm
{α : Type u_1}
{mα : MeasurableSpace α}
[Nonempty α]
[DecidableEq α]
[Finite α]
[Encodable α]
[MeasurableSingletonClass α]
(c : ℝ)
(n : ℕ)
(h : { x : ℕ // x ∈ Finset.Iic n } → α × ℝ)
:
α
Arm pulled by the UCB algorithm at time n + 1
.
Equations
- Bandits.ucbNextArm c n h = measurableArgmax (fun (h : { x : ℕ // x ∈ Finset.Iic n } → α × ℝ) (a : α) => Bandits.empMean' n h a + Bandits.ucbWidth' c n h a) h
Instances For
theorem
Bandits.measurable_ucbNextArm
{α : Type u_1}
{mα : MeasurableSpace α}
[Nonempty α]
[DecidableEq α]
[Finite α]
[Encodable α]
[MeasurableSingletonClass α]
(c : ℝ)
(n : ℕ)
:
Measurable (ucbNextArm c n)
noncomputable def
Bandits.ucbAlgorithm
{α : Type u_1}
{mα : MeasurableSpace α}
[Nonempty α]
[DecidableEq α]
[Finite α]
[Encodable α]
[MeasurableSingletonClass α]
(c : ℝ)
:
The UCB algorithm.
Equations
Instances For
theorem
Bandits.gap_ucbArm_le_two_mul_ucbWidth
{α : Type u_1}
{mα : MeasurableSpace α}
{ν : ProbabilityTheory.Kernel α ℝ}
{t : ℕ}
[Fintype α]
[Nonempty α]
{c : ℝ}
{μ : α → ℝ}
{N : α → ℕ}
(h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ μ (bestArm ν) + ucbWidth c N t (bestArm ν))
(h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) ≤ ∫ (x : ℝ), id x ∂ν (ucbArm c μ N t))
:
theorem
Bandits.N_ucbArm_le
{α : Type u_1}
{mα : MeasurableSpace α}
{ν : ProbabilityTheory.Kernel α ℝ}
{t : ℕ}
[Fintype α]
[Nonempty α]
{c : ℝ}
{μ : α → ℝ}
{N : α → ℕ}
(h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ μ (bestArm ν) + ucbWidth c N t (bestArm ν))
(h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) ≤ ∫ (x : ℝ), id x ∂ν (ucbArm c μ N t))
: