UCB algorithm

noncomputable def Bandits.ucbWidth' {α : Type u_1} [DecidableEq α] (c : ℝ) (n : ℕ) (h : { x : ℕ // x ∈ Finset.Iic n } → α × ℝ) (a : α) :

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

The arm pulled by the UCB algorithm at time n + 1.

The UCB algorithm.

noncomputable def Bandits.ucbWidth {α : Type u_1} (c : ℝ) (N : α → ℕ) (t : ℕ) (a : α) :

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

noncomputable def Bandits.ucbArm {α : Type u_1} [Fintype α] [Nonempty α] (c : ℝ) (μ : α → ℝ) (N : α → ℕ) (t : ℕ) :

The arm pulled by the UCB algorithm.

theorem Bandits.le_ucb {α : Type u_1} {t : ℕ} [Fintype α] [Nonempty α] {c : ℝ} {μ : α → ℝ} {N : α → ℕ} (a : α) :

μ a + ucbWidth c N t a ≤ μ (ucbArm c μ N t) + ucbWidth c N t (ucbArm c μ N t)
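
Read informally, this says that `ucbArm c μ N t` maximizes the index `μ a + ucbWidth c N t a` over all arms. The shorthand below is introduced here for illustration and is not notation from the library: $\hat a_t$ stands for `ucbArm c μ N t` and $w_t(a)$ for `ucbWidth c N t a`.

```latex
% Shorthand (not library notation):
%   \hat a_t := ucbArm c μ N t,   w_t(a) := ucbWidth c N t a
\mu(a) + w_t(a) \;\le\; \mu(\hat a_t) + w_t(\hat a_t) \quad \text{for all } a,
\qquad\text{i.e.}\qquad
\hat a_t \in \operatorname*{arg\,max}_{a} \bigl( \mu(a) + w_t(a) \bigr).
```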

theorem Bandits.gap_ucbArm_le_two_mul_ucbWidth {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {t : ℕ} [Fintype α] [Nonempty α] {c : ℝ} {μ : α → ℝ} {N : α → ℕ} (h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ μ (bestArm ν) + ucbWidth c N t (bestArm ν)) (h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) ≤ ∫ (x : ℝ), id x ∂ν (ucbArm c μ N t)) :

gap ν (ucbArm c μ N t) ≤ 2 * ucbWidth c N t (ucbArm c μ N t)
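
A sketch of why this follows from the two hypotheses together with `Bandits.le_ucb`. The shorthand is introduced here for illustration: $a^\ast$ is `bestArm ν`, $\hat a$ is `ucbArm c μ N t`, $m(a) = \int x \, d\nu(a)$ is the mean reward of arm $a$, and $w(a)$ is `ucbWidth c N t a`. It also assumes that `gap ν a` unfolds to $m(a^\ast) - m(a)$, which this page does not show.

```latex
% Chain the two hypotheses through le_ucb (applied at a = a^\ast).
\begin{align*}
m(a^\ast) &\le \mu(a^\ast) + w(a^\ast) && \text{(h\_best)} \\
          &\le \mu(\hat a) + w(\hat a) && \text{(le\_ucb with } a = a^\ast\text{)} \\
          &\le m(\hat a) + 2\,w(\hat a) && \text{(h\_ucb, rearranged)}
\end{align*}
% Subtracting m(\hat a) from both ends gives the claim,
% under the assumed reading gap = m(a^\ast) - m(\hat a):
\[
  m(a^\ast) - m(\hat a) \;\le\; 2\,w(\hat a).
\]
```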

theorem Bandits.N_ucbArm_le {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {t : ℕ} [Fintype α] [Nonempty α] {c : ℝ} {μ : α → ℝ} {N : α → ℕ} (h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ μ (bestArm ν) + ucbWidth c N t (bestArm ν)) (h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) ≤ ∫ (x : ℝ), id x ∂ν (ucbArm c μ N t)) :

↑(N (ucbArm c μ N t)) ≤ 4 * c * Real.log ↑t / gap ν (ucbArm c μ N t) ^ 2
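
One way to see where the constant 4 comes from, as a sketch under assumptions that this page does not confirm: suppose `ucbWidth c N t a` has the usual UCB form $\sqrt{c \log t / N(a)}$ with $c \log t \ge 0$, that $N(\hat a) > 0$, and that the gap is positive. Here $\hat a$ is shorthand for `ucbArm c μ N t`, $w(\hat a)$ for `ucbWidth c N t (ucbArm c μ N t)`, and $\Delta$ for `gap ν (ucbArm c μ N t)`. Squaring the bound from `Bandits.gap_ucbArm_le_two_mul_ucbWidth` and rearranging gives:

```latex
% Assumptions (not taken from the library source):
%   w(\hat a) = \sqrt{c \log t / N(\hat a)},  c \log t \ge 0,  N(\hat a) > 0,  \Delta > 0.
\begin{align*}
\Delta \le 2\,w(\hat a)
  &\;\Longrightarrow\; \Delta^2 \le 4\,w(\hat a)^2 = \frac{4\,c \log t}{N(\hat a)} \\
  &\;\Longrightarrow\; N(\hat a) \le \frac{4\,c \log t}{\Delta^2}.
\end{align*}
```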
