Documentation

LeanBandits.UCB

UCB algorithm #

noncomputable def Bandits.ucbWidth' {α : Type u_1} [DecidableEq α] (c : ) (n : ) (h : { x : // x Finset.Iic n }α × ) (a : α) :

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

Equations
Instances For
    noncomputable def Bandits.ucbNextArm {α : Type u_1} { : MeasurableSpace α} [Nonempty α] [DecidableEq α] [Finite α] [Encodable α] [MeasurableSingletonClass α] (c : ) (n : ) (h : { x : // x Finset.Iic n }α × ) :
    α

    Arm pulled by the UCB algorithm at time n + 1.

    Equations
    Instances For
      noncomputable def Bandits.ucbAlgorithm {α : Type u_1} { : MeasurableSpace α} [Nonempty α] [DecidableEq α] [Finite α] [Encodable α] [MeasurableSingletonClass α] (c : ) :

      The UCB algorithm.

      Equations
      Instances For
        noncomputable def Bandits.ucbWidth {α : Type u_1} (c : ) (N : α) (t : ) (a : α) :

        The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

        Equations
        Instances For
          noncomputable def Bandits.ucbArm {α : Type u_1} [Fintype α] [Nonempty α] (c : ) (μ : α) (N : α) (t : ) :
          α

          The arm pulled by the UCB algorithm.

          Equations
          Instances For
            theorem Bandits.le_ucb {α : Type u_1} {t : } [Fintype α] [Nonempty α] {c : } {μ : α} {N : α} (a : α) :
            μ a + ucbWidth c N t a μ (ucbArm c μ N t) + ucbWidth c N t (ucbArm c μ N t)
            theorem Bandits.gap_ucbArm_le_two_mul_ucbWidth {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } {t : } [Fintype α] [Nonempty α] {c : } {μ : α} {N : α} (h_best : (x : ), id x ν (bestArm ν) μ (bestArm ν) + ucbWidth c N t (bestArm ν)) (h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) (x : ), id x ν (ucbArm c μ N t)) :
            gap ν (ucbArm c μ N t) 2 * ucbWidth c N t (ucbArm c μ N t)
            theorem Bandits.N_ucbArm_le {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } {t : } [Fintype α] [Nonempty α] {c : } {μ : α} {N : α} (h_best : (x : ), id x ν (bestArm ν) μ (bestArm ν) + ucbWidth c N t (bestArm ν)) (h_ucb : μ (ucbArm c μ N t) - ucbWidth c N t (ucbArm c μ N t) (x : ), id x ν (ucbArm c μ N t)) :
            (N (ucbArm c μ N t)) 4 * c * Real.log t / gap ν (ucbArm c μ N t) ^ 2