Documentation

LeanBandits.Regret

Regret #

Definitions of regret, gaps, pull counts #

noncomputable def Bandits.regret {α : Type u_1} { : MeasurableSpace α} (ν : ProbabilityTheory.Kernel α ) (k : α) (t : ) :

Regret of a sequence of pulls k : ℕ → α at time t for the reward kernel ν ; Kernel α ℝ.

Equations
Instances For
    noncomputable def Bandits.gap {α : Type u_1} { : MeasurableSpace α} (ν : ProbabilityTheory.Kernel α ) (a : α) :

    Gap of an arm a: difference between the highest mean of the arms and the mean of a.

    Equations
    Instances For
      theorem Bandits.gap_nonneg {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } {a : α} [Fintype α] :
      0 gap ν a
      noncomputable def Bandits.pullCount {α : Type u_1} (k : α) (a : α) (t : ) :

      Number of times arm a was pulled up to time t.

      Equations
      Instances For
        theorem Bandits.sum_pullCount_mul {α : Type u_1} [Fintype α] (k : α) (f : α) (t : ) :
        a : α, (pullCount k a t) * f a = sFinset.range t, f (k s)
        theorem Bandits.sum_pullCount {α : Type u_1} {k : α} {t : } [Fintype α] :
        a : α, pullCount k a t = t
        theorem Bandits.regret_eq_sum_pullCount_mul_gap {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } {k : α} {t : } [Fintype α] :
        regret ν k t = a : α, (pullCount k a t) * gap ν a
        noncomputable def Bandits.bestArm {α : Type u_1} { : MeasurableSpace α} [Fintype α] [Nonempty α] (ν : ProbabilityTheory.Kernel α ) :
        α

        Arm with the highest mean.

        Equations
        Instances For
          theorem Bandits.le_bestArm {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } [Fintype α] [Nonempty α] (a : α) :
          (x : ), id x ν a (x : ), id x ν (bestArm ν)
          theorem Bandits.gap_eq_bestArm_sub {α : Type u_1} { : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α } {a : α} [Fintype α] [Nonempty α] :
          gap ν a = (x : ), id x ν (bestArm ν) - (x : ), id x ν a