
LeanBandits.Bandit.Regret

Regret, gap, best arm

noncomputable def Bandits.gap {α : Type u_1} {mα : MeasurableSpace α} (ν : ProbabilityTheory.Kernel α ℝ) (a : α) :
ℝ

Gap of an action a: difference between the highest mean of the actions and the mean of a.

Equations
Instances For
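
The defining equation is collapsed above. As a point of reference, here is a minimal self-contained sketch of one plausible definition, in which the gap is the supremum of the arm means minus the mean of a (the name gapSketch and the exact form are assumptions, not the source):

    import Mathlib
    open MeasureTheory ProbabilityTheory

    /-- Sketch only: suboptimality gap as best mean minus the mean of `a`. -/
    noncomputable def gapSketch {α : Type*} [MeasurableSpace α] (ν : Kernel α ℝ) (a : α) : ℝ :=
      (⨆ b, ∫ x, x ∂(ν b)) - ∫ x, x ∂(ν a)
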
    theorem Bandits.gap_nonneg {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {a : α} [Fintype α] :
    0 ≤ gap ν a
    noncomputable def Bandits.regret {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} (ν : ProbabilityTheory.Kernel α ℝ) (A : ℕ → Ω → α) (t : ℕ) (ω : Ω) :
    ℝ

    Regret of a sequence of pulls A : ℕ → Ω → α at time t for the reward kernel ν : Kernel α ℝ.

    Equations
    Instances For
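
    The defining equation is likewise collapsed. A sketch of one plausible definition, chosen to be consistent with regret_eq_sum_gap below (the name regretSketch and the exact form are assumptions, not the source):

        import Mathlib
        open MeasureTheory ProbabilityTheory

        /-- Sketch only: regret after `t` rounds as `t` times the best mean
        minus the sum of the means of the pulled arms. -/
        noncomputable def regretSketch {α Ω : Type*} [MeasurableSpace α]
            (ν : Kernel α ℝ) (A : ℕ → Ω → α) (t : ℕ) (ω : Ω) : ℝ :=
          (t : ℝ) * (⨆ a, ∫ x, x ∂(ν a)) - ∑ s ∈ Finset.range t, ∫ x, x ∂(ν (A s ω))
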
      theorem Bandits.regret_eq_sum_gap {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} {t : ℕ} :
      regret ν A t ω = ∑ s ∈ Finset.range t, gap ν (A s ω)
      theorem Bandits.regret_nonneg {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} {t : ℕ} [Fintype α] :
      0 ≤ regret ν A t ω
      theorem Bandits.gap_eq_zero_of_regret_eq_zero {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} {t : ℕ} [Fintype α] (hr : regret ν A t ω = 0) {s : ℕ} (hs : s < t) :
      gap ν (A s ω) = 0
      theorem Bandits.regret_eq_sum_pullCount_mul_gap {α : Type u_1} {Ω : Type u_2} [DecidableEq α] {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} {t : ℕ} [Fintype α] :
      regret ν A t ω = ∑ a : α, ↑(Learning.pullCount A a t ω) * gap ν a
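
      As a concrete illustration of this decomposition (hypothetical numbers, not from the source): with two arms a₁, a₂ where gap ν a₁ = 0 and gap ν a₂ = 0.5, if after t = 100 rounds the arm a₂ has been pulled 20 times (and a₁ the remaining 80 times), then regret ν A 100 ω = 80 * 0 + 20 * 0.5 = 10.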
      noncomputable def Bandits.bestArm {α : Type u_1} {mα : MeasurableSpace α} [Fintype α] [Nonempty α] (ν : ProbabilityTheory.Kernel α ℝ) :
      α

      Action with the highest mean.

      Equations
      Instances For
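
        The definition is collapsed above. A sketch of one plausible construction, picking a maximiser of the mean over the finitely many arms (the name bestArmSketch and the use of Finset.exists_max_image are assumptions, not the source):

            import Mathlib
            open MeasureTheory ProbabilityTheory

            /-- Sketch only: choose an arm whose mean reward is maximal. -/
            noncomputable def bestArmSketch {α : Type*} [MeasurableSpace α] [Fintype α] [Nonempty α]
                (ν : Kernel α ℝ) : α :=
              (Finset.exists_max_image Finset.univ (fun a => ∫ x, x ∂(ν a)) Finset.univ_nonempty).choose
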
        theorem Bandits.le_bestArm {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} [Fintype α] [Nonempty α] (a : α) :
        ∫ (x : ℝ), id x ∂ν a ≤ ∫ (x : ℝ), id x ∂ν (bestArm ν)
        theorem Bandits.gap_eq_bestArm_sub {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {a : α} [Fintype α] [Nonempty α] :
        gap ν a = ∫ (x : ℝ), id x ∂ν (bestArm ν) - ∫ (x : ℝ), id x ∂ν a
        @[simp]
        theorem Bandits.gap_bestArm {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} [Fintype α] [Nonempty α] :
        gap ν (bestArm ν) = 0
        theorem Bandits.integral_eq_of_gap_eq_zero {α : Type u_1} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {a : α} [Fintype α] [Nonempty α] (hg : gap ν a = 0) :
        ∫ (x : ℝ), id x ∂ν (bestArm ν) = ∫ (x : ℝ), id x ∂ν a
        theorem Bandits.avg_mean_reward_tendsto_of_sublinear_regret {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} (hr : (fun (x : ℕ) => regret ν A x ω) =o[Filter.atTop] fun (t : ℕ) => ↑t) :
        Filter.Tendsto (fun (t : ℕ) => (∑ s ∈ Finset.range t, ∫ (x : ℝ), id x ∂ν (A s ω)) / ↑t) Filter.atTop (nhds (⨆ (a : α), ∫ (x : ℝ), id x ∂ν a))

        If the regret is sublinear, the average mean reward tends to the highest mean of the arms.
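
        Proof idea (a sketch, assuming regret is defined as in the sketch above): the average mean reward over the first t rounds equals the highest mean minus regret ν A t ω / t, and the sublinearity hypothesis says precisely that regret ν A t ω / t tends to 0.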

        theorem Bandits.pullCount_rate_tendsto_of_sublinear_regret {α : Type u_1} {Ω : Type u_2} [DecidableEq α] {mα : MeasurableSpace α} {ν : ProbabilityTheory.Kernel α ℝ} {A : ℕ → Ω → α} {ω : Ω} {a : α} [Fintype α] (hr : (fun (x : ℕ) => regret ν A x ω) =o[Filter.atTop] fun (t : ℕ) => ↑t) (hg : 0 < gap ν a) :
        Filter.Tendsto (fun (t : ℕ) => ↑(Learning.pullCount A a t ω) / ↑t) Filter.atTop (nhds 0)

        If the regret is sublinear, the rate of suboptimal arm pulls tends to zero.
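
        Proof idea (a sketch): by regret_eq_sum_pullCount_mul_gap and gap_nonneg, ↑(Learning.pullCount A a t ω) * gap ν a ≤ regret ν A t ω, hence ↑(Learning.pullCount A a t ω) / ↑t ≤ regret ν A t ω / (↑t * gap ν a); since 0 < gap ν a and the regret is sublinear, the right-hand side tends to 0.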