Documentation

LeanBandits.BanditAlgorithms.RoundRobin

Round-Robin algorithm #

That algorithm pulls each arm in a round-robin fashion.

noncomputable def Bandits.RoundRobin.nextArm {K : } (hK : 0 < K) (n : ) :
Fin K

Arm pulled by the Round-Robin algorithm at time n + 1. This is arm n % K.

Equations
Instances For
    noncomputable def Bandits.roundRobinAlgorithm {K : } (hK : 0 < K) :

    The Round-Robin algorithm: deterministic algorithm that chooses the next arm according to RoundRobin.nextArm.

    Equations
    Instances For
      theorem Bandits.RoundRobin.arm_zero {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P 0) :
      A 0 =ᵐ[P] fun (x : Ω) => 0, hK
      theorem Bandits.RoundRobin.arm_ae_eq_roundRobinNextArm {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (n : ) (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (n + 1)) :
      A (n + 1) =ᵐ[P] fun (x : Ω) => nextArm hK n
      theorem Bandits.RoundRobin.arm_ae_eq {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (n : ) (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P n) :
      A n =ᵐ[P] fun (x : Ω) => n % K,

      The arm pulled at time n is the arm n % K.

      theorem Bandits.RoundRobin.pullCount_mul {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (m : ) (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K * m - 1)) (a : Fin K) :
      Learning.pullCount A a (K * m) =ᵐ[P] fun (x : Ω) => m

      At time K * m, the number of pulls of each arm is equal to m.

      theorem Bandits.RoundRobin.pullCount_eq_one {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K - 1)) (a : Fin K) :
      Learning.pullCount A a K =ᵐ[P] fun (x : Ω) => 1
      theorem Bandits.RoundRobin.time_gt_of_pullCount_gt_one {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K - 1)) (a : Fin K) :
      ∀ᵐ (ω : Ω) P, ∀ (n : ), 1 < Learning.pullCount A a n ωK < n
      theorem Bandits.RoundRobin.pullCount_pos_of_time_ge {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K - 1)) :
      ∀ᵐ (ω : Ω) P, ∀ (n : ), K n∀ (b : Fin K), 0 < Learning.pullCount A b n ω
      theorem Bandits.RoundRobin.pullCount_pos_of_pullCount_gt_one {K : } {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) } [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ΩFin K} {R : Ω} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (Learning.stationaryEnv ν) P (K - 1)) (a : Fin K) :
      ∀ᵐ (ω : Ω) P, ∀ (n : ), 1 < Learning.pullCount A a n ω∀ (b : Fin K), 0 < Learning.pullCount A b n ω