LeanMachineLearning.SequentialLearning.Algorithms.RoundRobin

Round-Robin algorithm #

This algorithm chooses each of finitely many actions in a round-robin fashion. That is, if there are K actions numbered from 0 to K - 1, then at time n it chooses the action n % K.
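For instance, with K = 3 actions the schedule cycles through 0, 1, 2 and repeats. A quick sanity check of the indexing (an illustration, not part of the library):

```lean
-- Round-robin schedule for K = 3 over the first six time steps
#eval (List.range 6).map (· % 3)  -- [0, 1, 2, 0, 1, 2]
```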

Main definitions #

theorem sum_mod_range {K : ℕ} (hK : 0 < K) (a : Fin K) :
(∑ s ∈ Finset.range K, if ⟨s % K, ⋯⟩ = a then 1 else 0) = 1
theorem sum_mod_range_mul {K : ℕ} (hK : 0 < K) (m : ℕ) (a : Fin K) :
(∑ s ∈ Finset.range (K * m), if ⟨s % K, ⋯⟩ = a then 1 else 0) = m
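Each residue class modulo K appears exactly once in every block of K consecutive indices, so over K * m indices it appears m times. A small numeric check of this counting argument (an illustration, not part of the library):

```lean
-- For K = 3, m = 2: indices in range (3 * 2) with residue 1 are 1 and 4
#eval ((List.range (3 * 2)).filter (fun s => s % 3 == 1)).length  -- 2
```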
noncomputable def Learning.RoundRobin.nextAction {K : ℕ} (hK : 0 < K) (n : ℕ) :
Fin K

Action chosen by the Round-Robin algorithm at time n + 1. This is action (n + 1) % K.

Equations
Instances For
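A plausible shape for this definition, sketched with the standard anonymous `Fin` constructor and `Nat.mod_lt`; the library's actual term may differ:

```lean
-- Hypothetical reconstruction: bundle (n + 1) % K with the proof that it is < K
def nextAction {K : ℕ} (hK : 0 < K) (n : ℕ) : Fin K :=
  ⟨(n + 1) % K, Nat.mod_lt _ hK⟩
```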
noncomputable def Learning.roundRobinAlgorithm {K : ℕ} (hK : 0 < K) :

The Round-Robin algorithm: a deterministic algorithm that chooses action n % K at time n.

Equations
Instances For
theorem Learning.RoundRobin.action_zero {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P 0) :
A 0 =ᵐ[P] fun (x : Ω) => ⟨0, hK⟩
theorem Learning.RoundRobin.action_ae_eq_roundRobinNextAction {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (n : ℕ) (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (n + 1)) :
A (n + 1) =ᵐ[P] fun (x : Ω) => nextAction hK n
theorem Learning.RoundRobin.action_ae_eq {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (n : ℕ) (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P n) :
A n =ᵐ[P] fun (x : Ω) => ⟨n % K, ⋯⟩

The action chosen at time n is the action n % K.

theorem Learning.RoundRobin.pullCount_mul {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (m : ℕ) (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1)) (a : Fin K) :
pullCount A a (K * m) =ᵐ[P] fun (x : Ω) => m

At time K * m, the number of times each action is chosen is equal to m.

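This counting fact can be checked numerically for a deterministic n % K schedule (an illustration of the statement, not the library's `pullCount`):

```lean
-- With K = 3, m = 4: action 1 is chosen at times 1, 4, 7, 10 within range 12
#eval ((List.range (3 * 4)).filter (fun n => n % 3 == 1)).length  -- 4
```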
theorem Learning.RoundRobin.pullCount_eq_one {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1)) (a : Fin K) :
pullCount A a K =ᵐ[P] fun (x : Ω) => 1
theorem Learning.RoundRobin.time_gt_of_pullCount_gt_one {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1)) (a : Fin K) :
∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), 1 < pullCount A a n ω → K < n
theorem Learning.RoundRobin.pullCount_pos_of_time_ge {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1)) :
∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), K ≤ n → ∀ (b : Fin K), 0 < pullCount A b n ω
theorem Learning.RoundRobin.pullCount_pos_of_pullCount_gt_one {K : ℕ} {hK : 0 < K} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1)) (a : Fin K) :
∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), 1 < pullCount A a n ω → ∀ (b : Fin K), 0 < pullCount A b n ω