Round-Robin algorithm #
That algorithm chooses each of finitely many actions in a round-robin fashion.
That is, if there are K actions numbered from 0 to K - 1, then at time n it chooses
he action n % K.
Main definitions #
roundRobinAlgorithm: the Round-Robin algorithm.
The Round-Robin algorithm: deterministic algorithm that chooses action n % K at time n.
Equations
- Learning.roundRobinAlgorithm hK = Learning.detAlgorithm (fun (n : ℕ) (x : ↥(Finset.Iic n) → Fin K × ℝ) => Learning.RoundRobin.nextAction hK n) ⋯ ⟨0, hK⟩
Instances For
theorem
Learning.RoundRobin.action_zero
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P 0)
:
theorem
Learning.RoundRobin.action_ae_eq_roundRobinNextAction
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(n : ℕ)
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (n + 1))
:
theorem
Learning.RoundRobin.action_ae_eq
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(n : ℕ)
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P n)
:
The action chosen at time n is the action n % K.
theorem
Learning.RoundRobin.pullCount_mul
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(m : ℕ)
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K * m - 1))
(a : Fin K)
:
At time K * m, the number of times each action is chosen is equal to m.
theorem
Learning.RoundRobin.pullCount_eq_one
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
:
theorem
Learning.RoundRobin.time_gt_of_pullCount_gt_one
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
:
theorem
Learning.RoundRobin.pullCount_pos_of_time_ge
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
:
theorem
Learning.RoundRobin.pullCount_pos_of_pullCount_gt_one
{K : ℕ}
{hK : 0 < K}
{ν : ProbabilityTheory.Kernel (Fin K) ℝ}
[ProbabilityTheory.IsMarkovKernel ν]
{Ω : Type u_1}
{mΩ : MeasurableSpace Ω}
{P : MeasureTheory.Measure Ω}
[MeasureTheory.IsProbabilityMeasure P]
{A : ℕ → Ω → Fin K}
{R : ℕ → Ω → ℝ}
[Nonempty (Fin K)]
(h : IsAlgEnvSeqUntil A R (roundRobinAlgorithm hK) (stationaryEnv ν) P (K - 1))
(a : Fin K)
: