
LeanBandits.BanditAlgorithms.UCB

UCB algorithm

noncomputable def Bandits.ucbWidth' {K : ℕ} (c : ℝ) (n : ℕ) (h : ↑(Finset.Iic n) → Fin K × ℝ) (a : Fin K) :
ℝ

The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

Equations
Instances For
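In the classical UCB analysis the width has the following form; this is a hedged reconstruction from the concentration and pull-count bounds proved further down this page, not a verbatim restatement of the Lean definition. Writing $T_a$ for the number of times arm $a$ occurs in the history $h$ and $t$ for the number of observed rounds,
$$ w_a \;=\; \sqrt{\frac{c \, \log(t+1)}{T_a}} . $$
A larger exploration parameter $c$ widens the confidence interval and makes the algorithm explore more.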
    noncomputable def Bandits.UCB.nextArm {K : ℕ} (hK : 0 < K) (c : ℝ) (n : ℕ) (h : ↑(Finset.Iic n) → Fin K × ℝ) :
    Fin K

    Arm pulled by the UCB algorithm at time n + 1.

    Equations
    • One or more equations did not get rendered due to their size.
    Instances For
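    Informally, and as reflected by `ucbIndex_le_ucbIndex_arm` below: for every $n \ge K$ the arm $A_n$ pulled at time $n$ maximizes the UCB index computed from the first $n$ rounds, i.e., writing $\hat\mu_a(n)$ for the empirical mean `Learning.empMean A R a n` and $w_a(n)$ for `ucbWidth A c a n`,
    $$ \hat\mu_a(n) + w_a(n) \;\le\; \hat\mu_{A_n}(n) + w_{A_n}(n) \qquad \text{for every arm } a . $$
    During the first $K$ rounds the arms are instead pulled in round-robin order (see `forall_arm_eq_mod_of_lt`).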
      theorem Bandits.UCB.measurable_nextArm {K : ℕ} (hK : 0 < K) (c : ℝ) (n : ℕ) :
      Measurable (nextArm hK c n)
      noncomputable def Bandits.ucbAlgorithm {K : ℕ} (hK : 0 < K) (c : ℝ) :

      The UCB algorithm.

      Equations
      Instances For
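      In summary (see `arm_zero`, `forall_arm_eq_mod_of_lt` and `forall_ucbIndex_le_ucbIndex_arm` below), a run of the algorithm almost surely satisfies
      $$ A_n = n \bmod K \ \text{ for } n < K, \qquad A_n \in \operatorname*{argmax}_{a} \big(\hat\mu_a(n) + w_a(n)\big) \ \text{ for } n \ge K , $$
      so each arm is pulled once during the first $K$ rounds, after which the arm with the largest UCB index is chosen.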
        noncomputable def Bandits.UCB.ucbWidth {K : ℕ} {Ω : Type u_1} (A : ℕ → Ω → Fin K) (c : ℝ) (a : Fin K) (n : ℕ) (ω : Ω) :
        ℝ

        The exploration bonus of the UCB algorithm, which corresponds to the width of a confidence interval.

        Equations
        Instances For
          theorem Bandits.UCB.measurable_ucbWidth {K : ℕ} {Ω : Type u_1} { : MeasurableSpace Ω} {A : ℕ → Ω → Fin K} {n : ℕ} (hA : ∀ (n : ℕ), Measurable (A n)) (c : ℝ) (a : Fin K) :
          Measurable (ucbWidth A c a n)
          theorem Bandits.UCB.ucbWidth_eq_ucbWidth' {K : ℕ} {Ω : Type u_1} {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} (c : ℝ) (a : Fin K) (n : ℕ) (ω : Ω) (hn : n ≠ 0) :
          ucbWidth A c a n ω = ucbWidth' c (n - 1) (Learning.IsAlgEnvSeq.hist A R (n - 1) ω) a
          theorem Bandits.UCB.arm_zero {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) :
          A 0 =ᵐ[P] fun (x : Ω) => ⟨0, hK⟩
          theorem Bandits.UCB.arm_ae_eq_ucbNextArm {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (n : ℕ) :
          A (n + 1) =ᵐ[P] fun (ω : Ω) => nextArm hK c n (Learning.IsAlgEnvSeq.hist A R n ω)
          theorem Bandits.UCB.arm_ae_all_eq {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) :
          ∀ᵐ (h : Ω) ∂P, A 0 h = ⟨0, hK⟩ ∧ ∀ (n : ℕ), A (n + 1) h = nextArm hK c n (Learning.IsAlgEnvSeq.hist A R n h)
          theorem Bandits.UCB.ucbIndex_le_ucbIndex_arm {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {n : ℕ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) (hn : K ≤ n) :
          ∀ᵐ (h : Ω) ∂P, Learning.empMean A R a n h + ucbWidth A c a n h ≤ Learning.empMean A R (A n h) n h + ucbWidth A c (A n h) n h
          theorem Bandits.UCB.forall_arm_eq_mod_of_lt {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) :
          ∀ᵐ (h : Ω) ∂P, ∀ n < K, A n h = ⟨n % K, ⋯⟩
          theorem Bandits.UCB.forall_ucbIndex_le_ucbIndex_arm {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) :
          ∀ᵐ (h : Ω) ∂P, ∀ (n : ℕ), K ≤ n → Learning.empMean A R a n h + ucbWidth A c a n h ≤ Learning.empMean A R (A n h) n h + ucbWidth A c (A n h) n h
          theorem Bandits.UCB.forall_arm_prop {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) :
          ∀ᵐ (h : Ω) ∂P, (∀ n < K, A n h = ⟨n % K, ⋯⟩) ∧ ∀ (n : ℕ), K ≤ n → ∀ (a : Fin K), Learning.empMean A R a n h + ucbWidth A c a n h ≤ Learning.empMean A R (A n h) n h + ucbWidth A c (A n h) n h
          theorem Bandits.UCB.pullCount_eq_of_time_eq {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) :
          ∀ᵐ (ω : Ω) ∂P, Learning.pullCount A a K ω = 1
          theorem Bandits.UCB.time_gt_of_pullCount_gt_one {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) :
          ∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), 1 < Learning.pullCount A a n ω → K < n
          theorem Bandits.UCB.pullCount_pos_of_time_ge {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) :
          ∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), K ≤ n → ∀ (b : Fin K), 0 < Learning.pullCount A b n ω
          theorem Bandits.UCB.pullCount_pos_of_pullCount_gt_one {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) :
          ∀ᵐ (ω : Ω) ∂P, ∀ (n : ℕ), 1 < Learning.pullCount A a n ω → ∀ (b : Fin K), 0 < Learning.pullCount A b n ω
          theorem Bandits.UCB.gap_arm_le_two_mul_ucbWidth {K : ℕ} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} {Ω : Type u_1} {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {n : ℕ} {ω : Ω} [Nonempty (Fin K)] (h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ Learning.empMean A R (bestArm ν) n ω + ucbWidth A c (bestArm ν) n ω) (h_arm : Learning.empMean A R (A n ω) n ω - ucbWidth A c (A n ω) n ω ≤ ∫ (x : ℝ), id x ∂ν (A n ω)) (h_le : Learning.empMean A R (bestArm ν) n ω + ucbWidth A c (bestArm ν) n ω ≤ Learning.empMean A R (A n ω) n ω + ucbWidth A c (A n ω) n ω) :
          gap ν (A n ω) ≤ 2 * ucbWidth A c (A n ω) n ω
          theorem Bandits.UCB.pullCount_arm_le {K : ℕ} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} {Ω : Type u_1} {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} {n : ℕ} {ω : Ω} [Nonempty (Fin K)] (hc : 0 ≤ c) (h_best : ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ Learning.empMean A R (bestArm ν) n ω + ucbWidth A c (bestArm ν) n ω) (h_arm : Learning.empMean A R (A n ω) n ω - ucbWidth A c (A n ω) n ω ≤ ∫ (x : ℝ), id x ∂ν (A n ω)) (h_le : Learning.empMean A R (bestArm ν) n ω + ucbWidth A c (bestArm ν) n ω ≤ Learning.empMean A R (A n ω) n ω + ucbWidth A c (A n ω) n ω) (h_gap_pos : 0 < gap ν (A n ω)) (h_pull_pos : 0 < Learning.pullCount A (A n ω) n ω) :
          ↑(Learning.pullCount A (A n ω) n ω) ≤ 4 * c * Real.log (n + 1) / gap ν (A n ω) ^ 2
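          The algebra behind this bound, assuming the width has the form $w_a(n) = \sqrt{c \log(n+1) / T_a(n)}$ with $T_a(n)$ the pull count (a hedged reading of `ucbWidth`, see above): by `gap_arm_le_two_mul_ucbWidth` the three hypotheses give $\Delta_{A_n} \le 2\, w_{A_n}(n)$, and squaring yields
          $$ \Delta_{A_n}^2 \;\le\; \frac{4 c \log(n+1)}{T_{A_n}(n)}, \qquad \text{i.e.} \qquad T_{A_n}(n) \;\le\; \frac{4 c \log(n+1)}{\Delta_{A_n}^2} . $$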
          theorem Bandits.UCB.todo {K : ℕ} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n k : ℕ) (hk : k ≠ 0) :
          (Bandit.streamMeasure ν) {ω : ℕ → Fin K → ℝ | (∑ m ∈ Finset.range k, ω m a) / k + √(c * Real.log (n + 1) / k) ≤ ∫ (x : ℝ), id x ∂ν a} ≤ 1 / (n + 1) ^ (c / 2)
          theorem Bandits.UCB.todo' {K : ℕ} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n k : ℕ) (hk : k ≠ 0) :
          (Bandit.streamMeasure ν) {ω : ℕ → Fin K → ℝ | ∫ (x : ℝ), id x ∂ν a ≤ (∑ m ∈ Finset.range k, ω m a) / k - √(c * Real.log (n + 1) / k)} ≤ 1 / (n + 1) ^ (c / 2)
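          Both statements are Hoeffding-type deviation bounds for 1-sub-Gaussian rewards: for a fixed number of pulls $k$ the empirical mean satisfies $\Pr(\hat\mu_{a,k} \le \mu_a - \varepsilon) \le \exp(-k \varepsilon^2 / 2)$, and symmetrically for the upper deviation. Choosing $\varepsilon = \sqrt{c \log(n+1) / k}$ gives
          $$ \exp\!\Big(-\tfrac{c}{2}\log(n+1)\Big) \;=\; (n+1)^{-c/2} . $$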
          theorem Bandits.UCB.prob_ucbIndex_le {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n : ℕ) :
          P {h : Ω | 0 < Learning.pullCount A a n h ∧ Learning.empMean A R a n h + ucbWidth A c a n h ≤ ∫ (x : ℝ), id x ∂ν a} ≤ 1 / (n + 1) ^ (c / 2 - 1)
          theorem Bandits.UCB.prob_ucbIndex_ge {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n : ℕ) :
          P {h : Ω | 0 < Learning.pullCount A a n h ∧ ∫ (x : ℝ), id x ∂ν a ≤ Learning.empMean A R a n h - ucbWidth A c a n h} ≤ 1 / (n + 1) ^ (c / 2 - 1)
          theorem Bandits.UCB.probReal_ucbIndex_le {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n : ℕ) :
          P.real {h : Ω | 0 < Learning.pullCount A a n h ∧ Learning.empMean A R a n h + ucbWidth A c a n h ≤ ∫ (x : ℝ), id x ∂ν a} ≤ 1 / (n + 1) ^ (c / 2 - 1)
          theorem Bandits.UCB.probReal_ucbIndex_ge {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 ≤ c) (a : Fin K) (n : ℕ) :
          P.real {h : Ω | 0 < Learning.pullCount A a n h ∧ ∫ (x : ℝ), id x ∂ν a ≤ Learning.empMean A R a n h - ucbWidth A c a n h} ≤ 1 / (n + 1) ^ (c / 2 - 1)
          theorem Bandits.UCB.pullCount_le_add_three {K : ℕ} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} {Ω : Type u_1} {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (a : Fin K) (n C : ℕ) (ω : Ω) :
          Learning.pullCount A a n ω ≤ C + 1 + ∑ s ∈ Finset.range n, {s : ℕ | A s ω = a ∧ C < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω ∧ Learning.empMean A R (A s ω) s ω - ucbWidth A c (A s ω) s ω ≤ ∫ (x : ℝ), id x ∂ν (A s ω)}.indicator 1 s + ∑ s ∈ Finset.range n, {s : ℕ | C < Learning.pullCount A a s ω ∧ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω < ∫ (x : ℝ), id x ∂ν (bestArm ν)}.indicator 1 s + ∑ s ∈ Finset.range n, {s : ℕ | C < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν a < Learning.empMean A R a s ω - ucbWidth A c a s ω}.indicator 1 s
          theorem Bandits.UCB.pullCount_le_add_three_ae {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (a : Fin K) (n C : ℕ) (hC : C ≠ 0) :
          ∀ᵐ (ω : Ω) ∂P, Learning.pullCount A a n ω ≤ C + 1 + ∑ s ∈ Finset.range n, {s : ℕ | A s ω = a ∧ C < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω ∧ Learning.empMean A R (A s ω) s ω - ucbWidth A c (A s ω) s ω ≤ ∫ (x : ℝ), id x ∂ν (A s ω)}.indicator 1 s + ∑ s ∈ Finset.range n, {s : ℕ | 0 < Learning.pullCount A (bestArm ν) s ω ∧ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω < ∫ (x : ℝ), id x ∂ν (bestArm ν)}.indicator 1 s + ∑ s ∈ Finset.range n, {s : ℕ | 0 < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν a < Learning.empMean A R a s ω - ucbWidth A c a s ω}.indicator 1 s
          theorem Bandits.UCB.some_sum_eq_zero {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (hc : 0 ≤ c) (a : Fin K) (h_gap : 0 < gap ν a) (n C : ℕ) (hC : C ≠ 0) (hC' : 4 * c * Real.log (n + 1) / gap ν a ^ 2 ≤ ↑C) :
          ∀ᵐ (ω : Ω) ∂P, ∑ s ∈ Finset.range n, {s : ℕ | A s ω = a ∧ C < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν (bestArm ν) ≤ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω ∧ Learning.empMean A R (A s ω) s ω - ucbWidth A c (A s ω) s ω ≤ ∫ (x : ℝ), id x ∂ν (A s ω)}.indicator 1 s = 0
          theorem Bandits.UCB.pullCount_ae_le_add_two {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) (hc : 0 ≤ c) (a : Fin K) (h_gap : 0 < gap ν a) (n C : ℕ) (hC : C ≠ 0) (hC' : 4 * c * Real.log (n + 1) / gap ν a ^ 2 ≤ ↑C) :
          ∀ᵐ (ω : Ω) ∂P, Learning.pullCount A a n ω ≤ C + 1 + ∑ s ∈ Finset.range n, {s : ℕ | 0 < Learning.pullCount A (bestArm ν) s ω ∧ Learning.empMean A R (bestArm ν) s ω + ucbWidth A c (bestArm ν) s ω < ∫ (x : ℝ), id x ∂ν (bestArm ν)}.indicator 1 s + ∑ s ∈ Finset.range n, {s : ℕ | 0 < Learning.pullCount A a s ω ∧ ∫ (x : ℝ), id x ∂ν a < Learning.empMean A R a s ω - ucbWidth A c a s ω}.indicator 1 s
          noncomputable def Bandits.UCB.constSum (c : ℝ) (n : ℕ) :
          ℝ≥0∞

          A sum that appears in the UCB regret upper bound.

          Equations
          Instances For
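          A hedged reconstruction from the per-round probability bounds `probReal_ucbIndex_le` and `probReal_ucbIndex_ge` above (not verified against the Lean source): summing those bounds over the first $n$ rounds suggests
          $$ \mathrm{constSum}(c, n) \;=\; \sum_{s=0}^{n-1} \frac{1}{(s+1)^{c/2 - 1}} , $$
          viewed in `ℝ≥0∞`. In any case this quantity does not depend on the arm gaps.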
            theorem Bandits.UCB.expectation_pullCount_le' {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 < c) (a : Fin K) (h_gap : 0 < gap ν a) (n : ℕ) :
            ∫⁻ (ω : Ω), ↑(Learning.pullCount A a n ω) ∂P ≤ ENNReal.ofReal (4 * c * Real.log (n + 1) / gap ν a ^ 2 + 1) + 1 + 2 * constSum c n

            Bound on the expectation of the number of pulls of each arm by the UCB algorithm.
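            In outline: take $C$ of order $4 c \log(n+1) / \Delta_a^2$ in `pullCount_ae_le_add_two`, integrate both sides, and bound the expectation of each of the two remaining indicator sums by summing `probReal_ucbIndex_le`, respectively `probReal_ucbIndex_ge`, over the rounds; each of the two sums contributes at most `constSum c n`. This yields
            $$ \mathbb{E}\big[T_a(n)\big] \;\le\; \frac{4 c \log(n+1)}{\Delta_a^2} + 2 + 2\,\mathrm{constSum}(c, n) , $$
            which is the real-valued form stated in `expectation_pullCount_le` below.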

            theorem Bandits.UCB.expectation_pullCount_le {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 < c) (a : Fin K) (h_gap : 0 < gap ν a) (n : ℕ) :
            ∫ (x : Ω), (fun (ω : Ω) => ↑(Learning.pullCount A a n ω)) x ∂P ≤ 4 * c * Real.log (n + 1) / gap ν a ^ 2 + 2 + 2 * (constSum c n).toReal

            Bound on the expectation of the number of pulls of each arm by the UCB algorithm.

            theorem Bandits.UCB.regret_le {K : ℕ} {hK : 0 < K} {c : ℝ} {ν : ProbabilityTheory.Kernel (Fin K) ℝ} [ProbabilityTheory.IsMarkovKernel ν] {Ω : Type u_1} { : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → Fin K} {R : ℕ → Ω → ℝ} [Nonempty (Fin K)] (h : Learning.IsAlgEnvSeq A R (ucbAlgorithm hK c) (Learning.stationaryEnv ν) P) ( : ∀ (a : Fin K), ProbabilityTheory.HasSubgaussianMGF (fun (x : ℝ) => x - ∫ (x : ℝ), id x ∂ν a) 1 (ν a)) (hc : 0 < c) (n : ℕ) :
            ∫ (x : Ω), regret ν A n x ∂P ≤ ∑ a : Fin K, (4 * c * Real.log (n + 1) / gap ν a + gap ν a * (2 + 2 * (constSum c n).toReal))

            Regret bound for the UCB algorithm.
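            The bound matches what one obtains from `expectation_pullCount_le` via the standard regret decomposition, writing $\Delta_a$ for `gap ν a` and $T_a(n)$ for `Learning.pullCount A a n`:
            $$ \mathbb{E}\big[\mathrm{regret}(n)\big] \;=\; \sum_{a} \Delta_a \,\mathbb{E}\big[T_a(n)\big] \;\le\; \sum_{a} \Big( \frac{4 c \log(n+1)}{\Delta_a} + \Delta_a \big(2 + 2\,\mathrm{constSum}(c, n)\big) \Big) . $$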