
LeanBandits.Bandit.RewardByCountMeasure

Laws of stepsUntil and rewardByCount #

theorem Bandits.hasLaw_Z {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] (a : α) (m : ℕ) :
ProbabilityTheory.HasLaw (fun (ω : Ω × (ℕ → α → ℝ)) => ω.2 m a) (ν a) (P.prod (Bandit.streamMeasure ν))
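
In standard probability notation (an informal restatement, not part of the generated statement), writing Z_{m,a} for the evaluation of the reward stream at step m and action a, this says:

```latex
% Informal restatement of hasLaw_Z: under P ⊗ streamMeasure(ν),
% the stream evaluation at (m, a) has distribution ν(a).
Z_{m,a}(\omega) := \omega_2(m)(a), \qquad
\big(P \otimes \mu_{\mathrm{stream}}\big) \circ Z_{m,a}^{-1} \;=\; \nu(a).
```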

Law of Y conditioned on the event s.

Equations
• One or more equations did not get rendered due to their size.
Instances For

Law of Y conditioned on the event that X is in s.

Equations
• One or more equations did not get rendered due to their size.
Instances For

Law of Y conditioned on the event that X equals x.

Equations
• One or more equations did not get rendered due to their size.
Instances For
theorem Bandits.condDistrib_reward'' {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (n : ℕ) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1; P.prod (Bandit.streamMeasure ν)] =ᵐ[MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) (P.prod (Bandit.streamMeasure ν))] ν

theorem Bandits.reward_cond_action {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (n : ℕ) (hμa : (MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) (P.prod (Bandit.streamMeasure ν))) {a} ≠ 0) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1 in {a}; P.prod (Bandit.streamMeasure ν)] = ν a
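
Informally (a restatement only; A n and R n are the step-n action and reward), the theorem says that conditionally on playing action a at step n, the step-n reward is distributed as ν a:

```latex
% Informal restatement of reward_cond_action:
\mathcal{L}\!\left(R_n \,\middle|\, A_n = a\right) \;=\; \nu(a)
\qquad \text{whenever } \mathbb{P}(A_n = a) \neq 0.
```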

theorem Bandits.condIndepFun_reward_stepsUntil_action {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m n : ℕ) :
ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) mα) (fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1) ({ω : Ω × (ℕ → α → ℝ) | Learning.stepsUntil A a m ω.1 = n}.indicator fun (x : Ω × (ℕ → α → ℝ)) => 1) (P.prod (Bandit.streamMeasure ν))

theorem Bandits.reward_cond_stepsUntil {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m n : ℕ) (hm : m ≠ 0) (hμn : (P.prod (Bandit.streamMeasure ν)) ((fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1) ⁻¹' {n}) ≠ 0) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1 in {n}; P.prod (Bandit.streamMeasure ν)] = ν a

theorem Bandits.condDistrib_rewardByCount_stepsUntil {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m : ℕ) (hm : m ≠ 0) :
𝓛[Learning.rewardByCount A R a m | fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1; P.prod (Bandit.streamMeasure ν)] =ᵐ[MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1) (P.prod (Bandit.streamMeasure ν))] (ProbabilityTheory.Kernel.const ℕ∞ (ν a))

The conditional distribution of the reward received at the m-th pull of action a, given the time at which the number of pulls of a reaches m, is the constant kernel with value ν a.
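
An informal restatement (notation introduced here, not in the generated page): writing τ_{a,m} = stepsUntil A a m (valued in ℕ∞) and Y_{a,m} = rewardByCount A R a m, the conditional law of Y_{a,m} given τ_{a,m} is almost everywhere the constant kernel ν a; since the conditional law does not depend on the conditioning value, Y_{a,m} is in particular independent of τ_{a,m}:

```latex
% Informal restatement of condDistrib_rewardByCount_stepsUntil:
\mathcal{L}\!\left(Y_{a,m} \,\middle|\, \tau_{a,m}\right) \;=\; \nu(a)
\quad \text{a.e.\ with respect to the law of } \tau_{a,m}.
```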

        The reward received at the m-th pull of action a has law ν a.

theorem Bandits.identDistrib_eval_streamMeasure_measure {α : Type u_1} {mα : MeasurableSpace α} {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] (a : α) :
ProbabilityTheory.IdentDistrib (fun (ω : ℕ → α → ℝ) (n : ℕ) => ω n a) (fun (ω : (ℕ → α × ℝ) × (ℕ → α → ℝ)) (n : ℕ) => ω.2 n a) (Bandit.streamMeasure ν) (Bandit.measure alg ν)
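
An informal reading (restatement only): the sequence of rewards attached to action a has the same distribution whether it is sampled from the stream measure alone or read off the stream component of the full algorithm-environment measure:

```latex
% Informal restatement of identDistrib_eval_streamMeasure_measure:
\big(\omega(n)(a)\big)_{n \in \mathbb{N}} \text{ under } \mu_{\mathrm{stream}}(\nu)
\;\overset{d}{=}\;
\big(\omega_2(n)(a)\big)_{n \in \mathbb{N}} \text{ under } \mathrm{Bandit.measure}\; alg\; \nu.
```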