
LeanBandits.Bandit.RewardByCountMeasure

Laws of stepsUntil and rewardByCount #

theorem Bandits.hasLaw_Z {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] (a : α) (m : ℕ) :
ProbabilityTheory.HasLaw (fun (ω : Ω × (ℕ → α → ℝ)) => ω.2 m a) (ν a) (P.prod (Bandit.streamMeasure ν))
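
In standard probability notation (an informal restatement, not part of the generated statement), writing Z_{m,a} for the evaluation of the reward stream at step m and action a, this says:

```latex
% Informal restatement of hasLaw_Z: under P ⊗ streamMeasure(ν),
% the stream evaluation at (m, a) has distribution ν(a).
Z_{m,a}(\omega) := \omega_2(m)(a), \qquad
\big(P \otimes \mu_{\mathrm{stream}}\big) \circ Z_{m,a}^{-1} \;=\; \nu(a).
```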

Law of Y conditioned on the event s.

Equations
• One or more equations did not get rendered due to their size.
Instances For

Law of Y conditioned on the event that X is in s.

Equations
• One or more equations did not get rendered due to their size.
Instances For

Law of Y conditioned on the event that X equals x.

Equations
• One or more equations did not get rendered due to their size.
Instances For
theorem Bandits.condDistrib_reward'' {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (n : ℕ) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1; P.prod (Bandit.streamMeasure ν)] =ᵐ[MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) (P.prod (Bandit.streamMeasure ν))] ν

theorem Bandits.reward_cond_action {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (n : ℕ) (hμa : (MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) (P.prod (Bandit.streamMeasure ν))) {a} ≠ 0) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1 in {a}; P.prod (Bandit.streamMeasure ν)] = ν a
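
Informally (a restatement only; A n and R n are the step-n action and reward), the theorem says that conditionally on playing action a at step n, the step-n reward is distributed as ν a:

```latex
% Informal restatement of reward_cond_action:
\mathcal{L}\!\left(R_n \,\middle|\, A_n = a\right) \;=\; \nu(a)
\qquad \text{whenever } \mathbb{P}(A_n = a) \neq 0.
```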

theorem Bandits.condIndepFun_reward_stepsUntil_action {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m n : ℕ) :
ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (fun (ω : Ω × (ℕ → α → ℝ)) => A n ω.1) mα) (fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1) ({ω : Ω × (ℕ → α → ℝ) | Learning.stepsUntil A a m ω.1 = n}.indicator fun (x : Ω × (ℕ → α → ℝ)) => 1) (P.prod (Bandit.streamMeasure ν))

theorem Bandits.reward_cond_stepsUntil {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m n : ℕ) (hm : m ≠ 0) (hμn : (P.prod (Bandit.streamMeasure ν)) ((fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1) ⁻¹' {n}) ≠ 0) :
𝓛[fun (ω : Ω × (ℕ → α → ℝ)) => R n ω.1 | fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1 in {n}; P.prod (Bandit.streamMeasure ν)] = ν a

theorem Bandits.condDistrib_rewardByCount_stepsUntil {α : Type u_1} {Ω : Type u_2} {mα : MeasurableSpace α} {mΩ : MeasurableSpace Ω} [DecidableEq α] [StandardBorelSpace α] [Nonempty α] {A : ℕ → Ω → α} {R : ℕ → Ω → ℝ} {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] [StandardBorelSpace Ω] [Countable α] (h : Learning.IsAlgEnvSeq A R alg (Learning.stationaryEnv ν) P) (a : α) (m : ℕ) (hm : m ≠ 0) :
𝓛[Learning.rewardByCount A R a m | fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1; P.prod (Bandit.streamMeasure ν)] =ᵐ[MeasureTheory.Measure.map (fun (ω : Ω × (ℕ → α → ℝ)) => Learning.stepsUntil A a m ω.1) (P.prod (Bandit.streamMeasure ν))] (ProbabilityTheory.Kernel.const ℕ∞ (ν a))

The conditional distribution of the reward received at the m-th pull of action a, given the time at which the number of pulls of a reaches m, is the constant kernel with value ν a.
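
An informal restatement (notation introduced here, not in the generated page): writing τ_{a,m} = stepsUntil A a m (valued in ℕ∞) and Y_{a,m} = rewardByCount A R a m, the conditional law of Y_{a,m} given τ_{a,m} is almost everywhere the constant kernel ν a; since the conditional law does not depend on the conditioning value, Y_{a,m} is in particular independent of τ_{a,m}:

```latex
% Informal restatement of condDistrib_rewardByCount_stepsUntil:
\mathcal{L}\!\left(Y_{a,m} \,\middle|\, \tau_{a,m}\right) \;=\; \nu(a)
\quad \text{a.e.\ with respect to the law of } \tau_{a,m}.
```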

        The reward received at the m-th pull of action a has law ν a.

theorem Bandits.identDistrib_eval_streamMeasure_measure {α : Type u_1} {mα : MeasurableSpace α} {alg : Learning.Algorithm α ℝ} {ν : ProbabilityTheory.Kernel α ℝ} [ProbabilityTheory.IsMarkovKernel ν] (a : α) :
ProbabilityTheory.IdentDistrib (fun (ω : ℕ → α → ℝ) (n : ℕ) => ω n a) (fun (ω : (ℕ → α × ℝ) × (ℕ → α → ℝ)) (n : ℕ) => ω.2 n a) (Bandit.streamMeasure ν) (Bandit.measure alg ν)
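
An informal reading (restatement only): the sequence of rewards attached to action a has the same distribution whether it is sampled from the stream measure alone or read off the stream component of the full algorithm-environment measure:

```latex
% Informal restatement of identDistrib_eval_streamMeasure_measure:
\big(\omega(n)(a)\big)_{n \in \mathbb{N}} \text{ under } \mu_{\mathrm{stream}}(\nu)
\;\overset{d}{=}\;
\big(\omega_2(n)(a)\big)_{n \in \mathbb{N}} \text{ under } \mathrm{Bandit.measure}\; alg\; \nu.
```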