
LeanBandits.SequentialLearning.StationaryEnv

Stationary environments #

A stationary environment, in which the distribution of the next reward depends only on the last action.

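The definition itself is not rendered above. As a hedged sketch of the idea only (the `Environment` packaging and the history type `H` below are assumptions, not the library's actual code): a stationary environment supplies the same reward kernel ν at every step, ignoring the accumulated history. Mathlib's Kernel.prodMkLeft performs exactly this lifting.

    import Mathlib

    open ProbabilityTheory

    -- Sketch, NOT the LeanBandits definition: lift a reward kernel
    -- `ν : Kernel α R` to a kernel that also receives a (hypothetical)
    -- history type `H` and ignores it. `Kernel.prodMkLeft H ν` is the
    -- Mathlib kernel `(h, a) ↦ ν a`.
    example {α R H : Type*} [MeasurableSpace α] [MeasurableSpace R]
        [MeasurableSpace H] (ν : Kernel α R) : Kernel (H × α) R :=
      Kernel.prodMkLeft H ν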
    theorem Learning.IsAlgEnvSeq.condDistrib_reward_stationaryEnv {α : Type u_1} {R : Type u_2} {mα : MeasurableSpace α} {mR : MeasurableSpace R} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] [StandardBorelSpace R] [Nonempty R] {alg : Algorithm α R} {ν : ProbabilityTheory.Kernel α R} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → α} {R' : ℕ → Ω → R} (h : IsAlgEnvSeq A R' alg (stationaryEnv ν) P) (n : ℕ) :
    𝓛[R' n | A n; P] =ᵐ[MeasureTheory.Measure.map (A n) P] ν

    The conditional distribution of the reward at time n given the action at time n is ν.
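    A consequence one would expect from this lemma, stated only as a hedged sketch (not a lemma from this file; the namespaces opened below are assumptions): integrating the conditional law ν over the distribution of the action recovers the unconditional law of the reward.

        import LeanBandits.SequentialLearning.StationaryEnv

        open MeasureTheory ProbabilityTheory Learning

        -- Hedged sketch: the law of the reward at time `n` should be the law
        -- of the action bound with `ν`. The proof is left as `sorry`; the
        -- variables mirror those of the theorem above.
        example {α R Ω : Type*} {mα : MeasurableSpace α} {mR : MeasurableSpace R}
            {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α]
            [StandardBorelSpace R] [Nonempty R] {alg : Algorithm α R}
            {ν : Kernel α R} [IsMarkovKernel ν] {P : Measure Ω}
            [IsProbabilityMeasure P] {A : ℕ → Ω → α} {R' : ℕ → Ω → R}
            (h : IsAlgEnvSeq A R' alg (stationaryEnv ν) P) (n : ℕ) :
            P.map (R' n) = (P.map (A n)).bind (fun a => ν a) := by
          sorry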

    theorem Learning.IsAlgEnvSeq.condIndepFun_reward_hist_action {α : Type u_1} {R : Type u_2} {mα : MeasurableSpace α} {mR : MeasurableSpace R} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] [StandardBorelSpace R] [Nonempty R] {alg : Algorithm α R} {ν : ProbabilityTheory.Kernel α R} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → α} {R' : ℕ → Ω → R} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A R' alg (stationaryEnv ν) P) (n : ℕ) :
    ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) (R' (n + 1)) (hist A R' n) P
    The reward at time n + 1 is conditionally independent of the history up to time n given the action at time n + 1.
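    In conventional probability notation the two lemmas above read informally as follows (an assumed rendering, not a statement from the file; Hₙ stands for hist A R' n):

        \mathcal{L}\left(R_{n} \mid A_{n}\right) = \nu(A_{n}, \cdot)
        \qquad\text{and}\qquad
        R_{n+1} \perp\!\!\!\perp H_{n} \mid \sigma(A_{n+1})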

    theorem Learning.IsAlgEnvSeq.condIndepFun_reward_hist_action_action {α : Type u_1} {R : Type u_2} {mα : MeasurableSpace α} {mR : MeasurableSpace R} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] [StandardBorelSpace R] [Nonempty R] {alg : Algorithm α R} {ν : ProbabilityTheory.Kernel α R} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → α} {R' : ℕ → Ω → R} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A R' alg (stationaryEnv ν) P) (n : ℕ) :
    ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A (n + 1)) inferInstance) (R' (n + 1)) (fun (ω : Ω) => (hist A R' n ω, A (n + 1) ω)) P

    theorem Learning.IsAlgEnvSeq.condIndepFun_reward_hist_action_action' {α : Type u_1} {R : Type u_2} {mα : MeasurableSpace α} {mR : MeasurableSpace R} {Ω : Type u_3} {mΩ : MeasurableSpace Ω} [StandardBorelSpace α] [Nonempty α] [StandardBorelSpace R] [Nonempty R] {alg : Algorithm α R} {ν : ProbabilityTheory.Kernel α R} [ProbabilityTheory.IsMarkovKernel ν] {P : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure P] {A : ℕ → Ω → α} {R' : ℕ → Ω → R} [StandardBorelSpace Ω] (h : IsAlgEnvSeq A R' alg (stationaryEnv ν) P) (n : ℕ) (hn : n ≠ 0) :
    ProbabilityTheory.CondIndepFun (MeasurableSpace.comap (A n) inferInstance) (R' n) (fun (ω : Ω) => (hist A R' (n - 1) ω, A n ω)) P
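
    The primed variant restates the previous lemma at an arbitrary nonzero time n, taking the history up to n - 1. The hypothesis hn : n ≠ 0 is needed because subtraction on ℕ truncates at zero; a minimal check of the arithmetic behind the reindexing:

        -- Why `n ≠ 0` is required: `ℕ` subtraction truncates.
        example (n : ℕ) (hn : n ≠ 0) : n - 1 + 1 = n := by omega
        example : (0 : ℕ) - 1 = 0 := rfl  -- at `n = 0` the history index degenerates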
