Algorithms #
A stochastic, sequential algorithm.
- policy (n : ℕ) : ProbabilityTheory.Kernel (↥(Finset.Iic n) → α × R) α
Policy or sampling rule: distribution of the next action given the history of past action-reward pairs.
- h_policy (n : ℕ) : ProbabilityTheory.IsMarkovKernel (self.policy n)
- p0 : MeasureTheory.Measure α
Distribution of the first action.
- hp0 : MeasureTheory.IsProbabilityMeasure self.p0
Instances For
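To make the definition concrete, here is a minimal sketch (not part of the library) of the simplest possible instance: a uniform-exploration algorithm that ignores the history. The structure name Learning.Algorithm α R and its parameter order are assumptions, inferred from the fields above and from alg.policy / alg.p0 in Learning.stepKernel below.

```lean
import Mathlib
-- plus the (path assumed) import that provides `Learning.Algorithm`

open ProbabilityTheory

variable {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]

/-- Hedged sketch: a history-independent algorithm in which every action,
including the first, is drawn uniformly at random from a finite nonempty
action type. -/
noncomputable def uniformAlg [Fintype α] [Nonempty α] : Learning.Algorithm α R where
  policy _n := Kernel.const _ (PMF.uniformOfFintype α).toMeasure
  h_policy _n := inferInstance  -- `Kernel.const` of a probability measure is Markov
  p0 := (PMF.uniformOfFintype α).toMeasure
  hp0 := inferInstance  -- `PMF.toMeasure` is a probability measure
```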
A stochastic environment.
- feedback (n : ℕ) : ProbabilityTheory.Kernel ((↥(Finset.Iic n) → α × R) × α) R
Distribution of the next observation as a function of the past history and the latest action.
- h_feedback (n : ℕ) : ProbabilityTheory.IsMarkovKernel (self.feedback n)
- ν0 : ProbabilityTheory.Kernel α R
Distribution of the first observation given the first action.
- hp0 : ProbabilityTheory.IsMarkovKernel self.ν0
Instances For
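Similarly, a hedged sketch of the simplest kind of environment: a stationary one, where the reward distribution depends only on the action just played (the usual stochastic bandit setting). Learning.Environment α R is again an assumed name; comapping along Prod.snd discards the history component of the feedback kernel's input.

```lean
import Mathlib
-- plus the (path assumed) import that provides `Learning.Environment`

open ProbabilityTheory

variable {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]

/-- Hedged sketch: a stationary (memoryless) environment in which the reward
depends only on the most recent action, through a fixed Markov kernel `ν`. -/
noncomputable def stationaryEnv (ν : Kernel α R) [IsMarkovKernel ν] :
    Learning.Environment α R where
  feedback _n := ν.comap Prod.snd measurable_snd  -- ignore the history component
  h_feedback _n := inferInstance  -- comap of a Markov kernel is Markov
  ν0 := ν
  hp0 := inferInstance
```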
Kernel describing the distribution of the next action-reward pair given the history
up to n.
Equations
- Learning.stepKernel alg env n = (alg.policy n).compProd (env.feedback n)
Instances For
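Unfolding Kernel.compProd: given a history h, the step kernel first draws an action a from the policy, then a reward r from the feedback kernel evaluated at (h, a). Writing π n = alg.policy n and ν n = env.feedback n, for a measurable set s of action-reward pairs,

$$ (\mathrm{stepKernel}\ n)(h)(s) \;=\; \int_{\alpha} \nu_n(h, a)\big(\{r \mid (a, r) \in s\}\big)\; \pi_n(h)(\mathrm{d}a). $$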
History of the algorithm-environment sequence up to time n.
Equations
- Learning.IsAlgEnvSeq.hist A R' n ω i = (A (↑i) ω, R' (↑i) ω)
Instances For
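In words: hist A R' n sends an outcome ω to the trajectory of the first n+1 action-reward pairs,

$$ H_n(\omega) = \big((A_i(\omega), R'_i(\omega))\big)_{i \le n}, $$

which is exactly the input expected by the policy and feedback kernels above.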
An algorithm-environment sequence: a sequence of actions and rewards generated by an algorithm interacting with an environment.
- measurable_A (n : ℕ) : Measurable (A n)
- measurable_R (n : ℕ) : Measurable (R' n)
- hasLaw_action_zero : ProbabilityTheory.HasLaw (fun (ω : Ω) => A 0 ω) alg.p0 P
- hasCondDistrib_reward_zero : ProbabilityTheory.HasCondDistrib (R' 0) (A 0) env.ν0 P
- hasCondDistrib_action (n : ℕ) : ProbabilityTheory.HasCondDistrib (A (n + 1)) (hist A R' n) (alg.policy n) P
Instances For
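Writing H n for hist A R' n and π n for alg.policy n, the three distributional conditions say, informally, that under P

$$ A_0 \sim p_0, \qquad \mathcal{L}(R'_0 \mid A_0) = \nu_0, \qquad \mathcal{L}(A_{n+1} \mid H_n) = \pi_n. $$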
Filtration generated by the history up to time n.
Equations
- Learning.IsAlgEnvSeq.filtration hA hR' = { seq := fun (i : ℕ) => MeasurableSpace.comap (Learning.IsAlgEnvSeq.hist A R' i) inferInstance, mono' := ⋯, le' := ⋯ }
Instances For
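Equivalently, the n-th σ-algebra is the one generated by the history map,

$$ \mathcal{F}_n = \sigma(H_n) = \{ H_n^{-1}(B) \mid B \text{ measurable} \}, $$

and monotonicity (the mono' field) holds because H (n+1) determines H n by restriction.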
Filtration generated by the history at time n-1 together with the action at time n.
Equations
- One or more equations did not get rendered due to their size.
Instances For
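Although the defining equation is not rendered, the docstring pins down the intended σ-algebras: informally, for n ≥ 1 this filtration is

$$ \mathcal{G}_n = \sigma(H_{n-1}, A_n), $$

which interleaves with the history filtration as $\mathcal{F}_{n-1} \le \mathcal{G}_n \le \mathcal{F}_n$, since $H_n$ consists of $H_{n-1}$ together with $(A_n, R'_n)$. (This reading is inferred from the docstring, not from the definition itself.)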
Kernel sending a partial trajectory of the bandit sequence (↥(Finset.Iic n) → α × ℝ) to a measure
on ℕ → α × ℝ, supported on full trajectories that start with the partial one.
Equations
- Learning.traj alg env n = ProbabilityTheory.Kernel.traj (Learning.stepKernel alg env) n
Instances For
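ProbabilityTheory.Kernel.traj is Mathlib's Ionescu-Tulcea construction: from a partial trajectory x defined on indices ≤ n, it produces the probability measure on full trajectories ℕ → α × ℝ whose first n+1 coordinates agree with x almost surely and whose later coordinates are generated by successively applying the step kernels. Informally, with κ m = stepKernel alg env m,

$$ (\mathrm{traj}\ n)(x) \;=\; \delta_x \otimes \kappa_n(x, \cdot) \otimes \kappa_{n+1} \otimes \cdots $$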
Measure on the sequence of actions and observations generated by the algorithm/environment.
Equations
- Learning.trajMeasure alg env = ProbabilityTheory.Kernel.trajMeasure (alg.p0.compProd env.ν0) (Learning.stepKernel alg env)
Instances For
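Unfolding the definition: the first action-reward pair is drawn from p0 ⊗ ν0 (first action from p0, first reward from ν0 at that action), and all later pairs from the step kernels. Informally, the resulting measure on ℕ → α × R is

$$ \mathbb{P}(\mathrm{d}\omega) \;=\; (p_0 \otimes \nu_0)(\mathrm{d}\omega_0)\, \prod_{n \ge 0} \kappa_n\big((\omega_i)_{i \le n}\big)(\mathrm{d}\omega_{n+1}), \qquad \kappa_n = \mathrm{stepKernel}\ \mathrm{alg}\ \mathrm{env}\ n. $$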