Algorithms #
A stochastic, sequential algorithm.
- policy (n : ℕ) : ProbabilityTheory.Kernel (↥(Finset.Iic n) → α × R) α
  Policy or sampling rule: distribution of the next action.
- h_policy (n : ℕ) : ProbabilityTheory.IsMarkovKernel (self.policy n)
- p0 : MeasureTheory.Measure α
  Distribution of the first action.
- hp0 : MeasureTheory.IsProbabilityMeasure self.p0
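As a construction sketch, assuming this structure is named Learning.Algorithm and that the module defining it is imported (its name is not shown here): an algorithm that ignores the history entirely can be built from a single probability measure using Mathlib's Kernel.const.

import Mathlib
-- also: the import that provides `Learning.Algorithm` (module name assumed)

open MeasureTheory ProbabilityTheory

/-- Sketch: an algorithm that never looks at the history and draws every
action, including the first, from a fixed probability measure `μ`. -/
noncomputable def constAlgorithm {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (μ : Measure α) [IsProbabilityMeasure μ] :
    Learning.Algorithm α R where
  policy _ := Kernel.const _ μ  -- same action law at every history
  h_policy _ := inferInstance   -- the constant kernel of a probability measure is Markov
  p0 := μ
  hp0 := inferInstance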
A stochastic environment.
- feedback (n : ℕ) : ProbabilityTheory.Kernel ((↥(Finset.Iic n) → α × R) × α) R
  Distribution of the next observation, as a function of the past history and the current action.
- h_feedback (n : ℕ) : ProbabilityTheory.IsMarkovKernel (self.feedback n)
- ν0 : ProbabilityTheory.Kernel α R
  Distribution of the first observation given the first action.
- hp0 : ProbabilityTheory.IsMarkovKernel self.ν0
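Under the same naming assumptions (and the same imports as the sketch above), an environment whose observations are drawn from one fixed distribution, whatever the history and action, is a degenerate but type-correct instance:

/-- Sketch: every observation is drawn from the fixed probability measure `μ`,
regardless of the history or of the action played. -/
noncomputable def constEnv {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (μ : MeasureTheory.Measure R) [MeasureTheory.IsProbabilityMeasure μ] :
    Learning.Environment α R where
  feedback _ := ProbabilityTheory.Kernel.const _ μ  -- ignores history and action
  h_feedback _ := inferInstance
  ν0 := ProbabilityTheory.Kernel.const α μ          -- first observation: same law
  hp0 := inferInstance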
Kernel describing the distribution of the next action-reward pair given the history
up to n.
Equations
- Learning.stepKernel alg env n = (alg.policy n).compProd (env.feedback n)
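By the semantics of Kernel.compProd, sampling from stepKernel alg env n at a given history means: draw an action a from alg.policy n at that history, then an observation from env.feedback n at the pair of the history and a. Since the displayed equation is definitional, the following sketch (same naming assumptions as above) holds by rfl:

example {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (alg : Learning.Algorithm α R) (env : Learning.Environment α R) (n : ℕ) :
    Learning.stepKernel alg env n = (alg.policy n).compProd (env.feedback n) :=
  rfl  -- definitional unfolding of `stepKernel`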
Kernel sending a partial trajectory of the bandit interaction (Iic n → α × R) to a measure
on ℕ → α × R, supported on full trajectories that start with the partial one.
Equations
- Learning.traj alg env n = ProbabilityTheory.Kernel.traj (Learning.stepKernel alg env) n
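Reading off the type, under the same assumptions: traj alg env n turns a history of the first n + 1 steps into the law of the full infinite trajectory, through Mathlib's Ionescu-Tulcea kernel ProbabilityTheory.Kernel.traj.

noncomputable example {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (alg : Learning.Algorithm α R) (env : Learning.Environment α R) (n : ℕ) :
    ProbabilityTheory.Kernel (↥(Finset.Iic n) → α × R) (ℕ → α × R) :=
  Learning.traj alg env n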
Measure on the sequence of actions and observations generated by the algorithm/environment.
Equations
- Learning.trajMeasure alg env = ProbabilityTheory.Kernel.trajMeasure (alg.p0.compProd env.ν0) (Learning.stepKernel alg env)
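This is the canonical probability space of the interaction: the first action-observation pair is drawn from alg.p0.compProd env.ν0 and every later pair from the step kernels. A type-reading sketch, same assumptions:

noncomputable example {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (alg : Learning.Algorithm α R) (env : Learning.Environment α R) :
    MeasureTheory.Measure (ℕ → α × R) :=
  Learning.trajMeasure alg env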
action n is the action pulled at time n. This is a random variable on the measurable space
ℕ → α × R.
Equations
- Learning.action n h = (h n).1
reward n is the reward at time n. This is a random variable on the measurable space
ℕ → α × R.
Equations
- Learning.reward n h = (h n).2
hist n is the history up to time n. This is a random variable on the measurable space
ℕ → α × R.
Equations
- Learning.hist n h i = h ↑i
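All three are plain projections, so the following identities hold definitionally (a sketch; the measurable-space binders are included only in case the library's definitions require them):

example {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (h : ℕ → α × R) (n : ℕ) :
    (Learning.action n h, Learning.reward n h) = h n :=
  rfl  -- pairing the two projections recovers the trajectory entry at time n

example {α R : Type*} [MeasurableSpace α] [MeasurableSpace R]
    (h : ℕ → α × R) (n : ℕ) (i : ↥(Finset.Iic n)) :
    Learning.hist n h i = h ↑i :=
  rfl  -- the history is the restriction of the trajectory to times ≤ n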
Filtration of the algorithm interaction.
A stationary environment, in which the distribution of the next reward depends only on the last action.
Equations
- Learning.stationaryEnv ν = { feedback := fun (x : ℕ) => ProbabilityTheory.Kernel.prodMkLeft (↥(Finset.Iic x) → α × R) ν, h_feedback := ⋯, ν0 := ν, hp0 := inst✝ }
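For instance (same naming assumptions), feeding stationaryEnv a constant kernel yields a stationary environment in which all arms are identical; a genuine stochastic bandit would instead supply a kernel ν with ν a the reward distribution of arm a.

noncomputable example {α : Type*} [MeasurableSpace α]
    (μ : MeasureTheory.Measure ℝ) [MeasureTheory.IsProbabilityMeasure μ] :
    Learning.Environment α ℝ :=
  Learning.stationaryEnv (ProbabilityTheory.Kernel.const α μ)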