Bandit #
Measurable equivalence between Iic 0 → α
and α
.
Equations
- MeasurableEquiv.piIicZero α = MeasurableEquiv.funUnique { x : ℕ // x ∈ Finset.Iic 0 } α
Instances For
A bandit interaction between an agent described by a policy and an environment given by reward distributions.
- ν : ProbabilityTheory.Kernel α ℝ
Conditional distribution of the rewards given the arm pulled.
- hν : ProbabilityTheory.IsMarkovKernel self.ν
Policy or sampling rule: distribution of the next pull.
- h_policy (n : ℕ) : ProbabilityTheory.IsMarkovKernel (self.policy n)
- p0 : MeasureTheory.Measure α
Distribution of the first pull.
- hp0 : MeasureTheory.IsProbabilityMeasure self.p0
Instances For
instance
Bandits.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdRealPolicy
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
instance
Bandits.instIsProbabilityMeasureP0
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
:
noncomputable def
Bandits.Bandit.stepKernel
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
Kernel describing the distribution of the next arm-reward pair given the history up to n
.
Equations
- b.stepKernel n = (b.policy n).compProd (ProbabilityTheory.Kernel.prodMkLeft ({ x : ℕ // x ∈ Finset.Iic n } → α × ℝ) b.ν)
Instances For
instance
Bandits.Bandit.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdRealStepKernel
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
@[simp]
theorem
Bandits.Bandit.fst_stepKernel
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
@[simp]
theorem
Bandits.Bandit.snd_stepKernel
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
noncomputable def
Bandits.Bandit.traj
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
Kernel sending a partial trajectory of the bandit interaction Iic n → α × ℝ
to a measure
on ℕ → α × ℝ
, supported on full trajectories that start with the partial one.
Equations
- b.traj n = ProbabilityTheory.Kernel.traj b.stepKernel n
Instances For
instance
Bandits.Bandit.instIsMarkovKernelForallSubtypeNatMemFinsetIicProdRealForallTraj
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
(n : ℕ)
:
noncomputable def
Bandits.Bandit.measure
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
:
MeasureTheory.Measure (ℕ → α × ℝ)
Measure on the sequence of arms pulled and rewards observed generated by the bandit.
Equations
Instances For
instance
Bandits.Bandit.instIsProbabilityMeasureForallNatProdRealMeasure
{α : Type u_1}
{mα : MeasurableSpace α}
(b : Bandit α)
:
arm n
is the arm pulled at time n
. This is a random variable on the measurable space
ℕ → α × ℝ
.
Equations
- Bandits.arm n h = (h n).1
Instances For
Equations
Instances For
theorem
Bandits.condDistrib_arm_reward
{α : Type u_1}
{mα : MeasurableSpace α}
[StandardBorelSpace α]
[Nonempty α]
(b : Bandit α)
(n : ℕ)
:
theorem
Bandits.condDistrib_arm
{α : Type u_1}
{mα : MeasurableSpace α}
[StandardBorelSpace α]
[Nonempty α]
(b : Bandit α)
(n : ℕ)
: