Regret #
Definitions of regret, gaps, pull counts #
noncomputable def
Bandits.regret
{α : Type u_1}
{mα : MeasurableSpace α}
(ν : ProbabilityTheory.Kernel α ℝ)
(k : ℕ → α)
(t : ℕ)
:
Regret of a sequence of pulls k : ℕ → α
at time t
for the reward kernel ν ; Kernel α ℝ
.
Equations
Instances For
noncomputable def
Bandits.gap
{α : Type u_1}
{mα : MeasurableSpace α}
(ν : ProbabilityTheory.Kernel α ℝ)
(a : α)
:
Gap of an arm a
: difference between the highest mean of the arms and the mean of a
.
Instances For
theorem
Bandits.gap_nonneg
{α : Type u_1}
{mα : MeasurableSpace α}
{ν : ProbabilityTheory.Kernel α ℝ}
{a : α}
[Fintype α]
:
Number of times arm a
was pulled up to time t
.
Equations
- Bandits.pullCount k a t = {s ∈ Finset.range t | k s = a}.card
Instances For
theorem
Bandits.regret_eq_sum_pullCount_mul_gap
{α : Type u_1}
{mα : MeasurableSpace α}
{ν : ProbabilityTheory.Kernel α ℝ}
{k : ℕ → α}
{t : ℕ}
[Fintype α]
:
noncomputable def
Bandits.bestArm
{α : Type u_1}
{mα : MeasurableSpace α}
[Fintype α]
[Nonempty α]
(ν : ProbabilityTheory.Kernel α ℝ)
:
α
Arm with the highest mean.
Equations
- Bandits.bestArm ν = ⋯.choose