Bandit #
Measure of an infinite stream of rewards from each action.
Equations
- Bandits.streamMeasure ν = MeasureTheory.Measure.infinitePi fun (x : ℕ) => MeasureTheory.Measure.infinitePi ⇑ν
Instances For
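Unfolding the two infinite products: assuming `ν` is a kernel from actions to reward distributions (so `⇑ν a` is the reward law of action `a`), the stream measure is the law of an i.i.d. array of rewards, one row per time step and one column per action:

```latex
% Sketch: streamMeasure as a doubly-indexed product measure.
\operatorname{streamMeasure}(\nu)
  \;=\; \bigotimes_{n \in \mathbb{N}} \Bigl( \bigotimes_{a \in \alpha} \nu(a) \Bigr),
\qquad r_{n,a} \sim \nu(a) \text{ independently over } (n, a).
```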
Probability space for the array model of stochastic bandits.
Equations
- Bandits.ArrayModel.probSpace α R = ((ℕ → ↑unitInterval) × (ℕ → α → R))
Instances For
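Concretely, a sample point of this space is a pair ω = (u, r); a hedged reading of the product type above is

```latex
\Omega \;=\; [0,1]^{\mathbb{N}} \times \bigl(R^{\alpha}\bigr)^{\mathbb{N}},
\qquad \omega = (u, r),
```

where `u n` is the uniform noise available to the algorithm at step `n`, and `r n a` is a pre-drawn reward for action `a` — in the array model all rewards are drawn up front, and (judging by the pull-count lemmas below) `r n a` is delivered the `n`-th time action `a` is pulled.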
Equations
- Bandits.ArrayModel.instMeasurableSpaceProbSpace = inferInstanceAs (MeasurableSpace ((ℕ → ↑unitInterval) × (ℕ → α → R)))
Probability measure for the array model of stochastic bandits.
Equations
Instances For
The initial action is the image of a uniform random variable under this function.
Equations
Instances For
The next action is the image of the history and a uniform random variable under this function.
Equations
- Bandits.ArrayModel.algFunction alg n = ⋯.choose
Instances For
History of actions and rewards up to time n in the array model.
Equations
- One or more equations did not get rendered due to their size.
- Bandits.ArrayModel.hist alg ω 0 = fun (x : ↥(Finset.Iic 0)) => (Bandits.ArrayModel.initAlgFunction alg (ω.1 0), ω.2 0 (Bandits.ArrayModel.initAlgFunction alg (ω.1 0)))
Instances For
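The successor equation did not render, but it presumably implements the standard array-model recursion. A sketch in the notation above, assuming `u` denotes the uniform coordinates and `r` the reward array:

```latex
A_0 = f_0(u_0), \quad X_0 = r_{0,\,A_0}; \qquad
A_{n+1} = f_{n+1}(H_n, u_{n+1}), \quad
X_{n+1} = r_{\,N_n(A_{n+1}),\, A_{n+1}},
```

where `f₀` is `initAlgFunction`, `f_{n+1}` is `algFunction`, `H_n` is `hist alg ω n`, and `N_n(a) = #{k ≤ n : A_k = a}` counts the pulls of `a` up to time `n` — so each pull of an action reads the next unused entry of that action's column.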
Action taken at time n in the array model.
Equations
- Bandits.ArrayModel.action alg n ω = (Bandits.ArrayModel.hist alg ω n ⟨n, ⋯⟩).1
Instances For
Reward received at time n in the array model.
Equations
- Bandits.ArrayModel.reward alg n ω = (Bandits.ArrayModel.hist alg ω n ⟨n, ⋯⟩).2
Instances For
All random variables in the space, except for the rewards for action a that are still unseen after time n.
Equations
- One or more equations did not get rendered due to their size.
Instances For
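The point of this construction: under the product measure, the omitted entries stay independent of everything retained, and each is distributed as `ν a`. Schematically, for an index `k` past the pulls of `a` recorded up to time `n`,

```latex
r_{k,a} \;\perp\!\!\!\perp\;
\bigl(u,\; (r_{m,b})_{b \neq a},\; (r_{m,a})_{m < k}\bigr),
\qquad r_{k,a} \sim \nu(a),
```

which is what drives the conditional-distribution lemmas below.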
The conditional distribution of the reward at time n + 1, given the action at time n + 1
and the number of times that action has been pulled before time n + 1, is equal to
the kernel ν.
The conditional distribution of the reward at time n + 1, given the history up to time n,
the action at time n + 1, and the number of times that action has been pulled before time n + 1,
is equal to the kernel ν.
The reward at time n + 1 is conditionally independent of the history up to time n,
given the action at time n + 1 and the number of times that action has been pulled before
time n + 1.
The conditional distribution of the reward at time n + 1, given the history up to time n
and the action at time n + 1, is equal to the kernel ν.
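In probabilistic notation, this last statement says that for every measurable set `s` of rewards,

```latex
\mathbb{P}\bigl(X_{n+1} \in s \;\bigm|\; H_n,\; A_{n+1} = a\bigr)
  \;=\; \nu(a)(s) \quad \text{a.s.},
```

i.e. conditionally on the history and the chosen action, the next reward is drawn fresh from the kernel ν — the array model reproduces the usual sequential description of a stochastic bandit.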