Array-of-rewards probability space for stochastic bandits #
We build a particular probability space for stochastic bandits, called the "array model", in which
an infinite array of i.i.d. rewards is first produced for all actions. When the algorithm chooses
action a for the nth time, it receives the reward in row a and column n of the array.
Some statements about bandit algorithms are easier to prove in this space and can then be transferred to any other probability space, using the fact that the conditional distributions of the arms and rewards specified in the bandit model determine their laws uniquely.
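As an informal sketch of the array model (the notation `X_{a,n}` is ours, not from the source):

```latex
% Row a of the array is an i.i.d. sequence with law ν(a);
% rows are independent of one another.
\[
  (X_{a,n})_{n \in \mathbb{N}} \overset{\text{i.i.d.}}{\sim} \nu(a),
  \qquad \text{independently across actions } a .
\]
% The reward received at the k-th pull of action a is the array entry X_{a,k}.
```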
Main definitions #
* `streamMeasure ν`: probability measure on the space of infinite arrays of rewards, where the rewards in each row are i.i.d. according to `ν`.
* `probSpace α R`: probability space for the array model of stochastic bandits with action space `α` and reward space `R`.
* `arrayMeasure ν`: probability measure on `probSpace α R` for the array model of stochastic bandits with reward kernel `ν`.
Measure of an infinite stream of rewards from each action.
Equations
- Bandits.streamMeasure ν = MeasureTheory.Measure.infinitePi fun (x : ℕ) => MeasureTheory.Measure.infinitePi ⇑ν
Instances For
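In infinite-product notation, the equation above can be read as follows (a sketch, assuming the standard countable product of probability measures):

```latex
\[
  \mathrm{streamMeasure}\ \nu
    \;=\; \bigotimes_{n \in \mathbb{N}} \Big( \bigotimes_{a} \nu(a) \Big),
\]
% a probability measure on the function space ℕ → α → R: an independent
% factor for each index n, each assigning an independent ν(a)-sample to every action a.
```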
Probability space for the array model of stochastic bandits.
Equations
- Bandits.ArrayModel.probSpace α R = ((ℕ → ↑unitInterval) × (ℕ → α → R))
Instances For
Equations
- One or more equations did not get rendered due to their size.
Probability measure for the array model of stochastic bandits.
Equations
Instances For
The initial action is the image of a uniform random variable under this function.
Equations
Instances For
The next action is the image of the history and a uniform random variable under this function.
Equations
- Bandits.ArrayModel.algFunction alg n = ⋯.choose
Instances For
History of actions and rewards up to time n in the array model.
Equations
- One or more equations did not get rendered due to their size.
- Bandits.ArrayModel.hist alg ω 0 = fun (x : ↥(Finset.Iic 0)) => (Bandits.ArrayModel.initAlgFunction alg (ω.1 0), ω.2 0 (Bandits.ArrayModel.initAlgFunction alg (ω.1 0)))
Instances For
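Unfolding the base case above with `ω = (u, x)`, where `u` denotes the stream of uniform seeds and `x` the reward array (labels ours):

```latex
\[
  A_0 \;=\; \mathrm{initAlgFunction}\ \mathrm{alg}\ (u\,0),
  \qquad
  \mathrm{hist}\ \mathrm{alg}\ \omega\ 0 \;=\; \big(A_0,\ x\,0\,A_0\big),
\]
% the history at time 0 records the first action and the reward
% read off the array at that action.
```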
Action taken at time n in the array model.
Equations
- Bandits.ArrayModel.action alg n ω = (Bandits.ArrayModel.hist alg ω n ⟨n, ⋯⟩).1
Instances For
Reward received at time n in the array model.
Equations
- Bandits.ArrayModel.reward alg n ω = (Bandits.ArrayModel.hist alg ω n ⟨n, ⋯⟩).2
Instances For
All random variables in the space, except for the unseen rewards for action a after
time n.
Equations
- One or more equations did not get rendered due to their size.
Instances For
The conditional distribution of the reward at time n + 1, given the action at time n + 1
and the number of times that action has been pulled before time n + 1, is equal to
the kernel ν.
The conditional distribution of the reward at time n + 1, given the history up to time n,
the action at time n + 1, and the number of times that action has been pulled before time n + 1,
is equal to the kernel ν.
The reward at time n + 1 is conditionally independent of the history up to time n,
given the action at time n + 1 and the number of times that action has been pulled before
time n + 1.
The conditional distribution of the reward at time n + 1, given the history up to time n
and the action at time n + 1, is equal to the kernel ν.