1
Introduction
2
Iterative stochastic algorithms
▶
2.1
Stationary environment
2.2
Probability space: Ionescu-Tulcea theorem
▶
2.2.1
Ionescu-Tulcea theorem
2.2.2
Case of an algorithm-environment interaction
2.3
Finitely many actions
2.4
Scalar rewards
3
Stochastic multi-armed bandits
▶
3.1
Algorithm, bandit and probability space
3.2
The array model of rewards
▶
3.2.1
Measurability
3.2.2
Independence
3.2.3
Laws
3.3
Law of the n
th
pull
3.4
Regret and other bandit quantities
4
Concentration inequalities
▶
4.1
Sub-Gaussian random variables
4.2
Concentration of the sums of rewards in bandit models
▶
4.2.1
Sub-Gaussian rewards
5
Bandit algorithms
▶
5.1
Explore-Then-Commit
5.2
UCB
6
Practical Algorithms
▶
Alternative strategy: atomic typeclasses
Error sensitivity analysis
7
Bibliography
A
Conditional independence
Dependency graph
LeanBandits
Rémy Degenne, Paulo Rauber
A Lean package for bandit algorithms
1
Introduction
2
Iterative stochastic algorithms
2.1
Stationary environment
2.2
Probability space: Ionescu-Tulcea theorem
2.2.1
Ionescu-Tulcea theorem
2.2.2
Case of an algorithm-environment interaction
2.3
Finitely many actions
2.4
Scalar rewards
3
Stochastic multi-armed bandits
3.1
Algorithm, bandit and probability space
3.2
The array model of rewards
3.2.1
Measurability
3.2.2
Independence
3.2.3
Laws
3.3
Law of the n
th
pull
3.4
Regret and other bandit quantities
4
Concentration inequalities
4.1
Sub-Gaussian random variables
4.2
Concentration of the sums of rewards in bandit models
4.2.1
Sub-Gaussian rewards
5
Bandit algorithms
5.1
Explore-Then-Commit
5.2
UCB
6
Practical Algorithms
Alternative strategy: atomic typeclasses
Error sensitivity analysis
7
Bibliography
A
Conditional independence