• 1 Introduction
  • 2 Iterative stochastic algorithms ▶
    • 2.1 Stationary environment
    • 2.2 Probability space: Ionescu-Tulcea theorem ▶
      • 2.2.1 Ionescu-Tulcea theorem
      • 2.2.2 Case of an algorithm-environment interaction
    • 2.3 Finitely many actions
    • 2.4 Scalar rewards
  • 3 Stochastic multi-armed bandits ▶
    • 3.1 Algorithm, bandit and probability space
    • 3.2 The array model of rewards ▶
      • 3.2.1 Measurability
      • 3.2.2 Independence
      • 3.2.3 Laws
    • 3.3 Law of the nth pull
    • 3.4 Regret and other bandit quantities
  • 4 Concentration inequalities ▶
    • 4.1 Sub-Gaussian random variables
    • 4.2 Concentration of the sums of rewards in bandit models ▶
      • 4.2.1 Sub-Gaussian rewards
  • 5 Bandit algorithms ▶
    • 5.1 Explore-Then-Commit
    • 5.2 UCB
  • 6 Practical Algorithms ▶
    • Alternative strategy: atomic typeclasses
    • Error sensitivity analysis
  • 7 Bibliography
  • A Conditional independence
  • Dependency graph

LeanBandits

Rémy Degenne, Paulo Rauber

A Lean package for bandit algorithms
  • 1 Introduction
  • 2 Iterative stochastic algorithms
    • 2.1 Stationary environment
    • 2.2 Probability space: Ionescu-Tulcea theorem
      • 2.2.1 Ionescu-Tulcea theorem
      • 2.2.2 Case of an algorithm-environment interaction
    • 2.3 Finitely many actions
    • 2.4 Scalar rewards
  • 3 Stochastic multi-armed bandits
    • 3.1 Algorithm, bandit and probability space
    • 3.2 The array model of rewards
      • 3.2.1 Measurability
      • 3.2.2 Independence
      • 3.2.3 Laws
    • 3.3 Law of the nth pull
    • 3.4 Regret and other bandit quantities
  • 4 Concentration inequalities
    • 4.1 Sub-Gaussian random variables
    • 4.2 Concentration of the sums of rewards in bandit models
      • 4.2.1 Sub-Gaussian rewards
  • 5 Bandit algorithms
    • 5.1 Explore-Then-Commit
    • 5.2 UCB
  • 6 Practical Algorithms
    • Alternative strategy: atomic typeclasses
    • Error sensitivity analysis
  • 7 Bibliography
  • A Conditional independence