Documentation

LeanBandits.Bandit

Bandit #

def MeasurableEquiv.piIicZero (α : Type u_1) [MeasurableSpace α] :
({ x : // x Finset.Iic 0 }α) ≃ᵐ α

Measurable equivalence between Iic 0 → α and α.

Equations
Instances For
    structure Bandits.Bandit (α : Type u_2) [MeasurableSpace α] :
    Type u_2

    A bandit interaction between an agent described by a policy and an environment given by reward distributions.

    Instances For
      noncomputable def Bandits.Bandit.stepKernel {α : Type u_1} { : MeasurableSpace α} (b : Bandit α) (n : ) :

      Kernel describing the distribution of the next arm-reward pair given the history up to n.

      Equations
      Instances For
        @[simp]
        theorem Bandits.Bandit.fst_stepKernel {α : Type u_1} { : MeasurableSpace α} (b : Bandit α) (n : ) :
        @[simp]
        theorem Bandits.Bandit.snd_stepKernel {α : Type u_1} { : MeasurableSpace α} (b : Bandit α) (n : ) :
        (b.stepKernel n).snd = b.ν.comp (b.policy n)
        noncomputable def Bandits.Bandit.traj {α : Type u_1} { : MeasurableSpace α} (b : Bandit α) (n : ) :

        Kernel sending a partial trajectory of the bandit interaction Iic n → α × ℝ to a measure on ℕ → α × ℝ, supported on full trajectories that start with the partial one.

        Equations
        Instances For
          noncomputable def Bandits.Bandit.measure {α : Type u_1} { : MeasurableSpace α} (b : Bandit α) :

          Measure on the sequence of arms pulled and rewards observed generated by the bandit.

          Equations
          Instances For
            def Bandits.arm {α : Type u_1} (n : ) (h : α × ) :
            α

            arm n is the arm pulled at time n. This is a random variable on the measurable space ℕ → α × ℝ.

            Equations
            Instances For
              def Bandits.reward {α : Type u_1} (n : ) (h : α × ) :

              reward n is the reward at time n. This is a random variable on the measurable space ℕ → α × ℝ.

              Equations
              Instances For
                def Bandits.hist {α : Type u_1} (n : ) (h : α × ) :
                { x : // x Finset.Iic n }α ×

                hist n is the history up to time n. This is a random variable on the measurable space ℕ → α × ℝ.

                Equations
                Instances For
                  theorem Bandits.condDistrib_arm_reward {α : Type u_1} { : MeasurableSpace α} [StandardBorelSpace α] [Nonempty α] (b : Bandit α) (n : ) :
                  ProbabilityTheory.condDistrib (fun (h : α × ) => (arm n h, reward n h)) (hist n) b.measure = b.stepKernel n