Probability space for algorithm-environment interactions
For any algorithm and environment, we construct a probability space on which we can define a sequence of random variables representing the actions and feedback generated by the interaction of the algorithm and the environment. The main ingredient of the construction is the Ionescu-Tulcea theorem.
TODO: actually, the probability measure is already defined in the Algorithm file. Reorganize?
action n is the action pulled at time n. This is a random variable on the measurable space ℕ → α × R.
Equations
- Learning.IT.action n h = (h n).1
reward n is the reward at time n. This is a random variable on the measurable space ℕ → α × R.
Equations
- Learning.IT.reward n h = (h n).2
hist n is the history up to time n. This is a random variable on the measurable space ℕ → α × R.
Equations
- Learning.IT.hist n h i = h ↑i
Instances For
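The three equations above can be restated as a small self-contained Lean sketch, which may help when reading them side by side. The namespace `Sketch` is hypothetical and chosen to avoid clashing with the real `Learning.IT` declarations. The domain `Finset.Iic n` for `hist` is an assumption inferred from the coercion `↑i` in the equation; the page does not show the actual type signature. Measurability of these maps is not addressed here.

```lean
import Mathlib

namespace Sketch

-- `α` is the type of actions, `R` the type of rewards (both arbitrary here).
variable {α R : Type*}

/-- Action at time `n`: first component of the `n`-th step of the trajectory `h`. -/
def action (n : ℕ) (h : ℕ → α × R) : α := (h n).1

/-- Reward at time `n`: second component of the `n`-th step of the trajectory `h`. -/
def reward (n : ℕ) (h : ℕ → α × R) : R := (h n).2

/-- History up to time `n`: the trajectory restricted to indices `i ≤ n`.
The index type `Finset.Iic n` is an assumption of this sketch. -/
def hist (n : ℕ) (h : ℕ → α × R) : Finset.Iic n → α × R := fun i => h i

-- The `n`-th step of a trajectory is the pair (action, reward) at time `n`.
example (n : ℕ) (h : ℕ → α × R) : h n = (action n h, reward n h) := rfl

-- The history agrees with the trajectory at every index it covers.
example (n : ℕ) (h : ℕ → α × R) (i : Finset.Iic n) : hist n h i = h i := rfl

end Sketch
```

In this reading, a point `h : ℕ → α × R` of the measurable space is a whole trajectory, and `action`, `reward` and `hist` are coordinate projections of it.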
Filtration of the algorithm Seq.
Filtration generated by the history at time n-1 together with the action at time n.
Equations
- One or more equations did not get rendered due to their size.