Functor Obandit.MakeEpsilonGreedy

module MakeEpsilonGreedy: functor (P : EpsilonGreedyParam) -> Bandit with type bandit = banditEstimates
The Epsilon-Greedy Bandit with a fixed exploration rate.
Parameters:
 P : EpsilonGreedyParam
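
For orientation, the textbook epsilon-greedy selection rule can be written as below. This is a sketch of the standard rule, not a statement of this module's exact behavior: the symbols $\varepsilon$ (the fixed exploration rate), $\hat{\mu}_a(t)$ (the empirical mean reward of action $a$ before round $t$) and $a_t$ (the action chosen at round $t$) are notation assumed here, and tie-breaking and the handling of untried actions are left unspecified.

$$
a_t =
\begin{cases}
\text{an action drawn uniformly from } \{0, \ldots, K-1\} & \text{with probability } \varepsilon, \\
\arg\max_{a \in \{0, \ldots, K-1\}} \hat{\mu}_a(t) & \text{with probability } 1 - \varepsilon.
\end{cases}
$$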

type bandit
The internal data structure of the bandit algorithm.
val initialBandit : bandit
The initial state of the bandit algorithm.
val step : bandit -> float -> int * bandit
step b r advances the bandit game by one step, where b is the current state of the bandit and r is the reward for the last action. It returns the next action, encoded as an integer in $\{ 0, \ldots , K-1 \}$ where K is the number of actions, together with the new state of the bandit. The valid reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded.
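
The following is a minimal sketch of how the state returned by step might be threaded through a game loop. It is self-contained and does not call the library: the module type Bandit below is a local copy of the three items documented above, RoundRobin is a hypothetical toy bandit that ignores rewards, and play and reward_of are illustrative names, not part of Obandit. Under these assumptions, a module produced by MakeEpsilonGreedy would take the place of RoundRobin.

```ocaml
(* Local copy of the signature documented above (assumption: only these
   three items are needed to drive a bandit). *)
module type Bandit = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Hypothetical toy bandit: cycles through k = 3 actions and ignores
   rewards. It only stands in for a real algorithm. *)
module RoundRobin : Bandit = struct
  type bandit = int
  let k = 3
  let initialBandit = 0
  let step next _reward = (next, (next + 1) mod k)
end

(* Play [rounds] rounds. [reward_of] is a hypothetical environment that
   maps an action to its reward. The first reward passed to [step] is a
   placeholder, since the documentation says it is discarded. *)
let play (module B : Bandit) ~reward_of ~rounds =
  let rec loop state reward i acc =
    if i = rounds then List.rev acc
    else
      let action, state' = B.step state reward in
      loop state' (reward_of action) (i + 1) (action :: acc)
  in
  loop B.initialBandit 0.0 0 []

let () =
  let actions = play (module RoundRobin) ~reward_of:float_of_int ~rounds:5 in
  (* actions = [0; 1; 2; 0; 1] *)
  List.iter (Printf.printf "%d ") actions;
  print_newline ()
```

Because step returns a new state instead of mutating the old one, the loop carries the state explicitly from one call to the next, and earlier states remain usable.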