module MakeFixedExp3:
The Exp3 Bandit for adversarial regret minimization with a decaying learning rate as per [1].
Parameters:
type bandit
The internal data structure of the bandit algorithm.
val initialBandit : bandit
The initial state of the bandit algorithm.
val step : bandit -> float -> int * bandit
step b r advances the bandit game one step from state b, where r is the
reward for the last action. The result of this call is the next action,
encoded as an integer in $\{0, \dots, K-1\}$, and the new state of the bandit.
The reward range depends on the bandit algorithm in use. The first reward
provided to the algorithm is discarded.
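
A minimal sketch of how the step interface might be driven in a loop. The signature BANDIT simply restates the three items documented above. The RoundRobin module, the play function, and the reward_of environment are hypothetical names introduced for illustration only. RoundRobin is a stand-in that ignores rewards and is not Exp3; it exists so the example runs without the functor's parameters.

```ocaml
(* Signature restating the items documented above. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* HYPOTHETICAL stand-in: cycles through k arms and ignores rewards.
   This is NOT Exp3; it only makes the sketch runnable. *)
module RoundRobin : BANDIT = struct
  type bandit = int
  let k = 3
  let initialBandit = 0
  let step next _reward = (next, (next + 1) mod k)
end

(* Play [rounds] rounds against [reward_of], a hypothetical environment
   that maps an action to its reward. Returns the actions in the order
   they were played. *)
let play (module B : BANDIT) ~rounds ~reward_of =
  let rec loop b last_reward n acc =
    if n = 0 then List.rev acc
    else
      (* Feed the reward of the previous action, receive the next action. *)
      let action, b' = B.step b last_reward in
      loop b' (reward_of action) (n - 1) (action :: acc)
  in
  (* The first reward is discarded by the bandit, so any value works here. *)
  loop B.initialBandit 0.0 rounds []

let actions =
  play (module RoundRobin) ~rounds:5
    ~reward_of:(fun a -> if a = 1 then 1.0 else 0.0)
```

The state is threaded explicitly: each call to step returns a new bandit value that must be passed to the next call. Any module matching the signature could be substituted for RoundRobin in the call to play.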