module WrapRange:
The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled to a range (initially given by a RangeParam). When a value is observed above the range, the bandit algorithm is restarted and the range interval is doubled in that direction.
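The doubling trick described above can be sketched in a few lines of standalone OCaml. This is only an illustration under stated assumptions, not the functor's actual implementation: the names `range`, `rescale`, `widen` and `wrapped_step` are hypothetical, rewards are assumed to be rescaled to [0, 1], values below the range are assumed to be handled symmetrically, the range is assumed to keep doubling until it contains the observed value, and the triggering reward is assumed to be fed to the restarted bandit.

```ocaml
(* Hypothetical sketch of the doubling trick; none of these names come
   from Obandit itself. *)
type range = { lower : float; upper : float }

(* Linearly rescale a reward from [lower, upper] to [0, 1] (assumed target). *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the interval width in the direction where [x] fell outside.
   Returns [None] when [x] already lies inside the range. *)
let widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then Some { r with upper = r.upper +. width }
  else if x < r.lower then Some { r with lower = r.lower -. width }
  else None

(* One wrapped step. [initial] is the fresh state of the inner bandit and
   [step] is its update function, taking a state and a rescaled reward.
   Assumption: the range is doubled repeatedly until it contains [x]. *)
let wrapped_step ~initial ~step (r, b) x =
  let rec grow r = match widen r x with Some r' -> grow r' | None -> r in
  let r' = grow r in
  (* Restart the inner bandit whenever the range had to change. *)
  let b' = if r' = r then b else initial in
  let action, b'' = step b' (rescale r' x) in
  (action, (r', b''))
```

For instance, starting from the range [0, 1], a reward of 3.0 grows the range to [0, 2] and then to [0, 4], resets the inner bandit, and passes it the rescaled reward 0.75.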
Parameters: |
|
type bandit
val initialBandit : bandit Obandit.rangedBandit
val step : bandit Obandit.rangedBandit ->
  float -> Obandit.rangedAction * bandit Obandit.rangedBandit