Functor Obandit.WrapRange

module WrapRange: 
functor (R : RangeParam) ->
functor (B : Bandit) -> RangedBandit with type bandit = B.bandit

The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled using a current range (initially given by a RangeParam). When a reward is observed beyond the current range, the bandit algorithm is restarted and the range interval is doubled in that direction.
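The range bookkeeping described above can be illustrated with a minimal, self-contained sketch. It is not Obandit's implementation: the names range, rescale, widen and observe are hypothetical, and both the target interval [0, 1] and the handling of rewards below the lower bound are assumptions made for illustration.

```ocaml
(* Hypothetical sketch of the doubling trick; not Obandit's code. *)

(* Current reward range [lower, upper]; assumes lower < upper. *)
type range = { lower : float; upper : float }

(* Linearly rescale a reward from [lower, upper] to [0, 1]
   (the target interval is an assumption of this sketch). *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* If a finite reward x falls outside the range, double the interval
   width on the violated side, repeating until x fits. *)
let rec widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then widen { r with upper = r.upper +. width } x
  else if x < r.lower then widen { r with lower = r.lower -. width } x
  else r

(* Process one observation: return the (possibly widened) range,
   a flag telling the caller to restart the wrapped algorithm,
   and the reward rescaled with the new range. *)
let observe r x =
  let r' = widen r x in
  let restarted = r' <> r in
  (r', restarted, rescale r' x)
```

In this sketch, starting from the range [0, 1], observing a reward of 1.5 widens the range to [0, 2], sets the restart flag, and rescales the reward to 0.75. A reward already inside the range leaves the range unchanged and does not trigger a restart.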

Parameters:
R : RangeParam
B : Bandit

type bandit 
val initialBandit : bandit Obandit.rangedBandit
val step : bandit Obandit.rangedBandit ->
float ->
Obandit.rangedAction * bandit Obandit.rangedBandit