The WrapRange functor wraps a bandit algorithm with the doubling trick.
This heuristic allows a bandit algorithm to be used without knowing the
reward range in advance. All rewards are linearly rescaled from a working
range, which is initially given by a RangeParam. When a reward is observed
outside the current range, the bandit algorithm is restarted and the range
interval is doubled in that direction.
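
The rescale-and-double logic described above can be sketched as follows. This is a minimal illustration, not Obandit's implementation: the `range` type and the functions `rescale`, `widen` and `observe` are hypothetical names. It also assumes that rewards are mapped onto [0, 1], that "doubling in a direction" means extending the interval on the exceeded side by its current width, and that `lower < upper`.

```ocaml
(* Hypothetical sketch of the doubling trick; none of these names
   come from Obandit. Assumes lower < upper. *)
type range = { lower : float; upper : float }

(* Linearly map a raw reward from [lower, upper] onto [0, 1]. *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Extend the range on the side that [x] exceeds, doubling the width
   at each step, until [x] fits. Returns [r] unchanged if [x] is
   already inside. *)
let rec widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then widen { r with upper = r.upper +. width } x
  else if x < r.lower then widen { r with lower = r.lower -. width } x
  else r

(* Process one raw reward: returns the new range, whether the wrapped
   bandit should be restarted, and the reward rescaled to the new range. *)
let observe r x =
  let r' = widen r x in
  (r', r' <> r, rescale r' x)
```

In this sketch the caller would discard the wrapped bandit's state and start again from its initial state whenever the restart flag is true, then feed it the rescaled reward.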
val initialBandit :
val step :
  bandit Obandit.rangedBandit ->
  Obandit.rangedAction * bandit Obandit.rangedBandit