A | |
alpha [Obandit.AlphaUCBParam] | The $\alpha$ parameter.
alpha [Obandit.AlphaPhiUCBParam] | The $\alpha$ parameter.
C | |
c [Obandit.DecayingEpsilonGreedyParam] | The $c$ hyperparameter.
D | |
d [Obandit.DecayingEpsilonGreedyParam] | The $d$ hyperparameter, a tight lower bound on $\min_{i : \Delta_i > 0} \Delta_i$.
E | |
epsilon [Obandit.EpsilonGreedyParam] | The $\epsilon$ parameter.
eta [Obandit.FixedExp3Param] | The fixed learning rate $\eta$.
I | |
initialBandit [Obandit.RangedBandit] | The initial state of the bandit algorithm.
initialBandit [Obandit.Bandit] | The initial state of the bandit algorithm. |
invLFPhi [Obandit.AlphaPhiUCBParam] | The inverse of the Legendre-Fenchel transform of $\psi$.
K | |
k [Obandit.HorizonExp3Param] | The number of actions $K$.
k [Obandit.FixedExp3Param] | The number of actions $K$.
k [Obandit.EpsilonGreedyParam] | The number of actions $K$.
k [Obandit.DecayingEpsilonGreedyParam] | The number of actions $K$.
k [Obandit.RateBanditParam] | The number of actions $K$.
k [Obandit.KBanditParam] | The number of actions $K$.
k [Obandit.AlphaUCBParam] | The number of actions $K$.
k [Obandit.AlphaPhiUCBParam] | The number of actions $K$.
L | |
lower [Obandit.RangeParam] | The lower value of the range.
N | |
n [Obandit.HorizonExp3Param] | The $n$ parameter, the horizon to optimize for.
R | |
rate [Obandit.RateBanditParam] | The rate. |
S | |
step [Obandit.RangedBandit] |
step [Obandit.Bandit] |
U | |
upper [Obandit.RangeParam] | The upper value of the range.
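For orientation, the sketch below shows how the parameter fields listed in this index are typically supplied: a small module providing the fields (here k and epsilon from EpsilonGreedyParam) is passed to one of the library's functors, which returns a module exposing initialBandit and step. This is a minimal sketch; the functor name MakeEpsilonGreedy and the exact type of step are assumptions made for illustration, and only the field names come from the index above.

module B = Obandit.MakeEpsilonGreedy (struct
  let k = 4          (* the number of actions K *)
  let epsilon = 0.1  (* the epsilon parameter *)
end)

let () =
  (* Assumed shape: step takes the current bandit state and the reward
     observed for the previous action, and returns the chosen action
     together with the updated bandit state. *)
  let action, bandit' = B.step B.initialBandit 0.0 in
  ignore bandit';
  Printf.printf "first action chosen: %d\n" action

The other parameter signatures listed above (AlphaUCBParam, AlphaPhiUCBParam, DecayingEpsilonGreedyParam, FixedExp3Param, HorizonExp3Param, RateBanditParam, KBanditParam, RangeParam) would be supplied the same way, each with its own fields.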