Commit 6a8059d7 authored by Valentin Reis

[feature] tutorial improvements + vanishing rates

parent eb64fb23
......@@ -144,7 +144,7 @@ doc: src/Bandit/Tutorial.hs hbandit.cabal hbandit.nix
buildInputs = [cabal-install];
}
' --run bash <<< '
cabal v2-haddock hbandit --haddock-internal --builddir=.build
cabal v2-haddock hbandit --haddock-internal
'
.PRECIOUS: src/Bandit/Tutorial.hs
......@@ -160,6 +160,7 @@ src/Bandit/Tutorial.hs: literate/tutorial.md hbandit.nix src
libraryHaskellDepends = [
aeson
inline-r
data-default
pretty-simple
];
description = "extra";
......@@ -173,6 +174,7 @@ src/Bandit/Tutorial.hs: literate/tutorial.md hbandit.nix src
];
buildInputs = [
inline-r
data-default
aeson
pretty-simple
panhandle
......@@ -197,6 +199,7 @@ README.md: literate/readme.md
name="pandoc-tools";
buildInputs = [
inline-r
data-default
aeson
pretty-simple
panhandle
......
......@@ -3,7 +3,7 @@ hbandit
Safe multi-armed bandit implementations:
- Eps-Greedy (fixed rate)
- Eps-Greedy (fixed rate, inverse squared rate)
- Exp3 (hyperparameter-free rate from \[[1](#ref-bubeck2012regret)\])
- Exp4.R \[[2](#ref-sun2017safety)\]
......
{ pkgs ? import (builtins.fetchTarball
"http://nixos.org/channels/nixos-20.03/nixexprs.tar.xz") { } }:
"https://github.com/NixOS/nixpkgs/archive/20.03.tar.gz") { } }:
with pkgs.lib;
let
......
cradle: {cabal: {component: "hbandit"}}
......@@ -8,7 +8,7 @@ link-citations: true
Safe multi-armed bandit implementations:
- Eps-Greedy (fixed rate)
- Eps-Greedy (fixed rate, inverse squared rate)
- Exp3 (hyperparameter-free rate from @bubeck2012regret)
- Exp4.R @sun2017safety
......
literate/regretPlot.png: 66.6 KB → 86 KB
literate/summaryPlot.png: 7.64 KB → 7.54 KB
......@@ -62,9 +62,13 @@ build-depends:
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE OverloadedLabels #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE QuasiQuotes #-}
import Protolude
import Text.Pretty.Simple
......@@ -121,18 +125,19 @@ main = do
```
> -- * Non-contextual
> -- | We'll first cover the case of simple MABs that do not use context information.
> -- | This tutorial only covers non-contextual bandit algorithms.
>
> -- ** Classes
> --
> -- | The main algorithm class for non-contextual bandits is 'Bandit'. This class gives
> -- *** Non-contextual
> --
> -- | The algorithm class for non-contextual bandits is 'Bandit'. This class gives
> -- types for a basic bandit game between a learner and an environment, where the
> -- learner has access to a random generator and is defined via a stateful 'step'
> -- function. All non-contextual bandit algorithms in this library are instances of this.
> -- function.
> Bandit.Class.Bandit(..)
>
> -- *** Example instance: Epsilon-Greedy
> -- **** Example instance: Epsilon-Greedy
> --
> -- | Let's take a look at the instance for the classic fixed-rate \(\epsilon\)-Greedy
> -- algorithm. The necessary hyperparameters are the number of arms and the rate value,
......@@ -182,15 +187,15 @@ onePass hyper g adversary = runGame initialGame
```
> -- | Specializing this to the 'EpsGreedy' datatype on a small toy dataset:
> -- | Specializing this to the 'EpsGreedy' datatype on a small toy dataset, using a fixed rate:
```{.haskell pipe="tee -a Tmodule.hs | awk '{print \"> -- > \" $0}' | (echo '> -- | ' ;cat - )"}
runOnePassEG :: StdGen -> GameState (EpsGreedy Bool) Bool Double
runOnePassEG :: StdGen -> GameState (EpsGreedy Bool FixedRate) Bool Double
runOnePassEG g = onePass hyper g (getZipList $ f <$> ZipList [40, 2, 10] <*> ZipList [4, 44 ,3] )
where
f a b = \case True -> a; False -> b
hyper = EpsGreedyHyper {epsilon = 0.5, arms = Bandit.Arms [True, False]}
hyper = EpsGreedyHyper {rateRep = (FixedRate 0.5), arms = Bandit.Arms [True, False]}
printOnePassEG :: IO ()
printOnePassEG = putText $
......@@ -207,17 +212,20 @@ printOnePassEG = putText $
```
> -- *** Other classes
> -- | Some other, more restrictive classes are available in [Bandit.Class](Bandit-Class.html) for convenience. See for
> -- example 'Bandit.Class.ParameterFreeMAB', which exposes a hyperparameter-free interface for
> -- algorithms that don't need any information besides the arm count. Those instances are not necessary
> -- per se, and the 'Bandit' class is always sufficient. Note that some instances make aggressive use
> -- of type refinement (See e.g. Bandit.Exp3.Exp3) through the 'Refined' package.
> -- In particular, we are about to make use of the \(\left[0,1\right]\) interval through the 'ZeroOne'
> -- type alias.
> ,Bandit.Types.ZeroOne
> -- ** Algorithm comparison
> -- *** Contextual
> --
> -- | The algorithm class for contextual bandits is 'ContextualBandit'. This class gives
> -- types for a bandit game between a learner and an environment with context, where the
> -- learner has access to a random generator and is defined via a stateful 'step'
> -- function.
> , Bandit.Class.ContextualBandit(..)
> -- | The 'ExpertRepresentation' class is used to encode experts.
> , Bandit.Class.ExpertRepresentation(..)
> -- ** Non-contextual algorithm comparison
> -- | This subsection runs bandit experiments on an example dataset with some of the @Bandit@ instances.
> -- The data for this tutorial is generated in R using the [inline-r](https://hackage.haskell.org/package/inline-r) package.
> -- Let's define a simple problem with three gaussian arms. We will threshold all cost values to \(\left[0,1\right]\).
......@@ -281,66 +289,74 @@ exp3 dataset g =
g
(toAdversary $ refineDataset dataset)
greedy :: [[Double]] -> StdGen -> Double -> GameState (EpsGreedy Int) Int (Double)
greedy dataset g eps =
greedy :: (Rate r) => [[Double]] -> StdGen -> r -> GameState (EpsGreedy Int r) Int (Double)
greedy dataset g r =
onePass
(EpsGreedyHyper {epsilon = eps, arms = Bandit.Arms [0..2]})
(EpsGreedyHyper {rateRep = r, arms = Bandit.Arms [0..2]})
g
(toAdversary dataset)
simulation :: Int -> IO ([Int],[Int],[Double],[Double],[Double])
simulation seed = do
data SimResult t = SimResult {
t :: t Int,
seed :: t Int,
greedy05 :: t Double,
greedy03 :: t Double,
greedysqrt05 :: t Double,
exp3pf :: t Double
} deriving (Generic)
simulation :: Int -> Int -> IO (SimResult [])
simulation tmax seed@(mkStdGen -> g) = do
dataset <- generateGaussianData tmax (unsafeRefine <$> [0.1, 0.5, 0.6])
return ([1 .. tmax], Protolude.replicate tmax seed, greedy05 dataset, greedy03 dataset, exp3pf dataset)
where tmax = 400
g = mkStdGen seed
greedy05 :: [[Double]] -> [Double]
greedy05 dataset = extract $ greedy dataset g 0.5
greedy03 :: [[Double]] -> [Double]
greedy03 dataset = extract $ greedy dataset g 0.3
exp3pf :: [[Double]] -> [Double]
exp3pf dataset = fmap unrefine . extract $ exp3 dataset g
extract = Protolude.toList . Sequence.reverse . historyLosses
newtype Reducer = Reducer {getReducer :: ([Int],[Int],[Double],[Double],[Double])}
instance Semigroup Reducer where
(getReducer -> (a,b,c,d,e)) <> (getReducer -> (a',b',c',d',e')) = Reducer (a<>a',b<>b',c<>c',d<>d',e<>e')
instance Monoid Reducer where
mempty = Reducer ([],[],[],[],[])
```
```{.haskell pipe="bash execute.sh expe"}
results <- forM ([2..10] ::[Int]) simulation
let exported = T.unpack $ T.decodeUtf8 $ encode $ getReducer $ mconcat (Reducer <$> results)
[r|
data.frame(t(jsonlite::fromJSON(exported_hs))) %>%
summary %>%
print
|]
return $ SimResult {
t = [1 .. tmax],
seed = Protolude.replicate tmax seed,
greedy05 = extract $ greedy dataset g (FixedRate 0.5),
greedy03 = extract $ greedy dataset g (FixedRate 0.3),
greedysqrt05 = extract $ greedy dataset g (InverseSqrtRate 0.5),
exp3pf = fmap unrefine . extract $ exp3 dataset g
}
where
extract = Protolude.toList . Sequence.reverse . historyLosses
instance Semigroup (SimResult []) where
x <> y = SimResult {
t = f Main.t,
seed = f seed,
greedy05 = f greedy05,
greedy03 = f greedy03,
greedysqrt05 = f greedysqrt05,
exp3pf = f exp3pf
}
where f accessor = accessor x <> accessor y
instance Monoid (SimResult []) where
mempty = SimResult mempty mempty mempty mempty mempty mempty
instance ToJSON (SimResult []) where
toJSON SimResult{..} =
toJSON (t, seed, greedy05, greedy03, greedysqrt05, exp3pf)
```
```{.haskell pipe="bash ggplot.sh regretPlot 20 7 "}
results <- forM ([2..10] ::[Int]) (simulation 400)
let exported = T.unpack $ T.decodeUtf8 $ encode $ mconcat results
[r| data.frame(t(jsonlite::fromJSON(exported_hs))) %>%
rename(t = X1, iteration = X2, greedy05= X3, greedy03=X4, exp3=X5 ) %>%
rename(t = X1, iteration = X2, greedy05= X3, greedy03=X4, greedysqrt05=X5,exp3=X6 ) %>%
gather("strategy", "loss", -t, -iteration) %>%
mutate(strategy=factor(strategy)) %>%
group_by(strategy,iteration) %>%
mutate(regret = cumsum(loss-0.1)) %>%
ungroup() %>%
ggplot(., aes(t, regret, color=strategy, group=interaction(strategy, iteration))) +
geom_line() + ylab("External Regret")
geom_line(alpha=0.5) + ylab("External Regret")
|]
```
> ) where
> import Bandit.Class
> import Bandit.Types
> import Bandit.EpsGreedy
```{.haskell pipe="tee -a main.hs | awk '{print \"> -- \" $0}'"}
......
......@@ -11,16 +11,19 @@
module Bandit.Class
( -- * Generalized Bandit
Bandit (..),
ExpertRepresentation (..),
ContextualBandit (..),
-- * Discrete Multi-Armed-Bandits
Arms (..),
ParameterFreeMAB (..),
-- * Hyperparameters
-- | These typeclass-based indirection layers help avoid unserializable
-- hyperparameters.
ExpertRepresentation (..),
Rate (..),
)
where
import Bandit.Types
import Data.Coerce
import Protolude
import System.Random
......@@ -67,32 +70,25 @@ class (ExpertRepresentation er s a) => ContextualBandit b hyper s a l er | b ->
-- | @step loss@ iterates the bandit process one step forward.
stepCtx :: (RandomGen g, MonadState b m, Ord a) => g -> l -> s -> m (a, g)
-- | ExpertRepresentation er s a is a distribution over
-- experts.
-- | ExpertRepresentation er s a is a representation that can be cast
-- into a distribution over actions.
--
-- @represent er@ returns this distribution encoded as a conditional
-- @toExpert er@ returns the expert encoded as a conditional
-- distribution over actions.
class ExpertRepresentation er s a | er -> s, er -> a where
represent :: er -> (s -> NonEmpty (ZeroOne Double, a))
toExpert :: er -> (s -> NonEmpty (ZeroOne Double, a))
-- | Arms a represents a set of possible actions.
newtype Arms a = Arms (NonEmpty a)
deriving (Show, Generic)
instance ExpertRepresentation (ObliviousRep a) () a where
toExpert (ObliviousRep l) () = l
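As a standalone sketch of the idea (the class and newtype are redefined here so the snippet compiles without the library, and plain `Double` weights stand in for the refined `ZeroOne Double`):

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- An expert representation can be cast into a conditional
-- distribution over actions: a context s maps to weighted actions.
class ExpertRepresentation er s a | er -> s, er -> a where
  toExpert :: er -> s -> NonEmpty (Double, a)

-- An oblivious expert ignores its context entirely.
newtype ObliviousRep a = ObliviousRep (NonEmpty (Double, a))

instance ExpertRepresentation (ObliviousRep a) () a where
  toExpert (ObliviousRep l) () = l
```

For example, `toExpert (ObliviousRep ((0.7, "left") :| [(0.3, "right")])) ()` returns the stored distribution unchanged, whatever the (unit) context is.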
-- | Hyper-Parameter-free MAB. In this context, \(\mathbb{L}\) is known
-- statically, and \(\mathbb{A}\) is specified by \(\mathbb{H}\)=@Arms a@,
-- which is the set of finite non-empty sets. We define the regret \(R_T\) as:
-- | Rate r is a learning rate.
--
-- \[ R_T = \sum_{t=1}^{T} \ell_{a^t}^t - \text{min}_{a=1}^{K} \sum_{t=1}^{T}
-- \ell_{a}^t \]
class (Eq a, Bandit b (Arms a) a l) => ParameterFreeMAB b a l | b -> l where
-- @toRate r@ returns the rate schedule.
class Rate r where
toRate :: r -> Int -> Double
-- | @init as@ returns the initial state of the bandit algorithm, where @as@
-- is a set of available actions.
initPFMAB :: (RandomGen g) => g -> Arms a -> (b, a, g)
initPFMAB = init
instance Rate FixedRate where
toRate = const . coerce
-- | @step l@ iterates the bandit process one step forward by feeding loss
-- value @l@.
stepPFMAB :: (RandomGen g, MonadState b m) => g -> l -> m (a, g)
stepPFMAB = step
instance Rate InverseSqrtRate where
toRate x t = coerce x / sqrt (fromIntegral t)
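A quick standalone illustration of the two schedules above (the newtypes and class are redefined here so the snippet runs without the library; the shapes mirror `FixedRate` and `InverseSqrtRate`):

```haskell
newtype FixedRate = FixedRate Double
newtype InverseSqrtRate = InverseSqrtRate Double

class Rate r where
  toRate :: r -> Int -> Double

-- A fixed rate ignores the time step entirely.
instance Rate FixedRate where
  toRate (FixedRate e) _ = e

-- An inverse-square-root rate vanishes as t grows, so the
-- algorithm explores less and less over time.
instance Rate InverseSqrtRate where
  toRate (InverseSqrtRate e) t = e / sqrt (fromIntegral t)
```

At t = 1, 4, 100 the `InverseSqrtRate 0.5` schedule yields 0.5, 0.25, 0.05, while `FixedRate 0.5` stays at 0.5 throughout.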
{-# OPTIONS_GHC -fno-warn-partial-fields #-}
-- |
-- Module : Bandit.EpsGreedy
-- Copyright : (c) 2019, UChicago Argonne, LLC.
......@@ -13,49 +15,47 @@ module Bandit.EpsGreedy
( EpsGreedy (..),
Weight (..),
EpsGreedyHyper (..),
ScreeningGreedy (..),
ExploreExploitGreedy (..),
pickreturn,
pickRandom,
updateAvgLoss,
)
where
import Bandit.Class
import Bandit.Types
import Bandit.Util
import Control.Lens
import Control.Monad.Random as MR (fromList, runRand)
import Data.Generics.Labels ()
import Protolude
import System.Random
-- | The EpsGreedy state
data EpsGreedy a
= -- | Still screening for initial estimates
Screening (ScreeningGreedy a)
| -- | The sampling procedure has started.
ExploreExploit (ExploreExploitGreedy a)
deriving (Show)
data EpsGreedy a r
= EpsGreedy
{ t :: Int,
rate :: r,
lastAction :: a,
params :: Params a
}
deriving (Show, Generic)
data Params a = InitialScreening (Screening a) | Started (ExploreExploit a)
deriving (Show, Generic)
-- | A subcomponent of the EpsGreedy state.
data ScreeningGreedy a
= ScreeningGreedy
{ tScreening :: Int,
epsScreening :: Double,
screening :: a,
screened :: [(Double, a)],
-- | Still screening for initial estimates
data Screening a
= Screening
{ screened :: [(Double, a)],
screenQueue :: [a]
}
deriving (Show)
deriving (Show, Generic)
-- | A subcomponent of the EpsGreedy state.
data ExploreExploitGreedy a
= ExploreExploitGreedy
{ t :: Int,
eps :: Double,
lastAction :: a,
k :: Int,
-- | The sampling procedure has started.
data ExploreExploit a
= ExploreExploit
{ k :: Int,
weights :: NonEmpty (Weight a)
}
deriving (Show)
deriving (Show, Generic)
-- | The information maintaining structure for one action.
data Weight a
......@@ -67,82 +67,84 @@ data Weight a
deriving (Show)
deriving (Generic)
toW :: (Double, a) -> Weight a
toW (loss, action) = Weight loss 1 action
-- | The epsilon-greedy hyperparameter.
data EpsGreedyHyper a
data EpsGreedyHyper a r
= EpsGreedyHyper
{ epsilon :: Double,
{ rateRep :: r,
arms :: Arms a
}
deriving (Show)
-- | The fixed rate \(\epsilon\)-Greedy MAB algorithm.
-- | The variable rate \(\epsilon\)-Greedy MAB algorithm.
-- Offers no interesting guarantees, works well in practice.
instance (Eq a) => Bandit (EpsGreedy a) (EpsGreedyHyper a) a Double where
instance (Rate r, Eq a) => Bandit (EpsGreedy a r) (EpsGreedyHyper a r) a Double where
init g (EpsGreedyHyper e (Arms (a :| as))) =
( Screening $ ScreeningGreedy
{ tScreening = 1,
epsScreening = e,
screening = a,
screened = [],
screenQueue = as
init g (EpsGreedyHyper r (Arms (a :| as))) =
( EpsGreedy
{ t = 1,
rate = r,
lastAction = a,
params = InitialScreening $ Screening
{ screened = [],
screenQueue = as
}
},
a,
g
)
step g l =
get >>= \case
Screening sg ->
step g l = do
oldAction <- use #lastAction
schedule <- use #rate <&> toRate
e <- use #t <&> schedule
#t += 1
(a, newGen) <- use #params >>= \case
InitialScreening sg ->
case screenQueue sg of
(a : as) -> do
put . Screening $
sg
{ tScreening = tScreening sg + 1,
screening = a,
screened = (l, screening sg) : screened sg,
screenQueue = as
}
#params . #_InitialScreening .= Screening
{ screened = (l, oldAction) : screened sg,
screenQueue = as
}
return (a, g)
[] -> do
let eeg = ExploreExploitGreedy
{ t = tScreening sg + 1,
eps = epsScreening sg,
lastAction = screening sg,
k = length (screened sg) + 1,
weights = toW <$> ((l, screening sg) :| screened sg)
let ee = ExploreExploit
{ k = length (screened sg) + 1,
weights = toW <$> ((l, oldAction) :| screened sg)
}
pickreturn eeg g
where
toW :: forall a. (Double, a) -> Weight a
toW (loss, action) = Weight loss 1 action
ExploreExploit s -> do
#params . #_Started .= ee
pickreturn e g ee
Started s -> do
let eeg =
s
{ t = t s + 1,
weights = weights s <&> \w ->
if action w == lastAction s
{ weights = weights s <&> \w ->
if action w == oldAction
then updateAvgLoss l w
else w
}
pickreturn eeg g
pickreturn e g eeg
#lastAction .= a
return (a, newGen)
-- | Action selection and return
pickreturn ::
(RandomGen g, MonadState (EpsGreedy b) m) =>
ExploreExploitGreedy b ->
(RandomGen g, MonadState (EpsGreedy b r) m) =>
Double ->
g ->
ExploreExploit b ->
m (b, g)
pickreturn eeg g = do
let (a, g') = runRand (MR.fromList [(True, toRational $ eps eeg), (False, toRational $ 1 - eps eeg)]) g & \case
pickreturn eps g eeg = do
let (a, g') = runRand (MR.fromList [(True, toRational eps), (False, toRational $ 1 - eps)]) g & \case
(True, g'') -> pickRandom eeg g''
(False, g'') -> (action $ minimumBy (\(averageLoss -> a1) (averageLoss -> a2) -> compare a1 a2) (weights eeg), g'')
put . ExploreExploit $ eeg {lastAction = a}
return (a, g')
-- | Random action selection primitive
pickRandom :: (RandomGen g) => ExploreExploitGreedy a -> g -> (a, g)
pickRandom ExploreExploitGreedy {..} =
pickRandom :: (RandomGen g) => ExploreExploit a -> g -> (a, g)
pickRandom ExploreExploit {..} =
sampleWL $
fromMaybe
(panic "distribution normalization failure")
......
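The two-branch selection that `pickreturn` performs can be sketched standalone (a hypothetical `epsSelect` helper over `(averageLoss, action)` pairs, not the library's API): with probability eps explore a uniformly random arm, otherwise exploit the arm with the lowest average loss.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import System.Random (StdGen, mkStdGen, randomR)

-- Epsilon-greedy selection over (average loss, action) pairs:
-- flip a biased coin; explore a random arm with probability eps,
-- otherwise pick the arm with the minimal average loss.
epsSelect :: StdGen -> Double -> [(Double, a)] -> (a, StdGen)
epsSelect g eps weighted =
  let (x, g') = randomR (0.0, 1.0) g
   in if x < eps
        then
          let (i, g'') = randomR (0, length weighted - 1) g'
           in (snd (weighted !! i), g'')
        else (snd (minimumBy (comparing fst) weighted), g')
```

With eps = 0 the choice is deterministic: `epsSelect g 0.0 [(0.5, "armA"), (0.1, "armB")]` always exploits and returns `"armB"`.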
......@@ -119,9 +119,3 @@ recompute t k weights = updatep <$> weights
exp (- sqrt (2.0 * log (fromIntegral k) / fromIntegral (t * k)) * cL)
denom = getSum $ foldMap denomF weights
denomF (getCumulativeLoss . cumulativeLoss -> cL) = Sum $ expw cL
-- | Regret bound for this \(\mathbb{L}=[0,1]\)-loss hyperparameter-free EXP3 version:
-- \[
-- R_T \leq \sqrt{2 T K \ln K}
-- \]
instance (Eq a) => ParameterFreeMAB (Exp3 a) a (ZeroOne Double)
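The vanishing rate used in the weight recomputation above can be written on its own (a sketch; `exp3Rate` is a hypothetical helper name, not part of the library):

```haskell
-- The hyperparameter-free Exp3 rate from the weight update above:
-- eps_t = sqrt(2 * ln k / (t * k)). It depends only on the arm
-- count k and the time step t, and vanishes as t grows.
exp3Rate :: Int -> Int -> Double
exp3Rate t k = sqrt (2.0 * log (fromIntegral k) / fromIntegral (t * k))
```

Since the schedule needs nothing beyond t and k, Exp3 requires no user-supplied learning rate, which is what the hyperparameter-free interface relies on.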
......@@ -107,7 +107,7 @@ instance
}
stepCtx g feedback s = do
weightedAdvice <- use #experts <&> fmap (fmap (($ s) . represent))
weightedAdvice <- use #experts <&> fmap (fmap (($ s) . toExpert))
lastAction <- use #lastAction
fromMaybe
pass
......@@ -185,11 +185,3 @@ mkDelta Exp4R {..} = fromIntegral $ 3 * k
-- | \( \lambda_1 = 0 \)
lambdaInitial :: R.Refined R.NonNegative Double
lambdaInitial = R.unsafeRefine 0
-- | Oblivious Categorical Expert Representation
newtype ObliviousRep a
= ObliviousRep (NonEmpty (ZeroOne Double, a))
deriving (Generic)
instance ExpertRepresentation (ObliviousRep a) () a where
represent (ObliviousRep l) () = l
......@@ -16,9 +16,13 @@ module Bandit.Tutorial (
-- > {-# LANGUAGE OverloadedLists #-}
-- > {-# LANGUAGE OverloadedLabels #-}
-- > {-# LANGUAGE DataKinds #-}
-- > {-# LANGUAGE FlexibleInstances #-}
-- > {-# LANGUAGE ScopedTypeVariables #-}
-- > {-# LANGUAGE StandaloneDeriving #-}
-- > {-# LANGUAGE NoImplicitPrelude #-}
-- > {-# LANGUAGE TemplateHaskell #-}
-- > {-# LANGUAGE DeriveAnyClass #-}
-- > {-# LANGUAGE RecordWildCards #-}
-- > {-# LANGUAGE QuasiQuotes #-}
-- > import Protolude
-- > import Text.Pretty.Simple
......@@ -70,18 +74,19 @@ module Bandit.Tutorial (
-- legend.box.background = element_rect(fill = "transparent")
-- )) |]
-- * Non-contextual
-- | We'll first cover the case of simple MABs that do not use context information.
-- | This tutorial only covers non-contextual bandit algorithms.
-- ** Classes
--
-- | The main algorithm class for non-contextual bandits is 'Bandit'. This class gives
-- *** Non-contextual
--
-- | The algorithm class for non-contextual bandits is 'Bandit'. This class gives
-- types for a basic bandit game between a learner and an environment, where the
-- learner has access to a random generator and is defined via a stateful 'step'
-- function. All non-contextual bandit algorithms in this library are instances of this.
-- function.
Bandit.Class.Bandit(..)
-- *** Example instance: Epsilon-Greedy
-- **** Example instance: Epsilon-Greedy
--
-- | Let's take a look at the instance for the classic fixed-rate \(\epsilon\)-Greedy
-- algorithm. The necessary hyperparameters are the number of arms and the rate value,
......@@ -129,14 +134,14 @@ Bandit.Class.Bandit(..)
-- > #historyActions %= (action NonEmpty.<|)
-- > #historyLosses %= (loss Sequence.<|)
-- | Specializing this to the 'EpsGreedy' datatype on a small toy dataset:
-- | Specializing this to the 'EpsGreedy' datatype on a small toy dataset, using a fixed rate:
-- |