Agents

rlib ships with the following agents. The trainer for every agent except A3C subclasses SyncMultiEnvTrainer, so these agents share a common train() / validation interface, plus a python -m rlib.<Agent> <config>.yaml CLI entry point.

| Agent | Module | Trainer class | Off-/on-policy | Best for |
| --- | --- | --- | --- | --- |
| Advantage Actor Critic | rlib.A2C | A2C | on-policy | Classic control, baseline Atari |
| A2C with LSTM | rlib.A2C | A2CLSTMTrainer | on-policy | Partially observable tasks |
| Asynchronous A3C | rlib.A3C | (custom) | on-policy | CPU-only multi-worker setups |
| Synchronous n-step Double DQN | rlib.DDQN | SyncDDQN | off-policy | Discrete-action Atari |
| Proximal Policy Optimisation | rlib.PPO | PPOTrainer | on-policy | Strong general-purpose baseline |
| Random Network Distillation | rlib.RND | RNDTrainer | on-policy + intrinsic | Hard-exploration / sparse rewards |
| Intrinsic Curiosity Module | rlib.Curiosity | CuriosityTrainer | on-policy + intrinsic | Sparse-reward exploration |
| UNREAL-A2C / A2C2 | rlib.Unreal | UnrealTrainer | on-policy + auxiliary | Sample-efficient pixel learning |
| Decoupled Advantage AC | rlib.DAAC | DAACTrainer | on-policy | Generalisation in procgen-style envs |
| Value Iteration Networks | rlib.VIN | VINTrainer | imitation / planning | Grid-world planning tasks |
| RANDAL | rlib.RANDAL | RANDALTrainer | RND + UNREAL combo | Hard-exploration with auxiliary tasks |
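
The shared train() / validation interface mentioned above follows a familiar base-class pattern, which the toy sketch below illustrates. Everything in it is hypothetical: ToyBaseTrainer, ToyCounterTrainer, the update() and validate() hooks, and the validate_every parameter are illustrative names only, not rlib's API, and the real SyncMultiEnvTrainer may be organised quite differently.

```python
from abc import ABC, abstractmethod


class ToyBaseTrainer(ABC):
    """Hypothetical stand-in for a shared base trainer (not rlib's code)."""

    def __init__(self, total_steps: int, validate_every: int) -> None:
        self.total_steps = total_steps
        self.validate_every = validate_every
        self.validation_log = []  # list of (step, score) pairs

    @abstractmethod
    def update(self, step: int) -> None:
        """Algorithm-specific learning step, supplied by each subclass."""

    @abstractmethod
    def validate(self) -> float:
        """Return a validation score for the current policy."""

    def train(self) -> None:
        # The loop lives in the base class, so every subclass is driven
        # through the same train() call and validated on the same schedule.
        for step in range(1, self.total_steps + 1):
            self.update(step)
            if step % self.validate_every == 0:
                self.validation_log.append((step, self.validate()))


class ToyCounterTrainer(ToyBaseTrainer):
    """Toy subclass: 'learning' just counts how many updates have run."""

    def __init__(self, total_steps: int, validate_every: int) -> None:
        super().__init__(total_steps, validate_every)
        self.updates = 0

    def update(self, step: int) -> None:
        self.updates += 1

    def validate(self) -> float:
        return float(self.updates)


trainer = ToyCounterTrainer(total_steps=10, validate_every=5)
trainer.train()
print(trainer.validation_log)  # [(5, 5.0), (10, 10.0)]
```

In this pattern only update() and validate() vary between algorithms, which is what lets calling code treat every trainer the same way.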

Common training pattern

Via the YAML CLI (recommended):

python -m rlib.A2C examples/paper/configs/classic_a2c.yaml

Or from Python:

import gymnasium as gym
from rlib.A2C import A2CTrainer, A2CConfig, ActorCritic
from rlib.envs import DummyBatchEnv
from rlib.models import MLP
from rlib.training import TrainerConfig

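# Eight CartPole training environments, stepped together as one batch
train_envs = DummyBatchEnv(lambda e: e, "CartPole-v1", num_envs=8)
# Four independent environments reserved for validation
val_envs = [gym.make("CartPole-v1") for _ in range(4)]

# Actor-critic agent with an MLP model, sized from the environment's spaces
agent = ActorCritic(
    MLP,
    input_shape=train_envs.observation_space.shape,
    action_size=train_envs.action_space.n,
    config=A2CConfig(lr=7e-4, decay_steps=int(1e5), grad_clip=0.5),
)
# Train for 1e5 total steps with nsteps=5, validating on val_envs
A2CTrainer(
    envs=train_envs,
    agent=agent,
    val_envs=val_envs,
    config=TrainerConfig(total_steps=int(1e5), nsteps=5),
).train()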
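
For scripted runs, the python -m rlib.<Agent> <config>.yaml entry point can also be assembled programmatically. The sketch below is a minimal example under stated assumptions: build_cli_command is a hypothetical helper, not part of rlib, and it only builds the argument list without launching anything.

```python
import sys


def build_cli_command(agent_module, config_path):
    """Return the argv list for `python -m rlib.<Agent> <config>.yaml`.

    Hypothetical helper for illustration; rlib does not provide it.
    """
    # sys.executable keeps the launch inside the current interpreter/venv.
    return [sys.executable, "-m", f"rlib.{agent_module}", config_path]


cmd = build_cli_command("A2C", "examples/paper/configs/classic_a2c.yaml")
print(" ".join(cmd[1:]))

# To launch for real (requires rlib to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when config paths contain spaces.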

Full-fidelity reproductions of the experiments from arXiv:1910.09281 live under examples/paper/. For per-class API details see the API reference.