# Changelog
All notable changes to rlib are documented in this file. The format is loosely based on Keep a Changelog and this project adheres to Semantic Versioning.
## 3.0.0 - Unreleased
A large modernisation of the library covering packaging, the env layer, the
agent / trainer split, configuration, the CLI, docs and CI. It contains
breaking changes for users still on legacy `gym` and for anything that
imported from `rlib.networks` or `rlib.utils.SyncMultiEnvTrainer`.
### Added
#### Packaging, licensing and tooling

- Apache 2.0 license (`LICENSE` + `NOTICE` recording attribution to OpenAI Baselines), PEP 621 packaging in `pyproject.toml` with optional extras `[atari]`, `[classic]`, `[mujoco]`, `[docs]`, `[dev]`, a `Dockerfile` for reproducible CPU runtimes, a `requirements.txt` for `pip install -r` workflows, and a PEP 561 `py.typed` marker.
- GitHub Actions CI (`.github/workflows/ci.yml`): `ruff` lint + format-check, `mypy` over `rlib/envs` and `rlib/utils/schedulers.py`, a `pytest` matrix on Python 3.11 + 3.12, and `python -m build` + `twine check` with artefact upload.
- Docs CI (`.github/workflows/docs.yml`): `mkdocs build --strict` artefact + `actions/deploy-pages@v4` deploy on push to `master` or `v3`.
- `Makefile` mirroring the CI workflow (`make install / lint / format / format-check / typecheck / test / build / docs / ci / clean`) so the same commands run locally and in CI. `TYPECHECK_PATHS` is shared.
- `pre-commit` configuration running `ruff` + `ruff-format`.
#### Environment layer

- `rlib.envs` package: the single canonical, backend-agnostic env layer:
    - `RLEnv`: abstract base class for the modern 5-tuple `(obs, reward, terminated, truncated, info)` step / `(obs, info)` reset contract, with helpful defaults (`unwrapped`, `__getattr__` forwarding, context-manager support, `render`, `close`, `spec`).
    - `RLVecEnv`: abstract base for vectorised env runners. Its static helpers `merge_done(terminated, truncated)` and `merge_info(info, terminated, truncated)` are the single canonical place in the codebase that collapses the 5-tuple into the legacy `done` flag.
    - `make(env_or_id, **kwargs)`: re-exports `gymnasium.make` so import sites do not need to touch `gymnasium` directly.
    - `rlib.envs.vec_env`: `BatchEnv` (multiprocessing), `DummyBatchEnv` (in-process) and `ChunkEnv` (sharded multiprocessing), all subclassing `RLVecEnv`.
    - `rlib.envs.wrappers`: Atari + classic preprocessing wrappers (`AtariEnv`, `RescaleEnv`, `NoopResetEnv`, `FireResetEnv`, `EpisodicLifeEnv`, `ClipRewardEnv`, `StackEnv`, `AutoResetEnv`, `ChannelsFirstEnv`, `GreyScaleEnv`, `ToTorchEnv`, …), all subclassing `RLEnv`.
    - `rlib.envs.apple_picker`: the `ApplePicker-v0` / `ApplePickerDeterministic-v0` exploration envs from the RANDAL paper, ported to the Gymnasium 5-tuple API and auto-registered at import time.
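Collapsing the 5-tuple into the legacy `done` flag amounts to an element-wise logical OR of `terminated` and `truncated`. The sketch below illustrates this and assumes those semantics. It is a standalone illustration, not `RLVecEnv.merge_done` itself, and the `to_legacy_step` helper is hypothetical.

```python
import numpy as np


def merge_done(terminated, truncated) -> np.ndarray:
    """Collapse Gymnasium's two end-of-episode flags into one legacy `done` flag.

    Assumption: an episode counts as done if it either reached a terminal
    state (`terminated`) or was cut short, e.g. by a time limit (`truncated`).
    """
    return np.logical_or(np.asarray(terminated, dtype=bool), np.asarray(truncated, dtype=bool))


def to_legacy_step(obs, reward, terminated, truncated, info):
    """Hypothetical helper: turn a 5-tuple step result into the legacy 4-tuple."""
    return obs, reward, merge_done(terminated, truncated), info


if __name__ == "__main__":
    terminated = np.array([True, False, False])
    truncated = np.array([False, True, False])
    print(merge_done(terminated, truncated))  # prints [ True  True False]
```

Because the helper goes through `np.asarray`, it accepts either a single env's Python bools or a batch of flags from a vectorised runner.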
#### Training infrastructure

- `rlib.training` package, promoted from `rlib.utils` so the trainer, configs and validation code are first-class:
    - `SyncMultiEnvTrainer`: synchronous multi-env trainer.
    - `TrainerConfig` + per-agent subclasses (`PPOTrainerConfig`, `RNDTrainerConfig`, `RANDALTrainerConfig`, `DDQNTrainerConfig`, `DAACTrainerConfig`, `UnrealTrainerConfig`) co-located with their trainer modules.
    - `TrainMode` (`StrEnum`: `NSTEP` / `ONESTEP`).
    - `Returns`: enum-of-functions wrapping `nstep_return`, `lambda_return` and `GAE`. Replaces all the previous `return_type="GAE"` string arguments and centralises the maths in `rlib.training.returns`.
    - `Validator` / `make_validator(val_envs)`: pluggable thread / vec / in-process validation runner used by every trainer.
- `tqdm` progress bars on every training loop (base trainer + every subclass), with live `score` / `loss` / `fps` postfix metrics. `validation_summary` and saved-model logs use `tqdm.write`.
- Auto-logged hyperparameters: `dataclasses.asdict` of the trainer + agent configs is written to `<log_dir>/hyperparameters.txt` at the start of every run.
#### Agent layer

- `rlib.agent.Agent` + `rlib.agent.ModelConfig`: single base class for every agent, with the dataclass-config pattern (frozen configs co-located in each agent's `model.py`).
- Every agent split into `model.py` (network + config) and `trainer.py` (training loop + per-trainer config). Each subpackage exports a curated `__all__`.
- `rlib.models`: promoted from `rlib.networks` and now houses every reusable building block (`NatureCNN`, `MaskedLSTMBlock`, MLP heads, …). `rlib.networks` is gone.
#### CLI and YAML configs

- `rlib._cli`: Hydra-style YAML runner usable as `python -m rlib.<Agent> path/to/config.yaml [--set key=value ...]`. Supports `constructor: dotted.path` instantiation, `partial: true` for `functools.partial`, `${name}` interpolation against runtime namespace values (`device`, `agent`, env-factory bundle), and helpers `atari_envs(...)` / `classic_envs(...)` / `clone_module(...)` for building the train + val env pair from a single declaration. Auto-coerces `returns: "GAE"` and `train_mode: "nstep"` strings to the matching enum members.
- Per-agent `__main__` shims (`python -m rlib.A2C config.yaml`, etc.) for every agent.
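The `constructor:` / `partial:` / `${name}` conventions can be illustrated with a small recursive resolver over an already-parsed YAML mapping. This is a sketch under assumptions, not `rlib._cli`'s implementation. `resolve` is hypothetical. The config is given as the Python dict that `yaml.safe_load` would return, so the example needs no PyYAML. Only a whole-string `${name}` is interpolated, and the env-building helpers and `--set` overrides are not covered.

```python
import functools
import importlib
import re
from typing import Any

# Matches a value that is exactly "${name}"
_REF = re.compile(r"^\$\{(\w+)\}$")


def _locate(dotted: str) -> Any:
    """Import `package.module.attr` and return the attribute."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve(node: Any, namespace: dict[str, Any]) -> Any:
    """Recursively interpolate ${name} strings and instantiate `constructor:` mappings."""
    if isinstance(node, str):
        match = _REF.match(node)
        return namespace[match.group(1)] if match else node
    if isinstance(node, list):
        return [resolve(item, namespace) for item in node]
    if isinstance(node, dict):
        if "constructor" not in node:
            return {key: resolve(value, namespace) for key, value in node.items()}
        kwargs = {
            key: resolve(value, namespace)
            for key, value in node.items()
            if key not in ("constructor", "partial")
        }
        target = _locate(node["constructor"])
        if node.get("partial", False):
            # Defer the call: return a callable with the kwargs pre-bound
            return functools.partial(target, **kwargs)
        return target(**kwargs)
    return node


config = {
    "counter": {"constructor": "collections.Counter", "red": 2, "blue": "${n_blue}"},
    "make_fraction": {"constructor": "fractions.Fraction", "partial": True, "denominator": 4},
}
built = resolve(config, {"n_blue": 3})
```

Here `built["counter"]` is a `Counter` whose `blue` count came from the runtime namespace, and `built["make_fraction"]` is a partial that still needs its numerator.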
#### Examples and docs

- `examples/`: runnable scripts (`cartpole_a2c.py`, `atari_ppo.py`, `montezuma_rnd.py`).
- `examples/paper/`: full reproductions of the experiments from arXiv:1910.09281:
    - `examples/paper/scripts/{atari,classic}_*.py`: 11 Python recipe scripts, one per (agent, env class) pair.
    - `examples/paper/configs/*.yaml`: 11 matching YAML configs that reproduce each script's hyperparameters via the `rlib._cli` runner.
- MkDocs site with `mkdocs-material`, including:
    - Hand-written overviews (`index.md` includes `README.md` via `pymdownx.snippets`, plus `agents.md`, `environments.md`, `wrappers.md`).
    - API reference auto-generated by `mkdocstrings[python]` covering every public agent package, the `Agent` base, `rlib.training`, `rlib.envs`, `rlib.models` and `rlib.utils`.
    - Deployed to https://jhare96.github.io/reinforcement-learning/.
### Changed

- Python 3.11+ required (was 3.8+). This is driven by `enum.member` (used by the `Returns` enum) being a 3.11 feature. New code uses PEP 604 `X | Y` unions and modern built-in generic aliases.
- Minimum dependency versions: PyTorch 1.13+, Gymnasium 0.29+, PyYAML ≥ 6, tqdm ≥ 4.60.
- Trainers consume agents through `self.agent` (was `self.model`). Every trainer subclass has a class-level `agent: <ConcreteSubclass>` annotation so type-checkers narrow correctly.
- `SyncMultiEnvTrainer` is now config-only: instead of accepting every hyperparameter as a kwarg, it takes a single `TrainerConfig` (or per-agent subclass).
- Agents are now config-only: `ModelConfig` (or a per-agent subclass) carries `lr`, `decay_steps`, `grad_clip`, `entropy`, `value`, … so the network constructor signature stays small.
- Vec envs cleaned up: `BatchEnv` / `DummyBatchEnv` / `ChunkEnv` consume the wrappers' 5-tuple internally and produce the legacy 4-tuple at exactly one boundary, so agent rollout code is unchanged.
- README rewritten with installation, quickstart (using the YAML CLI), agent overview, citation and contribution sections; the docs home page now includes `README.md` directly.
### Removed

- Legacy `gym` support. Anywhere that previously imported `gym` must `import gymnasium as gym` (or use `rlib.envs.make`). The `LegacyGymAdapter` and `register_backend` extension points have been removed; all envs now flow through the canonical Gymnasium API.
- `rlib.utils.gym_compat` (already deprecated in v3 development): use `rlib.envs.make` / `rlib.envs.wrap` directly.
- `rlib.networks` package: replaced by `rlib.agent` (base class + `ModelConfig`) plus `rlib.models` (network building blocks).
- `rlib.utils.SyncMultiEnvTrainer`: moved to `rlib.training`.
- Demo `def main()` blocks previously embedded in each agent module: replaced by the YAML CLI + `examples/`.
### Fixed

- Several latent bugs in trainers, models and replay memory found while the library was being refactored (`fastsample().item()` on 0-d tensors, RND / RANDAL observation-shape mismatches, an `nsteps` typo in the n-step DDQN target, `get_value` shape, …).
- mypy clean across `rlib/envs` and `rlib/utils/schedulers.py`; ruff + ruff-format clean across the whole repository.
- `mkdocs --strict` passes (the changelog and examples links are now absolute URLs).
### Migration notes
```diff
-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig
-from rlib.utils.VecEnv import BatchEnv, DummyBatchEnv
+from rlib.envs import BatchEnv, DummyBatchEnv
-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers import AtariEnv
-from rlib.networks.networks import NatureCNN
+from rlib.models import NatureCNN
-from rlib.utils.gym_compat import gym
+import gymnasium as gym
```
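`Trainer(envs, model=..., total_steps=..., nsteps=..., ...)` becomes:

```python
from rlib.training import TrainerConfig

# `A2C`, `envs`, `agent` and `val_envs` are assumed to be imported / built beforehand
trainer = A2C(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, validate_freq=int(2e4),
    log_dir="logs/A2C/CartPole", model_dir="models/A2C/CartPole",
))
trainer.train()
```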
For a fully declarative setup, prefer the YAML CLI:
```bash
python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
```
## 2.0 - 2021
- PyTorch port of the original library (previously TensorFlow-based).
## 1.0 - 2019
- Initial public release accompanying the RANDAL paper.