
Environments

rlib is built directly on Gymnasium, the maintained successor to OpenAI Gym. The canonical env contract lives in rlib.envs: the RLEnv / RLVecEnv ABCs, the BatchEnv / DummyBatchEnv runners, the wrappers, and the ApplePicker exploration env.

The 5-tuple contract

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

All wrappers and vec-env runners shipped with rlib consume this 5-tuple internally. The only place where (terminated, truncated) is collapsed into the legacy done flag for agent rollouts is RLVecEnv.merge_done / merge_info, so agents see a clean (obs, rewards, dones, infos) API.
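The collapse described above can be illustrated with a minimal sketch. This is not the body of RLVecEnv.merge_done / merge_info, which is not shown on this page; the helper names collapse_done and annotate_infos are hypothetical, and keeping the original flags in the info dicts is an assumption about what a merge step might usefully preserve.

```python
import numpy as np


def collapse_done(terminated, truncated):
    """Hypothetical helper: an env is 'done' if it terminated OR was truncated."""
    terminated = np.asarray(terminated, dtype=bool)
    truncated = np.asarray(truncated, dtype=bool)
    return terminated | truncated


def annotate_infos(infos, terminated, truncated):
    """Hypothetical helper: copy each info dict and attach the original flags."""
    return [
        {**info, "terminated": bool(te), "truncated": bool(tr)}
        for info, te, tr in zip(infos, terminated, truncated)
    ]


# Three parallel envs: one terminated, one hit a time limit, one still running.
terminated = np.array([True, False, False])
truncated = np.array([False, True, False])

dones = collapse_done(terminated, truncated)
infos = annotate_infos([{}, {}, {}], terminated, truncated)
```

Keeping the truncated flag available alongside done matters for agents that bootstrap value targets, because an episode cut off by a time limit is not a true terminal state.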

Vectorised environments

Two runners are provided in rlib.envs:

  • BatchEnv: each env runs in its own subprocess and communicates over a multiprocessing.Pipe. Use this for expensive envs (e.g. Atari).
  • DummyBatchEnv: all envs run in-process. Use this for cheap envs (e.g. classic control), where multiprocessing overhead dominates.

from rlib.envs import BatchEnv, DummyBatchEnv
from rlib.envs.wrappers import AtariEnv

envs = BatchEnv(AtariEnv, "ALE/Pong-v5", num_envs=16, k=4)
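To make the in-process idea concrete, here is a rough sketch of a runner that steps several envs sequentially and stacks the results. It is not DummyBatchEnv itself: the classes CountdownEnv and InProcessBatch are hypothetical, and the auto-reset on episode end is an assumption rather than documented rlib behaviour.

```python
import numpy as np


class CountdownEnv:
    """Hypothetical toy env using the 5-tuple contract; ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return np.array([self.t], dtype=np.float32), {}

    def step(self, action):
        self.t += 1
        obs = np.array([self.t], dtype=np.float32)
        terminated = self.t >= self.horizon
        return obs, 1.0, terminated, False, {}


class InProcessBatch:
    """Hypothetical runner: steps every env in the calling process, one by one."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return np.stack([env.reset()[0] for env in self.envs])

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            if done:
                # Assumption: auto-reset so the batch always holds a valid next obs.
                o, _ = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(done)
            infos.append(info)
        return (
            np.stack(obs),
            np.array(rewards, dtype=np.float32),
            np.array(dones, dtype=bool),
            infos,
        )


batch = InProcessBatch([lambda: CountdownEnv(horizon=2), lambda: CountdownEnv(horizon=3)])
first_obs = batch.reset()
_, _, dones_1, _ = batch.step([0, 0])
obs_2, rewards_2, dones_2, _ = batch.step([0, 0])
```

A subprocess-based runner would keep the same reset/step surface but replace the Python loop with messages sent to one worker per env, which only pays off when a single env step costs more than the inter-process round trip.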

The rlib._cli runner exposes two convenience factories, atari_envs(id, num_envs, num_val_envs, frame_stack, episodic, ...) and classic_envs(id, num_envs, num_val_envs), which build the train + val pair from a single declaration; YAML configs under examples/paper/configs/ show the typical wiring.

Built-in env: ApplePicker

The ApplePicker-v0 / ApplePickerDeterministic-v0 exploration grid-world from the RANDAL paper is registered automatically when rlib.envs is imported.

Supported environment families

| Family | Install extra | Notes |
|---|---|---|
| Classic control | pip install -e ".[classic]" | CartPole, MountainCar, Acrobot, ... |
| Atari | pip install -e ".[atari]" | ROMs auto-licensed via gymnasium[atari,accept-rom-license] |
| MuJoCo | pip install -e ".[mujoco]" | Continuous control |

Other Gymnasium-compatible suites (e.g. MiniGrid, Procgen) work as long as their observation/action spaces are compatible with the chosen agent.