envs ¶

rlib's environment subpackage.

Targets the modern Gymnasium 5-tuple API directly and exposes:

:class:RLEnv — abstract base class for rlib wrappers (provides __getattr__ delegation, unwrapped, context-manager support).
:class:RLVecEnv — abstract base for vectorised env runners (BatchEnv and DummyBatchEnv).
:func:make — re-export of :func:gymnasium.make.

Built-in custom envs (ApplePicker-v0, ApplePickerDeterministic-v0) are registered with Gymnasium at import time so gymnasium.make("ApplePicker-v0") works out of the box.

RLEnv ¶

Bases: ABC

Concrete-friendly base class for rlib adapters and wrappers.

Subclasses must implement :meth:reset, which returns (obs, info), and :meth:step, which returns the canonical 5-tuple (obs, reward, terminated, truncated, info). All other convenience members (unwrapped, __getattr__ forwarding, context-manager support, render, close, spec, ...) delegate to self.env when it is present, so wrapper classes get sane defaults for free.

A subclass that does not wrap another env (e.g. a hand-written simulator) should set self.env = None and override the relevant members directly.
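The delegation behaviour described above can be sketched as follows. This is a minimal illustration, not rlib's actual implementation: SketchEnv, Counter and Passthrough are hypothetical names, and the way unwrapped and __getattr__ are written here is an assumption about how such forwarding is typically done.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchEnv(ABC):
    """Hypothetical stand-in for RLEnv; not rlib's actual code."""

    def __init__(self, env: Any = None) -> None:
        self.env = env  # wrapped env, or None for a hand-written simulator

    @abstractmethod
    def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]: ...

    @abstractmethod
    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...

    @property
    def unwrapped(self) -> Any:
        # Walk down through nested wrappers until nothing is left to unwrap.
        if self.env is None:
            return self
        return getattr(self.env, "unwrapped", self.env)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails; forward to the wrapped env.
        if name == "env" or self.env is None:
            raise AttributeError(name)
        return getattr(self.env, name)

    def close(self) -> None:
        if self.env is not None:
            self.env.close()

    def __enter__(self) -> "SketchEnv":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.close()


class Counter(SketchEnv):
    """Hand-written simulator with no inner env: counts up to `limit`."""

    def __init__(self, limit: int = 3) -> None:
        super().__init__(env=None)
        self.limit = limit
        self.count = 0
        self.closed = False

    def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]:
        self.count = 0
        return self.count, {}

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]:
        self.count += 1
        return self.count, 1.0, self.count >= self.limit, False, {}

    def close(self) -> None:  # overrides the delegating default
        self.closed = True


class Passthrough(SketchEnv):
    """Wrapper that only implements the two abstract methods."""

    def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]:
        return self.env.reset(seed=seed, options=options)

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]:
        return self.env.step(action)


inner = Counter(limit=3)
with Passthrough(Passthrough(inner)) as env:
    env.reset()
    limit_seen = env.limit  # forwarded through two wrappers
    bottom = env.unwrapped  # the Counter at the bottom of the stack
```

In this sketch the wrapper gets attribute forwarding, unwrapped and close for free, while the simulator sets env to None and overrides close itself.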

unwrapped `property` ¶

unwrapped: Any

Walk through nested wrappers to the bottom-most env.

reset `abstractmethod` ¶

reset(*, seed: Any = None, options: Any = None) -> tuple[Any, dict]

Reset the environment and return (obs, info).

Source code in rlib/envs/base.py

@abstractmethod
def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]:
    """Reset the environment and return ``(obs, info)``."""

step `abstractmethod` ¶

step(action: Any) -> tuple[Any, float, bool, bool, dict]

Step the environment and return the modern 5-tuple.

Source code in rlib/envs/base.py

@abstractmethod
def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]:
    """Step the environment and return the modern 5-tuple."""
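As an illustration of the reset / step contract, the sketch below runs one episode against a toy environment. CoinWalk and run_episode are hypothetical names invented for this example; the toy env does not subclass RLEnv so that the snippet stays self-contained.

```python
from typing import Any, Callable


class CoinWalk:
    """Hypothetical toy env following the (obs, info) / 5-tuple contract."""

    def __init__(self, goal: int = 3, max_steps: int = 5) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self, *, seed: Any = None, options: Any = None) -> tuple[int, dict]:
        self.pos = 0
        self.t = 0
        return self.pos, {}

    def step(self, action: int) -> tuple[int, float, bool, bool, dict]:
        self.pos += action  # action is -1 or +1
        self.t += 1
        terminated = self.pos >= self.goal  # the task itself ended
        truncated = not terminated and self.t >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env: CoinWalk, policy: Callable[[int], int]) -> tuple[float, bool, bool]:
    """Roll out one episode and report how it ended."""
    obs, info = env.reset()
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total, terminated, truncated


reached_goal = run_episode(CoinWalk(), lambda obs: +1)
timed_out = run_episode(CoinWalk(), lambda obs: -1)
```

The first rollout ends with terminated set because the goal is reached; the second ends with truncated set because the step budget runs out first.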

RLVecEnv ¶

Bases: ABC

Abstract base for rlib's vectorised environment runners.

Agent rollout code in this library has historically consumed the legacy 4-tuple (obs, rewards, dones, infos), and that agent-facing shape is kept on purpose. The per-env 5-tuple lives on the wrapper side; RLVecEnv implementations are responsible for collapsing terminated/truncated into a single done flag in one place, this base class's :meth:merge_done helper.

reset `abstractmethod` ¶

reset() -> Any

Return a stacked batch of initial observations.

Source code in rlib/envs/base.py

@abstractmethod
def reset(self) -> Any:
    """Return a stacked batch of initial observations."""

step `abstractmethod` ¶

step(actions: Any) -> Any

Step every sub-env and return (obs, rewards, dones, infos).

Source code in rlib/envs/base.py

@abstractmethod
def step(self, actions: Any) -> Any:
    """Step every sub-env and return ``(obs, rewards, dones, infos)``."""
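A rough sketch of how a synchronous runner might turn per-env 5-tuples into the agent-facing 4-tuple is shown below. step_all and ScriptedEnv are hypothetical names; the sketch returns plain lists rather than stacked arrays and does not auto-reset finished envs, both of which are simplifying assumptions rather than statements about rlib's runners.

```python
from typing import Any, Sequence


class ScriptedEnv:
    """Hypothetical sub-env that replays one fixed 5-tuple on every step."""

    def __init__(self, transition: tuple[Any, float, bool, bool, dict]) -> None:
        self.transition = transition

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]:
        return self.transition


def step_all(envs: Sequence[Any], actions: Sequence[Any]) -> tuple[list, list, list, list]:
    """Step each sub-env once and collapse the results into a 4-tuple of lists."""
    obs_batch, rewards, dones, infos = [], [], [], []
    for env, action in zip(envs, actions):
        obs, reward, terminated, truncated, info = env.step(action)
        obs_batch.append(obs)
        rewards.append(float(reward))
        # Same rule as merge_done: done = terminated or truncated.
        dones.append(bool(terminated) or bool(truncated))
        infos.append(info)
    return obs_batch, rewards, dones, infos


envs = [
    ScriptedEnv(("a", 1.0, False, False, {})),  # still running
    ScriptedEnv(("b", 0.0, True, False, {})),   # terminated
    ScriptedEnv(("c", 0.5, False, True, {})),   # truncated
]
obs, rewards, dones, infos = step_all(envs, actions=[0, 0, 0])
```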

merge_done `staticmethod` ¶

merge_done(terminated: bool, truncated: bool) -> bool

Single canonical place where done = terminated or truncated.

Centralised so future agents that want to distinguish the two (e.g. for correct value-bootstrapping on truncation) only need to change call sites here.

Source code in rlib/envs/base.py

@staticmethod
def merge_done(terminated: bool, truncated: bool) -> bool:
    """Single canonical place where ``done = terminated or truncated``.

    Centralised so future agents that want to distinguish the two
    (e.g. for correct value-bootstrapping on truncation) only need
    to change call sites here.
    """
    return bool(terminated) or bool(truncated)
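To see why an agent might want to distinguish the two flags, consider a one-step TD target. The sketch below is illustrative only: td_target is a hypothetical helper, and the numbers are chosen so the arithmetic is exact. On a truncated (but not terminated) step, masking the bootstrap term with the merged done flag discards the value of the next state, while masking with terminated alone keeps it.

```python
def merge_done(terminated: bool, truncated: bool) -> bool:
    """Standalone copy of the helper shown above."""
    return bool(terminated) or bool(truncated)


def td_target(reward: float, next_value: float, gamma: float, stop_bootstrap: bool) -> float:
    """One-step target r + gamma * V(s'), with V(s') dropped when stop_bootstrap is set."""
    mask = 0.0 if stop_bootstrap else 1.0
    return reward + gamma * next_value * mask


# A step cut off by a time limit: the episode was truncated, not terminated.
terminated, truncated = False, True
done = merge_done(terminated, truncated)

target_with_done = td_target(1.0, 10.0, 0.5, stop_bootstrap=done)
target_with_terminated = td_target(1.0, 10.0, 0.5, stop_bootstrap=terminated)
```

Here target_with_done is 1.0 and target_with_terminated is 6.0, which is the gap the docstring alludes to when it mentions correct value-bootstrapping on truncation.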

merge_info `staticmethod` ¶

merge_info(info: dict, terminated: bool, truncated: bool) -> dict

Annotate info with the legacy TimeLimit.truncated key.

Mirrors Gymnasium's behaviour so any agent that inspects the info dict for truncation sees the same value regardless of backend.

Source code in rlib/envs/base.py

@staticmethod
def merge_info(info: dict, terminated: bool, truncated: bool) -> dict:
    """Annotate ``info`` with the legacy ``TimeLimit.truncated`` key.

    Mirrors Gymnasium's behaviour so any agent that inspects the
    info dict for truncation sees the same value regardless of
    backend.
    """
    if not isinstance(info, dict):
        return info
    out = dict(info)
    out.setdefault(
        "TimeLimit.truncated",
        bool(truncated) and not bool(terminated),
    )
    return out
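The behaviour of this helper on a few inputs can be checked with a standalone copy of the function above. The example inputs are invented for illustration.

```python
def merge_info(info, terminated, truncated):
    """Standalone copy of the helper shown above."""
    if not isinstance(info, dict):
        return info
    out = dict(info)
    out.setdefault(
        "TimeLimit.truncated",
        bool(truncated) and not bool(terminated),
    )
    return out


original = {"score": 7}

# Truncated without terminating: the key is added and set to True.
time_limit_hit = merge_info(original, terminated=False, truncated=True)

# Terminated and truncated on the same step: termination wins, so the key is False.
both_flags = merge_info({}, terminated=True, truncated=True)

# An existing key is left alone because setdefault never overwrites.
preset = merge_info({"TimeLimit.truncated": True}, terminated=True, truncated=False)

# Non-dict infos are passed through unchanged.
not_a_dict = merge_info(None, terminated=False, truncated=True)
```

Because the helper copies the dict before annotating it, the caller's original info is not mutated.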

BatchEnv ¶

BatchEnv(env_constructor: Callable[..., RLEnv], env_id: str, num_envs: int, blocking: bool = False, make_args: dict | None = None, **env_args)

Bases: RLVecEnv

Run num_envs envs in parallel, one subprocess each.

Source code in rlib/envs/vec_env.py

def __init__(
    self,
    env_constructor: Callable[..., RLEnv],
    env_id: str,
    num_envs: int,
    blocking: bool = False,
    make_args: dict | None = None,
    **env_args,
):
    make_args = make_args or {}
    self.envs: list[Env] = []
    for _ in range(num_envs):
        inner = gym.make(env_id, **make_args)
        self.envs.append(Env(env_constructor(inner, **env_args)))
    self.blocking = blocking

DummyBatchEnv ¶

DummyBatchEnv(env_constructor: Callable[..., RLEnv], env_id: str, num_envs: int, make_args: dict | None = None, **env_args)

Bases: RLVecEnv

Synchronous (in-process) vec env runner.

Lower overhead than :class:BatchEnv for cheap envs where multi-processing is not worth it.

Source code in rlib/envs/vec_env.py

def __init__(
    self,
    env_constructor: Callable[..., RLEnv],
    env_id: str,
    num_envs: int,
    make_args: dict | None = None,
    **env_args,
):
    make_args = make_args or {}
    self.envs: list[RLEnv] = [
        env_constructor(gym.make(env_id, **make_args), **env_args) for _ in range(num_envs)
    ]
