envs

rlib's environment subpackage.

Targets the modern Gymnasium 5-tuple API directly:

  • RLEnv: abstract base class for rlib wrappers (provides __getattr__ delegation, unwrapped, and context-manager support).
  • RLVecEnv: abstract base for vectorised env runners (BatchEnv and DummyBatchEnv).
  • make: re-export of gymnasium.make.

Built-in custom envs (ApplePicker-v0, ApplePickerDeterministic-v0) are registered with Gymnasium at import time so gymnasium.make("ApplePicker-v0") works out of the box.

RLEnv

Bases: ABC

Concrete-friendly base class for rlib adapters and wrappers.

Subclasses must implement reset and step to honour the Gymnasium contract: reset returns (obs, info) and step returns the 5-tuple (obs, reward, terminated, truncated, info). All other convenience members (unwrapped, __getattr__ forwarding, context-manager support, render, close, spec, ...) delegate to self.env when present, so wrapper classes get sane defaults for free.

A subclass that does not wrap another env (e.g. a hand-written simulator) should set self.env = None and override the relevant members directly.
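The delegation pattern described above can be sketched roughly as follows. The body of RLEnv is not shown in this section, so the code uses a hypothetical stand-in class, SketchEnv. The names SketchEnv, Countdown, and Passthrough are invented for illustration. Only the reset/step contract and the self.env = None convention come from the text.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchEnv(ABC):
    """Hypothetical stand-in for RLEnv; not rlib's actual implementation."""

    env: Any = None  # wrapped env, or None for a hand-written simulator

    @abstractmethod
    def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]: ...

    @abstractmethod
    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...

    @property
    def unwrapped(self) -> Any:
        # Follow the chain of wrappers until there is nothing left to unwrap.
        return self if self.env is None else getattr(self.env, "unwrapped", self.env)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails; forward to the wrapped env.
        if name.startswith("__") or self.env is None:
            raise AttributeError(name)
        return getattr(self.env, name)

    def close(self) -> None:
        if self.env is not None:
            self.env.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> None:
        self.close()


class Countdown(SketchEnv):
    """Hand-written simulator: wraps nothing, so env stays None."""

    def __init__(self, start: int = 3):
        self.env = None
        self.start = start
        self.remaining = start

    def reset(self, *, seed=None, options=None):
        self.remaining = self.start
        return self.remaining, {}

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining <= 0, False, {}


class Passthrough(SketchEnv):
    """Wrapper that adds nothing; reset/step forward to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        return self.env.step(action)


with Passthrough(Passthrough(Countdown(start=2))) as wrapped:
    obs, info = wrapped.reset()
    base = wrapped.unwrapped  # the innermost Countdown instance
    start = wrapped.start     # attribute forwarded through both wrappers
```

Because __getattr__ only runs when normal attribute lookup fails, members defined on the wrapper itself always take precedence over those of the wrapped env.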

unwrapped property

unwrapped: Any

Walk through nested wrappers to the bottom-most env.

reset abstractmethod

reset(*, seed: Any = None, options: Any = None) -> tuple[Any, dict]

Reset the environment and return (obs, info).

Source code in rlib/envs/base.py
@abstractmethod
def reset(self, *, seed: Any = None, options: Any = None) -> tuple[Any, dict]:
    """Reset the environment and return ``(obs, info)``."""

step abstractmethod

step(action: Any) -> tuple[Any, float, bool, bool, dict]

Step the environment and return the modern 5-tuple.

Source code in rlib/envs/base.py
@abstractmethod
def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]:
    """Step the environment and return the modern 5-tuple."""

RLVecEnv

Bases: ABC

Abstract base for rlib's vectorised environment runners.

Agent rollout code in this library has historically consumed the legacy 4-tuple (obs, rewards, dones, infos). That agent-facing shape is kept on purpose: the per-env 5-tuple lives on the wrapper side, and RLVecEnv implementations are responsible for collapsing terminated/truncated into a single done flag in one place (this base class's merge_done helper).

reset abstractmethod

reset() -> Any

Return a stacked batch of initial observations.

Source code in rlib/envs/base.py
@abstractmethod
def reset(self) -> Any:
    """Return a stacked batch of initial observations."""

step abstractmethod

step(actions: Any) -> Any

Step every sub-env and return (obs, rewards, dones, infos).

Source code in rlib/envs/base.py
@abstractmethod
def step(self, actions: Any) -> Any:
    """Step every sub-env and return ``(obs, rewards, dones, infos)``."""

merge_done staticmethod

merge_done(terminated: bool, truncated: bool) -> bool

Single canonical place where done = terminated or truncated.

Centralised so future agents that want to distinguish the two (e.g. for correct value-bootstrapping on truncation) only need to change call sites here.

Source code in rlib/envs/base.py
@staticmethod
def merge_done(terminated: bool, truncated: bool) -> bool:
    """Single canonical place where ``done = terminated or truncated``.

    Centralised so future agents that want to distinguish the two
    (e.g. for correct value-bootstrapping on truncation) only need
    to change call sites here.
    """
    return bool(terminated) or bool(truncated)
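
The docstring singles out value bootstrapping because a one-step TD target should drop the bootstrap term only when the episode truly terminated, not when it was merely cut off by a time limit. The sketch below copies merge_done as a free function and contrasts the two targets. td_target, gamma, and the numbers are illustrative assumptions, not part of rlib.

```python
def merge_done(terminated: bool, truncated: bool) -> bool:
    """Same logic as RLVecEnv.merge_done, copied here as a free function."""
    return bool(terminated) or bool(truncated)


def td_target(reward: float, next_value: float, stop_bootstrap: bool, gamma: float = 0.99) -> float:
    """One-step TD target; the bootstrap term is dropped when stop_bootstrap is True."""
    return reward + gamma * next_value * (0.0 if stop_bootstrap else 1.0)


# A step that hit the time limit without reaching a terminal state.
terminated, truncated = False, True
done = merge_done(terminated, truncated)

# Using the merged flag discards the value of the next state.
target_merged = td_target(1.0, next_value=10.0, stop_bootstrap=done)

# Using `terminated` alone keeps bootstrapping through the truncation.
target_split = td_target(1.0, next_value=10.0, stop_bootstrap=terminated)
```

With the merged flag the target collapses to the immediate reward, while the split version still credits the value of the next state. An agent consuming only done cannot recover that difference.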

merge_info staticmethod

merge_info(info: dict, terminated: bool, truncated: bool) -> dict

Annotate info with the legacy TimeLimit.truncated key.

Mirrors Gymnasium's behaviour so any agent that inspects the info dict for truncation sees the same value regardless of backend.

Source code in rlib/envs/base.py
@staticmethod
def merge_info(info: dict, terminated: bool, truncated: bool) -> dict:
    """Annotate ``info`` with the legacy ``TimeLimit.truncated`` key.

    Mirrors Gymnasium's behaviour so any agent that inspects the
    info dict for truncation sees the same value regardless of
    backend.
    """
    if not isinstance(info, dict):
        return info
    out = dict(info)
    out.setdefault(
        "TimeLimit.truncated",
        bool(truncated) and not bool(terminated),
    )
    return out
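
A few calls make the edge cases of the helper above concrete. The function is copied here as a free function so the snippet runs on its own. The sample info dicts are made up for illustration.

```python
def merge_info(info, terminated, truncated):
    """Same logic as RLVecEnv.merge_info, copied here as a free function."""
    if not isinstance(info, dict):
        return info
    out = dict(info)
    out.setdefault("TimeLimit.truncated", bool(truncated) and not bool(terminated))
    return out


original = {"score": 3}

# Truncated only: the key is added and set to True.
cut_off = merge_info(original, terminated=False, truncated=True)

# Terminated and truncated on the same step: termination wins, so False.
both = merge_info(original, terminated=True, truncated=True)

# A key already supplied by the backend is kept (setdefault never overwrites).
preset = merge_info({"TimeLimit.truncated": True}, terminated=True, truncated=False)

# Non-dict infos pass through untouched.
passthrough = merge_info(None, terminated=False, truncated=True)
```

The input dict is copied before annotation, so the caller's original info is never mutated.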

BatchEnv

BatchEnv(env_constructor: Callable[..., RLEnv], env_id: str, num_envs: int, blocking: bool = False, make_args: dict | None = None, **env_args)

Bases: RLVecEnv

Run num_envs envs in parallel, one subprocess each.

Source code in rlib/envs/vec_env.py
def __init__(
    self,
    env_constructor: Callable[..., RLEnv],
    env_id: str,
    num_envs: int,
    blocking: bool = False,
    make_args: dict | None = None,
    **env_args,
):
    make_args = make_args or {}
    self.envs: list[Env] = []
    for _ in range(num_envs):
        inner = gym.make(env_id, **make_args)
        self.envs.append(Env(env_constructor(inner, **env_args)))
    self.blocking = blocking

DummyBatchEnv

DummyBatchEnv(env_constructor: Callable[..., RLEnv], env_id: str, num_envs: int, make_args: dict | None = None, **env_args)

Bases: RLVecEnv

Synchronous (in-process) vec env runner.

Lower overhead than BatchEnv for cheap envs where multiprocessing is not worth it.

Source code in rlib/envs/vec_env.py
def __init__(
    self,
    env_constructor: Callable[..., RLEnv],
    env_id: str,
    num_envs: int,
    make_args: dict | None = None,
    **env_args,
):
    make_args = make_args or {}
    self.envs: list[RLEnv] = [
        env_constructor(gym.make(env_id, **make_args), **env_args) for _ in range(num_envs)
    ]
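
The step body of DummyBatchEnv is not shown in this section. As a hedged sketch of what a synchronous runner with the legacy 4-tuple output could look like, the code below steps each sub-env in turn and collapses terminated/truncated into one flag. SyncRunnerSketch, TickEnv, the use of plain lists in place of stacked arrays, and the auto-reset on done are all assumptions for illustration, not rlib's documented behaviour.

```python
from typing import Any, Callable


class TickEnv:
    """Hypothetical sub-env: terminates after `horizon` steps."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed: Any = None, options: Any = None):
        self.t = 0
        return self.t, {}

    def step(self, action: Any):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, False, {}


class SyncRunnerSketch:
    """In-process runner returning (obs, rewards, dones, infos) as plain lists."""

    def __init__(self, make_env: Callable[[], Any], num_envs: int):
        self.envs = [make_env() for _ in range(num_envs)]

    def reset(self) -> list:
        return [env.reset()[0] for env in self.envs]

    def step(self, actions: list) -> tuple[list, list, list, list]:
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, terminated, truncated, info = env.step(action)
            done = bool(terminated) or bool(truncated)  # same rule as merge_done
            if done:
                # Assumption: restart finished sub-envs so the batch never stalls.
                o, _ = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(done)
            infos.append(info)
        return obs, rewards, dones, infos


runner = SyncRunnerSketch(lambda: TickEnv(horizon=2), num_envs=3)
first_obs = runner.reset()
step1 = runner.step([1, 2, 3])
step2 = runner.step([1, 2, 3])
```

Stepping sub-envs in a simple loop avoids inter-process communication entirely, which is the trade-off the description above refers to for cheap envs.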