RND
Random Network Distillation.
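As a rough illustration of the technique (following Burda et al., 2018, not rlib's own code), a fixed, randomly initialised target network embeds each observation, and a predictor network is trained to reproduce that embedding. The predictor's error is the intrinsic reward, so observations seen less often earn a larger bonus. This is a deliberately simplified, dependency-free sketch: single linear layers stand in for the real networks, and every function name is hypothetical.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim), stored as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def forward(weights, x):
    """One linear layer without bias: a plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def intrinsic_reward(target, predictor, obs):
    """RND bonus: mean squared error between predictor and frozen target outputs."""
    t = forward(target, obs)
    p = forward(predictor, obs)
    return sum((ti - pi) ** 2 for ti, pi in zip(t, p)) / len(t)

def train_predictor(target, predictor, obs, lr=0.01):
    """One SGD step on the predictor only; the target is never updated."""
    t = forward(target, obs)
    p = forward(predictor, obs)
    k = len(t)
    for i, row in enumerate(predictor):
        # d/dW_ij of mean((p - t)^2) = 2 * (p_i - t_i) * x_j / k
        g = 2.0 * (p[i] - t[i]) / k
        for j in range(len(row)):
            row[j] -= lr * g * obs[j]

# Usage: revisiting the same observation shrinks its bonus.
rng = random.Random(0)
target = make_linear(4, 8, rng)     # frozen, randomly initialised
predictor = make_linear(4, 8, rng)  # trained to imitate the target
obs = [0.5, -1.0, 0.25, 2.0]
before = intrinsic_reward(target, predictor, obs)
for _ in range(200):
    train_predictor(target, predictor, obs)
after = intrinsic_reward(target, predictor, obs)
```

The target stays fixed so that the bonus can only fall through the predictor learning, which happens fastest on frequently visited states.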
PPOIntrinsic
PPOIntrinsic(model, input_size, action_size, config: PPOConfig, *, extr_coeff: float = 2.0, intr_coeff: float = 1.0, build_optimiser: bool = True, optim: type[Optimizer] = torch.optim.Adam, optim_args: dict | None = None, **model_args)
Bases: PPOModel
Twin-critic PPO model with separate value heads for extrinsic and intrinsic returns.
Source code in rlib/RND/model.py
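The defaults extr_coeff = 2.0 and intr_coeff = 1.0 match the extrinsic and intrinsic advantage coefficients used in the original RND work. Assuming PPOIntrinsic follows that scheme (this page does not confirm it), each value head regresses its own return and the policy is updated on a weighted sum of the two advantages. A dependency-free sketch with hypothetical helper names:

```python
from typing import List, Sequence

def combine_advantages(adv_extr: Sequence[float], adv_intr: Sequence[float],
                       extr_coeff: float = 2.0, intr_coeff: float = 1.0) -> List[float]:
    """Weighted sum of the two advantage streams fed to the PPO policy loss."""
    if len(adv_extr) != len(adv_intr):
        raise ValueError("advantage streams must have the same length")
    return [extr_coeff * ae + intr_coeff * ai for ae, ai in zip(adv_extr, adv_intr)]

def twin_value_loss(v_extr: Sequence[float], v_intr: Sequence[float],
                    ret_extr: Sequence[float], ret_intr: Sequence[float]) -> float:
    """Each value head regresses its own return; the two MSE losses are summed."""
    n = len(v_extr)
    loss_extr = sum((v - r) ** 2 for v, r in zip(v_extr, ret_extr)) / n
    loss_intr = sum((v - r) ** 2 for v, r in zip(v_intr, ret_intr)) / n
    return loss_extr + loss_intr
```

Keeping two heads lets each stream use its own discount and episode handling, which a single critic on the summed reward could not do.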
RNDTrainer
RNDTrainer(envs, agent: RND, val_envs, config: RNDTrainerConfig)
Bases: SyncMultiEnvTrainer
Trainer for the Random Network Distillation agent.
Source code in rlib/RND/trainer.py
RNDTrainerConfig (dataclass)
RNDTrainerConfig(train_mode: TrainMode = TrainMode.NSTEP, returns: Returns = Returns.NSTEP, total_steps: int = 50000000, nsteps: int = 5, gamma: float = 0.99, lambda_: float = 0.95, validate_freq: int = 1000000, num_val_episodes: int = 50, max_val_steps: int = 10000, log_dir: str = 'logs/', model_dir: str = 'models/', save_freq: int = 0, log_scalars: bool = True, update_target_freq: int = 0, render_freq: int = 0, gamma_intr: float = 0.99, init_obs_steps: int = 600, num_epochs: int = 4, num_minibatches: int = 4)
Bases: TrainerConfig
Hyperparameters for RNDTrainer.
The inherited gamma field is used as the extrinsic discount factor; the intrinsic discount is set separately by the additional gamma_intr field.
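With returns = Returns.NSTEP, each stream needs its own bootstrapped n-step return, discounted by gamma or gamma_intr respectively. The sketch assumes a single environment's rollout of nsteps rewards plus a critic estimate for the state after the last step; nstep_returns is a hypothetical helper, not rlib's API. The RND paper reported γ = 0.999 for extrinsic and 0.99 for intrinsic returns, and treated the intrinsic stream as non-episodic, which the example mimics by passing no episode ends for it; whether RNDTrainer does the same is not stated here.

```python
from typing import List, Sequence

def nstep_returns(rewards: Sequence[float], dones: Sequence[bool],
                  bootstrap: float, gamma: float) -> List[float]:
    """Discounted returns for one rollout, bootstrapped from the critic's
    value estimate of the state that follows the final step."""
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        # An episode end at step t cuts the chain back to the bootstrap value.
        running = rewards[t] + gamma * running * (1.0 - float(dones[t]))
        returns[t] = running
    return returns

# Usage with nsteps = 5: one call per value head.
rew_extr = [0.0, 0.0, 1.0, 0.0, 0.0]
rew_intr = [0.3, 0.1, 0.2, 0.05, 0.4]
dones = [False, False, True, False, False]
ret_extr = nstep_returns(rew_extr, dones, bootstrap=0.5, gamma=0.99)        # gamma
ret_intr = nstep_returns(rew_intr, [False] * 5, bootstrap=0.8, gamma=0.99)  # gamma_intr
```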