RANDAL
Random Network Distillation with Auxiliary Learning (RANDAL).
RANDAL
RANDAL(policy_model, target_model, input_size, action_size, config: PPOConfig, *, pixel_control: bool = True, intr_coeff: float = 0.5, extr_coeff: float = 1.0, RP: float = 1, VR: float = 1, PC: float = 1, policy_args: dict | None = None, RND_args: dict | None = None, optim: type[Optimizer] = torch.optim.Adam, optim_args: dict | None = None)
Bases: Agent
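The constructor exposes two reward coefficients (intr_coeff, extr_coeff) and three auxiliary-task weights (RP, VR, PC). The sketch below shows one plausible way such weights are combined. It assumes the RND convention of mixing extrinsic and intrinsic advantages linearly, and that RP, VR and PC stand for the UNREAL reward-prediction, value-replay and pixel-control losses. The helper names combine_advantages and total_loss are hypothetical and are not part of rlib.

```python
import numpy as np

def combine_advantages(adv_extr, adv_intr, extr_coeff=1.0, intr_coeff=0.5):
    """Weighted sum of extrinsic and intrinsic advantage estimates (assumed RND-style mix)."""
    return extr_coeff * np.asarray(adv_extr) + intr_coeff * np.asarray(adv_intr)

def total_loss(ppo_loss, rp_loss, vr_loss, pc_loss, RP=1.0, VR=1.0, PC=1.0):
    """Policy loss plus weighted auxiliary losses (assumed meaning of RP, VR, PC)."""
    return ppo_loss + RP * rp_loss + VR * vr_loss + PC * pc_loss

# Usage: defaults mirror the constructor signature above.
adv = combine_advantages([1.0, 2.0], [2.0, 4.0])
loss = total_loss(1.0, 0.5, 0.25, 0.25)
```

Setting an auxiliary weight to 0 in this sketch simply removes that term from the objective.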
Source code in rlib/RANDAL/model.py
pixel_loss
pixel_loss(Qaux, Qaux_actions, Qaux_target)
Qaux_target: temporal-difference target for Q_aux.
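As a rough illustration of what a pixel-control loss of this shape usually computes, the sketch below selects the auxiliary Q-values of the actions taken and regresses them onto the temporal-difference target. The tensor shapes, the 0.5 factor and the use of NumPy in place of torch are assumptions made for the example. The actual implementation in rlib/RANDAL/model.py may differ.

```python
import numpy as np

def pixel_loss_sketch(q_aux, q_aux_actions, q_aux_target):
    """Squared TD error on the auxiliary Q-map of the chosen actions.

    Assumed shapes:
      q_aux         [batch, num_actions, H, W]
      q_aux_actions [batch] integer action indices
      q_aux_target  [batch, H, W]
    """
    batch_idx = np.arange(q_aux.shape[0])
    q_taken = q_aux[batch_idx, q_aux_actions]            # [batch, H, W]
    per_sample = np.sum((q_aux_target - q_taken) ** 2, axis=(1, 2))
    return 0.5 * np.mean(per_sample)

# Usage with a tiny 2x2 Q-map and three actions.
q_aux = np.zeros((2, 3, 2, 2))
q_aux[0, 1] = 1.0                                        # sample 0 matches its target
loss = pixel_loss_sketch(q_aux, np.array([1, 0]), np.ones((2, 2, 2)))
```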
Source code in rlib/RANDAL/model.py
predictor_loss
predictor_loss(next_states, state_mean, state_std)
Loss for the predictor network.
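In Random Network Distillation, the predictor is typically trained to match the features of a fixed, randomly initialised target network on normalised next states. The sketch below follows that general recipe with linear maps standing in for both networks. The weight arguments, the clipping range and the epsilon are assumptions for illustration, not details taken from rlib.

```python
import numpy as np

def predictor_loss_sketch(next_states, state_mean, state_std,
                          target_w, predictor_w, clip=5.0):
    """Mean squared error between predictor and fixed target features.

    Linear maps replace the real networks; clip and the 1e-8 epsilon are assumed.
    Returns (scalar loss, per-state error); the per-state error is what RND
    usually reuses as the intrinsic reward.
    """
    obs = np.clip((next_states - state_mean) / (state_std + 1e-8), -clip, clip)
    target_feat = obs @ target_w        # fixed random target (never trained)
    pred_feat = obs @ predictor_w       # trainable predictor
    per_state = np.mean((pred_feat - target_feat) ** 2, axis=1)
    return per_state.mean(), per_state

# Usage: identity target, untrained (zero) predictor.
states = np.array([[1.0, 2.0], [3.0, 4.0]])
loss, per_state = predictor_loss_sketch(states, 0.0, 1.0, np.eye(2), np.zeros((2, 2)))
```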
Source code in rlib/RANDAL/model.py
RANDALTrainer
RANDALTrainer(envs, agent: RANDAL, val_envs, config: RANDALTrainerConfig)
Bases: SyncMultiEnvTrainer
Trainer for the RANDAL agent (RND + UNREAL auxiliary tasks).
Source code in rlib/RANDAL/trainer.py
norm_obs
norm_obs(obs)
Normalise pixel intensity changes by recording the minimum and maximum pixel values observed. Per-pixel normalisation is not used because the expected image is a single greyscale frame.
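A minimal sketch of this kind of scalar min-max normalisation is shown below. It tracks one running minimum and one running maximum across all pixels, rather than one pair per pixel. The class name MinMaxObsNormaliser and the handling of a constant first frame are assumptions for the example and do not come from rlib.

```python
import numpy as np

class MinMaxObsNormaliser:
    """Scale observations to [0, 1] using a single running min and max."""

    def __init__(self):
        self.min = np.inf
        self.max = -np.inf

    def __call__(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        # Update the scalar extremes with the new frame.
        self.min = min(self.min, float(obs.min()))
        self.max = max(self.max, float(obs.max()))
        span = self.max - self.min
        if span == 0.0:
            # Every value seen so far is identical, so there is nothing to scale.
            return np.zeros_like(obs)
        return (obs - self.min) / span

# Usage: the second frame is scaled with the extremes seen so far.
norm = MinMaxObsNormaliser()
first = norm([[0, 128], [255, 64]])
second = norm([[100]])
```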
Source code in rlib/RANDAL/trainer.py
RANDALTrainerConfig dataclass
RANDALTrainerConfig(train_mode: TrainMode = TrainMode.NSTEP, returns: Returns = Returns.NSTEP, total_steps: int = 50000000, nsteps: int = 5, gamma: float = 0.99, lambda_: float = 0.95, validate_freq: int = 1000000, num_val_episodes: int = 50, max_val_steps: int = 10000, log_dir: str = 'logs/', model_dir: str = 'models/', save_freq: int = 0, log_scalars: bool = True, update_target_freq: int = 0, render_freq: int = 0, gamma_intr: float = 0.99, init_obs_steps: int = 600, num_epochs: int = 4, num_minibatches: int = 4, replay_length: int = 2000, norm_pixel_reward: bool = True)
Bases: RNDTrainerConfig
Hyperparameters for RANDALTrainer.
Inherits the RND extra fields and adds the UNREAL replay buffer knobs.
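To make the num_epochs and num_minibatches fields concrete, the sketch below shows how a PPO-style update commonly partitions one rollout batch: each epoch reshuffles the batch and splits it into equally sized minibatches. Whether RANDALTrainer uses exactly this scheme is an assumption, and minibatch_indices is a hypothetical helper rather than an rlib function.

```python
import numpy as np

def minibatch_indices(batch_size, num_epochs=4, num_minibatches=4, seed=0):
    """Yield index arrays: num_epochs shuffled passes, num_minibatches chunks per pass."""
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        perm = rng.permutation(batch_size)
        for chunk in np.array_split(perm, num_minibatches):
            yield chunk

# Usage: a rollout of 40 transitions gives 4 * 4 = 16 gradient steps of 10 samples each.
batches = list(minibatch_indices(40))
```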