DAAC
Decoupled Advantage Actor-Critic (DAAC).
PolicyModel
PolicyModel(model, input_shape, action_size, config: PPOConfig, *, adv_coeff: float = 0.25, build_optimiser: bool = True, optim: type[Optimizer] = torch.optim.Adam, optim_args: dict | None = None, **model_args)
Bases: PPOModel
DAAC policy head: clipped PPO objective with a separate advantage prediction.
Source code in rlib/DAAC/model.py
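As a rough illustration of the objective this head optimises, the sketch below combines a clipped PPO surrogate with a squared-error loss on the predicted advantages, weighted by `adv_coeff`. It assumes the target advantages are the ones computed from the separate value network, and the helper name `daac_policy_loss`, `clip_eps` and `entropy_coeff` are hypothetical; they are not taken from rlib's `PolicyModel`.

```python
import math


def daac_policy_loss(new_logp, old_logp, adv_pred, adv_target, entropy,
                     clip_eps=0.2, adv_coeff=0.25, entropy_coeff=0.01):
    """Hypothetical DAAC policy-network loss (lower is better).

    All sequence arguments are equal-length lists of floats.
    clip_eps and entropy_coeff are illustrative values, not rlib defaults.
    """
    n = len(new_logp)

    # Clipped PPO surrogate, averaged over the batch.
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, adv_target):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        surrogate += min(ratio * adv, clipped * adv)
    surrogate /= n

    # Advantage-head regression: mean squared error against the target advantages.
    adv_loss = sum((p - t) ** 2 for p, t in zip(adv_pred, adv_target)) / n

    mean_entropy = sum(entropy) / n

    # Maximise the surrogate and entropy, minimise the advantage error.
    return -surrogate + adv_coeff * adv_loss - entropy_coeff * mean_entropy
```

With unchanged log-probabilities and a perfect advantage head, the loss reduces to the negative mean advantage, so `daac_policy_loss([0, 0], [0, 0], [1, 3], [1, 3], [0, 0])` gives `-2.0`.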
DAACTrainer
DAACTrainer(envs, agent: DAAC, val_envs, config: DAACTrainerConfig)
Bases: SyncMultiEnvTrainer
Trainer for the Decoupled Advantage Actor-Critic agent.
Source code in rlib/DAAC/trainer.py
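The `policy_epochs`, `value_epochs` and `num_minibatches` fields of `DAACTrainerConfig` suggest that the policy and value networks are optimised on separate schedules. The sketch below assumes that, for each rollout, the trainer first runs the policy epochs and then the value epochs, reshuffling the batch into minibatches every epoch; the function `daac_update_schedule` is hypothetical and only lists the updates that would be performed.

```python
import random


def daac_update_schedule(batch_size, policy_epochs=1, value_epochs=9,
                         num_minibatches=8, seed=0):
    """Return a hypothetical list of (network, minibatch_indices) updates for one rollout."""
    if batch_size < num_minibatches:
        raise ValueError("batch_size must be at least num_minibatches")
    rng = random.Random(seed)
    indices = list(range(batch_size))
    # Any remainder from the integer division is dropped each epoch.
    mb_size = batch_size // num_minibatches
    schedule = []
    for network, epochs in (("policy", policy_epochs), ("value", value_epochs)):
        for _ in range(epochs):
            rng.shuffle(indices)  # fresh minibatch split every epoch
            for k in range(num_minibatches):
                schedule.append((network, indices[k * mb_size:(k + 1) * mb_size]))
    return schedule
```

With the default config values and a batch of 64 transitions, this yields 8 policy updates followed by 72 value updates, each on 8 samples.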
DAACTrainerConfig (dataclass)
DAACTrainerConfig(train_mode: TrainMode = TrainMode.NSTEP, returns: Returns = Returns.NSTEP, total_steps: int = 50000000, nsteps: int = 5, gamma: float = 0.99, lambda_: float = 0.95, validate_freq: int = 1000000, num_val_episodes: int = 50, max_val_steps: int = 10000, log_dir: str = 'logs/', model_dir: str = 'models/', save_freq: int = 0, log_scalars: bool = True, update_target_freq: int = 0, render_freq: int = 0, policy_epochs: int = 1, value_epochs: int = 9, num_minibatches: int = 8)
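The `gamma` and `lambda_` fields match the discount and smoothing parameters of generalised advantage estimation (GAE), computed over each `nsteps`-long rollout. The sketch below assumes GAE is how these two fields are used; this page does not say how the `returns` setting interacts with them, and the helper `gae` is hypothetical rather than part of rlib.

```python
def gae(rewards, values, dones, last_value, gamma=0.99, lambda_=0.95):
    """Hypothetical GAE over one rollout for a single environment.

    dones[t] is 1.0 if the episode ended after step t, else 0.0.
    last_value is the value estimate of the state following the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lambda_ * not_done * running
        advantages[t] = running
        next_value = values[t]
    # Value-network targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `gamma=1.0` and `lambda_=1.0`, two rewards of 1 and zero values give advantages `[2.0, 1.0]`; marking the first step as terminal resets the sum to `[1.0, 1.0]`.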