A2C¶
Advantage Actor-Critic agents.
A2CConfig dataclass ¶
A2CConfig(lr: float = 0.001, lr_final: float = 0.0, decay_steps: int = 600000, grad_clip: float | None = 0.5, device: str = 'cuda', entropy_coeff: float = 0.01, value_coeff: float = 0.5)
Bases: ModelConfig
Hyperparameters for advantage actor-critic agents (A2C / A3C / UNREAL).
Attributes:
| Name | Type | Description |
|---|---|---|
| entropy_coeff | float | Coefficient on the policy entropy bonus. |
| value_coeff | float | Weight on the value function loss term. |
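The following is a minimal sketch of how these fields might be used, not the library's code. `A2CConfigSketch` is a hypothetical stand-in that only mirrors the documented field names and defaults, and `linear_lr` assumes that `lr` decays linearly to `lr_final` over `decay_steps`; the page does not state the actual schedule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class A2CConfigSketch:
    """Hypothetical stand-in mirroring the documented A2CConfig fields."""
    lr: float = 0.001
    lr_final: float = 0.0
    decay_steps: int = 600000
    grad_clip: Optional[float] = 0.5
    device: str = "cuda"
    entropy_coeff: float = 0.01
    value_coeff: float = 0.5


def linear_lr(cfg: A2CConfigSketch, step: int) -> float:
    """Assumed schedule: interpolate linearly from lr to lr_final, then hold."""
    frac = min(max(step, 0), cfg.decay_steps) / cfg.decay_steps
    return cfg.lr + frac * (cfg.lr_final - cfg.lr)


# Override one coefficient and keep the remaining defaults.
cfg = A2CConfigSketch(entropy_coeff=0.02)
halfway = linear_lr(cfg, cfg.decay_steps // 2)
```

A larger `entropy_coeff` weights the entropy bonus more heavily, while `value_coeff` scales the value loss relative to the policy term.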
A2CModel ¶
A2CModel(action_size: int, config: A2CConfig)
Bases: Agent
A2C-family base class: defines the actor-critic + entropy loss.
Concrete subclasses (feed-forward, recurrent, ...) only need to
implement the forward, evaluate and backprop methods; they
all share the same loss function via the loss method.
Source code in rlib/A2C/model.py
loss ¶
loss(policy: Tensor, R: Tensor, V: Tensor, actions_onehot: Tensor) -> torch.Tensor
Standard A2C/A3C actor–critic loss with entropy bonus.
Combines:
- a half-MSE value loss on R - V,
- the negative log-likelihood policy gradient using a detached advantage, and
- an entropy bonus on the action distribution.
policy is expected to be a normalised distribution
(i.e. the output of a softmax); we numerically clip it before
taking the log so the loss stays finite for near-deterministic
policies.
Source code in rlib/A2C/model.py
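The three terms of the loss described above can be sketched as follows. This is an illustration under stated assumptions, not the code in rlib/A2C/model.py: the function name `a2c_loss_sketch`, the clipping constant `eps`, the mean reductions, and the tensor shapes (`policy` and `actions_onehot` as `(batch, actions)`, `R` and `V` as `(batch,)`) are all assumed for the example.

```python
import torch


def a2c_loss_sketch(policy, R, V, actions_onehot,
                    value_coeff=0.5, entropy_coeff=0.01, eps=1e-8):
    """Illustrative actor-critic loss; reductions and eps are assumptions."""
    advantage = R - V
    # Half-MSE value loss on R - V.
    value_loss = 0.5 * advantage.pow(2).mean()
    # Clip probabilities before the log so near-deterministic policies stay finite.
    log_policy = torch.log(policy.clamp(min=eps, max=1.0))
    # Log-probability of the action actually taken in each sample.
    log_prob = (log_policy * actions_onehot).sum(dim=1)
    # Negative log-likelihood weighted by the advantage, which is detached
    # so the policy term sends no gradient into the value estimate.
    policy_loss = -(log_prob * advantage.detach()).mean()
    # Entropy of the action distribution, averaged over the batch.
    entropy = -(policy * log_policy).sum(dim=1).mean()
    # Subtracting the entropy turns it into a bonus when minimising.
    return policy_loss + value_coeff * value_loss - entropy_coeff * entropy


torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)
policy = torch.softmax(logits, dim=1)
V = torch.randn(4, requires_grad=True)
R = torch.randn(4)
actions = torch.nn.functional.one_hot(
    torch.tensor([0, 2, 1, 1]), num_classes=3).float()

loss = a2c_loss_sketch(policy, R, V, actions)
loss.backward()
```

Because the advantage is detached in the policy term, the gradient reaching `V` comes only from the value loss.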
ActorCritic ¶
ActorCritic(model, input_size, action_size, config: A2CConfig, *, build_optimiser: bool = True, optim: type[Optimizer] = torch.optim.RMSprop, optim_args: dict | None = None, **model_args)
Bases: A2CModel
Feed-forward A2C actor-critic.
Source code in rlib/A2C/model.py
ActorCritic_LSTM ¶
ActorCritic_LSTM(model, input_size, action_size, cell_size, config: A2CConfig, *, build_optimiser: bool = True, optim: type[Optimizer] = torch.optim.RMSprop, optim_args: dict | None = None, **model_args)
Bases: A2CModel
Recurrent A2C actor-critic (masked LSTM body).
Source code in rlib/A2C/model.py
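One common reading of a "masked" LSTM body is that the recurrent state is zeroed for environments whose episode has just ended, so that memory does not leak across episodes. The sketch below illustrates that idea with `torch.nn.LSTMCell`; it is an assumption about what the masking does, not the library's implementation, and the names `done`, `mask`, `num_envs` and the sizes are hypothetical.

```python
import torch

torch.manual_seed(0)
num_envs, input_size, cell_size = 3, 5, 4
cell = torch.nn.LSTMCell(input_size, cell_size)

# Recurrent state carried over from the previous step (ones for visibility).
h = torch.ones(num_envs, cell_size)
c = torch.ones(num_envs, cell_size)

# done[i] = 1.0 means environment i finished its episode on the previous step.
done = torch.tensor([0.0, 1.0, 0.0])

# Zero the hidden and cell state of finished environments before stepping.
mask = (1.0 - done).unsqueeze(1)
h_masked, c_masked = h * mask, c * mask

obs = torch.randn(num_envs, input_size)
h_next, c_next = cell(obs, (h_masked, c_masked))
```

Here the second environment steps from a zero state, exactly as it would at the start of a fresh episode, while the other two keep their carried-over state.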
A2CLSTMTrainer ¶
A2CLSTMTrainer(envs, agent: ActorCritic_LSTM, val_envs, config: TrainerConfig)
Bases: SyncMultiEnvTrainer
Recurrent A2C trainer (LSTM hidden state propagated across rollouts).
Source code in rlib/A2C/trainer.py
A2CTrainer ¶
A2CTrainer(envs, agent: ActorCritic, val_envs, config: TrainerConfig)
Bases: SyncMultiEnvTrainer
Synchronous Advantage Actor-Critic trainer (feed-forward).
Source code in rlib/A2C/trainer.py