Hyperparameter | PPO | TD3 | SAC |
Network Architecture | [64, 64] | [256, 256] | [256, 256] |
Activation | ReLU | ReLU | ReLU |
Optimizer | Adam | Adam | Adam |
Learning Rate | 0.0003 | 0.001 | 0.0003 |
Update Frequency | 2048 Steps | 1 Episode | 1 Episode |
Batch Size | 64 | 100 | 256 |
Epochs | 10 | - | - |
Discount Factor (γ) | 0.99 | 0.99 | 0.99 |
Replay Buffer Size | - | 10^6 | 10^6 |
Clip Range (ε) | 0.2 | - | - |
GAE (λ) | 0.95 | - | - |
Soft Update Coefficient (τ) | - | 0.005 | 0.005 |
Target Entropy (α) | - | - | Auto |
Action Noise | - | | - |
Policy Delay | - | 2 | - |
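
For reference, the table can be written as a plain configuration mapping. This is a minimal sketch only: the key names (`hidden_sizes`, `update_every`, and so on) and the `soft_update` helper are illustrative assumptions rather than any particular library's API, and the TD3 action noise is left as `None` because its value is not given in the table.

```python
# Values transcribed from the hyperparameter table; key names are hypothetical.
COMMON = {"activation": "relu", "optimizer": "adam", "gamma": 0.99}

HYPERPARAMS = {
    "PPO": {
        **COMMON,
        "hidden_sizes": [64, 64],
        "learning_rate": 3e-4,
        "update_every": (2048, "step"),
        "batch_size": 64,
        "epochs": 10,
        "clip_range": 0.2,
        "gae_lambda": 0.95,
    },
    "TD3": {
        **COMMON,
        "hidden_sizes": [256, 256],
        "learning_rate": 1e-3,
        "update_every": (1, "episode"),
        "batch_size": 100,
        "buffer_size": 10**6,
        "tau": 0.005,
        "action_noise": None,  # value missing from the table, deliberately not filled in
        "policy_delay": 2,
    },
    "SAC": {
        **COMMON,
        "hidden_sizes": [256, 256],
        "learning_rate": 3e-4,
        "update_every": (1, "episode"),
        "batch_size": 256,
        "buffer_size": 10**6,
        "tau": 0.005,
        "target_entropy": "auto",
    },
}


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

With τ = 0.005, each soft update moves a target parameter 0.5% of the way toward its online counterpart, so `soft_update([0.0], [1.0], HYPERPARAMS["TD3"]["tau"])` yields a value of about 0.005.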