rl datasets
2 datasets tagged "rl"
AlphaZero-Style RL Training Metrics (13 Iterations)
Training logs for the policy and value networks, tracking losses, game length, MCTS agreement, and value calibration across 13 self-play iterations.
Catan RL Training — Implementation
171 training iterations of a Catan RL implementation. Tracks policy and value loss convergence, game length evolution, and self-play performance metri...