rl datasets

2 datasets tagged "rl"

AlphaZero-Style RL Training Metrics (13 Iterations)

Policy and value network training logs tracking losses, game length, MCTS agreement, and value calibration across 13 self-play iterations.

13 rows · 51 columns

Catan RL Training — Implementation

171 training iterations of a Catan RL implementation. Tracks policy and value loss convergence, game length evolution, and self-play performance metri...

171 rows · 38 columns