policy datasets
2 datasets tagged "policy"
AlphaZero Training Metrics: Policy and Value Loss Over 13 Iterations
Reinforcement learning training run tracking policy and value losses, game length, MCTS simulations, and value calibration across 13 self-play iterations.
AlphaZero Training Run: Policy and Value Network Convergence
212 iterations of AlphaZero-style self-play training tracking policy/value loss, MCTS agreement, game outcomes, and value calibration.