training datasets

10 datasets tagged "training"

AlphaZero-Style RL Training Metrics (13 Iterations)

Policy and value network training logs tracking losses, game length, MCTS agreement, and value calibration across 13 self-play iterations.

13 rows · 51 columns

AlphaZero-Style Game Agent Training Metrics (13 Iterations)

Policy and value loss, game outcomes, and MCTS statistics from a reinforcement learning agent training run over 13 self-play iterations.

13 rows · 1 column

Catan AI Self-Play Training Metrics (171 Iterations)

AlphaZero-style training run for Catan: policy/value losses, game lengths, MCTS agreement, and value calibration across 171 self-play iterations.

171 rows · 38 columns

AlphaZero Training Metrics: Policy and Value Loss Over 13 Iterations

Reinforcement learning training run tracking policy/value losses, game length, MCTS simulations, and value calibration across 13 self-play iterations.

13 rows · 51 columns

AlphaZero-Style Training Run (177 Iterations)

Reinforcement learning training metrics tracking policy loss, value loss, game length, and MCTS agreement over 177 self-play iterations.

176 rows · 41 columns

AlphaZero-Style Self-Play Training Metrics (177 Iterations)

Policy loss, value loss, game length, and MCTS agreement tracked over 177 self-play iterations of AlphaZero-style reinforcement learning.

176 rows · 41 columns

AlphaZero-Style Training Metrics (177 Iterations)

Self-play reinforcement learning run tracking policy loss, value loss, game length, and MCTS agreement across 177 training iterations.

176 rows · 41 columns

AlphaZero-Style Training Run: 171 Iterations of Self-Play

Policy and value network training metrics over 171 iterations, tracking loss convergence, game length, MCTS agreement, and value calibration.

171 rows · 38 columns

Catan RL Training — Implementation

171 training iterations of a Catan RL implementation. Tracks policy and value loss convergence, game length evolution, and self-play performance metrics.

171 rows · 38 columns

AlphaZero Training Run: Policy and Value Network Convergence

212 iterations of AlphaZero-style self-play training tracking policy/value loss, MCTS agreement, game outcomes, and value calibration.

212 rows · 38 columns