AlphaZero Training Metrics: Policy and Value Loss Over 13 Iterations

Reinforcement learning training run tracking policy/value losses, game length, MCTS simulations, and value calibration across 13 self-play iterations.
# iteration loss_policy_train loss_value_train loss_policy_val loss_value_val loss_soft_policy_train loss_soft_policy_val loss_aux_value_train loss_aux_value_val loss_aux_value_0_train loss_aux_value_0_val loss_aux_value_1_train loss_aux_value_1_val loss_aux_value_2_train loss_aux_value_2_val loss_aux_value_3_train loss_aux_value_3_val gradient_steps game_length_avg game_length_stddev game_length_min game_length_max game_wins game_losses game_draws policy_entropy_avg policy_max_prob_avg policy_entropy_high_branch_avg policy_max_prob_high_branch_avg policy_agreement_avg policy_agreement_high_branch_avg policy_surprise_avg value_z_avg value_q_avg value_z_stddev value_q_stddev value_correction_avg value_correction_high_branch_avg value_q_spread_avg value_q_spread_high_branch_avg value_error_early_avg value_error_mid_avg value_error_late_avg value_network_stddev lr q_weight mcts_sims replay_samples samples_iter time_selfplay_secs time_train_secs
1 1 3.864787 0.320826 3.18191 0.242259 3.646052 3.076387 0.170796 0.031286 0.061016 0.005071 0.041854 0.009963 0.716512 0.133743 0.0 0.0 35 354.264 67.721033 134 418 206 257 37 1.069399 0.552025 1.627311 0.355778 0.306516 0.152918 0.900903 0.117202 -0.043852 0.755335 0.117126 0.256406 0.220012 0.110637 0.128793 0.703323 0.707441 0.717725 0.043437 0.0005 0.028333 100 178444 178444 286.273461 77.393945
2 2 3.054179 0.214878 2.736785 0.18571 2.956313 2.728005 0.025733 0.01787 0.036853 0.028107 0.038285 0.029353 0.048863 0.028961 0.0 0.0 66 363.372 59.736836 132 420 235 235 30 1.107233 0.555301 1.653229 0.386097 0.212524 0.147646 1.226714 0.125226 0.066841 0.75366 0.302264 0.161567 0.139284 0.075105 0.073412 0.623991 0.541579 0.498636 0.311493 0.0005 0.056667 197 334478 156034 311.520347 122.90891
3 3 2.473027 0.319545 2.383122 0.304262 2.611523 2.547672 0.030155 0.017883 0.043951 0.021795 0.04879 0.029612 0.051183 0.033476 0.0 0.0 42 306.762 80.571219 125 456 246 242 12 0.914362 0.649239 1.223786 0.563321 0.343803 0.29213 1.169231 0.107064 0.078501 0.913341 0.397952 0.118311 0.109984 0.055504 0.064856 0.795905 0.72426 0.626296 0.413562 0.0005 0.085 293 214466 214466 565.548518 90.27873
4 4 2.306242 0.283898 2.178379 0.285746 2.373558 2.262347 0.023122 0.027903 0.033743 0.03843 0.035401 0.045739 0.040325 0.047388 0.0 0.0 92 290.198 101.386364 99 488 223 266 11 0.636061 0.749145 0.864953 0.690746 0.414222 0.362223 1.309917 0.052395 0.030916 0.88627 0.428252 0.157282 0.157085 0.05787 0.074392 0.807009 0.671657 0.560194 0.441084 0.0005 0.113333 390 468004 253538 776.247988 171.735797
5 5 2.120492 0.266878 2.084083 0.281335 2.201041 2.173777 0.022068 0.019783 0.029671 0.025418 0.034216 0.0314 0.040508 0.03751 0.0 0.0 125 245.736 82.902221 91 448 297 197 6 0.727392 0.718925 1.036619 0.622322 0.347107 0.255754 1.275226 0.092726 0.062839 0.952357 0.48553 0.178129 0.207145 0.073739 0.089232 0.855319 0.713809 0.555246 0.479771 0.0005 0.141667 487 635428 167424 624.092566 233.373341
6 6 2.056717 0.254081 2.004317 0.244936 2.128085 2.089925 0.025475 0.023007 0.035926 0.038931 0.040308 0.031487 0.044802 0.038463 0.0 0.0 159 224.934 85.427499 84 460 227 271 2 0.507362 0.801921 0.702605 0.743406 0.388656 0.307349 1.443071 0.064813 0.037782 0.960654 0.475798 0.198953 0.211323 0.075591 0.088419 0.876541 0.746356 0.578257 0.496498 0.0005 0.17 583 810439 175011 721.00008 297.277773
7 7 1.980565 0.240194 1.962871 0.247953 2.056309 2.038054 0.024335 0.023682 0.035475 0.028279 0.036753 0.039402 0.043138 0.045329 0.0 0.0 184 199.75 63.037604 71 438 253 243 4 0.709711 0.730818 1.050833 0.624401 0.331796 0.203324 1.272419 0.052391 0.052032 0.982562 0.453649 0.125269 0.11675 0.056044 0.065202 0.901564 0.776227 0.608777 0.460589 0.0005 0.198333 680 939431 128992 609.804331 345.188357
8 8 1.934722 0.224824 1.926557 0.237877 2.009667 1.996964 0.023222 0.030667 0.031103 0.045675 0.036661 0.050567 0.042819 0.049246 0.0 0.0 209 205.662 65.041493 92 434 297 201 2 0.669525 0.741777 1.023188 0.622877 0.319223 0.180888 1.501361 0.056121 0.060203 0.982131 0.472652 0.203874 0.193068 0.091324 0.104065 0.890773 0.743353 0.584211 0.457012 0.0005 0.226667 777 1065506 126075 655.579999 390.819171
9 9 1.905073 0.210875 1.928705 0.205652 1.952473 1.973959 0.022604 0.02032 0.028725 0.02477 0.03595 0.031809 0.042882 0.040527 0.0 0.0 198 191.228 71.703305 79 462 234 265 1 0.450463 0.823819 0.639126 0.766002 0.38914 0.288425 1.395133 0.044485 0.032008 0.976307 0.470015 0.16768 0.161381 0.074023 0.086891 0.903919 0.743153 0.570822 0.480495 0.0005 0.255 873 1011396 160356 953.552464 371.128067
10 10 1.885457 0.202162 1.930772 0.194228 1.936023 1.970428 0.023354 0.021064 0.028933 0.025361 0.037461 0.033421 0.044752 0.041666 0.0 0.0 174 183.046 61.119096 70 460 255 240 5 0.657052 0.752794 1.023239 0.635632 0.362461 0.230714 1.293476 0.062379 0.061959 0.981138 0.480829 0.122319 0.116329 0.05668 0.06547 0.893507 0.749269 0.558663 0.493849 0.0005 0.283333 970 888992 131134 815.300974 326.27391
11 11 1.845871 0.191741 1.825237 0.184761 1.896899 1.87675 0.02634 0.023318 0.03383 0.02774 0.042235 0.03806 0.049186 0.044817 0.0 0.0 172 190.236 63.771156 70 456 274 226 0 0.661495 0.752031 1.068277 0.625464 0.433967 0.306839 1.176193 0.103805 0.096743 0.980855 0.551427 0.096549 0.095214 0.046351 0.053343 0.853389 0.680149 0.469069 0.555274 0.0005 0.311667 1067 878536 156968 1076.774026 322.572067
12 12 1.769169 0.18186 1.772084 0.176119 1.83411 1.844026 0.028092 0.024488 0.036672 0.028815 0.045173 0.039682 0.051839 0.047161 0.0 0.0 166 152.794 46.067033 73 468 249 250 1 0.535325 0.799832 0.891145 0.685752 0.475168 0.291591 1.142953 0.106931 0.078072 0.989762 0.506 0.09566 0.093859 0.051289 0.056117 0.880376 0.72019 0.516044 0.519486 0.0005 0.34 1163 848198 144673 1061.730934 311.39659
13 13 1.743754 0.158582 1.726435 0.153854 1.81128 1.799296 0.025183 0.021926 0.030586 0.025736 0.040735 0.035313 0.04855 0.042916 0.0 0.0 168 153.432 51.590012 68 478 231 268 1 0.579754 0.784928 0.941172 0.672857 0.472446 0.284693 1.125724 0.159865 0.117188 0.97555 0.534392 0.094677 0.101375 0.043354 0.04914 0.861803 0.653665 0.446512 0.536952 0.0005 0.368333 1260 857260 138054 1159.378612 314.991295
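
The rows above are whitespace-delimited, so they can be loaded with a few lines of Python. The following is a minimal sketch, not the tooling used to produce this page. It assumes the leading "#" column is a row number added by the table view (it repeats the iteration value in every row), and the names COLUMNS, ROW_1, and parse_row are illustrative.

```python
# Column names copied from the header; "#" is assumed to be the row-number column.
COLUMNS = """
# iteration loss_policy_train loss_value_train loss_policy_val loss_value_val
loss_soft_policy_train loss_soft_policy_val loss_aux_value_train loss_aux_value_val
loss_aux_value_0_train loss_aux_value_0_val loss_aux_value_1_train loss_aux_value_1_val
loss_aux_value_2_train loss_aux_value_2_val loss_aux_value_3_train loss_aux_value_3_val
gradient_steps game_length_avg game_length_stddev game_length_min game_length_max
game_wins game_losses game_draws policy_entropy_avg policy_max_prob_avg
policy_entropy_high_branch_avg policy_max_prob_high_branch_avg policy_agreement_avg
policy_agreement_high_branch_avg policy_surprise_avg value_z_avg value_q_avg
value_z_stddev value_q_stddev value_correction_avg value_correction_high_branch_avg
value_q_spread_avg value_q_spread_high_branch_avg value_error_early_avg
value_error_mid_avg value_error_late_avg value_network_stddev lr q_weight mcts_sims
replay_samples samples_iter time_selfplay_secs time_train_secs
""".split()

# First data row, copied verbatim from the table.
ROW_1 = (
    "1 1 3.864787 0.320826 3.18191 0.242259 3.646052 3.076387 0.170796 0.031286 "
    "0.061016 0.005071 0.041854 0.009963 0.716512 0.133743 0.0 0.0 35 354.264 "
    "67.721033 134 418 206 257 37 1.069399 0.552025 1.627311 0.355778 0.306516 "
    "0.152918 0.900903 0.117202 -0.043852 0.755335 0.117126 0.256406 0.220012 "
    "0.110637 0.128793 0.703323 0.707441 0.717725 0.043437 0.0005 0.028333 100 "
    "178444 178444 286.273461 77.393945"
)


def parse_row(line):
    """Split one data row and map each value to its column name as a float."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(COLUMNS, values)}


record = parse_row(ROW_1)
print(record["iteration"], record["loss_policy_train"], record["mcts_sims"])
```

Storing every value as a float keeps the sketch short; counts such as game_wins or mcts_sims could be converted to int afterwards if exact integer types matter.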

AlphaZero Training Metrics: Policy and Value Loss Over 13 Iterations — AI Analysis

Policy loss dropped 55% (3.86 → 1.74) and games shortened 57% (354 → 153 moves) over 13 iterations

Value error in late-game positions fell 38%, but early-game error rose from 0.70 to 0.86, suggesting weak opening evaluation
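
The headline percentages can be reproduced from the first and last rows of the table. The sketch below is only a check on the arithmetic; the helper name pct_change is illustrative, and the values are copied from iterations 1 and 13. The early-game error is included for comparison.

```python
def pct_change(first, last):
    """Relative change from first to last, in percent (negative means a drop)."""
    return (last - first) / first * 100.0


# (iteration 1, iteration 13) values copied from the table.
metrics = {
    "loss_policy_train": (3.864787, 1.743754),
    "game_length_avg": (354.264, 153.432),
    "value_error_late_avg": (0.717725, 0.446512),
    "value_error_early_avg": (0.703323, 0.861803),
}

changes = {name: pct_change(first, last) for name, (first, last) in metrics.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Rounded to whole percentages, this gives the 55%, 57%, and 38% drops quoted above, and an increase of about 23% for early-game value error.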

Training Summary

  • Policy loss fell 55% (3.86 → 1.74) with train and validation tracking closely — no sign of overfitting
  • Average game length dropped from 354 to 153 moves as the agent learned to finish games decisively; draws went from 37 to just 1
  • Policy agreement with MCTS rose from 31% to 47%, so the network's move choice matches the search result more often, but the two still differ in just over half of positions
  • Late-game value error improved 38% (0.72 → 0.45) while early-game error rose from 0.70 to 0.86 and has hovered between 0.85 and 0.90 since iteration 5; the model evaluates endgames far better than openings
  • MCTS simulations scaled from 100 to 1,260 per move across the run, giving the search tree progressively more budget
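
The simulation counts in the last bullet are consistent with an evenly spaced ramp from 100 to 1,260 over the 13 iterations. The sketch below checks that observation against the mcts_sims column. The linear schedule is an inference from the numbers, not a documented setting of this run, and linear_ramp is an illustrative name.

```python
# mcts_sims for iterations 1..13, copied from the table.
observed_sims = [100, 197, 293, 390, 487, 583, 680, 777, 873, 970, 1067, 1163, 1260]


def linear_ramp(start, end, steps):
    """Evenly spaced values from start to end inclusive, rounded to integers."""
    return [round(start + (end - start) * k / (steps - 1)) for k in range(steps)]


predicted_sims = linear_ramp(100, 1260, 13)
matches = predicted_sims == observed_sims
print(matches)
```

If the comparison holds, the search budget grew by roughly 97 simulations per iteration, which also helps explain why time_selfplay_secs rises even as games get shorter.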

Visualizations

Charts (not reproduced here): Value Error by Game Phase; Average Game Length; Value Loss (Train vs Validation); Policy Loss (Train vs Validation).

Opening Weakness

Early-game value error increased from 0.70 to 0.86 over training (about 23%), while mid-game error fell only modestly (0.71 → 0.65) and late-game error fell substantially (0.72 → 0.45). This suggests the model is learning tactical play faster than positional understanding of the opening phase. The gap may narrow with more iterations, but the 13 iterations recorded here do not yet show a clear downward trend in early-game error.
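
To see how each phase moved across the whole run rather than only at its endpoints, the three value_error columns can be summarized per phase. This is a minimal sketch with values copied from the table; summarize and the dictionary keys are illustrative names.

```python
# value_error_*_avg for iterations 1..13, copied from the table.
errors = {
    "early": [0.703323, 0.623991, 0.795905, 0.807009, 0.855319, 0.876541, 0.901564,
              0.890773, 0.903919, 0.893507, 0.853389, 0.880376, 0.861803],
    "mid": [0.707441, 0.541579, 0.72426, 0.671657, 0.713809, 0.746356, 0.776227,
            0.743353, 0.743153, 0.749269, 0.680149, 0.72019, 0.653665],
    "late": [0.717725, 0.498636, 0.626296, 0.560194, 0.555246, 0.578257, 0.608777,
             0.584211, 0.570822, 0.558663, 0.469069, 0.516044, 0.446512],
}


def summarize(series):
    """Return first value, last value, net change, and the 1-based iteration of the peak."""
    peak_index = max(range(len(series)), key=series.__getitem__)
    return {
        "first": series[0],
        "last": series[-1],
        "delta": series[-1] - series[0],
        "peak_iteration": peak_index + 1,
    }


summary = {phase: summarize(series) for phase, series in errors.items()}
for phase, stats in summary.items():
    print(phase, f"{stats['delta']:+.3f}", "peak at iteration", stats["peak_iteration"])
```

On these numbers, early-game error peaks at iteration 9 and mid-game error at iteration 7, while late-game error is highest at iteration 1, so only the late phase improves steadily from the start.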

This dataset contains 13 records across 51 fields: iteration, loss_policy_train, loss_value_train, loss_policy_val, loss_value_val, loss_soft_policy_train, and 45 more.

13 rows · 51 columns · 2026-03-23
