AlphaZero Training Metrics: Policy and Value Loss Over 13 Iterations
Reinforcement learning training run tracking policy/value losses, game length, MCTS simulations, and value calibration across 13 self-play iterations.
| # | iteration | loss_policy_train | loss_value_train | loss_policy_val | loss_value_val | loss_soft_policy_train | loss_soft_policy_val | loss_aux_value_train | loss_aux_value_val | loss_aux_value_0_train | loss_aux_value_0_val | loss_aux_value_1_train | loss_aux_value_1_val | loss_aux_value_2_train | loss_aux_value_2_val | loss_aux_value_3_train | loss_aux_value_3_val | gradient_steps | game_length_avg | game_length_stddev | game_length_min | game_length_max | game_wins | game_losses | game_draws | policy_entropy_avg | policy_max_prob_avg | policy_entropy_high_branch_avg | policy_max_prob_high_branch_avg | policy_agreement_avg | policy_agreement_high_branch_avg | policy_surprise_avg | value_z_avg | value_q_avg | value_z_stddev | value_q_stddev | value_correction_avg | value_correction_high_branch_avg | value_q_spread_avg | value_q_spread_high_branch_avg | value_error_early_avg | value_error_mid_avg | value_error_late_avg | value_network_stddev | lr | q_weight | mcts_sims | replay_samples | samples_iter | time_selfplay_secs | time_train_secs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3.864787 | 0.320826 | 3.18191 | 0.242259 | 3.646052 | 3.076387 | 0.170796 | 0.031286 | 0.061016 | 0.005071 | 0.041854 | 0.009963 | 0.716512 | 0.133743 | 0.0 | 0.0 | 35 | 354.264 | 67.721033 | 134 | 418 | 206 | 257 | 37 | 1.069399 | 0.552025 | 1.627311 | 0.355778 | 0.306516 | 0.152918 | 0.900903 | 0.117202 | -0.043852 | 0.755335 | 0.117126 | 0.256406 | 0.220012 | 0.110637 | 0.128793 | 0.703323 | 0.707441 | 0.717725 | 0.043437 | 0.0005 | 0.028333 | 100 | 178444 | 178444 | 286.273461 | 77.393945 |
| 2 | 2 | 3.054179 | 0.214878 | 2.736785 | 0.18571 | 2.956313 | 2.728005 | 0.025733 | 0.01787 | 0.036853 | 0.028107 | 0.038285 | 0.029353 | 0.048863 | 0.028961 | 0.0 | 0.0 | 66 | 363.372 | 59.736836 | 132 | 420 | 235 | 235 | 30 | 1.107233 | 0.555301 | 1.653229 | 0.386097 | 0.212524 | 0.147646 | 1.226714 | 0.125226 | 0.066841 | 0.75366 | 0.302264 | 0.161567 | 0.139284 | 0.075105 | 0.073412 | 0.623991 | 0.541579 | 0.498636 | 0.311493 | 0.0005 | 0.056667 | 197 | 334478 | 156034 | 311.520347 | 122.90891 |
| 3 | 3 | 2.473027 | 0.319545 | 2.383122 | 0.304262 | 2.611523 | 2.547672 | 0.030155 | 0.017883 | 0.043951 | 0.021795 | 0.04879 | 0.029612 | 0.051183 | 0.033476 | 0.0 | 0.0 | 42 | 306.762 | 80.571219 | 125 | 456 | 246 | 242 | 12 | 0.914362 | 0.649239 | 1.223786 | 0.563321 | 0.343803 | 0.29213 | 1.169231 | 0.107064 | 0.078501 | 0.913341 | 0.397952 | 0.118311 | 0.109984 | 0.055504 | 0.064856 | 0.795905 | 0.72426 | 0.626296 | 0.413562 | 0.0005 | 0.085 | 293 | 214466 | 214466 | 565.548518 | 90.27873 |
| 4 | 4 | 2.306242 | 0.283898 | 2.178379 | 0.285746 | 2.373558 | 2.262347 | 0.023122 | 0.027903 | 0.033743 | 0.03843 | 0.035401 | 0.045739 | 0.040325 | 0.047388 | 0.0 | 0.0 | 92 | 290.198 | 101.386364 | 99 | 488 | 223 | 266 | 11 | 0.636061 | 0.749145 | 0.864953 | 0.690746 | 0.414222 | 0.362223 | 1.309917 | 0.052395 | 0.030916 | 0.88627 | 0.428252 | 0.157282 | 0.157085 | 0.05787 | 0.074392 | 0.807009 | 0.671657 | 0.560194 | 0.441084 | 0.0005 | 0.113333 | 390 | 468004 | 253538 | 776.247988 | 171.735797 |
| 5 | 5 | 2.120492 | 0.266878 | 2.084083 | 0.281335 | 2.201041 | 2.173777 | 0.022068 | 0.019783 | 0.029671 | 0.025418 | 0.034216 | 0.0314 | 0.040508 | 0.03751 | 0.0 | 0.0 | 125 | 245.736 | 82.902221 | 91 | 448 | 297 | 197 | 6 | 0.727392 | 0.718925 | 1.036619 | 0.622322 | 0.347107 | 0.255754 | 1.275226 | 0.092726 | 0.062839 | 0.952357 | 0.48553 | 0.178129 | 0.207145 | 0.073739 | 0.089232 | 0.855319 | 0.713809 | 0.555246 | 0.479771 | 0.0005 | 0.141667 | 487 | 635428 | 167424 | 624.092566 | 233.373341 |
| 6 | 6 | 2.056717 | 0.254081 | 2.004317 | 0.244936 | 2.128085 | 2.089925 | 0.025475 | 0.023007 | 0.035926 | 0.038931 | 0.040308 | 0.031487 | 0.044802 | 0.038463 | 0.0 | 0.0 | 159 | 224.934 | 85.427499 | 84 | 460 | 227 | 271 | 2 | 0.507362 | 0.801921 | 0.702605 | 0.743406 | 0.388656 | 0.307349 | 1.443071 | 0.064813 | 0.037782 | 0.960654 | 0.475798 | 0.198953 | 0.211323 | 0.075591 | 0.088419 | 0.876541 | 0.746356 | 0.578257 | 0.496498 | 0.0005 | 0.17 | 583 | 810439 | 175011 | 721.00008 | 297.277773 |
| 7 | 7 | 1.980565 | 0.240194 | 1.962871 | 0.247953 | 2.056309 | 2.038054 | 0.024335 | 0.023682 | 0.035475 | 0.028279 | 0.036753 | 0.039402 | 0.043138 | 0.045329 | 0.0 | 0.0 | 184 | 199.75 | 63.037604 | 71 | 438 | 253 | 243 | 4 | 0.709711 | 0.730818 | 1.050833 | 0.624401 | 0.331796 | 0.203324 | 1.272419 | 0.052391 | 0.052032 | 0.982562 | 0.453649 | 0.125269 | 0.11675 | 0.056044 | 0.065202 | 0.901564 | 0.776227 | 0.608777 | 0.460589 | 0.0005 | 0.198333 | 680 | 939431 | 128992 | 609.804331 | 345.188357 |
| 8 | 8 | 1.934722 | 0.224824 | 1.926557 | 0.237877 | 2.009667 | 1.996964 | 0.023222 | 0.030667 | 0.031103 | 0.045675 | 0.036661 | 0.050567 | 0.042819 | 0.049246 | 0.0 | 0.0 | 209 | 205.662 | 65.041493 | 92 | 434 | 297 | 201 | 2 | 0.669525 | 0.741777 | 1.023188 | 0.622877 | 0.319223 | 0.180888 | 1.501361 | 0.056121 | 0.060203 | 0.982131 | 0.472652 | 0.203874 | 0.193068 | 0.091324 | 0.104065 | 0.890773 | 0.743353 | 0.584211 | 0.457012 | 0.0005 | 0.226667 | 777 | 1065506 | 126075 | 655.579999 | 390.819171 |
| 9 | 9 | 1.905073 | 0.210875 | 1.928705 | 0.205652 | 1.952473 | 1.973959 | 0.022604 | 0.02032 | 0.028725 | 0.02477 | 0.03595 | 0.031809 | 0.042882 | 0.040527 | 0.0 | 0.0 | 198 | 191.228 | 71.703305 | 79 | 462 | 234 | 265 | 1 | 0.450463 | 0.823819 | 0.639126 | 0.766002 | 0.38914 | 0.288425 | 1.395133 | 0.044485 | 0.032008 | 0.976307 | 0.470015 | 0.16768 | 0.161381 | 0.074023 | 0.086891 | 0.903919 | 0.743153 | 0.570822 | 0.480495 | 0.0005 | 0.255 | 873 | 1011396 | 160356 | 953.552464 | 371.128067 |
| 10 | 10 | 1.885457 | 0.202162 | 1.930772 | 0.194228 | 1.936023 | 1.970428 | 0.023354 | 0.021064 | 0.028933 | 0.025361 | 0.037461 | 0.033421 | 0.044752 | 0.041666 | 0.0 | 0.0 | 174 | 183.046 | 61.119096 | 70 | 460 | 255 | 240 | 5 | 0.657052 | 0.752794 | 1.023239 | 0.635632 | 0.362461 | 0.230714 | 1.293476 | 0.062379 | 0.061959 | 0.981138 | 0.480829 | 0.122319 | 0.116329 | 0.05668 | 0.06547 | 0.893507 | 0.749269 | 0.558663 | 0.493849 | 0.0005 | 0.283333 | 970 | 888992 | 131134 | 815.300974 | 326.27391 |
| 11 | 11 | 1.845871 | 0.191741 | 1.825237 | 0.184761 | 1.896899 | 1.87675 | 0.02634 | 0.023318 | 0.03383 | 0.02774 | 0.042235 | 0.03806 | 0.049186 | 0.044817 | 0.0 | 0.0 | 172 | 190.236 | 63.771156 | 70 | 456 | 274 | 226 | 0 | 0.661495 | 0.752031 | 1.068277 | 0.625464 | 0.433967 | 0.306839 | 1.176193 | 0.103805 | 0.096743 | 0.980855 | 0.551427 | 0.096549 | 0.095214 | 0.046351 | 0.053343 | 0.853389 | 0.680149 | 0.469069 | 0.555274 | 0.0005 | 0.311667 | 1067 | 878536 | 156968 | 1076.774026 | 322.572067 |
| 12 | 12 | 1.769169 | 0.18186 | 1.772084 | 0.176119 | 1.83411 | 1.844026 | 0.028092 | 0.024488 | 0.036672 | 0.028815 | 0.045173 | 0.039682 | 0.051839 | 0.047161 | 0.0 | 0.0 | 166 | 152.794 | 46.067033 | 73 | 468 | 249 | 250 | 1 | 0.535325 | 0.799832 | 0.891145 | 0.685752 | 0.475168 | 0.291591 | 1.142953 | 0.106931 | 0.078072 | 0.989762 | 0.506 | 0.09566 | 0.093859 | 0.051289 | 0.056117 | 0.880376 | 0.72019 | 0.516044 | 0.519486 | 0.0005 | 0.34 | 1163 | 848198 | 144673 | 1061.730934 | 311.39659 |
| 13 | 13 | 1.743754 | 0.158582 | 1.726435 | 0.153854 | 1.81128 | 1.799296 | 0.025183 | 0.021926 | 0.030586 | 0.025736 | 0.040735 | 0.035313 | 0.04855 | 0.042916 | 0.0 | 0.0 | 168 | 153.432 | 51.590012 | 68 | 478 | 231 | 268 | 1 | 0.579754 | 0.784928 | 0.941172 | 0.672857 | 0.472446 | 0.284693 | 1.125724 | 0.159865 | 0.117188 | 0.97555 | 0.534392 | 0.094677 | 0.101375 | 0.043354 | 0.04914 | 0.861803 | 0.653665 | 0.446512 | 0.536952 | 0.0005 | 0.368333 | 1260 | 857260 | 138054 | 1159.378612 | 314.991295 |
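As a quick consistency check, the sketch below recomputes the analysis's headline percentage changes from the iteration 1 and 13 rows. It also tests whether the `mcts_sims` and `q_weight` columns follow linear schedules. The schedule formulas are inferred from the logged values and are an assumption, since the training code is not shown.

```python
# Values copied from the table (iterations 1 and 13 only, plus the full
# mcts_sims and q_weight columns). Column meanings are read from their names.
first = {"loss_policy_train": 3.864787, "loss_policy_val": 3.18191,
         "game_length_avg": 354.264, "policy_agreement_avg": 0.306516,
         "value_error_early_avg": 0.703323, "value_error_mid_avg": 0.707441,
         "value_error_late_avg": 0.717725, "game_draws": 37}
last = {"loss_policy_train": 1.743754, "loss_policy_val": 1.726435,
        "game_length_avg": 153.432, "policy_agreement_avg": 0.472446,
        "value_error_early_avg": 0.861803, "value_error_mid_avg": 0.653665,
        "value_error_late_avg": 0.446512, "game_draws": 1}


def pct_change(a, b):
    """Relative change from a to b, in percent (negative means a decrease)."""
    return 100.0 * (b - a) / a


for key in first:
    print(f"{key:24s} {first[key]:>10.4f} -> {last[key]:>10.4f}  "
          f"{pct_change(first[key], last[key]):+6.1f}%")

mcts_sims = [100, 197, 293, 390, 487, 583, 680, 777, 873, 970, 1067, 1163, 1260]
q_weight = [0.028333, 0.056667, 0.085, 0.113333, 0.141667, 0.17, 0.198333,
            0.226667, 0.255, 0.283333, 0.311667, 0.34, 0.368333]

# Assumed linear schedules (inferred from the data, not documented):
# sims grow from 100 to 1260 in 12 equal steps, rounded to an integer;
# q_weight grows by 0.34 / 12 per iteration, logged to 6 decimals.
for i, sims in enumerate(mcts_sims, start=1):
    assert sims == round(100 + (i - 1) * (1260 - 100) / 12)
for i, w in enumerate(q_weight, start=1):
    assert abs(w - i * 0.34 / 12) < 1e-6
```

Both schedule assertions hold for all 13 rows. This suggests that the simulation budget and the Q-blending weight were ramped on fixed linear schedules rather than adapted to training progress.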
AI Analysis
Training Summary
- Policy loss fell 55% on training data (3.86 → 1.74) and 46% on validation (3.18 → 1.73); the two curves converge and track closely in later iterations, with no sign of overfitting
- Average game length dropped from 354 to 153 moves, suggesting the agent learned to finish games decisively; draws fell from 37 to just 1 of the 500 games played each iteration
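- Policy agreement with MCTS rose from 31% to 47%: the network's top move increasingly matches what tree search selects, but search still settles on a different move in about half of positions (in high-branching positions agreement is lower, rising from 15% to 28%)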
- Late-game value error fell 38% (0.72 → 0.45), while early-game error rose from 0.70 to 0.86 and has hovered between roughly 0.85 and 0.90 since iteration 5; the model appears to evaluate endgames much better than openings
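- MCTS simulations per move rose linearly from 100 to 1,260 (about 97 more each iteration), giving the search progressively more budget; self-play time per iteration grew from about 286 s to 1,159 s over the same span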
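The exported metrics do not define `policy_agreement_avg`. This sketch assumes one common reading: the fraction of positions where the network's highest-prior move is also the move MCTS visited most. The helper names `argmax` and `policy_agreement` are hypothetical and are not taken from the training code.

```python
from typing import Sequence


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest element (the first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def policy_agreement(priors: Sequence[Sequence[float]],
                     visit_counts: Sequence[Sequence[int]]) -> float:
    """Assumed metric: fraction of positions where the prior's top move
    is also the most-visited move after search."""
    if not priors:
        raise ValueError("need at least one position")
    hits = sum(argmax(p) == argmax(v) for p, v in zip(priors, visit_counts))
    return hits / len(priors)


# Three toy positions with three legal moves each.
priors = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
visits = [[70, 20, 10], [10, 30, 60], [55, 30, 15]]
print(policy_agreement(priors, visits))  # positions 1 and 3 agree, 2 does not
```

In standard AlphaZero, moves are chosen from the search's visit counts rather than the raw prior. Under this reading, a disagreement means the search overruled the network's first choice, not the other way around.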
Visualizations
Charts (not reproduced here): Value Error by Game Phase; Average Game Length; Value Loss (Train vs Validation); Policy Loss (Train vs Validation).
Opening Weakness
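Early-game value error rose over training (0.70 → 0.86), with most of the increase complete by iteration 5, even as mid- and late-game errors fell. This may indicate that the model is learning tactical play faster than positional understanding of the opening. Part of the rise may also reflect harder targets rather than a pure regression. Draws nearly disappeared (37 → 1), and value_z_stddev, presumably the spread of the game-outcome targets, grew from about 0.76 to about 0.98. This pattern may resolve with further training, but the 13 iterations shown here do not yet show early-game error improving.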
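The per-phase value-error columns are also undocumented. This sketch assumes each game is split into equal thirds by ply. It also assumes the error is the mean absolute difference between the network's value v and the final outcome z. The functions `phase_errors` and `zero_predictor_error` are hypothetical.

The second function shows why fewer draws can raise the error floor. A predictor that always outputs 0 is wrong by exactly 1 on every decisive game and exactly right on every draw.

```python
from statistics import mean

PHASES = ("early", "mid", "late")


def phase_errors(games):
    """Assumed metric. games: list of (values, outcomes) pairs, one per game.
    values[t] is the network's value at ply t; outcomes[t] is the game
    result (+1, 0, -1) from the same player's perspective."""
    buckets = {p: [] for p in PHASES}
    for values, outcomes in games:
        n = len(values)
        for t, (v, z) in enumerate(zip(values, outcomes)):
            # 3 * t // n maps plies 0..n-1 onto thirds 0, 1, 2.
            buckets[PHASES[3 * t // n]].append(abs(v - z))
    return {p: mean(errs) for p, errs in buckets.items() if errs}


def zero_predictor_error(wins, losses, draws):
    """Error of a value head that always predicts 0, weighting games equally.
    Both |0 - z| and (0 - z)**2 are 1 for a decisive game and 0 for a draw."""
    return (wins + losses) / (wins + losses + draws)


# Win/loss/draw counts from iterations 1 and 13 of the table.
print(zero_predictor_error(206, 257, 37))  # 463 / 500
print(zero_predictor_error(231, 268, 1))   # 499 / 500
```

With equal weight per game, this uninformed baseline rises from 0.926 to 0.998. That shift is smaller than the observed early-game rise, so under these assumptions the disappearance of draws would explain only part of it. Weighting by position instead of by game would change the numbers, since longer games contribute more positions.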