AlphaZero-Style Training Run: 13 Iterations
| # | iteration | loss_policy_train | loss_value_train | loss_policy_val | loss_value_val | loss_soft_policy_train | loss_soft_policy_val | loss_aux_value_train | loss_aux_value_val | loss_aux_value_0_train | loss_aux_value_0_val | loss_aux_value_1_train | loss_aux_value_1_val | loss_aux_value_2_train | loss_aux_value_2_val | loss_aux_value_3_train | loss_aux_value_3_val | gradient_steps | game_length_avg | game_length_stddev | game_length_min | game_length_max | game_wins | game_losses | game_draws | policy_entropy_avg | policy_max_prob_avg | policy_entropy_high_branch_avg | policy_max_prob_high_branch_avg | policy_agreement_avg | policy_agreement_high_branch_avg | policy_surprise_avg | value_z_avg | value_q_avg | value_z_stddev | value_q_stddev | value_correction_avg | value_correction_high_branch_avg | value_q_spread_avg | value_q_spread_high_branch_avg | value_error_early_avg | value_error_mid_avg | value_error_late_avg | value_network_stddev | lr | q_weight | mcts_sims | replay_samples | samples_iter | time_selfplay_secs | time_train_secs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3.864787 | 0.320826 | 3.18191 | 0.242259 | 3.646052 | 3.076387 | 0.170796 | 0.031286 | 0.061016 | 0.005071 | 0.041854 | 0.009963 | 0.716512 | 0.133743 | 0.0 | 0.0 | 35 | 354.264 | 67.721033 | 134 | 418 | 206 | 257 | 37 | 1.069399 | 0.552025 | 1.627311 | 0.355778 | 0.306516 | 0.152918 | 0.900903 | 0.117202 | -0.043852 | 0.755335 | 0.117126 | 0.256406 | 0.220012 | 0.110637 | 0.128793 | 0.703323 | 0.707441 | 0.717725 | 0.043437 | 0.0005 | 0.028333 | 100 | 178444 | 178444 | 286.273461 | 77.393945 |
| 2 | 2 | 3.054179 | 0.214878 | 2.736785 | 0.18571 | 2.956313 | 2.728005 | 0.025733 | 0.01787 | 0.036853 | 0.028107 | 0.038285 | 0.029353 | 0.048863 | 0.028961 | 0.0 | 0.0 | 66 | 363.372 | 59.736836 | 132 | 420 | 235 | 235 | 30 | 1.107233 | 0.555301 | 1.653229 | 0.386097 | 0.212524 | 0.147646 | 1.226714 | 0.125226 | 0.066841 | 0.75366 | 0.302264 | 0.161567 | 0.139284 | 0.075105 | 0.073412 | 0.623991 | 0.541579 | 0.498636 | 0.311493 | 0.0005 | 0.056667 | 197 | 334478 | 156034 | 311.520347 | 122.90891 |
| 3 | 3 | 2.473027 | 0.319545 | 2.383122 | 0.304262 | 2.611523 | 2.547672 | 0.030155 | 0.017883 | 0.043951 | 0.021795 | 0.04879 | 0.029612 | 0.051183 | 0.033476 | 0.0 | 0.0 | 42 | 306.762 | 80.571219 | 125 | 456 | 246 | 242 | 12 | 0.914362 | 0.649239 | 1.223786 | 0.563321 | 0.343803 | 0.29213 | 1.169231 | 0.107064 | 0.078501 | 0.913341 | 0.397952 | 0.118311 | 0.109984 | 0.055504 | 0.064856 | 0.795905 | 0.72426 | 0.626296 | 0.413562 | 0.0005 | 0.085 | 293 | 214466 | 214466 | 565.548518 | 90.27873 |
| 4 | 4 | 2.306242 | 0.283898 | 2.178379 | 0.285746 | 2.373558 | 2.262347 | 0.023122 | 0.027903 | 0.033743 | 0.03843 | 0.035401 | 0.045739 | 0.040325 | 0.047388 | 0.0 | 0.0 | 92 | 290.198 | 101.386364 | 99 | 488 | 223 | 266 | 11 | 0.636061 | 0.749145 | 0.864953 | 0.690746 | 0.414222 | 0.362223 | 1.309917 | 0.052395 | 0.030916 | 0.88627 | 0.428252 | 0.157282 | 0.157085 | 0.05787 | 0.074392 | 0.807009 | 0.671657 | 0.560194 | 0.441084 | 0.0005 | 0.113333 | 390 | 468004 | 253538 | 776.247988 | 171.735797 |
| 5 | 5 | 2.120492 | 0.266878 | 2.084083 | 0.281335 | 2.201041 | 2.173777 | 0.022068 | 0.019783 | 0.029671 | 0.025418 | 0.034216 | 0.0314 | 0.040508 | 0.03751 | 0.0 | 0.0 | 125 | 245.736 | 82.902221 | 91 | 448 | 297 | 197 | 6 | 0.727392 | 0.718925 | 1.036619 | 0.622322 | 0.347107 | 0.255754 | 1.275226 | 0.092726 | 0.062839 | 0.952357 | 0.48553 | 0.178129 | 0.207145 | 0.073739 | 0.089232 | 0.855319 | 0.713809 | 0.555246 | 0.479771 | 0.0005 | 0.141667 | 487 | 635428 | 167424 | 624.092566 | 233.373341 |
| 6 | 6 | 2.056717 | 0.254081 | 2.004317 | 0.244936 | 2.128085 | 2.089925 | 0.025475 | 0.023007 | 0.035926 | 0.038931 | 0.040308 | 0.031487 | 0.044802 | 0.038463 | 0.0 | 0.0 | 159 | 224.934 | 85.427499 | 84 | 460 | 227 | 271 | 2 | 0.507362 | 0.801921 | 0.702605 | 0.743406 | 0.388656 | 0.307349 | 1.443071 | 0.064813 | 0.037782 | 0.960654 | 0.475798 | 0.198953 | 0.211323 | 0.075591 | 0.088419 | 0.876541 | 0.746356 | 0.578257 | 0.496498 | 0.0005 | 0.17 | 583 | 810439 | 175011 | 721.00008 | 297.277773 |
| 7 | 7 | 1.980565 | 0.240194 | 1.962871 | 0.247953 | 2.056309 | 2.038054 | 0.024335 | 0.023682 | 0.035475 | 0.028279 | 0.036753 | 0.039402 | 0.043138 | 0.045329 | 0.0 | 0.0 | 184 | 199.75 | 63.037604 | 71 | 438 | 253 | 243 | 4 | 0.709711 | 0.730818 | 1.050833 | 0.624401 | 0.331796 | 0.203324 | 1.272419 | 0.052391 | 0.052032 | 0.982562 | 0.453649 | 0.125269 | 0.11675 | 0.056044 | 0.065202 | 0.901564 | 0.776227 | 0.608777 | 0.460589 | 0.0005 | 0.198333 | 680 | 939431 | 128992 | 609.804331 | 345.188357 |
| 8 | 8 | 1.934722 | 0.224824 | 1.926557 | 0.237877 | 2.009667 | 1.996964 | 0.023222 | 0.030667 | 0.031103 | 0.045675 | 0.036661 | 0.050567 | 0.042819 | 0.049246 | 0.0 | 0.0 | 209 | 205.662 | 65.041493 | 92 | 434 | 297 | 201 | 2 | 0.669525 | 0.741777 | 1.023188 | 0.622877 | 0.319223 | 0.180888 | 1.501361 | 0.056121 | 0.060203 | 0.982131 | 0.472652 | 0.203874 | 0.193068 | 0.091324 | 0.104065 | 0.890773 | 0.743353 | 0.584211 | 0.457012 | 0.0005 | 0.226667 | 777 | 1065506 | 126075 | 655.579999 | 390.819171 |
| 9 | 9 | 1.905073 | 0.210875 | 1.928705 | 0.205652 | 1.952473 | 1.973959 | 0.022604 | 0.02032 | 0.028725 | 0.02477 | 0.03595 | 0.031809 | 0.042882 | 0.040527 | 0.0 | 0.0 | 198 | 191.228 | 71.703305 | 79 | 462 | 234 | 265 | 1 | 0.450463 | 0.823819 | 0.639126 | 0.766002 | 0.38914 | 0.288425 | 1.395133 | 0.044485 | 0.032008 | 0.976307 | 0.470015 | 0.16768 | 0.161381 | 0.074023 | 0.086891 | 0.903919 | 0.743153 | 0.570822 | 0.480495 | 0.0005 | 0.255 | 873 | 1011396 | 160356 | 953.552464 | 371.128067 |
| 10 | 10 | 1.885457 | 0.202162 | 1.930772 | 0.194228 | 1.936023 | 1.970428 | 0.023354 | 0.021064 | 0.028933 | 0.025361 | 0.037461 | 0.033421 | 0.044752 | 0.041666 | 0.0 | 0.0 | 174 | 183.046 | 61.119096 | 70 | 460 | 255 | 240 | 5 | 0.657052 | 0.752794 | 1.023239 | 0.635632 | 0.362461 | 0.230714 | 1.293476 | 0.062379 | 0.061959 | 0.981138 | 0.480829 | 0.122319 | 0.116329 | 0.05668 | 0.06547 | 0.893507 | 0.749269 | 0.558663 | 0.493849 | 0.0005 | 0.283333 | 970 | 888992 | 131134 | 815.300974 | 326.27391 |
| 11 | 11 | 1.845871 | 0.191741 | 1.825237 | 0.184761 | 1.896899 | 1.87675 | 0.02634 | 0.023318 | 0.03383 | 0.02774 | 0.042235 | 0.03806 | 0.049186 | 0.044817 | 0.0 | 0.0 | 172 | 190.236 | 63.771156 | 70 | 456 | 274 | 226 | 0 | 0.661495 | 0.752031 | 1.068277 | 0.625464 | 0.433967 | 0.306839 | 1.176193 | 0.103805 | 0.096743 | 0.980855 | 0.551427 | 0.096549 | 0.095214 | 0.046351 | 0.053343 | 0.853389 | 0.680149 | 0.469069 | 0.555274 | 0.0005 | 0.311667 | 1067 | 878536 | 156968 | 1076.774026 | 322.572067 |
| 12 | 12 | 1.769169 | 0.18186 | 1.772084 | 0.176119 | 1.83411 | 1.844026 | 0.028092 | 0.024488 | 0.036672 | 0.028815 | 0.045173 | 0.039682 | 0.051839 | 0.047161 | 0.0 | 0.0 | 166 | 152.794 | 46.067033 | 73 | 468 | 249 | 250 | 1 | 0.535325 | 0.799832 | 0.891145 | 0.685752 | 0.475168 | 0.291591 | 1.142953 | 0.106931 | 0.078072 | 0.989762 | 0.506 | 0.09566 | 0.093859 | 0.051289 | 0.056117 | 0.880376 | 0.72019 | 0.516044 | 0.519486 | 0.0005 | 0.34 | 1163 | 848198 | 144673 | 1061.730934 | 311.39659 |
| 13 | 13 | 1.743754 | 0.158582 | 1.726435 | 0.153854 | 1.81128 | 1.799296 | 0.025183 | 0.021926 | 0.030586 | 0.025736 | 0.040735 | 0.035313 | 0.04855 | 0.042916 | 0.0 | 0.0 | 168 | 153.432 | 51.590012 | 68 | 478 | 231 | 268 | 1 | 0.579754 | 0.784928 | 0.941172 | 0.672857 | 0.472446 | 0.284693 | 1.125724 | 0.159865 | 0.117188 | 0.97555 | 0.534392 | 0.094677 | 0.101375 | 0.043354 | 0.04914 | 0.861803 | 0.653665 | 0.446512 | 0.536952 | 0.0005 | 0.368333 | 1260 | 857260 | 138054 | 1159.378612 | 314.991295 |
Reinforcement learning training metrics for an AlphaZero-style run over 13 iterations, tracking policy and value losses, self-play game outcomes, and MCTS search statistics.
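The game-outcome columns can be sanity-checked directly. The Python sketch below copies game_wins, game_losses, game_draws and game_length_avg from the table into a hypothetical `IterationOutcome` container, which is not part of the dataset. It assumes the three outcome columns partition each iteration's self-play games. Under that assumption every iteration played 500 games, and the draw rate falls from 37/500 in iteration 1 to 1/500 in iteration 13. The table does not say from whose perspective a game counts as a win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterationOutcome:
    """Game-outcome columns of one table row (hypothetical container)."""
    iteration: int
    wins: int
    losses: int
    draws: int
    game_length_avg: float

    @property
    def games(self) -> int:
        # Assumes wins, losses and draws together cover every self-play game.
        return self.wins + self.losses + self.draws

    @property
    def draw_rate(self) -> float:
        return self.draws / self.games


# Copied from the game_wins, game_losses, game_draws and game_length_avg columns.
OUTCOMES = [
    IterationOutcome(1, 206, 257, 37, 354.264),
    IterationOutcome(2, 235, 235, 30, 363.372),
    IterationOutcome(3, 246, 242, 12, 306.762),
    IterationOutcome(4, 223, 266, 11, 290.198),
    IterationOutcome(5, 297, 197, 6, 245.736),
    IterationOutcome(6, 227, 271, 2, 224.934),
    IterationOutcome(7, 253, 243, 4, 199.75),
    IterationOutcome(8, 297, 201, 2, 205.662),
    IterationOutcome(9, 234, 265, 1, 191.228),
    IterationOutcome(10, 255, 240, 5, 183.046),
    IterationOutcome(11, 274, 226, 0, 190.236),
    IterationOutcome(12, 249, 250, 1, 152.794),
    IterationOutcome(13, 231, 268, 1, 153.432),
]

if __name__ == "__main__":
    for o in OUTCOMES:
        print(f"iter {o.iteration:2d}: games={o.games} "
              f"draw_rate={o.draw_rate:.3f} avg_length={o.game_length_avg:.1f}")
```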
This dataset contains 13 records across 51 fields: iteration, loss_policy_train, loss_value_train, loss_policy_val, loss_value_val, loss_soft_policy_train, and 45 more.
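Several of these fields are related in ways the dataset description does not document. The Python sketch below checks three relationships. Each one is inferred from the logged numbers alone, so treat it as an assumption rather than the run's actual configuration:

1. replay_samples equals the sum of samples_iter over the last six iterations. From iteration 3 onward, nothing produced in iterations 1 and 2 is counted (replay_samples equals samples_iter at iteration 3).
2. mcts_sims rises linearly from 100 to 1260, rounded to integers.
3. q_weight grows by 0.34/12 (about 0.028333) per iteration.

The function names and the `window` and `reset_at` parameters are hypothetical.

```python
import math

# Copied from the samples_iter, replay_samples, mcts_sims and q_weight columns,
# iterations 1..13.
SAMPLES_ITER = [178444, 156034, 214466, 253538, 167424, 175011, 128992,
                126075, 160356, 131134, 156968, 144673, 138054]
REPLAY_SAMPLES = [178444, 334478, 214466, 468004, 635428, 810439, 939431,
                  1065506, 1011396, 888992, 878536, 848198, 857260]
MCTS_SIMS = [100, 197, 293, 390, 487, 583, 680, 777, 873, 970, 1067, 1163, 1260]
Q_WEIGHT = [0.028333, 0.056667, 0.085, 0.113333, 0.141667, 0.17, 0.198333,
            0.226667, 0.255, 0.283333, 0.311667, 0.34, 0.368333]


def predicted_replay_sizes(samples_iter, window=6, reset_at=3):
    """Hypothetical buffer model: keep the samples of the last `window`
    iterations, ignoring anything produced before iteration `reset_at` once
    that iteration is reached. Both parameters are inferred, not documented."""
    sizes = []
    for it in range(1, len(samples_iter) + 1):  # 1-based iteration number
        floor = reset_at if it >= reset_at else 1
        start = max(it - window + 1, floor)
        sizes.append(sum(samples_iter[start - 1:it]))
    return sizes


def predicted_mcts_sims(n_iters, first=100, last=1260):
    """Assumed linear ramp from `first` to `last` simulations, rounded."""
    step = (last - first) / (n_iters - 1)
    return [round(first + (it - 1) * step) for it in range(1, n_iters + 1)]


def predicted_q_weight(n_iters, per_iter=0.34 / 12):
    """Assumed constant per-iteration increase of q_weight (unrounded)."""
    return [it * per_iter for it in range(1, n_iters + 1)]


def q_weight_matches(logged, tol=5e-7):
    # The log keeps six decimals, so allow half a unit in the last place.
    predicted = predicted_q_weight(len(logged))
    return all(math.isclose(p, q, abs_tol=tol) for p, q in zip(predicted, logged))


if __name__ == "__main__":
    print("replay window:", predicted_replay_sizes(SAMPLES_ITER) == REPLAY_SAMPLES)
    print("mcts_sims ramp:", predicted_mcts_sims(len(MCTS_SIMS)) == MCTS_SIMS)
    print("q_weight ramp:", q_weight_matches(Q_WEIGHT))
```

All three checks hold on the 13 logged rows. This table cannot show whether the ramps continue past iteration 13 or why the buffer restarts at iteration 3.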