Provide a stopper hook which checks latest checkpoint. This hook will be...
Provide a stopper hook which checks latest checkpoint. This hook will be helpful to relieve following edge case: * global_step is reached to last_step * all workers stop due to last_step check except chief * chief starts writing the last checkpoint * A PS is preempted while chief is writing the checkpoint * chief restarts training from an older checkpoint * at this point only chief remains to handle remaining global steps. PiperOrigin-RevId: 208675370
Loading
Please sign in to comment