Commit 65c2daaf authored by Mustafa Ispir, committed by TensorFlower Gardener

Provide a stopper hook which checks the latest checkpoint.

Provide a stopper hook which checks the latest checkpoint. This hook helps mitigate the following edge case:
* global_step reaches last_step
* all workers except the chief stop because of the last_step check
* the chief starts writing the last checkpoint
* a PS is preempted while the chief is writing the checkpoint
* the chief restarts training from an older checkpoint
* at this point only the chief remains to run the remaining global steps.
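
The idea behind the edge case above can be illustrated with a minimal, framework-free sketch. It is not the code from this commit: the class name `CheckpointStepStopper`, the injected `read_checkpoint_step` callable, and the `should_stop` method are hypothetical names chosen for illustration, and the sketch assumes the intended behavior is that a worker stops only once the latest saved checkpoint, not just the in-memory global_step, has reached last_step.

```python
from typing import Callable, Optional


class CheckpointStepStopper:
    """Hypothetical stand-in for a stopper hook; not TensorFlow's API.

    Assumption: a worker should stop only when the most recent checkpoint
    on disk was written at or after `last_step`.
    """

    def __init__(self,
                 read_checkpoint_step: Callable[[], Optional[int]],
                 last_step: int) -> None:
        # `read_checkpoint_step` returns the global step stored in the latest
        # checkpoint, or None if no checkpoint exists yet (illustrative helper).
        self._read_checkpoint_step = read_checkpoint_step
        self._last_step = last_step

    def should_stop(self, global_step: int) -> bool:
        # Cheap in-memory check first, so the checkpoint is only consulted
        # once training appears to be finished.
        if global_step < self._last_step:
            return False
        saved_step = self._read_checkpoint_step()
        # Stop only if the final checkpoint has actually been written.
        return saved_step is not None and saved_step >= self._last_step


# Simulated scenario: global_step hits last_step before the final
# checkpoint has been saved.
saved = {"step": 900}
stopper = CheckpointStepStopper(lambda: saved["step"], last_step=1000)

before_final_checkpoint = stopper.should_stop(1000)  # checkpoint still at 900
saved["step"] = 1000                                 # chief finishes writing
after_final_checkpoint = stopper.should_stop(1000)
```

Under this assumption, workers stay alive while the chief is still writing the last checkpoint, so if the chief has to restart from an older checkpoint it is not left alone to run the remaining steps.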

PiperOrigin-RevId: 208675370
parent 10a7b245