BACKPORT: FROMGIT: sched/deadline: Fix dl_server getting stuck

John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7
("sched/deadline: Less agressive dl_server handling").

While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the CPU.

When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally returns
HRTIMER_NORESTART.

This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.

What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.

IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.

This removes all dl_se->server_has_tasks() users, so remove the whole
thing.

Fixes: cccb45d7 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>
(cherry picked from commit 077e1e2e0015e5ba6538d1c5299fb299a3a92d60
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent)
[jstultz: Fixed collisions in removed lines, preserved removed
 structure element to avoid KMI issues.]
Signed-off-by: John Stultz <jstultz@google.com>
Change-Id: I351df57def1fb98de952ef42db54817a68cdb34b
Bug: 444744538