Commit 4a34c584 authored by Dev Jain, committed by Andrew Morton

mempolicy: optimize queue_folios_pte_range by PTE batching

After the check for queue_folio_required(), the code in the for loop only
cares about the folio, i.e. the individual PTEs are redundant.  Therefore,
optimize this loop by skipping over a PTE batch mapping the same folio.
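
The batching idea can be sketched in userspace as follows. This is a
simplified model, not the kernel code: struct sim_pte, sim_pte_batch(),
sim_walk_per_pte() and sim_walk_batched() are hypothetical names, and a
"PTE" here records only which folio it maps. A real batch check also has to
consider things like consecutive PFNs and PTE bits, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: it only records which folio it maps. */
struct sim_pte {
	int folio_id;	/* negative means "not present" in this model */
};

/*
 * Count how many consecutive entries, starting at ptes[0], map the same
 * folio as ptes[0]. At most max_nr entries are examined.
 */
static size_t sim_pte_batch(const struct sim_pte *ptes, size_t max_nr)
{
	size_t nr = 1;

	assert(max_nr > 0);
	while (nr < max_nr && ptes[nr].folio_id == ptes[0].folio_id)
		nr++;
	return nr;
}

/* Unbatched walk: the folio is looked at once for every present PTE. */
static size_t sim_walk_per_pte(const struct sim_pte *ptes, size_t n)
{
	size_t visits = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i].folio_id >= 0)
			visits++;
	return visits;
}

/* Batched walk: the folio is looked at once, then the batch is skipped. */
static size_t sim_walk_batched(const struct sim_pte *ptes, size_t n)
{
	size_t visits = 0;
	size_t i = 0;

	while (i < n) {
		if (ptes[i].folio_id < 0) {
			i++;		/* nothing mapped here, move on */
			continue;
		}
		visits++;		/* handle the folio exactly once */
		i += sim_pte_batch(&ptes[i], n - i);
	}
	return visits;
}
```

In this model, a range whose PTEs all map large folios costs one visit per
folio in the batched walk instead of one visit per PTE.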

With a test program migrating pages of the calling process, which includes
a mapped VMA of size 4GB with pte-mapped large folios of order-9, and
migrating once back and forth between node-0 and node-1, the average
execution time reduces from 7.5 to 4 seconds, i.e. approximately 47% less
execution time.
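
The numbers above can be checked with a little arithmetic. The sketch below
assumes a 4 KiB base page size, which the commit message does not state (the
test machine may well use a different page size); the helper names
pte_count(), folio_count() and time_reduction_pct() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of PTEs needed to map vma_bytes with pages of page_bytes each. */
static uint64_t pte_count(uint64_t vma_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0);
	return vma_bytes / page_bytes;
}

/* Number of order-N folios covering nr_ptes pages (2^order pages each). */
static uint64_t folio_count(uint64_t nr_ptes, unsigned int order)
{
	return nr_ptes >> order;
}

/* Relative reduction in execution time, in percent. */
static double time_reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Under the 4 KiB assumption, pte_count(4ULL << 30, 4096) gives 1048576 PTEs,
folio_count(1048576, 9) gives 2048 folios, and time_reduction_pct(7.5, 4.0)
is about 46.7, matching the "approx 47%" figure.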

Link: https://lkml.kernel.org/r/20250416053048.96479-1-dev.jain@arm.com


Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent 75404e07