Commit f90ba27e authored Jun 08, 2025 by Barry Song Committed by yipeng xiang Sep 05, 2025

BACKPORT: mm: use per_vma lock for MADV_DONTNEED

Certain madvise operations, especially MADV_DONTNEED, occur far more
frequently than other madvise options, particularly in native and Java
heaps for dynamic memory management.

Currently, the mmap_lock is always held during these operations, even when
unnecessary.  This causes lock contention and can lead to severe priority
inversion, where low-priority threads (such as Android's HeapTaskDaemon)
hold the lock and block higher-priority threads.

This patch enables the use of per-VMA locks when the advised range lies
entirely within a single VMA, avoiding the need for full VMA traversal.
In practice, userspace heaps rarely issue MADV_DONTNEED across multiple
VMAs.
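
For illustration only, the usual heap pattern can be sketched from
userspace as below.  This is a minimal sketch, not code from this patch:
the helper name dontneed_within_single_vma() is hypothetical, and it
assumes a Linux system with private anonymous memory.  It maps one region,
dirties it, and drops an interior sub-range, so the advised range lies
inside a single VMA.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the kernel patch): returns 0 if the
 * advised pages read back as zero and the untouched pages keep their
 * data, -1 on any failure.
 */
static int dontneed_within_single_vma(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	size_t len;
	int ret = -1;

	if (page <= 0)
		return -1;
	len = (size_t)page * 4;

	/* One private anonymous mapping of four pages. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Dirty every page so there is something to reclaim. */
	memset(buf, 0xAB, len);

	/* Advise pages 1 and 2 only; the range stays inside the mapping. */
	if (madvise(buf + page, (size_t)page * 2, MADV_DONTNEED) != 0)
		goto out;

	/* Private anonymous pages are zero-filled on the next access. */
	if (buf[page] != 0 || buf[3 * page - 1] != 0)
		goto out;

	/* Pages 0 and 3 were not advised and must be intact. */
	if (buf[0] != 0xAB || buf[3 * page] != 0xAB)
		goto out;

	ret = 0;
out:
	munmap(buf, len);
	return ret;
}
```

A caller would simply check that dontneed_within_single_vma() returns 0.
Which lock the kernel takes for such a call is not observable from this
code; the sketch only shows the shape of the request.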

Tangquan's testing shows that over 99.5% of memory reclaimed by Android
benefits from this per-VMA lock optimization.  After extended runtime,
217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
only 1,231 fell back to mmap_lock.

To simplify handling, the implementation falls back to the standard
mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
userfaultfd_remove().

Many thanks to Lorenzo's work[1] on "mm/madvise: support VMA read locks
for MADV_DONTNEED[_LOCKED]", which says:

"Then use this mechanism to permit VMA locking to be done later in the
madvise() logic and also to allow altering of the locking mode to permit
falling back to an mmap read lock if required."

One important point, as pointed out by Jann[2], is that
untagged_addr_remote() requires holding mmap_lock.  This is because
address tagging on x86 and RISC-V is quite complex.

Until untagged_addr_remote() becomes atomic—which seems unlikely in the
near future—we cannot support per-VMA locks for remote processes.  So
for now, only local processes are supported.
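
The conditions described in this message can be summarized as a small
decision function.  This is a userspace model under stated assumptions,
not the kernel implementation: struct vma_model, enum lock_mode and
choose_dontneed_lock() are hypothetical names that only restate the rules
above (local process, range inside one VMA, no userfaultfd on the VMA).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they do not exist in the kernel. */
enum lock_mode {
	LOCK_MODE_MMAP_LOCK,	/* fall back to the standard mmap_lock */
	LOCK_MODE_PER_VMA,	/* per-VMA lock is sufficient */
};

struct vma_model {
	unsigned long start;	/* first address covered (inclusive) */
	unsigned long end;	/* end of the VMA (exclusive) */
	bool uffd_armed;	/* userfaultfd is enabled on this VMA */
};

/*
 * Pick the lock for MADV_DONTNEED on [start, end).  "vma" is the VMA
 * found at "start", or NULL if no VMA covers that address.
 */
static enum lock_mode choose_dontneed_lock(const struct vma_model *vma,
					   unsigned long start,
					   unsigned long end,
					   bool remote)
{
	/* Remote processes need untagged_addr_remote(), hence mmap_lock. */
	if (remote)
		return LOCK_MODE_MMAP_LOCK;

	/* No VMA at the start address: let the mmap_lock path handle it. */
	if (vma == NULL)
		return LOCK_MODE_MMAP_LOCK;

	/* The range must lie entirely within this single VMA. */
	if (start < vma->start || end > vma->end)
		return LOCK_MODE_MMAP_LOCK;

	/* Avoid the complexity of userfaultfd_remove(). */
	if (vma->uffd_armed)
		return LOCK_MODE_MMAP_LOCK;

	return LOCK_MODE_PER_VMA;
}
```

For example, with a model VMA covering [0x1000, 0x5000) and no
userfaultfd, a local request for [0x2000, 0x3000) selects
LOCK_MODE_PER_VMA, while a request for [0x4000, 0x6000) crosses the VMA
end and selects LOCK_MODE_MMAP_LOCK.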

Link: https://lore.kernel.org/all/0b96ce61-a52c-4036-b5b6-5c50783db51f@lucifer.local/ [1]
Link: https://lore.kernel.org/all/CAG48ez11zi-1jicHUZtLhyoNPGGVB+ROeAJCUw48bsjk4bbEkA@mail.gmail.com/ [2]
Link: https://lkml.kernel.org/r/20250607220150.2980-1-21cnbao@gmail.com


Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 425827225
Bug: 441636500
Change-Id: I9485baaf04a09d84e89157dab9bc9185f091947d
(cherry picked from commit a6fde7ad)
[oven: Moved the changes in madvise_do_behavior out to do_madvise and
removed the changes in vector_madvise, because these functions have not
been introduced in the old kernel.  Resolved other minor conflicts as
well.  Added prev = vma; to stay consistent with upstream.]
Signed-off-by: Oven <liyangouwen1@oppo.com>

parent d16bae51

CodeLinaro @codelinaro mentioned in commit b9e7529a495a7032ce2aa2d437a5c1afd29b3b68 (Nov 05, 2025)
CodeLinaro @codelinaro mentioned in commit a5e3d02fb00385a544f110df5edc5118ceb9b304 (Nov 05, 2025)
