Commit c1ff6dcf authored by Yu Zhao, committed by Carlos Llamas

FROMLIST: BACKPORT: THP zones: the use cases of policy zones

There are three types of zones:
1. The first four zones partition the physical address space of CPU
   memory.
2. The device zone provides interoperability between CPU and device
   memory.
3. The movable zone commonly represents a memory allocation policy.

Though originally designed for memory hot removal, the movable zone is
instead widely used for other purposes, e.g., CMA and kdump kernel, on
platforms that do not support hot removal, e.g., Android and ChromeOS.
Nowadays, it is legitimately a zone independent of any physical
characteristics. In spite of being somewhat regarded as a hack,
largely due to the lack of a generic design concept for its true major
use cases (on billions of client devices), the movable zone naturally
resembles a policy (virtual) zone overlaid on the first four
(physical) zones.

This proposal formally generalizes this concept as policy zones so
that additional policies can be implemented and enforced by subsequent
zones after the movable zone. An inherited requirement of policy zones
(and the first four zones) is that subsequent zones must be able to
fall back to previous zones and therefore must add new properties to
the previous zones rather than remove existing ones from them. Also,
all properties must be known at allocation time rather than at
runtime, e.g., memory object size and mobility are valid properties
but hotness and lifetime are not.
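
The fallback requirement above can be illustrated with a minimal
userspace sketch. It is a toy model, not the kernel implementation:
the toy_* names, the property bits and the shortened zone list (one
stand-in for the physical zones, no device zone) are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical property bits, all known at allocation time. */
#define TOY_PROP_MOVABLE     (1u << 0)
#define TOY_PROP_MIN_ORDER   (1u << 1)
#define TOY_PROP_EXACT_ORDER (1u << 2)

/* Simplified zone list in fallback order, lowest zone first. */
enum toy_zone {
	TOY_ZONE_NORMAL,	/* stand-in for the physical zones */
	TOY_ZONE_MOVABLE,	/* first policy zone */
	TOY_ZONE_NOSPLIT,
	TOY_ZONE_NOMERGE,
	TOY_NR_ZONES
};

/* Each zone requires everything the previous zone requires, plus more. */
static const unsigned int toy_zone_req[TOY_NR_ZONES] = {
	[TOY_ZONE_NORMAL]  = 0,
	[TOY_ZONE_MOVABLE] = TOY_PROP_MOVABLE,
	[TOY_ZONE_NOSPLIT] = TOY_PROP_MOVABLE | TOY_PROP_MIN_ORDER,
	[TOY_ZONE_NOMERGE] = TOY_PROP_MOVABLE | TOY_PROP_MIN_ORDER |
			     TOY_PROP_EXACT_ORDER,
};

/* Highest zone whose requirements the allocation satisfies. */
static int toy_highest_zone(unsigned int props)
{
	int z;

	for (z = TOY_NR_ZONES - 1; z > 0; z--)
		if ((props & toy_zone_req[z]) == toy_zone_req[z])
			break;
	return z;
}

/* Walk down from the highest eligible zone; -1 if no zone has free memory. */
static int toy_pick_zone(unsigned int props, const bool has_free[TOY_NR_ZONES])
{
	int z;

	assert(has_free != NULL);
	for (z = toy_highest_zone(props); z >= 0; z--)
		if (has_free[z])
			return z;
	return -1;
}
```

Because the requirement sets are nested, an allocation eligible for a
zone is also eligible for every zone below it, so the downward walk
never needs to recheck properties.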

ZONE_MOVABLE becomes the first policy zone, followed by two new policy
zones:
1. ZONE_NOSPLIT, which contains pages that are movable (inherited from
   ZONE_MOVABLE) and restricted to a minimum order to resist
   fragmentation. The latter means that they cannot be split below
   that order, whether they are free or in use.
2. ZONE_NOMERGE, which contains pages that are movable and restricted
   to an exact order. The latter means that not only splitting
   (inherited from ZONE_NOSPLIT) but also merging is prohibited (see
   the reason in Chapter Three), whether they are free or in use.
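
The split and merge rules of the two new zones can be written down as
a pair of predicates. This is only a sketch under stated assumptions:
struct toy_zone, the toy_* helpers and the example orders used with
them are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_policy { TOY_MOVABLE, TOY_NOSPLIT, TOY_NOMERGE };

struct toy_zone {
	enum toy_policy policy;
	/* Minimum order for NOSPLIT, exact order for NOMERGE, unused otherwise. */
	unsigned int order;
};

/* May a block of order @from be split into blocks of order @to? */
static bool toy_can_split(const struct toy_zone *z, unsigned int from,
			  unsigned int to)
{
	assert(to < from);	/* a split always lowers the order */

	switch (z->policy) {
	case TOY_MOVABLE:
		return true;
	case TOY_NOSPLIT:
		return to >= z->order;	/* never below the minimum order */
	case TOY_NOMERGE:
		return false;		/* blocks stay at the exact order */
	}
	return false;
}

/* May two buddies of order @from merge into one block of order @from + 1? */
static bool toy_can_merge(const struct toy_zone *z, unsigned int from)
{
	(void)from;	/* the order does not matter in this toy model */
	return z->policy != TOY_NOMERGE;
}
```

For example, with a hypothetical minimum order of 4, a NOSPLIT block
of order 9 may be split into order-4 blocks but an order-4 block may
not be split further, while a NOMERGE block can be neither split nor
merged.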

Since these two zones can only serve THP allocations (__GFP_MOVABLE |
__GFP_COMP), they are called THP zones. Reclaim works seamlessly and
compaction is not needed for these two zones.
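
A minimal sketch of the eligibility check implied above follows. The
flag values are made up for illustration and are not the kernel's
__GFP_* bit assignments; toy_thp_zone_eligible() is a hypothetical
helper. The order check mirrors the backport note in this commit that
order-0 pages are not allocated from the nomerge/nosplit zones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_MOVABLE and __GFP_COMP. */
#define TOY_GFP_MOVABLE (1u << 0)
#define TOY_GFP_COMP    (1u << 1)
#define TOY_GFP_THP     (TOY_GFP_MOVABLE | TOY_GFP_COMP)

/* May an allocation with these flags and this order target a THP zone? */
static bool toy_thp_zone_eligible(unsigned int gfp, unsigned int order)
{
	assert(order < 32);	/* arbitrary sanity bound for the toy model */

	/* Both flags are required; either one alone is not enough. */
	if ((gfp & TOY_GFP_THP) != TOY_GFP_THP)
		return false;

	/* Order-0 requests stay in the lower zones. */
	return order > 0;
}
```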

Compared with the hugeTLB pool approach, THP zones tap into core MM
features including:
1. THP allocations can fall back to the lower zones, which can have
   higher latency but still succeed.
2. THPs can be either shattered (see Chapter Two) if partially
   unmapped or reclaimed if they become cold.
3. THP orders can be much smaller than the PMD/PUD orders, e.g., 64KB
   contiguous PTEs on arm64 [1], which are more suitable for client
   workloads.

Policy zones can be dynamically resized by offlining pages in one of
them and onlining those pages in another. Note that this is
only done among policy zones, not between a policy zone and a physical
zone, since resizing is a (software) policy, not a physical
characteristic.
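
The resizing rule can be sketched as a simple page-count transfer
that is refused whenever a physical zone is involved. This is a toy
model only: the zone list is shortened to one physical stand-in, and
toy_resize() and its page counters are hypothetical, not the
offlining/onlining code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_zone {
	TOY_ZONE_NORMAL,	/* stand-in for the physical zones */
	TOY_ZONE_MOVABLE,
	TOY_ZONE_NOSPLIT,
	TOY_ZONE_NOMERGE,
	TOY_NR_ZONES
};

static bool toy_is_policy_zone(enum toy_zone z)
{
	return z >= TOY_ZONE_MOVABLE && z < TOY_NR_ZONES;
}

/* Move @nr pages from @from to @to; only allowed among policy zones. */
static bool toy_resize(unsigned long pages[TOY_NR_ZONES], enum toy_zone from,
		       enum toy_zone to, unsigned long nr)
{
	assert(pages != NULL);

	if (!toy_is_policy_zone(from) || !toy_is_policy_zone(to) || from == to)
		return false;
	if (pages[from] < nr)
		return false;

	pages[from] -= nr;	/* "offline" the pages in the source zone */
	pages[to] += nr;	/* "online" them in the destination zone */
	return true;
}
```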

Implementing the same idea at pageblock granularity has also been
explored but rejected at Google. Pageblocks have a finer granularity
and therefore can be more flexible than zones. The tradeoff is that
this alternative implementation was more complex and failed to bring a
better ROI. However, the rejection was mainly due to its inability to
be smoothly extended to 1GB THPs [2], which is a planned use case of
TAO.

[1] https://lore.kernel.org/20240215103205.2607016-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/20200928175428.4110504-1-zi.yan@sent.com/



Change-Id: I7eb555541d04b16b93dea5aa0e2b329c49694a10
Signed-off-by: Yu Zhao <yuzhao@google.com>
Link: https://lore.kernel.org/r/20240229183436.4110845-2-yuzhao@google.com/


Bug: 313807618
[ Don't allocate order 0 from nomerge/nosplit zone - causes increase
  in reclaim activity ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
parent ace063af