Commit 737c09e1 authored by Giuseppe Scrivano, committed by Todd Kjos

UPSTREAM: fs, close_range: add flag CLOSE_RANGE_CLOEXEC



When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't
immediately close the files but it sets the close-on-exec bit.

It is useful for e.g. container runtimes that usually install a
seccomp profile "as late as possible" before execv'ing the container
process itself.  The container runtime could either do:
  1                                  2
- install_seccomp_profile();       - close_range(MIN_FD, MAX_INT, 0);
- close_range(MIN_FD, MAX_INT, 0); - install_seccomp_profile();
- execve(...);                     - execve(...);

Both alternatives have some disadvantages.

In the first variant the seccomp profile cannot block the close_range
syscall, nor the opendir/read/close/... calls used for the fallback on
older kernels.
In the second variant, close_range() can be used only on the fds
that are not going to be needed by the runtime anymore, and it may
have to be called multiple times to account for the different ranges
that must be closed.

Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues.
The runtime is able to use the existing open fds, the seccomp profile
can block close_range() and the syscalls used for its fallback.
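
As a rough illustration of the usage described above, the sketch below
marks every fd at or above a given number close-on-exec without closing
it. The helper name mark_cloexec_from() is hypothetical and not part of
this patch. The raw syscall is used on the assumption that the libc may
not provide a close_range() wrapper, and the fallback definition of
CLOSE_RANGE_CLOEXEC assumes the value from the uapi header.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for older userspace headers; assumed to match the value in
 * include/uapi/linux/close_range.h. */
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Hypothetical helper (not part of the patch): set the close-on-exec bit
 * on every open fd >= min_fd while leaving the fds open and usable.
 * Returns 0 on success or a negative errno value, e.g. -ENOSYS when the
 * kernel has no close_range() and -EINVAL when it does not know the flag.
 */
static int mark_cloexec_from(unsigned int min_fd)
{
#ifdef SYS_close_range
	if (syscall(SYS_close_range, min_fd, UINT_MAX, CLOSE_RANGE_CLOEXEC) == 0)
		return 0;
	return -errno;
#else
	/* Build-time headers predate close_range(). */
	return -ENOSYS;
#endif
}
```

A runtime could then call mark_cloexec_from(3) first, keep using its
open fds, install the seccomp profile, and finally call execve(), at
which point the kernel closes the marked fds.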

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Link: https://lore.kernel.org/r/20201118104746.873084-2-gscrivan@redhat.com


Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
(cherry picked from commit 582f1fb6)
Bug: 216276716
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ib2d44f9760a80e3febdb25925d17ce5ffd14910e
parent a185c0f1