-
Notifications
You must be signed in to change notification settings - Fork 18.7k
daemon: archive: pause containers before doing filesystem operations #39252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is still a WIP since I still need to test that my reproducer no longer triggers the issue. |
95531f7
to
101c2d6
Compare
8fbdb8c
to
88799f6
Compare
78377ce
to
fede627
Compare
Codecov Report
@@ Coverage Diff @@
## master #39252 +/- ##
==========================================
+ Coverage 37.05% 37.08% +0.02%
==========================================
Files 612 612
Lines 45478 45584 +106
==========================================
+ Hits 16853 16904 +51
- Misses 26341 26392 +51
- Partials 2284 2288 +4 |
fede627
to
e9cac66
Compare
Wondering if we need a
(could be either |
Since this is a security fix I'm not sure how good of an idea it is to allow disabling it -- the attack scripts I have allow attackers to read and write to anywhere on the host filesystem as root. Adding footguns for their own sake feels like a bad idea.
|
Yes 😞 definitely understood; I can see this being a disruptive change though (as in; users that would currently would now seeing their container going offline). Surely, not really best practice if you need the actual container to get important data out 🤷♂, but still. |
I can do some more testing to see how bad the impact is, but in theory we're talking about hundreds of milliseconds of downtime -- EDIT: Actually no, it could be pretty bad if you're copying large files... |
Oh, I misunderstood, and thought the container would be paused during the whole copy operation 😊 😅. Should've taken the time to go through the whole flow. Sorry for that! Jumping between conversations 😊 |
No sorry, you are right -- it does apply to the entire copy operation. I forgot people copy pretty large files into and out of containers. Hmm, maybe we do need an off switch for this... |
Pause doesnt work with rootless, any workaround? |
For rootless we could just not do the pausing. As I mention in the commit message, this is a best-effort protection because (after spending several weeks working on it) I concluded it's effectively impossible to implement the actual protection necessary to defend against the attack in Docker. You would need to rewrite too many interfaces and too much glue code that it's not really doable anymore. |
Derek add label: rebuild/janky |
How come we aren't using chrootarchive here? |
@cpuguy83 At the very end of the chain, |
There are certain classes of attacks (as evidenced in CVE-2018-15664) which are caused by our allowing container processes to be executing while we are doing filesystem operations on the container. In particular, there are trivial TOCTOU races in symlink resolution and scoping that can be exploited[1]. The most complete solution to this problem would be to modify chrootarchive so that all of the archive operations occur with the root as the container rootfs (and not the parent directory, which is what causes the vulnerability since the parent is attacker-controlled). Unfortunately, changes to this core piece of Docker are almost impossible (the TarUntar interface has many copies and reimplementations that would all need to be modified to be able to handle a new "root" argument). So, we instead settle for the next-best option which is to pause the container during our usage of the filesystem. This is far from an ideal solution (you can image some attack scenarios such as shared volume mounts) where this is ineffectual but it does block the most basic attack. I am currently working on some new kernel functionality that would allow for much safer resolution of paths inside untrusted roots[2], but as above it would be difficult to modify Docker to use it. I am working on adding support to filepath-securejoin[3] though (however this will require quite a few inteface changes). [1]: https://bugzilla.suse.com/show_bug.cgi?id=1096726 [2]: https://lwn.net/Articles/767547/ [outdated API] [3]: https://github.com/cyphar/filepath-securejoin Fixes: CVE-2018-15664 Signed-off-by: Aleksa Sarai <asarai@suse.de>
Thinking we could add a flag to set the daemon level default ( Or alternatively we could add a flag which requires the user to set pause on cp ( I will be happy to look at updating interfaces to make this better. |
Just a thought, we could take a temporary snapshot on the fs if the container is running (the /cc @dmcgowan |
e9cac66
to
6227d8e
Compare
Are there real use cases people really need I am just saying if that is too complicated to fix and is too dangerous to leave, I would opt for having some flag like |
@arno01 it is hard to know how people are using |
I am afraid that pausing (SIGSTOP) containers is also a breaking change as it will lead to the error prone situations such as when someone is WDYT? |
I have implemented chrootarchive within a specified root (the container's root) here: #39292 The change itself is fairly isolated (2 new functions in chrootarchive). |
I'm closing this PR in favour of that one. @cpuguy83 I should've been more specific -- the difficulty I had was in trying to fix and restructure all the users of FollowSymlinkInScope. |
(Note: This PR was made public after discussions with the Docker security team, if you find a security vulnerability please report it directly to security@docker.com.)
There are certain classes of attacks (as evidenced in CVE-2018-15664)
which are caused by our allowing container processes to be executing
while we are doing filesystem operations on the container. In
particular, there are trivial TOCTOU races in symlink resolution and
scoping that can be exploited.
The most complete solution to this problem would be to modify
chrootarchive so that all of the archive operations occur with the root
as the container rootfs (and not the parent directory, which is what
causes the vulnerability since the parent is attacker-controlled).
Unfortunately, changes to this core piece of Docker are almost
impossible (the TarUntar interface has many copies and reimplementations
that would all need to be modified to be able to handle a new "root"
argument).
So, we instead settle for the next-best option which is to pause the
container during our usage of the filesystem. This is far from an ideal
solution (you can image some attack scenarios such as shared volume
mounts) where this is ineffectual but it does block the most basic
attack.
I am currently working on some new kernel functionality that would allow
for much safer resolution of paths inside untrusted roots, but as
above it would be difficult to modify Docker to use it. I am working on
adding support to filepath-securejoin though (however this will
require quite a few inteface changes).
Fixes: CVE-2018-15664
Signed-off-by: Aleksa Sarai asarai@suse.de