Skip to content

Fix overlay2 storage driver inside a user namespace #35794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 16, 2017

Conversation

estesp
Copy link
Contributor

@estesp estesp commented Dec 14, 2017

The overlay2 driver was not setting up the archive.TarOptions field
for userns properly, like other storage backend routes to "applyTarLayer"
functionality are. The InUserNS field is now populated for overlay2 using
the same query function used by the other storage drivers.

Signed-off-by: Phil Estes estesp@linux.vnet.ibm.com

Fixes: #35245

The overlay2 driver was not setting up the archive.TarOptions field
properly like other storage backend routes to "applyTarLayer"
functionality. The InUserNS field is populated now for overlay2 using
the same query function used by the other storage drivers.

Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
@thaJeztah
Copy link
Member

Would it be able to add a test to this?

@kolyshkin
Copy link
Contributor

kolyshkin commented Dec 14, 2017

@estesp can you please add a test case?

Also, speaking of other storage drivers, I see that

  1. overlay isusing func (g *NaiveDiffDriver) ApplyDiff() which passes archive.TarOptions already prepopulated to chrootarchive.ApplyUncompressedLayer() which calls chrootarchive.applyLayerHandler() which has the following code:
        if options == nil {
                options = &archive.TarOptions{}
                if rsystem.RunningInUserNS() {
                        options.InUserNS = true
                }
        }

So, it only sets options.InUserNS if options == nil which is not the case when it is used from NaiveDiffDriver's ApplyDiff.

  1. devmapper, btrfs, zfs and vfs are all using NaiveDiffDriver's implementation too, so they all suffer from the same issue.

  2. aufs is not supported inside userns.

So, based on this quick glance, it needs to be fixed as well (and a generic graphdriver test case would help to reveal the bug). The fix would be something like this:

diff --git a/daemon/graphdriver/fsdiff.go b/daemon/graphdriver/fsdiff.go
index 533c5a769..5e98d980a 100644
--- a/daemon/graphdriver/fsdiff.go
+++ b/daemon/graphdriver/fsdiff.go
@@ -8,6 +8,7 @@ import (
        "github.com/docker/docker/pkg/chrootarchive"
        "github.com/docker/docker/pkg/idtools"
        "github.com/docker/docker/pkg/ioutils"
+       rsystem "github.com/opencontainers/runc/libcontainer/system"
        "github.com/sirupsen/logrus"
 )
 
@@ -142,8 +143,11 @@ func (gdw *NaiveDiffDriver) ApplyDiff(id, parent string, diff io.Reader) (size i
        defer driver.Put(id)
 
        layerFs := layerRootFs.Path()
-       options := &archive.TarOptions{UIDMaps: gdw.uidMaps,
-               GIDMaps: gdw.gidMaps}
+       options := &archive.TarOptions{
+               UIDMaps:  gdw.uidMaps,
+               GIDMaps:  gdw.gidMaps,
+               InUserNS: rsystem.RunningInUserNS(),
+       }
        start := time.Now().UTC()
        logrus.Debug("Start untar layer")
        if size, err = ApplyUncompressedLayer(layerFs, diff, options); err != nil {

@estesp
Copy link
Contributor Author

estesp commented Dec 14, 2017

A testcase that tests the RunningInUserNS code path is difficult as it requires a Docker engine running "containerized" inside a user namespace enabled container. I do that locally via LXC, and I assume it should be possible with some form of DinD setup, but I gave up last time I tried to get the inner Docker engine to start properly. The LXC team does do nightly validation that moby/moby hasn't broken running in a user namespace, but apparently aren't using the overlay2 driver or they probably would have caught this.

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

If @kolyshkin or @thaJeztah have ideas on how to test this with our CI setup let me know. I definitely tried when this support initially came in via @hallyn's PR but found the dependencies (e.g. run via LXC) made it not so simple to add to our current framework.

@brauner
Copy link
Contributor

brauner commented Dec 14, 2017

@brauner is there a public place to see the daily CI for LXC+Docker/moby daemon details?

@estesp, yeah, sure it's at https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/
you can then click on the build number and select privileged or unprivileged, e.g.:

https://jenkins.linuxcontainers.org/view/LXD/view/LXD%20tests/job/lxd-test-docker/280/arch=amd64,mode=unprivileged,restrict=lxd-priv/console

@stgraber
Copy link

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

@kolyshkin
Copy link
Contributor

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

@brauner
Copy link
Contributor

brauner commented Dec 14, 2017

I still don't understand how the other graphdrivers work inside userns (if my analysis are correct and they do not set options.InUserNS=true).

Should always fallback to the vfs driver if other drivers don't work, no?

@brauner
Copy link
Contributor

brauner commented Dec 14, 2017

That's at least how I would've implemented it.

@estesp
Copy link
Contributor Author

estesp commented Dec 14, 2017

@kolyshkin I think you are correct that a more airtight fix across drivers needs more investigation. To your question "how does it work anywhere?" I think there is some complexity to the layers at which inUserNs checks are being applied, including at the tarfile apply "write a single tar entry out to file" step (where yet another check exists), so that may be part of the answer. I think one of us needs to more deeply review this whole flow of options object, inuserns checks, and that interaction with the various graphdrivers.

This PR as it stands may only be a stopgap to fix one glaring issue, but maybe we want to clean up this overall impl across graphdrivers?

@estesp
Copy link
Contributor Author

estesp commented Dec 15, 2017

@kolyshkin I don't have the exact reasoning down to a codepath, but just FYI, vfs and overlay work fine with today's code; only overlay2 fails and is fixed by this PR.

I can't test devicemapper and aufs as both are not supported inside a user namespace. Will try btrfs and zfs if possible.

@kolyshkin
Copy link
Contributor

And the actual test script is https://github.com/lxc/lxc-ci/blob/master/bin/test-lxd-docker it's pretty crude but it does the job :)

Thank you @stgraber, I sent a trivial PR to improve it a bit: lxc/lxc-ci#18

In particular, I am interested in what graph driver is used (docker info shows it).

@estesp
Copy link
Contributor Author

estesp commented Dec 15, 2017

Just FYI: on my lxc ubuntu:16.04 instance, using the "standard" Docker CE install (apt-add repo path), I end up with overlay2 as the default today.

@kolyshkin
Copy link
Contributor

Should always fallback to the vfs driver if other drivers don't work, no?

True, but the amount of checks a graph driver can do during its initialization to see whether it can work correctly is limited (in other words, we can not afford trying to create, mount and start a multi-layer container during every graph driver init to see if it really works). So, we do some basic checks like if the fs is supported, if needed binaries are there, if the kernel is sufficient etc, and the check if gd is working inside user ns is not performed.

@kolyshkin
Copy link
Contributor

Hmm, as per #35245, overlay is working inside userns while overlay2 does not. While the fix looks correct to me, I still fail to understand how overlay is working, as it looks like it needs a very similar fix (see #35794 (comment)).

I will do some tests.

@kolyshkin
Copy link
Contributor

OK I found why overlay works: applyLayer sets the option by itself:

inUserns := rsystem.RunningInUserNS()

...
if inUserns {
options.InUserNS = true
}

Now it's good to figure out why overlay2 needs this option explicitly set.

@estesp
Copy link
Contributor Author

estesp commented Dec 15, 2017

@kolyshkin

Now it's good to figure out why overlay2 needs this option explicitly set.

Because for whatever reason overlay2 doesn't go through applyLayer at all. It links up it's archive untar to chrootarchive.UntarUncompressed which ends up at the docker-untar reexec, not the docker-applyLayer reexec as overlay does.

@kolyshkin
Copy link
Contributor

OK, it's all clear to me now.

Overlay call stack is:

  • overlay.ApplyDiff() calls
  • chrootarchive.ApplyUncompressedLayer() calls
  • chrootarchive.applyLayerHandler() (which sets options.InUserNS in case options == nil) calls
  • chrootarchive.applyLayer() (which sets options.InUserNS even if it was set before) calls
  • archive.UnpackLayer() calls
  • archive.createTarFile() for every file found in a tar stream

Overlay2 call stack is:

  • overlay2.ApplyDiff() calls
  • chrootarchive.UntarUncompressed() calls
  • chrootarchive.untarHandler() calls
  • chrootarchive.invokeUnpack() calls
  • chrootarchive.untar() calls
  • archive.Unpack() calls
  • archive.createTarFile() for every file found in a tar stream

So, overlay works because its ApplyDiff() call stack involves chrootarchive.applyLayer() which sets InUserNS, and overlay2 didn't work because its ApplyDiff() doesn't have that.

So, this patch LGTM.

As a side note, it's interesting why code paths differ so much, and what is the difference between these.

@dmcgowan
Copy link
Member

@kolyshkin they differ because ovleray2 does not use the naive driver. The naive driver mounts, then applies the tar including doing removes for whiteouts. Overlay2 skips the mounts and just directly unpacks, transforming the aufs style whiteouts from the tar to overlayfs whiteouts. The main reason for using different codes paths was both performance, and to resolve a bug in naive driver that would show up with some of the faster graph drivers. IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

Copy link
Member

@dmcgowan dmcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

IMO performance should be secondary and we should just have overlay2 use naive diff driver in all cases.

@kolyshkin would that be something you could work on?

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit 0862014 into moby:master Dec 16, 2017
@kolyshkin
Copy link
Contributor

and to resolve a bug in naive driver that would show up with some of the faster graph drivers

@dmcgowan was/is it related to lack of sub-second time-stamps and inability to distinguish between changed files because of that?

@dmcgowan
Copy link
Member

@kolyshkin yes, that one. There is a work around for it now that fixes the builder issue.

@estesp estesp deleted the fix-overlay2-untarinuserns branch December 21, 2017 04:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants