
Fix "duplicate mount point" when --tmpfs /dev/shm is used #35467


Merged (2 commits) on Nov 22, 2017

Conversation

kolyshkin
Contributor

This is a fix for #35455, i.e.:

$ docker run --tmpfs /dev/shm busybox sh
docker: Error response from daemon: linux mounts: Duplicate mount point '/dev/shm'.

In the current code (daemon.createSpec()), the tmpfs mount from --tmpfs is
already in the list of mounts (ms) when the mount from IpcMounts() is added.
While IpcMounts() checks for existing mounts first, it does so via the
container.HasMountFor() function, which only checks container.Mounts
but not container.Tmpfs.

Ultimately, the solution is to get rid of container.Tmpfs (moving its
data into container.Mounts). The current workaround is to add a check of
container.Tmpfs to container.HasMountFor().

A unit test case is included.

Unfortunately, we can't call daemon.createSpec() from a unit test,
as the code relies heavily on various daemon structures being properly
initialized, which is hard to achieve in a test. Therefore, we minimally
mimic the code flow of daemon.createSpec() -- just enough to reproduce
the issue.
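The workaround can be sketched roughly as follows. The types here are hypothetical stand-ins, trimmed to just what the check needs; in the actual daemon the tmpfs map lives on the container's HostConfig, and the real function differs in detail:

```go
package main

import "fmt"

// Minimal stand-ins for the daemon's container types (illustrative only).
type MountPoint struct{ Destination string }

type Container struct {
	MountPoints map[string]*MountPoint // from -v / --mount
	Tmpfs       map[string]string      // from --tmpfs: destination -> options
}

// HasMountFor reports whether a mount already exists at path.
// The workaround described above: consult Tmpfs as well as MountPoints,
// so IpcMounts() does not add a second /dev/shm mount.
func (c *Container) HasMountFor(path string) bool {
	if _, ok := c.MountPoints[path]; ok {
		return true
	}
	_, ok := c.Tmpfs[path]
	return ok
}

func main() {
	c := &Container{
		MountPoints: map[string]*MountPoint{},
		Tmpfs:       map[string]string{"/dev/shm": "rw,size=1g"},
	}
	fmt.Println(c.HasMountFor("/dev/shm")) // tmpfs now counts as an existing mount
	fmt.Println(c.HasMountFor("/tmp"))
}
```

With this check in place, IpcMounts() sees the user's --tmpfs /dev/shm and skips adding its own /dev/shm mount, avoiding the duplicate.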

@tonistiigi
Member

LGTM

@kolyshkin
Contributor Author

The follow-up to this is to make the client use the mount API for --tmpfs if it's available from the engine.

@thaJeztah
Member

Trying some of the examples mentioned in #6758 (comment): the exec option is applied, but it looks like the size option is still ignored when this patch is applied:

$ docker run -it --rm alpine sh
/ # mount | grep shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)

$ docker run -it --rm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine sh
/ # mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime,size=65536k)


$ docker run -it --rm -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine sh
/ # mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime,size=65536k)

I'm testing this inside the dev-container, so the --ipc=none workaround cannot be used (due to the Docker 17.06 CLI performing validation in the client)

@kolyshkin
Contributor Author

but it looks like the size option is still ignored when this patch is applied

Honoring the size= option is the subject of #35316; if you don't have the commits from that PR applied, any tmpfs will use the default size.

I will check it, too.

@thaJeztah
Member

Oh, right; I assumed this was rebased. Let me try rebasing on master and see if it works.

The code in question looks up mounts two times: first by using
HasMountFor(), and then directly by looking in container.MountPoints.
There is no need to do it twice.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
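The simplification in that commit can be illustrated with a runnable sketch (minimal stand-in types, not the daemon code):

```go
package main

import "fmt"

// Minimal stand-ins for the daemon's types (illustrative only).
type MountPoint struct{ Name string }

type Container struct {
	MountPoints map[string]*MountPoint
}

func (c *Container) HasMountFor(path string) bool {
	_, ok := c.MountPoints[path]
	return ok
}

func main() {
	c := &Container{MountPoints: map[string]*MountPoint{"/dev/shm": {Name: "shm"}}}

	// Before: two lookups for the same mount point.
	if c.HasMountFor("/dev/shm") {
		mp := c.MountPoints["/dev/shm"] // second, redundant lookup
		fmt.Println("found:", mp.Name)
	}

	// After: one direct comma-ok lookup gives both the existence check
	// and the value.
	if mp, ok := c.MountPoints["/dev/shm"]; ok {
		fmt.Println("found:", mp.Name)
	}
}
```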
This is a fix to the following issue:

$ docker run --tmpfs /dev/shm busybox sh
docker: Error response from daemon: linux mounts: Duplicate mount point '/dev/shm'.

In the current code (daemon.createSpec()), the tmpfs mount from --tmpfs is
already in the list of mounts (`ms`) when the mount from IpcMounts() is added.
While IpcMounts() checks for existing mounts first, it does so via the
container.HasMountFor() function, which only checks container.Mounts
but not container.Tmpfs.

Ultimately, the solution is to get rid of container.Tmpfs (moving its
data into container.Mounts). The current workaround is to add a check of
container.Tmpfs to container.HasMountFor().

A unit test case is included.

Unfortunately, we can't call daemon.createSpec() from a unit test,
as the code relies heavily on various daemon structures being properly
initialized, which is hard to achieve in a test. Therefore, we minimally
mimic the code flow of daemon.createSpec() -- just enough to reproduce
the issue.

moby#35455

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Contributor Author

I have rebased it and ran some manual tests:

# docker run -it --rm alpine mount | grep shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)

# docker run -it --rm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime,size=1048576k)

# docker run -it --rm -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime,size=1048576k)

# docker run -it --rm -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec alpine mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime)

# docker run -it --rm alpine df -h /dev/shm
Filesystem                Size      Used Available Use% Mounted on
shm                      64.0M         0     64.0M   0% /dev/shm

# docker run -it --rm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine  df -h /dev/shm
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     1.0G         0      1.0G   0% /dev/shm

# docker run -it --rm -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=1g alpine df -h /dev/shm
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     1.0G         0      1.0G   0% /dev/shm

# docker run -it --rm -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec alpine df -h /dev/shm
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     3.8G         0      3.8G   0% /dev/shm

Note the last one is interesting -- if tmpfs size is not set, the in-kernel default is used (half the RAM as per kernel Documentation/filesystems/tmpfs.txt).
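As an aside, the size=1g passed on the command line above shows up as size=1048576k in the mount output. A tiny illustrative helper (not daemon code) shows the suffix conversion at work:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeToKiB converts a tmpfs size value with a k/m/g suffix to KiB,
// mirroring how a size ends up reported in mount output
// (1g -> 1048576k, 64m -> 65536k). Illustrative helper only.
func sizeToKiB(s string) (int64, error) {
	mult := int64(1) // no suffix: treat the value as KiB here
	switch {
	case strings.HasSuffix(s, "g"):
		mult = 1024 * 1024
		s = strings.TrimSuffix(s, "g")
	case strings.HasSuffix(s, "m"):
		mult = 1024
		s = strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "k"):
		s = strings.TrimSuffix(s, "k")
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	kib, _ := sizeToKiB("1g")
	fmt.Printf("size=%dk\n", kib) // matches the size=1048576k seen above
}
```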

@thaJeztah
Member

I have rebased it and ran some manual tests:

Thanks! That was indeed the reason my earlier testing didn't work 😊

Note the last one is interesting -- if tmpfs size is not set, the in-kernel default is used (half the RAM as per kernel Documentation/filesystems/tmpfs.txt).

Yes, I also noticed that while testing; I think we should apply the default size if no size was provided, because even the most minimal case (e.g., --tmpfs=/dev/shm:rw, where :rw is already the default) causes the default size to get lost.

wdyt?
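A minimal sketch of what "apply the default size if none was provided" could look like. This is a hypothetical helper, not what the PR implements (the discussion below settles on leaving the user's options exactly as given):

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultSize appends the daemon's default shm size to a tmpfs option
// string if the user did not specify size= themselves. Hypothetical helper
// for illustration; not moby code.
func withDefaultSize(opts, defaultSize string) string {
	for _, o := range strings.Split(opts, ",") {
		if strings.HasPrefix(o, "size=") {
			return opts // the user already chose a size; leave it alone
		}
	}
	return opts + ",size=" + defaultSize
}

func main() {
	fmt.Println(withDefaultSize("rw,nosuid,nodev,exec", "65536k"))
	fmt.Println(withDefaultSize("rw,nosuid,size=1g", "65536k"))
}
```

With such a helper, --tmpfs /dev/shm:rw would keep the familiar 64M default instead of silently falling back to the kernel's RAM/2.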

@thaJeztah
Member

Hm, there's actually some prior discussion on this for tmpfs (in general, not just /dev/shm): #23398, #22420, and #22438

So, hm, I don't know what the behaviour should be

ping @justincormack as well 😄

@kolyshkin
Contributor Author

I'm not sure what the right thing to do would be. The default size we have is for the default /dev/shm created by dockerd, while --tmpfs /dev/shm is definitely not the default one.

Anyway, by looking at git commits I see there was one that added the default size:

That PR refers to issue #23394. The issue is that, with /tmp being 64M, things get broken. Interestingly, I tried running centos, fedora, and ubuntu, and there is no mount for /tmp. I saw that the issue mentions DCOS, and indeed it does create a mount for /tmp (https://github.com/dcos/dcos-docker/blob/2533532432d84199960bc4a72b76fa78796b8d3a/Makefile#L54), albeit with its size specified (as per commit mesosphere-backup/dcos-docker@4a1058b), as they probably hit the issue.

Anyway, my understanding is that some people rely on a --tmpfs of unspecified size being big enough.

Our options here are in abundance:

  1. Leave it as is, i.e. up to the kernel, which currently makes it RAM/2;
  2. Limit it to hard-coded 64M;
  3. Limit it to user-specified shm-size (which defaults to 64M and can be overridden on a per-daemon or a per-container basis);
  4. Limit it to half the container's RAM (if --memory is specified);
  5. Any clever combination of the above.

Based on my experience with OpenVZ, having restrictive options by default (as in 2 and 3 above) is good and bad at the same time.

Being restrictive by default is good because:

  • it makes the system more secure overall (less chance of DoS initiated from a container)
  • it makes users think about simple questions like "how much RAM/diskspace/whatever do I actually need for this container?", makes them learn some resource management, and ultimately better know and control their systems.

Being restrictive by default is bad because:

  • manual configuration and better system knowledge is required (in other words, steeper learning curve);
  • oftentimes, users hit the (restrictive) limits and just give up (in other words, the software as a whole doesn't work for them).

Another source of wisdom is UNIX; unfortunately, it sends us mixed signals, from "Unix is user-friendly; it's just choosy about who its friends are" (applied to this case, that would mean "be restrictive by default and let users figure out all the knobs and dials") to "UNIX gives its users many ways to shoot themselves in the foot" (applied to this case, that would mean "don't impose any restrictions").

Another way to look at it is "what do we usually do in such cases". Historically, Docker imposes no limits by default (say, RAM and disk space are not limited by default, although perhaps it happened that way because there were no means to control such resources at the time). This is probably what users expect.

To sum it up: while I would personally prefer limiting, I think leaving it as is would be more appropriate. Option 4, limiting it to half the container's RAM (if configured), is also sensible, but I guess that is better done in the kernel, which means it's the same as option 1.

@thaJeztah
Member

Thanks for summarising the options, @kolyshkin!

I was briefly discussing this with @justincormack yesterday, and although we set defaults for /dev/shm, we do not set any defaults for custom tmpfs (--tmpfs) mounts;

$ docker run --rm --tmpfs /foo busybox sh -c 'mount | grep shm'
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
$ docker run --rm --tmpfs /foo busybox sh -c 'mount | grep foo'
tmpfs on /foo type tmpfs (rw,nosuid,nodev,noexec,relatime)

So, even though --tmpfs /dev/shm happens to overwrite the default /dev/shm that was set up in the container, from Moby's perspective it's just another tmpfs, and it's the user's responsibility to set the desired options (i.e., we should not be making assumptions).

Based on the above, I consider the current implementation to be correct.

What we do have to do is:

  • Update the documentation;
  • Related, but important:
    • Describe how --shm-size, --tmpfs, --mount, --volume, and --device interact (in what order are they applied / which one takes precedence)
    • Consider combining above options with conflicting paths to be invalid (may be complicated to check)
    • Consider marking (some of) the above options to be deprecated
    • Decide if we really want to add --shm-size to services, or use --mount as the real solution (Allow setting of shm-size for docker service create #26714)

@thaJeztah
Member

LGTM

5 participants