projectquota: protect concurrent map access (ENGCORE-920) #39644


Merged: 1 commit merged into moby:master from kolyshkin:quota-map on Aug 1, 2019

Conversation

@kolyshkin (Contributor) commented on Jul 31, 2019:

fixes #39643

The bug was always there, hidden by the global lock (removed in #39135, which opened a can of worms fixed in a few follow-up PRs, such as #39209).
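
In outline, the fix is the classic Go pattern of guarding a map with a mutex. A minimal sketch, assuming the field names from the diff hunks quoted below (not the verbatim merged code):

package quota

import "sync"

// Control holds project-quota state. The embedded RWMutex guards the
// quotas map and, per the review below, nextProjectID as well.
type Control struct {
	backingFsBlockDev string
	nextProjectID     uint32
	sync.RWMutex      // protect quotas map
	quotas            map[string]uint32
}

// Read path: take the read lock around every map lookup.
func (q *Control) getProjectID(targetPath string) (uint32, bool) {
	q.RLock()
	id, ok := q.quotas[targetPath]
	q.RUnlock()
	return id, ok
}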

@kolyshkin (Contributor, Author):

@amir73il PTAL

	projectID, ok := q.quotas[targetPath]
	q.Unlock()
	if !ok {
		projectID = q.nextProjectID
Member:

Does this need a lock too?

@kolyshkin (Contributor, Author):

Looks like it... the code was written without concurrency in mind.

@kolyshkin (Contributor, Author):

maybe we could use atomics for this one

@kolyshkin (Contributor, Author):

yeah, it's easier to reuse the same lock
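
For reference, the atomics idea floated above would have looked roughly like this, reusing the Control type from the sketch near the top (hypothetical; the PR reuses the mutex instead):

package quota

import "sync/atomic"

// Hypothetical alternative (not what the PR does): hand out project IDs
// with an atomic counter. atomic.AddUint32 returns the post-increment
// value, so the ID handed out is that value minus one.
func (q *Control) allocProjectID() uint32 {
	return atomic.AddUint32(&q.nextProjectID, 1) - 1
}

Since the map insertion right next to the increment needs the mutex anyway, an atomic would only add a second synchronization mechanism to reason about; holding the one lock for both operations is the simpler choice.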

@cpuguy83 (Member):

Seems like there's some other potential for data races around project IDs.
I also wonder if we need some locking around the syscalls, at least per ID or target path?

@thaJeztah (Member):

Test is failing:

18:23:59 fatal error: sync: Unlock of unlocked RWMutex
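
The interim broken code isn't quoted in the thread, but the classic way to hit this fatal error while adding locking is an early Unlock in one branch plus an unconditional Unlock at the end. A hypothetical sketch of that shape, again on the Control type from above:

// Buggy shape: the !ok path unlocks twice, and the Go runtime aborts
// with "fatal error: sync: Unlock of unlocked RWMutex".
func (q *Control) setProjectIDBuggy(targetPath string) uint32 {
	q.Lock()
	id, ok := q.quotas[targetPath]
	if !ok {
		id = q.nextProjectID
		q.quotas[targetPath] = id
		q.nextProjectID++
		q.Unlock() // first Unlock
	}
	q.Unlock() // second Unlock when !ok: fatal error
	return id
}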

@@ -13,4 +14,5 @@ type Control struct {
 	backingFsBlockDev string
 	nextProjectID uint32
 	quotas map[string]uint32
+	sync.RWMutex // protect quotas map
Member:

can you move this above quotas? I think the convention is to put the mutex above the fields it is protecting

@kolyshkin (Contributor, Author):

moved

	q.quotas[targetPath] = projectID
	q.Unlock()
	q.nextProjectID++
Member:

should nextProjectID also be inside the lock?

Member:

oh, never mind, I see Brian's comment above
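
For contrast, the fixed shape keeps the lookup, the map write, and the counter bump in one critical section. A sketch based on the snippets quoted in this review (the merged code also issues the quota syscalls, omitted here):

func (q *Control) lookupOrAllocProjectID(targetPath string) uint32 {
	q.Lock()
	defer q.Unlock()
	id, ok := q.quotas[targetPath]
	if !ok {
		id = q.nextProjectID
		q.quotas[targetPath] = id
		q.nextProjectID++
	}
	return id
}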

@kolyshkin force-pushed the quota-map branch 4 times, most recently from b6421df to 4fbe160, on July 31, 2019 at 21:01
@kolyshkin (Contributor, Author):

> Seems like there's some other potential for data races around project IDs.

@cpuguy83 I think I have handled it (now in the latest version).

> I also wonder if we need some locking around the syscalls, at least per ID or target path?

Are you assuming it might be unsafe? I took a look at the quotactl(2) man page and can't find anything hinting that this might be the case. Perhaps @amir73il can chime in?
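
If serializing the quotactl calls per target path ever turned out to be necessary, one common shape is a keyed mutex. This is purely illustrative; nothing in this PR adds it:

package quota

import "sync"

// pathLocks hands out one mutex per target path, so syscalls for
// different paths run in parallel while calls for the same path are
// serialized.
type pathLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (p *pathLocks) forPath(path string) *sync.Mutex {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.locks == nil {
		p.locks = make(map[string]*sync.Mutex)
	}
	l, ok := p.locks[path]
	if !ok {
		l = &sync.Mutex{}
		p.locks[path] = l
	}
	return l
}

A caller would then take locks.forPath(targetPath).Lock() around the syscall and Unlock() afterwards.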

@thaJeztah (Member) left a review:

SGTM

@@ -1,6 +1,7 @@
 // +build linux
 
 package quota // import "github.com/docker/docker/daemon/graphdriver/quota"
+import "sync"
Member:

I'm surprised this passed the gofmt check.
Can we add a blank line above the import?

@kolyshkin (Contributor, Author):

Sure.

I am surprised as well -- it was auto-inserted by goimports, which is a superset of gofmt -- maybe it's a corner case they don't handle; I'll take a look.

@kolyshkin (Contributor, Author) commented Aug 1, 2019 (commit message):

Protect access to q.quotas map, and lock around changing nextProjectID.

Technically, the lock in findNextProjectID() is not needed, as it is only called during initialization, but one can never be too careful.

Fixes: 52897d1 ("projectquota: utility class for project quota controls")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@andrewhsu (Member):

I see an old test here related to --storage-opt:

// Make sure we can grow the container's rootfs at creation time.
func (s *DockerSuite) TestCreateGrowRootfs(c *check.C) {
	// Windows and Devicemapper support growing the rootfs
	if testEnv.OSType != "windows" {
		testRequires(c, Devicemapper)
	}
	out, _ := dockerCmd(c, "create", "--storage-opt", "size=120G", "busybox")
	cleanedContainerID := strings.TrimSpace(out)
	inspectOut := inspectField(c, cleanedContainerID, "HostConfig.StorageOpt")
	c.Assert(inspectOut, checker.Equals, "map[size:120G]")
}

But the test is only run on devicemapper. Perhaps we should change it to allow it to run on overlay2 with XFS?

@cpuguy83 (Member) left a review:

LGTM

@cpuguy83 merged commit 41040b7 into moby:master on Aug 1, 2019
@kolyshkin changed the title from "projectquota: protect concurrent map access" to "projectquota: protect concurrent map access (ENGCORE-920)" on Aug 1, 2019
@kolyshkin (Contributor, Author):

Some follow-ups.

Tried to repro this, and apparently projectquota is not working on the latest CentOS 7, due to EACCES from the kernel -- apparently it does not like that we're using a self-made device. I did some experiments; here is the gist of the results:

[root@kir-ce75-gd quota]# strace -e quotactl ./quota.test -test.run M
quotactl(Q_XGETQSTAT|PRJQUOTA, "/dev/mapper/docker-lvdocker", 0, {version=1, ...}) = 0
quotactl(Q_XGETQSTAT|PRJQUOTA, "/var/lib/docker/overlay2/backingFsBlockDev", 0, 0xc000054ee8) = -1 EACCES (Permission denied)
PASS
+++ exited with 0 +++
[root@kir-ce75-gd quota]# ls -l /var/lib/docker/overlay2/backingFsBlockDev
brw-------. 1 root root 253, 0 Aug  1 20:13 /var/lib/docker/overlay2/backingFsBlockDev
[root@kir-ce75-gd quota]# ls -l /dev/mapper/docker-lvdocker
lrwxrwxrwx. 1 root root 7 Aug  1 20:03 /dev/mapper/docker-lvdocker -> ../dm-0
[root@kir-ce75-gd quota]# ls -l /dev/dm-0 
brw-rw----. 1 root disk 253, 0 Aug  1 20:03 /dev/dm-0

So, /var/lib/docker/overlay2/backingFsBlockDev is identical to /dev/dm-0, but the kernel returns an error for the former and success for the latter.
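
For context on the "self-made device": the quota package creates a block device node with the same major:minor as the filesystem's backing device and points quotactl(2) at it. Roughly, paraphrased from daemon/graphdriver/quota/projectquota.go of that era (details may differ):

package quota

import (
	"path"

	"golang.org/x/sys/unix"
)

// makeBackingFsDev creates a block device node under home with the same
// major:minor as the device backing home's filesystem, to be passed to
// quotactl(2).
func makeBackingFsDev(home string) (string, error) {
	var stat unix.Stat_t
	if err := unix.Stat(home, &stat); err != nil {
		return "", err
	}
	backingFsBlockDev := path.Join(home, "backingFsBlockDev")
	// re-create in case home was copied over to a different device
	unix.Unlink(backingFsBlockDev)
	if err := unix.Mknod(backingFsBlockDev, unix.S_IFBLK|0600, int(stat.Dev)); err != nil {
		return "", err
	}
	return backingFsBlockDev, nil
}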

It's all working just fine on Debian 9, though the bug is not reproducible there since it's a UP (uniprocessor) machine.

Reproduced on a SLES 15 system with 6x Xeon and XFS with pquota, using the following config:

# cat /etc/docker/daemon.json 
{
	"debug": true,
	"storage-driver": "overlay2",
	"storage-opts": ["overlay2.size=1G"],
	"experimental": true
}

and a stress-testing script from #39135 (took two runs with default values).

Compiled a fixed version; so far no bug. Left it to run overnight to fully confirm.

@thaJeztah (Member):

@kolyshkin can you create a tracking issue for project quota not working on CentOS 7?

@kolyshkin (Contributor, Author):

Ran 140 iterations of the following script overnight (creating and removing 1,400,000 containers in total) using dockerd compiled from moby/moby master on the SLES 15 system; no issues.

#!/bin/bash

NUMCTS=10000
PARALLEL=100

create() {
	# -v 'vol_{}:/vol'
	time ( seq $NUMCTS | parallel -j$PARALLEL docker create alpine top > /dev/null )
	echo "^^^ CREATE TIME ^^^"
}

remove() {
	time ( docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null )
	echo "^^^ REMOVE TIME ^^^"
}

# precreate $NUMCTS containers
create
echo

# create another $NUMCTS while removing $NUMCTS containers
create &
remove &
wait

# remove the rest of it
remove

Successfully merging this pull request may close the following issue:

Docker daemon crashes (again) with "fatal error: concurrent map read and map write" (#39643)