projectquota: protect concurrent map access (ENGCORE-920) #39644
Conversation
@amir73il PTAL

```go
projectID, ok := q.quotas[targetPath]
q.Unlock()
if !ok {
	projectID = q.nextProjectID
```
Does this need a lock too?
Looks like it... the code was written without concurrency in mind.
maybe we could use atomics for this one
yeah, it's easier to reuse the same lock
Seems like there's some other potential for data races around project IDs.
Test is failing.
daemon/graphdriver/quota/types.go (outdated)

```diff
@@ -13,4 +14,5 @@ type Control struct {
 	backingFsBlockDev string
 	nextProjectID     uint32
 	quotas            map[string]uint32
+	sync.RWMutex // protect quotas map
```
can you move this above quotas? I think the convention is to put the mutex above the fields that it is protecting
moved
```go
q.quotas[targetPath] = projectID
q.Unlock()
q.nextProjectID++
```
should nextProjectID also be inside the lock?
Oh, never mind, I see Brian's comment above.
force-pushed from b6421df to 4fbe160
@cpuguy83 I think I have handled it (now in the latest version)
Are you assuming it might be unsafe? I took a look at the quotactl(2) man page and can't find anything that hints it might be the case. Perhaps @amir73il can chime in?
SGTM
```diff
@@ -1,6 +1,7 @@
 // +build linux
 
 package quota // import "github.com/docker/docker/daemon/graphdriver/quota"
+import "sync"
```
I'm surprised this passed the gofmt check.
Can we add a whitespace line above import?
Sure. I am surprised as well -- it was auto-inserted by goimports, which is a superset of gofmt -- maybe it's a corner case which they don't handle, I'll take a look.
Protect access to q.quotas map, and lock around changing nextProjectID. Technically, the lock in findNextProjectID() is not needed as it is only called during initialization, but one can never be too careful.

Fixes: 52897d1 ("projectquota: utility class for project quota controls")

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I see an old test related to this in moby/integration-cli/docker_cli_create_test.go (lines 60 to 72 at 6345208).
But the test is only run on devicemapper. Perhaps we should change it to allow it to run on overlay2 with XFS?
LGTM
Some followups. Tried to repro this, and apparently projectquota is not working on latest CentOS 7, due to
So, it's all working just fine with Debian 9, though it's not reproducible there since it's a UP machine. Reproduced on a SLES 15 system with 6x Xeon and XFS with pquota, using the following config:
and a stress-testing script from #39135 (took two runs with default values). Compiled a fixed version; so far no bug. Left it to run overnight to fully confirm.
@kolyshkin can you create a tracking issue for project quota not working on CentOS 7?
Ran 140 iterations of the following script overnight (created/removed 1400000 containers total) using dockerd compiled from moby/moby master on a SLES 15 system, no issues.

```bash
#!/bin/bash
NUMCTS=10000
PARALLEL=100

create() {
	# -v 'vol_{}:/vol'
	time ( seq $NUMCTS | parallel -j$PARALLEL docker create alpine top > /dev/null )
	echo "^^^ CREATE TIME ^^^"
}

remove() {
	time ( docker ps -qa | shuf | tail -n $NUMCTS | parallel -j$PARALLEL docker rm -f '{}' > /dev/null )
	echo "^^^ REMOVE TIME ^^^"
}

# precreate $NUMCTS containers
create
echo

# create another $NUMCTS while removing $NUMCTS containers
create &
remove &
wait

# remove the rest of it
remove
```
Fixes #39643.
The bug was always there, hidden by the global lock (removed in #39135, which opened a can of worms fixed in a few followup PRs, such as #39209).