-
Notifications
You must be signed in to change notification settings - Fork 18.7k
container: protect health monitor channel #35482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While this code was likely called from a single thread before, we have now seen panics, indicating that it could be called in parallel. This change adds a mutex to protect opening and closing of the channel. There may be another root cause associated with this panic, such as something that led to the calling of this in parallel, as this code is old and we had seen this condition until recently. This fix is by no means a permanent fix. Typically, bugs like this indicate misplaced channel ownership. In idiomatic uses, the channel should have a particular "owner" that coordinates sending and closure. In this case, the owner of the channel is unclear, so it gets opened lazily. Synchronizing this access is a decent solution, but a refactor may yield better results. Signed-off-by: Stephen J Day <stephen.day@docker.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but would like @cpuguy83 to have a look as well
cc @talex5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
in the future, would be good to have a |
@andrewhsu testing was already discussed, but the issue this is fixing is not possible to replicate in a testing scenario |
if not to replicate the testing scenario, just having the framework in place for a unit test prompts any future contributor to write an accompanying test if it relates. just saying...not all fixes scenarios can be tested in a unit test, but some of the auxiliary code can be verified to not have changed behaviour in a fix attempt. |
Adding an empty boilerplate with the testing framework in place would make compilation fail (unused imports), and although I generally agree on having (unit)-tests; we should guard against adding tests that don't test anything, as they give a false sense of coverage. |
@andrewhsu If a valid test were possible here, I would have included one. Unfortunately, races such as these fall into the category of a negative assertion. One cannot prove that there is not a race with a test that doesn't cover all of time (without the race detector). We have to rely on our mental faculties and experience to identify and address these kinds of conditions. In that framework, we can also only show that we are reasonably certain. In this case, we have a race condition, where the value of the Indeed, we can reproduce that situation in a test case by running Ideally, the functionality here is covered by some functional test that ensures these implementation details are not working. It seems like the concurrency level of that testing might not be sufficient. I am not wildly familiar with the root cause, but there was some talk of a higher level condition that could cause this. A test there, with the race detector enabled, would likely more sufficiently address this issue, and expose other possible issues. |
While this code was likely called from a single thread before, we have
now seen panics, indicating that it could be called in parallel. This
change adds a mutex to protect opening and closing of the channel. There
may be another root cause associated with this panic, such as something
that led to the calling of this in parallel, as this code is old and we
had seen this condition until recently.
This fix is by no means a permanent fix. Typically, bugs like this
indicate misplaced channel ownership. In idiomatic uses, the channel
should have a particular "owner" that coordinates sending and closure.
In this case, the owner of the channel is unclear, so it gets opened
lazily. Synchronizing this access is a decent solution, but a refactor
may yield better results.
Signed-off-by: Stephen J Day stephen.day@docker.com
Terror Bird: They ranged in height from 1–3 metres (3.3–9.8 ft) tall. Their closest modern-day relatives are believed to be the 80 cm-tall seriemas. Titanis walleri, one of the larger species, is known from Texas and Florida in North America.