Skip to content

Refresh containerd remotes on containerd restarted #36173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 8, 2018

Conversation

cpuguy83
Copy link
Member

@cpuguy83 cpuguy83 commented Jan 31, 2018

Before this patch, when containerd is restarted (due to a crash, or
kill, whatever), the daemon would keep trying to process the event
stream against the old socket handles. This would lead to a CPU spin due
to the error handling when the client can't connect to containerd.

This change makes sure the containerd remote client is updated for all
registered libcontainerd clients.

This is not necessarily the ideal fix which would likely require a
major refactor, but at least gets things to a working state with a
minimal patch.

Fixes #36002

@cpuguy83
Copy link
Member Author

ping @thaJeztah

@stevvooe
Copy link
Contributor

LGTM

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐯

@thaJeztah
Copy link
Member

Error response from daemon: driver "windowsfilter" failed to remove root filesystem for 2cdd9517ebbcff26868b4aaecbb61b6e9a9a201382645575954cea935cf8cfe4: rename D:\CI\CI-c0f59a531\daemon\windowsfilter\2cdd9517ebbcff26868b4aaecbb61b6e9a9a201382645575954cea935cf8cfe4 D:\CI\CI-c0f59a531\daemon\windowsfilter\2cdd9517ebbcff26868b4aaecbb61b6e9a9a201382645575954cea935cf8cfe4-removing: Access is denied.
23:16:29 
	Messages:   	failed to remove 2cdd9517ebbcff26868b4aaecbb61b6e9a9a201382645575954cea935cf8cfe4
23:16:29 

@mlaventure
Copy link
Contributor

@thaJeztah Windows doesn't use containerd yet :)

Copy link
Contributor

@mlaventure mlaventure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cpuguy83
Copy link
Member Author

cpuguy83 commented Feb 1, 2018

Well, something here seems to not like CI....

@mlaventure
Copy link
Contributor

mlaventure commented Feb 1, 2018

The windows CI is definitely unrelated, since this code is not even built on windows.

I see that z failed too, that's something else unless z is known to be flaky sometimes? But it wasn't last time I paid attention.

@cpuguy83
Copy link
Member Author

cpuguy83 commented Feb 1, 2018

Janky is also stuck waiting for the daemon to start, multiple times now.

@mlaventure
Copy link
Contributor

The stack trace would help, I can't see how this is causing it so far :?

Before this patch, when containerd is restarted (due to a crash, or
kill, whatever), the daemon would keep trying to process the event
stream against the old socket handles. This would lead to a CPU spin due
to the error handling when the client can't connect to containerd.

This change makes sure the containerd remote client is updated for all
registered libcontainerd clients.

This is not neccessarily the ideal fix which would likely require a
major refactor, but at least gets things to a working state with a
minimal patch.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 cpuguy83 force-pushed the fix_containerd_crash_spin branch from 2247756 to 400126f Compare February 7, 2018 16:57
@cpuguy83
Copy link
Member Author

cpuguy83 commented Feb 7, 2018

Found the issue. It was due to recursive locking in Restore()

@cpuguy83
Copy link
Member Author

cpuguy83 commented Feb 7, 2018

And green.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐸

@yongtang yongtang merged commit 384ff69 into moby:master Feb 8, 2018
@cpuguy83 cpuguy83 deleted the fix_containerd_crash_spin branch February 8, 2018 14:27
lox added a commit to buildkite/elastic-ci-stack-for-aws that referenced this pull request Feb 23, 2018
lox added a commit to buildkite/elastic-ci-stack-for-aws that referenced this pull request Feb 23, 2018
lox added a commit to buildkite/elastic-ci-stack-for-aws that referenced this pull request Feb 24, 2018
lox added a commit to buildkite/elastic-ci-stack-for-aws that referenced this pull request Feb 26, 2018
mikeknox pushed a commit to mikeknox/elastic-ci-stack-for-aws that referenced this pull request Apr 10, 2018
… feature/merge-in-v2.3.5 to feature/add_latest_vault_plugin

* commit 'bcb6fe5980595d336a08c9b16333c58e733a1758': (52 commits)
  use stable agent
  use ami that doesn't require usage agreement
  Update changelog for 2.3.5-rc2
  Bump changelog for v2.3.5
  Show docker logs if docker isn't running
  Fix linting issue in docker installer
  Install 17.12.1-ce-rc2 to fix moby/moby#36173
  Revert "Merge pull request buildkite#360 from buildkite/vpc-rewrite-for-all-available-subnets"
  Update changelog for v2.3.1-v2.3.4
  Add some troubleshooting tips to README
  Configure docker config in boothook to avoid race conditions
  Check docker is running in environment
  Remove du output, will be too slow
  Try cleaning up disk in the environment hook if needed
  Trim leading whitespace from disk space output 💅🏻
  Make disk space output more human friendly
  Append to elastic-stack.log rather than overwrite
  Only redirect cron output if caller is root
  Tests should run cron as root
  Add logging to cron tasks and show disk usage in stack environment
  ...
mikeknox pushed a commit to mikeknox/elastic-ci-stack-for-aws that referenced this pull request Apr 10, 2018
… feature/add_latest_vault_plugin to use-vault-plugin

* commit 'f50fdfc20190313bd80ddabe1b69fa21d09b3e29': (55 commits)
  use stable agent
  use ami that doesn't require usage agreement
  specify vault-backend branch for vault plugin
  update to latest vault plugin
  update to latest vault plugin
  Update changelog for 2.3.5-rc2
  Bump changelog for v2.3.5
  Show docker logs if docker isn't running
  Fix linting issue in docker installer
  Install 17.12.1-ce-rc2 to fix moby/moby#36173
  Revert "Merge pull request buildkite#360 from buildkite/vpc-rewrite-for-all-available-subnets"
  Update changelog for v2.3.1-v2.3.4
  Add some troubleshooting tips to README
  Configure docker config in boothook to avoid race conditions
  Check docker is running in environment
  Remove du output, will be too slow
  Try cleaning up disk in the environment hook if needed
  Trim leading whitespace from disk space output 💅🏻
  Make disk space output more human friendly
  Append to elastic-stack.log rather than overwrite
  ...
mikeknox pushed a commit to mikeknox/elastic-ci-stack-for-aws that referenced this pull request Apr 10, 2018
kolyshkin added a commit to kolyshkin/moby that referenced this pull request Oct 27, 2020
In case we are killing and restarting containerd, do try to reconnect
and use the new connection.

Loosely based on moby#36173

Might help https://bugzilla.redhat.com/show_bug.cgi?id=1746435

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Killing docker-containerd breaks interaction with containers
7 participants