-
Notifications
You must be signed in to change notification settings - Fork 18.7k
seccomp: whitelist io-uring related system calls #39415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like CI is failing, because the json version was not updated as part of this PR;
|
ping @justincormack @crosbymichael PTAL |
I made a mistake thinking the json file is manually written, and it should be generated. I have fixed that now but some of the ci seems to be failing because of removing some non-existing folder (C:\go), what does that mean? |
Thanks! The RS5 failure can be ignored (see #39403) |
The only failure left is TestAPISwarmServicesMultipleAgents in janky, which does not seem to be relevant. I read the code but it is not about seccomp. |
yes, looks like that one is sometimes flaky; #23626 I'll restart CI |
This time it's only a few SKIPs & XFAILs, why is it marked as failing again? |
Looks to be an issue with the docker-py tests;
I'll kick that build |
@@ -160,6 +160,9 @@ func DefaultProfile() *types.Seccomp { | |||
"ioprio_set", | |||
"io_setup", | |||
"io_submit", | |||
"io_uring_setup", | |||
"io_uring_enter", | |||
"io_uring_register", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you sort these alphabetically like the rest please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done that
Are there any other new syscalls since the last update? |
ping @omegacoleman could you answer @justincormack's question (and sort the list #39415 (comment))? |
The precise answer is "I don't know" since I'm just a network dev not kernel dev, but even though there are, the new system call has to be added to the upstream libseccomp's syscall translation (name -> number) before it could be added to the config here, and these calls were added. Sorry for the delay as it is well, trival & as the upstream has added the calls I was able to just override the config myself.. |
Codecov Report
@@ Coverage Diff @@
## master #39415 +/- ##
=========================================
Coverage ? 37.34%
=========================================
Files ? 609
Lines ? 45200
Branches ? 0
=========================================
Hits ? 16882
Misses ? 26030
Partials ? 2288 |
Please sign your commits following these rules: $ git clone -b "master" git@github.com:omegacoleman/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354606984
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f Amending updates the existing PR. You DO NOT need to open a new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating; could you try rebasing the PR to get rid of the merge commit?
Signed-off-by: youcai <omegacoleman@gmail.com>
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
windows failure is unrelated; let me get this one in |
This seems like it's a bad idea -- io_uring allows programs to execute certain syscalls without being limited by seccomp. Currently all of the syscalls you can access through io_uring are permitted in the default profile, but I'm a little worried that this wasn't brought up at the time. |
@cyphar it looks like I am reviewing the whole seccomp policy based on what has been useful in the past. |
Funnily enough, I found this PR after commenting on that LWN article. The problem with that restriction mechanism is that it doesn't really fulfil the seccomp usecase (as long as you can create a new |
From what I got from the news they're working on sth like using eBPF to filter the SQEs upon ktail update. Will try to make that configurable after kernel & libseccomp get that feature. |
Shall we revert this until we find the solution? |
I don't think it's necessary to revert. The default profile allows everything that you can do with |
This syncs the seccomp profile with changes made to containerd's default profile in [1]. The original containerd issue and PR mention: > Security experts generally believe io_uring to be unsafe. In fact > Google ChromeOS and Android have turned it off, plus all Google > production servers turn it off. Based on the blog published by Google > below it seems like a bunch of vulnerabilities related to io_uring can > be exploited to breakout of the container. > > [2] > > Other security reaserchers also hold this opinion: see [3] for a > blackhat presentation on io_uring exploits. For the record, these syscalls were added to the allowlist in [4]. [1]: containerd/containerd@a48ddf4 [2]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html [3]: https://i.blackhat.com/BH-US-23/Presentations/US-23-Lin-bad_io_uring.pdf [4]: moby#39415 Signed-off-by: Albin Kerouanton <albinker@gmail.com>
for future visitors; these were removed again in; |
closes #39396
- What I did
I added the io-uring related system call introduced in kernel 5.1 to the seccomp whitelist.
- How I did it
This is merely a follow-up to the upstream
- How to verify it
Build docker with the master branch libseccomp, and will be run programs using io-uring without trouble.
With older kernels or older versions of libseccomp, the configure will be omitted.
- Description for the changelog
seccomp: whitelist io-uring related system calls
- A picture of a cute animal (not mandatory but encouraged)
Would be a waste for such a simple change..