Skip to content

bpo-45819: Avoid releasing the GIL in nonblocking socket operations #29579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jcrist
Copy link
Contributor

@jcrist jcrist commented Nov 16, 2021

Previously, every operation in socket would release the GIL, allowing
other threads to progress. This makes sense when using threads and
blocking sockets, but when using non-blocking sockets (as done with
asyncio) these operations return quickly. In the presence of background
threads this can lead to the "convoy effect", where the GIL is acquired
and held by the background thread every time the IO thread releases it,
and then the IO thread blocks until it regains ownership of the GIL,
causing a massive decrease in IO operations per second.

One possible fix for this is to have IO-heavy threads release the GIL
less frequently. In the case of asyncio, the following changes were
needed:

  • Don't release the GIL for socket operations on non-blocking sockets.
  • Don't release the GIL for select (epoll, select, kqueue, ...) calls
    with a timeout of 0.

See the corresponding issue at https://bugs.python.org/issue45819
for more information.

https://bugs.python.org/issue45819

Previously, every operation in `socket` would release the GIL, allowing
other threads to progress. This makes sense when using threads and
blocking sockets, but when using non-blocking sockets (as done with
asyncio) these operations return quickly. In the presence of background
threads this can lead to the "convoy effect", where the GIL is acquired
and held by the background thread every time the IO thread releases it,
and then the IO thread blocks until it regains ownership of the GIL,
causing a massive decrease in IO operations per second.

One possible fix for this is to have IO-heavy threads release the GIL
less frequently. In the case of asyncio, the following changes were
needed:

- Don't release the GIL for socket operations on non-blocking sockets.
- Don't release the GIL for select (epoll, select, kqueue, ...) calls
with a timeout of 0.

With these changes, asyncio doesn't see as degraded of an output when a
background thread holds the GIL.
@@ -147,6 +147,20 @@ PyAPI_FUNC(void) PyEval_ReleaseThread(PyThreadState *tstate);
#define Py_END_ALLOW_THREADS PyEval_RestoreThread(_save); \
}


/* Conditionally release and restore the GIL. */
#define _Py_BEGIN_ALLOW_THREADS_COND(cond) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some helper macros here, but am not sure if that's best practice. Happy to change this if others have a better suggestion.

result = epoll_ctl(epfd, op, fd, &ev);
Py_END_ALLOW_THREADS
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell epoll_ctl operations like this always complete quickly, so releasing the GIL here isn't really necessary.

res = sock_func(s, data);
Py_END_ALLOW_THREADS
_Py_END_ALLOW_THREADS_COND
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are likely other socket operations where releasing the GIL could be avoided, but this gets the bulk of those done frequently in a loop (send/ recv, ...). socket.close() is also handled below.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Dec 17, 2021
Comment on lines -3150 to +3152
Py_BEGIN_ALLOW_THREADS
_Py_BEGIN_ALLOW_THREADS_COND(s->sock_timeout)
res = SOCKETCLOSE(fd);
Py_END_ALLOW_THREADS
_Py_END_ALLOW_THREADS_COND
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change here. s->sock_timeout doesn't affect to close.
Unlike send/recv, close can not be called frequently more than creating socket. So this shouldn't be a big performance problem.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants