New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478
base: main
Are you sure you want to change the base?
Conversation
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I dislike PyErr_HasSignals() function. You should reuse the existing PyErr_CheckSignals() function. The idea of PyErr_CheckSignals() is that if you hold the GIL for a long time, between two functions calls, you should check for signals. If PyErr_CheckSignals() fails, it raises an exception, you must abandon what you are doing and exit the function (with the exception raised by PyErr_CheckSignals()).
Check for other code using PyErr_CheckSignals() to have examples.
AFAICT In the case of this specific code path, it is possible (but somewhat uncommon) for the calls to
@vstinner can you see any approaches that I missed? What would you do in this situation? |
I'm aware of the failing |
I think you could avoid the |
I agree that it's fair to assume non-blocking read operations will complete quickly, but I'm not as confident making that assumption for a loop that could do millions of such operations. Imagine a situation where the application reads an HTTP header that says "Content-Length: 20000000000". It wants to buffer the download in memory, so it calls The upshot is that if we don't check for signals until after the end of the loop, the latency of signal processing will depend on whether the download was CPU bound, which seems surprising to me. |
Calling |
Continue looping until data is exhausted, and only then reacquire the GIL. This makes it possible to perform multi-threaded TLS downloads without saturating the GIL. On a test workload performing HTTPS download with 32 threads pinned to 16 cores, this produces a 4x speedup. before after wall clock time (s) : 29.637 7.116 user time (s) : 8.793 12.584 system time (s) : 105.118 30.010 voluntary switches : 1,653,065 248,484 speed (MB/s) : 4733 19712
I have removed |
The remaining problem is that this change breaks the Right now callers can rely on fact that either SSLSocket.read returns bytes OR it reads a TLS close_notify message, producing an exception; it never does both. With this change, it becomes possible to read application data and also read a TLS close_notify message, which will be processed by OpenSSL. That said, I don't imagine it's acceptable to change behavior like this. I think that to roll out this optimization would require calling code to opt in by passing a flag indicating "Please read eagerly from the socket; I know that I am responsible for explicitly checking for EOF before calling select() on the underlying OS socket." Does anyone have a better idea? |
Anyone that can give input? So @hashbrowncipher can continue. |
This PR is stale because it has been open for 30 days with no activity. |
Should this pull request no longer be a Draft? |
Does anyone know why the CI is blocked on this PR? |
@hashbrowncipher, if you put a comment on this post that says, "I have made the requested changes; please review again," I think the bot will tag this PR as needing core dev review. That might help you get your answers. |
@@ -2334,10 +2334,11 @@ _ssl__SSLSocket_read_impl(PySSLSocket *self, int len, int group_right_1, | |||
PyObject *dest = NULL; | |||
char *mem; | |||
size_t count = 0; | |||
size_t got = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t got = 0; | |
size_t readbytes; |
int retval; | ||
int sockstate; | ||
_PySSLError err; | ||
int nonblocking; | ||
int nonblocking = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to initialize the variable with 0
?
retval = SSL_read_ex(self->ssl, mem + got, len, &count); | ||
if(retval <= 0) { | ||
break; | ||
} | ||
|
||
got += count; | ||
len -= count; | ||
} while(nonblocking && len > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got
and count
are confusing variable names. I suggest that you keep count
as counter for the read()
function call and introduce a new variable for SSL_read_ex()
call.
Is it a good idea to have a potentially infinite loop here that ignores signals and timeouts? This smells like DoS vunlerable to be happen.
retval = SSL_read_ex(self->ssl, mem + got, len, &count); | |
if(retval <= 0) { | |
break; | |
} | |
got += count; | |
len -= count; | |
} while(nonblocking && len > 0); | |
retval = SSL_read_ex(self->ssl, mem + got, len, &readbytes); | |
if (retval <= 0) { | |
break; | |
} | |
count += readbytes; | |
len -= readbytes; | |
} while (nonblocking && len > 0); |
err = _PySSL_errno(retval == 0, self->ssl, retval); | ||
PySSL_END_ALLOW_THREADS | ||
self->err = err; | ||
|
||
if(got > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(got > 0) { | |
if (count > 0) { |
return | ||
size -= count | ||
|
||
raise AssertionError("All TLS reads were smaller than 16KB") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise AssertionError("All TLS reads were smaller than 16KB") | |
self.fail("All TLS reads were smaller than 16KB") |
As @hashbrowncipher seems to have abandoned this (I tried to reach him on Twitter also, but no response), I will rebase and apply the requested changes. Of course if @hashbrowncipher wants to continue, of course I will wait |
@Safihre please do your best :) |
@tiran/@vstinner I have ran into the same problem as @hashbrowncipher: some code (like the FTP-tests) do not expect the |
my 2c, as an end-user: |
Continue looping until data is exhausted, and only then reacquire the GIL. This
makes it possible to perform multi-threaded TLS downloads without saturating
the GIL. On a test workload performing HTTPS download with 32 threads pinned
to 16 cores, this produces a 4x speedup.
Also add
PyErr_HasSignals()
, which checks whether a signal has arrivedwithout holding the GIL.
https://bugs.python.org/issue37355