Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hashbrowncipher
Copy link
Contributor

@hashbrowncipher hashbrowncipher commented Apr 20, 2021

Continue looping until data is exhausted, and only then reacquire the GIL. This
makes it possible to perform multi-threaded TLS downloads without saturating
the GIL. On a test workload performing HTTPS download with 32 threads pinned
to 16 cores, this produces a 4x speedup.

                          before     after
wall clock time (s)  :    29.637     7.116
user time (s)        :     8.793    12.584
system time (s)      :   105.118    30.010
voluntary switches   : 1,653,065   248,484
speed (MB/s)         :      4733     19712

Also add PyErr_HasSignals(), which checks whether a signal has arrived
without holding the GIL.

https://bugs.python.org/issue37355

Copy link
Member

@tiran tiran left a comment

There are a bunch of C compiler warnings and failing tests. Please take a look.

Lib/test/test_ssl.py Outdated Show resolved Hide resolved
Modules/_ssl.c Outdated Show resolved Hide resolved
Modules/_ssl.c Outdated Show resolved Hide resolved
Modules/signalmodule.c Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

bedevere-bot commented Apr 20, 2021

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@tiran tiran self-assigned this Apr 20, 2021
@tiran tiran requested review from vstinner and njsmith Apr 20, 2021
Copy link
Member

@vstinner vstinner left a comment

I dislike PyErr_HasSignals() function. You should reuse the existing PyErr_CheckSignals() function. The idea of PyErr_CheckSignals() is that if you hold the GIL for a long time, between two functions calls, you should check for signals. If PyErr_CheckSignals() fails, it raises an exception, you must abandon what you are doing and exit the function (with the exception raised by PyErr_CheckSignals()).

Check for other code using PyErr_CheckSignals() to have examples.

@hashbrowncipher
Copy link
Contributor Author

hashbrowncipher commented Apr 20, 2021

AFAICT PyErr_CheckSignals requires holding the GIL when called. I added PyErr_HasSignals to allow checking for signal arrival without the GIL held. If it returns true, then the calling code knows that it must reacquire the GIL and call PyErr_CheckSignals.

In the case of this specific code path, it is possible (but somewhat uncommon) for the calls to SSL_read to take a noticeable amount of time to fill the entire buffer, even though they are nonblocking. The options I as I saw them were:

  1. pay a 4x throughput penalty by reacquiring the GIL and calling PyErr_CheckSignals after each 16KB record (status quo)
  2. introduce a function like PyErr_HasSignals to check for signal arrival without holding the GIL
  3. do nothing about signal arrival until the buffer is full.

@vstinner can you see any approaches that I missed? What would you do in this situation?

@hashbrowncipher
Copy link
Contributor Author

hashbrowncipher commented Apr 20, 2021

I'm aware of the failing test_ftplib tests, but I won't be able to debug them until later.

@njsmith
Copy link
Contributor

njsmith commented Apr 20, 2021

I think you could avoid the HasSignals trick by not bothering to check for signals at all during a non-blocking read loop. Checking for signals during blocking operations is important because otherwise they might delay processing the signal for an unbounded amount of time. But for a non-blocking read, it's guaranteed to complete quickly in any case, so it should be enough to check for signals at the end, I think?

@hashbrowncipher
Copy link
Contributor Author

hashbrowncipher commented Apr 20, 2021

I think you could avoid the HasSignals trick by not bothering to check for signals at all during a non-blocking read loop. Checking for signals during blocking operations is important because otherwise they might delay processing the signal for an unbounded amount of time. But for a non-blocking read, it's guaranteed to complete quickly in any case, so it should be enough to check for signals at the end, I think?

I agree that it's fair to assume non-blocking read operations will complete quickly, but I'm not as confident making that assumption for a loop that could do millions of such operations. Imagine a situation where the application reads an HTTP header that says "Content-Length: 20000000000". It wants to buffer the download in memory, so it calls sock.read(20000000000). If the receiver is CPU-bound, then there will always be data available for processing in the socket buffer, and signals will not be handled until the entire download completes. If it is not CPU bound, then the read operation will return early.

The upshot is that if we don't check for signals until after the end of the loop, the latency of signal processing will depend on whether the download was CPU bound, which seems surprising to me.

@njsmith
Copy link
Contributor

njsmith commented Apr 20, 2021

Calling sock.read(20_000_000_000) is pretty broken to start with – you force the system to allocate a 20 gigabyte buffer, but then python will truncate it to whatever it manages to read before hitting end of data, so 99.99% of the time you allocate 20 gigabytes and then you immediately discard them again. And for normal sockets, that's sufficient protection, because you each call to .recv only makes one syscall. I do see your point about if you're CPU-bound though; it's annoying openssl doesn't just do one syscall and then decrypt all of it, but instead forces another syscall for each packet...

Continue looping until data is exhausted, and only then reacquire the GIL. This
makes it possible to perform multi-threaded TLS downloads without saturating
the GIL. On a test workload performing HTTPS download with 32 threads pinned
to 16 cores, this produces a 4x speedup.

                              before     after
    wall clock time (s)  :    29.637     7.116
    user time (s)        :     8.793    12.584
    system time (s)      :   105.118    30.010
    voluntary switches   : 1,653,065   248,484
    speed (MB/s)         :      4733     19712
@hashbrowncipher
Copy link
Contributor Author

hashbrowncipher commented Apr 22, 2021

I have removed PyErr_HasSignals. Signals will be buffered until the reading completes.

@hashbrowncipher
Copy link
Contributor Author

hashbrowncipher commented Apr 22, 2021

The remaining problem is that this change breaks the test_ftplib tests. I'd be interested in any bright ideas for how to remedy this, because right now (to me) it looks like this optimization changes existing expectations.

Right now callers can rely on fact that either SSLSocket.read returns bytes OR it reads a TLS close_notify message, producing an exception; it never does both. With this change, it becomes possible to read application data and also read a TLS close_notify message, which will be processed by OpenSSL. select/poll/epoll will all now indicate that the socket has no bytes available (which is true), but OpenSSL knows that no additional bytes will ever arrive on the socket. I validated my theory by patching the test_ftplib code to check for EOF after successful reads. With this change, the tests all passed.

That said, I don't imagine it's acceptable to change behavior like this. I think that to roll out this optimization would require calling code to opt in by passing a flag indicating "Please read eagerly from the socket; I know that I am responsible for explicitly checking for EOF before calling select() on the underlying OS socket." Does anyone have a better idea?

@Safihre
Copy link

Safihre commented May 21, 2021

Anyone that can give input? So @hashbrowncipher can continue.

@github-actions
Copy link

github-actions bot commented Jun 21, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 21, 2021
@dimaqq
Copy link
Contributor

dimaqq commented Jul 1, 2021

Should this pull request no longer be a Draft?

@pitrou
Copy link
Member

pitrou commented Jul 1, 2021

Does anyone know why the CI is blocked on this PR?

@jdevries3133
Copy link
Contributor

jdevries3133 commented Jul 1, 2021

@hashbrowncipher, if you put a comment on this post that says, "I have made the requested changes; please review again," I think the bot will tag this PR as needing core dev review. That might help you get your answers.

Copy link
Member

@tiran tiran left a comment

Please address my comments and implement the same feature for write().

@@ -2334,10 +2334,11 @@ _ssl__SSLSocket_read_impl(PySSLSocket *self, int len, int group_right_1,
PyObject *dest = NULL;
char *mem;
size_t count = 0;
size_t got = 0;
Copy link
Member

@tiran tiran Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size_t got = 0;
size_t readbytes;

int retval;
int sockstate;
_PySSLError err;
int nonblocking;
int nonblocking = 0;
Copy link
Member

@tiran tiran Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to initialize the variable with 0?

retval = SSL_read_ex(self->ssl, mem + got, len, &count);
if(retval <= 0) {
break;
}

got += count;
len -= count;
} while(nonblocking && len > 0);
Copy link
Member

@tiran tiran Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got and count are confusing variable names. I suggest that you keep count as counter for the read() function call and introduce a new variable for SSL_read_ex() call.

Is it a good idea to have a potentially infinite loop here that ignores signals and timeouts? This smells like DoS vunlerable to be happen.

Suggested change
retval = SSL_read_ex(self->ssl, mem + got, len, &count);
if(retval <= 0) {
break;
}
got += count;
len -= count;
} while(nonblocking && len > 0);
retval = SSL_read_ex(self->ssl, mem + got, len, &readbytes);
if (retval <= 0) {
break;
}
count += readbytes;
len -= readbytes;
} while (nonblocking && len > 0);

err = _PySSL_errno(retval == 0, self->ssl, retval);
PySSL_END_ALLOW_THREADS
self->err = err;

if(got > 0) {
Copy link
Member

@tiran tiran Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(got > 0) {
if (count > 0) {

return
size -= count

raise AssertionError("All TLS reads were smaller than 16KB")
Copy link
Member

@tiran tiran Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise AssertionError("All TLS reads were smaller than 16KB")
self.fail("All TLS reads were smaller than 16KB")

@Safihre
Copy link

Safihre commented Sep 29, 2021

As @hashbrowncipher seems to have abandoned this (I tried to reach him on Twitter also, but no response), I will rebase and apply the requested changes.

Of course if @hashbrowncipher wants to continue, of course I will wait 😃

@dimaqq
Copy link
Contributor

dimaqq commented Jan 25, 2022

@Safihre please do your best :)

@Safihre
Copy link

Safihre commented Jan 31, 2022

@tiran/@vstinner I have ran into the same problem as @hashbrowncipher: some code (like the FTP-tests) do not expect the SSLSocket.read() to consume the EOF.
Do you agree with @hashbrowncipher that there should be a new parameter (for example on the SSLContext) to enable this new behaviour?
If you agree, then I will try to implement and also add the same for SSLSocket.write().

@dimaqq
Copy link
Contributor

dimaqq commented Feb 1, 2022

my 2c, as an end-user:
I think the EOF handling is a leftover from bygone days;
Today the user code would rather open a new connection, thus .read() consuming EOF is totally reasonable...
Perhaps if that breaks code that calls .read() again after, it needs to re-report EOF or raise or something 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet