bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

hashbrowncipher · 2021-04-20T04:31:38Z

Continue looping until data is exhausted, and only then reacquire the GIL. This
makes it possible to perform multi-threaded TLS downloads without saturating
the GIL. On a test workload performing HTTPS download with 32 threads pinned
to 16 cores, this produces a 4x speedup.

                          before     after
wall clock time (s)  :    29.637     7.116
user time (s)        :     8.793    12.584
system time (s)      :   105.118    30.010
voluntary switches   : 1,653,065   248,484
speed (MB/s)         :      4733     19712

Also add PyErr_HasSignals(), which checks whether a signal has arrived
without holding the GIL.

https://bugs.python.org/issue37355

tiran

There are a bunch of C compiler warnings and failing tests. Please take a look.

Lib/test/test_ssl.py

Modules/_ssl.c

Modules/signalmodule.c

bedevere-bot · 2021-04-20T06:56:42Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

vstinner

I dislike PyErr_HasSignals() function. You should reuse the existing PyErr_CheckSignals() function. The idea of PyErr_CheckSignals() is that if you hold the GIL for a long time, between two functions calls, you should check for signals. If PyErr_CheckSignals() fails, it raises an exception, you must abandon what you are doing and exit the function (with the exception raised by PyErr_CheckSignals()).

Check for other code using PyErr_CheckSignals() to have examples.

hashbrowncipher · 2021-04-20T16:52:17Z

AFAICT PyErr_CheckSignals requires holding the GIL when called. I added PyErr_HasSignals to allow checking for signal arrival without the GIL held. If it returns true, then the calling code knows that it must reacquire the GIL and call PyErr_CheckSignals.

In the case of this specific code path, it is possible (but somewhat uncommon) for the calls to SSL_read to take a noticeable amount of time to fill the entire buffer, even though they are nonblocking. The options I as I saw them were:

pay a 4x throughput penalty by reacquiring the GIL and calling PyErr_CheckSignals after each 16KB record (status quo)
introduce a function like PyErr_HasSignals to check for signal arrival without holding the GIL
do nothing about signal arrival until the buffer is full.

@vstinner can you see any approaches that I missed? What would you do in this situation?

hashbrowncipher · 2021-04-20T17:04:10Z

I'm aware of the failing test_ftplib tests, but I won't be able to debug them until later.

njsmith · 2021-04-20T17:57:57Z

I think you could avoid the HasSignals trick by not bothering to check for signals at all during a non-blocking read loop. Checking for signals during blocking operations is important because otherwise they might delay processing the signal for an unbounded amount of time. But for a non-blocking read, it's guaranteed to complete quickly in any case, so it should be enough to check for signals at the end, I think?

hashbrowncipher · 2021-04-20T18:59:48Z

I think you could avoid the HasSignals trick by not bothering to check for signals at all during a non-blocking read loop. Checking for signals during blocking operations is important because otherwise they might delay processing the signal for an unbounded amount of time. But for a non-blocking read, it's guaranteed to complete quickly in any case, so it should be enough to check for signals at the end, I think?

I agree that it's fair to assume non-blocking read operations will complete quickly, but I'm not as confident making that assumption for a loop that could do millions of such operations. Imagine a situation where the application reads an HTTP header that says "Content-Length: 20000000000". It wants to buffer the download in memory, so it calls sock.read(20000000000). If the receiver is CPU-bound, then there will always be data available for processing in the socket buffer, and signals will not be handled until the entire download completes. If it is not CPU bound, then the read operation will return early.

The upshot is that if we don't check for signals until after the end of the loop, the latency of signal processing will depend on whether the download was CPU bound, which seems surprising to me.

njsmith · 2021-04-20T19:10:40Z

Calling sock.read(20_000_000_000) is pretty broken to start with – you force the system to allocate a 20 gigabyte buffer, but then python will truncate it to whatever it manages to read before hitting end of data, so 99.99% of the time you allocate 20 gigabytes and then you immediately discard them again. And for normal sockets, that's sufficient protection, because you each call to .recv only makes one syscall. I do see your point about if you're CPU-bound though; it's annoying openssl doesn't just do one syscall and then decrypt all of it, but instead forces another syscall for each packet...

Continue looping until data is exhausted, and only then reacquire the GIL. This makes it possible to perform multi-threaded TLS downloads without saturating the GIL. On a test workload performing HTTPS download with 32 threads pinned to 16 cores, this produces a 4x speedup. before after wall clock time (s) : 29.637 7.116 user time (s) : 8.793 12.584 system time (s) : 105.118 30.010 voluntary switches : 1,653,065 248,484 speed (MB/s) : 4733 19712

hashbrowncipher · 2021-04-22T21:18:15Z

I have removed PyErr_HasSignals. Signals will be buffered until the reading completes.

hashbrowncipher · 2021-04-22T21:53:07Z

The remaining problem is that this change breaks the test_ftplib tests. I'd be interested in any bright ideas for how to remedy this, because right now (to me) it looks like this optimization changes existing expectations.

Right now callers can rely on fact that either SSLSocket.read returns bytes OR it reads a TLS close_notify message, producing an exception; it never does both. With this change, it becomes possible to read application data and also read a TLS close_notify message, which will be processed by OpenSSL. select/poll/epoll will all now indicate that the socket has no bytes available (which is true), but OpenSSL knows that no additional bytes will ever arrive on the socket. I validated my theory by patching the test_ftplib code to check for EOF after successful reads. With this change, the tests all passed.

That said, I don't imagine it's acceptable to change behavior like this. I think that to roll out this optimization would require calling code to opt in by passing a flag indicating "Please read eagerly from the socket; I know that I am responsible for explicitly checking for EOF before calling select() on the underlying OS socket." Does anyone have a better idea?

Safihre · 2021-05-21T15:25:14Z

Anyone that can give input? So @hashbrowncipher can continue.

github-actions · 2021-06-21T00:06:51Z

This PR is stale because it has been open for 30 days with no activity.

dimaqq · 2021-07-01T01:08:19Z

Should this pull request no longer be a Draft?

pitrou · 2021-07-01T08:44:21Z

Does anyone know why the CI is blocked on this PR?

tirkarthi · 2021-07-01T11:36:15Z

It could be due to being a draft PR

https://docs.microsoft.com/en-us/azure/devops/release-notes/2020/sprint-173-update#github-draft-pull-requests-do-not-trigger-pipelines
https://developercommunity.visualstudio.com/t/github-draft-pull-requests-should-trigger-the-pipe/1140412

jdevries3133 · 2021-07-01T13:14:33Z

@hashbrowncipher, if you put a comment on this post that says, "I have made the requested changes; please review again," I think the bot will tag this PR as needing core dev review. That might help you get your answers.

tiran

Please address my comments and implement the same feature for write().

tiran · 2021-07-01T13:46:53Z

Modules/_ssl.c

@@ -2334,10 +2334,11 @@ _ssl__SSLSocket_read_impl(PySSLSocket *self, int len, int group_right_1,
    PyObject *dest = NULL;
    char *mem;
    size_t count = 0;
+    size_t got = 0;


Suggested change

size_t got = 0;

size_t readbytes;

tiran · 2021-07-01T13:46:53Z

Modules/_ssl.c

    int retval;
    int sockstate;
    _PySSLError err;
-    int nonblocking;
+    int nonblocking = 0;


Why is it necessary to initialize the variable with 0?

tiran · 2021-07-01T13:46:53Z

Modules/_ssl.c

+            retval = SSL_read_ex(self->ssl, mem + got, len, &count);
+            if(retval <= 0) {
+                break;
+            }
+
+            got += count;
+            len -= count;
+        } while(nonblocking && len > 0);


got and count are confusing variable names. I suggest that you keep count as counter for the read() function call and introduce a new variable for SSL_read_ex() call.

Is it a good idea to have a potentially infinite loop here that ignores signals and timeouts? This smells like DoS vunlerable to be happen.

Suggested change

retval = SSL_read_ex(self->ssl, mem + got, len, &count);

if(retval <= 0) {

break;

}

got += count;

len -= count;

} while(nonblocking && len > 0);

retval = SSL_read_ex(self->ssl, mem + got, len, &readbytes);

if (retval <= 0) {

break;

}

count += readbytes;

len -= readbytes;

} while (nonblocking && len > 0);

tiran · 2021-07-01T13:46:53Z

Modules/_ssl.c

        err = _PySSL_errno(retval == 0, self->ssl, retval);
        PySSL_END_ALLOW_THREADS
        self->err = err;

+        if(got > 0) {


Suggested change

if(got > 0) {

if (count > 0) {

tiran · 2021-07-01T13:46:53Z

Lib/test/test_ssl.py

+                            return
+                        size -= count
+
+            raise AssertionError("All TLS reads were smaller than 16KB")


Suggested change

raise AssertionError("All TLS reads were smaller than 16KB")

self.fail("All TLS reads were smaller than 16KB")

Safihre · 2021-09-29T07:42:19Z

As @hashbrowncipher seems to have abandoned this (I tried to reach him on Twitter also, but no response), I will rebase and apply the requested changes.

Of course if @hashbrowncipher wants to continue, of course I will wait 😃

dimaqq · 2022-01-25T02:25:44Z

@Safihre please do your best :)

Safihre · 2022-01-31T09:42:48Z

@tiran/@vstinner I have ran into the same problem as @hashbrowncipher: some code (like the FTP-tests) do not expect the SSLSocket.read() to consume the EOF.
Do you agree with @hashbrowncipher that there should be a new parameter (for example on the SSLContext) to enable this new behaviour?
If you agree, then I will try to implement and also add the same for SSLSocket.write().

dimaqq · 2022-02-01T00:55:08Z

my 2c, as an end-user:
I think the EOF handling is a leftover from bygone days;
Today the user code would rather open a new connection, thus .read() consuming EOF is totally reasonable...
Perhaps if that breaks code that calls .read() again after, it needs to re-report EOF or raise or something 🤔

the-knights-who-say-ni added the CLA signed label Apr 20, 2021

bedevere-bot added the awaiting review label Apr 20, 2021

tiran requested changes Apr 20, 2021

View changes

Lib/test/test_ssl.py Outdated Show resolved Hide resolved

Modules/_ssl.c Outdated Show resolved Hide resolved

Modules/_ssl.c Outdated Show resolved Hide resolved

Modules/signalmodule.c Outdated Show resolved Hide resolved

bedevere-bot added awaiting changes and removed awaiting review labels Apr 20, 2021

tiran self-assigned this Apr 20, 2021

tiran requested review from vstinner and njsmith Apr 20, 2021

vstinner reviewed Apr 20, 2021

View changes

hashbrowncipher force-pushed the bpo37355 branch from 0dd80a2 to fa96128 Compare Apr 20, 2021

hashbrowncipher force-pushed the bpo37355 branch from fa96128 to 6305ff3 Compare Apr 22, 2021

puzzledsab mentioned this pull request Apr 29, 2021

Add article queue and change article tries system sabnzbd/sabnzbd#1870

Merged

github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 21, 2021

tiran requested changes Jul 1, 2021

View changes

Safihre mentioned this pull request Oct 5, 2021

Python 3.10 deprecates distuils ... find alternative for distutils.dir_util.copy_tree() sabnzbd/sabnzbd#1958

Closed

This was referenced Feb 14, 2022

For discussion: SABnzbd Wizard: turn on SSL Checkbox by default? sabnzbd/sabnzbd#2081

Closed

Python bindings animetosho/node-yencode#11

Closed

Safihre mentioned this pull request Feb 22, 2022

gh-81536: For nonblocking sockets, add SSLSocket.eager_recv to call SSL_read in a loop #31492

Open

hashbrowncipher mannequin mentioned this pull request Apr 13, 2022

SSLSocket.read does a GIL round-trip for every 16KB TLS record #81536

Open

ezio-melotti removed the CLA signed label Jul 13, 2022

github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 6, 2022

bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

hashbrowncipher commented Apr 20, 2021 •

edited by bedevere-bot

tiran left a comment

bedevere-bot commented Apr 20, 2021

vstinner left a comment

hashbrowncipher commented Apr 20, 2021

hashbrowncipher commented Apr 20, 2021

njsmith commented Apr 20, 2021

hashbrowncipher commented Apr 20, 2021

njsmith commented Apr 20, 2021

hashbrowncipher commented Apr 22, 2021

hashbrowncipher commented Apr 22, 2021

Safihre commented May 21, 2021

github-actions bot commented Jun 21, 2021

dimaqq commented Jul 1, 2021

pitrou commented Jul 1, 2021

tirkarthi commented Jul 1, 2021

jdevries3133 commented Jul 1, 2021 •

edited

tiran left a comment

tiran Jul 1, 2021

tiran Jul 1, 2021

tiran Jul 1, 2021

tiran Jul 1, 2021

tiran Jul 1, 2021

Safihre commented Sep 29, 2021 •

edited

dimaqq commented Jan 25, 2022

Safihre commented Jan 31, 2022

dimaqq commented Feb 1, 2022

	raise AssertionError("All TLS reads were smaller than 16KB")
	self.fail("All TLS reads were smaller than 16KB")

bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

Are you sure you want to change the base?

bpo-37355: For nonblocking sockets, call SSL_read in a loop #25478

Conversation

hashbrowncipher commented Apr 20, 2021 • edited by bedevere-bot

tiran left a comment

bedevere-bot commented Apr 20, 2021

vstinner left a comment

hashbrowncipher commented Apr 20, 2021

hashbrowncipher commented Apr 20, 2021

njsmith commented Apr 20, 2021

hashbrowncipher commented Apr 20, 2021

njsmith commented Apr 20, 2021

hashbrowncipher commented Apr 22, 2021

hashbrowncipher commented Apr 22, 2021

Safihre commented May 21, 2021

github-actions bot commented Jun 21, 2021

dimaqq commented Jul 1, 2021

pitrou commented Jul 1, 2021

tirkarthi commented Jul 1, 2021

jdevries3133 commented Jul 1, 2021 • edited

tiran left a comment

tiran Jul 1, 2021

Choose a reason for hiding this comment

tiran Jul 1, 2021

Choose a reason for hiding this comment

tiran Jul 1, 2021

Choose a reason for hiding this comment

tiran Jul 1, 2021

Choose a reason for hiding this comment

tiran Jul 1, 2021

Choose a reason for hiding this comment

Safihre commented Sep 29, 2021 • edited

dimaqq commented Jan 25, 2022

Safihre commented Jan 31, 2022

dimaqq commented Feb 1, 2022

hashbrowncipher commented Apr 20, 2021 •

edited by bedevere-bot

jdevries3133 commented Jul 1, 2021 •

edited

Safihre commented Sep 29, 2021 •

edited