-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-38912: fix close before connect callback on SSL tests #22691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-38912: fix close before connect callback on SSL tests #22691
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unable to replicate the failure under normal conditions (on Arch Linux, kernel 5.8.x). However, adding time.sleep(.1)
to the start of MyProto.connection_made()
resulted in the race condition occurring regularly:
======================================================================
ERROR: test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_events.py", line 1111, in test_create_server_ssl_verified
proto.transport.close()
AttributeError: 'NoneType' object has no attribute 'close'
Including the time.sleep(.1)
change, I ran ~500 iterations of test_create_server_ssl_verified
using the changes applied in this PR:
[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.709s
OK
0:02:28 load avg: 2.63 [565] test_asyncio.test_events passed
test_create_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
and ~500 iterations of test_create_unix_server_ssl_verified
:
[aeros:~/repos/cpython]$ ./python -m test -v test_asyncio.test_events --match test_create_unix_server_ssl_verified -j4 -F
...
Ran 3 tests in 0.215s
OK
0:01:23 load avg: 5.05 [508] test_asyncio.test_events passed
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.EPollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.PollEventLoopTests) ... ok
test_create_unix_server_ssl_verified (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
So, I think it's safe to say that it was indeed an issue with a race condition between the event loop and the test prototype's connection_made(), as suggested in the bpo issue, and that this patch addresses it. To be additionally certain that it no longer occurs on other platforms, I'll add a test-with-buildbots
label (note that other intermittent failures unrelated to this PR could potentially occur, so we'll be specifically looking out for relevant test_asyncio.test_events
issues).
Thanks for running them and help test, @aeros. Looks like all passed except for an ENV_CHANGED from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it looks like we reached a similar error within ssl_proto
(from warnings log in buildbot/PPC64LE RHEL8 Refleaks PR
):
...
./home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x7fffa21fc830>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Warning -- Unraisable exception
Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x7fffaa142c30>
/home/buildbot/buildarea/pull_request.cstratak-RHEL8-ppc64le.refleak/build/Lib/asyncio/selector_events.py:704: ResourceWarning: unclosed transport <_SelectorSocketTransport closing fd=9>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
At the very least though, I think that based on my local testing and buildbot fleet results, the failure occurs much less commonly now, which is a clear improvement from before. As the core dev that's probably the most experienced in chasing down buildbot failures, @vstinner, do you have any thoughts? My vote is +1, but I wouldn't mind a second opinion here.
I've decided to proceed with merging since it's a clear difference from before with the test failure happening significantly less often (even if it doesn't 100% address the issue). Thanks @JustinTArthur! |
Thanks @JustinTArthur for the PR, and @aeros for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <justinarthur@gmail.com>
GH-22799 is a backport of this pull request to the 3.9 branch. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it. (cherry picked from commit de73d43) Co-authored-by: Justin Turner Arthur <justinarthur@gmail.com>
GH-22800 is a backport of this pull request to the 3.8 branch. |
Thanks for the review and testing, @aeros. I hope to get some time to look at the remaining. We might be able to do a sweeping change to wrap cleanup activities in finally blocks or context managers for most of those tests to help ensure that only the leading exception is raised. |
pythonGH-22691) Reduces the rate at which the ENV CHANGED failure occurs in test_asyncio SSL tests (due to unclosed transport), but does not 100% resolve it.
As shown in bpo-38912 comments' log entries,
test_create_unix_server_ssl_verified
andtest_create_server_ssl_verified
could end up trying to close a transport they didn't have a reference to yet because a testProtocol
'sconnection_made
callback had yet to be called by the event loop:This PR ensures the callback is waited on before closing the connection to the client (and assuming that the desired SSL connectivity was established).
https://bugs.python.org/issue38912