New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-35926: Add support for OpenSSL 1.1.1b on Windows #11779
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
PCbuild/openssl.vcxproj
Outdated
<ConfigurationType>Makefile</ConfigurationType> | ||
<Bitness>64</Bitness> | ||
<ArchName>amd64</ArchName> | ||
<OpenSSLPlatform>VC-WIN64A</OpenSSLPlatform> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appveyor.yml uses VC-WIN64A-masm for this build. Should this be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean appveyor in the OpenSSL repository? If so, yeah, we should match them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's what I meant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to VC-WIN64A-masm breaks the cpython build with a message that's not helpful:
Error: The operation could not be completed. Unspecified error
If I'm reading the configure files correctly then "VC-WIN64A" is the 64-bit equivalent of "VC-WIN32 no-asm" and it works with the cpython build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file shouldn't be being used in the CPython build - it's used to build OpenSSL separately and then we dynamically link to it (I might consider moving it to Tools
to better reflect this...). Maybe I'll have to try it and see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used in the regular build but I needed to build openssl with the patch the prepare_ssl.bat applies before I could test changing to openssl 1.1.1a. I assumed that this script is used to build the binaries for cpython-bin-deps based on the comments in PCBuild/readme.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this file isn't used at all for the official builds I've been doing (example). And it doesn't quite match how I've been building either... I'll take a look and see if I can get the build done.
The access violation was fixed by a clean build. The other failure looks like the value for x509 was incremented in the certificate store compared to the keycert.pem file that is checked in. I'm still trying to learn what this means. 176 != 177, |
test_load_default_certs_env_windows is failing in my test build because after calling _wputenv(…) from EnvironmentVarGaurd() in python, when openssl calls getenv the environment variable isn't set so the key that was added to the path with the environment variables isn't counted... still trying to figure out why this happens. Could python and openssl be linking to different CRTs even though they are both retail? This ended up being caused by using a debug openssl.dll with a release build of python. |
PCbuild/openssl.vcxproj
Outdated
<ConfigurationType>Makefile</ConfigurationType> | ||
<Bitness>64</Bitness> | ||
<ArchName>amd64</ArchName> | ||
<OpenSSLPlatform>VC-WIN64A</OpenSSLPlatform> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file shouldn't be being used in the CPython build - it's used to build OpenSSL separately and then we dynamically link to it (I might consider moving it to Tools
to better reflect this...). Maybe I'll have to try it and see what happens.
I ended up creating a new clone, rebuilding openssl with the current version of prepare_ssl, and building cpython clean. When I run the tests for x86, and amd64, debug and retail for each they all currently pass. |
Lib/test/test_ssl.py
Outdated
self.close() | ||
self.running = False | ||
except OSError as err: | ||
if 'peer did not return a certificate' in err.args[1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why I haven't run into this issue on Linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is coming from Winsock. Could there be a difference in the way Winsock handles certs and the way linux sockets handle certs that is causing this difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the tests that are failing are failing in the same way as Linux. I can tell because there a comments and expected exceptions in multiple places that say something like "sometimes connections fail with TLS 1.3"
The tests are opening a socket doing a handshake and the disconnecting the client before the session tickets are finished sending which results in various disconnect errors. A real client would send something before closing the session, which appears to allow the session tickets to get sent. In my mind these are bad tests.
There are also these issues in the openssl issues about the session ticket protocol: openssl/openssl#7948, openssl/openssl#7967, python-trio/trio#819 which don't really seem resolved.
What is needed to move this PR forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the first of those issues, it seems like we should fix our tests to be negotiating the close properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiran Why the OSError case is different on Linux I'm not sure. On windows the OSError I was trying to handle was SSLError exception (based on OSError). There is a race condition between the server getting the PEER_DID_NOT_RETURN_A_CERTIFICATE error and the client receiving the "tvs13 alert ..." error on Windows. One of them always arrives but the protocol doesn't look like it guarantees enough synchronicity to force one of them to be always be first.
@zooba The first 3 tests failures were caused by a bug in openssl so I have removed the try/excepts.
I have submitted patches for openssl
openssl/openssl#8590
python/cpython-source-deps#13
I queued up a build on pipelines here with the built binaries. It looked good apart from one thing that I'm about to comment on. |
Correction - not going to point it out on the file because apparently you haven't touched the file?
|
The -x64 is dropped when you change the openssl build type from VC-WIN64A to VC-WIN64A-masm. The props file seems to be the source of my mysterious build failure when I was trying out the VC-WIN64A-masm build. |
I'm not seeing why Bedevere thinks the NEWS file is incorrectly formatted - it looks fine to me? I renamed the PR to see if that helps, but apparently not (or maybe it needs a new commit to reevaluate?) Maybe try deleting and recreating the NEWS entry? Also specify in the text that this only updates Windows - @ned-deily will add his own entry for macOS. |
I also don't see anything wrong with the news entry and, more importantly, neither does blurb when I run it locally. I don't know what's going on on the bot side. If it persists, we may need help from @Mariatta or @brettcannon. |
Looks fine, I'll try adding the skip news and removing it, in case it retrigger the check. |
The news check failed because it is expecting the news entry to be more than 45 characters, so please re-write the news entry. |
Edit: the actual news entry has to be at least 30 characters. |
@Mariatta Can you say what is expecting the news entry to be that long and why? AFAIK, neither blurb nor the docs build imposes a requirement. |
@ned-deily There was discussion here: python/bedevere#127 |
I guess we now have a counter-example. Personally, I don’t see a need to enforce a limit in the bot; in the end the release manager will review and edit the news entries anyway prior to release. OTOH I suppose the item could be padded. |
I'm fine to tweak this to "as long as file not empty" if that's what we want. |
That’s one solution! I think the fact that we’ve all spent so much time on tracking this down and discussing it says to me that having the arbitrary limit is more trouble than it is worth. |
Still hoping for @tiran to take a look at the SSL test changes, but since I'll be in the same room as him in a couple of days I can "negotiate" more strongly. I think we want the OpenSSL update for sure though, so getting this done ought to be easy enough. |
@paulmon There's a new OpenSSL build available. Will that remove the need for the changes to the tests (i.e. we're throwing the right exception now)? |
That patch gets rid of the exceptions that were removed in this commit: 016a485. I looked at the other exceptions with wireshark and didn't spot a way to make them match what happens on linux. I'm taking another look at the changes to verify whether they are fixed by the new build. |
if self.server.chatty and support.verbose: | ||
sys.stdout.write(err.args[1]) | ||
# test_pha_required_nocert is expecting this exception | ||
raise ssl.SSLError('tlsv13 alert certificate required') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that sockets are blocking by default on Linux, and non-blocking on Windows. This could cause this difference, maybe. When I step through the socket creation code the socket is created on this line on Windows:
fd = WSASocketW(family, type, proto,
NULL, 0,
WSA_FLAG_OVERLAPPED | WSA_FLAG_NO_HANDLE_INHERIT);
WSA_FLAG_OVERLAPPED specifies a non-blocking socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the remaining errors are caused when the server side fails first, but would fail on the client first if the sockets were blocking
Okay, we may have to hold this for a bit longer. I'll see if I can get some people who know this area to look it over at PyCon. |
I think that would be really helpful. Thanks! |
I've run the test suite a few times without issues, and I'm doing a custom buildbot run right now. If that comes up clear, I see no reason to keep from merging this. |
@tiran Any last comments or concerns? |
Just noticed one more change to the Travis CI configuration. Which seems to work okay, but isn't part of enabling OpenSSL 1.1.1b on Windows so we'll let Christian do it when he's ready. |
Co-Authored-By: Steve Dower <steve.dower@microsoft.com>
@zooba Makes sense to me. Merged your suggested change. |
Sorry, @paulmon and @zooba, I could not cleanly backport this to |
GH-13350 is a backport of this pull request to the 3.7 branch. |
After a little more testing I discovered that these changes don't quite pass all of the tests yet.
I will be looking into these failures tomorrow
amd64 retail - 1 failure
amd64 debug - 0 failures
win32 retail - 1 failures
win32 debug - access violation halts test
test_parse_cert_CVE_2019_5010 only fails win32 debug (access violation)
works for amd64 debug/release and win32 release
test_load_default_certs_env_windows fails on win32 and amd64 retail. skipped on debug
https://bugs.python.org/issue35926