Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-35926: Add support for OpenSSL 1.1.1b on Windows #11779

Merged
merged 19 commits into from May 15, 2019

Conversation

paulmon
Copy link
Contributor

@paulmon paulmon commented Feb 7, 2019

After a little more testing I discovered that these changes don't quite pass all of the tests yet.
I will be looking into these failures tomorrow

amd64 retail - 1 failure
amd64 debug - 0 failures
win32 retail - 1 failures
win32 debug - access violation halts test

test_parse_cert_CVE_2019_5010 only fails win32 debug (access violation)
works for amd64 debug/release and win32 release

test_load_default_certs_env_windows fails on win32 and amd64 retail. skipped on debug

https://bugs.python.org/issue35926

@paulmon paulmon requested a review from a team as a code owner Feb 7, 2019
@the-knights-who-say-ni
Copy link

the-knights-who-say-ni commented Feb 7, 2019

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

<ConfigurationType>Makefile</ConfigurationType>
<Bitness>64</Bitness>
<ArchName>amd64</ArchName>
<OpenSSLPlatform>VC-WIN64A</OpenSSLPlatform>
Copy link
Contributor Author

@paulmon paulmon Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appveyor.yml uses VC-WIN64A-masm for this build. Should this be updated?

Copy link
Member

@zooba zooba Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean appveyor in the OpenSSL repository? If so, yeah, we should match them.

Copy link
Contributor Author

@paulmon paulmon Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's what I meant

Copy link
Contributor Author

@paulmon paulmon Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to VC-WIN64A-masm breaks the cpython build with a message that's not helpful:
Error: The operation could not be completed. Unspecified error

If I'm reading the configure files correctly then "VC-WIN64A" is the 64-bit equivalent of "VC-WIN32 no-asm" and it works with the cpython build.

Copy link
Member

@zooba zooba Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file shouldn't be being used in the CPython build - it's used to build OpenSSL separately and then we dynamically link to it (I might consider moving it to Tools to better reflect this...). Maybe I'll have to try it and see what happens.

Copy link
Contributor Author

@paulmon paulmon Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used in the regular build but I needed to build openssl with the patch the prepare_ssl.bat applies before I could test changing to openssl 1.1.1a. I assumed that this script is used to build the binaries for cpython-bin-deps based on the comments in PCBuild/readme.txt

Copy link
Member

@zooba zooba Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this file isn't used at all for the official builds I've been doing (example). And it doesn't quite match how I've been building either... I'll take a look and see if I can get the build done.

@paulmon
Copy link
Contributor Author

paulmon commented Feb 7, 2019

The access violation was fixed by a clean build.

The other failure looks like the value for x509 was incremented in the certificate store compared to the keycert.pem file that is checked in. I'm still trying to learn what this means.

176 != 177,
AssertionError: {'x509': 176, 'crl': 0, 'x509_ca': 137} != {'x509': 177, 'crl': 0, 'x509_ca': 137}

@paulmon
Copy link
Contributor Author

paulmon commented Feb 7, 2019

test_load_default_certs_env_windows is failing in my test build because after calling _wputenv(…) from EnvironmentVarGaurd() in python, when openssl calls getenv the environment variable isn't set so the key that was added to the path with the environment variables isn't counted... still trying to figure out why this happens. Could python and openssl be linking to different CRTs even though they are both retail?

This ended up being caused by using a debug openssl.dll with a release build of python.

Lib/test/test_ssl.py Outdated Show resolved Hide resolved
Lib/test/test_ssl.py Outdated Show resolved Hide resolved
<ConfigurationType>Makefile</ConfigurationType>
<Bitness>64</Bitness>
<ArchName>amd64</ArchName>
<OpenSSLPlatform>VC-WIN64A</OpenSSLPlatform>
Copy link
Member

@zooba zooba Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file shouldn't be being used in the CPython build - it's used to build OpenSSL separately and then we dynamically link to it (I might consider moving it to Tools to better reflect this...). Maybe I'll have to try it and see what happens.

@paulmon
Copy link
Contributor Author

paulmon commented Feb 8, 2019

I ended up creating a new clone, rebuilding openssl with the current version of prepare_ssl, and building cpython clean. When I run the tests for x86, and amd64, debug and retail for each they all currently pass.

Copy link
Member

@tiran tiran left a comment

Thanks for working on this. For some unknown reason, Windows behaves slightly different than Linux. I wonder what's going on.

#10031 adds some debug helpers to analyse the protocol. I'll try to land the PR first, then you can use it to diagnose the handshake.

Lib/test/test_ssl.py Outdated Show resolved Hide resolved
self.close()
self.running = False
except OSError as err:
if 'peer did not return a certificate' in err.args[1]:
Copy link
Member

@tiran tiran Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why I haven't run into this issue on Linux.

Copy link
Contributor Author

@paulmon paulmon Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is coming from Winsock. Could there be a difference in the way Winsock handles certs and the way linux sockets handle certs that is causing this difference?

Copy link
Contributor Author

@paulmon paulmon Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the tests that are failing are failing in the same way as Linux. I can tell because there a comments and expected exceptions in multiple places that say something like "sometimes connections fail with TLS 1.3"

The tests are opening a socket doing a handshake and the disconnecting the client before the session tickets are finished sending which results in various disconnect errors. A real client would send something before closing the session, which appears to allow the session tickets to get sent. In my mind these are bad tests.

There are also these issues in the openssl issues about the session ticket protocol: openssl/openssl#7948, openssl/openssl#7967, python-trio/trio#819 which don't really seem resolved.

What is needed to move this PR forward?

Copy link
Member

@zooba zooba Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the first of those issues, it seems like we should fix our tests to be negotiating the close properly.

Copy link
Contributor Author

@paulmon paulmon Mar 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran Why the OSError case is different on Linux I'm not sure. On windows the OSError I was trying to handle was SSLError exception (based on OSError). There is a race condition between the server getting the PEER_DID_NOT_RETURN_A_CERTIFICATE error and the client receiving the "tvs13 alert ..." error on Windows. One of them always arrives but the protocol doesn't look like it guarantees enough synchronicity to force one of them to be always be first.

@zooba The first 3 tests failures were caused by a bug in openssl so I have removed the try/excepts.
I have submitted patches for openssl
openssl/openssl#8590
python/cpython-source-deps#13

@zooba
Copy link
Member

zooba commented Feb 10, 2019

I queued up a build on pipelines here with the built binaries. It looked good apart from one thing that I'm about to comment on.

@zooba
Copy link
Member

zooba commented Feb 10, 2019

Correction - not going to point it out on the file because apparently you haven't touched the file?

PCbuild/openssl.props needs updating. In particular, x64 is no longer added to the filenames for the 64-bit build (though -arm and -arm64 are added for those builds).

@paulmon
Copy link
Contributor Author

paulmon commented Feb 11, 2019

The -x64 is dropped when you change the openssl build type from VC-WIN64A to VC-WIN64A-masm. The props file seems to be the source of my mysterious build failure when I was trying out the VC-WIN64A-masm build.

@zooba zooba changed the title bpo-35926code and test changes for OpenSSL 1.1.1a for Windows bpo-35926: Add support for OpenSSL 1.1.1a on Windows Feb 11, 2019
@zooba
Copy link
Member

zooba commented Feb 11, 2019

I'm not seeing why Bedevere thinks the NEWS file is incorrectly formatted - it looks fine to me? I renamed the PR to see if that helps, but apparently not (or maybe it needs a new commit to reevaluate?)

Maybe try deleting and recreating the NEWS entry? Also specify in the text that this only updates Windows - @ned-deily will add his own entry for macOS.

@ned-deily
Copy link
Member

ned-deily commented Feb 11, 2019

I also don't see anything wrong with the news entry and, more importantly, neither does blurb when I run it locally. I don't know what's going on on the bot side. If it persists, we may need help from @Mariatta or @brettcannon.

@Mariatta
Copy link
Sponsor Member

Mariatta commented Feb 11, 2019

Looks fine, I'll try adding the skip news and removing it, in case it retrigger the check.

@Mariatta
Copy link
Sponsor Member

Mariatta commented Feb 11, 2019

The news check failed because it is expecting the news entry to be more than 45 characters, so please re-write the news entry.

@Mariatta
Copy link
Sponsor Member

Mariatta commented Feb 11, 2019

Edit: the actual news entry has to be at least 30 characters.

@ned-deily
Copy link
Member

ned-deily commented Feb 11, 2019

@Mariatta Can you say what is expecting the news entry to be that long and why? AFAIK, neither blurb nor the docs build imposes a requirement.

@Mariatta
Copy link
Sponsor Member

Mariatta commented Feb 11, 2019

@ned-deily There was discussion here: python/bedevere#127

@ned-deily
Copy link
Member

ned-deily commented Feb 11, 2019

I guess we now have a counter-example. Personally, I don’t see a need to enforce a limit in the bot; in the end the release manager will review and edit the news entries anyway prior to release. OTOH I suppose the item could be padded.

@Mariatta
Copy link
Sponsor Member

Mariatta commented Feb 11, 2019

🤔 add "Patch by Paul Monson."? 😅

I'm fine to tweak this to "as long as file not empty" if that's what we want.

@ned-deily
Copy link
Member

ned-deily commented Feb 11, 2019

That’s one solution!

I think the fact that we’ve all spent so much time on tracking this down and discussing it says to me that having the arbitrary limit is more trouble than it is worth.

@zooba
Copy link
Member

zooba commented Apr 29, 2019

Still hoping for @tiran to take a look at the SSL test changes, but since I'll be in the same room as him in a couple of days I can "negotiate" more strongly.

I think we want the OpenSSL update for sure though, so getting this done ought to be easy enough.

@zooba
Copy link
Member

zooba commented Apr 29, 2019

@paulmon There's a new OpenSSL build available. Will that remove the need for the changes to the tests (i.e. we're throwing the right exception now)?

@paulmon
Copy link
Contributor Author

paulmon commented Apr 29, 2019

That patch gets rid of the exceptions that were removed in this commit: 016a485. I looked at the other exceptions with wireshark and didn't spot a way to make them match what happens on linux. I'm taking another look at the changes to verify whether they are fixed by the new build.

if self.server.chatty and support.verbose:
sys.stdout.write(err.args[1])
# test_pha_required_nocert is expecting this exception
raise ssl.SSLError('tlsv13 alert certificate required')
Copy link
Contributor Author

@paulmon paulmon Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that sockets are blocking by default on Linux, and non-blocking on Windows. This could cause this difference, maybe. When I step through the socket creation code the socket is created on this line on Windows:

fd = WSASocketW(family, type, proto,
                       NULL, 0,
                       WSA_FLAG_OVERLAPPED | WSA_FLAG_NO_HANDLE_INHERIT);

WSA_FLAG_OVERLAPPED specifies a non-blocking socket.

Copy link
Contributor Author

@paulmon paulmon May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the remaining errors are caused when the server side fails first, but would fail on the client first if the sockets were blocking

@zooba
Copy link
Member

zooba commented May 1, 2019

Okay, we may have to hold this for a bit longer.

I'll see if I can get some people who know this area to look it over at PyCon.

@paulmon
Copy link
Contributor Author

paulmon commented May 1, 2019

I'll see if I can get some people who know this area to look it over at PyCon.

I think that would be really helpful. Thanks!

@zooba
Copy link
Member

zooba commented May 14, 2019

I've run the test suite a few times without issues, and I'm doing a custom buildbot run right now. If that comes up clear, I see no reason to keep from merging this.

@zooba
Copy link
Member

zooba commented May 14, 2019

@tiran Any last comments or concerns?

.travis.yml Outdated Show resolved Hide resolved
@zooba
Copy link
Member

zooba commented May 15, 2019

Just noticed one more change to the Travis CI configuration. Which seems to work okay, but isn't part of enabling OpenSSL 1.1.1b on Windows so we'll let Christian do it when he's ready.

Co-Authored-By: Steve Dower <steve.dower@microsoft.com>
@paulmon
Copy link
Contributor Author

paulmon commented May 15, 2019

@zooba Makes sense to me. Merged your suggested change.

@miss-islington
Copy link
Contributor

miss-islington commented May 15, 2019

Thanks @paulmon for the PR, and @zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒🤖

@miss-islington
Copy link
Contributor

miss-islington commented May 15, 2019

Sorry, @paulmon and @zooba, I could not cleanly backport this to 3.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker fb7e7505ed1337bf40fa7b8b68317d1e86675a86 3.7

@bedevere-bot
Copy link

bedevere-bot commented May 15, 2019

GH-13350 is a backport of this pull request to the 3.7 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants