New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace built-in hashlib with verified implementations from HACL* #99108
Comments
(missing context: there was a private email thread cc'ing ~7 of us including myself, @tiran, @alex, @msprotz & others that led to this issue and PR) in response to #98517 CVE happening with our old XKCP sha3 vendored code. One reason I might support this work across all of our required always supported hashes is that if these perform well, I'd be happy to see the If this sounds like the opposite of my statements over buried in the blake3 rejection thread - it's because I did not seek an alternative to the variety of vendored fallback hash algorithm implementations we carry there. So the goal in those statements was to lean towards simplicity instead of complexity. (which has already paid off via our XKCP removal in favor of tiny_sha3 by not having Python 3.11+ suffer from the sha3 XKCP CVE that the larger complex third party code in 3.6-3.10 did) We provide and ship native implementations for the hash algorithms anyways as we support builds without OpenSSL present, and support building on systems with libssl versions or variants that do not provide everything. Q: Would adopting this code increase our maintenance burden? A: Probably, yes. We'd presumably want to update it with more recent versions from hacl-star from time to time. Though the theory is that there should never be a security need to do so? In the past we've been asked to update vendored hash algorithm code from time to time but have hesitated due to the complexity and perceived risk, and assumed lack of reward given people who care about performance would presumably have a modern OpenSSL to get that from. Though if we did get rid of the OpenSSL wrapper code that'd also be a reduced burden. Q: What about performance? A: Good question. That needs continual measuring on our Tiered supported platforms. Don't focus solely on thruput. OpenSSL's EVP APIs are known to have silly-slow setup time. So something hashing a pile of 234 byte objects may be slow no matter how good OpenSSL's bulk data thruput is. Benchmark small message thruput as well as bulk data thruput. Parting thought: There is a broader desire to wean us off of our OpenSSL dependency. I realize that's a hard thing so long as the |
Thanks for your comment, Greg. Some related thoughts.
None of this is infeasible -- it's just a matter of deciding what is good for a Python-in-the-future where hashlib no longer depends on OpenSSL. For instance, you might say "let's maintain a portable C version, and one single ASM version that uses SHAEXT when available (fastest)", and this is something we could make happen within HACL*. Just to give you a sense of what we have right now, allow me to go into greater detail:
Hope this helps. As you said, this kind of change is a first step in the right direction. |
replacing hashlib primitives (for the non-OpenSSL case) with verified implementations from HACL*. This is the first PR in the series, and focuses specifically on SHA2-256 and SHA2-224. This PR imports Hacl_Streaming_SHA2 into the Python tree. This is the HACL* implementation of SHA2, which combines a core implementation of SHA2 along with a layer of buffer management that allows updating the digest with any number of bytes. This supersedes the previous implementation in the tree. @franziskuskiefer was kind enough to benchmark the changes: in addition to being verified (thus providing significant safety and security improvements), this implementation also provides a sizeable performance boost! ``` --------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- Sha2_256_Streaming 3163 ns 3160 ns 219353 // this PR LibTomCrypt_Sha2_256 5057 ns 5056 ns 136234 // library used by Python currently ``` The changes in this PR are as follows: - import the subset of HACL* that covers SHA2-256/224 into `Modules/_hacl` - rewire sha256module.c to use the HACL* implementation Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Alright, the first PR is in. Checkbox time:
|
Great! sha384/sha512 is a no-brainer and I'll send a followup PR very soon. We have SHA3 as well, so I can send that in, too, with the caveat that we have a few performance optimizations for sha3 in the works, so we can either wait for those to land in HACL* then submit the Python PR, or first land SHA3 in Python, then send a followup PR to refresh the HACL* code once the performance tweaks have landed. Happy to hear your thoughts on this. |
Replace the builtin hashlib implementations of SHA2-384 and SHA2-512 originally from LibTomCrypt with formally verified, side-channel resistant code from the [HACL*](https://github.com/hacl-star/hacl-star/) project. The builtins remain a fallback only used when OpenSSL does not provide them.
Given that the SHA3 implementation we currently ship is a tiny poor performing one, no need to wait. I expect we'll pull in updates to the HACL* implementations as needed in the future when there is a motivating reason. |
This builds HACL* as a library in one place. A followup to #101707 which broke some WASM builds. This fixes 2/4 of them, but the enscripten toolchain in the others don't deduplicate linker arguments and error out. A follow-on PR will address those.
This merges their code. They're backed by the same single HACL* star static library, having them be a single module simplifies maintenance. This should unbreak the wasm enscripten builds that currently fail due to linking in --whole-archive mode and the HACL* library appearing twice. Long unnoticed error fixed: `_sha512.SHA384Type` was doubly assigned and was actually SHA512Type. Nobody depends on those internal names.
This merges their code. They're backed by the same single HACL* static library, having them be a single module simplifies maintenance. This should unbreak the wasm enscripten builds that currently fail due to linking in --whole-archive mode and the HACL* library appearing twice. Long unnoticed error fixed: _sha512.SHA384Type was doubly assigned and was actually SHA512Type. Nobody depends on those internal names. Also rename LIBHACL_ make vars to LIBHACL_SHA2_ in preperation for other future HACL things.
Replaces our fallback non-OpenSSL MD5 and SHA1 implementations with those from HACL* as we've already done with SHA2.
* main: (76 commits) Fix syntax error in struct doc example (python#102160) pythongh-99108: Import MD5 and SHA1 from HACL* (python#102089) pythonGH-101777: `queue.rst`: use 2 spaces after a period to be consistent. (python#102143) Few coverage nitpicks for the cmath module (python#102067) pythonGH-100982: Restrict `FOR_ITER_RANGE` to a single instruction to allow instrumentation. (pythonGH-101985) pythongh-102135: Update turtle docs to rename wikipedia demo to rosette (python#102137) pythongh-99942: python.pc on android/cygwin should link to libpython per configure.ac (pythonGH-100356) pythongh-95672 fix typo SkitTest to SkipTest (pythongh-102119) pythongh-101936: Update the default value of fp from io.StringIO to io.BytesIO (pythongh-102100) pythongh-102008: simplify test_except_star by using sys.exception() instead of sys.exc_info() (python#102009) pythongh-101903: Remove obsolete undefs for previously removed macros Py_EnterRecursiveCall and Py_LeaveRecursiveCall (python#101923) pythongh-100556: Improve clarity of `or` docs (python#100589) pythongh-101777: Make `PriorityQueue` docs slightly clearer (python#102026) pythongh-101965: Fix usage of Py_EnterRecursiveCall return value in _bisectmodule.c (pythonGH-101966) pythongh-101578: Amend exception docs (python#102057) pythongh-101961 fileinput.hookcompressed should not set the encoding value for the binary mode (pythongh-102068) pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078) pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012) pythongh-101566: Sync with zipp 3.14. (pythonGH-102018) pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589) ...
Automerge-Triggered-By: GH:erlend-aasland
@gpshead I'll let you check MD5/SHA1 in the checklist above... quick question: what is the timeline for landing sha3? it requires a little bit of work on my side and I'll be traveling all March, so ideally I would submit the SHA3 PR in April. But if it's a dealbreaker because of e.g. the Python release schedule, I can scramble to try to submit it before then. Let me know. |
So long as we can get it committed before 3.12beta1 at the beginning of May (see https://peps.python.org/pep-0693/) we're good. So April for that is fine by me. No need to scramble. |
* main: (67 commits) pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308) pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333) pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331) pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309) Migrate to new PSF mailgun account (python#102284) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193) pythonGH-90744: Fix erroneous doc links in the sys module (python#101319) pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093) pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102) IDLE: Simplify DynOptionsMenu __init__code (python#101371) pythongh-101561: Add typing.override decorator (python#101564) pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843) pythongh-101773: Optimize creation of Fractions in private methods (python#101780) pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254) pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297) pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287) pythongh-101100: Fix sphinx warnings in `types` module (python#102274) pythongh-91038: Change default argument value to `False` instead of `0` (python#31621) pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283) [doc] Improve grammar/fix missing word (pythonGH-102060) ...
Sounds good, go ahead and work up a PR for blake2 HACL when you've got a chance. Don't fret about the 3.12beta1 feature freeze cutoff for blake2 - while our release manager has technically delayed that to two weeks from today, it isn't a big deal if a blake2 built-in update misses 3.12. |
) Replaces our built-in SHA3 implementation with a verified one from the HACL* project. This implementation is used when OpenSSL does not provide SHA3 or is not present. 3.11 shiped with a very slow tiny sha3 implementation to get off of the <=3.10 reference implementation that wound up having serious bugs. This brings us back to a reasonably performing built-in implementation consistent with what we've just replaced our other guaranteed available standard hash algorithms with: code from the HACL* project. --------- Co-authored-by: Gregory P. Smith <greg@krypto.org>
case sensitive filename
* main: (47 commits) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172) pythongh-103193: Improve `getattr_static` test coverage (python#104286) Trim trailing whitespace and test on CI (python#104275) pythongh-102500: Remove mention of bytes shorthand (python#104281) pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256) pythongh-99108: Replace SHA3 implementation HACL* version (python#103597) pythongh-104273: Remove redundant len() calls in argparse function (python#104274) pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200) pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266) pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244) pythongh-103650: Fix perf maps address format (python#103651) pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243) ...
* main: (29 commits) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172) pythongh-103193: Improve `getattr_static` test coverage (python#104286) Trim trailing whitespace and test on CI (python#104275) pythongh-102500: Remove mention of bytes shorthand (python#104281) pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256) pythongh-99108: Replace SHA3 implementation HACL* version (python#103597) ...
* main: (156 commits) pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304) pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441) pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096) pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217) pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312) pythongh-104240: return code unit metadata from codegen (python#104300) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) ...
* main: (35 commits) pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304) pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441) pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096) pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217) pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312) pythongh-104240: return code unit metadata from codegen (python#104300) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) ...
Refresh HACL* from upstream and add a SHA3 test hashing over 4GiB of data.
* main: pythongh-91896: Fixup some docs issues following ByteString deprecation (python#104422) pythonGH-104371: check return value of calling `mv.release` (python#104417) pythongh-104415: Fix refleak tests for `typing.ByteString` deprecation (python#104416) pythonGH-86275: Implementation of hypothesis stubs for property-based tests, with zoneinfo tests (python#22863) pythonGH-103082: Filter LINE events in VM, to simplify tool implementation. (pythonGH-104387) pythongh-93649: Split gc- and allocation tests from _testcapimodule.c (pythonGH-104403) pythongh-104389: Add 'unused' keyword to Argument Clinic C converters (python#104390) pythongh-101819: Prepare _io._IOBase for module state (python#104386) pythongh-104413: Fix refleak when super attribute throws AttributeError (python#104414) Fix refleak in `super_descr_get` (python#104408) pythongh-87526: Remove dead initialization from _zoneinfo parse_abbr() (python#24700) pythongh-91896: Improve visibility of `ByteString` deprecation warnings (python#104294) pythongh-104371: Fix calls to `__release_buffer__` while an exception is active (python#104378) pythongh-104377: fix cell in comprehension that is free in outer scope (python#104394) pythongh-104392: Remove _paramspec_tvars from typing (python#104393) pythongh-104396: uuid.py to skip platform check for emscripten and wasi (pythongh-104397) pythongh-99108: Refresh HACL* from upstream (python#104401) pythongh-104301: Allow leading whitespace in disambiguated pdb statements (python#104342)
Quick update: I took a look at our blake2 code and it doesn't expose enough of the parameterized API which Python currently offers for tree hashing. It's not a big deal, but means there's enough work on our end that Blake2 won't make it into 3.12 but rather into the next release... I'll post updates here once we make enough progress upstream. |
This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins.
This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins. Includes adding comments to better describe the ENTER/LEAVE macros purpose and explain the lock strategy in both existing and new code.
…ythonGH-104675) This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins. Includes adding comments to better describe the ENTER/LEAVE macros purpose and explain the lock strategy in both existing and new code. (cherry picked from commit 2e5d8a9) Co-authored-by: Gregory P. Smith <greg@krypto.org>
…H-104675) (#104776) gh-99108: Release the GIL around hashlib built-in computation (GH-104675) This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins. Includes adding comments to better describe the ENTER/LEAVE macros purpose and explain the lock strategy in both existing and new code. (cherry picked from commit 2e5d8a9) Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
Refresh HACL* from upstream to improve SHA2 performance and fix a 32-bit issue in SHA3.
Refresh HACL* from upstream to improve SHA2 performance and fix a 32-bit issue in SHA3. (cherry picked from commit 160321e) Co-authored-by: Jonathan Protzenko <protz@microsoft.com>
(cherry picked from commit 3a314f7) Co-authored-by: Gregory P. Smith <greg@krypto.org>
Feature or enhancement
We propose to replace the non-OpenSSL cryptographic primitives in hashlib with high-assurance, verified versions from the HACL* project.
Pitch
As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.
The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:
See https://hacl-star.github.io/Overview.html#what-is-verified-software for a longer description of how formal methods can help write high-assurance software and rule out entire classes of bugs.
The performance of HACL* is competitive with, and sometimes exceeds, that of OpenSSL. HACL* is distributed as pure C, and therefore is portable. Parts of HACL* have been adopted in Mozilla, Linux, the Tezos blockchain, and many more, thereby demonstrating that formally verified code is ready for production-time.
Previous discussion
Tagging @alex with whom I've informally discussed this.
The text was updated successfully, but these errors were encountered: