-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-73936: Support unicode passwords in smtplib #103611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This code was written by Steven Silvester and Bernie Hackett for MongoDB's Python driver, and contributed to Python with kind permission from MongoDB.
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
I have to go out now; will create and add a NEWS entry when I'm back. Sorry I missed that. |
28b78ca
to
c77adec
Compare
Formally speaking, this also requires support for SASLprep, RFC 4013. MongoDB very kindly contributes its SASLprep implementation.
c77adec
to
3263baa
Compare
Happy to contribute. I assume I successfully signed the CLA, though the clabot application failed with a Heroku "Application error" message. |
The CLA heroku app appeared to fail for me to, even though signing succeeded. But there's worse. I wondered why it didn't ask Steven Silvester to sign. The answer is: Because I lost a Co-Authored-by: line when I fatfingered during git rebase -i. Mea culpa. Shall I add the line back and force-push? |
Whatever CPython requires or would prefer? @blink1073 and I both work on PyMongo for MongoDB, and I have authority to speak for MongoDB in this case. |
from typing import Any, Optional | ||
|
||
try: | ||
import stringprep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringprep is available in all supported versions of python.
This check should be removed.
Can't tell if you already did so, but in general the guidance is to use merge and push, not rebase and force-push, to make things easier on the reviewers. We squash-merge when merging into main. I am asking @ambv to take a look at the copyright issue presented here -- it's possible that we need an explicit corporate SLA from MongoDB? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this as Lib/_saslprep.py
as we aren't offering this up as a public documented stdlib module and API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine this being useful outside of smtplib. I wrote it originally for PyMongo. I would have loved to be able to just import it instead. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new stdlib module shouldn't be taken lightly.
Would it make sense to add the function to smtplib.py
directly?
If you want this to be public documented API, then it should have documentation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful in imaplib, at least, which also uses SASL. I'd love to refactor the SASL use im imaplib and smtplib, but that's outside the scope of this PR. That refactoring should probably make saslprep public, if it wasn't before? I have no opinion on whether saslprep should be public without that refactoring.
(Many thanks for all the other comments; I'll get back to this tomorrow. Meetings and another PR grabbed the mutex on my day today and kept it until now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it living in its own file (which may be considered "required" for for license text reasons), but if this is to be a public API, I'd prefer to see the function exposed publicly from within an existing stdlib module, not a new top level module name.
It is a bit of a utility function. Another plausible stdlib module it could make sense to be exposed publicly from would be hashlib
, given its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be allowed to move the licence elsewhere? In fact we should probably add it to the long list of small licences...
After the code becomes part of Python, it will be improved, refactored and moved around. If that isn't allowed, we should IMO keep it as an external dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I left some comments below.
Some them are only relevant if this is public API; if we won't document/expose it then the argument names/defaults don't matter that much.
IMO, it would make sense to expose the function. Even if it's internal, CPython devs would be on the hook for (security) issues in it. And I don't see a good reason to encourage the community to reimplement this.
(But that's just my opinion, please wait for Greg's reply before going to implement it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new stdlib module shouldn't be taken lightly.
Would it make sense to add the function to smtplib.py
directly?
If you want this to be public documented API, then it should have documentation :)
:Returns: | ||
The SASLprep'ed version of `data`. | ||
""" | ||
prohibited: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does nothing.
prohibited: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be there to silence a type checker.
Type checkers in Python consider tuples structures, not immutable lists. Therefore concatenating tuples changes their type.
Instead of removing the line, I'd provide a better annotation.
Something like:
prohibited: Any | |
prohibited: tuple[Callable[[str], bool], ...] |
stringprep.in_table_c9, | ||
) | ||
|
||
def saslprep(data: Any, prohibit_unassigned_code_points: Optional[bool] = True) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Any
annotation does nothing. According to the docstring, it should be bytes|str
. (But you can drop it altogether; we don't run typecheckers for the stdlib.)
Why is the bool
Optional
? I don't see a reason for allowing None
here.
- `prohibit_unassigned_code_points`: True / False. RFC 3454 | ||
and RFCs for various SASL mechanisms distinguish between | ||
`queries` (unassigned code points allowed) and | ||
`stored strings` (unassigned code points prohibited). Defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, for the argument name it would be better to refer to the use case (something like store=True/False
or intent='query'/'store'
), and leave the effect as more of an implementation detail.
How was the default chosen? It seems that if you want False
but forget it, you'll get subtly wrong (and possibly dangerous?) behaviour. If that's the case, there should be no default; users should always specify this.
def test_saslprep(self): | ||
try: | ||
import stringprep # noqa | ||
except ImportError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @barry-scott said above, stringprep
is always available in 3.13+.
self.assertRaises(ValueError, saslprep, "\u0627\u0031") | ||
|
||
# Bytes strings are ignored. | ||
self.assertEqual(saslprep(b"user"), b"user") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's test that with some non-UTF8 bytes?
self.assertEqual(saslprep(b"user"), b"user") | |
self.assertEqual(saslprep(b"user"), b"user") | |
self.assertEqual(saslprep(b"\0\xff"), b"\0\xff") |
self.assertEqual(saslprep("\u2168"), "IX") | ||
self.assertRaises(ValueError, saslprep, "\u0007") | ||
self.assertRaises(ValueError, saslprep, "\u0627\u0031") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add the test cases from mongodb's JS library?
FYI - another version of |
Yes, even though |
I have signed the CLA on behalf of MongoDB, Inc., checking the organization box. |
@arnt, are you planning to address the comments, or would you prefer someone else to push this PR forward? |
I see that two older pull requests do much the same thing, but weren't merged because of missing copyright assignments and perhaps missing tests. They also didn't quite implement the specification, which requires SASLprep. (SASLprep governs, basically, when two unicode logins/passwords are equal.)
AFAICT my tests cover the same ground as the tests in the other branches, without wasting CPU time on repeatedly testing the same thing. Precisely how much to repeat is of course a matter of taste.
Mongodb very graciously contributes its SASLprep implementation, and requests that he company be added in the contributors list. I'd like to make sure that that's done at the same time as this is merged.
I am doing this for my employer, ICANN, which doesn't strictly need to be added to the contributor list but perhaps it's nice. I can sign any and all copyright assignments necessary.
While writing this, I had the impression that some of the SASL work is repeated a little too much — that imaplib and smtplib ought to share SASL code. If you agree, I could try another (separate) PR for that.