Skip to content

gh-73936: Support unicode passwords in smtplib #103611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

arnt
Copy link

@arnt arnt commented Apr 18, 2023

I see that two older pull requests do much the same thing, but weren't merged because of missing copyright assignments and perhaps missing tests. They also didn't quite implement the specification, which requires SASLprep. (SASLprep governs, basically, when two unicode logins/passwords are equal.)

AFAICT my tests cover the same ground as the tests in the other branches, without wasting CPU time on repeatedly testing the same thing. Precisely how much to repeat is of course a matter of taste.

Mongodb very graciously contributes its SASLprep implementation, and requests that he company be added in the contributors list. I'd like to make sure that that's done at the same time as this is merged.

I am doing this for my employer, ICANN, which doesn't strictly need to be added to the contributor list but perhaps it's nice. I can sign any and all copyright assignments necessary.

While writing this, I had the impression that some of the SASL work is repeated a little too much — that imaplib and smtplib ought to share SASL code. If you agree, I could try another (separate) PR for that.

This code was written by Steven Silvester and Bernie Hackett for MongoDB's
Python driver, and contributed to Python with kind permission from MongoDB.
@arnt arnt requested a review from a team as a code owner April 18, 2023 15:27
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ghost
Copy link

ghost commented Apr 18, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@arnt
Copy link
Author

arnt commented Apr 18, 2023

I have to go out now; will create and add a NEWS entry when I'm back. Sorry I missed that.

@arnt arnt force-pushed the unicode-passwords-smtplib branch from 28b78ca to c77adec Compare April 18, 2023 16:57
Formally speaking, this also requires support for SASLprep, RFC 4013.
MongoDB very kindly contributes its SASLprep implementation.
@arnt arnt force-pushed the unicode-passwords-smtplib branch from c77adec to 3263baa Compare April 18, 2023 17:13
@arhadthedev arhadthedev added stdlib Python modules in the Lib dir topic-email labels Apr 18, 2023
@behackett
Copy link

Happy to contribute. I assume I successfully signed the CLA, though the clabot application failed with a Heroku "Application error" message.

@arhadthedev
Copy link
Member

I assume I successfully signed the CLA, though the clabot application failed with a Heroku "Application error" message.

You're right, it's signed.

image

@arnt
Copy link
Author

arnt commented Apr 26, 2023

The CLA heroku app appeared to fail for me to, even though signing succeeded.

But there's worse. I wondered why it didn't ask Steven Silvester to sign. The answer is: Because I lost a Co-Authored-by: line when I fatfingered during git rebase -i. Mea culpa. Shall I add the line back and force-push?

@behackett
Copy link

Whatever CPython requires or would prefer? @blink1073 and I both work on PyMongo for MongoDB, and I have authority to speak for MongoDB in this case.

from typing import Any, Optional

try:
import stringprep
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringprep is available in all supported versions of python.
This check should be removed.

@gvanrossum
Copy link
Member

Shall I add the line back and force-push?

Can't tell if you already did so, but in general the guidance is to use merge and push, not rebase and force-push, to make things easier on the reviewers. We squash-merge when merging into main.

I am asking @ambv to take a look at the copyright issue presented here -- it's possible that we need an explicit corporate SLA from MongoDB?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this as Lib/_saslprep.py as we aren't offering this up as a public documented stdlib module and API.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine this being useful outside of smtplib. I wrote it originally for PyMongo. I would have loved to be able to just import it instead. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new stdlib module shouldn't be taken lightly.
Would it make sense to add the function to smtplib.py directly?

If you want this to be public documented API, then it should have documentation :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful in imaplib, at least, which also uses SASL. I'd love to refactor the SASL use im imaplib and smtplib, but that's outside the scope of this PR. That refactoring should probably make saslprep public, if it wasn't before? I have no opinion on whether saslprep should be public without that refactoring.

(Many thanks for all the other comments; I'll get back to this tomorrow. Meetings and another PR grabbed the mutex on my day today and kept it until now.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it living in its own file (which may be considered "required" for for license text reasons), but if this is to be a public API, I'd prefer to see the function exposed publicly from within an existing stdlib module, not a new top level module name.

It is a bit of a utility function. Another plausible stdlib module it could make sense to be exposed publicly from would be hashlib, given its purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be allowed to move the licence elsewhere? In fact we should probably add it to the long list of small licences...
After the code becomes part of Python, it will be improved, refactored and moved around. If that isn't allowed, we should IMO keep it as an external dependency.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I left some comments below.

Some them are only relevant if this is public API; if we won't document/expose it then the argument names/defaults don't matter that much.

IMO, it would make sense to expose the function. Even if it's internal, CPython devs would be on the hook for (security) issues in it. And I don't see a good reason to encourage the community to reimplement this.

(But that's just my opinion, please wait for Greg's reply before going to implement it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new stdlib module shouldn't be taken lightly.
Would it make sense to add the function to smtplib.py directly?

If you want this to be public documented API, then it should have documentation :)

:Returns:
The SASLprep'ed version of `data`.
"""
prohibited: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does nothing.

Suggested change
prohibited: Any

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be there to silence a type checker.

Type checkers in Python consider tuples structures, not immutable lists. Therefore concatenating tuples changes their type.

Instead of removing the line, I'd provide a better annotation.

Something like:

Suggested change
prohibited: Any
prohibited: tuple[Callable[[str], bool], ...]

stringprep.in_table_c9,
)

def saslprep(data: Any, prohibit_unassigned_code_points: Optional[bool] = True) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Any annotation does nothing. According to the docstring, it should be bytes|str. (But you can drop it altogether; we don't run typecheckers for the stdlib.)
Why is the bool Optional? I don't see a reason for allowing None here.

Comment on lines +59 to +62
- `prohibit_unassigned_code_points`: True / False. RFC 3454
and RFCs for various SASL mechanisms distinguish between
`queries` (unassigned code points allowed) and
`stored strings` (unassigned code points prohibited). Defaults
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, for the argument name it would be better to refer to the use case (something like store=True/False or intent='query'/'store'), and leave the effect as more of an implementation detail.
How was the default chosen? It seems that if you want False but forget it, you'll get subtly wrong (and possibly dangerous?) behaviour. If that's the case, there should be no default; users should always specify this.

def test_saslprep(self):
try:
import stringprep # noqa
except ImportError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @barry-scott said above, stringprep is always available in 3.13+.

self.assertRaises(ValueError, saslprep, "\u0627\u0031")

# Bytes strings are ignored.
self.assertEqual(saslprep(b"user"), b"user")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's test that with some non-UTF8 bytes?

Suggested change
self.assertEqual(saslprep(b"user"), b"user")
self.assertEqual(saslprep(b"user"), b"user")
self.assertEqual(saslprep(b"\0\xff"), b"\0\xff")

self.assertEqual(saslprep("\u2168"), "IX")
self.assertRaises(ValueError, saslprep, "\u0007")
self.assertRaises(ValueError, saslprep, "\u0627\u0031")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add the test cases from mongodb's JS library?

@gpshead
Copy link
Member

gpshead commented Jan 4, 2024

FYI - another version of saslprep function can be found in the PyPI passlib project, BSD licensed: https://foss.heptapod.net/python-libs/passlib/-/blob/branch/stable/passlib/utils/__init__.py#L401 via https://pypi.org/project/passlib/

@ambv
Copy link
Contributor

ambv commented Jan 4, 2024

I am asking @ambv to take a look at the copyright issue presented here -- it's possible that we need an explicit corporate SLA from MongoDB?

Yes, even though saslprep.py here is contributed as part of Arnt's PR, its original author is @behackett. Bernie, MongoDB needs to sign a corporate CLA (via this form by checking "signing this form on behalf of an organization").

@behackett
Copy link

I am asking @ambv to take a look at the copyright issue presented here -- it's possible that we need an explicit corporate SLA from MongoDB?

Yes, even though saslprep.py here is contributed as part of Arnt's PR, its original author is @behackett. Bernie, MongoDB needs to sign a corporate CLA (via this form by checking "signing this form on behalf of an organization").

I have signed the CLA on behalf of MongoDB, Inc., checking the organization box.

@encukou
Copy link
Member

encukou commented Jan 25, 2024

@arnt, are you planning to address the comments, or would you prefer someone else to push this PR forward?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review stdlib Python modules in the Lib dir topic-email
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants