
gh-61456: Add Thai language codec aliases #15079


Merged: 7 commits merged into python:main on Apr 7, 2025

Conversation

btwood
Contributor

@btwood btwood commented Aug 2, 2019

Adding aliases for Thai language support. The current cp874 codec is an implementation of the Windows code page.
This will alias '874', 'ms874', and 'windows_874' to cp874, so that Thai text labeled with any of those names can be decoded.
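To see which of these names a given interpreter already knows, the alias table can be inspected directly. This is only a minimal sketch: the helper name `alias_status` is made up for illustration, and the result depends on the Python version in use.

```python
import encodings.aliases

# Alias names proposed in this PR. Keys in the alias table use underscores.
PROPOSED = ("874", "ms874", "windows_874")


def alias_status(names=PROPOSED):
    """Map each alias name to its registered codec, or None if it is absent.

    alias_status is a hypothetical helper, not part of the standard library.
    """
    table = encodings.aliases.aliases
    return {name: table.get(name) for name in names}


if __name__ == "__main__":
    for name, target in alias_status().items():
        print(f"{name!r} -> {target!r}")
```

Reading the table instead of calling `codecs.lookup()` avoids triggering a failed lookup, which the `encodings` package would then cache.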

https://bugs.python.org/issue17254

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@btwood
Contributor Author

btwood commented Aug 2, 2019

CLA Submitted.

Contributor

@eamanu eamanu left a comment


Some tests are failing. Please review them. Also, a NEWS entry may be necessary.

@btwood
Contributor Author

btwood commented Aug 5, 2019

Got all checks to pass. Ready for review.

@btwood
Contributor Author

btwood commented Sep 9, 2019

@eamanu Bumping this because it's been a while.

@matrixise matrixise self-assigned this Sep 13, 2019
@csabella csabella requested a review from malemburg January 10, 2020 22:47
@matrixise matrixise removed their assignment Feb 3, 2020
@btwood
Contributor Author

btwood commented May 5, 2020

Bumping again, and adding some details here.

To quote myself from the bug ticket:

cp874 != ibm_874 != iso_8859_11

What I can say is that the current cp874 codec is an implementation of the windows_874 code page. The codec module itself references the Microsoft code page, and also contains the appropriate characters (like EURO SIGN).

https://github.com/python/cpython/blob/master/Lib/encodings/cp874.py
""" Python Character Mapping Codec cp874 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT' with gencodec.py.

https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT

It seems appropriate to at least alias windows_874 with cp874. They are provably the same.
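The difference between cp874 and ISO 8859-11 can be checked from the interpreter. The sketch below uses only codecs that ship with Python (`cp874` and `iso8859_11`); it does not cover ibm_874, for which this example assumes no built-in codec is available.

```python
# Byte 0x80 is one place where the two codecs visibly differ.
probe = b"\x80"

as_cp874 = probe.decode("cp874")       # Windows code page 874
as_iso = probe.decode("iso8859_11")    # Python's codec name for ISO 8859-11

print(f"cp874:      U+{ord(as_cp874):04X}")
print(f"iso8859_11: U+{ord(as_iso):04X}")

# At least this Thai letter, KO KAI, sits at the same byte in both codecs.
ko_kai = b"\xa1"
same_thai = ko_kai.decode("cp874") == ko_kai.decode("iso8859_11")
print("Thai letter KO KAI matches:", same_thai)
```

With cp874 the byte decodes to U+20AC (EURO SIGN), while iso8859_11 yields the control character U+0080, so the two names should not be treated as interchangeable.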

@jasonm23

@malemburg please review or tap another.

@shamusfield-expel

I'll also throw a bump in here: the email package's contentmanager fails pretty frequently because this encoding is missing. For now I'm working around it with the following, taken from https://bugs.python.org/issue17254

    import encodings.aliases

    # Register the alias before the first lookup of 'windows-874'.
    aliases = encodings.aliases.aliases
    additional = {
        'windows_874': 'cp874'
    }
    aliases.update(additional)
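For context, here is a rough sketch of the email scenario this workaround targets. The message, the addresses, and the sample text are invented for illustration. The alias is added with `setdefault` so the snippet behaves the same whether or not the interpreter already ships it.

```python
import encodings.aliases
from email import message_from_bytes, policy

# Add the alias only if it is missing. This has to happen before the first
# lookup of the name, because the encodings package caches failed lookups.
encodings.aliases.aliases.setdefault("windows_874", "cp874")

# A Thai greeting, encoded the way a Windows mail client might send it.
THAI_TEXT = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# Hypothetical raw message; headers and addresses are made up.
RAW = (
    b"From: sender@example.com\r\n"
    b"To: receiver@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="windows-874"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + THAI_TEXT.encode("cp874")
    + b"\r\n"
)

msg = message_from_bytes(RAW, policy=policy.default)

# get_content() decodes the payload with the declared charset; without the
# alias, the codec lookup for "windows-874" raises LookupError.
body = msg.get_content()
print(body.strip())
```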

@btwood
Contributor Author

btwood commented Sep 4, 2023

@malemburg @ezio-melotti @csabella
I've updated the commit to meet current pipeline checks and standards.
Please let me know if there is anything else I can do.

I'm looking forward to squashing this years-long bug. In my work writing Python email parsers, we definitely saw this from Thai senders using Microsoft email clients.

Thanks!

@ambv ambv changed the title bpo-17254: Thai Language Aliases gh-61456: Thai Language Aliases Apr 7, 2025
@ambv ambv changed the title gh-61456: Thai Language Aliases gh-61456: Add Thai language codec aliases Apr 7, 2025
@ambv ambv merged commit 895d983 into python:main Apr 7, 2025
39 checks passed
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025

Co-authored-by: Łukasz Langa <lukasz@langa.pl>