bpo-44170: Fix UnicodeDecodeError with multibyte utf8 characters in ShareableList #26328

Open
wants to merge 2 commits into
base: main

Conversation

@junnplus junnplus commented May 24, 2021

This PR fixes a UnicodeDecodeError raised when a ShareableList contains strings with multibyte UTF-8 characters.

>>> from multiprocessing.shared_memory import ShareableList
>>> strings = ["Boom 💥 💥 💥"]
>>> ShareableList(strings)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Jun/.pyenv/versions/3.9.0/lib/python3.9/multiprocessing/shared_memory.py", line 479, in __repr__
    return f'{self.__class__.__name__}({list(self)}, name={self.shm.name!r})'
  File "/Users/Jun/.pyenv/versions/3.9.0/lib/python3.9/multiprocessing/shared_memory.py", line 435, in __getitem__
    v = back_transform(v)
  File "/Users/Jun/.pyenv/versions/3.9.0/lib/python3.9/multiprocessing/shared_memory.py", line 277, in <lambda>
    1: lambda value: value.rstrip(b'\x00').decode(_encoding),  # str
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 15: unexpected end of data

https://bugs.python.org/issue44170
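
For readers following along, here is a minimal sketch of what the traceback above seems to point at. It does not call ShareableList itself. It assumes (my assumption, not something stated in this thread) that each string slot is sized by rounding up to an 8-byte boundary, and that the size was computed from the character count rather than the encoded byte count. `ALIGNMENT` and `slot_size` are hypothetical names used only for illustration.

```python
import struct

# Assumption: mirrors the 8-byte alignment used for str/bytes slots.
ALIGNMENT = 8


def slot_size(length: int) -> int:
    """Round up to the next multiple of ALIGNMENT, always leaving a spare byte."""
    return ALIGNMENT * (length // ALIGNMENT + 1)


text = "Boom 💥 💥 💥"
encoded = text.encode("utf-8")

size_from_chars = slot_size(len(text))     # sized by character count
size_from_bytes = slot_size(len(encoded))  # sized by encoded byte count

# struct's "s" format silently truncates input that is longer than the slot,
# which can cut a multibyte character in half.
truncated = struct.pack(f"{size_from_chars}s", encoded)
try:
    truncated.rstrip(b"\x00").decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Sizing by encoded bytes leaves room for the whole string, so it round-trips.
intact = struct.pack(f"{size_from_bytes}s", encoded)
round_tripped = intact.rstrip(b"\x00").decode("utf-8")
```

Under these assumptions the string has 10 characters but 19 encoded bytes, so a slot sized from the character count holds only 16 bytes and the last emoji is cut after its first byte (0xf0 at position 15), which is consistent with the error message above.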

@the-knights-who-say-ni
Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@junnplus

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

-                                 if isinstance(value, str) else value)
-                if len(encoded_value) > allocated_length:
+                encoded_value = self._encode_value(value)
+                if len(encoded_value) >= allocated_length:

@junnplus junnplus May 24, 2021

>>> from multiprocessing.shared_memory import ShareableList
>>> s1 = ShareableList(['1234567'])
>>> s1.format
'8s'
>>> s2 = ShareableList(['12345678'])
>>> s2.format
'16s'
>>> s1[0] = '12345678'  # Is this behavior expected?
>>> s1.format
'8s'
>>> s3 = ShareableList(s1)
>>> s3.format
'16s'
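
The boundary case in the session above can be sketched without ShareableList. This is only an illustration under assumptions: `ALIGNMENT`, `slot_size`, and `fits` are hypothetical helpers, the rounding rule is my reading of how the `8s` and `16s` formats arise, and `reserve_spare_byte` stands in for the difference between the `>` and `>=` checks in the reviewed lines.

```python
# Assumption: slots are rounded up to the next 8-byte boundary,
# always leaving at least one spare byte when sized at construction.
ALIGNMENT = 8


def slot_size(length: int) -> int:
    """Slot size chosen at construction for an item of the given length."""
    return ALIGNMENT * (length // ALIGNMENT + 1)


def fits(encoded: bytes, allocated: int, reserve_spare_byte: bool) -> bool:
    """Whether an assignment into an existing slot would be accepted.

    reserve_spare_byte=False models rejecting only len > allocated.
    reserve_spare_byte=True models rejecting len >= allocated.
    """
    if reserve_spare_byte:
        return len(encoded) < allocated
    return len(encoded) <= allocated


seven = b"1234567"
eight = b"12345678"

slot_for_seven = slot_size(len(seven))  # slot created for the 7-byte item
slot_for_eight = slot_size(len(eight))  # slot a fresh list would create for 8 bytes

# Assigning the 8-byte value into the slot that was created for 7 bytes:
accepted_with_gt = fits(eight, slot_for_seven, reserve_spare_byte=False)
accepted_with_ge = fits(eight, slot_for_seven, reserve_spare_byte=True)
```

With these assumptions, construction gives an 8-byte slot for `'1234567'` and a 16-byte slot for `'12345678'`, while in-place assignment of the 8-byte value into the 8-byte slot is accepted under `>` and rejected under `>=`. That would explain why `s1.format` stays `'8s'` after the assignment while `ShareableList(s1)` yields `'16s'`.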

@github-actions
This PR is stale because it has been open for 30 days with no activity.

@github-actions bot added the stale label (Stale PR or inactive for long period of time) on Jun 27, 2021
@github-actions bot removed the stale label (Stale PR or inactive for long period of time) on Aug 9, 2022
4 participants