Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-39087: Add PyUnicode_GetUTF8Buffer(). #17659
Conversation
def test_getutf8buffer(self): | ||
from _testcapi import unicode_getutf8buffer | ||
|
||
ascii = "foo" |
This comment has been minimized.
This comment has been minimized.
vstinner
Dec 19, 2019
Member
nitpick: "ascii" is the name of a builtin function, maybe rename it to "text".
self.assertEqual(c1 + 1, c2) | ||
|
||
mv.release() | ||
del mv |
This comment has been minimized.
This comment has been minimized.
vstinner
Dec 19, 2019
Member
Is "del mv" really needed for following tests? If yes, you might add a support.gc_collect() call.
self.assertEqual(unicode_getutf8buffer(bmp).tobytes(), b'\xc4\x80') | ||
self.assertEqual(unicode_getutf8buffer(bmp2).tobytes(), b'\xef\xbf\xbf') | ||
self.assertEqual(unicode_getutf8buffer(nonbmp).tobytes(), b'\xf4\x8f\xbf\xbf') | ||
self.assertRaises(UnicodeEncodeError, unicode_getutf8buffer, 'a\ud800b\udfffc') |
This comment has been minimized.
This comment has been minimized.
} | ||
// Instead of calling PyBuffer_Release(), move the ownership of the obj | ||
// to the memoryview object. | ||
((PyMemoryViewObject*)result)->mbuf->master.obj = buffer.obj; |
This comment has been minimized.
This comment has been minimized.
vstinner
Dec 19, 2019
Member
It seems a little strange to me that the function returns a Py_buffer, but you create a memory view and "hacks" the memory view internal object to return a view instead of the Py_buffer.
Maybe add two tests:
- one test in pure C which checks Py_buffer attributes
- one function exposing PyUnicode_GetUTF8Buffer() but returns a bytes object, just to check the UTF-8 encoded string: this function would call PyBuffer_Release()
Successful calls to :c:func:`PyUnicode_GetUTF8Buffer` must be paired with | ||
calls to :c:func:`PyBuffer_Release`, similar to :c:func:`PyObject_GetBuffer`. | ||
This comment has been minimized.
This comment has been minimized.
vstinner
Dec 19, 2019
Member
Since the motivation to add this new function was performance, would you mind to elaborate a little bit on that? Explain if a memory copy is needed. Explain if the function fills the internal UTF-8 cache if it's not already computed?
1, PyBUF_SIMPLE); | ||
} | ||
|
||
PyObject *bytes = _PyUnicode_AsUTF8String(unicode, errors); |
This comment has been minimized.
This comment has been minimized.
vstinner
Dec 19, 2019
Member
Would you mind to add a comment to explain that you don't fill the internal UTF-8 cache on purpose? We have so many ways to encode Unicode to UTF-8, I never recall which one fills the cache or not.
return -1; | ||
} | ||
|
||
if (PyUnicode_UTF8(unicode) != NULL) { |
methane commentedDec 19, 2019
•
edited by bedevere-bot
https://bugs.python.org/issue39087