Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39087: Add PyUnicode_GetUTF8Buffer(). #17659

Open
wants to merge 7 commits into
base: master
from

Conversation

@methane
Copy link
Member

methane commented Dec 19, 2019

Ref: bpo-39087

@methane methane requested a review from python/windows-team as a code owner Dec 19, 2019
@methane methane requested review from vstinner and serhiy-storchaka Dec 19, 2019
Lib/test/test_unicode.py Outdated Show resolved Hide resolved
Lib/test/test_unicode.py Outdated Show resolved Hide resolved
Lib/test/test_unicode.py Outdated Show resolved Hide resolved
Modules/_testcapimodule.c Outdated Show resolved Hide resolved
Doc/c-api/unicode.rst Show resolved Hide resolved
Objects/unicodeobject.c Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
methane added 3 commits Dec 20, 2019
from _testcapi import unicode_getutf8buffer, test_unicode_getutf8buffer

# Run tests wrtten in C. Raise an error when test failed.
test_unicode_getutf8buffer()

This comment has been minimized.

Copy link
@vstinner

vstinner Dec 20, 2019

Member

Hum. test_capi executes all test_capi attributes with a name starting with "test_". Please check if it's executed. If yes, just rename the method. For example, rename it to unicode_test_getutf8buffer()).

This comment has been minimized.

Copy link
@methane

methane Dec 20, 2019

Author Member

You are right. I confirmed the test_unicode_getutf8buffer is called from test_capi.

I just removed calling test_unicode_getutf8buffer here because there is test_unicode_compare_with_ascii already.
Let's run the test from test_capi.

@@ -1061,6 +1061,28 @@ These are the UTF-8 codec APIs:
raised by the codec.
.. c:function: int PyUnicode_GetUTF8Buffer(PyObject *unicode, const char errors, Py_buffer *view)

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 21, 2019

Member

I am not sure about the order of parameters. There is a logic in placing the output parameter at the end, but in PyObject_GetBuffer() (for which I modelled the name) it is not the last parameter. Also, if we will add more similar functions in future, with multiple additional parameters (like flags and errors), it is better to place them at the end.

}
Py_ssize_t refcnt = Py_REFCNT(str);

if (PyUnicode_GetUTF8Buffer(str, NULL, &buf) < 0) {

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 21, 2019

Member

It is not possible that PyUnicode_GetUTF8Buffer() fails for an ASCII string.

def test_getutf8buffer(self):
from _testcapi import unicode_getutf8buffer

ascii_ = "foo"

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 21, 2019

Member

Why not inline these variables?

This comment has been minimized.

Copy link
@methane

methane Dec 25, 2019

Author Member
  • Follow other tests
  • Variable name is a hint why this value is tested.
return PyBuffer_FillInfo(view, unicode,
PyUnicode_UTF8(unicode),
PyUnicode_UTF8_LENGTH(unicode),
1, PyBUF_SIMPLE);

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 21, 2019

Member
Suggested change
1, PyBUF_SIMPLE);
1 /* readonly */, PyBUF_SIMPLE);
}

if (PyUnicode_UTF8(unicode) != NULL) {
return PyBuffer_FillInfo(view, unicode,

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 21, 2019

Member

We should check that Py_TYPE(unicode)->tp_as_buffer == NULL. Just for the case if the string subclass implement the buffer protocol.

@vstinner

This comment has been minimized.

Copy link
Member

vstinner commented Dec 21, 2019

IMHO it's a bad old habit that test_capi runs "all" tests. I prefer that test_unicode tests the C API of Unicode objects. So rename the method instead.

{
PyObject *unicode;
const char *errors = NULL;
if(!PyArg_ParseTuple(args, "U|s", &unicode, &errors)) {

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

U calls PyUnicode_READY(). This can affect the testing. I suggest to use O.

"without exception set. (%s:%d)",
__FILE__, __LINE__);
}
Py_DECREF(str);

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

Why decref twice?

return NULL;
}

if (buf.obj == str || !PyBytes_CheckExact(buf.obj)) {

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

buf.obj == str is redundant.

// Test 3: There is a UTF-8 cache
// Reuse str of the previoss test.

const char *cache = PyUnicode_AsUTF8(str);

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

Would it be too difficult to test also that there was no cache before calling PyUnicode_AsUTF8()?

"without exception set. (%s:%d)",
__FILE__, __LINE__);
}
Py_DECREF(str);

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

Decrefed twice.

"Py_REFCNT(str) must not be changed. (%s:%d)",
__FILE__, __LINE__);
PyBuffer_Release(&buf);
Py_DECREF(str);

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Dec 25, 2019

Member

I think that is not safe to decref str and even call PyBuffer_Release() if this test fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.