
bpo-39500: Document PyUnicode_IsIdentifier() function #18280

Closed
wants to merge 4 commits into from
11 changes: 11 additions & 0 deletions Doc/c-api/unicode.rst
@@ -239,6 +239,17 @@ access internal read-only data of Unicode objects:
       Part of the old-style Unicode API, please migrate to using the
       :c:func:`PyUnicode_nBYTE_DATA` family of macros.
 
+.. c:function:: int PyUnicode_IsIdentifier(PyObject *o)
+
+   Return ``1`` if the string is a valid identifier according to the language
+   definition, section :ref:`identifiers`. Return ``0`` otherwise.
+
+   Raise an exception and return ``-1`` on error.
+
+   .. versionchanged:: 3.9
+      The function now returns ``-1`` on error, instead of calling
+      :c:func:`Py_FatalError`.
+
 
 Unicode Character Properties
 """"""""""""""""""""""""""""
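The `1`/`0` results documented for `PyUnicode_IsIdentifier` can be observed from Python through `str.isidentifier()`, whose implementation (`unicode_isidentifier_impl`, changed later in this diff) calls the C function. The following is a minimal sketch in Python rather than C, since the C API cannot run without an interpreter; the sample strings are illustrative assumptions, and the `-1` error path is not exercised here.

```python
import keyword

# Illustrative samples (not from the PR): string -> expected isidentifier() result.
samples = {
    "name": True,       # starts with a letter
    "_private": True,   # underscore is allowed as the first character
    "café": True,       # non-ASCII letters are permitted
    "": False,          # an empty string is not a valid identifier
    "1abc": False,      # a digit cannot start an identifier
    "a-b": False,       # '-' is not allowed anywhere in an identifier
}
results = {s: s.isidentifier() for s in samples}

# The check is purely lexical: reserved words still pass it, so callers that
# care about keywords need keyword.iskeyword() as a separate test.
kw_is_identifier = "def".isidentifier()
kw_is_keyword = keyword.iskeyword("def")
```

Note that the check says nothing about whether a name is reserved, which is why `"def"` passes it.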
4 changes: 4 additions & 0 deletions Doc/whatsnew/3.9.rst
@@ -339,6 +339,10 @@ Build and C API Changes
   functions are now required to build Python.
   (Contributed by Victor Stinner in :issue:`39395`.)
 
+* The :c:func:`PyUnicode_IsIdentifier` function now returns ``-1`` on error,
+  instead of calling :c:func:`Py_FatalError`.
+  (Contributed by Victor Stinner in :issue:`39500`.)
+
 
 Deprecated
 ==========
6 changes: 5 additions & 1 deletion Objects/typeobject.c
@@ -2241,7 +2241,11 @@ valid_identifier(PyObject *s)
                      Py_TYPE(s)->tp_name);
         return 0;
     }
-    if (!PyUnicode_IsIdentifier(s)) {
+    int identifier = PyUnicode_IsIdentifier(s);
+    if (identifier < 0) {
+        return 0;
+    }
+    if (!identifier) {
         PyErr_SetString(PyExc_TypeError,
                         "__slots__ must be identifiers");
         return 0;
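The `valid_identifier()` check in this hunk is what rejects bad `__slots__` entries when a class is created. A small Python sketch of the visible effect follows; `try_slots` is a hypothetical helper introduced only for illustration, and the class name `Example` is arbitrary.

```python
def try_slots(names):
    """Create a throwaway class with the given __slots__.

    Returns None on success, or the TypeError message on failure.
    (Hypothetical helper, not part of CPython.)
    """
    try:
        type("Example", (), {"__slots__": names})
    except TypeError as exc:
        return str(exc)
    return None


ok = try_slots(("x", "y"))          # valid identifiers: class is created
bad_name = try_slots(("1bad",))     # a string, but not an identifier
not_a_string = try_slots((1,))      # not a string at all
```

With the change above, an internal failure inside `PyUnicode_IsIdentifier` propagates its own exception instead of being reported as `__slots__ must be identifiers`.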
23 changes: 15 additions & 8 deletions Objects/unicodeobject.c
@@ -12209,13 +12209,13 @@ PyUnicode_IsIdentifier(PyObject *self)
     Py_UCS4 first;
 
     if (PyUnicode_READY(self) == -1) {
-        Py_FatalError("identifier not ready");
-        return 0;
+        return -1;
     }
 
-    /* Special case for empty strings */
-    if (PyUnicode_GET_LENGTH(self) == 0)
+    if (PyUnicode_GET_LENGTH(self) == 0) {
+        /* an empty string is not a valid identifier */
         return 0;
+    }
     kind = PyUnicode_KIND(self);
     data = PyUnicode_DATA(self);
 
@@ -12228,12 +12228,15 @@ PyUnicode_IsIdentifier(PyObject *self)
        to check just for these, except that _ must be allowed
        as starting an identifier. */
     first = PyUnicode_READ(kind, data, 0);
-    if (!_PyUnicode_IsXidStart(first) && first != 0x5F /* LOW LINE */)
+    if (!_PyUnicode_IsXidStart(first) && first != 0x5F /* LOW LINE */) {
         return 0;
+    }
 
-    for (i = 1; i < PyUnicode_GET_LENGTH(self); i++)
-        if (!_PyUnicode_IsXidContinue(PyUnicode_READ(kind, data, i)))
+    for (i = 1; i < PyUnicode_GET_LENGTH(self); i++) {
+        if (!_PyUnicode_IsXidContinue(PyUnicode_READ(kind, data, i))) {
             return 0;
+        }
+    }
     return 1;
 }
 
@@ -12250,7 +12253,11 @@ static PyObject *
 unicode_isidentifier_impl(PyObject *self)
 /*[clinic end generated code: output=fe585a9666572905 input=2d807a104f21c0c5]*/
 {
-    return PyBool_FromLong(PyUnicode_IsIdentifier(self));
+    int identifier = PyUnicode_IsIdentifier(self);
+    if (identifier < 0) {
+        return NULL;
+    }
+    return PyBool_FromLong(identifier);
 }
 
 /*[clinic input]
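The control flow of `PyUnicode_IsIdentifier` in `Objects/unicodeobject.c` (reject the empty string, test the first character against XID_Start or `_`, then test every remaining character against XID_Continue) can be mirrored in Python. This is only a sketch under stated assumptions: the function names below are hypothetical, and because the `unicodedata` module does not expose the XID properties, the per-character helpers lean on `str.isidentifier()` applied to one- and two-character strings instead of CPython's private `_PyUnicode_IsXidStart` / `_PyUnicode_IsXidContinue`.

```python
def is_start_char(ch):
    # A one-character string is an identifier exactly when that character
    # may begin one (XID_Start or the underscore).
    return ch.isidentifier()


def is_continue_char(ch):
    # Prefixing a known-valid start character isolates the "continue" rule.
    return ("_" + ch).isidentifier()


def is_identifier_sketch(s):
    """Mirror the C loop: empty check, first character, then the rest."""
    if len(s) == 0:
        # an empty string is not a valid identifier
        return False
    if not is_start_char(s[0]):
        return False
    for ch in s[1:]:
        if not is_continue_char(ch):
            return False
    return True


# Illustrative inputs; the sketch should agree with the built-in method.
candidates = ["", "a", "_", "_1", "1a", "a b", "a-b", "naïve", "π2", "x_y_z"]
agreement = all(is_identifier_sketch(c) == c.isidentifier() for c in candidates)
```

The sketch has only two outcomes; the third outcome of the C function (`-1` with an exception set) corresponds to an internal failure that has no counterpart in this pure-Python version.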
5 changes: 5 additions & 0 deletions Parser/tokenizer.c
@@ -1078,6 +1078,11 @@ verify_identifier(struct tok_state *tok)
         return 0;
     }
     result = PyUnicode_IsIdentifier(s);
+    assert(result >= 0);
+    if (result < 0) {
+        /* silently ignore error */
+        PyErr_Clear();
+    }
     Py_DECREF(s);
     if (result == 0)
         tok->done = E_IDENTIFIER;