gh-94526: getpath_dirname() no longer encodes the path #97645
base: main
Well. In fact, the issue is broader: not only _bootstrap_python is affected, any
51ee026 to f2235ae
I rebased and updated the PR to clarify that this issue affects the Python path configuration (sys.path creation).
Sadly, there are getpath_methods, which are injected into a namespace (dict) by the funcs_to_dict() function. It may be interesting to convert it to a regular extension module (
Misc/NEWS.d/next/Core and Builtins/2022-09-29-15-19-29.gh-issue-94526.wq5m6T.rst
-    const char *path;
-    if (!PyArg_ParseTuple(args, "s", &path)) {
+    PyObject *path;
+    if (!PyArg_ParseTuple(args, "U", &path)) {
BTW, I would use METH_O and PyArg_Parse() in these functions, but this is another issue.
Why cannot they be implemented in Python?
> BTW, I would use METH_O and PyArg_Parse() in these functions, but this is another issue.
I tried to minimize the changes.
> Why cannot they be implemented in Python?
Ask @zooba who designed this. Maybe it can be changed?
Fix the Python path configuration used to initialize sys.path at Python startup. Paths are no longer encoded to UTF-8/strict, which avoids encoding errors if they contain surrogate characters (bytes paths are decoded with the surrogateescape error handler). The getpath_basename() and getpath_dirname() functions no longer encode the path to UTF-8/strict, but work directly on Unicode strings. These functions now use PyUnicode_FindChar() and PyUnicode_Substring() on the Unicode path, rather than strrchr() on the encoded bytes string.
I rephrased the NEWS entry to omit function names. Is it better? I only named functions in the commit message.
Fix the Python path configuration used to initialize sys.path at
Python startup. The getpath_basename() and getpath_dirname() functions
no longer encode the path to UTF-8/strict, which avoids encoding
errors if the path contains surrogate characters (created by decoding
a bytes path with the surrogateescape error handler).
The functions now use PyUnicode_FindChar() and PyUnicode_Substring()
on the Unicode path, rather than strrchr() on the encoded bytes
string.
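
For illustration only, the failure mode and the string-level approach described in the commit message can be reproduced at the Python level. This is a rough sketch, not the code in the PR: the sample path and the dirname()/basename() helpers are hypothetical, str.rfind() and slicing merely stand in for PyUnicode_FindChar() and PyUnicode_Substring(), and the behaviour when no separator is found is an assumption of the sketch.

```python
# A bytes path containing a byte (0xff) that is not valid UTF-8.
# The path itself is a made-up example.
raw = b"/opt/py\xff/lib/python3.12"

# Decoding with surrogateescape maps the undecodable byte to a lone
# surrogate code point (U+DCFF) instead of raising an error.
path = raw.decode("utf-8", "surrogateescape")

# Encoding the result back with the strict handler fails, which is the
# kind of error the old "s" argument format could run into.
try:
    path.encode("utf-8")  # default error handler is "strict"
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False


def dirname(path, sep="/"):
    """Hypothetical stand-in for getpath_dirname(): search the str directly."""
    pos = path.rfind(sep)
    # Assumption of this sketch: return an empty string if no separator.
    return path[:pos] if pos >= 0 else ""


def basename(path, sep="/"):
    """Hypothetical stand-in for getpath_basename(): search the str directly."""
    pos = path.rfind(sep)
    # Assumption of this sketch: return the whole path if no separator.
    return path[pos + 1:] if pos >= 0 else path


print(strict_ok)
print(ascii(dirname(path)))
print(basename(path))
```

Because the helpers never encode the string, the lone surrogate passes through untouched, and the original bytes can still be recovered afterwards with path.encode("utf-8", "surrogateescape").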