Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39413: os.unsetenv() uses _wputenv() on Windows #18115

Open
wants to merge 1 commit into
base: master
from

Conversation

@vstinner
Copy link
Member

vstinner commented Jan 22, 2020

os.unsetenv() uses _wputenv() rather than SetEnvironmentVariableW():
_wputenv() updates the CRT, whereas SetEnvironmentVariableW() does
not.

Replace also lambda functions with regular functions to get named
functions, to ease debug.

https://bugs.python.org/issue39413

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

This PR is a follow-up of PR #18107 which was a implementation in pure Python. @serhiy-storchaka wrote that he would prefer an implementation in C: so here you have.

@@ -10126,6 +10126,8 @@ os_putenv_impl(PyObject *module, PyObject *name, PyObject *value)
goto error;
}

/* Use _wputenv() rather than SetEnvironmentVariableW(): _wputenv()

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

The search for "=" (line 10102) should begin at index 0, not index 1. Windows allows variable names that begin with 0, but ucrt does not allow it. For example:

>>> os.putenv('spam=', 'eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: illegal environment variable name

>>> os.putenv('=spam', 'eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

This comment has been minimized.

Copy link
@vstinner

vstinner Jan 22, 2020

Author Member

I noticed this surprising difference and I'm trying to understand what should be the correct behavior...

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Jan 22, 2020

Member

Does it work with '=C'?

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

Python initializes nt.environ from C _wenviron, so it follows the long-standing convention of MSVC to exclude hidden variables that begin with "=" and to disallow setting them. We still set them for per-drive working directories (e.g. "=Z:"), but we set them in the process environment only, with SetEnvironmentVariableW. For example, see win32_wchdir in this module. The drive variables are supposed to be a hidden implementation detail. CMD's set doesn't show them, except due to a parsing bug if we run set "" or set ".

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

If you want to allow setting them, then Python should initialize nt.environ from WINAPI GetEnvironmentStringsW instead of _wenviron (or at least also from GetEnvironmentStringsW , just for the hidden variables). Then if a name begins with "=", call SetEnvironmentVariableW instead of _wputenv.

This comment has been minimized.

Copy link
@vstinner

vstinner Jan 22, 2020

Author Member

I rewrote my PR to only check for "=" in os.putenv(). os.unsetenv() doesn't check the variable name: just report _wputenv() error. Example:

>>> os.unsetenv("key=")

>>> os.unsetenv("=key") 
OSError: [Errno 22] Invalid argument

>>> os.unsetenv("=c:")  
OSError: [Errno 22] Invalid argument

By the way, why does os.putenv() explicitly rejects "key=" variable name?

This comment has been minimized.

Copy link
@serhiy-storchaka

serhiy-storchaka Jan 22, 2020

Member

What about os.unsetenv("key=value")?

By the way, why does os.putenv() explicitly rejects "key=" variable name?

Because in "key==value" what is a key and what is a value? key=="key" and value=="=value" or key=="key=" and value=="value"?

This comment has been minimized.

Copy link
@vstinner

vstinner Jan 22, 2020

Author Member

Windows is so weird. The official documentation says:

The name of an environment variable cannot include an equal sign (=).

I tested: I can set a variable with name "zlong=" (the name contains "="). It is shown by the set command for example:

>>> import os
>>> os.putenv("zlong=", "1")
>>> os.putenv("zlon=g", "2")
>>> os.system("set")
(...)
zlon=g=2
zlong==1

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

The posix_putenv_dict_setitem call below is unnecessary with _wputenv. It copies the string before adding it to _wenviron, i.e. it does not comply with POSIX putenv.

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

I checked back to even Python 2.7 w/ msvcr90.dll, and even that old version of the CRT copies the string that's passed to putenv (via calls to _calloc_crt and strcpy_s), which I verified by noting the result from _calloc_crt and cross-checking with the address returned by getenv. The CRT didn't used to copy the string back in the 1990s, but that's ancient history.

os.unsetenv() uses _wputenv() rather than SetEnvironmentVariableW():
_wputenv() updates the CRT, whereas SetEnvironmentVariableW() does
not.

Replace also lambda functions with regular functions to get named
functions, to ease debug.
@vstinner vstinner force-pushed the vstinner:unsetenv_win3 branch from dcc4729 to b109e8c Jan 22, 2020
os_putenv_impl(PyObject *module, PyObject *name, PyObject *value)
/*[clinic end generated code: output=d29a567d6b2327d2 input=ba586581c2e6105f]*/
static PyObject*
py_win_putenv(PyObject *name, PyObject *value)

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

Is py_win_ the naming convention from now on -- as opposed to win32_?

/* Search from index 1 because on Windows starting '=' is allowed for
defining hidden environment variables. */
if (PyUnicode_GET_LENGTH(name) == 0 ||
PyUnicode_FindChar(name, '=', 1, PyUnicode_GET_LENGTH(name), 1) != -1)

This comment has been minimized.

Copy link
@eryksun

eryksun Jan 22, 2020

Contributor

Again, this search should start at index 0, so that it consistently raises ValueError, instead of OSError if a name begins with "=".

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

Some notes on Windows environment variables:

  • There are two APIs:

    • native API: GetEnvironmentStringsW(), SetEnvironmentVariableW(), GetEnvironmentVariableW()
    • CRT API: _wenviron, _wgetenv(),_wputenv()
    • (I ignore the bytes API here on purpose, it's already too complicated :-))
  • There was a limit 32,767 characters (_MAX_ENV constant) for "key=value" string until Windows 2003: on more recent Windows, there is no technical limit: SetEnvironmentVariableW() doesn't fail for example. But "a batch file cannot set a variable that is longer than the maximum command line length."

  • The CRT API updates the CRT API and the native API, whereas native API doesn't update the CRT API.

  • "=" in name:

    • SetEnvironmentVariableW() only accepts "=" if its is a the name start: "=c:" and "=test" are accepted, "test=" or "te=st" are rejected
    • _wputenv() accepts "key=" and "te=st", but rejects "=c:" and "=test": "=" is not rejected if it's at the start of the name
  • empty name: SetEnvironmentVariableW() and _wputenv("", "") fails with Windows error 87 (invalid argument)

  • set shell command doesn't display variable names with a name starting with "=", but names with "=" in other places. My bet is that set uses the CRT API.

  • Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

@serhiy-storchaka

This comment has been minimized.

Copy link
Member

serhiy-storchaka commented Jan 22, 2020

There are also "secure" (or "safe"?) (but not thread-safe) _wgetenv_s() anf _wputenv_s(). _wputenv_s() is more convenient than _wputenv() because you do not need to concatenate key and value for it.

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

I created https://bugs.python.org/issue39420 to track this issue.

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

I'm thinking aloud.

If we switch Python to use the native API (SetEnvironmentVariableW()) instead of the CRT API (_wputenv()), C libraries run inside the Python process and using the CRT API (getenv() or _wgetenv()) would no longer see "updated" environment variables. For example, I guess that OpenSSL can read environment variable after Python start. It can be an issue if it happens. The safe option is to stick to the CRT API.

To support "=" at the beginning of the variable name, one option would be to use the native API (SetEnvironmentVariableW()), or the CRT API (_wputenv()) otherwise.

I tested _wgetenv(). It has a weird behavior :-) If I set a variable with _wputenv("victor=", "secret"), _wgetenv("victor") returns =secret and _wgetenv("victor=") returns secret!

_wgetenv("=c:") fails with "OSError: [WinError 6] Descripteur non valide".

@eryksun

This comment has been minimized.

Copy link
Contributor

eryksun commented Jan 22, 2020

There was a limit 32,767 characters (_MAX_ENV constant) for "key=value" string until Windows 2003: on more recent Windows, there is no technical limit: SetEnvironmentVariableW() doesn't fail for example.

The consequence for this is that in Vista and later GetEnvironmentStringsW now acquires the PEB lock to snapshot a copy of the current environment block. Otherwise the result would be worthless and dangerous, since the environment block can be reallocated to grow without limit.

_wputenv() accepts "key=" and "te=st", but rejects "=c:" and "=test": "=" is not rejected if it's at the start of the name

"=" is rejected by _wputenv if it's at the start of the name. It's allowed with "te=st" because Windows allows "=" in the value, e.g. the latter translates to SetEnvironmentVariableW(L"te", L"=st").

set shell command doesn't display variable names with a name starting with "=", but names with "=" in other places. My bet is that set uses the CRT API.

CMD actually implements this all on its own. And it has a parsing bug that leads to displaying the 'hidden' variables if you run set "" and set ".

Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

Python is right.

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

CMD actually implements this all on its own. And it has a parsing bug that leads to displaying the 'hidden' variables if you run set "" and set ".

Oh right, nice :-)

vstinner@WIN C:\vstinner\python\master>set ""
=C:=C:\vstinner\python\master
=ExitCode=00000001
(...)
@eryksun

This comment has been minimized.

Copy link
Contributor

eryksun commented Jan 22, 2020

_wputenv("victor=", "secret"), _wgetenv("victor") returns =secret and _wgetenv("victor=") returns secret!

Do you mean _wputenv(L"victor==secret")? The call only has one argument. In that case, yes, _wgetenv has a bug here. It will naively match "=" as part of the name (e.g. "victor=") instead of splitting the (name, value) on the first "=" and only matching the name, or better yet, it could simply fail the call if the given name contains "=".

@serhiy-storchaka

This comment has been minimized.

Copy link
Member

serhiy-storchaka commented Jan 22, 2020

_wputenv() takes a single argument.

int _wputenv(
   const wchar_t *envstring
);

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/putenv-wputenv?view=vs-2019#syntax

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

GetEnvironmentStringsA() and GetEnvironmentStringsW() result also contain variables with name starting with "=", like =C or =ExitCode variables. But I misunderstood the doc... which looks wrong as well... A variable name and value are separated by "=", not by a null character!

It seems like internally, Windows environment is a list of "name=value\0" strings. Variable name and value seem to be stored together as a single string.

Moreover, after _wputenv("victor=secret"):

  • _wgetenv("victor") and GetEnvironmentVariable("victor") return "=secret"
  • _wgetenv("victor=") and GetEnvironmentVariable("victor=") returns "secret"

... in short, it doesn't seem possible to distinguish if "=" character belongs to the name or the value, except for the special case of name starting with "=".

SetEnvironmentVariableW() allows a name starting with "=", but "=" is not allowed elsewhere in the same. SetEnvironmentVariableW("=a=b", "c") fails for example (WinError 87).

The problem comes from _wputenv() API which takes a single argument.

@vstinner

This comment has been minimized.

Copy link
Member Author

vstinner commented Jan 22, 2020

I failed to find CRT source code in my VS 2017 Community install. VC\crt\src directory is missing.

But I found ReactOS implementation which gives me an idea on how it can be implemented and are the constraints: https://doxygen.reactos.org/df/def/putenv_8c.html#adea7d1267d47562084ac102de11a11fb

  • CRT provides char **environ and char **_wenviron which are separated from the Windows envrionement. In short, technically there 3 environment! The CRT API tries to maintain the 3 consistent.
  • CRT copies the string: Python doesn't have to manage the variable memory!
@eryksun

This comment has been minimized.

Copy link
Contributor

eryksun commented Jan 22, 2020

It seems like internally, Windows environment is a list of "name=value\0" strings. Variable name and value seem to be stored together as a single string.

A valid environment block is also terminated by a final null character.

Moreover, after _wputenv("victor=secret"):

* _wgetenv("victor") and GetEnvironmentVariable("victor") return "=secret"

These are correct.

* _wgetenv("victor=") and GetEnvironmentVariable("victor=") returns "secret"

The _wgetenv result is a bug that I already confirmed that in a previous comment.

The GetEnvironmentVariable result is likely a similar error due to naive name matching. I think it was introduced circa Vista, and the old code from 2003 and earlier did it right. I'll have to check the ReactOS implementation of RtlQueryEnvironmentVariable_U (replaced in Vista by RtlQueryEnvironmentVariable) to see how ReactOS reverse-engineered behavior that's supposed to be compatible with Server 2003. If their code properly verifies the name match (see below), then it's a bug from Vista. They rewrote much of the environment code in Vista. It wouldn't surprise me if they made naive assumptions.

The problem comes from _wputenv() API which takes a single argument.

There is no problem with _wputenv and SetEnvironmentVariableW allowing values that contain "=". When parsing an environment block, if a name-value pair starts with "=", then "=" must be the first character of the name, because all variables must have a name. The first "=" other than the first character must be the value delimiter because all variables must have a value. Any "=" character after that must be part of the value.

@eryksun

This comment has been minimized.

Copy link
Contributor

eryksun commented Jan 22, 2020

I failed to find CRT source code in my VS 2017 Community install. VC\crt\src directory is missing.

Look in "%ProgramFiles(x86)%\Windows Kits\10\Source\10.0.[version]\ucrt\env".

@eryksun

This comment has been minimized.

Copy link
Contributor

eryksun commented Jan 22, 2020

I'll have to check the ReactOS implementation of RtlQueryEnvironmentVariable_U (replaced in Vista by RtlQueryEnvironmentVariable) to see how ReactOS reverse-engineered behavior that's supposed to be compatible with Server 2003.

The ReactOS implementation of RtlQueryEnvironmentVariable_U looks correct to me, so I think the bug is in Vista's new RtlQueryEnvironmentVariable function.

In the loop body, they first increment wcs to skip the first character of the name, and then search for "=" via wcschr(wcs, L'='). This is the delimiter, and everything after it up to the null is val. They slice out the name, not including the "=", into var and compare this to the target Name via RtlEqualUnicodeString. If it matches, they copy val to the target Value string, if it's big enough. This could be optimized to skip comparing names that are the wrong size, but it otherwise seems okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.