Running tests in parallel on Windows quits too soon #95027
Sample failure.
Probably https://websiteforstudents.com/how-to-change-system-locale-in-windows-11/. Another workaround would be using cpython/Lib/test/libregrtest/runtest_mp.py, lines 261 to 266 at commit 547f0bb.
I'm not sure what the root cause of the race condition is when running
Maybe related: gh-91227
See also: gh-91323 (specific to 3.11 and main)
Multiprocessing people: due to some regression in 3.11/2, parallel tests on my updated but otherwise pretty stock American Win10 machine started failing 29 days ago. They still fail today with essentially the same traceback. I believe they ran fine not too many months before. @pablogsal Today, sequential tests also fail, hanging for hours in test_winconsoleio after taking 81 minutes to get that far. EDIT: This is with plain
The second sequential run was again fine, so forget winconsoleio. (I have no idea why the first run could have gone so badly.) Do the tests run in parallel on your non-Windows machine? To me, the output pasted above reveals two bugs in the testing program.
git bisect wants a command that returns a 0/non-0 exit code. Though it would not help here, due to the fake 'success', is there a way to run regrtest, suppress printing, and get an exit code instead? I looked at the 'Special runs' options and could not find anything. I do not know git beyond the devguide chapter, so I would need some coaching even to find a good version for bisect (other than by manually downloading and re-installing earlier releases). What would be a good way to get an exit-code command/script?
You can run git bisect manually and inspect every commit and then run
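One way to answer the exit-code question is a small wrapper script for `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below is hypothetical: the file name bisect_check.py, the marker strings, and the interpreter-path argument are assumptions, not regrtest features. Because the parallel run can report a false SUCCESS, it scans the output for the symptoms described in this issue instead of trusting regrtest's own exit code. It also assumes the interpreter for each commit has already been built.

```python
import subprocess
import sys

# Symptoms from this issue that mean "bad" even if regrtest exits 0 (assumption).
BAD_MARKERS = ("UnicodeDecodeError", "tests omitted")


def classify(returncode: int, output: str) -> int:
    """Map one test run to a git-bisect exit code: 0 = good, 1 = bad."""
    if returncode != 0:
        return 1
    if any(marker in output for marker in BAD_MARKERS):
        return 1
    return 0


def main(argv: list[str]) -> int:
    # argv[1] (hypothetical convention): path to the interpreter built for the
    # commit under test; falls back to the interpreter running this script.
    python = argv[1] if len(argv) > 1 else sys.executable
    try:
        proc = subprocess.run(
            [python, "-m", "test", "-j0"],
            capture_output=True,
            text=True,
            errors="replace",  # never let the wrapper itself die on bad bytes
        )
    except OSError:
        return 125  # interpreter missing or not runnable: ask git to skip
    return classify(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Assuming the script is saved outside the checkout, it would be used as `git bisect run python bisect_check.py <path-to-built-python>`.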
It's unlikely to be UTF-8 vs mbcs, but could well be that it's decoding with The additional error appears to be in the
@zooba Did you or anyone else verify this problem on a machine other than mine, Windows or otherwise? At this point, I do not think that this should be a release blocker. If we are really concerned about tests running on installations, we need a more concerted effort to test installers. I installed the rc1 on my MacBook Air, ran the test suite both serially and in parallel, and it failed both times with a malloc error. I ran test_idle alone and it hangs on the second call to an IDLE test helper function. It does the same in the current 3.10 and 3.9 (prior to anything released today). (There is something odd in the debug print behavior, so I still do not know exactly where the failure is.) These two failures are different from the one reported here.
Sorry for the delay. I get the same result as you when testing locally, even when running tests from the build directory. I haven't got any more details than my analysis above; I'm trying again with less strict encoding errors to see what happens.
There's one possible fix. There are likely other ways of doing it, so if someone has an approach they'd really rather see, feel free to send a different PR.
Steve's patch seems to fix the premature quit problem in repository main (see PR). There is a separate issue that test_io fails when run in a subprocess (
Here's the proximate cause of the original failure report. It took weeks for me to figure this out in the background, because when all the tests are running, this failure shows up at an unpredictable time, and nothing in the warning messages says which test caused it. Instead, the whole test run just dies abruptly, and cascades of irrelevant errors are also produced, because the runner tries to shut down cleanly but keeps bashing into attempts to delete files that are still open (a no-no on Windows) due to whatever other tests are running concurrently. It's
Changing the encoding on
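The shutdown cascade described above hinges on a Windows-specific rule: a file that is still open cannot be deleted. The following sketch (a hypothetical helper, not part of libregrtest) shows the difference. On Windows, Python's open() does not grant delete sharing, so os.remove() on a still-open file raises PermissionError; on POSIX systems the unlink succeeds even though the file is open.

```python
import os
import tempfile


def remove_while_open() -> bool:
    """Return True if deleting a still-open file raises PermissionError."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    blocked = False
    with open(path, "w") as f:
        f.write("still in use")
        try:
            os.remove(path)  # POSIX: unlinks the name even though f is open
        except PermissionError:
            blocked = True  # Windows: the file is in use by a process
    if os.path.exists(path):
        os.remove(path)  # f is closed now, so cleanup works everywhere
    return blocked
```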
Regression introduced recently by #94253, if I followed correctly.
Good news: we can stop running it now. I'm not sure why it's still in there. I guess the PR to remove distutils stalled on something...
@vstinner I believe so; see my #96669 patch for one workaround, but feel free to propose something else if you prefer. Possibly UTF-8 mode is sufficient and can be set with an -X option? But I think we want to override the errors as well.
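A minimal sketch of the two ideas in this comment, assuming a child interpreter launched via subprocess (the helper name run_child_utf8 is hypothetical, and this is not the #96669 patch): force the child into UTF-8 mode with the -X utf8 option, and have the parent decode with errors="backslashreplace" so that a stray byte shows up as an escape instead of raising UnicodeDecodeError. PYTHONIOENCODING is dropped from the child's environment because it would otherwise take precedence over the UTF-8 mode stream encoding.

```python
import os
import subprocess
import sys


def run_child_utf8(code: str) -> str:
    """Run `code` in a child interpreter forced into UTF-8 mode (-X utf8)."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # would override UTF-8 mode for stdio
    proc = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", code],
        capture_output=True,
        env=env,
        check=True,
    )
    # Lenient decode: a non-UTF-8 byte becomes a \xNN escape instead of
    # raising UnicodeDecodeError in the parent.
    return proc.stdout.decode("utf-8", errors="backslashreplace")
```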
I'm not sure if this is related, but we have a similar regression when running Django tests in parallel on Windows with Python 3.11.
See logs.
Via the discussion in #96669, I tried the Django test-case with the
I don't know if that's quite sufficient. Same result as reported by @felixxm. (This is with 3.11b5 and 3.11rc2; it works without error with 3.10.)
This bug tracker is for Python issues. Your issue is unrelated. Please open an issue at: https://github.com/django/django
Hi Victor, We're maintaining Django
Hi Victor,
This is the exact error from the opening pair of comments. As @felixxm says, we're reporting an apparent regression in a pre-release version, as we've been asked to do. Sorry if that wasn't clear from the comments.
Thanks, but we've narrowed this one down to our own test suite. If you're reusing libregrtest in your tests, then you should get the fix automatically, but if you aren't (and I assume you're not, because pytest is miles better than libregrtest |
OK, thanks @zooba. I was hoping to get to it today, but it's on my list for tomorrow now to work out exactly which release introduced the change, and we'll open a new issue with at least reproduction steps for your consideration.
You report an issue about a PermissionError on a SQLite database, whereas this issue is about a UnicodeDecodeError. I don't see how they could be related. Moreover, Django uses its own test runner ( Since the error message is different, please open a separate issue. Yes, it's possible that it's a Python regression, but someone has to analyze the issue to make sure that it's a Python regression, and not something else.
OK, it'll be with you in the morning. Thanks @vstinner.
I created issue #98219 about this annoying PermissionError.
I can reproduce this issue on Windows on the main branch with:
Output:
Encodings used by libregrtest on Windows. Before commit 199ba23:
At commit 199ba23:
libregrtest now uses
Another difference on Windows:
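For comparing encodings like these on a particular machine, a short diagnostic can help. This is a sketch that only reads values the standard library exposes; it is not the tooling used to produce the comparisons in this thread, and encoding_report is a hypothetical name. On Python 3.11+, locale.getencoding() is an alternative that ignores UTF-8 mode.

```python
import locale
import sys


def encoding_report() -> dict[str, object]:
    """Collect the encoding settings relevant to this issue (diagnostic sketch)."""
    return {
        "platform": sys.platform,
        "utf8_mode": sys.flags.utf8_mode,
        # On Windows, the ANSI code page (e.g. cp1252) unless UTF-8 mode is on.
        "locale_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```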
I'm working on a fix, but first I'm trying to add a test to test_regrtest which reproduces the issue ;-)
I wrote PR #98492 to fix the issue.
On Windows, when the Python test suite is run with the -jN option, the ANSI code page is now used as the encoding for the stdout temporary file, rather than using UTF-8 which can lead to decoding errors.
On Windows, when the Python test suite is run with the -jN option, the ANSI code page is now used as the encoding for the stdout temporary file, rather than using UTF-8 which can lead to decoding errors. (cherry picked from commit ec1f6f5) Co-authored-by: Victor Stinner <vstinner@python.org>
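A rough sketch of the behavior the commit message describes, written from that description rather than copied from libregrtest: worker_stdout_encoding and capture_roundtrip are hypothetical names, and the non-Windows branch is an assumption. On Windows, the temporary file that captures a worker's stdout is opened with the ANSI code page rather than UTF-8, and the same encoding is used to read it back.

```python
import locale
import sys
import tempfile


def worker_stdout_encoding() -> str:
    if sys.platform == "win32":
        # ANSI code page, e.g. cp1252 on US-English Windows.
        # Note: this returns "UTF-8" instead if UTF-8 mode is enabled.
        return locale.getpreferredencoding(False)
    # Assumption: elsewhere, follow the parent's stdout encoding.
    return getattr(sys.stdout, "encoding", None) or "utf-8"


def capture_roundtrip(text: str) -> str:
    """Write text to a worker-style temp file and read it back, same encoding."""
    encoding = worker_stdout_encoding()
    with tempfile.TemporaryFile("w+", encoding=encoding) as f:
        f.write(text)
        f.seek(0)
        return f.read()
```

The key design point is that the writer and the reader of the temporary file use one encoding, chosen in one place.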
I would prefer a formal review of my PR, but I merged it just to unblock the 3.11.0 final release (scheduled for next Monday). If something can be enhanced, it can be done later. IMO this fix is better than the current situation. In short, it just restores the old behavior: the encodings used before 199ba23.
terryjreedy commented Jul 19, 2022
On my Win10 machine, the test suite completes when run serially. But with main and 3.11, though not 3.10, it quits too soon with -j0. This has occurred with both repository debug builds and the installed 3.11.0b4. The failure is currently deterministic, though the details vary. The presence of -ugui or -uall has no apparent effect.
What happens is that roughly 100 tests before the end, a 'regrtest worker thread' fails ('warning') with
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position <variable>: invalid start byte
The 14 worker processes are stopped and the usual summary is given, but with an additional 'tests omitted' list of test names. The run is called a 'SUCCESS'. This is followed by a traceback for SystemExit(0), followed by one or more tracebacks for PermissionError because a temporary test file is supposedly in use by another process. Attaching a file with the output starting from the initial warning fails, so I will paste it separately.
If this is not limited to my system, I think it should be a release blocker.
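The byte in the error message is consistent with the ANSI code page of a stock US-English Windows install, which is usually cp1252 (an assumption about this machine, not something the report states). In cp1252, 0x91 and 0x92 are the curly single quotes U+2018 and U+2019, and 0x91 is a UTF-8 continuation byte, so it can never start a character. This sketch reproduces the same message on any platform:

```python
def reproduce() -> str:
    # cp1252 maps U+2018/U+2019 (curly single quotes) to bytes 0x91/0x92.
    raw = "\u2018hello\u2019".encode("cp1252")
    try:
        raw.decode("utf-8")  # mirrors decoding worker output as UTF-8
    except UnicodeDecodeError as exc:
        return str(exc)
    return "decoded without error"


print(reproduce())
# 'utf-8' codec can't decode byte 0x91 in position 0: invalid start byte
```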