gh-91378: Allow subprocess pass-thru with stdout/stderr capture #32344

pprindeville · 2022-04-05T21:08:50Z

Allow pass-thru of subprocess output even when capturing to buffers

I maintain some build wrappers that pass the output (both stdout and stderr) through when run interactively, and also capture logs into artifacts when run through a CI/CD pipeline.

Being able to call proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, tee=True) would allow harvesting the results with output, errors = proc.communicate() while still letting the output of the running process be seen in real time. Some make recipes invoke docker and can take a while to complete; buffering until completion would confuse users, as it might appear that the recipe has hung.
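
For readers who need this behavior with the subprocess API as it exists today, the following is a minimal sketch of the idea, not the implementation from this PR. The helper name run_with_tee and its parameters are hypothetical; it assumes one reader thread per pipe and the default buffered pipe objects, which provide read1().

```python
import subprocess
import sys
import threading


def run_with_tee(args, out_sink=None, err_sink=None, chunk_size=4096):
    """Run args, echoing stdout/stderr as they arrive and capturing both.

    Hypothetical helper for illustration; not part of the subprocess module.
    Returns (returncode, captured_stdout, captured_stderr) as bytes.
    """
    out_sink = sys.stdout.buffer if out_sink is None else out_sink
    err_sink = sys.stderr.buffer if err_sink is None else err_sink

    def pump(pipe, sink, chunks):
        # read1() returns as soon as some data is available, so output is
        # forwarded without waiting for the child to exit.
        while True:
            data = pipe.read1(chunk_size)
            if not data:
                break
            chunks.append(data)
            sink.write(data)
            sink.flush()
        pipe.close()

    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_chunks, err_chunks = [], []
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, out_sink, out_chunks)),
        threading.Thread(target=pump, args=(proc.stderr, err_sink, err_chunks)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    returncode = proc.wait()
    return returncode, b"".join(out_chunks), b"".join(err_chunks)
```

Using a thread per pipe keeps either stream from blocking the other, at the cost of not preserving the relative ordering of stdout and stderr chunks.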

#91378, formerly:

https://bugs.python.org/issue47222

Issue: subprocess.Popen() should allow capturing output and sending it to stdout and stderr #91378

the-knights-who-say-ni · 2022-04-05T21:08:53Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@pprindeville

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

pprindeville · 2022-04-05T21:21:15Z

@Andrew-Shay This should look familiar...

Lib/subprocess.py

gpshead

A few things:

You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).
This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.
What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.
Documentation and docstrings need updating (I usually leave this to last), with a practical example.

I haven't decided yet whether I like this feature as designed, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So let's get this into shape and see how it looks before making a final decision on whether this is the desired API or should be tweaked into a different shape.
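
The first review point can be illustrated with a small sketch. It assumes a child process that emits Latin-1 bytes, and uses io.BytesIO as a stand-in for sys.stdout.buffer; the helper name tee_bytes is hypothetical.

```python
import io


def tee_bytes(data, sink, captured):
    """Forward raw bytes to a binary sink and keep a copy, with no decoding."""
    captured.append(data)
    sink.write(data)


# Bytes that are not valid UTF-8, as an arbitrary child process might emit.
raw = b"caf\xe9\n"  # "café" encoded as Latin-1

try:
    raw.decode()  # the default codec is UTF-8, which rejects this input
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

sink = io.BytesIO()  # stand-in for sys.stdout.buffer
captured = []
tee_bytes(raw, sink, captured)
```

Writing the bytes unchanged means the pass-through output is identical to what the child produced, whatever its encoding.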

bedevere-bot · 2022-04-06T04:43:06Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

pprindeville · 2022-04-06T05:56:03Z

> A few things:

@gpshead Thanks for the comments.

> You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).

Done

> This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

Took a stab at this, but I'm unfamiliar with the test framework and how to debug the failure... any suggestions are appreciated.

> What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.

Added a test. Please let me know if you feel it's adequate or not.

> Documentation and docstrings need updating (I usually leave this to last), with a practical example.
>
> I haven't decided yet whether I like this feature as designed, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So let's get this into shape and see how it looks before making a final decision on whether this is the desired API or should be tweaked into a different shape.

Agreed. I'll do this last.
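
The question about tee without PIPE could be handled by an argument check along these lines. This is only a sketch: check_tee_args is a hypothetical function, and it assumes the error case is "neither stream is piped"; the thread does not say which rule the PR actually applied.

```python
import subprocess


def check_tee_args(tee, stdout, stderr):
    """Hypothetical validation: reject tee=True unless some stream is piped.

    Assumption for illustration: tee is only useless when neither stdout
    nor stderr is subprocess.PIPE, so only that case is an error.
    """
    if tee and subprocess.PIPE not in (stdout, stderr):
        raise ValueError("tee=True requires stdout and/or stderr to be PIPE")
```

Failing early with ValueError matches how Popen reports other invalid argument combinations.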

pprindeville · 2022-04-07T20:20:59Z

Update:

> This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

@gpshead Added 2 unit tests.
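
The tests added to the PR are not shown in this thread. As a rough sketch of the shape gpshead describes (a child that runs its own child), a test could look like the following. The script and test names are hypothetical, and the child does its pass-through by hand rather than through the proposed tee parameter.

```python
import subprocess
import sys
import unittest

# Script run in the child: it starts a grandchild, copies the grandchild's
# stdout through to its own stdout, and reports the captured size on stderr.
CHILD_SCRIPT = r"""
import subprocess, sys
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write('spam')"],
    stdout=subprocess.PIPE,
)
captured = b""
while True:
    data = proc.stdout.read1(4096)
    if not data:
        break
    captured += data
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()
proc.stdout.close()
proc.wait()
sys.stderr.write("captured=%d" % len(captured))
"""


class TeeSketchTest(unittest.TestCase):
    def test_output_is_passed_through_and_captured(self):
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT],
            capture_output=True,
        )
        self.assertEqual(result.returncode, 0)
        # Pass-through: the grandchild's bytes reach the child's stdout.
        self.assertEqual(result.stdout, b"spam")
        # Capture: the child also retained all four bytes.
        self.assertEqual(result.stderr, b"captured=4")
```

Comparing exact bytes, without stripping whitespace, keeps the test sensitive to lost or duplicated output.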

pprindeville · 2022-04-07T21:42:17Z

I have made the requested changes; please review again

bedevere-bot · 2022-04-07T21:42:20Z

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

pprindeville · 2022-06-21T00:42:24Z

@gpshead With your suggested change to use .readline(), the test test_check_output_input_none_text seems to loop forever. I'm not sure why.

It's nearly identical to the preceding test, test_check_output_input_none, differing only in adding text=True.

pprindeville · 2022-06-22T20:01:15Z

> For simplicity and consistency, why not just always loop reading 2000 lines at a time?
>
> .readline(2000) bounds the read at 2000 bytes, not 2000 lines. Always doing that would be a performance regression for the most common existing use case, where no checking for EOLs is done and the unbounded read() work is all in C with a more optimal read buffer size and result storage reallocation code.

@gpshead I could not get this to work. x86 would run out of memory, and amd64 would never exit (it would eventually be killed by a timeout). Can you please refine the suggestion?
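
The point about .readline(2000) can be checked on any binary stream. This small sketch uses io.BytesIO and a size of 4 in place of a pipe and 2000, purely to keep the values short.

```python
import io

stream = io.BytesIO(b"first line\nsecond line\n")

# The size argument caps the number of bytes returned, so a long line comes
# back in pieces; it never means "this many lines".
piece = stream.readline(4)         # stops after 4 bytes, mid-line
rest = stream.readline(2000)       # continues up to the newline
next_line = stream.readline(2000)  # one whole line, well under the cap
```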

gpshead · 2022-06-22T20:10:02Z

I'll try and take a look this week.

pprindeville · 2022-06-22T20:22:26Z

> I'll try and take a look this week.

Thanks. I appreciate all of your efforts.

pprindeville · 2022-06-30T21:10:37Z

> I'll try and take a look this week.

Any luck?

pprindeville · 2022-08-05T20:54:12Z

@gpshead Not sure what to do at this point.

gpshead · 2022-08-06T03:15:36Z

Thanks for the ping. Looking into this is still on my TODO list.

pprindeville · 2022-09-16T23:39:15Z

Should I abandon this?

gpshead · 2022-09-17T00:57:21Z

no, leave it open. i haven't had time.

pprindeville · 2022-10-20T16:55:10Z

Is it an option to proceed with what we have, and circle back to optimize it for WinNT?

pprindeville · 2023-01-12T05:10:08Z

Any progress?

gpshead · 2024-11-07T00:45:10Z

Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

pprindeville · 2024-11-09T05:21:03Z

> Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

I won't be contributing again.

gst · 2024-11-09T13:10:51Z

Hi @pprindeville, this idea would be very welcome to me as an independent, regular library package/module.

I remember needing and implementing this kind of feature myself. I used thread(s) to consume the subprocess output as it arrived and share it with other thread(s), while also retaining the entire output. I wished such a package had been available; it would have saved me that extra work.

I understand your frustration, but the reason given is actually quite valid.

Anyway, have a good weekend.
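
The thread-based approach gst describes might look roughly like this. It is a sketch under assumptions, not gst's code: the names fan_out and drain are hypothetical, only stdout is handled, and each consumer gets its own queue.

```python
import queue
import subprocess
import sys
import threading

_EOF = object()  # sentinel marking the end of the stream


def fan_out(args, n_consumers, chunk_size=4096):
    """Start args and return (proc, reader_thread, queues).

    Hypothetical helper: one reader thread copies each stdout chunk to every
    queue, so several consumer threads can each see the full output.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    queues = [queue.Queue() for _ in range(n_consumers)]

    def reader():
        with proc.stdout:
            while True:
                data = proc.stdout.read1(chunk_size)
                if not data:
                    break
                for q in queues:
                    q.put(data)
        # Tell every consumer that no more data will arrive.
        for q in queues:
            q.put(_EOF)

    thread = threading.Thread(target=reader)
    thread.start()
    return proc, thread, queues


def drain(q):
    """Collect everything from one queue until the end-of-stream sentinel."""
    chunks = []
    while True:
        item = q.get()
        if item is _EOF:
            break
        chunks.append(item)
    return b"".join(chunks)
```

One consumer could echo chunks to the terminal while another retains them for a log artifact, which covers the use case from the PR description without changes to subprocess.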

pprindeville requested a review from gpshead as a code owner April 5, 2022 21:08

bedevere-bot added the awaiting review label Apr 5, 2022

the-knights-who-say-ni added the CLA not signed label Apr 5, 2022

gst reviewed Apr 5, 2022

pprindeville force-pushed the issue-47222 branch from fad39ca to dd6eebf Compare April 6, 2022 04:17

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Apr 6, 2022

pprindeville force-pushed the issue-47222 branch from dd6eebf to 5fe4f54 Compare April 6, 2022 04:21

gpshead added the type-feature A feature request or enhancement label Apr 6, 2022

gpshead requested changes Apr 6, 2022

bedevere-bot added awaiting changes and removed awaiting review labels Apr 6, 2022

gpshead self-assigned this Apr 6, 2022

pprindeville force-pushed the issue-47222 branch from 5fe4f54 to 21e26ae Compare April 6, 2022 05:43

pprindeville force-pushed the issue-47222 branch 2 times, most recently from 83d3e31 to c2f3e95 Compare April 7, 2022 06:17

pprindeville force-pushed the issue-47222 branch 3 times, most recently from 7339b78 to 4f5b73b Compare April 7, 2022 21:41

bedevere-bot added awaiting change review and removed awaiting changes labels Apr 7, 2022

bedevere-bot requested a review from gpshead April 7, 2022 21:42

pprindeville force-pushed the issue-47222 branch from 4f5b73b to 88c4250 Compare April 7, 2022 22:09

pprindeville force-pushed the issue-47222 branch 3 times, most recently from b84889c to 7247a70 Compare June 20, 2022 00:03

pprindeville force-pushed the issue-47222 branch 2 times, most recently from 286883b to db326c7 Compare June 21, 2022 01:29

ezio-melotti removed the CLA signed label Jul 13, 2022

pprindeville force-pushed the issue-47222 branch from db326c7 to 8692044 Compare January 12, 2023 06:36

pprindeville force-pushed the issue-47222 branch from 8692044 to 74400d4 Compare April 7, 2023 18:27

pprindeville added 5 commits April 7, 2023 15:17

bpo-47222: avoid ResourceWarnings about unclosed files (pythonGH-91378)

b4bdfa3

bpo-47222: Avoid .rstrip() where exact output is expected (pythonGH-91378)

9e0464e

bpo-47222: Refactor stdout/stderr pipe handler into function (pythonGH-91378)

70e603a

bpo-47222: Add test coverage for user-specified read_callback (pythonGH-91378)

becf1f6

bpo-47222: Allow pass-thru with stdout/stderr capture (pythonGH-91378)

7f1ef61

pprindeville force-pushed the issue-47222 branch from 74400d4 to 7f1ef61 Compare April 7, 2023 21:17

pprindeville mannequin mentioned this pull request Jul 8, 2024

subprocess.Popen() should allow capturing output and sending it to stdout and stderr #91378

Closed

gpshead closed this Nov 7, 2024
