Skip to content

gh-91378: Allow subprocess pass-thru with stdout/stderr capture #32344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

pprindeville
Copy link

@pprindeville pprindeville commented Apr 5, 2022

Allow pass-thru of subprocess output even when capturing to buffers

I maintain some build wrappers that both pass the output (both stdout and stderr) through when being run interactively, but also capture logs into artifacts when being run through a CI/CD pipeline.

Being able to call proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, tee=True) would allow harvesting results as output, errors = subprocess.communicate() as well as permitting the results of the running process to be seen in real-time (some make recipes invoke docker and can take a while to complete... buffering until completion would confuse users as it might appear that the recipe has hung).

#91378, formerly:

https://bugs.python.org/issue47222

@pprindeville pprindeville requested a review from gpshead as a code owner April 5, 2022 21:08
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@pprindeville

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@pprindeville
Copy link
Author

@Andrew-Shay This should look familiar...

Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  1. You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).
  2. This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.
  3. What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.
  4. Documentation and docstrings need updating (I usually leave this to last), with a practical example.

I'm not decided on if I like this feature as designed or not yet, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So lets get this into shape and see how it looks before making a final decision if this is the desired API or should be tweaked into a different shape.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@gpshead gpshead self-assigned this Apr 6, 2022
@pprindeville
Copy link
Author

pprindeville commented Apr 6, 2022

A few things:

@gpshead Thanks for the comments.

  1. You're calling data.decode() on the read data, but no codec is specified. This is going to cause problems as there is never a guarantee about what an arbitrary process will actually output. A true "tee" should take the binary data it read and write the same binary data out to the destination. Which means you should use sys.stdout.buffer.write(data) rather than sys.stdout.write(data.decode()) (same for stderr).

Done

  1. This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

Took a stab at this, but I'm unfamiliar with the test framework or how to debug the failure... any suggestions are appreciated.

  1. What happens when tee is True but the stdout and/or stderr were not PIPE? If this would be useless, we should make it an error.

Added a test. Please let me know if you feel it's adequate or not.

  1. Documentation and docstrings need updating (I usually leave this to last), with a practical example.

I'm not decided on if I like this feature as designed or not yet, but I've seen enough code that could use it to know that the need is real in some form or another (you gave a good practical example). So lets get this into shape and see how it looks before making a final decision if this is the desired API or should be tweaked into a different shape.

Agreed. I'll do this last.

@pprindeville pprindeville force-pushed the issue-47222 branch 2 times, most recently from 83d3e31 to c2f3e95 Compare April 7, 2022 06:17
@pprindeville
Copy link
Author

Update:

  1. This needs a unittest. Look around test_subprocess.py for ideas. Executing a child process that executes a child process to see that the expected output winds up in the right place is entirely within reason.

@gpshead Added 2 unit tests.

@pprindeville pprindeville force-pushed the issue-47222 branch 3 times, most recently from 7339b78 to 4f5b73b Compare April 7, 2022 21:41
@pprindeville
Copy link
Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@pprindeville pprindeville force-pushed the issue-47222 branch 3 times, most recently from b84889c to 7247a70 Compare June 20, 2022 00:03
@pprindeville
Copy link
Author

@gpshead Trying your change with .readline(), the test test_check_output_input_none_text seems to loop forever. Not sure why.

It's nearly identical to the preceding test test_check_output_input_none, but differs only in adding text=True.

@pprindeville pprindeville force-pushed the issue-47222 branch 2 times, most recently from 286883b to db326c7 Compare June 21, 2022 01:29
@pprindeville
Copy link
Author

For simplicity and consistency, why not just always loop reading 2000 lines at a time?

.readline(2000) bounds the read at 2000 bytes, not 2000 lines. Always doing that would be a performance regression for the most common existing users case: Where no checking for EOLs is done and the unbounded read() work is all in C with a more optimal read buffer size and result storage reallocation code.

@gpshead I could not get this to work. x86 would run out of memory, and amd64 would never exit (but get killed by a timeout). Can you please refine the suggestion?

@gpshead
Copy link
Member

gpshead commented Jun 22, 2022

I'll try and take a look this week.

@pprindeville
Copy link
Author

I'll try and take a look this week.

Thanks. I appreciate all of your efforts.

@pprindeville
Copy link
Author

I'll try and take a look this week.

Any luck?

@pprindeville
Copy link
Author

@gpshead Not sure what to do at this point.

@gpshead
Copy link
Member

gpshead commented Aug 6, 2022

Thanks for the ping. Looking into this is still on my TODO list.

@pprindeville
Copy link
Author

Should I abandon this?

@gpshead
Copy link
Member

gpshead commented Sep 17, 2022

no, leave it open. i haven't had time.

@pprindeville
Copy link
Author

Is it an option to proceed with what we have, and circle back to optimize it for WinNT?

@pprindeville
Copy link
Author

Any progress?

@gpshead
Copy link
Member

gpshead commented Nov 7, 2024

Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

@gpshead gpshead closed this Nov 7, 2024
@pprindeville
Copy link
Author

Let's not go forward with this PR, see my overall comment on the issue. Thanks for your work on it! The particular implementation in progress being done here isn't the reason why, though it is an example of why this complexity is not great to maintain within the stdlib.

I won't be contributing again.

@gst
Copy link
Contributor

gst commented Nov 9, 2024

Hi @pprindeville , this idea would be to me very welcome as an independent normal library package/module.

I personally remember having needed and implemented such kind of feature, but was using thread(s) to consume and share to other thread(s) (and be able to retain the entire output as well) the subprocess output as it arrives basically. but I wished there would have been such package available, would have me saved of that extra work.

I understand your frustration but the invoked reason is really valid actually.

Anyway, have a good weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants