Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-18819: tarfile: only set device fields for device files #18080

Open
wants to merge 3 commits into
base: master
from

Conversation

@wchargin
Copy link
Contributor

wchargin commented Jan 20, 2020

The GNU docs describe the devmajor and devminor fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches tarfile to do the same.

Test Plan:
Unit tests added; they fail before this commit and pass after it. Also
verified that this enables output that is bit-for-bit compatible with
GNU tar. In particular, this program now passes on my Ubuntu 16.04,
whereas it failed before this patch:

import os
import subprocess
import tarfile
import tempfile

filename = "important_data"
contents = b"The quick brown fox jumps over the lazy dog"

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)
    with open(filename, "wb") as outfile:
        outfile.write(contents)
    with tarfile.open("py.tar", "x", format=tarfile.GNU_FORMAT) as outfile:
        outfile.add(filename)
    subprocess.check_call(["tar", "cf", "gnu.tar", filename])
    subprocess.check_call(["cmp", "-b", "py.tar", "gnu.tar"])

(The exact contents of the files depend on the calling user and current
time, but should always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers

https://bugs.python.org/issue18819

The GNU docs describe the `devmajor` and `devminor` fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches `tarfile` to do the same.

Test Plan:
No tests added because none appear to exist for this module. Manually
verified that this enables output that is bit-for-bit compatible with
GNU tar. In particular, this program now passes on my Ubuntu 16.04,
whereas it failed before this patch:

```python
import os
import subprocess
import tarfile
import tempfile

filename = "important_data"
contents = b"The quick brown fox jumps over the lazy dog"

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)
    with open(filename, "wb") as outfile:
        outfile.write(contents)
    with tarfile.open("py.tar", "x", format=tarfile.GNU_FORMAT) as outfile:
        outfile.add(filename)
    subprocess.check_call(["tar", "cf", "gnu.tar", filename])
    subprocess.check_call(["shasum", "-a", "256", "py.tar", "gnu.tar"])
    subprocess.check_call(["cmp", "-b", "py.tar", "gnu.tar"])
```

(The exact hashes depend on the calling user and current time but should
always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers
@ethanfurman

This comment has been minimized.

Copy link
Member

ethanfurman commented Jan 22, 2020

The code looks good. Question, though: why use both cmp and shasum? If cmp passes aren't the files identical?

Also, the tests for tarfile are in Lib/test/test_tarfile.py. A quick perusal suggests that MemberReadTest might be a good place for your test above (appropriately modified, of course). Feel free to suggest a different class in that file, though.

@ethanfurman ethanfurman self-assigned this Jan 22, 2020
Copy link
Member

ethanfurman left a comment

This looks good. Add your test and we should be ready to commit.

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 22, 2020

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Contributor Author

wchargin left a comment

Question, though: why use both cmp and shasum? If cmp passes aren't the files identical?

Yes; I had the shasum just as a sanity check that the files weren’t
empty, but you’re right that the cmp is certainly sufficient. I’ll go
ahead and remove this from the description to avoid confusion.

Also, the tests for tarfile are in Lib/test/test_tarfile.py.

Great, thanks. Not sure how I missed this.

Add your test and we should be ready to commit.

Will add a test to test_tarfile.py, probably more like the existing
tests in that file than like the end-to-end test in the description
(which would only work on systems with GNU tar installed).

Thanks for the speedy review! I’ll add a test and get back to you.

Test Plan:
This passes as written, and fails if the previous commit is reverted.

wchargin-branch: tarfile-limit-device-headers
@wchargin

This comment has been minimized.

Copy link
Contributor Author

wchargin commented Jan 22, 2020

I’ve put the test in a new test class because it needs to run only in
the uncompressed case (or, we’d need special handling to decompress the
compressed archives, but I don’t see any benefit to doing that).

I have made the requested changes; please review again.

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 22, 2020

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from ethanfurman Jan 22, 2020
Copy link
Member

ethanfurman left a comment

Looking good. Nice tests.

Please add yourself to Misc/ACKS and we should be good to go. (Sorry for not noticing that earlier.)

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 23, 2020

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

(Per request of Ethan Furman.)

wchargin-branch: tarfile-limit-device-headers
@wchargin

This comment has been minimized.

Copy link
Contributor Author

wchargin commented Jan 23, 2020

Done; thanks!

I have made the requested changes; please review again.

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 23, 2020

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from ethanfurman Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.