Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile regression in 3.11 beta: ignores utf8 flag and removes it from existing entries #95463

Closed
obfusk opened this issue Jul 30, 2022 · 8 comments
Assignees
Labels
3.11 3.12 release-blocker type-bug An unexpected behavior, bug, or error

Comments

@obfusk
Copy link
Contributor

obfusk commented Jul 30, 2022

Bug report

apksigcopier CI started failing on 3.11 beta because of this seemingly unnecessary change in #32007 (which added fallback encoding support):

     def _encodeFilenameFlags(self):
         try:
-            return self.filename.encode('ascii'), self.flag_bits
+            return self.filename.encode('ascii'), self.flag_bits & ~_MASK_UTF_FILENAME
         except UnicodeEncodeError:

which results in:

  • the utf8 flag in ZipInfo.flag_bits (whether set manually or from an existing entry that already had it) being completely ignored (i.e. set/unset based on whether the file name can be encoded in ascii or requires utf8, regardless of the current value) when file headers and the central directory are written, whereas before it was already being set when required but not unset if not required (and having ascii filenames with the utf8 flag set is completely fine and common, but now impossible);
  • resultingly, removal of the utf8 flag from existing zip entries (with names that can be encoded in ascii but have it set nonetheless) in the central directory (though presumably not from the individual file headers for unmodified existing entries, making them inconsistent with the central directory), e.g. when appending to an existing zip file.

Which resulted in zip files that were no longer bitwise identical, making the existing lack of support for creating reproducible/deterministic zip files even worse than before.

Please revert this change; I'm happy to make a PR for that.

Your environment

  • CPython versions tested on: 3.11.0-beta.5 (Ubuntu 20.04.4, GitHub actions), 3.11.0-beta.4 (Debian testing/unstable)
  • Operating system and architecture: see above (both amd64)
@obfusk
Copy link
Contributor Author

obfusk commented Aug 15, 2022

NB: this regression also affects f-droid (which uses apksigcopier for reproducible builds).

cc @eighthave

@pablogsal
Copy link
Member

pablogsal commented Aug 15, 2022

CC: @gpshead as you reviewed the PR.

Unless someone has a good reason not to, I plan to revert b5cf7374d79aae191c1e38d0959527e0bbb7a95e0df5b125a8b1056cc2c54851L483 tomorrow.

@pablogsal
Copy link
Member

pablogsal commented Aug 15, 2022

@obfusk Thanks for the very detailed report ❤️

@gpshead
Copy link
Member

gpshead commented Aug 15, 2022

Yes, good writeup. I'm fine with a revert of PR #32007's a25a985535ccbb7df8caddc0017550ff4eae5855. (I think you mispasted something else that isn't a commit id in your message pablogsal)

It seems more is needed here in order to preserve existing bits.

@gpshead
Copy link
Member

gpshead commented Aug 15, 2022

Oh I see, you were linking just to the particular one line of the change. :) If undoing just line itself is the fix, great. It sounds like we don't have test coverage for this situation though.

@obfusk
Copy link
Contributor Author

obfusk commented Aug 16, 2022

Oh I see, you were linking just to the particular one line of the change. :) If undoing just line itself is the fix, great. It sounds like we don't have test coverage for this situation though.

Yes, reverting just the single line changed in _encodeFilenameFlags (the diff in my report above) should be the fix.

I can confirm that monkey-patching _encodeFilenameFlags to the code from before this change makes apksigcopier CI succeed again.

Of course, adding additional test coverage to prevent regressions like this in the future would certainly be nice IMO.

Thanks for getting this fixed before 3.11 is released!

pablogsal added a commit to pablogsal/cpython that referenced this issue Aug 18, 2022
pablogsal added a commit to pablogsal/cpython that referenced this issue Aug 18, 2022
…ASK_UTF_FILENAME flags in bpo-28080

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 18, 2022
…ASK_UTF_FILENAME flags in bpo-28080 (pythonGH-96072)

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 9d066e2)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
miss-islington pushed a commit that referenced this issue Aug 18, 2022
…F_FILENAME flags in bpo-28080 (GH-96072)

Automerge-Triggered-By: GH:pablogsal
miss-islington added a commit that referenced this issue Aug 19, 2022
…F_FILENAME flags in bpo-28080 (GH-96072)

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 9d066e2)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@pablogsal
Copy link
Member

pablogsal commented Sep 5, 2022

This seems to be fixed, so I am closing it. Please re-open you think we missed something.

@obfusk
Copy link
Contributor Author

obfusk commented Sep 24, 2022

apksigcopier CI is green again with 3.11rc2 and the workaround removed, so seems to be fixed — as expected — indeed.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 3.12 release-blocker type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

5 participants