Improve importlib backports-upstream integration #129307

Open
FFY00 opened this issue Jan 26, 2025 · 2 comments
Labels
infra CI, GitHub Actions, buildbots, Dependabot, etc. topic-importlib

Comments

@FFY00
Member

FFY00 commented Jan 26, 2025

The current status quo of development integration and synchronization between the importlib backports and the CPython upstream isn't optimal.

Before anything else, I must properly acknowledge @jaraco's monumental and tireless effort in maintaining the importlib backports and handling the complex synchronization with the CPython upstream, not to mention the continued development of these modules. It has been instrumental in getting things to the state they are in today, and none of the issues discussed in this thread should reflect negatively on him, but rather on our failure to ensure these projects got the resources they need, a far too common tale in open source.

Here are some issues I think we should improve:

  • Synchronization process: even though @jaraco has left comments in some PRs describing his workflow, there's no properly documented process
  • Authorship stripping: the current way changes are synced to and from the backports strips commit authorship
  • Documentation fragmentation, resulting in suboptimal documentation
  • CLA enforcement: the backports do not enforce the CLA
  • Segmented development workflow: issues and changes happen in both places
  • Source history: the current way changes are synced to and from the backports strips commit history

cc @python/importlib-team

@FFY00 FFY00 added infra CI, GitHub Actions, buildbots, Dependabot, etc. topic-importlib labels Jan 26, 2025
@FFY00
Member Author

FFY00 commented Jan 26, 2025

I would like to propose officially defining a development upstream, and enforcing it.

The solution that I think would most cleanly handle the fragmentation, history, authorship, and CLA issues is to select CPython as the upstream. One way to implement this would be to track the backport version here and, whenever it is updated, have CI automation update the backport repos, just like we do when backporting to older Python versions.

While I think that's cleaner, it is a major change to how these modules are currently developed, and the implementation might be too complex, so I think it's more likely that we go with the backports as the upstream. If so, there are a couple of things I think we should do:

  • Add issue tags for each backport, which, once assigned, would cause the issue to be moved to the correct repo
  • Prevent PRs from changing the code (with a manual override)
  • Implement some level of automation for the synchronization operation, preserving commit authorship (perhaps even history)
  • Document the synchronization process
  • Have the backports maintain a copy of the corresponding CPython documentation, instead of having their own
  • Add the CLA bot to the backports
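
To make the first two items in the list above more concrete, here is a rough sketch of the decision logic such automation might use. It is only an illustration under stated assumptions: the label names (`backport-importlib-metadata`, `backport-importlib-resources`, `importlib-sync-override`) are hypothetical, the protected directories are assumed to be the ones the backports would own, and nothing here calls the GitHub API. A real bot or CI job would have to supply the labels and changed files.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from issue label to backport repository.
# The label names are invented for this sketch.
BACKPORT_LABELS = {
    "backport-importlib-metadata": "python/importlib_metadata",
    "backport-importlib-resources": "python/importlib_resources",
}

# CPython directories assumed to be owned by the backports.
PROTECTED_DIRS = (
    PurePosixPath("Lib/importlib/metadata"),
    PurePosixPath("Lib/importlib/resources"),
)

# Hypothetical label used as the manual override.
OVERRIDE_LABEL = "importlib-sync-override"


def target_repo(issue_labels):
    """Return the backport repo an issue should move to, or None."""
    matches = {
        BACKPORT_LABELS[label]
        for label in issue_labels
        if label in BACKPORT_LABELS
    }
    # Route only when exactly one backport label is set; zero or
    # several labels are left for a human to sort out.
    return matches.pop() if len(matches) == 1 else None


def blocked_files(changed_files, pr_labels):
    """Return the changed files that the PR is not allowed to touch."""
    if OVERRIDE_LABEL in pr_labels:
        return []
    return [
        name
        for name in changed_files
        if any(PurePosixPath(name).is_relative_to(d) for d in PROTECTED_DIRS)
    ]
```

A CI job could fail the PR whenever `blocked_files(...)` is non-empty and point the contributor at the repository returned by `target_repo(...)`. Keeping the logic as pure functions means it can be tested without any network access.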

@AA-Turner
Member

> select CPython as the upstream

I think that this makes sense, especially as neither of the importlib modules is "provisional" any longer. Useful parallels can be drawn with PEP 360, which used to record "externally maintained" packages and was updated in 2006 to say:

> It has been deemed dangerous to codify external maintenance of any code checked into Python’s code repository. Code contributors should expect Python’s development methodology to be used for any and all code checked into Python’s code repository.

Another parallel is the changes to pathlib that Barney has recently been making. He published the pathlib-abc package as a backport/preview, rather than making that package the primary place of development.

Whilst having a brief look at the history, I found that Jason noted in a comment from a few years ago that:

> The advantage of having the module in the standard library is that at some point, the pace of change should slow and the stdlib can become the primary/only use.


The other two recently externally-developed modules seem to be tomli and zoneinfo (please let me know if I'm missing any).

Three of the four PRs to tomllib since it was added as a package were to synchronise with tomli as an upstream (#128907, #126428, and #124587). These have each been quite minor, and each has been opened as an individual PR, rather than an omnibus "sync with version X" update.

zoneinfo used to have some sync PRs, but the last one seems to have been four years ago (#20499), and the backport package has not been updated for two years (last release 2020-06).


There have also been some problems with synchronising the documentation, as the docs.python.org (d.p.o) documentation used to point to the backport (python/importlib_metadata#485), with one user going so far as to manipulate Sphinx internals (stefan6419846/license_tools#63) to work around this problem. Ultimately, documentation was removed from the backport package (python/importlib_metadata#466).


To Jason's quoted comment above about pace of development eventually slowing, I wonder if at some point we should seek to update the backport packages less frequently, and to mirror Python releases. There is prior art for this with zipfile3{x} packages on PyPI. This would ease the burden of the actual backporting, as it would be done less often. The backport package could also use a rebase or fast-forward merge, which would preserve authorship details.
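
Where a sync does have to be squashed into a single commit instead of being rebased, authorship could still be recorded with `Co-authored-by` trailers, which GitHub recognises in commit messages. This is a different technique from the rebase mentioned above, and the sketch below is only an assumption about how a sync script might build such a message. The function name and its inputs are hypothetical, and collecting the author list from the original commits is left out.

```python
def squash_message(summary, authors, committer):
    """Build a commit message that credits every original author.

    authors:   iterable of (name, email) taken from the squashed commits.
    committer: (name, email) of whoever performs the sync; they already
               own the commit, so they are not repeated as a co-author.
    """
    seen = {committer[1].lower()}
    trailers = []
    for name, email in authors:
        key = email.lower()
        if key in seen:
            # Skip the committer and any author already credited.
            continue
        seen.add(key)
        trailers.append(f"Co-authored-by: {name} <{email}>")
    if not trailers:
        return summary
    # Trailers go in a final paragraph, separated by a blank line.
    return summary + "\n\n" + "\n".join(trailers)
```

For example, `squash_message("Sync with backport", authors, committer)` returns the summary followed by one trailer line per distinct author. This keeps credit visible, but unlike a rebase it does not preserve the individual commits.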

As such, I would be in favour of this python/cpython repo being the one where new features are developed, discussed, and merged for importlib.resources and importlib.metadata (and also tomllib).

A
