New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-84436: Implement Immortal Instances #19474
base: main
Are you sure you want to change the base?
gh-84436: Implement Immortal Instances #19474
Conversation
This is ready to review, the CI is finally green. Really no idea why the newly added GC tests are failing on Windows and unfortunately I don't have a Windows machine to debug this. |
Also looping in @vstinner. Finally got around upstreaming this patch since you recently wrote about this on your C-API Improvement Docs |
My first reaction is that this shouldn't become part of the default build because most Python users will not make use of it and then it becomes pure extra overhead. However, I know for some people that it is a useful feature (e.g. pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself since I write web applications with that style. Would it be okay to make this a compile time option, disabled by default? I think in general it is a bad idea to have too many of those types of build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: thread/no-threads, Unicode width, various debug options (@vstinner removed some of those). So, I'm not super excited about introducing a new build option. Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also become immortal. Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count. |
@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255
Exactly, this change might be a feature for CPython power users
Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)
Yeah that's one of the drawbacks. That being said, I can help with setting up the travis build to integrate this change if needed (cc @vstinner).
We can indeed, I think somebody also mentioned that in the bug report. A potentially good place could be In theory we could optimize even further to reduce the perf cost. By leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but this could be changed to:
|
Not only that, we would need specialized buildbots to test the code base with this option activated in a bunch of supported platforms and that raises the maintainance costs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature sounds controversial, so I block it until a consensus can be reached.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@@ -183,6 +183,8 @@ PyAPI_FUNC(int) _PyUnicode_CheckConsistency( | |||
/* Interning state. */ | |||
#define SSTATE_NOT_INTERNED 0 | |||
#define SSTATE_INTERNED_MORTAL 1 | |||
#define SSTATE_INTERNED_IMMORTAL 2 | |||
#define SSTATE_INTERNED_IMMORTAL_STATIC 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you update this whatsnew topic?
Lines 951 to 953 in 79311cb
* Remove the ``PyUnicode_InternImmortal()`` function and the | |
``SSTATE_INTERNED_IMMORTAL`` macro. | |
(Contributed by Victor Stinner in :gh:`85858`.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@methane will do! Trying to get this to a steady state with all the CI passing, turns out that 6 months of rebasing broke a lot of things
Will mark this as ready to review once it's there
845dc35
to
bc28cb0
Compare
Note that this PR is currently impacted by #100911 |
This is also impacted by gh-101265. |
Quick update on this PR! PEP683 has been accepted with some conditions. We've addressed all the conditions except in the two scenarios that are covered under: GH-100911 and GH-101265. It currently stands at a 1.03x Geometric Mean Regression on the Pyperformance benchmark suite when compiled with GCC11.1 and measured against the main branch. For more details and discussion, refer to: https://discuss.python.org/t/pep683-immortal-objects-updates/23382 |
Motivation
This PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this opens up future optimization opportunities that might end up making the runtime faster overall.
Implementation Details
The implementation relies on setting a bit in the reference count word. The 3 MSBs are already reserved. The 1st MSB can't be modified, otherwise, due to two's complement, the following:
if (Py_REFCNT(o) > 0)
fails on immortal instances. The 2-3 MSBs are used for the reduced GC Head (GH-7043). Therefore, we use the 4th MSB to mark an immortal instance withkImmortalBit
. This also means that if an object reaches a reference count ofkImmortalBit
, it will automatically be immortalized.Also, immortal instances will not interfere with the reference leak tooling. That is, they won't show up as leaks. This is done by guaranteeing the parity between
_Py_RefTotal++
and_Py_RefTotal--
and setting them only on non-immortal instances.https://bugs.python.org/issue40255