gh-84436: Implement Immortal Objects #19474
Conversation
This is ready to review; the CI is finally green. I really have no idea why the newly added GC tests are failing on Windows, and unfortunately I don't have a Windows machine to debug this.
Also looping in @vstinner. I finally got around to upstreaming this patch since you recently wrote about this in your C-API Improvement Docs.
My first reaction is that this shouldn't become part of the default build, because most Python users will not make use of it and then it becomes pure extra overhead. However, I know that for some people it is a useful feature (e.g. a pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself, since I write web applications in that style.

Would it be okay to make this a compile-time option, disabled by default?

I think in general it is a bad idea to have too many of those build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: threads/no-threads, Unicode width, and various debug options (@vstinner removed some of those). So I'm not super excited about introducing a new build option.

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple of years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also became immortal.

Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF, but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count.
@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255
Exactly, this change might be a feature for CPython power users
Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)
Yeah, that's one of the drawbacks. That being said, I can help with setting up the Travis build to integrate this change if needed (cc @vstinner).
We can indeed; I think somebody also mentioned that in the bug report. A potentially good place could be

In theory we could optimize even further to reduce the perf cost. By leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but it could be changed to:
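Roughly something along these lines — a sketch to illustrate the idea only, with made-up names and a made-up sentinel value rather than the actual macros:

```c
#include <stdint.h>

/* Illustrative only: MY_IMMORTAL_REFCNT and my_incref are not the PR's real
 * macros.  The point is that the compiler can turn the conditional below into
 * a conditional move (cmov), so the hot path contains no branch. */
#define MY_IMMORTAL_REFCNT 0x3FFFFFFFu

static inline void my_incref(uint32_t *refcnt)
{
    uint32_t old = *refcnt;
    uint32_t bumped = old + 1;
    /* Saturate at the immortal value: immortal objects keep their refcount,
     * everything else gets the incremented value. */
    *refcnt = (old == MY_IMMORTAL_REFCNT) ? old : bumped;
}
```

Whether the compiler actually emits a cmov here depends on the target and optimization level, so the benefit would need to be measured.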
Not only that, we would also need specialized buildbots to test the code base with this option activated on a range of supported platforms, and that raises the maintenance cost.
This feature sounds controversial, so I block it until a consensus can be reached.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
They are. The cause is in Lines 117 to 132 in 5c00a62.
Indeed, the module exports it; see Line 1377, Line 1397, and Lines 1411 to 1416 in a9b31ab.
I have no idea why some buildbots cannot import the module.
congrats @eduardo-elizondo!!!
Thanks for all the hard work you put in and for being so very patient.
Three years in the making and it's finally there! Huge shoutout to @ericsnowcurrently for working with me throughout the years to get this all the way through, you rock!! Also, thanks @markshannon @gvanrossum and @pablogsal for being a sounding board for ideas, reviews, and coaching on the messaging of the PR / PEP!
Congrats @eduardo-elizondo! 🔥 I followed this PR from the beginning, and even needed to port the changes (in 2020) into Python 3.9 due to the memory issue with multiprocessing at my previous company. At some point, I felt this was not going to make it into Python due to the inactivity here (especially in the reviews).
@ericsnowcurrently @eduardo-elizondo One of the acceptance conditions for PEP 683 was to update the PEP with final benchmark results. It appears that has not been done. Are the numbers available somewhere? I want to determine whether the performance regressions reported in #109049 are due to this PR (or perhaps other PRs as well). In the Discourse thread pep-683-immortal-objects-using-a-fixed-refcount-round-4-last-call there is some discussion about whether to add the performance numbers. The PEP has been updated since that message, but not with the performance numbers.
@eduardo-elizondo, do you have time to update the PEP with final benchmark results for ea2c001?
Note: https://speed.python.org/ shows a significant performance regression in roughly a third of all benchmarks, dated Apr 22, which is the date this PR was merged. The most significant I've seen so far (which, if reproduced locally, could help figure out the root cause) is unpack_sequence.
@ericsnowcurrently we actually already have the benchmark numbers here: #19474 (comment), which I ran right before merging; there are only test/lint fixes on top of that. This shows roughly a ~1.02x geometric-mean regression (~1.03x on MSVC). Let me know if this is what we are looking for! To be clear, there will be both slower and faster benchmarks. However, we should focus on the geometric mean (rather than a single benchmark), which is our best proxy when benchmarks move in both directions. Separately, performance measurements can come up with wildly different results in different environments. For these experiments I used gcc-11.1 and MSVC v14.33 on 'lab-like' bare-metal machines, which resulted in consistently reproducible results.
There is a discrepancy between the text in PEP 683 and the actual implementation on 64-bit machines, in the sections Accidental Immortality and Accidental De-Immortalizing. If it is already mentioned somewhere, sorry about that; but it might be useful to mention it inside the PEP itself.

The problem is that, unless I'm missing something, the PEP says that 64-bit machines don't have any accidental-immortality problems in practice because a 64-bit or close-to-64-bit refcount never overflows; but the implementation instead starts to consider objects immortal as soon as their refcount reaches 31 bits, a much more reachable value on 64-bit machines.

The issues listed in Accidental De-Immortalizing are particularly problematic in this situation. I have not tested it, but looking at the source code it seems that this kind of code would crash CPython on 64-bit machines.
Again, this is not a real bug report, because it seems that this kind of issue was considered and is apparently accepted as a trade-off of supporting stable-ABI modules compiled with older versions of CPython. This is more of a missing-documentation issue.
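As a rough illustration of the threshold being discussed (a sketch under the assumption that the 64-bit immortality check only inspects the low 32 bits of the refcount; the names are made up, not CPython's):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model, not CPython's actual code: if immortality is detected
 * by testing only the low 32 bits of a 64-bit refcount, then any refcount
 * whose low half has its top bit set already "looks" immortal, even though
 * the full 64-bit counter is nowhere near overflowing. */
static int looks_immortal(int64_t refcnt)
{
    return (refcnt & UINT32_C(0x80000000)) != 0;  /* bit 31 of the refcount */
}

int main(void)
{
    printf("%d\n", looks_immortal(1000));               /* 0: ordinary refcount */
    printf("%d\n", looks_immortal((int64_t)1 << 31));   /* 1: treated as immortal */
    return 0;
}
```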
Note: a potential fix for this issue would be to change Py_INCREF on 64-bit platforms: instead of checking if

The value to initialize immortal objects with would be something like
...or, change _Py_IsImmortal to (Py_ssize_t)refcount < 0 on 64-bit, and then just use _Py_IsImmortal in both INCREF and DECREF on both 32- and 64-bit platforms, and be done with it? This should Always Just Work(tm) if we reasonably assume that it's completely impossible to repeat a loop

(Here I'm working with the implicit, never-documented assumption that people first tried to use the 32-bit code directly on 64-bit, with the immortal value 0x3fffffffffffffff, but found that it had some performance impact on Intel. That would be because a constant value that doesn't fit in 32 bits does indeed have a cost. That's why I'm suggesting here to use (Py_ssize_t)refcount < 0, which is simpler and might even be cheaper than the current 32- and 64-bit mix of arithmetic on the same refcount.)
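A minimal sketch of that suggestion, with illustrative names and an assumed initial value rather than anything from the actual patch:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: MY_IMMORTAL_REFCNT and my_is_immortal are made-up names.
 * The idea is to give immortal objects a negative 64-bit refcount with a huge
 * cushion on both sides, so ordinary INCREF/DECREF traffic can never flip the
 * sign back, and the immortality test becomes a single signed comparison
 * against zero. */
#define MY_IMMORTAL_REFCNT  (INT64_MIN / 2)   /* -2**62: plenty of headroom */

static int my_is_immortal(int64_t refcnt)
{
    return refcnt < 0;
}

int main(void)
{
    printf("%d\n", my_is_immortal(42));                  /* 0: ordinary object */
    printf("%d\n", my_is_immortal(MY_IMMORTAL_REFCNT));  /* 1: immortal object */
    return 0;
}
```

The comparison against zero compiles to a simple sign test, which is the cheapness being argued for above.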
@arigo, thanks for the feedback, both about speed.python.org and about accidental de-immortalization. @eduardo-elizondo has the insight we need (which I don't) in both cases, so I'll defer to what he has to say. @markshannon may have some thoughts as well, at least about the refcount corner case.
This is the implementation of PEP 683.
Motivation
The PR introduces the ability to immortalize instances in CPython, which bypasses reference counting. Tagging objects as immortal allows us to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference-count checks. However, it brings the ability to have truly immutable objects, which are useful in other contexts such as immutable data sharing between sub-interpreters.
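A minimal sketch of the mechanism described above, assuming a made-up sentinel refcount and helper names rather than the PR's actual macros:

```c
#include <stdint.h>

/* Illustrative names only.  The core idea: a reserved refcount value marks an
 * object as immortal, and the refcount operations check for it and leave the
 * field untouched, so immortal objects are never freed and their memory is
 * never dirtied by refcount writes (useful for fork/copy-on-write setups). */
#define MY_IMMORTAL_REFCNT 0x3FFFFFFFu

typedef struct my_obj {
    uint32_t refcnt;
    void (*dealloc)(struct my_obj *);
} my_obj;

static void my_incref(my_obj *op)
{
    if (op->refcnt != MY_IMMORTAL_REFCNT) {
        op->refcnt++;
    }
}

static void my_decref(my_obj *op)
{
    if (op->refcnt == MY_IMMORTAL_REFCNT) {
        return;                      /* immortal: skip the decrement entirely */
    }
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}
```

The extra comparison on every INCREF/DECREF is where the small regression mentioned above comes from.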
https://bugs.python.org/issue40255