Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-84436: Implement Immortal Instances #19474

Draft
wants to merge 160 commits into
base: main
Choose a base branch
from

Conversation

eduardo-elizondo
Copy link
Contributor

@eduardo-elizondo eduardo-elizondo commented Apr 11, 2020

Motivation

This PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this opens up future optimization opportunities that might end up making the runtime faster overall.

Implementation Details

The implementation relies on setting a bit in the reference count word. The 3 MSBs are already reserved. The 1st MSB can't be modified, otherwise, due to two's complement, the following: if (Py_REFCNT(o) > 0) fails on immortal instances. The 2-3 MSBs are used for the reduced GC Head (GH-7043). Therefore, we use the 4th MSB to mark an immortal instance with kImmortalBit. This also means that if an object reaches a reference count of kImmortalBit, it will automatically be immortalized.

Also, immortal instances will not interfere with the reference leak tooling. That is, they won't show up as leaks. This is done by guaranteeing the parity between _Py_RefTotal++ and _Py_RefTotal-- and setting them only on non-immortal instances.

https://bugs.python.org/issue40255

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Apr 11, 2020

This is ready to review, the CI is finally green. Really no idea why the newly added GC tests are failing on Windows and unfortunately I don't have a Windows machine to debug this.

@eduardo-elizondo eduardo-elizondo changed the title [WIP] bpo-40255: Implement Immortal Instances bpo-40255: Implement Immortal Instances Apr 11, 2020
@eduardo-elizondo
Copy link
Contributor Author

Looping in @carljm and @DinoV who have pointed out some of the issues with immortal instances in the permanent generation participating in a GC collection (i.e dicts). Let me know if you have some other thoughts or ideas on this!

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Apr 11, 2020

Also looping in @vstinner. Finally got around upstreaming this patch since you recently wrote about this on your C-API Improvement Docs

Include/object.h Outdated Show resolved Hide resolved
Include/object.h Outdated Show resolved Hide resolved
Modules/gcmodule.c Outdated Show resolved Hide resolved
Modules/gcmodule.c Outdated Show resolved Hide resolved
Include/object.h Outdated Show resolved Hide resolved
Lib/test/test_gc.py Outdated Show resolved Hide resolved
@nascheme
Copy link
Member

My first reaction is that this shouldn't become part of the default build because most Python users will not make use of it and then it becomes pure extra overhead. However, I know for some people that it is a useful feature (e.g. pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself since I write web applications with that style.

Would it be okay to make this a compile time option, disabled by default? I think in general it is a bad idea to have too many of those types of build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: thread/no-threads, Unicode width, various debug options (@vstinner removed some of those). So, I'm not super excited about introducing a new build option.

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also become immortal.

Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count.

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Apr 14, 2020

@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255

However, I know for some people that it is a useful feature

Exactly, this change might be a feature for CPython power users

Would it be okay to make this a compile time option, disabled by default?

Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)

I think in general it is a bad idea to have too many of those types of build options.

Yeah that's one of the drawbacks. That being said, I can help with setting up the travis build to integrate this change if needed (cc @vstinner).

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow?

We can indeed, I think somebody also mentioned that in the bug report. A potentially good place could be main.c:pymain_main right after pymain_main. Let me explore that and push that change if it looks like performance a improvement!

In theory we could optimize even further to reduce the perf cost. By leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but this could be changed to:

/* Branch-less incref saturated at PY_SSIZE_T_MAX */
#define _Py_INC_REF(op) ({
    __asm__ (
        "addq $0x1, %[refcnt]"
        "cmovoq  %[refcnt_max], %[refcnt]"
        : [refcnt] "+r" (((PyObject *)op)->ob_refcnt)
        : [refcnt_max] "r" (PY_SSIZE_T_MAX)
    );})

/* Branch-less decref saturated at PY_SSIZE_T_MAX */
#define _Py_DEC_REF(op) ({
    Py_ssize_t tmp = ((PyObject *)op)->ob_refcnt;
    __asm__ (
        "subq $0x1, %[refcnt]"
        "addq $0x1, %[tmp]"
        "cmovoq  %[refcnt_max], %[refcnt]"
        : [refcnt] "+r" (((PyObject *)op)->ob_refcnt), [tmp] "+r" (tmp)
        : [refcnt_max] "r" (PY_SSIZE_T_MAX)
    );})

@pablogsal
Copy link
Member

pablogsal commented Apr 14, 2020

Yeah that's one of the drawbacks. That being said, I can help with setting up the travis build to integrate this change if needed (cc @vstinner).

Not only that, we would need specialized buildbots to test the code base with this option activated in a bunch of supported platforms and that raises the maintainance costs.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature sounds controversial, so I block it until a consensus can be reached.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@@ -183,6 +183,8 @@ PyAPI_FUNC(int) _PyUnicode_CheckConsistency(
/* Interning state. */
#define SSTATE_NOT_INTERNED 0
#define SSTATE_INTERNED_MORTAL 1
#define SSTATE_INTERNED_IMMORTAL 2
#define SSTATE_INTERNED_IMMORTAL_STATIC 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you update this whatsnew topic?

* Remove the ``PyUnicode_InternImmortal()`` function and the
``SSTATE_INTERNED_IMMORTAL`` macro.
(Contributed by Victor Stinner in :gh:`85858`.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@methane will do! Trying to get this to a steady state with all the CI passing, turns out that 6 months of rebasing broke a lot of things 😅 . After which, I will go through and update all the required documentation.

Will mark this as ready to review once it's there

@eduardo-elizondo
Copy link
Contributor Author

Note that this PR is currently impacted by #100911

@ericsnowcurrently
Copy link
Member

This is also impacted by gh-101265.

@eduardo-elizondo
Copy link
Contributor Author

Quick update on this PR! PEP683 has been accepted with some conditions. We've addressed all the conditions except in the two scenarios that are covered under: GH-100911 and GH-101265. It currently stands at a 1.03x Geometric Mean Regression on the Pyperformance benchmark suite when compiled with GCC11.1 and measured against the main branch.

For more details and discussion, refer to: https://discuss.python.org/t/pep683-immortal-objects-updates/23382

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet