gh-84436: Implement Immortal Objects #19474
Conversation
This is ready to review; the CI is finally green. I really have no idea why the newly added GC tests are failing on Windows, and unfortunately I don't have a Windows machine to debug this.
Also looping in @vstinner. I finally got around to upstreaming this patch since you recently wrote about this in your C-API Improvement Docs.
My first reaction is that this shouldn't become part of the default build, because most Python users will not make use of it and then it becomes pure extra overhead. However, I know that for some people it is a useful feature (e.g. a pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself, since I write web applications in that style.

Would it be okay to make this a compile-time option, disabled by default?

I think in general it is a bad idea to have too many of those build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: threads/no-threads, Unicode width, and various debug options (@vstinner removed some of those). So I'm not super excited about introducing a new build option.

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple of years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also became immortal.

Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF, but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count.
@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255
Exactly, this change might be a feature for CPython power users
Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)
Yeah, that's one of the drawbacks. That being said, I can help with setting up the Travis build to integrate this change if needed (cc @vstinner).
We can indeed; I think somebody also mentioned that in the bug report. A potentially good place could be

In theory we could optimize even further to reduce the perf cost. By leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but it could be changed to:
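Roughly something along these lines — a sketch to illustrate the idea only, with made-up names and a made-up sentinel value rather than the actual macros:

```c
#include <stdint.h>

/* Illustrative only: MY_IMMORTAL_REFCNT and my_incref are not the PR's real
 * macros.  The point is that the compiler can turn the conditional below into
 * a conditional move (cmov), so the hot path contains no branch. */
#define MY_IMMORTAL_REFCNT 0x3FFFFFFFu

static inline void my_incref(uint32_t *refcnt)
{
    uint32_t old = *refcnt;
    uint32_t bumped = old + 1;
    /* Saturate at the immortal value: immortal objects keep their refcount,
     * everything else gets the incremented value. */
    *refcnt = (old == MY_IMMORTAL_REFCNT) ? old : bumped;
}
```

Whether the compiler actually emits a cmov here depends on the target and optimization level, so the benefit would need to be measured.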
Not only that, we would also need specialized buildbots to test the code base with this option activated on a range of supported platforms, and that raises the maintenance cost.
This feature sounds controversial, so I block it until a consensus can be reached.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
They are. The cause is in Lines 117 to 132 in 5c00a62.
Indeed, the module exports it; see Line 1377, Line 1397, and Lines 1411 to 1416 in a9b31ab.
I have no idea why some buildbots cannot import the module.
congrats @eduardo-elizondo!!!
Thanks for all the hard work you put in and for being so very patient.
Three years in the making and it's finally there! Huge shoutout to @ericsnowcurrently for working with me throughout the years to get this all the way through, you rock!! Also, thanks @markshannon @gvanrossum and @pablogsal for being a sounding board for ideas, reviews, and coaching on the messaging of the PR / PEP!
Congrats @eduardo-elizondo! 🔥 I followed this PR from the beginning, and even needed to port the changes (in 2020) into Python 3.9 due to the memory issue with multiprocessing at my previous company. At some point, I felt this was not going to make it into Python due to the inactivity here (especially in the reviews).
@ericsnowcurrently @eduardo-elizondo One of the acceptance conditions for PEP 683 was to update the PEP with final benchmark results. It appears that has not been done. Are the numbers available somewhere? I want to determine whether the performance regressions reported in #109049 are due to this PR (or perhaps other PRs as well). In the Discourse thread pep-683-immortal-objects-using-a-fixed-refcount-round-4-last-call there is some discussion about whether to add the performance numbers. The PEP has been updated since that message, but not with the performance numbers.
@eduardo-elizondo, do you have time to update the PEP with final benchmark results for ea2c001?
Note: https://speed.python.org/ shows a significant performance regression in roughly a third of all benchmarks, dated Apr 22, which is the date this PR was merged. The most significant I've seen so far (which, if reproduced locally, could help figure out the root cause) is unpack_sequence.
@ericsnowcurrently we actually already have the benchmark numbers here: #19474 (comment), which I ran right before merging; there are only test/lint fixes on top of that. This shows roughly a ~1.02x geometric-mean regression (~1.03x on MSVC). Let me know if this is what we are looking for! To be clear, there will be both slower and faster benchmarks. However, we should focus on the geometric mean (rather than a single benchmark), which is our best proxy when benchmarks move in both directions. Separately, performance measurements can come up with wildly different results in different environments. For these experiments I used gcc-11.1 and MSVC v14.33 on 'lab-like' bare-metal machines, which resulted in consistently reproducible results.
There is a discrepancy between the text in PEP 683 and the actual implementation on 64-bit machines, in the sections Accidental Immortality and Accidental De-Immortalizing. If it is already mentioned somewhere, sorry about that; but it might be useful to mention it inside the PEP itself.

The problem is that, unless I'm missing something, the PEP says that 64-bit machines don't have any accidental-immortality problems in practice because a 64-bit or close-to-64-bit refcount never overflows; but the implementation instead starts to consider objects immortal as soon as their refcount reaches 31 bits, a much more reachable value on 64-bit machines.

The issues listed in Accidental De-Immortalizing are particularly problematic in this situation. I have not tested it, but looking at the source code it seems that this kind of code would crash CPython on 64-bit machines.
Again, this is not a real bug report, because it seems that this kind of issue was considered and is apparently accepted as a trade-off of supporting stable-ABI modules compiled with older versions of CPython. This is more of a missing-documentation issue.
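As a rough illustration of the threshold being discussed (a sketch under the assumption that the 64-bit immortality check only inspects the low 32 bits of the refcount; the names are made up, not CPython's):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model, not CPython's actual code: if immortality is detected
 * by testing only the low 32 bits of a 64-bit refcount, then any refcount
 * whose low half has its top bit set already "looks" immortal, even though
 * the full 64-bit counter is nowhere near overflowing. */
static int looks_immortal(int64_t refcnt)
{
    return (refcnt & UINT32_C(0x80000000)) != 0;  /* bit 31 of the refcount */
}

int main(void)
{
    printf("%d\n", looks_immortal(1000));               /* 0: ordinary refcount */
    printf("%d\n", looks_immortal((int64_t)1 << 31));   /* 1: treated as immortal */
    return 0;
}
```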
Note: a potential fix for this issue would be to change Py_INCREF on 64-bit platforms: instead of checking if

The value to initialize immortal objects with would be something like
...or, change _Py_IsImmortal to (Py_ssize_t)refcount < 0 on 64-bit, and then just use _Py_IsImmortal in both INCREF and DECREF on both 32- and 64-bit platforms, and be done with it? This should Always Just Work(tm) if we reasonably assume that it's completely impossible to repeat a loop

(Here I'm working with the implicit, never-documented assumption that people first tried to use the 32-bit code directly on 64-bit, with the immortal value 0x3fffffffffffffff, but found that it had some performance impact on Intel. That would be because a constant value that doesn't fit in 32 bits does indeed have a cost. That's why I'm suggesting here to use (Py_ssize_t)refcount < 0, which is simpler and might even be cheaper than the current 32- and 64-bit mix of arithmetic on the same refcount.)
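A minimal sketch of that suggestion, with illustrative names and an assumed initial value rather than anything from the actual patch:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: MY_IMMORTAL_REFCNT and my_is_immortal are made-up names.
 * The idea is to give immortal objects a negative 64-bit refcount with a huge
 * cushion on both sides, so ordinary INCREF/DECREF traffic can never flip the
 * sign back, and the immortality test becomes a single signed comparison
 * against zero. */
#define MY_IMMORTAL_REFCNT  (INT64_MIN / 2)   /* -2**62: plenty of headroom */

static int my_is_immortal(int64_t refcnt)
{
    return refcnt < 0;
}

int main(void)
{
    printf("%d\n", my_is_immortal(42));                  /* 0: ordinary object */
    printf("%d\n", my_is_immortal(MY_IMMORTAL_REFCNT));  /* 1: immortal object */
    return 0;
}
```

The comparison against zero compiles to a simple sign test, which is the cheapness being argued for above.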
@arigo, thanks for the feedback, both about speed.python.org and about accidental de-immortalization. @eduardo-elizondo has the insight we need (which I don't) in both cases, so I'll defer to what he has to say. @markshannon may have some thoughts as well, at least about the refcount corner case.
This is the implementation of PEP 683.
Motivation
The PR introduces the ability to immortalize instances in CPython, which bypasses reference counting. Tagging objects as immortal allows us to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference-count checks. However, it brings the ability to have truly immutable objects, which are useful in other contexts such as immutable data sharing between sub-interpreters.
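A minimal sketch of the mechanism described above, assuming a made-up sentinel refcount and helper names rather than the PR's actual macros:

```c
#include <stdint.h>

/* Illustrative names only.  The core idea: a reserved refcount value marks an
 * object as immortal, and the refcount operations check for it and leave the
 * field untouched, so immortal objects are never freed and their memory is
 * never dirtied by refcount writes (useful for fork/copy-on-write setups). */
#define MY_IMMORTAL_REFCNT 0x3FFFFFFFu

typedef struct my_obj {
    uint32_t refcnt;
    void (*dealloc)(struct my_obj *);
} my_obj;

static void my_incref(my_obj *op)
{
    if (op->refcnt != MY_IMMORTAL_REFCNT) {
        op->refcnt++;
    }
}

static void my_decref(my_obj *op)
{
    if (op->refcnt == MY_IMMORTAL_REFCNT) {
        return;                      /* immortal: skip the decrement entirely */
    }
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}
```

The extra comparison on every INCREF/DECREF is where the small regression mentioned above comes from.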
https://bugs.python.org/issue40255