bpo-40255: Implement Immortal Instances #19474
Conversation
This is ready to review; the CI is finally green. I really have no idea why the newly added GC tests were failing on Windows, and unfortunately I don't have a Windows machine to debug this.
Also looping in @vstinner. I finally got around to upstreaming this patch since you recently wrote about this in your C-API Improvement Docs.
Include/object.h (Outdated)

    static inline int _Py_IsImmortal(PyObject *op)
    {
        return (op->ob_refcnt & kImmortalBit) != 0;
    }
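Since the hunk above shows only the predicate, here is a self-contained sketch of how such a check would typically guard the refcount writes. The `FakeObject` type, `kImmortalBit` value, and function names are stand-ins for illustration, not CPython's actual code:

```c
#include <stdint.h>

/* Stand-in types: int64_t models Py_ssize_t on a 64-bit build, and the
 * object struct is reduced to just its refcount field. */
typedef struct { int64_t ob_refcnt; } FakeObject;

/* 4th most significant bit of the refcount word, as in the PR. */
#define kImmortalBit ((int64_t)1 << (64 - 4))

static inline int is_immortal(const FakeObject *op) {
    return (op->ob_refcnt & kImmortalBit) != 0;
}

/* Guarded INCREF/DECREF: immortal objects are never written to, which is
 * what keeps their memory pages clean for copy-on-write after fork(). */
static void fake_incref(FakeObject *op) {
    if (!is_immortal(op))
        op->ob_refcnt++;
}

static void fake_decref(FakeObject *op) {
    if (!is_immortal(op))
        op->ob_refcnt--;   /* the real code would deallocate at zero */
}
```

The key property is that a refcount with the immortal bit set is left untouched by both operations, so the page holding the object is never dirtied.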
Could you move these functions to the private API as well (Include/internal/pycore_gc.h)?
Can do!
My first reaction is that this shouldn't become part of the default build, because most Python users will not make use of it, and then it becomes pure extra overhead. However, I know that for some people it is a useful feature (e.g. a pre-fork server architecture that exploits copy-on-write OS memory management). I would use it myself, since I write web applications in that style. Would it be okay to make this a compile-time option, disabled by default?

I think in general it is a bad idea to have too many of those build options. It makes code maintenance and testing more difficult. Some example build variations from the past that caused issues: threads/no-threads, Unicode width, various debug options (@vstinner removed some of those). So I'm not super excited about introducing a new build option.

Is it possible we can leverage this extra status bit on objects to recover the lost performance somehow? A couple of years ago I did a "tagged pointer" experiment that used a similar bit. In that case, small integers became one machine word in size and also became immortal.

Another thought: when you did your testing, were any objects made immortal? I would imagine that, by default, you could make everything immortal after initial interpreter startup. You are paying for an extra test+branch in INCREF and DECREF, but for many objects (e.g. None, True, False, types) you avoid dirtying the memory/cache with writes to the reference count.
@nascheme you should definitely join the conversation happening in the bug report of this PR https://bugs.python.org/issue40255
Exactly, this change might be a feature for CPython power users
Yeah, that's probably the best option. That's also the consensus in the bug report thread (if the change is approved)
Yeah, that's one of the drawbacks. That being said, I can help with setting up the Travis build to integrate this change if needed (cc @vstinner).
We can indeed, I think somebody also mentioned that in the bug report. A potentially good place could be

In theory we could optimize even further to reduce the perf cost: by leveraging saturated adds and conditional moves we could remove the branching instruction. I haven't explored this further since the current PR was good enough. Personally, I favor the current PR, but this could be changed to:
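The snippet that followed was not preserved here, but a hedged sketch of the branchless idea mentioned above (the names and exact form are my assumptions, not the author's code) could look like:

```c
#include <stdint.h>

/* Branchless INCREF sketch: instead of testing the immortal bit and
 * jumping, add 0 (immortal) or 1 (mortal) to the refcount word. The
 * compiler can lower the boolean to a flag-set plus an add, with no
 * branch. Note it still performs a store, so it trades the branch for
 * a memory write. */

#define kImmortalBit ((int64_t)1 << (64 - 4))

static inline void incref_branchless(int64_t *refcnt) {
    *refcnt += !(*refcnt & kImmortalBit);
}
```

For an immortal object the addend is 0, so the count is unchanged; for a mortal object it behaves exactly like a plain increment.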
Not only that, we would need specialized buildbots to test the code base with this option activated on a bunch of supported platforms, and that raises the maintenance costs.
This feature sounds controversial, so I block it until a consensus can be reached.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
Hi @methane, let me provide a few more details, but I am not able to describe the full process, because I am obliged to keep some business secrets from my previous role.
That is all I remember. I am no longer at that company; I could contact them if you want to hear more details from them. Currently they have to work with Python 3.8 with @eduardo-elizondo's (modified) solution, which reaches end of life in 2024. I just want to let people reviewing this PR know that there are real-world needs for this solution. (Even if, in some cases, there are sophisticated ways to arrange our huge data, it is not always obvious, and it could potentially complicate the whole codebase a lot.)
@ydshieh Thank you for your detailed information. Although I think building a 100 GB trie on top of Python dicts is a bad idea, it is one kind of Python application.
Of course, re-forking on each request is a bad idea. Common web applications recreate the worker process every 100~1000 requests. (And loading the application after fork is much safer, e.g.
Dicts have two memory blocks. One is the PyObject part, which is 48 bytes. The other is a variable-sized hashtable. When a dict is very small (but not empty), the hashtable is 160 bytes (len(d) <= 5) or 280 bytes (5 < len(d) <= 10). INCREF/DECREF changes only the PyObject part. Assuming all dicts in your trie are very small, at most 64/(64+160) = 29% of the memory would be copied into each process. Again, thank you for your information; such information is needed to make this kind of estimation. FWIW, there is another pull request about memory usage.
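The 29% estimate above can be reproduced with a few lines; the byte sizes are the ones quoted in the comment (64-bit CPython around 3.8) and vary across versions and platforms, so treat this as a back-of-the-envelope sketch:

```c
/* Hedged estimate using the sizes quoted in the comment above
 * (64-bit CPython ~3.8; exact sizes vary by version and platform). */

#define DICT_FIXED_PART (48 + 16)  /* PyObject part + GC header, bytes */
#define SMALL_HASHTABLE 160        /* separate hashtable when len(d) <= 5 */

/* Fraction of a small dict's memory dirtied by refcount writes, i.e.
 * the part copied per forked worker under copy-on-write. */
static double copied_fraction(void) {
    return (double)DICT_FIXED_PART / (DICT_FIXED_PART + SMALL_HASHTABLE);
}
```

64 / (64 + 160) is about 0.286, which rounds to the 29% quoted above; larger dicts have bigger hashtables, so the copied fraction only shrinks from there.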
@eduardo-elizondo Would you merge the main branch into all these pull requests? I am worried about memory leaks and performance regressions; that's why I suggested making this a configure option on the mailing list. @vstinner added a leak test recently, so merging the main branch will address my worry. I am OK with merging this without a configure option if there are no leaks. (I am not an SC member, so this is just my personal request.)
@methane Thank you for the reply. I would like to provide a bit more information and some remarks, and some questions (just taking this chance to learn a bit more about the internals) if you have time to answer.
My questions (if you have time to give a CPython newbie some lessons):
- 64/(64+160): what does 64 represent here, and what does this formula compute?
- "One is the PyObject part. The other is a variable-sized hashtable.": what are the roles of these two parts?
- "INCREF/DECREF changes only the PyObject part.": what happens when a read op is performed, i.e. when we call a_dict[key], in terms of these two parts? In particular, what gets copied?
Final personal thoughts: there might be alternate solutions to avoid this CoW issue, but:
Thank you again!
> Assuming all dicts in your trie are very small: this is not our case.

If the average dict size is larger, the memory copied would become smaller than 29%.

> 64/(64+160): what does 64 represent here, and what does this formula compute?

sizeof(PyDictObject) + sizeof(PyGCHeader).

> One is the PyObject part. The other is a variable-sized hashtable: what are the roles of these two parts?

The PyObject part contains the fixed-size data and metadata that all Python objects have. The refcount is stored here, so CoW happens here. The other part contains the variable-sized data, including pointers to the keys and values. This part doesn't hold a Python object's refcount, so this pull request doesn't affect it, and CoW won't happen there until you modify the dict.

> INCREF/DECREF changes only the PyObject part: what happens when a read op is performed, i.e. when we call a_dict[key], in terms of these two parts? In particular, what gets copied?

When getting an item, the value's PyObject* is retrieved from the hashtable, and the refcount of the value (which is not stored in the hashtable) is incremented. In your case, the value is very likely another dict.

Let's stop here. It is too off-topic.
@eduardo-elizondo Thank you for updating this.
This reverts commit 5af0167.
I would not assume that the trie code is using the builtin dict implementation. It may be that they are using tree-related code, and their keys are mostly the sort of string that gets interned, and therefore no longer have reference count churn.
This reverts commit 25fd52a.
Motivation
This PR introduces the ability to immortalize instances in CPython, which bypasses reference counting. Tagging objects as immortal allows us to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this opens up future optimization opportunities that might end up making the runtime faster overall.
Implementation Details
The implementation relies on setting a bit in the reference count word. The 3 MSBs are already reserved. The 1st MSB can't be modified; otherwise, due to two's complement, the following:

    if (Py_REFCNT(o) > 0)

fails on immortal instances. The 2nd and 3rd MSBs are used for the reduced GC head (GH-7043). Therefore, we use the 4th MSB to mark an immortal instance with kImmortalBit. This also means that if an object reaches a reference count of kImmortalBit, it will automatically be immortalized.

Also, immortal instances will not interfere with the reference leak tooling; that is, they won't show up as leaks. This is done by guaranteeing parity between _Py_RefTotal++ and _Py_RefTotal-- and applying them only to non-immortal instances.

https://bugs.python.org/issue40255