
bpo-38119: Fix shmem resource tracking #21516

Open · wants to merge 3 commits into base: master
Conversation

@vinay0410 (Contributor) commented Jul 17, 2020

Eliminates the use of the resource tracker on shared memory segments but preserves its use on semaphores, etc. where it is appropriately applied.

Adds a test that fails with the current code (an independent Python process started via subprocess accesses shared memory created by a different Python process, then exits normally, expecting the original creator process to continue to be able to access that shared memory) but passes with this patch to defend against future regressions.

https://bugs.python.org/issue38119
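
The scenario the new test guards against can be sketched like this (a hedged, self-contained illustration, not the PR's actual test code; depending on interpreter version and platform, the re-attach below either succeeds or fails with FileNotFoundError):

```python
# Reproduce the bpo-38119 scenario: a second Python process attaches to an
# existing segment and exits normally; with the buggy tracking, the child's
# resource tracker unlinks the segment even though the creator still uses it.
import subprocess
import sys
import time
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[0] = 42

child_code = (
    "from multiprocessing import shared_memory; "
    f"s = shared_memory.SharedMemory(name={shm.name!r}); "
    "s.close()"  # attach, use, then a perfectly normal exit
)
subprocess.run([sys.executable, "-c", child_code], check=True)
time.sleep(1)  # give the child's resource tracker time to act

try:
    probe = shared_memory.SharedMemory(name=shm.name)
    probe.close()
    outcome = "survived"   # desired behaviour, which the patch defends
except FileNotFoundError:
    outcome = "unlinked"   # the bug: the creator lost its segment
print("segment", outcome)

shm.close()
try:
    shm.unlink()
except FileNotFoundError:
    pass
```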

@vinay0410 (Contributor, Author) commented Jul 17, 2020

@tiran, this is a duplicate of #16139 #15989, which had conflicts, so I fixed them and opened this.
These changes will also fix the following issue:
https://bugs.python.org/issue39959

@gvanrossum (Member) commented Jul 27, 2020

Davin (@applio), can you review and approve this? What was the hold-up with your own fix? (This is a copy that fixes the merge conflicts.) Or alternatively, fix the conflicts in #16139 so that it can be merged?

@vinay0410 (Contributor, Author) commented Jul 28, 2020

Correction, I mentioned the wrong pull request earlier.
This is a duplicate of #15989

@DamianBarabonkovQC commented Aug 3, 2020

I would like to ask about this issue to get some more context, because I may be missing something. I have noticed the exact same problem described in issue38119. Is removing resource tracking entirely from SharedMemory the preferred way forward?

I think there is some benefit to this resource tracking. I'm giving you a heads-up that I'm currently working on a patch that resolves this issue and retains the resource tracker. I'll submit it as a PR when it is done.

@gvanrossum (Member) commented Aug 3, 2020

We don't actually know what went on in @applio's head when he wrote this patch, and he doesn't seem to respond to email. But, @DamianBarabonkovQC, before you do more work, can you explain what you are planning to do? (Honestly, I personally know nothing about the resource tracker; I am just hoping to help find a resolution of one type or another.) Probably the better place for your write-up would be in bpo-38119.

@vinay0410 (Contributor, Author) commented Aug 3, 2020

@DamianBarabonkovQC, I would say removing the resource tracker is not the best way forward, but it is the easiest way to fix some of the major issues plaguing shared_memory. You can see my comment detailing why Unix's shared memory will still be inconsistent with Windows.

Windows uses a reference-counting mechanism to clean up shared memory segments: it only destroys/unlinks a segment once all the processes using it are no longer running.

I have mentioned some methods for implementing a reference-counting mechanism to make Unix's implementation consistent with Windows here.

Having said that, there are still major issues in shared_memory which this PR will fix. Consider the scenario mentioned here.
A similar scenario has also been mentioned here.

@DamianBarabonkovQC commented Aug 3, 2020

@vinay0410, on this matter I agree that this PR is a good fix at present. I'm just thinking ahead.

I like the idea of a reference count at the start of the shared memory segment, for its simplicity and clarity. My solution involves the /tmp directory, but as was mentioned in your post, that might not always work depending on the user's settings for /tmp.

@vinay0410 (Contributor, Author) commented Aug 3, 2020

@DamianBarabonkovQC, completely agree. I think for now this should be merged to fix some of the critical issues, and then we should have a discussion with the core developers about a better, more robust solution to this problem.

@DamianBarabonkovQC commented Aug 6, 2020

Thanks for your input on this matter, @gvanrossum. Both Vinay and I agree that, for the time being, a temporary patch would be to simply remove resource tracking from SharedMemory, since it is a source of issues in the current implementation. How could we get this patch accepted?

Moving forward, I have begun a discussion detailing a potential solution here. It is not a very invasive proposal, and it should implement resource tracking correctly.

I would really appreciate your thoughts on it, since maybe my proposal is more invasive than I think, or there is a better approach.

@gvanrossum (Member) commented Aug 6, 2020

You need to get someone to review it who actually understands shared memory. Since @tiran hasn't responded to your review request and @applio has not responded to repeated pleas, I'm not sure who to try next (I don't qualify; I don't understand this code at all). Maybe you can email @ericsnowcurrently and get him interested?

@vinay0410 (Contributor, Author) commented Aug 28, 2020

Hi @gvanrossum, thanks for suggesting possible reviewers. I dropped an email to @ericsnowcurrently two weeks ago, but he hasn't opened it yet. Do you have any other suggestions for getting this PR reviewed, since this is a relatively critical issue in multiprocessing.shared_memory?

@vinay0410 vinay0410 closed this Aug 28, 2020
@vinay0410 (Contributor, Author) commented Aug 28, 2020

Reopened to re-trigger CI/CD.

@vinay0410 vinay0410 reopened this Aug 28, 2020
@gvanrossum (Member) commented Aug 28, 2020

@vinay0410 vinay0410 force-pushed the vinay0410:fix_shmem_resource_tracking branch from a061cd5 to 6d09ce7 Aug 28, 2020
@vinay0410 (Contributor, Author) commented Oct 3, 2020

Hi @gvanrossum, I posted on the python-dev mailing list around a month ago (https://mail.python.org/archives/list/python-dev@python.org/message/O67CR7QWEOJ7WDAJEBKSY74NQ2C4W3AI/) but haven't received any response.
Are there any other next steps?

@gvanrossum (Member) commented Oct 3, 2020

@vinay0410 (Contributor, Author) commented Oct 3, 2020

Sure, I will send you a reminder on October 19th.

@lthakur007 commented Oct 14, 2020

Hello Team,
May I know when this PR will get merged?
Thank You!

@aeros aeros requested a review from pitrou Oct 18, 2020
@vinay0410 (Contributor, Author) commented Oct 19, 2020

Hi @gvanrossum, as suggested, I am reminding you to take up review of this PR during the Python core dev sprint.

@gvanrossum (Member) commented Oct 19, 2020

@vinay0410 Thanks for the ping. I've heard that @pitrou should be able to help out here.

@pitrou pitrou removed the request for review from tiran Oct 19, 2020
@pitrou (Member) commented Oct 19, 2020

Removed @tiran as a reviewer: this is really because the **/*sha* pattern in CODEOWNERS wrongly matches multiprocessing/shared_memory.py (the intent was to match the SHAxxx cryptographic hash algorithms).

@pitrou (Member) commented Oct 19, 2020

This code was originally added in #13222. I don't think we can remove it without some discussion. I'm cc'ing @pierreglaser as well as @tomMoral and @ogrisel to see if they have better suggestions to reconcile the various expectations here.

@pitrou (Member) commented Oct 19, 2020

(otherwise I'll recommend rejecting)

@gvanrossum (Member) commented Oct 19, 2020

Why reject? There is a real problem here (https://bugs.python.org/issue38119).

@pitrou (Member) commented Oct 19, 2020

Well, the problem is really differing expectations. In some cases, you want shared memory segments to be reliably collected when a process dies abruptly (Unix systems don't ensure this, so you have to do it yourself). In other cases, you want shared memory segments to outlive their known consumers. The proper solution, IMHO, would be to make it configurable so that everybody can have it their own way.

In any case, removing this code blindly would produce resource leaks in existing applications.
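
For illustration, the configurable behaviour described here could surface as an opt-out keyword on SharedMemory; newer CPython releases (3.13+) did eventually add a `track` keyword of exactly this shape. A hedged sketch that feature-detects the keyword and falls back to the always-tracked behaviour on older versions:

```python
# Opt out of resource tracking when the interpreter supports it. With
# track=False, cleanup becomes entirely the caller's responsibility.
import inspect
from multiprocessing import shared_memory

params = inspect.signature(shared_memory.SharedMemory.__init__).parameters
kwargs = {"track": False} if "track" in params else {}

shm = shared_memory.SharedMemory(create=True, size=8, **kwargs)
shm.buf[:3] = b"abc"
assert bytes(shm.buf[:3]) == b"abc"
shm.close()
shm.unlink()   # with track=False, nobody else will ever do this for us
```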

@gvanrossum (Member) commented Oct 19, 2020

Ah, I see. First thing I've seen in this entire thread from someone who actually understands both sides of the problem. Thanks!

@vinay0410 (Contributor, Author) commented Oct 20, 2020

@pitrou, @gvanrossum,
I agree that removing this code will cause problems like memory leaks.
As @pitrou suggested making it configurable: how about adding track=True as an optional argument defaulting to True? With track=False, no resource tracking would be used and the user would have to destroy the shared memory segment manually; otherwise resource tracking would work as it does now.

Another approach is to use Windows' mechanism for cleaning up shared memory. Windows keeps a reference count for every shared memory segment, and only destroys a segment when its reference count reaches 0. Using this would also make Unix's mechanism consistent with the current Windows mechanism.
It would be difficult to implement the configurable approach on Windows, because Windows destroys shared memory segments as soon as their reference count reaches 0, even if track=False.

@tomMoral (Contributor) commented Oct 20, 2020

The idea of the resource_tracker is to add a safety net for shared_memory. Correct clean-up cannot always happen, either because processes can die abruptly or because users don't realize they need to clean up. On long-running machines -- such as servers or computing clusters -- these leaked resources can pile up and become a real issue. Moreover, this can be hard to debug, for instance if someone gets a shared-memory-full error that is actually caused by other users' leaked resources.

The resource_tracker avoids this by being perhaps overly conservative, but I don't think removing it completely is the right option. IMO, @pitrou's suggestion of making this configurable is probably the only way to easily allow both behaviours while explicitly leaving the duty of cleaning up the resource to the user -- adding an option that bypasses the resource_tracker for users who want to manage the shared memory themselves, while still being conservative in the default case.

As for the ref count at the beginning of the file: the issue is that I don't see how to ensure we detect that the resource should be cleaned up if a process holding a reference to it dies.

@vinay0410 (Contributor, Author) commented Oct 20, 2020

@tomMoral, for reference counting at the beginning of the file, one would still need the resource tracker to ensure that the shared memory segment is destroyed as soon as the reference count reaches 0; the resource tracker would be responsible for incrementing/decrementing the reference count.

@tomMoral (Contributor) commented Oct 20, 2020

@vinay0410 The issue is with the child processes of the main process.

Only the main process is tracked by the resource_tracker. So if any of the child processes die without decrementing the ref count, how does the resource tracker know how many child processes have incremented it?

Maybe it would be possible to have an internal ref count in the resource tracker -- allowing it to decrement the ref count by the number of processes connected to the resource_tracker that died -- but this is by no means easy and can quite easily lead to resource leaks or early destruction. So I am not sure it is a solution that can work.

Just thinking out loud, maybe another solution would be to make the ref count the number of resource_trackers that hold a reference to the shared memory? This would group the processes per block (main process and descendants), without changing the behaviour when there is only one block, and with a clean-up mechanism aware of the multiple blocks, tracked at the block level -- clean up (decref/unlink) once the main process dies.

@vinay0410 (Contributor, Author) commented Oct 20, 2020

Just thinking out loud, maybe another solution would be to make the ref count the number of resource_trackers that hold a reference to the shared memory? This would group the processes per block (main process and descendants), without changing the behaviour when there is only one block, and with a clean-up mechanism aware of the multiple blocks, tracked at the block level -- clean up (decref/unlink) once the main process dies.

@tomMoral I think I concur. If the shared memory segment is passed to a child process, the reference count shouldn't be increased, and it shouldn't be decreased when the child process dies. So basically, the reference count of a shared memory segment can only be increased or decreased by a process which attaches to it directly or creates it.

Is that what you were also thinking?

Also, I think child processes will have their own separate resource trackers, right?
Because shared memory segments created by child processes are still being tracked (I think).

@tomMoral (Contributor) commented Oct 20, 2020

Also, I think child processes will have their own separate resource trackers, right?
Because shared memory segments created by child processes are still being tracked (I think).

Unless something weird happens, the resource tracker is shared across all descendants of a main process (see here for instance). If a child process creates a new segment, it is also tracked in the same resource_tracker.

@tomMoral I think I concur. If the shared memory segment is passed to a child process, the reference count shouldn't be increased, and it shouldn't be decreased when the child process dies. So basically, the reference count of a shared memory segment can only be increased or decreased by a process which attaches to it directly or creates it.

Is that what you were also thinking?

Actually, the issue is that passing the shared memory segment to a child process is the same as passing it to any other process. The name of the segment is sent and we create a new SharedMemory object; we cannot have different behaviour between these two cases. IMO, incrementing the refcount should be done by the resource_tracker. This ensures that we have only one reference count per "group of processes", as the resource_tracker is shared by all these processes. And if a register event is sent for the same segment name, the refcount won't be increased again.

@vinay0410 (Contributor, Author) commented Nov 2, 2020

@tomMoral, should I then start implementing the refcount using the resource tracker approach? Are you okay with that?

@tomMoral (Contributor) commented Nov 3, 2020

I am okay with the approach of ref counting via the resource_tracker.

Just to summarize and make sure we are on the same page:

  • When creating a SharedMemory segment with create=True, reserve space for an integer that will be the ref count.
  • Whenever a SharedMemory is instantiated, call register in the resource_tracker.
  • The resource_tracker handles incrementing the refcount, and it only increments it when the name of the shared memory segment is unknown to it (once per group of processes). Not sure which primitive can be used to count atomically here; an option is to use a semaphore file (store the name at the beginning of the shm segment), but it feels wasteful.
  • When a group of processes dies, if the segment still exists (it may have been unlinked by a direct call in another process), the refcount is decremented, and if it reaches 0, the resource_tracker calls unlink on the segment.

This seems to provide a simple enough mechanism to ensure that shared memory segments are not left unattended, while providing the flexibility to have multiple unrelated processes using them.

A potential backward-compatibility issue is with cross-language applications (for instance, people using a C program to access this shared memory). I don't know if there is a way to ensure that the refcount is not treated as part of the user-visible data of the segment.
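
The summarized steps can be sketched in pure Python as a layout illustration. All helper names here are hypothetical, and the struct-based read-modify-write below is NOT atomic; a real implementation would still need the atomic primitives discussed further down the thread:

```python
# Layout sketch only: the refcount lives in the last 8 bytes of the segment
# and users are meant to touch only the leading part.
import struct
from multiprocessing import shared_memory

REFCOUNT_SIZE = 8  # bytes reserved at the end for a signed 64-bit counter

def create_segment(user_size):
    shm = shared_memory.SharedMemory(create=True, size=user_size + REFCOUNT_SIZE)
    struct.pack_into("q", shm.buf, user_size, 1)  # creator holds 1 reference
    return shm

def incref(shm, user_size):
    # NOT atomic: placeholder for __atomic_add_fetch on the same bytes.
    n = struct.unpack_from("q", shm.buf, user_size)[0] + 1
    struct.pack_into("q", shm.buf, user_size, n)
    return n

def decref(shm, user_size):
    n = struct.unpack_from("q", shm.buf, user_size)[0] - 1
    struct.pack_into("q", shm.buf, user_size, n)
    return n

shm = create_segment(64)
shm.buf[:5] = b"hello"         # user data never touches the counter
assert incref(shm, 64) == 2    # another group of processes attached
assert decref(shm, 64) == 1    # ...and went away again
if decref(shm, 64) == 0:       # last reference gone: time to unlink
    shm.close()
    shm.unlink()
```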

@vinay0410 (Contributor, Author) commented Nov 3, 2020

Not sure which primitive can be used to count atomically here; an option is to use a semaphore file (store the name at the beginning of the shm segment), but it feels wasteful.

@tomMoral, I was thinking of using the following primitives to count atomically:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

These primitives ensure cross-process atomic updates, and some of them (namely __atomic_store_n and __atomic_load_n) are already used in the CPython source code to implement the GIL:

__atomic_store_n(&((ATOMIC_VAL)->_value), NEW_VAL, ORDER)

@tomMoral (Contributor) commented Nov 3, 2020

@vinay0410 As these are low-level C primitives, the refcount seems complicated to implement and maintain -- but maybe I just don't see how you would use them, because the resource_tracker is implemented in pure Python. What is your idea for calling these primitives?

Also, do you think it is possible to shift the mmap or the memoryview by a certain number of bytes, to avoid writing over the refcount by mistake if it is at the beginning? Or maybe the refcount should be put at the end of the segment, as the mmap and memoryview can be restricted to a smaller size. Then the only danger comes from the fact that a C program could very well overwrite the refcount, but that is low-level stuff, so probably okay.

Note that it might be worth considering a named semaphore (which is typically designed for such atomic counting): add the name of the semaphore at the end of the file, plus the length of the name so it can be read back. This would allow implementing everything in Python (Semaphore and pickle). I can try to do a basic implementation if you want.

@pitrou (Member) commented Nov 3, 2020

as these are low-level C primitives, the refcount seems complicated to implement and maintain

It would be easy enough to expose Python wrappers for the atomic intrinsics in the _multiprocessing module, IMHO.

Note that you need atomic increments and decrements, not mere loads and stores. This is provided in a portable way by the remarkable portable-snippets library: https://github.com/nemequ/portable-snippets/tree/master/atomic

Or maybe the refcount should be put at the end of the segment, as the mmap and memoryview can be restricted to a smaller size.

That sounds like the best solution, since it would maintain alignment (though I'm not sure it matters a lot).

cc @aeros, by the way.

@vinay0410 (Contributor, Author) commented Nov 4, 2020

@pitrou ,

Note that you need atomic increments and decrements, not mere loads and stores.

That's correct. I am planning to use __atomic_add_fetch and __atomic_sub_fetch.

Or maybe the refcount should be put at the end of the segment, as the mmap and memoryview can be restricted to a smaller size.

That sounds like the best solution, since it would maintain alignment (though I'm not sure it matters a lot).

In shared_memory.py, where the buffer is currently created as

self._buf = memoryview(self._mmap)

I was hoping to replace this with:

self._buf = memoryview(self._mmap)
self._ref_count = self._buf[0:4]
self._public_buf = self._buf[4:]

And then replace self._buf with self._public_buf here.

Finally, I will pass self._ref_count to Python functions which will atomically increment or decrement the refcount, implemented using C builtins.
cc: @tomMoral
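
The slicing plan above can be exercised with a plain anonymous mmap standing in for the shared-memory mapping (a hedged illustration only; the variable names mirror the proposal, and it shows that user writes through the offset view cannot clobber the header):

```python
# The first 4 bytes hold the refcount; everything after byte 4 is public.
import mmap
import struct

m = mmap.mmap(-1, 4096)      # anonymous mapping stands in for the shm mmap
buf = memoryview(m)
ref_count = buf[0:4]         # private header view (self._ref_count)
public_buf = buf[4:]         # what would be handed out (self._public_buf)

struct.pack_into("i", ref_count, 0, 1)
public_buf[:5] = b"hello"    # user data starts at byte 4 of the mapping

assert struct.unpack_from("i", ref_count, 0)[0] == 1  # header intact
assert m[4:9] == b"hello"

public_buf.release()
ref_count.release()
buf.release()
m.close()
```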

@vinay0410 (Contributor, Author) commented Nov 6, 2020

Hi @pitrou and @tomMoral, I have raised PR #23174 implementing the above solution. I have not updated tests and documentation yet, because I first wanted some initial feedback on the approach I am using.

I have added functions _posixshmem.shm_inc_refcount(memoryview) and _posixshmem.shm_dec_refcount(memoryview) to atomically increment and decrement the integer referenced by the passed memoryview object.

@aeros (Member) commented Nov 8, 2020

@tomMoral

The issue is with the child processes of the main process.

Only the main process is tracked by the resource_tracker. So if any of the child processes die without decrementing the ref count, how does the resource tracker know how many child processes have incremented it?

Maybe it would be possible to have an internal ref count in the resource tracker -- allowing it to decrement the ref count by the number of processes connected to the resource_tracker that died -- but this is by no means easy and can quite easily lead to resource leaks or early destruction. So I am not sure it is a solution that can work.

I'm not familiar enough with resource_tracker to know for certain whether this would work, but one possible way to handle it could be a SIGCHLD handler that decrements the internal refcount whenever a child process dies or is stopped. Some care would be needed to ensure that, if a handler was already set by an external program, it is restored after all of the child processes have been accounted for.
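
A hedged sketch of that SIGCHLD idea (hypothetical names, not actual resource_tracker code; POSIX-only, since Windows has no SIGCHLD), including the precaution of chaining to a previously installed handler:

```python
# Decrement an internal per-segment refcount whenever a child dies, and
# chain to any handler an external program had already installed.
import os
import signal

segment_refcounts = {"psm_example": 3}    # hypothetical tracked segments

_previous = signal.getsignal(signal.SIGCHLD)

def _on_child_exit(signum, frame):
    while True:                           # reap every child that has exited
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                         # no children left at all
        if pid == 0:
            break                         # children exist, none has exited
        for name in segment_refcounts:
            segment_refcounts[name] -= 1  # one reference per dead child
    if callable(_previous):               # chain to an external handler
        _previous(signum, frame)

signal.signal(signal.SIGCHLD, _on_child_exit)
```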
