bpo-44895: Temporarily add an extra gc.collect() call #27746
…vestigate the refleak
Let's run this on buildbots, and if it indeed resolves the leak, we can do this for now instead of skipping the test.
If you want to schedule another build, you need to add the "
@vstinner The buildbot tests passed. This might quiet down the CI. What do you think?
Thanks @iritkatriel for the PR, and @ambv for merging it |
GH-27753 is a backport of this pull request to the 3.10 branch. |
This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue.
(cherry picked from commit 7bf28cb)
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
@@ -1014,6 +1014,9 @@ def cycle():

    def test_no_hang_on_context_chain_cycle2(self):
        # See issue 25782. Cycle at head of context chain.
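The thread refers to this change as a gc.collect() loop, but the three added lines are not shown in the excerpt above. One common shape for such a loop is to keep calling gc.collect() until a pass finds nothing unreachable. The sketch below is hypothetical: collect_until_stable and its max_passes bound are not from the PR. The bound is an added assumption that addresses the endless-loop concern raised later in the thread, where objects resurrected during collection keep producing new garbage.

```python
import gc


def collect_until_stable(max_passes=10):
    """Call gc.collect() until a pass finds no unreachable objects.

    Hypothetical helper, not the code from the PR. max_passes is an added
    safety bound so that pathological cases (for example, objects that are
    resurrected on every pass) cannot keep the loop running forever.
    Returns the number of passes performed.
    """
    for passes in range(1, max_passes + 1):
        # gc.collect() returns the number of unreachable objects it found.
        if gc.collect() == 0:
            return passes
    return max_passes
```

An unbounded `while gc.collect(): pass` behaves the same in normal cases but offers no such guarantee.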
This fix looks incorrect: tests should not depend on the GC to pass. When they do, it is a symptom of another problem.
I propose to revert this commit and investigate.
CC: @vstinner
I understand it is a "temporary measure", but in my experience those are left in place with no real fix more often than not. Also, I don't feel comfortable with temporary fixes in the release candidate.
The alternative is to disable the test. That doesn't fix the issue either.
I prefer to deactivate the test. The reason is that relying on the GC in this way at the end has global effects and can mask other issues. It is also not deterministic, and it can actually become an endless loop in some extreme situations involving object resurrection.
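If the test is disabled instead, the standard unittest mechanism is a skip decorator whose reason points at the tracking issue, so the skip shows up in test output and is easy to find and undo later. A minimal sketch; the class name, the second test and the reason string are illustrative, not taken from the PR.

```python
import unittest


class ExceptionTests(unittest.TestCase):
    # Hypothetical alternative to the extra gc.collect(): skip the affected
    # test, citing the tracking issue, until the root cause is found.
    @unittest.skip("bpo-44895: non-deterministic refleak under investigation")
    def test_no_hang_on_context_chain_cycle2(self):
        ...  # original test body would stay unchanged

    def test_unaffected(self):
        # Other tests in the class keep running normally.
        self.assertEqual(1 + 1, 2)
```

Unlike an extra collection, a skip has no global side effects on other tests, at the cost of losing this test's coverage while it is in place.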
This is just my opinion, of course. If the consensus is to leave this in because the test has more value, then let's leave it, but I have to say that in my experience these kinds of fixes are left in place more often than not.
Don't worry, it's not urgent.
Thanks a lot for the investigation and for all the work!!
I don't feel comfortable with temporary fixes in the release candidate.
Sure, we weren't going to let this slip into RC2. The point was to make refleak tests able to catch other regressions on that branch in the meantime.
If you'd rather redo the fix as a skip instead of gc.collect(), then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence about whether that approach to working around the refleaks is even effective.
How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?
If you'd rather redo the fix as a skip instead of gc.collect(), then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence about whether that approach to working around the refleaks is even effective.
I don't get what you mean by this. Why do we want to know whether the workaround is effective? What information do we gain from it? I can understand the thought that this may shed some more light on the problem, but this workaround is too intrusive to draw any conclusions about the actual problem, other than that a cycle is likely involved.
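The inference that a cycle is involved follows from how such leaks behave: objects in a reference cycle are not freed by reference counting and only go away when the cyclic collector runs, which is why an extra gc.collect() can hide them. The sketch below is a generic illustration, not the cycle behind bpo-44895 (which the thread has not identified): storing a caught exception in a local creates a cycle through its traceback and frame. Payload, make_cycle and lifetime are hypothetical names.

```python
import gc
import weakref


class Payload:
    """Stand-in object whose lifetime is tracked through a weak reference."""


def make_cycle(break_cycle):
    payload = Payload()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # Keeping the exception in a local creates a cycle:
        # saved -> exc.__traceback__ -> this frame -> its locals -> saved
        saved = exc
    if break_cycle:
        saved = None  # drop the reference; refcounting alone can now free it
    return weakref.ref(payload)


def lifetime(break_cycle):
    """Return (alive before gc.collect(), alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from interfering
    try:
        ref = make_cycle(break_cycle)
        alive_before = ref() is not None
        gc.collect()
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

With the cycle left in place, Payload survives until gc.collect() runs; once the cycle is broken explicitly, it is freed as soon as make_cycle returns, with no dependence on the collector.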
How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?
Why do we want to know whether the workaround is effective?
Irit wrote:
Let's run this on buildbots and if it indeed resolves the leak we can do this for now.
I misinterpreted this as "let's merge this and see", but obviously she meant the test-with-buildbots label. Never mind!
I'm fine with the workaround to unblock buildbots, but https://bugs.python.org/issue44895 must only be closed once the root issue is identified.
The regrtest test runner runs gc.collect(), and regrtest -R 3:3 runs gc.collect() one more time, so it's strange that you have to add a third gc.collect() call.
The worst case that I saw was a bug in a type implemented in C: https://vstinner.github.io/subinterpreter-leaks.html Calling gc.collect() worked around this bug, but I had to fix the C type (_thread.Lock) to fix the root issue. I don't think it's the same bug here, since that leak was only seen when an interpreter was destroyed. Here, the leak is seen on each iteration.
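The -R warmups:repeats option runs a test repeatedly and flags a leak when counts keep growing across the measured runs, with gc.collect() called before measuring. Below is a much-simplified sketch of that idea, not the libregrtest implementation. It assumes sys.gettotalrefcount() is available only on debug builds and otherwise falls back to counting GC-tracked objects as a rough proxy. hunt_leaks is a hypothetical name, and the real runner does considerably more, such as clearing internal caches and also checking memory blocks and file descriptors.

```python
import gc
import sys


def hunt_leaks(func, warmups=3, repeats=3):
    """Call func() warmups + repeats times and return the per-run deltas of
    the measured runs. Hypothetical, simplified illustration of the idea
    behind regrtest -R."""
    if hasattr(sys, "gettotalrefcount"):
        measure = sys.gettotalrefcount  # debug builds only
    else:
        def measure():
            # Rough proxy on release builds: number of GC-tracked objects.
            return len(gc.get_objects())

    deltas = [0] * repeats  # preallocated so recording results adds no objects
    gc.collect()
    before = measure()
    for i in range(warmups + repeats):
        func()
        gc.collect()  # collect cyclic garbage before each measurement
        after = measure()
        if i >= warmups:
            deltas[i - warmups] = after - before
        before = after
    return deltas
```

A function that keeps one new list alive on every call produces a positive delta on every measured run, which corresponds to the "seen on each iteration" pattern; a function that only creates temporaries should stay at or near zero.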
This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue.
https://bugs.python.org/issue44895