bpo-44895: Temporarily add an extra gc.collect() call #27746

iritkatriel · 2021-08-12T16:59:52Z

This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue.

https://bugs.python.org/issue44895


iritkatriel · 2021-08-12T17:00:59Z

Let's run this on buildbots and if it indeed resolves the leak we can do this for now instead of skipping the test.

bedevere-bot · 2021-08-12T17:01:19Z

🤖 New build scheduled with the buildbot fleet by @iritkatriel for commit a6a8b1c 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

iritkatriel · 2021-08-13T08:50:42Z

@vstinner The buildbot tests passed. This might quiet down the CI. What do you think?

miss-islington · 2021-08-13T09:41:38Z

Thanks @iritkatriel for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

bedevere-bot · 2021-08-13T09:41:57Z

GH-27753 is a backport of this pull request to the 3.10 branch.

This is part of an investigation of a non-deterministic reference leak. While we're looking for the root cause, this is included temporarily so that CI doesn't fail on this particular issue. This enables it to find other regressions in the meantime, which would otherwise be shadowed by our known issue. (cherry picked from commit 7bf28cb) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>

pablogsal · 2021-08-13T11:41:18Z

Lib/test/test_exceptions.py

@@ -1014,6 +1014,9 @@ def cycle():

    def test_no_hang_on_context_chain_cycle2(self):
        # See issue 25782. Cycle at head of context chain.


This fix looks incorrect: tests should not depend on the GC to pass. When that happens, it is a symptom of another problem.

I propose to revert this commit and investigate.

CC: @vstinner

I understand it is a "temporary measure", but in my experience those are left in place with no fix more often than not. Also, I don't feel comfortable with "temporary" fixes in the release candidate.

The alternative is to disable the test. That doesn't fix the issue either.

I prefer to deactivate the test. The reason is that relying on the GC in this way at the end has global effects and can mask other issues. It is also not deterministic and can actually become an endless loop in some extreme situations involving resurrection.
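To illustrate the endless-loop concern, here is a minimal sketch, not code from this PR. The `Phoenix` class and `bounded_collect` helper are hypothetical names, and the scenario is an assumption about the kind of situation meant: a finalizer that leaves fresh cyclic garbage behind (rather than literally resurrecting the same object), so every `gc.collect()` call finds something new. A collect-until-zero loop over such objects would never finish, so the sketch caps the number of rounds.

```python
import gc


class Phoenix:
    """Hypothetical class: collecting one instance leaves a fresh one behind."""

    respawn = True  # switch used to end the demonstration cleanly

    def __init__(self):
        self.me = self  # self-cycle, so only the cyclic GC can free the instance

    def __del__(self):
        if Phoenix.respawn:
            Phoenix()  # new unreachable cycle, found by the next collection


def bounded_collect(max_rounds=5):
    """Run gc.collect() up to max_rounds times and return the per-round counts."""
    Phoenix.respawn = True
    Phoenix()  # seed the first piece of cyclic garbage
    counts = []
    for _ in range(max_rounds):
        found = gc.collect()  # number of unreachable objects found this round
        counts.append(found)
        if not found:
            break
    Phoenix.respawn = False
    gc.collect()  # the last Phoenix is collected and nothing replaces it
    return counts


counts = bounded_collect()
```

Every round reports a non-zero count here, which is why an uncapped `while gc.collect(): pass` would spin forever on objects like this.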

This is just my opinion on this, of course. If the consensus is to leave this because the test has more value, then let's leave it, but I have to say that my previous experience with this kind of fix is that it is left there more often than not.

Don't worry, it is not urgent.

Thanks a lot for the investigation and for all the work!! 🚀

I don't feel comfortable with "temporary" fixes in the release candidate.

Sure, we weren't going to let this slip into RC2. The point was to make refleak tests able to catch other regressions on that branch in the meantime.

If you'd rather redo the fix as a skip instead of gc.collect() then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence whether that approach to working around the refleaks is even effective.

How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?

If you'd rather redo the fix as a skip instead of gc.collect() then that's fine as well. However, from what I understood on the PR, having it run on the entire buildbot fleet for a few days would give us more confidence whether that approach to working around the refleaks is even effective.

I don't get what you mean by this. Why do we want to know if the approach to work around is effective? What information do we gain by this? I can understand the thought that this may shed some more light on the problem, but this workaround is too intrusive to draw any conclusions about the actual problem, other than that a cycle is likely involved.

How about we leave it as is for the weekend and remove the gc.collect() loop on Monday?

👍 Works for me

Why do we want to know if the approach to work around is effective?

Irit wrote:

Let's run this on buildbots and if it indeed resolves the leak we can do this for now.

I misinterpreted this as "let's merge this and see" but obviously she meant the test-with-buildbots label. Never mind!

vstinner · 2021-08-13T13:11:39Z

@vstinner The buildbot tests passed. This might quiet down the CI. What do you think?

I'm fine with the workaround to unblock buildbots, but https://bugs.python.org/issue44895 must only be closed when the root issue is identified.

The regrtest test runner already runs gc.collect(), and regrtest -R 3:3 runs gc.collect() one more time. So it's strange that you have to add a third gc.collect() call.

The worst case that I saw was a bug in a type implemented in C: https://vstinner.github.io/subinterpreter-leaks.html Calling gc.collect() worked around this bug. But I had to fix the C type (_thread.Lock) to fix the root issue. I don't think that it's the same bug here, since the leak was only seen when an interpreter was destroyed. Here the leak is seen at each loop.
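For readers less familiar with why a gc.collect() call can change what a refleak check sees, here is a minimal sketch. It is not the leak tracked in bpo-44895, whose root cause was still unknown at this point, and `Node`, `make_cycle` and `cycle_lifetime` are hypothetical names. It only shows that objects in a reference cycle stay alive after the last external reference is dropped, until the cyclic collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical helper used only to build a reference cycle."""


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b keep each other's refcount above zero
    return weakref.ref(a)    # a weak reference lets us observe a's lifetime


def cycle_lifetime():
    """Return (alive_before_collect, alive_after_collect) for a fresh cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from freeing the cycle early
    try:
        ref = make_cycle()
        alive_before = ref() is not None
        gc.collect()
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


result = cycle_lifetime()
```

With automatic collection disabled, the cycle is still alive before the explicit collection and gone after it. That is the sense in which an extra gc.collect() can hide leftover objects without explaining why they were there.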

bpo-44895: temporarily add a gc call to unbreak the built while we investigate the refleak

a6a8b1c

iritkatriel requested a review from vstinner Aug 12, 2021

the-knights-who-say-ni added the CLA signed label Aug 12, 2021

bedevere-bot added awaiting core review type-tests labels Aug 12, 2021

iritkatriel added 🔨 test-with-buildbots skip news labels Aug 12, 2021

bedevere-bot removed the 🔨 test-with-buildbots label Aug 12, 2021

iritkatriel changed the title ~~bpo-44895: temporarily add a gc call to unbreak the built while we in…~~ bpo-44895: temporarily add a gc call to unbreak the build while we in… Aug 12, 2021

ambv added the needs backport to 3.10 label Aug 13, 2021

ambv merged commit 7bf28cb into python:main Aug 13, 2021
74 checks passed

bedevere-bot removed awaiting core review needs backport to 3.10 labels Aug 13, 2021

ambv changed the title ~~bpo-44895: temporarily add a gc call to unbreak the build while we in…~~ bpo-44895: Temporarily add an extra gc.collect() call Aug 13, 2021

pablogsal reviewed Aug 13, 2021

