-
-
Notifications
You must be signed in to change notification settings - Fork 32k
[3.12] GH-112215: Increase C recursion limit for non debug builds (GH-113397) #113403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
07fb2b4
to
2cdf1c7
Compare
I think this "feature" needs to be turned-off. It solves a problem that almost no one has and if they did one that could be managed by allocating more resources. In its place is a new problem that kicks in at a lower threshold and is not user solvable. Unlike the Python recursion limit, this isn't user settable. AFAICT it is hard-wired. It breaks one of the intended use cases for the cache decorators, namely, making it straightforward to implement dynamic programming solutions. Interestingly, it only damages the C implementation — the pure python implementation still works fine. I really wish you would have taken the suggestion by @gvanrossum to implement more robust stack management via monitoring or an interrupt rather than imposing an arbitrary (now a bit little larger) constraint which is a bit of hack given that we can't really know the actual limit in advance. The number chosen in this PR can be way to small in some environments and possibly too large in others. In a way, this PR makes the situation worse. While it makes |
@rhettinger Are you saying that a segfault due to runaway C level recursion is preferable over an exception? I don't have much sympathy for that attitude. I think if you're using recursion to the point where this matters, you're overdue for an algorithmic design anyways. |
@gvanrossum I think what is preferable to @rhettinger is the levels of control and predictability that were present before @markshannon's original change. A simple change of the limits doesn't actually save us from runaway C recursion, as explained in the PEP rejection justification, but makes the problem harder to notice. |
Ideally, the VM would raise an exception just before a segfault would happen. What I'd like to do is this:
|
@gvanrossum gvanrossum
No, I wasn't talking about a segfault (user visible crash). I think the way page faults work is that a signal or interrupt is raised that can then be handled either by allocating more memory or by raising an exception for a gentle recovery within Python.
I don't really have a say in this, but my recommendation is to entirely remove the limit entirely from Python 3.12 so that users would get back to the capabilities they always had. In practice, it was working out just fine. Having 3.12 put in an artificial cap just breaks programs under load that would have run just fine. Backporting an increased limit just makes the problem harder to detect and more likely to be found only in production code. And even if a user knew to wrap a cache decorator in a try/except, I don't know what they could put in the except-clause to recover. We're now is a weird situation where the user's best option is to bypass is use the pure python version of the cache rather than the C version which has been tightly fenced. |
But we never did that. Running out of C stack was always a hard segfault, and we always guarded against it with a simple recursion depth check. Is the main issue that you want the C stack limit to still be under user control, so they can shoot themselves in the foot, or tune the limit and system stack allocation (via limit -s) to their needs? |
It sounds like Mark has a plan for a clean solution in 3.13. Until then, my recommendation is to restore 3.12 to the same state as 3.11, something that we've lived with for a long time. But if you decide to keep the cap in 3.12, a user settable limit would be preferable to a hard-wired limit. Our experience with |
I wrote a little script that tries to bisect to the max recursion depth without either segfault or exception for this function: @functools.lru_cache(maxsize=None)
def recurse(n: int):
if n <= 0:
return 0
else:
return recurse(n - 1) + 1 I ran it on my Mac M3, Python configured without any flags.
So the PR improves things a bit but still leaves a lot on the table. (UPDATE: On my slightly older Intel Mac I get similar results; the numbers for 3.11 and before are slightly higher, the 3.12 numbers are the same for all variants.) Full scriptimport functools
import os
import sys
sys.setrecursionlimit(999999999)
@functools.lru_cache(maxsize=None)
def recurse(n: int):
if n <= 0:
return 0
else:
return recurse(n - 1) + 1
def newton(limit: int = 100_000):
hi = limit
lo = 1
while lo < hi:
# print(f'Bracket {lo = }, {hi =}')
mid = (hi + lo + 1) // 2
pid = os.fork()
if pid:
# Parent
pid, status = os.waitpid(pid, 0)
else:
# Child
try:
recurse(mid)
except RecursionError:
# print("RecursionError")
sys.exit(1)
else:
# print("Good")
sys.exit(0)
print(f"{status = } for {mid = }")
if status:
hi = mid - 1
else:
lo = mid
print(lo, hi)
newton() |
Increasing the limit way further doesn't give a much higher limit, and the segfaults return -- so it looks like I hit upon a sweet spot for Macs. Running the code in a thread doesn't change much either. It's quite possible that if you use a different C function to enter recursion you get different results -- possibly the C lru_cache implementation has very few local variables. I don't know if it's realistic to ask for the 3.11 behavior back (IIRC there was a change in the interpreter that made some intervention necessary) but I'll pass the baton to Mark. In the meantime, it's the holidays, so let's give this a break until the new year. |
Apologies in advance if this is noise. Don't really know what this PR is about, but seems vaguely related 😉. Starting about mid-week, I always get 14 test failures on 64-bit Windows, main branch, release build. Offhand they all seem related to runaway recursion:
|
@tim-one Probably better to report that on the corresponding issue (#112215). Or create a new issue linking to that one. Would be good to know a bit more about your configuration (Windows version, MSVC version), since I don't think we see this in the Windows 64-bit build in CI. Also, Merry Christmas! It's been too long. |
Superseded by #115083 |
Backport of #113397
@functools.cache
#112215