
[3.12] GH-112215: Increase C recursion limit for non debug builds (GH-113397) #113403


Closed
wants to merge 4 commits

Conversation

@markshannon (Member) commented Dec 22, 2023

@rhettinger (Contributor)

I think this "feature" needs to be turned off. It solves a problem that almost no one has, and one that those who did have it could manage by allocating more resources. In its place is a new problem that kicks in at a lower threshold and is not user-solvable. Unlike the Python recursion limit, this one isn't user-settable; AFAICT it is hard-wired.

It breaks one of the intended use cases for the cache decorators, namely making it straightforward to implement dynamic programming solutions. Interestingly, it only damages the C implementation; the pure Python implementation still works fine.

I really wish you had taken @gvanrossum's suggestion to implement more robust stack management via monitoring or an interrupt, rather than imposing an arbitrary (now a little larger) constraint. That constraint is a bit of a hack, given that we can't really know the actual limit in advance: the number chosen in this PR can be way too small in some environments and possibly too large in others.

In a way, this PR makes the situation worse. While it makes factorial(2000) work, it will still fail for larger inputs (where it previously did not fail). With the larger limit, a user with a real-world application is less likely to find this dead end during testing. Only when the load increases will there be a failure that forces an algorithmic redesign (or, if someone is clever, just replacing the now-useless C version with a pure Python version they can control).
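For concreteness, here is a minimal sketch of the dynamic-programming pattern at issue (the function and values are illustrative, not from the PR): each level of memoized recursion passes through the C `lru_cache` wrapper, so on affected 3.12 builds the depth is bounded by the C recursion guard even when `sys.setrecursionlimit` is set very high.

```python
import functools
import sys

sys.setrecursionlimit(100_000)  # raises the Python-level limit only


@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Classic memoized dynamic programming: a call at depth n goes
    # through the C lru_cache wrapper n times, consuming C stack.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))  # shallow enough to work on any build discussed here
```

Deeper calls (thousands of frames) are where the C guard trips on 3.12.x while the pure Python `functools` fallback keeps working.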

@gvanrossum (Member)

@rhettinger Are you saying that a segfault due to runaway C-level recursion is preferable to an exception? I don't have much sympathy for that attitude. I think if you're using recursion to the point where this matters, you're overdue for an algorithmic redesign anyway.

@kaathewise commented Dec 24, 2023

@gvanrossum I think what is preferable to @rhettinger is the levels of control and predictability that were present before @markshannon's original change.

A simple change of the limits doesn't actually save us from runaway C recursion, as explained in the PEP rejection justification; it just makes the problem harder to notice.

@markshannon (Member, Author)

Ideally, the VM would raise an exception just before a segfault would happen.
Unfortunately that would be too expensive, but we should be able to get close enough (within 100 frames or so) to that ideal.

What I'd like to do is this:

  • Get this improvement into 3.12
  • Find a better fix for 3.13, close to the ideal
  • Consider backporting that fix to 3.12 (which might not be feasible if the above fix is too complex)

@rhettinger (Contributor) commented Dec 24, 2023

@gvanrossum

Are you saying that a segfault due to runaway C level recursion is preferable to an exception?

No, I wasn't talking about a segfault (a user-visible crash). The way page faults work is that a signal or interrupt is raised, which can then be handled either by allocating more memory or by raising an exception for a gentle recovery within Python.
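One "allocate more resources" workaround in that spirit, expressible in pure Python today (a sketch, not anything proposed in this PR): run the deep recursion in a worker thread that has been given a much larger stack via `threading.stack_size`, so the OS stack allocation rather than a small fixed budget is the practical constraint. The sizes below are illustrative assumptions.

```python
import sys
import threading

sys.setrecursionlimit(1_000_000)  # interpreter-wide Python-level limit


def down(n: int) -> int:
    # Trivially deep recursion; returns 0 after n nested calls.
    return n if n == 0 else down(n - 1)


result = []

# Assumption for illustration: a 128 MiB stack is far larger than the
# platform default, letting the worker recurse much deeper than the
# main thread could. Must be set before the thread is created.
threading.stack_size(128 * 1024 * 1024)
t = threading.Thread(target=lambda: result.append(down(30_000)))
t.start()
t.join()
print(result[0])  # → 0
```

This gives the user, not the interpreter, the choice of how much stack to spend on a deep computation.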

@markshannon

Get this improvement into 3.12

I don't really have a say in this, but my recommendation is to remove the limit entirely from Python 3.12 so that users get back the capabilities they have always had. In practice, it was working out just fine.

Having 3.12 impose an artificial cap just breaks programs under load that would otherwise have run fine. Backporting an increased limit just makes the problem harder to detect and more likely to be found only in production. And even if a user knew to wrap a cache decorator in a try/except, I don't know what they could put in the except clause to recover.

We're now in a weird situation where the user's best option is to bypass the tightly fenced C version of the cache and use the pure Python version instead.

@gvanrossum (Member)

But we never did that. Running out of C stack was always a hard segfault, and we always guarded against it with a simple recursion depth check. Is the main issue that you want the C stack limit to still be under user control, so they can shoot themselves in the foot, or tune the limit and system stack allocation (via ulimit -s) to their needs?
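For reference, the tuning knobs being discussed look roughly like this (a sketch; limits, units, and defaults vary by shell and platform):

```shell
# Show the OS stack allocation for new processes (KiB in most shells);
# "ulimit -s <size>" would raise it, subject to the hard limit.
ulimit -s

# The Python-level limit is the user-controlled half; the C-stack guard
# added in 3.12 is what this call can no longer override.
python3 -c "import sys; sys.setrecursionlimit(100_000); print(sys.getrecursionlimit())"
```

Tuning both together is the workflow users had on 3.11 and earlier.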

@rhettinger rhettinger removed their request for review December 24, 2023 17:37
@rhettinger (Contributor)

It sounds like Mark has a plan for a clean solution in 3.13. Until then, my recommendation is to restore 3.12 to the same state as 3.11, something that we've lived with for a long time.

But if you decide to keep the cap in 3.12, a user-settable limit would be preferable to a hard-wired one. Our experience with sys.setrecursionlimit shows that people have found this kind of control useful.
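The existing user-settable knob behaves like this (a small sketch with illustrative names): it governs Python-level recursion depth, which is exactly the kind of control a hard-wired C cap lacks.

```python
import sys


def down(n: int) -> int:
    # Returns 0 after n nested calls.
    return n if n == 0 else down(n - 1)


try:
    down(5_000)  # exceeds CPython's default limit of 1000
except RecursionError:
    print("blocked at the default limit")

sys.setrecursionlimit(50_000)  # user-settable, unlike the C-stack cap
print(down(5_000))  # → 0
```

With the 3.12 guard, raising this limit no longer helps code that recurses through C, such as the C `lru_cache` wrapper.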

@gvanrossum (Member) commented Dec 24, 2023

I wrote a little script that bisects to find the maximum recursion depth that produces neither a segfault nor an exception, for this function:

@functools.lru_cache(maxsize=None)
def recurse(n: int):
    if n <= 0:
        return 0
    else:
        return recurse(n - 1) + 1

I ran it on my Mac M3, Python configured without any flags.

  • In 3.9 the limit is 22300, and if you go beyond you get a segfault.
  • In 3.10 it's down to 15643, also segfault.
  • In 3.11 it's 24957 (the best of the bunch), also segfault if you go beyond.
  • In 3.12.1 it's down to a disappointing 498, but never segfaults.
  • In 3.12 with this PR it's 2665, and never segfaults.
  • If I modify the PR to make the limit 80000 (that's 10x), I get 26665, and still no segfaults.

So the PR improves things a bit but still leaves a lot on the table.

(UPDATE: On my slightly older Intel Mac I get similar results; the numbers for 3.11 and before are slightly higher, the 3.12 numbers are the same for all variants.)

Full script:
import functools
import os
import sys

sys.setrecursionlimit(999999999)


@functools.lru_cache(maxsize=None)
def recurse(n: int):
    if n <= 0:
        return 0
    else:
        return recurse(n - 1) + 1


def newton(limit: int = 100_000):
    hi = limit
    lo = 1
    while lo < hi:
        # print(f'Bracket {lo = }, {hi =}')
        mid = (hi + lo + 1) // 2
        pid = os.fork()
        if pid:
            # Parent
            pid, status = os.waitpid(pid, 0)
        else:
            # Child
            try:
                recurse(mid)
            except RecursionError:
                # print("RecursionError")
                sys.exit(1)
            else:
                # print("Good")
                sys.exit(0)
        print(f"{status = } for {mid = }")
        if status:
            hi = mid - 1
        else:
            lo = mid
    print(lo, hi)


newton()

@gvanrossum (Member)

Increasing the limit much further doesn't give a much higher usable depth, and the segfaults return, so it looks like I hit upon a sweet spot for Macs. Running the code in a thread doesn't change much either.

It's quite possible that entering the recursion through a different C function gives different results; possibly the C lru_cache implementation has very few local variables.

I don't know if it's realistic to ask for the 3.11 behavior back (IIRC there was a change in the interpreter that made some intervention necessary) but I'll pass the baton to Mark.

In the meantime, it's the holidays, so let's give this a break until the new year.

@tim-one (Member) commented Dec 25, 2023

Apologies in advance if this is noise. Don't really know what this PR is about, but seems vaguely related 😉.

Starting about mid-week, I always get 14 test failures on 64-bit Windows, main branch, release build. Offhand they all seem related to runaway recursion:

14 tests failed:
    test_call test_compile test_copy test_descr test_dictviews
    test_exceptions test_functools test_importlib test_isinstance
    test_pickle test_pickletools test_richcmp test_typing
    test_xml_etree

@gvanrossum (Member)

@tim-one Probably better to report that on the corresponding issue (#112215). Or create a new issue linking to that one. Would be good to know a bit more about your configuration (Windows version, MSVC version), since I don't think we see this in the Windows 64-bit build in CI. Also, Merry Christmas! It's been too long.

@markshannon (Member, Author)

Superseded by #115083

@markshannon markshannon deleted the backport-113397 branch August 6, 2024 10:17