
[3.12] GH-112215: Increase C recursion limit for non debug builds (GH-113397) #113403


Closed
wants to merge 4 commits

Conversation

@markshannon (Member) commented Dec 22, 2023

@rhettinger (Contributor)

I think this "feature" needs to be turned off. It solves a problem that almost no one has, and one that those who did have it could manage by allocating more resources. In its place is a new problem that kicks in at a lower threshold and is not user-solvable. Unlike the Python recursion limit, this one isn't user-settable; AFAICT it is hard-wired.

It breaks one of the intended use cases for the cache decorators, namely making it straightforward to implement dynamic programming solutions. Interestingly, it only damages the C implementation; the pure Python implementation still works fine.

I really wish you had taken @gvanrossum's suggestion to implement more robust stack management via monitoring or an interrupt, rather than imposing an arbitrary (now a little larger) constraint. That constraint is a bit of a hack, given that we can't really know the actual limit in advance: the number chosen in this PR can be way too small in some environments and possibly too large in others.

In a way, this PR makes the situation worse. While it makes factorial(2000) work, it will still fail for larger inputs (where it previously did not fail). With the larger limit, a user with a real-world application is less likely to find this dead end during testing. Only when the load increases will there be a failure that forces an algorithmic redesign (or, if someone is clever, just replacing the now-useless C version with a pure Python version they can control).
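For concreteness, here is a minimal sketch of the dynamic-programming pattern at issue (the function and values are illustrative, not from the PR): each level of memoized recursion passes through the C `lru_cache` wrapper, so on affected 3.12 builds the depth is bounded by the C recursion guard even when `sys.setrecursionlimit` is set very high.

```python
import functools
import sys

sys.setrecursionlimit(100_000)  # raises the Python-level limit only


@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Classic memoized dynamic programming: a call at depth n goes
    # through the C lru_cache wrapper n times, consuming C stack.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))  # shallow enough to work on any build discussed here
```

Deeper calls (thousands of frames) are where the C guard trips on 3.12.x while the pure Python `functools` fallback keeps working.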

@gvanrossum (Member)

@rhettinger Are you saying that a segfault due to runaway C-level recursion is preferable to an exception? I don't have much sympathy for that attitude. I think if you're using recursion to the point where this matters, you're overdue for an algorithmic redesign anyway.

@kaathewise commented Dec 24, 2023

@gvanrossum I think what is preferable to @rhettinger is the levels of control and predictability that were present before @markshannon's original change.

A simple change of the limits doesn't actually save us from runaway C recursion, as explained in the PEP rejection justification; it just makes the problem harder to notice.

@markshannon (Member, Author)

Ideally, the VM would raise an exception just before a segfault would happen.
Unfortunately that would be too expensive, but we should be able to get close enough (within 100 frames or so) to that ideal.

What I'd like to do is this:

  • Get this improvement into 3.12
  • Find a better fix for 3.13, close to the ideal
  • Consider backporting that fix to 3.12 (which might not be feasible if the above fix is too complex)

@rhettinger (Contributor) commented Dec 24, 2023

@gvanrossum

Are you saying that a segfault due to runaway C level recursion is preferable to an exception?

No, I wasn't talking about a segfault (a user-visible crash). The way page faults work is that a signal or interrupt is raised, which can then be handled either by allocating more memory or by raising an exception for a gentle recovery within Python.
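One "allocate more resources" workaround in that spirit, expressible in pure Python today (a sketch, not anything proposed in this PR): run the deep recursion in a worker thread that has been given a much larger stack via `threading.stack_size`, so the OS stack allocation rather than a small fixed budget is the practical constraint. The sizes below are illustrative assumptions.

```python
import sys
import threading

sys.setrecursionlimit(1_000_000)  # interpreter-wide Python-level limit


def down(n: int) -> int:
    # Trivially deep recursion; returns 0 after n nested calls.
    return n if n == 0 else down(n - 1)


result = []

# Assumption for illustration: a 128 MiB stack is far larger than the
# platform default, letting the worker recurse much deeper than the
# main thread could. Must be set before the thread is created.
threading.stack_size(128 * 1024 * 1024)
t = threading.Thread(target=lambda: result.append(down(30_000)))
t.start()
t.join()
print(result[0])  # → 0
```

This gives the user, not the interpreter, the choice of how much stack to spend on a deep computation.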

@markshannon

Get this improvement into 3.12

I don't really have a say in this, but my recommendation is to remove the limit entirely from Python 3.12 so that users get back the capabilities they have always had. In practice, it was working out just fine.

Having 3.12 impose an artificial cap just breaks programs under load that would otherwise have run fine. Backporting an increased limit just makes the problem harder to detect and more likely to be found only in production. And even if a user knew to wrap a cache decorator in a try/except, I don't know what they could put in the except clause to recover.

We're now in a weird situation where the user's best option is to bypass the tightly fenced C version of the cache and use the pure Python version instead.

@gvanrossum (Member)

But we never did that. Running out of C stack was always a hard segfault, and we always guarded against it with a simple recursion depth check. Is the main issue that you want the C stack limit to still be under user control, so they can shoot themselves in the foot, or tune the limit and system stack allocation (via ulimit -s) to their needs?
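For reference, the tuning knobs being discussed look roughly like this (a sketch; limits, units, and defaults vary by shell and platform):

```shell
# Show the OS stack allocation for new processes (KiB in most shells);
# "ulimit -s <size>" would raise it, subject to the hard limit.
ulimit -s

# The Python-level limit is the user-controlled half; the C-stack guard
# added in 3.12 is what this call can no longer override.
python3 -c "import sys; sys.setrecursionlimit(100_000); print(sys.getrecursionlimit())"
```

Tuning both together is the workflow users had on 3.11 and earlier.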

@rhettinger rhettinger removed their request for review December 24, 2023 17:37
@rhettinger (Contributor)

It sounds like Mark has a plan for a clean solution in 3.13. Until then, my recommendation is to restore 3.12 to the same state as 3.11, something that we've lived with for a long time.

But if you decide to keep the cap in 3.12, a user-settable limit would be preferable to a hard-wired one. Our experience with sys.setrecursionlimit shows that people have found this kind of control useful.
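The existing user-settable knob behaves like this (a small sketch with illustrative names): it governs Python-level recursion depth, which is exactly the kind of control a hard-wired C cap lacks.

```python
import sys


def down(n: int) -> int:
    # Returns 0 after n nested calls.
    return n if n == 0 else down(n - 1)


try:
    down(5_000)  # exceeds CPython's default limit of 1000
except RecursionError:
    print("blocked at the default limit")

sys.setrecursionlimit(50_000)  # user-settable, unlike the C-stack cap
print(down(5_000))  # → 0
```

With the 3.12 guard, raising this limit no longer helps code that recurses through C, such as the C `lru_cache` wrapper.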

@gvanrossum (Member) commented Dec 24, 2023

I wrote a little script that bisects to find the maximum recursion depth that produces neither a segfault nor an exception, for this function:

@functools.lru_cache(maxsize=None)
def recurse(n: int):
    if n <= 0:
        return 0
    else:
        return recurse(n - 1) + 1

I ran it on my Mac M3, Python configured without any flags.

  • In 3.9 the limit is 22300, and if you go beyond you get a segfault.
  • In 3.10 it's down to 15643, also segfault.
  • In 3.11 it's 24957 (the best of the bunch), also segfault if you go beyond.
  • In 3.12.1 it's down to a disappointing 498, but never segfaults.
  • In 3.12 with this PR it's 2665, and never segfaults.
  • If I modify the PR to make the limit 80000 (that's 10x), I get 26665, and still no segfaults.

So the PR improves things a bit but still leaves a lot on the table.

(UPDATE: On my slightly older Intel Mac I get similar results; the numbers for 3.11 and before are slightly higher, the 3.12 numbers are the same for all variants.)

Full script:
import functools
import os
import sys

sys.setrecursionlimit(999999999)


@functools.lru_cache(maxsize=None)
def recurse(n: int):
    if n <= 0:
        return 0
    else:
        return recurse(n - 1) + 1


def newton(limit: int = 100_000):
    hi = limit
    lo = 1
    while lo < hi:
        # print(f'Bracket {lo = }, {hi =}')
        mid = (hi + lo + 1) // 2
        pid = os.fork()
        if pid:
            # Parent
            pid, status = os.waitpid(pid, 0)
        else:
            # Child
            try:
                recurse(mid)
            except RecursionError:
                # print("RecursionError")
                sys.exit(1)
            else:
                # print("Good")
                sys.exit(0)
        print(f"{status = } for {mid = }")
        if status:
            hi = mid - 1
        else:
            lo = mid
    print(lo, hi)


newton()

@gvanrossum (Member)

Increasing the limit much further doesn't give a much higher usable depth, and the segfaults return, so it looks like I hit upon a sweet spot for Macs. Running the code in a thread doesn't change much either.

It's quite possible that entering the recursion through a different C function gives different results; possibly the C lru_cache implementation has very few local variables.

I don't know if it's realistic to ask for the 3.11 behavior back (IIRC there was a change in the interpreter that made some intervention necessary) but I'll pass the baton to Mark.

In the meantime, it's the holidays, so let's give this a break until the new year.

@tim-one (Member) commented Dec 25, 2023

Apologies in advance if this is noise. Don't really know what this PR is about, but seems vaguely related 😉.

Starting about mid-week, I always get 14 test failures on 64-bit Windows, main branch, release build. Offhand they all seem related to runaway recursion:

14 tests failed:
    test_call test_compile test_copy test_descr test_dictviews
    test_exceptions test_functools test_importlib test_isinstance
    test_pickle test_pickletools test_richcmp test_typing
    test_xml_etree

@gvanrossum (Member)

@tim-one Probably better to report that on the corresponding issue (#112215). Or create a new issue linking to that one. Would be good to know a bit more about your configuration (Windows version, MSVC version), since I don't think we see this in the Windows 64-bit build in CI. Also, Merry Christmas! It's been too long.

@markshannon (Member, Author)

Superseded by #115083

@markshannon markshannon deleted the backport-113397 branch August 6, 2024 10:17