Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.12 setrecursionlimit is ignored in connection with @functools.cache #112215

Open
PeterLuschny opened this issue Nov 17, 2023 · 31 comments
Open
Assignees
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes release-blocker type-bug An unexpected behavior, bug, or error

Comments

@PeterLuschny
Copy link

PeterLuschny commented Nov 17, 2023

Bug report

A minimal example:

#  fib.py 
import sys
sys.setrecursionlimit(2000)

from functools import cache

@cache
def fib(n):
    if n<1: return 0
    if n==1: return 1
    return fib(n-1) + fib(n-2)

print(fib(500))

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

Edit: This (simpler) example is from @dimpase .

Linked PRs

@PeterLuschny PeterLuschny added the type-bug An unexpected behavior, bug, or error label Nov 17, 2023
@dimpase
Copy link
Contributor

dimpase commented Nov 17, 2023

also tested on Linux (with the call to sys.getwindowsversion() removed).

As well, there is no discrepancy with 3.11 vs 3.12 if @cache is removed. So the working hypothesis is that @cache is to blame for the regression.

The current master is bad too.
Bisecting currently, last year things were still OK.

In my testing I modified the lines

    sys.set_int_max_str_digits(5000)
    print(sys.version_info, sys.getwindowsversion())

I removed the 1st one (it's very recent), and only left

    print(sys.version_info)

@dimpase
Copy link
Contributor

dimpase commented Nov 17, 2023

I found the 1st bad commit, 7644935, by bisecting:

76449350b3467b85bcb565f9e2bf945bd150a66e is the first bad commit
commit 76449350b3467b85bcb565f9e2bf945bd150a66e
Author: Mark Shannon <mark@hotpy.org>
Date:   Wed Oct 5 01:34:03 2022 +0100

    GH-91079: Decouple C stack overflow checks from Python recursion checks. (GH-96510)

 Include/cpython/pystate.h                          | 16 ++++-
 Include/internal/pycore_ceval.h                    | 21 +++---
 Include/internal/pycore_runtime_init.h             |  2 +-
 Lib/test/support/__init__.py                       |  5 +-
 Lib/test/test_ast.py                               |  6 +-
 Lib/test/test_call.py                              | 38 ++++++++++
 Lib/test/test_collections.py                       |  2 +-
 Lib/test/test_compile.py                           |  3 +-
 Lib/test/test_dynamic.py                           |  8 +--
 Lib/test/test_exceptions.py                        |  8 +--
 Lib/test/test_isinstance.py                        | 12 ++--
 Lib/test/test_marshal.py                           |  3 +-
 .../2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst  |  3 +
 Modules/_testinternalcapi.c                        |  4 +-
 Parser/asdl_c.py                                   |  9 +--
 Python/Python-ast.c                                |  9 +--
 Python/ast.c                                       |  9 +--
 Python/ast_opt.c                                   |  9 +--
 Python/ceval.c                                     | 81 +++++++++++++++-------
 Python/pystate.c                                   |  5 +-
 Python/symtable.c                                  |  9 +--
 Python/sysmodule.c                                 |  2 +-
 22 files changed, 165 insertions(+), 99 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2022-09-05-09-56-32.gh-issue-91079.H4-DdU.rst

@dimpase
Copy link
Contributor

dimpase commented Nov 17, 2023

@markshannon - greetings from Oxford, by the way :-)

@sunmy2019
Copy link
Member

Yes. I was suggesting exposing an API for the C recursion limit, but not receiving active feedback.

See the discussion here: #107263

@dimpase
Copy link
Contributor

dimpase commented Nov 18, 2023

@PeterLuschny - basic Fibonacci

#  fib.py 
import sys
sys.setrecursionlimit(2000)

from functools import cache

@cache
def fib(n):
    if n<1: return 0
    if n==1: return 1
    return fib(n-1) + fib(n-2)

print(fib(500))

shows the bug just as well

@dimpase
Copy link
Contributor

dimpase commented Nov 18, 2023

it's not "probably in connection with @cache", it's definitely

@PeterLuschny PeterLuschny changed the title 3.12 setrecursionlimit is ignored, probably in connection with @cache 3.12 setrecursionlimit is ignored in connection with @cache Nov 18, 2023
@Eclips4 Eclips4 added 3.12 bugs and security fixes 3.13 new features, bugs and security fixes labels Nov 18, 2023
@Eclips4 Eclips4 changed the title 3.12 setrecursionlimit is ignored in connection with @cache 3.12 setrecursionlimit is ignored in connection with @functools.cache Nov 18, 2023
@dimpase
Copy link
Contributor

dimpase commented Nov 18, 2023

The bug is triggered by the use of the C implementation of _lru_cache_wrapper, in Modules/_functoolsmodule.c.
To see this, disable it (then Python code will be used) by applying

--- a/Lib/functools.py
+++ b/Lib/functools.py
@@ -641,7 +641,7 @@ def cache_clear():
     return wrapper
 
 try:
-    from _functools import _lru_cache_wrapper
+    from _functools import no_no_lru_cache_wrapper
 except ImportError:
     pass

and see ./python <fib.py computing the answer, instead of erroring out.

In a way, it's to be expected, as C code isn't governed by setrecursionlimit any more in 3.12+.

@arhadthedev arhadthedev added the performance Performance or resource usage label Nov 19, 2023
@sunmy2019 sunmy2019 removed the performance Performance or resource usage label Nov 20, 2023
@sunmy2019
Copy link
Member

@arhadthedev This is not related to performance. It's a new hard-coded limit not adjustable.

@gvanrossum
Copy link
Member

@markshannon ^^

@markshannon
Copy link
Member

We added the C recursion limit for safety reasons. We don't want to give that up.

We also opted for portability of code by setting the C recursion limit to a single value for all platforms.
Perhaps that was a mistake and we should choose different limits at build time depending on the platform and the build, improving backwards compatibility at the expense of portability.

@dimpase
Copy link
Contributor

dimpase commented Nov 20, 2023

I don't understand how you can nuke all the recursive uses of Python C extensions in the name of "safety", sorry.

For a small sampling, have a look how many code chunks on GitHub have @cache and setrecursionlimit in one place. I counted over 400.

@gpshead
Copy link
Member

gpshead commented Nov 20, 2023

@dimpase: your tone is aggressive. that's uncalled for. please keep it civil.

@dimpase
Copy link
Contributor

dimpase commented Nov 20, 2023

@dimpase: your tone is aggressive. that's uncalled for. please keep it civil.

Apologies. In my defense, I can only say that I am speaking from one of the pitfalls wisely predicted by the SC here.

* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the 
  recursion, making the likelihood of it being useful even smaller. It also creates 
  easy pitfalls for users who do end up relying on recursion.

in the rejection of PEP 651 – Robust Stack Overflow Handling

@sunmy2019
Copy link
Member

@dimpase: your tone is aggressive. that's uncalled for. please keep it civil.

Apologies. In my defense, I can only say that I am speaking from one of the pitfalls wisely predicted by the SC here.

* The benefit of PEP 651 is negated as soon as a non-Python function is involved in the 
  recursion, making the likelihood of it being useful even smaller. It also creates 
  easy pitfalls for users who do end up relying on recursion.

in the rejection of PEP 651 – Robust Stack Overflow Handling

I agree with this. It causes large inconvenience for users who do end up relying on recursion.

Their right to use C recursion is deprived, even without notice.

@sunmy2019
Copy link
Member

And, by the way, limiting the C stack cannot prevent all stack overflows.

500, 1000, or 1500 are just rough estimates. We can still create stack overflows under these conditions.

The idiomatic way is to register a signal handler (on UNIX) / VectoredExceptionHandler (on Windows) to check if the stack pointer is out of the boundary while not recoverable.


I am fine with having these limits, but please provide a way to circumvent them.

Why are users forbidden to use large stack memory even if they provide one?

@gvanrossum
Copy link
Member

I think it would be more productive to come up with a way that we can calculate the C stack size accurately rather than continue to explain why it's not great (we know, we know).

@sunmy2019
Copy link
Member

sunmy2019 commented Nov 21, 2023

calculate the C stack size accurately

Yes, we can estimate it better. We don't need to be that accurate. We can make sure the user can make it larger.

For example on Linux, we can query the stack limit at the program thread start. https://linux.die.net/man/2/getrlimit Then set the C recursion limit.

When the user sets the stack limit of 1MB, he gets 1MB / 2KB = 500.

When he needs 2000, all he needs to do is set the stack limit to 1 MB * 2000 / 500 = 4 MB. The main conflict will be resolved.

@gpshead
Copy link
Member

gpshead commented Nov 21, 2023

getrlimit(RLIMIT_STACK only returns the size of the main process thread, not the running thread. There is no posix API to get a given threads stack size. It can only be controlled at thread creation time and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main thread default.

@gpshead
Copy link
Member

gpshead commented Nov 21, 2023

There is no posix API to get a given threads stack size.

Note that there is a Linux only GNU extension that works if you #define _GNU_SOURCE before the #include <pthread.h>:

size_t get_pthread_stack_size(pthread_t tid) {
  pthread_attr_t attr;
  pthread_attr_init( &attr );  // probably not needed
  if (pthread_getattr_np(tid, &attr) < 0) {
    perror("pthread_getattr_np failed");
    return 0;
  }
  size_t stack_size = 0;
  if (pthread_attr_getstacksize(&attr, &stack_size) < 0) {
    perror("pthread_attr_getstacksize failed");
    return 0;
  }
  return stack_size;
}

macOS appears to have its own non-standard pthread_get_stacksize_np() function. It's bit of a mess of non-standard APIs.

(my best guess is that the _np suffix in both worlds means "non-posix")

@sunmy2019
Copy link
Member

sunmy2019 commented Nov 21, 2023

getrlimit(RLIMIT_STACK only returns the size of the main process thread, not the running thread. There is no posix API to get a given threads stack size. It can only be controlled at thread creation time and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main thread default.

Yeah, you are right. It's just too niche (too target-specific).

I also understand Python would prefer resumeable error to just aborting the program.

Then, how about providing the user an option to disable this C recursion limit check? Let the user manage themselves under this mode.

@dimpase
Copy link
Contributor

dimpase commented Nov 21, 2023

getrlimit(RLIMIT_STACK only returns the size of the main process thread, not the running thread. There is no posix API to get a given threads stack size. It can only be controlled at thread creation time and CPython does not control creation of all threads. It is common for C/C++ applications to launch threads with a different stack size than the overly large main thread default.

Isn't it possible to get C stack size (at thread creation time) directly via pthread_attr_getstacksize (which is POSIX)
instead of a GNU extension? These give you the minimal stack size. (Not sure what minimal stands for here).

One way or another, knowing C stack limit does not give one complete knowledge on the maximal C recursion depth, as different functions place different amount of data on stack.

Anyhow, for the issue at hand, the C version _functools._lru_cache_wrapper(), behind all this, is an implementation of a Python wrapper, what it produces has to be compatible with the Python version functools._lru_cache_wrapper(), recursion-depth-wise too.

@gpshead
Copy link
Member

gpshead commented Nov 21, 2023

Isn't it possible to get C stack size (at thread creation time) directly via pthread_attr_getstacksize (which is POSIX)
instead of a GNU extension?

Nope. That is merely a function for accessing the opaque pthread_attr_t structure data.

@dimpase
Copy link
Contributor

dimpase commented Nov 21, 2023

Isn't it possible to get C stack size (at thread creation time) directly via pthread_attr_getstacksize (which is POSIX)
instead of a GNU extension?

Nope. That is merely a function for accessing the opaque pthread_attr_t structure data.

the code you posted in #112215 (comment) is basically a small wrapper around pthread_attr_getstacksize, isn't it?
Just copy it all if you must - there is nothing magical about a GNU extension there.

@gpshead
Copy link
Member

gpshead commented Nov 21, 2023

no it isn't. pthread_getattr_np() fetches the thread information.

@dimpase
Copy link
Contributor

dimpase commented Nov 21, 2023

no it isn't. pthread_getattr_np() fetches the thread information.

OK, sorry, I misunderstood that. It seems that musl has similar functions implemented, so it doesn't look impossible to have this for all platforms. (I don't know anything about Windows, though).

@rhettinger
Copy link
Contributor

rhettinger commented Nov 25, 2023

Python 3.12 should not be left in its current state. The OP's example demonstrates a major regression.

@tim-one
Copy link
Member

tim-one commented Dec 25, 2023

Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:

14 tests failed:
    test_call test_compile test_copy test_descr test_dictviews
    test_exceptions test_functools test_importlib test_isinstance
    test_pickle test_pickletools test_richcmp test_typing
    test_xml_etree

Sometimes there's an explicit message about Windows stack overflow. Historically, as I recall we always needed a lower recursion limit on Windows (than on Linux).

@tim-one
Copy link
Member

tim-one commented Dec 25, 2023

Note: 13 of the 14 tests named above still pass in a debug build. As I recall, test_importlib (which still fails in a debug build) started flaking out before the other ones:

Traceback (most recent call last):
  File "C:\Code\Python\Lib\test\test_importlib\test_main.py", line 424, in test_packages_distributions_symlinked_top_level
    fixtures.build_files(files, self.site_dir)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 70, in build
    create(contents, _ensure_tree_maker(prefix) / name)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\functools.py", line 911, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "C:\Code\Python\Lib\test\test_importlib\_path.py", line 91, in _
    path.symlink_to(content)
    ~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Code\Python\Lib\pathlib\__init__.py", line 441, in symlink_to
    os.symlink(target, self, target_is_directory)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1314] A required privilege is not held by the client: '.symlink.target' -> 'C:\\Users\\Tim\\AppData\\Local\\Temp\\test_python_zgi02bbl\\tmp8n3vtxjo\\symlinked'

@gvanrossum
Copy link
Member

Hm, maybe the CI uses debug build.

@gvanrossum
Copy link
Member

gvanrossum commented Jan 3, 2024

Marking this as release blocker since we discussed this off-line. The plan is to restore 3.12 to the 3.11 behavior (which is a little tricky since various tests were added/updated, but it's doable). For 3.13 we will do per-platform things (and per-config-setting, e.g. debug) by default and add an API to change the C recursion limit.

See also #113403 (comment) for some measurements of what we can actually support without crashing (on Mac, but apparently Linux is similar, and Windows is dramatically different).

@gvanrossum
Copy link
Member

Starting about mid-week, I always see 14 test failures on Windows 10 (Pro), seemingly due to this. Current main branch, release build, Visual Studio 2022, nothing customized:

I get this too. Windows 11, main branch. We also see it in the Azure Pipelines CI build, e.g.
https://dev.azure.com/Python/cpython/_build/results?buildId=147048&view=results
(requires login?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes release-blocker type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

10 participants