Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38076 Clear the interpreter state only after clearing module globals #18039

Open
wants to merge 6 commits into
base: master
from

Conversation

@eduardo-elizondo
Copy link
Contributor

eduardo-elizondo commented Jan 17, 2020

Currently, during runtime destruction, _PyImport_Cleanup is clearing the interpreter state before clearing out the modules themselves. This leads to a segfault on modules that rely on the module state to clear themselves up.

For example, let's take the small snippet added in the issue by @DinoV :

import _struct

class C:
    def __init__(self):
        self.pack = _struct.pack
    def __del__(self):
        self.pack('I', -42)

_struct.x = C()

The module _struct uses the module state to run pack. Therefore, the module state has to be alive until after the module has been cleared out to successfully run C.__del__. This happens at line 606, when _PyImport_Cleanup calls _PyModule_Clear. In fact, the loop that calls _PyModule_Clear has in its comments:

Now, if there are any modules left alive, clear their globals to minimize potential leaks. All C extension modules actually end up here, since they are kept alive in the interpreter state.

That means that we can't clear the module state (which is used by C Extensions) before we run that loop.

Moving _PyInterpreterState_ClearModules until after it, fixes the segfault in the code snippet.

Finally, this updates a test in io to correctly assert the error that it now throws (since it now finds the io module state). The test that uses this is: test_create_at_shutdown_without_encoding. Given this test is now working is a proof that the module state now stays alive even when __del__ is called at module destruction time. Thus, I didn't add a new tests for this.

https://bugs.python.org/issue38076

@eduardo-elizondo eduardo-elizondo changed the title bpo-38076 Clear the interpreter state only after clearing module globals [WIP - Waiting for Tests]bpo-38076 Clear the interpreter state only after clearing module globals Jan 17, 2020
@eduardo-elizondo eduardo-elizondo changed the title [WIP - Waiting for Tests]bpo-38076 Clear the interpreter state only after clearing module globals bpo-38076 Clear the interpreter state only after clearing module globals Jan 17, 2020
@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 17, 2020

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit f2cfc13 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@DinoV

This comment has been minimized.

Copy link
Contributor

DinoV commented Jan 17, 2020

I'm about to hop on a plane so I can't check it now but I'm curious what happens with:

import _struct, sys

class C:
    def __init__(self):
        self.pack = _struct.pack
    def __del__(self):
        self.pack('I', -42)

sys.x = C()
@pablogsal

This comment has been minimized.

Copy link
Member

pablogsal commented Jan 17, 2020

I'm about to hop on a plane so I can't check it now but I'm curious what happens with:

Still segfaults on my system:

❯ cat question.py  
import _struct, sys

class C:
    def __init__(self):
        self.pack = _struct.pack
    def __del__(self):
        self.pack('I', -42)

sys.x = C()

~/github/python/master runtime-destruction-fix*
❯ ./python question.py 
[1]    25816 segmentation fault (core dumped)  ./python question.py


(gdb) bt
#0  PyModule_GetState (m=0x0) at Objects/moduleobject.c:565
#1  0x00007ffff7fa1fd8 in cache_struct_converter (fmt='I', ptr=ptr@entry=0x7fffffffd170)
    at /home/pablogsal/github/python/master/Modules/_struct.c:2100
#2  0x00007ffff7fa22a8 in pack (self=<optimized out>, args=0x7ffff788a550, nargs=2)
    at /home/pablogsal/github/python/master/Modules/_struct.c:2162
#3  0x00005555557827fa in cfunction_vectorcall_FASTCALL (func=<built-in method pack of module object at remote 0x7ffff7821530>, 
    args=0x7ffff788a550, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:366
#4  0x000055555567d102 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x7ffff788a550, 
    callable=<built-in method pack of module object at remote 0x7ffff7821530>, tstate=0x555555952fc0) at ./Include/cpython/abstract.h:111
#5  _PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x7ffff788a550, 
    callable=<built-in method pack of module object at remote 0x7ffff7821530>) at ./Include/cpython/abstract.h:120
#6  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555952fc0) at Python/ceval.c:4931
#7  _PyEval_EvalFrameDefault (
    f=Frame 0x7ffff788a3d0, for file /home/pablogsal/github/python/master/question.py, line 7, in __del__ (self=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>), throwflag=<optimized out>) at Python/ceval.c:3383
#8  0x00005555555bd12d in _PyEval_EvalFrame (throwflag=0, f=0x7ffff788a3d0, tstate=0x555555952fc0) at ./Include/internal/pycore_ceval.h:43
#9  function_code_fastcall (tstate=0x555555952fc0, co=<optimized out>, args=0x7fffffffd390, nargs=1, globals=<optimized out>)
    at Objects/call.c:327
#10 0x00005555555bda74 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, 
    kwnames=<optimized out>) at Objects/call.c:364
#11 0x0000555555612336 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7fffffffd388, 
    callable=<function at remote 0x7ffff77cbd70>, tstate=0x555555952fc0) at ./Include/cpython/abstract.h:111
#12 _PyObject_CallOneArg (arg=<unknown at remote 0x555555952fc0>, func=<function at remote 0x7ffff77cbd70>)
    at ./Include/cpython/abstract.h:162
#13 call_unbound_noarg (unbound=<optimized out>, func=func@entry=<function at remote 0x7ffff77cbd70>, 
    self=self@entry=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>)
    at Objects/typeobject.c:1466
#14 0x000055555561d098 in slot_tp_finalize (
    self=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>) at Objects/typeobject.c:6858
#15 0x00005555555f87cf in PyObject_CallFinalizer (
    self=self@entry=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>)
    at Objects/object.c:311
#16 0x00005555555fb1e2 in PyObject_CallFinalizerFromDealloc (
    self=self@entry=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>)
    at Objects/object.c:329
#17 0x00005555556144fc in subtype_dealloc (
    self=<C(pack=<built-in method pack of module object at remote 0x7ffff7821530>) at remote 0x7ffff7792140>) at Objects/typeobject.c:1220
#18 0x00005555555f9d20 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2291
#19 0x00005555555eae7d in _Py_DECREF (op=<optimized out>, lineno=546, filename=0x5555557cb173 "./Include/object.h") at ./Include/object.h:479
#20 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:546
#21 insertdict (mp=mp@entry=0x7ffff78d2650, key=key@entry='x', hash=6776896660283084101, value=value@entry=None) at Objects/dictobject.c:1102
#22 0x00005555555ebe4c in PyDict_SetItem (
    op=op@entry={'__name__': None, '__doc__': None, '__package__': None, '__loader__': None, '__spec__': None, 'addaudithook': None, 'audit': None, 'breakpointhook': None, '_clear_type_cache': None, '_current_frames': None, 'displayhook': None, 'exc_info': None, 'excepthook': None, 'exit': None, 'getdefaultencoding': None, 'getdlopenflags': None, 'getallocatedblocks': None, 'getfilesystemencoding': None, 'getfilesystemencodeerrors': None, 'gettotalrefcount': None, 'getrefcount': None, 'getrecursionlimit': None, 'getsizeof': None, '_getframe': None, 'intern': None, 'is_finalizing': None, 'setswitchinterval': None, 'getswitchinterval': None, 'setdlopenflags': None, 'setprofile': None, 'getprofile': None, 'setrecursionlimit': None, 'settrace': None, 'gettrace': None, 'call_tracing': None, '_debugmallocstats': None, 'set_coroutine_origin_tracking_depth': None, 'get_coroutine_origin_tracking_depth': None, 'set_asyncgen_hooks': None, 'get_asyncgen_hooks': None, 'unraisablehook': None, 'modules': None, 'stderr': None, '__stderr__':...(truncated), key='x', value=None) at Objects/dictobject.c:1545
#23 0x00005555555f7ee6 in _PyModule_ClearDict (
    d={'__name__': None, '__doc__': None, '__package__': None, '__loader__': None, '__spec__': None, 'addaudithook': None, 'audit': None, 'breakpointhook': None, '_clear_type_cache': None, '_current_frames': None, 'displayhook': None, 'exc_info': None, 'excepthook': None, 'exit': None, 'getdefaultencoding': None, 'getdlopenflags': None, 'getallocatedblocks': None, 'getfilesystemencoding': None, 'getfilesystemencodeerrors': None, 'gettotalrefcount': None, 'getrefcount': None, 'getrecursionlimit': None, 'getsizeof': None, '_getframe': None, 'intern': None, 'is_finalizing': None, 'setswitchinterval': None, 'getswitchinterval': None, 'setdlopenflags': None, 'setprofile': None, 'getprofile': None, 'setrecursionlimit': None, 'settrace': None, 'gettrace': None, 'call_tracing': None, '_debugmallocstats': None, 'set_coroutine_origin_tracking_depth': None, 'get_coroutine_origin_tracking_depth': None, 'set_asyncgen_hooks': None, 'get_asyncgen_hooks': None, 'unraisablehook': None, 'modules': None, 'stderr': None, '__stderr__':...(truncated)) at Objects/moduleobject.c:629
#24 0x000055555569f898 in _PyImport_Cleanup (tstate=tstate@entry=0x555555952fc0) at Python/import.c:617
#25 0x00005555556b8108 in Py_FinalizeEx () at Python/pylifecycle.c:1412
#26 0x00005555555adc58 in Py_RunMain () at Modules/main.c:634
#27 0x00005555555adccd in pymain_main (args=args@entry=0x7fffffffd670) at Modules/main.c:662
#28 0x00005555555add91 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:686
--Type <RET> for more, q to quit, c to continue without paging--
#29 0x00005555555ac6d2 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:16


@eduardo-elizondo

This comment has been minimized.

Copy link
Contributor Author

eduardo-elizondo commented Jan 17, 2020

That of course fails since we are clearing the module state before we clear up the sys globals. You bring up a good point though. I'll just push this past the clearing of every module dictionary.

I just tested that locally and it worked, even on your new example. I will add a new test and update the PR

@DinoV

This comment has been minimized.

Copy link
Contributor

DinoV commented Jan 17, 2020

Another possibility would be restoring sys to its original form like we do builtins: https://github.com/eduardo-elizondo/cpython/blob/f2cfc13f65e94f755b5d47218f4479bca30118fc/Python/import.c#L564

And then collecting and then killing off sys/builtin and collecting again.

Copy link
Contributor

DinoV left a comment

It'd be good to add the various test cases in the bug to the PR as well.

@eduardo-elizondo

This comment has been minimized.

Copy link
Contributor Author

eduardo-elizondo commented Jan 17, 2020

@DinoV given that sys is also treated specially just like builtins, it might make sense to also add this. Also, agreed now that you've pointed out a couple of other edge cases, I'll add more tests.

Nit
@eduardo-elizondo

This comment has been minimized.

Copy link
Contributor Author

eduardo-elizondo commented Jan 18, 2020

Actually, I decided against creating a sysdict copy to avoid increasing the heap memory needed for the interpreter state. Just pushing further back the module state clearing does the trick.

By the way, as you might have noticed in the sys test, C.__del__ is not called when sys holds an instance to it. This is the normal behavior even before any changes were added related to the module state.

@@ -619,6 +617,9 @@ _PyImport_Cleanup(PyThreadState *tstate)
}
_PyModule_ClearDict(interp->builtins);

/* Clear module dict copies stored in the interpreter state */

This comment has been minimized.

Copy link
@eduardo-elizondo

eduardo-elizondo Jan 18, 2020

Author Contributor

I'm tempted to change this comment to: "Clear module dict copies stored in the interpreter state only after all the modules have been cleared" or something along those lines.

@pablogsal pablogsal self-requested a review Jan 18, 2020
@pablogsal pablogsal self-assigned this Jan 18, 2020
@Jongy

This comment has been minimized.

Copy link

Jongy commented Jan 18, 2020

Wow. Funny story.

I was debugging some Python crash I got lately, and realized it has something to do with mishandled refcounts. I hooked _Py_Dealloc, logged all freed objects and identified the object freed incorrectly.

After "solving" that with a temporary Py_INCREF(op); return from dealloc, I got another unrelated crash in which it seemed like the PyModuleObject of _struct is missing? PyState_FindModule returnsNULL. So I started tracing how did this happen using the _Py_Dealloc hook, but decided to stop for today because it's getting late here.

Just before getting off the computer, I hopped by the latest PRs of CPython, to see what kind of interpreter bugs people usually encounter and fix. And so it happens the very first PR I stumble upon is this exact problem. Luckiest click ever.

TL;DR it was funny and I can confirm moving _PyInterpreterState_ClearModules down 50 lines fixes the problem for me 🤷‍♂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.