Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frame.setlineno has serious flaws. #94438

Closed
3 tasks done
markshannon opened this issue Jun 30, 2022 · 18 comments
Closed
3 tasks done

frame.setlineno has serious flaws. #94438

markshannon opened this issue Jun 30, 2022 · 18 comments
Assignees
Labels
3.11 bug and security fixes 3.12 new features, bug and security fixes type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Copy link
Member

markshannon commented Jun 30, 2022

The frame_setlineno function works in in stages:

  • Determine a set of possible bytecode offsets as targets from the line number.
  • Compute the stack state for these targets and the current position
  • Determine a best target. That is, the first one that has a compatible stack.
  • Pop values form the stack and jump.

The first steps is faulty (I think, I haven't demonstrated this) as it might be possible to jump to an instruction involved in frame creation. This should be easy to fix using the new _co_firsttraceable field.

The second step has (at least) three flaws:

  • It does not account for NULLs on the stack, making it possible to jump from a stack with NULLs to one that cannot handle NULLs.
  • It does not skip over caches, so could produce incorrect stacks by misinterpreting cache entries as normal instructions.
  • It is out of date. For example it thinks that PUSH_EXC_INFO pushes three values. It only pushes one.

Setting the line number of a frame is only possible in the debugger, so this isn't as terrible as might appear, but it definitely needs fixing.

@markshannon markshannon added type-crash A hard crash of the interpreter, possibly with a core dump deferred-blocker labels Jun 30, 2022
@markshannon markshannon added 3.11 bug and security fixes 3.12 new features, bug and security fixes labels Jun 30, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 1, 2022
…nes. (pythonGH-94444)

(cherry picked from commit be80db1)

Co-authored-by: Mark Shannon <mark@hotpy.org>
markshannon added a commit that referenced this issue Jul 1, 2022
* Account for NULLs on evaluation stack when jumping lines.
@pablogsal
Copy link
Member

pablogsal commented Jul 4, 2022

Moving this back to release blocker because apparently, this could end in many changes.

I am missing some context here on what this is affecting so I changed it from deferred blocker to release blocker if we think we can delay this to 3.12, please, say so :)

@iritkatriel
Copy link
Member

It is out of date. For example it thinks that PUSH_EXC_INFO pushes three values. It only pushes one.

After failing to write a test that will crash on this, I analysed the code and realised that the "exception handling opcode" cases of the switch are no longer reachable - there is nothing that will initialise their stack[] entries, so they get skipped in the continue; before the switch.

I created a PR to use my favourite macro in these cases: #94582

We could backport it, but we don't have to.

@iritkatriel
Copy link
Member

It does not skip over caches, so could produce incorrect stacks by misinterpreting cache entries as normal instructions.

I'm assuming this refers to the mark_stacks loop again, which is where stacks are produced.

The good news is that it looks like it just happens to work by accident. There is no case in the switch for the CACHE opcode, so it goes to the default case. Since the stack_effect of CACHE is 0, it just copies stack[i] to stack[i+1] unchanged. So the stack that is supposed to be on the next instruction will propagate as is until the first non-cache entry.

We could make it more explicit like this:

diff --git a/Objects/frameobject.c b/Objects/frameobject.c
index 34a6c46c6b..ccdf7de990 100644
--- a/Objects/frameobject.c
+++ b/Objects/frameobject.c
@@ -220,6 +220,11 @@ mark_stacks(PyCodeObject *code_obj, int len)
             }
             opcode = _Py_OPCODE(code[i]);
             switch (opcode) {
+                case CACHE:
+                {
+                    stacks[i+1] = stacks[i];
+                    break;
+                }
                 case JUMP_IF_FALSE_OR_POP:
                 case JUMP_IF_TRUE_OR_POP:
                 case POP_JUMP_FORWARD_IF_FALSE:

thoughts?

@sweeneyde
Copy link
Member

There is no case in the switch for the CACHE opcode, so it goes to the default case.

Don't cache entries get populated at specialization time with data that overwrites the CACHE opcode? I don't think we're keeping around 0s in the bytecode that aren't subject to change. If I'm understanding right, the problem comes from misinterpreting that specialized cache data as if it's some bytecode that has a stack effect.

It looks like the dis module looks up _inline_cache_entries[deop] to determine how many cache entries to skip over. Would that work here?

@iritkatriel
Copy link
Member

Ah I see! Would you like to try? (I'm signing off for the night soon.)

@sweeneyde
Copy link
Member

I don't think I'm going to have time in the next couple of days to come up with a decent test case and PR and everything, but I am thinking that something vaguely like this could work:

@@ -213,12 +213,15 @@ mark_stacks(PyCodeObject *code_obj, int len)
     int todo = 1;
     while (todo) {
         todo = 0;
-        for (i = 0; i < len; i++) {
+        int ncaches;
+        for (i = 0; i < len; i += (1 + ncaches)) {
+            ncaches = 0;
             int64_t next_stack = stacks[i];
             if (next_stack == UNINITIALIZED) {
                 continue;
             }
-            opcode = _Py_OPCODE(code[i]);
+            opcode = _PyOpcode_Deopt[_Py_OPCODE(code[i])];
+            ncaches = _PyOpcode_Caches[opcode];
             switch (opcode) {
                 case JUMP_IF_FALSE_OR_POP:
                 case JUMP_IF_TRUE_OR_POP:

iritkatriel added a commit that referenced this issue Jul 6, 2022
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jul 6, 2022
…EXC_INFO and POP_EXCEPT cases are no longer reachable (pythonGH-94582)

(cherry picked from commit 50b9a77)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
iritkatriel added a commit that referenced this issue Jul 6, 2022
…FO and POP_EXCEPT cases are no longer reachable (GH-94582) (GH-94595)

(cherry picked from commit 50b9a77)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
@brandtbucher
Copy link
Member

It does not skip over caches, so could produce incorrect stacks by misinterpreting cache entries as normal instructions

@iritkatriel is correct... nothing needs to be done about this. mark_stacks uses _PyCode_GetCode, which produces a "deoptimized" byte string equivalent to .co_code attribute access. Otherwise, we would need cases in this switch for all of the adaptive forms of these instructions.

So I think this issue can be closed?

@iritkatriel
Copy link
Member

Does it make sense to add some test that exercises this function on highly specialised/warmed up code? I think all the tests just run some function a small number of times.

@sweeneyde
Copy link
Member

Ah, my mistake, I missed the _PyCode_GetCode.

@brandtbucher
Copy link
Member

Well, the function is designed to totally ignore warmed-up code and just use the deoptimized string. So new tests would keep us from accidentally using it in the future, but I'm pretty sure other stuff here would start breaking much sooner (tons of instructions would be hitting the default case and would end up having a stack effect of PY_INVALID_STACK_EFFECT, which is INT_MAX).

Perhaps it would just be better to assert(delta != PY_INVALID_STACK_EFFECT)?

@iritkatriel
Copy link
Member

Let’s close then.

@brandtbucher
Copy link
Member

Reopening because extended arguments aren't handled correctly. :(

@brandtbucher brandtbucher reopened this Jul 21, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 22, 2022
…k_stacks (pythonGH-95110)

(cherry picked from commit e4d3a96)

Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
miss-islington added a commit that referenced this issue Jul 22, 2022
…ks (GH-95110)

(cherry picked from commit e4d3a96)

Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
@brandtbucher brandtbucher added type-bug An unexpected behavior, bug, or error and removed release-blocker type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 23, 2022
@brandtbucher
Copy link
Member

Removing release blocker status since all known crashes are fixed.

I'm still going to leave this open, thoough, since the four "is None" jumps aren't handled yet (which just means that we aren't able to jump to some valid locations).

@iritkatriel
Copy link
Member

I just noticed that the switch in mark_stacks doesn't have cases for POP_JUMP_IF_NONE/POP_JUMP_IF_NOT_NONE.

@brandtbucher
Copy link
Member

Yep. See my above comment. :)

@iritkatriel
Copy link
Member

ah yeah :)

@gvanrossum
Copy link
Member

Shall we nevertheless close this, and open a more specific issue?

@markshannon
Copy link
Member Author

Yes.
Most, if not all, of the flaws I listed have been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 bug and security fixes 3.12 new features, bug and security fixes type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

6 participants