Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39537 New line number table format #18326

Closed
wants to merge 7 commits into from

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Feb 3, 2020

This PR add a new line number table format as specified in bpo:39537.

It also:

  • Fixes up frame.setlineno to make it more robust, but less "smart". It used to attempt to disallow "unreasonable" jumps. It now just tests whether a jump is safe, that is won't crash the interpreter, allowing any safe jump. This should make frame.setlineno much more robust w.r.t. bytecode compiler changes.
  • Updates the compiler to mark compiler-generated code as having no line number.
  • Changes a number of tests to account for pass statements having no code associated with them.

To make life easier for tools, we attach a line number to the LOAD CONSTANT None; RETURN VALUE pair emitted for empty code object, ensuring that at least one bytecode per code object has a line number.

https://bugs.python.org/issue39537

@markshannon markshannon changed the title bpo:39537 New line number table format bpo-39537 New line number table format Feb 3, 2020
@nedbat
Copy link
Member

@nedbat nedbat commented Feb 3, 2020

I haven't understood all the implications of this, I hope it won't have unintended consequences for coverage.py. But I'll start with a small suggestion: don't name this co_lnotab. Pick a new name for this. No code can transparently use this new data, so it shouldn't have the same name. A new name will mean code using the line number table doesn't have to switch on Python version, or janky heuristics, to know what data format they are dealing with. Even co_lnotab2 would be good.

@pablogsal pablogsal requested review from pablogsal and Yhg1s Feb 3, 2020
@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 4, 2020

@nedbat I too hope it won't have unintended consequences for coverage.py
Does coverage attempt to parse co_lnotab?

Personally, I don't think it makes sense to expose the line number table at all.
Making a co_lnotab a property that recreates the old format is one possibility to improve backwards compatibility.

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 4, 2020

Yes, coverage.py parses co_lnotab (I assumed you knew this, and was why you made me nosy on the bpo): https://github.com/nedbat/coveragepy/blob/master/coverage/parser.py#L392-L419 . Notice that this code includes having to adapt to the negative offsets introduced in 3.6.

Don't you want coverage.py to have access to the more accurate information in the new table? Isn't improved tooling the point of the change?

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 5, 2020

Yes, of course I want coverage.py to have access to correct line numbers.

Why do you need to parse co_lontab at all? What API are we missing that forces you to do so?
Since the dis module has to do the same parsing anyway, we might as well make that part of the public API.

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 5, 2020

Currently, coverage.py only uses co_lnotab to get the set of all possible line numbers. In the past, it did much more extensive analysis of the byte code. Even then, co_lnotab was simply a list of pairs (byte_offset, line_number), so providing that information somehow would suit me.

Keep backward compatibility in mind, though: I won't be able to use a new 3.9 API any time soon, and so will have to adapt to changing data formats. I can make do with a version check if need be.

Serhiy is talking on the bpo about adding richer information to this table also. I'm not sure what the right abstract API is for that though, since we're extending the information stored.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 6, 2020

Do you have a number for that bpo issue?

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 6, 2020

Would a method on the code object that exposes the C function PyAddr2Line() be suitable?
I'm thinking inst_to_line(inst) on the code object. It would convert the "inst" (byte offset) into the line number such that f.__code__.inst_to_line(f.f_lasti) == f.f_lineno

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 6, 2020

@markshannon By "the bpo", I meant https://bugs.python.org/issue39537, the one that started this pull request.

The way coverage.py uses co_lnotab now, PyCode_Addr2Line would be inconvenient: I don't have a byte offset in mind and need the line number. I just iterate the entire table to get the complete set of line numbers. Doing that with PyCode_Addr2Line would require iterating over byte addresses, calling the function each time, which would give O(N^2) behavior.

If I had to choose an abstract interface to this table, it would be an iterator of (byte_offset, line_number) pairs, just like the function findlinestarts in dis.py.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 6, 2020

What about bytecodes with no line number?
E.g. this sequence where the bytecodes at 8 to 11 have no line number.
Would None as a line number cause problems?
(0, 4), (8, None), (12, 5), etc.

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 6, 2020

Can you give me an example for concreteness? If one of those bytecodes raises an exception, what would the traceback say?

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 7, 2020

Either they can't raise an exception, or it is handled locally.
For example when cleaning up the variable used for an exception, the compiler emits

LOAD_CONSTANT None
STORE_FAST var
DELETE_FAST var

which cannot raise.
Let's assume for this discussion that the compiler and interpreter do the right thing.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 7, 2020

To clarify the above. That bytecode would be emitted to clean up var in

try:
     ...
except Exception as var:
     ...

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 7, 2020

You asked,

Would None as a line number cause problems?

I don't know how to answer that. This would be a new API, so I have to change code to adapt to it. If you tell me some of the line numbers would be None, then I will deal with that. At the moment, I'm just collecting the set of all line numbers, so I would drop None from that set and carry on.

I asked about tracebacks to get a sense of what None might actually mean in this new context.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 10, 2020

The format of the line number table is orthogonal to how the compiler generates code that might raise exceptions. All tracebacks will continue to have correct line numbers.

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 10, 2020

I've always been taught, and have taught, that any Python byte code can raise exceptions. The code will have to deal with the possibility of a frame in a traceback being at a byte code with no line number. What will the traceback display for it? "None"? Some nearby line number? Will the frame be omitted from the traceback (the worst possibility)?

@markshannon
Copy link
Member Author

@markshannon markshannon commented Feb 21, 2020

Not all bytecodes can raise exceptions. The compiler can guarantee that the certain sequences of generated bytecodes will not raise an exception. Examples:
At the end of a function, the sequence LOAD_CONST (None); RETURN_VALUE is emitted. It cannot raise.
To clean up a value used in an exception block , the sequence LOAD_CONST (None); STORE_FAST var; DELETE_FAST var is emitted. It cannot raise either.

@nedbat
Copy link
Member

@nedbat nedbat commented Feb 21, 2020

Maybe I've been misinformed. Can't a KeyboardInterrupt happen during those bytecodes?

@markshannon
Copy link
Member Author

@markshannon markshannon commented Mar 4, 2020

KeyboardInterrupt is an asynchronous exception, which means that the concept of "during" isn't that meaningful. A better way to think of it is that an exception will be triggered "soon" after the interrupt occurs. "Soon" usually means within a few bytecodes. Even now we don't check on every bytecode, and to ensure the semantics of with and async with we need check less frequently.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Jul 9, 2020

I've given this a bit more thought lately.

@nedbat
How about a new attribute, called co_lines, that returned an iterator of (start_offset, end_offset, line_number) triples?
Would that mean you wouldn't need to access co_lnotab?

I think a way forward would be to hide any new format, and have the co_lnotab lazily compute the equivalent table in the old form.

@nedbat
Copy link
Member

@nedbat nedbat commented Jul 12, 2020

How about a new attribute, called co_lines, that returned an iterator of (start_offset, end_offset, line_number) triples?

Sure, that would work. Anything that lets me get the set of all possible line numbers is fine.

@markshannon
Copy link
Member Author

@markshannon markshannon commented Jul 21, 2020

Closing until PEP 626 is accepted or rejected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants