bpo-39537 New line number table format #18326

markshannon · 2020-02-03T10:37:09Z

This PR add a new line number table format as specified in bpo:39537.

It also:

Fixes up frame.setlineno to make it more robust, but less "smart". It used to attempt to disallow "unreasonable" jumps. It now just tests whether a jump is safe, that is won't crash the interpreter, allowing any safe jump. This should make frame.setlineno much more robust w.r.t. bytecode compiler changes.
Updates the compiler to mark compiler-generated code as having no line number.
Changes a number of tests to account for pass statements having no code associated with them.

To make life easier for tools, we attach a line number to the LOAD CONSTANT None; RETURN VALUE pair emitted for empty code object, ensuring that at least one bytecode per code object has a line number.

https://bugs.python.org/issue39537

…n be shown to not crash the intepreter.

…confusing pdb too much.

nedbat · 2020-02-03T11:25:48Z

I haven't understood all the implications of this, I hope it won't have unintended consequences for coverage.py. But I'll start with a small suggestion: don't name this co_lnotab. Pick a new name for this. No code can transparently use this new data, so it shouldn't have the same name. A new name will mean code using the line number table doesn't have to switch on Python version, or janky heuristics, to know what data format they are dealing with. Even co_lnotab2 would be good.

markshannon · 2020-02-04T09:24:06Z

@nedbat I too hope it won't have unintended consequences for coverage.py
Does coverage attempt to parse co_lnotab?

Personally, I don't think it makes sense to expose the line number table at all.
Making a co_lnotab a property that recreates the old format is one possibility to improve backwards compatibility.

nedbat · 2020-02-04T10:23:01Z

Yes, coverage.py parses co_lnotab (I assumed you knew this, and was why you made me nosy on the bpo): https://github.com/nedbat/coveragepy/blob/master/coverage/parser.py#L392-L419 . Notice that this code includes having to adapt to the negative offsets introduced in 3.6.

Don't you want coverage.py to have access to the more accurate information in the new table? Isn't improved tooling the point of the change?

markshannon · 2020-02-05T09:29:12Z

Yes, of course I want coverage.py to have access to correct line numbers.

Why do you need to parse co_lontab at all? What API are we missing that forces you to do so?
Since the dis module has to do the same parsing anyway, we might as well make that part of the public API.

nedbat · 2020-02-05T10:14:48Z

Currently, coverage.py only uses co_lnotab to get the set of all possible line numbers. In the past, it did much more extensive analysis of the byte code. Even then, co_lnotab was simply a list of pairs (byte_offset, line_number), so providing that information somehow would suit me.

Keep backward compatibility in mind, though: I won't be able to use a new 3.9 API any time soon, and so will have to adapt to changing data formats. I can make do with a version check if need be.

Serhiy is talking on the bpo about adding richer information to this table also. I'm not sure what the right abstract API is for that though, since we're extending the information stored.

markshannon · 2020-02-06T10:26:47Z

Do you have a number for that bpo issue?

markshannon · 2020-02-06T10:29:57Z

Would a method on the code object that exposes the C function PyAddr2Line() be suitable?
I'm thinking inst_to_line(inst) on the code object. It would convert the "inst" (byte offset) into the line number such that f.__code__.inst_to_line(f.f_lasti) == f.f_lineno

nedbat · 2020-02-06T10:48:23Z

@markshannon By "the bpo", I meant https://bugs.python.org/issue39537, the one that started this pull request.

The way coverage.py uses co_lnotab now, PyCode_Addr2Line would be inconvenient: I don't have a byte offset in mind and need the line number. I just iterate the entire table to get the complete set of line numbers. Doing that with PyCode_Addr2Line would require iterating over byte addresses, calling the function each time, which would give O(N^2) behavior.

If I had to choose an abstract interface to this table, it would be an iterator of (byte_offset, line_number) pairs, just like the function findlinestarts in dis.py.

markshannon · 2020-02-06T19:55:37Z

What about bytecodes with no line number?
E.g. this sequence where the bytecodes at 8 to 11 have no line number.
Would None as a line number cause problems?
(0, 4), (8, None), (12, 5), etc.

nedbat · 2020-02-06T20:08:02Z

Can you give me an example for concreteness? If one of those bytecodes raises an exception, what would the traceback say?

markshannon · 2020-02-07T09:51:15Z

Either they can't raise an exception, or it is handled locally.
For example when cleaning up the variable used for an exception, the compiler emits

LOAD_CONSTANT None
STORE_FAST var
DELETE_FAST var

which cannot raise.
Let's assume for this discussion that the compiler and interpreter do the right thing.

markshannon · 2020-02-07T09:53:02Z

To clarify the above. That bytecode would be emitted to clean up var in

try:
     ...
except Exception as var:
     ...

nedbat · 2020-02-07T11:34:14Z

You asked,

Would None as a line number cause problems?

I don't know how to answer that. This would be a new API, so I have to change code to adapt to it. If you tell me some of the line numbers would be None, then I will deal with that. At the moment, I'm just collecting the set of all line numbers, so I would drop None from that set and carry on.

I asked about tracebacks to get a sense of what None might actually mean in this new context.

markshannon · 2020-02-10T10:07:20Z

The format of the line number table is orthogonal to how the compiler generates code that might raise exceptions. All tracebacks will continue to have correct line numbers.

nedbat · 2020-02-10T10:35:50Z

I've always been taught, and have taught, that any Python byte code can raise exceptions. The code will have to deal with the possibility of a frame in a traceback being at a byte code with no line number. What will the traceback display for it? "None"? Some nearby line number? Will the frame be omitted from the traceback (the worst possibility)?

markshannon · 2020-02-21T14:16:51Z

Not all bytecodes can raise exceptions. The compiler can guarantee that the certain sequences of generated bytecodes will not raise an exception. Examples:
At the end of a function, the sequence LOAD_CONST (None); RETURN_VALUE is emitted. It cannot raise.
To clean up a value used in an exception block , the sequence LOAD_CONST (None); STORE_FAST var; DELETE_FAST var is emitted. It cannot raise either.

nedbat · 2020-02-21T16:43:17Z

Maybe I've been misinformed. Can't a KeyboardInterrupt happen during those bytecodes?

markshannon · 2020-03-04T11:41:07Z

KeyboardInterrupt is an asynchronous exception, which means that the concept of "during" isn't that meaningful. A better way to think of it is that an exception will be triggered "soon" after the interrupt occurs. "Soon" usually means within a few bytecodes. Even now we don't check on every bytecode, and to ensure the semantics of with and async with we need check less frequently.

markshannon · 2020-07-09T09:37:52Z

I've given this a bit more thought lately.

@nedbat
How about a new attribute, called co_lines, that returned an iterator of (start_offset, end_offset, line_number) triples?
Would that mean you wouldn't need to access co_lnotab?

I think a way forward would be to hide any new format, and have the co_lnotab lazily compute the equivalent table in the old form.

nedbat · 2020-07-12T19:49:31Z

How about a new attribute, called co_lines, that returned an iterator of (start_offset, end_offset, line_number) triples?

Sure, that would work. Anything that lets me get the set of all possible line numbers is fine.

markshannon · 2020-07-21T14:53:14Z

Closing until PEP 626 is accepted or rejected.

markshannon added 6 commits Jan 31, 2020

Change line number table format. Faster and easier to use.

03d3f55

Make frame.setlineno more general and robust: Accept any jump that ca…

25c2b6f

…n be shown to not crash the intepreter.

Make sure that at least the one instruction has a line number; avoid …

2167c10

…confusing pdb too much.

Tidy up odds and ends.

c33ade6

Add news entry for new line number table format.

eb24f44

More tidying up.

2ef4205

the-knights-who-say-ni added the CLA signed label Feb 3, 2020

bedevere-bot added the awaiting core review label Feb 3, 2020

markshannon changed the title ~~bpo:39537 New line number table format~~ bpo-39537 New line number table format Feb 3, 2020

Fix compielr warnings.

01401d4

pablogsal requested review from pablogsal and Yhg1s Feb 3, 2020

markshannon closed this Jul 21, 2020

bpo-39537 New line number table format #18326

bpo-39537 New line number table format #18326

markshannon commented Feb 3, 2020 •

edited by bedevere-bot

nedbat commented Feb 3, 2020

markshannon commented Feb 4, 2020

nedbat commented Feb 4, 2020

markshannon commented Feb 5, 2020

nedbat commented Feb 5, 2020

markshannon commented Feb 6, 2020

markshannon commented Feb 6, 2020

nedbat commented Feb 6, 2020

markshannon commented Feb 6, 2020

nedbat commented Feb 6, 2020

markshannon commented Feb 7, 2020

markshannon commented Feb 7, 2020

nedbat commented Feb 7, 2020

markshannon commented Feb 10, 2020

nedbat commented Feb 10, 2020

markshannon commented Feb 21, 2020

nedbat commented Feb 21, 2020

markshannon commented Mar 4, 2020

markshannon commented Jul 9, 2020

nedbat commented Jul 12, 2020

markshannon commented Jul 21, 2020

bpo-39537 New line number table format #18326

bpo-39537 New line number table format #18326

Conversation

markshannon commented Feb 3, 2020 • edited by bedevere-bot

nedbat commented Feb 3, 2020

markshannon commented Feb 4, 2020

nedbat commented Feb 4, 2020

markshannon commented Feb 5, 2020

nedbat commented Feb 5, 2020

markshannon commented Feb 6, 2020

markshannon commented Feb 6, 2020

nedbat commented Feb 6, 2020

markshannon commented Feb 6, 2020

nedbat commented Feb 6, 2020

markshannon commented Feb 7, 2020

markshannon commented Feb 7, 2020

nedbat commented Feb 7, 2020

markshannon commented Feb 10, 2020

nedbat commented Feb 10, 2020

markshannon commented Feb 21, 2020

nedbat commented Feb 21, 2020

markshannon commented Mar 4, 2020

markshannon commented Jul 9, 2020

nedbat commented Jul 12, 2020

markshannon commented Jul 21, 2020

markshannon commented Feb 3, 2020 •

edited by bedevere-bot