bpo-34683: Make SyntaxError column offsets consistently 1-indexed #9338

ammaraskar · 2018-09-15T23:34:29Z

Column offsets are now 0-indexed and errors now point to the start of the token. Fixes the off-by-one problem for ast.c errors.

https://bugs.python.org/issue34683

gvanrossum · 2018-09-16T23:39:13Z

This is reasonable. But I did find some places where it makes things worse. E.g.

$ ./python.exe _.py
  File "_.py", line 1
    (0x+1)
       ^
SyntaxError: invalid hexadecimal literal

this positions the caret just past the invalid token (0x), whereas previously it was positioned at the final character:

$ python3 _.py
  File "_.py", line 1
    (0x+1)
      ^
SyntaxError: invalid token

It would behoove someone to carefully double-check all cases where a syntax error originates in the lexer -- there are a variety of cases.

ammaraskar · 2018-09-16T23:47:40Z

Aah, good catch, I'll go through the errors and add a lot more test cases.

ammaraskar · 2018-09-17T14:22:13Z

That issue has been resolved; I had failed to change tokenizer.c, which still used 1-indexed column offsets. I added a bunch more test cases, but they're still not exhaustive. I'd appreciate someone giving it a second look to find where else syntax errors can bubble up from.

gvanrossum · 2018-09-17T14:53:27Z

Have you at least grepped all of the source for occurrences of SyntaxError?

ammaraskar · 2018-09-17T15:34:44Z

Yup, that's what I did and then I grepped for check_syntax_error in the test suite and made sure I had a test for each case there.

gvanrossum

This all looks fine, except I'm suddenly getting cold feet about changing the offset from 1-based to 0-based.

While I agree that it was previously inconsistent, and Python generally uses 0-based indexing, for common cases the offset is now noticeably moving from 1-based to 0-based, and that may affect tools that catch SyntaxError and then use the lineno and offset fields to display the error in an editor. I also happen to know that text editors (e.g. vim, Emacs) typically use 1-based column offsets.

Now, the meaning of the offset field is not publicly documented (at least https://docs.python.org/3/library/exceptions.html#SyntaxError doesn't mention that it's 1-based), but I'd still like to tread lightly here.

Sorry for the last-minute change of heart!

@terryjreedy Does IDLE care about this?
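
To make the tooling concern concrete, here is a minimal sketch of how an external tool might read the location fields from a SyntaxError. The names `ErrorLocation` and `syntax_error_location` and the filename `<example>` are made up for illustration; `compile()` and the `lineno`, `offset`, and `text` attributes are the standard built-in and exception fields. The exact offset reported for a given input can differ between Python versions, so no specific value is shown.

```python
from typing import NamedTuple, Optional


class ErrorLocation(NamedTuple):
    lineno: Optional[int]
    offset: Optional[int]  # the column field whose base is being discussed
    text: Optional[str]    # the offending source line, when available


def syntax_error_location(source: str) -> Optional[ErrorLocation]:
    """Compile `source`; return where the SyntaxError is reported, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return ErrorLocation(exc.lineno, exc.offset, exc.text)
    return None


loc = syntax_error_location("for 1 in a: pass")
if loc is not None:
    print(f"line {loc.lineno}, offset {loc.offset}: {loc.text!r}")
```

A tool that forwards `offset` to an editor without adjustment is implicitly relying on the base of that field, which is why changing it could break such tools.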

ammaraskar · 2018-09-17T15:43:18Z

Another potential downside is that this is a breaking change to PyErr_SyntaxLocationObject, which is exposed in the public C api.

gvanrossum · 2018-09-17T15:51:14Z

OK -- if we can make it consistent with 0-based, we can also make it consistent with 1-based, right? Do you feel you have the energy to do it that way?

ammaraskar · 2018-09-17T15:51:54Z

Yeah, that sounds good to me. Let's keep the new tests and make it all consistently 1-indexed.

serhiy-storchaka · 2018-09-17T16:47:07Z

I thought that the offset is 0-based in most cases. It's just that in some cases it points to the start of the invalid token, and in other cases it points past the end of it. This depends on where the error is raised: tokenizer, AST, or compiler. For 1-character tokens this looks like a 1-based offset.

ammaraskar · 2018-09-17T16:59:26Z

Anywhere it's 0-indexed seems like a mistake, because the caret printing code uses offset > 0 instead of offset >= 0, and there are old test cases like this one, 503d6c5, where it looks like the offset is 1-indexed.
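
As a rough illustration of why an `offset > 0` check implies 1-based offsets, here is a Python sketch of caret rendering. This is not the C code in CPython; `render_caret` is a hypothetical helper that only mimics the displayed shape (four-space indent, leading whitespace stripped, caret on the next line).

```python
def render_caret(line: str, offset: int) -> str:
    """Render `line` with a caret under the 1-based column `offset`."""
    text = line.rstrip("\n")
    stripped = text.lstrip()
    # Leading whitespace is dropped for display, so shift the offset with it.
    offset -= len(text) - len(stripped)
    out = "    " + stripped
    if offset > 0:
        # Offset 1 is the first displayed character: zero padding spaces.
        out += "\n    " + " " * (offset - 1) + "^"
    return out


print(render_caret("for 1 in a: pass", 5))
```

With a 1-based offset, 0 naturally means "no column known", so the caret is skipped. If offsets were 0-based, 0 would be a valid column and the check would need to be `>= 0`.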

ammaraskar · 2018-09-17T17:09:28Z

FWIW, I think Guido's intuition about tools breaking from a change to 0-indexing is right. A vim plugin that does syntax checking assumes SyntaxError's offset is 1-indexed: vim-syntastic/syntastic@6c91e8d

terryjreedy · 2018-09-17T21:18:26Z

I presume that 'now' in "Column offsets are now 0-indexed" meant after, rather than before, the original patch, though it no longer holds 'now', after the patch revision.

tk and hence IDLE use 1-based line numbers and 0-based column slice offsets. The tk/IDLE text cursor is a vertical 'slice' bar positioned between characters, as in this comment box, and many (most?, all?) GUI editors. IDLE currently subtracts 1 (in three places) from what it assumes to be 1-based offsets. At the left margin it reports 0 on the status display and passes 0 as the start-slice argument when coloring a slice. Hence, IDLE highlights the ' ' before 1 in for 1 in a: pass, when the offset is 0-based. I most care about consistency.

I pulled this PR to a master-based branch and the Python caret and IDLE highlight still mark the space before 1, not 1 itself. So something does not seem to be fixed yet.

I personally would prefer 0-based slice positions and would happily remove the '-1's. But I notice that Notepad++ (not written in Python) reports 1, not 0, for the left-margin position. I can imagine that some Python-based code might expect the same. AFAIK, text-based editors with character cursors (underline or over-block), imitating terminals, use 1-based positions.
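
The conversion described above (1-based lines, 0-based columns in tk) can be sketched as follows. These helpers are hypothetical and are not IDLE's actual code; they only assume that the incoming offset is 1-based and that tk text indices have the form "line.column".

```python
from typing import Tuple


def tk_index(lineno: int, offset: int) -> str:
    """Convert a 1-based (lineno, offset) pair to a tk text index."""
    # tk lines are 1-based and tk columns are 0-based, hence the single "- 1".
    return f"{lineno}.{max(offset - 1, 0)}"


def tk_char_range(lineno: int, offset: int) -> Tuple[str, str]:
    """Return (start, end) tk indices covering the one character at `offset`."""
    column = max(offset - 1, 0)
    return f"{lineno}.{column}", f"{lineno}.{column + 1}"


print(tk_index(1, 5))
print(tk_char_range(1, 5))
```

If the offset handed to such a helper were actually 0-based, the subtraction would select the character one position to the left, which matches the symptom of highlighting the space before the 1.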

terryjreedy

I updated my clone, created branch 'pr_9338', and a file containing for 1 in a: pass. Both python and IDLE still mark the space before 1. So either I do not understand the issue or the patch is not complete.

bedevere-bot · 2018-09-17T21:23:13Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

ammaraskar · 2018-09-17T21:46:52Z

That's weird, here's what I get on Linux:

$ ./python /tmp/s.py
  File "/tmp/s.py", line 1
    for 1 in a: pass
        ^
SyntaxError: can't assign to literal

and this is Windows:

λ win32\python D:\workspace\s.py
  File "D:\workspace\s.py", line 1
    for 1 in a: pass
        ^
SyntaxError: can't assign to literal

there's also a test that covers that case here: https://github.com/python/cpython/pull/9338/files#diff-435de67ff9c3be9a3ec7a5fd9550c1c0R216
5 is where the '1' ends up with 1-indexing
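
The arithmetic behind that column number, as a small worked example (plain string indexing, no interpreter internals involved):

```python
line = "for 1 in a: pass"
zero_based = line.index("1")  # str.index counts from 0
one_based = zero_based + 1    # the 1-indexed convention this PR settles on
print(zero_based, one_based)  # prints "4 5"
```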

gvanrossum · 2018-09-17T21:55:58Z

That's also what I get on macOS.

Based on error of not recompiling python after applying patch with C code.

terryjreedy · 2018-09-17T23:13:24Z

I almost never test patches with changes to .c files, so I inadvertently verified that the new tests fail without the new code compiled and running. Looks good now.

gvanrossum

I'll merge this now, it looks good. Thanks!

gvanrossum · 2018-09-24T21:00:29Z

Lib/test/test_exceptions.py

+        check('class foo:return 1', 1, 11)
+        check('def f():\n  continue', 2, 3)
+        check('def f():\n  break', 2, 3)
+        check('try:\n  pass\nexcept:\n  pass\nexcept ValueError:\n  pass', 2, 3)


FWIW the line number for this error is off (it should complain on line 3, not on line 2), but that's not a new bug.

gvanrossum · 2018-09-24T21:11:36Z

I guess the auto-merge doesn't work, either because some Azure tests flaked or because some reviewers didn't yet approve.

the-knights-who-say-ni added the CLA signed label Sep 15, 2018

bedevere-bot added the awaiting review label Sep 15, 2018

ammaraskar force-pushed the caret_fix branch 2 times, most recently from 6b1255d to b0dbd0d Compare September 16, 2018 01:11

serhiy-storchaka self-requested a review September 17, 2018 07:24

ammaraskar force-pushed the caret_fix branch 2 times, most recently from b172d27 to d482c5b Compare September 17, 2018 15:34

gvanrossum reviewed Sep 17, 2018

ammaraskar force-pushed the caret_fix branch from d482c5b to 66d6c9a Compare September 17, 2018 15:41

bpo-34683: Standardize SyntaxError column offsets to 1-indexed. Point to start of tokens in parsing errors.

14c6520

ammaraskar force-pushed the caret_fix branch from e1b2332 to 14c6520 Compare September 17, 2018 16:50

ammaraskar changed the title ~~bpo-34683: Make SyntaxError column offsets 0-indexed~~ bpo-34683: Make SyntaxError column offsets consistently 1-indexed Sep 17, 2018

Fix offset values in old tests

e53a27a

ammaraskar force-pushed the caret_fix branch from d3502e4 to e53a27a Compare September 17, 2018 18:38

terryjreedy previously requested changes Sep 17, 2018

bedevere-bot added awaiting changes and removed awaiting review labels Sep 17, 2018

gvanrossum reviewed Sep 24, 2018

gvanrossum approved these changes Sep 24, 2018

bedevere-bot added awaiting merge and removed awaiting changes labels Sep 24, 2018

gvanrossum added 🤖 automerge and removed 🤖 automerge labels Sep 24, 2018

gvanrossum merged commit 025eb98 into python:master Sep 24, 2018

bedevere-bot removed the awaiting merge label Sep 24, 2018
