Skip to content

bpo-34683: Make SyntaxError column offsets consistently 1-indexed #9338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 24, 2018

Conversation

ammaraskar
Copy link
Member

@ammaraskar ammaraskar commented Sep 15, 2018

Column offsets are now 0-indexed and errors now point to the start of the token. Fixes the off-by-one problem for ast.c errors.

https://bugs.python.org/issue34683

@gvanrossum
Copy link
Member

This is reasonable. But I did find some places where it makes things worse. E.g.

$ ./python.exe _.py
  File "_.py", line 1
    (0x+1)
       ^
SyntaxError: invalid hexadecimal literal

this positions the caret just past the invalid token (0x), whereas previously it was positioned at the final character:

$ python3 _.py
  File "_.py", line 1
    (0x+1)
      ^
SyntaxError: invalid token

It would behoove someone to carefully double-check all cases where a syntax error originates in the lexer -- there are a variety of cases.

@ammaraskar
Copy link
Member Author

Aah, good catch, I'll go through the errors and add a lot more test cases.

@serhiy-storchaka serhiy-storchaka self-requested a review September 17, 2018 07:24
@ammaraskar
Copy link
Member Author

That issue has been resolved, I had failed to change tokenizer.c which still used 1-indexed column offsets. Added a bunch more test cases but they're still not exhaustive, I'd appreciate someone giving it a second look to find where else syntax errors can bubble up from.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 17, 2018 via email

@ammaraskar ammaraskar force-pushed the caret_fix branch 2 times, most recently from b172d27 to d482c5b Compare September 17, 2018 15:34
@ammaraskar
Copy link
Member Author

Yup, that's what I did and then I grepped for check_syntax_error in the test suite and made sure I had a test for each case there.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine, except I'm suddenly getting cold feet about changing the offset from 1-based to 0-based.

While I agree that it was previously inconsistent, and Python generally uses 0-based indexing, for common cases the offset is now noticeably moving from 1-based to 0-based, and that may affect tools that catch SyntaxError and then use the lineno and offset fields to display the error in an editor. I also happen to know that text editors (e.g. vim, Emacs) typically use 1-based column offsets.

Now, the meaning of the offset field is not publicly documented (at least https://docs.python.org/3/library/exceptions.html#SyntaxError doesn't mention that it's 1-based), but I'd still like to tread lightly here.

Sorry for the last-minute change of heart!

@terryjreedy Does IDLE care about this?

@ammaraskar
Copy link
Member Author

Another potential downside is that this is a breaking change to PyErr_SyntaxLocationObject, which is exposed in the public C api.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 17, 2018 via email

@ammaraskar
Copy link
Member Author

Yeah that sounds good to me, lets keep around the new tests and make it all consistent to 1-indexed.

@serhiy-storchaka
Copy link
Member

I thought that the offset is 0-based in most case. Just in some cases it points to the start of invalid token, and in other cases it points past the end of it. This depends on the place of raising an error: tokenizer, AST, compiler. In case of 1-character tokens this looks as 1-based offsets.

@ammaraskar ammaraskar changed the title bpo-34683: Make SyntaxError column offsets 0-indexed bpo-34683: Make SyntaxError column offsets consistently 1-indexed Sep 17, 2018
@ammaraskar
Copy link
Member Author

Anywhere its 0-indexed seems like a mistake because the caret printing code uses offset > 0 instead of offset >= 0, and there's old test cases like this: 503d6c5
where it looks like the offset is 1-indexed.

@ammaraskar
Copy link
Member Author

ammaraskar commented Sep 17, 2018

fwiw I think Guido's intuition about tools breaking from changing to 0-indexing is right, a vim plugin that does syntax checking for vim assumes SyntaxError's offset is 1-indexed: vim-syntastic/syntastic@6c91e8d

@terryjreedy
Copy link
Member

I presume 'now' in "Column offsets are now 0-indexed" meant after rather than before the original patch, though not 'now', after the patch revision.

tk and hence IDLE use 1-based line numbers and 0-based column slice offsets. The tk/IDLE text cursor is a vertical 'slice' bar positioned between characters, as in this comment box, and many (most?, all?) GUI editors. IDLE currently subtracts 1 (in three places) from what it assumes to be 1-based offsets. At the left margin it reports 0 on the status display and passes 0 as the start-slice argument when coloring a slice. Hence, IDLE highlights the ' ' before 1 in for 1 in a: pass, when the offset is 0-based. I most care about consistency.

I pulled this PR to a master-based branch and the Python caret and IDLE highlight still mark the space before 1, not 1 itself. So something does not seem to be fixed yet.

I personally would prefer 0-based slice positions and would happily remove the '-1's. But I notice that Notepad++ (not written in Python), reports 1, not 0, for the left-margin position. I can imagine that some python-based code might expect the same. AFAIK, text-based editors with characters cursors (underline or over-block), imitating terminals, use 1-based positions.

Copy link
Member

@terryjreedy terryjreedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated my clone, created branch 'pr_9338', and a file containing for 1 in a: pass. Both python and IDLE still mark the space before 1. So either I do not understand the issue or the patch is not complete.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ammaraskar
Copy link
Member Author

That's weird, here's what I get on Linux:

$ ./python /tmp/s.py
  File "/tmp/s.py", line 1
    for 1 in a: pass
        ^
SyntaxError: can't assign to literal

and this is Windows:

λ win32\python D:\workspace\s.py
  File "D:\workspace\s.py", line 1
    for 1 in a: pass
        ^
SyntaxError: can't assign to literal

there's also a test that covers that case here: https://github.com/python/cpython/pull/9338/files#diff-435de67ff9c3be9a3ec7a5fd9550c1c0R216
5 is where the '1' ends up with 1-indexing

@gvanrossum
Copy link
Member

gvanrossum commented Sep 17, 2018 via email

@terryjreedy terryjreedy dismissed their stale review September 17, 2018 23:04

Based on error of not recompiling python after applying patch with C code.

@terryjreedy
Copy link
Member

I almost never test patches with changes to .c files, so I inadvertently verified that the new tests fail without the new code compiled and running. Looks good now.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this now, it looks good. Thanks!

check('class foo:return 1', 1, 11)
check('def f():\n continue', 2, 3)
check('def f():\n break', 2, 3)
check('try:\n pass\nexcept:\n pass\nexcept ValueError:\n pass', 2, 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the line number for this error is off (it should complain on line 3, not on line 2, but that's not a new bug.

@gvanrossum
Copy link
Member

I guess the auto-merge doesn't work, either because some Azure tests flaked or because some reviewers didn't yet approve.

@gvanrossum gvanrossum merged commit 025eb98 into python:master Sep 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants