-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-34683: Make SyntaxError column offsets consistently 1-indexed #9338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6b1255d
to
b0dbd0d
Compare
This is reasonable. But I did find some places where it makes things worse. E.g.
this positions the caret just past the invalid token (
It would behoove someone to carefully double-check all cases where a syntax error originates in the lexer -- there are a variety of cases. |
Aah, good catch, I'll go through the errors and add a lot more test cases. |
That issue has been resolved, I had failed to change |
Have you at least grepped all of the source for occurrences of SyntaxError?
|
b172d27
to
d482c5b
Compare
Yup, that's what I did and then I grepped for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks fine, except I'm suddenly getting cold feet about changing the offset from 1-based to 0-based.
While I agree that it was previously inconsistent, and Python generally uses 0-based indexing, for common cases the offset is now noticeably moving from 1-based to 0-based, and that may affect tools that catch SyntaxError and then use the lineno and offset fields to display the error in an editor. I also happen to know that text editors (e.g. vim, Emacs) typically use 1-based column offsets.
Now, the meaning of the offset field is not publicly documented (at least https://docs.python.org/3/library/exceptions.html#SyntaxError doesn't mention that it's 1-based), but I'd still like to tread lightly here.
Sorry for the last-minute change of heart!
@terryjreedy Does IDLE care about this?
d482c5b
to
66d6c9a
Compare
Another potential downside is that this is a breaking change to |
OK -- if we can make it consistent with 0-based, we can also make it
consistent with 1-based, right? Do you feel you have the energy to do it
that way?
|
Yeah that sounds good to me, lets keep around the new tests and make it all consistent to 1-indexed. |
I thought that the offset is 0-based in most case. Just in some cases it points to the start of invalid token, and in other cases it points past the end of it. This depends on the place of raising an error: tokenizer, AST, compiler. In case of 1-character tokens this looks as 1-based offsets. |
start of tokens in parsing errors.
e1b2332
to
14c6520
Compare
Anywhere its 0-indexed seems like a mistake because the caret printing code uses |
fwiw I think Guido's intuition about tools breaking from changing to 0-indexing is right, a vim plugin that does syntax checking for vim assumes SyntaxError's offset is 1-indexed: vim-syntastic/syntastic@6c91e8d |
d3502e4
to
e53a27a
Compare
I presume 'now' in "Column offsets are now 0-indexed" meant after rather than before the original patch, though not 'now', after the patch revision. tk and hence IDLE use 1-based line numbers and 0-based column slice offsets. The tk/IDLE text cursor is a vertical 'slice' bar positioned between characters, as in this comment box, and many (most?, all?) GUI editors. IDLE currently subtracts 1 (in three places) from what it assumes to be 1-based offsets. At the left margin it reports 0 on the status display and passes 0 as the start-slice argument when coloring a slice. Hence, IDLE highlights the ' ' before 1 in I pulled this PR to a master-based branch and the Python caret and IDLE highlight still mark the space before 1, not 1 itself. So something does not seem to be fixed yet. I personally would prefer 0-based slice positions and would happily remove the '-1's. But I notice that Notepad++ (not written in Python), reports 1, not 0, for the left-margin position. I can imagine that some python-based code might expect the same. AFAIK, text-based editors with characters cursors (underline or over-block), imitating terminals, use 1-based positions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated my clone, created branch 'pr_9338', and a file containing for 1 in a: pass
. Both python and IDLE still mark the space before 1. So either I do not understand the issue or the patch is not complete.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
That's weird, here's what I get on Linux:
and this is Windows:
there's also a test that covers that case here: https://github.com/python/cpython/pull/9338/files#diff-435de67ff9c3be9a3ec7a5fd9550c1c0R216 |
That's also what I get on macOS.
|
Based on error of not recompiling python after applying patch with C code.
I almost never test patches with changes to .c files, so I inadvertently verified that the new tests fail without the new code compiled and running. Looks good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this now, it looks good. Thanks!
check('class foo:return 1', 1, 11) | ||
check('def f():\n continue', 2, 3) | ||
check('def f():\n break', 2, 3) | ||
check('try:\n pass\nexcept:\n pass\nexcept ValueError:\n pass', 2, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the line number for this error is off (it should complain on line 3, not on line 2, but that's not a new bug.
I guess the auto-merge doesn't work, either because some Azure tests flaked or because some reviewers didn't yet approve. |
Column offsets are now 0-indexed and errors now point to the start of the token. Fixes the off-by-one problem for ast.c errors.
https://bugs.python.org/issue34683