Make sure we pass UTF-16 code unit offsets in all the LSP types #1113

Xanewok · 2018-11-03T12:33:42Z

cc #1112
cc microsoft/language-server-protocol#376

This causes diagnostic spans and code suggestion spans to be displayed incorrectly (here).

Xanewok · 2018-11-30T11:27:53Z

Currently LSP specifies that all text offsets are counted in UTF-16 code units (see the "Text Documents" section of the LSP specification), so that's what the Range type is expected to carry.

However, RLS uses its own rls_span::Range (from the rls-span crate, used by both rustc and rls), whose column offsets are counted in Unicode scalar values (think Rust char and chars()), and which we naively transform to Range using rls_to_range (bad!):

rls/src/lsp_data.rs

Line 130 in 816017b

pub fn rls_to_range(r: span::Range<span::ZeroIndexed>) -> Range {

For line numbers it doesn't matter, but a column can only be converted between UTF-16 code units and Unicode scalar values when we have the source line that the range refers to.
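To illustrate why the line text is needed, here is a minimal sketch of the column conversion in both directions. The function names are made up for this example and are not part of rls, rls-span, or rls-vfs; the only assumption is that columns are zero-indexed and that the caller already has the line as a &str.

```rust
/// Converts a zero-indexed column counted in Unicode scalar values
/// into a column counted in UTF-16 code units, for the given line.
/// (Hypothetical helper, not an existing RLS function.)
fn scalar_col_to_utf16(line: &str, col: usize) -> usize {
    // Each char contributes 1 or 2 UTF-16 code units.
    line.chars().take(col).map(char::len_utf16).sum()
}

/// Converts a zero-indexed column counted in UTF-16 code units
/// into a column counted in Unicode scalar values, for the given line.
/// An offset that lands in the middle of a surrogate pair is rounded up
/// to the next scalar value; an offset past the end is clamped.
/// (Hypothetical helper, not an existing RLS function.)
fn utf16_col_to_scalar(line: &str, utf16_col: usize) -> usize {
    let mut units = 0;
    for (i, c) in line.chars().enumerate() {
        if units >= utf16_col {
            return i;
        }
        units += c.len_utf16();
    }
    line.chars().count()
}
```

For a line like "a😀b", the scalar column of `b` is 2 but its UTF-16 column is 3, because the emoji takes two code units. For ASCII-only lines the two columns agree, which is presumably why the naive conversion goes unnoticed most of the time.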

It might make sense to create a method on the VFS (https://github.com/rust-dev-tools/rls-vfs) to convert between given spans or columns.
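A rough sketch of what such a span-level conversion could look like. Everything here is hypothetical: `Pos`, `Span`, the `LineSource` trait and `to_utf16_span` are stand-ins invented for this example and do not reflect the actual rls-vfs or rls-span APIs. The point is only that the conversion needs something that can hand back the text of a line.

```rust
/// Hypothetical zero-indexed position; the unit of `col` depends on context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pos {
    line: usize,
    col: usize,
}

/// Hypothetical span made of two positions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: Pos,
    end: Pos,
}

/// Hypothetical stand-in for the VFS: anything that can return a line of a file.
trait LineSource {
    fn line(&self, line: usize) -> Option<&str>;
}

impl LineSource for Vec<String> {
    fn line(&self, line: usize) -> Option<&str> {
        self.get(line).map(String::as_str)
    }
}

/// Converts a span whose columns are Unicode scalar value offsets into
/// one whose columns are UTF-16 code unit offsets.
/// Returns None if a referenced line does not exist in the source.
fn to_utf16_span<S: LineSource>(src: &S, span: Span) -> Option<Span> {
    let conv = |p: Pos| -> Option<Pos> {
        let text = src.line(p.line)?;
        // Sum the UTF-16 width of every char before the column.
        let col: usize = text.chars().take(p.col).map(char::len_utf16).sum();
        Some(Pos { line: p.line, col })
    };
    Some(Span {
        start: conv(span.start)?,
        end: conv(span.end)?,
    })
}
```

Line numbers pass through unchanged; only the columns are recomputed, one line lookup per endpoint.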

See #1112 and rust-dev-tools/rls-vfs#24 for related changes

lijinpei · 2019-02-13T07:06:37Z

Maybe we should ignore this problem and wait for (or make?) M$ to change that to UTF-8?

Xanewok · 2019-02-13T08:21:56Z

The earliest they could do it is in LSP 4.0 and I’m not sure they even plan on doing so, so I’d say we should still do it ourselves.

mawww · 2019-02-14T05:15:04Z

If enough clients/servers disregard the spec and unify on a sane alternative (byte or codepoint count), VSCode and the spec will eventually adapt. I suspect most tools use byte or codepoint counts until an issue gets opened due to a strange interaction with another LSP tool, at which point somebody reads the spec, re-reads it again, and goes through the various stages of grief...

Microsoft has control of the spec, but we, as tool writers, have no obligation to follow it to the letter, provided we unify on alternative behaviours and make it known.

soc · 2019-03-26T19:39:23Z

@mawww This is exactly what I intend to do for the (non-Rust) LSP implementation I'm planning to write in the coming months.

Xanewok mentioned this issue Nov 9, 2018

Fix crashes when editing wide characters spanning 2 `u16`s (UTF-16) #1112

Merged

Xanewok added the good first issue label Nov 30, 2018

Xanewok mentioned this issue Dec 11, 2018

Character offsets from the client are utf-16 code units #630

Closed

mawww mentioned this issue Feb 15, 2019

UTF-8 mode clangd/clangd#3

Closed

Xanewok added the bug label Mar 3, 2019

bstaletic mentioned this issue Mar 26, 2019

Some input Avi-D-coder/lsp-range-unit-survey#2

Closed

rust-lang / rls
