Support LSP 3.17 positionEncoding negotiation #442

Merged
mame merged 2 commits into ruby:master from ahogappa:feature/position-encoding
Apr 28, 2026

@ahogappa
Contributor

Summary

Implements LSP 3.17 positionEncoding negotiation. Today TypeProf hardcodes UTF-16LE columns and omits positionEncoding from its capabilities response, so modern clients that prefer UTF-8 (Helix, Zed, Neovim) all fall back to UTF-16.

This PR also exposes the encoding as Service.new(position_encoding:) for embedders.
The motivating use case is ruby-minify, which cross-references TypeProf nodes against Prism re-parses to drive variable/method renaming.
Prism's native columns are byte-based — equivalent to LSP utf-8 — so Service.new(position_encoding: Encoding::UTF_8) makes the two coordinate systems line up.
Without this option, ruby-minify reaches into TypeProf via node.instance_variable_get(:@raw_node).location to bypass UTF-16 columns.

Behavior change

None by default. Clients that don't send general.positionEncodings, and embedders that don't pass position_encoding, still get UTF-16LE.

Implementation note

The server picks the first mutually-supported encoding from the client's preference-ordered list (utf-8 / utf-16 / utf-32), falling back to utf-16 per spec.
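
The selection rule above can be sketched in a few lines of plain Ruby. This is an illustration only, not the code from this PR: the constant and method names are hypothetical, and the client list is assumed to arrive as an array of strings (or `nil` when the client sends no `general.positionEncodings`).

```ruby
# Encodings this hypothetical server can produce, as LSP PositionEncodingKind strings.
SUPPORTED_POSITION_ENCODINGS = ["utf-8", "utf-16", "utf-32"].freeze

# Returns the first entry of the client's preference-ordered list that the
# server supports, or "utf-16" when the list is missing or has no match.
def negotiate_position_encoding(client_encodings)
  Array(client_encodings).find { |enc| SUPPORTED_POSITION_ENCODINGS.include?(enc) } || "utf-16"
end

negotiate_position_encoding(["utf-8", "utf-32", "utf-16"]) # => "utf-8" (Helix's list)
negotiate_position_encoding(nil)                           # => "utf-16" (nothing proposed)
```

Because the client's order wins, a client that lists `utf-32` first gets `utf-32` even though the server also supports `utf-8`.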

FileContext#column_offsets_for returns columns in the configured encoding. For UTF-8 it uses Prism's native start_column / end_column (= byte offsets) — note that Prism's code_units_cache(Encoding::UTF_8) reports code points rather than bytes, which would violate the LSP spec.
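
To make the bytes-versus-code-points distinction concrete, here is a standalone sketch that measures the same position in each encoding's code units. It does not use Prism or TypeProf; `lsp_column` is a hypothetical helper, and the input is assumed to be a valid UTF-8 string holding the text from the start of the line up to the position.

```ruby
# Column of the position just after `prefix`, measured in the code units
# of the given LSP position encoding.
def lsp_column(prefix, position_encoding)
  case position_encoding
  when "utf-8"  then prefix.bytesize                                 # bytes
  when "utf-16" then prefix.encode(Encoding::UTF_16LE).bytesize / 2  # 16-bit code units
  when "utf-32" then prefix.length                                   # code points
  else raise ArgumentError, "unknown position encoding: #{position_encoding}"
  end
end

name = "あfoo"
lsp_column(name, "utf-8")  # => 6 (あ is 3 bytes, then 3 ASCII bytes)
lsp_column(name, "utf-16") # => 4
lsp_column(name, "utf-32") # => 4
```

For `あfoo`, counting code points gives 4 where the `utf-8` encoding requires 6, which is the mismatch the note above describes. UTF-16 and UTF-32 only diverge for characters outside the Basic Multilingual Plane, such as emoji.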

Verification

End-to-end against Helix, which proposes general.positionEncodings: ["utf-8", "utf-32", "utf-16"]. Source: line 4 references undefined あfoo, where あ is 3 bytes in UTF-8 / 1 code unit in UTF-16:

def foo(n)
  n
end

あfoo(1, 2)

Helix helix_lsp::transport log — InitializeResult

Master (ruby/typeprof 0.31.1):

typeprof <- {"id":0,"result":{"capabilities":{"textDocumentSync":{...},"hoverProvider":true,...,"referencesProvider":true},"serverInfo":{"name":"typeprof","version":"0.31.1"}},"jsonrpc":"2.0"}

This PR:

typeprof <- {"id":0,"result":{"capabilities":{"positionEncoding":"utf-8","textDocumentSync":{...},"hoverProvider":true,...,"referencesProvider":true},"serverInfo":{"name":"typeprof","version":"0.31.1"}},"jsonrpc":"2.0"}

The only difference is the new "positionEncoding":"utf-8" field, which signals that subsequent column values use UTF-8 byte offsets.
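
From the client's side, the two responses differ only in how columns must be interpreted afterwards. The sketch below shows one way a client could read the field, treating a missing `positionEncoding` as `utf-16`, which is the default the LSP 3.17 spec defines. The helper name is hypothetical, and the two JSON strings are trimmed stand-ins for the logged messages, not the full responses.

```ruby
require "json"

# Returns the encoding a client should assume for columns, given the raw
# InitializeResult message. A missing field means "utf-16" per LSP 3.17.
def position_encoding_from(initialize_result_json)
  message = JSON.parse(initialize_result_json)
  message.dig("result", "capabilities", "positionEncoding") || "utf-16"
end

# Trimmed stand-ins for the two responses (most capabilities omitted).
master_response = '{"id":0,"result":{"capabilities":{"hoverProvider":true}},"jsonrpc":"2.0"}'
pr_response = '{"id":0,"result":{"capabilities":{"positionEncoding":"utf-8","hoverProvider":true}},"jsonrpc":"2.0"}'

position_encoding_from(master_response) # => "utf-16"
position_encoding_from(pr_response)     # => "utf-8"
```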

  • Unit tests in test/core/position_encoding_test.rb (UTF-8 / UTF-16 / UTF-32 columns).
  • LSP tests in test/lsp/lsp_test.rb (negotiation matrix).
  • End-to-end against Helix: confirmed positionEncoding: "utf-8" response and correct diagnostic range on a non-ASCII identifier.

Spec: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

ahogappa and others added 2 commits April 25, 2026 22:14
Adds `position_encoding:` option to `TypeProf::Core::Service.new`.
FileContext stores the encoding and computes Prism::Location columns
accordingly. Default is `Encoding::UTF_16LE` (preserves existing
behavior).

UTF-8 uses Prism's native `start_column`/`end_column` (= byte offsets),
because Prism's `code_units_cache(Encoding::UTF_8)` reports code points
rather than bytes — the latter is what the LSP 3.17 spec defines for
`utf-8`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements `general.positionEncodings` negotiation per LSP 3.17 spec.
The server picks the first encoding from the client's preference list
that it supports (`utf-8` / `utf-16` / `utf-32`) and reports it back via
`capabilities.positionEncoding`. Falls back to UTF-16 (mandatory per
spec) if the client doesn't propose any supported encoding.

The negotiated value flows into per-workspace Services through
`core_options.merge(position_encoding: ...)`, so each Service computes
column positions in the agreed encoding.

See: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
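
The hand-off described in the second commit message can be pictured as follows. This is a sketch under assumptions, not the merged code: the mapping table, the `service_options` helper, and the placeholder option are hypothetical, and choosing `Encoding::UTF_32LE` for `utf-32` is a guess. Only the `core_options.merge(position_encoding: ...)` shape and the `UTF_8` / `UTF_16LE` values come from the PR text.

```ruby
# Hypothetical mapping from LSP PositionEncodingKind strings to Ruby encodings.
RUBY_ENCODING_FOR = {
  "utf-8"  => Encoding::UTF_8,
  "utf-16" => Encoding::UTF_16LE,
  "utf-32" => Encoding::UTF_32LE, # assumption: little-endian, mirroring UTF_16LE
}.freeze

# Returns a new options hash carrying the negotiated encoding. Hash#merge is
# non-destructive, so the shared core_options hash is left untouched.
def service_options(core_options, negotiated)
  core_options.merge(position_encoding: RUBY_ENCODING_FOR.fetch(negotiated))
end

core_options = { example_option: true } # placeholder, not a real TypeProf option
options = service_options(core_options, "utf-8")
options[:position_encoding]             # => Encoding::UTF_8
core_options.key?(:position_encoding)   # => false
```

Keeping the merge non-destructive means every workspace's Service gets the negotiated value without mutating the options shared across workspaces.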
Comment thread lib/typeprof/core/env.rb
attr_reader :path, :comments, :position_encoding
def initialize(path, position_encoding = nil, prism_source = nil, comments = nil)
@path = path
@position_encoding = position_encoding || Encoding::UTF_16LE
Contributor Author

FYI: given that Ruby's default source encoding is UTF-8, UTF-8 would arguably be a more natural default.
However, switching from UTF-16LE would be a breaking change for existing users, so this PR keeps UTF-16LE.

Member

@mame left a comment

Thanks, I would like to give it a try!

@mame merged commit 1aedffb into ruby:master on Apr 28, 2026
6 checks passed