Component
pygls/protocol/json_rpc.py
Summary
The method computes the Content-Length header value using len(body), which returns the number of Unicode code points rather than the number of bytes in the UTF-8 encoded payload. When the JSON-RPC payload contains non-ASCII characters (e.g., emoji, CJK text, or localized diagnostic messages), the declared length differs from the actual number of bytes transmitted, causing the receiving peer to misparse the message boundary.
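The gap between the two counts can be checked in isolation. The snippet below is a minimal sketch, not pygls code. It passes ensure_ascii=False, an assumption not shown in the affected code, so that the non-ASCII characters are emitted literally; with the json.dumps default (ensure_ascii=True) they would be escaped as \uXXXX and the two counts would match.

```python
import json

label = "测试"  # two code points, three UTF-8 bytes each

# Assumption: the serializer emits non-ASCII text literally.
body = json.dumps({"label": label}, ensure_ascii=False)

char_count = len(body)                  # what len(body) puts in the header
byte_count = len(body.encode("utf-8"))  # what actually goes over the wire

print(f"characters: {char_count}, bytes: {byte_count}")
```

Each CJK character here adds two bytes beyond its single code point, so the declared length falls short by four bytes for this payload.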
Steps to Reproduce
- Return a CompletionItem with a label containing non-ASCII characters (e.g., "测试").
- Inspect the raw LSP traffic and observe that Content-Length is smaller than the actual UTF-8 payload size.
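- The LSP client reports a parse error or hangs: it reads only the declared number of bytes, which truncates the JSON body, and the leftover bytes corrupt the header of the next message.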
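The effect on the receiving side can be simulated without an editor. This is a rough sketch under stated assumptions: frame_with_char_count and read_body are hypothetical helpers written for illustration, not pygls or client APIs, and ensure_ascii=False is assumed so the label is sent as literal UTF-8.

```python
import io
import json


def frame_with_char_count(payload: dict) -> bytes:
    """Hypothetical sender that reproduces the bug: the header counts characters."""
    body = json.dumps(payload, ensure_ascii=False)
    header = f"Content-Length: {len(body)}\r\n\r\n"
    return (header + body).encode("utf-8")


def read_body(stream: io.BytesIO) -> bytes:
    """Minimal reader: parse Content-Length, then read exactly that many bytes."""
    length = None
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line ends the header section
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return stream.read(length)


stream = io.BytesIO(frame_with_char_count({"label": "测试"}))
received = read_body(stream)  # cut short, ends in the middle of a character
leftover = stream.read()      # bytes a real client would misread as the next header

print(len(received), len(leftover))
```

The truncated body cannot be decoded as UTF-8, let alone parsed as JSON, and the leftover bytes stay in the stream where the next message's header is expected.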
Expected Behavior
Per the LSP Base Protocol specification (JSON-RPC messages framed with HTTP-style headers), Content-Length must reflect the exact byte count of the UTF-8 encoded message body.
Actual Behavior
Content-Length reflects the Unicode character count, violating the LSP specification.
Affected Code (pygls/protocol/json_rpc.py, ~L528-541)
body = json.dumps(data, default=self._serialize_message)
header = (
    f"Content-Length: {len(body)}\r\n"
    ...
)
data = header + body
res = self.writer.write(data.encode(self.CHARSET))
Proposed Fix
body_bytes = body.encode(self.CHARSET)
header = (
    f"Content-Length: {len(body_bytes)}\r\n"
    ...
).encode(self.CHARSET)
self.writer.write(header + body_bytes)
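A standalone version of the same idea can serve as a regression check. This is a sketch, not the actual pygls method: frame_message is a hypothetical helper, CHARSET = "utf-8" is an assumed value for self.CHARSET, the header lines elided above are left out, and ensure_ascii=False is assumed so the payload contains multi-byte characters.

```python
import json

CHARSET = "utf-8"  # assumed value of self.CHARSET


def frame_message(payload: dict) -> bytes:
    """Encode the body first, then declare the length of the encoded bytes."""
    body_bytes = json.dumps(payload, ensure_ascii=False).encode(CHARSET)
    header = f"Content-Length: {len(body_bytes)}\r\n\r\n".encode(CHARSET)
    return header + body_bytes


frame = frame_message({"label": "测试"})

# Split at the blank line that ends the header section and compare lengths.
header, _, body = frame.partition(b"\r\n\r\n")
declared = int(header.split(b":", 1)[1])

print(declared, len(body))
```

Encoding before measuring means the header and the bytes on the wire cannot disagree, whatever characters the payload contains.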