lihaoyi
Contributor

@lihaoyi lihaoyi commented Oct 15, 2025

Implements scala/improvement-proposals#112

  • The initial Lexing/Scanning is lenient, looking only for the opening delimiter (''' followed by zero or more additional ' characters) and the equivalent closing delimiter. This matches how we can expect this to be implemented in other tools that have more restricted lexing frameworks (IntelliJ w/ JFlex, VSCode w/ TextMate Grammars, NeoVim w/ TreeSitter)

  • All other validation (opening delimiter must be followed by newline, closing delimiter must be preceded by whitespace only) and de-denting is left to the parsing phase. When an interpolator is present, that is the only time we have a complete string "literal", and thus the only time we are able to look at the trailing delimiter's preceding indent whitespace and trim it from all earlier STRINGLIT/STRINGPART tokens

  • def interpolatedString needed to be refactored to support dedenting: rather than constructing the trees immediately, we first assemble all the string parts, then use the last string part to compute the dedent that we apply to all other parts, and only then do we construct the trees

  • Covered by neg/ tests and run/ tests for all the major features and edge cases I could think of:

    • All indentation removed
    • Some indentation preserved
    • Empty strings
    • Single-line strings
    • Blank lines in the string
    • Leading and trailing blank lines
    • Varying indentation
    • Extensible delimiters with 4 and 5 quotes
    • Funky operator and unicode characters in the string
    • Tab-based indentation
    • Interpolation with s and f
    • Single- and Multi-line pattern matching with and without interpolation
    • In larger expressions: lists, infix operators, etc.
    • As singleton-type ascriptions and singleton-type parameters
    • As literals passed to @compileTimeOnly
  • I haven't managed to reliably run the tests for some reason; I think I'm bumping into https://contributors.scala-lang.org/t/current-testcompilation/7256. Instead, I tested manually by copy-pasting the run/neg test files into the bin/scala REPL and comparing the output with the .check files on disk, and the output is identical
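For illustration, the validation and dedent rules described in the bullets above can be sketched on a plain string. This is a minimal standalone sketch, not the parser code from this PR: the helper name dedent, the error messages, and the choice to reject body lines indented less than the closing delimiter are all assumptions, and it ignores interpolation (the real code works across STRINGLIT/STRINGPART tokens).

```scala
// Hypothetical helper, not the compiler's implementation.
// `raw` is the text between the opening and closing delimiters.
def dedent(raw: String): Either[String, String] =
  val lines = raw.split("\n", -1).toList
  // The last line is whatever precedes the closing delimiter.
  val closingIndent = lines.last
  // Body lines exclude the (empty) first line and the closing-indent line.
  val body = lines.drop(1).dropRight(1)
  if lines.length < 2 || lines.head.nonEmpty then
    Left("opening delimiter must be followed by a newline")
  else if !closingIndent.forall(c => c == ' ' || c == '\t') then
    Left("closing delimiter must be preceded by whitespace only")
  else
    // Assumption: non-blank lines must start with the closing indent.
    body.find(l => l.trim.nonEmpty && !l.startsWith(closingIndent)) match
      case Some(l) => Left(s"line indented less than the closing delimiter: $l")
      case None =>
        Right(
          body
            .map(l => if l.startsWith(closingIndent) then l.drop(closingIndent.length) else "")
            .mkString("\n")
        )
```

The closing delimiter's indent is the only thing that determines how much is stripped, which is why the dedent cannot be computed until the last string part has been seen.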

@Gedochao Gedochao requested review from odersky and sjrd October 15, 2025 08:33
@Gedochao Gedochao changed the title WIP dedented triple-quoted string literals SIP-72: WIP dedented triple-quoted string literals Oct 15, 2025
@odersky
Contributor

odersky commented Oct 15, 2025

That was fast!

@Gedochao Gedochao added needs-minor-release This PR cannot be merged until the next minor release needs-sip A SIP needs to be raised to move this issue/PR along. stat:sip-in-progress and removed needs-sip A SIP needs to be raised to move this issue/PR along. labels Oct 15, 2025
@lihaoyi
Contributor Author

lihaoyi commented Oct 15, 2025

Not ready to review yet! Still need a bit more vibing haha

@Gedochao
Contributor

> Not ready to review yet! Still need a bit more vibing haha

Ah right, I'll convert it to draft then

@Gedochao Gedochao marked this pull request as draft October 15, 2025 08:49

val hasTabs = closingIndent.contains('\t')
val hasSpaces = closingIndent.contains(' ')
if (hasTabs && hasSpaces) {
Contributor

Should be able to detect this in one loop
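As a rough sketch of what a single-pass version could look like (the helper name hasMixedIndent is hypothetical, and this is not code from the PR), both flags can be tracked in one loop that stops as soon as a mix is found:

```scala
// Hypothetical helper: true when the indent contains both tabs and spaces.
def hasMixedIndent(closingIndent: String): Boolean =
  var hasTabs = false
  var hasSpaces = false
  var i = 0
  // One traversal; exit early once both kinds of whitespace have been seen.
  while i < closingIndent.length && !(hasTabs && hasSpaces) do
    closingIndent.charAt(i) match
      case '\t' => hasTabs = true
      case ' '  => hasSpaces = true
      case _    => ()
    i += 1
  hasTabs && hasSpaces
```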

@lihaoyi
Copy link
Contributor Author

lihaoyi commented Oct 15, 2025

I think this is in a pretty good state right now. There's some funniness around the test infrastructure I haven't figured out, and obviously it would have to wait for scala/improvement-proposals#112, but all the tests pass when run manually.

Despite attempting to vibe-code the whole thing, Claude could only get about 80% of the way there. The last 20% of feature-work/debugging/cleanup I ended up doing by hand.

Could certainly use more cleanup, but this is probably the natural stopping point until the SIP goes into the next phase

else
literal(inTypeOrSingleton = true)

/** Dedent a string literal by removing common leading whitespace.
Contributor

For new code in the compiler we use indentation syntax and new conditional if / then / else syntax. The old Java conditional syntax is already disabled under -language.future.
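To make the requested style concrete, here is a small standalone example in indentation syntax with if / then / else. The function checkIndent and its message are hypothetical; only the two val definitions and the condition come from the snippet under review.

```scala
// Hypothetical standalone example; the returned message is an assumption.
def checkIndent(closingIndent: String): Option[String] =
  val hasTabs = closingIndent.contains('\t')
  val hasSpaces = closingIndent.contains(' ')
  // New style: no parentheses around the condition and no braces.
  if hasTabs && hasSpaces then Some("closing indent mixes tabs and spaces")
  else None
```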

Comment on lines +1555 to +1566
val isDedented =
  in.charOffset + 2 < in.buf.length &&
  in.buf(in.charOffset - 1) == '\'' &&
  in.buf(in.charOffset) == '\'' &&
  in.buf(in.charOffset + 1) == '\''
in.nextToken()
def nextSegment(literalOffset: Offset) =
  segmentBuf += Thicket(
    literal(literalOffset, inPattern = inPattern, inStringInterpolation = true),
    atSpan(in.offset) {
      if (in.token == IDENTIFIER)
        termIdent()
      else if (in.token == USCORE && inPattern) {
        in.nextToken()
        Ident(nme.WILDCARD)
      }
      else if (in.token == THIS) {
        in.nextToken()
        This(EmptyTypeIdent)
      }
      else if (in.token == LBRACE)
        if (inPattern) Block(Nil, inBraces(pattern()))
        else expr()
      else {
        report.error(InterpolatedStringError(), source.atSpan(Span(in.offset)))
        EmptyTree
      }
    })

var offsetCorrection = if isTripleQuoted then 3 else 1
while (in.token == STRINGPART)
  nextSegment(in.offset + offsetCorrection)
// Collect all string parts and their offsets
val stringParts = new ListBuffer[(String, Offset)]
val interpolatedExprs = new ListBuffer[Tree]

var offsetCorrection = if (isDedented) 3 else if (isTripleQuoted) 3 else 1
Contributor Author

This bit is super sketchy, I'm sure there's a better way
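One behavior-preserving simplification, sketched here as a standalone function rather than as the parser's actual code: both the dedented and the triple-quoted branch yield 3, so the two conditions can be merged. The function wrapper is hypothetical, and whether longer delimiters (4 or 5 quotes) need a different correction is not something this sketch addresses.

```scala
// Hypothetical standalone version of the expression under discussion.
// Same result as: if (isDedented) 3 else if (isTripleQuoted) 3 else 1
def offsetCorrection(isDedented: Boolean, isTripleQuoted: Boolean): Int =
  if isDedented || isTripleQuoted then 3 else 1
```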

Labels

needs-minor-release This PR cannot be merged until the next minor release stat:sip-in-progress

4 participants