
Lex lifetimes as identifiers, recover from emoji and emit appropriate error#153893

Open
estebank wants to merge 6 commits into rust-lang:main from estebank:emoji-lifetime

Conversation

@estebank
Contributor

@estebank estebank commented Mar 15, 2026

Lex and parse emoji in lifetimes by using the identifier logic, and disallow them in the parser with a hard error. Allow emoji to start a lifetime name even if they are not XID_Start.

error: identifiers cannot contain emoji: `'🐛🐛🐛family👨👩👧👦`
  --> $DIR/emoji-in-lifetime.rs:1:22
   |
LL | fn bad_lifetime_name<'🐛🐛🐛family👨👩👧👦>(
   |                      ^^^^^^^^^^^^^^^^^^^^^

Address #141081 (but we could provide more information in the diagnostic, pointing at the specific chars, providing a link to the reference on identifiers and/or some other extra information).
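
To make the described behaviour concrete, here is a minimal sketch of lexing a ticked identifier with identifier-style rules while remembering whether any emoji was consumed, so that a later stage can reject it. This is not the code from this PR: `TickedIdent`, `is_emoji_like` and `lex_ticked_ident` are hypothetical names, the emoji check covers only a few code point ranges, and XID_Start / XID_Continue are approximated with standard library predicates.

```rust
/// Result of lexing one ticked identifier such as `'a` or `'static`.
#[derive(Debug, PartialEq)]
pub struct TickedIdent {
    /// Byte length of the token, including the leading `'`.
    pub len: usize,
    /// True if any emoji-like character was consumed.
    pub has_emoji: bool,
}

/// ASSUMPTION: crude stand-in for a real Unicode emoji property lookup.
/// Covers only a few common blocks plus the zero-width joiner.
pub fn is_emoji_like(c: char) -> bool {
    !c.is_ascii() && matches!(c as u32, 0x200D | 0x2600..=0x27BF | 0x1F300..=0x1FAFF)
}

/// ASSUMPTION: approximations of XID_Start and XID_Continue.
fn is_id_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

fn is_id_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Lexes `'name` at the start of `input`. Emoji are accepted both as the
/// first character and as continuation characters, and are only flagged.
pub fn lex_ticked_ident(input: &str) -> Option<TickedIdent> {
    let mut chars = input.chars();
    if chars.next()? != '\'' {
        return None;
    }
    let first = chars.next()?;
    if !(is_id_start(first) || is_emoji_like(first)) {
        return None;
    }
    let mut len = 1 + first.len_utf8();
    let mut has_emoji = is_emoji_like(first);
    for c in chars {
        let emoji = is_emoji_like(c);
        if !(is_id_continue(c) || emoji) {
            break;
        }
        has_emoji |= emoji;
        len += c.len_utf8();
    }
    Some(TickedIdent { len, has_emoji })
}
```

In this sketch the lexer never errors on emoji; a caller would check `has_emoji` and emit the hard error itself, which mirrors the lex-then-reject split described above.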

@rustbot
Collaborator

rustbot commented Mar 15, 2026

The parser was modified, potentially altering the grammar of (stable) Rust
which would be a breaking change.

cc @fmease

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 15, 2026
@rustbot
Collaborator

rustbot commented Mar 15, 2026

r? @jdonszelmann

rustbot has assigned @jdonszelmann.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: compiler
  • compiler expanded to 69 candidates
  • Random selection from 15 candidates

self.bump();
self.eat_while(is_id_continue);
self.eat_while(|c| {
    let is_emoji = !c.is_ascii() && c.is_emoji_char();
Member


Can we create a function or a method for this? I see this logic is used in two places; something like self.is_emoji() might be a better approach.
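
A rough sketch of what such a helper could look like, written as free functions rather than a method. All names here are hypothetical, and `is_emoji_char` is a tiny stand-in for a real Unicode emoji lookup. The ASCII guard is kept because the Unicode `Emoji` property also covers some ASCII characters such as digits, `#` and `*`, which is presumably why the snippet under review checks `is_ascii` first.

```rust
/// ASSUMPTION: tiny stand-in for a Unicode `Emoji` property lookup.
/// Like the real property, it also reports digits, `#` and `*`.
pub fn is_emoji_char(c: char) -> bool {
    matches!(c, '#' | '*' | '0'..='9')
        || matches!(c as u32, 0x2600..=0x27BF | 0x1F300..=0x1FAFF)
}

/// The shared predicate: emoji that are not plain ASCII characters.
pub fn is_emoji(c: char) -> bool {
    !c.is_ascii() && is_emoji_char(c)
}

/// Call site 1: may `c` start the name that follows the tick?
/// (XID_Start is approximated with `is_alphabetic`.)
pub fn can_start_lifetime_name(c: char) -> bool {
    c == '_' || c.is_alphabetic() || is_emoji(c)
}

/// Call site 2: how many leading chars of `rest` continue the name?
/// (XID_Continue is approximated with `is_alphanumeric`.)
pub fn name_continue_len(rest: &str) -> usize {
    rest.chars()
        .take_while(|&c| c == '_' || c.is_alphanumeric() || is_emoji(c))
        .count()
}
```

Both call sites now share one definition, so a change to the emoji rule only has to be made once.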

Comment on lines +332 to +334
if has_emoji {
    self.dcx().struct_span_err(span, "lifetimes cannot contain emoji").emit();
}
Member


Why not make use of the preexisting ParseSess.bad_unicode_identifiers & rustc_interface::passes infrastructure?

Member

@fmease fmease Mar 15, 2026


("lifetime"[1] tokens are a kind of identifier)

Footnotes

  1. Strictly speaking, these are "ticked identifiers", because they represent not only lifetimes but also labels, which have a distinct grammar (e.g., 'static is not a syntactically valid label, but it is a lexically valid "lifetime" ticked identifier).
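
For readers less familiar with the distinction, a small illustration of the two roles a ticked identifier can play. The function names are made up for this example.

```rust
/// `'static` used as a lifetime.
pub fn greeting() -> &'static str {
    "hello"
}

/// `'outer` used as a loop label: `break 'outer` leaves both loops.
pub fn first_factor_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 2..target {
        for j in 2..target {
            if i * j == target {
                found = Some((i, j));
                break 'outer;
            }
        }
    }
    found
}

// By contrast, writing `'static: loop {}` is rejected: the token lexes
// fine, but `'static` is not accepted as a label name.
```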

Contributor Author


The main reason to not do that is to still leverage the 'blah' recovery machinery. Here we are in a situation where we have to either choose one case to have nicer output than the other, or find a good way to unify both checks.
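
To illustrate the alternative being discussed, here is a simplified model of deferred reporting: the lexer only records suspicious names and their spans, and a later pass turns the table into diagnostics. This is a sketch, not the actual `ParseSess` or `rustc_interface` code; `Span`, `BadIdents`, `record` and `report` are hypothetical names, and the occurrence count in the message is an addition for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical byte-range span.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

/// Simplified model of a session-wide table of identifiers that
/// contain emoji, keyed by name.
#[derive(Default)]
pub struct BadIdents {
    by_name: BTreeMap<String, Vec<Span>>,
}

impl BadIdents {
    /// Called while lexing: remember the name and where it occurred.
    pub fn record(&mut self, name: &str, span: Span) {
        self.by_name.entry(name.to_string()).or_default().push(span);
    }

    /// Called by a later pass: one message per distinct name.
    pub fn report(&self) -> Vec<String> {
        self.by_name
            .iter()
            .map(|(name, spans)| {
                format!(
                    "identifiers cannot contain emoji: `{}` ({} occurrence(s))",
                    name,
                    spans.len()
                )
            })
            .collect()
    }
}
```

The trade-off raised in this thread is visible here: deferring the error groups all occurrences of a name, while emitting it directly in the parser keeps the error next to other parser recovery.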

Lex and parse emoji in lifetimes, and disallow them in the parser with a hard error. Allow emoji to start a lifetime name even if they are not XID_Start.

```
error: lifetimes cannot contain emoji
  --> $DIR/emoji-in-lifetime.rs:1:22
   |
LL | fn bad_lifetime_name<'🐛🐛🐛family👨👩👧👦>(
   |                      ^^^^^^^^^^^^^^^^^^^^^
```
@rustbot
Collaborator

rustbot commented Mar 18, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.


@rustbot
Collaborator

rustbot commented Mar 18, 2026

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@rustbot rustbot added the T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue. label Mar 18, 2026

@estebank estebank changed the title from "Lex lifetimes using emoji and emit appropriate error" to "Lex lifetimes as identifiers, recover from emoji and emit appropriate error" Mar 19, 2026
Member

@Veykril Veykril left a comment


rust-analyzer side


@jdonszelmann
Contributor

I agree with Esteban that the nicer suggestions here are probably worth it. Sorry it took me a while, but LGTM.

@bors r=me,veykril rollup

@rust-bors
Contributor

rust-bors bot commented Mar 30, 2026

📌 Commit 4f1676d has been approved by me,veykril

It is now in the queue for this repository.

⚠️ The following reviewer(s) could not be found: me

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 30, 2026
@fmease
Member

fmease commented Mar 30, 2026

@bors r=jdonszelmann,Veykril

@rust-bors
Contributor

rust-bors bot commented Mar 30, 2026

📌 Commit 4f1676d has been approved by jdonszelmann,Veykril

It is now in the queue for this repository.

tgross35 added a commit to tgross35/rust that referenced this pull request Mar 30, 2026
…lmann,Veykril

rust-bors bot pushed a commit that referenced this pull request Mar 31, 2026
Rollup of 7 pull requests

Successful merges:

 - #142659 (compiler-builtins: Clean up features)
 - #153574 (Avoid ICE when param-env normalization leaves unresolved inference variables)
 - #153648 (Fix EII function aliases eliminated by LTO)
 - #153790 (Fix regression when dealing with generics/values with unresolved inference)
 - #153893 (Lex lifetimes as identifiers, recover from emoji and emit appropriate error)
 - #153980 (refactor: move doc(rust_logo) check to parser)
 - #154551 (Skip suggestions pointing to macro def for assert_eq)
@Zalathar
Member

Failed in rollup: #154611 (comment)

@bors r-
@bors try jobs=dist-ohos-armv7

@rust-bors rust-bors bot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 31, 2026
@rust-bors
Contributor

rust-bors bot commented Mar 31, 2026

This pull request was unapproved.

This PR was contained in a rollup (#154611), which was unapproved.


rust-bors bot pushed a commit that referenced this pull request Mar 31, 2026
Lex lifetimes as identifiers, recover from emoji and emit appropriate error


try-job: dist-ohos-armv7
@rust-bors
Contributor

rust-bors bot commented Mar 31, 2026

💔 Test for 8db3bb9 failed: CI. Failed job:

@rust-log-analyzer
Collaborator

The job dist-ohos-armv7 failed!

Possible cause of the failure (guessed by this bot):
258 |                 rustc_lexer::TokenKind::Lifetime { invalid, starts_with_number } => {
    |                                                           ++++++++++++++++++++
help: if you don't care about this missing field, you can explicitly ignore it
    |
258 |                 rustc_lexer::TokenKind::Lifetime { invalid, starts_with_number: _ } => {
    |                                                           +++++++++++++++++++++++
help: or always ignore missing fields here
    |
258 |                 rustc_lexer::TokenKind::Lifetime { invalid, .. } => {
    |                                                           ++++
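
The two fixes suggested by the compiler can be sketched on a stand-alone enum. The enum below is a hypothetical mirror written for this example: only the variant and field names are taken from the log above, and `describe` is an invented function.

```rust
/// Hypothetical mirror of the token kind in the log; only the variant
/// and field names come from the error output.
pub enum TokenKind {
    Lifetime { invalid: bool, starts_with_number: bool },
    Other,
}

pub fn describe(kind: &TokenKind) -> &'static str {
    match kind {
        // `..` ignores `starts_with_number` and any field added later.
        TokenKind::Lifetime { invalid: true, .. } => "invalid lifetime",
        // Binding the field to `_` ignores it explicitly, so adding
        // another field later makes this arm fail to compile again.
        TokenKind::Lifetime { invalid: false, starts_with_number: _ } => "lifetime",
        TokenKind::Other => "other",
    }
}
```

Which form fits better depends on whether a new field should force this match to be revisited: `..` keeps compiling silently, while naming each field does not.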


Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue.
