Skip to content

Conversation

miikkas
Copy link
Contributor

@miikkas miikkas commented Jun 18, 2025

The radices of integer literals in the input code, including both C and C++, are retained on a best-effort basis in the generated bindings.

  • The radix of a C/C++ integer literal in constant, enum, or macro definition is parsed. If it can't be determined, the code falls back to not doing anything about the radix.
    • This can be currently optionally enabled with Builder (.keep_integer_radices(true)) or CLI (--keep-integer-radices) switches. For now, the default is to have the feature off.
  • During Rust code generation, the radix of an integer literal – if it has been determined – is then used to output that integer's value represented in that number base.
  • New testing is added in the form of unit tests for parsing the radix of a Clang CXToken_Literal token and for outputting the tokens during codegen, and also as a new test header file paired with its matching expected Rust binding output.
  • Existing tests with hexadecimal or octal integers are updated to keep the corresponding radices of those values.

Resolves #3236.

@miikkas miikkas force-pushed the keep-radix branch 3 times, most recently from 8f8d206 to 7054e0f Compare June 19, 2025 17:32
@miikkas miikkas marked this pull request as ready for review June 19, 2025 17:35
@miikkas
Copy link
Contributor Author

miikkas commented Jun 19, 2025

I have now included additional testing to the PR.

Regarding the 13 failing existing tests: It seems that the differences are due to bindgen now generating, as intended, Rust literals using hexadecimal representation, while the values themselves have correctly stayed the same. Should I fix the expected code accordingly?

edit: I have adjusted the expected code accordingly.

@miikkas
Copy link
Contributor Author

miikkas commented Jun 21, 2025

The test I added was changed to use C++14 instead of C23, as it seems that the latter is not available for clang 9.0, which IIUC is used in the CI.

I have changed the expected side of the previously failing existing tests. Those changes are currently included in a separate commit in case that makes review less cumbersome. I will squash it later as instructed in the contribution guidelines.

edit: The commits have been squashed.

@miikkas miikkas marked this pull request as draft July 6, 2025 09:06
@miikkas miikkas force-pushed the keep-radix branch 4 times, most recently from 9064c00 to 2733cf6 Compare July 6, 2025 13:38
@miikkas miikkas marked this pull request as ready for review July 6, 2025 13:44
@miikkas
Copy link
Contributor Author

miikkas commented Jul 6, 2025

I have made the solution a lot more robust and included further unit testing. In my view, this is ready for review.

r? @emilio @pvdrz

@rustbot
Copy link
Collaborator

rustbot commented Jul 6, 2025

Error: The feature assign is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

miikkas

This comment was marked as resolved.

@miikkas miikkas force-pushed the keep-radix branch 2 times, most recently from fc8e1d6 to 91928a9 Compare July 12, 2025 05:36
@miikkas
Copy link
Contributor Author

miikkas commented Jul 15, 2025

To mitigate possible churn and not force the feature on all users immediately, I've added an option defaulting to off as per #3236 (comment).

@rustbot
Copy link
Collaborator

rustbot commented Aug 8, 2025

Error: Parsing assign command in comment failed: ...'' | error: specify user to assign to at >| ''...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@rustbot
Copy link
Collaborator

rustbot commented Aug 8, 2025

Error: The feature assign is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

Copy link
Contributor Author

@miikkas miikkas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some unnecessary complex code may still remain that should be checked.

/// literal definition.
///
/// Returns `None` if unable to infer a base.
pub(crate) fn from_integer_literal_token(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this function is supposed to work when bytes corresponding to an existing integer literal have been passed to it, it's possibly doing far too much. I should look more into clang's API to see if all the extra parsing here is unnecessary. Same goes for the corresponding unit tests. If clang has already validated that the bytes represent an integer literal, determining the radix becomes much easier.

@miikkas
Copy link
Contributor Author

miikkas commented Aug 17, 2025

The latest change removed the varying case long long suffixes (i.e. lL and Ll) from the INTEGER_SUFFIXES array, since those are not permitted by the C standard nor are they supported by clang or gcc. Additionally, I adjusted the length of some documentation rows that exceeded rustfmt's configured maximum of 80, and I added some testing for shorts and chars as well in the new .hpp test file.

The radices of integer literals in input C and C++ code are retained in the
generated bindings, provided that the newly added builder option is set to true,
or the corresponding CLI flag is used.

If an input header contains constants that have binary, octal, or hexadecimal
representation, the literals in the Rust bindings are output with the same
number base. E.g., 0x10 in C/C++ header would be output as 0x10 in Rust code.

The functionality is accessed with the builder option `keep_integer_radices`,
which defaults to `false` for now. `--keep-integer-radices` is the corresponding
CLI flag.

The existing tests with header files and corresponding expectations that were
affected by this change have been adjusted to accommodate the change: The CLI
flag for keeping the integer radices was added to the bindgen command for those
tests, and the values in the expectations were updated to have the original
radix; the new values were confirmed by hand to equal the original ones.
@miikkas
Copy link
Contributor Author

miikkas commented Aug 17, 2025

The commits have now been squashed into one commit that should stand on its own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Keep the integer literal radices of C and C++ in generated Rust
2 participants