fix(builder): strip UTF-8 BOM from .ino sources before preprocessing #2983

Open
wants to merge 1 commit into
base: master

ritesh006

Arduino CLI: Strip UTF‑8 BOM from .ino before preprocessing

Summary

When a sketch .ino is saved as UTF-8 with BOM, the three BOM bytes (EF BB BF) reach the compiler and cause:

stray '\357' in program
stray '\273' in program
stray '\277' in program

This PR strips the BOM at read-time so the merged .cpp and any copied sources are clean.
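
The three values in the error messages are the BOM bytes EF BB BF written in octal, which is how the compiler reports stray bytes. A minimal Go sketch of that mapping follows; the function name `octalEscapes` is illustrative only and is not part of this PR or of arduino-cli.

```go
package main

import "fmt"

// octalEscapes renders each byte as a backslash-prefixed octal escape,
// the same notation the compiler uses in "stray '\357' in program".
// Hypothetical helper for illustration; not part of arduino-cli.
func octalEscapes(b []byte) []string {
	out := make([]string, 0, len(b))
	for _, c := range b {
		out = append(out, fmt.Sprintf("\\%o", c))
	}
	return out
}
```

Calling `octalEscapes([]byte{0xEF, 0xBB, 0xBF})` yields `\357`, `\273`, and `\277`, matching the three compiler messages one for one.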

Refs: arduino/arduino-ide#2752


Please check if the PR fulfills these requirements

See how to contribute

  • The PR has no duplicates (please search among the Pull Requests before creating one)
  • The PR follows our contributing guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • UPGRADING.md has been updated with a migration guide (for breaking changes)
  • configuration.schema.json updated if new parameters are added.

What kind of change does this PR introduce?

Bug fix — make the CLI robust to UTF-8 BOM at the start of .ino and additional files.


What is the current behavior?

  • If a .ino is saved as UTF-8 with BOM, the BOM bytes are preserved into the merged .cpp, leading to compiler errors (stray '\357' / '\273' / '\277').
  • This matches the IDE issue “Multi-line comments causing build failure” (arduino/arduino-ide#2752). The failure appears “random” to users because some editors silently add a BOM; a blank line after an initial block comment makes it easy to reproduce.

What is the new behavior?

  • On reading sketch sources:
    • Strip a leading UTF-8 BOM before merging .ino files.
    • Strip a leading UTF-8 BOM when copying additional files.
  • Result: BOM-prefixed sketches compile successfully. No behavior change for normal UTF-8 (no BOM) files.

Implementation notes

  • Added helper:
func stripUTF8BOM(b []byte) []byte {
    if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
        return b[3:]
    }
    return b
}
  • Applied in:
    • internal/arduino/builder/sketch.go: sketchMergeSources() (via getSource(...))
    • internal/arduino/builder/sketch.go: sketchCopyAdditionalFiles(...)
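
To make the helper's edge cases explicit, the sketch below reproduces `stripUTF8BOM` from this description next to an equivalent formulation built on the standard library's `bytes.TrimPrefix`. The names `stripUTF8BOMAlt`, `utf8BOM`, and `sameResult` are hypothetical and are not what the PR uses; they are here only to show the intended behavior.

```go
package main

import "bytes"

// utf8BOM is the UTF-8 encoding of U+FEFF (byte order mark).
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripUTF8BOM is the helper as given in the PR description.
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}

// stripUTF8BOMAlt is a hypothetical equivalent (not in the PR) that
// relies on bytes.TrimPrefix, which removes the prefix at most once.
func stripUTF8BOMAlt(b []byte) []byte {
	return bytes.TrimPrefix(b, utf8BOM)
}

// sameResult reports whether both variants agree on the given input.
func sameResult(in []byte) bool {
	return bytes.Equal(stripUTF8BOM(in), stripUTF8BOMAlt(in))
}
```

Both variants leave inputs shorter than three bytes, a partial BOM (EF BB), and a UTF-16 BOM (FF FE) untouched. Only one leading BOM is removed, so a doubled BOM still leaves the second copy in place.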

Test plan (manual)

  1. Create a minimal sketch:
/* test */

int x = 42;
void setup(){ Serial.begin(9600); }
void loop(){ Serial.println(x); delay(1000); }
  2. Save with BOM (VS Code → Save with Encoding → UTF-8 with BOM).
  3. Compile:
arduino-cli compile -b arduino:avr:uno <sketch-folder>

Before this patch: fails with:

stray '\357' in program
stray '\273' in program
stray '\277' in program

After this patch: succeeds.

Control: Save as UTF-8 (no BOM) → succeeds (unchanged).

(Optional follow-up): add an automated test by placing a BOM-prefixed .ino in testdata and asserting the merged output compiles.
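
One possible shape for that follow-up, sketched without the repository's test harness: write a BOM-prefixed sketch to a temporary directory, read it back through a BOM-stripping reader, and compare the bytes. The names `readSource` and `checkBOMStripped` are hypothetical stand-ins, and this sketch only checks the bytes rather than running a real compile.

```go
package main

import (
	"bytes"
	"errors"
	"os"
	"path/filepath"
)

// stripUTF8BOM is the helper as given in the PR description.
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}

// readSource is a hypothetical stand-in for the CLI's source reader:
// it reads the file and strips a leading UTF-8 BOM.
func readSource(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return stripUTF8BOM(data), nil
}

// checkBOMStripped writes a BOM-prefixed .ino to a temp directory and
// verifies that readSource returns the content without the BOM.
func checkBOMStripped() error {
	dir, err := os.MkdirTemp("", "bom-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	body := []byte("/* test */\n\nint x = 42;\nvoid setup(){}\nvoid loop(){}\n")
	withBOM := append([]byte{0xEF, 0xBB, 0xBF}, body...)
	path := filepath.Join(dir, "bom-sketch.ino")
	if err := os.WriteFile(path, withBOM, 0o644); err != nil {
		return err
	}

	got, err := readSource(path)
	if err != nil {
		return err
	}
	if !bytes.Equal(got, body) {
		return errors.New("BOM was not stripped from sketch source")
	}
	return nil
}
```

In the actual repository this would more naturally live in a `_test.go` file as a `testing.T` test against the real merge function, with the fixture checked into testdata as the note above suggests.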


Does this PR introduce a breaking change?

No. The change only strips a BOM if present; no impact on existing UTF-8 (no BOM) files or other encodings.


Other information

  • The issue was reported in the IDE repo, but the root cause is in the CLI merge/preprocess path. Fixing it here resolves the problem for the IDE once it bundles a CLI containing this patch.
  • Performance/overhead is negligible (constant-time 3-byte check per file).

CLAassistant commented Aug 24, 2025

CLA assistant check
All committers have signed the CLA.

codecov bot commented Aug 24, 2025

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.25%. Comparing base (eb4e2ca) to head (b36b87a).
⚠️ Report is 1 commits behind head on master.

Files with missing lines            | Patch % | Lines
internal/arduino/builder/sketch.go  | 57.14%  | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2983      +/-   ##
==========================================
- Coverage   68.26%   68.25%   -0.02%     
==========================================
  Files         241      241              
  Lines       22703    22710       +7     
==========================================
+ Hits        15499    15500       +1     
- Misses       6007     6011       +4     
- Partials     1197     1199       +2     
Flag | Coverage Δ
unit | 68.25% <57.14%> (-0.02%) ⬇️


per1234 linked an issue that may be closed by this pull request (Aug 24, 2025).
per1234 added the labels type: enhancement (Proposed improvement), topic: code (Related to content of the project itself), and topic: build-process (Related to the sketch build process) (Aug 24, 2025).
Successfully merging this pull request may close these issues.

Multi-line comments causing build failure
3 participants