
Fix async profiler output copy to s3 [skip ci] #14723

Open
jihoonson wants to merge 2 commits into NVIDIA:main from jihoonson:fix-async-profiler-s3-output

Conversation

@jihoonson
Collaborator

Description

The async profiler currently fails to save its output to S3 because it calls copyFromLocalFile to copy each result file, and that call hits a known issue (HADOOP-18173). This PR works around it by opening an output stream on the destination and copying each file with Files.copy.
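For illustration, a minimal sketch of the workaround (not the exact patch: the object and method names below are made up, and the real change in asyncProfiler.scala uses the plugin's existing withResource helper):

import java.nio.file.{Files, Path => LocalPath}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ProfilerOutputCopySketch {
  // Stand-in for the plugin's withResource helper: closes the resource on both
  // the success and the exception path.
  private def withResource[R <: AutoCloseable, T](r: R)(body: R => T): T =
    try body(r) finally r.close()

  // Stream the local profiler result file to the destination instead of using
  // fs.copyFromLocalFile, which fails against S3 (HADOOP-18173).
  def copyToOutputPath(localFile: LocalPath, outPath: Path, conf: Configuration): Unit = {
    val fs: FileSystem = outPath.getFileSystem(conf)
    withResource(fs.create(outPath, false)) { out =>
      Files.copy(localFile, out)
    }
  }
}

The same pattern is applied at each of the three copy sites in the profiler.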

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@jihoonson jihoonson added the bug (Something isn't working) label May 4, 2026
@jihoonson jihoonson changed the title from "Fix async profiler output copy to s3" to "Fix async profiler output copy to s3 [skip ci]" May 4, 2026
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
@jihoonson jihoonson force-pushed the fix-async-profiler-s3-output branch from b4334a4 to af157d1 May 4, 2026 21:20
@greptile-apps
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR fixes async profiler output not being saved to S3 by replacing fs.copyFromLocalFile (broken by HADOOP-18173) with withResource(fs.create(...)) { out => Files.copy(localPath, out) } across the three copy sites. The withResource pattern is applied correctly to guarantee the output stream is closed and the S3 upload is finalized even if Files.copy throws.

Confidence Score: 5/5

Safe to merge — targeted fix with correct resource management.

All three copy sites are updated consistently. The withResource ARM pattern is applied correctly, ensuring the output stream is closed on both success and exception paths. No logic regressions or new resource leaks introduced. No P0/P1 findings.

No files require special attention.

Important Files Changed

Filename: sql-plugin/src/main/scala/com/nvidia/spark/rapids/asyncProfiler.scala
Overview: Replaces three copyFromLocalFile calls with withResource(fs.create(...)) { Files.copy(...) } to work around HADOOP-18173; resource handling is correct.

Sequence Diagram

sequenceDiagram
    participant Exec as Executor
    participant FS as Hadoop FileSystem
    participant S3 as S3 / HDFS

    Exec->>FS: fs.create(outPath, overwrite=false)
    FS-->>Exec: OutputStream (out)
    Exec->>Exec: Files.copy(localPath, out)
    Note over Exec,FS: withResource ensures out.close() on both success and exception
    Exec->>FS: out.close() [via withResource]
    FS->>S3: Finalize / commit object
    S3-->>FS: OK
    Exec->>Exec: localFile.delete()

Reviews (3): Last reviewed commit: "fix line limit"

Comment on lines 357 to 361
  if (compressJfrFile(tempFilePath, compressedTempPath)) {
-   fs.copyFromLocalFile(new Path(compressedTempPath.toString), outPath)
+   withResource(fs.create(outPath, false)) { out =>
+     Files.copy(compressedTempPath, out)
+   }
    compressedTempPath.toFile.delete() // delete compressed temp file
Contributor


P2 Compressed temp file not deleted on exception

If Files.copy(compressedTempPath, out) throws, withResource correctly closes out, but the compressedTempPath.toFile.delete() on line 361 is skipped, leaking the compressed temp file. The same exception also skips the tempFilePath.toFile.delete() at line 374. A try/finally (or a second withResource-like guard) around compressedTempPath would ensure cleanup on both normal and exceptional paths.
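
For illustration only, the suggested guard could look like this (a sketch; variable names follow the snippet above, and the exact placement in asyncProfiler.scala may differ):

try {
  withResource(fs.create(outPath, false)) { out =>
    Files.copy(compressedTempPath, out)
  }
} finally {
  // Runs even when Files.copy throws, so the compressed temp file is not leaked.
  compressedTempPath.toFile.delete()
}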

Collaborator Author


@greptile-apps this is an existing issue, isn't it? Even before this change, compressedTempPath will never be deleted if fs.copyFromLocalFile throws.

Contributor


You're right, that's a pre-existing issue that this PR didn't introduce. It was unfair of me to flag it as part of this review, since the change doesn't make the cleanup situation any worse than before. Thanks for the correction.

Tip: You can give me more context with [.greptile/rules.md](https://www.greptile.com/docs/code-review/custom-standards) so I get this right next time.

abellina previously approved these changes May 4, 2026
@jihoonson
Collaborator Author

build


Labels

bug Something isn't working



3 participants