
@alexskr alexskr commented Aug 19, 2025

Summary

This PR refactors LinkedData::Utils::FileHelpers to improve archive handling.

Motivation

Zip/Gzip functionality silently broke after the rubyzip gem was updated to v3.0 in ncbo_cron:

```
E, [2025-08-13T20:53:32.436690 #1059800] ERROR -- : ["Errno::ENOENT: No such file or directory @ rb_sysopen - /opt/ontoportal/ncbo_cron/srv/ncbo/repository/AGROVOC/1/unzipped/agrovoc_lod.nt\n/opt/ontoportal/ncbo_cron/vendor/bundle/ruby/3.1.0/gems/rubyzip-3.0.0/lib/zip/entry.rb:746:in `initialize'\n\t/opt/ontoportal/ncbo_cron/vendor/bundle/ruby/3.1.0/gems/rubyzip-3.0.0/lib/zip/entry.rb:746:in `open'\n\t/opt/ontoportal/ncbo_cron/vendor/bundle/ruby/3.1.0/gems/rubyzip-3.0.0/lib/zip/entry.rb:746:in `create_file'\n\t/opt/ontoportal/ncbo_cron/vendor/bundle/ruby/3.1.0/gems/rubyzip-3.0.0/lib/zip/entry.rb:296:in 
...
```

- RubyZip v3 changed extraction behavior: callers now need to pass `destination_directory:` explicitly.
- Stronger protection against Zip Slip (path traversal) was needed.
- `.gz` extraction previously read the whole file into memory.
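For context on the Zip Slip item: an archive entry named, for example, `../../etc/passwd` resolves outside the extraction directory if it is naively joined onto it. The Ruby sketch below shows one minimal containment check, assuming POSIX-style paths. The helper name `contained_path` is hypothetical; this is an illustration of the idea, not the PR's `safe_join`.

```ruby
# Hypothetical helper, for illustration only (not the PR's safe_join):
# join an archive entry name onto dest_dir and return the absolute target
# path, or nil if the entry would land outside dest_dir. Assumes POSIX paths.
def contained_path(dest_dir, entry_name)
  base = File.expand_path(dest_dir)
  # Reject absolute entry names such as "/etc/passwd" outright.
  return nil if entry_name.start_with?('/')

  # Joining before expanding puts any leading "~" mid-path, where
  # File.expand_path treats it literally; expand_path also collapses "../".
  target = File.expand_path(File.join(base, entry_name))

  # A traversal shows up as a target that is not strictly inside base.
  target.start_with?(base + File::SEPARATOR) ? target : nil
end

p contained_path('/srv/unzipped', 'ontology/data.nt') # prints "/srv/unzipped/ontology/data.nt"
p contained_path('/srv/unzipped', '../../etc/passwd') # prints nil
```

Comparing against `base` plus a trailing separator matters: a bare `start_with?(base)` would also accept a sibling directory such as `/srv/unzipped-evil/x`.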

Changes

  • Rename files_from_zip to filenames_from_archive for clarity (it returns entry names, not extracted files).
  • Harden .gz and .zip extraction (sanitize orig_name, prevent path traversal, clean up partially extracted files).
  • Behavior change: zip? / gzip? now act as pure predicates, returning true/false instead of raising on missing files.
  • Move the relevant unit tests from model/test_ontology_submission to utils/test_filehelpers.
  • Add rubyzip v3 as a gemspec dependency so that projects including the ontologies_linked_data gem install the correct version.
  • Pin the thin gem to v1 for compatibility reasons.
  • Add more unit tests.
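The predicate behavior described above (true/false instead of raising on missing files), combined with the magic-byte detection the PR mentions (`PK` for ZIP, `1F 8B` for GZIP), could look roughly like the Ruby sketch below. The module `ArchiveSniff` and its methods are hypothetical names; the PR's actual `zip?` / `gzip?` may differ, for example in how they treat `.tar.gz` files.

```ruby
# Hypothetical sketch, not the PR's code: archive-type predicates that sniff
# magic bytes and return false (never raise) for missing or unreadable files.
module ArchiveSniff
  # ZIP local file headers ("PK\x03\x04") and end-of-central-directory
  # records ("PK\x05\x06") both begin with "PK".
  ZIP_MAGIC  = "PK".b
  # GZIP members begin with ID1 = 0x1F, ID2 = 0x8B (RFC 1952).
  GZIP_MAGIC = "\x1F\x8B".b

  module_function

  # First +n+ bytes of the file, or nil if it is not a readable regular file.
  # IO#read(n) returns nil for an empty file, which also yields false below.
  def head_bytes(path, n)
    return nil unless File.file?(path) && File.readable?(path)
    File.open(path, 'rb') { |f| f.read(n) }
  rescue SystemCallError
    nil
  end

  def zip?(path)
    head_bytes(path, ZIP_MAGIC.bytesize) == ZIP_MAGIC
  end

  def gzip?(path)
    head_bytes(path, GZIP_MAGIC.bytesize) == GZIP_MAGIC
  end
end

p ArchiveSniff.gzip?('/no/such/file.gz') # prints false
```

Reading only the first two bytes keeps the check cheap and avoids shelling out to an external tool.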

- Switch from shelling out to `file --mime` to checking magic bytes
  (PK for ZIP, 1F 8B for GZIP). This improves portability and removes
  the dependency on the external `file` command.
- Add `safe_join` guard to block path traversal.
- Normalize gzip names via `resolve_gzip_name` (strip control chars,
  collapse to basename, ensure non-empty).
- Explicitly exclude .tar.gz and .tgz files (not supported yet).
- Align with RubyZip v3+ semantics:
    - enforce an explicit destination_directory (no implicit writes to the cwd),
    - block path traversal (`../` entries are skipped).
- Use streaming writes (`IO.copy_stream`) to reduce memory usage.
- Rename `files_from_zip` to `filenames_in_archive` for clarity
  (returns entry names, not extracted files).
- Add the rubyzip gem as a dependency in the gemspec.
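The gzip-related items above (name normalization, path containment, streaming via `IO.copy_stream`, cleanup of partial output, and skipping `.tar.gz`/`.tgz`) can be combined into one small routine. The Ruby sketch below uses only the standard `zlib` and `fileutils` libraries. `GzipSketch`, `output_name`, and `extract` are hypothetical names; this illustrates the approach and is not the PR's `resolve_gzip_name` or extraction code.

```ruby
require 'zlib'
require 'fileutils'

# Hypothetical sketch, not the PR's implementation: decompress a single .gz
# file into dest_dir by streaming, so the payload is never held in memory.
module GzipSketch
  module_function

  # Pick a safe output filename: prefer the gzip header's orig_name, drop
  # control characters, keep only the basename (defeats "../" tricks), and
  # fall back to the archive name minus ".gz" when nothing usable remains.
  def output_name(gz_path, orig_name)
    name = File.basename(orig_name.to_s.b.delete("\x00-\x1F\x7F"))
    name = File.basename(gz_path, '.gz') if ['', '.', '..', '/'].include?(name)
    name
  end

  # Returns the path of the extracted file.
  def extract(gz_path, dest_dir)
    if gz_path.end_with?('.tar.gz', '.tgz')
      raise ArgumentError, "tar archives are not handled here: #{gz_path}"
    end

    FileUtils.mkdir_p(dest_dir)
    target = nil
    Zlib::GzipReader.open(gz_path) do |gz|
      target = File.join(dest_dir, output_name(gz_path, gz.orig_name))
      # copy_stream pulls from the reader in chunks instead of reading it all.
      File.open(target, 'wb') { |out| IO.copy_stream(gz, out) }
    end
    target
  rescue StandardError
    # Remove a partially written output file, then re-raise the error.
    File.delete(target) if target && File.exist?(target)
    raise
  end
end
```

Because the output name is reduced to a basename, an `orig_name` such as `../../evil.nt` in the gzip header ends up as `evil.nt` inside `dest_dir`. One caveat of this sketch: if an archive without a `.gz` suffix is extracted into its own directory, the fallback name equals the source filename.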
@alexskr alexskr requested a review from mdorf August 19, 2025 06:05

codecov bot commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 89.61039% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.23%. Comparing base (2bbdfc9) to head (5589c69).
⚠️ Report is 4 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/ontologies_linked_data/utils/file.rb | 89.33% | 8 Missing ⚠️ |

Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           develop     #256      +/-   ##
===========================================
+ Coverage    80.21%   80.23%   +0.01%     
===========================================
  Files           84       84              
  Lines         5843     5874      +31     
===========================================
+ Hits          4687     4713      +26     
- Misses        1156     1161       +5     
```
| Flag | Coverage Δ |
|---|---|
| unittests | 80.23% <89.61%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


@alexskr alexskr changed the title refactors LinkedData::Utils::FileHelpers to improve archive handling refactors: LinkedData::Utils::FileHelpers to improve archive handling Aug 19, 2025
@alexskr alexskr changed the title refactors: LinkedData::Utils::FileHelpers to improve archive handling Refactor: improve archive handling in LinkedData::Utils::FileHelpers Aug 19, 2025
@alexskr alexskr marked this pull request as ready for review August 19, 2025 06:35

@mdorf mdorf left a comment


looks good

@alexskr alexskr merged commit 061f530 into develop Aug 19, 2025
6 checks passed
