
fix: normalize the URL when storing in the database #844

Merged
kalbasit merged 1 commit into main from 02-11-fix_normalize_the_url_when_storing_in_the_database
Feb 12, 2026

@kalbasit
Owner

By normalizing the NAR URL before storing it or checking for its
existence in the database, we ensure that the hash used in the
'nar_files' table matches the actual hash of the file in the storage
layer. This prevents duplicate 'nar_file' records when the same NAR is
referenced by different 'narinfo' files with varying URL formats (e.g.,
with or without a hash prefix).

Changes:

  • Call '.Normalize()' on the parsed NAR URL in 'storeInDatabase'.
  • Call '.Normalize()' on the parsed NAR URL in 'storeNarInfoInDatabase'.
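
To illustrate why normalizing before the lookup avoids duplicates, here is a minimal sketch. It is not the project's code: `narURL`, `normalize`, `narFileStore`, and `ensureNarFile` are hypothetical stand-ins, and the rule that a prefix is separated from the real hash by the last `-` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// narURL is a hypothetical stand-in for the project's parsed NAR URL type.
type narURL struct {
	Hash        string // e.g. "abc123" or "someprefix-abc123"
	Compression string // e.g. "xz"
}

// normalize returns a copy whose Hash has any "<prefix>-" removed.
// Assumption: the prefix is separated from the real hash by the last '-'.
func (u narURL) normalize() narURL {
	if i := strings.LastIndex(u.Hash, "-"); i >= 0 {
		u.Hash = u.Hash[i+1:]
	}
	return u
}

// narFileStore mimics a table with a unique key on (hash, compression).
type narFileStore struct {
	rows map[string]int
	next int
}

func newNarFileStore() *narFileStore {
	return &narFileStore{rows: map[string]int{}}
}

// ensureNarFile returns the row ID for the URL, creating a row only if the
// normalized key is not present yet.
func (s *narFileStore) ensureNarFile(u narURL) int {
	u = u.normalize() // normalize before both the lookup and the insert
	key := u.Hash + "." + u.Compression
	if id, ok := s.rows[key]; ok {
		return id
	}
	s.next++
	s.rows[key] = s.next
	return s.next
}

func main() {
	s := newNarFileStore()
	a := s.ensureNarFile(narURL{Hash: "abc123", Compression: "xz"})
	b := s.ensureNarFile(narURL{Hash: "someprefix-abc123", Compression: "xz"})
	fmt.Println(a == b, len(s.rows)) // prints "true 1"
}
```

Without the `normalize` call inside `ensureNarFile`, the two URLs would produce two different keys and therefore two rows for the same NAR.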

@dosubot dosubot bot added the size:S, bug, and go labels Feb 11, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances data consistency within the cache database by introducing a URL normalization step for NAR (Nix Archive) URLs. By ensuring that NAR URLs are consistently formatted before being used in database operations, the system can accurately identify and manage NAR files, preventing the creation of redundant nar_file entries and guaranteeing that stored hashes correctly reflect the actual content in the storage layer.

Highlights

  • NAR URL Normalization: Implemented .Normalize() on parsed NAR URLs in storeInDatabase and storeNarInfoInDatabase before storing them or checking for their existence in the database. This ensures the nar_files.hash matches the actual file hash in the storage layer, preventing duplicate records when the same NAR is referenced by different URL formats.

Changelog

  • pkg/cache/cache.go
    • Normalized the NAR URL before calling c.ensureNarFile to ensure hash consistency and prevent duplicate nar_file records.
    • Normalized the NAR URL before creating or getting nar_file records in storeNarInfoInDatabase to ensure the stored hash matches the actual file.


@dosubot

dosubot bot commented Feb 11, 2026

Related Documentation

Checked 7 published document(s) in 1 knowledge base(s). No updates required.


Owner Author

kalbasit commented Feb 11, 2026

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue where different NAR URL formats could lead to duplicate nar_file records in the database by normalizing the NAR URL in storeInDatabase and storeNarInfoInDatabase. However, a critical security vulnerability has been identified: the normalization logic in pkg/nar/url.go is insecure as it fails open when path traversal attempts are detected. This, combined with the use of unsanitized hashes in temporary file creation elsewhere in pkg/cache/cache.go, poses a significant security risk. Additionally, the same normalization is needed in the PutNar function (and its related CDC code paths) to fully resolve the duplication issue for directly uploaded NARs. Please ensure that the Normalize function fails securely and that all file system operations use properly sanitized inputs, and consider extending the normalization to PutNar for a comprehensive fix.

Base automatically changed from 02-11-fix_narinfo_should_return_nar_url_normalized to main February 12, 2026 01:49
@dosubot dosubot bot added the size:L label and removed the size:S label Feb 12, 2026
@kalbasit kalbasit force-pushed the 02-11-fix_normalize_the_url_when_storing_in_the_database branch from d8933c2 to deedcdf February 12, 2026 01:50
@kalbasit kalbasit added the backport release-0.8 label Feb 12, 2026
@dosubot dosubot bot added the size:S label and removed the size:L label Feb 12, 2026
@kalbasit kalbasit force-pushed the 02-11-fix_normalize_the_url_when_storing_in_the_database branch from deedcdf to 0fcd7a4 February 12, 2026 01:59
@kalbasit kalbasit enabled auto-merge (squash) February 12, 2026 02:01
@kalbasit kalbasit force-pushed the 02-11-fix_normalize_the_url_when_storing_in_the_database branch from 0fcd7a4 to 58513d4 February 12, 2026 02:11
@codecov

codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.96%. Comparing base (e422e95) to head (58513d4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #844   +/-   ##
=====================================
  Coverage   3.96%   3.96%           
=====================================
  Files          6       6           
  Lines        429     429           
=====================================
  Hits          17      17           
  Misses       409     409           
  Partials       3       3           


@kalbasit kalbasit merged commit a6df951 into main Feb 12, 2026
16 checks passed
@kalbasit kalbasit deleted the 02-11-fix_normalize_the_url_when_storing_in_the_database branch February 12, 2026 02:22
@kalbasit
Owner Author

Backport failed for release-0.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-0.8
git worktree add -d .worktree/backport-844-to-release-0.8 origin/release-0.8
cd .worktree/backport-844-to-release-0.8
git switch --create backport-844-to-release-0.8
git cherry-pick -x a6df95184767eaa7dbfc3489d9cf804eabcc69a3

kalbasit added a commit that referenced this pull request Feb 12, 2026
(cherry picked from commit a6df951)
kalbasit added a commit that referenced this pull request Feb 12, 2026
…857)

(cherry picked from commit a6df951)