[wip] oxen lfs git integration#300

Draft
malcolmgreaves wants to merge 9 commits into main from mg/git_integration_replace_lfs


@malcolmgreaves malcolmgreaves commented Feb 26, 2026

Adds oxen lfs, a Git LFS replacement that uses Oxen to store and manage
large files inside standard Git repositories. Users track file patterns, and Git's
clean/smudge filter mechanism transparently replaces large file content with
small pointer files on commit and restores them on checkout.
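
As a rough illustration of what a pointer file could look like, here is a minimal sketch of encoding and decoding. It assumes a three-line, Git-LFS-style layout built from the fields this description names (`version https://oxen.ai/spec/v1`, `oid xxh3:`, `size`) and a hex digest; the `Pointer` struct and its methods are hypothetical and may not match `pointer.rs`.

```rust
/// Hypothetical in-memory form of a pointer file (not the PR's actual type).
#[derive(Debug, PartialEq, Eq)]
pub struct Pointer {
    /// Hex digest of the real content, written after the "xxh3:" prefix.
    pub oid: String,
    /// Size of the real content in bytes.
    pub size: u64,
}

const VERSION_LINE: &str = "version https://oxen.ai/spec/v1";

impl Pointer {
    /// Renders the pointer as text (assumed layout: one field per line).
    pub fn encode(&self) -> String {
        format!("{}\noid xxh3:{}\nsize {}\n", VERSION_LINE, self.oid, self.size)
    }

    /// Parses pointer text; returns None for anything that is not a pointer,
    /// which lets a filter pass ordinary file content through untouched.
    pub fn decode(text: &str) -> Option<Pointer> {
        let mut lines = text.lines();
        if lines.next()? != VERSION_LINE {
            return None;
        }
        let oid = lines.next()?.strip_prefix("oid xxh3:")?;
        let size = lines.next()?.strip_prefix("size ")?.parse::<u64>().ok()?;
        // Validate: non-empty hex digest and no trailing lines.
        if oid.is_empty() || !oid.chars().all(|c| c.is_ascii_hexdigit()) {
            return None;
        }
        if lines.next().is_some() {
            return None;
        }
        Some(Pointer { oid: oid.to_string(), size })
    }
}
```

Returning `None` rather than an error for non-pointer input is a design choice of this sketch: a smudge filter typically has to tolerate files that were never cleaned.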

New CLI commands (oxen lfs <subcommand>)

oxen lfs install (and --uninstall)
One-time global setup: configures Git's filter.oxen driver in ~/.gitconfig

oxen lfs init
Per-repo setup: creates .oxen/versions/, installs hooks, adds .oxen/ to .gitignore

oxen lfs track "<pattern>"
Adds a glob pattern to .gitattributes with filter=oxen diff=oxen merge=oxen -text

oxen lfs untrack "<pattern>"
Removes a tracked pattern from .gitattributes

oxen lfs status
Shows all tracked pointer files and whether content is available locally

oxen lfs push
Pushes large file content to an Oxen remote (Phase 3, scaffolded)

oxen lfs pull [--local]
Restores pointer files from local store (and origin for local clones)

oxen lfs fetch-all
Strict restore: resolves every tracked pointer or errors with a list of failures

oxen lfs env
Prints diagnostic info about the current LFS configuration

oxen lfs clean / oxen lfs smudge
Single-file filter entry points (invoked by Git)

oxen lfs filter-process
Long-running filter process for batch clean/smudge (invoked by Git)
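
To make the track and untrack behavior concrete, here is a minimal sketch that edits `.gitattributes` content as a string. The attribute string comes from the track description above; the function names are hypothetical, and the sketch assumes one pattern per line and does not handle patterns containing spaces.

```rust
/// Attributes the track command is described as writing for each pattern.
const OXEN_ATTRS: &str = "filter=oxen diff=oxen merge=oxen -text";

/// Returns `.gitattributes` content with `pattern` tracked.
/// Idempotent: tracking an already-tracked pattern changes nothing.
pub fn track(existing: &str, pattern: &str) -> String {
    let line = format!("{} {}", pattern, OXEN_ATTRS);
    if existing.lines().any(|l| l.trim() == line) {
        return existing.to_string();
    }
    let mut out = existing.to_string();
    // Make sure the new entry starts on its own line.
    if !out.is_empty() && !out.ends_with('\n') {
        out.push('\n');
    }
    out.push_str(&line);
    out.push('\n');
    out
}

/// Returns `.gitattributes` content with the entry for `pattern` removed.
/// Lines that belong to other filters are left untouched.
pub fn untrack(existing: &str, pattern: &str) -> String {
    let line = format!("{} {}", pattern, OXEN_ATTRS);
    existing
        .lines()
        .filter(|l| l.trim() != line)
        .map(|l| format!("{}\n", l))
        .collect()
}
```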

Library modules (src/lib/src/lfs/)

┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Module │ Purpose │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ pointer.rs │ Pointer file format: encode, decode, validate (version https://oxen.ai/spec/v1, oid xxh3:<hash>, size <bytes>) │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ filter.rs │ Core clean/smudge logic with local store lookup, origin discovery for local clones, and fallback │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ filter_process.rs │ Git long-running filter protocol v2 (pkt-line framing, handshake, per-file command loop) │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ gitattributes.rs │ Track/untrack/list patterns in .gitattributes │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ hooks.rs │ Installs pre-push, post-checkout, post-merge hooks using the full oxen binary path │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ install.rs │ Global ~/.gitconfig filter driver setup (filter.oxen.process, .clean, .smudge) │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ config.rs │ .oxen/lfs.toml config (optional remote URL) │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ status.rs │ Walks working tree, finds pointer files matching tracked patterns, checks local availability │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ sync.rs │ Push, pull, and fetch-all orchestration with Git index stat cache refresh │
└───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
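
The pkt-line framing mentioned for filter_process.rs can be sketched as follows. This shows Git's general pkt-line format (a four-digit hex length that counts its own four bytes, with `0000` as the flush packet), not the PR's actual code; the function names are hypothetical.

```rust
use std::io::{self, Read, Write};

/// Largest payload a single pkt-line can carry (65520 total minus 4-byte header).
const MAX_PAYLOAD: usize = 65516;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Writes one packet: 4 hex digits of total length, then the payload.
pub fn write_packet<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    write!(w, "{:04x}", payload.len() + 4)?;
    w.write_all(payload)
}

/// Writes the flush packet that terminates a list of packets.
pub fn write_flush<W: Write>(w: &mut W) -> io::Result<()> {
    w.write_all(b"0000")
}

/// Reads one packet. Returns Ok(None) for a flush packet.
pub fn read_packet<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let text = std::str::from_utf8(&header).map_err(|_| invalid("non-UTF-8 length"))?;
    let len = usize::from_str_radix(text, 16).map_err(|_| invalid("bad hex length"))?;
    match len {
        0 => Ok(None),
        1..=3 => Err(invalid("length smaller than header")),
        _ => {
            let mut buf = vec![0u8; len - 4];
            r.read_exact(&mut buf)?;
            Ok(Some(buf))
        }
    }
}
```

A per-file command loop would read packets until a flush, interpret lines such as `command=smudge`, then stream content in further packets.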

Seamless local clone support

The smudge filter auto-detects when a repo was cloned from a local origin (via git config remote.origin.url) and copies content directly from the origin's .oxen/versions/ store.
With oxen lfs install configured globally, git clone /path/to/repo restores all large files automatically with no extra commands. The .oxen/versions/ directory is auto-created on
demand so fresh clones don't error.
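
One way the local-origin check could work is sketched below. It assumes the value of `remote.origin.url` has already been read, and it classifies the URL with simple string rules; the function names are hypothetical and the real detection in filter.rs may differ. Windows drive-letter paths such as `C:/repo` are not handled here.

```rust
use std::path::{Path, PathBuf};

/// Returns the origin's version store if `origin_url` looks like a local path,
/// or None for network remotes.
pub fn local_origin_store(origin_url: &str) -> Option<PathBuf> {
    let path = if let Some(rest) = origin_url.strip_prefix("file://") {
        rest
    } else if origin_url.contains("://") {
        return None; // https://, ssh://, git://, ...
    } else if is_scp_like(origin_url) {
        return None; // user@host:path style SSH remote
    } else {
        origin_url
    };
    Some(Path::new(path).join(".oxen").join("versions"))
}

/// Git treats "host:path" as an SSH remote when a colon appears
/// before the first slash.
fn is_scp_like(url: &str) -> bool {
    match (url.find(':'), url.find('/')) {
        (Some(colon), Some(slash)) => colon < slash,
        (Some(_), None) => true,
        _ => false,
    }
}
```

A smudge filter could then look for the pointer's OID under the returned directory and copy it into the clone's own store.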

Clean git status after restore

After pull or fetch-all replaces pointer files with real content, the on-disk file size and mtime change. Without refreshing Git's index, git status would show false
modifications. Both commands run git add on restored paths — the clean filter produces the identical pointer blob, so no commit-level change occurs, only the stat cache is
updated.
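
The index refresh described above amounts to running `git add` on the restored paths. A minimal sketch of building that invocation with the standard library follows; the function name and the use of `--` to separate paths are assumptions of this sketch, not details confirmed by the PR.

```rust
use std::path::Path;
use std::process::Command;

/// Builds `git add -- <paths...>` rooted at the repository.
/// Running it re-applies the clean filter, which yields the same pointer blob,
/// so only the index stat cache changes.
pub fn git_add_command(repo_root: &Path, restored: &[&Path]) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(repo_root).arg("add").arg("--");
    cmd.args(restored);
    cmd
}
```

The caller would invoke `.status()` on the returned command and decide how to treat a non-zero exit code.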

Modified existing files

┌──────────────────────────┬──────────────────────────────────────────────────────────┐
│ File │ Change │
├──────────────────────────┼──────────────────────────────────────────────────────────┤
│ src/lib/src/lib.rs │ Added pub mod lfs │
├──────────────────────────┼──────────────────────────────────────────────────────────┤
│ src/lib/src/constants.rs │ Added LFS_CONFIG_FILENAME and LFS_VERSIONS_DIR constants │
├──────────────────────────┼──────────────────────────────────────────────────────────┤
│ src/cli/src/cmd.rs │ Added pub mod lfs; pub use lfs::LfsCmd │
├──────────────────────────┼──────────────────────────────────────────────────────────┤
│ src/cli/src/main.rs │ Registered LfsCmd in the command list │
└──────────────────────────┴──────────────────────────────────────────────────────────┘

Usage

One-time global setup

oxen lfs install

Per-repo setup

cd my-git-repo
oxen lfs init
oxen lfs track "*.bin"
oxen lfs track "datasets/**"

Normal Git workflow — filters run automatically

git add large_model.bin
git commit -m "Add model"
git push origin main

Clone restores files automatically (local clones)

git clone /path/to/origin /path/to/clone

Explicit full restore (errors if any pointer can't be resolved)

oxen lfs fetch-all
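
The strict behavior of fetch-all (resolve everything or report every failure) can be sketched as below. The function name, the `(path, oid)` input shape, and the `resolve` callback are hypothetical stand-ins for the real store lookups.

```rust
/// Tries to resolve every pointer. Returns the number restored, or the full
/// list of failures if any pointer could not be resolved.
pub fn fetch_all_strict<F>(
    pointers: &[(&str, &str)], // (repo-relative path, oid)
    mut resolve: F,            // returns true when content for the oid was found
) -> Result<usize, Vec<String>>
where
    F: FnMut(&str) -> bool,
{
    let mut restored = 0;
    let mut failures = Vec::new();
    for &(path, oid) in pointers {
        if resolve(oid) {
            restored += 1;
        } else {
            // Keep going so the caller sees every failure, not just the first.
            failures.push(format!("{} (oid {})", path, oid));
        }
    }
    if failures.is_empty() {
        Ok(restored)
    } else {
        Err(failures)
    }
}
```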

Test plan

  • cargo clippy --no-deps -- -D warnings — clean
  • cargo test lfs — 40 unit tests pass across all library modules
  • Manual end-to-end: init, track, add, commit, clone, verify restore
  • Manual fetch-all: verify errors when a pointer OID is missing from all stores
  • Manual git status is clean after fetch-all and pull

@malcolmgreaves malcolmgreaves marked this pull request as draft February 26, 2026 08:09

coderabbitai bot commented Feb 26, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@malcolmgreaves malcolmgreaves force-pushed the mg/git_integration_replace_lfs branch from 330c5f9 to 8260a83 Compare February 27, 2026 20:02
1. lfs/config.rs — Added resolve_remote()

  - New async method that parses remote_url → RemoteRepository via
  api::client::repositories::get_by_url()
  - Returns Ok(None) when no remote is configured, errors if URL is set but repo doesn't
   exist

  2. api/client/versions.rs — Generic download_versions_to_store()

  - Extracted the HTTP QUERY + gzip + tar extraction logic into
  try_download_versions_to_store() that takes &dyn VersionStore
  - Added public download_versions_to_store() with retry wrapper (same retry logic as
  original)
  - Refactored try_download_data_from_version_paths to delegate — zero behavior change
  for existing callers

  3. lfs/sync.rs — git_add() error handling

  - Changed return type from () to Result<(), OxenError>
  - Propagates spawn errors, logs non-zero exit as warning (non-fatal)
  - Updated both call sites to use ?

  4. lfs/sync.rs — Real push_to_remote() implementation

  - Loads LfsConfig, resolves remote — skips if no remote configured
  - Builds a temp staging dir with files hard-linked/copied from version store at their
  real repo-relative paths
  - Creates workspace → add_files → commit via workspace API
  - On error, attempts workspace cleanup via delete
  - Renamed _args → hook_args and logs for debugging

  5. lfs/sync.rs — Real pull_from_remote() with remote download

  - After local + origin resolution, collects still-missing OIDs into need_remote
  - If !local_only and remote is configured, batch-downloads via
  download_versions_to_store()
  - Restores downloaded files to working tree and runs git_add()

  6. lfs/sync.rs — fetch_all() updated for remote

  - After local+origin resolution, tries configured Oxen remote for unresolved pointers
  - Only errors if pointers remain unresolved AND no remote is available

  7. lfs/filter.rs — Remote fetch in smudge() with 30s timeout

  - Renamed _lfs_config → lfs_config
  - After local + origin checks, attempts remote fetch wrapped in
  tokio::time::timeout(30s)
  - On success, reads from local store; on timeout/error, falls through to pointer
  fallback

  8. lfs/filter_process.rs — Documented _caps

  - Added comment explaining why capabilities are read but unused

  9. Tests (4 new, all passing)

  - test_push_no_remote_configured — succeeds silently with no remote
  - test_pull_local_only_no_network — restores local content, doesn't attempt network
  - test_git_add_returns_result — propagates errors properly
  - test_smudge_remote_fallback_on_no_server — falls back gracefully when remote
  unreachable
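
The smudge resolution order from item 7 (local store, then origin, then a time-limited remote fetch, then pointer fallback) can be sketched as follows. The PR uses `tokio::time::timeout`; this sketch swaps in a plain thread plus `recv_timeout` from the standard library so it stays dependency-free. All names are hypothetical, and each lookup is modeled as a closure returning the content directly.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of a smudge attempt.
#[derive(Debug, PartialEq)]
pub enum Smudged {
    /// Real file content was found.
    Content(Vec<u8>),
    /// Nothing resolved in time; the pointer bytes are passed through unchanged.
    PointerFallback(Vec<u8>),
}

/// Runs `fetch` on a helper thread and gives up after `limit`.
/// On timeout the helper thread is left to finish on its own.
pub fn with_timeout<T, F>(limit: Duration, fetch: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> Option<T> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone after a timeout; ignore the send error.
        let _ = tx.send(fetch());
    });
    rx.recv_timeout(limit).ok().flatten()
}

/// Tries each source in order and falls back to the pointer itself.
pub fn smudge<L, O, R>(pointer: &[u8], local: L, origin: O, remote: R, limit: Duration) -> Smudged
where
    L: FnOnce() -> Option<Vec<u8>>,
    O: FnOnce() -> Option<Vec<u8>>,
    R: FnOnce() -> Option<Vec<u8>> + Send + 'static,
{
    local()
        .or_else(origin)
        .or_else(|| with_timeout(limit, remote))
        .map(Smudged::Content)
        .unwrap_or_else(|| Smudged::PointerFallback(pointer.to_vec()))
}
```

Falling back to the pointer keeps checkout from failing when the remote is unreachable; a later pull or fetch-all can restore the content.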

  Verification

  - cargo clippy --no-deps -- -D warnings — clean
  - cargo test --lib lfs — 44 tests passed, 0 failed
@malcolmgreaves malcolmgreaves force-pushed the mg/git_integration_replace_lfs branch from 9528c56 to e60220d Compare February 28, 2026 01:52