2 changes: 2 additions & 0 deletions TODO
@@ -0,0 +1,2 @@
- do `oxen lfs init` when doing a `git clone` on an oxen-enabled repository
- fix gaps (oxen lfs push)
113 changes: 113 additions & 0 deletions oxen-rust/docs/dev/OxenLfsBranchSummary.md
@@ -0,0 +1,113 @@
# `oxen lfs` — Git Integration: Branch Summary

## What This Is

`oxen lfs` is a **drop-in replacement for `git lfs`** that stores large file content in Oxen's version store and syncs it to an Oxen server. Users keep using Git for version control, while large binary files are offloaded to Oxen's infrastructure instead of a Git LFS server such as GitHub's.

---

## How It Works

### Architecture

```
Git Repository
├── .git/hooks/
│   ├── pre-push        → oxen lfs push
│   ├── post-checkout   → oxen lfs pull --local
│   └── post-merge      → oxen lfs pull --local
├── .gitattributes      *.bin filter=oxen diff=oxen merge=oxen -text
├── .gitignore          .oxen/
├── .oxen/
│   ├── lfs.toml        remote_url = "https://hub.oxen.ai/ns/repo"
│   └── versions/       content-addressable store (xxh3 hashes)
│       └── <ab>/<cdef…>/data
└── working tree
    └── model.bin       (pointer file in Git, real content on disk)
```

### Pointer Format

```
version https://oxen.ai/spec/v1
oid xxh3:a1b2c3d4e5f6a7b8a1b2c3d4e5f6a7b8
size 5242880
```

Uses xxHash3-128 (fast, non-cryptographic) instead of git-lfs's SHA-256.
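A minimal sketch of encoding and parsing this pointer format, assuming the 3-line layout above and the 200-byte cap mentioned for `pointer.rs`. The names `Pointer`, `encode`, `parse` and `MAX_POINTER_SIZE` are hypothetical and are not the actual `lfs::pointer` API:

```rust
// Hypothetical sketch of the Oxen pointer format; not liboxen's real API.
const SPEC: &str = "https://oxen.ai/spec/v1";
const MAX_POINTER_SIZE: usize = 200; // anything larger is treated as real content

#[derive(Debug, PartialEq)]
struct Pointer {
    oid: String, // 32 hex chars (xxHash3-128)
    size: u64,   // size of the real content in bytes
}

impl Pointer {
    fn encode(&self) -> String {
        format!("version {SPEC}\noid xxh3:{}\nsize {}\n", self.oid, self.size)
    }

    /// Returns None for anything that is not a well-formed pointer, so the
    /// caller can pass non-pointer content through unchanged.
    fn parse(bytes: &[u8]) -> Option<Pointer> {
        if bytes.len() > MAX_POINTER_SIZE {
            return None;
        }
        let text = std::str::from_utf8(bytes).ok()?;
        let mut lines = text.lines();
        if lines.next()? != format!("version {SPEC}") {
            return None;
        }
        let oid = lines.next()?.strip_prefix("oid xxh3:")?;
        if oid.len() != 32 || !oid.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let size = lines.next()?.strip_prefix("size ")?.parse().ok()?;
        if lines.next().is_some() {
            return None; // trailing garbage: not a pointer
        }
        Some(Pointer { oid: oid.to_string(), size })
    }
}

fn main() {
    let p = Pointer { oid: "a1b2c3d4e5f6a7b8a1b2c3d4e5f6a7b8".to_string(), size: 5_242_880 };
    let text = p.encode();
    print!("{text}");
    assert_eq!(Pointer::parse(text.as_bytes()), Some(p));
}
```

The pointer from the example above is 87 bytes, well under the 200-byte cap, which lets the smudge path reject large files cheaply before attempting to parse them.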

### Key Data Flows

**Clean (file -> pointer):** Staging a tracked file (`git add`, `git commit -a`) triggers the clean filter. It hashes the content (xxHash3-128), stores the blob in `.oxen/versions/`, and returns a 3-line pointer (~100 bytes) that Git stores in place of the file's content.
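The clean step can be sketched as "hash, store under the hash if absent, emit a pointer". This sketch swaps in std's 64-bit `DefaultHasher` as a stand-in for xxHash3-128 purely to stay dependency-free, and infers the `versions/<ab>/<rest>/data` layout from the diagram above; `stand_in_oid`, `blob_path` and `clean` are hypothetical names, not the code in `filter.rs`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// STAND-IN hash: std's 64-bit DefaultHasher, not xxHash3-128.
fn stand_in_oid(content: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(content);
    format!("{:016x}", hasher.finish())
}

/// Blob path inferred from the layout diagram: versions/<ab>/<rest>/data.
fn blob_path(versions_dir: &Path, oid: &str) -> PathBuf {
    versions_dir.join(&oid[..2]).join(&oid[2..]).join("data")
}

/// Clean-filter sketch: store the content under its hash, return pointer text.
fn clean(versions_dir: &Path, content: &[u8]) -> io::Result<String> {
    let oid = stand_in_oid(content);
    let path = blob_path(versions_dir, &oid);
    // Identical content always maps to the same path, so re-cleaning an
    // unchanged file skips the write.
    if !path.exists() {
        fs::create_dir_all(path.parent().expect("blob path has a parent"))?;
        // Write to a temp file, then rename, so a partially written blob
        // never appears at the final path.
        let tmp = path.with_extension("tmp");
        fs::write(&tmp, content)?;
        fs::rename(&tmp, &path)?;
    }
    // The "xxh3:" label is kept for shape only; the value is the stand-in hash.
    Ok(format!(
        "version https://oxen.ai/spec/v1\noid xxh3:{oid}\nsize {}\n",
        content.len()
    ))
}

fn main() -> io::Result<()> {
    let versions_dir = std::env::temp_dir().join("oxen-lfs-sketch").join("versions");
    print!("{}", clean(&versions_dir, b"pretend this is a large model file")?);
    Ok(())
}
```

Because the store is content-addressed, the "idempotent" property listed in the tests section falls out of the path scheme rather than needing separate bookkeeping.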

**Smudge (pointer -> file):** Git checkout triggers the smudge filter, which looks for the content in four tiers, in order:
1. Local `.oxen/versions/` store
2. Origin's `.oxen/versions/` (for local `git clone`)
3. Configured Oxen remote (HTTP, 30s timeout)
4. Fallback: pass the pointer bytes through unchanged and print a warning
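The tiered lookup above amounts to "first source that has the blob wins, otherwise pass the pointer through". A minimal sketch, assuming each tier can be modeled as a closure from oid to optional bytes; the `Tier` alias and `smudge` signature are hypothetical, and the real remote tier's HTTP call and 30s timeout are only simulated by a tier that returns `None`:

```rust
use std::collections::HashMap;

/// A tier maps an oid to blob bytes, or None if it cannot supply them.
type Tier<'a> = &'a dyn Fn(&str) -> Option<Vec<u8>>;

fn smudge(pointer: &[u8], oid: &str, tiers: &[Tier<'_>]) -> Vec<u8> {
    // Tiers 1-3, in order: local store, origin's store, configured remote.
    for tier in tiers {
        if let Some(content) = tier(oid) {
            return content;
        }
    }
    // Tier 4: pass the pointer through unchanged so the checkout still succeeds.
    eprintln!("warning: no content found for {oid}; leaving pointer in place");
    pointer.to_vec()
}

fn main() {
    let local: HashMap<&str, Vec<u8>> = HashMap::new();
    let origin = HashMap::from([("abc", b"real bytes".to_vec())]);
    let from_local = |oid: &str| local.get(oid).cloned();
    let from_origin = |oid: &str| origin.get(oid).cloned();
    // Simulates an unreachable server or an elapsed timeout.
    let from_remote = |_oid: &str| -> Option<Vec<u8>> { None };
    let tiers: [Tier<'_>; 3] = [&from_local, &from_origin, &from_remote];

    let restored = smudge(b"<pointer>", "abc", &tiers);
    println!("{}", String::from_utf8_lossy(&restored));
}
```

Returning the pointer instead of failing keeps `git checkout` working offline; the file simply stays a pointer until a later `oxen lfs pull`.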

**Push:** The `pre-push` hook (or `oxen lfs push`) creates a temporary workspace on the Oxen server, uploads the versioned blobs via `add_files` (which handles batching and multipart uploads), commits the workspace, and then cleans up.
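The push lifecycle can be sketched against a mock server to show the ordering of calls. The `WorkspaceApi` trait and its method names are hypothetical stand-ins for the Oxen workspace API, not liboxen functions, and deleting the workspace even when the upload or commit fails is an assumption about the cleanup policy:

```rust
/// Hypothetical stand-in for the Oxen workspace API.
trait WorkspaceApi {
    fn create_workspace(&mut self) -> Result<String, String>;
    fn add_files(&mut self, ws: &str, oids: &[String]) -> Result<(), String>;
    fn commit(&mut self, ws: &str, message: &str) -> Result<(), String>;
    fn delete_workspace(&mut self, ws: &str) -> Result<(), String>;
}

fn push_blobs(api: &mut dyn WorkspaceApi, oids: &[String]) -> Result<(), String> {
    if oids.is_empty() {
        return Ok(()); // nothing to upload
    }
    let ws = api.create_workspace()?;
    // Upload then commit; keep the outcome so cleanup runs either way.
    let result = api
        .add_files(&ws, oids)
        .and_then(|()| api.commit(&ws, "oxen lfs push"));
    let cleanup = api.delete_workspace(&ws);
    // Report the first failure: upload/commit errors win over cleanup errors.
    result.and(cleanup)
}

/// Mock server that records each call so the ordering is visible.
#[derive(Default)]
struct MockApi {
    calls: Vec<String>,
}

impl WorkspaceApi for MockApi {
    fn create_workspace(&mut self) -> Result<String, String> {
        self.calls.push("create".to_string());
        Ok("ws-1".to_string())
    }
    fn add_files(&mut self, ws: &str, oids: &[String]) -> Result<(), String> {
        self.calls.push(format!("add {ws} {}", oids.len()));
        Ok(())
    }
    fn commit(&mut self, ws: &str, message: &str) -> Result<(), String> {
        self.calls.push(format!("commit {ws} {message}"));
        Ok(())
    }
    fn delete_workspace(&mut self, ws: &str) -> Result<(), String> {
        self.calls.push(format!("delete {ws}"));
        Ok(())
    }
}

fn main() {
    let mut api = MockApi::default();
    push_blobs(&mut api, &["a1b2c3d4".to_string()]).unwrap();
    println!("{:?}", api.calls);
}
```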

**Pull:** `post-checkout`/`post-merge` hooks (or `oxen lfs pull`) scan for pointer files, restore content from local -> origin -> remote, then `git add` to refresh the index stat cache.
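The "scan for pointer files" step of pull can be sketched as a directory walk that keeps small files starting with the Oxen pointer header. This is a simplified, hypothetical `find_pointer_files`. It assumes `.git` and `.oxen` should be skipped, and it omits the `.gitattributes` pattern matching that `status.rs` is described as doing:

```rust
use std::fs;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

const POINTER_PREFIX: &[u8] = b"version https://oxen.ai/spec/v1\n";
const MAX_POINTER_SIZE: u64 = 200;

/// Recursively collect files that look like Oxen pointers.
fn find_pointer_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            // Never descend into Git's or Oxen's own metadata.
            let name = entry.file_name();
            if name != ".git" && name != ".oxen" {
                find_pointer_files(&path, out)?;
            }
        } else if file_type.is_file() && entry.metadata()?.len() <= MAX_POINTER_SIZE {
            // Only read the header bytes; real content is never this small
            // and never starts with the pointer header.
            let mut head = Vec::new();
            fs::File::open(&path)?
                .take(POINTER_PREFIX.len() as u64)
                .read_to_end(&mut head)?;
            if head == POINTER_PREFIX {
                out.push(path);
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut found = Vec::new();
    find_pointer_files(Path::new("."), &mut found)?;
    for path in &found {
        println!("pointer: {}", path.display());
    }
    Ok(())
}
```

Checking the size from metadata before opening the file keeps the scan cheap on trees full of large binaries that have already been restored.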

---

## All Files on This Branch

### Library (`oxen-rust/src/lib/src/lfs/`)

| File | Purpose |
|------|---------|
| `lfs.rs` | Module declaration (9 submodules) |
| `pointer.rs` | Pointer file encode/decode/validation (xxh3, 200-byte max) |
| `config.rs` | `.oxen/lfs.toml` load/save + `resolve_remote()` -> `RemoteRepository` |
| `gitattributes.rs` | `.gitattributes` track/untrack/list patterns |
| `install.rs` | Global `~/.gitconfig` filter driver install/uninstall |
| `hooks.rs` | `.git/hooks/` pre-push, post-checkout, post-merge (idempotent, preserves existing) |
| `filter.rs` | Clean filter (hash+store) and smudge filter (4-tier lookup with 30s remote timeout) |
| `filter_process.rs` | Git long-running filter protocol v2 (pkt-line, capability negotiation) |
| `status.rs` | Walk working tree, find pointers matching tracked patterns, check local availability |
| `sync.rs` | `push_to_remote` (workspace API), `pull_from_remote` (batch download), `fetch_all`, `git_add` |
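For readers unfamiliar with the framing used by `filter_process.rs`, here is a minimal sketch of pkt-line encoding and decoding as used by Git's long-running filter protocol. Each packet is a 4-digit hex length that counts its own 4 header bytes, followed by the payload, and `0000` is a flush packet. The helper names are hypothetical and are not the functions in `filter_process.rs`:

```rust
use std::io::{self, Read, Write};

const MAX_PKT_DATA: usize = 65516; // 65520-byte max packet minus the 4-byte header

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn write_packet(w: &mut impl Write, data: &[u8]) -> io::Result<()> {
    if data.len() > MAX_PKT_DATA {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    write!(w, "{:04x}", data.len() + 4)?; // length includes the header itself
    w.write_all(data)
}

/// Text packets (e.g. "version=2") are LF-terminated.
fn write_text(w: &mut impl Write, line: &str) -> io::Result<()> {
    write_packet(w, format!("{line}\n").as_bytes())
}

fn write_flush(w: &mut impl Write) -> io::Result<()> {
    w.write_all(b"0000")
}

/// Returns Some(payload) for a data packet, None for a flush packet.
fn read_packet(r: &mut impl Read) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let hex = std::str::from_utf8(&header).map_err(|_| invalid("non-ASCII pkt-line header"))?;
    let len = usize::from_str_radix(hex, 16).map_err(|_| invalid("non-hex pkt-line header"))?;
    match len {
        0 => Ok(None),
        1..=3 => Err(invalid("reserved pkt-line length")),
        _ => {
            let mut payload = vec![0u8; len - 4];
            r.read_exact(&mut payload)?;
            Ok(Some(payload))
        }
    }
}

fn main() -> io::Result<()> {
    // Client side of the filter-process handshake.
    let mut buf = Vec::new();
    write_text(&mut buf, "git-filter-client")?;
    write_text(&mut buf, "version=2")?;
    write_flush(&mut buf)?;
    println!("{}", String::from_utf8_lossy(&buf));

    let mut reader = &buf[..];
    while let Some(pkt) = read_packet(&mut reader)? {
        print!("{}", String::from_utf8_lossy(&pkt));
    }
    Ok(())
}
```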

### CLI (`oxen-rust/src/cli/src/cmd/lfs/`)

| Command | Purpose |
|---------|---------|
| `oxen lfs init [--remote URL]` | Initialize LFS in a git repo (creates .oxen/, hooks, .gitignore) |
| `oxen lfs install [--uninstall]` | Global filter driver in `~/.gitconfig` |
| `oxen lfs track <pattern>` | Add pattern to `.gitattributes` |
| `oxen lfs untrack <pattern>` | Remove pattern from `.gitattributes` |
| `oxen lfs push` | Upload versioned blobs to Oxen remote via workspace API |
| `oxen lfs pull [--local]` | Download + restore pointer files |
| `oxen lfs fetch-all` | Strict sync: errors if anything can't be resolved |
| `oxen lfs status` | Show tracked files + local/missing status |
| `oxen lfs clean` | Stdin->stdout clean filter for Git |
| `oxen lfs smudge` | Stdin->stdout smudge filter for Git |
| `oxen lfs filter-process` | Long-running filter process (pkt-line v2) |
| `oxen lfs env` | Print version, remote URL, versions dir, tracked patterns |
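`track` and `untrack` are described as idempotent edits to `.gitattributes`. Here is a minimal sketch that operates on the file's text and uses the attribute string from the architecture diagram. The `track`/`untrack` functions here are hypothetical and not the `gitattributes.rs` API:

```rust
const ATTRS: &str = "filter=oxen diff=oxen merge=oxen -text";

/// Append "<pattern> <ATTRS>" unless an identical line already exists.
fn track(gitattributes: &str, pattern: &str) -> String {
    let line = format!("{pattern} {ATTRS}");
    if gitattributes.lines().any(|l| l.trim() == line) {
        return gitattributes.to_string(); // already tracked: no change
    }
    let mut out = gitattributes.to_string();
    if !out.is_empty() && !out.ends_with('\n') {
        out.push('\n');
    }
    out.push_str(&line);
    out.push('\n');
    out
}

/// Drop only the Oxen line for this pattern; unrelated lines are preserved.
fn untrack(gitattributes: &str, pattern: &str) -> String {
    gitattributes
        .lines()
        .filter(|l| {
            let first = l.split_whitespace().next();
            !(first == Some(pattern) && l.contains("filter=oxen"))
        })
        .map(|l| format!("{l}\n"))
        .collect()
}

fn main() {
    let attrs = track("*.txt text\n", "*.bin");
    print!("{attrs}");
    print!("{}", untrack(&attrs, "*.bin"));
}
```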

### Modified Shared Code

| File | Change |
|------|--------|
| `api/client/versions.rs` | Added `download_versions_to_store()` -- generic batch download to any `VersionStore` (refactored existing download to delegate, zero behavior change) |
| `constants.rs` | Added `OXEN_HIDDEN_DIR` constant |
| `lib.rs` / `cmd.rs` / `main.rs` | Registered lfs module and subcommands |

---

## Tests

44 LFS tests pass. Clippy clean. Coverage includes:
- Pointer serialization/deserialization/validation
- Config save/load/defaults
- `.gitattributes` manipulation (track, untrack, list, idempotency)
- Hook installation (creation, idempotency, preservation, permissions, path quoting)
- Global filter install/uninstall
- Clean filter (stores content, returns pointer, idempotent)
- Smudge filter (restores content, passthrough non-pointer, fallback on missing, remote fallback on unreachable server)
- pkt-line protocol (text/binary roundtrips, key=value pairs)
- Status detection (finds pointers matching patterns)
- Push with no remote (silent success)
- Pull local-only (no network, restores local content)
- `git_add` returns Result (empty list, non-git dir)
89 changes: 89 additions & 0 deletions oxen-rust/docs/dev/OxenLfsGitLfsParity.md
@@ -0,0 +1,89 @@
# `oxen lfs` vs `git lfs` — Parity Roadmap

## Current State

The `oxen lfs` integration is feature-complete for core local and remote workflows: clean/smudge filters, long-running filter process, git hooks, CLI commands, local clone support, and remote push/pull via the Oxen workspace API.

---

## Remaining TODOs

### From the `TODO` File

1. **Auto-init on `git clone`** -- Detect `.gitattributes` with `filter=oxen` and auto-run `oxen lfs init`
2. **Fix gaps (oxen lfs push)** -- Vague; likely refers to edge cases

### Missing Commands

| Priority | Command | What It Does | Effort |
|----------|---------|-------------|--------|
| High | `lfs fetch` | Download objects without restoring (separate from `pull`) | Small |
| High | `lfs checkout` | Restore files from local cache only | Small (essentially `pull --local` as named command) |
| High | `lfs ls-files` | List all LFS-tracked files with their OIDs | Small (reuse `status::get_status`) |
| Medium | `lfs prune` | Delete unreferenced objects from `.oxen/versions/` | Medium (needs reachability analysis) |
| Medium | `lfs migrate import` | Rewrite history to convert large files to pointers | Large (needs `git filter-repo` integration) |
| Medium | `lfs migrate export` | Rewrite history to remove LFS, restore files inline | Large |
| Low | `lfs lock`/`unlock`/`locks` | File locking for binary assets | Large (needs server API) |
| Low | `lfs fsck` | Verify integrity of local objects | Small (hash each file, compare) |

### Missing Features

| Priority | Feature | Notes |
|----------|---------|-------|
| **High** | Skip re-uploading already-pushed files | Push doesn't check if remote already has a hash before uploading |
| **High** | Progress indicators | No progress bars during push/pull of large files |
| Medium | Per-branch/per-ref fetch | `fetch-all` downloads everything; no way to fetch for a specific ref |
| Medium | SSH transfer adapter | Only HTTP supported |
| Low | Custom transfer adapters | Extensibility for non-HTTP transports |
| Low | Custom merge driver | `merge=oxen` is declared in `.gitattributes` but no driver is implemented |
| Low | Deduplication / storage optimization | No chunking or dedup beyond content-addressing |

---

## Intentional Divergences (Not Gaps)

These are architectural decisions, not missing features:

- **Hash**: xxHash3-128 vs SHA-256 -- speed over cryptographic guarantees
- **Server protocol**: Oxen workspace API vs git-lfs Batch API -- leverages existing Oxen infrastructure
- **Config**: `.oxen/lfs.toml` vs git config -- clean separation from git config namespace
- **Pointer namespace**: `oxen.ai/spec/v1` vs `git-lfs.github.com/spec/v1`

---

## Full `git lfs` Command Coverage

| `git lfs` Command | `oxen lfs` Equivalent | Status |
|-------------------|-----------------------|--------|
| `install` | `oxen lfs install` | Done |
| `uninstall` | `oxen lfs install --uninstall` | Done (flag, not separate command) |
| `track` | `oxen lfs track` | Done |
| `untrack` | `oxen lfs untrack` | Done |
| `push` | `oxen lfs push` | Done |
| `pull` | `oxen lfs pull` | Done |
| `fetch` | -- | Not implemented (separate from pull) |
| `checkout` | `oxen lfs pull --local` | Done (as flag, not separate command) |
| `status` | `oxen lfs status` | Done |
| `ls-files` | -- | Not implemented |
| `env` | `oxen lfs env` | Done |
| `clean` | `oxen lfs clean` | Done |
| `smudge` | `oxen lfs smudge` | Done |
| `filter-process` | `oxen lfs filter-process` | Done |
| `lock` / `unlock` | -- | Not implemented |
| `locks` | -- | Not implemented |
| `prune` | -- | Not implemented |
| `migrate import` | -- | Not implemented |
| `migrate export` | -- | Not implemented |
| `fsck` | -- | Not implemented |
| `clone` | -- | Not applicable (use `git clone` + `oxen lfs init`) |
| `dedup` | -- | Not implemented |
| `merge-driver` | -- | Not implemented |
| `logs` | -- | Not implemented |
| `pointer` | -- | Not implemented as CLI (library only) |

### Additional `oxen lfs` Commands (No `git lfs` Equivalent)

| Command | Purpose |
|---------|---------|
| `oxen lfs init [--remote URL]` | One-step repo setup (creates .oxen/, hooks, .gitignore, optional remote) |
| `oxen lfs fetch-all` | Strict sync: errors if any pointer can't be resolved (combines fetch + checkout + strict validation) |
3 changes: 3 additions & 0 deletions oxen-rust/src/cli/src/cmd.rs
@@ -51,6 +51,9 @@ pub use info::InfoCmd;
pub mod init;
pub use init::InitCmd;

pub mod lfs;
pub use lfs::LfsCmd;

pub mod load;
pub use load::LoadCmd;

107 changes: 107 additions & 0 deletions oxen-rust/src/cli/src/cmd/lfs.rs
@@ -0,0 +1,107 @@
pub mod clean;
pub use clean::LfsCleanCmd;

pub mod env;
pub use env::LfsEnvCmd;

pub mod fetch_all;
pub use fetch_all::LfsFetchAllCmd;

pub mod filter_process;
pub use filter_process::LfsFilterProcessCmd;

pub mod init;
pub use init::LfsInitCmd;

pub mod install;
pub use install::LfsInstallCmd;

pub mod pull;
pub use pull::LfsPullCmd;

pub mod push;
pub use push::LfsPushCmd;

pub mod smudge;
pub use smudge::LfsSmudgeCmd;

pub mod status;
pub use status::LfsStatusCmd;

pub mod track;
pub use track::LfsTrackCmd;

pub mod untrack;
pub use untrack::LfsUntrackCmd;

use async_trait::async_trait;
use clap::Command;

use liboxen::error::OxenError;
use std::collections::HashMap;

use crate::cmd::RunCmd;

pub const NAME: &str = "lfs";
pub struct LfsCmd;

#[async_trait]
impl RunCmd for LfsCmd {
    fn name(&self) -> &str {
        NAME
    }

    fn args(&self) -> Command {
        let mut command = Command::new(NAME)
            .about("Oxen large file storage (Git LFS replacement)")
            .subcommand_required(true)
            .arg_required_else_help(true);

        // Register subcommands in sorted order so `oxen lfs --help` is stable;
        // HashMap iteration order is unspecified.
        let sub_commands = Self::get_subcommands();
        let mut names: Vec<&String> = sub_commands.keys().collect();
        names.sort();
        for name in names {
            command = command.subcommand(sub_commands[name].args());
        }
        command
    }

    async fn run(&self, args: &clap::ArgMatches) -> Result<(), OxenError> {
        let sub_commands = Self::get_subcommands();
        if let Some((name, sub_matches)) = args.subcommand() {
            let Some(cmd) = sub_commands.get(name) else {
                eprintln!("Unknown lfs subcommand {name}");
                return Err(OxenError::basic_str(format!(
                    "Unknown lfs subcommand {name}"
                )));
            };

            // Drive the subcommand's future to completion on the current runtime.
            // `block_in_place` panics on a current-thread Tokio runtime, so this
            // relies on the CLI running on the multi-threaded runtime.
            tokio::task::block_in_place(|| {
                tokio::runtime::Handle::current().block_on(cmd.run(sub_matches))
            })?;
        }
        Ok(())
    }
}

impl LfsCmd {
    fn get_subcommands() -> HashMap<String, Box<dyn RunCmd>> {
        let commands: Vec<Box<dyn RunCmd>> = vec![
            Box::new(LfsCleanCmd),
            Box::new(LfsEnvCmd),
            Box::new(LfsFetchAllCmd),
            Box::new(LfsFilterProcessCmd),
            Box::new(LfsInitCmd),
            Box::new(LfsInstallCmd),
            Box::new(LfsPullCmd),
            Box::new(LfsPushCmd),
            Box::new(LfsSmudgeCmd),
            Box::new(LfsStatusCmd),
            Box::new(LfsTrackCmd),
            Box::new(LfsUntrackCmd),
        ];
        let mut runners: HashMap<String, Box<dyn RunCmd>> = HashMap::new();
        for cmd in commands {
            runners.insert(cmd.name().to_string(), cmd);
        }
        runners
    }
}
53 changes: 53 additions & 0 deletions oxen-rust/src/cli/src/cmd/lfs/clean.rs
@@ -0,0 +1,53 @@
use async_trait::async_trait;
use clap::{Arg, Command};

use liboxen::constants::OXEN_HIDDEN_DIR;
use liboxen::error::OxenError;
use liboxen::lfs;

use crate::cmd::RunCmd;

pub const NAME: &str = "clean";
pub struct LfsCleanCmd;

#[async_trait]
impl RunCmd for LfsCleanCmd {
    fn name(&self) -> &str {
        NAME
    }

    fn args(&self) -> Command {
        Command::new(NAME)
            .about("Clean filter for a single file (invoked by Git)")
            .arg(Arg::new("separator").long("").hide(true))
            .arg(
                Arg::new("file")
                    .help("Path to the file being cleaned")
                    .required(false),
            )
    }

    async fn run(&self, _args: &clap::ArgMatches) -> Result<(), OxenError> {
        let repo_root = std::env::current_dir()?;
        let versions_dir = repo_root.join(OXEN_HIDDEN_DIR).join("versions");

        // Read content from stdin.
        let content = {
            use std::io::Read;
            let mut buf = Vec::new();
            std::io::stdin().read_to_end(&mut buf)?;
            buf
        };

        let result = lfs::filter::clean(&versions_dir, &content).await?;

        // Write result to stdout.
        {
            use std::io::Write;
            std::io::stdout().write_all(&result)?;
            std::io::stdout().flush()?;
        }

        Ok(())
    }
}