Merged
12 changes: 12 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -33,3 +33,4 @@ md5 = "0.8.0"
jiff = "0.2.18"
anyhow = "1.0.100"
whoami = "2"
uuid = "1.20.0"
147 changes: 8 additions & 139 deletions README.md
@@ -2,145 +2,14 @@

Rewrite of `dvs`, the data-version-control system made by A2-AI.

DVS (Data Version System) is a tool for versioning large or sensitive files under Git without tracking the file content directly. It uses content-addressable storage with blake3 hashing.
DVS (Data Version System) is a tool for versioning large or sensitive files under Git without tracking the file content directly.

## Installation

The CLI binary is named `dvs`. Install from source:
## TODOs

```bash
# Install with locked dependencies (recommended)
cargo install --path dvs-cli --locked

# Force reinstall if already installed
cargo install --path dvs-cli --locked --force
```

Or build directly:

```bash
cargo build -p dvs-cli --release
# Binary will be at target/release/dvs
```

## Usage

```bash
# Initialize DVS in a repository
dvs init <storage_dir>

# Add files to DVS tracking
dvs add <files...>

# Restore files from storage
dvs get <files...>

# Check file status
dvs status [files...]

# Push objects to remote
dvs push [--remote URL]

# Pull objects from remote
dvs pull [--remote URL]

# Materialize files from manifest
dvs materialize [files...]

# View reflog history
dvs log [-n N]

# Rollback to previous state
dvs rollback <target>
```

### Batch Operations

Commands that accept file arguments also support `--batch` to read paths from stdin:

```bash
# Add files listed in a file
cat files.txt | dvs add --batch

# Process output from find
find . -name "*.csv" | dvs add --batch

# Batch format supports comments and blank lines
echo "data.csv
# This is a comment
results.json" | dvs add --batch
```

### Output Formats

All commands support `--format json` for machine-readable output:

```bash
dvs status --format json
dvs add data.csv --format json
```

Use `--quiet` to suppress non-error output, or `--output null` to discard output entirely.

## Development

### Building

```bash
# Build workspace (dvs-core, dvs-cli)
just build

# Build R package
just rpkg-build

# Build everything
just build-all
```

### Testing

```bash
# Run workspace tests
just test

# Run R package Rust tests
just rpkg-test

# Run all tests
just test-all
```

### R Package Maintenance

The R package (`dvsR`) uses vendored miniextendr crates for CRAN compliance. When developing with a local miniextendr checkout, use these commands to keep vendored sources up to date:

```bash
# Automatic staleness detection (recommended)
# Re-vendors only if miniextendr sources have changed
just rpkg-vendor-detect

# Force re-vendor (always updates vendored crates)
just rpkg-vendor-force

# Custom miniextendr path
just rpkg-vendor-with-staleness /path/to/miniextendr

# Configure R package (generates Cargo.toml, Makevars, etc.)
just rpkg-configure

# Install R package
just rpkg-install
```

### Code Quality

```bash
# Format code
just fmt

# Run clippy
just clippy

# Run all CI checks
just ci
```
- Azure backend
- GC?
- dvs remove?
- integrity check? would need to read the file again after saving it
- compression?
- migrate from dvs1
1 change: 1 addition & 0 deletions dvs/Cargo.toml
@@ -21,6 +21,7 @@ md5.workspace = true
jiff.workspace = true
anyhow.workspace = true
whoami.workspace = true
uuid = { version = "1.20.0", features = ["v4"] }

[target.'cfg(unix)'.dependencies]
nix = { version = "0.31", features = ["user", "fs"] }
10 changes: 5 additions & 5 deletions dvs/README.md
@@ -1,9 +1,9 @@
```

rm -rf ../.dvs ../dvs.toml ../.storage && cargo run --features=cli -- init /home/vincent/Code/a2-ai/dvsexperimental/dvs/.storage
cargo run --features=cli -- add README.md
cargo run --features=cli -- status
rm -rf .dvs dvs.toml .storage && cargo run -- init /home/vincent/Code/a2-ai/dvs2/.storage
cargo run -- add README.md
cargo run -- status
rm README.md
cargo run --features=cli -- get README.md
cargo run --features=cli -- status
cargo run -- get README.md
cargo run -- status
```
44 changes: 44 additions & 0 deletions dvs/src/audit.rs
@@ -0,0 +1,44 @@
use std::path::PathBuf;

use crate::Hashes;
use anyhow::Result;
use jiff::Timestamp;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuditFile {
    pub path: PathBuf,
    pub hashes: Hashes,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuditEntry {
    pub operation_id: String,
    pub timestamp: i64,
    pub user: String,
    pub file: AuditFile,
}

impl AuditEntry {
    pub fn new(operation_id: Uuid, file: AuditFile) -> Self {
        let timestamp = Timestamp::now().as_second();
        let user = whoami::username().unwrap_or_else(|_| "unknown".to_string());

        Self {
            operation_id: operation_id.to_string(),
            timestamp,
            user,
            file,
        }
    }
}

pub fn parse_audit_log(bytes: &[u8]) -> Result<Vec<AuditEntry>> {
    let content = std::str::from_utf8(bytes)?;
    content
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| Ok(serde_json::from_str(line)?))
        .collect()
}
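
Review note: `parse_audit_log` reads the log as one record per line, skips blank lines, and stops at the first record that fails to parse. The sketch below shows the same pattern with only the standard library, so it can be tried without `serde_json`, `anyhow`, or the `Hashes` type. Everything in it is an assumption made for illustration: `SimpleEntry`, `ParseError`, `parse_line`, `parse_simple_log`, and the tab-separated record format are hypothetical and are not part of this PR.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for `AuditEntry` (no `Hashes`, no serde).
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleEntry {
    pub operation_id: String,
    pub timestamp: i64,
    pub user: String,
    pub path: String,
}

/// Hypothetical error type that records which line failed and why.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line_no: usize,
    pub reason: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "line {}: {}", self.line_no, self.reason)
    }
}

impl std::error::Error for ParseError {}

/// Parse one tab-separated record: operation_id, timestamp, user, path.
/// (The real code calls `serde_json::from_str` here instead.)
fn parse_line(line_no: usize, line: &str) -> Result<SimpleEntry, ParseError> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 4 {
        return Err(ParseError {
            line_no,
            reason: format!("expected 4 fields, got {}", fields.len()),
        });
    }
    let timestamp = fields[1].parse::<i64>().map_err(|e| ParseError {
        line_no,
        reason: e.to_string(),
    })?;
    Ok(SimpleEntry {
        operation_id: fields[0].to_string(),
        timestamp,
        user: fields[2].to_string(),
        path: fields[3].to_string(),
    })
}

/// Same shape as `parse_audit_log`: validate UTF-8, skip blank lines,
/// and collect into a `Result` so the first bad line aborts the parse.
pub fn parse_simple_log(bytes: &[u8]) -> Result<Vec<SimpleEntry>, Box<dyn std::error::Error>> {
    let content = std::str::from_utf8(bytes)?;
    let entries = content
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(i, line)| parse_line(i + 1, line))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(entries)
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` is what gives the all-or-nothing behaviour: one malformed line makes the whole call return an error rather than a partial list. Whether that is the desired behaviour for an audit log that may be appended to concurrently is a design question for the author; the sketch only mirrors what the diff does.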