Feature Request: Make zipping optional

## Summary

BABS currently always zips its outputs. For my workflows, raw BIDS-derivative output would be preferable: it integrates more cleanly into [BIDS-study layouts](https://bids-specification.readthedocs.io/en/stable/common-principles.html#study-dataset) and keeps provenance easier to follow.

Zipping works fine for individual use, where some manual intervention is acceptable. For larger automated workflows spanning many datasets, it would be better to write the outputs directly to their final location.

## BIDS-study output structure goal

See [BIDS common principles](https://bids-specification.readthedocs.io/en/stable/common-principles.html#study-dataset) and [PR #1741](https://github.com/bids-standard/bids-specification/pull/1741) for background on study-level dataset organization.

```
my-study/
  sourcedata/raw/               # BIDS raw dataset (subdataset)
  code/
    containers/                 # containers should be considered part of code
  derivatives/
    mriqc/                      # MRIQC outputs (subdataset)
      sourcedata/raw/           # reference to input (YODA)
      code/                     # processing scripts (provenance)
      sub-01/
      sub-02/
      dataset_description.json  # one shared dataset_description.json instead of one per subject
    fmriprep/                   # fMRIPrep outputs (subdataset)
      sourcedata/raw/
      code/
      sub-01/
      sub-02/
      dataset_description.json
```

With raw outputs (no zip), after `babs merge` you could clone the `output_ria` store directly into the BIDS-study dataset:

```bash
# ria+file:// URLs take an absolute path; /path/to is a placeholder
datalad clone ria+file:///path/to/output_ria#~data derivatives/mriqc
```

This gets very close to the desired structure: the clone is the final derivative, ready to use, with full provenance intact.

(Note: BABS's default input path is `inputs/data`, but this is overridable via `path_in_babs` config to use `sourcedata/raw` for BIDS-study compliance.)

## Problems with current zip-only approach

- **Provenance obscured:** Unzipping with `datalad run` records "unzip" as provenance rather than the original BIDS app command. The true provenance isn't lost, but it's obscured.
- **Duplicate large files:** After unzipping, git-annex holds both the zips and the raw files. The zips can be dropped, but that is an extra step that is easily forgotten. This is a minor issue, but it can waste storage.
- **Unzip conflicts:** Each per-subject zip contains shared files like `dataset_description.json` and `.bidsignore`. When several subjects are unzipped into the same tree, these files collide. They're probably identical (same container version = same output), but with raw outputs stored in git (not zips), the merge would catch any surprises.
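
To illustrate the "probably identical" assumption in the last bullet, here is a minimal sketch of how one might check it by hand on already-extracted copies. `count_distinct_contents` is a hypothetical helper written for this issue, not part of BABS or DataLad, and it relies only on the standard `cksum`, `awk`, `sort`, `wc`, and `tr` utilities.

```bash
# count_distinct_contents is a hypothetical helper (not a BABS or DataLad command).
# It prints how many distinct file contents exist among the files given as arguments.
count_distinct_contents() {
    # cksum prints "<crc> <size> <name>" per file; keep crc and size, drop the name,
    # then count the unique (crc, size) pairs.
    cksum "$@" | awk '{print $1, $2}' | sort -u | wc -l | tr -d ' '
}
```

Called on the extracted copies of one shared file, for example `count_distinct_contents extracted/sub-*/dataset_description.json` (the `extracted/` path is an assumption), a result of `1` means all copies match; anything larger means some subjects produced a different file.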

## For raw outputs to work

For per-subject branches to merge cleanly, JSON and TSV files should be stored in git (not annex), either via the `text2git` configuration or via `.gitattributes` rules. This allows git's octopus merge to handle identical files and to surface conflicts if they unexpectedly differ.
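
As a rough sketch of the `.gitattributes` route, the snippet below appends git-annex `annex.largefiles` rules so that matching files are committed to git instead of being annexed. The exact set of patterns is my assumption about which files matter here, and it is meant to be run from the root of the output dataset before any outputs are saved.

```bash
# Sketch: keep small text metadata in git rather than in git-annex.
# The pattern list is an assumption; adjust it to the BIDS app's actual outputs.
cat >> .gitattributes <<'EOF'
*.json annex.largefiles=nothing
*.tsv annex.largefiles=nothing
.bidsignore annex.largefiles=nothing
EOF
# Afterwards the change would need to be recorded, e.g. with `datalad save`.
```

Compared with `text2git`, which applies to all text files, explicit patterns keep the rule narrow, so large text outputs would still go to the annex.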

## Suggested config option

Ideally the user could choose:
- Raw outputs committed (no zip)
- Zipped outputs (current behavior)

## Implementation approaches

**Least invasive:** Modify the existing script-generation template so that the zip step is optional. The generated `*_zip.sh` script already handles both execution and zipping, so it could conditionally skip the zip.
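
A minimal sketch of what such a conditional could look like inside a generated script. `ZIP_OUTPUTS` and `plan_output_step` are hypothetical names invented for this issue, not existing BABS options, and the function only prints the step it would take rather than running it; the `zip -r` command is a placeholder for whatever archiver the real template uses.

```bash
# ZIP_OUTPUTS and plan_output_step are hypothetical names for illustration;
# they are not existing BABS options or functions.
plan_output_step() {
    local subid="$1" app="$2"
    if [ "${ZIP_OUTPUTS:-true}" = "true" ]; then
        # Current behavior: archive the subject's outputs (placeholder command).
        echo "zip -r ${subid}_${app}.zip ${app}"
    else
        # Proposed behavior: leave the raw BIDS-derivative tree in place to be saved.
        echo "keep ${app}/${subid}"
    fi
}

plan_output_step sub-01 mriqc                    # default: zip, as today
ZIP_OUTPUTS=false plan_output_step sub-01 mriqc  # opt out: raw outputs
```

Defaulting the toggle to `true` would leave existing projects unchanged, so only users who opt out get raw outputs.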

**With containers-run (see https://github.com/PennLINC/babs/issues/328):** Separating `containers-run` from zipping enables two explicit commits: one for the BIDS app and one for the zip. This gives cleaner provenance and makes the zip step trivially optional, but it is a larger change.

