Binary archive#416

Merged
dpastoor merged 4 commits into main from binary_archive
Feb 25, 2026

@weswc
Member

weswc commented Feb 17, 2026

To improve performance, rv will respect binary archive endpoints.

In standard CRAN-like repositories, archived packages are only kept as source in the following structure:

src/contrib/
├─ Archive/
│  ├─ pkg/
│  │  ├─ pkg_0.1.0.tar.gz
├─ pkg_0.2.0.tar.gz
bin/macosx/big-sur-<arch>/contrib/<r_version>/
├─ pkg_0.2.0.tgz

This PR extends this concept for repositories that wish to increase binary availability, by adding the Archive structure to each binary endpoint:

src/contrib/
├─ Archive/
│  ├─ pkg/
│  │  ├─ pkg_0.1.0.tar.gz
├─ pkg_0.2.0.tar.gz
bin/macosx/big-sur-<arch>/contrib/<r_version>/
├─ Archive/
│  ├─ pkg/
│  │  ├─ pkg_0.1.0.tgz
├─ pkg_0.2.0.tgz
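
As a rough illustration of the layout above, the archived location of a package can be derived from the contrib directory of any endpoint, source or binary. The sketch below is not rv's implementation: `Location` and `package_url` are hypothetical names, and the caller is assumed to supply the contrib URL and the file extension.

```rust
/// Where a package file lives inside a CRAN-like contrib directory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Location {
    /// Current version, directly under the contrib directory.
    Current,
    /// Older version, under `Archive/<pkg>/`.
    Archive,
}

/// Build the URL of a package file (hypothetical helper, for illustration).
///
/// `contrib_url` is the contrib directory of an endpoint, for example
/// `<repo>/src/contrib` or `<repo>/bin/macosx/big-sur-<arch>/contrib/<r_version>`.
/// `ext` is the file extension: `tar.gz` for source, `tgz` for macOS binaries.
pub fn package_url(
    contrib_url: &str,
    name: &str,
    version: &str,
    ext: &str,
    location: Location,
) -> String {
    // Tolerate a trailing slash on the contrib URL.
    let base = contrib_url.trim_end_matches('/');
    let file = format!("{name}_{version}.{ext}");
    match location {
        Location::Current => format!("{base}/{file}"),
        Location::Archive => format!("{base}/Archive/{name}/{file}"),
    }
}
```

The same function covers both trees: only the contrib URL and the extension differ between the source endpoint and a binary endpoint.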

@weswc
Member Author

weswc commented Feb 17, 2026

Test run on Linux, where the lockfile was modified to point to dplyr 1.1.3 while the available version is 1.1.4. PPM exposes an Archive at the Linux binary endpoint so that it behaves as a complete CRAN-like repo, but that Archive serves sources.

NOTE: The package kind is determined at resolution by looking at the packages available in the binary repository db. Since it is listed, it gets labeled as source, even though it is truly installed as binary.

cargo run --all-features -- sync -c example_projects/simple/rproject.toml -vvv
...Logging related to installing dependencies...

[2026-02-17T16:16:17Z DEBUG rv::sync::handler] Installing dplyr (source) on worker 4
[2026-02-17T16:16:17Z DEBUG rv::sync::sources::repositories] Package dplyr (1.1.3) not found in cache, trying to download it.
[2026-02-17T16:16:17Z WARN  rv::sync::sources::repositories] Failed to download source from https://packagemanager.posit.co/cran/2025-05-12/src/contrib/dplyr_1.1.3.tar.gz: Failed to download package: Failed to download file from `https://packagemanager.posit.co/cran/2025-05-12/src/contrib/dplyr_1.1.3.tar.gz`. Trying binary archive
[2026-02-17T16:16:17Z DEBUG rv::http] Downloaded from https://packagemanager.posit.co/cran/__linux__/noble/2025-05-12/src/contrib/Archive/dplyr/dplyr_1.1.3.tar.gz?r_version=4.5&arch=x86_64 in 345ms
[2026-02-17T16:16:17Z DEBUG rv::fs] Using NFS optimization for untarring archive
[2026-02-17T16:16:17Z DEBUG rv::fs] 452 files found, using parallel copy with a default of 4 threads
[2026-02-17T16:16:18Z DEBUG rv::http] Successfully extracted archive to /data/user-homes/wes/projects/rv/cache/rv/1501a23f89/4.5/x86_64/noble/dplyr/1.1.3 (in sub folder: Some("/data/user-homes/wes/projects/rv/cache/rv/1501a23f89/4.5/x86_64/noble/dplyr/1.1.3/dplyr")) in 399ms
[2026-02-17T16:16:18Z DEBUG rv::sync::sources::repositories] dplyr was expected as binary, found to be source
[2026-02-17T16:16:18Z DEBUG rv::sync::sources::repositories] Compiling package from /data/user-homes/wes/projects/rv/cache/rv/1501a23f89/src/dplyr/1.1.3/dplyr
[2026-02-17T16:16:18Z DEBUG rv::sync::link] Link mode Copy forced
[2026-02-17T16:16:18Z DEBUG rv::sync::link] Copying package from "/data/user-homes/wes/projects/rv/cache/rv/1501a23f89/src/dplyr/1.1.3/dplyr" to "/tmp/.tmpvyMvqQ".
[2026-02-17T16:16:18Z DEBUG rv::r_cmd] Compiling /data/user-homes/wes/projects/rv/cache/rv/1501a23f89/src/dplyr/1.1.3/dplyr with env vars: R_LIBS_SITE=/data/user-homes/wes/projects/rv/example_projects/simple/rv/library/4.5/x86_64/noble/__rv__staging:/data/user-homes/wes/projects/rv/example_projects/simple/rv/library/4.5/x86_64/noble R_LIBS_USER=/data/user-homes/wes/projects/rv/example_projects/simple/rv/library/4.5/x86_64/noble/__rv__staging:/data/user-homes/wes/projects/rv/example_projects/simple/rv/library/4.5/x86_64/noble _R_SHLIB_STRIP_=true
[2026-02-17T16:16:26Z DEBUG rv::sync::link] Symlinking package from "/data/user-homes/wes/projects/rv/cache/rv/1501a23f89/4.5/x86_64/noble/dplyr/1.1.3" to "example_projects/simple/rv/library/4.5/x86_64/noble/__rv__staging".
[2026-02-17T16:16:26Z DEBUG rv::sync::handler] Completed installing dplyr (16/16)
[2026-02-17T16:16:26Z INFO  rv::cli::sync] Synced dependencies in 10556ms
Downloaded (16):
  + R6           2.6.1  binary  https://packagemanager.posit.co/cran/2025-05-12     132ms
  + cli          3.6.5  binary  https://packagemanager.posit.co/cran/2025-05-12     200ms
  + dplyr        1.1.3  source  https://packagemanager.posit.co/cran/2025-05-12      9.3s
  + fansi        1.0.6  binary  https://packagemanager.posit.co/cran/2025-05-12     166ms
  + generics     0.1.4  binary  https://packagemanager.posit.co/cran/2025-05-12     130ms
  + glue         1.8.0  binary  https://packagemanager.posit.co/cran/2025-05-12     193ms
  + lifecycle    1.0.4  binary  https://packagemanager.posit.co/cran/2025-05-12     169ms
  + magrittr     2.0.3  binary  https://packagemanager.posit.co/cran/2025-05-12     173ms
  + pillar      1.10.2  binary  https://packagemanager.posit.co/cran/2025-05-12     200ms
  + pkgconfig    2.0.3  binary  https://packagemanager.posit.co/cran/2025-05-12     162ms
  + rlang        1.1.6  binary  https://packagemanager.posit.co/cran/2025-05-12     295ms
  + tibble       3.2.1  binary  https://packagemanager.posit.co/cran/2025-05-12     143ms
  + tidyselect   1.2.1  binary  https://packagemanager.posit.co/cran/2025-05-12     142ms
  + utf8         1.2.5  binary  https://packagemanager.posit.co/cran/2025-05-12     180ms
  + vctrs        0.6.5  binary  https://packagemanager.posit.co/cran/2025-05-12     184ms
  + withr        3.0.2  binary  https://packagemanager.posit.co/cran/2025-05-12     174ms

sync completed in 10.6s (16 installed, 0 removed)

@weswc
Member Author

weswc commented Feb 17, 2026

Similarly on Mac, we see it trying to fetch Archive/dplyr/dplyr_1.1.4.tgz from the Mac binary endpoint before falling back to the source archive:

[2026-02-17T17:57:46Z DEBUG rv::sync::sources::repositories] Package dplyr (1.1.4) not found in cache, trying to download it.
[2026-02-17T17:57:46Z WARN  rv::sync::sources::repositories] Failed to download source from https://packagemanager.posit.co/cran/latest/src/contrib/dplyr_1.1.4.tar.gz: Failed to download package: Failed to download file from `https://packagemanager.posit.co/cran/latest/src/contrib/dplyr_1.1.4.tar.gz`. Trying binary archive
[2026-02-17T17:57:47Z WARN  rv::sync::sources::repositories] Failed to download binary archive from https://packagemanager.posit.co/cran/latest/bin/macosx/big-sur-arm64/contrib/4.5/Archive/dplyr/dplyr_1.1.4.tgz: Failed to download package: Failed to download file from `https://packagemanager.posit.co/cran/latest/bin/macosx/big-sur-arm64/contrib/4.5/Archive/dplyr/dplyr_1.1.4.tgz`. Trying archive
[2026-02-17T17:57:47Z DEBUG rv::http] Downloaded from https://packagemanager.posit.co/cran/latest/src/contrib/Archive/dplyr/dplyr_1.1.4.tar.gz in 348ms
[2026-02-17T17:57:47Z DEBUG rv::http] Successfully extracted archive to /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4 (in sub folder: Some("/Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr")) in 135ms
[2026-02-17T17:57:47Z DEBUG rv::sync::sources::repositories] Compiling package from /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr
[2026-02-17T17:57:47Z DEBUG rv::sync::link] Link mode Copy forced
[2026-02-17T17:57:47Z DEBUG rv::sync::link] Copying package from "/Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr" to "/var/folders/n_/p8p3xyjn17j46vzr511810900000gn/T/.tmpTFkqfI".
[2026-02-17T17:57:47Z DEBUG rv::r_cmd] Compiling /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr with env vars: R_LIBS_SITE=/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64/__rv__staging:/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64 R_LIBS_USER=/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64/__rv__staging:/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64 _R_SHLIB_STRIP_=true
[2026-02-17T17:57:52Z DEBUG rv::sync::link] Cloning package from "/Users/wescummings/projects/rv/cache/rv/f4780dda90/4.5/arm64/dplyr/1.1.4" to "example_projects/simple/rv/library/4.5/arm64/__rv__staging".
[2026-02-17T17:57:52Z DEBUG rv::sync::handler] Completed installing dplyr (15/15)
[2026-02-17T17:57:52Z INFO  rv::cli::sync] Synced dependencies in 7586ms
Downloaded (15):
  + R6           2.6.1  binary  https://packagemanager.posit.co/cran/latest     340ms
  + cli          3.6.5  binary  https://packagemanager.posit.co/cran/latest     526ms
  + dplyr        1.1.4  source  https://packagemanager.posit.co/cran/latest      5.5s
  + generics     0.1.4  binary  https://packagemanager.posit.co/cran/latest     344ms
  + glue         1.8.0  binary  https://packagemanager.posit.co/cran/latest     400ms
  + lifecycle    1.0.5  binary  https://packagemanager.posit.co/cran/latest     306ms
  + magrittr     2.0.4  binary  https://packagemanager.posit.co/cran/latest     331ms
  + pillar      1.11.1  binary  https://packagemanager.posit.co/cran/latest     405ms
  + pkgconfig    2.0.3  binary  https://packagemanager.posit.co/cran/latest     314ms
  + rlang        1.1.7  binary  https://packagemanager.posit.co/cran/latest     533ms
  + tibble       3.3.1  binary  https://packagemanager.posit.co/cran/latest     401ms
  + tidyselect   1.2.1  binary  https://packagemanager.posit.co/cran/latest     342ms
  + utf8         1.2.6  binary  https://packagemanager.posit.co/cran/latest     388ms
  + vctrs        0.7.1  binary  https://packagemanager.posit.co/cran/latest     425ms
  + withr        3.0.2  binary  https://packagemanager.posit.co/cran/latest     396ms

sync completed in 7.6s (15 installed, 0 removed)

@weswc weswc requested a review from dpastoor February 17, 2026 18:01
@Keats
Collaborator

Keats commented Feb 18, 2026

NOTE: The package kind is determined at resolution by looking at the packages available in the binary repository db. Since it is listed, it gets labeled as source, even though it is truly installed as binary.

Can we fix it?

@weswc
Copy link
Member Author

weswc commented Feb 18, 2026

There are a couple of reasons I would be fine leaving it:

  • This is unique to PRISM. No other common repos use binary archives, but we are trying to push our ecosystem ahead.
  • PPM stores an archive under the linux bin path, but it only serves source. This is closely related to why, when we download "binaries", we have to verify that they really are binaries, especially on linux.

To me, this case is small enough that having the discrepancy would be fine. But if we want to take the jump and start reporting the true source type for all packages on a sync, I can include that in the PR (i.e. we assume local and URL are source, and return them as source, even if we do find a binary at sync time).
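
For context on verifying that a downloaded "binary" really is one: R writes a `Built:` field into a package's DESCRIPTION when the package is installed or built as a binary, while a source tarball does not carry it. A minimal sketch of such a check follows. `has_built_field` is a hypothetical helper, and this is not necessarily how rv performs its verification.

```rust
/// Returns true if the DESCRIPTION contents contain a `Built:` field,
/// which R adds to binary (installed) packages but not to source tarballs.
/// Hypothetical helper for illustration, not rv's actual check.
pub fn has_built_field(description: &str) -> bool {
    // DESCRIPTION is a DCF file: field names start at column 0,
    // continuation lines start with whitespace.
    description.lines().any(|line| line.starts_with("Built:"))
}
```

A check like this would let the sync report label a package by what was actually unpacked rather than by what the repository db listed.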

}
}

// 3. Download binary from archive
Collaborator

should we try binary from archive before source?

@dpastoor
Member

[2026-02-17T16:16:17Z DEBUG rv::sync::sources::repositories] Package dplyr (1.1.3) not found in cache, trying to download it.
[2026-02-17T16:16:17Z WARN rv::sync::sources::repositories] Failed to download source from https://packagemanager.posit.co/cran/2025-05-12/src/contrib/dplyr_1.1.3.tar.gz: Failed to download package: Failed to download file from https://packagemanager.posit.co/cran/2025-05-12/src/contrib/dplyr_1.1.3.tar.gz. Trying binary archive
[2026-02-17T16:16:17Z DEBUG rv::http] Downloaded from https://packagemanager.posit.co/cran/__linux__/noble/2025-05-12/src/contrib/Archive/dplyr/dplyr_1.1.3.tar.gz?r_version=4.5&arch=x86_64 in 345ms
[2026-02-17T16:16:17Z DEBUG rv::fs] Using NFS optimization for untarring archive

I'm confused by this ordering: it attempts the src first, then "falls back" to the binary archive.

E.g. I'd expect the sequence of lookups to be binary --> binary archive --> src --> src archive.
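
The lookup sequence suggested in this comment can be sketched as an ordered list of candidates that is walked until one succeeds. This is only an illustration of the proposed order, not the code under review: `Candidate`, `PROPOSED_ORDER`, and `first_available` are hypothetical names, and `fetch` stands in for whatever download call the real code makes.

```rust
/// One place a package version might be found (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Candidate {
    Binary,
    BinaryArchive,
    Source,
    SourceArchive,
}

/// The order proposed in the review comment:
/// binary --> binary archive --> src --> src archive.
pub const PROPOSED_ORDER: [Candidate; 4] = [
    Candidate::Binary,
    Candidate::BinaryArchive,
    Candidate::Source,
    Candidate::SourceArchive,
];

/// Try each candidate in order and return the first one for which
/// `fetch` reports success. Later candidates are not attempted.
pub fn first_available<F>(order: &[Candidate], mut fetch: F) -> Option<Candidate>
where
    F: FnMut(Candidate) -> bool,
{
    order.iter().copied().find(|c| fetch(*c))
}
```

Keeping the order as data makes it cheap to compare alternatives, such as trying source before the binary archive.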

@weswc
Member Author

weswc commented Feb 18, 2026

[2026-02-18T21:36:36Z DEBUG rv::sync::handler] Installing dplyr (source) on worker 2
[2026-02-18T21:36:36Z DEBUG rv::sync::sources::repositories] Package dplyr (1.1.4) not found in cache, trying to download it.
[2026-02-18T21:36:36Z WARN  rv::sync::sources::repositories] Failed to download archived binary from `https://packagemanager.posit.co/cran/latest/bin/macosx/big-sur-arm64/contrib/4.5/Archive/dplyr/dplyr_1.1.4.tgz`. Trying source
[2026-02-18T21:36:36Z WARN  rv::sync::sources::repositories] Failed to download source from `https://packagemanager.posit.co/cran/latest/src/contrib/dplyr_1.1.4.tar.gz`. Trying source archive
[2026-02-18T21:36:37Z DEBUG rv::http] Downloaded from https://packagemanager.posit.co/cran/latest/src/contrib/Archive/dplyr/dplyr_1.1.4.tar.gz in 390ms
[2026-02-18T21:36:37Z DEBUG rv::http] Successfully extracted archive to /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4 (in sub folder: Some("/Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr")) in 129ms
[2026-02-18T21:36:37Z DEBUG rv::sync::sources::repositories] Compiling package from /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr
[2026-02-18T21:36:37Z DEBUG rv::sync::link] Link mode Copy forced
[2026-02-18T21:36:37Z DEBUG rv::sync::link] Copying package from "/Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr" to "/var/folders/n_/p8p3xyjn17j46vzr511810900000gn/T/.tmpfiCd6I".
[2026-02-18T21:36:37Z DEBUG rv::r_cmd] Compiling /Users/wescummings/projects/rv/cache/rv/f4780dda90/src/dplyr/1.1.4/dplyr with env vars: R_LIBS_SITE=/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64/__rv__staging:/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64 R_LIBS_USER=/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64/__rv__staging:/Users/wescummings/projects/rv/example_projects/simple/rv/library/4.5/arm64 _R_SHLIB_STRIP_=true
[2026-02-18T21:36:42Z DEBUG rv::sync::link] Cloning package from "/Users/wescummings/projects/rv/cache/rv/f4780dda90/4.5/arm64/dplyr/1.1.4" to "example_projects/simple/rv/library/4.5/arm64/__rv__staging".
[2026-02-18T21:36:42Z DEBUG rv::sync::handler] Completed installing dplyr (15/15)
[2026-02-18T21:36:42Z INFO  rv::cli::sync] Synced dependencies in 7573ms
Downloaded (15):
  + R6           2.6.1  binary  https://packagemanager.posit.co/cran/latest     352ms
  + cli          3.6.5  binary  https://packagemanager.posit.co/cran/latest     517ms
  + dplyr        1.1.4  source  https://packagemanager.posit.co/cran/latest      5.5s
  + generics     0.1.4  binary  https://packagemanager.posit.co/cran/latest     350ms
  + glue         1.8.0  binary  https://packagemanager.posit.co/cran/latest     364ms
  + lifecycle    1.0.5  binary  https://packagemanager.posit.co/cran/latest     288ms
  + magrittr     2.0.4  binary  https://packagemanager.posit.co/cran/latest     355ms
  + pillar      1.11.1  binary  https://packagemanager.posit.co/cran/latest     368ms
  + pkgconfig    2.0.3  binary  https://packagemanager.posit.co/cran/latest     305ms
  + rlang        1.1.7  binary  https://packagemanager.posit.co/cran/latest     526ms
  + tibble       3.3.1  binary  https://packagemanager.posit.co/cran/latest     368ms
  + tidyselect   1.2.1  binary  https://packagemanager.posit.co/cran/latest     309ms
  + utf8         1.2.6  binary  https://packagemanager.posit.co/cran/latest     378ms
  + vctrs        0.7.1  binary  https://packagemanager.posit.co/cran/latest     468ms
  + withr        3.0.2  binary  https://packagemanager.posit.co/cran/latest     375ms

@rzofchak-a2ai and I talked about it before implementing and thought we should defer to what is "available" before digging into the archive, but I also see the appeal of the archive's speed benefit. Fortunately, the case where we have both a valid archived binary and a valid non-archived source is quite rare, so it shouldn't matter very much.

I'm also using the from_lockfile arg to not go check the archive when installing non-locked packages
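
The fallback order visible in the log above (archived binary, then source, then source archive), combined with the `from_lockfile` gate, could be sketched as follows. This is an illustration only: `Step` and `fallback_steps` are hypothetical names, and the sketch assumes the flag gates just the binary archive lookup, which the comment does not spell out.

```rust
/// A fallback step attempted after the regular binary is unavailable
/// (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    BinaryArchive,
    Source,
    SourceArchive,
}

/// Return the fallback steps in the order they would be attempted.
/// ASSUMPTION: only the binary archive step depends on `from_lockfile`.
pub fn fallback_steps(from_lockfile: bool) -> Vec<Step> {
    let mut steps = Vec::with_capacity(3);
    if from_lockfile {
        // Locked packages pin an exact version, so an archived binary may exist.
        steps.push(Step::BinaryArchive);
    }
    steps.push(Step::Source);
    steps.push(Step::SourceArchive);
    steps
}
```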

&paths,
"testpkg",
&PackageType::Binary,
false,
Member

I'm thinking this might be better as a struct for readability. Reading this code and trying to map what each of the args is gets tough. A struct is a little more boilerplate, but it's clear what's there, and even when reading the tests you could tell which values are being permuted.
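
A minimal sketch of what that suggestion might look like, replacing positional arguments with named fields. Only `PackageType::Binary` and the `"testpkg"` value appear in the snippet under review; `DownloadRequest`, its field names, and the meaning given to the trailing `false` are assumptions made for illustration.

```rust
/// Mirrors the `PackageType` seen in the reviewed call site
/// (variants assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PackageType {
    Source,
    Binary,
}

/// Hypothetical argument struct: each field is named at the call site,
/// so a bare `false` no longer needs to be decoded by position.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DownloadRequest<'a> {
    pub name: &'a str,
    pub package_type: PackageType,
    /// ASSUMPTION: the positional `false` corresponds to this flag.
    pub from_lockfile: bool,
}

/// Render a request for logs or test failure messages.
pub fn describe(req: &DownloadRequest<'_>) -> String {
    format!(
        "{} ({:?}, from_lockfile={})",
        req.name, req.package_type, req.from_lockfile
    )
}
```

In a test, `DownloadRequest { name: "testpkg", package_type: PackageType::Binary, from_lockfile: false }` states which value is being varied without referring back to the function signature.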

@dpastoor dpastoor merged commit b64214b into main Feb 25, 2026
8 checks passed