feat(provide): +unique and +entities strategy modifiers #11245
Draft
- config: ParseProvideStrategy returns error, rejects "all" mixed with selective strategies, removes dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused fx.Provide panic
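The parsing rules above can be illustrated with a minimal Go sketch. It assumes a plain bitmask type. The names Strategy, ParseStrategy and ShouldProvide are hypothetical stand-ins, not kubo's actual config API (ParseProvideStrategy, ShouldProvideForStrategy), and only the base tokens are modeled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Strategy is a hypothetical bitmask of provide strategies.
type Strategy uint8

const (
	StrategyAll Strategy = 1 << iota
	StrategyPinned
	StrategyRoots
	StrategyMFS
)

var strategyTokens = map[string]Strategy{
	"all":    StrategyAll,
	"pinned": StrategyPinned,
	"roots":  StrategyRoots,
	"mfs":    StrategyMFS,
}

// ParseStrategy parses a "+"-separated strategy string such as "pinned+mfs".
// Unknown tokens, empty tokens, and "all" mixed with selective strategies
// return an error instead of being silently ignored. Callers are assumed to
// substitute the configured default before calling it.
func ParseStrategy(s string) (Strategy, error) {
	var st Strategy
	for _, tok := range strings.Split(s, "+") {
		if tok == "" {
			return 0, fmt.Errorf("empty token in provide strategy %q", s)
		}
		bit, ok := strategyTokens[tok]
		if !ok {
			return 0, fmt.Errorf("unknown provide strategy token %q", tok)
		}
		st |= bit
	}
	if st&StrategyAll != 0 && st != StrategyAll {
		return 0, errors.New(`"all" cannot be combined with selective strategies`)
	}
	return st, nil
}

// ShouldProvide reports whether content in category want is provided under st.
// "all" is tested as a bit rather than with ==, so the check does not depend
// on the exact mask value.
func ShouldProvide(st, want Strategy) bool {
	return st&StrategyAll != 0 || st&want != 0
}

func main() {
	for _, s := range []string{"pinned+mfs", "all+pinned", "pinned++mfs", "pined"} {
		st, err := ParseStrategy(s)
		fmt.Printf("%-12s -> %04b, err=%v\n", s, st, err)
	}
}
```

Returning an error from the parser lets startup validation fail fast. A Must-style wrapper is then only needed at call sites that run after validation has already succeeded.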
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs, incompatible with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider, NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
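The modifier rules are: +entities implies +unique, modifiers need pinned and/or mfs, and they are incompatible with all and roots. They can be sketched as one normalization step. The bitmask layout and the normalize function are hypothetical and only illustrate the validation logic, not kubo's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy is a hypothetical bitmask; the last two bits are modifiers.
type Strategy uint8

const (
	StrategyAll Strategy = 1 << iota
	StrategyPinned
	StrategyRoots
	StrategyMFS
	StrategyUnique   // +unique: cross-DAG dedup
	StrategyEntities // +entities: entity-aware traversal
)

// normalize applies "+entities implies +unique" and rejects invalid
// modifier combinations.
func normalize(st Strategy) (Strategy, error) {
	if st&StrategyEntities != 0 {
		st |= StrategyUnique
	}
	if st&StrategyUnique == 0 {
		return st, nil // no modifiers present, nothing to check
	}
	if st&(StrategyAll|StrategyRoots) != 0 {
		return 0, errors.New("+unique/+entities are incompatible with all and roots")
	}
	if st&(StrategyPinned|StrategyMFS) == 0 {
		return 0, errors.New("+unique/+entities must be combined with pinned and/or mfs")
	}
	return st, nil
}

func main() {
	st, err := normalize(StrategyPinned | StrategyMFS | StrategyEntities)
	fmt.Printf("pinned+mfs+entities -> %06b, err=%v\n", st, err)
	_, err = normalize(StrategyRoots | StrategyUnique)
	fmt.Println("roots+unique ->", err)
}
```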
pure rename, no behavior change. prepares for ExecuteFastProvideDAG which will walk the DAG according to Provide.Strategy.
adds ExecuteFastProvideRoot calls to pin add and pin update, matching the behavior of ipfs add and ipfs dag import. respects Import.FastProvideRoot and Import.FastProvideWait config options. previously, pin add/update did not trigger any immediate providing, leaving pinned content invisible to the DHT until the next reprovide cycle (up to 22h).
when Provide.Strategy includes +unique, the reprovide cycle uses a shared BloomTracker across all sub-walks (MFS, recursive pins, direct pins). duplicate sub-DAG branches across recursive pins are detected and skipped, reducing traversal from O(pins * total_blocks) to O(unique_blocks).
- readLastUniqueCount / persistUniqueCount: persist bloom sizing count between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: +unique bit checked first, non-unique strategies fall through to existing switch unchanged
- per-cycle fresh BloomTracker sized from previous cycle's count
- channel wrapper persists count on successful cycle completion
when Provide.Strategy includes +entities (which implies +unique), the reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only entity roots (files, directories, HAMT shards) and skipping internal file chunks.
- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select walker based on +entities flag via function references (makePinProv / makeMFSProv) to avoid duplicating the stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities
- config.md: document +unique, +entities modifiers with caveats (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update fast-provide, and hardened strategy parsing
per-block providing during ipfs add is now opt-in via --fast-provide-dag (or Import.FastProvideDAG config, default: false). without it, only the root CID is fast-provided after add, and the reprovide cycle handles the rest. this changes the default for Provide.Strategy=pinned: previously every block was provided during write, now only the root is immediate. use --fast-provide-dag=true to restore the previous behavior. Provide.Strategy=all is unaffected (blockstore hook provides on Put).
pin add and pin update now accept the same --fast-provide-root and --fast-provide-wait CLI flags as ipfs add and ipfs dag import, with the same config fallbacks (Import.FastProvideRoot, Import.FastProvideWait). previously these were config-only with no CLI override.
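The precedence rules for these flags can be summarized as a small decision function: a CLI flag overrides its Import.* config fallback, --fast-provide-dag supersedes --fast-provide-root, and there is no DAG walk under Provide.Strategy=all. This is a hypothetical Go sketch. The mode names and the resolve signature are illustrative, and CLI flags are modeled as *bool, with nil meaning the flag was not passed. It also assumes that under Provide.Strategy=all only the DAG walk is skipped, while the root setting still applies.

```go
package main

import "fmt"

// fastProvideMode is what a command does right after adding or pinning (hypothetical).
type fastProvideMode string

const (
	modeNone fastProvideMode = "none"
	modeRoot fastProvideMode = "root"
	modeDAG  fastProvideMode = "dag"
)

// resolve applies the precedence rules: a CLI flag (non-nil) overrides its
// Import.* config fallback, the DAG walk supersedes root-only providing, and
// with strategy "all" the DAG walk is skipped because the blockstore already
// provides each block on Put.
func resolve(flagRoot, flagDAG *bool, cfgRoot, cfgDAG, strategyAll bool) fastProvideMode {
	root, dag := cfgRoot, cfgDAG
	if flagRoot != nil {
		root = *flagRoot
	}
	if flagDAG != nil {
		dag = *flagDAG
	}
	switch {
	case dag && !strategyAll:
		return modeDAG
	case root:
		return modeRoot
	default:
		return modeNone
	}
}

func main() {
	yes, no := true, false
	fmt.Println(resolve(nil, nil, true, false, false))  // config fallback only
	fmt.Println(resolve(nil, &yes, true, false, false)) // --fast-provide-dag
	fmt.Println(resolve(&no, nil, true, false, false))  // --fast-provide-root=false
}
```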
--fast-provide-dag now available on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update (matching --fast-provide-root).
- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (the DAG walk emits the root CID first, via DFS pre-order)
- wait parameter: when true, blocks until the walk completes; when false, runs in a background goroutine
- Import.FastProvideDAG config option (default: false)
- strategy section: clearer trade-offs, suggested configurations, memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists
…-roots-with-dedup
when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process libp2p hosts on loopback, each running a DHT server with a shared in-memory ProviderStore. kubo daemons bootstrap to them over real TCP, exercising the full DHT code path without public internet. tests opt in via h.SetStubBootstrap(nodes) after Init(). on the daemon side, WAN DHT filters (AddressFilter, QueryFilter, RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted to accept loopback peers when TEST_DHT_STUB is set. depends on: github.com/libp2p/go-libp2p-kad-dht#1241
add sweep reprovide tests for all strategies (all, pinned, roots, mfs, pinned+mfs). each test waits for two reprovide cycles to confirm the schedule runs repeatedly. sweep tests use a short Provide.DHT.Interval and poll provide stat --enc=json.
harden negative assertions:
- roots: test excludes child blocks of a recursive pin (not just unpinned content), using --only-hash to learn the child CID
- mfs: test that pinned content outside MFS is not provided
fix: ipfs add --only-hash no longer triggers fast-provide or pinning (was providing CIDs for data that was never stored)
rename SetStubBootstrap to BootstrapWithStubDHT with lazy-init (ephemeral peers created on first call, not on harness creation)
…-roots-with-dedup
# Conflicts:
# docs/changelogs/v0.41.md
strategy tests for pinned+mfs+unique and pinned+mfs+entities, covering both provide-at-add-time and reprovide (two cycles). content uses a nested DAG (root/subdir/largefile with 1 MiB chunks) to exercise the walker on multi-level structures.
BootstrapWithStubDHT is now self-contained: it always creates 20 ephemeral DHT peers on loopback and sets TEST_DHT_STUB=1 on each node's environment so the daemon lifts WAN DHT filters. no external env var needed. the sweep provider requires >=20 DHT peers to estimate network size (prefix length); without enough peers it stays offline and never provides.
TEST_DHT_STUB on the daemon side lifts WAN DHT filters (AddressFilter, QueryFilter, RoutingTableFilter, RoutingTablePeerDiversityFilter) to accept loopback peers. this is set automatically by BootstrapWithStubDHT.
other changes:
- Provide.DHT.Interval=30s in sweep reprovide tests (was 1m)
- uniq() helper for unique CIDs across parallel subtests
- ipfs add --only-hash disables fast-provide and pinning
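One way such a uniq() helper could work is shown in this hypothetical Go sketch; the real helper's signature is not shown here. It mixes a process-wide atomic counter into the content. That guarantees parallel subtests never add byte-identical data, so each subtest gets its own CID instead of colliding with, and being deduplicated against, another subtest's content.

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
)

var uniqCounter atomic.Uint64

// uniq appends the process ID and a monotonically increasing counter to s,
// so every call returns distinct content, even across parallel subtests.
func uniq(s string) string {
	return fmt.Sprintf("%s-%d-%d", s, os.Getpid(), uniqCounter.Add(1))
}

func main() {
	fmt.Println(uniq("hello"))
	fmt.Println(uniq("hello"))
}
```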
…-roots-with-dedup
- ipfs add --help: rewrite fast-provide section with clear structure (content discoverability, flag defaults, strategy=all behavior)
- ipfs routing reprovide: mark as deprecated, note it returns an error with sweep provider, log error with actionable guidance
- changelog: fix missing --fast-provide-dag flag on pin commands, use "routing system" instead of "DHT" where applicable, link to docs/config.md as source of truth for defaults
- environment-variables.md: note that BootstrapWithStubDHT sets TEST_DHT_STUB automatically, no external env var needed
the fork (NoopMessageSender, MsgSenderBuilder) is no longer used. the ephemeral peer pool in BootstrapWithStubDHT replaced the NoopMessageSender approach.
log providedCIDs and skippedBranches after each unique reprovide cycle and fast-provide-dag walk. tests verify exact counts with two dir pins sharing a 10 KiB file (5 KiB chunks): fast-provide-dag asserts 5 provided + 1 skipped branch, reprovide asserts 6 provided + 1 skipped branch (includes empty MFS root pin). both assert bloom tracker created and no autoscale. updates boxo to pick up Deduplicated() counter, bloom creation/autoscale logging, and review feedback fixes.
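The expected counts follow from a short walk over the test's shape, sketched here in Go with a toy DAG. The CID names are placeholders, and an exact set stands in for the bloom tracker. The two directories are assumed to differ (e.g. by entry name) so they have distinct CIDs. The shared 10 KiB file is a root plus two 5 KiB chunks.

```go
package main

import "fmt"

// walk counts provided CIDs and skipped branches across several roots,
// sharing one seen-set (an exact stand-in for the bloom tracker).
type walk struct {
	dag      map[string][]string
	seen     map[string]bool
	provided int
	skipped  int
}

func (w *walk) visit(c string) {
	if w.seen[c] {
		w.skipped++ // whole sub-DAG already covered by an earlier root
		return
	}
	w.seen[c] = true
	w.provided++
	for _, child := range w.dag[c] {
		w.visit(child)
	}
}

// run walks roots in order with a fresh tracker and returns the counts.
func run(dag map[string][]string, roots ...string) (provided, skipped int) {
	w := &walk{dag: dag, seen: map[string]bool{}}
	for _, r := range roots {
		w.visit(r)
	}
	return w.provided, w.skipped
}

func main() {
	dag := map[string][]string{
		"dirA":    {"file"},
		"dirB":    {"file"},
		"file":    {"chunk1", "chunk2"}, // 10 KiB file, two 5 KiB chunks
		"mfsRoot": nil,                  // empty MFS root
	}
	p, s := run(dag, "dirA", "dirB")
	fmt.Printf("fast-provide-dag: provided=%d skipped=%d\n", p, s)
	p, s = run(dag, "mfsRoot", "dirA", "dirB")
	fmt.Printf("reprovide:        provided=%d skipped=%d\n", p, s)
}
```

Running it prints provided=5 skipped=1 for the fast-provide-dag case (dirA, file, chunk1, chunk2, dirB) and provided=6 skipped=1 for the reprovide case, which also walks the empty MFS root. These match the asserted counts.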
Warning
Not ready for review; this is a sandbox for running CI.
Summary
- Provide.Strategy modifiers (+unique and +entities) for nodes with large, overlapping pin sets (e.g. https://collab.ipfscluster.io hosting https://github.com/ipfs/distributions)
- Fast-provide on pin add/pin update, new --fast-provide-dag flag
- ipfs add --only-hash bug fix
Changes
+unique and +entities strategy modifiers
New opt-in modifiers for Provide.Strategy:
- +unique: bloom filter dedup across recursive pins. Shared subtrees traversed once per reprovide cycle instead of once per pin. ~4 bytes/CID memory. Logs providedCIDs and skippedBranches after each cycle.
- +entities: announces only entity roots (files, directories, HAMT shards), skipping internal file chunks. Implies +unique.
Example: Provide.Strategy = "pinned+mfs+entities"
Default Provide.Strategy=all is unchanged. See docs/config.md#providestrategy for details.
Fast-provide on pin add and pin update
Both commands now accept --fast-provide-root, --fast-provide-dag, and --fast-provide-wait, matching ipfs add and ipfs dag import. Root CID is announced immediately after pinning. See docs/config.md#import for defaults.
--fast-provide-dag flag
New flag on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update. Walks and provides the full DAG immediately using the active strategy. No effect with Provide.Strategy=all (blockstore already provides every block on write). Configurable via Import.FastProvideDAG (default: false).
Hardened strategy parsing
Unknown tokens, empty tokens, and invalid combinations now produce clear errors at startup instead of being silently ignored.
ipfs routing reprovide deprecated
Marked as deprecated. Returns an error with the sweep provider (default). Use ipfs provide stat -a to monitor reprovide progress.
Bug fix: ipfs add --only-hash
--only-hash no longer triggers fast-provide or pinning.
Provider strategy test suite
Full test coverage for both legacy and sweep providers across all strategies (all, pinned, roots, mfs, pinned+mfs, pinned+mfs+unique, pinned+mfs+entities):
- +unique dedup tests assert exact providedCIDs and skippedBranches counts
- +entities tests use nested DAGs with chunked files to verify chunks are skipped
- roots tests verify child blocks of a pin are excluded; mfs tests verify pinned content outside MFS is excluded
- BootstrapWithStubDHT(nodes) creates ephemeral DHT peers on loopback for the sweep provider (needs >=20 peers to estimate network size)
Compatibility
- Default unchanged (Provide.Strategy=all)
- +unique and +entities are opt-in
- --fast-provide-dag defaults to false
Depends on
- boxo#1124: dag/walker (BloomTracker, WalkEntityRoots, WalkDAG), pinning/dspinner (NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
Context