
Conversation

@knst
Collaborator

@knst knst commented Jan 24, 2026

Issue being fixed or feature implemented

Separating consensus & chain code from network & node code in Dash Core is blocked by the tight coupling of network and consensus logic in components such as llmq::CSigningManager, llmq::CSigSharesManager, coinjoin::client, coinjoin::server, governance/, llmq::CInstantSendManager, etc.
These dependencies on PeerManager block backports from Bitcoin related to the 'kernel' project and create circular dependencies through net_processing; they are also a blocker for bitcoin#25144.

What was done?

This PR removes the dependency of llmq::CSigSharesManager on PeerManager; it is a continuation of #6992, which addressed llmq::CSigningManager.

It is split from proof-of-concept PR #6934. Due to multiple other changes (review comments on prior PRs, ActiveContext-related refactorings, proactive signature pushing), the implementation does not fully match the original #6934; the patchset was re-created from scratch.

Prior work:

NOTE: this PR is not a direct blocker for the chainstate / kernel work, as CSigSharesManager was already aggregated into the masternode's ActiveContext by kwvg; see #6841.

How Has This Been Tested?

Ran unit & functional tests
Ran the lint-circular-dependencies.py linter

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@knst knst added this to the 23.1 milestone Jan 24, 2026
@knst knst force-pushed the refactor-peermanager-handlers-sig-shares branch from 5695bc2 to f9ca10a Compare January 24, 2026 18:26
@knst knst force-pushed the refactor-peermanager-handlers-sig-shares branch from f9ca10a to f0802c0 Compare January 24, 2026 18:32
@knst knst changed the title refactor: separate network and consensus logic 2b/N [sign shares] refactor: separate network and consensus logic 2b/N [sig shares] Jan 24, 2026
@coderabbitai

coderabbitai bot commented Jan 24, 2026

Walkthrough

  • ActiveContext: PeerManager is removed from construction; ActiveContext::NotifyRecoveredSig is deleted; ActiveContext no longer starts, interrupts, or stops CSigSharesManager.
  • NetSigning: gains CValidationInterface; accepts a CSigSharesManager* and CSporkManager&; adds worker threads and a thread pool; expands message handling for sig-share and recovered-sig types with per-node banning and batching; adds recovered-sig notification/dispatch.
  • CSigSharesManager: refactored to use ChainstateManager; returns recovered signatures from processing APIs; removes internal worker threads and NotifyRecoveredSig; exposes node/ban accessors.
  • PeerManagerInternal: adds PeerIsBanned and PeerRelayRecoveredSig; PeerManagerImpl implements them, and RelayRecoveredSig supports proactive relays.
  • Circular-dependency expectations updated.

Sequence Diagram(s)

sequenceDiagram
    participant Node as P2P Node
    participant NetSign as NetSigning
    participant Shares as CSigSharesManager
    participant PeerMgr as PeerManager
    participant Peers as Other Peers

    Node->>NetSign: Incoming sig-share / recovered-sig message
    NetSign->>Shares: Validate/process sig-shares (single or batched)
    alt Recovered sig produced
        Shares-->>NetSign: std::shared_ptr<CRecoveredSig>
        NetSign->>NetSign: ProcessRecoveredSig (validate, decide proactive)
        NetSign->>PeerMgr: PeerRelayRecoveredSig(sig, proactive_relay)
        PeerMgr->>Peers: Relay INV or targeted QSIGREC (proactive or broadcast)
    else No valid sig / invalid message
        NetSign-->>Node: Drop / ban node (per checks)
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • UdjinM6
  • PastaPastaPasta
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 13.56%, which is insufficient (required threshold: 80.00%). Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title clearly and specifically describes the main change: refactoring to separate network and consensus logic by addressing sig shares, continuing a numbered series of related PRs.
  • Description check — ✅ Passed: The description comprehensively explains the issue being addressed (PeerManager dependency blocking kernel backports), what was done (removing CSigSharesManager's PeerManager dependency), prior work, testing approach, and references related PRs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 292-323: WorkThreadDispatcher currently pushes a helper task every
tick when m_shares_manager->IsAnyPendingProcessing() is true, allowing unbounded
worker_pool growth; add a single "processing worker active" flag (e.g.,
std::atomic<bool> m_processing_worker_active) and change the helper enqueue
logic in NetSigning::WorkThreadDispatcher to atomically test-and-set the flag
before pushing the helper, so only one helper is queued at a time; inside the
helper (the lambda that loops calling ProcessPendingSigShares), ensure the flag
is cleared (reset to false) when the helper exits (including all early returns
and normal completion) so another helper can be scheduled later, using
compare_exchange/store to maintain thread-safety and avoid races with
workInterrupt checks.

In `@src/llmq/signing_shares.cpp`:
- Around line 1345-1355: In CSigSharesManager::GetAllNodes(), the local
declaration "vector<NodeId> nodes" lacks the std:: qualification and causes a
compile error; change that declaration to use std::vector<NodeId> nodes so it
compiles (location: function CSigSharesManager::GetAllNodes, around the
nodes.reserve/nodeStates loop and LOCK(cs) block).
- Around line 521-533: ProcessPendingSigShares currently uses
quorums.at(quorumKey) which can throw if the quorum map loses an entry; change
the lookup to use quorums.find(quorumKey) and skip (and optionally log) that
sigShare when not found instead of calling at(). Locate ProcessPendingSigShares
and replace the direct at() call with a find() check on quorums, only call
ProcessSigShare(sigShare, it->second) when the iterator is valid, and continue
to the next sigShare (log a warning including quorumKey/quorum identifiers if
desired).

In `@src/net_processing.cpp`:
- Around line 654-655: PeerIsBanned's locking annotation doesn't match
IsBanned's contract (IsBanned requires cs_main and !m_peer_mutex), which can
break static thread-safety checks; update PeerIsBanned (and the similar methods
around PeerEraseObjectRequest) to either declare
EXCLUSIVE_LOCKS_REQUIRED(cs_main, !m_peer_mutex) to match IsBanned, or modify
the implementation to take/hold cs_main locally (e.g., acquire cs_main inside
PeerIsBanned before calling IsBanned) and update the annotation to reflect the
new behavior so the lock discipline is consistent across PeerIsBanned, IsBanned,
and related methods.

Comment on lines +292 to +323
void NetSigning::WorkThreadDispatcher()
{
    assert(m_shares_manager);

    while (!workInterrupt) {
        // Dispatch all pending signs (individual tasks)
        {
            auto signs = m_shares_manager->DispatchPendingSigns();
            // Dispatch all signs to worker pool
            for (auto& work : signs) {
                if (workInterrupt) break;

                worker_pool.push([this, work = std::move(work)](int) mutable {
                    auto rs = m_shares_manager->SignAndProcessSingleShare(std::move(work));
                    ProcessRecoveredSig(rs, true);
                });
            }
        }

        if (m_shares_manager->IsAnyPendingProcessing()) {
            // If there's processing work, spawn a helper worker
            worker_pool.push([this](int) {
                while (!workInterrupt) {
                    bool moreWork = ProcessPendingSigShares();

                    if (!moreWork) {
                        return; // No work found, exit immediately
                    }
                }
            });
        }


⚠️ Potential issue | 🟠 Major

Guard against unbounded worker_pool task buildup.

The dispatcher enqueues a helper on every tick while work is pending, even if a helper is already running. Under sustained load, this can grow the queue indefinitely and waste memory/CPU. Gate this with a single “processing worker active” flag so only one helper is queued at a time.

Suggested fix to avoid queue growth
diff --git a/src/llmq/net_signing.h b/src/llmq/net_signing.h
@@
-#include <thread>
+#include <thread>
+#include <atomic>
@@
     std::thread shares_cleaning_thread;
     std::thread shares_dispatcher_thread;
     mutable ctpl::thread_pool worker_pool;
+    std::atomic<bool> sigshare_worker_active{false};
diff --git a/src/llmq/net_signing.cpp b/src/llmq/net_signing.cpp
@@ void NetSigning::WorkThreadDispatcher()
-        if (m_shares_manager->IsAnyPendingProcessing()) {
-            // If there's processing work, spawn a helper worker
-            worker_pool.push([this](int) {
-                while (!workInterrupt) {
-                    bool moreWork = ProcessPendingSigShares();
-
-                    if (!moreWork) {
-                        return; // No work found, exit immediately
-                    }
-                }
-            });
-        }
+        if (m_shares_manager->IsAnyPendingProcessing()) {
+            bool expected{false};
+            if (sigshare_worker_active.compare_exchange_strong(expected, true)) {
+                // If there's processing work, spawn a single helper worker
+                worker_pool.push([this](int) {
+                    while (!workInterrupt) {
+                        bool moreWork = ProcessPendingSigShares();
+                        if (!moreWork) break;
+                    }
+                    sigshare_worker_active.store(false);
+                });
+            }
+        }

Comment on lines +521 to 527
std::vector<std::shared_ptr<CRecoveredSig>> CSigSharesManager::ProcessPendingSigShares(
    const std::vector<CSigShare>& sigSharesToProcess,
    const std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>& quorums)
{
    cxxtimer::Timer t(true);
    std::vector<std::shared_ptr<CRecoveredSig>> recovered_sigs;
    for (const auto& sigShare : sigSharesToProcess) {
        auto quorumKey = std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash());
-       ProcessSigShare(sigShare, quorums.at(quorumKey));
+       auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
        if (rs != nullptr) {
            recovered_sigs.emplace_back(std::move(rs));
        }
    }

⚠️ Potential issue | 🟠 Major

Avoid potential out_of_range when quorum map is incomplete.

quorums.at(quorumKey) will throw if a quorum disappears between collection and processing. Guard it with find() and skip/log when missing to prevent a hard crash.

Safe lookup to avoid exceptions
@@
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 329-407: ProcessPendingSigShares can crash when quorums.at(...) is
missing; before calling quorums.at(std::make_pair(sigShare.getLlmqType(),
sigShare.getQuorumHash())) in ProcessPendingSigShares, check quorums.find(key)
!= quorums.end() (use the pair key) and handle the missing-quorum case by
logging a warning and skipping that sigShare (or skipping the rest of that
node's shares) instead of using at(); reference the sigSharesByNodes, quorums,
ProcessPendingSigShares and CollectPendingSigSharesToVerify symbols so you add
the guard right where the quorum lookup happens and avoid dereferencing a
missing quorum entry.
♻️ Duplicate comments (3)
src/llmq/net_signing.cpp (1)

292-323: Guard against unbounded worker_pool task buildup.
This is the same issue noted previously: the dispatcher can enqueue a helper every tick under sustained load. Please gate with a single “worker active” flag.

src/llmq/signing_shares.cpp (2)

520-539: Avoid potential out_of_range on quorums.at(...).
This repeats the earlier concern: if the quorum map is incomplete, at() will throw. Please guard with find() and skip/log missing quorums.

✅ Suggested guard
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);

1345-1355: Fix missing std:: qualification (compile error).
Line 1347 uses vector<NodeId> without a std:: qualifier or using declaration.

✅ Minimal fix
-    vector<NodeId> nodes;
+    std::vector<NodeId> nodes;

Comment on lines +329 to +407
void NetSigning::NotifyRecoveredSig(const std::shared_ptr<const CRecoveredSig>& sig, bool proactive_relay)
{
    m_peer_manager->PeerRelayRecoveredSig(*sig, proactive_relay);
}

bool NetSigning::ProcessPendingSigShares()
{
    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;

    const size_t nMaxBatchSize{32};
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
    if (sigSharesByNodes.empty()) {
        return false;
    }

    // It's ok to perform insecure batched verification here as we verify against the quorum public key shares,
    // which are not craftable by individual entities, making the rogue public key attack impossible
    CBLSBatchVerifier<NodeId, SigShareKey> batchVerifier(false, true);

    cxxtimer::Timer prepareTimer(true);
    size_t verifyCount = 0;
    for (const auto& [nodeId, v] : sigSharesByNodes) {
        for (const auto& sigShare : v) {
            if (m_sig_manager.HasRecoveredSigForId(sigShare.getLlmqType(), sigShare.getId())) {
                continue;
            }

            // Materialize the signature once. Get() internally validates, so if it returns an invalid signature,
            // we know it's malformed. This avoids calling Get() twice (once for IsValid(), once for PushMessage).
            CBLSSignature sig = sigShare.sigShare.Get();
            // we didn't check this earlier because we use a lazy BLS signature and tried to avoid doing the expensive
            // deserialization in the message thread
            if (!sig.IsValid()) {
                BanNode(nodeId);
                // don't process any additional shares from this node
                break;
            }

            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
            auto pubKeyShare = quorum->GetPubKeyShare(sigShare.getQuorumMember());

            if (!pubKeyShare.IsValid()) {
                // this should really not happen (we already ensured we have the quorum vvec,
                // so we should also be able to create all pubkey shares)
                LogPrintf("NetSigning::%s -- pubKeyShare is invalid, which should not be possible here\n", __func__);
                assert(false);
            }

            batchVerifier.PushMessage(nodeId, sigShare.GetKey(), sigShare.GetSignHash(), sig, pubKeyShare);
            verifyCount++;
        }
    }
    prepareTimer.stop();

    cxxtimer::Timer verifyTimer(true);
    batchVerifier.Verify();
    verifyTimer.stop();

    LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- verified sig shares. count=%d, pt=%d, vt=%d, nodes=%d\n", __func__,
             verifyCount, prepareTimer.count(), verifyTimer.count(), sigSharesByNodes.size());

    for (const auto& [nodeId, v] : sigSharesByNodes) {
        if (batchVerifier.badSources.count(nodeId) != 0) {
            LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- invalid sig shares from other node, banning peer=%d\n",
                     __func__, nodeId);
            // this will also cause re-requesting of the shares that were sent by this node
            BanNode(nodeId);
            continue;
        }

        auto rec_sigs = m_shares_manager->ProcessPendingSigShares(v, quorums);
        for (auto& rs : rec_sigs) {
            ProcessRecoveredSig(rs, true);
        }
    }

    return more_work;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the implementation of CollectPendingSigSharesToVerify
rg -n "CollectPendingSigSharesToVerify" --type cpp --type h

Repository: dashpay/dash

Length of output: 606


🏁 Script executed:

#!/bin/bash
# Also check the shares_manager header for method signature
fd -t h shares_manager | head -5

Repository: dashpay/dash

Length of output: 224


🏁 Script executed:

#!/bin/bash
# Find where CollectPendingSigSharesToVerify is implemented
rg -A 30 "CollectPendingSigSharesToVerify\s*\(" src/llmq --type cpp -B 2

Repository: dashpay/dash

Length of output: 7909


🏁 Script executed:

#!/bin/bash
# Read the full implementation of CollectPendingSigSharesToVerify
sed -n '440,550p' src/llmq/signing_shares.cpp | cat -n

Repository: dashpay/dash

Length of output: 5861


Guard against missing quorums when processing signature shares.

CollectPendingSigSharesToVerify populates sigSharesByNodes first, then attempts to build a quorums map. If a quorum lookup fails partway through (lines 69-72 of signing_shares.cpp), the function returns false with retSigShares already populated. Back in ProcessPendingSigShares, the check if (sigSharesByNodes.empty()) can pass even though quorums.at(...) lacks required entries, causing a crash.

Suggested guard
-            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            auto qit = quorums.find(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            if (qit == quorums.end()) {
+                LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                         __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+                continue;
+            }
+            auto quorum = qit->second;

@knst knst force-pushed the refactor-peermanager-handlers-sig-shares branch from 610de96 to 2c000de Compare January 25, 2026 07:27
knst added 13 commits January 25, 2026 14:28
Caller of ProcessMessage is moved inside NetSigning handler, but logic still here
… chain that has been active during its creation
…rifySigShareQuorum and banning helper

It helps simplify logic that had been over-complicated by returning two boolean flags: one meaning failure, the other a node ban.

Also removes code duplicated with ProcessMessageSigShare
…ingSharesManager

They have no external calls outside of signing_shares
@knst knst force-pushed the refactor-peermanager-handlers-sig-shares branch from 2c000de to 119d7a9 Compare January 25, 2026 07:29

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 349-366: The loop must skip all shares from nodes that sent
malformed signatures so they never reach ProcessPendingSigShares(); add a local
bad-node tracker (e.g., std::unordered_set<NodeId> badNodes) and at the start of
the outer loop for sigSharesByNodes check if nodeId is in badNodes and continue
if so; when detecting a malformed signature (after CBLSSignature sig =
sigShare.sigShare.Get() and !sig.IsValid()), call BanNode(nodeId) and insert
nodeId into badNodes before breaking the inner loop so no further shares from
that node are processed or forwarded to ProcessPendingSigShares().
♻️ Duplicate comments (3)
src/llmq/net_signing.cpp (2)

292-323: Guard against unbounded worker_pool task buildup.
The dispatcher can enqueue a helper on every tick while work remains, which can grow the queue indefinitely under sustained load. Gate this with a single “worker active” flag.

Suggested guard to avoid queue growth
diff --git a/src/llmq/net_signing.h b/src/llmq/net_signing.h
@@
-#include <thread>
+#include <thread>
+#include <atomic>
@@
     std::thread shares_cleaning_thread;
     std::thread shares_dispatcher_thread;
     mutable ctpl::thread_pool worker_pool;
+    std::atomic<bool> sigshare_worker_active{false};
diff --git a/src/llmq/net_signing.cpp b/src/llmq/net_signing.cpp
@@ void NetSigning::WorkThreadDispatcher()
-        if (m_shares_manager->IsAnyPendingProcessing()) {
-            // If there's processing work, spawn a helper worker
-            worker_pool.push([this](int) {
-                while (!workInterrupt) {
-                    bool moreWork = ProcessPendingSigShares();
-
-                    if (!moreWork) {
-                        return; // No work found, exit immediately
-                    }
-                }
-            });
-        }
+        if (m_shares_manager->IsAnyPendingProcessing()) {
+            bool expected{false};
+            if (sigshare_worker_active.compare_exchange_strong(expected, true)) {
+                // If there's processing work, spawn a single helper worker
+                worker_pool.push([this](int) {
+                    while (!workInterrupt) {
+                        bool moreWork = ProcessPendingSigShares();
+                        if (!moreWork) break;
+                    }
+                    sigshare_worker_active.store(false);
+                });
+            }
+        }

368-401: Protect quorum lookup against missing entries.
CollectPendingSigSharesToVerify() can return false after partially filling sigSharesByNodes; quorums.at(...) can then throw. Guard with find() and skip when missing to avoid crashes.

Guard missing quorum entries
-            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            auto qit = quorums.find(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            if (qit == quorums.end()) {
+                LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                         __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+                continue;
+            }
+            auto quorum = qit->second;
src/llmq/signing_shares.cpp (1)

515-527: Avoid quorums.at() when the map can be incomplete.
Defensive lookup prevents std::out_of_range if the quorum map is missing an entry.

Safe lookup
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);

Comment on lines +349 to +366
    cxxtimer::Timer prepareTimer(true);
    size_t verifyCount = 0;
    for (const auto& [nodeId, v] : sigSharesByNodes) {
        for (const auto& sigShare : v) {
            if (m_sig_manager.HasRecoveredSigForId(sigShare.getLlmqType(), sigShare.getId())) {
                continue;
            }

            // Materialize the signature once. Get() internally validates, so if it returns an invalid signature,
            // we know it's malformed. This avoids calling Get() twice (once for IsValid(), once for PushMessage).
            CBLSSignature sig = sigShare.sigShare.Get();
            // we didn't check this earlier because we use a lazy BLS signature and tried to avoid doing the expensive
            // deserialization in the message thread
            if (!sig.IsValid()) {
                BanNode(nodeId);
                // don't process any additional shares from this node
                break;
            }

@coderabbitai coderabbitai bot Jan 25, 2026


⚠️ Potential issue | 🟠 Major

Skip processing shares from nodes that send malformed signatures.
A malformed signature triggers BanNode, but the batch still gets processed. Mark the node as a bad source so unverified shares don’t reach ProcessPendingSigShares().

Minimal fix to skip malformed sources
-            if (!sig.IsValid()) {
-                BanNode(nodeId);
-                // don't process any additional shares from this node
-                break;
-            }
+            if (!sig.IsValid()) {
+                BanNode(nodeId);
+                batchVerifier.badSources.emplace(nodeId);
+                // don't process any additional shares from this node
+                break;
+            }

Collaborator Author


@CodeRabbit, is this a logic change you invented, or a bug I introduced? Please note that this PR is pure refactoring (except for the replacement of chainstate with chainman).


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the original implementation pattern in signing_shares.cpp
# Looking for similar BLS signature validation and batchVerifier usage

# First, let's see the git history for this specific pattern
git log -p --all -S "sigShare.sigShare.Get()" -- "*.cpp" | head -200

# Also search for the pattern with IsValid and break in signing_shares
git log -p --all -S "sig.IsValid()" -- src/llmq/signing_shares.cpp | head -300

Length of output: 60


🏁 Script executed:

#!/bin/bash
# Search for the original implementation pattern in signing_shares.cpp
# Looking for similar BLS signature validation and batchVerifier usage

# Search for badSources usage in the codebase history
git log -p --all -S "badSources" -- "*.cpp" | head -400

Length of output: 59


🏁 Script executed:

#!/bin/bash
# Search for badSources usage in signing_shares.cpp
rg "badSources" src/llmq/signing_shares.cpp -A 5 -B 5

# Search for the pattern where sig validation happens
rg "sig\.IsValid\(\)" src/llmq/ -A 3 -B 3

Length of output: 903


🏁 Script executed:

#!/bin/bash
# Look at the current implementation in signing_shares.cpp to understand the refactoring
cat src/llmq/signing_shares.cpp | grep -A 30 "ProcessPendingSigShares"

# Check if there was previous batch verification logic
git log --all --oneline --grep="batch" -- src/llmq/signing_shares.cpp | head -20

# Search for commit that introduced this ProcessPendingSigShares in net_signing.cpp
git log --oneline -n 30 -- src/llmq/net_signing.cpp

Length of output: 2713


🏁 Script executed:

#!/bin/bash
# Check the commit that moved ProcessPendingSigShares to see if batch verification was there before
git show cc31aee94d --stat

# Show the actual changes in that commit
git show cc31aee94d -- src/llmq/net_signing.cpp | head -200

Length of output: 5031


🏁 Script executed:

#!/bin/bash
# Check the parent commit to see if this pattern existed before the move
git show cc31aee94d~1:src/llmq/signing_shares.cpp | grep -A 80 "ProcessPendingSigShares"

Length of output: 6703


@knst, you're absolutely right, and I apologize for not being clear. I checked the git history and found that this exact pattern existed in the original code before your refactoring (in signing_shares.cpp at commit cc31aee~1):

if (!sig.IsValid()) {
    BanNode(nodeId);
    // don't process any additional shares from this node
    break;
}

Your refactoring correctly preserved this behavior when moving the code to net_signing.cpp. This is not a bug you introduced – it's a pre-existing pattern.

Since your PR is pure refactoring (moving code without changing logic), my suggestion to add badSources.emplace(nodeId) would indeed introduce a logic change, which is inappropriate for a refactoring PR.

If this pattern should be improved (ensuring banned nodes' shares never reach the final processing loop), it should be done in a separate PR focused on that logic change, not mixed into your refactoring work.

Thank you for the correction – the issue is my invention as a potential improvement, not a bug in your refactoring.


🧬 Code Graph Analysis Results

[src/llmq/net_signing.h] (lines 46-48)

    void Start() override;
    void Stop() override;
    void Interrupt() override { workInterrupt(); }

[src/llmq/signing_shares.cpp] (lines 515-533)

std::vector<std::shared_ptr<CRecoveredSig>> CSigSharesManager::ProcessPendingSigShares(
    const std::vector<CSigShare>& sigSharesToProcess,
    const std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>& quorums)
{
    cxxtimer::Timer t(true);
    std::vector<std::shared_ptr<CRecoveredSig>> recovered_sigs;
    for (const auto& sigShare : sigSharesToProcess) {
        auto quorumKey = std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash());
        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
        if (rs != nullptr) {
            recovered_sigs.emplace_back(std::move(rs));
        }
    }
    t.stop();

    LogPrint(BCLog::LLMQ_SIGS, "CSigSharesManager::%s -- processed sigShare batch. shares=%d, time=%ds\n", __func__,
             sigSharesToProcess.size(), t.count());
    return recovered_sigs;
}

[src/llmq/signing.cpp] (lines 479-518) [docstring-style summary]

  • Function: CSigningManager::ProcessRecoveredSig(const std::shared_ptr& recoveredSig)
  • Parameters:
    • recoveredSig: shared_ptr to a CRecoveredSig to process.
  • Behavior:
    • Checks if the recovered signature is already known in the database; if so, returns false.
    • Computes the signHash from recoveredSig and logs details (signHash, id, msgHash).
    • If a recovered sig with the same llmqType and quorum hash and same id already exists with a different signHash, logs a diagnostic message (conflicting recoveredSig) but returns false.
    • If an existing recovered sig by id exists but is not compatible, logs a diagnostic and returns false.
    • Writes the recovered signature to the database, erases any pending recovered sig from in-memory pending set, and returns true.
  • Notes:
    • Handles cases where a recSig might arrive via multiple paths (regular QRECSIG vs. reconstructable recSig).
    • Includes synchronization with database and in-memory pending structures.
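As a rough illustration of that decision flow (not the real CSigningManager, which also tracks llmqType, quorum hash, msgHash, and a persistent database), the duplicate/conflict handling reduces to something like:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in; the real CRecoveredSig carries llmqType, quorumHash,
// id, msgHash and a BLS signature.
struct RecSig {
    std::string id;
    std::string signHash;
};

class SigDb {
    std::map<std::string, std::string> byId; // id -> signHash ("the database")
public:
    // Mirrors the summarized behavior: if a recovered sig with the same id
    // already exists, return false (whether it is a plain duplicate or a
    // conflicting signHash); otherwise store it and return true.
    bool ProcessRecoveredSig(const RecSig& rs)
    {
        if (byId.find(rs.id) != byId.end()) {
            // already known: same signHash is a duplicate, a different
            // signHash would be logged as a conflicting recoveredSig
            return false;
        }
        byId[rs.id] = rs.signHash; // write to the database
        return true;
    }
};
```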

🧠 Learnings used
Learnt from: knst
Repo: dashpay/dash PR: 7113
File: src/net_processing.cpp:2369-2371
Timestamp: 2026-01-22T08:03:59.296Z
Learning: Dash PR `#7113`: For CLSIG handling, src/net_processing.cpp intentionally uses ChainlockHandler::AlreadyHave (backed by a seenChainLocks cache) to avoid re-requesting older chainlocks, while GetData(… MSG_CLSIG …) serves only the current best via chainlock::Chainlocks::GetChainLockByHash. The mismatch is by design because the node only propagates the best CLSIG and discards older ones; NOTFOUND for old CLSIG requests is acceptable.

Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR `#6761`, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.

Learnt from: knst
Repo: dashpay/dash PR: 6692
File: src/llmq/blockprocessor.cpp:217-224
Timestamp: 2025-08-19T14:57:31.801Z
Learning: In PR `#6692`, knst acknowledged a null pointer dereference issue in ProcessBlock() method where LookupBlockIndex may return nullptr but is passed to gsl::not_null, and created follow-up PR `#6789` to address it, consistent with avoiding scope creep in performance-focused PRs.

Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request `#6543` is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.

Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.

Learnt from: knst
Repo: dashpay/dash PR: 6916
File: src/univalue/include/univalue.h:81-88
Timestamp: 2025-10-25T07:08:51.918Z
Learning: For backport PRs from bitcoin/bitcoin, bitcoin-core/gui, etc., backported changes should match the original upstream PRs even if they appear strange, modify vendored code, or seem to violate coding guidelines. Still flag genuine issues like bugs, undefined behavior, crashes, compilation errors, or linter failures.

@UdjinM6 commented Jan 27, 2026

pls consider ece1cb0 and 8b396af

…NodesIf(predicate) method

- Single lock acquisition instead of O(n) locks
- No intermediate vector allocation
- No redundant map lookups
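A minimal sketch of such a ForEachNodeIf-style helper, assuming hypothetical names and types rather than the actual CSigSharesManager node-state plumbing: the predicate and callback are invoked under a single lock acquisition, with no intermediate vector of ids and no second map lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical container of per-node state, guarded by one mutex.
class NodeStates {
    mutable std::mutex m;
    std::map<int64_t, int> states; // nodeId -> per-node state (simplified)
public:
    void Add(int64_t id, int state)
    {
        std::lock_guard<std::mutex> lock(m);
        states[id] = state;
    }

    // Visit every node matching pred while holding the lock exactly once,
    // instead of collecting ids first and re-locking per node.
    template <typename Pred, typename Fn>
    void ForEachNodeIf(Pred pred, Fn fn) const
    {
        std::lock_guard<std::mutex> lock(m);
        for (const auto& [id, state] : states) {
            if (pred(id, state)) fn(id, state);
        }
    }
};
```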