refactor: separate network and consensus logic 2b/N [sig shares] #7115
base: develop
Conversation
5695bc2 to f9ca10a (force-push)
f9ca10a to f0802c0 (force-push)
Walkthrough

The PR removes PeerManager from ActiveContext construction and deletes ActiveContext::NotifyRecoveredSig; ActiveContext no longer starts, interrupts, or stops CSigSharesManager. NetSigning gains CValidationInterface, accepts a CSigSharesManager* and CSporkManager&, adds worker threads and a thread pool, expands message handling for sig-share and recovered-sig types with per-node banning and batching, and adds recovered-sig notification/dispatch. CSigSharesManager is refactored to use ChainstateManager, returns recovered signatures from processing APIs, removes internal worker threads and NotifyRecoveredSig, and exposes node/ban accessors. PeerManagerInternal adds PeerIsBanned and PeerRelayRecoveredSig; PeerManagerImpl implements them and RelayRecoveredSig supports proactive relays. Circular-dependency expectations updated.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Node as P2P Node
    participant NetSign as NetSigning
    participant Shares as CSigSharesManager
    participant PeerMgr as PeerManager
    participant Peers as Other Peers
    Node->>NetSign: Incoming sig-share / recovered-sig message
    NetSign->>Shares: Validate/process sig-shares (single or batched)
    alt Recovered sig produced
        Shares-->>NetSign: std::shared_ptr<CRecoveredSig>
        NetSign->>NetSign: ProcessRecoveredSig (validate, decide proactive)
        NetSign->>PeerMgr: PeerRelayRecoveredSig(sig, proactive_relay)
        PeerMgr->>Peers: Relay INV or targeted QSIGREC (proactive or broadcast)
    else No valid sig / invalid message
        NetSign-->>Node: Drop / ban node (per checks)
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 292-323: WorkThreadDispatcher currently pushes a helper task every
tick when m_shares_manager->IsAnyPendingProcessing() is true, allowing unbounded
worker_pool growth; add a single "processing worker active" flag (e.g.,
std::atomic<bool> m_processing_worker_active) and change the helper enqueue
logic in NetSigning::WorkThreadDispatcher to atomically test-and-set the flag
before pushing the helper, so only one helper is queued at a time; inside the
helper (the lambda that loops calling ProcessPendingSigShares), ensure the flag
is cleared (reset to false) when the helper exits (including all early returns
and normal completion) so another helper can be scheduled later, using
compare_exchange/store to maintain thread-safety and avoid races with
workInterrupt checks.
In `@src/llmq/signing_shares.cpp`:
- Around line 1345-1355: In CSigSharesManager::GetAllNodes(), the local
declaration "vector<NodeId> nodes" lacks the std:: qualification and causes a
compile error; change that declaration to use std::vector<NodeId> nodes so it
compiles (location: function CSigSharesManager::GetAllNodes, around the
nodes.reserve/nodeStates loop and LOCK(cs) block).
- Around line 521-533: ProcessPendingSigShares currently uses
quorums.at(quorumKey) which can throw if the quorum map loses an entry; change
the lookup to use quorums.find(quorumKey) and skip (and optionally log) that
sigShare when not found instead of calling at(). Locate ProcessPendingSigShares
and replace the direct at() call with a find() check on quorums, only call
ProcessSigShare(sigShare, it->second) when the iterator is valid, and continue
to the next sigShare (log a warning including quorumKey/quorum identifiers if
desired).
In `@src/net_processing.cpp`:
- Around line 654-655: PeerIsBanned's locking annotation doesn't match
IsBanned's contract (IsBanned requires cs_main and !m_peer_mutex), which can
break static thread-safety checks; update PeerIsBanned (and the similar methods
around PeerEraseObjectRequest) to either declare
EXCLUSIVE_LOCKS_REQUIRED(cs_main, !m_peer_mutex) to match IsBanned, or modify
the implementation to take/hold cs_main locally (e.g., acquire cs_main inside
PeerIsBanned before calling IsBanned) and update the annotation to reflect the
new behavior so the lock discipline is consistent across PeerIsBanned, IsBanned,
and related methods.
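For illustration, here is a minimal standalone sketch of the single-helper gating pattern that the first net_signing.cpp prompt above describes. All names here (TinyPool, dispatch_tick, process_pending) are hypothetical stand-ins, not Dash Core code; the point is only that compare_exchange_strong lets at most one helper task exist at a time, and that the flag is cleared on every exit path. The excerpt under review follows below.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Hypothetical stand-in for ctpl::thread_pool: runs each task on a detached thread.
struct TinyPool {
    void push(std::function<void()> task) { std::thread(std::move(task)).detach(); }
};

std::atomic<bool> g_worker_active{false}; // the "processing worker active" flag
std::atomic<int> g_pending{5};            // pretend five batches are pending

// Stand-in for ProcessPendingSigShares(): consumes one batch, reports whether more remain.
bool process_pending()
{
    int cur = g_pending.load();
    while (cur > 0 && !g_pending.compare_exchange_weak(cur, cur - 1)) {}
    return cur > 1; // true if more batches remain after consuming this one
}

void dispatch_tick(TinyPool& pool)
{
    if (g_pending.load() <= 0) return; // stand-in for IsAnyPendingProcessing()
    bool expected{false};
    // Atomic test-and-set: only a tick that flips the flag false->true enqueues a helper.
    if (g_worker_active.compare_exchange_strong(expected, true)) {
        pool.push([] {
            while (process_pending()) {}  // drain until no more work
            g_worker_active.store(false); // clear on exit so a later tick can reschedule
        });
    }
}

int main()
{
    TinyPool pool;
    for (int i = 0; i < 100; ++i) dispatch_tick(pool); // 100 ticks, at most one helper runs
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::cout << "batches left: " << g_pending.load() << '\n'; // expect 0
}
```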
```cpp
void NetSigning::WorkThreadDispatcher()
{
    assert(m_shares_manager);

    while (!workInterrupt) {
        // Dispatch all pending signs (individual tasks)
        {
            auto signs = m_shares_manager->DispatchPendingSigns();
            // Dispatch all signs to worker pool
            for (auto& work : signs) {
                if (workInterrupt) break;

                worker_pool.push([this, work = std::move(work)](int) mutable {
                    auto rs = m_shares_manager->SignAndProcessSingleShare(std::move(work));
                    ProcessRecoveredSig(rs, true);
                });
            }
        }

        if (m_shares_manager->IsAnyPendingProcessing()) {
            // If there's processing work, spawn a helper worker
            worker_pool.push([this](int) {
                while (!workInterrupt) {
                    bool moreWork = ProcessPendingSigShares();

                    if (!moreWork) {
                        return; // No work found, exit immediately
                    }
                }
            });
        }
```
Guard against unbounded worker_pool task buildup.
The dispatcher enqueues a helper on every tick while work is pending, even if a helper is already running. Under sustained load, this can grow the queue indefinitely and waste memory/CPU. Gate this with a single “processing worker active” flag so only one helper is queued at a time.
Suggested fix to avoid queue growth
```diff
diff --git a/src/llmq/net_signing.h b/src/llmq/net_signing.h
@@
-#include <thread>
+#include <thread>
+#include <atomic>
@@
     std::thread shares_cleaning_thread;
     std::thread shares_dispatcher_thread;
     mutable ctpl::thread_pool worker_pool;
+    std::atomic<bool> sigshare_worker_active{false};
diff --git a/src/llmq/net_signing.cpp b/src/llmq/net_signing.cpp
@@ void NetSigning::WorkThreadDispatcher()
-        if (m_shares_manager->IsAnyPendingProcessing()) {
-            // If there's processing work, spawn a helper worker
-            worker_pool.push([this](int) {
-                while (!workInterrupt) {
-                    bool moreWork = ProcessPendingSigShares();
-
-                    if (!moreWork) {
-                        return; // No work found, exit immediately
-                    }
-                }
-            });
-        }
+        if (m_shares_manager->IsAnyPendingProcessing()) {
+            bool expected{false};
+            if (sigshare_worker_active.compare_exchange_strong(expected, true)) {
+                // If there's processing work, spawn a single helper worker
+                worker_pool.push([this](int) {
+                    while (!workInterrupt) {
+                        bool moreWork = ProcessPendingSigShares();
+                        if (!moreWork) break;
+                    }
+                    sigshare_worker_active.store(false);
+                });
+            }
+        }
```
```cpp
std::vector<std::shared_ptr<CRecoveredSig>> CSigSharesManager::ProcessPendingSigShares(
    const std::vector<CSigShare>& sigSharesToProcess,
    const std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>& quorums)
{
    cxxtimer::Timer t(true);
    std::vector<std::shared_ptr<CRecoveredSig>> recovered_sigs;
    for (const auto& sigShare : sigSharesToProcess) {
        auto quorumKey = std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash());
        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
        if (rs != nullptr) {
            recovered_sigs.emplace_back(std::move(rs));
        }
    }
```
Avoid potential out_of_range when quorum map is incomplete.
quorums.at(quorumKey) will throw if a quorum disappears between collection and processing. Guard it with find() and skip/log when missing to prevent a hard crash.
Safe lookup to avoid exceptions
```diff
@@
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 329-407: ProcessPendingSigShares can crash when quorums.at(...) is
missing; before calling quorums.at(std::make_pair(sigShare.getLlmqType(),
sigShare.getQuorumHash())) in ProcessPendingSigShares, check quorums.find(key)
!= quorums.end() (use the pair key) and handle the missing-quorum case by
logging a warning and skipping that sigShare (or skipping the rest of that
node's shares) instead of using at(); reference the sigSharesByNodes, quorums,
ProcessPendingSigShares and CollectPendingSigSharesToVerify symbols so you add
the guard right where the quorum lookup happens and avoid dereferencing a
missing quorum entry.
♻️ Duplicate comments (3)
src/llmq/net_signing.cpp (1)

292-323: Guard against unbounded worker_pool task buildup.
This is the same issue noted previously: the dispatcher can enqueue a helper every tick under sustained load. Please gate with a single “worker active” flag.

src/llmq/signing_shares.cpp (2)

520-539: Avoid potential out_of_range on quorums.at(...).
This repeats the earlier concern: if the quorum map is incomplete, at() will throw. Please guard with find() and skip/log missing quorums.

✅ Suggested guard

```diff
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);
```

1345-1355: Fix missing std:: qualification (compile error).
Line 1347 uses vector<NodeId> without a std:: qualifier or using declaration.

✅ Minimal fix

```diff
-    vector<NodeId> nodes;
+    std::vector<NodeId> nodes;
```
```cpp
void NetSigning::NotifyRecoveredSig(const std::shared_ptr<const CRecoveredSig>& sig, bool proactive_relay)
{
    m_peer_manager->PeerRelayRecoveredSig(*sig, proactive_relay);
}

bool NetSigning::ProcessPendingSigShares()
{
    std::unordered_map<NodeId, std::vector<CSigShare>> sigSharesByNodes;
    std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher> quorums;

    const size_t nMaxBatchSize{32};
    bool more_work = m_shares_manager->CollectPendingSigSharesToVerify(nMaxBatchSize, sigSharesByNodes, quorums);
    if (sigSharesByNodes.empty()) {
        return false;
    }

    // It's ok to perform insecure batched verification here as we verify against the quorum public key shares,
    // which are not craftable by individual entities, making the rogue public key attack impossible
    CBLSBatchVerifier<NodeId, SigShareKey> batchVerifier(false, true);

    cxxtimer::Timer prepareTimer(true);
    size_t verifyCount = 0;
    for (const auto& [nodeId, v] : sigSharesByNodes) {
        for (const auto& sigShare : v) {
            if (m_sig_manager.HasRecoveredSigForId(sigShare.getLlmqType(), sigShare.getId())) {
                continue;
            }

            // Materialize the signature once. Get() internally validates, so if it returns an invalid signature,
            // we know it's malformed. This avoids calling Get() twice (once for IsValid(), once for PushMessage).
            CBLSSignature sig = sigShare.sigShare.Get();
            // we didn't check this earlier because we use a lazy BLS signature and tried to avoid doing the expensive
            // deserialization in the message thread
            if (!sig.IsValid()) {
                BanNode(nodeId);
                // don't process any additional shares from this node
                break;
            }

            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
            auto pubKeyShare = quorum->GetPubKeyShare(sigShare.getQuorumMember());

            if (!pubKeyShare.IsValid()) {
                // this should really not happen (we already ensured we have the quorum vvec,
                // so we should also be able to create all pubkey shares)
                LogPrintf("NetSigning::%s -- pubKeyShare is invalid, which should not be possible here\n", __func__);
                assert(false);
            }

            batchVerifier.PushMessage(nodeId, sigShare.GetKey(), sigShare.GetSignHash(), sig, pubKeyShare);
            verifyCount++;
        }
    }
    prepareTimer.stop();

    cxxtimer::Timer verifyTimer(true);
    batchVerifier.Verify();
    verifyTimer.stop();

    LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- verified sig shares. count=%d, pt=%d, vt=%d, nodes=%d\n", __func__,
             verifyCount, prepareTimer.count(), verifyTimer.count(), sigSharesByNodes.size());

    for (const auto& [nodeId, v] : sigSharesByNodes) {
        if (batchVerifier.badSources.count(nodeId) != 0) {
            LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- invalid sig shares from other node, banning peer=%d\n",
                     __func__, nodeId);
            // this will also cause re-requesting of the shares that were sent by this node
            BanNode(nodeId);
            continue;
        }

        auto rec_sigs = m_shares_manager->ProcessPendingSigShares(v, quorums);
        for (auto& rs : rec_sigs) {
            ProcessRecoveredSig(rs, true);
        }
    }

    return more_work;
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Find the implementation of CollectPendingSigSharesToVerify
rg -n "CollectPendingSigSharesToVerify" --type cpp --type h
```

Repository: dashpay/dash
Length of output: 606

🏁 Script executed:

```bash
#!/bin/bash
# Also check the shares_manager header for method signature
fd -t h shares_manager | head -5
```

Repository: dashpay/dash
Length of output: 224

🏁 Script executed:

```bash
#!/bin/bash
# Find where CollectPendingSigSharesToVerify is implemented
rg -A 30 "CollectPendingSigSharesToVerify\s*\(" src/llmq --type cpp -B 2
```

Repository: dashpay/dash
Length of output: 7909

🏁 Script executed:

```bash
#!/bin/bash
# Read the full implementation of CollectPendingSigSharesToVerify
sed -n '440,550p' src/llmq/signing_shares.cpp | cat -n
```

Repository: dashpay/dash
Length of output: 5861
Guard against missing quorums when processing signature shares.
CollectPendingSigSharesToVerify populates sigSharesByNodes first, then attempts to build a quorums map. If a quorum lookup fails partway through (lines 69-72 of signing_shares.cpp), the function returns false with retSigShares already populated. Back in ProcessPendingSigShares, the check if (sigSharesByNodes.empty()) can pass even though quorums.at(...) lacks required entries, causing a crash.
Suggested guard

```diff
-            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            auto qit = quorums.find(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            if (qit == quorums.end()) {
+                LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                         __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+                continue;
+            }
+            auto quorum = qit->second;
```
+ auto quorum = qit->second;🤖 Prompt for AI Agents
In `@src/llmq/net_signing.cpp` around lines 329 - 407, ProcessPendingSigShares can
crash when quorums.at(...) is missing; before calling
quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash())) in
ProcessPendingSigShares, check quorums.find(key) != quorums.end() (use the pair
key) and handle the missing-quorum case by logging a warning and skipping that
sigShare (or skipping the rest of that node's shares) instead of using at();
reference the sigSharesByNodes, quorums, ProcessPendingSigShares and
CollectPendingSigSharesToVerify symbols so you add the guard right where the
quorum lookup happens and avoid dereferencing a missing quorum entry.
610de96 to 2c000de (force-push)
Caller of ProcessMessage is moved inside NetSigning handler, but logic still here
… chain that has been active during its creation
…_shares and cleanup
…rifySigShareQuorum and banning helper. It helps simplify logic that had been over-complicated by returning two boolean flags (one meaning failure, the other a node ban). Also removes code duplicated with ProcessMessageSigShare.
…ingSharesManager. They have no external calls outside of signing_shares
2c000de to 119d7a9 (force-push)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/llmq/net_signing.cpp`:
- Around line 349-366: The loop must skip all shares from nodes that sent
malformed signatures so they never reach ProcessPendingSigShares(); add a local
bad-node tracker (e.g., std::unordered_set<NodeId> badNodes) and at the start of
the outer loop for sigSharesByNodes check if nodeId is in badNodes and continue
if so; when detecting a malformed signature (after CBLSSignature sig =
sigShare.sigShare.Get() and !sig.IsValid()), call BanNode(nodeId) and insert
nodeId into badNodes before breaking the inner loop so no further shares from
that node are processed or forwarded to ProcessPendingSigShares().
♻️ Duplicate comments (3)
src/llmq/net_signing.cpp (2)

292-323: Guard against unbounded worker_pool task buildup.
The dispatcher can enqueue a helper on every tick while work remains, which can grow the queue indefinitely under sustained load. Gate this with a single “worker active” flag.

Suggested guard to avoid queue growth

```diff
diff --git a/src/llmq/net_signing.h b/src/llmq/net_signing.h
@@
-#include <thread>
+#include <thread>
+#include <atomic>
@@
     std::thread shares_cleaning_thread;
     std::thread shares_dispatcher_thread;
     mutable ctpl::thread_pool worker_pool;
+    std::atomic<bool> sigshare_worker_active{false};
diff --git a/src/llmq/net_signing.cpp b/src/llmq/net_signing.cpp
@@ void NetSigning::WorkThreadDispatcher()
-        if (m_shares_manager->IsAnyPendingProcessing()) {
-            // If there's processing work, spawn a helper worker
-            worker_pool.push([this](int) {
-                while (!workInterrupt) {
-                    bool moreWork = ProcessPendingSigShares();
-
-                    if (!moreWork) {
-                        return; // No work found, exit immediately
-                    }
-                }
-            });
-        }
+        if (m_shares_manager->IsAnyPendingProcessing()) {
+            bool expected{false};
+            if (sigshare_worker_active.compare_exchange_strong(expected, true)) {
+                // If there's processing work, spawn a single helper worker
+                worker_pool.push([this](int) {
+                    while (!workInterrupt) {
+                        bool moreWork = ProcessPendingSigShares();
+                        if (!moreWork) break;
+                    }
+                    sigshare_worker_active.store(false);
+                });
+            }
+        }
```

368-401: Protect quorum lookup against missing entries.
CollectPendingSigSharesToVerify() can return false after partially filling sigSharesByNodes; quorums.at(...) can then throw. Guard with find() and skip when missing to avoid crashes.

Guard missing quorum entries

```diff
-            auto quorum = quorums.at(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            auto qit = quorums.find(std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash()));
+            if (qit == quorums.end()) {
+                LogPrint(BCLog::LLMQ_SIGS, "NetSigning::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                         __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+                continue;
+            }
+            auto quorum = qit->second;
```

src/llmq/signing_shares.cpp (1)

515-527: Avoid quorums.at() when the map can be incomplete.
Defensive lookup prevents std::out_of_range if the quorum map is missing an entry.

Safe lookup

```diff
-        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
+        auto it = quorums.find(quorumKey);
+        if (it == quorums.end()) {
+            LogPrintf("CSigSharesManager::%s -- missing quorum for llmqType=%d quorumHash=%s\n",
+                      __func__, ToUnderlying(sigShare.getLlmqType()), sigShare.getQuorumHash().ToString());
+            continue;
+        }
+        auto rs = ProcessSigShare(sigShare, it->second);
```
```cpp
    cxxtimer::Timer prepareTimer(true);
    size_t verifyCount = 0;
    for (const auto& [nodeId, v] : sigSharesByNodes) {
        for (const auto& sigShare : v) {
            if (m_sig_manager.HasRecoveredSigForId(sigShare.getLlmqType(), sigShare.getId())) {
                continue;
            }

            // Materialize the signature once. Get() internally validates, so if it returns an invalid signature,
            // we know it's malformed. This avoids calling Get() twice (once for IsValid(), once for PushMessage).
            CBLSSignature sig = sigShare.sigShare.Get();
            // we didn't check this earlier because we use a lazy BLS signature and tried to avoid doing the expensive
            // deserialization in the message thread
            if (!sig.IsValid()) {
                BanNode(nodeId);
                // don't process any additional shares from this node
                break;
            }
```
Skip processing shares from nodes that send malformed signatures.
A malformed signature triggers BanNode, but the batch still gets processed. Mark the node as a bad source so unverified shares don’t reach ProcessPendingSigShares().
Minimal fix to skip malformed sources

```diff
-            if (!sig.IsValid()) {
-                BanNode(nodeId);
-                // don't process any additional shares from this node
-                break;
-            }
+            if (!sig.IsValid()) {
+                BanNode(nodeId);
+                batchVerifier.badSources.emplace(nodeId);
+                // don't process any additional shares from this node
+                break;
+            }
```
@CodeRabbit, is this your invention that changes logic, or a bug I introduced? Please note that this PR is pure refactoring (except for replacing chainstate with chainman).
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Search for the original implementation pattern in signing_shares.cpp
# Looking for similar BLS signature validation and batchVerifier usage
# First, let's see the git history for this specific pattern
git log -p --all -S "sigShare.sigShare.Get()" -- "*.cpp" | head -200
# Also search for the pattern with IsValid and break in signing_shares
git log -p --all -S "sig.IsValid()" -- src/llmq/signing_shares.cpp | head -300
```

Length of output: 60

🏁 Script executed:

```bash
#!/bin/bash
# Search for the original implementation pattern in signing_shares.cpp
# Looking for similar BLS signature validation and batchVerifier usage
# Search for badSources usage in the codebase history
git log -p --all -S "badSources" -- "*.cpp" | head -400
```

Length of output: 59

🏁 Script executed:

```bash
#!/bin/bash
# Search for badSources usage in signing_shares.cpp
rg "badSources" src/llmq/signing_shares.cpp -A 5 -B 5
# Search for the pattern where sig validation happens
rg "sig\.IsValid\(\)" src/llmq/ -A 3 -B 3
```

Length of output: 903

🏁 Script executed:

```bash
#!/bin/bash
# Look at the current implementation in signing_shares.cpp to understand the refactoring
cat src/llmq/signing_shares.cpp | grep -A 30 "ProcessPendingSigShares"
# Check if there was previous batch verification logic
git log --all --oneline --grep="batch" -- src/llmq/signing_shares.cpp | head -20
# Search for commit that introduced this ProcessPendingSigShares in net_signing.cpp
git log --oneline -n 30 -- src/llmq/net_signing.cpp
```

Length of output: 2713

🏁 Script executed:

```bash
#!/bin/bash
# Check the commit that moved ProcessPendingSigShares to see if batch verification was there before
git show cc31aee94d --stat
# Show the actual changes in that commit
git show cc31aee94d -- src/llmq/net_signing.cpp | head -200
```

Length of output: 5031

🏁 Script executed:

```bash
#!/bin/bash
# Check the parent commit to see if this pattern existed before the move
git show cc31aee94d~1:src/llmq/signing_shares.cpp | grep -A 80 "ProcessPendingSigShares"
```

Length of output: 6703
@knst, you're absolutely right, and I apologize for not being clear. I checked the git history and found that this exact pattern existed in the original code before your refactoring (in signing_shares.cpp at commit cc31aee~1):

```cpp
if (!sig.IsValid()) {
    BanNode(nodeId);
    // don't process any additional shares from this node
    break;
}
```

Your refactoring correctly preserved this behavior when moving the code to net_signing.cpp. This is not a bug you introduced – it's a pre-existing pattern.
Since your PR is pure refactoring (moving code without changing logic), my suggestion to add badSources.emplace(nodeId) would indeed introduce a logic change, which is inappropriate for a refactoring PR.
If this pattern should be improved (ensuring banned nodes' shares never reach the final processing loop), it should be done in a separate PR focused on that logic change, not mixed into your refactoring work.
Thank you for the correction – the issue is my invention as a potential improvement, not a bug in your refactoring.
🧬 Code Graph Analysis Results
[src/llmq/net_signing.h] (lines 46-48)

```cpp
void Start() override;
void Stop() override;
void Interrupt() override { workInterrupt(); }
```
[src/llmq/signing_shares.cpp] (lines 515-533)

```cpp
std::vector<std::shared_ptr<CRecoveredSig>> CSigSharesManager::ProcessPendingSigShares(
    const std::vector<CSigShare>& sigSharesToProcess,
    const std::unordered_map<std::pair<Consensus::LLMQType, uint256>, CQuorumCPtr, StaticSaltedHasher>& quorums)
{
    cxxtimer::Timer t(true);
    std::vector<std::shared_ptr<CRecoveredSig>> recovered_sigs;
    for (const auto& sigShare : sigSharesToProcess) {
        auto quorumKey = std::make_pair(sigShare.getLlmqType(), sigShare.getQuorumHash());
        auto rs = ProcessSigShare(sigShare, quorums.at(quorumKey));
        if (rs != nullptr) {
            recovered_sigs.emplace_back(std::move(rs));
        }
    }
    t.stop();
    LogPrint(BCLog::LLMQ_SIGS, "CSigSharesManager::%s -- processed sigShare batch. shares=%d, time=%ds\n", __func__,
             sigSharesToProcess.size(), t.count());
    return recovered_sigs;
}
```
[src/llmq/signing.cpp] (lines 479-518) [docstring-style summary]
- Function: CSigningManager::ProcessRecoveredSig(const std::shared_ptr& recoveredSig)
- Parameters:
- recoveredSig: shared_ptr to a CRecoveredSig to process.
- Behavior:
- Checks if the recovered signature is already known in the database; if so, returns false.
- Computes the signHash from recoveredSig and logs details (signHash, id, msgHash).
- If a recovered sig with the same llmqType and quorum hash and same id already exists with a different signHash, logs a diagnostic message (conflicting recoveredSig) but returns false.
- If an existing recovered sig by id exists but is not compatible, logs a diagnostic and returns false.
- Writes the recovered signature to the database, erases any pending recovered sig from in-memory pending set, and returns true.
- Notes:
- Handles cases where a recSig might arrive via multiple paths (regular QRECSIG vs. reconstructable recSig).
- Includes synchronization with database and in-memory pending structures.
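To make the summarized flow concrete, here is a compilable toy sketch reconstructed from the summary above. It is not the real CSigningManager code: RecSig, SigDb, and the pending set are simplified stand-ins, and the real method also logs signHash details and handles the reconstructed-recSig path.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-ins; the real code uses CRecoveredSig, a persistent sig database, etc.
struct RecSig {
    int llmqType;
    std::string quorumHash, id, msgHash;
    std::string SignHash() const { return quorumHash + "/" + id + "/" + msgHash; }
};

struct SigDb {
    std::map<std::pair<int, std::string>, std::string> byId; // (llmqType, id) -> msgHash
    bool HasExact(const RecSig& s) const {
        auto it = byId.find({s.llmqType, s.id});
        return it != byId.end() && it->second == s.msgHash;
    }
    bool HasForId(const RecSig& s) const { return byId.count({s.llmqType, s.id}) != 0; }
    void Write(const RecSig& s) { byId[{s.llmqType, s.id}] = s.msgHash; }
};

std::set<std::string> pendingSignHashes; // in-memory pending recovered sigs

bool ProcessRecoveredSigSketch(SigDb& db, const RecSig& sig)
{
    if (db.HasExact(sig)) return false; // already known -> nothing to do
    if (db.HasForId(sig)) {             // same id, different msgHash -> conflict, diagnostic only
        std::cout << "conflicting recSig for id=" << sig.id << '\n';
        return false;
    }
    db.Write(sig);                           // persist the recovered signature
    pendingSignHashes.erase(sig.SignHash()); // drop the matching in-memory pending entry
    return true;
}

int main()
{
    SigDb db;
    RecSig a{1, "q", "id1", "m1"};
    RecSig b{1, "q", "id1", "m2"}; // same id, different msgHash
    std::cout << ProcessRecoveredSigSketch(db, a) << '\n'; // 1: newly stored
    std::cout << ProcessRecoveredSigSketch(db, a) << '\n'; // 0: already known
    std::cout << ProcessRecoveredSigSketch(db, b) << '\n'; // 0: conflict
}
```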
🧠 Learnings used
Learnt from: knst
Repo: dashpay/dash PR: 7113
File: src/net_processing.cpp:2369-2371
Timestamp: 2026-01-22T08:03:59.296Z
Learning: Dash PR `#7113`: For CLSIG handling, src/net_processing.cpp intentionally uses ChainlockHandler::AlreadyHave (backed by a seenChainLocks cache) to avoid re-requesting older chainlocks, while GetData(… MSG_CLSIG …) serves only the current best via chainlock::Chainlocks::GetChainLockByHash. The mismatch is by design because the node only propagates the best CLSIG and discards older ones; NOTFOUND for old CLSIG requests is acceptable.
Learnt from: kwvg
Repo: dashpay/dash PR: 6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR `#6761`, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.
Learnt from: knst
Repo: dashpay/dash PR: 6692
File: src/llmq/blockprocessor.cpp:217-224
Timestamp: 2025-08-19T14:57:31.801Z
Learning: In PR `#6692`, knst acknowledged a null pointer dereference issue in ProcessBlock() method where LookupBlockIndex may return nullptr but is passed to gsl::not_null, and created follow-up PR `#6789` to address it, consistent with avoiding scope creep in performance-focused PRs.
Learnt from: kwvg
Repo: dashpay/dash PR: 6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request `#6543` is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6933
File: src/llmq/utils.cpp:284-298
Timestamp: 2025-11-04T18:24:27.241Z
Learning: In consensus-critical code (such as quorum formation, block validation, or deployment activation logic), do not suggest changes to the logic itself even if the implementation appears theoretically incorrect or off-by-one. Consensus rules, once deployed on the Dash network, must be preserved exactly to avoid network forks. Refactoring PRs should maintain perfect behavioral equivalence. Only suggest logic changes if explicitly accompanied by a DIP (Dash Improvement Proposal) or if the maintainer indicates the consensus rule needs to be changed with appropriate activation logic.
Learnt from: knst
Repo: dashpay/dash PR: 6916
File: src/univalue/include/univalue.h:81-88
Timestamp: 2025-10-25T07:08:51.918Z
Learning: For backport PRs from bitcoin/bitcoin, bitcoin-core/gui, etc., backported changes should match the original upstream PRs even if they appear strange, modify vendored code, or seem to violate coding guidelines. Still flag genuine issues like bugs, undefined behavior, crashes, compilation errors, or linter failures.
…NodesIf(predicate) method
- Single lock acquisition instead of O(n) locks
- No intermediate vector allocation
- No redundant map lookups
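A hedged sketch of what such a predicate-based accessor could look like; the actual signature and class in the PR may differ, and ConnectionPool and Node here are illustrative stand-ins, not Dash's CConnman/CNode:

```cpp
#include <iostream>
#include <mutex>
#include <vector>

struct Node { int id; bool verified; }; // stand-in for CNode / per-node state

class ConnectionPool {
    mutable std::mutex m_nodes_mutex;
    std::vector<Node> m_nodes;

public:
    void Add(Node n)
    {
        std::lock_guard<std::mutex> lock(m_nodes_mutex);
        m_nodes.push_back(n);
    }

    // One lock acquisition for the whole scan: predicate and visitor both run inside a
    // single critical section, avoiding a copied-out id vector, per-node re-locking
    // (O(n) lock acquisitions), and the redundant per-id map lookups those require.
    template <typename Pred, typename Func>
    void ForEachNodeIf(Pred&& pred, Func&& func) const
    {
        std::lock_guard<std::mutex> lock(m_nodes_mutex);
        for (const Node& node : m_nodes) {
            if (pred(node)) func(node);
        }
    }
};

int main()
{
    ConnectionPool pool;
    pool.Add({1, true});
    pool.Add({2, false});
    pool.Add({3, true});
    pool.ForEachNodeIf([](const Node& n) { return n.verified; },
                       [](const Node& n) { std::cout << "relay to node " << n.id << '\n'; });
}
```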
Issue being fixed or feature implemented
Separation of consensus & chain code from network & node code in Dash Core is blocked by the tight coupling of network and consensus logic in components such as llmq::CSigningManager, llmq::CSigSharesManager, coinjoin::client, coinjoin::server, governance/, llmq::CInstantSendManager, etc.
These dependencies on PeerManager block backports from Bitcoin related to the 'kernel' project and create circular dependencies through net_processing; they are also a blocker for bitcoin#25144.
What was done?
This PR addresses the dependency of llmq::CSigSharesManager on PeerManager; it is a continuation of #6992, which addressed llmq::CSigningManager.
It is split out from proof-of-concept PR #6934. Due to multiple other changes (review comments on prior PRs, ActiveContext-related refactorings, proactive signature pushing), the implementation does not fully match the original #6934; the patchset was re-created from scratch.
Prior work:
NOTE: this PR is not a direct blocker of chainstate / kernel, since CSigSharesManager was already aggregated into the masternode's ActiveContext by kwvg, see #6841.
How Has This Been Tested?
Run unit & functional tests
Run linter lint-circular-dependencies.py
Breaking Changes
N/A
Checklist: