multi: add BuildOnion, SendOnion, and TrackOnion RPCs #9489

calvinrzachman · 2025-02-08T03:11:04Z

Change Description

We add a new switchrpc RPC sub-system with SendOnion, BuildOnion, and TrackOnion RPCs. This allows the daemon to offload path-finding, onion construction and payment life-cycle management to an external entity (such as a remotely instantiated ChannelRouter type) and instead accept onion payments for direct delivery to the network.

Almost all the new functionality is hidden by a non-default build tag called switchrpc.
Opted for a very slim wrapper around direct delivery of UpdateAddHTLC to the HTLCSwitch for forwarding, eg: no extra tracking by way of ChannelRouter and the ControlTower structures. This may be suitable given intended use by a remote server with an instantiated ChannelRouter component which will perform this payment attempt life-cycle tracking centrally for a collection of backing lnd instances. NOTE: This would allow for the deployment of a slimmed down lnd instance which does not contain any routing components in the future.
Error information is communicated via RPC protobuf message directly such that the error types can be recreated client side if desired - as would be the case if this RPC is used by a remotely instantiated ChannelRouter .
This RPC could be used to implement an "oblivious send" in which a client submits onions via RPC to a hosted node provider such that the node provider does not know to whom the onion is going.

Avoiding Duplicate Payment Attempts

We are making send/track(onion) requests which traverse an async and unreliable network. Clients which use these RPCs to make decisions about whether to make additional payment attempts run the risk of a race/re-ordering of request processing misleading them into making a re-attempt when such a re-attempt is not safe to make. We'd like to prevent duplicate payment attempts and unintentional loss of funds by RPC clients.

Consider the following scenario:

Client calls SendOnion with attempt ID A. Client receives a gRPC DeadlineExceeded or service Unavailable error and is unable to distinguish between the request never reaching the server (eg: the server is offline --> safe to re-attempt via a different server) and the server receiving the request and being unable to respond in time.
Client calls TrackOnion with attempt ID A.
TrackOnion indicates no HTLC for attempt ID A, so the client makes another attempt and calls SendOnion again with a different attempt ID B.
The delayed SendOnion for attempt ID A executes.
SendOnion for attempt ID B executes. We have leaked an attempt and unintentionally overpaid!

One approach involves idempotent SendOnion implementation combined with an RPC client which persists acknowledgement of successful onion/HTLC receipt and dispatch from the server. The client can wait, retrying if necessary, until it gets an explicit acknowledgment from the server about the HTLC’s status before ever calling TrackOnion.
- To handle restarts, it must persist this acknowledgement and differentiate between ACK’d and UNACK’d attempts, handling them differently on startup. NOTE: This would require ChannelRouter changes for how we intend to use these RPCs ⚠️
The server acknowledgement in the previous approach is what allows a careful RPC caller to guarantee ordering between “send” and “track” for the same attempt ID even with an unreliable network. “Send” will always complete before “track”, thereby removing the risk of duplicate payment.
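
The client-side ordering described above can be sketched roughly as follows. This is a minimal illustration only: `OnionSender`, `AckStore`, `ErrUnavailable` and `DispatchUntilAcked` are hypothetical names rather than part of this PR, the in-memory map stands in for whatever persistence the client would really need, and the sketch assumes SendOnion is idempotent for a given attempt ID.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable is a hypothetical stand-in for an ambiguous transport
// failure such as gRPC DeadlineExceeded or Unavailable.
var ErrUnavailable = errors.New("transport unavailable")

// OnionSender is a hypothetical stand-in for an idempotent SendOnion client.
type OnionSender interface {
	SendOnion(attemptID uint64) error
}

// AckStore is a hypothetical record of acknowledged attempts. A real client
// would persist this so ACK'd and UNACK'd attempts can be told apart on
// startup.
type AckStore map[uint64]bool

// DispatchUntilAcked retries SendOnion with the SAME attempt ID until the
// server acknowledges it. Only after this returns nil is it safe to track.
func DispatchUntilAcked(s OnionSender, acks AckStore, attemptID uint64,
	maxTries int) error {

	for i := 0; i < maxTries; i++ {
		err := s.SendOnion(attemptID)
		if err == nil {
			acks[attemptID] = true
			return nil
		}

		// Any non-ambiguous error is a definitive answer; stop retrying.
		if !errors.Is(err, ErrUnavailable) {
			return err
		}
	}

	return fmt.Errorf("attempt %d unacknowledged after %d tries",
		attemptID, maxTries)
}

// flakySender fails the first `failures` calls, then succeeds.
type flakySender struct{ failures, calls int }

func (f *flakySender) SendOnion(uint64) error {
	f.calls++
	if f.calls <= f.failures {
		return ErrUnavailable
	}
	return nil
}

func main() {
	acks := AckStore{}
	sender := &flakySender{failures: 2}
	err := DispatchUntilAcked(sender, acks, 7, 5)
	fmt.Println("err:", err, "acked:", acks[7], "calls:", sender.calls)
}
```

The key point is that the retry loop never changes the attempt ID, so an ambiguous failure cannot turn into a second, distinct attempt.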
The other alternative is to make this protective "send then track" enforcement the responsibility of the server. The server can prevent sends for an attempt ID which has already been tracked. The client will just have to try again with a different attempt ID. NOTE: This avoids ChannelRouter changes for how we intend to use these RPCs ⚠️
- The server can make this determination by way of a store consulted by both SendOnion and TrackOnion. Both must write to the store so there can be communication. If TrackOnion only reads, then the potential for a race or unsafe order of execution still exists.
- NOTE: This PR currently takes this approach!
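
A rough sketch of the server-side "send then track" enforcement, assuming a shared store that both calls write to. `AttemptGuard`, `MarkSend`, `MarkTrack` and `ErrDuplicateAttempt` are hypothetical names, and the in-memory map is for illustration only (it would not survive a restart).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrDuplicateAttempt is a hypothetical error returned when an attempt ID
// has already been touched by a send or a track.
var ErrDuplicateAttempt = errors.New("attempt ID already used")

// AttemptGuard is a hypothetical store consulted AND written by both the
// send and the track paths. The map value records whether a send happened.
type AttemptGuard struct {
	mu   sync.Mutex
	sent map[uint64]bool
}

func NewAttemptGuard() *AttemptGuard {
	return &AttemptGuard{sent: make(map[uint64]bool)}
}

// MarkSend claims the attempt ID for a send. It fails if the ID was already
// sent or already tracked.
func (g *AttemptGuard) MarkSend(id uint64) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	if _, used := g.sent[id]; used {
		return ErrDuplicateAttempt
	}
	g.sent[id] = true

	return nil
}

// MarkTrack reports whether a send was seen for the ID. If none was seen it
// still writes an entry, so a late-arriving send for this ID is rejected.
func (g *AttemptGuard) MarkTrack(id uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()

	wasSent, used := g.sent[id]
	if !used {
		g.sent[id] = false
	}

	return wasSent
}

func main() {
	g := NewAttemptGuard()

	// Track for A arrives before the delayed send for A.
	fmt.Println("track A saw send:", g.MarkTrack(1))
	fmt.Println("late send A:", g.MarkSend(1))
	fmt.Println("send B:", g.MarkSend(2))
}
```

Because the track path writes a tombstone for unknown IDs, the reordered scenario above ends with the late send for A being rejected instead of producing a second in-flight HTLC.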

Future

Consider improvements to the Switch network result store.
- Allow remote maintenance of the data in this store.
- Implement duplicate protection for SendOnion which survives restarts, possibly via InitAttempt style method on the Switch store. All duplicates with same attempt ID would be rejected until the result for that attempt ID has been read and cleaned from the result store. Then the attempt ID can be freed for re-use.
Generalized concept of HTLC attempt ID (whether tuple or server generated) which permits multiple remote clients dispatching payments via SendOnion style RPC. Each RPC client should be able to clean only the attempt results relevant to it from the Switch's network result store. This would also allow lnd to be used both by remote clients dispatching payments via SendOnion and by more traditional clients via SendPaymentV2 at the same time.
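
The InitAttempt-style life cycle mentioned above might look roughly like this. It is a sketch under assumptions: `AttemptStore` and its method names are hypothetical, and a map replaces the persistent storage that would be needed to survive restarts.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// Hypothetical errors for this sketch.
	ErrAttemptExists  = errors.New("attempt ID already initialized")
	ErrAttemptUnknown = errors.New("attempt ID not initialized")
)

type attemptState int

const (
	statePending attemptState = iota
	stateResolved
)

// AttemptStore is a hypothetical stand-in for the Switch's network result
// store extended with an InitAttempt-style checkpoint.
type AttemptStore struct {
	attempts map[uint64]attemptState
}

func NewAttemptStore() *AttemptStore {
	return &AttemptStore{attempts: make(map[uint64]attemptState)}
}

// InitAttempt checkpoints the attempt before dispatch. Duplicates are
// rejected whether the original is pending or already resolved.
func (s *AttemptStore) InitAttempt(id uint64) error {
	if _, ok := s.attempts[id]; ok {
		return ErrAttemptExists
	}
	s.attempts[id] = statePending

	return nil
}

// StoreResult records that a settle/fail result arrived for the attempt.
func (s *AttemptStore) StoreResult(id uint64) error {
	if _, ok := s.attempts[id]; !ok {
		return ErrAttemptUnknown
	}
	s.attempts[id] = stateResolved

	return nil
}

// CleanResult frees the attempt ID for re-use, but only once its result has
// been stored. It reports whether the ID was freed.
func (s *AttemptStore) CleanResult(id uint64) bool {
	if st, ok := s.attempts[id]; !ok || st != stateResolved {
		return false
	}
	delete(s.attempts, id)

	return true
}

func main() {
	s := NewAttemptStore()
	fmt.Println("init:", s.InitAttempt(1))
	fmt.Println("duplicate init:", s.InitAttempt(1))
	fmt.Println("store result:", s.StoreResult(1))
	fmt.Println("init after result:", s.InitAttempt(1))
	fmt.Println("cleaned:", s.CleanResult(1))
	fmt.Println("init after clean:", s.InitAttempt(1))
}
```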

coderabbitai · 2025-02-08T03:11:10Z

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)

llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

bitromortac

Awesome work 🎉, this will be very useful! I've only started to look at the design and first commits, just leaving a few thoughts, but will continue to review. I think making SendOnion idempotent and repeatable is the safest option to lead to a TrackOnion endpoint that can be called at any time, to make client restarts simple (but only a preliminary conclusion). Is there an example somewhere of the switch RPC being consumed in a ChannelRouter (implementing retries)?

lnrpc/switchrpc/switch.proto

bitromortac · 2025-03-19T11:19:20Z

lnrpc/switchrpc/switch.proto

+
+    // The attempt ID uniquely identifying this payment attempt. The caller can
+    // expect to track results for the payment via this attempt ID.
+    uint64 attempt_id = 6;


Could we take the ephemeral key in the onion to track the onion uniquely instead of attempt id?

You present an interesting consideration 🤔 At a high level, it makes sense that some kind of ID must be used to allow the clients of a "send" RPC or function (whether payment, HTLC, or onion) to follow up with the result. For payments, this is payment_hash. For HTLCs, this has so far been a single uint64 sequence # style counter called attempt_id. The practice of using attempt_id here for onions, is from me following that pattern and from the information currently available to implementations of this interface:

type PaymentAttemptDispatcher interface {
	// SendHTLC is a function that directs a link-layer switch to
	// forward a fully encoded payment to the first hop in the route
	// denoted by its public key. A non-nil error is to be returned if the
	// payment was unsuccessful.
	SendHTLC(firstHop lnwire.ShortChannelID, attemptID uint64,
		htlcAdd *lnwire.UpdateAddHTLC) error

	// GetAttemptResult returns the result of the payment attempt with
	// the given attemptID. The paymentHash should be set to the payment's
	// overall hash, or in case of AMP payments the payment's unique
	// identifier.
	GetAttemptResult(attemptID uint64, paymentHash lntypes.Hash,
		deobfuscator htlcswitch.ErrorDecrypter) (
		<-chan *htlcswitch.PaymentResult, error)

The only current code that builds onions that I know about and which could submit onions via this endpoint is the ChannelRouter type, so the current RPC protobuf message fields were structured so as to help any potential re-user of the ChannelRouter type. It is possible that the ephemeral onion key makes more sense to use generally as a tracking ID here though, since it is a better fingerprint or more tightly bound to the onion itself.

I have started to be of the mind that lnd itself may need to change the way it handles attempt IDs a bit so as to facilitate multiple, independent users of a SendOnion style endpoint. You could imagine each RPC client generating its own attempt IDs - there would be the possibility of collision within the network result store used by the Switch.
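
To make the collision concern concrete, here is a small sketch of keying results by a (client, attempt ID) tuple instead of a bare uint64. Everything here is hypothetical (`AttemptKey`, `ResultStore`, `CleanClient`); the PR itself keeps the single uint64 attempt_id.

```go
package main

import "fmt"

// AttemptKey is a hypothetical tuple-style attempt identifier that scopes a
// client-chosen attempt ID to the client that generated it.
type AttemptKey struct {
	ClientID  string
	AttemptID uint64
}

// ResultStore is a hypothetical, in-memory stand-in for a result store keyed
// by the tuple above.
type ResultStore struct {
	results map[AttemptKey]string
}

func NewResultStore() *ResultStore {
	return &ResultStore{results: make(map[AttemptKey]string)}
}

func (s *ResultStore) Put(k AttemptKey, result string) {
	s.results[k] = result
}

func (s *ResultStore) Get(k AttemptKey) (string, bool) {
	r, ok := s.results[k]
	return r, ok
}

// CleanClient deletes only the results owned by clientID and returns how
// many entries were removed.
func (s *ResultStore) CleanClient(clientID string) int {
	removed := 0
	for k := range s.results {
		if k.ClientID == clientID {
			delete(s.results, k)
			removed++
		}
	}

	return removed
}

func main() {
	s := NewResultStore()

	// Two independent clients both pick attempt ID 1 without colliding.
	s.Put(AttemptKey{"router-a", 1}, "settled")
	s.Put(AttemptKey{"router-b", 1}, "failed")

	fmt.Println("removed for router-a:", s.CleanClient("router-a"))
	r, ok := s.Get(AttemptKey{"router-b", 1})
	fmt.Println("router-b result:", r, ok)
}
```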

bitromortac · 2025-03-19T11:29:55Z

lnrpc/switchrpc/switch_server.go

+	// NOTE(calvin): We'll either need to require clients provide the short
+	// channel ID to use as a first hop OR lookup an acceptable channel ID
+	// for the given first hop public key.


For the start I think using the channel id is better and gives more control over liquidity, it also reflects the API for sendpayment. But we could also have the option to specify a pubkey, not sure if that is more convenient in terms of the consumer side.

I've added in some commits to allow for sending via channel ID rather than pubkey. Let me know if the approach looks reasonable.

saubyk · 2025-03-21T02:56:28Z

cc: @positiveblue in case you're interested in reviewing this pr

bitromortac · 2025-05-09T09:40:50Z

Left a few comments in calvinrzachman#17, I think the approach in there looks good!

ellemouton

love the minimal change set!! ✨

mostly just style & structure comments. I think the new subserver is also missing a unit test.

Also just an overall note on commit structure: it would be better to only plug in the new server once it is complete & ready.

so i'd suggest the following structure:

any refactors required
add the new package, implement the logic and unit tests
add proto definitions
wrapper grpc service that implements the proto defs and calls the new logic.
now, plug the completed subserver into LND
now add itests

lnrpc/switchrpc/driver.go

lnrpc/switchrpc/switch_server_test.go

htlcswitch/interfaces.go

htlcswitch/switch_test.go

ellemouton · 2025-06-27T10:16:12Z

htlcswitch/switch.go

+		if deobfuscator == nil {
+			return &PaymentResult{
+				EncryptedError: htlc.Reason,
+			}, nil
+		}


i think it needs to be explained more clearly why this could be nil. ie, be explicit about the case we are handling - both in a comment in the code & in the commit message

There is brief mention in the godoc comment for the function. Updated to add a small comment in-line as well!
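
For readers following along, the nil case can be illustrated in isolation. This is a simplified sketch, not the code in htlcswitch: the `ErrorDecrypter` interface here has a made-up `Decrypt` signature, `FailureMessage` is a hypothetical field, and `toyDecrypter` does no real sphinx decryption. The idea shown is only that a nil decrypter means the caller wants the encrypted blob back untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorDecrypter is a simplified, hypothetical stand-in for the switch's
// error decrypter.
type ErrorDecrypter interface {
	Decrypt(reason []byte) (string, error)
}

// PaymentResult is a reduced stand-in for the switch's result type.
// EncryptedError mirrors the field in the diff; FailureMessage is assumed.
type PaymentResult struct {
	EncryptedError []byte
	FailureMessage string
}

// failureResult returns the raw encrypted failure when no decrypter is
// supplied, deferring decryption to the caller (for example a remote client
// that holds the session key). Otherwise it decrypts locally.
func failureResult(d ErrorDecrypter, reason []byte) (*PaymentResult, error) {
	if d == nil {
		return &PaymentResult{EncryptedError: reason}, nil
	}

	msg, err := d.Decrypt(reason)
	if err != nil {
		return nil, fmt.Errorf("unable to decrypt failure: %w", err)
	}

	return &PaymentResult{FailureMessage: msg}, nil
}

// toyDecrypter is a placeholder that performs no real cryptography.
type toyDecrypter struct{}

func (toyDecrypter) Decrypt(reason []byte) (string, error) {
	if len(reason) == 0 {
		return "", errors.New("empty failure reason")
	}
	return "decoded:" + string(reason), nil
}

func main() {
	deferred, _ := failureResult(nil, []byte("blob"))
	fmt.Printf("deferred: %q\n", deferred.EncryptedError)

	local, _ := failureResult(toyDecrypter{}, []byte("blob"))
	fmt.Println("local:", local.FailureMessage)
}
```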

ellemouton · 2025-07-02T11:30:53Z

itest/lnd_sendonion_test.go

+	// TODO(calvin): Other things to check:
+	// - Error conditions/handling (server handles with decryptor or caller
+	//   handles encrypted error blobs from server)
+	// - That we successfully convert pubkey --> channel when there are
+	//   multiple channels, some of which can carry the payment and other
+	//   which cannot.
+	// - Send the same onion again. Send the same onion again but mark it
+	//   with a different attempt ID.
+	//
+	// If we send again, our node does forward the onion but the first hop
+	// considers it a replayed onion.
+	// 2024-05-01 15:54:18.364 [ERR] HSWC: unable to process onion packet: sphinx packet replay attempted
+	// 2024-05-01 15:54:18.364 [ERR] HSWC: ChannelLink(a680b373941e2e056e7b98007cc8cee933331e28981474b34d4275bb94cd17fe:0): unable to decode onion hop iterator: InvalidOnionVersion
+	// 2024-05-01 15:54:18.364 [DBG] PEER: Peer(0352f454dd5e09cd3e979cbace6fc6727cfa9a1eaa878a452ce63b221f51771a74): Sending UpdateFailMalformedHTLC(chan_id=fe17cd94bb75424db3741498281e3333e9cec87c00987b6e052e1e9473b380a6, id=1, fail_code=InvalidOnionVersion) to 0352f454dd5e09cd3e979cbace6fc6727cfa9a1eaa878a452ce63b221f51771a74@127.0.0.1:63567
+	// If we randomize the payment hash, first hop says bad HMAC.
+	//


will this be addressed here?

Some of it will be handled in the TrackOnion and duplicate send onion tests. Removed the comment to clean things up a bit.

ellemouton · 2025-07-02T11:31:27Z

itest/lnd_sendonion_test.go

+func testTrackOnion(ht *lntest.HarnessTest) {
+	// Create a four-node context consisting of Alice, Bob and two new
+	// nodes: Carol and Dave. This will provide a 4 node, 3 channel topology.
+	// Alice will make a  channel with Bob, and Bob with Carol, and Carol


feels like it could just be part of the existing send onion test no?

It is possible that we could make everything into one big test. But I think there might be enough to TrackOnion to merit creating a separate test. For example we can either defer error decryption to the switchrpc server by supplying the ephemeral session key and hop public keys used to construct the onion, or we can handle the onion error decryption on the client side if we wish to for privacy or other reasons.

ellemouton · 2025-07-02T11:32:07Z

itest/lnd_sendonion_test.go

+	// require.Error(ht, err, "expected error when re-sending same onion with same attempt ID")
+	// // Assert that the error is a gRPC codes.AlreadyExists error.
+	// st, ok := status.FromError(err)
+	// require.True(ht, ok, "expected a gRPC status error")
+	// require.Equal(ht, codes.AlreadyExists, st.Code(), "expected AlreadyExists error code")
+	// // require.Contains(ht, st.Message(), "duplicate onion", "expected error message to indicate duplicate onion")


only add the code when it isnt commented out. can leave the todo

or, only add the test once the logic actually does the thing :)

ellemouton · 2025-07-02T11:32:20Z

itest/lnd_sendonion_test.go

 	// - Send different onion but with same attempt ID.
 }

+func testSendOnionTwice(ht *lntest.HarnessTest) {


test doc pls 🙏

also - can we not extend the existing test?

ellemouton · 2025-07-02T11:34:39Z

lnrpc/switchrpc/switch_server.go

+	// scenarios where network requests are reordered. If an attempt ID has
+	// already been used by either SendOnion or TrackOnion, SendOnion will
+	// return DUPLICATE_HTLC for that attempt ID.
+	usedAttemptIDs *roaring64.Bitmap


as noted offline, an in-mem solution is not enough to make something idempotent. Will need a persisted solution if we find that we indeed are at risk of duplicate attempts

Updated to remove this in-memory method as we can instead make use of an InitAttempt method to checkpoint some information about the attempt prior to sending it out to the network. That way, we'll have the means to deny subsequent initialization attempts. We can also bury this duplicate safety one layer deeper within the actual Switch itself. This seems somewhat analogous to the InitPayment concept within the Router.

calvinrzachman

Thanks for the thorough review! I updated the commit ordering as suggested and made sure to save hooking up the Switch RPC server into lnd until the end, just before the itests. Let me know if anything was not sufficiently addressed 🙏

calvinrzachman · 2025-07-28T21:39:45Z

lnrpc/switchrpc/switch_server.go

+			"unable to process shared secrets")
+	}
+
+	// NOTE(calvin): In order to decrypt errors server side we require


Ahh I think I got the idea to use this format from the TODOs. I'll remove that for all the NOTEs

lnrpc/switchrpc/switch_server_test.go

htlcswitch/switch_test.go

calvinrzachman · 2025-08-12T13:34:53Z

itest/lnd_sendonion_test.go

+	// NOTE(calvin): We may want our wrapper RPC client to allow errors
+	// through so that we can make some assertions about them in various
+	// scenarios.
+	// resp, err := alice.RPC.SendOnion(onionReq)
+	// require.NoError(ht, err, "unable to send payment via onion")


Thanks for the thorough read. Made a pass to cleanup stray comments generally.

calvinrzachman · 2025-08-12T13:35:32Z

itest/lnd_sendonion_test.go

+// const (
+// 	defaultTimeout = 30 * time.Second
+// )


removed 🧼

lnrpc/switchrpc/switch_server_test.go

This will allow a sub-system access to information about the state of a channel link such as forwarding bandwidth, eligibility, etc. while not permitting full control over link function.

Add RPC for dispatching payments via onions. The payment route and onion are computed by the caller and the onion is delivered to the server for forwarding. NOTE: The server does NOT process or peel the onion, so it is assumed that the onion will be constructed such that the first hop is encrypted to one of the server's channel partners.

These tests verify that internal errors from the htlcswitch (eg: ErrDuplicateAdd or ErrPaymentIDNotFound) are precisely translated into the specific error codes and messages defined in the `switch.proto` file. This is critical for the remote client, which relies on these exact signals to make important state decisions (e.g., whether to retry a payment). We also confirm that the server validates incoming requests and correctly rejects malformed or incomplete requests. This is important to do for externally provided input to the daemon, even if the users of this RPC server are trusted.
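
The kind of translation these tests exercise can be sketched as below. The sentinel names ErrDuplicateAdd and ErrPaymentIDNotFound come from the commit message, and DUPLICATE_HTLC appears in the review diff; the `ErrorCode` type, the other code names and `translateErr` are assumptions made for this illustration and are not the generated protobuf types.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the htlcswitch sentinel errors named in the commit
// message.
var (
	ErrDuplicateAdd      = errors.New("duplicate add")
	ErrPaymentIDNotFound = errors.New("payment ID not found")
)

// ErrorCode is a hypothetical stand-in for an error-code enum in switch.proto.
type ErrorCode int

const (
	CodeInternal ErrorCode = iota
	CodeDuplicateHTLC
	CodePaymentIDNotFound
)

func (c ErrorCode) String() string {
	switch c {
	case CodeDuplicateHTLC:
		return "DUPLICATE_HTLC"
	case CodePaymentIDNotFound:
		return "PAYMENT_ID_NOT_FOUND"
	default:
		return "INTERNAL"
	}
}

// translateErr maps an internal error onto a stable code a remote client can
// act on. errors.Is is used so wrapped errors still match.
func translateErr(err error) ErrorCode {
	switch {
	case errors.Is(err, ErrDuplicateAdd):
		return CodeDuplicateHTLC

	case errors.Is(err, ErrPaymentIDNotFound):
		return CodePaymentIDNotFound

	default:
		return CodeInternal
	}
}

func main() {
	wrapped := fmt.Errorf("unable to send htlc: %w", ErrDuplicateAdd)
	fmt.Println(translateErr(wrapped))
	fmt.Println(translateErr(ErrPaymentIDNotFound))
	fmt.Println(translateErr(errors.New("something else")))
}
```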

Allow the switch to defer error handling when callers of GetAttemptResult do not provide an error decrypter.

lnrpc/switchrpc/switch_server.go

ziggie1984 · 2025-12-21T09:02:26Z

lnrpc/switchrpc/switch.proto

+    repeated bytes hop_pubkeys = 4;
+}
+
+message TrackOnionResponse {


what about this? I think you did not use oneof at all; it might also make sense for the other new RPC structs:

message DecryptedError {
    string message = 1;
    ErrorCode code = 2;
}

message TrackOnionResponse {
    oneof result {
        bytes preimage = 1;                  // Success
        DecryptedError decrypted_error = 2;  // Decrypted failure
        bytes encrypted_error = 3;           // Encrypted failure
    }
}

ziggie1984

Thank you for answering all the previous comments, I still have some additional comments, especially that we maybe make the RPC API more clean by using the optional keyword where applicable and also using the oneof method when having multiple outcomes.

I think it is worth taking the time here for the RPC layer, because once shipped we will have all the work of deprecating stuff and cannot easily change the API

ziggie1984 · 2025-12-21T09:05:42Z

lnrpc/switchrpc/switch_server.go

+
+	// If neither is provided, the caller wants to handle decryption. This
+	// is a valid use case, so we return no decryptor and no error.
+	if !sessionKeyProvided && !hopPubkeysProvided {


same here: this can be accomplished if we make the rpc interface more strict by using oneof, or? Can we make the RPC api more strict so that we can skip all these checks here? @gemini-code-assist

That's a good suggestion to consider for making the RPC API more strict! The current implementation of buildErrorDecryptor already enforces the "both or neither" semantic for session_key and hop_pubkeys through explicit checks:

It first checks if both sessionKeyBytes and hopPubkeys are empty. If so, it correctly interprets this as the caller wanting to handle decryption themselves and returns no decryptor.

It then checks if only one of them is provided (sessionKeyProvided != hopPubkeysProvided). If this is the case, it returns an error, as both are required for server-side decryption.

While using oneof in the protobuf definition could visually group these fields, it wouldn't entirely eliminate the need for these checks in the Go code. For bytes and repeated bytes fields in protobuf3, an empty value (e.g., []byte{} or [][]byte{}) is still considered a "set" value by the oneof mechanism. Therefore, the len(field) > 0 checks would still be necessary within the chosen oneof option to ensure that the provided data is meaningful for decryption.

The current explicit checks in the Go code are clear and effectively handle the required logic for these optional fields.
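
The "both or neither" rule described in this reply can be sketched on its own as follows. This is not the actual buildErrorDecryptor implementation; `wantServerDecryption` and `errPartialDecryptInfo` are hypothetical names, and only the presence checks are shown.

```go
package main

import (
	"errors"
	"fmt"
)

// errPartialDecryptInfo is a hypothetical error for the case where only one
// of the two decryption inputs was supplied.
var errPartialDecryptInfo = errors.New(
	"session key and hop pubkeys must be provided together",
)

// wantServerDecryption reports whether the caller asked for server-side
// error decryption. Neither field set means the caller decrypts; exactly one
// set is an error; both set means the server decrypts.
func wantServerDecryption(sessionKey []byte, hopPubkeys [][]byte) (bool,
	error) {

	sessionKeyProvided := len(sessionKey) > 0
	hopPubkeysProvided := len(hopPubkeys) > 0

	// Neither provided: a valid request for client-side decryption.
	if !sessionKeyProvided && !hopPubkeysProvided {
		return false, nil
	}

	// Only one provided: server-side decryption needs both.
	if sessionKeyProvided != hopPubkeysProvided {
		return false, errPartialDecryptInfo
	}

	return true, nil
}

func main() {
	key := make([]byte, 32)
	hops := [][]byte{make([]byte, 33)}

	fmt.Println(wantServerDecryption(nil, nil))
	fmt.Println(wantServerDecryption(key, nil))
	fmt.Println(wantServerDecryption(key, hops))
}
```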

ziggie1984 · 2025-12-21T09:16:39Z

lnrpc/switchrpc/switch_server.go

+func validateSessionKey(sessionKeyBytes []byte) error {
+	const expectedKeyLength = 32
+
+	// Check length of session key.
+	if len(sessionKeyBytes) != expectedKeyLength {
+		return fmt.Errorf("invalid session key length: got %d, "+
+			"expected %d", len(sessionKeyBytes), expectedKeyLength)
+	}
+
+	// Interpret the key as a big-endian unsigned integer.
+	keyValue := new(big.Int).SetBytes(sessionKeyBytes)
+
+	// Check if the key is in the valid range [1, N-1].
+	if keyValue.Sign() <= 0 || keyValue.Cmp(btcec.S256().N) >= 0 {
+		return fmt.Errorf("session key is out of range")
+	}
+
+	return nil


This is definitely a violation of boundaries. Having such fine-grained crypto at this level of the code is not good design; we should rather update the crypto package and then do something like this:

// btcec library should have this
func ValidatePrivKeyBytes(pk []byte) error {
	if len(pk) != 32 {
		return ErrInvalidKeyLen
	}
	keyValue := new(big.Int).SetBytes(pk)
	if keyValue.Sign() <= 0 || keyValue.Cmp(S256().N) >= 0 {
		return ErrKeyOutOfRange
	}
	return nil
}

Then the domain layer (htlcswitch) should use it:

// In htlcswitch, not RPC layer
privKey, pubKey := btcec.PrivKeyFromBytes(sessionKeyBytes)
if err := btcec.ValidatePrivKeyBytes(sessionKeyBytes); err != nil {
	return fmt.Errorf("invalid session key: %w", err)
}

Yeah, I'm up to try to stick this into the lower level package. Should we defer that to a follow up PR so we don't need to wait on a new package release? I have a branch to update btcec here: calvinrzachman/btcd#1

followup is OK, but let's remove this check from this PR.

ziggie1984 · 2025-12-21T09:19:46Z

lnrpc/switchrpc/switch.proto

+    uint64 attempt_id = 2;
+
+    // Optional: Session key used to generate the onion/sphinx packet.
+    bytes session_key = 3;


would it make sense to use oneof from the grpc package for these two components which are optional @gemini-code-assist

That's an excellent suggestion! You're absolutely right that oneof is the perfect construct for first_hop_pubkey and first_hop_chan_id in the SendOnionRequest message.

Using oneof would formally enforce the mutual exclusivity that's currently handled by application logic, making the API contract clearer and preventing invalid states at the protobuf level. It also simplifies validation for clients and can lead to cleaner generated code.

I've prepared a code suggestion to implement this change in lnrpc/switchrpc/switch.proto:

message SendOnionRequest {
    // The raw onion packet to be sent across the network.
    bytes onion_blob = 1;

    // The attempt ID uniquely identifying this payment attempt. The caller can
    // expect to track results for the payment via this attempt ID.
    uint64 attempt_id = 2;

    // The total amount in millisatoshis required to complete a payment over
    // this route. This value includes the cumulative fees at each hop. The HTLC
    // extended to the first-hop in the route will need to have at least this
    // many (milli)satoshis.
    int64 amount = 5;

    // The absolute timelock or CLTV value that should be extended to the first
    // hop in the route. All other hops will decrement the time-lock as
    // described by the onion.
    uint32 timelock = 6;

    // The payment hash associated with the HTLC. This is needed for tracking
    // and debugging purposes.
    bytes payment_hash = 7;

    // The optional blinding point to use when constructing the HTLC.
    // If set, this will be used to blind the next hop in the route.
    bytes blinding_point = 8;

    // Optional custom TLV records to include in the UpdateAddHTLC message.
    map<uint64, bytes> custom_records = 9;

    // Optional opaque extra data to attach to the UpdateAddHTLC message.
    bytes extra_data = 10;

    oneof first_hop {
        // The first hop's public key where the onion will be sent.
        // The first hop must be specified either by channel ID or public key.
        bytes first_hop_pubkey = 3;

        // The channel id of the channel that must be taken to the first hop.
        // The first hop must be specified either by channel ID or public key.
        uint64 first_hop_chan_id = 4;
    }
}

This change would require regenerating the protobuf code and updating the SendOnion server-side logic to use the first_hop oneof field correctly.

Let's do this along side the TrackOnionResponse updates in a follow up PR if that's okay.
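
For a sense of what the server-side handling could look like once the first hop is a oneof, here is a hand-written approximation. The types below only mimic the general shape of Go code generated for a protobuf oneof (an interface plus one wrapper type per option); they are not the generated switchrpc types, and `describeFirstHop` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// firstHop approximates the unexported interface generated for a oneof.
type firstHop interface{ isFirstHop() }

// FirstHopPubkey and FirstHopChanID approximate the per-option wrappers.
type FirstHopPubkey struct{ Pubkey []byte }
type FirstHopChanID struct{ ChanID uint64 }

func (FirstHopPubkey) isFirstHop() {}
func (FirstHopChanID) isFirstHop() {}

// SendOnionRequest is reduced to the one field relevant to this sketch.
type SendOnionRequest struct {
	FirstHop firstHop
}

// describeFirstHop resolves which option was set. Mutual exclusivity is
// guaranteed by the type, but "not set" and content validation still need
// explicit checks.
func describeFirstHop(req *SendOnionRequest) (string, error) {
	switch hop := req.FirstHop.(type) {
	case FirstHopChanID:
		return fmt.Sprintf("chan_id=%d", hop.ChanID), nil

	case FirstHopPubkey:
		// A compressed secp256k1 public key is 33 bytes.
		if len(hop.Pubkey) != 33 {
			return "", errors.New("first hop pubkey must be 33 bytes")
		}
		return fmt.Sprintf("pubkey=%x", hop.Pubkey), nil

	default:
		return "", errors.New(
			"first hop must be set by channel ID or public key",
		)
	}
}

func main() {
	byChan := &SendOnionRequest{FirstHop: FirstHopChanID{ChanID: 42}}
	fmt.Println(describeFirstHop(byChan))

	unset := &SendOnionRequest{}
	fmt.Println(describeFirstHop(unset))
}
```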

lnrpc/switchrpc/switch.proto

ziggie1984 · 2025-12-21T10:09:13Z

lnrpc/switchrpc/switch.proto

+
+    // The optional blinding point to use when constructing the HTLC.
+    // If set, this will be used to blind the next hop in the route.
+    bytes blinding_point = 8;


let's start to use optional keyword for rpc proto ?

Added the optional directive.

ziggie1984 · 2025-12-21T10:12:03Z

itest/lnd_sendonion_test.go

+	// Now that the original HTLC attempt has settled, we'll send the same
+	// onion again with the same attempt ID.
+	//
+	// NOTE: Currently, this does not error. When we make SendOnion fully


hmm why does this not error tho, because the result should still be in the result store, we haven't restarted yet ?

Ahh this is because the CircuitMap on its own doesn't defend against duplicates after the settle/fail result is back from the network and the circuit is torn down. That is partly why we want to use InitAttempt. So that even if the result is in the store, an RPC client retrying SendOnion (possibly due to timeouts or ErrDuplicateAdd response from the server being delayed or lost) is not at any risk of causing duplicate attempts.

but what I do not understand: doesn't this PR already build on top of the InitAttempt PR? Have you rebased this PR on the current base branch?

ziggie1984 · 2025-12-21T10:22:50Z

lnrpc/switchrpc/switch.proto

+    // expect to track results for the payment via this attempt ID.
+    uint64 attempt_id = 2;
+
+    // Optional: Session key used to generate the onion/sphinx packet.


I don't think this comment is correct, because we only need these for the error decryptor rather than for creating an onion sphinx packet, or?

Ahh, can see how this comment might be confusing. While this key's purpose in TrackOnion is for decryption, it is not arbitrary; it is the exact same cryptographic material used to construct the onion. If it does not match, then I think decryption is impossible. Only the creator of the onion can decrypt the forwarding errors.

Updated the comment to better reflect this.

lnrpc/switchrpc/switch.yaml

Adds the TrackOnion RPC to the switchrpc service. This allows a caller to subscribe to the final outcome (settle or fail) of a specific HTLC attempt. This RPC is designed to be called after a successful dispatch has been confirmed via the SendOnion RPC. It should not be used to determine whether an HTLC dispatch was received in an ambiguous network scenario. That ambiguity must be resolved by retrying the idempotent SendOnion RPC until a definitive acknowledgement is received. Once dispatch is confirmed, TrackOnion provides the mechanism to wait for the result of the in-flight HTLC. The RPC allows callers to specify whether error decryption should be handled by the server or performed by the client, providing flexibility for different error handling strategies.

This demonstrates how the Switch and SendOnion rpc behave when asked to dispatch duplicate onions. Notably, the Switch circuit map detects this - but only if the matching onion is still in flight. Once the circuit is torn down, the duplicate is permitted by the Switch. It is likely that we will add a layer of protection to the SendOnion call itself to prevent duplicates even after the first HTLC is no longer in-flight.

ziggie1984

LGTM, great back and forth, congratulations on the PR 🎉

Please submit the follow-up PRs soon!

calvinrzachman mentioned this pull request Feb 8, 2025

multi: add BuildOnion, SendOnion, and TrackOnion RPCs #8907

Closed

bitromortac self-requested a review March 14, 2025 16:19

bitromortac reviewed Mar 19, 2025

saubyk assigned calvinrzachman Mar 21, 2025

saubyk added this to the v0.20.0 milestone Apr 3, 2025

saubyk added this to lnd v0.20 Apr 3, 2025

saubyk moved this to In progress in lnd v0.20 Apr 3, 2025

calvinrzachman force-pushed the switchrpc branch from 3bf2f15 to 2900535 Compare April 29, 2025 01:25

This was referenced Apr 30, 2025

multi: Switch store modifications + prevent duplicates in SendHTLC #9777

Closed

multi: Switch store modifications + prevent duplicates in SendHTLC calvinrzachman/lnd#17

Draft

levmi requested a review from ellemouton May 8, 2025 14:47

calvinrzachman force-pushed the switchrpc branch from 2900535 to e1b56dc Compare June 26, 2025 18:20

ellemouton reviewed Jul 2, 2025

calvinrzachman mentioned this pull request Jul 8, 2025

htlcswitch: add InitAttempt for idempotent external dispatch #10049

Merged

calvinrzachman mentioned this pull request Jul 15, 2025

switchrpc: add NextAttemptID rpc calvinrzachman/lnd#20

Draft

calvinrzachman mentioned this pull request Jul 25, 2025

[feature]: add option to disable router rpc server #10110

Open

calvinrzachman force-pushed the switchrpc branch from e1b56dc to 038c377 Compare August 11, 2025 17:28

calvinrzachman changed the base branch from master to 0-19-2-branch August 11, 2025 17:30

calvinrzachman force-pushed the switchrpc branch from 038c377 to 2c32ac5 Compare August 11, 2025 18:59

calvinrzachman changed the base branch from 0-19-2-branch to master August 11, 2025 19:00

calvinrzachman force-pushed the switchrpc branch from 2c32ac5 to 38a26f0 Compare August 14, 2025 02:50

calvinrzachman commented Aug 14, 2025

calvinrzachman force-pushed the switchrpc branch 2 times, most recently from c113546 to fa23d48 Compare August 26, 2025 19:32

calvinrzachman mentioned this pull request Aug 26, 2025

switchrpc: allow remote cleanup of Switch attempt store calvinrzachman/lnd#19

Draft

calvinrzachman force-pushed the switchrpc branch from fa23d48 to ccab228 Compare September 5, 2025 22:24

calvinrzachman requested a review from ellemouton September 8, 2025 02:05

multi: expose interface for read-only link access

0949f94

This will allow a sub-system access to information about the state of a channel link such as forwarding bandwidth, eligibility, etc. while not permitting full control over link function.

calvinrzachman force-pushed the switchrpc branch from a2acef3 to cd74c0b Compare December 19, 2025 07:55

calvinrzachman added 5 commits December 19, 2025 11:24

htlcswitch: return encrypted error blob when missing deobfuscator

449f20f

Allow the switch to defer error handling when callers of GetAttemptResult do not provide an error decrypter.

switchrpc: build error decrypter from sphinx info

f5215b9

switchrpc: add new TrackOnion rpc proto

70626ac

calvinrzachman force-pushed the switchrpc branch 2 times, most recently from 687bf53 to fdbdcbe Compare December 19, 2025 18:21

ziggie1984 reviewed Dec 21, 2025

lnrpc/switchrpc/switch_server.go (outdated, resolved)

ziggie1984 reviewed Dec 21, 2025

calvinrzachman added 14 commits December 23, 2025 12:09

switchrpc: add new BuildOnion rpc proto

db49b7f

payments/db: export sphinx packet generation function

baed63c

This will allow us to leverage this function from the Switch RPC server's BuildOnion implementation.

routing: export session key generation function

97a48f2

This will allow us to leverage this function from the Switch RPC server's BuildOnion implementation.

switchrpc: add BuildOnion rpc

78ed29c

Add RPC which constructs a sphinx onion packet for the given payment route. NOTE: This is added primarily to aid with the itests added later.

switchrpc: register with main grpc server

246d883

This plugs in the Switch RPC server to the rest of lnd. The service will be available for use.

lntest: add switchrpc methods to harness

2832020

itest: add send_onion test

9adb1bb

make: update tags for make unit-cover

a7b6aef

Update so that "make unit-cover" uses tags in a manner consistent with the rest of our unit testing.

itest: add track_onion test

d904fe5

lnrpc: specify REST annotations for switchrpc

428f79d

We declare each service's REST annotations in its own file. This is optional in v1 but mandatory when using v2 of the grpc-gateway library.

docs: update v0.21 release notes

03481fb

switchrpc: proto message updates

946a221

Update the Switch RPC protos to make use of the 'optional' directive. Though this may not impact the generated types or how the user interacts with these types, it may serve to document the fact that they are optional a bit better.

calvinrzachman force-pushed the switchrpc branch from fdbdcbe to 946a221 Compare December 23, 2025 19:11

ziggie1984 approved these changes Dec 23, 2025

ziggie1984 merged commit 307e665 into lightningnetwork:elle-base-branch-payment-service Dec 23, 2025
40 of 41 checks passed

github-project-automation bot moved this from In review to Done in v0.21 Dec 23, 2025

coderabbitai bot commented Feb 8, 2025 (review skipped)

saubyk commented Mar 21, 2025

bitromortac commented May 9, 2025
