Contributor

@calvinrzachman calvinrzachman commented Feb 8, 2025

Change Description

We add a new switchrpc RPC sub-system with SendOnion, BuildOnion, and TrackOnion RPCs. This allows the daemon to offload path-finding, onion construction and payment life-cycle management to an external entity (such as a remotely instantiated ChannelRouter type) and instead accept onion payments for direct delivery to the network.

  • Almost all the new functionality is hidden by a non-default build tag called switchrpc.
  • Opted for a very slim wrapper around direct delivery of UpdateAddHTLC to the HTLCSwitch for forwarding, e.g. no extra tracking by way of the ChannelRouter and ControlTower structures. This may be suitable given the intended use by a remote server with an instantiated ChannelRouter component, which will perform this payment attempt life-cycle tracking centrally for a collection of backing lnd instances. NOTE: This would allow for the deployment of a slimmed down lnd instance which does not contain any routing components in the future.
  • Error information is communicated via RPC protobuf message directly such that the error types can be recreated client side if desired - as would be the case if this RPC is used by a remotely instantiated ChannelRouter .
  • This RPC could be used to implement an "oblivious send" in which a client submits onions via RPC to a hosted node provider such that the node provider does not know to whom the onion is going.

Avoiding Duplicate Payment Attempts

We are making send/track(onion) requests which traverse an async and unreliable network. Clients which use these RPCs to make decisions about whether to make additional payment attempts run the risk of a race/re-ordering of request processing misleading them into making a re-attempt when such a re-attempt is not safe to make. We'd like to prevent duplicate payment attempts and unintentional loss of funds by RPC clients.

Consider the following scenario:

  1. Client calls SendOnion with attempt ID A. Client receives gRPC DeadlineExceeded or service Unavailable error and is unable to distinguish between the request never reaching the server (eg: the server is offline --> safe to re-attempt via different server) and the server receiving the request and being unable to respond in time.
  2. Client calls TrackOnion with attempt ID A.
  3. TrackOnion indicates no HTLC for attempt ID A, so the client makes another attempt and calls SendOnion again with a different attempt ID B.
  4. SendOnion for attempt ID A executes.
  5. SendOnion for attempt ID B executes. We have leaked an attempt and unintentionally overpaid!
  • One approach involves idempotent SendOnion implementation combined with an RPC client which persists acknowledgement of successful onion/HTLC receipt and dispatch from the server. The client can wait, retrying if necessary, until it gets an explicit acknowledgment from the server about the HTLC’s status before ever calling TrackOnion.
    • To handle restarts, it must persist this acknowledgement and differentiate between ACK’d and UNACK’d attempts, handling them differently on startup. NOTE: This would require ChannelRouter changes for how we intend to use these RPCs ⚠️
  • The server acknowledgement in the previous approach is what allows a careful RPC caller to guarantee ordering between “send” and “track” for the same attempt ID even with an unreliable network. “Send” will always complete before “track”, thereby removing the risk of duplicate payment.
  • The other alternative is to make this protective "send then track" enforcement the responsibility of the server. The server can prevent sends for an attempt ID which has already been tracked. The client will just have to try again with a different attempt ID. NOTE: This avoids ChannelRouter changes for how we intend to use these RPCs ⚠️
    • The server can make this determination by way of a store consulted by both SendOnion and TrackOnion. Both must write to the store so there can be communication. If TrackOnion only reads, then the potential for a race or unsafe order of execution still exists.
    • NOTE: This PR currently takes this approach!

Future

  • Consider improvements to the Switch network result store.
    • Allow remote maintenance of the data in this store.
    • Implement duplicate protection for SendOnion which survives restarts, possibly via InitAttempt style method on the Switch store. All duplicates with same attempt ID would be rejected until the result for that attempt ID has been read and cleaned from the result store. Then the attempt ID can be freed for re-use.
  • Generalized concept of HTLC attempt ID (whether tuple or server generated) which permits multiple remote clients dispatching payments via SendOnion style RPC. Each RPC client should be able to clean only the attempt results relevant to it from the Switch's network result store. This would also allow lnd to be used both by remote clients dispatching payments via SendOnion and by more traditional clients via SendPaymentV2 at the same time.
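
One way to picture the tuple-style attempt ID is a result store keyed by client and attempt ID together. The sketch below is only an illustration under that assumption: `attemptKey`, `resultStore`, `Put`, and `CleanClient` are hypothetical names and do not correspond to the Switch's actual network result store API.

```go
package main

import "fmt"

// attemptKey is a hypothetical tuple-style attempt ID: the client's own
// counter, scoped by an identifier for the client that issued it.
type attemptKey struct {
	clientID  string
	attemptID uint64
}

// resultStore is a hypothetical in-memory stand-in for a result store keyed
// by the tuple instead of a bare uint64.
type resultStore struct {
	results map[attemptKey]string
}

func newResultStore() *resultStore {
	return &resultStore{results: make(map[attemptKey]string)}
}

// Put records a result for one client's attempt.
func (s *resultStore) Put(clientID string, attemptID uint64, result string) {
	key := attemptKey{clientID: clientID, attemptID: attemptID}
	s.results[key] = result
}

// CleanClient deletes the results that belong to clientID, except for the
// attempt IDs listed in keep. Results of other clients are never touched.
// It returns the number of deleted entries.
func (s *resultStore) CleanClient(clientID string,
	keep map[uint64]struct{}) int {

	removed := 0
	for key := range s.results {
		if key.clientID != clientID {
			continue
		}
		if _, ok := keep[key.attemptID]; ok {
			continue
		}
		delete(s.results, key)
		removed++
	}
	return removed
}

func main() {
	store := newResultStore()

	// Both clients use attempt ID 1 without colliding.
	store.Put("router-a", 1, "settled")
	store.Put("router-b", 1, "failed")

	removed := store.CleanClient("router-a", nil)
	fmt.Println("removed:", removed, "remaining:", len(store.results))
}
```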

Contributor

coderabbitai bot commented Feb 8, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Collaborator

@bitromortac bitromortac left a comment

Awesome work 🎉, this will be very useful! I've only started to look at the design and first commits, just leaving a few thoughts, but will continue to review. I think making SendOnion idempotent and repeatable is the safest option to lead to a TrackOnion endpoint that can be called at any time, to make client restarts simple (but only a preliminary conclusion). Is there an example somewhere of the switch RPC being consumed in a ChannelRouter (implementing retries)?


// The attempt ID uniquely identifying this payment attempt. The caller can
// expect to track results for the payment via this attempt ID.
uint64 attempt_id = 6;

Collaborator

Could we take the ephemeral key in the onion to track the onion uniquely instead of attempt id?

Contributor Author

You present an interesting consideration 🤔 At a high level, it makes sense that some kind of ID must be used to allow the clients of a "send" RPC or function (whether payment, HTLC, or onion) to follow up with the result. For payments, this is payment_hash. For HTLCs, this has so far been a single uint64 sequence-number-style counter called attempt_id. I used attempt_id here for onions because I was following that pattern, and because it is the information currently available to implementations of this interface:

type PaymentAttemptDispatcher interface {
	// SendHTLC is a function that directs a link-layer switch to
	// forward a fully encoded payment to the first hop in the route
	// denoted by its public key. A non-nil error is to be returned if the
	// payment was unsuccessful.
	SendHTLC(firstHop lnwire.ShortChannelID,
		attemptID uint64,
		htlcAdd *lnwire.UpdateAddHTLC) error
		
	// GetAttemptResult returns the result of the payment attempt with
	// the given attemptID. The paymentHash should be set to the payment's
	// overall hash, or in case of AMP payments the payment's unique
	// identifier.
	GetAttemptResult(attemptID uint64, paymentHash lntypes.Hash,
		deobfuscator htlcswitch.ErrorDecrypter) (
		<-chan *htlcswitch.PaymentResult, error)

	// ...
}

The only current code that I know of which builds onions and could submit them via this endpoint is the ChannelRouter type, so the current RPC protobuf message fields were structured to help any potential re-user of the ChannelRouter type. It is possible that the ephemeral onion key makes more sense as a general tracking ID here though, since it is a better fingerprint, being more tightly bound to the onion itself.

I have started to be of the mind that lnd itself may need to change the way it handles attempt IDs a bit so as to facilitate multiple, independent users of a SendOnion style endpoint. You could imagine each RPC client generating its own attempt IDs - there would be the possibility of collision within the network result store used by the Switch.

Comment on lines 247 to 249
// NOTE(calvin): We'll either need to require clients provide the short
// channel ID to use as a first hop OR lookup an acceptable channel ID
// for the given first hop public key.

Collaborator

For the start I think using the channel id is better and gives more control over liquidity, it also reflects the API for sendpayment. But we could also have the option to specify a pubkey, not sure if that is more convenient in terms of the consumer side.

Contributor Author

I've added in some commits to allow for sending via channel ID rather than pubkey. Let me know if the approach looks reasonable.

Collaborator

saubyk commented Mar 21, 2025

cc: @positiveblue in case you're interested in reviewing this pr

@bitromortac
Collaborator

Left a few comments in calvinrzachman#17, I think the approach in there looks good!

Collaborator

@ellemouton ellemouton left a comment

love the minimal change set!! ✨

mostly just style & structure comments. I think the new subserver is also missing a unit test.

Also just an overall note on commit structure: it would be better to only plug in the new server once it is complete & ready.

so i'd suggest the following structure:

  • any refactors required
  • add the new package, implement the logic and unit tests
  • add proto definitions
  • wrapper grpc service that implements the proto defs and calls the new logic.
  • now, plug the completed subserver into LND
  • now add itests

Comment on lines 1029 to 1051
if deobfuscator == nil {
	return &PaymentResult{
		EncryptedError: htlc.Reason,
	}, nil
}

Collaborator

i think it needs to be explained more clearly why this could be nil. ie, be explicit about the case we are handling - both in a comment in the code & in the commit message

Contributor Author

There is brief mention in the godoc comment for the function. Updated to add a small comment in-line as well!

Comment on lines 152 to 241
// TODO(calvin): Other things to check:
// - Error conditions/handling (server handles with decryptor or caller
// handles encrypted error blobs from server)
// - That we successfully convert pubkey --> channel when there are
// multiple channels, some of which can carry the payment and other
// which cannot.
// - Send the same onion again. Send the same onion again but mark it
// with a different attempt ID.
//
// If we send again, our node does forward the onion but the first hop
// considers it a replayed onion.
// 2024-05-01 15:54:18.364 [ERR] HSWC: unable to process onion packet: sphinx packet replay attempted
// 2024-05-01 15:54:18.364 [ERR] HSWC: ChannelLink(a680b373941e2e056e7b98007cc8cee933331e28981474b34d4275bb94cd17fe:0): unable to decode onion hop iterator: InvalidOnionVersion
// 2024-05-01 15:54:18.364 [DBG] PEER: Peer(0352f454dd5e09cd3e979cbace6fc6727cfa9a1eaa878a452ce63b221f51771a74): Sending UpdateFailMalformedHTLC(chan_id=fe17cd94bb75424db3741498281e3333e9cec87c00987b6e052e1e9473b380a6, id=1, fail_code=InvalidOnionVersion) to 0352f454dd5e09cd3e979cbace6fc6727cfa9a1eaa878a452ce63b221f51771a74@127.0.0.1:63567
// If we randomize the payment hash, first hop says bad HMAC.
//

Collaborator

will this be addressed here?

Contributor Author

Some of it will be handled in the TrackOnion and duplicate send onion tests. Removed the comment to clean things up a bit.

Comment on lines 171 to 250
func testTrackOnion(ht *lntest.HarnessTest) {
// Create a four-node context consisting of Alice, Bob and two new
// nodes: Carol and Dave. This will provide a 4 node, 3 channel topology.
// Alice will make a channel with Bob, and Bob with Carol, and Carol

Collaborator

feels like it could just be part of the existing send onion test no?

Contributor Author

It is possible that we could make everything into one big test. But I think there might be enough to TrackOnion to merit creating a separate test. For example, we can either defer error decryption to the switchrpc server by supplying the ephemeral session key and hop public keys used to construct the onion, or we can handle the onion error decryption on the client side if we wish to, for privacy or other reasons.

Comment on lines 275 to 280
// require.Error(ht, err, "expected error when re-sending same onion with same attempt ID")
// // Assert that the error is a gRPC codes.AlreadyExists error.
// st, ok := status.FromError(err)
// require.True(ht, ok, "expected a gRPC status error")
// require.Equal(ht, codes.AlreadyExists, st.Code(), "expected AlreadyExists error code")
// // require.Contains(ht, st.Message(), "duplicate onion", "expected error message to indicate duplicate onion")

Collaborator

only add the code when it isnt commented out. can leave the todo

Collaborator

or, only add the test once the logic actually does the thing :)

// - Send different onion but with same attempt ID.
}

func testSendOnionTwice(ht *lntest.HarnessTest) {

Collaborator

test doc pls 🙏

Collaborator

also - can we not extend the existing test?

// scenarios where network requests are reordered. If an attempt ID has
// already been used by either SendOnion or TrackOnion, SendOnion will
// return DUPLICATE_HTLC for that attempt ID.
usedAttemptIDs *roaring64.Bitmap

Collaborator

as noted offline, an in-mem solution is not enough to make something idempotent. Will need a persisted solution if we find that we indeed are at risk of duplicate attempts

Contributor Author

Updated to remove this in-memory method as we can instead make use of an InitAttempt method to checkpoint some information about the attempt prior to sending it out to the network. That way, we'll have the means to deny subsequent initialization attempts. We can also bury this duplicate safety one layer deeper within the actual Switch itself. This seems somewhat analogous to the InitPayment concept within the Router.

Contributor Author

@calvinrzachman calvinrzachman left a comment

Thanks for the thorough review! I updated the commit ordering as suggested and made sure to save hooking up the Switch RPC server into lnd until the end, just before the itests. Let me know if anything was not sufficiently addressed 🙏

"unable to process shared secrets")
}

// NOTE(calvin): In order to decrypt errors server side we require

Contributor Author

Ahh I think I got the idea to use this format from the TODOs. I'll remove that for all the NOTEs

Comment on lines 108 to 112
// NOTE(calvin): We may want our wrapper RPC client to allow errors
// through so that we can make some assertions about them in various
// scenarios.
// resp, err := alice.RPC.SendOnion(onionReq)
// require.NoError(ht, err, "unable to send payment via onion")

Contributor Author

Thanks for the thorough read. Made a pass to cleanup stray comments generally.

Comment on lines 18 to 20
// const (
// defaultTimeout = 30 * time.Second
// )

Contributor Author

removed 🧼

This will allow a sub-system access to information
about the state of a channel link such as forwarding
bandwidth, eligibility, etc. while not permitting full
control over link function.

Add RPC for dispatching payments via onions. The payment
route and onion are computed by the caller and the onion
is delivered to the server for forwarding.

NOTE: The server does NOT process or peel the onion so it is
assumed that the onion will be constructed such that the first
hop is encrypted to one of the server's channel partners.

These tests verify that internal errors from the htlcswitch
(e.g. ErrDuplicateAdd or ErrPaymentIDNotFound) are precisely
translated into the specific error codes and messages defined
in the `switch.proto` file. This is critical for the remote
client, which relies on these exact signals to make important
state decisions (e.g., whether to retry a payment).

We also confirm that the server validates incoming requests
and correctly rejects malformed or incomplete requests. This
is important to do for externally provided input to the daemon,
even if the users of this RPC server are trusted.

Allow the switch to defer error handling when callers of GetAttemptResult
do not provide an error decrypter.

@calvinrzachman force-pushed the switchrpc branch 2 times, most recently from 687bf53 to fdbdcbe (December 19, 2025 18:21)

repeated bytes hop_pubkeys = 4;
}

message TrackOnionResponse {

Collaborator

What about this? I think you did not use oneof at all; it might also make sense for the other new RPC structs:

  message DecryptedError {
      string message = 1;
      ErrorCode code = 2;
  }

  message TrackOnionResponse {
      oneof result {
          bytes preimage = 1;                // Success
          DecryptedError decrypted_error = 2; // Decrypted failure
          bytes encrypted_error = 3;          // Encrypted failure
      }
  }
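
To show what a three-outcome result would mean for a caller, the sketch below models the proposed oneof in plain Go as an interface with one concrete type per outcome, handled by a type switch. The type and function names are hypothetical, and this is hand-written code, not what the protobuf compiler would generate for the message above.

```go
package main

import "fmt"

// trackResult is a hypothetical closed set of outcomes, mirroring the three
// arms of the proposed oneof.
type trackResult interface {
	isTrackResult()
}

// preimage is the success outcome.
type preimage struct{ Value []byte }

// decryptedError is a failure that the server decrypted.
type decryptedError struct {
	Code    int
	Message string
}

// encryptedError is a failure the caller must decrypt itself.
type encryptedError struct{ Blob []byte }

func (preimage) isTrackResult()       {}
func (decryptedError) isTrackResult() {}
func (encryptedError) isTrackResult() {}

// describe shows that exactly one outcome is handled per result, with no
// need to inspect several optional fields for consistency.
func describe(r trackResult) string {
	switch v := r.(type) {
	case preimage:
		return "settled"
	case decryptedError:
		return fmt.Sprintf("failed: %s (code %d)", v.Message, v.Code)
	case encryptedError:
		return fmt.Sprintf("failed: %d-byte encrypted error",
			len(v.Blob))
	default:
		return "unknown result"
	}
}

func main() {
	results := []trackResult{
		preimage{Value: make([]byte, 32)},
		decryptedError{Code: 1, Message: "temporary channel failure"},
		encryptedError{Blob: make([]byte, 292)},
	}
	for _, r := range results {
		fmt.Println(describe(r))
	}
}
```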

Collaborator

@ziggie1984 ziggie1984 left a comment

Thank you for answering all the previous comments. I still have some additional comments, especially that we could maybe make the RPC API cleaner by using the optional keyword where applicable and by using oneof when there are multiple outcomes.

I think it is worth taking the time here for the RPC layer, because once it is shipped we will have all the work of deprecating things and cannot easily change the API.


// If neither is provided, the caller wants to handle decryption. This
// is a valid use case, so we return no decryptor and no error.
if !sessionKeyProvided && !hopPubkeysProvided {

Collaborator

Same here: this can be accomplished if we make the RPC interface more strict by using oneof or ? Can we make the RPC API more strict, so that we can skip all these checks here? @gemini-code-assist

@gemini-code-assist

That's a good suggestion to consider for making the RPC API more strict! The current implementation of buildErrorDecryptor already enforces the "both or neither" semantic for session_key and hop_pubkeys through explicit checks:

  1. It first checks if both sessionKeyBytes and hopPubkeys are empty. If so, it correctly interprets this as the caller wanting to handle decryption themselves and returns no decryptor.
  2. It then checks if only one of them is provided (sessionKeyProvided != hopPubkeysProvided). If this is the case, it returns an error, as both are required for server-side decryption.

While using oneof in the protobuf definition could visually group these fields, it wouldn't entirely eliminate the need for these checks in the Go code. For bytes and repeated bytes fields in protobuf3, an empty value (e.g., []byte{} or [][]byte{}) is still considered a "set" value by the oneof mechanism. Therefore, the len(field) > 0 checks would still be necessary within the chosen oneof option to ensure that the provided data is meaningful for decryption.

The current explicit checks in the Go code are clear and effectively handle the required logic for these optional fields.

Comment on lines +413 to +430
func validateSessionKey(sessionKeyBytes []byte) error {
	const expectedKeyLength = 32

	// Check length of session key.
	if len(sessionKeyBytes) != expectedKeyLength {
		return fmt.Errorf("invalid session key length: got %d, "+
			"expected %d", len(sessionKeyBytes), expectedKeyLength)
	}

	// Interpret the key as a big-endian unsigned integer.
	keyValue := new(big.Int).SetBytes(sessionKeyBytes)

	// Check if the key is in the valid range [1, N-1].
	if keyValue.Sign() <= 0 || keyValue.Cmp(btcec.S256().N) >= 0 {
		return fmt.Errorf("session key is out of range")
	}

	return nil
}

Collaborator

This is definitely a violation of boundaries. Having this fine-grained crypto at this level of the code is not good design; we should rather update the crypto package and then do something like this:

  // btcec library should have this
  func ValidatePrivKeyBytes(pk []byte) error {
      if len(pk) != 32 {
          return ErrInvalidKeyLen
      }

      keyValue := new(big.Int).SetBytes(pk)
      if keyValue.Sign() <= 0 || keyValue.Cmp(S256().N) >= 0 {
          return ErrKeyOutOfRange
      }

      return nil
  }

  Then the domain layer (htlcswitch) should use it:
  // In htlcswitch, not RPC layer. Validate the raw bytes before parsing.
  if err := btcec.ValidatePrivKeyBytes(sessionKeyBytes); err != nil {
      return fmt.Errorf("invalid session key: %w", err)
  }
  privKey, pubKey := btcec.PrivKeyFromBytes(sessionKeyBytes)

Contributor Author

Yeah, I'm up to try to stick this into the lower level package. Should we defer that to a follow up PR so we don't need to wait on a new package release? I have a branch to update btcec here: calvinrzachman/btcd#1

Collaborator

followup is OK, but let's remove this check from this PR.

uint64 attempt_id = 2;

// Optional: Session key used to generate the onion/sphinx packet.
bytes session_key = 3;

Collaborator

Would it make sense to use protobuf's oneof for these two components, which are optional? @gemini-code-assist

That's an excellent suggestion! You're absolutely right that oneof is the perfect construct for first_hop_pubkey and first_hop_chan_id in the SendOnionRequest message.

Using oneof would formally enforce the mutual exclusivity that's currently handled by application logic, making the API contract clearer and preventing invalid states at the protobuf level. It also simplifies validation for clients and can lead to cleaner generated code.

I've prepared a code suggestion to implement this change in lnrpc/switchrpc/switch.proto:

message SendOnionRequest {
    // The raw onion packet to be sent across the network.
    bytes onion_blob = 1;

    // The attempt ID uniquely identifying this payment attempt. The caller can
    // expect to track results for the payment via this attempt ID.
    uint64 attempt_id = 2;

    // The total amount in millisatoshis required to complete a payment over
    // this route. This value includes the cumulative fees at each hop. The HTLC
    // extended to the first-hop in the route will need to have at least this
    // many (milli)satoshis.
    int64 amount = 5;

    // The absolute timelock or CLTV value that should be extended to the first
    // hop in the route. All other hops will decrement the time-lock as
    // described by the onion.
    uint32 timelock = 6;

    // The payment hash associated with the HTLC. This is needed for tracking
    // and debugging purposes.
    bytes payment_hash = 7;

    // The optional blinding point to use when constructing the HTLC.
    // If set, this will be used to blind the next hop in the route.
    bytes blinding_point = 8;

    // Optional custom TLV records to include in the UpdateAddHTLC message.
    map<uint64, bytes> custom_records = 9;

    // Optional opaque extra data to attach to the UpdateAddHTLC message.
    bytes extra_data = 10;

    oneof first_hop {
        // The first hop's public key where the onion will be sent.
        // The first hop must be specified either by channel ID or public key.
        bytes first_hop_pubkey = 3;

        // The channel id of the channel that must be taken to the first hop.
        // The first hop must be specified either by channel ID or public key.
        uint64 first_hop_chan_id = 4;
    }
}

This change would require regenerating the protobuf code and updating the SendOnion server-side logic to use the first_hop oneof field correctly.
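
As a rough sketch of what the server-side handling could look like, the following mimics the shape protoc-gen-go typically produces for a oneof (a marker interface plus one wrapper type per case). All type and function names here are hand-written stand-ins for illustration, not the actual generated switchrpc code, and the 33-byte pubkey length check is an assumption about compressed keys.

```go
package main

import (
	"errors"
	"fmt"
)

// isFirstHop plays the role of the marker interface generated for a oneof.
type isFirstHop interface{ isFirstHop() }

// Hypothetical wrapper types, one per oneof case.
type FirstHopPubkey struct{ Pubkey []byte }
type FirstHopChanID struct{ ChanID uint64 }

func (*FirstHopPubkey) isFirstHop() {}
func (*FirstHopChanID) isFirstHop() {}

// SendOnionRequest is a stand-in carrying only the oneof field.
type SendOnionRequest struct {
	FirstHop isFirstHop
}

// resolveFirstHop shows how a type switch replaces the manual
// "exactly one of two fields is set" validation.
func resolveFirstHop(req *SendOnionRequest) (string, error) {
	switch hop := req.FirstHop.(type) {
	case *FirstHopPubkey:
		// Assumption: compressed public keys are 33 bytes.
		if len(hop.Pubkey) != 33 {
			return "", errors.New("first hop pubkey must be 33 bytes")
		}
		return fmt.Sprintf("pubkey:%x", hop.Pubkey), nil

	case *FirstHopChanID:
		return fmt.Sprintf("chan_id:%d", hop.ChanID), nil

	default:
		// Reached when the oneof is unset.
		return "", errors.New("first hop must be specified")
	}
}

func main() {
	req := &SendOnionRequest{FirstHop: &FirstHopChanID{ChanID: 42}}
	fmt.Println(resolveFirstHop(req))

	fmt.Println(resolveFirstHop(&SendOnionRequest{}))
}
```

With the oneof in place, a request that sets both fields can no longer be expressed, so only the "neither set" case needs an explicit error.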

Contributor Author

Let's do this alongside the TrackOnionResponse updates in a follow-up PR if that's okay.


// The optional blinding point to use when constructing the HTLC.
// If set, this will be used to blind the next hop in the route.
bytes blinding_point = 8;
Collaborator

Let's start using the optional keyword in the RPC protos?

Contributor Author

Added the optional directive.

// Now that the original HTLC attempt has settled, we'll send the same
// onion again with the same attempt ID.
//
// NOTE: Currently, this does not error. When we make SendOnion fully
Collaborator

Hmm, why does this not error though? The result should still be in the result store, since we haven't restarted yet.

Contributor Author

Ahh, this is because the CircuitMap on its own doesn't defend against duplicates once the settle/fail result is back from the network and the circuit is torn down. That is partly why we want to use InitAttempt: even if the result is already in the store, an RPC client retrying SendOnion (possibly because of timeouts, or because the server's ErrDuplicateAdd response was delayed or lost) is not at any risk of causing duplicate attempts.
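
To make the difference concrete, here is a minimal in-memory sketch of an InitAttempt-style guard. The store type, its methods and the error value are hypothetical names for illustration and are not lnd's actual implementation; the point is only that the attempt ID stays registered after the attempt resolves, unlike a circuit that is torn down.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrDuplicateAttempt is a hypothetical error for this sketch.
var ErrDuplicateAttempt = errors.New("attempt ID already registered")

// attemptStore remembers every attempt ID it has seen, whether the
// attempt is still in flight or already resolved.
type attemptStore struct {
	mu       sync.Mutex
	resolved map[uint64]bool // attempt ID -> has a settle/fail result
}

func newAttemptStore() *attemptStore {
	return &attemptStore{resolved: make(map[uint64]bool)}
}

// InitAttempt registers the ID before dispatch and rejects any ID that
// was registered before, in flight or not.
func (s *attemptStore) InitAttempt(id uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if _, ok := s.resolved[id]; ok {
		return ErrDuplicateAttempt
	}
	s.resolved[id] = false

	return nil
}

// Resolve records the result but keeps the ID registered.
func (s *attemptStore) Resolve(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.resolved[id] = true
}

func main() {
	store := newAttemptStore()

	fmt.Println("first send:", store.InitAttempt(1))
	store.Resolve(1)
	fmt.Println("retry after settle:", store.InitAttempt(1))
}
```

A real implementation would need the registration to be persistent so that the guard also holds across restarts.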

Collaborator

But what I do not understand: doesn't this PR already build on top of the InitAttempt PR? Have you rebased this PR on the current base branch?

// expect to track results for the payment via this attempt ID.
uint64 attempt_id = 2;

// Optional: Session key used to generate the onion/sphinx packet.
Collaborator

I don't think this comment is correct, because we only need this for the error decryptor rather than for creating an onion/sphinx packet, right?

Contributor Author

Ahh, I can see how this comment might be confusing. While this key's purpose in TrackOnion is decryption, it is not arbitrary; it is the exact same cryptographic material used to construct the onion. If it does not match, then I think decryption is impossible. Only the creator of the onion can decrypt the forwarding errors.

Updated the comment to better reflect this.

Adds the TrackOnion RPC to the switchrpc service. This
allows a caller to subscribe to the final outcome (settle
or fail) of a specific HTLC attempt.

This RPC is designed to be called after a successful
dispatch has been confirmed via the SendOnion RPC. It
should not be used to determine whether an HTLC dispatch
was received in an ambiguous network scenario. That
ambiguity must be resolved by retrying the idempotent
SendOnion RPC until a definitive acknowledgement is
received.

Once dispatch is confirmed, TrackOnion provides the mechanism
to wait for the result of the in-flight HTLC. The RPC allows
callers to specify whether error decryption should be handled
by the server or performed by the client, providing flexibility
for different error handling strategies.
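
The intended client flow can be sketched as follows. The interface, the error values and the fake client are all hypothetical stand-ins for illustration, not the generated switchrpc client. The sketch assumes that a "duplicate" response to a retried SendOnion counts as a definitive acknowledgement that the attempt is known to the server.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for this sketch.
var (
	errAmbiguous = errors.New("no response (timeout or lost reply)")
	errDuplicate = errors.New("attempt already known to server")
)

// onionClient is a hypothetical, simplified view of the two RPCs.
type onionClient interface {
	SendOnion(attemptID uint64) error
	TrackOnion(attemptID uint64) (string, error)
}

// dispatchAndTrack retries SendOnion until the dispatch is definitively
// acknowledged, and only then waits for the result via TrackOnion.
func dispatchAndTrack(c onionClient, id uint64, maxTries int) (string, error) {
	for i := 0; i < maxTries; i++ {
		err := c.SendOnion(id)
		switch {
		case err == nil, errors.Is(err, errDuplicate):
			// Dispatch confirmed, safe to track.
			return c.TrackOnion(id)

		case errors.Is(err, errAmbiguous):
			// Unknown outcome: retry the same attempt ID.
			continue

		default:
			return "", err
		}
	}

	return "", fmt.Errorf("dispatch of attempt %d unconfirmed", id)
}

// flakyClient processes the first request but loses `lost` replies.
type flakyClient struct {
	lost       int
	dispatched bool
}

func (f *flakyClient) SendOnion(uint64) error {
	if f.lost > 0 {
		f.lost--
		f.dispatched = true
		return errAmbiguous
	}
	if f.dispatched {
		return errDuplicate
	}
	f.dispatched = true
	return nil
}

func (f *flakyClient) TrackOnion(uint64) (string, error) {
	if !f.dispatched {
		return "", errors.New("unknown attempt")
	}
	return "settled", nil
}

func main() {
	fmt.Println(dispatchAndTrack(&flakyClient{lost: 2}, 7, 5))
}
```

The client never calls TrackOnion to find out whether the dispatch happened; it only retries the same attempt ID until the server answers one way or the other.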
This will allow us to leverage this function from the
Switch RPC server's BuildOnion implementation.
Add RPC which constructs a sphinx onion packet for the
given payment route.

NOTE: This is added primarily to aid with the itests
added later.
This plugs in the Switch RPC server to the
rest of lnd. The service will be available for use.
Update so that "make unit-cover" uses tags in
a manner consistent with the rest of our unit
testing.
This demonstrates how the Switch and SendOnion RPC
behave when asked to dispatch duplicate onions. Notably,
the Switch circuit map detects this - but only if the
matching onion is still in flight. Once the circuit is
torn down, the duplicate is permitted by the Switch.

It is likely that we will add a layer of protection to the
SendOnion call itself to prevent duplicates even after the
first HTLC is no longer in-flight.
We declare each service's REST annotations in its own file.
This is optional in v1 of the grpc-gateway library but
mandatory in v2.
Update the Switch RPC protos to make use of the
'optional' directive. Though this may not impact the
generated types or how the user interacts with these
types, it may serve to document the fact that they
are optional a bit better.
Collaborator

@ziggie1984 ziggie1984 left a comment

LGTM, great back and forth, congratulations on the PR 🎉

Please submit the follow-up PRs soon!

@ziggie1984 ziggie1984 merged commit 307e665 into lightningnetwork:elle-base-branch-payment-service Dec 23, 2025
40 of 41 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in v0.21 Dec 23, 2025