Upstream and dnst keyset TSIG support. (resolves #65)#564
Conversation
- New `cascade tsig add` client command to request that the daemon add a TSIG key to the Cascade TSIG store. - New cascaded POST /tsig/ API to add a TSIG key to the Cascade TSIG store. - Extension of the `cascade zone add` command `--source` argument with an optional `!<TSIG key name>` syntax to define the TSIG key that Cascade should use when sending an XFR request to the upstream. - Pass the key defined by the source to the zone loader instead of None (the zone loader is already capable of using the key, it just wasn't being told which key to use)
As Base64 is typically how tooling present/accept TSIG key secret data to/from users.
Begin with a test that should fail: fetching a zone without using the required TSIG key.
Updated based on the NSD documentation at https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd.conf.html#EXAMPLE.
Also, reverse the change on failure to register the zone. Also, add some missing zone server updates and failure to mark state as dirty.
| Incoming DNS messages that are TSIG signed will be rejected if the key used | ||
| to sign the message is not registered with Cascade. |
There was a problem hiding this comment.
The description is accurate, added keys are relevant for all zones because once a key is added to the global Cascade store, any incoming DNS message (whether from upstream, e.g. NOTIFY, or from downstream, e.g. AXFR or SOA) will be handled by the
TsigMiddlewareSvcwhich, even if a zone is not configured to use TSIG, will still reject the incoming message if that message uses a TSIG key which is not in the global Cascade TSIG store.
The description may be accurate, but the behavior sounds far from ideal. It sounds like the TSIG middleware service activates once the TSIG key store is non-empty; but this means TSIG config for one zone affects others, which is IMO quite surprising. Cascade should only allow TSIG keys that are relevant for the zone being queried; it's okay to document the real implemented behavior, but I think this description should also mention the direction we want to move to. We should create a GH issue for reaching the desired TSIG behavior and we could then link to it here.
This could be worked around by adding a custom middleware service layer impl between
TsigMiddlewareSvcandNotifyMiddlewareSvcthat does the "correct key" and "correct no key" checks.
Perhaps we could just add a check for this in the new ZoneService type. It has access to all the right information.
| -------- | ||
|
|
||
| https://cascade.docs.nlnetlabs.nl | ||
| Cascade online documentation |
There was a problem hiding this comment.
"Cascade online documentation" doesn't make sense grammatically...
|
|
||
| .. option:: publication-nameservers = [] | ||
|
|
||
| A set of nameservers to use when checking for rrsiG propagation during a |
| :RFC:`4648` Base64 encoded secret key material. The number of bytes prior | ||
| to encoding must be correct for the specified ``<ALGORITHM>``. | ||
|
|
||
| Can also be a path to a file containing the Base64 encoded secret material. |
There was a problem hiding this comment.
Could this be ambiguous, e.g. is foo/bar (or something like it) could be valid Base64 as well as a filesystem path. One way to make this unambiguous is to require paths to start with / or ./.
| addr: SocketAddr, | ||
|
|
||
| /// The name of a TSIG key, if any. | ||
| tsig_key: Option<String>, |
There was a problem hiding this comment.
Why not store Option<Name<Bytes>> here, and eliminate the fallibility of converting into cascade_api::ZoneSource?
There was a problem hiding this comment.
domain::base::Name isn't available here, only cascade_api types. The latter currently exposes TsigKeyName which is defined as domain::base::Name<octseq::Array<255>> which if used here causes Clippy to contain about an overly large enum variant. How would you suggest to proceed here?
| Cascade is designed to be deployed between a hidden upstream nameserver and | ||
| public downstream nameservers. The hidden upstream serves the unsigned zone, | ||
| Cascade signs it and passes it to the downstream nameservers for publication | ||
| to consumers. |
There was a problem hiding this comment.
Sure... but we intend to support that. The original paragraph here says "Cascade is designed to be" -- i.e. its intention and not necessarily the reality. I think zonefile-based pipelines are part of that design, even if they are not fully supported yet.
| Controlling automatic key rollover zone transfer settings | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| When using automatic key rollover (the default) Cascade will attempt to verify | ||
| that certain key properties of the signed zone being served to consumers are | ||
| correct. | ||
|
|
||
| This verification is done by transferring the zone and inspecting it. By | ||
| default transfer is attempted from the nameserver identified by the MNAME | ||
| field of the apex SOA record in the zone. | ||
|
|
||
| If an alternate nameserver should be queried instead of the MNAME | ||
| nameserver, or if a specific port number or TSIG key should be used | ||
| to request the transfer, you will also need to configure the Cascade | ||
| key manager to fetch the zone correctly. This can be done via the | ||
| ``key-manager.publication-nameservers`` policy setting. |
There was a problem hiding this comment.
It is in this document as it relates to zone transfers.
At first glance, it appears that this document details how to use Cascade in a zone transfer based pipeline, with regards to its input and output. But the key manager's querying of public nameservers feels distinct from that. Sure, both relate to the underlying mechanism of XFRs, but that is why I don't understand the purpose of this document -- is its purpose "here is how to use Cascade in a zone transfer based pipeline" or "here is every zone transfer related consideration for using Cascade"?
| state.tsig_store.mark_dirty(center); | ||
| drop(state); | ||
| save_now(center); |
There was a problem hiding this comment.
I see. Please add a comment.
|
Following an internal discussion we have agreed to merge this PR as-is. I have extracted a few notable items to separate GH issues so that we don't lose track of them: |
Summary
This PR implements two of three pieces of work needed for #301: upstream TSIG support and support for configuring
dnst keysetto override the nameserver to test against when in auto mode. The third piece, downstream TSIG support, is implemented by PR #587.Upstream TSIG support consists of TSIG signing XFR related requests sent to the upstream nameserver, accepting and verifying the TSIG key used when we receive a NOTIFY from the upstream nameserver, and adding support for the operator to configure Cascade to know the details of a TSIG key and to know which TSIG key to use with the upstream nameserver.
Note regarding downstream TSIG
Although downstream TSIG support has its own separate PR #587, some of the functionality is actually enabled by this PR. For example, by adding support in this PR for adding TSIG keys to the TSIG key store, that automatically makes those keys available to the
TsigMiddlewareSvcwhich is already used by the zone server, it just had no keys to work with until now.Known issues:
NotifyMiddlewareSvcfrom thedomaincrate which invokes a callback when a NOTIFY is received but fails to pass the TSIG key used to the callback defined in Cascade.Changes made
CLI changes:
cascade tsigsubcommand toadd,listandremoveTSIG keys.cascade zone addcommand--sourceargument with an optional^<TSIG key name>suffix (aligned with the syntax chosen in Initial support forset tsig-store-pathandset publication-nameserver. dnst#166) to define the TSIG key that Cascade should use when sending XFR related requests to the upstream.Policy changes:
key-manager.publication-nameserverspolicy setting that corresponds to thednst keyset set publication-nameserverscommand.Daemon changes:
/tsig/API endpoints to match the new CLI commands.hmac-prefixed names for TSIG algorithms.zone addis done quickly enough aftertsig addthatdnst keysetdoesn't also try and read the TSIG store JSON file and find that the key is missing as it wasn't written to disk yet (due to the 5 second wait before dirty state is flushed to disk).Documentation changes:
cascade-tsig.rstman page for the newcascade tsigsubcommand.cascade-zone.rstman page for the extension of thezone add --sourceargument syntax.IntegrationsTOC entry toHSM Integrationsand added aNameserver IntegrationsTOC entry.Nameserver Integrations/NSDpage showing how to use NSD as an upstream, optionally with TSIG.Zone Transferspage describing how Cascade communicates using zone transfers with upstreams and downstreams.System test changes:
dnstwith support for the newdnst keyset set tsig-store-pathanddnst keyset set publication-nameserversubcommands.tsig-keykey definition and a newprimary-tsigpattern that is modeled after theprimarypattern but which requires the newtsig-key, and adds anexample-tsig.testzone that uses it.upstream-tsigsystem test which tests XFR against NSD using the new "TSIG required" zone and associated settings, firstly deliberately without the TSIG key configured, so zone loading fails, then with it configured such that zone loading succeeds.This PR still requires:
cascade zone add, e.g. for NOTIFY in authentication? For TSIG request to upstream authentication? What about SOA queries sent to the upstream? Does the Cascade behaviour match the NSD behaviour and if not why not?tsig addsubcommand and the extension to the interpretation of thezone add --sourceargument.tsig listCLI command to list TSIG keys.tsig removeCLI command to remove TSIG keys. Removing an in-use TSIG key should presumably be disallowed. Perhaps there also needs to be a way to list the zones using a TSIG key, maybecascade tsig showcould show that?Possibly also extendIgnore this for now as it would collide with PR Status rework #567 and isn't strictly needed for this PRcascade zone statusCLI command to show which TSIG key if any the zone is using with its source?INFO cascaded::loader::zone: Setting source of zone 'nl'logging.To think about:
If you are changing Rust code or integration tests (
Cargo.*,crates/,etc/,integration-tests/,src/):actthrough theact-wrapper(as described inTESTING.md)?If you are adding/deleting man pages:
man_pagesconfig indoc/manual/source/conf.py?Cargo.toml?If you are modifying man pages: