Skip to content

Conversation

bnaecker
Copy link
Collaborator

@bnaecker bnaecker commented Sep 26, 2025

  • Add database queries for marking any number of IP Pools as delegated for Oxide internal use, rather than assuming there's just one.
  • Rework queries for linking / unlinking IP Pool and Silo, making them concurrency-safe and preventing deletion when there are IPs or we're trying to delete the last internal IP Pool.
  • The public API does not change here. We're still assuming exactly one IP Pool per IP version for internal usage, and not allowing the link or pool to appear in the API. That'll be a follow-up commit.
  • Fixes Enforce relationship between IP Pools reserved for Oxide services silo #8948

@bnaecker bnaecker force-pushed the internal-support-for-ip-pool-delegation branch from f3c16d2 to 15bc99a Compare September 27, 2025 15:30
- Add database queries for linking any number of IP Pools to the Oxide
  internal Silo, rather than assuming there's just one.
- Rework queries for linking / unlinking IP Pool and Silo, making them
  concurrency-safe and preventing deletion when there are IPs or we're
  trying to delete the last internal IP Pool.
- The public API does not change here. We're still assuming exactly one
  IP Pool per IP version for internal usage, and not allowing the link
  or pool to appear in the API. That'll be a follow-up commit.
- Fixes #8948
@bnaecker bnaecker force-pushed the internal-support-for-ip-pool-delegation branch from 15bc99a to 44a2d02 Compare September 29, 2025 20:31
@bnaecker
Copy link
Collaborator Author

bnaecker commented Oct 1, 2025

I'm actually rethinking this a bit. There's a lot of complexity here, mostly because we use the ip_pool_resource table to link both customer silos, and to reflect the fact that a pool is "delegated" for our internal usage. The latter is when we link the pool to our hidden internal silo.

This makes the database queries pretty complicated, since we're trying to enforce that a pool is one of:

  • unlinked
  • linked to 1 or more customer silos
  • linked to our internal silo, of which there is exactly one

This also means you can't just look at the ip_pool table to know whether a pool is delegated, you have to join it with the resource table first.

I think I want to restructure this. I want to add an is_delegated column on the ip_pool table, which takes the place of the link between the pool and our internal silo. The ip_pool_resource table would only represent links to customer resources. This should make the conditional queries that enforce the above invariants simpler, and will also make #8947 easier, since we can tell if a pool is delegated from just one table.

All that to say, I'm moving this to a draft now, and I'll un-draftify it when I'm happy again.

@bnaecker bnaecker marked this pull request as draft October 1, 2025 16:34
@david-crespo
Copy link
Contributor

So would a delegated pool be disallowed from being linked? Or could a pool do both?

@bnaecker
Copy link
Collaborator Author

bnaecker commented Oct 1, 2025

That would be disallowed, just like it is today and in this PR. A pool is delegated to Oxide XOR linkable to customer silos.

- Add an `is_delegated` column to the `ip_pool` table
- Enforce that pools are delegated XOR linkable to customer silos by
  looking at this table and using conditional update queries.
@bnaecker bnaecker marked this pull request as ready for review October 3, 2025 18:22
@bnaecker
Copy link
Collaborator Author

bnaecker commented Oct 3, 2025

Alright, I've marked this ready for review now. The latest commits add an is_delegated column to the ip_pool table. This is true if the pool is delegated to Oxide, or false if an operator wants to revoke it. This takes the place of linking to the private, internal Oxide silo, meaning the ip_pool_resource table is just for linking to customer silos. This is a bit simpler than the previous code, especially for doing things like "listing all the delegated pools". But it's not as big a simplification as I'd hoped, so I'm open to reverting it to 44a2d02 if we want to link the internal Silo the same way.

@bnaecker bnaecker changed the title Internal support for linking any number of IP Pools and Silos. Database support for delegating IP Pools for Oxide internal use Oct 3, 2025
pub rcgen: i64,

/// True if the IP Pool has been delegated for Oxide use.
pub is_delegated: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming at this review without much context, I'm wondering if is_delegated is too vague/generic and should say what it's delegated to, although I don't know how to be more specific without making it a lot longer. is_delegated_to_services? is_delegated_to_oxide? Entirely possible this is fine as-is and "look at the doc comment to see what it's delegated to", especially if we don't think we'll ever have other kinds of IP pool delegation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgallagher and I talked a bit live about this. I think we'll change the name here to something like purpose or intended_use, and make this an enum rather than a bool. We'd have two variants today, indicating the pool can be used for external silos or for Oxide internal use. We can add variants if we need in the future for expressing things like "use by Nexus" or "use by external DNS".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all this in e4ff3b8.

// NOTE: It'd be better to do one roundtrip to the DB, but this is
// rarely-used right now. We also want to return the authz and database
// objects, so we need the lookup-path mechanism.
// TODO-remove: Use list_ip_pools_for_internal instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be addressed on this PR? Will callers of this method either fail (if the original built-in pools no longer exist) or succeed but with incorrect/incomplete information (if additional pools have been delegated)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK as-is, because you cannot actually change these pools yet. There's no API in Nexus for doing that. I have a follow-up PR in the works which makes the delegation API public, which also changes how this method works.

.await
.unwrap();

// We should be able to delete one of these.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows the delegated pools to be down to only IPv6 - do we know that works throughout the rest of the control plane? (E.g., I don't think reconfigurator cares, but AFAIK we've never tested an IPv6 only setup.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same comment as above applies -- this tests the internal database logic, but you can't exercise this through an API yet. I don't think it's possible to get into a situation in a real deployment where there is only an IPv6 pool. Even if there were a way to delete the IPv4 pool, I think that'd fail because there are addresses allocated out of it anyway.

- Add an enum for IP Pool reservation type, currently just representing
  internal or external uses.
- Rework queries to be phrased in terms of "reserving" a pool for a
  specific use and checking for valid reservation types at query time.
- Add expectorate tests for all complicated queries.
@bnaecker bnaecker requested a review from jgallagher October 7, 2025 18:25
Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good; I like the enum over the boolean quite a lot. Mostly just have more vague questions for you. 😅

)
AND CAST(
IF(
(SELECT count(1) FROM ip_pool WHERE time_deleted IS NULL AND reservation_type = $5 LIMIT 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud - I think in practice this extra guard is probably not necessary? The guard above that we can't change the type of a pool if there are any services using IPs from it will prevent deleting any in-use pool, and there will always be at least one in-use pool (otherwise we wouldn't have any services with external IPs at all), right? But this seems super cheap and maybe makes it very clear that we always have to have at least one, so I'm on board with keeping it anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the checks are doing slightly different things. The first one ensures that we don't delete the pool if there are addresses used from it. This one checks that there are at least two pools of this same reservation type. Those could be different in a future where we make more fine-grained distinctions between the types of internal pools, like NTP, DNS, etc. It might still be the case that the address-use guard is sufficient, but I'm with you that this is cheap and conservative :)

NOT (
resource_type = 'silo' AND
resource_id = '001de000-5110-4000-8000-000000000001' AND
is_default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is removing this constraint correct as a part of this PR? I'm not sure how reservation_type / multiple external pools interacts with is_default. Will we let people set a default external service pool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- I put this in place recently when I thought we'd reserve pools by linking them to the internal silo. Since we're not doing that anymore, it seemed more confusing than helpful to keep the constraint.

However! I realize I need a data migration to remove those links to the internal silo. I'll add that now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the deletion in beb5523

AND reservation_type = $3
AND CAST(
IF(
EXISTS(SELECT 1 FROM ip_pool_resource WHERE ip_pool_id = $4 LIMIT 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thinking out loud - this all looks correct to me, but I'm vaguely worried that our checks to switch between reservation types are so different. If I want to change something away from "for external silos", I have to check the ip_pool_resource table, but if I want to change something away from "for Oxide internal", I have to check the external_ip table. Should rows in external_ip also have corresponding rows in ip_pool_resource?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should rows in external_ip also have corresponding rows in ip_pool_resource?

I think this is true for addresses used by external silo resources, like instances. So to move a pool away from "for Oxide internal", we couldn't use a check against that table. Not to confuse things further, but we could do that if we went back to linking the pool to the internal silo when we reserve it "for Oxide internal".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enforce relationship between IP Pools reserved for Oxide services silo

3 participants