[Nexthop] Distro CLI reprovision #976

Open
travisb-nexthop wants to merge 2 commits into facebook:main from nexthop-ai:distro_cli_2.1_reprovision
@travisb-nexthop
Contributor

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

The fboss-image device reprovision command logs onto the switch, wipes
enough of the installed image to be confident that provisioning will
proceed or loudly fail, then reboots.

Test Plan

Run the reprovision command:

$ ./fboss-image device dc:da:4d:fc:ad:2d reprovision
Warning: Permanently added '10.250.33.2' (ED25519) to the list of known hosts.
root@10.250.33.2's password:
50+0 records in
50+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 0.0181773 s, 2.9 GB/s
Yes/No? Warning: Partition /dev/nvme0n1p3 is being used. Are you sure you want to continue?
yes
Ignore/Cancel? ignore
Error: Partition(s) 3 on /dev/nvme0n1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
Information: You may need to update /etc/fstab.

Rebooting.
Timeout, server 10.250.33.2 not responding

See on the switch console:

[root@fboss103 ~]# [367274.792918] reboot: Restarting system
[15][19][11][32][A1][A9][A9][A9][A8][AA][AE][AF][AF][CD][B0][C1][B1][C2][C3][B1][B4][B8][C5][B2][C6][C7][B3][B6][B6][B7][B7][B7][B7][B7][BE][D2][D6][B9][C7][C7][CC][B7][B8][C9][BA][CB][BB][D0][D0][D0][D0][D0][D1][D1][D1][CA][B7][D3][CC][BC][CE][C6][AF][4F][3B][33][60][
61][9A][62][68][69][6A][79][70][71][90][91][92][94][94

Version 2.22.1286. Copyright (C) 2024 AMI
BIOS Date: 06/27/2024 13:16:17 Ver: NL402
Press <DEL> or <ESC> to enter setup.

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv6 on MAC: DC-DA-4D-FC-AD-2D. Press ESC key to abort PXE boot..
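
One way to make the console check above repeatable is to scan a captured console log for the PXE line and compare its MAC with the one passed to the CLI. The sketch below is not part of this PR; the helper names `pxe_boot_macs` and `rebooted_into_pxe` are hypothetical, and the pattern assumes the BIOS prints the line in the format shown in the console output above.

```python
import re
from typing import List

# Matches e.g. ">>Start PXE over IPv6 on MAC: DC-DA-4D-FC-AD-2D."
# Assumption: the BIOS always prints the MAC as dash-separated hex pairs.
PXE_LINE = re.compile(
    r"Start PXE over IPv[46] on MAC: ([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})"
)


def pxe_boot_macs(console_log: str) -> List[str]:
    """Return every MAC the console reported a PXE attempt on, as aa:bb:cc:dd:ee:ff."""
    return [mac.replace("-", ":").lower() for mac in PXE_LINE.findall(console_log)]


def rebooted_into_pxe(console_log: str, mac: str) -> bool:
    """True if the console log shows a PXE boot attempt for the given MAC."""
    wanted = mac.replace("-", ":").lower()
    return wanted in pxe_boot_macs(console_log)
```

Calling `rebooted_into_pxe(log_text, "dc:da:4d:fc:ad:2d")` on the captured console text would confirm that the switch fell through to PXE after the wipe rather than booting the old image.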

The Distro CLI getip and ssh commands make a best-effort attempt to
determine an IP for the given switch based on its MAC address and then
either return that IP or directly ssh to it.

They do so by connecting to the Distro Infra container and checking the
configured interface for existing IP neighbor/ARP entries. If no such
entries are found, they attempt a subnet ping to refresh the Linux
kernel neighbor entries.

In most cases this is sufficient, because the neighbor cache is already
populated once the system has PXE booted.
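
The neighbor-table lookup can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names are hypothetical, it assumes the entries are read with `ip neigh show dev <intf>`, and it runs the command locally rather than inside the Distro Infra container. The subnet-ping fallback is left out.

```python
import re
import subprocess
from typing import Optional


def normalize_mac(mac: str) -> str:
    """Normalize dc:da:4d:..., DC-DA-4D-... and similar spellings to aa:bb:cc:dd:ee:ff."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_ip_for_mac(neigh_output: str, mac: str) -> Optional[str]:
    """Return the first IP whose neighbor entry carries the given MAC, else None."""
    target = normalize_mac(mac)
    for line in neigh_output.splitlines():
        fields = line.split()
        # Entries without a resolved link-layer address (e.g. FAILED) have no lladdr.
        if "lladdr" not in fields:
            continue
        idx = fields.index("lladdr")
        if idx + 1 >= len(fields):
            continue
        try:
            if normalize_mac(fields[idx + 1]) == target:
                return fields[0]  # the IP address is the first field of the entry
        except ValueError:
            continue
    return None


def lookup_ip(intf: str, mac: str) -> Optional[str]:
    """Query the kernel neighbor table on one interface (assumption: run locally)."""
    out = subprocess.run(
        ["ip", "neigh", "show", "dev", intf],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_ip_for_mac(out, mac)
```

Keeping the parsing in `find_ip_for_mac` separate from the `ip` invocation means the matching logic can be exercised against canned neighbor-table text without a switch attached.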

Testing is as easy as starting the Distro Infra container:
```
$ ./distro_infra.sh --intf eth1 --persist-dir data
```

Then running the fboss-image device getip command with the appropriate
MAC address:
```
$ ./fboss-image device dc:da:4d:fc:ad:2d getip
[0.00s] Getting IP for device dc:da:4d:fc:ad:2d
10.250.33.2
```
@travisb-nexthop
Contributor Author

This is stacked on top of #975

@travisb-nexthop travisb-nexthop marked this pull request as ready for review March 4, 2026 19:11
@travisb-nexthop travisb-nexthop requested a review from a team as a code owner March 4, 2026 19:11