Commit 2247343

JohnGarbutt authored and markgoddard committed
Support batching up commands

When you have around 60 baremetal nodes attached to a single switch, it takes a long time to execute all those commands. This gets worse when you limit the number of concurrent SSH connections.

Here we look to batch up commands to send to the switch together using a single connection. The results of each port's commands are returned when available.

This is implemented using etcd as a queueing system. Commands are added to an input key, then a worker thread processes the available commands for a particular switch device. We pull commands off the queue in order of the revision at which their keys were added, giving a FIFO-style queue. The result of each command set is added to an output key, which the original request thread is watching. Distributed locks are used to serialise the processing of commands for each switch device.

Various neat etcd features are used here to alleviate some of the issues of distributed task coordination, including transactions, leases, watches, historical key/value tracking, etc.

Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I8c458bbc94df5630cfede5434bcdbe527988059c
(cherry picked from commit 45b237b)
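The queueing scheme described in this commit message can be sketched in a few lines of Python. This is a minimal, single-process sketch under stated assumptions, not the driver's implementation: `FakeKVStore` is a hypothetical in-memory stand-in for etcd that models only a global revision counter and each key's creation revision; `BatchQueue`, the `/batch/<device>/input/` and `/batch/<device>/output/` key layout, and the `send_batch` callback are illustrative names; local `threading.Lock` objects stand in for distributed locks; and the caller reads its output key directly rather than watching it.

```python
import itertools
import threading
import uuid
from collections import defaultdict


class FakeKVStore:
    """In-memory stand-in for etcd (an assumption of this sketch).

    Each write bumps a global revision, and each key remembers the
    revision at which it was created, like etcd's create_revision.
    """

    def __init__(self):
        self._revisions = itertools.count(1)
        self._data = {}  # key -> (create_revision, value)
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            rev = next(self._revisions)
            create_rev = self._data[key][0] if key in self._data else rev
            self._data[key] = (create_rev, value)

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
        return None if entry is None else entry[1]

    def get_prefix(self, prefix):
        """Return (key, create_revision, value) for every key under prefix."""
        with self._lock:
            return [(k, rev, v) for k, (rev, v) in self._data.items()
                    if k.startswith(prefix)]

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)


class BatchQueue:
    """Hypothetical per-device command queue built on the KV store."""

    def __init__(self, store):
        self.store = store
        self._guard = threading.Lock()
        # Local per-device locks stand in for distributed locks.
        self._device_locks = defaultdict(threading.Lock)

    def submit(self, device, commands):
        """Queue one command set and return the output key to watch."""
        request_id = uuid.uuid4().hex
        self.store.put(f"/batch/{device}/input/{request_id}", commands)
        return f"/batch/{device}/output/{request_id}"

    def process(self, device, send_batch):
        """Drain the device's queue in FIFO order using one send_batch call."""
        with self._guard:
            device_lock = self._device_locks[device]
        prefix = f"/batch/{device}/input/"
        with device_lock:
            # Sort by creation revision, not key name, to get FIFO order.
            pending = sorted(self.store.get_prefix(prefix), key=lambda e: e[1])
            if not pending:
                return 0
            results = send_batch(device, [value for _, _, value in pending])
            for (key, _, _), result in zip(pending, results):
                request_id = key[len(prefix):]
                self.store.put(f"/batch/{device}/output/{request_id}", result)
                self.store.delete(key)
            return len(pending)


# Usage: two port updates queued for one switch, drained in one batch.
sent_batches = []


def fake_send_batch(device, command_sets):
    # Pretend to open a single SSH session and run every command set.
    sent_batches.append(list(command_sets))
    return [f"ok: {'; '.join(cmds)}" for cmds in command_sets]


queue = BatchQueue(FakeKVStore())
out_a = queue.submit("switch1", ["interface 1/1", "switchport access vlan 100"])
out_b = queue.submit("switch1", ["interface 1/2", "switchport access vlan 200"])
processed = queue.process("switch1", fake_send_batch)
```

Because the request IDs are random UUIDs, it is the sort on creation revision, not the key names, that yields FIFO order in this sketch.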
1 parent 129eccd commit 2247343

9 files changed: 990 additions, 12 deletions

doc/source/configuration.rst

Lines changed: 42 additions & 4 deletions
@@ -118,8 +118,9 @@ for the Dell PowerConnect device::
     ngs_switchport_mode = access
 
 Dell PowerConnect devices have been seen to have issues with multiple
-concurrent configuration sessions. See :ref:`synchronization` for details on
-how to limit the number of concurrent active connections to each device.
+concurrent configuration sessions. See :ref:`synchronization` and
+:ref:`batching` for details on how to limit the number of concurrent active
+connections to each device.
 
 for the Brocade FastIron (ICX) device::
 
@@ -201,8 +202,16 @@ connection URL for the backend should be configured as follows::
     [ngs_coordination]
     backend_url = <backend URL>
 
-The default is to limit the number of concurrent active connections to each
-device to one, but the number may be configured per-device as follows::
+The backend URL format includes the Tooz driver as the scheme, with driver
+options passed using query string parameters. For example, to use the
+``etcd3gw`` driver with an API version of ``v3`` and a path to a CA
+certificate::
+
+    [ngs_coordination]
+    backend_url = etcd3+https://etcd.example.com?api_version=v3&ca_cert=/path/to/ca/cert.crt
+
+The default behaviour is to limit the number of concurrent active connections
+to each device to one, but the number may be configured per-device as follows::
 
     [genericswitch:device-hostname]
     ngs_max_connections = <max connections>
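To see how such a backend URL splits into a driver scheme and driver options, here is a small Python sketch using only the standard library. It assumes the options are read with ordinary query-string parsing (as `urllib.parse.parse_qs` does); `split_backend_url` is a hypothetical helper, not part of Tooz or the driver. Under that assumption, multiple options must be separated with `&`: a comma is not a separator, so everything after `api_version=` ends up in a single value.

```python
from urllib.parse import parse_qs, urlparse


def split_backend_url(backend_url):
    """Split a Tooz-style backend URL into (scheme, host, options).

    Hypothetical helper: assumes driver options are ordinary query-string
    parameters, so several options are joined with '&'.
    """
    parsed = urlparse(backend_url)
    # parse_qs maps each name to a list of values; keep the last one.
    options = {name: values[-1]
               for name, values in parse_qs(parsed.query).items()}
    return parsed.scheme, parsed.hostname, options


scheme, host, options = split_backend_url(
    "etcd3+https://etcd.example.com"
    "?api_version=v3&ca_cert=/path/to/ca/cert.crt")

# With a comma instead of '&', the rest of the string becomes one value.
_, _, comma_options = split_backend_url(
    "etcd3+https://etcd.example.com"
    "?api_version=v3,ca_cert=/path/to/ca/cert.crt")
```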
@@ -216,6 +225,35 @@ timeout of 60 seconds before failing. This timeout can be configured as follows
     ...
     acquire_timeout = <timeout in seconds>
 
+.. _batching:
+
+Batching
+========
+
+For many network devices there is a significant SSH connection overhead which
+is incurred for each network or port configuration change. In a large-scale
+system with many concurrent changes, this overhead adds up quickly. Since the
+Antelope release, the Generic Switch driver supports batching switch
+configuration changes and applying them together over a single SSH connection.
+
+This is implemented using etcd as a queueing system. Commands are added
+to an input key, then a worker thread processes the available commands
+for a particular switch device. We pull commands off the queue in order of
+the revision at which their keys were added, giving a FIFO-style queue. The
+result of each command set is added to an output key, which the original
+request thread is watching. Distributed locks are used to serialise the
+processing of commands for each switch device.
+
+The etcd endpoint is configured using the same ``[ngs_coordination]
+backend_url`` option used in :ref:`synchronization`, with the limitation that
+only ``etcd3gw`` is supported.
+
+Additionally, each device that will use batched configuration should include
+the following option::
+
+    [genericswitch:device-hostname]
+    ngs_batch_requests = True
+
 Disabling Inactive Ports
 ========================
 
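The per-device opt-in, and the connection savings that motivate it, can be illustrated with a short Python sketch. It uses `configparser` purely for illustration (the driver reads these sections through its own configuration handling), assumes `ngs_batch_requests` is off unless set, as the text implies, and treats `batch_size` as a hypothetical stand-in for however many changes happen to be queued together when the worker runs. `batching_enabled` and `count_connections` are illustrative helpers, not driver functions.

```python
import configparser

# Example configuration; the second section does not opt in to batching.
CONFIG = """
[genericswitch:device-hostname]
ngs_batch_requests = True

[genericswitch:other-switch]
ngs_max_connections = 2
"""


def batching_enabled(parser, device):
    """Return True if the device's section sets ngs_batch_requests."""
    # Missing section or option falls back to False (assumed default).
    return parser.getboolean(
        f"genericswitch:{device}", "ngs_batch_requests", fallback=False)


def count_connections(num_changes, batched, batch_size=1):
    """Number of SSH connections needed for num_changes port updates.

    Unbatched: one connection per change. Batched: one connection per
    batch of (hypothetical) batch_size queued changes, rounded up.
    """
    if not batched:
        return num_changes
    return -(-num_changes // batch_size)  # ceiling division


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
```

With the 60 nodes from the commit message, 60 unbatched changes need 60 SSH connections, whereas the same changes drained in batches of 10 need 6.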