Commit 4d7a8a0

feat: extract robust SSH Connection Manager to simplify provision command
This commit extracts SSH connection management into a dedicated collaborator class, reducing the complexity of the provision command and improving maintainability. The SSH Connection Manager adds several robustness features for reliable VM communication.

Key Changes:
- Extract the TorrustDeploy::Infrastructure::SSH::Connection class
- Implement connection health checks with 30-second caching
- Add automatic reconnection with configurable retry logic
- Add connection expiration tracking (5-minute default)
- Replace system command calls with the Net::SSH2 library
- Refactor the provision command to use the SSH Connection Manager
- Add a test suite (unit, integration, E2E)

Robustness Features:
- Health validation using lightweight echo commands
- Automatic recovery from stale connections and network issues
- Connection lifetime management to prevent timeout problems
- Configurable robustness settings (health_check_enabled, auto_reconnect)
- Improved error handling and state management

Technical Implementation:
- Uses Net::SSH2 for SSH communication
- Moo object system with lazy connection initialization
- Connection reuse with automatic invalidation when needed
- Backward-compatible API with improved reliability
- POD documentation

Testing:
- Unit tests for all SSH Connection Manager methods
- Integration tests for the provision command workflow
- E2E tests with Docker-based SSH server simulation
- All 19 tests passing (unit + integration)

Dependencies:
- Added Net::SSH2 and Carp to cpanfile
- Documented the libssh2-1-dev system dependency
- Updated README with installation and usage instructions

This extraction lays the foundation for future refactoring work, providing an SSH communication layer that will be used throughout the application.
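
The commit message lists 30-second health-check caching, a 5-minute connection lifetime, lazy connection and automatic reconnection, but the `TorrustDeploy::Infrastructure::SSH::Connection` class itself is not shown on this page. The sketch below shows one way that bookkeeping can fit together in core Perl; it is not the class's actual code. The package name `HealthTracker`, its `probe`/`connect`/`clock` callbacks and the reconnect policy are assumptions, with `probe` standing in for the lightweight `echo` check and `clock` injectable so the timing logic can be tested without waiting.

```perl
use strict;
use warnings;

package HealthTracker;    # hypothetical helper, not part of the commit

# Defaults mirror the figures quoted in the commit message.
sub new {
    my ($class, %args) = @_;
    return bless {
        probe        => $args{probe},                  # coderef: true if the session still answers
        connect      => $args{connect},                # coderef: opens a session and returns it
        clock        => $args{clock} // sub { time },  # epoch seconds; injectable for tests
        cache_ttl    => $args{cache_ttl}    // 30,     # trust a passed check for this long
        max_lifetime => $args{max_lifetime} // 300,    # replace the session after this many seconds
        session      => undef,                         # opened lazily on first use
        created_at   => undef,
        last_ok_at   => undef,
    }, $class;
}

# Return a usable session, reconnecting if it is missing, expired or unhealthy.
sub session {
    my ($self) = @_;
    my $now = $self->{clock}->();

    my $expired = defined $self->{created_at}
        && $now - $self->{created_at} >= $self->{max_lifetime};

    if (!$self->{session} || $expired || !$self->_healthy($now)) {
        $self->{session}    = $self->{connect}->();
        $self->{created_at} = $now;
        $self->{last_ok_at} = $now;    # assumption: a fresh connect counts as a passed check
    }
    return $self->{session};
}

# Probe only when the last successful check is older than the cache window.
sub _healthy {
    my ($self, $now) = @_;
    return 1 if defined $self->{last_ok_at}
        && $now - $self->{last_ok_at} < $self->{cache_ttl};
    return 0 unless $self->{probe}->($self->{session});
    $self->{last_ok_at} = $now;
    return 1;
}

package main;
```

In this sketch a freshly opened session counts as healthy, so no probe runs until the 30-second window has passed, and an expired session is replaced without probing it first; the real class may make different choices.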
1 parent ba74577 commit 4d7a8a0

11 files changed, +1065 -40 lines changed

README.md

Lines changed: 14 additions & 0 deletions

    @@ -41,6 +41,13 @@ local development.
     
     ## Installation
     
    +First, install system dependencies:
    +
    +```bash
    +# On Ubuntu/Debian:
    +sudo apt install libssh2-1-dev
    +```
    +
     Install cpanminus:
     
     ```bash
    @@ -114,6 +121,12 @@ Before using the provision command, ensure you have:
     sudo usermod -aG libvirt $USER
     ```
     
    +- **SSH development libraries** for Net::SSH2 Perl module:
    +
    +```bash
    +sudo apt install libssh2-1-dev
    +```
    +
     - **Default libvirt storage pool** configured:
     
     ```bash
    @@ -170,6 +183,7 @@ Run end-to-end tests that require local virtualization support:
     - Local machine with KVM/libvirt support
     - OpenTofu installed
     - Required system tools: `qemu-system-x86_64`, `sshpass`
    +- SSH development libraries: `libssh2-1-dev`
     - Cannot run in CI environments
     
     ### All Tests

cpanfile

Lines changed: 2 additions & 0 deletions

    @@ -3,6 +3,8 @@ requires 'App::Cmd';
     requires 'Moo';
     requires 'namespace::clean';
     requires 'Path::Tiny';
    +requires 'Net::SSH2';
    +requires 'Carp';
     
     on 'test' => sub {
         requires 'Test2::Suite';
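
`Net::SSH2` is an XS binding to libssh2, which is why the README now asks for `libssh2-1-dev` before the Perl dependencies are installed: the module is compiled against that library. A quick post-install check is to `require` each runtime dependency and report which ones fail to load. The `missing_modules` helper below is hypothetical and not part of the repository.

```perl
use strict;
use warnings;

# Hypothetical helper: return the module names from @_ that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Net::SSH2 -> Net/SSH2.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Example: check the runtime dependencies listed in the cpanfile.
my @missing = missing_modules(qw(App::Cmd Moo namespace::clean Path::Tiny Net::SSH2 Carp));
print @missing ? "Missing: @missing\n" : "All dependencies load\n";
```

If `Net::SSH2` shows up as missing right after `cpanm --installdeps .`, the build log is the first place to look for a missing libssh2 header.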

lib/TorrustDeploy/App/Command/Provision.pm

Lines changed: 64 additions & 40 deletions

    @@ -4,6 +4,7 @@ use v5.38;
     
     use TorrustDeploy::App -command;
     use TorrustDeploy::Provision::OpenTofu;
    +use TorrustDeploy::Infrastructure::SSH::Connection;
     use Path::Tiny qw(path);
     use File::Spec;
     use Time::HiRes qw(sleep);

    @@ -54,14 +55,17 @@ sub execute {
         # Get VM IP address
         my $vm_ip = $tofu->get_vm_ip($tofu_dir);
     
    +    # Create SSH connection
    +    my $ssh_connection = TorrustDeploy::Infrastructure::SSH::Connection->new(host => $vm_ip);
    +
         # Wait for cloud-init completion
    -    $self->_wait_for_cloud_init($vm_ip);
    +    $self->_wait_for_cloud_init($ssh_connection);
     
         # Verify SSH key authentication after cloud-init completes
    -    $self->_verify_ssh_key_auth($vm_ip);
    +    $self->_verify_ssh_key_auth($ssh_connection);
     
         # Show final summary
    -    $self->_show_final_summary($vm_ip);
    +    $self->_show_final_summary($ssh_connection);
     }
     
     sub _copy_templates {

    @@ -100,7 +104,7 @@ sub _copy_templates {
     }
     
     sub _wait_for_cloud_init {
    -    my ($self, $vm_ip) = @_;
    +    my ($self, $ssh_connection) = @_;
     
         say "Waiting for cloud-init to complete...";
         say "This may take several minutes while packages are installed and configured.";

    @@ -117,10 +121,9 @@ sub _wait_for_cloud_init {
         while ($attempt < $max_attempts && !$ssh_connected) {
             $attempt++;
     
    -        my $ssh_test = system("timeout 5 sshpass -p 'torrust123' ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'echo \"SSH connected\"' >/dev/null 2>&1");
    -        if ($ssh_test == 0) {
    +        if ($ssh_connection->test_password_connection()) {
                 $ssh_connected = 1;
    -            say "✅ SSH password connection established to $vm_ip";
    +            say "✅ SSH password connection established to " . $ssh_connection->host;
             } else {
                 if ($attempt % 6 == 0) { # Every 30 seconds
                     say " [Waiting for SSH connection... ${attempt}0s elapsed]";
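
Both loops in `_wait_for_cloud_init` poll a fixed number of times and print progress every sixth attempt. The sleep interval is outside the diff; the timeout message's `$max_attempts * 5 / 60` suggests 5 seconds, in which case the `${attempt}0s` label (which appends a literal `0`, so attempt 6 prints "60s") only matches reality if each SSH attempt itself also takes about 5 seconds. A hypothetical `poll_until` helper that measures elapsed time with a clock instead of inferring it from the attempt count; the name, arguments and injectable `clock`/`sleep` are assumptions for illustration, and `Time::HiRes` is already loaded by `Provision.pm`.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: call $args{check} up to $args{max_attempts} times,
# sleeping $args{interval} seconds between failed attempts. Every
# $args{report_every} attempts, $args{progress} receives the whole seconds
# elapsed according to $args{clock}, which includes time spent inside check.
# Returns the successful attempt number, or 0 if every attempt failed.
sub poll_until {
    my (%args) = @_;
    my $clock    = $args{clock}    // \&Time::HiRes::time;
    my $sleep    = $args{sleep}    // \&Time::HiRes::sleep;
    my $progress = $args{progress} // sub { };
    my $started  = $clock->();

    for my $attempt (1 .. $args{max_attempts}) {
        return $attempt if $args{check}->();
        $progress->(int($clock->() - $started))
            if $attempt % $args{report_every} == 0;
        $sleep->($args{interval}) if $attempt < $args{max_attempts};
    }
    return 0;
}

# Example shaped like the SSH wait loop (commented out: needs a live connection):
# poll_until(
#     max_attempts => $max_attempts, interval => 5, report_every => 6,
#     check    => sub { $ssh_connection->test_password_connection() },
#     progress => sub { print " [Waiting for SSH connection... $_[0]s elapsed]\n" },
# );
```

Because the clock is read after each check, slow connection attempts show up in the reported time instead of being hidden by a fixed per-attempt estimate.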

    @@ -130,8 +133,8 @@ sub _wait_for_cloud_init {
         }
     
         if (!$ssh_connected) {
    -        say "❌ Failed to establish SSH connection to $vm_ip after " . ($max_attempts * 5 / 60) . " minutes";
    -        $self->_print_cloud_init_logs($vm_ip);
    +        say "❌ Failed to establish SSH connection to " . $ssh_connection->host . " after " . ($max_attempts * 5 / 60) . " minutes";
    +        $self->_print_cloud_init_logs($ssh_connection);
             die "SSH connection failed";
         }
     

    @@ -142,16 +145,16 @@ sub _wait_for_cloud_init {
         while ($attempt < $max_attempts) {
             $attempt++;
     
    -        my $check_result = system("timeout 10 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'test -f $completion_file' >/dev/null 2>&1");
    +        my $result = $ssh_connection->execute_command("test -f $completion_file");
     
    -        if ($check_result == 0) {
    +        if ($result->{success}) {
                 say "✅ Cloud-init setup completed successfully!";
     
                 # Show completion message
    -            my $completion_content = `timeout 10 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'cat $completion_file' 2>/dev/null`;
    -            if ($completion_content) {
    -                chomp $completion_content;
    -                say "📅 Completion marker: $completion_content";
    +            my $completion_result = $ssh_connection->execute_command("cat $completion_file");
    +            if ($completion_result->{success} && $completion_result->{output}) {
    +                chomp $completion_result->{output};
    +                say "📅 Completion marker: " . $completion_result->{output};
                 }
                 $cloud_init_success = 1;
                 last;
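
From here on, callers treat `execute_command` as returning a hashref with at least `success` and `output` keys; the exact structure is defined in the Connection class, which this page does not show. The local stand-in below illustrates that shape and one subtlety worth checking: if `success` mirrors the remote exit status, a pipeline such as `ufw status | head -1` reports the status of `head`, so it can succeed with empty output even when `ufw` is absent. `run_local` and `output_or_default` are hypothetical; `run_local` runs commands through the local `/bin/sh`, not over SSH.

```perl
use strict;
use warnings;

# Hypothetical local stand-in for execute_command: runs the command through
# /bin/sh (not SSH) and returns the { success, exit_code, output } shape.
sub run_local {
    my ($command) = @_;
    my $output = qx{($command) 2>/dev/null};
    return {
        success   => ($? == 0 ? 1 : 0),
        exit_code => $? >> 8,
        output    => $output // '',
    };
}

# Variant of the fallback in _show_final_summary that also treats empty
# output as "not available", so a pipeline that succeeds silently still
# yields the default message.
sub output_or_default {
    my ($result, $default) = @_;
    my $out = $result->{success} ? $result->{output} : '';
    chomp $out;
    return length $out ? $out : $default;
}
```

With the committed code, a missing `ufw` would most likely leave `$ufw_status` empty, so the "Firewall" line is silently skipped instead of reporting "UFW not available".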

    @@ -167,70 +170,91 @@ sub _wait_for_cloud_init {
         }
     
         if (!$cloud_init_success) {
    -        say "❌ Timeout waiting for cloud-init to complete on $vm_ip after " . ($max_attempts * 5 / 60) . " minutes";
    -        $self->_print_cloud_init_logs($vm_ip);
    +        say "❌ Timeout waiting for cloud-init to complete on " . $ssh_connection->host . " after " . ($max_attempts * 5 / 60) . " minutes";
    +        $self->_print_cloud_init_logs($ssh_connection);
             die "Cloud-init timeout";
         }
     }
     
     sub _show_final_summary {
    -    my ($self, $vm_ip) = @_;
    +    my ($self, $ssh_connection) = @_;
     
         say "📦 Final system summary:";
     
    -    my $docker_version = `timeout 10 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'docker --version 2>/dev/null || echo "Docker not available"' 2>/dev/null`;
    +    my $docker_result = $ssh_connection->execute_command('docker --version');
    +    my $docker_version = $docker_result->{success} ? $docker_result->{output} : "Docker not available";
         chomp $docker_version if $docker_version;
         say " Docker: $docker_version" if $docker_version;
     
    -    my $ufw_status = `timeout 10 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'ufw status 2>/dev/null | head -1 || echo "UFW not available"' 2>/dev/null`;
    +    my $ufw_result = $ssh_connection->execute_command('ufw status | head -1');
    +    my $ufw_status = $ufw_result->{success} ? $ufw_result->{output} : "UFW not available";
         chomp $ufw_status if $ufw_status;
         say " Firewall: $ufw_status" if $ufw_status;
     
         say "Provisioning completed successfully!";
    -    say "VM is ready at IP: $vm_ip";
    +    say "VM is ready at IP: " . $ssh_connection->host;
     }
     
     sub _print_cloud_init_logs {
    -    my ($self, $vm_ip) = @_;
    +    my ($self, $ssh_connection) = @_;
     
         say "📄 Cloud-init logs (for debugging):";
     
         # Print cloud-init-output.log
         say "=== /var/log/cloud-init-output.log ===";
    -    my $output_log = `timeout 30 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'sudo cat /var/log/cloud-init-output.log 2>/dev/null || echo "Log file not available"' 2>/dev/null`;
    -    if ($output_log && $output_log !~ /^Log file not available/) {
    -        print $output_log;
    +    my $output_result = $ssh_connection->execute_command_with_sudo('cat /var/log/cloud-init-output.log');
    +    if ($output_result->{success}) {
    +        print $output_result->{output};
         } else {
             say "Cloud-init output log not available";
         }
     
         say "=== /var/log/cloud-init.log ===";
    -    my $main_log = `timeout 30 sshpass -p 'torrust123' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null torrust\@$vm_ip 'sudo cat /var/log/cloud-init.log 2>/dev/null || echo "Log file not available"' 2>/dev/null`;
    -    if ($main_log && $main_log !~ /^Log file not available/) {
    -        print $main_log;
    +    my $main_result = $ssh_connection->execute_command_with_sudo('cat /var/log/cloud-init.log');
    +    if ($main_result->{success}) {
    +        print $main_result->{output};
         } else {
             say "Cloud-init main log not available";
         }
     }
     
     sub _verify_ssh_key_auth {
    -    my ($self, $vm_ip) = @_;
    +    my ($self, $ssh_connection) = @_;
     
         say "🔑 Checking SSH key authentication...";
     
    -    my $ssh_key_path = "$ENV{HOME}/.ssh/testing_rsa";
    -
    -    # Test SSH key authentication
    -    my $result = system("timeout 10 ssh -i '$ssh_key_path' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o PasswordAuthentication=no torrust\@$vm_ip 'echo \"SSH key authentication successful\"' >/dev/null 2>&1");
    +    # SSH authentication might need time to fully stabilize after cloud-init reboot
    +    # Try with progressive delays: immediate, 5s, 10s, 15s
    +    my @retry_delays = (0, 5, 10, 15);
     
    -    if ($result == 0) {
    -        say "✅ SSH key authentication is working correctly!";
    -        say "You can now connect using: ssh -i ~/.ssh/testing_rsa torrust\@$vm_ip";
    -    } else {
    -        say "❌ SSH key authentication failed";
    -        $self->_print_cloud_init_logs($vm_ip);
    -        die "SSH key authentication failed";
    +    for my $attempt (0..$#retry_delays) {
    +        if ($attempt > 0) {
    +            my $delay = $retry_delays[$attempt];
    +            say "⏳ Waiting ${delay}s before retry attempt " . ($attempt + 1) . "...";
    +            sleep $delay;
    +        }
    +
    +        # Create a fresh SSH connection for key authentication test
    +        # This ensures we don't have any state issues from cloud-init monitoring
    +        my $fresh_ssh = TorrustDeploy::Infrastructure::SSH::Connection->new(
    +            host => $ssh_connection->host
    +        );
    +
    +        if ($fresh_ssh->test_key_connection()) {
    +            say "✅ SSH key authentication is working correctly!";
    +            say "You can now connect using: ssh -i " . $fresh_ssh->ssh_key_path . " " . $fresh_ssh->username . "@" . $fresh_ssh->host;
    +            return;
    +        }
    +
    +        if ($attempt < $#retry_delays) {
    +            say "⚠️ SSH key authentication failed, will retry...";
    +        }
         }
    +
    +    # All retries failed
    +    say "❌ SSH key authentication failed after all retries";
    +    $self->_print_cloud_init_logs($ssh_connection);
    +    die "SSH key authentication failed";
     }
     
     1;
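
`_verify_ssh_key_auth` now makes up to four attempts, waiting 0, 5, 10 and 15 seconds before each and building a fresh connection every time. If other callers need the same pattern, it could be factored into a helper along these lines. `retry_with_delays` is hypothetical (not part of this commit), and the sleep function is a parameter so the logic can be tested without actually waiting.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: make one attempt per entry in $args{delays}, sleeping
# that many seconds first (0 = try immediately). Returns the first true value
# produced by $args{attempt}; returns false (undef in scalar context) once
# every attempt has failed.
sub retry_with_delays {
    my (%args) = @_;
    my $delays  = $args{delays};
    my $sleep   = $args{sleep}   // \&Time::HiRes::sleep;
    my $on_fail = $args{on_fail} // sub { };

    for my $i (0 .. $#$delays) {
        $sleep->($delays->[$i]) if $delays->[$i] > 0;
        my $result = $args{attempt}->($i + 1);    # pass the 1-based attempt number
        return $result if $result;
        # Only announce a retry when another attempt is actually coming.
        $on_fail->($i + 1) if $i < $#$delays;
    }
    return;
}
```

Mapped onto `_verify_ssh_key_auth`, `delays` would be `[0, 5, 10, 15]`, `attempt` would construct a new `TorrustDeploy::Infrastructure::SSH::Connection` for the host and return `test_key_connection()`, and `on_fail` would print the retry warning.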
