@sammuti commented Jan 12, 2026

Introduces ExecutionBackend trait abstraction to support multiple compute substrates (local subprocesses, Kubernetes
pods, etc.) through a uniform interface. Refactors execution to cleanly separate CLI local runs from tower-runner
server-side execution.

Changes

New abstraction layer (crates/tower-runtime/src/execution.rs)

  • ExecutionBackend trait - defines interface for compute substrates
  • ExecutionHandle trait - manages running executions (status, logs, termination)
  • ExecutionSpec - unified specification for execution requests
  • Supporting types: BundleRef, RuntimeConfig, CacheConfig, ResourceLimits, NetworkingSpec

Backend implementations

  • SubprocessBackend - For tower-runner server-side native execution
    • Implements ExecutionBackend with full logs() support for multiple consumers
    • Used by tower-runner to stream logs to control plane
    • Located in crates/tower-runtime/src/backends/subprocess.rs
  • CliBackend (new) - For CLI --local runs
    • Simple single-consumer pattern matching original develop behavior
    • Caller creates channel and owns receiver directly
    • No complex logs() method needed
    • Located in crates/tower-runtime/src/backends/cli.rs

Refactoring

  • Updated tower-cmd/run.rs to use CliBackend for --local runs
  • Removed dead code: AppLauncher struct (replaced by direct backend usage), unused imports

Design Rationale

The abstraction cleanly separates two distinct use cases:

  1. CLI --local execution: Simple, single consumer pattern for user-facing CLI runs
  2. tower-runner server execution: Multi-consumer pattern supporting log streaming to control plane
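
The two log-delivery patterns above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `single_consumer` and `LogHub` are hypothetical names, and the real backends are async rather than built on `std::sync::mpsc`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Single-consumer pattern (the CLI case): the caller creates the channel,
/// hands the sender to the producer, and keeps the only receiver.
pub fn single_consumer() -> (Sender<String>, Receiver<String>) {
    channel()
}

/// Multi-consumer pattern (the server case): a hypothetical fan-out hub
/// that delivers every log line to each subscribed consumer.
#[derive(Clone, Default)]
pub struct LogHub {
    subscribers: Arc<Mutex<Vec<Sender<String>>>>,
}

impl LogHub {
    /// Registers a new consumer and returns its private receiver.
    pub fn subscribe(&self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Sends the line to every live subscriber and forgets those whose
    /// receiver has been dropped.
    pub fn publish(&self, line: &str) {
        let mut subs = self.subscribers.lock().unwrap();
        subs.retain(|tx| tx.send(line.to_string()).is_ok());
    }
}
```

The single-consumer form needs no shared state at all, which is the simplicity the description attributes to the CLI path; the hub adds a lock and a subscriber list so that more than one reader can see the same lines.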

Copilot AI left a comment

Pull request overview

This PR introduces an ExecutionBackend trait abstraction to enable Tower to support multiple compute substrates (local processes, Kubernetes pods, etc.) through a uniform interface, while refactoring existing local execution to implement this new abstraction.

Changes:

  • Added new execution abstraction layer with ExecutionBackend and ExecutionHandle traits
  • Implemented LocalBackend wrapping existing LocalApp functionality
  • Added dependencies async-trait and uuid for trait support and ID generation

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

File                                     Description
crates/tower-runtime/src/execution.rs    Defines core execution traits, types, and abstractions for backend-agnostic execution management
crates/tower-runtime/src/local.rs        Implements LocalBackend and LocalHandle to adapt existing subprocess execution to new abstraction
crates/tower-runtime/src/lib.rs          Exports new execution module
crates/tower-runtime/src/errors.rs       Adds error variants for execution abstraction (AppNotStarted, NoHandle, InvalidPackage)
crates/tower-runtime/Cargo.toml          Adds async-trait and uuid dependencies
Cargo.toml                               Defines workspace-level versions for async-trait and uuid

Comment on lines 615 to 617
    package: match spec.bundle {
        BundleRef::Local { path } => Package::from_unpacked_path(path).await,
    },

Copilot AI Jan 12, 2026

The Package::from_unpacked_path call is not wrapped in error handling. If this operation fails, the error message will be generic. Consider adding context about which bundle path failed to load to improve debugging.
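
A minimal sketch of what attaching the bundle path to the failure could look like. Everything here is an assumption for illustration: `BundleLoadError`, `load_bundle`, and the `manifest.toml` file name are hypothetical, and a plain file read stands in for `Package::from_unpacked_path`, whose real signature and error type are not shown in this thread.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type that records which bundle path failed to load.
#[derive(Debug)]
pub struct BundleLoadError {
    pub path: PathBuf,
    pub cause: std::io::Error,
}

impl fmt::Display for BundleLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "failed to load bundle at {}: {}",
            self.path.display(),
            self.cause
        )
    }
}

impl std::error::Error for BundleLoadError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.cause)
    }
}

/// Stand-in for the real loader: reads a (hypothetical) manifest file from
/// the bundle directory and attaches the path to any I/O failure.
pub fn load_bundle(path: &Path) -> Result<String, BundleLoadError> {
    std::fs::read_to_string(path.join("manifest.toml")).map_err(|cause| BundleLoadError {
        path: path.to_path_buf(),
        cause,
    })
}
```

With this shape, the message names the offending path while the underlying I/O error stays reachable through `source()`.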

Comment on lines 644 to 645
typical_cold_start_ms: 1000, // ~1s for venv + sync
typical_warm_start_ms: 100, // ~100ms with warm cache

Copilot AI Jan 12, 2026

These hardcoded timing estimates should be documented as approximate values that may vary based on system resources and bundle complexity. Consider adding a comment explaining these are typical values, not guarantees.

Suggested change
- typical_cold_start_ms: 1000, // ~1s for venv + sync
- typical_warm_start_ms: 100, // ~100ms with warm cache
+ // The following timing values are typical, approximate estimates and may vary
+ // based on system resources, bundle complexity, and runtime conditions.
+ typical_cold_start_ms: 1000, // ~1s for venv + sync on a typical development machine
+ typical_warm_start_ms: 100, // ~100ms with a warm cache under typical conditions

Comment on lines 728 to 736
loop {
    let status = self.status().await?;
    match status {
        ExecutionStatus::Preparing | ExecutionStatus::Running => {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
        _ => return Ok(status),
    }
}

Copilot AI Jan 12, 2026

The polling interval of 100ms is hardcoded. For long-running executions, this creates unnecessary overhead. Consider making the polling interval configurable or implementing an event-based notification mechanism instead of polling.
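
A minimal sketch of the event-based alternative, using a standard-library `Condvar` so the waiter sleeps until the status actually changes instead of waking every 100ms. `StatusCell` is a hypothetical name, and the `Exited` and `Crashed` variants are assumed here; only `Preparing` and `Running` appear in the quoted code. In the async code under review, a `tokio::sync::watch` channel would be the closer equivalent.

```rust
use std::sync::{Arc, Condvar, Mutex};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExecutionStatus {
    Preparing,
    Running,
    Exited,  // assumed terminal variant
    Crashed, // assumed terminal variant
}

impl ExecutionStatus {
    fn is_terminal(self) -> bool {
        matches!(self, ExecutionStatus::Exited | ExecutionStatus::Crashed)
    }
}

/// Shared status slot: the backend writes to it, waiters block on it.
#[derive(Clone)]
pub struct StatusCell {
    inner: Arc<(Mutex<ExecutionStatus>, Condvar)>,
}

impl StatusCell {
    pub fn new() -> Self {
        StatusCell {
            inner: Arc::new((Mutex::new(ExecutionStatus::Preparing), Condvar::new())),
        }
    }

    /// Records a new status and wakes every waiter.
    pub fn set(&self, status: ExecutionStatus) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = status;
        cvar.notify_all();
    }

    /// Blocks until the status is terminal; no fixed polling interval.
    pub fn wait_for_completion(&self) -> ExecutionStatus {
        let (lock, cvar) = &*self.inner;
        let guard = lock.lock().unwrap();
        let guard = cvar.wait_while(guard, |s| !s.is_terminal()).unwrap();
        *guard
    }
}
```

The trade-off is that the backend must call `set` on every transition; the polling loop needs no cooperation from the producer, which may be why it was chosen for a first version.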

pub async fn status(&self) -> Result<ExecutionStatus, Error> {
    self.app
        .as_ref()
        .ok_or(Error::AppNotStarted)?

Copilot AI Jan 12, 2026

The error message 'app not started' is vague. Consider a more descriptive error such as 'cannot get status: no app is currently running' to provide better context to users.

Comment on lines 360 to 363
pub struct AppLauncher<A: App> {
    backend: Arc<A::Backend>,
    app: Option<A>,
}

Copilot AI Jan 12, 2026

The AppLauncher struct and its methods lack documentation comments. Since this is a public API component of the new abstraction, it should include doc comments explaining its purpose, usage patterns, and lifecycle management responsibilities.

@bradhe left a comment

Is this something that you want to land? It's pretty WIP-y it seems to me, has loads of duplicated stuff from elsewhere in the tower-runtime crate. I've left some comments for now, please let me know how you'd like to proceed.

let opts = StartOptions {
    ctx: spec.telemetry_ctx,
    package: match spec.bundle {
        BundleRef::Local { path } => Package::from_unpacked_path(path).await,

This should be called PackageRef not BundleRef

@sammuti changed the title from "[TOW-1299] App Isolation traits" to "[WIP][TOW-1299] App Isolation traits" on Jan 12, 2026

uuid = { workspace = true }

# K8s dependencies (optional)
k8s-openapi = { version = "0.23", features = ["v1_31"], optional = true }

Contributor Author

Update deps to latest

@sammuti changed the title from "[WIP][TOW-1299] App Isolation traits" to "[TOW-1299] App Isolation traits" on Jan 14, 2026

@bradhe left a comment

Did another review here. Let's review my feedback synchronously.

Comment on lines 599 to +601
/// monitor_local_status is a helper function that will monitor the status of a given app and waits for
/// it to progress to a terminal state.
async fn monitor_local_status(app: Arc<Mutex<LocalApp>>) -> Status {
debug!("Starting status monitoring for LocalApp");
async fn monitor_cli_status(handle: Arc<Mutex<tower_runtime::backends::cli::CliHandle>>) -> Status {

This was renamed to "cli", but runners in third-party infrastructure (e.g. self-hosted runners) will use local processes too, not Kubernetes...

Comment on lines +169 to +208
// Build container spec
// Note: In K8s, 'command' = entrypoint, 'args' = command
let container = Container {
    name: "app".to_string(),
    image: Some(spec.runtime.image.clone()),
    env: Some(env_vars),
    command: spec.runtime.entrypoint.clone(), // K8s command = entrypoint
    args: spec.runtime.command.clone(), // K8s args = command
    volume_mounts: if volume_mounts.is_empty() {
        None
    } else {
        Some(volume_mounts)
    },
    resources: Some(resources),
    working_dir: Some("/app".to_string()),
    ..Default::default()
};

// Build pod spec
let pod_spec = PodSpec {
    containers: vec![container],
    volumes: if volumes.is_empty() {
        None
    } else {
        Some(volumes)
    },
    restart_policy: Some("Never".to_string()),
    ..Default::default()
};

Ok(Pod {
    metadata: k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta {
        name: Some(format!("tower-run-{}", spec.id)),
        namespace: Some(self.namespace.clone()),
        labels: Some(labels),
        ..Default::default()
    },
    spec: Some(pod_spec),
    ..Default::default()
})

I'm assuming a change for this is coming?

Comment on lines +214 to +215
/// Get current execution status
async fn status(&self) -> Result<Status, Error>;

Does this get the status of the execution environment setup or the status of the app that's running? Or both?

Comment on lines +391 to +392
// Create ConfigMap with bundle contents and get path mapping
let path_mapping = self.create_bundle_configmap(&spec).await?;

Just calling this out for myself that I expect this will go away.

Ok(match phase.as_str() {
    "Pending" => Status::None,
    "Running" => Status::Running,
    "Succeeded" => Status::Exited,

If this function is meant to get the status of the app in its lifecycle, this means that once the Pod is provisioned, it'll get marked as "Exited" right?
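
For reference, Kubernetes defines five pod phases: Pending, Running, Succeeded, Failed, and Unknown. Succeeded is reported only after every container in the pod has terminated successfully, while scheduling and image pulls fall under Pending. A minimal sketch of a mapping that covers all five follows; `status_from_phase` is a hypothetical helper, the `Crashed` variant is assumed (only `None`, `Running`, and `Exited` appear in the quoted code), and returning an error for Unknown is one possible choice rather than the PR's behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    None,
    Running,
    Exited,
    Crashed, // assumed variant for failed pods
}

/// Maps a Kubernetes pod phase string to an app status.
/// "Unknown" and unrecognized phases are reported as errors so the caller
/// can decide whether to retry or give up.
pub fn status_from_phase(phase: &str) -> Result<Status, String> {
    match phase {
        // Pod accepted but containers not yet running (scheduling, image pull).
        "Pending" => Ok(Status::None),
        // At least one container is running, starting, or restarting.
        "Running" => Ok(Status::Running),
        // All containers terminated successfully and will not restart.
        "Succeeded" => Ok(Status::Exited),
        // All containers terminated and at least one failed.
        "Failed" => Ok(Status::Crashed),
        other => Err(format!("unrecognized pod phase: {}", other)),
    }
}
```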

Comment on lines +517 to +519
if tokio::time::timeout(std::time::Duration::from_secs(60), condition)
    .await
    .is_ok()

What happens if a container takes longer than 60 seconds to log?

        line,
    };
    if tx.send(output).is_err() {
        break;

probs wanna log the error?
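
A minimal sketch of logging the send failure before leaving the loop. `forward_lines` is a hypothetical helper over a standard-library channel, and `eprintln!` stands in for whatever logging macro the crate uses; the real loop is async and sends a structured output value rather than a `String`.

```rust
use std::sync::mpsc::Sender;

/// Forwards lines to the receiver until it goes away.
/// Returns how many lines were delivered and, if the stream stopped early,
/// the reason that was logged.
pub fn forward_lines<I>(lines: I, tx: &Sender<String>) -> (usize, Option<String>)
where
    I: IntoIterator<Item = String>,
{
    let mut sent = 0;
    for line in lines {
        if let Err(err) = tx.send(line) {
            // The receiver was dropped: record why the stream ends here
            // instead of breaking silently.
            let reason = format!("log receiver dropped, stopping stream: {}", err);
            eprintln!("{}", reason);
            return (sent, Some(reason));
        }
        sent += 1;
    }
    (sent, None)
}
```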

Comment on lines +548 to +556
async fn terminate(&mut self) -> Result<(), Error> {
    let pods: Api<Pod> = Api::namespaced(self.client.clone(), &self.namespace);

    pods.delete(&self.pod_name, &DeleteParams::default())
        .await
        .map_err(|_| Error::TerminateFailed)?;

    Ok(())
}

Does SubprocessHandle guarantee the process is dead by the end of terminate or is it fire and forget?

Comment on lines +585 to +586
// Delete pod
self.terminate().await?;

Cleanup is typically called after the app is already terminated/exited.
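
One way to make cleanup safe to call after the app has already exited is to treat "not found" from termination as success. A minimal sketch, assuming a hypothetical `TerminateError` with a `NotFound` variant; the quoted `terminate()` maps every failure to `Error::TerminateFailed`, so it cannot make this distinction as written.

```rust
/// Hypothetical error type that distinguishes "already gone" from real failures.
#[derive(Debug, PartialEq)]
pub enum TerminateError {
    NotFound,
    Other(String),
}

/// Runs the given termination step and treats "already gone" as success,
/// so cleanup is idempotent when the app exited on its own.
pub fn cleanup<F>(terminate: F) -> Result<(), TerminateError>
where
    F: FnOnce() -> Result<(), TerminateError>,
{
    match terminate() {
        Ok(()) | Err(TerminateError::NotFound) => Ok(()),
        Err(other) => Err(other),
    }
}
```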


Do we need a k8s.rs in here for a kubernetes app now?
