Fix: Model downloads failing on VPN connections (SSL/TLS decrypt error). QoL: Retry button #1574

Merged
mohnjiles merged 12 commits into LykosAI:main from NeuralFault:vpn-ssl-downloadfix on Apr 10, 2026

@NeuralFault commented Mar 11, 2026

This pull request introduces user-facing support for retrying failed model downloads, including UI updates, business logic, and robust handling of transient network errors. The main changes add a "Retry" button to the download manager, implement the underlying retry logic with exponential backoff, and ensure proper state management for retries.

User-facing and ViewModel changes:

  • Added a Retry button to the download manager UI, visible only when a download has failed and retry is supported (ProgressManagerPage.axaml).
    [Screenshot: Retry button]

  • Extended PausableProgressItemViewModelBase to support retry functionality: added SupportsRetry, CanRetry, and RetryCommand properties/methods, allowing subclasses to define retry logic and expose it to the UI.

  • Enabled retry support in DownloadProgressItemViewModel and implemented the Retry method to reset the attempt counter and re-register the download for retry.

Core download logic improvements:

  • Added robust detection of transient network/SSL exceptions (including AuthenticationException) as retryable, and implemented exponential backoff with jitter for automatic retries, persisting state before the delay to ensure resumability.
  • Added ResetAttempts method to TrackedDownload to allow manual retry to reset the attempt counter and state cleanly.
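
To make the transient-error check concrete, here is a minimal Python sketch of the idea, not the PR's C# code. It assumes that `ConnectionError`, `TimeoutError`, and `ssl.SSLError` stand in for the I/O and authentication exceptions named above, and `is_transient_network_error` is a hypothetical name. It walks the chain of wrapped exceptions but does not cover aggregate exceptions.

```python
import ssl

# Assumption: these Python types stand in for the network/SSL exception
# types the PR treats as retryable.
TRANSIENT_TYPES = (ConnectionError, TimeoutError, ssl.SSLError)


def is_transient_network_error(exc):
    """Return True if exc, or any exception it wraps, is a transient type."""
    seen = set()
    current = exc
    # Follow the cause/context chain, guarding against cycles.
    while current is not None and id(current) not in seen:
        if isinstance(current, TRANSIENT_TYPES):
            return True
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    return False
```

A caller would retry only when this returns True and fail immediately otherwise, so that non-network errors are not retried pointlessly.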

Service layer changes:

  • Added TryRestartDownload to ITrackedDownloadService and its implementation, allowing failed downloads to be re-added to the tracking dictionary and resumed as new retry attempts.

When a user is on a VPN connection, the provider can occasionally reroute the tunnel on its end, which breaks the TCP download stream for a short period (known cases: NordVPN and Proton VPN).
When this happens, the current logic immediately tries to continue the download before the connection has recovered, and commonly fails all 3 retries. The download ends up in a failed state, and the user has to go back to the Model Browser and manually restart it from scratch.
This change keeps the 3 retries but gives the connection time to fully reset so the download can resume properly. For the extreme case where all 3 retries still fail, I have added a Retry button as a QoL improvement, so the user no longer has to search for the model again to restart the download.
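
As a rough illustration of the kind of delay schedule this implies, here is a Python sketch rather than the PR's C# code. The 2-second base, 30-second cap, and 25% jitter are assumed values, and `backoff_delay` is a hypothetical name.

```python
import random


def backoff_delay(attempt, base_seconds=2.0, max_seconds=30.0, rng=random.random):
    """Delay before retry number `attempt` (1-based).

    Capped exponential growth plus up to 25% random jitter, so that
    several stalled downloads do not all retry at the same instant.
    """
    exponential = min(max_seconds, base_seconds * (2 ** (attempt - 1)))
    jitter = exponential * 0.25 * rng()
    return exponential + jitter
```

With these assumed values, the three automatic retries wait roughly 2, 4, and 8 seconds (plus jitter), which gives a rerouted tunnel several seconds to settle before the next attempt.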

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability and user experience of model downloads by introducing both automatic and manual retry mechanisms. It addresses issues where downloads might fail due to transient network problems, particularly common with VPN connections, by implementing intelligent error detection, exponential backoff, and a user-initiated retry option. The changes ensure that users can more easily recover from failed downloads without having to restart the entire process.

Highlights

  • User Interface Enhancement: Introduced a "Retry" button in the download manager UI, which becomes visible when a download fails and supports retrying.
  • ViewModel Retry Support: Extended PausableProgressItemViewModelBase with SupportsRetry, CanRetry, and RetryCommand to enable retry functionality for subclasses.
  • Download ViewModel Implementation: Implemented retry logic in DownloadProgressItemViewModel, allowing it to reset attempt counters and re-register failed downloads.
  • Robust Error Handling: Enhanced core download logic to detect transient network and SSL exceptions (including AuthenticationException) as retryable errors.
  • Adaptive Retry Mechanism: Implemented exponential backoff with jitter for automatic retries, persisting state before delay to ensure resumability and allow network conditions to stabilize.
  • Manual Retry State Management: Added a ResetAttempts method to TrackedDownload to cleanly reset the retry counter for manual retries.
  • Service Layer Integration: Introduced TryRestartDownload in ITrackedDownloadService and its implementation to re-add failed downloads to the tracking system for resumption.

Changelog
  • StabilityMatrix.Avalonia/ViewModels/Base/PausableProgressItemViewModelBase.cs
    • Added CanRetry to the NotifyPropertyChangedFor attribute list.
    • Introduced SupportsRetry virtual property, defaulting to false.
    • Added CanRetry property, which is true if SupportsRetry is true and the state is Failed.
    • Added RetryCommand and a virtual Retry method.
  • StabilityMatrix.Avalonia/ViewModels/Progress/DownloadProgressItemViewModel.cs
    • Overrode SupportsRetry to return true.
    • Implemented the Retry method to call download.ResetAttempts() and downloadService.TryRestartDownload(download).
  • StabilityMatrix.Avalonia/Views/ProgressManagerPage.axaml
    • Added a Button for retry, bound to RetryCommand and visible based on CanRetry.
  • StabilityMatrix.Core/Models/TrackedDownload.cs
    • Imported System.Security.Authentication.
    • Added IsTransientNetworkException static helper method to check for IOException or AuthenticationException (including inner and aggregate exceptions).
    • Modified the OnDownloadTaskCompleted method to use IsTransientNetworkException for retry condition.
    • Implemented exponential backoff with jitter for automatic retries, including state persistence before delay.
    • Added ResetAttempts method to reset the retry counter and set state to Inactive.
  • StabilityMatrix.Core/Services/ITrackedDownloadService.cs
    • Added Task TryRestartDownload(TrackedDownload download); to the interface.
  • StabilityMatrix.Core/Services/TrackedDownloadService.cs
    • Implemented TryRestartDownload to re-create the download's JSON file, re-add it to the tracking dictionary, and then call TryResumeDownload.
Activity
  • No human activity has been recorded on this pull request yet.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust retry mechanism for model downloads, addressing failures on unstable connections like VPNs, and includes a manual 'Retry' button and exponential backoff for automatic retries. However, the current implementation has a significant race condition where multiple download tasks can be started for the same file, potentially leading to data corruption. Additionally, the automatic retry logic does not respect user cancellation during the backoff period, and there is a resource leak in the service layer when re-registering downloads. Furthermore, some file I/O operations in async methods are synchronous, which can block the calling thread, and should be made asynchronous to maintain application responsiveness.
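
One of the points above, a backoff delay that ignores user cancellation, is commonly handled by waiting on a cancellation signal instead of sleeping unconditionally. The following is a minimal Python sketch of that pattern, not the PR's C# code; `wait_for_retry` is a hypothetical name and `threading.Event` stands in for whatever cancellation mechanism the application uses.

```python
import threading


def wait_for_retry(delay_seconds, cancel_event):
    """Wait up to delay_seconds before a retry.

    Returns True if the full delay elapsed and the retry should proceed,
    or False if cancel_event was set in the meantime.
    """
    cancelled = cancel_event.wait(timeout=delay_seconds)
    return not cancelled
```

The wait returns as soon as the event is set, so a cancel during a long backoff takes effect immediately rather than after the delay expires.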

@NeuralFault

Give me a bit to modify and thoroughly verify

@NeuralFault

@gemini-code-assist review

@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses download failures on unstable connections by introducing both automatic retries with exponential backoff and a manual retry option for the user. The changes are well-structured, extending the view models and services to support the new functionality. The detection of transient network errors, including SSL/TLS-related AuthenticationException, is a thoughtful addition that will improve robustness, particularly for users on VPNs. I have a couple of minor suggestions to improve maintainability by reducing code duplication and eliminating a magic number.

@NeuralFault

Ready for merge

ionite34 previously approved these changes Apr 8, 2026

@mohnjiles commented Apr 8, 2026

  1. [P1] The new early return in TrackedDownload.cs will break queued resumes for partial downloads. When a resume is blocked, TryResumeDownload() puts the item into Pending (TrackedDownloadService.cs), and ProcessPendingDownloads() retries it later via the same method. Right now that works because Resume() logs on Pending but still goes through with it. With this change, Resume() bails immediately for Pending items, so anything partially downloaded that gets queued will just stay stuck once a slot opens.

  2. [P2] Manual retry will lose the import sidecar files for model downloads. Failure cleanup deletes everything in ExtraCleanupFileNames (TrackedDownload.cs), and model imports add the .cm-info.json + preview image to that list before starting the download (ModelImportService.cs). The retry path only restarts the TrackedDownload, it doesn't recreate those sidecars. So a retry can succeed but leave the model without its connected metadata and preview.

Would also love to see a test for "resume while queued from Pending" and one for "failed model import retry preserves sidecar files."
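
To illustrate the P1 concern, here is a small Python sketch of a state guard that still lets queued items start, written as an illustration rather than the project's C# code. The `State` members and the `RESUMABLE` set are assumptions for this sketch, and `Download` is a hypothetical stand-in for TrackedDownload.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()
    PENDING = auto()
    WORKING = auto()
    PAUSED = auto()
    FAILED = auto()


# Assumption: states from which resume() may start work. PENDING must be
# included, otherwise items queued while no slot was free never start.
RESUMABLE = {State.INACTIVE, State.PAUSED, State.PENDING}


class Download:
    def __init__(self, state=State.INACTIVE):
        self.state = state

    def resume(self):
        """Move to WORKING if the current state allows it; report success."""
        if self.state not in RESUMABLE:
            return False
        self.state = State.WORKING
        return True
```

A "resume while queued from Pending" test then reduces to constructing a `Download(State.PENDING)`, calling `resume()`, and asserting that the state is `State.WORKING`.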

- Fix Resume() ignoring Pending state, causing queued downloads to stall after ProcessPendingDownloads() enqueues them (P1)

- Preserve sidecar files (.cm-info.json, preview image) on download failure so manual retry can succeed; only delete sidecars on explicit cancel or dismiss (P2)

- Add Dismiss action to TrackedDownload, DownloadProgressItemViewModel, and ProgressManagerPage so users can clean up a failed download, sidecars included, without retrying

- Fix KeyNotFoundException crash in UpdateJsonForDownload when Dismiss fires a state-change event after the download was already removed from the tracking dictionary on failure

- Add unit tests: Resume_WhileInPendingState_SetsStateToWorking, OnFailed_SidecarFilesPreservedForRetry, OnCancelled_SidecarFilesAreDeleted

@NeuralFault commented Apr 9, 2026

One caveat with the persistent sidecars: if the download fails and the user neither retries nor dismisses it, the metadata and preview files remain in the model folder, including after SM is relaunched, while the download disappears from the dict and the manager. Orphaned model JSONs and previews shouldn't have any impact; they just sit there silently.
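
The resulting cleanup policy can be sketched as follows, in Python rather than the project's C#. The outcome strings and the `cleanup_sidecars` name are hypothetical, and the sketch models only the sidecar files, not the partially downloaded model file.

```python
from pathlib import Path

# Assumption: outcome labels for this sketch. Sidecars survive a plain
# failure so that a manual retry still finds them.
DELETE_SIDECARS_ON = {"cancelled", "dismissed"}


def cleanup_sidecars(outcome, sidecar_paths):
    """Delete sidecar files only on cancel or dismiss; return the removed paths."""
    if outcome not in DELETE_SIDECARS_ON:
        return []
    removed = []
    for path in map(Path, sidecar_paths):
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

Under this policy a failed download that is never retried or dismissed leaves its sidecars on disk, which is the orphaned-file caveat described above.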

@mohnjiles mohnjiles merged commit 2feb162 into LykosAI:main Apr 10, 2026
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 10, 2026
@NeuralFault NeuralFault deleted the vpn-ssl-downloadfix branch April 10, 2026 02:39