SOLR-17949: Add Azure Blob Storage backup repository module #3750

Open
prateeksinghalgit wants to merge 7 commits into apache:main from prateeksinghalgit:SOLR-17949-azure-blob-repository

@prateeksinghalgit prateeksinghalgit commented Oct 9, 2025

Implements Azure Blob Storage backup repository as discussed in SOLR-17949.

Description

This PR adds a new backup repository implementation for Azure Blob Storage, enabling Solr collections to be backed up to and restored from Microsoft Azure.

Key Features:

  • Full backup/restore functionality to Azure Blob Storage
  • Support for 4 authentication methods (Connection String, Account Key, SAS Token, Azure Identity)
  • Incremental backup support with versioning
  • Data integrity verification (checksum validation)
  • Compatible with Azurite emulator for local testing
  • Comprehensive documentation and 76 passing unit tests
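
As an illustration of the checksum validation listed above, here is a minimal sketch that compares a CRC32 computed over restored bytes with an expected value. The class and method names are hypothetical, and the choice of CRC32 is an assumption made for the example; the description does not state which checksum the module verifies or where it does so.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not part of this PR. */
final class ChecksumCheck {
  private ChecksumCheck() {}

  /** Streams the input to its end and returns the CRC32 of everything read. */
  static long crc32Of(InputStream in) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      crc.update(buf, 0, n);
    }
    return crc.getValue();
  }

  /** Fails with an IOException when the restored bytes do not match the expected CRC32. */
  static void verify(InputStream restored, long expectedCrc) throws IOException {
    long actual = crc32Of(restored);
    if (actual != expectedCrc) {
      throw new IOException(
          String.format("Checksum mismatch: expected %08x but got %08x", expectedCrc, actual));
    }
  }
}
```

A caller would record the checksum when writing a file and pass it to `verify` after reading the file back from storage.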

Solution

The implementation follows Solr's BackupRepository interface pattern, similar to existing S3 and GCS repository modules:

  • AzureBlobBackupRepository: Main class implementing Solr's BackupRepository interface
  • AzureBlobStorageClient: Wrapper for Azure SDK, providing file operations
  • AzureBlobIndexInput: Custom Lucene IndexInput for reading from Azure blobs
  • AzureBlobOutputStream: Custom output stream for writing to Azure blobs
  • Authentication: Supports 4 methods via flexible configuration in solr.xml
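
To make the "4 methods via flexible configuration" point concrete, here is a rough sketch of how a repository could pick an authentication method from whichever settings are present. The parameter names (`connectionString`, `accountName`, `accountKey`, `sasToken`) and the precedence order are assumptions for illustration only; they are not taken from this PR's code or documentation.

```java
import java.util.Map;

/** The four methods named in the description. */
enum AuthMethod { CONNECTION_STRING, ACCOUNT_KEY, SAS_TOKEN, AZURE_IDENTITY }

/** Hypothetical resolver; key names and precedence are illustrative assumptions. */
final class AuthMethodResolver {
  private AuthMethodResolver() {}

  static AuthMethod resolve(Map<String, String> config) {
    if (isSet(config, "connectionString")) {
      return AuthMethod.CONNECTION_STRING;
    }
    if (isSet(config, "accountName") && isSet(config, "accountKey")) {
      return AuthMethod.ACCOUNT_KEY;
    }
    if (isSet(config, "sasToken")) {
      return AuthMethod.SAS_TOKEN;
    }
    // No explicit secret configured: fall back to identity-based credentials.
    return AuthMethod.AZURE_IDENTITY;
  }

  private static boolean isSet(Map<String, String> config, String key) {
    String value = config.get(key);
    return value != null && !value.isBlank();
  }
}
```

Resolving the method in one place keeps the client-building code free of scattered null checks, whatever the real parameter names turn out to be.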

All streaming operations are compatible with Solr's ResumableInputStream for fault-tolerant transfers.
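
For readers unfamiliar with the resumable-read idea, the sketch below shows the general technique: track how many bytes have been delivered and, when a read fails, reopen the source at that offset and continue. This is not Solr's `ResumableInputStream`; the class, the `RangeOpener` callback, and the retry policy are hypothetical and only meant to show the shape of a fault-tolerant transfer.

```java
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only; not Solr's ResumableInputStream. */
final class ResumingInputStream extends InputStream {
  /** Hypothetical callback that opens the source starting at a byte offset. */
  interface RangeOpener {
    InputStream openAt(long offset) throws IOException;
  }

  private final RangeOpener opener;
  private final int maxRetries;
  private InputStream delegate;
  private long position; // bytes successfully handed to the caller so far

  ResumingInputStream(RangeOpener opener, int maxRetries) throws IOException {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    this.opener = opener;
    this.maxRetries = maxRetries;
    this.delegate = opener.openAt(0);
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int n = read(one, 0, 1);
    return n == -1 ? -1 : (one[0] & 0xFF);
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        if (last != null) {
          // Previous attempt failed: drop the broken stream and resume at the saved offset.
          closeQuietly();
          delegate = opener.openAt(position);
        }
        int n = delegate.read(buf, off, len);
        if (n > 0) {
          position += n;
        }
        return n;
      } catch (IOException e) {
        last = e;
      }
    }
    throw last; // retries exhausted
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }

  private void closeQuietly() {
    try {
      delegate.close();
    } catch (IOException ignored) {
      // The stream is already considered broken.
    }
  }
}
```

With a blob store, `openAt` would typically issue a ranged download starting at the given offset.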

Implementation stats:

  • 8 implementation files (1,606 LOC)
  • 8 test files (2,180 LOC)
  • All dependencies Apache 2.0 licensed

Tests

Unit Tests: 76/76 passing (100%)

./gradlew :solr:modules:blob-repository:test
# Result: BUILD SUCCESSFUL - 76 test(s)

Test Coverage:

  • Basic read/write operations
  • Large file handling (1GB+)
  • Binary data integrity
  • Concurrent operations
  • Stream lifecycle (close/resume behavior)
  • Incremental backups
  • All 4 authentication methods
  • Integration with Azurite (local emulator)
  • Integration with real Azure Blob Storage

Testing Instructions:
Can be tested locally with Azurite emulator (no Azure account needed) or with real Azure Blob Storage. See solr/modules/blob-repository/README.md for detailed setup instructions.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew :solr:modules:blob-repository:check (module-specific check passed).
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide.

@github-actions github-actions Bot added documentation Improvements or additions to documentation dependencies Dependency upgrades tool:build tests labels Oct 9, 2025
@pratsgit

Quick update: following up on the dev@ thread.

To clarify scope — although the diff is large, the actual implementation is centered in 8 main files and 8 test files; the rest are license header updates.

Happy to split the PR into smaller logical chunks (core module / tests / docs) if that helps with review.

Thanks everyone!

@janhoy
Contributor

janhoy commented Nov 20, 2025

Don’t have time to review but asked Copilot for an opinion 😉

Contributor

@janhoy janhoy left a comment

Only immediate comment is naming. «Blob-repository» is too generic. Should contain word «azure»?

Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements Azure Blob Storage backup repository support for Apache Solr, enabling collections to be backed up to and restored from Microsoft Azure. The implementation follows established patterns from existing S3 and GCS modules, providing 4 authentication methods (Connection String, Account Key, SAS Token, Azure Identity), incremental backup support, and comprehensive documentation with 76 passing unit tests.

Reviewed Changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| backup-restore.adoc | Adds comprehensive documentation for BlobBackupRepository with authentication methods and configuration |
| BlobBackupRepository.java | Main repository implementation following Solr's BackupRepository interface |
| BlobStorageClient.java | Azure SDK wrapper providing blob storage operations |
| BlobOutputStream.java | Custom output stream for block-based blob uploads |
| BlobIndexInput.java | Lucene IndexInput implementation with page caching |
| Test files | 8 test classes with 76 passing tests covering all functionality |
| build.gradle | Module build configuration with Azure SDK dependencies |
| License files | License and notice files for Azure and Netty dependencies |
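
The "block-based blob uploads" entry refers to a pattern where writes are buffered into fixed-size blocks, each full block is staged, and the ordered block list is committed on close. The sketch below shows that pattern under stated assumptions: `BlockSink` is a hypothetical stand-in for a storage client, and the block ID format and sizes are invented for the example. It is not the PR's `BlobOutputStream`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative block-buffering stream; BlockSink is a hypothetical storage-client stand-in. */
final class BlockBufferingOutputStream extends OutputStream {
  interface BlockSink {
    void stageBlock(String blockId, byte[] data) throws IOException;

    void commit(List<String> blockIds) throws IOException;
  }

  private final BlockSink sink;
  private final byte[] buffer;
  private final List<String> blockIds = new ArrayList<>();
  private int count;
  private boolean closed;

  BlockBufferingOutputStream(BlockSink sink, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.sink = Objects.requireNonNull(sink);
    this.buffer = new byte[blockSize];
  }

  @Override
  public void write(int b) throws IOException {
    ensureOpen();
    buffer[count++] = (byte) b;
    if (count == buffer.length) {
      stageCurrentBlock();
    }
  }

  @Override
  public void write(byte[] src, int off, int len) throws IOException {
    ensureOpen();
    Objects.checkFromIndexSize(off, len, src.length);
    while (len > 0) {
      int n = Math.min(len, buffer.length - count);
      System.arraycopy(src, off, buffer, count, n);
      count += n;
      off += n;
      len -= n;
      if (count == buffer.length) {
        stageCurrentBlock();
      }
    }
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    if (count > 0) {
      stageCurrentBlock(); // flush the final partial block
    }
    sink.commit(List.copyOf(blockIds)); // the blob becomes visible only after commit
  }

  private void stageCurrentBlock() throws IOException {
    String id = String.format("block-%06d", blockIds.size());
    sink.stageBlock(id, Arrays.copyOf(buffer, count));
    blockIds.add(id);
    count = 0;
  }

  private void ensureOpen() throws IOException {
    if (closed) {
      throw new IOException("stream closed");
    }
  }
}
```

Buffering bounds memory use to one block regardless of file size, which matters for large index files.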

Comment thread solr/licenses/azure-NOTICE.txt Outdated
Comment thread solr/licenses/msal4j-NOTICE.txt Outdated
Comment thread solr/licenses/reactor-NOTICE.txt Outdated
Comment thread gradle/libs.versions.toml Outdated
This commit adds support for backing up and restoring Solr collections
to Azure Blob Storage with multiple authentication options.

Features:
- Full backup/restore functionality to Azure Blob Storage
- Support for 4 authentication methods:
  * Connection String (for development)
  * Account Name + Key (for simple production)
  * SAS Token (recommended for production)
  * Azure Identity (Managed Identity, Service Principal, Azure CLI)
- Incremental backup support with versioning
- Data integrity verification (checksum validation)
- Compatible with Azurite emulator for local testing
- Comprehensive documentation and 76 passing unit tests

Implementation:
- 8 implementation files (1,606 LOC)
- 8 test files (2,180 LOC)
- All dependencies Apache 2.0 licensed
- Follows Solr's backup repository patterns
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch 3 times, most recently from 32efbbd to 5462186 Compare November 21, 2025 23:10
- Renamed module from blob-repository to azure-blob-repository
- Renamed all classes from Blob* to AzureBlob* for clarity
- Updated package from org.apache.solr.blob to org.apache.solr.azureblob
- Added Azure SDK dependencies (azure-storage-blob, azure-identity)
- Updated Solr Reference Guide with Azure Blob Storage documentation
- Added .gitignore entries for Azurite test infrastructure

All authentication methods tested successfully with real Azure Blob Storage:
- Connection String authentication
- Account Name + Key authentication
- SAS Token authentication
- Service Principal (Azure Identity) authentication

Testing completed with 100% success rate on backup/restore operations.
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from 5462186 to 47f456a Compare November 22, 2025 02:35
@prateeksinghalgit
Author

> Only immediate comment is naming. «Blob-repository» is too generic. Should contain word «azure»?

Made the change to azure-blob-repository.

Comment thread solr/licenses/msal4j-NOTICE.txt Outdated
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from 24daaaa to e6d9a64 Compare November 24, 2025 09:22
Comment thread solr/solr-ref-guide/modules/deployment-guide/pages/backup-restore.adoc Outdated
- Switch from Netty to OkHttp for better Security Manager compatibility
- Use static shared HttpClient for better resource management
- Fix licenses: msal4j and Azure SDK are MIT licensed
- Add changelog entry
- Add JFR permissions for Reactor
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from e6d9a64 to f7adb05 Compare November 24, 2025 23:43
@janhoy
Contributor

janhoy commented Nov 25, 2025

I'll leave the rest of the review to others more proficient in Azure Blob than me (have never used it). I'd love to test it with a real AzBlob but I'll leave that to someone who already has an active account.

I have a concern that the PR feels largely AI generated(?), lacking the attention to detail that we require for contributions. Excuse me @prateeksinghalgit if this is not correct, it was just a hunch I got while reviewing. I'd rather have a short, well thought out README with useful advice for users than three pages of detailed step-by-step instructions for testing and developing the feature.

For other reviewers to pick up where I left off, consider in particular the HTTP client choice, correct licensing and avoiding hard-coded ports in tests. I have not done a complete review and have not read the ref-guide part at all; I just commented on things Copilot and I saw immediately.

@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch 3 times, most recently from 8e812b1 to 3545833 Compare November 26, 2025 07:25
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from 3545833 to e18dd13 Compare November 26, 2025 07:59
@pratsgit

pratsgit commented Dec 1, 2025

Hi all,

I’ve pushed a set of updates based on the feedback so far:

  1. Switched the integration tests to use Testcontainers with Azurite (no hard-coded ports / external prereqs) and disabled the Security Manager for this module’s tests, following the pattern used in the extraction module.
  2. Addressed HTTP client concerns by using azure-core-http-okhttp and keeping those deps in test scope only, plus explicit test deps.
  3. Cleaned up and significantly shortened the README and docs.
  4. Did additional cleanup to align naming, structure, and configuration patterns with those of the existing repository plugins.

Please let me know if anything else should be adjusted that would help move the review forward.

Happy to make further changes.

Thanks again for the thoughtful review and guidance so far!

— Prateek

@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from e18dd13 to b28e265 Compare December 2, 2025 07:12
@prateeksinghalgit
Author

Hi All,

I’ve fixed the recent CI failures:

  1. Added proper thread leak filters for Testcontainers / JNA so tests are cleanly skipped when Docker isn’t available in CI.
  2. Fixed two error-prone findings (IntLongMath and MissingOverride) that only surfaced in CI because error-prone is disabled by default in local builds unless -Pvalidation.errorprone=true is used.
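
For context on the first of those findings: Error Prone's IntLongMath check flags an `int` expression that may overflow before it is widened to `long`. The sketch below shows the pattern and the usual fix. The method names and the block-offset scenario are hypothetical; the code that was actually flagged in this PR is not shown in the conversation.

```java
/** Illustrates the pattern behind the IntLongMath finding; names are hypothetical. */
final class BlobOffsets {
  private BlobOffsets() {}

  // Flagged pattern: the multiplication is done in 32-bit int arithmetic and can
  // overflow before the result is widened to long.
  static long offsetOverflowing(int blockIndex, int blockSize) {
    long offset = blockIndex * blockSize;
    return offset;
  }

  // Fix: widen one operand first so the multiplication happens in 64-bit arithmetic.
  static long offsetSafe(int blockIndex, int blockSize) {
    return (long) blockIndex * blockSize;
  }
}
```

With a block index of 1024 and a 4 MiB block size, the true offset is 2^32 bytes, which wraps to 0 in `int` arithmetic but is computed correctly by `offsetSafe`.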

Please let me know if anything else should be adjusted to help move the review forward. I’m happy to iterate further as needed.

- Use Testcontainers (Azurite) for integration tests to avoid hardcoded ports and external dependencies
- Disable Security Manager for Azure Blob tests to support Testcontainers (similar to extraction module)
- Fix OkHttp compilation error by adding explicit testImplementation
- Update documentation: BlobBackupRepository -> AzureBlobBackupRepository
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from b28e265 to 428ca18 Compare December 3, 2025 19:37
@prateeksinghalgit prateeksinghalgit force-pushed the SOLR-17949-azure-blob-repository branch from 3cef18e to d015bbe Compare January 14, 2026 20:44
@github-actions

This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

@github-actions github-actions Bot added the stale PR not updated in 60 days label Mar 16, 2026
@janhoy janhoy requested review from HoustonPutman and psalagnac and removed request for janhoy March 19, 2026 21:47
@github-actions github-actions Bot removed the stale PR not updated in 60 days label Mar 20, 2026
@prateeksinghalgit
Author

Thanks @janhoy for adding more reviewers. @HoustonPutman and @psalagnac, let me know if you have any questions related to the PR.

@psalagnac
Contributor

Thanks for adding me as a reviewer.
I've been pretty busy recently and I haven't had the chance to do the review yet. I'll try to do a complete review before the end of next week.

Labels

dependencies Dependency upgrades documentation Improvements or additions to documentation jetty-server tests tool:build

5 participants