SOLR-17949: Add Azure Blob Storage backup repository module#3750
SOLR-17949: Add Azure Blob Storage backup repository module#3750prateeksinghalgit wants to merge 7 commits intoapache:mainfrom
Conversation
|
Quick update: following up on the dev@ thread. To clarify scope — although the diff is large, the actual implementation is centered in 8 main files and 8 test files; the rest are license header updates. Happy to split the PR into smaller logical chunks (core module / tests / docs) if that helps with review. Thanks everyone! |
|
Don’t have time to review but asked copilot for an opinion 😉 |
janhoy
left a comment
There was a problem hiding this comment.
Only immediate comment is naming. «Blob-repository» is too generic. Should contain word «azure»?
There was a problem hiding this comment.
Pull Request Overview
This PR implements Azure Blob Storage backup repository support for Apache Solr, enabling collections to be backed up to and restored from Microsoft Azure. The implementation follows established patterns from existing S3 and GCS modules, providing 4 authentication methods (Connection String, Account Key, SAS Token, Azure Identity), incremental backup support, and comprehensive documentation with 76 passing unit tests.
Reviewed Changes
Copilot reviewed 70 out of 70 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| backup-restore.adoc | Adds comprehensive documentation for BlobBackupRepository with authentication methods and configuration |
| BlobBackupRepository.java | Main repository implementation following Solr's BackupRepository interface |
| BlobStorageClient.java | Azure SDK wrapper providing blob storage operations |
| BlobOutputStream.java | Custom output stream for block-based blob uploads |
| BlobIndexInput.java | Lucene IndexInput implementation with page caching |
| Test files | 8 test classes with 76 passing tests covering all functionality |
| build.gradle | Module build configuration with Azure SDK dependencies |
| License files | License and notice files for Azure and Netty dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This commit adds support for backing up and restoring Solr collections to Azure Blob Storage with multiple authentication options. Features: - Full backup/restore functionality to Azure Blob Storage - Support for 4 authentication methods: * Connection String (for development) * Account Name + Key (for simple production) * SAS Token (recommended for production) * Azure Identity (Managed Identity, Service Principal, Azure CLI) - Incremental backup support with versioning - Data integrity verification (checksum validation) - Compatible with Azurite emulator for local testing - Comprehensive documentation and 76 passing unit tests Implementation: - 8 implementation files (1,606 LOC) - 8 test files (2,180 LOC) - All dependencies Apache 2.0 licensed - Follows Solr's backup repository patterns
32efbbd to
5462186
Compare
- Renamed module from blob-repository to azure-blob-repository - Renamed all classes from Blob* to AzureBlob* for clarity - Updated package from org.apache.solr.blob to org.apache.solr.azureblob - Added Azure SDK dependencies (azure-storage-blob, azure-identity) - Updated Solr Reference Guide with Azure Blob Storage documentation - Added .gitignore entries for Azurite test infrastructure All authentication methods tested successfully with real Azure Blob Storage: - Connection String authentication - Account Name + Key authentication - SAS Token authentication - Service Principal (Azure Identity) authentication Testing completed with 100% success rate on backup/restore operations.
5462186 to
47f456a
Compare
made the change to azure-blob-repository |
24daaaa to
e6d9a64
Compare
- Switch from Netty to OkHttp for better Security Manager compatibility - Use static shared HttpClient for better resource management - Fix licenses: msal4j and Azure SDK are MIT licensed - Add changelog entry - Add JFR permissions for Reactor
e6d9a64 to
f7adb05
Compare
|
I'll leave the rest of the review to others more proficient in Azure Blob than me (have never used it). I'd love to test it with a read AzBlob but I'll leave that to someone who already have an active account. I have a concern that the PR feels largely AI generated(?), lacking the care to details that we require for contributions. Have me excused @prateeksinghalgit if this is not correct, it was just a hunch I got while reviewing. I'd rather have a short well thought out README with useful advice for users than three pages of detailed step by step instructions for testing and developing the feature. For other reviewers to pick up where I left, consider in partifular the HTTP client choice, correct licensing and avoiding hard coded ports in tests. I have not done a complete review, not read the ref-guide part at all, just commented on things me and Copilot saw immediately. |
8e812b1 to
3545833
Compare
3545833 to
e18dd13
Compare
|
Hi all, I’ve pushed a set of updates based on the feedback so far:
Please let me know if anything else should be adjusted that would help move the review forward. Happy to make further changes. Thanks again for the thoughtful review and guidance so far! — Prateek |
e18dd13 to
b28e265
Compare
|
Hi All, I’ve fixed the recent CI failures:
Please let me know if anything else should be adjusted to help move the review forward. I’m happy to iterate further as needed. |
- Use Testcontainers (Azurite) for integration tests to avoid hardcoded ports and external dependencies - Disable Security Manager for Azure Blob tests to support Testcontainers (similar to extraction module) - Fix OkHttp compilation error by adding explicit testImplementation - Update documentation: BlobBackupRepository -> AzureBlobBackupRepository
b28e265 to
428ca18
Compare
3cef18e to
d015bbe
Compare
|
This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution! |
|
Thanks @janhoy for adding more reviewers. @HoustonPutman and @psalagnac ,let me know if you have any questions related to the pr. |
|
Thanks for adding me as a reviewer. |
…eam merge Made-with: Cursor
Implements Azure Blob Storage backup repository as discussed in SOLR-17949.
Description
This PR adds a new backup repository implementation for Azure Blob Storage, enabling Solr collections to be backed up to and restored from Microsoft Azure.
Key Features:
Solution
The implementation follows Solr's
BackupRepositoryinterface pattern, similar to existing S3 and GCS repository modules:BackupRepositoryinterfaceIndexInputfor reading from Azure blobssolr.xmlAll streaming operations are compatible with Solr's
ResumableInputStreamfor fault-tolerant transfers.Implementation stats:
Tests
Unit Tests: 76/76 passing (100%)
./gradlew :solr:modules:blob-repository:test # Result: BUILD SUCCESSFUL - 76 test(s)Test Coverage:
Testing Instructions:
Can be tested locally with Azurite emulator (no Azure account needed) or with real Azure Blob Storage. See
solr/modules/blob-repository/README.mdfor detailed setup instructions.Checklist
Please review the following and check all that apply:
mainbranch../gradlew :solr:modules:blob-repository:check(module-specific check passed).