
Add bulk create and update operations for MongoDB metadata repositories #2001

@Ygohr

Description


Context

The Midaz ledger system processes high volumes of transactions that require metadata storage in MongoDB. Currently, metadata operations are performed individually for each entity, which becomes a bottleneck during batch transaction processing.

Problem/Motivation

When multiple transactions are processed at once, the system stores the metadata for each transaction with a separate operation. This causes:

  • Excessive database round-trips: Each operation requires a separate network call (100 transactions = 100 round-trips)
  • Connection pool pressure: Many concurrent single-document calls can exhaust the available connections
  • Retry complexity: Individual inserts are not idempotent, so safely retrying a partially failed batch requires complex application-level deduplication
  • Scalability limits: Total latency grows linearly with batch size, because every document costs a full round-trip

Current Behavior

  • Metadata for each entity is inserted/updated via individual Create() and Update() calls
  • No built-in idempotency protection for retries (risk of duplicate records)
  • No bulk operation support in the MongoDB metadata repository interface

Goal

Implement CreateBulk() and UpdateBulk() methods for MongoDB metadata repositories that:

  1. Batch operations: Process up to 1000 documents per BulkWrite call
  2. Idempotent inserts: Use $setOnInsert with upsert to prevent duplicates on retry
  3. Deadlock prevention: Sort documents by EntityID before processing, so concurrent batches always touch documents in the same order
  4. Result tracking: Return detailed counts (attempted, inserted, matched, modified)
  5. Cancellation support: Respect context cancellation between chunks
  6. Multi-tenancy: Automatically resolve the tenant database from the context
  7. Observability: Emit OpenTelemetry tracing spans for all operations

Success Metrics

  • CreateBulk() method available in onboarding and transaction metadata repositories
  • UpdateBulk() method available in onboarding and transaction metadata repositories
  • Zero duplicate records created when retrying the same batch
  • Database round-trips reduced from N to ceil(N/1000) for batch operations
  • Benchmark tests demonstrate performance improvement over individual operations
  • All bulk operations emit OpenTelemetry trace spans
