
feat: replace MidazOrganizationID with prefix-based collection #608

Merged
arthurkz merged 7 commits into develop from fix/generate-report-internal-db
Apr 7, 2026


@arthurkz arthurkz commented Apr 6, 2026

Pull Request Checklist

Pull Request Type

  • Manager
  • Worker
  • Frontend
  • Infrastructure
  • Packages
  • Pipeline
  • Tests
  • Documentation

Checklist

Please check each item after it's completed.

  • I have tested these changes locally.
  • I have updated the documentation accordingly.
  • I have added necessary comments to the code, especially in complex areas.
  • I have ensured that my changes adhere to the project's coding standards.
  • I have checked for any potential security issues.
  • I have ensured that all tests pass.
  • I have updated the version appropriately (if applicable).
  • I have confirmed this code is ready for review.

Additional Notes

Note: Always target your PR at the develop branch instead of main.

…iscovery

- add ListCollectionNames and GetDatabaseSchemaForPluginCRM to MongoDB Repository
- processPluginCRMCollection now discovers all org-scoped collections via prefix
  matching (holders_*, aliases_*), queries each, merges results, and injects
  organization_id into every record
- schema discovery uses union of fields across all org collections
- deprecate DATASOURCE_CRM_MIDAZ_ORGANIZATION_ID env var (no longer required)
- update tests to mock ListCollectionNames in CRM processing flow

X-Lerian-Ref: 0x1
@arthurkz arthurkz requested a review from a team as a code owner April 6, 2026 20:54

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

processPluginCRMCollection now lists all MongoDB collections, selects those matching a logical <collection>_ prefix, queries each physical collection, injects organization_id from the suffix, merges records into result["plugin_crm"][collection], records OTEL attributes, and decrypts the aggregated data. Collection-list failures abort immediately.
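The discovery step in the walkthrough can be sketched as below. This is a minimal illustration, not the PR's code: `matchPhysicalCollections` and `injectOrganizationID` are hypothetical helper names, and records are assumed to be plain `map[string]any` values. Only the prefix matching, the sort, and the `organization_id` derived from the suffix come from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchPhysicalCollections (hypothetical helper) keeps the names that start
// with logical+"_" and sorts them so the merge order is stable across runs.
func matchPhysicalCollections(all []string, logical string) []string {
	prefix := logical + "_"

	var matched []string
	for _, name := range all {
		if strings.HasPrefix(name, prefix) {
			matched = append(matched, name)
		}
	}

	sort.Strings(matched)

	return matched
}

// injectOrganizationID (hypothetical helper) sets organization_id on every
// record to the part of physColl that follows the logical prefix.
func injectOrganizationID(records []map[string]any, physColl, logical string) {
	orgID := strings.TrimPrefix(physColl, logical+"_")
	for _, rec := range records {
		rec["organization_id"] = orgID
	}
}

func main() {
	all := []string{"holders_org-456", "other_data", "holders_org-123"}

	for _, physColl := range matchPhysicalCollections(all, "holders") {
		// Stand-in for the result of querying one physical collection.
		records := []map[string]any{{"document": "example"}}
		injectOrganizationID(records, physColl, "holders")
		fmt.Println(physColl, records[0]["organization_id"])
	}
}
```

Sorting before the loop is what makes index-based template access such as `holders.0` refer to the same organization on every run.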

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Report Data Generation Logic<br>`components/worker/internal/services/generate-report-data.go` | Rewrote processPluginCRMCollection to discover physical collections via ListCollectionNames, select/sort those with `<collection>_` prefix, query each, inject organization_id from collection suffix, aggregate into result["plugin_crm"][collection], set OTEL merged-count attribute, and run decryptPluginCRMData on the merged slice. Per-collection query errors are wrapped with the physical collection name; listing errors return immediately. |
| Report Data Tests<br>`components/worker/internal/services/generate-report-data_test.go`, `components/worker/internal/services/generate-report_test.go` | Updated tests to mock ListCollectionNames and assert multi-collection discovery, prefix filtering, and organization_id injection; removed tests expecting error when MidazOrganizationID missing; added tests expecting ListCollectionNames failure propagation; adjusted existing mocks to expect listing before per-collection queries. |
| Configuration & Schema Selection<br>`pkg/datasource-config.go`, `pkg/datasource/direct_provider_schema.go` | Marked MidazOrganizationID as legacy/back-compat; added unexported pluginCRMDataSourceID and branch to use plugin-CRM-specific schema discovery; skip org-suffix stripping for plugin CRM logical names. |
| MongoDB Repository Interface & Mocks<br>`pkg/mongodb/datasource.mongodb.go`, `pkg/mongodb/datasource.mongodb.mock.go` | Added GetDatabaseSchemaForPluginCRM(ctx) and ListCollectionNames(ctx) to the Repository interface and to the generated GoMock MockRepository with recorder methods. |
| MongoDB Query Implementation<br>`pkg/mongodb/datasource_query.go` | Added ExternalDataSource.ListCollectionNames(ctx) to obtain a DB client, list all collection names with a slow query timeout, wrap errors, and log database name and discovered count. |
| MongoDB Schema Discovery<br>`pkg/mongodb/datasource_schema.go` | Added ExternalDataSource.GetDatabaseSchemaForPluginCRM(ctx): groups physical collections by logical prefix (before first _), samples up to 5 physical collections (sorted) per logical group, unions field types into a single CollectionSchema per logical name, treats sampling failures as warnings, and propagates DB/listing errors. |
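The schema-discovery summary (group by logical prefix, sample at most 5 sorted collections, union the fields) can be sketched as follows. The bodies here are assumptions written for illustration: `pickSample` and `unionFields` are hypothetical helpers, skipping names without an underscore is an assumption, and the "first type seen wins" rule for conflicting field types is also an assumption, since the summary does not say how conflicts are resolved.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// maxSampleCollections mirrors the cap of 5 mentioned in the summary.
const maxSampleCollections = 5

// groupCollectionsByPrefix groups physical names by the text before the first
// underscore. Names without an underscore are skipped (assumption).
func groupCollectionsByPrefix(collections []string) map[string][]string {
	groups := make(map[string][]string)

	for _, name := range collections {
		logical, _, found := strings.Cut(name, "_")
		if !found || logical == "" {
			continue
		}

		groups[logical] = append(groups[logical], name)
	}

	return groups
}

// pickSample (hypothetical) sorts a copy of the names and keeps at most
// maxSampleCollections of them, so the sample is the same on every call.
func pickSample(physical []string) []string {
	sorted := append([]string(nil), physical...)
	sort.Strings(sorted)

	if len(sorted) > maxSampleCollections {
		sorted = sorted[:maxSampleCollections]
	}

	return sorted
}

// unionFields (hypothetical) merges field-name -> type maps discovered per
// physical collection. The first type seen for a field wins (assumption).
func unionFields(perCollection []map[string]string) map[string]string {
	union := make(map[string]string)

	for _, fields := range perCollection {
		for name, dataType := range fields {
			if _, seen := union[name]; !seen {
				union[name] = dataType
			}
		}
	}

	return union
}

func main() {
	groups := groupCollectionsByPrefix([]string{"holders_org-2", "aliases_org-1", "holders_org-1"})
	fmt.Println(pickSample(groups["holders"]))

	union := unionFields([]map[string]string{
		{"name": "string"},
		{"name": "string", "document": "string"},
	})
	fmt.Println(len(union))
}
```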

Sequence Diagram

sequenceDiagram
    participant Worker as Report Worker
    participant Repo as MongoDB Repository
    participant DB as MongoDB Database

    Worker->>Repo: ListCollectionNames(ctx)
    Repo->>DB: List all collections
    DB-->>Repo: ["holders_org-123","holders_org-456","other_data"]
    Repo-->>Worker: collection names

    Worker->>Worker: Filter by prefix (e.g., "holders_")
    Worker->>Worker: Sort matching physical collections

    loop For each matching collection
        Worker->>Repo: QueryWithAdvancedFilters(ctx, physColl)
        Repo->>DB: Query collection physColl
        DB-->>Repo: Records
        Repo-->>Worker: Results
        Worker->>Worker: Inject organization_id (trim prefix)
        Worker->>Worker: Append to merged slice
    end

    Worker->>Worker: Set OTEL span attr (merged count)
    Worker->>Worker: decryptPluginCRMData(merged slice)
    Worker-->>Worker: Return aggregated plugin_crm data

Comment @coderabbitai help to get the list of available commands and usage tips.


lerian-studio commented Apr 6, 2026

📊 Unit Test Coverage Report: reporter-worker

| Metric | Value |
| --- | --- |
| Overall Coverage | 92.5% ✅ PASS |
| Threshold | 85% |

Coverage by Package

| Package | Coverage |
| --- | --- |
| github.com/LerianStudio/reporter/components/worker/internal/adapters/rabbitmq | 94.2% |
| github.com/LerianStudio/reporter/components/worker/internal/services | 95.6% |

Generated by Go PR Analysis workflow


lerian-studio commented Apr 6, 2026

📊 Unit Test Coverage Report: reporter-manager

| Metric | Value |
| --- | --- |
| Overall Coverage | 88.1% ✅ PASS |
| Threshold | 85% |

Coverage by Package

| Package | Coverage |
| --- | --- |
| github.com/LerianStudio/reporter/components/manager/internal/adapters/http/in | 88.8% |
| github.com/LerianStudio/reporter/components/manager/internal/services | 89.4% |

Generated by Go PR Analysis workflow


lerian-studio commented Apr 6, 2026

🔒 Security Scan Results — worker

Trivy

Filesystem Scan

✅ No vulnerabilities or secrets found.

Docker Image Scan

✅ No vulnerabilities found.


Docker Hub Health Score Compliance

✅ Policies — 4/4 met

| Policy | Status |
| --- | --- |
| Default non-root user | ✅ Passed |
| No fixable critical/high CVEs | ✅ Passed |
| No high-profile vulnerabilities | ✅ Passed |
| No AGPL v3 licenses | ✅ Passed |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/datasource/direct_provider_schema.go (1)

91-119: ⚠️ Potential issue | 🟠 Major

Plugin CRM logical-schema handling is only fixed in this path.

This special case updates GetDataSourceSchema, but the sibling MongoDB validation/details paths still branch on MidazOrganizationID and fall back to physical holders_<org> names. Once MIDAZ_ORGANIZATION_ID is removed, template validation and datasource-details flows will keep rejecting logical holders/aliases references even though schema lookup here succeeds. Please update those call sites in the same rollout so the deprecation is actually safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/datasource/direct_provider_schema.go` around lines 91 - 119, The code
special-cases pluginCRM by calling GetDatabaseSchemaForPluginCRM when
dataSourceID == pluginCRMDataSourceID and strips org suffix only in
GetDataSourceSchema; fix the sibling MongoDB validation and datasource-details
code paths so they use the same branching and naming logic: where code currently
branches on ds.MidazOrganizationID or calls
GetDatabaseSchema/GetDatabaseSchemaForOrganization, add the same dataSourceID ==
pluginCRMDataSourceID condition to call GetDatabaseSchemaForPluginCRM, and apply
the same displayName handling (use stripOrgSuffix only when MidazOrganizationID
!= "" and dataSourceID != pluginCRMDataSourceID) so validation (template
validation) and details flows accept logical names like "holders" for plugin
CRM. Ensure references to pluginCRMDataSourceID, MidazOrganizationID,
GetDatabaseSchemaForPluginCRM, GetDatabaseSchemaForOrganization,
GetDatabaseSchema, and stripOrgSuffix are updated consistently across those
validation/details functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/worker/internal/services/generate-report-data.go`:
- Around line 359-390: The merge order for multi-org CRM results is
non-deterministic because allCollections is iterated as returned; before the
loop over allCollections (where prefix := collection + "_" and you check
physColl and call uc.queryMongoCollectionWithFilters), sort/filter the list of
physical collections deterministically (e.g., collect matching physColl into a
slice, sort.Strings on that slice) so the subsequent loop appends org records in
a stable order into allResults; also add "sort" to the imports.
- Around line 376-381: The loop currently swallows errors from
uc.queryMongoCollectionWithFilters by logging and doing continue, which yields
silently incomplete reports; instead, change the behavior so the function fails
fast: replace the log+continue path inside the loop that checks queryErr to
return an error (e.g., wrap queryErr with context including physColl and
"plugin_crm") or, if you prefer aggregated reporting, collect physColl into a
slice of failures and after the loop return a consolidated error listing failed
collections and their errors. Update the branch that handles queryErr (from
uc.queryMongoCollectionWithFilters) to either return fmt.Errorf("failed querying
collection %s: %w", physColl, queryErr) immediately or append the failure to a
failures map and return an aggregated error at the end of the function.
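The two options named in this comment, failing fast or returning an aggregated error, could look roughly like the sketch below. `queryFunc`, `queryAllFailFast`, `queryAllAggregated`, and `errUnavailable` are hypothetical names standing in for `uc.queryMongoCollectionWithFilters` and its callers; the aggregated variant assumes Go 1.20 or later for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical sentinel used only by this demo.
var errUnavailable = errors.New("collection unavailable")

// queryFunc stands in for the per-collection query call (assumption).
type queryFunc func(physColl string) ([]map[string]any, error)

// queryAllFailFast returns on the first failing collection and wraps the
// error with the physical collection name.
func queryAllFailFast(physColls []string, query queryFunc) ([]map[string]any, error) {
	var merged []map[string]any

	for _, physColl := range physColls {
		records, err := query(physColl)
		if err != nil {
			return nil, fmt.Errorf("failed querying collection %s: %w", physColl, err)
		}

		merged = append(merged, records...)
	}

	return merged, nil
}

// queryAllAggregated queries every collection, collects the failures, and
// returns them as one joined error after the loop.
func queryAllAggregated(physColls []string, query queryFunc) ([]map[string]any, error) {
	var (
		merged   []map[string]any
		failures []error
	)

	for _, physColl := range physColls {
		records, err := query(physColl)
		if err != nil {
			failures = append(failures, fmt.Errorf("failed querying collection %s: %w", physColl, err))
			continue
		}

		merged = append(merged, records...)
	}

	if len(failures) > 0 {
		return nil, errors.Join(failures...)
	}

	return merged, nil
}

func main() {
	query := func(physColl string) ([]map[string]any, error) {
		if physColl == "holders_org-456" {
			return nil, errUnavailable
		}

		return []map[string]any{{"from": physColl}}, nil
	}

	_, err := queryAllFailFast([]string{"holders_org-123", "holders_org-456"}, query)
	fmt.Println(err, errors.Is(err, errUnavailable))
}
```

Either way the report is never built from a silently incomplete set of organizations; the variants differ only in whether the caller sees the first failure or all of them.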

In `@pkg/mongodb/datasource_schema.go`:
- Around line 265-281: The code currently creates a CollectionSchema and appends
it to schema even when unionFields is empty; update the logic around
CollectionSchema/FieldInformation creation so that if len(unionFields) == 0 you
either skip appending the logical group or emit a warning and continue;
specifically, before constructing collSchema or appending to schema, check
unionFields length (use the existing logicalName and sampled context), call
logger.Log at LevelWarn with logical_name and orgs_sampled when skipping, and
only build/apply collSchema and append to schema when unionFields has at least
one field.
- Around line 197-207: The code after calling ds.connection.GetDB(ctx) and
database.ListCollectionNames(schemaCtx, ...) lacks the context.DeadlineExceeded
checks present in GetDatabaseSchema and GetDatabaseSchemaForOrganization; update
the current function (the block with client, err := ds.connection.GetDB(ctx) and
the subsequent database.ListCollectionNames call) to detect if err ==
context.DeadlineExceeded (or errors.Is(err, context.DeadlineExceeded)) and
return the same timeout-specific error messages used by the sibling methods,
i.e., handle timeouts distinctly for both the GetDB call and the
ListCollectionNames call to keep behavior consistent.
- Around line 241-263: GetDatabaseSchemaForPluginCRM currently samples up to 5
physical collections via sampleMultipleDocuments and unions their fields instead
of calling discoverCollectionFields (which runs aggregation + sampling) to avoid
expensive per-org aggregations; add a concise comment above the loop explaining
this intentional design choice and tradeoff (e.g., that we sample multiple
org-scoped collections to avoid costly aggregation for multi-org scenarios,
assuming field schemas are consistent across orgs), and reference the reasons
and symbols involved (GetDatabaseSchemaForPluginCRM, sampleMultipleDocuments,
discoverCollectionFields, unionFields, unknownDataType) so future maintainers
understand why aggregation discovery was not used.
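The timeout handling requested for the listing call can be sketched with `errors.Is(err, context.DeadlineExceeded)`. This is an illustration under assumptions: `listFunc`, `listWithTimeout`, and `demoTimeout` are hypothetical names, and the error messages are placeholders, since the sibling methods' actual messages are not shown in this thread.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// listFunc stands in for the driver call that lists collection names.
type listFunc func(ctx context.Context) ([]string, error)

// listWithTimeout runs list under a deadline and reports a timeout
// separately from other failures. Message wording is a placeholder.
func listWithTimeout(ctx context.Context, timeout time.Duration, list listFunc) ([]string, error) {
	listCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	names, err := list(listCtx)
	if err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			return nil, fmt.Errorf("timeout listing collections after %s: %w", timeout, err)
		}

		return nil, fmt.Errorf("error listing collections: %w", err)
	}

	return names, nil
}

// demoTimeout simulates a listing call that never finishes on its own and
// only returns once the context deadline has passed.
func demoTimeout() error {
	slow := func(ctx context.Context) ([]string, error) {
		<-ctx.Done()
		return nil, ctx.Err()
	}

	_, err := listWithTimeout(context.Background(), 10*time.Millisecond, slow)

	return err
}

func main() {
	fmt.Println(demoTimeout())
}
```

Using `errors.Is` rather than `==` keeps the check working when the driver wraps the context error.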


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9d7f330b-f8a9-4c55-975c-59dd883feacb

📥 Commits

Reviewing files that changed from the base of the PR and between ff8f16b and cd9a114.

📒 Files selected for processing (9)
  • components/worker/internal/services/generate-report-data.go
  • components/worker/internal/services/generate-report-data_test.go
  • components/worker/internal/services/generate-report_test.go
  • pkg/datasource-config.go
  • pkg/datasource/direct_provider_schema.go
  • pkg/mongodb/datasource.mongodb.go
  • pkg/mongodb/datasource.mongodb.mock.go
  • pkg/mongodb/datasource_query.go
  • pkg/mongodb/datasource_schema.go

arthurkz added 2 commits April 6, 2026 18:11
…order

Templates using index-based access (e.g. holders.0.document) would render
different organizations depending on ListCollectionNames return order. Sorting
the matched physical collections ensures consistent results across runs.

X-Lerian-Ref: 0x1

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/worker/internal/services/generate-report-data.go`:
- Around line 377-378: Remove the stray blank line immediately after the loop
header for "for _, physColl := range matchingCollections {" so the block starts
directly with the first statement (i.e., delete the empty line after the opening
brace in the loop over matchingCollections), ensuring the loop body formatting
conforms to lint rules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dcf023cc-30da-4acc-899c-9f889ac42e13

📥 Commits

Reviewing files that changed from the base of the PR and between cd9a114 and 63cf697.

📒 Files selected for processing (1)
  • components/worker/internal/services/generate-report-data.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 183-292: GetDatabaseSchemaForPluginCRM is over-complex; extract
the collection grouping and sampling/union logic into helpers to reduce
cognitive complexity: implement groupCollectionsByPrefix(collections []string)
map[string][]string to encapsulate the prefix-based grouping used where
logicalGroups is built, and implement (ds *ExternalDataSource)
sampleAndUnionFields(ctx context.Context, database *mongo.Database,
physicalCollections []string, maxSample int) map[string]string to encapsulate
the sampling loop that calls ds.sampleMultipleDocuments and builds unionFields
(use unknownDataType for additionalFields and respect maxSampleCollections
constant); then refactor GetDatabaseSchemaForPluginCRM to call these two helpers
(keep existing logging, schema construction with CollectionSchema and
FieldInformation, and the maxSampleCollections constant) so the main function
only orchestrates DB calls and assembles results.
- Around line 244-247: physicalCollections is sampled directly which is
non-deterministic because ListCollectionNames has no stable order; to fix, sort
the collection list before slicing so the sampling is deterministic—e.g., call
sort.Strings(physicalCollections) (or sort.Slice with desired comparator)
immediately before the sampled := physicalCollections line and then apply the
existing maxSampleCollections logic; ensure this change references the
physicalCollections and maxSampleCollections symbols so the deterministic
sampling is applied consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e34e7015-6516-45fc-ae5c-53745fd3fe28

📥 Commits

Reviewing files that changed from the base of the PR and between 63cf697 and 5fdaae5.

📒 Files selected for processing (1)
  • pkg/mongodb/datasource_schema.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/mongodb/datasource_schema.go (1)

244-247: 🧹 Nitpick | 🔵 Trivial

Non-deterministic sampling order may cause inconsistent schema discovery.

ListCollectionNames does not guarantee a stable order, and Go map iteration (logicalGroups) is also non-deterministic. Taking the first 5 collections without sorting may yield different samples across calls, potentially leading to inconsistent union schemas if there are minor field variations between orgs.

♻️ Optional: Sort before sampling for determinism
 		sampled := physicalCollections
+		sort.Strings(sampled)
 		if len(sampled) > maxSampleCollections {
 			sampled = sampled[:maxSampleCollections]
 		}

Note: Add "sort" to the imports at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mongodb/datasource_schema.go` around lines 244 - 247, The sampling of
collections for schema discovery is non-deterministic because
physicalCollections (from ListCollectionNames) and the logicalGroups map iterate
in arbitrary order; before taking the first maxSampleCollections slice (the
sampled variable), sort physicalCollections (and any keys derived from
logicalGroups) to ensure a stable order; add "sort" to imports and call
sort.Strings(physicalCollections) (or sort.Strings on the slice you assign to
sampled) before truncating to maxSampleCollections so schema sampling is
deterministic across runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 287-289: The iteration over the map unionFields produces
non-deterministic ordering when appending to collSchema.Fields
(FieldInformation{Name:, DataType:}), so change the loop to collect and sort the
map keys (e.g., into a slice), then iterate over the sorted keys to append
FieldInformation entries in a stable order; ensure you use the same fieldName
and dataType pairing from unionFields when constructing each FieldInformation.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1e77c79-b35b-445b-843d-c24b0d18a8b1

📥 Commits

Reviewing files that changed from the base of the PR and between 5fdaae5 and 25db616.

📒 Files selected for processing (1)
  • pkg/mongodb/datasource_schema.go

- extract groupCollectionsByPrefix and sampleAndUnionFields to reduce cognitive
  complexity of GetDatabaseSchemaForPluginCRM (33 -> ~15)
- sort physical collections before sampling for deterministic union schema
- skip logical groups with zero discovered fields

X-Lerian-Ref: 0x1

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 285-290: The loop that builds collSchema.Fields from the
unionFields map in sampleAndUnionFields is non-deterministic; create a slice of
field names from unionFields, sort it (e.g., using sort.Strings), then iterate
over the sorted fieldNames to append FieldInformation entries to
collSchema.Fields so the union schema ordering is deterministic across runs.
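A minimal sketch of the sorted-keys approach described here, assuming a `FieldInformation` type with only the `Name` and `DataType` fields mentioned in the review (the real struct may carry more). `sortedFields` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// FieldInformation is reduced to the two fields named in the review comment.
type FieldInformation struct {
	Name     string
	DataType string
}

// sortedFields (hypothetical) turns the unionFields map into a slice ordered
// by field name, so repeated calls produce the same field order.
func sortedFields(unionFields map[string]string) []FieldInformation {
	fieldNames := make([]string, 0, len(unionFields))
	for name := range unionFields {
		fieldNames = append(fieldNames, name)
	}

	sort.Strings(fieldNames)

	fields := make([]FieldInformation, 0, len(fieldNames))
	for _, name := range fieldNames {
		fields = append(fields, FieldInformation{Name: name, DataType: unionFields[name]})
	}

	return fields
}

func main() {
	fields := sortedFields(map[string]string{"name": "string", "document": "string", "age": "int"})
	for _, f := range fields {
		fmt.Println(f.Name, f.DataType)
	}
}
```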

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f8cbf74-7fb5-42ee-b09e-3a08c71dd6f0

📥 Commits

Reviewing files that changed from the base of the PR and between 25db616 and 08b2cbc.

📒 Files selected for processing (1)
  • pkg/mongodb/datasource_schema.go

@arthurkz arthurkz merged commit 95c1949 into develop Apr 7, 2026
20 checks passed
@arthurkz arthurkz deleted the fix/generate-report-internal-db branch April 7, 2026 11:01