feat: replace MidazOrganizationID with prefix-based collection #608
Conversation
…iscovery

- add ListCollectionNames and GetDatabaseSchemaForPluginCRM to MongoDB Repository
- processPluginCRMCollection now discovers all org-scoped collections via prefix matching (holders_*, aliases_*), queries each, merges results, and injects organization_id into every record
- schema discovery uses union of fields across all org collections
- deprecate DATASOURCE_CRM_MIDAZ_ORGANIZATION_ID env var (no longer required)
- update tests to mock ListCollectionNames in CRM processing flow

X-Lerian-Ref: 0x1
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
Walkthrough

processPluginCRMCollection now lists all MongoDB collections, selects those matching a logical collection prefix (e.g. holders_*, aliases_*), queries each, merges the results, and injects organization_id into every record.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Worker as Report Worker
    participant Repo as MongoDB Repository
    participant DB as MongoDB Database
    Worker->>Repo: ListCollectionNames(ctx)
    Repo->>DB: List all collections
    DB-->>Repo: ["holders_org-123","holders_org-456","other_data"]
    Repo-->>Worker: collection names
    Worker->>Worker: Filter by prefix (e.g., "holders_")
    Worker->>Worker: Sort matching physical collections
    loop For each matching collection
        Worker->>Repo: QueryWithAdvancedFilters(ctx, physColl)
        Repo->>DB: Query collection physColl
        DB-->>Repo: Records
        Repo-->>Worker: Results
        Worker->>Worker: Inject organization_id (trim prefix)
        Worker->>Worker: Append to merged slice
    end
    Worker->>Worker: Set OTEL span attr (merged count)
    Worker->>Worker: decryptPluginCRMData(merged slice)
    Worker-->>Worker: Return aggregated plugin_crm data
```
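The discovery flow in the diagram above can be sketched in Go as follows. The helper names (`discoverOrgCollections`, `orgIDFromCollection`) are illustrative, not the repository's actual functions; the real code lives in `processPluginCRMCollection` and the MongoDB repository.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// discoverOrgCollections filters physical collection names by a logical
// prefix (e.g. "holders_") and sorts them so the merge order is
// deterministic across runs.
func discoverOrgCollections(all []string, logical string) []string {
	prefix := logical + "_"
	var matched []string
	for _, name := range all {
		if strings.HasPrefix(name, prefix) {
			matched = append(matched, name)
		}
	}
	sort.Strings(matched) // stable merge order across runs
	return matched
}

// orgIDFromCollection recovers the organization ID by trimming the logical
// prefix, e.g. "holders_org-123" -> "org-123".
func orgIDFromCollection(physColl, logical string) string {
	return strings.TrimPrefix(physColl, logical+"_")
}

func main() {
	all := []string{"holders_org-456", "holders_org-123", "other_data"}
	for _, coll := range discoverOrgCollections(all, "holders") {
		fmt.Println(coll, "->", orgIDFromCollection(coll, "holders"))
	}
}
```

Sorting before the query loop is what keeps index-based template access (e.g. holders.0.document) stable regardless of the order MongoDB returns collection names.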
📊 Unit Test Coverage Report

| Metric | Value |
|---|---|
| Overall Coverage | 92.5% ✅ PASS |
| Threshold | 85% |

Coverage by Package

| Package | Coverage |
|---|---|
| github.com/LerianStudio/reporter/components/worker/internal/adapters/rabbitmq | 94.2% |
| github.com/LerianStudio/reporter/components/worker/internal/services | 95.6% |

Generated by Go PR Analysis workflow
📊 Unit Test Coverage Report

| Metric | Value |
|---|---|
| Overall Coverage | 88.1% ✅ PASS |
| Threshold | 85% |

Coverage by Package

| Package | Coverage |
|---|---|
| github.com/LerianStudio/reporter/components/manager/internal/adapters/http/in | 88.8% |
| github.com/LerianStudio/reporter/components/manager/internal/services | 89.4% |

Generated by Go PR Analysis workflow
🔒 Security Scan Results
| Policy | Status |
|---|---|
| Default non-root user | ✅ Passed |
| No fixable critical/high CVEs | ✅ Passed |
| No high-profile vulnerabilities | ✅ Passed |
| No AGPL v3 licenses | ✅ Passed |
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/datasource/direct_provider_schema.go (1)
91-119: ⚠️ Potential issue | 🟠 Major: Plugin CRM logical-schema handling is only fixed in this path.
This special case updates `GetDataSourceSchema`, but the sibling MongoDB validation/details paths still branch on `MidazOrganizationID` and fall back to physical `holders_<org>` names. Once `MIDAZ_ORGANIZATION_ID` is removed, template validation and datasource-details flows will keep rejecting logical `holders`/`aliases` references even though schema lookup here succeeds. Please update those call sites in the same rollout so the deprecation is actually safe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/datasource/direct_provider_schema.go` around lines 91 - 119, The code special-cases pluginCRM by calling GetDatabaseSchemaForPluginCRM when dataSourceID == pluginCRMDataSourceID and strips org suffix only in GetDataSourceSchema; fix the sibling MongoDB validation and datasource-details code paths so they use the same branching and naming logic: where code currently branches on ds.MidazOrganizationID or calls GetDatabaseSchema/GetDatabaseSchemaForOrganization, add the same dataSourceID == pluginCRMDataSourceID condition to call GetDatabaseSchemaForPluginCRM, and apply the same displayName handling (use stripOrgSuffix only when MidazOrganizationID != "" and dataSourceID != pluginCRMDataSourceID) so validation (template validation) and details flows accept logical names like "holders" for plugin CRM. Ensure references to pluginCRMDataSourceID, MidazOrganizationID, GetDatabaseSchemaForPluginCRM, GetDatabaseSchemaForOrganization, GetDatabaseSchema, and stripOrgSuffix are updated consistently across those validation/details functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/worker/internal/services/generate-report-data.go`:
- Around line 359-390: The merge order for multi-org CRM results is
non-deterministic because allCollections is iterated as returned; before the
loop over allCollections (where prefix := collection + "_" and you check
physColl and call uc.queryMongoCollectionWithFilters), sort/filter the list of
physical collections deterministically (e.g., collect matching physColl into a
slice, sort.Strings on that slice) so the subsequent loop appends org records in
a stable order into allResults; also add "sort" to the imports.
- Around line 376-381: The loop currently swallows errors from
uc.queryMongoCollectionWithFilters by logging and doing continue, which yields
silently incomplete reports; instead, change the behavior so the function fails
fast: replace the log+continue path inside the loop that checks queryErr to
return an error (e.g., wrap queryErr with context including physColl and
"plugin_crm") or, if you prefer aggregated reporting, collect physColl into a
slice of failures and after the loop return a consolidated error listing failed
collections and their errors. Update the branch that handles queryErr (from
uc.queryMongoCollectionWithFilters) to either return fmt.Errorf("failed querying
collection %s: %w", physColl, queryErr) immediately or append the failure to a
failures map and return an aggregated error at the end of the function.
In `@pkg/mongodb/datasource_schema.go`:
- Around line 265-281: The code currently creates a CollectionSchema and appends
it to schema even when unionFields is empty; update the logic around
CollectionSchema/FieldInformation creation so that if len(unionFields) == 0 you
either skip appending the logical group or emit a warning and continue;
specifically, before constructing collSchema or appending to schema, check
unionFields length (use the existing logicalName and sampled context), call
logger.Log at LevelWarn with logical_name and orgs_sampled when skipping, and
only build/apply collSchema and append to schema when unionFields has at least
one field.
- Around line 197-207: The code after calling ds.connection.GetDB(ctx) and
database.ListCollectionNames(schemaCtx, ...) lacks the context.DeadlineExceeded
checks present in GetDatabaseSchema and GetDatabaseSchemaForOrganization; update
the current function (the block with client, err := ds.connection.GetDB(ctx) and
the subsequent database.ListCollectionNames call) to detect if err ==
context.DeadlineExceeded (or errors.Is(err, context.DeadlineExceeded)) and
return the same timeout-specific error messages used by the sibling methods,
i.e., handle timeouts distinctly for both the GetDB call and the
ListCollectionNames call to keep behavior consistent.
- Around line 241-263: GetDatabaseSchemaForPluginCRM currently samples up to 5
physical collections via sampleMultipleDocuments and unions their fields instead
of calling discoverCollectionFields (which runs aggregation + sampling) to avoid
expensive per-org aggregations; add a concise comment above the loop explaining
this intentional design choice and tradeoff (e.g., that we sample multiple
org-scoped collections to avoid costly aggregation for multi-org scenarios,
assuming field schemas are consistent across orgs), and reference the reasons
and symbols involved (GetDatabaseSchemaForPluginCRM, sampleMultipleDocuments,
discoverCollectionFields, unionFields, unknownDataType) so future maintainers
understand why aggregation discovery was not used.
---
Outside diff comments:
In `@pkg/datasource/direct_provider_schema.go`:
- Around line 91-119: The code special-cases pluginCRM by calling
GetDatabaseSchemaForPluginCRM when dataSourceID == pluginCRMDataSourceID and
strips org suffix only in GetDataSourceSchema; fix the sibling MongoDB
validation and datasource-details code paths so they use the same branching and
naming logic: where code currently branches on ds.MidazOrganizationID or calls
GetDatabaseSchema/GetDatabaseSchemaForOrganization, add the same dataSourceID ==
pluginCRMDataSourceID condition to call GetDatabaseSchemaForPluginCRM, and apply
the same displayName handling (use stripOrgSuffix only when MidazOrganizationID
!= "" and dataSourceID != pluginCRMDataSourceID) so validation (template
validation) and details flows accept logical names like "holders" for plugin
CRM. Ensure references to pluginCRMDataSourceID, MidazOrganizationID,
GetDatabaseSchemaForPluginCRM, GetDatabaseSchemaForOrganization,
GetDatabaseSchema, and stripOrgSuffix are updated consistently across those
validation/details functions.
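The consistent branching the outside-diff comment asks for can be sketched like this. Only the identifier names (`pluginCRMDataSourceID`, `MidazOrganizationID`, the three schema methods) come from the review; the types and bodies here are hypothetical stand-ins.

```go
package main

import "fmt"

// Illustrative stand-ins for the real datasource types.
const pluginCRMDataSourceID = "plugin_crm"

type dataSource struct {
	MidazOrganizationID string
}

// schemaLookupKind shows the branching order the review asks for: the
// pluginCRM special case must be checked before the org-scoped fallback,
// in every validation/details path, not just GetDataSourceSchema.
func schemaLookupKind(ds dataSource, dataSourceID string) string {
	switch {
	case dataSourceID == pluginCRMDataSourceID:
		return "GetDatabaseSchemaForPluginCRM"
	case ds.MidazOrganizationID != "":
		return "GetDatabaseSchemaForOrganization"
	default:
		return "GetDatabaseSchema"
	}
}

func main() {
	fmt.Println(schemaLookupKind(dataSource{}, pluginCRMDataSourceID))
	fmt.Println(schemaLookupKind(dataSource{MidazOrganizationID: "org-1"}, "other"))
	fmt.Println(schemaLookupKind(dataSource{}, "other"))
}
```

The point of centralizing this switch is that template validation and datasource-details flows accept logical names like "holders" for plugin CRM exactly when schema lookup does.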
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9d7f330b-f8a9-4c55-975c-59dd883feacb
📒 Files selected for processing (9)
- components/worker/internal/services/generate-report-data.go
- components/worker/internal/services/generate-report-data_test.go
- components/worker/internal/services/generate-report_test.go
- pkg/datasource-config.go
- pkg/datasource/direct_provider_schema.go
- pkg/mongodb/datasource.mongodb.go
- pkg/mongodb/datasource.mongodb.mock.go
- pkg/mongodb/datasource_query.go
- pkg/mongodb/datasource_schema.go
…order

Templates using index-based access (e.g. holders.0.document) would render different organizations depending on ListCollectionNames return order. Sorting the matched physical collections ensures consistent results across runs.

X-Lerian-Ref: 0x1
…ly skipping

X-Lerian-Ref: 0x1
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/worker/internal/services/generate-report-data.go`:
- Around line 377-378: Remove the stray blank line immediately after the loop
header for "for _, physColl := range matchingCollections {" so the block starts
directly with the first statement (i.e., delete the empty line after the opening
brace in the loop over matchingCollections), ensuring the loop body formatting
conforms to lint rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dcf023cc-30da-4acc-899c-9f889ac42e13
📒 Files selected for processing (1)
components/worker/internal/services/generate-report-data.go
X-Lerian-Ref: 0x1
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 183-292: GetDatabaseSchemaForPluginCRM is over-complex; extract
the collection grouping and sampling/union logic into helpers to reduce
cognitive complexity: implement groupCollectionsByPrefix(collections []string)
map[string][]string to encapsulate the prefix-based grouping used where
logicalGroups is built, and implement (ds *ExternalDataSource)
sampleAndUnionFields(ctx context.Context, database *mongo.Database,
physicalCollections []string, maxSample int) map[string]string to encapsulate
the sampling loop that calls ds.sampleMultipleDocuments and builds unionFields
(use unknownDataType for additionalFields and respect maxSampleCollections
constant); then refactor GetDatabaseSchemaForPluginCRM to call these two helpers
(keep existing logging, schema construction with CollectionSchema and
FieldInformation, and the maxSampleCollections constant) so the main function
only orchestrates DB calls and assembles results.
- Around line 244-247: physicalCollections is sampled directly which is
non-deterministic because ListCollectionNames has no stable order; to fix, sort
the collection list before slicing so the sampling is deterministic—e.g., call
sort.Strings(physicalCollections) (or sort.Slice with desired comparator)
immediately before the sampled := physicalCollections line and then apply the
existing maxSampleCollections logic; ensure this change references the
physicalCollections and maxSampleCollections symbols so the deterministic
sampling is applied consistently.
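The suggested `groupCollectionsByPrefix` helper could look roughly like this sketch. Grouping on the text before the first underscore is an assumption made for illustration; the real code may instead match a known set of logical names such as "holders" and "aliases".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupCollectionsByPrefix groups physical collection names under their
// logical prefix, e.g. "holders_org-1" and "holders_org-2" both land
// under "holders". Each group is sorted for deterministic sampling.
func groupCollectionsByPrefix(collections []string) map[string][]string {
	groups := make(map[string][]string)
	for _, name := range collections {
		idx := strings.Index(name, "_")
		if idx <= 0 {
			continue // no prefix: not an org-scoped collection
		}
		logical := name[:idx]
		groups[logical] = append(groups[logical], name)
	}
	for _, phys := range groups {
		sort.Strings(phys) // deterministic order within each group
	}
	return groups
}

func main() {
	g := groupCollectionsByPrefix([]string{"holders_org-2", "holders_org-1", "aliases_org-1"})
	fmt.Println(g["holders"], g["aliases"])
}
```

Extracting this grouping (and a companion `sampleAndUnionFields`) is what brings `GetDatabaseSchemaForPluginCRM` down to pure orchestration, as the review's complexity comment suggests.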
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e34e7015-6516-45fc-ae5c-53745fd3fe28
📒 Files selected for processing (1)
pkg/mongodb/datasource_schema.go
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/mongodb/datasource_schema.go (1)
244-247: 🧹 Nitpick | 🔵 Trivial: Non-deterministic sampling order may cause inconsistent schema discovery.
`ListCollectionNames` does not guarantee a stable order, and Go map iteration (`logicalGroups`) is also non-deterministic. Taking the first 5 collections without sorting may yield different samples across calls, potentially leading to inconsistent union schemas if there are minor field variations between orgs.
♻️ Optional: Sort before sampling for determinism

```diff
+ import "sort"

  sampled := physicalCollections
+ sort.Strings(sampled)
  if len(sampled) > maxSampleCollections {
      sampled = sampled[:maxSampleCollections]
  }
```

Note: Add `"sort"` to the imports at the top of the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/mongodb/datasource_schema.go` around lines 244 - 247, The sampling of collections for schema discovery is non-deterministic because physicalCollections (from ListCollectionNames) and the logicalGroups map iterate in arbitrary order; before taking the first maxSampleCollections slice (the sampled variable), sort physicalCollections (and any keys derived from logicalGroups) to ensure a stable order; add "sort" to imports and call sort.Strings(physicalCollections) (or sort.Strings on the slice you assign to sampled) before truncating to maxSampleCollections so schema sampling is deterministic across runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 287-289: The iteration over the map unionFields produces
non-deterministic ordering when appending to collSchema.Fields
(FieldInformation{Name:, DataType:}), so change the loop to collect and sort the
map keys (e.g., into a slice), then iterate over the sorted keys to append
FieldInformation entries in a stable order; ensure you use the same fieldName
and dataType pairing from unionFields when constructing each FieldInformation.
---
Duplicate comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 244-247: The sampling of collections for schema discovery is
non-deterministic because physicalCollections (from ListCollectionNames) and the
logicalGroups map iterate in arbitrary order; before taking the first
maxSampleCollections slice (the sampled variable), sort physicalCollections (and
any keys derived from logicalGroups) to ensure a stable order; add "sort" to
imports and call sort.Strings(physicalCollections) (or sort.Strings on the slice
you assign to sampled) before truncating to maxSampleCollections so schema
sampling is deterministic across runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b1e77c79-b35b-445b-843d-c24b0d18a8b1
📒 Files selected for processing (1)
pkg/mongodb/datasource_schema.go
- extract groupCollectionsByPrefix and sampleAndUnionFields to reduce cognitive complexity of GetDatabaseSchemaForPluginCRM (33 -> ~15) - sort physical collections before sampling for deterministic union schema - skip logical groups with zero discovered fields X-Lerian-Ref: 0x1
X-Lerian-Ref: 0x1
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mongodb/datasource_schema.go`:
- Around line 285-290: The loop that builds collSchema.Fields from the
unionFields map in sampleAndUnionFields is non-deterministic; create a slice of
field names from unionFields, sort it (e.g., using sort.Strings), then iterate
over the sorted fieldNames to append FieldInformation entries to
collSchema.Fields so the union schema ordering is deterministic across runs.
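Deterministic iteration over a Go map, as the comment suggests for `unionFields`, can be sketched like this. `FieldInformation` here is a minimal stand-in for the repository's type.

```go
package main

import (
	"fmt"
	"sort"
)

// FieldInformation mirrors the type named in the review; treat it as a sketch.
type FieldInformation struct {
	Name     string
	DataType string
}

// fieldsFromUnion turns the unionFields map into a deterministically
// ordered slice by collecting the keys, sorting them, and then appending
// entries in that stable order.
func fieldsFromUnion(unionFields map[string]string) []FieldInformation {
	names := make([]string, 0, len(unionFields))
	for name := range unionFields {
		names = append(names, name)
	}
	sort.Strings(names)
	fields := make([]FieldInformation, 0, len(names))
	for _, name := range names {
		fields = append(fields, FieldInformation{Name: name, DataType: unionFields[name]})
	}
	return fields
}

func main() {
	for _, f := range fieldsFromUnion(map[string]string{"b": "string", "a": "int"}) {
		fmt.Println(f.Name, f.DataType)
	}
}
```

Since Go deliberately randomizes map iteration order, any schema output built directly from a range over a map will differ between runs; sorting the keys first is the standard fix.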
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2f8cbf74-7fb5-42ee-b09e-3a08c71dd6f0
📒 Files selected for processing (1)
pkg/mongodb/datasource_schema.go
Pull Request Checklist
Pull Request Type
Checklist
Please check each item after it's completed.
Additional Notes
Note: please always remember to target your PR at the develop branch instead of main.