Bug description
When a FeedVersion is created from a snapshot export, originNamespace is correctly set to snapshot.namespace in the FeedVersion(FeedSource, Snapshot) constructor. However, the field is subsequently erased by later jobs that fetch the FeedVersion from MongoDB and write it back using TypedPersistence.replace().
The MongoDB POJO codec deserializes documents by calling the no-arg constructor and populating only the fields present in the stored document. Any field absent from the document (including originNamespace on documents created before that field existed in the codebase) is left as null on the deserialized Java object. When replace() is then called with that object, it overwrites the entire MongoDB document, so every field that is null on the Java object ends up null or missing in the stored document (depending on whether the codec serializes null values). That includes originNamespace.
The result is that originNamespace is written correctly on the first save but silently erased by a subsequent replace() call later in the same pipeline or in a future operation.
To reproduce
- Open any feed source with editor snapshots
- Export a snapshot as a new feed version via the Snapshots tab → Publish button
- Inspect the resulting FeedVersion document in MongoDB immediately after ValidateFeedJob completes: originNamespace will be present
- Wait for ValidateMobilityDataFeedJob to complete
- Re-inspect the same document: originNamespace is now null
Expected behavior
originNamespace should be set to the namespace of the source Snapshot and remain populated for the lifetime of the FeedVersion document, regardless of subsequent validation or persistence operations.
Actual behavior
originNamespace is present after initial creation but is erased by subsequent replace() calls in ValidateMobilityDataFeedJob.jobFinished() and potentially publishToExternalResource(). In a sample feed with 11 versions of which 9 were created via snapshot export, only 2–3 retained a non-null originNamespace.
Root cause
The bug is caused by the interaction of two patterns in the persistence layer:
1. Full document replacement via TypedPersistence.replace()
```java
public void replace(String id, T replaceObject) {
    mongoCollection.replaceOne(eq(id), replaceObject); // overwrites entire document
}
```
2. MongoDB POJO codec deserialization via no-arg constructor
```java
public FeedVersion() {
    // do nothing: all fields start null
}
```
When a FeedVersion is fetched from MongoDB with getById(), the POJO codec instantiates a blank object and populates only the fields present in the stored document. Fields absent from the document (e.g. originNamespace on older documents) remain null. If replace() is then called on that object, those null values are written back to MongoDB, erasing data that may have been correctly stored.
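To make the sequence concrete, here is a plain-Java simulation of that round trip with no MongoDB driver involved. FeedVersionStub, decode() and encode() are hypothetical stand-ins for FeedVersion and the POJO codec, and the stored document is modelled as a Map. The sketch assumes one plausible ordering behind "data that may have been correctly stored": a copy is decoded before originNamespace reaches the document and is written back with a full replace after it does. The stand-in encoder writes nulls explicitly; the real codec may omit them instead, which loses the stored value just the same under whole-document replacement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java simulation of getById() -> mutate -> replace(); no MongoDB driver.
// FeedVersionStub, decode() and encode() are hypothetical stand-ins.
class ReplaceRoundTrip {

    static class FeedVersionStub {
        String id;
        String originNamespace;
        String mobilityDataResult;

        FeedVersionStub() {
            // do nothing: all fields start null, like FeedVersion()
        }
    }

    // Codec-style decode: no-arg constructor, then copy only keys present in the document.
    static FeedVersionStub decode(Map<String, Object> doc) {
        FeedVersionStub fv = new FeedVersionStub();
        if (doc.containsKey("_id")) fv.id = (String) doc.get("_id");
        if (doc.containsKey("originNamespace")) fv.originNamespace = (String) doc.get("originNamespace");
        if (doc.containsKey("mobilityDataResult")) fv.mobilityDataResult = (String) doc.get("mobilityDataResult");
        return fv;
    }

    // Full-document encode, as used by a whole-document replace: every field is emitted.
    static Map<String, Object> encode(FeedVersionStub fv) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", fv.id);
        doc.put("originNamespace", fv.originNamespace);
        doc.put("mobilityDataResult", fv.mobilityDataResult);
        return doc;
    }

    // A copy decoded before originNamespace was stored is replaced after it was stored.
    static Map<String, Object> runScenario() {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("_id", "v1");                 // document has no originNamespace yet

        FeedVersionStub copy = decode(stored);   // copy.originNamespace == null

        stored.put("originNamespace", "ns_abc"); // another writer stores it correctly

        copy.mobilityDataResult = "VALID";       // the job mutates one field...
        stored = encode(copy);                   // ...then replaces the whole document
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

The job only meant to change mobilityDataResult, but the final document has originNamespace set to null because the copy never held the stored value.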
Confirmed offending code paths:
- ValidateMobilityDataFeedJob.jobFinished(): fetches or receives a FeedVersion, then calls persistFeedVersionAfterValidation(), which calls replace()
- publishToExternalResource(): same pattern
The fix
Immediate fix — replace full-document writes with surgical field updates in the offending methods. In ValidateMobilityDataFeedJob.jobFinished() and any equivalent method that only mutates specific fields, replace:
```java
Persistence.feedVersions.replace(id, this);
```
with targeted updateField() calls:
```java
Persistence.feedVersions.updateField(feedVersion.id, "mobilityDataResult", feedVersion.mobilityDataResult);
```
updateField() uses MongoDB $set under the hood and only touches the named field, leaving all other fields in the stored document intact.
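The difference between the two write shapes can be sketched against an in-memory collection. InMemoryCollection and its methods are hypothetical; they only mirror the semantics of a whole-document replaceOne() and a $set update (in the MongoDB Java driver, roughly updateOne(eq(id), Updates.set(field, value)); whether TypedPersistence.updateField() is written exactly that way is an assumption). The same stale copy is persisted both ways:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory collection mirroring replaceOne() vs. $set semantics.
class UpdateFieldSketch {

    static class InMemoryCollection {
        private final Map<String, Map<String, Object>> docs = new HashMap<>();

        void insert(Map<String, Object> doc) {
            docs.put((String) doc.get("_id"), new LinkedHashMap<>(doc));
        }

        // Returns a detached copy, like a getById() result.
        Map<String, Object> getById(String id) {
            Map<String, Object> doc = docs.get(id);
            return doc == null ? null : new LinkedHashMap<>(doc);
        }

        // replaceOne semantics: the stored document becomes exactly newDoc (keeping _id).
        void replace(String id, Map<String, Object> newDoc) {
            Map<String, Object> doc = new LinkedHashMap<>(newDoc);
            doc.put("_id", id);
            docs.put(id, doc);
        }

        // $set semantics: only the named field changes; everything else is left alone.
        void updateField(String id, String field, Object value) {
            Map<String, Object> doc = docs.get(id);
            if (doc == null) throw new IllegalArgumentException("no document with _id " + id);
            doc.put(field, value);
        }
    }

    // Same stale-copy sequence, persisted with either replace() or updateField().
    static Map<String, Object> runScenario(boolean useUpdateField) {
        InMemoryCollection feedVersions = new InMemoryCollection();
        Map<String, Object> initial = new LinkedHashMap<>();
        initial.put("_id", "v1");
        feedVersions.insert(initial);

        Map<String, Object> copy = feedVersions.getById("v1");      // no originNamespace yet
        feedVersions.updateField("v1", "originNamespace", "ns_abc"); // stored by another writer

        copy.put("mobilityDataResult", "VALID");                     // the only field this job changes
        if (useUpdateField) {
            feedVersions.updateField("v1", "mobilityDataResult", copy.get("mobilityDataResult"));
        } else {
            feedVersions.replace("v1", copy);
        }
        return feedVersions.getById("v1");
    }

    public static void main(String[] args) {
        System.out.println("replace():     " + runScenario(false));
        System.out.println("updateField(): " + runScenario(true));
    }
}
```

With replace() the stored originNamespace disappears; with updateField() it survives and only mobilityDataResult changes. The trade-off is that every mutated field has to be named explicitly at the call site.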
Systemic fix — audit all replace() calls on FeedVersion across the codebase. replace() is only safe when the full in-memory object lifecycle is controlled end-to-end (e.g. the CreateFeedVersionFromSnapshotJob pipeline where the same object reference is threaded through every sub-job without any intermediate getById() fetch). Every other usage should be converted to updateField() for the specific fields being mutated.
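A first pass of that audit can be automated with a textual search. The sketch walks a source tree and reports every line that looks like a feedVersions.replace(...) call; the regex and the default root src/main/java are assumptions about the repository layout. It is a heuristic only: calls made through another variable or a generic helper will not match, and each hit still needs a manual check against the end-to-end lifecycle condition described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Heuristic audit: lists source lines that look like feedVersions.replace(...) calls.
class ReplaceCallAudit {
    // Matches e.g. "Persistence.feedVersions.replace(" (whitespace tolerated),
    // but not "feedVersions.replaceAll(".
    private static final Pattern REPLACE_CALL =
            Pattern.compile("\\bfeedVersions\\s*\\.\\s*replace\\s*\\(");

    // Returns "path:line: source" for every textual match under root.
    static List<String> findReplaceCalls(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : javaFiles) {
            List<String> lines = Files.readAllLines(file); // UTF-8
            for (int i = 0; i < lines.size(); i++) {
                if (REPLACE_CALL.matcher(lines.get(i)).find()) {
                    hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        findReplaceCalls(root).forEach(System.out::println);
    }
}
```

Run it with the source root as the first argument; every reported line is a candidate for conversion to updateField().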
Impact
originNamespace is the only field linking a FeedVersion back to the Snapshot it was created from. When erased, there is no programmatic way to determine which snapshot corresponds to a published version, making editor state recovery unreliable.
- This bug affects any field added to FeedVersion after initial deployment. Any document written before a new field was introduced will have that field erased the next time replace() is called on a deserialized copy of it. originNamespace is one confirmed instance, but the vulnerability is systemic.
- Downstream tooling and reporting built on the datatools MongoDB (e.g. Metabase dashboards tracking feed publication history) cannot reliably join versions to their source snapshots.
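Because the exposure applies to any field, a small diagnostic can help while the remaining replace() calls are being converted: compare the currently stored document with the one about to be written and list the fields the replacement would wipe. fieldsLostByReplace() is a hypothetical helper, not part of datatools-server, and documents are again modelled as Maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: which stored fields would a whole-document replace wipe?
class ReplaceLossCheck {

    // Fields holding a non-null value in `stored` that would be null or absent
    // once the stored document is replaced by `outgoing`.
    static List<String> fieldsLostByReplace(Map<String, Object> stored, Map<String, Object> outgoing) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Object> entry : stored.entrySet()) {
            if (entry.getValue() != null && outgoing.get(entry.getKey()) == null) {
                lost.add(entry.getKey());
            }
        }
        Collections.sort(lost); // stable order for log output
        return lost;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("_id", "v1");
        stored.put("originNamespace", "ns_abc");

        Map<String, Object> outgoing = new LinkedHashMap<>();
        outgoing.put("_id", "v1");
        outgoing.put("mobilityDataResult", "VALID"); // stale copy never saw originNamespace

        List<String> lost = fieldsLostByReplace(stored, outgoing);
        if (!lost.isEmpty()) {
            System.out.println("replace() would erase: " + lost);
        }
    }
}
```

This is a point-in-time check: a write that lands between the read and the replace() can still be lost, so it suits logging during the audit rather than standing in for the switch to updateField().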
Environment
- datatools-server: hosted instance
- MongoDB: 8.0.21