
originNamespace silently erased on FeedVersion by full-document replace() after deserialization #646

@canales

Description


Bug description

When a FeedVersion is created from a snapshot export, originNamespace is correctly set to snapshot.namespace in the FeedVersion(FeedSource, Snapshot) constructor. However, the field is subsequently erased by later jobs that fetch the FeedVersion from MongoDB and write it back using TypedPersistence.replace().

The MongoDB POJO codec deserializes documents by calling the no-arg constructor and populating only the fields present in the stored document. Any field absent from the document (including originNamespace on documents created before that field existed in the codebase) is left as null on the deserialized Java object. When replace() is then called with that object, it overwrites the entire MongoDB document, so every field that is null on the Java object ends up null or missing in the stored document, including originNamespace.

The result is that originNamespace is written correctly on the first save but silently erased by a subsequent replace() call later in the same pipeline or in a future operation.


To reproduce

  1. Open any feed source with editor snapshots
  2. Export a snapshot as a new feed version via the Snapshots tab → Publish button
  3. Inspect the resulting FeedVersion document in MongoDB immediately after ValidateFeedJob completes — originNamespace will be present
  4. Wait for ValidateMobilityDataFeedJob to complete
  5. Re-inspect the same document — originNamespace is now null

Expected behavior

originNamespace should be set to the namespace of the source Snapshot and remain populated for the lifetime of the FeedVersion document, regardless of subsequent validation or persistence operations.
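The expected behavior can be phrased as a check that a regression test could run against the stored FeedVersion document before and after ValidateMobilityDataFeedJob (or any other persistence step). This is a minimal sketch: FieldLossCheck is a hypothetical helper, and it assumes both states of the document have already been loaded as maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical regression check: any field that held a non-null value before an
// operation must still hold a non-null value afterwards.
final class FieldLossCheck {
    private FieldLossCheck() {}

    static List<String> lostFields(Map<String, Object> before, Map<String, Object> after) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Object> e : before.entrySet()) {
            // A missing key and an explicit null both count as lost, since either
            // reads as null when the document is inspected.
            if (e.getValue() != null && after.get(e.getKey()) == null) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }
}
```

A field that an operation is meant to clear would have to be excluded from the check explicitly.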


Actual behavior

originNamespace is present after initial creation but is erased by subsequent replace() calls in ValidateMobilityDataFeedJob.jobFinished() and potentially publishToExternalResource(). In a sample feed with 11 versions, 9 of which were created via snapshot export, only 2–3 retained a non-null originNamespace.


Root cause

The bug is caused by the interaction of two patterns in the persistence layer:

1. Full document replacement via TypedPersistence.replace()

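import static com.mongodb.client.model.Filters.eq;

public void replace(String id, T replaceObject) {
    // Filters.eq(value) matches on _id. replaceOne() swaps the ENTIRE stored document
    // for the encoded replaceObject, so a field that is null in Java does not survive.
    mongoCollection.replaceOne(eq(id), replaceObject);
}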

2. MongoDB POJO codec deserialization via no-arg constructor

public FeedVersion() {
    // Intentionally empty: the POJO codec calls this, so every reference-typed field starts out null
}

When a FeedVersion is fetched from MongoDB with getById(), the POJO codec instantiates a blank object and populates only the fields present in the stored document. Fields absent from the document (e.g. originNamespace on older documents) remain null. If replace() is then called on that object, those null values are written back to MongoDB, erasing data that may have been correctly stored.
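The interaction of the two patterns can be reproduced without a database. The sketch below is a simplified model, not datatools-server code: FeedVersionStub, InMemoryCollection and ErasureDemo are hypothetical names, a Map stands in for the stored document, and the decode step copies only the fields present in the document. It assumes one possible interleaving, in which a job holds a copy decoded before the stored document contained originNamespace. The model omits null fields when encoding; writing them as explicit nulls would lose the stored value in the same way.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, trimmed-down stand-in for FeedVersion (not the real class).
class FeedVersionStub {
    String id;
    String name;
    String originNamespace;

    FeedVersionStub() {
        // Like the real no-arg constructor: every reference field starts out null.
    }
}

// Hypothetical in-memory model of a collection: _id -> stored document.
class InMemoryCollection {
    final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Models decoding: no-arg constructor, then copy only fields present in the document.
    FeedVersionStub getById(String id) {
        Map<String, Object> doc = docs.get(id);
        FeedVersionStub fv = new FeedVersionStub();
        fv.id = (String) doc.get("_id");
        if (doc.containsKey("name")) fv.name = (String) doc.get("name");
        if (doc.containsKey("originNamespace")) fv.originNamespace = (String) doc.get("originNamespace");
        return fv;
    }

    // Models replaceOne(): the stored document is swapped wholesale for the encoded
    // object. Null fields are not encoded, so they disappear from the stored document.
    void replace(String id, FeedVersionStub fv) {
        Map<String, Object> encoded = new LinkedHashMap<>();
        encoded.put("_id", id);
        if (fv.name != null) encoded.put("name", fv.name);
        if (fv.originNamespace != null) encoded.put("originNamespace", fv.originNamespace);
        docs.put(id, encoded);
    }
}

class ErasureDemo {
    // Returns the stored document after an assumed stale read followed by a full replace.
    static Map<String, Object> run() {
        InMemoryCollection versions = new InMemoryCollection();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "fv-1");
        doc.put("name", "v1");
        versions.docs.put("fv-1", doc);

        // A job decodes the document while it has no originNamespace yet.
        FeedVersionStub inMemory = versions.getById("fv-1");

        // The stored document then gains a correct originNamespace (placeholder value).
        versions.docs.get("fv-1").put("originNamespace", "snapshot_ns");

        // The job mutates one field and persists its whole copy with replace().
        inMemory.name = "v1 (validated)";
        versions.replace("fv-1", inMemory);
        return versions.docs.get("fv-1");
    }
}
```

Calling ErasureDemo.run() returns a document that has the updated name but no originNamespace, even though the job never touched that field.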

Confirmed offending code paths:

  • ValidateMobilityDataFeedJob.jobFinished(): fetches or receives a FeedVersion, then calls persistFeedVersionAfterValidation(), which in turn calls replace()
  • publishToExternalResource(): same pattern

The fix

Immediate fix — replace full-document writes with surgical field updates in the offending methods. In ValidateMobilityDataFeedJob.jobFinished() and any equivalent method that only mutates specific fields, replace:

Persistence.feedVersions.replace(id, this);

With targeted updateField() calls:

Persistence.feedVersions.updateField(feedVersion.id, "mobilityDataResult", feedVersion.mobilityDataResult);

updateField() uses MongoDB $set under the hood and only touches the named field, leaving all other fields in the stored document intact.
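The difference between the two write styles can be shown on a plain Map standing in for the stored document. This is a hypothetical sketch of the semantics only: WriteSemantics is an invented name, and the real replace() and updateField() go through the MongoDB driver rather than copying maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the two write styles; not the driver or TypedPersistence code.
final class WriteSemantics {
    private WriteSemantics() {}

    // Full replacement: the stored document becomes exactly the encoded object,
    // so any field the encoder left out is gone. replaceOne() keeps the existing _id.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> encoded) {
        Map<String, Object> result = new LinkedHashMap<>(encoded);
        result.put("_id", stored.get("_id"));
        return result;
    }

    // $set-style update: only the named field is written; every other field is kept.
    static Map<String, Object> updateField(Map<String, Object> stored, String field, Object value) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        result.put(field, value);
        return result;
    }
}
```

Against a stored document that holds originNamespace, replace() with an encoded object lacking that field drops it, while updateField(stored, "mobilityDataResult", result) leaves it in place.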

Systemic fix — audit all replace() calls on FeedVersion across the codebase. replace() is only safe when the full in-memory object lifecycle is controlled end-to-end (e.g. the CreateFeedVersionFromSnapshotJob pipeline where the same object reference is threaded through every sub-job without any intermediate getById() fetch). Every other usage should be converted to updateField() for the specific fields being mutated.
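One way to carry out this conversion mechanically is to compare the field values a job started with against the values it ended with, and issue one targeted write per changed field. The sketch below assumes both states are available as Map<String, Object> (producing those maps from a FeedVersion, for example by reflection, is not shown); ChangedFields is a hypothetical name, not part of datatools-server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: returns only the fields a job actually changed, so each entry
// can be persisted with a targeted $set (e.g. updateField) instead of replace().
final class ChangedFields {
    private ChangedFields() {}

    static Map<String, Object> diff(Map<String, Object> before, Map<String, Object> after) {
        Map<String, Object> changes = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : after.entrySet()) {
            // Objects.equals treats two nulls as equal, so untouched null fields are skipped.
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changes.put(e.getKey(), e.getValue());
            }
        }
        return changes;
    }
}
```

Because a field that was null when the job started and is still null is never written, the case that erases originNamespace under replace() does not arise. Nested values such as validation results are compared with their equals() methods, so types without a meaningful equals() may be reported as changed even when their contents match.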


Impact

  • originNamespace is the only field linking a FeedVersion back to the Snapshot it was created from. When erased, there is no programmatic way to determine which snapshot corresponds to a published version, making editor state recovery unreliable.
  • This bug affects any field added to FeedVersion after initial deployment. Any document written before a new field was introduced will have that field erased the next time replace() is called on a deserialized copy of it. originNamespace is one confirmed instance, but the vulnerability is systemic.
  • Downstream tooling and reporting built on the datatools MongoDB (e.g. Metabase dashboards tracking feed publication history) cannot reliably join versions to their source snapshots.

Environment

  • datatools-server: hosted instance
  • MongoDB: 8.0.21
