Skip to content
This repository was archived by the owner on Apr 27, 2023. It is now read-only.

reduce reliance on mutation in conversion process#105

Open
jordanverasamy wants to merge 17 commits intomasterfrom
improvement/remove-mutation-reliance
Open

reduce reliance on mutation in conversion process#105
jordanverasamy wants to merge 17 commits intomasterfrom
improvement/remove-mutation-reliance

Conversation

@jordanverasamy
Copy link
Contributor

@jordanverasamy jordanverasamy commented Oct 25, 2018

Problem I'm trying to solve

Anyone anywhere can mutate the Shopify record

Here's the way the convert pipeline works on master:

When the convert process goes through each pipeline stage in sequence and calls stage.convert(row, record), that pipeline stage is expected to mutate record.

This is accomplished through multiple different ways. Most #convert functions use record.merge! to mutate that record to add the new data that it parsed from the hash.

However, additionally, it turns out that every AttributesAccumulator mutates the variable that is used to initialize it. A lot of our code was relying on this fact, because the accumulator was adding data to the record that was not ever merge!'d in.

In addition to that, some pipeline stages were mutating the record in other ways. For example, the VariantImage pipeline stage was mutating both the parent object and the corresponding variant.

Magento data always gets inserted into the Shopify record, then erased later

On a separate note, I noticed that sometimes our record was polluted with Magento data. This made it very confusing to determine what was going on, since the Shopify record would be a giant hash with ~10 keys of Shopify-formatted data and 50+ keys of Magento-formatted data.

It turns out that this was happening because the VariantAttributes pipeline stage was inserting the entire Magento hash for that simple product into the variants list of the Shopify record we were accumulating. This wasn't affecting the output, since we have a key whitelist in our CSV builder, but it made debugging very annoying.

Proposed solution

I'm proposing a change to the way our pipeline stages are expected to work.

Restructure the way pipeline stages are wired together

The #convert function of a pipeline stage is no longer expected to mutate anything. Instead, the return value of one pipeline stage is passed along as the input to the next pipeline stage.

This is accomplished by changing our call to @pipeline_stages.each into a call to @pipeline_stages.reduce.

This means that if you want to know where some data gets added to the record, instead of having to look everywhere for what line of code could have mutated that record, one simply has to look at the data that is being passed between pipeline stages, since nothing else could be modifying that record.

The goal of this change is to ensure that you can ALWAYS simply look at the input and return records of any #convert function and know what it did.

Restructure the RecordBuilder to match the new functional design

One slightly weird implementation detail of this is that the RecordBuilder, as written, actually expects that the record it yields will be mutated. I've added a bit of code to the RecordBuilder so that instead of expecting the record to be mutated with all the new data, it instead expects the block it is passed to return a new record with all the new data, and adjusts its instances table accordingly.

Don't let accumulators ever mutate external objects

The AttributesAccumulator class now calls .dup on the object passed to the initializer. This means that as the accumulator slurps up data from hashes and accumulates them in the @output instance variable, the internal state of the AttributesAccumulator is being built up, but the variable that was initially passed in is no longer being modified.

Rewrite pipeline stages to match the new structure

Remove the VariantAttributes pipeline stage. The purpose of this stage was simply to make sure that the right variants got populated to the right parent object, but this can actually be done very easily from the TopLevelVariantAttributes pipeline stage instead. This means we can remove this pipeline stage entirely.

A side effect of this is that the Shopify record is never deliberately polluted with keys that we don't expect to make it into the final CSV (except for the Magento product IDs, which are required for the association of simple products to configurable products.) This makes the record data easier to read as it gets incrementally built up by pipeline stages.

Rewrite the VariantImage pipeline stage to not mutate anything.

Checklist:

  • Testing & QA: Passes tests and linting.
  • 🎩: I have manually tested these changes.

Related issues:

Closes:

Copy link
Contributor

@surajreddy surajreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach so far. 👍

@jordanverasamy jordanverasamy force-pushed the improvement/remove-mutation-reliance branch from 06ff53a to 35edc8f Compare November 15, 2018 15:23
@jordanverasamy jordanverasamy changed the title [WIP] reduce reliance on mutation in conversion process [WIP/RFC] reduce reliance on mutation in conversion process Nov 15, 2018
@jordanverasamy jordanverasamy changed the title [WIP/RFC] reduce reliance on mutation in conversion process reduce reliance on mutation in conversion process Dec 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants