reduce reliance on mutation in conversion process by jordanverasamy · Pull Request #105 · Shopify/shopify_transporter

jordanverasamy · 2018-10-25T22:13:37Z

Problem I'm trying to solve

Anyone anywhere can mutate the Shopify record

Here's the way the convert pipeline works on master:

When the convert process goes through each pipeline stage in sequence and calls stage.convert(row, record), that pipeline stage is expected to mutate record.

This is accomplished through multiple different ways. Most #convert functions use record.merge! to mutate that record to add the new data that it parsed from the hash.

However, additionally, it turns out that every AttributesAccumulator mutates the variable that is used to initialize it. A lot of our code was relying on this fact, because the accumulator was adding data to the record that was not ever merge!'d in.

In addition to that, some pipeline stages were mutating the record in other ways. For example, the VariantImage pipeline stage was mutating both the parent object and the corresponding variant.

Magento data always gets inserted into the Shopify record, then erased later

On a separate note, I noticed that sometimes our record was polluted with Magento data. This made it very confusing to determine what was going on, since the Shopify record would be a giant hash with ~10 keys of Shopify-formatted data and 50+ keys of Magento-formatted data.

It turns out that this was happening because the VariantAttributes pipeline stage was inserting the entire Magento hash for that simple product into the variants list of the Shopify record we were accumulating. This wasn't affecting the output, since we have a key whitelist in our CSV builder, but it made debugging very annoying.

Proposed solution

I'm proposing a change to the way our pipeline stages are expected to work.

Restructure the way pipeline stages are wired together

The #convert function of a pipeline stage is no longer expected to mutate anything. Instead, the return value of one pipeline stage is passed along as the input to the next pipeline stage.

This is accomplished by changing our call to @pipeline_stages.each into a call to @pipeline_stages.reduce.

This means that if you want to know where some data gets added to the record, instead of having to look everywhere for what line of code could have mutated that record, one simply has to look at the data that is being passed between pipeline stages, since nothing else could be modifying that record.

The goal of this change is to ensure that you can ALWAYS simply look at the input and return records of any #convert function and know what it did.

Restructure the RecordBuilder to match the new functional design

One slightly weird implementation detail of this is that the RecordBuilder, as written, actually expects that the record it yields will be mutated. I've added a bit of code to the RecordBuilder so that instead of expecting the record to be mutated with all the new data, it instead expects the block it is passed to return a new record with all the new data, and adjusts its instances table accordingly.

Don't let accumulators ever mutate external objects

The AttributesAccumulator class now calls .dup on the object passed to the initializer. This means that as the accumulator slurps up data from hashes and accumulates them in the @output instance variable, the internal state of the AttributesAccumulator is being built up, but the variable that was initially passed in is no longer being modified.

Rewrite pipeline stages to match the new structure

Remove the VariantAttributes pipeline stage. The purpose of this stage was simply to make sure that the right variants got populated to the right parent object, but this can actually be done very easily from the TopLevelVariantAttributes pipeline stage instead. This means we can remove this pipeline stage entirely.

A side effect of this is that the Shopify record is never deliberately polluted with keys that we don't expect to make it into the final CSV (except for the Magento product IDs, which are required for the association of simple products to configurable products.) This makes the record data easier to read as it gets incrementally built up by pipeline stages.

Rewrite the VariantImage pipeline stage to not mutate anything.

Checklist:

Testing & QA: Passes tests and linting.
🎩: I have manually tested these changes.

Related issues:

Closes:

surajreddy

I like the approach so far. 👍

…ite yet though

surajreddy reviewed Oct 26, 2018

View reviewed changes

jordanverasamy added 14 commits November 14, 2018 15:14

initial poking and prodding

289d960

some adjustments to the way we use the record builder, not working qu…

a0efa0a

…ite yet though

it works! (no full tophat yet)

53f9b2d

some updates

14b453d

second isnt a thing apparently

e8f60f4

some progresss....

3cb1967

:think:

ca27e12

adjust the way record builder works

52fc117

some progress?

fddc1e2

some cleanup

78f903e

almost works!! :D

78f7637

WORKS!

a7836b6

fix a test

b1eabd6

more progress. pretty FUCKING legit

35edc8f

jordanverasamy force-pushed the improvement/remove-mutation-reliance branch from 06ff53a to 35edc8f Compare November 15, 2018 15:23

jordanverasamy added 3 commits November 15, 2018 14:43

some progress

4ad6cb7

rip out variantattributes pipeline stage

7eda91d

seems to work perfectly for orders too

60e9a85

jordanverasamy changed the title ~~[WIP] reduce reliance on mutation in conversion process~~ [WIP/RFC] reduce reliance on mutation in conversion process Nov 15, 2018

jordanverasamy changed the title ~~[WIP/RFC] reduce reliance on mutation in conversion process~~ reduce reliance on mutation in conversion process Dec 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reduce reliance on mutation in conversion process#105

reduce reliance on mutation in conversion process#105
jordanverasamy wants to merge 17 commits intomasterfrom
improvement/remove-mutation-reliance

jordanverasamy commented Oct 25, 2018 •

edited

Loading

Uh oh!

surajreddy left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jordanverasamy commented Oct 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem I'm trying to solve

Anyone anywhere can mutate the Shopify record

Magento data always gets inserted into the Shopify record, then erased later

Proposed solution

Restructure the way pipeline stages are wired together

Restructure the RecordBuilder to match the new functional design

Don't let accumulators ever mutate external objects

Rewrite pipeline stages to match the new structure

Checklist:

Related issues:

Uh oh!

surajreddy left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jordanverasamy commented Oct 25, 2018 •

edited

Loading