A project for the Introduction to NodeJS course at edX
Additional requirements:
- Node.js
- npm
- MongoDB, usually installed globally
- m3-customer-data.json and m3-customer-address-data.json must be saved in the project directory.
How to install:
- clone the repository
- run npm install in the resulting directory.
How to run:
- Start MongoDB: mongod
- run node migrate-data.js X, where X is the number of records to process in each parallel query
- or use ./go.sh X
-
This assignment was a mess from the beginning. The instructions say "Luckily, your friends were able to restore the address information from a backup replica of a MongoDB instance", which seems to imply that the information is already in a MongoDB database. The walk-through indicates that is not the case.
-
The instructions say "You have millions of records so you need to create a script which can run queries to the database in parallel." This would indicate that the final script should be able to handle millions of records. One problem with that would be that you may not be able load an array or set of arrays into memory if it consists of millions of records. Ideally, you would read the records from the json file (or the database), and process them a portion at a time in some sort of parallel or sequential fashion. The walk-through, however, loads two arrays into memory.
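For illustration, a portion-at-a-time pass could look something like the sketch below. It assumes the source records already sit in a MongoDB collection (as the instruction text implies) and walks them with a driver cursor instead of loading everything into memory; the connection URL, database name (edx-course-db), and collection name (customers) are placeholders, not names taken from the course material.

```js
// A sketch of the "portion at a time" alternative: stream records from
// MongoDB with a cursor so memory use stays bounded.
const { MongoClient } = require('mongodb');

async function processInBatches(batchSize) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const cursor = client.db('edx-course-db').collection('customers').find({});

  let batch = [];
  while (await cursor.hasNext()) {
    batch.push(await cursor.next());
    if (batch.length === batchSize) {
      // Process this portion (e.g. merge in the address data and write it
      // back), then discard it before reading the next portion.
      batch = [];
    }
  }
  if (batch.length > 0) {
    // Process the final, partial batch the same way.
  }
  await client.close();
}
```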
-
The instructions "Read the number of objects to process in a single query from a CLI argument. For example, node migrate-data.js 1000 will run 10 queries in parallel out of 1000 objects while node migrate-data.js 50 will run 20 queries in parallel out of the same 1000 objects" is simply wrong.
1000 divided by 1000 is not 10. -
So the solution is simply to read the two JSON files into memory as arrays of objects, then cycle through them, merging each pair of records with Object.assign().
As you merge, use the modulus operator (%) to check whether a whole chunk has accumulated, and if so, push that chunk onto the array of tasks.
Once the array of tasks is built, you have walked the whole of the two customer data arrays and assembled a new one. Theoretically, that could have been done in parallel, but in the walk-through it is not. Now run the tasks in parallel, each task inserting one slice into the MongoDB database (see the sketch below).
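A minimal sketch of that approach follows. It is not the walk-through's exact code: the connection URL, database name (edx-course-db), and collection name (customers) are placeholders, and the parallel inserts are expressed with Promise.all and the official mongodb driver.

```js
// migrate-data.js: a minimal sketch of the merge-chunk-insert approach.
const fs = require('fs');
const { MongoClient } = require('mongodb');

const chunkSize = parseInt(process.argv[2], 10) || 1000;

const customers = JSON.parse(fs.readFileSync('./m3-customer-data.json', 'utf8'));
const addresses = JSON.parse(fs.readFileSync('./m3-customer-address-data.json', 'utf8'));

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const collection = client.db('edx-course-db').collection('customers');

  const tasks = [];
  let chunk = [];
  customers.forEach((customer, i) => {
    // Merge each customer record with its matching address record.
    chunk.push(Object.assign({}, customer, addresses[i]));
    // When a whole chunk has accumulated (or we hit the last record),
    // turn it into a task that inserts that slice into MongoDB.
    if ((i + 1) % chunkSize === 0 || i === customers.length - 1) {
      const slice = chunk;
      tasks.push(() => collection.insertMany(slice));
      chunk = [];
    }
  });

  // Run all insert tasks in parallel.
  await Promise.all(tasks.map((task) => task()));
  console.log(`Inserted ${customers.length} records in ${tasks.length} parallel queries.`);
  await client.close();
}

main().catch(console.error);
```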
- If you need to reset in order to experiment, run node reset.js, which drops the collection from the MongoDB database.
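A minimal reset.js could look like the sketch below, assuming the same placeholder database and collection names as the migration sketch above.

```js
// reset.js: drop the collection so the migration can be re-run from scratch.
const { MongoClient } = require('mongodb');

MongoClient.connect('mongodb://localhost:27017')
  .then(async (client) => {
    await client.db('edx-course-db').collection('customers').drop();
    console.log('Dropped the customers collection.');
    await client.close();
  })
  .catch(console.error);
```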