How to source our test data? #66
thorwhalen started this conversation in Ideas
Context
You're writing some code to test or demo something.
You need some data to do this.
Where do you put it?
Problem
If the data is small (e.g. a small array or dict), we often see the data being "made" within the code itself, like this:
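Something along these lines, say (the names and data here are made up, just to illustrate):

```python
from collections import Counter

def test_word_counts():
    # small, hard-coded test data, defined right inside the test
    docs = ["the cat sat", "the cat and the dog"]
    expected = {"the": 3, "cat": 2, "sat": 1, "and": 1, "dog": 1}
    assert Counter(word for doc in docs for word in doc.split()) == expected
```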
Now, there are pros and cons to this, but we won't go there for now.
Let's focus on the biggest problem first: what happens when the data is a bit bigger, or more complex.
Then what do folks do?
They often do whatever is easiest, without thinking of the consequences beyond their immediate need.
That's not necessarily bad, if we provide the tools so that the path of least resistance just so happens to be the one with the more scalable design. Let's look at the various ways people take care of this problem, and discuss the pros and cons.
Solutions
Local files
They put things in files, and tell the test to look there for its data dependencies.
What files?
Well, too often, it's a local file, which means that the test, as is, can only be run by the original author.
Shared files
They can make that a bit better by sharing that file (e.g. via dropbox), and using the shared link in the code.
With `graze`, for example, they can now have code that will automatically go fetch the data if it's missing locally, and then use the local copy.
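For instance (if I have `graze`'s basic interface right; the URL here is just a placeholder for your shared link):

```python
from graze import graze  # pip install graze

# a shared link to the data file (dropbox, github raw url, etc.) -- placeholder here
url = 'https://www.dropbox.com/s/abc123/test_data.csv?dl=1'

# first call downloads and caches the file locally; later calls just read the local copy
data_bytes = graze(url)
```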
Project files
Another common way to do this is to put the files in the same repository as the code, often under a `data` folder in the `tests` folder. (By the way, the proper robust way to then refer to these files is described here.)
This is a good solution, but it still has a big con as the required data gets bigger. That data is there for tests, yet when I download the repo, or do any kind of git syncing operation on it, I find myself spending more than 99% of the computing resources (bandwidth, storage, sync computations, ...) on something that is only there for tests.
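(The gist of that robust pattern -- at least the common one -- is to resolve the paths relative to the test module itself, not the current working directory. A sketch, with placeholder file names:)

```python
from pathlib import Path

# resolve the data folder relative to this test file, so the tests work
# no matter what directory pytest (or the user) launches from
DATA_DIR = Path(__file__).parent / "data"

def test_parse_config():
    config_text = (DATA_DIR / "sample_config.json").read_text()
    assert config_text  # ...parse it and assert on the result
```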
So...
Separate data
Putting the files elsewhere could go a long way here. Put them in a remote storage system -- could be a github repo, some S3 bucket, some dropbox folder, etc. -- just separate. Then in your test code, use a function that will mediate the access to that data, and take care of aspects like "do I keep a local copy?" etc.
Having the test data separate also enables reuse of the data in other contexts than just one test.
Data accessor
Here, some kind of "lazy" approach would be warranted: one where the data is "acquired" only if, and when, it's needed.
More generally, if you source your data through a function that gives you the data (from wherever), it keeps the test-data consumers open-closed with respect to that aspect: you just change the function (how it gets the data), not the code that uses it.
(_This is what I advise especially when developing GUIs, which usually start with some hard-coded data, then move on to getting that data from other GUI components, or from APIs to external resources. If, from the start, the data dependencies go through functions or "callbacks", the design is SOLID from the start._)
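To make that concrete, here's a minimal sketch of such an accessor (the URL, paths, and names are made up): the tests only ever call `get_test_data()`, so where and how the data is actually stored can change without touching any of its consumers.

```python
from pathlib import Path
from urllib.request import urlopen

# placeholders: in real code these would point to wherever the team keeps its test data
TEST_DATA_URL = "https://example.com/my-project/test-data/sensor_readings.csv"
LOCAL_CACHE = Path.home() / ".my_project" / "test_data" / "sensor_readings.csv"

def get_test_data() -> bytes:
    """Return the test data, fetching it (once) only if and when it's actually needed."""
    if not LOCAL_CACHE.exists():
        LOCAL_CACHE.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(TEST_DATA_URL) as resp:
            LOCAL_CACHE.write_bytes(resp.read())
    return LOCAL_CACHE.read_bytes()
```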
Generate the data
If and when the test data can actually be generated by computation, then we can get unbounded amounts of data with a finite storage footprint (just the function's code). This is what `hum` (for vibration) and `forged` (for other types of data) are for.
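To sketch the idea (this is not `hum`'s or `forged`'s actual API, just an illustration): a small generator function can hand out as much synthetic "vibration" data as a test wants, parametrized and reproducible, with zero stored data.

```python
import math
import random

def synthetic_vibration(n_samples=1000, freq_hz=5.0, sample_rate=100.0, noise=0.1, seed=0):
    """Noisy sine wave standing in for a vibration signal: unbounded data, finite code."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate) + rng.gauss(0, noise)
        for i in range(n_samples)
    ]

# tests can ask for exactly the data they need, when they need it
waveform = synthetic_vibration(n_samples=200, freq_hz=10)
```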