
Testing with huge ndjson files may cause tests to crash and sessions to hang #2

@arscan

Description


The test suite currently loads all ndjson files into memory (in scratch); subsequent tests then read the resources from memory and run various checks on them.

This makes the test code very readable and concise, but for systems that want to test very large ndjson files, it may cause the tests to use up all of the memory allocated to a worker. It is unclear whether a system would ever need to do this during testing, but it is possible.
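Roughly, the current pattern looks like this (a hypothetical sketch, not actual Inferno code; the helper name is illustrative):

```ruby
require 'json'

# Sketch of the load-everything approach: every line of every ndjson
# file is parsed and retained in memory at once, so memory use grows
# with the total size of the input files.
def load_all_resources(paths)
  paths.flat_map do |path|
    File.read(path)                       # entire file read into memory
        .each_line
        .reject { |line| line.strip.empty? }
        .map { |line| JSON.parse(line) }  # every resource kept around
  end
end
```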

We would need to do some testing to see how the production system would handle running out of memory. It is possible that the Inferno system as a whole would handle it gracefully, killing and restarting the worker process that ran out of memory without causing any issues for other testers. Even in this best case, the tester running the tests against the large data set would have their session get 'stuck', and it might be unrecoverable. So that's not great.

It is possible that a worse situation could occur, where the worker process isn't properly restarted, preventing anyone from running any tests on the Inferno host. I think this is unlikely, but we would have to test to find out for sure.

A simple solution to protect the system might be to put in some guards that check the size of the ndjson file and, if it is too big (e.g. > 10 MB), have the test skip. There are much more involved solutions as well (e.g. streaming the data and only keeping the latest resource in memory, which we do in our bulk data test kit and our SMART Scheduling Links test kit), but those would be much more complex to implement in Inferno right now.
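A minimal sketch of both ideas, assuming plain Ruby (the 10 MB threshold and helper names are illustrative, not part of Inferno's API):

```ruby
require 'json'

MAX_NDJSON_BYTES = 10 * 1024 * 1024 # illustrative 10 MB cap

# Guard idea: check the file size up front so a test can skip
# rather than attempt to load a huge file into memory.
def ndjson_too_large?(path, limit: MAX_NDJSON_BYTES)
  File.size(path) > limit
end

# Streaming idea: parse one line at a time so only the current
# resource is held in memory, similar in spirit to what the bulk
# data test kit does.
def each_ndjson_resource(path)
  return enum_for(:each_ndjson_resource, path) unless block_given?

  File.foreach(path) do |line|
    next if line.strip.empty?
    yield JSON.parse(line)
  end
end
```

In an Inferno test, the guard check would presumably feed a skip with an explanatory message, while the streaming enumerator would replace the read-from-scratch pattern for the oversized-file case.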
