Add the ability to parse a DTS transfer manifest into bulk import specifications #200

@jeff-cohere

Now that the Data Transfer Service is ostensibly working, it would be convenient for the staging service to be able to parse the contents of a manifest.json file (deposited in a transfer-specific directory within a KBase user's staging area) into one or more xSV files that can then be imported using the existing bulk import functionality. I'm using this issue to track my investigation of how we might add such a feature.

Assessment

The KBase Narrative itself defines a SetupImportCells function that interacts with the staging service to parse files for import, including

  • single files
  • a list of bulk-importable files
  • import specification spreadsheets

Downstream, this function calls a bulkSpecification method, which is mapped to the staging service's bulk_specification endpoint.

Upstream, the function is called by the KBase staging area viewer UI code.

Proposed approach

We could modify the SetupImportCells function to add support for the intermediate parsing of a DTS transfer manifest which produces a set of import specification spreadsheets. Currently, the function produces its "import app cells" in 3 stages:

  1. Filter the file infos into bins based on their given type - single upload, bulk upload, or xsv (that's CSV, TSV, Excel)
  2. Get all the extended file info from the xsv files to push into the bulk uploader section
  3. Make the bulk import and other singleton import cells.

We could modify this procedure to parse a DTS transfer manifest if one is found among the selected set of files (passed to the function in fileInfo):

  1. Filter the file infos into bins based on their given type - single upload, bulk upload, DTS manifest, or xsv (that's CSV, TSV, Excel)
  2. Parse any selected DTS manifest file, writing out xsv files and adding them to the list of xsv files to parse for individual files.
  3. Get all the extended file info from the xsv files to push into the bulk uploader section
  4. Make the bulk import and other singleton import cells.
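As a rough illustration of the modified stage 1, here is a minimal Python sketch of the binning logic with an extra bin for DTS manifests. SetupImportCells itself lives in the Narrative's JavaScript code, so this is not its implementation. The shape of each file info (a dict with "name" and "type" keys) and the type strings "dts_manifest" and "import_specification" are assumptions made for the example.

```python
def bin_file_infos(file_infos, bulk_types,
                   xsv_type="import_specification",   # hypothetical type name
                   manifest_type="dts_manifest"):     # hypothetical type name
    """Sort file infos into the four bins described in stage 1.

    Each file info is assumed to be a dict with "name" and "type" keys;
    the real fileInfo objects may be shaped differently.
    """
    bins = {"single": [], "bulk": [], "dts_manifest": [], "xsv": []}
    for info in file_infos:
        file_type = info["type"]
        if file_type == manifest_type:
            bins["dts_manifest"].append(info)
        elif file_type == xsv_type:
            bins["xsv"].append(info)
        elif file_type in bulk_types:
            bins["bulk"].append(info)
        else:
            # Anything that is not bulk-importable gets its own import cell.
            bins["single"].append(info)
    return bins


if __name__ == "__main__":
    selected = [
        {"name": "dts-123/manifest.json", "type": "dts_manifest"},
        {"name": "reads.fastq", "type": "fastq_reads"},
        {"name": "spec.csv", "type": "import_specification"},
        {"name": "notes.txt", "type": "text"},
    ]
    for bin_name, infos in bin_file_infos(selected, {"fastq_reads"}).items():
        print(bin_name, [info["name"] for info in infos])
```

Any xsv files produced from the manifest in stage 2 would then be appended to the "xsv" bin before stage 3 runs.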

We would have to add another endpoint to the staging service to support stage 2, but this seems easier than modifying existing endpoints and changing their assumptions. I'll continue to update this issue as I find out more.
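To make the new stage 2 more concrete, here is a minimal Python sketch of what the core of such an endpoint might do: group the files listed in a manifest by data type and emit one CSV per type. The manifest layout used here (a top-level "resources" list whose entries carry "path" and "data_type" fields) is an assumption for illustration, not the confirmed DTS manifest schema. The CSV layout is also simplified and does not reproduce the staging service's actual import specification format.

```python
import csv
import io
import json
from collections import defaultdict


def manifest_to_csv_specs(manifest_text):
    """Return a dict mapping each data type to CSV text listing its files.

    Assumes a hypothetical manifest shape:
    {"resources": [{"path": "...", "data_type": "..."}, ...]}
    """
    manifest = json.loads(manifest_text)

    # Group file paths by data type; each group becomes one spreadsheet.
    paths_by_type = defaultdict(list)
    for resource in manifest.get("resources", []):
        paths_by_type[resource["data_type"]].append(resource["path"])

    specs = {}
    for data_type, paths in paths_by_type.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["staging_file_path"])  # hypothetical column name
        for path in paths:
            writer.writerow([path])
        specs[data_type] = buffer.getvalue()
    return specs


if __name__ == "__main__":
    example = json.dumps({
        "resources": [
            {"path": "dts-123/a.fa", "data_type": "assembly"},
            {"path": "dts-123/r1.fastq", "data_type": "fastq_reads"},
            {"path": "dts-123/r2.fastq", "data_type": "fastq_reads"},
        ]
    })
    for data_type, text in manifest_to_csv_specs(example).items():
        print(f"--- {data_type} ---")
        print(text, end="")
```

Note that the number of spreadsheets equals the number of distinct data types in the manifest, so a manifest with mixed content always yields several files.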

Risks / Constraints

Because this approach is incremental and depends on the existing import specification machinery in the staging service, it carries with it all of the related limitations and drawbacks:

  1. Speed: the bulk import spec can be very slow for >500 files, making the Narrative unresponsive in the process.
  2. UI clunkiness: any manifest containing files of differing "data types" necessarily generates multiple import specification spreadsheets. This might be confusing for users if many such spreadsheets are generated from a single manifest.

I'm more concerned about item 1 (can we improve the performance of the staging service without a ton of work?). But I assume that we'll eventually want something a bit more automated or less finicky for users to work with.
