Add an API call that acts as a message that a DTS transfer has finished. #217

@briehl

Summary

The staging service needs an API endpoint that the DTS service can invoke when it has finished transferring files to a temporary area, so the staging service can complete the transfer by copying those files to the user's datastore. This proposes calling it complete_transfer.

Background

See also #200
In the current design, the DTS has permission to copy files to an intermediate storage area in Globus. It does not, and should not, have permission to write directly to users' directories. To solve this, the staging service needs to be alerted when a file copy operation is completed, so it can finish the file copy process.

Term definitions

A couple of terms get used throughout, so I want to make sure they're properly defined:

  • DTS - Data Transfer Service - the external(ish) service that moves data around from outside KBase to inside.
  • DTS user - the service account that the Data Transfer Service uses
  • DTS directory - the intermediate Globus directory that the DTS drops data into

Proposal

The proposal here is a new endpoint tentatively called complete_transfer.
This should be an authenticated call, allowed only for a caller with the DTS_TRANSFER auth credential [not sure whether something like that exists yet].
This should be a POST request with the following body variables:

  • target_user: the user to send the files to
  • subdirectory: the path to the subdirectory in the DTS staging area

The command should result in the subdirectory and all of its contents getting copied to either the user's root directory or a dts_transfers subdirectory (TBD). It should check that the files were copied correctly via checksum.
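As a rough illustration of the copy-then-verify step, here is a minimal sketch. The function names (file_checksum, copy_and_verify) and the choice of SHA-256 are assumptions for illustration only; the proposal only says "checksum" and does not pick an algorithm or an implementation.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy source_dir to target_dir (which must not exist yet).

    Returns the relative paths whose copies are missing or whose
    checksums differ from the source; an empty list means success.
    """
    shutil.copytree(source_dir, target_dir)
    mismatched = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copied_file = target_dir / relative
        if (not copied_file.is_file()
                or file_checksum(source_file) != file_checksum(copied_file)):
            mismatched.append(relative.as_posix())
    return mismatched
```

A non-empty return value would be the point where the endpoint reports a failed copy and, if cleanup is wanted, removes target_dir.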

When complete, it should return:

  • A 201 response if all is copied well, with body {"path": "/user/new_directory"}
  • A 400 error if either variable is missing, or if the user doesn't exist.
  • A 401 if the caller isn't authenticated
  • A 403 if the caller doesn't have permission (either auth role or Globus permissions)
  • A 404 if the source directory doesn't exist
  • A 500 if the copy failed for any reason. Any copied files and new directories should be removed as well (should they?).
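The pre-copy checks in the list above could be sketched as a single function. The name precheck_status, its boolean parameters, and the order of the checks (401, then 403, then 400, then 404) are all assumptions; the proposal does not say which error wins when several apply. The 500 case is left out because it can only arise during the copy itself.

```python
from http import HTTPStatus
from pathlib import Path

REQUIRED_FIELDS = ("target_user", "subdirectory")


def precheck_status(body: dict, authenticated: bool, authorized: bool,
                    user_exists: bool, source_dir: Path) -> HTTPStatus:
    """Return an error status, or CREATED if the copy may proceed.

    CREATED here means "no pre-copy check failed"; it is the status
    to send only if the copy then succeeds.
    """
    if not authenticated:
        return HTTPStatus.UNAUTHORIZED   # 401: caller isn't authenticated
    if not authorized:
        return HTTPStatus.FORBIDDEN      # 403: missing auth role or Globus permission
    if any(not body.get(field) for field in REQUIRED_FIELDS):
        return HTTPStatus.BAD_REQUEST    # 400: a body variable is missing
    if not user_exists:
        return HTTPStatus.BAD_REQUEST    # 400: target user doesn't exist
    if not source_dir.is_dir():
        return HTTPStatus.NOT_FOUND      # 404: source directory doesn't exist
    return HTTPStatus.CREATED            # 201 once the copy succeeds
```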

An edge case here is what to do if the target directory already exists. It's unlikely, since DTS transfers show up in a UUID-named directory, but if it happens, the new directory name should get -1 appended, and the call should return the new directory path as above.
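A minimal sketch of that naming rule follows. The helper name unique_target_dir is hypothetical, and continuing to -2, -3, and so on when the -1 name is also taken is an assumption; the proposal only mentions -1.

```python
from pathlib import Path


def unique_target_dir(parent: Path, name: str) -> Path:
    """Return parent/name, or parent/name-N for the first N >= 1 that is free."""
    candidate = parent / name
    suffix = 0
    while candidate.exists():
        suffix += 1
        candidate = parent / f"{name}-{suffix}"
    return candidate
```

The check and the later directory creation are not atomic here, so two simultaneous calls for the same name could still collide; creating the directory and retrying on failure would close that gap.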

Options to be decided

  1. Source path. Either the calling service specifies the complete Globus path to the file endpoint (which might not be /data/bulk/... but some other Globus share), or the staging service takes the base path from a new config option. This would make the difference between a call like:
{ "target_user": "kbase_user", "subdirectory": "/data/dts/some_subdir" }
vs
{ "target_user": "kbase_user", "subdirectory": "some_subdir" }
  2. Target directory. Should this be under the user's root, or a specified dts subdirectory? If in the user's root, then they'll get a UUID-named directory appearing, which is probably fine as it'll be expected, but it might get lost if they have a bunch of other files present. If in a subdirectory, they'll know where to look, at least.
  3. Copying vs. moving. IMO copies should be done and verified via checksum in case anything goes wonky at the file system level. Unless there's a direct Globus call to do that, we'll have to make some Python shutil calls or similar. After copying is done and verified, we can either delete the source files in the service call, or wait for external cleanup to happen.
  4. File ownership. This should get updated as well, once copies are done. [not sure if necessary, the file upload endpoint doesn't do this]
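If the config-option variant of the source path is chosen, the service would need to resolve the relative subdirectory against the configured DTS directory and refuse anything that lands outside it. A hedged sketch, where resolve_dts_subdirectory and the dts_root parameter are hypothetical names standing in for whatever the config option ends up being called:

```python
from pathlib import Path


def resolve_dts_subdirectory(dts_root: Path, subdirectory: str) -> Path:
    """Resolve subdirectory under dts_root, rejecting paths that escape it.

    Absolute paths, ".." traversal, and an empty subdirectory all
    raise ValueError, since none of them name a directory inside the root.
    """
    root = dts_root.resolve()
    candidate = (root / subdirectory).resolve()
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"subdirectory is not inside the DTS directory: {subdirectory!r}")
    return candidate
```

With this approach a request body only carries "some_subdir", and a ValueError would map naturally onto the 400 response.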
