Summary
The staging service needs an API endpoint that the DTS service can invoke when it has finished transferring files to a temporary area, so the staging service can complete the transfer by copying those files to the user's datastore. This proposes calling it complete_transfer.
Background
See also #200
Under the current design, the DTS has permission to copy files to an intermediate storage area in Globus. However, it does not have permission to write directly to users' directories, and it shouldn't. To solve this, the staging service needs to be notified when a DTS copy operation completes, so it can finish the file copy process.
Term definitions
A couple of terms are used throughout this issue, so here are their definitions:
- DTS: Data Transfer Service, the external(ish) service that moves data around from outside KBase to inside.
- DTS user: the service account that the Data Transfer Service uses.
- DTS directory: the intermediate Globus directory that the DTS drops data into.
Proposal
The proposal here is a new endpoint tentatively called complete_transfer.
This should be an authenticated call, allowed only for a user with the DTS_TRANSFER auth credential [not sure if something like that exists or not yet].
This should be a POST request with the following body fields:
- target_user: the user to send the files to
- subdirectory: the path to the subdirectory in the DTS staging area
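A minimal sketch of validating this body before any file work starts, assuming the handler already has the decoded JSON as a dict. The names CompleteTransferRequest, BadRequest, and parse_complete_transfer_body are hypothetical. Checking that target_user actually exists would be a separate lookup (not shown).

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class CompleteTransferRequest:
    """Body of a POST to complete_transfer (hypothetical type)."""
    target_user: str
    subdirectory: str


class BadRequest(ValueError):
    """Malformed body; a handler would map this to a 400."""


def parse_complete_transfer_body(body: Mapping[str, Any]) -> CompleteTransferRequest:
    # Collect every missing, non-string, or blank field so the 400 can name them all.
    missing = [
        name
        for name in ("target_user", "subdirectory")
        if not isinstance(body.get(name), str) or not body.get(name).strip()
    ]
    if missing:
        raise BadRequest(f"missing or empty field(s): {', '.join(missing)}")
    return CompleteTransferRequest(
        target_user=body["target_user"].strip(),
        subdirectory=body["subdirectory"].strip(),
    )
```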
The command should result in the subdirectory and all of its contents getting copied to either the user's root directory or a dts_transfers subdirectory (TBD). It should check that the files were copied correctly via checksum.
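A sketch of the copy-then-verify step using only the standard library: shutil.copytree for the copy, then streamed SHA-256 digests compared file by file. SHA-256 is an assumption (no algorithm is specified here), and copy_and_verify is a hypothetical helper that expects the destination not to exist yet.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large transfers aren't loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> List[Path]:
    """Copy source to destination, then compare checksums of every regular file.

    Returns the verified files relative to destination, sorted.
    Raises RuntimeError on the first checksum mismatch.
    """
    # copytree raises FileExistsError if destination already exists.
    shutil.copytree(source, destination)
    verified = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        if sha256_of(src_file) != sha256_of(destination / rel):
            raise RuntimeError(f"checksum mismatch for {rel}")
        verified.append(rel)
    return verified
```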
When complete, it should return:
- A 201 response if everything is copied successfully, with body {"path": "/user/new_directory"}
- A 400 error if either variable is missing, or if the user doesn't exist.
- A 401 if the caller isn't authenticated.
- A 403 if the caller doesn't have permission (either the auth role or Globus permissions).
- A 404 if the source directory doesn't exist.
- A 500 if the copy failed for any reason. Any copied files and new directories should be removed as well (should they?).
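One way to keep these responses consistent is to give each failure its own exception type and map it to a status code in a single place. This is a sketch only: the exception classes and to_response are hypothetical, and a real handler would turn the returned (status, body) tuple into its web framework's response object.

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple


class TransferError(Exception):
    status = HTTPStatus.INTERNAL_SERVER_ERROR  # 500: copy failed for any reason


class MissingField(TransferError):
    status = HTTPStatus.BAD_REQUEST  # 400


class UnknownUser(TransferError):
    status = HTTPStatus.BAD_REQUEST  # 400


class NotAuthenticated(TransferError):
    status = HTTPStatus.UNAUTHORIZED  # 401


class Forbidden(TransferError):
    status = HTTPStatus.FORBIDDEN  # 403: auth role or Globus permission missing


class SourceNotFound(TransferError):
    status = HTTPStatus.NOT_FOUND  # 404


def to_response(
    result_path: Optional[str] = None, error: Optional[BaseException] = None
) -> Tuple[int, Dict[str, Optional[str]]]:
    """Map the outcome of a transfer to (status code, JSON body)."""
    if error is None:
        return HTTPStatus.CREATED.value, {"path": result_path}
    # Unknown exceptions are treated as a failed copy (500).
    status = error.status if isinstance(error, TransferError) else HTTPStatus.INTERNAL_SERVER_ERROR
    return status.value, {"error": str(error)}
```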
An edge case is what to do if the target directory already exists. This is unlikely, since DTS transfers arrive in a UUID-named directory, but if it happens, the new directory should get -1 appended to its name, and the response should return that new directory as above.
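A sketch of the suffixing rule, assuming the target is a local filesystem path. Only -1 is specified above; continuing to -2, -3, and so on when that name is also taken is an assumption. unique_target is a hypothetical helper.

```python
from pathlib import Path


def unique_target(parent: Path, name: str) -> Path:
    """Return parent/name, or parent/name-1, parent/name-2, ... if already taken."""
    candidate = parent / name
    suffix = 0
    while candidate.exists():
        suffix += 1
        candidate = parent / f"{name}-{suffix}"
    return candidate
```

There is a window between this existence check and the copy. Because shutil.copytree refuses to write into an existing directory by default (it raises FileExistsError), a handler could catch that error and pick a new name.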
Options to be decided
- Source path. Either have the calling service specify the complete Globus path to the file endpoint, which might not be /data/bulk/... but some other Globus share, or specify the DTS directory in a new config option. This would make the difference between a call like:
{ "target_user": "kbase_user", "subdirectory": "/data/dts/some_subdir" }
vs
{ "target_user": "kbase_user", "subdirectory": "some_subdir" }
- Target directory. Should this be under the user's root, or a specified dts subdirectory? If in the user's root, then they'll get a UUID-named directory appearing, which is probably fine as it'll be expected, but might get lost if they have a bunch of other files present. If in a subdirectory, they'll know where to look, at least.
- Copying vs. moving. IMO copies should be made and verified via checksum in case anything on the file system goes wonky. Unless there's a direct Globus call to do that, we'll have to make some Python shutil calls or similar. After copying is done and verified, we can either delete the source files in the service call or wait for external cleanup to happen.
- File ownership. Ownership should be updated as well once the copies are done. [not sure if necessary, the file upload endpoint doesn't do this]
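For the config-option variant of the source-path question, and for the target-directory question, path resolution could look like this sketch. dts_root (e.g. /data/dts) and dts_subdir are hypothetical config values, and how user_root is located is left out. Resolving first and then checking containment also rejects a subdirectory such as ../other_user that would escape the DTS directory.

```python
from pathlib import Path
from typing import Optional


def resolve_dts_source(dts_root: Path, subdirectory: str) -> Path:
    """Resolve the relative ("some_subdir") form of the request under the DTS root."""
    if Path(subdirectory).is_absolute():
        raise ValueError("subdirectory must be relative to the DTS directory")
    root = dts_root.resolve()
    candidate = (root / subdirectory).resolve()
    # Rejects ".", "..", "a/../..", and existing symlinks that point outside the root.
    if root not in candidate.parents:
        raise ValueError(f"subdirectory escapes the DTS directory: {subdirectory!r}")
    return candidate


def target_parent(user_root: Path, dts_subdir: Optional[str] = "dts_transfers") -> Path:
    """Return (creating if needed) the directory a finished transfer lands in.

    dts_subdir=None puts the UUID-named directory straight into the user's root.
    """
    parent = user_root if dts_subdir is None else user_root / dts_subdir
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```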
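A sketch combining the copy-vs-move and ownership options: copy, verify by SHA-256, optionally chown, and delete the source only after everything checks out. On failure the partially created destination is removed, which is one possible answer to the open cleanup question for the 500 case. finish_transfer and its owner (uid, gid) parameter are hypothetical, and os.chown to another user generally requires elevated privileges.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Optional, Tuple


def _sha256(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def finish_transfer(
    source: Path,
    destination: Path,
    delete_source: bool = False,
    owner: Optional[Tuple[int, int]] = None,
) -> None:
    """Copy source to destination, verify it, then optionally chown and delete the source."""
    # Refuse up front so the rollback below doesn't delete a directory that was
    # already there (a concurrent writer could still race this check).
    if destination.exists():
        raise FileExistsError(destination)
    try:
        shutil.copytree(source, destination)
        # Verify every regular file in the source against its copy.
        for src_file in source.rglob("*"):
            if src_file.is_file():
                dst_file = destination / src_file.relative_to(source)
                if _sha256(src_file) != _sha256(dst_file):
                    raise RuntimeError(f"checksum mismatch: {src_file}")
        # Hand the copies to the target user (hypothetical uid/gid pair).
        if owner is not None:
            uid, gid = owner
            for path in [destination, *destination.rglob("*")]:
                os.chown(path, uid, gid)
    except Exception:
        # Roll back the partial copy, then re-raise so the caller can return a 500.
        shutil.rmtree(destination, ignore_errors=True)
        raise
    # Only touch the source once the copy is fully verified.
    if delete_source:
        shutil.rmtree(source)
```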