@lamanda0227 Currently our data is distributed through our S3 bucket, which works for the week of the course but is not a good idea in the long run if people still need to access the data. ANNOVAR, on the other hand, we currently distribute on the fly via the S3 bucket, but we should not distribute it at all because of license issues.
Also, since the data is >20GB, we should not download all of it at start-up for people running the image on their local machine. We should first give people an interface that is easy to boot, and then let them download the data at their own discretion. That way they will see how challenging it is to work with a data set this large and may just rely on cloud solutions instead.
I made this change to achieve both goals:
cumc/handson-tutorials@c560c55
The idea is simple: we now put all our data on Synapse and provide a command to download it. However, the command will only be triggered if we can successfully deploy the ANNOVAR software, which is an indication that we are running on mmcloud. Otherwise, it will throw an error message and prompt users to download the data on their own.
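To make the gating logic concrete, here is a minimal sketch of how it could look. It is not the code from the commit above: the function names, the `ANNOVAR_DIR` variable, and the messages are all placeholders I made up for illustration, and the real deployment and download steps are replaced by stand-ins.

```shell
# Hypothetical stand-in for the real ANNOVAR deployment step.
# Here it "succeeds" only when ANNOVAR_DIR points at an existing directory.
deploy_annovar() {
    [ -n "${ANNOVAR_DIR:-}" ] && [ -d "${ANNOVAR_DIR}" ]
}

# Hypothetical stand-in for the real download from Synapse.
download_data() {
    echo "downloading course data"
}

# Download the data only if ANNOVAR deployed successfully; otherwise
# report an error and point the user at the manual get-data command.
setup_data() {
    if deploy_annovar; then
        download_data
    else
        echo "ANNOVAR could not be deployed; run get-data to download the data yourself." >&2
        return 1
    fi
}
```

Calling `setup_data` at start-up would then download the data only where ANNOVAR is available, and return a non-zero status with a hint everywhere else.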
For users to download separately, they will have to start the server and then type the `get-data` command to download everything.
@lamanda0227 I think this is going to work. Could you upload all the data to Synapse and change this line to use the Synapse project ID:
https://github.com/cumc/handson-tutorials/blob/main/setup/course_entrypoint.sh#L32
The data should have exactly the same structure on Synapse. I suggest you test it locally first: clone the code repository, then check whether this command adds the data on top of the code repo structure. I think it is going to work.
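One way to run that local check is to list the files in the clone before and after the download and print only what was added. The helper below is just a sketch: `list_files` and `files_added_by` are names I made up, and the download command is passed in by the caller, so nothing here assumes how the data is actually fetched.

```shell
# Print every file under directory $1 as a sorted list of relative paths.
list_files() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Usage: files_added_by DIR COMMAND [ARGS...]
# Runs COMMAND and prints the files it added under DIR.
files_added_by() {
    dir=$1
    shift
    before=$(mktemp)
    after=$(mktemp)
    list_files "$dir" > "$before"
    "$@" > /dev/null              # run the caller-supplied download command
    list_files "$dir" > "$after"
    # comm -13 keeps only the lines that appear in the "after" listing alone
    LC_ALL=C comm -13 "$before" "$after"
    rm -f "$before" "$after"
}
```

For example, `files_added_by path/to/clone <download command>` should print only data files, and they should land in the folders the tutorials expect.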
Once this is done, let @yiweizh-memverge know so we can try rebuilding the image to use this new way of getting the data. That would make our image more or less independent of S3.