CBS Python (Docker)
Here you will find instructions about installing and using our software toolkit (aka CBS Python) that is contained in a pre-built and (hopefully!) easy-to-use Docker image.
Basically, this Docker image contains a functional Python environment for working with Cambridge Brain Sciences (CBS) data. It includes custom Python packages (and their required dependencies) for performing common preprocessing routines. Consider this Docker image an alternative to having to install and maintain a Python development environment; instead, just spin up a container and make use of all the Python-ic goodness stored inside! You can preprocess your CBS datasets to extract score features, re-organize the data, calculate norms, calculate domain scores, encrypt your data files, and other things. These functions are all accessed using Script Entrypoints - see below for descriptions.
Note, these tools do not include any statistics or analysis routines! You'll have to use your software of choice (R, SPSS, Python, etc.) to read and analyze the data saved by the commands you run here.
Why use Docker for this stuff? Well, it makes it easy to distribute these tools for others to use without having to worry about what kind of computer they're using, or what specific Python version, or package versions, are installed. Better yet, you don't even have to know how to use Python in order to use all these Python scripts! Also, it's easy to distribute updates for these tools without having to worry about dependencies, etc.; you can just tell Docker to pull the latest image when a new one is available. There are lots of reasons! Go forth and Google...
This toolkit was initially designed to pre-process CBS data in a Datalad pipeline to ensure computational reproducibility. See datalad-containers and the Datalad handbook. It's worth reading a bit about DataLad if you have never heard about it...
Alternatively, you just can just use the CBS Python image to manually spin up a Docker container and process your data. See the appropriate Instructions depending on your use case.
Either way, you will have to use a command-line terminal (e.g., on MacOS) to use these tools. That's right - no GUI for you! If you're not familiar with a terminal, maybe get started with a tutorial or two.
- Make sure that you have Docker installed. See instructions for MacOS, Windows, Linux, etc.
- In your terminal of choice, test your Docker installation:
# Running the following command should display a "Hello from Docker!" message:
docker run hello-world
Our image is not stored on the DockerHub registry, but rather in a GitHub package associated with this repository. To use this image with Docker on your machine you will need to jump through a few extra hoops.
- The image is stored in a private GH package (for now), and to gain read access you will need to be added to TheOwenLab's "collaborators" team. Message me on the Discussion Board or via email with your GitHub account name and your intended use case, and I will grant you access.
- Create a Personal Access Token for your GitHub account. [UPDATE: use a "classic" token, not a "fine-grained" one.] This acts like a secondary password for your account that grants restricted permissions. The only "scope" (permission) required for our purposes is read:packages.
- In your terminal, use your new token to log the Docker application into the GitHub container registry:
export GH_PAT=PASTE_YOUR_TOKEN_HERE
echo $GH_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
Note: You are probably going to need to use your token more than once, e.g., if you have to re-pull the Docker image for updates. Good practice would be to add the export GH_PAT=PASTE_YOUR_TOKEN_HERE to your ~/.bashrc, ~/.profile, or equivalent file. That way the environment variable GH_PAT will always be available when you start a new terminal (also, note that GitHub will only show you your token once when it is generated).
- Pull the image from the GH registry to your local machine:
docker pull ghcr.io/theowenlab/cbspython:latest
That's it! CBS Python is now installed on your machine, and you can use the docker run command (or datalad containers-run) to invoke various magical commands.
The toolkit provided in this Docker image is really just a collection of Python scripts (kinda) that process your input files (e.g., a raw CBS data export in .csv form) and save some output files. Any of the following entrypoints can be executed in your terminal with a docker run ... command. Details are provided below.
The cbs_parse_data script takes as input a raw CBS .csv data export, parses/extracts all kinds of score feature data, and saves it in a nice wideform format with one row per assessment.
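To give a feel for what "wideform, one row per assessment" means, here is a toy sketch in plain Python. The column names, test names, and values below are made up for illustration; the real CBS export schema and the script's actual parsing logic will differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-form export: one row per test per assessment.
# These column names are illustrative, not the actual CBS export schema.
raw = """assessment_id,test,score
A1,spatial_span,6
A1,digit_span,7
A2,spatial_span,5
"""

# Pivot to wideform: one row (dict) per assessment, one entry per test score.
wide = defaultdict(dict)
for row in csv.DictReader(io.StringIO(raw)):
    wide[row["assessment_id"]][row["test"]] = int(row["score"])

print(dict(wide))
# {'A1': {'spatial_span': 6, 'digit_span': 7}, 'A2': {'spatial_span': 5}}
```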
The cbs_score_calculator script parses the data like cbs_parse_data, but also:
- generates age/gender matched norms for each participant in your dataset,
- z-scores all test scores and their features,
- calculates "domain" scores based on the Varimax-rotated PCA loadings from Hampshire et al. 2012,
- generates age/gender matched norms for the domain scores for each of your participants,
- z-scores the domain scores.
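As a rough sketch of the z-scoring idea in the steps above: a participant's score is re-expressed in standard-deviation units of an age/gender-matched norm group. The toolkit's actual norming procedure, samples, and PCA loadings are not reproduced here; the numbers below are made up.

```python
from statistics import mean, stdev

def z_score(score, norm_group):
    # Standardize a score against a (hypothetical) matched norm sample:
    # how many standard deviations above/below the group mean it falls.
    return (score - mean(norm_group)) / stdev(norm_group)

matched_norms = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical matched sample
z = z_score(17.0, matched_norms)
print(round(z, 2))  # 0.95
```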
The cbs_encrypt script performs batch encryption of files using Fernet encryption, an implementation of symmetric (aka "secret key") authenticated cryptography. Basically, it super-securely encrypts your data files. More information to be added...
The cbs_decrypt script batch-decrypts files that have been encrypted by cbs_encrypt. More information to be added...
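For a sense of what a Fernet round trip looks like, here is a minimal sketch using the Python cryptography package's Fernet implementation. This is an illustration of the Fernet scheme itself, not the toolkit's actual encrypt/decrypt code, and it assumes the cryptography package is installed.

```python
from cryptography.fernet import Fernet

# Generate a random URL-safe base64-encoded secret key. Anyone holding this
# key can both encrypt and decrypt (symmetric crypto), so store it safely
# and never commit it to a repo.
key = Fernet.generate_key()
f = Fernet(key)

# Encrypt, then decrypt. The token is authenticated: tampering with it makes
# decrypt() raise an InvalidToken error rather than return garbage.
token = f.encrypt(b"participant data")
plain = f.decrypt(token)
```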
- To be implemented at a later time (need to test the GHCR implementation....)
Each of the scripts provided in this toolkit can be executed in your terminal by using a docker run ... command that looks like this:
docker run --rm -it -v $PWD:/tmp -w /tmp ghcr.io/theowenlab/cbspython:latest ENTRYPOINT_NAME ARG1 ARG2 ETC
Ok, the command above:
- creates and runs a new container ("docker run"),
- that will be removed after it is done running ("--rm"),
- in an interactive mode ("-it"),
- using the latest cbspython image ("ghcr.io/theowenlab/cbspython:latest"),
- mounts your current working directory to /tmp ("-v $PWD:/tmp"),
- sets the container's working directory to that folder ("-w /tmp"),
- and executes the remaining stuff (i.e., the script name with supplied arguments) within the container's environment.
For example, try looking at the "help" displayed by cbs_parse_data:
docker run --rm -it -v $PWD:/tmp -w /tmp ghcr.io/theowenlab/cbspython:latest cbs_parse_data --help
To run one of our scripts on your CBS data:
- Navigate to the folder containing your raw CBS data export (in .csv form), e.g., cd ~/Documents/myproject/data/
- Run the desired script in a container:
# Replace ENTRYPOINT_NAME with one of the above script names, and ARGUMENTS with the required arguments.
docker run --rm -it -v $PWD:/tmp -w /tmp ghcr.io/theowenlab/cbspython:latest ENTRYPOINT_NAME ARGUMENTS
The CBS Python Docker image is simply a python3.9-slim base image that installs three custom Python packages (and required dependencies). The various scripts are provided as console entrypoints by the individual packages:
- Private CBS repository, for now.
- Provides the cbs_parse_data and cbs_score_calculator console scripts, and other various functionality.
- Provides the cbs_encrypt and cbs_decrypt console scripts.
To build this image, you need to have SSH access to CBS Bitbucket and the Owenlab GIN Server. That is, you must have an account on each hosting site with your public SSH key added to the profile in each account.
# From the cbs-sci-containers root directory
ssh-add ~/.ssh/id_rsa
datalad run -m "Rebuilding cbspython Docker image" -o cbspython/image/ "cd cbspython && ./build.sh"
Run a bash shell: docker run --rm -it --entrypoint /bin/bash cbspython:latest
$ datalad containers-run -m "Testing CBSPython parsing" -n cbspython -i test/test_data.csv "cbs_parse_data {inputs} test_data_out -o stdout --user-tfm 'lambda x: x[:x.index(\"@\")]'"
git reset --hard # removes staged and working directory changes
git clean -f -d # remove untracked