Co-authored-by: Kevin M Jablonka <32935233+kjappelbaum@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
### Adding a new dataset (to the model training pipeline)
We specify datasets by creating a new function [here](src/chemnlp/data/hf_datasets.py), named after the dataset on Hugging Face. At present the function must accept a tokenizer and return the tokenized train and validation datasets.
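A minimal sketch of such a function, assuming the Hugging Face `datasets` library and a `transformers`-style tokenizer; the dataset name `example_dataset` and its `text` column are hypothetical:

```
from datasets import load_dataset


def example_dataset(tokenizer):
    """Sketch: load a hypothetical HF dataset and tokenize its splits."""
    dataset = load_dataset("example_dataset")  # hypothetical dataset name

    def tokenize(batch):
        # assumes the raw dataset stores its text in a "text" column
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    train = dataset["train"].map(tokenize, batched=True)
    validation = dataset["validation"].map(tokenize, batched=True)
    return train, validation
```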
### Installing submodules
To ensure you also clone and install the required submodules (i.e. gpt-neox), you will have to do one of the following:
- Clone with the `--recurse-submodules` flag, e.g. `git clone --recurse-submodules <repo-url>`
  > This will automatically initialize and update each submodule in the repository, including nested submodules if any of the submodules in the repository have submodules themselves.
- Initialise and install the submodules after cloning:
```
git submodule init # registers submodule
git submodule update # clones and updates submodule
```
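As an aside, `git submodule update --init` performs both steps in one command; add `--recursive` to also cover any nested submodules.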