79 commits
3029e6d
rename R Scripts to R_scripts
evalieungh Jan 27, 2025
0fa465d
move sourcing from commonlines to other scripts
evalieungh Jan 27, 2025
9e38ddd
update draft changes, GBIF download test
evalieungh Jan 28, 2025
0ee4fbb
Merge pull request #28 from BioDT/main
evalieungh Jan 30, 2025
2a8335b
drafting SoilGrids download code
evalieungh Jan 30, 2025
02ba75f
update gitignore to exclude data
evalieungh Jan 30, 2025
95ac20e
update data info
evalieungh Jan 30, 2025
1a5f0f8
resolving conflicts
evalieungh Jan 30, 2025
2d3f2ab
Merge branch 'evalieungh-capfitogen_dev' into capfitogen_dev
evalieungh Jan 30, 2025
011f7d3
add back HJ notes into notes script
evalieungh Jan 30, 2025
43e68bb
add back files lost in recent merge
evalieungh Jan 30, 2025
6667db8
update draft code for soil/edaphic data
evalieungh Feb 4, 2025
6d78db5
add ERA5 citation, debug functions
evalieungh Feb 5, 2025
e1f8d4c
update edaphic data download function (still draft)
evalieungh Feb 6, 2025
223aed5
update SoilGrids downloads
evalieungh Feb 14, 2025
7ca8146
first functioning edaphic download function
evalieungh Feb 17, 2025
5915d4f
draft download from google EE - not working
evalieungh Feb 17, 2025
3ce70d2
start setting up ELCmaps
evalieungh Feb 18, 2025
99d76b9
start scripting capfitogen pipeline, downloads
evalieungh Feb 19, 2025
64cda56
update ELC pipeline - raster read errors
evalieungh Feb 19, 2025
a8f4207
get to next error, puntos object missing
evalieungh Feb 19, 2025
c915265
update script, stuck on vifcor error
evalieungh Feb 20, 2025
94376ac
new error with ELCmaps find.clusters
evalieungh Feb 20, 2025
e7bb01f
bug fix FUN.DownEV
evalieungh Feb 21, 2025
a1b3e46
add wind data
evalieungh Feb 21, 2025
c683601
get ELCmaps running, draft Complementa code
evalieungh Feb 21, 2025
78375df
stuck on Complementa LATITUDE not found
evalieungh Feb 21, 2025
a4f333e
get to next error in Complementa; add illustration
evalieungh Feb 21, 2025
9ecdbe9
get to next Complementa error & protected areas download
evalieungh Feb 24, 2025
7f79bcc
delete unused draft scripts
evalieungh Feb 24, 2025
4eaacb7
working Complementa tool without areas analysis
evalieungh Feb 25, 2025
c317398
draft protected areas download
evalieungh Feb 26, 2025
34fe867
update annotation
evalieungh Feb 26, 2025
72060b0
fix paths wdpa
evalieungh Feb 27, 2025
c17bfb8
Merge branch 'main' into capfitogen_dev
evalieungh Feb 27, 2025
132a2c4
Add Slurm submission script for CapFitogen analysis (draft)
MichalTorma Feb 27, 2025
5e45754
Add fallback mechanism for BioClim data loading
MichalTorma Feb 27, 2025
6269781
Clean up ModGP-commonlines.R script
MichalTorma Feb 27, 2025
71dd0a4
Add Capfitogen submodule
MichalTorma Feb 27, 2025
ee31705
Merge pull request #42 from BioDT/capfitogen_dev_dev
evalieungh Feb 27, 2025
e9b597f
annotation updates
evalieungh Feb 27, 2025
c723ba2
Merge branch 'capfitogen_dev' of https://github.com/BioDT/uc-CWR into…
evalieungh Feb 27, 2025
46e1b16
comment out packages
evalieungh Feb 27, 2025
cd09797
delete Capfitogen repo download
evalieungh Feb 27, 2025
92808ed
move wdpa download to SHARED-Data as function
evalieungh Mar 21, 2025
52455c6
move plotting to visualisation script
evalieungh Mar 21, 2025
f352fee
fix saving wdpa as gpkg
evalieungh Mar 21, 2025
47fec08
update scripts
evalieungh Apr 2, 2025
4d707fd
draft capfitogen default data download
evalieungh Apr 10, 2025
dbe7efe
update script
evalieungh Apr 11, 2025
9b8df07
add capfitogen download links
evalieungh Apr 11, 2025
1da1dca
move data download functions out of main scripts
evalieungh Apr 14, 2025
59e0861
restructuring
evalieungh Apr 14, 2025
0388e76
adjust download procedure
evalieungh Apr 15, 2025
5838d5a
draft cropping code, fix bug in data download
evalieungh Apr 15, 2025
877ed79
fix template raster
evalieungh Apr 23, 2025
eac7297
fix raster stacking, use workaround for capfitogen data
evalieungh Apr 23, 2025
f4a00be
fix extent error
evalieungh Apr 24, 2025
21b20f9
update cropping draft
evalieungh Apr 24, 2025
e9588cb
fix google drive download
evalieungh Apr 28, 2025
e45750d
update scripts
evalieungh Apr 29, 2025
8cb2122
add introduction, links
evalieungh Apr 30, 2025
3ab2962
add tech doc draft
evalieungh Apr 30, 2025
6b4ff83
create prep and exec script drafts
evalieungh Apr 30, 2025
e55548a
delete redundant script
evalieungh Apr 30, 2025
a422241
update structure
evalieungh Apr 30, 2025
6dc25d7
update illustration
evalieungh Apr 30, 2025
7d5dae3
update tech doc
evalieungh Apr 30, 2025
155e362
Reorder packages
trossi May 2, 2025
1c627e9
Add extra packages
trossi May 2, 2025
029e776
Update R
trossi May 2, 2025
549cd03
Remove unused broken package
trossi May 2, 2025
f2f495c
Update image version
trossi May 2, 2025
aa27c27
Add instructions for building containers
trossi May 2, 2025
3087d3f
Merge pull request #45 from BioDT/update-container
evalieungh May 2, 2025
1c868ac
update container version
evalieungh May 2, 2025
a14a4da
fix project number
evalieungh May 5, 2025
961b99a
add API credentials template
evalieungh May 12, 2025
426af8c
updates
evalieungh May 20, 2025
12 changes: 10 additions & 2 deletions .gitignore
@@ -54,7 +54,12 @@ rsconnect/
*.png
*.txt
*.rds
*.tif
*.xlsx
*.zip
!Data/*/GlobalAreaCRS.RData
Data/Environment/soil/
!**/capfitogen_world_data_googledrive_links.csv

Logo/*

@@ -69,7 +74,10 @@ Logo/*

# Job output files
*.out
results/ELCmap/
results/

# Downloads
Capfitogen/

# Downloaded data
CAPFITOGEN3/
hq
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "Capfitogen"]
path = Capfitogen
url = https://github.com/evalieungh/Capfitogen.git
ignore = dirty
1 change: 1 addition & 0 deletions Capfitogen
Submodule Capfitogen added at 205ec9
8 changes: 0 additions & 8 deletions Data/Capfitogen/Placeholder.rtf

This file was deleted.

20 changes: 20 additions & 0 deletions Data/Environment/README.md
@@ -0,0 +1,20 @@
# Environmental data

Downloaded with functions that are run in `capfitogen_master.R` and defined in `SHARED-Data.R`.

## Bioclimatic variables (FUN.DownBV)

A set of 19 bioclimatic variables, downloaded and processed with the KrigR package.

## Capfitogen's set of environmental variables (FUN.DownCAPFITOGEN)

A collection of publicly available environmental data: bioclimatic, edaphic, and geophysical variables from sources such as WorldClim and SoilGrids, collected in a Google Drive. The function downloads the data and combines it into a NetCDF (.nc) file.

### Drafted download functions not currently in use:

**Edaphic variables (EV)**

Soil data downloaded from SoilGrids. Each map occupies ~ 5 GB. "SoilGrids is a system for global digital soil mapping that uses state-of-the-art machine learning methods to map the spatial distribution of soil properties across the globe. SoilGrids prediction models are fitted using over 230 000 soil profile observations from the WoSIS database and a series of environmental covariates. Covariates were selected from a pool of over 400 environmental layers from Earth observation derived products and other environmental information including climate, land cover and terrain morphology. The outputs of SoilGrids are global soil property maps at six standard depth intervals (according to the GlobalSoilMap IUSS working group and its specifications) at a spatial resolution of 250 meters. Prediction uncertainty is quantified by the lower and upper limits of a 90% prediction interval. The SoilGrids maps are publicly available under the [CC-BY 4.0 License](https://creativecommons.org/licenses/by/4.0/). Maps of the following soil properties are available: pH, soil organic carbon content, bulk density, coarse fragments content, sand content, silt content, clay content, cation exchange capacity (CEC), total nitrogen as well as soil organic carbon density and soil organic carbon stock." See [SoilGrids FAQ](https://www.isric.org/explore/soilgrids/faq-soilgrids).

**Geophysical variables (GV)**

86 changes: 86 additions & 0 deletions Data/GBIF/Lathyrus angulatus.json
@@ -0,0 +1,86 @@
{
"@context": [
["https://w3id.org/ro/crate/1.1/context"]
],
"@graph": [
{
"@type": ["CreativeWork"],
"@id": ["ro-crate-metadata.json"],
"conformsTo": {
"@id": ["https://w3id.org/ro/crate/1.1"]
},
"about": {
"@id": ["./"]
}
},
{
"@id": ["./"],
"hasPart": [
{
"@id": ["Lathyrus angulatus.RData"]
}
],
"about": [
{
"@id": ["https://www.gbif.org/species/5356429"]
}
],
"@type": [
["Dataset"]
],
"creator": {
"@id": ["https://orcid.org/0000-0002-4984-7646", "biodt-robot@gbif.no"]
},
"author": {
"@id": ["https://orcid.org/0000-0002-4984-7646", "biodt-robot@gbif.no"]
},
"license": {
"@id": ["https://creativecommons.org/licenses/by/4.0/"]
},
"studySubject": [
["http://eurovoc.europa.eu/632"]
],
"datePublished": ["2025-05-19 18:53:17"],
"name": ["Cleaned GBIF occurrence records for species Lathyrus angulatus"],
"encodingFormat": ["application/ld+json"],
"mainEntity": ["Dataset"],
"keywords": [
["GBIF"],
["Occurrence"],
["Biodiversity"],
["Observation"],
["Capfitogen"]
],
"description": ["Capfitogen input data for Lathyrus angulatus"]
},
{
"@id": ["Lathyrus angulatus.RData"],
"@type": [
["File"]
],
"name": ["Lathyrus angulatus.RData"],
"contentSize": ["NA"],
"encodingFormat": ["application/RData"]
},
{
"@id": ["https://orcid.org/0000-0002-4984-7646", "biodt-robot@gbif.no"],
"@type": ["Person", "Organisation"],
"name": ["biodt-cwr", "Erik Kusch"]
},
{
"@id": ["#action1"],
"@type": ["CreateAction"],
"agent": {
"@id": ["https://orcid.org/0000-0002-4984-7646", "biodt-robot@gbif.no"]
},
"instrument": {
"@id": ["https://github.com/BioDT/uc-CWR"]
},
"result": [
{
"@id": ["./"]
}
]
}
]
}
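
Note on the metadata file above: every value is wrapped in a JSON array (for example `"@id": ["./"]`), which is what `jsonlite::toJSON()` produces by default for length-one R vectors. That the file was written with jsonlite is an assumption; the diff does not show the writer. If it was, a minimal sketch of emitting scalar values instead could look like this, where `crate_entry` is a hypothetical stand-in for the real metadata list:

```r
library(jsonlite)

# Hypothetical fragment of the metadata, not the repository's actual object
crate_entry <- list(
  "@id" = "Lathyrus angulatus.RData",
  "@type" = "File",
  "encodingFormat" = "application/RData"
)

# Default behaviour: length-one vectors are written as one-element arrays
boxed <- toJSON(crate_entry)

# auto_unbox = TRUE writes length-one vectors as JSON scalars
unboxed <- toJSON(crate_entry, auto_unbox = TRUE)

cat(boxed, "\n")
cat(unboxed, "\n")
```

Vectors with more than one element, such as the keyword list, are still written as arrays when `auto_unbox = TRUE`.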
4 changes: 4 additions & 0 deletions ModGP-run_exec.R
@@ -30,7 +30,11 @@ message(sprintf("SPECIES = %s", SPECIES))
Dir.Base <- getwd()
Dir.Scripts <- file.path(Dir.Base, "R_scripts")

## Sourcing ---------------------------------------------------------------
source(file.path(Dir.Scripts, "ModGP-commonlines.R"))
source(file.path(Dir.Scripts,"SHARED-Data.R"))
source(file.path(Dir.Scripts,"ModGP-SDM.R"))
source(file.path(Dir.Scripts,"ModGP-Outputs.R"))

# Choose the number of parallel processes
RUNNING_ON_LUMI <- TRUE
4 changes: 4 additions & 0 deletions ModGP-run_prep.R
@@ -30,7 +30,11 @@ message(sprintf("SPECIES = %s", SPECIES))
Dir.Base <- getwd()
Dir.Scripts <- file.path(Dir.Base, "R_scripts")

## Sourcing ---------------------------------------------------------------
source(file.path(Dir.Scripts, "ModGP-commonlines.R"))
source(file.path(Dir.Scripts,"SHARED-Data.R"))
source(file.path(Dir.Scripts,"ModGP-SDM.R"))
source(file.path(Dir.Scripts,"ModGP-Outputs.R"))

## API Credentials --------------------------------------------------------
try(source(file.path(Dir.Scripts, "SHARED-APICredentials.R")))
4 changes: 4 additions & 0 deletions ModGP_MASTER.R
@@ -30,7 +30,11 @@ message(sprintf("SPECIES = %s", SPECIES))
Dir.Base <- getwd()
Dir.Scripts <- file.path(Dir.Base, "R_scripts")

## Sourcing ---------------------------------------------------------------
source(file.path(Dir.Scripts, "ModGP-commonlines.R"))
source(file.path(Dir.Scripts,"SHARED-Data.R"))
source(file.path(Dir.Scripts,"ModGP-SDM.R"))
source(file.path(Dir.Scripts,"ModGP-Outputs.R"))

## API Credentials --------------------------------------------------------
try(source(file.path(Dir.Scripts, "SHARED-APICredentials.R")))
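
Note on the sourcing blocks: `ModGP-run_exec.R`, `ModGP-run_prep.R` and `ModGP_MASTER.R` now repeat the same four `source()` calls. A minimal sketch of one way to share that step is shown below. `source_scripts` is a hypothetical helper that is not part of this pull request, and the demo uses throwaway files in a temporary directory so that it runs outside the repository.

```r
# Hypothetical helper: source several scripts from one directory and
# stop with a clear message if any of them is missing.
source_scripts <- function(dir, files) {
  paths <- file.path(dir, files)
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing script(s): ", paste(absent, collapse = ", "))
  }
  for (p in paths) {
    # source() evaluates in the global environment by default,
    # which matches how the run scripts use it
    source(p)
  }
  invisible(paths)
}

# Demo with two throwaway scripts; the second depends on the first
demo_dir <- file.path(tempdir(), "R_scripts_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("demo_a <- 1", file.path(demo_dir, "a.R"))
writeLines("demo_b <- demo_a + 1", file.path(demo_dir, "b.R"))

sourced <- source_scripts(demo_dir, c("a.R", "b.R"))
```

In the run scripts the call would take `Dir.Scripts` and the four file names from the diff (`ModGP-commonlines.R`, `SHARED-Data.R`, `ModGP-SDM.R`, `ModGP-Outputs.R`), in that order.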
40 changes: 26 additions & 14 deletions README.md
@@ -1,8 +1,18 @@
# uc-CWR
# Use case Crop Wild Relatives (uc-CWR)

This repository hosts code for a [Biodiversity Digital Twin](https://biodt.eu/) use case: the prototype Digital Twin for [Crop Wild Relatives](https://biodt.eu/use-cases/crop-wild-relatives). The prototype Digital Twin can be accessed through a graphical user interface made with R Shiny and hosted on LifeWatch: [prototype digital twins GUI](http://app.biodt.lifewatch.eu/).

> *"The Prototype Biodiversity Digital Twin (pDT) for Crop Wild Relatives is an advanced tool designed to aid in the identification and use of crop wild relatives (CWR) genetic resources to enhance crop resilience against climate-driven stresses"* [BioDT.eu/use-cases/crop-wild-relatives](https://biodt.eu/use-cases/crop-wild-relatives)

For technical documentation, see a separate [markdown file](technical_documentation.md). Below we also outline quick instructions for running the ModGP and Capfitogen tools in R and on the LUMI supercomputer. The prototype Digital Twin is also presented in a 'Research ideas and outcomes' paper: [Chala et al. 2024](https://doi.org/10.3897/rio.10.e125192). The core functionality of the digital twin is ModGP (Modelling the GermPlasm of interest), but two of Capfitogen's tools have since been added to extend the prototype Digital Twin's usefulness.

> *"MoDGP leverages species distribution modelling, relying on occurrence data of CWR to produce habitat suitability maps, establish mathematical correlations between adaptive traits, such as tolerance to drought and pathogens and environmental factors and facilitates mapping geographic areas where populations possessing genetic resources for resilience against various biotic and abiotic stresses are potentially growing."* [Chala et al. 2024](https://doi.org/10.3897/rio.10.e125192)

---------------------------------

## ModGP on Rstudio

1. Source `ModGP_MASTER.R` and change `SPECIES` argument at line 19 to execute ModGP pipeline for a specific genus.
1. Open `ModGP_MASTER.R`, change the `SPECIES` argument at line 19, and source the script to execute the ModGP pipeline for a specific genus. NB! ModGP should be run on a supercomputer. The environmental data download produces very large interim files (>40 GB per year per variable, >200 GB overall), and the distribution modelling also takes a long time to run.

## ModGP on LUMI with Hyperqueue

@@ -22,20 +32,22 @@
sbatch submit_modgp_exec_lumi_HQ.sh Lathyrus


## CAPFITOGEN demo
## CAPFITOGEN

As an addition to ModGP, you can run two of [Capfitogen](https://www.capfitogen.net/en/)'s most useful tools: [ecogeographic land characterization (ELC) maps](https://www.capfitogen.net/en/tools/elc-mapas/) and [Complementa](https://www.capfitogen.net/en/tools/complementa/) maps to visualise overlap with protected areas.
Because a lot of variables will be downloaded and processed, the total memory requirements may be too large for most personal computers. Try with a subset of the data if necessary.

NB! After cloning this repository, you need to clone Capfitogen (a submodule) as well with `git submodule update --init`.

Alternative ways of running the capfitogen capabilities:

See [documentation](https://www.capfitogen.net/en).
- To run our version of CAPFITOGEN in [RStudio](https://posit.co/downloads/), open `capfitogen_master.R` and execute the code, changing inputs like species name and other parameters. The script guides you through the whole process. After changing the species name, you can run the whole script as a background job if desired.

1. Download `CAPFITOGEN3.zip` from
[here](https://drive.google.com/file/d/1EJw-XcC1NRVFS7mwzlg1VpQBpRCdfWRd/view?usp=sharing)
and extract it to the project root.
- To run on LUMI (assumes access to LUMI and the project):

2. Download `rdatamaps/world/20x20` directory from
[here](https://drive.google.com/drive/folders/19bqG_Z3aFhzrCWQp1yWvMbsLivsCicHh)
and extract it to `CAPFITOGEN3/rdatamaps/world/20x20`.
1. Fetch the container: `singularity pull --disable-cache docker://ghcr.io/biodt/cwr:0.6.0`
2. Then submit the jobs for the desired species (e.g. Lathyrus):

3. Run on LUMI: obtain interactive session:
`srun -p small --nodes=1 --ntasks-per-node=1 --mem=8G -t 4:00:00 --pty bash`
and execute the workflow:
`singularity run --bind $PWD cwr_0.2.0.sif capfitogen.R`
sbatch submit_capfitogen_prep_lumi.sh Lathyrus
sbatch submit_capfitogen_exec_lumi.sh Lathyrus

97 changes: 97 additions & 0 deletions R_scripts/Capfitogen_visualisation.R
@@ -0,0 +1,97 @@
##############################################################
#' Visualisation of Capfitogen inputs and output
#' CONTENTS:
#' - Visualisation of input data
#' - World Database on Protected Areas (WDPA)
#' - Visualisation of outputs
#' - ELC maps
#' -
#' DEPENDENCIES:
#' - capfitogen_master.R (to download and create data)
#' - ModGP-commonlines.R (packages, paths)
#' AUTHORS: [Eva Lieungh]
#' ###########################################################

# Load dependencies ------------------------------------------
# Define directories in relation to project directory
Dir.Base <- getwd()
Dir.Scripts <- file.path(Dir.Base, "R_scripts")

# source packages, directories, simple functions (...)
source(file.path(Dir.Scripts, "ModGP-commonlines.R"))

# VISUALISE INPUTS ===========================================

# WDPA -------------------------------------------------------





# VISUALISE OUTPUTS ==========================================

# ELC maps ---------------------------------------------------
## quick visualisation ----
# List all the .tif files in the directory
elc_tif_outputs <- list.files(path = Dir.Results.ELCMap,
                              pattern = "\\.tif$",
                              full.names = TRUE)

# Loop over each .tif file
for (file_path in elc_tif_outputs) {
  # Read the raster file
  map_i <- rast(file_path)

  # Replace NaN with NA (if they exist)
  map_i[is.nan(values(map_i))] <- NA

  # Mask by the non-NA cells. Note that this keeps every cell that has a
  # value, zeros included, so the extent below is that of the non-NA cells.
  non_zero_mask <- mask(map_i, !is.na(map_i))

  # Convert the non-NA cells to points to find their extent
  points <- as.points(non_zero_mask, na.rm = TRUE)

  # If there are any valid points, proceed with cropping
  if (!is.null(points) && nrow(points) > 0) {
    # Calculate extent directly from the non-empty points
    coordinates <- terra::geom(points)[, c("x", "y")]
    xmin <- min(coordinates[, "x"])
    xmax <- max(coordinates[, "x"])
    ymin <- min(coordinates[, "y"])
    ymax <- max(coordinates[, "y"])
    non_zero_extent <- ext(xmin, xmax, ymin, ymax)

    # Crop the raster using this extent
    cropped_map <- crop(map_i, non_zero_extent)

    # Plot the cropped raster
    plot(cropped_map, main = basename(file_path))
  } else {
    # No cell has a value, so plot the uncropped map
    plot(map_i, main = paste(basename(file_path), "(no non-NA values)"))
  }
}


# Complementa ---------------------------------------------------------

complementa_map <- rast(
  file.path(Dir.Results.Complementa,
            "AnalisisCeldas_CellAnalysis/Complementa_map.tif"))
plot(complementa_map)

# Replace NaN with NA, then convert the non-NA cells to points
complementa_map[is.nan(values(complementa_map))] <- NA
non_zero_mask <- mask(complementa_map,
                      !is.na(complementa_map))
complementa_points <- as.points(non_zero_mask, na.rm = TRUE)
plot(complementa_points)

# Plot the points on a world base map (map() comes from the maps package)
map(
  'world',
  col = "grey",
  fill = TRUE,
  bg = "white",
  lwd = 0.05,
  mar = rep(0, 4),
  border = 0,
  ylim = c(-80, 80)
)
points(complementa_points)
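
Note on the ELC cropping loop in this file: it derives the crop extent from the cells that hold a value. A minimal sketch of that bounding-box step on a plain matrix is given below, in base R only so that it runs without terra. `non_na_bbox` is a hypothetical helper and the toy grid is made up for illustration; neither is code from this pull request.

```r
# Hypothetical helper: row/column bounding box of the non-NA cells of a matrix.
# Returns NULL when every cell is NA.
non_na_bbox <- function(m) {
  idx <- which(!is.na(m), arr.ind = TRUE)
  if (nrow(idx) == 0) {
    return(NULL)
  }
  list(
    rows = range(idx[, "row"]),
    cols = range(idx[, "col"])
  )
}

# Toy grid with two valued cells; NaN counts as missing because
# is.na(NaN) is TRUE in base R
grid <- matrix(NA_real_, nrow = 5, ncol = 6)
grid[2, 3] <- 1
grid[4, 5] <- 2
grid[1, 1] <- NaN

bbox <- non_na_bbox(grid)

# Crop the grid to the bounding box, keeping it a matrix
cropped <- grid[bbox$rows[1]:bbox$rows[2],
                bbox$cols[1]:bbox$cols[2],
                drop = FALSE]
```

For `SpatRaster` objects, terra's `trim()` removes outer rows and columns that contain only NA, which may be a shorter route to the same crop.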