LYCEUM is a deep learning-based tool designed for predicting copy number variations (CNVs) in ancient whole-genome sequencing (WGS) data using read depth sequences.
The manuscript can be found here: LYCEUM: Learning to Call Copy Number Variants on Low Coverage Ancient Genomes
The repository with processed samples, ground truth data, and CNV predictions for real and simulated datasets to reproduce the analyses in the paper can be found here: LYCEUM results reproduction
Deep Learning, Ancient DNA, Copy Number Variation, Whole Genome Sequencing
Mehmet Alper Yilmaz, Ahmet Arda Ceylan, Gun Kaynar, A. Ercument Cicek
[firstauthorname].[firstauthorsurname]@bilkent.edu.tr
[lastauthorsurname]@cs.bilkent.edu.tr
Warning: LYCEUM is completely free for academic use. However, it is licensed for commercial use. Please refer to the License section for more information.
- LYCEUM is a Python 3 script and is easy to run once the required packages are installed.
For easy requirement handling, you can use the LYCEUM_environment.yml file to initialize a conda environment with the requirements installed:
$ conda env create --name lyceum_env -f LYCEUM_environment.yml
$ conda activate lyceum_env
Note that the provided environment yml file is for Linux systems. For macOS users, the corresponding package versions might need to be changed.
- LYCEUM optionally supports GPUs. See the GPU Support section.
Important notice: Please call the LYCEUM_call.py script from the scripts directory.
- The fine-tuned model of the paper: (1) lyceum
- Batch size to use when performing CNV calls on the samples.
- Relative or absolute path to the processed ancient WGS samples, including read depth data.
- Relative or absolute path to the output directory where the LYCEUM output file will be written.
- The regions of interest; choose one of the options: (1) exonlevel, (2) genelevel.
- Relative or absolute path to the lookup file containing the mean and standard deviation of read depth values for each ancient sample. These statistics are used to normalize the typically high variability in ancient read depths. The lookup file can be created by running the mean_std_calculator.py script.
- Confidence threshold for calling CNV labels.
- Select higher values for more confident calls.
- Set to the PCI bus ID of the GPU in your system.
- You can check the PCI bus IDs of the GPUs in your system in various ways, for example with the gpustat tool.
- Check the version of LYCEUM.
- See the help page.
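The lookup-based normalization and the confidence threshold described above can be illustrated with a minimal sketch. This is not LYCEUM's actual implementation: it assumes a plain z-score normalization per sample and a "keep the top label only if it is confident enough" rule. The function names, the in-memory lookup layout, and the label strings (`DEL`, `DUP`, `NO-CALL`) are hypothetical and chosen only for illustration; the real lookup file format is whatever mean_std_calculator.py produces.

```python
from statistics import mean, pstdev


def build_lookup(read_depths_by_sample):
    """Return {sample: (mean, std)} of read depth values (hypothetical layout)."""
    return {
        sample: (mean(depths), pstdev(depths))
        for sample, depths in read_depths_by_sample.items()
    }


def normalize(depths, sample_mean, sample_std):
    """Z-score a read depth sequence using per-sample statistics."""
    if sample_std == 0:
        # A constant read depth carries no signal; avoid dividing by zero.
        return [0.0 for _ in depths]
    return [(d - sample_mean) / sample_std for d in depths]


def call_with_threshold(probabilities, threshold, no_call="NO-CALL"):
    """Return the most probable label if it reaches the threshold, else no_call."""
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    return label if prob >= threshold else no_call


# Hypothetical read depths for one sample.
depths = {"sample_a": [2, 4, 4, 4, 5, 5, 7, 9]}
lookup = build_lookup(depths)            # mean 5, standard deviation 2
z_scores = normalize(depths["sample_a"], *lookup["sample_a"])
# z_scores == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

probs = {"DEL": 0.7, "DUP": 0.2, "NO-CALL": 0.1}  # hypothetical model output
lenient = call_with_threshold(probs, 0.5)  # "DEL"
strict = call_with_threshold(probs, 0.9)   # "NO-CALL"
```

Under these assumptions, raising the threshold trades recall for precision: fewer regions receive a CNV label, but the labels that remain are backed by higher model confidence.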
LYCEUM is very easy to use! Here, we provide a small example BAM file and show how to run LYCEUM on this toy dataset.
- This project uses the conda package manager to create a virtual environment and facilitate reproducibility.
- For Linux users:
- Please take a look at the Anaconda repo archive page and select an appropriate version to install.
- Replace Anaconda3-version.num-Linux-x86_64.sh with your choice:
$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh
- It is important to set up the conda environment, which includes the necessary dependencies.
- Please run the following lines to create and activate the environment:
$ conda env create --name lyceum_env -f LYCEUM_environment.yml
$ conda activate lyceum_env
- It is necessary to preprocess the ancient WGS samples to obtain read depth and other metadata and prepare them for CNV calling.
- Please run the following line:
$ source preprocess_samples.sh
- Here, we demonstrate how to run LYCEUM on GPU device 0 and obtain gene-level CNV calls.
- Please run the following script:
$ source lyceum_call.sh
You can change the arguments within the script to run it on the CPU and/or to obtain exon-level CNV calls.
- At the end of the CNV calling procedure, LYCEUM writes its output file to the directory given with the -o option. In this tutorial, it is ./lyceum_calls_output
- The output file of LYCEUM is tab-delimited.
- The columns of the exon-level output file are, in order: 1. Sample Name, 2. Chromosome, 3. Exon Start Location, 4. Exon End Location, 5. LYCEUM Prediction
- The columns of the gene-level output file are, in order: 1. Sample Name, 2. Chromosome, 3. Gene Name, 4. LYCEUM Prediction
- The following figure is an example of a LYCEUM gene-level output file.
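As a rough illustration of the column layouts listed above, the following sketch reads a tab-delimited output file into dictionaries and counts predictions per sample. It assumes the file has no header row, and the sample rows, gene names, and prediction labels shown are hypothetical, not taken from real LYCEUM output. The helper names `read_calls` and `count_predictions` are likewise made up for this example.

```python
import csv
import io
from collections import Counter

# Column orders as documented above; the key names are our own choice.
EXON_COLUMNS = ["sample", "chromosome", "start", "end", "prediction"]
GENE_COLUMNS = ["sample", "chromosome", "gene", "prediction"]


def read_calls(handle, columns):
    """Parse tab-delimited rows into dicts keyed by the given column names."""
    calls = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(columns):
            continue  # skip blank lines or rows with an unexpected field count
        calls.append(dict(zip(columns, row)))
    return calls


def count_predictions(calls):
    """Count how often each (sample, prediction) pair occurs."""
    return Counter((call["sample"], call["prediction"]) for call in calls)


# Hypothetical gene-level content standing in for a real output file.
example = io.StringIO(
    "sample_a\tchr1\tGENE_A\tDEL\n"
    "sample_a\tchr2\tGENE_B\tDUP\n"
    "sample_a\tchr3\tGENE_C\tDEL\n"
)
gene_calls = read_calls(example, GENE_COLUMNS)
summary = count_predictions(gene_calls)
# summary[("sample_a", "DEL")] == 2
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, and use `EXON_COLUMNS` for exon-level output. If your output file turns out to contain a header row, skip the first line before parsing.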
Important notice: Please call the LYCEUM_finetune.py script from the scripts directory.
- Batch size to use when processing the samples.
- Relative or absolute path to the processed ancient WGS samples, including read depth data.
- Relative or absolute path to the output directory where the LYCEUM output weights will be written.
- Relative or absolute path to the lookup file containing the mean and standard deviation of read depth values for each ancient sample. These statistics are used to normalize the typically high variability in ancient read depths. The lookup file can be created by running the mean_std_calculator.py script.
- The number of epochs for which fine-tuning will be performed.
- The learning rate to be used in fine-tuning.
- The path to the pretrained model weights to be loaded for fine-tuning.
- Set to the PCI bus ID of the GPU in your system.
- You can check the PCI bus IDs of the GPUs in your system in various ways, for example with the gpustat tool.
- Check the version of LYCEUM.
- See the help page.
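For readers unfamiliar with selecting a GPU by PCI bus ID, the sketch below shows the standard CUDA environment variables that make device indices follow PCI bus order and restrict a process to one device. Whether LYCEUM sets these variables internally is an assumption on our part; the helper names `select_gpu` and `select_cpu` are hypothetical. The variables must be set before the deep learning framework initializes CUDA.

```python
import os


def select_gpu(pci_bus_index, environ=os.environ):
    """Expose a single GPU, with indices following PCI bus order."""
    # Make CUDA enumerate devices in the same order as tools that list
    # GPUs by PCI bus ID, instead of the default "fastest first" order.
    environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    environ["CUDA_VISIBLE_DEVICES"] = str(pci_bus_index)


def select_cpu(environ=os.environ):
    """Hide all GPUs so that CUDA-based frameworks fall back to the CPU."""
    environ["CUDA_VISIBLE_DEVICES"] = ""


# Demonstrated on a plain dict so the current process is left untouched.
settings = {}
select_gpu(0, settings)
# settings == {"CUDA_DEVICE_ORDER": "PCI_BUS_ID", "CUDA_VISIBLE_DEVICES": "0"}
```

Calling `select_gpu(0)` without the second argument would modify the environment of the running Python process itself.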
You may want to fine-tune LYCEUM with your ancient dataset. We provide an example of how LYCEUM can be fine-tuned using a small-sized BAM file along with its corresponding ground truth calls.
Step-0 and Step-1 are the same as the LYCEUM call example.
- This project uses the conda package manager to create a virtual environment and facilitate reproducibility.
- For Linux users:
- Please take a look at the Anaconda repo archive page and select an appropriate version to install.
- Replace Anaconda3-version.num-Linux-x86_64.sh with your choice:
$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh
- It is important to set up the conda environment, which includes the necessary dependencies.
- Please run the following lines to create and activate the environment:
$ conda env create --name lyceum_env -f LYCEUM_environment.yml
$ conda activate lyceum_env
- It is necessary to preprocess the ancient WGS samples to obtain read depth and other metadata and prepare them for LYCEUM fine-tuning.
- LYCEUM fine-tuning requires .bam files and ground truth calls. Please see the image below for a sample ground truth format.
- Please run the following line:
$ source finetune_preprocess_samples.sh
- Here, we demonstrate how to run LYCEUM fine-tuning on GPU device 0.
- Please run the following script:
$ source lyceum_finetune.sh
You can change the arguments within the script to run it on the CPU.
- At the end of LYCEUM fine-tuning, the script saves its model weights file to the directory given with the -o option. In this tutorial, it is ./LYCEUM_finetuned_model_weights
- CC BY-NC-SA 2.0
- Copyright 2024 © LYCEUM.
- For commercial usage, please contact.

