This if a forked branch to the original CARLIN
The main purpose of this fork is to modify the original CARLIN code for use of CatchAllCmdW.exe in Linux environment.
To use CatchAllCmdW.exe in Linux, we used a Windows emulator, Wine:
sudo apt install --install-recommends winehq-stable
sudo apt-get install wine-mono
wget https://dl.winehq.org/wine/wine-mono/9.0.0/wine-mono-9.0.0-x86.msi
wine msiexec /i wine-mono-9.0.0-x86.msiThen replace the % 4. Run CatchAll code chunck in /@Bank/Bank.m with the below:
% 4. Run CatchAll
% Ensure absolute paths
if ~startsWith(catchall_input_file, '/')
catchall_input_file = fullfile(pwd, catchall_input_file);
end
if ~startsWith(catchall_subdir, '/')
catchall_subdir = fullfile(pwd, catchall_subdir);
end
disp(catchall_input_file);
disp(catchall_subdir);
% Convert to Wine-friendly Windows paths
catchall_input_file_win = unix2winpath(catchall_input_file);
catchall_subdir_win = unix2winpath(catchall_subdir);
catchall_input_file_win = strtrim(catchall_input_file_win);
catchall_subdir_win = strtrim(catchall_subdir_win);
prog = strtrim(prog);
disp(['wine ' prog ' ' catchall_input_file_win ' ' catchall_subdir_win]);
system(['wine ' prog ' ' catchall_input_file_win ' ' catchall_subdir_win]);For super huge fastq file, you may need a large swap file:
sudo swapoff -a
sudo dd if=/dev/zero of=/swapfile bs=1M count=500000
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfileFor large FASTQ inputs, the analysis code now writes the raw and aligned sequence data to temporary MAT files instead of keeping everything in memory. These spill files are loaded back on demand and greatly reduce the peak RAM requirements at the cost of extra disk usage.
Summary
- The
FastQDataclass now includes aspill_to_diskmethod that writes large fields to aMAT-fileand clears them from memory, withload_from_diskreloading the data when needed analyze_CARLINspillsFASTQandalignmentdata to disk during processing to keep memory usage low
The CARLIN pipeline calls alleles from sequencing runs of the CARLIN amplicon. It was written in MATLAB 2019a.
Please cite the companion paper if you use this software package:
S. Bowling*, D. Sritharan*, F. G. Osorio, M. Nguyen, P. Cheung, A. Rodriguez-Fraticelli, S. Patel, W-C. Yuan, Y. Fujiwara, B. E. Li, S. H. Orkin, S. Hormoz, F. D. Camargo. "An Engineered CRISPR-Cas9 Mouse Line for Simultaneous Readout of Lineage Histories and Gene Expression Profiles in Single Cells." Cell (2020), https://doi.org/10.1016/j.cell.2020.04.048
To reproduce all the results and figures from the paper, please visit the paper repository.
To download the pipeline into the directory given by CODE_PATH:
$ git clone https://gitlab.com/hormozlab/carlin.git CODE_PATHInstall the CARLIN software by opening MATLAB, and running the following commands. Installation requires a C++ compiler, which should be included as part of your MATLAB installation if you are an academic user. If not, you can download the free MinGW compiler.
>> cd(CODE_PATH);
>> install_CARLIN;If you are installing on your local machine, you should additionally save the path.
>> savepath;If you are installing on a cluster machine, the MATLAB path file will be read-only, so you will have to re-run the following command at the start of each session to set up paths:
>> addpath(genpath(CODE_PATH));You should also re-run the installation when you pull new changes from git.
The entry point to the pipeline is the analyze_CARLIN function.
>> help analyze_CARLIN analyze_CARLIN calls alleles from FASTQs sequencing the CARLIN amplicon.
analyze_CARLIN(FASTQ_FILE, CFG_TYPE, OUTDIR) analyzes FASTQ_FILE
(*.fastq, *.fastq.gz) generated according to the experimental CARLIN
protocol defined in CFG_TYPE (one of 'Sanger', 'BulkDNA', 'BulkRNA',
'scInDropsV2', 'scInDropsV3', 'sc10xV2', 'sc10xV3') and saves the
following files to OUTDIR:
Analysis.mat.gz - contains all variables including a compactified
representation of the FASTQ files, a depot of unique aligned
sequences, the collection of CBs and UMIs detected before and after
denoising, and the called alleles. This file may be large, and is not
intended for use by a casual user. Rather, it stores all intermediate
variables for custom one-off analysis.
Summary.mat - contains the subset of variables in Analysis.mat that
is usually sufficient for subsequent downstream analysis
including a list of alleles, their frequencies, tags (UMIs or CBs)
reporting that allele, input parameters, thresholds used by the
algorithm, and, for SC runs, the reference barcode list.
Alleles.png - Plot of mutations harbored in each allele, distribution of
allele occurrence frequencies and summary statistics.
AlleleAnnotations.txt - Annotations describing each allele in plaintext
format (HGVS) for use by other downstream tools.
AlleleColonies.txt - List of tags (UMIs or CBs) which report each
allele, for use by other downstream tools. The ordering corresponds
to AlleleAnnotations.txt
Results.txt - Human-readable summary of the analysis results.
Diagnostic.png - Detailed dashboard of various diagnostic quantities.
Warnings.txt - List of issues detected in the data.
Log.txt - Running log of the pipeline.
The pipeline can optionally be invoked with some extra paramters:
analyze_CARLIN(..., 'CARLIN_amplicon', CARLIN_def) uses the nucleotide
sequence and alignment parameters in 'CARLIN_def' to define the CARLIN
amplicon and align reads against it. Defaults to 'OriginalCARLIN', which
corresponds to the amplicon used in the (Cell, 2020) paper. Can also
specify 'TigreCARLIN'. See *.json files in cfg/amplicon for full definitions.
analyze_CARLIN(..., 'read_cutoff_UMI_denoised', cutoff) uses a minimum
read threshold of 'cutoff' when attempting to call alleles from denoised
UMIs (default = 10). This cutoff is not used in its raw form, but
combined as part of a cutoff function. It represents an absolute floor,
but the cutoff used in practice will generally be higher.
analyze_CARLIN(..., 'read_override_UMI_denoised', cutoff) short circuits
the cutoff function, and sets the threshold to 'cutoff'. All UMIs with a
read count >= 'read_override_UMI_denoised' will be asked to call an allele.
Default is unset.
For bulk sequencing runs (CFG_TYPE='Bulk*'):
analyze_CARLIN(..., 'max_molecules', N) considers (at most) the N denoised
UMIs with the most reads, for calling alleles. For CFG_TYPE='BulkDNA',
N is the number of cells in the sample. For CFG_TYPE='BulkRNA', N is the
number of transcripts (if unsure, 10x the number of cells is a suitable
guess). Default is inf.
For single-cell sequencing runs of the CARLIN amplicon (CFG_TYPE='sc*'):
analyze_CARLIN(..., 'max_cells', N) considers (at most) the N denoised
cell barcodes with the most reads, for calling alleles. Defaults to
the number of possible CBs in the single-cell platform specified by
CFG_TYPE, or the number of elements in 'ref_CB_file' (see below), if
specified.
analyze_CARLIN(..., 'read_cutoff_CB_denoised', cutoff) uses a minimum
read threshold of 'cutoff' when attempting to call alleles from denoised
CBs (default = 10). This cutoff is not used in its raw form, but
combined as part of a cutoff function. It represents an absolute floor,
but the cutoff used in practice will generally be higher.
analyze_CARLIN(..., 'read_override_CB_denoised', cutoff) short circuits
the cutoff function, and sets the threshold to 'cutoff'. All CBs with a
read count >= 'read_override_CB_denoised' will be asked to call an allele.
Default is unset.
analyze_CARLIN(..., 'ref_CB_file', file) uses the reference list of
cell barcodes in the file specified by 'ref_CB_file' when denoising
barcodes found in the FASTQ files. The reference list should have one
cell barcode per line. Each cell barcode should be a string consisting
of only the characters {A,C,G,T}. The length of the barcode should match
the length expected by the platform specified by CFG_TYPE. This reference
list is typically produced by the software used to process the
corresponding transcriptome run. Defaults to the full barcode list in
the single-cell platform specified by CFG_TYPE. Specifying a reference
list leads to more accurate denoising than using the full barcode list.
Examples:
% Run analysis on bulk RNA sample
analyze_CARLIN('run1/PE.fastq.gz', 'BulkRNA', 'run1_output');
% Run analysis on multiple sequencing runs of the same bulk DNA sample
analyze_CARLIN({'run1/PE.fastq.gz'; run2/PE.fastq.gz'}, 'BulkDNA', 'combined_output');
% Run analysis on sample prepared with InDropsV3
analyze_CARLIN('indrops/amplicon/Lib1.fastq', 'scInDropsV3', ...
'output', 'ref_CB_file', 'indrops/transcriptome/abundant_barcodes.txt');
% Run analysis on sample prepared with 10xGenomics V2
analyze_CARLIN({'tenx/amplicon/Mouse_R1_001.fastq', 'tenx/amplicon/Mouse_R2_001.fastq'}, ...
'sc10xV2', 'tenx/processed_amplicon', ...
'ref_CB_file', 'tenx/transcriptome/filtered_barcodes_umi_mt.txt');
Author: Duluxan Sritharan. Hormoz Lab. Harvard Medical School.
The outputs are described in the MATLAB help screen captured above. The most important output is Summary.mat which stores all the information that should typically be required for any downstream analysis.
Each run of the pipeline also produces a summary plot, with useful at-a-glance metrics in the top-left corner:
Here's a brief look at the text outputs from a sample run:
$ tail AlleleAnnotations.txt 50_157del,212_265del
47_131del
47A>T
23_184del
23_237del,262_267del
23_23del,49_49delinsTC,104_105insCC,265_266insAC
22_23insG,77_265del
23_241del
17_49delinsTCG,77_238del,266_266del
17_150del,264_265del
$ head AlleleColonies.txt -n 20 | tail -n 10AATAGAGGTGAGACCA,CATACTTAGTCATCCA,CCTCAGTGTGCCTAAT,CGAAGGATCGTTCTGC,GAGATGGCAAAGGCTG,GGTGAAGCAACCCTAA,TCCTCTTTCACAGTGT
AGATAGACAGCACCCA,GCGTTTCTCACAAGAA,GTAGTACTCAAAGCCT,GTGAGGATCTCGTCAC,GTGTAACAGAAACCCG,TAGGTTGTCACGATAC
ATGGGAGTCATTACGG,CATTCTATCCGCACGA,GCTGAATGTGACACAG,GGGACCTGTCCTTAAG,TACGTCCCACCGGAAA,TTAGGCAAGCCACAAG
CTCCACAGTGCGACAA,GCCAGGTGTTCCTAGA,GTAATCGTCATTGAGC,TGCTCGTCACACTTAG,TTCCGGTGTCATAACC
CACCGTTGTGGACAGT,GAGTCTATCATTTACC,TGCTTGCTCGGTGAAG,TTCTAGTTCCACTTCG
GACATCAGTGTCCCTT,TGCGGCAGTTATCCAG,TTCAGGAAGCTAGAAT,TTTCATGTCAACTACG
GAACTGTTCGTGCTCT,TACCCACGTCACATTG,TCGATTTCATCCAATG,TTACCGCGTAATTAGG
ACACAGTCATCGGAAG,CGTTGGGCAGATACCT,GCATTAGTCCCTTTGG
AGGCTGCTCTCACTCG,ATGGGTTGTCTCGGGT,ATTCGTTCAACTCCCT
CTAACCCAGCCTCACG,GTTACAGTCTATACTC,TGGATCACACAGCATT
$ tail Results.txtALLELES
Total (including template): 83
Singletons (including template): 49
% CBs edited: 20
Effective Alleles: 44
Diversity Index (normalized by all): 0.03
Diversity Index (normalized by edited): 0.17
Mean CARLIN potential (by allele): 5.24
$ cat Warnings.txtOFF-TARGET AMPLIFICATION
Significant off-target amplification detected: 73% of reads are not CARLIN.
REFERENCE_LIST
Collapsing MiSeq cell barcodes against the platform's reference list may lead to more FPs and FNs.
FILTERING
Sequencing depth insufficent. Low number of reads detected in FASTQ.
Only 61% of reads have usable provenance information (CB or UMI). See Results.txt for a more detailed breakdown of QC issues.
ANALYSIS
Number of common CBs is low (163). This is likely due to:
- reads dedicated to FP CBs owing to the inflated reference list used
- low sequencing depth so that many CBs do not exceed the required threshold
- QC issues with CB/UMI which cause reads to be dedicated to spurious CBs that persist after filtering
RESULTS
Low +Dox induction detected. Only 0% of CBs reported an edited CARLIN allele.
For allele(s) 1, >10% of CB halves or UMIs differ pairwise by only 1bp. This may indicate that that the cell count for these alleles is artificially inflated. This is likely because of QC issues on the CB/UMI reads. This is a known issue with InDrops.
Each run of the CARLIN pipeline produces an ExperimentSummary or ExperimentReport object saved in Summary.mat called summary. This is the only file needed for most common downstream analyses. The field summary.alleles contains a list of called CARLIN alleles sorted in descending order according to summary.allele_freqs, which indicates how many UMIs (for bulk experiments) or CBs (for single-cell experiments), each allele is found in.
ExperimentSummary objects from separate experiments can be merged to analyze how alleles are shared across the experiments:
>> mouse1 = load('Mouse1/Summary.mat');
>> mouse2 = load('Mouse2/Summary.mat');
>> mouse3 = load('Mouse3/Summary.mat');
>> [merged, sample_map, allele_breakdown_by_sample] = ExperimentSummary.FromMerge([mouse1.summary; mouse2.summary; mouse3.summary]);merged is a new ExperimentSummary object that can be handled in the usual way for assessing statistical significance of alleles or preparing visualizations (see below). sample_map{j}(i) indicates which allele in merged corresponds to allele i in experiment j. allele_breakdown_by_sample(i,j) indicates how many times allele i in merged appeared in experiment j.
A main contribution of the CARLIN paper (Cell, 2020) is to quantify the statistical significance of alleles (see Methods). Cas9 editing does not generate all alleles with equal probability - those with complex editing patterns are more rare. The more common alleles are more likely to coincidentally mark multiple progenitors, and are therefore less useful for lineage tracing. Alleles that should be rare but are abundant in a particular experiment are statistically significant.
Statistical significance is determined by comparing the observed frequency of an allele in an experiment, with the expected frequency. The observed frequency is stored in summary.allele_freqs. The expected frequency distribution can be determined by using the allele bank, (stored in the Bank variable of Bank_OriginalCARLIN.mat), which is the same bank used in Figure 3 of (Cell, 2020).
You can compute the two p-values described in Methods as shown:
>> load('Summary.mat');
>> load('Bank.mat');
>> p_clonal = bank.compute_clonal_pvalue(summary);
>> p_frequency = bank.compute_frequency_pvalue(summary);The output in each case is a vector with elements corresponding to the alleles in summary.alleles.
To create a custom bank characterizing the null distribution of allele frequencies for your own protocol, put CatchAllCmdW executable file into a desired directory and copy the full path (including the file name) into:
CODE_PATH/@Bank/CatchAllPath.txt
Run the following code to create a Bank object by pooling three libraries (saved in directories Mouse1, Mouse2, and Mouse3), each processed separately using the CARLIN pipeline, and save the result in MyCustomBank/Bank.mat:
>> mouse1 = load('Mouse1/Summary.mat');
>> mouse2 = load('Mouse2/Summary.mat');
>> mouse3 = load('Mouse3/Summary.mat');
>> bank = Bank.Create([mouse1.summary; mouse2.summary; mouse3.summary], {'Mouse1'; 'Mouse2'; 'Mouse3'}, 'MyCustomBank');The CatchAll code runs only on Windows. You can use Windows emulators such as VirtualBox or Wine to run it on non-Windows machine.
A CARLIN allele is stored as two aligned strings - a sequence and a reference. To view the alignment of the fifth most common allele in the experiment:
>> [summary.alleles{5}.get_seq; summary.alleles{5}.get_ref]
ans =
2×276 char array
'CGCCGGACTGCACGACAGTCGACGATGGAGTCGACACGACTCGCGCATA------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------CGATGGGAGCT'
'CGCCGGACTGCACGACAGTCGACGATGGAGTCGACACGACTCGCGCATACGATGGAGTCGACTACAGTCGCTACGACGATGGAGTCGCGAGCGCTATGAGCGACTATGGAGTCGATACGATACGCGCACGCTATGGAGTCGAGAGCGCGCTCGTCGACTATGGAGTCGCGACTGTACGCACACGCGATGGAGTCGATAGTATGCGTACACGCGATGGAGTCGAGTCGAGACGCTGACGATATGGAGTCGATACGTAGCACGCAGACGATGGGAGCT'To retrieve a list of mutations called from a set of alleles, and inspect the second mutation in the thirteenth allele:
>> mut_list = cellfun(@(x) Mutation.identify_cas9_events(summary.CARLIN_def, x), summary.alleles, 'un', false);
>> mut_list{13}(2)
ans =
Mutation with properties:
type: 'D'
loc_start: 263
loc_end: 266
seq_new: '----'
seq_old: 'AGAC'
>> mut_list{13}(2).annotate()
ans =
'263_266del' To output a list of mutations to a text file in HGVS format as shown in AlleleAnnotations.txt:
>> Mutation.ToFile(summary.CARLIN_def, mut_list, 'output_path', 'mootations.txt');To construct CARLIN alleles by applying mutations from a list specified in HGVS format (like AlleleAnnotations.txt) to the CARLIN_amplicon defined in CARLIN_def:
>> alleles = cellfun(@(x) Mutation.apply_mutations(CARLIN_def, x), Mutation.FromFile(CARLIN_def, 'dootations.txt'), 'un', false);This package also includes a few standard visualizations, that will produce figures like those shown in the (Cell, 2020) paper. They all take an ExperimentSummary object as input. You can change the color scheme by modifying the constants in plots/CARLIN_viz.m
Visualize the top 75 most common edited alleles:
>> plot_highlighted_alleles(summary, 75);Generate a CDF and histogram of allele frequencies:
>> plot_allele_frequency_CDF(summary, 'Eyeball');Generate a histogram of mutations according to length (normalized by the number of cells in the experiment):
>> plot_indel_freq_vs_length(summary);Summarize the different edit patterns at each target site:
>> plot_site_decomposition(summary, true, 'Eyeball', '# of Transcripts');Visualize the editing patterns in more detail:
>> plot_stargate.create(summary);The CARLIN pipeline allows limited scope for customization by allowing advanced users to specify their own configuration file, which tells the pipeline how reads from the input FASTQ file(s) should be interpreted. Since only the default configurations have been tested, we cannot guarantee that exotic configurations will work out-of-the-box. Caveat emptor.
To specify a custom configuration file set CFG_TYPE='PATH/TO/CustomCfg.json' when calling analyze_CARLIN. Instead of creating your configuration file from scratch, copy the existing default from CODE_PATH/cfg/library/*.json that most resembles your desired settings and make the required modifications.
The pipeline expects configuration files to have the following fields:
- type : {"Bulk", "SC"}
Are the FASTQs from "Bulk" experiments, in which reads are only tagged with UMIs, or from "SC" experiments, in which reads are also associated with CBs? If type="Bulk", the input FASTQ should contain paired-end reads, and the UMI should always be immediately 5' or 3' of the amplicon sequence (see UMI.location and read_perspective below). If type="SC", the software expects FASTQs in the exact format produced by either the CellRanger (10x) or CARLIN_InDrops pipeline.
- SC.Platform : {"10x", "InDrops"}
- SC.Version : {2, 3}
Specify the FASTQ format when type="SC". Do not include if type~="SC".
- UMI.length : [whole number]
Specify the number of bps in a UMI.
- UMI.location : {"L", "R"}
When type="Bulk", is the UMI located to the left or right of the amplicon sequence in the read (before applying the transformations in read_perspective below)? Do not include if type="SC".
- CB.length : [whole number]
Specify the number of bps in a CB. Do not include if type~="SC".
- read_perspective.ShouldComplement : {"Y", "N"}
- read_perspective.ShouldReverse : {"Y", "N"}
Specify what transformations have to be performed on the lines of the input FASTQ file so that when read left-to-right, they match the reference sequence defined in the CARLIN_amplicon file (modulo mutations).
- trim.Primer5 : {"exact", "misplaced", "malformed", "ignore"}
- trim.Primer3 : {"exact", "misplaced", "malformed", "ignore"}
- trim.SecondarySequence: {"exact", "misplaced", "malformed", "ignore"}
After applying the transformations specified in read_perspective, the lines should have the following structure:
|| A | Primer5 | CARLIN | Secondary Sequence | Primer3 | B ||
The CARLIN sequence can therefore be extracted by searching for the flanking primers and/or secondary sequence according to various levels of stringency (from most to least):
-
"exact" discards a read unless an exact match is found at the precise location:
- Primer5 : A=UMI when type="Bulk" and (UMI.location=="L")==(read_perspective.Reverse=="N"). A='' otherwise.
- Primer3/SecondarySequence : B=UMI when type="Bulk" and (UMI.location=="L")~=(read_perspective.Reverse=="N"). B='' otherwise.
-
"misplaced" discards a read unless an exact match is found irrespective of location.
-
"malformed" discards a read unless nwalign with NUC44 scoring matrix and Glocal=true returns a score greater than or equal to the corresponding match_score specified in the CARLIN_amplicon file (see below).
-
"ignore" does not search for the specified sequence i.e. the sequence is treated as the empty string.
When type="Bulk", the UMI associated with each read is therefore considered to be the UMI.length bps to the left (right) of Primer5 (Primer3) when (UMI.location=="L")==(read_perspective.Reverse=="N") is true (false). To avoid registration of spurious UMIs due to poor alignment with the primers, it is best to set trim.Primer5={"exact","misplaced"} (trim.Primer3={"exact","misplaced"}) when type="Bulk".
The CARLIN pipeline allows limited scope for customization by allowing advanced users to supply their own CARLIN_amplicon file, which specifies the CRISPR nucleotide sequence defining the CARLIN amplicon and alignment parameters for aligning reads against this reference. Since only the default amplicons have been tested, we cannot guarantee that all CRISPR sequences will work out-of-the-box. Caveat emptor.
To specify a custom CARLIN_amplicon file, supply 'PATH/TO/CustomAmplicon.json' as the 'CARLIN_amplicon' argument when calling analyze_CARLIN. Instead of creating your CARLIN_amplicon file from scratch, copy one of the existing defaults from CODE_PATH/cfg/amplicon/*.json and make the required modifications.
The pipeline expects CARLIN_amplicon files to have the following fields:
- sequence.segments : ["N...N", ... , "N...N"]
An array of target sites as defined in Figure 1B of (Cell, 2020). All target sites must be the same length.
- sequence.pam : ["N...N", ... , "N...N"]
An array of PAM sequences as defined in Figure 1B of (Cell, 2020). The number of PAM sequences specified should be one less than the number of target sites and all PAM sequences must be the same length. If a single PAM sequence is specified, it will be repeated between all target sites. Specify an empty string if only one target site is specified in sequence.segments.
- sequence.{prefix,postfix} : "N...N"
A nucleotide sequence specifying the {prefix, postfix} as defined in Figure 1B of (Cell, 2020).
- sequence.{Primer5,Primer3,SecondaySequence} : "N...N"
A nucleotide sequence specifying the {5' Primer, 3' Primer, Secondary Sequence} (see diagram above).
- match_score.{Primer5,Primer3,SecondarySequence} : [double]
When the configuration file specifies trim.{Primer5,Primer3,SecondarySequence}="malformed", if nwalign with NUC44 scoring matrix and Glocal=true returns a score greater than or equal to this threshold when aligning the read to sequence.{Primer5,Primer3,SecondarySequence}, the primer/sequence is considered to be found in the read.
- {open_penalty,close_penalty}.{prefix,postfix} : [double array]
An array of length equal to sequence.{prefix,postfix} specifying the opening/closing penalty for the sequence. See Supplementary Table 2 and Methods of (Cell, 2020).
- {open_penalty,close_penalty}.pam : [[double array], ..., [double array]]
A 2D array specifying the opening/closing penalty for PAM sequences in sequence.pam. If a single array is specified, the same opening/closing penalty will be used for all PAM sequences. See Methods and Supplementary Table 2 of (Cell, 2020).
- {open_penalty,close_penalty}.{consites,cutsites} : [[double array], ..., [double array]]
A 2D array specifying the opening/closing penalty for conserved sites/cutsites in sequence.segments as defined in Figure 1B of (Cell, 2020). If a single array is specified, the same opening/closing penalty will be used for all target sites. The arrays must be of constant width, and the combined widths of the conserved site and cutsite penalties must match the number of bps in the target site. See Methods and Supplementary Table 2 of (Cell, 2020).
- open_penalty.init : [double]
Specify the opening insertion penalty. See Methods and Supplementary Table 2 of (Cell, 2020).
Each ExperimentSummary object contains a CARLIN_def field that stores information about the CARLIN_amplicon used when running the pipeline. You can only merge ExperimentSummary objects if they have the same CARLIN_amplicon saved in their respective CARLIN_def fields. Additionally, you can only assess statistical significance of alleles in an ExperimentSummary object against a Bank created using ExperimentSummary objects with the same CARLIN_def. Otherwise, all other features in the pipeline are agnostic to the specifics of the CARLIN amplicon.
If you're running into issues, check if the CARLIN pipeline runs properly on test datasets on your installation.
>> cd([CODE_PATH '/tests']);
>> runtests;There are also examples in tests/CARLIN_pipeline_test.m to check if your invocation is correct. For other examples of how the code is used, please visit the paper repository.





