Set Configs
The TopicVI model can be configured using a YAML file. Below is an example configuration file that you can use to set up your training parameters.
Use the following code to create a default configuration file:
```python
import os

import yaml

import topicvi

config = topicvi.make_default_config(
    project_name="example running config",
    save_dir="/path/to/save/",
)
# config is a dictionary containing the default configuration;
# you can modify it in Python or save it to a YAML file
os.makedirs(config['save_dir'], exist_ok=True)
topicvi.utils.write_config(
    config,
    os.path.join(config['save_dir'], "config.yaml"),
)
```

Pass the config when training the model:
```python
topicvi.run_topicvi(
    adata,
    config=config,  # or the path to the config file
)
```

Only the essential parameters are included in this configuration file, which is useful for quick setups or for running a basic training without many customizations. Pay attention to the "#MODIFY#" comments, which indicate where you should adjust the parameters to fit your needs.
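Since the config returned by `make_default_config` is a plain nested dictionary, you can adjust values in Python before writing the file. The sketch below is illustrative only: the dictionary shape is assumed to mirror the minimal YAML shown later, not generated by `make_default_config` itself.

```python
# Assumed shape of the config dictionary, for illustration only;
# use the dictionary actually returned by topicvi.make_default_config.
config = {
    "project_name": "example running config",
    "save_dir": "/path/to/save/",
    "train_kwargs": {"early_stopping": True},
    "model_kwargs": {"n_topics": 20, "n_clusters": 10},
}

# Adjust nested values before saving, e.g. applying the
# n_topics = 2 * n_clusters + 10 rule of thumb noted in the full config.
config["model_kwargs"]["n_topics"] = 2 * config["model_kwargs"]["n_clusters"] + 10
print(config["model_kwargs"]["n_topics"])  # → 30
```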
```yaml
project_name: "example running config"
save_dir: "/path/to/save/"
description:

train_kwargs:
  early_stopping: true
  # early_stopping_patience:

data_kwargs: #MODIFY# based on your data
  batch_key: "batch"
  label_key: "cell_type"
  size_factor_key: "size_factor"
  annotation_key: "annotation"

model_kwargs:
  n_topics: 20
  n_clusters: 10

extra_kwargs:
  topicvi:
    model_kwargs:
    data_kwargs:
      default_cluster_key: 'leiden' # used to initialize the clustering; if not set, Leiden clustering runs automatically
    train_kwargs:
      pretrain_model: /path/to/pretrained/model # path to a pretrained model
      cl_weight: 1 #MODIFY# weight for the cluster loss; a larger value puts more emphasis on clustering
```
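Because the `data_kwargs` entries must name columns that actually exist in your AnnData object, a quick sanity check before training can catch typos early. This is an illustrative sketch, not part of the TopicVI API; `obs_columns` stands in for `adata.obs.columns`.

```python
# Columns referenced by the config; mirrors the data_kwargs section above.
data_kwargs = {
    "batch_key": "batch",
    "label_key": "cell_type",
    "size_factor_key": "size_factor",
}

# Stand-in for adata.obs.columns (assumed example data).
obs_columns = {"batch", "cell_type", "size_factor", "n_counts"}

# Collect any configured keys whose target column is missing from the data.
missing = [k for k, col in data_kwargs.items() if col not in obs_columns]
assert not missing, f"missing columns in adata.obs: {missing}"
print("data_kwargs OK")
```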
This configuration file includes all parameters that can be set when training TopicVI, allowing more detailed customization of the model and training process. Modify the parameters according to your specific requirements; "#HYPERPARAMETERS#" comments indicate where you can adjust hyperparameters for your model.
```yaml
# TopicVI Configuration File
project_name: "example running config"
save_dir: "/path/to/save/"
description:

# basic parameters for training
train_kwargs:
  max_epochs: 500
  batch_size: 512
  lr: 0.001
  early_stopping: true
  early_stopping_patience: 10

# basic parameters for data
data_kwargs:
  batch_key: "batch"
  label_key: "cell_type"
  size_factor_key: "size_factor"
  annotation_key: "annotation" # stored in `adata.uns` with keys [clusters, background]; set by the preprocessing function `topicvi.prior.add_prior_to_adata(key_added='annotation')`

# model parameters
model_kwargs:
  n_topics: 32 # if you do not know how many topics to use, start with 32 or set it to 2*n_clusters + 10
  n_clusters: 10

# extra parameters [for compatibility with the TopicVI benchmarking analysis]
# !IMPORTANT!: these override the parameters above if set
extra_kwargs:
  topicvi:
    data_kwargs:
      setup_kwargs: # additional setup parameters for data setup; see the scvi documentation
      default_cluster_key: 'leiden' # used to initialize the clustering; if not set, Leiden clustering runs automatically
    model_kwargs:
      max_init_cells: 10000 # max cells used for initialization [only used when `default_cluster_key` is not set]
      running_mode: 'unsupervised' # 'unsupervised' or 'supervised'
      topic_decoder_params:
        topic_similarity_penalty_weight: 1 #HYPERPARAMETERS# weight for the topic similarity penalty
        n_topics_without_prior: #HYPERPARAMETERS# number of topics without a prior; if not set, 25% of n_topics is used
      cluster_decoder_params:
        alpha: 10 #HYPERPARAMETERS# used for the DCE loss
        adaptive_penalty_weight: 1 #HYPERPARAMETERS# weight for the adaptive penalty
        selftraining_penalty_weight: 1 #HYPERPARAMETERS# weight for the self-training penalty
        center_penalty_weight: 1 #HYPERPARAMETERS# weight for the center penalty
      topicvi_kwargs:
        n_hidden:
        n_latent:
        n_layers:
        dropout_rate: 0 # should be 0 in the TopicVI model
        dispersion: 'gene' # 'gene', 'gene-batch', or 'gene-label'
        gene_likelihood: 'nb' # 'nb', 'zinb', or 'normal'
    train_kwargs:
      pretrain_model: /path/to/pretrained/model # path to a pretrained model; if the path does not exist, the model is trained from scratch
      pretrain_kwargs: # parameters for pretraining the scvi model; see the scvi documentation
      gene_emb_dir: /path/to/gene/embeddings # include gene embeddings from an external source. Not recommended; in practice it does not improve the model
      ## HYPERPARAMETERS ##
      kl_weight: 1 # weight for the KL divergence loss
      ce_weight: 1 # weight for the topic-building loss
      cl_weight: 1 # weight for the cluster loss
      classification_ratio: 1 # ratio for the classification loss in supervised mode
```
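The override behaviour of `extra_kwargs` noted above can be pictured as a recursive dictionary merge, where nested values under `extra_kwargs.topicvi` take precedence over the top-level sections. The sketch below is an assumed illustration, not TopicVI's actual merge code; `deep_merge` is a hypothetical helper.

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`; override values win on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        elif value is not None:
            merged[key] = value
    return merged

# Top-level sections vs. overrides, mirroring the config structure above.
top_level = {"train_kwargs": {"lr": 0.001, "batch_size": 512}}
extra = {"train_kwargs": {"lr": 0.0005, "cl_weight": 1}}

print(deep_merge(top_level, extra)["train_kwargs"])
# → {'lr': 0.0005, 'batch_size': 512, 'cl_weight': 1}
```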