
Set Configs

gx-Cai edited this page Jul 17, 2025 · 1 revision

Setting configs for training TopicVI

The TopicVI model can be configured using a YAML file. Below is an example configuration file that you can use to set up your training parameters.

Use the following code to create a default configuration file:

```python
import os

import topicvi
import yaml

config = topicvi.make_default_config(
    project_name="example running config",
    save_dir="/path/to/save/",
)
# config is a dictionary containing the default configuration;
# you can modify it in Python or save it to a YAML file

os.makedirs(config['save_dir'], exist_ok=True)
topicvi.utils.write_config(
    config,
    os.path.join(config['save_dir'], "config.yaml"),
)
```
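As noted in the comments above, the returned config is a plain dictionary, so nested values can be adjusted in Python before writing it out. A minimal sketch, using a stand-in dictionary with the same shape as the example files below (the specific keys and values are illustrative, not the full default config):

```python
# stand-in for the dict returned by topicvi.make_default_config
config = {
    "project_name": "example running config",
    "save_dir": "/path/to/save/",
    "model_kwargs": {"n_topics": 20, "n_clusters": 10},
    "train_kwargs": {"early_stopping": True},
}

# tweak nested values in place before saving
config["model_kwargs"]["n_topics"] = 32
config["train_kwargs"]["early_stopping_patience"] = 10

print(config["model_kwargs"]["n_topics"])  # → 32
```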

Pass the configuration (either the dictionary or the path to a saved YAML file) when training the model:

```python
topicvi.run_topicvi(
    adata,
    config=config,  # or the path to the config file
)
```
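Since either form is accepted, the dictionary and the saved file are interchangeable. A quick round-trip check with PyYAML (the library already imported above) confirms a config dict survives the save/load cycle unchanged; the small dict here is illustrative:

```python
import yaml

# round-trip: dump the config to YAML text, then load it back
config = {"project_name": "example running config",
          "model_kwargs": {"n_topics": 32, "n_clusters": 10}}

text = yaml.safe_dump(config)
reloaded = yaml.safe_load(text)
assert reloaded == config  # passing the dict or the file path is equivalent
```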

Simple Configuration File

Only the essential parameters are included in this configuration file. This is useful for quick setups, or when you want a basic training run without many customizations. Pay attention to the "#MODIFY#" comments, which indicate where you should adjust parameters to your needs.

```yaml
project_name: "example running config"
save_dir: "/path/to/save/"
description:

train_kwargs:
  early_stopping: true
  # early_stopping_patience:

data_kwargs: #MODIFY# based on your data
  batch_key: "batch"
  label_key: "cell_type"
  size_factor_key: "size_factor"
  annotation_key: "annotation"

model_kwargs:
  n_topics: 20
  n_clusters: 10

extra_kwargs:
  topicvi:
    model_kwargs:
    data_kwargs:
      default_cluster_key: 'leiden' # used to initialize the clustering; if not set, Leiden clustering is run automatically
    train_kwargs:
      pretrain_model: /path/to/pretrained/model # path to a pretrained model
      cl_weight: 1 #MODIFY# weight for the cluster loss; a larger value puts more focus on clustering
```
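If you are unsure how many topics to use, the detailed configuration file suggests the heuristic `2*n_clusters + 10`. A one-line sketch (the helper name `suggest_n_topics` is hypothetical, introduced here only for illustration):

```python
def suggest_n_topics(n_clusters: int) -> int:
    """Heuristic from the detailed config file: 2 * n_clusters + 10."""
    return 2 * n_clusters + 10

print(suggest_n_topics(10))  # → 30
```

For the `n_clusters: 10` used above, this would suggest raising `n_topics` from 20 to 30.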

Detailed Configuration File

This configuration file includes all parameters that can be set for training TopicVI, allowing more detailed customization of the model and training process. Modify the parameters according to your specific requirements. "#HYPERPARAMETERS#" comments indicate where you can adjust hyperparameters for your model.

```yaml
# TopicVI Configuration File

project_name: "example running config"
save_dir: "/path/to/save/"
description:

# basic parameters for training
train_kwargs:
  max_epochs: 500
  batch_size: 512
  lr: 0.001
  early_stopping: true
  early_stopping_patience: 10

# basic parameters for data
data_kwargs:
  batch_key: "batch"
  label_key: "cell_type"
  size_factor_key: "size_factor"
  annotation_key: "annotation" # stored in `data.uns` with keys [clusters, background]; set in the preprocessing function `topicvi.prior.add_prior_to_adata(key_added='annotation')`

# model parameters
model_kwargs:
  n_topics: 32 # if you do not know how many topics to use, start with 32 or set it to 2*n_clusters+10
  n_clusters: 10

# extra parameters [for compatibility with TopicVI benchmarking analyses]
# !IMPORTANT!: these override the parameters above if set
extra_kwargs:
  topicvi:
    data_kwargs:
      setup_kwargs: # additional parameters for data setup; see the scvi documentation
      default_cluster_key: 'leiden' # used to initialize the clustering; if not set, Leiden clustering is run automatically
    model_kwargs:
      max_init_cells: 10000 # max cells to use for initialization [only used when `default_cluster_key` is not set]
      running_mode: 'unsupervised' # 'unsupervised' or 'supervised'
      topic_decoder_params:
        topic_similarity_penalty_weight: 1 #HYPERPARAMETERS# weight for the topic similarity penalty
        n_topics_without_prior: #HYPERPARAMETERS# number of topics without prior; if not set, 25% of n_topics is used
      cluster_decoder_params:
        alpha: 10 #HYPERPARAMETERS# used for the DCE loss
        adaptive_penalty_weight: 1 #HYPERPARAMETERS# weight for the adaptive penalty
        selftraining_penalty_weight: 1 #HYPERPARAMETERS# weight for the self-training penalty
        center_penalty_weight: 1 #HYPERPARAMETERS# weight for the center penalty
      topicvi_kwargs:
        n_hidden:
        n_latent:
        n_layers:
        dropout_rate: 0 # must be 0 in the TopicVI model
        dispersion: 'gene' # 'gene', 'gene-batch', or 'gene-label'
        gene_likelihood: 'nb' # 'nb', 'zinb', or 'normal'
    train_kwargs:
      pretrain_model: /path/to/pretrained/model # path to a pretrained model; if it does not exist, the model is trained from scratch
      pretrain_kwargs: # parameters for pretraining the scvi model; see the scvi documentation
      gene_emb_dir: /path/to/gene/embeddings # include gene embeddings from another source; not recommended, in practice it does not improve the model
      ##HYPERPARAMETERS##
      kl_weight: 1 # weight for the KL divergence loss
      ce_weight: 1 # weight for the topic building loss
      cl_weight: 1 # weight for the cluster loss
      classification_ratio: 1 # ratio for the classification loss in supervised mode
```
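As the !IMPORTANT! note says, values under `extra_kwargs` override the top-level `*_kwargs` when both are set. A sketch of that override behavior as a recursive dictionary merge (the `deep_merge` helper is illustrative, not TopicVI's actual implementation; blank YAML keys load as `None`, so they are skipped rather than overriding):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        elif value is not None:  # blank YAML keys load as None
            merged[key] = value
    return merged

top_level = {"model_kwargs": {"n_topics": 32, "n_clusters": 10}}
extra = {"model_kwargs": {"running_mode": "unsupervised", "n_topics": 20}}

effective = deep_merge(top_level, extra)["model_kwargs"]
print(effective)  # {'n_topics': 20, 'n_clusters': 10, 'running_mode': 'unsupervised'}
```

Here the `n_topics: 20` under the extra section wins over the top-level `n_topics: 32`, while `n_clusters` is kept.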
