Computational analysis for "Comparative Analysis of AAV Serotypes for the Transduction of Olfactory Sensory Neurons" by Belfort and Jia et al. 2024
This document describes three Python scripts designed to handle AnnData objects for Gene Expression Omnibus (GEO) submissions: a preprocessing script, an AnnData to GEO conversion script, and a GEO to AnnData reconstruction script.
Requirements:
- Python
- pandas
- anndata
- scipy
- numpy

preprocess_geo_files.py
Standardizes the file structure of GEO submission files by removing filename prefixes and organizing the files into a consistent layout.
Arguments:
- input_dir: Path to the directory containing all screen folders and final object folders.
- output_dir: Path to the directory where the standardized files are written.
Output:
- Standardized file structure in the specified output directory.
Processing steps:
- Processes each subdirectory in the input directory.
- Removes prefixes from filenames.
- Copies files to a new directory structure with standardized names.
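The processing steps above can be sketched roughly as follows. This is not the script's actual code: the prefix rule (a leading GEO accession such as `GSM1234567_`, held in the hypothetical `PREFIX_PATTERN`), the output layout `output_dir/<subdirectory>/<filename>`, and the helper names `strip_prefix` and `standardize` are all assumptions for illustration.

```python
from __future__ import annotations

import re
import shutil
import sys
from pathlib import Path

# Hypothetical prefix rule: a leading GEO accession such as "GSM1234567_"
# or "GSE123456_". The real script may define prefixes differently.
PREFIX_PATTERN = re.compile(r"^(?:GSM|GSE)\d+_")


def strip_prefix(filename: str) -> str:
    """Remove one leading accession-style prefix, if present."""
    return PREFIX_PATTERN.sub("", filename, count=1)


def standardize(input_dir: Path, output_dir: Path) -> list[Path]:
    """Copy files from each subdirectory of input_dir into
    output_dir/<subdirectory name>/<filename without prefix>."""
    written = []
    for subdir in sorted(p for p in input_dir.iterdir() if p.is_dir()):
        target_dir = output_dir / subdir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(p for p in subdir.iterdir() if p.is_file()):
            dst = target_dir / strip_prefix(src.name)
            shutil.copy2(src, dst)  # copies file contents and metadata
            written.append(dst)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    standardize(Path(sys.argv[1]), Path(sys.argv[2]))
```

Two files whose names differ only by prefix would map to the same target, and the later copy would overwrite the earlier one; a production version may want to detect that case.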
Usage:
python preprocess_geo_files.py /path/to/input/directory /path/to/output/directory

anndata_to_geo.py
Converts an AnnData object to the format required for GEO submission.
Arguments:
- input_file: Path to the input .h5ad file.
- output_dir: Path to the output directory for GEO submission files.
- --prefix (optional): Prefix to add to output files.
- --chunk_size (optional): Chunk size for splitting large matrices (default: 1,000,000).
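The unit that `--chunk_size` counts is not stated. The sketch below assumes it bounds the number of nonzero entries per output file and that each chunk is written as (row, col, value) triplets; the triplet format, the `matrix_chunk_<i>.csv.gz` filenames, and the helper names `iter_triplet_chunks` and `write_matrix_chunks` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator

import pandas as pd
from scipy import sparse


def iter_triplet_chunks(X, chunk_size: int = 1_000_000) -> Iterator[pd.DataFrame]:
    """Yield (row, col, value) tables with at most chunk_size entries each.

    Assumption: chunk_size counts stored (nonzero) entries, not rows.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    coo = sparse.coo_matrix(X)  # accepts dense arrays and any scipy sparse format
    for start in range(0, coo.nnz, chunk_size):
        stop = start + chunk_size
        yield pd.DataFrame(
            {
                "row": coo.row[start:stop],
                "col": coo.col[start:stop],
                "value": coo.data[start:stop],
            }
        )


def write_matrix_chunks(X, out_dir, prefix: str = "", chunk_size: int = 1_000_000) -> list[Path]:
    """Write each chunk to a gzip-compressed CSV (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(iter_triplet_chunks(X, chunk_size)):
        path = out_dir / f"{prefix}matrix_chunk_{i}.csv.gz"
        chunk.to_csv(path, index=False)  # compression inferred from ".gz"
        paths.append(path)
    return paths
```

Because triplets record only nonzero positions, the matrix shape must be recovered elsewhere, for example from the lengths of the exported obs and var tables.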
Output:
- GEO submission files in the specified output directory, including:
  - Cell metadata (obs)
  - Gene metadata (var)
  - Count matrix (in chunks)
  - Layer data (if present)
  - Raw data (if present)
  - README file
Processing steps:
- Reads an AnnData object.
- Saves metadata, count matrix, layers, and raw data (if present) in GEO-compatible format.
- Splits large matrices into chunks.
- Generates a README file describing the dataset.
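The metadata and README steps could look roughly like the sketch below. It accepts any AnnData-like object exposing `obs`, `var`, `raw`, `layers`, `n_obs`, and `n_vars`; the output filenames, the README fields, and the helper name `write_metadata_and_readme` are assumptions rather than the script's actual format, and matrix export (X, layers, raw.X) is left out.

```python
from __future__ import annotations

from pathlib import Path


def write_metadata_and_readme(adata, out_dir, prefix: str = "") -> list[Path]:
    """Write obs/var tables and a plain-text README for an AnnData-like object.

    Hypothetical filenames; count matrix, layers and raw matrices are not handled here.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    obs_path = out_dir / f"{prefix}obs.csv"
    adata.obs.to_csv(obs_path)  # index holds the cell names (obs_names)
    written.append(obs_path)

    var_path = out_dir / f"{prefix}var.csv"
    adata.var.to_csv(var_path)  # index holds the gene names (var_names)
    written.append(var_path)

    has_raw = adata.raw is not None
    if has_raw:
        raw_var_path = out_dir / f"{prefix}raw_var.csv"
        adata.raw.var.to_csv(raw_var_path)  # raw may cover more genes than var
        written.append(raw_var_path)

    lines = [
        f"Cells (obs): {adata.n_obs}",
        f"Genes (var): {adata.n_vars}",
        f"Layers: {', '.join(adata.layers.keys()) or 'none'}",
        f"Raw data present: {'yes' if has_raw else 'no'}",
    ]
    readme_path = out_dir / f"{prefix}README.txt"
    readme_path.write_text("\n".join(lines) + "\n")
    written.append(readme_path)
    return written
```

Writing metadata as CSV keeps it readable for GEO reviewers, at the cost of losing pandas dtypes such as categoricals on the way back in.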
Usage:
python anndata_to_geo.py input.h5ad /path/to/output/directory --prefix optional_prefix_ --chunk_size 500000

geo_to_anndata.py
Reconstructs an AnnData object from GEO submission files.
Arguments:
- input_dir: Path to the directory containing GEO submission files.
- output_file: Path to save the reconstructed .h5ad file.
Output:
- Reconstructed AnnData object saved as an .h5ad file.
Processing steps:
- Automatically detects file prefixes.
- Reads cell metadata, gene metadata, and count matrix.
- Reconstructs layers and raw data if present.
- Converts matrices to sparse format if they contain more than 50% zero values.
- Creates and saves an AnnData object.
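The reconstruction steps can be sketched as below, assuming a hypothetical export layout of `<prefix>obs.csv`, `<prefix>var.csv`, and `<prefix>matrix_chunk_<i>.csv.gz` files holding (row, col, value) triplets. The names `detect_prefix`, `assemble_matrix`, and `read_geo_dir` are illustrative, layers and raw data are omitted, and the sparsity rule mirrors the 50% threshold described above.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np
import pandas as pd
from scipy import sparse


def detect_prefix(filenames, marker: str = "obs.csv") -> str:
    """Infer the shared filename prefix from the file that ends in `marker`.

    Hypothetical convention: <prefix>obs.csv, <prefix>var.csv, ...
    """
    for name in sorted(filenames):
        if name.endswith(marker):
            return name[: -len(marker)]
    raise FileNotFoundError(f"no file ending in {marker!r}")


def assemble_matrix(chunks, shape, sparse_threshold: float = 0.5):
    """Rebuild a matrix from (row, col, value) chunk tables.

    Returns CSR when more than `sparse_threshold` of all entries are zero,
    otherwise a dense ndarray. Values are stored as float64 here.
    """
    if chunks:
        triplets = pd.concat(chunks, ignore_index=True)
    else:
        triplets = pd.DataFrame({"row": [], "col": [], "value": []})
    mat = sparse.coo_matrix(
        (
            triplets["value"].to_numpy(dtype=np.float64),
            (triplets["row"].to_numpy(dtype=np.int64), triplets["col"].to_numpy(dtype=np.int64)),
        ),
        shape=shape,
    ).tocsr()  # duplicate (row, col) pairs, if any, are summed
    n_total = shape[0] * shape[1]
    zero_fraction = 1.0 - mat.count_nonzero() / n_total if n_total else 1.0
    return mat if zero_fraction > sparse_threshold else mat.toarray()


def read_geo_dir(input_dir):
    """Return (obs, var, X) read from a directory of exported files."""
    input_dir = Path(input_dir)
    prefix = detect_prefix(p.name for p in input_dir.iterdir())
    obs = pd.read_csv(input_dir / f"{prefix}obs.csv", index_col=0)
    var = pd.read_csv(input_dir / f"{prefix}var.csv", index_col=0)
    obs.index = obs.index.astype(str)  # AnnData expects string names
    var.index = var.index.astype(str)
    chunk_paths = sorted(input_dir.glob(f"{prefix}matrix_chunk_*.csv.gz"))
    chunks = [pd.read_csv(p) for p in chunk_paths]
    X = assemble_matrix(chunks, (len(obs), len(var)))
    return obs, var, X
```

With anndata installed, the returned pieces can be combined and saved with `anndata.AnnData(X=X, obs=obs, var=var).write_h5ad(output_file)`. Chunk order does not matter here, since each triplet carries its own coordinates.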
Usage:
python geo_to_anndata.py /path/to/geo/submission/files /path/to/output/reconstructed_file.h5ad

Example workflow:
- Standardize your file structure:
  python preprocess_geo_files.py /original/files /standardized/files
- Convert AnnData to GEO format (if needed):
  python anndata_to_geo.py input.h5ad /geo/submission/files --prefix screen1_
- Reconstruct AnnData from GEO files:
  python geo_to_anndata.py /geo/submission/files reconstructed_data.h5ad
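After the last step it may be worth confirming that the round trip preserved the data. The check below is a hypothetical helper, not part of the three scripts; it compares cell and gene names, obs column names, and X values between an original and a reconstructed AnnData-like object. Densifying X is only practical for modest matrix sizes.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def _to_dense(X) -> np.ndarray:
    """Return X as a dense ndarray, whether it is sparse or already dense."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


def check_roundtrip(original, reconstructed, rtol: float = 1e-6) -> None:
    """Raise AssertionError if names, obs columns, or X differ after a round trip.

    Works on any AnnData-like objects exposing obs_names, var_names, obs and X.
    """
    pd.testing.assert_index_equal(
        original.obs_names.astype(str), reconstructed.obs_names.astype(str), check_names=False
    )
    pd.testing.assert_index_equal(
        original.var_names.astype(str), reconstructed.var_names.astype(str), check_names=False
    )
    assert list(original.obs.columns) == list(reconstructed.obs.columns), "obs columns differ"
    np.testing.assert_allclose(_to_dense(original.X), _to_dense(reconstructed.X), rtol=rtol)
```

For example: `check_roundtrip(anndata.read_h5ad("input.h5ad"), anndata.read_h5ad("reconstructed_data.h5ad"))`. Dense and sparse X compare equal as long as their values match.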
Remember to adjust file paths and options according to your specific dataset and requirements.