Written by: Dieter van der Westhuizen
Date: 2023
This script is designed for filtering mitochondrial genome data in a bioinformatics pipeline. Given an input .txt file containing filter criteria and a .tsv file containing the dataset, the script filters the dataset based on the specified criteria and outputs the filtered data to a new CSV file.
- User-friendly interface: Provides a list of available `.txt` and `.tsv` files in the current working directory for the user to select.
- Separator detection: Dynamically detects the separator used in the input files (such as comma, semicolon, pipe, or tab) for accurate reading.
- Filtering mechanism: Efficiently filters the `.tsv` dataset using criteria specified in the `.txt` file and merges additional information from the filter list into the final result.
- CSV output: Exports the filtered dataset to a new CSV file, which is named by combining the names of the `.txt` and `.tsv` files.
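
The separator detection listed above could be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function name `detect_separator`, the fallback to tab, and the count-based heuristic on the header line are all assumptions.

```python
# Candidate separators named in the feature list: comma, semicolon, pipe, tab.
CANDIDATE_SEPARATORS = [",", ";", "|", "\t"]


def detect_separator(first_line: str, default: str = "\t") -> str:
    """Guess the separator of a file from its header line.

    Hypothetical helper: picks the candidate that occurs most often in the
    header and falls back to `default` when no candidate occurs at all.
    """
    counts = {sep: first_line.count(sep) for sep in CANDIDATE_SEPARATORS}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


if __name__ == "__main__":
    # Example header lines (made up for illustration).
    print(repr(detect_separator("Gene Symbol;Description")))
    print(repr(detect_separator("SYMBOL\tCHROM\tPOS")))
```

The detected value can then be passed to `pandas.read_csv` through its `sep` parameter. A header-only heuristic can be fooled by column names that contain a candidate character, so a real file may need a larger sample.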
- Place the script in a directory containing your `.txt` and `.tsv` files.
- Run the script. You'll first be prompted to select a `.txt` file containing your filter criteria.
- Next, you'll be prompted to select a `.tsv` file containing the dataset.
- The script will then detect the separators, filter the dataset, and save the result to a new CSV file in the same directory.
- Check the newly generated CSV file for your filtered results.
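
The file selection and output naming in the steps above might look roughly like this. It is a sketch under assumptions: the helper names `list_files` and `output_name` are hypothetical, and joining the two file stems with an underscore is a guess at how the names are combined.

```python
from pathlib import Path


def list_files(directory, extension):
    """Return the sorted names of files in `directory` with the given extension."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == extension
    )


def output_name(txt_name, tsv_name):
    """Build the CSV file name from the two input names.

    Assumption: the stems are joined with an underscore; the actual script
    may combine them differently.
    """
    return f"{Path(txt_name).stem}_{Path(tsv_name).stem}.csv"


if __name__ == "__main__":
    # Show the candidate files in the current working directory as a numbered menu.
    for extension in (".txt", ".tsv"):
        print(f"Available {extension} files:")
        for number, name in enumerate(list_files(".", extension), start=1):
            print(f"  {number}. {name}")
```

For example, `output_name("genes.txt", "variants.tsv")` gives `genes_variants.csv` under this naming assumption.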
- Python environment with the `pandas` library installed.
- Ensure that the `.txt` file contains a column named "Gene Symbol" and the `.tsv` file has a column named "SYMBOL" for the filtering to work correctly.
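
To show why both column names matter, here is a minimal sketch of a filter-and-merge step in pandas. It is not the script's own code: the function name `filter_dataset`, the use of an inner merge, and the example data are assumptions made for illustration.

```python
import pandas as pd


def filter_dataset(criteria: pd.DataFrame, dataset: pd.DataFrame) -> pd.DataFrame:
    """Keep dataset rows whose SYMBOL appears in the filter list.

    Assumed approach: an inner merge on "SYMBOL" (dataset) and "Gene Symbol"
    (filter list), which both filters the rows and appends the extra
    filter-list columns to each match.
    """
    # Drop repeated gene symbols so one dataset row never matches twice.
    unique_criteria = criteria.drop_duplicates(subset="Gene Symbol")
    return dataset.merge(
        unique_criteria, how="inner", left_on="SYMBOL", right_on="Gene Symbol"
    )


if __name__ == "__main__":
    # Made-up example data, not taken from any real dataset.
    criteria = pd.DataFrame(
        {"Gene Symbol": ["MT-ND1", "MT-CO1"], "Note": ["complex I", "complex IV"]}
    )
    dataset = pd.DataFrame(
        {"SYMBOL": ["MT-ND1", "MT-CYB", "MT-ND1"], "POS": [3307, 14747, 3400]}
    )
    print(filter_dataset(criteria, dataset))
```

If either column is missing, the merge raises a `KeyError`. The result can be written out with `DataFrame.to_csv(path, index=False)`.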
### Setting Up Anaconda
- If you haven't installed Anaconda yet, download and install it from the official website.
- Open Anaconda Navigator or the Anaconda terminal.

### Creating and Activating a New Environment
- Create a new environment with the desired Python version (e.g., `3.8`): `conda create --name mito_filtering python=3.8`
- Activate the environment: `conda activate mito_filtering`

### Installing Dependencies
Install the required `pandas` library: `conda install pandas`
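
One optional way to confirm that the active environment has what the script needs is a short check like the one below. This is a sketch, not part of the original script, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pandas"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If `pandas` is reported as missing, check that the `mito_filtering` environment is the one currently activated.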

### Running the Script
- Navigate to the script directory: `cd path_to_your_script_directory`
- Run the script: `python mitochondrial_analysis.py`

### Closing Anaconda
Once done, you can deactivate the Anaconda environment: `conda deactivate`

### Setting Up Jupyter Notebook in Anaconda
Ensure you've activated the Anaconda environment where you have the required dependencies installed. If you haven't, refer to the instructions in the "Setting Up Anaconda" and "Creating and Activating a New Environment" sections above.
Install Jupyter within the environment:
```bash
conda install jupyter
```
Launch Jupyter Notebook:
```bash
jupyter notebook
```
You can run each cell by selecting the cell and clicking the "Run" button or by pressing Shift + Enter. Make sure you run the cells in order, especially if they have dependencies on previous cells. You'll see the outputs (e.g., print statements) right below each cell after you run it. If you want to save the outputs, click on "File" and then "Save and Checkpoint". Once you're done, you can close the Jupyter Notebook browser tab and stop the Jupyter server in your terminal by pressing Ctrl + C twice.