Project Plan

William Wallén edited this page Apr 3, 2025 · 9 revisions

Project Plan: Re-analysis of Streptomyces rimosus (Paper 2)

1. Introduction

In Paper 2, “Systems biology of industrial oxytetracycline production in Streptomyces rimosus: the secrets of a mutagenized hyperproducer” (Beganovic et al., 2023), the authors investigate why a particular mutagenized strain (S. rimosus HP126) significantly outperforms the wild-type (S. rimosus R7) in producing oxytetracycline, which is an important antibiotic. To understand this hyperproducer, they apply multiomics techniques.

Overall, Beganovic et al. concluded that the hyperproducer's advantage over the wild-type stems from mutations affecting the metabolic pathways that feed oxytetracycline production, not from mutations in the oxytetracycline gene cluster itself. They found that precursor supply for this cluster was upregulated, while competing genes were downregulated, leading to increased production of the antibiotic.

Our project will re-analyze the raw sequencing data from Beganovic et al. (2023). Specifically, we will replicate (to an extent) steps such as read trimming, genome assembly and evaluation, annotation, and comparative genomics to confirm the major rearrangements and expression shifts that drive enhanced oxytetracycline yields in S. rimosus HP126. We will also map and count RNA-seq reads and compare synteny with the wild-type genome.

2. Aim and Research Questions

Aim:
Re-analyze the genome and transcriptome data from wild-type S. rimosus and a mutagenized hyperproducer strain to:

  1. Evaluate how the genome assembly compares to the reference wild-type (e.g. synteny).
  2. Investigate differential gene expression (RNA-seq).
  3. Assess regulatory differences that might explain enhanced antibiotic production.

Research Questions

  1. Genomic Rearrangements
    • Do our assembly and analysis confirm the published large-scale deletions or duplications?
  2. Gene Expression Patterns
    • Which biosynthetic cluster genes show up/downregulation in the hyperproducer relative to the wild-type?

3. Data Description

  • Whole-genome sequencing (WGS) data: Illumina short reads for both wild-type and mutant, plus Nanopore long reads.
  • RNA-seq reads for both wild-type and mutant (used for differential expression).
  • Metadata for documentation of the data, for example, where the data came from and what the columns mean.

Total file size estimations:

  • Raw data files are about 5 GB.
  • Roughly 30 GB or more of total storage once the raw data is uncompressed and intermediate files (e.g., assemblies, alignments) are included.

4. Project Organisation

  • Metadata: We will keep metadata together with its corresponding data to document what the data means. It should be well-structured and easy to read. The metadata for the reads was obtained from NCBI.

  • Structure: There should be a clear structure in the working directory. Scripts, output files and data are kept in different directories named appropriately.

  • File Naming: The file naming should be clear. The convention I will use for output directories is dd-mm_analysis_informative-name-of-sample.
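The naming convention above can be generated programmatically so that every output directory follows the same pattern. This is a minimal sketch, not part of the original plan: the function name `output_dir_name` and the example analysis and sample names are hypothetical, and replacing spaces and underscores inside a field with hyphens is an assumption made so that the underscore stays unambiguous as the field separator.

```python
from datetime import date


def output_dir_name(analysis: str, sample: str, when: date) -> str:
    """Build a name of the form dd-mm_analysis_informative-name-of-sample."""

    def clean(field: str) -> str:
        # Assumption: underscores separate fields, so use hyphens inside a field.
        return field.strip().replace(" ", "-").replace("_", "-")

    return f"{when:%d-%m}_{clean(analysis)}_{clean(sample)}"


# Hypothetical example: a FastQC run on the HP126 Illumina reads.
print(output_dir_name("fastqc", "HP126 illumina", date(2025, 4, 7)))
```

Passing the date explicitly, rather than reading the current date inside the function, keeps the name reproducible when an analysis is rerun later.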

5. Planned Analyses and Workflow

The planned analyses are listed below, each with its tools, goal, and estimated time.

5.1 Read Quality Control

  • Tools: FastQC
  • Goal: Check the quality of the raw data
  • Estimated Time: ~15 minutes
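The quality-control step can be scripted by building the FastQC call as an argument list. This is a sketch under assumptions: the file paths and output directory are hypothetical placeholders, and the helper `fastqc_command` is our own name. It uses only FastQC's `-o` (output directory) and `-t` (threads) options, and it only builds the command; it does not run it.

```python
import shlex


def fastqc_command(reads, out_dir, threads=2):
    """Return the FastQC argument list for one or more FASTQ files."""
    # -o: output directory (FastQC expects it to exist already)
    # -t: number of files that may be processed in parallel
    return ["fastqc", "-o", str(out_dir), "-t", str(threads), *[str(r) for r in reads]]


# Hypothetical input files and output directory.
cmd = fastqc_command(
    ["data/HP126_R1.fastq.gz", "data/HP126_R2.fastq.gz"],
    "results/07-04_fastqc_HP126",
)
print(shlex.join(cmd))
# To execute: create out_dir first, then call subprocess.run(cmd, check=True).
```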

5.2 Read Trimming/Preprocessing

  • Tools: Trimmomatic (short reads) and Porechop (long reads)
  • Goal: Remove adapters, low-quality bases
  • Estimated Time: 1.5h for long reads, 15 min for short reads
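The two trimming calls could be assembled as follows. This is a sketch, not the settings we have committed to: the trimming steps are the example values from the Trimmomatic manual, `TruSeq3-PE.fa` is an adapter file shipped with Trimmomatic that may not match the library actually used, the `trimmomatic` wrapper name assumes a conda-style install (otherwise `java -jar trimmomatic.jar`), and all file names are hypothetical.

```python
def trimmomatic_pe_command(r1, r2, out_prefix, adapters="TruSeq3-PE.fa", threads=2):
    """Build a paired-end Trimmomatic call (short reads)."""
    # Trimmomatic PE expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [
        f"{out_prefix}_{mate}_{kind}.fastq.gz"
        for mate in ("R1", "R2")
        for kind in ("paired", "unpaired")
    ]
    return [
        "trimmomatic", "PE", "-threads", str(threads), r1, r2, *outputs,
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter removal
        "LEADING:3", "TRAILING:3",           # trim low-quality read ends
        "SLIDINGWINDOW:4:15",                # cut where mean quality in a 4-base window < 15
        "MINLEN:36",                         # drop reads shorter than 36 bases
    ]


def porechop_command(reads_in, reads_out, threads=2):
    """Build a Porechop call (adapter trimming for Nanopore long reads)."""
    return ["porechop", "-i", reads_in, "-o", reads_out, "--threads", str(threads)]


# Hypothetical file names.
print(" ".join(trimmomatic_pe_command("HP126_R1.fastq.gz", "HP126_R2.fastq.gz", "trimmed/HP126")))
print(" ".join(porechop_command("HP126_nanopore.fastq.gz", "trimmed/HP126_nanopore.fastq.gz")))
```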

5.3 Genome Assembly and Assembly Evaluation

  1. Assembly
    • Tools: Flye (assembly) and Pilon (polishing)
    • Goal: Reconstruct the S. rimosus genome for each strain
    • Estimated Time: 2h
  2. Assembly Evaluation
    • Tools: QUAST and MUMmerplot
    • Goal: Evaluate our assembly outputs
    • Estimated Time: 5-10 min
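QUAST reports contiguity statistics such as N50. To make clear what that number means when we read the report, here is a small illustrative sketch that computes N50 from contig lengths. It is not QUAST's implementation, and the toy FASTA records are made up.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Toy records (made up); in practice pass an open assembly FASTA file.
toy = [">contig_1", "ACGTACGT", "ACGT", ">contig_2", "ACGT"]
print(n50(list(fasta_lengths(toy))))
```

A higher N50 at a similar total length indicates a more contiguous assembly, which is one of the things to compare between our assemblies and the published reference.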

5.4 Annotation

  • Tools: Prokka and eggNOG-mapper
  • Goal: Identify genes, including those in the oxytetracycline cluster
  • Estimated Time: 10 min (Prokka), 13 h (eggNOG-mapper)
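Once Prokka has produced a GFF3 file, candidate genes can be pulled out by searching the `product` attribute. This is a rough sketch under assumptions: the helper names, locus tags, and products in the toy lines are hypothetical, a keyword search on product descriptions is only a first pass and will miss genes annotated as hypothetical proteins, and URL-escaped attribute values are not decoded.

```python
def parse_gff_cds(lines):
    """Yield (seqid, start, end, strand, attributes) for each CDS in GFF3 lines."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends the contig sequences after this marker
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        yield cols[0], int(cols[3]), int(cols[4]), cols[6], attrs


def find_by_product(lines, keyword):
    """Return CDS records whose product description contains the keyword."""
    keyword = keyword.lower()
    return [rec for rec in parse_gff_cds(lines)
            if keyword in rec[4].get("product", "").lower()]


# Toy GFF3 lines (made up); in practice pass an open Prokka .gff file.
toy_gff = [
    "##gff-version 3",
    "contig_1\tProdigal:002006\tCDS\t10\t400\t.\t+\t0\tID=X_00001;product=hypothetical protein",
    "contig_1\tProdigal:002006\tCDS\t500\t900\t.\t-\t0\tID=X_00002;product=Polyketide synthase",
    "##FASTA",
    ">contig_1",
    "ACGT",
]
for seqid, start, end, strand, attrs in find_by_product(toy_gff, "polyketide"):
    print(seqid, start, end, strand, attrs["ID"])
```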

5.5 Mapping and Expression Analysis

  1. Alignment
    • Tools: BWA
    • Goal: Align the reads to the reference genome
    • Estimated Time: 15 min per sample
  2. Differential Expression
    • Tools: HTSeq, featureCounts
    • Goal: Quantify how many RNA-seq reads map to each gene, as input for the differential expression analysis
    • Estimated Time: 30 min per sample
  3. Gene Comparison
    • Tools: BLAST
    • Goal: Check for large differences or confirm the presence/absence of specific genes in the hyperproducer
    • Estimated Time: 1 min
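To illustrate how per-gene read counts turn into an expression comparison, the sketch below normalizes counts to counts per million (CPM) and computes a log2 fold change between hyperproducer and wild-type. This is deliberately naive and is not the planned DE method: it performs no statistical test and ignores replicates, which a dedicated DE package would handle. The gene names and counts are made up, and the pseudocount of 1 is an assumption to avoid dividing by zero.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(wt_counts, hp_counts, pseudocount=1.0):
    """log2(hyperproducer / wild-type) on CPM values, for genes present in both."""
    wt, hp = cpm(wt_counts), cpm(hp_counts)
    return {
        gene: math.log2((hp[gene] + pseudocount) / (wt[gene] + pseudocount))
        for gene in wt
        if gene in hp
    }


# Made-up counts for two hypothetical genes.
wild_type = {"gene_A": 120, "gene_B": 880}
hyperproducer = {"gene_A": 600, "gene_B": 400}
for gene, lfc in log2_fold_change(wild_type, hyperproducer).items():
    print(f"{gene}\t{lfc:+.2f}")
```

Positive values indicate higher expression in the hyperproducer and negative values indicate lower expression.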

6. Deadlines

The deadlines below show when each analysis has to be done for the project to finish on time.

07-04-2025 Genome Assembly

10-04-2025 DNA mapping

22-04-2025 Polishing

25-04-2025 Annotation

28-04-2025 Comparative Genomics

02-05-2025 RNA Trimming

09-05-2025 RNA Mapping

14-05-2025 Read Counting

20-05-2025 DE analysis

23-05-2025 Finish wiki

7. References

Beganovic S, Rückert-Reed C, Sucipto H, Shu W, Gläser L, Patschkowski T, Struck B, Kalinowski J, Luzhetskyy A, Wittmann C. 2023. Systems biology of industrial oxytetracycline production in Streptomyces rimosus: the secrets of a mutagenized hyperproducer. Microbial Cell Factories 22: 222.