diff --git a/.claude/README.md b/.claude/README.md new file mode 100644 index 0000000..b457d53 --- /dev/null +++ b/.claude/README.md @@ -0,0 +1,39 @@ +# Claude Code Configuration + +This directory contains configuration and hooks for Claude Code development sessions. + +## Session Start Hook + +The `hooks/SessionStart` script runs automatically when you start a Claude Code session. It: + +- Verifies conda/mamba is available +- Checks if the `aatrnaseqpipe` environment exists (creates it if not) +- Activates the environment +- Validates Snakemake installation +- Validates configuration files +- Sets up pre-commit hooks +- Runs a quick Snakemake syntax check + +This ensures your development environment is properly configured before starting work. + +## Manual Testing + +You can run the session start hook manually: + +```bash +bash .claude/hooks/SessionStart +``` + +## Development Workflow + +1. **Before making changes**: Run `bash .tests/run_local_tests.sh` to verify everything works +2. **During development**: Use `snakemake -n --configfile=config/config-test.yml` for syntax checks +3. **Before committing**: Pre-commit hooks will automatically run linting and formatting +4. **After committing**: GitHub Actions will run full CI/CD checks + +## Learn More + +- Project documentation: `CLAUDE.md` +- Local test script: `.tests/run_local_tests.sh` +- CI/CD workflows: `.github/workflows/` +- Pre-commit config: `.pre-commit-config.yaml` diff --git a/.claude/SETUP.md b/.claude/SETUP.md new file mode 100644 index 0000000..c37392c --- /dev/null +++ b/.claude/SETUP.md @@ -0,0 +1,254 @@ +# Complete Test and Build Check Setup + +This document describes the comprehensive test and build infrastructure for the aa-tRNA-seq-pipeline. + +## ✅ What's Installed + +### 1. 
GitHub Actions CI/CD + +**Location**: `.github/workflows/` + +#### CI Pipeline (`ci.yml`) +Runs on: push to main/master/develop/claude/**, pull requests, manual trigger + +- **Syntax Check Job** + - Sets up conda environment with Snakemake + - Validates Snakemake syntax with dry-run + +- **Pipeline Integration Test Job** + - Downloads test data + - Sets up dorado and modkit tools + - Runs merge_pods rule (non-GPU test) + - Validates output directory creation + +- **Configuration Validation Job** + - Validates YAML syntax in all config files + - Checks samples.tsv files exist and have content + - Verifies required directory structure + +#### Lint Pipeline (`lint.yml`) +Runs on: push to main/master/develop/claude/**, pull requests, manual trigger + +- **Snakemake Linting**: snakefmt format checking +- **Python Linting**: black and flake8 on workflow/scripts/ +- **Markdown Linting**: markdownlint on all .md files +- **YAML Linting**: yamllint on config/ and .github/ + +### 2. Pre-commit Hooks + +**Location**: `.pre-commit-config.yaml` + +Automatically runs before each commit: +- Trailing whitespace removal +- End-of-file fixing +- YAML syntax validation +- Large file detection (>1MB warning) +- Merge conflict detection +- Line ending normalization +- Black formatting for Python scripts +- Flake8 linting for Python scripts +- Snakefmt formatting for Snakemake files + +**Installation**: +```bash +pip install pre-commit +pre-commit install +``` + +### 3. Local Test Script + +**Location**: `.tests/run_local_tests.sh` + +Runs the same checks as CI locally: +1. Verifies Snakemake installation +2. Validates configuration files (YAML syntax) +3. Checks sample files exist +4. Runs Snakemake dry-run (syntax check) +5. Verifies required directory structure + +**Usage**: +```bash +# Requires active conda environment +mamba activate aatrnaseqpipe +bash .tests/run_local_tests.sh +``` + +### 4. 
Claude Code Session Hooks + +**Location**: `.claude/hooks/SessionStart` + +Automatically runs when starting a Claude Code session: +- Verifies conda/mamba availability +- Creates aatrnaseqpipe environment if missing +- Activates the environment +- Validates Snakemake installation +- Checks config file syntax +- Installs pre-commit hooks if not present +- Runs quick Snakemake syntax check +- Displays helpful development commands + +## 📋 Development Workflow + +### Starting a New Session + +When you start working with Claude Code, the SessionStart hook automatically: +1. Sets up your environment +2. Validates configurations +3. Shows available commands + +### Before Making Changes + +```bash +# Run full local test suite +bash .tests/run_local_tests.sh + +# Or just syntax check +snakemake -n --configfile=config/config-test.yml +``` + +### During Development + +```bash +# Test specific rule +snakemake -n --configfile=config/config-test.yml + +# Run linting manually +pre-commit run --all-files + +# Format Snakemake files +snakefmt workflow/ + +# Format Python scripts +black workflow/scripts/ +``` + +### Before Committing + +Pre-commit hooks run automatically. 
If they fail: +```bash +# Fix issues and re-stage +git add + +# Or skip hooks (not recommended) +git commit --no-verify +``` + +### After Pushing + +GitHub Actions automatically runs: +- All syntax checks +- Integration tests +- All linters +- Configuration validation + +Check status at: `https://github.com///actions` + +## 🧪 Testing Levels + +### Level 1: Quick Validation (< 30 seconds) +```bash +# Config and syntax only +python3 -c "import yaml; yaml.safe_load(open('config/config-test.yml'))" +snakemake -n --configfile=config/config-test.yml +``` + +### Level 2: Local Tests (~ 2 minutes) +```bash +# Full local test suite +bash .tests/run_local_tests.sh +``` + +### Level 3: Integration Test (~ 10-15 minutes) +```bash +# Download test data (first time only) +bash .tests/dl_test_data.sh + +# Setup tools (first time only) +snakemake setup_dorado dorado_model setup_modkit --cores 1 --configfile=config/config-test.yml + +# Run non-GPU pipeline rules +snakemake merge_pods --cores 2 --configfile=config/config-test.yml +``` + +### Level 4: Full Pipeline (requires GPU) +```bash +# Run complete pipeline with test data +snakemake --cores 12 --configfile=config/config-test.yml + +# Or submit to LSF cluster +bsub < run-test.sh +``` + +## 🔍 Continuous Integration Details + +### What Gets Tested on Every Push + +1. **Snakemake syntax**: Dry-run validation +2. **Config files**: YAML syntax validation +3. **Sample files**: Existence and content checks +4. **Directory structure**: Required directories present +5. **Code formatting**: Black, flake8, snakefmt +6. 
**Documentation**: Markdown linting + +### Integration Test Job (Pushes and Pull Requests) + +After the syntax check passes, the `pipeline-test` job in `ci.yml` also runs: +- Integration test with test data +- Tool setup (dorado, modkit) +- Pipeline execution (non-GPU rules) + +### What's NOT Tested in CI + +Due to GitHub Actions limitations: +- GPU-intensive rules (rebasecall, classify_charging) +- Full end-to-end pipeline +- LSF cluster execution + +These should be tested locally or on your cluster before merging. + +## 🛠️ Maintenance + +### Updating Dependencies + +```bash +# Update conda environment +mamba env update -n aatrnaseqpipe -f workflow/envs/aatrnaseqpipe-env.yml + +# Update pre-commit hooks +pre-commit autoupdate +``` + +### Adding New Tests + +1. **Local tests**: Edit `.tests/run_local_tests.sh` +2. **CI tests**: Edit `.github/workflows/ci.yml` +3. **Linting**: Edit `.pre-commit-config.yaml` and `.github/workflows/lint.yml` + +### Troubleshooting + +**Pre-commit hooks failing?** +```bash +# Run manually to see detailed errors +pre-commit run --all-files + +# Update hooks to latest versions +pre-commit autoupdate +``` + +**CI failing but local tests pass?** +- Check GitHub Actions logs for specific errors +- Ensure all files are committed +- Verify config files are valid YAML + +**SessionStart hook not running?** +- Check if Claude Code session hooks are enabled +- Run manually: `bash .claude/hooks/SessionStart` +- Ensure file is executable: `chmod +x .claude/hooks/SessionStart` + +## 📚 Additional Resources + +- **Project Overview**: `CLAUDE.md` +- **CI Workflows**: `.github/workflows/` +- **Pre-commit Config**: `.pre-commit-config.yaml` +- **Test Scripts**: `.tests/` +- **Snakemake Docs**: https://snakemake.readthedocs.io/ diff --git a/.claude/hooks/SessionStart b/.claude/hooks/SessionStart new file mode 100755 index 0000000..ef1b530 --- /dev/null +++ b/.claude/hooks/SessionStart @@ -0,0 +1,111 @@ +#!/bin/bash +# Claude Code Session Start Hook +# This script runs automatically when starting a Claude Code 
session +# It verifies the development environment is properly configured + +set -e + +echo "==========================================" +echo "🧬 aa-tRNA-seq-pipeline Development Setup" +echo "==========================================" + +# Helper functions for status output +print_status() { + echo "✓ $1" +} + +print_error() { + echo "✗ $1" +} + +print_info() { + echo "ℹ $1" +} + +# Check if conda or mamba is available +if ! command -v conda &> /dev/null && ! command -v mamba &> /dev/null; then + print_error "Conda/Mamba not found. Please install miniforge or miniconda." + exit 1 +fi +print_status "Conda/Mamba available" + +# Check if environment exists +if conda env list | grep -q "aatrnaseqpipe"; then + print_status "Environment 'aatrnaseqpipe' exists" +else + print_info "Environment 'aatrnaseqpipe' not found. Creating it now..." + mamba env create -f workflow/envs/aatrnaseqpipe-env.yml || { + print_error "Failed to create environment" + exit 1 + } + print_status "Environment created successfully" +fi + +# Activate environment if not already activated +if [[ "${CONDA_DEFAULT_ENV}" != "aatrnaseqpipe" ]]; then + print_info "Activating aatrnaseqpipe environment..." + eval "$(conda shell.bash hook)" + conda activate aatrnaseqpipe || { + print_error "Failed to activate environment. Please run: mamba activate aatrnaseqpipe" + exit 1 + } + print_status "Environment activated" +else + print_status "Environment 'aatrnaseqpipe' already activated" +fi + +# Verify Snakemake installation +if command -v snakemake &> /dev/null; then + SNAKEMAKE_VERSION=$(snakemake --version) + print_status "Snakemake ${SNAKEMAKE_VERSION} installed" +else + print_error "Snakemake not found in environment" + exit 1 +fi + +# Quick config validation +echo "" +echo "Validating configuration files..." 
+if python3 -c "import yaml; yaml.safe_load(open('config/config-base.yml'))" 2>/dev/null && \ + python3 -c "import yaml; yaml.safe_load(open('config/config-test.yml'))" 2>/dev/null; then + print_status "Config files are valid" +else + print_error "Config file validation failed" + exit 1 +fi + +# Check if pre-commit is installed +if command -v pre-commit &> /dev/null; then + print_status "Pre-commit hooks available" + if [ ! -f .git/hooks/pre-commit ]; then + print_info "Installing pre-commit hooks..." + pre-commit install + print_status "Pre-commit hooks installed" + fi +else + print_info "Pre-commit not installed. Install with: pip install pre-commit" +fi + +# Quick syntax check +echo "" +echo "Running Snakemake syntax check..." +if snakemake -n --configfile=config/config-test.yml > /dev/null 2>&1; then + print_status "Snakemake syntax check passed" +else + print_error "Snakemake syntax check failed. Run for details: snakemake -n --configfile=config/config-test.yml" +fi + +echo "" +echo "==========================================" +echo "✓ Development environment ready!" +echo "==========================================" +echo "" +echo "Available commands:" +echo " • Run local tests: bash .tests/run_local_tests.sh" +echo " • Syntax check: snakemake -n --configfile=config/config-test.yml" +echo " • Dry run: snakemake -n --cores 1 --configfile=config/config-test.yml" +echo " • Format code: pre-commit run --all-files" +echo " • Full test setup: bash .tests/dl_test_data.sh" +echo "" +echo "Documentation: See CLAUDE.md for project details" +echo "" diff --git a/.github/TESTING.md b/.github/TESTING.md new file mode 100644 index 0000000..e813733 --- /dev/null +++ b/.github/TESTING.md @@ -0,0 +1,222 @@ +# Testing and CI/CD Documentation + +This document describes the testing infrastructure and continuous integration (CI/CD) setup for the aa-tRNA-seq-pipeline. 
+ +## Overview + +The pipeline includes several automated checks to ensure code quality and functionality: + +1. **CI Workflow** (`.github/workflows/ci.yml`) - Syntax validation and integration tests +2. **Lint Workflow** (`.github/workflows/lint.yml`) - Code quality and formatting checks +3. **Local Test Script** (`.tests/run_local_tests.sh`) - Run tests locally before pushing +4. **Pre-commit Hooks** (`.pre-commit-config.yaml`) - Automatic checks before git commits + +## CI Workflow + +The main CI workflow runs on pushes to `main`, `master`, `develop`, and `claude/**`, on pull requests targeting `main`, `master`, or `develop`, and on manual dispatch. It includes three jobs: + +### 1. Syntax Check +- Sets up Miniforge and the conda environment +- Verifies Snakemake installation +- Runs `snakemake -n` (dry-run) to validate workflow syntax + +### 2. Pipeline Integration Test +- Downloads test data +- Sets up dorado and modkit tools +- Downloads the dorado basecalling model +- Runs the `merge_pods` rule as a basic integration test +- Validates that the `.tests/outputs` directory was created + +### 3. Configuration Validation +- Validates YAML syntax of all config files +- Checks that sample files exist and have content +- Verifies required directory structure + +## Lint Workflow + +The linting workflow checks code quality and formatting: + +### 1. Snakemake Linting +- Uses `snakefmt` to check Snakemake file formatting + +### 2. Python Linting +- **black**: Checks Python code formatting +- **flake8**: Checks for Python code quality issues +- Both checks are non-blocking (won't fail the build) + +### 3. Markdown Linting +- Uses `markdownlint` to check Markdown file formatting +- Non-blocking + +### 4. 
YAML Linting +- Uses `yamllint` to check YAML file formatting +- Non-blocking + +## Running Tests Locally + +### Quick Syntax Check + +Run the local test script to verify basic functionality: + +```bash +# Activate the conda environment first +mamba activate aatrnaseqpipe + +# Run the test script +bash .tests/run_local_tests.sh +``` + +This script checks: +- Snakemake installation +- Config file validity +- Sample file existence +- Workflow syntax (dry-run) +- Required directory structure + +### Full Integration Test + +To run a complete test of the pipeline: + +```bash +# 1. Download test data (first time only) +bash .tests/dl_test_data.sh + +# 2. Set up tools (first time only) +snakemake setup_dorado dorado_model setup_modkit --cores 1 --configfile=config/config-test.yml + +# 3. Run the pipeline with test data +snakemake --cores 2 --configfile=config/config-test.yml +``` + +### Running Specific Tests + +```bash +# Syntax check only +snakemake -n --configfile=config/config-test.yml + +# Run a specific rule +snakemake merge_pods --configfile=config/config-test.yml --cores 1 + +# Force rerun of a specific rule +snakemake --forcerun --configfile=config/config-test.yml +``` + +## Pre-commit Hooks + +Pre-commit hooks automatically check your code before each commit. + +### Installation + +```bash +# Install pre-commit +pip install pre-commit + +# Install the git hooks +pre-commit install +``` + +### Usage + +Once installed, the hooks will run automatically on `git commit`. 
To run manually: + +```bash +# Run on all files +pre-commit run --all-files + +# Run on staged files only +pre-commit run +``` + +### What Gets Checked + +- Trailing whitespace removal +- End-of-file fixer +- YAML syntax validation +- Large file detection (max 1MB) +- Merge conflict detection +- Line ending normalization +- Python code formatting (black) +- Python linting (flake8) +- Snakemake formatting (snakefmt) + +## CI Status Badges + +The README includes badges showing the status of CI workflows: + +- [![CI](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/ci.yml/badge.svg)](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/ci.yml) - Main CI tests +- [![Lint](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/lint.yml/badge.svg)](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/lint.yml) - Code quality checks + +## Troubleshooting + +### CI Failures + +If the CI workflow fails: + +1. Check the GitHub Actions logs for detailed error messages +2. Run the same commands locally to reproduce the issue +3. Use the local test script to verify basic functionality +4. Ensure all config files are valid YAML +5. Verify that test data paths are correct in `config/samples-test.tsv` + +### Common Issues + +**Conda environment not activated:** +```bash +Error: No conda environment is activated. 
+Solution: mamba activate aatrnaseqpipe +``` + +**Missing test data:** +```bash +Error: Test data not found +Solution: bash .tests/dl_test_data.sh +``` + +**Snakemake syntax errors:** +```bash +Solution: Check the error message and fix the syntax in the indicated file +Run: snakemake -n --configfile=config/config-test.yml +``` + +**Pre-commit hook failures:** +```bash +Solution: Fix the issues reported by the hooks +Run: pre-commit run --all-files +Bypass (not recommended): git commit --no-verify +``` + +## GPU Tests + +Note: The CI pipeline does not run GPU-intensive rules (`rebasecall`, `classify_charging`) as GitHub Actions runners don't have GPU access. These rules should be tested locally on a system with GPU access or on the HPC cluster. + +To test GPU rules locally: + +```bash +# Ensure CUDA is available +nvidia-smi + +# Run specific GPU rule +snakemake rebasecall --configfile=config/config-test.yml --cores 1 +snakemake classify_charging --configfile=config/config-test.yml --cores 1 +``` + +## Contributing + +When contributing to this repository: + +1. Install pre-commit hooks: `pre-commit install` +2. Run local tests before pushing: `bash .tests/run_local_tests.sh` +3. Ensure all CI checks pass on your pull request +4. 
Fix any linting issues reported by the Lint workflow + +## Future Enhancements + +Potential improvements to the testing infrastructure: + +- [ ] Add unit tests for Python scripts in `workflow/scripts/` +- [ ] Add integration tests for individual Snakemake rules +- [ ] Set up test data generation/validation +- [ ] Add code coverage reporting +- [ ] Add performance benchmarking +- [ ] Create Docker container for reproducible testing +- [ ] Add GPU-enabled CI runners for full pipeline testing diff --git a/.github/linters/.snakefmt.toml b/.github/linters/.snakefmt.toml deleted file mode 100644 index 5915968..0000000 --- a/.github/linters/.snakefmt.toml +++ /dev/null @@ -1,7 +0,0 @@ -[tool.snakefmt] -line_length = 90 -include = '\.smk$|^Snakefile|\.py$' - -# snakefmt passes these options on to black -[tool.black] -skip_string_normalization = true diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..5590b62 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,154 @@ +name: CI + +on: + push: + branches: [main, master, develop, claude/**] + pull_request: + branches: [main, master, develop] + workflow_dispatch: + +jobs: + syntax-check: + name: Snakemake Syntax Check + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Miniforge + uses: conda-incubator/setup-miniconda@v3 + with: + miniforge-variant: Miniforge3 + miniforge-version: latest + activate-environment: aatrnaseqpipe + use-mamba: true + + - name: Cache conda environment + uses: actions/cache@v3 + with: + path: /usr/share/miniconda3/envs/aatrnaseqpipe + key: conda-${{ runner.os }}-${{ hashFiles('workflow/envs/aatrnaseqpipe-env.yml') }} + + - name: Install dependencies + shell: bash -l {0} + run: | + mamba env update -n aatrnaseqpipe -f workflow/envs/aatrnaseqpipe-env.yml + + - name: Verify Snakemake installation + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + snakemake --version + + - name: 
Snakemake dry-run (syntax check) + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + snakemake -n --configfile=config/config-test.yml + + pipeline-test: + name: Pipeline Integration Test + runs-on: ubuntu-latest + needs: syntax-check + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Miniforge + uses: conda-incubator/setup-miniconda@v3 + with: + miniforge-variant: Miniforge3 + miniforge-version: latest + activate-environment: aatrnaseqpipe + use-mamba: true + + - name: Cache conda environment + uses: actions/cache@v3 + with: + path: /usr/share/miniconda3/envs/aatrnaseqpipe + key: conda-${{ runner.os }}-${{ hashFiles('workflow/envs/aatrnaseqpipe-env.yml') }} + + - name: Install dependencies + shell: bash -l {0} + run: | + mamba env update -n aatrnaseqpipe -f workflow/envs/aatrnaseqpipe-env.yml + + - name: Download test data + shell: bash -l {0} + run: | + bash .tests/dl_test_data.sh + + - name: Setup dorado + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + snakemake setup_dorado --cores 1 --configfile=config/config-test.yml + + - name: Download dorado model + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + snakemake dorado_model --cores 1 --configfile=config/config-test.yml + + - name: Setup modkit + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + snakemake setup_modkit --cores 1 --configfile=config/config-test.yml + + - name: Run pipeline test (non-GPU rules only) + shell: bash -l {0} + run: | + conda activate aatrnaseqpipe + # Run merge_pods rule as a basic test (doesn't require GPU) + snakemake merge_pods --cores 2 --configfile=config/config-test.yml + + - name: Validate outputs + shell: bash -l {0} + run: | + # Check that expected output files were created + if [ ! 
-d ".tests/outputs" ]; then + echo "Error: Output directory not created" + exit 1 + fi + echo "Pipeline test completed successfully" + + config-validation: + name: Configuration Validation + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Check config files syntax + run: | + # Validate YAML syntax + python3 -c "import yaml; yaml.safe_load(open('config/config-base.yml'))" + python3 -c "import yaml; yaml.safe_load(open('config/config-test.yml'))" + python3 -c "import yaml; yaml.safe_load(open('config/config-preprint.yml'))" + echo "All config files are valid YAML" + + - name: Check samples files + run: | + # Validate samples TSV files exist and have content + if [ ! -f "config/samples-test.tsv" ]; then + echo "Error: samples-test.tsv not found" + exit 1 + fi + if [ ! -s "config/samples-test.tsv" ]; then + echo "Error: samples-test.tsv is empty" + exit 1 + fi + echo "Sample files validated" + + - name: Check required directories + run: | + # Check that required directories exist + directories=("workflow" "workflow/rules" "workflow/scripts" "config" "resources" "cluster") + for dir in "${directories[@]}"; do + if [ ! 
-d "$dir" ]; then + echo "Error: Required directory $dir not found" + exit 1 + fi + done + echo "All required directories exist" diff --git a/.github/workflows/lint.yaml b/.github/workflows/lint.yaml deleted file mode 100644 index 81fc454..0000000 --- a/.github/workflows/lint.yaml +++ /dev/null @@ -1,34 +0,0 @@ -name: Lint with snakefmt - -on: # yamllint disable-line rule:truthy - push: null - pull_request: null - -permissions: {} - -jobs: - build: - name: Lint - runs-on: ubuntu-latest - - permissions: - contents: read - packages: read - # To report GitHub Actions status checks - statuses: write - - steps: - - name: Checkout code - uses: actions/checkout@v4 - with: - # super-linter needs the full git history to get the - # list of files that changed across commits - fetch-depth: 0 - - - name: Super-linter - uses: super-linter/super-linter@v7.2.1 # x-release-please-version - env: - # To report GitHub Actions status checks - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - VALIDATE_ALL_CODEBASE: false - VALIDATE_SNAKEMAKE_SNAKEFMT: true diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml new file mode 100644 index 0000000..b7f6f78 --- /dev/null +++ b/.github/workflows/lint.yml @@ -0,0 +1,97 @@ +name: Lint + +on: + push: + branches: [main, master, develop, claude/**] + pull_request: + branches: [main, master, develop] + workflow_dispatch: + +jobs: + snakemake-lint: + name: Snakemake Linting + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.11' + + - name: Install Snakemake + run: | + pip install snakemake snakefmt + + - name: Run snakefmt check + run: | + snakefmt --check workflow/ + + python-lint: + name: Python Linting + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.11' + + - name: Install linting 
tools + run: | + pip install black flake8 pylint + + - name: Run black check + continue-on-error: true + run: | + black --check workflow/scripts/ || echo "Black formatting issues found (non-blocking)" + + - name: Run flake8 + continue-on-error: true + run: | + flake8 workflow/scripts/ --max-line-length=100 --ignore=E203,W503 || echo "Flake8 issues found (non-blocking)" + + markdown-lint: + name: Markdown Linting + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: '18' + + - name: Install markdownlint + run: | + npm install -g markdownlint-cli + + - name: Run markdownlint + continue-on-error: true + run: | + markdownlint '**/*.md' --ignore node_modules || echo "Markdown linting issues found (non-blocking)" + + yaml-lint: + name: YAML Linting + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.11' + + - name: Install yamllint + run: | + pip install yamllint + + - name: Run yamllint + continue-on-error: true + run: | + yamllint -d "{extends: default, rules: {line-length: {max: 120}, document-start: disable}}" config/ .github/ || echo "YAML linting issues found (non-blocking)" diff --git a/.gitignore b/.gitignore index afead54..61099d7 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,4 @@ -.test +.tests .snakemake ext logs @@ -6,3 +6,6 @@ logs .temp_dorado_model* .vscode results/ +resources/tools/* +*.bam +*.pod5 diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml new file mode 100644 index 0000000..423a895 --- /dev/null +++ b/.pre-commit-config.yaml @@ -0,0 +1,36 @@ +# Pre-commit hooks for aa-tRNA-seq-pipeline +# Install: pip install pre-commit && pre-commit install +# Run manually: pre-commit run --all-files + +repos: + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.5.0 + hooks: + - id: 
trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + args: ['--allow-multiple-documents'] + - id: check-added-large-files + args: ['--maxkb=1000'] + - id: check-merge-conflict + - id: mixed-line-ending + + - repo: https://github.com/psf/black + rev: 23.12.1 + hooks: + - id: black + language_version: python3 + files: ^workflow/scripts/.*\.py$ + + - repo: https://github.com/pycqa/flake8 + rev: 7.0.0 + hooks: + - id: flake8 + args: ['--max-line-length=100', '--ignore=E203,W503'] + files: ^workflow/scripts/.*\.py$ + + - repo: https://github.com/snakemake/snakefmt + rev: v0.10.0 + hooks: + - id: snakefmt + files: ^workflow/(.*\.smk|Snakefile)$ diff --git a/.test/dl_test_data.sh b/.tests/dl_test_data.sh similarity index 100% rename from .test/dl_test_data.sh rename to .tests/dl_test_data.sh diff --git a/.test/make_test_data.sh b/.tests/make_test_data.sh similarity index 88% rename from .test/make_test_data.sh rename to .tests/make_test_data.sh index 630e16b..c68ae41 100644 --- a/.test/make_test_data.sh +++ b/.tests/make_test_data.sh @@ -45,22 +45,6 @@ od=sample2/pod5_fail mkdir -p $od pod5 filter $ex/rbc/JMW_510_37C/JMW_510_37C.pod5 --ids ex2_read_ids_3.txt --force-overwrite -o $od/1.pod5 -od=sample1/fast5_pass -mkdir -p $od -pod5 convert to_fast5 -f -o $od sample1/pod5_pass/*.pod5 - -od=sample1/fast5_fail -mkdir -p $od -pod5 convert to_fast5 -f -o $od sample1/pod5_fail/*.pod5 - -od=sample2/fast5_pass -mkdir -p $od -pod5 convert to_fast5 -f -o $od sample2/pod5_pass/*.pod5 - -od=sample2/fast5_fail -mkdir -p $od -pod5 convert to_fast5 -f -o $od sample2/pod5_fail/*.pod5 - # make another "sample2" dataset to test merging multiple runs # don't duplicate sample2 reads to avoid throwing an error when merging # pod5s diff --git a/.tests/run_local_tests.sh b/.tests/run_local_tests.sh new file mode 100755 index 0000000..994df8d --- /dev/null +++ b/.tests/run_local_tests.sh @@ -0,0 +1,65 @@ +#!/bin/bash +# Local test script for aa-tRNA-seq-pipeline +# This script runs 
the same checks as the CI pipeline locally + +set -e + +echo "========================================" +echo "Running local tests for aa-tRNA-seq-pipeline" +echo "========================================" + +# Check if conda/mamba environment is activated +if [[ -z "${CONDA_DEFAULT_ENV}" ]]; then + echo "Error: No conda environment is activated." + echo "Please activate the aatrnaseqpipe environment first:" + echo " mamba activate aatrnaseqpipe" + exit 1 +fi + +echo "" +echo "[1/5] Checking Snakemake installation..." +if ! command -v snakemake &> /dev/null; then + echo "Error: Snakemake is not installed in the current environment" + exit 1 +fi +echo "✓ Snakemake version: $(snakemake --version)" + +echo "" +echo "[2/5] Validating configuration files..." +python3 -c "import yaml; yaml.safe_load(open('config/config-base.yml'))" || exit 1 +python3 -c "import yaml; yaml.safe_load(open('config/config-test.yml'))" || exit 1 +python3 -c "import yaml; yaml.safe_load(open('config/config-preprint.yml'))" || exit 1 +echo "✓ All config files are valid YAML" + +echo "" +echo "[3/5] Checking sample files..." +if [ ! -f "config/samples-test.tsv" ] || [ ! -s "config/samples-test.tsv" ]; then + echo "Error: samples-test.tsv is missing or empty" + exit 1 +fi +echo "✓ Sample files validated" + +echo "" +echo "[4/5] Running Snakemake dry-run (syntax check)..." +snakemake -n --configfile=config/config-test.yml || exit 1 +echo "✓ Snakemake dry-run successful" + +echo "" +echo "[5/5] Checking required directories..." +for dir in workflow workflow/rules workflow/scripts config resources cluster; do + if [ ! -d "$dir" ]; then + echo "Error: Required directory $dir not found" + exit 1 + fi +done +echo "✓ All required directories exist" + +echo "" +echo "========================================" +echo "All local tests passed! ✓" +echo "========================================" +echo "" +echo "To run the full pipeline test:" +echo " 1. 
Download test data: bash .tests/dl_test_data.sh" +echo " 2. Setup tools: snakemake setup_dorado dorado_model setup_modkit --cores 1 --configfile=config/config-test.yml" +echo " 3. Run test: snakemake --cores 2 --configfile=config/config-test.yml" diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..2488a19 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,172 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +This is a Snakemake pipeline for processing Oxford Nanopore Technologies (ONT) aa-tRNA-seq data. The pipeline distinguishes between charged (aminoacylated) and uncharged tRNA molecules using Remora machine learning models trained on nanopore signal data over the CCA 3' end of tRNA molecules. + +## Setup and Environment + +### Initial Setup + +```bash +# Create conda environment +mamba env create -f workflow/envs/aatrnaseqpipe-env.yml +mamba activate aatrnaseqpipe + +# Download test data (first time only) +bash .tests/dl_test_data.sh + +# Install dorado and modkit (first time only) +snakemake setup_dorado dorado_model setup_modkit --cores 1 --configfile=config/config-test.yml +``` + +### Running the Pipeline + +```bash +# Dry run with test config +snakemake -n --configfile=config/config-test.yml + +# Execute locally (specify cores) +snakemake --cores 12 --configfile=config/config-test.yml + +# Run on LSF cluster (uses cluster/lsf profile) +bsub < run-test.sh +``` + +### Cluster Execution + +The pipeline is optimized for the LSF scheduler. Key files: +- `run-test.sh`: Test data execution on LSF +- `run-preprint.sh`: Full preprint data execution on LSF +- `cluster/lsf/config.yaml`: LSF-specific resource configurations + +GPU-intensive rules (rebasecall, classify_charging) automatically request GPU resources via LSF queue configuration. 
+ +## Architecture + +### Snakemake Workflow Structure + +The workflow is modular with rules split across multiple files: + +``` +workflow/ +├── Snakefile # Main entry point, includes all rule modules +├── rules/ +│ ├── common.smk # Sample parsing, helper functions, outputs definition +│ ├── tool_setup.smk # Dorado and modkit installation +│ ├── aatrnaseq-process.smk # Core processing: pod5 merge → basecalling → alignment +│ └── aatrnaseq-summaries.smk # Summary statistics and output tables +├── scripts/ # Python scripts called by rules +└── envs/ + └── aatrnaseqpipe-env.yml # Conda environment +``` + +**Key Architectural Details:** + +- **Sample Management**: `workflow/rules/common.smk` contains `parse_samples()` which reads `config/samples.tsv` and `find_raw_inputs()` which recursively searches for pod5 files in specified directories +- **Dynamic PATH Setup**: The Snakefile `onstart` handler dynamically adds dorado and modkit binaries to PATH based on configured versions +- **Output Aggregation**: `pipeline_outputs()` in `common.smk` defines all final output files for the `rule all` target + +### Core Processing Pipeline (aatrnaseq-process.smk) + +The main data flow through the pipeline: + +1. **merge_pods**: Merge all pod5 files per sample into single pod5 +2. **rebasecall**: Use dorado to rebasecall with move tables (required for Remora) +3. **ubam_to_fastq**: Extract reads from unmapped BAM to FASTQ +4. **bwa_align**: Align reads to tRNA + adapter reference with BWA MEM +5. **filter_reads**: Filter for full-length tRNA reads with proper adapter boundaries +6. **classify_charging**: Use Remora model to classify charged vs uncharged reads (adds ML tag to BAM) +7. 
**transfer_bam_tags**: Transfer alignment tags back to classified BAM + +### Summary Generation (aatrnaseq-summaries.smk) + +After classification, generates: +- Charging probability tables (ML tag values per read) +- CPM (counts per million) for charged/uncharged tRNA +- Base calling error frequencies +- Alignment statistics +- Remora signal metrics (if kmer table provided) +- Modkit modification calls and pileups + +## Configuration + +### Main Config Files + +- `config/config-base.yml`: Base configuration included by Snakefile + - Base calling model path + - Reference fasta + - Remora models and kmer tables + - Tool versions (dorado, modkit) + - Command-line options for tools (dorado, bwa, filters) + +- `config/samples.tsv`: Two-column TSV (no header) + - Column 1: Unique sample ID + - Column 2: Path to sequencing run folder containing pod5_pass/pod5_fail/pod5 subdirectories + +- `config/config-test.yml`: Overrides base config for test data + +### Important Config Parameters + +- **opts.bam_filter**: Controls full-length read filtering (`-5 24 -3 23 -s` requires 24bp 5' adapter, 23bp 3' adapter, positive strand) +- **opts.dorado**: Includes `--modified-bases pseU m5C inosine_m6A --emit-moves` for modification calling and move tables +- **opts.bwa**: RNA-optimized alignment parameters (`-W 13 -k 6 -T 20 -x ont2d`) +- **ml-threshold**: Currently hardcoded in `get_cca_trna_cpm` rule (200-255 = charged, <200 = uncharged) + +## Charged vs Uncharged Classification + +The pipeline uses Remora machine learning to classify charging state: + +- **Model Location**: `remora_cca_classifier` config parameter (resources/models/cca_classifier.pt) +- **Signal Region**: 6-nucleotide kmer spanning CCA 3' end + first 3 adapter bases (CCAGGC) +- **ML Tag**: Classification score stored in BAM ML tag (0-255 scale) +- **Threshold**: ML ≥ 200 = charged, ML < 200 = uncharged (adjustable in get_cca_trna_cpm rule) +- **Filtering**: Only full-length tRNA reads with proper 5'/3' adapters 
are classified + +## Development + +### Adding New Rules + +When adding new Snakemake rules: +- Place processing rules in `aatrnaseq-process.smk` +- Place summary/analysis rules in `aatrnaseq-summaries.smk` +- Add helper functions to `common.smk` +- Python scripts referenced by rules go in `workflow/scripts/` +- Update `pipeline_outputs()` if the rule produces final outputs + +### Testing Changes + +```bash +# Always test with dry run first +snakemake -n --configfile=config/config-test.yml + +# Run specific rule +snakemake <rule_name> --configfile=config/config-test.yml + +# Force rerun of specific rule +snakemake --forcerun <rule_name> --configfile=config/config-test.yml +``` + +### Cluster Resource Configuration + +Modify `cluster/lsf/config.yaml` to adjust: +- Memory requirements per rule (mem_mb) +- GPU queue assignments +- LSF project tags +- Maximum concurrent jobs + +Rules requiring GPU (rebasecall, classify_charging) must set: +- lsf_queue: "gpu" +- lsf_extra: "-gpu num=1:j_exclusive=yes" +- ngpu: 1 + +## Important Notes + +- The pipeline requires Snakemake 8.0+ +- Dorado and modkit are installed by the pipeline (not via conda) to specific versions +- The pipeline tracks git commit ID for reproducibility (see `get_pipeline_commit()`) +- CUDA_VISIBLE_DEVICES is passed through to dorado if set +- Pod5 files are searched recursively in pod5_pass/pod5_fail/pod5 subdirectories +- The ML threshold for charging classification is currently hardcoded in the `get_cca_trna_cpm` rule diff --git a/README.md b/README.md index b5b2000..4027449 100644 --- a/README.md +++ b/README.md @@ -1,37 +1,47 @@ # aa-tRNA-seq-pipeline -A pipeline to process ONT aa-tRNA-seq data built using snakemake. 
+[![CI](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/ci.yml/badge.svg)](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/ci.yml) +[![Lint](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/lint.yml/badge.svg)](https://github.com/rnabioco/aa-tRNA-seq-pipeline/actions/workflows/lint.yml) + +A Snakemake pipeline to process ONT aa-tRNA-seq data. Downstream analysis to generate figures for the initial preprint can be found at: [https://github.com/rnabioco/aa-tRNA-seq](https://github.com/rnabioco/aa-tRNA-seq) ## Usage -The pipeline can be configued by editing the `config/config.yml` file. The config file specifications will +The pipeline can be configured by editing the `config/config.yml` file. The config file specifications will run a small example dataset through the pipeline. To download these data files: ``` -git clone https://github.com/rnabioco/AAtRNAseqPipe.git +git clone https://github.com/rnabioco/aa-tRNA-seq-pipeline.git -# download test data +# download test data bash .test/dl_data.sh ``` Set up a conda environment: ```bash -mamba env create -f environment.yml -conda activate aatrnaseqpipe +mamba env create -f workflow/envs/aatrnaseqpipe-env.yml +mamba activate aatrnaseqpipe +``` + +Set up the dorado and modkit resources. This will install the tools in the `resources/tools` directory, +so this only needs to be done once during the first run of the pipeline. + +``` +snakemake setup_dorado dorado_model setup_modkit ``` Test the pipeline by invoking a dry-run snakemake in the pipeline root directory: ``` -snakemake -n -c 1 -p +snakemake -n --configfile=config/config-test.yml ``` ## Configuration -To use on your own samples, edit `config.yml` and `samples.tsv` in `config/`. +To use on your own samples, edit `config.yml` and `samples.tsv` in `config/`. See [README.md in the config directory](https://github.com/rnabioco/aa-tRNA-seq-pipeline/tree/main/config) for additional details. 
@@ -53,7 +63,3 @@ A few notes about Remora classification for charged vs. uncharged tRNA reads ## Cluster execution The pipeline includes a `run.sh` script optimized for the LSF scheduler. For more details on configuring for HPC jobs, see `cluster/config.yaml`. - -## Notes - -The dorado basecaller can be installed using pre-built binaries available from [github](https://github.com/nanoporetech/dorado?tab=readme-ov-file#installation). The conda `environment.yml` installs dorado 0.7.2 from an unsupported (by ONT) channel. diff --git a/cluster/config.yaml b/cluster/generic/config.yaml similarity index 79% rename from cluster/config.yaml rename to cluster/generic/config.yaml index 31c2a69..7e801c6 100644 --- a/cluster/config.yaml +++ b/cluster/generic/config.yaml @@ -28,12 +28,14 @@ set-resources: - rebasecall:gpu_opts="-gpu num=1:j_exclusive=yes" - rebasecall:ngpu=1 - rebasecall:mem_mb=24 - - cca_classify:queue="gpu" - - cca_classify:gpu_opts="-gpu num=1:j_exclusive=yes" - - cca_classify:ngpu=1 - - cca_classify:mem_mb=24 + - classify_charging:queue="gpu" + - classify_charging:gpu_opts="-gpu num=1:j_exclusive=yes" + - classify_charging:ngpu=1 + - classify_charging:mem_mb=24 - remora_signal_stats:mem_mb=24 - bwa_align:mem_mb=24 + - modkit_extract_calls:mem_mb=96 + - modkit_extract_full:mem_mb=48 printshellcmds: True diff --git a/cluster/lsf/config.yaml b/cluster/lsf/config.yaml new file mode 100644 index 0000000..7b45b59 --- /dev/null +++ b/cluster/lsf/config.yaml @@ -0,0 +1,30 @@ +executor: lsf +jobs: 300 + +# N.B. 
GB values should be passed to mem_mb +default-resources: + - 'mem_mb=8' + - 'lsf_queue=rna' + - 'lsf_project=aatrnaseq' + - 'lsf_extra=""' + +resources: + - ngpu=12 + +# set rule specific requirements +set-resources: + - rebasecall:lsf_queue="gpu" + - rebasecall:lsf_extra="-gpu num=1:j_exclusive=yes" + - rebasecall:ngpu=1 + - rebasecall:mem_mb=24 + - classify_charging:lsf_queue="gpu" + - classify_charging:lsf_extra="-gpu num=1:j_exclusive=yes" + - classify_charging:ngpu=1 + - classify_charging:mem_mb=24 + - remora_signal_stats:mem_mb=24 + - bwa_align:mem_mb=24 + - modkit_extract_calls:mem_mb=96 + +printshellcmds: True +show-failed-logs: True +latency-wait: 15 diff --git a/config/README.md b/config/README.md index 8b897ec..1343153 100644 --- a/config/README.md +++ b/config/README.md @@ -6,7 +6,8 @@ Edit config.yml to specify the following parameters. TSV file should have (1) a unique id for the sample and (2 a path to the sequencing run folder which has `pod5_pass`, `pod5`, `pod5_fail`, `fast5_pass`, or `fast5_fail` subdirectories containing raw data. The pipeline will recursively search for POD5 files to process within the specified directory. -- `output_directory`: A path to an output directory for files produced by pipeline. + - a unique id for the sample + - a path to the sequencing run folder with `pod5_pass` and `pod5_fail` subdirectories containing raw data. - `base_calling_model`: Path to the dorado basecalling model to use for rebasecalling. We use `rna004_130bps_sup@v5.0.0` for now, will evaluate newer model soon. 
diff --git a/config/config-base.yml b/config/config-base.yml index ccc2014..34093cc 100644 --- a/config/config-base.yml +++ b/config/config-base.yml @@ -2,10 +2,7 @@ # this is included in the main Snakefile # either a path to a basecalling model to use with dorado or a model selection name to specify model to download and use -base_calling_model: "resources/models/rna004_130bps_sup@v5.0.0" - -# either FAST5 or POD5, if FAST5 then these files will be converted to pod5 before rebasecalling -input_format: "POD5" +base_calling_model: "resources/models/rna004_130bps_sup@v5.1.0" # path to fasta file to use for bwa alignment. # a BWA index will be built if it does not exist for this fasta file @@ -18,10 +15,17 @@ remora_kmer_table: "resources/kmers/9mer_levels_v1.txt" # read classification model - remora trained model to classify charged vs uncharged reads remora_cca_classifier: "resources/models/cca_classifier.pt" +# software tools +dorado_version: 0.9.1 +dorado_model: rna004_130bps_sup@v5.1.0 +modkit_version: 0.4.3 + # additional options for particular commands opts: - # additional options for dorado basecalling e.g - dorado: " --emit-moves " + # additional options for dorado basecalling + # XXX place modified bases first as the arg parser gets confused + # XXX add `-v` for verbose logging + dorado: " --modified-bases pseU m5C inosine_m6A --emit-moves " # additional options for bwa alignment # based on Novoa lab optimising bwa for tRNA alignment diff --git a/config/config-test.yml b/config/config-test.yml index 50f4ca2..6fed93f 100644 --- a/config/config-test.yml +++ b/config/config-test.yml @@ -11,4 +11,4 @@ samples: config/samples-test.tsv # output directory for files produced by pipline -output_directory: ".test/outputs" +output_directory: ".tests/outputs" diff --git a/config/samples-test.tsv b/config/samples-test.tsv index 3ed42bf..36dc4bb 100644 --- a/config/samples-test.tsv +++ b/config/samples-test.tsv @@ -1,3 +1,3 @@ -sample1 .test/sample1 -sample2 .test/sample2 
-sample2 .test/sample2_1 +sample1 .tests/sample1 +sample2 .tests/sample2 +sample2 .tests/sample2_1 diff --git a/resources/ref/sacCer3-mito/convert-mt.py b/resources/ref/sacCer3-mito/convert-mt.py new file mode 100644 index 0000000..a3e15b6 --- /dev/null +++ b/resources/ref/sacCer3-mito/convert-mt.py @@ -0,0 +1,17 @@ +#! /usr/bin/env python + +from Bio import SeqIO +import sys +import pdb + +adaptor5 = "CCTAAGAGCAAGAAGAAGCCTGGN" +adaptor3 = "CCAACCTTGCCTTAAAAAAAAAA" + +for record in SeqIO.parse(sys.argv[1], "fasta"): + if "pseudo" in record.description: continue + + fs = record.description.split(" ") + name = f"mt-{fs[3]}-{fs[4].replace('(','').replace(')','')}" + seq = adaptor5 + str(record.seq) + adaptor3 + + print(f">{name}\n{seq}") diff --git a/resources/ref/sacCer3-mito/make-ref.sh b/resources/ref/sacCer3-mito/make-ref.sh new file mode 100644 index 0000000..0f09917 --- /dev/null +++ b/resources/ref/sacCer3-mito/make-ref.sh @@ -0,0 +1,3 @@ +bioawk -c fastx '{OFS=""} /chrM/ {print ">"$name,"\n",$seq}' ~/ref/genomes/sacCer3/sacCer3.fa > sacCer3.chrM.fa +tRNAscan-SE -a sacCer3.chrM.trnas.fa sacCer3.chrM.fa +python convert-mt.py sacCer3.chrM.trnas.fa > sacCer3.chrM.trnas.adapted.fa diff --git a/resources/ref/sacCer3-mito/sacCer3.chrM.fa b/resources/ref/sacCer3-mito/sacCer3.chrM.fa new file mode 100644 index 0000000..25157a1 --- /dev/null +++ b/resources/ref/sacCer3-mito/sacCer3.chrM.fa @@ -0,0 +1,2 @@ +>chrM 
+TTCATAATTAATTTTTTATATATATATTATATTATAATATTAATTTATATTATAAAAATAATATTTATTATTAAAATATTTATTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAATTATTAATTAATAATAAATTATTATTAATAATTATTTATTATTTTATCATTAAAATATATAAATAAAAAATATTAAAAAGATAAAAAAAATAATGTTTATTCTTTATATAAATTATATATATATATATAATTAATTAATTAATTAATTAATTAATAATAAAAATATAATTATAAATAATATAAATATTATTCTTTATTAATAAATATATATTTATATATTATAAAAGTATCTTAATTAATAAAAATAAACATTTAATAATATGAATTATATATTATTATTATTATTAATAAAATTATTAATAATAATCAATATGAAATTAATAAAAATCTTATAAAAAAGTAATGAATACTCCTTTTTAAAAATAAAAAGGGGTTCGGTCCCCCCCCTTCCGTATACTTACGGGAGGGGGGTCCCTCACTCCTTCTTAATTAAATTATCTTAATTAAATTATCTTAATTAAATTATCTTAATTAAATTATCTTAATTAAATTATCTTAATTAAATTAAAAGGGGACTTTATATTTATAAAGTAATTATATTATTATTATTATTATTATTTATTTATTTTATTTTTATTATTTTATTATATATATTATATATTAATACAGATAGAAGCCAAAAGGTCAGGCGCTTTCTTTGGGAGAAAGACCTAGTTAGTTCGAGTCTATCCTATCTGATAATAATTTAATTAACCATTAAAAAAAAGTATATATATTTATCATAATATATTAAATTTTATTACATTACAAATGAACACTTTTATTTATATTTATAAAAATATGAACTCCTTCGGGGTCCGCCCCGCGGGGGCGGGCCGGACTCCATATTATTATTATTATAATTATTATTATAATTATTATTATAATTATTATTATAATTATTATTATAATTAAAGAGTTTTGGATACCAATATGATATAATATGATATAGGACCGAAACCCCTCATTTTATCATTTATTTATAATATTATAAATAAAAAAAAATATTATATATTATAATAAAATTAATATCATAATATATTATATTATATATTATATTATATATATATATATATATATTCTTTTATAAAATTTATATTCTTCTTATTAAAATTAAAAAGGGAGCGGACTTTTAATTATATTTAATTATAGTTTTTAATCATTGGTTGAGATTTCAAAATAAGGTATAATATTTATATTATTCTTTAACAAATATTATATTATAAAAAAAGATATAATATTTATATTATTCTTTAACAAATATTATATTATAAAAAAGATATAATATTTATATATTATTATTAATATTATTTTTAAGTTCCGAAAGGAGAAACTTATAATTTTTATATCATTATTTATTATTATTTTTAATTTCAACTCCTTTTAGGTATTTCCATTTAACTTTCAGCAGAGACTTTCTAATTATAATTATATATATATAAATTTAAATACATTTATAAAAAAGTATATAATATAATTATATTATATATAATAATATTATTAAATGAAGTATTCTTTATTATTAATTATAGGATATCTGGGGTCCATTAATAATTATTATTGTAAATAATAATAAGGACCCCCCCCATTATCTAATTAATAAATATATAAATAATCATTAATAAATATATTAATAATTATTAATAAATATATAAATAATCATTAATAAATATATAAATAATATAATATATTATAAAAATATAATAATAATAATTTATTATTAAAATATAATAATTTATTATAAAAATATAATAATTTATTATAAAAATATAATAATAACTCCTTTCGGGGTTCACACCTTTATAAATAATAAATAATAAATAATAAATAATAAATAATAAATATTAGTATTCACTAATATAAAATAATAATTATAAAAATAATCATTATTAAAAA
TATTATTAATTATTAAATTAAATACAATTAATATAATTTAGTTGTTTATATAATTTTAAATAATGTTTATATCAATTTAATAAAATTAAATTTATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTTTATCTATATATTATAATAACTATATGAATTTAATTATTAAAAATAATAAAAATAAGGAATTTTAATAAGAAGTAATATTTATTATATAATATATAAAAAAAATATATATATATATATAAAAATATATATAATAAGTTTTATTATAATATATATTAAATTAATTATTATGAGGGGTTCGGTCCCTTTCCGGGCCCCAATTCATCTCATCTCATTTTATTTCATTTCAATATCATCTAATCTCATTTCTTTATAGATTTTACATATATATAAATATAAATATAAGATATTCACATTTATATATAATATAATATAATATAATAGATATTCATTCCTCTTTGATTAAACTAATAATTAATAATTAATAATTAATAATTAATAATTAATAATTATTCAGTAGAACTCCTTCTTAAAAAGGGGTTCGGTCCCCCTCCCATTAGTATAGTATAGGGAGGGGTCCCTCACTCCTTCGGGGTCCGCCCCGCAGGGGGCGGGCCGGACTATTATTAAATAATTTATAATTTATTATTTATTAATATATTTATATAATATAATATAATATAATATTATTCATACTTTTTATTAATATAATATAATATAATATTATTAATACTTTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATTAATATAAAGAAAAGAGTTTCAATTATTTATTTATTTATTTATTTTTTATAAAAATAAGTCCCCGCCCCGGCGGGGACCCCGAAGGAGTATTAATTTAAATAATTTATTTAATGAAATTATTAATTATAAATAAAAATAATAATTTTTAAAGATGTAATATAAAAATAAATATAATATAATTTAGGATAATTATATAAAATATTTATTATATATAGTTTTTATAAAGAGTTTTAAAAGTGATAATATAATATATAATATTTATAAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTTATTTATATATATATAATTATAATCTTATTAATTATTTATATATATATTTAATATTATTTTTATATAATTTTATATTAAAGTATTATAATTATATATTTAATATTATTTTTATATAATTTTATATTATTTATTTATTTATTTATTTATTTAAAAATATTATAATCATATATTTAATATTATTTAATATATTTTATATATTATATCTTTTATTGATTTATATATATATAGATTTAATAAATATATATATATATATATATATAAATATTCATTATATATTTATTATTATTATTATTATTTATTACTATTTTTTATTATATATTAATAATATATATATTATTAGTTATGGGTATCCTAATAGTATATTATTATTTTTAATAATAATTTATGATTTATGTATAATAAATAAGTAGGGAATCGGTACGAATATCGAAAGGAGTTATATATTATTAATTATTTATAATTATTTTATATATTATTAATTATTTATAATTATTTTATATATTTATAATTATTTTATATAGATAGGTTAGATAGGATAGATAGTATAGATAGGGGTCCCATTTATTATTTACAATAATAATTATTAATGGGACCCGGATATCTTATTGTTATTAATTTATATATTATTCATTATTATTAATATATATTTAATATAATTAAATATTATATTATATTATATTATATTATTTATTAAAAAAAAATCTATTACTTATTTTTTTTATTAATATATAAATTATTTATATAATTTATCATTTTTATTTATATATTATTATTTTTTATATATAAATTAATATATATATATATTATATATACTTTTTTTTTTATAATATATCTATATATATAAATAAATATATTATATTATATTTTTATATAATATATTATTAATTATTATTTTA
ATTTTCTATTCTATTGTGGGGGTCCCAATTATTATTTTCAATAATAATTATTATTGGGACCCGGATATCTTCTTGTTTATCATTTATTATTTTATTAAATTTATTATTATTTTTAATTTATATTTATATTATATAATTAATTATATCGTTTATACTCCTTCGGGGTCCCCGCCGGGGCGGGGACTTTATATTTTATTATATAATATATTATATTCTTATAATATATTTATTGATTATGTTATAAAATTTATTCTATGTGTGCTCTATATATATTTAATATTCTGGTTATTATCACCCACCCCCTCCCCCTATTACGTCTCCGAGGTCCCGGTTTCGTAAGAAACCGGGACTTATATATTTATAAATATAAATCTAACTTAATTAATAATTTAAATAATATACTTTATATTTTATAAATAAAAATAATTATAACCTTTTTTATAATTATATATAATAATAATATATATTATCAAATAATTATTATTTCTTTTTTTTCTTTAATTAATTAATTAATTAATATTTTATAAAAATATATTTCTCCTTACGGGGTTCCGGCTCCCGTAGCCGGGGCCCGAAACTAAATAAAATATATTATTAATAATATTATATAATATAATAATAATATAATAATTTTATATAAATATATATTTATATATTAAATTAAATTATAATTTTATTATGAAAATTATATCTTTTTTTTATATTTTTATATAATAAAAATATGTTATATATATATTAATAATAAAAGGTAGTGAGGATTAAATAAATTATATAATAATTATAACTCTTAATTATAAAATAAATATATATATATATATAAGTATCCATTTCCATATAATCTTTTAATAAATATTAATAAATATTAAAAAAAAATAATATTATAATATTTTAGTATATAATTCAATAAAATTCATTGGAGGGGTAAATAATAATAATTTACTAATGGCAAGTTATAGTCTTAAAGGTTTTTATTTTTTTTATTAAATTAATAAAATAATAATACCATTTATATATTCCATTATATATATATATTTAATAAAAATAATAATATCATTTATATATTTTATTATATATTATATATATTTTATATAAAATAATAATAATAAATTTATATTTTTATATATTATTATTAAATAATAATAATATAAATAACTCCTTCGGGGTTCGGTCCCCACGGGTCCCTCACTCCTTCTTAAGAATAAAAAGGGGTTCGGTCCCCCTCCCGTTAGTACACGGGAGGGGGTCTCTCACTCCTTCTTAAAAAATAAAAAGGTGGAAGGACTAATATAATTTTAAATAATAATTAATACTTTAATAATAATTTGTATTTCTTTATTATTAATATATTAAATATAATAATAATTAATATAATTACAATATATTAATATTATCAAATATTAATAAATATACTTTTTTATATAATTTATTTATTTATTTATTTTTTTTTTATTAAACTAATTATAATTGTAATTTCGAAAAGGGGGTGGGAGTAAACATATATAATTTATAATCTATATATATATATATATAATTTTTTAATAAATATTAATAAATATTTATAAAAAGAATAATTTATATTTATAATATATAATTTATATATTTTATTTTTATTATACAATTAATATAAAATATAAAATATTAAATATTAAATATTAAATATTAAATATTAAATATTAATTTTTATAGGGGTTATATAATAATTATATTTATAATTATATAATATTAAAAAGGGTATTTTTATAATTATTACATTTTTATTTTATTTATAAAAATATTAATTTTAATAAGTATTGAATACTTTATATAATATAAATATTAATTACATAATTAATAATTAAATAATATTTAATAATATTATTTAAATTTATTATTTATAATTATTTATTTATAAAATTCTATTTTTATTATTATTATTTTTATTTTATTATTAAAGATTAATATAATAATTATTAATATATTAAAAATCTTTTATTATATTAATATTTATAAAAAAGTATTTA
ATAAAAAAGATGTATAAATTTATAAATTATATAATATTATTAATTTATATAATAATAATATTATAACTTTGTGATTGTCAATTTAGTTAATCATTGTTATTAATAAAGGAAAGATATAAAAAATATTCTCCTTCTTAAAAAGGGGTTCGGTTCCCCCCCGTAAGGGGGGGGTCCCTCACTCCTTTGGTCGGACTCCTTCGGGGTCCGCCCCGCGGGGGCGGGCCGGACTAATTTAACTTTTAATATTAATATTAATATTATTTATATTTTTAATATATAAAAATAAATAATTTTATTTTTATTAATAGTATATTATATAAACAATAAAATAGTATTAATTATATAAAATTTATATAAAATATATATAAATTTATTATATATATATATATTAATATTTTAATAAAGTTTTTATTATAAATTTATTTATTTATTTATTATAATATTAATAATTTATTTATTATTATATAAGTAATAAATAATAGTTTTATATAATAATAATAATATATATATATATATATTATTATATTAGTTATATAATAAGGAAAAGTAAAAAATTTATAAGAATATGATGTTGGTTCAGATTAAGCGCTAAATAAGGACATGACACATGCGAATCATACGTTTATTATTGATAAGATAATAAATATGTGGTGTAAACGTGAGTAATTTTATTAGGAATTAATGAACTATAGAATAAGCTAAATACTTAATATATTATTATATAAAAATAATTTATATAATAAAAAGGATATATATATAATATATATTTATCTATAGTCAAGCCAATAATGGTTTAGGTAGTAGGTTTATTAAGAGTTAAACCTAGCCAACGATCCATAATCGATAATGAAAGTTAGAACGATCACGTTGACTCTGAAATATAGTCAATATCTATAAGATACAGCAGTGAGGAATATTGGACAATGATCGAAAGATTGATCCAGTTACTTATTAGGATGATATATAAAAATATTTTATTTTATTTATAAATATTAAATATTTATAATAATAATAATAATAATATATATATATAAATTGATTAAAAATAAAATCCATAAATAATTAAAATAATGATATTAATTACCATATATATTTTTATATGGATATATATATTAATAATAATATTAATTTTATTATTATTAATAATATATTTTAATAGTCCTGACTAATATTTGTGCCAGCAGTCGCGGTAACACAAAGAGGGCGAGCGTTAATCATAATGGTTTAAAGGATCCGTAGAATGAATTATATATTATAATTTAGAGTTAATAAAATATAATTAAAGAATTATAATAGTAAAGATGAAATAATAATAATAATTATAAGACTAATATATGTGAAAATATTAATTAAATATTAACTGACATTGAGGGATTAAAACTAGAGTAGCGAAACGGATTCGATACCCGTGTAGTTCTAGTAGTAAACTATGAATACAATTATTTATAATATATATTATATATAAATAATAAATGAAAATGAAAGTATTCCACCTGAAGAGTACGTTAGCAATAATGAAACTCAAAACAATAGACGGTTACAGACTTAAGCAGTGGAGCATGTTATTTAATTCGATAATCCACGACTAACCTTACCATATTTTGAATATTATAATAATTATTATAATTATTATATTACAGGCGTTACATTGTTGTCTTTAGTTCGTGCTGCAAAGTTTTAGATTAAGTTCATAAACGAACAAAACTCCATATATATAATTTTAATTATATATAATTTTATATTATTTATTAATATAAAGAAAGGAATTAAGACAAATCATAATGATCCTTATAATATGGGTAATAGACGTGCTATAATAAAATGATAATAAAATTATATAAAATATATTTAATTATATTTAATTAATAATATAAAACATTTTAATTTTTAATATATTTTTTTATTATATATTAATATGAATTATAATCTGAAATTCGATTATATGAAAAAAGAATTGCTAGTAATACGTAAATTAGTATGTTACGGTGAATATTCTAACT
GTTTCGCACTAATCACTCATCACGCGTTGAAACATATTATTATCTTATTATTTATATAATATTTTTTAATAAATATTAATAATTATTAATTTATATTTATTTATATCAGAAATAATATGAATTAATGCGAAGTTGAAATACAGTTACCGTAGGGGAACCTGCGGTGGGCTTATAAATATCTTAAATATTCTTACATAAATATTAATCTAAATATTAATATAAATATTAATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATATTAATATAAATATAAATATTAATATAAATATAAATATAAATATAAATATATTTTAATATAATATAATATAATATATAATATATTATATAAATATAATATATAAATAATATAATAAAATATTTTAATATATATATAATATAATATAATTATTATTATAATTTAATATAAATTATTATTATAATTTAATATAATAAATAAATAAATAATTATAATTATAATTATAATTATAATCTCAATATATAAATGATAAATTATTATAAATACAAAGGAAATAATTGATTTTTAAAATATATTTAATAAAATATATAATATAAATTATACTTTTTTTGTTATTATATAATAATTATATTAATATATTTAATAGAATTAAACTCCTTCGGCCGGACTATTATTCATTTTATATATTAATGATAAATCATTAATTATTATTAATAAATTTATTTATAATATTTAATTTTATATATTATTATTTATAATAAAAAAAATTATATTATAACAATTTAATTTTAATTTTTATTTTTAAATTATAAAATTAATAATTTATTTGTTTAAATAAAATTTATAACTCCTTCGGGGTTCGGCCGGACTATTAATATAAATAAATAATAAATATTTATAATAAAATAATATACATCTTCTTTAAATAAAAAAAGGGGACATTATAAATAGTATATAAATATATTATATCTTTTTTATTATTATTATTAATAAATAATAATAATAATTTATATATTTATAATATATTTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAATGTATTATAATTATTACATATAATTATTATTATTCACTTCTTATTAAAAATAATACTCTATATAATTTATATAATTTATTTTAATATATATATATTTATATATAATATAATATATATATTTATTTATTATAATCATTTTTTTTTAACTTAAAATAAAACTTATTATAATTTATATAATTTATAATTTTTATATAAAAATAATTATATAATTTTTATTTATTTATATAATAATAATATTATTTGTTATATATTATATATTATATATATAATAAATAAATAAATAATAAATAATAATAATAAGGATATAGTTTAATGGTAAAACAGTTGATTTCAAATCAATCATTAGGAGTTCGAATCTCTTTATCCTTGATAATAATAATAAAAATATGTATTTATTTAATTATTTTAATATTTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATTAATATAATATAATATAATATAAATATTCATTTATCTTTTTTTTAATATTCTTAATTAATTAATTAATTAATATATTAATTATAAAAAATATATTATAATTTTATTATTAATAAGTATAAATATATTATTAATAATAATTTATTAAAAATATATTATTATAATATATTAATATATCATAATTATAATCAATATTATATTATTTAATTTTATAATACTTAATTATTAATATATTATTCATATATATATAAATTAAATTAAATTAATTATATTGAATATATAAATATATATATATATAAATATATAAAAAATTATATAAATTATTTTAAGTAAAAATAATATTAATAAAAATTATACAATAATAATAATAAATATTCATTATTATTTAATTAATATCTCCTTTACTTCTTTTTCCTCCGTTGAGGACTTATTATTAAGTATATTATTATATACTACTTAAGATTATAT
ATATAATATATATATATATATTATATATAAAATATAAATATATAAATAATATAAAAATTAATAAAATAAATAAAATAAATTAGTCCGATCGAATCCCCTATTTAATTAAATTAAATTAAATTAAGAAAGAGATAAATTTATATAAAATATTATTTATAATTAATTATAATTAAATTATAATATAATATAATATAAATAATAATATAATAAAAATAAAAATAAAATAATATTAGATTATATTATATAATTTATATAATTTTTTAATAATAATAATAAATAAGTTTATTTATAATTATAAATATAAATATAAATATAAATAAAGAAGGTATTATATTTTATAAAATATAATAATAATACAAAATTTATATTTTAATAAATATTAATATAAGTTTAAAGTTCCGGGGCCCGGCACGGGAGCCGGAACCCCGAAAGGAGAAATAAATAATATATTTATAAAAAATTAAATAAATAAATATTATCTATTTAAAAATAAATATAATATAATATAATATAATAATTCTAAATATAAATAATATTTATTATAATTATTATAATAATTGTATTATTTATTAATAATATATATAATTATATTAAAACTAATATTACATTATTTTGTATATTTAAACAATTAAATTGATTATTCTTATTTGTAATCTTTATTTATTTTATTATATCTTATTAATGATAAATTATAATTATTATTAAAATAATAATTTACTTCTTTTGATATAAAAATAAAATAATATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGGAAGGAGATAAATATATTATATTTTTATTCCTACCTATTAAAGGTAAAGACTCGATTCTCATAATTAAATTTATATCCTTCGGCCGGATTAATTTATTTTATTTATATTTATATTTATAGTGAATACCTTTTTTAATATTTATTTTTAATATTTATTTTTAATATTTTATTTTTAATAAAATATAATCTTGTAAGTAAGAAAAGAATTTCGGTGATTGGAACCTTGAAAGGATAAATTTCTTATTTATTATAATATTTATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTATTATTAAACATTTAATATATTATATTAATATTTAATTTAAATGATTAATATATTATTATAATAATATTTATTTTATATTAAAATATTATAATTAATATATATATATTTATTTTAATAATATTATTATTATTATTATTAAAATTATTATTTTTATAAATATATATATATATATATATATATTATTTTTATTCTTATATAAATTATATAAAAAAAATATATATAATATATAATTAATTAATATATATTATTTAAATTATATATTATTTAAAATACTTTTTATATTATATCTTCTTTAAATTAAAATATAATTATTATTTATATTATAATTATTTATGAAATATTATTATTAAAATAAAAAAGAGGTTTAGACTATATATTTATTATTTATAAACTTATTATATTATTTATTATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATAAATAAAATAAAAAATAATAAATATTAATATTATTAAATATTATTTATAATAAATATTAATATTATTAAATATTATTCATATTAATAAATTTTATTATTATTTGTAATATATTAAATATTAATAATATATATATTATTTATTATAATGAAAACCTATCCTATATTATCCTATCATATAATATCATATCATATTATATTATATCTTATTATATGATATATAAAGTATTCACTCTATATGAGGTTATGATTATTATATAAATCTTATTTTATTTTTATTTTTATTTGGACTAATAATAATTATAATAATAATTATTGATATGTTCTAATATTAATAAATACATATTTATATTATAATATAAATATTCATTTCTTACTAATTAATAAAAAGTTTTTATATTCATTATAATATAAATATATAAATATATATAAATATTTTAATAATTATAAT
TATATTAAGATATTATAAATATATATTTATTTTTTTTTATAAAATAAATAAATAAATAAATAATTAATATTTTTATATTATAACTTATTTTTATAATAATAATAAGTATTTTATTTTTTATTATATTATTATTTATATAATTATATATATATTAATTTCAATTTAATTAATTAATTAATTGGTATTTGGCATATAATATCAATTAATTGTAATTCTTATAAGAATTAATTAATTAATATGCTTTTTATATAATTTATACTTTTATATTTCTCCTTCCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATTATTATTATTTTTATTTATTTATTATTAAAATATAATAATAAATAGTCCGGCCCGCCCCGCGGGGCGGACGCCGGAGGAGAATTATATTTTTATATAATAATTTATATTTCTATATATATATATATATATTATATATAAATATTATTATATATATTTTTATATATATTATAATTATATTCATTAATATTTTATTATAGTGGTGGGGTCCCAATTATTATTTTCAATAATAATTTATCATGGGACCCGGATATCTTCTTGTTTTTATTTATTATTTTATTAAATTTATTTTAATTATTTATTTATAATTTATATTATACAATTTATTATTTCGTTAATACCTTTATTTATATTATATAATATATTATATTATTATAATATATTTATTGATTATATTAATACATTTAACTAATGTGTGCTCTATATTTATTGAATAGTTTGGTTCTTATCACCCACCCCCTCCCCCTATTACGTCTCCGAGGTCCCGGTTTCGTAAGAAACCGGGACTTATATATTTAATACTAAAAATATAACTACATTACTTTTTTAATATATATAACAATATATATATATATATATATTAATTATATAAAATATAATACTCTATATTAAATATTATTTTTATCAATATTTATTTATATATATAATAATAATAATAATAATCAATATTAATTATTTATATATATAAGATTAATATTATTTAATATATTATGAATAATTTAATTAATAAATCTTTAAATATTATCATAAAAATATAAATTAAATAATTTCTTATTTATAATAAAGAATAATAATATATATAAATATAATAAAGAATGTAAATAATATATATATAATATAATATAATATAAAAAATATATATATATATAAATATATATATAATATATAGATAATAATATTTTTATATAATTTATTTTATTATTAAGTAATAAATAATAAAAAAATCAATATATTAAATAATATATTTATATTAGTTCGGTTTAGTTGGTATTTTGTAATGAGTAAAAAGTAATATATAATATTAAATAATAAGTATTGATATAAGTAATAGATATAATAATAATATTATTAATATTTTATATAAATAATATTAATAATATAGATTATGAAAGAGAGTATTAATATCATTAAATATATATATATGTTATATAATTTAAATGATTTTAATATATATATATATATTATATTATAGATTATGATACATTTATATAAATAATATATATATAAAAATTAATTATACTATTACTTTATAATATAATAATATTTATTTATAAAGATATAAAAGAATTGTTTAAAGTTATAACTAAAATATTATATAGTATTCATTAATAATTAATATTATAAATTCAACTATTGTTATATTTATAAATAGAATAATATATTATTATCCTTTAAGATATAACAATAATTATTTAAATTAAATTAAATTAAATTTAATTAATTTTTTTTTTTAATGAATATAATAATAATAATATTATTAAAATTAATATATAAAAAAAAAGTAAAAATGGTACAAAGATGATTATATTCAACAAATGCAAAAGATATTGCAGTATTATATTTTATGTTAGCTATTTTTAGTGGTATGGCAGGAACAGCAATGTCTTTAATCATTAGATTAGAATTAGCTGCACCTGGTTCACAATATTTACATGGTAATTCACAATTATTTAATGGTGCGCCTCTCAG
TGCGTATATTTCGTTGATGCGTCTAGCATTAGTATTATGAATCATCAATAGATACTTAAAACATATGACTAACTCAGTAGGGGCTAACTTTACGGGGACAATAGCATGTCATAAAACACCTATGATTAGTGTAGGTGGAGTTAAGTGTTACATGGTTAGGTTAACGAACTTCTTACAAGTCTTTATCAGGATTACAATTTCCTCTTATCATTTGGATATAGTAAAACAAGTTTGATTATTTTACGTTGAGGTAATCAGATTATGATTCATTGTTTTAGATAGCACAGGCAGTGTGAAAAAGATGAAGGACCTAAATAACACAAAAGGAAATACGAAAAGTGAGGGATCAACTGAAAGAGGAAACTCTGGAGTTGACAGAGGTATAGTAGTACCGAATACTCAAATAAAAATGAGATTTTTAAATCAAGTTAGATACTATTCAGTAAATAATAATTTAAAAATAGGGAAGGATACCAATATTGAGTTATCAAAAGATACAAGTACTTCGGACTTGTTAGAATTTGAGAAATTAGTAATAGATAATATAAATGAGGAAAATATAAATAATAATTTATTAAGTATTATAAAAAACGTAGATATATTAATATTAGCATATAATAGAATTAAGAGTAAACCTGGTAATATAACTCCAGGTACAACATTAGAAACATTAGATGGTATAAATATAATATATTTAAATAAATTATCAAATGAATTAGGAACAGGTAAATTCAAATTTAAACCCATGAGAATAGTTAATATTCCTAAACCTAAAGGTGGTATAAGACCTTTAAGTGTAGGTAATCCAAGAGATAAAATTGTACAAGAAGTTATAAGAATAATTTTAGATACAATTTTTGATAAAAAGATATCAACACATTCACATGGTTTTAGAAAGAATATAAGTTGTCAAACAGCAATTTGAGAAGTTAGAAATATATTTGGTGGAAGTAATTGATTTATTGAAGTAGACTTAAAAAAATGTTTTGATACAATTTCTCATGATTTAATTATTAAAGAATTAAAAAGATATATTTCAGATAAAGGTTTTATTGATTTAGTATATAAATTATTAAGAGCTGGTTATATTGATGAGAAAGGAACTTATCATAAACCTATATTAGGTTTACCTCAAGGATCATTAATTAGTCCTATCTTATGTAATATTGTAATAACATTGGTAGATAATTGATTAGAAGATTATATTAATTTATATAATAAAGGTAAAGTTAAAAAACAACATCCTACATATAAAAAATTATCAAGAATAATTGCAAAAGCTAAAATATTTTCGACAAGATTAAAATTACATAAAGAAAGAGCTAAAGGCCCACTATTTATTTATAATGATCCTAATTTCAAGAGAATAAAATACGTTAGATATGCAGATGATATTTTAATTGGGGTATTAGGTTCAAAAAATGATTGTAAAATAATCAAAAGAGATTTAAACAATTTTTTAAATTCATTAGGTTTAACTATAAATGAAGAAAAAACTTTAATTACTTGTGCAACTGAACTACCAGCAAGATTTTTAGGTTATAATATTTCAATTACACCTTTAAAAAGAATACCTACAGTTACTAAACTAATTAGAGGTAAACTTATTAGAAGTAGAAATACAACTAGACCTATTATTAATGCACCAATTAGAGATATTATCAATAAATTAGCTACTAATGGATATTGTAAGCATAATAAAAATGGTAGAATAGGAGTGCCTACAAGAGTAGGTAGATGACTATATGAAGAACCTAGAACAATTATTAATAATTATAAAGCGTTAGGTAGAGGTATCTTAAATTATTATAAATTAGCTACTAATTATAAAAGATTAAGAGAAAGAATCTATTACGTATTATATTATTCATGTGTATTAACTTTAGCTAGTAAATATAGATTAAAAACAATAAGTAAAACTATTAAAAAATTTGGTTATAATTTAAATATTATTGAAAATGATAAATTAATTGCCAATTTTCCAAGAAATACTTTTGATA
ATATCAAAAAAATTGAAAATCATGGTATATTTATATATATATCAGAAGCTAAAGTAACTGATCCTTTTGAATATATCGATTCAATTAAATATATATTACCTACAGCTAAAGCTAATTTTAATAAACCTTGTAGTATTTGTAATTCAACTATTGATGTAGAAATACATCATGTTAAACAATTACATAGAGGTATATTAAAAGCACTTAAAGATTATATTCTAGGTAGAATAATTACCATAAACAGAAAACAAATTCCATTATGTAAACAATGTCATATTAAAACACATAAAAATAAATTTAAAAATATAGGACCTGGTATATAAAATCTATTATTAATGATACTCAATATGGAAAGCCGTATGATGGGAAACTATCACGTACGGTTTGGGAAAGGCTCTTTAACACGTGGCAACATAGGTTAATTTGCTATTTCATTTTTAGTAGTTGGTCATGCTGTATTAATGATTTTCTGTGCGCCGTTTCGCTTAATTTATCACTGTATTGAAGTGTTAATTGATAAACATATCTCTGTTTATTCAATTAATGAAAACTTTACCGTATCATTTTGGTTCTGATTATTAGTAGTAACATACATAGTATTTAGATACGTAAACCATATGGCTTACCCAGTTGGGGCCAACTCAACGGGGACAATAGCATGCCATAAAAGCGCTGGAGTAAAACAGCCAGCGCAAGGTAAGAACTGTCCGATGGCTAGGTTAACGAATTCCTGTAAAGAATGTTTAGGGTTCTCATTAACTCCTTCCCACTTGGGGATTGTGATTCATGCTTATGTATTGGAAGAAGAGGTACACGAGTTAACCAAAAATGAATCATTAGCTTTAAGTAAAAGTTGACATTTGGAGGGCTGTACGAGTTCAAATGGAAAATTAAGAAATACGGGATTGTCCGAAAGGGGAAACCCTGGGGATAACGGAGTCTTCATAGTACCCAAATTTAATTTAAATAAAGTGAGATACTTTAGTACTTTATCTAAATTAAATGCAAGGAAGGAAGACAGTTTAGCGTATTTAACAAAGATTAATACTACGGATTTTTCCGAGTTAAATAAATTAATAGAAAATAATCATAATAAACTTGAAACCATTAATACTAGAATTTTAAAATTAATGTCAGATATTAGAATGTTATTAATTGCTTATAATAAAATTAAAAGTAAGAAAGGTAATATATCTAAAGGTTCTAATAATATTACCTTAGATGGGATTAATATTTCATATTTAAATAAATTATCTAAAGATATTAACACTAATATGTTTAAATTTTCTCCGGTTAGAAGAGTTGAAATTCCTAAAACATCTGGAGGATTTAGACCTTTAAGTGTTGGAAATCCTAGAGAAAAAATTGTACAAGAAAGTATGAGAATAATATTAGAAATTATCTATAATAATAGTTTCTCTTATTATTCTCATGGATTTAGACCTAACTTATCTTGTTTAACAGCTATTATTCAATGTAAAAATTATATGCAATACTGTAATTGATTTATTAAAGTAGATTTAAATAAATGCTTTGATACAATTCCACATAATATGTTAATTAATGTATTAAATGAGAGAATCAAAGATAAAGGTTTCATAGACTTATTATATAAATTATTAAGAGCTGGATATGTTGATAAAAATAATAATTATCATAATACAACTTTAGGAATTCCTCAAGGTAGTGTTGTCAGTCCTATTTTATGTAATATTTTTTTAGATAAATTAGATAAATATTTAGAAAATAAATTTGAGAATGAATTCAATACTGGAAATATGTCTAATAGAGGTAGAAATCCAATTTATAATAGTTTATCATCTAAAATTTATAGATGTAAATTATTATCTGAAAAATTAAAATTGATTAGATTAAGAGACCATTACCAAAGAAATATGGGATCTGATAAAAGTTTTAAAAGAGCTTATTTTGTTAGATATGCTGATGATATTATCATTGGTGTAATGGGTTCTCATAATGATTGTAAAAATATTTTAAAC
GATATTAATAACTTCTTAAAAGAAAATTTAGGTATGTCAATTAATATAGATAAATCCGTTATTAAACATTCTAAAGAAGGAGTTAGTTTTTTAGGGTATGATGTAAAAGTTACACCTTGAGAAAAAAGACCTTATAGAATGATTAAAAAAGGTGATAATTTTATTAGGGTTAGACATCATACTAGTTTAGTTGTTAATGCCCCTATTAGAAGTATTGTAATAAAATTAAATAAACATGGCTATTGTTCTCATGGTATTTTAGGAAAACCCAGAGGGGTTGGAAGATTAATTCATGAAGAAATGAAAACCATTTTAATGCATTACTTAGCTGTTGGTAGAGGTATTATAAACTATTATAGATTAGCTACCAATTTTACCACATTAAGAGGTAGAATTACATACATTTTATTTTATTCATGTTGTTTAACATTAGCAAGAAAATTTAAATTAAATACTGTTAAGAAAGTTATTTTAAAATTCGGTAAAGTATTAGTTGATCCTCATTCAAAAGTTAGTTTTAGTATTGATGATTTTAAAATTAGACATAAAATAAATATAACTGATTCTAATTATACACCTGATGAAATTTTAGATAGATATAAATATATGTTACCTAGATCTTTATCATTATTTAGTGGTATTTGTCAAATTTGTGGTTCTAAACATGATTTAGAAGTACATCACGTAAGAACATTAAATAATGCTGCCAATAAAATTAAAGATGATTATTTATTAGGTAGAATGATTAAGATAAATAGAAAACAAATTACTATCTGTAAAACATGTCATTTTAAAGTTCATCAAGGTAAATATAATGGTCCAGGTTTATAATAATTATTATACTATTAAATATGCGTTAAATGGAGAGCCGTATGATATGAAAGTATCACGTACGGTTCGGAGAGGGCTCTTTTATATGAATGTTATTACATTCAGATAGGTTTGCTACTCTACTCTTAGTAATGCCTGCTTTAATTGGAGGTTTTGGTAACCAAAAAAGATATGAAAGTAATAATAATAATAATCAAGTAATAGAAAATAAAGAATATAATTTAAAATTAAATTATGATAAGTTGGGACCTTATTTAGCTGGATTAATTGAAGGTGATGGAACTATTCTAGTTCAAAATTCATCTTCAATAAAAAAATCTAAATATAGACCGTTAATTGTTGTAGTATTTAAATTAGAAGATTTAGAATTAGCTAATTATTTATGTAATTTAACTAAATGTGGAAAAGTGTATAAAAAAATTAATCGTAATTATGTATTATGACTTATTCATGATTTAAAAGGTGTATATACATTATTAAATATTATTAATGGATATATGAGAACACCTAAATATGAAGCATTTGTTAGAGGTGCTGAATTTATAAATAATTATATTAATTCAACAACAATTCTACATAATAAATTAAAAAATATAGATAATATTAAAATTAAACCATTAGATACATCAGATATTGGTTCAAACGCTTGATTAGCTGGTATGACAGATGCAGATGGTAATTTTTCTATTAATTTAATAAATGGTAAAAATCGTTCTAGTAGAGCAATGCCTTATTATTGTTTAGAATTAAGACAAAATTATCAAAAAAATTCTAATAATAATAATATTAATTTTTCTTATTTTTATATTATGTCTGCAATTGCACTATATTTTAATGTTAATTTATATAGTAGAGAACGTAATTTAAATTTATTAGTATCTCTTAATAATACGTATAAACTATATTATAGTTATAAAGTAATAGTGGCTAATCTATATAAAAATATTAAAGTAATAGAATACTTTAATAAATATTCTTTATTATCATCTAAACACTTAGATTTTTTAGATTGATCTAAATTAGTTATTTTAATTAATAATGAGGGTCAAAGTATAAAACTTAATGGTAGTTGAGAATTAGGTATAAATTTACGTAAAGATTATAATAAAACTAGAACTACGTTTACTTGATCTCATTTAAAAAATACATATTTAGAAAATAAATAAATA
AATTATTATTACTTTCTTCCCCTCCGAATCCGTAATATATTTACGGATATATAATCTCGTAGTGTAAAAGGTGTAACGAGATTATTAATAAGTTGCCGTAATATATTGTAAAATATATTATTATTACAACACTATATGCGGGAAAACCCTAAAGTCATAATATAATATTATCCCCACGAGGGCCACACATGTGTGGCCCTCGCGGGGTATGGTAAATTTAATTAAGTTATAAATGTACTATAGTATTAAAAATTATTATGAATAATTTCCCCACCCCCATGCGAAGCATGGGGGGGGGTATAAGTATGGACAATCCGCAGGAAACCAAATAATAATTAATATCCTGAAACAAAGTAAGTGAAGGAGATATCTTAAAATATATATAATATATATTTTATAAATTATTATGTAGGATCCTCAGAGACTACACGTGTTGCACCCATTATATTATGTATAATGGGTTGAAGATATAGTCCAAATATAATTGAAAGATTATAATAAAATGAACTATTTATTACCATTAATAATTGGAGCTACAGATACAGCATTTCCAAGAATTAATAACATTGCTTTTTGAGTATTACCTATGGGGTTAGTATGTTTAGTTACATCAACTTTAGTAGAATCAGGTGCTGGTACAGGGTGAACTGTCTATCCACCATTATCATCTATTCAGGCACATTCAGGACCTAGTGTAGATTTAGCAATTTTTGCATTACATTTAACATCAATTTCATCATTATTAGGTGCTATTAATTTCATTGTAACAACATTAAATATGAGAACAAATGGTATGACAATGCATAAATTACCATTATTTGTATGATCAATTTTCATTACAGCGTTCTTATTATTATTATCATTACCTGTATTATCTGCTGGTATTACAATGTTATTATTAGATAGAAACTTCAATACTTCATTCTTTGAAGTATCAGGAGGTGGTGACCCAATCTTATACGAGCATTTATTTTGATTCTTTGGTCAAACAGTGGCCCTTATTATTATATTAATAATATATAATGATATGCATTTTTCTAAATGCTGGAAATTATTAAAAAAATGAATTACAAATATTATAAGTCTATTATTTAAAGCCTTATTTGTAAAAATATTCATATCTTATAATAATCAGCAGGATAAGATAATAAATAATCTTATATTAAAAAAAGATAATATTAAAAGATCCTCAGAGACTACAAGAAAAATATTAAATAATTCAATAAATAAAAAATTTAATCAATGATTAGCTGGATTAATTGATGGTGATGGATATTTTGGTATTGTAAGTAAGAAATATGTATCATTAGAAATTCTAGTAGCATTAGAAGATGAAATAGCTTTAAAAGAAATTCAAAATAAATTTGGTGGTTCTATTAAATTAAGATCAGGTGTAAAAGCTATTAGATATAGATTACTTAATAAAACTGGTATAATTAAATTAATTAATGCAGTTAATGGTAATATTAGAAATACTAAAAGATTAGTACAATTTAATAAAGTTTGTATTTTATTAGGTATTGATTTTATTTATCCAATTAAATTAACTAAAGATAATAGTTGATTTGTTGGATTTTTTGATGCTGATGGTACAATTAATTATTCATTTAAAAATAATCATCCTCAATTAACAATTTCTGTAACTAATAAATATTTACAAGATGTACAAGAATATAAAAATATTTTAGGTGGTAATATTTATTTTGATAAATCACAAAATGGTTATTATAAATGATCCATTCAATCAAAAGATATAGTATTAAATTTTATTAATGATTATATTAAAATAAATCCATCAAGAACACTAAAAATAAATAAATTATATTTAAGTAAAGAATTTTATAATTTAAAAGAATTAAAAGCTTATAATAAATCTTCTGATTCAATACAATATAAAGCATGATTAAATTTTGAAAATAAATGAAAAAATAAATAAATTATTTAATAAAGATATAGTCCAAATTATATATATATAATATATATATATATAACAAGCACCC
TGAAGTATATATTTTAATTATTCCTGGATTTGGTATTATTTCACATGTAGTATCAACATATTCTAAAAAACCTGTATTTGGTGAAATTTCAATGGTATATGCTATGGCTTCAATTGGATTATTAGGATTCTTAGTATGATCACATCATATGTATATTGTAGGATTAGATGCAGATCTTAGAGCATATTTCCTATCTGCACTAATGATTATTGCAATTCCAACAGGAATTAAAATTTTCTCATGATTAATAAATCCCTTTAGCAAGGATAAAAATAAAAATAAAAATAAAAAGTTGATCAGAAATTATCAAAAAATAAATAATAATAATATAATAAAAACATATTTAAATAATAATAATATAATTATAATAAATATATATAAAGGTAATTTATATGATATTTATCCAAGATCAAATAGAAATTATATTCAACCAAATAATATTAATAAAGAATTAGTAGTATATGGTTATAATTTAGAATCTTGTGTTGGTATACCTCTATATACTAATATTGTAAAACATATAGTAGGTATTCCTAATAATATTTTATATATTATAACAGGTATTTTATTAACAGATGGTTGAATTGATTATCTATCTAAAAAAGATTTAGATAAAAAAACAATTATAGAAATTAATTGTAGATTTAGATTAAAACAATCAATAATTCATAGTGAATATTTAATATATGTATTTATATTATTATCACATTATTGTATAAGTTATCCTAAAATAAAAATTGCTAAAGTTAAAGGTAAATCATATAATCAATTAGAATTTTATACTAGATCATTACCATGTTTTACTATTTTAAGATATATATTTTATAATGGTAGAGTAAAAATTGTACCTAATAATTTATATGATTTATTAAATTATGAATCTTTAGCTCATATAATTATATGTGATGGTTCATTTGTAAAAGGTGGAGGTTTATATTTAAATTTACAATCTTTTCTAACTAAAGAATTAATTTTTATTATAAATATTTTAAAAATTAAATTTAATTTAAATTGTCTATTACATAAATCTAGAAATAAATATCTTATTTATATAAGAGTAGAATCTGTTAAAAGATTATTTCCTATAATTTATAAATATATTTTACCTTCTATAAGATATAAATTTGATATTATATTATGACAAAAAAAATATAATATGATTAATTAATTAATTAATTAATTAATTTATTTATTATTTACTTTTTTGATATATATAGAGGCAAACTCGAGGAAAACCATATAATTAGAATAAGTAATAATTATATGACAACCGTCGAACTAAATCATATTCAAGAAATTAATATGTAAAAGCGTAGAGATTAGACGCCTCTGGTTATCTAAGTAATATATATATATATATTATATGATAACATAAGGTATAATCCAATGAGATCAGTAATGATTTTAAAACAATAATTTTGTTTTAAGTATTAATAATAATATTAATATTCGACCTCTTAATTGAGGATATTATAATCATAATTTTTTATATTATAATATAAAATTTAACTAGCTAGATAATATTATATAAAAAAAAAAAATAATATTATATAAATTAATTAAAATAATTTTTATTAATTGAAACTGAAATGTTTTAAAGTTAAATAAAAGAGCTCTAATCCATGGTGGTTCAATTAGATTAGCACTACCTATGTTATATGCAATTGCATTCTTATTCTTATTCACAATGGGTGGTTTAACTGGTGTTGCCTTAGCTAACGCCTCATTAGATGTAGCATTCCACGATATTAATTTAATAAGTGTCGTGCTTAAAATTCACTAAAATAATATATAATAAATTATAATAAATATATAAAAAAAATAAAAAAAATAAAAAAAAATTAATATCTTATGATTAATTTTATATAAATAAAAATTTATTAAATATTATTGGTTATATATATATATATATTAATAATAAAAAAATATATATATATATATAGCTAACGGGGAAACTCTTATAATTATTATTTATATAATAAATAAGACAATCCCGTGAT
AACTTTAATATATATATATTATATATTAAAGTATTGTAGAGACTAAACGTGAATGATTTTAATATTATTTAAATATTAAAATTAAGAGATAGTCCAATCTTATATGTAAATATAAGTTAATACCAAAAAAAAAATAATATTATTTTGACTTATTATATATTAATATTATTAATAATAATTTTAACTAATAATAAAGTTTTTATAGAAACTTTATATTATTATTTAATATTTAATTTTCAATTAATATCTCCTTTTGGGGTTCCGGTCCCTGGTCCGGCCCCCGAAACTAAAGATATTAAGAATTTATATGAATCAATTATAAATAATTATATTAATATTTTAAATAAATATCTTATTAATATTAATAAAGATAATATTAATAAATTAAAATTTTTAGATAATTATACTGAAGAAGAAAAAGGTTATTATTTATCTGGATTATTTGAAGGAGATGGTAATATTTATACTAGATGTTTTTCAATTACTTTTTCTTTAGAAGATGTTTTATTAGCTAATTATTTATGTCTTTATTTTAAAATTGGTCATATTACAGCTAAATATAATTTTAATAAAGAATTAACAGCTGTTAAATGAAATATTATAAAAAAAAAAGAACAAGAAGTATTTATAAATTATATTAATGGTAAATTATTAACATATAAAAGATATGATCAATATTTTAAATATAATTTTAATAATCGTTTAAATATTAAATTATTAAAACCTAAAGAATTTGATTTACTATTAAATCCTTGATTAACAGGTTTTAATGATGCTGATGGTTATTTTTATCTAGGTTTTCAAAAACATAAAAATAGTCAATGATTAAAATTTCATTTAGAATTATCACAAAAAGATAGTTATATTTTAGTCCGGCCCGCCCCCGCGGGGCGGACCCCAAAGGAGATATTATTAAAAAATATTTTAAACTTGGTGGTATTTTAAAAAGAGATTATAAATCTGGTGCTACAGCTTATATTTATAAAGCTCAATCATCAAAAGCTATAAAACCTTTTATTGAATATTTTAATAATTATCAACCATTAAGTCTTAGAAGATATAAACAATATTTATTATTAAATATTGCTTACTTATTAAAATTAAATAAATTACATATATTACTTAATTCTTTATTAATATTAAAAGAATTAATATTATTACAAAGTGTTAAAAATATATCTTTAGAAATAAAAAATGAATTAAATAATAGAGTTAAAATTATTATTAATAAACTTCATTATAACAATATCGAATAATGATAATATTAAAGAGTAAAATTCTTAAAGTGTTAATTAAATAATATTCTTTTTTTTTTATGACTTACTACGTGGTGGGACATTTTCGTGCGGTCTGAAAGTTATCATAAATAATATTTACCATATAATAATGGATAAATTATATTTTTATCAATATAAGTCTAATTACAAGTGTATTAAAATGGTAACATAAATATGCTAAGCTGTAATGACAAAAGTATCCATATTCTTGACAGTTATATTATAAAAAAAGATGAAGGAACTTTGACTGATCTAATATGCTCAACGAAAGTGAATCAAATGTTATAAAATTACTTACACCACTAATTGAAAACCTGTCTGATATTCAATTATTATTTATTATTATATAATTATATAATAATAAATAAAATGGTTGATGTTATGTATTGGAAATGAGCATACGATAAATCATATAACCATTAGTAATATAATTTGAGAGCTAAGTTAGATATTTACGTATTTATGATAAAACAGAATAAACCCTATAAATTATTATTATTAATAATAAAAAATAATAATAATACCAATATATATATTATTTAATTTATTATTATTATATTAATAAAATTTAATATATATTATAAATAATTATTGGATTAAGAAATATAATATTTTATAGAAATTTTCTTTATATTTAGAGGGTAAAAGATTGTATAAAAAGCTAATGCCATATTGTAATGATATGGATAAGAATTATTATTCTAAAGATGAAAATCTGCTAAC
TTATACTATAGGTGATATGCCTATCTTTATTTATATATATATTATTATTATTAATAATAAAAAAAAAAATTAAAAAAAAGATAGGAGGTTTATATATAACTGATAAATATTTATTATATTATTTTTTTTTATAATAAATATTAAAAGATATTGCGTGAGCCGTATGCGATGAAAGTCGCACGTACGGTTCTTACCGGGGGAAAACTTGTAAAGGTCTACCTATCGGGATACTATGTATTATCAATGGGTGCTATTTTCTCTTTATTTGCAGGATACTATTATTGAAGTCCTCAAATTTTAGGTTTAAACTATAATGAAAAATTAGCTCAAATTCAATTCTGATTAATTTTCATTGGGGCTAATGTTATTTTCTTCCCAATGCATTTTTTAGGTATTAATGGTATGCCTAGAAGAATTCCTGATTATCCTGATGCTTTCGCAGGATGAAATTATGTCGCTTCTATTGGTTCATTCATTGCACTATTATCATTATTCTTATTTATCTATATTTTATATGATCAATTAGTTAATGGATTAAACAATAAAGTTAATAATAAATCAGTTATTTATAATAAAGCACCTGATTTTGTAGAATCTAATCTTATCTTTAATTTAAATACAGTTAAATCTTCATCTATCGAATTCTTATTAACTTCTCCACCAGCTGTACACTCATTTAATACACCAGCTGTACAATCTTAAGTTATAAAATTTAATTATTTACTTAATAATTAAAAAGTAAATATTATATCTAAACTTAATAATATAATAATAATATTCTTATAAAAATATATAAAAAAAAATATATAAAATTTATTAAAATATCTCCTTTCGGGAACTATAATATATTTATATAAATAAATACTAATATAATCCTATTATATATATATATATATAAAATAATATATATATATAATTAATATAAATAATATTTATAATAATTTTTTAATAATATATATAATTTAATATATTAATGAATATTATATAATTATTAAATATATTATAATATTATTATTATTTTATAATAAAAATATTTTTAATACTAATTATTATTTATTATTTATAAATATATAAATAGTATGTTTAATATTATTAATACTAAAAAAAATATAATTATAATTAGGATCTAACAATACATTTATCTGATTAATATTAATATTAATATTAATATTTATATTAATAAACGGATTAAATTAATTGTATCCAATTTAATTAAATTATAGATATATTATTTATAATATTAATATATTGTTTTATTAAAAAGGTAAAAATAGTTTTTATTTTATATATAAATATAGGATATAAATAAATATATTATAGTGAACCCCGAAAGGAGAATATATTAAGAATATATTTATATTTTACATATAATTATTTATAATATAAATATCTCCGCAAAGCCGGATTAATGTAATTATTTAATAATTTTATTTAATAATTTATTAAAATAAATATTTACATTTGATAATATTTATATTATGTCAGTTATTTTATATTAATGTTTAATCTATTATAATATTTTTTTTTATAAATATATTATTTATTTATATTAATTATATATATATATTATTTTTATAATATATATATATTTTTATTAAATATTTATTAAATATTTATTAAATTATTATAATGTTGTTATTAATCTTATTAAAAAATATATATAAAAATGCCACAATTAGTTCCATTTTATTTTATGAATCAATTAACATATGGTTTCTTATTAATGATTCTATTATTAATTTTATTCTCACAATTCTTTTTACCTATGATCTTAAGATTATATGTATCTAGATTATTTATTTCTAAATTATAATATATATTATTAATTTATTTATTCATATAAATATTATTATTATATATAAATATTAATAATATTTATACTTATTTAATAATAATAAAATAAAAAATAATTATAATTTAATATATTTAATATATTTCCTTACGGACTATATATTTATATATATATATTAAATACAATTTAATTTAATTTAATTATGTTA
TTTATTAAATAAAGTTATATTATGATATAATAACAATATTATATATTATTATATAATTATAATATATTTTAATATAATTATCAAAAGAAATAATAAAAAAATATTAATAAGAATATAATTTAATAATTATTAAAAAAAAATTCTTATAGTCCGGCCCGCCCCCCCCGCGGGGCGGACCCCAAAGGAGGAGTAATAAAAATTATTAAATACAAATATTATATATATATAATTCATTATATATATATATATATAATAATTAATCTTATTTTTTTATATATTTATTTATATATCTATTTATATTTTATATATATTTATTTATATATCTAAGGGGTTCGGTCCCTCCCCCCGTAAGTATAATATACGGGGGTGGGTCCCTCACTATTTATATTTTTATTTTATATATTTTATATATTTATAAATAAAGTATAATAAGATATAATTATGATTAATTATTTATAAGTTATAGTTTTATAAATTTATAATTATTATGTTTAATTTATTAAATACATATATTACATCACCATTAGATCAATTTGAGATTAGACTATTATTTGGTTTACAATCATCATTTATTGATTTAAGTTGTTTAAATTTAACAACATTTTCATTATATACTATTATTGTATTATTAGTTATTACAAGTTTATATCTATTAACTAATAATAATAATAAAATTATTGGTTCAAGATGATTAATTTCACAAGAAGCTATTTATGATACTATTATAAATATGCTTAAAGGACAAATTGGAGGTAAAAATTGAGGTTTATATTTCCCTATGATCTTTACATTATTTATGTTTATTTTTATTGCTAATTTAATTAGTATGATTCCATACTCATTTGCATTATCAGCTCATTTAGTATTTATTATCTCTTTAAGTATTGTTATTTGATTAGGTAATACTATTTTAGGTTTATATAAACATGGTTGAGTATTCTTCTCATTATTCGTACCTGCTGGTACACCATTACCATTAGTACCTTTATTAGTTATTATTGAAACTTTATCTTATTTCGCTAGAGCTATTTCATTAGGTTTAAGATTAGGTTCTAATATCTTAGCTGGTCATTTATTAATGGTTATTTTAGCTGGTTTACTATTTAATTTTATGTTAATTAATTTATTTACTTTAGTATTCGGTTTTGTACCTTTAGCTATGATCTTAGCCATTATGATGTTAGAATTCGCTATTGGTATCATTCAGGGATATGTCTGGGCTATTTTAACAGCATCATATTTAAAAGATGCAGTATACTTACATTAAATTATAAAATAAAATTATAAAATAAAATAATTTACATATGGAGTATTAAACTATAATAAATACAATATACCCCATCCCCCCCTTTTAATAATATTCTTTTATCTAATAAAATATTTATTTATTAATATTATTATTATCTTCTTCAAGGACTTATTTAATATATTTAATAACTTATTATACTTATTTATATTTATAATTAATACAAATATATTATTAATCTTACTCCTTCGGAGTTCGGCCCCCCATAAGGGGGGGACCTCACTCCTTCCCCACTGCACTGGATGCGGGGACTTATTTTTATTATTATTATTTAATCTTTATTTATAAAATTATATATTATATATAAATTATTATACTTAATAATTAAAAAAAAACCTCTAATTATTATTAATATTATATATAATATATATATTCTCATTAATGTTATATATAATATATATATTCTCATTAATATATTAATATAGTATTAAAAAAAATAAAATATTTAATAAATATTATTATTAATAATATTTATTAAAAATAATATAACATAATAAATATAAGATTATTATATAATATATTTATTATATCATATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGGAAGGAGAAATTATAACATATTTTTTAATAATATTCATATTTATTTTATATACAAATAAATATATTTATTTAGAATAATAAAAAAAAATAATAAATAAATATATTATTATCATTATTATACTTT
ATTCATTATTTATTATAATAATTATATATAACAATTATAATATATAATTATATTTTATATAATATTATATTAATATTTAATATATTTATTATTATTATTACTTCTATGGAAACTTTATATTTTAGATATTTTTATTATTATTATTAATTTATAATGTTATATTTTTGATTTATAAATATATAAGTCCCGGTTTCTTACGAAACCGGGACCTCGGAGACGTAATAGGGGGAGGGGGTGGGTGATAATAACCAGAATATTCAATAAATACAGAGCACACATTAGATAAATTTTATAATATAACCAATATAAAATAAAATTAAAATAATTAATATATATATATAAATATAATAAATTATTATATATAAATATATATAATTTTTATAATAAATATTATAATATTATATAAATAAATAATTATAATATATAATAAATATATAATAATAATAAAAATATTAACAATATAATAAAAATTTATAATATAAATATAAATTATAAATAAGTTAAATTAATAAAATAATAAATGATTAACAAGAAGATATCTGGGGTCCCATTAATAATTATTATTTTCAATAATAATTGGGACCCCCCACCATTATAATATCATATTAATTAATATAATAATAATGTATATAAAATAGAAATAATAATTAATATAATAATAATAATATATATAAAATAGAAATAATAATTAAATATATATATAAATAATTATTTATATAATATATTATAAATAATAATAATAATAAATATTTATTAATTAATAATGATTATAAATATTTTATTTAATATAAATTTATAACTATTTTATTATATATATATTTTTTATTCATAAAAATTCCTTTTGAGGATTTTTATTTTATATAAATATCTTCTAATATTTATAATAAATAATAATATATTCATTATATTTATAATTATATATAATGTAATACGGGTAAACATTACCCGTTGTTCACGGGTAATGTTTACCCTATTTTATATAATTCTTAATAAATATATTTATATTTTTATATAAAAAAAATTATAATAATTTATTAATTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGAACTCCGGAACTATAAAAATAATTTTAATATAATTTATATATTTTATGATTAATATAATATATTATTAATGTAACTCCTTCGGGATTTGGTCCCCCTCGTAAGTATATAGTATATAGTATATAGTATACGGGGGGTCCCTCACTCCTTCGGGGTTCGGTCCTCCCTTACGGGTACGGATACGGATACGAATATGGGGAGTCCCTCACTCCTTATCACTACGCTGAAGGTGGAATTTATTTTATATTATTATTAAATCTTTATTTATTTAATTATATATTTAATATATATATTATTATAATAAAACACCTAATTATTATTAATGTTATATTTAATATAATATATATATTCTTAAAAATTTATATAATATAAATAAATAAAAAAAAAAGAAAGTACATAATTAATATTATTATAAATAATATTATTAAAAAGAATATAATATAATTAATAGAAAGACGTTTTAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAGAGTTTTGGTTTACATATCAAGACCCAATTCAATTGAAACTATTTATTTATTAATCTCCTCCCCTCCCCCTCACTATTATTATAAGTACAATTAGGGCGCCAACCCCGCAGTGTTATTTACTGGGAAATGTTTATCCCAATTAATATAATAACGAGAGTTATTAATTATTATTTATAAATTCATATAATGTAATATAATGTAATGTAATTAATAGAACATTATTGTGTTATTCACCAGTGTTAAGATATATTAATCCCAATTTTATTTAATAGTGAAGATTATATTTTATTAATTATGAATCCATATTATTATTATTTAATATATTTATAATATTATATATAATTATAATTATAAATAATTTATATAAAAAAAGTTTTATTAAAAAATATTATTAAAAATATAATATTAAT
AATAAATAAAAATAATATTATACTCTTAATAGAATTTATAATGATAAAAATTAAGATGAAGACTTTTTTTTATAATTATTATAAATTTATATAAAAATAATATATATATATTTATATTTATTTTATTAATATATATAATATATTTATGTATATTAAAAAGATATATTTAAATATTTTTATTTTTTTTTTATAAGATAATTTTTGTAAATATATAAGTAATAAATTAAGTTTTATAGGGGGAGGGGGTGGGTGATTAGAAACTTAACTGAATAATATATATAAAGCATACATTAGTTAATATTTAATAATATAATCAATATATAATAATTATAAAATAATTAATTATATAATAATAATAATGTATAAACAATATAATAAATTGTATAAAATAAAATATAAATCATAAATAAAGCTAAATTAATAAAATAATAAATGATAAACAAGAAGATATCCGGGTCCCAATAATAATTATTATTGAAAATAATAATTGGGACCCCATATAGAATATAAATAATTAAATATATATATATAAATAATAATTTATATAATATATTATAAATAAATAATAATAAATATTATTAATCTATAATAATTATAAATATTTTATTAATATAAATTTAATAATTATATATATTTTTATAATAACTCCGAAAGAGTAAGGAGATATTAATTTCTTATAAAAATTTATTAATAATAATAATATATAAAATATATAAATAATATATTATATATAAAATAAAATAAAATAAATAATATATTAAAAATATTGAAAGTATTTTAATAAATAATAAATTTAAAATTCATATTTATAATAATAAATAAATAAATAAATAAATAAGTAAATATTTAGATTCTCATTAATATTAATATTTATATTTCTTTTTTTTTATAATAATAAAAATATCATATATAAATATAATATAATATAATATAATAAATTATTATATATAAATAATAAATATTAAATATAATATATAATAATATATAATCTTACAATTTATAATTTAATAAAGAAGGAAATAAATAATAATAACTCCTTTTGGGGTTCCGGTGGGGTTCACACCTTTATAAATAATAAATAAAGATGTTTACTCCTCTTCGGGGTTCCGGTCCCCTTTTTGGGTTCCGGAACTAATTAATATTTTATATAATAATAATAATATATTAATATAATTTCATTATTAATAAATATCTCCTGCGGGGTTCGGTTCCCCCCCGTAAGGGGGGGGTCCCTCACTCCTTCGGAGCGTACTATTATTATAAATAATTATATATTATAATATAATTAAAAAGTATTATAATTGAAACGAAAATTGTAATTTTAAATGGAATAATAATTATTATATATTTAATATATTTAATAAAGTTATAATATCTCTTTCTACCGGACTATTTTATTTTATTTTATTTTATTTTTATAAAGAAAAATAGTAATAATATTATCTTCTCCTCCTTTCGGGGTTCCGGTTCCCGTGCCGGGCCCCGGAACTATTAATTATATAATATAATATAATATAATATAATATAATATGATACGGATCAAACATTACCCGTTGTTCACTGGCAATGTTTAATCCTATTGTATATAAATATAATAAAATAATTATCCCTCTCGTAATACATATATAAAATATAAAATATAAAATAAAAATATTATGATTATTATAATATATATATATATATATATAAATATATATATATAATTTATAATTTATATGATTAATATATTATATATATAAAAAATATATTAAATTTACTTTTTATAGAAAGGAGTGAGGGACCCCCCCCCCTTACGGGGGGGAACCGAACCCCGCAGGAGATATTTATTTTAATACTTATATAGTATTTATTAATAATATAATAATTGTTATTATAAATATTAATAATAATATAAAAATAGGGTAAATAATATAAATAATATGAATAAATATAAAAACATATTAAATATAAAATATATCATAAATTTAATAAATATTATAATAATTTATAAATGATAGATA
TCTGGGGTCCTATAAATAATAATTATTTTCAATAATTATAGGGACCCCCACCTATTATATAAATATAAATATAAATATAAATATAAATACAAATATAAATATATAAATATATAAATATAATATAAATACAAATATAATATATAAATATAAATATAAATATATAAATATAAGTCCCCGCCCCGGCGGGGACCCCGAAGGAGTGAGGGACCCCTCCCTATACTAATGGGAGGGGGACCGAACCCCGAAGGAGTATAAATAAAAATTAATAATATATATATAATTATAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATAATAATATAATATATAATAAAATATAACTTATTAATATAATATTAAAAATATAATTAACAAGAATAAATAGTCCGTGGGATCGAACCCCCTTTTTTATTTAATATTTAATATTTAAAGAAGGAATTGTTTATATATATTAATATCTTATTTGGGGATTAATATAATATATAAGTTTTGGATACCAGGCCAAAGACCGGAATCCCAAAAGGAGATTATATAAATATTATTTATCTCCCTTTTTTAATATTATAATAATTTTATTAAAAATAAAATAATAATAATAATTATAATTTATAATAACAATTATAATAATTTAATTAATTAATTAATTAATTAATTAATTAATTAATTAATTAATAATAAATATAAATATAAAAAGAATATAATTTATAATAAATAAATTTATATATATATATATATATTAAATAAAATATTTACTTCATTAATATAAAATATAAATATATTTAATTAATAAGTATATATATATAATAATATATAATAACCTATTTATATATATAATCTTAATATAATTATAAGAAATATTATATAAGTAATATATAAAAATAATATAAAATAATTATAATTCAATTTATATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGGAATAAGATAAATATATAAATTATATTAATAAATATAAATTTTAAATGAATTAATAAAATTAATATATATATGTATATATATATATATATTAAAAATATTTAATTATTTTTAGGAAGGAGTGATAGATCCCTTTGGGGGACCGAACCCCTATTTAAGAAGGAGTGCGGGACCCCGTGGGAACCGAACCCCTTTTTTATTTAAAGAAGAAGTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTAATTTAATTTTAATTAGGTTAATAAATAGTAATAATAAACTTAATAATAATAATAATAATTTTATTTTTATAATTTATTAATAATAATAATAATTATATATATATATATTATTAATAAATATAGACCTTATCGTCTAATGGTTACGACATCACCTCTTCATGTTGATAATATCGGTTCGATTCCGATTAAGGTTATTCATAATAATAAATATTTGTAAAAAAAGTATATATAATTAAACATATTCTTTATATTAATTAATAATTATTAATAATATACATTTTATATAATACAATTATATATATATATATATTTTTTTTTAATACAAATAATATATTCATAATAATAAATACCGATTGTTATTATACTATAATAAAATATATAATATATTTTTCATTATAATATTTTTAAATAAATATTATAATAAATTATATAAATAATATTTATGTATAATAATAATAATAATAATTGTTATTAATTAATTCTATAATTATTATATATTTAATTTTTTTTTTTAATATAATATATAATAATATAATTTATTTTATTTTTTTTTATAGTTCCGGGGCCCGGTCACGGGAGCCGGAACCCCGAAAGGAGAATATAAATTAATAATAATATAAATAACATATTAACAATAAATTATTGTTAATATAATAATAATAATAACAATATTAATAAATAATATAAAAATTATTAATATTATATTTATATAATATTAATATAAAAATCTTTCATAATATTAATTATTA
TTAAATAATAATGATATCATTAATATTAATATAATCGTCAATATTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTATTAAATAAATATTTTTAAATATTATATTATATTATTAACTTTTTATTAAAAAAATTAATAATGATATAATATAATTAATATTATCCACGGGACCAATGACCAACCCAGTAGTTGACCGGATTGGCGCCCGCGAGGTTTATATTTAATAAATAATAATAATAATATTAATAAAATCTATTAACTTTTTTTTTTAATGGATTATATTAATGAAAAAAAAATGAGAAATATCTTTTTTTTTTAATAATTATAATTTATATATAATAAAATATGTATATATAATAAAAAAATAGTTTTTAATATTATAATATAATTATATATATAATTATAAATATATATATATATAATAAGTATTAATTAATAATATATATTTATATATTTTTTATTAATTAATATATATAAAATATTAGTAATAAATAATATTATTAATATTTTATAAATAAATAATAATAATATGGCATTTAGAAAATCAAATGTGTATTTAAGTTTAGTGAATAGTTATATTATTGATTCACCACAACCATCATCAATTAATTATTGATGAAATATGGGTTCATTATTAGGTTTATGTTTAGTTATTCAAATTGTAACAGGTATTTTTATGGCTATGCATTATTCATCTAATATTGAATTAGCTTTTTCATCTGTTGAACATATTATAAGAGATGTGCATAATGGTTATATTTTAAGATATTTACATGCAAATGGTGCATCATTCTTTTTTATGGTAATGTTTATGCATATGGCTAAAGGTTTATATTATGGTTCATATAGATCACCAAGAGTACTATTATGAAATGTAGGTGTTATTATTTTCATTTTAACTATTGCTACAGCTTTTTTAGGTTATTGTTGTGTTTATGGACAGAGTGAGACAAGTATAAGTATATTATTATAATATCATACCATTAAATAAATTATTTTAATGAAATGATTATGTTTATATATAACATATACCTAATTAGACATGCATTATTAGTAATAATTTTGTATGAAACTCTAATAATAATAATTATTATTAATTATTAAGGTAAGATTCATATGGATAGCGTAAGTCAATCTAATATTATAAAATATCGTAACATAAACAATATTTTTTTCTATTATTAATTAATAAATAATAATAAATAAAAATAATTATATGAGAAGTAAGATATTCAATTCTGTCTAGAATACATATATATACGTTAATACTCATCGGTATAAAATTAGAATCCTAAGTGAATTATTGAAAGTATAATAATATAAACTTGGTAAGCCCAATTATTTCCATATAATATTAATATAAATATTATATGGTAGTTATATATAATATTATTAAATAAATAATAATAGAAATTATAATATAGATAAGTGGGTAAAAGACTATTGAAAAAGCTAAAGATTATATGTAATGTATAATATAGATCAAATTATTTATATATTTTAATAAAAATATATTAATAATGGTTAATATTATTATTAATTAATTAATTAATTAATTAATAATAATAACGAATAAATGATTAATGTGAAAGCATGCTAACTTCAATATAGGATGATTTATATAGTATATAAATTGTTTGAGCTGTATACTATGAAAGTAGTACGTACAGTTCTGAGTGGGGGAAAATTTGTAAAGATCTACCTATCACAATTGTCACATTGAGGTAATATAAATATCGCCTCAAATATATTTAATATAATAAAACTAATTTATATAATAATGTTAATATTATTAATTTATATTTTTTATACGATTATAATAAGACAAATAATAAAAACTAAAGAATATCTTATATTAATTAAGAGTATAGATTATATTAATAAAAATAAATATATAATTAATTTAAATATAACAAATAAGAAAGATATAAATAATAATATTGGTCCATTAAATATAAACATTTTATCAATTATTTATGGTTCAATATT
AGGAGATGGTCATGCTGAAAAAAGAAAAGGTGGTAAAGGAACAAGAATTGTATTTCAACAAGAATATTGTAATATTAATTATTTATATTATTTACATAGTTTATTAGCTAATTTAGGTTATTGTAATACTAATTTACCTTTAATTAAAACTAGATTAGGTAAAAAAGGTAAAATTAGACAATATTTAAAATTTAATACATGAACTTATGATTCATTTAATATGATTTATTCAGAATGGTATATTAAAAATATATCTGGAAAAGGTAATATTAAAGTTATTCCTAAATCTTTAGACAATTATTTAACTCCTTTAGCTTTAGCTATTTGAATTATAGATGATGGATGTAAATTAGGTAAAGGTTTAAAATTCACAACTAATTGTTTTAGTTATAAAGATGTTCAATATTTACTTTATTTATTACATAATAAATATAATATTAAATCTACTATTCTTAAAGGCAATAAAGAAAATACACAATTTGTTATTTATGTATGAAAAGAATCTATACCTATTTTAACTAAAATTGTATCTCCTTATATTATTCCTAGTATAAAATATAAATTAGGTAATTATTTATAATAAAATATATAGTATTATATTAATTATTATATTATTATAATGCGATATTATTGAAAACATGTCAAAATTATATTATTAAGTAACAAGACAGTGGGTTATATAATTATATGATCCCAACAGAATACACCAATAATAGGTATTATTATAAAAAAAATAATAATATTTAATGTTTATTCGAAGAAAATTTATAATATTATTATTATAACACAAGGTTTAATAATCTATATATATATATTATATATATAACTACTGTTATTATTCCATTTACCTAATTAATATATAAATAATGAATTATAATTATTATGATTAATATTTTTATAATAATAACCCCATCATAACATTTATATATAACATTTATATATAACATTTATATATAATATTTATATTATGGTATTATTAGGTATAAATATTTATTCATAAGAGAAAATAGTGATTAAATGGAATTATAAAAAGGGTAGATATTATTAAATACAGGGTATTATTTATATTAATAAATCAATAAATATTGAGATTATTATTATTAAAAAATAATAATAATTTATAAATAATATTATTTTCTTGGCACTAGTTATTACTAATTTATTCTCAGCAATTCCATTTGTAGGTAACGATATTGTATCTTGATTATGAGGTGGGTTTAATATAGAGGATCCATATTATAGTAATATAATATTAAATAAATCTGTTTTATGCTGAAATATCTTCATTTGAATAATAAATTACTATATTATTCAATTAATTATTTATAATAATATAATTTGAAATAAAAATAATATAGTTAAAATATTTATTATAAGAAGAAAATTAGCAGTAATTAATATATATATATATATAAAATTAATTATTCAGAGACTTTATAGTTATTATATAAATAATACTATTATTTATGATAAAAATCATAAATTAAACACAGATAATCCTATTTATGCATATATTGGTGGTTTATTTGAAGGAGATGGTTGAATTACTATTTCAAAAAAAGGTAAATATTTATTATATGAATTAGGTATTGAAATACATATTAGAGATATTCAATTATTATATAAAATTAAAAATATTTTAGGTATTGGTAAAGTAACAATTAAAAAATTAAAAATAAAAGATGGTACTATTAAAGAAATATGTAAATTTAATGTAAGAAATAAAAATCATTTAAAGAATATTATTATTCCTATTTTTGATAAATATCCTATATTAACTAATAAACATTATGATTATTTATATTTTAAAGATAATTTATTAAAAGATATTAAATATTATAATGATTTATCTTATTATTTACGTCCTATTAAACCATTTAATACTCTTGAAGATATTTTAAATAAAAATTATTTTTCTTCATGATTAATTGGTTTTTTTGAAGCTGAAAGTTGTTTTAGTATTTATAAACCTATAAATAAAAAAATAAAACTTGCTAGTT
TTGAAGTATCTCAAAATAATAGTATAGAAGTTATATTAGCTATTAAATCATATTTAAAAATTACTCAAAATATTTATACAGATAAATTTAATAATTCAAGAATAACACTTAAAAGTATTAATGGTATTAAAAATGTTGTAATATTTATTAATAATAACCCTATTAAATTATTAGGTTATAAAAAATTACAATATTTATTATTCTTAAAAGATTTACGTCTTATTCTTAAATATAATAATTATTTTAAAATTCCTCCTAAATATTAATCTTATATAAAAATATAATAATAATATATTTATATATTATATAATTATATAAACAAAATATAATTTATATATAATTATTTATTATAAATATAGTCCGGCCCGCCCCGCGGGGCGGACCCCGGAGGAGTGAGGGACCCCTCCCTATTCTAACGGGAGGGGGACCGAACCCCGAAGGAGTTTAATTATATATTAAATATATTATTATCAATAAATAATTCCTTTGAACTATTTATTATTTTATTATATTTATTTTCTCCTTCATTATTAATTTTTATTAATAATTAAAATCTTATCATTTTATGGTATTTTTATTTCTATTTTAGGATATCGAAACTATAAATTAAAAAGTATAATTTTATTAATTATAATTTATGATTAATAAATAAGAAATAAAAACTTTAGAAGTAATATTTATCTTTTTTTTTTATAAATAAATATTATGATTAATATATAATCATTTATAAATATTTATATATAATTATATATATACATAAATAGGATTAAGATATAGTCCGAACAATATAGTGATATATTGATAATAGTTTTCAAATATGTAACTATTTAAACATTAAAAGCTCAGTATCTAACCCTCTAATCCAGAGATTCTTTGCGTTACATTATTTAGTACCTTTTATCATTGCTGCAATGGTTATTATGCATTTAATGGCATTACATATTCATGGTTCATCTAATCCATTAGGTATTACAGGTAATTTAGATAGAATTCCAATGCATTCATACTTTATTTTTAAAGATTTAGTAACTGTTTTCTTATTTATGTTAATTTTAGCATTATTTGTATTCTATTCACCTAATACTTTAGGTCAAAATATGGCCTTATTATTAATTACATATGTAATTAATATTTTATGTGCTGTATGCTGGAAATCTTTATTTATTAAATATCAATGAAAAATTTATAATAAAACTCTATATTATTTTATTATTCAAAATATTTTAAATACAAAACAATTAAATAATTTCGTATTAAAATTTAATTGAACAAAGCAATATAATAAAATAAATATTGTAAGTGATTTATTTAATCCCAATAGAGTAAAATATTATTATAAAGAAGATAATCAGCAGGTAACCAATATAAATTCTTCTAATACTCACTTAACGAGTAATAAAAAGAATTTATTAGTAGATACTTCAGAGACTACACGCACACTAAAAAATAAATTTAATTATTTATTAAATATTTTTAATATAAAAAAAATAAATCAAATTATTCTTAAAAGACATTATAGTATTTATAAAGATAGTAATATTAGATTTAACCAATGATTGGCCGGTTTAATTGACGGAGATGGTTATTTTTGTATTACTAAAAATAAATATGCATCTTGTGAAATTCTTGTAGAATTAAAAGATGAAAAAATGTTAAGACAAATCCAAGATAAATTTGGTGGTTCTGTAAAATTAAGATCAGGTGTTAAGGCTATTAGATATAGATTACAAAATAAAGAAGGTATAATTAAATTAATTAATGCCGTTAATGGTAATATTCGTAATAGTAAAAGATTAGTACAATTTAATAAAGTATGTATTTTATTAAATATCGATTTTAAAGAACCTATTAAATTAACTAAAGATAATGCTTGATTTATAGGGTTCTTTGATGCTGATGGTACTATTAATTATTATTATTCCGGTAAATTAAAAATTAGACCTCAATTAACTATTAGCGTTACAAATAAATATTTACATGATGTTGAATACTATAGAGAA
GTATTTGGTGGTAATATTTATTTTGATAAAGCTAAAAATGGTTATTTTAAATGATCTATTAATAATAAAGAATTACATAATATTTTTTATCTTTATAATAAAAGTTGTCCTTCTAAATCTAATAAAGGTAAACGTTTATTTTTAATTGATAAATTTTATTATTTATATGATTTATTAGCTTTTAAAGCACCTCATAATACTGCTTTATATAAAGCTTGATTAAAATTTAATGAAAAATGAAATAATAATTAAATTTTCTCCGTATTCATTATTATATTATCTAATTTATAAAATATTTAAAGATTCCTTATAATAATATAACATCTTTGTAAATTATTGTTAAAGATAATATAAATTATTATGAATCGGTAGATTATATTTTTACAATCTTATTAAATAAAATTCTGATCATTAAACATGATTGAAGAAATAATAATAGTTTATGAAATAAGATAGTGTAATATAAATTTTTATGAAGATATAGTCCATTTTATATTTATTATAAAAGCATCCTGATAACTATATTCCTGGTAATCCTTTAGTAACACCAGCATCTATTGATATTAAAAATATTAATAAAATTATTATTATTTAATCTTATTTATTTTATATAAAAAAAATAAATAATAATTATTAATAAAAATATATTATTTATTTCTCCTTTCGGGGTTATTTATATATATTCCTTTATAATTTATATTTAATATATTATATTAAATATATGAAAAATTATAATAAATAAATTAATTAATTAATAATAAATAATAATAAAAAGTACAGTAGCATTAAATATTCTTAAGTTTCCGCTTTGTGGGAACTCCCATAAGGAGTTTAATGATTAAAATTGGTTAATTGTCAAGAAAATCTAAGGTATTAATAAATAAATAATACTATGACAACTTGCAGCGAAGTTTATATCATCTCTATATTATATATTAATATATATATATAATAATAATAATAATATTAATATAATATAAGATATAAAAACGTTCAACGACTAGAAAGTGAACTGAGATAGTAATACCTTTCCACGAAAACCAATTAATTTATAAATTATTTTTAAATAAAGAATAGATTATTAATTTTTTTTATATAGTTCCGGGCCCCGGCCACGGGAGCCGGAACCCCGGAAGGAGTAATATATATTATATATAAAATAAAAAATATATATATATATATTATAAAATATCAAAAGTTTTAATCTTTTATTATAAATTAATGACATAGTCTGAACAATAATGAAAATTATTGAGATAAGATATTAAATAATCTTATGTTAACATATATAAATTGTGTACCTGAATGATACTTATTACCATTCTATGCTATTTTAAGATCTATTCCTGATAAATTATTAGGAGTTATTCTAATGTTTGCAGCTATTTTAGTATTATTAGTTTTACCATTTACTGATAGAAGTGTAGTAAGAGGTAATACTTTTAAAGTATTATCTAAATTCTTCTTCTTTATCTTTGTATTCAATTTCGTATTATTAGGACAAATTGGAGCATGCCATGTAGAAGTACCTTATGTCTTAATGGGACAAATCGCTACATTTATCTACTTCGCTTATTTCTTAATTATTGTACCTGTTATCTCTACTATTGAAAATGTTTTATTCTATATCGGTAGAGTTAATAAATAATATATAATTAAATTAATACATAGATATAATATATATATTATTATTATTAATAATATAATAAAAATAAAAATAAAATTATTAATAATAATAATACTTTAATAATATTCTTAAAAATAATATATCTCTAATTTATAAAAATTAAATAATAATAATAAAAAAAAAATATTATAAAATATAAATTAATTAATAATGAAAATAATATACTTATTAAATTAATATAAATAAATGAATAATATAATATAACTATATTGAATTATAATCTATCTATCTTTTTTTTTCATATAATTATAATATATATATTAATATATATAATTATTATTTTATATATTATAGTTCCGGGGCCCGGTCACGG
AAGCCGGAACCCCGCAAGGAGATTTATTAATTATTATTATCATTATTATTTTTTATTTAATCTTATTTATTATAAAATAATTAATTATCATAAAGCATAATTATTATAGAATCTTATTATTTTCTTTATTTAAATTTATAAAAATATAAAGTCCCCGCCCCCTTTTTATTTTATTTAATTAAGAAGGTATTTTAAAAAAGGAGTGAGGGACCCCCTCCCGTTAGGGAGGGGGACCGAACCCCGAAGGAGTACTCATTTAATATAAATATTAAATAAAAATTATTTTATATATATTAATGATTATTAATATTGATAATATAAATTATTTTATAATTAATTATTATAAATATATAACTATTAATAATTAATTTTTAATCTAGGGGTTTCCCCCACTTACATAAACTTACGTATACTTACATATACTTATGTATACTTACATATACTTACGTATACTTATATATACTTATGTATACTTACGTATACTTACATATATGGGGGATCCCTCACTCCTCCGGCGTCCTACTCACCCTATTTATTAATCATTAATAAGAAATTATTATTAAAAAAATTATAATTTACTCAAAGTTAATTATAAATATATTTTTAAATATCTATTTTATTAATCTTTTATAAAATTTAAATTAATTGTAATTAATTAATATTATAATAATTATTCTTAGGAAGGATATTTATTTATTTTAATTATGAATTCCTGACATAGAGACAATTAATTAGAACTTCTTATTATTATTATAGTAATAATAAAAATATTCTAAATATATTATATATATTATTATTTTTTTTATTATTAATAAAATATTATAATAAATTTAAATAAGTTTATAATTTTTGATAAGTATTGTTATATTTTTTATTTCCAAATATATAAGTCCCGGTTTCTTACGAAACCGGGACCTCGGAGACGTAATAGGGGGAGGGGGTGGGTGATAAGAACCAAACTATTCAATAAATATAGAGCACACATTAGTTAATATTTAATAATATAACTAATATATAATAATTATAAAATAATTAATTATATAATATAATATAAAGTCCCCGCCCCGGCGGGGACCCCAAAGGAGTATTAACAATATAATATATTGTATAAAATAAATTATAAATATTAAATAAAAACCAAATAAATAATATAATAAATGATAAACAAGAAGATATCCGGGTCCCAATAATAATTATTATTGAAAATAATAATTGGGACCCCCATCTAAAATATATATATAACTAATAATATATTATATATATTAATATATAATAATATTATTAAAATATAATATTATTAAAAAAAAAGTATATATAAAATAAGATATATATATATAAATATATATATTCTTAATAAATATTATATATAATAATAATAAATTATTTCATAATAAATTATTTCTTTTTATTAATAAAAATTACTTATCTCCTTCGACCGGACTATTAAATATTAAATATTTAATATTTAATATTTAATATTTTATTCTATAGATATTCATATGAAAAATAATAAGTATATAATTATGATAATGAATATATTTTTATTTATAATTTATTATTATAAAAATATTTTAATTTAATAATAATAATAAATCATTATATTAATTCTTTTAAGAATTTATAATTGTCATTATTTATTATATACTCCTTATTAAAAGGGATTCGGTTTCCCTCATCCTCATGGGTATCCCTCACTCCTTCTGATAATTAATTTTATAATAATAATAAAATAAACTTAATTAAATATTATATATTTATTTACAATTATATATATATATTACTCATAATTAAATTAAATTAAGATGCAATTCAATACGGTTGTATTATATTATTCATCAAATATTGTTAATATTGATACCTACAGAGATATTTAATATTTTTATTATTATTATCCATTACTTTTTTTATTATATTTTAATTATTTATTTATTTATTTATTTATAATAATAATATTTCATATTATCAATTATTATTTTTTTTTTTTATAATATATAATTAAA
TTATTTATATAGTTCCCCGAAAGGAGAATAAATAAAATATTATATAAATATTTATATCTTTATTAATATTAATATAAGTAATATATATAGTTTATGATATTTAATTTTATCATAATATAATAATAATTATATAAATCTTATACACATTTATATAAGTATATATATATATTATTAATATAATGAACATCTATTAAATAAAATAATTGTAAATCTCAAGTAAATTATTATTATTTTATTTTTAATAATAATTTATGATTTATAATTAATAAATAAAAGAGTAATTATATGATAAAAAAGGTAATAAATAAAATTTATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTTTATTTATATATATATATATATGAATTAATATTTAATAATAAATAATAATATAATTAATAATATTATTATTATTATAATTTTTTTATTTATAATATTAATAAAATATTATTATATATATATTATAATAATATTAATAAGATATATAAATAAGTCCCTTTTTTTTTATTTAAAATAAAGAAAGAATAATTAAATAATATTTTAATAATTTAATTAAATAGTGTATTAAAAGATAATAAAAAGTAATATTAATATGTTAATTATATATAATATATTTATATATAATTATATATATATATATAAATAATAATAAATATATATATAATATAAAAATAAGAATAGATTAAATATTTAATAAATAAATATTATGCAATTAGTATTAGCAGCTAAATATATTGGAGCAGGTATCTCAACAATTGGTTTATTAGGAGCAGGTATTGGTATTGCTATCGTATTCGCAGCTTTAATTAATGGTGTATCAAGAAACCCATCAATTAAAGACCTAGTATTCCCTATGGCTATTTTAGGTTTCGCCTTATCAGAAGCTACAGGTTTATTCTGTTTAATGGTTTCATTCTTATTATTATTCGGTGTATAATATATATAATATATTATAAATAAATAAAAAATAATGAAATTAATAAAAAAATAAAATAAAATAAAATCTCATTTGATTAAATTAATAACATTCTTATAATTATATAATTATTATAAAATATATAAATATTATAATAATAATAATATATATAAATTATAATAAAAAATAATAATAATATATAATATACCTTTTTTTTAATATATTAATATATAAATAAATAAATAATGGATAATATATAATTACTTTTTTTATATTATTAATAATAATAATTTATAAATATTGTTATAATAAACATTTATATAAATAAATATAAATTACCATAATAAGATATATTATTTATTAATAATAAAAATATTTATTAATAAATAAGAAATATATATATTATGATAATATTTATTAATAAATAATAAATTCTTTATATATAAATATATTAAATATATTTAATTGAACACAATATAATTTTTATTGTATTATTCATTTAATAATATTAATATTAATATTAATATAATATTAGTGAACATCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATTAATATTTAATAAAATATATATAATTTATAATTTTCATATAATTAATATAATAATTAGGTTTATAAATAAATTATAATATATTATAACAATATAATAAAATATATTATAAATCTATCTATCTATCTATATAATATATAAATTTATATATACATTAATAATATTTAATTATAATTATTTAAATATTTAATTTATTAATATTCCCCGCGGGCGCCAATCCGGTTGTTCACCGGATTGGTCCCGCGGGGTTTATATTATTTAAATATTAAATATTAAATAATAATTTATATTATATTAATAAATATAATAAATTAAAAATATATGATTAATTATATAATAATAATAATAATTATTTTAATATTATAATTTATAAAATTAATTATATTAATTATATTAATTCTTATTATATAATAATTATTAATAATAATTTATTTTAAGAAAGGAGTGAGGGACCCCCTCCCGTTAGGGAGGGGGACCGAACCC
CGAAGGAGAAAATAAATTAATAAAAGTTTAAAAGTTCTTATATTAATAATTATATAATATTATATTAAAGATTTTTATAATATATATATATAATATATTTATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTTTATTTAATATTTATATTTATATTAATATTTATATTTATATTTATATTCCTCTTAAGGATGGTTGACTGAGTGGTTTAAAGTGTGATATTTGAGCTATCATTAGTCTTTATTGGCTACGTAGGTTCAAATCCTACATCATCCGTAATAATACATATATATAATAATAATTTTAATATTATTCCTATAAAAATAAAATAAATAAATAAATAATAATAATTAATTAATTAATTAATTTTAATAAATATAAAATATATAAAAATAATAATAATAATAATTATTATTTTAATAATATTATTTATATAATAGTCCGGTCCGACCCTTTTTATTCTTAAGAAGGGATTTTATTTTATTAATTAATAATAATATATTAAAAATTATAAATAATTAATAATTCTTTATATTTATATATATATATATATATTTATATATTTATATATATATTTTAATAATATTATGATATATTTTATTTTAATAATATTTTTATTTTTATATATAAAATTATAATATTTTATTTTATAAATTATTTATATATAAATTATTAATAATAATTATTTTTTTTTATTTGGGATTTATATTATTATTATAAAGAATATAATGTTATTAATAACTGCAAAAAATATCTAATATATTATTATTTATAATAATAAATAATATTATAATAAGGATGCATATTATATATATATATATATTTCTATTTATATTAATATTAATATTAATATGTATATATAATAGATAAAAAGTAAAAATAAAAAATAATGAAATTAAAATTATTAAATATAATTTTATCAATAATAAATAAACTTAATAATAATAATAATATTATTATTAATAATCTATTAGATTCATTAATAAATAAGAAATTATTATTAAAGAATATATTATTAGATATAAATAATAAAAAAATAAATAATATAAAAAGAATATTAAATAATAATAATATAAACCCCGCGGGCGCCAATCCGGTTGTTCACCGGATTGGTCCCGCGGGGAATATTAATAATAAATTACAACATTTAAATAATATAAATAATTGAAATCTACAAATTTATAATTATAATAAAAATATAGAAATTATAAATACTATAAATGATAAATTAATTAATAAATTATTATATAAAATAATAACTTTAAAATTAAATAATATAAATATTAATAAAATTATTATAAGTAAACTTATTAATCAACATAGTTTAAATAAATTAAATATTAAATTTTATTATTATAATAATGATATTAATAATAATAATAATAATAATAATAATAATTATTATATAAATATAATAAATAAATTAATAAATATTATAAATAATAATATAAATAATAATTTATGTAATATTTTAAGTTATTATTATAAAAAAAAAGTAACTATTGAACCTATTAAATTATCATATATTTATTTAAATAGTGATATTTTTAGTAAATATATTAGTTTAAATGATATAGATAAATATAATAATGGTATCTTAACTAATTATCAACGTATATTAAATAATATTATGCCTAAATTAAATGATCATAATATTTCTATAAATTATATTAATAATATTAATAATATTAATAATAATAAATATAATAATATAATTAATTTATTAAATAATAATAATAATATTAATAATAATAATAATTATAATAATAATAATAATAATTATATTGGTAATATTAATAATATTTATAATAATATAACTATTGATAATATTCCTATAGATATTTTAATATATAAATATTTAGTTGGTTGATCTATTAAATTTAAAGGTAGATTAAGTAATAATAATGGTAGAACTAGTACACTTAATTTATTAAATGGTACTTTTAATAATAAAAAATATTTATGAAGTAATATTAATAATAATTATA
AATTAAATTATATCCCTTCTAATCATAATTTATATAATAATTCTAATATTAATAAAAATGGTAAATATAATATTAAAGTTAAATTAAACTTTATTTAATATATATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATAAAATAAATATAATAAATAAAATAAATAAATAAATAATATATATATATATATAAATATATAAAATAATATTTACTTTTTATATATATATAATTATATATAAATAAAATATAATATAATATCATATAATTATATAAAAATAAAATTATAATTTATTTATATTAAAAATATTAATTAATTAATTTTTTTATATAATTATTATAATAATAATTTAATTAAAAATAAATATCAAATAAAATTATAAATTAATCCTACTTTTGGATCCTATTTATATTTTATTATTATAAATAATTATTATTGATAGTTAATTAAATAAAAATATATATATATATTACTCCTTCGGGGTCCGCCCCGCAGGGGGCGGGCCGGACTATTATAATTATTATTAATATATTAATTATTAAATTATATAAACCGCCCCCGCGGGGGCGGTTAGTTATTTATATTAATATATTTTATATTAATATATAATACTCTTTTTTCTATTATATTTTAATATATAATATTAAAAAAAATAAATAAAATAATATTCTTAATTTTTATTCTTTATCTTCTTTAACCAAACTCCTTCGGGGTTCGGTCCCCCTCCCATTAGGTTAGGGAGGGGGTCCCTCACTCCTTCGGGGTCCGCCCCCCCCCGCGGGGGCGGGCCGGACTATTTTAAATTTTAATTTAAATTTTATAAATATAATATTTAATTATAAATTTAATAATAATATATAAAAAATATATATATGGTTAATATATATAAAGATTATAATCTTTTTATTAAATAAAGGAAAATTTATTATATAATTTTTCTCTATAGTTATATATTTAAAACTTATTTTTTTTTTTTTATAAATAATAATTATAATAAATAATATTAATTATTTATTATATAATTAATTGGCCCCCATGCTGGGTTCCGGAACTCCTCCTTCTCGCGAGGTTAACACCTATTATATAACTATAACTATAACTATAACTATAATTATAATTATAACTATAACTATAAATATTCATTTTAATAATAATAATAATAATAATATTAATATAAATAGTCGAAGAATATATTTATTTATTTTAATATAAATAAAAAGTTTCAATTAATTTGAATTTGGAATTAAATTATTACTTCATATGGGGTTATGGATTTCGTTCGGAACTCCTCCCTCCTACCTCTATTTATTAATCATAAATCATAAATTATTATTAATTAATAATAATAATTTACTCGAGGTTCATACCTATTTTAATATTAATATTAATATTGATAAAATATATATTCACTAAAAAGTATATAATTTACTCAATTTATACTATAATTTTATATTTTTTTATTATAATTTAATTATTTCAAATAAAGTAATTATAATAATATATATCCTTTATTAAATATATATTAATTAATATATATATAAAAAGTAAATATTATTAATTGTATATAATTATAAATAATTAATATTTATTAAAATATATATAATTTATAATCCTCATATAATTAATATAATAAATAATATAACACAATGTAATTTAATTTAATTACATAATAAATTTATTATTATTATAATTATTATTTATTTATTTATTTATTATTATAAATTATAAATATTATTATAATTAAAATCAATTATTAATTATTAATGATAAATAATTAATGATAAATTATCAATAACCAATTAGATTATTTATCGATATTTAATTATATTATATTATATTATATTATATATATATATATATATTATATTATAAAATTTATTTATAAATATTTGTTTATTTATTTATTTATTGAATAACAATAGAATTAAATATTGTCAATAAATAATAAATAATGTTTAATATATATTATATTATATTAATATT
AATATTATTATTATTTTTTTTATTATATTAATATAATTTATAAAAATATAAAATTATTATTTTTATTATAATTTATATATATATAATATATATATTTATTAAAATATTTTAAGAAAGGAGAAAAATAATTAAATTAAATTAAATTAAATTATTTATTATTATTATTATTATTTATATAATAATATATTATTTAAATATTTATATATTATTTTTATATTAATATTTATAGATGGGGGGTCCCTATTATTATTGAAAATAATAATTATTAATGGACCCCAGATAGCTTCTTGTTTATCATTTATATATATATATATATTATTAATTATTTTATTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTTTATAATATTATTATTAATTATTTAATTAATATTATAATCATATAATTTAATATTTTATTTAATTTTATTAAAATTTAATATATATATTTTTATTATTATTTAATTAATTTATAAATATAAAATATTCTTAATATTAAAAATAAATAAATAATAAAGTTTATAAATCATATATTATAATTATTTATTATTTTTATATTATATTAATAAAATATTATTATTATAAAAAAAAATAGAAATTTTATAATATTTTTATATATTTTTAATTATTATTATTAATATTTATTAAAGGAAATATAAAAACCGAAGGAATATTATAATTATAATTATAATTATTATTATATTTAATTTATTATTATAATAATAATTATAGTCTGCCCCCTCTTTATCTTTATTTTAAAGTTCCGGGGCCCGGCTACGGGAGCCGGAACCCCGAAAGGAGAAGGATATTTAATAATTTATAATATTTAATTCATATATATATATATATATTTTATTTTTTATATATATATTTAATATATTATATTTATATTTATATTATTATTATATTTATATTATATTATTTAATTATTTTTTAATAATATATTATTAATATTTTACCTTTTGATAAATAAAAATTTATTAAAAATTTTATAATAAGTATTAAAATATCATAAAAGTATAATATTTATATAAAATGTATAAATTTATAATCTTCTAATTAAATTAAATTAAATAAATAAAATAAAATAAATTAAACTCCTTTTGAGATTCACACCTATTTTATTAAAAATAGGTATTCACTTAATTAAATTAAATTAAATTAAATTAAATTATGGATAATTTATTTAATAAATATATATATTAATTATAAAATAATAGTCCGGCCCGCCCCGCGGGGCGGACCCCGAAAGAGTCTGCCCCTTTTTATTTAATATTTAATATTTAATATTTAATATTTAATATTTAATATTTAAAGAAGGATATATTTATAATTTATCATAATATTATTTAATAAGAAATTATTAATTAATTAATTAATTAATTTATTTATTGTTTATATTTATTAATATTAATATAATAAAAATGTAAAATACTTAATATTATTAATATTATTATATATAATATATATAATAATATATTATATTTATATCTCCTTTATTCCTTTTTCCCCCGATGGGGACTTATTATATTATATTATTATATATTTCTTCGATAACTTTATATATATTTTATTTTTATAAAAAAATATTTATATATTATTATTTACAATAATAATTATTAATAGTCCGGCCCGTCCCGCGGGGGGGAACCGAAGGAGTGCGGGACCCCGTGGGAACCGCATCCCTTTTTATTTTTAATTAAGAAGGAGTGAGGGACCCCGTGGGGACCGAACCCCGAAGGAGTCTTTTTTCTATTTATTAATAATAACTATAAATTATATTTAAAATAATAATTTACTTGTTATAATCTTAATGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAGTATATAAATATTTACTTGTTATAATTTATTATATATTTATAACCTCCTTCTTAAAATTATCTTTACTTTATAATAAAAATTAATATAATATAATCTGATAATAATCGAATTTTATTATATTTAATTTA
ATTAATAATAGACAATTATTATTATTATTTTACTTATTAATATTAATTTAGATTTATATATATAAATATTATTAATTTTATATTAATTTTTTATTAATTATTTATTTTTATATTCATATTTTTTATTAATATTATTTTTATTAATAACTTTTTAAATAATTATAAACTATATATTATTTATATTTATATTTATAATAAATGAAACAATTATAATAAAAATTACAATTACAATTATATTATAATTATGATTACAATAGGGTTAAACATTACCTGTGAACAACTGGTAATGTTTAACCCGTATTATTATTTATTATATTATATATATATTAAAATATTAATATTAATATTAATATTATATTATATTATATTATATTATATTATATTATATTATATTATATTTATAATTATATTATATTATATAATTTATATACTTTTATAATTCTTATTATTATTTATTTATTATTTATTTATTATTATTTAAATATATTATTATTATATATTAATAATATATATATTATTTTATATATTTTATTTAATATAAATTATTTATATTTTTATATTTTATTATGAGGGGGGGTCCCAATTATTATTTTCAATAATAATTTATCATGGGACCCGGATATCTTCTTGTTTATCATTTATTATTCTTATTATTTGGTTTTTATTTAATATTTATAATTTATTTTATACAATTTATTATATTGTTTATACCTTATTATTATTATATAATATATTATATTATTATAATAATTTAATTAATTATATTATTAAATATTAACTAATGTGTGCTCTATATATATTATTCATTCTAGTTTCTAATCACCCACCCCCTCCCCCTATTACTTATATATCTAGAAATAAAAATACATAACATATATTTTAAATATATATATATAATTATATAATAATTATTATATATAAAATATATATATATATAATATATATTTATAAAATAATAATAATAAATATTATTACTCCATTAGAGGTTTTGGTCCCATATCAGGAACCGAAACTATAATAATATATAATATTATAATAAAGATATTCTTATTTATAATATATTATTAAATAAATTAATAATAATTATAATATATATATATAATATATTATAATATATTTATTCGAGAACTTTTTATTTATTATAAAATAAAATATTTTATTTATTATTTAGTTTTTTTTTATTAAACATTTTATAAAAATATAAATGTTAATAATATTATGATTAATAAGTAATAATAAATTTATTTATTTTTATTAATTACTTCTTCGAGGTATTAGTATCAGTATCAGTATCAGTATCGTAAAAAACGGGTGACTAAAATATATATATATATAAAATTATAAATAAAAATATTATAATAATTTTAAATAAATAAATATCAATATATTATTATTATTTATATTATAATAAATATTATCTAATAATAGTCCGGCCCGCCCCCGCGGGGCGGACCCCGAAGGAGTCCGAACCCCTTTTTTATTTAATTTTATTTAAAGAAGGAGTGAGGGACCCCTCCCGTTAGGGAGGGGGACCGAACCCCGAAGGAGATAATTAGATATAATTATATTTTATTTTATATAAATTATATAATATTATATAATAATAATTATATAATAAGTTAATAATAATTATATAATAAGTTAATAATAATCATATCTCCTTTATAAATGAACTTTTATTAAATATATTTTATTAAATATTAAATATATTTTTTATAATATTAAATATATTTTATTAAAATATTTAATATATTTTATTAAATATTAAATATATTTTATTAAATATTAAATATAAATAAAGGTTTATATTATAATTCATTATTTATATCTTCTTTATAAATTAATATTCGTATTAGATCCTTATTTAATTTATAATCCTTTAAAAAACTTTTAATAAATATAATATAATATATATATAATTTTTATTATTTTTATATTATTTTTATTATTTAAATATATTATATATTTCATTATAATAATTATTTAAAAAGTTATTTAAT
AAATAATCTGATATTATATTTTATAATTAATTTTATTTATTTTATTTATTATATATATTATTATATATAATTAAAATTATAATTACAATTATAACTATAATTAAATTAAATTAAATTAAATTGGATTAAATTAAATTAAATTGGGCGCCAAGCCGGTTGTTCACCGACTTGGTCCCAATATAATATGAGATAATATAATATACTATATGATATAACATAAATATAATATATTATATGATATAACATAAATATAATATACTCCTTCGGGGTCCGCCCCCGCGTGGGCGGACCGGACTATATGAATATATTATTATTATAATTATAAAATTATAATAAATAAATAAAATTTCTTTAATAATTATTAATTAATATTATTAATTTATTTACAAATATTTTATTAATTTTTATTTTTATTAAATATAAATATATAAATATATATATATTTATTTATAATATTATTTATATTTATTATATATTATTATTAAATATATTTTTATTATATATCATTAAATATTAATATGTTATTATAGTGGTGGGGGTCCCAATTATTATTTTCAATAATAATTATTATTGGGACCCCGGATATCTTCTTGTTAATCAATTATTATATTATTTAATTTATTTTATTTCTTATTTATAATTTATATTATATAATTTATTATATTGTTAATACTCCTTCGGGGTCCCCGCCGGGGCGGGGACTTTTATTTATATTATTAATTATATTATATTATTATAATATATTTAATTGATTATATTATAAAATTATAACTAATGTATGCTTTGTATTTATTGAATAGTTTGGTTCTTATCACCCACCCCCTCCCCCTATTACTTCTCCGAGGTCCCGGTTTCGTAAGAAACCGGGACTTATATATTTGGTAATTAAAAATATAACTTATATAAATATTTAATAAATATATATTAAATATATTATTATTAATAATTTATTATTATATAAAAAAATAATAAATATTATTAATGATTTAAATTATATAAATATTAATTATTAAATAAATAATTATACTTTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTTTAAAATAATATATATATATATAAAAGTATTTTATAATAATTAGTTTAATTATTATTCTTTTTTTTTATTAAATATAAAATCATTTTAGGTTATTAATTTTTATTTATTAAAAATAAATTTTATAATTAATATTTCTCCTTTCTTAAAATAAATAATATTATTATTATAATTATTAATTAATGAATACTCTTCTCTTTTGGGGTTCGGTCCACCCTCCCGTATACTTACGGGAGGGGGGTCCCTCACTCCTTTTGAGACTTTAATTTTATAAATATAAATATAAATATAATAAGATGTTAACTCTTTTATAAATAAATAATAAATATAATTCTATTTTTAATAATAATATATAATATTTTTATAATAAAATATATAAATAATAATATTTATATATATATATATACTTTTTTTTATATAAGAATAATATATATAGTTCACATTGGAGGCGAGTAAAAGGAGATAAGAAATATAATATAATATAATAATAAAAATATAATGAATAATAATAATAAAAATTTATATAATAACAAAATAGTCCGACCGAAGGAGATGAGATTATTAATATTATTAAATAATAAAATGTATTAATTATAAAATATAAAACCTATAAATAATTTATAATATAATTTATATTATGATAATAATAATATATATATTATAATATTTTATATATATATTTATTATATTTATATTTATATAAAAAAGTGATATTGATTAATTAATTAATTTATAATTAATAATTATTAATATAGTCCGGCCCGCCCCCGCGGGGCGGACCCCGAAGGAGTCCGGCCGAAGGAGTTTATTATATTATATTAAATAAGATTTATAATATAATTAATATATATTTTAATAAATATAAAAGATTATATTATATTATAAAAAGTATATTTTATATTTATATTTTATTTATTATTATTATTATAT
ATATAAGTAGTAAAAAGTAGAATAATAGATTTGAAATATTTATTATATAGATTTAAAGAGATAATCATGGAGTATAATAATTAAATTTAATAAATTTAATATAACTATTAATAGAATTAGGTTACTAATAAATTAATAACAATTAATTTTAAAACCTAAAGGTAAACCTTTATATTAATAATGTTATTTTTTATTATTTTTATAATAAGAATAATTATTAATAATAATAAACTAAGTGAACTGAAACATCTAAGTAACTTAAGGATAAGAAATCAACAGAGATATTATGAGTATTGGTGAGAGAAAATAATAAAGGTCTAATAAGTATTATGTGAAAAAAATGTAAGAAAATAGGATAACAAATTCTAAGACTAAATACTATTAATAAGTATAGTAAGTACCGTAAGGGAAAGTATGAAAATGATTATTTTATAAGCAATCATGAATATATTATATTATATTAATGATGTACCTTTTGTATAATGGGTCAGCAAGTAATTAATATTAGTAAAACAATAAGTTATAAATAAATAGAATAATATATATATATAAAAAAATATATTAAAATATTTAATTAATATTAATTGACCCGAAAGCAAACGATCTAACTATGATAAGATGGATAAACGATCGAACAGGTTGATGTTGCAATATCATCTGATTAATTGTGGTTAGTAGTGAAAGACAAATCTGGTTTGCAGATAGCTGGTTTTCTATGAAATATATGTAAGTATAGCCTTTATAAATAATAATTATTATATAATATTATATTAATATTATATAAAGAATGGTACAGCAATTAATATATATTAGGGAACTATTAAAGTTTTATTAATAATATTAAATCTCGAAATATTTAATTATATATAATAAAGAGTCAGATTATGTGCGATAAGGTAAATAATCTAAAGGGAAACAGCCCAGATTAAGATATAAAGTTCCTAATAAATAATAAGTGAAATAAATATTAAAATATTATAATATAATCAGTTAATGGGTTTGACAATAACCATTTTTTAATGAACATGTAACAATGCACTGATTTATAATAAATAAAAAAAAATAATATTTAAAATCAAATATATATATATTTGTTAATAGATAATATACGGATCTTAATAATAAGAATTATTTAATTCCTAATATGGAATATTATATTTTTATAATAAAAATATAAATACTGAATATCTAAATATTATTATTACTTTTTTTTTAATAATAATAATATGGTAATAGAACATTTAATGATAATATATATTAGTTATTAATTAATATATGTATTAATTAAATAGAGAATGCTGACATGAGTAACGAAAAAAAGGTATAAACCTTTTCACCTAAAACATAAGGTTTAACTATAAAAGTACGGCCCCTAATTAAATTAATAAGAATATAAATATATTTAAGATGGGATAATCTATATTAATAAAAATTTATCTTAAAATATATATATTATTAATAATTATATTAATTAATTAATAATATATATAATTATATTATATATTATATATTTTTTATATAATATAAACTAATAAAGATCAGGAAATAATTAATGTATACCGTAATGTAGACCGACTCAGGTATGTAAGTAGAGAATATGAAGGTGAATTAGATAATTAAAGGGAAGGAACTCGGCAAAGATAGCTCATAAGTTAGTCAATAAAGAGTAATAAGAACAAAGTTGTACAACTGTTTACTAAAAACACCGCACTTTGCAGAAACGATAAGTTTAAGTATAAGGTGTGAACTCTGCTCCATGCTTAATATATAAATAAAATTATTTAACGATAATTTAATTAAATTTAGGTAAATAGCAGCCTTATTATGAGGGTTATAATGTAGCGAAATTCCTTGGCCTATAATTGAGGTCCCGCATGAATGACGTAATGATACAACAACTGTCTCCCCTTTAAGCTAAGTGAAATTGAAATCGTAGTGAAGATGCTATGTACCTTCAGCAAGACGGAAAGACCCTATGCAGCTTTACTGTAATTAGATAG
ATCGAATTATTGTTTATTATATTCAGCATATTAAGTAATCCTATTATTAGGTAATCGTTTAGATATTAATGAGATACTTATTATAATATAATGATAATTCTAATCTTATAAATAATTATTATTATTATTATTAATAATAATAATATGCTTTCAAGCATAGTGATAAAACATATTTATATGATAATCACTTTACTTAATAGATATAATTCTTAAGTAATATATAATATATATTTTATATATATTATATATAATATAAGAGACAATCTCTAATTGGTAGTTTTGATGGGGCGTCATTATCAGCAAAAGTATCTGAATAAGTCCATAAATAAATATATAAAATTATTGAATAAAAAAAAAATAATATATATTATATATATTAATTATAAATTGAAATATGTTTATATAAATTTATATTTATTGAATATATTTTAGTAATAGATAAAAATATGTACAGTAAAATTGTAAGGAAAACAATAATAACTTTCTCCTCTCTCGGTGGGGGTTCACACCTATTTTTAATAGGTGTGAACCCCTCTTCGGGGTTCCGGTTCCCTTTCGGGTCCCGGAACTTAAATAAAAATGGAAAGAATTAAATTAATATAATGGTATAACTGTGCGATAATTGTAACACAAACGAGTGAAACAAGTACGTAAGTATGGCATAATGAACAAATAACACTGATTGTAAAGGTTATTGATAACGAATAAAAGTTACGCTAGGGATAATTTACCCCCTTGTCCCATTATATTGAAAAATATAATTATTCAATTAATTATTTAATTGAAGTAAATTGGGTGAATTGCTTAGATATCCATATAGATAAAAATAATGGACAATAAGCAGCGAAGCTTATAACAACTTTCATATATGTATATATACGGTTATAAGAACGTTCAACGACTAGATGATGAGTGGAGTTAACAATAATTCATCCACGAGCGCCCAATGTCGAATAAATAAAATATTAAATAAATATCAAAGGATATATAAAGATTTTTAATAAATCAAAAAATAAAATAAAATGAAAAATATTAAAAAAAATCAAGTAATAAATTTAGGACCTAATTCTAAATTATTAAAAGAATATAAATCACAATTAATTGAATTAAATATTGAACAATTTGAAGCAGGTATTGGTTTAATTTTAGGAGATGCTTATATTCGTAGTCGTGATGAAGGTAAACTATATTGTATGCAATTTGAGTGAAAAAATAAGGCATACATGGATCATGTATGTTTATTATATGATCAATGAGTATTATCACCTCCTCATAAAAAAGAAAGAGTTAATCATTTAGGTAATTTAGTAATTACCTGAGGAGCTCAAACTTTTAAACATCAAGCTTTTAATAAATTAGCTAACTTATTTATTGTAAATAATAAAAAACTTATTCCTAATAATTTAGTTGAAAATTATTTAACACCTATAAGTTTAGCATATTGATTTATAGATGATGGAGGTAAATGAGATTATAATAAAAATTCTCTTAATAAAAGTATTGTATTAAATACACAAAGTTTTACTTTTGAAGAAGTAGAATATTTAGTTAAAGGTTTAAGAAATAAATTTCAATTAAATTGTTATGTTAAAATTAATAAAAATAAACCAATTATTTATATTGATTCTATAAGTTATTTAATTTTTTATAATTTAATTAAACCTTATTTAATTCCTCAAATGATATATAAATTACCTAATACTATTTCATCCGAAACTTTTTTAAAATAATATTCTTATTTTTATTTTATGATATATTTCATAAATATTTATTTATATTAAATTTTATTTGATAATGATATAGTCTGAACAATATAGTAATATATTGAAGTAATTATTTAAATGTAATTACGATAACAAAAAATTTGAACAGGGTAATATAGCGAAAGAGTAGATATTGTAAGCTATGTTTGCCACCTCGATGTCGACTCAACATTTCCTCTTGGTTGTAAAAGCTAAGAAGGGTTTGACTGTTCGTCAATTAAAATGTTACGTGAGTTG
GGTTAAATACGATGTGAATCAGTATGGTTCCTATCTGCTGAAGGAAATATTATCAAATTAAATCTCATTATTAGTACGCAAGGACCATAATGAATCAACCCATGGTGTATCTATTGATAATAATATAATATATTTAATAAAAATAATACTTTATTAATATATTATCTATATTAGTTTATATTTTAATTATATATTATCATAGTAGATAAGCTAAGTTGATAATAAATAAATATTGAATACATATTAAATATGAAGTTGTTTTAATAAGATAATTAATCTGATAATTTTATACTAAAATTAATAATTATAGGTTTTATATATTATTTATAAATAAATATATTATAATAATAATAATTATTATTATTAATAAAAAATATTAATTATAATATTAATAAAATACTAATTTATCAGTTATCTATATAATATCTAATCTATTATTCTATATACTTATTACTCCTTTTTAATTAAATTAAAAAGGGGTTCGGTTCCCCCCCCCCATAAGTATGATTATAATTATAATTATAATATAAGGGAGGGGTCCCTCACTCCTTATGGGGTCCCGGTTGGACCGAGACTCCTCCCTTGCGGGATTGGTTCACACCTTTATAAATAAATAATAAATAATAAATAAAGGTGTTCACTAATAAATATATATATATATATATATATATTATATTATAATATTATTTAATACTTAATATATTATATATTTTATATTTAATAAATAAAAAAAATATTAATAAATAATAATATTAATAATAAAGAAATTATAATTAATACCCTCTATATATAATTCTAATTAATTAAATTAAATATTTATATATAATAATCAATATATTATTAATTTAATAATTATTATAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTTTATAAAAGATATATTTTTATATTATATTATATTATATTTAATAAATATTACCTTTTTTTATTATTATTTTTATATATTATATAATATTATTAATTTTTATTATAATATTATTTACTTTTTTATTGGATTATTTATTTATTTATTTATTTATTAATTAATTAATTAAATATTTATTAATTAATATATATATTAAATATTAATATTTCATTAAAAAAAAGAGATATATGAATAATATATTATGTTATATTATATTATATAATTATATTATTTTTATAATATTAATAATTAAAAATAAGAACTTATTTAAAAATTATAATTATGATAATAAATTAATACTTTTTAATTTATAAAAATATAAATTTCTTTACATATATATATATATATATTATTATTATTTATATTAATCATAATTTTAATATTTATAATAAATTTATATAAAATCAATTATAATATTATATACTTTTTATATACTTTATAATCTTTATATCTTCACCCCCCCTTTTTTAATAATATATTATATTAAAAATATAATAATTTATATGATTTATTAATACTTTTTATATAATTATATTATTATTTTTTTTTATAGATGTTATATTATTTTTTATAATAATTTTTTTTTATTTAAATAAAATTTATAACTCCTTCTTAATTAAAGATAAAAGGGGTTCCCCCCTTAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAAGTATAGTATACGGGGGGGGGGTCCCTCACTCCTTCGTTAATTTATATATATTATTAATAATTATTTAATTTTTATTATTTATTTATATATAAAAATATTCTAAAATTATTAATATTTATAATAGAATAAATATTATAAAGTATAATTATAAATAATTAATTATTTAAATAATAATAATATATTTATTATTATATAATAAATATATTATAAATAATAGTTATATTAGCTTAATTGGTAGAGCATTCGTTTTGTAATCGAAAGGTTTGGGGTTCAAATCCCTAATATAACAATAATAATAATAAAATATTAAAATAAATATAATATTTATAAAAAATTTATTAATTTATATAAAAA
ATATATATATAAATAATAATTATAATAAAACATTTTATAATCAATAATTTAATAAATAATCTTCTTATTATAATATTATGTTTAAATATTACTCTTTATGAGGTCCAACAAACTAATAAGATATAAATATATATATATTATATAATAATAATAATAATAATATATTATTTAATATATTATCAAGAAGATAAATATAAATAATATTTTAATAATTTTAAATAAATCTAATTTATATATTAATAATTTAATAATCTTAATATTTATTATCATTATTTCATATTTATATTATATAAATATTTATTTAAATAAAAAATATTAAAGAGTTTATTTTATTTATTATAAATTATTTAATAAAATATATATAATAATATATAGAATAAAGATATAAATAATTATAAGTATATAAAGTAATAAAGGAGATGTTGTTTTAAGGTTAAACTATTAGATTGCAAATCTACTTATTAAGAGTTCGATTCTCTTCATCTCTTAAATAAATAATATAATAATAAAATATTATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGGAAGGAGATAAATATATATATATTTATAATAATTATATAATAAAGGTGAATATATTTCAATGGTAGAAAATACGCTTGTGGTGCGTTAAATCTGAGTTCGATTCTCAGTATTCACCCTATAAATAATAATAATAATATATTTTATTATTCTTAAATTTTTTATTCTTTATATTATATATATAATATTAATATTATTACTTTTTAATAACAAAATATTATAATTAATTGATATATATATATACCAAATATAATTAATTGAAATTAAATAATAAATAAAATATTTACTTCTTTATTAAAATTCTAATTAATTGATTCTTTTTATTGAATATTAAATTCTATTATAACTTATTAATTAATTAATTAATTAATTATAATAATAATAATATTTATTATTAATTATTAAATATTTATTATTATATATAAGATTTAATTTTAAATATTAATAAAAAAAGAATAAAATAAAATAAAATGAATAATATTTCTTTATCTCTTTCGATCGGACTCCTTCGGCCGGACTCCTTCGGGGTCCGCCCCGCGGGGCGGGCCGGACTATTTATTATTATAATATAATATTTAATCAATAGATTTATAATTTATTTAATGAATATTTTATAAATATATAAAACAATTCCTTTTTATTATTATAAATTTTTCATTATTTATTTATTTATTTATTTATTTATTCAATATATAAAAATAATTATAAAAAGATTATTAAAAATAATAATTTAATGATAAATATATATTATATATATTAATATAAAAATAATAAATATAAATATATTATGTAAATATTATATAAATTTGTATATGTATATATTATAATAATGTTATATAAGTAATAATATAATAAAATATTTTATGTAATTTATATATATTTATAATTATAAAATAAAAATATTATAAATAATAAAATTAATAATAATAATAATTTTAATAAAATAAATTATATATTTAATTTTATTATGAAGTTTATACTTAATATAAATTATATTTCCTTTATAAATTATTAATATATCCTTTTTAATTAAATAAAATAAAAATATTATAAATATTAATAATTAATTTTTTATTTATATTTATATATATATTAAAGATTAAATATATTATTAATACTAATTTATAATTTATTATTAATAAATAGTCCGGCCCGCCCCCTGCGGGGCGGACCCCGAAGGAGTTCGACTTAAATTATAATTTAATAATTTTTATTTATTAATAGTTTCGGGGCCCGGCCACGGGAGTCGGAACCCCGAAAGGAGTTTTATTATTAATATAAAAAGAGTAAGGATAATAATAAATTCTTTTAATTTATTTTTAATAAAATATAATTTTAAAATAGTTTTTATAGTCCGGCCCGCCCCGCGGGGGGGGGCGGACCCCGAAGGAGTTCGGTCTGGCATTAATTATAATAATTATATTAATATTATTATTATTTATTATATTAT
AATATATTTATTTATATTTTATAATATTAATAATTATTTTATATTTAATAAATATAATATATATATTATTTTTTTTAATAACTATCTAATTAATAGCTATTTTGGTGGAATTGGTAGACACGATACTCTTAAGATGTATTACTTTACAGTATGAAGGTTCAAGTCCTTTAAATAGCAATAAATATATATAATATATATAATATATATAAATGAGTCGTAGACTAATAGGTAAGTTACCAAAATTTGAGTTTGGAGTTTGTTTGTTCGAATCAAACCGATTCAATATTATAATATATATATTATTTATATATAAATATATAATTATACTCCTATTTTTATATTAATTAATTAATAATATATGATAATATAAAAATTATTGAATTATTAACTCTTATTAATAATAATAATAATCATAATAATAATATATATATATATAGTATATATATAAAAGTTTTATTATATTATATTATATTATATATTTATTTATATATAATTCTTATTAATTGAAAAAAGAATAATTAATAATCTTATTAAAAAAATAAATACTTTCATTTTATTTTATTTTATTTAATTTAATTATAATATATAAATATTAAAAAAAGGATATAAGTTTTTTATAAGATATAATATATATATATATTAAATATAAAGAAGTTAATATTTATATTTTAATTATAAAATGTTAATACTCCTTTGGGGACTTATTAATTAAATTATTAATTAATAATAATTTATGATTTATAAATAATAAATAAAGGAATAAGTATCAATTAATTAATATATTATATTTAATATTTTATATTTAATATTTAATATTTAATATTTTAAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGTAGTATTAATTATGGATAGTGAGGGTGGATTTAATCCTTTTGTTATGTTATTAATTAATTAATTAATTTATATATATAAAATATTTTAATTAATTTTTATATAAATATATATATATATATATATTAATAATAGTCCGGCCCGCCCCGTGGGGCGGACCCCAAAGGAGTAATATATATTATGTATAAACAATAGAGAATATTGTTTAATGGTAAAACAGTTGTCTTTTAAGCAACCCATGCTTGGTTCAACTCCAGCTATTCTCATAATATTATATATATATATTTCCCTTTCTAAAAATAATAATAATTATATATAATAATAATATAATTATATATATATATATTATAATAATAATAATAATAATAATAATAAATAATAATAATTATTTTTATTAATAATATTAATATATTATAATTATTAATAAATATTAATAAAAATAGCTCTCTTAGCTTAATGGTTAAAGCATAATACTTCTAATATTAATATTCCATGTTCAAATCATGGAGAGAGTAATTATATTATATTAATAATCCCCCCCCCATTTTTAATTAAATTAAGAAGTTTAATTTACTATTTAATAATAAATGAAATAATAATAATAGATATAAGTTAATTGGTAAACTGGATGTCTTCCAAACATTGAATGCGAGTTCGATTCTCGCTATCTATAATTAATATTAATATAAATTAATATCCTATAATTAATTAAATACAAAATTATATTAAAACTTATATTATATTATATTATAATATTATATTATTATTATATAAAAATATAATAATAATAATATTTAATTTTATTTAATAATAATATTTTATATAATAAAATAATCATATTTATAATATTTAATATTAATAATAATTTATTATAATAATTCTTTAATATACTTATTTATTATTATTTTAATAAATAAATATAATTCTTATAAATATATTATAACAAAATATATTATATTTTAATTAAATACAATATTATAAATATATATATATATATAAATATTTATATAAAAAAAAAATAAAAATATTTTAATAATTATTCTTTATAAATAAATAATGATAATAATAATTTATAATAATCTCCTTGTGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATAATATATTT
TAATATATTTTTTATTACTCCTCCTTTGGGGTCCGCCCCGCGGGGGCGGGGCGGACTATAATAATTTTTTATTGATAAAAAAGTATATATAATATAATTAATATATTTCTTTTTATATAAATTATAAATATTATTTTATAATAAAAAAAGTATATATAATATTATATATTTAATAAATAATATAATAATAATATAAATAAATATATATATATTATTAATATATTAAATTTTATAATAATAATTATAATAATAGTAGTAGGTATAAATTTTAATAAAGAGTTTTATTCCAATGGAGTAATAATAATAATAATAATAAAATAAAGGATCTGTAGCTTAATAGTAAAGTACCATTTTGTCATAATGGAGGATGTCAGTGCAAATCTGATTAGATTCGTATATTTATACTTAATATAAAAAAAATAAATAATAATCTTTTTTATTATTATATTTATTAATAATAAATTATTTTGTTATTATTATTAATTTATATTAATATTTTATATAAATTATTTATTTAATCTTTCATTATATATTTAATATATTATTAATATTAATTAATATTTTATAATAAATAAATAAAATAAAATAAATATTTTAATATAATACTCCTTCGGGGTTCGGTCCCCCTCCCATTAGTATAGTATAGGGAGGGGTCCCTCACTCCTTCGGGGTCCCCGCCGGGGCGGGGACTTATTTTTATATTTATTAATAATAATTAATTTTTATATAAATTTATTATTTCTTACAATATATTTATTACTATTATTTTTTAATAATCTTATATATAATATATAAAATATATATATATTATATATATATATAAATATAATATATATTATTATAAATATTTATAATCTTATTAATTAATTAGATTATATTATATTATATTAGATCATATTATATTATATTATATTATATTATATTATTATTATTAATATTTTTATTTTTATTTTATATTTAATAGTAAAAAATCATAATTTTATAATTTATTAATTATTATATAATTTCATTAATATATTTCTTCTTTTTATTTATTTATTTATTACTTATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAAAATAATATAAAAAATAATTATAATTTATTATAATTTATTAATTTATTAATTTATTAATTTATTTATTAATTTATTAATTTATTTATTATTATATTTTTTTTAATAAAGGAAAATTAACTATAGGTAAAGTGGATTATTTGCTAAGTAATTGAATTGTAAATTCTTATGAGTTCGAATCTCATATTTTCCGTATATATCTTTAATTTAATGGTAAAATATTAGAATACGAATCTAATTATATAGGTTCAAATCCTATAAGATATTATATTATATTATATAATATTATATATTAATAAATATTATTAATTAATTTATTTATTTATTTATTATTAAATAAAAATATTTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAATAATATAAAATATTATAATTATTTATATATTAATTATTAATTATTTATTATTTATTATATAAAAAGTATATAATTTTATATTTTAATATAGGGTTAATTAATTAATTATTAATTTTTTATAATAAGATAATAATATATTAAAAACTTATTATAAATTTATAAAATAATATTTATTTACTTTGATATTATTTTTAATCTTTCATTAATATATATTTTATTATAAGTAATAATATAGTTTAATTTAATTAATATAAATAAATTACATAAGAATAATATTATAATAATATTATATATTATATAAAGAAATAATAATTTATATTTTTATTTTTTTTATAAATAATATAAATATAAATATAATGGGGTTATAGTTAAATTTGGTAGAACGACTGCGTTGCATGCATTTAATATGAGTTCAAGTCTCATTAACTCCAATAATTATATTATATAATATATATATTAATAAATTATATATATATATATATATATAAATATTAAATAAATATTATATTAAT
AAATAATATAAATTATCTAATCGAAGGAGATATTTATAATATAATATAAATATTTTAATAAATTAATAAATATTATATTAATAAATAATTAATAAATATATAAATTATAATAAATTTTAATATTATTATATAAATTAATTAAATATAATAATTAATGAAATAGAAACTATAATTCAATTGGTTAGAATAGTATTTTGATAAGGTACAAATATAGGTTCAATCCCTGTTAGTTTCATATTATATATCATTAATATATAAAATATAAATATATATATTATAATAATAATAATAATAAATATAAATATAATTATATATATATATATATATAAATAAATAATTATTTAATTTATAATAAATATATATAGTTCCCGCGAAGCGGGAACCCCATAAGGAGTTTTATTATTAATTATATTTAATAAATATTAATTATTAATTTTATATTTATAAATAAATTTATTACTCCTTCTTAATTAAGAATAAAAAGGGATGCGGTTCCCATGGGGTCCCGCACTCCTTCGGGGTCCGCCCCCTCCCCTGCGGGAGGGGAGCGGACTATTTTATTAAAAATATTATAATTAAATAATAATATAAATAATTTATAATATAATAATATATACTTATAAATAATATTTAAATCTTATTATTAATTTATAAATCATAAATTATTATTAATAAATATCTCTTTTAGATAAGATAAATTGAACTTATATTTATATTATATATATATAGATATAAATCTTAAATAGAGTAAATATATTATAATAATTATATAAATATATATATATTATATTAAGATAATAATATATATATATATTAATATATAAGGAGGGATTTTCAATGTTGGTAGTTGGAGTTGAGCTGTAAACTCAATGACTTAGGTCTTCATAGGTTCAATTCCTATTCCCTTCATAATTTATTATTAATTATATATTATTATAAATCAAATCCATTGAAATTAATATAATCCAATGAATAATTAATTTAATACATAATTTAATATATAAAATTATATATATATATACTTTATAAAAAAAAAAATTATATAATAATTATATTAATATATTTATATATATAAATAAATAAATAAATAATAATAATTATAATTATAATTATAATTAATTAATTAATAAATAAATAATAATTTATATTATCTTTATAATATATATATATACTTTTATAAAAAAAATATATAAATAATTCTAAAATGTATATTTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTATTAATAAAATTAATAATAAAATAATTATTATCTGTATTTAATAAATTTAATTATAGAGTTATATTTCTATATATTTATATATTTATTTATTTATTCTCCTTCCGGAACTAATAAAATATATAAAATAAGGGTTTTTATTTATTTAATTAATATATATTTATTCTTTTATATAATATGTCCTTATAGCTTATCGGTTAAAGCATCTCACTGTTAATGAGAATAGATGGGTTCAATTCCTATTAAGGACGATAATAATATATATATATTTTAATTTATATATCATATATATATATATATTAAAGAAAATAATATAAAAAGTATGTATTAATAATAATAATAAATAAATAATAATAAATAATTTTATTATATTATATTATATTATATTTATTGATATATTTATTGATATTTATTAATTTAAGATTATTCATTAAATATATAATTATTAATAATTTAATATATTTTATAATTTTTATTATATTTTATGTAAGAAGAAACTATTTTATATATTATATATATATATATAATTTTTATAAAATGATAAATTTTATATTATAAATATTATTAAAATATTTTTATAAATATTTAAATTATTTATAAAAAGGTATATAATAATAATTATTAATATTATATTATATTATATATTTATTTATATTATATATAATAATATATTTATATATATATTAATTAATAAATTAAATAAGTATCTATATTTTATATTATATTATATTATTTTATTTTATTAA
TTCCGGAAGGAGAATAAAAAGTATTCTAAAGAAATTATATATTTATTATTTTTATTAATATGTTATAAATTAATAAAAAATAAATATGTATATATAAATTATATTTATTATGTTTAATTATTTATAATTTATTATAATATATAGTATAAGATATCTTATTTATATTTATATATAATAAAGAATATTATTAAACTAACACCTATATTATATATATTATATTATATAATATTATATATATATTAATTACTAAGAATAAATTTATAATTAGATAATATTTATATTTATTTATTTATTTAATTAACAAATATATTAATATTTTTAATTAATTAATAATACCTTTATATATATATATATATATATATTAATTTTAATTATATAATTATCTTTTTTATTAATAATTATAAATATATTATATATTTTATATAATAAGATTATAATTTTATAATTATTTTATTTTTTATTAAAAATTATTATTATTATAATTATTATATTATAATTATAAATTATTAAAGAATATATTTATTAATATTTTAATAATTAATATCTTTTATTTATATTTATAAAATAAGGTATAAATATTGATAATAAAGAGTAAATATTGTATTAATTATAATAATAATTATAATTAAGGAGCTTGTATAGTTTAATTGGTTAAAACATTTGTCTCATAAATAAATAATGTAAGGTTCAATTCCTTCTACAAGTAATAATGATTATAATATTTATATATATTAAAATAATATTAATAAATAATTACTCCTCCTAGCAGGATTCACATCTCCTTCGGCCGGACTCCTTCGGGGTCCGCCCCGCGGGGGCGGGCCGGACTATTTTATTATTATTAAATAGATGTTCATTAAATAATTATAAATATAATTTATCTTTTAAATATATATATATAATATAATATTTAAATATATATTATAAATAAATAAATAAATAATTAATTAATAAAAACATATAATGTATATTTATCTATAAAAAATATTAATTAAATTAATATATTATTACAGTTCCGGGGGCCGGCCACGGGAGCCGGAACCCCGAAGGAGATAAATAAATAAATAAATATAAATAATTCTTCTTCTTTAAAATTAAATAAAATAAAATAAAAAGGGGGGCGGACTCCTTCGGGGTCCCGCCCCCCTCCGCGGGGCGGACTATTTTATTTTTAAATATATATTATATTAATAATATAAATATAAGTCCCCGCCCCGGCGGGGACCCCGAAGGAGTATAAATAAAAATTAATAATATATTATATATATATTATATTAATAATAATAATAATAATAATAATAATAAATAATAACTCCTTGCTTCATACCTTTATAAATAAGGTAATCACTAATATATTATAATAATAAAAATTATATATATTATATATAATCTAAATATTATATATTTTAATAAATATTAATATATATGATATGAATATTATTAGTTTTTGGGAAGCGGGAATCCCGTAAGGAGTGAGGGACCCCTCCCTAACGGGAGGAGGACCGAAGGAGTTTTAGTATTTTTTTTTTTTTAATAAAATATATATTTATATGATTAATAATATTATATATATTATTTATAAAAATAATATATAATTTTAATTATTTTTAATAAAAAAAGGTGGGGTTGATAATATAATATAATATTTTTTATTTTAATTTATAATATATAATAATAAATTATAAATAAATTTTAATTAAAAGTAGTATTAACATATTATAAATAGACAAAAGAGTCTAAAGGTTAAGATTTATTAAAATGTTAGATTTATTAAGATTACAATTAACAACATTCATTATGAATGATGTACCAACACCTTATGCATGTTATTTTCAGGATTCAGCAACACCAAATCAAGAAGGTATTTTAGAATTACATGATAATATTATGTTTTATTTATTAGTTATTTTAGGTTTAGTATCTTGAATGTTATATACAATTGTTATAACATATTCAAAAAATCCTATTGCATATAAATATATTAAACATGGACAAACTAT
TGAAGTTATTTGAACAATTTTTCCAGCTGTAATTTTATTAATTATTGCTTTTCCTTCATTTATTTTATTATATTTATGTGATGAAGTTATTTCACCAGCTATAACTATTAAAGCTATTGGATATCAATGATATTGAAAATATGAATATTCAGATTTTATTAATGATAGTGGTGAAACTGTTGAATTTGAATCATATGTTATTCCTGATGAATTATTAGAAGAAGGTCAATTAAGATTATTAGATACTGATACTTCTATAGTTGTACCTGTAGATACACATATTAGATTCGTTGTAACAGCTGCTGATGTTATTCATGATTTTGCTATTCCAAGTTTAGGTATTAAAGTTGATGCTACTCCTGGTAGATTAAATCAAGTTTCTGCTTTAATTCAAAGAGAAGGTGTCTTCTATGGAGCATGTTCTGAGTTGTGTGGGACAGGTCATGCAAATATGCCAATTAAGATCGAAGCAGTATCATTACCTAAATTTTTGGAATGATTAAATGAACAATAATTAATATTTACTTATTATTAATATTTTTAATTATTAAAAATAATAATAATAATAATAATTATAATAATATTCTTAAATATAATAAAGATATAGATTTATATTCTATTCAATCACCTTATATTAAAAATATAAATATTATTAAAAGAGGTTATCATACTTCTTTAAATAATAAATTAATTATTGTTCAAAAAGATAATAAAAATAATAATAAGAATAATTTAGAAATAGATAATTTTTATAAATGATTAGTAGGATTTACAGATGGAGATGGTAGTTTTTATATTAAATTAAATGATAAAAAATATTTAAGATTTTTTTATGGTTTTAGAATACATATTGATGATAAAGCATGTTTAGAAAAGATTAGAAATATATTAAATATACCTTCTAATTTTGAAGAACTACTTAAAACAATTATATTAGTAAATTCACAAAAGAAATGGTTATATTCTAATATTGTAACTATTTTTGATAAGTATCCTTGTTTAACAATTAAATATTATAGTTATTATAAATGAAAAATAGCTATAATTAATAATTTAAATGGTATATCTTATAATAATAAAGATTTATTAAATATTAAAAATACAATTAATAATTATGAAGTTATACCTAATTTAAAAATTCCATATGATAAAATAAATGATTATTGAATTTTAGGTTTTATTGAAGCTGAAGGTTCATTTGATCTATCTCCAAAACGTAATATTTGTGGTTTTAATGTTTCACAACATAAACGTAGTATTAATACATTAAAAGCTATTAAATCTTATGTATTAAATAATTGAAAACCAATTGATAATACACCATTATTAATTAAAAATAAATTATTAAAAGATTGAGATTCATCTATTAAATTAACTAAACCTGATAAAAATGGAGTTATTAAATTAGAATTTAATAGAATAGATTTTTTATATTATGTTATTTTACCTAAATTATATTCATTAAAATGATATAGTCGTAAAGAAATTGATTTCCAATTATGAAAAACACTTATAGAAATCTATATAAAAGGTTTACATAATACACTTAAAGGTTCTAATTTATTAAAATTAATTAATAATAATATTAATAAAAAAAGATATTATTCTAATTATAATATTTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCCGGAACTAAAAATATTATTGATGATGTATTAAATATAAATCTTATCTATAATTATAAATTACCATATCGTATAAATAGTGATATTCAACGTTTAAATTCTATAAATAATAATAATACTAAATTTATTAATGTTGGAGTATTTGTTTATGATTTAAATAATACATTAATTATAACATTTACTGGTTATAGACCAGCAGCTCTTTACTTTAATTGTTCTCCTTTTCGGGGTCCCGACTGGGGCCGGGACTAAACATGAAATTGCTAAATATATTAAAAATGGTAATGTATTTATAAATAAATATATTTTAAAAAATATTTTATTAGATTAATTATTATTTTTACTT
CTTCTTAAAATTAAAAAAGGAGACTTTTTTATATTTATATAAATTATATATAAATTATTCTTTTATTATAAATATATAAAATTATTTTCTTTTAATTATTTTTATAATTAATTAATTCTTCATGGCTATAGCCATAACTTTTAATAATATTCTTTTATTCTTTATTATTATATATATATATATTTATTATTTATTATTATAGAATTTATATTTATAAAAATATTAATATTTTATTTAAAATAAATAATGATTAATTTATAAAATATATATTAATTAAGTTTCGGGTCCCGGCTACGGGACCCGGAACCCCCGAGAGGAGTTATTATATTTATAATTAAATCTTTAAATAATATATCTTAAATTATTATATTGATATTAATATTATATTGATATTAATATTAAATATATATTTAATATTTAGCTTATTATTTTATAAAATTATATTTATATATTATAATATAATTAAATATATTATAAATTTAATAATTTAATAAAAATATTCTTTTTATAATTATTATAATAATTAAATAAATAATAATAATAAGAATAATTAATGTATAATTTTTTTATAAATATTATATATTTTTATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATATTAATAAAATAAAATAAAATTATAATATAATTAAATTATAAGAATTATATTTACTCCTTTTATAATTTATATTTATAATATAATATAATATAAAATAAATATAATATAATATAAAATAAATATAATGTAATAGGTATTCACTCCTCTTTGGGGTTCCGATCCCCCATACGGATACGGATACGGATACGAATACGGATACGGATACGGATACGGGGGGCCGTCCCCCAGAACTTAATATTATATCTTAAATAATTAATATAAATATAATATATTATTTAATAATAATAATAAATAAATAAATAAATAAATAAATAAATTAAATAAATAATAATATTATTATAATTACTTTTTAATAAATAATATTAATATAATATTATATTAGTATTATAAATAGACTTTTTATTATTTTATATATAATATAGTCCGGCCCGCCCCCGCGGGGCGGACCCCGAAGGAGTAATATATTATATAATTATTATTTTTAATTATAAATAAAATATAATTATTATTTATTATATAATTTATATAAATATATATATATATTTATTATATATATAAATATAAATATAAATATAATAATTAATAATATTAAAGTTTTATATATATTAATATATTATAAAAGGTTTATATATATATATAATAAGATAAGTAATAAATTAATTAATTAATAATATAAAAATATATATTATATATTATGTTTTATTTATATATATATATATATTATGTATTATTATATAAATATATATATATATTATATTATAAGTAATAATAAGTATTATATTATATATAGCTTTTATAGCTTAGTGGTAAAGCGATAAATTGAAGATTTATTTACATGTAGTTCGATTCTCATTAAGGGCAATAATAATAATATATTAATTAATAATTAATTTATAATAAATATATTATAATAATTAATATATATATATATAATATATTTAATACAAAGAAAATATATATTATATCTCTTATTTATTTATTTATTAATATTTTAATAAATATAATATTATAAAAAAAAGTTTATATATTTAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGGTAGGAGAAATATAAATATAAATATAAATATAATATAAGTTTGGTATTCATTTAATTATATTATTTAATTAAAAATATTCTAAATAAGAATAAATATCAATAAAGGAGTTATAAATATATATATATATTAATATATATATAAAAATATATATTATTATTAGTTCCCGCTTTGCGGGAACCCCGTAAGGAGTGAGGGACCCCATGGGAACCGAACCCCTATTTAAGAAGGAGTTTTATTATAATAAAATTTATATATATTTAATATATAATTATAAAAATATTATATAATAAAT
AATAAATAATTATTAATAATAAATAAATATAATAATAATATTATAATAAATTTATAAATGATTATAATAAATTTATATTAATTTTTTATTTTGTAAATACTAAGATTTGAACTTAGATAATATGCACCTAAAAACATACATTTTACCATTAAATTATATTTACCTTATTAATTATATAAAATTTATTAAATATATAATATATTAATTATATAAAAATTATTAAATAAATATATAATATATTATATATAATTTATAATATATATATTATAAATATTATTATATATAAAATATAATATACTACTTATAAAAATATATATATATATATAAATATATATATAAATAAATATTTTATATATTAAATTAAATAATTATTAATAAATTTAATTATAAAGTATAATTTTCAATAGGAATATTTATAAGATTATAATAATTATATGAATTATTATAATTATATATATATAAATAAATAAAATAATAATTATAATAATTAATAAGAGTTTTGGATATATATCTGTGGAGTATATATTTTATAAAGGAGATTAGCTTAATTGGTATAGCATTCGTTTTACACACGAAAGATTATAGGTTCGAACCCTATATTTCCTAAATCTAGATATAATATTATATCTATCTTAATATAATAATATTTATTTATTAAATAAAAAAAAAATAAATAATATTAATTAATATAAGATTCTTTTTTAATTATAATAATAAATAAATAAAAAGAAGATATTATCAATGATTTATATTAATAATAAATATAAATAATAAAAAATATATATAATATAATATAATAAATATATTTCCTTTAATATTAATAAATTAATAATAATAATAATAATAATAATAAAATATTTAAATAAATTATATTCAATACAAATTAATTATTTATATTATTAATAATTGAATAAATAATCCGGTCGAAAGAGATATTAATTCGATTATATTATTTATTTAATTATATTTAATTTAAATATATAAATTAATATATATATATTGAATTATATATAAATTTATTTTATAATTTTATAAATAATATATTATTATAAATATTTAATATAATTTATATTATTATTAAATAAAAGATTTATTAAATTAATATTATTATTTAATTTTATTATATAGTTTAAGGGATAATATTTTATTAATATTTTTTTTATTTATTTATTTAATTATATTATATATATAATATATATATAACAATAAATTTATGACACATTTAGAAAGAAGTAGACATCAACAACATCCATTTCATATGGTTATGCCTTCACCATGACCTATTGTAGTATCATTTGCATTATTATCATTAGCATTATCACTAGCATTAACAATGCATGGTTATATTGGTAATATGAATATGGTATATTTAGCATTATTTGTATTATTAACAAGTTCTATTTTATGATTTAGAGATATTGTAGCTGAAGCTACATATTTAGGTGATCATACTATAGCAGTAAGAAAAGGTATTAATTTAGGTTTCTTAATGTTTGTATTATCTGAAGTATTAATCTTTGCTGGTTTATTCTGAGCTTATTTCCATTCAGCTATGAGTCCTGATGTACTATTAGGTGCATGTTGACCACCCGTAGGTATTGAAGCTGTACAACCTACCGAATTACCTTTATTAAATACTATTATCTTATTATCTTCTGGTGCTACTGTAACTTATAGTCATCATGCCTTAATCGCAGGTAATAGAAATAAAGCCTTATCAGGTTTATTAATTACATTCTGATTAATTGTTATTTTTGTTACTTGTCAATATATTGAATATACTAATGCTGCATTCACTATCTCTGATGGTGTTTATGGTTCAGTATTCTATGCTGGTACAGGATTACATTTCTTACATATGGTAATGTTAGCAGCTATGTTAGGTGTTAATTATTGAAGAATGAGAAATTATCATTTAACAGCTGGACATCATGTTGGATATGAAACAACTATTATTTATCTACATGTTTTAGATGTTATCTGATTATTTTTATACGTAG
TCTTCTACTGATGAGGAGTCTAAGGCTATAGAATTATATATCTAAATGATTAATATATATATTATTAATAATTAACAATAATTAATATATTATAATTTATATATATATATTTTATATTATTATAATAATATTCTTACAAATATAATTATTATATATTATTCCTTCAAAACTCCTAACGGGGTTCCCGCGAAGCGGGAACTAATAATAATATAATCATTATACTCTTTTTTCATTTACCTTTTATAAAGATAATTAATAAATTTATTTAATATTTATAAAAAAAAAAATATAATATTAATATAATATAATATAATAATGTAATTATTTATATTTTTATATTCCTTCGAGGTCACCGCCTCACCTCCAGCGGGACTTTTTTAATATGATATAATATAATATAAATATTATTAATTTAACTAATATATAAATTCATATATATATATATATTATTAATATTATTTTATAAAAAATATTTTTTATTTGATTATTATTAAATATTATATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAAATATTAATATATTATAAATATACTATTTATGTAATTATTTTTTGAAGTGAGCACCTATTTTATATATATTTTATATATATTTTATTATATTTTATTAAAAATAGGTGTGAACCTCCATGAGAGAGGAATGAATACCTATTTTATAAAGTATATTTTATATTCTATATATTATAAATATGAACCAAAAAAAGGAGTTTAAAATTTAATTAAATTTAATTAATTGAATTTCTTTATTATTATTATCATAATTATTAAACCCTTTATTAATATAATAATATATTATTTATTATCAAAATACCTACCCTTTTTATAATTTATATCTTTAATAATATAATTAAATATAAAATGTTTATTAAATATTATATAAAAATAAAAATAAAAATATATATATATATATAAATGATAAATAATAAGGAATTCACACTTATATAAATTTAAATATAAAGTCCCAAAAGAAGTATTCATTAAATAAATTATCATTAATTAATTATAATAAACTTATTTAATATTATTAAAGATTAATTTATAATAATAATTATTATTATTATTATTAATATTAATAAAATATATAAATAATTAAATAGTTCATATATTAAAAAGAATTAGAATTAAACTTTAATAAGTGTATTTAATATATAGAATATTAATAGAATATTTATTCTATTTATATATATATTTATATATATATATATTAAATAATATTATTTATATTATATTTTATATATATATTATTAATATAAAAAGTATATTATATGTATTATATATATTATATATTATATATTTAATAATATATTACTCCTTTGGGGTGGGTCCGCCCCACGGGGCGGGCCGGACTATTATAATTAATAATTTTATAAAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAATAAATAATTATATATCTTCTTCTTAATTAAATTAAATTAAATTAAATTAAATTAAATTAAATTAAATTAAAAAGGGGTTCGGTCCCCCTCCCTAACGGGAGGGGGTCCCTCACTCATTCAAACTATAATTTAATATATTATGATATTATTTATAATTTATAATATAATGTATAATATTATATTATAAATATTATATAAAAATAAAATGATATATATAATAATAATAATAATAATAATAAAAAAATAGAAAAGAATAATTTTTATTATTTTAGTATATATAAGAATTTAATAAGTTATATTATTGCGGACACCGTTACGCGGAGTGGGGACTATTATATTTTACCTATATATATTAATATTATTATAATTTCCTTCTTTAAAAGAAAAAAGGAATTCGAGAACTTATTATTATATTAATATATTAATAATAAATAATAATAAATAATAAAAAAGTAAATAATTATAAATTATATAAAAATATAATTTTATTATTAAGAAAGGAGTTTAAATATAAAATATAATATTATCATTAAGTTCTAATAAAGGTATATA
ATGAAGATCTATTAGAACCTAAAAAGAATATTAATATATCTATTATAAAATAATAATAATAAATATAAATATAAAAATAAATTGTAATATTTATAAATAATAATAAAAAATAAATAAGGAATATATTAATTATTAATAATAAATAAATTATATTAAAATATAATATTATTATTAAATTAAAGAATTATATTAAATATATTTATTAAAATTTTATAAATAAGTTAATATTTTATTAAATAATATTTATAAATAATAAAAAAAAATAAGTATATAATTATTAATATATTAATTTATTATGTTATATATTTATATATTTCAAATATATAAGTAATAGGGGGAGGGGGTGGGTGATAATAACCAGAATATTAAATAAATACAGAGCACACATTTGTTAATATTTAATAATATAATCAATAAATATATTATAATAATATAATATAATTAATAATAGATATAAAGTATAAACAATATAATAAATTATATAAAATAAATATAAATTAAAAATAATAACCAAATAATTAATATAATAAATGATAAACAAGAAGATATCCGGGTCCCAATAATAATTATTATTGAAAATAATAATTGGGACCCCCACAATAGAATAAAAAATAAAAAGAATTAATAATATATAAATAATATAAAATATATTATATATATATATAATATATATATATATATAATAAAAAAAAATATATAATATAATATATATATATAAAATAATAAATTATATATATATATATAAAATAATAAAAAATAATAATCATATGAATTTTATAAATATAATTATTATTAATAATAATAATAATAATAATAAAGTCCGGTCCGCCCCGCGGAGGGGGCGGACCCCCGAAGGAGTGCGGGACCCCGTGGGAACCGCATCCCTTTTTATTCTTAATTAAGAAGGAGATAATAATTTATAAAAATTAATATTTATTTTATGTAATATTAATATTAATATTAATATAATATAATATAATATAATACGGATTAAATATTACCAGTTGTTCACAGGTAATATAAAATCCTATTGTTTCACCTATTATTAATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGAAAGGAGAATAAGTATATATAATAAAATTTAATAAAAAAAAATAATTATATAATAAATATATATATTATAATATTATATAAATATAAAATATAATTGATATTAACATTATATAATTAATAATATAATCAAATAATATAAATATAATATAAAAAGTTTTAATTATTAAAATTATATAAATATTATTTAATAAAAATAAAAATAATAATAATAATAATAATAATAAAGTCCGGTCCGCCCCCTCCGCGGAGGGGGCGGACCCCGAAAGAGTGAGGGACCCCCCCGTATACTTACGGGGGGAGAACCGAACCCCTTTTTTTATTTAAAGAAGGAGATAAATATTTATATCTTTATTTATAATTATATATAAATAAAAGTTTATTAAAATTTATAATAATAATATAAAAAAGTATATAATAAATTTATTATAAATAAATAAATATTTAGTAATAATATTTAATAAAATTATAAATATTATAAATAAAATATTAATAATAAATAATAAATATATAATATAATATAATATAATAAATTAATAACAATAAGATATCCGGGTCCCCTAAATAATTATTATATAAAATAATAATTGGGACCCATACATATAAATATAAAATATTTTAATATTTATATATAAATAATAATAATATATATTTATATTATATTATAATATAACCCTTTCCAATTAATATTAATATTAATATTAATTACTTCCTTAAAAAAATAATAATTAATTAATTGATTTTTATATTAATATAAAAAAGTTAATATATATATTTATATATAAATAATATAAATTAATATAAAGATAATAAGTCCCCGCTTTCAGCGCAGTGAGGGACCCCCTCCCGTAAATATACGGGAGGGGAGACCGAACCCCAAAGGAATAATAAATAATAGTATGTATTTAAATAAA
TATTTAATATACTATTTTTTTTTATTATTTTTATAATATATTTATAATAATATATTTAATTATAATTTATAAAAAAGAGATATAATATTTTATTATATATAATATTAATATAATACAAATTAACATTATTTAATTATTATTAATAATATTTAACTTTATTATTATCTTCTACGGTTGGACTCCTTCTTAAAAAGGGGTTCGGTCCCCCTCCCATTAGGGAGGGGTCCCTCACTCCTTCGGGGTCCGCGCCCCCCGCGGGGGGGGGCGGACCGGACTATTATTACTATTTATTTATTAATAATAAATAATAAATTATAAAGTCACTGAAAGAGTGAGGAATTTTCCTTTTCCCAAGGGAAAACCCCAAAGGATAATATAAATATTATAAAATTTTTATTAAATAATATAAAATTCAATAAAATAATTTTAATTAATTAATTAATTAATTAATATAAAAATAAATATTTTTAATTAATATTAATATTAATAGTTCCGGGGCCCGGCCACGGGAGCCGGAACCCCGGAAGGAGAAATATAAATATAATAGTATAGTATATAGGAAGTTAATAATAATATAAATATTATATAATATATATATGTATATATATTATATTATATAATTAATTTTCTCCTTTTGTATTTACATCTTAATAAAATATAAAATATAAAATGTTATTAACAATAAAAATTATTAATCTTTATAATATTAATAATAGTAAATTTATTTATATATCTCCTTTAGGATGGACTCCTTCGGCCGGACTCCTTCGGGGTCCGCCCCGCGGGGGCGGGCCGGACTATTTTTATTTTTTTTTTAAAAAATATTAAATATTATAAATATATTATAAATATATTATAAATATGTTATAAATATATTATAAATAGAATATAATATAATATTATATATTATAATGATAAAGATTATATATATTTTCTTTTTTTTTTTATTTATTATTTTTAATAAGTAAAAATTATATTATATATATATATATATTAGATTTTATAAGTAATATAATATAAGTATTAATATATAAATGCAATATGATGTAATTGGTTAACATTTTAGGGTCATGACCTAATTATATACGTTCAAATCGTATTATTGCTAATAAATTAATATATAATATTTATAAAAAAGTATAATAAAATATATTATAAGAAGAATATATTATATAATAATTATATTAATAATATTAATAAATAATATATAAATAATTATAAAAAAGTATATAATATTAATCAATTAATTAATTAATAAATATAAATAATATATTAATTTTTAATTAATTTGAATAAGATATTTATATTATTAATAGGAAAGTCATAAATATATAAATTATATTATATAATTAATATAATAATAAAATAAATTATATATTTTATTTATAATATTATTTCTTTATAAGATAAAATATTATCTGATGAATAATTAGATTGAATAATATTTATAAAGAAATATATATAAAAAGTCATTATATAAATTTAATTATAATTTAAATAAATTTTATATAAATTAATATAATATTAATAAAGTAATTAGTATAAATAAATAATATGAAAATAAAACTTAATAAATATATAAATATAGTCCGGCCCGCCCCCCCGCGGCGGGCGGACCCCGAAGGAGTGAGGGACCCCTCCCTAATGGGAGGGGGACCGAACCCCTTTTTAAGAAGGAGTCCATATATATATATTAATAAAAAAAAGTAATATATATATATATATTGGAATAGTTATATTATTATACAGAAATATGCTTAATTATAATATAATATCCATA diff --git a/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.adapted.fa b/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.adapted.fa new file mode 100644 index 0000000..b0173f5 --- /dev/null +++ 
b/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.adapted.fa @@ -0,0 +1,26 @@ +>mt-Sup-TCA +CCTAAGAGCAAGAAGAAGCCTGGNAAGGATATAGTTTAATGGTAAAACAGTTGATTTCAAATCAATCATTAGGAGTTCGAATCTCTTTATCCTTGCCAACCTTGCCTTAAAAAAAAAA +>mt-Ser-TGA +CCTAAGAGCAAGAAGAAGCCTGGNGGATGGTTGACTGAGTGGTTTAAAGTGTGATATTTGAGCTATCATTAGTCTTTATTGGCTACGTAGGTTCAAATCCTACATCATCCGCCAACCTTGCCTTAAAAAAAAAA +>mt-Thr-TGT +CCTAAGAGCAAGAAGAAGCCTGGNGTTATATTAGCTTAATTGGTAGAGCATTCGTTTTGTAATCGAAAGGTTTGGGGTTCAAATCCCTAATATAACACCAACCTTGCCTTAAAAAAAAAA +>mt-His-GTG +CCTAAGAGCAAGAAGAAGCCTGGNGTGAATATATTTCAATGGTAGAAAATACGCTTGTGGTGCGTTAAATCTGAGTTCGATTCTCAGTATTCACCCCAACCTTGCCTTAAAAAAAAAA +>mt-Leu-TAA +CCTAAGAGCAAGAAGAAGCCTGGNGCTATTTTGGTGGAATTGGTAGACACGATACTCTTAAGATGTATTACTTTACAGTATGAAGGTTCAAGTCCTTTAAATAGCACCAACCTTGCCTTAAAAAAAAAA +>mt-Lys-TTT +CCTAAGAGCAAGAAGAAGCCTGGNGAGAATATTGTTTAATGGTAAAACAGTTGTCTTTTAAGCAACCCATGCTTGGTTCAACTCCAGCTATTCTCACCAACCTTGCCTTAAAAAAAAAA +>mt-Arg-TCT +CCTAAGAGCAAGAAGAAGCCTGGNGCTCTCTTAGCTTAATGGTTAAAGCATAATACTTCTAATATTAATATTCCATGTTCAAATCATGGAGAGAGTACCAACCTTGCCTTAAAAAAAAAA +>mt-Ala-TGC +CCTAAGAGCAAGAAGAAGCCTGGNGGGGTTATAGTTAAATTTGGTAGAACGACTGCGTTGCATGCATTTAATATGAGTTCAAGTCTCATTAACTCCACCAACCTTGCCTTAAAAAAAAAA +>mt-Ile-GAT +CCTAAGAGCAAGAAGAAGCCTGGNGAAACTATAATTCAATTGGTTAGAATAGTATTTTGATAAGGTACAAATATAGGTTCAATCCCTGTTAGTTTCACCAACCTTGCCTTAAAAAAAAAA +>mt-Asn-GTT +CCTAAGAGCAAGAAGAAGCCTGGNGTCCTTATAGCTTATCGGTTAAAGCATCTCACTGTTAATGAGAATAGATGGGTTCAATTCCTATTAAGGACGCCAACCTTGCCTTAAAAAAAAAA +>mt-Met-CAT +CCTAAGAGCAAGAAGAAGCCTGGNGCTTGTATAGTTTAATTGGTTAAAACATTTGTCTCATAAATAAATAATGTAAGGTTCAATTCCTTCTACAAGTACCAACCTTGCCTTAAAAAAAAAA +>mt-Phe-GAA +CCTAAGAGCAAGAAGAAGCCTGGNGCTTTTATAGCTTAGTGGTAAAGCGATAAATTGAAGATTTATTTACATGTAGTTCGATTCTCATTAAGGGCACCAACCTTGCCTTAAAAAAAAAA +>mt-Val-TAC +CCTAAGAGCAAGAAGAAGCCTGGNAGGAGATTAGCTTAATTGGTATAGCATTCGTTTTACACACGAAAGATTATAGGTTCGAACCCTATATTTCCTACCAACCTTGCCTTAAAAAAAAAA diff --git a/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.fa b/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.fa new file mode 100644 index 
0000000..7a3b6a4 --- /dev/null +++ b/resources/ref/sacCer3-mito/sacCer3.chrM.trnas.fa @@ -0,0 +1,42 @@ +>chrM.trna1 chrM:9374-9444 (+) Sup (TCA) 71 bp Sc: 48.3 +AAGGATATAGTTTAATGGTAAAACAGTTGATTTCAAATCAATCATTAGGAGTTCGAATCT +CTTTATCCTTG +>chrM.trna2 chrM:48201-48287 (+) Ser (TGA) 87 bp Sc: 57.4 +GGATGGTTGACTGAGTGGTTTAAAGTGTGATATTTGAGCTATCATTAGTCTTTATTGGCT +ACGTAGGTTCAAATCCTACATCATCCG +>chrM.trna3 chrM:63862-63934 (+) Thr (TGT) 73 bp Sc: 63.6 +GTTATATTAGCTTAATTGGTAGAGCATTCGTTTTGTAATCGAAAGGTTTGGGGTTCAAAT +CCCTAATATAACA +>chrM.trna4 chrM:64597-64667 (+) His (GTG) 71 bp Sc: 46.1 +GTGAATATATTTCAATGGTAGAAAATACGCTTGTGGTGCGTTAAATCTGAGTTCGATTCT +CAGTATTCACC +>chrM.trna5 chrM:66095-66176 (+) Leu (TAA) 82 bp Sc: 45.0 +GCTATTTTGGTGGAATTGGTAGACACGATACTCTTAAGATGTATTACTTTACAGTATGAA +GGTTCAAGTCCTTTAAATAGCA +>chrM.trna6 chrM:67061-67132 (+) Lys (TTT) 72 bp Sc: 35.1 +GAGAATATTGTTTAATGGTAAAACAGTTGTCTTTTAAGCAACCCATGCTTGGTTCAACTC +CAGCTATTCTCA +>chrM.trna7 chrM:67309-67381 (+) Arg (TCT) 73 bp Sc: 54.7 +GCTCTCTTAGCTTAATGGTTAAAGCATAATACTTCTAATATTAATATTCCATGTTCAAAT +CATGGAGAGAGTA +>chrM.trna8 chrM:69289-69359 (+) Arg (ACG) 71 bp Sc: 38.9 Possible pseudogene +ATATCTTTAATTTAATGGTAAAATATTAGAATACGAATCTAATTATATAGGTTCAAATCC +TATAAGATATT +>chrM.trna9 chrM:69846-69918 (+) Ala (TGC) 73 bp Sc: 39.4 +GGGGTTATAGTTAAATTTGGTAGAACGACTGCGTTGCATGCATTTAATATGAGTTCAAGT +CTCATTAACTCCA +>chrM.trna10 chrM:70162-70234 (+) Ile (GAT) 73 bp Sc: 46.2 +GAAACTATAATTCAATTGGTTAGAATAGTATTTTGATAAGGTACAAATATAGGTTCAATC +CCTGTTAGTTTCA +>chrM.trna11 chrM:71433-71504 (+) Asn (GTT) 72 bp Sc: 52.6 +GTCCTTATAGCTTATCGGTTAAAGCATCTCACTGTTAATGAGAATAGATGGGTTCAATTC +CTATTAAGGACG +>chrM.trna12 chrM:72632-72705 (+) Met (CAT) 74 bp Sc: 50.6 +GCTTGTATAGTTTAATTGGTTAAAACATTTGTCTCATAAATAAATAATGTAAGGTTCAAT +TCCTTCTACAAGTA +>chrM.trna13 chrM:77431-77502 (+) Phe (GAA) 72 bp Sc: 53.8 +GCTTTTATAGCTTAGTGGTAAAGCGATAAATTGAAGATTTATTTACATGTAGTTCGATTC +TCATTAAGGGCA +>chrM.trna14 chrM:78533-78605 (+) Val (TAC) 73 bp Sc: 60.5 
+AGGAGATTAGCTTAATTGGTATAGCATTCGTTTTACACACGAAAGATTATAGGTTCGAAC +CCTATATTTCCTA diff --git a/run-test.sh b/run-test.sh index 862b54b..584e142 100644 --- a/run-test.sh +++ b/run-test.sh @@ -1,12 +1,12 @@ #! /usr/bin/env bash #BSUB -n 1 -#BSUB -J aatrnaseq-main -#BSUB -e results/logs/aatrnaseq-main_%J.err -#BSUB -o results/logs/aatrnaseq-main_%J.out +#BSUB -J aatrnaseq-main-test +#BSUB -e .test/logs/aatrnaseq-main-test_%J.err +#BSUB -o .test/logs/aatrnaseq-main-test_%J.out -mkdir -p results/logs +mkdir -p .test/logs snakemake \ --configfile=config/config-test.yml \ - --profile cluster + --profile=cluster/lsf diff --git a/workflow/Snakefile b/workflow/Snakefile index 90b51b3..603f562 100644 --- a/workflow/Snakefile +++ b/workflow/Snakefile @@ -8,12 +8,30 @@ min_version("8.0") configfile: "config/config-base.yml" +# Add a onstart handler to update the PATH environment variable +DORADO_VERSION = config["dorado_version"] +DORADO_DIR = f"resources/tools/dorado/{DORADO_VERSION}" +MODKIT_VERSION = config["modkit_version"] +MODKIT_DIR = f"resources/tools/modkit/{MODKIT_VERSION}" + + +onstart: + import os + + dorado_bin_path = os.path.abspath(f"{DORADO_DIR}/bin") + modkit_bin_path = os.path.abspath(f"{MODKIT_DIR}/bin") + os.environ["PATH"] = f"{dorado_bin_path}:{modkit_bin_path}:{os.environ['PATH']}" + shell.prefix(f"export PATH={dorado_bin_path}:{modkit_bin_path}:$PATH; ") + + SNAKEFILE_DIR = os.path.dirname(workflow.snakefile) PIPELINE_DIR = os.path.dirname(SNAKEFILE_DIR) include: "rules/common.smk" -include: "rules/aatrnaseq.smk" +include: "rules/tool_setup.smk" +include: "rules/aatrnaseq-process.smk" +include: "rules/aatrnaseq-summaries.smk" report_metadata() diff --git a/environment.yml b/workflow/envs/aatrnaseqpipe-env.yml similarity index 90% rename from environment.yml rename to workflow/envs/aatrnaseqpipe-env.yml index 6858622..9b13a25 100644 --- a/environment.yml +++ b/workflow/envs/aatrnaseqpipe-env.yml @@ -3,20 +3,19 @@ name: aatrnaseqpipe channels: - conda-forge - 
bioconda - - hcc dependencies: - pysam - - numpy + - numpy < 2 - pandas - samtools - bwa - deeptools + - bedtools - snakemake - snakefmt - gitpython - pip - - dorado >= 0.7.2 - pip: - pod5 - ont-remora >= 3.2 diff --git a/workflow/rules/aatrnaseq-process.smk b/workflow/rules/aatrnaseq-process.smk new file mode 100644 index 0000000..0ab428a --- /dev/null +++ b/workflow/rules/aatrnaseq-process.smk @@ -0,0 +1,178 @@ +""" +Rules for processing raw data from aa-tRNA-seq experiments +""" + + +rule merge_pods: + """ + merge pod5s into a single pod5 + """ + input: + get_raw_inputs, + output: + os.path.join(outdir, "pod5", "{sample}", "{sample}.pod5"), + log: + os.path.join(outdir, "logs", "merge_pods", "{sample}"), + threads: 12 + shell: + """ + pod5 merge -t {threads} -f -o {output} {input} + """ + + +rule rebasecall: + """ + rebasecall using different accuracy model + + TODO: remove `-v` to reduce log file size. Removing it cases the call to fail. + """ + input: + rules.merge_pods.output, + output: + protected( + os.path.join(outdir, "bam", "rebasecall", "{sample}", "{sample}.rbc.bam") + ), + log: + os.path.join(outdir, "logs", "rebasecall", "{sample}"), + params: + model=config["base_calling_model"], + raw_data_dir=get_basecalling_dir, + temp_pod5=os.path.join(outdir, "{sample}", "{sample}.pod5"), + dorado_opts=config["opts"]["dorado"], + shell: + """ + if [[ "${{CUDA_VISIBLE_DEVICES:-}}" ]]; then + echo "CUDA_VISIBLE_DEVICES $CUDA_VISIBLE_DEVICES" + export CUDA_VISIBLE_DEVICES + fi + + dorado basecaller {params.dorado_opts} {params.model} {input} > {output} + """ + + +rule ubam_to_fastq: + """ + extract reads from bam into FASTQ format for alignment + """ + input: + rules.rebasecall.output, + output: + os.path.join(outdir, "fq", "{sample}.fq.gz"), + log: + os.path.join(outdir, "logs", "ubam_to_fastq", "{sample}"), + shell: + """ + samtools fastq -T "*" {input} | gzip > {output} + """ + + +rule bwa_idx: + input: + config["fasta"], + output: + multiext(config["fasta"], 
".amb", ".ann", ".bwt", ".pac", ".sa"), + log: + os.path.join(outdir, "logs", "bwa_idx", "log"), + shell: + """ + bwa index {input} + """ + + +rule bwa_align: + """ + align reads to tRNA references with bwa mem + """ + input: + reads=rules.ubam_to_fastq.output, + idx=rules.bwa_idx.output, + output: + bam=os.path.join(outdir, "bam", "aln", "{sample}", "{sample}.aln.bam"), + bai=os.path.join(outdir, "bam", "aln", "{sample}", "{sample}.aln.bam.bai"), + params: + index=config["fasta"], + bwa_opts=config["opts"]["bwa"], + log: + os.path.join(outdir, "logs", "bwa_align", "{sample}"), + threads: 12 + shell: + """ + bwa mem -C -t {threads} {params.bwa_opts} {params.index} {input.reads} \ + | samtools view -F 4 -h \ + | awk '($1 ~ /^@/ || $4 <= 25)' \ + | samtools view -Sb - \ + | samtools sort -o {output.bam} + + samtools index {output.bam} + """ + + +rule classify_charging: + """ + run remora trained model to classify charged and uncharged reads + """ + input: + pod5=rules.merge_pods.output, + bam=rules.bwa_align.output.bam, + output: + charging_bam=os.path.join(outdir, "bam", "charging", "{sample}.charging.bam"), + charging_bam_bai=os.path.join( + outdir, "bam", "charging", "{sample}.charging.bam.bai" + ), + temp_sorted_bam=temp( + os.path.join(outdir, "bam", "charging", "{sample}.charging.bam.tmp") + ), + log: + os.path.join(outdir, "logs", "classify_charging", "{sample}"), + params: + model=config["remora_cca_classifier"], + shell: + """ + if [[ "${{CUDA_VISIBLE_DEVICES:-}}" ]]; then + echo "CUDA_VISIBLE_DEVICES $CUDA_VISIBLE_DEVICES" + export CUDA_VISIBLE_DEVICES + fi + + remora infer from_pod5_and_bam {input.pod5} {input.bam} \ + --model {params.model} \ + --out-bam {output.charging_bam} \ + --log-filename {log} \ + --reference-anchored \ + --device 0 + + # sort the result + samtools sort {output.charging_bam} > {output.temp_sorted_bam} + cp {output.temp_sorted_bam} {output.charging_bam} + + samtools index {output.charging_bam} + """ + + +rule transfer_bam_tags: + 
""" + creates final bam with classified reads MM and ML tags and table with charging probability per read + + MM/ML tags from the charging classification are transferred to CM/CL so as not to interfere with + base modifications. + """ + input: + source_bam=rules.classify_charging.output.charging_bam, + target_bam=rules.bwa_align.output.bam, + output: + classified_bam=os.path.join(outdir, "bam", "final", "{sample}.bam"), + classified_bam_bai=os.path.join(outdir, "bam", "final", "{sample}.bam.bai"), + log: + os.path.join(outdir, "logs", "transfer_bam_tags", "{sample}"), + params: + src=SCRIPT_DIR, + shell: + """ + python {params.src}/transfer_tags.py \ + --tags ML MM \ + --rename ML=CL MM=CM \ + --source {input.source_bam} \ + --target {input.target_bam} \ + --output {output.classified_bam} + + samtools index {output.classified_bam} + """ diff --git a/workflow/rules/aatrnaseq-summaries.smk b/workflow/rules/aatrnaseq-summaries.smk new file mode 100644 index 0000000..3e4f7a4 --- /dev/null +++ b/workflow/rules/aatrnaseq-summaries.smk @@ -0,0 +1,258 @@ +rule get_cca_trna: + """ + extract and report charing probability (ML tag) per read + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + output: + charging_tab=os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.charging_prob.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "get_cca_trna", "{sample}"), + params: + src=SCRIPT_DIR, + shell: + """ + python {params.src}/get_charging_table.py \ + --tag CL \ + {input.bam} \ + {output.charging_tab} + """ + + +rule get_cca_trna_cpm: + """ + calculate cpm for cca classified trnas + """ + input: + charging_tab=rules.get_cca_trna.output.charging_tab, + output: + cpm=os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.charging.cpm.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "cca_trna_cpm", "{sample}"), + params: + src=SCRIPT_DIR, + # XXX move `ml_thresh` to config file + ml_thresh=200, + shell: + """ + python 
{params.src}/get_trna_charging_cpm.py \ + --input {input.charging_tab} \ + --output {output.cpm} \ + --ml-threshold {params.ml_thresh} + """ + + +rule base_calling_error: + """ + extract base calling error metrics to tsv file + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + output: + tsv=os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.bcerror.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "bcerror", "{sample}.bwa"), + params: + src=SCRIPT_DIR, + fa=config["fasta"], + shell: + """ + python {params.src}/get_bcerror_freqs.py \ + {input.bam} \ + {params.fa} \ + {output.tsv} + """ + + +rule align_stats: + """ + extract alignment stats + """ + input: + unmapped=rules.rebasecall.output, + aligned=rules.bwa_align.output.bam, + classified=rules.transfer_bam_tags.output.classified_bam, + output: + tsv=os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.align_stats.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "stats", "{sample}.align_stats"), + params: + src=SCRIPT_DIR, + shell: + """ + python {params.src}/get_align_stats.py \ + -o {output.tsv} \ + -a unmapped aligned classified \ + -i {wildcards.sample} \ + -b {input.unmapped} \ + {input.aligned} \ + {input.classified} + """ + + +rule bam_to_coverage: + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + output: + counts_tmp=temp( + os.path.join(outdir, "summary", "tables", "{sample}", "{sample}.counts.bg") + ), + cpm_tmp=temp( + os.path.join(outdir, "summary", "tables", "{sample}", "{sample}.cpm.bg") + ), + counts=protected( + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.counts.bg.gz" + ) + ), + cpm=protected( + os.path.join(outdir, "summary", "tables", "{sample}", "{sample}.cpm.bg.gz") + ), + params: + bg_opts=config["opts"]["coverage"], + log: + os.path.join(outdir, "logs", "bg", "{sample}.txt"), + threads: 4 + 
shell: + """ + bamCoverage \ + -b {input.bam} \ + -o {output.cpm_tmp} \ + --normalizeUsing CPM \ + --outFileFormat bedgraph \ + -bs 1 \ + -p {threads} \ + {params.bg_opts} + + bamCoverage \ + -b {input.bam} \ + -o {output.counts_tmp} \ + --outFileFormat bedgraph \ + -bs 1 \ + -p {threads} \ + {params.bg_opts} + + gzip -c {output.counts_tmp} > {output.counts} + gzip -c {output.cpm_tmp} > {output.cpm} + """ + + +rule remora_signal_stats: + """ + run remora to get signal stats + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + pod5=rules.merge_pods.output, + output: + tsv=os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.remora.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "remora", "{sample}"), + params: + src=SCRIPT_DIR, + kmer=config["remora_kmer_table"], + opts=config["opts"]["remora"], + shell: + """ + python {params.src}/extract_signal_metrics.py \ + --pod5_dir {input.pod5} \ + --bam {input.bam} \ + --kmer {params.kmer} \ + --sample_name {wildcards.sample} \ + {params.opts} \ + | gzip -c \ + > {output.tsv} + """ + + +rule modkit_pileup: + """ + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + output: + bed=os.path.join( + outdir, "summary", "modkit", "{sample}", "{sample}.pileup.bed.gz" + ), + log: + os.path.join(outdir, "logs", "modkit", "pileup", "{sample}"), + params: + fa=config["fasta"], + shell: + """ + modkit pileup \ + --log-filepath {log} \ + --ref {params.fa} \ + {input.bam} - \ + | gzip -9 -c > {output.bed} + """ + + +rule modkit_extract_calls: + """ + TODO: inspect edge filter settings + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + output: + tsv=os.path.join( + outdir, "summary", "modkit", "{sample}", "{sample}.mod_calls.tsv.gz" + ), + log: + os.path.join(outdir, "logs", "modkit", "extract_calls", 
"{sample}"), + params: + fa=config["fasta"], + shell: + """ + modkit extract calls \ + --bgzf \ + --reference {params.fa} \ + --log-filepath {log} \ + --edge-filter 10 \ + --mapped --pass \ + {input.bam} {output.tsv} + """ + + +rule modkit_extract_full: + """ + TODO: inspect edge filter settings + """ + input: + bam=rules.transfer_bam_tags.output.classified_bam, + bai=rules.transfer_bam_tags.output.classified_bam_bai, + output: + tsv=os.path.join( + outdir, "summary", "modkit", "{sample}", "{sample}.mod_full.tsv.gz" + ), + threads: 12 + log: + os.path.join(outdir, "logs", "modkit", "extract_full", "{sample}"), + params: + fa=config["fasta"], + shell: + """ + modkit extract full \ + --bgzf \ + --threads 12 \ + --reference {params.fa} \ + --log-filepath {log} \ + --edge-filter 10 \ + --mapped \ + {input.bam} {output.tsv} + """ diff --git a/workflow/rules/aatrnaseq.smk b/workflow/rules/aatrnaseq.smk deleted file mode 100644 index d784294..0000000 --- a/workflow/rules/aatrnaseq.smk +++ /dev/null @@ -1,343 +0,0 @@ - -rule merge_pods: - """ - merge all fast5/pod5s into a single pod5 - """ - input: - get_raw_inputs, - output: - os.path.join(rbc_outdir, "{sample}", "{sample}.pod5"), - log: - os.path.join(outdir, "logs", "merge_pods", "{sample}"), - params: - is_fast5=config["input_format"], - shell: - """ - if [ "{params.is_fast5}" == "FAST5" ]; then - pod5 convert fast5 -f --output {output} {input} - else - pod5 merge -f -o {output} {input} - fi - """ - - -rule rebasecall: - """ - rebasecall using different accuracy model - requires a GPU - """ - input: - rules.merge_pods.output, - output: - protected(os.path.join(rbc_outdir, "{sample}", "{sample}.unmapped.bam")), - log: - os.path.join(outdir, "logs", "rebasecall", "{sample}"), - params: - model=config["base_calling_model"], - is_fast5=config["input_format"], - raw_data_dir=get_basecalling_dir, - temp_pod5=os.path.join(rbc_outdir, "{sample}", "{sample}.pod5"), - dorado_opts=config["opts"]["dorado"], - shell: - """ - if [[ 
"${{CUDA_VISIBLE_DEVICES:-}}" ]]; then - echo "CUDA_VISIBLE_DEVICES $CUDA_VISIBLE_DEVICES" - export CUDA_VISIBLE_DEVICES - fi - - dorado basecaller {params.dorado_opts} -v {params.model} {input} > {output} - """ - - -def get_optional_bam_inputs(wildcards): - sample = wildcards.sample - - if config["input_format"] == "BAM": - return samples[sample]["raw_files"] - else: - return os.path.join(rbc_outdir, sample, sample + ".unmapped.bam") - - -rule ubam_to_fq: - """ - extract reads from bam into FASTQ format for alignment - """ - input: - get_optional_bam_inputs, - output: - os.path.join(outdir, "fastqs", "{sample}.fastq.gz"), - log: - os.path.join(outdir, "logs", "ubam_to_fq", "{sample}"), - shell: - """ - samtools fastq -T "*" {input} | gzip > {output} - """ - - -rule bwa_idx: - input: - config["fasta"], - output: - multiext(config["fasta"], ".amb", ".ann", ".bwt", ".pac", ".sa"), - log: - os.path.join(outdir, "logs", "bwa_idx", "log"), - shell: - """ - bwa index {input} - """ - - -rule bwa_align: - """ - align reads to tRNA references with bwa mem - """ - input: - reads=rules.ubam_to_fq.output, - idx=rules.bwa_idx.output, - output: - bam=os.path.join(outdir, "bams", "{sample}", "{sample}.bwa.unfiltered.bam"), - bai=os.path.join(outdir, "bams", "{sample}", "{sample}.bwa.unfiltered.bam.bai"), - params: - index=config["fasta"], - bwa_opts=config["opts"]["bwa"], - log: - os.path.join(outdir, "logs", "bwa", "{sample}"), - threads: 12 - shell: - """ - bwa mem -C -t {threads} {params.bwa_opts} {params.index} {input.reads} \ - | samtools view -F 4 -h \ - | awk '($1 ~ /^@/ || $4 <= 25)' \ - | samtools view -Sb - \ - | samtools sort -o {output.bam} - - samtools index {output.bam} - """ - - -rule cca_classify: - """ - run remora trained model to classify charged and uncharged reads - """ - input: - pod5=rules.merge_pods.output, - bam=rules.bwa_align.output.bam, - output: - mod_bam=os.path.join(outdir, "mod_bams", "{sample}_mod.bam"), - mod_bam_bai=os.path.join(outdir, 
"mod_bams", "{sample}_mod.bam.bai"), - txt=os.path.join(outdir, "mod_bams", "{sample}.txt"), - temp_sorted_bam=temp(os.path.join(outdir, "mod_bams", "{sample}_mod.bam.tmp")), - log: - os.path.join(outdir, "logs", "cca_classify", "{sample}"), - params: - model=config["remora_cca_classifier"], - shell: - """ - if [[ "${{CUDA_VISIBLE_DEVICES:-}}" ]]; then - echo "CUDA_VISIBLE_DEVICES $CUDA_VISIBLE_DEVICES" - export CUDA_VISIBLE_DEVICES - fi - - remora infer from_pod5_and_bam {input.pod5} {input.bam} \ - --model {params.model} \ - --out-bam {output.mod_bam} \ - --log-filename {output.txt} \ - --reference-anchored \ - --device 0 - - # sort the result - samtools sort {output.mod_bam} > {output.temp_sorted_bam} - cp {output.temp_sorted_bam} {output.mod_bam} - - samtools index {output.mod_bam} - """ - - -rule transfer_bam_tags: - """ - creates final bam with classified reads MM and ML tags and table with charging probability per read - """ - input: - source_bam=rules.cca_classify.output.mod_bam, - target_bam=rules.bwa_align.output.bam, - output: - classified_bam=os.path.join(outdir, "classified_bams", "{sample}.bam"), - classified_bam_bai=os.path.join(outdir, "classified_bams", "{sample}.bam.bai"), - log: - os.path.join(outdir, "logs", "transfer_bam_tags", "{sample}"), - params: - src=SCRIPT_DIR, - shell: - """ - python {params.src}/transfer_tags.py \ - -s {input.source_bam} \ - -t {input.target_bam} \ - -o {output.classified_bam} - - samtools index {output.classified_bam} - """ - - -rule get_cca_trna: - """ - extract and report charing probability (ML tag) per read - """ - input: - bam=rules.transfer_bam_tags.output.classified_bam, - output: - charging_tab=os.path.join( - outdir, "tables", "{sample}", "{sample}.charging_prob.tsv.gz" - ), - log: - os.path.join(outdir, "logs", "get_cca_trna", "{sample}"), - params: - src=SCRIPT_DIR, - shell: - """ - python {params.src}/get_charging_table.py \ - {input.bam} \ - {output.charging_tab} - """ - - -rule get_cca_trna_cpm: - """ - 
calculate cpm for cca classified trnas - """ - input: - charging_tab=rules.get_cca_trna.output.charging_tab, - output: - cpm=os.path.join(outdir, "tables", "{sample}", "{sample}.charging.cpm.tsv.gz"), - log: - os.path.join(outdir, "logs", "cca_trna_cpm", "{sample}"), - params: - src=SCRIPT_DIR, - # XXX move `ml_thresh` to config file - ml_thresh=200, - shell: - """ - python {params.src}/get_trna_charging_cpm.py \ - --input {input.charging_tab} \ - --output {output.cpm} \ - --ml-threshold {params.ml_thresh} - """ - - -rule bcerror: - """ - extract base calling error metrics to tsv file - """ - input: - bam=rules.transfer_bam_tags.output.classified_bam, - bai=rules.transfer_bam_tags.output.classified_bam_bai, - output: - tsv=os.path.join(outdir, "tables", "{sample}", "{sample}.bcerror.tsv.gz"), - log: - os.path.join(outdir, "logs", "bcerror", "{sample}.bwa"), - params: - src=SCRIPT_DIR, - fa=config["fasta"], - shell: - """ - python {params.src}/get_bcerror_freqs.py \ - {input.bam} \ - {params.fa} \ - {output.tsv} - """ - - -rule align_stats: - """ - extract alignment stats - """ - input: - unmapped=get_optional_bam_inputs, - aligned=rules.bwa_align.output.bam, - classified=rules.transfer_bam_tags.output.classified_bam, - output: - tsv=os.path.join(outdir, "tables", "{sample}", "{sample}.align_stats.tsv.gz"), - log: - os.path.join(outdir, "logs", "stats", "{sample}.align_stats"), - params: - src=SCRIPT_DIR, - shell: - """ - python {params.src}/get_align_stats.py \ - -o {output.tsv} \ - -a unmapped aligned classified \ - -i {wildcards.sample} \ - -b {input.unmapped} \ - {input.aligned} \ - {input.classified} - """ - - -rule bam_to_coverage: - input: - bam=rules.transfer_bam_tags.output.classified_bam, - bai=rules.transfer_bam_tags.output.classified_bam_bai, - output: - counts_tmp=temp( - os.path.join(outdir, "tables", "{sample}", "{sample}.counts.bg") - ), - cpm_tmp=temp(os.path.join(outdir, "tables", "{sample}", "{sample}.cpm.bg")), - counts=protected( - 
os.path.join(outdir, "tables", "{sample}", "{sample}.counts.bg.gz") - ), - cpm=protected(os.path.join(outdir, "tables", "{sample}", "{sample}.cpm.bg.gz")), - params: - bg_opts=config["opts"]["coverage"], - log: - os.path.join(outdir, "logs", "bg", "{sample}.txt"), - threads: 4 - shell: - """ - bamCoverage \ - -b {input.bam} \ - -o {output.cpm_tmp} \ - --normalizeUsing CPM \ - --outFileFormat bedgraph \ - -bs 1 \ - -p {threads} \ - {params.bg_opts} - - bamCoverage \ - -b {input.bam} \ - -o {output.counts_tmp} \ - --outFileFormat bedgraph \ - -bs 1 \ - -p {threads} \ - {params.bg_opts} - - gzip -c {output.counts_tmp} > {output.counts} - gzip -c {output.cpm_tmp} > {output.cpm} - """ - - -rule remora_signal_stats: - """ - run remora to get signal stats - """ - input: - bam=rules.transfer_bam_tags.output.classified_bam, - bai=rules.transfer_bam_tags.output.classified_bam_bai, - pod5=rules.merge_pods.output, - output: - tsv=os.path.join(outdir, "tables", "{sample}", "{sample}.remora.tsv.gz"), - log: - os.path.join(outdir, "logs", "remora", "{sample}"), - params: - src=SCRIPT_DIR, - kmer=config["remora_kmer_table"], - opts=config["opts"]["remora"], - shell: - """ - python {params.src}/extract_signal_metrics.py \ - --pod5_dir {input.pod5} \ - --bam {input.bam} \ - --kmer {params.kmer} \ - --sample_name {wildcards.sample} \ - {params.opts} \ - | gzip -c \ - > {output.tsv} - """ diff --git a/workflow/rules/common.smk b/workflow/rules/common.smk index d9a2f68..ee2faa8 100644 --- a/workflow/rules/common.smk +++ b/workflow/rules/common.smk @@ -58,27 +58,16 @@ def report_metadata(): def find_raw_inputs(sample_dict): """ - parse through directories listed in samples.tsv and identify fast5 or pod5 files to process + parse through directories listed in samples.tsv and identify pod5 files to process store input files and uuid base file names in dictionary for each sample """ POD5_DIRS = ["pod5_pass", "pod5_fail", "pod5"] - FAST5_DIRS = ["fast5_pass", "fast5_fail"] - fmt = 
config["input_format"] - - data_subdirs = [] - if fmt == "POD5": - data_subdirs = POD5_DIRS - ext = ".pod5" - elif fmt == "FAST5": - data_subdirs = FAST5_DIRS - ext = ".fast5" - else: - sys.exit("input_format config option must be either FAST5, or POD5") + ext = ".pod5" for sample, info in sample_dict.items(): raw_fls = [] for path in info["path"]: - for subdir in data_subdirs: + for subdir in POD5_DIRS: data_path = os.path.join(path, subdir, "*" + ext) fls = glob.glob(data_path) raw_fls += fls @@ -93,7 +82,6 @@ def find_raw_inputs(sample_dict): # set up global samples dictionary to be used throughout pipeline outdir = config["output_directory"] -rbc_outdir = os.path.join(outdir, "rbc_bams") samples = parse_samples(config["samples"]) samples = find_raw_inputs(samples) @@ -102,45 +90,85 @@ samples = find_raw_inputs(samples) # Define target files for rule all def pipeline_outputs(): outs = expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.charging_prob.tsv.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.charging_prob.tsv.gz" + ), sample=samples.keys(), ) outs += expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.charging.cpm.tsv.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.charging.cpm.tsv.gz" + ), sample=samples.keys(), ) outs += expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.bcerror.tsv.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.bcerror.tsv.gz" + ), sample=samples.keys(), ) outs += expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.align_stats.tsv.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.align_stats.tsv.gz" + ), sample=samples.keys(), ) outs += expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.{values}.bg.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.{values}.bg.gz" + ), sample=samples.keys(), values=["cpm", "counts"], ) + # modkit outputs + outs += 
expand( + os.path.join( + outdir, + "summary", + "modkit", + "{sample}", + "{sample}.pileup.bed.gz", + ), + sample=samples.keys(), + ) + + outs += expand( + os.path.join( + outdir, "summary", "modkit", "{sample}", "{sample}.mod_calls.tsv.gz" + ), + sample=samples.keys(), + ) + + outs += expand( + os.path.join( + outdir, "summary", "modkit", "{sample}", "{sample}.mod_full.tsv.gz" + ), + sample=samples.keys(), + ) + + outs += [f"{DORADO_DIR}/bin/dorado"] + + outs += [os.path.join("resources/models", config["dorado_model"])] + + outs += [f"{MODKIT_DIR}/bin/modkit"] + if ( "remora_kmer_table" in config and config["remora_kmer_table"] != "" and config["remora_kmer_table"] is not None ): outs += expand( - os.path.join(outdir, "tables", "{sample}", "{sample}.remora.tsv.gz"), + os.path.join( + outdir, "summary", "tables", "{sample}", "{sample}.remora.tsv.gz" + ), sample=samples.keys(), ) - # if "trna_table" in config and config["trna_table"] != "" and config["trna_table"] is not None: - # outs += expand(os.path.join(outdir, "tables", "{sample}", "{sample}.charging_status.tsv"), - # sample = samples.keys()) - return outs diff --git a/workflow/rules/tool_setup.smk b/workflow/rules/tool_setup.smk new file mode 100644 index 0000000..f4e7a4e --- /dev/null +++ b/workflow/rules/tool_setup.smk @@ -0,0 +1,124 @@ +# Determine OS-specific and architecture-specific Dorado URL +import platform + +system = platform.system().lower() +machine = platform.machine().lower() + +# Detect architecture +if machine in ["arm64", "aarch64"]: + arch = "arm64" +elif machine in ["x86_64", "amd64", "x64"]: + arch = "x64" +else: + raise ValueError(f"Unsupported architecture: {machine}") + +# Construct OS-specific suffix and file extension +if system == "linux": + os_suffix = f"linux-{arch}" + file_ext = "tar.gz" +elif system == "darwin": + os_suffix = f"osx-{arch}" + file_ext = "zip" +else: + raise ValueError(f"Unsupported operating system: {system}") + +DORADO_URL = 
f"https://cdn.oxfordnanoportal.com/software/analysis/dorado-{DORADO_VERSION}-{os_suffix}.{file_ext}" + + +rule setup_dorado: + output: + dorado_bin=f"{DORADO_DIR}/bin/dorado", + params: + dorado_url=DORADO_URL, + dorado_dir=DORADO_DIR, + file_ext=file_ext, + shell: + """ + # Create directory structure if it doesn't exist + mkdir -p {params.dorado_dir} + + # Download Dorado tarball + curl -L -o dorado.{params.file_ext} {params.dorado_url} + + # Extract based on file type + if [ "{params.file_ext}" = "tar.gz" ]; then + tar -xzf dorado.{params.file_ext} -C {params.dorado_dir} --strip-components=1 + elif [ "{params.file_ext}" = "zip" ]; then + unzip -o dorado.{params.file_ext} -d {params.dorado_dir}_temp + # Find the extracted directory and move its contents + mv {params.dorado_dir}_temp/*/* {params.dorado_dir}/ + rm -rf {params.dorado_dir}_temp + fi + + # Remove the tarball + rm dorado.{params.file_ext} + + # Make the binary executable (just to be sure) + chmod +x {output.dorado_bin} + """ + + +rule dorado_model: + """ + Download dorado base-calling model + """ + output: + directory(os.path.join("resources", "models", config["dorado_model"])), + log: + os.path.join(outdir, "logs", "dorado_model.log"), + params: + model=config["dorado_model"], + model_dir=os.path.join("resources", "models"), + shell: + """ + mkdir -p {params.model_dir} + + # Run Dorado download + dorado download --model {params.model} --models-directory {params.model_dir} > {log} 2>&1 + + # Create a marker file if needed + if [ ! 
-d "{output}" ]; then + echo "Error: Model directory not created: {output}" >> {log} + exit 1 + fi + + # List the contents of the model directory to help with debugging + echo "Contents of {output}:" >> {log} + ls -la {output} >> {log} 2>&1 + + # Touch a marker file in case the download doesn't create any files + # This ensures the output directory is not empty + touch {output}/.downloaded + """ + + +rule setup_modkit: + """ + Install modkit from pre-built binary or source depending on the system + """ + output: + modkit_bin=f"{MODKIT_DIR}/bin/modkit", + params: + modkit_dir=MODKIT_DIR, + modkit_version=MODKIT_VERSION, + modkit_repository="https://github.com/nanoporetech/modkit", + linux_binary_url=f"https://github.com/nanoporetech/modkit/releases/download/v{MODKIT_VERSION}/modkit_v{MODKIT_VERSION}_u16_x86_64.tar.gz", + log: + os.path.join(outdir, "logs", "setup_modkit.log"), + shell: + """ + if ! command -v rustc &> /dev/null; then + echo "Rust not found, installing..." >> {log} 2>&1 + curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y >> {log} 2>&1 + source "$HOME/.cargo/env" + fi + + export PATH="$HOME/.cargo/bin:$PATH" + + export CARGO_NET_GIT_FETCH_WITH_CLI=true + cargo install --git {params.modkit_repository} \\ + --tag v{params.modkit_version} \\ + --root {params.modkit_dir} \\ + --jobs 4 >> {log} 2>&1 + + """ diff --git a/workflow/scripts/extract_signal_metrics.py b/workflow/scripts/extract_signal_metrics.py index 4f8669b..07b99a5 100644 --- a/workflow/scripts/extract_signal_metrics.py +++ b/workflow/scripts/extract_signal_metrics.py @@ -6,9 +6,8 @@ import pod5 import pysam import numpy as np -import polars as pl -import remora +import logging from remora import io, refine_signal_map, util ####### @@ -19,8 +18,7 @@ ####### # silence Remora DEBUG messages -# logging.getLogger("Remora").setLevel(logging.DEBUG) - +logging.getLogger("Remora").setLevel(logging.INFO) def get_metric_data( bam_fh, @@ -36,7 +34,6 @@ def get_metric_data( 
signal_norm_method="norm", scale_iters_opt=0, ): - s_name = sample_name ref_chr = chrom ref_strand = strand @@ -69,7 +66,6 @@ def get_metric_data( def iter_metrics(sample_name, samples_metrics, all_bam_reads, ref_reg): - for metrics, reads in zip(samples_metrics, all_bam_reads): n_reads, n_pos = next(iter(metrics.values())).shape @@ -185,11 +181,11 @@ def main(args): parser = argparse.ArgumentParser( description=""" Extract signal metrics using the Remora API. Output will be generated - across all regions with read coverage, restricting to reads mapped on - the positive strand, as it is expected that the reads are aligned + across all regions with read coverage, restricting to reads mapped on + the positive strand, as it is expected that the reads are aligned against transcripts rather than a genome reference. Output can be restricted to a specific region using the --region option, or to a set of regions - using the --bed option. + using the --bed option. The output is TSV text with the following columns: Sample\tContig\tReference_Position\tRead_id\tMetric1\tMetric2\tMetric3\t... The Reference_Position is 1-based. @@ -215,7 +211,7 @@ def main(args): help=""" Window size used for signal extraction. Regions will be chunked into windows of this length prior to processing. Use this option if you want to extract data from large regions (e.g. regions >> than the read length). - Without this option the entire region will be processed at once, which for e.g. chromosomes or long RNAs + Without this option the entire region will be processed at once, which for e.g. chromosomes or long RNAs would use excessive memory. Set this to the median of the read lengths in the dataset. Setting to 0 disables this option, which is the default. 
Default: 0 """, diff --git a/workflow/scripts/filter_reads.py b/workflow/scripts/filter_reads.py index 391a8e7..d4f53e1 100644 --- a/workflow/scripts/filter_reads.py +++ b/workflow/scripts/filter_reads.py @@ -73,7 +73,6 @@ def __str__(self): def count_adapter_edits(aln, ref_start, ref_end): - # use get_aligned_pairs to get the aligned positions # count number of mismatches, deletions and insertions # in the adapter region @@ -97,7 +96,6 @@ def count_adapter_edits(aln, ref_start, ref_end): in_adapter = False for query_pos, ref_pos, seq in aligned_pairs: - if ref_pos is not None and ref_pos >= ref_end: break @@ -122,7 +120,6 @@ def count_adapter_edits(aln, ref_start, ref_end): def compatible_secondary_alignments(aln, trna_ref_dict, isodecoder_ref_dict): - # read doesn't have multiple alignments (at least not in the XA tag) # likely due to a larger number of secondary alignments than -h setting in bwa mem if not aln.has_tag("XA"): @@ -150,7 +147,6 @@ def compatible_secondary_alignments(aln, trna_ref_dict, isodecoder_ref_dict): def filter_bam(args): - five_p_truncation = args.five_p_truncation p3_truncation_max = args.three_p_truncation min_mapq = args.min_mapq @@ -264,7 +260,6 @@ def filter_bam(args): if __name__ == "__main__": - parser = argparse.ArgumentParser( description=""" Filter reads in a BAM file according to the indicated criteria. diff --git a/workflow/scripts/get_align_stats.py b/workflow/scripts/get_align_stats.py index afd80b5..cca26c4 100644 --- a/workflow/scripts/get_align_stats.py +++ b/workflow/scripts/get_align_stats.py @@ -185,7 +185,6 @@ def get_read_stats(fn, flag=None, sample_id=None, sample_info=None): if __name__ == "__main__": - parser = argparse.ArgumentParser( description=""" Collect read and alignment statistics from unmapped, mapped, and further processed BAM files. 
diff --git a/workflow/scripts/get_bcerror_freqs.py b/workflow/scripts/get_bcerror_freqs.py index d4e2464..1de4bae 100644 --- a/workflow/scripts/get_bcerror_freqs.py +++ b/workflow/scripts/get_bcerror_freqs.py @@ -1,6 +1,6 @@ import argparse import pysam -import polars as pl +import pandas as pd import gzip """ @@ -158,23 +158,18 @@ def calculate_error_frequencies(bam_file, fasta_file): samfile.close() faidx.close() - return pl.DataFrame(error_data) + return pd.DataFrame(error_data) if __name__ == "__main__": parser = argparse.ArgumentParser( description="Calculate Basecalling Error Frequencies" ) - + parser.add_argument("bam_file", help="Path to the BAM file") parser.add_argument("fasta_file", help="Path to the FASTA file") parser.add_argument("output_tsv", help="Path for the output TSV file") args = parser.parse_args() error_freq_df = calculate_error_frequencies(args.bam_file, args.fasta_file) - - if args.output_tsv.endswith(".gz"): - with gzip.open(args.output_tsv, "wt") as file_out: - error_freq_df.write_csv(file_out, separator = "\t") - else: - error_freq_df.write_csv(args.output_tsv, separator = "\t") \ No newline at end of file + error_freq_df.to_csv(args.output_tsv, sep="\t", index=False, compression="infer") diff --git a/workflow/scripts/get_charging_summary.py b/workflow/scripts/get_charging_summary.py index 2a0d980..9000bbe 100644 --- a/workflow/scripts/get_charging_summary.py +++ b/workflow/scripts/get_charging_summary.py @@ -4,7 +4,6 @@ def get_charging_stats(fn): - fo = pysam.AlignmentFile(fn) read_counts = {} for read in fo: @@ -18,7 +17,6 @@ def get_charging_stats(fn): if __name__ == "__main__": - parser = argparse.ArgumentParser( description=""" Generate a table of read counts and percent aminoacylation.
diff --git a/workflow/scripts/get_charging_table.py b/workflow/scripts/get_charging_table.py index 632e8aa..bb997f1 100644 --- a/workflow/scripts/get_charging_table.py +++ b/workflow/scripts/get_charging_table.py @@ -11,7 +11,6 @@ def extract_tag(bam_file, output_tsv, tag): - open_func = gzip.open if output_tsv.endswith(".gz") else open mode = "wt" if output_tsv.endswith(".gz") else "w" @@ -28,7 +27,7 @@ def extract_tag(bam_file, output_tsv, tag): tag_array = dict(read.tags).get(tag, None) # XXX: handle case where there are more than 1 tag value - # not clear why this is, but we skip for now as it's a small + # not clear why this is, but we skip for now as it's a small # number of reads affected if len(tag_array) > 1: continue diff --git a/workflow/scripts/get_trna_charging_cpm.py b/workflow/scripts/get_trna_charging_cpm.py index 91ab892..295f336 100644 --- a/workflow/scripts/get_trna_charging_cpm.py +++ b/workflow/scripts/get_trna_charging_cpm.py @@ -1,15 +1,21 @@ #! /usr/bin/env python """ -and collapses the output of running a Remora CCA model and extracting +Collapses the output of running a Remora CCA model and extracting per-read information on charging likelihood into an ML tag into per-isodecoder counts (and CPM-normalized counts) of charged and uncharged tRNAs as determined by the model with a ML >= 200 threshold. -tRNA-AA-anticodon-family-species-chargingref are all preserved from BWA alignment, +TODO: we should compute this directly from the final BAM file, which + contains the same charging information in the `CL` tag; no need to write out + the intermediate per-read charging info. Also, `per_read_charging()` + does not reflect what this script actually does. Should be `aggregate_trna_charging()` + or similar.
+ +tRNA-AA-anticodon-family-species-ref are all preserved from BWA alignment, and can be further collapsed as desired in downstream analysis -CPM normalization reflects counts per million reads that passed alingnment and +CPM normalization reflects counts per million reads that passed alignment and the filtering parameters for Remora classification; these are full length tRNA """ @@ -23,7 +29,7 @@ def per_read_charging(input, output, threshold): # Categorize tRNAs as charged or uncharged df["status"] = df["charging_likelihood"].apply( - lambda x: "charged" if x >= threshold else "uncharged" + lambda x: "counts_charged" if x >= threshold else "counts_uncharged" ) # Group by tRNA and status to get counts @@ -33,8 +39,8 @@ def per_read_charging(input, output, threshold): total_reads = len(df) # Normalize counts by CPM - count_data["charged_CPM"] = (count_data["charged"] / total_reads) * 1e6 - count_data["uncharged_CPM"] = (count_data["uncharged"] / total_reads) * 1e6 + count_data["cpm_charged"] = (count_data["counts_charged"] / total_reads) * 1e6 + count_data["cpm_uncharged"] = (count_data["counts_uncharged"] / total_reads) * 1e6 if output.endswith(".gz"): output_file = gzip.open(output, "wt") diff --git a/workflow/scripts/transfer_tags.py b/workflow/scripts/transfer_tags.py index 5fbdc42..fcd589c 100644 --- a/workflow/scripts/transfer_tags.py +++ b/workflow/scripts/transfer_tags.py @@ -1,30 +1,33 @@ #! /usr/bin/env python """ -Transfer ML/MM tags from one BAM file to another based on read IDs. -Output only primary alignments with transferred tags." +Transfer tags from one BAM file to another based on read IDs. + +Output only primary alignments with transferred tags. + +Use `--rename` to rename tags during transfer. 
""" -import pysam +from collections import defaultdict +from pysam import AlignmentFile + +def transfer_tags(tags, rename, source_bam, target_bam, output_bam): + renamed_tags = parse_tag_items(rename) -def transfer_tags(source_bam, target_bam, output_bam): - - with( - pysam.AlignmentFile(source_bam, "rb") as source, - pysam.AlignmentFile(target_bam, "rb") as target, - pysam.AlignmentFile(output_bam, "wb", template=target) as output + with ( + AlignmentFile(source_bam, "rb") as source, + AlignmentFile(target_bam, "rb") as target, + AlignmentFile(output_bam, "wb", template=target) as output, ): + # Store tags from the source BAM based on read ID + source_tags = defaultdict(dict) - # Create a dictionary to store ML and MM tags from the source BAM based on read IDs - source_tags = {} for read in source: if not read.is_unmapped: - # Extract ML and MM tags - ml_tag = read.get_tag("ML") if read.has_tag("ML") else None - mm_tag = read.get_tag("MM") if read.has_tag("MM") else None - if ml_tag or mm_tag: - source_tags[read.query_name] = (ml_tag, mm_tag) + for tag in tags: + if read.has_tag(tag): + source_tags[read.query_name][tag] = read.get_tag(tag) # Transfer tags and write only primary alignments with transferred tags for read in target: @@ -33,33 +36,50 @@ def transfer_tags(source_bam, target_bam, output_bam): continue if read.query_name in source_tags: - ml_tag, mm_tag = source_tags[read.query_name] - # Add ML and MM tags to the target read if they exist - if mm_tag is not None: - read.set_tag("MM", mm_tag) - if ml_tag is not None: - read.set_tag("ML", ml_tag) - # Write the read to the output BAM only if tags were transferred + for tag, tag_val in source_tags[read.query_name].items(): + if tag in renamed_tags: + read.set_tag(renamed_tags[tag], tag_val) + else: + read.set_tag(tag, tag_val) + + # Write read only if tags were transferred output.write(read) +def parse_tag_items(rename): + ret = {} + for item in rename: + key, val = map(str.strip, item.split("=")) + ret[key] 
= val + return ret + + if __name__ == "__main__": import argparse parser = argparse.ArgumentParser( - description="Transfer ML/MM tags from one BAM file to another based on read IDs, and output only primary alignments with transferred tags." + description="Transfer tags from one BAM file to another based on read IDs, and output only primary alignments with transferred tags." ) parser.add_argument( - "-s", "--source", required=True, help="Source BAM file (with ML/MM tags)" + "-t", "--tags", metavar="MM", nargs="+", required=True, help="Tags to transfer" ) + + parser.add_argument( + "--rename", + nargs="+", + metavar="OLD=NEW", + help="Tags to rename during transfer", + ) + + parser.add_argument("--source", required=True, help="Source BAM file (with tags)") + parser.add_argument( - "-t", "--target", required=True, help="Target BAM file (without ML/MM tags)" + "--target", required=True, help="Target BAM file (without tags)" ) parser.add_argument( - "-o", "--output", required=True, help="Output BAM file with transferred tags" + "--output", required=True, help="Output BAM file with transferred tags" ) args = parser.parse_args() - # Call the function to transfer tags - transfer_tags(args.source, args.target, args.output) + transfer_tags(args.tags, args.rename or [], args.source, args.target, args.output) diff --git a/workflow/workflow_dag.png b/workflow/workflow_dag.png index 23c56fa..3c3e609 100644 Binary files a/workflow/workflow_dag.png and b/workflow/workflow_dag.png differ
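
Review note on `workflow/scripts/transfer_tags.py`: the `--rename OLD=NEW` handling could be made stricter. The following is a minimal sketch, not the code in this patch. It assumes an unset `--rename` should mean "no renames", and it relies on SAM tag names being exactly two characters. `rename_tags` is a hypothetical helper introduced only for illustration; it mirrors the rename-or-keep branch in `transfer_tags()` without needing pysam.

```python
def parse_tag_items(rename):
    """Parse OLD=NEW items into a {OLD: NEW} dict; None or [] yields {}."""
    ret = {}
    for item in rename or []:
        # partition() always returns three parts, so a missing "=" is detectable.
        old, sep, new = (part.strip() for part in item.partition("="))
        # SAM tag names are exactly two characters, so reject anything else early.
        if not sep or len(old) != 2 or len(new) != 2:
            raise ValueError(f"expected OLD=NEW with two-character tags, got {item!r}")
        ret[old] = new
    return ret


def rename_tags(tag_values, renamed_tags):
    """Hypothetical helper: copy tag_values, renaming keys that have a mapping."""
    return {renamed_tags.get(tag, tag): val for tag, val in tag_values.items()}
```

With this variant, a malformed item such as `ML` or `ML=CLX` fails before any BAM file is opened, rather than partway through writing the output.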