diff --git a/.DS_Store b/.DS_Store
index 687c27a..a350e65 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/minota/.DS_Store b/bioinformatics/.DS_Store
similarity index 80%
rename from minota/.DS_Store
rename to bioinformatics/.DS_Store
index 1ee8e50..c28b493 100644
Binary files a/minota/.DS_Store and b/bioinformatics/.DS_Store differ
diff --git a/bioinformatics/analysis/.DS_Store b/bioinformatics/analysis/.DS_Store
new file mode 100644
index 0000000..d70db8a
Binary files /dev/null and b/bioinformatics/analysis/.DS_Store differ
diff --git a/minota/markdown/minota-quality-control.md b/bioinformatics/analysis/intro_to_quality_control.md
similarity index 93%
rename from minota/markdown/minota-quality-control.md
rename to bioinformatics/analysis/intro_to_quality_control.md
index f7cd684..ede2c34 100644
--- a/minota/markdown/minota-quality-control.md
+++ b/bioinformatics/analysis/intro_to_quality_control.md
@@ -1,6 +1,7 @@
-# MINOTA Workshop: Introduction to Quality Control
+# Introduction to Quality Control
+
+In this tutorial, we'll be covering the critical process of Quality Control.
-Welcome to the MDIBL MINOTA Workshop. In this portion of the course, we'll be covering the critical process of Quality Control.
First, a review of some exploratory tools for both pre- and post-transcriptome analysis, and how they can provide you with an overview of your input data and output results without having to sift through pages of text log files (as fun as that sounds).
Next, we'll take a look at a couple of software packages that do the heavy lifting and are chiefly responsible for the trimming side of QC: Trimmomatic, in an integrated context within Trinity, and Trim Galore!, a wonderful piece of software built by the developers of FastQC (which, coincidentally, makes up half of it!).
@@ -97,7 +98,7 @@ First, you're going to want to fire up your favorite terminal, and ssh into your
It'll look something like this:
-
+
### QC Workflow Guided Run
@@ -121,7 +122,7 @@ Delete the text after `path:` under `seqfile:`.
Fill the empty `path:` with the file path we `ls`'d earlier:
-
+
**If you botch or delete something, and you don't remember what it was: in `nano`, use `control + x` on macOS / `ctrl + x` on Windows, followed by `n` and `enter`.**
@@ -131,7 +132,7 @@ To save your changes, use `control + o` and `enter`.
To execute the workflow in CWL, simply type the following on the command line:
-
+
Depending on the file you chose, it should process fairly quickly, depositing six named files in your directory: an HTML file, a zipped folder containing the raw contents of the initial FastQC report, a trimmed read file, a trimmed HTML file plus its zipped raw contents, and a trimming report text file.
@@ -143,7 +144,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Basic Statistics
-
+
* Summary statistics of your input file
* File type, encoding, total sequences, sequences flagged as poor quality, sequence length, and %GC
@@ -151,7 +152,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Base Sequence Quality
-
+
* Box-and-whisker plot made up of:
* central red line being the median value
@@ -166,7 +167,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Tile Sequence Quality
-
+
* Shows deviations from average quality for each tile
* Blue = positions where quality was at or above average for that base in the run
@@ -174,7 +175,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Sequence Quality Scores
-
+
* Shows you if a subset of your sequences contains universally low quality values
* If a large proportion of the sequences in a run have overall low quality, this may point to a systemic problem with the run itself (either in its entirety, or in a portion of it)
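The quality values in these plots are Phred scores, which FASTQ files store as single ASCII characters. A minimal sketch of decoding them, assuming the now-standard Phred+33 (Sanger / Illumina 1.8+) encoding:

```python
def phred33_scores(quality_string):
    # Phred+33: each ASCII character's code minus 33 is the Phred score Q
    return [ord(c) - 33 for c in quality_string]

def error_probability(q):
    # Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)
    return 10 ** (-q / 10)
```

So a read whose quality string is all `I` characters has Q40 at every base, i.e. a 1-in-10,000 chance of error per call.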
@@ -182,7 +183,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Base Sequence Content
-
+
* Plots the proportion of each normal DNA base (A, C, G, T) called at each position in the file
* There should be little to no difference between the bases, and the lines should run close to one another (the proportions should not be massively imbalanced)
@@ -196,7 +197,7 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Sequence GC Content
-
+
* Measures GC content across the entire length of every sequence and compares it to a normal-distribution model of GC content
* Sharp peaks in the measured distribution may indicate a contaminated library
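The per-read statistic being aggregated here is simple; a sketch of the GC fraction FastQC computes for each sequence before building the observed distribution:

```python
def gc_content(sequence):
    # Fraction of G and C bases across the full length of one sequence
    s = sequence.upper()
    return (s.count("G") + s.count("C")) / len(s)
```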
@@ -204,21 +205,21 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Per Base N Content
-
+
* When a sequencer is unable to make a base call, an N is put in place of the normal base
* If a significant portion of per-base calls are Ns, it suggests that the pipeline was not able to make valid base calls
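Counting those N calls position by position is straightforward; a small sketch, assuming all reads have equal length:

```python
def n_fraction_per_position(reads):
    # Fraction of N calls at each base position across equal-length reads
    length = len(reads[0])
    totals = [0] * length
    for read in reads:
        for i, base in enumerate(read.upper()):
            if base == "N":
                totals[i] += 1
    return [count / len(reads) for count in totals]
```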
#### Sequence Length Distribution
-
+
* Graphs the distribution of fragment sizes in the sequence file analyzed
* Some sequencers generate fragments of uniform length; even so, trimming will break that uniformity and introduce variation in length
#### Sequence Duplication Levels
-
+
* When working with a diverse library, most sequences should occur only once in the final set
* Low levels of duplication can point to a high level of coverage of the target sequence
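The duplication plot is built from a tally like the following: count how often each sequence occurs, then count how many distinct sequences sit at each duplication level. A minimal sketch:

```python
from collections import Counter

def duplication_levels(sequences):
    # Map each duplication level (seen once, twice, ...) to the number
    # of distinct sequences observed at that level
    per_sequence_counts = Counter(sequences)
    return Counter(per_sequence_counts.values())
```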
@@ -235,14 +236,14 @@ For those of you may not have worked with FastQC reports before, I'll be going o
#### Overrepresented Sequences
-
+
* Lists all sequences making up more than 0.1% of the total; to conserve memory, only sequences that appear within the first 100,000 reads are tracked
* For every overrepresented sequence, FastQC looks for matches in a database of common contaminants, reporting best found hits
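A simplified sketch of that bookkeeping (FastQC's actual implementation differs in detail, e.g. it keeps tracking counts for the whole file): only sequences seen within the first `limit` reads are candidates, and any candidate exceeding the fraction `threshold` is reported.

```python
from collections import Counter

def overrepresented(sequences, threshold=0.001, limit=100_000):
    # Track only sequences appearing among the first `limit` reads,
    # then report any whose share of the total exceeds `threshold`
    tracked = Counter(sequences[:limit])
    total = len(sequences)
    return {seq: n / total for seq, n in tracked.items() if n / total > threshold}
```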
### Adapter Content
-
+
* Picks up whether your library contains a significant amount of adapter sequence, and whether you need to conduct trimming.
* Plot shows cumulative percentage count of the proportion of the library that has seen each adapter sequence at each position
diff --git a/minota/images/quality-control/.DS_Store b/bioinformatics/analysis/intro_to_quality_control_img/.DS_Store
similarity index 100%
rename from minota/images/quality-control/.DS_Store
rename to bioinformatics/analysis/intro_to_quality_control_img/.DS_Store
diff --git a/minota/images/quality-control/adp_con.png b/bioinformatics/analysis/intro_to_quality_control_img/adp_con.png
similarity index 100%
rename from minota/images/quality-control/adp_con.png
rename to bioinformatics/analysis/intro_to_quality_control_img/adp_con.png
diff --git a/minota/images/quality-control/basic_stats.png b/bioinformatics/analysis/intro_to_quality_control_img/basic_stats.png
similarity index 100%
rename from minota/images/quality-control/basic_stats.png
rename to bioinformatics/analysis/intro_to_quality_control_img/basic_stats.png
diff --git a/minota/images/quality-control/ovr_rep_seq.png b/bioinformatics/analysis/intro_to_quality_control_img/ovr_rep_seq.png
similarity index 100%
rename from minota/images/quality-control/ovr_rep_seq.png
rename to bioinformatics/analysis/intro_to_quality_control_img/ovr_rep_seq.png
diff --git a/minota/images/quality-control/per_base_N_con.png b/bioinformatics/analysis/intro_to_quality_control_img/per_base_N_con.png
similarity index 100%
rename from minota/images/quality-control/per_base_N_con.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_base_N_con.png
diff --git a/minota/images/quality-control/per_base_seq_con.png b/bioinformatics/analysis/intro_to_quality_control_img/per_base_seq_con.png
similarity index 100%
rename from minota/images/quality-control/per_base_seq_con.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_base_seq_con.png
diff --git a/minota/images/quality-control/per_base_seq_qual.png b/bioinformatics/analysis/intro_to_quality_control_img/per_base_seq_qual.png
similarity index 100%
rename from minota/images/quality-control/per_base_seq_qual.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_base_seq_qual.png
diff --git a/minota/images/quality-control/per_seq_gc_con.png b/bioinformatics/analysis/intro_to_quality_control_img/per_seq_gc_con.png
similarity index 100%
rename from minota/images/quality-control/per_seq_gc_con.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_seq_gc_con.png
diff --git a/minota/images/quality-control/per_seq_qual_score.png b/bioinformatics/analysis/intro_to_quality_control_img/per_seq_qual_score.png
similarity index 100%
rename from minota/images/quality-control/per_seq_qual_score.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_seq_qual_score.png
diff --git a/minota/images/quality-control/per_tile_seq_qual.png b/bioinformatics/analysis/intro_to_quality_control_img/per_tile_seq_qual.png
similarity index 100%
rename from minota/images/quality-control/per_tile_seq_qual.png
rename to bioinformatics/analysis/intro_to_quality_control_img/per_tile_seq_qual.png
diff --git a/minota/images/quality-control/qc_workflow_edit.png b/bioinformatics/analysis/intro_to_quality_control_img/qc_workflow_edit.png
similarity index 100%
rename from minota/images/quality-control/qc_workflow_edit.png
rename to bioinformatics/analysis/intro_to_quality_control_img/qc_workflow_edit.png
diff --git a/minota/images/quality-control/qc_workflow_run.png b/bioinformatics/analysis/intro_to_quality_control_img/qc_workflow_run.png
similarity index 100%
rename from minota/images/quality-control/qc_workflow_run.png
rename to bioinformatics/analysis/intro_to_quality_control_img/qc_workflow_run.png
diff --git a/minota/images/quality-control/seq_dup_lev.png b/bioinformatics/analysis/intro_to_quality_control_img/seq_dup_lev.png
similarity index 100%
rename from minota/images/quality-control/seq_dup_lev.png
rename to bioinformatics/analysis/intro_to_quality_control_img/seq_dup_lev.png
diff --git a/minota/images/quality-control/seq_len_dist.png b/bioinformatics/analysis/intro_to_quality_control_img/seq_len_dist.png
similarity index 100%
rename from minota/images/quality-control/seq_len_dist.png
rename to bioinformatics/analysis/intro_to_quality_control_img/seq_len_dist.png
diff --git a/minota/images/quality-control/ssh_1.png b/bioinformatics/analysis/intro_to_quality_control_img/ssh_1.png
similarity index 100%
rename from minota/images/quality-control/ssh_1.png
rename to bioinformatics/analysis/intro_to_quality_control_img/ssh_1.png
diff --git a/bioinformatics/online_resources/intro_to_databases.md b/bioinformatics/online_resources/intro_to_databases.md
new file mode 100644
index 0000000..111d4c0
--- /dev/null
+++ b/bioinformatics/online_resources/intro_to_databases.md
@@ -0,0 +1,32 @@
+---
+title: Introduction to Databases
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "January 20th"
+---
+
+# Introduction to Databases
+
+## Learning Objectives
+
+* Learn the differences between Primary and Secondary databases
+* Exposure to the wide range of databases available for exploration
+* Become familiar with standard use cases for a selection of sites covered
+
+## Summary
+
+Commonly, databases are characterized as either primary or secondary; this holds as true for bioinformatics as it does for other data-rich fields
+
+**Primary databases** consist of data that has been experimentally derived, with the results uploaded directly into the database by researchers
+
+* In our domain, for example, the archived information consists of content such as nucleotide or protein sequences, or macromolecular structures
+* Once assigned an accession number, the data stored within a primary database becomes static, and is designated a Record
+ * Ex: GenBank, ENA, GEO
+
+**Secondary databases** could be considered an "extension" of primary ones, since their content is derived from the analysis of primary data
+
+* Pull from multiple sources, such as other databases, and available scientific literature
+* These resources are incredibly complex, combining manual and computational analysis/interpretation, and are highly curated
+* Their primary purpose is to exist as vast repositories of reference material, with detailed data ranging from single genes, to complete and published experimental results
+ * Ex: Uniprot, Ensembl, InterPro
+
diff --git a/bioinformatics/online_resources/intro_to_ensembl.md b/bioinformatics/online_resources/intro_to_ensembl.md
new file mode 100644
index 0000000..5595ef8
--- /dev/null
+++ b/bioinformatics/online_resources/intro_to_ensembl.md
@@ -0,0 +1,113 @@
+---
+title: Introduction to Ensembl
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "January 24th"
+---
+
+# Introduction to Ensembl
+
+## Summary
+
+* Ensembl is a genome browser, acting as a vast repository of reference genomes and annotations for a wide range of organisms, including human, mouse, C. elegans, and zebrafish
+* Mostly dedicated to model organisms, but does contain resources for a number of non-model species
+* While Ensembl itself is primarily focused on vertebrates, Ensembl Genomes extends coverage to non-vertebrates, including plants, fungi, and bacteria
+
+Ensembl annotates a large swath of data onto its genome assemblies; the first type is Gene Models (builds)
+
+## Gene Models
+
+Comprised of:
+
+International Nucleotide Sequence databases (ENA, GenBank, DDBJ)
+* cDNAs
+* ESTs
+* RNAseq
+
+NCBI RefSeq
+* Manually annotated proteins and mRNAs
+
+Protein Sequence databases
+* Swiss-Prot
+
+Sequences from the above resources are aligned to the genome, and transcripts are clustered from the alignments based on overlapping coding sequences
+
+This forms Ensembl genes (the automated genome annotation pipeline)
+
+Ensembl genomes can be either automatically or manually annotated (HAVANA performs the manual annotation)
+* The combined set of genes is known as the GENCODE geneset
+
+In addition to gene annotation, other data types are added to genome, including variation data, comparative genomics, and regulatory features (which we'll touch on later)
+
+## Querying
+
+Choose the human genome build, and search for `tp53`
+
+* links on the left of the page show specific information related to the TP53 gene
+
+### Summary
+
+* The gene has 27 annotated transcripts, 312 orthologues, and 2 paralogues
+* `Show transcript table` gives us detailed information regarding the gene and its associated transcripts
+ * Transcript ID
+ * Biotype
+ * CCDS (Consensus Coding Sequence set)
+ * Uniprot Match - Link to Protein transcript entry
+
+#### Gene Track
+
+* Blue bar = Contigs (sequence of overlapping reads)
+ * Transcripts above the contig are on the forward strand; below it, they're on the reverse
+ * Boxes are exons; the lines connecting them are introns
+ * Filled boxes contain coding sequence; unfilled boxes represent untranslated regions
+ * Red = Ensembl Protein coding (annotated by Ensembl automated)
+ * Gold = merged Ensembl/Havana (annotated by Ensembl automated + Havana manual annotation)
+ * Blue = Processed Transcript
+* Regulation
+ * Dark Salmon = Promoter
+ * Light Salmon = Promoter Flank
+ * Pink = Transcription Factor Binding Site
+ * Cyan = CTCF
+
+* Selecting a Transcript
+ * Click box of choosing and select the transcript ID
+ * Can examine supporting evidence
+ * Protein Information (reference UniProt)
+
+* Region in detail
+ * Selection Location
+ * Top of the page is a chromosomal overview; the red box denotes the region of the chromosome that the other views on the page focus on
+ * Red box in Detail highlights a 1 Mb overview of TP53
+ * Scrolling further down shows most detailed location of TP53
+ * Tracks can be formatted, added, removed, etc
+ * Gear icon (configure) lets you add additional tracks
+
+### Comparative Genomics
+
+Allows you to compare the gene against multiple alignments, gene trees, orthologues, and paralogues
+* Gene orthologue - homologous genes that diverged when evolution gave rise to new species; they maintain a function similar to that of the precursor gene
+ * Originate from a speciation event
+* Gene paralogue - homologous genes that diverged within a species; the new copy takes on a new function
+ * Come into existence during gene duplication, where a copy of the gene acquires a mutation -> new gene with new function
+
+#### Alignments
+
+* Pairwise - meaning two sequences at a time
+* Multiple - more than two (attempt to align all sequences within a query set)
+
+You can choose among many sequences to align against
+* Examine the full map to see areas of similar sequence
+* Also compare high-quality assemblies against low-quality ones
+
+#### Gene Tree
+* Shows the relation of the gene between species, including homologues
+
+#### Orthologues
+* Lists the gene's orthologues
+
+#### Paralogues
+* Lists the gene's paralogues
+
diff --git a/intro_to_geo.md b/bioinformatics/online_resources/intro_to_geo.md
similarity index 87%
rename from intro_to_geo.md
rename to bioinformatics/online_resources/intro_to_geo.md
index 4ccf0d5..34b332d 100644
--- a/intro_to_geo.md
+++ b/bioinformatics/online_resources/intro_to_geo.md
@@ -6,7 +6,6 @@ organization: MDIBL Computational Core
date: "May 25th, 2020"
---
# Introduction to GEO
-
## Learning Objectives
* Become familiar with what GEO is used for, and how it can supplement research
@@ -34,7 +33,7 @@ date: "May 25th, 2020"
## Platform Overview
-
+
The GEO homepage is comprised of 4 components:
* Getting Started
@@ -64,11 +63,11 @@ The GEO homepage is comprised of 4 components:
* May reference many Samples that have been submitted by multiple submitters
* GEO accession number (GPLxxx)
-
+
#### Example Platform Record
-
+
### Sample
@@ -76,11 +75,11 @@ The GEO homepage is comprised of 4 components:
* May only reference one Platform, but can exist in multiple Series
* GEO accession number (GSMxxx)
-
+
#### Example Sample Record
-
+
### Series
@@ -89,11 +88,11 @@ The GEO homepage is comprised of 4 components:
* May contain tables describing extracted data, summary conclusions, and/or analyses
* GEO accession number (GSExxx)
-
+
#### Example Series Record
-
+
## Curated records
@@ -107,7 +106,7 @@ The GEO homepage is comprised of 4 components:
* DataSet records contain resources and tools for further analysis, including clustering utilities and multi-sample comparisons
* Because of a (massive) backlog in the generation of DataSets, not every Series has an accompanying DataSet record
-
+
## Searching GEO
@@ -120,12 +119,12 @@ GEO offers both `general` and `advanced` query functionality
* To General search, type content into the `Search` box on the GEO Datasets front page and hit enter
* While easy to use, a general search will often give you an overwhelming number of results
-
+
* To refine your query, you can use the `Advanced Search` button
* Selecting `Advanced Search` brings you to another GEO Datasets page
-
+
* This lists *everything* related to your search query. To refine to Datasets, choose `DataSets` from the `Entry type` column on the left side of the page
@@ -134,13 +133,13 @@ GEO offers both `general` and `advanced` query functionality
* Advanced search is a bit more involved, but still fairly easy to use
* To access the `Advanced Search` builder, select `Advanced` under the general search bar
-
+
* To build a search query, first you need to `Add terms to the query box`
* The dropdown menu gives you a large number of fields to choose from, which can be further refined by the terms entered
* There is also an autocomplete feature built in that helps avoid spelling mistakes, and expands functionality
-
+
* Refine your advanced search to only DataSets by choosing `DataSets` from the `Entry type` column on the left side of the page
@@ -151,7 +150,7 @@ GEO offers both `general` and `advanced` query functionality
* Underneath Accession number are quick links to related GEO Profiles, PubMed citation page, PMC free full-text articles, and tools for analysis
* Selecting the main title brings you to the specific DataSet Record page
-
+
#### GEO DataSet Records
@@ -161,7 +160,7 @@ GEO offers both `general` and `advanced` query functionality
* Under that are options to download various files containing additional DataSet information, experimental variable subsets, etc
* At the bottom are additional data analysis tools for finding genes, comparing sets of samples, generating heatmaps, and examining experimental design + value distributions.
-
+
### GEO Profile
@@ -170,7 +169,7 @@ GEO offers both `general` and `advanced` query functionality
* Assembled and sourced from GEO microarray data
* Queries based upon gene annotation / profile characteristics
-
+
#### GEO Profile Results
@@ -182,7 +181,7 @@ GEO offers both `general` and `advanced` query functionality
* Reporter: Original sequence reporter(s) taken from the Platform record supplied by submitter
* Experiment: the DataSet the profile comes from
-
+
#### GEO Profile Chart
@@ -194,7 +193,7 @@ GEO offers both `general` and `advanced` query functionality
* Because of this, the values should be considered arbitrary, and direct comparisons between different Datasets may not be accurate.
* The squares represent rank order of expression measurements, and indicate where the expression of that gene falls in comparison to all other genes on an array.
-
+
#### GEO Profile Sample Accession
@@ -208,17 +207,17 @@ GEO offers both `general` and `advanced` query functionality
* Also includes the Platform ID (GPL) and Series ID at the bottom
* You also have the option to download raw Sample specific CEL data, either through a web browser, or through FTP
-
+
## Downloading GEO Data with FTP
* NCBI provides you with a plethora of options to download GEO data. We'll touch on the FTP site briefly
-
+
* Almost all data queried or interacted with on NCBI can be downloaded directly from the FTP site, using an index location and command-line programs such as `curl` or `wget`
-
+
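As an illustration of how those index locations are laid out, a small helper that builds a Series download path from its accession (directory convention per GEO's FTP layout, where the last three digits of the accession are replaced by `nnn`; pass the result to `curl` or `wget`):

```python
def geo_series_ftp_path(accession):
    # GEO's FTP tree groups Series into directories named after the
    # accession with its last three digits replaced by "nnn",
    # e.g. GSE100012 lives under series/GSE100nnn/GSE100012/
    stub = accession[:-3] + "nnn"
    return f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{accession}/"
```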
## Contact
diff --git a/geo_images/.DS_Store b/bioinformatics/online_resources/intro_to_geo_img/.DS_Store
similarity index 100%
rename from geo_images/.DS_Store
rename to bioinformatics/online_resources/intro_to_geo_img/.DS_Store
diff --git a/geo_images/dataset_query.png b/bioinformatics/online_resources/intro_to_geo_img/dataset_query.png
similarity index 100%
rename from geo_images/dataset_query.png
rename to bioinformatics/online_resources/intro_to_geo_img/dataset_query.png
diff --git a/geo_images/dataset_record.png b/bioinformatics/online_resources/intro_to_geo_img/dataset_record.png
similarity index 100%
rename from geo_images/dataset_record.png
rename to bioinformatics/online_resources/intro_to_geo_img/dataset_record.png
diff --git a/geo_images/dataset_sample.png b/bioinformatics/online_resources/intro_to_geo_img/dataset_sample.png
similarity index 100%
rename from geo_images/dataset_sample.png
rename to bioinformatics/online_resources/intro_to_geo_img/dataset_sample.png
diff --git a/geo_images/dataset_start.png b/bioinformatics/online_resources/intro_to_geo_img/dataset_start.png
similarity index 100%
rename from geo_images/dataset_start.png
rename to bioinformatics/online_resources/intro_to_geo_img/dataset_start.png
diff --git a/geo_images/general_query.png b/bioinformatics/online_resources/intro_to_geo_img/general_query.png
similarity index 100%
rename from geo_images/general_query.png
rename to bioinformatics/online_resources/intro_to_geo_img/general_query.png
diff --git a/geo_images/geo_advancedquery.png b/bioinformatics/online_resources/intro_to_geo_img/geo_advancedquery.png
similarity index 100%
rename from geo_images/geo_advancedquery.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_advancedquery.png
diff --git a/geo_images/geo_dataset_allsearch.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_allsearch.png
similarity index 100%
rename from geo_images/geo_dataset_allsearch.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_allsearch.png
diff --git a/geo_images/geo_dataset_browser.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_browser.png
similarity index 100%
rename from geo_images/geo_dataset_browser.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_browser.png
diff --git a/geo_images/geo_dataset_querybuilder.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_querybuilder.png
similarity index 100%
rename from geo_images/geo_dataset_querybuilder.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_querybuilder.png
diff --git a/geo_images/geo_dataset_record.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_record.png
similarity index 100%
rename from geo_images/geo_dataset_record.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_record.png
diff --git a/geo_images/geo_dataset_results.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_results.png
similarity index 100%
rename from geo_images/geo_dataset_results.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_results.png
diff --git a/geo_images/geo_dataset_search.png b/bioinformatics/online_resources/intro_to_geo_img/geo_dataset_search.png
similarity index 100%
rename from geo_images/geo_dataset_search.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_dataset_search.png
diff --git a/geo_images/geo_download.png b/bioinformatics/online_resources/intro_to_geo_img/geo_download.png
similarity index 100%
rename from geo_images/geo_download.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_download.png
diff --git a/geo_images/geo_index.png b/bioinformatics/online_resources/intro_to_geo_img/geo_index.png
similarity index 100%
rename from geo_images/geo_index.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_index.png
diff --git a/geo_images/geo_platform_browser.png b/bioinformatics/online_resources/intro_to_geo_img/geo_platform_browser.png
similarity index 100%
rename from geo_images/geo_platform_browser.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_platform_browser.png
diff --git a/geo_images/geo_platform_ex.png b/bioinformatics/online_resources/intro_to_geo_img/geo_platform_ex.png
similarity index 100%
rename from geo_images/geo_platform_ex.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_platform_ex.png
diff --git a/geo_images/geo_profile_chart.png b/bioinformatics/online_resources/intro_to_geo_img/geo_profile_chart.png
similarity index 100%
rename from geo_images/geo_profile_chart.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_profile_chart.png
diff --git a/geo_images/geo_profile_record.png b/bioinformatics/online_resources/intro_to_geo_img/geo_profile_record.png
similarity index 100%
rename from geo_images/geo_profile_record.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_profile_record.png
diff --git a/geo_images/geo_profile_result.png b/bioinformatics/online_resources/intro_to_geo_img/geo_profile_result.png
similarity index 100%
rename from geo_images/geo_profile_result.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_profile_result.png
diff --git a/geo_images/geo_profile_sampleacc.png b/bioinformatics/online_resources/intro_to_geo_img/geo_profile_sampleacc.png
similarity index 100%
rename from geo_images/geo_profile_sampleacc.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_profile_sampleacc.png
diff --git a/geo_images/geo_sample_ex.png b/bioinformatics/online_resources/intro_to_geo_img/geo_sample_ex.png
similarity index 100%
rename from geo_images/geo_sample_ex.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_sample_ex.png
diff --git a/geo_images/geo_samples_browser.png b/bioinformatics/online_resources/intro_to_geo_img/geo_samples_browser.png
similarity index 100%
rename from geo_images/geo_samples_browser.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_samples_browser.png
diff --git a/geo_images/geo_series_browser.png b/bioinformatics/online_resources/intro_to_geo_img/geo_series_browser.png
similarity index 100%
rename from geo_images/geo_series_browser.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_series_browser.png
diff --git a/geo_images/geo_series_ex.png b/bioinformatics/online_resources/intro_to_geo_img/geo_series_ex.png
similarity index 100%
rename from geo_images/geo_series_ex.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_series_ex.png
diff --git a/geo_images/geo_start.png b/bioinformatics/online_resources/intro_to_geo_img/geo_start.png
similarity index 100%
rename from geo_images/geo_start.png
rename to bioinformatics/online_resources/intro_to_geo_img/geo_start.png
diff --git a/geo_images/profile_query.png b/bioinformatics/online_resources/intro_to_geo_img/profile_query.png
similarity index 100%
rename from geo_images/profile_query.png
rename to bioinformatics/online_resources/intro_to_geo_img/profile_query.png
diff --git a/geo_images/profile_record.png b/bioinformatics/online_resources/intro_to_geo_img/profile_record.png
similarity index 100%
rename from geo_images/profile_record.png
rename to bioinformatics/online_resources/intro_to_geo_img/profile_record.png
diff --git a/geo_images/profile_record_indepth.png b/bioinformatics/online_resources/intro_to_geo_img/profile_record_indepth.png
similarity index 100%
rename from geo_images/profile_record_indepth.png
rename to bioinformatics/online_resources/intro_to_geo_img/profile_record_indepth.png
diff --git a/geo_images/sample_accession.png b/bioinformatics/online_resources/intro_to_geo_img/sample_accession.png
similarity index 100%
rename from geo_images/sample_accession.png
rename to bioinformatics/online_resources/intro_to_geo_img/sample_accession.png
diff --git a/intro_to_ncbi.md b/bioinformatics/online_resources/intro_to_ncbi.md
similarity index 86%
rename from intro_to_ncbi.md
rename to bioinformatics/online_resources/intro_to_ncbi.md
index 8f09ce1..77c7444 100644
--- a/intro_to_ncbi.md
+++ b/bioinformatics/online_resources/intro_to_ncbi.md
@@ -1,11 +1,10 @@
---
title: Introduction to NCBI
-author: Nathaniel Maki
+author: "Nathaniel Maki"
organization: MDIBL Computational Core
-date: May 17th, 2020
+date: "May 23rd"
---
# Introduction to NCBI
-
## Learning Objectives
* Acquire a basic understanding of the NCBI website, its resources, and its most commonly used utilities/tools
@@ -23,11 +22,11 @@ date: May 17th, 2020
In addition to the above, NCBI also produces training materials to assist in orienting students, fledgling researchers, and seasoned PIs to the tools and resources they offer
-#### Training material can be found [here](https://www.ncbi.nlm.nih.gov/home/learn/)
+**Training material can be found [here](https://www.ncbi.nlm.nih.gov/home/learn/)**
## Logging in / Creating an NCBI Account
-#### Having an NCBI account greatly increases the flexibility of the tools and resources at your disposal.
+**Having an NCBI account greatly increases the flexibility of the tools and resources at your disposal.**
* This includes working with NCBI's programming API:
* With a verified account, your API calls are not throttled, and you're afforded more compute than if you were to remain anonymous
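The programming API referred to here is NCBI's E-utilities. A minimal sketch constructing an `esearch` query URL (endpoint and parameter names per the E-utilities documentation; `api_key` is the per-account key that lifts the rate throttling mentioned above):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, api_key=None):
    # esearch.fcgi returns the list of record IDs in `db` matching `term`
    params = {"db": db, "term": term}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"
```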
@@ -38,12 +37,10 @@ In addition the the above, NCBI also produces training materials to assist in or
## Platform Overview
-
+
The front page of NCBI acts as a hub: the main source of navigation is the Entrez search bar, with the database selection menu to the left and popular resources on the right
-On the right side of the page are popular resources within the NCBI site.
-
Content in the center consists of links to the main functions of the NCBI site
* Submit: NCBI Submission Portal, for uploading data to their online repositories
@@ -55,27 +52,27 @@ Content in the center is comprised of links to the main functions of the NCBI si
At the top left of the page, under the label `Resources`, are links to grouped tools and data repositories
-
+
Selecting BLAST specifically will take you to the tool's homepage:
-
+
For a comprehensive list of all resources, select `All Resources` under the `Resources` drop-down menu
-
+
Tutorials for many facets of NCBI are under the `How-To` menu
-
+
For guides on common tool use-cases, select `Data and Software`:
-
+
Selecting `Download the complete genome for an organism` will take you to a walkthrough of how to use the FTP site:
-
+
## Entrez Advanced Search
@@ -104,7 +101,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* Curated and expanded upon by NCBI in the forms of GEO Datasets and Profiles
* In addition to functioning as an experimental archive, NCBI provides a robust suite of tools for further analysis and exploration of submitted records
-
+
### Sequence Read Archive (SRA)
@@ -113,7 +110,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* Largest publicly available repository of High Throughput Sequencing (HTS) data
* Accompanied by SRA-toolkit suite for acquisition + dumping of data from SRA database to local and remote machines
-
+
### Genbank
@@ -123,7 +120,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* Gene (region of biological interest in record)
* Coding sequence (CDS)
-
+
### PubMed + PubMed Central
@@ -134,7 +131,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* **PubMed Central (PMC)** contains full-text versions of articles for free
* If available, PubMed will link to PMC for appropriate article
-
+
## Commonly used NCBI Site Utilities
@@ -149,18 +146,18 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* CD-search, which locates conserved domains in the submitted sequence
* Primer-BLAST, which allows you to design primers that are specific to a PCR template
-
+
### Downloading + FTP
* There are suites of tools for general data access (Entrez Programming Utilities) and ones that are repository specific (SRA Toolkit, GEO2R)
* Some require software to be installed locally before use
-
+
* Almost all data queried or interacted with on NCBI can be downloaded directly from the FTP site, using an index location and command line programs such as `curl` or `wget`.
-
+
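For instance (a hedged sketch; the index path below is purely illustrative, so browse the FTP index for the real location of your data of interest), a download boils down to pointing `wget` or `curl` at an index location:

```shell
# Compose a download location on the NCBI FTP site; the index path below is
# an illustrative placeholder, not a real location.
base="https://ftp.ncbi.nlm.nih.gov"
index="some/index/location"   # hypothetical: replace with the index location you need
url="${base}/${index}"
echo "$url"
# wget "$url"        # or: curl -O "$url"
```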
## Under Development
@@ -175,7 +172,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* For example, searching for *Mus musculus* will present you with only a couple of reference genomes, and a single available annotation
* As this resource expands, expect its usefulness to mature as well
-
+
### NCBI Virus (Beta)
@@ -185,7 +182,7 @@ As this is just an introduction to NCBI, a guide to building queries with Entrez
* Links to a page dedicated to COVID-19 research and literature
* Here you can browse vast amounts of nucleotide and protein data, locate relevant PubMed and PMC articles, perform in-place alignments, and construct phylogenetic trees
-
+
## Contact
diff --git a/ncbi_images/.DS_Store b/bioinformatics/online_resources/intro_to_ncbi_img/.DS_Store
similarity index 100%
rename from ncbi_images/.DS_Store
rename to bioinformatics/online_resources/intro_to_ncbi_img/.DS_Store
diff --git a/ncbi_images/GEO_dataset_record.png b/bioinformatics/online_resources/intro_to_ncbi_img/GEO_dataset_record.png
similarity index 100%
rename from ncbi_images/GEO_dataset_record.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/GEO_dataset_record.png
diff --git a/ncbi_images/GEO_datasets_results.png b/bioinformatics/online_resources/intro_to_ncbi_img/GEO_datasets_results.png
similarity index 100%
rename from ncbi_images/GEO_datasets_results.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/GEO_datasets_results.png
diff --git a/ncbi_images/GEO_profile_record.png b/bioinformatics/online_resources/intro_to_ncbi_img/GEO_profile_record.png
similarity index 100%
rename from ncbi_images/GEO_profile_record.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/GEO_profile_record.png
diff --git a/ncbi_images/GEO_profile_results.png b/bioinformatics/online_resources/intro_to_ncbi_img/GEO_profile_results.png
similarity index 100%
rename from ncbi_images/GEO_profile_results.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/GEO_profile_results.png
diff --git a/ncbi_images/GEO_sample_accessions.png b/bioinformatics/online_resources/intro_to_ncbi_img/GEO_sample_accessions.png
similarity index 100%
rename from ncbi_images/GEO_sample_accessions.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/GEO_sample_accessions.png
diff --git a/ncbi_images/Pubmed_searchop.png b/bioinformatics/online_resources/intro_to_ncbi_img/Pubmed_searchop.png
similarity index 100%
rename from ncbi_images/Pubmed_searchop.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/Pubmed_searchop.png
diff --git a/ncbi_images/blast_start.png b/bioinformatics/online_resources/intro_to_ncbi_img/blast_start.png
similarity index 100%
rename from ncbi_images/blast_start.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/blast_start.png
diff --git a/ncbi_images/bookshelf_search.png b/bioinformatics/online_resources/intro_to_ncbi_img/bookshelf_search.png
similarity index 100%
rename from ncbi_images/bookshelf_search.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/bookshelf_search.png
diff --git a/ncbi_images/download_hub.png b/bioinformatics/online_resources/intro_to_ncbi_img/download_hub.png
similarity index 100%
rename from ncbi_images/download_hub.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/download_hub.png
diff --git a/ncbi_images/ftp_index.png b/bioinformatics/online_resources/intro_to_ncbi_img/ftp_index.png
similarity index 100%
rename from ncbi_images/ftp_index.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ftp_index.png
diff --git a/ncbi_images/genbank_start.PNG b/bioinformatics/online_resources/intro_to_ncbi_img/genbank_start.PNG
similarity index 100%
rename from ncbi_images/genbank_start.PNG
rename to bioinformatics/online_resources/intro_to_ncbi_img/genbank_start.PNG
diff --git a/ncbi_images/howto_expanded.png b/bioinformatics/online_resources/intro_to_ncbi_img/howto_expanded.png
similarity index 100%
rename from ncbi_images/howto_expanded.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/howto_expanded.png
diff --git a/ncbi_images/howto_ftp_teaser.png b/bioinformatics/online_resources/intro_to_ncbi_img/howto_ftp_teaser.png
similarity index 100%
rename from ncbi_images/howto_ftp_teaser.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/howto_ftp_teaser.png
diff --git a/ncbi_images/ncbi_allresource.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_allresource.png
similarity index 100%
rename from ncbi_images/ncbi_allresource.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_allresource.png
diff --git a/ncbi_images/ncbi_blast.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_blast.png
similarity index 100%
rename from ncbi_images/ncbi_blast.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_blast.png
diff --git a/ncbi_images/ncbi_center.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_center.png
similarity index 100%
rename from ncbi_images/ncbi_center.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_center.png
diff --git a/ncbi_images/ncbi_datasets.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_datasets.png
similarity index 100%
rename from ncbi_images/ncbi_datasets.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_datasets.png
diff --git a/ncbi_images/ncbi_home.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_home.png
similarity index 100%
rename from ncbi_images/ncbi_home.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_home.png
diff --git a/ncbi_images/ncbi_howto_dropdown.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_howto_dropdown.png
similarity index 100%
rename from ncbi_images/ncbi_howto_dropdown.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_howto_dropdown.png
diff --git a/ncbi_images/ncbi_resources_dropdown.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_resources_dropdown.png
similarity index 100%
rename from ncbi_images/ncbi_resources_dropdown.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_resources_dropdown.png
diff --git a/ncbi_images/ncbi_signin.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_signin.png
similarity index 100%
rename from ncbi_images/ncbi_signin.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_signin.png
diff --git a/ncbi_images/ncbi_virus.png b/bioinformatics/online_resources/intro_to_ncbi_img/ncbi_virus.png
similarity index 100%
rename from ncbi_images/ncbi_virus.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/ncbi_virus.png
diff --git a/ncbi_images/pubmed_PMC.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_PMC.png
similarity index 100%
rename from ncbi_images/pubmed_PMC.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_PMC.png
diff --git a/ncbi_images/pubmed_advanced_result.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_advanced_result.png
similarity index 100%
rename from ncbi_images/pubmed_advanced_result.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_advanced_result.png
diff --git a/ncbi_images/pubmed_advanced_search.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_advanced_search.png
similarity index 100%
rename from ncbi_images/pubmed_advanced_search.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_advanced_search.png
diff --git a/ncbi_images/pubmed_allfields.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_allfields.png
similarity index 100%
rename from ncbi_images/pubmed_allfields.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_allfields.png
diff --git a/ncbi_images/pubmed_general_search.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_general_search.png
similarity index 100%
rename from ncbi_images/pubmed_general_search.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_general_search.png
diff --git a/ncbi_images/pubmed_querybuilt.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_querybuilt.png
similarity index 100%
rename from ncbi_images/pubmed_querybuilt.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_querybuilt.png
diff --git a/ncbi_images/pubmed_results.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_results.png
similarity index 100%
rename from ncbi_images/pubmed_results.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_results.png
diff --git a/ncbi_images/pubmed_searchbuild.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_searchbuild.png
similarity index 100%
rename from ncbi_images/pubmed_searchbuild.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_searchbuild.png
diff --git a/ncbi_images/pubmed_start.png b/bioinformatics/online_resources/intro_to_ncbi_img/pubmed_start.png
similarity index 100%
rename from ncbi_images/pubmed_start.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/pubmed_start.png
diff --git a/ncbi_images/sra_start.png b/bioinformatics/online_resources/intro_to_ncbi_img/sra_start.png
similarity index 100%
rename from ncbi_images/sra_start.png
rename to bioinformatics/online_resources/intro_to_ncbi_img/sra_start.png
diff --git a/intro_to_pubmed.md b/bioinformatics/online_resources/intro_to_pubmed.md
similarity index 63%
rename from intro_to_pubmed.md
rename to bioinformatics/online_resources/intro_to_pubmed.md
index e2ccd6a..595f88f 100644
--- a/intro_to_pubmed.md
+++ b/bioinformatics/online_resources/intro_to_pubmed.md
@@ -5,7 +5,6 @@ organization: MDIBL Computational Core
date: May 25th, 2020
---
# Introduction to PubMed
-
## Learning Objectives
* Become familiar with accessing and navigating PubMed/PMC
@@ -19,47 +18,51 @@ date: May 25th, 2020
## Logging in / Creating an NCBI Account
-#### Having an NCBI account greatly increases the flexibility of the tools and resources at your disposal.
+**Having an NCBI account greatly increases the flexibility of the tools and resources at your disposal.**
+
+This includes working with NCBI's programming API:
+* With a verified account, your API calls are not throttled, and you're afforded more compute than if you were to remain anonymous
+
+An account also assists in searching for and submitting data through NCBI:
+* Saving of search strategies and queries built through PubMed and other repositories
+* You're required to have an account for data submission
-* This includes working with NCBI's programming API:
- * With a verified account, your API calls are not throttled, and you're afforded more compute than if you were to remain anonymous
-* Assists in searching for and submitting through NCBI:
- * Saving of search strategies and queries built through PubMed and other repositories
- * You're required to have an account for data submission
-* To sign in/create an account, click this [link](https://www.ncbi.nlm.nih.gov/account/?back_url=https%3A%2F%2Fwww.ncbi.nlm.nih.gov%2F)
+To sign in/create an account, click this [link](https://www.ncbi.nlm.nih.gov/account/?back_url=https%3A%2F%2Fwww.ncbi.nlm.nih.gov%2F)
## Platform Overview
-
+
-* PubMed draws from three main sources for citations and literature
- * MEDLINE
- * PMC
- * BookShelf
+PubMed draws from three main sources for citations and literature
+* MEDLINE
+* PMC
+* BookShelf
### MEDLINE
-* US National Library of Medicine (NLM) primary bibliographic database, containing over 25 million references to journal articles with a focus on biomedicine
+The US National Library of Medicine's (NLM) primary bibliographic database, containing over 25 million references to journal articles, with a focus on biomedicine
* Most PubMed references are acquired from this source
-* A fair number of citations in MEDLINE are being updated with links to the free full text articles, archived in PMC
-* If full text is not available through PMC, you can use the `Loansome Doc` feature to "borrow" the article through the National Network of Libraries of Medicine
+
+A fair number of citations in MEDLINE are being updated with links to free full-text articles archived in PMC
+
+**If full text is not available through PMC, you can use the `Loansome Doc` feature to "borrow" the article through the National Network of Libraries of Medicine**
### PubMed Central (PMC)
-* Free full-text archive of biomedical and life sciences journal literature, managed by NCBI
-* Contains over 5 million full text records, with literature dating back to as late as the 1700s
+Free archive of biomedical and life sciences journal literature, managed by NCBI, containing over 5 million full-text records, with literature dating back as far as the 1700s
* If a full text article exists in a PubMed record, often it will include a link to a sibling PMC page
-* Through the NIH Preprint Pilot (as of June 2020), includes preprints that are the result of research funded by National Institutes of Health
+
+Through the NIH Preprint Pilot (as of June 2020), PMC also includes preprints resulting from research funded by the National Institutes of Health
* Currently focused on preprints relating to SARS-CoV-2 virus and COVID-19
-
+
### Bookshelf
* Online archive that provides free access to books and documentation in the healthcare and life science fields
* Differentiates itself from PMC and MEDLINE by the depth of the content available
-
+
## Searching PubMed
@@ -73,12 +76,12 @@ PubMed offers both `general` and `advanced` query functionality
* Depending on what you looked for, PubMed (using a machine learning algorithm) will do its best to find the most relevant citation(s)
* While easy to use, a general search will often give you an overwhelming number of results
-
+
* You can use the filters on the left to help refine your records
* Once you've selected an entry:
-
+
* Notice the `Full Text Links` button, that will take you to the publisher page, where the full article is freely available to access
* You can use the `Page Navigation` links on the right to find similar articles and additional resources
@@ -88,26 +91,26 @@ PubMed offers both `general` and `advanced` query functionality
* Advanced search is a bit more involved, but still fairly easy to use
* To access the `Advanced Search` builder, select `Advanced` under the general search bar
-
+
* To build a search query, first you need to `Add terms to the query box`
* The dropdown menu gives you a large number of fields to choose from, which can be further refined by the terms entered
* There is also an autocomplete feature built in that helps avoid spelling mistakes, and expands functionality
-
+
* You can also modify how these terms are interpreted by using boolean operators
-
+
* Once you've entered in all terms, click `Search`
-
+
* From here, you can refine your search in the same way that we covered in the `General search` component
* If you go back to the `PubMed Advanced Search Builder` page, you'll notice that near the bottom of the page, your History and Search Details are saved
-
+
* When signed in to NCBI, these queries are recorded, providing a good reference point should you ever need to search for similar content in the future
diff --git a/bioinformatics/online_resources/intro_to_uniprot.md b/bioinformatics/online_resources/intro_to_uniprot.md
new file mode 100644
index 0000000..acf97e3
--- /dev/null
+++ b/bioinformatics/online_resources/intro_to_uniprot.md
@@ -0,0 +1,106 @@
+---
+title: Introduction to UniProt
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "January 24th"
+---
+
+# Introduction to UniProt
+
+## Summary
+
+* Universal Protein Resource - comprehensive resource of protein sequence and functional information
+* Supported by EMBL-EBI / PIR (Protein Information Resource) / SIB (Swiss Institute of Bioinformatics)
+* Data is pulled from resources such as Ensembl, INSDC (NCBI, DDBJ, ENA), and direct submission
+
+Most of the data (90%) in UniProt comes from translations of coding sequences that users have submitted to the INSDC
+
+## Resources
+
+UniProtKB
+* Contains functional + structural data on proteins, sourced from Swiss-Prot and TrEMBL
+ * Swiss-Prot submissions are manually annotated + reviewed
+ * TrEMBL submissions are automatically annotated, not reviewed
+
+Proteomes
+* Protein sets expressed by organisms
+* Provides proteomes from completely sequenced genomes
+
+UniParc
+* Database that contains most of the publicly available protein sequences
+
+UniRef
+* Provides clustered set of sequences
+* Useful for examining sequence conservation between species
+
+## UniProtKB
+
+TrEMBL
+* All direct submissions go to TrEMBL
+* Once reviewed and manually annotated, submissions are moved to Swiss-Prot
+
+Swiss-Prot
+* All entries are manually reviewed by curators
+* Entries contain detailed info about protein function that may be missing or incomplete in initial TrEMBL submissions
+
+## How to Access Data
+
+Search bar defaults to UniProtKB (can change with dropdown)
+
+`Quick Access Tiles` let you search a specific resource
+
+`Getting Started` takes you to tools such as BLAST
+
+All data on UniProt can also be accessed programmatically
+
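As a sketch (assumption: the endpoint and query-field names below follow UniProt's current REST API and may differ; consult the UniProt API documentation for the authoritative forms), a programmatic query is just a URL:

```shell
# Build a UniProtKB REST query URL (endpoint and parameter names are
# assumptions -- verify against the UniProt API documentation).
base="https://rest.uniprot.org/uniprotkb/search"
query="gene:TP53 AND organism_id:9606"
encoded=$(printf '%s' "$query" | sed 's/ /%20/g')   # minimal URL-encoding of spaces
url="${base}?query=${encoded}&format=tsv"
echo "$url"
# curl -s "$url" | head   # would print the first few rows of results
```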
+### Free Text Search
+Can query by
+* protein name
+* gene name
+* species
+* disease
+* keywords
+* GO term
+
+### Advanced Search
+Much like Entrez, allows for advanced query building, combining any accepted terms to produce output
+
+### Results page
+Results can be refined using the `filter` function on the left side of the page, filtering by organism as well as by reviewed or unreviewed source (here, querying for `TP53`)
+
+### Retrieve/ID Mapping
+Plug multiple queries into search box
+* TP53, BRCA2
+* Choose `From Gene Name` and `To UniProtKB`
+* Organism is `Homo sapiens`
+
+### Explore Protein Sequences + Features
+Use BLAST + Align to locate similar proteins, both within a species and across different ones
+* Paste sequence into BLAST box
+* Set Target db to UniProtKB
+
+The Identity column on the right shows how similar the search query is to each protein sequence in the results
+
+BLAST results can be filtered by normal means (reviewed, organism, etc)
+
+### Align
+Align 2 or more sequences to locate regions of similarity, or find conserved regions
+* Run alignment using: `TPA_HUMAN TPA_PIG`
+* `*` indicates a fully conserved residue (exactly the same in all sequences)
+* `:` indicates residues that aren't identical but have strongly similar chemical properties
+* `.` indicates residues that aren't identical and have only weakly similar chemical properties
+
+## Accessing Proteomes
+A proteome contains the set of proteins theorized to be expressed by an organism
+* for species that have fully sequenced genomes
+* contain protein sequence and functional information for large swath of species
+* Allows for cross-species comparison of orthology + conservation
+
+### Reference Proteomes
+Subset of proteomes - best annotated for a chosen taxonomic group
+* for well studied model organisms + species of interest
+
+#### Query Proteomes
+Homo sapiens as an example
+* can be mapped to both reviewed and unreviewed resources for comparison
+
diff --git a/minota/images/.DS_Store b/bioinformatics/worksheets/.DS_Store
similarity index 65%
rename from minota/images/.DS_Store
rename to bioinformatics/worksheets/.DS_Store
index 891bfe4..4b88e0b 100644
Binary files a/minota/images/.DS_Store and b/bioinformatics/worksheets/.DS_Store differ
diff --git a/bioinformatics/worksheets/assembly_worksheet.md b/bioinformatics/worksheets/assembly_worksheet.md
new file mode 100644
index 0000000..9ac8a94
--- /dev/null
+++ b/bioinformatics/worksheets/assembly_worksheet.md
@@ -0,0 +1,345 @@
+---
+title: Transcriptome Assembly Worksheet
+author: Nathaniel Maki
+organization: MDIBL Computational Core
+date: March 1st, 2021
+---
+
+# Transcriptome Assembly Worksheet
+
+## Learning Objectives
+
+* Improve skills in filesystem navigation and file modification through the use of the CLI and nano
+* Through the Bowdoin SGE Compute Cluster, use a Trinity script in conjunction with Commander to generate a basic transcriptome
+* Understand the options contained within the Trinity script, and how they can impact your resulting Assembly
+
+### Step 0: Connecting to the Cluster
+
+This step has been covered in a previous lecture, and is required to proceed with this worksheet
+
+In short, ensure that you've connected to the Bowdoin VPN, have logged into Moosehead, and that your terminal is displaying the following
+
+
+
+### Step 1: Navigating through the Cluster ( where are all the things :thinking: )
+
+The path to your specific User directory is `/mnt/courses/biol2566/people/` followed by your username
+
+This User directory will be automatically populated with `Analysis`, `Data`, etc by Commander, and scripts you launch will dump their output here, in a predictable, findable, and structured format
+
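As a concrete sketch (using `nmaki`, the example username from this worksheet; substitute your own), the path can be assembled and explored like so:

```shell
# Build the path to a user directory on the cluster; "nmaki" is the example
# username used throughout this worksheet -- substitute your own.
user="nmaki"
userdir="/mnt/courses/biol2566/people/${user}"
echo "$userdir"
# cd "$userdir" && ls   # would list Analysis, Data, etc. once populated
```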
+We're going to be editing some template files to point towards your specific directory structure, but first, we need to retrieve them
+
+Those template files can be found at `/mnt/courses/biol2566/software/trinity_demo`
+
+### Step 2: Copying Template files to qsub directory
+
+First, create a directory called `qsub` in your home directory (reached by typing `cd ~`)
+
+* `mkdir qsub`
+* `cd` into `qsub`
+
+Copy the template json file we've provided from the above path to the `qsub` directory
+
+* `cp /mnt/courses/biol2566/software/trinity_demo/*.json ./`
+
+### Step 3: Editing the Trinity json file
+
+First, we're going to want to give the template file a more unique name
+
+Using the `mv` command, change the name of the `bowdoin_trinity_template.json` file to `"username"_trinity_template.json`
+
+* `mv bowdoin_trinity_template.json nmaki_trinity_template.json`
+
+Now open the renamed json file in `nano`
+
+The default file will look like this:
+
+```
+{
+ "job_details": {
+ "job_name": "YOUR_USERNAME_trinity_test"
+ },
+ "experiment_details": {
+ "pi": "Bowdoin",
+ "experiment_name": "jcoffman_001.reduced",
+ "analysis_id": "0123456789",
+ "sample_path": "/mnt/courses/biol2566/data",
+ "analysis_path": "/mnt/courses/biol2566/people/YOUR_USERNAME/analysis",
+ "workdir": "/mnt/hpc/tmp/YOUR_USERNAME",
+ "sample_file": null,
+ "sample_file_type": null
+ },
+ "sge_preamble":{
+ "current_directory": true,
+ "join_output": true,
+ "email_address": "YOUR_EMAIL",
+ "shell": "/bin/bash",
+ "parallel_environment": "smp 16",
+ "memory": "virtual_free=4g"
+ },
+ "misc_preamble": [
+ "### ENVIRONMENT VARIABLES TO BE USED AT QSUB-SCRIPT LEVEL ONLY ###",
+ "### DO NOT TRY TO REFERENCE THESE VARIABLES IN THE TRINITY COMMAND ###",
+ "# Top level course space",
+ "classdirectory='/mnt/courses/biol2566'",
+ "",
+ "# Temporary HPC directory for temporary runtime files",
+ "hpctmp='/mnt/hpc/tmp/'$USER",
+ ""
+ ],
+ "commands": [
+ {
+ "command": "Trinity",
+ "batch": false,
+ "tasks": 1,
+ "cpus": 3,
+ "memory": 128,
+ "singularity_path": "$classdirectory/software/sif",
+ "singularity_image": "trinity_v_latest.sif",
+ "work_dir": null,
+ "volumes": [
+ {
+ "host_path":"$classdirectory",
+ "container_path":"/compbio"
+ },
+ {
+ "host_path":"$hpctmp",
+ "container_path":"/hpctmp"
+ }
+ ],
+ "options": [
+ "--seqType fq",
+ "--SS_lib_type RF",
+ "--normalize_by_read_set",
+ "--samples_file /compbio/data/Bowdoin/jcoffman_001.reduced/jcoffman_001.reduced.samples.txt",
+ "--trimmomatic",
+ "--max_memory 64G",
+ "--CPU 16",
+ "--workdir /hpctmp/YOUR_USERNAME_trinity_test-$JOB_ID",
+ "--output /hpctmp/YOUR_USERNAME_trinity_test-$JOB_ID"
+ ],
+ "arguments":null
+ }
+ ],
+ "cleanup":[
+ "",
+ "# sync and wait for a bit",
+ "sync",
+ "sleep 60s",
+ "",
+ "# copy final output file from temporary working directory",
+ "cp $hpctmp/YOUR_USERNAME_trinity_test-$JOB_ID/Trinity.* $classdirectory/people/$USER/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity"
+ ]
+}
+```
+
+You'll need to replace anything in ALL-CAPS with details relative to your username and options that we'll define below
+
+* Ignore the `misc_preamble`, that does not need to be modified
+
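If you'd rather not hunt for each placeholder by hand, a `sed` substitution can handle all of them in one pass (a sketch; the username and email are this worksheet's examples, and the `-i.bak` flag keeps a backup before rewriting the file in place):

```shell
# Demonstrate the substitution on a single line; applied to the whole file it
# would be, e.g.:
#   sed -i.bak 's/YOUR_USERNAME/nmaki/g; s/YOUR_EMAIL/nmaki@mdibl.org/g' nmaki_trinity_template.json
line='"workdir": "/mnt/hpc/tmp/YOUR_USERNAME",'
edited=$(printf '%s' "$line" | sed 's/YOUR_USERNAME/nmaki/g')
echo "$edited"
```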
+Let's edit the file together, moving through it block by block
+
+The block identifier will be denoted in **bold**, with expected output following below that
+#### 3.1: job_details
+
+* Update the "job_name" entry with a new id (can be something as simple as nmaki_trinity_test)
+
+**Expected**
+
+```
+"job_details": {
+ "job_name": "nmaki_trinity_test"
+```
+
+#### 3.2: experiment_details
+
+* Update the "analysis_path" entry with your username where specified
+* Update the "workdir" entry with your username where specified
+
+**Expected**
+
+```
+"experiment_details": {
+ "pi": "Bowdoin",
+ "experiment_name": "jcoffman_001.reduced",
+ "analysis_id": "0123456789",
+ "sample_path": "/mnt/courses/biol2566/data",
+ "analysis_path": "/mnt/courses/biol2566/people/nmaki/analysis",
+ "workdir": "/mnt/hpc/tmp/nmaki",
+ "sample_file": null,
+ "sample_file_type": null
+```
+
+#### 3.3: sge_preamble
+
+* Update the "email_address" entry to point to your email
+
+**Expected**
+
+```
+"sge_preamble":{
+ "current_directory": true,
+ "join_output": true,
+ "email_address": "nmaki@mdibl.org",
+ "shell": "/bin/bash",
+ "parallel_environment": "smp 16",
+ "memory": "virtual_free=4g"
+```
+
+#### 3.4: misc_preamble
+
+* Ignore me
+
+#### 3.5: commands
+
+* Ignore the entries up until the "options" block
+* Update the "--workdir" option with the "job_name" we defined at the top of the file
+* Update the "--output" option with the "job_name" we defined at the top of the file
+
+**Expected**
+
+```
+"commands": [
+ {
+ "command": "Trinity",
+ "batch": false,
+ "tasks": 1,
+ "cpus": 3,
+ "memory": 128,
+ "singularity_path": "$classdirectory/software/sif",
+ "singularity_image": "trinity_v_latest.sif",
+ "work_dir": null,
+ "volumes": [
+ {
+ "host_path":"$classdirectory",
+ "container_path":"/compbio"
+ },
+ {
+ "host_path":"$hpctmp",
+ "container_path":"/hpctmp"
+ }
+ ],
+ "options": [
+ "--seqType fq",
+ "--SS_lib_type RF",
+ "--normalize_by_read_set",
+ "--samples_file /compbio/data/Bowdoin/jcoffman_001.reduced/jcoffman_001.reduced.samples.txt",
+ "--trimmomatic",
+ "--max_memory 64G",
+ "--CPU 16",
+ "--workdir /hpctmp/nmaki_trinity_test-$JOB_ID",
+ "--output /hpctmp/nmaki_trinity_test-$JOB_ID"
+ ],
+ "arguments":null
+ }
+ ],
+```
+
+#### 3.6 cleanup
+
+* Update the final line of the script, starting with `"cp`, replacing YOUR_USERNAME to match the "job_name" we defined in the "job_details" block above
+
+**Expected**
+
+```
+"cleanup":[
+ "",
+ "# sync and wait for a bit",
+ "sync",
+ "sleep 60s",
+ "",
+ "# copy final output file from temporary working directory",
+ "cp $hpctmp/nmaki_trinity_test-$JOB_ID/Trinity.* $classdirectory/people/$USER/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity"
+ ]
+```
+
+### Step 4: Executing Commander on edited Trinity json file
+
+Once we've made all the necessary modifications, and our script closely resembles the example above, save the changes with `ctrl + o` and then exit out of `nano` using `ctrl + x`
+
+Next, we want to feed the json file into Commander, which will generate a bash script and allow us to run the job on our compute cluster
+
+The command you'll want to run is as follows:
+
+* `/mnt/courses/biol2566/software/compbio_commander/commander --sge --preflight nmaki_trinity_template.json`
+
+If you execute an `ls` in your `qsub` directory, you'll see that we now have another file, one with a `.sh` suffix
+
+### Step 5: Launching the Trinity shell script through qsub
+
+To launch your shell script, simply type the following into the command line
+
+* `qsub nmaki_trinity_test.sh`
+
+To check on a currently submitted job, you can use
+
+* `qstat -u "username"`
+
+### Step 6: Examining output
+
+Your output will be deposited in a path similar to the one below
+
+* `/mnt/courses/biol2566/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity`
+
+The names of the auto-generated directories depend upon the parameters given in the json file
+
+Assuming that everything has run properly, running an `ls` in the `Trinity` directory should yield three results
+
+* Trinity.fasta
+* Trinity.fasta.gene_trans_map
+* Trinity.timing
+
+#### 6.1 Trinity.fasta
+
+Running the `head -n 5` command on the `Trinity.fasta` file will give us a look into what constitutes our transcriptome assembly
+
+Keep in mind that every assembly will be a bit different, so your output will vary
+
+```
+>TRINITY_DN30_c0_g1_i1 len=259 path=[0:0-258]
+TTATTATGTCTACAAGTTTACCAGGCCTCCTCTTTCAATCCCAGTTCCAAGATATTGTGCAGTGTCTTTCCCCACCCCAAGTGTCGACCCCTGCGACAACTACATCAGTCTGGATGATCCTTGGAGATCCACTGAAAACCCTGCTAACTACTATAGCTTTTGTGATTGGGGTAATTCATGGAATGGCTTTTACAGGCTGTTCTACAACAGTCAGAGTGCTCAAATGCCAGAATCATGTGTAAATGAAGGCATGTGTGGC
+>TRINITY_DN30_c0_g2_i5 len=1272 path=[0:0-149 2:150-278 3:279-286 5:287-287 7:288-294 8:295-1271]
+CCGAGATTTTGCGCAGCTTACTGTGTTGCTGCAAACCAGTCCAGCACAGCCGTGACAACAACCCAGCCAATCACAACCATAGACTTCATTAATCCACCAACCACTGCTCCCCCTGTTGACCCCTGCAATAACTTCTTGATCCTGGATGAACCATGGAGAGCCACCAGCAATCAAAACTCCTCTCAGTTAATGTGTGACAGTGCGGTGAGCTGGAGCGGCTGGTACCGTCTCTTCATTAATGGTCAGAGTGTTCAGATGCCAGACACATGTGTTGATGAGAATAGCTGCGGCACTAATGCTCCACTGTGGCTGAGCGGAGGACATCCAACACTTGAGGATGGAGTGGTCTCTCGTAATGTCTGCGGACACTGGAACAACGACTGCTGCTATTTCCAGTCCAATCCCATTCAAGTCAAAGCCTGTCCTGGAGGTTTTTATGTCTATGAGTTTGTGAGGCCGACCACCTGCAATTTAGCATACTGTGCAGATGTGAGGTTTAACACTAGCTATACAACTGACATACCAGAGACGACCACAACAGAAACAGCAGCTGAAACCAGAACTATAATATTTGATGACAGAAACCCCTGTTCTGAACTCAACTGCTCCAAAGAGGAAAGGTGTGGGATGAAAAATGGTGTTTATGGCTGTTTATGTAACAAAGGCCACCAAAAACAGCGAGCAGCTCAAGACTCCTTTGATTTCAATGAGACCTGTGAGAGCAGCTCTGGCTCCATGTCTGTGTCTCGCTGTCAGCTTTTTGAAGCTGGTTTTTCAGCTGAGCACTTACACCTCAATGACCCCAGCTGCAGAGGAACCGTCCAGAACGGCAGAGTGGAGTTTAACTTCGACAACGATCAACACATCTGTGGCACAAATCTTGTGGCCAACGGCAGCCACTTCATCTACAGTAACTATATTGTGGGGACGCCGGGAACAGAAGGTCTCATCAGCAGAGTGAGAATCCTGAAGCTTTCTTTCAGCTGTGTTTATCCTCAAACACAAACACTTTCCATGAACGTGGAGATCAACCCACTGCAGAGCACCGTGCACAAGGTCCTCCCCAGTGGTGAAGGGGTTTATCAGGTGCGGATGGTCCCGTATGTGGATGAAGAGTTCACTCAGCCCTTCACTGGTAGAGTGGATGCAGAGCTGGACCAGGAGATGCATGTGGAGGTTGGTGTTGAGGGGGTCGACAGCCGCCAGTTTGCCCTGGTGATGGACACGTGTTGGGCTACACCTGTGAATGACCCTGATTACAGTCTCCGCTGG
+```
+
+* Trinity groups its transcripts into clusters based upon shared sequence content
+* These clusters are referred to as genes, and are encoded within the Trinity fasta accession
+ * The fasta accession encodes Trinity gene and isoform data
+
+Let's examine an accession, `TRINITY_DN30_c0_g2_i5` in this example
+
+`TRINITY_DN30_c0_g2_i5` points to Trinity read cluster `TRINITY_DN30_c0`, gene `g2`, and isoform `i5`
+
+Any run of Trinity will involve a large number of read clusters, with each one being assembled separately
+
+* Because gene numberings are only unique within a given processed read cluster, a gene identifier needs to be thought of as the combination of the read cluster and the gene number
+* In this example, `TRINITY_DN30_c0_g2`
+
+In short, gene id: `TRINITY_DN30_c0_g2` encoding isoform id: `TRINITY_DN30_c0_g2_i5`
+
+Path information is also stored on the same line as the accession, in the form of `path=[0:0-149 2:150-278, etc]`
+
+* This indicates the path that was traversed in the Trinity compressed de Bruijn graph to build that transcript
+* In this example, node `0` corresponds to sequence range `0-149` of the transcript, node `2` corresponds to sequence range `150-278`, and so on
+* These node IDs are only unique in the context of a chosen Trinity gene identifier
+ * within that gene, they can be compared among isoforms to identify the unique and shared sequences of each isoform
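
The decomposition described above can be sketched in Python. This is a hypothetical helper, not part of Trinity itself, and assumes headers shaped like the examples shown earlier:

```python
import re

def parse_trinity_header(header):
    """Split a Trinity FASTA header into cluster, gene, isoform, and path.

    Hypothetical helper for illustration; assumes headers like
    '>TRINITY_DN30_c0_g2_i5 len=1272 path=[0:0-149 2:150-278]'.
    """
    accession = header.lstrip(">").split()[0]
    cluster, gene_num, _isoform = accession.rsplit("_", 2)
    path = []
    m = re.search(r"path=\[([^\]]*)\]", header)
    if m:
        for node in m.group(1).split():
            node_id, span = node.split(":")
            start, end = span.split("-")
            path.append((node_id, int(start), int(end)))
    return {
        "cluster": cluster,               # e.g. TRINITY_DN30_c0
        "gene": f"{cluster}_{gene_num}",  # e.g. TRINITY_DN30_c0_g2
        "isoform": accession,             # e.g. TRINITY_DN30_c0_g2_i5
        "path": path,                     # [(node, start, end), ...]
    }
```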
+
+#### 6.2 Trinity.fasta.gene_trans_map
+
+* Maps between a Trinity gene id and its corresponding transcript accessions
+* It's the necessary bridge between individual transcripts (where our direct questions about alignment, expression, and processing are answered) and genes (how the summary of the experiment is normally presented)
+* In a nutshell: we need to work with the transcripts, but the most interesting and interpretable answers lie at the gene level
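
The mapping file is plain tab-separated text, so loading it into a gene-to-transcripts lookup is a few lines. A sketch, assuming the usual `gene_id<TAB>transcript_id` one-pair-per-line layout:

```python
from collections import defaultdict

def load_gene_trans_map(lines):
    """Build {gene_id: [transcript accessions]} from gene_trans_map lines.

    Sketch for illustration; assumes each non-empty line is
    'gene_id<TAB>transcript_id'.
    """
    gene_to_tx = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene_id, tx_id = line.split("\t")
        gene_to_tx[gene_id].append(tx_id)
    return dict(gene_to_tx)
```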
+
+#### 6.3 Trinity.timing
+
+* Includes details about the Trinity run, such as the parameters used, input file sizes, unique k-mers, and runtime
+
+## Contact
+If you have questions about the information in this worksheet, please contact:
+
+```
+Nathaniel Maki
+Bioinformatics Training Specialist
+MDI Biological Laboratory
+nmaki[at]mdibl.org
+```
\ No newline at end of file
diff --git a/bioinformatics/worksheets/assembly_worksheet_img/img1.png b/bioinformatics/worksheets/assembly_worksheet_img/img1.png
new file mode 100644
index 0000000..5d8fa18
Binary files /dev/null and b/bioinformatics/worksheets/assembly_worksheet_img/img1.png differ
diff --git a/bioinformatics/worksheets/cutting_class-gene_query.md b/bioinformatics/worksheets/cutting_class-gene_query.md
new file mode 100644
index 0000000..cd65b93
--- /dev/null
+++ b/bioinformatics/worksheets/cutting_class-gene_query.md
@@ -0,0 +1,89 @@
+---
+title: Cutting Class Worksheet
+author: Nathaniel Maki
+organization: MDIBL Computational Core
+date: March 24th, 2021
+---
+
+# Cutting Class Worksheet
+
+## Learning Objectives
+
+* Become familiar with the Gene Search on the CC site, and the Sequence Similarity Search
+* Gain an understanding of the query results
+
+### Search Genes
+
+The Search Genes tool lets you search for a Gene by name, or putative (known/assumed) function
+
+For this, we're interested in a gene present in any of the four planarian species, so we can leave `Species` set to Any
+
+There are two ways to find a gene of interest in this organism:
+
+* `Search by Name` lets you plug in the unique identifier of a sequence, or upload a text file containing one or more nucleotide sequences
+ * Useful if you've got multiple sequences you want to query against
+* What if you already know a fruit fly, human, or planarian gene name or function, and you'd like to locate a similar analogue in the four species listed above?
+ * You can use `Search by Putative Function`
+ * This searches the descriptions of the known proteins matched by each transcript's putative protein
+ * In a novel organism, we assign function by aligning its predicted proteins to existing, annotated proteins
+
+Let's search for the `FoxA` gene, which is evolutionarily conserved, being involved in the development of the digestive system across a large swath of animals. In planaria, it plays a role in regeneration
+
+You'll get a whole host of summarized information in a `record` table
+
+* Record number
+* Gene name, linking out to gene page
+* Length of nucleotide sequence
+* Name of homologous sequences found in S. mediterranea
+* Name of homologous sequences found in H. sapiens
+* Name of homologous sequences found in D. melanogaster
+* Description of sequence in `Uniprot` (Universal Protein Resource)
+* Uniprot E-value (the number of matches of this quality expected purely by chance in a database of this size)
+ * Lower E-values mean the match is less likely to be chance, and so stronger evidence of homology
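
To make the "lower is better" behavior concrete, here is a tiny, hypothetical filter; the hit tuples and cutoff value are invented for illustration:

```python
def best_hits(hits, max_evalue=1e-5):
    """Keep hits at or below an E-value cutoff, strongest (lowest) first.

    `hits` is a list of (name, evalue) pairs -- invented shape for
    illustration, not a real Uniprot/BLAST record format.
    """
    kept = [h for h in hits if h[1] <= max_evalue]
    return sorted(kept, key=lambda h: h[1])  # lowest E-value = best match
```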
+
+Selecting a gene will take you to the Gene page
+
+* Overview has a summary of sequence, with name, unique name, type of sequence, organism, and sequence length
+* Analyses lists the analysis that was performed to get the requisite data for this sequence
+* Under Analyses you've got all of the results from the various types of analysis conducted, from generation of a transcriptome assembly (the set of all RNA transcripts in an organism or set of cells) to the results of BLASTXing Girardia sp. against Human, to BLASTXing against the protein database Swissprot
+
+Let's take a look at the Transcriptome!
+
+* Analysis overview is very similar to a GEO Series (experiment record)
+* Includes the exact Trinity software package that was used to build the Transcriptome Assembly (set of all RNA transcripts) of the organism
+ * The seqclean QC tool was used to "spruce up" the output (generally, though, QC is done before assembly)
+ * To "map meaning" onto the constructed assembly, a set of non-redundant, likely protein-coding sequences was created with Transdecoder, then reduced by clustering with cd-hit, selecting the longest sequence from each cluster as the representative
+
+#### Homology
+
+Contains top sequence results from a number of BLAST runs on this specific sequence against database sequences from human, fruit fly, planarian, and Swissprot (protein db)
+
+* Each table has the match sequence name, as well as the E-value (number of times a match is expected in database of n size, lower = better match), percent identity (how similar query seq is to target seq, higher = better match), and description
+* Why is there a gap in the BLASTX against Schmidtea mediterranea?
+ * A potential mismatch: no sequence aligns at that region, creating a gap
+
+#### Sequences
+
+Contains mRNA, coding (CDS) and protein sequences (we're going to blastn/blastx these)
+
+### Sequence Similarity Search
+
+Searching for sequences similar to one we've already selected
+
+Are you working with a nucleotide or protein sequence?
+
+If nucleotide, `blastn` or `blastx`:
+
+* blastn can be used when checking a nucleotide sequence between the same species (nucleotide to nucleotide db)
+* blastx is utilized when searching a nucleotide sequence from a species that differs from the species of the comparison protein sequence (nucleotide to protein db)
+ * First, the query nucleotide sequence is run through a translation layer, producing amino acid sequences in each of the six reading frames
+ * Then, that translated sequence (now protein) is checked against protein db
+
+If protein, `tblastn` or `blastp`
+
+* tblastn when you want to check a protein sequence against a nucleotide db
+ * In this case, the entire nucleotide db is translated to protein (AA) sequence and then compared against
+* blastp to query a protein sequence against a protein db
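
The six-frame step that blastx (and the database side of tblastn) relies on can be sketched in a few lines. This produces the frames themselves; the actual translation to amino acids then applies the standard codon table:

```python
def six_frames(seq):
    """Return the six reading frames of a nucleotide sequence:
    three forward, three on the reverse complement, each trimmed
    to whole codons. Illustrative sketch, not BLAST's own code."""
    comp = str.maketrans("ACGT", "TGCA")
    rev_comp = seq.translate(comp)[::-1]
    frames = []
    for strand in (seq, rev_comp):
        for offset in range(3):
            frame = strand[offset:]
            frames.append(frame[:len(frame) - len(frame) % 3])  # whole codons only
    return frames
```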
+
+
diff --git a/bioinformatics/worksheets/exploration_worksheet.md b/bioinformatics/worksheets/exploration_worksheet.md
new file mode 100644
index 0000000..8779fdd
--- /dev/null
+++ b/bioinformatics/worksheets/exploration_worksheet.md
@@ -0,0 +1,349 @@
+---
+title: Assembly Exploration Worksheet
+author: Nathaniel Maki
+organization: MDIBL Computational Core
+date: March 1st, 2021
+---
+
+# Assembly Exploration Worksheet
+
+## Learning Objectives
+
+* Improve skills in filesystem navigation and file modification through the use of the CLI and nano
+* Through the Bowdoin SGE Compute Cluster, use a Transrate script in conjunction with Commander to generate statistical output on your assembled transcriptome
+* Understand the options contained within the Transrate script, and how they can impact your resulting analysis
+
+### Step 0: Connecting to the Cluster
+
+This step has been covered in a previous lecture, and is required to proceed with this worksheet
+
+In short, ensure that you've connected to the Bowdoin VPN, have logged into Moosehead, and that your terminal is displaying the following
+
+
+
+### Step 1: Navigating through the Cluster ( where are all the things :thinking: )
+
+The path to your specific User directory is `/mnt/courses/biol2566/people/` followed by your username
+
+This User directory will be automatically populated with `Analysis`, `Data`, etc by Commander, and scripts you launch will dump their output here, in a predictable, findable, and structured format
+
+We are also going to be editing some template files, to point towards your specific directory structure
+
+Those template files can be found at `/mnt/courses/biol2566/software/compbio_commander/paramfile_templates`
+
+### Step 2: Copying Template files to qsub directory
+
+First, create a directory called `qsub` in your home directory (reached by typing `cd ~`)
+
+* `mkdir qsub`
+* `cd` into `qsub`
+
+Copy the template json files we've provided from the above path to the `qsub` directory
+
+* `cp /mnt/courses/biol2566/software/compbio_commander/paramfile_templates/*.json ./`
+
+### Step 3: Editing the Transrate json file
+
+First, we're going to want to give the template file a more unique name
+
+Using the `mv` command, change the name of the `bowdoin_transrate_template.json` file to `"username"_transrate_template.json`
+
+* `mv bowdoin_transrate_template.json nmaki_transrate_template.json`
+
+Now open the renamed `nmaki_transrate_template.json` in `nano`
+
+The default file will look like this:
+
+```
+{
+ "job_details": {
+ "job_name": "YOUR_USERNAME_transrate_test"
+ },
+ "experiment_details": {
+ "pi": "Bowdoin",
+ "experiment_name": "jcoffman_001.reduced",
+ "analysis_id": "0123456789",
+ "sample_path": "/mnt/courses/biol2566/data",
+ "analysis_path": "/mnt/courses/biol2566/people/YOUR_USERNAME/analysis",
+ "workdir": null,
+ "sample_file": null,
+ "sample_file_type": null
+ },
+ "sge_preamble":{
+ "current_directory": true,
+ "join_output": true,
+ "email_address": "YOUR_EMAIL",
+ "shell": "/bin/bash",
+ "parallel_environment": "smp 16",
+ "memory": "virtual_free=4g"
+ },
+ "misc_preamble": [
+ "# *** ENVIRONMENT VARIABLES TO BE USED AT QSUB-SCRIPT LEVEL ONLY *** #",
+ "# *** DO NOT TRY TO REFERENCE THESE VARIABLES IN THE TRINITY COMMAND *** #",
+ "",
+ "# Top level course space",
+ "classdirectory='/mnt/courses/biol2566'",
+ ""
+ ],
+ "commands": [
+ {
+ "command": "transrate",
+ "batch": false,
+ "tasks": 1,
+ "cpus": 3,
+ "memory": 128,
+ "singularity_path": "$classdirectory/software/sif",
+ "singularity_image": "transrate_v1_0_3.sif",
+ "work_dir": null,
+ "volumes": [
+ {
+ "host_path":"$classdirectory",
+ "container_path":"/compbio"
+ }
+ ],
+ "options": [
+ "--assembly /compbio/people/YOUR_USERNAME/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity/Trinity.fasta",
+ "--output /compbio/people/YOUR_USERNAME/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/",
+ "--threads 16"
+ ],
+ "arguments":null
+ }
+ ]
+}
+```
+
+You'll need to replace anything in ALL-CAPS with details relative to your username and options that we'll define below
+
+* Ignore the `misc_preamble`, that does not need to be modified
+
+Let's edit the file together, moving through it block by block
+
+The block identifier will be denoted in **bold**, with the expected output shown below it
+
+#### 3.1: job_details
+
+* Update the "job_name" entry with a new id (can be something as simple as nmaki_transrate_test)
+
+**Expected**
+
+```
+"job_details": {
+ "job_name": "nmaki_transrate_test"
+```
+
+#### 3.2: experiment_details
+
+* Update the "analysis_path" entry with your username where specified
+
+**Expected**
+
+```
+"experiment_details": {
+ "pi": "Bowdoin",
+ "experiment_name": "jcoffman_001.reduced",
+ "analysis_id": "0123456789",
+ "sample_path": "/mnt/courses/biol2566/data",
+ "analysis_path": "/mnt/courses/biol2566/people/nmaki/analysis",
+ "workdir": null,
+ "sample_file": null,
+ "sample_file_type": null
+```
+
+#### 3.3: sge_preamble
+
+* Update the "email_address" entry to point to your email
+
+**Expected**
+
+```
+"sge_preamble":{
+ "current_directory": true,
+ "join_output": true,
+ "email_address": "nmaki@mdibl.org",
+ "shell": "/bin/bash",
+ "parallel_environment": "smp 16",
+ "memory": "virtual_free=4g"
+```
+
+#### 3.4: misc_preamble
+
+* Ignore me
+
+#### 3.5: commands
+
+* Ignore the entries up until the "options" block
+* Update the "--assembly" option with the details for "pi", "experiment_name" and "analysis_id"
+* Update the "--output" option with the details for "pi", "experiment_name" and "analysis_id"
+
+**Expected**
+
+```
+"commands": [
+ {
+ "command": "transrate",
+ "batch": false,
+ "tasks": 1,
+ "cpus": 3,
+ "memory": 128,
+ "singularity_path": "$classdirectory/software/sif",
+ "singularity_image": "transrate_v1_0_3.sif",
+ "work_dir": null,
+ "volumes": [
+ {
+ "host_path":"$classdirectory",
+ "container_path":"/compbio"
+ }
+ ],
+ "options": [
+ "--assembly /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity/Trinity.fasta",
+ "--output /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/",
+ "--threads 16"
+ ],
+ "arguments":null
+ }
+ ]
+```
+
+### Step 4: Executing Commander on edited Transrate json file
+
+Once we've made all the necessary modifications, and our script closely resembles the example above, save the changes with `ctrl + o` and then exit out of `nano` using `ctrl + x`
+
+Next, we want to feed the json file into Commander, which will generate a bash script and allow us to run the job on our compute cluster
+
+The command you'll want to run is as follows:
+
+* `/mnt/courses/biol2566/software/compbio_commander/commander --sge --preflight nmaki_transrate_template.json`
+
+If you execute an `ls` in your `qsub` directory, you'll see that we now have another file, one with a `.sh` suffix
+
+### Step 5: Launching the Transrate shell script through qsub
+
+To launch your shell script, simply type the following into the command line
+
+* `qsub nmaki_transrate_test.sh`
+
+To check on a currently submitted job, you can use
+
+* `qstat -u "username"`
+
+### Step 6: Examining output
+
+Your output will be deposited in two regions:
+
+A log file containing many of the metrics we want to look at will be placed in your `qsub` directory
+
+* `/home/username/qsub`
+ * `nmaki_transrate_test.sh.o***`
+
+The core stats that make up that file will be located in the following directory
+
+* `/mnt/courses/biol2566/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/`
+
+The names of the directories that have been auto-generated rely upon the parameters given in the json script
+
+Some of these folders may differ depending upon what you called them
+
+Assuming that everything has run properly, running an `ls` in the `transrate` directory should yield two results
+
+* assemblies.csv
+* /Trinity/ directory
+ * contigs.csv
+
+#### Step 6.1 Contig metrics
+
+* Use the `cat` command on your `nmaki_transrate_test.sh.o***` to view the contents of the file
+
+**Expected**
+
+```
+[ INFO] 2021-03-08 19:42:00 : Loading assembly: /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity/Trinity.fasta
+[ INFO] 2021-03-08 19:42:00 : Analysing assembly: /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/Trinity/Trinity.fasta
+[ INFO] 2021-03-08 19:42:00 : Results will be saved in /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/Trinity
+[ INFO] 2021-03-08 19:42:00 : Calculating contig metrics...
+[ INFO] 2021-03-08 19:42:00 : Contig metrics:
+[ INFO] 2021-03-08 19:42:00 : -----------------------------------
+[ INFO] 2021-03-08 19:42:00 : n seqs 195
+[ INFO] 2021-03-08 19:42:00 : smallest 201
+[ INFO] 2021-03-08 19:42:00 : largest 5258
+[ INFO] 2021-03-08 19:42:00 : n bases 201102
+[ INFO] 2021-03-08 19:42:00 : mean len 1031.29
+[ INFO] 2021-03-08 19:42:00 : n under 200 0
+[ INFO] 2021-03-08 19:42:00 : n over 1k 70
+[ INFO] 2021-03-08 19:42:00 : n over 10k 0
+[ INFO] 2021-03-08 19:42:00 : n with orf 86
+[ INFO] 2021-03-08 19:42:00 : mean orf percent 70.85
+[ INFO] 2021-03-08 19:42:00 : n90 352
+[ INFO] 2021-03-08 19:42:00 : n70 1272
+[ INFO] 2021-03-08 19:42:00 : n50 2010
+[ INFO] 2021-03-08 19:42:00 : n30 2632
+[ INFO] 2021-03-08 19:42:00 : n10 4289
+[ INFO] 2021-03-08 19:42:00 : gc 0.45
+[ INFO] 2021-03-08 19:42:00 : bases n 0
+[ INFO] 2021-03-08 19:42:00 : proportion n 0.0
+[ INFO] 2021-03-08 19:42:00 : Contig metrics done in 0 seconds
+[ INFO] 2021-03-08 19:42:00 : No reads provided, skipping read diagnostics
+[ INFO] 2021-03-08 19:42:00 : No reference provided, skipping comparative diagnostics
+[ INFO] 2021-03-08 19:42:00 : Writing contig metrics for each contig to /compbio/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/Trinity/contigs.csv
+[ INFO] 2021-03-08 19:42:01 : Writing analysis results to assemblies.csv
+```
+
+These metrics are based purely on an analysis of the set of contigs themselves, and are only really useful as a quick-and-dirty way of screening for major issues in your assembly
+
+They can be informative, but aren't great for judging overall assembly quality (since we don't know what the "optimum" is)
+
+Only extremes can reliably be recognized: either a very small (<5,000) or very large (>100,000) number of contigs is biologically improbable for most organisms
+
+You'll need to leverage your biological knowledge to choose which values you find acceptable
+
+Obviously for this test dataset, we can't really draw any meaningful conclusions
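
The n50/n90-style numbers in the log can be recomputed from the contig lengths alone, which makes it clear what they measure. A minimal sketch:

```python
def nx(lengths, x):
    """Return the contig length L such that contigs of length >= L
    contain at least x% of all assembled bases (N50 when x=50).
    Sketch for illustration; Transrate computes these internally."""
    target = sum(lengths) * x / 100
    total = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        total += length
        if total >= target:
            return length
```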
+
+#### Step 6.2 Assembly score
+
+* Use the `cat` command on the `assemblies.csv` file found in your user-specific `transrate` directory
+
+**Expected**
+
+```
+assembly,n_seqs,smallest,largest,n_bases,mean_len,n_under_200,n_over_1k,n_over_10k,n_with_orf,mean_orf_percent,n90,n70,n50,n30,n10,gc,bases_n,proportion_n,score,optimal_score,cutoff,weighted
+Trinity.fasta,195,201,5258,201102,1031.29231,0,70,0,86,70.84614,352,1272,2010,2632,4289,0.44746,0,0.0,NA,NA,NA,NA
+```
+
+The `assemblies.csv` file echoes a lot of what was found in the log file, but what we're most interested in is the `score` of our assembly
+
+* How confident we can be in what we assembled
+* How complete the assembly is
+
+The score ranges from 0 to 1, with a higher score increasing the likelihood that you have an assembly that is biologically accurate
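
Since `assemblies.csv` is ordinary CSV, the score column can be pulled out with the standard library. A sketch (note the run above reports `NA` for the score because no reads were provided):

```python
import csv
import io

def read_assembly_scores(csv_text):
    """Map each assembly name to its Transrate 'score' column value.

    Sketch; expects text with 'assembly' and 'score' columns, as in
    Transrate's assemblies.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["assembly"]: row["score"] for row in reader}
```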
+
+#### Step 6.3 Read mapping metrics
+
+* Generated by aligning the reads used in the assembly to the assembled transcriptome
+
+These are the most useful metrics, due to the wealth of organism-specific information that the reads contain
+
+This data can be used to evaluate confidence in every base and contig present in the assembly
+
+Upon including the `--left` and `--right` input options, Transrate will
+
+* Map provided reads to the assembly
+* Infer the most likely contig of origin for any multi-mapping reads, using `Salmon`
+* Inspect resulting alignments with `transrate-tools`, using them for evaluation on every contig in the transcriptome
+
+Transrate is able to distinguish "good" from "bad" mappings
+
+* Good mappings are those that align in a way consistent with the contig being correctly assembled
+ * Both members of the pair are aligned
+ * In proper orientation
+ * On same contig
+ * Without overlapping either end of contig
+
+* Bad mappings are those that fail any of the above conditions
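
The four conditions amount to an all-of check per read pair. A sketch with invented field names (these are not Transrate's internal names):

```python
def is_good_mapping(pair):
    """True only if a read pair meets all four 'good mapping' conditions.

    `pair` is a dict with illustrative boolean keys: both_aligned,
    proper_orientation, same_contig, within_contig.
    """
    conditions = ("both_aligned", "proper_orientation",
                  "same_contig", "within_contig")
    return all(pair[k] for k in conditions)  # failing any one makes it "bad"
```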
+
+## Contact
+If you have questions about the information in this worksheet, please contact:
+
+```
+Nathaniel Maki
+Bioinformatics Training Specialist
+MDI Biological Laboratory
+nmaki[at]mdibl.org
+```
\ No newline at end of file
diff --git a/bioinformatics/worksheets/transcriptome_assessment_worksheet.md b/bioinformatics/worksheets/transcriptome_assessment_worksheet.md
new file mode 100644
index 0000000..55afaf2
--- /dev/null
+++ b/bioinformatics/worksheets/transcriptome_assessment_worksheet.md
@@ -0,0 +1,347 @@
+---
+title: Introduction to Transcriptome Assessment and Analysis
+author: Nathaniel Maki
+organization: MDIBL Computational Core
+date: March 20th, 2021
+---
+
+# Introduction to Transcriptome Assessment and Analysis
+
+## Learning Objectives
+
+* Become familiar with some of the exploratory post-assembly assessment and annotation processes
+* Acquire a fundamental working knowledge of the tools used for the above processes, their necessity, and how to interpret their output
+* Gain hands-on experience by applying your new-found skills to your own assembly
+
+### Step 0: Acquiring Files
+
+Before we can begin this lesson, we need to copy down a suite of template files from `paramfile_templates` down to the `qsub` folder we created (since we need all the files in that dir, you can use the `cp *.json` command to bring them all down):
+
+* `/mnt/courses/biol2566/software/compbio_commander/paramfile_templates`
+ * `bowdoin_cdhit_template.json`
+ * `bowdoin_rnaspades_template.json`
+ * `bowdoin_soapdenovo_127mer_template.json`
+ * `bowdoin_soapdenovo_31mer_template.json`
+ * `bowdoin_transdecoder_longorfs_template.json`
+ * `bowdoin_transdecoder_predict_template.json`
+ * `bowdoin_transrate_template.json`
+
+Once those files are in your qsub directory, we'll move on to the first component, Assessment
+
+**Note for the below Steps**
+
+While these are described in steps, the order in which we do 1 and 2 can be reversed
+
+### Step 1: Assessment
+
+#### Trinitystats
+
+Built into Trinity, provides some basic metrics on our assembly
+
+```
+################################
+## Counts of transcripts, etc.
+################################
+Total trinity 'genes': 172
+Total trinity transcripts: 188
+Percent GC: 44.52
+
+########################################
+Stats based on ALL transcript contigs:
+########################################
+
+ Contig N10: 4734
+ Contig N20: 3158
+ Contig N30: 2711
+ Contig N40: 2361
+ Contig N50: 2010
+
+ Median contig length: 527.5
+ Average contig: 1032.97
+ Total assembled bases: 194198
+
+
+#####################################################
+## Stats based on ONLY LONGEST ISOFORM per 'GENE':
+#####################################################
+
+ Contig N10: 3732
+ Contig N20: 2989
+ Contig N30: 2621
+ Contig N40: 2187
+ Contig N50: 1939
+
+ Median contig length: 461
+ Average contig: 970.77
+ Total assembled bases: 166972
+```
+
+Contains information about contig length distributions, based on all transcripts and only on the longest isoform per gene
+
+#### Transrate
+
+Transrate is a piece of software for *de novo* transcriptome assembly quality analysis
+
+It's capable of providing detailed reports: it examines your assembly, compares it to experimental evidence (the initial sequencing reads), and writes out quality scores for the assembly and its contigs
+
+Also has the capability to merge together multiple assemblies from varied assemblers, and conduct scoring on the resulting amalgamation
+
+Overall, it analyzes an assembly in three ways
+
+* Inspection of contig sequences (what the transcriptome assembly is composed of)
+* Mapping Reads to contigs, and inspecting how well they align (how closely does your resulting assembly match the data that was used to generate it)
+* Aligning the contigs against proteins or transcripts from related species, inspecting alignments
+
+The most *useful* metrics are the ones based upon read mapping
+
+* Transrate Assembly score
+* Optimized Assembly score
+* Individual contig scores
+
+#### Step 1.1: Editing `bowdoin_transrate_template.json`
+
+For this specific run, just make the appropriate changes where CAPS are present
+
+#### Step 1.2: Launching `bowdoin_transrate_template.json` through Commander
+
+The process here is identical to running Trinity:
+
+* `/mnt/courses/biol2566/software/compbio_commander/commander --sge --preflight bowdoin_transrate_template.json`
+* Then `qsub` the generated `transrate.sh` file
+
+#### Step 1.3: Examining Output
+
+* Looking at the *.o##### log file
+
+##### Transrate Score
+
+The most useful metric: it measures the quality of the assembly *without* using a reference
+
+* Score is generated for the entire assembly, and for each contig, with the scoring process using the reads that were used to build the assembly as evidence
+* Provides you with the capability to compare multiple assemblies based off of the same reads
+ * an increase in your score most likely corresponds to an assembly with higher biological accuracy
+ * captures how confident you can be in what was assembled, and how "complete" your transcriptome is
+ * Scales from 0 to 1.0 (maximum)
+
+Expression-weighted quality score
+
+* Score for each contig is multiplied by its relative expression before being included in assembly score (low weight assigned to poorly expressed contigs)
+* More generous to assemblies with poorly assembled contigs of low expression
+ * stored in the `assemblies.csv` file
+
+```
+TRANSRATE ASSEMBLY SCORE 0.0573
+-----------------------------------
+TRANSRATE OPTIMAL SCORE 0.1539
+TRANSRATE OPTIMAL CUTOFF 0.3685
+good contigs 81
+p good contigs 0.43
+```
+
+##### Contig Score
+
+Stored in the `contigs.csv` file, each contig gets assigned a score by measuring how well it's supported by read evidence
+
+Four components to the score
+
+* Measure of correct base call
+* Measure of whether each base is part of the transcript
+* Probability that the contig is derived from a single transcript (and not pieces of two or more)
+* Probability that the contig is structurally complete and accurate
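
Treating the four components as probabilities, one natural way to combine them is multiplicatively, so any weak component drags the whole contig score down. This is a sketch of that idea only; check the Transrate paper for the exact formula it uses:

```python
def contig_score(components):
    """Combine per-contig component probabilities into one score in [0, 1].

    Illustrative multiplicative combination (an assumption here, not a
    quote of Transrate's implementation).
    """
    score = 1.0
    for p in components:
        score *= p  # a single low component sharply lowers the product
    return score
```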
+
+##### Optimized Assembly Score
+
+Using contig scores, bad contigs are filtered out from your assembly, leaving only those that are well assembled
+
+* Done automatically, by learning contig score cutoff that maximizes assembly score
+* Good contigs determined by the above optimization are in the good.*.fa file
+* Bad contigs are in the bad.*.fa file
+
+##### Contig Metrics
+
+* Measured based upon analyzing the set of contigs themselves
+* Useful as a quick way of detecting massive issues with your assembly, namely very small or very large numbers of contigs
+ * those that are biologically improbable
+
+```
+Contig metrics:
+-----------------------------------
+n seqs 188 number of contigs in assembly
+smallest 201 size of smallest contig
+largest 5258 size of largest contig
+n bases 194198 number of bases included in assembly
+mean len 1032.97 mean length of the contig
+n under 200 0 number of contigs < 200 bases
+n over 1k 67 number of contigs > 1K bases
+n over 10k 0 number of contigs > 10K bases
+n with orf 79 number of contigs that had an ORF
+mean orf percent 69.84 for contigs with ORF, mean % of the contig covered by ORF
+n90 351 largest contig size at which at least X% of bases are contained in contigs at least this length
+n70 1289 *
+n50 2010 *
+n30 2711 *
+n10 4734 *
+gc 0.45 % of bases that are G or C
+bases n 0 number of N bases
+proportion n 0.0 proportion of bases that are N
+Contig metrics done in 0 seconds
+Calculating read diagnostics...
+```
+
+##### Read Mapping Metrics
+
+* Based upon aligning the reads used in assembly to the assembled contigs
+* The mapped reads contain a large amount of information specific to the organism that was sequenced, and this info can be leveraged to evaluate the confidence in each base and contig in your resulting assembly (you're essentially mapping the source material on to your transcriptome, looking to see how much was preserved/properly reconstructed)
+
+With the `--left` and `--right` option enabled, transrate:
+
+* Maps the provided reads to the assembly using SNAP
+* Infers the most likely contig of origin for any multi-mapping reads with Salmon
+* Inspects the resulting alignment with transrate-tools and evaluates each contig in the assembly
+
+```
+Read mapping metrics:
+-----------------------------------
+fragments 535815 number of read pairs provided
+fragments mapped 396882 total number of read pairs mapping
+p fragments mapped 0.74 proportion of provided read pairs that mapped successfully
+good mappings 189659 number of read pairs mapping in a way indicating a good assembly
+p good mapping 0.35 proportion of the above
+bad mappings 207223 number of read pairs mapping in a way indicating a poor assembly
+potential bridges 41 number of potential links between contigs that are supported by the reads
+bases uncovered 12027 number of bases that aren't covered by any reads
+p bases uncovered 0.06 proportion of the above
+contigs uncovbase 143 number of contigs that contain at least one base with no read coverage
+p contigs uncovbase 0.76 proportion of the above
+contigs uncovered 40 number of contigs that have a mean per-base read coverage of < 1
+p contigs uncovered 0.21 proportion of the above
+contigs lowcovered 82 number of contigs that have a mean per-base read coverage of < 10
+p contigs lowcovered 0.44 proportion of the above
+contigs segmented 8 number of contigs that have a >=50% estimated chance of being segmented
+p contigs segmented 0.04 proportion of the above
+Read metrics done in 10 seconds
+No reference provided, skipping comparative diagnostics
+```
+
+What makes a "good" mapping?
+
+* Both members of the read pair are aligned
+* in proper orientation
+* on the same contig
+* without overlapping either end of the contig
+
+Your mapping is "poor" if any of the above conditions aren't met
+
+The core stats that make up that file will be located here in the following directory
+
+* `/mnt/courses/biol2566/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789/transrate/`
+
+The names of the directories that have been auto-generated rely upon the parameters given in the json script
+
+Some of these folders may differ depending upon what you called them
+
+Assuming that everything has run properly, running an `ls` in the `transrate` directory should yield two results
+
+* assemblies.csv
+* /Trinity/ directory
+ * contigs.csv
+
+### Step 2: Reduction
+
+#### CD-HIT-EST
+
+A program (from the CD-HIT suite) primarily used to cluster and compare protein and/or nucleotide sequences, massively reducing the number of computational cycles required for downstream tasks
+
+It clusters nucleotide sequences that exceed a similarity threshold, building a fasta file of representative sequences (reducing redundancy) and a text file listing the clusters
+
+#### Step 2.1: Editing `bowdoin_cdhit_template.json`
+
+For this specific run, just make the appropriate changes where CAPS are present
+
+#### Step 2.2: Launching `bowdoin_cdhit_template.json` through Commander
+
+* Same as for the Transrate script
+
+#### Step 2.3: Examining Output
+
+Located here:
+
+* `/mnt/courses/biol2566/people/nmaki/analysis/Bowdoin/jcoffman_001.reduced/0123456789`
+
+Generates two files:
+
+* jcoffman_001.reduced (reduced transcript file)
+* jcoffman_001.reduced.clstr (list of clusters found)
+
+```
+>Cluster 0
+0 5258aa, >TRINITY_DN35_c0_g1_... *
+>Cluster 1
+0 4734aa, >TRINITY_DN102_c0_g1... *
+>Cluster 2
+0 3306aa, >TRINITY_DN27_c0_g2_... at 99.64%
+1 3306aa, >TRINITY_DN27_c0_g2_... at 99.18%
+2 3749aa, >TRINITY_DN27_c0_g2_... *
+3 3749aa, >TRINITY_DN27_c0_g2_... at 99.65%
+4 247aa, >TRINITY_DN76_c0_g1_... at 94.33%
+>Cluster 3
+0 1953aa, >TRINITY_DN47_c0_g1_... at 97.54%
+1 3732aa, >TRINITY_DN47_c0_g1_... *
+```
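+
+Once the run finishes, a quick way to summarize the `.clstr` file is to count cluster headers and member lines with `grep`/`awk`. A sketch against a toy file standing in for jcoffman_001.reduced.clstr (the format mirrors the excerpt above):
+
+```
+cd "$(mktemp -d)"
+
+# Toy .clstr file: 2 clusters, 3 member sequences total
+cat > toy.clstr <<'EOF'
+>Cluster 0
+0 5258aa, >TRINITY_DN35_c0_g1... *
+>Cluster 1
+0 3306aa, >TRINITY_DN27_c0_g2... *
+1 3749aa, >TRINITY_DN27_c0_g2... at 99.65%
+EOF
+
+# Number of clusters = number of '>Cluster' header lines
+grep -c '^>Cluster' toy.clstr                                              # → 2
+
+# Average members per cluster = member lines / header lines
+awk '/^>Cluster/ {c++; next} {m++} END {printf "%.1f\n", m/c}' toy.clstr   # → 1.5
+```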
+
+The tool has two modes, global and local
+
+* global computes sequence identity as the number of identical bases divided by the length of the shorter sequence
+* local computes it as the number of identical bases divided by the length of the alignment
+
+Sequence identity must exceed the chosen threshold for two sequences to be placed in the same cluster
+
+Used as a reduction technique, though you run the risk of merging biologically interesting but similar sequences
+
+* The upshot is dropping potentially redundant isoforms that would otherwise impact the quality of your assembly
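+
+To make the global mode concrete, here is a toy per-base comparison in bash — a deliberate simplification, since real CD-HIT clustering uses short-word counting rather than a naive positional scan:
+
+```
+seq1="ACGTACGTAC"   # length 10
+seq2="ACGTACGTAA"   # same length here; differs at the final base
+
+# Count positions where the two sequences agree
+matches=0
+for ((i = 0; i < ${#seq2}; i++)); do
+  if [ "${seq1:i:1}" = "${seq2:i:1}" ]; then
+    matches=$((matches + 1))
+  fi
+done
+
+# Global identity = identical bases / length of the shorter sequence
+echo "$matches/${#seq2} bases identical"   # → 9/10 bases identical
+```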
+
+### Step 3: Exploration
+
+#### TransDecoder
+
+Identifies candidate protein coding regions within transcripts, allowing you to stage sequences in blastp/blastx for functional discovery
+
+Based upon the following criteria:
+
+* minimum length ORF found in transcript
+* log-likelihood score being > 0 (i.e., the ORF looks more like coding sequence than random sequence)
+* the above coding score being greatest when the ORF is scored in the 1st reading frame, compared to scores in the other 2 forward reading frames
+* if a candidate ORF is fully encapsulated by the coordinates of another candidate ORF, the longer one is reported
+ * A single transcript can therefore report multiple ORFs (chimeras, etc.)
+
+First we need to extract the ORFs from our transcript assembly
+
+* TransDecoder defaults to identifying ORFs that are at least 100 AA long (this threshold can be lowered)
+ * lowering it increases the rate of false-positive ORF predictions
+
+```
+longest_orfs.pep : all ORFs meeting the minimum length criteria, regardless of coding potential.
+longest_orfs.gff3 : positions of all ORFs as found in the target transcripts
+longest_orfs.cds : the nucleotide coding sequence for all detected ORFs
+
+longest_orfs.cds.top_500_longest : the top 500 longest ORFs, used for training a Markov model for coding sequences.
+
+hexamer.scores : log likelihood score for each k-mer (coding/random)
+
+longest_orfs.cds.scores : the log likelihood sum scores for each ORF across each of the 6 reading frames
+longest_orfs.cds.scores.selected : the accessions of the ORFs that were selected based on the scoring criteria (described at top)
+longest_orfs.cds.best_candidates.gff3 : the positions of the selected ORFs in transcripts
+```
+
+You also have the option to identify ORFs that have homology to known proteins through blastp queries
+
+Once you've extracted your ORFs, you can predict likely coding regions using TransDecoder.Predict
+
+* Normally, the final set of candidate coding regions carries the '.transdecoder' tag and consists of files with extensions .pep, .cds, .gff3, and .bed
+
+```
+transcripts.fasta.transdecoder.pep : peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed.
+transcripts.fasta.transdecoder.cds : nucleotide sequences for coding regions of the final candidate ORFs
+transcripts.fasta.transdecoder.gff3 : positions within the target transcripts of the final selected ORFs
+transcripts.fasta.transdecoder.bed : bed-formatted file describing ORF positions, best for viewing using GenomeView or IGV.
+```
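+
+As a sanity check once TransDecoder.Predict completes, the number of final ORFs is simply the number of FASTA headers in the `.pep` file. A sketch against a toy file with simplified headers — swap in your real transcripts.fasta.transdecoder.pep:
+
+```
+cd "$(mktemp -d)"
+
+# Toy peptide FASTA standing in for transcripts.fasta.transdecoder.pep
+cat > toy.pep <<'EOF'
+>TRINITY_DN35_c0_g1.p1 type:complete
+MSTAVLENPGLGRKLS
+>TRINITY_DN27_c0_g2.p1 type:5prime_partial
+MKVLISAALP
+EOF
+
+# Each '>' header line is one reported ORF
+grep -c '^>' toy.pep   # → 2
+```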
diff --git a/cli_workshops_2020/images/images_workshop_1/powershell.png b/cli_workshops_2020/images/images_workshop_1/powershell.png
index d70497c..5b7839e 100644
Binary files a/cli_workshops_2020/images/images_workshop_1/powershell.png and b/cli_workshops_2020/images/images_workshop_1/powershell.png differ
diff --git a/cli_workshops_2020/images/images_workshop_1/search_powershell.png b/cli_workshops_2020/images/images_workshop_1/search_powershell.png
index 1ea5d45..48686f9 100644
Binary files a/cli_workshops_2020/images/images_workshop_1/search_powershell.png and b/cli_workshops_2020/images/images_workshop_1/search_powershell.png differ
diff --git a/cli_workshops_2020/images/images_workshop_1/start_menu.png b/cli_workshops_2020/images/images_workshop_1/start_menu.png
index ab30037..bc0f657 100644
Binary files a/cli_workshops_2020/images/images_workshop_1/start_menu.png and b/cli_workshops_2020/images/images_workshop_1/start_menu.png differ
diff --git a/computational_skills/commandline_interface/alternative_software.md b/computational_skills/commandline_interface/alternative_software.md
new file mode 100644
index 0000000..baf1a84
--- /dev/null
+++ b/computational_skills/commandline_interface/alternative_software.md
@@ -0,0 +1,35 @@
+---
+title: Overview of Alternative Software
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "January 12th"
+---
+
+# Overview of Alternative Software
+
+## Learning Objectives
+* Provide a brief introduction to a few supplementary/alternative pieces of software, tailored towards Windows users
+* Installation and/or exploration of PuTTY and FileZilla
+
+## Summary
+* PuTTY offers a Windows-Version agnostic method of connecting to a remote machine via SSH
+* FileZilla provides a way of transferring files between local and remote machines, when command line tools such as `rsync` may not be available (or desired)
+
+### PuTTY + FileZilla
+
+PuTTY is a free implementation of SSH for Windows, and will let you easily access your remote Amazon instance
+* To install PuTTY, follow this [Link](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) and select the 64-bit MSI installer
+* To connect, launch PuTTY and paste your Amazon instance name under `Host Name` and click `Open`
+
+
+
+* Click `Yes` on the initial prompt asking about ECSA key, and then enter your username and password to login
+
+
+
+FileZilla, on the other hand, is a (free) FTP tool, letting you move data between remote machines and local
+* To install FileZilla, follow this [Link](https://filezilla-project.org/download.php?type=client), click `Download FileZilla Client`, and choose the standard FileZilla "version"
+ * When running through the installer, be sure to choose `no` on the additional installation of McAfee
+* To connect, launch FileZilla and paste your Amazon instance name under `Host`, followed by entering your username and password under `Username` and `Password` respectively, and entering 22 for `Port`, and selecting the `Quickconnect` button
+
+
\ No newline at end of file
diff --git a/computational_skills/commandline_interface/cli_exercises.md b/computational_skills/commandline_interface/cli_exercises.md
new file mode 100644
index 0000000..d84d3a8
--- /dev/null
+++ b/computational_skills/commandline_interface/cli_exercises.md
@@ -0,0 +1,102 @@
+---
+title: "Command Line Worksheet: Testing CLI Proficiency"
+author: Nathaniel Maki
+created: 09-29-2021
+---
+
+
+# Command Line Worksheet
+
+## Introduction:
+
+The Bioinformatics Core at MDIBL has put together a short document to help you brush up on your Unix skills. We'll start with first getting you set up on your remote computer and gradually move through useful commands, with links in our Orientation page to relevant third-party and in-house documentation should you need a refresher.
+
+These modules will be loosely based upon the following [documentation](https://ngs-docs.github.io/2021-august-remote-computing/introduction-to-the-unix-command-line.html)
+
+If you get stuck on an exercise or are unsure about how to proceed, don't hesitate to check the above link or post in the Discussion board on LabCentral
+
+We will also be hosting office hours at set times throughout the week, where you can drop in and ask any questions you may have
+
+## Module 0: SSH and Accessing Your Remote Machine
+
+* Launch your terminal (macOS) / PowerShell (Windows)
+* Log into your provisioned AWS machine using provided key / login credentials
+
+## Module 1: Navigation on the Command Line
+
+### Exercise 1.0
+
+* Find your current working directory
+
+### Exercise 1.1
+
+* List contents of current directory
+* List *all* contents of current directory
+* Learn more about the list command
+* Clear the terminal
+
+### Exercise 1.2
+
+* Move into the hidden *ghost* directory
+* Get current working directory
+* List all contents of *ghost* directory
+* Move back to a directory above
+* Move into *example_dir* directory
+* List all contents of *example_dir*
+* Move back to your home directory, using shorthand
+* Access ```man``` pages of a chosen CLI program
+
+## Module 2: File and Folder Manipulation
+
+### Exercise 2.1
+
+* Move back into *ghost* directory
+* Create a folder called test_dir
+* Create a file called test1.txt inside of the *ghost* directory
+* Move test1.txt into test_dir
+* Move into test_dir, copy test1.txt to test2.txt
+* Now migrate out of test_dir, up one directory
+* Move test_dir and its contents into your home directory
+
+### Exercise 2.2
+
+* Recursively copy test_dir to test_dir2
+* Move into test_dir2
+* Remove both test*.txt files
+* Move out of test_dir2
+* Remove the test_dir2 directory
+
+## Module 3: Content Visualization
+
+### Exercise 3.1
+
+* From your home directory, print the first 10 lines from the README file
+* Print all contents of the README file to the CLI
+* Run the same command without an input, and cancel the program invocation
+* View all contents of the README file in an interactive program
+
+## Module 4: File Permissions and Editing
+
+### Exercise 4.1
+
+* Check permissions of the contents of your root dir
+* Move into test_dir1
+* Open the test1.txt file in nano and add some text, saving changes, and exit
+* Using the wildcard character, remove both text files
+* Navigate up a directory and remove test_dir
+* Using redirection, store the contents of your directory into a text file
+* Pipe ls -l into less
+* Print out the `history` of your command line entries to the terminal
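+
+If you'd like a syntax refresher before attempting the redirection and piping tasks, here is a generic bash sketch (not the worksheet answers), run in a throwaway directory:
+
+```
+cd "$(mktemp -d)"    # scratch directory, so nothing real is touched
+touch a.txt b.txt
+
+ls > listing.txt     # redirection: send stdout into a file
+ls -l | head -n 2    # piping: feed one command's output into another
+```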
+
+## Module 5: Introduction to tmux
+
+### Exercise 5.1
+
+* Start a tmux session
+* Quit your tmux session
+* Re-enter tmux, and detach from current session
+* View active sessions
+* Re-attach to your session, and split the pane
+* Move into new pane
+* Kill current pane
+* Kill all tmux sessions
\ No newline at end of file
diff --git a/computational_skills/commandline_interface/cli_orientation.md b/computational_skills/commandline_interface/cli_orientation.md
new file mode 100644
index 0000000..c5e704f
--- /dev/null
+++ b/computational_skills/commandline_interface/cli_orientation.md
@@ -0,0 +1,113 @@
+---
+title: "Command Line Orientation: Building CLI Proficiency"
+author: Nathaniel Maki
+created: 09-07-2021
+---
+
+# Command Line Task Orientation: Building CLI Proficiency
+
+## Introduction
+
+Hello! Welcome to our Command Line Orientation! This document is meant to act as a guide to the command line, with links to useful resources, videos, and additional documentation for you to access and review. We've also included a link to a short worksheet, meant to test your CLI skills. For parity of experience, we've spun up personal Amazon compute instances for each user. This lets every student follow along, regardless of their local operating system.
+
+Included under each module will be links to relevant material, provided by both in-house and external authors.
+
+## Available Materials and References
+
+* For our in-house Command Line markdown documentation, select this [link](https://compbio.mdibl.org/resources/tutorials/computational_skills/commandline_interface/documents).
+* Click this [link](https://www.youtube.com/channel/UCEGqL6Li_k2DX86a2L6DAoQ/videos) to access useful Command Line videos by the Hubbard Center for Genome Studies. These will be helpful for all Modules.
+* This [link](https://ngs-docs.github.io/2021-august-remote-computing/introduction-to-the-unix-command-line.html) also contains a useful guide to the command line, and from which the exercises somewhat follow.
+* Lastly, this [link](https://medium.com/hackernoon/a-gentle-introduction-to-tmux-8d784c404340) provides a fantastic introduction to the ```tmux``` program (which we'll cover in class).
+ * An alternative [link](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) if Medium is not available
+ * Tmux cheat sheet [link](https://tmuxcheatsheet.com/)
+
+## Module 0: SSH and Accessing Your Remote Machine
+
+This module will cover the basics of locating your Terminal in macOS or Linux, and in the Windows OS, PowerShell. These tools act as a first step towards becoming acquainted with the Command Line Interface. For the purpose of this workshop, all instruction will be based upon and conducted on your Amazon Web Services machine. This way, all students will be able to have a uniform experience, regardless of local hardware. Before continuing, you'll need to locate your terminal and connect to your remote machine.
+
+## Locating Your Terminal
+
+Before this workshop can begin in earnest, we must first locate our terminal and connect to our remote AWS machine.
+
+For ease of use, we've broken Module 0 up into OS-specific portions
+
+## Module 0: macOS
+
+### Launching a terminal
+
+From Spotlight Search, type **terminal**. The terminal application should be the top result.
+
+You can also launch the terminal from the application launcher or menu.
+
+## Module 0: Windows
+
+### Launching PowerShell in Windows
+
+Open the Start Menu, and search for PowerShell, then select.
+
+## SSH and Connecting to your AWS Machine
+
+Launching an ssh session is accomplished with the `ssh` command. `ssh` accepts the destination of the remote machine as an argument.
+
+You'll have been provided credentials to access your remote AWS machine
+
+Connecting will look similar to the `ssh` command below
+
+`ssh minota@ec2-3-138-120-84.us-east-2.compute.amazonaws.com`
+
+Your terminal/PowerShell window should look like the following.
+
+Once logged in, you can begin with the next Module in this workshop
+
+## Module 1: Navigation on the Command Line
+
+The next module introduces basic commands and operations that can be carried out on the CLI. It includes listing directory contents, moving around on the command line, and accessing the manual of a given program. Before proceeding with this module, Module 0 must be completed.
+
+### Required Skills:
+
+* Find your current working directory
+* List all files/folders, use arguments to show hidden files/file details
+* Move throughout the filesystem
+* Work with both relative and absolute paths
+
+## Module 2: File and Folder Manipulation
+
+This module covers creating files and folders, moving, copying, renaming, and removing files and folders.
+
+### Required Skills:
+
+* Create a file
+* Create a folder
+* Move a file from one directory to another
+* Move a directory from one directory to another
+* Copy files and folders
+* Rename files and folders
+* Remove files and folders
+
+## Module 3: Content Visualization
+
+This module covers file visualization, using basic and interactive programs.
+
+### Required Skills:
+
+* Print the contents of a file to terminal
+* Program cancellation
+* Open file in a CLI application
+
+## Module 4: File Permissions and Editing
+
+This module encompasses using a text editor program to modify files, using wildcards, viewing permissions, and piping.
+
+### Required Skills:
+
+* View permissions
+* Edit files and folders, using wildcard character
+* Command redirection
+* Piping
+
+## Module 5: Introduction to tmux
+
+The last module covers working with tmux and actions such as attaching and detaching sessions, splitting, and moving between open sessions.
+
+### Required Skills:
+
+* tmux session management
\ No newline at end of file
diff --git a/computational_skills/commandline_interface/configuring_ssh.md b/computational_skills/commandline_interface/configuring_ssh.md
new file mode 100644
index 0000000..7a1fa33
--- /dev/null
+++ b/computational_skills/commandline_interface/configuring_ssh.md
@@ -0,0 +1,84 @@
+---
+title: Configuring SSH
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "January 12th"
+---
+
+# Configuring SSH
+
+## Learning Objectives
+* Create an SSH config file in both Windows and Linux
+* Edit the config file with an SSH host
+* Use the alias defined in the config file with SSH
+
+This is a short quality-of-life guide for anyone who doesn't have prior experience working with SSH configuration files
+
+A config file saves you from having to remember long and unwieldy remote machine names, and makes connecting to them via SSH that much easier
+
+This is primarily targeted towards Windows users, but there is a portion for those on Linux/Unix systems as well
+
+## Windows
+
+Open PowerShell and type in the following command:
+* `cd ~/.ssh`
+* This brings you to your "hidden" ssh directory
+
+Use `ls` to check the contents of the folder
+
+
+
+If there is no config file present, create one using:
+* `ni config`
+
+Then, open it with:
+* `notepad config`
+
+Here is where you'll give your host machine a more concise name
+
+
+
+You can see in my example (ignore the first `Host jenkins` entry) that I named the `Host` smcc_aws
+* `Hostname` is where you'd paste the full length name of your Amazon machine
+* `User` is whichever account you use to log into that machine
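+
+Putting those three fields together, an entry looks like the sketch below — the `Hostname` value is a placeholder, so substitute your actual Amazon machine name and account:
+
+```
+Host smcc_aws
+    Hostname ec2-XX-XXX-XXX-XX.us-east-2.compute.amazonaws.com
+    User your_username
+```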
+
+Save your changes and go back to your PowerShell window
+
+Now when you go to remote in, instead of typing in `ssh` `user@ec2.....`, simply provide `ssh` followed by whatever you gave as a name for the `Host` in your config file
+
+It should look something like this
+
+
+
+## Linux/macOS
+
+Open a terminal and type in the following command:
+* `cd ~/.ssh`
+
+Use `ls -al` to check the contents of the folder
+
+
+
+If there is no config file present, create one using:
+* `touch config`
+
+Then, open it with:
+* `nano config`
+
+This launches the `nano` text editor within the terminal
+
+The format of this file is identical to the one mentioned above for Windows machines
+
+
+
+To save your changes to the config file:
+* On both Windows and macOS keyboards, use `ctrl` + `o`, followed by enter to confirm
+
+To exit out of `nano`:
+* On both Windows and macOS keyboards, use `ctrl` + `x`
+
+To remote in, just type `ssh` followed by whatever you gave as a name for the `Host` in your config file
+
+
\ No newline at end of file
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/config_notepad.png b/computational_skills/commandline_interface/configuring_ssh_img/config_notepad.png
new file mode 100644
index 0000000..2bcb897
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/config_notepad.png differ
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/powershell_config_ssh.png b/computational_skills/commandline_interface/configuring_ssh_img/powershell_config_ssh.png
new file mode 100644
index 0000000..2f4192b
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/powershell_config_ssh.png differ
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/powershell_ssh_ls.png b/computational_skills/commandline_interface/configuring_ssh_img/powershell_ssh_ls.png
new file mode 100644
index 0000000..9aff734
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/powershell_ssh_ls.png differ
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/terminal_config_ssh.png b/computational_skills/commandline_interface/configuring_ssh_img/terminal_config_ssh.png
new file mode 100644
index 0000000..f2aef1c
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/terminal_config_ssh.png differ
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/terminal_nano.png b/computational_skills/commandline_interface/configuring_ssh_img/terminal_nano.png
new file mode 100644
index 0000000..f44cdb8
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/terminal_nano.png differ
diff --git a/computational_skills/commandline_interface/configuring_ssh_img/terminal_ssh_ls.png b/computational_skills/commandline_interface/configuring_ssh_img/terminal_ssh_ls.png
new file mode 100644
index 0000000..22da15a
Binary files /dev/null and b/computational_skills/commandline_interface/configuring_ssh_img/terminal_ssh_ls.png differ
diff --git a/computational_skills/commandline_interface/intro_to_WSL.md b/computational_skills/commandline_interface/intro_to_WSL.md
new file mode 100644
index 0000000..ca5e03f
--- /dev/null
+++ b/computational_skills/commandline_interface/intro_to_WSL.md
@@ -0,0 +1,189 @@
+---
+title: Introduction to the Windows Subsystem for Linux
+author: "Nathaniel Maki"
+organization: MDIBL Computational Core
+date: "May 28th"
+---
+# Introduction to the Windows Subsystem for Linux
+
+## Learning Objectives
+
+* Introduction to PowerShell
+* Learn how to enable the WSL feature-set
+* Install the Ubuntu Linux distribution from the Microsoft Store
+* Launch Ubuntu Linux and install a small piece of software
+
+## Summary
+
+* PowerShell, an integrated Command Line Interface (CLI) within Windows, has the capability to connect to remote machines via SSH
+* For Windows users, PowerShell acts as a robust environment in which to establish basic command line proficiency
+* If looking for a more comprehensive "Unix-on-Windows" experience, install and work within the Windows Subsystem for Linux (WSL)
+
+## A Couple Caveats
+
+Before we begin, there are a couple things that I'd like to point out, mostly to save you time
+
+**If all you're looking to do is to remote into a server via SSH, then enabling WSL and installing Ubuntu Linux is probably overkill**
+
+* If your Windows 10 release version is **1809** or higher, and you are running **PowerShell 5.1** or higher, you already have SSH capabilities
+* This means that you can remotely connect to a server via PowerShell without any additional installations
+
+If you're on a Windows OS earlier than Windows 10 1809 (Windows 7, 8, 8.1, etc.), you'll have to enable SSH by installing the OpenSSH feature into PowerShell
+
+### Launching PowerShell
+
+Open the start menu
+
+
+
+Search for PowerShell and select from menu
+
+
+
+To check which version of PowerShell you have installed, run the following command:
+ * `Get-Host | Select-Object Version`
+
+
+
+To see which version of Windows you're on, enter `winver` into the PowerShell command line
+
+
+
+If you meet all of the criteria above, feel free to remote away :blush:
+
+If not, you'll need to install the OpenSSH feature, or you can work with PuTTY and FileZilla to get similar functionality
+
+### Installing OpenSSH into PowerShell (For Windows 7 and Later)
+
+First search for `Optional Features` on the Windows start menu
+
+* Select `+ Add a feature`
+* Search the list for `OpenSSH Client`
+* Select `Install`
+* After install completes, reboot
+
+For those on PowerShell, if you'd like a nice quality of life upgrade, you can install the Windows Terminal from the Microsoft Store
+
+Windows Terminal is a fast, customizable, and modern terminal application, specifically built for Windows 10
+
+It acts as a hub for Command Prompt, PowerShell, and WSL, and natively supports many terminal features that can be found on MacOS/Linux distributions
+
+## Windows Subsystem for Linux
+
+WSL gives you access to a fully featured Linux environment within the Windows operating system. No separate virtual machine software is required (the capability is already integrated)
+
+This lets you leverage the wide range of professional programs developed for Windows, along with the vast repositories of free and open source software built on and for Linux
+
+Most importantly, it brings Windows users to functional command line parity with macOS and Linux, by providing them with a native Unix shell
+
+WSL allows you to run many common command-line tools and capabilities, including:
+
+* `rsync`, `grep`, `awk`, `sed`
+* Execution of Bash shell scripts
+* Linux CLI apps like tmux, vim, and emacs
+* Language support for Python, Ruby, NodeJS, etc.
+* The selected Linux distribution's package manager, for installing additional software/tools
+
+At the moment, you are constrained to command-line tools and applications, though Microsoft is working to bring full GUI applications to the platform.
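+
+Once inside a WSL shell, those tools behave exactly as they would on macOS or native Linux — a quick smoke test:
+
+```
+# Pipe a tiny CSV through awk, as you would on any Unix system
+printf 'gene,count\ntp53,42\n' | awk -F, 'NR > 1 {print $1 "=" $2}'   # → tp53=42
+```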
+
+## Enabling WSL
+
+WSL 1 is supported on Windows 10 Version 1709 and higher
+
+If you are running an earlier version, you need to update your system to gain access to this feature
+
+With the above requirement met, turning on WSL is fairly straightforward, especially as it comes baked into Windows
+
+### If you are comfortable working on the command line
+
+* Open an elevated (administrator) PowerShell window
+ * To do so, right click on the **Windows PowerShell** application and select *Run as administrator*
+* Next, paste the following code into your open, elevated PowerShell window:
+ * `dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart`
+
+### If you'd rather use the Graphical User Interface (GUI)
+
+Open the start menu and search for *turn Windows features on or off*
+
+
+
+Scroll down and check the box marked *Windows Subsystem for Linux*
+
+
+
+Read below before clicking *OK*
+
+## WSL 1 / WSL 2
+
+The steps above outline the install process for WSL 1; however, there is an upgrade available called WSL 2 (the caveat being that you need a newer version of Windows).
+
+WSL 2 provides access to a full Linux kernel, as opposed to the cut-down (but still very functional) version in WSL 1. Additionally, the upgrade provides greater system call compatibility and a general performance uplift
+
+WSL2 is supported on Windows 10 Version 1903 and higher
+
+If you still have the `Turn Windows features on or off` window open, check the box marked `Virtual Machine Platform`
+
+* Reboot
+
+### If you'd rather use the command line
+
+Open an elevated (administrator) PowerShell window
+
+* To do so, right click on the **Windows PowerShell** application and select *Run as administrator*
+
+Next, paste the following code into your open, elevated PowerShell window:
+
+* `dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart`
+
+Restart your machine to finish the install and update to WSL2
+
+On reboot, open PowerShell and run the following command to set WSL2 as the default version
+
+* `wsl --set-default-version 2`
+
+## Installing a Linux Distribution
+
+Enabling WSL is the first step, the second being to choose a Linux distribution to install from the Microsoft Store
+
+Microsoft offers a few distributions of Linux, including Debian, OpenSUSE, Ubuntu, and Kali. For this example, we're going to work with Ubuntu.
+
+Developed by the company Canonical, Ubuntu is one of the most widely used and supported Linux distributions, in part due to its stability and robust feature-set. Its LTS (Long Term Support) versions are supported for up to 5 years after launch!
+
+There are a couple LTS Ubuntu distributions available, but we're going to want the one marked simply **Ubuntu**, with no numbers next to the name.
+
+This is built off of the most recent release of the OS, 20.04, which came out in April 2020.
+
+Open up the Microsoft Store and search `Ubuntu`
+
+
+
+* From there, just click *install*, and once it finishes, *launch*
+* Follow along with the on screen prompts (creating a username/password, etc)
+* **Great! You've successfully installed and configured Ubuntu Linux on your Windows machine :)**
+* For a quick test-run, try installing the program Neofetch
+ * Neofetch is a small tool that gives you an overview of your system resources, along with a neat little distribution-specific ASCII graphic
+ * First, add the repository (software source): `sudo add-apt-repository ppa:dawidd0811/neofetch`
+ * then run `sudo apt-get install neofetch`
+ * after the program is installed, run it by executing `neofetch` on the command line
+
+
+
+#### Navigating to the Windows Filesystem
+
+By default, when launching your installed Linux distribution through WSL, you will be placed into your Windows user directory
+
+To navigate to your home directory within Linux, use the `cd ~` command
+
+To go back and work with files/folders that exist on your Windows Filesystem, you can use the following command: `cd /mnt/`, followed by the letter of the drive where your data is stored on
+
+* For example, if you've saved your output data to the `C:` drive, then your command would look like this: `cd /mnt/c`. If you've saved to another drive, just replace `c` with that letter (`d`,`e`,etc)
+
+## Contact
+
+If you have questions about the information in this workshop document, please contact:
+
+```md
+Nathaniel Maki
+Bioinformatics Research Training Specialist
+MDI Biological Laboratory
+nmaki[at]mdibl.org
+```
\ No newline at end of file
diff --git a/intro_to_cwl.md b/computational_skills/commandline_interface/intro_to_cwl.md
similarity index 87%
rename from intro_to_cwl.md
rename to computational_skills/commandline_interface/intro_to_cwl.md
index 08361f1..3ca4a67 100644
--- a/intro_to_cwl.md
+++ b/computational_skills/commandline_interface/intro_to_cwl.md
@@ -7,6 +7,7 @@ date: "June 21st, 2020"
# Introduction to CWL
[//]: <> (TODO: add reference to man-pages)
+[//]: <> (TODO: update with reference images)
## Learning Objectives
@@ -16,24 +17,27 @@ date: "June 21st, 2020"
## Summary
-* The analysis work we do in the MDIBL Computational Core can be boiled down to a few key components/tools/languages, two of which we will be discussing today (CWL and JSON/YAML)
+The analysis work we do in the MDIBL Computational Core can be boiled down to a few key components/tools/languages, two of which we will be discussing today (CWL and JSON/YAML)
## CWL
-* In a nutshell, the Common Workflow Language, also known as CWL, serves as a medium in which to write descriptions of existing command line programs and tools, and then to string those descriptions into functional workflows
-* The language requires you to preemptively record inputs and expected outputs, but can be very flexible in what it takes in, and what results are given
+In a nutshell, the Common Workflow Language (also known as CWL) serves as a medium in which to write descriptions of existing command line programs and tools, and then to string those descriptions together into functional workflows
+
+The language requires you to preemptively record inputs and expected outputs, but can be very flexible in what it accepts, and what results are given
* The main benefit of using this language is its incredible portability, re-usability, and scalability
-* Instead of having to deal with the environment of your host machine to run your analysis / execute your tool, you can use integrated Docker calls to download and configure images of your tool at runtime
- * To run a pipeline written in CWL, essentially all you need to have installed is CWLTool and Docker
+
+Instead of having to deal with the environment of your host machine to run your analysis / execute your tool, you can use integrated Docker calls to download and configure images of your tool at runtime
+ * To launch a pipeline written in CWL, you only need to have CWLTool and Docker installed
* Important to note, edits do need to be made to the accompanying YAML/JSON configuration file to properly pass in whatever files and program parameters you want to run your pipeline with.
* However, you only need to do this once, and once edited, you have a saved record of what settings you used, and what tools, in what order, and which files were provided for input
-* Because of the above attributes, CWL has seen adoptions in fields such as Astronomy, Machine Learning, and Bioinformatics
+
+Because of the above attributes, CWL has seen adoption in fields such as Astronomy, Machine Learning, and Bioinformatics
**This tutorial assumes that you already have CWLTool and Docker installed locally or on a remote machine**
## YAML/JSON
-* These are human-readable data transfer languages, and interchangeable with one another (serve the same purpose, syntactically similar)
+These are human-readable data transfer languages, and interchangeable with one another (serve the same purpose, syntactically similar)
### JSON Example
```json
@@ -69,7 +73,7 @@ nthreads: 8
## Editing Configuration Files
-* Before we can start working with CWL directly, it's important to first get a grasp on how to edit the configuration files needed to successfully execute a wrapped tool
+Before we can start working with CWL directly, it's important to first get a grasp on how to edit the configuration files needed to successfully execute a wrapped tool
* As mentioned above, they can be written in either the JSON or YAML language
* To access the scripts and config files for this tutorial, clone the biocore_documentation repository found [here](https://github.com/mdibl/biocore_documentation)
1. To clone, simply click the `Clone` button on the main repo page, and copy the link provided in the `Clone with HTTPS` box
diff --git a/computational_skills/commandline_interface/intro_to_powershell.md b/computational_skills/commandline_interface/intro_to_powershell.md
new file mode 100644
index 0000000..288a822
--- /dev/null
+++ b/computational_skills/commandline_interface/intro_to_powershell.md
@@ -0,0 +1,47 @@
+# Introduction to PowerShell
+
+## Common Commands
+
+### Navigation:
+
+List everything in current directory
+* gci (get child item) / ls / dir
+
+Get current directory
+* pwd
+
+Moving between directories
+* sl (set location), cd
+
+Editing + creating a file:
+* ni (test.txt)
+* notepad (test.txt)
+
+Create a new directory:
+* mkdir
+
+Copying + moving a file:
+* cp test.txt dir
+* mv test.txt dir
+
+List contents of a file:
+* cat (filename)
+* gc test.txt (get content) / type test.txt
+
+Open current directory in explorer:
+* explorer .
+
+Removing items:
+* rm (file/folder)
+
+Wildcards:
+* `*.txt` matches test1.txt, test2.txt, test3.txt, etc.
+
+**tab autocompletion**
+
+Killing a running process:
+* ctrl + c
+
+Exiting PowerShell:
+* Exit
+
diff --git a/dataset_downloads_sra.md b/computational_skills/commandline_interface/intro_to_sra.md
similarity index 92%
rename from dataset_downloads_sra.md
rename to computational_skills/commandline_interface/intro_to_sra.md
index 80aa9c9..8c432f7 100644
--- a/dataset_downloads_sra.md
+++ b/computational_skills/commandline_interface/intro_to_sra.md
@@ -1,12 +1,11 @@
---
-title: Working with the SRA and the SRA Toolkit
+title: Introduction to the Sequence Read Archive and SRA Toolkit
author: "Nathaniel Maki"
-contact: nmaki@mdibl.org
organization: MDIBL Computational Core
date: "May 12th, 2020"
---
-**add images to improve guide**
-## What is SRA?
+**add intro_to_sra_img to improve guide**
+# Introduction to the Sequence Read Archive and SRA Toolkit
The Sequence Read Archive, or SRA, is an online archive for raw sequence data, generated from next generation sequencing tech such as Illumina, PacBio, and IonTorrent.
It is also the National Institute of Health's (NIH) primary repository for high throughput sequencing data.
@@ -61,7 +60,7 @@ The Sequence Read Archive can be broken into four main levels, each with their o
* Analysis: DRZ, ERZ, SRZ accessions
* Run: DRR, ERR, SRR accessions
-
+
**will introduce GEO before SRA**
@@ -71,11 +70,11 @@ We'll be using this one [GEO summary page](https://www.ncbi.nlm.nih.gov/geo/quer
To find the SRA link, just scroll down from the top of the page:
-
+
And look near the bottom, and under **Relations** select the SRA accession link
-
+
The accession link brings you to a page that holds all biological samples related to this analysis.
Selecting, for example, this [Link](https://www.ncbi.nlm.nih.gov/sra/SRX365519[accn]) loads that specific run, and the files associated with it.
@@ -88,7 +87,7 @@ To do this:
The SRA Run Selector page should open up in another tab, and look like this:
-
+
Under **Common Fields** you'll find a ton of information describing the samples from the study, including:
* Assay Type (RNA-seq, CHiP-seq, etc)
diff --git a/images/geo_accession_lower.png b/computational_skills/commandline_interface/intro_to_sra_img/geo_accession_lower.png
similarity index 100%
rename from images/geo_accession_lower.png
rename to computational_skills/commandline_interface/intro_to_sra_img/geo_accession_lower.png
diff --git a/images/geo_accession_upper.png b/computational_skills/commandline_interface/intro_to_sra_img/geo_accession_upper.png
similarity index 100%
rename from images/geo_accession_upper.png
rename to computational_skills/commandline_interface/intro_to_sra_img/geo_accession_upper.png
diff --git a/images/sra_run_selector.png b/computational_skills/commandline_interface/intro_to_sra_img/sra_run_selector.png
similarity index 100%
rename from images/sra_run_selector.png
rename to computational_skills/commandline_interface/intro_to_sra_img/sra_run_selector.png
diff --git a/images/sra_structure_infograph.png b/computational_skills/commandline_interface/intro_to_sra_img/sra_structure_infograph.png
similarity index 100%
rename from images/sra_structure_infograph.png
rename to computational_skills/commandline_interface/intro_to_sra_img/sra_structure_infograph.png
diff --git a/wsl_images/.DS_Store b/computational_skills/commandline_interface/intro_to_wsl_img/.DS_Store
similarity index 100%
rename from wsl_images/.DS_Store
rename to computational_skills/commandline_interface/intro_to_wsl_img/.DS_Store
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/filezilla.png b/computational_skills/commandline_interface/intro_to_wsl_img/filezilla.png
new file mode 100644
index 0000000..d568952
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/filezilla.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/ms-store_ubuntu.png b/computational_skills/commandline_interface/intro_to_wsl_img/ms-store_ubuntu.png
new file mode 100644
index 0000000..de7d74b
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/ms-store_ubuntu.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/on-off_windows.png b/computational_skills/commandline_interface/intro_to_wsl_img/on-off_windows.png
new file mode 100644
index 0000000..d56cd57
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/on-off_windows.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/powershell.png b/computational_skills/commandline_interface/intro_to_wsl_img/powershell.png
new file mode 100644
index 0000000..5b7839e
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/powershell.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/powershell_ver_check.png b/computational_skills/commandline_interface/intro_to_wsl_img/powershell_ver_check.png
new file mode 100644
index 0000000..0c5ac2d
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/powershell_ver_check.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/putty_launch.png b/computational_skills/commandline_interface/intro_to_wsl_img/putty_launch.png
new file mode 100644
index 0000000..ce760c8
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/putty_launch.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/putty_login.png b/computational_skills/commandline_interface/intro_to_wsl_img/putty_login.png
new file mode 100644
index 0000000..bcf60fe
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/putty_login.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/search_powershell.png b/computational_skills/commandline_interface/intro_to_wsl_img/search_powershell.png
new file mode 100644
index 0000000..48686f9
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/search_powershell.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/start_menu.png b/computational_skills/commandline_interface/intro_to_wsl_img/start_menu.png
new file mode 100644
index 0000000..bc0f657
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/start_menu.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/ubuntu_terminal.png b/computational_skills/commandline_interface/intro_to_wsl_img/ubuntu_terminal.png
new file mode 100644
index 0000000..25b3d0f
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/ubuntu_terminal.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/winver.png b/computational_skills/commandline_interface/intro_to_wsl_img/winver.png
new file mode 100644
index 0000000..3f7acf5
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/winver.png differ
diff --git a/computational_skills/commandline_interface/intro_to_wsl_img/wsl_box.png b/computational_skills/commandline_interface/intro_to_wsl_img/wsl_box.png
new file mode 100644
index 0000000..9ee5bbf
Binary files /dev/null and b/computational_skills/commandline_interface/intro_to_wsl_img/wsl_box.png differ
diff --git a/computational_skills/commandline_interface/minota_2021_CLI_script.md b/computational_skills/commandline_interface/minota_2021_CLI_script.md
new file mode 100644
index 0000000..8d46e17
--- /dev/null
+++ b/computational_skills/commandline_interface/minota_2021_CLI_script.md
@@ -0,0 +1,302 @@
+# Command Line Intro Script
+
+First things first, locate terminal/Powershell
+
+* The CLI that is built into your macOS or Windows computer
+* Incredibly powerful, provides you with text-based interaction of OS
+ * lets you execute programs written specifically for CLI
+ * manipulate files/directories
+ * perform bulk actions
+ * saves running history of commands/actions, reproducible
+
+### ssh
+
+From here, use the `ssh` program to connect to your remote AWS machine
+
+* `ssh` user@ec2 etc....
+
+Why use a remote computer and not just your local PC?
+
+* Available compute/memory/resources
+ * Access to these locally will vary, and in real world use will not be adequate
+* Variations in OS, permissions, etc between attendees, provides a "standardized" learning env
+
+Once all users are connected, we can proceed
+
+### pwd
+
+First command we'll cover, `pwd`
+
+* prints out your current working dir to terminal
+* if you're not sure where you are in the filesystem, this command will help you find your bearings
+* right now you're inside your `home directory`
+
+### ls
+
+Next is `ls`, can anyone guess what this program does?
+
+* lists all contents of your current working dir
+* can be given an `argument` that modifies the behavior of the program
+* `ls -l` lists contents of your dir, but with additional info, and in a list-format
+
+Before we move on, lets look at what's been printed out on the terminal
+
+* For each object row (file or dir), moving from left to right, we can see:
+ * whether the object is a file or dir, denoted by `d` or `-`, first "flag"
+ * available read/write/execute permissions, ordered by user/group/all
+ * file size
+ * date of last change to file
+ * name
+
+* `ls -al` lists *all* contents of our directory, including hidden files and folders
+* `ls -lt` lists contents sorted by modification time, newest first
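
As a quick sketch of reading that first column (using a throwaway scratch directory, not the course files):

```shell
# throwaway scratch directory to compare how files and dirs appear
mkdir -p /tmp/ls_demo && cd /tmp/ls_demo
touch demo.txt
mkdir -p demo_dir

ls -l
# first character of each row: '-' for demo.txt (a file),
# 'd' for demo_dir (a directory), followed by the rwx permissions
```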
+
+### man pages
+
+As you can see, even a fairly straightforward program such as `ls` has a myriad of options
+
+* To learn more about all that a specific program is capable of, you can view the manual using
+* `man` *program name*
+* Quit out of `man` with `q`
+
+### clear
+
+By now our terminal is starting to look a bit messy, lets clean it up by using the `clear` command
+
+* clears your terminal of text
+
+### cd
+
+To navigate throughout the filesystem, we use the `cd` command (standing for change directory), followed by where we want to go
+
+* first, lets use `ls -al` again to view all contents of our homedir
+* now use the `cd` command to move into the dir called `ghost_dir`
+* executing the `pwd` command demonstrates that our current working directory has changed
+
+lets list the contents of this dir with `ls`
+
+* returns nothing, dir seems empty? Rerun with `ls -al`
+* hidden file present, generally reserved for system files, etc
+
+How can we get back to our home dir? A couple methods
+
+* first is to simply use `cd` without any args
+* another is to use a shorthand denoting "one dir above", this is `cd ..`
+
+Now lets change into the `example_dir` directory
+
+* run an `ls -l` to view contents
+* `cd` into the `fastqc` directory
+* `ls -l` contents
+* Output of the program `fastqc`, a preliminary QC tool that generates a read quality report
+ * also present is the config file used to execute this program (we'll cover this later), and log
+* `pwd` shows that we're a couple directories down from `home`, few ways to get out
+
+### absolute and relative paths
+
+When working on the command line, you can navigate to, reference, view, and interact with files and dirs in two primary ways
+
+Through using its Absolute path, or relative one
+
+#### absolute path
+
+* points to a location in file system, independent of your current working directory
+* also called a "full path"
+* location of file or dir relative to root directory
+ * root dir is the highest dir in a file structure hierarchy (draw on screen?)
+* ex: `/home/minota/example_dir/fastqc`
+
+#### relative path
+
+* points to location in file system, uses current dir as reference
+* location of a file or dir relative to current dir
+* ex: `./example_dir/fastqc`
+ * truncated, not the "complete path"
+
+Using the absolute path as an argument to `cd`, lets move back to our home directory
+
+* `cd /home/minota/` *mention tab-auto-complete*
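
A minimal sketch of the two styles, using a throwaway scratch tree rather than the course's actual directories:

```shell
# build a small scratch tree to compare the two path styles
mkdir -p /tmp/path_demo/example_dir/fastqc
cd /tmp/path_demo

# relative path: resolved from the current working directory
cd ./example_dir/fastqc
pwd   # prints the full (absolute) path, ending in .../example_dir/fastqc

# absolute path: works from anywhere, regardless of your current location
cd /tmp/path_demo/example_dir/fastqc
```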
+
+### mkdir / touch
+
+* `cd` into `.ghost_dir`
+* using the `mkdir` command, create a folder called `test_dir`
+* `cd` into `test_dir`
+* now use the `touch` program to create a file called `test1.txt`
+
+### cp
+
+next we're going to copy the file we just created using the `cp` command
+
+* `cp test1.txt` followed by the name of the "copied" file you want to create (`test2.txt`)
+* running `ls` shows that we have 2 files in our dir now
+* when working with directories, you can use the `-r` flag to copy all of the contents of a directory
+ * `cp -r`
+
+### mv
+
+the `mv` command allows you to move files and folders around the filesystem, as well as renaming them
+
+* move `test2.txt` up a directory using the shorthand `../`
+* `mv test2.txt ../`
+
+now lets change the name of our `test1.txt` file to `renamed_test1.txt` using
+
+* `mv test1.txt renamed_test1.txt`
+* `ls -l` to show change
+
+once the rename has taken place, lets move up a directory with `cd ../`
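
The whole create/copy/move/rename sequence can be sketched in a scratch directory (paths here are illustrative):

```shell
# scratch directory for the create/copy/move/rename sequence
mkdir -p /tmp/mv_demo/test_dir && cd /tmp/mv_demo/test_dir

touch test1.txt                   # create an empty file
cp test1.txt test2.txt            # copy it
mv test2.txt ../                  # move the copy up one directory
mv test1.txt renamed_test1.txt    # rename the original in place
ls -l                             # only renamed_test1.txt remains here
```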
+
+### rm
+
+To remove a file, you can use the `rm` command, followed by the file you wish to delete
+
+* keep in mind, this is a *permanent* action, so be very careful when executing
+* `rm test2.txt`
+
+next, we're going to use the `mv` command to move our `test_dir` into our homedir
+
+* `mv test_dir ~`
+
+now move out of `.ghost_dir` with `cd`
+
+### rmdir
+
+lets try to delete the `.ghost_dir` directory with the `rmdir` command
+
+* `rmdir .ghost_dir` command
+
+it failed, why?
+
+a directory must be empty before it can be deleted, and there appears to be a file still present
+
+* from your home directory, remove the hidden file in `.ghost_dir`
+* `rm .ghost_dir/.hidden_file.txt`
+
+now use the `rmdir` command with `.ghost_dir` as target
+
+### wildcards
+
+when working with multiple files, you can use a special `wildcard` character to perform bulk manipulation by pattern matching
+
+* `cd` into `test_dir`
+* `touch test2.txt`
+
+to remove both files at once, execute
+
+* `rm *test*`
+* removes any file with "test" in its name
+
+`cd` back to homedir
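
A self-contained sketch of the wildcard removal (in a scratch directory, so nothing from the course setup is touched):

```shell
# wildcard removal in a scratch directory
mkdir -p /tmp/wild_demo && cd /tmp/wild_demo
touch test1.txt test2.txt notes.md

rm *test*   # matches test1.txt and test2.txt; notes.md is untouched
ls          # only notes.md remains
```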
+
+### head + cat
+
+the `head` command prints out the first 10 lines of the file you give it
+
+* `head README.txt`
+* the `-n` flag lets you custom specify the number of lines
+* `head -n 20 README.txt`
+
+the `cat` command prints out all contents of a file to terminal
+
+* actually stands for concatenate, lets you join files together
+* `cat README.txt`
+
+now lets run `cat` on the `.fastq` file and see what happens
+
+* small sequence file, begins printing all contents to terminal
+* larger files could take a minute
+* to kill an active command, use `ctrl + c`
+* to rerun a previous command, you can use the `up` arrow on the keyboard, or the `!!` shorthand
+ * rerun `cat` and cancel with `ctrl + c`
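
A small sketch of `head` and `cat` on a generated file (scratch paths, not course data):

```shell
# generate a 25-line numbered file, then peek at it
mkdir -p /tmp/head_demo && cd /tmp/head_demo
seq 1 25 > numbers.txt

head numbers.txt         # first 10 lines
head -n 20 numbers.txt   # first 20 lines
cat numbers.txt          # the whole file

# cat = concatenate: joining the file to itself yields a 50-line file
cat numbers.txt numbers.txt > doubled.txt
```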
+
+### less
+
+opens file in interactive program to view and scroll through
+
+* open the `.fastq` file in `less`, quit
+
+### nano - interactive editing
+
+nano is an interactive text editor, allows you to create and modify files on the CLI
+
+* open `README.txt` with `nano`
+* scroll with arrow keys
+* to edit, just type, to save changes use `ctrl + o`, and to quit `ctrl + x`
+
+### redirection
+
+You can also redirect the output of a program to a text file, as opposed to letting it print to the terminal
+
+* lets print the contents of our homedir to a file using the `>` redirection character
+* `ls -al > homedir.txt`
+* `ls -al`
+* `cat homedir.txt`
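
The same idea as a self-contained sketch (scratch directory and file name are illustrative):

```shell
# capture a directory listing into a file instead of the terminal
mkdir -p /tmp/redir_demo && cd /tmp/redir_demo
ls -al > listing.txt   # nothing prints; the output went into listing.txt
cat listing.txt        # view the captured listing
```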
+
+### piping
+
+The CLI has the capacity to chain together multiple programs, using the `|` pipe special character
+
+* `ls -l | less`
+* instead of printing to a dir, or being redirected to a file, contents of our working dir are opened in the `less` program
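
Since `less` is interactive, this sketch pipes into `wc -l` instead to show the same chaining non-interactively (scratch directory is illustrative):

```shell
# pipe ls into wc -l to count entries instead of paging through them
mkdir -p /tmp/pipe_demo && cd /tmp/pipe_demo
touch a.txt b.txt c.txt
ls | wc -l   # counts the three files
```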
+
+### history
+
+For a running record of everything we've done so far, we can use the `history` command
+
+* if you'd like to save it to a file, try `history > history_day1.txt`
+
+### tmux
+
+tmux is a terminal multiplexer. Essentially, it allows you to organize your terminal sessions into panes, which you can interact with and manipulate
+
+* eliminates need to have multiple terminal windows open
+* one of the most useful features is the ability to detach from a session
+ * session lives until you kill it (restart, etc), and you can reattach at any time
+ * especially needed when running large Bioinformatics processes
+ * trigger a pipeline, detach, and check back later
+* Otherwise, a "standard" terminal session can time out or lose connection, prematurely killing your process
+
+#### launch
+
+to launch a `tmux` session, simply execute the `tmux` command on the CLI
+
+* a green bar at the bottom of your terminal indicates that `tmux` is active
+
+commands in `tmux` are initiated by the use of a *prefix key*, followed by *command key*. Default is `ctrl + b` + `command`
+
+#### exit
+
+to exit `tmux`:
+
+* type `exit` on the terminal, or use `ctrl + d`
+
+now, relaunch `tmux`
+
+#### detach
+
+* to `detach` from an active session, use `ctrl + b` + `d`
+
+this drops you out of `tmux` and back into your standard shell
+
+however, you've only detached, not killed, so you can reconnect to your still-running session
+
+#### view
+
+* to view current sessions, use `tmux ls`
+
+#### attach
+
+to reattach to your session, use `tmux attach-session -t "session #"`
+
+#### pane-split
+
+to split your pane in a horizontal fashion, use `ctrl + b` + `"`
+
+for vertical, `ctrl + b` + `%`
+
+#### pane-navigation
+
+great, we now have a couple open panes. how do we navigate between them?
+
+simply use `ctrl + b` + `arrow key`
\ No newline at end of file
diff --git a/computational_skills/commandline_interface/minota_2021_day3_script.md b/computational_skills/commandline_interface/minota_2021_day3_script.md
new file mode 100644
index 0000000..c1c9fcf
--- /dev/null
+++ b/computational_skills/commandline_interface/minota_2021_day3_script.md
@@ -0,0 +1,386 @@
+# MINOTA 2021 Day 3 Script
+
+## Guide
+
+For tomorrow, here are two things that would be great if you could work on. Assume that we will have at least two transcriptomes built from the jcoffman_001.reduced data, one from trinity and one from rna_spades. Working with one or both of those, we are going to want to run it through (1) cd-hit-est by itself, (2) transdecoder by itself, (3) cd-hit-est followed by transdecoder, (4) cd-hit-est followed by transdecoder followed by cd-hit. If we have the time, it would be very much of interest to potentially concatenate the files together and then run the same analyses, under the assumption that each program will do an imperfect job, and joining their outputs and using the filtering tools here will come up with a better answer
+
+In addition to this, we will probably also want to run transrate on some subset of this as well, using the input fastq files against the transcriptomes
+
+--------
+
+It would probably be best to start with a run through transrate first, including the starting data (which would be the trim_galore output). Then for comparison, you could use cd-hit-est on the data set and then run transrate on the resulting reduced data set
+
+## Getting Started
+
+First, let’s get ourselves logged back into our remote AWS machines.
+
+* Today we’ll be covering refinement, clustering, protein prediction, and assessing the “completeness” of an assembled transcriptome
+* `cd` into your `minota_work` directory
+* run the following commands:
+ * `cd_hit_est_singularity -x`
+ * `cd_hit_singularity -x`
+ * `transrate_pe_singularity -x`
+ * `transdecoder_singularity -x`
+
+## Transrate
+
+The first program we're going to run our trinity transcriptome through is Transrate
+
+Transrate is a piece of software for *de-novo* transcriptome assembly quality analysis
+
+It's capable of providing detailed reports, and examines your assembly - comparing it to experimental evidence (which are your initial sequencing reads), writing out quality scores for assemblies and generated contigs
+
+It also has the capability to merge together multiple assemblies from varied assemblers, and conduct scoring on the resulting amalgamation
+
+Overall, it analyzes an assembly in three ways
+
+* Inspection of contig sequences (what the transcriptome assembly is composed of)
+* Mapping Reads to contigs, and inspecting how well they align (how closely does your resulting assembly match the data that was used to generate it)
+* Aligning the contigs against proteins or transcripts from related species, inspecting resulting alignments
+
+The most *useful* metrics are the ones based upon read mapping
+
+* Transrate Assembly score
+* Optimized Assembly score
+* Individual contig scores
+
+### Editing Transrate config file
+
+Now lets edit our generated Transrate config file using `nano`
+
+* here is where it may be useful to split your `tmux` session
+* this will allow you to query paths to your assembly/input data, while keeping your editor open in another pane
+
+Once open in `nano`, point the *assembly* parameter to the location of your Trinity assembly (*Trinity.fasta*)
+
+* `assembly=/home/minota/minota_work/trinity_out/Trinity.fasta`
+
+Next are your right and left reads
+
+* point towards the trimmed read files
+* `left=/home/minota/minota_work/trim_galore_out/jcoffman_001.reduced_all_R1_val_1.fq.gz`
+* `right=/home/minota/minota_work/trim_galore_out/jcoffman_001.reduced_all_R2_val_2.fq.gz`
+
+And finally the name and location of your output directory
+
+* `output=/home/minota/minota_work/transrate_out`
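
Put together, the edited lines of the Transrate config file read as follows (paths match the course layout above; substitute your own if they differ):

```
assembly=/home/minota/minota_work/trinity_out/Trinity.fasta
left=/home/minota/minota_work/trim_galore_out/jcoffman_001.reduced_all_R1_val_1.fq.gz
right=/home/minota/minota_work/trim_galore_out/jcoffman_001.reduced_all_R2_val_2.fq.gz
output=/home/minota/minota_work/transrate_out
```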
+
+#### Examining Transrate Output
+
+First, we want to look at the .log file
+
+The core stats that make up that file will be located here in the following directory
+
+* `/home/minota/minota_work/transrate_out`
+
+The names of the directories that have been auto-generated rely upon the parameters given in the configuration file
+
+Some of these folders may differ depending upon what you called them
+
+Assuming that everything has run properly, running an `ls` in the `transrate_out` directory should yield two results
+
+* assemblies.csv
+* /Trinity/ directory
+ * contigs.csv
+
+##### Transrate Score
+
+The most useful metric, it measures the quality of the assembly *without* using a reference
+
+* Score is generated for the entire assembly, and for each contig, with the scoring process using the reads that were used to build the assembly as evidence
+* Provides you with the capability to compare multiple assemblies based off of the same reads
+ * an increase in your score most likely corresponds to an assembly with higher biological accuracy
+ * captures how confident you can be in what was assembled, and how "complete" your transcriptome is
+ * Scales from 0 to 1.0 (maximum)
+
+Expression-weighted quality score
+
+* Score for each contig is multiplied by its relative expression before being included in assembly score (low weight assigned to poorly expressed contigs)
+* More generous to assemblies with poorly assembled contigs of low expression
+ * stored in the `assemblies.csv` file
+
+What we're most interested in is the `score` of our assembly
+
+* How confident we can be in what we assembled
+* How complete the assembly is
+
+The score ranges from 0 to 1, with a higher score increasing the likelihood that you have an assembly that is biologically accurate
+
+```
+TRANSRATE ASSEMBLY SCORE 0.0573
+-----------------------------------
+TRANSRATE OPTIMAL SCORE 0.1539
+TRANSRATE OPTIMAL CUTOFF 0.3685
+good contigs 81
+p good contigs 0.43
+```
+
+* Use the `cat` command on the `assemblies.csv` file found in your user-specific `transrate_out` directory
+
+**Expected**
+
+```
+assembly,n_seqs,smallest,largest,n_bases,mean_len,n_under_200,n_over_1k,n_over_10k,n_with_orf,mean_orf_percent,n90,n70,n50,n30,n10,gc,bases_n,proportion_n,fragments,fragments_mapped,p_fragments_mapped,good_mappings,p_good_mapping,bad_mappings,potential_bridges,bases_uncovered,p_bases_uncovered,contigs_uncovbase,p_contigs_uncovbase,contigs_uncovered,p_contigs_uncovered,contigs_lowcovered,p_contigs_lowcovered,contigs_segmented,p_contigs_segmented,score,optimal_score,cutoff,weighted
+/home/minota/minota_work/trinity_out/Trinity.fasta,133,202,5258,185932,1397.98496,0,67,0,77,71.56003,603,1530,2272,3176,4734,0.4471,0,0.0,535728,426019,0.79522,187828,0.3506,238191,40,3010,0.01619,96,0.7218,1,0.00752,1,0.00752,7,0.05263,0.08865,0.14176,0.31556,925.65749
+```
+
+##### Contig Score
+
+Stored in the `contigs.csv` file, each contig gets assigned a score by measuring how well it's supported by read evidence
+
+Four components to the score
+
+* Measure of correct base call
+* Measure of whether each base is part of the transcript
+* Probability that the contig is derived from a single transcript (and not pieces of two or more)
+* Probability that the contig is structurally complete and accurate
+
+##### Optimized Assembly Score
+
+Using contig scores, bad contigs are filtered out from your assembly, leaving only those that are well assembled
+
+* Done automatically, by learning contig score cutoff that maximizes assembly score
+* Good contigs determined by the above optimization are in the good.*.fa file
+* Bad contigs are in the bad.*.fa file
+
+##### Contig Metrics
+
+* Measured based upon analyzing the set of contigs themselves
+* Useful as a quick way of detecting massive issues with your assembly, namely very small or very large numbers of contigs
+ * those that are biologically improbable
+
+```
+Contig metrics:
+-----------------------------------
+n seqs 188 number of contigs in assembly
+smallest 201 size of smallest contig
+largest 5258 size of largest contig
+n bases 194198 number of bases included in assembly
+mean len 1032.97 mean length of the contig
+n under 200 0 number of contigs < 200 bases
+n over 1k 67 number of contigs > 1K bases
+n over 10k 0 number of contigs > 10K bases
+n with orf 79 number of contigs that had an ORF
+mean orf percent 69.84 for contigs with ORF, mean % of the contig covered by ORF
+n90 351 largest contig size at which at least X% of bases are contained in contigs at least this length
+n70 1289 *
+n50 2010 *
+n30 2711 *
+n10 4734 *
+gc 0.45 % of bases that are G or C
+bases n 0 number of N bases
+proportion n 0.0 proportion of bases that are N
+Contig metrics done in 0 seconds
+Calculating read diagnostics...
+```
+
+##### Read Mapping Metrics
+
+* Based upon aligning the reads used in assembly to the assembled contigs
+* The mapped reads contain a large amount of information specific to the organism that was sequenced, and this info can be leveraged to evaluate the confidence in each base and contig in your resulting assembly (you're essentially mapping the source material on to your transcriptome, looking to see how much was preserved/properly reconstructed)
+
+With the `--left` and `--right` option enabled, Transrate:
+
+* Maps the provided reads to the assembly using SNAP
+* Infers the most likely contig of origin for any multi-mapping reads with Salmon
+* Inspects the resulting alignment with transrate-tools and evaluates each contig in the assembly
+
+```
+Read mapping metrics:
+-----------------------------------
+fragments 535815 number of read pairs provided
+fragments mapped 396882 total number of read pairs mapping
+p fragments mapped 0.74 proportion of provided read pairs that mapped successfully
+good mappings 189659 number of read pairs mapping in a way indicating a good assembly
+p good mapping 0.35 proportion of the above
+bad mappings 207223 number of read pairs mapping in a way indicating a poor assembly
+potential bridges 41 number of potential links between contigs that are supported by the reads
+bases uncovered 12027 number of bases that aren't covered by any reads
+p bases uncovered 0.06 proportion of the above
+contigs uncovbase 143 number of contigs that contain at least one base with no read coverage
+p contigs uncovbase 0.76 proportion of the above
+contigs uncovered 40 number of contigs that have a mean per-base read coverage of < 1
+p contigs uncovered 0.21 proportion of the above
+contigs lowcovered 82 number of contigs that have a mean per-base read coverage of < 10
+p contigs lowcovered 0.44 proportion of the above
+contigs segmented 8 number of contigs that have a >=50% estimated chance of being segmented
+p contigs segmented 0.04 proportion of the above
+Read metrics done in 10 seconds
+No reference provided, skipping comparative diagnostics
+```
+
+What makes a "good" mapping?
+
+* Both members of the read pair are aligned
+* in proper orientation
+* on the same contig
+* without overlapping either end of the contig
+
+Your mapping is "poor" if any of the above metrics aren't met
+
+## TransDecoder
+
+Identifies candidate protein coding regions within transcripts, allowing you to stage sequences in blastp/blastx for functional discovery
+
+Based upon following criteria:
+
+* minimum length ORF found in transcript
+* log-likelihood score being > 0 (explain?)
+* above coding score is greatest when ORF is scored in the 1st reading frame in comparison to scores in the other 2 forward reading frames
+* if a candidate ORF is found to be fully encapsulated by the coordinates of another candidate ORF, the longer one is reported
+ * a single transcript can report multiple ORFs (chimeras, etc)
+
+### Editing TransDecoder Config file
+
+Point the `transcript_file` entry to your transcriptome
+
+* `transcript_file=/home/minota/minota_work/trinity_out/Trinity.fasta`
+* `output_dir=/home/minota/minota_work/transdecoder_out`
+
+### TransDecoder LongORFS
+
+First we need to extract the ORFs from our transcript assembly
+
+* TransDecoder defaults to identifying ORFs that are at least 100 AA long (can be modified to be lower)
 * lowering this threshold can increase the rate of false-positive ORF predictions
+
+```
+longest_orfs.pep : all ORFs meeting the minimum length criteria, regardless of coding potential.
+longest_orfs.gff3 : positions of all ORFs as found in the target transcripts
+longest_orfs.cds : the nucleotide coding sequence for all detected ORFs
+
+longest_orfs.cds.top_500_longest : the top 500 longest ORFs, used for training a Markov model for coding sequences.
+
+hexamer.scores : log likelihood score for each k-mer (coding/random)
+
+longest_orfs.cds.scores : the log likelihood sum scores for each ORF across each of the 6 reading frames
+longest_orfs.cds.scores.selected : the accessions of the ORFs that were selected based on the scoring criteria (described at top)
+longest_orfs.cds.best_candidates.gff3 : the positions of the selected ORFs in transcripts
+```
+
+You also have the option to identify ORFs that have homology to known proteins via blastp queries
+
+### TransDecoder Predict
+
+Once you've extracted your ORFs, you can predict likely coding regions using TransDecoder.Predict
+
+* The final set of candidate coding regions is tagged '.transdecoder' and consists of files with the extensions .pep, .cds, .gff3, and .bed
+
+```
+transcripts.fasta.transdecoder.pep : peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed.
+transcripts.fasta.transdecoder.cds : nucleotide sequences for coding regions of the final candidate ORFs
+transcripts.fasta.transdecoder.gff3 : positions within the target transcripts of the final selected ORFs
+transcripts.fasta.transdecoder.bed : bed-formatted file describing ORF positions, best for viewing using GenomeView or IGV.
+```
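A quick way to summarize a finished run is to count reported ORFs per transcript from the .gff3 file. A small sketch, assuming standard tab-separated nine-column GFF3 with the transcript ID in column 1 and the feature type in column 3 (the toy records below are illustrative, not real output):

```python
from collections import Counter

def orfs_per_transcript(gff3_lines):
    """Count 'gene' features per transcript in TransDecoder-style GFF3."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue                        # skip blanks and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == "gene":
            counts[cols[0]] += 1
    return counts

toy_gff3 = [
    "TRINITY_DN0_c0_g1_i1\tTransDecoder\tgene\t1\t900\t.\t+\t.\tID=g1",
    "TRINITY_DN0_c0_g1_i1\tTransDecoder\tmRNA\t1\t900\t.\t+\t.\tID=m1",
    "TRINITY_DN14_c0_g1_i2\tTransDecoder\tgene\t5\t400\t.\t+\t.\tID=g2",
]
print(orfs_per_transcript(toy_gff3))
```

Transcripts with more than one `gene` feature are the multi-ORF (potentially chimeric) cases mentioned earlier.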
+
+## CD-HIT-EST
+
+The next tool we'll work with is CD-HIT-EST
+
+A program (from the CD-HIT suite) primarily used to cluster and compare nucleotide sequences, greatly reducing the computational cycles required for downstream tasks
+
+It clusters nucleotide sequences that meet a similarity threshold, building a fasta file of representative sequences (reducing redundancy) and a text file listing the clusters
+
+Before starting to edit the config file, create a new directory called `cd_hit_est_out` under `minota_work`
+
+### Editing CD-HIT-EST config file
+
+Once open in `nano`, point the *fasta* parameter to the location of your Trinity assembly (*Trinity.fasta*)
+
+* `fasta=/home/minota/minota_work/trinity_out/Trinity.fasta`
+
+The location and name of your output files
+
+* `output=/home/minota/minota_work/cd_hit_est_out/jcoffman_001.reduced`
+
+The number of cores
+
+* 4
+
+And the amount of memory (in MB)
+
+* 32000
+
+Execute with: `cd_hit_est_singularity -p cd_hit_est_parameter_template.txt`
+
+#### Examining CD-HIT-EST output
+
+Next, let's `cd` into our new directory and examine the contents
+
+Our script generates two files:
+
+* jcoffman_001.reduced (reduced transcript file)
+* jcoffman_001.reduced.clstr (list of clusters found)
+
+```
+>Cluster 0
+0 5258aa, >TRINITY_DN35_c0_g1_... *
+>Cluster 1
+0 4734aa, >TRINITY_DN102_c0_g1... *
+>Cluster 2
+0 3306aa, >TRINITY_DN27_c0_g2_... at 99.64%
+1 3306aa, >TRINITY_DN27_c0_g2_... at 99.18%
+2 3749aa, >TRINITY_DN27_c0_g2_... *
+3 3749aa, >TRINITY_DN27_c0_g2_... at 99.65%
+4 247aa, >TRINITY_DN76_c0_g1_... at 94.33%
+>Cluster 3
+0 1953aa, >TRINITY_DN47_c0_g1_... at 97.54%
+1 3732aa, >TRINITY_DN47_c0_g1_... *
+```
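The .clstr listing is easy to work with programmatically, e.g., to tabulate cluster sizes. A minimal sketch assuming the standard CD-HIT .clstr layout (the full member IDs below are hypothetical, since the real output truncates them with "..."):

```python
def parse_clstr(lines):
    """Parse CD-HIT(-EST) .clstr output into {cluster_name: [member_ids]}.

    Cluster headers start with '>Cluster'; member lines look like
    '0	5258aa, >SEQ_ID... *' (the '*' marks the representative).
    """
    clusters, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line[1:]                 # e.g. "Cluster 0"
            clusters[current] = []
        elif current is not None and line:
            # member ID sits between '>' and the trailing '...' marker
            member = line.split(">")[1].split("...")[0]
            clusters[current].append(member)
    return clusters

toy = [
    ">Cluster 0",
    "0\t5258aa, >TRINITY_DN35_c0_g1_i1... *",
    ">Cluster 1",
    "0\t3306aa, >TRINITY_DN27_c0_g2_i1... at 99.64%",
    "1\t3749aa, >TRINITY_DN27_c0_g2_i2... *",
]
sizes = {name: len(members) for name, members in parse_clstr(toy).items()}
print(sizes)   # -> {'Cluster 0': 1, 'Cluster 1': 2}
```

Singleton clusters (size 1) pass through unreduced; larger clusters are where redundancy is being collapsed.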
+
+The tool has two modes, global and local
+
+* global mode computes sequence identity as the number of identical bases divided by the length of the shorter sequence
+* local mode computes it as the number of identical bases divided by the length of the alignment
+
+Two sequences are assigned to the same cluster when their sequence identity exceeds the chosen threshold
+
+It's used as a reduction technique, though you run the risk of merging biologically interesting but similar sequences
+
+* The upshot is dropping potentially redundant isoforms that would otherwise impact the quality of your assembly
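The two identity definitions differ only in the denominator, which can be seen with a toy aligned pair (the helper below is an illustrative sketch, not CD-HIT's actual implementation):

```python
def identity(aligned_a, aligned_b, mode="global"):
    """Sequence identity between two equal-length aligned strings.

    'global' divides identical positions by the shorter *ungapped*
    sequence length; 'local' divides by the alignment length.
    '-' marks a gap.
    """
    matches = sum(a == b != "-" for a, b in zip(aligned_a, aligned_b))
    if mode == "global":
        shorter = min(len(aligned_a.replace("-", "")),
                      len(aligned_b.replace("-", "")))
        return matches / shorter
    return matches / len(aligned_a)

a = "ACGTACGT--"
b = "ACGTACGTAC"
print(identity(a, b, "global"))  # 8 matches / 8 (shorter seq)  -> 1.0
print(identity(a, b, "local"))   # 8 matches / 10 (alignment)   -> 0.8
```

Note how the same alignment clears a 0.9 threshold in global mode but not in local mode, which is why the mode choice affects how aggressively short fragments get absorbed into clusters.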
+
+### Re-run Transrate on reduced dataset
+
+To examine how the quality of our transcriptome has improved after clustering, let's execute Transrate on our CD-HIT-EST output
+
+Inside of the `cd_hit_est_out` directory, copy down your transrate config file
+
+* update your `assembly` path to point to your cd_hit_est output: `/home/minota/minota_work/cd_hit_est_out/jcoffman_001.reduced`
+* add `cd_hit_est_out` to your `output` path
+
+Run it, and then let's compare the two reports!
+
+## CD-HIT
+
+CD-HIT clusters proteins that meet a similarity threshold, usually a sequence identity cutoff
+Each cluster has a representative sequence, with the input being a protein dataset in fasta format
+It outputs a fasta file of representative sequences and a text file listing the clusters
+
+Before starting to edit the config file, create a new directory called `cd_hit_out` under `minota_work`
+
+* Once open in `nano`, point the *fasta* parameter to the location of your TransDecoder predicted protein output (*Trinity.fasta.transdecoder.pep*)
+* `fasta_file=/home/minota/minota_work/transdecoder_out/Trinity.fasta.transdecoder.pep`
+* `output_file=/home/minota/minota_work/cd_hit_out/jcoffman_001.reduced`
+* `num_threads=4`
+* `memory_limit=32000`
+
+### Examining CD-HIT Output
+
+```
+>Cluster 0
+0 1700aa, >TRINITY_DN50_c0_g1_... *
+>Cluster 1
+0 1338aa, >TRINITY_DN14_c0_g1_... at 99.70%
+1 233aa, >TRINITY_DN14_c0_g1_... at 99.14%
+2 1620aa, >TRINITY_DN14_c0_g1_... *
+3 1338aa, >TRINITY_DN14_c0_g1_... at 99.33%
+4 1338aa, >TRINITY_DN14_c0_g1_... at 99.63%
+5 1620aa, >TRINITY_DN14_c0_g1_... at 99.81%
+>Cluster 2
+0 1108aa, >TRINITY_DN0_c0_g1_i... at 97.20%
+1 1122aa, >TRINITY_DN0_c0_g1_i... *
+2 153aa, >TRINITY_DN0_c0_g1_i... at 90.20%
+3 196aa, >TRINITY_DN0_c0_g1_i... at 96.43%
+4 940aa, >TRINITY_DN0_c0_g1_i... at 98.83%
+5 486aa, >TRINITY_DN0_c0_g1_i... at 95.68%
+```
\ No newline at end of file
diff --git a/dep/minota-assembly.md b/dep/minota-assembly.md
deleted file mode 100644
index 7d24bb4..0000000
--- a/dep/minota-assembly.md
+++ /dev/null
@@ -1,50 +0,0 @@
-# Introduction to De Novo Transcriptome Assembly
-
-Welcome back to the MDIBL MINOTA Workshop! Today we're going to be discussing De Novo Transcriptome Assembly. We'll start by a brief introduction to the rationale behind assembly, followed by a review of some tools that are commonly implemented.
-
-We'll be focusing primarily on Trinity, the main assembler for this course; delving into how it functions, and its algorithmic processes. Included will be a brief review of the importance of pre-processing your data before assembly, as well as some utilities that come built into the software package.
-
-## De novo Transcriptome Assembly: Trinity
-
-## Trinity: Overview
-
-We'll first start with quick overview, before we dive into the details of how Trinity as a whole functions. Trinity takes its name from the three software modules that make up the core of the assembler.
-
-The first component is Inchworm, and is responsible for assembling sequencing data into linear contigs. Next is Chrysalis, which groups constructed contigs that are related, (either due to alternative splicing or gene duplication), and builds de bruijn graphs. Lastly, Butterfly is examines input reads in the context of the de bruijn graphs, reporting the complete and final full-length transcripts and isoforms of transcripts.
-
-## Inchworm Algorithm
-
-Inchworm first decomposes reads into a catalog of overlapping kmers (overlapping 25-mers by default). This portion is very similar to the initial step of constructing a de Bruijn graph, though one isn't actually being built. The kmers are stored, along with frequency in reads. Edges between the kmers are not, to save on memory + computational resources.
-
-The single-most abundant kmer that has some level of reasonable sequence complexity is identified as a "seed" kmer, which is then extended at the 3' end and guided by the coverage of overlapping kmers. For every extension that occurs, there exist four possible kmers, with each ending with one of the four nucleotides.
-
-Each of the possible overlapping kmers is then looked up in the kmer catalog to determine frequency in reads
-For this specific example, the kmer ending with 'G' is found 4 times
-
-'A' is found once
-
-The kmer ending with 'T' doesn't exist in the reads, so its count is 0
-
-The kmer ending with 'C' is found 4 times
-
-Now a tie exists between 'G' and 'C'
-
-When a tie is encountered, the graph branches out, and the tied paths are explored recursively to locate the extension that provides the highest cumulative coverage
-
-In this case, the extension of two overlapping kmers ending with an 'A' provides the highest cumulative coverage, and the other paths are ignored
-
-
-
-
-
-
-
-## Contact
-If you have questions about the information in this workshop document, please contact:
-
-```
-Nathaniel Maki
-Bioinformatics Training Specialist
-MDI Biological Laboratory
-nmaki[at]mdibl.org
-```
\ No newline at end of file
diff --git a/intro_to_WSL.md b/intro_to_WSL.md
deleted file mode 100644
index b8f1860..0000000
--- a/intro_to_WSL.md
+++ /dev/null
@@ -1,132 +0,0 @@
----
-title: Introduction to the Windows Subsystem for Linux
-author: "Nathaniel Maki"
-contact: nmaki@mdibl.org
-organization: MDIBL Computational Core
-date: "May 28th"
----
-# Introduction to the Windows Subsystem for Linux
-
-## Learning Objectives
-* Get introduced to PowerShell
-* Learn how to enable the WSL feature-set
-* Install the Ubuntu Linux distribution from the Microsoft Store
-* Launch Ubuntu Linux and install a small piece of software
-
-## Summary
-* PowerShell, an integrated Command Line Interface (CLI) within Windows, has the capability to connect to remote machines via SSH
-* For Windows users, PowerShell acts as a robust test environment for basic command line proficiency to be established
-* If looking for a more comprehensive "Unix-on-Windows" experience, install and work within the Windows Subsystem for Linux (WSL)
-
-## A Couple Caveats
-
-Before we begin, there are a couple things that I'd like to point out, mostly to save you time.
-
-**If all you're looking to do is to remote into a server via SSH, then enabling WSL and installing Ubuntu Linux is probably overkill.**
-* If your Windows 10 release version is **1809** or higher, and you are running **PowerShell 5.1** or higher, you already have SSH capabilities
-* This means that you can remotely connect to a server via PowerShell without any additional installations
-
-### Launching PowerShell
-
-#### Open the start menu
-
-
-#### Search for powershell
-
-
-#### Select from the menu
-
-
-* To check which version of PowerShell you have installed, run the following command:
- * `Get-Host | Select-Object Version`
-* If you meet all of the criteria above, open a PowerShell window and remote away!
-
-* Additionally, if you'd like a nice quality of life addition, you can install Windows Terminal from the Microsoft Store
- * Windows Terminal is a fast, customizable, and modern terminal application, specifically built for Windows 10
- * It acts as a hub of sorts for Command Prompt, PowerShell, and WSL
- * Natively supports many terminal features that can be found on MacOS/Linux distributions
-
-With that out of the way, let's begin!
-
-## Windows Subsystem for Linux
-
-
-In a nutshell, WSL lets you execute a Linux environment directly in Windows. No virtual machine or VM software required.
-
-At the moment, you are constrained to command-line tools and applications, though Microsoft is working to bring full GUI
-applications to the platform.
-
-This lets you leverage the wide range of professional programs developed for Windows, along with
-the vast repositories of free and open source software built on and for Linux.
-
-* WSL allows for the running of many common command-line tools, including:
-* `grep`,`awk`,`sed`
-* Execution of Bash shell scripts
-* Linux CLI apps like tmux, vim, and emacs
-* Language support for Python, Ruby, NodeJS, etc
-* Utilization of the selected Linux distributions package manager: installation of additional software/tools
-
-## Enabling WSL
-
-**WSL is supported on Windows 10 version 1709 and higher**
-
-**If you are running an earlier build, you need to update your system to gain access to this feature**
-
-With the above requirement met, turning on WSL is pretty straightforward, especially as it comes baked into Windows.
-
-### If you are comfortable working on the command line
-* Open an elevated (administrator) PowerShell window
-    * To do so, right click on the **Windows PowerShell** application and select *Run as administrator*
-* Next, paste the following code into your open, elevated PowerShell window:
-    * `dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart`
-* Reboot your computer
-
-### If you'd rather use the Graphical User Interface (GUI)
-
-#### Open the start menu and search for *turn Windows features on or off*
-
-
-#### Scroll down and check the box marked *Windows Subsystem for Linux*
-
-
-#### Click OK and Reboot!
-
-## Installing a Linux Distribution
-
-#### Enabling WSL is the first step, the second being to select a Linux distribution to install from the Microsoft Store
-
-Microsoft offers a few distributions of Linux for install on its store, including Debian, OpenSUSE, Ubuntu, and Kali. For this example, we're going to work with Ubuntu.
-
-Developed by the company Canonical, Ubuntu is one of the most widely used and supported Linux distributions, in part due to it's stability and robust feature-set. It's LTS versions are supported for up to 5 years after launch!
-
-There are a couple LTS Ubuntu distributions on the store, but we're going to want the one marked simply **Ubuntu**, with no numbers next to the name.
-
-This is built off of the most recent release of the OS, 20.04, and came out in April of this year.
-
-#### Open up the Microsoft Store and search "Ubuntu"
-
-
-* From there, just click *install*, and once it finishes, *launch*
-* Follow along with the on screen prompts (creating a username/password, etc)
-* **Congratulations! You've successfully installed and configured Ubuntu Linux on your Windows machine!**
-* For a quick test-run, try installing the program Neofetch
- * Neofetch is a small tool that gives you an overview of your system resources, along with a neat little distribution-specific ASCII graphic
- * First, add the repository (software source): `sudo add-apt-repository ppa:dawidd0811/neofetch`
- * then run `sudo apt-get install neofetch`
-
-
-
-#### Navigating to the Windows Filesystem
-* By default, when launching your installed Linux distribution through WSL, you will be placed in your instance-specific home directory
-* To work with files/folders that exist on your Windows Filesystem, use the following command: `cd /mnt/c/`
-
-## Contact
-
-If you have questions about the information in this workshop document, please contact:
-
-```
-Nathaniel Maki
-Bioinformatics Research Training Specialist
-MDI Biological Laboratory
-nmaki[at]mdibl.org
-```
-
diff --git a/wsl_images/ms-store_ubuntu.png b/wsl_images/ms-store_ubuntu.png
deleted file mode 100755
index c81550d..0000000
Binary files a/wsl_images/ms-store_ubuntu.png and /dev/null differ
diff --git a/wsl_images/on-off_windows.png b/wsl_images/on-off_windows.png
deleted file mode 100755
index 03ab5db..0000000
Binary files a/wsl_images/on-off_windows.png and /dev/null differ
diff --git a/wsl_images/powershell.png b/wsl_images/powershell.png
deleted file mode 100644
index d70497c..0000000
Binary files a/wsl_images/powershell.png and /dev/null differ
diff --git a/wsl_images/search_powershell.png b/wsl_images/search_powershell.png
deleted file mode 100644
index 1ea5d45..0000000
Binary files a/wsl_images/search_powershell.png and /dev/null differ
diff --git a/wsl_images/start_menu.png b/wsl_images/start_menu.png
deleted file mode 100644
index ab30037..0000000
Binary files a/wsl_images/start_menu.png and /dev/null differ
diff --git a/wsl_images/ubuntu_terminal.png b/wsl_images/ubuntu_terminal.png
deleted file mode 100755
index 23157e6..0000000
Binary files a/wsl_images/ubuntu_terminal.png and /dev/null differ
diff --git a/wsl_images/wsl_box.png b/wsl_images/wsl_box.png
deleted file mode 100755
index 5feb088..0000000
Binary files a/wsl_images/wsl_box.png and /dev/null differ