2.4 Annotation

Description

A functional annotation was then performed on the bins to see which functional genes they contain.

Method and tools

The annotation was done with the software pipeline Prokka, which combines structural and functional annotation of prokaryotic genomes. The command was run with --force, which overwrites any existing output directory, and --addgenes, which adds a gene feature for each CDS feature. The bins labeled as potential Archaea were also run with the additional --kingdom Archaea option, as the Prokka default is 'Bacteria'. The script performing this step was 07_combined_annotation.sh.
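The options described above can be sketched as a small helper that assembles one Prokka call per bin. This is only an illustration, not the contents of 07_combined_annotation.sh: the function name, the input path and the output directory are hypothetical, and the use of --outdir and --prefix is an assumption about how the output might be organized. The code only builds the argument list and does not run Prokka.

```python
from pathlib import Path


def build_prokka_command(bin_fasta, outdir, archaeal=False):
    """Return the argument list for one Prokka run on a single bin.

    bin_fasta and outdir are hypothetical paths chosen for illustration.
    """
    bin_fasta = Path(bin_fasta)
    cmd = [
        "prokka",
        "--outdir", str(Path(outdir) / bin_fasta.stem),  # assumed layout: one folder per bin
        "--prefix", bin_fasta.stem,                      # assumed: name output files after the bin
        "--force",     # overwrite an existing output directory
        "--addgenes",  # add a gene feature for each CDS feature
    ]
    if archaeal:
        # Prokka defaults to Bacteria, so archaeal bins need this explicitly.
        cmd += ["--kingdom", "Archaea"]
    cmd.append(str(bin_fasta))
    return cmd


if __name__ == "__main__":
    # Hypothetical example; which bins are archaeal is not stated here.
    print(" ".join(build_prokka_command("bins/bin_4.fa", "annotation", archaeal=True)))
```

The returned list could be passed to subprocess.run on a system where Prokka is installed.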

Results

The annotation produced several output files for each bin, describing the number of genes identified and their potential functions. A summary of these results is shown in Table 1.

Table 1. Summary of annotation results

Bin name	CDS	tRNA	rRNA	tmRNA	Hypothetical proteins
bin_1	1819	29			1819
bin_2	2418	32			953
bin_4	1182	40	1		610
bin_6	1289	34			582
bin_8	1530	9			633
bin_11	2872	52	2	1	1316
bin_12	2415	36		1	812
bin_14	1545	9		1	463
bin_15	1984	38	1		874
bin_17	1170	13			540
bin_18	1385	25		1	456
bin_19	1717	31	6	1	515
bin_20	688	11			344
bin_24	1508	36	1	1	584
bin_25	1297	11			575
bin_26	1405	36	1	1	575
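The feature counts in Table 1 could be collected from Prokka's per-bin summary text file. The sketch below assumes that this file consists of simple "key: value" lines such as "CDS: 2872"; that format, the function names and the sample text are assumptions made for illustration, with the sample counts taken from the bin_11 row of Table 1. The number of hypothetical proteins is not covered here and would have to be counted separately.

```python
FEATURES = ("CDS", "tRNA", "rRNA", "tmRNA")


def parse_prokka_summary(text):
    """Collect integer-valued 'key: value' lines into a dict.

    Assumes the summary file uses one 'key: value' pair per line.
    """
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counts[key.strip()] = int(value)
    return counts


def table_row(bin_name, counts):
    """Format one tab-separated row; missing features become empty cells."""
    return "\t".join([bin_name] + [str(counts.get(f, "")) for f in FEATURES])


if __name__ == "__main__":
    # Illustrative sample; counts are copied from the bin_11 row of Table 1.
    sample = "organism: Genus species strain\nCDS: 2872\ntRNA: 52\nrRNA: 2\ntmRNA: 1\n"
    print(table_row("bin_11", parse_prokka_summary(sample)))
```

Feature types that Prokka did not find in a bin are simply absent from the parsed counts, which is why they end up as empty cells, as in Table 1.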

Discussion

It is hard to evaluate how well the annotation performed, but the results look promising and will most likely be good enough for the following steps.

Questions from student manual

Annotation

What types of features are detected by the software? Which ones are more reliable a priori?
Prokka detected the following feature types: CDS, tRNA, rRNA and tmRNA. Of these, I would say that the rRNA genes, and perhaps the tRNA/tmRNA genes, are the most reliable. The highly conserved nature of these genes should make them easier to identify correctly.

How many features of each kind are detected in your contigs? Do you detect the same number of features as the authors? How do they differ?
The number of features per bin is shown in Table 1, but the original article did not publish these results, so no comparison can be made.

Why is it more difficult to do the functional annotation in eukaryotic genomes?
The presence of splicing makes it harder to classify the function of genes, and can even make it harder to identify the correct ORF. This is primarily because of alternative splicing and the chance of stop codons appearing within introns.

How many genes are annotated as ‘hypothetical protein’? Why is that so? How would you tackle that problem?
The number of genes annotated as hypothetical proteins in each bin is shown in Table 1. These genes have been identified as possibly coding for proteins, but have no matches to any of the proteins in Prokka's databases. One way to tackle this problem would be to compare these sequences to other databases that might have them on record. It is also possible to compare them to other species, as functional genes are often conserved to some degree. A BLAST search could be an option for this.
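To put the hypothetical protein counts in proportion, the share of CDS annotated as hypothetical can be computed per bin. The sketch below uses the CDS and hypothetical protein columns of Table 1 as they appear there; the variable and function names are chosen for illustration only.

```python
# (CDS, hypothetical proteins) per bin, copied from Table 1.
TABLE_1 = {
    "bin_1": (1819, 1819), "bin_2": (2418, 953), "bin_4": (1182, 610),
    "bin_6": (1289, 582), "bin_8": (1530, 633), "bin_11": (2872, 1316),
    "bin_12": (2415, 812), "bin_14": (1545, 463), "bin_15": (1984, 874),
    "bin_17": (1170, 540), "bin_18": (1385, 456), "bin_19": (1717, 515),
    "bin_20": (688, 344), "bin_24": (1508, 584), "bin_25": (1297, 575),
    "bin_26": (1405, 575),
}


def hypothetical_fraction(cds, hypothetical):
    """Fraction of CDS features annotated as hypothetical proteins."""
    return hypothetical / cds


if __name__ == "__main__":
    # List bins from the highest to the lowest share of hypothetical proteins.
    ranked = sorted(
        TABLE_1.items(),
        key=lambda item: hypothetical_fraction(*item[1]),
        reverse=True,
    )
    for name, (cds, hyp) in ranked:
        print(f"{name}\t{cds}\t{hyp}\t{hypothetical_fraction(cds, hyp):.1%}")
```

Bins at the top of such a ranking would be natural first candidates for a follow-up search against other databases. A bin in which every CDS is hypothetical, as the bin_1 row of Table 1 reads, stands out and may be worth re-checking.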

How can you evaluate the quality of the obtained functional annotation?
You could compare your results to earlier studies or public databases that might contain functional regions of genomes from the same species or closely related ones. Another option is to use your biological knowledge to assess how plausible the annotation is in its context. The most accurate and extensive method of evaluating the quality of the predicted annotation, however, is to perform actual in vivo tests, for example knock-out studies.

How comparable are the results obtained from two different structural annotation softwares?
They are directly comparable when looking at prokaryotic genes, as each gene results in only one type of product and function. The exceptions to this are multipurpose proteins.

It becomes more complex for eukaryotic genes because of the possibility of alternative splicing, which makes it possible for a single gene to produce several proteins with different functions. Two software tools might thus assign different functions to the same gene even though both are correct.
