Changes from version 1.02 to version 1.03

The original article used a pre-release version of one of the annotation tools, snoStrip. During proofreading, the snoStrip tool was released and published [1]. In this note, we bring the results of our original paper into accordance with the release version of snoStrip. Only the snoRNA annotation is affected, and the update induces no changes in conflicts of annotation. In summary, we found 621 high confident snoRNA loci using the combined snoStrip, BLAST, and Infernal-1.02/Rfam 10.1 annotation tools, compared to 638 high confident snoRNA loci when using the pre-release version of the snoStrip tool. The difference between this and the previous annotation is caused by the removal of 32 snoRNA loci and the addition of 16 new loci, annotated exclusively by the pre-release and the release versions of snoStrip, respectively. In addition, two snoRNA loci (snoU2-30 and snoU2-19) from the previous version were combined into one locus (SCARNA9) in the current version.
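The locus counts quoted above can be reconciled with a few lines of arithmetic. The following Python sketch is only a bookkeeping check; the variable names are illustrative and are not part of any annotation tool.

```python
# Bookkeeping check for the snoRNA locus counts quoted in the text.
# All variable names are illustrative; only the numbers come from the note.
previous_loci = 638  # high confident snoRNA loci with pre-release snoStrip
removed = 32         # loci annotated exclusively by the pre-release version
added = 16           # loci annotated exclusively by the release version
merged_away = 1      # snoU2-30 and snoU2-19 became a single locus (SCARNA9)

current_loci = previous_loci - removed + added - merged_away
print(current_loci)  # 621
```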

The curation of the snoRNAs was also revisited. The newly annotated snoRNA loci were curated according to the methods outlined in the article, resulting in 24 new curated snoRNA loci, while 3 curated snoRNA loci were removed from the high confident set, and thus also from the curated set (these are syntenic pseudogenes in human and pig). SCARNA9, which comprised two different snoRNA annotations in the previous version, was curated as well.

As a final note, an rRNA incorrectly marked as curated in the previous annotation was removed from the curated set of genes in the current version.

The updated annotation as well as a more detailed changelog are available at http://rth.dk/resources/rnannotator/susscr102/version1.03

snoStrip is an annotation pipeline designed specifically for homology based annotation of snoRNAs, taking into account factors beyond sequence and structure conservation. During the development phase a number of internal cutoffs and procedures were changed, which altered the results we originally used for the publication. New snoRNA loci were introduced and others were removed. The original annotation was based in part on 513 snoStrip results. The release version changed this total to 504 loci, which is the result of the removal of 67 loci and the addition of 58 new ones. Based on hand curation we decided to keep 12 of the 67 loci, which cover two otherwise lost snoRNAs, SNORD89 and SNORD117, as well as four members of SNORD113, four members of SNORA70, and two members of SNORA22. That gives 516 snoStrip-based annotations for pig in total.
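The snoStrip-only counts in this paragraph can be checked the same way. In the sketch below, the dictionary of retained loci assumes that SNORD89 and SNORD117 contribute one locus each, which is our reading of "two otherwise lost snoRNAs"; all names are illustrative.

```python
# Check of the snoStrip-only counts given in the text (names are illustrative).
prerelease = 513
removed, added = 67, 58
release = prerelease - removed + added  # loci reported by the release version

# Loci kept after hand curation. Assumption: SNORD89 and SNORD117 are one
# locus each; the other counts are the family members listed in the text.
retained = {"SNORD89": 1, "SNORD117": 1, "SNORD113": 4, "SNORA70": 4, "SNORA22": 2}

total = release + sum(retained.values())
print(release, sum(retained.values()), total)  # 504 12 516
```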

The snoRNA annotation presented in the paper was a combination of high confident annotation based on snoStrip, BLAST, and Infernal 1.02/Rfam 10.1. In Table 2, we present the updated results of the homology based pipeline. After combination of the three methods we found 621 annotated snoRNA loci, compared to 638 snoRNA loci using the pre-release version. This change in the total number of annotated snoRNAs reflects that the release version of snoStrip has a larger overlap with Infernal/Rfam than the pre-release version, and therefore introduces fewer loci annotated exclusively by snoStrip. The number of conserved (syntenic) structured RNA loci and the number of structured RNA loci overlapping RNAz loci are also affected by the changes in the snoRNA annotation. The effect can be seen in the updated Tables 5 and 6. The curated annotation was updated as well, as shown in Table 7.


Table 2: Results of the homology based pipeline

                         High
RNA class          Families      Loci
cisreg-elements          31       139
lncRNA-loci              58        58
miRNA                   321       359
ribozyme                  3         8
rRNA                      5       185
snoRNA                  209       621
snRNA                    10     1,030
tRNA                     51       810
Other                     7       153
Conflict                  9        11
Sum                     704     3,374

Updated Table 2 of the original article. Changed numbers are marked in italics in the table. The changes are the results of the updated snoStrip snoRNAs. The combined results of the sequence similarity search, structure homology search and class specific tools at the high confident cutoff level (see Table 1). The column RNA class contains cisreg-elements: cis-regulatory elements from Rfam/Infernal; lncRNA-loci: Infernal lncRNA structure loci, that is, structural loci from larger genes (lncRNAs); the next 7 rows contain (full length) ncRNA genes, miRNA: BLAST from miRBase and miRDeep predictions; ribozyme: ribozymes from Rfam/Infernal; rRNA: ribosomal RNAs primarily from RNAmmer; snRNA and snoRNA: BLAST results and results from Infernal/Rfam; tRNA: tRNAs from BLAST, tRNAscan-SE and Infernal/Rfam; other: RNA families from Rfam not belonging to one of the other classes; conflict: conflicts of annotation. Loci are the number of RNA loci of a given class; Families are a subdivision of classes into RNAs with the same name. 12 tRNAs and 15 miRNAs were moved to the medium confident annotation as part of the curation procedure. See text for details. Note that for the final high confident annotation we add 165 RNA-seq based miRNA candidates, reaching the total of 3,539 high confident RNA loci.
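As a consistency check, the column sums of Table 2 and the final total of 3,539 loci can be recomputed from the per-class values. The sketch below is illustrative only. It reads the snRNA row as 10 families and 1,030 loci, and it assumes a ribozyme loci count of 8, the value listed for ribozymes in Tables 5 to 7.

```python
# (families, loci) per RNA class, transcribed from Table 2.
# Assumptions: snRNA is read as 10 families / 1,030 loci, and the ribozyme
# loci count is taken to be 8, as listed in Tables 5 to 7.
table2 = {
    "cisreg-elements": (31, 139),
    "lncRNA-loci": (58, 58),
    "miRNA": (321, 359),
    "ribozyme": (3, 8),
    "rRNA": (5, 185),
    "snoRNA": (209, 621),
    "snRNA": (10, 1030),
    "tRNA": (51, 810),
    "Other": (7, 153),
    "Conflict": (9, 11),
}

families_sum = sum(families for families, _ in table2.values())
loci_sum = sum(loci for _, loci in table2.values())

mirdeep_candidates = 165  # RNA-seq based miRNA candidates added afterwards
final_total = loci_sum + mirdeep_candidates

print(families_sum, loci_sum, final_total)  # 704 3374 3539
```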


Table 5: Synteny of the RNAs of the homology based pipeline

                                        hg19             RNAs conserved in N other organisms
RNA class            # loci  Syntenic Conserved      Both         1         5        15
cisreg-elements         139        80        84        65       116        86        31
lncRNA-loci              58        57        53        53        58        57
miRNA                   369       303       349       292       360       349       102
putative-miRNA          155       121        25        20        65        25
ribozyme                  8         8         3         3         3         3
rRNA                    185       143         0         0         6         1
snoRNA                  621       457       268       225       406       286        42
snRNA                 1,030       674        24        13       119        26
tRNA                    810       549       274       199       389       284        14
other                   153       111        10        10        17        11
Conflict                 11        10        10        10        10        10
Sum                   3,539     2,513     1,100       890     1,549     1,138       205

The columns are: RNA class; # RNA loci; # loci in human syntenic blocks; # loci conserved in human by 80% sequence identity; # loci both syntenic and conserved, that is, the number of ncRNAs in syntenic blocks where the ncRNA is actually conserved in human; and # loci conserved in N other organisms, grouped by number of loci conserved in at least 1, 5, or 15 other organisms. Conservation is determined by the sequence identity in the pairwise alignments. The RNA loci are located in the pairwise alignments, and the sequence identity is calculated when at least 80% of an RNA locus is covered. The RNA locus is counted as conserved in that organism if the locus has a sequence identity of at least 80%. The table gives the number of RNAs conserved in at least N (N = 1, N = 5, or N = 15) of the other genomes: bosTau5, canFam2, choHof1, danRer7, dasNov2, echTel1, equCab2, eriEur1, felCat4, galGal3, hg19, loxAfr3, mm9, monDom5, ornAna1, rn4, oryCun2, tarSyr1, turTru1, xenTro2.
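The conservation rule described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the article: the input layout (a mapping from assembly name to coverage and identity fractions) and the function names are hypothetical, and how the identity itself is computed from the pairwise alignment is left open.

```python
from typing import Dict, Tuple

COVERAGE_MIN = 0.80  # fraction of the locus covered by the pairwise alignment
IDENTITY_MIN = 0.80  # sequence identity required to call the locus conserved


def conserved_organisms(alignments: Dict[str, Tuple[float, float]]) -> int:
    """Count the organisms in which one RNA locus is conserved.

    `alignments` maps an assembly name to (coverage, identity), both given
    as fractions in [0, 1]. This input layout is hypothetical.
    """
    return sum(
        1
        for coverage, identity in alignments.values()
        if coverage >= COVERAGE_MIN and identity >= IDENTITY_MIN
    )


def conservation_bins(alignments, thresholds=(1, 5, 15)):
    """Report whether the locus is conserved in at least N organisms."""
    n = conserved_organisms(alignments)
    return {t: n >= t for t in thresholds}


# Made-up example values for a single locus.
example = {
    "hg19": (0.95, 0.91),     # covered and conserved
    "mm9": (0.85, 0.78),      # covered, identity too low
    "bosTau5": (0.60, 0.99),  # coverage too low, identity not considered
    "canFam2": (1.00, 0.80),  # exactly at both thresholds, counted
}
bins = conservation_bins(example)
print(bins)  # {1: True, 5: False, 15: False}
```

A locus contributes to the N = 1, N = 5, or N = 15 column of Table 5 whenever the corresponding entry is true, so the three columns are cumulative rather than disjoint.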


Table 6: Overlap of the RNAz predicted de novo with the high confident annotation

                      Annotation          RNAz overlap
RNA class          Families      Loci  Families      Loci
cisreg-elements          31       139         4        10
lncRNA-loci              58        58         3
miRNA                   330       369       222       241
putative-miRNA          135       155        14        16
ribozyme                  3         8         0
rRNA                      5       185         2
snoRNA                  209       621        57        74
snRNA                    10     1,030         9        20
tRNA                     51       810        36       154
other                     7       153         3
conflict                  9        11         4
sum                     848     3,539       354       530

Updated Table 6 of the original article. Changed numbers are marked in italics in the table. The changes are the results of the updated snoStrip snoRNAs. Comparison of the strand specific RNAz results with the result of the automatic annotation pipeline. The columns are: RNA class, # RNA families in the high confident annotation, # RNA loci in the high confident annotation, # RNA families that overlap with the RNAz predictions, and # RNA loci that overlap with the RNAz predictions.
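A strand specific overlap count of the kind summarized in Table 6 could be computed along the following lines. This is a sketch under assumptions: the note does not state the minimum overlap required, so any overlap of at least one nucleotide on the same chromosome and strand is counted here, and the `Interval` type, the function names, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one position on the same strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def count_overlapping(annotated: List[Interval], predicted: List[Interval]) -> int:
    """Number of annotated loci overlapped by at least one prediction."""
    return sum(any(overlaps(a, p) for p in predicted) for a in annotated)


# Made-up coordinates for illustration only.
annotated = [
    Interval("chr1", 100, 200, "+"),
    Interval("chr1", 300, 400, "-"),
    Interval("chr2", 50, 120, "+"),
]
predicted = [
    Interval("chr1", 150, 260, "+"),  # overlaps the first locus
    Interval("chr1", 350, 380, "+"),  # wrong strand for the second locus
    Interval("chr2", 119, 300, "+"),  # overlaps the third locus by 1 nt
]
print(count_overlapping(annotated, predicted))  # 2
```

The quadratic scan is adequate for a few thousand loci; for genome-wide prediction sets a sorted sweep or an interval tree would be the usual replacement.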


Table 7: Curated annotation

                  # high confident loci  # curated loci  # pseudogenes  # loci, # pseudogenes subtracted
cisreg-elements                     139              93             31                               108
lncRNA-loci                          58               0              0                                58
miRNA                               369             125              0                               369
putative-miRNA                      155               0              0                               155
ribozyme                              8               1              5                                 3
rRNA                                185               2            182                                 2
snoRNA                              621             289            251                               370
snRNA                             1,030              69            960                                70
tRNA                                810               0              0                               810
Other                               153               3            125                                28
Conflict                             11               8              0                                11
Sum                               3,539             590          1,555                             1,984

Updated Table 7 of the original article. Changed numbers are marked in italics in the table. The changes are the results of an incorrectly curated rRNA in the original annotation and of the updated snoStrip snoRNAs. The high confident annotation is a combination of the results of the high confident homology pipeline and the miRDeep results. See Table 2 for row labels. Column labels: high confident is the high confident annotation prior to curation; curated is the number of loci curated by the methods explained in the text; pseudogenes are the loci expected to be PolII/PolIII transcripts but failing to be so, ribosomal RNAs not part of the cluster on chromosome 6, and cis-regulatory elements without gene context. In the column, overlaps to structured RNA loci annotated by homology as well as putative miRNAs are given. The curated annotation contains loci that are a) curated or b) not tested in the curation procedure, e.g., miRNA loci. High confident is the complete high confident annotation (homology + miRDeep). 8 miRNAs detected by miRDeep, but not by high confident BLAST, were re-annotated in the section "miRNAs in the pig genome".

References

1.    Bartschat S, Kehr S, Tafer H, Stadler PF, Hertel J: snoStrip: A snoRNA annotation pipeline. Bioinformatics 2014, 30(1):115–116.