Genomic screen for structured RNAs in Drosophila melanogaster and related insects

Background

Computational screens for conserved RNA structures (CRSs) have previously been performed in Drosophila melanogaster using sliding-window approaches on sequence-based whole-genome alignments. However, such alignments are prone to errors in regions of low sequence identity, as is often the case for regions encoding non-coding RNAs and structured RNA elements. Here, we use the structural realignment approach provided by CMfinder to identify ~30,000 CRSs in D. melanogaster and 26 related insects. In addition, we observed differential expression of CRSs throughout D. melanogaster development and in several cell lines.

Data

CRS motifs

Field  Header     Description
01     chrom      Chromosome
02     start      Start coordinate (dm6)
03     end        End coordinate (dm6)
04     motifid    CRS motif identifier
05     pscore     Pscore
06     strand     Strand
07     energy     Average free energy of the sequences in the alignment (kcal/mol)
08     len        Alignment length (bp)
09     num        Number of species in the alignment
10     seqid      Average pairwise sequence identity of the alignment (%)
11     gc         GC content of the alignment (excluding gaps)
12     fdr        Estimated false discovery rate
13     realign    Realignment score (0 = no realignment, 1 = total realignment)
14     bp         Number of base pairs
15     infcont    Average mutual information content across all columns
16     entropy    Average relative entropy (Kullback-Leibler divergence) across all columns
17     comp       Number of compensatory base changes
18     seq        dm6 motif sequence
19     struc      dm6 motif structure
20     region     Identifier of the merged CRS region to which the CRS motif belongs
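
The field list above documents only the column order. The following minimal Python sketch reads such records; it assumes, which this page does not state, that motif records are stored as tab-separated lines with the 20 fields in the listed order and no header line, that every numeric field is populated, and that `struc` is a dot-bracket string. The FDR cutoff of 0.1 is an arbitrary illustrative default. `CrsMotif`, `parse_motif_line`, `count_base_pairs` and `filter_by_fdr` are hypothetical names, not part of the data resource.

```python
from dataclasses import dataclass


@dataclass
class CrsMotif:
    """One CRS motif record; attribute order follows fields 01-20 above."""
    chrom: str
    start: int
    end: int
    motifid: str
    pscore: float
    strand: str
    energy: float   # kcal/mol
    length: int     # field "len" (renamed to avoid shadowing the builtin)
    num: int
    seqid: float    # percent
    gc: float
    fdr: float
    realign: float
    bp: int
    infcont: float
    entropy: float
    comp: int
    seq: str
    struc: str
    region: str


def parse_motif_line(line: str) -> CrsMotif:
    """Parse one record; assumes 20 tab-separated columns and no header line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 20:
        raise ValueError(f"expected 20 columns, got {len(f)}")
    return CrsMotif(
        chrom=f[0], start=int(f[1]), end=int(f[2]), motifid=f[3],
        pscore=float(f[4]), strand=f[5], energy=float(f[6]),
        length=int(f[7]), num=int(f[8]), seqid=float(f[9]), gc=float(f[10]),
        fdr=float(f[11]), realign=float(f[12]), bp=int(f[13]),
        infcont=float(f[14]), entropy=float(f[15]), comp=int(f[16]),
        seq=f[17], struc=f[18], region=f[19],
    )


def count_base_pairs(struc: str) -> int:
    """Count '(' ... ')' pairs in a dot-bracket string.

    Any other character ('.', or pseudoknot brackets such as '[' and ']')
    is treated as unpaired here, which is a simplifying assumption.
    """
    depth = 0
    pairs = 0
    for ch in struc:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced structure: unmatched ')'")
            depth -= 1
            pairs += 1
    if depth != 0:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def filter_by_fdr(motifs: list[CrsMotif], max_fdr: float = 0.1) -> list[CrsMotif]:
    """Keep motifs whose estimated FDR is at most max_fdr (0.1 is arbitrary)."""
    return [m for m in motifs if m.fdr <= max_fdr]
```

Comparing `count_base_pairs(m.struc)` with `m.bp` gives a quick sanity check on a parsed line. Note that `bp` may refer to the consensus structure of the alignment rather than the dm6 structure, in which case the two values need not agree.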

CRS regions

Field  Header     Description
01     chrom      Chromosome
02     start      Start coordinate (dm6)
03     end        End coordinate (dm6)
04     regionid   CRS region identifier
05     phastcons  PhastCons score
06     anno       All annotation classes the CRS region overlaps (minimum overlap 1 bp)
07     expnr      Number of modENCODE tiling array experiments in which at least 50% of the CRS region is expressed
08     coexpr     Co-expression score of intergenic CRS regions that are expressed over at least 50% of their length in at least 3 modENCODE tiling array experiments
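
A minimal Python sketch for reading CRS region records, assuming tab-separated lines with the 8 fields above and no header line. Two further assumptions are flagged in the code: that `anno` lists annotation classes separated by commas, and that `coexpr` holds a placeholder such as `NA` for regions without a co-expression score (the score is only defined for intergenic regions expressed in at least 3 experiments). The `motifs_by_region` helper links motifs to regions through motif field 20 (`region`) and region field 04 (`regionid`). All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Assumed placeholders for an undefined co-expression score.
MISSING = {"", "NA", "na", "-", "."}


@dataclass
class CrsRegion:
    """One CRS region record; attribute order follows fields 01-08 above."""
    chrom: str
    start: int
    end: int
    regionid: str
    phastcons: float
    anno: list[str]          # assumed comma-separated in the file
    expnr: int
    coexpr: Optional[float]  # None where no score is defined


def parse_region_line(line: str) -> CrsRegion:
    """Parse one record; assumes 8 tab-separated columns and no header line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 8:
        raise ValueError(f"expected 8 columns, got {len(f)}")
    return CrsRegion(
        chrom=f[0], start=int(f[1]), end=int(f[2]), regionid=f[3],
        phastcons=float(f[4]),
        anno=[a for a in f[5].split(",") if a],
        expnr=int(f[6]),
        coexpr=None if f[7] in MISSING else float(f[7]),
    )


def expressed_regions(regions: list[CrsRegion], min_experiments: int = 3) -> list[CrsRegion]:
    """Regions counted as expressed in at least min_experiments experiments."""
    return [r for r in regions if r.expnr >= min_experiments]


def motifs_by_region(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (motifid, region) pairs, i.e. motif fields 04 and 20, by region id."""
    groups = defaultdict(list)
    for motifid, regionid in pairs:
        groups[regionid].append(motifid)
    return dict(groups)
```

Filtering with `expressed_regions` mirrors the 3-experiment threshold in the `coexpr` description; the 50%-of-length criterion is already part of how `expnr` is counted and is not re-checked here.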

Reference

Identification and characterization of novel conserved RNA structures in Drosophila
Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J. BMC Genomics. 2018 Dec 11;19(1):899.


Contact

Jan Gorodkin, RTH, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark

Release history

The current release is 2.0.
Data version  Release date  Comments
Version 2.0   2018-07-27    Searchable interface.
Version 1.0   2018-02-19    Predicted CRSs in dm6 and 27-species tree.