Genomic screen for structured RNAs in Drosophila melanogaster and related insects
Background
Computational screens for conserved RNA structures (CRSs) have previously been performed in Drosophila melanogaster using sliding-window approaches on sequence-based whole-genome alignments. However, these alignments are prone to misalignment in regions with low sequence identity, as is often observed for regions encoding non-coding RNAs and structured RNA elements. Here, we make use of the structural realignment approach provided by CMfinder, identifying ~30,000 CRSs in D. melanogaster and 26 related insects. In addition, we observed differential expression of CRSs throughout D. melanogaster development and in several cell lines.
Data
Field nr | Header | Description |
01 | chrom | Chromosome |
02 | start | Start coordinate (dm6) |
03 | end | End coordinate (dm6) |
04 | motifid | CRS motif identifier |
05 | pscore | Pscore |
06 | strand | Strand |
07 | energy | Average free energy of sequences in the alignment (kcal/mol) |
08 | len | Alignment length (bp) |
09 | num | Number of species in the alignment |
10 | seqid | Average pairwise sequence identity of the alignment (%) |
11 | gc | GC content of the alignment (excluding gaps) |
12 | fdr | Estimated false discovery rate |
13 | realign | Realignment score (0 = no realignment, 1 = total realignment) |
14 | bp | Number of base pairs |
15 | infcont | Average mutual information content across all columns |
16 | entropy | Average relative entropy (Kullback-Leibler divergence) across all columns |
17 | comp | Number of compensatory base changes |
18 | seq | dm6 motif sequence |
19 | struc | dm6 motif structure |
20 | region | Identifier of the merged CRS region the CRS motif belongs to |
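The 20 fields above can be read into typed records with a few lines of Python. This is a minimal sketch, not the resource's own tooling. It assumes that the motif file is tab-separated, that columns appear in the listed order, and that an optional header line may be present. The function names (`read_motifs`, `filter_by_fdr`), the choice of integer versus float per field, and the sample row are hypothetical and only serve as illustration.

```python
import csv
import io

# Column order as listed in the field table above.
MOTIF_FIELDS = [
    "chrom", "start", "end", "motifid", "pscore", "strand", "energy", "len",
    "num", "seqid", "gc", "fdr", "realign", "bp", "infcont", "entropy",
    "comp", "seq", "struc", "region",
]
# Assumed numeric types; the actual file may format some values differently.
INT_FIELDS = ("start", "end", "len", "num", "bp")
FLOAT_FIELDS = ("pscore", "energy", "seqid", "gc", "fdr", "realign",
                "infcont", "entropy", "comp")


def read_motifs(handle):
    """Yield one dict per CRS motif from a tab-separated file handle."""
    reader = csv.DictReader(handle, fieldnames=MOTIF_FIELDS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["chrom"] == "chrom":  # skip a header line if one is present
            continue
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        yield row


def filter_by_fdr(motifs, max_fdr):
    """Keep motifs whose estimated false discovery rate is at most max_fdr."""
    return [m for m in motifs if m["fdr"] <= max_fdr]


# Made-up example row (all values are placeholders, not real data).
sample = ("chr2L\t100\t150\tM1\t55.2\t+\t-12.3\t50\t12\t71.5\t0.45\t0.05"
          "\t0.2\t14\t0.3\t0.8\t3\tACGU\t(..)\tR1\n")
motifs = list(read_motifs(io.StringIO(sample)))
confident = filter_by_fdr(motifs, max_fdr=0.1)
```

For a real file, the same functions would be called with an opened file handle in place of the `io.StringIO` wrapper; the FDR cutoff is a free parameter and not a recommendation from the resource.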

Field nr | Header | Description |
01 | chrom | Chromosome |
02 | start | Start coordinate (dm6) |
03 | end | End coordinate (dm6) |
04 | regionid | CRS region identifier |
05 | phastcons | PhastCons score |
06 | anno | All annotation classes the CRS region overlaps with (min. 1 bp) |
07 | expnr | Number of modENCODE tiling array experiments in which at least 50% of the CRS region is expressed |
08 | coexpr | Co-expression score of intergenic CRS regions that are expressed in at least 3 modENCODE tiling array experiments (min. 50% of the CRS region size) |
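The region table can be handled the same way. The sketch below rests on several assumptions that the field descriptions do not confirm: the file is tab-separated with columns in the listed order, multiple annotation classes in `anno` are comma-separated, and regions without a co-expression score carry an empty value, `NA`, or `.` in `coexpr`. The function names and the sample rows are hypothetical.

```python
import csv
import io

# Column order as listed in the region field table above.
REGION_FIELDS = ["chrom", "start", "end", "regionid",
                 "phastcons", "anno", "expnr", "coexpr"]


def parse_optional_float(text):
    """Return a float, or None for missing values (assumed: '', 'NA', '.')."""
    if text is None or text.strip() in ("", "NA", "."):
        return None
    return float(text)


def read_regions(handle, anno_sep=","):
    """Yield one dict per CRS region; anno_sep is an assumed separator."""
    reader = csv.DictReader(handle, fieldnames=REGION_FIELDS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["chrom"] == "chrom":  # skip a header line if one is present
            continue
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["phastcons"] = parse_optional_float(row["phastcons"])
        row["anno"] = [a for a in row["anno"].split(anno_sep) if a]
        row["expnr"] = int(row["expnr"])
        row["coexpr"] = parse_optional_float(row["coexpr"])
        yield row


def with_coexpression(regions, min_experiments=3):
    """Select regions that have a co-expression score and enough experiments."""
    return [r for r in regions
            if r["coexpr"] is not None and r["expnr"] >= min_experiments]


# Made-up example rows (identifiers, classes, and scores are placeholders).
sample_regions = (
    "chr2L\t100\t180\tR1\t0.91\tintergenic\t5\t0.62\n"
    "chr3R\t2000\t2100\tR2\t0.40\tintron,UTR\t1\tNA\n"
)
regions = list(read_regions(io.StringIO(sample_regions)))
scored = with_coexpression(regions)
```

The default of three experiments mirrors the threshold stated in the `coexpr` description; the separator and missing-value markers should be checked against the actual file before use.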
Reference
Identification and characterization of novel conserved RNA structures in Drosophila
Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J*
BMC Genomics. 2018 Dec 11;19(1):899
Contact
Jan Gorodkin, RTH, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Release history
The current release is 2.0.
Data version | Release date | Comments |
Version 2.0 | 2018-07-27 | Searchable interface. |
Version 1.0 | 2018-02-19 | Predicted CRSs in dm6 and 27-species tree. |