Conserved RNA Structures (CRSs) in the vertebrate genome
This resource is based on our computational screen of the human-centered 100-way vertebrate sequence alignment from the UCSC Genome Browser for Conserved RNA secondary Structures (CRSs) with CMfinder. This page explains how to navigate the website and where the data come from.
Selected new features
- We provide a redundancy-reduced set of CRSs with FDR≤10% in which any two overlapping CRSs (originating from the same CRS region) share at most 40% of their base pairs. This resource of 127,942 CRSs has been exported to RNAcentral. It is named Low FDR and non-redundant CRSs.
- The selection ratio has been improved since the publication of the resource, including an update of the local neutral model and the introduction of a false discovery rate calculation for the selection ratio (FDR(SR)).
- Covariation analysis with R-scape for an independent test of evolutionary support of the conserved RNA structure of CRSs.
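The 40% base pair criterion in the first item can be illustrated with a minimal sketch. It assumes that both structures are given in dot-bracket notation on a shared coordinate system and that the shared fraction is taken relative to the structure with fewer base pairs; the page does not state which denominator the resource uses, so this choice and the function names `base_pairs` and `shared_pair_fraction` are illustrative only.

```python
def base_pairs(dot_bracket, start=0):
    """Return the set of (i, j) paired positions of a dot-bracket string.

    `start` shifts positions onto a shared coordinate system
    (hypothetical; e.g. the offset of the CRS within its CRS region).
    """
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((start + stack.pop(), start + i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def shared_pair_fraction(pairs_a, pairs_b):
    """Fraction of base pairs in common, relative to the smaller structure.

    The denominator is an assumption made for this sketch.
    """
    smaller = min(len(pairs_a), len(pairs_b))
    if smaller == 0:
        return 0.0
    return len(pairs_a & pairs_b) / smaller


if __name__ == "__main__":
    crs_a = base_pairs("((..))", start=100)
    crs_b = base_pairs("(....)", start=100)
    # Under the assumed definition, a value above 0.4 would mark the pair as redundant.
    print(shared_pair_fraction(crs_a, crs_b))
```

Two overlapping CRSs would be kept together in the non-redundant set only if this fraction does not exceed 0.4 under the assumed definition.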
Help about the "Search" page
The data source can be searched for CRSs that
- are inside a genomic region of any of the vertebrate genomes of UCSC's 100 species tree (default: Human assembly hg38),
- are near or overlapping a human gene (partial matches to gene symbols in the GENCODE annotation are allowed), and
- have specific structure/alignment features.
The list of CRSs matching the query will be separated into
Low FDR and non-redundant CRSs and
High FDR and redundant CRSs. The CRSs are listed in sortable tables of their genomic locations, FDR and any additional queried features. By default the table is sorted by genomic coordinates.
If you know the identifier of a CRS or CRS region, you can also search for it directly (this ignores all other selections). To look up more than one CRS or CRS region identifier, list them all separated by ','.
Use the
[Example] links to get started. Hover over the
[?] next to the query fields with the mouse cursor to get more information.
Help about the "Result" page
Hint: Additional information on many of the annotation tracks is available by hovering over the annotation titles with the mouse cursor.
[UCSC Genome browser]
Link to the hg38 location of the respective CRS in the UCSC Genome Browser. The link also adds the project's track hub with the Hub Name
CMfinder vert screen for Assembly
hg38 consisting of the tracks
CaptureSeq (CRS targeting CaptureSeq) with subtracks
42deg (annealing temperature of 42 degrees) and
70deg (annealing temperature of 70 degrees),
CMfinder CRSs (CMfinder predicted CRSs (pscore>50)) and
probes (60nt long CaptureSeq probes).
[CRS region]
The link shows a sortable table of all CRSs that are located in the same CRS region (a region of overlapping CRSs).
+ Summary
General features of the CRSs. Alignment specific features have been retrieved from both the
100 species tree and
17 species tree.
+ Consensus structure
The CMfinder-predicted consensus secondary structure is shown in dot-bracket notation and as an interactive drawing by
forna for the
100 species tree alignment. Bases in the interactive structure are labeled with the consensus sequence and colored by relative entropy (also called Kullback–Leibler divergence) in bits (from 0 as purple to 2 as dark red). As reference base probabilities we use 0.25 for all four nucleotides. The structure mapped to the
17 species tree alignment is available as
PNG; its bases are labeled with the human sequence and
colored by the base pair conservation in the 17 species tree alignment.
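As an illustration of the coloring scale, the relative entropy of a single alignment column against the uniform reference (0.25 per nucleotide) can be sketched as follows. Ignoring gaps and ambiguous symbols and treating T as U are assumptions of this sketch, not statements about how the site computes its values; the function name is hypothetical.

```python
import math
from collections import Counter

# Reference base probabilities as stated on this page: 0.25 for each nucleotide.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}


def relative_entropy_bits(column):
    """Kullback-Leibler divergence (in bits) of a column from the uniform reference.

    Assumption: gaps and ambiguous symbols are ignored, and T is counted as U.
    """
    bases = [b for b in column.upper().replace("T", "U") if b in BACKGROUND]
    if not bases:
        return 0.0
    total = len(bases)
    score = 0.0
    for base, count in Counter(bases).items():
        p = count / total
        score += p * math.log2(p / BACKGROUND[base])
    return score


if __name__ == "__main__":
    for column in ("AAAA", "AACC", "ACGU"):
        print(column, relative_entropy_bits(column))
```

A fully conserved column reaches the maximum of 2 bits (dark red on the scale above), while a column with all four nucleotides equally frequent gives 0 bits (purple).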
+ Alignment
CRS alignment based on the CMfinder predicted covariance model for both the
100 species tree and the
17 species tree (the latter is a subset of the former with gap columns being removed). The interactive alignments use the
JSAV - JavaScript Sequence Alignment Viewer. It enables the user to hide and select sequences, change the color scheme and retrieve the selected sequences as a
FASTA file. In the
FASTA file all the gap columns (which occurred due to the removal of sequences) are removed. The genomic coordinates of the sequences in the CRS alignment can be seen in the viewer and can be retrieved as
TXT by the
Export Fasta button. In addition, the alignments can be retrieved in
STOCKHOLM format or as static
PNG image
colored by the base pair conservation in the respective alignment.
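The gap column removal described for the exported FASTA file can be sketched as below. This is not the viewer's own code; it assumes an alignment given as (name, aligned sequence) pairs and treats '-' and '.' as gap characters, and the function name and species labels are hypothetical.

```python
def drop_gap_only_columns(records, gap_chars="-."):
    """Remove alignment columns that contain only gaps.

    `records` is a list of (name, aligned_sequence) tuples of equal length.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("aligned sequences must have equal length")
    # Keep a column if at least one remaining sequence has a residue there.
    keep = [
        i for i in range(length)
        if any(seq[i] not in gap_chars for _, seq in records)
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


if __name__ == "__main__":
    alignment = [
        ("hg38", "AC-G-U"),
        ("mm10", "A--GCU"),
        ("galGal", "ACAG-U"),
    ]
    # Deselecting one sequence leaves a column that holds only gaps.
    selected = [rec for rec in alignment if rec[0] != "galGal"]
    for name, seq in drop_gap_only_columns(selected):
        print(f">{name}\n{seq}")
```

Columns in which at least one selected sequence has a residue are kept, so the remaining sequences stay aligned to each other.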
+ Overlap with RNAcentral
We have integrated CRSs with a false discovery rate lower than or equal to 10% into
RNAcentral using their genomic locus in 29 vertebrate species. CRSs were excluded if they matched known structured RNAs from Rfam. In this Result subsection, you find the overlap with non-coding RNA (ncRNA) sequences annotated in
RNAcentral.
+ Annotation
Gene annotation in human on both strands based on
GENCODE Release 25.
+ Evolutionary selection
Several tests for negative selection of CRSs. The selection ratio has been improved since the publication of the resource, including an update of the local neutral model and false discovery calculation of the selection ratio (FDR(SR)), and will be presented in another manuscript soon.
+ Covarying base pairs
Covariation analysis with
R-scape for an independent test of evolutionary support of the conserved RNA structure of CRSs. Two independent covariation tests are performed, one on the base pairs in the CRS consensus structure compared to the conserved primary sequence, the other on all other possible pairs. We run R-scape v1.5.16 on the 100 species tree alignment with parameters
-s --GTp --C16. The consensus structure is visualized with the significant covarying base pairs (E-value<0.05) highlighted in green (
R2R drawing).
+ Chromatin signatures
DNase hypersensitive sites (from 95 cell types in ENCODE) and Genome Segmentations based on chromatin modifications (from 6 cell lines in ENCODE) in human.
+ Protein binding sites
ChIP and CLIP-seq in human.
+ Expression
Poly-A selected RNA-seq, total RNA-seq and targeted total RNA-seq in human (our CaptureSeq experiment in human fetal brain targeting CRSs with 60 nt probes).
+ Transcript boundaries and stability
Closest upstream 5' site (CAGE) and downstream poly-A site (3'-end sequencing), and exosome sensitivity (CAGE) in human.