Conserved RNA Structures (CRSs) in the vertebrate genome

This resource is based on our computational screen of the human-centered 100-way vertebrate sequence alignment from the UCSC Genome Browser for Conserved RNA secondary Structures (CRSs) with CMfinder. This page explains how to navigate the web site and where the data comes from.

Selected new features

Help about the "Search" page

The data source can be searched for CRSs that

  1. are inside a genomic region of any of the vertebrate genomes of UCSC's 100 species tree (default: Human assembly hg38),
  2. are near or overlapping a human gene (partial matches to gene symbols in the GENCODE annotation are allowed), and
  3. have specific structure/alignment features.
The list of CRSs matching the query will be separated into Low FDR and non-redundant CRSs and High FDR and redundant CRSs. The CRSs are listed in sortable tables of their genomic locations, FDR and any additional queried features. By default the table is sorted by genomic coordinates.
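The split into two tables and the default sort can be pictured with a minimal sketch. The record fields, the `fdr_cutoff` parameter and the `redundant` flag are hypothetical; the site's actual cutoff and redundancy rule are not stated on this page. The sketch also assumes that the second table holds every CRS that fails either criterion.

```python
from typing import NamedTuple


class CRS(NamedTuple):
    # Hypothetical record layout, for illustration only.
    crs_id: str
    chrom: str
    start: int
    end: int
    fdr: float
    redundant: bool


def split_and_sort(crss, fdr_cutoff):
    """Return (low-FDR non-redundant, remaining) CRSs, each sorted by coordinates."""
    def by_position(c):
        return (c.chrom, c.start, c.end)

    # Assumption: "low FDR" means fdr <= fdr_cutoff.
    low = [c for c in crss if c.fdr <= fdr_cutoff and not c.redundant]
    rest = [c for c in crss if c.fdr > fdr_cutoff or c.redundant]
    return sorted(low, key=by_position), sorted(rest, key=by_position)
```

Note that chromosome names are compared as plain strings here, so "chr10" sorts before "chr2"; a natural-order key would be needed to mimic karyotype order.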

If you already know the identifier of a CRS or CRS region, you can search for it directly (all other selections are then ignored). To look up more than one CRS or CRS region identifier, list them all separated by ','.

Use the [Example] links to get started. Hover the mouse cursor over the [?] next to each query field to get more information.

Help about the "Result" page

Hint: Additional information about many of the annotation tracks is available by hovering the mouse cursor over the annotation titles.

[UCSC Genome browser]

Links to the hg38 location of the respective CRS in the UCSC Genome Browser. The link also adds the project's track hub (Hub Name: CMfinder vert screen, Assembly: hg38), which consists of the following tracks: CaptureSeq (CRS-targeting CaptureSeq) with the subtracks 42deg (annealing temperature of 42 degrees) and 70deg (annealing temperature of 70 degrees), CMfinder CRSs (CMfinder-predicted CRSs with pscore>50), and probes (60 nt long CaptureSeq probes).

[CRS region]

Links to a sortable table of all CRSs that are located in the same CRS region (a region of overlapping CRSs).
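A CRS region as described above, a maximal run of mutually overlapping CRSs, can be derived with a standard interval merge. The following is only a sketch under assumptions: the tuple layout and half-open coordinates are hypothetical, strand is ignored, and the resource's own region definition may differ in detail.

```python
def group_into_regions(crss):
    """Group CRSs into regions of overlapping intervals.

    crss: iterable of (crs_id, chrom, start, end) with half-open
    coordinates (hypothetical layout, for illustration only).
    """
    regions = []
    for crs_id, chrom, start, end in sorted(crss, key=lambda c: (c[1], c[2], c[3])):
        last = regions[-1] if regions else None
        if last is not None and last["chrom"] == chrom and start < last["end"]:
            # Overlaps the current region: extend it and record the member.
            last["end"] = max(last["end"], end)
            last["members"].append(crs_id)
        else:
            # No overlap (or a new chromosome): open a new region.
            regions.append(
                {"chrom": chrom, "start": start, "end": end, "members": [crs_id]}
            )
    return regions
```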

+ Summary

General features of the CRSs. Alignment specific features have been retrieved from both the 100 species tree and 17 species tree.

+ Consensus structure

The CMfinder-predicted consensus secondary structure for the 100 species tree alignment is shown in dot-bracket notation and as an interactive drawing by forna. Bases in the interactive structure are labeled with the consensus sequence and colored by relative entropy (also called Kullback–Leibler divergence) in bits (from 0 as purple to 2 as dark red). As reference base probabilities we use 0.25 for all four nucleotides. The structure mapped to the 17 species tree alignment is available as PNG; its bases are labeled with the human sequence and colored by the base pair conservation in the 17 species tree alignment.
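For orientation, the relative entropy of one alignment column against the uniform background of 0.25 can be computed as below. This is a minimal sketch, not the resource's own code: the function name is hypothetical, and skipping gaps and ambiguous characters is an assumption.

```python
import math
from collections import Counter


def column_relative_entropy(column, background=0.25):
    """Relative entropy (bits) of one alignment column vs. a uniform background.

    Assumption: gaps and non-ACGU characters are ignored; T is read as U.
    """
    bases = [b for b in column.upper().replace("T", "U") if b in "ACGU"]
    if not bases:
        return 0.0
    n = len(bases)
    counts = Counter(bases)
    # Sum over observed bases of p * log2(p / q); unobserved bases contribute 0.
    return sum((c / n) * math.log2((c / n) / background) for c in counts.values())
```

With a background of 0.25, a perfectly conserved column reaches the maximum of 2 bits and a column with all four bases at equal frequency gives 0 bits, matching the color scale described above.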

+ Alignment

CRS alignment based on the CMfinder-predicted covariance model for both the 100 species tree and the 17 species tree (the latter is a subset of the former with gap columns removed). The interactive alignments use JSAV, the JavaScript Sequence Alignment Viewer. It enables the user to hide and select sequences, change the color scheme and retrieve the selected sequences as a FASTA file. In the FASTA file, all gap columns (which occurred due to the removal of sequences) are removed. The genomic coordinates of the sequences in the CRS alignment can be seen in the viewer and can be retrieved as TXT by the Export Fasta button. In addition, the alignments can be retrieved in STOCKHOLM format or as a static PNG image colored by the base pair conservation in the respective alignment.
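The removal of gap columns after deselecting sequences can be illustrated with a short sketch. The function name, the dict-based input and the set of gap characters are assumptions; the viewer's own implementation is not shown on this page.

```python
def drop_gap_only_columns(seqs, gap_chars="-."):
    """Remove alignment columns that contain only gap characters.

    seqs: dict mapping sequence name to aligned sequence (equal lengths).
    """
    if not seqs:
        return {}
    names = list(seqs)
    rows = [seqs[name] for name in names]
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one remaining sequence has a residue there.
    keep = [i for i in range(length) if any(row[i] not in gap_chars for row in rows)]
    return {name: "".join(row[i] for i in keep) for name, row in zip(names, rows)}
```

For example, once a third sequence is removed from an alignment, a column that only that sequence occupied becomes all-gap in the remaining rows and is dropped.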

+ Overlap with RNAcentral

We have integrated CRSs with a false discovery rate lower than or equal to 10% into RNAcentral using their genomic locus in 29 vertebrate species. CRSs were excluded if they matched known structured RNAs from Rfam. In this Result subsection, you find the overlap with non-coding RNA (ncRNA) sequences annotated in RNAcentral.

+ Annotation

Gene annotation in human on both strands based on GENCODE Release 25.

+ Evolutionary selection

Several tests for negative selection of CRSs. The selection ratio has been improved since the publication of the resource, including an update of the local neutral model and of the false discovery rate calculation for the selection ratio (FDR(SR)), and will be presented in another manuscript soon.

+ Covarying base pairs

Covariation analysis with R-scape as an independent test of evolutionary support for the conserved RNA structure of CRSs. Two independent covariation tests are performed: one on the base pairs in the CRS consensus structure compared to the conserved primary sequence, the other on all other possible pairs. We run R-scape v1.5.16 on the 100 species tree alignment with parameters -s --GTp --C16. The consensus structure is visualized with the significantly covarying base pairs (E-value<0.05) highlighted in green (R2R drawing).
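To reproduce such a run on a downloaded STOCKHOLM alignment, the command line with the parameters quoted above could be assembled as in this sketch. The helper name and the file name are hypothetical, and the sketch assumes an `R-scape` executable of a compatible version on the PATH; it only builds the argument list and does not run anything by itself.

```python
import subprocess


def build_rscape_command(stockholm_path, executable="R-scape"):
    """Build the argument list for R-scape with the parameters quoted above."""
    # -s, --GTp and --C16 are the parameters named in the text; the path
    # points to a STOCKHOLM alignment with a consensus structure line.
    return [executable, "-s", "--GTp", "--C16", stockholm_path]


def run_rscape(stockholm_path):
    """Run R-scape; requires R-scape to be installed (not checked here)."""
    return subprocess.run(build_rscape_command(stockholm_path), check=True)
```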

+ Chromatin signatures

DNase hypersensitive sites (from 95 cell types in ENCODE) and Genome Segmentations based on chromatin modifications (from 6 cell lines in ENCODE) in human.

+ Protein binding sites

ChIP and CLIP-seq in human.

+ Expression

Poly-A selected RNA-seq, total RNA-seq and targeted total RNA-seq in human (our CaptureSeq experiment in human fetal brain targeting CRSs with 60 nt probes).

+ Transcript boundaries and stability

Closest upstream 5' site (CAGE) and downstream poly-A site (3'-end sequencing), and exosome sensitivity (CAGE) in human.
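How a closest upstream 5' site and a closest downstream poly-A site relate to a CRS can be sketched with a strand-aware binary search. This is an illustration only: the function name and inputs are hypothetical, the site lists are assumed to be sorted positions on the same chromosome and strand as the CRS, and sites at the CRS boundary are counted as flanking.

```python
from bisect import bisect_left, bisect_right


def closest_flanking_sites(crs_start, crs_end, strand, five_prime_sites, poly_a_sites):
    """Return (closest upstream 5' site, closest downstream poly-A site).

    Site lists must be sorted ascending; None is returned where no site exists.
    """
    def at_or_before(sites, pos):
        i = bisect_right(sites, pos)
        return sites[i - 1] if i else None

    def at_or_after(sites, pos):
        i = bisect_left(sites, pos)
        return sites[i] if i < len(sites) else None

    if strand == "+":
        # Upstream means lower coordinates on the plus strand.
        return at_or_before(five_prime_sites, crs_start), at_or_after(poly_a_sites, crs_end)
    # On the minus strand the directions are mirrored.
    return at_or_after(five_prime_sites, crs_end), at_or_before(poly_a_sites, crs_start)
```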