WebCircRNA: Assessing the circular RNA potential of coding and noncoding RNA

About

Circular RNAs (circRNAs) are increasingly recognized to play crucial roles in posttranscriptional gene regulation including functioning as microRNA sponges. It is therefore highly relevant to identify if a transcript of interest can also function as a circRNA.
Here we present a user-friendly webserver that predicts if coding and noncoding RNAs have circRNA isoforms and whether circRNAs are expressed in stem cells. The predictions are made by random forest models using sequence-derived features as input. The output scores are converted to fractiles, which are used to assess the circRNA and stem cell potential. The performances of the three models are reported as the area under the ROC curve and are 0.82 for coding genes, 0.89 for lncRNAs and 0.72 for stem cell expression.
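The conversion from raw model scores to fractiles can be illustrated with a minimal sketch. It assumes that a fractile is the fraction of scores in a reference set that are less than or equal to the query score. The exact definition and the reference distribution used by WebCircRNA are not stated here, so the function name `score_to_fractile` and the reference values below are hypothetical.

```python
from bisect import bisect_right


def score_to_fractile(score, reference_scores):
    """Return the fraction of reference scores that are <= score.

    Assumption: this empirical-CDF definition is illustrative only;
    the webserver may define fractiles differently.
    """
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference_scores must not be empty")
    # bisect_right counts how many reference scores are <= score.
    return bisect_right(ordered, score) / len(ordered)


# Hypothetical reference scores (made-up values, not WebCircRNA output).
reference = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.90, 0.95]
fractile = score_to_fractile(0.72, reference)
print(fractile)
```

Under this assumption, a candidate with a high fractile scores above most of the reference set, which makes scores from different models easier to compare than the raw values.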



Three random forest models

WebCircRNA reports a fractile score from each of three models:
1. CP-lncRNA: distinguishes circRNAs from lncRNAs;
2. CP-PCG: distinguishes circRNAs from protein-coding genes (PCGs);
3. SP-circRNA: distinguishes stem cell circRNAs from other circRNAs.

Each model is trained on its own dataset: the first on circRNAs versus lncRNAs [3], the second on circRNAs versus PCGs, and the third on stem cell circRNAs versus other circRNAs. The WebCircRNA webserver shows the results from all three models.
The circRNA vs lncRNA and circRNA vs PCG models are trained on the same positive dataset but on different negative datasets: the former uses lncRNAs as negative samples, and the latter uses PCGs.
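The shared-positive, different-negative setup can be sketched as follows. This only shows how the two labelled training sets could be assembled. The helper `build_training_set` and all identifiers are hypothetical placeholders, not the actual WebCircRNA data or code.

```python
def build_training_set(positives, negatives):
    """Label positives 1 and negatives 0; return (samples, labels)."""
    positives = list(positives)
    negatives = list(negatives)
    samples = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return samples, labels


# Hypothetical identifiers, for illustration only.
circrnas = ["circ_1", "circ_2", "circ_3"]        # shared positive set
lncrnas = ["lnc_1", "lnc_2"]                     # negatives for CP-lncRNA
pcgs = ["pcg_1", "pcg_2", "pcg_3", "pcg_4"]      # negatives for CP-PCG

cp_lncrna_set = build_training_set(circrnas, lncrnas)
cp_pcg_set = build_training_set(circrnas, pcgs)
```

Both training sets begin with the same circRNA samples, so any difference between the two models comes from the choice of negative class.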

Models for different inputs

WebCircRNA also provides machine learning based models for two types of input. The first is BED format input, for which the models use sequence information together with context information, such as conservation. circRNAs are conserved across many species. With this model, users who want to predict circRNAs in other species need to map the candidates to human (hg19) using BLAT. The second is sequence FASTA input, for which the models use only sequence information, without conservation information. With this model, users can predict candidates in other mammals or vertebrates directly. All models are trained on human data; because circRNAs are conserved across species, the models trained on human data are also applied to other species.
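The two input types can be read with a few lines of code, as the rough sketch below shows. It assumes plain-text BED lines with at least three fields (chromosome, start, end) and standard FASTA records. The k-mer frequencies are only an illustrative example of a sequence-derived feature; the features actually used by the WebCircRNA models are not listed here. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def parse_bed_line(line):
    """Return (chrom, start, end) from one BED line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    return fields[0], int(fields[1]), int(fields[2])


def parse_fasta(text):
    """Return {record_name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first token of the header as the record name.
            name = header[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over A, C, G, T (illustrative feature)."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Made-up inputs for illustration.
bed_record = parse_bed_line("chr1\t100\t200\tcandidate_1")
sequences = parse_fasta(">candidate_1 example\nACGT\nACGT\n")
features = kmer_frequencies(sequences["candidate_1"])
```

A BED record gives only genomic coordinates, which is why that input is tied to a reference assembly (hg19) where context such as conservation can be looked up. A FASTA record carries the sequence itself, so it can be scored without mapping.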

References

[1] Xiaoyong Pan*, Kai Xiong*, et al. WebCircRNA: Assessing the circular RNA potential of coding and noncoding RNA.
[2] Kai Xiong*, Xiaoyong Pan*, Poul Hyttel, Jan Gorodkin, Kristine K Freude. LIN28A Expresses Circular RNAs in Early Porcine Embryos.
[3] Xiaoyong Pan, Kai Xiong. PredcircRNA: computational classification of circular RNA from other long non-coding RNA using hybrid features. Mol Biosyst. 2015 Aug;11(8):2219-26.