WebCircRNA: Assessing the circular RNA potential of coding and noncoding RNA

Downloads

WebCircRNA is a machine-learning-based method that discriminates circRNAs from protein-coding genes (PCGs) and long non-coding RNAs (lncRNAs), and further predicts stem cell-specific circRNAs. It is trained on stem cell circRNA and other circRNA datasets.

Training dataset for circRNA vs PCG model

10,000 training circRNAs
8,000 training PCGs

Independent testing dataset for circRNA vs PCG model

4,084 independent testing circRNAs
1,533 independent testing PCGs

Training dataset for stem cell circRNA and other circRNA model

1,800 training stem cell circRNAs
1,800 training other circRNAs

Independent testing dataset for stem cell circRNA and other circRNA model

282 independent testing stem cell circRNAs
282 independent testing other circRNAs

Training lncRNAs, used with the same positive training circRNAs as in the circRNA vs PCG model

10,000 training lncRNAs

Independent testing lncRNAs, used with the same independent testing circRNAs as in the circRNA vs PCG model

9,722 independent lncRNAs

Independent testing dataset from mouse

5,657 mouse circRNAs
3,904 mouse lncRNAs
16,763 mouse PCGs

Software

Download WebCircRNA

To use WebCircRNA, first untar the package and run config.sh to configure it; you can then run the WebCircRNA.py script.
The following dependencies are required before you run config.sh:
1. txCdsPredict: http://hgdownload.cse.ucsc.edu/admin/
2. Tandem Repeats Finder (trf): http://tandem.bu.edu/trf/trf.download.html
3. GraphProt: http://www.bioinf.uni-freiburg.de/Software/GraphProt/GraphProt-1.0.1.tar.bz2. Our package includes the source code; if EDeN does not work, you should compile it.
4. scikit-learn (machine learning library): https://github.com/scikit-learn/scikit-learn
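
Before running config.sh, it can help to confirm that the dependencies above are reachable. The following is a minimal sketch, not part of the WebCircRNA package: the executable names in REQUIRED_TOOLS are assumptions (for example, a trf binary is often installed under a versioned file name), so adjust them to your installation.

```python
import importlib.util
import shutil

# Assumed executable names; these are illustrative and may differ on your
# system (e.g. trf is often shipped under a versioned binary name).
REQUIRED_TOOLS = ["txCdsPredict", "trf"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def sklearn_available():
    """Return True if the scikit-learn package (module `sklearn`) is importable."""
    return importlib.util.find_spec("sklearn") is not None


def report():
    """Print one line per dependency that appears to be missing."""
    for name in missing_tools():
        print("not found on PATH:", name)
    if not sklearn_available():
        print("scikit-learn is not importable")
```

Calling report() prints nothing when every check passes. GraphProt is not checked here because its source code is bundled with the package.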


Software usage

For BED format input:
python WebCircRNA.py --inputfile=test.bed --outputfile=result

For FASTA input, where the model is trained only on sequence features, without conservation, ALU repeat and SNP features:
python WebCircRNA.py --inputfile=test.fa --outputfile=result --seq=1
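
The two invocations above differ only in the --seq=1 flag, so a small wrapper can choose between them. This is a hedged sketch rather than part of WebCircRNA: the helper names build_command and run_webcircrna are hypothetical, and detecting FASTA input by file extension is an assumption of this example. Only the --inputfile, --outputfile and --seq options shown above are used.

```python
import subprocess

# Assumed convention: FASTA inputs are recognised by their file extension.
FASTA_SUFFIXES = (".fa", ".fasta")


def build_command(inputfile, outputfile, script="WebCircRNA.py"):
    """Build the WebCircRNA command line for a BED or FASTA input file."""
    cmd = [
        "python",
        script,
        "--inputfile={}".format(inputfile),
        "--outputfile={}".format(outputfile),
    ]
    # FASTA input uses the sequence-only model, selected with --seq=1.
    if inputfile.lower().endswith(FASTA_SUFFIXES):
        cmd.append("--seq=1")
    return cmd


def run_webcircrna(inputfile, outputfile):
    """Run WebCircRNA and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(inputfile, outputfile), check=True)
```

For example, run_webcircrna("test.bed", "result") issues the BED command shown above, while run_webcircrna("test.fa", "result") appends --seq=1.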


Related sources

circbase
GENCODE

For comments, suggestions for improvement or bug reports, contact Xiaoyong Pan: panxy(at)rth.dk