WebCircRNA: Assessing the circular RNA potential of coding and noncoding RNA
Downloads
WebCircRNA is a machine-learning-based method that discriminates circRNAs from protein-coding genes (PCGs) and long non-coding RNAs (lncRNAs), and further predicts stem cell specific circRNAs. Its models are trained on stem cell circRNA and other circRNA datasets.
Training dataset for circRNA vs PCG model
10,000 training circRNAs
8,000 training PCGs
Independent testing dataset for circRNA vs PCG model
4,084 independent testing circRNAs
1,533 independent testing PCGs
Training dataset for stem cell circRNA and other circRNA model
1,800 training stem cell circRNAs
1,800 training other circRNAs
Independent testing dataset for stem cell circRNA and other circRNA model
282 independent testing stem cell circRNAs
282 independent testing other circRNAs
Training lncRNA with the same positive training circRNA in circRNA vs PCG model
10,000 training lncRNAs
Independent testing lncRNA with the same independent circRNA in circRNA vs PCG model
9,722 independent lncRNAs
Independent testing dataset from mouse
5,657 mouse circRNAs
3,904 mouse lncRNAs
16,763 mouse PCGs
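The class balance of each human dataset listed above can be checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the download list, the lncRNA rows assume the circRNA counts are the same 10,000 training and 4,084 independent circRNAs used for the circRNA vs PCG model, and the names DATASETS and positive_fraction are hypothetical and not part of WebCircRNA.

```python
# Dataset sizes copied from the download list: (positive examples, negative examples).
# Dictionary keys and function names are illustrative, not part of WebCircRNA.
DATASETS = {
    "circRNA vs PCG, training": (10000, 8000),
    "circRNA vs PCG, independent test": (4084, 1533),
    "stem cell vs other circRNA, training": (1800, 1800),
    "stem cell vs other circRNA, independent test": (282, 282),
    "circRNA vs lncRNA, training": (10000, 10000),
    "circRNA vs lncRNA, independent test": (4084, 9722),
}


def positive_fraction(positives, negatives):
    """Return the share of positive examples in a two-class dataset."""
    return positives / (positives + negatives)


if __name__ == "__main__":
    for name, (pos, neg) in DATASETS.items():
        print(f"{name}: {pos + neg} sequences, "
              f"{positive_fraction(pos, neg):.1%} positive")
```

A fraction far from 50% suggests that accuracy alone may be a misleading summary for that test set.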
Software
Download WebCircRNA
To use WebCircRNA, first untar the package and run config.sh to configure it; you can then run the WebCircRNA.py script.
The following dependencies must be installed before you run config.sh:
1. txCdsPredict: http://hgdownload.cse.ucsc.edu/admin/
2. Tandem Repeats Finder (trf): http://tandem.bu.edu/trf/trf.download.html
3. GraphProt: http://www.bioinf.uni-freiburg.de/Software/GraphProt/GraphProt-1.0.1.tar.bz2. Our package includes the source code; if EDeN does not work, you should compile it.
4. scikit-learn (machine learning library): https://github.com/scikit-learn/scikit-learn
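Before running config.sh, it may help to confirm that the dependencies are reachable. The sketch below is not part of the WebCircRNA package. It assumes the executables are named txCdsPredict, trf and GraphProt.pl and are on your PATH (downloaded binaries, trf in particular, often carry a version suffix, so adjust the names), and that scikit-learn is importable as sklearn.

```python
import importlib.util
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("txCdsPredict", "trf", "GraphProt.pl")
# scikit-learn is imported under the module name "sklearn".
REQUIRED_MODULES = ("sklearn",)


def find_missing(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return tools that are not on PATH and modules that cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules
                if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only confirms that each tool can be located; it does not verify versions or that the tool runs correctly.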
Software usage
For BED format input:
python WebCircRNA.py --inputfile=test.bed --outputfile=result
For FASTA input, where the model is trained only on sequence features, without conservation, ALU repeat, and SNP features:
python WebCircRNA.py --inputfile=test.fa --outputfile=result --seq=1
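The two invocations above can be wrapped in a small helper when WebCircRNA is called from another Python script. This is a minimal sketch that uses only the options shown above (--inputfile, --outputfile, --seq=1). The function names, the choice of --seq=1 based on a .fa or .fasta file extension, and the default interpreter name "python" are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# File extensions treated as FASTA input (an assumption of this sketch).
FASTA_SUFFIXES = {".fa", ".fasta"}


def build_command(input_path, output_path,
                  script="WebCircRNA.py", python="python"):
    """Build the WebCircRNA command line, adding --seq=1 for FASTA input."""
    cmd = [python, script,
           f"--inputfile={input_path}",
           f"--outputfile={output_path}"]
    if Path(input_path).suffix.lower() in FASTA_SUFFIXES:
        # Sequence-only model: no conservation, ALU repeat or SNP features.
        cmd.append("--seq=1")
    return cmd


def run_webcircrna(input_path, output_path, **kwargs):
    """Run WebCircRNA and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(input_path, output_path, **kwargs), check=True)
```

For example, run_webcircrna("test.bed", "result") would run the BED command, and run_webcircrna("test.fa", "result") the sequence-only one, provided the call is made from the directory that contains WebCircRNA.py.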
Related sources
circBase
GENCODE
For comments, suggestions for improvement, or bug reports, contact Xiaoyong Pan: panxy(at)rth.dk