Seminar: Identifying and Cataloguing Functional LncRNAs in Human and Mouse

2015-09-10: by Jennifer Harrow, Wellcome Trust Sanger Institute, UK. The seminar will take place September 10th, 14.15-15.00 at University of Copenhagen, SUND/SCIENCE, Orangeriet, Dyrlægevej 36, Frederiksberg C.

Registration is not necessary. Refreshments will be provided.

Abstract: Many groups are generating and data-mining the wealth of Illumina RNAseq data available in the public domain to identify "tens of thousands" of novel long non-coding RNAs. The reliability of the resulting transcript models varies and can depend on the read length and quality of the input data and on the algorithms used. As part of the GENCODE consortium, we are combining different resources to produce a reference non-coding gene catalogue for human and mouse, publicly available in the UCSC and Ensembl genome browsers. We have currently identified around 15 900 human loci and 8 000 mouse loci that are potential long non-coding (lnc) genes. The nomenclature and classification of these entities are usually based on their proximity to protein-coding genes rather than on their function. Within GENCODE, we have analysed 400 lncRNAs that were identified as partial transcripts because they lack supporting CAGE data or polyadenylation signals. We have extended these loci using RACEseq and long-read protocols and investigated their expression in 8 different tissues. Most lncRNAs appear to be poorly conserved at the sequence level, yet annotating the mouse and human regions in parallel reveals syntenically equivalent transcripts. We are also using capture-seq technology and PacBio sequencing to compare lncRNA expression between the two organisms and to identify novel full-length transcripts. We investigated the coding potential of these novel loci using proteomics data from the Kuster and Pandey labs. In summary, we highlight how this mix of next-generation data may double the number of genes in GENCODE, presenting new challenges in cataloguing functional lncRNAs for human and mouse.