PhD Defence: Sachin Pundir

2013-10-08: In Silico detection of RNA-seq profiles in high-throughput sequencing data and their relation to non-coding RNAs. Venue: Orangeriet, Dyrlægevej 36, Frederiksberg C. Time: 13.00h.
Everybody is welcome. Registration is not necessary.

Assessment committee:

  • Professor Lars Juhl Jensen, CPR, The Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Panum Instituttet, KU (Chairman).
  • Professor Ivo L Hofacker, Institute for Theoretical Chemistry and Research Group Bioinformatics and Computational Biology, University of Vienna.
  • Associate Professor Chris Workman, Center for Biological Sequence Analysis, Department of Systems Biology, DTU.

Chair of defense:
Associate Professor Jakob Hull Havgaard, Center for non-coding RNA in Technology and Health, IKVH, KU.

Professor Jan Gorodkin, Center for non-coding RNA in Technology and Health, IKVH, KU.


The focus of the Ph.D. project was to develop methods that extract characteristic features from high-throughput sequencing data, in particular RNA-seq data, which can aid in annotating non-coding RNAs and in understanding the regulatory mechanisms that define transcriptome diversity within the cell. To this end, we analyzed the data in terms of clusters of mapped reads from RNA-seq experiments, termed ‘read profiles’. A method for the optimal alignment of two read profiles (rather than of the primary sequences) was developed, and its application showed that the read profile is a characteristic and distinguishable feature for a number of non-coding RNAs. Next, we used this knowledge to predict novel microRNA candidates in the human genome, most of which were identified in non-conserved regions. Notably, their processing patterns were still similar to those of known microRNAs.
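To make the idea of aligning read profiles rather than sequences more concrete, here is a minimal sketch in Python. It is not the method developed in the thesis. It assumes that a read profile can be approximated by per-base read coverage scaled to the range 0 to 1, and it uses a standard Needleman-Wunsch global alignment with an illustrative position score of 1 - 2|x - y|. The function names, the scoring scheme and the gap penalty are all hypothetical choices made for this example.

```python
def read_profile(reads, locus_start, locus_end):
    """Per-base coverage of a locus, scaled so the highest position is 1.0.

    `reads` is a list of (start, end) half-open intervals of mapped reads.
    Representing a profile as scaled coverage is an assumption of this sketch.
    """
    coverage = [0] * (locus_end - locus_start)
    for start, end in reads:
        for pos in range(max(start, locus_start), min(end, locus_end)):
            coverage[pos - locus_start] += 1
    peak = max(coverage, default=0)
    return [c / peak for c in coverage] if peak else [0.0] * len(coverage)


def align_profiles(a, b, gap=-0.5):
    """Global (Needleman-Wunsch) alignment score of two profiles.

    Aligned positions score 1 - 2*|x - y|, i.e. +1 for identical heights
    and -1 for maximally different heights (hypothetical scoring scheme).
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + 1.0 - 2.0 * abs(a[i - 1] - b[j - 1])
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


# Two toy loci of length 10 with differently arranged reads.
p = read_profile([(0, 5), (0, 5), (2, 8)], 0, 10)
q = read_profile([(5, 10), (5, 10)], 0, 10)
self_score = align_profiles(p, p)
cross_score = align_profiles(p, q)
```

In this sketch a profile aligned to itself reaches the maximum possible score (one point per position), while profiles with differently arranged reads score lower, which is the sense in which a profile can act as a distinguishable feature.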

To gain more knowledge about the regulation of the transcript processing machinery, which is most often reflected in a transcript's read profile, we analyzed read profiles across biological replicates of short RNA-seq experiments performed on nine human tissues from the ENCODE project. We aimed to identify loci where the read profiles for a subset of tissues differ significantly from those of the remaining tissues (differentially processed) or are completely coherent across all tissues (coherently processed). Based on sound statistical methods, we identified known and novel cases of ‘arm-switching’ in microRNAs as well as differential processing in other classes of ncRNAs. We also observed a significant enrichment of coherently processed loci overlapping transcription start sites and DNase I hypersensitive sites. For wider usability, we developed a web server for the analysis and comparison of read profiles, utilizing the methods established during the previous three studies.

During this work, we also created a database of read profiles that are characterized by a similar arrangement of the constituent reads, with respect to their start positions, across multiple tissues. The database currently holds read profiles belonging to various ncRNA classes from five organisms. It can be regarded as the first attempt to compile biologically relevant read profiles; a similarity search against it can aid in annotating genomic loci whose read profile resembles, for example, that of a known ncRNA. As a case study, we used the wealth of publicly available high-throughput sequencing data to identify regulatory elements within the first intron of the Cd247 gene. Based on this analysis, we identified a long ncRNA and two transcription factor binding sites that are spatially conserved in both human and mouse.