ISCB-Asia/SCCG 2012 Proceedings Talk

iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data

Kun Sun1,2, Xiaona Chen1,3, Peiyong Jiang1,2, Xiaofeng Song4, Huating Wang1,3 & Hao Sun1,2
1Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong
2Department of Chemical Pathology, The Chinese University of Hong Kong
3Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong
4Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics

Abstract

Background: Long intergenic non-coding RNAs (lincRNAs) are emerging as a novel class of non-coding RNAs and potent gene regulators. High-throughput RNA sequencing combined with de novo assembly promises large-scale discovery of novel transcripts. However, identifying lincRNAs among thousands of assembled transcripts remains challenging because they are difficult to separate from protein-coding transcripts (PCTs).

Results: We therefore implemented iSeeRNA, a support vector machine (SVM)-based classifier for the identification of lincRNAs.

Conclusion: iSeeRNA demonstrates high prediction accuracy and runs several orders of magnitude faster than other similar programs. It can be integrated into transcriptome data analysis pipelines or run as a web server, thus offering a valuable tool for lincRNA study.

Availability: Both source code and pre-compiled packages are available at http://www.myogenesisdb.org/iSeeRNA/.