Abstract
Next-generation sequencing technologies have brought a paradigm shift in biology, and genomic sequence data has become an integral part of most studies, ranging from biodiversity mapping to the design of personalised treatments for cancer and other diseases. Attempts are also being made to map viral and microbial diversity in individual patients and populations. These data provide opportunities to study phylogenetic diversity within and between species of interest. A large number of methods and models are available for gene-based (phylogenetics) and genome-based (phylogenomics) molecular phylogeny analysis (MPA). The typical steps include compilation of data, multiple sequence alignment, derivation of distances using distance-based or character-based models, clustering, statistical assessment of tree topology(ies) and analysis of the inferred tree(s). However, the time and complexity of MPA depend on the length and number of sequences being analysed, and most alignment-based MPA methods do not scale to the growing volume of genomic data. A few alignment-free methods have been developed, which use information content and/or frequency analysis of k-mers to infer phylogeny. Our group has developed a novel alignment-free method for clustering and phylogenetic analysis, based on the concept of return time distribution (RTD) of k-mers, a concept used in stochastic processing. The RTD and its parameters, mean and sigma, are used to derive the distance function. The unique feature of this distance function is that it takes into account the relative order of its components, as captured by return times, and it has been shown to perform equally well at varying levels of sequence similarity. Thus, in addition to phylogeny, the method has applications in the typing of viruses, and servers for typing Mumps virus, Dengue virus, West Nile virus and others are available online (http://bioinfo.net.in). The method has also been tested for MLST typing of bacterial species, and development of servers for typing bacterial species is in progress. The method and its applications in the study of diversity of viral and bacterial pathogens will be discussed, along with its implications for designing viral vaccines using the reverse vaccinology approach.
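
The idea of an RTD-based distance can be illustrated with a minimal Python sketch. It is not the implementation used by the servers mentioned above. It assumes that the return time of a k-mer is the difference between the start positions of its successive occurrences, that a k-mer with fewer than two occurrences contributes a mean and sigma of zero, and that the distance between two sequences is the Euclidean distance between their vectors of (mean, sigma) values. The names `rtd_vector` and `rtd_distance` are hypothetical, chosen only for this illustration.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def rtd_vector(seq, k=2, alphabet="ACGT"):
    """Return [mean, sigma] of return times for every k-mer, in sorted k-mer order."""
    # Record the start positions of each possible k-mer over the alphabet.
    positions = {"".join(p): [] for p in product(alphabet, repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in positions:  # skip k-mers containing ambiguous symbols such as N
            positions[kmer].append(i)

    vector = []
    for kmer in sorted(positions):
        pos = positions[kmer]
        # Assumption: return time = gap between successive occurrences.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        if gaps:
            vector.extend([mean(gaps), pstdev(gaps)])
        else:
            # Assumption: k-mers seen fewer than twice contribute zeros.
            vector.extend([0.0, 0.0])
    return vector


def rtd_distance(seq_a, seq_b, k=2):
    """Euclidean distance between RTD vectors (an assumed choice of metric)."""
    va = rtd_vector(seq_a, k)
    vb = rtd_vector(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

A pairwise matrix of such distances over a set of sequences could then be passed to a standard clustering method to obtain a tree. Because the vector has a fixed length of 2 x 4^k regardless of sequence length, no alignment is needed.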