ISCB-Asia/SCCG 2012, BGI Special Session


Stephen Kwok-Wing Tsui
Chinese University of Hong Kong

Discovering Protein-DNA Binding Sequence Patterns Using Association Rule Mining

Abstract

Protein-DNA binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) plays an essential role in transcriptional regulation. Over the past decades, significant effort has gone into studying the principles that govern protein-DNA binding. However, it is generally accepted that there are no simple one-to-one rules between amino acids and nucleotides, and many methods rely on complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. It is therefore desirable to investigate associated sequence patterns between TFs and TFBSs.

With increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns from TRANSFAC in the most explicit and interpretable form. The framework is based on association rule mining with the Apriori algorithm. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations.
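As an illustration of the general technique only, the following is a minimal Python sketch of Apriori-style association rule mining over paired TF and TFBS k-mers. It is not the authors' implementation. The function names (`apriori`, `tf_to_tfbs_rules`), the thresholds, and the record layout are all assumptions made for this example, and the toy records are hypothetical co-occurrence counts invented for demonstration, not TRANSFAC data. Only the k-mer strings themselves are borrowed from the abstract.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count support of the surviving candidates.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent


def tf_to_tfbs_rules(frequent, min_confidence):
    """Build rules 'TF k-mer -> TFBS k-mer' from frequent 2-itemsets."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) != 2:
            continue
        tf = [i for i in itemset if i[0] == "TF"]
        dna = [i for i in itemset if i[0] == "TFBS"]
        if len(tf) == 1 and len(dna) == 1:
            # confidence = support(TF and TFBS) / support(TF)
            confidence = count / frequent[frozenset(tf)]
            if confidence >= min_confidence:
                rules.append((tf[0][1], dna[0][1], count, confidence))
    return sorted(rules)


# Hypothetical toy records (NOT real TRANSFAC entries): each record pairs
# a TF amino acid k-mer with a TFBS nucleotide k-mer.
toy_records = [
    {("TF", "PKVVIL"), ("TFBS", "CACGTG")},
    {("TF", "PKVVIL"), ("TFBS", "CACGTG")},
    {("TF", "PKVVIL"), ("TFBS", "CACGTG")},
    {("TF", "PKVEIL"), ("TFBS", "CAGCTG")},
    {("TF", "PKVEIL"), ("TFBS", "CAGCTG")},
    {("TF", "PKVEIL"), ("TFBS", "CACGTG")},
]

frequent = apriori(toy_records, min_support=2)
rules = tf_to_tfbs_rules(frequent, min_confidence=0.6)
for tf_kmer, tfbs_kmer, support, confidence in rules:
    print(f"{tf_kmer} -> {tfbs_kmer}  support={support}  confidence={confidence:.2f}")
```

Tagging each item as `("TF", ...)` or `("TFBS", ...)` keeps amino acid and nucleotide k-mers distinguishable, so that only cross-type rules are reported. A real pipeline would also need k-mer enumeration from the database sequences and a significance test for the subtypes, both of which are omitted here.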

The resulting subtypes have various biological meanings. Analyzing the subtypes reveals conserved residues critical for maintaining TF-TFBS binding. In-depth analysis of the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows that the V/E variation distinguishes the Myc family from the MRF family. With further independent verification from the literature, Protein Data Bank / homology modeling, and ChIP-seq data, there is strong evidence that the discovered patterns reflect real TF-TFBS bindings across different TFs and TFBSs, making them informative and promising for further biological findings.

References

"Discovering protein–DNA binding sequence patterns using association rule mining",
Kwong-Sak Leung, Ka-Chun Wong, Tak-Ming Chan, Man-Hon Wong, Kin-Hong Lee, Chi-Kong Lau & Stephen K. W. Tsui
Nucl. Acids Res., 38(19):6324-6337, 2010.