Talk Abstract

Structure prediction with AlphaFold2 is set to have a huge impact on biology, medicine, and biotechnology. AlphaFold2 is not only accurate but, if optimized, also fast. Our ColabFold-AlphaFold2 pipeline accurately predicts the structures of a whole proteome within two days on a single GPU, approximately 100 times faster than the AlphaFold2 base system. The availability of these methods and DeepMind/EBI's large-scale effort to predict the structure of every UniRef90 protein sequence (more than 100 million) are rapidly increasing the number of available structures. Analysing these structural datasets has become a major bottleneck. In particular, a simple search for homologous structures in a database of one million entries takes a week on a single core using currently available tools. To address this issue, we developed Foldseek for fast and sensitive similarity searching through large structural databases. Foldseek is about four orders of magnitude faster than current structural aligners, allowing users to search through millions of structures in seconds. During this talk I will explain how we designed ColabFold to predict highly accurate structures in seconds, as well as how Foldseek efficiently queries large structural databases. Both tools are open source and can be accessed at colabfold.com and foldseek.com, respectively.

Speaker Biography

Dr. Steinegger is an Assistant Professor at Seoul National University, where he is affiliated with the Biology Department, the Institute of Molecular Biology and Genetics, the Artificial Intelligence Institute, and the Bioinformatics Graduate School. His research group focuses on the development of big data and machine learning algorithms to analyse genomic and proteomic sequence data. His group is best known for bioinformatics software to cluster (Linclust), assemble (Plass), and search (MMseqs2) sequences, as well as to predict (AlphaFold2/ColabFold) and search (Foldseek) protein structures. These software packages are used by researchers around the world and have been installed hundreds of thousands of times.

He studied bioinformatics and computer science at the Technical University of Munich and Ludwig Maximilian University of Munich. During this time, he worked as a research assistant to Professor Burkhard Rost, focusing on the development of methods for predicting the effects of protein mutations. He received his Ph.D. from the Technical University of Munich for his work on computational methods to assemble, cluster, and annotate metagenomic sequencing data, carried out in collaboration with Dr. Johannes Söding at the Max Planck Institute for Biophysical Chemistry. As a postdoc in the group of Professor Steven L. Salzberg at the Center for Computational Biology (CCB) at Johns Hopkins University, he developed methods for identifying pathogenic agents in infectious diseases, detecting assembly contamination in public datasets, and annotating missing exons in the human proteome.

Dr. Steinegger is an expert in large-scale sequence data analysis and method development, and an advocate for open science and open source.