Wednesday 31 October 2018

SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions

Idrees S, Pérez-Bercoff Å & Edwards RJ. (2018) SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions. PeerJ 6:e5858 https://doi.org/10.7717/peerj.5858

Abstract

Many important cellular processes involve protein–protein interactions (PPIs) mediated by a Short Linear Motif (SLiM) in one protein interacting with a globular domain in another. Despite their significance, these domain-motif interactions (DMIs) are typically low affinity, which makes them challenging to identify by classical experimental approaches, such as affinity pulldown mass spectrometry (AP-MS) and yeast two-hybrid (Y2H). DMIs are generally underrepresented in PPI networks as a result. A number of computational methods now exist to predict SLiMs and/or DMIs from experimental interaction data but it is yet to be established how effective different PPI detection methods are for capturing these low affinity SLiM-mediated interactions. Here, we introduce a new computational pipeline (SLiMEnrich) to assess how well a given source of PPI data captures DMIs and thus, by inference, how useful that data should be for SLiM discovery. SLiMEnrich interrogates a PPI network for pairs of interacting proteins in which the first protein is known or predicted to interact with the second protein via a DMI. Permutation tests compare the number of known/predicted DMIs to the expected distribution if the two sets of proteins are randomly associated. This provides an estimate of DMI enrichment within the data and the false positive rate for individual DMIs. As a case study, we detect significant DMI enrichment in a high-throughput Y2H human PPI study. SLiMEnrich analysis supports Y2H data as a source of DMIs and highlights the high false positive rates associated with naïve DMI prediction. SLiMEnrich is available as an R Shiny app. The code is open source and available via a GNU GPL v3 license at: https://github.com/slimsuite/SLiMEnrich. A web server is available at: http://shiny.slimsuite.unsw.edu.au/SLiMEnrich/.
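The permutation test described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the SLiMEnrich implementation (which is an R Shiny app). The function names, the toy protein identifiers, and the choice to randomise by shuffling the second column of the PPI list are all assumptions made for the example. So is the false positive estimate, which is taken here as the mean random count divided by the observed count.

```python
import random
from statistics import mean


def count_dmis(ppis, potential_dmis):
    """Count PPI pairs that are also known/predicted DMI pairs."""
    return sum(1 for pair in ppis if pair in potential_dmis)


def permutation_test(ppis, potential_dmis, n_perm=1000, seed=0):
    """Compare the observed DMI count with counts from shuffled PPI data.

    Assumption: proteins are randomly re-associated by shuffling the
    second member of each PPI pair while keeping the first in place.
    """
    rng = random.Random(seed)
    observed = count_dmis(ppis, potential_dmis)
    first = [a for a, _ in ppis]
    second = [b for _, b in ppis]

    random_counts = []
    for _ in range(n_perm):
        shuffled = second[:]
        rng.shuffle(shuffled)
        random_counts.append(count_dmis(zip(first, shuffled), potential_dmis))

    expected = mean(random_counts)
    # Empirical p-value with a +1 correction so it is never exactly zero.
    p_value = (sum(c >= observed for c in random_counts) + 1) / (n_perm + 1)
    enrichment = observed / expected if expected else float("inf")
    # Assumed estimate: share of observed DMIs expected by chance alone.
    fpr = min(1.0, expected / observed) if observed else float("nan")
    return {
        "observed": observed,
        "expected": expected,
        "enrichment": enrichment,
        "p_value": p_value,
        "fpr": fpr,
    }


# Hypothetical toy data: (motif-containing protein, domain-containing protein).
ppis = [("A", "X"), ("B", "Y"), ("C", "Z"), ("D", "W")]
potential_dmis = {("A", "X"), ("B", "Y"), ("C", "Z")}

result = permutation_test(ppis, potential_dmis)
print(result)
```

An enrichment well above 1 with a small p-value would suggest that the PPI data captures more DMIs than random association of the two protein sets would produce.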

Ziying Zhang (Visiting Masters student)

Ziying Zhang is a visiting Masters student from Wageningen University in the Netherlands. She received a bachelor's degree in bioscience from Hainan University, China, in 2016 and started her MSc in bioinformatics at Wageningen University in February 2017. Her Masters thesis project focused on developing a downstream analysis toolbox for microbial diversity analysis, executing SPARQL to query RDF datasets of 16S rRNA before launching an analysis to simplify the pre-processing procedures.

Ziying started an internship at the Edwards Lab at the end of October 2018, working on genome size prediction from k-mer profiles. Her interests include genomics, genetics, bioinformatics, programming and data management.

[LinkedIn]

Wednesday 3 October 2018

Research snapshot - October 2018

One of the most important, interesting and challenging questions in biology is how new traits evolve at the molecular level. My lab employs sequence analysis techniques to interrogate protein and DNA sequences for the signals left behind by evolution. We are a bioinformatics lab but like to incorporate bench data through collaboration wherever possible.

Main Research

The core research in the lab is broadly divided into three main themes:

1. Short Linear Motifs (SLiMs)

Many protein-protein interactions are mediated by Short Linear Motifs (SLiMs): short stretches of proteins (5-15 amino acids long), of which only a few positions are critical to function. These motifs are vital for biological processes of fundamental importance, acting as ligands for molecular signalling, post-translational modifications and subcellular targeting. SLiMs have extremely compact protein interaction interfaces, generally encoded by fewer than four major affinity-/specificity-determining residues. Their small size enables high functional density and evolutionary plasticity, making them frequent products of convergent "ex nihilo" evolution. It also makes them challenging to identify, both experimentally and computationally.
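Because only a few positions are defined, SLiMs are commonly written as regular expressions. The sketch below uses Python to show the idea. The pattern "P..P" is the PxxP core associated with SH3 domain binding and is used purely as an illustration. The protein sequence and the function name are made up for the example.

```python
import re

# Illustrative pattern only: two defined prolines separated by two
# wildcard positions. The lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=(P..P))")


def find_motif(sequence, pattern=MOTIF):
    """Return (1-based start position, matched peptide) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


sequence = "MSAPPLPPRTK"  # hypothetical sequence, not a real protein
hits = find_motif(sequence)
print(hits)
```

Even this short sequence yields two overlapping matches, which hints at why such degenerate patterns occur so often by chance and why statistical assessment of motif predictions matters.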

A major focus of the lab is the computational prediction of SLiMs from protein sequences. This research originated with Rich’s postdoctoral research, during which he developed sequence analysis methods for the rational design of biologically active short peptides. He subsequently developed SLiMDisc, one of the first algorithms for successfully predicting novel SLiMs from sequence data - and coined the term “SLiM” into the bargain. This led to the development of SLiMFinder, the first SLiM prediction algorithm able to estimate the statistical significance of motif predictions, which greatly increased the reliability of predictions. SLiMFinder has since spawned a number of motif discovery tools and webservers and is still arguably the most successful SLiM prediction tool on benchmarking data.

Current research is looking to develop these SLiM prediction tools further and apply them to important biological questions. Of particular interest is the molecular mimicry employed by viruses to interact with host proteins and the role of SLiMs in other diseases, such as cancer. Other work is concerned with the evolutionary dynamics of SLiMs within protein interaction networks.

2. The evolution of novel functions

Previous work in the lab has focused on the evolution of functional specificity following gene duplication. Since moving to UNSW, activities have shifted more towards the use of PacBio long read sequencing and other cutting-edge sequencing technologies, working closely with the Ramaciotti Centre for Genomics. We are collaborating with industrial and academic partners to de novo sequence, assemble, annotate and interrogate the genomes of a selection of microbes with interesting metabolic abilities. Most notably, we have an ARC Linkage Grant with Microbiogen Pty Ltd. to understand how a strain of Saccharomyces cerevisiae has evolved to efficiently use xylose as a sole carbon source: something vital for second generation biofuel production that wild yeast cannot do. We are combining comparative genomics, evolutionary genetics, RNA-Seq transcriptomics, and competition assays to understand how the novel metabolism evolved. Through deep Illumina resequencing of evolving populations, and assembling reliable complete genomes of the founding ancestors, the ultimate goal is to trace how mutations have interacted with existing genetic variation during adaptive evolution.

3. Whole genome sequencing and assembly

Following our experiences with de novo whole genome assembly in yeast, the lab is getting involved in an increasing number of genome sequencing projects. The biggest of these is leading the bioinformatics and assembly effort in a consortium to sequence the cane toad genome. The lab is also leading the BABS Genome project, sequencing two iconic Australian snakes for use in teaching and public engagement. We are a member of the Oz Mammals Genomics initiative, assisting with the sequencing and assembly of Australia's unique marsupial fauna. We also have a number of bacterial long-read whole genome sequencing collaborations.

Other Research Projects

In addition to its main research, the lab has a number of interdisciplinary collaborative projects applying bioinformatics tools and molecular evolution theory to experimental biology, often using large genomic, transcriptomic and/or proteomic datasets. These projects often involve the development of bespoke bioinformatics pipelines, and a number of open source bioinformatics tools have been generated as a result. We frequently have small collaborations and/or undergraduate student research projects. Many of these are “on hold” waiting for the right person, or sometimes data, to come along. If you think that you have what it takes, get in touch!

Previous Research

The lab has been involved in a number of interdisciplinary collaborative projects applying bioinformatics tools and molecular evolution theory to experimental biology, often using large genomic, transcriptomic and/or proteomic datasets. These projects often involved the development of bespoke bioinformatics pipelines and a number of open source bioinformatics tools have been generated as a result. Please see the Publications and Lab software pages for more detail, or get in touch if something catches your eye and you want to find out more.