Kirsti M G Paulsen, Norman E Davey, Sobia Idrees, Åsa Pérez-Bercoff & Richard J Edwards.
This work was presented at the Sydney Bioinformatics Research Symposium 2018. (Abstract below.) Click on thumbnail for full resolution PDF.
Short linear motifs (SLiMs) are short stretches of protein sequence that are directly involved in protein-protein interactions (PPIs). Identifying SLiMs is important for understanding fundamental processes involved in normal cellular function. SLiMs are commonly only 3-10 amino acids in length and form low-affinity interactions. This makes them ideal for fast cellular processes, such as cell signalling or response to stimuli, but also difficult to identify experimentally. As a result, many computational SLiM prediction methods have been developed. To increase the signal-to-noise ratio of SLiM predictions, different sequence masking techniques have been developed. These attempt to screen out regions that are unlikely to contain SLiMs and thereby preferentially eliminate random nonfunctional sequence. One widely implemented masking strategy is to remove protein regions that form stable three-dimensional structures, because SLiMs are typically found in regions of intrinsic disorder that are natively unstructured in their unbound form. To date, there has been no systematic study of how best to predict intrinsically disordered protein regions for SLiM discovery. Poor-quality predictions will not achieve the desired noise removal, while over-stringent masking will remove too many true positives. The aim of this study is to compare how ten different disorder prediction methods affect SLiM occurrence prediction and to identify the best method and settings for this purpose. The disorder prediction scores for each residue in the human proteome were obtained from the MobiDB database. Further, this study aims to investigate whether the optimal disorder masking settings for SLiM occurrence prediction are the same for de novo SLiM prediction and for identification of SLiM-mediated PPIs.
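The disorder masking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names `mask_ordered` and `find_occurrences`, the 0.5 score threshold, the `X` mask character, and the toy sequence and scores are all assumptions chosen for illustration, and the scores are made up rather than taken from MobiDB. Residues whose disorder score falls below the threshold are masked, and a motif pattern is then searched only in what remains.

```python
import re
from typing import List, Tuple


def mask_ordered(sequence: str, scores: List[float],
                 threshold: float = 0.5, mask_char: str = "X") -> str:
    """Replace residues with a disorder score below `threshold` by `mask_char`.

    The threshold of 0.5 is an illustrative assumption, not a recommended setting.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    return "".join(aa if score >= threshold else mask_char
                   for aa, score in zip(sequence, scores))


def find_occurrences(sequence: str, pattern: str,
                     mask_char: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every occurrence of `pattern`.

    A lookahead is used so that overlapping occurrences are reported, and any
    match that includes a masked position is discarded.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1))
            for m in regex.finditer(sequence)
            if mask_char not in m.group(1)]


# Toy example: the first 8 residues are treated as ordered, the rest as disordered.
sequence = "MPLAPAAGGPSTPKK"
scores = [0.1] * 8 + [0.8] * 7   # made-up per-residue disorder scores
motif = "P..P"                   # simple PxxP-style pattern

masked = mask_ordered(sequence, scores)
print(masked)                             # XXXXXXXXGPSTPKK
print(find_occurrences(sequence, motif))  # [(2, 'PLAP'), (10, 'PSTP')]
print(find_occurrences(masked, motif))    # [(10, 'PSTP')]
```

In this toy case masking removes the occurrence in the region scored as ordered and keeps the one in the region scored as disordered. Raising the threshold masks more sequence, which is the trade-off between noise removal and loss of true positives that the abstract describes.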