Saturday, 25 September 2010

SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context

Davey NE, Haslam NJ, Shields DC & Edwards RJ (2010): SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context. In: Pattern Recognition in Bioinformatics. Edited by Dijkstra TMH, Tsivtsivadze E, Marchiori E & Heskes T. Springer-Verlag, Berlin. Lecture Notes in Bioinformatics 6282: 50-61.

Abstract

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch (Short, Linear Motif Search) webserver is a flexible tool that enables researchers to identify novel occurrences of predefined SLiMs in sets of proteins. Numerous masking options give the user great control over the contextual information to be included in the analyses, including evolutionary filtering and protein structural disorder. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. Users can search motifs against the human proteome, or submit their own datasets of UniProt proteins, in which case motif support within the dataset is statistically assessed for over- and under-representation, accounting for evolutionary relationships between input proteins. SLiMSearch is freely available as open source Python modules and all webserver results are available for download. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch.html.
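As a rough illustration of what searching for a predefined motif involves, the sketch below scans a small set of sequences for a regular-expression motif and reports every occurrence, including overlapping ones. It is not the SLiMSearch code or its API: the function name `find_motif`, the toy sequences and the `RGD[SA]` pattern are all assumptions made for the example.

```python
import re

def find_motif(sequences, pattern):
    """Return (protein_id, 1-based start, matched text) for each occurrence.

    `sequences` maps a protein identifier to its amino acid sequence.
    `pattern` is a regular expression describing the motif (hypothetical input).
    """
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for protein_id, seq in sequences.items():
        for match in regex.finditer(seq):
            hits.append((protein_id, match.start() + 1, match.group(1)))
    return hits

# Toy sequences, invented for illustration only.
proteins = {
    "protA": "MKRGDSPLRGDA",
    "protB": "MAAAPLLK",
}
hits = find_motif(proteins, "RGD[SA]")
for protein_id, start, text in hits:
    print(protein_id, start, text)
```

A real search would add the masking and statistical assessment described in the abstract; this sketch covers only the pattern-matching step.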

Wednesday, 11 August 2010

Protein interactions with the platelet integrin alpha(IIb) regulatory motif

Raab M, Daxecker H, Edwards RJ, Treumann A, Murphy D & Moran N (2010): Protein interactions with the platelet integrin alpha(IIb) regulatory motif. Proteomics 10: 2790-2800.

Abstract

Integrins are transmembrane proteins regulating cellular shape, mobility and the cell cycle. A highly conserved signature motif in the cytoplasmic tail of the integrin alpha-subunit, KXGFFKR, plays a critical role in regulating integrin function. To date, six proteins have been identified that target this motif of the platelet-specific integrin alpha(IIb)beta(3). We employ peptide-affinity chromatography followed by LC-MS/MS analysis, as well as protein chips, to identify new potential regulators of integrin function in platelets and put them into their biological context using information from protein:protein interaction (PPI) databases. In total, 44 platelet proteins bind with high affinity to an immobilized LAMWKVGFFKR-peptide. Of these, seven have been reported in the PPI literature as interactors with integrin alpha-subunits. 68 recombinant human proteins expressed on the protein chip specifically bind with high affinity to biotin-tagged alpha-integrin cytoplasmic peptides. Two of these proteins are also identified in the peptide-affinity experiments, one is also found in the PPI databases and a further one is present in the data from all three approaches. Finally, novel short linear interaction motifs are common to a number of proteins identified.

PMID: 20486118

Wednesday, 2 June 2010

Computational identification and analysis of protein short linear motifs

Davey NE, Edwards RJ & Shields DC (2010): Computational identification and analysis of protein short linear motifs. Frontiers in Bioscience 15: 801-25.

Abstract

Short linear motifs (SLiMs) in proteins can act as targets for proteolytic cleavage, sites of post-translational modification, determinants of sub-cellular localization, and mediators of protein-protein interactions. Computational discovery of SLiMs involves assembling a group of proteins postulated to share a potential motif, masking out residues less likely to contain such a motif, down-weighting shared motifs arising through common evolutionary descent, and calculating statistical probabilities that allow for the multiple testing of all possible motifs. Much of the challenge for motif discovery lies in the assembly and masking of datasets of proteins likely to share motifs, since the motifs are typically short (between 3 and 10 amino acids in length), so that potential signals can be easily swamped by the noise of stochastically recurring motifs. Focusing on disordered regions of proteins, where SLiMs are predominantly found, and masking out non-conserved residues can reduce the level of noise, but more work is required to improve the quality of high-throughput experimental datasets (e.g. of physical protein interactions) as input for computational discovery.
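The masking step mentioned above can be pictured with a small sketch. It assumes a per-residue disorder score is already available from some predictor, and replaces residues scoring below a cut-off with `X` so that a later motif search cannot match them. The function name `mask_ordered`, the 0.5 threshold and the scores are hypothetical values chosen for illustration, not settings from any of the tools described here.

```python
def mask_ordered(seq, disorder_scores, threshold=0.5):
    """Replace residues with a disorder score below `threshold` by 'X'.

    `disorder_scores` holds one score per residue (hypothetical predictor output).
    """
    if len(seq) != len(disorder_scores):
        raise ValueError("need exactly one disorder score per residue")
    return "".join(
        aa if score >= threshold else "X"
        for aa, score in zip(seq, disorder_scores)
    )

# Invented sequence and scores: only the central residues count as disordered.
seq = "MKRGDSPL"
scores = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9, 0.3, 0.2]
masked = mask_ordered(seq, scores)
print(masked)
```

Masking to a placeholder rather than deleting residues keeps positions aligned with the original sequence, so any reported motif coordinates still refer to the unmasked protein.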

PMID: 20515727

Monday, 24 May 2010

SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs

Davey NE, Haslam NJ, Shields DC & Edwards RJ (2010): SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Research 38: W534-W539.

Abstract

Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at: http://bioware.ucd.ie/slimfinder.html.

PMID: 20497999

Friday, 8 January 2010

Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins

Davey NE, Edwards RJ & Shields DC (2010): Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins. BMC Bioinformatics 11: 14.

Abstract

BACKGROUND: Large datasets of protein interactions provide a rich resource for the discovery of Short Linear Motifs (SLiMs) that recur in unrelated proteins. However, existing methods for estimating the probability of motif recurrence may be biased by the size and composition of the search dataset, such that p-value estimates from different datasets, or from motifs containing different numbers of non-wildcard positions, are not strictly comparable. Here, we develop more exact methods and explore the potential biases of computationally efficient approximations.

RESULTS: A widely used heuristic for the calculation of motif over-representation approximates motif probability by assuming that all proteins have the same length and composition. We introduce pv, which calculates the probability exactly. Secondly, the recently introduced SLiMFinder statistic, Sig, accounts for multiple testing (across all possible motifs) in motif discovery. However, it approximates the probability of all other possible motifs, occurring with a score of p or less, as being equal to p. Here, we show that the exhaustive calculation of the probability of all possible motif occurrences that are as rare or rarer than the motif of interest, Sig’, may be carried out efficiently by grouping motifs of a common probability (i.e. those which have permuted orders of the same residues). Sig’v, which corrects both approximations, is shown to be uniformly distributed in a random dataset when searching for non-ambiguous motifs, indicating that it is a robust significance measure.
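To give a feel for the two ideas in this paragraph, here is a minimal sketch under simplifying assumptions of our own: residues are treated as independent draws from fixed background frequencies, and start positions within a protein are treated as independent. It contrasts a binomial calculation that gives every protein the same occurrence probability with a calculation that lets each protein have its own probability, and it shows why permuted motifs share a single probability. These are not the paper's formulas for pv, Sig or Sig’; every function name and number below is hypothetical.

```python
from math import comb, isclose, prod

def site_prob(motif, freqs):
    """Chance that a fixed motif matches at one position (independent residues)."""
    return prod(freqs[aa] for aa in motif)

def protein_prob(motif, length, freqs):
    """Approximate chance of one or more occurrences in a protein of given length."""
    positions = max(length - len(motif) + 1, 0)
    return 1.0 - (1.0 - site_prob(motif, freqs)) ** positions

def at_least_binomial(k, n, p):
    """P(k or more of n proteins contain the motif) when all share probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def at_least_per_protein(k, probs):
    """Same tail probability, but each protein keeps its own probability."""
    dist = [1.0]  # dist[i] = P(exactly i proteins contain the motif so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for i, d in enumerate(dist):
            new[i] += d * (1 - p)
            new[i + 1] += d * p
        dist = new
    return sum(dist[k:])

# Invented background frequencies and protein lengths.
freqs = {"R": 0.05, "G": 0.07, "D": 0.05}
lengths = [100, 400, 1500]
probs = [protein_prob("RGD", n, freqs) for n in lengths]
mean_p = sum(probs) / len(probs)

print(at_least_binomial(2, len(probs), mean_p))  # equal-probability shortcut
print(at_least_per_protein(2, probs))            # per-protein probabilities
print(isclose(site_prob("RGD", freqs), site_prob("DGR", freqs)))  # permutations agree
```

Because the per-position probability is a product over residues, any reordering of the same residues gives the same value, which is what makes it possible to group motifs rather than enumerate them one by one.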

CONCLUSIONS: A method is presented to compute exactly the true probability of a non-ambiguous short protein sequence motif, and the utility of an approximate approach for novel motif discovery across a large number of datasets is demonstrated.

PMID: 20055997