Thursday, 20 July 2006

SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent

Davey NE, Shields DC & Edwards RJ (2006): SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res. 34(12):3546-54.

Abstract

Many important interactions of proteins are facilitated by short, linear motifs (SLiMs) within a protein’s primary sequence. Our aim was to establish robust methods for discovering putative functional motifs. The strongest evidence for such motifs is obtained when the same motifs occur in unrelated proteins, evolving by convergence. In practice, searches for such motifs are often swamped by motifs shared in related proteins that are identical by descent. Predictions of motifs among sets of biologically related proteins, including those both with and without detectable similarity, were made using the TEIRESIAS algorithm. The number of motif occurrences arising through common evolutionary descent was normalized based on treatment of BLAST local alignments. Motifs were ranked according to a score derived from the product of the normalized number of occurrences and the information content. The method was shown to significantly outperform methods that do not discount evolutionary relatedness, when applied to known SLiMs from a subset of the eukaryotic linear motif (ELM) database. An implementation of Multiple Spanning Tree weighting outperformed two other weighting schemes, in a variety of settings.
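The ranking idea in the abstract (normalized number of occurrences multiplied by information content) can be illustrated with a minimal Python sketch. This is not the SLiMDisc implementation. The motif representation, the per-protein weights and the information content formula (log2 of alphabet size over the number of allowed residues, with a uniform background) are all assumptions made for illustration; the paper derives its normalization from BLAST local alignments, which is not reproduced here.

```python
import math

def information_content(motif, alphabet_size=20):
    """Assumed IC: sum over positions of log2(alphabet_size / allowed residues).

    Each position is a string of allowed residues; "." is a wildcard
    that matches anything and so contributes zero information.
    """
    total = 0.0
    for position in motif:
        allowed = alphabet_size if position == "." else len(position)
        total += math.log2(alphabet_size / allowed)
    return total

def normalized_support(occurrences, weights):
    """Sum hypothetical per-protein weights instead of counting proteins.

    Related proteins share weight, so a motif inherited by descent
    counts for less than one found in unrelated proteins.
    """
    return sum(weights[protein] for protein in occurrences)

def motif_score(motif, occurrences, weights):
    """Score = normalized number of occurrences x information content."""
    return normalized_support(occurrences, weights) * information_content(motif)

# Hypothetical data: protA and protB are homologues (0.5 each),
# protC is unrelated to both (1.0).
weights = {"protA": 0.5, "protB": 0.5, "protC": 1.0}
motif = ["L", ".", "C", ".", "E"]
print(motif_score(motif, ["protA", "protB", "protC"], weights))
```

With these made-up weights, the three occurrences count as two independent ones, so the motif scores lower than it would under a plain occurrence count.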

PMID: 16855291

Wednesday, 16 November 2005

BADASP: predicting functional specificity in protein families using ancestral sequences

Edwards RJ & Shields DC (2005): BADASP: predicting functional specificity in protein families using ancestral sequences. Bioinformatics 21(22):4190-1.

Abstract

SUMMARY: Burst After Duplication with Ancestral Sequence Predictions (BADASP) is a software package for identifying sites that may confer subfamily-specific biological functions in protein families following functional divergence of duplicated proteins. A given protein phylogeny is grouped into subfamilies based on orthology/paralogy relationships and/or user definitions. Ancestral sequences are then predicted from the sequence alignment and the functional specificity is calculated using variants of the Burst After Duplication method, which tests for radical amino acid substitutions following gene duplications that are subsequently conserved. Statistics are output along with subfamily groupings and ancestral sequences for an easy analysis with other packages.

AVAILABILITY: BADASP is freely available from http://www.bioinformatics.rcsi.ie/~redwards/badasp/

PMID: 16159912

Wednesday, 2 November 2005

Correlation of probiotic Lactobacillus salivarius growth phase with its cell wall-associated proteome

Kelly P, Maguire PB, Bennett M, Fitzgerald DJ, Edwards RJ, Thiede B, Treumann A, Collins JK, O’Sullivan GC, Shanahan F & Dunne C (2005): Correlation of probiotic Lactobacillus salivarius growth phase with its cell wall-associated proteome. FEMS Microbiol. Lett. 252(1):153-159.

Abstract

Lactobacillus salivarius subsp. salivarius UCC118 is a probiotic bacterium that was originally isolated from human intestinal tissues and was subsequently shown in a pilot study to alleviate symptoms associated with mild-moderate Crohn’s disease. Strain UCC118 can adhere to animal and human intestinal tissue, and to both healthy and inflamed ulcerative colitis mucosa, irrespective of location in the gut. In this study, an enzymatic technique has been combined with proteomic analysis to correlate bacterial growth phase with the presence of factors in the cell wall of the bacterium. Using PAGE, it was determined that progression from lag to log to stationary growth phases in vitro correlated with increasing prominence of an 84 kD protein associated with in vitro adherence ability. Isolated proteins from the 84 kD band region were further separated by two-dimensional electrophoresis, resolving this band into 20 individual protein spots at differing isoelectric points. The protein moieties were excised, trypsin digested and subjected to tandem mass spectrometry. The observed proteins are analogous to those reported to be associated with the Listeria monocytogenes cell-wall proteome, and include DnaK, Ef-Ts and pyruvate kinase. These data suggest that at least some of the beneficial attributes of probiotic lactobacilli, and in particular this strain, may be due to nonpathogenic mimicry of pathogens and potentially be mediated through a form of attenuated virulence.

PMID: 16214296

Friday, 29 July 2005

Tandem repeat copy-number variation in protein-coding regions of human genes

O’Dushlaine CT, Edwards RJ, Park SD & Shields DC (2005): Tandem repeat copy-number variation in protein-coding regions of human genes. Genome Biol. 6(9):R69.

Abstract

BACKGROUND: Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

RESULTS: Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term ‘protein-binding’ remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.
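The percentages quoted in these results can be checked with a few lines of Python. The first figure follows directly from the counts given. The second assumes that the "up to 1%" frameshift estimate comes from multiplying the estimated 6% polymorphism frequency by the 13.9% of polymorphisms that are not a multiple of three nucleotides; the abstract implies this but does not spell it out.

```python
# Counts and percentages as quoted in the abstract.
polymorphic_clusters = 218
clusters_investigated = 13749
estimated_frequency = 0.06    # estimated mean frequency of polymorphism
non_triplet_fraction = 0.139  # polymorphisms not a multiple of 3 nt

# Fraction of investigated proteins with a detected polymorphism.
detected_fraction = polymorphic_clusters / clusters_investigated

# Assumed derivation of the frameshift estimate (see note above).
frameshift_fraction = estimated_frequency * non_triplet_fraction

print(f"Detected polymorphic proteins: {detected_fraction:.2%}")
print(f"Possible frameshifting polymorphisms: {frameshift_fraction:.2%}")
```

The first value reproduces the reported 1.59%, and the second comes out a little under 1%, in line with the "up to 1%" wording.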

CONCLUSION: Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.

PMID: 16086851

Saturday, 26 March 2005

Tyrosine phosphorylation of myosin heavy chain during skeletal muscle differentiation: an integrated bioinformatics approach

Harney DF, Butler RK & Edwards RJ (2005): Tyrosine phosphorylation of myosin heavy chain during skeletal muscle differentiation: an integrated bioinformatics approach. Theor. Biol. Med. Model. 2(1):12.

Abstract

BACKGROUND: Previously it has been shown that insulin-mediated tyrosine phosphorylation of myosin heavy chain is concomitant with enhanced association of C-terminal SRC kinase during skeletal muscle differentiation. We sought to identify putative site(s) for this phosphorylation event.

RESULTS: A combined bioinformatics approach of motif prediction and evolutionary and structural analyses identified tyrosines 163 and 1856 of the skeletal muscle heavy chain as the leading candidates for the sites of insulin-mediated tyrosine phosphorylation.

CONCLUSION: Our work suggests that tyrosine phosphorylation of myosin heavy chain, whether in skeletal muscle or in platelets, is a significant event that may initiate cytoskeletal reorganization of muscle cells and platelets. Our studies provide a good starting point for further functional analysis of MHC phospho-signalling events within different cells.

PMID: 15790426

Tuesday, 7 September 2004

GASP: Gapped Ancestral Sequence Prediction for proteins

Edwards RJ & Shields DC (2004): GASP: Gapped Ancestral Sequence Prediction for proteins. BMC Bioinformatics 5(1):123.

Abstract

BACKGROUND: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments.

RESULTS: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy.
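The two-pass idea described above (tips to root using descendant data only, then root to tips adding outgroup data) can be sketched for a single ungapped alignment column. This is not the GASP algorithm itself: it is a generic two-pass marginal reconstruction. The four-letter alphabet, the flat substitution model with a fixed change probability MU, the uniform prior at the root and the nested-tuple tree format are all assumptions made for illustration. Gap assignment and branch lengths are ignored.

```python
ALPHABET = "ACDE"  # toy alphabet (assumption; real proteins use 20 residues)
MU = 0.1           # assumed probability of any change along a branch

def transition(a, b):
    """Toy substitution model: all changes are equally likely."""
    return 1.0 - MU if a == b else MU / (len(ALPHABET) - 1)

def branch_message(parent_state, child):
    """Likelihood of a child's descendant data given the parent's state."""
    return sum(transition(parent_state, b) * child["like"][b] for b in ALPHABET)

def tips_to_root(tree):
    """Pass 1: per-node likelihoods using descendant data only."""
    if isinstance(tree, str):  # a tip holding an observed residue
        like = {a: 1.0 if a == tree else 0.0 for a in ALPHABET}
        return {"like": like, "children": []}
    children = [tips_to_root(sub) for sub in tree]
    like = {}
    for a in ALPHABET:
        p = 1.0
        for child in children:
            p *= branch_message(a, child)
        like[a] = p
    return {"like": like, "children": children}

def root_to_tips(node, outside=None):
    """Pass 2: combine descendant likelihoods with data from the rest of the tree."""
    if outside is None:  # uniform prior at the root (assumption)
        outside = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    joint = {a: node["like"][a] * outside[a] for a in ALPHABET}
    total = sum(joint.values())
    node["posterior"] = {a: p / total for a, p in joint.items()}
    node["prediction"] = max(ALPHABET, key=lambda a: node["posterior"][a])
    for child in node["children"]:
        # Evidence about this node's state from everything except this child.
        rest = {}
        for a in ALPHABET:
            p = outside[a]
            for sibling in node["children"]:
                if sibling is not child:
                    p *= branch_message(a, sibling)
            rest[a] = p
        child_outside = {b: sum(rest[a] * transition(a, b) for a in ALPHABET)
                         for b in ALPHABET}
        root_to_tips(child, child_outside)

# One column for the hypothetical tree ((A, A), (A, C)).
root = tips_to_root((("A", "A"), ("A", "C")))
root_to_tips(root)
print(root["prediction"], root["children"][1]["prediction"])
```

In this toy tree the ancestor of the (A, C) pair is ambiguous on descendant data alone, since A and C are equally likely there. The second pass brings in the neighbouring (A, A) clade, which tips the posterior at that node towards A.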

CONCLUSIONS: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.

PMID: 15350199

Tuesday, 21 January 2003

Transiently beneficial insertions could maintain mobile DNA sequences in variable environments

Edwards RJ & Brookfield JFY (2003): Transiently beneficial insertions could maintain mobile DNA sequences in variable environments. Mol Biol Evol. 20(1):30-37.

Abstract

The maintenance of mobile DNA sequences in clonal organisms has been seen as a paradox. If selfish mobile sequences spread through genomes only by overreplication in transposition, then sexuality is necessary for their spread through populations. The persistence of bacterial transposable elements without obvious dominant selectable markers has previously been explained by horizontal transfer. However, advantageous insertions of mobile DNAs are known in bacteria. Here we model maintenance of an otherwise selfish mobile DNA element in a clonal species in which selection for null mutations occurs during one of two temporally alternating environments. Large areas of parameter space permit maintenance of mobile DNAs where, without selection, they would have gone extinct. Horizontal transfer diminishes, rather than enhances, mean copy number. In finite populations, effective population sizes are greatly reduced by selective sweeps, and mean copy number can be increased as the reduced variance in copy number results in reduced selection.

PMID: 12519903