Friday 9 October 2015

Edwards Lab at #ABACBS2015

The Australian Bioinformatics And Computational Biology Society (ABACBS) 2015 Conference is almost upon us and the lab has four posters. Come and visit us if you are attending! If not, you can check them out on the Australian Bioinformatics And Computational Biology Society Conference F1000Research channel and tweet or email us with any questions:

#ABACBS2015 Poster P27: Genetic Characterisation of the Evolution of a Novel Metabolic Function in Yeast

Åsa Pérez-Bercoff, Tonia L. Russell, Paul V. Attfield, Philip J.L. Bell & Richard J. Edwards.


We have a unique opportunity to study how new biological pathways have evolved to produce a yeast with a novel metabolic activity. Thirty-seven Saccharomyces cerevisiae strains were grown as a mixed population and adapted on a specific medium for 1463 days, undergoing sexual mating every two months. As a pilot study, three of the ancestral strains from the starting population, including two diploids, were selected for new PacBio RSII long-read single molecule real time (SMRT) sequencing at the Ramaciotti Centre for Genomics. High quality complete genomes were assembled de novo from PacBio long-read SMRT sequencing data alone (non-hybrid) using the hierarchical genome-assembly process (HGAP3), and corrected using Quiver. In addition, we have shotgun metagenomic Illumina data from a population exhibiting early adaptation to growth on the selective medium. We are developing methods to map these short-read data onto several high quality ancestral genomes in order to estimate the relative contribution of each ancestor’s genetic variation to the evolved population, and to identify possible sites of recombination. The metagenomic data are also being mapped against the official S. cerevisiae reference strain S288C to call single nucleotide polymorphisms (SNPs) that are not present in any of our ancestral or reference genomes. These will be compared to the publicly available “100 yeast genomes”, and partitioned into natural variation and candidates for novel mutations. Through these comparisons we hope to elucidate how the population has evolved to acquire its novel characteristics.
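The partitioning step described in this abstract can be pictured with plain set operations. The following is a minimal Python sketch, not the pipeline used for the poster: the `Snp` record, the `partition_snps` function and the idea of holding each call set in memory are all assumptions made for illustration, and a real analysis would read variants from VCF files and handle indels, genotypes and quality filters.

```python
from typing import NamedTuple, Set, Tuple


class Snp(NamedTuple):
    """Hypothetical minimal SNP record keyed on reference coordinates."""
    chrom: str
    pos: int   # 1-based position on the reference
    alt: str   # alternate allele


def partition_snps(
    population: Set[Snp], known: Set[Snp], panel: Set[Snp]
) -> Tuple[Set[Snp], Set[Snp], Set[Snp]]:
    """Split population SNPs into three groups.

    known   -- SNPs already seen in the ancestral (or reference) genomes
    panel   -- SNPs seen in an external panel of natural isolates
    Returns (explained by ancestors, natural variation, candidate novel).
    """
    explained = population & known
    unexplained = population - known
    natural = unexplained & panel      # seen elsewhere in nature
    novel = unexplained - panel        # candidates for new mutations
    return explained, natural, novel
```

Set arithmetic keeps the logic easy to audit: every population SNP lands in exactly one of the three groups.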

#ABACBS2015 Poster P20: SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks

Emily Olorin, Kevin T. O’Brien, Nicolas Palopoli, Åsa Pérez-Bercoff, Denis C. Shields & Richard J. Edwards.


Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent evolution in non-homologous proteins has proven to be a successful strategy for computational SLiM prediction. Tools of the SLiMSuite package apply this strategy, using a statistical model to identify SLiM enrichment based on the evolutionary relationships, amino acid composition and predicted disorder of the input proteins. The quality of input data is critical for successful SLiM prediction. Cytoscape provides a user-friendly, interactive environment to explore interaction networks and select proteins based on common features, such as shared interaction partners. SLiMScape embeds tools of the SLiMSuite package for de novo SLiM discovery (SLiMFinder and QSLiMFinder) and for identifying occurrences/enrichment of known SLiMs (SLiMProb) within this interactive framework. SLiMScape makes it easier to (1) generate high quality hypothesis-driven datasets for these tools, and (2) visualise predicted SLiM occurrences within the context of the network. To generate new predictions, users can select nodes from a protein network or provide a set of UniProt identifiers. SLiMProb also requires additional query motif input. Jobs are then run remotely on the SLiMSuite server for subsequent retrieval and visualisation. SLiMScape can also be used to retrieve and visualise results from jobs run directly on the server.
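To give a feel for what "occurrences of a known SLiM" means, here is a minimal Python sketch that scans protein sequences for a motif written as a regular expression. It is only an illustration: the `find_occurrences` function, the example motif `R..S` and the toy sequences are hypothetical, and the sketch does none of the evolutionary, compositional or disorder-based statistics that the SLiMSuite tools apply.

```python
import re
from typing import Dict, List, Tuple


def find_occurrences(
    motif: str, sequences: Dict[str, str]
) -> Dict[str, List[Tuple[int, str]]]:
    """Return (1-based start, matched text) for every motif hit per protein.

    The motif is treated as a regular expression, with '.' as a wildcard.
    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile(f"(?=({motif}))")
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for name, seq in sequences.items():
        hits[name] = [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
    return hits


# Toy input: identifiers and sequences are made up for the example.
toy_proteins = {"protA": "MARAASRKKS", "protB": "MGGLLP"}
toy_hits = find_occurrences("R..S", toy_proteins)
```

Counting raw matches like this says nothing about whether a motif is enriched, which is why a statistical model is needed on top.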

#ABACBS2015 Poster FF6: Evidence for viral causes of cancer in prostate cancer transcriptomes and genomes

Timothy G. Amos, James S. Lawson, Wendy K. Glenn, Noel J. Whitaker & Richard J. Edwards.


Viruses are known to cause 10-15% of human cancers. Some ongoing viral infections, such as HPV in cervical cancer, can be easily detected in the cancer transcriptome. However, in other situations viruses contribute to cancer causation without sustained high expression levels. PCR has detected HPV in prostate cancers, but RNA-Seq has not shown strong evidence of continuing infections. We analysed prostate cancer transcriptomes and genomes from The Cancer Genome Atlas to further investigate HPV’s role in prostate cancer causation. Previous studies have filtered out all reads that might not be viral and then set thresholds for real infections by comparing to positive controls. In contrast, we found all possible viral reads and then evaluated multiple streams of evidence that they were genuine. Sequences were evaluated on the uniqueness of their alignments to human, viral and vector sequences. They were also evaluated for sequence complexity, sequence quality, alignment quality, the relative alignment locations of paired-end reads and the presence of chimeric reads that indicate viral integration sites. By screening cancer transcriptomes and genomes against all viruses in the NCBI database (N ≈ 5766), including non-human viruses, we are also able to compare the strength of evidence against known false positives. We discuss the results for 22 viral candidates for oncogenesis in 558 prostate cancer paired RNA-Seq and WGS datasets.
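Sequence complexity is one of the evidence streams listed above. The abstract does not say which measure was used, so the following Python sketch swaps in a common stand-in, the Shannon entropy of the base composition. The function names and the 1.0-bit cut-off are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the base composition of a read."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_low_complexity(seq: str, min_bits: float = 1.0) -> bool:
    """Flag reads whose entropy falls below a (hypothetical) threshold."""
    return shannon_entropy(seq) < min_bits
```

A homopolymer run scores 0 bits and a read with equal amounts of all four bases scores 2 bits, so low scores point to repeats that can align spuriously to many genomes. Composition-based entropy misses ordered repeats such as dinucleotide runs, so it would only ever be one signal among several.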

#ABACBS2015 Poster FF5: The SMRT Way to Sequence a Yeast Genome

Åsa Pérez-Bercoff, Tonia L. Russell, Paul V. Attfield, Philip J.L. Bell & Richard J. Edwards.


We have performed PacBio single molecule real time (SMRT) sequencing of three yeast whole genomes. A haploid reference yeast strain (S288C) and two novel diploid strains were sequenced as part of a larger functional genomics project. For each strain, 20 kb SMRTbell library preps were performed and sequenced on two SMRT Cells using the P6-C4 chemistry. Between 1.74 Gb and 2.55 Gb of usable sequence data was generated for each strain, with read lengths of up to 53.3 kb. Pure PacBio whole genome de novo assemblies were generated using the HGAP3 pipeline. We are using the S288C data to explore performance, with the published genome as a reference. An initial assembly of S288C yielded over 99.9% genome coverage at 99.997% accuracy on only 26 unitigs, versus 17 reference chromosomes (16 nuclear chromosomes plus the mitochondrion). Of these, 16 chromosomes were each essentially returned as a single, complete unitig. We are now using the S288C data to optimise the assembly process and derive assembly settings for the two novel strains. To this end, we have developed a new pipeline for the comparative assessment of high quality whole genomes against a reference. In this poster, we explore how good PacBio sequencing is for de novo genome sequencing in yeast, with examples of both the power and limits of the technology.
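For a sense of scale, the sequence yields quoted above can be turned into an approximate mean depth of coverage. This Python sketch assumes a haploid S. cerevisiae genome of about 12.1 Mb, a figure that is not stated in the abstract, and the function name is hypothetical. For the diploid strains the depth per haplotype would be about half as much.

```python
# Assumed haploid genome size in base pairs (not taken from the abstract).
ASSUMED_GENOME_SIZE_BP = 12.1e6


def mean_depth(yield_bp: float, genome_size_bp: float = ASSUMED_GENOME_SIZE_BP) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    return yield_bp / genome_size_bp


# Yields quoted in the abstract: 1.74 Gb (lowest) and 2.55 Gb (highest).
for label, yield_bp in [("lowest yield", 1.74e9), ("highest yield", 2.55e9)]:
    print(f"{label}: ~{mean_depth(yield_bp):.0f}x")
```

Under that assumption the yields work out to roughly 140x to 210x mean depth.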