Friday, 4 September 2020

We are recruiting! Two year Postdoc available in plant conservation genomics

We have a two-year full-time postdoc position available to conduct bioinformatics, laboratory and field research in the area of conservation genomics as part of ARC Linkage Project LP180100721, and to assist in the supervision of honours and postgraduate research students as required. This is a collaborative project between the University of New South Wales, Royal Botanic Gardens Sydney and the Australian National University to develop approaches for the conservation of plant species that are threatened by a fungal pathogen (Austropuccinia psidii, the cause of myrtle rust). Techniques involved include, but are not limited to, field collection of plant material, DNA extraction, plant growth experiments, and running software for bioinformatic and statistical analyses.

More details can be found on the UNSW jobs site. Informal enquiries are welcome if you want to know more about the project.

Tuesday, 23 June 2020

Do epigenetic changes drive corticosterone responses to alarm cues in larvae of an invasive amphibian?

Our latest cane toad paper is now online at Integrative and Comparative Biology, using our draft cane toad genome as the reference for differential methylation analysis. (Thanks to coronavirus delays, the updated cane toad genome was not quite ready for this analysis but watch this space!)

Sarma RR, Edwards RJ, Crino OL, Eyck HJF, Waters PD, Crossland MR, Shine R & Rollins LA (accepted): Do epigenetic changes drive corticosterone responses to alarm cues in larvae of an invasive amphibian? Integrative and Comparative Biology icaa082


The developmental environment can exert powerful effects on animal phenotype. Recently epigenetic modifications have emerged as one mechanism that can modulate developmentally plastic responses to environmental variability. For example, the DNA methylation profile at promoters of hormone receptor genes can affect their expression and patterns of hormone release. Across taxonomic groups, epigenetic alterations have been linked to changes in glucocorticoid (GC) physiology. GCs are metabolic hormones that influence growth, development, transitions between life-history stages, and thus fitness. To date, relatively few studies have examined epigenetic effects on phenotypic traits in wild animals, especially in amphibians. Here, we examined the effects of exposure to predation threat and experimentally manipulated DNA methylation on corticosterone (CORT) levels in tadpoles and metamorphs of the invasive cane toad (Rhinella marina). We included offspring of toads sampled from populations across the species’ Australian range. In these animals, exposure to chemical cues from injured conspecifics induces shifts in developmental trajectories, putatively as an adaptive response that lessens vulnerability to predation. We exposed tadpoles to these alarm cues, and measured changes in DNA methylation and CORT levels, both of which are mechanisms that have been implicated in the control of phenotypically plastic responses in tadpoles. To test the idea that DNA methylation drives shifts in GC physiology, we also experimentally manipulated methylation levels with the drug zebularine. We found differentially methylated regions between control tadpoles and their full-siblings exposed to alarm cues, zebularine or both treatments. However, the effects of these manipulations on methylation patterns were weaker than clutch (e.g. genetic, maternal, etc.) effects. CORT levels were higher in larval cane toads exposed to alarm cues and zebularine. 
We found little evidence of changes in DNA methylation across the glucocorticoid receptor gene (NR3C1) promoter region in response to alarm cue or zebularine exposure. In both alarm cue and zebularine-exposed individuals, we found differentially methylated DNA in the suppressor of cytokine signaling 3 gene (SOCS3), which may be involved in predator avoidance behavior. In total, our data reveal that alarm cues have significant impacts on tadpole physiology, but show only weak links between DNA methylation and CORT levels. We also identify genes containing differentially methylated regions in tadpoles exposed to alarm cues and zebularine, particularly in range-edge populations, that warrant further investigation.

Thursday, 2 April 2020

Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C

Our latest paper is out! This one is a bit more photogenic than the cane toad: a German Shepherd Dog called Nala. This was a big international effort, in a collaboration led by Bill Ballard at UNSW that included a dozen institutions across four continents. We threw all the main sequencing technologies at this one and achieved a chromosome-level assembly of better quality than the current “CanFam” reference genome.

You can find out more in the UNSW press release.

Field MA, Rosen BD, Dudchenko O, Chan EKF, Minoche AM, Edwards RJ, Barton K, Lyons RJ, Enosi Tuipulotu D, Hayes VM, Omer AD, Colaric Z, Keilwagen J, Skvortsova K, Bogdanovic O, Smith MA, Lieberman Aiden E, Smith TPL, Zammit RA & Ballard JWO (2020): Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience 9(4):giaa027. [GigaScience]



The German Shepherd Dog (GSD) is one of the most common breeds on Earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, as fully trained animals are then unable to continue their duties.


Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilizing a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome (20.9 vs 0.267 Mb contig N50), containing far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310) than CanFam v3.1. Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. BUSCO analyses of the genome assembly results show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies and the disruption of 1 copy.
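Contig N50, the contiguity statistic quoted above, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the contig lengths are made-up toy values, not CanFam data, and the last line simply reproduces the ~80-fold ratio from the two N50 values given in the abstract.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


# Toy contig lengths (hypothetical): total = 290, half = 145,
# and 100 + 80 = 180 >= 145, so N50 = 80.
contigs = [100, 80, 50, 30, 20, 10]
print(n50(contigs))  # 80

# Fold difference between the two contig N50 values quoted above (Mb).
print(round(20.9 / 0.267, 1))  # 78.3
```

Sorting once and accumulating keeps this O(n log n), which is negligible even for assemblies with many thousands of contigs.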


GSD genome assembly and annotation were produced with major improvement in completeness, continuity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.

Photo credit: Outdoor Action Photography.

Friday, 31 January 2020

Research snapshot - January 2020

One of the most important, interesting and challenging questions in biology is how new traits evolve at the molecular level. My lab employs sequence analysis techniques to interrogate DNA and protein sequences for the signals left behind by evolution. We are a bioinformatics lab but like to incorporate bench/field data through collaboration wherever possible.

Main Research

Building on a solid foundation of bioinformatics and evolutionary theory, we apply genomics, transcriptomics, proteomics, interactomics and systems analysis to understand complex biological systems. Core research activities can be broadly divided into two main themes:

  1. Evolutionary Genomics, with a focus on applying de novo genome assembly and population genomics to problems in ecology and biotechnology.

  2. Protein-protein interactions, with a focus on the prediction of short linear interaction motifs and their role in human health and disease, including host-pathogen interactions.

These are explored in more detail below.

1. Evolutionary Genomics.

The main research focus of the lab is the exploitation of genomic and post-genomic data to understand biological function and adaptation to novel environments. We work closely with the Ramaciotti Centre for Genomics and are involved in numerous de novo whole genome sequencing and assembly projects, using short-read (Illumina), long-read (PacBio & Nanopore) and linked-read (10x Chromium) sequencing. The biggest of these include leading the bioinformatics and assembly effort in a consortium to sequence the cane toad genome, and leading the BABS Genome project to sequence two iconic Australian snakes. We are a member of the Oz Mammals Genomics initiative, assisting with the sequencing and assembly of Australia’s unique marsupial fauna. In 2018, we were selected as part of a team to sequence the Waratah genome for the pilot phase of the new Genomics of Australian Plants initiative.

We enjoy bringing our bioinformatics to bear on a variety of collaborative research projects. Most notably, we have an ARC Linkage Grant with Microbiogen Pty Ltd to understand how a strain of Saccharomyces cerevisiae has evolved to efficiently use xylose as a sole carbon source: something vital for second-generation biofuel production that wild yeast cannot do. We are combining comparative genomics, evolutionary genetics, RNA-Seq transcriptomics, and competition assays to understand how the novel metabolism evolved. By deep Illumina resequencing of evolving populations and assembling reliable complete genomes of the founding ancestors, we ultimately aim to trace how mutations have interacted with existing genetic variation during adaptive evolution. More recently, we received an ARC Linkage Grant with the Royal Botanic Gardens and Domain Trust to apply genomics approaches to the challenges of rainforest tree conservation in the face of climate change and invasive pathogens. We are also collaborating with industrial and academic partners to de novo sequence, assemble, annotate and interrogate the genomes of a selection of microbes with interesting metabolic abilities.

2. Short Linear Motifs (SLiMs).

Many protein-protein interactions are mediated by Short Linear Motifs (SLiMs): short stretches of proteins (5-15 amino acids long), of which only a few positions are critical to function. These motifs are vital for biological processes of fundamental importance, acting as ligands for molecular signalling, post-translational modifications and subcellular targeting. SLiMs have extremely compact protein interaction interfaces, generally encoded by fewer than four major affinity-/specificity-determining residues. Their small size enables high functional density and evolutionary plasticity, making them frequent products of convergent “ex nihilo” evolution. It also makes them challenging to identify, both experimentally and computationally.
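SLiMs are often written as short regular-expression-like patterns in which most positions are wildcards. The Python sketch below scans a protein sequence for one such pattern, `[RK]..P..P`, loosely modelled on the proline-rich motifs bound by SH3 domains; the pattern, the toy sequences and the function names are illustrative assumptions rather than anything taken from SLiMSuite. A zero-width lookahead is used so that overlapping occurrences are all reported.

```python
import re

# Illustrative motif (an assumption for this example): R or K, any two
# residues, P, any two residues, P. Only 3 of its 7 positions are constrained.
MOTIF = "[RK]..P..P"


def find_motif(sequence: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every occurrence of motif,
    including overlapping ones, using a zero-width lookahead."""
    pattern = re.compile(f"(?=({motif}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


def defined_positions(motif: str) -> int:
    """Count non-wildcard positions in a simple motif pattern, treating each
    bracketed residue class such as [RK] as a single position."""
    positions = re.findall(r"\[[^\]]+\]|.", motif)
    return sum(1 for p in positions if p != ".")


print(find_motif("MSRAAPLLPGKQSTPE"))  # [(3, 'RAAPLLP')]
print(find_motif("KRAPPAPP"))          # [(1, 'KRAPPAP'), (2, 'RAPPAPP')]
print(defined_positions(MOTIF))        # 3
```

The second call shows why the lookahead matters: a plain `re.finditer` on the pattern itself would consume the first match and miss the overlapping one starting at position 2.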

A major focus of the lab is the computational prediction of SLiMs from protein sequences. This work originated in Rich’s postdoctoral research, during which he developed sequence analysis methods for the rational design of biologically active short peptides. He subsequently developed SLiMDisc, one of the first algorithms for successfully predicting novel SLiMs from sequence data, and coined the term “SLiM” into the bargain. This in turn led to the development of SLiMFinder, the first SLiM prediction algorithm able to estimate the statistical significance of motif predictions, which greatly increased their reliability. SLiMFinder has since spawned a number of motif discovery tools and webservers and is still arguably the most successful SLiM prediction tool on benchmarking data. Methods are made available through the SLiMSuite bioinformatics package and webservers.
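Attaching a probability to each motif prediction was SLiMFinder's key advance. As a much-simplified illustration of the general idea (not SLiMFinder's actual statistics, which are more involved), the Python sketch below asks how surprising it is to see a motif in k of n unrelated proteins: it estimates the chance of a match in one protein from assumed background residue frequencies, treats sequence windows and proteins as independent, and takes a binomial upper tail. All frequencies, lengths and counts in the example are hypothetical.

```python
from math import comb, prod


def site_probability(position_freqs: list[float]) -> float:
    """Chance that one sequence window matches the motif: the product, over
    constrained positions, of the background frequency of allowed residues."""
    return prod(position_freqs)


def protein_probability(p_site: float, protein_length: int, motif_length: int) -> float:
    """Chance of at least one match in a protein, assuming independent windows."""
    windows = max(protein_length - motif_length + 1, 0)
    return 1.0 - (1.0 - p_site) ** windows


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical background frequencies for a [RK]..P..P-style motif:
# R or K ~ 0.11 combined, P ~ 0.05 (illustrative values only).
p_site = site_probability([0.11, 0.05, 0.05])
p_prot = protein_probability(p_site, protein_length=300, motif_length=7)

# Motif seen in 6 of 10 unrelated proteins: how unlikely is that by chance?
p_value = binomial_tail(6, 10, p_prot)
print(f"per-protein p = {p_prot:.3f}, P(>=6 of 10) = {p_value:.2e}")
```

The independence assumptions are exactly what makes naive counts misleading on real data: related proteins share motifs by descent, which is why a realistic method must account for evolutionary relationships among the input sequences before counting occurrences.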

Current research is looking to develop these SLiM prediction tools further and apply them to important biological questions. Of particular interest is the molecular mimicry employed by viruses to interact with host proteins and the role of SLiMs in other diseases, such as cancer. Other work is concerned with the evolutionary dynamics of SLiMs within protein interaction networks.


In addition to its main research, the lab has a number of interdisciplinary collaborative projects applying bioinformatics tools and molecular evolution theory to experimental biology, often using large genomic, transcriptomic and/or proteomic datasets. These projects often involve the development of bespoke bioinformatics pipelines, and a number of open source bioinformatics tools have been generated as a result. Please see the Publications and Lab software pages for more detail, or get in touch if something catches your eye and you want to find out more. We frequently have small collaborations and/or undergraduate student research projects available. Many of these are “on hold” waiting for the right person, or sometimes data, to come along. If you think that you have what it takes, get in touch!

Tuesday, 10 December 2019

Edwards Lab at GIW/ABACBS2019

The Edwards Lab and close affiliates have five posters at GIW-ABACBS2019 this year. Come and check them out during the two poster sessions:

Poster #26: Comparative performance of long-read whole genome assembly tools in diploid eukaryotes

Åsa Pérez-Bercoff, Paris Thompson & Richard J. Edwards [PDF]

As long-read (single-molecule) sequencing from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) becomes more common, more long-read assemblers are emerging. Whilst a few benchmarking studies have been performed, there is no clear “best” assembly tool, and the choice is often driven by a combination of compute resource availability and anecdotal reports of relative performance in similar organisms. Here, we have de novo assembled PacBio long-read sequencing data from 13 Saccharomyces cerevisiae (baker’s yeast) strains (11 diploid, 1 haploid and 1 tetraploid) using four different assemblers: Canu v1.8, Flye v2.4.2, WTDBG2 (a.k.a. Redbean) v2.4 and Ra v20181211. Assembly performance statistics have been generated by comparing assemblies to the reference yeast genome (SGD R64.2.1) using QUAST v5.0.2, and rating each assembly for accuracy, coverage, and contiguity.

The latest generation of assembly tools uses a genome size parameter to decide how to process reads. For heterozygous diploid organisms, it is not clear whether this should be the haploid or diploid genome size, or something in between based on heterozygosity. In this study, we used three genome size settings: haploid (13.1 Mb), diploid (26.2 Mb) and halfway between (19.65 Mb). This will enable us to establish how the genome size parameter influences the quality of the resulting genome assembly.
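The three genome size settings are simple arithmetic on the haploid S. cerevisiae size. The Python sketch below expresses them through one hypothetical helper, `genome_size_setting`, where `fraction` runs from 0 (haploid) to 1 (diploid); the helper and its parameter are illustrative assumptions, and how each assembler actually consumes the value is not shown.

```python
def genome_size_setting(haploid_mb: float, fraction: float) -> float:
    """Genome size parameter (Mb) between the haploid (fraction=0.0)
    and diploid (fraction=1.0) size of the genome."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return haploid_mb * (1.0 + fraction)


HAPLOID_MB = 13.1  # haploid S. cerevisiae genome size used in the study

for label, fraction in [("haploid", 0.0), ("halfway", 0.5), ("diploid", 1.0)]:
    print(f"{label}: {genome_size_setting(HAPLOID_MB, fraction):.2f} Mb")
# haploid: 13.10 Mb
# halfway: 19.65 Mb
# diploid: 26.20 Mb
```

Parameterising by a fraction makes it easy to add further intermediate settings in a sweep, although any mapping from measured heterozygosity to that fraction would be a separate modelling choice.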

Poster #33: Estimating genome size using long read depth profiles and single copy regions of draft genome assemblies

Timothy G Amos, Ziying Zhang & Richard J Edwards [PDF]

Estimating the size of a eukaryote genome is a fundamental task in genome assembly. As well as informing decisions on sequencing technology and depth, greater accuracy in genome size prediction can assist in assessing the completeness and duplication of a genome assembly. Genome sizes can be estimated through both experimental and genome sequencing approaches. Experimental methods include densitometry, flow cytometry analysis of stained nuclei and quantitative PCR (qPCR) of single copy genes. Genome sequencing approaches include short-read k-mer distributions. However, these methods can give variable results, and are prone to inaccurate predictions for genomes with abundant repetitive sequences.

Here, we show that the depth of long reads mapped onto a draft genome assembly can be used to provide a relatively accurate estimate of genome size across various model organisms. In general, diploid genome assemblies will consist of regions of haploid, diploid and incorrect (or multi-copy organelle) depth. Repeats and assembly errors are likely to be highly variable in terms of genome vs assembly copy number and, therefore, coverage. Conserved single-copy orthologues, in contrast, are expected to be present once in the assembly and are rarely affected by these issues, so their modal read depth should approximate the sequencing depth of the input data. Dividing the total sequenced bases by this depth then gives an estimate of genome size. We found that our method can provide estimates comparable to, if not more accurate than, those from short-read k-mer distributions.
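The estimate described above reduces to dividing the total sequenced bases by the modal read depth over single-copy orthologue regions. The Python sketch below shows only that final step, assuming per-base depths have already been obtained (for example by mapping reads to the assembly and extracting depth over complete BUSCO genes); the depths and read total are invented toy values, and this is not the poster's actual implementation.

```python
from collections import Counter


def modal_depth(depths: list[int]) -> int:
    """Most common per-base read depth; ties go to the lower depth."""
    counts = Counter(depths)
    top = max(counts.values())
    return min(depth for depth, n in counts.items() if n == top)


def estimate_genome_size(total_read_bases: int, sco_depths: list[int]) -> float:
    """Genome size (bp) ~= total sequenced bases / modal single-copy depth."""
    depth = modal_depth(sco_depths)
    if depth <= 0:
        raise ValueError("modal depth must be positive")
    return total_read_bases / depth


# Toy per-base depths over single-copy orthologue regions. Outliers such as
# 60 (collapsed duplicate) or 15 (one haplotype only) do not move the mode.
depths = [28, 30, 30, 31, 30, 29, 60, 30, 15]
size_bp = estimate_genome_size(1_200_000_000, depths)
print(f"{size_bp / 1e6:.1f} Mb")  # 40.0 Mb
```

Using the mode rather than the mean is the point of the design: a few regions at half or double depth shift a mean noticeably but leave the most common depth unchanged.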

Poster #92: Using genomics to reveal drivers of invasion success

Katarina Stuart, Lee Ann Rollins, William Sherwin, Richard Edwards, Natalie Hofmeister & Yuanyuan Cheng [PDF]

Invasive species are a global concern due to their negative impacts on the economy and local ecosystems. However, well-documented invasions provide a useful system in which to pose biologically interesting questions regarding short time scale evolution. Answering these questions will further our knowledge of evolutionary mechanisms, as well as inform specific management strategies for the invasive population. The European starling (Sturnus vulgaris) is a global pest that was introduced into Australia’s south-eastern states in the 1860s and has since greatly expanded its range. Previous research on multiple introduced starling populations has demonstrated that their morphology has undergone subtle shifts following colonisation. My research applies a range of sequencing techniques to investigate genomic variation across Australia’s starling population. We are combining long-, short- and linked-read whole genome sequencing to assemble a high-quality starling reference genome. This genome will be annotated using Iso-seq long-read (PacBio) whole transcriptome sequencing in order to properly identify putatively evolving loci. Population sequencing data (whole genome and DArTSeq) will be mapped onto this reference to identify functional SNPs and reveal potential drivers of the rapid phenotypic divergences across the Australian population.

Poster #93: Advancing genomic resources for myrtle rust research and management

Stephanie Chen, Jason Bragg & Richard Edwards [PDF]

Myrtle rust is a plant disease caused by an invasive fungal pathogen (Austropuccinia psidii) first detected in Australia in 2010. Over 350 native species from the family Myrtaceae, which includes eucalypts, paperbarks, tea-trees, and lillipillies, are known hosts. Detailed genetic information needed for an effective and coordinated response encompassing conservation management is lacking. We performed de novo genome assembly and annotation for two species that exhibit a spectrum of resistance: Rhodamnia argentea (malletwood) and Syzygium oleosum (blue lilly pilly). Reference genomes have been generated using 10X linked reads, and are being supplemented with long reads and a hybrid assembly approach. These genomes will complement reduced representation sequencing (DArTseq) and rust resistance assays from different genotypes across the landscape. Together, these data will facilitate the characterisation of genetic structure and rust resistance across space as well as within and among species. This research is crucial for the management of at-risk species through optimising methods of improving disease resistance and adaptation to future climates, in addition to increasing our understanding of myrtle rust, which is a pressing concern for native biodiversity.

Poster #168: BUSCOMP: BUSCO Compilation and Comparison for Assessing Completeness in Multiple Genome Assemblies

Richard J. Edwards [PDF]

Advances in DNA sequencing technology and bioinformatics tools have placed de novo genome assembly of complex organisms firmly in the domain of individual labs and small consortia. Nevertheless, the assemblies produced are often fragmented and incomplete. Optimal assembly depends on the size, repeat landscape, ploidy and heterozygosity of the genome, which are often unknown. It is therefore common practice to try multiple strategies, and there is a bottleneck in assessing and comparing assemblies.

BUSCO [1] is a powerful and popular tool that estimates genome completeness using gene prediction and curated models of single-copy protein orthologues. However, results can be counterintuitive: adding/removing scaffolds can alter BUSCO predictions elsewhere in the assembly, while low sequence quality may reduce “completeness” scores and miss genes that are present in the assembly [2].

BUSCOMP (BUSCO Compilation and Comparison) complements BUSCO to identify/overcome these issues. BUSCOMP compiles a non-redundant set of the highest-scoring single-copy BUSCO complete sequences, rapidly searches these against assemblies, and robustly re-rates genes as Complete (Single/Duplicated), Fragmented/Partial or Missing. On test data from three organisms (yeast, cane toad and mainland tiger snake), BUSCOMP (1) gives consistent results when re-running the same assembly, (2) is not affected by adding or removing non-BUSCO-containing scaffolds, and (3) is minimally affected by assembly quality. This makes BUSCOMP ideal to run alongside BUSCO when trying to compare and rank genome assemblies, even in the absence of error-correction.
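To make the two BUSCOMP steps described above concrete, the Python sketch below first compiles, for each BUSCO ID, the highest-scoring single-copy Complete sequence across several assemblies, then re-rates a gene in one assembly from the coverage of its search hits. The data class, field names, example IDs and the 95% coverage cut-off are all hypothetical placeholders; BUSCOMP's real search and rating rules are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BuscoHit:
    assembly: str
    busco_id: str
    status: str   # BUSCO rating, e.g. "Complete", "Duplicated", "Fragmented"
    score: float


def compile_best(hits: list[BuscoHit]) -> dict[str, BuscoHit]:
    """Keep the highest-scoring single-copy Complete hit per BUSCO ID."""
    best: dict[str, BuscoHit] = {}
    for hit in hits:
        if hit.status != "Complete":
            continue
        current = best.get(hit.busco_id)
        if current is None or hit.score > current.score:
            best[hit.busco_id] = hit
    return best


def rate(hit_coverages: list[float], complete_cutoff: float = 0.95) -> str:
    """Re-rate one compiled gene in an assembly from the fraction of its
    sequence covered by each search hit (the cut-off is a placeholder)."""
    full = [c for c in hit_coverages if c >= complete_cutoff]
    if len(full) == 1:
        return "Complete (Single)"
    if len(full) > 1:
        return "Complete (Duplicated)"
    if any(c > 0 for c in hit_coverages):
        return "Fragmented/Partial"
    return "Missing"


hits = [
    BuscoHit("asm1", "gene_A", "Complete", 850.0),
    BuscoHit("asm2", "gene_A", "Complete", 910.0),
    BuscoHit("asm2", "gene_B", "Fragmented", 300.0),
]
compiled = compile_best(hits)
print({busco_id: h.assembly for busco_id, h in compiled.items()})  # {'gene_A': 'asm2'}
print(rate([0.99]), rate([0.99, 0.97]), rate([0.4]), rate([]))
# Complete (Single) Complete (Duplicated) Fragmented/Partial Missing
```

Because each compiled sequence is searched directly, adding or removing scaffolds that carry no compiled sequence cannot change any gene's rating in this sketch, which mirrors the consistency property described above.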

Available at: >>.

  1. Simão FA et al. (2015) Bioinformatics 31:3210–3212
  2. Edwards RJ et al. (2018) GigaScience 7:giy095

Tuesday, 15 October 2019

An intrinsically disordered proteins community for ELIXIR

Davey NE, Babu MM, Blackledge M, Bridge A, Capella-Gutierrez S, Dosztanyi Z, Drysdale R, Edwards RJ, Elofsson A, Felli IC, Gibson TJ, Gutmanas A, Hancock JM, Harrow J, Higgins D, Jeffries CM, Le Mercier P, Mészáros B, Necci M, Notredame C, Orchard S, Ouzounis CA, Pancsa R, Papaleo E, Pierattelli R, Piovesan D, Promponas VJ, Ruch P, Rustici G, Romero P, Sarntivijai S, Saunders G, Schuler B, Sharan M, Shields DC, Sussman JL, Tedds JA, Tompa P, Turewicz M, Vondrasek J, Vranken WF, Wallace BA, Wichapong K & Tosatto SCE. (2019) An intrinsically disordered proteins community for ELIXIR. F1000Res. 8(ELIXIR):1753. [F1000Research] [PubMed]


Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled “An intrinsically disordered protein user community proposal for ELIXIR” held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.

PMID: 31824649.

Tuesday, 9 July 2019

We’ve been funded! ARC Linkage - Optimising plant populations for ecological restoration and resilience

We are very happy to report another successful ARC Linkage grant application.

LP180100721: Optimising plant populations for ecological restoration and resilience

Dr Richard Edwards; Professor Justin Borevitz; Dr Jason Bragg; Dr Maurizio Rossetto; Dr Brett Summerell; Dr Marlien van der Merwe

When choosing individual plants for restoration populations, there is potentially a trade-off between maximising genetic diversity (‘adaptability’) and selection for desirable properties (‘adaptation’). This project aims to develop pioneering methods to quantify this trade-off, and facilitate the design of optimised populations, with a focus on two Australian rainforest trees that are being impacted by myrtle rust infection: Rhodamnia argentea and Rhodamnia rubescens. By studying the genetic variation in each species, and how this relates to myrtle rust resistance and climate, this project aims to design populations that are genetically diverse, maximally resistant to myrtle rust, and adapted to future climate.

We are collaborating with the Royal Botanic Gardens and Domain Trust to apply genomics to the conservation of rainforest trees in the face of climate change and invasive pathogens.

There will be job and studentship opportunities associated with this grant, so watch this space (or get in touch)!