Wednesday, 25 November 2020

#ABACBS2020: Unsupervised orthologous gene tree enrichment for cost-effective phylogenomic analysis and a test case on waratahs (Telopea spp.)

Stephanie Chen, Maurizio Rossetto, Marlien van der Merwe, Hervé Sauquet, Patricia Lu-Irving, Jia-Yee Yap, William Studley, Greg Bourke, Jason Bragg, Richard J. Edwards


Whole-genome shotgun sequencing is becoming increasingly common in phylogenetic research as its cost falls relative to traditional methods that target subsets of genomes. However, few existing packages can assemble putatively orthologous loci from evolutionarily divergent samples and produce alignments from these data for phylogenetic analysis. Additionally, although short-read Illumina sequencing data are highly accurate, it can be difficult to draw meaningful phylogenomic inferences from them at low coverage, especially for non-model organisms that lack a reference genome.

We have developed a scalable method for rapidly generating species trees from short-read data without the need for a reference genome. The workflow involves: (1) de novo genome assembly with ABySS at a range of k values; (2) extracting the most complete BUSCO (Benchmarking Universal Single-Copy Orthologs) genes from each set of assemblies with the BUSCO Compiler and Comparison tool (BUSCOMP); (3) generating gene trees; and (4) constructing a species tree.
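Step (2) is where the assemblies from different k values are reconciled. As a rough illustration only (this is not BUSCOMP's actual implementation, and the input structure is a hypothetical stand-in for parsed BUSCO results), the sketch below picks, for each BUSCO gene, the assembly holding its best-rated copy. It assumes a ranking of Complete over Duplicated over Fragmented and breaks ties by the higher BUSCO score:

```python
from typing import Dict, Tuple

# Assumed ranking of BUSCO statuses, best first (an illustrative choice).
STATUS_RANK = {"Complete": 0, "Duplicated": 1, "Fragmented": 2}

# Hypothetical input: assembly label -> {BUSCO gene ID -> (status, score)}
Results = Dict[str, Dict[str, Tuple[str, float]]]


def best_assembly_per_gene(results: Results) -> Dict[str, str]:
    """Return, for each BUSCO gene, the assembly holding its best copy.

    Copies are compared by status rank first, then by higher score;
    remaining ties go to the alphabetically first assembly label.
    Genes reported as Missing in every assembly are omitted.
    """
    best: Dict[str, Tuple[int, float, str]] = {}
    for assembly, genes in results.items():
        for busco_id, (status, score) in genes.items():
            if status not in STATUS_RANK:  # skip Missing (or unknown) entries
                continue
            key = (STATUS_RANK[status], -score, assembly)
            if busco_id not in best or key < best[busco_id]:
                best[busco_id] = key
    return {busco_id: key[2] for busco_id, key in best.items()}


# Toy example with made-up results from three k values
results = {
    "k31": {"g1": ("Fragmented", 120.0), "g2": ("Complete", 320.0)},
    "k61": {"g1": ("Complete", 250.0), "g2": ("Complete", 310.0)},
    "k91": {"g1": ("Missing", 0.0), "g2": ("Duplicated", 400.0), "g3": ("Missing", 0.0)},
}
print(best_assembly_per_gene(results))  # → {'g1': 'k61', 'g2': 'k31'}
```

The selected copies would then feed the alignment, gene tree and species tree steps; a gene missing from every assembly (g3 here) simply drops out.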

The workflow has been applied to a whole-genome shotgun sequencing dataset of waratahs (Telopea spp.) comprising two samples from each of seven lineages across five species; three of these lineages (coastal, upland, and southern) belong to T. speciosissima (New South Wales waratah). We have also generated a reference genome for T. speciosissima and use it to examine the robustness of the workflow by comparison with a reference-based approach. We anticipate that the workflow will maximise the recovery of informative data from genomic datasets for reproducible phylogenomic studies and will be especially useful for non-model organisms.

#ABACBS2020: Whole transcripts in genome assembly, annotation, and assessment: the draft genome assembly of the globally invasive common starling, Sturnus vulgaris

Katarina Stuart, Yuanyuan Cheng, Lee Rollins & Richard J. Edwards


Native to the Palearctic, the common starling (Sturnus vulgaris) is a near-globally invasive passerine that has now colonised every continent except Antarctica. Ecological interest in the species is two-fold: starlings are considered a conservation risk and crop pest within their invasive ranges, while recent decades have brought a worrying decline in starling numbers within their historical native range. Despite the global interest in this species, there are still fundamental gaps in our understanding of its genetics and of population differences across its native and invasive ranges. We present the Australian S. vulgaris draft genome and transcriptome as a reference for further investigation into the evolutionary characterisation of this ecologically significant species. An initial 10x Genomics linked-read assembly was scaffolded and gap-filled with low-coverage nanopore sequencing, complemented by PacBio Isoseq full-length transcript data. Isoseq data were incorporated into assembly scaffolding, annotation, and assembly assessment to inform workflow decisions. We produced a draft assembly with a scaffold N50 of 72.5 Mb and assessed it alongside a North American S. vulgaris draft genome previously assembled from Illumina data. Lastly, we used these different reference genomes, alongside a non-scaffolded version of the Australian S. vulgaris genome, to assess how the choice of reference genome affects common downstream population genetic analyses of a global whole-genome resequencing dataset.

Tuesday, 24 November 2020

#ABACBS2020: The role of gene duplication in the evolution of snake venoms

Jack Clarke, Vicki Thomson & Richard Edwards


Snakes are among the most venomous animals on the planet, using their venom for defence and prey capture. Snake venoms have evolved independently of the venoms of other vertebrates, and their proteomic composition varies considerably between species. One of the primary mechanisms through which snake venoms are thought to evolve is the duplication, recruitment and specialisation of proteins from other tissues. In some gene families, this evolution is known to involve tandem gene duplication, resulting in chromosomal clusters of venom genes. We have recently sequenced and assembled the genomes of two highly venomous Australian snakes: Notechis scutatus (mainland tiger snake) and Pseudonaja textilis (eastern brown snake). Together with publicly available proteomes from 10 other venomous snakes and 2 non-venomous snakes, these genomes provide an excellent opportunity to examine the role that duplication and neofunctionalisation have played in snake venom evolution.

We have analysed 43 protein families known to play a role in snake venom and examined their patterns of duplication in snakes, compared with high-quality reference genomes of other reptiles and non-venomous vertebrates. We find evidence for extensive duplication in some of these families, but no clear enrichment for duplication associated with the evolution of venom specifically. Instead, we identify a trend whereby numerous venomous snake-specific duplications occur in proteins that seem predisposed to evolve by duplication and specialisation, even in non-venomous vertebrates. A subset of high-quality snake genomes was then used to further explore the nature of these duplications. While tandem gene duplication is evident in some larger families, it is absent in many others.
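One simple way to flag candidate tandem arrays, shown here as a hypothetical sketch rather than the method used in this study, is to sort a family's members by genomic position and group those lying within a chosen distance of their neighbour on the same chromosome or scaffold. The gene IDs, coordinates and the 100 kb max_gap default are illustrative assumptions, and the sketch ignores strand and any intervening non-family genes:

```python
from itertools import groupby
from typing import List, Optional, Tuple

Gene = Tuple[str, str, int]  # (gene ID, chromosome/scaffold, start coordinate)


def tandem_clusters(genes: List[Gene], max_gap: int = 100_000) -> List[List[str]]:
    """Group family members whose start positions lie within max_gap bp of
    the previous member on the same sequence.

    Only clusters of two or more genes (candidate tandem arrays) are returned.
    """
    clusters: List[List[str]] = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))  # by sequence, then start
    for _, members in groupby(ordered, key=lambda g: g[1]):
        current: List[str] = []
        last_start: Optional[int] = None
        for gene_id, _, start in members:
            # A large gap closes the current cluster and starts a new one.
            if last_start is not None and start - last_start > max_gap:
                if len(current) > 1:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            last_start = start
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Hypothetical SVMP-like family members with made-up coordinates
genes = [
    ("svmp1", "chr5", 1_000_000),
    ("svmp2", "chr5", 1_040_000),
    ("svmp3", "chr5", 1_090_000),
    ("svmp4", "chr5", 5_000_000),
    ("svmp5", "chr2", 200_000),
]
print(tandem_clusters(genes))  # → [['svmp1', 'svmp2', 'svmp3']]
```

The choice of max_gap strongly affects the result, and fragmented assemblies can split a genuine array across scaffolds, which is one reason assembly quality matters for detecting tandem duplication.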

The snake venom metalloproteinase (SVMP) family provides an excellent case study, with multiple duplication events throughout its evolutionary history in vertebrates. Part of the broader ADAM (“a disintegrin and metalloproteinase”) family of single-pass transmembrane and secreted zinc proteases, SVMP appears to have expanded by independent tandem duplications in different snake lineages. We also identify a second ADAM subfamily, ADAM20, with an abundance of venomous snake-specific duplications. Ongoing work is exploring the possible role of ADAM20 proteins in snake venoms, and the role that genome assembly quality has played in our ability to robustly detect the presence or absence of gene duplication events.

Friday, 4 September 2020

We are recruiting! Two-year postdoc available in plant conservation genomics

We have a two-year, full-time postdoc position available to conduct bioinformatics, laboratory and field research in conservation genomics as part of ARC Linkage Project LP180100721, and to assist in the supervision of honours and postgraduate research students as required. This is a collaborative project between the University of New South Wales, the Royal Botanic Gardens Sydney and the Australian National University to develop approaches for conserving plant species threatened by a fungal pathogen (Austropuccinia psidii, the cause of myrtle rust). Techniques involved include, but are not limited to, field collection of plant material, DNA extraction, plant growth experiments, and running software for bioinformatic and statistical analyses.

More details can be found on the UNSW jobs site. Informal enquiries are welcome if you would like to know more about the project.

Tuesday, 23 June 2020

Do epigenetic changes drive corticosterone responses to alarm cues in larvae of an invasive amphibian?

Our latest cane toad paper is now online at Integrative and Comparative Biology, using our draft cane toad genome as the reference for differential methylation analysis. (Thanks to coronavirus delays, the updated cane toad genome was not quite ready for this analysis but watch this space!)

Sarma RR, Edwards RJ, Crino OL, Eyck HJF, Waters PD, Crossland MR, Shine R & Rollins LA (accepted): Do epigenetic changes drive corticosterone responses to alarm cues in larvae of an invasive amphibian? Integrative and Comparative Biology icaa082


The developmental environment can exert powerful effects on animal phenotype. Recently epigenetic modifications have emerged as one mechanism that can modulate developmentally plastic responses to environmental variability. For example, the DNA methylation profile at promoters of hormone receptor genes can affect their expression and patterns of hormone release. Across taxonomic groups, epigenetic alterations have been linked to changes in glucocorticoid (GC) physiology. GCs are metabolic hormones that influence growth, development, transitions between life-history stages, and thus fitness. To date, relatively few studies have examined epigenetic effects on phenotypic traits in wild animals, especially in amphibians. Here, we examined the effects of exposure to predation threat and experimentally manipulated DNA methylation on corticosterone (CORT) levels in tadpoles and metamorphs of the invasive cane toad (Rhinella marina). We included offspring of toads sampled from populations across the species’ Australian range. In these animals, exposure to chemical cues from injured conspecifics induces shifts in developmental trajectories, putatively as an adaptive response that lessens vulnerability to predation. We exposed tadpoles to these alarm cues, and measured changes in DNA methylation and CORT levels, both of which are mechanisms that have been implicated in the control of phenotypically plastic responses in tadpoles. To test the idea that DNA methylation drives shifts in GC physiology, we also experimentally manipulated methylation levels with the drug zebularine. We found differentially methylated regions between control tadpoles and their full-siblings exposed to alarm cues, zebularine or both treatments. However, the effects of these manipulations on methylation patterns were weaker than clutch (e.g. genetic, maternal, etc.) effects. CORT levels were higher in larval cane toads exposed to alarm cues and zebularine. 
We found little evidence of changes in DNA methylation across the glucocorticoid receptor gene (NR3C1) promoter region in response to alarm cue or zebularine exposure. In both alarm cue and zebularine-exposed individuals, we found differentially methylated DNA in the suppressor of cytokine signaling 3 gene (SOCS3), which may be involved in predator avoidance behavior. In total, our data reveal that alarm cues have significant impacts on tadpole physiology, but show only weak links between DNA methylation and CORT levels. We also identify genes containing differentially methylated regions in tadpoles exposed to alarm cues and zebularine, particularly in range-edge populations, that warrant further investigation.

Thursday, 2 April 2020

Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C

Our latest paper is out! This one is a bit more photogenic than the cane toad: a German Shepherd Dog called Nala. This was a big international effort in a collaboration led by Bill Ballard at UNSW that included a dozen institutions across four continents. We threw all the main sequencing technologies at this one and achieved a chromosome-level assembly of better quality than the current “CanFam” reference genome.

You can find out more in the UNSW press release.

Field MA, Rosen BD, Dudchenko O, Chan EKF, Minoche AM, Edwards RJ, Barton K, Lyons RJ, Enosi Tuipulotu D, Hayes VM, Omer AD, Colaric Z, Keilwagen J, Skvortsova K, Bogdanovic O, Smith MA, Lieberman Aiden E, Smith TPL, Zammit RA & Ballard JWO (2020): Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience 9(4):giaa027.



The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, as fully trained animals are then unable to continue their duties.


Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) using a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome, CanFam v3.1 (contig N50 of 20.9 vs 0.267 Mb), and contains far fewer gaps (306 vs 23,876) and scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are each assembled into a single scaffold with no gaps. BUSCO analysis of the genome assembly shows that 93.0% of the conserved single-copy genes are complete in the GSD assembly, compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies (giving 8) and the disruption of 1 copy.
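Contig and scaffold N50 values like those quoted above summarise assembly contiguity. A minimal Python sketch of the standard calculation (sort sequence lengths from longest to shortest and report the length at which the running total first reaches half the total assembly size) is below; the example lengths are made up for illustration:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths supplied")


print(n50([100, 80, 50, 30, 20, 10]))  # → 80
print(round(20.9 / 0.267, 1))          # → 78.3
```

Applying the same ratio to the reported contig N50 values gives 20.9 / 0.267 ≈ 78, which is the basis of the ∼80-fold contiguity figure.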


GSD genome assembly and annotation were produced with major improvements in completeness, contiguity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.

Photo credit: Outdoor Action Photography.