Wednesday 25 November 2020

#ABACBS2020: Unsupervised orthologous gene tree enrichment for cost-effective phylogenomic analysis and a test case on waratahs (Telopea spp.)

Stephanie Chen, Maurizio Rossetto, Marlien van der Merwe, Hervé Sauquet, Patricia Lu-Irving, Jia-Yee Yap, William Studley, Greg Bourke, Jason Bragg, Richard J. Edwards


Whole-genome shotgun sequencing is becoming increasingly common in phylogenetic research due to the falling cost of whole-genome sequencing compared to traditional methods that target subsets of genomes. However, there are few existing packages for assembling putatively orthologous loci from evolutionarily diverged samples and making alignments for phylogenetic analysis from these data. Additionally, although short-read Illumina sequencing data are highly accurate, at low coverage it can be difficult to draw meaningful phylogenomic inferences from them, especially for non-model organisms for which no reference genome is available.

We have developed a scalable method of rapidly generating species trees from short-read data without the need for a reference genome. The workflow involves (1) de novo genome assembly with ABySS at a range of k values, (2) extracting the most complete BUSCO (Benchmarking Universal Single-Copy Orthologs) genes from each set of assemblies with the BUSCO Compiler and Comparison tool (BUSCOMP), (3) generating gene trees, and (4) constructing a species tree.
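As a rough illustration of step (2), the sketch below picks, for each BUSCO gene, the best hit across several assemblies of one sample. It is not BUSCOMP's implementation: the `BuscoHit` record, the `pick_best_hits` function, the assembly labels, and the ranking rule (status first, then score, then length) are all assumptions made for this example.

```python
from typing import Dict, NamedTuple, Tuple


class BuscoHit(NamedTuple):
    """Hypothetical summary of one BUSCO gene in one assembly."""
    status: str   # one of the standard BUSCO ratings
    score: float  # hit score reported for the gene
    length: int   # length of the hit


# Assumed preference order: a complete single copy beats everything else.
STATUS_RANK = {"Complete": 3, "Duplicated": 2, "Fragmented": 1, "Missing": 0}


def pick_best_hits(
    assemblies: Dict[str, Dict[str, BuscoHit]]
) -> Dict[str, Tuple[str, BuscoHit]]:
    """Return, per BUSCO gene, the (assembly name, hit) with the best rank."""
    best = {}
    for assembly_name, hits in assemblies.items():
        for busco_id, hit in hits.items():
            key = (STATUS_RANK[hit.status], hit.score, hit.length)
            current = best.get(busco_id)
            # Keep the first assembly seen when two hits rank equally.
            if current is None or key > current[0]:
                best[busco_id] = (key, assembly_name, hit)
    return {
        busco_id: (assembly_name, hit)
        for busco_id, (_, assembly_name, hit) in best.items()
    }


if __name__ == "__main__":
    # Made-up values for two assemblies of the same sample at different k.
    example = {
        "k41": {
            "gene_A": BuscoHit("Fragmented", 300.0, 400),
            "gene_B": BuscoHit("Complete", 800.0, 1200),
        },
        "k61": {
            "gene_A": BuscoHit("Complete", 650.0, 900),
            "gene_B": BuscoHit("Complete", 750.0, 1150),
        },
    }
    for busco_id, (assembly_name, hit) in sorted(pick_best_hits(example).items()):
        print(busco_id, assembly_name, hit.status)
```

The point of compiling across k values is that no single assembly needs to be best for every gene: here the made-up `gene_A` is taken from one assembly and `gene_B` from the other.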

The workflow has been applied to a whole-genome shotgun sequencing waratah (Telopea spp.) dataset of five species, comprising two samples from each of seven lineages; there are three lineages of T. speciosissima (New South Wales waratah): coastal, upland, and southern. We have also generated a reference genome for T. speciosissima, and examine the robustness of the workflow by comparison to a reference-based approach. It is anticipated that the workflow will maximise the recovery of informative data from genomic datasets for reproducible phylogenomic studies and be especially useful for non-model organisms.

#ABACBS2020: Whole transcripts in genome assembly, annotation, and assessment: the draft genome assembly of the globally invasive common starling, Sturnus vulgaris

Katarina Stuart, Yuanyuan Cheng, Lee Rollins & Richard J. Edwards


Native to the Palearctic, the common starling (Sturnus vulgaris) is a near-globally invasive passerine that has now colonised every continent barring Antarctica. Ecological interest in the species is two-fold: it is considered a conservation risk and crop pest within its invasive ranges, while recent decades have brought a worrying decline in starling numbers within its historical native ranges. Despite the global interest in this species, there are still fundamental gaps in our understanding of its genetics and population differences across its native and invasive ranges. We present the Australian S. vulgaris draft genome and transcriptome to be used as a reference for further investigation into the evolutionary characterisation of this ecologically significant species. An initial 10x Genomics linked-read assembly was scaffolded and gap-filled with low-coverage nanopore sequencing, complemented by PacBio Isoseq full-length transcript data. Isoseq data were incorporated into assembly scaffolding, annotation, and assembly assessment to inform workflow decisions. We produced a draft assembly with a scaffold N50 of 72.5 Mb, and assess this alongside a North American S. vulgaris draft genome previously assembled from Illumina data. Lastly, we use these different reference genomes, alongside a non-scaffolded version of the Australian S. vulgaris genome, to assess how the choice of reference genome affects common downstream population genetic analyses using a global whole-genome resequencing data set.
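For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is computed from a list of scaffold lengths. The function name `n50` and the example lengths are illustrative only and are not taken from the starling assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one scaffold length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that takes the running sum to half the total.
        if running * 2 >= total:
            return length
    raise ValueError("scaffold lengths must be positive")


if __name__ == "__main__":
    # Made-up scaffold lengths (in Mb) purely for illustration.
    print(n50([80, 70, 50, 40, 30, 20]))
```

With these made-up lengths the total is 290, and the two longest scaffolds (80 + 70 = 150) are the first to cover half of it, so the N50 is 70.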

Tuesday 24 November 2020

#ABACBS2020: The role of gene duplication in the evolution of snake venoms

Jack Clarke, Vicki Thomson & Richard Edwards


Snakes are among the most venomous animals on the planet, using their venom for defence and for capturing prey. Snake venoms have evolved independently of venoms in other vertebrates, and there is considerable variation between species in their proteomic composition. One of the primary mechanisms through which snake venoms are thought to evolve is the duplication, recruitment and specialisation of proteins from other tissues. In some cases, this evolution is known to involve the tandem duplication of genes, resulting in chromosomal clusters of venom genes in some gene families. We have recently sequenced and assembled the genomes of two highly venomous Australian snakes: Notechis scutatus (mainland tiger snake) and Pseudonaja textilis (eastern brown snake). In conjunction with publicly available proteomes from 10 other venomous snakes and 2 non-venomous snakes, these genomes provide an excellent opportunity to examine the role that duplication and neofunctionalisation have played in snake venom evolution.

We have analysed 43 protein families known to play a role in snake venom and examined their pattern of duplication in snakes, compared to high quality reference genomes of other reptiles and non-venomous vertebrates. We find evidence for extensive duplications across some of these families, but no clear enrichment for duplication in the evolution of venom specifically. Instead, we identify a trend where numerous duplications specific to venomous snakes occur in proteins that seem predisposed to evolve by duplication and specialisation, even in non-venomous vertebrates. A subset of high-quality snake genomes was then used to further explore the nature of duplications. While tandem gene duplication is evident in some larger families, it remains absent in many.

The snake venom metalloproteinase (SVMP) family provides an excellent case study, with multiple duplication events throughout its evolutionary history in vertebrates. Part of the broader ADAM (“a disintegrin and metalloproteinase”) family of single-pass transmembrane and secreted zinc proteases, SVMP appears to have expanded by independent tandem duplications in different snake lineages. We also identify a second ADAM subfamily, ADAM20, with an abundance of venomous snake-specific duplications. Ongoing work is exploring the possible role of ADAM20 proteins in snake venoms and the role that genome assembly quality has played in our ability to robustly detect the presence or absence of gene duplication events.

Monday 2 November 2020

Antarctic desert soil bacteria exhibit high novel natural product potential, evaluated through long-read genome sequencing and comparative genomics

The third of our collaborative “controlled bacterial metagenome” de novo whole genome assembly projects was published in Environmental Microbiology in November. This was a fun collaboration with the Ferrari lab at UNSW, trying to maximise bang for buck by using PacBio sequencing to assemble complete bacterial genomes and identify biosynthetic gene clusters. As with a previous paper, we used pooled genomic DNA sequencing and were able to assemble complete genomes (and plasmids) for the 13 of 17 species that had sufficient depth of coverage. Coolest of all (if you'll excuse the pun), these were bugs from an Antarctic expedition! Head over to the Ferrari lab website to find out more about their research.

Benaud N, Edwards RJ, Amos TG, D’Agostino PM, Gutiérrez-Chávez C, Montgomery K, Nicetic I & Ferrari BC (2020). Antarctic desert soil bacteria exhibit high novel natural product potential, evaluated through long-read genome sequencing and comparative genomics. Environmental Microbiology.


Actinobacteria and Proteobacteria are important producers of bioactive natural products (NP), and these phyla dominate in the arid soils of Antarctica, where metabolic adaptations influence survival under harsh conditions. Biosynthetic gene clusters (BGCs), which encode NPs, are typically long and repetitious high G + C regions that are difficult to sequence with short‐read technologies. We sequenced 17 Antarctic soil bacteria from multi‐genome libraries, employing the long‐read PacBio platform, to optimize capture of BGCs and to facilitate a comprehensive analysis of their NP capacity. We report 13 complete bacterial genomes of high quality and contiguity, representing 10 different cold‐adapted genera including novel species. Antarctic BGCs exhibited low similarity to known compound BGCs (av. 31%), with an abundance of terpene, non‐ribosomal peptide and polyketide‐encoding clusters. Comparative genome analysis was used to map BGC variation between closely related strains from geographically distant environments. Results showed the greatest biosynthetic differences to be in a psychrotolerant Streptomyces strain, as well as a rare Actinobacteria genus, Kribbella, while two other Streptomyces spp. were surprisingly similar to known genomes. Streptomyces and Kribbella BGCs were predicted to encode antitumour, antifungal, antibacterial and biosurfactant‐like compounds, and the synthesis of NPs with antibacterial, antifungal and surfactant properties was confirmed through bioactivity assays.