Friday, 14 January 2022

The Waratah genome paper is out!

The final version of the waratah genome paper is now out in Molecular Ecology Resources. This was a fun collaboration with the Royal Botanic Gardens and Domain Trust as one of the pilot genomes for BioPlatforms Australia’s Genomics for Australian Plants (GAP) initiative.

You can read the press release here, or our piece in The Conversation, “We’ve unveiled the waratah’s genetic secrets, helping preserve this Australian icon for the future”.

In this paper, we present a chromosome-level assembly for the NSW State Floral Emblem, the New South Wales waratah, Telopea speciosissima. This joins macadamia as the 2nd reference genome for the Proteaceae family & should help future studies for the remaining ca. 1700 species.

The genome was assembled from an ONT chassis, scaffolded with 10x Genomics linked reads and Phase Genomics Hi-C - made possible thanks to quality data from AGRF and the Ramaciotti Centre for Genomics. The final assembly was chromosome-level, with 94.1% on the 11 chromosomes (2n = 22).

As well as the assembly itself, the paper presents three genomics tools that we hope will be helpful for other assemblies:

1. DepthSizer uses long-read depths and BUSCO predictions to estimate genome size. We estimated the waratah genome to be ca. 900 Mbp - bigger than kmer estimates, but smaller than flow cytometry of Tasmanian waratah.

2. Diploidocus builds on Purge Haplotigs, combining read depths, kmer frequencies & BUSCO predictions to classify and curate/filter assembly scaffolds. This decreases false duplications & contamination, and flags collapsed repeats for closer inspection.

3. DepthKopy uses BUSCO Complete genes to establish sequencing depth (like DepthSizer) and then estimates copy number for regions (e.g. genes), scaffolds & sliding windows of the assembly. This showed that most “Duplicated” BUSCOs are real duplicates.
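The depth-based idea behind the genome size estimate can be sketched in a few lines. This is only an illustration of the general principle (total sequenced bases divided by single-copy read depth), not DepthSizer's actual code; the function name and the numbers are hypothetical, picked so that the arithmetic lands on a 900 Mbp genome.

```python
def estimate_genome_size(total_read_bases: int, single_copy_depth: float) -> float:
    """Rough genome size in bp: total sequenced bases / single-copy read depth."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return total_read_bases / single_copy_depth


# Hypothetical inputs, chosen only to illustrate the arithmetic.
total_bases = 45_000_000_000  # 45 Gbp of long reads
sc_depth = 50.0               # read depth across single-copy BUSCO genes

size_bp = estimate_genome_size(total_bases, sc_depth)
print(f"{size_bp / 1e6:.0f} Mbp")  # prints "900 Mbp"
```

In practice the hard part is getting a trustworthy single-copy depth, which is where the BUSCO Complete genes come in.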


Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap JS, Sauquet H, Bourke G, Amos TG, Bragg JG & Edwards RJ (accepted): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. Molecular Ecology Resources.
[Mol Ecol Res] [bioRxiv]

Abstract

Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8% of Embryophyta BUSCOs “Complete”. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome. Read depth analysis of 180 “Duplicated” BUSCO genes using a new tool, DepthKopy (https://github.com/slimsuite/depthkopy), suggests almost all are real duplications, increasing confidence in the annotation and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.

If you want a read and don’t have access, please get in touch or check out the bioRxiv preprint.

Friday, 3 December 2021

Limited Introgression between Rock-Wallabies with Extensive Chromosomal Rearrangements

When we were assembling the 10x Genomics linked read assemblies of several rock wallabies, one of them broke the lab record for Supernova assembly quality.

10x linked reads are a bit of a dead technology now, but genomes made with them are still useful and generating insights. One example is the first rock wallaby we assembled, Petrogale penicillata. We were a (very small) part of a team that used the draft 10x genome assembly as a reference to investigate chromosomal rearrangements. The paper is published online today in Molecular Biology and Evolution, one of my favourite journals.


Potter P, Bragg JG, Turakulov R, Eldridge MDB, Deakin J, Kirkpatrick M, Edwards RJ & Moritz C (accepted): Limited introgression between rock-wallabies with extensive chromosomal rearrangements. Molecular Biology and Evolution msab333 https://doi.org/10.1093/molbev/msab333

Abstract

Chromosome rearrangements can result in the rapid evolution of hybrid incompatibilities. Robertsonian fusions, particularly those with monobrachial homology, can drive reproductive isolation amongst recently diverged taxa. The recent radiation of rock-wallabies (genus Petrogale) is an important model to explore the role of Robertsonian fusions in speciation. Here, we pursue that goal using an extensive sampling of populations and genomes of Petrogale from north-eastern Australia. In contrast to previous assessments using mitochondrial DNA or nuclear microsatellite loci, genomic data are able to separate the most closely related species and to resolve their divergence histories. Both phylogenetic and population genetic analyses indicate introgression between two species that differ by a single Robertsonian fusion. Based on the available data, there is also evidence for introgression between two species which share complex chromosomal rearrangements. However, the remaining results show no consistent signature of introgression amongst species pairs and where evident, indicate generally low introgression overall. X-linked loci have elevated divergence compared with autosomal loci indicating a potential role for genic evolution to produce reproductive isolation in concert with chromosome change. Our results highlight the value of genome scale data in evaluating the role of Robertsonian fusions and structural variation in divergence, speciation, and patterns of molecular evolution.

Thursday, 2 December 2021

EdwardsLab at Australasian Evolution Society #AUSEVO2021

Look out for some interesting talks by Edwards Lab members at this year’s Australasian Evolution Society 2021 conference, starting today:

Thursday 2nd December: 3-minute talks | 1130-1230

Kelton Cheung - Analysis of mitochondrial DNA reveals significant genetic diversity in invasive Australian cane toads

Kelton Cheung, Mark Richardson, Richard Edwards & Lee Ann Rollins

Mitochondrial DNA haplotype patterns across the native and invaded ranges can reveal the history and evolutionary trajectory of invasions. The invasion of Australia by the cane toad (Rhinella marina) has accelerated as it expanded westward. Despite this success, previous studies reported no mitochondrial genetic diversity in Australia. Here, we assembled a complete mitochondrial reference genome and haplotyped toads (N=119) from the native range and two introduced populations (Hawai’i and Australia), using whole genome and RNA-seq data. The complete R. marina mitochondrial genome consists of 18,152 base pairs with no significant gene rearrangement compared to other bufonid species. Although native range genetic diversity was much higher than that of the introduced ranges, we identified 29 haplotypes in Australia. While we did find evidence of founder effects following introduction, our results suggest there is significant genetic diversity within Australia, which may assist adaptation and invasion success in this species.


Thursday 2nd December: Selection 1 | 1415-1430

Katarina Stuart - A genetic perspective on rapid adaptation in the globally invasive European starling (Sturnus vulgaris)

Abstract coming soon…


Thursday 2nd December: Biogeography and phylogenetics | 1530-1545

Richard Edwards - DepthKopy: copy number prediction using single-copy long-read depth profiles

Richard J Edwards, Stephanie H Chen, Katarina C Stuart, Mark M Tanaka & Jason G Bragg

Gene duplication, followed by functional divergence of gene copies, is a fundamental component of genetic adaptation and the evolution of novelty. Increasingly, evolutionary studies make use of the ever-expanding number of high-quality genome assemblies to characterise patterns of gene gain and loss. However, even reference genomes of the highest quality can experience assembly errors at tandemly repeated gene loci, resulting in incorrect inference of gene duplication patterns. Here, we present DepthKopy (https://github.com/slimsuite/depthkopy), which estimates the copy number for a gene, region or sequence of interest, using an estimate of single-copy sequencing depth derived from complete BUSCO genes. This is useful for identifying haplotigs and collapsed repeat regions during genome assembly curation. Critically, for evolutionary studies of gene families, DepthKopy can identify genes for which the number of copies in the assembly does not seem to match the number in the genome.
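As a rough illustration of the copy number idea described in the abstract, the sketch below divides a region's read depth by the single-copy depth and flags regions that look under- or over-represented. It is a minimal sketch rather than DepthKopy's implementation: the function names, the cut-offs (0.75 and 1.5) and the depth values are all hypothetical.

```python
def copy_number(region_depth: float, single_copy_depth: float) -> float:
    """Estimated copy number: region read depth relative to single-copy depth."""
    return region_depth / single_copy_depth


def flag_region(cn: float, low: float = 0.75, high: float = 1.5) -> str:
    """Label a region using hypothetical, illustrative cut-offs."""
    if cn < low:
        return "possible haplotig / false duplication"
    if cn > high:
        return "possible collapsed repeat"
    return "consistent with single copy"


# Hypothetical mean read depths for three assembly regions.
single_copy_depth = 50.0
regions = {"geneA": 49.0, "geneB": 102.0, "scaffold42": 24.0}

for name, depth in regions.items():
    cn = copy_number(depth, single_copy_depth)
    print(f"{name}: CN={cn:.2f} ({flag_region(cn)})")
```

A gene annotated once in the assembly but with a copy number near 2 would be a candidate for a collapsed duplicate, which is the kind of assembly-versus-genome mismatch the talk is about.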


Friday 3rd December: Climate change and temperature | 1330-1345

Collin Ahrens - Genomic constraints of drought adaptation

Abstract coming soon…

Wednesday, 24 November 2021

#ABACBS2021 Lightning talk - DepthSizer and DepthKopy: genome size and copy number prediction using single-copy long-read depth profiles

Tune in for the ABACBS 2021 lightning talks this morning to hear about our applications of long reads to genome size and copy number prediction. For more information, you can read the NSW Waratah genome paper pre-print or visit the GitHub pages for DepthSizer and DepthKopy.

Richard J Edwards, Stephanie H Chen, Katarina C Stuart, Mark M Tanaka, Jason G Bragg.

DepthSizer and DepthKopy: genome size and copy number prediction using single-copy long-read depth profiles

A fundamental part of any genome project is establishing the genome size of the organism being sequenced. The gold standard for genome size measurement is flow cytometry, but this is not available to all groups and can give surprisingly variable results. Popular bioinformatic approaches predict genome size using kmer frequency profiles from high-accuracy (e.g. Illumina or HiFi) sequencing reads, or the mean depth of coverage of reads mapped to an assembly. Both of these approaches can be adversely affected by repetitive regions of the genome. Mean sequencing depth is also highly reliant on assembly completeness.

Here, we present DepthSizer (https://github.com/slimsuite/depthsizer), which refines this approach by estimating sequencing depth based on single-copy complete BUSCO genes. DepthSizer works on the principle that genuine single-copy regions will tend towards the same, true, single-copy read depth. In contrast, assembly errors, collapsed repeats within those genes, or incorrect BUSCO predictions, will give inconsistent read depth deviations. The modal read depth across single-copy BUSCO genes, calculated from a depth density profile of these regions, should therefore provide a good estimate of the true depth of coverage. The method is benchmarked on model organism data and corrections for possible contamination, biases/inconsistencies in read mapping and/or raw read insertion/deletion error profiles are discussed. We also present DepthKopy (https://github.com/slimsuite/depthkopy), which uses the same read depth approach to estimate the copy number of assembly regions. This can be useful for identifying haplotigs, and collapsed repeat regions.

Keywords: BUSCO, Genome Assembly, Genomics, ONT, PacBio, copy number variants
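To show why the abstract favours a modal depth over a mean, here is a small sketch that takes per-gene depths, smooths the counts with a simple sliding window, and reports the most common depth. The smoothing is a crude stand-in for a proper density profile and is an assumption for illustration only, not how DepthSizer computes its mode; the depth values are made up.

```python
from collections import Counter
from statistics import mean


def modal_depth(depths, smooth=1):
    """Most common integer depth after summing counts over a +/- `smooth` window."""
    counts = Counter(int(round(d)) for d in depths)
    lo, hi = min(counts), max(counts)
    best_depth, best_score = lo, -1
    for d in range(lo, hi + 1):
        score = sum(counts.get(d + k, 0) for k in range(-smooth, smooth + 1))
        if score > best_score:  # first depth with the highest score wins ties
            best_depth, best_score = d, score
    return best_depth


# Hypothetical depths for "single-copy" BUSCO genes: most sit near 50X, but a
# few are skewed by collapsed repeats (100+) or a wrong prediction (25).
busco_depths = [48, 49, 50, 50, 51, 50, 52, 49, 50, 51, 100, 101, 25, 150]

print(f"mean depth: {mean(busco_depths):.1f}")  # pulled upwards by the outliers
print(f"modal depth: {modal_depth(busco_depths)}")  # prints "modal depth: 50"
```

The handful of inconsistent genes drags the mean well above 50X, while the mode stays on the depth that most single-copy genes agree on.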

Monday, 11 October 2021

SLiMSuite Short Linear Motif and Genomics Analysis Tools: BUSCOMP v0.13.0 (MetaEuk) release

BUSCOMP v0.13.0 is now on GitHub. This release features updates to parse additional BUSCO v5 outputs, including transcriptome and proteome modes. It has also been updated to be compatible with MetaEuk runs by generating the missing *.fna files where possible.

Congratulations to the Edwards Lab #GSAA21 Award winners

Congratulations to Katarina Stuart, Stephanie Chen, and Cadel Watson, who all won prizes at this year’s Genetics Society of AustralAsia 2021 Conference.

Stephanie’s abstract was selected for the Spencer Smith-White Travel Award, Katarina won the best student talk award, and Cadel won the best student lightning talk award. Well done, all!