Friday, 11 July 2025

Pathways to Recovery: Genomics and Resistance Assays for Tree Species Devastated by the Myrtle Rust Pathogen

The latest collaborative paper from the lab is out in Molecular Ecology, with another great contribution from former PhD student Stephanie Chen. I really enjoyed collaborating with the team at the Royal Botanic Gardens and Domain Trust (and beyond), and this accomplishment feels like a fitting way to celebrate my first week as an ecological consultant at ORS. It combines reference genomes with population genomic data and resistance assays to provide hope in the fight against invasive pathogens - the notorious myrtle rust in this case - but with a warning: these species, like so many, have already suffered dramatic losses of genetic variation.

We have the tools and the knowledge to save biodiversity. We need the political will. And we don’t have the time to delay any further.

Chen SH, Yap YS, Viler V, Stehn C, Sandhu KS, Percival J, Pegg GS, Menzies T, Jones A, Guo K, Giblin FR, Cohen J, Edwards RJ, Rossetto M & Bragg JG (2025): Pathways to Recovery: Genomics and Resistance Assays for Tree Species Devastated by the Myrtle Rust Pathogen. Molecular Ecology e70030. [Mol Ecol] [PubMed] [bioRxiv]

Abstract

Myrtle rust is a plant disease caused by the invasive fungal pathogen Austropuccinia psidii (G. Winter) Beenken, which has a global host list of 480 species. It was detected in Australia in 2010 and has caused the rapid decline of native Myrtaceae species, including rainforest trees Rhodamnia rubescens (Benth.) Miq. (scrub turpentine) and Rhodomyrtus psidioides (G.Don) Benth. (native guava). Ex situ collections of these species have been established, with the goal of preserving remaining genetic variation. Analysis of reduced representation sequencing (DArTseq; n = 444 for R. rubescens and n = 301 for R. psidioides) showed genetic diversity is distributed along a latitudinal gradient across the range of each species. A panel of samples of each species (n = 27 for R. rubescens and n = 37 for R. psidioides) was resequenced at genome scale, revealing large historical effective population sizes, and little variation among individuals in inferred levels of deleterious load. In Rhodamnia rubescens, experimental assays (n = 297) identified individuals that are putatively resistant to myrtle rust. This highlights two important points: there are tangible pathways to recovery for species that are highly susceptible to rust via a genetically informed breeding programme, and there is a critical need to act quickly before more standing diversity is lost.

Tuesday, 8 July 2025

A new era for the Edwards Lab

As of July 2025, I will be working outside academia as an ecological consultant. I will be maintaining adjunct status at UNSW and UWA, and some papers will continue to come out and be posted here. The rest of the blogsite will be reconfigured over time to reflect the new situation.

Saturday, 15 March 2025

A genome assembly and annotation for the Australian alpine skink Bassiana duperreyi using long-read technologies.

Hanrahan BJ, Alreja K, Reis ALM, Chang JK, Dissanayake DSB, Edwards RJ, Bertozzi T, Hammond JM, O’Meally D, Deveson IW, Georges A, Waters P & Patel HR (accepted): A genome assembly and annotation for the Australian alpine skink Bassiana duperreyi using long-read technologies. G3 jkaf046, DOI: 10.1093/g3journal/jkaf046 [G3] [PubMed]

Abstract

The eastern three-lined skink (Bassiana duperreyi) inhabits the Australian high country in the southeast of the continent including Tasmania. It is a distinctive oviparous species because it undergoes sex reversal (from XX genotypic females to phenotypic males) at low incubation temperatures. We present a chromosome-scale genome assembly of a Bassiana duperreyi XY male individual, constructed using PacBio HiFi and ONT long reads scaffolded using Illumina HiC data. The genome assembly length is 1.57 Gbp with a scaffold N50 of 222 Mbp, N90 of 26 Mbp, 200 gaps and 43.10% GC content. Most (95%) of the assembly is scaffolded into 6 macrochromosomes, 8 microchromosomes and the X chromosome, corresponding to the karyotype. Fragmented Y chromosome scaffolds (n=11 > 1 Mbp) were identified using Y-specific contigs generated by genome subtraction. We identified two novel alpha-satellite repeats of 187 bp and 199 bp in the putative centromeres that did not form higher order repeats. The genome assembly exceeds the standard recommended by the Earth Biogenome Project; 0.02% false expansions, 99.63% kmer completeness, 94.66% complete single copy BUSCO genes and an average 98.42% of transcriptome data mappable to the genome assembly. The mitochondrial genome (17,506 bp) and the model rDNA repeat unit (15,154 bp) were assembled. The B. duperreyi genome assembly has high completeness for a skink and will provide a resource for research focused on sex determination and thermolabile sex reversal, as an oviparous foundation species for studies of the evolution of viviparity, and for other comparative genomics studies of the Scincidae.

Friday, 14 March 2025

Chromosome-level genome assembly of the spangled emperor, Lethrinus nebulosus (Forsskål 1775)

The first data note from the Ocean Genomes project is now out in Scientific Data - a first-in-family genome from the emperor breams.

Parata L, Anstiss L, de Jong E, Doran A, Edwards RJ, Newman SJ, Payet SD, Skepper CL, Wakefield CB, OceanOmics Centre, OceanOmics Division & Corrigan S (2025): Chromosome-level genome assembly of the spangled emperor, Lethrinus nebulosus (Forsskål 1775). Scientific Data 12:435. DOI: 10.1038/s41597-025-04690-w.

Abstract

Spangled emperor, Lethrinus nebulosus (Forsskål 1775), is a tropical marine fish of economic and cultural importance throughout the Indo-West Pacific. It is one of the most targeted recreational fishes in the Gascoyne Coast Bioregion of Western Australia where it serves as an indicator species for recreational fishing. Here, we present a highly accurate, near-gapless, chromosome-level, haplotype-phased reference genome assembly of L. nebulosus (Lethrinus nebulosus (Spangled Emperor) genome, fLetNeb1.1; PRJNA1074345), the first for the species and the first high-quality genome representative of the family Lethrinidae. The 1.09 Gb genome was assembled from PacBio HiFi and Dovetail Omni-C proximity ligation sequencing data. The contig N50 is 21–24 Mbp and BUSCO completeness greater than 99%. A preliminary gene annotation identified 24,583 genes with the predicted transcriptome achieving a BUSCO completeness score of 99.1%. This resource will facilitate genomic studies to inform the sustainable management of L. nebulosus and other Lethrinids.

Thursday, 13 February 2025

Small but mitey: a gapless telomere-to-telomere assembly of an unidentified mite with a streamlined genome

A few years ago, we accidentally sequenced an interesting mite when making the Rhodamnia argentea reference genome. We don’t really know what it is, but thanks to nanopore sequencing and a low repeat content, we produced a gapless telomere-to-telomere assembly! As an unfunded passion project, it’s been ticking along in the background since then, but is now published in Genome Biology and Evolution.

One of the most interesting aspects of the paper is the genome reduction - despite being a complete nuclear genome, the assembly is under 35 megabases! This is a couple of orders of magnitude smaller than many other arachnid genomes.

We are yet to do a comprehensive analysis, but just looking at the core set of expected arachnid genes, as defined by BUSCO (i.e. single-copy genes that are expected to be present in most arachnid genomes), revealed that about a third of them were missing. (This low completeness was partly responsible for the time taken to fully recognise and deal with the contamination.) Tellingly, the closest sequenced relatives of this mite also have reduced genomes and are missing many of the same BUSCO genes, revealing a long history of gene loss.

The proportion of “Duplicated” BUSCO genes is also surprisingly high at 4.5%. These are genuine duplications, with consistent diploid read depths and many of the pairs present on both chromosomes. It will be interesting to see if these duplications are replacing some of the lost functions from the other genes, or are novel genes behind such a specialised lifestyle. As an unfunded passion project, it was beyond scope to investigate the full annotation as part of this paper, but get in touch if you would be interested to do this!

Edwards RJ, Chen SH, Halliday B & Bragg JG (2025): Small but mitey: a gapless telomere-to-telomere assembly of an unidentified mite with a streamlined genome. Genome Biology and Evolution Feb 13. [Gen Biol Evol] [PubMed]

Abstract

A draft assembly of the rainforest tree Rhodamnia argentea Benth. (malletwood, Myrtaceae) revealed contaminating DNA sequences that most closely matched those from mites in the family Eriophyidae. Eriophyoid mites are plant parasites that often induce galls or other deformities on their host plants. They are notable for their small size (averaging 200 μm), distinctive four-legged body structure, and heavily streamlined genomes, which are among the smallest known of all arthropods. Contaminating mite sequences were assembled into a high-quality gapless telomere-to-telomere nuclear genome. The entire genome was assembled on two fully contiguous chromosomes, capped with a novel TTTGG or TTTGGTGTTGG telomere sequence, and exhibited clear signs of genome reduction (34.5 Mbp total length, 68.6% arachnid Benchmarking Universal Single-Copy Ortholog completeness). Phylogenomic analysis confirmed that this genome is that of a previously unsequenced eriophyoid mite. Despite its unknown identity, this complete nuclear genome provides a valuable resource to investigate invertebrate genome reduction.

Monday, 27 January 2025

A reference genome for the eastern bettong (Bettongia gaimardi)

Silver LW, Edwards RJ, Neaves L, Manning A, Hogg CJ & Banks S (2025): A reference genome for the eastern bettong (Bettongia gaimardi) [version 2; peer review: 3 approved]. F1000Research 13:1544. [F1000Res] [PubMed]

Abstract

The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as ‘Near Threatened’ by the IUCN. We sequenced short read data on the 10x system to generate a reference genome 3.46 Gb in size, with a contig N50 of 87.36 Kb and a scaffold N50 of 2.93 Mb. Additionally, we used GeMoMa to provide an accompanying annotation for the reference genome. The generation of a reference genome for the eastern bettong provides a vital resource for the conservation of the species.

Tuesday, 17 December 2024

Parental assigned chromosomes for cultivated cacao provides insights into genetic architecture underlying resistance to vascular streak dieback

As a big fan of chocolate, it was great to have the opportunity to help out with a cacao genome. This study presents the first diploid, fully scaffolded, and parentally phased genome resource for Theobroma cacao L. to provide insights into the genetic architecture underlying resistance and susceptibility to vascular streak dieback (VSD), a significant threat to cacao production in Southeast Asia and Melanesia. By analyzing NLR gene clusters and other disease response gene candidates in proximity to informative QTLs, the research identifies structural variants within NLRs inherited from resistant and susceptible parents, offering potential breeding targets for VSD resistance.

Tobias PA, Downs J, Epaina P, Singh G, Park RF, Edwards RJ, Brugman E, Zulkifli A, Muhammad J, Purwantara A & Guest DI (2024): Parental assigned chromosomes for cultivated cacao provides insights into genetic architecture underlying resistance to vascular streak dieback. The Plant Genome doi: 10.1002/tpg2.20524. [Plant Genome] [bioRxiv] [PubMed]

Abstract

Diseases of Theobroma cacao L. (Malvaceae) disrupt cocoa bean supply and economically impact growers. Vascular streak dieback (VSD), caused by Ceratobasidium theobromae, is a new encounter disease of cacao currently contained to southeast Asia and Melanesia. Resistance to VSD has been tested with large progeny trials in Sulawesi, Indonesia, and in Papua New Guinea with the identification of informative quantitative trait loci (QTLs). Using a VSD susceptible progeny tree (clone 26), derived from a resistant and susceptible parental cross, we assembled the genome to chromosome-level and discriminated alleles inherited from either resistant or susceptible parents. The parentally phased genomes were annotated for all predicted genes and then specifically for resistance genes of the nucleotide-binding site leucine-rich repeat class (NLR). On investigation, we determined the presence of NLR clusters and other potential disease response gene candidates in proximity to informative QTLs. We identified structural variants within NLRs inherited from parentals. We present the first diploid, fully scaffolded, and parentally phased genome resource for T. cacao L. and provide insights into the genetics underlying resistance and susceptibility to VSD.

#AusEvol2024 - Depth-based correction of gene duplications and losses in genome assemblies

The Australasian Evolution Society conference has always been one of my favourites, due to its laid back culture of inclusivity and kindness. (And low cost!) It therefore feels quite fitting that my last conference as an Aussie academic was AES2024.

This talk was a bit of an update from my AES2021 presentation, showcasing some of the latest additions to DepthKopy, including depth-based copy number correction of genome features, such as rDNA genes, repeat families, or multicopy genes. This includes a feature that classifies multicopy “Duplicated” genes identified by BUSCO as true (biological) or false (artefactual) duplicates. TL;DR version: analysis of draft genome assemblies for 45 species of fish across five different depths/qualities indicates that DepthKopy can correct the copy number and total length of multicopy features to within 10% of the true number. (The lower-quality raw assemblies ranged from a 30% under-estimate to a 60% over-estimate.)

This will be of most importance when low quality draft genomes are included in a comparative genomics analysis. However, even the best genome assemblies appear to have some “collapsed” or duplicated loci where the copy number in the assembly does not accurately reflect the copy number in the genome. DepthKopy is useful for exploring the magnitude of such disparities, and can help to identify and correct specific discrepancies in genes or features of interest.
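For anyone unfamiliar with the idea behind depth-based copy number, here is a minimal Python sketch of the general principle: the read depth over a feature, divided by the depth expected for single-copy sequence, approximates how many copies exist in the genome. This is not DepthKopy's actual implementation; the function names, the use of a simple median, and the 0.75 threshold are all assumptions made for illustration.

```python
from statistics import median


def estimate_copy_number(region_depths, single_copy_depth):
    """Estimate copy number as median region depth / single-copy depth.

    region_depths: per-base (or per-window) read depths across a feature.
    single_copy_depth: depth expected for sequence present once per genome.
    """
    if not region_depths:
        raise ValueError("region_depths must not be empty")
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    return median(region_depths) / single_copy_depth


def looks_like_true_duplicate(copy_cns, min_cn=0.75):
    """Hypothetical rule of thumb for a gene assembled as several copies.

    If every assembled copy carries roughly full single-copy depth, the
    duplication is probably biological. If the copies share the depth of a
    single locus (e.g. about 0.5 each), it is probably an assembly artefact.
    The min_cn threshold is an illustrative assumption, not a published value.
    """
    return len(copy_cns) > 1 and all(cn >= min_cn for cn in copy_cns)


# A collapsed repeat: assembled once, but read depth suggests ~50 copies.
collapsed = estimate_copy_number([990, 1000, 1010], single_copy_depth=20)

# Two assembled copies of a gene, each at about half the single-copy depth.
suspect = looks_like_true_duplicate([0.5, 0.52])
```

In this toy example, `collapsed` comes out at 50 copies for a feature that appears once in the assembly, while `suspect` is flagged as a likely false duplicate because neither copy is supported by full single-copy depth.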

Tuesday, 3 December 2024

So long, Ocean Genomes... and thanks for all the fish!

After a successful couple of years, today was my last day at UWA. I am proud of the team that I helped to build at the Minderoo OceanOmics Centre at UWA, and the things we have accomplished together. The UWA Oceans Institute has been a fantastic place to work, and I look forward to completing some exciting ongoing collaborations with the team in my capacity as adjunct. It's been exciting to see Ocean Genomes grow from a concept with a largely empty lab to a fully-fledged genome factory capable of generating multiple high-quality genomes a week. The associated publications should hopefully follow soon.

Developments in DNA sequencing technology over the past few years have been immense, but the most impressive part for me has been witnessing the laboratory technical team optimising the sample preparations for sequencing. Everything gets so much harder when you move from human samples (the focus of most methods development and testing) into non-model organisms, and I am convinced that the quality of genomes we’ve been producing is in large part due to the quality of the DNA going into the sequencers.

I am now looking for my next challenge and am officially Open For Work. We’ll be moving back to Dublin at the end of January. If you are based in Ireland and need an experienced interdisciplinary problem solver with broad expertise across bioinformatics and biomolecular science, please get in touch! Academic and non-academic opportunities are welcome.

Wednesday, 20 November 2024

Adventures in genome assembly curation and QC

It was great to have the opportunity to present some recent - and not so recent - work today to the Australian BioCommons Genomics Community meeting. You can find a link to the recording here, and the slides are up here if you missed it and are curious. I’ll add some additional links to the tools discussed with time.

One of the highlights of my time in Australia has been the awesome bioinformatics and genomics (and other) communities. I’ve been lucky to meet and work with so many awesome scientists and fantastic human beings.

Saturday, 16 November 2024

Repeat-Rich Regions Cause False-Positive Detection of NUMTs: A Case Study in Amphibians Using an Improved Cane Toad Reference Genome

Version 3* of the cane toad reference genome, aRhiMar1.3, is now officially out and published in Genome Biology and Evolution. [*It’s only the second published genome but for internal reasons the original draft genome was version 2!]

The focus of the paper itself is confirming the lack of Nuclear Mitochondrial fragments (a.k.a. NUMTs) in the cane toad genome, which could impact whole-mitogenome analysis of genetic diversity in cane toads. We were pretty surprised when we first looked for NUMTs in the cane toad genome and could not find any! The draft genome is pretty drafty, especially in terms of missing repetitive regions, so an updated long-read assembly was important to rule out a false negative result.

The new genome is in much better shape, with an extra 922 Mbp (>95% repeats) and a 15x increase in scaffold N50 (2.5 Mbp). (My biggest regret with the original paper was not sticking to my guns and including an early DepthSizer genome size estimate of 3.5 Gbp, which has subsequently turned out to be correct.) The cane toad genome remains a tough nut to crack, and we didn’t quite reach the magic 1 Mb contig N50 (860 kb), but the functional completeness was markedly improved and we are pretty confident that the continued absence of NUMT detection is a real phenomenon and does not simply reflect technical limitations.
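Since scaffold and contig N50 values are quoted above, here is a short Python sketch of how the statistic is usually defined: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `nxx` and the toy lengths are illustrative assumptions, not taken from any assembly tool.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic for a list of sequence lengths.

    The result is the length L such that sequences of length >= L together
    cover at least `fraction` of the total assembly length
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy assembly of five scaffolds, 30 units in total.
toy = [10, 8, 6, 4, 2]
n50 = nxx(toy)        # 10 + 8 = 18 covers at least 15, so N50 is 8
n90 = nxx(toy, 0.9)   # 10 + 8 + 6 + 4 = 28 covers at least 27, so N90 is 4
```

Because N50 depends only on the length distribution, a large jump in scaffold N50 can come from joining existing contigs without adding any new sequence, which is why contig N50 is usually reported alongside it.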

Watch this space for a chromosome-level cane toad genome, which is still in the works.

Cheung K, Rollins LA, Hammond JM, Barton K, Ferguson JM, Eyck HJF, Shine R & Edwards RJ (2024): Repeat-rich regions cause false positive detection of NUMTs - a case study in amphibians using an improved cane toad reference genome. Genome Biology and Evolution evae246. [Gen Biol Evol] [bioRxiv] [PubMed]

Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.

Tuesday, 5 November 2024

#ABACBS2024 Poster 102: Improving phased Hifiasm assemblies with 20 kb ONT reads

After a great presentation this morning by Emma de Jong on our High-Quality Genomes for Australian Lutjanidae Species (abstract below), if you’re at ABACBS2024 then please drop by Poster #102 to find out about some of the work we’re doing with ONT data.

Abstracts

Improving phased Hifiasm assemblies with 20 kb ONT reads

Richard J Edwards, Adrianne Doran, Emma de Jong, Lara Parata, Shannon Corrigan

The quality and quantity of genome assembly has improved dramatically over recent years. Many large-scale genome projects combine assembly of HiFi and HiC reads using Hifiasm to produce contiguous phased assemblies, scaffolded to chromosome-level. Nevertheless, HiFi reads are typically under 25 kb and can still struggle to assemble long, low diversity repeat regions. Obtaining ultra-long (100 kb or longer) ONT reads to solve this problem remains a significant challenge due to technical constraints and DNA sample requirements. Here, we explore the utility of using standard ONT long reads (20 kb or more) as “ultra-long” input to improve phased Hifiasm assemblies for 22 species of bony fish (Genome Size, 627 Mb - 1.54 Gb). We also explore whether the new --telo-m mode in Hifiasm v0.9.0 improves telomere prediction. Incorporating 20kb+ ONT reads (7.8X - 93.5X) significantly increased assembly contiguity. BUSCO Completeness was not significantly altered, although there was some re-partitioning of BUSCO genes between phased haplotypes for some species. Improvement did not strongly correlate with read depth (either HiFi or ONT), suggesting that the underlying read length distributions and/or specific genome features are more important for determining the outcome. Hifiasm --telo-m mode significantly increased telomere recovery, assembling over six times the number of gapless telomere-to-telomere chromosomes when combined with 20kb+ ONT reads. Verification of how these results translate to the ease of curation and/or quality of final HiC-scaffolded chromosome-level assemblies is ongoing, with a goal to determine whether the additional sample preparation and sequencing in the lab is cost-effective.
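As a rough guide to what a depth such as "7.8X" of 20kb+ reads means, the Python sketch below keeps only reads at or above a length cutoff and divides their total bases by the genome size. It is a back-of-envelope illustration, not the pipeline used for the poster; the function name and the toy numbers are assumptions.

```python
def long_read_depth(read_lengths, genome_size, min_length=20_000):
    """Approximate sequencing depth contributed by reads >= min_length.

    read_lengths: iterable of read lengths in bp.
    genome_size: haploid genome size in bp.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Total bases in reads that pass the length cutoff.
    kept_bases = sum(n for n in read_lengths if n >= min_length)
    return kept_bases / genome_size


# Toy example: three of four reads pass the 20 kb cutoff (100 kb in total)
# against a 1 Mb "genome".
depth = long_read_depth([25_000, 30_000, 10_000, 45_000], genome_size=1_000_000)
```

With real data the read lengths would come from the sequencing run summary or the FASTQ itself, and the same calculation can be repeated at different cutoffs to see how much depth is lost by filtering.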

High-Quality Genomes for Australian Lutjanidae Species

Emma de Jong, Lara Parata, Philipp E Bayer, Shannon Corrigan, Richard J Edwards

Lutjanidae (snappers) are highly valued in commercial and recreational fisheries worldwide and serve as indicator species of the health of marine environments and fishery bioregions in Western Australia. Comprehensive genomic mapping of immune gene families of Lutjanidae species is lacking, but this information is critical for understanding disease vulnerability and the impact of environmental stress, improving aquaculture efforts, and providing insights into the health of wild populations. Despite their importance, only 3 out of 113 Lutjanid species currently have available reference genomes, two of which are highly fragmented (>11,000 and >200,000 contigs), impacting studies on gene families relevant to aquaculture. In this study, we present high-quality chromosome-level reference genomes for 14 Australian Lutjanidae species across seven genera, generated using HiFi and HiC data. We present initial comparative genomic analyses, including immune gene content and chromosomal synteny analyses across species. These analyses provide insights into the genomic architecture and evolutionary relationships within Lutjanidae. Ongoing work aims to comprehensively map and compare the immune gene family repertoire across Lutjanidae genera, as well as Lethrinidae species as an outgroup, to determine genus-specific changes in genes (e.g. loss, selection, duplication) important for pathogen detection, antigen presentation, inflammation, and immune memory. These genome assemblies will serve as a foundational resource to the wider scientific community interested in Lutjanidae.

Sunday, 3 November 2024

Chromosome-level genome assembly of the Australian rainforest tree Rhodamnia argentea (malletwood)

Genome projects don’t always go according to plan, and when we first sequenced Rhodamnia argentea with 10x Genomics linked reads, we accidentally sequenced a parasite along with it. This was quite hard to identify from the sequencing data itself, as the depth of sequencing was quite high, and we were unable to identify the guilty bug itself, which is probably microscopic. Getting to the bottom of this took a back seat for a while when the focus of the project shifted to Melaleuca quinquenervia, but with the addition of ONT reads and Hi-C, we have now been able to generate a chromosome-level decontaminated assembly. (An assembly of the contaminating mite will follow…)

Chen SH, Jones A, Lu-Irving P, Yap JYS, van der Merwe M, Bragg JG & Edwards RJ (2024): Chromosome-level genome assembly of the Australian rainforest tree Rhodamnia argentea (malletwood). Genome Biology and Evolution 16(11):evae238. [Gen Biol Evol] [PubMed]

Abstract

Myrtaceae are a large family of woody plants, including hundreds that are currently under threat from the global spread of a fungal pathogen, Austropuccinia psidii (G. Winter) Beenken, which causes myrtle rust. A reference genome for the Australian native rainforest tree Rhodamnia argentea Benth. (malletwood) was assembled from Oxford Nanopore Technologies long-reads, 10x Genomics Chromium linked-reads, and Hi-C data (N50 = 32.3 Mb and BUSCO completeness 98.0%) with 99.0% of the 347 Mb assembly anchored to 11 chromosomes (2n = 22). The R. argentea genome will inform conservation efforts for Myrtaceae species threatened by myrtle rust, against which it shows variable resistance. We observed contamination in the sequencing data, and further investigation revealed an arthropod source. This study emphasizes the importance of checking sequencing data for contamination, especially when working with nonmodel organisms. It also enhances our understanding of a tree that faces conservation challenges, contributing to broader biodiversity initiatives.

Thursday, 31 October 2024

BioDiversity Genomics Conference (BG24)

The global BioDiversity Genomics Conference (BG24) was another great success this year. As well as running a session on Marine Vertebrate Genomics, it was particularly rewarding to see so many quality contributions from lab members and alumni.

Biodiversity Genomics in Australasia

Jessica Pearce (UWA): Reconstructing tiger shark history using genomics

Sharks and rays are a clade of high evolutionary, ecological, economic, and cultural significance, and yet they are one of the most threatened taxa groups in the marine environment. Despite this, there remains a lack of molecular resources for this class to assist with their conservation. The tiger shark (Galeocerdo cuvier) is a near threatened, keystone species distributed circumglobally that is under substantial pressure from human impacts, making it a high priority for management worldwide. We sequenced and characterised a reference quality assembly for the tiger shark, the first genome for this family, and used this to dive deeper into the evolution, adaptation, and demographic history of this ancient species. We investigated how its effective population size (Ne), genome-wide heterozygosity and inbreeding have changed over time to infer how this species has responded to past global events, and hence potential responses to ongoing and future accumulating threats. This aims to assist in effective management of this high-profile species.

Katarina Stuart (University of Auckland): Lifetime fitness is correlated more strongly with structural variant than SNP mutational load in a threatened bird species

Conservation genomics is becoming increasingly interested in whether structural variant (SV) information can help the management of threatened species. The functional consequences of SVs are more complex than for single nucleotide polymorphisms (SNPs) and thus may be more likely to contribute to load. While the impacts of SV-specific genetic load may be less consequential for large populations, the interplay between weakened selection and stochastic processes mean that smaller populations, like those of the threatened Aotearoa hihi/New Zealand stitchbird (Notiomystis cincta), may harbour a high SV load. Hihi were once confined to a single remnant population, but have been reestablished into six sanctuaries and reserves, often via secondary bottlenecks, resulting in low genetic diversity, low adaptive potential and inbreeding depression. In this study, we use whole genome resequencing of 30 individuals from the Tiritiri Matangi population to identify the nature and distribution of both SNPs and SVs within this small avian population. We find that SNP and SV individual mutation load is only moderately correlated, likely because SVs arise in regions of high recombination and reduced evolutionary conservation. Finally, we leverage a long-term monitoring dataset of pedigree and fitness data to assess the impact of SNP and SV mutation load on individual fitness, and demonstrate that SV load correlates more strongly than SNP load with lifetime fitness. The results of this study indicate that only examining SNPs neglects important aspects of intraspecific variation, and that studying SVs has direct implications for linking genetic diversity and genetic health to inform management decisions.

Richard Edwards (UWA): Improving Hifiasm assemblies with 20 kb ONT reads

The quality and quantity of genome assembly has improved dramatically over recent years. Many large-scale genome projects assemble HiFi and HiC reads using Hifiasm to produce contiguous phased assemblies, scaffolded to chromosome-level. Nevertheless, HiFi reads are typically under 25 kb and can still struggle to assemble long, low-diversity repeat regions. Obtaining ‘ultra-long’ (100 kb or longer) ONT reads to solve this problem remains a significant challenge due to technical constraints and DNA sample requirements. Here, we explore the utility of using standard ONT long reads (20 kb or more) as ‘ultra-long’ input to improve phased Hifiasm assemblies for 22 species of bony fish (Genome Size, 627 Mb - 1.54 Gb). We also explore whether the new ‘telo-m’ mode in Hifiasm v0.9.0 improves telomere prediction in these species. Incorporating 20+ kb ONT reads (7.8X - 93.5X) significantly increased assembly contiguity. BUSCO completeness was not significantly altered, although there was some re-partitioning of BUSCO genes between phased haplotypes for some species. Improvement did not strongly correlate with read depth (either HiFi or ONT), suggesting that the underlying read length distributions and/or specific genome features are more important for determining the outcome. Hifiasm ‘telo-m’ mode significantly increased telomere recovery, assembling over six times the number of gapless telomere-to-telomere chromosomes when combined with incorporation of ONT reads. Verification of how these results translate to the quality and/or ease of curation of final HiC-scaffolded chromosome-level assemblies is ongoing, with a goal to determine whether the additional sample preparation and sequencing in the lab is cost-effective.

Emma de Jong (UWA): High-Quality Genomes for Australian Lutjanidae Species

Lutjanidae (snappers) are highly valued in commercial and recreational fisheries worldwide and some species serve as fisheries indicator species, particularly for bioregions in Western Australia. Comprehensive genomic mapping of immune gene families of Lutjanidae species is lacking, but this information can inform understanding of disease vulnerability and the impact of environmental stress, improve aquaculture efforts, and provide insights into the health of wild populations. Despite their importance, only 3 out of 113 Lutjanid species currently have available reference genomes, two of which are highly fragmented (>11,000 and >200,000 contigs), impacting studies on gene families relevant to aquaculture. In this study, we present high-quality chromosome-level reference genomes for 14 Australian lutjanid species across seven genera, generated using PacBio HiFi and Dovetail HiC data. We present initial comparative genomic analyses, including immune gene content and chromosomal synteny analyses across species. These analyses provide insights into the genomic architecture and evolutionary relationships within Lutjanidae. Ongoing work aims to comprehensively map and compare the immune gene family repertoire across genera in Lutjanidae, as well as lethrinid species as an outgroup, to determine genus-specific changes in genes (e.g., loss, selection, duplication) important for pathogen detection, antigen presentation, inflammation, and immune memory. These genome assemblies will serve as a foundational resource to the wider scientific community interested in these species.

Research of ECRs who work in biodiversity genomics

Lara Parata (UWA): Genome Evolution in Marine Ray-Finned Fishes

Approximately half of extant vertebrate species are fishes, with more than 30,000 species classified as ray-finned fishes (Actinopterygii). Actinopterygii represent diverse phenotypes, feeding strategies and life history traits, and occupy distinct ecological niches, making them an ideal taxon for studying molecular drivers of diversity and adaptation. Despite their diversity and their ecological and economic importance, only 145 Illumina genome assemblies are available for marine Actinopterygii species. In this study we present 250 new marine Actinopterygii genome assemblies generated using Illumina whole genome sequencing and initial results from a large-scale study of these 395 genomes. Using reference-based annotation tools we determine which fish families have unique patterns of gene family frequency / structure (e.g., losses, expansions, contractions), and correlate these with predicted functional signatures to infer biological and ecological adaptations. We identify fish families with distinct rates of change in the gene families present within their genomes (e.g., more losses / expansions or diversity) and associate these patterns with increased rates of diversification or speciation to further elucidate the genomic attributes contributing to ecological success. The results of this work contribute to the growing understanding of fish genome evolution and provide new insights into the evolutionary history and ecological success of marine Actinopterygii.

Thursday, 19 September 2024

#PAGAustralia Poster 14: Synteny-Guided Semi-Automated Curation of Chromosome-Level Genome Assemblies

If you are attending PAG Australia 2024, come and have a chat at Poster 14 about easing the burden of chromosome-level assembly curation.

Abstract Text

Reference genomes are fundamental resources that underpin research across most aspects of modern biology. Technological improvements in the length and accuracy of long-read sequencing platforms, combined with Hi-C proximity ligation sequencing, have enabled the routine generation of highly contiguous phased assemblies, scaffolded to chromosome-level. Nevertheless, automated generation of perfect gapless “telomere-to-telomere” assemblies remains out of reach for most eukaryotic organisms. Scaffolding errors and false duplications can still occur, and assembly curation is now the main bottleneck for large-scale assembly projects. Here, I present a streamlined data workflow and scaffolding assessment for high-throughput manual curation of chromosome-level genome assemblies. Synteny between haplotypes, or closely related species, is combined with read mapping to orient, pair and visualise assembled chromosomes. Assembly gaps are classified according to scaffolding confidence, highlighting candidates for simple scaffolding corrections, such as inversions. Synteny visualisation, gap classification, and HiC contact maps are then combined to identify and document scaffolding edits with increased speed, precision and confidence. This accelerates the production of curated chromosome-level assemblies, and enables the identification of regions of the assembly that may require further attention. Individual tools used in the workflow (ChromSyn, Telociraptor, SynBad, PAFScaff, DepthKopy and DepthCharge) are available at https://github.com/slimsuite/.

Thursday, 5 September 2024

Origin and maintenance of large ribosomal RNA gene repeat size in mammals

Our latest paper is out as a Featured Article in the journal Genetics, featuring ONT data from both the cane toad and BABS Genome snake genomes. This paper looks at how ribosomal RNA gene repeats (a.k.a. rDNA repeats) have evolved in vertebrates, and in particular how they came to expand in size in mammals. For something so fundamental to the function of an organism - literally every process of every cell ultimately relies on rRNA - there is surprising diversity. These regions are traditionally hard to assemble with short reads, and still provide challenges for long-read assemblies, so the new era of high-quality long-read assemblies is likely to reveal a lot about their evolution.

  • Macdonald E, Whibley A, Waters PD, Patel H, Edwards RJ & Ganley ARD (2024): Origin and maintenance of large ribosomal RNA gene repeat size in mammals. Genetics 228(1): iyae121 [Genetics] [PubMed]

Abstract

The genes encoding ribosomal RNA are highly conserved across life and in almost all eukaryotes are present in large tandem repeat arrays called the rDNA. rDNA repeat unit size is conserved across most eukaryotes but has expanded dramatically in mammals, principally through the expansion of the intergenic spacer region that separates adjacent rRNA coding regions. Here, we used long-read sequence data from representatives of the major amniote lineages to determine where in amniote evolution rDNA unit size increased. We find that amniote rDNA unit sizes fall into two narrow size classes: “normal” (∼11–20 kb) in all amniotes except monotreme, marsupial, and eutherian mammals, which have “large” (∼35–45 kb) sizes. We confirm that increases in intergenic spacer length explain much of this mammalian size increase. However, in stark contrast to the uniformity of mammalian rDNA unit size, mammalian intergenic spacers differ greatly in sequence. These results suggest a large increase in intergenic spacer size occurred in a mammalian ancestor and has been maintained despite substantial sequence changes over the course of mammalian evolution. This points to a previously unrecognized constraint on the length of the intergenic spacer, a region that was thought to be largely neutral. We finish by speculating on possible causes of this constraint.

Tuesday, 9 July 2024

New pre-print: The Genomics for Australian Plants (GAP) framework initiative – developing genomic resources for understanding the evolution and conservation of the Australian flora

The Bioplatforms Australia Genomics for Australian Plants (GAP) initiative aims to sequence and assemble representative genomes of Australia’s unique flora, which boasts over 24,000 native vascular plant species evolved over millions of years. The program brings together academic groups, herbaria and botanic gardens from across the country to build genomic capacity and create valuable resources for the classification, conservation and utilisation of Australian plants. We were lucky enough to sequence one of the first GAP species, the NSW Waratah. Now, the capstone paper outlining the project and its key findings from multiple species is out as a pre-print at EcoEvoRxiv:

Simpson L, Cantrill DJ, Byrne M, Allnutt TR, King GJ, Lum M, Al Bkhetan Z, Andrew R, Baker WJ, Barrett MD, Batley J, Berry O, Binks RM, Bragg JG, Broadhurst L, Brown G, Bruhl J, Edwards RJ, Ferguson S, Forest F, Gustafsson J, Hammer TA, Holmes GD, Jackson CJ, James EA, Jones A, Kersey PJ, Leitch IJ, Maurin O, McLay TGB, Murphy DJ, Nargar K, Nauheimer L, Sauquet H, Schmidt-Lebuhn AN, Shepherd KA, Syme AE, Waycott M, Wilson TC, Crayn DM (preprint): The Genomics for Australian Plants (GAP) framework initiative – developing genomic resources for understanding the evolution and conservation of the Australian flora. EcoEvoRxiv DOI: https://doi.org/10.32942/X2RP70

The generation and analysis of genome-scale data—genomics—is driving a rapid increase in plant biodiversity knowledge. However, the speed and complexity of technological advance in genomics presents challenges for its widescale use in evolutionary and conservation biology. Here, we introduce and describe a national-scale collaboration conceived to build genomic resources and capability for understanding the Australian flora: the Genomics for Australian Plants (GAP) Framework Initiative. We outline (a) the history of the project including the collaborative framework, partners, and funding; (b) GAP principles such as rigour in design, sample verification and documentation, data management, and data accessibility; and (c) the structure of the consortium and its four activity streams (reference genomes, phylogenomics, conservation genomics, and training), with the rationale and aims for each of them. We show, through discussion of its successes and challenges, the value of this multi-institutional consortium approach and the enablers, such as well-curated collections and national collaborative research infrastructure, all of which have led to a substantial increase in capacity and delivery of biodiversity knowledge outcomes.

The initiative is about more than just reference genomes, with core activity in phylogenomics, conservation genomics and training too. For more information on the project and the resources generated (with more to come), read the paper and/or visit the GAP website.

Wednesday, 3 July 2024

The evolution of ultraconserved elements in vertebrates

Despite first being described twenty years ago, ultraconserved elements (UCEs) have remained something of a mystery. In this paper, we make use of a new, efficient tool for de novo prediction of UCEs from multiple genomes, dedUCE, to analyse the growing number of mammalian genomes. Based on this analysis, we propose a revised definition for UCEs, and investigate their evolution.

Cummins M, Watson C, Edwards RJ & Mattick JS (2024): The evolution of ultraconserved elements in vertebrates. Mol Biol Evol 41(7):msae146. [Mol Biol Evol] [PubMed]

Abstract

Ultraconserved elements were discovered two decades ago, arbitrarily defined as sequences that are identical over a length ≥ 200 bp in the human, mouse, and rat genomes. The definition was subsequently extended to sequences ≥ 100 bp identical in at least three of five mammalian genomes (including dog and cow), and shown to have undergone rapid expansion from ancestors in fish and strong negative selection in birds and mammals. Since then, many more genomes have become available, allowing better definition and more thorough examination of ultraconserved element distribution and evolutionary history. We developed a fast and flexible analytical pipeline for identifying ultraconserved elements in multiple genomes, dedUCE, which allows manipulation of minimum length, sequence identity, and number of species with a detectable ultraconserved element according to specified parameters. We suggest an updated definition of ultraconserved elements as sequences ≥ 100 bp and ≥97% sequence identity in ≥50% of placental mammal orders (12,813 ultraconserved elements). By mapping ultraconserved elements to ∼200 species, we find that placental ultraconserved elements appeared early in vertebrate evolution, well before land colonization, suggesting that the evolutionary pressures driving ultraconserved element selection were present in aquatic environments in the Cambrian–Devonian periods. Most (>90%) ultraconserved elements likely appeared after the divergence of gnathostomes from jawless predecessors, were largely established in sequence identity by early Sarcopterygii evolution—before the divergence of lobe-finned fishes from tetrapods—and became near fixed in the amniotes. Ultraconserved elements are mainly located in the introns of protein-coding and noncoding genes involved in neurological and skeletomuscular development, enriched in regulatory elements, and dynamically expressed throughout embryonic development.
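
To make the revised definition concrete, here is a minimal sketch of the three thresholds from the abstract (≥ 100 bp, ≥97% identity, ≥50% of placental mammal orders) applied as a filter. This is not dedUCE code: the `Candidate` class, the `is_uce` function, the order names and the per-order identity values are all hypothetical, and how identity is measured per order is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    """A hypothetical candidate element (not a dedUCE data structure)."""
    length_bp: int
    # Assumed input: best percent identity found in each placental order.
    identity_by_order: Dict[str, float]


def is_uce(cand, total_orders, min_len=100, min_identity=97.0,
           min_order_fraction=0.5):
    """Apply the abstract's thresholds: length, identity, order coverage."""
    if cand.length_bp < min_len:
        return False
    # Count the orders in which the element meets the identity threshold.
    passing = sum(1 for ident in cand.identity_by_order.values()
                  if ident >= min_identity)
    return passing / total_orders >= min_order_fraction


# Toy example with four orders in total (illustrative values only).
cand = Candidate(120, {"Primates": 99.0, "Rodentia": 97.5, "Carnivora": 90.0})
print(is_uce(cand, total_orders=4))
```

Keeping the thresholds as parameters mirrors the abstract's point that minimum length, sequence identity and number of species can be adjusted.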

Tuesday, 2 July 2024

Extant and extinct bilby genomes combined with Indigenous knowledge improve conservation of a unique Australian marsupial

Hogg C*, Edwards RJ*, Farquharson K*, Silver L*, Brandies P, Peel E, Escalona M, Jaya FR, Thavornkanlapachai R, Batley K, Bradford TM, Chang JK, Chen Z, Deshpande N, Dziminski M, Ewart KM, Griffith OW, Marin Gual L, Moon KL, Travouillon KJ, Waters P, Whittington CM, Wilkins MR, Helgen KM, Lo N, Ho SYW, Ruiz Herrera A, Paltridge R, Marshall Graves JA, Renfree M, Shapiro B, Ottewell K, Kiwirrkurra Rangers & Belov K (2024): Extant and extinct bilby genomes combined with Indigenous knowledge improve conservation of a unique Australian marsupial. Nature Ecology & Evolution 8:1311–1326. [*Joint first authors] [Research Square] [Nat Ecol Evol] [PubMed]

Abstract

Ninu (greater bilby, Macrotis lagotis) are desert-dwelling, culturally and ecologically important marsupials. In collaboration with Indigenous rangers and conservation managers, we generated the Ninu chromosome-level genome assembly (3.66 Gbp) and genome sequences for the extinct Yallara (lesser bilby, Macrotis leucura). We developed and tested a scat single-nucleotide polymorphism panel to inform current and future conservation actions, undertake ecological assessments and improve our understanding of Ninu genetic diversity in managed and wild populations. We also assessed the beneficial impact of translocations in the metapopulation (N = 363 Ninu). Resequenced genomes (temperate Ninu, 6; semi-arid Ninu, 6; and Yallara, 4) revealed two major population crashes during global cooling events for both species and differences in Ninu genes involved in anatomical and metabolic pathways. Despite their 45-year captive history, Ninu have fewer long runs of homozygosity than other larger mammals, which may be attributable to their boom-bust life history. Here we investigated the unique Ninu biology using 12 tissue transcriptomes revealing expression of all 115 conserved eutherian chorioallantoic placentation genes in the uterus, an XY1Y2 sex chromosome system and olfactory receptor gene expansions. Together, we demonstrate the holistic value of genomics in improving key conservation actions, understanding unique biological traits and developing tools for Indigenous rangers to monitor remote wild populations.

Friday, 15 March 2024

Towards telomere-to-telomere fish genomes with Oxford Nanopore Technologies gap-filling

It was a pleasure to be invited to the “What You’re Missing Matters” tour in Perth, and to present some ongoing work investigating the best way to incorporate Oxford Nanopore Technologies data into our high-quality HiFi+HiC fish genomes. The results are too preliminary to share here (and will soon be superseded) but do get in touch if it sounds interesting to you.