Thursday 8 December 2022

Metaproteomics reveals methyltransferases implicated in dichloromethane and glycine betaine fermentation by ‘Candidatus Formimonas warabiya’ strain DCMF

Holland SI, Vázquez-Campos X, Ertan H, Edwards RJ, Manefield MJ & Lee M (2022): Metaproteomics reveals methyltransferases implicated in dichloromethane and glycine betaine fermentation by 'Candidatus Formimonas warabiya' strain DCMF. Front Microbiol. 13:1035247. doi: 10.3389/fmicb.2022.1035247 [Front Microbiol.] [PubMed]

Dichloromethane (DCM; CH2Cl2) is a widespread pollutant with anthropogenic and natural sources. Anaerobic DCM-dechlorinating bacteria use the Wood–Ljungdahl pathway, yet dechlorination reaction mechanisms remain unclear and the enzyme(s) responsible for carbon-chlorine bond cleavage have not been definitively identified. Of the three bacterial taxa known to carry out anaerobic dechlorination of DCM, ‘Candidatus Formimonas warabiya’ strain DCMF is the only organism that can also ferment non-chlorinated substrates, including quaternary amines (i.e., choline and glycine betaine) and methanol. Strain DCMF is present within enrichment culture DFE, which was derived from an organochlorine-contaminated aquifer. We utilized the metabolic versatility of strain DCMF to carry out comparative metaproteomics of cultures grown with DCM or glycine betaine. This revealed differential abundance of numerous proteins, including a methyltransferase gene cluster (the mec cassette) that was significantly more abundant during DCM degradation, as well as highly conserved amongst anaerobic DCM-degrading bacteria. This lends strong support to its involvement in DCM dechlorination. A putative glycine betaine methyltransferase was also discovered, adding to the limited knowledge about the fate of this widespread osmolyte in anoxic subsurface environments. Furthermore, the metagenome of enrichment culture DFE was assembled, resulting in five high quality and two low quality draft metagenome-assembled genomes. Metaproteogenomic analysis did not reveal any genes or proteins for utilization of DCM or glycine betaine in the cohabiting bacteria, supporting the previously held idea that they persist via necromass utilization.

Wednesday 23 November 2022

Minderoo OceanOmics Centre at UWA Grand Opening

The Grand Opening of the Minderoo OceanOmics Centre at UWA is only a day away! Join the launch of the Centre online from 4:40 to learn more about the inspiration and the vision behind this project, which aims to harness environmental DNA and genomics for marine conservation: https://lnkd.in/gCP4GAhs

You can find out a bit more about the Minderoo OceanOmics Centre at UWA here: https://lnkd.in/gmXKjXNu

And the broader Minderoo OceanOmics program here: https://lnkd.in/gtKHSk7g

Or get in touch if you want to know more!

Friday 21 October 2022

The Ocean Genomes Lab is hiring - Bioinformatics and Sequencing technicians wanted!

Adding to the recently advertised Sequencing technician posts (closing 27 October), we are now pleased to advertise two bioinformatics research assistant positions to support our creation of a marine vertebrate reference genome library. If you have experience with genome assembly or bioinformatics workflows, and are passionate about saving marine biodiversity, come and join us!

Two positions are available at Level 5 or 6, depending on your experience. Both roles will be providing bioinformatics support for our marine vertebrate reference genome project. You’ll get to play with data from the latest sequencing toys, including Illumina NovaSeq 6000, NextSeq 2000 and iSeq 100, the PacBio Sequel IIe, and ONT (probably PromethION and MinION).

Job roles will include developing and applying genome assembly workflows, data curation and QC, data sharing, and development/benchmarking of comparative genomics and genome assembly curation tools. If you have experience or passion for integrating bioinformatics workflows with Laboratory Information Management Systems and/or Electronic Laboratory Notebooks, we’d also love to hear from you. SQL database skills would not go amiss either.

We’re a new team with lots to do, so there is plenty of scope to make the position your own and play to your strengths.

The closing date for applications is 11:55 PM AWST on Thursday 10 November 2022.

To learn more about these opportunities, please click here or contact Rich Edwards at rich.edwards@uwa.edu.au.

About the team

The Minderoo OceanOmics Centre at UWA combines a joint Ocean Genomes Laboratory, an OceanOmics Laboratory, and Computational Biology Services.

Equipped with the latest high-throughput sequencing technology and in collaboration with global partners, the Ocean Genomes Laboratory will generate a comprehensive library of high quality marine vertebrate reference genome assemblies. All such reference genome data will be subject to rigorous QA/QC and all assemblies will be released publicly with open access.

The Ocean Genomes Laboratory will undertake research and development under the direction of Minderoo’s ambitious OceanOmics Program which has the goal of revolutionising ocean conservation through novel marine sampling and genomics approaches and scaling these to significantly advance our knowledge of marine life. The Ocean Genomes Laboratory and Computational Biology Services will include state of the art infrastructure including sample and eDNA preparation areas, flow cytometry, single cell sequencing equipment and the latest bioinformatics and computational biology tools.

Thursday 6 October 2022

The Ocean Genomes Laboratory is hiring!

The Minderoo OceanOmics Centre at UWA Ocean Genomes Laboratory is now hiring our technical team to support high throughput DNA sequencing and genome assembly. We currently have three "wet" lab positions going: a Sequencing Specialist Scientific Officer, and two Sequencing Technician positions. Both roles will be providing technical support in the lab, particularly with respect to all aspects of DNA sequencing (sample extraction, library preparation and setting up sequencing runs). You'll get to play with the latest sequencing toys, including Illumina NovaSeq 6000, NextSeq 2000 and iSeq 100, the PacBio Sequel IIe, and ONT (probably PromethION and MinION).

The closing date for applications is 11:55 PM AWST on Thursday 27 October 2022.

To learn more about these opportunities, please click on the links above or contact Rich Edwards at rich.edwards@uwa.edu.au. We will also be advertising some bioinformatics positions soon.

Monday 8 August 2022

Senior Postdoc wanted for UWA Ocean Genomes Lab! (Closing soon)

The new Ocean Genomes Laboratory (part of the Minderoo OceanOmics Centre at the UWA Oceans Institute) is hiring a Level B postdoc in marine genomics. (Three-year fixed term full time role, or flexible working equivalent.)

This is a rare opportunity to work as part of a collaborative team in a high-profile state of the art genomics research facility dedicated to studying marine vertebrates. You should have a PhD in bioinformatics, computational biology, molecular genetics or genomics, plus an interest in marine vertebrates and postdoctoral experience in high throughput DNA sequencing and whole genome assembly. The lab is new and there is plenty of scope to shape its direction beyond the core mission of creating a marine vertebrate reference genome library as part of the Vertebrate Genome Project. You will also have an important role in helping to supervise the lab staff and research team.

Closing date: 11:55pm AWST, Friday 12 August 2022

Please see the UWA job advert for more details.

About the team

The Minderoo OceanOmics Centre at UWA combines a joint Ocean Genomes Laboratory, an OceanOmics Laboratory, and a Computational Biology Program.

Equipped with the latest high-throughput sequencing technology, and in collaboration with global partners, the Ocean Genomes Laboratory will generate a comprehensive library of high-quality marine vertebrate reference genome assemblies. All reference genome data will be subject to rigorous QA/QC and all assemblies will be released publicly through open access.

The OceanOmics Centre will be located in the Bayliss Building on the UWA Crawley Campus, with OceanOmics staff sharing the building with research and teaching staff primarily from the UWA School of Molecular Sciences and interacting with staff in the UWA Oceans Institute in the nearby IOMRC building.

About the opportunity

As a Research Fellow you will join a research group committed to applying modern molecular biological methods to marine research.

Using modern genomic approaches, you will undertake research on marine vertebrates, focussed on the production, QC and assembly of high-quality reference genome data. You will participate in the entire workflow from sample collection and processing, generating genomic sequence data in the laboratory using multiple modern genome sequencing technologies, with a focus on data processing, assembly, curation, analysis and dissemination.

In this unique role you will also be supported to develop your leadership skills. Working closely with the Centre’s UWA Principal Research Fellow, junior postdoctoral academics, the Centre’s Laboratory Manager, and diverse researchers from Minderoo Foundation you will contribute to decision making, oversee the work of technicians and PhD students and provide leadership in modern high-quality genome assembly production and publication.

Friday 22 July 2022

The Edwards Lab is moving to the UWA Oceans Institute!

More details will follow but, in August, I will be starting a new position at the University of Western Australia Oceans Institute to head up the new Ocean Genomes Laboratory as part of the Minderoo OceanOmics Centre. This exciting project will collaborate closely with the Minderoo Foundation, the Vertebrate Genome Project, and scientists across Australia to create marine vertebrate reference genomes.

The goal of the Ocean Genomes Lab is "building and openly publishing the reference libraries for marine vertebrates ... to accurately detect, monitor and determine the health of these species". The lab is still being set up and we're hiring. Currently available is a Level B postdoc position: http://bit.ly/OceanOmics. If building genomes is your thing, and you want to help fight the biodiversity crisis in our oceans, come and work with me! (Or pass it on if you know someone who does!) Research Assistant positions will follow.

Look out for a bunch of updates over the next few weeks, both as I update some of the outstanding presentations and posters from this year, and as the website rebrands. In the meantime, please get in touch if any of this sounds interesting!

Wednesday 29 June 2022

The starling genome is out!

See the pre-print post for details.

Stuart KC*, Edwards RJ*, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL & Rollins LA (2022): Transcript- and annotation-guided genome assembly of the European starling. Molecular Ecology Resources 22(8):3141-3160. doi: 10.1111/1755-0998.13679. [*Joint first authors] [Mol Ecol Res] [PubMed] [bioRxiv]

The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second, North American, short-read genome assembly (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterization. S. vulgaris vAU combined 10× genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 6222 scaffolds (7.6 Mb scaffold N50, 94.6% busco completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Species-specific transcript mapping and gene annotation revealed good gene-level assembly and high functional completeness. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, saaga) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counterintuitive behaviour in traditional busco metrics, and present buscomp, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. This work expands our knowledge of avian genomes and the available toolkit for assessing and improving genome quality. The new genomic resources presented will facilitate further global genomic and transcriptomic analysis on this ecologically important species.

Saturday 23 April 2022

The Australian dingo is an early offshoot of modern breed dogs

Field MA, Yadav S, Dudchenko O, Esvaran M, Rosen BD, Skvortsova K, Edwards RJ, Keilwagen J, Cochran BJ, Manandhar B, Bustamante S, Rasmussen JA, Melvin RG, Chernoff B, Omer A, Colaric Z, Chan EKF, Minoche AE, Smith TPL, Gilbert MTP, Bogdanovic O, Zammit RA, Thomas T, Aiden EL & Ballard JWO (2022): The Australian dingo is an early offshoot of modern breed dogs. Science Advances 8(16):abm5944; DOI: 10.1126/sciadv.abm5944. [Sci Adv] [PubMed] [PDF]

Dogs are uniquely associated with human dispersal and bring transformational insight into the domestication process. Dingoes represent an intriguing case within canine evolution being geographically isolated for thousands of years. Here, we present a high-quality de novo assembly of a pure dingo (CanFam_DDS). We identified large chromosomal differences relative to the current dog reference (CanFam3.1) and confirmed no expanded pancreatic amylase gene as found in breed dogs. Phylogenetic analyses using variant pairwise matrices show that the dingo is distinct from five breed dogs with 100% bootstrap support when using Greenland wolf as the outgroup. Functionally, we observe differences in methylation patterns between the dingo and German shepherd dog genomes and differences in serum biochemistry and microbiome makeup. Our results suggest that distinct demographic and environmental conditions have shaped the dingo genome. In contrast, artificial human selection has likely shaped the genomes of domestic breed dogs after divergence from the dingo.

Sunday 13 February 2022

Edwards Lab at #LorneGenome 2022

Lorne Genome 2022 (the 43rd Annual Lorne Genome Conference) kicks off today in Lorne and online. I wasn’t able to make it in person this year due to Omicron and teaching commitments, but happily the lab is still well represented. As well as an online talk, we have two in-person posters, so please check these out if you are lucky enough to be attending in the flesh.

Details below.


A chromosome-level reference genome for Telopea speciosissima (New South Wales waratah) provides insight into waratah evolution (#138)

Stephanie H Chen, Jason G Bragg, Richard J Edwards

Telopea is an eastern Australian genus of five species of long-lived shrubs in the family Proteaceae. Previous work has characterised population structure and patterns of introgression between Telopea species. These studies were performed using a limited set of genetic markers, but point to the great potential of waratah as a model clade for understanding the processes of divergence, environmental adaptation and speciation, when enhanced by a genome-wide perspective enabled by a reference genome. However, few Proteaceae genomes and no waratah genomes are available.

We assembled the first chromosome-level reference genome for T. speciosissima (New South Wales waratah; 2n = 22) using Nanopore long-reads, 10x Chromium linked-reads and Hi-C data. The assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8 % of Embryophyta universal single-copy orthologues (BUSCOs; n = 1,614) complete. Read depth analysis of 140 ‘Duplicated’ BUSCO genes reveals that almost all are real duplications, increasing confidence in protein family analysis using annotated protein-coding genes, highlighting a possible need to revise the BUSCO set for this lineage. Genome annotation predicted 34,706 genes and pseudogenes, including 27,481 protein-coding genes. We examined the evolutionary dynamics of Telopea using the reference genome in conjunction with DArTseq (n = 244) and whole genome shotgun sequencing (n = 14) of each of the seven lineages; there are three lineages of T. speciosissima – coastal, upland and southern.

Here, I will discuss the population structure and demographic history of the genus. We also examined phylogenomic relationships and developed a scalable method of rapidly generating species trees from short-read data to maximise the recovery of informative data from genomic datasets. The waratah reference genome represents an important new genomic resource in Proteaceae to accelerate our understanding of the origins and evolutionary dynamics of the Australian flora.

[Read more about the waratah genome, here.]


Small but mitey: high-quality long-read assembly of a streamlined mite genome from contaminated sequencing data (#17)

Richard J Edwards, Stephanie H Chen, Jason G Bragg.

As pilot data for a project on myrtle rust resistance, we previously assembled two Myrtaceae genomes using 10x Chromium linked reads: Rhodamnia argentea (silver malletwood) and Syzygium oleosum (blue lilly pilly). Both draft genomes achieved scaffolding (N50 > 850 kb) and completeness (BUSCOv3 embryophyta_odb9 > 90 %) of sufficient quality to be annotated by NCBI RefSeq. However, signs of arthropod sequence contamination were subsequently found in the Rhodamnia argentea assembly. We therefore sought to identify and eliminate this contamination during improvement and curation of the genome for publication.

A risk-averse analysis highlighted 49.6 Mb (11.95%) on 2,996 of 15,781 scaffolds of possible arthropod origin. An improved assembly of the same tree, incorporating ~50X long-read (ONT) sequencing, has confirmed this contamination as 11 scaffolds (34.6 Mb) that are distinct from 75 R. argentea assembly scaffolds (346.7 Mb), increasing the likelihood of contamination over the integration of horizontally transferred genes. Taxonomic analysis of predicted protein-coding genes using Taxolotl (https://github.com/slimsuite/taxolotl) suggested that the contamination most likely originates from some form of mite (Order: Trombidiformes), but limited NCBInr mite sequences precluded better taxonomic resolution. Curiously, these contamination scaffolds showed a high depth of coverage (~36X), but a fairly low BUSCO completeness of 58.1% (v5 Augustus, metazoa_odb10 n=954), apparently inconsistent with typical mite genomes.

Phylogenomic analysis with available mite genomes identified the closest relative as Aculops lycopersici, a microscopic (0.2 mm long) eriophyoid mite with a heavily streamlined 32.5 Mb genome. Original low completeness appears to be from a combination of genome reduction and poor performance of that BUSCO version; BUSCO v5 MetaEuk eukaryota_odb10 (n=255) reports 82.8% completeness, which is approaching the 86.3% of A. lycopersici. Here, we discuss the evidence that we have assembled a highly complete but streamlined genome from an unknown eriophyoid mite, plus the need to improve genomic representation of contaminating pest species.


A genetic perspective on rapid adaptation in the globally invasive European starling (Sturnus vulgaris) (#255)

Katarina C Stuart, Richard J Edwards, William (Bill) B Sherwin, Lee Ann Rollins.

Few invasive birds are as globally successful or as well-studied as the common starling (Sturnus vulgaris). Native to the Palaearctic, the starling has been a prolific invader in North and South America, southern Africa, Australia, and the Pacific Islands, while facing declines in excess of 50% in some native regions. Starlings present an invaluable opportunity to test predictions about the evolutionary trajectory of invasive populations, and gain insight into genetic shifts in response to anthropogenic alteration and climate change.

My research focuses primarily on the invasive European starling population in Australia and aims to investigate the genetics underlying their evolution, using a range of genomic approaches. Through historic museum sample sequencing, I examine shifts in single nucleotide polymorphism variation between the native range and Australia, and find parallel selection on both continents, possibly resulting from common global selective forces such as pollutant and carbohydrate exposure. I further examine matched genetic, morphological, and environmental data to reveal patterns of heritability and plasticity across ecologically significant phenotypic traits, revealing that elevation, as well as rainfall and temperature variability, plays an important role in shaping morphology and genetics. Finally, I investigate patterns of structural variants, to uncover evolutionarily significant large-scale genetic variants across a global data set, and more specifically characterise their role in rapid starling adaptation across the entirety of the Australian range. Overall, my research seeks to better understand mechanisms and patterns of genetic change within this species, which may be used to inform invasion or native range management. More broadly, this evolutionary research into the starling provides an important perspective on the role of rapid evolution in invasive species persistence, and the global pressures that may shape range shifts and evolution across many similar avian taxa.

Tuesday 25 January 2022

Horizontal transposon transfer and its implications for the ancestral ecology of hydrophiine snakes

The first of the BABS Genome papers has finally arrived, featuring our two 10x Genomics Supernova snake genomes. Such is the speed that genomics is moving, the snake assemblies themselves have moved on quite a bit since then and we hope to release chromosome-level versions soon. (The goalposts for a genome paper moved faster than they could be written up - always a challenge without dedicated researchers working on assemblies! Do get in touch if they’d be useful and we can collaborate.)

Rather than a pure genome paper, this paper makes use of our two elapid genomes to ask some interesting questions about possible horizontal transfer of transposable (mobile genetic) elements during the evolution of sea snakes - our two elapids provided good sister (mainland tiger snake) and outgroup (eastern brown snake) taxa for the olive sea snake, which was the focus of the study. It was doubly pleasing to collaborate on a transposable elements paper, as they were the subject of my PhD (albeit in bacteria, see here and here).

This paper is part of a special issue, Mobile Elements in Phylogenomic Reconstructions, and features some interesting examples of probable horizontal transfer of mobile elements that provide insights into the evolutionary history of these species.


Galbraith JD, Ludington AJ, Sanders KL, Amos TG, Thomson VA, Enosi Tuipulotu D, Dunstan N, Edwards RJ, Suh A, Adelson DL (2022): Horizontal transposon transfer and its implications for the ancestral ecology of hydrophiine snakes. Genes 13(2):217. [Genes] [PDF] [bioRxiv]

Abstract

Transposable elements (TEs), also known as jumping genes, are sequences able to move or copy themselves within a genome. As TEs move throughout genomes they often act as a source of genetic novelty, hence understanding TE evolution within lineages may help in understanding environmental adaptation. Studies into the TE content of lineages of mammals such as bats have uncovered horizontal transposon transfer (HTT) into these lineages, with squamates often also containing the same TEs. Despite the repeated finding of HTT into squamates, little comparative research has examined the evolution of TEs within squamates. Here we examine a diverse family of Australo–Melanesian snakes (Hydrophiinae) to examine if the previously identified, order-wide pattern of variable TE content and activity holds true on a smaller scale. Hydrophiinae diverged from Asian elapids ~30 Mya and have since rapidly diversified into six amphibious, ~60 marine and ~100 terrestrial species that fill a broad range of ecological niches. We find TE diversity and expansion differs between hydrophiines and their Asian relatives and identify multiple HTTs into Hydrophiinae, including three likely transferred into the ancestral hydrophiine from fish. These HTT events provide the first tangible evidence that Hydrophiinae reached Australia from Asia via a marine route.

Friday 14 January 2022

The Waratah genome paper is out!

The final version of the waratah genome paper is now out in Molecular Ecology Resources. This was a fun collaboration with the Royal Botanic Gardens and Domain Trust as one of the pilot genomes for BioPlatforms Australia’s Genomics for Australian Plants (GAP) initiative.

You can read the press release here, or our piece in the Conversation, We’ve unveiled the waratah’s genetic secrets, helping preserve this Australian icon for the future.

In this paper, we present a chromosome-level assembly for the NSW State Floral Emblem, the New South Wales waratah, Telopea speciosissima. This joins macadamia as the 2nd reference genome for the Proteaceae family & should help future studies for the remaining ca. 1700 species.

The genome was assembled from an ONT chassis, scaffolded with 10x Genomics linked reads and Phase Genomics Hi-C - made possible thanks to quality data from AGRF and the Ramaciotti Centre for Genomics. The final assembly was chromosome-level, with 94.1% on the 11 chromosomes (2n = 22).

As well as the assembly itself, the paper presents three genomics tools that we hope will be helpful for other assemblies:

1. DepthSizer uses long-read depths and BUSCO predictions to estimate genome size. We estimated the waratah genome to be ca. 900 Mbp - bigger than kmer estimates, but smaller than flow cytometry of Tasmanian waratah.

2. Diploidocus builds on Purge Haplotigs, combining read depths, kmer frequencies & BUSCO predictions to classify and curate/filter assembly scaffolds. This decreases false duplications & contamination, and flags collapsed repeats for closer inspection.

3. DepthKopy uses BUSCO Complete genes to establish sequencing depth (like DepthSizer) and then estimates copy number for regions (e.g. genes), scaffolds & sliding windows of the assembly. This showed that most “Duplicated” BUSCOs are real duplicates.
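The depth-based idea shared by DepthSizer and DepthKopy can be sketched in a few lines of Python. This is only an illustrative simplification, not the tools' actual code: it assumes the single-copy depth can be approximated by the median depth across BUSCO Complete genes, and all function names and numbers below are hypothetical.

```python
from statistics import median


def single_copy_depth(busco_depths):
    """Approximate single-copy read depth as the median depth of BUSCO Complete genes.

    Assumption for illustration: a plain median stands in for whatever
    depth estimator the real tools use.
    """
    if not busco_depths:
        raise ValueError("need at least one BUSCO depth value")
    return median(busco_depths)


def estimate_genome_size(total_read_bases, sc_depth):
    """Genome size is roughly total sequenced bases divided by single-copy depth."""
    return total_read_bases / sc_depth


def estimate_copy_number(region_depth, sc_depth):
    """Copy number of a region is roughly its depth relative to single-copy depth."""
    return region_depth / sc_depth


# Hypothetical values, chosen only to make the arithmetic easy to follow.
busco_depths = [48.0, 50.0, 52.0, 49.0, 51.0]  # mean read depth per BUSCO gene
sc_depth = single_copy_depth(busco_depths)
size = estimate_genome_size(45_000_000_000, sc_depth)  # 45 Gb of long reads
cn = estimate_copy_number(100.0, sc_depth)  # a region at twice single-copy depth

print(f"Single-copy depth: {sc_depth:.1f}X")
print(f"Estimated genome size: {size / 1e6:.0f} Mb")
print(f"Estimated copy number: {cn:.1f}")
```

In this toy example, a "Duplicated" BUSCO gene with a copy number near 1 per copy would look like a real duplication, whereas a value near 0.5 per copy would point to a false duplication in the assembly.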


Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap JS, Sauquet H, Bourke G, Amos TG, Bragg JG & Edwards RJ (accepted): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. Molecular Ecology Resources.
[Mol Ecol Res] [bioRxiv]

Abstract

Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8% of Embryophyta BUSCOs “Complete”. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome. Read depth analysis of 180 “Duplicated” BUSCO genes using a new tool, DepthKopy (https://github.com/slimsuite/depthkopy), suggests almost all are real duplications, increasing confidence in the annotation and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.

If you want a read and don’t have access, please get in touch or check out the bioRxiv preprint.