Tuesday, 2 July 2019

#GSA2019 - BUSCOMP: BUSCO Compilation and Comparison for Assessing Completeness in Multiple Genome Assemblies

Richard J. Edwards

If you are at the Genetics Society of Australasia Conference 2019, then come and hear me talk at 11:30 in Symposium 4B – Genomics & Bioinformatics (1). If you cannot make it, or loved the talk so much you want to look at it again, the slides are available on F1000Research (Citation pending):

  • Edwards RJ (2019): BUSCOMP: BUSCO compilation and comparison – Assessing completeness in multiple genome assemblies [version 1; not peer reviewed]. F1000Research 8:995 (slides)
    (doi: 10.7490/f1000research.1116972.1)


Advances in DNA sequencing technology and free availability of bioinformatics tools have placed de novo genome assembly of complex organisms firmly in the domain of individual labs and small consortia. Nevertheless, the assemblies produced are often fragmented and incomplete. Optimal assembly depends on the size, repeat landscape, ploidy and heterozygosity of the genome, which are often unknown. It is therefore common practice to try multiple strategies, and there is a bottleneck in assessing and comparing assemblies.

BUSCO [1] is a powerful and popular tool that estimates genome completeness using gene prediction and curated models of single-copy protein orthologues. BUSCO assessments combine genome completeness, contiguity, and accuracy to rate genes as “Complete (Single Copy)”, “Duplicated”, “Fragmented” or “Missing”. However, results can be counterintuitive and lack robustness when comparing multiple assemblies of the same genome. Adding/removing scaffolds can alter the BUSCO genes returned by the rest of the assembly [2], while low sequence quality may reduce “completeness” scores and miss genes that are present in the assembly [3].

BUSCOMP (BUSCO Compilation and Comparison) is designed to complement BUSCO and identify/overcome these issues. BUSCOMP first compiles a non-redundant maximal set of the highest-scoring single-copy complete sequences for as many BUSCO genes as possible. These are then searched against assemblies using Minimap2 [4], converted into global alignment statistics, and used to robustly re-rate genes as Complete (Single/Duplicated), Fragmented/Partial or Missing. On test data from three organisms (yeast, cane toad and mainland tiger snake), BUSCOMP (1) gives consistent results when re-running the same assembly, (2) is not affected by adding or removing non-BUSCO-containing scaffolds, and (3) is minimally affected by assembly quality. This makes BUSCOMP ideal to run alongside BUSCO when trying to compare and rank genome assemblies, even in the absence of error-correction.
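The compilation step described above can be sketched in a few lines. This is a minimal illustration, not BUSCOMP's actual code: the record layout (assembly, BUSCO ID, status, score, length) and the function name `compile_best_buscos` are assumptions made for the example, and breaking ties by sequence length is an arbitrary choice.

```python
from typing import NamedTuple


class BuscoHit(NamedTuple):
    """One BUSCO rating for one gene in one assembly (hypothetical layout)."""
    assembly: str
    busco_id: str
    status: str   # "Complete", "Duplicated", "Fragmented" or "Missing"
    score: float
    length: int


def compile_best_buscos(hits):
    """Return {busco_id: best single-copy Complete hit} across all assemblies."""
    best = {}
    for hit in hits:
        if hit.status != "Complete":
            continue  # only single-copy complete genes are compiled
        current = best.get(hit.busco_id)
        # Keep the highest score; break ties by longer sequence (assumption).
        if current is None or (hit.score, hit.length) > (current.score, current.length):
            best[hit.busco_id] = hit
    return best


# Made-up ratings for three genes across two assemblies.
hits = [
    BuscoHit("asm1", "geneA", "Complete", 512.3, 410),
    BuscoHit("asm2", "geneA", "Complete", 530.1, 415),
    BuscoHit("asm1", "geneB", "Fragmented", 120.0, 98),
    BuscoHit("asm2", "geneB", "Complete", 300.5, 250),
    BuscoHit("asm1", "geneC", "Duplicated", 700.0, 600),
]
compiled = compile_best_buscos(hits)
for busco_id, hit in sorted(compiled.items()):
    print(busco_id, hit.assembly, hit.score)
```

In this toy input, geneC is never single-copy complete in any assembly, so it is left out of the compiled set; the real tool would then search the compiled sequences against every assembly with Minimap2.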

BUSCOMP is freely available at https://github.com/slimsuite/buscomp under a GNU GPL v3 license.

  1. Simão FA et al. (2015) Bioinformatics 31:3210–3212
  2. Edwards RJ et al. (2018) F1000Research 7:753
  3. Edwards RJ et al. (2018) GigaScience 7:giy095
  4. Li H (2018) Bioinformatics 34:3094–3100

Thursday, 6 June 2019

PhD available: Developing genomic resources to advance the molecular ecology of invasions

Expressions of interest are now open for a Scientia PhD Scholarship, in collaboration between the Edwards Lab, Lee Ann Rollins, and Marc Wilkins:

Developing genomic resources to advance the molecular ecology of invasions

These are exciting four-year scholarships to start in 2020 with full fees covered, a generous stipend, and career development funds. Please follow the expression of interest link below or get in touch if you want to know more!

Closing date: 12 July 2019.
Location: University of New South Wales, Sydney, Australia


Invasive species pose a major challenge to biodiversity worldwide but also provide the unique opportunity to study evolution in action. Rapid changes are often associated with invaders’ introduction to novel environments. Understanding how molecular mechanisms drive these changes enables the creation of innovative solutions to controlling invasions and managing native species’ response to climatic change. The iconic Australian cane toad invasion is one of the best studied globally and is an emerging model for invasion genomics. This project will use whole genome sequencing, novel bioinformatic approaches and proteomics to identify molecular drivers of invasion success.


We seek a highly motivated, curiosity-driven student with an interest in evolutionary biology and bioinformatics, who would like to understand why invasive species flourish. Ideally, candidates will have demonstrated computer literacy and be willing to learn new approaches to analysing genomic and proteomic data. Strong writing skills will be an asset. We will consider applicants coming from either a computing background who want to work in evolutionary biology or those with evolutionary biology backgrounds and a keen interest in bioinformatics. The supervisory team offer a high level of support in the fields of evolutionary biology, genomics, proteomics and bioinformatics. This project provides the opportunity for the successful applicant to develop the most current skills and build a successful career in these fields while contributing to solutions for two major global issues: loss of biodiversity and species’ response to climate change.


  • Dr Lee Ann Rollins, UNSW Scientia Fellow
  • Dr Richard Edwards, Senior Lecturer in Genomics & Bioinformatics
  • Prof Marc Wilkins, Professor of Systems Biology


Complete an expression of interest at: https://www.scientia.unsw.edu.au/scientia-phd-scholarships/developing-genomic-resources-advance-molecular-ecology-invasions

The strongest expressions will receive an invitation to submit a full application to the scholarship competition.


To discuss the project and related opportunities, please contact Lee Ann Rollins (l.rollins[at]unsw.edu.au / @rollins_lee) or Rich Edwards (richard.edwards[at]unsw.edu.au / @slimsuite).

Thursday, 23 May 2019

Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing

Song W, Thomas T & Edwards RJ (2019) Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing. Marine Genomics in press DOI: 10.1016/j.margen.2019.05.002



High-quality, completed genomes are important to understand the functions of marine bacteria. PacBio sequencing technology provides a powerful way to obtain high-quality completed genomes. However, individual library production is currently still costly, limiting the utility of the PacBio system for high-throughput genomics. Here we investigate how to generate high-quality genomes from pooled marine bacterial genomes.


Pooled genomic DNA from 10 marine bacteria was subjected to a single library production and sequenced with eight SMRT cells on the PacBio RS II sequencing platform. In total, 7.35 Gbp of long-read data was generated, which is equivalent to approximately 168× average coverage for the input genomes. Genome assembly showed that eight genomes with average nucleotide identities (ANI) lower than 91.4% can be assembled to high quality and completeness using standard assembly algorithms (e.g. HGAP or Canu). A reference-based read phasing step was developed and incorporated to assemble the complete genomes of the remaining two marine bacteria that had an ANI > 97% and whose initial assemblies were highly fragmented.
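As a rough check on the coverage figure, average coverage is the total number of sequenced bases divided by the combined length of the input genomes. The sketch below inverts that relationship to show the combined genome size implied by the two reported numbers. The function name is illustrative, and the per-genome figure assumes the ten genomes are of equal size, which the abstract does not state.

```python
def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Combined genome length (bp) implied by total yield and mean coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases / coverage


total_bases = 7.35e9   # 7.35 Gbp of long-read data (from the abstract)
coverage = 168         # approximate average coverage (from the abstract)
n_genomes = 10         # number of pooled bacteria (from the abstract)

combined = implied_genome_size(total_bases, coverage)
print(f"Implied combined genome size: {combined / 1e6:.2f} Mbp")
# Assumption: equal genome sizes, used only to give a feel for the scale.
print(f"Mean per genome: {combined / n_genomes / 1e6:.2f} Mbp")
```

The implied combined size of roughly 44 Mbp is in a plausible range for ten bacterial genomes.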


Ten complete high-quality genomes of marine bacteria were generated. The findings and developments made here, including the reference-based read phasing approach for the assembly of highly similar genomes, can be used in the future to design strategies to sequence pooled genomes using long-read sequencing.

Friday, 10 May 2019

Preprint: Whole genome sequencing of a novel, dichloromethane-fermenting Peptococcaceae from an enrichment culture

Holland SI*, Edwards RJ*, Ertan H, Wong YK, Russell TL, Deshpande NP, Manefield M & Lee MJ (2019) Whole genome sequencing of a novel, dichloromethane-fermenting Peptococcaceae from an enrichment culture. PeerJ Preprints 7:e27718v1. DOI 10.7287/peerj.preprints.27718v1 [*Joint first authors]


Bacteria capable of dechlorinating the toxic environmental contaminant dichloromethane (DCM, CH2Cl2) are of great interest for potential bioremediation applications. A novel, strictly anaerobic, DCM-fermenting bacterium, “DCMF”, was enriched from organochlorine-contaminated groundwater near Botany Bay, Australia. The enrichment culture was maintained in minimal, mineral salt medium amended with dichloromethane as the sole energy source. PacBio whole genome SMRT™ sequencing of DCMF allowed de novo, gap-free assembly despite the presence of cohabiting organisms in the culture. Illumina sequencing reads were utilised to correct minor indels. The single, circularised 6.44 Mb chromosome was annotated with the IMG pipeline and contains 5,773 predicted protein-coding genes. Based on 16S rRNA gene and predicted proteome phylogeny, the organism appears to be a novel member of the Peptococcaceae family. The DCMF genome is large in comparison to known DCM-fermenting bacteria and includes 96 predicted methylamine methyltransferases, which may provide clues to the basis of its DCM metabolism. Full annotation has been provided in a custom genome browser and search tool, in addition to multiple sequence alignments and phylogenetic trees for every predicted protein, available at http://www.slimsuite.unsw.edu.au/research/dcmf/.

Monday, 18 March 2019

Term 2 Honours projects available

For any students completing their undergrad studies in UNSW Term 1, or external students finishing before June 2019, the UNSW Term 2 honours student application is now open. The deadline for Term 2 applications is 5pm Friday 12th April 2019. For how to apply, please check the Faculty link: http://www.science.unsw.edu.au/honours-apply.

As usual, the EdwardsLab is looking to recruit enthusiastic students in genomics, in two main areas:

  1. Comparative genomics and molecular evolution in yeast, using long-read PacBio sequencing. We are particularly keen to get a student to work on aspects of our ARC Linkage grant, investigating the evolution of a novel biochemical pathway in yeast.

  2. De novo whole genome sequencing of vertebrates, including two snakes and the cane toad. In addition to general assembly and annotation activities, the lab has a few analytical tools that would benefit from

More details of honours can be found on the BABS website, or please get in touch if you have questions about specific projects. We welcome students interested in any of our Research areas, not just the projects listed. Applications from non-UNSW students are also encouraged.

Term 3 applications

Please also note the T3 honours application opening date and deadline below, should you wish to apply for the T3 intake:

  • Applications open from Friday 17th May 2019

  • Deadline for Term 3 applications is 5pm Friday 26th July 2019

Wednesday, 16 January 2019

We have a new 10x Supernova assembly lab record

One of the sequencing approaches we are playing with at the moment is 10x Chromium linked reads, assembled with Supernova. We’ve now done a bunch of different species, and they mostly come out pretty decent - and excellent value for money. As part of the Oz Mammals Genomics project, we are currently assembling several rock-wallabies. One of them, Petrogale mareeba, has just broken our lab record for the most intact assembly - initial “pseudohaploid” stats:

  • Total length of sequences: 3,329,538,806
  • Max. length of sequences: 275,258,067
  • N50 length of sequences: 47,482,977
  • L50 count of sequences: 20
  • Gap (N) length: 40,183,160 (1.21%)

Over half of the 3.3 Gb genome is covered by just twenty scaffolds, each at least 47.5 Mb in length - the longest is over 275 Mb! (We normally feel pretty happy to get an N50 of a few Mb.) Cuter than a cane toad too!
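For anyone unfamiliar with the N50 and L50 statistics quoted above, they can be computed from a list of scaffold lengths as in the sketch below. The function name and the toy lengths are made up for illustration (they are not the wallaby scaffolds), and the code uses the common convention that N50 is the length of the scaffold at which the running total first reaches half of the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)   # longest first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' sequences were needed (the L50).
            return length, count


# Toy example: total length 300, so half is 150.
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy)
print(n50, l50)
```

Here the two longest sequences (80 + 70) reach half of the total, so the N50 is 70 and the L50 is 2. In the same way, an L50 of 20 and N50 of 47.5 Mb mean the twenty longest scaffolds together cover at least half of the assembly.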

Photo credit: Richard.Fisher CC BY 2.0