Tuesday, 10 December 2019

Edwards Lab at GIW/ABACBS2019

The Edwards Lab and close affiliates have five posters at GIW-ABACBS2019 this year. Come and check them out during the two poster sessions:


Poster #26: Comparative performance of long-read whole genome assembly tools in diploid eukaryotes

Åsa Pérez-Bercoff, Paris Thompson & Richard J. Edwards [PDF]

As long-read (single molecule) sequencing from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) is getting more common, more long-read assemblers are emerging. Whilst a few benchmarking studies have been performed, there is no clear “best” assembly tool, and the choice is often a combination of compute resource availability and anecdotal reports of relative performance in similar organisms. Here, we have de novo assembled PacBio long-read sequencing data from 13 Saccharomyces cerevisiae (baker’s yeast) strains (11 diploid, 1 haploid and 1 tetraploid) using four different assemblers: Canu v1.8, Flye v2.4.2, WTDBG2 (a.k.a. Redbean) v2.4 and Ra v20181211. Assembly performance statistics have been generated by comparing assemblies to the reference yeast genome (SGD R64.2.1) using QUAST v5.0.2, and rating each assembly for accuracy, coverage, and contiguity.

The latest generation of assembly tools decide how to process reads based on a genome size parameter. For heterozygous diploid organisms, it is not clear whether this should be the haploid or diploid genome size, or something in-between based on heterozygosity. In this study, we used three genome size settings: haploid (13.1Mb), diploid (26.2Mb) and half-way in-between (19.65Mb). This will enable us to establish how the genome size parameter influences the quality of the resulting genome assembly.
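The three settings are simple arithmetic on the haploid genome size. A minimal sketch, assuming sizes are held as integer base pairs; the function name and dictionary keys are illustrative only and are not part of any assembler's interface:

```python
def genome_size_settings(haploid_bp):
    """Return the haploid, diploid and half-way genome sizes in base pairs."""
    diploid_bp = 2 * haploid_bp
    # Integer midpoint between the haploid and diploid sizes.
    midpoint_bp = (haploid_bp + diploid_bp) // 2
    return {"haploid": haploid_bp, "diploid": diploid_bp, "midpoint": midpoint_bp}


# Yeast haploid genome size quoted above: 13.1 Mb.
settings = genome_size_settings(13_100_000)
for name, size_bp in settings.items():
    print(f"{name}: {size_bp / 1e6:.2f} Mb")
```

Working in integer base pairs avoids floating-point rounding when the sizes are later passed on as parameters.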


Poster #33: Estimating genome size using long read depth profiles and single copy regions of draft genome assemblies

Timothy G Amos, Ziying Zhang & Richard J Edwards [PDF]

Estimating the size of a eukaryote genome is a fundamental task in genome assembly. As well as informing decisions on sequencing technology and depth, greater accuracy in genome size prediction can assist in assessing the completeness and duplication of a genome assembly. Genome sizes can be estimated through both experimental and genome sequencing approaches. Experimental methods include densitometry, flow cytometry analysis of stained nuclei and quantitative PCR (qPCR) of single copy genes. Genome sequencing approaches include short-read k-mer distributions. However, these methods can give variable results, and are prone to inaccurate predictions for genomes with abundant repetitive sequences.

Here, we show that the read depth of long reads in a draft genome can be used to provide a relatively accurate estimate of genome size across various model organisms. In general, diploid genome assemblies will consist of regions of haploid, diploid and incorrect (or multi-copy organelle) depth. Repeats and assembly errors are likely to be highly variable in terms of genome vs assembly copy number and, therefore, coverage. The modal read depth of conserved single copy orthologues should therefore approximate the sequencing depth of the input data, which can be extrapolated to estimate genome size. We found that our method can provide comparable, if not more accurate, estimates than short-read k-mer distributions.
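The core calculation can be sketched as follows. This is a rough illustration only, assuming the total number of sequenced bases and a list of read depths over single copy orthologues have already been computed; the function names, the binning choice and the toy numbers are hypothetical and are not taken from the poster's implementation:

```python
from collections import Counter


def modal_depth(depths, bin_width=1):
    """Return the most common read depth after rounding depths into bins."""
    binned = [int(round(d / bin_width)) * bin_width for d in depths]
    counts = Counter(binned)
    best = max(counts.values())
    # Break ties deterministically by taking the lowest tied depth.
    return min(depth for depth, n in counts.items() if n == best)


def estimate_genome_size(total_read_bases, single_copy_depths):
    """Estimate genome size as total sequenced bases / modal single-copy depth."""
    depth = modal_depth(single_copy_depths)
    if depth <= 0:
        raise ValueError("modal depth must be positive")
    return total_read_bases / depth


# Toy data: most single copy genes sit near 50X, with a collapsed
# region (about 100X) and a half-depth region (about 25X) as outliers.
toy_depths = [49.6, 50.2, 50.4, 25.1, 50.0, 101.3, 49.9]
toy_size = estimate_genome_size(655_000_000, toy_depths)
print(f"Estimated genome size: {toy_size / 1e6:.1f} Mb")
```

Using the mode rather than the mean keeps half-depth and collapsed-repeat regions from skewing the depth estimate.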


Poster #92: Using genomics to reveal drivers of invasion success

Katarina Stuart, Lee Ann Rollins, William Sherwin, Richard Edwards, Natalie Hofmeister & Yuanyuan Cheng [PDF]

Invasive species are a global concern due to their negative impacts on the economy and local ecosystems. However, well-documented invasions provide a useful system in which to pose biologically interesting questions regarding short time scale evolution. Answering these questions will further our knowledge of evolutionary mechanisms, as well as inform specific management strategies for the invasive population. The European starling (Sturnus vulgaris) is a global pest that was introduced into Australia’s south-eastern states in the 1860s and has since greatly expanded its range. Previous research on multiple introduced starling populations has demonstrated that their morphology has undergone subtle shifts following colonisation. My research applies a range of sequencing techniques to investigate genomic variation across Australia’s starling population. We are combining long-, short- and linked-read whole genome sequencing to assemble a high-quality starling reference genome. This genome will be annotated using Iso-seq long-read (PacBio) whole transcriptome sequencing in order to properly identify putatively evolving loci. Population sequencing data (whole genome and DArTSeq) will be mapped onto this reference to identify functional SNPs and reveal potential drivers of the rapid phenotypic divergences across the Australian population.


Poster #93: Advancing genomic resources for myrtle rust research and management

Stephanie Chen, Jason Bragg & Richard Edwards [PDF]

Myrtle rust is a plant disease caused by an invasive fungal pathogen (Austropuccinia psidii) first detected in Australia in 2010. Over 350 native species from the family Myrtaceae, which includes eucalypts, paperbarks, tea-trees, and lillipillies, are known hosts. Detailed genetic information needed for an effective and coordinated response encompassing conservation management is lacking. We performed de novo genome assembly and annotation for two species which exhibit a spectrum of resistance – Rhodamnia argentea (malletwood) and Syzygium oleosum (blue lilly pilly). Reference genomes have been generated using 10X linked reads, and are being supplemented with long reads and a hybrid assembly approach. These genomes will complement reduced representation sequencing (DArTseq) and rust resistance assays from different genotypes across the landscape. Together, these data will facilitate the characterisation of genetic structure and rust resistance across space as well as within and among species. This research is crucial for the management of at-risk species through optimising methods of improving disease resistance and adaptation to future climates in addition to increasing our understanding of myrtle rust which is a pressing concern to native biodiversity.


Poster #168: BUSCOMP: BUSCO Compilation and Comparison for Assessing Completeness in Multiple Genome Assemblies

Richard J. Edwards [PDF]

Advances in DNA sequencing technology and bioinformatics tools have placed de novo genome assembly of complex organisms firmly in the domain of individual labs and small consortia. Nevertheless, the assemblies produced are often fragmented and incomplete. Optimal assembly depends on the size, repeat landscape, ploidy and heterozygosity of the genome, which are often unknown. It is therefore common practice to try multiple strategies, and there is a bottleneck in assessing and comparing assemblies.

BUSCO [1] is a powerful and popular tool that estimates genome completeness using gene prediction and curated models of single-copy protein orthologues. However, results can be counterintuitive: adding/removing scaffolds can alter BUSCO predictions elsewhere in the assembly, while low sequence quality may reduce “completeness” scores and miss genes that are present in the assembly [2].

BUSCOMP (BUSCO Compilation and Comparison) complements BUSCO to identify/overcome these issues. BUSCOMP compiles a non-redundant set of the highest-scoring single-copy BUSCO complete sequences, rapidly searches these against assemblies, and robustly re-rates genes as Complete (Single/Duplicated), Fragmented/Partial or Missing. On test data from three organisms (yeast, cane toad and mainland tiger snake), BUSCOMP (1) gives consistent results when re-running the same assembly, (2) is not affected by adding or removing non-BUSCO-containing scaffolds, and (3) is minimally affected by assembly quality. This makes BUSCOMP ideal to run alongside BUSCO when trying to compare and rank genome assemblies, even in the absence of error-correction.

Available at: https://github.com/slimsuite/buscomp.

  1. Simão FA et al. (2015) Bioinformatics 31:3210–3212
  2. Edwards RJ et al. (2018) GigaScience 7:giy095

Tuesday, 9 July 2019

We’ve been funded! ARC Linkage - Optimising plant populations for ecological restoration and resilience

We are very happy to report another successful ARC Linkage grant application.

LP180100721: Optimising plant populations for ecological restoration and resilience

Dr Richard Edwards; Professor Justin Borevitz; Dr Jason Bragg; Dr Maurizio Rossetto; Dr Brett Summerell; Dr Marlien van der Merwe

When choosing individual plants for restoration populations, there is potentially a trade-off between maximising genetic diversity (‘adaptability’) and selection for desirable properties (‘adaptation’). This project aims to develop pioneering methods to quantify this trade-off, and facilitate the design of optimised populations, with a focus on two Australian rainforest trees that are being impacted by myrtle rust infection: Rhodamnia argentea and Rhodamnia rubescens. By studying the genetic variation in each species, and how this relates to myrtle rust resistance and climate, this project aims to design populations that are genetically diverse, maximally resistant to myrtle rust, and adapted to future climate.

We are collaborating with the Royal Botanic Garden and Domain Trust to apply genomics to challenges of conservation for rainforest trees in the face of climate change and invasive pathogens.

There will be job and studentship opportunities associated with this grant, so watch this space (or get in touch)!

Friday, 5 July 2019

Research snapshot - July 2019

One of the most important, interesting and challenging questions in biology is how new traits evolve at the molecular level. My lab employs sequence analysis techniques to interrogate protein and DNA sequences for the signals left behind by evolution. We are a bioinformatics lab but like to incorporate bench data through collaboration wherever possible.

Main Research

The core research in the lab is broadly divided into two main themes:

1. Evolutionary Genomics.

Since moving to UNSW, a major focus of the lab has been the exploitation of genomic and post-genomic data to understand biological function and adaptation to novel environments. We work closely with the Ramaciotti Centre for Genomics and are involved in numerous de novo whole genome sequencing and assembly projects, using short read (Illumina), long read (PacBio & Nanopore) and linked read (10x Chromium) sequencing. The biggest of these are our roles leading the bioinformatics and assembly effort in a consortium to sequence the cane toad genome, and leading the BABS Genome project to sequence two iconic Australian snakes. We are a member of the Oz Mammals Genomics initiative, assisting with the sequencing and assembly of Australia’s unique marsupial fauna. In 2018, we were selected as part of a team to sequence the Waratah genome as part of the pilot phase for the new Genomics of Australian Plants initiative.

We enjoy bringing our bioinformatics to bear on a variety of collaborative research projects. Most notably, we have an ARC Linkage Grant with Microbiogen Pty Ltd to understand how a strain of Saccharomyces cerevisiae has evolved to efficiently use xylose as a sole carbon source: something vital for second-generation biofuel production that wild yeast cannot do. We are combining comparative genomics, evolutionary genetics, RNA-Seq transcriptomics, and competition assays to understand how the novel metabolism evolved. Through deep Illumina resequencing of evolving populations, and assembling reliable complete genomes of the founding ancestors, the ultimate goal is to trace how mutations have interacted with existing genetic variation during adaptive evolution. More recently, we have received an ARC Linkage Grant with the Royal Botanic Gardens and Domain Trust, to apply genomics approaches to the challenges of rainforest tree conservation in the face of climate change and invasive pathogens. We are also collaborating with industrial and academic partners to de novo sequence, assemble, annotate and interrogate the genomes of a selection of microbes with interesting metabolic abilities.

2. Short Linear Motifs (SLiMs).

Many protein-protein interactions are mediated by Short Linear Motifs (SLiMs): short stretches of proteins (5-15 amino acids long), of which only a few positions are critical to function. These motifs are vital for biological processes of fundamental importance, acting as ligands for molecular signalling, post-translational modifications and subcellular targeting. SLiMs have extremely compact protein interaction interfaces, generally encoded by less than 4 major affinity-/specificity-determining residues. Their small size enables high functional density and evolutionary plasticity, making them frequent products of convergent “ex nihilo” evolution. It also makes them challenging to identify, both experimentally and computationally. A major focus of the lab is the computational prediction of SLiMs from protein sequences. Of particular interest is the study of rapidly evolving pathogens that exploit host SLiMs and use them to hijack cellular processes. Methods are made available through the SLiMSuite bioinformatics package and webservers.

A major focus of the lab is the computational prediction of SLiMs from protein sequences. This research originated with Rich’s postdoctoral research, during which he developed sequence analysis methods for the rational design of biologically active short peptides. He subsequently developed SLiMDisc, one of the first algorithms for successfully predicting novel SLiMs from sequence data - and coined the term “SLiM” into the bargain. This subsequently led to the development of SLiMFinder, the first SLiM prediction algorithm able to estimate the statistical significance of motif predictions. SLiMFinder greatly increased the reliability of predictions. SLiMFinder has since spawned a number of motif discovery tools and webservers and is still arguably the most successful SLiM prediction tool on benchmarking data. Methods are made available through the SLiMSuite bioinformatics package and webservers.

Current research is looking to develop these SLiM prediction tools further and apply them to important biological questions. Of particular interest is the molecular mimicry employed by viruses to interact with host proteins and the role of SLiMs in other diseases, such as cancer. Other work is concerned with the evolutionary dynamics of SLiMs within protein interaction networks.

OTHER RESEARCH PROJECTS

In addition to the main research in the lab, we have a number of interdisciplinary collaborative projects applying bioinformatics tools and molecular evolution theory to experimental biology, often using large genomic, transcriptomic and/or proteomic datasets. These projects often involve the development of bespoke bioinformatics pipelines, and a number of open source bioinformatics tools have been generated as a result. Please see the Publications and Lab software pages for more detail, or get in touch if something catches your eye and you want to find out more. We frequently have small collaborations and/or undergraduate student research projects. Many of these are “on hold” waiting for the right person, or sometimes data, to come along. If you think that you have what it takes, get in touch!

Tuesday, 2 July 2019

#GSA2019 - BUSCOMP: BUSCO Compilation and Comparison for Assessing Completeness in Multiple Genome Assemblies

Richard J. Edwards

If you are at the Genetics Society of Australasia Conference 2019, then come and hear me talk at 11:30 in Symposium 4B – Genomics & Bioinformatics (1). If you cannot make it, or loved the talk so much you want to look at it again, the slides are available on F1000Research:

  • Edwards RJ (2019): BUSCOMP: BUSCO compilation and comparison – Assessing completeness in multiple genome assemblies [version 1; not peer reviewed]. F1000Research 8:995 (slides)
    (doi: 10.7490/f1000research.1116972.1)

Abstract

Advances in DNA sequencing technology and free availability of bioinformatics tools have placed de novo genome assembly of complex organisms firmly in the domain of individual labs and small consortia. Nevertheless, the assemblies produced are often fragmented and incomplete. Optimal assembly depends on the size, repeat landscape, ploidy and heterozygosity of the genome, which are often unknown. It is therefore common practice to try multiple strategies, and there is a bottleneck in assessing and comparing assemblies.

BUSCO [1] is a powerful and popular tool that estimates genome completeness using gene prediction and curated models of single-copy protein orthologues. BUSCO assessments combine genome completeness, contiguity, and accuracy to rate genes as “Complete (Single Copy)”, “Duplicated”, “Fragmented” or “Missing”. However, results can be counterintuitive and lack robustness when comparing multiple assemblies of the same genome. Adding/removing scaffolds can alter the BUSCO genes returned by the rest of the assembly [2], while low sequence quality may reduce “completeness” scores and miss genes that are present in the assembly [3].

BUSCOMP (BUSCO Compilation and Comparison) is designed to complement BUSCO and identify/overcome these issues. BUSCOMP first compiles a non-redundant maximal set of the highest-scoring single-copy complete sequences for as many BUSCO genes as possible. These are then searched against assemblies using Minimap2 [4], converted into global alignment statistics, and used to robustly re-rate genes as Complete (Single/Duplicated), Fragmented/Partial or Missing. On test data from three organisms (yeast, cane toad and mainland tiger snake), BUSCOMP (1) gives consistent results when re-running the same assembly, (2) is not affected by adding or removing non-BUSCO-containing scaffolds, and (3) is minimally affected by assembly quality. This makes BUSCOMP ideal to run alongside BUSCO when trying to compare and rank genome assemblies, even in the absence of error-correction.

BUSCOMP is freely available at https://github.com/slimsuite/buscomp under a GNU GPL v3 license.

  1. Simão FA et al. (2015) Bioinformatics 31:3210–3212
  2. Edwards RJ et al. (2018) F1000Research 7:753
  3. Edwards RJ et al. (2018) GigaScience 7:giy095
  4. Li H (2018) Bioinformatics 34:3094-3100

Thursday, 6 June 2019

PhD available: Developing genomic resources to advance the molecular ecology of invasions

Expressions of interest are now open for a Scientia PhD Scholarship, in collaboration between the Edwards Lab, Lee Ann Rollins, and Marc Wilkins:

Developing genomic resources to advance the molecular ecology of invasions

These are exciting four-year scholarships to start in 2020 with full fees covered, a generous stipend, and career development funds. Please get in touch if you want to know more!

Closing date: 12 July 2019.
Location: University of New South Wales, Sydney, Australia

PROJECT DESCRIPTION

Invasive species pose a major challenge to biodiversity worldwide but also provide the unique opportunity to study evolution in action. Rapid changes are often associated with invaders’ introduction to novel environments. Understanding how molecular mechanisms drive these changes enables the creation of innovative solutions to controlling invasions and managing native species’ response to climatic change. The iconic Australian cane toad invasion is one of the best studied globally and is an emerging model for invasion genomics. This project will use whole genome sequencing, novel bioinformatic approaches and proteomics to identify molecular drivers of invasion success.

IDEAL CANDIDATE

We seek a highly motivated, curiosity-driven student with an interest in evolutionary biology and bioinformatics, who would like to understand why invasive species flourish. Ideally, candidates will have demonstrated computer literacy and be willing to learn new approaches to analysing genomic and proteomic data. Strong writing skills will be an asset. We will consider applicants coming from either a computing background who want to work in evolutionary biology or those with evolutionary biology backgrounds and a keen interest in bioinformatics. The supervisory team offer a high level of support in the fields of evolutionary biology, genomics, proteomics and bioinformatics. This project provides the opportunity for the successful applicant to develop the most current skills and build a successful career in these fields while contributing to solutions for two major global issues: loss of biodiversity and species’ response to climate change.

SUPERVISORY TEAM:

  • Dr Lee Ann Rollins, UNSW Scientia Fellow
  • Dr Richard Edwards, Senior Lecturer in Genomics & Bioinformatics
  • Prof Marc Wilkins, Professor of Systems Biology

HOW TO APPLY:

Complete an expression of interest at: https://www.scientia.unsw.edu.au/scientia-phd-scholarships/developing-genomic-resources-advance-molecular-ecology-invasions

The strongest expressions will receive an invitation to submit a full application to the scholarship competition.

CONTACT

To discuss the project and related opportunities, please contact Lee Ann Rollins (l.rollins[at]unsw.edu.au / @rollins_lee) or Rich Edwards (richard.edwards[at]unsw.edu.au / @slimsuite).

Monday, 3 June 2019

Anissa Benkaza (undergrad student)

Anissa Benkaza is a final year genetics student in the School of BABS. She is volunteering in the lab to annotate snake venom proteins as part of the BABS Genome Project.

Monday, 27 May 2019

SLiMSuite Short Linear Motif discovery and analysis: SLiMSuite release v1.8.1 (2019-05-27)

SLiMSuite release v1.8.1 (2019-05-27) is now on GitHub and Zenodo: Edwards RJ (2019): SLiMSuite v1.8.1 (2019-05-27). Zenodo DOI: 10....

Stephanie Chen (PhD student)

Stephanie H. Chen joined the Edwards Lab at UNSW as a PhD student in May 2019. She is working on a collaborative project with the Royal Botanic Garden and Domain Trust and is co-supervised by Richard Edwards and Jason Bragg. Her research focuses on landscape genomics of Myrtaceae species (includes eucalypts, paperbarks, and tea-trees) and the genetic basis of resistance to myrtle rust, which is of pressing concern to Australia’s native biodiversity. She is also contributing to assembling and annotating the waratah (Telopea speciosissima) genome as part of the Genomics for Australian Plants Framework Initiative.

Stephanie holds a Bachelor of Science (Honours) (First Class Honours and the University Medal) from the University of Sydney, Australia, with a major in Plant Science.

Thursday, 23 May 2019

Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing

Song W, Thomas T & Edwards RJ (2019) Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing. Marine Genomics in press DOI: 10.1016/j.margen.2019.05.002

Abstract

Background

High-quality, completed genomes are important to understand the functions of marine bacteria. PacBio sequencing technology provides a powerful way to obtain high-quality completed genomes. However individual library production is currently still costly, limiting the utility of the PacBio system for high-throughput genomics. Here we investigate how to generate high-quality genomes from pooled marine bacterial genomes.

Results

Pooled genomic DNA from 10 marine bacteria were subjected to a single library production and sequenced with eight SMRT cells on the PacBio RS II sequencing platform. In total, 7.35 Gbp of long-read data was generated, which is equivalent to an approximate 168× average coverage for the input genomes. Genome assembly showed that eight genomes with average nucleotide identities (ANI) lower than 91.4% can be assembled with high-quality and completion using standard assembly algorithms (e.g. HGAP or Canu). A reference-based reads phasing step was developed and incorporated to assemble the complete genomes of the remaining two marine bacteria that had an ANI > 97% and whose initial assemblies were highly fragmented.

Conclusions

Ten complete high-quality genomes of marine bacteria were generated. The findings and developments made here, including the reference-based read phasing approach for the assembly of highly similar genomes, can be used in the future to design strategies to sequence pooled genomes using long-read sequencing.
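The coverage figure in the Results is total sequenced bases divided by the combined length of the input genomes. As a hedged sanity check, the sketch below rearranges that relationship to show the combined genome length implied by the reported numbers. The function names are illustrative, and the individual genome sizes are not given in the abstract, so only the implied total is computed:

```python
def average_coverage(total_bases, genome_sizes):
    """Average coverage across a pool: total bases / combined genome length."""
    return total_bases / sum(genome_sizes)


def implied_total_genome_length(total_bases, coverage):
    """Combined genome length implied by a data volume and average coverage."""
    return total_bases / coverage


# Figures quoted in the abstract: 7.35 Gbp of data at about 168X.
implied_bp = implied_total_genome_length(7_350_000_000, 168)
print(f"Implied combined genome length: {implied_bp / 1e6:.2f} Mbp")
```

Spread over ten genomes, the implied total works out to a few Mbp per genome on average.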

Friday, 10 May 2019

Preprint: Whole genome sequencing of a novel, dichloromethane-fermenting Peptococcaceae from an enrichment culture

Holland SI*, Edwards RJ*, Ertan H, Wong YK, Russell TL, Deshpande NP, Manefield M & Lee MJ (2019) Whole genome sequencing of a novel, dichloromethane-fermenting Peptococcaceae from an enrichment culture. PeerJ Preprints 7:e27718v1. DOI 10.7287/peerj.preprints.27718v1 [*Joint first authors]

Abstract

Bacteria capable of dechlorinating the toxic environmental contaminant dichloromethane (DCM, CH2Cl2) are of great interest for potential bioremediation applications. A novel, strictly anaerobic, DCM-fermenting bacterium, “DCMF”, was enriched from organochlorine-contaminated groundwater near Botany Bay, Australia. The enrichment culture was maintained in minimal, mineral salt medium amended with dichloromethane as the sole energy source. PacBio whole genome SMRT™ sequencing of DCMF allowed de novo, gap-free assembly despite the presence of cohabiting organisms in the culture. Illumina sequencing reads were utilised to correct minor indels. The single, circularised 6.44 Mb chromosome was annotated with the IMG pipeline and contains 5,773 predicted protein-coding genes. Based on 16S rRNA gene and predicted proteome phylogeny, the organism appears to be a novel member of the Peptococcaceae family. The DCMF genome is large in comparison to known DCM-fermenting bacteria and includes 96 predicted methylamine methyltransferases, which may provide clues to the basis of its DCM metabolism. Full annotation has been provided in a custom genome browser and search tool, in addition to multiple sequence alignments and phylogenetic trees for every predicted protein, available at http://www.slimsuite.unsw.edu.au/research/dcmf/.

Thursday, 2 May 2019

Carla Aguilar Gomez (Visiting researcher)

Carla Aguilar Gomez was a visiting researcher in the lab from May to July 2019, working on SLiM prediction from cross-linking mass spectrometry data in yeast. Carla left the lab to move to Austria for a PhD in biotechnology.

Monday, 18 March 2019

Term 2 Honours projects available

For any students completing their undergrad studies in UNSW Term 1, or external students finishing before June 2019, the UNSW Term 2 honours student application is now open. The deadline for Term 2 applications is 5pm Friday 12th April 2019. For how to apply, please check the Faculty link: http://www.science.unsw.edu.au/honours-apply.

As usual, the EdwardsLab is looking to recruit enthusiastic students in genomics, in two main areas:

  1. Comparative genomics and molecular evolution in yeast, using long-read PacBio sequencing. We are particularly keen to get a student to work on aspects of our ARC Linkage grant, investigating the evolution of a novel biochemical pathway in yeast.

  2. De novo whole genome sequencing of vertebrates, including two snakes and the cane toad. In addition to general assembly and annotation activities, the lab has a few analytical tools that would benefit from

More details of honours can be found on the BABS website, or please get in touch if you have questions about specific projects. We welcome students interested in any of our Research areas, not just the projects listed. Applications from non-UNSW students are also encouraged.

Term 3 applications

Please also note the T3 honours application opening date and deadline below, should you wish to apply for the T3 intake:

  • Applications open from Friday 17th May 2019

  • Deadline for Term 3 applications is 5pm Friday 26th July 2019

Monday, 18 February 2019

Paris Thompson (BABS3301 student)

Paris Thompson is an advanced science student at UNSW majoring in Genetics and Microbiology in her final trimester of undergrad coursework. She is doing the Biomolecular Science Laboratory Project (BABS3301) with the Edwards Lab and working on the Yeast genomes project.

Wednesday, 16 January 2019

We have a new 10x Supernova assembly lab record

One of the sequencing approaches we are playing with at the moment is 10x Chromium linked reads, assembled with Supernova. We’ve now done a bunch of different species, and they mostly come out pretty decent - and excellent value for money. As part of the Oz Mammals Genomics project, we are currently assembling several rock-wallabies. One of them, Petrogale mareeba, has just broken our lab record for the most intact assembly - initial “pseudohaploid” stats:

  • Total length of sequences: 3,329,538,806
  • Max. length of sequences: 275,258,067
  • N50 length of sequences: 47,482,977
  • L50 count of sequences: 20
  • Gap (N) length: 40,183,160 (1.21%)

Over half of the 3.3 Gb genome is covered by just twenty scaffolds, at least 47.5 Mb in length - the longest is over 275 Mb! (We normally feel pretty happy to get an N50 of a few Mb.) Cuter than a cane toad too!
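For anyone unfamiliar with these statistics, N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly, and L50 is the number of scaffolds in that set. A minimal sketch of the calculation, assuming a plain list of scaffold lengths; the function names and toy lengths are illustrative and this is not the code used to produce the stats above:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that takes the running total to half the
        # assembly defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count


def gap_percent(gap_bp, total_bp):
    """Percentage of the assembly made up of gap (N) bases."""
    return 100 * gap_bp / total_bp


# Toy scaffold lengths (total 300): the two longest reach 150, i.e. half.
toy_n50, toy_l50 = n50_l50([80, 70, 50, 40, 30, 20, 10])
print(f"N50 = {toy_n50}, L50 = {toy_l50}")

# Gap fraction from the figures reported above.
print(f"Gap = {gap_percent(40_183_160, 3_329_538_806):.2f}%")
```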


Photo credit: Richard.Fisher CC BY 2.0