Friday, 15 June 2018

Sequencing snakes: Pseudodiploid pseudo-long-read whole genome sequencing and assembly of Pseudonaja textilis (eastern brown snake) and Notechis scutatus (mainland tiger snake)

Richard J Edwards, Timothy G Amos, Joshua Tang, Beni Cawood, Sabrina Rispin, Daniel Enosi Tuipulotu & Paul Waters.

This work was presented at the Sydney Bioinformatics Research Symposium 2018. (Abstract below.) Citation:

Edwards RJ et al. Pseudodiploid pseudo-long-read whole genome sequencing and assembly of Pseudonaja textilis (eastern brown snake) and Notechis scutatus (mainland tiger snake) [version 1; not peer reviewed]. F1000Research 2018, 7:753 (poster) (doi: 10.7490/f1000research.1115550.1)

Abstract

The precipitous drop in sequencing costs over recent years has seen the bottleneck in vertebrate whole genome sequencing (WGS) shift from data generation (sequencing) to data processing (assembly and annotation). Draft genomes generated from cheap shotgun Illumina sequencing tend to be highly fragmented, with many tens of thousands of short contigs or scaffolds. This can be improved by preparing multiple paired-end and “mate pair” libraries with different insert sizes, but this increases the cost of both sequencing and data storage/analysis. PacBio or Oxford Nanopore long-read sequencing enables massive improvements in assembly quality but tends to be prohibitively expensive for organisms with large genomes, such as vertebrates. 10x Genomics Chromium “linked read” sequencing offers a solution to this problem. High molecular weight DNA molecules are barcoded prior to standard shotgun Illumina sequencing. These barcodes can then be used for pseudo-long-read assembly, with improved handling of repetitive regions. Where heterozygous variants are dense enough, haplotypes can be phased to generate a “pseudodiploid” assembly with some regions represented as two alleles. This is all for the cost of an additional library prep with no extra sequencing. But does it work?

We have sequenced two of the deadliest venomous snakes in Australia using 10x Chromium linked reads: the mainland tiger snake (Notechis scutatus) and the eastern brown snake (Pseudonaja textilis). Supernova v2 assemblies of the data generated exceptionally high-quality genomes for the price, with the longest scaffolds exceeding 50 Mb and N50 values of 5.99 Mb for the tiger snake and 14.7 Mb for the brown snake. This was reflected in BUSCO (v2.0.1 short) completeness estimates of 87.3% (tiger snake) and 90.5% (brown snake). These data will be compared with tiger snake WGS data from standard paired-end Illumina NovaSeq shotgun sequencing, and discussed with respect to some of the downstream opportunities and challenges provided by pseudodiploid genome assemblies. In particular, BUSCO analysis of haploid, pseudodiploid, and non-redundant genome assemblies revealed some interesting and unexpected behaviour of this widely used tool. We also present results from GenomeR, a Shiny app (in development) for batch k-mer genome size estimation (http://shiny.slimsuite.unsw.edu.au/GenomeR/).
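For readers less familiar with the N50 statistic quoted above, here is a minimal sketch of how it is conventionally calculated: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions only; they are not taken from the snake assemblies or from any tool used in this work.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical toy assembly (lengths in bp), not real snake data.
    toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it rewards contiguity but says nothing about correctness or gene content, which is why it is reported alongside BUSCO completeness here.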

Snake genomes and ongoing annotation are being made available through the lab Web Apollo browser and search tool (http://www.slimsuite.unsw.edu.au/servers/apollo.php). We welcome contact from anyone interested in getting involved with the annotation and analysis of these genomes.
