Richard J. Edwards, Åsa Pérez-Bercoff, Tonia L. Russell, Zhiliang Chen, Marc R. Wilkins, Paul V. Attfield & Philip J.L. Bell. F1000Research 2016, 5:172 (poster) (doi: 10.7490/f1000research.1111305.1).
Abstract
PacBio Single Molecule Real Time (SMRT™) sequencing is rapidly becoming the technology of choice for de novo whole genome sequencing. The long read lengths and random error of PacBio data make genome assembly considerably easier and more accurate than with short read data. Here, we report on de novo genome sequencing and assembly of three Saccharomyces cerevisiae genomes using the PacBio RSII at the UNSW Ramaciotti Centre for Genomics. A haploid reference yeast genome strain, S288C, and two novel diploid strains were sequenced as part of a larger functional genomics project. For each strain, 20 kb SMRTbell library preps were performed and sequenced on two SMRT Cells using the P6-C4 chemistry, with read lengths of up to 53.3 kb. Whole genome de novo assemblies were then generated through the PacBio SMRT Portal.
We are using the S288C data to explore performance, with the published genome as a reference. An initial assembly of S288C yielded over 99.97% genome coverage at 99.99% accuracy in only 26 contigs, with 16 of the 17 reference chromosomes (16 nuclear chromosomes plus the mitochondrion) each essentially returned as a single, complete contig. The long reads enable accurate reconstruction of tandemly repeated genes (except >900 kb of rRNA repeats), transpositions and chromosomal translocations. We are now using the S288C data to optimise the assembly process and derive assembly settings for the two novel diploid strains. To this end, we have developed a new pipeline for the comparative assessment of high quality whole genomes against a reference, which we are now adapting for the additional challenge of appropriately handling diploid data.
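To make the coverage and accuracy figures above concrete, the following is a minimal sketch of how such metrics might be computed from assembly-to-reference alignments. It is not the authors' pipeline: the `Hit` record, the `merged_length` and `assess` functions, and the simplification that each hit is gap-free (so its reference span equals its alignment length) are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one contig-to-reference alignment."""
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive
    matches: int    # identical bases within the aligned span


def merged_length(intervals: Iterable[tuple[int, int]]) -> int:
    """Total length of the union of half-open intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def assess(hits: list[Hit], ref_length: int) -> dict[str, float]:
    """Coverage = covered reference bases / reference length.
    Accuracy = matching bases / aligned bases (assumes gap-free hits;
    bases in overlapping hits are counted once per hit)."""
    covered = merged_length((h.ref_start, h.ref_end) for h in hits)
    aligned = sum(h.ref_end - h.ref_start for h in hits)
    matches = sum(h.matches for h in hits)
    return {
        "coverage": covered / ref_length,
        "accuracy": matches / aligned if aligned else 0.0,
    }


if __name__ == "__main__":
    # Toy data only, not results from the study.
    toy_hits = [Hit(0, 400, 400), Hit(600, 1000, 396)]
    print(assess(toy_hits, ref_length=1000))
```

Merging overlapping hits before summing keeps repeated or redundant alignments from inflating coverage. A real comparison would also need to handle indels, repeats and, for diploid strains, alternative haplotypes, none of which this sketch attempts.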