Åsa Pérez-Bercoff, Tonia L. Russell, Paul V. Attfield, Philip J.L. Bell & Richard J. Edwards. http://f1000research.com/posters/4-1022.
We have performed PacBio single molecule real time (SMRT) sequencing of three yeast whole genomes. A haploid reference yeast strain (S288C) and two novel diploid strains were sequenced as part of a larger functional genomics project. For each strain, 20 kb SMRTbell library preparations were performed and sequenced on two SMRT Cells using the P6-C4 chemistry. Between 1.74 Gb and 2.55 Gb of usable sequence data was generated per strain, with read lengths of up to 53.3 kb. PacBio-only de novo whole-genome assemblies were generated using the HGAP3 pipeline. We are using the S288C data to benchmark assembly performance against the published reference genome. An initial assembly of S288C covered over 99.9% of the genome at 99.997% accuracy in only 26 unitigs, compared with 17 reference chromosomes (16 nuclear chromosomes plus the mitochondrial genome). Of these 17 reference chromosomes, 16 were each recovered essentially as a single, complete unitig. We are now using the S288C data to optimise the assembly process and derive assembly settings for the two novel strains. To this end, we have developed a new pipeline for the comparative assessment of high-quality whole genomes against a reference. In this poster, we explore how well PacBio sequencing performs for de novo genome sequencing in yeast, with examples of both the power and the limits of the technology.
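As a rough illustration of what the reported figures imply, the sketch below converts per-strain yield into approximate mean fold coverage and turns the assembly accuracy into an expected count of discordant bases. It assumes a genome size of about 12.1 Mb for every strain (the commonly cited size of the S288C reference; this value is not given in the poster) and interprets "accuracy" as per-base identity to the reference. Both are assumptions, and this is back-of-the-envelope arithmetic, not the authors' assessment pipeline.

```python
# ASSUMPTION: ~12.1 Mb genome size, applied to all three strains; this figure
# is not taken from the poster.
GENOME_SIZE_BP = 12_100_000


def fold_coverage(yield_bases: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean sequencing depth = total usable bases / genome size."""
    return yield_bases / genome_size_bp


def expected_errors(accuracy: float, length_bp: float) -> float:
    """Expected number of discordant bases at a given per-base accuracy
    (ASSUMPTION: accuracy means per-base identity to the reference)."""
    return (1.0 - accuracy) * length_bp


if __name__ == "__main__":
    # Endpoints of the per-strain usable yield range reported in the abstract.
    for gb in (1.74, 2.55):
        print(f"{gb} Gb -> ~{fold_coverage(gb * 1e9):.0f}x mean coverage")

    # 99.997% accuracy over the 99.9% of the genome that was covered.
    covered_bp = 0.999 * GENOME_SIZE_BP
    print(f"~{expected_errors(0.99997, covered_bp):.0f} discordant bases")
```

Under these assumptions the yields correspond to roughly 144x to 211x mean coverage, and 99.997% accuracy works out to about 30 discordant bases per megabase, or a few hundred across the covered genome. For the diploid strains the depth figure is per haploid genome copy.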