De novo genome sequencing and assembly with PacBio
Rountree Room Level 3
D26 Biological Sciences Building, UNSW
Friday 4 Sept 3.00pm
We have performed PacBio single-molecule real-time (SMRT) sequencing of three whole yeast genomes. A haploid reference yeast strain (S288C) and two novel diploid strains were sequenced as part of a larger functional genomics project. For each strain, 2-2.6 Gb of usable sequence data was generated, with read lengths of up to 53.3 kb. Pure PacBio whole-genome de novo assemblies were generated using the HGAP3 pipeline. Initial assembly of S288C yielded over 99.9% genome coverage at 99.997% accuracy, with 15 of the 17 reference chromosomes (16 nuclear chromosomes plus the mitochondrion) each essentially returned as a single, complete unitig. We are now using the S288C data to optimise the assembly process and derive assembly settings for the two novel strains. To this end, we have developed a new pipeline for the comparative assessment of high-quality whole genomes against a reference. We are also exploring the trade-off between accuracy and sequencing depth of the PacBio “pre-assembly” and how this affects the final assembly.
[This work will also be presented at AGTA 2015, for anyone who is interested but misses this talk.]