Richard J Edwards, Åsa Pérez-Bercoff, Tonia L Russell, Paul V Attfield & Philip JL Bell.
PacBio Single Molecule Real Time (SMRT™) sequencing is rapidly becoming the technology of choice for de novo whole-genome sequencing. The long read lengths and randomly distributed errors of PacBio data make genome assembly considerably easier and more accurate than with short-read data. In polyploid genomes, heterozygosity – particularly in structural variants – creates additional challenges for de novo genome assembly. In some regions, homologous chromosomes are assembled together into a single chimeric sequence. In others, long reads from heterozygous regions often assemble into two distinct contigs, fragmenting the assembly. Despite these challenges, long reads present new opportunities for the assembly of polyploid genomes. Where heterozygosity (i.e. the density and number of polymorphisms) is high enough, polymorphisms can be phased into distinct “haplotigs”, each derived from a single parent.
We have sequenced the genomes of novel diploid Saccharomyces cerevisiae strains using the PacBio RSII at the UNSW Ramaciotti Centre for Genomics. In addition, we have generated an “in silico diploid” strain by combining sequence data from two haploid strains that we previously sequenced and assembled. Here, we report progress on de novo diploid assembly and on efforts towards full haplotype phasing of these strains. Because its parental sequences are known, the in silico diploid lets us assess how successfully we can reconstruct the parental chromosomes and haplotypes. The types and levels of heterozygosity required for haplotype phasing will be discussed, along with the challenges presented by heterozygous chromosomal translocations.