If you are attending PAG Australia 2024, come and have a chat at Poster 14 about easing the burden of chromosome-level assembly curation.
Abstract Text
Reference genomes are fundamental resources that underpin research across most aspects of modern biology. Technological improvements in the length and accuracy of long-read sequencing platforms, combined with Hi-C proximity ligation sequencing, have enabled the routine generation of highly contiguous phased assemblies, scaffolded to chromosome level. Nevertheless, automated generation of perfect gapless “telomere-to-telomere” assemblies remains out of reach for most eukaryotic organisms. Scaffolding errors and false duplications can still occur, and assembly curation is now the main bottleneck for large-scale assembly projects. Here, I present a streamlined data workflow and scaffolding assessment for high-throughput manual curation of chromosome-level genome assemblies. Synteny between haplotypes, or between closely related species, is combined with read mapping to orient, pair and visualise assembled chromosomes. Assembly gaps are classified according to scaffolding confidence, highlighting candidates for simple scaffolding corrections, such as inversions. Synteny visualisation, gap classification and Hi-C contact maps are then combined to identify and document scaffolding edits with increased speed, precision and confidence. This accelerates the production of curated chromosome-level assemblies and helps identify regions of the assembly that may require further attention. Individual tools used in the workflow (ChromSyn, Telociraptor, SynBad, PAFScaff, DepthKopy and DepthCharge) are available at https://github.com/slimsuite/.
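As a rough illustration of where gap classification starts, scaffolded assemblies conventionally mark gaps as runs of `N` in the sequence. The sketch below only locates those runs; it is not taken from Telociraptor, SynBad or any other tool in the workflow, it does not assign a scaffolding confidence, and the names `find_gaps` and `min_len` are hypothetical.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of runs of N.

    Assumption: gaps are encoded as runs of 'N' (upper or lower case),
    which is the usual convention in scaffolded FASTA files.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        # Keep only runs that are at least min_len bases long.
        if match.end() - match.start() >= min_len:
            gaps.append((match.start(), match.end()))
    return gaps


if __name__ == "__main__":
    # Toy scaffold with two gaps of different lengths.
    scaffold = "ACGTNNNNACGTnnAC"
    for start, end in find_gaps(scaffold):
        print(f"gap at {start}-{end} (length {end - start})")
```

Each located gap could then be examined against synteny and read-mapping evidence, which is the part of the process the workflow itself addresses.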