Thursday, 19 September 2024

#PAGAustralia Poster 14: Synteny-Guided Semi-Automated Curation of Chromosome-Level Genome Assemblies

If you are attending PAG Australia 2024, come and have a chat at Poster 14 about easing the burden of chromosome-level assembly curation.

Abstract Text

Reference genomes are fundamental resources that underpin research across most aspects of modern biology. Technological improvements in the length and accuracy of long-read sequencing platforms, combined with Hi-C proximity ligation sequencing, have enabled the routine generation of highly contiguous phased assemblies, scaffolded to chromosome-level. Nevertheless, automated generation of perfect gapless “telomere-to-telomere” assemblies remains out of reach for most eukaryotic organisms. Scaffolding errors and false duplications can still occur, and assembly curation is now the main bottleneck for large-scale assembly projects. Here, I present a streamlined data workflow and scaffolding assessment for high-throughput manual curation of chromosome-level genome assemblies. Synteny between haplotypes, or closely related species, is combined with read mapping to orient, pair and visualise assembled chromosomes. Assembly gaps are classified according to scaffolding confidence, highlighting candidates for simple scaffolding corrections, such as inversions. Synteny visualisation, gap classification, and Hi-C contact maps are then combined to identify and document scaffolding edits with increased speed, precision and confidence. This accelerates the production of curated chromosome-level assemblies, and enables the identification of regions of the assembly that may require further attention. Individual tools used in the workflow (ChromSyn, Telociraptor, SynBad, PAFScaff, DepthKopy and DepthCharge) are available at https://github.com/slimsuite/.

Thursday, 5 September 2024

Origin and maintenance of large ribosomal RNA gene repeat size in mammals

Our latest paper is out as a Featured Article in the journal Genetics, featuring ONT data from both the cane toad and BABS Genome snake genomes. This paper looks at how ribosomal RNA gene repeats (a.k.a. rDNA repeats) have evolved in vertebrates, and in particular how they have expanded in size in mammals. For something so fundamental to the function of an organism - literally every process of every cell ultimately relies on rRNA - there is surprising diversity. These regions are traditionally hard to assemble with short reads, and still pose challenges for long-read assemblies, so the new era of high-quality long-read assemblies is likely to reveal a lot about their evolution.

  • Macdonald E, Whibley A, Waters PD, Patel H, Edwards RJ & Ganley ARD (2024): Origin and maintenance of large ribosomal RNA gene repeat size in mammals. Genetics 228(1): iyae121 [Genetics] [PubMed]

Abstract

The genes encoding ribosomal RNA are highly conserved across life and in almost all eukaryotes are present in large tandem repeat arrays called the rDNA. rDNA repeat unit size is conserved across most eukaryotes but has expanded dramatically in mammals, principally through the expansion of the intergenic spacer region that separates adjacent rRNA coding regions. Here, we used long-read sequence data from representatives of the major amniote lineages to determine where in amniote evolution rDNA unit size increased. We find that amniote rDNA unit sizes fall into two narrow size classes: “normal” (∼11–20 kb) in all amniotes except monotreme, marsupial, and eutherian mammals, which have “large” (∼35–45 kb) sizes. We confirm that increases in intergenic spacer length explain much of this mammalian size increase. However, in stark contrast to the uniformity of mammalian rDNA unit size, mammalian intergenic spacers differ greatly in sequence. These results suggest a large increase in intergenic spacer size occurred in a mammalian ancestor and has been maintained despite substantial sequence changes over the course of mammalian evolution. This points to a previously unrecognized constraint on the length of the intergenic spacer, a region that was thought to be largely neutral. We finish by speculating on possible causes of this constraint.