Saturday, 16 November 2024

Repeat-Rich Regions Cause False-Positive Detection of NUMTs: A Case Study in Amphibians Using an Improved Cane Toad Reference Genome

Version 3* of the cane toad reference genome, aRhiMar1.3, is now officially out and published in Genome Biology and Evolution. [*It’s only the second published genome, but for internal reasons the original draft genome was version 2!]

The focus of the paper itself is confirming the lack of Nuclear Mitochondrial fragments (a.k.a. NUMTs) in the cane toad genome, which could impact whole-mitogenome analysis of genetic diversity in cane toads. We were pretty surprised when we first looked for NUMTs in the cane toad genome and could not find any! The draft genome is pretty drafty, especially in terms of missing repetitive regions, so an updated long-read assembly was important to rule out a false negative result.

The new genome is in much better shape, with an extra 922 Mbp (>95% repeats) and a 15x increase in scaffold N50 (2.5 Mbp). (My biggest regret with the original paper was not sticking to my guns and including an early DepthSizer genome size estimate of 3.5 Gbp, which has subsequently turned out to be correct.) The cane toad genome remains a tough nut to crack, and we didn’t quite reach the magic 1 Mb contig N50 (ours is 860 kb), but the functional completeness was markedly improved and we are pretty confident that the continued absence of NUMT detection is a real phenomenon and does not simply reflect technical limitations.
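For anyone unfamiliar with the N50 statistic quoted above: it is the length of the sequence at which, working down from the longest, the running total first reaches half of the assembly size. The snippet below is just a minimal illustrative sketch of that definition. The function name `n50` and the toy lengths are made up for the example; it is not the code or the data behind the numbers in the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper only: sort lengths from longest to shortest and
    return the length at which the running total first reaches half of
    the summed assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare against half the total without using floating point.
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical lengths, not cane toad data).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because a few long sequences can dominate the statistic, contig N50 and scaffold N50 for the same assembly can differ a lot, which is why both are quoted.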

Watch this space for a chromosome-level cane toad genome, which is still in the works.

Cheung K, Rollins LA, Hammond JM, Barton K, Ferguson JM, Eyck HJF, Shine R & Edwards RJ (2024): Repeat-rich regions cause false positive detection of NUMTs - a case study in amphibians using an improved cane toad reference genome. Genome Biology and Evolution evae246. [Gen Biol Evol] [bioRxiv] [PubMed]

Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.
