Tuesday, 17 December 2024

#AusEvol2024 - Depth-based correction of gene duplications and losses in genome assemblies

The Australasian Evolution Society conference has always been one of my favourites, due to its laid-back culture of inclusivity and kindness. (And low cost!) It therefore feels quite fitting that my last conference as an Aussie academic was AES2024.

This talk was a bit of an update to my AES2021 presentation, showcasing some of the latest additions to DepthKopy, including depth-based copy number correction of genome features such as rDNA genes, repeat families, or multicopy genes. This includes a feature that classifies multicopy “Duplicated” genes identified by BUSCO as true (biological) or false (artefactual) duplicates. TL;DR version: analysis of draft genome assemblies for 45 species of fish across five different depths/qualities indicates that DepthKopy can correct the copy number and total length of multicopy features to within 10% of the true value. (The lower-quality raw assemblies ranged from a 30% under-estimate to a 60% over-estimate.)
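For anyone who wants the gist of depth-based correction in code, here is a minimal sketch of the general idea rather than DepthKopy's actual implementation. It assumes you already have a mean read depth for each assembled copy of a feature and an estimate of the depth expected for a single-copy region. The function names, the example depths and the 25% tolerance are all made up for illustration.

```python
def copy_number(feature_depth, single_copy_depth):
    """Estimate copy number as read depth relative to single-copy depth."""
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    return feature_depth / single_copy_depth


def corrected_count(copy_depths, single_copy_depth):
    """Sum the per-copy estimates to get a depth-corrected copy number."""
    return sum(copy_number(d, single_copy_depth) for d in copy_depths)


def classify_duplicate(copy_depths, single_copy_depth, tolerance=0.25):
    """Label a set of assembled copies as a true or false duplication.

    Hypothetical rule: if the depth-corrected total matches the number of
    assembled copies, the duplication looks real; if the total is close to
    one, the copies are probably an assembly artefact (e.g. retained
    haplotigs sharing the reads of a single locus).
    """
    n_copies = len(copy_depths)
    total = corrected_count(copy_depths, single_copy_depth)
    if abs(total - n_copies) <= tolerance * n_copies:
        return "true"
    if abs(total - 1) <= tolerance:
        return "false"
    return "uncertain"


if __name__ == "__main__":
    single_copy = 40  # illustrative single-copy depth (X)
    # One assembled rDNA copy at 400X suggests ~10 collapsed copies.
    print(copy_number(400, single_copy))
    # Two copies, each at roughly full depth: likely a real duplication.
    print(classify_duplicate([40, 42], single_copy))
    # Two copies, each at roughly half depth: likely an artefact.
    print(classify_duplicate([20, 21], single_copy))
```

The same ratio works in both directions: a collapsed repeat shows up as excess depth on one assembled copy, while a falsely duplicated gene shows up as several copies that each carry only a fraction of the expected depth.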

This will be of most importance when low-quality draft genomes are included in a comparative genomics analysis. However, even the best genome assemblies appear to have some “collapsed” or duplicated loci where the copy number in the assembly does not accurately reflect the copy number in the genome. DepthKopy is useful for exploring the magnitude of such disparities, and can help to identify and correct specific discrepancies in genes or features of interest.
