Friday, 3 December 2021

Limited Introgression between Rock-Wallabies with Extensive Chromosomal Rearrangements

When we were assembling the 10x Genomics linked read assemblies of several rock wallabies, one of them broke the lab record for Supernova assembly quality.

10x linked reads are a bit of a dead technology now, but genomes made with them are still useful and generating insights. One example is the first rock wallaby we assembled: we were a (very small) part of a team that used the draft 10x genome assembly for Petrogale penicillata as a reference to investigate chromosomal rearrangements. The paper was published online today in Molecular Biology and Evolution, one of my favourite journals.


Potter S, Bragg JG, Turakulov R, Eldridge MDB, Deakin J, Kirkpatrick M, Edwards RJ & Moritz C (accepted): Limited introgression between rock-wallabies with extensive chromosomal rearrangements. Molecular Biology and Evolution msab333 https://doi.org/10.1093/molbev/msab333

Abstract

Chromosome rearrangements can result in the rapid evolution of hybrid incompatibilities. Robertsonian fusions, particularly those with monobrachial homology, can drive reproductive isolation amongst recently diverged taxa. The recent radiation of rock-wallabies (genus Petrogale) is an important model to explore the role of Robertsonian fusions in speciation. Here, we pursue that goal using an extensive sampling of populations and genomes of Petrogale from north-eastern Australia. In contrast to previous assessments using mitochondrial DNA or nuclear microsatellite loci, genomic data are able to separate the most closely related species and to resolve their divergence histories. Both phylogenetic and population genetic analyses indicate introgression between two species that differ by a single Robertsonian fusion. Based on the available data, there is also evidence for introgression between two species which share complex chromosomal rearrangements. However, the remaining results show no consistent signature of introgression amongst species pairs and where evident, indicate generally low introgression overall. X-linked loci have elevated divergence compared with autosomal loci indicating a potential role for genic evolution to produce reproductive isolation in concert with chromosome change. Our results highlight the value of genome scale data in evaluating the role of Robertsonian fusions and structural variation in divergence, speciation, and patterns of molecular evolution.

Thursday, 2 December 2021

EdwardsLab at Australasian Evolution Society #AUSEVO2021

Look out for some interesting talks by Edwards Lab members at this year’s Australasian Evolution Society 2021 conference, starting today:

Thursday 2nd December: 3-minute talks | 1130-1230

Kelton Cheung - Analysis of mitochondrial DNA reveals significant genetic diversity in invasive Australian cane toads

Kelton Cheung, Mark Richardson, Richard Edwards & Lee Ann Rollins

Mitochondrial DNA haplotype patterns across the native and invaded ranges can reveal the history and evolutionary trajectory of invasions. The invasion of Australia by the cane toad (Rhinella marina) has accelerated as it expanded westward. Despite this success, previous studies reported no mitochondrial genetic diversity in Australia. Here, we assembled a complete mitochondrial reference genome and haplotyped toads (N=119) from the native range and two introduced populations (Hawai’i and Australia), using whole genome and RNA-seq data. The complete R. marina mitochondrial genome consists of 18,152 base pairs with no significant gene rearrangement compared to other bufonid species. Although native range genetic diversity was much higher than that of the introduced ranges, we identified 29 haplotypes in Australia. While we did find evidence of founder effects following introduction, our results suggest there is significant genetic diversity within Australia, which may assist adaptation and invasion success in this species.


Thursday 2nd December: Selection 1 | 1415-1430

Katarina Stuart - A genetic perspective on rapid adaptation in the globally invasive European starling (Sturnus vulgaris)

Abstract coming soon…


Thursday 2nd December: Biogeography and phylogenetics | 1530-1545

Richard Edwards - DepthKopy: copy number prediction using single-copy long-read depth profiles

Richard J Edwards, Stephanie H Chen, Katarina C Stuart, Mark M Tanaka & Jason G Bragg

Gene duplication, followed by functional divergence of gene copies, is a fundamental component of genetic adaptation and the evolution of novelty. Increasingly, evolutionary studies make use of the ever-expanding number of high-quality genome assemblies to characterise patterns of gene gain and loss. However, even reference genomes of the highest quality can experience assembly errors at tandemly repeated gene loci, resulting in incorrect inference of gene duplication patterns. Here, we present DepthKopy (https://github.com/slimsuite/depthkopy), which estimates the copy number for a gene, region or sequence of interest, using an estimate of single-copy sequencing depth derived from complete BUSCO genes. This is useful for identifying haplotigs and collapsed repeat regions during genome assembly curation. Critically, for evolutionary studies of gene families, DepthKopy can identify genes for which the number of copies in the assembly does not seem to match the genome.
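To make the depth-ratio idea concrete, here is a minimal sketch of a copy number estimate: the read depth of a region divided by the single-copy read depth. It is not DepthKopy's implementation. The function name, the use of the median to summarise the region, and all of the depth values are assumptions made up for illustration.

```python
from statistics import median


def copy_number(region_depths, single_copy_depth):
    """Estimate copy number as region read depth relative to single-copy depth.

    Summarising the region by its median depth is an assumption of this
    sketch, not a statement about how DepthKopy works.
    """
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return median(region_depths) / single_copy_depth


# Hypothetical per-base read depths across a gene that appears once in the assembly.
gene_depths = [118, 122, 120, 125, 119, 121]
# Hypothetical single-copy depth, e.g. derived from complete BUSCO genes.
sc_depth = 40.0

cn = copy_number(gene_depths, sc_depth)
print(f"Estimated copy number: {cn:.2f}")  # prints "Estimated copy number: 3.01"
```

In this toy case the gene is assembled once but the reads suggest about three copies, which is the kind of mismatch between assembly and genome that is worth a closer look.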


Friday 3rd December: Climate change and temperature | 1330-1345

Collin Ahrens - Genomic constraints of drought adaptation

Abstract coming soon…

Wednesday, 24 November 2021

#ABACBS2021 Lightning talk - DepthSizer and DepthKopy: genome size and copy number prediction using single-copy long-read depth profiles

Tune in for the ABACBS 2021 lightning talks this morning to hear about our applications of long reads to genome size and copy number prediction. For more information, you can read the NSW Waratah genome paper pre-print or visit the GitHub pages for DepthSizer and DepthKopy.

Richard J Edwards, Stephanie H Chen, Katarina C Stuart, Mark M Tanaka, Jason G Bragg.

DepthSizer and DepthKopy: genome size and copy number prediction using single-copy long-read depth profiles

A fundamental part of any genome project is establishing the genome size of the organism being sequenced. The gold standard for genome size measurement is flow cytometry, but this is not available to all groups and can give surprisingly variable results. Popular bioinformatic approaches predict genome size using kmer frequency profiles from high-accuracy (e.g. Illumina or HiFi) sequencing reads, or the mean depth of coverage of reads mapped to an assembly. Both of these approaches can be adversely affected by repetitive regions of the genome. Mean sequencing depth is also highly reliant on assembly completeness.

Here, we present DepthSizer (https://github.com/slimsuite/depthsizer), which refines this approach by estimating sequencing depth based on single-copy complete BUSCO genes. DepthSizer works on the principle that genuine single-copy regions will tend towards the same, true, single-copy read depth. In contrast, assembly errors, collapsed repeats within those genes, or incorrect BUSCO predictions, will give inconsistent read depth deviations. The modal read depth across single-copy BUSCO genes, calculated from a depth density profile of these regions, should therefore provide a good estimate of the true depth of coverage. The method is benchmarked on model organism data and corrections for possible contamination, biases/inconsistencies in read mapping and/or raw read insertion/deletion error profiles are discussed. We also present DepthKopy (https://github.com/slimsuite/depthkopy), which uses the same read depth approach to estimate the copy number of assembly regions. This can be useful for identifying haplotigs, and collapsed repeat regions.
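The principle can be sketched in a few lines: take the modal depth across single-copy BUSCO regions, then divide the total sequenced bases by that depth. This is an illustrative sketch only, not DepthSizer's code. The crude window smoothing stands in for a proper density profile, and the function names, depth values and read total are all hypothetical.

```python
from collections import Counter


def modal_depth(depths, window=2):
    """Return the integer depth with the highest smoothed frequency.

    A sliding-window count is used here as a simple stand-in for a
    depth density profile (an assumption of this sketch).
    """
    counts = Counter(depths)
    lo, hi = min(counts), max(counts)

    def smoothed(d):
        return sum(counts.get(d + k, 0) for k in range(-window, window + 1))

    return max(range(lo, hi + 1), key=smoothed)


def genome_size(total_read_bases, single_copy_depth):
    """Genome size estimate: total sequenced bases / single-copy depth."""
    return total_read_bases / single_copy_depth


# Hypothetical depths over single-copy BUSCO genes. The values near 80 mimic
# collapsed repeats and the value 5 mimics an assembly error; the mode is
# not pulled around by them the way a mean would be.
busco_depths = [38, 39, 40, 40, 40, 41, 41, 42, 79, 80, 81, 5]
sc_depth = modal_depth(busco_depths)

# Hypothetical total of 40 Gb of mapped read data.
size = genome_size(40_000_000_000, sc_depth)
print(sc_depth, f"{size / 1e6:.0f} Mb")  # prints "40 1000 Mb"
```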

Keywords: BUSCO, Genome Assembly, Genomics, ONT, PacBio, copy number variants

Monday, 11 October 2021

SLiMSuite Short Linear Motif and Genomics Analysis Tools: BUSCOMP v0.13.0 (MetaEuk) release

BUSCOMP v0.13.0 is now on GitHub. This release features updates to parse additional BUSCO v5 outputs, including transcriptome and proteome modes. It has also been updated to be compatible with MetaEuk runs by generating the missing *.fna files where possible.

Congratulations to the Edwards Lab #GSAA21 Award winners

Congratulations to Katarina Stuart, Stephanie Chen, and Cadel Watson, who all won prizes at this year’s Genetics Society of AustralAsia 2021 Conference.

Stephanie’s abstract was selected for the Spencer Smith-White Travel Award, Katarina won the best student talk award, and Cadel won the best student lightning talk award. Well done, all!

Wednesday, 6 October 2021

Edwards Lab at Genetics Society of AustralAsia 2021 #GSAA21

Look out for some interesting genomics talks by Edwards Lab members at this year’s Genetics Society of AustralAsia 2021 conference, which started today. Congratulations to Stephanie for winning the Spencer Smith-White Travel Award (shame about the lack of travel!), and to Cadel for getting a lightning talk as an Honours student. And a shout out to Kat, who is one of the conference organisers.

Thursday 7th October: Genomics and Transcriptomics Session | 1:30-2:00 (Lightning talks)

Cadel Watson - dedUCE: efficient identification of Ultraconserved Elements from multiple genomes

Cadel Watson, Mitchell J. Cummins, Yasir Kusay, Maxine Halbheer, Eric Urng, John S. Mattick and Richard J. Edwards

Ultraconserved elements (UCEs) are DNA sequences which are extremely conserved and found almost unchanged in the genomes of multiple, divergent species [1]. UCEs have been found in a wide variety of organisms, including mammals, fish, insects, birds, and plants. Whilst the evidence suggests that they are the result of natural selection, indicating biological importance, their function has thus far proven elusive [2]. The recent (and ongoing) explosion in the quality and quantity of reference genomes across multiple taxa provides new opportunities for investigating the prevalence, evolution and role of UCEs. However, the field is hampered by a lack of fast and resource-efficient algorithms to identify UCEs. Furthermore, common alignment-based algorithms fail to identify non-syntenic UCEs.

Here, we present dedUCE, a novel tool for identifying all UCEs in a set of genomes. dedUCE uses a hash-based algorithm to rapidly identify core UCE kmers that are shared by multiple genomes, before extending and merging candidates into a final comprehensive but non-redundant set of UCEs. dedUCE can support UCEs appearing out-of-order due to genetic rearrangements and/or assembly artefacts, and is able to return UCEs with inexact homology. Stringency can be controlled by parameters controlling the length, support (number of genomes) and required sequence identity. Preliminary results show that dedUCE can identify all UCEs in a group of 40 mammalian genomes in 8 hours on a 16-core machine, which is orders of magnitude faster than previous algorithms. Applications of dedUCE will be discussed, including improving the definition of UCEs, and making use of UCE content to assess genome assembly completeness.

  1. Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, and David Haussler (2004). Ultraconserved Elements in the Human Genome. Science, 304(5675):1321–1325.

  2. Konstantinos Kritsas, Samuel E. Wuest, Daniel Hupalo, Andrew D. Kern, Thomas Wicker, and Ueli Grossniklaus (2012). Computational analysis and characterization of UCE-like elements (ULEs) in plant genomes. Genome Research, 22(12):2455–2466.
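As a rough illustration of the first step described in the abstract (hashing kmers and keeping those supported by enough genomes), here is a toy sketch. It is not dedUCE's algorithm: it only handles exact, forward-strand matches, skips the extension and merging steps entirely, and the function name, parameters and sequences are invented for the example.

```python
from collections import defaultdict


def shared_kmers(genomes, k, min_support):
    """Return the kmers found in at least min_support genomes.

    A dict keyed by kmer acts as the hash table; each value is the set of
    genome names containing that kmer. Exact forward-strand matching only.
    """
    support = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            support[seq[i:i + k]].add(name)
    return {kmer for kmer, names in support.items() if len(names) >= min_support}


# Hypothetical toy "genomes" sharing the 10 bp core ACGGATCCAG.
genomes = {
    "sp1": "TTACGGATCCAGTTG",
    "sp2": "GGCACGGATCCAGAA",
    "sp3": "CCACGGATCCAGTCA",
}

core = shared_kmers(genomes, k=8, min_support=3)
print(sorted(core))  # prints ['ACGGATCC', 'CGGATCCA', 'GGATCCAG']
```

Because the lookup is by kmer rather than by position, the shared core is found wherever it sits in each sequence, which is the property that lets a hash-based approach cope with out-of-order elements.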


Friday 8th October: Ecological and Evolutionary Genetics Session | 10:45-11:00

Katarina Stuart - A genetic perspective on rapid adaptation in the globally invasive European starling (Sturnus vulgaris)

Stuart KC, Sherwin WB, Edwards RJ & Rollins LA

Few invasive birds are as globally successful or as well-studied as the common starling (Sturnus vulgaris). Native to the Palaearctic, the starling has been a prolific invader in North and South America, southern Africa, Australia, and the Pacific Islands, while facing declines in excess of 50% in some native regions. Starlings present an invaluable opportunity to test predictions about the evolutionary trajectory of invasive populations, and gain insight into genetic shifts in response to anthropogenic alteration and climate change. My research focuses primarily on the invasive European starling population in Australia and aims to investigate the genetics underlying their evolution, using a range of genomic approaches. Through historic museum sample sequencing, I examine single nucleotide polymorphism variation shifts between the native range and Australia, and find parallel selection on both continents, possibly resulting from common global selective forces such as exposure to pollutants and carbohydrates. I further examine matched genetic, morphological, and environmental data to reveal patterns of heritability and plasticity across ecologically significant phenotypic traits, revealing that elevation, as well as rainfall and temperature variability, plays an important role in shaping morphology and genetics. Finally, I investigated patterns of structural variants, to uncover evolutionarily significant large-scale genetic variants across a global data set, and more specifically characterise their role in rapid starling adaptation across the entirety of the Australian range. Overall, my research seeks to better understand mechanisms and patterns of genetic change within this species, which may be used to inform invasion or native range management.
More broadly, this evolutionary research into the starling provides an important perspective on the role of rapid evolution in invasive species persistence, and the global pressures that may shape range shifts and evolution across many similar avian taxa.


Friday 8th October: Spencer Smith-White Travel Award recipient | 1:15-1:30

Stephanie Chen - Genomics of speciation and introgression: insights from waratah (Telopea spp.) as a model clade

Telopea is an eastern Australian genus of five species of long-lived shrubs in the family Proteaceae. Previous work has characterised population structure and patterns of introgression between Telopea species. These studies were performed using a limited set of genetic markers, but point to the great potential of waratah as a model clade for understanding the processes of divergence, environmental adaptation and speciation, when enhanced by a genome-wide perspective enabled by a reference genome. However, few Proteaceae genomes and no waratah genomes are available. We assembled the first chromosome-level reference genome for T. speciosissima (New South Wales waratah; 2n = 22) using Nanopore long-reads, 10x Chromium linked-reads and Hi-C data. The assembly spans 823 Mb, representing 93.9 % of the estimated genome size, with a scaffold N50 of 69.1 Mb and 91.3 % of Embryophyta universal single-copy orthologs (BUSCOs) complete. We examined the evolutionary dynamics of Telopea using the reference genome in conjunction with DArTseq (n = 244) and whole genome shotgun sequencing (n = 14) of each of the seven lineages; there are three lineages of T. speciosissima – coastal, upland, and southern. Here, I will discuss the population structure and demographic history of the genus. We also examined phylogenomic relationships and developed a scalable method of rapidly generating species trees from short-read data to maximise the recovery of informative data from genomic datasets. The waratah reference genome represents an important new genomic resource in Proteaceae to accelerate our understanding of the origins and evolutionary dynamics of the Australian flora.

Thursday, 3 June 2021

Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C

The latest genomics paper from the lab is now out on bioRxiv. This is the first paper from Stephanie Chen’s PhD project in collaboration with the Royal Botanic Gardens and Domain Trust (RBGDT), Sydney. In this paper, Stephanie reports on the chromosome-level assembly of the New South Wales Waratah, the floral emblem of NSW. This is the first of the pilot reference genomes to be released from the Genomics for Australian Plants initiative.

In addition to the genome itself, this paper describes a couple of genomics tools from the lab. DepthSizer (https://github.com/slimsuite/depthsizer) uses BUSCO predictions to establish the single-copy read depth of sequencing data, from which the genome size can be estimated in a way that is hopefully quite robust to assembly quality. Diploidocus (https://github.com/slimsuite/diploidocus) has been used for our previous Dog genome assemblies to help eliminate “haplotigs” (heterozygous regions of the genome that appear in the assembly twice), and low-quality sequences, in addition to flagging possible collapsed repeats or contaminants for further investigation. Here, the Diploidocus “tidy” pipeline is considerably extended for a much more nuanced classification and filtering of scaffolds, using a combination of read depths, homology, kmer analysis and BUSCO predictions.


Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap JS, Sauquet H, Bourke G, Bragg JG & Edwards RJ (preprint): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. bioRxiv 2021.06.02.444084; doi: 10.1101/2021.06.02.444084.
[bioRxiv]

Abstract

Background: Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Findings: Here, we report the first chromosome-level reference genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 91.2 % of Embryophyta BUSCOs complete. We introduce a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering assembly scaffolds. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single copy orthologues and find that the assembly is 93.9 % of the estimated genome size. The largest 11 scaffolds contained 94.1 % of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. Our results indicate that the waratah genome is highly repetitive, with a repeat content of 62.3 %. Conclusions: The T. speciosissima genome (Tspe_v1) will accelerate waratah evolutionary genomics and facilitate marker assisted approaches for breeding. Broadly, it represents an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.
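For anyone unfamiliar with the contiguity statistics quoted in the abstract, here is a small sketch of how a scaffold N50 and the "largest N scaffolds" fraction are computed. The scaffold lengths are made-up toy values (not the waratah assembly) and the function names are just for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


def fraction_in_largest(lengths, n):
    """Fraction of the assembly contained in the n largest scaffolds."""
    return sum(sorted(lengths, reverse=True)[:n]) / sum(lengths)


# Hypothetical scaffold lengths in Mb.
scaffolds = [90, 70, 60, 40, 20, 10, 10]

print(n50(scaffolds))  # prints 70
print(f"{fraction_in_largest(scaffolds, 3):.1%}")  # prints "73.3%"
```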

Monday, 31 May 2021

Gabriella Cozijnsen (Honours student)

Gabriella Cozijnsen completed a Bachelor of Science from UNSW in 2020, majoring in genetics and physics with a minor in chemistry. She commenced her honours (Genetics) with the Edwards Lab in Term 2, 2021.

Gabby’s project is to resolve and characterise the sex chromosomes in our two snakes sequenced as part of the BABS Genome Project and in collaboration with the Australian Amphibian and Reptile Genomics Initiative (AusARG) and A/Prof Paul Waters.

Monday, 19 April 2021

Intergenerational effects of manipulating DNA methylation in the early life of an iconic invader

Another cane toad paper has hit the shelves! This is another paper from our ongoing collaboration with Lee Ann Rollins and her great team of invasion biologists and molecular ecologists. This paper once again uses our draft cane toad genome* and builds on the previous cane toad methylation analysis by PhD student Roshmi Sarma to look at some really interesting intergenerational effects. [*Genome update coming soon - watch this space!]

Could this be epigenetic inheritance? Maybe. But it could also be some kind of parental germline thing. Either way, it’s a fascinating result and further evidence to support the fact that whilst our genome may establish our genetic potential, it does not control our destiny.

And if you want to know what it takes to identify effects like this, check out the mind-boggling experiment design Figure! (PDF available on request!)


This article is part of the theme issue ‘How does epigenetics influence the course of evolution?’

Sarma RR, Crossland MR, Eyck HJF, Edwards RJ, DeVore JL, Cocomazzo M, Zhou J, Brown GP, Shine R & Rollins LA (2021): Intergenerational effects of manipulating DNA methylation in the early life of an iconic invader. Philosophical Transactions of the Royal Society B 376:20200125. [Phil Trans Roy Soc B] [PubMed]

Abstract

In response to novel environments, invasive populations often evolve rapidly. Standing genetic variation is an important predictor of evolutionary response but epigenetic variation may also play a role. Here, we use an iconic invader, the cane toad (Rhinella marina), to investigate how manipulating epigenetic status affects phenotypic traits. We collected wild toads from across Australia, bred them, and experimentally manipulated DNA methylation of the subsequent two generations (G1, G2) through exposure to the DNA methylation inhibitor zebularine and/or conspecific tadpole alarm cues. Direct exposure to alarm cues (an indicator of predation risk) increased the potency of G2 tadpole chemical cues, but this was accompanied by reductions in survival. Exposure to alarm cues during G1 also increased the potency of G2 tadpole cues, indicating intergenerational plasticity in this inducible defence. In addition, the negative effects of alarm cues on tadpole viability (i.e. the costs of producing the inducible defence) were minimized in the second generation. Exposure to zebularine during G1 induced similar intergenerational effects, suggesting a role for alteration in DNA methylation. Accordingly, we identified intergenerational shifts in DNA methylation at some loci in response to alarm cue exposure. Substantial demethylation occurred within the sodium channel epithelial 1 subunit gamma gene (SCNN1G) in alarm cue exposed individuals and their offspring. This gene is a key to the regulation of sodium in epithelial cells and may help to maintain the protective epidermal barrier. These data suggest that early life experiences of tadpoles induce intergenerational effects through epigenetic mechanisms, which enhance larval fitness.

Thursday, 8 April 2021

Transcript- and annotation-guided genome assembly of the European starling

Our starling genome paper is now available as a pre-print on bioRxiv! This was some great work by PhD student, Kat Stuart. Kat assembled a new Australian starling genome, using a combination of linked reads, low coverage long reads, and long-read PacBio iso-seq transcriptomics data. A second Illumina assembly of a North American group is also presented. As we saw with our Basenji genome paper, having two (or more) genomes from a species can be really useful for disentangling real differences from assembly artefacts. (No assembly is perfect!)

This paper is a great example of how a bit of TLC and imagination can get the most out of data produced with a limited budget. We were unable to get deep long-read sequencing this time, but instead show the additional power that long-read full-length transcriptome data can provide in assembling a genome - above and beyond the annotation.

This paper also officially describes a couple of genomics tools from the lab. BUSCOMP has been in the works for some time, and this paper updates previous results to BUSCO v5 analysis and confirms our previous snake results using starling-derived test data. BUSCO is a powerful and popular tool that estimates genome completeness using gene prediction and curated models of single-copy protein orthologues. However, we demonstrate how results can be counterintuitive: adding/removing scaffolds can alter BUSCO predictions elsewhere in the assembly, while low sequence quality may reduce “completeness” scores and miss genes that are present in the assembly. BUSCOMP (BUSCO Compilation and Comparison) (https://github.com/slimsuite/buscomp) complements BUSCO to identify/overcome these issues by compiling a non-redundant set of the highest-scoring single-copy BUSCO complete sequences and re-searching these against assemblies for consistent completeness scoring. SAAGA (https://github.com/slimsuite/saaga) is a new tool for annotation versus reference proteome comparisons. SAAGA can compare different annotations of the same assembly, or be combined with a lightweight annotation tool like GeMoMa to compare different assemblies of the same organism.
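The compilation step can be pictured as a simple "keep the best hit per BUSCO gene" pass over the results from several assemblies. The sketch below is a simplified illustration of that idea only, not BUSCOMP's code: the record layout, function name, assembly names and scores are all hypothetical, and the subsequent re-searching of the compiled sequences against each assembly is not shown.

```python
def compile_best(records):
    """Keep, for each BUSCO ID, the highest-scoring single-copy complete hit.

    Each record is (assembly, busco_id, status, score). Only hits with
    status "Complete" are eligible; duplicated, fragmented and missing
    genes are skipped. This tuple layout is an assumption of the sketch.
    """
    best = {}
    for assembly, busco_id, status, score in records:
        if status != "Complete":
            continue
        if busco_id not in best or score > best[busco_id][1]:
            best[busco_id] = (assembly, score)
    return best


# Hypothetical BUSCO results for two assemblies of the same species.
records = [
    ("asmA", "busco1", "Complete", 950.0),
    ("asmB", "busco1", "Complete", 990.5),
    ("asmA", "busco2", "Duplicated", 800.0),
    ("asmB", "busco2", "Complete", 640.2),
    ("asmA", "busco3", "Fragmented", 310.0),
    ("asmB", "busco3", "Missing", 0.0),
]

best = compile_best(records)
for busco_id, (assembly, score) in sorted(best.items()):
    print(busco_id, assembly, score)
```

Here busco3 never makes it into the compiled set because no assembly has a complete single-copy hit for it, while busco1 and busco2 are each represented once by their best-scoring copy.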


Stuart KC, Edwards RJ, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL & Rollins LA (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753; doi: 10.1101/2021.04.07.438753. [*Joint first authors] [bioRxiv]

Abstract

The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second North American genome (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterisation. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 1,628 scaffolds (72.5 Mb scaffold N50). Species-specific transcript mapping and gene annotation revealed high structural and functional completeness (94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Rapid, recent advances in sequencing technologies and bioinformatics software have highlighted the need for evidence-based assessment of assembly decisions on a case-by-case basis. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counter-intuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. Finally, we present a second starling assembly, S. vulgaris vNA, to facilitate comparative analysis and global genomic research on this ecologically important species.

Thursday, 18 March 2021

Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

Our latest genome paper is now out at BMC Genomics. This was a second collaboration by the team behind the German Shepherd Dog genome last year, led by Bill Ballard. This time, we used a combination of BGI short reads, ONT long reads, and Hi-C scaffolding to generate a chromosome-length assembly. This is one of the most intact and complete dog genomes generated to date, and joins only a handful of published breed-specific chromosome-length assemblies.

The Basenji is particularly interesting as it sits at the base of the dog breed family tree, making it a good unbiased reference for future comparisons between breeds.

The paper also has a few nice nuggets for those interested in genome assembly. Of particular interest, our initial assembly had an artefact where the entire mitochondrial genome got assembled (in two copies) into the middle of one of the nuclear chromosomes. It is not entirely clear why this happened, but it was inserted into a NUMT (nuclear mitochondrial DNA insertion) fragment at that location. To make finding such things easier, we’ve released a new NUMT finding tool, NUMTFinder.

As with our previous dog genome, the German Shepherd Dog, we also observe that the tandem repeat of Amy2B (Amylase Alpha 2B) genes was assembled intact but with fewer copies than are present in the actual genome. (This gene is of interest for dog domestication and adaptations to a starch-rich diet.) Crucially, without looking at the raw sequencing data, it would not have been clear that the assembly under-represents Amy2B copy number. This kind of analysis can be repeated using the regcheck or regcnv run modes of Diploidocus, which estimate the copy number of a region based on its read depth versus the single-copy read depth determined from BUSCO single-copy complete genes.

Overall, this presents a nice case study of the need for a bit of TLC and manual curation, even when you have some very impressive completeness and contiguity statistics.


Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Lieberman Aiden E, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188

Abstract

Background: Basenjis are considered an ancient dog breed of central African origins that still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, Basenjis possess unique phylogeny, geographical origins and traits, making their genome structure of great interest. The increasing number of available canid reference genomes allows us to examine the impact the choice of reference genome makes with regard to reference genome quality and breed relatedness.

Results: Here, we report two high quality de novo Basenji genome assemblies: a female, China (CanFam_Bas), and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high quality CanFam_GSD assembly. By aligning short read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection.

Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe more comprehensive analyses across the entire family of canids are more suited to a pangenome approach. Collectively this work highlights the importance the choice of reference genome makes in all variation studies.

Monday, 15 February 2021

Cadel Watson (Honours student)

Cadel Watson joined the lab in 2021 as an Honours student. His project focuses on the identification of ultra-conserved elements in genomes, including exploring their definition and building analysis tools, and potential applications of UCEs to assessing genome completeness. Cadel is in the final year of a Bachelor of Engineering (Bioinformatics) degree.

[LinkedIn]

Monday, 1 February 2021

Dr Collin Ahrens (Postdoc)

Dr Collin Ahrens joined the Edwards lab as a postdoctoral researcher in January of 2021. He studies local adaptation to environmental variation (e.g. pathogens, drought, heatwaves etc.), and investigates how these adaptive patterns can be used to manage plant populations.

Local adaptation is often driven by differential physiological responses to environmental stress, controlled by genetic mechanisms. Therefore, he focuses on the E + G = P paradigm to ask questions such as how do populations evolve such different responses to different environmental conditions? And how do species evolve such different responses to the same environmental conditions? To answer these fundamental questions, he leverages several computational techniques to disentangle patterns of adaptation. At the Edwards lab, he will use whole genome sequencing, quantitative genetics, and physiological experimentation to explore how myrtle rust resistance segregates within Melaleuca quinquenervia populations to assist in broader conservation programs, including applied outcomes such as seed collection and ex situ breeding programs.