Wednesday, 16 January 2019

We have a new 10x Supernova assembly lab record

One of the sequencing approaches we are playing with at the moment is 10x Chromium linked reads, assembled with Supernova. We’ve now done a bunch of different species, and they mostly come out pretty decent, and excellent value for money. As part of the Oz Mammals Genomics project, we are currently assembling several rock-wallabies. One of them, Petrogale mareeba, has just broken our lab record for the most intact assembly. Initial “pseudohaploid” stats:

  • Total length of sequences: 3,329,538,806
  • Max. length of sequences: 275,258,067
  • N50 length of sequences: 47,482,977
  • L50 count of sequences: 20
  • Gap (N) length: 40,183,160 (1.21%)

Over half of the 3.3 Gb genome is covered by just twenty scaffolds, each at least 47.5 Mb in length, and the longest is over 275 Mb! (We normally feel pretty happy to get an N50 of a few Mb.) Cuter than a cane toad too!
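For anyone unfamiliar with the N50 and L50 numbers above, here is a minimal sketch of how they can be calculated from a list of scaffold lengths. The function name and the toy lengths are made up for illustration; this is not the code that produced the stats in this post.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length. L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical lengths): total = 150, half = 75.
# 50 + 40 = 90 reaches the halfway point, so N50 = 40 and L50 = 2.
toy_n50, toy_l50 = n50_l50([50, 40, 30, 20, 10])
print(toy_n50, toy_l50)
```

Applied to the assembly above, an L50 of 20 with an N50 of 47,482,977 means the twenty longest scaffolds reach the halfway point of the total length, and the shortest of those twenty is about 47.5 Mb.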

Photo credit: Richard.Fisher CC BY 2.0

Monday, 17 December 2018

What are we sequencing next? The waratah!

Thanks to seed funding from UNSW, we were able to sequence two rainforest tree species earlier this year in collaboration with the Royal Botanic Gardens and Domain Trust (RBGDT), Sydney. I am pleased to announce that, together with RBGDT and the Blue Mountains Botanic Garden, Mt Tomah, we won a bid to sequence one of the first genomes as part of the new Genomics for Australian Plants Framework Initiative by Bioplatforms Australia: the NSW state flower, the Waratah (Telopea speciosissima).

As announced recently, this is one of three species selected for the initial pilot study. Details will be sorted out in the new year, but we will be looking to use a combination of 10x Genomics linked reads and long-read sequencing (PacBio and/or Nanopore).

Collaborators on the project: M Rossetto (1), M van der Merwe (1), H Sauquet (1), P Lu-Irving (1), J Bragg (1), G Bourke (2), RJ Edwards (3)

  1. Royal Botanic Gardens and Domain Trust, Sydney
  2. Blue Mountains Botanic Garden, Mt Tomah
  3. The University of New South Wales, Sydney

Image: Telopea speciosissima, Suellen’s Garden, Falls Ck NSW: Photo, Suellen Harris

Wednesday, 28 November 2018

EdwardsLab at #ABACBS2018

For those who missed it, there’s a (slightly old) poster version of my ABACBS 2018 talk, Sequencing snakes: Pseudodiploid pseudo-long-read whole genome sequencing and assembly of Pseudonaja textilis (eastern brown snake) and Notechis scutatus (mainland tiger snake). If anything in the talk (except the repeat stuff) looks useful to you, this is a citeable poster:

Edwards RJ et al. Pseudodiploid pseudo-long-read whole genome sequencing and assembly of Pseudonaja textilis (eastern brown snake) and Notechis scutatus (mainland tiger snake) [version 1; not peer reviewed]. F1000Research 2018, 7:753 (poster) (doi: 10.7490/f1000research.1115550.1)

We’re still developing the genome size prediction and BUSCO comparison/compilation tools, so get in touch if either of these looks useful to you.

ABACBS2018 Posters

We have three lab posters in Poster session 2 this morning:

  • Poster #16. Åsa Pérez-Bercoff, Using structural variant detection to resolve difficult regions of a genome assembly.

  • Poster #21. Kirsti Paulsen, Optimising intrinsic protein disorder prediction for short linear motif discovery.

  • Poster #26. Katarina Stuart, Evolution in invasive populations: using genomics to reveal drivers of invasion success in the Australian European starling (Sturnus vulgaris) introduction across Australia.

Also check out the posters of our UNSW neighbours from the Wilkins lab:

  • Poster #29. Chi Nam Ignatius (Igy) Pang, Benchmarking Protein Correlation Profiling datasets against reference protein complexes: case studies in S. cerevisiae.

  • Poster #44. Susan Corley, QuantSeq 3’ sequencing paired with Salmon quantification provides a fast reliable approach for high throughput transcriptomic analysis.

  • Poster #49. Xabier Vázquez-Campos, OTUreporter: an automated pipeline for the analysis and report of amplicon sequencing data.

Wednesday, 31 October 2018

SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions

Idrees S, Pérez-Bercoff Å & Edwards RJ. (2018) SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions. PeerJ 6:e5858


Many important cellular processes involve protein–protein interactions (PPIs) mediated by a Short Linear Motif (SLiM) in one protein interacting with a globular domain in another. Despite their significance, these domain-motif interactions (DMIs) are typically low affinity, which makes them challenging to identify by classical experimental approaches, such as affinity pulldown mass spectrometry (AP-MS) and yeast two-hybrid (Y2H). DMIs are generally underrepresented in PPI networks as a result. A number of computational methods now exist to predict SLiMs and/or DMIs from experimental interaction data but it is yet to be established how effective different PPI detection methods are for capturing these low affinity SLiM-mediated interactions. Here, we introduce a new computational pipeline (SLiMEnrich) to assess how well a given source of PPI data captures DMIs and thus, by inference, how useful that data should be for SLiM discovery. SLiMEnrich interrogates a PPI network for pairs of interacting proteins in which the first protein is known or predicted to interact with the second protein via a DMI. Permutation tests compare the number of known/predicted DMIs to the expected distribution if the two sets of proteins are randomly associated. This provides an estimate of DMI enrichment within the data and the false positive rate for individual DMIs. As a case study, we detect significant DMI enrichment in a high-throughput Y2H human PPI study. SLiMEnrich analysis supports Y2H data as a source of DMIs and highlights the high false positive rates associated with naïve DMI prediction. SLiMEnrich is available as an R Shiny app. The code is open source and available via a GNU GPL v3 license. A web server is also available.
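To give a feel for the permutation test described in the abstract, here is a rough sketch in Python. It is not the SLiMEnrich code (which is an R Shiny app): the function name, the inputs and the shuffling scheme are all assumptions made for illustration. It takes a list of interacting protein pairs and a set of potential DMI pairs, counts how many interactions match a potential DMI, then repeats the count after randomly re-pairing the two sides of the network.

```python
import random


def dmi_enrichment(ppi_pairs, potential_dmis, n_perm=1000, seed=0):
    """Hypothetical permutation test for DMI enrichment in a PPI list.

    ppi_pairs: list of (motif_protein, domain_protein) interactions.
    potential_dmis: set of (motif_protein, domain_protein) pairs that
        are known or predicted to interact via a DMI.
    Returns (observed count, mean count in shuffled data, empirical p).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    firsts = [a for a, _ in ppi_pairs]
    seconds = [b for _, b in ppi_pairs]

    # Number of real interactions that match a potential DMI.
    observed = sum((a, b) in potential_dmis for a, b in ppi_pairs)

    # Null distribution: shuffle the second column so the two sets of
    # proteins are randomly associated, then count matches again.
    null_counts = []
    for _ in range(n_perm):
        shuffled = seconds[:]
        rng.shuffle(shuffled)
        null_counts.append(
            sum(pair in potential_dmis for pair in zip(firsts, shuffled))
        )

    mean_null = sum(null_counts) / n_perm
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, mean_null, p_value
```

In this sketch, the ratio of the observed count to the mean shuffled count gives a simple enrichment score. How SLiMEnrich itself randomises the network and derives its false positive rate is described in the paper.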

Ziying Zhang (Visiting Masters student)

Ziying Zhang is a visiting Masters student from Wageningen University in the Netherlands. She received a bachelor’s degree in bioscience from Hainan University, China, in 2016 and started her MSc in bioinformatics at Wageningen University in February 2017. Her Masters thesis project focused on developing a downstream analysis toolbox for microbial diversity analysis, which uses SPARQL to query RDF datasets of 16S rRNA data before launching an analysis, simplifying the pre-processing procedures.

Ziying started an internship at the Edwards Lab at the end of October 2018, working on genome size prediction from k-mer profiles. Her interests include genomics, genetics, bioinformatics, programming and data management.


Tuesday, 7 August 2018

Draft genome assembly of the invasive cane toad, Rhinella marina

Richard J Edwards, Daniel Enosi Tuipulotu, Timothy G Amos, Denis O’Meally, Mark F Richardson, Tonia L Russell, Marcelo Vallinoto, Miguel Carneiro, Nuno Ferrand, Marc R Wilkins, Fernando Sequeira, Lee A Rollins, Edward C Holmes, Richard Shine & Peter A White (2018): Draft genome assembly of the invasive cane toad, Rhinella marina. GigaScience giy095 (Adv. access, 07 August 2018)


Background. The cane toad (Rhinella marina, formerly Bufo marinus) is a species native to Central and South America that has spread across many regions of the globe. Cane toads are known for their rapid adaptation and deleterious impacts on native fauna in invaded regions. However, despite an iconic status, there are major gaps in our understanding of cane toad genetics. The availability of a genome would help to close these gaps and accelerate cane toad research.

Findings. We report a draft genome assembly for R. marina, the first of its kind for the Bufonidae family. We used a combination of long read PacBio RS II and short read Illumina HiSeq X sequencing to generate a total of 359.5 Gb of raw sequence data. The final hybrid assembly of 31,392 scaffolds was 2.55 Gb in length with a scaffold N50 of 168 kb. BUSCO analysis revealed that the assembly included full length or partial fragments of 90.6% of tetrapod universal single-copy orthologs (n = 3950), illustrating that the gene-containing regions have been well-assembled. Annotation predicted 25,846 protein coding genes with similarity to known proteins in SwissProt. Repeat sequences were estimated to account for 63.9% of the assembly.

Conclusion. The R. marina draft genome assembly will be an invaluable resource that can be used to further probe the biology of this invasive species. Future analysis of the genome will provide insights into cane toad evolution and enrich our understanding of their interplay with the ecosystem at large.

(More details to follow in future posts.)

Sunday, 1 July 2018

Edwards Lab at Genetics Society of AustralAsia 2018

Åsa Pérez-Bercoff is attending GSA 2018 in Canberra this week. Today at 16:30, she will be presenting an updated and extended version of her SBRS18 talk, Investigating the evolution of complex, novel traits using whole genome sequencing and molecular palaeontology, as part of the Evolutionary Genetics session.

Find out what this plot means…

…and more!