Category Archives: genome sequencing

Aspergillus has a posse


Shepard Fairey has gotten a lot of notice lately for his Obama art, which has been replicated pretty much everywhere. I mocked up an homage to his earlier street art; here we’ll discuss the growing Aspergillus genome posse.

Work mainly from the JCVI, Broad Institute, JGI, NITE, and the Sanger Centre has generated an excellent collection of genome sequences for the Eurotiales clade (feel free to get a login for the wiki and add others that are missing).  The Aspergillus community now has an Aspergillus Genome Database (AGD) project that includes a curator of genome annotation (they are hiring) and presumably literature curation, following the SGD and CGD model.

I think a lot of other projects have a Posse too (or maybe just a loosely organized band) in terms of a community of people working on related species and willing to work together to coordinate.  As these sorts of “clade” databases start to develop, we will have better clusters of information that can be mapped among multiple species.

Eventually I hope this will spur efforts for more coordinated genome databases for comparative genomics and for the transfer of known gene and functional information between experimental systems.  These efforts really require coordination or centralization of the data so that gene models can be updated along with orthologs and phylogenomic inference of function.

Yeast population genomics
I have cheered the work of the Sanger-Wellcome SGRP group to generate multiple Saccharomyces cerevisiae and S. paradoxus strain genome sequences.   The group had previously submitted a version of the manuscript to Nature Precedings and it is now published in Nature AOP, showing that submitting to a preprint server doesn’t necessarily hurt your chances of getting a manuscript published…  The research groups explored the impact of domestication on the Saccharomyces genome (as was also recently done for the sake and soy sauce worker fungus, Aspergillus oryzae) by comparing S. cerevisiae strains with individuals from wild strains of S. paradoxus.

This paper addressed several challenges, including methodology for light genome sequencing for population genomics. These data represent, in a way, a pilot project for genome resequencing and for draft genome sequencing with next-generation sequencing tools. Of course, with the pace of sequencing technology development, it seems any project more than a couple of months old will be using outdated technology, but this work represents some important progress.  Tools like MAQ were also developed and tuned as part of the project.  In addition to the methods development, it also provided a new look at the evolutionary dynamics of a well-studied fungus.

Genome assembly
The authors apply several different quality controls and use a new tool called PALAS (Parallel ALignment and ASsembly) to assemble all the strains at the same time using a graph-based approach that draws on the reference genome sequences for each species. This is different from a full-blown WGA approach like PCAP, Phusion or Arachne because this is a deliberately low-coverage sequencing pass.  The authors try to impute missing sequence via Ancestral Recombination Graphs as implemented in the Margarita system.   They also use MAQ to align sequence from Illumina/Solexa sequencing to the assemblies made by PALAS.

Since this project was on two species of Saccharomyces, S. cerevisiae and S. paradoxus, they needed good reference assemblies for each of these species. The previously available S. paradoxus assembly wasn’t complete enough for this study, so they generated an additional 4.3X coverage with Sanger/ABI sequencing and 80X coverage with Illumina.
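For readers less used to the "X coverage" shorthand, here is a minimal sketch of the arithmetic: fold coverage is total sequenced bases divided by genome size. The 12 Mb genome size and the 36 bp read length below are my own round-number assumptions for illustration, not values taken from the paper.

```python
import math


def fold_coverage(n_reads, read_length, genome_size):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed values for illustration only (not from the paper):
GENOME_SIZE = 12_000_000   # a roughly 12 Mb yeast genome
SHORT_READ_LENGTH = 36     # a hypothetical short-read length

if __name__ == "__main__":
    n = reads_needed(80, SHORT_READ_LENGTH, GENOME_SIZE)
    print(f"{n:,} reads of {SHORT_READ_LENGTH} bp for 80X")
    print(f"check: {fold_coverage(n, SHORT_READ_LENGTH, GENOME_SIZE):.2f}X")
```

This is average coverage only; real projects also lose reads to quality filtering and uneven representation, so the actual number sequenced is usually higher.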

Population genomics and domestication

The sequencing data also provided a framework for population genetic investigations. Some simple findings showed that, within each species, isolates from the same geographic region were more genetically similar to each other.  The main geographic regions sampled for S. paradoxus included the UK, America, and the Far East, and some of these samples had been analyzed in a very nice study on Chromosome III.  For S. cerevisiae there were individuals from around Europe, at least 10 European wine strains, and strains from Malaysia, sake brewing, West Africa, and North America. From these data it was possible to discover that there are several strains with mosaic genomes, meaning that some pieces of the genome match best with the sake fermentation strains and other parts with the wine/European samples.

Efforts to detect the effects of natural selection that may be linked to domestication of these strains explored two different approaches. The McDonald-Kreitman test did not identify any loci under positive selection, while Tajima’s D was negative in the S. cerevisiae global and wine strain populations, indicating an excess of singleton polymorphisms, though they draw few conclusions from that.  The authors also observed a sharper decay of linkage disequilibrium in S. cerevisiae (half maximum of 3kb) than in S. paradoxus (half maximum of 9kb), suggesting that S. cerevisiae is recombining more, either due to increased opportunities to outcross or a greater frequency of recombination events when it does.
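To make the Tajima’s D point concrete, here is a small sketch of the standard statistic (Tajima 1989) computed from a set of aligned haplotypes. It is not the authors’ pipeline: the function name and the toy alignments are mine, and it assumes complete data with no gaps or missing calls.

```python
import math
from collections import Counter


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned haplotypes (no missing data)."""
    n = len(sequences)
    if n < 4:
        # The variance term is zero for n < 4, so D cannot be computed.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0   # S: number of segregating (polymorphic) sites
    pi = 0.0        # mean number of pairwise differences
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            identical_pairs = sum(c * (c - 1) / 2 for c in counts.values())
            pi += (n_pairs - identical_pairs) / n_pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy alignments: every variant is a singleton vs. every variant is common.
singletons = ["AAAAAA", "TAAAAA", "ACAAAA", "AAGAAA", "AAATAA"]
common = ["AAAAAA", "AAAAAA", "TTTTAA", "TTTTAA", "TTTTAA"]

if __name__ == "__main__":
    print("singletons:", round(tajimas_d(singletons), 3))
    print("common variants:", round(tajimas_d(common), 3))
```

With the singleton-only toy alignment the statistic comes out negative, and with intermediate-frequency variants it comes out positive, which is the pattern behind the "excess of singletons" reading of a negative D.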

In context of the paper title and the idea of exploring the effects of domestication on the genome, the authors observe that the standard paradigm that ‘domesticated’ species have lower diversity levels is simply not the case in these samples.  This isn’t to say there isn’t evidence of the selection for fermentation production from these strains based on the stress response conditions they were tested on, but that there is still ample evidence of maintaining diversity within the populations presumably through various amounts of outcrossing.

We are also interested in these results as we apply similar questions to population genomics of the human pathogenic fungus Coccidioides, where 14 strains have been sequenced with Sanger sequencing technology.  Hopefully some of these lessons will resonate in our analyses, and this era of population genomics will see ever more extensive collections to address aspects of migration, phylogeography, and local adaptation within populations of fungi and other microbes.

Gianni Liti, David M. Carter, Alan M. Moses, Jonas Warringer, Leopold Parts, Stephen A. James, Robert P. Davey, Ian N. Roberts, Austin Burt, Vassiliki Koufopanou, Isheng J. Tsai, Casey M. Bergman, Douda Bensasson, Michael J. T. O’Kelly, Alexander van Oudenaarden, David B. H. Barton, Elizabeth Bailes, Alex N. Nguyen, Matthew Jones, Michael A. Quail, Ian Goodhead, Sarah Sims, Frances Smith, Anders Blomberg, Richard Durbin, Edward J. Louis (2009). Population genomics of domestic and wild yeasts Nature DOI: 10.1038/nature07743

First release of N.tetrasperma and N.discreta

The JGI, in collaboration with our lab at Berkeley, has released the Neurospora tetrasperma (mat A) and N. discreta (mat A) genome sequences and annotation after about two years of work.  These two species are closely related to the well-studied laboratory workhorse Neurospora crassa.

The N. tetrasperma assembly (8X) has an N50 of 976kb and is highly colinear with the N. crassa genome.  With the JGI, we’ve also done some additional 454 sequencing, which will yield an improved assembly at 23X coverage in the next release.  We also did some comparative scaffolding and can basically double that N50, and most of the result looks good when compared to the improved V2 assembly.

The N. discreta assembly (8X) is also quite good with an N50 of 2.3 Mb. For comparison, the V7 of N. crassa has an N50 of 664 kb, although with genetic map information the 250+ contigs can be scaffolded into 7 chromosomes with 146 unmapped contigs.
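Since N50 comes up in every assembly comparison, here is a minimal sketch of how it is computed: sort the contig (or scaffold) lengths from largest to smallest and report the length at which the running total first reaches half of the assembly size. The contig lengths are made-up toy values, not the Neurospora assemblies.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (in kb), purely illustrative.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]

if __name__ == "__main__":
    print("N50:", n50(toy_contigs), "kb")
```

Because N50 depends only on the length distribution, joining contigs by comparative scaffolding raises it without adding any new sequence, which is why it is worth checking scaffolded joins against an independent assembly.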

Both N.discreta and N.tetrasperma genomes contain about 10k predicted genes similar to counts in other related species like N.crassa and Podospora anserina.

We’re finalizing several analyses to present at the Asilomar meeting to describe these Neurospora genomes and comparisons with other Sordariomycete species.

Brown rotting fungal genome published

The Postia placenta genome is now published in the early edition of PNAS.   Brown rotting fungi are an important part of the cellulose-degrading ecology of the forest and will (hopefully) provide some enzymes that help in the lignin to biofuels process. Brown rotters break down cellulose but leave the lignin largely intact, while white rotters (like the previously sequenced Phanerochaete chrysosporium) are able to break down the lignin.  This fungus was chosen for sequencing as it is another potentially helpful fungus in the war on sugars (turning them into fuels), alongside the recently published Trichoderma reesei and the first basidiomycete genome, Phanerochaete (all incidentally with Diego Martinez as first author, go Diego!). It is also helpful to contrast the white and brown rotters to understand how their enzyme capabilities have changed and how these different lifestyles evolved.

There had been some issues with the initial assembly of this genome, which is basically twice as big as one would expect because the dikaryon genome was sequenced. A dikaryon has two nuclei with different genomes, the result of fusion between two parents of opposite mating types.  When genome sequencing is performed, it is hard to assemble these into a single assembly since there are really two haplotypes present.  So these haplotypes have to be sorted out to obtain the gene ‘count’ for the organism, for those who like simple numbers. This is a similar situation to the Candida albicans genome, although those haplotypes are much more similar.  The main problem is that one has to generate twice as much sequence to get the same coverage of each haplotype, unless one plays some tricks to collapse them into a consensus and then afterwards separates the haplotypes back out.

At any rate, this sequence provided a good summary of the gene content, and thus the metabolic and enzymatic capabilities, to match up with functional data collected from LC/MS and transcriptional profiling.

Several other rotting fungi are nearly done at JGI (though the task of writing and coordinating the analyses for the papers is ongoing!), including Schizophyllum commune and Pleurotus ostreatus. There are also several more mycorrhizal and plant pathogenic basidiomycete fungi, as well as some classic model systems, that have finished genomes and are in the process of finalizing papers.  It is an exciting time that is just beginning as these genome and transcriptional data are integrated and compared for their different ecological, morphological, and metabolic capabilities.

The article is unfortunately not Open Access so I haven’t even read it from home yet, but I pass along this news to you, dear reader. I will get a chance to read through more than the abstract to see what glistening gems have been extracted from this genomic endeavor.
D. Martinez, J. Challacombe, I. Morgenstern, D. Hibbett, M. Schmoll, C. P. Kubicek, P. Ferreira, F. J. Ruiz-Duenas, A. T. Martinez, P. Kersten, K. E. Hammel, A. V. Wymelenberg, J. Gaskell, E. Lindquist, G. Sabat, S. S. BonDurant, L. F. Larrondo, P. Canessa, R. Vicuna, J. Yadav, H. Doddapaneni, V. Subramanian, A. G. Pisabarro, J. L. Lavin, J. A. Oguiza, E. Master, B. Henrissat, P. M. Coutinho, P. Harris, J. K. Magnuson, S. E. Baker, K. Bruno, W. Kenealy, P. J. Hoegger, U. Kues, P. Ramaiya, S. Lucas, A. Salamov, H. Shapiro, H. Tu, C. L. Chee, M. Misra, G. Xie, S. Teter, D. Yaver, T. James, M. Mokrejs, M. Pospisek, I. V. Grigoriev, T. Brettin, D. Rokhsar, R. Berka, D. Cullen (2009). Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0809575106

Coprinopsis cinereus genome annotation updated

The Broad Institute, in collaboration with many of the Coprinopsis cinereus (Coprinus cinerea) community of researchers, has updated the genome annotation for C. cinereus with additional gene calls based on ESTs and improved gene callers. The annotation was made on the 13 chromosome assembly produced by the SEMO fungal biology group and collaborators across the globe, including a BAC map from H. Muraguchi.  Thanks to Jonathan Goldberg and colleagues at the Broad Institute for getting this updated annotation out the door.


This updated annotation is able to join and split several sets of genes and the gene count sits at just under 14k genes in this 36Mb genome. There are a couple of hiccups in the GTF and Genome contig/supercontig file naming that I am told will be fixed by early next week.  Additional work to annotate the “Kinome” by the Broad team provides some promising new insight to this genome annotation as well.

We’re using this updated genome assembly to address questions about the evolution of genome structure by studying syntenic conservation and aspects of crossing over points during meiosis.  The C. cinereus system has long been used as a model for fungal development and morphogenesis of mushrooms, as it is straightforward to induce mushroom fruiting in the laboratory.  It is also a model for studying meiosis due to the synchronized meiosis occurring in the cells in the cap of the mushroom.

Happy genome shrooming.

Updated Cryptococcus serotype A annotation

SEM of clamp cell, yeast cells and sexual spore chains. Courtesy R. Velagapudi & J. Heitman

A new and improved annotation of Cryptococcus neoformans var grubii strain H99 (serotype A) has been made available in GenBank and on the Broad Institute website. This update is a collaboration between several groups providing data and analyses and the genome annotation team at the Broad Institute.

Some changes noted by the Broad Institute include:

“This release of gene predictions for the serotype A isolate Cryptococcus neoformans var. grubii H99 is based on a new genomic assembly provided by Dr. Fred Dietrich at the Duke Center for Genome Technology. The new assembly consists of 14 nuclear chromosomes and a single 21 KB mitochondrial chromosome, and has resulted in a reduction of the estimated genome size from 19.5 to 18.9 Mb. Improvements in the assembly and in our annotation process have resulted in a set of 6,967 predicted protein products, 335 fewer than the previous release.”

Genome survey sequencing of Witches’ Broom

Genome survey sequencing (1.9X coverage) was generated for Moniliophthora perniciosa, the cause of witches’ broom disease on cacao plants. The sequence for this basidiomycete plant pathogen was published in BMC Genomics this week. The authors report a higher number of ROS metabolism and P450 genes. Evaluating whether these copy number differences are significantly different from other basidiomycete fungi and are lineage specific expansions will help determine if these families played a role in the adaptation of this plant pathogen.

This work provides an important stepping stone in understanding and eventually controlling this pathogen, which is devastating cacao plantations. An associated review describes what we have learned and can still learn about Witches’ broom disease.

See related:

Jorge MC Mondego, Marcelo F Carazzolle, Gustavo GL Costa, Eduardo F Formighieri, Lucas P Parizzi, Johana Rincones, Carolina Cotomacci, Dirce M Carraro, Anderson F Cunha, Helaine Carrer, Ramon O Vidal, Raissa C Estrela, Odalys Garcia, Daniela PT Thomazella, Bruno V de Oliveira, Acassia BL Pires, Maria Carolina S Rio, Marcos Renato R Araujo, Marcos H de Moraes, Luis AB Castro, Karina P Gramacho, Marilda S Goncalves, Jose P Moura Neto, Aristoteles Goes Neto, Luciana V Barbosa, Mark J Guiltinan, Bryan A Bailey, Lyndel W Meinhardt, Julio CM Cascardo, Goncalo AG Pereira (2008). A genome survey of Moniliophthora perniciosa gives new insights into Witches’ Broom Disease of cacao BMC Genomics, 9 (1) DOI: 10.1186/1471-2164-9-548

Melampsora larici-populina genome sequenced

From Francis Martin

The DNA sequence of Melampsora larici-populina has been determined by the U.S. Department of Energy Joint Genome Institute (DOE JGI). Annotations of the v1.0 assembly of Melampsora larici-populina are publicly available at
Genome analyses have been carried out by an international consortium comprised of DOE JGI, France’s National Institute for Agricultural Research (F Martin et al., INRA-Nancy), Canadian Forest Service (R Hamelin et al., Laurentian Forestry Centre), and the Bioinformatics & Evolutionary Genomics Division (Rouzé et al., Gent University) in Belgium.

The poplar leaf rust fungus Melampsora is the most devastating and widespread pathogen of poplars, and has limited the use of poplars for environmental and wood production goals in many parts of the world. All known poplar cultivars are susceptible to Melampsora species, and new virulent strains are continuously developing. This disease therefore has a strong potential impact on current and future poplar plantations used for production of forest products (principally pulp and consolidated wood products), carbon sequestration, biofuels production, and bioremediation.

Lichen genome projects and the power shift prompted by next-gen sequencing

Genome Technology highlights the very cool thing about next-gen sequencing: it puts the power in the hands of the researchers to explore genome sequence and doesn’t limit them to projects funded through sequencing centers. The Genome Technology piece highlights work at Duke to sequence the genome of Cladonia grayi, a lichenized fungus, with 454 technology at Duke’s Institute for Genome Sciences and Policy through their next-gen sequencing program. This is the way of the future, where researchers will be able to generate sequence through core facilities, waiting only in the queue at their own university rather than in community sequencing project or sequencing center proposal queues.

This isn’t the only lichen being sequenced. Xanthoria parietina is also in the queue at JGI, but has taken a while to get going because of some logistical problems getting the DNA (and any problems are amplified because it takes a long time to get new material since lichens grow very slowly).

Giving researchers the power to do quick exploratory whole-genome sequencing with next-gen and, eventually, to produce high quality genome sequences from next-gen sequencing is predicted to transform how this kind of science gets done. It means we’ll probably just sequence a mutant strain instead of trying to map the mutation. This is happening already, in anecdotal stories in worms and in our work in mushrooms. N.B. this is done after a mutagenized strain has been cleaned up a bit, based on some crosses, to ensure we’re looking for one or only a few mutations, but that is part of standard genetic approaches anyway.
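As a rough sketch of the "sequence the mutant instead of mapping it" idea: once variants have been called against the reference for both the parent and the cleaned-up mutant, the candidates are simply the calls private to the mutant. The `Variant` record, the function name, and the coordinates below are all hypothetical, and real pipelines would also filter on call quality and coverage.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One variant call relative to the reference (hypothetical record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_mutations(parent_variants, mutant_variants):
    """Variants called in the mutant but absent from its parent strain."""
    return sorted(set(mutant_variants) - set(parent_variants))


# Made-up calls: differences from the reference shared by both strains are
# background polymorphism; only the private mutant call is a candidate.
parent = [Variant("chr1", 1200, "A", "G"), Variant("chr3", 88, "C", "T")]
mutant = [
    Variant("chr1", 1200, "A", "G"),
    Variant("chr3", 88, "C", "T"),
    Variant("chr5", 40417, "G", "A"),
]

if __name__ == "__main__":
    for v in candidate_mutations(parent, mutant):
        print(v.chrom, v.pos, f"{v.ref}>{v.alt}")
```

The backcrossing step mentioned above matters here: the fewer unrelated mutations the strain carries, the shorter this candidate list will be.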

This fast, cheap whole-genome sequencing is also the stuff of personal genomics, but for basic research it will also mean that a first pass exploring the gene repertoire of an organism will be a multi-week instead of a multi-year project. I just hope we’re training enough people who can efficiently extract the information from all these data, with solid bioinformatics, computational, data-oriented programming, and statistical skills, to support all the labs that will want to take this approach. You’ll need a life-vest to swim in the big data pool for a while until more tools are developed that can be deployed by non-experts.

P. chrysogenum genome

BBC News and GTO report that the sequence of P. chrysogenum will be published in October in Nature Biotechnology, in a project based at the biotech company DSM. P. chrysogenum is the mold that fortuitously contaminated Dr Fleming’s bacterial plates.

The 13,500 genes reported in the press release is quite a bit larger than the count for relatives in the Aspergillus clade (~10,000 genes), so it will be intriguing to see what’s going on here and whether there will be interesting examples of horizontal transfer like what has been investigated in Aspergillus oryzae. I am unclear as to whether the selected strain is a wild isolate or an industrial strain, but I look forward to reading the full account of the genome.

Factoid: Most of the industrial fungal genome papers have seen publication in Nature Biotechnology (Aspergillus niger, Trichoderma reesei, and Phanerochaete chrysosporium).

Edit: 1-Oct-2008, Jonathan Badger, an author on the paper, blogs about the paper and links to the pre-print available on NBT site.