CSP: Letter of support time!

Several groups working on fungi are submitting proposals to the JGI Community Sequencing Program.  Several of these relate to the JGI’s interest in an encyclopedia of fungal genomes and propose sequencing genomes of ascomycete and basidiomycete yeasts, filamentous ascomycetes, basidiomycetes, and early diverging fungi.  If you haven’t been contacted by these community members but would like to write a letter of support in these areas, please get in touch, as the deadline for the proposals is early next week. There are also other proposals going in for Neurospora mutant strain resequencing, more Fusarium species, transcriptomes of mycorrhizal fungi, and other topics.  If you are a user of data from any of the previous fungal projects, then you know how important these resources are in both comparative genomics and molecular biology work, so support to get additional sequences generated will benefit many in the community.

I don’t know if it is appropriate for me to post the text of the solicitation for letters here, so I am deferring to the privacy of the groups submitting proposals. But if you want to help out the community by writing a letter showing that you would benefit from these resources, I can try to put you in touch.

For your reading pleasure

Too much on my plate as of late, so I’m woefully behind on posting about interesting papers or news.  Here’s a short list of links and papers that are worth a look though.

  • “Evolution of pathogenicity and sexual reproduction in eight Candida genomes” published (Nature)
  • NYT Science article sort of summarizing the good, bad, and ugly of fungi and human interactions
  • Attempts to save amphibians from chytridiomycosis “Riders of a Modern-Day Ark” (PLoS Biology)
  • Looks like Scott Baker and the JGI are in the process of resequencing several classical mutant strains of Phycomyces, Neurospora, Cochliobolus, and Cryphonectria for sequence-based mapping of mutants (i.e. here and here and here).

Yeast population genomics
I have cheered the work of the Sanger-Wellcome SGRP group to generate genome sequences for multiple Saccharomyces cerevisiae and S. paradoxus strains.   The group had previously submitted a version of the manuscript to Nature Precedings and it is now published in Nature AOP, showing that submitting to a preprint server doesn’t necessarily hurt your manuscript’s chances of getting published…  The research groups explored the impact of domestication (as was also recently done for the sake and soy sauce worker fungus, Aspergillus oryzae) on the Saccharomyces genome by comparing S. cerevisiae strains with individuals from wild strains of S. paradoxus.

This paper addressed several challenges, including methodology for light genome sequencing for population genomics. In a way, this dataset represents a pilot project for genome resequencing and for draft genome sequencing with next generation sequencing tools. Of course, with the pace of sequencing technology development, any project more than a couple of months old will be using outdated technology it seems, but this work represents some important progress.  Tools like MAQ were also developed and tuned as part of the project.  In addition to the methods development, it also provided a new look at the evolutionary dynamics of a well-studied fungus.

Genome assembly
The authors apply several different quality controls and use a new tool called PALAS (Parallel ALignment and ASsembly) to assemble all the strains at the same time, using a graph-based approach that draws on the reference genome sequence for each species. This is different from a full-blown WGA approach like PCAP, Phusion or Arachne because this is a deliberately low-coverage sequencing pass.  The authors try to impute missing sequence via Ancestral Recombination Graphs as implemented in the Margarita system.   They also use MAQ to align sequence from Illumina/Solexa sequencing to the assemblies made by PALAS.

Since this project was on two species of Saccharomyces, S. cerevisiae and S. paradoxus, they needed good reference assemblies for each of these species. The previously available S. paradoxus assembly wasn’t complete enough for this study, so they added 4.3X coverage with Sanger/ABI sequencing and 80X coverage with Illumina.

Population genomics and domestication

The sequencing data also provided a framework for population genetic investigations. Some simple findings showed that, within each species, isolates from the same geographic region were more genetically similar to each other.  The main geographic regions sampled for S. paradoxus were the UK, America, and the Far East, and some of these samples had been analyzed in a very nice study on Chromosome III.  For S. cerevisiae there were individuals from around Europe, at least 10 European wine strains, Malaysian strains, sake brewing strains, and strains from West Africa and North America. From these data it was possible to discover that several strains have mosaic genomes, meaning that some pieces of the genome match best with the sake fermentation strains and other parts with the wine/European samples.

Efforts to detect the effects of natural selection that may be linked to domestication of these strains explored two different approaches. The McDonald-Kreitman test did not identify any loci under positive selection, while Tajima’s D was negative in the S. cerevisiae global and wine strain populations, indicating an excess of singleton polymorphisms, though they draw few conclusions from that.  The authors also observed a sharper decay of linkage disequilibrium in S. cerevisiae (half maximum of 3kb) than in S. paradoxus (half maximum of 9kb), suggesting that S. cerevisiae is recombining more, either due to increased opportunities or a greater frequency of recombination events when it does.
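To make the Tajima’s D result a little more concrete, here is a minimal sketch of how the statistic is computed from a set of aligned haplotypes. This is not the authors’ pipeline: the function name, the input format (equal-length strings with no gaps or missing data) and the toy sequences are all assumptions for illustration. It follows the standard Tajima (1989) formulas, comparing the mean number of pairwise differences with the number of segregating sites.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned sequences (no gaps or missing data assumed)."""
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined there.
        raise ValueError("need at least 4 sequences")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("sequences must all have the same length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        return 0.0  # no variation; returning 0.0 is a convention chosen for this sketch

    # pi: mean number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Hypothetical toy data: every variant is a singleton, so D comes out negative.
singletons = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT", "AAAAA"]
# Hypothetical toy data: every variant is at intermediate frequency, so D is positive.
balanced = ["AAAAA"] * 3 + ["TTTTT"] * 3

print(tajimas_d(singletons))
print(tajimas_d(balanced))
```

An excess of rare variants pulls the pairwise diversity below what the number of segregating sites predicts, which is why the singleton-heavy toy sample gives a negative value, in the same direction as the result reported for the S. cerevisiae global and wine populations.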

In the context of the paper title and the idea of exploring the effects of domestication on the genome, the authors observe that the standard paradigm that ‘domesticated’ species have lower diversity levels is simply not the case in these samples.  This isn’t to say there is no evidence of selection for fermentation production in these strains, based on the stress response conditions they were tested on, but there is still ample evidence that diversity is maintained within the populations, presumably through various amounts of outcrossing.

We are also interested in these results as we apply similar questions to population genomics of the human pathogenic fungus Coccidioides, where 14 strains have been sequenced with Sanger sequencing technology.  Hopefully some of these lessons will resonate in our analyses, and this era of population genomics will see ever more extensive collections to address aspects of migration, phylogeography, and local adaptation within populations of fungi and other microbes.

Gianni Liti, David M. Carter, Alan M. Moses, Jonas Warringer, Leopold Parts, Stephen A. James, Robert P. Davey, Ian N. Roberts, Austin Burt, Vassiliki Koufopanou, Isheng J. Tsai, Casey M. Bergman, Douda Bensasson, Michael J. T. O’Kelly, Alexander van Oudenaarden, David B. H. Barton, Elizabeth Bailes, Alex N. Nguyen, Matthew Jones, Michael A. Quail, Ian Goodhead, Sarah Sims, Frances Smith, Anders Blomberg, Richard Durbin, Edward J. Louis (2009). Population genomics of domestic and wild yeasts. Nature. DOI: 10.1038/nature07743

Fungal genome assembly from short-read sequences

This is a research blog so I thought I’d post some quick numbers we are seeing for de novo assembly of the Neurospora crassa genome using Velvet. The genome of N. crassa is about 40Mb, and we sequenced several flow cells using Solexa/Illumina technology to see what kind of de novo reconstruction we’d get. I knew that this is probably insufficient for a very good assembly given what has been reported in the literature, but sometimes it is helpful to give it a try on local data.  From the outset this has mostly been a project about SNP discovery. I ran Velvet on an early (2FC) and a later (4FC) dataset with a hash size of 21, chosen based on some calculations and on running it with different hash sizes to find the optimal N50.  Summary contig size numbers come from the following command using cndtools from Colin Dewey.

  faLen < contigs.fa | stats

2 flowcells (~10M reads @36bp/read or about 10X coverage of 40Mb genome)

            N = 199562
          SUM = 25463251
          MIN = 49
       MEDIAN = 107.0
          MAX = 5371
         MEAN = 127.59568956
          N50 = 130

4 flow cells  (~20M reads @36bp/read; or about 20X coverage of a 40Mb genome)

            N = 102437
          SUM = 38352075
          MIN = 41
 1ST-QUARTILE = 77.0
       MEDIAN = 153
          MAX = 7189
         MEAN = 374.396702363
          N50 = 837

So that’s an N50 of 837bp. For those used to seeing N50 on the order of 1.5Mb this is not great, but it comes from 4 FC worth of sequencing, which was pretty cheap.  This is a reasonably repeat-limited genome so we should get pretty good recovery if the seq coverage is high enough. Using Maq we can both scaffold the reads and recover a sufficient number of high quality SNPs for the mapping part of the project.
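For anyone without cndtools installed, the N50 figure can be reproduced with a few lines of Python. This is only a sketch under assumptions: the function names are mine, and the convention used here (the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembled bases) may differ slightly from what the cndtools stats program does.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths, just to show the call.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

To run it on a Velvet output file, something like `n50(fasta_lengths(open("contigs.fa")))` should do, with contigs.fa being the same file passed to faLen above.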

To get a better assembly one would need much deeper coverage, as Daniel and Ewan explain in their Velvet paper and show in Figure 4 (sorry, not open-access for 6 mo). Full credit: this sequence came from unpaired reads from Illumina/Solexa genomic sequencing done at the UCB/QB3 facility on libraries prepared by Charles Hall in the Glass lab.
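As a rough check on the coverage figures quoted for the two datasets, the arithmetic is just total sequenced bases divided by genome size. The function name is an assumption for illustration, and the read counts are the approximate ones given in the post, not exact values from the runs.

```python
def expected_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


GENOME_SIZE = 40_000_000  # ~40Mb N. crassa genome, as stated in the post
READ_LENGTH = 36          # bp per read

# Approximate read counts for the 2 and 4 flow cell datasets.
for label, n_reads in [("2FC", 10_000_000), ("4FC", 20_000_000)]:
    print(label, expected_coverage(n_reads, READ_LENGTH, GENOME_SIZE))
```

With these round numbers the two datasets work out to 9X and 18X, consistent with the "about 10X" and "about 20X" quoted above.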

New Saccharomyces resequencing assembly

David Carter at the Sanger Centre emailed a message that new assemblies from the Saccharomyces strain resequencing project have been posted, including a new three-way alignment of S. bayanus-S. paradoxus-S. cerevisiae. This updates the Dec 2007 release.

Next next-gen sequencing technology

I’m not at AGBT, but Jonathan and Anthony both have coverage of Pacific Biosciences’ new sequencing technology that uses detection of DNA polymerase activity to determine sequence.  I believe some of the details are in the paper “Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures”, but I’ve not had a chance to read it.

More updates on Saccharomyces resequencing project at Sanger

I’ve paraphrased an email sent by David Carter to folks interested in the Saccharomyces resequencing project.

The latest version of the SGRP data is on the web site and ftp site. This release is somewhat provisional, and motivated more by the fact that we have a paper deadline coming up than by any claim to finality. It should be quite a bit better than what was there before, but doesn’t have a correct treatment of transposons.

You can get the data by starting here:

There is also a new version of the browser:

There are a few new features in the browser which [David] is going to document over the next couple of days.

Major new features of the data are that there should be much better consistency between alignments; Solexa/Illumina data has been incorporated for the strains that had it; and the S. paradoxus alignments are based on a new assembly that was created a few weeks ago and which covers about 95% of the genome; a description is at

Saccharomyces strain sequencing

While many strains of S. cerevisiae are being sequenced, a single strain, YJM789, isolated from the lung of an AIDS patient was sequenced a few years ago at Stanford and published this summer. The genome was described in a paper entitled “Genome sequencing and comparative analysis of Saccharomyces cerevisiae strain YJM789”.

Yeast resequencing update

Ed Louis at Nottingham sent out an email today outlining plans for publishing analyses of the Saccharomyces Genome Resequencing Project.  They are in the process of analyzing the data and ask that people respect their use of the data, but also invite collaborations and companion papers.

“If anyone has done or plans on doing a global analysis with a tight clean result which you think should be included in the overview paper, please contact us [Richard Durbin and Ed Louis; emails available through above links]. The analysis would have to be complete by 14 December and you would have to be willing to have the details transparently displayed on the web pages associated with the project.”

Next gen sequencing technology

Nature has an overview of what goes in and out of next generation sequencers with an interview with a smiling Chad Nusbaum from the Broad Institute. Most of these have been out and about for a while, but it seems that the hayride/bandwagon is starting to pick up more steam as GT’s Genome Scan has several posts about sequencing referencing J. Craig V, George Church, and the Nature news article (not free).

Note that Solexa is no longer the cool name: “Genome analyzer” is now the name for the machine that was previously called Solexa 1G. I’m holding out hope for funnier names in the future. I do feel that ABI’s choice of SOLiD is more exciting than 310/3700/3730, which are about as inspiring as HAL9000.

But I mean if your technology is called pyrosequencing, I am hoping Roche will come up with a fiery or at least smoldering play on words if they rename 454 again (GS FLX for now).